CN107590085A

CN107590085A - Dynamic reconfigurable array data path with multi-level cache and control method therefor

Info

Publication number: CN107590085A
Application number: CN201710712378.8A
Authority: CN
Inventors: 王珑; 沈海斌; 王星; 朱佳梁; 管旭光
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2017-08-18
Filing date: 2017-08-18
Publication date: 2018-01-16
Anticipated expiration: 2037-08-18
Also published as: CN107590085B

Abstract

The invention discloses a dynamic reconfigurable array data path with a multi-level cache, and a control method for it. The system comprises a multilevel cache system made up of four cache levels, a reconfigurable control module for the multi-level cache, coarse-grained operator mesh arrays that support the multi-level cache, and an interface module in the data path. Under the control of configuration code, the multilevel cache system performs data storage, data exchange and data synchronization in the data path. Also under the control of configuration code, the multi-level cache reconfigurable control module controls the mapping between the variables read and written by the mesh arrays and the addresses in the multi-level cache. The coarse-grained operator mesh arrays form the computation links and, under the control of configuration code, control the reading and writing of variables according to the definition of the data flow graph (DFG). The configurable dynamic reconfigurable array data path with multi-level cache and its method offer a high degree of data sharing and high data bandwidth, and can improve computational efficiency for both pipelined and non-pipelined workloads.

Description

Dynamic reconfigurable array data path with multi-level cache and control method therefor

Technical field

The present invention relates to the field of embedded reconfigurable systems, and in particular to a dynamic reconfigurable array data path with a multi-level cache and a control method for it.

Background of the invention

Because they contain dedicated reconfigurable processing units, reconfigurable arrays retain application flexibility while achieving superior performance and power consumption, which makes them an ideal model for domain-specific computing. A reconfigurable array architecture has two features. First, in hardware it consists of a reconfigurable data path and a reconfiguration controller; the reconfigurable data path is an array built from basic operation units. Second, the processor's control flow is separated from its data flow: the reconfigurable data path processes the data flow, while the reconfiguration controller executes the control flow that governs the reconfigurable data path.

A reconfigurable data path comprises an operator array, memory and interfaces. Without increasing the array size, the performance of a reconfigurable data path can be improved mainly in two ways. First, increase the parallelism of computation in the data path by exploiting operation-level and data-level parallelism. Second, increase the hardware utilization of the pipeline by reducing the idle cycles caused by data read/write latency, especially in iterative pipelines.

The main current techniques for meeting these performance requirements are as follows. First, raise the degree of memory sharing in the data path, so that within one clock cycle a memory can be read simultaneously by multiple operators in the operator array and updated by another operator; this effectively increases data parallelism and reduces pipeline idling. Second, raise the memory's data bandwidth by increasing its operating frequency and its data width, so that the memory runs at the same frequency as the data path and its data width equals the maximum bit width the data path processes in a single operation.

However, the degree of memory sharing and the data bandwidth constrain each other. As sharing increases, the memory needs multiple address lines and data lines, and its internal logic delay grows linearly. This reduces the memory's read/write bandwidth and can in turn lower the pipeline's operating frequency.

In summary, in the course of making the present invention, the applicant found that the storage systems in existing reconfigurable data paths cannot meet the requirements of high sharing and high bandwidth at the same time, which limits the performance of reconfigurable arrays.

Summary of the invention

In view of the above problems and shortcomings of the prior art, an object of the present invention is to provide a data path with a multi-level cache structure for reconfigurable arrays, which can effectively increase data-processing parallelism in the reconfigurable data path and improve pipeline execution efficiency.

To achieve the above object, the present invention adopts the following technical solution: a data path with a multi-level cache structure in a reconfigurable system, comprising multiple coarse-grained operator mesh arrays, each composed of an array of homogeneous processing elements (PEs) and the interconnect units between them, where two adjacent rows of PEs are connected through a permutation-based network unit so that a variety of DFG topologies can be supported flexibly; an interface module in the data path; and a multilevel cache system in the data path.

Each coarse-grained operator mesh array is composed of an array of homogeneous processing elements (PEs) and the interconnect units between them. Two adjacent rows of PEs are connected through a permutation-based network unit, which allows a variety of DFG topologies to be supported flexibly. The data path contains n such arrays, where n ranges from 1 to 4.
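The row-to-row connection can be pictured with a small behavioural model. The sketch below is a minimal illustration under stated assumptions, not the patented circuit: it assumes every PE in a row applies the same operation, and that the permutation-based network between row r and row r + 1 is described by a list `perm` in which input j of the next row takes the output of PE `perm[j]`. The names `route` and `run_mesh` are hypothetical.

```python
from typing import Callable, List, Sequence

MAX_ROWS = 8  # the text allows m = 1..8 PE rows per mesh array


def route(outputs: Sequence[int], perm: Sequence[int]) -> List[int]:
    """Permutation network between two adjacent PE rows (assumed model).

    Input j of the next row receives the output of PE perm[j] of this row.
    """
    if sorted(perm) != list(range(len(outputs))):
        raise ValueError("perm must be a permutation of the row's PE indices")
    return [outputs[p] for p in perm]


def run_mesh(inputs: Sequence[int],
             row_ops: Sequence[Callable[[int], int]],
             perms: Sequence[Sequence[int]]) -> List[int]:
    """Push one data vector through the m rows of a single mesh array.

    row_ops[r] is the operation every PE in row r applies (a simplification);
    perms[r] is the network between row r and row r + 1.
    """
    if not 1 <= len(row_ops) <= MAX_ROWS:
        raise ValueError("a mesh array has 1 to 8 PE rows")
    if len(perms) != len(row_ops) - 1:
        raise ValueError("need exactly one network between each pair of adjacent rows")
    data = list(inputs)
    for r, op in enumerate(row_ops):
        data = [op(x) for x in data]          # row r computes
        if r < len(perms):
            data = route(data, perms[r])      # reorder lanes for row r + 1
    return data


# Two rows: add 1, then double; the network between them reverses the lanes.
print(run_mesh([1, 2, 3, 4], [lambda x: x + 1, lambda x: x * 2], [[3, 2, 1, 0]]))  # [10, 8, 6, 4]
```

In this model, changing only `perm` changes which PE feeds which without touching the PEs themselves, which is the kind of topological flexibility the permutation network is described as providing.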

The interface in the data path reads data from external memory and sends it into the multilevel cache system.

The multilevel cache system consists of four cache levels and performs data storage, data exchange and data synchronization in the data path.

The multilevel cache system comprises:

Level-1 cache: located between two connected PEs; used for data caching and data exchange within the pipeline inside a mesh array; implemented with registers.

Level-2 cache: located between connected mesh arrays; used for data caching and data exchange in pipelines formed by multiple mesh arrays, and also for caching and exchanging non-pipeline data between mesh arrays. It has 2n read ports and 2n write ports (where n is the number of mesh arrays), so that all mesh arrays can access it simultaneously; it is implemented with a register file.

Level-3 cache: used for caching and exchanging non-pipeline data between mesh arrays; it has one read port and one write port and is implemented with dual-port RAM.

Level-4 cache: buffers data for the data interface, synchronizes input data into the level-2 or level-3 cache, and reads output results from the level-2 or level-3 cache and sends them to the output interface; implemented with FIFOs.
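The four levels can be summarized as a small table parameterized by the number of mesh arrays n. This is an illustrative sketch only; the `CacheLevel` record and `cache_hierarchy` function are hypothetical names, and `None` marks port counts the text does not state.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CacheLevel:
    level: int
    implementation: str
    read_ports: Optional[int]   # None: port count not stated in the text
    write_ports: Optional[int]
    role: str


def cache_hierarchy(n: int) -> List[CacheLevel]:
    """Summarize the four cache levels for a data path with n mesh arrays."""
    if not 1 <= n <= 4:
        raise ValueError("the data path has 1 to 4 mesh arrays")
    return [
        CacheLevel(1, "registers", None, None,
                   "pipeline data between connected PEs inside one mesh array"),
        CacheLevel(2, "register file", 2 * n, 2 * n,
                   "pipelined and non-pipelined data between mesh arrays"),
        CacheLevel(3, "dual-port RAM", 1, 1,
                   "non-pipelined data between mesh arrays"),
        CacheLevel(4, "FIFO", None, None,
                   "buffering and synchronization with the data interface"),
    ]


for c in cache_hierarchy(4):
    print(c.level, c.implementation, c.read_ports, c.write_ports)
```

With the maximum of n = 4 mesh arrays, the level-2 register file needs 8 read ports and 8 write ports, which illustrates why the heavily shared level is kept small and register-based while bulk non-pipelined data goes to the single-ported-per-direction level-3 RAM.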

Here, "two connected PEs" means PEs in adjacent rows of a mesh array between which a signal line exists.

Pipeline inside a mesh array: a mesh array contains m rows of PEs (m ranges from 1 to 8). Under the control of the configuration, the PE rows and the interconnect logic between PEs can form an m-stage pipeline. The computation of each pipeline stage is performed by the PEs, and the connections between stages are provided by the interconnect logic between PEs.

Pipeline formed by multiple mesh arrays: with n mesh arrays (n ranges from 1 to 4), each containing m rows of PEs (m ranges from 1 to 8), an n*m-stage pipeline can be formed. The computation of each stage is performed by the PEs; within a mesh array, stages are connected by the interconnect logic between PEs, and between mesh arrays, stages are connected through address accesses to the level-2 cache.
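The stage count and the level that carries each stage-to-stage link follow directly from these rules. The sketch below assumes, for illustration, that stage s is placed in mesh array s // m (rows filled array by array); `pipeline_depth` and `inter_stage_levels` are hypothetical helper names.

```python
from typing import List


def pipeline_depth(n: int, m: int) -> int:
    """Stages in a pipeline built from n mesh arrays of m PE rows each."""
    if not (1 <= n <= 4 and 1 <= m <= 8):
        raise ValueError("text allows n in 1..4 and m in 1..8")
    return n * m


def inter_stage_levels(n: int, m: int) -> List[int]:
    """Cache level carrying data from stage s to stage s + 1.

    Stage s lives in mesh array s // m (assumed placement). Consecutive stages in
    the same array are joined by PE interconnect (level 1); a link that crosses
    an array boundary goes through the level-2 cache.
    """
    depth = pipeline_depth(n, m)
    return [1 if s // m == (s + 1) // m else 2 for s in range(depth - 1)]


print(pipeline_depth(2, 3), inter_stage_levels(2, 3))  # 6 [1, 1, 2, 1, 1]
```

The largest configuration allowed by the text, n = 4 and m = 8, gives a 32-stage pipeline with three links routed through the level-2 cache.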

The configuration code controls each cache level as follows. Level 1: the reconfigurable array's PE configuration code controls the interconnection between PEs and thereby also controls the level-1 cache. Levels 2 and 3: a cache controller in the mesh array reads and writes the level-2 and level-3 caches with clock-cycle accuracy according to the mesh array configuration code. Level 4: a cache controller in the mesh array performs read and write operations according to the mesh array configuration code, based on the empty/full flags of the FIFOs in the level-4 cache and the operation sequence of the mesh array.

When the multi-level cache reconfigurable control module controls the level-1 cache, the registers accessed are determined by the configuration code and remain unchanged after a single configuration. When controlling level-2 storage, registers at different addresses must be accessed, and a read/write controller performs the access operations; reads and writes of level-2 storage each complete within a single cycle, so control instructions that issue writes and reads need not wait for them to complete. When controlling level-3 storage, reads and writes may take multiple cycles to complete, so read-valid and write-valid flags are added to its control instructions, reducing the extra waiting time incurred when reading and writing the level-2 storage. When controlling level-4 storage, the control module must wait for the empty/full signals of the corresponding cache, which serve as the start and end points of mesh array computation.
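The difference between single-cycle level-2 accesses and multi-cycle level-3 accesses can be illustrated with a toy cycle model of a level-3 read path. It assumes a fixed read latency (2 cycles here, an arbitrary choice) and that reads may be issued back to back; `Level3ReadPort`, `issue_read` and `tick` are hypothetical names. The controller issues a request and moves on, and the consumer checks `read_valid` instead of the controller stalling.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class Level3ReadPort:
    """Toy model of the level-3 (dual-port RAM) read path with a read-valid flag.

    Assumption: a read issued in cycle t delivers its data with read_valid
    asserted in cycle t + latency; reads may be issued back to back.
    """

    def __init__(self, mem: List[int], latency: int = 2) -> None:
        if latency < 1:
            raise ValueError("level-3 accesses take at least one cycle")
        self.mem = mem
        self.latency = latency
        self._pending: Deque[Tuple[int, int]] = deque()  # (ready_cycle, address)

    def issue_read(self, cycle: int, addr: int) -> None:
        # The controller issues the request and continues; it does not stall.
        self._pending.append((cycle + self.latency, addr))

    def tick(self, cycle: int) -> Tuple[bool, Optional[int]]:
        """Return (read_valid, data) as seen by the consumer in this cycle."""
        if self._pending and self._pending[0][0] <= cycle:
            _, addr = self._pending.popleft()
            return True, self.mem[addr]
        return False, None


port = Level3ReadPort([10, 20, 30], latency=2)
port.issue_read(0, 1)   # request issued in cycle 0
port.issue_read(1, 2)   # request issued in cycle 1
for cycle in range(4):
    print(cycle, port.tick(cycle))
```

Cycles 0 and 1 report `(False, None)`, and the two results arrive in cycles 2 and 3. A level-2 access, by contrast, would need no flag because its data is available in the same cycle.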

The present invention also provides a configuration method for the data path with a multi-level cache structure in a reconfigurable array, comprising the following steps:

Step 1) Map the new task onto the reconfigurable array.

Step 2) If the new task contains a non-iterative pipeline operation: if the pipeline is executed by a single mesh array, map the pipeline's input data and output data to the level-2 cache and all other intermediate data to the level-1 cache; if the pipeline is formed by multiple mesh arrays, map the pipeline's input data, output data and any data that crosses between mesh arrays to the level-2 cache, and all other intermediate data to the level-1 cache.

Step 3) If the new task contains an iterative pipeline: if the pipeline is executed by a single mesh array, map the pipeline's input data, output data and the data that must be fed back for iterative processing to the level-2 cache, and all other intermediate data to the level-1 cache; if the pipeline is formed by multiple mesh arrays, map the pipeline's input data, output data, data crossing between mesh arrays and fed-back iteration data to the level-2 cache, and all other intermediate data to the level-1 cache.
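Steps 2) and 3) amount to a per-variable placement rule. The sketch below is a minimal reading of those steps, assuming each variable is described by its kind, the set of mesh arrays that produce or consume it, and whether it is fed back; `PipelineVar` and `map_pipeline` are hypothetical names. For a pipeline on a single mesh array, every variable touches only one array, so the cross-array rule never fires and the single-array case falls out automatically.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List

KINDS = ("input", "output", "intermediate")


@dataclass(frozen=True)
class PipelineVar:
    name: str
    kind: str                    # "input", "output" or "intermediate"
    arrays: FrozenSet[int]       # mesh arrays that produce or consume the variable
    feedback: bool = False       # fed back to an earlier stage on the next iteration


def map_pipeline(variables: List[PipelineVar], iterative: bool) -> Dict[str, int]:
    """Assign each pipeline variable to cache level 1 or 2 following steps 2) and 3)."""
    placement: Dict[str, int] = {}
    for v in variables:
        if v.kind not in KINDS:
            raise ValueError(f"unknown kind {v.kind!r} for {v.name}")
        to_level2 = (
            v.kind in ("input", "output")    # pipeline inputs and outputs
            or len(v.arrays) > 1             # data crossing mesh arrays
            or (iterative and v.feedback)    # feedback data, iterative pipelines only
        )
        placement[v.name] = 2 if to_level2 else 1
    return placement


task = [
    PipelineVar("x", "input", frozenset({0})),
    PipelineVar("t", "intermediate", frozenset({0})),
    PipelineVar("u", "intermediate", frozenset({0, 1})),
    PipelineVar("acc", "intermediate", frozenset({1}), feedback=True),
    PipelineVar("y", "output", frozenset({1})),
]
print(map_pipeline(task, iterative=False))  # {'x': 2, 't': 1, 'u': 2, 'acc': 1, 'y': 2}
print(map_pipeline(task, iterative=True))   # {'x': 2, 't': 1, 'u': 2, 'acc': 2, 'y': 2}
```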

Beneficial effects: the technical solution of the invention, a dynamic reconfigurable array data path with a multi-level cache and its control method, increases both the degree of memory sharing and the data bandwidth of the storage system during reconfigurable array operation. It changes the structure and configuration method of the storage system in conventional reconfigurable arrays and thereby improves their operating efficiency.

Brief description of the drawings

The accompanying drawings provide a further understanding of the present invention and form part of the specification; together with the embodiments, they serve to explain the invention and do not limit it. In the drawings:

Fig. 1 is a schematic diagram of the architecture of the dynamic reconfigurable array data path with a multi-level cache;

Fig. 2 shows the data structures in the multi-level cache of the data path;

Fig. 3 illustrates the configuration method for the data path with a multi-level cache structure in a reconfigurable array.

Embodiment

Preferred embodiments of the present invention are described below with reference to the drawings. It should be understood that the preferred embodiments described here only illustrate and explain the invention and do not limit it.

As shown in Fig. 1, the reconfigurable data path of this embodiment comprises: multiple coarse-grained operator mesh arrays, each composed of an array of homogeneous processing elements (PEs) and the interconnect units between them, with two adjacent rows of PEs connected through a permutation-based network unit; an interface in the data path, which reads operands from external memory and sends configuration information into the multilevel cache system; a level-1 cache in the data path, located between two connected PEs, used for data caching and data exchange within the pipeline inside a mesh array and implemented with registers; a level-2 cache in the data path, located between connected mesh arrays, used for data caching and data exchange in pipelines formed by multiple mesh arrays and also for caching and exchanging non-pipeline data between mesh arrays; a level-3 cache in the data path, used for caching and exchanging non-pipeline data between mesh arrays; and a level-4 cache in the data path, used for buffering data for the data interface and synchronizing input data into the level-2 or level-3 cache.

As shown in Fig. 2, the main cache data structures in the reconfigurable array data path are as follows. The level-1 cache stores pipeline data within a mesh array; this data resides in registers owned exclusively by each PE, and while the pipeline runs, the contents of the level-1 cache are updated every cycle. The level-2 cache stores data exchanged between mesh arrays; this data can serve either as pipeline data spanning multiple mesh arrays or as data for non-pipelined operation. As pipeline data it is updated every cycle; as non-pipeline data it is updated only after the upstream mesh array completes its operation. The level-3 cache stores non-pipeline data, so its read/write bandwidth is lower than that of the level-2 cache. The level-3 cache can also back up the entire contents of the level-2 cache as a single block; this happens when a pipeline that uses the level-2 cache is interrupted. The level-4 cache serves as the buffer for data exchange between the reconfigurable array and the outside. After external data is input to the reconfigurable array, it is stored in the level-4 cache and then transferred to the level-2 cache, and computation results in the level-2 cache can be read out through the level-4 cache over the external bus.
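The level-4 exchange path, gated by the FIFO empty/full flags, can be sketched as follows. This is a simplified model under stated assumptions: each level-4 buffer is a bounded FIFO, the whole mesh computation is collapsed into one `compute` callback, and level 2 is a dictionary with hypothetical slots `"in"` and `"out"`. `BoundedFifo` and `exchange_step` are illustrative names, not part of the source.

```python
from collections import deque
from typing import Callable, Deque, Dict


class BoundedFifo:
    """FIFO with empty/full flags, standing in for one level-4 buffer."""

    def __init__(self, depth: int) -> None:
        if depth < 1:
            raise ValueError("depth must be positive")
        self.depth = depth
        self._q: Deque[int] = deque()

    @property
    def empty(self) -> bool:
        return not self._q

    @property
    def full(self) -> bool:
        return len(self._q) >= self.depth

    def push(self, x: int) -> None:
        if self.full:
            raise OverflowError("FIFO full")
        self._q.append(x)

    def pop(self) -> int:
        if self.empty:
            raise IndexError("FIFO empty")
        return self._q.popleft()


def exchange_step(in_fifo: BoundedFifo, out_fifo: BoundedFifo,
                  level2: Dict[str, int], compute: Callable[[int], int]) -> bool:
    """One transfer: level 4 -> level 2 -> mesh -> level 2 -> level 4.

    Returns False (stall) when there is no input or no room for the result.
    """
    if in_fifo.empty or out_fifo.full:
        return False
    level2["in"] = in_fifo.pop()
    level2["out"] = compute(level2["in"])
    out_fifo.push(level2["out"])
    return True


inq, outq, l2 = BoundedFifo(4), BoundedFifo(2), {}
for x in (1, 2, 3):
    inq.push(x)
print([exchange_step(inq, outq, l2, lambda v: v * v) for _ in range(3)])  # [True, True, False]
print(outq.pop())  # the external bus drains one result: 1
```

The third step stalls because the output FIFO is full; once the external bus drains a result, computation can proceed again, which mirrors the empty/full signals acting as the start and end points of mesh array computation.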

As shown in Fig. 3, the cache configuration method for the data path with a multi-level cache is as follows. For pipelined execution, the data of a pipeline link within one mesh array is mapped to the level-1 cache of the PEs on that link, whereas the data of a pipeline link that spans mesh arrays is stored in the level-2 cache. When the pipeline is interrupted, the data in the level-2 cache is saved to the level-3 cache; when the pipeline resumes, the data in the level-3 cache is restored to the level-2 cache. Because the level-2 cache serves as the input of the mesh arrays, the level-1 cached data in the PEs can then be recovered according to the mesh array configuration and the number of pipeline stages in the mesh array. For non-pipelined execution, data that crosses mesh arrays is placed in the level-3 cache; after one mesh array finishes executing, the next mesh array reads the variable from the level-3 cache and continues the computation.
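The interrupt, resume and non-pipelined hand-off behaviour can be sketched with plain dictionaries standing in for the cache levels. This is a toy model, assuming a dedicated level-3 region (`l2_backup`) holds the level-2 snapshot and that `refill_level1` is a hypothetical callback standing in for replaying the configured pipeline stages to rebuild the PE registers; all names are illustrative.

```python
from typing import Callable, Dict

Store = Dict[str, int]


class MultiLevelState:
    """Toy model of the level-1/2/3 contents around a pipeline interrupt."""

    def __init__(self) -> None:
        self.level1: Store = {}      # PE-private pipeline registers
        self.level2: Store = {}      # data shared between mesh arrays
        self.level3: Store = {}      # non-pipelined variables
        self.l2_backup: Store = {}   # assumed level-3 region holding the level-2 snapshot

    def interrupt(self) -> None:
        # Back up the whole level-2 contents into level 3 as one block.
        self.l2_backup = dict(self.level2)

    def resume(self, refill_level1: Callable[[Store], Store]) -> None:
        # Restore level 2, then rebuild level-1 registers from it, since level 2
        # is the input to the mesh arrays.
        self.level2 = dict(self.l2_backup)
        self.level1 = refill_level1(self.level2)

    def hand_off(self, name: str, producer: Callable[[], int],
                 consumer: Callable[[int], int]) -> int:
        # Non-pipelined mode: one mesh array writes a variable to level 3 when it
        # finishes; the next mesh array reads it and continues.
        self.level3[name] = producer()
        return consumer(self.level3[name])


s = MultiLevelState()
s.level2 = {"a": 1, "b": 2}
s.level1 = {"r0": 3}
s.interrupt()
s.level2, s.level1 = {"a": 99}, {}          # another task overwrites levels 1 and 2
s.resume(lambda l2: {"r0": l2["a"] + l2["b"]})
print(s.level2, s.level1)                   # {'a': 1, 'b': 2} {'r0': 3}
print(s.hand_off("v", lambda: 7, lambda x: x * 3))  # 21
```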

Claims

1. A dynamic reconfigurable array data path with a multi-level cache, characterized by comprising: a multilevel cache system composed of four cache levels, a multi-level cache reconfigurable control module, coarse-grained operator mesh arrays supporting the multi-level cache, and an interface module in the data path;

the multilevel cache system performs, according to the functions required by the configuration code, data storage, data exchange and data synchronization in the data path;

the interface module in the data path reads data from external memory and sends it into the multilevel cache system;

the reconfigurable control module, under the control of the configuration code, controls the mapping between the variables read and written by the mesh arrays and the addresses of the multi-level cache;

the coarse-grained operator mesh arrays form the computation links used and, under the control of the configuration code, control the reading and writing of variables according to the definition of the data flow graph.

2. The dynamic reconfigurable array data path with a multi-level cache according to claim 1, characterized in that the multilevel cache system comprises:

a level-1 cache, located between two connected PEs, used for data caching and data exchange within the pipeline inside a mesh array;

a level-2 cache, composed of memory that carries pipeline data exchange between mesh arrays, the memory being shared by multiple mesh arrays;

a level-3 cache, composed of a cache that carries non-pipeline data exchange between mesh arrays, used for caching and exchanging non-pipeline data between mesh arrays;

a level-4 cache, composed of memory shared between the reconfigurable data path and the external interface, used for buffering data for the data interface, synchronizing input data into the level-2 or level-3 cache, and reading output results from the level-2 or level-3 cache and sending them to the output interface.

3. The dynamic reconfigurable array data path with a multi-level cache according to claim 2, characterized in that the level-1 cache is implemented with registers, the level-2 cache with a register file, the level-3 cache with dual-port RAM, and the level-4 cache with FIFOs.

4. The dynamic reconfigurable array data path with a multi-level cache according to claim 2, characterized in that: the level-1 cache stores pipeline data within a mesh array, this data residing in registers owned exclusively by each PE, and while the pipeline runs the data in the level-1 cache is updated every cycle; the level-2 cache stores data exchanged between mesh arrays, which can serve either as pipeline data spanning multiple mesh arrays or as data for non-pipelined operation, being updated every cycle when used as pipeline data and updated only after the upstream mesh array completes its operation when used as non-pipeline data; the level-3 cache stores non-pipeline data, and when a pipeline that uses the level-2 cache is interrupted, the level-3 cache backs up the level-2 cache data as a single block; the level-4 cache serves as the buffer for data exchange between the reconfigurable array and the outside, such that after external data is input to the reconfigurable array it is stored in the level-4 cache and transferred to the level-2 cache, and computation results in the level-2 cache are read out through the level-4 cache over the external bus.

5. The dynamic reconfigurable array data path with a multi-level cache according to claim 1, characterized in that the functions of the configuration code are: for the level-1 cache, the reconfigurable array's PE configuration code controls the interconnection between PEs and thereby controls the level-1 cache; for the level-2 and level-3 caches, a cache controller in the mesh array reads and writes them with clock-cycle accuracy according to the mesh array configuration code; for the level-4 cache, a cache controller in the mesh array performs read and write operations according to the mesh array configuration code, based on the empty/full flags of the FIFOs in the level-4 cache and the operation sequence of the mesh array.

6. The dynamic reconfigurable array data path with a multi-level cache according to claim 1, characterized in that: when the multi-level cache reconfigurable control module controls the level-1 cache, the registers accessed are determined by the configuration code and remain unchanged after a single configuration; when controlling level-2 storage, registers at different addresses must be accessed, and a read/write controller performs the access operations, reads and writes of level-2 storage each completing within a single cycle, so that control instructions issuing writes and reads need not wait for them to complete; when controlling level-3 storage, since reads and writes may take multiple cycles to complete, read-valid and write-valid flags are added to its control instructions, reducing the extra waiting time incurred when reading and writing the level-2 storage; when controlling level-4 storage, the control module must wait for the empty/full signals of the corresponding cache, which serve as the start and end points of mesh array computation.

7. The dynamic reconfigurable array data path with a multi-level cache according to claim 1, characterized in that the coarse-grained operator mesh arrays supporting the multi-level cache are composed of an array of homogeneous processing elements and the interconnect units between them; each processing element consists mainly of an ALU and registers that store temporary data, and independently executes the computation function specified by the configuration unit; the basic granularity of each processing element is 8 bits, and four adjacent PEs in a row form a 32-bit-wide reconfigurable cell group that supports 32-bit arithmetic operations; adjacent rows of PEs are connected through a permutation-based network unit.

8. A control method for the data path with a multi-level cache structure in a reconfigurable array according to any one of claims 1 to 7, characterized by comprising the following steps:

Step 1) map the new task onto the reconfigurable array;

Step 2) if the new task contains a non-iterative pipeline operation: if the pipeline is executed by a single mesh array, map the input data and output data of the pipeline to the level-2 cache and all other intermediate data to the level-1 cache; if the pipeline is formed by multiple mesh arrays, map the input data and output data of the pipeline and the data crossing between mesh arrays to the level-2 cache, and all other intermediate data to the level-1 cache;

Step 3) if the new task contains an iterative pipeline: if the pipeline is executed by a single mesh array, map the input data and output data of the pipeline and the data requiring feedback iterative processing to the level-2 cache, and all other intermediate data to the level-1 cache; if the pipeline is formed by multiple mesh arrays, map the input data, output data, data crossing between mesh arrays and fed-back iteration data to the level-2 cache, and all other intermediate data to the level-1 cache.
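Claim 7 states that four adjacent 8-bit PEs in a row form a 32-bit reconfigurable cell group, but does not say how the PEs cooperate. The sketch below assumes, for illustration only, a ripple carry passed from each PE to its neighbour for addition; `pe_add8` and `group_add32` are hypothetical names, and the final carry is dropped so results wrap modulo 2**32.

```python
from typing import Tuple

MASK8 = 0xFF
MASK32 = 0xFFFFFFFF


def pe_add8(a: int, b: int, carry_in: int) -> Tuple[int, int]:
    """One 8-bit PE: returns (8-bit sum, carry_out)."""
    total = a + b + carry_in
    return total & MASK8, total >> 8


def group_add32(x: int, y: int) -> int:
    """Four adjacent 8-bit PEs chained by their carries act as a 32-bit adder.

    PE i handles bits 8*i .. 8*i + 7 (assumed ripple-carry chaining).
    """
    if not (0 <= x <= MASK32 and 0 <= y <= MASK32):
        raise ValueError("operands must be unsigned 32-bit values")
    result, carry = 0, 0
    for i in range(4):
        a = (x >> (8 * i)) & MASK8
        b = (y >> (8 * i)) & MASK8
        s, carry = pe_add8(a, b, carry)
        result |= s << (8 * i)
    return result


print(hex(group_add32(0x000000FF, 0x00000001)))  # 0x100
print(hex(group_add32(0xFFFFFFFF, 0x00000001)))  # 0x0
```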