WO2022199459A1 - Reconfigurable processor and configuration method - Google Patents

Reconfigurable processor and configuration method

Info

Publication number
WO2022199459A1
Authority
WO
WIPO (PCT)
Prior art keywords
computing
array
calculation
data
stage
Prior art date
Application number
PCT/CN2022/081526
Other languages
English (en)
French (fr)
Inventor
赵旺
肖刚军
赖钦伟
Original Assignee
珠海一微半导体股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 珠海一微半导体股份有限公司
Priority to EP22774120.4A (EP4283481A1)
Publication of WO2022199459A1

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored program computers
    • G06F15/78Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7867Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
    • G06F15/7871Reconfiguration support, e.g. configuration loading, configuration switching, or hardware OS
    • G06F15/7878Reconfiguration support, e.g. configuration loading, configuration switching, or hardware OS for pipeline reconfiguration

Definitions

  • the present invention relates to the technical field of reconfigurable computing, and in particular, to a reconfigurable processor and a configuration method.
  • Reconfigurable computing means that a computing system can use reusable hardware resources to flexibly reconfigure its own computing path according to different application requirements, so as to provide a computing structure that matches each specific application requirement.
  • A coarse-grained reconfigurable processor combines the advantages of general-purpose and special-purpose computing, offering a good compromise between programming flexibility and computing energy efficiency.
  • The efficiency and flexibility of the reconfigurable array therefore have a great impact on the performance of the reconfigurable system.
  • Existing reconfigurable array structures of coarse-grained reconfigurable processors rarely take internal pipeline properties into account, so complex operations become a bottleneck in the computing speed of the reconfigurable array, resulting in low clock frequency and low computing efficiency.
  • Reconfigurable arrays include fully functional computing units, such as adder-subtractors, multipliers, dividers, square-root units, trigonometric function calculators, and so on.
  • Most of these computing units adopt a pipelined design. Because the operations they implement differ in complexity, the pipeline depths of different computing units often differ, which makes it difficult for the reconfigurable array to achieve overall pipelined data processing and limits the pipeline computing performance of the reconfigurable processor.
  • The technical solution of the present invention discloses a reconfigurable processor with adaptively configured pipeline depth, which applies multi-stage pipeline control to the reconfigurable array.
  • The specific technical solution includes: a reconfigurable processor, wherein the reconfigurable processor includes a reconfiguration unit and a reconfigurable array; the reconfiguration unit provides reconfiguration information for reconfiguring the computing structure in the reconfigurable array according to an algorithm matched to the current application scenario;
  • The reconfigurable array includes at least two stages of computing arrays; according to the reconfiguration information provided by the reconfiguration unit, the reconfigurable array connects adjacent stages of computing arrays into a data path pipeline structure that meets the computing requirements of the algorithm matching the current application scenario. The pipeline depths of the different computing modules connected into the data path pipeline structure are equal, so that these computing modules output data synchronously. At least one computing module is arranged inside each stage of the computing array, and each column of the reconfigurable array holds exactly one computing array.
  • The computing arrays exist in the reconfigurable array in a cascaded structure. Each stage of the data path pipeline structure corresponds to one stage of the computing array; within each stage, a computing module connected to the data path is regarded as accessing the corresponding stage of the pipeline. The pipeline depth is the time it takes data to flow through the corresponding data path of the data path pipeline structure.
  • Based on adjacently interconnected computing modules that execute computing instructions, the reconfigurable processor adjusts, through reconfiguration, the pipeline depth of data passing through each stage of the computing array to be the same while satisfying the requirements of the algorithm, forming a data path pipeline structure that meets the computing requirements. The reconfigurable processor can therefore configure an appropriate pipeline depth for different algorithms, which improves throughput, gives full play to the computing performance of the reconfigurable processor, and reduces the hardware resources that prior-art pipeline designs must configure.
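The depth-equalization idea above can be sketched in a few lines of Python (a hypothetical illustration, not the patent's implementation): within one stage of the computing array, every computing module is padded up to the depth of the slowest module, so all modules in the stage output data on the same cycle.

```python
# Hypothetical sketch: balancing pipeline depths within one stage of the
# computing array. Each computing module has an intrinsic latency in cycles;
# the compensation delay pads every module up to the stage's maximum depth.

def compensation_delays(module_depths):
    """Extra register stages each module needs so that every module in the
    stage matches the deepest (slowest) module."""
    max_depth = max(module_depths)          # maximum pipeline depth allowed by this stage
    return [max_depth - d for d in module_depths]

# Example: a stage holding an adder (2 cycles), a multiplier (4 cycles),
# and a divider (8 cycles). All three are padded to 8 cycles.
print(compensation_delays([2, 4, 8]))       # [6, 4, 0]
```

With the padding applied, every data path through the stage takes the same number of cycles, which is what allows the stages to be chained into one overall pipeline.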
  • The reconfigurable processor further includes an input FIFO group and an output FIFO group. The outputs of the input FIFO group are connected to the inputs of the reconfigurable array according to the reconfiguration information. The input FIFO group buffers external data entering the reconfigurable processor, and the output FIFO group buffers data the reconfigurable processor outputs to the outside, matching the data exchange and storage requirements between the reconfigurable processor and external system components for the configured algorithm.
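As a hypothetical illustration of the FIFO groups' buffering role (the doubling operation is a stand-in for the array's actual computation, and all names here are illustrative):

```python
# Hypothetical sketch: the input/output FIFO groups buffer data between the
# reconfigurable array and external components. collections.deque provides
# the first-in-first-out behaviour.
from collections import deque

input_fifo = deque()        # external data buffered on its way into the array
output_fifo = deque()       # results buffered on their way out

for sample in [1, 2, 3]:
    input_fifo.append(sample)                       # external producer writes
while input_fifo:
    output_fifo.append(input_fifo.popleft() * 2)    # array consumes in order (toy op)
print(list(output_fifo))    # [2, 4, 6]
```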
  • The method of connecting adjacent stages of computing arrays into a data path pipeline structure that satisfies the computing requirements of the algorithm includes: two computing arrays in non-adjacent stages are not connected across stages through data paths, so non-adjacent stages are never directly connected in the data path pipeline structure; and there is no data path between different computing modules within the same stage. The inputs of the computing modules in the first-stage computing array serve as the inputs of the reconfigurable array and are connected, based on the reconfiguration information, to the matching outputs of the input FIFO group.
  • The input of a computing module in the current stage is connected, based on the reconfiguration information, to the output of a matching computing module in a row of the adjacent previous-stage computing array, when the current stage is not the first stage. The output of a computing module in the current stage is connected, based on the reconfiguration information, to the input of a matching computing module in a row of the adjacent next-stage computing array, when the current stage is not the last stage. The outputs of the computing modules in the last-stage computing array serve as the outputs of the reconfigurable array and are connected, based on the reconfiguration information, to the matching inputs of the output FIFO group. The level of the adjacent previous-stage computing array is one less than that of the current stage, and the level of the adjacent next-stage computing array is one greater; a data path is a path for data transmission.
  • Serially connecting the computing modules of adjacent stages of the reconfigurable array into the data path pipeline structure according to the reconfiguration information reduces the complexity of the interconnection network paths while also realizing multi-stage pipeline control simply and efficiently.
  • The reconfigurable array can also be connected into multiple data path pipeline structures according to the reconfiguration information, meeting the application requirement of executing multiple algorithms synchronously.
  • The reconfiguration information of the computing module provided by the reconfiguration unit includes first configuration information, second configuration information, and third configuration information.
  • The computing module includes a calculation control unit, a compensation unit, a first interconnection unit, and a second interconnection unit.
  • The first interconnection unit is configured to connect itself and the calculation control unit into the current stage of the data path pipeline structure according to the first configuration information. When the current pipeline stage corresponds to the first-stage computing array, the first interconnection unit inputs the data to be calculated, output by the matched output of the input FIFO group, into the calculation control unit; when the current pipeline stage does not correspond to the first-stage computing array, it inputs the calculation result output by the matching computing module in the adjacent previous-stage computing array into the calculation control unit.
  • According to the second configuration information, the calculation control unit either connects as a data direct path, so that the input data passes straight through to the compensation unit without any calculation, or connects as a data calculation path, so that the input data is calculated before being transmitted. The data path therefore includes a data direct path and a data calculation path.
  • The compensation unit selects the corresponding delay difference according to the third configuration information, compensating the pipeline depth of the computing module up to the maximum pipeline depth allowed by the current-stage computing array.
  • The second interconnection unit is configured to connect itself and the compensation unit into the current stage of the data path pipeline structure according to the first configuration information. When the current pipeline stage corresponds to the last-stage computing array, the second interconnection unit transmits the delay-compensated data from the compensation unit to the matching output FIFO in the output FIFO group; when it does not correspond to the last-stage computing array, it transmits that data to the matching computing module in the adjacent next-stage computing array.
  • Within a computing module, the input of the first interconnection unit is the input of the computing module. The computing module connects to the adjacent previous-stage computing array through the first interconnection unit and to the adjacent next-stage computing array through the second interconnection unit, and the calculation control unit and the compensation unit are connected between the two interconnection units to form a pipeline based on the reconfiguration information. The computing modules are thus set into a reconfigurable interconnection logic pattern by adjacent columns, and the hardware structure is simple. After the computing modules that actually perform the computing function of the current-stage computing array are determined, the maximum pipeline depth of that stage is determined; the difference between this maximum and the pipeline depth of each calculation control unit in the stage is then used to compensate the corresponding calculation control unit, so that data passes through the different computing modules of each stage with the same pipeline depth. This solves the low clock frequency and low computing efficiency of prior-art coarse-grained reconfigurable processors.
  • The third configuration information is a gating signal. After the reconfiguration unit determines the calculation control unit that consumes the greatest pipeline depth in the current stage of the data path pipeline structure, the gating signal selects, in every computing module of that stage, the matching register path inside the compensation unit that generates the required delay difference; the output data of the calculation control unit is then transmitted along that register path until it leaves the computing module. The pipeline depth of each computing module of the current stage is thereby delay-compensated up to the maximum pipeline depth allowed by the current-stage computing array. The compensation unit is implemented with selectors and registers. The maximum pipeline depth allowed by the current-stage computing array is the pipeline depth of the calculation control unit through which data takes the longest to flow along its data path. Pipeline compensation is thus applied to every calculation control unit that does not reach the pipeline depth of the current stage, enabling an overall data-processing pipeline for the reconfigurable array.
  • The register path for compensating the delay difference consists of a preset number of registers, triggered by the third configuration information. The selector inside the compensation unit connects the calculation control unit to the register path that generates the appropriate delay difference, so that data passes through the different computing modules of the same-stage computing array with the same pipeline depth.
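A selectable register path is essentially a shift register whose length is chosen by the gating signal. The following is a hypothetical behavioural sketch of such a compensation unit (class and method names are illustrative, not from the patent):

```python
# Hypothetical sketch of the compensation unit: a register chain whose
# effective length (the delay difference) is selected by the gating signal.

class CompensationUnit:
    def __init__(self, max_delay):
        self.regs = [None] * max_delay      # preset number of registers
        self.delay = 0                      # currently gated register path length

    def configure(self, delay):
        """Gate a register path producing `delay` cycles of delay."""
        assert 0 <= delay <= len(self.regs)
        self.delay = delay

    def clock(self, value):
        """Advance one clock cycle; return the value delayed by `delay` cycles."""
        if self.delay == 0:
            return value                    # selector bypasses the register path
        out = self.regs[self.delay - 1]
        self.regs[1:self.delay] = self.regs[0:self.delay - 1]   # shift the chain
        self.regs[0] = value
        return out

unit = CompensationUnit(max_delay=4)
unit.configure(2)
outs = [unit.clock(v) for v in [10, 20, 30, 40]]
print(outs)  # [None, None, 10, 20] -- data emerges two cycles later
```

A module whose calculation control unit is already the deepest in its stage would be configured with delay 0, i.e. the selector bypasses the register chain entirely.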
  • The first configuration information includes: the access address information and timing information required to connect the first interconnection unit in the first-stage computing array and the matching input FIFO in the input FIFO group into the data path pipeline structure; the access address information and timing information required to connect the first interconnection unit in the current-stage computing array and the second interconnection unit in the adjacent previous-stage computing array into the data path pipeline structure; the access address information and timing information required to connect the second interconnection unit in the current-stage computing array and the matching first interconnection unit in the adjacent next-stage computing array into the data path pipeline structure; and the access address information and timing information required to connect the second interconnection unit in the last-stage computing array and the matching output FIFO in the output FIFO group into the data path pipeline structure. Both the first and second interconnection units support forming topological interconnection structures between the computing modules in the reconfigurable array or in the data path pipeline structure, satisfying the complete function of the algorithm.
  • This technical solution sends data to the corresponding inputs of the first-stage computing array of the multi-stage pipeline, so that after being processed by the computing arrays of the multi-stage pipeline the data is sent to the corresponding output FIFO. Under different computing application requirements, when the reconfiguration unit switches from providing one set of reconfiguration information to another, complete interconnection logic is guaranteed to form between adjacent stages of computing arrays.
  • The second configuration information is also a gating signal, which controls whether the data transmitted by the first interconnection unit is gated to the data direct path or to the data calculation path, meeting the computing requirements of the algorithm in each stage of the data path pipeline structure. The calculation control unit is implemented with a data selector and an arithmetic logic circuit, which determine whether the calculation control unit currently performs a calculation function.
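The gating between the two paths can be pictured as a simple selector in front of an ALU. This is a hypothetical sketch (the operation table and function names are illustrative):

```python
# Hypothetical sketch of the calculation control unit's gating (second
# configuration information): a data selector routes the input either
# straight through (data direct path) or through an arithmetic operation
# (data calculation path).

OPS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}

def calc_control(a, b, gate, op=None):
    """gate=False: pass-through; gate=True: perform the selected calculation."""
    if not gate:
        return a                    # data passes straight to the compensation unit
    return OPS[op](a, b)

print(calc_control(3, 5, gate=False))           # 3
print(calc_control(3, 5, gate=True, op="mul"))  # 15
```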
  • The calculation types performed by the calculation control unit include addition and subtraction, multiplication, division, square root, and trigonometric calculation. The types of calculation control units within each stage of the computing array may be all the same or not all the same, and likewise between two adjacent stages of computing arrays. The type and number of calculation control units in each stage can be adjusted for specific application fields and performance requirements.
  • By changing the calculation methods (addition, subtraction, multiplication, division) of the computing modules, the interconnected calculation control units at all stages of the reconfigurable array suit a variety of algorithms and can be flexibly configured according to the needs of different algorithms. This replaces the traditional fixed array mode in which one algorithm is paired with one array, greatly reducing computational cost and improving efficiency.
  • In one embodiment, the reconfigurable array is provided with a six-stage computing array, each stage containing four rows of computing modules; under the configuration of the reconfiguration information provided by the reconfiguration unit, the six stages are connected into a six-stage pipeline that forms the data path pipeline structure and supports computing operations of a specific granularity. Only one computing module is set in each row of a stage. The inputs of the four computing modules in the first stage are connected, based on the reconfiguration information, to the outputs of four different input FIFOs in the input FIFO group, and the output of one computing module in the sixth stage is connected, based on the reconfiguration information, to the input of one output FIFO in the output FIFO group. With four computing units arranged in rows in each of the six columns, the reconfigurable array contains 24 computing units in total, forming a 6*4 pipeline array within which a data path pipeline structure conforming to the reconfiguration information is constructed.
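The topology of this embodiment can be modelled in a few lines (a hypothetical sketch of the structure only, not the computation; names are illustrative):

```python
# Hypothetical sketch of the embodiment's 6-stage x 4-row reconfigurable
# array: 24 computing units, with legal data paths running only between
# adjacent stages (any row to any row).

STAGES, ROWS = 6, 4
array = [[("stage", s, "row", r) for r in range(1, ROWS + 1)]
         for s in range(1, STAGES + 1)]

# Candidate adjacent-stage links: stage s row r1 -> stage s+1 row r2.
edges = [((s, r1), (s + 1, r2))
         for s in range(1, STAGES)
         for r1 in range(1, ROWS + 1)
         for r2 in range(1, ROWS + 1)]

print(sum(len(stage) for stage in array))   # 24 computing units
print(len(edges))                           # 80 candidate adjacent-stage links
```

The reconfiguration information would then select a subset of these candidate links (plus the four input-FIFO and one output-FIFO connections) to form one concrete data path pipeline.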
  • A configuration method based on the reconfigurable processor includes: according to the computing requirements of the algorithm matching the current application scenario, connecting adjacent stages of computing arrays of the reconfigurable array so that data passes through them with equal pipeline depth; the computing modules connected into the data path form the current stage of the data path pipeline structure. The pipeline depth is the time it takes data to flow through the corresponding data path of the data path pipeline structure.
  • The configuration method, based on adjacently interconnected computing modules that execute computing instructions, adjusts through reconfiguration the pipeline depth of data passing through each stage of the computing array to be the same while meeting the computing requirements of the algorithm. The resulting data path pipeline structure allows the reconfigurable processor to configure an appropriate pipeline depth for different algorithms and, on this basis, realizes an overall pipeline for the data processing operations of the reconfigurable array, improving throughput and giving full play to the computing performance of the reconfigurable processor.
  • The configuration method further includes: configuring the reconfigurable array to receive the data to be calculated from the input FIFO group and transmit it into the data path pipeline structure, while also configuring the reconfigurable array to output the calculation result of the last-stage computing array of the data path pipeline structure to the output FIFO group. This buffers external data entering the reconfigurable processor and buffers data the reconfigurable processor outputs to the outside, matching the data exchange and storage requirements between the reconfigurable processor and external system components for the configured algorithm.
  • The specific configuration method for connecting into the data path pipeline structure includes, within a computing module of the current-stage computing array: judge whether the current stage is detected as corresponding to the first stage of the data path pipeline structure. If so, connect the first interconnection unit and the calculation control unit into the first stage of the pipeline and configure the first interconnection unit to input the data to be calculated, output by the matching output of the input FIFO group, into the calculation control unit; otherwise, connect them into the current stage of the pipeline and configure the first interconnection unit to input the calculation result output by the matching computing module in the adjacent previous-stage computing array into the calculation control unit. Judge whether the current stage is detected as corresponding to the last stage of the pipeline. If so, connect the second interconnection unit and the compensation unit into the last stage of the data path pipeline structure and configure the second interconnection unit to transmit the delay-compensated data from the compensation unit to the matching output FIFO in the output FIFO group; otherwise, connect them into the current stage and configure the second interconnection unit to transmit that data to the matching computing module in the adjacent next-stage computing array. Judge whether the calculation control unit detects the calculation gating signal. If so, configure the data input to the calculation control unit to be calculated before being output to the compensation unit; otherwise, configure it to pass straight through to the compensation unit without any calculation. Then configure the compensation unit to select the corresponding delay difference so that the pipeline depth is compensated up to the maximum pipeline depth allowed by the current-stage computing array, where that maximum is the pipeline depth of the calculation control unit through which data takes the longest to flow in the current stage.
  • The computing module includes a calculation control unit, a compensation unit, a first interconnection unit, and a second interconnection unit. In each computing module of each stage, the input of the first interconnection unit is the input of the computing module; the output of the first interconnection unit connects to the input of the calculation control unit, the output of the calculation control unit connects to the input of the compensation unit, the output of the compensation unit connects to the input of the second interconnection unit, and the output of the second interconnection unit is the output of the computing module.
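The per-module decisions described above reduce to a small decision table. The following is a hypothetical sketch of that flow (field names and values are illustrative, not from the patent):

```python
# Hypothetical sketch of the per-module configuration flow: wire the first
# interconnection unit to an input FIFO or the previous stage, wire the
# second interconnection unit to an output FIFO or the next stage, gate the
# calculation path, then pad with compensation delay.

def configure_module(stage, last_stage, do_calc, own_depth, stage_max_depth):
    cfg = {}
    cfg["input_from"] = "input_fifo" if stage == 1 else "prev_stage_module"
    cfg["output_to"] = "output_fifo" if stage == last_stage else "next_stage_module"
    cfg["path"] = "calculate" if do_calc else "pass_through"
    cfg["pad_cycles"] = stage_max_depth - own_depth   # compensation delay
    return cfg

# A first-stage module computing with depth 4 in a stage whose deepest
# module takes 8 cycles:
print(configure_module(stage=1, last_stage=6, do_calc=True,
                       own_depth=4, stage_max_depth=8))
```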
  • This technical solution determines the computing modules that actually perform the computing function of the current-stage computing array and the maximum pipeline depth of that stage, then uses the difference between this maximum and the pipeline depth of each calculation control unit in the stage to apply pipeline compensation to the corresponding calculation control unit, so that data passes through the different computing modules of each stage with the same pipeline depth. This solves the low clock frequency and low computing efficiency of the coarse-grained reconfigurable processor (a type of reconfigurable processor).
  • In each stage, the calculation control unit and the compensation unit are connected to form one stage of the data path pipeline structure, thereby realizing multi-stage pipeline control.
  • In the reconfigurable array, two computing arrays in non-adjacent stages are not connected across stages through data paths, so they are never directly connected in the data path pipeline structure; and there is no data path between different computing modules within the same-stage computing array, where a data path is a data transmission path.
  • This not only ensures the flexibility of the reconfigurable array but also reduces the complexity of the interconnection network paths.
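The interconnect rule above admits a one-line check. As a hypothetical sketch: a connection from a module in stage `src_stage` to one in stage `dst_stage` is legal only when the destination is the immediately following stage.

```python
# Hypothetical sketch of the interconnect rule: no cross-stage connections,
# no data paths within a stage; only adjacent-stage links are legal.

def legal_connection(src_stage, dst_stage):
    return dst_stage == src_stage + 1

assert legal_connection(2, 3)
assert not legal_connection(2, 4)   # no cross-stage connection
assert not legal_connection(3, 3)   # no data path within the same stage
```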
  • FIG. 1 is a schematic structural diagram of a reconfigurable processor disclosed in an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of a reconfigurable array with a 6-stage pipeline, composed of a six-stage computing array (four rows of computing modules per stage) in an embodiment of the reconfigurable processor of the present invention. In FIG. 2, the reconfigurable array has 4 input terminals connected to 4 different input FIFOs, and one output terminal connected to one output FIFO.
  • FIG. 3 shows, in an embodiment of the present invention, calculation module (a-1)b1 in row b1 of the (a-1)-th stage computing array, calculation module ab2 in row b2 of the a-th stage computing array, and the calculation module of (a-1)
  • FIG. 4 is a flowchart of a configuration method based on a reconfigurable processor disclosed by another embodiment of the present invention.
  • Each unit module involved in the following embodiments is a logic circuit. A logic circuit may be a single physical unit, a state machine composed of multiple logic devices following a certain read/write sequence and signal logic, a part of a physical unit, or a combination of multiple physical units. The embodiments of the present invention do not introduce units that are not closely related to solving the technical problems posed by the present invention, but this does not mean that no other units exist in the embodiments.
  • The present invention discloses a reconfigurable processor. The reconfigurable processor includes a reconfiguration unit and a reconfigurable array.
  • The reconfiguration unit provides reconfiguration information for the computing modules according to the algorithm matching the current application scenario; the reconfiguration information is configured according to the requirements of the algorithm and used to reconfigure the interior of the reconfigurable array. The reconfigurable processor accepts external reconfiguration information (including the combinational and timing parameters of the logic circuits) that changes the interconnection logic of the computing modules for the current data processing application scenario; it then changes the physical architecture formed by connecting multiple computing modules based on that reconfiguration information and outputs the calculation result, which is equivalent to software calling an algorithm (an algorithm library function) in the current data processing application scenario to compute the corresponding result.
  • The reconfigurable array can thus be connected into a matching computing structure for different application requirements, so that it is not only designed around several algorithms of a specific field but also accepts reconfiguration information for algorithms transplanted from other fields, improving flexibility.
  • the reconfigurable array includes at least two-level computing arrays, and within a reconfigurable array, at least two computing arrays are arranged hierarchically, that is, there are at least two computing arrays connected in cascade, or it is understood that there are at least two adjacent computing arrays.
  • Array or it is understood that there are at least two adjacent stages of computing arrays, wherein, only one computing array is set on each column of a reconfigurable array, and one computing array on each column is a first-level computing array; in this reconfigurable array
  • the number of computing arrays in the reconfigurable array is preset, and these computing arrays exist in the reconfigurable array in a cascaded structure.
  • in the following description, which corresponds to the pipeline, one stage of computing array is used throughout to describe one column of computing arrays; this makes it convenient for the columns to be connected in hardware to form the interconnection structure of the reconfigurable array.
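As a non-authoritative illustration of the cascade layout described above (the class and function names here are invented for the sketch, not taken from the patent), the stage/row indexing of modules such as 1_1 ... m_nm can be modeled as:

```python
class ComputingModule:
    """One computing module, identified by its stage (column) and row."""
    def __init__(self, stage, row):
        self.stage = stage  # column (stage) index, 1-based
        self.row = row      # row index within the stage, 1-based

    def __repr__(self):
        return f"module {self.stage}_{self.row}"


def build_array(rows_per_stage):
    """rows_per_stage[k] is the number of modules in stage k+1;
    the list order encodes the cascade (column) order."""
    return [[ComputingModule(s + 1, r + 1) for r in range(n)]
            for s, n in enumerate(rows_per_stage)]


array = build_array([3, 2, 4])  # three cascaded stages: n1=3, n2=2, n3=4
print(len(array))               # number of cascaded stages: 3
print(array[0][2])              # module 1_3: stage 1, row 3
```

Each inner list is one column of the array; the sketch intentionally stores no links between modules of the same column, mirroring the rule that modules within one stage have no data path between them.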
  • the reconfigurable array is used to connect the computing arrays of two adjacent stages into a data path pipeline structure that meets the computing requirements of the algorithm according to the reconfiguration information provided by the reconfiguration configuration unit.
  • each stage of the data path pipeline structure corresponds to one stage of computing array, that is, in the data path pipeline structure, each pipeline stage is described with one stage of computing array as the unit; it should be emphasized that, in the current stage of computing array, only the computing modules that access the data path are regarded as belonging to the current pipeline stage of the data path pipeline structure; because the number of stages of computing arrays (the number of computing arrays) is preset, the computing arrays are hardware resources pre-existing in the reconfigurable array, and the data path pipeline structure is formed on top of the existing computing arrays by configuring the interconnection logic between adjacent computing arrays according to the reconstruction information provided by the reconstruction configuration unit.
  • the adjacent stages of computing arrays are interconnected pairwise (two-by-two adjacent interconnection) to form a data path pipeline structure that meets the computational requirements of the algorithm.
  • when the reconstruction information changes, the computation requirements of the executed algorithm change accordingly, and the computing arrays of adjacent columns are rewired based on the changed reconstruction information, so that the algorithm matching the current application scenario is executed by way of a hardware circuit.
  • the pipeline depth of the data path pipeline structure connected from the adjacent computing arrays is also adjusted automatically; that is, the pipeline depth of the data path pipeline structure may change or may remain unchanged, so that it tracks the reconstruction information.
  • the pipeline depth of the data path pipeline structure is adaptively changed following the change of the reconstruction information.
  • the pipeline depths of the different computing modules in the same stage of computing array are all made equal to the maximum pipeline depth allowed by that stage; here, the maximum pipeline depth allowed by a stage of computing array is the maximum pipeline depth among the computing modules of that stage that perform computing operations in the pipeline.
  • in the present invention, all pipeline depths are the time spent by data passing through the corresponding data paths of the data path pipeline structure, including data transmission time and calculation processing time; each stage of the data path pipeline structure corresponds to one stage of computing array, that is, the n-th stage of computing array belongs to the n-th pipeline stage, and the computing modules (hardware resources) in the n-th stage of computing array that are connected to the data path are connected into the n-th pipeline stage, that is, they access the data path pipeline structure.
  • the pipeline depth of the data path pipeline structure is: the sum of the pipeline depths of all stages of computing arrays in the reconfigurable array that access the data path pipeline structure (that is, access the data path).
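The two depth rules just stated (per-stage depth is the maximum over the connected modules of that stage; total depth is the sum over all stages) can be sketched in a few lines; the function names are illustrative, not from the patent:

```python
def stage_depth(module_depths):
    """Pipeline depth allowed by one stage: the maximum depth among the
    computing modules of that stage connected into the data path."""
    return max(module_depths)


def total_pipeline_depth(stages):
    """Depth of the whole data path pipeline structure: the sum of the
    per-stage maximum depths."""
    return sum(stage_depth(s) for s in stages)


# Illustrative depths (in cycles) of the connected modules in each stage:
stages = [[1, 4], [6], [4, 12, 1]]
print(total_pipeline_depth(stages))  # 4 + 6 + 12 = 22
```

Note that modules slower than the stage maximum do not add to the total: the compensation mechanism described later pads them up to the stage maximum instead.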
  • the reconfigurable processor disclosed in this embodiment, based on adjacently interconnected computing modules used to execute computing instructions, reconfigures and adjusts the data path pipeline structure so that the pipeline depth of data passing through each stage of computing array is the same and the computing requirements of the algorithm are met; the reconfigurable processor can thus configure an appropriate pipeline depth for different algorithms, and on this basis realize fully pipelined data processing across the reconfigurable array, which improves the throughput of the reconfigurable processor and gives full play to its computing performance.
  • the hardware resources required by the pipeline designs of the prior art are also reduced.
  • the pipeline depth of the data path pipeline structure adaptively changes following the change of the reconstruction information: under different computing application requirements, when the reconstruction configuration unit switches from providing one kind of reconstruction information to another, the data path pipeline structure through which data passes in the reconfigurable array changes, so that the pipeline depth of the data path pipeline structure is adjusted adaptively.
  • the reconfigurable processor further includes an input FIFO group and an output FIFO group; the output terminals of the input FIFO group are correspondingly connected to the input terminals of the reconfigurable array, where the corresponding connection means that, in the first stage of the reconfigurable array as configured by the reconstruction information, the input terminals of the computing modules connected into the data path pipeline structure are connected to the matching output terminals of the input FIFO group; the reconfigurable array is thus used to receive, according to the reconstruction information, the data to be calculated transmitted from the input FIFO group, and to transmit that data into the data path pipeline structure.
  • the input terminals of the output FIFO group are correspondingly connected to the output terminals of the reconfigurable array, where the corresponding connection means that, in the last stage of computing array as configured by the reconstruction information, the output terminals of the computing modules connected into the data path pipeline structure are connected to the matching input terminals of the output FIFO group; the reconfigurable array is also used to provide the output FIFO group, according to the reconstruction information, with the output data of the stage of computing array corresponding to the last pipeline stage of the data path pipeline structure.
  • the reconfigurable processor first stores the input data to be processed in the corresponding input FIFO group.
  • the input FIFO group is set as the buffer through which external data enters the reconfigurable processor, and the output FIFO group is set as the buffer through which the reconfigurable processor outputs data externally, so as to match the algorithm for the reconfigurable processor.
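A minimal sketch of the dataflow just described, using Python deques as stand-in FIFOs (the `pipeline` function is a placeholder for the configured data path pipeline structure, not anything specified by the patent):

```python
from collections import deque

# Input FIFO: buffer for external data entering the processor.
input_fifo = deque([3, 1, 4, 1, 5])
# Output FIFO: buffer for results leaving the processor.
output_fifo = deque()


def pipeline(x):
    # Placeholder for the configured data path pipeline structure;
    # here, an arbitrary two-operation computation.
    return (x + 1) * 2


# Drain the input FIFO through the pipeline into the output FIFO.
while input_fifo:
    output_fifo.append(pipeline(input_fifo.popleft()))

print(list(output_fifo))  # [8, 4, 10, 4, 12]
```

The FIFOs decouple the external data rate from the internal pipeline rate, which is the buffering role the two FIFO groups play here.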
  • the first stage of computing array is the first stage (or first column) of the computing arrays cascaded within the reconfigurable array; the current stage of computing array is the current stage (or current column) of those cascaded computing arrays; and the last stage of computing array is the last stage (or last column) of those cascaded computing arrays.
  • the method of connecting the computing arrays of adjacent stages into a data path pipeline structure that meets the computing requirements of the algorithm includes: two stages of computing arrays that are not adjacent to each other (non-adjacent columns) are not connected across stages through data paths, so that mutually non-adjacent stages are never directly connected to form the data path pipeline structure; it is worth noting that if there are only two mutually non-adjacent stages of computing arrays, they cannot be connected into a data path pipeline structure that meets the computing requirements of the algorithm, since within a reconfigurable array two non-adjacent stages cannot be joined to the data path pipeline structure by establishing a cross-stage direct data path; therefore, the data path pipeline structure does not allow two stages of computing arrays that are not adjacent to each other to be directly connected into data paths.
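The adjacency rule above is simple enough to express as a one-line predicate (an illustrative sketch; the function name is invented):

```python
def connection_allowed(src_stage, dst_stage):
    """Per the rule above, a data path may only link a module to the
    immediately following stage: no cross-stage links, no links within
    one stage, and no backward links."""
    return dst_stage == src_stage + 1


print(connection_allowed(2, 3))  # True:  adjacent stages
print(connection_allowed(1, 3))  # False: cross-stage direct connection
print(connection_allowed(2, 2))  # False: within the same stage
```

A configuration tool could apply this predicate to every connection requested by the reconstruction information to reject illegal topologies before wiring the array.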
  • the input terminals of the computing modules in the first stage of computing array serve as the input terminals of the reconfigurable array and are configured, based on the reconstruction information, to be connected to the matching output terminals of the input FIFO group, where the first stage of computing array is the first stage of the cascaded computing arrays in the reconfigurable array; the input terminal of a computing module in the current stage of computing array is configured, based on the reconstruction information, to be connected to the output terminal of a matching computing module in the adjacent previous stage of computing array, where the current stage of computing array is not the first stage in the reconfigurable array; the output terminal of a computing module in the current stage of computing array is configured, based on the reconstruction information, to be connected to the input terminal of a corresponding computing module in the adjacent next stage of computing array, where the current stage of computing array is not the last stage in the reconfigurable array; the output terminals of the computing modules in the last stage of computing array serve as the output terminals of the reconfigurable array.
  • in this way, the computing modules in adjacent stages of computing arrays of the reconfigurable array are serially connected into the data path pipeline structure, which reduces the complexity of the interconnection network paths while also realizing multi-stage pipeline control simply and efficiently.
  • the reconfigurable array is used to connect a multi-channel data path pipeline structure according to the reconstruction information, so as to meet application requirements in which multiple algorithms are to be executed synchronously; therefore, under external configuration, in order to form the data path pipeline structure, no data path connection is made between computing modules within the same stage of computing array or between computing units of non-adjacent stages of arrays.
  • the input terminals of the computing modules (other than those inside the first stage of computing array) all support being configured to connect to the output terminal of any computing module in the adjacent previous stage of computing array.
  • the connection between the reconfigurable array and the reconfiguration configuration unit includes but is not limited to direct coupling, indirect coupling, or a communication connection, which may be electrical, mechanical, or in other forms, and is used to transmit the reconstruction information of the computing modules.
  • in the reconfigurable array shown in Figure 1, there are m stages of computing arrays, where m is a positive integer greater than or equal to 2; in the first stage of computing array there are computing modules 1_1, 1_2, ..., 1_n1, with no data path between these computing modules, where n1 is a positive integer greater than or equal to 1; in 1_n1, "1" represents the stage number of the first stage of computing array and "n1" represents the row number of the computing module arranged in the first stage of computing array, together denoting the computing module in row n1 of the first stage of computing array, that is, the computing module in column 1, row n1 of the reconfigurable array shown in FIG. 1; in this embodiment the pipeline depth of data passing through each computing module in the first stage of computing array is configured to be equal, and those skilled in the art know that if data is controlled to flow through a computing module in the first stage of computing array, that computing module is determined to be connected into the first pipeline stage of the data path pipeline structure; it should be noted that at least one computing module is set inside each stage of computing array.
  • in the second stage of computing array there are computing modules 2_1, ..., 2_n2, with no data path between these computing modules, where n2 is a positive integer greater than or equal to 1 and is not necessarily equal to n1; the pipeline depth of data passing through each computing module in the second stage of computing array is configured to be equal.
  • the second stage of computing array is adjacent to the first stage of computing array in the reconfigurable array; the second and first stages form an adjacent two-stage cascade structure and are also connected as two adjacent pipeline stages in the data path pipeline structure; the second stage of computing array is equivalent to the adjacent next stage of the first stage of computing array, and the first stage of computing array is equivalent to the adjacent previous stage of the second stage of computing array.
  • in the m-th stage of computing array there are computing modules m_1, ..., m_nm, where nm is a positive integer greater than or equal to 1, and nm is not necessarily equal to n1 or n2; in m_nm, "m" represents the stage number of the m-th stage of computing array and "nm" represents the row number of the computing module arranged in the m-th stage of computing array, together denoting the computing module in row nm of the m-th stage of computing array, that is, the computing module in column m, row nm of the reconfigurable array shown in FIG. 1; m is greater than or equal to 2, and in the embodiment shown in FIG. 1, m is greater than 2 and the m-th stage is, in hardware, the last stage of the computing arrays arranged hierarchically within the reconfigurable array.
  • the pipeline depth of data passing through each computing module in the m-th stage of computing array is configured to be the same.
  • the m-th stage of computing array and the first stage of computing array are not arranged adjacently in the reconfigurable array, and no data path is established between them by cross-stage direct connection; instead, the m-th stage of computing array establishes a data path with its adjacent previous stage, so that the two are connected as two adjacent pipeline stages in the data path pipeline structure; it can be seen from FIG. 1 that when all the computing modules connected to the data path in the computing arrays of the reconfigurable array are connected to form the data path pipeline structure, each stage of the data path pipeline structure corresponds one-to-one with a stage of computing array in the reconfigurable array; therefore, in the reconfigurable array shown in FIG. 1, the pipeline depth of the data path pipeline structure is the sum of the pipeline depths of all stages of computing arrays, that is, the sum of the pipeline depths of the m stages of computing arrays.
  • the reconstruction information of the computing modules provided by the reconstruction configuration unit includes first configuration information, second configuration information, and third configuration information; the computing module includes a calculation control unit, a compensation unit, a first interconnection unit, and a second interconnection unit.
  • the computing module a_b2 represents the computing module in row b2 of the a-th stage of computing array.
  • the specific identification method refers to the previous embodiment and is not repeated here; subsequent computing modules, and the labels of the logical units divided inside them, are likewise identified by their positions in the reconfigurable array with reference to the foregoing embodiments, which will not be repeated here.
  • the input terminal of the first interconnection unit is the input terminal of the computing module; the output terminal of the first interconnection unit is connected to the input terminal of the calculation control unit; the output terminal of the calculation control unit is connected to the input terminal of the compensation unit; the output terminal of the compensation unit is connected to the input terminal of the second interconnection unit; and the output terminal of the second interconnection unit is the output terminal of the computing module.
  • the first interconnection unit a_b2 is used, according to the first configuration information, to connect the first interconnection unit a_b2 and the calculation control unit a_b2 into the current pipeline stage of the data path pipeline structure (corresponding to the a-th pipeline stage in FIG. 3); with reference to FIG. 3, when the first interconnection unit (a-1)_b1 is in the first stage of computing array and connected into the first pipeline stage, the first interconnection unit (a-1)_b1 inputs the data to be calculated, output from the matching output terminal of the input FIFO group, to the calculation control unit (a-1)_b1.
  • when the first interconnection unit a_b2 is not in the first stage of computing array, the first interconnection unit a_b2 inputs the calculation result output by the matching computing module (a-1)_b1 in the adjacent previous stage of computing array to the calculation control unit a_b2; similarly, when the first interconnection unit (a+1)_b3 is not in the first stage of computing array, the first interconnection unit (a+1)_b3 inputs the calculation result output by the matching computing module a_b2 in the adjacent previous stage of computing array to the calculation control unit (a+1)_b3; it should be noted that, in this embodiment, data passing through the aforementioned first interconnection units is processed without regard to pipeline depth.
  • the calculation control unit a_b2 is configured, according to the second configuration information, either to select connection into the data direct path, so that the data input to the calculation control unit a_b2 passes straight through to the compensation unit a_b2, or to select connection into the data calculation path, so that the data input to the calculation control unit a_b2 is transmitted to the compensation unit a_b2 after the calculation is performed; the calculation control unit (a+1)_b3 and the calculation control unit (a-1)_b1 shown in FIG. 3 likewise select between the data direct path and the data calculation path under the action of the second configuration information; the data path includes the data direct path and the data calculation path.
  • the calculation types performed by the calculation control unit include addition and subtraction, multiplication, division, square root, and trigonometric calculations; these specific calculation operations are performed when the calculation control unit is connected into a data calculation path, and each takes a certain amount of time, producing the pipeline depth corresponding to that calculation type.
  • when the calculation control unit is an adder-subtractor connected into a data calculation path, its pipeline depth is configured to 1; when it is a multiplier connected into a data calculation path, its pipeline depth is configured to 4; when it is a divider connected into a data calculation path, its pipeline depth is configured to 6; when it is a square-root unit connected into a data calculation path, its pipeline depth is configured to 4; when it is a trigonometric-function calculator connected into a data calculation path, its pipeline depth is configured to 12; when the calculation control unit is connected into a data direct path, directly connected to the adjacent next stage of computing array, its pipeline depth is configured to 0.
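The per-operator depths quoted in this embodiment can be collected into a lookup table; the key names below are invented shorthands for the operator types, not identifiers from the patent:

```python
# Pipeline depths (in cycles) from the embodiment, for a calculation
# control unit connected into a data calculation path; a data direct
# path (pass-through) costs 0.
PIPELINE_DEPTH = {
    "add_sub": 1,   # adder-subtractor
    "mul":     4,   # multiplier
    "div":     6,   # divider
    "sqrt":    4,   # square-root unit
    "trig":    12,  # trigonometric-function calculator
    "direct":  0,   # data direct path
}

print(PIPELINE_DEPTH["div"])   # 6
print(PIPELINE_DEPTH["trig"])  # 12
```

With such a table, the maximum depth of any configured stage is just the maximum of the entries for the operators actually connected into that stage.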
  • the types of calculation control units within each stage of computing array may or may not all be the same; likewise, the types of calculation control units in two adjacent stages of computing arrays may or may not all be the same.
  • the type and number of the computing control units in each level of computing array can be adjusted according to specific application fields and performance requirements.
  • by changing the calculation methods (addition, subtraction, multiplication, and division) of the computing modules, the reconfigurable array makes the interconnected calculation control units at all stages suitable for a variety of algorithms and flexibly configurable according to the needs of different algorithms, thus changing the traditional fixed-array mode in which one computing array is paired with one algorithm and greatly improving computational cost-effectiveness and efficiency.
  • the second configuration information is also a gating signal, used to control whether the data transmitted by the first interconnection unit is gated onto the data direct path or the data calculation path, so as to satisfy the calculation requirements of the algorithm at each stage of the data path pipeline structure; the calculation control unit is implemented by a data selector and an arithmetic logic circuit, where the gating terminal of the data selector receives the second configuration information, and the operations the arithmetic logic circuit can perform correspond to the addition and subtraction, multiplication, division, square root, and trigonometric calculations of the above-mentioned embodiments; the input terminal of the arithmetic logic circuit is connected to a data output terminal of the data selector, and the data selector is used, according to the second configuration information, to switch the data transmitted by the first interconnection unit between the data direct path and the data calculation path, so as to meet the calculation requirements of the algorithm at each stage of the data path pipeline structure, thereby determining whether the calculation is performed.
  • the compensation unit is configured to select the corresponding delay difference value according to the third configuration information, so as to delay-compensate the pipeline depth of the computing module to which it belongs up to the maximum pipeline depth allowed by the current stage of computing array.
  • the pipeline depth of the calculation control unit plus the delay difference compensated by the compensation unit equals the maximum pipeline depth allowed by the current stage of computing array, where the maximum pipeline depth allowed by the current stage of computing array is the pipeline depth of the calculation control unit that consumes the most time in the current pipeline stage of the data path pipeline structure; therefore, after the maximum pipeline depth allowed by the (a+1)-th stage of computing array is determined, the compensation unit (a+1)_b3 shown in FIG. 3 performs corresponding delay processing on the calculation control unit (a+1)_b3; likewise, after the maximum pipeline depth allowed by the a-th stage of computing array is determined, the compensation unit a_b2 performs similar delay processing on the calculation control unit a_b2, and after the maximum pipeline depth allowed by the (a-1)-th stage of computing array is determined, the compensation unit (a-1)_b1 performs similar delay processing on the calculation control unit (a-1)_b1.
  • the second interconnection unit (a+1)_b3 is used, according to the first configuration information, to connect the second interconnection unit (a+1)_b3 and the compensation unit (a+1)_b3 into the (a+1)-th pipeline stage of the data path pipeline structure; it can be seen from FIG. 3 that when the second interconnection unit (a+1)_b3 is in the last stage (column) of computing array and connected into the last pipeline stage, it transmits the data that has undergone the delay compensation processing of the compensation unit (a+1)_b3 to the matching output FIFO in the output FIFO group; when the second interconnection unit a_b2 is not in the last stage of computing array, the second interconnection unit a_b2 transmits the data delay-compensated by the compensation unit a_b2 to the matching computing module (a+1)_b3 in the adjacent next stage of computing array; similarly, when the second interconnection unit (a-1)_b1 is not in the last stage of computing array, the second interconnection unit (a-1)_b1 transmits the data delay-compensated by the compensation unit (a-1)_b1 to the matching computing module a_b2 in the adjacent next stage of computing array; it should be noted that, in this embodiment, data passing through the aforementioned second interconnection units is processed without regard to pipeline depth.
  • the computing module is connected to the adjacent previous stage of computing array through the first interconnection unit and to the adjacent next stage of computing array through the second interconnection unit, with a calculation control unit and a compensation unit connected between the first and second interconnection units to form the pipeline based on the reconstruction information, so that the computing modules are set into a reconfigurable interconnection logic mode by adjacent columns and the hardware structure is simple; the maximum pipeline depth of the current stage of computing array is determined on the basis of determining the computing modules that actually perform the computing functions of that stage, and then, using the difference between that maximum pipeline depth and the pipeline depth of each calculation control unit in the same stage of computing array, the corresponding calculation control unit is compensated in pipeline depth, so that the pipeline depth of data passing through the different computing modules of each stage of computing array is the same, thereby addressing the problem that the clock frequency of a coarse-grained reconfigurable processor (a type of reconfigurable processor) is not high.
  • the third configuration information is a gating signal; after the reconstruction configuration unit determines the calculation control unit that consumes the largest pipeline depth in the current pipeline stage of the data path pipeline structure (that is, after the reconstruction configuration unit determines that the calculation control unit with the largest pipeline depth in the data path pipeline structure is connected into the current stage of computing array), the third configuration information gates, in all computing modules of the current pipeline stage (that is, in all computing modules of the current stage of computing array that are connected into the data path pipeline structure), the matching register path set inside the compensation unit for compensating the delay difference, and then controls the output data of the calculation control units of the current pipeline stage to be transmitted along the register path until output from the corresponding computing module, so that the pipeline depths of the computing modules of the current pipeline stage are all delay-compensated to the maximum pipeline depth allowed by the current stage of computing array.
  • the data first passes through the calculation control unit, producing a first pipeline depth, and is then controlled to pass through the matching register path used to compensate the delay difference, where the pipeline depth produced by the data passing through that register path is a second pipeline depth; the pipeline depth consumed by the data, being the sum of the first pipeline depth and the second pipeline depth, is equal to the maximum pipeline depth allowed by the current stage of computing array, so that the computing modules connected into the data path pipeline structure in the same stage of computing array all output data synchronously; the compensation unit is implemented with a selector and registers, so as to selectively perform pipeline compensation for those calculation control units that do not reach the maximum pipeline depth of the current stage of computing array, supporting fully pipelined data processing of the reconfigurable array over the multi-stage pipeline structure.
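The compensation rule stated above (second depth = stage maximum minus the unit's own first depth, so that every module in the stage lands on the same total) can be sketched directly; the function name is illustrative:

```python
def compensation_delays(unit_depths):
    """For the calculation control units connected in one stage, return
    the register-path delay each compensation unit must add so that
    every module reaches the stage's maximum allowed pipeline depth."""
    stage_max = max(unit_depths)
    return [stage_max - d for d in unit_depths]


# A stage holding a multiplier (4), an adder-subtractor (1), a divider (6):
unit_depths = [4, 1, 6]
delays = compensation_delays(unit_depths)
print(delays)  # [2, 5, 0]: the divider sets the pace and needs no padding

# Every module's first depth + second depth equals the stage maximum:
print({d + c for d, c in zip(unit_depths, delays)})  # {6}
```

The slowest unit of the stage gets a delay of 0, which is why the stage maximum, not some global constant, is the compensation target.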
  • the register path for compensating the delay difference is composed of a preset number of registers, triggered by the third configuration information to register the output data of the calculation control unit in the same computing module; when the preset number changes, the delay time produced by the register path also changes, resulting in a different pipeline depth as the data passes through, so that a matching delay difference is provided for calculation control units with different pipeline depths; this is a compensation mechanism that those skilled in the art can obtain by combination and improvement on the basis of the aforementioned pipeline depths, including but not limited to the following: the gating terminal of the selector receives the third configuration information, and the multiple data output terminals of the selector are respectively connected to register paths with different numbers of registers, so that a variety of selectable register paths exist in the same compensation unit; under the gating effect of the third configuration information, the compensation unit selects a matching register path to connect to the calculation control unit in the same computing module, and then controls the output data of the calculation control unit to be transmitted along that register path until output from the computing module, so that the pipeline depths of the computing modules connected into the data path pipeline structure in the same stage of computing array are delay-compensated to the maximum pipeline depth allowed by the current stage of computing array.
  • the delay difference produced by the registering is equal to: the maximum pipeline depth allowed by the current stage of computing array minus the pipeline depth of the calculation control unit connected to the compensation unit in the same computing module.
  • the selector inside the compensation unit is controlled to connect the calculation control unit to the register path that produces the appropriate delay difference, so that any data passes through the different computing modules connected into the data path pipeline structure in the same stage of computing array with the same pipeline depth.
  • the first configuration information includes: the access address information and timing information required to connect the first interconnection units in the first stage of computing array and the matching input FIFOs set in the input FIFO group into the data path pipeline structure; the access address information and timing information required to connect the first interconnection units in the current stage of computing array and the second interconnection units in the adjacent previous stage of computing array into the data path pipeline structure; the access address information and timing information required to connect the second interconnection units in the current stage of computing array and the matching first interconnection units in the adjacent next stage of computing array into the data path pipeline structure; and the access address information and timing information required to connect the second interconnection units in the last stage of computing array and the matching output FIFOs set in the output FIFO group into the data path pipeline structure; both the first interconnection unit and the second interconnection unit support forming, between the computing modules in the reconfigurable array or in the data path pipeline structure, a topological structure of interconnections that can satisfy the complete function of the algorithm.
  • The data is sent to the corresponding input ends of the first-stage computing array of the multi-stage pipeline, so that after being processed by the computing arrays in the multi-stage pipeline, the data is sent to the corresponding output FIFO. Thus, under different computing application requirements, when the reconfiguration configuration unit switches from providing one kind of reconfiguration information to another, a pipeline structure with complete interconnection logic is guaranteed to form between the computing arrays of adjacent stages.
  • The reconfigurable array is provided with six stages of computing arrays, that is, six columns of computing arrays; each stage is provided with four rows of computing modules, that is, four computing modules are arranged in separate rows in each column, one computing module per row in the same-stage computing array. They are represented by the adder-subtractors, multipliers, dividers, square-root extractors, and trigonometric function calculators shown in Figure 2. The meaning of the labels carried by the computing modules in FIG. 2 follows the explanation of "m_nm" in the corresponding figure embodiment; the mark before "_" indicates the column of the computing array within the reconfigurable array, that is, the column index of the reconfigurable array.
  • The six-stage computing array is connected into a six-stage pipeline under the configuration of the reconfiguration information provided by the reconfiguration configuration unit, so as to form the data path pipeline structure and support computing operations of a specific granularity. Only one computing module is set per row in the same-stage computing array. The input terminals of the four computing modules in the first-stage computing array are connected, based on the reconfiguration information, to four different input FIFOs in the input FIFO group, and the output terminal of the one computing module in the sixth-stage computing array is connected, based on the reconfiguration information, to an output FIFO in the output FIFO group.
  • The computing modules in the reconfigurable array all adopt a pipelined design, and different types of calculation control units inside the computing modules have different pipeline depths: the pipeline depth of the calculation control unit inside the adder-subtractor is 1, inside the multiplier it is 4, inside the divider it is 6, inside the square-root extractor it is 4, and inside the trigonometric function calculator it is 12.
  • The calculation control unit inside the adder-subtractor performs addition or subtraction when connected into the data calculation path. Since addition and subtraction consume the same pipeline depth, performing an addition or a subtraction is simply called an addition-and-subtraction calculation.
  • In Fig. 2, a computing module filled with diagonal lines represents one whose internal calculation control unit is connected into the data calculation path to perform the corresponding function calculation, while a computing module indicated by an arrowed connection line and not filled with diagonal lines represents one whose internal calculation control unit is connected into the data through-path, passing data through without processing.
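A minimal sketch of this gating behavior (illustrative names only; the gating signal is modeled as a boolean):

```python
def calc_control_output(x, compute_gate, op):
    """When gated into the data calculation path, the calculation control
    unit applies its operation to the input; on the data through-path the
    input passes through unprocessed."""
    return op(x) if compute_gate else x

# Compute path: the operation runs; through-path: data is unchanged.
assert calc_control_output(3, True, lambda v: v * v) == 9
assert calc_control_output(3, False, lambda v: v * v) == 3
```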
  • The first-stage computing array includes an adder-subtractor 1_1, an adder-subtractor 1_2, an adder-subtractor 1_3, and an adder-subtractor 1_4. In the first-stage computing array, the internal calculation control units of the adder-subtractor 1_1 and the adder-subtractor 1_4 are connected into a data calculation path under the configuration of the second configuration information included in the reconfiguration information and are used to perform addition-and-subtraction calculations, while the internal calculation control units of the adder-subtractor 1_2 and the adder-subtractor 1_3 are connected into a data through-path under the effect of the second configuration information included in the reconfiguration information.
  • The adder-subtractor 1_1 through the adder-subtractor 1_4 are all connected as the first-stage pipeline of the data path pipeline structure; under the configuration of the third configuration information, the pipeline depth of the first-stage pipeline corresponding to the first-stage computing array is 1.
  • The adder-subtractor 1_1 receives the data transmitted by the first input FIFO and the second input FIFO of the input FIFO group, the adder-subtractor 1_2 receives the data transmitted by the second input FIFO of the input FIFO group, the adder-subtractor 1_3 receives the data transmitted by the third input FIFO of the input FIFO group, and the adder-subtractor 1_4 receives the data transmitted by the third input FIFO and the fourth input FIFO of the input FIFO group.
  • The second-stage computing array includes a multiplier 2_1, a multiplier 2_2, a multiplier 2_3, and a multiplier 2_4. In the second-stage computing array, based on the first configuration information, the multiplier 2_1 receives the output data of the adder-subtractor 1_2 and the output data of the adder-subtractor 1_1, and the multiplier 2_3 receives the output data of the adder-subtractor 1_2 and the output data of the adder-subtractor 1_3; the internal calculation control units of both are connected into the data calculation path under the configuration of the second configuration information included in the reconfiguration information and are used to perform multiplication calculations. In the second-stage computing array, the multiplier 2_4 receives the output data of the adder-subtractor 1_4 based on the first configuration information, and then, under the configuration of the second configuration information, the calculation control unit inside the multiplier 2_4 is connected into the data through-path. The multiplier 2_1, the multiplier 2_3, and the multiplier 2_4 are all connected as the second-stage pipeline of the data path pipeline structure; under the configuration of the third configuration information, the pipeline depth of the second-stage pipeline corresponding to the second-stage computing array is 4.
  • The third-stage computing array includes an adder-subtractor 3_1, an adder-subtractor 3_2, an adder-subtractor 3_3, and an adder-subtractor 3_4. In the third-stage computing array, based on the first configuration information, the adder-subtractor 3_1 receives the output data of the multiplier 2_1, the adder-subtractor 3_3 receives the output data of the multiplier 2_3, and the adder-subtractor 3_4 receives the output data of the multiplier 2_4. The internal calculation control units of the adder-subtractor 3_1, the adder-subtractor 3_3, and the adder-subtractor 3_4 are all connected into a data through-path under the configuration of the second configuration information included in the reconfiguration information; the adder-subtractor 3_1, the adder-subtractor 3_3, and the adder-subtractor 3_4 are all connected as the third-stage pipeline of the data path pipeline structure.
  • The fourth-stage computing array includes a multiplier 4_1, a multiplier 4_2, a multiplier 4_3, and a divider 4_4. In the fourth-stage computing array, based on the first configuration information, the multiplier 4_2 receives the output data of the adder-subtractor 3_1 and the output data of the adder-subtractor 3_3, and the divider 4_4 receives the output data of the adder-subtractor 3_3 and the output data of the adder-subtractor 3_4. The internal calculation control unit of the multiplier 4_2 is connected into the data calculation path under the configuration of the second configuration information and performs multiplication calculation; the internal calculation control unit of the divider 4_4 is connected into the data calculation path under the configuration of the second configuration information and performs division calculation. The multiplier 4_2 and the divider 4_4 are both connected as the fourth-stage pipeline of the data path pipeline structure; under the configuration of the third configuration information, the pipeline depth of the fourth-stage pipeline corresponding to the fourth-stage computing array is 6, which is greater than the pipeline depth of the multiplier 4_2.
  • The fifth-stage computing array includes an adder-subtractor 5_1, an adder-subtractor 5_2, an adder-subtractor 5_3, and an adder-subtractor 5_4. In the fifth-stage computing array, based on the first configuration information, the adder-subtractor 5_2 receives the output data of the multiplier 4_2 and the output data of the divider 4_4. The internal calculation control unit of the adder-subtractor 5_2 is connected into the data calculation path under the configuration of the second configuration information and performs addition-and-subtraction calculation; the adder-subtractor 5_2 is connected as the fifth-stage pipeline of the data path pipeline structure. Under the configuration of the third configuration information, the pipeline depth of the fifth-stage pipeline corresponding to the fifth-stage computing array is 1.
  • The sixth-stage computing array includes a multiplier 6_1, a square-root extractor 6_2, a divider 6_3, and a trigonometric function calculator 6_4. In the sixth-stage computing array, based on the first configuration information, the square-root extractor 6_2 receives the output data of the adder-subtractor 5_2. The internal calculation control unit of the square-root extractor 6_2 is connected into the data calculation path under the configuration of the second configuration information and performs the square-root calculation; the square-root extractor 6_2 is connected as the sixth-stage pipeline of the data path pipeline structure. Under the configuration of the third configuration information, the pipeline depth of the sixth-stage pipeline corresponding to the sixth-stage computing array is 4. The square-root extractor 6_2 outputs data, based on the first configuration information, to the matching output FIFO inside the output FIFO group.
  • Since the calculation control units inside the adder-subtractor 1_2 and the adder-subtractor 1_3 in the first-stage computing array are connected as the data through-path, their pipeline depth is 0, and the third configuration information must control the compensation unit inside each of them to perform one stage of pipeline compensation, that is, to use a first preset number of registers to apply a one-cycle delay compensation to the output data of the calculation control unit, compensating the delay to the pipeline depth of the first-stage computing array.
  • Since the calculation control unit inside the multiplier 2_4 in the second-stage computing array is connected as the data through-path, its pipeline depth is 0, and the third configuration information must control the internal compensation unit to perform 4 stages of pipeline compensation on the output data of the calculation control unit inside the multiplier 2_4, compensating the pipeline-depth delay of the multiplier 2_4 to the maximum pipeline depth allowed by the second-stage computing array, that is, the pipeline depth of the multiplier 2_1 or the multiplier 2_3 whose internal calculation control units are connected into the data calculation path.
  • The pipeline depth of the internal calculation control unit of the multiplier 4_2 in the fourth-stage computing array is 4, which is less than the pipeline depth of 6 of the divider 4_4 whose internal calculation control unit is connected into the data calculation path; the pipeline depth of the divider 4_4 is configured as the maximum pipeline depth allowed by the fourth-stage computing array. Therefore, the third configuration information must control the internal compensation unit to perform 2 stages of pipeline compensation on the output data of the calculation control unit inside the multiplier 4_2, so that the pipeline-depth delay of the multiplier 4_2 is compensated to the maximum pipeline depth allowed by the fourth-stage computing array.
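The per-stage bookkeeping in the example above can be sketched as follows (a hedged illustration; the helper name and list encoding are not from the patent):

```python
def stage_compensation(connected_unit_depths):
    """Given the pipeline depths of the calculation control units a stage
    connects into the data path (0 for through-path modules), return the
    stage's maximum allowed depth and each unit's register compensation."""
    stage_max = max(connected_unit_depths)
    return stage_max, [stage_max - d for d in connected_unit_depths]

# Fourth-stage example: multiplier 4_2 (depth 4) shares the stage with
# divider 4_4 (depth 6), so the multiplier output is delayed by 2 stages.
stage_max, comp = stage_compensation([4, 6])
assert stage_max == 6 and comp == [2, 0]
```

The second-stage example works the same way: `stage_compensation([4, 4, 0])` gives a stage maximum of 4 and a 4-stage compensation for the through-path multiplier 2_4.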
  • Another embodiment of the present invention discloses a configuration method, which includes: according to the computing requirements of the algorithm matching the current application scenario, connecting the computing arrays of adjacent stages of the reconfigurable array into a data path pipeline structure that supports data passing with equal pipeline depth through the different computing modules in the same-stage computing array and meets the computing requirements of the algorithm matching the current application scenario. In the same-stage computing array, the pipeline depths of the different computing modules connected to the data path pipeline structure are all equal, so that these modules output data synchronously; in the data path pipeline structure, the pipeline depths of the different computing modules in the same stage all equal the maximum pipeline depth allowed by that stage.
  • In this embodiment, the maximum pipeline depth allowed by a same-stage computing array is the largest pipeline depth among the computing modules of that stage that are connected to perform computing operations. At least one computing module is arranged inside each stage of the computing array; in a reconfigurable array, the computing array set on each column is one stage, and the number of computing arrays is preset. Each pipeline stage of the data path pipeline structure corresponds to one stage of the computing array; the computing modules connected to the data path in the current-stage computing array constitute the current pipeline stage of the data path pipeline structure. The pipeline depth is the time taken by data to flow through the corresponding data path of the data path pipeline structure.
  • The configuration method, on the basis of adjacently interconnected computing modules used to execute computing instructions, reconfigures a data path pipeline structure in which the pipeline depth of data passing through each stage of the computing array is the same and the computing requirements of the algorithm are met. The data path pipeline structure allows the reconfigurable processor to configure a suitable pipeline depth according to different algorithms and, on this basis, to fully pipeline the data processing operations of the reconfigurable array, improving the data throughput of the reconfigurable processor.
  • The configuration method further includes: based on the reconfiguration information, configuring the reconfigurable array to receive the data to be computed transmitted from the input FIFO group and to transmit it to the data path pipeline structure, while configuring the reconfigurable array to output, to the output FIFO group, the calculation result of the computing array corresponding to the last stage of the data path pipeline structure.
  • The configuration method configures a buffer for external data entering the reconfigurable processor, and at the same time sets a buffer for the reconfigurable processor to output data externally, so as to match the algorithm's requirements for data exchange and storage between the reconfigurable processor and external system elements.
  • The configuration method specifically includes: step S41, starting to configure the reconfigurable array based on the reconfiguration information in the foregoing embodiment, and then entering step S42.
  • In step S42, it is judged whether all computing modules connected to the data path in the current-stage computing array have been traversed. The data path is a part of the data path pipeline structure, and each stage of the data path pipeline structure is described with one stage of the computing array as the unit.
  • In step S43, traversal of a new computing module connected to the data path in the current-stage computing array is started, and the method then proceeds to step S44.
  • In step S44, it is judged whether the current-stage computing array is detected as corresponding to the first pipeline stage of the data path pipeline structure, i.e., whether the current-stage computing array has a computing module connected to the data path of the first pipeline stage; if so, the method proceeds to step S45, otherwise to step S46.
  • In step S45, the first interconnection unit and the calculation control unit are connected into the first pipeline stage of the data path pipeline structure, and at the same time the second interconnection unit and the compensation unit are connected into the first pipeline stage of the data path pipeline structure, so that the first pipeline stage of the data path pipeline structure is connected within the first-stage computing array. The method then proceeds to step S49.
  • In step S49, it is judged whether the calculation control unit detects the calculation gating signal (corresponding to the configuration effect of the second configuration information in the foregoing embodiment); if so, the method goes to step S410, otherwise to step S411.
  • In step S410, the data input to the calculation control unit is configured to be output to the compensation unit after the calculation is performed, and the method then proceeds to step S412.
  • In step S411, the data input to the calculation control unit is configured to pass directly through to the compensation unit without performing calculation, and the method then proceeds to step S412.
  • In step S412, the compensation unit is configured to select the corresponding delay difference to compensate the pipeline-depth delay of the calculation control unit to the maximum pipeline depth allowed by the current-stage computing array; for the specific compensation method, refer to the compensation unit embodiment of the aforementioned reconfigurable processor. The method then returns to step S42.
  • In step S46, it is judged whether the current-stage computing array is detected as corresponding to the last pipeline stage; if so, the method goes to step S47, otherwise to step S48.
  • In step S47, the first interconnection unit and the calculation control unit are connected into the last pipeline stage of the data path pipeline structure, and at the same time the second interconnection unit and the compensation unit are connected into the last pipeline stage of the data path pipeline structure; the method then proceeds to step S49.
  • In step S48, the first interconnection unit and the calculation control unit are connected into the current pipeline stage of the data path pipeline structure, and at the same time the second interconnection unit and the compensation unit are connected into the current pipeline stage of the data path pipeline structure; the method then proceeds to step S49.
  • In step S413, it is judged whether all the computing arrays in the reconfigurable array have been traversed; if so, the method goes to step S415, otherwise to step S414.
  • In step S414, traversal of the adjacent next-stage computing array is started, and the method then returns to step S42.
  • In step S415, it is determined that the computing arrays of all columns (all stages) in the reconfigurable array have been traversed, and the reconfiguration operation on the reconfigurable array ends.
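The traversal of steps S41 to S415 can be sketched roughly as follows (an illustrative model only; the class and field names are invented for the sketch, and interconnection wiring is reduced to depth bookkeeping):

```python
from dataclasses import dataclass

@dataclass
class Module:
    unit_depth: int        # pipeline depth of the calculation control unit
    compute_gate: bool     # calculation gating signal (steps S49/S410/S411)
    on_data_path: bool = True
    compensation: int = 0  # registers chosen in step S412

def configure(stages):
    """Walk every stage and every module on the data path (steps
    S42-S414), then compensate each module to the stage maximum (S412)."""
    for stage in stages:                     # S413/S414: next-stage traversal
        connected = [m for m in stage if m.on_data_path]
        if not connected:
            continue
        # a through-path module contributes an effective depth of 0
        depths = [m.unit_depth if m.compute_gate else 0 for m in connected]
        stage_max = max(depths)
        for m, d in zip(connected, depths):
            m.compensation = stage_max - d   # S412: delay compensation
    return stages                            # S415: all stages traversed

# Second-stage example: multipliers 2_1 and 2_3 compute (depth 4) while
# multiplier 2_4 is a through-path, so it receives 4 compensation stages.
stage2 = [Module(4, True), Module(4, True), Module(4, False)]
configure([stage2])
assert [m.compensation for m in stage2] == [0, 0, 4]
```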
  • The computing module includes a calculation control unit, a compensation unit, a first interconnection unit, and a second interconnection unit. In the same computing module of each stage of the computing array, the input end of the first interconnection unit is the input end of the computing module, the output end of the first interconnection unit is connected to the input end of the calculation control unit, the output end of the calculation control unit is connected to the input end of the compensation unit, the output end of the compensation unit is connected to the input end of the second interconnection unit, and the output end of the second interconnection unit is the output end of the computing module.
  • The foregoing steps can determine the computing modules of the current-stage computing array that actually perform computing functions and the maximum pipeline depth of the current-stage computing array, and then use the difference between this maximum pipeline depth and the pipeline depth of each calculation control unit of the same stage to perform pipeline compensation on the corresponding calculation control unit, so that data passes through the different computing modules of each stage with the same pipeline depth. This solves the problems of low clock frequency and low computing efficiency in coarse-grained reconfigurable processors (a type of reconfigurable processor).
  • The calculation control unit and the compensation unit are connected to form one pipeline stage of the data path pipeline structure, thereby realizing multi-stage pipeline control.
  • the disclosed systems and chips may be implemented in other manners.
  • the system embodiments described above are only illustrative.
  • The division of the units is only a logical function division; in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • The mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.
  • The units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.


Abstract

The present invention discloses a reconfigurable processor and a configuration method. The reconfigurable processor includes a reconfiguration configuration unit and a reconfigurable array. The reconfiguration configuration unit provides, according to the algorithm matching the current application scenario, reconfiguration information for reconstructing the computing structure inside the reconfigurable array. The reconfigurable array includes at least two stages of computing arrays and, according to the reconfiguration information provided by the reconfiguration configuration unit, connects the computing arrays of adjacent stages into a data path pipeline structure that meets the computing requirements of the algorithm. Within the same-stage computing array, the pipeline depths of the different computing modules connected to the data path pipeline structure are all equal, so that these modules output data synchronously. The reconfigurable processor can thus configure a suitable pipeline depth for different algorithms and, on this basis, fully pipeline the data processing operations of the reconfigurable array, improving the data throughput of the reconfigurable processor.

Description

A Reconfigurable Processor and Configuration Method

Technical Field
The present invention relates to the technical field of reconfigurable computing, and in particular, to a reconfigurable processor and a configuration method.
Background Art
Reconfigurable computing means that a computing system can use reusable hardware resources to flexibly reconfigure its own computing paths according to different application requirements, so as to provide a computing structure matched to each specific application requirement. As a new high-performance computing architecture, the coarse-grained reconfigurable processor combines the advantages of general-purpose and special-purpose computing, offering a good trade-off between programming flexibility and computing energy efficiency. The reconfigurable array is the computing core of a reconfigurable processor, and its efficiency and flexibility greatly affect the performance of the reconfigurable system. Existing coarse-grained reconfigurable array structures rarely consider internal pipelining, so complex operations become the bottleneck of the reconfigurable array's computing speed, the clock frequency is not high, and computing efficiency is low.
In a coarse-grained reconfigurable architecture, the reconfigurable array includes functionally complete computing units such as adder-subtractors, multipliers, dividers, square-root extractors, and trigonometric function calculators. To guarantee a high clock frequency and computing efficiency for the reconfigurable processor, these computing units mostly adopt a pipelined design. Because the computational complexity to be implemented differs, the pipeline depths of different computing units often differ, which makes it difficult for the reconfigurable array to achieve fully pipelined data processing and limits the improvement of the reconfigurable processor's pipeline computing performance.
Technical Solution
To solve the above technical problem, the technical solution of the present invention discloses a reconfigurable processor that adaptively configures its pipeline depth and applies multi-stage pipeline control to the reconfigurable array. The specific technical solution includes: a reconfigurable processor comprising a reconfiguration configuration unit and a reconfigurable array. The reconfiguration configuration unit is used to provide, according to the algorithm matching the current application scenario, reconfiguration information for reconstructing the computing structure inside the reconfigurable array. The reconfigurable array includes at least two stages of computing arrays and is used to connect, according to the reconfiguration information provided by the reconfiguration configuration unit, the computing arrays of adjacent stages into a data path pipeline structure that meets the computing requirements of the algorithm matching the current application scenario. Within the same-stage computing array, the pipeline depths of the different computing modules connected to the data path pipeline structure are all equal, so that these modules output data synchronously. At least one computing module is arranged inside each stage of the computing array; in a reconfigurable array, one computing array is set on each column, each computing array is one stage, and the number of computing arrays is preset; these computing arrays exist in the reconfigurable array in a cascaded structure. Each pipeline stage of the data path pipeline structure corresponds to one stage of the computing array; in each stage, the computing modules connected to the data path are equivalent to being connected to the corresponding pipeline stage of the data path pipeline structure. The pipeline depth is the time taken by data to flow through the corresponding data path of the data path pipeline structure.
Compared with the prior art, the reconfigurable processor, on the basis of adjacently interconnected computing modules used to execute computing instructions, reconfigures a data path pipeline structure in which the pipeline depth of data passing through each stage of the computing array is the same and the computing requirements of the algorithm are met. The reconfigurable processor can thus configure a suitable pipeline depth for different algorithms and, on this basis, fully pipeline the data processing operations of the reconfigurable array, improving the throughput of the reconfigurable processor, giving full play to its computing performance, and reducing the hardware resources required by prior-art pipeline designs.
Further, the reconfigurable processor also includes an input FIFO group and an output FIFO group. The output ends of the input FIFO group are correspondingly connected to the input ends of the reconfigurable array; the reconfigurable array is used to receive, according to the reconfiguration information, the data to be computed transmitted from the input FIFO group and to transmit it to the data path pipeline structure. The input ends of the output FIFO group are correspondingly connected to the output ends of the reconfigurable array; the reconfigurable array is also used to provide, according to the reconfiguration information, the output data of the computing array corresponding to the last pipeline stage of the data path pipeline structure to the output FIFO group. This technical solution sets the input FIFO group as a buffer for external data entering the reconfigurable processor and the output FIFO group as a buffer for the reconfigurable processor's output data, so as to match the algorithm's requirements for data exchange and storage between the reconfigurable processor and external system elements.
Further, in the reconfigurable array, the manner of connecting the computing arrays of adjacent stages into a data path pipeline structure that meets the algorithm's computing requirements includes: no cross-stage data path connects two non-adjacent computing arrays, so that two non-adjacent stages are never directly connected into the data path pipeline structure, and no data path exists between different computing modules within the same-stage computing array. The input ends of the computing modules in the first-stage computing array serve as the input ends of the reconfigurable array and are configured, based on the reconfiguration information, to connect to the matching output ends of the input FIFO group. The input end of a computing module in the current-stage computing array is configured, based on the reconfiguration information, to connect to the output end of the matching row's computing module in the adjacent previous-stage computing array, provided the current stage is not the first stage of the reconfigurable array. The output end of a computing module in the current-stage computing array is configured, based on the reconfiguration information, to connect to the input end of the matching row's computing module in the adjacent next-stage computing array, provided the current stage is not the last stage. The output ends of the computing modules in the last-stage computing array serve as the output ends of the reconfigurable array and are configured, based on the reconfiguration information, to connect to the matching input ends of the output FIFO group. The adjacent previous-stage computing array is one stage lower than the current stage, and the adjacent next-stage computing array is one stage higher; the data path is the path of data transmission. This technical solution serially connects the computing modules in adjacent stages of the reconfigurable array into the data path pipeline structure according to the reconfiguration information, reducing the complexity of the interconnection network while achieving simple and efficient multi-stage pipeline control. On this basis, the reconfigurable array can connect multiple data path pipeline structures according to the reconfiguration information, so as to meet the application requirements of multiple algorithms executed synchronously.
Further, the reconfiguration information of a computing module provided by the reconfiguration configuration unit includes second configuration information, first configuration information, and third configuration information. A computing module includes a calculation control unit, a compensation unit, a first interconnection unit, and a second interconnection unit. The first interconnection unit is used, according to the first configuration information, to connect the first interconnection unit and the calculation control unit to the current pipeline stage of the data path pipeline structure; when the current pipeline stage corresponds to the first-stage computing array, the first interconnection unit feeds the data to be computed, output by the matching output end of the input FIFO group, to the calculation control unit; when it does not, the first interconnection unit feeds the calculation result output by the matching computing module in the adjacent previous-stage computing array to the calculation control unit. The calculation control unit is used, according to the second configuration information, either to connect into a data through-path, so that the data input to the calculation control unit passes directly to the compensation unit without computation, or to connect into a data calculation path, so that the data input to the calculation control unit is transmitted to the compensation unit after computation; the data path includes the data through-path and the data calculation path. The compensation unit is used, according to the third configuration information, to select the corresponding delay difference so as to delay-compensate the pipeline depth of the same computing module to the maximum pipeline depth allowed by the current-stage computing array. The second interconnection unit is used, according to the first configuration information, to connect the second interconnection unit and the compensation unit to the current pipeline stage of the data path pipeline structure; when the current pipeline stage corresponds to the last-stage computing array, the second interconnection unit transmits the delay-compensated data to the matching output FIFO in the output FIFO group; when it does not, the second interconnection unit transmits the delay-compensated data to the matching computing module in the adjacent next-stage computing array. Within the same computing module of the current-stage computing array, the input end of the first interconnection unit is the input end of the computing module, the output end of the first interconnection unit is connected to the input end of the calculation control unit, the output end of the calculation control unit is connected to the input end of the compensation unit, the output end of the compensation unit is connected to the input end of the second interconnection unit, and the output end of the second interconnection unit is the output end of the computing module.
In this technical solution, a computing module is connected to the adjacent previous-stage computing array through the first interconnection unit and to the adjacent next-stage computing array through the second interconnection unit; the calculation control unit and the compensation unit are connected between the first and second interconnection units, forming a pipeline based on the reconfiguration information. The computing modules are thus arranged by adjacent columns into a reconfigurable interconnection logic pattern with a simple hardware structure. Meanwhile, the maximum pipeline depth of the current-stage computing array is determined on the basis of determining which computing modules of the current stage actually perform computing functions; the difference between this maximum pipeline depth and the pipeline depth of each calculation control unit of the same stage is then used to compensate the pipeline depth of the corresponding calculation control unit, so that data passes through the different computing modules of each stage with the same pipeline depth. This solves the problems of low iterative data-processing efficiency and excessive hardware resource configuration overhead in prior-art coarse-grained reconfigurable processors (a type of the reconfigurable processor).
Further, the third configuration information is a gating signal. After the reconfiguration configuration unit determines the calculation control unit with the largest pipeline depth in the current pipeline stage of the data path pipeline structure, the third configuration information gates, in every computing module of the current pipeline stage, the matching register path set inside the compensation unit for generating the delay difference, and then controls the output data of the calculation control unit of the current pipeline stage to be transmitted along that register path until it leaves the corresponding computing module; only then is it determined that the pipeline depths of the computing modules of the current pipeline stage are all delay-compensated to the maximum pipeline depth allowed by the current-stage computing array. The compensation unit is implemented with a selector and registers; the maximum pipeline depth allowed by the current-stage computing array is the pipeline depth of the calculation control unit that takes the most time for data to flow through the corresponding data path of the data path pipeline structure. Pipeline compensation is thus performed for calculation control units that do not reach the pipeline depth of the current-stage computing array, supporting fully pipelined data processing of the reconfigurable array on the multi-stage pipeline structure.
Further, within the compensation unit, the register path used to compensate the delay difference is composed of a preset number of registers, and these registers, triggered by the third configuration information, register the data output by the calculation control unit in the same computing module; the delay difference generated by the registration equals the maximum pipeline depth allowed in the current-stage computing array minus the pipeline depth of the calculation control unit connected to the compensation unit in the same computing module. Based on the third configuration information, this technical solution controls the selector inside the compensation unit to connect the calculation control unit to the register path that generates the appropriate delay difference, so that any data passes with equal pipeline depth through the different computing modules of the same-stage computing array that are connected to the data path pipeline structure.
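The selected register path behaves like a simple shift register clocked once per cycle. A minimal behavioral sketch (the class name and clocking model are assumptions for illustration, not taken from the patent):

```python
from collections import deque

class CompensationPath:
    """Models the register path inside the compensation unit: a chain of
    `delay` registers that the calculation control unit's output traverses,
    advancing one register per clock cycle."""
    def __init__(self, delay: int):
        self.delay = delay
        self.regs = deque([None] * delay)  # initial register contents

    def clock(self, value):
        """Push one value into the path; return the value leaving it."""
        if self.delay == 0:
            return value  # zero compensation: combinational pass-through
        self.regs.appendleft(value)
        return self.regs.pop()

path = CompensationPath(2)
out = [path.clock(v) for v in [1, 2, 3, 4]]
# the first two outputs are the initial register contents (None)
assert out == [None, None, 1, 2]
```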
Further, the first configuration information includes: the access address information and timing information required to connect the first interconnection unit in the first-stage computing array and the matching input FIFO in the input FIFO group into the data path pipeline structure; the access address information and timing information required to connect the first interconnection unit in the current-stage computing array and the matching second interconnection unit in the adjacent previous-stage computing array into the data path pipeline structure; the access address information and timing information required to connect the second interconnection unit in the current-stage computing array and the matching first interconnection unit in the adjacent next-stage computing array into the data path pipeline structure; and the access address information and timing information required to connect the second interconnection unit in the last-stage computing array and the matching output FIFO in the output FIFO group into the data path pipeline structure. Both the first interconnection unit and the second interconnection unit support forming, within the reconfigurable array or within the data path pipeline structure, the interconnection topology between the computing modules, so as to satisfy the complete function of the algorithm. Based on the requirements of the first configuration information, this technical solution sends data to the corresponding input ends of the first-stage computing array of the multi-stage pipeline, so that after being processed by the computing arrays on the multi-stage pipeline, the data is sent to the corresponding output FIFO. Thus, under different computing application requirements, when the reconfiguration configuration unit switches from providing one kind of reconfiguration information to another, a pipeline structure with complete interconnection logic is guaranteed to form between the computing arrays of adjacent stages.
Further, the second configuration information is also a gating signal, used to control the data transmitted by the first interconnection unit to be gated between the data through-path and the data calculation path, so as to meet the computing requirements of the algorithm at each pipeline stage of the data path pipeline structure; the calculation control unit is implemented with a data selector and arithmetic logic circuits. This determines whether the calculation control unit currently performs its computing function.
Further, the calculation types performed by the calculation control unit include addition and subtraction, multiplication, division, square root, and trigonometric calculation. The types of calculation control units within each stage of the computing array may be not all the same, or all the same; likewise, the types of calculation control units between two adjacent stages may be not all the same, or all the same. In this technical solution, the type and number of calculation control units in each stage of the computing array can be adjusted according to the specific application field and performance requirements. On this basis, by changing the arithmetic mode of the computing modules, the interconnected calculation control units of all stages of the reconfigurable array become applicable to multiple algorithms and can be flexibly configured according to the needs of different algorithms. This changes the traditional pattern in which one algorithm is matched to one fixed computing array, greatly reducing computing cost and improving efficiency.
Further, the reconfigurable array is provided with six stages of computing arrays, each stage provided with four rows of computing modules. Under the configuration of the reconfiguration information provided by the reconfiguration configuration unit, the six stages are connected into a six-stage pipeline to form the data path pipeline structure and support computing operations of a specific granularity; only one computing module is set per row in the same-stage computing array. Based on the reconfiguration information, the input ends of the four computing modules in the first-stage computing array are connected to the output ends of four different input FIFOs in the input FIFO group, and the output end of one computing module in the sixth-stage computing array is connected to an output FIFO in the output FIFO group. In this technical solution, the reconfigurable array contains 6 columns of computing arrays with 4 computing units set in separate rows per column, 24 computing units in total, forming a 6*4-stage pipeline, within which a data path pipeline structure conforming to the reconfiguration information is reconfigured.
A configuration method based on the reconfigurable processor includes: according to the computing requirements of the algorithm matching the current application scenario, connecting the computing arrays of adjacent stages of the reconfigurable array into a data path pipeline structure that supports data passing with equal pipeline depth through the different computing modules within the same-stage computing array and meets the computing requirements of the algorithm matching the current application scenario. Each pipeline stage of the data path pipeline structure corresponds to one stage of the computing array; the computing modules connected to the data path in the current-stage computing array constitute the current pipeline stage of the data path pipeline structure. The pipeline depth is the time taken by data to flow through the corresponding data path of the data path pipeline structure.
Compared with the prior art, the configuration method, on the basis of adjacently interconnected computing modules used to execute computing instructions, reconfigures a data path pipeline structure in which the pipeline depth of data passing through each stage of the computing array is the same and the algorithm's computing requirements are met, so that the reconfigurable processor configures a suitable pipeline depth for different algorithms and, on this basis, fully pipelines the data processing operations of the reconfigurable array, improving the throughput of the reconfigurable processor and giving full play to its computing performance.
Further, the configuration method also includes: configuring the reconfigurable array to receive the data to be computed transmitted from the input FIFO group and transmit it to the data path pipeline structure, while configuring the reconfigurable array to output, to the output FIFO group, the calculation result of the computing array corresponding to the last stage of the data path pipeline structure. This technical solution configures a buffer for external data entering the reconfigurable processor and, at the same time, a buffer for the reconfigurable processor to output data externally, so as to match the algorithm's requirements for data exchange and storage between the reconfigurable processor and external system elements.
Further, the specific configuration method for connecting into the data path pipeline structure includes: within a computing module of the current-stage computing array, judging whether the current-stage computing array is detected as corresponding to the first pipeline stage of the data path pipeline structure; if so, connecting the first interconnection unit and the calculation control unit into the first pipeline stage of the data path pipeline structure and configuring the first interconnection unit to feed the data to be computed, output by the matching output end of the input FIFO group, to the calculation control unit; otherwise, connecting the first interconnection unit and the calculation control unit into the current pipeline stage of the data path pipeline structure and configuring the first interconnection unit to feed the calculation result output by the matching computing module in the adjacent previous-stage computing array to the calculation control unit. Judging whether the current-stage computing array is detected as corresponding to the last pipeline stage; if so, connecting the second interconnection unit and the compensation unit into the last pipeline stage of the data path pipeline structure and configuring the second interconnection unit to transmit the delay-compensated data to the matching output FIFO in the output FIFO group; otherwise, connecting the second interconnection unit and the compensation unit into the current pipeline stage of the data path pipeline structure and configuring the second interconnection unit to transmit the delay-compensated data to the matching computing module in the adjacent next-stage computing array. Judging whether the calculation control unit detects a calculation gating signal; if so, configuring the data input to the calculation control unit to be output to the compensation unit after computation; otherwise, configuring the data to pass directly to the compensation unit without computation. Then configuring the compensation unit to select the corresponding delay difference to delay the output data of the calculation control unit in the same computing module, so as to delay-compensate the pipeline depth of the same computing module to the maximum pipeline depth allowed by the current-stage computing array, where the maximum pipeline depth allowed by the current-stage computing array is the pipeline depth of the calculation control unit in the current-stage computing array that takes the most time for data to flow through the data path. A computing module includes a calculation control unit, a compensation unit, a first interconnection unit, and a second interconnection unit; within the same computing module of each stage of the computing array, the input end of the first interconnection unit is the input end of the computing module, the output end of the first interconnection unit is connected to the input end of the calculation control unit, the output end of the calculation control unit is connected to the input end of the compensation unit, the output end of the compensation unit is connected to the input end of the second interconnection unit, and the output end of the second interconnection unit is the output end of the computing module.
This technical solution can determine the computing modules of the current-stage computing array that actually perform computing functions and the maximum pipeline depth of the current-stage computing array, and then use the difference between this maximum pipeline depth and the pipeline depth of each calculation control unit of the same stage to perform pipeline compensation on the corresponding calculation control unit, so that data passes through the different computing modules of each stage with the same pipeline depth, solving the problems of low clock frequency and low computing efficiency of coarse-grained reconfigurable processors (a type of the reconfigurable processor).
Meanwhile, by configuring the connections of the first interconnection unit and the second interconnection unit inside and outside the computing module, the calculation control unit and the compensation unit are connected into one pipeline stage of the data path pipeline structure, thereby achieving multi-stage pipeline control.
Further, in the reconfigurable array, two non-adjacent computing arrays are not connected across stages through a data path, so that they are not directly connected into the data path pipeline structure; and no data path exists between different computing modules within the same-stage computing array, where the data path is the path of data transmission. Compared with the prior art, this both preserves the flexibility of the reconfigurable array and simplifies the complexity of the interconnection network.
Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of a reconfigurable processor disclosed in an embodiment of the present invention.
FIG. 2 is a schematic diagram, in one reconfigurable processor embodiment of the present invention, of a reconfigurable array with a 6-stage pipeline composed of six stages of computing arrays (four rows of computing modules in each stage), where the reconfigurable array of FIG. 2 has 4 different input ends respectively connected to 4 different input FIFOs, and one output end connected to one output FIFO.
FIG. 3 is a schematic diagram of the interconnection logic structure, in an embodiment of the present invention, between the computing module (a-1)b1 in row b1 of the (a-1)-th stage computing array, the computing module ab2 in row b2 of the a-th stage computing array, and the computing module (a+1)b3 in row b3 of the (a+1)-th stage computing array.
FIG. 4 is a flowchart of a configuration method based on a reconfigurable processor disclosed in another embodiment of the present invention.
Embodiments of the Present Invention
The specific embodiments of the present invention are further described below with reference to the drawings. The unit modules involved in the following embodiments are all logic circuits; a logic circuit may be a physical unit, a state machine composed of multiple logic devices combined according to certain read/write timing and signal logic changes, a part of a physical unit, or a combination of multiple physical units. In addition, to highlight the innovative part of the present invention, units not closely related to solving the technical problem proposed by the present invention are not introduced in the embodiments, but this does not mean that no other units exist in the embodiments.
As an embodiment, the present invention discloses a reconfigurable processor including a reconfiguration configuration unit and a reconfigurable array. The reconfiguration configuration unit is used to provide the reconfiguration information of the computing modules according to the algorithm matching the current application scenario, where the reconfiguration information is configured according to the needs of the algorithm and is used to reconstruct the computing structure inside the reconfigurable array. In effect, for the current data-processing application scenario, the reconfigurable processor accepts external reconfiguration information (including combinational parameters and timing parameters of logic circuits) for changing the interconnection logic of the computing modules, changes the physical architecture formed by the interconnected computing modules based on this reconfiguration information, and then outputs the calculation result, which is equivalent to software calling an algorithm (an algorithm library function) to compute the corresponding result in the current data-processing application scenario. For different application requirements, when the reconfigurable processor changes from one configuration to another, the reconfigurable array can connect a matching computing structure according to the different application requirements, so that it is designed not only for several algorithms in a specific field but also accepts reconfiguration information for algorithms transplanted from other fields, improving flexibility.
The reconfigurable array comprises at least two stages of computing arrays: within one reconfigurable array there are at least two computing arrays arranged in stages, i.e. at least two cascaded computing arrays, or equivalently at least two adjacent columns of computing arrays. Only one computing array is placed in each column of a reconfigurable array, and the computing array in each column constitutes one stage; the number of computing arrays in the reconfigurable array is preset, and these arrays exist in the reconfigurable array in a cascaded structure. For consistency with the pipeline description below, a column of computing arrays is always referred to as a stage, which facilitates the subsequent hardware interconnection of the reconfigurable array.
The reconfigurable array is used, according to the reconfiguration information provided by the reconfiguration unit, to connect adjacent stages of computing arrays into a datapath pipeline structure that satisfies the computational requirements of the algorithm. Each pipeline stage of the datapath pipeline structure corresponds to one stage of computing array, i.e. one stage of computing array is the unit used to describe one pipeline stage. It should be emphasized that, within the current-stage computing array, only the computing modules connected into the datapath are regarded as the current pipeline stage, because the number of stages of computing arrays is preset — the computing arrays are hardware resources that pre-exist in the reconfigurable array — whereas the datapath pipeline structure is formed on top of the existing computing arrays by configuring the interconnection logic between adjacent arrays according to the reconfiguration information provided by the reconfiguration unit.
Within one reconfigurable array, adjacent stages of computing arrays are connected pairwise into a datapath pipeline structure that satisfies the algorithm's computational requirements. When the reconfiguration information changes, the computational requirements of the executed algorithm change accordingly, and the adjacent columns of computing arrays are rewired based on the changed reconfiguration information, so that the algorithm matching the current application scenario is executed as a hardware circuit. At the same time, the pipeline depth of the datapath pipeline structure adjusts automatically — it may or may not change — so that the pipeline depth of the datapath pipeline structure adapts to changes in the reconfiguration information.
In the datapath pipeline structure, the pipeline depths of the different computing modules within the same stage are all equal to the maximum pipeline depth allowed by that stage. Since a computing module with a smaller pipeline depth must wait for a module with a larger pipeline depth to finish its computation, the maximum pipeline depth allowed by a stage is the maximum pipeline depth among the computing modules of that stage that perform computation in the pipeline, or a preset multiple of it; considering data-computation efficiency and clock frequency, it is generally set to that maximum pipeline depth itself. This ensures that the different computing modules within the same stage output data synchronously (in parallel), improving the throughput of the reconfigurable processor.
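As an illustrative sketch (not part of the patent text), the depth-equalization rule above can be expressed as: each computing module in a stage is padded with a delay equal to the stage's maximum depth minus its own depth. The function and module names below are hypothetical.

```python
# Illustrative sketch: equalize pipeline depths within one computing-array
# stage. The rule (pad every module up to the stage maximum) follows the
# description above; names are invented for illustration.

def compensation_delays(module_depths):
    """Return the delay (in pipeline stages) each module's compensation
    unit must add so every module matches the stage's maximum depth."""
    stage_max = max(module_depths.values())
    return {name: stage_max - depth for name, depth in module_depths.items()}

# Example stage: one multiplier (depth 4) and one pass-through module (depth 0).
delays = compensation_delays({"mul_2_1": 4, "pass_2_4": 0})
# The pass-through module is padded by 4 cycles; the multiplier needs none.
```

With these delays applied, both modules present their outputs on the same cycle, which is exactly the synchronous (parallel) output property described above.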
It should be noted, as will be understood by one of ordinary skill in the art, that pipeline depth throughout this invention is the time data takes to traverse the corresponding datapath of the datapath pipeline structure, including data-transmission time and computation time. Each pipeline stage corresponds to one stage of computing array: the n-th-stage computing array belongs to the n-th pipeline stage, and only those computing modules (hardware resources) of the n-th-stage array that are wired into the datapath are connected into the n-th pipeline stage, i.e. into the datapath pipeline structure.
It is also worth noting that, within one reconfigurable array, the pipeline depth of the datapath pipeline structure is the sum of the pipeline depths of all stages of computing arrays connected into the datapath pipeline structure (i.e. into the datapath).
Compared with the prior art, the reconfigurable processor disclosed in this embodiment builds, on the basis of adjacently interconnected computing modules that execute computation instructions, a datapath pipeline structure in which data traverses every stage with the same pipeline depth while satisfying the algorithm's computational requirements. The processor can thus configure an appropriate pipeline depth for each algorithm and, on that basis, fully pipeline the data-processing operations of the reconfigurable array, raising throughput, exploiting the processor's computing performance, and reducing the hardware resources required by prior-art pipeline designs. The pipeline depth of the datapath pipeline structure adapts to changes in the reconfiguration information in the following way: under different computational requirements, when the reconfiguration unit switches from providing one set of reconfiguration information to another, the datapath pipeline structure traversed by data in the reconfigurable array changes, and its pipeline depth is adjusted adaptively.
As shown in Fig. 1, the reconfigurable processor further comprises an input FIFO group and an output FIFO group. The outputs of the input FIFO group are connected to the corresponding inputs of the reconfigurable array: the reconfiguration information configures the inputs of those computing modules of the first-stage computing array that are wired into the datapath pipeline structure to connect to the matching outputs of the input FIFO group, so that the reconfigurable array receives the to-be-computed data transmitted from the input FIFO group and passes it to the datapath pipeline structure. The inputs of the output FIFO group are connected to the corresponding outputs of the reconfigurable array: the reconfiguration information configures the outputs of those computing modules of the last-stage computing array that are wired into the datapath pipeline structure to connect to the matching inputs of the output FIFO group, so that the reconfigurable array provides the output FIFO group with the output data of the computing-array stage corresponding to the last pipeline stage. The processor first stores incoming to-be-processed data in the corresponding input FIFOs. This embodiment uses the input FIFO group as the buffer through which external data enters the processor and the output FIFO group as the buffer through which the processor outputs data, matching the algorithm's data-exchange and storage requirements between the processor and external system components. The first-stage computing array is the first stage (first column) of the cascaded computing arrays in the reconfigurable array; by analogy, the current-stage computing array is the current stage (current column), and the last-stage computing array is the last stage (last column).
Specifically, in the reconfigurable array, connecting adjacent stages of computing arrays into a datapath pipeline structure that satisfies the algorithm's computational requirements includes: two computing arrays of non-adjacent stages (non-adjacent columns) are not cross-connected by a datapath, so that non-adjacent stages are never directly joined into the datapath pipeline structure; and no datapath exists between different computing modules within the same stage. Note that if only two non-adjacent stages of computing arrays exist, they cannot be connected into a datapath pipeline structure that satisfies the algorithm's requirements; within one reconfigurable array, two non-adjacent stages can never be connected into the datapath pipeline structure by establishing a direct cross-stage datapath, so the datapath pipeline structure does not allow non-adjacent stages to be directly connected into a datapath.
The inputs of the computing modules in the first-stage computing array serve as the inputs of the reconfigurable array and are configured, based on the reconfiguration information, to connect to the matching outputs of the input FIFO group, the first-stage computing array being the first stage of the cascaded arrays. The input of a computing module in the current-stage computing array is configured, based on the reconfiguration information, to connect to the output of the matching row's computing module in the adjacent preceding-stage array, where the current stage is not the first stage of the reconfigurable array. The output of a computing module in the current-stage array is configured, based on the reconfiguration information, to connect to the input of the matching row's computing module in the adjacent succeeding-stage array, where the current stage is not the last stage. The outputs of the computing modules in the last-stage array serve as the outputs of the reconfigurable array and are configured to connect to the matching inputs of the output FIFO group. The adjacent preceding stage is one stage lower than the current stage, and the adjacent succeeding stage one stage higher; a datapath is a path over which data is transmitted. This embodiment serially connects the computing modules of mutually adjacent stages into the datapath pipeline structure according to the reconfiguration information, reducing the complexity of the interconnection network while realizing multi-stage pipeline control simply and efficiently. On this basis, the reconfigurable array can be wired into multiple parallel datapath pipeline structures to satisfy the application requirement of several algorithms executing concurrently. Therefore, under external configuration, no datapath is established between computing modules within the same stage or between computing units of non-adjacent stages, and the input of every computing module of each stage (other than the first stage) can be configured to connect to the output of any computing module of the adjacent preceding stage.
In the reconfigurable processor of Fig. 1, the connection between the reconfigurable array and the reconfiguration unit includes, without limitation, direct coupling, indirect coupling or a communication connection, which may be electrical, mechanical or of another form, and is used to transmit the reconfiguration information of the computing modules. The reconfigurable array of Fig. 1 contains m stages of computing arrays, m being an integer greater than or equal to 2. The first-stage computing array contains computing modules 1_1, 1_2, ..., 1_n1, with no datapath between them, where n1 is an integer greater than or equal to 1. In the label 1_n1, '1' denotes the stage number of the first-stage computing array and 'n1' the row in which the module is arranged, so 1_n1 is the computing module in row n1 of the first-stage array, i.e. column 1, row n1 of the reconfigurable array of Fig. 1. This embodiment configures data so that the pipeline depth through every computing module of the first-stage array is equal; as one skilled in the art will appreciate, if data is steered through a computing module of the first-stage array, that module is thereby connected into the first pipeline stage of the datapath pipeline structure. Each stage of computing array contains at least one computing module.
The second-stage computing array contains computing modules 2_1, ..., 2_n2, with no datapath between them but with datapaths to the computing modules of the first-stage array with which they are configured to interconnect, where n2 is an integer greater than or equal to 1 and need not equal n1. In 2_n2, '2' denotes the stage number and 'n2' the row, so 2_n2 is the module in row n2 of the second-stage array, i.e. column 2, row n2 of the array in Fig. 1. This embodiment configures data so that the pipeline depth through every module of the second-stage array is equal; if data is steered through a module of the second-stage array, that module is connected into the datapath, i.e. into the second pipeline stage of the datapath pipeline structure. As Fig. 1 shows, the second-stage array is placed adjacent to the first-stage array: they form a cascaded pair and connect into two adjacent pipeline stages of the datapath pipeline structure, the second-stage array being the adjacent succeeding stage of the first and the first-stage array the adjacent preceding stage of the second.
The m-th-stage computing array contains computing modules m_1, ..., m_nm, where nm is greater than or equal to 1 and need not equal n1 or n2. In m_nm, 'm' denotes the stage number and 'nm' the row, so m_nm is the module in row nm of the m-th-stage array, i.e. column m, row nm in Fig. 1. m is greater than or equal to 2 (greater than 2 in the embodiment of Fig. 1), and in hardware this is the last stage of the arrays staged in the reconfigurable array. Data is configured so that the pipeline depth through every module of the m-th-stage array is equal; if data is steered through a module of this array, that module is connected into the datapath, i.e. into the m-th pipeline stage, so that, based on the reconfiguration information, the relevant computing modules from the first through the m-th stage are successively connected into the datapath pipeline structure, the m-th pipeline stage being the last. The m-th-stage array is not adjacent to the first-stage array, and no cross-stage direct datapath is established between them; instead, the m-th stage establishes a datapath with its adjacent preceding stage, connecting them as two adjacent pipeline stages. As Fig. 1 shows, the computing modules wired into the datapath across all stages form the datapath pipeline structure, each pipeline stage corresponding one-to-one to a computing-array stage; in the reconfigurable array of Fig. 1, the pipeline depth of the datapath pipeline structure is therefore the sum of the pipeline depths of all stages, i.e. of the m stages of computing arrays.
As one embodiment, the reconfiguration information of the computing modules provided by the reconfiguration unit comprises first configuration information, second configuration information and third configuration information; a computing module comprises a computation control unit, a compensation unit, a first interconnection unit and a second interconnection unit.
With reference to Fig. 3, computing module a_b2 denotes the computing module in row b2 of the a-th-stage computing array; the labelling follows the preceding embodiments and is not repeated here, and the same applies to the labels of subsequent computing modules and of the logic units inside them. Additionally, within the same computing module of the current-stage array, the input of the first interconnection unit is the input of the computing module, the output of the first interconnection unit connects to the input of the computation control unit, the output of the computation control unit connects to the input of the compensation unit, the output of the compensation unit connects to the input of the second interconnection unit, and the output of the second interconnection unit is the output of the computing module.
With reference to Fig. 3: first interconnection unit a_b2 connects, according to the first configuration information, itself and computation control unit a_b2 into the current pipeline stage of the datapath pipeline structure (the a-th pipeline stage in Fig. 3). When first interconnection unit (a-1)_b1 lies in the first-stage computing array and is wired into the first pipeline stage, it feeds the to-be-computed data from the matching output of the input FIFO group to computation control unit (a-1)_b1. When first interconnection unit a_b2 is not in the first-stage array, it feeds the computation result output by the matching computing module (a-1)_b1 of the adjacent preceding stage to computation control unit a_b2; likewise, first interconnection unit (a+1)_b3, not being in the first-stage array, feeds the result output by matching module a_b2 to computation control unit (a+1)_b3. It should be noted that, in this embodiment, data passing through these first interconnection units contributes no pipeline depth.
Computation control unit a_b2 selects, according to the second configuration information, either connection as the data pass-through path — so that data input to it passes straight through to compensation unit a_b2 without being triggered to compute — or connection as the data computation path, so that the input data is transmitted to compensation unit a_b2 after computation. Computation control units (a+1)_b3 and (a-1)_b1 of Fig. 3 likewise select between the pass-through and computation paths under the second configuration information; the datapath comprises both the data pass-through path and the data computation path. It should be noted that the computation types executed by the computation control unit include addition/subtraction, multiplication, division, square root and trigonometric computation; these operations execute when the unit is wired as a data computation path, and each takes time that yields a pipeline depth for its computation type: an adder/subtractor wired as a computation path has its pipeline depth configured as 1; a multiplier, 4; a divider, 6; a square-root unit, 4; a trigonometric-function unit, 12. When the computation control unit is wired as a pass-through path — data passing straight through to the adjacent succeeding stage — its pipeline depth is configured as 0. The computation control units within a stage may be all of the same type or of differing types, as may those of adjacent stages. The types and numbers of computation control units per stage can be adjusted to the specific application domain and performance requirements; on this basis, by changing the arithmetic operations of the computing modules, the interconnected computation control units of the reconfigurable array suit multiple algorithms and can be flexibly configured per algorithm, replacing the traditional one-algorithm-per-fixed-array model of computing arrays and greatly reducing computing cost while improving efficiency.
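The per-type pipeline depths listed above can be summarized in a small lookup, sketched here for illustration (the key names are invented; the depth values come from the text above):

```python
# Sketch of the pipeline depths per computation type described above.
PIPELINE_DEPTH = {
    "add_sub": 1,   # adder/subtractor on the data computation path
    "mul": 4,       # multiplier
    "div": 6,       # divider
    "sqrt": 4,      # square-root unit
    "trig": 12,     # trigonometric-function unit
}

def unit_depth(op, bypass=False):
    """Depth of a computation control unit: 0 when wired as a data
    pass-through path, otherwise the depth of its computation type."""
    return 0 if bypass else PIPELINE_DEPTH[op]
```

The `bypass` flag plays the role of the second configuration information: it decides whether the unit contributes its computation depth or passes data straight through with depth 0.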
Preferably, the second configuration information is also a gating signal used to steer the data transmitted by the first interconnection unit between the data pass-through path and the data computation path, so as to satisfy the algorithm's computational requirements at each pipeline stage of the datapath pipeline structure. The computation control unit is implemented with a data selector and an arithmetic logic circuit: the select input of the data selector receives the second configuration information; the operations the arithmetic logic circuit can execute correspond to the addition/subtraction, multiplication, division, square-root and trigonometric computations of the preceding embodiment; the input of the arithmetic logic circuit connects to a data output of the selector, and the selector switches the data from the first interconnection unit between the pass-through and computation paths according to the second configuration information, so as to satisfy the algorithm's computational requirements at each pipeline stage. This determines whether the computation control unit currently performs its computation function.
The compensation unit selects, according to the third configuration information, a delay difference that delay-compensates the pipeline depth of its computing module to the maximum pipeline depth allowed by the current-stage computing array: in this embodiment, the pipeline depth of the computation control unit plus the depth corresponding to the compensation delay equals that maximum, the maximum being the pipeline depth of the computation control unit whose data takes the longest time to traverse the current pipeline stage. Thus, compensation unit (a+1)_b3 of Fig. 3 delays the output data (the computation result) of computation control unit (a+1)_b3, compensating the depth of module (a+1)_b3 to the maximum allowed by the (a+1)-th-stage array. If the time computation control unit (a+1)_b3 spends computing and transmitting on its data computation path is the largest among all computation control units wired into paths in the (a+1)-th-stage array, the delay difference produced by compensation unit (a+1)_b3 is 0 — no delay compensation is applied to unit (a+1)_b3 — and the stage's maximum allowed depth equals the depth of computation control unit (a+1)_b3. Likewise, once the maximum allowed depths of the a-th and (a-1)-th stages are determined, compensation units a_b2 and (a-1)_b1 apply analogous delay processing to computation control units a_b2 and (a-1)_b1.
With reference to Fig. 3: second interconnection unit (a+1)_b3 connects, according to the first configuration information, itself and compensation unit (a+1)_b3 into the (a+1)-th pipeline stage of the datapath pipeline structure. When second interconnection unit (a+1)_b3 lies in the last-stage (last-column) computing array and is wired into the last pipeline stage, it transmits the data delay-compensated by compensation unit (a+1)_b3 to the matching output FIFO in the output FIFO group. When second interconnection unit a_b2 is not in the last-stage array, it transmits the data delay-compensated by compensation unit a_b2 to the matching computing module (a+1)_b3 of the adjacent succeeding stage; likewise, second interconnection unit (a-1)_b1, not being in the last-stage array, transmits the data compensated by compensation unit (a-1)_b1 to matching module a_b2. It should be noted that, in this embodiment, data passing through these second interconnection units contributes no pipeline depth.
In the preceding embodiments, a computing module connects to the adjacent preceding stage through its first interconnection unit and to the adjacent succeeding stage through its second interconnection unit, with the computation control unit and compensation unit connected between them, forming a pipeline based on the reconfiguration information; the computing modules are thus arranged column-by-column in a reconfigurable interconnection pattern with a simple hardware structure. Meanwhile, the maximum pipeline depth of the current stage is determined on the basis of the computing modules that actually perform computation, and the difference between this maximum and each computation control unit's pipeline depth is used to compensate that unit, so that data traverses the different computing modules of every stage with the same pipeline depth, solving the low-clock-frequency, low-efficiency problem of coarse-grained reconfigurable processors (one type of the reconfigurable processor described here).
In the preceding embodiments, the third configuration information is a gating signal: after the reconfiguration unit has determined the computation control unit with the largest pipeline depth in the current pipeline stage of the datapath pipeline structure (i.e. the unit, among those of the current-stage array wired into the structure, with the largest depth), it selects, within every computing module of the current pipeline stage (every module of the current-stage array wired into the structure), the matching register path provided inside the compensation unit for compensating the delay difference, and then steers the output data of the stage's computation control units through that register path until it leaves the corresponding computing module. The pipeline depths of all the stage's computing modules are thereby delay-compensated to the maximum allowed by the current-stage array, i.e. the depths of all modules of the stage wired into the datapath pipeline structure are compensated to that maximum. In this implementation, data first passes through a computation control unit of a first pipeline depth and is then steered through the matching register path, whose traversal contributes a second pipeline depth; the total depth incurred — the sum of the two — equals the maximum allowed by the current-stage array, so that all modules of the stage wired into the structure output data synchronously. The compensation unit is implemented with a selector and registers, selectively applying pipeline compensation to those computation control units that fall short of the stage's pipeline depth, supporting full pipelining of the reconfigurable array's data processing over the multi-stage pipeline structure.
Preferably, inside the compensation unit the register path for compensating the delay difference consists of a preset number of registers which, triggered by the third configuration information, latch the data output by the computation control unit of the same computing module. Changing the preset number changes the delay the register path produces, and hence the pipeline depth data incurs when traversing it; through the selector's gating, computation control units of differing depths are thus given matching delay differences. One skilled in the art can obtain this from the compensation principle above by combining selectors and registers, including without limitation: the select input of the selector receives the third configuration information, and the selector's several data outputs connect to register paths composed of differing numbers of registers, so that one compensation unit offers multiple selectable register paths; under the gating of the third configuration information, the compensation unit connects the computation control unit of its computing module to the matching register path and steers that unit's output through it until it leaves the module, so that the depths of all modules of the stage wired into the datapath pipeline structure are compensated to the stage's maximum. In this embodiment, the delay difference produced by latching equals the maximum pipeline depth allowed within the current-stage array minus the pipeline depth of the computation control unit connected to the compensation unit in the same module. Based on the third configuration information, the selector inside the compensation unit connects the computation control unit into the register path producing the appropriate delay difference, so that any data traverses the different computing modules of a stage wired into the structure with equal pipeline depth.
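A behavioural sketch of the register path may make the mechanism concrete (hypothetical modelling code, not part of the patent): selecting a chain of d registers delays the computation control unit's output by d clock cycles, which is exactly the delay difference described above.

```python
# Hypothetical behavioural model of the compensation unit: a selectable
# chain of registers. A chain of length `delay` emits each input value
# `delay` cycles later; delay 0 models the uncompensated straight-through case.

class CompensationUnit:
    def __init__(self, delay):
        self.regs = [None] * delay  # register path of `delay` stages

    def step(self, value):
        """Clock one value in; return the value emitted this cycle."""
        if not self.regs:            # delay 0: straight through
            return value
        out = self.regs[-1]
        self.regs = [value] + self.regs[:-1]
        return out

comp = CompensationUnit(delay=2)
outputs = [comp.step(v) for v in [10, 11, 12, 13]]
# The first two cycles emit the registers' initial contents; afterwards
# data appears two cycles late: [None, None, 10, 11]
```

Swapping in a different `delay` corresponds to the third configuration information selecting a different register path inside the same compensation unit.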
Preferably, the first configuration information comprises: the access-address and timing information needed to connect the first interconnection units of the first-stage computing array and the matching input FIFOs of the input FIFO group into the datapath pipeline structure; the access-address and timing information needed to connect the first interconnection unit of the current-stage array and the matching second interconnection unit of the adjacent preceding-stage array into the structure; the access-address and timing information needed to connect the second interconnection unit of the current-stage array and the matching first interconnection unit of the adjacent succeeding-stage array into the structure; and the access-address and timing information needed to connect the second interconnection units of the last-stage array and the matching output FIFOs of the output FIFO group into the structure. The first and second interconnection units both support forming, within the reconfigurable array or the datapath pipeline structure, the topology interconnecting the computing modules required for the algorithm's full functionality. Based on the first configuration information, this embodiment delivers data to the appropriate inputs of the first-stage array of the multi-stage pipeline and, after processing by the computing arrays along the pipeline, to the appropriate output FIFO, so that under different computational requirements, when the reconfiguration unit switches from one set of reconfiguration information to another, a pipeline structure with complete interconnection logic is maintained between adjacent stages of computing arrays.
As one embodiment, as shown in Fig. 2, the reconfigurable array has six stages of computing arrays, i.e. six columns; each stage has four rows of computing modules, one module per row, represented in Fig. 2 by adders/subtractors, multipliers, dividers, square-root units and trigonometric-function units. The labels these modules carry in Fig. 2 follow the 'm_nm' convention of the Fig. 1 embodiment: the part before '_' is the column number of the computing array — equivalently the stage number, or the pipeline-stage number of the datapath pipeline structure, since each array stage is one pipeline stage — and the part after '_' is the row in which the module sits within its array. In this embodiment, the six stages are connected, under the reconfiguration information provided by the reconfiguration unit, into a six-stage pipeline forming the datapath pipeline structure and supporting computation at a specific granularity. The inputs of the four computing modules of the first-stage array connect, per the reconfiguration information, to the outputs of four distinct input FIFOs in the input FIFO group, and the output of one computing module of the sixth-stage array connects to the input of one output FIFO in the output FIFO group.
Note that in Fig. 2 all computing modules of the reconfigurable array use a pipelined design, and the different types of computation control unit inside them have different pipeline depths: the unit inside an adder/subtractor has depth 1; inside a multiplier, 4; inside a divider, 6; inside a square-root unit, 4; inside a trigonometric-function unit, 12; and when a computation control unit is wired as a data pass-through path, data passes straight through with depth 0. The computation control unit inside an adder/subtractor, when wired as a data computation path, performs addition or subtraction; since both take the same pipeline depth, they are referred to jointly as addition/subtraction.
In Fig. 2, hatched computing modules represent those whose internal computation control unit is wired as a data computation path to perform its function, while the unhatched modules on the arrowed connections represent those whose internal computation control unit is wired as a data pass-through path so that data passes through unprocessed.
As shown in Fig. 2, the first-stage computing array comprises adders/subtractors 1_1, 1_2, 1_3 and 1_4. Under the second configuration information contained in the reconfiguration information, the computation control units inside 1_1 and 1_4 are wired as data computation paths and perform addition/subtraction, while those inside 1_2 and 1_3 are wired as data pass-through paths. Adders/subtractors 1_1, 1_4, 1_2 and 1_3 are all connected as the first pipeline stage of the datapath pipeline structure, and under the third configuration information the first pipeline stage has depth 1. Based on the first configuration information, adder/subtractor 1_1 receives data from the first and second input FIFOs of the input FIFO group, 1_2 from the second input FIFO, 1_3 from the third input FIFO, and 1_4 from the third and fourth input FIFOs.
The second-stage computing array comprises multipliers 2_1, 2_2, 2_3 and 2_4. In this stage, multiplier 2_1 receives, based on the first configuration information, the outputs of adders/subtractors 1_2 and 1_1, and multiplier 2_3 the outputs of 1_2 and 1_3; the computation control units inside 2_1 and 2_3 are then wired as data computation paths under the second configuration information and perform multiplication. Multiplier 2_4 receives, based on the first configuration information, the output of adder/subtractor 1_4 and, under the second configuration information, its internal computation control unit is wired as a data pass-through path. In summary, multipliers 2_1, 2_3 and 2_4 are all connected as the second pipeline stage of the datapath pipeline structure, which under the third configuration information has depth 4.
The third-stage computing array comprises adders/subtractors 3_1, 3_2, 3_3 and 3_4. In this stage, 3_1 receives, based on the first configuration information, the output of multiplier 2_1, 3_3 that of 2_3 and 3_4 that of 2_4; the computation control units inside 3_1, 3_3 and 3_4 are all wired as data pass-through paths under the second configuration information. Adders/subtractors 3_1, 3_3 and 3_4 are connected as the third pipeline stage of the datapath pipeline structure, which under the third configuration information has depth 0.
The fourth-stage computing array comprises multipliers 4_1, 4_2, 4_3 and divider 4_4. In this stage, multiplier 4_2 receives, based on the first configuration information, the outputs of adders/subtractors 3_1 and 3_3, and divider 4_4 the outputs of 3_3 and 3_4. Under the second configuration information, the computation control unit inside 4_2 is wired as a data computation path performing multiplication and that inside 4_4 as a data computation path performing division; multiplier 4_2 and divider 4_4 are both connected as the fourth pipeline stage of the datapath pipeline structure. Under the third configuration information, since the divider's internal computation control unit has depth 6, greater than the multiplier's, the divider is the computation control unit with the largest depth in this pipeline stage, so the fourth pipeline stage has depth 6.
Likewise, the fifth-stage computing array comprises adders/subtractors 5_1, 5_2, 5_3 and 5_4. In this stage, adder/subtractor 5_2 receives, based on the first configuration information, the outputs of multiplier 4_2 and divider 4_4; its internal computation control unit is wired as a data computation path under the second configuration information and performs addition/subtraction. Adder/subtractor 5_2 is connected as the fifth pipeline stage of the datapath pipeline structure, which under the third configuration information has depth 1.
Likewise, the sixth-stage computing array comprises multiplier 6_1, square-root unit 6_2, divider 6_3 and trigonometric-function unit 6_4. In this stage, square-root unit 6_2 receives, based on the first configuration information, the output of adder/subtractor 5_2; its internal computation control unit is wired as a data computation path under the second configuration information and performs the square-root computation. Square-root unit 6_2 is connected as the sixth pipeline stage of the datapath pipeline structure, which under the third configuration information has depth 4. Based on the first configuration information, 6_2 outputs its data to the matching output FIFO in the output FIFO group.
In summary, under the third configuration information, the pipeline depth of the datapath pipeline structure connected by the reconfigurable processor to satisfy the algorithm's computational requirements is the sum of the depths of the six pipeline stages above: 1+4+0+6+1+4=16. The reconfigurable array connected into a multi-stage pipeline thus adaptively adjusts its total pipeline depth to 16 according to the currently configured datapath, and the multi-stage pipelined array as a whole can be regarded as a single complex computing module of 16-stage pipeline depth. For different application requirements, when the reconfigurable processor changes from one configuration to another, the reconfigurable array adaptively adjusts the total pipeline depth according to the pipeline depth of each stage.
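The arithmetic of the six-stage example above is trivial but worth stating explicitly; a one-line sketch:

```python
# The six stage depths configured above, and their sum — the overall
# pipeline depth of the datapath pipeline structure of Fig. 2.
stage_depths = [1, 4, 0, 6, 1, 4]
total_depth = sum(stage_depths)
# total_depth == 16, matching 1+4+0+6+1+4 = 16 in the text.
```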
Specifically, under the third configuration information, the computation control units inside adders/subtractors 1_2 and 1_3 of the first-stage array have depth 0, so their internal compensation units must apply one stage of pipeline compensation, i.e. use a first preset number of registers to delay the computation control unit's output by one clock, compensating one stage of pipeline depth for the first-stage array.
In the second-stage array, the computation control unit inside multiplier 2_4 is wired as a data pass-through path and so has depth 0; the third configuration information must direct its internal compensation unit to apply four stages of pipeline compensation to that unit's output, delay-compensating 2_4's depth to the maximum allowed by the second-stage array, i.e. the depth of multiplier 2_1 or 2_3, whose internal computation control units are wired as data computation paths.
In the fourth-stage array, the computation control unit inside multiplier 4_2 has depth 4, less than the depth 6 of divider 4_4, whose internal unit is wired as a data computation path and whose depth is configured as the maximum allowed by the fourth-stage array; the third configuration information therefore directs the internal compensation unit to apply two stages of pipeline compensation to the output of multiplier 4_2's computation control unit, delay-compensating 4_2's depth to the maximum allowed by the fourth-stage array.
Based on the foregoing reconfigurable processor, another embodiment of the present invention discloses a configuration method comprising: according to the computational requirements of the algorithm matching the current application scenario, connecting adjacent stages of the reconfigurable array's computing arrays into a datapath pipeline structure that lets data traverse the different computing modules within one stage with equal pipeline depth and satisfies those requirements. Within one stage, the pipeline depths of the different computing modules wired into the structure are all equal, so those modules output data synchronously; in the datapath pipeline structure, the depths of a stage's computing modules are all configured equal to the maximum pipeline depth allowed by that stage. Since a module of smaller pipeline depth must wait for a module of larger depth to finish its computation, this embodiment configures the stage's maximum allowed depth as the maximum depth among the stage's computing modules that perform computation in the pipeline, or a preset multiple of it. This ensures that the stage's different computing modules output data synchronously (in parallel), improving the throughput of the reconfigurable processor.
Each stage of computing array contains at least one computing module; within one reconfigurable array, the computing array in each column constitutes one stage and the number of computing arrays is preset; each pipeline stage of the datapath pipeline structure corresponds to one stage of computing array, the computing modules of the current stage wired into the datapath being wired into the current pipeline stage; and pipeline depth is the time data takes to traverse the corresponding datapath of the structure. Compared with the prior art, the configuration method builds, from adjacently interconnected computing modules executing computation instructions, a datapath pipeline structure in which data traverses every stage with the same pipeline depth while satisfying the algorithm's computational requirements, letting the reconfigurable processor configure an appropriate pipeline depth for each algorithm, fully pipelining the reconfigurable array's data processing on that basis and raising the processor's data throughput.
The configuration method further comprises: based on the reconfiguration information, configuring the reconfigurable array to receive the to-be-computed data transmitted from the input FIFO group and pass it to the datapath pipeline structure, and configuring the array to output to the output FIFO group the computation result of the computing-array stage corresponding to the last stage of the structure. The method thus provides the buffer through which external data enters the processor and the buffer through which the processor outputs data, matching the algorithm's data-exchange and storage requirements between the processor and external system components.
As one embodiment, as shown in Fig. 4, the configuration method specifically comprises: Step S41 — start configuring the reconfigurable array based on the reconfiguration information of the preceding embodiments, then proceed to step S42.
Step S42 — determine whether all computing modules of the current-stage computing array wired into the datapath have been traversed; if so, proceed to step S413, otherwise to step S43. Note that a datapath is part of the datapath pipeline structure and serves as the unit describing each of its pipeline stages.
Step S43 — begin traversing a new computing module of the current-stage array wired into the datapath, then proceed to step S44.
Step S44 — determine whether the current-stage array is detected as corresponding to the first pipeline stage of the datapath pipeline structure, i.e. whether it contains a computing module wired into the first pipeline stage's datapath; if so, proceed to step S45, otherwise to step S46.
Step S45 — connect the first interconnection unit and the computation control unit into the first pipeline stage of the datapath pipeline structure, and likewise the second interconnection unit and the compensation unit, thereby forming the first pipeline stage within the first-stage array; then proceed to step S49.
Step S49 — determine whether the computation control unit detects a computation gating signal (corresponding to the gating effected by the second configuration information of the preceding embodiments); if so, proceed to step S410, otherwise to step S411.
Step S410 — configure the data input to the computation control unit to be output to the compensation unit after computation is performed, then proceed to step S412.
Step S411 — configure the data input to the computation control unit to pass straight through to the compensation unit without computation.
Step S412 — configure the compensation unit to select the corresponding delay difference to delay-compensate the computation control unit's pipeline depth to the maximum allowed by the current-stage computing array (for the specific compensation method see the compensation-unit embodiments of the reconfigurable processor above), then return to step S42.
Step S46 — determine whether the current-stage computing array is detected as corresponding to the last pipeline stage; if so, proceed to step S47, otherwise to step S48.
Step S47 — connect the first interconnection unit and the computation control unit into the last pipeline stage of the datapath pipeline structure, and likewise the second interconnection unit and the compensation unit, then proceed to step S49.
Step S48 — connect the first interconnection unit and the computation control unit into the current pipeline stage of the datapath pipeline structure, and likewise the second interconnection unit and the compensation unit, then proceed to step S49.
Step S413 — determine whether all computing arrays in the reconfigurable array have been traversed; if so, proceed to step S415, otherwise to step S414.
Step S414 — begin traversing the adjacent next-stage computing array, then return to step S42.
Step S415 — having traversed all columns (all stages) of computing arrays in the reconfigurable array, end the reconfiguration operation on the reconfigurable array.
A computing module comprises a computation control unit, a compensation unit, a first interconnection unit and a second interconnection unit; within the same computing module of each stage, the input of the first interconnection unit is the input of the computing module, the output of the first interconnection unit connects to the input of the computation control unit, the output of the computation control unit connects to the input of the compensation unit, the output of the compensation unit connects to the input of the second interconnection unit, and the output of the second interconnection unit is the output of the computing module.
The preceding steps identify the computing modules of the current-stage computing array that actually perform computation and determine that stage's maximum pipeline depth; the difference between this maximum and each computation control unit's pipeline depth in the same stage is then used to apply pipeline compensation, so that data traverses the different computing modules of every stage with the same pipeline depth, solving the low-clock-frequency, low-efficiency problem of coarse-grained reconfigurable processors (one type of the reconfigurable processor described here). Meanwhile, by configuring how the first and second interconnection units connect inside and outside the computing module, the computation control unit and compensation unit are joined into one pipeline stage of the datapath pipeline structure, realizing multi-stage pipeline control.
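The traversal in steps S41–S415 above can be sketched as follows (hypothetical Python modelling code, not part of the patent; the module tuples and field names are invented for illustration):

```python
# Sketch of the configuration traversal: walk every stage, and for each
# module wired into the datapath decide where it reads from and writes to,
# whether it computes or passes data through (S49-S411), and how much
# compensation delay it needs (S412).

def configure_array(stages):
    """`stages` is a list of stages; each stage is a list of
    (depth_if_computing, compute_select) tuples for the modules wired
    into the datapath. Returns one config dict per module."""
    config = []
    for i, stage in enumerate(stages):
        first, last = (i == 0), (i == len(stages) - 1)
        # A module contributes its computation depth only when its
        # gating signal selects the computation path; otherwise depth 0.
        depths = [d if sel else 0 for d, sel in stage]
        stage_max = max(depths)  # maximum depth allowed by this stage
        for (_, sel), depth in zip(stage, depths):
            config.append({
                "stage": i,
                "source": "input FIFO" if first else "previous stage",
                "sink": "output FIFO" if last else "next stage",
                "computes": sel,
                "compensation": stage_max - depth,  # pad to stage_max
            })
    return config

# Two stages: stage 0 has a computing adder (depth 1) and a pass-through
# module; stage 1 has a computing multiplier (depth 4).
cfg = configure_array([[(1, True), (1, False)], [(4, True)]])
```

The first and last stages are wired to the input and output FIFO groups respectively, mirroring steps S45 and S47, while intermediate stages connect to their adjacent neighbours as in step S48.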
It should be noted that, in the reconfigurable array, two computing arrays of non-adjacent stages are not cross-connected by the datapath, so that non-adjacent arrays are never directly joined into the datapath pipeline structure; and no datapath exists between different computing modules within the same stage, the datapath being a path over which data is transmitted.
In the embodiments provided in this application, it should be understood that the disclosed system and chip may be implemented in other ways. The system embodiments described above are merely illustrative: the division into units is only a logical functional division, and other divisions are possible in practice — multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices or units, and may be electrical, mechanical or of other forms. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, i.e. they may be located in one place or distributed over multiple network elements; some or all of the units may be selected according to actual need to achieve the purpose of this embodiment's solution.

Claims (13)

  1. A reconfigurable processor, characterized in that the reconfigurable processor comprises a reconfiguration unit and a reconfigurable array;
    the reconfiguration unit is configured to provide, according to an algorithm matching a current application scenario, reconfiguration information for restructuring the computing structure within the reconfigurable array;
    the reconfigurable array comprises at least two stages of computing arrays and is configured to connect, according to the reconfiguration information provided by the reconfiguration unit, adjacent stages of computing arrays into a datapath pipeline structure satisfying the computational requirements of the algorithm matching the current application scenario; within the same stage of computing array, the pipeline depths of the different computing modules connected into the datapath pipeline structure are all equal, so that those computing modules output data synchronously; each stage of computing array contains at least one computing module;
    wherein only one computing array is arranged in each column of a reconfigurable array, the computing array in each column constituting one stage; the number of computing arrays in the reconfigurable array is preset, and these computing arrays exist in the reconfigurable array in a cascaded structure;
    wherein each pipeline stage of the datapath pipeline structure corresponds to one stage of computing array; in each stage, the computing modules connected into the datapath are thereby connected into the corresponding pipeline stage of the datapath pipeline structure;
    wherein pipeline depth is the time data takes to traverse the corresponding datapath of the datapath pipeline structure.
  2. The reconfigurable processor of claim 1, characterized in that the reconfigurable processor further comprises an input FIFO group and an output FIFO group;
    the outputs of the input FIFO group are connected to the corresponding inputs of the reconfigurable array, the reconfigurable array being configured to receive, according to the reconfiguration information, to-be-computed data transmitted from the input FIFO group and to pass it to the datapath pipeline structure;
    the inputs of the output FIFO group are connected to the corresponding outputs of the reconfigurable array, the reconfigurable array being further configured to provide the output FIFO group, according to the reconfiguration information, with the output data of the computing-array stage corresponding to the last pipeline stage of the datapath pipeline structure.
  3. The reconfigurable processor of claim 2, characterized in that, in the reconfigurable array, connecting adjacent stages of computing arrays into a datapath pipeline structure satisfying the algorithm's computational requirements comprises:
    two computing arrays of non-adjacent stages are not cross-connected by a datapath, so that non-adjacent stages are never directly joined into the datapath pipeline structure; and no datapath exists between different computing modules within the same stage;
    the inputs of the computing modules of the first-stage computing array serve as inputs of the reconfigurable array and are configured, based on the reconfiguration information, to connect to the matching outputs of the input FIFO group, the first-stage computing array being the first stage of the cascaded computing arrays in the reconfigurable array;
    the input of a computing module in the current-stage computing array is configured, based on the reconfiguration information, to connect to the output of the matching row's computing module in the adjacent preceding-stage computing array, the current stage not being the first stage of the reconfigurable array;
    the output of a computing module in the current-stage computing array is configured, based on the reconfiguration information, to connect to the input of the matching row's computing module in the adjacent succeeding-stage computing array, the current stage not being the last stage of the reconfigurable array;
    the outputs of the computing modules of the last-stage computing array serve as outputs of the reconfigurable array and are configured, based on the reconfiguration information, to connect to the matching inputs of the output FIFO group;
    wherein the adjacent preceding stage is one stage lower and the adjacent succeeding stage one stage higher than the current stage; the datapath is a path over which data is transmitted.
  4. The reconfigurable processor of claim 3, characterized in that the reconfiguration information of the computing modules provided by the reconfiguration unit comprises first configuration information, second configuration information and third configuration information;
    a computing module comprises a computation control unit, a compensation unit, a first interconnection unit and a second interconnection unit;
    the first interconnection unit is configured to connect, according to the first configuration information, itself and the computation control unit into the current pipeline stage of the datapath pipeline structure, wherein, when the current pipeline stage corresponds to the first-stage computing array, the first interconnection unit feeds the to-be-computed data output by the matching output of the input FIFO group to the computation control unit; when it does not, the first interconnection unit feeds the computation result output by the matching computing module of the adjacent preceding-stage computing array to the computation control unit;
    the computation control unit is configured to select, according to the second configuration information, either connection as a data pass-through path, so that data input to the computation control unit passes straight through to the compensation unit without computation, or connection as a data computation path, so that the input data is transmitted to the compensation unit after computation; the datapath comprises the data pass-through path and the data computation path;
    the compensation unit is configured to select, according to the third configuration information, a corresponding delay difference to delay-compensate the pipeline depth of its computing module to the maximum pipeline depth allowed by the current-stage computing array;
    the second interconnection unit is configured to connect, according to the first configuration information, itself and the compensation unit into the current pipeline stage of the datapath pipeline structure, wherein, when the current pipeline stage corresponds to the last-stage computing array, the second interconnection unit transmits the data delay-compensated by the compensation unit to the matching output FIFO of the output FIFO group; when it does not, the second interconnection unit transmits the delay-compensated data to the matching computing module of the adjacent succeeding-stage computing array;
    wherein, within the same computing module of the current-stage computing array, the input of the first interconnection unit is the input of the computing module, the output of the first interconnection unit connects to the input of the computation control unit, the output of the computation control unit connects to the input of the compensation unit, the output of the compensation unit connects to the input of the second interconnection unit, and the output of the second interconnection unit is the output of the computing module.
  5. The reconfigurable processor of claim 4, characterized in that the third configuration information is a gating signal used, after the reconfiguration unit has determined the computation control unit with the largest pipeline depth in the current pipeline stage of the datapath pipeline structure, to select, within every computing module of the current pipeline stage, the matching register path provided inside the compensation unit for producing the delay difference, and then to steer the output data of the current pipeline stage's computation control units through that register path until it leaves the corresponding computing module, whereupon it is established that the pipeline depths of the current pipeline stage's computing modules are all delay-compensated to the maximum pipeline depth allowed by the current-stage computing array;
    wherein the maximum pipeline depth allowed by the current-stage computing array is the pipeline depth of the computation control unit whose data takes the longest time to traverse the corresponding datapath of the datapath pipeline structure.
  6. The reconfigurable processor of claim 5, characterized in that, inside the compensation unit, the register path for compensating the delay difference consists of a preset number of registers which, triggered by the third configuration information, latch the data output by the computation control unit of the same computing module;
    wherein the delay difference produced by latching equals the maximum pipeline depth allowed within the current-stage computing array minus the pipeline depth of the computation control unit connected to the compensation unit in the same computing module.
  7. The reconfigurable processor of claim 4, characterized in that the first configuration information comprises:
    the access-address and timing information needed to connect the first interconnection units of the first-stage computing array and the matching input FIFOs provided in the input FIFO group into the datapath pipeline structure,
    the access-address and timing information needed to connect the first interconnection unit of the current-stage computing array and the matching second interconnection unit of the adjacent preceding-stage computing array into the datapath pipeline structure,
    the access-address and timing information needed to connect the second interconnection unit of the current-stage computing array and the matching first interconnection unit of the adjacent succeeding-stage computing array into the datapath pipeline structure, and
    the access-address and timing information needed to connect the second interconnection units of the last-stage computing array and the matching output FIFOs provided in the output FIFO group into the datapath pipeline structure;
    wherein the first interconnection unit and the second interconnection unit both support forming, within the reconfigurable array or the datapath pipeline structure, the topology interconnecting the computing modules, so as to satisfy the full functionality of the algorithm.
  8. The reconfigurable processor of claim 4, characterized in that the second configuration information is also a gating signal used to steer the data transmitted by the first interconnection unit between the data pass-through path and the data computation path, so as to satisfy the algorithm's computational requirements at each pipeline stage of the datapath pipeline structure.
  9. The reconfigurable processor of any one of claims 5 to 8, characterized in that the computation types executed by the computation control unit include addition/subtraction, multiplication, division, square root and trigonometric computation;
    wherein the computation control units within one stage of computing array are either not all of the same type or all of the same type;
    wherein the computation control units of adjacent stages of computing arrays are either not all of the same type or all of the same type.
  10. A configuration method based on the reconfigurable processor of any one of claims 1 to 9, characterized by comprising:
    connecting, according to the computational requirements of the algorithm matching the current application scenario, adjacent stages of the reconfigurable array's computing arrays into a datapath pipeline structure that lets data traverse the different computing modules within one stage with equal pipeline depth and satisfies the computational requirements of the algorithm matching the current application scenario;
    wherein each pipeline stage of the datapath pipeline structure corresponds to one stage of computing array; the computing modules of the current stage connected into the datapath are thereby connected into the current pipeline stage of the datapath pipeline structure;
    wherein pipeline depth is the time data takes to traverse the corresponding datapath of the datapath pipeline structure.
  11. The configuration method of claim 10, characterized in that the configuration method further comprises: configuring the reconfigurable array to receive to-be-computed data transmitted from the input FIFO group and to pass it to the datapath pipeline structure, and configuring the reconfigurable array to output to the output FIFO group the computation result of the computing-array stage corresponding to the last stage of the datapath pipeline structure.
  12. The configuration method of claim 11, characterized in that the specific configuration method for connecting the datapath pipeline structure comprises:
    within a computing module of the current-stage computing array, determining whether the current-stage computing array is detected as corresponding to the first pipeline stage of the datapath pipeline structure; if so, connecting the first interconnection unit and the computation control unit into the first pipeline stage of the datapath pipeline structure, and configuring the first interconnection unit to feed the to-be-computed data output by the matching output of the input FIFO group to the computation control unit; otherwise, connecting the first interconnection unit and the computation control unit into the current pipeline stage of the datapath pipeline structure, and configuring the first interconnection unit to feed the computation result output by the matching computing module of the adjacent preceding-stage computing array to the computation control unit;
    determining whether the current-stage computing array is detected as corresponding to the last pipeline stage; if so, connecting the second interconnection unit and the compensation unit into the last pipeline stage of the datapath pipeline structure, and configuring the second interconnection unit to transmit the data delay-compensated by the compensation unit to the matching output FIFO of the output FIFO group; otherwise, connecting the second interconnection unit and the compensation unit into the current pipeline stage of the datapath pipeline structure, and configuring the second interconnection unit to transmit the delay-compensated data to the matching computing module of the adjacent succeeding-stage computing array;
    determining whether the computation control unit detects a computation gating signal; if so, configuring the data input to the computation control unit to be output to the compensation unit after computation; otherwise, configuring the data input to the computation control unit to pass straight through to the compensation unit without computation;
    then configuring the compensation unit to select the corresponding delay difference to delay the output data of the computation control unit within the same computing module, so that the pipeline depth of that computing module is delay-compensated to the maximum pipeline depth allowed by the current-stage computing array; wherein the maximum pipeline depth allowed by the current-stage computing array is the pipeline depth of the computation control unit of the current-stage computing array whose data takes the longest time to traverse the datapath;
    wherein a computing module comprises a computation control unit, a compensation unit, a first interconnection unit and a second interconnection unit; within the same computing module of each stage, the input of the first interconnection unit is the input of the computing module, the output of the first interconnection unit connects to the input of the computation control unit, the output of the computation control unit connects to the input of the compensation unit, the output of the compensation unit connects to the input of the second interconnection unit, and the output of the second interconnection unit is the output of the computing module.
  13. The configuration method of claim 12, characterized in that, in the reconfigurable array, two computing arrays of non-adjacent stages are not cross-connected by the datapath, so that non-adjacent arrays are never directly joined into the datapath pipeline structure; and no datapath exists between different computing modules within the same stage, the datapath being a path over which data is transmitted.
PCT/CN2022/081526 2021-03-24 2022-03-17 一种可重构处理器及配置方法 WO2022199459A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP22774120.4A EP4283481A1 (en) 2021-03-24 2022-03-17 Reconfigurable processor and configuration method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110311617.5 2021-03-24
CN202110311617.5A CN113064852B (zh) 2021-03-24 2021-03-24 一种可重构处理器及配置方法

Publications (1)

Publication Number Publication Date
WO2022199459A1 true WO2022199459A1 (zh) 2022-09-29

Family

ID=76562112


Country Status (3)

Country Link
EP (1) EP4283481A1 (zh)
CN (1) CN113064852B (zh)
WO (1) WO2022199459A1 (zh)



Also Published As

Publication number Publication date
EP4283481A1 (en) 2023-11-29
CN113064852B (zh) 2022-06-10
CN113064852A (zh) 2021-07-02

