CN101441564B - Method for implementing reconfigurable accelerator customized for program - Google Patents

Method for implementing reconfigurable accelerator customized for program

Info

Publication number
CN101441564B
Authority
CN
China
Prior art keywords
program
reconfigurable accelerator
accelerator
reconfigurable
function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2008101629053A
Other languages
Chinese (zh)
Other versions
CN101441564A (en)
Inventor
陈天洲
严力科
陈度
王罡
王勇刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN2008101629053A
Publication of CN101441564A
Application granted
Publication of CN101441564B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Stored Programmes (AREA)

Abstract

The invention discloses a method for implementing a reconfigurable accelerator customized for a program. An FPGA is added to an existing general-purpose computer system, and the program is accelerated on the FPGA by a reconfigurable accelerator customized for it. The method analyzes the program, samples its runtime information at function granularity to identify the compute-intensive hot-spot functions, implements those hot-spot functions as reconfigurable accelerators on the FPGA, and replaces the calls to the hot-spot functions in the program with calls to the corresponding reconfigurable accelerators, thereby accelerating their execution. Implementing the hot-spot functions as reconfigurable accelerators improves the overall speedup of the program; implementing the accelerators on an FPGA achieves performance approaching that of an application-specific integrated circuit while retaining the flexibility of a general-purpose processor.

Description

Method for implementing a reconfigurable accelerator customized for a program
Technical field
The present invention relates to the fields of program optimization and FPGA design, and in particular to a method for implementing a reconfigurable accelerator customized for a program.
Background art
With the application of new materials and the development of new processes, very-large-scale integration technology has made great progress, and the number of transistors integrated on the die area of an existing processor is about to exceed 10 billion. However, because of problems with transistor utilization, leakage, heat dissipation and power consumption, raising the processor clock frequency to gain performance no longer pays off. Multi-core architectures have therefore become the mainstream processor technology: packaging multiple processing cores in a single chip achieves true physical parallelism, which improves transistor utilization, eases the heat dissipation and power problems, and brings a larger performance gain to the computer. Judging from the current trend, the number of cores integrated in a processor chip will continue to grow rapidly. However, because the parallelism of general applications is hard to raise, once the number of general-purpose cores exceeds 16, adding more general-purpose cores hardly brings any further performance gain. Simply adding homogeneous general-purpose cores can use up the rapidly growing transistor budget, but applications cannot make full use of the growing number of cores, and computing performance does not rise naturally with the core count.
Customized coprocessors and accelerators are another technical means of meeting users' ever-growing demand for performance. Modern computing systems often contain special-purpose coprocessors or accelerators, including domain-specific coprocessors for scientific computing and industry-specific processors for graphics and image processing, digital signal processing and the like, for example the auxiliary processing cores of Cell and Intel's Graphics Media Accelerator 950. The architectures of these dedicated coprocessors and accelerators are customized to the characteristics of specific applications and can therefore deliver high performance and high efficiency for those applications. However, such a customized coprocessor or accelerator delivers its performance only when running the applications it was designed for, so its utilization and flexibility are low, and specialized customization greatly increases design cost.
In this situation, adding a reconfigurable accelerator built from reconfigurable devices to a conventional computer system offers another way to raise computing performance. Through dynamic reconfiguration of the reconfigurable device, a reconfigurable accelerator can support many different types of applications and can therefore reach high performance over a wider range. This improves the utilization of the reconfigurable hardware resources while combining the flexibility of a general-purpose processor, which adapts to most applications, with the high performance and efficiency of a special-purpose processor. Besides addressing application diversity, it also helps with accelerator hardware resource utilization, design complexity, system reliability, cost and power consumption.
Summary of the invention
In order to make better use of reconfigurable resources, design customized accelerators and improve the execution performance of application programs, the object of the present invention is to provide a method for implementing a reconfigurable accelerator customized for a program.
The technical solution adopted by the present invention to solve the technical problem is:
A method for implementing a reconfigurable accelerator customized for a program:
1) Computation assisted by the reconfigurable accelerator:
The reconfigurable accelerator accepts calls from the program and handles the compute-intensive part of the program; while the reconfigurable accelerator is computing, the program is suspended and waits for the accelerator to return;
2) Procedure for implementing the reconfigurable accelerator customized for the program:
1. Program analysis: the program analysis process comprises two steps:
I. Determining the hot-spot function
Determining the hot-spot function is a dynamic profiling process that identifies the functions taking up the most execution time in the program. A profiling tool traces the program at run time and samples it at function granularity; the sampled data are then aggregated per function to obtain the call count and execution time of each function, and the functions are sorted by execution time in descending order. The function with the largest execution time is the hot-spot function of the program and is a candidate for implementation as a reconfigurable accelerator;
II. Analyzing data dependences
Data dependence analysis is a static analysis process; applying it to the hot-spot function determines the function's degree of parallelism. If there is no data dependence between loop iterations, the different iterations of the loop can be unrolled in parallel, making full use of the high physical parallelism of the FPGA. If the performance prediction indicates that the hot-spot function will gain performance, it is implemented as a reconfigurable accelerator to speed up the execution of the program;
2. Hardware-software partition:
Once the function to be implemented as a reconfigurable accelerator has been determined, the partition itself is in effect complete; the hardware-software partition step is mainly responsible for defining the interface and parameters between the program and the reconfigurable accelerator. Because each call from the program to the reconfigurable accelerator incurs extra overhead, a data buffer should be added to the reconfigurable accelerator so that multiple calls are merged into one; this concentrates communication, eliminates the extra overhead of repeated calls, lengthens the execution time of each call, and reduces the number of times the program calls the reconfigurable accelerator;
3. Implementation of the hot-spot function on the FPGA:
According to the interface and parameters between the program and the reconfigurable accelerator defined in step 2, the hardware interface of the reconfigurable accelerator is implemented and a buffer is added so that the software can reduce the number of accelerator calls; with the buffer, the input data of multiple accelerator calls are transferred to the accelerator's buffer in a single call, reducing the overall communication cost;
The reconfigurable logic is used to implement, in parallel, the same function as the hot-spot function, with the goals of raising the clock frequency and reducing the number of execution cycles; both directly improve the performance of the reconfigurable accelerator;
4. Modifying the program to call the accelerator; implementation steps:
Finally, the program must call the accelerator on the FPGA:
I. Code is added before the hot spot accelerated by the reconfigurable accelerator to prepare the accelerator's input data;
II. The reconfigurable accelerator is invoked through its software interface; the program is suspended and waits for the accelerator to return its result;
III. The result returned by the reconfigurable accelerator is received, post-processed and handed back to the program, which then resumes execution.
The beneficial effects of the present invention are:
The present invention is an FPGA-based method for implementing a reconfigurable accelerator customized for a program. Its main function is to use an FPGA within the computer architecture to implement the hot-spot functions of a program as reconfigurable accelerators, and to replace the calls to the hot-spot functions in the program with calls to the corresponding reconfigurable accelerators, thereby accelerating the execution of the hot-spot functions.
1) Implementing the hot-spot functions of the program as reconfigurable accelerators improves the overall speedup of the program;
2) Implementing the reconfigurable accelerators on an FPGA achieves performance approaching that of an application-specific integrated circuit while retaining the flexibility of a general-purpose processor.
Description of drawings
The accompanying drawing is an overall flow chart of the present invention.
Detailed description of the embodiments
The specific flow of the method for implementing a reconfigurable accelerator customized for a program is as follows.
1) Adding computation assisted by the reconfigurable accelerator:
An FPGA is added to a conventional general-purpose computer system as the reconfigurable component; the FPGA is connected to the computer system through the PCI-E bus.
The reconfigurable accelerator handles the compute-intensive part of the program and accepts calls from the program. After the program calls the reconfigurable accelerator, the accelerator starts processing the input data, and the program is suspended while the accelerator computes. When the accelerator finishes, it returns the result to the program, and the program resumes execution.
2) Procedure for implementing the reconfigurable accelerator customized for the program, as shown in the accompanying drawing:
1. Program analysis:
I. Determining the hot-spot function
Determining the hot-spot function is a dynamic profiling process that identifies the functions taking up the most execution time in the program;
a. A profiling tool traces the program at run time, recording the number of calls to each function and the start time and return time of each call;
b. The sampled data are aggregated per function to obtain the call count and execution time of each function; the functions are sorted by execution time in descending order, and the result is recorded as the queue L_Func;
c. The first function in the queue L_Func has the largest execution time; it is the hot-spot function of the program and is a candidate for implementation as a reconfigurable accelerator.
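Steps a to c can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CallRecord` type, the `build_l_func` helper and the sample trace are hypothetical names and data, not part of the patent, and the timing is inclusive (time spent in callees is counted in the caller).

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class CallRecord(NamedTuple):
    """One traced call (hypothetical format): function name, start and return time in seconds."""
    func: str
    start: float
    end: float


def build_l_func(records: Iterable[CallRecord]) -> list[tuple[str, int, float]]:
    """Aggregate the trace per function and sort by total execution time, largest first."""
    calls: dict[str, int] = defaultdict(int)
    total: dict[str, float] = defaultdict(float)
    for rec in records:
        calls[rec.func] += 1                      # step a: count calls
        total[rec.func] += rec.end - rec.start    # step b: accumulate execution time
    return sorted(
        ((name, calls[name], total[name]) for name in calls),
        key=lambda row: row[2],
        reverse=True,
    )


# Illustrative trace; the function names and timestamps are made up.
trace = [
    CallRecord("fft", 0.0, 4.0),
    CallRecord("parse", 4.0, 4.5),
    CallRecord("fft", 4.5, 8.5),
    CallRecord("log", 8.5, 8.75),
]
l_func = build_l_func(trace)
hot_spot = l_func[0][0]  # step c: the first entry of L_Func is the hot-spot candidate
```

In practice the trace would come from a profiling tool rather than a hand-written list; the aggregation and sorting stay the same.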
II. Analyzing data dependences
Data dependence analysis is a static analysis process; applying it to the hot-spot function determines the function's degree of parallelism. Loops usually account for the largest share of a function's execution time, so loops are the first part to accelerate;
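As a rough illustration of what "no data dependence between loop iterations" means, the following Python sketch applies a simplified distance test to loops whose array accesses all have the form `array[i + offset]`. The `Access` type and `has_loop_carried_dependence` function are assumptions made for this example; the patent does not prescribe a particular dependence test, and real analyses handle far more general access patterns.

```python
from typing import NamedTuple


class Access(NamedTuple):
    """An access to array[i + offset] in the body of a loop over i (hypothetical model)."""
    array: str
    offset: int
    is_write: bool


def has_loop_carried_dependence(accesses: list[Access], trip_count: int) -> bool:
    """Return True if a write in one iteration touches an element that a
    different iteration also reads or writes."""
    for a in accesses:
        if not a.is_write:
            continue  # a dependence needs at least one write
        for b in accesses:
            if b.array != a.array:
                continue
            distance = a.offset - b.offset
            # distance 0 means both accesses fall in the same iteration
            if distance != 0 and abs(distance) < trip_count:
                return True
    return False


# c[i] = a[i] + b[i]: iterations are independent and can be unrolled in parallel.
vector_add = [Access("c", 0, True), Access("a", 0, False), Access("b", 0, False)]

# a[i] = a[i - 1] + 1: each iteration reads the value written by the previous one.
prefix_chain = [Access("a", 0, True), Access("a", -1, False)]

independent = not has_loop_carried_dependence(vector_add, trip_count=1024)
dependent = has_loop_carried_dependence(prefix_chain, trip_count=1024)
```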
a. Data dependence analysis is performed on the loops in the first-ranked function of the queue L_Func. If there is no data dependence between iterations, the different iterations of the loop can be unrolled in parallel, and a performance prediction is then carried out for the reconfigurable accelerator corresponding to the function;
b. A performance prediction is carried out for the hot-spot function. If a performance gain is predicted, the function can be implemented as a reconfigurable accelerator. If no performance gain is predicted, the next function is selected from the queue L_Func and analyzed in turn, until the execution time of the next function accounts for less than 10% of the program's total execution time, which indicates that none of the functions in this program is suitable for implementation as a reconfigurable accelerator.
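The selection loop in step b might look like the following Python sketch. The names `select_candidate` and `predicts_speedup` are hypothetical; `predicts_speedup` is a callback standing in for the performance prediction, and the program's total execution time is approximated here as the sum of the per-function times, which is an assumption that holds only if those times are exclusive.

```python
from typing import Callable, Optional


def select_candidate(
    l_func: list[tuple[str, float]],
    predicts_speedup: Callable[[str], bool],
    min_share: float = 0.10,
) -> Optional[str]:
    """Walk L_Func (sorted by execution time, descending) and return the first
    function predicted to gain from acceleration, or None if the remaining
    functions fall below the 10% share threshold."""
    total = sum(exec_time for _, exec_time in l_func)
    for name, exec_time in l_func:
        if total == 0 or exec_time / total < min_share:
            return None  # the rest of the queue is too small to be worth accelerating
        if predicts_speedup(name):
            return name
    return None


# Illustrative queue; names and times are made up.
queue = [("fft", 8.0), ("parse", 1.5), ("log", 0.5)]
first_choice = select_candidate(queue, lambda name: name == "fft")
second_choice = select_candidate(queue, lambda name: name == "parse")
no_choice = select_candidate(queue, lambda name: name == "log")
```

With this queue, `log` accounts for only 5% of the total, so the search stops before reaching it even though the callback would accept it.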
The performance prediction is carried out as follows:
a. Compute the execution time of the function on the processor, denoted $\mathrm{Time}_{\mathrm{CPU}}$:

$$\mathrm{Time}_{\mathrm{CPU}} = \frac{\mathrm{ClockCycles}_{\mathrm{CPU}}}{\mathrm{Frequency}_{\mathrm{CPU}}} = \frac{\mathrm{InstructionNum} \times \mathrm{CPI}}{\mathrm{Frequency}_{\mathrm{CPU}}}$$

where
$\mathrm{ClockCycles}_{\mathrm{CPU}}$ is the number of clock cycles the CPU needs to complete one execution;
$\mathrm{InstructionNum}$ is the number of instructions the CPU executes to complete one execution;
$\mathrm{CPI}$ is the number of cycles per instruction (Cycles Per Instruction);
$\mathrm{Frequency}_{\mathrm{CPU}}$ is the processor clock frequency.
When CPI is 1, the execution time is approximately:

$$\mathrm{Time}_{\mathrm{CPU}} \approx \frac{\mathrm{InstructionNum}}{\mathrm{Frequency}_{\mathrm{CPU}}}$$

b. Compute the execution time of the reconfigurable accelerator corresponding to the function, denoted $\mathrm{Time}_{\mathrm{AFU}}$:

$$\mathrm{Time}_{\mathrm{AFU}} = \frac{\mathrm{ClockCycles}_{\mathrm{AFU}}}{\mathrm{Frequency}_{\mathrm{AFU}}}$$

where
$\mathrm{ClockCycles}_{\mathrm{AFU}}$ is the number of clock cycles the FPGA accelerator needs to complete one execution;
$\mathrm{Frequency}_{\mathrm{AFU}}$ is the clock frequency of the FPGA accelerator.
c. Compare the processor execution time of the function with the execution time of the corresponding reconfigurable accelerator. The accelerator is faster when $\mathrm{Time}_{\mathrm{AFU}} < \mathrm{Time}_{\mathrm{CPU}}$, that is:

$$\frac{\mathrm{ClockCycles}_{\mathrm{AFU}}}{\mathrm{Frequency}_{\mathrm{AFU}}} < \frac{\mathrm{InstructionNum}}{\mathrm{Frequency}_{\mathrm{CPU}}}$$

According to empirical statistics, the processor clock frequency is approximately 20 times the FPGA accelerator frequency, so the condition $\mathrm{Time}_{\mathrm{AFU}} < \mathrm{Time}_{\mathrm{CPU}}$ becomes:

$$\mathrm{ClockCycles}_{\mathrm{AFU}} < \frac{\mathrm{InstructionNum}}{20}$$

This shows that for the speedup of the FPGA accelerator to exceed 1, the number of execution cycles of the reconfigurable accelerator must be less than 1/20 of the number of instructions the program executes; in other words, each cycle of the reconfigurable accelerator must, on average, complete the work of more than 20 instructions.
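The comparison above translates directly into code. The sketch below is a minimal Python rendering of the two execution-time formulas; the function names and the example figures (a 2 GHz processor, a 100 MHz accelerator, one million instructions) are illustrative assumptions chosen to match the 20:1 frequency ratio, not values from the patent.

```python
def time_cpu(instruction_num: int, cpi: float, frequency_cpu_hz: float) -> float:
    """Processor execution time: InstructionNum * CPI / Frequency_CPU."""
    return instruction_num * cpi / frequency_cpu_hz


def time_afu(clock_cycles_afu: int, frequency_afu_hz: float) -> float:
    """Accelerator execution time: ClockCycles_AFU / Frequency_AFU."""
    return clock_cycles_afu / frequency_afu_hz


def predicts_speedup(
    instruction_num: int,
    clock_cycles_afu: int,
    frequency_cpu_hz: float,
    frequency_afu_hz: float,
    cpi: float = 1.0,
) -> bool:
    """True when Time_AFU < Time_CPU, i.e. the predicted speedup exceeds 1."""
    return time_afu(clock_cycles_afu, frequency_afu_hz) < time_cpu(
        instruction_num, cpi, frequency_cpu_hz
    )


# Assumed example: 2 GHz CPU, 100 MHz FPGA accelerator (ratio 20), CPI = 1.
CPU_HZ = 2_000_000_000
AFU_HZ = 100_000_000
INSTRUCTIONS = 1_000_000  # break-even is INSTRUCTIONS / 20 = 50,000 accelerator cycles

worth_it = predicts_speedup(INSTRUCTIONS, 40_000, CPU_HZ, AFU_HZ)
not_worth_it = predicts_speedup(INSTRUCTIONS, 60_000, CPU_HZ, AFU_HZ)
```

Note that this model, like the formulas it follows, ignores the communication overhead of calling the accelerator, which the hardware-software partition step addresses separately.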
2. Hardware-software partition:
The hardware-software partition step is mainly responsible for defining the interface and parameters between the program and the reconfigurable accelerator, and comprises the following steps:
I. Define the software calling interface of the reconfigurable accelerator, that is, the software interface offered to the program; it should minimize the time needed to prepare the accelerator's input data;
II. Define the hardware interface of the reconfigurable accelerator and add a data buffer to the accelerator so that multiple calls are merged into one; this concentrates communication, eliminates the extra overhead of repeated calls, lengthens the execution time of each call, and reduces the number of times the program calls the reconfigurable accelerator;
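The benefit of merging calls through a data buffer can be seen with a simple cost model. The Python sketch below is an assumption made for illustration: it charges a fixed overhead per call plus a fixed compute cost per data item, and the function name, parameters and numbers are all hypothetical.

```python
def total_time_us(
    n_items: int,
    per_call_overhead_us: int,
    per_item_compute_us: int,
    buffer_items: int = 1,
) -> int:
    """Total time in microseconds when inputs are sent in batches of buffer_items.

    Each call to the accelerator pays a fixed overhead; the compute cost per
    item is unchanged by batching.
    """
    n_calls = (n_items + buffer_items - 1) // buffer_items  # ceiling division
    return n_calls * per_call_overhead_us + n_items * per_item_compute_us


# Assumed figures: 1000 inputs, 10 us overhead per call, 1 us compute per input.
unbuffered = total_time_us(1000, 10, 1)                   # one call per input
buffered = total_time_us(1000, 10, 1, buffer_items=100)   # 100 inputs per call
```

Under these assumed figures the unbuffered version spends 10,000 of its 11,000 microseconds on call overhead, while the buffered version makes only 10 calls and takes 1,100 microseconds.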
3. Implementation of the hot-spot function on the FPGA:
I. Implement the hardware interface of the reconfigurable accelerator according to the interface and parameters between the program and the reconfigurable accelerator defined in step 2, and add a buffer so that the software can reduce the number of accelerator calls; with the buffer, the input data of multiple accelerator calls can be transferred to the accelerator's buffer in a single call, reducing the overall communication cost;
II. Use the reconfigurable logic to implement, in parallel, the same function as the hot-spot function, with the goals of raising the clock frequency and reducing the number of execution cycles; both directly improve the performance of the reconfigurable accelerator;
4. Modifying the program to call the accelerator; implementation steps:
Finally, the program must call the accelerator on the FPGA:
I. Code is added before the hot spot accelerated by the reconfigurable accelerator to prepare the accelerator's input data;
II. The reconfigurable accelerator is invoked through its software interface; the program is suspended and waits for the accelerator to return its result;
III. The result returned by the reconfigurable accelerator is received, post-processed and handed back to the program, which then resumes execution.
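The three steps of the modified call site can be sketched as follows. Everything here is hypothetical: `SimulatedAccelerator` is a pure-software stand-in for the FPGA accelerator and its driver, `hot_spot_software` is a made-up hot-spot function, and a real system would transfer the buffer over PCI-E instead of calling a Python function.

```python
from typing import Callable, Sequence


class SimulatedAccelerator:
    """Software stand-in for the reconfigurable accelerator (assumption for this sketch)."""

    def __init__(self, kernel: Callable[[int], int]) -> None:
        self._kernel = kernel
        self.calls = 0

    def run(self, buffer: Sequence[int]) -> list[int]:
        """Blocking call: returns only after every buffered input has been processed."""
        self.calls += 1
        return [self._kernel(x) for x in buffer]


def hot_spot_software(x: int) -> int:
    """The original hot-spot function that the accelerator replaces (made up)."""
    return x * x + 1


def process(values: Sequence[int], afu: SimulatedAccelerator) -> list[int]:
    # I. Prepare the accelerator's input data in one contiguous buffer.
    buffer = [int(v) for v in values]
    # II. Invoke the accelerator; the caller waits here until it returns.
    raw = afu.run(buffer)
    # III. Collect the results and hand them back to the program.
    return list(raw)


afu = SimulatedAccelerator(hot_spot_software)
results = process([1, 2, 3], afu)
```

All three inputs are handled in a single accelerator call, which is the behavior the data buffer is meant to enable.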

Claims (1)

1. A method for implementing a reconfigurable accelerator customized for a program, characterized in that:
1) Computation assisted by the reconfigurable accelerator:
The reconfigurable accelerator accepts calls from the program and handles the compute-intensive part of the program; while the reconfigurable accelerator is computing, the program is suspended and waits for the accelerator to return;
2) Procedure for implementing the reconfigurable accelerator customized for the program:
1. Program analysis: the program analysis process comprises two steps:
I. Determining the hot-spot function
Determining the hot-spot function is a dynamic profiling process that identifies the functions taking up the most execution time in the program. A profiling tool traces the program at run time and samples it at function granularity; the sampled data are then aggregated per function to obtain the call count and execution time of each function, and the functions are sorted by execution time in descending order. The function with the largest execution time is the hot-spot function of the program and is a candidate for implementation as a reconfigurable accelerator;
II. Analyzing data dependences
Data dependence analysis is a static analysis process; applying it to the hot-spot function determines the function's degree of parallelism. If there is no data dependence between loop iterations, the different iterations of the loop can be unrolled in parallel, making full use of the high physical parallelism of the FPGA. If the performance prediction indicates that the hot-spot function will gain performance, it is implemented as a reconfigurable accelerator to speed up the execution of the program;
2. Hardware-software partition:
Once the function to be implemented as a reconfigurable accelerator has been determined, the partition itself is in effect complete; the hardware-software partition step is mainly responsible for defining the interface and parameters between the program and the reconfigurable accelerator. Because each call from the program to the reconfigurable accelerator incurs extra overhead, a data buffer should be added to the reconfigurable accelerator so that multiple calls are merged into one; this concentrates communication, eliminates the extra overhead of repeated calls, lengthens the execution time of each call, and reduces the number of times the program calls the reconfigurable accelerator;
3. Implementation of the hot-spot function on the FPGA:
According to the interface and parameters between the program and the reconfigurable accelerator defined in step 2, the hardware interface of the reconfigurable accelerator is implemented and a buffer is added so that the software can reduce the number of accelerator calls; with the buffer, the input data of multiple accelerator calls are transferred to the accelerator's buffer in a single call, reducing the overall communication cost;
The FPGA is used to implement, in parallel, the same function as the hot-spot function, with the goals of raising the clock frequency and reducing the number of execution cycles;
4. Modifying the program to call the accelerator; implementation steps:
Finally, the program must call the accelerator on the FPGA:
I. Code is added before the hot spot accelerated by the reconfigurable accelerator to prepare the accelerator's input data;
II. The reconfigurable accelerator is invoked through its software interface; the program is suspended and waits for the accelerator to return its result;
III. The result returned by the reconfigurable accelerator is received, post-processed and handed back to the program, which then resumes execution.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008101629053A CN101441564B (en) 2008-12-04 2008-12-04 Method for implementing reconfigurable accelerator customized for program

Publications (2)

Publication Number Publication Date
CN101441564A CN101441564A (en) 2009-05-27
CN101441564B true CN101441564B (en) 2011-07-20

Family

ID=40726012

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008101629053A Expired - Fee Related CN101441564B (en) 2008-12-04 2008-12-04 Method for implementing reconfigurable accelerator customized for program

Country Status (1)

Country Link
CN (1) CN101441564B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1867893A (en) * 2003-10-14 2006-11-22 史坦利·M·海德克 Method and apparatus for accelerating the verification of application specific integrated circuit designs

Also Published As

Publication number Publication date
CN101441564A (en) 2009-05-27

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20110720

Termination date: 20111204