CN107632957A

CN107632957A - Computing acceleration system based on large-scale FPGA chips and acceleration method thereof

Info

Publication number: CN107632957A
Application number: CN201710813770.1A
Authority: CN
Inventors: 童欢欢; 杨磊; 潘家晔
Original assignee: Nanjing Bouncing Force Information Technology Co Ltd
Current assignee: Nanjing Bouncing Force Information Technology Co Ltd
Priority date: 2017-09-11
Filing date: 2017-09-11
Publication date: 2018-01-26
Also published as: CN109284250A

Abstract

The invention discloses a computing acceleration system based on large-scale FPGA chips and an acceleration method thereof. The computing acceleration system comprises a server and an FPGA computing accelerator card connected to the server. The server sends the data to be computed to the FPGA computing accelerator card and reads the result data obtained after the card's accelerated computation. The FPGA computing accelerator card performs the corresponding accelerated computation on the data sent by the server and obtains the result data. The acceleration logic is designed in a hardware description language and converted into IP soft cores, and a highly concurrent hardware logic circuit is finally implemented on the large-scale FPGA chips, thereby accelerating the computation. The advantage is that the invention provides a more general, low-energy, high-performance new computing acceleration method.

Description

Computing acceleration system based on large-scale FPGA chips and acceleration method thereof

Technical field

The invention belongs to the technical field of computing acceleration, and in particular relates to a computing acceleration system based on large-scale FPGA chips and an acceleration method thereof.

Background art

The amount of computation in fields such as genetic engineering, weather forecasting, oil exploration and seismic research is growing steadily. The computing demand in these fields can be expected to keep increasing, which places higher requirements on computing acceleration. Current computing acceleration methods fall into three main categories: cluster computing acceleration, GPU-based parallel computing acceleration, and FPGA-based reconfigurable computing acceleration.

Cluster computing

A cluster is a parallel or distributed system formed by a number of interconnected computers. A server cluster system, put simply, connects multiple servers through high-speed communication links. Seen from outside, these servers work as if they were a single server; internally, the incoming load is dynamically distributed to the individual nodes by some mechanism, achieving the high performance and high availability that otherwise only a super server offers. Clustering is a relatively new technology; with it, relatively high gains in performance, reliability and flexibility can be obtained at lower cost. At present, many of the supercomputers running around the world are implemented with cluster technology.

A GPU (Graphics Processing Unit), i.e. a graphics processor, is a microprocessor dedicated to image computation. As a kind of coprocessor, the GPU has become one of the important components of contemporary computing acceleration systems and is currently the main method of computing acceleration. Since NVIDIA brought the world's first GPU to market in 1999, GPUs have developed rapidly; in little more than ten years their function has grown from simple graphics display to high-speed parallel computing (GPGPU, General Purpose GPU). The heterogeneous computing model composed of CPU + GPU, thanks to its excellent performance-to-power ratio, is widely deployed in engineering fields such as physical simulation, molecular dynamics and earthquake simulation.

An FPGA (Field-Programmable Gate Array) is a semiconductor device built from a matrix of configurable logic blocks (CLBs) connected through programmable interconnects. Reconfigurable computing technology means that, under software control, the reusable resources in a system are reconfigured into a new computing platform according to the needs of the application, achieving high performance close to that of a dedicated hardware design.

Acceleration with FPGA-based reconfigurable computing systems is a new method now being applied in engineering computation. It suits most compute-intensive and data-intensive applications, such as financial computing, cryptography, life sciences, oil exploration and big data processing. Through hardware optimization it achieves better operating efficiency for a specific application, that is, sustained performance can come closer to peak computing performance. At the same time, the energy efficiency of an FPGA is higher than that of a GPU or CPU, making it an emerging and promising computing acceleration method.

Summary of the invention

The technical problem to be solved by the invention is to provide, in view of the deficiencies of the background art, a new FPGA-reconfigurable, high-density computing acceleration system based on large-scale FPGA chips and an acceleration method thereof.

The invention adopts the following technical solution to solve the above technical problem:

A computing acceleration system based on large-scale FPGA chips, comprising a server and an FPGA computing accelerator card connected to the server;

The server is used for sending the data to be computed to the FPGA computing accelerator card, and for reading the result data obtained after the accelerated computation of the FPGA computing accelerator card;

The FPGA computing accelerator card is used for performing the corresponding accelerated computation on the data to be computed sent by the server, and obtaining the result data.

As a further preferred scheme of the computing acceleration system based on large-scale FPGA chips of the invention, the server comprises a power module, a data distribution and collection module, an interface module, and an acceleration module with a corresponding service distribution module;

Wherein the power module is used for providing the electric energy required by the server;

The data distribution and collection module is used for distributing data and collecting computation results;

The interface module is used for data transmission with the FPGA computing accelerator card: it sends the data to be computed to the FPGA computing accelerator card, and transmits the result data obtained after the card's accelerated computation back to the server;

The acceleration module and corresponding service distribution module are used for accelerating the distribution and collection of the processed data.

As a further preferred scheme of the computing acceleration system based on large-scale FPGA chips of the invention, the FPGA computing accelerator card comprises a data communication interface, 12 FPGA chips, and memories connected to the FPGA chips in one-to-one correspondence;

The data communication interface is used for data transmission with the server: it receives the data to be computed sent by the server, and transmits the result data obtained after the accelerated computation of the FPGA computing accelerator card to the server;

The FPGA chips are used for performing accelerated computation on the data to be computed sent by the server;

The memories are used for storing the data to be computed sent by the server and the result data obtained after the accelerated computation of the FPGA computing accelerator card.

As a further preferred scheme of the computing acceleration system based on large-scale FPGA chips of the invention, the interface module and the data communication interface use PCIe interfaces.

As a further preferred scheme of the computing acceleration system based on large-scale FPGA chips of the invention, the FPGA chips are Xilinx Spartan-6 chips.

As a further preferred scheme of the computing acceleration system based on large-scale FPGA chips of the invention, the memories are DDR3.

A computing acceleration method based on large-scale FPGA chips, specifically comprising the following steps:

Step 1: analyze the data that requires accelerated computation;

Step 2: extract the part of the data processing to be accelerated, and design an acceleration IP soft core;

Step 3: design the communication mode and the data format;

Step 4: transmit the data to be processed to the FPGA computing accelerator card using the communication mode and data format designed in step 3, thereby completing the accelerated computation of the data.

As a further preferred scheme of the computing acceleration method based on FPGA chips of the invention, in step 2 the acceleration IP soft core is designed by the following two methods: first, designing a multi-stage pipelined acceleration algorithm; second, designing multiple cores to increase the utilization of the FPGA chip.
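The effect of the two design methods can be illustrated with a simple cycle-count model. The following Python sketch is not taken from the patent; it assumes an idealized pipeline that accepts one new item per clock cycle with no stalls, and the stage and core counts are illustrative values only.

```python
def sequential_cycles(n_items: int, n_stages: int) -> int:
    """Cycles when each item must finish all stages before the next one starts."""
    return n_items * n_stages


def pipeline_cycles(n_items: int, n_stages: int) -> int:
    """Cycles for an ideal pipeline: n_stages to fill, then one result per cycle."""
    if n_items <= 0:
        return 0
    return n_stages + n_items - 1


def multicore_cycles(n_items: int, n_stages: int, n_cores: int) -> int:
    """Cycles when the items are split evenly over n_cores identical pipelines."""
    per_core = -(-n_items // n_cores)  # ceiling division
    return pipeline_cycles(per_core, n_stages)


if __name__ == "__main__":
    items, stages, cores = 1000, 16, 4  # illustrative values, not from the patent
    print(sequential_cycles(items, stages))
    print(pipeline_cycles(items, stages))
    print(multicore_cycles(items, stages, cores))
```

In this model pipelining removes the per-item stage latency from the total, and replicating the pipeline across several cores divides the remaining work, which is why filling the chip with cores raises its utilization.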

Compared with the prior art, the invention, by adopting the above technical solution, has the following technical effects:

1. High performance: the FPGA hardware acceleration circuit of the invention has large logic capacity, high speed, high parallelism and strong computing capability, and the acceleration effect is significant. Taking DES key search speed as an example, the rate per kilowatt reaches 657.9 billion keys per second;

2. Low power consumption and high cost-effectiveness: among FPGA, CPU and GPU, the FPGA delivers the highest computing performance per kilowatt consumed, so it consumes the least power for the same performance. Compared with traditional computing, the advantage of low-energy, high-performance FPGA computing is clear;

3. Compared with a general server cluster (or cloud computing) or a GPU array, an FPGA solution of the same performance generally costs much less, so the invention is highly cost-effective.

Brief description of the drawings

Fig. 1 is a flow chart of the DES decryption algorithm;

Fig. 2 is a schematic diagram of the working mode of the invention;

Fig. 3 is a system structure diagram of the invention;

Fig. 4 is a hardware structure diagram of the FPGA computing accelerator card;

Fig. 5a is a photograph of the server chassis;

Fig. 5b is a photograph of the server board;

Fig. 6 is a server connection diagram for the working mode of the invention.

Detailed description of the embodiments

The technical solution of the invention is described in further detail below with reference to the accompanying drawings:

The server comprises a power module, a data distribution and collection module, an interface module, and an acceleration module with a corresponding service distribution module;

Wherein the power module is used for providing the electric energy required by the server;

The acceleration module and corresponding service distribution module are used for accelerating the distribution and collection of the processed data;

The FPGA computing accelerator card comprises a data communication interface, multiple FPGA chips, and memories connected to the FPGA chips in one-to-one correspondence. The interface module and the data communication interface may use PCIe interfaces, the FPGA chips may be Xilinx Spartan-6 chips, and the memories may be DDR3 chips. There are at least two FPGA chips.

The data communication interface is used for data transmission with the server: it receives the data to be computed sent by the server, and transmits the result data obtained after the accelerated computation of the FPGA computing accelerator card to the server;

Specific embodiment is as follows：

The high-performance reconfigurable server of the invention contains 10 FPGA computing cards, each holding 12 Xilinx Spartan-6 chips, 120 chips in total; the 10 boards can be configured flexibly. Each FPGA chip is equipped with 512 MB of DRAM. A single computing card draws up to 120 W, and the power of the whole machine is 1475 W. The data communication interface uses USB 3.0, with PCIe as an option. The machine's physical size is a standard 6U chassis. A cluster working mode is supported: one set of equipment has 10 computing cards, and multiple sets can work in parallel, with performance increasing linearly.
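The totals implied by this configuration can be checked with a few lines of arithmetic. The Python sketch below uses only the figures stated above; the class and attribute names are illustrative, and the last property is simply the difference between the stated machine power and the sum of the card powers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerConfig:
    """Figures from the embodiment; the names themselves are illustrative."""
    cards: int = 10
    chips_per_card: int = 12
    dram_mb_per_chip: int = 512
    card_power_w: int = 120
    system_power_w: int = 1475

    @property
    def total_chips(self) -> int:
        return self.cards * self.chips_per_card

    @property
    def total_dram_gb(self) -> float:
        return self.total_chips * self.dram_mb_per_chip / 1024

    @property
    def total_card_power_w(self) -> int:
        return self.cards * self.card_power_w

    @property
    def non_card_power_w(self) -> int:
        # Machine power not accounted for by the computing cards.
        return self.system_power_w - self.total_card_power_w


if __name__ == "__main__":
    cfg = ServerConfig()
    print(cfg.total_chips, cfg.total_dram_gb, cfg.total_card_power_w, cfg.non_card_power_w)
```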

In use, the system is mainly responsible for dedicated computation and must be controlled by a separate host. One host can control multiple servers. The host and the server communicate through a USB (or PCIe) interface. On the host side, Java and C++ API interfaces are provided for interacting with the "foundation stone" server, covering loading of IP soft cores, control, status detection, data transfer and so on.
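The patent does not publish the host-side API, so the following Python sketch is purely hypothetical. It models the four kinds of operation named above (loading an IP soft core, control, status detection, data transfer) against an in-memory stand-in for the server; every class and method name here is an assumption made for illustration, and a Python callable stands in for the behavior of a loaded core.

```python
from typing import Callable, Dict, List, Optional


class SimulatedAccelerator:
    """In-memory stand-in for the server; NOT the real Java/C++ API."""

    def __init__(self) -> None:
        self._core_name: Optional[str] = None
        self._core: Optional[Callable[[bytes], bytes]] = None
        self._results: List[bytes] = []

    def load_ip_core(self, name: str, behaviour: Callable[[bytes], bytes]) -> None:
        """Hypothetical 'load IP soft core' call; behaviour models the core's logic."""
        self._core_name = name
        self._core = behaviour

    def status(self) -> Dict[str, object]:
        """Hypothetical status query: which core is loaded, how many results wait."""
        return {"core": self._core_name, "pending_results": len(self._results)}

    def send(self, payload: bytes) -> None:
        """Hypothetical data transfer to the card; the result is queued for pickup."""
        if self._core is None:
            raise RuntimeError("no IP soft core loaded")
        self._results.append(self._core(payload))

    def receive(self) -> Optional[bytes]:
        """Hypothetical read-back of the oldest result, or None if none is ready."""
        return self._results.pop(0) if self._results else None


if __name__ == "__main__":
    acc = SimulatedAccelerator()
    acc.load_ip_core("invert", lambda data: bytes(b ^ 0xFF for b in data))
    acc.send(b"\x00\x0f")
    print(acc.status())
    print(acc.receive())
```

The send and receive calls are kept separate to mirror the description, in which the server sends the data to be computed and later reads the result data.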

The system can be used once the server is connected to a Linux host over USB. There are two main modes of use. In the first, a big data application that has already been developed (an electric power big data computing application) runs directly on the host, and the host automatically hands computing tasks to the high-performance computing platform. For this mode, once the environment has been configured in advance, the specially developed electric power big data software is simply run on the host.

The second mode of use is to call the low-level API directly for various kinds of application development. This mode uses the SDK to operate the computing acceleration system directly, and is better suited to organizations with dedicated R&D staff.

A method of accelerating DES decryption with the invention is described below with reference to the accompanying drawings:

DES (Data Encryption Standard) is one of the earliest and most widely used symmetric block ciphers. The DES algorithm takes three input parameters: Key, Data and Mode. Key is 8 bytes (64 bits) and is the working key of the DES algorithm; Data is also 8 bytes (64 bits) and is the data to be encrypted or decrypted; Mode is the working mode of DES, of which there are two: encryption or decryption.

DES uses a 56-bit key plus 8 additional parity bits, giving a 64-bit key, and operates on 64-bit blocks. It is an iterated block cipher that uses a technique known as a Feistel network, in which the text block being encrypted is split into two halves. A round function is applied to one half using a subkey, and the output is then XORed with the other half; the two halves are then swapped. The process continues in this way, except that the last round does not swap. DES uses 16 rounds and four basic operations: XOR, permutation, substitution and shift.
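The Feistel structure described above can be sketched in a few lines of Python. This is a toy model, not DES: the round function and key schedule below are placeholders invented for illustration, and DES's initial and final permutations, expansion, S-boxes and real key schedule are all omitted. Only the structure (split, round function on one half, XOR into the other, swap, no swap after the last round) follows the text.

```python
MASK32 = 0xFFFFFFFF


def toy_round_function(half: int, subkey: int) -> int:
    """Placeholder round function; DES's real one uses expansion and S-boxes."""
    return ((half * 0x9E3779B1) ^ subkey) & MASK32


def toy_subkeys(key: int, rounds: int = 16) -> list:
    """Placeholder key schedule producing one 32-bit subkey per round."""
    return [((key >> i) ^ (i * 0x01010101)) & MASK32 for i in range(rounds)]


def feistel(block: int, subkeys: list) -> int:
    """Run a 64-bit block through the Feistel rounds; the last round does not swap."""
    left = (block >> 32) & MASK32
    right = block & MASK32
    last = len(subkeys) - 1
    for i, subkey in enumerate(subkeys):
        left ^= toy_round_function(right, subkey)  # F on one half, XOR into the other
        if i != last:
            left, right = right, left              # swap the halves
    return (left << 32) | right


def encrypt(block: int, key: int) -> int:
    return feistel(block, toy_subkeys(key))


def decrypt(block: int, key: int) -> int:
    # Decryption is the same network with the subkeys in reverse order.
    return feistel(block, toy_subkeys(key)[::-1])


if __name__ == "__main__":
    plaintext, key = 0x0123456789ABCDEF, 0x133457799BBCDFF1
    assert decrypt(encrypt(plaintext, key), key) == plaintext
```

Because decryption reuses the same datapath with the subkey order reversed, a single hardware round design serves both modes, and the 16 rounds map naturally onto pipeline stages.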

Fig. 1 is the DES decryption flow chart.

The acceleration design for the DES decryption algorithm using the invention proceeds as follows:

1. Analyze the DES decryption algorithm to be accelerated.

2. Extract the part to be accelerated and design the acceleration IP soft core. Acceleration is achieved mainly by two methods: first, designing a multi-stage pipelined acceleration algorithm; second, designing multiple cores to increase the utilization of the FPGA.

3. Design the communication mode and data format according to the hardware interface.

4. Integrate the FPGA accelerated computation results to complete the algorithm acceleration design.
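For a key search, the multi-core design of step 2 and the result integration of step 4 imply that the key space is divided among the cores and the per-core results are then combined. The patent does not state how the work is divided; the Python sketch below assumes a simple contiguous split of the 2^56 DES key space into near-equal ranges, one per worker, with 120 workers chosen to match the chip count of the embodiment.

```python
from typing import List, Tuple


def partition_keyspace(total_keys: int, n_workers: int) -> List[Tuple[int, int]]:
    """Split [0, total_keys) into n_workers contiguous half-open ranges.

    Range sizes differ by at most one key, so all workers finish at about
    the same time. This splitting rule is an assumption, not the patent's.
    """
    base, extra = divmod(total_keys, n_workers)
    ranges = []
    start = 0
    for worker in range(n_workers):
        size = base + (1 if worker < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    ranges = partition_keyspace(2**56, 120)
    print(len(ranges), ranges[0], ranges[-1])
```

Each worker only needs its start and end key, so the data format of step 3 can stay very small for this workload.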

The invention uses FPGA hardware acceleration, and the completed acceleration algorithm can be operated directly through the SDK. The principle of this working mode is shown in Fig. 2, and the overall system architecture is shown in Fig. 3.

In general, decryption performance is directly proportional to the number of FPGA computing cards in the equipment: the more computing cards, the shorter the time. The high-performance server of the invention contains 10 FPGA computing cards, each holding 12 Xilinx Spartan-6 chips, 120 chips in total, and the 10 boards can be configured flexibly. The cracking speed of a single FPGA chip (Spartan-6) reaches 8.06 billion trials per second. The hardware structure is shown in Fig. 4. The reconfigurable server chassis and the reconfigurable server board actually used are shown in Figs. 5a and 5b respectively.
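The linear scaling can be checked directly from the stated figures. The Python sketch below multiplies the per-chip rate by the chip count and, using the 1475 W machine power given in the embodiment, derives a per-kilowatt rate; the variable names are illustrative.

```python
PER_CHIP_RATE = 8_060_000_000   # trials per second for one Spartan-6 chip (stated)
CHIPS_PER_CARD = 12             # stated
CARDS = 10                      # stated
SYSTEM_POWER_KW = 1.475         # whole-machine power (stated as 1475 W)

# Aggregate rate under the linear-scaling assumption made in the text.
system_rate = PER_CHIP_RATE * CHIPS_PER_CARD * CARDS

# Rate per kilowatt of whole-machine power.
rate_per_kw = system_rate / SYSTEM_POWER_KW

if __name__ == "__main__":
    print(f"system rate: {system_rate:.4e} trials/s")
    print(f"per kilowatt: {rate_per_kw:.4e} trials/s/kW")
```

The aggregate works out to 0.9672 × 10¹² trials per second, and the per-kilowatt value comes to roughly 656 billion per second, in the same range as the 657.9 billion per kilowatt quoted among the technical effects.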

In addition, the invention supports a cluster working mode: one set of equipment has 10 computing cards, and multiple sets can work in parallel, with performance increasing linearly. Decrypting cooperatively across multiple autonomously coordinating sets of equipment reduces the decryption time proportionally and supports minute-level decryption. The cluster working mode is shown in Fig. 6. Actual measurement shows that the DES decryption speed achieved by the system is 0.9672 × 10¹² trials per second.
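The measured rate gives a rough idea of what minute-level decryption requires. The Python sketch below is an estimate, not a figure from the patent: it assumes an exhaustive search of the full 2^56 DES key space and the perfectly linear scaling across sets of equipment that the text describes.

```python
import math

DES_KEYSPACE = 2**56             # number of distinct 56-bit DES keys
SYSTEM_RATE = 0.9672e12          # measured trials per second for one set of equipment


def worst_case_seconds(n_systems: int = 1) -> float:
    """Time to try every key, assuming linear scaling across n_systems."""
    return DES_KEYSPACE / (SYSTEM_RATE * n_systems)


def systems_needed(target_seconds: float) -> int:
    """Smallest number of sets of equipment that meets the worst-case target."""
    return math.ceil(worst_case_seconds(1) / target_seconds)


if __name__ == "__main__":
    print(f"one system, worst case: {worst_case_seconds() / 3600:.2f} h")
    print(f"one system, average:    {worst_case_seconds() / 2 / 3600:.2f} h")
    print(f"systems for <= 1 h:     {systems_needed(3600)}")
```

Under these assumptions a single set of equipment needs a little under 21 hours for the full key space (about half that on average), so bringing the search time down to the order of minutes relies on running many sets in parallel.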

Claims

1. A computing acceleration system based on large-scale FPGA chips, characterized by comprising a server and an FPGA computing accelerator card connected to the server;

The server is used for sending the data to be computed to the FPGA computing accelerator card, and for reading the result data obtained after the accelerated computation of the FPGA computing accelerator card;

The FPGA computing accelerator card is used for performing the corresponding accelerated computation on the data to be computed sent by the server, and obtaining the result data.
2. The computing acceleration system based on FPGA chips according to claim 1, characterized in that the server comprises a power module, a data distribution and collection module, an interface module, and an acceleration module with a corresponding service distribution module;

Wherein the power module is used for providing the electric energy required by the server;

The data distribution and collection module is used for distributing data and collecting computation results;

The interface module is used for data transmission with the FPGA computing accelerator card: it sends the data to be computed to the FPGA computing accelerator card, and transmits the result data obtained after the card's accelerated computation back to the server;

The acceleration module and corresponding service distribution module are used for accelerating the distribution and collection of the processed data.
3. The computing acceleration system based on large-scale FPGA chips according to claim 1, characterized in that the FPGA computing accelerator card comprises a data communication interface, 12 FPGA chips, and memories connected to the FPGA chips in one-to-one correspondence;

The data communication interface is used for data transmission with the server, and transmits the result data obtained after the accelerated computation of the FPGA computing accelerator card to the server;

The FPGA chips are used for performing accelerated computation on the data to be computed sent by the server;

The memories are used for storing the data to be computed sent by the server and the result data obtained after the accelerated computation of the FPGA computing accelerator card.
4. The computing acceleration system based on large-scale FPGA chips according to claim 2 or 3, characterized in that the interface module and the data communication interface use PCIe interfaces.
5. The computing acceleration system based on large-scale FPGA chips according to claim 3, characterized in that the FPGA chips are Xilinx Spartan-6 chips.
6. The computing acceleration system based on large-scale FPGA chips according to claim 3, characterized in that the memories are DDR3 chips.
7. A computing acceleration method based on large-scale FPGA chips, characterized by specifically comprising the following steps:

Step 1: analyze the data that requires accelerated computation;

Step 2: extract the part of the data processing to be accelerated, and design an acceleration IP soft core;

Step 3: design the communication mode and the data format;

Step 4: transmit the data to be processed to the FPGA computing accelerator card using the communication mode and data format designed in step 3, thereby completing the accelerated computation of the data.
8. The computing acceleration method based on FPGA chips according to claim 7, characterized in that in step 2 the acceleration IP soft core is designed by the following two methods: first, designing a multi-stage pipelined acceleration algorithm; second, designing multiple cores to increase the utilization of the FPGA chip.