CN111625368A

CN111625368A - Distributed computing system and method and electronic equipment

Info

Publication number: CN111625368A
Application number: CN202010445663.XA
Authority: CN
Inventors: 曹越; 刘霖; 郭姝辰; 凌伟程
Original assignee: Aerospace Information Research Institute of CAS
Current assignee: Aerospace Information Research Institute of CAS
Priority date: 2020-05-22
Filing date: 2020-05-22
Publication date: 2020-09-04

Abstract

A distributed computing system, a distributed computing method, and an electronic device are provided. The distributed computing system comprises a master FPGA chip and at least one slave FPGA chip. The master FPGA chip is communicatively connected to the at least one slave FPGA chip, and together they implement pipeline computation, parallel computation, or iterative computation for data processing, image processing, and/or signal processing. The master FPGA chip and the at least one slave FPGA chip are each connected to at least one double data rate (DDR) memory. The system greatly improves performance in data computation and data storage while greatly reducing hardware scale. It is highly adaptable and can process demanding data workloads with high performance.

Description

Distributed computing system and method and electronic equipment

Technical Field

The present invention relates to the field of data processing, and in particular, to a distributed computing system, a distributed computing method, and an electronic device.

Background

In recent years, as the functional performance of sensors has improved, more and more data must be processed at high speed, often in real time. To meet the processing requirements of complex scenarios, processor performance has improved substantially. In data analysis, Central Processing Units (CPUs) have moved to multi-core, multi-threaded designs, greatly improving the timeliness of data analysis. In image processing, the architecture of the Graphics Processing Unit (GPU) has been significantly optimized, growing from hundreds to thousands of processing cores, and programming frameworks have matured: products built on the OpenGL and CUDA frameworks run stably in many image processing applications. In signal processing, the Digital Signal Processor (DSP) has become the industry's preferred device; after years of development, DSP technology is mature and stably supports products across many signal processing fields. However, as sensor precision keeps improving, more and more requirements are moving toward multi-source data fusion and automatic generation of detection information. To meet these requirements, a processing platform must be able to process multi-source data while continuing to improve its processing performance.

To meet the complex processing requirements of such projects, the conventional approach is to build a heterogeneous processing platform and then increase the number of core processors to raise data processing capacity.

The conventional approach has several defects, which can be summarized in three points. First, the hardware scale is large. Completing multi-sensor fusion and automatic generation of detection information requires several types of processors whose peripheral circuits differ greatly, so a merged, optimized design is difficult to achieve. Moreover, different algorithms need different processing resources, so when they are implemented on a fixed platform the design must provision the maximum resources required by any type of processing; this wastes a certain amount of resources and enlarges the hardware scale of the processing platform. Second, development is difficult. Because different algorithms must run on different chips with different development environments, software designers find it hard to write programs efficiently; in addition, cooperative processing across devices necessarily involves resource scheduling and data transfer, and developing stable, efficient data communication between different devices is also difficult. Third, energy efficiency is poor. Because parallel processing relies on cooperation among chips, the system spends considerable energy on data synchronization and data exchange; the loose coupling of the hardware architecture lowers resource scheduling efficiency, and the processing resources of the chips cannot cooperate in parallel with high timeliness.

Disclosure of Invention

(I) Technical problem to be solved

In view of the foregoing technical problems, the present invention provides a distributed computing system, a method and an electronic device, which are used to at least partially solve the above technical problems.

(II) Technical solution

One aspect of the present invention provides a distributed computing system, comprising a master FPGA chip and at least one slave FPGA chip, wherein the master FPGA chip is communicatively connected to the at least one slave FPGA chip, and the chips are used to implement pipeline computation, parallel computation, or iterative computation for data processing, image processing, and/or signal processing; the master FPGA chip and the at least one slave FPGA chip are each connected to at least one double data rate memory.

Optionally, the master FPGA chip and the at least one slave FPGA chip are both further connected with at least one quadruple data rate memory.

Optionally, a data transpose module is further disposed between the master FPGA chip and/or the slave FPGA chip and the data rate memory connected thereto.

Optionally, the maximum bit width of the double data rate memory is 256 bits.

Optionally, the master FPGA chip is communicatively connected to the at least one slave FPGA chip via a high-speed data bus.

Another aspect of the present invention provides a multithreaded data transpose method based on the distributed computing system, comprising: the master FPGA chip distributes data to the double data rate memory or quadruple data rate memory connected to each slave FPGA chip; the master FPGA chip receives the synchronization instructions sent by the slave FPGA chips and confirms, according to the synchronization instructions, whether the states of the data rate memories are consistent; if so, the master FPGA chip sends a read instruction to each slave FPGA chip, so that each slave FPGA chip performs a parallel transpose operation on the data in the double data rate memory or quadruple data rate memory connected to it.

A third aspect of the present invention provides a distributed computing method, comprising: the master FPGA chip sends a synchronization instruction to each slave FPGA chip and receives the state information that each slave FPGA chip feeds back after parsing the synchronization instruction; the master FPGA chip confirms, according to the state information, whether the states of the slave FPGA chips are consistent, and if so, sends a processing instruction, processing parameters, and the data to be processed to the slave FPGA chips to implement distributed computation of the data.

Optionally, implementing distributed computation of the data includes implementing pipeline computation, parallel computation, or iterative computation of the data.

Optionally, the distributed computing method further includes: reading the calculation results stored in the double data rate memory or quadruple data rate memory connected to the FPGA chip into a local memory.

A fourth aspect of the present invention provides an electronic apparatus comprising: one or more processors; memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method described above.

(III) Advantageous effects

The invention provides a distributed computing system, a distributed computing method and electronic equipment, which at least have the following beneficial effects:

1. The distributed computing system is based on Field Programmable Gate Arrays (FPGAs) and adopts a distributed master/slave FPGA architecture, which greatly improves performance in data computation and data storage.

2. Because multiple processing structures are configured within a single FPGA-based system, pipeline, parallel, or iterative computation for data processing, image processing, or signal processing can be implemented on one platform, greatly reducing the scale of the hardware platform.

3. By combining double data rate memories, quadruple data rate memories, and FPGA internal storage, the distributed computing system can perform complex and efficient data access operations.

4. The FPGA is fully programmable, and the distributed processing architecture is formed by interconnecting multiple processors, so different processing flows can be constructed for different algorithms simply by modifying the programs, which gives the system strong adaptability.

Drawings

FIG. 1 schematically illustrates an architecture diagram of a distributed computing system provided by an embodiment of the present invention;

FIG. 2 schematically illustrates a flow chart of a method of multi-threaded data transpose provided by an embodiment of the present invention;

FIG. 3 is a diagram schematically illustrating a multi-threaded data transpose process provided by an embodiment of the invention;

FIG. 4 is a flow chart that schematically illustrates a method for distributed computation of multithreaded data as provided by an embodiment of the invention;

FIG. 5 is a flow diagram that schematically illustrates an engineering implementation of a distributed computing system, in accordance with an embodiment of the present invention;

FIG. 6 schematically shows a block diagram of an electronic device of an embodiment of the invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.

The invention designs an FPGA-based distributed computing system. An FPGA is a fully programmable device whose programmable resources include look-up table (LUT) resources, register (REG) resources, and routing resources. Its pins are divided into SelectIO and RocketIO, and users can implement various complex data communication and high-speed memory interfaces by writing code. Distributed computing refers to performing the same or different types of processing on multiple hardware platforms, and its key lies in the design of computing and memory resources. Because the FPGA has highly flexible logic and interface resources, it can be used to build a high-performance distributed computing platform. Specific embodiments are described in detail below.

Fig. 1 schematically shows the architecture of a distributed computing system according to an embodiment of the present invention. Referring to Fig. 1, the distributed computing system includes a master FPGA chip and at least one slave FPGA chip (typically several slave FPGA chips, to increase data processing capacity and speed). The master FPGA chip is connected to each slave FPGA chip in a distributed manner. In some embodiments, the master FPGA chip and the slave FPGA chips may communicate over a high-speed data bus or over a high-speed backplane bus; the invention does not limit the specific connection, which may be chosen according to actual requirements.
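To make the topology concrete, the following minimal Python sketch models the master/slave arrangement as plain data and checks the constraints stated in this document: at least one slave chip, at least one DDR memory per chip, and a DDR bit width of at most 256 bits. The class names `FpgaNode` and `DistributedSystem`, their fields, and the chosen bank counts are hypothetical illustrations, not part of the disclosed design.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FpgaNode:
    """One FPGA chip and the external memories attached to it (hypothetical model)."""
    name: str
    ddr_banks: int = 1         # at least one DDR memory per chip
    ddr_width_bits: int = 256  # DDR bit width, at most 256 bits
    qdr_banks: int = 0         # optional QDR memories


@dataclass
class DistributedSystem:
    """A master FPGA chip plus one or more slave FPGA chips on a shared interconnect."""
    master: FpgaNode
    slaves: List[FpgaNode] = field(default_factory=list)
    interconnect: str = "high-speed data bus"  # or "high-speed backplane bus"

    def validate(self) -> None:
        """Raise ValueError if the topology violates the constraints described in the text."""
        if not self.slaves:
            raise ValueError("at least one slave FPGA chip is required")
        for node in self.all_chips():
            if node.ddr_banks < 1:
                raise ValueError(f"{node.name}: at least one DDR memory is required")
            if node.ddr_width_bits > 256:
                raise ValueError(f"{node.name}: DDR bit width exceeds 256 bits")

    def all_chips(self) -> List[FpgaNode]:
        return [self.master, *self.slaves]


# Hypothetical configuration: one master and four slaves, two DDR banks each.
system = DistributedSystem(
    master=FpgaNode("master", ddr_banks=2),
    slaves=[FpgaNode(f"slave{i}", ddr_banks=2, qdr_banks=1) for i in range(4)],
)
system.validate()
print(len(system.all_chips()), system.interconnect)  # 5 high-speed data bus
```

Keeping the topology as data rather than hard-coding it reflects the point that the number of slave chips and the bus type are left open and chosen according to actual requirements.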

The master FPGA chip and the slave FPGA chips, connected in a distributed manner, are used to implement pipeline computation, parallel computation, or iterative computation for data processing, image processing, and/or signal processing. Specifically, the master FPGA chip and each slave FPGA chip are computation cores. Because the FPGA is fully programmable, each slave FPGA chip can be programmed differently to realize a different processing structure, so that pipeline, parallel, iterative, and other computations can all be completed on one platform, greatly reducing the scale of the hardware platform.
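The three computation structures named above differ mainly in how data moves between processing units. The Python sketch below illustrates only that data flow, using worker threads as stand-ins for slave FPGA chips; it does not model FPGA timing, and the function names `pipeline`, `parallel`, and `iterative` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Op = Callable[[float], float]


def pipeline(data: Sequence[float], stages: List[Op]) -> List[float]:
    """Pipeline computation: each sample passes through every stage in order.

    On hardware each stage could sit on its own FPGA with samples overlapping
    in time; this sketch only reproduces the data flow, not the timing.
    """
    out = []
    for x in data:
        for stage in stages:
            x = stage(x)
        out.append(x)
    return out


def parallel(data: Sequence[float], op: Op, n_workers: int) -> List[float]:
    """Parallel computation: split the data into one chunk per worker
    (standing in for a slave FPGA) and process the chunks concurrently."""
    if not data:
        return []
    size = -(-len(data) // n_workers)  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = pool.map(lambda chunk: [op(x) for x in chunk], chunks)
        return [y for part in parts for y in part]


def iterative(x0: float, step: Op, tol: float = 1e-12, max_iter: int = 100) -> float:
    """Iterative computation: re-apply the same operator until successive results agree."""
    x = x0
    for _ in range(max_iter):
        nxt = step(x)
        if abs(nxt - x) < tol:
            return nxt
        x = nxt
    return x


print(pipeline([1, 2, 3], [lambda x: x + 1, lambda x: x * 2]))  # [4, 6, 8]
print(parallel([1, 2, 3, 4, 5], lambda x: x * x, n_workers=2))   # [1, 4, 9, 16, 25]
print(iterative(1.0, lambda x: (x + 2 / x) / 2))                 # converges to sqrt(2)
```

`ThreadPoolExecutor.map` returns results in input order, so the parallel variant reassembles chunks in their original sequence, much as a master chip would gather results from slaves in a fixed order.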

To meet the data storage requirements of the distributed computing system and increase its data storage capacity, in some implementations of this embodiment the master FPGA chip and each slave FPGA chip are connected to at least one Double Data Rate (DDR) memory. Typically several groups of DDR memory are provided, and each DDR memory can be designed with a maximum bit width of 256 bits, which greatly improves the data access capacity of the system and provides a good basis for subsequent parallel computation. In addition, to provide storage faster than DDR memory, the master FPGA chip and each slave FPGA chip may, besides their DDR memories, be connected to at least one Quad Data Rate (QDR) memory, and each FPGA chip also contains Block Random Access Memory (Block RAM). By combining double data rate memory, quad data rate memory, and FPGA internal storage, the distributed computing system can perform more complex and efficient data access operations. In this embodiment the QDR memory is a full-duplex RAM, although the present invention is not limited thereto.
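To give a rough sense of scale, the theoretical peak bandwidth of a memory interface is (bus width in bytes) × (transfers per clock) × (clock frequency), with DDR transferring data on both clock edges. The Python sketch below applies this to the 256-bit width mentioned above; the 200 MHz clock, the four slave chips, and the two DDR banks per chip are hypothetical figures chosen for illustration, and real sustained bandwidth is lower because of refresh, row switching, and controller overhead.

```python
def peak_bandwidth_bytes_per_s(bus_width_bits: int, clock_hz: float, transfers_per_clock: int) -> float:
    """Theoretical peak bandwidth of one memory interface: bytes per transfer x transfers per clock x clock."""
    return bus_width_bits / 8 * transfers_per_clock * clock_hz


def aggregate_bandwidth_bytes_per_s(n_slaves: int, banks_per_chip: int, per_bank_bytes_per_s: float) -> float:
    """Sum over the master chip and every slave chip, assuming identical memory banks on each chip."""
    return (n_slaves + 1) * banks_per_chip * per_bank_bytes_per_s


# Hypothetical figures: 200 MHz clock, 4 slave chips, 2 DDR banks per chip.
DDR_TRANSFERS_PER_CLOCK = 2  # data on both clock edges
ddr = peak_bandwidth_bytes_per_s(256, 200e6, DDR_TRANSFERS_PER_CLOCK)
total = aggregate_bandwidth_bytes_per_s(4, 2, ddr)
print(f"per DDR bank: {ddr / 1e9:.1f} GB/s, whole system: {total / 1e9:.1f} GB/s")
# per DDR bank: 12.8 GB/s, whole system: 128.0 GB/s
```

For a QDR memory, which has separate read and write ports each running at double data rate, `transfers_per_clock` would be 4, split evenly between reads and writes; this is why QDR suits workloads that read and write simultaneously.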

In some implementations of this embodiment, to access data in the DDR (QDR) memory at a finer granularity, a data transpose module is provided between the master FPGA chip or slave FPGA chip and the data rate memory connected to it. Data transposition is one of the more complex memory operations; this structure can transpose multiple groups of DDR data and, at the same time, use the DDR burst characteristic to process multiple threads in parallel.
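The document does not detail the internals of the data transpose module. One common way to transpose a matrix held in burst-oriented memory is a tiled transpose: read a small square tile as several short contiguous bursts, transpose it in on-chip memory, and write it back as contiguous bursts, so that neither reads nor writes degrade into single-word strided accesses. The Python sketch below illustrates that general technique under this assumption; `tiled_transpose` is a hypothetical name and the flat list stands in for DDR address space.

```python
from typing import List, Tuple


def tiled_transpose(mem: List[int], rows: int, cols: int, tile: int) -> Tuple[List[int], int]:
    """Transpose a rows x cols matrix stored row-major in the flat list `mem`.

    Each tile is fetched as `tile` burst reads (one contiguous run per tile row),
    transposed in a small local buffer standing in for on-chip Block RAM, and
    stored as `tile` contiguous burst writes. Returns the cols x rows result
    (row-major) and the total number of bursts issued.
    """
    if rows % tile or cols % tile:
        raise ValueError("this sketch assumes both dimensions are multiples of the tile size")
    out = [0] * (rows * cols)
    bursts = 0
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            # Burst reads: each tile row is `tile` consecutive addresses.
            buf = []
            for r in range(r0, r0 + tile):
                start = r * cols + c0
                buf.append(mem[start:start + tile])
                bursts += 1
            # Transpose locally, then write each output row segment as one burst.
            for c in range(tile):
                start = (c0 + c) * rows + r0
                out[start:start + tile] = [buf[r][c] for r in range(tile)]
                bursts += 1
    return out, bursts


matrix = list(range(4 * 6))  # 4 x 6 matrix, row-major
result, n_bursts = tiled_transpose(matrix, rows=4, cols=6, tile=2)
print(result[:4], n_bursts)  # [0, 6, 12, 18] 24
```

Larger tiles mean longer bursts and fewer of them (the count is 2 × rows × cols / tile), at the cost of more on-chip buffer space, which is the trade-off a burst-aware transpose module has to balance.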

Based on the distributed computing system, this embodiment further provides a multithreaded data transpose method. FIG. 2 schematically shows a flowchart of the multithreaded data transpose method, and FIG. 3 schematically shows the multithreaded data transpose process, according to embodiments of the present invention. Referring to FIG. 2 and FIG. 3, the multithreaded data transpose method includes:

S201. The master FPGA chip distributes data to the double data rate memory or quadruple data rate memory connected to each slave FPGA chip.

The master FPGA chip first receives the data to be processed from an external memory, stores it in its internal Block RAM, and then distributes it to the memories of all slave FPGA chips.

S202. The master FPGA chip receives the synchronization instructions sent by the slave FPGA chips.

After receiving its data, each slave FPGA chip sends a synchronization instruction to the master FPGA chip.

S203. Confirm, according to the synchronization instructions, whether the states of the data rate memories are consistent.

If so, operation S204 is performed. If not, wait until the states of all data rate memories are consistent.

S204. The master FPGA chip sends a read instruction to each slave FPGA chip, so that each slave FPGA chip performs a parallel transpose operation on the data in the double data rate memory or quadruple data rate memory connected to it.

This completes the data transpose operation across the multiple groups of data rate memories.
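The control flow of steps S201 to S204 can be sketched in Python as follows, with each slave FPGA chip modelled as an object holding one block in its attached memory and threads standing in for the slaves' parallel transpose hardware. The names `SlaveChip`, `multithreaded_transpose`, and the `"DATA_READY"` status are hypothetical, and where the real system would keep waiting for consistent states, this sketch simply raises an error.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Optional

Matrix = List[List[int]]


@dataclass
class SlaveChip:
    """Hypothetical model of one slave FPGA chip and its attached DDR/QDR memory."""
    chip_id: int
    memory: Optional[Matrix] = None

    def store(self, block: Matrix) -> str:
        # S201 (slave side): keep the distributed block in attached memory.
        # S202: answer with a synchronization status once the data has landed.
        self.memory = block
        return "DATA_READY"

    def transpose_on_read(self) -> Matrix:
        # S204: on the read instruction, transpose the block held in local memory.
        if self.memory is None:
            raise RuntimeError(f"slave {self.chip_id} has no data")
        return [list(col) for col in zip(*self.memory)]


def multithreaded_transpose(blocks: List[Matrix]) -> List[Matrix]:
    """Master-side control flow for steps S201 to S204 (one block per slave)."""
    if not blocks:
        return []
    slaves = [SlaveChip(i) for i in range(len(blocks))]
    # S201 + S202: distribute the blocks and collect the synchronization replies.
    sync_states = [slave.store(block) for slave, block in zip(slaves, blocks)]
    # S203: continue only if every memory reports the same state. The real system
    # would keep waiting here; this sketch simply stops.
    if len(set(sync_states)) != 1:
        raise RuntimeError("memory states are not consistent yet")
    # S204: broadcast the read instruction; every slave transposes in parallel.
    with ThreadPoolExecutor(max_workers=len(slaves)) as pool:
        return list(pool.map(SlaveChip.transpose_on_read, slaves))


print(multithreaded_transpose([[[1, 2], [3, 4]], [[5, 6, 7]]]))
# [[[1, 3], [2, 4]], [[5], [6], [7]]]
```

The barrier at S203 is what makes the parallel step safe: no slave starts transposing until every memory holds its complete block.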

Based on the distributed computing system, this embodiment further provides a multithreaded distributed data computing method. FIG. 4 schematically shows a flowchart of this method according to an embodiment of the present invention. Referring to FIG. 4, the multithreaded distributed data computing method includes:

S401. The master FPGA chip sends a synchronization instruction to each slave FPGA chip.

S402. The master FPGA chip receives the state information that each slave FPGA chip feeds back after parsing the synchronization instruction.

After receiving the synchronization instruction, each slave FPGA chip parses it and feeds its state back to the master FPGA chip according to the parsing result.

S403. The master FPGA chip confirms, according to the state information, whether the states of the slave FPGA chips are consistent.

If so, operation S404 is performed. If not, wait until the states of the slave FPGA chips are consistent.

S404. Send a processing instruction, processing parameters, and the data to be processed to each slave FPGA chip to implement distributed computation of the data.

Implementing distributed computation of the data may include pipeline computation, parallel computation, iterative computation, and so on. After data processing finishes, the calculation results stored in the double data rate memory or quadruple data rate memory connected to each FPGA chip are read into local memory, and the slave FPGA chips then wait for the next control instruction from the master FPGA chip.

This completes the multithreaded distributed data computation.
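The synchronization handshake of steps S401 to S404 can be sketched in Python as a polling loop on the master side. Here a slave that is still busy with earlier work reports `"BUSY"` for a few rounds, the master proceeds only once every slave reports the same idle state, and the data is then split into contiguous chunks. The class `Slave`, the function `distributed_compute`, the state strings, and the polling limit are all hypothetical; the document does not specify the instruction format.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple


@dataclass
class Slave:
    """Hypothetical slave FPGA model; `busy_polls` simulates a chip still finishing earlier work."""
    chip_id: int
    busy_polls: int = 0
    memory: List[float] = field(default_factory=list)  # stands in for attached DDR/QDR

    def parse_sync(self) -> str:
        # S402 (slave side): parse the sync instruction and report the current state.
        if self.busy_polls > 0:
            self.busy_polls -= 1
            return "BUSY"
        return "IDLE"

    def execute(self, op: Callable[[float, float], float], param: float, data: Sequence[float]) -> None:
        # S404 (slave side): run the processing instruction, keep results in attached memory.
        self.memory = [op(x, param) for x in data]


def distributed_compute(slaves: List[Slave], op: Callable[[float, float], float], param: float,
                        data: Sequence[float], max_polls: int = 10) -> Tuple[List[float], int]:
    """Master-side control flow for S401 to S404; returns (results, number of sync rounds)."""
    for polls in range(1, max_polls + 1):
        # S401/S402: broadcast SYNC and gather every slave's state.
        states = {s.parse_sync() for s in slaves}
        # S403: continue only once all slaves report the same idle state.
        if states == {"IDLE"}:
            break
    else:
        raise TimeoutError("slave states did not become consistent")
    # S404: send instruction, parameter and one contiguous data chunk to each slave.
    size = -(-len(data) // len(slaves))  # ceiling division
    for i, s in enumerate(slaves):
        s.execute(op, param, data[i * size:(i + 1) * size])
    # Read results back from each slave's memory into the master's local memory.
    return [y for s in slaves for y in s.memory], polls


slaves = [Slave(0), Slave(1, busy_polls=2), Slave(2)]
results, rounds = distributed_compute(slaves, lambda x, k: x * k, 10.0, [1, 2, 3, 4, 5, 6])
print(results, rounds)  # [10.0, 20.0, 30.0, 40.0, 50.0, 60.0] 3
```

The timeout is a sketch-level safeguard; the method as described simply keeps waiting until the slave states are consistent.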

The FPGA-based distributed computing system can be widely applied in engineering projects with large data volumes and strict timeliness requirements. Its engineering implementation is divided into five steps. FIG. 5 schematically shows a flowchart of the engineering implementation method of a distributed computing system according to an embodiment of the present invention. Referring to FIG. 5, the implementation method includes:

S501. Evaluate processing resource requirements.

The evaluation covers storage resources and computing resources. Storage resources are the memory accessed repeatedly over the whole algorithm cycle, characterized by the maximum amount of memory accessed and the number of accesses. Computing resources cover two points: first, the computing rate, where the timeliness requirement ultimately determines the system clock selected in the FPGA design; second, the computing resource area, where the resource-scale requirement determines the number of FPGAs in the processing system and the resource utilization of each FPGA. The processing platform is then designed by combining the resource evaluation results with the distributed processing architecture.
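The evaluation in S501 reduces to a few back-of-the-envelope calculations. The Python sketch below shows one plausible way to carry them out: the minimum clock from a throughput target, the FPGA count from a logic-resource estimate and a utilization ceiling, and the memory bandwidth implied by touching a working set several times per algorithm period. The formulas are a reading of the text rather than a method the document spells out, and every number in the example is a hypothetical requirement.

```python
import math


def required_clock_hz(samples_per_s: float, samples_per_clock: int) -> float:
    """Computing rate: the timeliness target fixes the minimum FPGA system clock."""
    return samples_per_s / samples_per_clock


def required_fpga_count(required_luts: int, luts_per_fpga: int, max_utilization: float) -> int:
    """Resource area: number of FPGAs needed while keeping each device under a utilization ceiling."""
    return math.ceil(required_luts / (luts_per_fpga * max_utilization))


def required_memory_bandwidth(footprint_bytes: int, accesses: int, period_s: float) -> float:
    """Storage: bytes moved per second if the working set is accessed `accesses` times per algorithm period."""
    return footprint_bytes * accesses / period_s


# Hypothetical requirements, for illustration only.
clock = required_clock_hz(1e9, 4)                  # 1 GS/s input, 4 samples per clock
fpgas = required_fpga_count(1_500_000, 500_000, 0.7)
mem_bw = required_memory_bandwidth(64 * 2**20, 3, 0.5)
print(f"{clock / 1e6:.0f} MHz, {fpgas} FPGAs, {mem_bw / 1e9:.2f} GB/s")
# 250 MHz, 5 FPGAs, 0.40 GB/s
```

A utilization ceiling below 100% is a common engineering margin because heavily filled FPGAs are harder to route and to close timing on; the 0.7 used here is only an example value.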

S502. Analyze the algorithm to be implemented.

Design pipeline processing, parallel processing, iterative processing, and other structures; split the algorithm to be implemented and map the resulting modules onto these structures.

S503. Design the processing circuit.

Design the circuit structure corresponding to the processing flow, and design the data storage scheme.

S504. Design the control flow.

Combine the data computation flow and the data storage flow to complete the processing architecture design.

S505. Perform functional and performance tests on the completed design.

This completes the engineering implementation of the distributed computing system.

In summary, the present invention provides a distributed computing system and method. The distributed computing system is based on FPGAs and adopts a distributed master/slave FPGA architecture, which greatly improves performance in data computation and data storage. Because multiple processing structures are configured within a single FPGA-based system, pipeline, parallel, or iterative computation for data processing, image processing, or signal processing can be implemented, greatly reducing the scale of the hardware platform. By combining double data rate memories, quadruple data rate memories, and FPGA internal storage, the system can perform complex and efficient data access operations. In addition, the FPGA is fully programmable and the distributed processing architecture is formed by interconnecting multiple processors, so different processing flows can be constructed for different algorithms by modifying the programs, which gives the system strong adaptability.

Fig. 6 schematically shows a block diagram of an electronic device according to an embodiment of the invention. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.

As shown in FIG. 6, the electronic device 600 includes a processor 610 and a computer-readable storage medium 620. The electronic device 600 may perform a method according to an embodiment of the invention.

In particular, the processor 610 may comprise, for example, a general purpose microprocessor, an instruction set processor and/or related chip set and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 610 may also include onboard memory for caching purposes. Processor 610 may be a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present invention.

The computer-readable storage medium 620 may be, for example, a non-volatile computer-readable storage medium; specific examples include, but are not limited to: magnetic storage devices such as magnetic tape or hard disk drives (HDDs); optical storage devices such as compact discs (CD-ROMs); and memories such as random access memory (RAM) or flash memory.

The computer-readable storage medium 620 may comprise a computer program 621, which computer program 621 may comprise code/computer-executable instructions that, when executed by the processor 610, cause the processor 610 to perform a method according to an embodiment of the invention, or any variant thereof.

The computer program 621 may be configured with, for example, computer program code comprising computer program modules. For example, in an example embodiment, code in computer program 621 may include one or more program modules, including 621A, 621B, … …, for example. It should be noted that the division and number of the modules are not fixed, and those skilled in the art may use suitable program modules or program module combinations according to actual situations, so that the processor 610 may execute the method according to the embodiment of the present invention or any variation thereof when the program modules are executed by the processor 610.

The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A distributed computing system, the distributed computing system comprising:

a master FPGA chip and at least one slave FPGA chip, wherein the master FPGA chip is communicatively connected to the at least one slave FPGA chip, and the chips are used to implement pipeline computation, parallel computation, or iterative computation for data processing, image processing, and/or signal processing;

the master FPGA chip and the at least one slave FPGA chip are both connected with at least one double data rate memory.

2. The distributed computing system of claim 1, wherein each of the master FPGA chip and the at least one slave FPGA chip is further coupled with at least one quad data rate memory.

3. The distributed computing system of claim 1, wherein a data transpose module is further disposed between the master FPGA chip and/or the slave FPGA chip and the data rate memory connected thereto.

4. The distributed computing system of claim 1, wherein the maximum bit width of the double data rate memory is 256 bits.

5. The distributed computing system of claim 1, wherein the master FPGA chip is communicatively coupled with the at least one slave FPGA chip via a high speed data bus.

6. A multithreaded data transpose method based on the distributed computing system of any one of claims 1-5, comprising:

the master FPGA chip distributes data to the double data rate memory or quadruple data rate memory connected to each slave FPGA chip;

the master FPGA chip receives the synchronization instructions sent by the slave FPGA chips and confirms, according to the synchronization instructions, whether the states of the data rate memories are consistent;

if so, the master FPGA chip sends a read instruction to each slave FPGA chip, so that each slave FPGA chip performs a parallel transpose operation on the data in the double data rate memory or quadruple data rate memory connected to it.

7. A distributed computing method based on the distributed computing system according to any one of claims 1 to 5, comprising:

the master FPGA chip sends a synchronization instruction to each slave FPGA chip and receives the state information that each slave FPGA chip feeds back after parsing the synchronization instruction;

the master FPGA chip confirms, according to the state information, whether the states of the slave FPGA chips are consistent, and if so, sends a processing instruction, processing parameters, and the data to be processed to the slave FPGA chips to implement distributed computation of the data.

8. The distributed computing method of claim 7, wherein the implementing distributed computing of data comprises:

implementing pipeline computation, parallel computation, or iterative computation of the data.

9. The distributed computing method of claim 7, further comprising:

reading the calculation results stored in the double data rate memory or quadruple data rate memory connected to the FPGA chip into a local memory.

10. An electronic device, comprising:

one or more processors;

a memory for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of claim 6 or any of claims 7-9.