CN111090611A - Small heterogeneous distributed computing system based on FPGA - Google Patents

Small heterogeneous distributed computing system based on FPGA

Info

Publication number
CN111090611A
CN111090611A (application CN201811247613.XA)
Authority
CN
China
Prior art keywords
data
module
fpga
calculation
computing system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811247613.XA
Other languages
Chinese (zh)
Inventor
陈钰文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Xuehu Information Technology Co Ltd
Original Assignee
Shanghai Xuehu Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Xuehu Information Technology Co Ltd filed Critical Shanghai Xuehu Information Technology Co Ltd
Priority to CN201811247613.XA priority Critical patent/CN111090611A/en
Publication of CN111090611A publication Critical patent/CN111090611A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00: Digital computers in general; Data processing equipment in general
    • G06F15/76: Architectures of general purpose stored program computers
    • G06F15/78: Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807: System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G06F15/7839: Architectures comprising a single central processing unit with memory
    • G06F15/7842: Architectures with memory on one IC chip (single chip microcontrollers)
    • G06F15/7867: Architectures with reconfigurable architecture
    • G06F15/7871: Reconfiguration support, e.g. configuration loading, configuration switching, or hardware OS
    • G06F15/7878: Reconfiguration support for pipeline reconfiguration

Abstract

The invention discloses a small heterogeneous distributed computing system based on an FPGA (field programmable gate array), belonging to the technical field of computation-intensive hardware design and comprising a data input module, a data calculation module and a data return module. The data input module is used for scattering and recombining data and sending it serially to the data calculation module in a pipelined manner; the data calculation module is used for receiving data from the data input module and transmitting results to the data return module; the data return module is used for regrouping out-of-order returned data according to the sequence of the calculation output results from the preceding stage. The system can exploit the advantages of FPGA streaming computation and high throughput to the greatest extent and is well suited to high-frequency, computation-intensive workloads, and an FPGA cascade configurable strategy is adopted in the distributed core computing unit so that it can be configured according to specific computing requirements.

Description

Small heterogeneous distributed computing system based on FPGA
Technical Field
The invention relates to the technical field of computation-intensive hardware design, and in particular to a small heterogeneous distributed computing system based on an FPGA (field programmable gate array).
Background
Most existing open-source software frameworks run on an operating system, which in turn runs on hardware whose core computing unit is the CPU. Depending on the manufacturer and instruction set, CPUs can be divided into architectures such as x86, MIPS, PowerPC and ARM, but all of these are in essence von Neumann architectures: every operation is reduced to the execution of single instructions, and each instruction passes through the basic stages of fetch, decode, execute, memory access and write-back to complete its life cycle. From a microscopic perspective, therefore, every computation the CPU performs involves a relatively complex and time-consuming instruction-translation process. Moreover, instructions must be executed in order: the next instruction must wait for the previous one to complete before it can proceed, so the microscopically accumulated overhead means that macroscopic real-time, high-density computation cannot be satisfied. Although optimization techniques such as branch prediction, superscalar execution, hyper-threading and frequency scaling have been introduced to address the CPU's lack of computational performance, they are merely optimizations; the fundamental architectural problem is not eliminated.
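To make the cost of strict in-order execution concrete, here is a minimal sketch (not part of the patent; the five-stage model and one-cycle-per-stage figure are illustrative assumptions) comparing fully sequential execution with an ideal pipeline:

```python
# Hypothetical model: a CPU retiring instructions strictly in order pays the
# full stage latency for every instruction, whereas an ideal pipeline fills
# once and then retires one instruction per cycle.
STAGES = ["fetch", "decode", "execute", "memory", "writeback"]

def sequential_cycles(n_instructions, cycles_per_stage=1):
    """Total cycles when each instruction must complete before the next starts."""
    return n_instructions * len(STAGES) * cycles_per_stage

def pipelined_cycles(n_instructions, cycles_per_stage=1):
    """Ideal pipeline: pay the fill latency once, then one instruction per cycle."""
    return (len(STAGES) + n_instructions - 1) * cycles_per_stage

# 1000 instructions: 5000 cycles sequentially vs 1004 cycles ideally pipelined.
```

The roughly five-fold gap is exactly the micro-level accumulation the paragraph describes, and pipeline-style overlap of this kind is what the FPGA design later exploits.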
GPUs are also becoming more widely used as computational load and complexity increase dramatically. Compared with the CPU, the GPU has data-parallel capability that the CPU lacks and can operate on data in parallel blocks, so it achieves a higher data throughput rate and better supports large-volume streaming computation such as multimedia, image, audio and video processing. However, for most applications the GPU still runs under an operating system and must interact with the CPU, so its computation path takes an obvious detour through the CPU-based framework. More critically, the GPU offers only data parallelism; it cannot implement a deeply pipelined computation module. The data entering the GPU must be mutually independent within a single computation pass: once data items are correlated, the GPU must wait for the earlier data to be ready before the next computation can begin. Thus, although data parallelism is nominally available, it cannot be fully exploited, because correlated data can only be computed after the preceding operation completes.
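The dependency stall described above can be sketched abstractly (an illustrative model, not the patent's method): on a purely data-parallel device, each "round" can process only items whose inputs are already available, so a chain of dependent items degenerates to one item per round.

```python
# Hypothetical model of data-parallel scheduling: deps maps each item to the
# item it depends on (None = independent). One round processes every item
# whose dependency has already completed.
def rounds_needed(deps):
    done = set()
    rounds = 0
    while len(done) < len(deps):
        ready = {i for i, d in deps.items()
                 if i not in done and (d is None or d in done)}
        done |= ready
        rounds += 1
    return rounds

# Four independent items finish in 1 round; a four-item dependency chain
# needs 4 rounds despite the hardware's parallel width.
```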
The computing units of existing distributed computing systems adopt CPUs or GPUs of the von Neumann architecture. The CPU is not suitable for intensive data computation and is better suited to task scheduling; the GPU is more efficient but offers only data parallelism, and its instruction-pipeline depth is still limited, so neither is suitable for intensive computation. Existing acceleration-oriented FPGA computing modules combine high-performance FPGA chips into an FPGA computing block cascaded over the PCIe protocol, which imposes great demands on PCB design, cost and the like; moreover, this approach limits the number of FPGAs that can be integrated, and once a single FPGA in the integrated module fails, the whole system is paralyzed. In addition, the compute nodes of such distributed computing systems receive node data through the conventional CPU + NIC path.
Based on the above, the invention provides a small FPGA-based heterogeneous distributed computing system to solve these problems.
Disclosure of Invention
The object of the invention is to provide a small FPGA-based heterogeneous distributed computing system, so as to solve the problems raised in the background art: the computing units of existing distributed computing systems adopt CPUs or GPUs of the von Neumann architecture, where the CPU is not suitable for intensive data computation and is better suited to task scheduling, while the GPU is more efficient but offers only data parallelism with a still-limited pipeline depth, so neither is suitable for intensive computation; existing acceleration-oriented FPGA computing modules combine high-performance FPGA chips into an FPGA computing block cascaded over the PCIe protocol, which imposes great demands on PCB design, cost and the like, limits the number of FPGAs that can be integrated, and paralyzes the whole system once a single FPGA in the integrated module fails; and the compute nodes receive node data through the conventional CPU + NIC path.
In order to achieve the above object, the invention provides the following technical scheme: a small FPGA-based heterogeneous distributed computing system comprises a data input module, a data calculation module and a data return module;
the data input module is used for scattering and recombining data and sending it serially to the data calculation module in a pipelined manner;
the data calculation module is used for receiving data from the data input module and transmitting results to the data return module;
and the data return module is used for regrouping out-of-order returned data according to the arrival sequence of the output results computed from the preceding stage.
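The division of labor among the three modules can be illustrated with a small end-to-end sketch (illustrative only; the squaring step merely stands in for whatever the FPGA compute units actually do):

```python
# Hypothetical sketch: the input module tags and scatters work items, the
# compute units may finish in any order, and the return module regroups
# results back into the original sequence.
import random

def input_module(data):
    # Scatter: tag each item with a sequence number before distribution.
    return [(seq, item) for seq, item in enumerate(data)]

def compute_module(tagged):
    # Completion order is nondeterministic across distributed units.
    shuffled = tagged[:]
    random.shuffle(shuffled)
    return [(seq, item * item) for seq, item in shuffled]  # stand-in computation

def return_module(results):
    # Regroup: restore the original order using the sequence tags.
    return [value for _, value in sorted(results)]

assert return_module(compute_module(input_module([1, 2, 3, 4]))) == [1, 4, 9, 16]
```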
Preferably, the data input module includes, but is not limited to, CPU, FPGA and DDR hardware modules;
the FPGA module is used for receiving data and scattering and recombining the data;
the CPU module is directly connected with the FPGA module at a high speed through a QPI protocol and is used for the CPU module to rapidly and dynamically configure the FPGA module to receive and transmit data.
Preferably, the data input module further includes at least two groups of Ethernet physical interfaces, one group of which is used for receiving data;
and the other group of Ethernet physical interfaces is used for data forwarding.
Preferably, the data input module further comprises a recombination pipeline module, and the Ethernet physical interface for receiving data can deserialize serial input data and transmit the resulting parallel data to the recombination pipeline module.
Preferably, the data calculation module comprises at least one group of data calculation units, and each data calculation unit comprises a single FPGA, DDR and at least two groups of Ethernet physical interfaces.
Preferably, the data return module includes a post-stage processing module, and the post-stage processing module is configured to improve data throughput by deeply pipelining the recombined data.
Compared with the prior art, the invention has the following beneficial effects: the invention can exploit the advantages of FPGA streaming computation and high throughput to the greatest extent and is well suited to high-frequency, computation-intensive workloads; an FPGA cascade configurable strategy is adopted in the distributed core computing unit, allowing configuration according to specific computing requirements; in the data distribution module and the data return module, the FPGA communicates with the CPU over a QPI bus, so the CPU can directly access the FPGA's memory controller and directly instruct the FPGA to read and write data, saving a large amount of time compared with the traditional mode in which the CPU and the FPGA share memory; and the network protocol stack transmits and receives network packets directly in the FPGA, saving the large amount of decoding the CPU would otherwise execute during transmit/receive verification and improving the total transceiving time by an order of magnitude.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. It is obvious that the drawings described below show only some embodiments of the invention, and that those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is an overall framework diagram of a distributed heterogeneous computing system of the present invention;
FIG. 2 is a diagram of a distributed heterogeneous computing system hardware framework of the present invention;
FIG. 3 is a block diagram of the embodiment of FIG. 2;
FIG. 4 is an enlarged view of the left end of FIG. 3 in accordance with the present invention;
FIG. 5 is an enlarged view of the right end connection of FIG. 4 according to the present invention;
FIG. 6 is an enlarged view of the right end connection of FIG. 5 in accordance with the present invention;
FIG. 7 is an enlarged view of the right end connection of FIG. 6 according to the present invention;
FIG. 8 is a block diagram of a data computing unit according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIGS. 1-8, the invention provides a technical solution: a small FPGA-based heterogeneous distributed computing system comprises a data input module, a data calculation module and a data return module;
the data input module is used for scattering and recombining data and sending it serially to the data calculation module in a pipelined manner;
the data calculation module is used for receiving data from the data input module and transmitting results to the data return module;
and the data return module is used for regrouping out-of-order returned data according to the arrival sequence of the output results computed from the preceding stage.
It should be noted that the system is composed of three parts: a data input module, a data calculation module and a data return module. The input module is composed of hardware modules such as a CPU, an FPGA and DDR. After input data reach the input module over the network, they are received directly by the FPGA, then scattered and recombined, and forwarded in a pipelined manner. The CPU in the input module is directly connected to the FPGA at high speed through the QPI protocol; its role is to rapidly and dynamically configure the FPGA's transmit/receive strategies without directly participating in data reception, transmission, verification or recombination. The FPGA in the input module internally implements a complete TCP/IP protocol stack and is externally equipped with a group of Ethernet physical interfaces (two in total), one dedicated to receiving data and the other to forwarding it. At the receiving end, serial input data are deserialized into parallel data and passed to the recombination pipeline module; at the output end, before forwarding, the parallel output of the recombination pipeline module is converted back into serial data through local frequency multiplication. The recombined data are then distributed serially to the subsequent computing modules at a rate several times higher than the input rate. The computing module is composed of a group of computing units, each consisting of a single FPGA, DDR and two Ethernet physical interfaces.
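The serial-to-parallel step at the receiving end can be sketched in software (an illustrative shift-register model; the bit width and MSB-first ordering are assumptions, not taken from the patent):

```python
# Hypothetical deserializer model: serial bits arrive one per clock and are
# shifted into a register; a parallel word is emitted whenever it fills.
def deserialize(bits, width=8):
    words, shift_reg = [], []
    for b in bits:
        shift_reg.append(b)
        if len(shift_reg) == width:
            words.append(int("".join(map(str, shift_reg)), 2))  # MSB first
            shift_reg = []
    return words

# 16 serial bits become two parallel bytes: 0b00000011 and 0b11111111.
assert deserialize([0,0,0,0,0,0,1,1, 1,1,1,1,1,1,1,1]) == [3, 255]
```

The inverse step on the output side (parallel back to serial via local frequency multiplication) would simply shift each word's bits out at the multiplied clock rate.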
The data distributed from the input module reach each computing unit through a switch; the computing unit receives the data through an IP core that internally implements the TCP/IP protocol stack, passes them to a dedicated computation IP core, and after computation forwards the results through an Ethernet interface to the post-stage return module. The hardware composition of the data return module is the same as that of the data input module, but its FPGA handles out-of-order returned data; specifically, the data return module groups the data according to the arrival sequence of the output results computed by the preceding-stage computing module.
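The return module's regrouping behaves like a reorder buffer; a minimal software sketch (illustrative, assuming the input module assigns monotonically increasing sequence numbers) is:

```python
# Hypothetical reorder sketch: results carry the sequence number assigned at
# the input stage; early arrivals are buffered, and a contiguous run is
# released as soon as the next expected number lands.
def reorder_stream(arrivals):
    """arrivals: iterable of (seq, result) in completion order.
    Yields results strictly in sequence order."""
    pending, expected = {}, 0
    for seq, result in arrivals:
        pending[seq] = result
        while expected in pending:  # release every contiguous buffered result
            yield pending.pop(expected)
            expected += 1

out_of_order = [(2, "c"), (0, "a"), (3, "d"), (1, "b")]
assert list(reorder_stream(out_of_order)) == ["a", "b", "c", "d"]
```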
In still further embodiments, the data input module includes, but is not limited to, CPU, FPGA and DDR hardware modules;
the FPGA module is used for receiving data and scattering and recombining the data;
the CPU module is directly connected with the FPGA module at a high speed through a QPI protocol and is used for the CPU module to rapidly and dynamically configure the FPGA module to receive and transmit data.
In a further embodiment, the data input module further includes at least two groups of Ethernet physical interfaces, one group of which is used for receiving data;
and the other group of Ethernet physical interfaces is used for data forwarding.
In a further embodiment, the data input module further comprises a recombination pipeline module, and the Ethernet physical interface for receiving data may deserialize serial input data and pass the resulting parallel data to the recombination pipeline module.
In a further embodiment, the data calculation module includes at least one group of data calculation units, and each data calculation unit includes a single FPGA, DDR and at least two groups of Ethernet physical interfaces.
In a further embodiment, the data return module includes a post-stage processing module, and the post-stage processing module is configured to improve the data throughput of the recombined data by means of deep pipelining;
as shown in fig. 2, the hardware framework of the distributed heterogeneous computing system designed by the present invention includes a front-end data distribution module, a data computing unit, and a data returning unit. Fig. 3 shows a specific embodiment of fig. 2. The data distribution module adopts a CPU + FPGA architecture, and the CPU and the FPGA are connected through a PCIE or QPI bus. The front-end network data is input into the data distribution module through a route or a switch, and the FPGA in the data distribution module and the cascaded DDR thereof are used for caching together. If the later-stage computing module does not need to recombine the data at the moment, the FPGA directly distributes the cached data in parallel through the data distribution IP unit integrated inside. If the later-stage FPGA computing unit needs to recombine the data before computing, the data is directly connected to the data recombination module in series behind the FPGA buffer module and then forwarded to the later-stage computing unit. If the data recombination is complex and the recombination strategy needs to be changed dynamically, the operation required by the recombination can be converted into an instruction corresponding to an MIG module inside the FPGA, and the instruction is directly sent to the FPGA through a PCIE or QPI bus directly connected with the CPU and the FPGA, so that the FPGA can change the strategy of the data recombination rapidly while buffering data efficiently. The data calculation unit is completely composed of a plurality of groups of single FPGA, and the total amount of the data calculation unit is dynamically distributed according to the actual calculation amount or the communication task. The data computation unit in the monolithic FPGA is completed by an internal unique IPCore. 
The hardware composition of the data return unit is identical to that of the data distribution module; the differences lie in the MIG instructions the CPU transmits to the FPGA and in the specific design of the data-result recombination module and result return module inside the FPGA.
As shown in FIG. 3, the data distribution module is cascaded with the data computation module through a switch or other network device, and the data computing unit is cascaded with the post-stage data return module through another switch or other network device; two sets of network devices are used in order to fully match the deep pipeline structure of the computing-unit module and thereby guarantee the system's high data throughput.
As shown in FIGS. 4-7, the data distribution and data return modules share the same hardware architecture, with two network physical interfaces (RJ45, or ST and SC) provided on the FPGA's periphery. The data distribution module receives network computing data through one port, performs data recombination in a deeply pipelined manner through an internal dedicated IP core, and forwards the data to the post-stage processing module through the other port; the dual-port, deeply pipelined design greatly improves data throughput. For the data return module, the internal dedicated IP core differs from that of the data-receiving module: its function is to repackage the computation results that arrive out of order back into regular order, attach labels to them, and then transmit them back to the post-stage module. Both the data distribution and data return modules implement the network protocol stack inside the FPGA. As shown in FIG. 8, the data computing unit consists of a single FPGA plus dual network interfaces. According to actual requirements, a single computing unit can be deployed as a single node, or, depending on the complexity of the computing task, units can be locally interconnected into a star or ring network; the resulting local network, together with the other nodes, forms the computing-unit part of the computing system. The computing-unit part is thus structurally configurable to match the needs of the computing task, and inside each computing-unit node a dedicated IP core performs pipelined parallel computation.
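The two local interconnect shapes mentioned above can be written down as link sets (a purely illustrative sketch; node counts and numbering are arbitrary):

```python
# Hypothetical topology builders for the configurable compute-unit network.
def star_links(n):
    """Star: node 0 is the hub; every other node links only to it."""
    return {(0, i) for i in range(1, n)}

def ring_links(n):
    """Ring: each node links to its successor, wrapping around."""
    return {(i, (i + 1) % n) for i in range(n)}

assert star_links(4) == {(0, 1), (0, 2), (0, 3)}
assert ring_links(4) == {(0, 1), (1, 2), (2, 3), (3, 0)}
```

A star minimizes hop count through a hub at the cost of a bottleneck, while a ring spreads traffic but grows the worst-case path length, which is why the choice is left to the complexity of the computing task.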
In the description herein, references to the description of "one embodiment," "an example," "a specific example" or the like are intended to mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise embodiments disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims (6)

1. A small FPGA-based heterogeneous distributed computing system, characterized in that it comprises a data input module, a data calculation module and a data return module;
the data input module is used for scattering and recombining data and sending it serially to the data calculation module in a pipelined manner;
the data calculation module is used for receiving data from the data input module and transmitting results to the data return module;
and the data return module is used for regrouping out-of-order returned data according to the arrival sequence of the output results computed from the preceding stage.
2. The small FPGA-based heterogeneous distributed computing system of claim 1, wherein: the data input module includes, but is not limited to, CPU, FPGA and DDR hardware modules;
the FPGA module is used for receiving data and scattering and recombining the data;
the CPU module is directly connected with the FPGA module at a high speed through a QPI protocol and is used for the CPU module to rapidly and dynamically configure the FPGA module to receive and transmit data.
3. The small FPGA-based heterogeneous distributed computing system of claim 2, wherein: the data input module further comprises at least two groups of Ethernet physical interfaces, one group of which is used for receiving data;
and the other group of Ethernet physical interfaces is used for data forwarding.
4. The small FPGA-based heterogeneous distributed computing system of claim 3, wherein: the data input module further comprises a recombination pipeline module, and the Ethernet physical interface for receiving data can deserialize serial input data and transmit the resulting parallel data to the recombination pipeline module.
5. The small FPGA-based heterogeneous distributed computing system of claim 1, wherein: the data calculation module comprises at least one group of data calculation units, and each data calculation unit comprises a single FPGA, DDR and at least two groups of Ethernet physical interfaces.
6. The small FPGA-based heterogeneous distributed computing system of claim 1, wherein: the data return module comprises a post-stage processing module, and the post-stage processing module is used for improving the data throughput of the recombined data in a deep-pipeline manner.
CN201811247613.XA 2018-10-24 2018-10-24 Small heterogeneous distributed computing system based on FPGA Pending CN111090611A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811247613.XA CN111090611A (en) 2018-10-24 2018-10-24 Small heterogeneous distributed computing system based on FPGA

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811247613.XA CN111090611A (en) 2018-10-24 2018-10-24 Small heterogeneous distributed computing system based on FPGA

Publications (1)

Publication Number Publication Date
CN111090611A (en) 2020-05-01

Family

ID=70392706

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811247613.XA Pending CN111090611A (en) 2018-10-24 2018-10-24 Small heterogeneous distributed computing system based on FPGA

Country Status (1)

Country Link
CN (1) CN111090611A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114531459A (en) * 2020-11-03 2022-05-24 深圳市明微电子股份有限公司 Cascade equipment parameter self-adaptive obtaining method, device, system and storage medium
WO2023093043A1 (en) * 2021-11-26 2023-06-01 浪潮电子信息产业股份有限公司 Data processing method and apparatus, and medium

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090109974A1 (en) * 2007-10-31 2009-04-30 Shetty Suhas A Hardware Based Parallel Processing Cores with Multiple Threads and Multiple Pipeline Stages
US20090177832A1 (en) * 2007-11-12 2009-07-09 Supercomputing Systems Ag Parallel computer system and method for parallel processing of data
CN104657330A (en) * 2015-03-05 2015-05-27 浪潮电子信息产业股份有限公司 High-performance heterogeneous computing platform based on x86 architecture processor and FPGA (Field Programmable Gate Array)
CN106339351A (en) * 2016-08-30 2017-01-18 浪潮(北京)电子信息产业有限公司 SGD (Stochastic Gradient Descent) algorithm optimization system and method
CN107066802A (en) * 2017-01-25 2017-08-18 人和未来生物科技(长沙)有限公司 A kind of heterogeneous platform calculated towards gene data
CN107145467A (en) * 2017-05-13 2017-09-08 贾宏博 A kind of distributed computing hardware system in real time
CN107273331A (en) * 2017-06-30 2017-10-20 山东超越数控电子有限公司 A kind of heterogeneous computing system and method based on CPU+GPU+FPGA frameworks
CN108052839A (en) * 2018-01-25 2018-05-18 知新思明科技(北京)有限公司 Mimicry task processor
CN108268278A (en) * 2016-12-30 2018-07-10 英特尔公司 Processor, method and system with configurable space accelerator
CN108459988A (en) * 2017-02-17 2018-08-28 英特尔公司 Duration direct distance input and output
CN108563808A (en) * 2018-01-05 2018-09-21 中国科学技术大学 The design method of heterogeneous reconfigurable figure computation accelerator system based on FPGA

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHI ZHANG et al.: "High Throughput Large Scale Sorting on a CPU-FPGA Heterogeneous Platform" *
王慕所: "Research on Component-Oriented Communication Middleware Technology" *
高子航: "Research on Real-Time Packet Reassembly and Multi-Protocol Transmission Technology" *

Similar Documents

Publication Publication Date Title
CN104820657A (en) Inter-core communication method and parallel programming model based on embedded heterogeneous multi-core processor
US10140124B2 (en) Reconfigurable microprocessor hardware architecture
CN110610236A (en) Device for executing neural network operation
US11392740B2 (en) Dataflow function offload to reconfigurable processors
JP7389231B2 (en) synchronous network
CN102135950A (en) On-chip heterogeneous multi-core system based on star type interconnection structure, and communication method thereof
CN111090611A (en) Small heterogeneous distributed computing system based on FPGA
KR20210029725A (en) Data through gateway
WO2021201997A1 (en) Deep neural network accelerator with independent datapaths for simultaneous processing of different classes of operations
He et al. Accl: Fpga-accelerated collectives over 100 gbps tcp-ip
CN113114593A (en) Dual-channel router in network on chip and routing method thereof
Haghi et al. A reconfigurable compute-in-the-network fpga assistant for high-level collective support with distributed matrix multiply case study
CN113407479A (en) Many-core architecture embedded with FPGA and data processing method thereof
CN101707599A (en) DSP based Ethernet communication method in fault recording system
US20210406214A1 (en) In-network parallel prefix scan
CN103916316A (en) Linear speed capturing method of network data packages
US8589584B2 (en) Pipelining protocols in misaligned buffer cases
US10445099B2 (en) Reconfigurable microprocessor hardware architecture
CN114138707B (en) Data transmission system based on FPGA
CN112673351A (en) Streaming engine
Gao et al. Impact of reconfigurable hardware on accelerating mpi_reduce
Zhu et al. BiLink: A high performance NoC router architecture using bi-directional link with double data rate
US20110078410A1 (en) Efficient pipelining of rdma for communications
CN113704169A (en) Embedded configurable many-core processor
Ueno et al. VCSN: Virtual circuit-switching network for flexible and simple-to-operate communication in HPC FPGA cluster

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination