CN112035898A - Multi-node multi-channel high-speed parallel processing method and system - Google Patents

Multi-node multi-channel high-speed parallel processing method and system

Info

Publication number
CN112035898A
CN112035898A (application CN202010844411.4A)
Authority
CN
China
Prior art keywords
processed
data packet
reverse
host
memory nodes
Prior art date
Legal status
Withdrawn
Application number
CN202010844411.4A
Other languages
Chinese (zh)
Inventor
吴世勇
苏庆会
李银龙
王凯霖
王斌
冯驰
王中原
卫志刚
徐诺
姬少锋
Current Assignee
Zhengzhou Xinda Jiean Information Technology Co Ltd
Original Assignee
Zhengzhou Xinda Jiean Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhengzhou Xinda Jiean Information Technology Co Ltd filed Critical Zhengzhou Xinda Jiean Information Technology Co Ltd
Priority to CN202010844411.4A
Publication of CN112035898A
Current legal status: Withdrawn

Classifications

    • G06F 21/71: Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer, to assure secure computing or processing of information
    • G06F 13/28: Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access (DMA), cycle steal
    • G06F 21/602: Protecting data; providing cryptographic facilities or services
    • G06F 21/85: Protecting input, output or interconnection devices, e.g. bus-connected or in-line devices
    • G06F 5/06: Methods or arrangements for data conversion without changing the order or content of the data handled, for changing the speed of data flow, i.e. speed regularising or timing, e.g. delay lines, FIFO buffers; over- or underrun control therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a multi-node multi-channel high-speed parallel processing method and system. The method comprises the following steps: selecting one virtual channel from a plurality of virtual channels as the target virtual channel based on the task requirement of a to-be-processed data packet, and determining the corresponding host forward buffer and host reverse buffer as the target buffers based on the target virtual channel; allocating the required forward memory nodes and reverse memory nodes for the to-be-processed data packet from the target buffers, the host writing the to-be-processed data packet into the allocated forward memory node among the m forward memory nodes and writing a command word into the corresponding command word FIFO; and, when the forward DMA module polls that command word FIFO, determining the address information of the allocated forward memory node and the length of the to-be-processed data packet based on the command word information and transmitting the to-be-processed data packet to the algorithm module; and so on. The invention enables high-speed transmission and high-speed processing of data.

Description

Multi-node multi-channel high-speed parallel processing method and system
Technical Field
The invention relates to the field of computer technology, and in particular to a multi-node multi-channel high-speed parallel processing method and system.
Background
An FPGA (Field-Programmable Gate Array) is a further development of programmable devices such as PAL (Programmable Array Logic), GAL (Generic Array Logic) and CPLD (Complex Programmable Logic Device). As a semi-custom circuit in the field of application-specific integrated circuits, it overcomes both the inflexibility of fully custom circuits and the limited gate count of earlier programmable devices.
At present, the main control function of an encryption card is usually implemented in an FPGA. When data on the host needs to be sent to the encryption card for encryption and decryption, the requirements on the algorithms used are generally diverse, covering both symmetric and asymmetric algorithms. If the data to be encrypted and decrypted is transmitted and received over a single channel, data congestion is likely when the data volume is large. Moreover, if a large amount of data from one task occupies the front of the data stream, the data of other tasks can only be transmitted after that task is finished, which easily leaves part of the processing units in the FPGA idle and keeps resource utilization low. In some scenarios, a task may need to encrypt and decrypt a group of ordered data packets whose sizes differ, so the time spent on each packet also differs; the group of ordered packets then easily becomes disordered after encryption and decryption, and this packet-ordering problem is difficult to solve with the traditional single-channel mode.
In addition, if several tasks of the same type must be processed over the same channel within the same time period and the data of one task occupies the processing channel, the data of other tasks awaiting processing, especially urgent task data, can only be sent after the previous task's data has been processed. If the previous task carries a large amount of data, the processing channel is occupied for a long time and the other tasks become congested. In other words, this processing mode degrades the transmission efficiency and processing progress of the other pending task data, especially urgent task data, and thus lowers the processing efficiency of the system.
Disclosure of Invention
In view of the above problems, it is desirable to provide a multi-node multi-channel high-speed parallel processing method and system, which support multi-channel data transmission and implement quick and orderly transmission and processing of multi-task data.
A first aspect of the present invention provides a multi-node multi-channel high-speed parallel processing method, including the following steps:
selecting one virtual channel from a plurality of virtual channels as the target virtual channel based on the task requirement of a to-be-processed data packet, and determining the corresponding host forward buffer and host reverse buffer as the target buffers based on the target virtual channel; wherein a plurality of virtual channels are constructed between the host and the FPGA chip, the plurality of forward memory nodes in the host forward buffer of the same virtual channel correspond one-to-one with the plurality of reverse memory nodes in the host reverse buffer, the forward memory nodes are each used to cache a to-be-processed data packet for the FPGA chip to read, and the reverse memory nodes are each used to receive a completed data packet processed by the FPGA chip;
allocating the required forward memory nodes and reverse memory nodes for the to-be-processed data packet from the target buffers; the host writes the to-be-processed data packet into the allocated forward memory node among the m forward memory nodes and writes a command word into the corresponding command word FIFO; when the forward DMA module polls that command word FIFO, it determines the address information of the allocated forward memory node and the length of the to-be-processed data packet based on the command word information, and transmits the to-be-processed data packet to the algorithm module;
the algorithm module receives the to-be-processed data packet, performs operation processing to obtain the corresponding completed data packet, and writes a status word into the corresponding status word FIFO; when the reverse DMA module polls that status word FIFO, it determines the address information of the allocated reverse memory node and the length of the completed data packet based on the status word information, and writes the completed data packet into the allocated reverse memory node.
A second aspect of the invention provides a multi-node multi-channel high-speed parallel processing system, which comprises an FPGA chip and a host; the FPGA chip is communicatively connected to the host, and a plurality of virtual channels are constructed between them to transmit the data packets of different tasks;
the host comprises a plurality of host forward buffers and a plurality of host reverse buffers, and the host forward buffers, the host reverse buffers and the virtual channels correspond one-to-one; the FPGA chip comprises a DMA module, a plurality of command word FIFOs, a plurality of status word FIFOs and an algorithm module;
each host forward buffer comprises a plurality of forward memory nodes and each host reverse buffer comprises a plurality of reverse memory nodes; the plurality of forward memory nodes in the host forward buffer of the same virtual channel correspond one-to-one with the plurality of reverse memory nodes in the host reverse buffer; the forward memory nodes are each used to cache a to-be-processed data packet for the FPGA chip to read, and the reverse memory nodes are each used to receive a completed data packet processed by the FPGA chip;
the DMA module comprises a forward DMA module and a reverse DMA module; the forward DMA module polls and reads the plurality of command word FIFOs, reads the to-be-processed data packet in the corresponding forward memory node through the corresponding virtual channel based on the command word information, and transmits it to the algorithm module; the reverse DMA module polls and reads the plurality of status word FIFOs, and writes the completed data packet processed by the algorithm module into the corresponding reverse memory node through the corresponding virtual channel based on the status word information; the algorithm module is used to receive the to-be-processed data packet and perform operation processing to obtain the corresponding completed data packet;
and, within the same time period, when the FPGA chip and the host process to-be-processed data packets required by the same task, the steps of the above multi-node multi-channel high-speed parallel processing method are executed.
Compared with the prior art, the present invention has the following prominent substantive features and notable advantages:
1) The invention provides a multi-node multi-channel high-speed parallel processing method and system that process multiple kinds of tasks in parallel through a plurality of virtual channels and process multiple data packets of the same task in parallel through the different nodes of each virtual channel, thereby improving the processing efficiency of the system. The host only needs to be concerned with the corresponding forward/reverse memory nodes, the DMA module is responsible for reading and writing data to each forward/reverse memory node at high speed, and the multiple algorithms of the algorithm module perform parallel high-speed operation processing; the nodes, the DMA module and the algorithm module are independent of one another and operate asynchronously. Compared with the traditional synchronous mode of operation, the invention improves efficiency while maximizing the utilization of every resource;
2) the required forward memory nodes and reverse memory nodes are allocated to a to-be-processed data packet from the target buffer based on the length of the to-be-processed data packet or the calculated length of the completed data packet, which avoids the disorder caused by a mismatch between the length of the to-be-processed data packet and a forward memory node, or between the length of the completed data packet and a reverse memory node;
3) if the total number of forward memory nodes currently required is greater than or equal to the preset value m, the sending order of the to-be-processed data packets is determined based on their urgency, which solves the congestion of other task data, especially urgent task data, caused by a previous task with a large data volume occupying the processing channel for a long time;
4) when a current to-be-processed data packet is being processed and a new to-be-processed data packet required by the same task is received, the processing mode is chosen based on the relationship between the number of forward memory nodes required by the new packet and the number of currently idle forward memory nodes in the target host forward buffer; this achieves pipelined processing of to-be-processed data packets of the same task and reduces the host overhead when handling such requests;
5) the completed data can be kept in order while high-speed transmission and high-speed processing of data are achieved, solving the disorder that a traditional algorithm module produces after operating in parallel on a group of ordered to-be-processed data packets;
6) the number of virtual channels and nodes can be expanded according to actual needs, so the method is suitable for more application scenarios and has high applicability;
7) the algorithm module comprises a plurality of algorithm units that support parallel operation on multiple tasks, and each algorithm unit in turn comprises a plurality of algorithm subunits that support parallel operation on multiple data packets of the same task, so the operation efficiency of the algorithm module and the utilization rate of each algorithm are both high.
Description of the drawings:
FIG. 1 is a block diagram of a multi-node, multi-channel, high-speed parallel processing system of the present invention;
FIG. 2 is a schematic diagram of multi-node parallel processing for a channel according to the present invention;
FIG. 3 is a flow chart of a multi-node multi-channel high-speed parallel processing method according to the invention.
Detailed description of the embodiments:
in order to make the present invention clearer, the technical solution of the present invention is further described in detail by the following embodiments.
FIG. 3 is a flow chart of a multi-node multi-channel high-speed parallel processing method according to the invention.
As shown in fig. 3, a first aspect of the present invention provides a multi-node multi-channel high-speed parallel processing method, including the following steps:
selecting one virtual channel from a plurality of virtual channels as the target virtual channel based on the task requirement of a to-be-processed data packet, and determining the corresponding host forward buffer and host reverse buffer as the target buffers based on the target virtual channel; wherein a plurality of virtual channels are constructed between the host and the FPGA chip, the plurality of forward memory nodes in the host forward buffer of the same virtual channel correspond one-to-one with the plurality of reverse memory nodes in the host reverse buffer, the forward memory nodes are each used to cache a to-be-processed data packet for the FPGA chip to read, and the reverse memory nodes are each used to receive a completed data packet processed by the FPGA chip; m forward memory nodes are preset in each host forward buffer and m reverse memory nodes are preset in each host reverse buffer; the target buffers comprise a target host forward buffer and a target host reverse buffer;
allocating the required forward memory nodes and reverse memory nodes for the to-be-processed data packet from the target buffers; the host writes the to-be-processed data packet into the allocated forward memory node among the m forward memory nodes and writes a command word into the corresponding command word FIFO; when the forward DMA module polls that command word FIFO, it determines the address information of the allocated forward memory node and the length of the to-be-processed data packet based on the command word information, and transmits the to-be-processed data packet to the algorithm module;
the algorithm module receives the to-be-processed data packet, performs operation processing to obtain the corresponding completed data packet, and writes a status word into the corresponding status word FIFO; when the reverse DMA module polls that status word FIFO, it determines the address information of the allocated reverse memory node and the length of the completed data packet based on the status word information, and writes the completed data packet into the allocated reverse memory node.
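To make the data layout behind these steps concrete, the following C sketch shows a hypothetical host-side view of one virtual channel: m forward memory nodes paired one-to-one with m reverse memory nodes. All names, field sizes and the node capacity are illustrative assumptions, not definitions taken from the patent.

/* Hypothetical host-side view of one virtual channel. The i-th command-word
 * and status-word FIFOs live on the FPGA; the host only writes command words
 * and later reads completed packets back from the reverse memory nodes. */
#include <stdint.h>

#define NODE_SIZE 4096u           /* assumed storage capacity of one memory node */
#define M_NODES   16u             /* assumed value of "m" nodes per channel      */

typedef struct {
    uint8_t  data[NODE_SIZE];     /* packet (or sub-packet) payload              */
    uint32_t length;              /* valid bytes currently held in this node     */
    int      in_use;              /* 1 while allocated to a pending packet       */
} mem_node_t;

typedef struct {
    mem_node_t forward[M_NODES];  /* host forward buffer: cached for FPGA reads  */
    mem_node_t reverse[M_NODES];  /* host reverse buffer: filled by reverse DMA  */
} virtual_channel_t;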
Further, when allocating the required forward memory nodes and reverse memory nodes for a to-be-processed data packet from the target buffers, the following is executed: the length of the to-be-processed data packet of an application subject is read, and it is judged whether that length is less than or equal to the storage capacity of one forward memory node;
if the length of the to-be-processed data packet is less than or equal to the storage capacity of one forward memory node, a first application request is generated based on the to-be-processed data packet, and the host allocates one forward memory node and one reverse memory node to the to-be-processed data packet according to the first application request of the application subject;
if the length of the to-be-processed data packet is greater than the storage capacity of one forward memory node, the to-be-processed data is split into a group of ordered to-be-processed sub data packets, and a second application request is generated based on the sub data packets; the host allocates a group of ordered forward memory nodes and reverse memory nodes to the to-be-processed data packet according to the second application request.
It should be noted that the present invention allocates the required forward memory nodes and reverse memory nodes to the to-be-processed data packet from the target buffer based on the length of the to-be-processed data packet, so that the packet length matches the forward memory nodes and the disorder caused by a mismatch between them is avoided.
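The allocation rule just described can be summarized in a few lines of C. The function below is a minimal sketch under the assumption that the node capacity is known to the host; the enum values, parameter names and the ceiling division are illustrative, not the patent's wording.

#include <stddef.h>

typedef enum { REQ_SINGLE_NODE = 1, REQ_ORDERED_GROUP = 2 } request_kind_t;

static request_kind_t choose_request(size_t packet_len, size_t node_capacity,
                                     size_t *nodes_needed)
{
    if (packet_len <= node_capacity) {          /* first application request   */
        *nodes_needed = 1;
        return REQ_SINGLE_NODE;
    }
    /* second application request: split into ceil(len / capacity) ordered
     * sub-packets, each assigned its own forward/reverse node pair           */
    *nodes_needed = (packet_len + node_capacity - 1) / node_capacity;
    return REQ_ORDERED_GROUP;
}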
In one embodiment, based on the task requirement of a to-be-processed data packet a_ij, the i-th virtual channel is selected from the plurality of virtual channels as the target virtual channel, and the i-th host forward buffer and the i-th host reverse buffer are determined as the target buffers based on the target virtual channel. If the number of to-be-processed data packets is 1 and the packet length is less than or equal to the storage capacity of one forward memory node, one forward memory node (the j-th forward memory node) and one reverse memory node (the j-th reverse memory node) are allocated to the to-be-processed data packet according to the first application request. The host writes the to-be-processed data packet a_ij into the j-th forward memory node among the m forward memory nodes and writes a command word into the i-th command word FIFO; when the forward DMA module polls the i-th command word FIFO, it determines the address information of the j-th forward memory node and the length of the to-be-processed data packet a_ij based on the command word information, and transmits a_ij to the algorithm module; the algorithm module receives a_ij, performs operation processing to obtain the corresponding completed data packet A_ij, and writes a status word into the i-th status word FIFO; when the reverse DMA module polls the i-th status word FIFO, it determines the address information of the j-th reverse memory node and the length of A_ij based on the status word information, and writes A_ij into the j-th reverse memory node.
It can be understood that the host writes the to-be-processed data packet a_ij into the j-th forward memory node among the m forward memory nodes while writing the command word b_ij into the i-th command word FIFO; the command word b_ij comprises the length of the to-be-processed data packet a_ij and the address information of the j-th forward memory node in which it is stored. When the forward DMA module polls the i-th command word FIFO and the command word b_ij is updated to the front of that FIFO, the forward DMA module reads the to-be-processed data packet a_ij from the j-th forward memory node based on b_ij, attaches the information related to j to a_ij, and transmits them together to the i-th FPGA forward buffer to wait for the algorithm module to read;
the algorithm module receives the to-be-processed data packet a_ij and performs operation processing to obtain the corresponding completed data packet A_ij; the algorithm module transmits A_ij to the i-th FPGA reverse buffer while writing the status word B_ij into the i-th status word FIFO; the status word B_ij comprises the length of the completed data packet A_ij and the address information of the j-th reverse memory node in the i-th host reverse buffer to which it is to be returned. When the reverse DMA module polls the i-th status word FIFO and the status word B_ij is updated to the front of that FIFO, the reverse DMA module reads the completed data packet A_ij from the i-th FPGA reverse buffer based on B_ij, determines the address information of the j-th reverse memory node from the j-related information carried in B_ij, and writes A_ij into the j-th reverse memory node.
In another embodiment, based on the task requirement of a to-be-processed data packet, the i-th virtual channel is selected from the plurality of virtual channels as the target virtual channel, and the i-th host forward buffer and the i-th host reverse buffer are determined as the target buffers based on the target virtual channel. If the number of to-be-processed data packets is 1 and the packet length is greater than the storage capacity of one forward memory node, the to-be-processed data packet is split into a group of ordered to-be-processed sub data packets, and a second application request is generated based on the sub data packets; a group of ordered forward memory nodes and reverse memory nodes is allocated to the to-be-processed data packet according to the second application request of the application subject; the application subject writes the corresponding to-be-processed sub data packets into the group of ordered forward memory nodes respectively; the forward DMA module determines the packet lengths and the corresponding forward memory nodes based on the command word information in the command word FIFO, then reads the to-be-processed sub data packet of each forward memory node and transmits it to the algorithm module; the algorithm module receives the group of ordered to-be-processed sub data packets and performs parallel operation processing to obtain the corresponding completed sub data packets; the reverse DMA module determines the packet lengths and the corresponding reverse memory nodes based on the status word information in the status word FIFO, and then writes each completed sub data packet into the corresponding reverse memory node, so that a group of ordered completed data packets is reassembled.
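A minimal sketch of the host-side splitting described above, assuming a fixed node capacity of 4096 bytes and the hypothetical array layout from the earlier sketch; the j-th sub-packet goes into the j-th allocated forward node, and the reverse side reassembles the completed packet in the same index order.

#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Splits one oversized packet into ordered sub-packets, writing the j-th
 * chunk into the j-th forward node; returns how many nodes were filled.    */
static size_t split_into_nodes(const uint8_t *pkt, size_t len,
                               uint8_t nodes[][4096], uint32_t node_len[],
                               size_t max_nodes)
{
    size_t used = 0;
    for (size_t off = 0; off < len && used < max_nodes; used++) {
        size_t chunk = (len - off < 4096) ? (len - off) : 4096;
        memcpy(nodes[used], pkt + off, chunk);   /* j-th sub-packet -> j-th node */
        node_len[used] = (uint32_t)chunk;
        off += chunk;
    }
    return used;
}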
It should be noted that ciphertext data is often longer than the original plaintext data after encryption; if the storage capacity of a reverse memory node is greater than or equal to the length of the completed data packet or completed sub data packet, no disorder occurs. In practical applications, however, the storage capacity of a reverse memory node may be smaller than the completed data packet or completed sub data packet; the reverse memory node then cannot hold the whole completed packet, which can scramble the data after FPGA processing.
Therefore, the invention presets, based on historical experience, a proportional relationship between the length of the to-be-processed data packet (the length of the original plaintext data) and the length of the completed data packet or completed sub data packet after processing by the algorithm module. When an encryption service requirement of an application subject is received, the length of the encrypted completed data packet or completed sub data packet can be calculated from the length of the to-be-processed data packet and the preset proportional relationship, and the forward memory nodes and reverse memory nodes are then allocated based on that calculated length.
Specifically, when allocating the required forward memory nodes and reverse memory nodes for a to-be-processed data packet from the target buffers, the following is executed: the length of the to-be-processed data packet of the application subject is read, the length of the completed data packet is calculated from the length of the to-be-processed data packet and the preset proportional relationship, and it is judged whether the calculated length of the completed data packet is less than or equal to the storage capacity of one reverse memory node;
if the calculated length of the completed data packet is less than or equal to the storage capacity of one reverse memory node, a third application request is generated based on the calculated length, and the host allocates one forward memory node and one reverse memory node to the to-be-processed data packet according to the third application request;
if the calculated length of the completed data packet is greater than the storage capacity of one reverse memory node, a fourth application request is generated based on the calculated length, and the host allocates a group of ordered forward memory nodes and reverse memory nodes to the to-be-processed data packet according to the fourth application request.
It can be understood that the invention may allocate the required forward memory nodes and reverse memory nodes to the to-be-processed data packet from the target buffer based on the length of the to-be-processed data packet or the calculated length of the completed data packet; it may also allocate them based on both lengths together.
It can also be understood that if the number of forward memory nodes obtained from the relationship between the to-be-processed packet length and the storage capacity of one forward memory node differs from the number of reverse memory nodes obtained from the relationship between the calculated completed packet length and the storage capacity of one reverse memory node, the allocation policy with the larger number of forward and reverse memory nodes is selected as the target allocation policy. That is, the forward memory nodes and reverse memory nodes are allocated according to the maximum data length before and after processing, which prevents data disorder.
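A small C sketch of this sizing rule, under assumed values: a 4096-byte node capacity and a preset plaintext-to-ciphertext ratio of 8:9. Both numbers are illustrative stand-ins; the patent does not fix them.

#include <stddef.h>

#define NODE_CAPACITY 4096u
#define RATIO_NUM 9u          /* assumed preset ratio: completed length ~= 9/8 of plaintext */
#define RATIO_DEN 8u

static size_t nodes_for(size_t len)            /* ceil(len / NODE_CAPACITY) */
{
    return (len + NODE_CAPACITY - 1) / NODE_CAPACITY;
}

static size_t nodes_to_allocate(size_t plaintext_len)
{
    size_t completed_len = (plaintext_len * RATIO_NUM + RATIO_DEN - 1) / RATIO_DEN;
    size_t fwd = nodes_for(plaintext_len);      /* based on to-be-processed length        */
    size_t rev = nodes_for(completed_len);      /* based on estimated completed length    */
    return fwd > rev ? fwd : rev;               /* larger allocation wins, preventing disorder */
}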
It should be noted that, when more than one to-be-processed data packet corresponds to the target virtual channel within the same time period, determining the sending order by packet generation time alone can again lead to the problem that a previous task with a large data volume occupies the processing channel for a long time and congests other task data, especially urgent task data. Therefore, when the total number of forward memory nodes currently required is greater than or equal to the preset value m, the invention determines the sending order of the to-be-processed data packets based on their urgency.
Specifically, when allocating the required forward memory nodes and reverse memory nodes for the to-be-processed data packets from the target buffers, the following is executed: when the number of to-be-processed data packets corresponding to the target virtual channel (the i-th virtual channel) within the same time period is two or more, the host obtains the total number of forward memory nodes currently required from the ratio of each packet's length to the storage capacity of one forward memory node, and judges whether this total for the target virtual channel (the i-th virtual channel) is greater than or equal to the preset value m;
if the total number of forward memory nodes currently required for the target virtual channel is greater than or equal to the preset value m, the urgency of each to-be-processed data packet is calculated, the sending order of the packets is determined based on their urgency, and the packets are written into the allocated forward memory nodes in that sending order;
if the total number of forward memory nodes currently required is less than the preset value m, the to-be-processed data packets are written into the allocated forward memory nodes in the order of their generation times.
Further, when calculating the urgency of each to-be-processed data packet, the following is executed: the priority of the application subject that generated the packet is read, and the urgency of the packets is ordered from high to low according to the priority of the application subjects from high to low; if application subjects of the same priority generate to-be-processed data packets of different sizes, the urgency of those packets is ordered from low to high according to their sizes from small to large.
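As a rough illustration, this ordering rule could be expressed as a comparator over pending packets. The struct, the use of qsort and the exact tie-break direction are assumptions layered on the text above, not part of the patent.

#include <stdlib.h>
#include <stdint.h>

typedef struct {
    int      app_priority;   /* higher value = higher-priority application subject */
    uint32_t length;         /* to-be-processed packet length in bytes             */
} pending_pkt_t;

/* Most urgent packet sorts first: priority is the primary key; within one
 * priority the tie is broken by packet size, following the size-based ordering
 * stated above (interpreted here as larger = more urgent).                    */
static int by_urgency(const void *pa, const void *pb)
{
    const pending_pkt_t *a = pa, *b = pb;
    if (a->app_priority != b->app_priority)
        return (b->app_priority > a->app_priority) - (b->app_priority < a->app_priority);
    return (b->length > a->length) - (b->length < a->length);
}

/* Usage: qsort(pkts, n, sizeof(pending_pkt_t), by_urgency); the packets are
 * then written into the allocated forward memory nodes in the sorted order.  */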
It should be noted that, while one or more to-be-processed data packets are being processed, to-be-processed data packets required by the same task may arrive. In the prior art, if a data packet of a task is being processed, other pending task data, especially urgent task data, must wait until the previous task data has been processed before a new to-be-processed data packet can be handled; in particular, if the previous task has a large data volume, the processing channel is occupied for a long time and the other task data (to-be-processed data packets required by the same task) becomes congested.
To address this problem, the invention adopts the following processing strategy: when a current to-be-processed data packet is being processed and a new to-be-processed data packet required by the same task is received, the host obtains the number of forward memory nodes required by the new packet from the relationship between the length of the new packet and the storage capacity of one forward memory node;
it is judged whether the number of forward memory nodes required by the new to-be-processed data packet is less than or equal to the number of currently idle forward memory nodes in the target host forward buffer (the i-th host forward buffer);
if the number of forward memory nodes required by the new packet is less than or equal to the number of currently idle forward memory nodes in the target host forward buffer, a fifth application request is generated based on the new to-be-processed data, and the host allocates the corresponding numbers of forward memory nodes and reverse memory nodes to the new data according to the fifth application request of the application subject;
if the number of forward memory nodes required by the new packet is greater than the number of currently idle forward memory nodes in the target host forward buffer, the system keeps waiting and accumulating idle forward memory nodes until their number is greater than or equal to the number required by the new packet.
It can be understood that if the number of currently idle forward memory nodes is too small to satisfy the new to-be-processed data packet, a period of waiting is needed; if the number of currently idle forward memory nodes is sufficient, the corresponding numbers of forward and reverse memory nodes are allocated to the new packet, the new packet is sent to the allocated forward memory nodes, and its processing flow starts without waiting for the other packets to finish; the new to-be-processed data packet and the current packets still in processing are thus handled asynchronously.
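A minimal C sketch of this idle-node check, assuming a simple per-node in_use flag on the host side; the function and field names are hypothetical.

#include <stddef.h>

static size_t count_idle_forward_nodes(const int in_use[], size_t m)
{
    size_t idle = 0;
    for (size_t j = 0; j < m; j++)
        if (!in_use[j])
            idle++;
    return idle;
}

/* Fifth-application-request path: the new same-task packet may be allocated
 * and sent immediately, without waiting for the packets already in flight.  */
static int can_start_now(size_t nodes_needed, const int in_use[], size_t m)
{
    return nodes_needed <= count_idle_forward_nodes(in_use, m);
}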
In a specific embodiment, when an application subject has a service processing requirement, the system allocates the corresponding forward memory nodes and reverse memory nodes based on that requirement; when the service processing is completed, that is, after the reverse memory node has received the completed data packet, the application subject has read it, and a preset time has elapsed, the allocated forward and reverse memory nodes are released. This saves host memory resources and supports more service processing scenarios of the application subjects. It can be understood that the preset time can be set as required, for example 2 s or 5 s, and this embodiment is not limited in this respect.
In another specific embodiment, if the host has extracted the completed data packet corresponding to a to-be-processed data packet from a reverse memory node, the forward memory node corresponding to that reverse memory node is marked as an idle forward memory node.
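On the host side, the release policy in these two embodiments could look roughly like the following; the node_state_t fields, the timestamp handling and the preset_seconds parameter are assumptions for illustration.

#include <time.h>
#include <stdbool.h>

typedef struct {
    bool   completed_read;      /* application subject has fetched the completed packet */
    time_t read_time;           /* when it was fetched                                   */
    bool   in_use;              /* node pair currently allocated                         */
} node_state_t;

/* Frees the paired forward/reverse nodes once the completed packet has been
 * read and the preset time (e.g. 2 s or 5 s) has elapsed since the read.    */
static void maybe_release(node_state_t *pair, double preset_seconds)
{
    if (pair->in_use && pair->completed_read &&
        difftime(time(NULL), pair->read_time) >= preset_seconds) {
        pair->in_use = false;
        pair->completed_read = false;
    }
}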
Further, when the number of to-be-processed data packets corresponding to the target virtual channel (the i-th virtual channel) within the same time period is two or more, the target algorithm unit (the i-th algorithm unit) calls a corresponding number of algorithm subunits to process in parallel the multiple to-be-processed data packets required by the same task. The algorithm module comprises a plurality of algorithm units which correspond one-to-one with the virtual channels, the FPGA forward buffers and the FPGA reverse buffers; each algorithm unit carries a different task, and each algorithm unit comprises a plurality of identical algorithm subunits which process multiple to-be-processed data packets of the same task in parallel.
It can be understood that the nodes, the DMA module and the algorithm module are independent of one another, so that they operate asynchronously, and the multiple algorithms of the algorithm module perform parallel high-speed operation processing; compared with the traditional synchronous mode of operation, the invention improves efficiency while maximizing the utilization of every resource.
FIG. 1 is a block diagram of a multi-node multi-channel high-speed parallel processing system according to the present invention, and FIG. 2 is a schematic diagram of a multi-node parallel processing according to a certain channel of the present invention.
As shown in fig. 1 and fig. 2, a second aspect of the present invention provides a multi-node multi-channel high-speed parallel processing system, which comprises an FPGA chip and a host; the FPGA chip is communicatively connected to the host, and a plurality of virtual channels are constructed between them to transmit the data packets of different tasks;
the host comprises a plurality of host forward buffers and a plurality of host reverse buffers, and the host forward buffers, the host reverse buffers and the virtual channels correspond one-to-one; each host forward buffer comprises a plurality of forward memory nodes and each host reverse buffer comprises a plurality of reverse memory nodes; the plurality of forward memory nodes in the host forward buffer of the same virtual channel correspond one-to-one with the plurality of reverse memory nodes in the host reverse buffer; the forward memory nodes are each used to cache a to-be-processed data packet for the FPGA chip to read, and the reverse memory nodes are each used to receive a completed data packet processed by the FPGA chip;
the FPGA chip comprises a DMA module, a plurality of command word FIFOs, a plurality of status word FIFOs and an algorithm module; the DMA module comprises a forward DMA module and a reverse DMA module; the forward DMA module polls and reads the plurality of command word FIFOs, reads the to-be-processed data packet in the corresponding forward memory node through the corresponding virtual channel based on the command word information, and transmits it to the algorithm module; the reverse DMA module polls and reads the plurality of status word FIFOs, and writes the completed data packet processed by the algorithm module into the corresponding reverse memory node through the corresponding virtual channel based on the status word information; the algorithm module is used to receive the to-be-processed data packet and perform operation processing to obtain the corresponding completed data packet;
and, within the same time period, when the FPGA chip and the host process to-be-processed data packets required by the same task, the steps of the above multi-node multi-channel high-speed parallel processing method are executed.
It can be understood that the plurality of virtual channels may respectively correspond to a plurality of different task requirements, for example, the task corresponding to the 1 st virtual channel may be encryption/decryption, and the task corresponding to the 2 nd virtual channel may be signature/signature verification, but is not limited thereto.
It can be understood that the number of virtual channels and the number of forward memory nodes in each host forward buffer (or reverse memory nodes in each host reverse buffer) can be expanded according to actual requirements, so that the system is applicable to more application scenarios and has higher applicability.
Furthermore, the plurality of command word FIFOs correspond one-to-one with the plurality of status word FIFOs, the plurality of virtual channels, the plurality of host forward buffers and the plurality of host reverse buffers; the command word FIFOs are each used to indicate whether the corresponding host forward buffer contains a to-be-processed data packet that needs to be transmitted by the DMA module, together with the source node information and the length of that packet. It should be noted that the command word FIFOs and the status word FIFOs store data as first-in first-out (FIFO) queues.
In practical application, when a host has a task to be processed by the FPGA chip, a data packet to be processed is organized and completed in a corresponding host forward buffer area, and a command word FIFO is written in through process application to indicate that a DMA module has the data packet to be transmitted in the virtual channel. Specifically, if a data packet to be processed is formed in a forward buffer of a host, a 32-bit command word is written into the corresponding command word FIFO, and the command word includes length information and address information of the data packet. When a certain command word FIFO is not empty, the DMA module reads the command word in the command word FIFO to obtain the data packet length information and the address information corresponding to the command word, and then the data packet can be transmitted.
It will be appreciated that the number of command words in the command word FIFO may be multiple and that the multiple command words satisfy the requirement of "first in first out", i.e. the command word written first into the command word FIFO should be read by the forward DMA module earlier than the command word written back into the command word FIFO.
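For illustration, a 32-bit command word carrying both fields described above might be packed as follows; the 20/12 bit split and the helper names are assumptions, since the patent only states that length information and address information share one 32-bit word.

#include <stdint.h>

#define CMD_LEN_BITS  20u    /* assumed: low bits hold the packet length in bytes */
#define CMD_NODE_BITS 12u    /* assumed: high bits hold the forward node index j  */

static inline uint32_t cmd_word_pack(uint32_t length, uint32_t node_index)
{
    return ((node_index & ((1u << CMD_NODE_BITS) - 1u)) << CMD_LEN_BITS)
         |  (length     & ((1u << CMD_LEN_BITS)  - 1u));
}

static inline void cmd_word_unpack(uint32_t cmd, uint32_t *length, uint32_t *node_index)
{
    *length     = cmd & ((1u << CMD_LEN_BITS) - 1u);
    *node_index = cmd >> CMD_LEN_BITS;
}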
As shown in fig. 2, it is preset that the ith host forward buffer includes m forward memory nodes, and the ith host forward buffer, the ith virtual channel, and the ith command word FIFO are in one-to-one correspondence;
the host writes a data packet a to be processed into the jth forward memory node in the m forward memory nodesijWhile writing the command word b into the ith command word FIFOijThe command word bijComprises a data packet a to be processedijAnd stored in the jth forward memory nodeAddress information;
when the forward DMA module polls the ith command word FIFO and command word bijWhen updating to the foremost end of the ith command word FIFO, the forward DMA module is based on the command word bijReading a data packet a to be processed in the jth forward memory nodeij
In practical application, when the forward DMA module polls the i-th command word FIFO and finds it is not empty, it reads the command word at the front of that FIFO; if the FIFO is empty, the next virtual channel (for example, the (i+1)-th virtual channel) is polled.
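A pseudocode-style C sketch of this polling behaviour; the helper functions stand in for FPGA-internal logic and are purely hypothetical, and the loop runs indefinitely as a DMA engine would.

#include <stdint.h>
#include <stdbool.h>

bool     cmd_fifo_empty(int channel);              /* assumed FPGA-side helpers */
uint32_t cmd_fifo_pop(int channel);
void     cmd_word_unpack(uint32_t cmd, uint32_t *length, uint32_t *node_index);
void     transfer_to_algorithm(int channel, uint32_t node_index, uint32_t length);

void forward_dma_poll(int num_channels)
{
    for (int i = 0; ; i = (i + 1) % num_channels) {   /* round-robin over channels     */
        if (cmd_fifo_empty(i))
            continue;                                 /* nothing pending: next channel */
        uint32_t length, j;
        cmd_word_unpack(cmd_fifo_pop(i), &length, &j);
        transfer_to_algorithm(i, j, length);          /* read node j, push to algorithm */
    }
}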
Furthermore, the FPGA chip also comprises a plurality of FPGA forward buffer areas and a plurality of FPGA reverse buffer areas, and the plurality of FPGA forward buffer areas, the plurality of FPGA reverse buffer areas and the plurality of virtual channels are in one-to-one correspondence;
the plurality of FPGA forward buffer areas are respectively used for receiving data packets to be processed of different tasks transmitted by the forward DMA module and performing buffer processing so as to wait for the algorithm module to read;
and the plurality of FPGA reverse buffer areas are respectively used for receiving the completion data packets of different tasks processed by the algorithm module and performing cache processing to wait for the reverse DMA module to read.
Specifically, the forward DMA module reads the to-be-processed data packet a_ij from the j-th forward memory node based on the command word b_ij, attaches the information related to j to a_ij, and then transmits them together to the i-th FPGA forward buffer to wait for the algorithm module to read.
It can be understood that the virtual channel described in the present invention mainly represents the mapping link relationship between each host forward/reverse buffer and each FPGA forward/reverse buffer; there is no dedicated physical line. By establishing this mapping link relationship, the data packets in the 1st host forward buffer can only be received by the 1st FPGA forward buffer and not by the other FPGA forward buffers; similarly, the data packets in the 1st FPGA reverse buffer can only be received by the 1st host reverse buffer and not by the other host reverse buffers.
Furthermore, the plurality of status word FIFOs correspond one-to-one with the plurality of command word FIFOs, the plurality of virtual channels, the plurality of host forward buffers, the plurality of host reverse buffers, the plurality of FPGA forward buffers and the plurality of FPGA reverse buffers; the status word FIFOs are each used to indicate whether the corresponding FPGA reverse buffer contains a completed data packet that needs to be transmitted by the reverse DMA module, together with the destination node information and the length of that packet.
Specifically, the algorithm module receives the to-be-processed data packet a_ij and performs operation processing to obtain the corresponding completed data packet A_ij; the algorithm module transmits A_ij to the i-th FPGA reverse buffer while writing the status word B_ij into the i-th status word FIFO; the status word B_ij comprises the length of the completed data packet A_ij and the address information of the j-th reverse memory node in the i-th host reverse buffer to which it is to be returned;
when the reverse DMA module polls the i-th status word FIFO and the status word B_ij is updated to the front of that FIFO, the reverse DMA module reads the completed data packet A_ij from the i-th FPGA reverse buffer based on B_ij, determines the address information of the j-th reverse memory node from the j-related information carried in B_ij, and writes A_ij into the j-th reverse memory node.
Furthermore, the algorithm module comprises a plurality of algorithm units, the plurality of algorithm units respectively correspond to the plurality of virtual channels, the plurality of FPGA forward buffer areas and the plurality of FPGA reverse buffer areas one by one, each algorithm unit respectively bears different tasks, each algorithm unit comprises a plurality of same algorithm subunits, and the plurality of same algorithm subunits respectively carry out parallel operation processing on a plurality of data packets to be processed of the same task.
Specifically, the 1 st algorithm unit is used for processing an encryption and decryption task, and the 1 st algorithm unit includes a plurality of SM4 algorithm subunits, and the plurality of SM4 algorithm subunits can perform encryption and decryption processing on a plurality of data packets to be processed in parallel; the 2 nd algorithm unit is used for signing/signature checking tasks, the 2 nd algorithm unit comprises a plurality of SM2 algorithm subunits, and the SM2 algorithm subunits can carry out signing/signature checking on a plurality of data packets to be processed in parallel; but is not limited thereto.
Preferably, the FPGA chip and the host are connected for data communication through a PCIE interface; the applicable PCIE protocol versions include PCIE 1.0, PCIE 2.0 and PCIE 3.0, but are not limited thereto.
Furthermore, each application main body in the host applies for acquiring a predetermined number of forward memory nodes and reverse memory nodes in the corresponding host forward buffer area and host reverse buffer area respectively based on task requirements, and the forward memory nodes and the reverse memory nodes applied between different application main bodies of the same task do not conflict.
In one application scenario of the invention, two virtual channels are preset, namely a 1st virtual channel for encryption/decryption tasks and a 2nd virtual channel for signature/signature-verification tasks. When a 1st application subject and a 2nd application subject each have an encryption/decryption requirement, the 1st application subject applies to the host for the 1st forward memory node and the 1st reverse memory node of the 1st virtual channel, the 2nd application subject applies for the 2nd forward memory node and the 2nd reverse memory node of the 1st virtual channel, and the two application subjects write their to-be-processed data packets into the 1st and 2nd forward memory nodes in time-slice order. For example, in the first time slice the 1st application subject writes a to-be-processed data packet into the 1st forward memory node and writes the 1st command word into the command word FIFO; in the second time slice the 2nd application subject writes a to-be-processed data packet into the 2nd forward memory node and writes the 2nd command word into the command word FIFO; by the first-in first-out principle of the command word FIFO, the 1st command word is ahead of the 2nd command word.
The forward DMA module of the FPGA chip polls and reads the command word FIFOs of the 1st and 2nd virtual channels according to a fairness principle. When the 1st virtual channel is polled, the 1st command word is read first according to the first-in first-out principle; the forward DMA module reads out the to-be-processed data packet in the 1st forward memory node and transmits it to the algorithm module for encryption and decryption. After the processing is completed, the algorithm module writes the corresponding completed data packet into the 1st FPGA reverse buffer and writes the 1st status word into the status word FIFO of the 1st virtual channel; the 1st status word contains the destination node information and the length of the completed data packet, and the reverse DMA module writes the completed data packet into the 1st reverse memory node based on that status word.
The forward DMA module then continues to poll the 2nd virtual channel. If the command word FIFO of the 2nd virtual channel reads as empty, none of the forward memory nodes corresponding to the 2nd virtual channel holds data, so the forward DMA module returns to poll the 1st virtual channel and reads the to-be-processed data packet in the 2nd forward memory node; if the command word FIFO of the 2nd virtual channel is not empty, the forward DMA module reads out the to-be-processed data packet in the same way as for the 1st virtual channel and transmits it to the algorithm module for encryption and decryption.
It can be understood that, traditionally, multiple application subjects share one memory node, and the task of the next application subject can start only after the task of the previous one has been completed, so transmission and processing efficiency is low; for the application subjects and for each algorithm unit in the algorithm module, waiting times are long, resources easily sit idle and utilization is low. In the invention, the application subjects can each write their to-be-processed data packets into their own forward memory nodes and receive the completed data packets at the corresponding reverse memory nodes; the whole process is transparent to the application subjects, the nodes are independent and do not affect one another, the DMA module of the FPGA chip polls and reads or writes the forward/reverse memory nodes at high speed, and the multiple algorithms in the algorithm module process data in parallel, so task processing efficiency is effectively improved and resource utilization is maximized.
In another application scenario of the invention, if an application subject needs to process an ordered group of to-be-processed data packets (1, 2, 3, …, 10) of the same task, or the length of a to-be-processed data packet exceeds the storage capacity of one forward memory node, the application subject can apply for a group of ordered forward memory nodes (1, 2, 3, …, 10) and reverse memory nodes (1, 2, 3, …, 10) matching the number of packets. The application subject only needs to write the to-be-processed data packets (1, 2, 3, …, 10) into the forward memory nodes (1, 2, 3, …, 10) in order, and later receives at the reverse memory nodes (1, 2, 3, …, 10) a group of ordered completed data packets corresponding to the ordered group of to-be-processed data packets (1, 2, 3, …, 10). On this basis, data can be transmitted and processed at high speed while the completed data packets remain in order.
The present invention realizes parallel processing of multiple kinds of tasks through a plurality of virtual channels, and parallel processing of multiple data packets of the same task through the different nodes corresponding to each virtual channel, thereby improving the processing efficiency of the system. The host only needs to attend to the corresponding forward/reverse memory nodes; the DMA module is responsible for high-speed reading and writing of each forward/reverse memory node, and the multiple algorithms of the algorithm module perform high-speed operations in parallel. The nodes, the DMA module and the algorithm module are independent of one another and operate asynchronously; compared with the traditional synchronous mode of operation, this improves efficiency and maximizes the utilization of each resource.
The present invention realizes ordered arrangement of the completed data packets while achieving high-speed transmission and high-speed processing of data, solving the problem that a traditional algorithm module produces out-of-order results after operating in parallel on a group of ordered data packets to be processed.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit them; modifications to the specific embodiments of the present invention, or equivalent substitutions of some technical features, made by those skilled in the art without departing from the spirit of the technical solutions of the present invention, shall all be covered by the technical solutions claimed in the present invention.

Claims (10)

1. A multi-node multi-channel high-speed parallel processing method is characterized by comprising the following steps:
selecting one virtual channel from a plurality of virtual channels as a target virtual channel based on the task requirement of a data packet to be processed, and determining the corresponding host forward buffer and host reverse buffer as target buffers based on the target virtual channel; wherein the host is in communication connection with an FPGA chip through the plurality of virtual channels, a plurality of forward memory nodes in the host forward buffer of the same virtual channel correspond one-to-one to a plurality of reverse memory nodes in the host reverse buffer, the plurality of forward memory nodes are respectively used for buffering data packets to be processed for reading by the FPGA chip, and the plurality of reverse memory nodes are respectively used for receiving the completed data packets processed by the FPGA chip;
allocating the required forward memory node and reverse memory node for the data packet to be processed from the target buffer, the host writing the data packet to be processed into the allocated forward memory node among the m forward memory nodes and writing a command word into the corresponding command word FIFO; when the forward DMA module polls the command word FIFO, the forward DMA module determines the address information of the allocated forward memory node and the length of the data packet to be processed based on the command word information, and transmits the data packet to be processed to the algorithm module;
the algorithm module receives the data packet to be processed and performs operation processing to obtain a corresponding completed data packet, and writes a status word into the corresponding status word FIFO; when the reverse DMA module polls the status word FIFO, the reverse DMA module determines the address information of the allocated reverse memory node and the length of the completed data packet based on the status word information, and writes the completed data packet into the allocated reverse memory node.
2. The multi-node multi-channel high-speed parallel processing method according to claim 1, wherein when allocating the required forward memory node and reverse memory node for the data packet to be processed from the target buffer, the following steps are performed: reading the length of the data packet to be processed of an application subject, and judging whether the length of the data packet to be processed is less than or equal to the storage capacity of one forward memory node;
if the length of the data packet to be processed is less than or equal to the storage capacity of one forward memory node, generating a first application request based on the data packet to be processed; the host allocates a forward memory node and a reverse memory node for the data packet to be processed according to the first application request;
if the length of the data packet to be processed is larger than the storage capacity of one forward memory node, dividing the data packet to be processed into a group of ordered sub data packets to be processed, and generating a second application request based on the sub data packets to be processed; and the host allocates a group of ordered forward memory nodes and reverse memory nodes for the data packet to be processed according to the second application request.
3. The multi-node multi-channel high-speed parallel processing method according to claim 2, wherein the application subject writes the corresponding sub data packets to be processed into a group of ordered forward memory nodes respectively; the forward DMA module determines the packet length and the corresponding forward memory node based on the command word information in the command word FIFO, then reads the sub data packet to be processed from each forward memory node and transmits it to the algorithm module; the algorithm module receives the group of ordered sub data packets to be processed and performs parallel operation processing to obtain corresponding completed sub data packets; and after determining the packet length and the corresponding reverse memory node based on the status word information in the status word FIFO, the reverse DMA module writes each completed sub data packet of the determined length into the corresponding reverse memory node, so as to recombine a group of ordered completed data packets.
4. The multi-node multi-channel high-speed parallel processing method according to claim 1, wherein when allocating the required forward memory node and reverse memory node for the data packet to be processed from the target buffer, the following steps are performed: reading the length of the data packet to be processed of an application subject, calculating the length of the completed data packet based on the length of the data packet to be processed and a preset proportional relation, and judging whether the calculated length of the completed data packet is less than or equal to the storage capacity of one reverse memory node;
if the calculated length of the completed data packet is less than or equal to the storage capacity of one reverse memory node, generating a third application request based on the calculated length of the completed data packet; the host allocates a forward memory node and a reverse memory node for the data packet to be processed according to the third application request;
if the calculated length of the completed data packet is larger than the storage capacity of one reverse memory node, generating a fourth application request based on the calculated length of the completed data packet; and the host allocates a group of ordered forward memory nodes and reverse memory nodes for the data packet to be processed according to the fourth application request.
5. The multi-node multi-channel high-speed parallel processing method according to claim 1, wherein when allocating the required forward memory node and reverse memory node for the data packet to be processed from the target buffer, the following steps are performed:
when the number of the data packets to be processed corresponding to the target virtual channel is two or more in the same time period, the host obtains the total number of the currently required forward memory nodes based on the ratio of the length of each data packet to be processed to the storage capacity of one forward memory node, and judges whether the total number of the currently required forward memory nodes corresponding to the target virtual channel is greater than or equal to a preset value m;
if the total number of currently required forward memory nodes corresponding to the target virtual channel is greater than or equal to the preset value m, respectively calculating the urgency of each data packet to be processed, determining the sending sequence of the data packets to be processed based on their urgency, and writing the data packets to be processed into the allocated forward memory nodes respectively according to that sending sequence;
and if the total number of currently required forward memory nodes is less than the preset value m, writing the data packets to be processed into the allocated forward memory nodes respectively in the order in which they were generated.
6. The multi-node multi-channel high-speed parallel processing method according to claim 5, wherein when calculating the urgency of each data packet to be processed, the following is performed:
reading the priority of the application subject that generated the data packet to be processed, and ranking the urgency of the data packets to be processed from high to low according to the priority of their application subjects from high to low; if application subjects of the same priority generate data packets to be processed of different sizes, ranking the urgency of those data packets to be processed from low to high according to their sizes from small to large.
7. The multi-node multi-channel high-speed parallel processing method according to any one of claims 1 to 6, wherein when a current data packet to be processed is being processed and a new data packet to be processed required by the same task is received, the host obtains the number of forward memory nodes required by the new data packet to be processed based on the relationship between the length of the new data packet to be processed and the storage capacity of one forward memory node;
judging whether the number of forward memory nodes required by the new data packet to be processed is less than or equal to the number of currently idle forward memory nodes in the target host forward buffer;
if so, generating a fifth application request based on the new data packet to be processed, and the host allocating a corresponding number of forward memory nodes and reverse memory nodes for the new data packet to be processed according to the fifth application request of the corresponding application subject;
if not, continuing to wait while the number of currently idle forward memory nodes accumulates, until the number of currently idle forward memory nodes is greater than or equal to the number of forward memory nodes required by the new data packet to be processed.
8. A multi-node multi-channel high-speed parallel processing system, comprising an FPGA chip and a host, wherein the FPGA chip is in communication connection with the host and constructs a plurality of virtual channels for transmitting the data packets of different tasks; the host comprises a plurality of host forward buffers and a plurality of host reverse buffers, the host forward buffers, the host reverse buffers and the virtual channels being in one-to-one correspondence; the FPGA chip comprises a DMA module, a plurality of command word FIFOs, a plurality of status word FIFOs and an algorithm module, the DMA module comprising a forward DMA module and a reverse DMA module; the system is characterized in that:
each host forward buffer comprises a plurality of forward memory nodes and each host reverse buffer comprises a plurality of reverse memory nodes, the plurality of forward memory nodes in the host forward buffer of the same virtual channel corresponding one-to-one to the plurality of reverse memory nodes in the host reverse buffer, the forward memory nodes being respectively used for buffering data packets to be processed for reading by the FPGA chip, and the reverse memory nodes being respectively used for receiving the completed data packets processed by the FPGA chip;
and the FPGA chip and the host, when processing the data packets to be processed required by the same task in the same time period, execute the steps of the multi-node multi-channel high-speed parallel processing method according to any one of claims 1 to 7.
9. The multi-node multi-channel high-speed parallel processing system according to claim 8, wherein the algorithm module comprises a plurality of algorithm units corresponding one-to-one to the plurality of virtual channels, the plurality of FPGA forward buffers and the plurality of FPGA reverse buffers; each algorithm unit undertakes the processing of a different task and comprises a plurality of identical algorithm subunits, and the identical algorithm subunits perform parallel operation processing on a plurality of data packets to be processed of the same task.
10. The multi-node multi-channel high-speed parallel processing system according to claim 8, wherein each application subject in the host applies, based on its task requirements, for a predetermined number of forward memory nodes and reverse memory nodes in the corresponding host forward buffer and host reverse buffer respectively, and the forward memory nodes and reverse memory nodes applied for by different application subjects of the same task do not conflict with one another.
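As a non-normative illustration of the allocation logic described in claims 2, 4 and 7, the following C sketch shows how the host might derive the number of memory nodes a pending packet needs and decide whether it can be admitted immediately; NODE_SIZE, the expansion ratio and all function names are assumptions made for this sketch only, not part of the claimed method.

```c
#include <stdint.h>
#include <stdbool.h>

#define NODE_SIZE      4096   /* assumed capacity of one memory node              */
#define EXPANSION_NUM  9      /* assumed preset proportional relation:            */
#define EXPANSION_DEN  8      /*   completed length = pending length * 9 / 8      */

/* Ceiling division: how many nodes are needed to hold `bytes` of data. */
static uint32_t nodes_needed(uint64_t bytes)
{
    return (uint32_t)((bytes + NODE_SIZE - 1) / NODE_SIZE);
}

/* Decide whether a new pending packet of `len` bytes can be admitted now,
 * given `free_fwd_nodes` currently idle forward memory nodes; if not, the
 * caller keeps waiting until enough nodes have been released. */
static bool can_admit(uint64_t len, uint32_t free_fwd_nodes)
{
    uint64_t done_len = len * EXPANSION_NUM / EXPANSION_DEN; /* expected completed length */
    uint32_t fwd  = nodes_needed(len);                       /* nodes for the pending packet   */
    uint32_t rev  = nodes_needed(done_len);                  /* nodes for the completed packet */
    uint32_t need = fwd > rev ? fwd : rev;                   /* forward/reverse nodes are paired */
    return need <= free_fwd_nodes;
}
```

When can_admit returns false, the caller simply continues to wait while idle nodes accumulate, matching the wait-and-accumulate behavior of claim 7.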
CN202010844411.4A 2020-08-20 2020-08-20 Multi-node multi-channel high-speed parallel processing method and system Withdrawn CN112035898A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010844411.4A CN112035898A (en) 2020-08-20 2020-08-20 Multi-node multi-channel high-speed parallel processing method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010844411.4A CN112035898A (en) 2020-08-20 2020-08-20 Multi-node multi-channel high-speed parallel processing method and system

Publications (1)

Publication Number Publication Date
CN112035898A true CN112035898A (en) 2020-12-04

Family

ID=73581035

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010844411.4A Withdrawn CN112035898A (en) 2020-08-20 2020-08-20 Multi-node multi-channel high-speed parallel processing method and system

Country Status (1)

Country Link
CN (1) CN112035898A (en)


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113992609A (en) * 2021-09-23 2022-01-28 北京连山科技股份有限公司 Method and system for processing multilink service data disorder
CN113992609B (en) * 2021-09-23 2022-06-14 北京连山科技股份有限公司 Method and system for processing multilink service data disorder
WO2023082560A1 (en) * 2021-11-12 2023-05-19 苏州浪潮智能科技有限公司 Task processing method and apparatus, device, and medium
CN114553776A (en) * 2022-02-28 2022-05-27 深圳市风云实业有限公司 Signal out-of-order control and rate self-adaptive transmission device and transmission method thereof
CN114553776B (en) * 2022-02-28 2023-10-10 深圳市风云实业有限公司 Signal disorder control and rate self-adaptive transmission device and transmission method thereof


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20201204)