CN113986791B - Method, system, equipment and terminal for designing intelligent network card fast DMA - Google Patents


Info

Publication number
CN113986791B
CN113986791B (application CN202111071199.3A)
Authority
CN
China
Prior art keywords: descriptor, data, descriptors, TOE, queue
Prior art date
Legal status: Active (an assumption, not a legal conclusion; Google has not performed a legal analysis)
Application number
CN202111071199.3A
Other languages
Chinese (zh)
Other versions
CN113986791A
Inventor
潘伟涛
王浩
邱智亮
殷建飞
熊子豪
Current Assignee
Xidian University
Original Assignee
Xidian University
Priority date
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202111071199.3A priority Critical patent/CN113986791B/en
Publication of CN113986791A publication Critical patent/CN113986791A/en
Application granted granted Critical
Publication of CN113986791B publication Critical patent/CN113986791B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 13/14 Handling requests for interconnection or transfer
    • G06F 13/20 Handling requests for interconnection or transfer for access to input/output bus
    • G06F 13/28 Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/54 Interprogram communication
    • G06F 9/546 Message passing systems or structures, e.g. queues
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 47/00 Traffic control in data switching networks
    • H04L 47/10 Flow control; Congestion control
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2209/00 Indexing scheme relating to G06F9/00
    • G06F 2209/54 Indexing scheme relating to G06F9/54
    • G06F 2209/548 Queue
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2213/00 Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 2213/0026 PCI express
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention belongs to the technical field of intelligent network cards and discloses a method, system, device and terminal for designing a fast DMA for an intelligent network card. The fast DMA comprises an H2C DoorBell register, an H2C flow control scheduler, an H2C descriptor engine, an H2C CMPT engine, a C2H DoorBell register, a C2H flow control scheduler, a C2H descriptor cache queue, a C2H descriptor engine, a C2H CMPT engine and an interrupt processing module. The advantages of the invention are that the adaptation scenarios and processing flows of the various descriptors are richer, the low short-packet efficiency of conventional QDMA is effectively remedied, and an effective, feasible scheme is provided for a large number of devices; the implementation is more customized and serves TOE traffic better; and an effective association between Sessions and queues is established, providing good, fast DMA service for Sessions between intelligent network cards.

Description

Method, system, equipment and terminal for designing intelligent network card fast DMA
Technical Field
The invention belongs to the technical field of intelligent network cards and particularly relates to a method, system, device and terminal for designing a fast DMA for an intelligent network card.
Background
At present, with the widespread rise of technologies such as 5G communication, the Internet of Things, cloud computing and big data, data traffic has exploded. Although CPU performance continues to develop and improve, the CPU has still become a bottleneck for big-data processing, and the TOE (TCP Offload Engine), one of the most effective solutions, has become a research hot spot. The main application scenario of the TOE, however, is currently limited to servers that process large data blocks, such as storage backup, retrieval systems and enterprise databases, all of which use large data payloads. These specific scenarios mean that the intelligent network card leans towards a customized network card, i.e. a common DMA is not necessarily suitable for the intelligent network card.
QDMA, a higher-performance, multi-queue DMA better suited to large-block data transfer, is undoubtedly the best fit for an intelligent network card. To remain compatible with a traditional network card, however, the intelligent network card supports both a TOE mode and a Bypass mode and accordingly provides two classes of descriptors, guaranteeing full compatibility with common network-card scenarios; and to improve the DMA's short-packet performance, a dedicated descriptor for short protocol frames is added to both modes on the transmit side. These hybrid descriptors require the intelligent network card's fast DMA to be able to identify and parse different descriptors and execute the corresponding operations. In addition, the TOE uses a Session interface between the transport layer and the host, so the fast DMA must also establish the association between queues and Sessions. A common QDMA supports only a single descriptor type and has a fixed architecture, so it is not fully suitable for the intelligent network card.
QDMA is a high-performance, multi-queue DMA. A number of queue pairs (Queue Pairs) are created in memory, each consisting of an SQ (Submission Queue) and a CQ (Completion Queue), whose number and depth are dynamically configurable. For the SQ, the host is the producer (the host writes to the tail of the SQ) and the on-card QDMA engine is the consumer (the engine fetches descriptor instructions from the head of the SQ for execution). For the CQ, the on-card QDMA engine is the producer (it writes the execution result of the corresponding SQ descriptor instruction to the tail of the CQ) and the host is the consumer (it reads the execution result from the head of the CQ); on receiving a CQ entry, the host releases the corresponding SQ slot according to the execution result. To guarantee that a ring queue is never overwritten or read past, the on-card QDMA engine maintains a DoorBell register recording the head and tail pointers of each SQ and CQ. Every SQ or CQ has its own DoorBell register: the host maintains the tail pointer of the SQ and the head pointer of the CQ, while the on-card QDMA engine maintains the head pointer of the SQ and the tail pointer of the CQ, realizing the exchange of descriptors between host and network card.
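The SQ/CQ pointer discipline described above can be sketched as follows. This is an illustrative simulation, not the patent's hardware implementation; the class names (`DoorBell`, `RingQueue`) and the one-slot-free fullness convention are assumptions.

```python
class DoorBell:
    """Head/tail pointers for one ring queue; host and engine each own one end."""
    def __init__(self, depth):
        self.depth = depth
        self.head = 0   # consumer pointer
        self.tail = 0   # producer pointer

    def is_empty(self):
        return self.head == self.tail

    def is_full(self):
        # One slot is kept free so that full and empty are distinguishable.
        return (self.tail + 1) % self.depth == self.head

class RingQueue:
    def __init__(self, depth):
        self.slots = [None] * depth
        self.db = DoorBell(depth)

    def push(self, entry):          # producer side (host for SQ, engine for CQ)
        if self.db.is_full():
            return False            # back-pressure: never overwrite unread slots
        self.slots[self.db.tail] = entry
        self.db.tail = (self.db.tail + 1) % self.db.depth
        return True

    def pop(self):                  # consumer side (engine for SQ, host for CQ)
        if self.db.is_empty():
            return None             # never read past the producer
        entry = self.slots[self.db.head]
        self.db.head = (self.db.head + 1) % self.db.depth
        return entry

# One queue pair: the host submits a descriptor on the SQ, the engine
# consumes it from the SQ head and writes the result to the CQ tail.
sq, cq = RingQueue(8), RingQueue(8)
sq.push({"op": "TOE_DATA", "addr": 0x1000, "len": 256})
desc = sq.pop()
cq.push({"desc": desc, "status": "OK"})
```

The doorbell comparison (`is_empty`/`is_full`) is exactly what prevents the ring from being rewritten or read short, as the paragraph above requires.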
Xilinx offers a high-performance QDMA that supports three types of 2K ring queues: H2C (Host to Card), C2H (Card to Host) and C2H CMPT (Card to Host Completion). It maintains DoorBell registers inside the QDMA, completes data transmission and reception by parsing descriptors, supports 2K MSIx interrupts, supports at most 4 Physical Functions and 252 Virtual Functions with at most 8 MSIx interrupts per Function, and offers configuration of data-interface types, rates and so on, suiting the working scenarios of a variety of common network cards.
The Xilinx QDMA's descriptor parsing format is fixed: it cannot support the hybrid-descriptor concept of the TOE network card, has no notion of a Session, and cannot establish a relationship between queues and Sessions. It provides only a C2H CMPT queue and no queue that returns descriptor execution results in the H2C direction. Each Function in the Xilinx QDMA supports at most 8 interrupt vectors, which is clearly convenient for multi-Function users but unsuitable for a TOE card, because it cannot give every CMPT queue a unique interrupt vector and thus increases the card's design complexity. A new type of fast DMA that adapts to TOE hybrid descriptors, with greater flexibility and configurability, is therefore needed.
Through the above analysis, the problems and defects existing in the prior art are as follows:
(1) At present, the main application scenario of the TOE is limited to servers that process large data blocks, and these specific scenarios mean the intelligent network card leans towards a customized network card, i.e. a common DMA is not necessarily suitable for the intelligent network card.
(2) A common QDMA supports only a single descriptor type and has a fixed architecture, so it is not fully suitable for the intelligent network card.
(3) The Xilinx QDMA's descriptor parsing format is fixed; it cannot support hybrid descriptors in TOE cards, has no Session, and cannot establish a relationship between queues and Sessions.
(4) The Xilinx QDMA has only a C2H CMPT queue and no queue that provides descriptor execution results in the H2C direction.
(5) Each Function in the Xilinx QDMA supports at most 8 interrupt vectors, which is unsuitable for a TOE card because a unique interrupt vector cannot be provided for each CMPT queue, increasing the card's design complexity.
The difficulty of solving these problems and defects is as follows: the fast DMA must be able to identify and process different descriptors, so that each descriptor type can offer its greatest advantage in its adapted scenario; and mapping Sessions to queues requires software/hardware cooperation to establish the corresponding connections.
The significance of solving these problems and defects is as follows: the fast DMA provides multi-scenario support for the intelligent network card; it provides good, fast DMA service for Sessions between intelligent network cards; and removing the multi-Function scenario better matches the intelligent network card's application, while binding interrupt vectors to queues reduces design complexity.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a method, system, device and terminal for designing an intelligent network card fast DMA, in particular one based on hybrid descriptors.
The invention is realized in such a way that the intelligent network card rapid DMA design method comprises the following steps:
step one, maintaining the H2C DoorBell registers and providing the relevant information for H2C flow control and scheduling; scheduling the H2C queues by weight through H2C flow control scheduling, and throttling descriptors according to the on-board Credit information;
step two, acquiring descriptors from the HOST side through the H2C descriptor engine according to the scheduling result, classifying and storing them, parsing them in the descriptor engine, and moving data from the HOST side to TOE_RX;
step three, assembling the corresponding completion message in the H2C CMPT engine according to the descriptor type and the state fed back by the TOE_RX side, and writing it back into the CMPT queue on the HOST side;
step four, scheduling the C2H queues according to weight and the running state of the descriptor engine through C2H flow control scheduling, and throttling descriptors according to the on-board Credit information;
step five, maintaining the C2H DoorBell registers and providing the relevant information for C2H flow control and scheduling;
step six, caching the prefetched descriptors per queue number in the C2H descriptor cache queues, and providing the corresponding state information for lower-level processing;
step seven, acquiring and parsing descriptors from a queue in the C2H descriptor engine according to the queue state and the data situation of the local TOE, and moving data from the TOE_TX side to the HOST side;
step eight, assembling the completion message in the C2H CMPT engine according to the local descriptor usage and sending it into the corresponding queue on the HOST side;
step nine, when a specific CQ of C2H or H2C completes, or an abnormal state occurs, assembling the specific MSIx message in the interrupt processing module and uploading it to the HOST.
In the invention, step one provides an effective flow-control scheduling mechanism for H2C-side data. Step two, the normal acquisition and classification of descriptors, is the basis for supporting hybrid descriptors. Step three finishes the CQ write-back and feeds the cache and command execution status back to the HOST. Step five, like step one, provides an effective flow-control scheduling mechanism for the C2H side; this step is in fact parallel to step one, representing the two directions. Step six, prefetching descriptors, reduces the latency of DMA data transfer. Step seven completes the movement of C2H-side data according to the descriptors. Step eight completes the CQ write-back and feeds the command execution status back to the HOST.
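The weight-based, Credit-gated scheduling of steps one and four can be sketched as below. This is a minimal illustration under assumptions of my own (a per-round service counter, a largest-deficit selection rule); the patent does not specify the scheduling algorithm beyond "by weight" and "according to the on-board Credit information".

```python
def schedule(queues, credit):
    """Pick the next queue to fetch descriptors for.

    queues: list of dicts with 'id', 'weight', 'pending' (descriptors waiting)
            and 'served' (descriptors already fetched this round).
    credit: on-board buffer credit; scheduling stops when it is exhausted.
    """
    if credit <= 0:
        return None                         # flow control: no board-side space
    eligible = [q for q in queues
                if q["pending"] > 0 and q["served"] < q["weight"]]
    if not eligible:
        # Round over: reset per-round service counts and retry once.
        for q in queues:
            q["served"] = 0
        eligible = [q for q in queues if q["pending"] > 0]
        if not eligible:
            return None
    # Serve the eligible queue that is furthest below its weight share.
    best = max(eligible, key=lambda q: q["weight"] - q["served"])
    best["served"] += 1
    best["pending"] -= 1
    return best["id"]
```

With weights 2:1 and both queues backlogged, three successive calls serve queue 0 twice and queue 1 once, which is the weighted sharing the steps describe.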
Further, the data processing flow of the H2C side includes:
(1) The HOST updates descriptor information to the hardware board card by configuring an H2C-side Doorbell;
(2) Flow control scheduling selects the descriptors of the corresponding queue to acquire;
(3) Using the relevant information given by the Doorbell register, the DMA engine processes length, 4K boundary and so on and sends a read application to the AXI Slave;
(4) After the descriptors are acquired, they are classified and identified, and each type is passed into a different descriptor engine according to the OPCODE in the descriptor;
(5) If it is a TOE data descriptor, then according to the address and length in the descriptor the DMA engine processes length, 4K boundary and so on, sends a read-data application to the AXI Slave, transmits the returned data to the TOE, and judges whether the LAST bit of the descriptor is 1;
(6) If the LAST bit is 0, descriptors of the same Session continue to be acquired until the LAST bit is 1, with step (5) executed repeatedly in the process;
(7) If the LAST bit is 1, the feedback state of the TOE is awaited; the TOE feeds back the corresponding buffer space and execution result, and the completion message is assembled according to this information;
(8) For the assembled completion message, the DMA engine processes length, 4K boundary and so on and sends a write-data request to the AXI Slave; the completion message is written into the corresponding CMPT queue, and the Doorbell of the CQ is updated at the same time;
(9) The MSIx interrupt with the corresponding vector number is assembled and uploaded;
(10) If it is a TOE command descriptor, the data to be transmitted is carried in the descriptor itself, so the payload is sent directly to the TOE; the feedback state of the TOE, the corresponding buffer space and the command's execution result on the TOE are awaited, and the completion message is assembled according to this information;
(11) Steps (8) and (9) are executed in the same way, completing the execution of the descriptor;
(12) If it is a Bypass data descriptor, the data of this execution is independent and merely passes through the TOE, so according to the address and length in the descriptor the DMA engine processes length, 4K boundary and so on, sends a read-data application to the AXI Slave, and transmits the returned data to the TOE;
(13) The DMA assembles the completion message by itself, including the execution result of the descriptor;
(14) Steps (8) and (9) are executed in the same way, completing the execution of the descriptor;
(15) If it is a Bypass command descriptor, the descriptor carries a command to be bypassed to the TOE, so the payload is sent directly to the TOE, and the DMA assembles the completion message by itself;
(16) Steps (8) and (9) are executed in the same way, completing the execution of the current descriptor.
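The OPCODE-based classification in step (4), and the split between data descriptors (payload fetched from host memory) and command descriptors (short-packet payload carried inline) in steps (5), (10), (12) and (15), can be sketched as follows. The field names, opcode values and the `last` flag encoding are illustrative assumptions, not the patent's descriptor layout.

```python
TOE_DATA, TOE_CMD, BYPASS_DATA, BYPASS_CMD = range(4)

def dispatch_h2c(desc, read_host_memory, toe_rx):
    """Route one H2C descriptor in the manner steps (5)-(16) describe."""
    op = desc["opcode"]
    if op in (TOE_DATA, BYPASS_DATA):
        # Data descriptor: the payload lives in host memory; DMA reads it out.
        payload = read_host_memory(desc["addr"], desc["len"])
    elif op in (TOE_CMD, BYPASS_CMD):
        # Command descriptor: the short-packet payload is embedded, so no
        # host-memory read is needed (the short-packet optimization above).
        payload = desc["inline_payload"]
    else:
        raise ValueError("unknown OPCODE %r" % op)
    toe_rx.append((op, payload))
    # TOE data descriptors may span several entries; LAST marks the final one.
    return desc.get("last", 1) == 1

# Example: one single-entry TOE data descriptor against a toy host memory.
host_mem = {(0x2000, 4): b"abcd"}
rx = []
done = dispatch_h2c({"opcode": TOE_DATA, "addr": 0x2000, "len": 4, "last": 1},
                    lambda a, n: host_mem[(a, n)], rx)
```

The key saving for command descriptors is visible here: the TOE_CMD/BYPASS_CMD branch touches no host memory at all, which is why short packets avoid the extra DMA read round-trip.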
Further, the data processing flow of the C2H side includes:
(1) HOST updates the descriptor information to the hardware board card by configuring a C2H side Doorbell;
(2) The flow control scheduling selects descriptors of the corresponding queues for prefetching;
(3) Using the relevant information given by the Doorbell register, the DMA engine processes length, 4K boundary and so on and sends a read application to the AXI Slave;
(4) The received descriptor information is stored per queue into the corresponding caches, and the cache information is updated in real time;
(5) If they are TOE data descriptors, the descriptors of the queues are extracted and pre-parsed;
(6) When the TOE has data to send to the DMA, the data is moved according to the corresponding pre-parsed descriptor; if no corresponding descriptor exists, one is actively applied for;
(7) Multiple pieces of data may share one descriptor, or one piece of data may use multiple descriptors. If multiple descriptors are occupied by one piece of data, the space usage of the last descriptor is recorded; if one descriptor is shared by multiple pieces of data, its usage is timed, and once the timeout expires the transmission is finished early, preventing a descriptor from being in use for too long;
(8) According to the information in the descriptor, the DMA engine processes length, 4K boundary and so on and sends a write-data request to the AXI Slave, and the data is written to the corresponding address;
(9) The completion message is assembled according to the descriptor space usage and whether a timeout occurred; for the assembled completion message, the DMA engine processes length, 4K boundary and so on and sends a write-data request to the AXI Slave; the completion message is written into the corresponding CMPT queue, and the Doorbell of the CQ is updated at the same time;
(10) The MSIx interrupt with the corresponding vector number is assembled and uploaded;
(11) If it is a Bypass data descriptor, the descriptor does not need to be pre-parsed;
(12) When the TOE has Bypass-type data to send to the DMA, the descriptor of the corresponding queue is extracted from the cache; Bypass-type data has no descriptor-sharing or multi-descriptor case, so data and descriptors are in one-to-one correspondence;
(13) According to the information in the descriptor, the DMA engine processes length, 4K boundary and so on and sends a write-data request to the AXI Slave, and the data is written to the corresponding address;
(14) The completion message is assembled according to information such as the descriptor space usage; for the assembled completion message, the DMA engine processes length, 4K boundary and so on and sends a write-data request to the AXI Slave; the completion message is written into the corresponding CMPT queue, and the Doorbell of the CQ is updated at the same time;
(15) Step (10) is executed in the same way, finishing the transmission.
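Both flows above repeatedly "process length, 4K boundary and so on" before issuing AXI read/write requests. The motivation is that PCIe memory requests must not cross a 4 KB address boundary, so one logical transfer is split into boundary-aligned bursts. A minimal sketch of that split (the maximum burst size parameter is an assumption, not a value from the patent):

```python
def split_4k(addr, length, max_burst=4096):
    """Yield (addr, len) bursts that never cross a 4 KB boundary."""
    bursts = []
    while length > 0:
        to_boundary = 4096 - (addr % 4096)      # room left in this 4K page
        chunk = min(length, to_boundary, max_burst)
        bursts.append((addr, chunk))
        addr += chunk
        length -= chunk
    return bursts
```

For example, a 0x40-byte transfer starting at 0x0FF0 becomes two bursts, 0x10 bytes up to the boundary and 0x30 bytes after it.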
The invention further aims to provide the intelligent network card fast DMA designed by the above design method. The fast DMA supports hybrid descriptors: the H2C direction comprises four descriptor types, namely a TOE data descriptor, a Bypass data descriptor, a TOE command descriptor that directly carries short-packet protocol data, and a Bypass command descriptor; the C2H direction comprises two descriptor types, the TOE data descriptor and the Bypass data descriptor; and both the C2H and H2C directions contain their respective completion descriptors, giving eight descriptor types in the whole system. Each Session in the H2C direction is bound to a unique queue, with the binding embedded in the descriptor command; each Session in the C2H direction is likewise bound to a unique queue, with the CPU configuring the binding relationship, which is embedded in the completion descriptor command.
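The eight-type descriptor taxonomy and the one-to-one Session/queue binding above can be enumerated as follows. The table-based binding structure is an illustrative assumption about the mechanism (the patent only states that the CPU configures the relationship), and the uniqueness check mirrors the "each Session is bound to a unique queue" requirement.

```python
# The eight descriptor types: 4 (H2C) + 2 (C2H) + 2 completion types.
H2C_TYPES = ("TOE_DATA", "BYPASS_DATA", "TOE_CMD", "BYPASS_CMD")
C2H_TYPES = ("TOE_DATA", "BYPASS_DATA")
CMPT_TYPES = ("H2C_CMPT", "C2H_CMPT")    # one completion type per direction
assert len(H2C_TYPES) + len(C2H_TYPES) + len(CMPT_TYPES) == 8

class SessionBinding:
    """One-to-one Session <-> queue mapping, as both directions require."""
    def __init__(self):
        self.sess_to_q = {}
        self.q_to_sess = {}

    def bind(self, session, queue):
        # Reject any binding that would make a Session or queue non-unique.
        if session in self.sess_to_q or queue in self.q_to_sess:
            raise ValueError("Session and queue bindings must be unique")
        self.sess_to_q[session] = queue
        self.q_to_sess[queue] = session
```

Keeping both direction maps lets hardware resolve queue-from-Session (H2C descriptor path) and Session-from-queue (C2H completion path) in constant time.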
Further, the intelligent network card rapid DMA further includes:
the H2C DoorBell register is used for maintaining the H2C Doorbell register and providing related information for the flow control and the scheduling of the H2C;
the H2C flow control scheduling is used for scheduling the H2C queues by weight and throttling descriptors according to the on-board Credit information;
the H2C descriptor engine is used for acquiring the descriptors from the HOST side according to the scheduling result, classifying and storing the descriptors, analyzing the descriptors by the descriptor engine, and transferring data from the HOST side to the TOE_RX;
the H2C CMPT engine is used for assembling a corresponding completion message according to the descriptor type and the state fed back by the TOE_RX side and writing the completion message back into the CMPT queue of the HOST side;
the C2H DoorBell register is used for maintaining the C2H Doorbell register and providing relevant information for the flow control and the scheduling of the C2H;
the C2H flow control scheduling is used for scheduling the C2H queues according to weight and the running state of the descriptor engine, and throttling descriptors according to the on-board Credit information;
the C2H descriptor cache queue is used for caching the prefetched descriptors by queue number and providing the corresponding state information for lower-level processing;
The C2H descriptor engine is used for acquiring and analyzing descriptors from the queue according to the state of the queue and the data condition of the local TOE and moving the data of the TOE_TX side to the HOST side;
the C2H CMPT engine is used for assembling the completion message according to the use condition of the local descriptor and sending the completion message to a queue corresponding to the HOST side;
and the interrupt processing module is used for assembling a specific MSIx message and uploading it to the HOST after a specific CQ of C2H or H2C completes, or when an abnormal state occurs.
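The interrupt processing module implies a fixed queue-to-vector binding: unlike the Xilinx QDMA's at-most-8 vectors per Function, each CMPT queue here owns a unique MSIx vector, so a completion or abnormal state maps directly to one vector number. A purely illustrative sketch of such a binding (the vector layout and message fields are my assumptions, not the patent's format):

```python
def msix_vector(direction, queue_id, num_c2h_queues):
    """Assign C2H CQs vectors [0, N) and H2C CQs vectors [N, 2N)."""
    if direction == "C2H":
        return queue_id
    if direction == "H2C":
        return num_c2h_queues + queue_id
    raise ValueError("direction must be 'C2H' or 'H2C'")

def assemble_msix(direction, queue_id, reason, num_c2h_queues=16):
    """Build the interrupt message uploaded to the HOST (fields illustrative)."""
    return {"vector": msix_vector(direction, queue_id, num_c2h_queues),
            "reason": reason}      # e.g. "CMPT_DONE" or "ERROR"
```

Because the vector number alone identifies the queue, the host-side interrupt handler needs no extra demultiplexing, which is the design-complexity reduction the invention claims.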
Another object of the present invention is to provide an intelligent network card rapid DMA design system applying the intelligent network card rapid DMA design method, the intelligent network card rapid DMA design system includes:
the H2C information acquisition module is used for maintaining a Doorbell register of the H2C through the H2C Doorbell register and providing related information for the flow control and the scheduling of the H2C;
the H2C flow control scheduling module is used for scheduling the H2C queues by weight through H2C flow control scheduling and throttling descriptors according to the on-board Credit information;
the H2C descriptor analysis module is used for acquiring the descriptors from the HOST side through the H2C descriptor engine according to the scheduling result, classifying and storing the descriptors, analyzing the descriptors by the descriptor engine, and transferring data from the HOST side to the TOE_RX;
the H2C register maintenance module is used for assembling the corresponding completion message through the H2C CMPT engine according to the descriptor type and the state fed back by the TOE_RX side, and writing it back into the CMPT queue on the HOST side;
the C2H information acquisition module is used for maintaining a Doorbell register of the C2H through the C2H Doorbell register and providing related information for flow control and scheduling of the C2H;
the C2H flow control module is used for scheduling the queues of the C2H according to the weight and the running state of the descriptor engine through C2H flow control scheduling, and controlling the flow of the descriptor according to the Credit information on the board;
the C2H descriptor caching module is used for caching the prefetched descriptors per queue through the C2H descriptor cache and providing the corresponding state information for lower-level processing;
the C2H descriptor analysis module is used for acquiring and analyzing descriptors from the queue through the C2H descriptor engine according to the state of the queue and the data condition of the local TOE and moving the data of the TOE_TX side to the HOST side;
the message assembly module is used for assembling the message through the C2H CMPT engine according to the use condition of the local descriptor and sending the message into a queue corresponding to the HOST side;
and the message uploading module is used for assembling a specific MSIx message and uploading it to the HOST through the interrupt processing module after a specific CQ of C2H or H2C completes, or when an abnormal state occurs.
It is a further object of the present invention to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of:
maintaining the H2C Doorbell registers through the H2C Doorbell register module and providing the relevant information for H2C flow control and scheduling; scheduling the H2C queues by weight through H2C flow control scheduling and throttling descriptors according to the on-board Credit information; acquiring descriptors from the HOST side according to the scheduling result through the H2C descriptor engine, classifying and storing them, parsing them in the descriptor engine, and moving data from the HOST side to TOE_RX; assembling the corresponding completion message through the H2C CMPT engine according to the descriptor type and the state fed back by the TOE_RX side, and writing it back into the CMPT queue on the HOST side; maintaining the C2H Doorbell registers through the C2H Doorbell register module and providing the relevant information for C2H flow control and scheduling; scheduling the C2H queues according to weight and the running state of the descriptor engine through C2H flow control scheduling, and throttling descriptors according to the on-board Credit information; caching the prefetched descriptors per queue number through the C2H descriptor cache queues and providing the corresponding state information for lower-level processing; acquiring and parsing descriptors from a queue through the C2H descriptor engine according to the queue state and the data situation of the local TOE, and moving data from the TOE_TX side to the HOST side; assembling the completion message through the C2H CMPT engine according to the local descriptor usage and sending it into the corresponding queue on the HOST side; and, when a specific CQ of C2H or H2C completes or an abnormal state occurs, assembling the specific MSIx message through the interrupt processing module and uploading it to the HOST.
Another object of the present invention is to provide a computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
maintaining the H2C Doorbell registers through the H2C Doorbell register module and providing the relevant information for H2C flow control and scheduling; scheduling the H2C queues by weight through H2C flow control scheduling and throttling descriptors according to the on-board Credit information; acquiring descriptors from the HOST side according to the scheduling result through the H2C descriptor engine, classifying and storing them, parsing them in the descriptor engine, and moving data from the HOST side to TOE_RX; assembling the corresponding completion message through the H2C CMPT engine according to the descriptor type and the state fed back by the TOE_RX side, and writing it back into the CMPT queue on the HOST side; maintaining the C2H Doorbell registers through the C2H Doorbell register module and providing the relevant information for C2H flow control and scheduling; scheduling the C2H queues according to weight and the running state of the descriptor engine through C2H flow control scheduling, and throttling descriptors according to the on-board Credit information; caching the prefetched descriptors per queue number through the C2H descriptor cache queues and providing the corresponding state information for lower-level processing; acquiring and parsing descriptors from a queue through the C2H descriptor engine according to the queue state and the data situation of the local TOE, and moving data from the TOE_TX side to the HOST side; assembling the completion message through the C2H CMPT engine according to the local descriptor usage and sending it into the corresponding queue on the HOST side; and, when a specific CQ of C2H or H2C completes or an abnormal state occurs, assembling the specific MSIx message through the interrupt processing module and uploading it to the HOST.
The invention also aims to provide an application of the intelligent network card quick DMA in a server.
The invention further aims to provide an information data processing terminal which is used for realizing the intelligent network card rapid DMA design system.
By combining all the technical schemes, the invention has the following advantages and positive effects: the invention provides a fast DMA design method for an intelligent network card, offering a novel QDMA engine that supports multiple types of descriptors and establishes a clear mapping relation between Sessions and queues; it adds a CMPT queue in the H2C direction, providing the host with the results of descriptor instruction execution and the internal cache state of the TOE network card; and it removes the per-Function restriction and provides a unique interrupt vector for each queue.
The invention provides an implementation of fast DMA supporting hybrid descriptors for an intelligent network card, together with the applicable scenarios and processing flows of the various descriptors; the network card thus adapts to richer scenarios, effectively alleviates the low efficiency of short packets in conventional QDMA, and provides an effective and feasible scheme for a large number of devices.

The implementation of the invention is more customized: by accepting a modest increase in complexity, it adapts to a wider range of scenarios; the use of hybrid descriptors is fully compatible with ordinary network cards while also serving TOE traffic better. An effective association between Sessions and queues is established, providing good and fast DMA service for Sessions between intelligent network cards.

The invention adds CMPT queues on both sides, effectively enabling the HOST to obtain information such as the lower-layer cache and completion states, providing an information basis for an effective flow-control mechanism and ensuring the processing of data. The invention eases the pressure of software and hardware interrupt design; binding interrupts to queues simplifies the interrupt design.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments of the present invention will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a method for designing a fast DMA of an intelligent network card according to an embodiment of the present invention.
Fig. 2 is a diagram of a TOE QDMA framework provided by an embodiment of the present invention.
Fig. 3 is a flow chart of data processing on the H2C side according to an embodiment of the present invention.
Fig. 4 is a flow chart of data processing on the C2H side according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the following examples in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Aiming at the problems existing in the prior art, the invention provides a method, a system, equipment and a terminal for designing a quick DMA of an intelligent network card, and the invention is described in detail below with reference to the accompanying drawings.
As shown in fig. 1, the method for designing the fast DMA of the intelligent network card provided by the embodiment of the invention comprises the following steps:
S101: maintaining the H2C Doorbell registers through the H2C Doorbell register module, and providing relevant information for H2C flow control and scheduling; scheduling the H2C queues according to their weights through H2C flow control scheduling, and controlling the flow of the descriptors according to the on-board Credit information;
S102: acquiring descriptors from the HOST side according to the scheduling result through the H2C descriptor engine, classifying and storing the descriptors, parsing them in the descriptor engine, and moving data from the HOST side to TOE_RX;
S103: assembling a corresponding completion message through the H2C CMPT engine according to the descriptor type and the state fed back by the TOE_RX side, and writing the completion message back into the CMPT queue on the HOST side;
S104: scheduling the C2H queues according to their weights and the running state of the descriptor engine through C2H flow control scheduling, and controlling the flow of the descriptors according to the on-board Credit information;
S105: maintaining the C2H Doorbell registers through the C2H Doorbell register module, and providing relevant information for C2H flow control and scheduling;
S106: caching the prefetched descriptors per queue through the C2H descriptor cache according to the queue number, and giving corresponding state information for lower-level processing;
S107: acquiring and parsing descriptors from the queues through the C2H descriptor engine according to the queue state and the data condition of the local TOE, and moving data from the TOE_TX side to the HOST side;
S108: assembling the completion message through the C2H CMPT engine according to the use of the local descriptors, and sending it to the corresponding queue on the HOST side;
S109: when a specific CQ of C2H or H2C completes or an abnormal state occurs, assembling a specific MSIx message through the interrupt processing module and uploading it to the HOST.
The fast DMA of the intelligent network card provided by the embodiment of the invention supports hybrid descriptors: the H2C direction comprises four descriptor types, namely a TOE data descriptor, a Bypass data descriptor, a TOE command descriptor directly carrying short-packet protocol data, and a Bypass command descriptor; the C2H direction comprises two descriptor types, the TOE data descriptor and the Bypass data descriptor; both the C2H and H2C directions contain their respective completion descriptors, so the whole system comprises eight descriptor types. Each Session in the H2C direction is bound to a unique queue that is embedded in the descriptor command; each Session in the C2H direction is likewise bound to a unique queue, with the CPU configuring the binding relationship, which is embedded in the completion descriptor command.
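The descriptor taxonomy and Session-to-queue binding described above can be sketched as follows. This is an illustrative model only: the type names, the `SessionQueueMap` class, and its methods are assumptions made for exposition, not the patent's actual register or bit-level formats.

```python
from enum import Enum, auto

class DescType(Enum):
    # H2C direction: four descriptor types
    H2C_TOE_DATA = auto()     # TOE data descriptor
    H2C_BYPASS_DATA = auto()  # Bypass data descriptor
    H2C_TOE_CMD = auto()      # TOE command descriptor (short-packet payload carried inline)
    H2C_BYPASS_CMD = auto()   # Bypass command descriptor
    # C2H direction: two descriptor types
    C2H_TOE_DATA = auto()
    C2H_BYPASS_DATA = auto()
    # one completion descriptor per direction -> eight types in total
    H2C_CMPT = auto()
    C2H_CMPT = auto()

class SessionQueueMap:
    """One-to-one binding between a TCP Session and a DMA queue.

    In the H2C direction the binding is embedded in the descriptor command;
    in the C2H direction the CPU configures it, and it is embedded in the
    completion descriptor command.
    """
    def __init__(self):
        self.session_to_queue = {}
        self.queue_to_session = {}

    def bind(self, session_id: int, queue_id: int) -> None:
        # both sides of the mapping must be unique
        if session_id in self.session_to_queue or queue_id in self.queue_to_session:
            raise ValueError("Session and queue must each be bound at most once")
        self.session_to_queue[session_id] = queue_id
        self.queue_to_session[queue_id] = session_id
```

A CPU-side driver would call `bind()` once per established Session; the eight members of `DescType` mirror the eight descriptors of the whole system.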
As shown in fig. 2, the fast DMA of the intelligent network card provided in the embodiment of the present invention includes:
the H2C DoorBell register is used for maintaining the H2C Doorbell register and providing related information for the flow control and the scheduling of the H2C;
the H2C flow control scheduling is used for scheduling the H2C queues according to their weights and controlling the flow of the descriptors according to the on-board Credit information;
the H2C descriptor engine is used for acquiring the descriptors from the HOST side according to the scheduling result, classifying and storing the descriptors, analyzing the descriptors by the descriptor engine, and transferring data from the HOST side to the TOE_RX;
the H2C CMPT engine is used for assembling a corresponding completion message according to the descriptor type and the state fed back by the TOE_RX side and writing the completion message back into the CMPT queue of the HOST side;
the C2H DoorBell register is used for maintaining the C2H Doorbell register and providing relevant information for the flow control and the scheduling of the C2H;
C2H flow control scheduling, which is used for scheduling the C2H queues according to their weights and the running state of the descriptor engine, and controlling the flow of the descriptors according to the on-board Credit information;
the C2H descriptor cache queue is used for caching the prefetched descriptors according to the queue number and giving corresponding state information for lower-level processing;
The C2H descriptor engine is used for acquiring and analyzing descriptors from the queue according to the state of the queue and the data condition of the local TOE and moving the data of the TOE_TX side to the HOST side;
the C2H CMPT engine is used for assembling the completion message according to the use condition of the local descriptor and sending the completion message to a queue corresponding to the HOST side;
and the interrupt processing module is used for assembling a specific MSIx message to the HOST after the specific CQ of the C2H and the H2C is completed and when the abnormal state occurs.
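The flow-control scheduling modules above select queues by weight while gating descriptor fetches on the on-board Credit. A minimal sketch of one plausible policy follows; the patent does not fix a specific algorithm, so the `schedule` function and its field names are assumptions:

```python
def schedule(queues, credits):
    """Pick the next queue to fetch descriptors from.

    queues  : list of dicts {'qid', 'weight', 'pending'} -- 'pending' is the
              descriptor count announced through the Doorbell registers
    credits : free on-board descriptor-buffer slots; fetching is only allowed
              while at least one credit remains

    Weighted selection: among queues with pending descriptors, pick the one
    with the highest weight (ties broken by lowest qid). Returns a qid, or
    None if nothing is eligible.
    """
    if credits <= 0:
        return None
    eligible = [q for q in queues if q['pending'] > 0]
    if not eligible:
        return None
    return min(eligible, key=lambda q: (-q['weight'], q['qid']))['qid']
```

In hardware this decision would be made per scheduling cycle, with `credits` replenished as the on-board buffers drain toward TOE_RX or the HOST.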
The intelligent network card rapid DMA design system provided by the embodiment of the invention comprises:
the H2C information acquisition module is used for maintaining a Doorbell register of the H2C through the H2C Doorbell register and providing related information for the flow control and the scheduling of the H2C;
the H2C flow control scheduling module is used for scheduling the H2C queues according to their weights through H2C flow control scheduling and controlling the flow of the descriptors according to the on-board Credit information;
the H2C descriptor analysis module is used for acquiring the descriptors from the HOST side through the H2C descriptor engine according to the scheduling result, classifying and storing the descriptors, analyzing the descriptors by the descriptor engine, and transferring data from the HOST side to the TOE_RX;
the H2C register maintenance module is used for assembling a corresponding completion message into a CMPT queue of the HOST side through the H2C CMPT engine according to the descriptor type and the state fed back by the TOE_RX side;
The C2H information acquisition module is used for maintaining a Doorbell register of the C2H through the C2H Doorbell register and providing related information for flow control and scheduling of the C2H;
the C2H flow control module is used for scheduling the queues of the C2H according to the weight and the running state of the descriptor engine through C2H flow control scheduling, and controlling the flow of the descriptor according to the Credit information on the board;
the C2H descriptor caching module is used for caching the prefetched descriptors per queue through the C2H descriptor cache, and giving corresponding state information for lower-level processing;
the C2H descriptor analysis module is used for acquiring and analyzing descriptors from the queue through the C2H descriptor engine according to the state of the queue and the data condition of the local TOE and moving the data of the TOE_TX side to the HOST side;
the message assembly module is used for assembling the message through the C2H CMPT engine according to the use condition of the local descriptor and sending the message into a queue corresponding to the HOST side;
and the message uploading module is used for assembling a specific MSIx message to the HOST through the interrupt processing module after the specific CQ of the C2H and the H2C is completed and when the abnormal state occurs.
The technical scheme of the present invention is further described in conjunction with the term explanation.
TCP (Transmission Control Protocol): is a connection-oriented, reliable, byte stream based transport layer communication protocol (conventionally implemented in software).
TOE (TCP Offload Engine): the TCP unloading engine is a TCP hardware acceleration technology, and achieves the purpose of acceleration by realizing a TCP protocol on hardware.
DMA (Direct Memory Access): an interface technology for exchanging data directly with system memory without the CPU; since the CPU does not need to control the transfer process, its efficiency is greatly improved.
QDMA (Quick Direct Memory Access): fast direct memory access; its main difference from other DMA variants is the introduction of the concept of a queue.
Session: the TCP procedure establishes a connection-oriented session.
The technical scheme of the invention is further described below with reference to specific embodiments.
The QDMA of XILINX has a fixed descriptor parsing format: it cannot support the concept of hybrid descriptors in a TOE network card, has no notion of Session, and cannot establish a relationship between queues and Sessions. To address this problem, the invention proposes a new QDMA engine that supports multiple types of descriptors and establishes an explicit mapping between Sessions and queues.
The QDMA of XILINX has only a C2H CMPT queue and provides no queue for descriptor execution results in the H2C direction. To feed back the state of the TOE network card and make it convenient for software to provide mechanisms such as flow control, the invention adds an H2C CMPT queue, providing the host with the execution results of descriptor instructions and the internal cache state of the TOE network card.
The QDMA of XILINX supports at most 8 interrupt vectors per Function, which is convenient for multi-Function users but unsuitable for the TOE network card of the invention: a unique interrupt vector cannot be provided for each CMPT queue, which increases the design complexity of the network card. The invention therefore removes the per-Function restriction and provides a unique interrupt vector for each queue.
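With the per-Function 8-vector limit removed, the interrupt design reduces to a direct queue-to-vector mapping with no sharing or aggregation logic. The function below is a hypothetical illustration of that one-to-one assignment (the `num_vectors` parameter stands in for the size of the MSI-X table, an assumed value):

```python
def msix_vector_for_queue(queue_id: int, num_vectors: int) -> int:
    """Identity mapping from a CMPT queue to its MSI-X vector number.

    Because every queue owns a dedicated vector, the interrupt handler can
    recover the queue directly from the vector, with no demultiplexing step.
    """
    if not 0 <= queue_id < num_vectors:
        raise ValueError("queue has no dedicated vector")
    return queue_id
```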
The overall structure of the technical scheme will be described below.
The fast DMA of the intelligent network card supporting hybrid descriptors comprises, in the H2C direction, four descriptor types, namely a TOE data descriptor, a Bypass data descriptor, a TOE command descriptor directly carrying short-packet protocol data, and a Bypass command descriptor; the C2H direction comprises two descriptor types, the TOE data descriptor and the Bypass data descriptor; both the C2H and H2C directions contain their respective completion descriptors, so the whole system comprises eight descriptor types. Each Session in the H2C direction is bound to a unique queue that is embedded in the descriptor command; each Session in the C2H direction is likewise bound to a unique queue, with the CPU configuring the binding relationship, which is embedded in the completion descriptor command.
The TOE QDMA frame diagram is shown in FIG. 2.
(1) H2C Doorbell register: maintains the H2C Doorbell registers, and provides relevant information for H2C flow control and scheduling.
(2) H2C flow control scheduling: and scheduling the queue of the H2C according to the weight, and controlling the flow of the descriptor according to the Credit information on the board.
(3) H2C descriptor engine: and acquiring the descriptors from the HOST side according to the scheduling result, classifying and storing the descriptors, analyzing the descriptors by a descriptor engine, and carrying data from the HOST side to transmit the data to TOE_RX.
(4) H2C CMPT engine: and according to the descriptor type and the state fed back by the TOE_RX side, assembling a corresponding completion message, and writing the completion message back into the CMPT queue of the HOST side.
(5) C2H Doorbell register: maintains the C2H Doorbell registers, providing relevant information for C2H flow control and scheduling.
(6) C2H flow control scheduling: and scheduling the C2H queue according to the weight and the running state of the descriptor engine, and controlling the flow of the descriptor according to the Credit information on the board.
(7) C2H descriptor cache queue: caches the prefetched descriptors according to the queue number and gives corresponding state information for lower-level processing.
(8) C2H descriptor engine: and acquiring and analyzing descriptors from the queue according to the state of the queue and the data condition of the local TOE, and moving the data of the TOE_TX side to the HOST side.
(9) C2H CMPT engine: and assembling the completion message according to the use condition of the local descriptor, and sending the completion message to a queue corresponding to the HOST side.
(10) An interrupt processing module: when the specific CQ of C2H and H2C is completed and an exception condition occurs, the specific MSIx message is assembled and uploaded to HOST.
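The Doorbell registers in modules (1) and (5) can be modeled as producer/consumer indices over a descriptor ring: HOST advances the producer index by writing the Doorbell, and the engines advance the consumer index as descriptors are fetched. A minimal sketch follows; the field names `pidx`/`cidx` and the one-empty-slot convention are modeling assumptions, not the patent's register layout:

```python
class DoorbellRing:
    """Minimal model of a descriptor ring tracked by a Doorbell register."""

    def __init__(self, size: int):
        self.size = size
        self.pidx = 0  # producer index, written by HOST via the Doorbell
        self.cidx = 0  # consumer index, advanced by the DMA engine

    def pending(self) -> int:
        """Descriptors posted by HOST but not yet fetched by hardware."""
        return (self.pidx - self.cidx) % self.size

    def free_slots(self) -> int:
        # one slot is kept empty to distinguish a full ring from an empty one
        return self.size - 1 - self.pending()

    def host_post(self, n: int) -> None:
        """HOST writes the Doorbell after placing n descriptors in the ring."""
        if n > self.free_slots():
            raise RuntimeError("ring overflow")
        self.pidx = (self.pidx + n) % self.size

    def hw_fetch(self, n: int) -> None:
        """The descriptor engine consumes n descriptors."""
        if n > self.pending():
            raise RuntimeError("ring underflow")
        self.cidx = (self.cidx + n) % self.size
```

The `pending()` count is exactly the "relevant information" the Doorbell modules expose to the flow-control schedulers.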
The data processing flow of the H2C side is introduced:
the data flow diagram is shown in fig. 3.
(1) HOST updates the descriptor information to the hardware board by configuring the H2C side Doorbell.
(2) And the flow control scheduling selects the descriptors of the corresponding queues for acquisition.
(3) Using the relevant information given by the Doorbell register, the DMA engine sends a read request to the AXI Slave after length and 4K-boundary processing.
(4) After the descriptors are acquired, they are first classified and identified, and each is passed to the corresponding descriptor engine according to the OPCODE in the descriptor.
(5) If it is a TOE data descriptor, a read-data request is sent to the AXI Slave through the DMA engine after length and 4K-boundary processing, according to the address and length in the descriptor; the returned data is transmitted to the TOE, and whether the LAST bit of the descriptor is 1 is judged;
(6) If the LAST bit is 0, descriptors of the same Session continue to be acquired until the LAST bit is 1, repeating process (5) throughout;
(7) If the LAST bit is 1, the state fed back by the TOE is awaited, mainly the corresponding cache space on the TOE and the feedback of the execution result, and the completion message is assembled according to this information.
(8) The assembled completion message is sent as a write-data request to the AXI Slave through the DMA engine after length and 4K-boundary processing; the completion message is written into the corresponding CMPT queue, and the Doorbell of the CQ is updated at the same time.
(9) The MSIx interrupt with the corresponding vector number is assembled and uploaded.
(10) If it is a TOE command descriptor, the data to be transmitted is carried in the descriptor itself, so the payload is sent directly to the TOE; the feedback state of the TOE is awaited, mainly the cache space and the execution result of the corresponding command on the TOE, and the completion message is assembled according to this information.
(11) Processes (8) and (9) are executed in the same way, completing the execution of the current descriptor.
(12) If it is a Bypass data descriptor, the data executed this time is independent and merely passes through the TOE; so, according to the address and length in the descriptor, a read-data request is sent to the AXI Slave through the DMA engine after length and 4K-boundary processing, and the returned data is transmitted to the TOE.
(13) The DMA assembles the completion message itself, mainly comprising the execution result of the current descriptor.
(14) Processes (8) and (9) are executed in the same way, completing the execution of the current descriptor.
(15) If it is a Bypass command descriptor, the command to be bypassed to the TOE is carried in the descriptor, so the payload is sent directly to the TOE, and the completion message is assembled by the DMA itself.
(16) Processes (8) and (9) are executed in the same way, completing the execution of the current descriptor.
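Several of the steps above mention "length and 4K-boundary processing" before a request is issued to the AXI Slave: PCIe memory requests must not cross a 4 KiB boundary, so a transfer is split accordingly. A minimal sketch of such splitting follows; the `max_len` cap stands in for the maximum request payload size, an assumed parameter:

```python
def split_4k(addr: int, length: int, max_len: int = 4096):
    """Split a DMA transfer into requests that never cross a 4 KiB boundary
    (a PCIe requirement) and never exceed max_len bytes.

    Returns a list of (address, length) pairs covering [addr, addr+length).
    """
    requests = []
    while length > 0:
        # bytes remaining until the next 4 KiB boundary
        to_boundary = 4096 - (addr % 4096)
        chunk = min(length, to_boundary, max_len)
        requests.append((addr, chunk))
        addr += chunk
        length -= chunk
    return requests
```

For example, a 48-byte read starting at 0x0FF0 becomes two requests, one of 16 bytes up to the boundary and one of 32 bytes after it.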
The data processing flow on the C2H side is introduced:
the data flow diagram is shown in fig. 4.
(1) HOST updates the descriptor information to the hardware board by configuring the C2H side Doorbell.
(2) The flow control schedule selects descriptors of the corresponding queues for prefetching.
(3) Using the relevant information given by the Doorbell register, the DMA engine sends a read request to the AXI Slave after length and 4K-boundary processing.
(4) Storing the received descriptor information into corresponding caches according to the queues, and updating the cache information in real time.
(5) In the case of TOE data descriptors, the descriptors of the various queues are extracted and pre-parsed.
(6) When the TOE has data to send to the DMA, the data is moved according to the corresponding pre-parsed descriptor; if no corresponding descriptor exists, one is actively requested.
(7) Several pieces of data may share one descriptor, or one piece of data may use several descriptors: if one piece of data occupies multiple descriptors, the space usage of the last descriptor is recorded; if several pieces of data share one descriptor, the use of that descriptor is timed, and once the timer expires the current transmission is completed in advance, preventing the descriptor from being held unused for too long.
(8) According to the information in the descriptor, a write-data request is sent to the AXI Slave through the DMA engine after length and 4K-boundary processing, and the data is written to the corresponding address.
(9) The completion message is assembled according to information such as the descriptor space usage and whether a timeout occurred; the assembled completion message is sent as a write-data request to the AXI Slave through the DMA engine after length and 4K-boundary processing, the completion message is written into the corresponding CMPT queue, and the Doorbell of the CQ is updated at the same time.
(10) MSIx assembling corresponding vector number interrupts upload.
(11) In the case of Bypass data descriptors, the descriptors do not need to be pre-parsed.
(12) When the TOE has Bypass-type data to send to the DMA, the descriptor of the corresponding queue is extracted from the cache; Bypass-type data has no descriptor sharing or multi-descriptor usage, so data and descriptors are in a one-to-one relationship.
(13) And sending a data writing request to the AXI Slave through the DMA engine according to the information in the descriptor and through the length, the 4K boundary and other processes, and writing the data to the corresponding address.
(14) The completion message is assembled according to information such as the descriptor space usage; a write-data request is sent to the AXI Slave through the DMA engine after length and 4K-boundary processing, the completion message is written into the corresponding CMPT queue, and the Doorbell of the CQ is updated at the same time.
(15) The same process (10) is performed to complete the transmission.
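Step (7) of the C2H flow tracks how a descriptor is shared by several small packets, or how one large packet spans several descriptors, and times out a partially used descriptor so that it is not held indefinitely. A minimal sketch of the shared-descriptor bookkeeping follows; the timeout value and the class shape are assumptions, not the patent's implementation:

```python
import time

class C2HDescriptorUse:
    """Tracks partial use of one C2H buffer descriptor shared by several
    small packets, closing it early on timeout so it is not held too long."""

    def __init__(self, desc_len: int, timeout_s: float = 0.001):
        self.desc_len = desc_len    # total buffer space the descriptor covers
        self.used = 0               # bytes consumed so far
        self.timeout_s = timeout_s  # sharing timer (illustrative value)
        self.opened_at = None       # set when the first packet lands

    def add_packet(self, pkt_len: int) -> bool:
        """Account pkt_len bytes against the descriptor; True if it fit."""
        if self.used + pkt_len > self.desc_len:
            return False
        if self.opened_at is None:
            self.opened_at = time.monotonic()  # start the sharing timer
        self.used += pkt_len
        return True

    def should_complete(self) -> bool:
        """Complete when the descriptor is full or the sharing timer expired,
        after which the completion message records the space actually used."""
        if self.used == self.desc_len:
            return True
        return (self.opened_at is not None
                and time.monotonic() - self.opened_at >= self.timeout_s)
```

The recorded `used` value is what the C2H CMPT engine would report back in the completion message so the HOST can reclaim the unused remainder.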
The invention provides an implementation of fast DMA supporting hybrid descriptors for an intelligent network card, together with the applicable scenarios and processing flows of the various descriptors; the network card thus adapts to richer scenarios, effectively alleviates the low efficiency of short packets in conventional QDMA, and provides an effective and feasible scheme for a large number of devices.

The implementation of the invention is more customized: by accepting a modest increase in complexity, it adapts to a wider range of scenarios; the use of hybrid descriptors is fully compatible with ordinary network cards while also serving TOE traffic better. An effective association between Sessions and queues is established, providing good and fast DMA service for Sessions between intelligent network cards.

The invention adds CMPT queues on both sides, effectively enabling the HOST to obtain information such as the lower-layer cache and completion states, providing an information basis for an effective flow-control mechanism and ensuring the processing of data. The invention eases the pressure of software and hardware interrupt design; binding interrupts to queues simplifies the interrupt design.
In the above embodiments, the invention may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When software is used in whole or in part, it is implemented in the form of a computer program product comprising one or more computer instructions. When the computer instructions are loaded or executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center containing an integration of one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), etc.
The foregoing is merely illustrative of specific embodiments of the present invention, and the scope of the invention is not limited thereto, but any modifications, equivalents, improvements and alternatives falling within the spirit and principles of the present invention will be apparent to those skilled in the art within the scope of the present invention.

Claims (10)

1. The intelligent network card rapid DMA design method is characterized by comprising the following steps of:
step one, maintaining the H2C Doorbell registers through the H2C Doorbell register, and providing relevant information for H2C flow control and scheduling; scheduling the H2C queues according to their weights through H2C flow control scheduling, and controlling the flow of the descriptors according to the on-board Credit information;
step two, acquiring descriptors from the HOST side through an H2C descriptor engine according to a scheduling result, classifying and storing the descriptors, analyzing the descriptors by the descriptor engine, and transferring data from the HOST side to TOE_RX;
step three, assembling a corresponding completion message by the H2C CMPT engine according to the descriptor type and the state fed back by the TOE_RX side, and writing the completion message back into a CMPT queue of the HOST side;
step four, scheduling the C2H queues according to their weights and the running state of the descriptor engine through C2H flow control scheduling, and controlling the flow of the descriptors according to the on-board Credit information;
step five, maintaining a C2H Doorbell register through the C2H Doorbell register to provide relevant information for the flow control and the scheduling of the C2H;
step six, caching the prefetched descriptors per queue through the C2H descriptor cache according to the queue number, and giving corresponding state information for lower-level processing;
step seven, the C2H descriptor engine obtains and analyzes the descriptor from the queue according to the queue state and the data condition of the local TOE, and moves the data of the TOE_TX side to the HOST side;
step eight, the message is assembled through the C2H CMPT engine according to the use condition of the local descriptor and is sent to a queue corresponding to the HOST side;
step nine, when the specific CQ of C2H and H2C is completed and abnormal state occurs, the specific MSIx message is assembled by the interrupt processing module and uploaded to HOST.
2. The intelligent network card rapid DMA design method according to claim 1, wherein the H2C-side data processing flow comprises:
(1) HOST updates the descriptor information to the hardware board card by configuring a H2C side Doorbell;
(2) The flow control scheduling selects descriptors of the corresponding queues for obtaining;
(3) Using the relevant information given by the Doorbell register, the DMA engine sends a read request to the AXI Slave after length and 4K-boundary processing;
(4) After the descriptors are acquired, they are classified and identified, and each is passed to the corresponding descriptor engine according to the OPCODE in the descriptor;
(5) If it is a TOE data descriptor, a read-data request is sent to the AXI Slave through the DMA engine after length and 4K-boundary processing according to the address and length in the descriptor, the returned data is transmitted to the TOE, and whether the LAST bit of the descriptor is 1 is judged;
(6) If the LAST bit is 0, continuing to acquire the descriptor of the same Session until the LAST bit is 1, and repeatedly executing the step (5) in the process;
(7) If the LAST bit is 1, the state fed back by the TOE is awaited, namely the corresponding cache space on the TOE and the execution result, and the completion message is assembled according to this information;
(8) The assembled completion message is sent as a write-data request to the AXI Slave through the DMA engine after length and 4K-boundary processing, the completion message is written into the corresponding CMPT queue, and the Doorbell of the CQ is updated at the same time;
(9) Assembling MSIx interrupt uploading of corresponding vector numbers;
(10) If it is a TOE command descriptor, the data to be transmitted is carried in the descriptor, so the payload is sent directly to the TOE, the feedback state of the TOE is awaited, namely the cache space and the execution result of the corresponding command on the TOE, and the completion message is assembled according to this information;
(11) The step (8) and the step (9) are executed similarly, and the execution of the descriptor is completed;
(12) If it is a Bypass data descriptor, the data executed this time is independent and merely passes through the TOE; so, according to the address and length in the descriptor, a read-data request is sent to the AXI Slave through the DMA engine after length and 4K-boundary processing, and the returned data is transmitted to the TOE;
(13) The DMA assembles the completion message by itself, including the execution result of the descriptor;
(14) Step (8) and step (9) are executed in the same way, and the execution of the descriptor is completed;
(15) If it is a Bypass command descriptor, the descriptor carries the command to be bypassed to the TOE, so the payload is sent directly to the TOE, and the completion message is assembled by the DMA itself;
(16) And (3) executing the step (8) and the step (9) in the same way, and completing the execution of the current descriptor.
3. The intelligent network card rapid DMA design method according to claim 1, wherein the data processing flow of the C2H side comprises:
(1) HOST updates the descriptor information to the hardware board card by configuring a C2H side Doorbell;
(2) The flow control scheduling selects descriptors of the corresponding queues for prefetching;
(3) Using the relevant information given by the Doorbell register, the DMA engine sends a read request to the AXI Slave after length and 4K-boundary processing;
(4) Storing the received descriptor information into corresponding caches according to queues, and updating the cache information in real time;
(5) If the data is TOE data descriptors, extracting and pre-analyzing the descriptors of the queues;
(6) When the TOE has data to send to the DMA, the data is moved according to the corresponding pre-resolved descriptors, and if the corresponding descriptors do not exist, the application is actively carried out;
(7) Because the plurality of data share one descriptor or the plurality of data use the plurality of descriptors, if the plurality of descriptors are occupied by the plurality of data, the space use condition of the last descriptor is recorded, if the plurality of descriptors are shared by the plurality of data, the use condition of the descriptors is timed, and once the timeout is finished, the transmission is finished in advance, so that the descriptors are prevented from being used for too long time;
(8) Sending a data writing request to the AXI Slave through the DMA engine according to the information in the descriptor and through length and 4K boundary processing, and writing the data to a corresponding address;
(9) According to the using condition of the descriptor space and whether overtime information is used for assembling the completion message, the assembled completion message is processed through the length and 4K boundary by a DMA engine to send a data writing request to an AXI Slave, the completion message is written into a corresponding CMPT queue, and the Doorbell of the CQ is updated at the same time;
(10) Assembling MSIx interrupt uploading of corresponding vector numbers;
(11) If the data is the Bypass data descriptor, the descriptor does not need to be parsed;
(12) When the TOE has Bypass type data to send to the DMA, the description degree of the corresponding queue is extracted from the buffer memory area, and the Bypass type data does not have the condition that descriptors are shared or are multi-purpose, so that the data and the descriptors are in one-to-one relation;
(13) Sending a data writing request to the AXI Slave through the DMA engine according to the information in the descriptor and through length and 4K boundary processing, and writing the data to a corresponding address;
(14) Assembling a completion message according to the descriptor space use condition information, sending a data writing request to an AXI Slave through the DMA engine by length and 4K boundary processing, writing the completion message into a corresponding CMPT queue, and simultaneously updating the Doorbell of the CQ;
(15) And (5) executing the step (10) to finish the transmission.
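The bookkeeping of step (7) — one piece of C2H data spanning several descriptors, with the space usage of the last descriptor recorded — can be sketched as follows; the struct and function names are hypothetical, and a fixed per-descriptor buffer size is assumed:

```c
#include <stdint.h>

/* Hypothetical bookkeeping for one C2H queue's current descriptor. */
typedef struct {
    uint32_t cap;   /* bytes one descriptor buffer can hold (assumed fixed) */
    uint32_t used;  /* bytes already consumed in the current (last) descriptor */
} desc_slot_t;

/* Consume 'len' bytes of incoming data; returns how many descriptors were
 * completely filled and closed out, and leaves 'used' recording the space
 * consumed in the last, still-open descriptor. */
static uint32_t consume(desc_slot_t *s, uint32_t len)
{
    uint32_t total = s->used + len;
    uint32_t closed = total / s->cap;
    s->used = total % s->cap;
    return closed;
}
```

A timer on the still-open descriptor (not shown) would force early completion, as step (7) describes.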
4. An intelligent network card fast DMA designed by the intelligent network card fast DMA design method according to any one of claims 1-3, characterized in that the intelligent network card fast DMA supports hybrid descriptors: the H2C direction includes four descriptor types, namely a TOE data descriptor, a Bypass data descriptor, a TOE command descriptor that directly carries short-packet protocol data, and a Bypass command descriptor; the C2H direction includes two descriptor types, the TOE data descriptor and the Bypass data descriptor; both the C2H and H2C directions contain their respective completion descriptors, so the whole system comprises eight descriptor types; each Session in the H2C direction is bound to a unique queue, with the binding embedded in the descriptor command; each Session in the C2H direction is likewise bound to a unique queue, with the CPU configuring the binding relationship, which is embedded in the completion descriptor command.
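The eight descriptor types enumerated in claim 4 can be captured in a single enumeration; a sketch with illustrative identifiers, not taken from the patent's implementation:

```c
#include <stdint.h>

/* The eight descriptor types of claim 4; identifiers are illustrative. */
typedef enum {
    DESC_H2C_TOE_DATA,    /* H2C: data moved to the TOE */
    DESC_H2C_BYPASS_DATA, /* H2C: data passing the TOE transparently */
    DESC_H2C_TOE_CMD,     /* H2C: short-packet protocol data carried in the descriptor */
    DESC_H2C_BYPASS_CMD,  /* H2C: command bypassed to the TOE */
    DESC_C2H_TOE_DATA,    /* C2H: data from the TOE */
    DESC_C2H_BYPASS_DATA, /* C2H: Bypass data */
    DESC_H2C_CMPT,        /* H2C completion descriptor */
    DESC_C2H_CMPT,        /* C2H completion descriptor */
    DESC_TYPE_COUNT       /* total: eight types, as the claim states */
} desc_type_t;
```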
5. The intelligent network card fast DMA of claim 4, further comprising:
an H2C Doorbell register module, used for maintaining the H2C Doorbell registers and providing related information for H2C flow control and scheduling;
an H2C flow-control scheduler, used for scheduling the H2C queues by weight and controlling the flow of descriptors according to the on-board Credit information;
an H2C descriptor engine, used for fetching descriptors from the HOST side according to the scheduling result, classifying and storing them, parsing them, and moving data from the HOST side to TOE_RX;
an H2C CMPT engine, used for assembling the corresponding completion message according to the descriptor type and the state fed back by the TOE_RX side, and writing it back into the CMPT queue on the HOST side;
a C2H Doorbell register module, used for maintaining the C2H Doorbell registers and providing related information for C2H flow control and scheduling;
a C2H flow-control scheduler, used for scheduling the C2H queues according to the weight and the running state of the descriptor engine, and controlling the flow of descriptors according to the on-board Credit information;
a C2H descriptor cache queue, used for caching the prefetched descriptors by queue number and providing the corresponding state information for lower-level processing;
a C2H descriptor engine, used for fetching and parsing descriptors from the queue according to the queue state and the data condition of the local TOE, and moving data from the TOE_TX side to the HOST side;
a C2H CMPT engine, used for assembling the completion message according to the usage of the local descriptors and sending it to the corresponding queue on the HOST side;
and an interrupt processing module, used for assembling a specific MSIx message to the HOST after a specific CQ of C2H or H2C completes, and when an abnormal state occurs.
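The Doorbell register modules above track how many fresh descriptors the HOST has posted. A hedged sketch of that bookkeeping, assuming a producer/consumer index pair and a power-of-two ring size (both assumptions, and the function name is hypothetical):

```c
#include <stdint.h>

/* Number of descriptors posted (via a doorbell write of the producer index)
 * but not yet consumed, for a ring whose size is a power of two; the
 * 16-bit indices wrap naturally in unsigned arithmetic. */
static uint16_t avail_descs(uint16_t pidx, uint16_t cidx, uint16_t ring_size)
{
    return (uint16_t)((uint16_t)(pidx - cidx) & (uint16_t)(ring_size - 1u));
}
```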
6. An intelligent network card fast DMA design system applying the intelligent network card fast DMA design method according to any one of claims 1-3, characterized in that the intelligent network card fast DMA design system comprises:
an H2C information acquisition module, used for maintaining the H2C Doorbell registers through the H2C Doorbell register and providing related information for H2C flow control and scheduling;
an H2C flow-control scheduling module, used for scheduling the H2C queues by weight through H2C flow-control scheduling, and controlling the flow of descriptors according to the on-board Credit information;
an H2C descriptor parsing module, used for fetching descriptors from the HOST side through the H2C descriptor engine according to the scheduling result, classifying and storing them, parsing them with the descriptor engine, and moving data from the HOST side to TOE_RX;
an H2C register maintenance module, used for assembling the corresponding completion message into the CMPT queue on the HOST side through the H2C CMPT engine, according to the descriptor type and the state fed back by the TOE_RX side;
a C2H information acquisition module, used for maintaining the C2H Doorbell registers through the C2H Doorbell register and providing related information for C2H flow control and scheduling;
a C2H flow-control module, used for scheduling the C2H queues according to the weight and the running state of the descriptor engine through C2H flow-control scheduling, and controlling the flow of descriptors according to the on-board Credit information;
a C2H descriptor caching module, used for caching the prefetched descriptors through the C2H descriptor cache queue, and providing the corresponding state information for lower-level processing;
a C2H descriptor parsing module, used for fetching and parsing descriptors from the queue through the C2H descriptor engine according to the queue state and the data condition of the local TOE, and moving data from the TOE_TX side to the HOST side;
a message assembly module, used for assembling the completion message through the C2H CMPT engine according to the usage of the local descriptors and sending it into the corresponding queue on the HOST side;
and a message uploading module, used for assembling a specific MSIx message to the HOST through the interrupt processing module after a specific CQ of C2H or H2C completes, and when an abnormal state occurs.
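Nearly every transfer step in the claims mentions "length and 4K boundary processing": PCIe memory requests must not cross a 4 KiB boundary, so the DMA engine clips each request before issuing it to the AXI Slave. A minimal sketch of that clipping (the function name is hypothetical):

```c
#include <stdint.h>

#define BOUNDARY 4096u  /* PCIe 4 KiB boundary */

/* Size of the first request chunk starting at 'addr' with 'len' bytes
 * remaining, clipped so the chunk never crosses a 4 KiB boundary; the
 * engine would iterate, advancing addr and reducing len, until done. */
static uint32_t first_chunk(uint64_t addr, uint32_t len)
{
    uint32_t to_boundary = BOUNDARY - (uint32_t)(addr & (BOUNDARY - 1u));
    return len < to_boundary ? len : to_boundary;
}
```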
7. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to execute the steps of the intelligent network card fast DMA design method according to any one of claims 1-3.
8. A computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the intelligent network card fast DMA design method according to any one of claims 1-3.
9. An application of the intelligent network card fast DMA according to any one of claims 4-5 in a server.
10. An information data processing terminal, wherein the information data processing terminal is configured to implement the intelligent network card rapid DMA design system according to claim 6.
CN202111071199.3A 2021-09-13 2021-09-13 Method, system, equipment and terminal for designing intelligent network card fast DMA Active CN113986791B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111071199.3A CN113986791B (en) 2021-09-13 2021-09-13 Method, system, equipment and terminal for designing intelligent network card fast DMA


Publications (2)

Publication Number Publication Date
CN113986791A CN113986791A (en) 2022-01-28
CN113986791B true CN113986791B (en) 2024-02-02

Family

ID=79735740

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111071199.3A Active CN113986791B (en) 2021-09-13 2021-09-13 Method, system, equipment and terminal for designing intelligent network card fast DMA

Country Status (1)

Country Link
CN (1) CN113986791B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114640630B (en) * 2022-03-31 2023-08-18 苏州浪潮智能科技有限公司 Flow control method, device, equipment and readable storage medium
CN114885045B (en) * 2022-07-07 2022-10-04 浙江锐文科技有限公司 Method and device for saving DMA channel resources in high-speed intelligent network card/DPU
CN115314439B (en) * 2022-08-11 2023-10-24 迈普通信技术股份有限公司 Flow control method and related device for data storage IO request
CN116225999B (en) * 2023-05-04 2023-07-21 太初(无锡)电子科技有限公司 DMA data transmission method and system
CN116303173B (en) * 2023-05-19 2023-08-08 深圳云豹智能有限公司 Method, device and system for reducing RDMA engine on-chip cache and chip

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101539902A (en) * 2009-05-05 2009-09-23 中国科学院计算技术研究所 DMA device for nodes in multi-computer system and communication method
CN103885840A (en) * 2014-04-04 2014-06-25 华中科技大学 FCoE protocol acceleration engine IP core based on AXI4 bus
US10142794B1 (en) * 2017-07-10 2018-11-27 International Business Machines Corporation Real-time, location-aware mobile device data breach prevention
US10657084B1 (en) * 2018-11-07 2020-05-19 Xilinx, Inc. Interrupt moderation and aggregation circuitry
CN112306928A (en) * 2020-11-19 2021-02-02 山东云海国创云计算装备产业创新中心有限公司 Stream transmission-oriented direct memory access method and DMA controller
CN113225307A (en) * 2021-03-18 2021-08-06 西安电子科技大学 Optimization method, system and terminal for pre-reading descriptors in offload engine network card

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7496699B2 (en) * 2005-06-17 2009-02-24 Level 5 Networks, Inc. DMA descriptor queue read and cache write pointer arrangement
US7822885B2 (en) * 2007-10-16 2010-10-26 Applied Micro Circuits Corporation Channel-less multithreaded DMA controller
CN104580011B (en) * 2013-10-23 2017-12-15 新华三技术有限公司 A kind of data forwarding device and method


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Implementation of AAL2 adaptation and DMA channel driver software on the MPC8280; Shao Kai; Liang Yan; Huang Jun; Foreign Electronic Components (Issue 03); full text *
Design and analysis of a parallel hybrid memory architecture for per-flow buffering in high-speed switches and routers; Ling Zheng et al.; Journal of Communications and Networks; full text *
Time-slice-based construction of network logical topology in P2P VoD; Han Zongfen; Zhou Yi; Liao Xiaofei; Cheng Bin; Journal of Chinese Computer Systems (Issue 01); full text *
SgInt: Safeguarding Interrupts for Hardware-Based I/O Virtualization for Mixed-Criticality Embedded Real-Time Systems Using Non Transparent Bridges; Muench, Daniel et al.; Architecture of Computing Systems - ARCS 2015; full text *
Research on TT traffic switching and synchronization technology in deterministic Ethernet; Zhang Hongbin; Master's Thesis Electronic Journals; full text *


Similar Documents

Publication Publication Date Title
CN113986791B (en) Method, system, equipment and terminal for designing intelligent network card fast DMA
JP6676027B2 (en) Multi-core interconnection in network processors
US6954806B2 (en) Data transfer apparatus and method
US20050235072A1 (en) Data storage controller
US9390036B2 (en) Processing data packets from a receive queue in a remote direct memory access device
EP2630579B1 (en) Unified i/o adapter
CN112199309B (en) Data reading method and device based on DMA engine and data transmission system
US7552232B2 (en) Speculative method and system for rapid data communications
KR100449806B1 (en) A network-storage apparatus for high-speed streaming data transmission through network
US6108694A (en) Memory disk sharing method and its implementing apparatus
US9665519B2 (en) Using a credits available value in determining whether to issue a PPI allocation request to a packet engine
CN116471242A (en) RDMA-based transmitting end, RDMA-based receiving end, data transmission system and data transmission method
CN115994115A (en) Chip control method, chip set and electronic equipment
CN115904259A (en) Processing method and related device for NVMe (non-volatile memory) standard instruction
US7480739B1 (en) Segregated caching of linked lists for USB
US10664420B2 (en) System and method for port-to-port communications using direct memory access
CN115529275B (en) Message processing system and method
JPH11149455A (en) Memory disk sharing method and its executing device
CN102637153B (en) Coupling device for e.g. data processing arrangement, has cache memory receiving and storing data received in serial data format via serial interface, and transmitting data stored in cache memory to parallel interface
US8054857B2 (en) Task queuing methods and systems for transmitting frame information over an I/O interface
Mahabaleshwarkar et al. TCP/IP protocol accelaration
CN118057793A (en) Data reading and writing method and related device
CN116208731A (en) PCIe cascade network port high-speed transmission method and system based on Zynq architecture
CN117743252A (en) Communication method between heterogeneous Soc subsystems based on Mailbox
CN117931391A (en) Lossless and efficient data processing method based on RMDA and network interface card

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant