CN113986791A - Intelligent network card rapid DMA design method, system, equipment and terminal - Google Patents


Info

Publication number
CN113986791A
Authority
CN
China
Prior art keywords
descriptor
data
queue
toe
engine
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111071199.3A
Other languages
Chinese (zh)
Other versions
CN113986791B (en)
Inventor
潘伟涛
王浩
邱智亮
殷建飞
熊子豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202111071199.3A
Publication of CN113986791A
Application granted
Publication of CN113986791B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/20 Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/28 Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/546 Message passing systems or structures, e.g. queues
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/54 Indexing scheme relating to G06F9/54
    • G06F2209/548 Queue
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2213/00 Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F2213/0026 PCI express
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention belongs to the technical field of intelligent network cards and discloses a method, system, device, and terminal for designing a fast DMA for an intelligent network card. The fast DMA comprises an H2C DoorBell register, H2C flow-control scheduling, an H2C descriptor engine, an H2C CMPT engine, a C2H DoorBell register, C2H flow-control scheduling, a C2H descriptor cache, a C2H descriptor engine, a C2H CMPT engine, and an interrupt processing module. The adaptation scenarios and processing flows of the multiple descriptor types make the network card applicable to a richer set of scenarios, effectively solve the inefficiency of conventional QDMA on short packets, and provide an effective and feasible scheme for a large number of devices. The implementation is more customized and provides better service for the TOE; an effective association is established between Sessions and queues, providing good fast-DMA service for Sessions between intelligent network cards.

Description

Intelligent network card rapid DMA design method, system, equipment and terminal
Technical Field
The invention belongs to the technical field of intelligent network cards, and particularly relates to a method, system, device, and terminal for designing a fast DMA (direct memory access) for an intelligent network card.
Background
At present, with the widespread rise of technologies such as 5G communication, the Internet of Things, cloud computing, and big data, data traffic is growing explosively. Although CPU performance also continues to improve, the CPU remains a bottleneck for big-data processing, and TOE (TCP Offload Engine) has become a research hotspot as one of the most effective solutions. However, the main application scenarios of TOE are currently limited to servers processing large data blocks, such as storage backup-and-retrieval systems and enterprise databases, all of which use large data payloads. These specific scenarios mean that the intelligent network card leans toward a customized network card, which in turn means that a common DMA is not necessarily suitable for it.
QDMA, as a higher-performance, multi-queue DMA better suited to bulk data transfer, is undoubtedly the best fit for the intelligent network card. However, to remain compatible with traditional network cards, the intelligent network card supports both TOE and Bypass modes, each with its own descriptor, ensuring full compatibility with common network-card scenarios; and to improve DMA performance on short packets, a descriptor dedicated to protocol short frames is added under both modes on the sending side. This mixed descriptor set requires the intelligent network card's fast DMA to identify and parse different descriptors and execute the corresponding operations. In addition, because TOE uses a Session interface for interaction between the transport layer and the host, the fast DMA must also establish an association between queues and Sessions. A common QDMA supports only a single descriptor with a fixed structure and is therefore not fully suitable for an intelligent network card.
QDMA is a high-performance, multi-queue DMA. A number of queue pairs are created in memory, each consisting of an SQ (Submission Queue) and a CQ (Completion Queue); both have a certain depth, and their number and depth can be configured dynamically. For an SQ, the host is the producer (it writes descriptors to the tail of the SQ) and the network card's QDMA engine is the consumer (it fetches descriptor instructions from the head of the SQ for execution). For a CQ, the QDMA engine on the network card is the producer (it writes the execution result of the corresponding SQ descriptor instruction to the tail of the CQ) and the host is the consumer (it reads execution results from the head of the CQ); when the host reaps a CQ entry, it releases the corresponding SQ entries according to the execution result. To ensure that the ring queues are never overwritten or read past, a DoorBell register is maintained in the QDMA engine on the network card to record the head and tail pointers of the SQs and CQs; each SQ or CQ has its own DoorBell register. The host maintains the tail pointer of the SQ and the head pointer of the CQ, while the QDMA engine on the network card maintains the head pointer of the SQ and the tail pointer of the CQ. In this way, descriptor interaction between the host and the network card is realized.
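The SQ/CQ ring-queue discipline described above can be sketched in a few lines. This is an illustrative model, not the patent's hardware: the host produces descriptors at the SQ tail, the card's engine consumes from the SQ head, and per-queue DoorBell pointers keep either side from overwriting or skipping entries. All class and field names here are assumptions for illustration.

```python
class Doorbell:
    """Head/tail pointers for one ring queue of fixed depth."""
    def __init__(self, depth):
        self.depth = depth
        self.head = 0   # consumer position
        self.tail = 0   # producer position

    def free_slots(self):
        return self.depth - (self.tail - self.head)

    def used_slots(self):
        return self.tail - self.head


class RingQueue:
    def __init__(self, depth):
        self.db = Doorbell(depth)
        self.slots = [None] * depth

    def produce(self, entry):
        if self.db.free_slots() == 0:
            return False          # ring full: producer must wait
        self.slots[self.db.tail % self.db.depth] = entry
        self.db.tail += 1         # "ring the doorbell": advance the tail
        return True

    def consume(self):
        if self.db.used_slots() == 0:
            return None           # ring empty: nothing to fetch
        entry = self.slots[self.db.head % self.db.depth]
        self.db.head += 1
        return entry


# Host writes three descriptors to an SQ; the engine drains them and writes
# one completion per descriptor to the CQ for the host to reap.
sq, cq = RingQueue(4), RingQueue(4)
for i in range(3):
    sq.produce({"op": "TOE_DATA", "seq": i})
while (d := sq.consume()) is not None:
    cq.produce({"status": "OK", "seq": d["seq"]})
```

The monotonically increasing head/tail pair (compared rather than wrapped) is one common way to distinguish a full ring from an empty one without wasting a slot.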
Xilinx offers a high-performance QDMA that supports 2K ring queues each for H2C (Host to Card), C2H (Card to Host), and C2H CMPT (Card-to-Host completion). By maintaining DoorBell registers in the QDMA and parsing descriptors, it completes the sending and receiving of data; it supports 2K MSI-X interrupts, up to 4 Physical Functions and 252 Virtual Functions, with at most 8 MSI-X interrupts per Function, and it suits the working scenarios of various common network cards.
However, the Xilinx QDMA has a fixed descriptor parsing format, cannot support the mixed descriptors of a TOE network card, has no concept of a Session, and cannot establish a relationship between queues and Sessions. It provides only a CMPT queue for C2H and no queue for descriptor execution results in the H2C direction. Its limit of at most 8 interrupt vectors per Function is undoubtedly convenient for multi-Function users but unsuitable for a TOE network card, because a unique interrupt vector cannot be provided for each CMPT queue, which increases the design complexity of the network card. A new fast DMA is therefore needed that accommodates TOE mixed descriptors and is more flexible and configurable.
Through the above analysis, the problems and defects of the prior art are as follows:
(1) At present, the main application scenarios of TOE are limited to servers processing large data blocks, and these specific scenarios mean the intelligent network card leans toward a customized network card, i.e., a common DMA is not necessarily suitable for it.
(2) A common QDMA supports only a single descriptor with a fixed structure and is not fully suitable for an intelligent network card.
(3) The Xilinx QDMA has a fixed descriptor parsing format, cannot support the mixed descriptors of a TOE network card, has no concept of a Session, and cannot establish a relationship between queues and Sessions.
(4) The Xilinx QDMA has only a CMPT queue for C2H and no queue for descriptor execution results in the H2C direction.
(5) The Xilinx QDMA supports at most 8 interrupt vectors per Function, which is unsuitable for TOE cards because a unique interrupt vector cannot be provided for each CMPT queue, increasing the design complexity of the card.
The difficulty in solving the above problems and defects is as follows: the fast DMA must be able to identify and process different descriptors, so that each descriptor type plays to its strengths in its adapted scenario; and Sessions must be mapped to queues, with the correspondence established through the cooperation of software and hardware.
The significance of solving the above problems and defects is as follows: the fast DMA provides multi-scenario support for the intelligent network card; it provides good fast-DMA service for Sessions between intelligent network cards; removing the multi-Function scenario better matches the application of the intelligent network card; and binding interrupt vectors to queues reduces design complexity.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a method, system, device, and terminal for designing a fast DMA for an intelligent network card, and in particular a design based on mixed descriptors.
The invention is realized as follows. A method for designing a fast DMA for an intelligent network card comprises the following steps:
Step one: maintain the DoorBell registers of H2C through the H2C DoorBell register module, providing the relevant information for H2C flow control and scheduling; schedule the H2C queues by weight through H2C flow-control scheduling, and throttle descriptors according to on-board Credit information.
Step two: fetch descriptors from the HOST side through the H2C descriptor engine according to the scheduling result and store them by class; the descriptor engine parses the descriptors and moves data from the HOST side to TOE_RX.
Step three: assemble the corresponding completion message with the H2C CMPT engine according to the descriptor type and the state fed back from the TOE_RX side, and write it back to the CMPT queue on the HOST side.
Step four: schedule the C2H queues according to weight and the progress of the descriptor engine through C2H flow-control scheduling, and throttle descriptors according to on-board Credit information.
Step five: maintain the DoorBell registers of C2H through the C2H DoorBell register module, providing the relevant information for C2H flow control and scheduling.
Step six: cache the prefetched descriptors by queue number in the C2H descriptor cache, and provide the corresponding state information for lower-level processing.
Step seven: fetch and parse descriptors from the queues through the C2H descriptor engine according to the queue state and the data situation of the local TOE, and move data from the TOE_TX side to the HOST side.
Step eight: assemble a completion message with the C2H CMPT engine according to the usage of the local descriptors, and send it to the corresponding queue on the HOST side.
Step nine: when a specific CQ of C2H or H2C completes, or an abnormal state occurs, assemble a specific MSI-X message through the interrupt processing module and upload it to the HOST.
The roles of the steps are as follows. Step one provides an effective flow-control scheduling mechanism for H2C-side data. Step two, the normal acquisition and classification of descriptors, is the basis for supporting mixed descriptors. Step three completes the write-back of the CQ, feeding the cache and command-execution status back to the HOST. Step five, like step one, provides an effective flow-control scheduling mechanism for the C2H side; the two run in parallel, one per direction. Step six prefetches descriptors, reducing the latency of DMA data transfer. Step seven completes the C2H-side data movement according to the descriptors. Step eight completes the CQ write-back and feeds the command-execution status back to the HOST.
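Steps one and four both schedule queues by weight while throttling on on-board Credit. A minimal sketch of one way such weight-plus-credit scheduling could work; the round-robin-with-burst policy and all names are assumptions, since the patent does not spell out the arbitration algorithm.

```python
def schedule(queues, credits, weights):
    """Return the order in which pending queues are granted descriptor
    fetches. `queues` maps queue id -> pending descriptor count; a queue
    is eligible only while it still has on-board credit."""
    grants = []
    remaining = dict(queues)
    while True:
        progressed = False
        for qid in sorted(remaining):
            # each round, a queue gets up to `weight` grants, credit permitting
            burst = min(weights.get(qid, 1), remaining[qid], credits.get(qid, 0))
            for _ in range(burst):
                grants.append(qid)
                remaining[qid] -= 1
                credits[qid] -= 1
            if burst:
                progressed = True
        if not progressed:
            break   # every queue is drained or credit-starved
    return grants

# Queue 0 has weight 2 and ample credit; queue 1 has weight 1 but only
# one credit, so it is throttled after a single grant.
order = schedule({0: 3, 1: 2}, {0: 10, 1: 1}, {0: 2, 1: 1})
```

Credit-based throttling here models the on-board buffer space: once a queue's credit reaches zero, it stops receiving grants even though descriptors remain pending.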
Further, the data processing flow on the H2C side includes:
(1) the HOST updates descriptor information to the hardware board by configuring the DoorBell on the H2C side;
(2) flow-control scheduling selects the descriptors of the corresponding queue to fetch;
(3) using the relevant information given by the DoorBell register, and after processing for length, 4K boundaries, and the like, a read request is sent to the AXI Slave through the DMA engine;
(4) after the descriptors are obtained, they are first classified and identified, and each is passed to a different descriptor engine according to the OPCODE in the descriptor;
(5) if it is a TOE data descriptor, a read-data request is sent to the AXI Slave through the DMA engine according to the address and length in the descriptor, after processing for length, 4K boundaries, and the like; the returned data is passed to the TOE, and the LAST bit of the descriptor is checked;
(6) if the LAST bit is 0, descriptors of the same Session continue to be fetched until LAST is 1, repeating step (5) in the process;
(7) if the LAST bit is 1, the TOE's feedback state, the corresponding cache space on the TOE, and the execution result are awaited, and the completion message is assembled from this information;
(8) after processing for length, 4K boundaries, and the like, a write-data request for the assembled completion message is sent to the AXI Slave through the DMA engine, the completion message is written into the corresponding CMPT queue, and the DoorBell of the CQ is updated;
(9) an MSI-X interrupt with the corresponding vector number is assembled and uploaded;
(10) if it is a TOE command descriptor, the descriptor itself carries the data to be transmitted, so the payload is sent directly to the TOE; the TOE's feedback state, the cache space of the corresponding command on the TOE, and the execution result are awaited, and the message is assembled from this information;
(11) steps (8) and (9) are executed likewise to complete the execution of the current descriptor;
(12) if it is a Bypass data descriptor, the data is independent and merely passes through the TOE, so a read-data request is sent to the AXI Slave through the DMA engine according to the address and length in the descriptor, after processing for length, 4K boundaries, and the like, and the returned data is passed to the TOE;
(13) the DMA assembles the completion message itself, including the execution result of the descriptor;
(14) steps (8) and (9) are executed likewise to complete the execution of the descriptor;
(15) if it is a Bypass command descriptor, the command that Bypass wants to give to the TOE is carried in the descriptor, so the payload is sent directly to the TOE, and the message is assembled by the TOE;
(16) steps (8) and (9) are executed likewise to complete the execution of the current descriptor.
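The H2C steps above repeatedly mention "processing for length, 4K boundaries, and the like" before a request is issued to the AXI Slave. A minimal sketch of that split, assuming PCIe-style rules: no single request may cross a 4 KiB boundary, and each request is capped at a maximum payload size. The cap value is an illustrative assumption.

```python
PAGE = 4096  # PCIe memory requests must not cross a 4 KiB boundary

def split_request(addr, length, max_payload=256):
    """Split (addr, length) into sub-requests that never cross a 4 KiB
    boundary and never exceed max_payload bytes each."""
    out = []
    while length > 0:
        to_boundary = PAGE - (addr % PAGE)        # bytes left in this 4K page
        chunk = min(length, to_boundary, max_payload)
        out.append((addr, chunk))
        addr += chunk
        length -= chunk
    return out

# A 64-byte transfer starting 16 bytes before a page boundary is split in
# two: the tail of the first page, then the head of the next.
parts = split_request(0x0FF0, 64)
```

Each descriptor's (address, length) pair would be run through such a splitter before the DMA engine emits read or write requests on the bus.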
Further, the data processing flow on the C2H side includes:
(1) the HOST updates descriptor information to the hardware board by configuring the DoorBell on the C2H side;
(2) flow-control scheduling selects the descriptors of the corresponding queue to prefetch;
(3) using the relevant information given by the DoorBell register, and after processing for length, 4K boundaries, and the like, a read request is sent to the AXI Slave through the DMA engine;
(4) the received descriptor information is stored into the corresponding cache by queue, and the cache information is updated in real time;
(5) if it is a TOE data descriptor, the descriptors of each queue are extracted and pre-parsed;
(6) when the TOE has data to send to the DMA, the data is moved according to the corresponding pre-parsed descriptor; if no corresponding descriptor exists, one is actively requested;
(7) since several payloads may share one descriptor, or one payload may span several descriptors: if one payload occupies several descriptors, the space usage of the last descriptor is recorded; if several payloads share one descriptor, the descriptor's usage is timed, and once it times out the transfer is ended early, preventing the descriptor from being held too long;
(8) according to the information in the descriptor, and after processing for length, 4K boundaries, and the like, a write-data request is sent to the AXI Slave through the DMA engine, and the data is written to the corresponding address;
(9) according to the descriptor's space usage and information such as timeouts, the completion message is assembled; after processing for length, 4K boundaries, and the like, a write-data request is sent to the AXI Slave through the DMA engine, the completion message is written into the corresponding CMPT queue, and the DoorBell of the CQ is updated;
(10) an MSI-X interrupt with the corresponding vector number is assembled and uploaded;
(11) if it is a Bypass data descriptor, the descriptor does not need to be pre-parsed;
(12) when the TOE has Bypass-type data to send to the DMA, the descriptor of the corresponding queue is extracted from the cache region; Bypass-type data has no shared-descriptor or multi-descriptor cases, so data and descriptors are strictly one-to-one;
(13) according to the information in the descriptor, and after processing for length, 4K boundaries, and the like, a write-data request is sent to the AXI Slave through the DMA engine, and the data is written to the corresponding address;
(14) a completion message is assembled according to information such as the descriptor's space usage; after processing for length, 4K boundaries, and the like, a write-data request is sent to the AXI Slave through the DMA engine, the completion message is written into the corresponding CMPT queue, and the DoorBell of the CQ is updated;
(15) step (10) is executed to finish the transfer.
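Step (7) of the C2H flow lets several small payloads share one receive descriptor, with a timer that closes the descriptor early if it sits partially filled for too long. A simplified sketch of that bookkeeping; the timeout value and all names are illustrative assumptions.

```python
class SharedDescriptor:
    def __init__(self, capacity, timeout_cycles):
        self.capacity = capacity    # bytes of host buffer this descriptor maps
        self.used = 0               # bytes consumed so far by shared payloads
        self.age = 0                # cycles since the descriptor became dirty
        self.timeout = timeout_cycles
        self.closed = False

    def try_fill(self, nbytes):
        """Place a payload into the descriptor; returns False if it no
        longer fits (the caller must advance to the next descriptor)."""
        if self.closed or self.used + nbytes > self.capacity:
            return False
        self.used += nbytes
        return True

    def tick(self):
        """Advance the timer; on timeout the descriptor is closed so a
        completion reporting the space actually consumed can be written."""
        if self.used > 0 and not self.closed:
            self.age += 1
            if self.age >= self.timeout:
                self.closed = True
        return self.closed


d = SharedDescriptor(capacity=4096, timeout_cycles=3)
d.try_fill(1000)    # first payload shares the descriptor
d.try_fill(1000)    # second payload still fits
for _ in range(3):
    d.tick()        # no further data arrives: the timeout closes it early
```

The completion message of step (9) would then carry `d.used`, telling the HOST how much of the descriptor's buffer was actually consumed before the early close.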
Another objective of the present invention is to provide a fast DMA for an intelligent network card designed by applying the above design method. The fast DMA supports a hybrid descriptor set: the H2C direction includes four descriptors, namely a TOE data descriptor, a Bypass data descriptor, a TOE command descriptor that directly carries short-packet protocol data, and a Bypass command descriptor; the C2H direction includes two descriptors, namely a TOE data descriptor and a Bypass data descriptor; both the C2H and H2C directions contain their own completion descriptors; the whole system thus comprises eight descriptors. In the H2C direction each Session is bound to a unique queue, embedded in the descriptor command; in the C2H direction each Session is likewise bound to a unique queue, with the CPU configuring the binding relationship, which is embedded in the completion descriptor command.
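The eight-descriptor hybrid set and the Session-to-queue binding described above can be sketched as a tag that a classifier reads from each descriptor's OPCODE field. The numeric opcode values and the binding tables are illustrative assumptions, not the patent's actual encoding.

```python
from enum import Enum

class Opcode(Enum):
    H2C_TOE_DATA = 0      # bulk data handed to the TOE
    H2C_BYPASS_DATA = 1   # data that bypasses TOE processing
    H2C_TOE_CMD = 2       # command descriptor carrying short-packet payload
    H2C_BYPASS_CMD = 3    # command for the Bypass path
    C2H_TOE_DATA = 4
    C2H_BYPASS_DATA = 5
    CMPT_H2C = 6          # completion descriptor, H2C direction
    CMPT_C2H = 7          # completion descriptor, C2H direction

def classify(descriptor):
    """Dispatch key for routing a descriptor to its engine (step (4))."""
    return Opcode(descriptor["opcode"])

# Each Session is bound to one unique queue per direction; the CPU
# configures the C2H binding, which the engines consult on every transfer.
session_to_h2c_queue = {101: 7, 102: 8}
session_to_c2h_queue = {101: 7, 102: 9}
```

With such a mapping in place, a lookup by Session id yields the queue whose DoorBell, descriptors, and completion entries serve that Session.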
Further, the fast DMA of the intelligent network card further comprises:
the H2C Doorbell register is used for maintaining the Doorbell register of the H2C and providing related information for flow control and scheduling of the H2C;
H2C flow control scheduling, which is used for scheduling the queue of H2C according to the weight and controlling the flow of the descriptor according to the Credit information on the board;
the H2C descriptor engine is used for acquiring descriptors from the HOST side according to the scheduling result, storing the descriptors in a classified manner, analyzing the descriptors by the descriptor engine, and transferring data from the HOST side to TOE _ RX;
the H2C CMPT engine is used for assembling a corresponding completion message according to the descriptor type and the state fed back by the TOE _ RX side and writing the completion message back to the CMPT queue of the HOST side;
the C2H Doorbell register is used for maintaining the Doorbell register of the C2H and providing related information for the flow control and scheduling of the C2H;
C2H flow control scheduling, which is used for scheduling the queue of C2H according to the weight and the progress condition of the descriptor engine, and controlling the flow of the descriptor according to the Credit information on the board;
the C2H descriptor buffer alignment is used for buffering prefetched descriptors according to queue numbers and giving corresponding state information for lower-level processing;
the C2H descriptor engine is used for acquiring and analyzing descriptors from the queue according to the queue state and the data condition of the local TOE, and moving the data of the TOE _ TX side to the HOST side;
the C2H CMPT engine is used for assembling a completion message according to the use condition of the local descriptor and sending the completion message to a queue corresponding to the HOST side;
and the interrupt processing module is used for assembling a specific MSIx message and uploading the assembled MSIx message to the HOST when the specific CQ of the C2H and the H2C is completed and an abnormal state occurs.
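The interrupt processing module in the list above assembles an MSI-X message when a specific CQ completes or an abnormal state occurs, and one of the design's stated goals is a unique interrupt vector per queue (in contrast to the 8-per-Function limit criticized earlier). A sketch of that binding; the identity vector numbering and the message fields are assumptions for illustration.

```python
def build_vector_table(num_queues):
    # One unique MSI-X vector per CMPT queue; here, simply vector id == queue id.
    return {qid: qid for qid in range(num_queues)}

def assemble_msix(vector_table, qid, reason):
    """Build the interrupt message for a CQ completion or abnormal state."""
    assert reason in ("cq_complete", "error")
    return {"vector": vector_table[qid], "reason": reason}

table = build_vector_table(2048)          # e.g. 2K queues -> 2K vectors
msg = assemble_msix(table, 5, "cq_complete")
```

Because the vector identifies the queue directly, the host-side handler can index its per-queue state from the vector number without scanning all CQs.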
Another object of the present invention is to provide a fast DMA design system for an intelligent network card using the fast DMA design method for an intelligent network card, the fast DMA design system for an intelligent network card comprising:
the H2C information acquisition module is used for maintaining a Doorbell register of H2C through an H2C Doorbell register and providing related information for flow control and scheduling of H2C;
the H2C flow control scheduling module is used for scheduling the queue of H2C according to the weight through H2C flow control scheduling and controlling the flow of the descriptor according to the Credit information on the board;
an H2C descriptor parsing module, configured to obtain descriptors from the HOST side according to a scheduling result through an H2C descriptor engine, store the descriptors in a classified manner, and the descriptor engine parses the descriptors and transfers data from the HOST side to TOE _ RX;
the H2C register maintenance module is used for assembling a corresponding completion message according to the descriptor type and the state fed back by the TOE _ RX side through the H2C CMPT engine and writing the completion message back to the CMPT queue of the HOST side;
the C2H information obtaining module is configured to maintain a DoorBell register of C2H through a C2H DoorBell register, and provide related information for flow control and scheduling of C2H;
the C2H flow control module is used for scheduling the queue of the C2H according to the weight and the status of the descriptor engine through C2H flow control scheduling, and controlling the flow of the descriptor according to the Credit information on the board;
a C2H descriptor buffer module for buffering prefetched descriptors according to the queue number through the C2H descriptor buffer, and giving corresponding state information for lower-level processing;
the C2H descriptor parsing module is used for acquiring and parsing descriptors from the queue through the C2H descriptor engine according to the queue state and the data condition of the local TOE, and moving the data of the TOE _ TX side to the HOST side;
the message assembly module is used for assembling a finished message according to the use condition of the local descriptor through the C2H CMPT engine and sending the finished message to a queue corresponding to the HOST side;
and the message uploading module is used for assembling a specific MSIx message through the interrupt processing module and uploading the MSIx message to the HOST when the specific CQ of the C2H and the H2C is completed and an abnormal state occurs.
It is a further object of the invention to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of:
the Doorbell register of H2C is maintained through the H2C Doorbell register, and relevant information is provided for flow control and scheduling of H2C; scheduling the queue of H2C according to the weight through H2C flow control scheduling, and controlling the flow of the descriptor according to the Credit information on the board; obtaining descriptors from the HOST side through an H2C descriptor engine according to a scheduling result, storing the descriptors in a classified manner, analyzing the descriptors by the descriptor engine, and transferring data from the HOST side to TOE _ RX; assembling a corresponding completion message by an H2C CMPT engine according to the descriptor type and the state fed back by the TOE _ RX side, and writing the completion message back to a CMPT queue of the HOST side; the Doorbell register of C2H is maintained through a C2H Doorbell register, and relevant information is provided for the flow control and scheduling of C2H; scheduling the queue of the C2H according to the weight and the progress condition of the descriptor engine through C2H flow control scheduling, and controlling the flow of the descriptor according to the Credit information on the board; the prefetched descriptors are cached according to the queue number through the C2H descriptor cache, and corresponding state information is given for lower-level processing; acquiring and analyzing the descriptor from the queue through a C2H descriptor engine according to the queue state and the data condition of the local TOE, and moving the data on the TOE _ TX side to the HOST side; assembling a finished message according to the use condition of the local descriptor by using a C2H CMPT engine, and sending the finished message to a queue corresponding to the HOST side; when the specific CQ of C2H and H2C is completed and an abnormal state occurs, a specific MSIx message is assembled by the interrupt processing module and uploaded to HOST.
It is another object of the present invention to provide a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
the Doorbell register of H2C is maintained through the H2C Doorbell register, and relevant information is provided for flow control and scheduling of H2C; scheduling the queue of H2C according to the weight through H2C flow control scheduling, and controlling the flow of the descriptor according to the Credit information on the board; obtaining descriptors from the HOST side through an H2C descriptor engine according to a scheduling result, storing the descriptors in a classified manner, analyzing the descriptors by the descriptor engine, and transferring data from the HOST side to TOE _ RX; assembling a corresponding completion message by an H2C CMPT engine according to the descriptor type and the state fed back by the TOE _ RX side, and writing the completion message back to a CMPT queue of the HOST side; the Doorbell register of C2H is maintained through a C2H Doorbell register, and relevant information is provided for the flow control and scheduling of C2H; scheduling the queue of the C2H according to the weight and the progress condition of the descriptor engine through C2H flow control scheduling, and controlling the flow of the descriptor according to the Credit information on the board; the prefetched descriptors are cached according to the queue number through the C2H descriptor cache, and corresponding state information is given for lower-level processing; acquiring and analyzing the descriptor from the queue through a C2H descriptor engine according to the queue state and the data condition of the local TOE, and moving the data on the TOE _ TX side to the HOST side; assembling a finished message according to the use condition of the local descriptor by using a C2H CMPT engine, and sending the finished message to a queue corresponding to the HOST side; when the specific CQ of C2H and H2C is completed and an abnormal state occurs, a specific MSIx message is assembled by the interrupt processing module and uploaded to HOST.
Another object of the present invention is to provide an application of the intelligent network card fast DMA in a server.
Another objective of the present invention is to provide an information data processing terminal, where the information data processing terminal is used to implement the fast DMA design system for the intelligent network card.
Combining all the above technical schemes, the invention has the following advantages and positive effects: the invention provides a new QDMA engine that supports multiple descriptor types and establishes an explicit mapping relationship between Session and queue; a CMPT queue is added for H2C, providing the host with the execution results of descriptor instructions and the internal cache state of the TOE network card; the Function restriction is removed and a unique interrupt vector is provided for each queue.
The invention provides an implementation method of fast DMA supporting mixed descriptors for the intelligent network card, together with the adaptation scenarios and processing flows of the various descriptors; the adaptation scenarios of the network card are richer, problems such as the low short-packet efficiency of traditional QDMA are effectively solved, and an effective and feasible scheme is provided for a large number of devices.
The implementation of the invention is more customized; by accepting a modest increase in complexity it adapts to more scenarios, the use of mixed descriptors remains fully compatible with a common network card, and better support is provided for the TOE service; an effective association is established between Session and queue, providing good fast DMA service for Sessions between intelligent network cards.
The invention adds CMPT queues on both sides, effectively enabling the HOST to obtain information such as the lower-layer cache and completion state, providing an information basis for effectively realizing a flow control mechanism and making data processing more reliable. The invention reduces the pressure of software and hardware interrupt design, and the binding of interrupts to queues provides convenience for the interrupt design.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments of the present invention will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart of a method for designing a fast DMA of an intelligent network card according to an embodiment of the present invention.
FIG. 2 is a TOE QDMA framework diagram provided by the embodiment of the present invention.
Fig. 3 is a flow chart of data processing on the H2C side according to an embodiment of the present invention.
Fig. 4 is a data processing flowchart of the C2H side according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Aiming at the problems in the prior art, the invention provides a method, a system, equipment and a terminal for designing the fast DMA of an intelligent network card, and the invention is described in detail with reference to the attached drawings.
As shown in fig. 1, the method for designing a fast DMA of an intelligent network card according to an embodiment of the present invention includes the following steps:
S101, maintaining the Doorbell register of H2C through the H2C Doorbell register, and providing relevant information for H2C flow control and scheduling; scheduling the H2C queues according to weight through H2C flow control scheduling, and flow-controlling descriptors according to the on-board Credit information;
S102, obtaining descriptors from the HOST side through the H2C descriptor engine according to the scheduling result, storing the descriptors by category, parsing the descriptors by the descriptor engine, and moving data from the HOST side to TOE_RX;
S103, assembling a corresponding completion message by the H2C CMPT engine according to the descriptor type and the state fed back by the TOE_RX side, and writing the completion message back to the CMPT queue on the HOST side;
S104, scheduling the C2H queues according to weight and the progress of the descriptor engine through C2H flow control scheduling, and flow-controlling descriptors according to the on-board Credit information;
S105, maintaining the Doorbell register of C2H through the C2H Doorbell register, and providing relevant information for C2H flow control and scheduling;
S106, buffering the prefetched descriptors by queue number through the C2H descriptor cache, and supplying the corresponding state information for lower-level processing;
S107, obtaining and parsing descriptors from the queue through the C2H descriptor engine according to the queue state and the data condition of the local TOE, and moving data from the TOE_TX side to the HOST side;
S108, assembling a completion message by the C2H CMPT engine according to the use of local descriptors, and sending the completion message to the corresponding queue on the HOST side;
S109, when a specific CQ of C2H or H2C completes or an abnormal state occurs, assembling a specific MSIx message through the interrupt processing module and uploading it to the HOST.
The intelligent network card fast DMA provided by the embodiment of the invention supports mixed descriptors: the H2C direction contains four descriptors, namely a TOE data descriptor, a Bypass data descriptor, a TOE command descriptor that directly carries short-packet protocol data, and a Bypass command descriptor; the C2H direction contains two descriptors, namely a TOE data descriptor and a Bypass data descriptor; the C2H and H2C directions each contain their own completion descriptor; the whole system thus contains eight descriptors. Each Session is bound to a unique queue in the H2C direction, embedded in the descriptor command; each Session is likewise bound to a unique queue in the C2H direction, and the CPU configures this binding relationship, which is embedded in the completion descriptor command.
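The eight-descriptor taxonomy and the one-to-one Session-to-queue binding described above can be summarized in a short sketch (Python is used only for illustration; the type encodings and class names are hypothetical, not taken from the patent):

```python
from enum import Enum

class H2CDescType(Enum):
    """Four descriptor types in the H2C (host-to-card) direction."""
    TOE_DATA = 0      # TOE data descriptor
    BYPASS_DATA = 1   # Bypass data descriptor
    TOE_CMD = 2       # TOE command descriptor (short-packet payload carried inline)
    BYPASS_CMD = 3    # Bypass command descriptor

class C2HDescType(Enum):
    """Two descriptor types in the C2H (card-to-host) direction."""
    TOE_DATA = 0
    BYPASS_DATA = 1

class SessionQueueMap:
    """One-to-one binding between a TCP Session and a queue per direction.
    The CPU configures the binding; each Session maps to a unique queue."""
    def __init__(self):
        self._h2c = {}  # session_id -> H2C queue number
        self._c2h = {}  # session_id -> C2H queue number

    def bind(self, session_id, h2c_qid, c2h_qid):
        # Enforce uniqueness: a queue serves exactly one Session.
        if h2c_qid in self._h2c.values() or c2h_qid in self._c2h.values():
            raise ValueError("queue already bound to another Session")
        self._h2c[session_id] = h2c_qid
        self._c2h[session_id] = c2h_qid

    def h2c_queue(self, session_id):
        return self._h2c[session_id]

    def c2h_queue(self, session_id):
        return self._c2h[session_id]
```

With the two completion descriptors (one per direction) this accounts for the eight descriptor kinds the system distinguishes.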
As shown in fig. 2, the fast DMA of the intelligent network card provided by the embodiment of the present invention includes:
the H2C Doorbell register is used for maintaining the Doorbell register of the H2C and providing related information for flow control and scheduling of the H2C;
H2C flow control scheduling, which is used for scheduling the queue of H2C according to the weight and controlling the flow of the descriptor according to the Credit information on the board;
the H2C descriptor engine is used for acquiring descriptors from the HOST side according to the scheduling result, storing the descriptors by category, parsing the descriptors, and moving data from the HOST side to TOE_RX;
the H2C CMPT engine is used for assembling a corresponding completion message according to the descriptor type and the state fed back by the TOE_RX side and writing the completion message back to the CMPT queue on the HOST side;
the C2H Doorbell register is used for maintaining the Doorbell register of the C2H and providing related information for the flow control and scheduling of the C2H;
C2H flow control scheduling, which is used for scheduling the queue of C2H according to the weight and the progress condition of the descriptor engine, and controlling the flow of the descriptor according to the Credit information on the board;
the C2H descriptor cache queue is used for buffering prefetched descriptors by queue number and giving the corresponding state information for lower-level processing;
the C2H descriptor engine is used for acquiring and parsing descriptors from the queue according to the queue state and the data condition of the local TOE, and moving data from the TOE_TX side to the HOST side;
the C2H CMPT engine is used for assembling a completion message according to the use condition of the local descriptor and sending the completion message to a queue corresponding to the HOST side;
and the interrupt processing module is used for assembling a specific MSIx message and uploading the assembled MSIx message to the HOST when the specific CQ of the C2H and the H2C is completed and an abnormal state occurs.
The quick DMA design system of the intelligent network card provided by the embodiment of the invention comprises:
the H2C information acquisition module is used for maintaining a Doorbell register of H2C through an H2C Doorbell register and providing related information for flow control and scheduling of H2C;
the H2C flow control scheduling module is used for scheduling the queue of H2C according to the weight through H2C flow control scheduling and controlling the flow of the descriptor according to the Credit information on the board;
an H2C descriptor parsing module, configured to obtain descriptors from the HOST side through the H2C descriptor engine according to the scheduling result, store the descriptors by category, parse the descriptors, and move data from the HOST side to TOE_RX;
an H2C completion message module, configured to assemble a corresponding completion message through the H2C CMPT engine according to the descriptor type and the state fed back by the TOE_RX side, and write the completion message back to the CMPT queue on the HOST side;
the C2H information obtaining module is configured to maintain a DoorBell register of C2H through a C2H DoorBell register, and provide related information for flow control and scheduling of C2H;
the C2H flow control module is used for scheduling the queue of the C2H according to the weight and the status of the descriptor engine through C2H flow control scheduling, and controlling the flow of the descriptor according to the Credit information on the board;
a C2H descriptor buffer module for buffering prefetched descriptors according to the queue number through the C2H descriptor buffer, and giving corresponding state information for lower-level processing;
the C2H descriptor parsing module is used for acquiring and parsing descriptors from the queue through the C2H descriptor engine according to the queue state and the data condition of the local TOE, and moving data from the TOE_TX side to the HOST side;
the message assembly module is used for assembling a finished message according to the use condition of the local descriptor through the C2H CMPT engine and sending the finished message to a queue corresponding to the HOST side;
and the message uploading module is used for assembling a specific MSIx message through the interrupt processing module and uploading the MSIx message to the HOST when the specific CQ of the C2H and the H2C is completed and an abnormal state occurs.
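As a minimal illustration of how the Doorbell modules on both sides track outstanding descriptors, the ring-index arithmetic can be sketched as follows (a hypothetical sketch: the PIDX/CIDX naming and wrap convention are assumptions, not taken from the patent text):

```python
class DoorbellRegister:
    """Doorbell register for one queue: the HOST writes the new producer
    index (PIDX); the distance to the consumer index (CIDX) is the number
    of descriptors available for the engine to fetch. Ring-buffer math."""
    def __init__(self, ring_size):
        self.ring_size = ring_size
        self.pidx = 0  # written by the HOST via the doorbell
        self.cidx = 0  # advanced by the descriptor engine

    def ring(self, new_pidx):
        """HOST side: publish new descriptors by moving the producer index."""
        self.pidx = new_pidx % self.ring_size

    def pending(self):
        """Descriptors posted but not yet fetched (modulo wrap-around)."""
        return (self.pidx - self.cidx) % self.ring_size

    def consume(self, n):
        """Engine side: acknowledge n fetched descriptors."""
        assert n <= self.pending()
        self.cidx = (self.cidx + n) % self.ring_size
```

The `pending()` count is exactly the "relevant information" the Doorbell modules expose to the flow control and scheduling stages.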
The technical solution of the present invention will be further described with reference to the following explanation of terms.
TCP (Transmission Control Protocol): a connection-oriented, reliable, byte-stream-based transport-layer communication protocol, traditionally implemented in software.
TOE (TCP Offload Engine): a TCP hardware acceleration technology that achieves acceleration by implementing the TCP protocol in hardware.
DMA (Direct Memory Access): an interface technology by which an external device exchanges data directly with system memory without the CPU; the whole transfer is controlled without CPU involvement, greatly improving CPU efficiency.
QDMA (Quick Direct Memory Access): quick direct memory access; its main difference from other DMAs is the concept of queues.
Session: a connection-oriented session established by the TCP process.
The technical solution of the present invention is further described below with reference to specific examples.
The QDMA of XILINX has a fixed descriptor parsing format, cannot support the mixed descriptors of a TOE network card, has no concept of Session, and cannot establish a relationship between queue and Session; to solve this problem, the invention provides a new QDMA engine that supports multiple descriptor types and establishes an explicit mapping relationship between Session and queue.
The QDMA of XILINX only has a CMPT queue for C2H and no queue that provides descriptor execution results in the H2C direction; in order to feed back the state of the TOE network card and make it easy for software to provide flow control and similar mechanisms for the network card, the invention adds a CMPT queue for H2C, providing the host with descriptor instruction execution results and the internal cache state of the TOE network card.
In the QDMA of XILINX, interrupts are bound to Functions; this is not suitable for the TOE network card of the invention, because a unique interrupt vector cannot be provided for each CMPT queue and the design complexity of the network card is increased. The invention therefore removes the Function restriction and provides a unique interrupt vector for each queue.
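The per-queue interrupt binding can be sketched as a small vector allocator (an illustrative sketch only; the patent does not specify the allocation scheme, and the names are hypothetical):

```python
class MsixVectorTable:
    """Assigns a unique MSI-X interrupt vector to each queue, with no
    Function-level restriction: any queue may claim any free vector."""
    def __init__(self, n_vectors):
        self.free = list(range(n_vectors))  # unassigned vector numbers
        self.q2v = {}                       # queue id -> vector number

    def bind(self, qid):
        """Bind a queue to a vector; idempotent for an already-bound queue."""
        if qid in self.q2v:
            return self.q2v[qid]
        if not self.free:
            raise RuntimeError("no MSI-X vectors left")
        v = self.free.pop(0)
        self.q2v[qid] = v
        return v

    def vector_for(self, qid):
        """Vector to encode into the MSIx message for this queue's CQ events."""
        return self.q2v[qid]
```

On a CQ completion or abnormal state, the interrupt processing module would look up `vector_for(qid)` and assemble the MSIx message with that vector number.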
The overall structure of the solution is explained below.
A fast DMA of an intelligent network card supporting mixed descriptors contains four descriptors in the H2C direction, namely a TOE data descriptor, a Bypass data descriptor, a TOE command descriptor that directly carries short-packet protocol data, and a Bypass command descriptor; the C2H direction contains two descriptors, namely a TOE data descriptor and a Bypass data descriptor; the C2H and H2C directions each contain their own completion descriptor; the whole system thus contains eight descriptors. Each Session is bound to a unique queue in the H2C direction, embedded in the descriptor command; each Session is likewise bound to a unique queue in the C2H direction, and the CPU configures this binding relationship, which is embedded in the completion descriptor command.
The TOE QDMA framework is shown in fig. 2.
(1) H2C DoorBell register: and maintaining a Doorbell register of H2C, and providing relevant information for flow control and scheduling of H2C.
(2) H2C flow control scheduling: the queues of H2C are scheduled according to weight, and descriptors are flow-controlled according to the Credit information on the board.
(3) H2C descriptor engine: obtains descriptors from the HOST side according to the scheduling result, stores the descriptors by category, parses the descriptors, and carries data from the HOST side to TOE_RX.
(4) H2C CMPT engine: assembles a corresponding completion message according to the descriptor type and the state fed back by the TOE_RX side, and writes the completion message back to the CMPT queue on the HOST side.
(5) C2H DoorBell register: and maintaining a Doorbell register of the C2H to provide relevant information for flow control and scheduling of the C2H.
(6) C2H flow control scheduling: the queue of C2H is scheduled according to the weight and the progress condition of the descriptor engine, and the flow control is carried out on the descriptor according to the Credit information on the board.
(7) C2H descriptor cache queue: caches the prefetched descriptors by queue number and gives the corresponding state information for the next-level processing.
(8) C2H descriptor engine: acquires and parses descriptors from the queue according to the queue state and the data condition of the local TOE, and moves data from the TOE_TX side to the HOST side.
(9) C2H CMPT engine: and assembling a finished message according to the use condition of the local descriptor, and sending the finished message to a queue corresponding to the HOST side.
(10) An interrupt processing module: when specific CQ of C2H and H2C is completed and an abnormal state occurs, a specific MSIx message is assembled and uploaded to HOST.
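The weight-based scheduling with on-board Credit flow control performed by modules (2) and (6) can be sketched as follows (a simplified, hypothetical policy — the patent does not fix the exact arbitration algorithm, so the grant rule here is only an assumption for illustration):

```python
def schedule(queues, credit):
    """Pick descriptor-fetch grants: serve queues with pending doorbell
    entries in descending weight order, never exceeding the on-board
    Credit (free descriptor slots on the board).

    queues: list of dicts {'qid', 'weight', 'pending'}, where 'pending'
    is the number of doorbell entries not yet fetched.
    Returns a list of (qid, count) grants.
    """
    grants = []
    remaining = credit
    for q in sorted(queues, key=lambda q: -q['weight']):
        if remaining == 0:
            break  # on-board Credit exhausted: flow-control the descriptors
        if q['pending'] == 0:
            continue
        # Grant up to the queue's weight, its pending count, and credit left.
        n = min(q['weight'], q['pending'], remaining)
        grants.append((q['qid'], n))
        remaining -= n
    return grants
```

A real engine would also rotate among equal-weight queues and, on the C2H side, factor in the descriptor engine's progress, which this sketch omits.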
Introduction of data processing flow on the H2C side:
the data flow diagram is shown in fig. 3.
(1) HOST updates the descriptor information to the hardware board card by configuring Doorbell at the H2C side.
(2) And the flow control scheduling selects the descriptors of the corresponding queues for obtaining.
(3) Using the relevant information given by the Doorbell register, a read request is sent to the AXI Slave through the DMA engine after processing such as length and 4K-boundary splitting.
(4) After the descriptors are obtained, they are first classified and identified, and each is passed into a different descriptor engine according to the OPCODE in the descriptor.
(5) If it is a TOE data descriptor, a read-data request is sent to the AXI Slave through the DMA engine according to the address and length in the descriptor, after processing such as length and 4K-boundary splitting; the returned data is passed to the TOE, and whether the LAST bit of the descriptor is 1 is judged;
(6) if the LAST bit is 0, continuing to acquire the descriptor of the same Session until the LAST is 1, and repeatedly executing the process (5);
(7) If the LAST bit is 1, the state fed back by the TOE is awaited, mainly the corresponding cache space on the TOE and the execution result, and the completion message is assembled according to this information.
(8) After processing such as length and 4K-boundary splitting, the assembled completion message is sent as a write-data request to the AXI Slave through the DMA engine, the completion message is written into the corresponding CMPT queue, and the Doorbell of the CQ is updated at the same time.
(9) An MSIx interrupt with the corresponding vector number is assembled and uploaded.
(10) If it is a TOE command descriptor, the descriptor itself carries the data to be transmitted, so the payload is sent directly to the TOE; the state fed back by the TOE is awaited, mainly the cache space and execution result of the corresponding command on the TOE, and the completion message is assembled according to this information.
(11) And similarly executing the processes (8) and (9) to finish the execution of the descriptor.
(12) If it is a Bypass data descriptor, the executed data is independent and is transmitted through the TOE, so a read-data request is sent to the AXI Slave through the DMA engine according to the address and length in the descriptor, after processing such as length and 4K-boundary splitting, and the returned data is passed to the TOE.
(13) The DMA itself assembles a completion message, mainly containing the execution result of the descriptor.
(14) And similarly executing the processes (8) and (9) to finish the execution of the descriptor.
(15) If it is a Bypass command descriptor, the descriptor carries a command to be bypassed to the TOE, so the payload is sent directly to the TOE, and the completion message is assembled by the TOE.
(16) And similarly executing the processes (8) and (9) to finish the execution of the descriptor.
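The "length, 4K boundary and the like" processing that precedes every AXI read/write request in the flow above can be illustrated by a boundary-splitting helper (a generic sketch of the standard technique, not the patent's exact hardware logic):

```python
def split_4k(addr, length, page=4096):
    """Split a DMA transfer so that no single request crosses a 4 KiB
    boundary, as required before issuing read/write requests to the
    AXI Slave. Returns a list of (address, byte_count) sub-requests."""
    chunks = []
    while length > 0:
        # Bytes remaining in the current 4K page starting at addr.
        n = min(length, page - (addr % page))
        chunks.append((addr, n))
        addr += n
        length -= n
    return chunks
```

For example, a 0x30-byte request at 0x0FF0 straddles the 0x1000 boundary and is split into two sub-requests, one of 0x10 bytes and one of 0x20 bytes.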
Introduction of data processing flow on the C2H side:
the data flow diagram is shown in fig. 4.
(1) HOST updates the descriptor information to the hardware board card by configuring Doorbell at the C2H side.
(2) And the flow control scheduling selects the descriptors of the corresponding queue to be prefetched.
(3) Using the relevant information given by the Doorbell register, a read request is sent to the AXI Slave through the DMA engine after processing such as length and 4K-boundary splitting.
(4) The received descriptor information is stored into the corresponding cache by queue, and the cache information is updated in real time.
(5) If it is a TOE data descriptor, the descriptors of each queue are extracted and pre-parsed.
(6) When the TOE has data to send to the DMA, the data is moved according to the corresponding pre-parsed descriptor; if no corresponding descriptor exists, one is actively applied for.
(7) Several data blocks may share one descriptor, or one data block may occupy several descriptors: if one data block occupies several descriptors, the space usage of the last descriptor is recorded; if several data blocks share one descriptor, the use of the descriptor is timed, and once it times out the transfer is ended early, so that the descriptor is not held too long.
(8) And sending a write data request to the AXI Slave through the DMA engine according to the information in the descriptor through processing such as length, 4K boundary and the like, and writing the data to a corresponding address.
(9) According to information such as the descriptor-space usage and whether a timeout occurred, the assembled completion message is sent as a write-data request to the AXI Slave through the DMA engine after processing such as length and 4K-boundary splitting; the completion message is written into the corresponding CMPT queue, and the Doorbell of the CQ is updated at the same time.
(10) MSIx interrupt upload assembling the corresponding vector number.
(11) If it is a Bypass data descriptor, the descriptor does not need to be parsed.
(12) When the TOE has Bypass-type data to send to the DMA, the descriptor of the corresponding queue is extracted from the cache region; Bypass-type data never shares a descriptor and never spans multiple descriptors, so data and descriptors are strictly one-to-one.
(13) A write-data request is sent to the AXI Slave through the DMA engine according to the information in the descriptor, after processing such as length and 4K-boundary splitting, and the data is written to the corresponding address.
(14) A completion message is assembled according to information such as the descriptor-space usage; after processing such as length and 4K-boundary splitting, a write-data request is sent to the AXI Slave through the DMA engine, the completion message is written into the corresponding CMPT queue, and the Doorbell of the CQ is updated at the same time.
(15) The same procedure (10) is executed to complete the transmission.
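The shared-descriptor and timeout handling of step (7) on the C2H side can be sketched as follows (illustrative only: the class and field names, and the exact timeout policy, are assumptions rather than the patent's hardware design):

```python
import time

class C2HDescriptorFill:
    """Tracks the buffer of one C2H TOE data descriptor. Several small
    data blocks may share one descriptor; a data block too large for the
    remaining space spills into the next descriptor; a timeout closes a
    partly used descriptor early so it is not held too long."""
    def __init__(self, buf_len, timeout_s, now=time.monotonic):
        self.buf_len = buf_len      # descriptor buffer size in bytes
        self.used = 0               # bytes already placed (space usage)
        self.timeout_s = timeout_s
        self.now = now              # injectable clock for testing
        self.opened_at = None       # set when the first bytes arrive

    def add(self, n):
        """Place n bytes; return how many bytes did NOT fit (the spill
        that must continue into the next descriptor)."""
        if self.opened_at is None:
            self.opened_at = self.now()
        fit = min(n, self.buf_len - self.used)
        self.used += fit
        return n - fit

    def should_complete(self):
        """Complete when the buffer is full, or when the timeout on a
        partly used buffer has expired (end the transfer early)."""
        if self.used == self.buf_len:
            return True
        return (self.used > 0 and self.opened_at is not None
                and self.now() - self.opened_at >= self.timeout_s)
```

When `should_complete()` fires, the engine would assemble the completion message with the recorded space usage and the timeout flag, as in steps (8)-(9).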
The invention provides an implementation method of fast DMA supporting mixed descriptors for the intelligent network card, together with the adaptation scenarios and processing flows of the various descriptors; the adaptation scenarios of the network card are richer, problems such as the low short-packet efficiency of traditional QDMA are effectively solved, and an effective and feasible scheme is provided for a large number of devices.
The implementation of the invention is more customized; by accepting a modest increase in complexity it adapts to more scenarios, the use of mixed descriptors remains fully compatible with a common network card, and better support is provided for the TOE service; an effective association is established between Session and queue, providing good fast DMA service for Sessions between intelligent network cards.
The invention adds CMPT queues on both sides, effectively enabling the HOST to obtain information such as the lower-layer cache and completion state, providing an information basis for effectively realizing a flow control mechanism and making data processing more reliable. The invention reduces the pressure of software and hardware interrupt design, and the binding of interrupts to queues provides convenience for the interrupt design.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When software is used, the implementation may take the form, wholly or partially, of a computer program product that includes one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the flows or functions according to the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
The above description is only a specific embodiment of the present invention and is not intended to limit the scope of the invention; any modifications, equivalent substitutions, and improvements made within the spirit and principle of the present invention shall fall within the protection scope of the appended claims.

Claims (10)

1. A quick DMA design method of an intelligent network card is characterized by comprising the following steps:
step one, a Doorbell register of H2C is maintained through an H2C Doorbell register, and relevant information is provided for flow control and scheduling of H2C; scheduling the queue of H2C according to the weight through H2C flow control scheduling, and controlling the flow of the descriptor according to the Credit information on the board;
step two, obtaining descriptors from the HOST side through an H2C descriptor engine according to a scheduling result, storing the descriptors in a classified mode, analyzing the descriptors by the descriptor engine, and transferring data from the HOST side to TOE_RX;
step three, assembling a corresponding completion message by an H2C CMPT engine according to the descriptor type and the state fed back by the TOE_RX side, and writing the completion message back to a CMPT queue of the HOST side;
fourthly, scheduling the queue of the C2H according to the weight and the progress condition of the descriptor engine through C2H flow control scheduling, and controlling the flow of the descriptor according to the Credit information on the board;
step five, maintaining a Doorbell register of the C2H through a C2H Doorbell register, and providing related information for flow control and scheduling of the C2H;
step six, caching the prefetched descriptors according to the queue numbers through the C2H descriptor cache, and giving corresponding state information for lower-level processing;
step seven, acquiring and analyzing the descriptor from the queue through a C2H descriptor engine according to the queue state and the data condition of the local TOE, and moving the data on the TOE_TX side to the HOST side;
step eight, assembling a finished message according to the use condition of the local descriptor by using a C2H CMPT engine, and sending the finished message to a queue corresponding to the HOST side;
step nine, when the specific CQ of C2H and H2C is completed and an abnormal state occurs, assembling a specific MSIx message through the interrupt processing module and uploading the message to the HOST.
2. The method for designing the fast DMA of the intelligent network card according to claim 1, wherein the data processing flow at the H2C side includes:
(1) HOST updates descriptor information to a hardware board card by configuring a Doorbell at the H2C side;
(2) flow control scheduling selects descriptors of corresponding queues to obtain;
(3) using the relevant information given by the Doorbell register, sending a read request to the AXI Slave through the DMA engine after processing such as length and 4K-boundary splitting;
(4) after the descriptors are obtained, firstly classifying and identifying the descriptors, and transmitting each into a different descriptor engine according to the OPCODE in the descriptor;
(5) if the descriptor is the TOE data descriptor, sending a read-data request to the AXI Slave through the DMA engine according to the address and the length in the descriptor after processing such as length and 4K-boundary splitting, transmitting the returned data to the TOE, and judging whether the LAST bit of the descriptor is 1;
(6) if the LAST bit is 0, continuing to acquire descriptors of the same Session until LAST is 1, repeatedly executing step (5);
(7) if the LAST bit is 1, waiting for the state fed back by the TOE, mainly the corresponding cache space on the TOE and the execution result, and assembling the completion message according to this information;
(8) after processing such as length and 4K-boundary splitting, sending the assembled completion message as a write-data request to the AXI Slave through the DMA engine, writing the completion message into the corresponding CMPT queue, and updating the Doorbell of the CQ;
(9) assembling and uploading an MSIx interrupt with the corresponding vector number;
(10) if the descriptor is the TOE command descriptor, the descriptor itself carries the data to be transmitted, so the payload is sent directly to the TOE; waiting for the state fed back by the TOE, mainly the cache space and execution result of the corresponding command on the TOE, and assembling the completion message according to this information;
(11) similarly executing step (8) and step (9) to complete the execution of the current descriptor;
(12) if the descriptor is the Bypass data descriptor, the executed data is independent and is transmitted through the TOE, so a read-data request is sent to the AXI Slave through the DMA engine according to the address and the length in the descriptor after processing such as length and 4K-boundary splitting, and the returned data is transmitted to the TOE;
(13) the DMA itself assembles a completion message, including the execution result of the descriptor;
(14) similarly executing step (8) and step (9) to complete the execution of the descriptor;
(15) if the descriptor is the Bypass command descriptor, the descriptor carries a command to be bypassed to the TOE, so the payload is sent directly to the TOE, and the completion message is assembled by the TOE;
(16) similarly, step (8) and step (9) are executed to complete the execution of the current descriptor.
3. The method for designing the fast DMA of the intelligent network card according to claim 1, wherein the data processing flow at the C2H side includes:
(1) HOST updates descriptor information to a hardware board card by configuring Doorbell at the C2H side;
(2) flow control scheduling selects descriptors of corresponding queues for prefetching;
(3) using the relevant information given by the Doorbell register, sending a read request to the AXI Slave through the DMA engine after processing such as length and 4K-boundary splitting;
(4) storing the received descriptor information into the corresponding cache according to the queue, and updating the cache information in real time;
(5) if the descriptor is the TOE data descriptor, extracting and pre-parsing the descriptors of each queue;
(6) when the TOE has data to be sent to the DMA, moving the data according to the corresponding pre-parsed descriptor; if no corresponding descriptor exists, actively applying for one;
(7) several data blocks may share one descriptor, or one data block may occupy several descriptors: if one data block occupies several descriptors, recording the space usage of the last descriptor; if several data blocks share one descriptor, timing the use of the descriptor, and once it times out, ending the transfer early, preventing the descriptor from being held too long;
(8) sending a data writing request to the AXI Slave through the DMA engine according to the information in the descriptor through the processing of length, 4K boundary and the like, and writing the data to a corresponding address;
(9) according to the descriptor space use condition, whether the message is assembled according to the information such as overtime and the like, the assembled completed message is processed by a DMA engine through length, 4K boundary and the like to send a data writing request to AXI Slave, the completed message is written into a corresponding CMPT queue, and simultaneously, the Doorbell of CQ is updated;
(10) MSIx interruption uploading of corresponding vector numbers is assembled;
(11) if the descriptor is the Bypass data descriptor, the descriptor does not need to be parsed;
(12) when the TOE has Bypass type data to send to the DMA, the description degree of a corresponding queue is extracted from the cache region, and the Bypass type data does not have the condition of common or multiple descriptors, so that the data and the descriptors are completely in one-to-one relationship;
(13) sending a data writing request to the AXI Slave through the DMA engine according to the information in the descriptor through the processing of length, 4K boundary and the like, and writing the data to a corresponding address;
(14) assembling a completion message according to information such as the use condition of a descriptor space, sending a write data request to the AXI Slave through the DMA engine through the length, the 4K boundary and the like, writing the completion message into a corresponding CMPT queue, and updating the Doorbell of the CQ;
(15) and (5) executing the step (10) to finish the transmission.
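Step (7) of the C2H flow describes a descriptor that may be shared by several payloads, bounded by a timer so a partially used descriptor is not held open indefinitely. A minimal sketch of that bookkeeping, under the assumption of byte-granular accounting and a tick-based timer (class and field names are illustrative):

```python
class SharedDescriptor:
    """Models one C2H descriptor that several payloads may share."""

    def __init__(self, capacity: int, timeout: int):
        self.capacity = capacity  # bytes of host buffer covered by this descriptor
        self.timeout = timeout    # ticks before a partially used descriptor is forced closed
        self.used = 0             # bytes already consumed
        self.age = 0
        self.closed = False

    def try_fill(self, nbytes: int) -> bool:
        """Consume nbytes of the descriptor's buffer; close it when full."""
        if self.closed or self.used + nbytes > self.capacity:
            return False
        self.used += nbytes
        if self.used == self.capacity:
            self.closed = True
        return True

    def tick(self):
        """Advance the timer; on timeout the transfer is ended early, as in step (7)."""
        self.age += 1
        if self.age >= self.timeout and self.used > 0:
            self.closed = True
```

Once `closed` is set, the engine would assemble the completion message for this descriptor as in step (9).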
4. An intelligent network card fast DMA designed by the intelligent network card fast DMA design method according to any one of claims 1 to 3, characterized in that the fast DMA supports mixed descriptors: the H2C direction comprises four descriptor types, namely a TOE data descriptor, a Bypass data descriptor, a TOE command descriptor directly carrying short-packet protocol data, and a Bypass command descriptor; the C2H direction comprises two descriptor types, namely a TOE data descriptor and a Bypass data descriptor; the C2H and H2C directions each also have their own completion descriptor, for eight descriptor types in total; each Session is bound to a unique queue in the H2C direction, with the binding embedded in the descriptor command, and each Session is likewise bound to a unique queue in the C2H direction, with the binding configured by the CPU and embedded in the completion descriptor command.
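The eight descriptor types and the per-direction Session-to-queue binding of claim 4 can be sketched as follows (the enum member names and the binding helper are illustrative, not from the patent):

```python
from enum import Enum, auto

class DescType(Enum):
    """The eight descriptor kinds named in claim 4."""
    H2C_TOE_DATA = auto()     # host data sent via the TCP offload engine
    H2C_BYPASS_DATA = auto()  # host data moved around TOE protocol processing
    H2C_TOE_CMD = auto()      # TOE command, may carry short-packet data inline
    H2C_BYPASS_CMD = auto()   # command forwarded to the TOE untouched
    C2H_TOE_DATA = auto()
    C2H_BYPASS_DATA = auto()
    H2C_CMPT = auto()         # H2C completion descriptor
    C2H_CMPT = auto()         # C2H completion descriptor

# Each Session maps to exactly one queue per direction; the C2H binding is
# configured by the CPU per the claim.
h2c_queue_of_session = {}  # session id -> H2C queue id
c2h_queue_of_session = {}  # session id -> C2H queue id

def bind_session(session: int, h2c_q: int, c2h_q: int):
    """Record the one-queue-per-direction binding for a session."""
    assert session not in h2c_queue_of_session, "a session binds one queue per direction"
    h2c_queue_of_session[session] = h2c_q
    c2h_queue_of_session[session] = c2h_q
```

In hardware these bindings would live in lookup tables indexed by Session ID; the dictionaries above only illustrate the one-to-one relationship.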
5. The intelligent network card fast DMA of claim 4, further comprising:
the H2C Doorbell register, for maintaining the Doorbell register of H2C and providing related information for the flow-control scheduling of H2C;
the H2C flow-control scheduling, for scheduling the H2C queues by weight and flow-controlling descriptors according to the on-board Credit information;
the H2C descriptor engine, for fetching descriptors from the HOST side according to the scheduling result, storing them by type, parsing them, and moving data from the HOST side to TOE_RX;
the H2C CMPT engine, for assembling the corresponding completion message according to the descriptor type and the state fed back by the TOE_RX side, and writing it back to the CMPT queue on the HOST side;
the C2H Doorbell register, for maintaining the Doorbell register of C2H and providing related information for the flow-control scheduling of C2H;
the C2H flow-control scheduling, for scheduling the C2H queues according to weight and the progress of the descriptor engine, and flow-controlling descriptors according to the on-board Credit information;
the C2H descriptor cache, for buffering prefetched descriptors by queue number and giving the corresponding state information for lower-level processing;
the C2H descriptor engine, for fetching and parsing descriptors from a queue according to the queue state and the data condition of the local TOE, and moving data from the TOE_TX side to the HOST side;
the C2H CMPT engine, for assembling a completion message according to local descriptor usage and sending it to the corresponding queue on the HOST side;
and the interrupt processing module, for assembling a specific MSIx message and uploading it to the HOST when a specific CQ of C2H or H2C completes or an abnormal state occurs.
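The flow-control scheduling modules above pick queues by weight while gating on on-board Credit. A minimal software sketch of one such scheduling decision (the scoring rule `weight * pending` is an assumption for illustration; the patent does not specify the exact arbitration policy, and the real engine is hardware logic):

```python
def schedule(queues, credits):
    """Pick the next queue to prefetch descriptors for.

    queues:  queue id -> (weight, pending descriptor count)
    credits: queue id -> remaining on-board buffer credit
    Returns the chosen queue id, or None if nothing is eligible.
    """
    best_q, best_score = None, -1
    for q, (weight, pending) in queues.items():
        if pending == 0 or credits.get(q, 0) == 0:
            continue  # nothing to fetch, or no room left on the board
        score = weight * pending
        if score > best_score:
            best_q, best_score = q, score
    return best_q
```

The Credit gate is what keeps the prefetcher from pulling more descriptors than the on-board caches can hold, matching the "flow-controlling descriptors according to the on-board Credit information" clauses.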
6. An intelligent network card fast DMA design system applying the intelligent network card fast DMA design method according to any one of claims 1 to 3, the intelligent network card fast DMA design system comprising:
the H2C information acquisition module, for maintaining the Doorbell register of H2C through the H2C Doorbell register and providing related information for the flow-control scheduling of H2C;
the H2C flow-control scheduling module, for scheduling the H2C queues by weight through H2C flow-control scheduling and flow-controlling descriptors according to the on-board Credit information;
the H2C descriptor parsing module, for fetching descriptors from the HOST side according to the scheduling result through the H2C descriptor engine and storing them by type; the descriptor engine parses them and moves data from the HOST side to TOE_RX;
the H2C register maintenance module, for assembling the corresponding completion message through the H2C CMPT engine according to the descriptor type and the state fed back by the TOE_RX side, and writing it back to the CMPT queue on the HOST side;
the C2H information acquisition module, for maintaining the Doorbell register of C2H through the C2H Doorbell register and providing related information for the flow-control scheduling of C2H;
the C2H flow-control module, for scheduling the C2H queues through C2H flow-control scheduling according to weight and the progress of the descriptor engine, and flow-controlling descriptors according to the on-board Credit information;
the C2H descriptor cache module, for buffering prefetched descriptors by queue number through the C2H descriptor cache and giving the corresponding state information for lower-level processing;
the C2H descriptor parsing module, for fetching and parsing descriptors from a queue through the C2H descriptor engine according to the queue state and the data condition of the local TOE, and moving data from the TOE_TX side to the HOST side;
the message assembly module, for assembling a completion message through the C2H CMPT engine according to local descriptor usage and sending it to the corresponding queue on the HOST side;
and the message uploading module, for assembling a specific MSIx message through the interrupt processing module and uploading it to the HOST when a specific CQ of C2H or H2C completes or an abnormal state occurs.
7. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of:
the Doorbell register of H2C is maintained through the H2C Doorbell register, and related information is provided for the flow-control scheduling of H2C; the H2C queues are scheduled by weight through H2C flow-control scheduling, and descriptors are flow-controlled according to the on-board Credit information; descriptors are fetched from the HOST side through the H2C descriptor engine according to the scheduling result and stored by type, and the descriptor engine parses them and moves data from the HOST side to TOE_RX; the corresponding completion message is assembled by the H2C CMPT engine according to the descriptor type and the state fed back by the TOE_RX side, and written back to the CMPT queue on the HOST side; the Doorbell register of C2H is maintained through the C2H Doorbell register, and related information is provided for the flow-control scheduling of C2H; the C2H queues are scheduled through C2H flow-control scheduling according to weight and the progress of the descriptor engine, and descriptors are flow-controlled according to the on-board Credit information; prefetched descriptors are buffered by queue number in the C2H descriptor cache, and the corresponding state information is given for lower-level processing; descriptors are fetched and parsed from a queue through the C2H descriptor engine according to the queue state and the data condition of the local TOE, and data is moved from the TOE_TX side to the HOST side; a completion message is assembled by the C2H CMPT engine according to local descriptor usage and sent to the corresponding queue on the HOST side;
when a specific CQ of C2H or H2C completes or an abnormal state occurs, a specific MSIx message is assembled by the interrupt processing module and uploaded to the HOST.
8. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
the Doorbell register of H2C is maintained through the H2C Doorbell register, and related information is provided for the flow-control scheduling of H2C; the H2C queues are scheduled by weight through H2C flow-control scheduling, and descriptors are flow-controlled according to the on-board Credit information; descriptors are fetched from the HOST side through the H2C descriptor engine according to the scheduling result and stored by type, and the descriptor engine parses them and moves data from the HOST side to TOE_RX; the corresponding completion message is assembled by the H2C CMPT engine according to the descriptor type and the state fed back by the TOE_RX side, and written back to the CMPT queue on the HOST side; the Doorbell register of C2H is maintained through the C2H Doorbell register, and related information is provided for the flow-control scheduling of C2H; the C2H queues are scheduled through C2H flow-control scheduling according to weight and the progress of the descriptor engine, and descriptors are flow-controlled according to the on-board Credit information; prefetched descriptors are buffered by queue number in the C2H descriptor cache, and the corresponding state information is given for lower-level processing; descriptors are fetched and parsed from a queue through the C2H descriptor engine according to the queue state and the data condition of the local TOE, and data is moved from the TOE_TX side to the HOST side; a completion message is assembled by the C2H CMPT engine according to local descriptor usage and sent to the corresponding queue on the HOST side; when a specific CQ of C2H or H2C completes or an abnormal state occurs, a specific MSIx message is assembled by the interrupt processing module and uploaded to the HOST.
9. Use of the intelligent network card fast DMA according to any one of claims 4 to 5 in a server.
10. An information data processing terminal, characterized in that the information data processing terminal is used for implementing the intelligent network card fast DMA design system of claim 6.
CN202111071199.3A 2021-09-13 2021-09-13 Method, system, equipment and terminal for designing intelligent network card fast DMA Active CN113986791B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111071199.3A CN113986791B (en) 2021-09-13 2021-09-13 Method, system, equipment and terminal for designing intelligent network card fast DMA

Publications (2)

Publication Number Publication Date
CN113986791A true CN113986791A (en) 2022-01-28
CN113986791B CN113986791B (en) 2024-02-02

Family

ID=79735740

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111071199.3A Active CN113986791B (en) 2021-09-13 2021-09-13 Method, system, equipment and terminal for designing intelligent network card fast DMA

Country Status (1)

Country Link
CN (1) CN113986791B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114885045A (en) * 2022-07-07 2022-08-09 浙江锐文科技有限公司 Method and device for saving DMA channel resources in high-speed intelligent network card/DPU
CN115314439A (en) * 2022-08-11 2022-11-08 迈普通信技术股份有限公司 Flow control method and related device for data storage IO (input/output) request
CN116225999A (en) * 2023-05-04 2023-06-06 太初(无锡)电子科技有限公司 DMA data transmission method and system
CN116303173A (en) * 2023-05-19 2023-06-23 深圳云豹智能有限公司 Method, device and system for reducing RDMA engine on-chip cache and chip
WO2023184991A1 (en) * 2022-03-31 2023-10-05 苏州浪潮智能科技有限公司 Traffic management and control method and apparatus, and device and readable storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060288129A1 (en) * 2005-06-17 2006-12-21 Level 5 Networks, Inc. DMA descriptor queue read and cache write pointer arrangement
US20090100200A1 (en) * 2007-10-16 2009-04-16 Applied Micro Circuits Corporation Channel-less multithreaded DMA controller
CN101539902A (en) * 2009-05-05 2009-09-23 中国科学院计算技术研究所 DMA device for nodes in multi-computer system and communication method
CN103885840A (en) * 2014-04-04 2014-06-25 华中科技大学 FCoE protocol acceleration engine IP core based on AXI4 bus
US20160266925A1 (en) * 2013-10-23 2016-09-15 Hangzhou H3C Technologies Co., Ltd. Data forwarding
US10142794B1 (en) * 2017-07-10 2018-11-27 International Business Machines Corporation Real-time, location-aware mobile device data breach prevention
US10657084B1 (en) * 2018-11-07 2020-05-19 Xilinx, Inc. Interrupt moderation and aggregation circuitry
CN112306928A (en) * 2020-11-19 2021-02-02 山东云海国创云计算装备产业创新中心有限公司 Stream transmission-oriented direct memory access method and DMA controller
CN113225307A (en) * 2021-03-18 2021-08-06 西安电子科技大学 Optimization method, system and terminal for pre-reading descriptors in offload engine network card

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
LING ZHENG等: "Design and analysis of a parallel hybrid memory architecture for per-flow buffering in high-speed switches and routers", 《JOURNAL OF COMMUNICATIONS AND NETWORKS》 *
MUENCH, DANIEL等: "SgInt: Safeguarding Interrupts for Hardware-Based I/O Virtualization for Mixed-Criticality Embedded Real-Time Systems Using Non Transparent Bridges", 《 ARCHITECTURE OF COMPUTING SYSTEMS - ARCS 2015》 *
ZHANG Hongbin: "Research on TT Service Switching and Synchronization Technology for Deterministic Ethernet", Master's Thesis Electronic Journal Publication Information *
SHAO Kai; LIANG Yan; HUANG Jun: "Implementation of AAL2 Adaptation and DMA Channel Driver Software on the MPC8280", Foreign Electronic Components, no. 03
HAN Zongfen; ZHOU Yi; LIAO Xiaofei; CHENG Bin: "Time-Slice-Based Logical Network Topology Construction in P2P VOD", Journal of Chinese Computer Systems, no. 01

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023184991A1 (en) * 2022-03-31 2023-10-05 苏州浪潮智能科技有限公司 Traffic management and control method and apparatus, and device and readable storage medium
CN114885045A (en) * 2022-07-07 2022-08-09 浙江锐文科技有限公司 Method and device for saving DMA channel resources in high-speed intelligent network card/DPU
CN115314439A (en) * 2022-08-11 2022-11-08 迈普通信技术股份有限公司 Flow control method and related device for data storage IO (input/output) request
CN115314439B (en) * 2022-08-11 2023-10-24 迈普通信技术股份有限公司 Flow control method and related device for data storage IO request
CN116225999A (en) * 2023-05-04 2023-06-06 太初(无锡)电子科技有限公司 DMA data transmission method and system
CN116225999B (en) * 2023-05-04 2023-07-21 太初(无锡)电子科技有限公司 DMA data transmission method and system
CN116303173A (en) * 2023-05-19 2023-06-23 深圳云豹智能有限公司 Method, device and system for reducing RDMA engine on-chip cache and chip
CN116303173B (en) * 2023-05-19 2023-08-08 深圳云豹智能有限公司 Method, device and system for reducing RDMA engine on-chip cache and chip

Also Published As

Publication number Publication date
CN113986791B (en) 2024-02-02

Similar Documents

Publication Publication Date Title
CN113986791A (en) Intelligent network card rapid DMA design method, system, equipment and terminal
US8719456B2 (en) Shared memory message switch and cache
US20050235072A1 (en) Data storage controller
US20160132541A1 (en) Efficient implementations for mapreduce systems
US9390036B2 (en) Processing data packets from a receive queue in a remote direct memory access device
CN107728936B (en) Method and apparatus for transmitting data processing requests
CN113535395A (en) Descriptor queue and memory optimization method, system and application of network storage service
CN112199309B (en) Data reading method and device based on DMA engine and data transmission system
CN106713450A (en) Downloading acceleration method and apparatus based on read-write separation mode
CN109857545B (en) Data transmission method and device
EP1554644A2 (en) Method and system for tcp/ip using generic buffers for non-posting tcp applications
CN115643318A (en) Command execution method, device, equipment and computer readable storage medium
US20050091390A1 (en) Speculative method and system for rapid data communications
CN113079113B (en) Data transmission device and data transmission system
KR100917677B1 (en) System and method for bridging file systems between two different processors in mobile phone
CN113986137A (en) Storage device and storage system
US6108694A (en) Memory disk sharing method and its implementing apparatus
CN115994115A (en) Chip control method, chip set and electronic equipment
US7480739B1 (en) Segregated caching of linked lists for USB
CN110209343B (en) Data storage method, device, server and storage medium
CN112685335A (en) Data storage system
CN115529275B (en) Message processing system and method
JPH11149455A (en) Memory disk sharing method and its executing device
CN115933973B (en) Method for remotely updating data, RDMA system and storage medium
CN115291898B (en) Multi-FPGA slave mode rapid burning method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant