Disclosure of Invention
In view of the foregoing, embodiments of the present invention provide a PCIe-based data transmission method, apparatus, and system that obviate or mitigate one or more disadvantages in the prior art.
One aspect of the present invention provides a PCIe-based data transmission method, the method being for execution at a device side, a plurality of ports of the device side being identified and configured to be enabled by a driver at a host side, each port respectively matching a data buffer pre-applied by the host side, the method comprising the steps of:
segmenting a large data packet to be sent by each port according to a preset rule to obtain small data packets, and invoking a descriptor generator to generate a descriptor for each small data packet;
invoking a descriptor arbiter to poll the channel corresponding to each port to detect and acquire the descriptors, and transmitting each detected descriptor to a DMA controller;
invoking the DMA controller to send control information to a data grabber and a data transmitter, instructing the data grabber to read the small data packet from the corresponding channel and forward it to the data transmitter; and
invoking the data transmitter to transmit the small data packets to the host side and store them in the data buffer matched with the port to which they correspond, so that the driver reads and aggregates the small data packets in the data buffer one by one.
In some embodiments of the invention, the method further comprises:
the data transmitter transmitting feedback information to the DMA controller, the feedback information being passed on to the descriptor arbiter; and
the descriptor arbiter receiving the feedback information and continuing to poll each channel to acquire the next descriptor.
In some embodiments of the present invention, the data packet includes a data field including valid data information and a tag field including a data packet ID, a time stamp, a length of valid data, packet statistics, and hardware status information.
In some embodiments of the present invention, invoking the data transmitter to transmit the small data packets to the host side and store them in the data buffer matched with the port to which they correspond, so that the driver reads and aggregates the small data packets in the data buffer one by one, further comprises:
the data transmitter transmitting interrupt information to the host side, and the driver reading and aggregating the small data packets in the data buffer one by one according to the interrupt information.
In some embodiments of the present invention, the interrupt is triggered at the host side by a default interrupt or by interrupt aggregation.
In some embodiments of the present invention, invoking the data transmitter to transmit the small data packets to the host side and store them in the data buffer matched with the port to which they correspond, so that the driver reads and aggregates the small data packets in the data buffer one by one, further comprises:
the driver polling the data buffer to obtain the small data packets, and reading and aggregating the small data packets in the data buffer one by one.
In some embodiments of the present invention, invoking the data transmitter to transmit the small data packets to the host side and store them in the data buffer matched with the port to which they correspond, so that the driver reads and aggregates the small data packets in the data buffer one by one, further comprises:
the data transmitter placing the small data packets into successive basic units in the data buffer according to a preset sequence; and
the driver, when processing the small data packets in the data buffer, taking them out one by one in sequence and applying offset processing to them.
Another aspect of the invention provides a PCIe-based data transfer device comprising a plurality of ports, a descriptor generator, a descriptor arbiter, a DMA controller, a data grabber, and a data transmitter, wherein:
the descriptor generator is used for generating a descriptor according to the small data packet;
the descriptor arbiter is used for detecting and acquiring the descriptor from the channel corresponding to each port, and for transmitting the detected descriptor to the DMA controller;
the DMA controller is used for sending control information to the data grabber and the data transmitter, instructing the data grabber to read the small data packet from the corresponding channel and forward it to the data transmitter;
the data grabber is used for reading the small data packet from the corresponding channel according to the control information and transmitting it to the data transmitter; and
the data transmitter is used for transmitting the small data packets to a host side and storing them in a data buffer matched with the port to which they correspond, so that a driver can read and aggregate the small data packets in the data buffer one by one.
Another aspect of the invention provides a PCIe-based data transmission system comprising a processor and a memory, the memory having computer instructions stored therein and the processor being configured to execute the computer instructions stored in the memory, wherein the system implements the steps of any one of the methods described above when the computer instructions are executed by the processor.
Another aspect of the invention provides a computer readable storage medium having stored thereon a computer program, characterized in that the program when executed by a processor implements the steps of any of the methods described above.
The beneficial effects of the invention at least comprise:
The invention provides a PCIe-based data transmission method, apparatus, and system. The method is executed at the device side, and a plurality of ports of the device side are identified by a driver at the host side and matched with corresponding data buffers. At the device side, a large data packet is segmented into small data packets; a descriptor generator is invoked to generate descriptors according to the small data packets; a descriptor arbiter polls the channel corresponding to each port to detect the descriptors; the descriptors are transmitted to a DMA controller, which generates control information and sends it to a data grabber and a data transmitter; the data grabber reads the small data packets from the corresponding channels and forwards them to the data transmitter; and the data transmitter transmits the small data packets to the data buffers at the host side for the driver to process. The method does not require the host to generate descriptors, which reduces the operations performed by the host in the DMA flow, simplifies the transmission flow, reduces DMA latency, and improves the overall efficiency of the system. Combining segmentation into small data packets with fair polling of the descriptors prevents large packets or high-priority channel data from monopolizing PCIe. Because a large data packet is segmented into small data packets, transmission can begin in units of small data packets before the device side has received the entire large data packet. This enables pipelined transmission and avoids data backlog, thereby reducing the load on the PCIe bus and the processor system at any given point in time and improving utilization of the PCIe bus.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description and drawings.
It will be appreciated by those skilled in the art that the objects and advantages that can be achieved with the present invention are not limited to the above-described specific ones, and that the above and other objects that can be achieved with the present invention will be more clearly understood from the following detailed description.
Detailed Description
The present invention will be described in further detail with reference to the following embodiments and the accompanying drawings, in order to make the objects, technical solutions and advantages of the present invention more apparent. The exemplary embodiments of the present invention and the descriptions thereof are used herein to explain the present invention, but are not intended to limit the invention.
It should be noted here that, in order to avoid obscuring the present invention due to unnecessary details, only structures and/or processing steps closely related to the solution according to the present invention are shown in the drawings, while other details not greatly related to the present invention are omitted.
It should be emphasized that the term "comprises/comprising" when used herein is taken to specify the presence of stated features, elements, steps or components, but does not preclude the presence or addition of one or more other features, elements, steps or components.
It is also noted herein that the term "coupled" may refer to not only a direct connection, but also an indirect connection in which an intermediate is present, unless otherwise specified.
Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. In the drawings, the same reference numerals represent the same or similar components, or the same or similar steps.
PCIe (Peripheral Component Interconnect Express) is a computer bus standard for connecting hardware devices, such as graphics cards, network adapters, and storage devices, to a computer motherboard. PCIe provides a higher data transfer rate than the conventional PCI (Peripheral Component Interconnect) bus. PCIe uses a point-to-point architecture in which each device has its own dedicated link rather than sharing a bus with other devices, which helps to improve data transfer efficiency.
DMA (Direct Memory Access) is a data transfer technique in computer systems that allows external devices to access system memory directly, without direct intervention by the central processing unit (CPU). The main purpose of DMA is to increase the data transfer rate and relieve the burden on the CPU so that the CPU can perform other tasks. In practice, DMA is typically managed by a dedicated hardware controller.
An aspect of one embodiment of the present invention provides a PCIe-based data transmission method, where the method is performed at a device side, a plurality of ports at the device side are identified and configured by a driver at a host side to be enabled, and each port is respectively matched with a data buffer area pre-applied by the host side, and the method includes steps S101 to S104:
Step S101, segmenting the large data packet to be sent by each port according to a preset rule to obtain small data packets, and invoking a descriptor generator to generate a descriptor for each small data packet.
Segmenting the large data packet into small data packets prevents a large data packet from monopolizing the PCIe bus for a long time during data transmission, thereby achieving efficient transmission.
A descriptor is an abstract data structure that describes information such as the attributes and location of data. A data packet is a logical unit of transmitted data, and a descriptor is a means of managing and controlling such data. The descriptor may contain a pointer to the data packet or meta-information about the data packet. The descriptor of the present embodiment specifies the location, size, and other relevant information of the small data packet.
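By way of non-limiting illustration only, the segmentation and descriptor generation of step S101 may be sketched in Python as follows. The sketch assumes that the preset rule is fixed-size chunking; the names `Descriptor`, `segment`, and `generate_descriptors`, the 48-byte chunk length, and the descriptor fields are hypothetical and are not taken from the embodiment.

```python
from dataclasses import dataclass
from typing import List

CHUNK_LEN = 48  # assumed payload bytes per small packet; not specified by the source


@dataclass
class Descriptor:
    """Hypothetical descriptor: where a small packet is and how large it is."""
    channel: int   # port/channel that produced the small packet
    src_addr: int  # location of the small packet in the device-side channel
    length: int    # number of payload bytes in this small packet


def segment(large_packet: bytes, chunk_len: int = CHUNK_LEN) -> List[bytes]:
    """Split a large packet into chunks of at most chunk_len bytes."""
    return [large_packet[i:i + chunk_len]
            for i in range(0, len(large_packet), chunk_len)]


def generate_descriptors(channel: int, small_packets: List[bytes],
                         src_base: int = 0) -> List[Descriptor]:
    """Emit one descriptor per small packet, laid out back to back."""
    descriptors = []
    addr = src_base
    for pkt in small_packets:
        descriptors.append(Descriptor(channel, addr, len(pkt)))
        addr += len(pkt)
    return descriptors


small = segment(bytes(100))
descs = generate_descriptors(channel=0, small_packets=small)
print([len(p) for p in small])                  # sizes of the small packets
print([(d.src_addr, d.length) for d in descs])  # one descriptor per small packet
```

In this sketch the last small packet may be shorter than the others; an implementation with fixed-length small packets would pad it and record the valid length separately.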
Step S102, calling a descriptor arbiter to poll a channel corresponding to each port to detect and acquire the descriptor, and transmitting the detected descriptor to a DMA controller.
Polling is a technique for acquiring or checking information. In this embodiment, descriptors arrive at the descriptor arbiter as requests through their respective descriptor channels. The descriptor arbiter polls each channel in turn and, if the channel currently being polled holds a descriptor, takes the descriptor out so that the transmission operation can be performed. Other channels may also hold valid descriptors at that moment, but their transmission operations can be performed only after the descriptor arbiter polls those channels.
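As a non-limiting illustration, the polling behaviour described above may be sketched as a round-robin arbiter. The class name `DescriptorArbiter`, its methods, and the use of strings as stand-ins for descriptors are assumptions made for this example only.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class DescriptorArbiter:
    """Round-robin poller over per-port descriptor channels (illustrative only)."""

    def __init__(self, num_channels: int) -> None:
        self.channels: List[Deque[str]] = [deque() for _ in range(num_channels)]
        self._next = 0  # channel to be polled next

    def submit(self, channel: int, descriptor: str) -> None:
        """Queue a descriptor on the given channel."""
        self.channels[channel].append(descriptor)

    def poll_once(self) -> Optional[Tuple[int, str]]:
        """Visit each channel at most once, starting at the current position.

        Returns (channel, descriptor) for the first pending descriptor found,
        or None if every channel is empty.
        """
        n = len(self.channels)
        for step in range(n):
            ch = (self._next + step) % n
            if self.channels[ch]:
                self._next = (ch + 1) % n  # resume after the served channel
                return ch, self.channels[ch].popleft()
        return None


arbiter = DescriptorArbiter(num_channels=3)
for d in ("a0", "a1", "a2"):
    arbiter.submit(0, d)   # channel 0 carries a large packet (three descriptors)
arbiter.submit(2, "c0")    # channel 2 carries a single small packet

order = []
while (item := arbiter.poll_once()) is not None:
    order.append(item)
print(order)
```

Because the arbiter moves on after serving one descriptor, the descriptor on channel 2 is served between the first and second descriptors of channel 0 rather than after all of them.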
Step S103, invoking the DMA controller to send control information to the data grabber and the data transmitter, instructing the data grabber to read the small data packet from the corresponding channel and forward it to the data transmitter.
Step S104, invoking the data transmitter to transmit the small data packets to the host side and store them in the data buffer matched with the port to which they correspond, so that the driver reads and aggregates the small data packets in the data buffer one by one.
The driver of the host refers to a device driver in the computer system, and is responsible for communication and coordination with the external device, translating a request of the operating system into an instruction which can be understood by hardware, and translating a response of the external device into data which can be processed by the operating system. The drivers provide interfaces to the operating system for external devices so that applications can access device functions through standard system calls or APIs (application program interfaces).
In some embodiments of the invention, the method further comprises:
The data transmitter transmits feedback information to the DMA controller, and the feedback information is passed on to the descriptor arbiter.
The descriptor arbiter receives the feedback information and continues to poll each channel to obtain the next descriptor.
In some embodiments of the present invention, each small data packet includes a data field (DATA field) and a tag field (TAG field). The DATA field holds the actual data content, that is, the valid data information; the actual length of the valid data may be smaller than the length of the DATA field. The TAG field contains meta-information about the data packet, including a packet ID, a time stamp, the length of the valid data, packet statistics, and hardware status information, and may carry other information to support additional functions.
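For illustration only, one possible binary layout of such a packet may be sketched with Python's standard `struct` module. The field widths, the field order, the 48-byte DATA field, and the function names are assumptions of this sketch; the embodiment does not specify them.

```python
import struct
from typing import Dict, Union

DATA_LEN = 48                        # assumed DATA field length in bytes
TAG_FMT = "<IHHQ"                    # assumed TAG layout: id, valid length, flags, time stamp
TAG_LEN = struct.calcsize(TAG_FMT)   # 16 bytes with this layout
PACKET_LEN = DATA_LEN + TAG_LEN      # 64 bytes in total


def pack_small_packet(packet_id: int, payload: bytes,
                      flags: int, timestamp: int) -> bytes:
    """Build DATA field (zero padded) followed by the TAG field."""
    if len(payload) > DATA_LEN:
        raise ValueError("payload does not fit in the DATA field")
    data = payload.ljust(DATA_LEN, b"\x00")
    tag = struct.pack(TAG_FMT, packet_id, len(payload), flags, timestamp)
    return data + tag


def unpack_small_packet(raw: bytes) -> Dict[str, Union[int, bytes]]:
    """Parse the TAG field and return only the valid part of the DATA field."""
    if len(raw) != PACKET_LEN:
        raise ValueError("unexpected packet length")
    packet_id, valid_len, flags, timestamp = struct.unpack(TAG_FMT, raw[DATA_LEN:])
    return {
        "packet_id": packet_id,
        "payload": raw[:valid_len],  # padding beyond valid_len is discarded
        "flags": flags,              # assumed 16-bit word for status or flag bits
        "timestamp": timestamp,
    }


raw = pack_small_packet(packet_id=7, payload=b"hello", flags=0, timestamp=123456)
print(len(raw), unpack_small_packet(raw))
```

Keeping both fields at a fixed length means every packet occupies the same number of bytes, while the valid length in the TAG field tells the receiver how much of the DATA field to keep.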
In some embodiments of the present invention, invoking the data transmitter to transmit the small data packets to the host side and store them in the data buffer matched with the port to which they correspond, so that the driver reads and aggregates the small data packets in the data buffer one by one, further comprises:
The data transmitter transmits interrupt information to the host side, and the driver reads and aggregates the small data packets in the data buffer one by one according to the interrupt information.
The interrupt information is a hardware-generated or software-generated signal that breaks the normal program execution flow and causes the processor to switch to executing a specific interrupt service routine. Interrupts may be triggered by external devices, program errors, or other events that require the attention of the processor. In this embodiment, the interrupt information is used by the device side to notify the host side that small data packets transmitted by the device side are ready to be processed.
In some embodiments of the present invention, the interrupt is triggered at the host side by a default interrupt or by interrupt aggregation.
The default interrupt refers to the interrupt triggering mode that is preset, or used by default, at the host side.
Interrupt aggregation refers to merging multiple interrupt events into a single interrupt event: rather than handling each interrupt individually, the system merges similar or related interrupts into one, thereby reducing interrupt frequency and processing overhead. This approach helps to improve system efficiency, especially when a large number of interrupt events occur.
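As a non-limiting sketch, interrupt aggregation may be modelled as a counter that raises one interrupt per group of completed transfers. The count-based threshold, the `flush` method (standing in for something like a timer expiry), and the class name `InterruptAggregator` are assumptions of this example; a threshold of 1 is used here as one possible reading of the default interrupt mode.

```python
class InterruptAggregator:
    """Raise one interrupt per `threshold` completed transfers (illustrative)."""

    def __init__(self, threshold: int) -> None:
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.pending = 0            # completions not yet signalled
        self.interrupts_raised = 0  # interrupts delivered to the host so far

    def on_transfer_complete(self) -> bool:
        """Record one completed transfer; return True if an interrupt fires."""
        self.pending += 1
        if self.pending >= self.threshold:
            self.pending = 0
            self.interrupts_raised += 1
            return True
        return False

    def flush(self) -> bool:
        """Signal any leftover completions, for example when a timer expires."""
        if self.pending:
            self.pending = 0
            self.interrupts_raised += 1
            return True
        return False


per_transfer = InterruptAggregator(threshold=1)
aggregated = InterruptAggregator(threshold=4)
for _ in range(10):
    per_transfer.on_transfer_complete()
    aggregated.on_transfer_complete()
aggregated.flush()
print(per_transfer.interrupts_raised, aggregated.interrupts_raised)
```

For ten completed transfers, the per-transfer setting raises ten interrupts, whereas the aggregated setting raises three: two for full groups of four and one for the remaining two transfers.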
In some embodiments of the present invention, invoking the data transmitter to transmit the small data packets to the host side and store them in the data buffer matched with the port to which they correspond, so that the driver reads and aggregates the small data packets in the data buffer one by one, further comprises:
The driver polls the data buffer to obtain the small data packets, and reads and aggregates the small data packets in the data buffer one by one.
In some embodiments of the present invention, invoking the data transmitter to transmit the small data packets to the host side and store them in the data buffer matched with the port to which they correspond, so that the driver reads and aggregates the small data packets in the data buffer one by one, further comprises:
The data transmitter places the small data packets, in a predetermined order, into successive basic units in the data buffer.
When the driver processes the small data packets in the data buffer, it fetches them one by one in sequence and applies offset processing to them.
The offset processing is to adjust the offset of the data packet. The offset is an offset of a specific portion of the data packet with respect to a start position of the entire data packet.
In some embodiments of the present invention, the length field of the descriptor corresponding to each small data packet is fixed at 64 B, whereas the actual length of the valid data is recorded in the TAG field and parsed by the driver. The offset between the source address of a descriptor and the source address of the adjacent descriptor, and likewise between their destination addresses, that is, the offset between adjacent small data packets, is 64 B.
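The 64 B address offset can be illustrated with a short, non-limiting Python sketch. The base addresses, the function names, and the list of valid lengths (standing in for values the driver would parse from the TAG fields) are hypothetical.

```python
from typing import List, Tuple

STRIDE = 64  # bytes between adjacent small packets, as stated in the embodiment


def descriptor_addresses(index: int, src_base: int, dst_base: int) -> Tuple[int, int]:
    """Source and destination address of the index-th descriptor."""
    return src_base + index * STRIDE, dst_base + index * STRIDE


def extract_valid(buffer: bytes, valid_lengths: List[int]) -> bytes:
    """Walk the buffer in STRIDE-sized units and keep only the valid bytes.

    valid_lengths[i] stands in for the value the driver would parse from the
    TAG field of the i-th small packet.
    """
    out = bytearray()
    for i, n in enumerate(valid_lengths):
        start = i * STRIDE  # offset of the i-th unit within the buffer
        out += buffer[start:start + n]
    return bytes(out)


addrs = [descriptor_addresses(i, src_base=0x1000, dst_base=0x8000) for i in range(3)]
print([(hex(s), hex(d)) for s, d in addrs])

buf = b"A" * 64 + b"B" * 64 + b"C" * 10 + bytes(54)
print(extract_valid(buf, [48, 48, 10]))
```

With a constant stride, the address of any unit follows directly from its index, so neither the device nor the driver needs a per-packet address table.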
Another aspect of embodiments of the invention provides a PCIe-based data transfer device comprising a plurality of ports, a descriptor generator, a descriptor arbiter, a DMA controller, a data grabber, and a data transmitter.
The descriptor generator is used for generating a descriptor according to the data packet.
The descriptor arbiter is used for detecting and acquiring the descriptor from the channel corresponding to each port, and for transmitting the detected descriptor to the DMA controller.
The DMA controller is configured to send control information to the data grabber and the data transmitter, instruct the data grabber to read the data packets from the corresponding channels, and forward the data packets to the data transmitter.
The data grabber is used for reading the data packets from the corresponding channels according to the control information and transmitting the data packets to the data transmitter.
The data transmitter is used for transmitting the data packets to the host end and storing the data packets in a data buffer area matched with the port corresponding to the data packets so as to enable the driver to read and aggregate the data packets in the data buffer area one by one.
Another aspect of the embodiments of the invention provides a PCIe-based data transmission system comprising a processor and a memory, the memory having computer instructions stored therein and the processor being configured to execute the computer instructions stored in the memory, wherein the system implements the steps of the method of any of the embodiments described above when the computer instructions are executed by the processor.
Another aspect of an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the steps of the method as in any of the embodiments described above.
Another embodiment of the present invention provides a PCIe-based data transmission method, apparatus, and system, where the specific implementation manner is as follows:
The overall architecture of this embodiment is shown schematically in FIG. 2. The part to the left of PCIe resides within the device and includes a descriptor generator (Generator), a descriptor arbiter (Arbiter), a DMA controller (Controller), a data grabber (Data Fetcher), and a data transmitter (Transmitter). The descriptor generator is responsible for generating descriptors for the small data packets segmented from large data packets and for supplying the descriptors to the descriptor arbiter. The descriptor arbiter is responsible for polling each channel for descriptors and forwarding the descriptors to the DMA controller. The DMA controller is responsible for sending control information to the data grabber and the data transmitter. The data grabber is responsible for fetching data from the channel indicated by the control information from the DMA controller. The data transmitter is responsible for transmitting the small data packets returned by the data grabber over PCIe to the corresponding address or queue position in the Host memory, according to the control information from the DMA controller.
The part to the right of PCIe is the Host side. A data buffer (a queue, a channel, or the like) in memory stores the small data packets sent by the device side. The correspondence between the data buffers and the channels in the device is determined according to the actual situation, and a queue may correspond to an object such as a specific user application or a virtual machine according to a given mapping rule. The Driver is responsible for configuring the device during the initialization phase and, during the actual working phase, for further processing the small data packets in the buffer in response to interrupts or by polling.
The implementation is divided into two major phases, an initialization configuration phase and an actual execution phase.
The flow of the initialization configuration phase is shown in FIG. 3. Its main action is that, after the driver has loaded and identified the board, corresponding memory resources are allocated to the device according to the number of ports (physical or virtual) of the device.
For the actual execution phase, the default execution flow after the initialization configuration is completed is shown in FIG. 4. The Arbiter continually polls each channel and, if a pending descriptor is detected, sends it to the DMA controller. The DMA controller sends control information to the data grabber and the data transmitter based on the descriptor. The data grabber reads the data from the corresponding channel according to the control information and sends the returned data to the data transmitter. The data transmitter sends the data to the corresponding location in memory, then sends feedback indicating completion to the Arbiter and notifies the interrupt sending logic to send an interrupt to the Host. After receiving the feedback, the Arbiter determines that the transmission is complete and checks whether there is a next descriptor to be processed. After receiving the interrupt, the driver fetches the data from the corresponding location in memory and processes it further.
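Purely as an illustrative software model of this default flow, the hand-offs between the components may be sketched as follows. All function names, the dictionary used as control information, and the event log are hypothetical, and a queued small packet stands in for its descriptor; actual embodiments implement these stages in device hardware.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Event = Tuple[str, int]


def dma_controller(channel: int, dst_offset: int) -> Dict[str, int]:
    """Turn a detected descriptor into control information (hypothetical form)."""
    return {"channel": channel, "dst_offset": dst_offset}


def data_grabber(channels: List[Deque[bytes]], control: Dict[str, int]) -> bytes:
    """Read the next small packet from the channel named in the control info."""
    return channels[control["channel"]].popleft()


def data_transmitter(host_buffers: List[bytearray], control: Dict[str, int],
                     packet: bytes, log: List[Event]) -> None:
    """Write the packet into the port's host buffer, then report completion."""
    start = control["dst_offset"]
    host_buffers[control["channel"]][start:start + len(packet)] = packet
    log.append(("feedback", control["channel"]))   # completion feedback to the arbiter
    log.append(("interrupt", control["channel"]))  # notification to the host driver


def run(channels: List[Deque[bytes]], host_buffers: List[bytearray]) -> List[Event]:
    """Arbiter loop: poll channels in turn until a full pass finds nothing."""
    log: List[Event] = []
    write_pos = [0] * len(channels)
    ch, idle = 0, 0
    while idle < len(channels):
        if channels[ch]:  # a queued small packet stands in for its descriptor
            idle = 0
            control = dma_controller(ch, write_pos[ch])
            packet = data_grabber(channels, control)
            data_transmitter(host_buffers, control, packet, log)
            write_pos[ch] += len(packet)
        else:
            idle += 1
        ch = (ch + 1) % len(channels)
    return log


channels = [deque([b"a" * 64, b"b" * 64]), deque([b"c" * 64])]
host_buffers = [bytearray(128), bytearray(64)]
log = run(channels, host_buffers)
print([ch for kind, ch in log if kind == "feedback"])
```

In this model the arbiter only moves on after the transmitter has logged its feedback, which mirrors the described behaviour of checking for the next descriptor once completion is reported.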
In a practical application environment, the data volume and the frequency at which data packets are transmitted are uncertain. For example, frequent interrupt handling under a large data volume may occupy a large amount of CPU resources and thereby reduce data transmission efficiency. Therefore, according to actual requirements, the driver can configure the relevant registers to select whether interrupts are sent and how frequently they are sent (interrupt aggregation); alternatively, interrupt aggregation can be implemented purely in software, or the Host side can process data by polling. The relationships among these options are shown in FIG. 5. The dotted path in the figure may use the default interrupt mode, pure-software interrupt aggregation, or the like; the solid path represents the driver polling for data packets on its own; and interrupts and polling may also be combined. In practical applications, the above modes can be combined or dynamically configured according to the requirements of the actual scenario.
The format of the small data packet for each DMA transfer is shown in FIG. 6(a). The small data packets are segmented from a complete network packet (large data packet) received from the Ethernet network, and the relationship between them is shown in FIG. 7. Each small data packet has its own corresponding descriptor. The length field of the descriptor is fixed at 64 B, whereas the actual length of the valid data is recorded in the TAG field and parsed by the driver. As shown in FIG. 8, the offset between the source address of a descriptor and that of the adjacent descriptor, and likewise between their destination addresses, that is, the offset between adjacent small data packets, is 64 B. The benefit of segmentation is that large packets are prevented from monopolizing the PCIe bus for long periods, so that possibly real-time data on other channels is handled in time. Each small data packet consists of a DATA field and a TAG field, both of fixed length. The length of the valid data in the DATA field may be smaller than the length of the DATA field. The TAG field includes the ID of the data packet, the length of the valid data, and a flag bit identifying whether the small data packet is the last one, and may also carry information including, but not limited to, a time stamp, packet statistics, and hardware status. As shown in FIG. 6(b), the DMAC2H places the small data packets by DMA, in sequence, into successive queue elements of the data buffer, so that the small packets are aggregated into a large packet in the buffer. The driver takes the small data packets out of the buffer one by one and, after applying offset processing to the data in them, can deliver the result as one large data packet to the corresponding application.
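The driver-side aggregation described above may be sketched, by way of non-limiting example, as follows. The 48-byte DATA field, the 16-byte TAG layout, the `FLAG_LAST` bit position, and the function names are assumptions made for this illustration; only the 64 B unit size and the presence of a valid length and a last-packet flag in the TAG field come from the description.

```python
import struct
from typing import List

DATA_LEN = 48                                     # assumed DATA field length
TAG_FMT = "<IHHQ"                                 # assumed: id, valid length, flags, time stamp
PACKET_LEN = DATA_LEN + struct.calcsize(TAG_FMT)  # 64 bytes per queue element
FLAG_LAST = 0x1                                   # assumed bit marking the last small packet


def make_unit(packet_id: int, payload: bytes, last: bool) -> bytes:
    """Build one 64-byte queue element as the device might write it."""
    tag = struct.pack(TAG_FMT, packet_id, len(payload), FLAG_LAST if last else 0, 0)
    return payload.ljust(DATA_LEN, b"\x00") + tag


def aggregate(buffer: bytes) -> List[bytes]:
    """Rebuild large packets from consecutive fixed-size units in the buffer."""
    large_packets = []
    current = bytearray()
    for offset in range(0, len(buffer) - PACKET_LEN + 1, PACKET_LEN):
        unit = buffer[offset:offset + PACKET_LEN]
        _pid, valid_len, flags, _ts = struct.unpack(TAG_FMT, unit[DATA_LEN:])
        current += unit[:valid_len]   # keep only the valid part of the DATA field
        if flags & FLAG_LAST:         # last small packet closes the large packet
            large_packets.append(bytes(current))
            current = bytearray()
    return large_packets


original = bytes(range(100))
chunks = [original[i:i + DATA_LEN] for i in range(0, len(original), DATA_LEN)]
buffer = b"".join(make_unit(i, c, last=(i == len(chunks) - 1))
                  for i, c in enumerate(chunks))
buffer += make_unit(0, b"xyz", last=True)         # a second, single-unit packet
print(aggregate(buffer) == [original, b"xyz"])
```

Because the units are fixed in size and contiguous, the driver can locate each one by offset alone and needs the TAG field only to trim padding and detect packet boundaries.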
In practical application of the DMAC2H scheme, descriptors are generated by a descriptor generator located next to the DMAC2H logic inside the device, which reduces the latency of the descriptor phase of the DMA transmission flow to single-digit nanoseconds. In addition, segmenting the data simplifies the logic of the DMAC2H to some extent, which further reduces the latency of the whole transmission process. Meanwhile, a large data packet is placed into the Host-side buffer sequentially, one small data packet at a time, which achieves packet aggregation.
In accordance with the above method, the present invention also provides a system comprising a computer device, the computer device comprising a processor and a memory, the memory having computer instructions stored therein and the processor being configured to execute the computer instructions stored in the memory; when the computer instructions are executed by the processor, the system implements the steps of the method described above.
The embodiments of the present invention also provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the PCIe-based data transmission method described above. The computer-readable storage medium may be a tangible storage medium such as random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, a register, a floppy disk, a hard disk, a removable storage disk, a CD-ROM, or any other form of storage medium known in the art.
In summary, the present invention provides a PCIe-based data transmission method, apparatus, and system. The method is performed at the device side, and a plurality of ports at the device side are identified by a driver at the host side and matched with corresponding data buffers. In the method, a large data packet is segmented into small data packets; a descriptor generator generates descriptors according to the small data packets; a descriptor arbiter polls the channel corresponding to each port to detect the descriptors; the descriptors are transmitted to a DMA controller, which generates control information and sends it to a data grabber and a data transmitter; the data grabber reads the small data packets from the corresponding channels and forwards them to the data transmitter; and the data transmitter transmits the small data packets to the data buffers at the host side for processing by the driver. The invention simplifies the DMA transmission flow and reduces data transmission latency. Segmenting a large data packet into small data packets enables pipelined transmission, avoids data backlog, reduces the load on the PCIe bus and the processor system, and improves utilization of the PCIe bus.
Those of ordinary skill in the art will appreciate that the various illustrative components, systems, and methods described in connection with the embodiments disclosed herein can be implemented as hardware, software, or a combination of both. Whether a particular implementation uses hardware or software depends on the specific application of the solution and its design constraints. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, a plug-in, a function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The programs or code segments may be stored in a machine-readable medium or transmitted over transmission media or communication links by a data signal carried in a carrier wave.
It should be understood that the invention is not limited to the particular arrangements and instrumentalities described above and shown in the drawings. For the sake of brevity, a detailed description of known methods is omitted here. In the above embodiments, several specific steps are described and shown as examples. The method processes of the present invention are not limited to the specific steps described and shown; those skilled in the art may make various changes, modifications, and additions, or change the order of the steps, after appreciating the spirit of the present invention.
In this disclosure, features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments and/or in combination with or instead of the features of the other embodiments.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, and various modifications and variations can be made to the embodiments of the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.