CN117033274A

CN117033274A - DMA data packet transmission method and device and PCIE equipment

Info

Publication number: CN117033274A
Application number: CN202311081559.7A
Authority: CN
Inventors: Su Qinghui (苏庆会); Su Zhirui (苏智睿); Wang Zhongyuan (王中原); Zhao Pengxiang (赵鹏翔)
Original assignee: Zhengzhou Xinda Jiean Information Technology Co Ltd
Current assignee: Zhengzhou Xinda Jiean Information Technology Co Ltd
Priority date: 2023-08-25
Filing date: 2023-08-25
Publication date: 2023-11-10

Abstract

The embodiments of the invention provide a DMA data packet transmission method and device and a PCIE device. The method comprises: storing a plurality of acquired DMA read requests in a first buffer in a first order; recording the order of the length information of the DMA read requests in a second buffer; calculating a plurality of TLP read requests from the DMA read requests; storing the TLP read requests in a third buffer in increasing order of TAG number; acquiring the TLP read requests in sequence and sending them to a host through an IP core; receiving a plurality of TLP completion packets returned by the host; storing the TLP completion packets in a fourth buffer in order of TAG number; and reading the TLP completion packets in sequence and grouping them according to the recorded length information. The technical solution yields DMA data packets arranged in the order in which the DMA read requests were sent, so that the TLP completion packets belonging to different complete data files can be distinguished.

Description

DMA data packet transmission method and device and PCIE equipment

Technical Field

The embodiments of the invention relate to the field of communication technology, and in particular to a DMA data packet transmission method and device and a PCIE device.

Background

With the development of technology and the economy, computers have become ubiquitous in people's daily life and work, and computer systems are becoming increasingly complex and powerful. As a result, the volume of data exchanged between a host computer and its peripheral devices has grown rapidly. In this context the PCIE (Peripheral Component Interconnect Express, a high-speed serial computer expansion bus standard) bus emerged, offering high data transmission speed, small physical size, and a sound error-detection mechanism. PCIE is now widely used as a high-performance I/O bus in computer systems, and almost all commercial-grade and industrial-grade computer manufacturers provide PCIE bus interfaces in the systems they produce. Peripheral devices connected to the host through the PCIE bus are accordingly called PCIE devices.

Meanwhile, data exchange between a host and a PCIE device, or between two PCIE devices, involves a large number of data copy operations, which places a heavy burden on the CPU and seriously degrades its processing capacity. To meet this need, DMA (Direct Memory Access) technology was developed. DMA is a transfer mode supported by PCIE that moves large amounts of data at high speed in bursts; the copy operations are delegated to a DMA controller, which greatly frees up the CPU's processing capacity.

Further, under the PCIE transaction layer protocol, data are transferred between the host and a PCIE device, or between two PCIE devices, in the form of TLPs (Transaction Layer Packets). Because the data length of a single TLP (whether a TLP read request or a TLP completion packet) is restricted, multiple TLP read requests are usually needed to cover the address range and length of one complete data file. In addition, multiple complete data files often need to be transferred between a PCIE device and the host, which makes it difficult to tell which TLP completion packets belong to which complete data file.

Disclosure of Invention

Embodiments of the present invention aim to solve at least one of the technical problems in the related art to some extent.

To this end, the embodiments of the invention disclose a DMA data packet transmission method and device and a PCIE device, so as to distinguish the TLP completion packets corresponding to a plurality of complete data files.

In a first aspect, an embodiment of the present invention provides a DMA data packet transmission method applied to a PCIE device, the method comprising: acquiring a plurality of DMA read requests and storing them in a first buffer in a first order, where each DMA read request contains the address information and length information of its corresponding DMA data packet; recording the order of the length information of the DMA read requests in a second buffer, so as to represent the one-to-one correspondence between the position of each DMA read request in the first buffer and its length information; acquiring the DMA read requests from the first buffer in sequence, and calculating the TLP address information and TLP length information corresponding to each DMA read request from a preset cut length and the address information and length information in that request, where TLP address information and TLP length information correspond one to one; assigning a TAG number, in a second order, to each corresponding pair of TLP address information and TLP length information, so as to obtain a plurality of TLP read requests, each containing a TAG number, TLP address information, and TLP length information; storing the TLP read requests in a third buffer in increasing order of TAG number; acquiring the TLP read requests from the third buffer in sequence and sending them to the host through an IP core; receiving a plurality of TLP completion packets returned by the host in response to the TLP read requests, where each TLP completion packet corresponds to a TLP read request and contains a TAG number; storing the TLP completion packets in a fourth buffer in increasing order of TAG number; and reading the TLP completion packets from the fourth buffer in sequence and grouping them according to the length information order recorded in the second buffer, so as to obtain DMA data packets arranged in the first order.
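The complete data path of the first aspect can be modelled in software to see why the grouping step recovers whole files. The sketch below is a behavioural model only, not the patented hardware: the function names, the use of Python deques for the buffers, the shuffling that stands in for out-of-order host completions, and the assumptions that TAG numbers never wrap and that the host returns exactly one completion per TLP read request are all illustrative choices not stated in the text.

```python
import random
from collections import deque


def cut_request(addr, length, cut_len):
    """Split one DMA read request into (TLP address, TLP length) pairs."""
    pairs, offset = [], 0
    while offset < length:
        tlp_len = min(cut_len, length - offset)
        pairs.append((addr + offset, tlp_len))
        offset += tlp_len
    return pairs


def dma_transfer(dma_requests, host_memory, cut_len, seed=0):
    """Behavioural model of the first-aspect method (hypothetical, not RTL)."""
    first_buf = deque(dma_requests)                   # DMA read requests, first order
    second_buf = deque(ln for _, ln in dma_requests)  # length information order
    third_buf = deque()                               # TLP read requests, increasing TAG

    tag = 0  # assumption: TAG numbers never wrap within one batch
    while first_buf:
        addr, length = first_buf.popleft()
        for tlp_addr, tlp_len in cut_request(addr, length, cut_len):
            third_buf.append((tag, tlp_addr, tlp_len))
            tag += 1

    # Host side (assumed): one completion per TLP read request, arbitrary order.
    completions = [(t, host_memory[a:a + n]) for t, a, n in third_buf]
    random.Random(seed).shuffle(completions)

    # Fourth buffer: one slot per TAG, so reading slots in index order
    # yields the completions in increasing TAG order.
    fourth_buf = [None] * tag
    for t, payload in completions:
        fourth_buf[t] = payload

    # Grouping: consume completions until each recorded length is satisfied.
    packets, slot = [], 0
    while second_buf:
        remaining, parts = second_buf.popleft(), []
        while remaining > 0:
            payload = fourth_buf[slot]
            slot += 1
            parts.append(payload)
            remaining -= len(payload)
        packets.append(b"".join(parts))
    return packets


memory = bytes(range(256)) * 64                # 16 KiB of fake host memory
requests = [(0, 5000), (8192, 300), (100, 4096)]
packets = dma_transfer(requests, memory, cut_len=256)
```

Even though the completions arrive shuffled, each returned packet equals the host memory range named by the corresponding DMA read request, in the first order; the only per-file state needed for that is the recorded length sequence.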

In a specific embodiment of the first aspect, recording the order of the length information of the DMA read requests in the second buffer includes: reading the length information of the DMA read requests in their storage order and storing it in the second buffer in that order; or assigning a DMA sequence identifier to each DMA read request, splicing the length information of each request with its DMA sequence identifier to obtain a plurality of pieces of first splicing information, and storing the first splicing information in the second buffer.

In a specific embodiment of the first aspect, the method further comprises: recording the order of the TLP length information of the TLP read requests in a fifth buffer.

In a specific embodiment of the first aspect, recording the order of the TLP length information of the TLP read requests in the fifth buffer includes: reading the TLP length information of the TLP read requests in their storage order and storing it in the fifth buffer in that order; or splicing the TLP length information of each TLP read request with its TAG number to obtain a plurality of pieces of second splicing information, and storing the second splicing information in the fifth buffer.
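One way to picture the "second splicing information" is as a single word that concatenates a TAG number with a TLP length. The text does not give field widths or say how the fifth buffer is consumed, so the 8-bit TAG field, the 13-bit byte-length field (wide enough for 4096) and the helper names `splice`/`unsplice` below are hypothetical; the sketch only shows that spliced entries can be stored in any order and still be put back into TAG order.

```python
TAG_BITS = 8                     # hypothetical TAG field width
LEN_BITS = 13                    # hypothetical: TLP length in bytes, up to 4096
LEN_MASK = (1 << LEN_BITS) - 1


def splice(tag, tlp_len):
    """Pack (TAG, TLP length) into one word of second splicing information."""
    if not 0 <= tag < (1 << TAG_BITS):
        raise ValueError("TAG out of range")
    if not 0 < tlp_len <= LEN_MASK:
        raise ValueError("TLP length out of range")
    return (tag << LEN_BITS) | tlp_len


def unsplice(word):
    """Recover (TAG, TLP length) from a spliced word."""
    return word >> LEN_BITS, word & LEN_MASK


# Entries written in arbitrary order can be restored to increasing TAG order.
fifth_buf = [splice(t, n) for t, n in [(2, 44), (0, 256), (1, 256)]]
ordered = sorted(unsplice(w) for w in fifth_buf)
```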

In a specific embodiment of the first aspect, the method further comprises: taking the minimum of the 4KB boundary, the maximum payload of a TLP completion packet, and the agreed upper limit on the data size per TAG number as the preset cut length.

In a specific embodiment of the first aspect, the method is implemented in the transaction layer of the PCIE bus, and the transaction layer is divided into a DMA transceiver layer, a TLP transceiver layer, and a transmission control layer. The DMA transceiver layer is configured to: acquire a plurality of DMA read requests and store them in the first buffer in the first order; record the length information order of the DMA read requests in the second buffer, to represent the one-to-one correspondence between the position of each DMA read request in the first buffer and its length information; and acquire the TLP completion packets from the fourth buffer in sequence and group them according to the length information order recorded in the second buffer, so as to obtain DMA data packets arranged in the first order. The TLP transceiver layer is configured to: acquire the DMA read requests from the first buffer in sequence and calculate the TLP address information and TLP length information for each DMA read request from the preset cut length and the address and length information in that request; assign TAG numbers in the second order to each corresponding pair of TLP address information and TLP length information to obtain a plurality of TLP read requests; store the TLP read requests in the third buffer in increasing order of TAG number; and store the TLP completion packets in the fourth buffer in increasing order of TAG number. The transmission control layer is configured to: acquire the TLP read requests from the third buffer in sequence and send them to the host through the IP core; and receive the TLP completion packets returned by the host in response to the TLP read requests, where each TLP completion packet corresponds to a TLP read request and contains a TAG number.

In a specific embodiment of the first aspect, the transmission control layer is further configured to: count, in a register, the TLP read requests sent to the host to obtain a first count value; count, in a register, the TLP completion packets received to obtain a second count value; and perform flow control on the IP core according to the difference between the first count value and the second count value.
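The register-based flow control can be sketched as two free-running counters whose difference is the number of reads still awaiting completion. The 32-bit counter width, the class and method names, and the `max_outstanding` threshold (for example, the number of usable TAGs or a limit of the IP core) are assumptions for illustration; the sketch also assumes, as the text implies, that each TLP read request produces exactly one TLP completion packet.

```python
COUNTER_MASK = 0xFFFF_FFFF  # assumption: 32-bit free-running count registers


class TlpFlowControl:
    """Hypothetical gate on TLP read issue based on (first count - second count)."""

    def __init__(self, max_outstanding):
        self.first_count = 0    # TLP read requests sent to the host
        self.second_count = 0   # TLP completion packets received
        self.max_outstanding = max_outstanding

    def outstanding(self):
        # Modular subtraction keeps the difference correct after wrap-around.
        return (self.first_count - self.second_count) & COUNTER_MASK

    def can_send(self):
        return self.outstanding() < self.max_outstanding

    def on_read_sent(self):
        self.first_count = (self.first_count + 1) & COUNTER_MASK

    def on_completion_received(self):
        self.second_count = (self.second_count + 1) & COUNTER_MASK
```

Using the difference of two counters, rather than a single up/down counter, keeps each counter written by only one side (transmit or receive), which is a common choice in hardware because it avoids two paths updating the same register.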

In a specific embodiment of the first aspect, the TLP transceiver layer is further configured to perform flow control on the fourth buffer according to the size of the fourth buffer, the upper limit on the number of packets in the IP core, and the numbers of TLP completion packets stored and read; the DMA transceiver layer is further configured to perform flow control on the receive buffer according to the size of the receive buffer, the number of DMA data packets grouped, and the number of DMA data packets sent.

In a second aspect, an embodiment of the present invention further discloses a DMA data packet transmission device applied to a PCIE device, the device comprising: a first acquisition module, configured to acquire a plurality of DMA read requests and store them in a first buffer in a first order, where each DMA read request contains the address information and length information of its corresponding DMA data packet, and to record the length information order of the DMA read requests in a second buffer, so as to represent the one-to-one correspondence between the position of each DMA read request in the first buffer and its length information; a second acquisition module, configured to acquire the DMA read requests in the first buffer of the first acquisition module in sequence, calculate the TLP address information and TLP length information for each DMA read request from a preset cut length and the address and length information in that request, where TLP address information and TLP length information correspond one to one, assign TAG numbers in a second order to each corresponding pair of TLP address information and TLP length information to obtain a plurality of TLP read requests, each containing a TAG number, TLP address information, and TLP length information, and store the TLP read requests in a third buffer in increasing order of TAG number; a third acquisition module, configured to acquire the TLP read requests in the third buffer of the second acquisition module in sequence and send them to the host through an IP core; a receiving module, configured to receive a plurality of TLP completion packets returned by the host in response to the TLP read requests, where each TLP completion packet corresponds to a TLP read request and contains a TAG number; an ordering module, configured to store the TLP completion packets received by the receiving module in a fourth buffer in increasing order of TAG number; and a packet grouping module, configured to read the TLP completion packets from the fourth buffer of the ordering module in sequence and group them according to the length information recorded in the second buffer, so as to obtain DMA data packets arranged in the first order.

In a specific embodiment of the second aspect, the device is implemented in the transaction layer of the PCIE bus, and the transaction layer is divided into a DMA transceiver layer, a TLP transceiver layer, and a transmission control layer; the DMA transceiver layer comprises the first acquisition module and the packet grouping module; the TLP transceiver layer comprises the second acquisition module and the ordering module; and the transmission control layer comprises the third acquisition module and the receiving module.

In a third aspect, an embodiment of the present invention further discloses a PCIE device, configured to execute the DMA packet transmission method according to any one of the embodiments of the first aspect.

The beneficial effects of the embodiments of the invention are as follows:

The embodiments of the invention provide a DMA data packet transmission method and device and a PCIE device, applied to the PCIE device. The method comprises: acquiring a plurality of DMA read requests and storing them in a first buffer in a first order; recording the length information order of the DMA read requests in a second buffer; acquiring the DMA read requests from the first buffer in sequence and calculating the TLP address information and TLP length information for each from a preset cut length and the address and length information in that request; assigning TAG numbers in a second order to each corresponding pair of TLP address information and TLP length information to obtain a plurality of TLP read requests; storing the TLP read requests in a third buffer in increasing order of TAG number; acquiring the TLP read requests from the third buffer in sequence and sending them to the host through an IP core; receiving the TLP completion packets returned by the host in response to the TLP read requests; storing the TLP completion packets in a fourth buffer in increasing order of their TAG numbers; and reading the TLP completion packets from the fourth buffer in sequence and grouping them according to the length information order recorded in the second buffer, so as to obtain DMA data packets arranged in the first order.
In this technical solution, the DMA read requests are sent in sequence, and the received TLP completion packets are grouped using the recorded length information order of the DMA read requests. This yields DMA data packets arranged in the order in which the DMA read requests were sent, so that the TLP completion packets belonging to different complete data files are distinguished.

Drawings

Fig. 1 is a schematic diagram of an application scenario according to an embodiment of the present invention;

Fig. 2 is a flowchart of a DMA data packet transmission method according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a DMA data packet transmission device according to an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of another DMA data packet transmission device according to an embodiment of the present invention.

Detailed Description

To make the above objects, features, and advantages of the present invention clearer, preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. All other embodiments obtained by those skilled in the art based on the technical concept of the embodiments of the present invention fall within the scope of protection of the present invention.

For ease of understanding, an application scenario of the embodiments of the present invention is described first. Referring to Fig. 1, the application scenario includes a host 101 and a PCIE device 102 connected to the host 101. In practical applications, the PCIE device 102 may be a peripheral device such as a PCIE cryptographic card, a PCIE sound card, or a PCIE network card. The technical solution of the embodiments of the present invention is implemented in the transaction layer of the PCIE bus.

Further, the technical concept of the embodiment of the present invention is described herein:

As mentioned in the Background section, because the data length of a TLP is limited, multiple TLP read requests are usually needed to cover the address range and length of one complete data file. In the course of making the present invention, the inventors found that DMA packets are not subject to such a data length limit. The technical solution of the embodiments therefore uses one DMA packet (a DMA read request or a DMA data packet) to cover the address range and length of one complete data file. On this basis, each DMA read request is cut into TLP read requests according to a preset cut length (the address range and length that a single TLP read request can cover), so as to satisfy the data transfer requirements of the transaction layer. Meanwhile, to distinguish the TLP completion packets belonging to different complete data files, the DMA read requests are sent in sequence, and the received TLP completion packets are grouped using the recorded length information order of the DMA read requests, yielding DMA data packets arranged in the order in which the DMA read requests were sent.

Embodiments of the invention will be described in more detail below with reference to the attached drawings:

Referring to Fig. 2, Fig. 2 is a flowchart of a DMA data packet transmission method according to an embodiment of the present invention. The method shown in Fig. 2 is applied to the PCIE device 102, more specifically to a processor in the PCIE device 102, and is implemented in the transaction layer of the PCIE bus. In practical applications, the processor in the PCIE device 102 may be an FPGA (Field Programmable Gate Array), a CPLD (Complex Programmable Logic Device), an ASIC (Application Specific Integrated Circuit), or the like. Because custom-designed ASICs are expensive, reprogrammable FPGAs and CPLDs are preferred. The DMA data packet transmission method shown in Fig. 2 may include the following steps:

In step S201, a plurality of DMA read requests are acquired and stored in the first buffer in the first order.

The DMA read requests originate from user logic, that is, they are obtained from the core layer of the PCIE device 102, and the number of DMA read requests obtained is determined by actual requirements. In practical applications, a DMA read request may be generated by the host 101 or by the PCIE device 102. For the PCIE device 102 to generate a DMA read request, it must know the address and length, in host memory, of the DMA data packet corresponding to that request; this information is usually obtained through a memory write initiated by the host or through a mechanism agreed in advance.

This embodiment addresses the case in which the core layer of the PCIE device 102 generates multiple DMA read requests; in practice there may be 2, 3, or more. Each DMA read request contains the address information and length information of its corresponding DMA data packet. It should be noted that each DMA read request corresponds to one complete data file; that is, the DMA data packet corresponding to each DMA read request is a complete data file.

Further, the first order may be the order in which the core layer sends the DMA read requests to the PCIE transaction layer, or a random order in which the PCIE transaction layer generates them. Typically, the order in which the DMA read requests enter the PCIE transaction layer is random.

In step S202, the length information order of the plurality of DMA read requests is recorded in the second buffer.

It can be understood that once a DMA read request has been read out of the first buffer, it no longer exists there, so the correspondence between its storage position and its length information would otherwise be lost. To solve this problem, this embodiment provides a second buffer that records the correspondence between the storage order of each DMA read request and its length information, that is, the length information order. The length information order represents the one-to-one correspondence between the position of each DMA read request in the first buffer and the length information in that request, so that the corresponding TLP completion packets can later be grouped and distinguished based on the length information, yielding DMA data packets arranged in the first order.

The second buffer and the first buffer may be different regions of the same physical storage space, or they may be separate physical storage spaces.

In practical applications, various embodiments of step S202 are described in detail below.

Optionally, in a specific implementation of the embodiment of the present invention, step S202 may include:

sequentially reading the length information in the plurality of DMA read requests according to the storage sequence of the plurality of DMA read requests, and sequentially storing the read length information in the second buffer area.

For a more intuitive understanding, an example is given here. Assume there are three DMA read requests, denoted DMA-R1, DMA-R2, and DMA-R3, whose length information is L1, L2, and L3 respectively. The storage order (i.e., the first order) of the three DMA read requests is DMA-R1, DMA-R2, DMA-R3; accordingly, the three pieces of length information are stored in the second buffer in the order L1, L2, L3, so their arrangement in the second buffer is also L1, L2, L3.

In this implementation, the length information is read in the same order in which the DMA read requests are stored in the first buffer, namely the first order, and is stored in the second buffer in the order read. The ordering of the length information in the second buffer is therefore consistent with the ordering of the DMA read requests in the first buffer; that is, length information and DMA read requests at the same position correspond to each other.

In this implementation, storing the length information in the second buffer in sequence records the length information of the DMA read requests directly and their storage order in the first buffer indirectly, keeping the two in correspondence. Because the storage order of the DMA read requests does not need to be recorded separately, storage resources are saved.

Alternatively, in another specific implementation of the embodiment of the present invention, step S202 may include:

setting a DMA sequence identifier for each DMA read request, respectively splicing the length information in each DMA read request and the corresponding DMA sequence identifier to obtain a plurality of first splicing information, and storing the plurality of first splicing information into a second buffer area.

The DMA sequence identifier indicates the order in which each DMA read request is stored in the first buffer. In this implementation, splicing the length information of each DMA read request with the corresponding DMA sequence identifier associates that length information with the request's storage position in the first buffer. It should be noted that, because the first splicing information already records both the length information of each DMA read request and its storage order in the first buffer, the first splicing information may be stored in the order indicated by the DMA sequence identifiers, in random order, or according to another set rule.
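The two implementations of step S202 differ only in whether the position is implicit (plain length FIFO) or explicit (spliced with a DMA sequence identifier). The sketch below contrasts them; the helper names, the use of the enumeration index as the DMA sequence identifier, and the shuffle that models storing the first splicing information in random order are illustrative assumptions.

```python
import random


def record_plain(dma_requests):
    """First implementation: lengths stored in the same order as the requests."""
    return [length for _addr, length in dma_requests]


def record_spliced(dma_requests, rng):
    """Second implementation: (DMA sequence identifier, length) pairs, which
    may be stored in any order because each pair carries its own position."""
    spliced = [(seq_id, length) for seq_id, (_addr, length) in enumerate(dma_requests)]
    rng.shuffle(spliced)  # models "stored randomly"
    return spliced


def lengths_in_first_order(spliced):
    """Recover the length information order from first splicing information."""
    return [length for _seq_id, length in sorted(spliced)]
```

The plain variant needs no identifier storage but must be written strictly in the first order; the spliced variant spends a few bits per entry to gain freedom in how the second buffer is filled.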

In step S203, a plurality of DMA read requests in the first buffer are sequentially acquired.

Specifically, "sequentially" here means reading the DMA read requests from the first buffer in the order in which they were stored. Taking three DMA read requests stored in the order DMA-R1, DMA-R2, DMA-R3 as an example, in this step the three requests are obtained from the first buffer in the order DMA-R1, DMA-R2, DMA-R3.

It should be noted that "sequentially" in this step is a convention for acquiring the DMA read requests from the first buffer; this step does not need to explicitly obtain the order in which they were stored. For example, a first-in-first-out (FIFO) mechanism may be used to acquire the DMA read requests from the first buffer.

In step S204, TLP address information and TLP length information corresponding to each DMA read request are calculated according to the preset cut length and the address information and length information in each DMA read request.

Specifically, in some implementations of the embodiments of the present invention, the minimum of the 4KB boundary, the maximum payload of a TLP completion packet, and the designer-agreed upper limit on the data size per TAG number may be used as the preset cut length. Alternatively, the preset cut length may be a fixed value set in advance, but it must be less than or equal to the minimum of the 4KB boundary, the maximum payload of a TLP completion packet, and the agreed upper limit on the data size per TAG number. Clearly, the implementation that takes this minimum as the preset cut length adapts to the actual operating conditions and is more flexible.
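The rule for the preset cut length is a three-way minimum, and the fixed-value alternative is valid only when it does not exceed that minimum. In the sketch below the function names are hypothetical; the set of accepted maximum-payload values reflects the byte sizes that the PCI Express Max_Payload_Size field can encode, and rejecting anything else is an illustrative choice rather than something the text requires.

```python
# Byte values that the PCIe Max_Payload_Size setting can take.
VALID_MAX_PAYLOAD = (128, 256, 512, 1024, 2048, 4096)
BOUNDARY_4KB = 4096


def preset_cut_length(max_payload, per_tag_limit):
    """Minimum of the 4KB boundary, completion max payload and per-TAG limit."""
    if max_payload not in VALID_MAX_PAYLOAD:
        raise ValueError(f"unsupported max payload: {max_payload}")
    if per_tag_limit <= 0:
        raise ValueError("per-TAG data size limit must be positive")
    return min(BOUNDARY_4KB, max_payload, per_tag_limit)


def fixed_cut_length_ok(fixed, max_payload, per_tag_limit):
    """A fixed, pre-configured cut length is acceptable only if it does not
    exceed the same three-way minimum."""
    return 0 < fixed <= preset_cut_length(max_payload, per_tag_limit)
```

For example, with a 256-byte maximum payload and a 1024-byte per-TAG limit, the computed cut length is 256 bytes, and a fixed value of 512 would be rejected.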

It should be noted that, because the DMA read requests in the first buffer are acquired in sequence in step S203, step S204 calculates the TLP address information and TLP length information for each DMA read request in that acquisition order, and the first order itself does not need to be known. Taking three DMA read requests stored in the order DMA-R1, DMA-R2, DMA-R3 as an example, the TLP address information and TLP length information for DMA-R1 are calculated first, then those for DMA-R2, and finally those for DMA-R3. The TLP address information and TLP length information correspond one to one.

Further, the purpose and procedure of calculating the TLP address information and TLP length information for each DMA read request are described here.

Because a DMA read request corresponds to a complete data file while the data length of a TLP is limited, a single TLP generally cannot cover a complete data file. To resolve this conflict, the embodiment of the present invention replaces each DMA read request with a plurality of TLP read requests, which together retrieve the DMA data packet corresponding to that DMA read request.

For ease of understanding: to satisfy the data length limit on TLP transfers, the DMA data packet is cut into a plurality of data blocks that each meet the limit. The address of each data block is one piece of TLP address information, and the length of each data block is one piece of TLP length information. The purpose of this step is to calculate the TLP address information and TLP length information for each data block.

Generally, when the length information in a DMA read request is long, the cut calculation starts from the address information in the DMA read request and proceeds in steps of the preset cut length, so one DMA read request yields several pairs of TLP address information and TLP length information. For example, assume that the address information of a DMA read request is Addr, its length information is L, and the preset cut length is Δl. Then the first TLP address information is Addr and the first TLP length information is Δl; the nth TLP address information is Addr + (n-1)·Δl and the nth TLP length information is Δl, where n denotes any intermediate piece; and the last TLP address information is Addr + L - L % Δl with last TLP length information L % Δl, where % denotes the remainder operation. When L % Δl = 0 there is no shorter trailing piece, and L is covered by L / Δl pieces of length Δl.

Clearly, when the length information in a DMA read request is not greater than the preset cut length, the address information and length information of the DMA read request are used directly as the TLP address information and TLP length information. In practice, a simple data size check can determine whether a DMA read request needs to be cut.
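The cut calculation described in the two paragraphs above can be sketched as follows. This is a minimal sketch: `cut_dma_request` is a hypothetical name, and it applies the Addr / L / Δl formulas literally, so it assumes the start address is already aligned such that no piece crosses a 4KB boundary (re-alignment of an unaligned start address is not covered by the text and is not attempted here).

```python
from typing import List, Tuple


def cut_dma_request(addr: int, length: int, cut: int) -> List[Tuple[int, int]]:
    """Split one DMA read request (Addr, L) into (TLP address, TLP length) pairs.

    Full pieces of `cut` bytes start at Addr + (n - 1) * cut; a trailing piece
    at Addr + L - L % cut with length L % cut covers the remainder, if any.
    """
    if length <= 0 or cut <= 0:
        raise ValueError("length and cut must be positive")
    if length <= cut:
        # Short request: its own address and length are used directly.
        return [(addr, length)]
    full, rem = divmod(length, cut)
    pieces = [(addr + (n - 1) * cut, cut) for n in range(1, full + 1)]
    if rem:
        pieces.append((addr + length - rem, rem))
    return pieces


if __name__ == "__main__":
    for a, n in cut_dma_request(0x1000, 600, 256):
        print(hex(a), n)  # 0x1000 256, 0x1100 256, 0x1200 88
```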

In step S205, TAG numbers are set for each pair of TLP address information and TLP length information in the second order, so as to obtain a plurality of TLP read requests.

Wherein each TLP read request includes a TAG number, TLP address information, and TLP length information.

Specifically, the second order is related to the first order and to the TLP address information: for the TLP read requests derived from the same DMA read request, the TLP address information increases as the TAG number increases, and the TAG numbers of TLP read requests derived from an earlier DMA read request precede those derived from a later one. This ensures that the data within each DMA data packet is read in address order and that the DMA data packets themselves are read in the first order.

Generally, in step S204, each time a pair of TLP address information and TLP length information is calculated, a TAG number is immediately assigned to that pair.

It should be noted that the second order is essentially a convention linking TAG number assignment to the first order and to the TLP address information. Because the convention is already satisfied at the moment each TAG number is assigned, the first order need not be known explicitly in this step.
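The second-order convention can be illustrated by the following sketch. The `TlpReadRequest` type and `assign_tags` function are hypothetical, and the sketch assumes that the number of TLP read requests in flight stays below the size of the TAG field, so TAG wrap-around (which the text does not describe) is not modelled.

```python
from typing import List, NamedTuple, Tuple


class TlpReadRequest(NamedTuple):
    tag: int
    addr: int
    length: int


def assign_tags(pieces_per_dma: List[List[Tuple[int, int]]],
                first_tag: int = 0) -> List[TlpReadRequest]:
    """Assign TAG numbers in the second order.

    pieces_per_dma holds, for each DMA read request in the first order,
    its (TLP address, TLP length) pairs in ascending address order.
    """
    requests: List[TlpReadRequest] = []
    tag = first_tag
    for pieces in pieces_per_dma:      # earlier DMA read requests get smaller TAGs
        for addr, length in pieces:    # within one DMA request, TAG grows with address
            requests.append(TlpReadRequest(tag, addr, length))
            tag += 1                   # no wrap-around in this sketch
    return requests
```

Because TAGs are handed out in the order the pairs are produced, the list is already in ascending TAG order and can be pushed into the third buffer as is.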

In step S206, the plurality of TLP read requests are stored in the third buffer in ascending order of TAG numbers.

Through this step, the plurality of TLP read requests are arranged in the third buffer in ascending order of TAG numbers.

In step S207, a plurality of TLP read requests in the third buffer are sequentially acquired, and the plurality of TLP read requests are sequentially sent to the host through the IP core.

Specifically, "sequentially" here means that the TLP read requests are read from the third buffer in the order in which they were stored, and are sent to the host in the order in which they were read. Upon receiving a TLP read request, the host responds to it with the corresponding TLP completion packet. For the working mechanism of the IP core (Intellectual Property core), refer to the prior art; it is not described here.

It should be noted that "sequentially" in this step is a convention on how the TLP read requests are fetched from the third buffer; this step does not need to explicitly obtain the order in which the TLP read requests were stored there. For example, a first-in-first-out mechanism may be used to fetch the TLP read requests from the third buffer.

In step S208, a plurality of TLP completion packets fed back by the host 101 based on the plurality of TLP read requests are received.

Specifically, the plurality of TLP completion packets fed back by the host 101 based on the plurality of TLP read requests are received by the IP core.

Each TLP completion packet corresponds to a TLP read request and includes a TAG number. It should be noted that a TLP read request may be answered by one TLP completion packet or by several. How the completion packets for a TLP read request are formed depends on the RCB (Read Completion Boundary) parameter and on the address at which the DMA data packet is stored in the host 101: the host 101 determines the TLP completion packets for each TLP read request according to the RCB parameter, and the existing mechanism guarantees that all TLP completion packets for one TLP read request carry the same TAG number and are not out of order. For the detailed procedure, refer to the prior art; it is not repeated here.
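As an illustration of why one TLP read request can yield several completion packets, the sketch below splits a read at every RCB-aligned address. This reflects the general PCIe rule that split completions break on naturally aligned RCB boundaries; it is the finest legal split, and a real completer may merge adjacent pieces. The function name `split_at_rcb` and the default RCB of 64 bytes are assumptions for illustration.

```python
from typing import List, Tuple


def split_at_rcb(addr: int, length: int, rcb: int = 64) -> List[Tuple[int, int]]:
    """Return (address, length) payloads of completions for one read request,
    split at every RCB-aligned address (all pieces share the request's TAG)."""
    pieces = []
    end = addr + length
    cur = addr
    while cur < end:
        next_boundary = (cur // rcb + 1) * rcb   # next naturally aligned RCB address
        stop = min(next_boundary, end)
        pieces.append((cur, stop - cur))
        cur = stop
    return pieces
```

The pieces are produced in ascending address order, matching the statement that the completions of one TLP read request are not out of order among themselves.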

In step S209, a plurality of TLP completion packets are stored in the fourth buffer in ascending order of TAG numbers.

It should be noted that, due to the transmission characteristics of the PCIE bus, the TLP completion packets may not return in ascending order of TAG numbers; that is, they may arrive out of order. Therefore, in this step the TLP completion packets are stored in the fourth buffer in ascending order of TAG numbers, so that their ordering matches that of the TLP read requests. Clearly, when a later TLP completion packet arrives before an earlier one, the later packet must wait until the earlier one has been stored in the fourth buffer.
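The reordering in step S209 behaves like a reorder buffer keyed by TAG. The sketch below is a hypothetical software model (`CompletionReorderBuffer` is not from the source). It assumes the TLP length of each TAG is known, for example from an explicit TLP length record, so that it can tell when all completions of one TAG have arrived, and it relies on completions of one TAG arriving in address order.

```python
from typing import Dict, List, Tuple


class CompletionReorderBuffer:
    """Accepts completions in any TAG order and appends them to the fourth
    buffer in ascending TAG order, once each TAG is fully received."""

    def __init__(self, first_tag: int, expected_len: Dict[int, int]):
        self.next_tag = first_tag                  # next TAG allowed into the fourth buffer
        self.expected_len = expected_len           # TAG -> TLP length (assumed known)
        self.pending: Dict[int, bytearray] = {}    # early or partial completions
        self.fourth_buffer: List[Tuple[int, bytes]] = []

    def receive(self, tag: int, payload: bytes) -> None:
        # Completions of one TAG arrive in address order, so appending is safe.
        self.pending.setdefault(tag, bytearray()).extend(payload)
        # Drain every TAG that is next in order and fully received.
        while (self.next_tag in self.pending
               and len(self.pending[self.next_tag]) == self.expected_len[self.next_tag]):
            data = self.pending.pop(self.next_tag)
            self.fourth_buffer.append((self.next_tag, bytes(data)))
            self.next_tag += 1
```

A later TAG simply sits in `pending` until every earlier TAG has drained, which is the waiting behaviour described above.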

It should be noted that the fourth buffer and the third buffer may physically be different locations within the same storage space, or they may be separate storage spaces; typically, they are separate storage spaces.

In step S210, a plurality of TLP completion packets are sequentially read from the fourth buffer, and the plurality of TLP completion packets are sequentially packed according to the length information recorded in the second buffer, so as to obtain DMA data packets arranged in the first order.

Specifically, since the TLP completion packets are arranged in the fourth buffer in ascending order of TAG numbers, their ordering is consistent with that of the TLP read requests. Therefore, earlier TLP completion packets correspond to earlier DMA read requests, and the TLP completion packets can be told apart simply by reading them in order and applying the length information sequence obtained from the second buffer: the completion packets belonging to the same DMA read request are grouped together, yielding DMA data packets arranged in the first order. This in turn makes it easy for the back end to process whole DMA data packets of different types and contents. Clearly, the order in which the length information of the DMA read requests is obtained coincides with the order in which the DMA read requests were stored in the first buffer.
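The grouping step can be sketched as follows, assuming the completion payloads are already in ascending TAG order and the second buffer supplies the DMA lengths in the first order. `group_into_dma_packets` is a hypothetical name, and this batch version raises an error if the data and the length sequence disagree, whereas a streaming implementation would simply wait for more data.

```python
from typing import Iterable, List


def group_into_dma_packets(completions: Iterable[bytes],
                           dma_lengths: List[int]) -> List[bytes]:
    """Concatenate in-order completion payloads and cut the stream at the
    DMA lengths read from the second buffer."""
    packets: List[bytes] = []
    lengths = iter(dma_lengths)
    target = next(lengths, None)       # length of the DMA data packet being assembled
    current = bytearray()
    for payload in completions:        # payloads in ascending TAG order
        current.extend(payload)
        while target is not None and len(current) >= target:
            packets.append(bytes(current[:target]))
            del current[:target]
            target = next(lengths, None)
    if target is not None or current:
        raise ValueError("completion data does not match the DMA length sequence")
    return packets
```

No per-packet sequence numbers are needed: the length sequence alone tells where one DMA data packet ends and the next begins.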

It should be noted that "sequentially" in this step is a convention on how the TLP completion packets are fetched from the fourth buffer; this step does not need to explicitly obtain the order in which the TLP completion packets were stored there. For example, a first-in-first-out mechanism may be used to fetch the TLP completion packets from the fourth buffer.

It should be noted that, in practical applications, data is generally transferred between the PCIE device and the host as a data stream, so the DMA packet transmission method of the embodiment of the present invention is also generally executed in a streaming manner. For example, step S203 may start as soon as a DMA read request has been stored in the first buffer in step S201; step S204 may start as soon as step S203 has obtained a DMA read request; step S205 may start as soon as step S204 has produced a pair of TLP address information and TLP length information; and so on. As long as every item is passed along in the set order, the TLP completion packets belonging to each DMA data packet can ultimately be distinguished. Of course, the embodiment of the present invention does not exclude executing a subsequent step only after an earlier step has finished completely; for example, step S203 may be performed after all the DMA read requests have been stored in step S201.
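The streaming style of execution can be modelled with Python generators, where each stage pulls one item at a time from the stage before it. This is only an analogy for the hardware pipeline; `split_stage` and `tag_stage` are hypothetical names, and the cut follows the Addr / L / Δl formulas given earlier.

```python
from typing import Iterable, Iterator, Tuple


def split_stage(dma_requests: Iterable[Tuple[int, int]],
                cut: int) -> Iterator[Tuple[int, int]]:
    """Steps S203/S204: cut each DMA read request as soon as it is pulled."""
    for addr, length in dma_requests:
        for offset in range(0, length, cut):
            yield addr + offset, min(cut, length - offset)


def tag_stage(pieces: Iterable[Tuple[int, int]],
              first_tag: int = 0) -> Iterator[Tuple[int, int, int]]:
    """Step S205: tag each (address, length) pair the moment it is produced."""
    for tag, (addr, length) in enumerate(pieces, start=first_tag):
        yield tag, addr, length


if __name__ == "__main__":
    pipeline = tag_stage(split_stage(iter([(0x1000, 600), (0x8000, 64)]), cut=256))
    print(next(pipeline))  # (0, 4096, 256): available before the second DMA request is read
```

Nothing downstream needs the full list of DMA read requests; each TLP read request is produced as soon as its DMA read request has been consumed.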

It should be further noted that, under the foregoing ordering rules, the storage order of the DMA read requests does not need to be obtained explicitly when the TLP read requests are ordered, and the storage order of the TLP completion packets does not need to be obtained explicitly when the DMA data packets are grouped. These orders therefore need not be carried with each item in the data stream, which reduces the amount of data transferred and improves data transmission efficiency.

The embodiment of the invention provides a DMA data packet transmission method, which is applied to PCIE equipment and comprises the following steps: acquiring a plurality of DMA read requests, and storing the acquired DMA read requests into a first buffer area according to a first sequence; recording the length information sequence of a plurality of DMA read requests in a second buffer area; sequentially acquiring a plurality of DMA read requests in a first buffer area, and calculating TLP address information and TLP length information corresponding to each DMA read request according to a preset cutting length and address information and length information in each DMA read request; setting TAG numbers for each pair of TLP address information and TLP length information which are in one-to-one correspondence according to a second sequence, so as to obtain a plurality of TLP read requests; storing the plurality of TLP read requests in the third buffer according to the increasing sequence of TAG numbers; sequentially acquiring a plurality of TLP read requests in the third buffer area, and sequentially sending the plurality of TLP read requests to the host 101 through the IP core; receiving a plurality of TLP completion packets fed back by the host 101 based on the plurality of TLP read requests; storing the plurality of TLP completion packets in a fourth buffer according to the increasing sequence of TAG numbers; and sequentially reading the plurality of TLP completion packets from the fourth buffer area, and grouping the plurality of TLP completion packets according to the length information sequence recorded in the second buffer area, so as to obtain DMA data packets arranged in the first sequence. 
According to the technical scheme, the DMA read requests are sequentially sent, the received TLP completion packets are packed by utilizing the recorded length information sequence of the DMA read requests, and the DMA data packets which are arranged according to the sending sequence of the DMA read requests are obtained, so that the TLP completion packets corresponding to the complete data files are distinguished.

In practical applications, it is necessary to determine whether the TLP completion packets corresponding to a given TLP read request have all been received. Accordingly, the TLP length information in each TLP read request needs to be recorded, so that completion of reception can be judged against it. Such a record may be explicit or implicit. An implicit record relies on a convention that all TLP length information is identical, in which case reception is judged against the agreed length. The explicit record is described further below.

Optionally, in a specific implementation manner of the embodiment of the present invention, the DMA data packet transmission method shown in fig. 2 may further include:

the TLP length information sequence in the plurality of TLP read requests is recorded in the fifth buffer.

The TLP length information sequence records the correspondence between the storing order of each TLP read request and its TLP length information, that is, the one-to-one correspondence between the ordering of each TLP read request in the third buffer and the TLP length information in that TLP read request.

The following describes a recording method of the TLP length information sequence.

Optionally, in a specific implementation manner of the embodiment of the present invention, recording, in the fifth buffer, a TLP length information sequence in the plurality of TLP read requests includes:

the TLP length information in the TLP read requests is sequentially read according to the storing sequence of the TLP read requests, and the read TLP length information is sequentially stored in the fifth buffer area.

For a more intuitive understanding, an example is given here. Assume three TLP read requests, TLP-R1, TLP-R2 and TLP-R3, with TLP length information L10, L20 and L30 respectively. The three TLP read requests are stored in the order TLP-R1, TLP-R2, TLP-R3; accordingly, the three pieces of TLP length information are stored in the fifth buffer in the order L10, L20, L30, and are therefore arranged in the fifth buffer in that same order.

In this embodiment, the TLP length information is read in the same order in which the TLP read requests are stored in the third buffer, and is stored in the fifth buffer in that reading order. The ordering of the TLP length information in the fifth buffer is therefore the same as the ordering of the TLP read requests in the third buffer; that is, TLP length information and a TLP read request at the same rank correspond to each other. The fifth buffer thus records the TLP length information of the TLP read requests directly and their storing order in the third buffer indirectly, so that order does not need to be recorded separately, which saves storage resources.

Optionally, in another specific implementation manner of the embodiment of the present invention, recording, in the fifth buffer, a TLP length information sequence in the plurality of TLP read requests includes:

splicing the TLP length information and the TAG number in each TLP read request to obtain a plurality of pieces of second splicing information, and storing the second splicing information in the fifth buffer.

In this embodiment, the TLP length information in each TLP read request is spliced with the corresponding TAG number, which associates the TLP length information of each TLP read request with the order in which that request is stored in the third buffer. Because each piece of second splicing information already records both the TLP length information and, through the TAG number, the storing order in the third buffer, the second splicing information may be stored in the order given by the TAG numbers, in random order, or according to any other set rule.
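A minimal sketch of the splicing idea follows. The field widths (an 8-bit TAG and a 13-bit length field, enough for 4096 bytes) and the names `splice`, `unsplice` and `length_table` are assumptions for illustration only; the text does not specify a bit layout.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical field widths: 8-bit TAG, 13-bit TLP length (up to 8191 bytes).
TAG_BITS = 8
LEN_BITS = 13
LEN_MASK = (1 << LEN_BITS) - 1


def splice(tag: int, tlp_length: int) -> int:
    """Build one piece of second splicing information: TAG in the high bits."""
    if not 0 <= tag < (1 << TAG_BITS):
        raise ValueError("TAG out of range")
    if not 0 < tlp_length <= LEN_MASK:
        raise ValueError("TLP length out of range")
    return (tag << LEN_BITS) | tlp_length


def unsplice(word: int) -> Tuple[int, int]:
    """Recover (TAG, TLP length) from one spliced word."""
    return word >> LEN_BITS, word & LEN_MASK


def length_table(fifth_buffer: Iterable[int]) -> Dict[int, int]:
    """Storage order does not matter: each length carries its own TAG."""
    return dict(unsplice(word) for word in fifth_buffer)
```

Because the TAG travels inside each word, the table can be rebuilt from the fifth buffer regardless of the order in which the words were written.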

It should be noted that, when the TLP length information is recorded explicitly, step S209 determines from the TLP length information sequence whether the TLP completion packets corresponding to each TLP read request have been completely received.

Further, optionally, in a specific implementation manner of the embodiment of the present invention, the DMA packet transmission method of the embodiment of the present invention is implemented in a transaction layer of a PCIE bus, where the transaction layer of the PCIE bus is divided into a DMA transceiver layer, a TLP transceiver layer, and a transmission control layer.

Specifically, the DMA transceiver layer is used to execute steps S201, S202 and S210; the TLP transceiver layer is used to execute steps S203, S204, S205, S206 and S209; and the transmission control layer is used to execute steps S207 and S208.

In this embodiment, the transaction layer of the PCIE bus is divided into a DMA transceiver layer, a TLP transceiver layer and a transmission control layer, and this layered structure handles, respectively, the transmission and reception of DMA packets, of TLP packets, and of data exchanged with the host 101. Only data passes between the layers, so the logic is clear and simple; internal control signals do not cross layer boundaries, each layer is controlled independently, and data flow control within each layer is easy to implement.

The data flow control of each layer is described below.

It should be noted that, in some embodiments, the TLP transceiver layer is further configured to record, in the fifth buffer, a TLP length information sequence in the plurality of TLP read requests.

Further, the inventors found in the course of implementing the embodiments of the present invention that, in the prior art, the IP core lacks a mechanism for controlling the number of outstanding (backlogged) packets, although such control is required in some scenarios. For this purpose, in a specific implementation of the embodiment of the present invention, the transmission control layer is further configured to perform the following steps:

Counting TLP read requests sent to the host 101 in a register form to obtain a first count value;

counting the received TLP completion packet in a register form to obtain a second count value;

and performing flow control on the IP core according to the difference value between the first count value and the second count value.

Specifically, each time a TLP read request is sent to the host, the first count value is incremented by 1, and each time a TLP completion packet is received, the second count value is incremented by 1; the difference between the first count value and the second count value is the number of packets outstanding in the IP core.

In general, one flow control scheme sets an outstanding-packet threshold and stops sending TLP read requests to the host 101 when the number of outstanding packets in the IP core reaches that threshold, thereby limiting the backlog in the IP core and preventing packet loss. The threshold may be set according to the upper limit on the number of outstanding packets of the IP core; for the PCIE device 102 this upper limit is determined by the IP core itself and is fixed. In practical applications, the threshold may be made configurable so as to adapt flexibly to different application scenarios.
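The two-register threshold scheme can be modelled as below. `OutstandingCounter` and its method names are hypothetical. The sketch assumes each TLP read request is answered by a single completion packet; if completions are split at RCB boundaries, the second counter would have to advance once per fully completed request for the difference to stay meaningful.

```python
class OutstandingCounter:
    """Software model of the first/second count registers and a backlog threshold."""

    def __init__(self, threshold: int):
        self.sent = 0             # first count value: TLP read requests sent
        self.completed = 0        # second count value: TLP completion packets received
        self.threshold = threshold

    def outstanding(self) -> int:
        return self.sent - self.completed

    def can_send(self) -> bool:
        # Stop sending once the backlog in the IP core reaches the threshold.
        return self.outstanding() < self.threshold

    def on_send(self) -> None:
        if not self.can_send():
            raise RuntimeError("outstanding-packet threshold reached")
        self.sent += 1

    def on_completion(self) -> None:
        self.completed += 1
```

Keeping the threshold as a constructor argument mirrors the suggestion that it be configurable rather than hard-wired to the IP core limit.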

Another flow control scheme adjusts the rate at which TLP read requests are sent to the host 101 according to the difference between the number of outstanding packets in the IP core and the upper limit the IP core can sustain: when the difference is large, the sending rate is increased appropriately, and when the difference is small, the sending rate is decreased.

Further, in a specific implementation manner of the embodiment of the present invention, the TLP transceiver layer may also perform flow control, for which it is further configured to perform the following step:

performing flow control on the fourth buffer according to the size of the fourth buffer, the upper limit of the number of IP core packets, and the numbers of TLP completion packets stored and read.

Specifically, the number of TLP completion packets held in the fourth buffer (its "net inflow") is the number stored minus the number read, and the upper limit on that net inflow is the size of the fourth buffer minus the upper limit of the number of IP core packets. When the difference between the numbers of stored and read TLP completion packets equals the difference between the size of the fourth buffer and the upper limit of the number of IP core packets, fetching TLP read requests from the third buffer stops; when it is smaller, fetching TLP read requests from the third buffer continues. This realizes flow control of the fourth buffer, that is, flow control of the TLP transceiver layer.
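The stop/continue rule above reduces to a single comparison. The sketch below is illustrative only: `may_fetch_tlp_read_request` is a hypothetical name, and the buffer size and IP core limit are assumed to be expressed in the same unit (packets).

```python
def may_fetch_tlp_read_request(fourth_buffer_size: int, ip_core_limit: int,
                               stored: int, read: int) -> bool:
    """True if the TLP transceiver layer may take another TLP read request
    from the third buffer."""
    held = stored - read                          # completions now in the fourth buffer
    budget = fourth_buffer_size - ip_core_limit   # room left after reserving space for in-flight packets
    return held < budget                          # equal -> stop, smaller -> continue
```

Reserving `ip_core_limit` entries guarantees that every packet still inside the IP core has somewhere to land once it completes.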

Further, in a specific implementation manner of the embodiment of the present invention, flow control may also be performed in the DMA transceiver layer, which is further configured to perform the following step:

performing flow control on the receiving buffer according to the size of the receiving buffer, the number of grouped DMA data packets and the number of sent DMA data packets.

Specifically, the number of DMA data packets held in the receiving buffer (its "net inflow") is the number of DMA data packets grouped minus the number sent, and the upper limit on the number of DMA data packets the receiving buffer can hold is given by its size. When the number of DMA data packets in the receiving buffer equals the size of the receiving buffer, fetching TLP completion packets from the fourth buffer stops; when it is smaller, fetching TLP completion packets from the fourth buffer continues. This realizes flow control of the receiving buffer, that is, flow control of the DMA transceiver layer.
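The receiving-buffer rule can be modelled the same way. `ReceiveBufferGate` is a hypothetical name, and the buffer size is assumed to be counted in DMA data packets, as the comparison in the text implies.

```python
class ReceiveBufferGate:
    """Gates fetching of TLP completion packets on receiving-buffer occupancy."""

    def __init__(self, size: int):
        self.size = size      # capacity of the receiving buffer, in DMA data packets
        self.grouped = 0      # DMA data packets assembled into the receiving buffer
        self.sent = 0         # DMA data packets sent on from the receiving buffer

    def may_fetch_completion(self) -> bool:
        # Equal to the size -> stop; smaller -> keep reading the fourth buffer.
        return self.grouped - self.sent < self.size

    def on_grouped(self) -> None:
        self.grouped += 1

    def on_sent(self) -> None:
        self.sent += 1
```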

Corresponding to the foregoing method embodiment, the embodiment of the present invention further discloses a DMA packet transmission device, which is applied to the PCIE device 102 and is configured to execute the steps of the DMA packet transmission method described in the foregoing method embodiment. Referring to fig. 3, the DMA packet transmission device provided in the embodiment of the present invention includes: a first acquisition module 301, a second acquisition module 302, a third acquisition module 303, a receiving module 304, an ordering module 305 and a packet grouping module 306. In addition, fig. 3 also shows an IP core 307 located in the PCIE bus transaction layer. The modules are configured as follows.

The first obtaining module 301 is configured to obtain a plurality of DMA read requests, and store the obtained plurality of DMA read requests in a first buffer according to a first order; each DMA read request comprises address information and length information of a DMA data packet corresponding to the DMA read request; recording the length information sequence of a plurality of DMA read requests in the second buffer area so as to represent the one-to-one correspondence between the ordering of each DMA read request in the first buffer area and the length information in each DMA read request;

the second acquiring module 302 is configured to sequentially acquire a plurality of DMA read requests in the first buffer from the first acquiring module 301, and calculate TLP address information and TLP length information corresponding to each DMA read request according to a preset cut length and address information and length information in each DMA read request; wherein the TLP address information and the TLP length information are in one-to-one correspondence; setting TAG numbers for each pair of TLP address information and TLP length information which are in one-to-one correspondence according to a second sequence, so as to obtain a plurality of TLP read requests; each TLP read request includes a TAG number, TLP address information, and TLP length information; storing the plurality of TLP read requests in the third buffer according to the increasing sequence of TAG numbers;

The third obtaining module 303 is configured to sequentially obtain the plurality of TLP read requests in the third buffer from the second obtaining module 302, and sequentially send the plurality of TLP read requests to the host 101 through the IP core;

the receiving module 304 is configured to receive a plurality of TLP completion packets fed back by the host 101 based on the plurality of TLP read requests; wherein the TLP completion packets correspond to TLP read requests, each TLP completion packet including a TAG number;

the ordering module 305 is configured to store the TLP completion packets received from the receiving module 304 into the fourth buffer in ascending order of TAG numbers;

the packet grouping module 306 is configured to sequentially read the plurality of TLP completion packets from the fourth buffer of the ordering module 305, and sequentially group the plurality of TLP completion packets according to the length information recorded in the second buffer, so as to obtain DMA data packets arranged in the first order.

It should be noted that, when obtaining DMA read requests from the first obtaining module 301, the second obtaining module 302 may actively query whether the first buffer holds any DMA read requests and, when pending DMA read requests are found, acquire them sequentially. Alternatively, the first obtaining module 301 may actively send the DMA read requests to the second obtaining module 302.

Further, when obtaining TLP read requests from the second obtaining module 302, the third obtaining module 303 may actively query whether the third buffer holds any TLP read requests and, when pending TLP read requests are found, acquire them sequentially. Alternatively, the second obtaining module 302 may actively send the TLP read requests to the third obtaining module 303.

Further, the receiving module 304 actively sends the TLP completion packets to the ordering module 305, and the ordering module 305 actively sends the TLP completion packets to the packet grouping module 306.

An embodiment of the present invention provides a DMA packet transmission device applied to the PCIE device 102. The device includes: a first acquisition module, configured to acquire a plurality of DMA read requests and store them in the first buffer in a first order, each DMA read request including the address information and length information of its corresponding DMA data packet, and to record the length information sequence of the DMA read requests in the second buffer so as to represent the one-to-one correspondence between the ordering of each DMA read request in the first buffer and its length information; a second acquisition module, configured to sequentially acquire the DMA read requests in the first buffer from the first acquisition module, to calculate the TLP address information and TLP length information corresponding to each DMA read request according to the preset cut length and the address information and length information in each DMA read request, the TLP address information and TLP length information being in one-to-one correspondence, to set TAG numbers for each corresponding pair of TLP address information and TLP length information in a second order so as to obtain a plurality of TLP read requests, each including a TAG number, TLP address information and TLP length information, and to store the TLP read requests in the third buffer in ascending order of TAG numbers; a third acquisition module, configured to sequentially acquire the TLP read requests in the third buffer from the second acquisition module and send them in sequence to the host 101 through the IP core; a receiving module, configured to receive a plurality of TLP completion packets fed back by the host 101 based on the TLP read requests, each TLP completion packet corresponding to a TLP read request and including a TAG number; an ordering module, configured to store the TLP completion packets received from the receiving module in the fourth buffer in ascending order of their TAG numbers; and a packet grouping module, configured to sequentially read the TLP completion packets from the fourth buffer of the ordering module and group them in sequence according to the length information recorded in the second buffer, so as to obtain DMA data packets arranged in the first order. With this technical scheme, the DMA read requests are sent in sequence and the received TLP completion packets are grouped using the recorded length information sequence of the DMA read requests, yielding DMA data packets arranged in the order in which the DMA read requests were sent, so that the TLP completion packets corresponding to different complete data files can be distinguished.

Optionally, in a specific implementation manner of the embodiment of the present invention, when performing the step of recording, in the second buffer, the length information sequence of the plurality of DMA read requests, the first obtaining module 301 is specifically configured to:

sequentially reading length information in a plurality of DMA read requests according to a first sequence, and sequentially storing the read length information into a second buffer area;

or,

Optionally, in a specific implementation manner of this embodiment of the present invention, the second obtaining module 302 is further configured to record, in the fifth buffer, the TLP length information sequence of the plurality of TLP read requests, which characterizes the one-to-one correspondence between the ordering of each TLP read request in the third buffer and its TLP length information. When the TLP completion packets are stored in the fourth buffer in ascending order of TAG numbers, this sequence is used to determine whether storing of the TLP completion packets is complete.

Optionally, in a specific implementation manner of the embodiment of the present invention, when performing the step of recording, in the fifth buffer, the TLP length information sequence of the plurality of TLP read requests, the second obtaining module 302 is specifically configured to:

sequentially reading the TLP length information in the plurality of TLP read requests in the order in which the TLP read requests are stored, and sequentially storing the read TLP length information in the fifth buffer area;

or,

Optionally, in a specific implementation manner of the embodiment of the present invention, the second acquisition module 302 is further configured to use, as the preset cut length, the minimum of the 4KB boundary, the maximum payload of a TLP completion packet, and the agreed upper limit on the data size corresponding to each TAG number.
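The cut-length rule and the resulting split of one DMA read request can be illustrated with the following minimal Python sketch. The parameter values in the usage are assumed, not taken from the patent. Because a request whose start address is not 4KB-aligned could still straddle a 4KB boundary even with a cut length of at most 4KB, the sketch additionally clips each piece at the boundary; this clipping is an assumption of the sketch, motivated by the PCIe rule that a memory request must not cross a 4KB boundary. `preset_cut_length` and `split_request` are hypothetical names.

```python
def preset_cut_length(max_payload, tag_data_limit, boundary=4096):
    """Cut length = min(4KB boundary, max completion payload, agreed per-TAG data limit)."""
    return min(boundary, max_payload, tag_data_limit)


def split_request(addr, length, cut, boundary=4096):
    """Split one DMA read into (TLP address, TLP length) pairs.

    Each piece is at most `cut` bytes and, as an extra assumption of this
    sketch, is also clipped so that it never crosses a 4KB address boundary.
    """
    pieces = []
    while length > 0:
        to_boundary = boundary - (addr % boundary)
        n = min(cut, length, to_boundary)
        pieces.append((addr, n))
        addr += n
        length -= n
    return pieces


if __name__ == "__main__":
    cut = preset_cut_length(max_payload=256, tag_data_limit=512)
    for a, n in split_request(0x0F80, 600, cut):
        print(hex(a), n)  # prints 0xf80 128, 0x1000 256, 0x1100 216 (one pair per line)
```

Each (address, length) pair produced here corresponds to one pair of TLP address information and TLP length information, to which a TAG number would then be assigned.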

Optionally, in a specific implementation manner of the embodiment of the present invention, the DMA packet transmission device shown in fig. 3 is implemented in the transaction layer of the PCIE bus. Referring to fig. 4, the transaction layer of the PCIE bus is divided into a DMA transceiver layer 401, a TLP transceiver layer 402 and a transmission control layer 403; the DMA transceiver layer 401 includes the first acquisition module 301 and the packet grouping module 306; the TLP transceiver layer 402 includes the second acquisition module 302 and the ordering module 305; the transmission control layer 403 includes the third acquisition module 303 and the receiving module 304.

Optionally, in a specific implementation manner of the embodiment of the present invention, the third acquisition module 303 is further configured to count, in a register, the TLP read requests sent to the host 101 to obtain a first count value; the receiving module 304 is further configured to count, in a register, the received TLP completion packets to obtain a second count value; and flow control is performed on the IP core according to the difference between the first count value and the second count value.
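The counter-based flow control can be sketched as follows in Python, under assumptions that the patent does not state: each TLP read request produces exactly one counted completion packet, the registers are `width` bits wide and wrap around, and the throttling threshold `max_outstanding` is a hypothetical configuration value. The class name `InflightCounter` is likewise hypothetical.

```python
class InflightCounter:
    """Hypothetical flow control from two wrapping register counters.

    first  = TLP read requests sent to the host (first count value)
    second = TLP completion packets received     (second count value)
    """

    def __init__(self, max_outstanding, width=16):
        self.mask = (1 << width) - 1
        self.max_outstanding = max_outstanding
        self.first = 0
        self.second = 0

    def outstanding(self):
        # Modular difference stays correct after either counter wraps around.
        return (self.first - self.second) & self.mask

    def can_send(self):
        return self.outstanding() < self.max_outstanding

    def on_send(self):
        if not self.can_send():
            raise RuntimeError("IP core throttled: too many outstanding reads")
        self.first = (self.first + 1) & self.mask

    def on_completion(self):
        self.second = (self.second + 1) & self.mask


if __name__ == "__main__":
    fc = InflightCounter(max_outstanding=2, width=4)
    for _ in range(20):  # at width=4 both counters wrap several times
        fc.on_send()
        fc.on_send()
        assert not fc.can_send()
        fc.on_completion()
        fc.on_completion()
    print(fc.outstanding())  # prints 0
```

Taking the difference modulo the register width is what lets two free-running counters replace a per-request bookkeeping table, as long as the threshold stays below the counter range.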

Optionally, in a specific implementation manner of the embodiment of the present invention, the ordering module 305 is further configured to perform flow control on the fourth buffer area according to the size of the fourth buffer area, the upper limit on the number of IP core packets, and the numbers of TLP completion packets that have been stored and read.
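The patent does not give the exact rule that combines these quantities. The following Python sketch shows one plausible admission check under stated assumptions: each completion packet occupies one slot of the fourth buffer, and the number of reads already issued but not yet completed (`outstanding`) is also known. The function name `may_issue_read` and its parameters are hypothetical.

```python
def may_issue_read(fourth_slots, ip_core_limit, stored, read, outstanding):
    """Hypothetical admission check for the fourth (reorder) buffer.

    fourth_slots  -- completion packets the fourth buffer can hold
    ip_core_limit -- upper limit on packets the IP core may have in flight
    stored, read  -- running counts of completions written to / read from it
    outstanding   -- reads already issued whose completions have not arrived
    """
    occupied = stored - read          # completions waiting to be grouped
    free = fourth_slots - occupied    # slots not yet holding data
    # Every outstanding read will eventually need a slot, so one more read is
    # allowed only if it still fits within both limits.
    return outstanding < min(ip_core_limit, free)


if __name__ == "__main__":
    print(may_issue_read(8, 4, stored=5, read=1, outstanding=3))  # prints True
    print(may_issue_read(8, 4, stored=7, read=1, outstanding=2))  # prints False
```

Reserving space for completions that have not yet arrived is what prevents the fourth buffer from overflowing when the host answers a burst of reads at once.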

Optionally, in a specific implementation manner of the embodiment of the present invention, the packet grouping module 306 is further configured to perform flow control on the receiving buffer according to the size of the receiving buffer, the number of DMA data packets that have been grouped, and the number of DMA data packets that have been sent. The receiving buffer is located in the packet grouping module.

Further, the embodiment of the invention also discloses a PCIE device, which is configured to execute the DMA packet transmission method according to any one of the foregoing embodiments.

It will be apparent to those skilled in the art that the techniques of the embodiments of the present invention may be implemented by software plus a necessary general-purpose hardware platform. Based on such understanding, the technical solutions of the embodiments of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The software product may be stored in a storage medium such as a ROM/RAM, a magnetic disk or an optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the method described in the embodiments, or in some parts of the embodiments, of the present invention.

In the present specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are referred to each other, so that the embodiments of the present invention and features in the embodiments may be combined with each other without conflict, and each embodiment focuses on differences from other embodiments. In particular, for system and apparatus embodiments, the description is relatively simple, as it is substantially similar to method embodiments, with reference to the description of method embodiments in part.

The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented, for example, in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

The embodiments of the present invention described above do not limit the scope of the present invention. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims

1. A DMA packet transmission method applied to a PCIE device, the method comprising:

acquiring a plurality of DMA read requests, and storing the acquired DMA read requests into a first buffer area according to a first sequence, each DMA read request comprising address information and length information of the DMA data packet corresponding to that DMA read request;

recording, in a second buffer area, the order of the length information in the plurality of DMA read requests, so as to represent the one-to-one correspondence between the position of each DMA read request in the first buffer area and the length information in that DMA read request;

sequentially acquiring a plurality of DMA read requests in a first buffer area, and calculating TLP address information and TLP length information corresponding to each DMA read request according to a preset cutting length and address information and length information in each DMA read request; wherein the TLP address information and the TLP length information are in one-to-one correspondence;

setting TAG numbers for each pair of TLP address information and TLP length information which are in one-to-one correspondence according to a second sequence, so as to obtain a plurality of TLP read requests; each TLP read request includes a TAG number, TLP address information, and TLP length information;

storing the plurality of TLP read requests in a third buffer area in ascending order of TAG numbers;

sequentially acquiring a plurality of TLP read requests in a third buffer area, and sequentially sending the TLP read requests to a host through an IP core;

receiving a plurality of TLP completion packets fed back by the host based on the plurality of TLP read requests, wherein each TLP completion packet corresponds to a TLP read request and includes a TAG number;

storing the plurality of TLP completion packets in a fourth buffer area in ascending order of TAG numbers;

and sequentially reading the plurality of TLP completion packets from the fourth buffer area, and grouping the plurality of TLP completion packets according to the length information sequence recorded in the second buffer area, so as to obtain DMA data packets arranged in the first sequence.

2. The method of claim 1, wherein recording the order of length information in the plurality of DMA read requests in the second buffer comprises:

sequentially reading length information in the plurality of DMA read requests according to the storage sequence of the plurality of DMA read requests, and sequentially storing the read length information in the second buffer area;

or,

3. The method of claim 1, wherein the method further comprises:

4. The method of claim 3, wherein recording a TLP length information sequence in the plurality of TLP read requests in the fifth buffer comprises:

or,

5. The method of any one of claims 1 to 4, wherein the method is implemented at a transaction layer of a PCIE bus, the transaction layer being divided into a DMA transceiver layer, a TLP transceiver layer and a transmission control layer; wherein,

the DMA transceiver layer is configured to perform the following steps:

acquiring a plurality of DMA read requests, and storing the acquired DMA read requests into a first buffer area according to a first sequence;

recording the length information sequence in the plurality of DMA read requests in the second buffer area to represent the one-to-one correspondence between the ordering of each DMA read request in the first buffer area and the length information in each DMA read request;

sequentially obtaining a plurality of TLP completion packets from the fourth buffer area, and grouping the plurality of TLP completion packets according to the length information sequence recorded in the second buffer area, so as to obtain DMA data packets arranged in a first sequence;

The TLP transceiver layer is configured to perform the following steps:

sequentially acquiring a plurality of DMA read requests in a first buffer area, and calculating TLP address information and TLP length information corresponding to each DMA read request according to a preset cutting length and address information and length information in each DMA read request;

setting TAG numbers for each pair of TLP address information and TLP length information which are in one-to-one correspondence according to a second sequence, so as to obtain a plurality of TLP read requests;

the transmission control layer is configured to perform the following steps:

receiving a plurality of TLP completion packets fed back by the host based on the plurality of TLP read requests; wherein the TLP completion packets correspond to TLP read requests, each TLP completion packet including a TAG number.

6. The method of claim 5, wherein the transmission control layer is further configured to perform the steps of:

counting, in a register, the TLP read requests sent to the host to obtain a first count value;

7. The method of claim 5, wherein the TLP transceiver layer is further configured to perform the following steps:

performing flow control on the fourth buffer area according to the size of the fourth buffer area, the upper limit on the number of IP core packets, and the numbers of TLP completion packets stored and read;

the DMA transceiver layer is further configured to perform the following steps:

8. A DMA packet transfer device, applied to a PCIE device, comprising:

the first acquisition module is used for acquiring a plurality of DMA read requests and storing the acquired DMA read requests into a first buffer area according to a first sequence, each DMA read request comprising address information and length information of the DMA data packet corresponding to that DMA read request; and for recording, in a second buffer area, the order of the length information in the plurality of DMA read requests, so as to represent the one-to-one correspondence between the position of each DMA read request in the first buffer area and the length information in that DMA read request;

the second acquisition module is used for sequentially acquiring, from the first acquisition module, the plurality of DMA read requests in the first buffer area; calculating the TLP address information and TLP length information corresponding to each DMA read request according to the preset cutting length and the address information and length information in each DMA read request, wherein the TLP address information and the TLP length information are in one-to-one correspondence; setting TAG numbers, according to a second sequence, for each pair of TLP address information and TLP length information in one-to-one correspondence, so as to obtain a plurality of TLP read requests, each TLP read request including a TAG number, TLP address information and TLP length information; and storing the plurality of TLP read requests in a third buffer area in ascending order of TAG numbers;

a third acquisition module, configured to sequentially obtain, from the second acquisition module, the plurality of TLP read requests in the third buffer area, and sequentially send the plurality of TLP read requests to the host through the IP core;

a receiving module, configured to receive a plurality of TLP completion packets fed back by the host based on a plurality of TLP read requests; wherein the TLP completion packets correspond to TLP read requests, each TLP completion packet including a TAG number;

an ordering module, configured to store the plurality of TLP completion packets received from the receiving module into a fourth buffer area in ascending order of TAG numbers;

a packet grouping module, configured to sequentially read the plurality of TLP completion packets from the fourth buffer area of the ordering module, and sequentially group the plurality of TLP completion packets according to the length information recorded in the second buffer area, so as to obtain DMA data packets arranged in the first order.

9. The apparatus of claim 8, wherein the apparatus is implemented at a transaction layer of a PCIE bus, the transaction layer being divided into a DMA transceiver layer, a TLP transceiver layer and a transmission control layer; wherein,

the DMA transceiver layer comprises the first acquisition module and the packet grouping module; the TLP transceiver layer comprises the second acquisition module and the ordering module; the transmission control layer comprises the third acquisition module and the receiving module.

10. A PCIE device, characterized in that it is configured to perform the DMA packet transmission method according to any one of claims 1 to 7.