CN116340246A - Data pre-reading method and medium for direct memory access read operation


Info

Publication number
CN116340246A
Authority
CN
China
Prior art keywords: data, memory access, direct memory, access read, read
Legal status: Granted
Application number
CN202310575644.2A
Other languages
Chinese (zh)
Other versions
CN116340246B (en)
Inventor
Peng Haiyuan (彭海远)
Current Assignee
Zhuhai Xingyun Zhilian Technology Co Ltd
Original Assignee
Zhuhai Xingyun Zhilian Technology Co Ltd
Application filed by Zhuhai Xingyun Zhilian Technology Co Ltd
Priority to CN202310575644.2A
Publication of CN116340246A
Application granted
Publication of CN116340246B
Current legal status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 15/00: Digital computers in general; Data processing equipment in general
    • G06F 15/16: Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F 15/163: Interprocessor communication
    • G06F 15/173: Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F 15/17306: Intercommunication techniques
    • G06F 15/17331: Distributed shared memory [DSM], e.g. remote direct memory access [RDMA]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 13/00: Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 13/38: Information transfer, e.g. on bus
    • G06F 13/42: Bus transfer protocol, e.g. handshake; Synchronisation
    • G06F 13/4282: Bus transfer protocol, e.g. handshake; Synchronisation on a serial bus, e.g. I2C bus, SPI bus
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 47/00: Traffic control in data switching networks
    • H04L 47/10: Flow control; Congestion control
    • H04L 47/19: Flow control; Congestion control at layers above the network layer
    • H04L 47/193: Flow control; Congestion control at layers above the network layer at the transport layer, e.g. TCP related
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2213/00: Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 2213/0026: PCI express
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application provides a data pre-reading method and medium for direct memory access read operations. The method comprises the following steps: a first device sends a plurality of direct memory access read messages to a second device, where a first direct memory access read message indicates access to a first data space of a second memory of the second device and the reading of first data; the second device sends a plurality of corresponding direct memory access response messages to the first device, where the first direct memory access response message indicates the data length of the data remaining in the first data space relative to the first data; based on that data length, the first device selectively initiates a first direct memory access read operation for the first data or a second direct memory access read operation for the first data space. In this way, service throughput is improved, service transmission delay is reduced, and overall service processing performance is improved.

Description

Data pre-reading method and medium for direct memory access read operation
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data pre-reading method and medium for direct memory access read operation.
Background
With the development of data centers, cloud computing, and other applications, direct memory access (Direct Memory Access, DMA) technology is widely used to improve input/output efficiency and reduce memory-copy overhead. To further increase data processing speed, remote direct memory access (Remote Direct Memory Access, RDMA) technology can quickly move data from one machine into the memory of a remote machine without involving the system kernel in the transfer. From the standpoint of data production and data consumption, both DMA and RDMA work by having a data consumer, such as a peripheral device that consumes data, initiate a DMA read operation toward a data producer, such as a host that stores the data to be consumed, so as to obtain the corresponding data. However, when the DMA and RDMA techniques used for data reading in the prior art face massive numbers of data-read service requests, they perform poorly on throughput and delay, the key indexes of service processing performance, because the processing bandwidth, computing power, memory overhead, and service delay consumed by initiating and responding to a DMA read operation can far exceed the benefit the read operation brings.
For this reason, the present application proposes a data pre-reading method and medium for direct memory access read operations, so as to solve the above technical problems in the prior art.
Disclosure of Invention
In a first aspect, the present application provides a data pre-reading method for a direct memory access read operation. The data pre-reading method comprises the following steps: sending, by a first device, a plurality of direct memory access read messages to a second device, wherein a first direct memory access read message is any one of the plurality of direct memory access read messages, the first direct memory access read message indicating access to a first data space of a second memory of the second device and reading of first data located in the first data space for storage in the first memory of the first device; sending, by the second device, to the first device a plurality of direct memory access response messages in one-to-one correspondence with the plurality of direct memory access read messages, where a first direct memory access response message corresponds to the first direct memory access read message and indicates the data length of the remaining data in the first data space relative to the first data; and selectively initiating, by the first device, a first direct memory access read operation for the first data or a second direct memory access read operation for the first data space, based on the data length of the remaining data in the first data space relative to the first data indicated by the first direct memory access response message.
According to the first aspect of the application, the service throughput performance is improved, the service transmission delay is reduced, and the effect of improving the service processing performance is achieved.
In a possible implementation manner of the first aspect of the present application, the second direct memory access read operation includes performing a full data read on the first data space.
In a possible implementation manner of the first aspect of the present application, the second direct memory access read operation includes reading the first data and the remaining data in the first data space relative to the first data.
In a possible implementation manner of the first aspect of the present application, selectively initiating, by the first device, the first direct memory access read operation for the first data or the second direct memory access read operation for the first data space based on the data length of the remaining data relative to the first data in the first data space indicated by the first direct memory access response message includes: judging, by the first device, whether the remaining data is trace data; initiating the first direct memory access read operation for the first data when the remaining data is not trace data; and initiating the second direct memory access read operation for the first data space when the remaining data is trace data.
In a possible implementation manner of the first aspect of the present application, when the remaining data is not trace data, the second device does not send the remaining data to the first device.
In a possible implementation manner of the first aspect of the present application, when the data length of the remaining data is smaller than a preset data length, the first device determines that the remaining data is trace data.
In a possible implementation manner of the first aspect of the present application, when the data length of the remaining data is smaller than the data length of the header of the first direct memory access read packet, the first device determines that the remaining data is trace data.
In a possible implementation manner of the first aspect of the present application, when the data length of the remaining data is smaller than the data length of the header of the first direct memory access response packet, the first device determines that the remaining data is trace data.
In a possible implementation manner of the first aspect of the present application, when the remaining data is trace data, the second direct memory access read operation initiated by the first device on the first data space includes pre-reading the remaining data for storage in the first memory of the first device.
In a possible implementation manner of the first aspect of the present application, the first data is located in a first memory address area of the second memory of the second device, and the first data space is one or more partitions in the second memory, and the one or more partitions entirely include the first memory address area.
In a possible implementation manner of the first aspect of the present application, the second direct memory access read operation on the first data space includes performing a full data read on the one or more partitions, respectively.
In a possible implementation manner of the first aspect of the present application, the second direct memory access read operation initiated by the first device to the first data space includes reading the first data and the remaining data in the first data space relative to the first data for storage in the first memory of the first device.
In a possible implementation manner of the first aspect of the present application, a second direct memory access read packet is the next direct memory access read packet, relative to the first direct memory access read packet, among the plurality of direct memory access read packets, the second direct memory access read packet indicating at least access to the first data space of the second memory of the second device and reading of second data lying within the remaining data of the first data space; the first device reads the second data from the remaining data already stored in the first memory of the first device.
In a possible implementation manner of the first aspect of the present application, the second direct memory access read packet further indicates to access a second data space of the second memory of the second device different from the first data space and to read third data located in the second data space, and the first device reads the third data by initiating a third direct memory access read operation on the third data.
In a possible implementation manner of the first aspect of the present application, the first device is a peripheral device, the second device is a host device, the first memory is a memory of the peripheral device, and the second memory is a memory of the host device.
In a second aspect, embodiments of the present application further provide a computer device, where the computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and where the processor, when executing the computer program, implements the method according to any implementation manner of any one of the foregoing aspects.
In a third aspect, embodiments of the present application also provide a computer-readable storage medium storing computer instructions that, when run on a computer device, cause the computer device to perform a method according to any one of the implementations of any one of the above aspects.
In a fourth aspect, embodiments of the present application also provide a computer program product comprising instructions stored on a computer-readable storage medium, which when run on a computer device, cause the computer device to perform a method according to any one of the implementations of any one of the above aspects.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show merely some embodiments of the present application, and a person skilled in the art may obtain other drawings from them without inventive effort.
Fig. 1 is a schematic diagram of a first device initiating a direct memory access read operation to a second device according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of a data pre-reading method for a direct memory access read operation according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a computing device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
It should be understood that in the description of this application, "at least one" means one or more than one, and "a plurality" means two or more than two. In addition, the words "first," "second," and the like, unless otherwise indicated, are used solely for the purposes of description and are not to be construed as indicating or implying a relative importance or order.
Fig. 1 is a schematic diagram of a first device initiating a direct memory access read operation to a second device according to an embodiment of the present application. As shown in fig. 1, the first device 110 sends a direct memory access read message 130 to the second device 120, and the second device 120 sends a direct memory access reply message 140 to the first device 110 in response to the direct memory access read message 130. In this way, using direct memory access (Direct Memory Access, DMA) technology, the first device 110 completes one DMA read operation toward the second device 120 with one read request and one reply carrying the data. Each direct memory access read operation initiated by the first device 110 toward the second device 120 therefore involves one DMA read and one DMA reply, i.e., one direct memory access read message 130 and one direct memory access reply message 140. Moreover, these messages generally conform to a standard communication protocol and follow a fixed data format. For example, when peripheral component interconnect express (peripheral component interconnect express, PCIE) devices communicate with one another over the PCIE bus, the transferred data is encapsulated into transaction layer packets (transaction layer packet, TLP), so that both the direct memory access read packet 130 and the direct memory access reply packet 140 have the TLP data format, for example with a TLP header of 16 bytes or more. From the point of view of data production and data consumption, the first device 110 consumes data; for example, the first device 110 may be a peripheral device that is to consume data. The second device 120 produces data or provides the data to be consumed; for example, the second device 120 may be a host that stores the data to be consumed. The first device 110 generally has limited data storage space: peripherals such as notebook computers, mobile phones, and smart terminals have limited memory and must use that limited space for tasks beyond merely storing data. The second device 120, by contrast, typically has relatively abundant data storage space, as with a data center server, mainframe, or cloud computing platform. Accordingly, the data to be consumed is typically stored in the second device 120, such as in host memory, and when the first device 110, such as a peripheral device, needs to consume a certain amount of data, the first device 110 initiates a direct memory access read operation toward the second device 120 so as to obtain the corresponding data from the second device 120 for processing. In a multi-queue application scenario, multiple queues may be arranged for data interaction between the first device 110 and the second device 120; for example, the first device 110 may initiate multiple direct memory access read operations through the multiple queues to obtain multiple pieces of data.
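As an illustrative aside, not part of the patent text: the fixed per-packet cost described above can be sketched in C as follows, assuming a simplified TLP-style layout with a 16-byte header. The struct name and fields are hypothetical simplifications, not the actual PCIE TLP format.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified TLP-style packet: every DMA read request and
 * every DMA reply carries a fixed-size header (16 bytes assumed here),
 * no matter how small the payload a reply returns. */
struct tlp_packet {
    uint8_t header[16];  /* fixed per-packet framing overhead */
    uint8_t payload[];   /* requested data; may be just a few bytes */
};

int main(void)
{
    /* An 8-byte payload still pays the full 16-byte header. */
    printf("header: %zu B, 8 B payload -> %zu B on the bus\n",
           sizeof(struct tlp_packet), sizeof(struct tlp_packet) + 8);
    return 0;
}
```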
Whether direct memory access read operations are initiated through a single queue or through multiple queues, each such operation consumes at least some processing bandwidth, computing power, and memory, and also incurs at least some degree of service delay. This is because each direct memory access read operation involves the data consumer, i.e., the first device 110, sending a direct memory access read message 130 to the data producer or data provider, i.e., the second device 120, and the second device 120 sending a direct memory access reply message 140 to the first device 110 in response.
With continued reference to FIG. 1, a direct memory access read operation directly reads the data stored at a particular address in memory. Each direct memory access read operation requires one read request message to be sent and one reply message to be returned. In a multi-queue application, each queue initiates its own direct memory access read operations and retrieves the corresponding data. Thus, each direct memory access read operation includes the data consumer, i.e., the first device 110, issuing a direct memory access read message 130 to the data producer or data provider, i.e., the second device 120, and the second device 120 issuing a direct memory access reply message 140 in response. Application scenarios such as data centers and cloud computing often face massive numbers of data-read service requests, many of which use DMA technology (the same applies to RDMA scenarios) to improve data processing performance, so a large number of read request messages and response messages (such as the direct memory access read message 130 and the direct memory access response message 140) are involved. As described above, depending on the communication standard protocol and interface specification employed, the direct memory access read message 130 and the direct memory access reply message 140 are typically encapsulated according to fixed rules into packets with a fixed data format and a fixed-length header. For example, the direct memory access read packet 130 and the direct memory access reply packet 140 may be TLPs with TLP headers at least 16 bytes long. When only trace data is obtained through a direct memory access read operation, the data length of the trace data itself may be shorter than the header lengths of the read request message and the response message (for example, a TLP header of 16 bytes or more). That is, when a direct memory access read operation feeds back only trace data, the packet overhead, PCIE bus processing bandwidth, and service delay incurred by initiating the operation itself make it a poor trade in terms of service processing performance. Therefore, a large number of direct memory access read operations for trace data works against improving service processing performance. Throughput and service delay are the indexes by which overall service processing performance is judged, and every read of trace data costs one more direct memory access read operation and one more direct memory access reply, each carried in a TLP whose header alone exceeds 16 bytes. Completing a task thus requires initiating multiple reads for the data to be consumed, and where some of those direct memory access read operations fetch only trace data (for example, data shorter than the headers of the read request message and response message required to complete one read operation), such reads consume PCIE bus transceiving bandwidth, reducing service throughput, and add the wait for one more DMA response, increasing service delay.
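To make this trade-off concrete, here is a minimal C sketch, under the assumption of 16-byte request and reply headers, of how bus efficiency collapses for trace-sized payloads; the function and macro names are illustrative, not from the patent.

```c
#include <stdio.h>

/* Rough efficiency of one DMA read: payload bytes delivered versus total
 * bytes moved across the bus (request header + reply header + payload).
 * The header sizes are assumptions for illustration. */
#define READ_HDR_BYTES 16u
#define CPL_HDR_BYTES  16u

static double dma_read_efficiency(unsigned payload_bytes)
{
    unsigned total = READ_HDR_BYTES + CPL_HDR_BYTES + payload_bytes;
    return (double)payload_bytes / (double)total;
}

int main(void)
{
    /* Trace-sized payload: 8 bytes cost 40 bytes on the bus (20%). */
    printf("8 B payload:    %.0f%%\n", 100.0 * dma_read_efficiency(8));
    /* Bulk payload: 4096 bytes cost 4128 bytes on the bus (~99%). */
    printf("4096 B payload: %.0f%%\n", 100.0 * dma_read_efficiency(4096));
    return 0;
}
```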
When service processing performance is evaluated as a whole through throughput and service delay, the processing bandwidth, computing power, memory overhead, and service delay incurred by a DMA read operation initiated for trace data, and by its response, exceed the benefits that the DMA read operation brings. Therefore, a large number of DMA read operations for trace data works against improving service processing performance. Various improvements of the data pre-reading method for direct memory access read operations provided by the embodiments of the present application are described in further detail below in conjunction with fig. 2.
It should be appreciated that the scenario shown in fig. 1, where the first device initiates a direct memory access read operation to the second device, also applies to remote direct memory access (Remote Direct Memory Access, RDMA). The first device 110 may use RDMA technology to quickly move data from a machine remote to it, i.e., the second device 120, without the data transfer involving the system kernel. In the RDMA context, the data interaction between the first device 110 and the second device 120 and the communication flow used to implement RDMA data reading follow an RDMA communication protocol; for example, an RDMA connection may be established between the first device 110 and the second device 120, or they may belong to the same RDMA network. Just as each direct memory access read operation initiated by the first device 110 toward the second device 120 consumes at least some processing bandwidth, computing power, and memory and incurs at least some service delay, so does each RDMA data read operation. Therefore, the data pre-reading method for direct memory access read operations provided by the embodiments of the present application can improve service processing performance in both DMA and RDMA scenarios.
Fig. 2 is a flowchart of a data pre-reading method for a direct memory access read operation according to an embodiment of the present application. As shown in fig. 2, the data pre-reading method includes the following steps.
Step S202: sending, by a first device, a plurality of direct memory access read messages to a second device, wherein a first direct memory access read message is any one of the plurality of direct memory access read messages, the first direct memory access read message indicating access to a first data space of a second memory of the second device and reading of first data located in the first data space for storage in the first memory of the first device.
Step S204: sending, by the second device, to the first device a plurality of direct memory access response messages in one-to-one correspondence with the plurality of direct memory access read messages, where a first direct memory access response message corresponds to the first direct memory access read message and indicates the data length of the remaining data in the first data space relative to the first data.
Step S206: selectively initiating, by the first device, a first direct memory access read operation for the first data or a second direct memory access read operation for the first data space, based on the data length of the remaining data in the first data space relative to the first data indicated by the first direct memory access response message.
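Before turning to the detailed discussion, the following C sketch shows one hypothetical layout for the two messages of steps S202 and S204; the field names and widths are assumptions for illustration, not the patent's wire format. The remaining_length field is the key information that step S206 acts on.

```c
#include <stdint.h>

/* Hypothetical message layouts for steps S202/S204. */
struct dma_read_msg {          /* first device -> second device (S202) */
    uint64_t space_addr;       /* base of the first data space          */
    uint32_t offset;           /* where the first data starts           */
    uint32_t length;           /* length of the first data              */
};

struct dma_reply_msg {         /* second device -> first device (S204) */
    uint32_t data_length;      /* length of the first data returned     */
    uint32_t remaining_length; /* key field: bytes left in the first    */
                               /* data space beyond the first data;     */
                               /* step S206 decides based on this value */
    /* ... first data payload follows ... */
};
```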
Referring to the above steps, in step S202, a plurality of direct memory access read messages are sent by the first device to the second device. The messages may be sent through a single queue, multiple queues, or any other arrangement. Taking the scenario of fig. 1 in which the first device 110 initiates a direct memory access read operation toward the second device 120, each read operation includes the data consumer, i.e., the first device 110, issuing a direct memory access read message 130 to the data producer or data provider, i.e., the second device 120, and the second device 120 issuing a direct memory access reply message 140 in response. Depending on the communication standard protocol and interface specification employed, these messages are typically encapsulated according to fixed rules into packets with a fixed data format and a fixed-length header; for example, the direct memory access read packet 130 and the direct memory access reply packet 140 may be TLPs with TLP headers at least 16 bytes long. Here, the first direct memory access read message is any one of the plurality of direct memory access read messages; it indicates access to a first data space in the second memory of the second device and the reading of first data located in that data space for storage in the first memory of the first device. The first direct memory access read message may be, for example, the direct memory access read message 130 shown in fig. 1, and may follow the communication standard protocol and interface specification actually adopted, for example as a TLP packet. A direct memory access read operation directly reads the data stored at a particular address in memory; each such operation requires one read request message and one reply message, and in a multi-queue application each queue initiates its own read operations and retrieves the corresponding data.
Next, in step S204, a plurality of direct memory access response messages in one-to-one correspondence with the plurality of direct memory access read messages are sent by the second device to the first device. As in the scenario of fig. 1, in a multi-queue deployment a plurality of queues may be arranged for the data interaction between the first device 110 and the second device 120, for example with the first device 110 obtaining multiple pieces of data by initiating multiple direct memory access read operations through the queues; the response messages may likewise be carried over a single queue, multiple queues, or any other arrangement. Here, the first direct memory access response message corresponds to the first direct memory access read message and indicates the data length of the data remaining in the first data space beyond the first data. Since, in step S202, the first direct memory access read message indicates access to the first data space of the second memory of the second device and the reading of the first data for storage in the first memory of the first device, the first data is the data to be consumed that must be obtained by the read operation associated with this pair of messages. In step S204, the first direct memory access response message completes the reading of the first data together with the first direct memory access read message, and additionally indicates the data length of the remaining data in the first data space relative to the first data. This additional information enables the improvement of subsequent data read operations.
Then, in step S206, based on the data length of the remaining data in the first data space relative to the first data, as indicated by the first direct memory access response message, the first device selectively initiates either a first direct memory access read operation for the first data or a second direct memory access read operation for the first data space. The key information carried by the response message in step S204, namely the remaining data length, is what step S206 uses to improve the data read operation. Specifically, the first direct memory access read operation means accessing the first data space of the second memory of the second device and reading only the first data located there; the second direct memory access read operation means accessing the first data space and reading all data stored in it, i.e., both the first data and the remaining data relative to the first data. By letting the first direct memory access response message carry the remaining data length, the first device can, after receiving the response, choose between initiating a read for the first data only and initiating a read for the whole first data space, taking whichever measure is more favorable to overall service processing performance. In the scenario of fig. 1, where the first device initiates direct memory access read operations toward the second device, a large number of DMA read operations for trace data works against overall service processing performance.
Specifically, when trace data is obtained by a direct memory access read operation, the data length of the trace data itself may be shorter than the header lengths of the read request packet and the response packet (for example, a TLP header of 16 bytes or more). That is, when the operation feeds back only trace data, the packet overhead, PCIE bus processing bandwidth, and service delay incurred by initiating the operation itself make it a poor trade in terms of service processing performance. Every read of trace data costs one more direct memory access read operation and one more direct memory access reply, each carried in a TLP whose header alone exceeds 16 bytes. Completing a task thus requires initiating multiple reads for the data to be consumed, and where some of those reads fetch only trace data (for example, data shorter than the headers of the request and response messages needed to complete one read operation), they consume PCIE bus transceiving bandwidth, reducing service throughput, and add the wait for one more DMA response, increasing service delay. When service processing performance is evaluated as a whole through throughput and service delay, the processing bandwidth, computing power, memory overhead, and service delay incurred by a DMA read operation initiated for trace data, and by its response, exceed the benefits the read brings; a large number of such operations therefore works against service processing performance. For the first device, taking the measure more favorable to overall service processing performance means reducing the number and frequency of DMA read operations for trace data as much as possible. To this end, the first device can judge, from the remaining data length indicated by the first direct memory access response message, which of the first direct memory access read operation for the first data only and the second direct memory access read operation for the first data space is more favorable for improving overall service processing performance, such as raising throughput and lowering delay. For example, the first device initiates the first direct memory access read operation for the first data when the remaining data is not trace data, and initiates the second direct memory access read operation for the first data space when the remaining data is trace data.
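A minimal C sketch of the selection in step S206, assuming the trace-data test "remaining length is less than a fixed header length" (one of the thresholds proposed below); all function names are hypothetical stand-ins, not the patent's terminology.

```c
#include <stdint.h>
#include <stdio.h>

#define TLP_HDR_BYTES 16u  /* assumed fixed header length */

/* Hypothetical stand-ins for the two read operations. */
static void issue_first_data_read(void) { puts("first DMA read: first data only"); }
static void issue_full_space_read(void) { puts("second DMA read: whole data space (pre-read)"); }

/* Selection of step S206, driven by the reply's remaining_length field. */
static void on_dma_reply(uint32_t remaining_length)
{
    if (remaining_length > 0 && remaining_length < TLP_HDR_BYTES)
        issue_full_space_read();  /* remainder is trace data: a dedicated
                                   * read would cost more header bytes than
                                   * it returns, so pre-read the space now */
    else
        issue_first_data_read();  /* no remainder, or large enough to be
                                   * worth its own read operation later */
}

int main(void)
{
    on_dma_reply(6);    /* trace-sized remainder -> pre-read */
    on_dma_reply(512);  /* bulk remainder -> normal read */
    return 0;
}
```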
In this way, when the remaining data is trace data, initiating the second direct memory access read operation on the first data space amounts to pre-reading the trace data. When that pre-read trace data is needed later, no direct memory access operation has to be initiated to read it; the pre-read copy satisfies the read requirement. This saves the processing bandwidth, computing power, memory overhead, and service delay incurred by each direct memory access read operation, including the cost of sending one read request message and one response message per operation, and reduces, overall, the frequency of direct memory access read operations initiated for trace data. For example, the required data may be extracted for processing while the remaining trace data is stored in a queue; the next time the peripheral device operates on data, the stored trace data is combined with the data obtained by the next DMA read operation to form new data for the peripheral device to process. Therefore, through DMA pre-reading of trace data, the number and frequency of DMA read operations for trace data are reduced overall, service throughput is improved, service transmission delay is lowered, and service processing performance is improved.
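The queue-and-merge behavior just described might look like the following C sketch; the structure, buffer sizes, and names are illustrative assumptions, not the patent's implementation.

```c
#include <stdint.h>
#include <string.h>

/* Pre-read trace data held in the first device's memory is kept in a
 * small queue; the next DMA read's payload is concatenated onto it to
 * form the new data the device processes. The caller must provide an
 * output buffer large enough for both parts. */
struct preread_queue {
    uint8_t  buf[64];
    uint32_t len;  /* bytes of pre-read trace data currently held */
};

static uint32_t merge_with_next_read(struct preread_queue *q,
                                     const uint8_t *next, uint32_t next_len,
                                     uint8_t *out)
{
    memcpy(out, q->buf, q->len);           /* stored trace data first  */
    memcpy(out + q->len, next, next_len);  /* then the new DMA payload */
    uint32_t total = q->len + next_len;
    q->len = 0;                            /* queue is drained */
    return total;                          /* length of merged data */
}

int main(void)
{
    struct preread_queue q = { .buf = {1, 2, 3}, .len = 3 };
    uint8_t next[4] = {4, 5, 6, 7}, out[68];
    return (int)merge_with_next_read(&q, next, sizeof next, out); /* 7 */
}
```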
In one possible implementation, the second direct memory access read operation includes performing a full data read of the first data space. A full data read accesses the first data space of the second memory of the second device and reads all data stored there, i.e., the first data plus the remaining data relative to the first data. This amounts to pre-reading the trace data: when the pre-read trace data is needed later, it can satisfy the read requirement without another direct memory access operation, saving the processing bandwidth, computing power, memory overhead, and service delay incurred by each direct memory access read operation, including the cost of sending one read request message and one response message per operation, and reducing, overall, the frequency of direct memory access read operations initiated for trace data.
In one possible implementation, the second direct memory access read operation includes reading the first data and the remaining data in the first data space relative to the first data. When the remaining data is trace data, initiating the second direct memory access read operation on the first data space again amounts to pre-reading it, yielding the same savings as described above.
In one possible implementation, selectively initiating, by the first device, the first direct memory access read operation for the first data or the second direct memory access read operation for the first data space based on the data length of the remaining data relative to the first data indicated by the first direct memory access response message includes: judging, by the first device, whether the remaining data is trace data; initiating the first direct memory access read operation for the first data when the remaining data is not trace data; and initiating the second direct memory access read operation for the first data space when the remaining data is trace data. When the remaining data is trace data, the pre-read performed by the second direct memory access read operation yields the savings described above.
In one possible implementation, when the remaining data is not trace data, the second device does not send the remaining data to the first device. This improves overall service processing performance.
In one possible embodiment, the first device determines that the remaining data is trace data when the data length of the remaining data is less than a preset data length. With a preset data length, the first device can determine, from the remaining data length indicated by the first direct memory access response message, which of the first direct memory access read operation for the first data only and the second direct memory access read operation for the first data space is more favorable for improving overall service processing performance, such as raising throughput and lowering delay.
In one possible implementation, the first device determines that the remaining data is trace data when the data length of the remaining data is less than the data length of the header of the first direct memory access read packet. Depending on the communication standard protocol and interface specification adopted, direct memory access read messages and response messages are generally encapsulated according to fixed rules into packets with a fixed data format and a fixed-length header; for example, they may be TLPs with TLP headers at least 16 bytes long. Comparing the remaining data length with the header length of the first direct memory access read packet therefore helps determine which of the two read operations is more favorable for overall service processing performance, such as raising throughput and lowering delay.
In one possible implementation, the first device determines that the remaining data is trace data when the data length of the remaining data is less than the data length of the header of the first direct memory access response packet. Comparing the remaining data length with the response header length serves the same purpose; the three trace-data tests are put side by side in the sketch below.
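A small C sketch of the three trace-data tests from the preceding paragraphs; the preset threshold and the header sizes shown are illustrative values, not normative ones.

```c
#include <stdint.h>
#include <stdbool.h>

/* The three possible trace-data criteria proposed above. */
enum trace_rule { PRESET_LEN, READ_HDR_LEN, REPLY_HDR_LEN };

static bool is_trace_data(uint32_t remaining, enum trace_rule rule)
{
    switch (rule) {
    case PRESET_LEN:    return remaining < 32;  /* configured threshold (assumed) */
    case READ_HDR_LEN:  return remaining < 16;  /* read packet header size        */
    case REPLY_HDR_LEN: return remaining < 16;  /* reply packet header size       */
    }
    return false;
}
```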
In one possible implementation, when the remaining data is trace data, the second direct memory access read operation initiated by the first device on the first data space includes pre-reading the remaining data for storage in the first memory of the first device. When the pre-read trace data is needed later, it can satisfy the read requirement without initiating another direct memory access operation, saving the processing bandwidth, computing power, memory overhead, and service delay incurred by each direct memory access read operation, including the cost of sending one read request message and one response message per operation, and reducing, overall, the frequency of direct memory access read operations initiated for trace data.
In one possible implementation, the first data is located in a first memory address region of the second memory of the second device, the first data space is one or more partitions of the second memory, and the one or more partitions together cover the first memory address region. Reading data by partition improves overall efficiency. In some embodiments, the second direct memory access read operation on the first data space includes performing a full data read on each of the one or more partitions; a full read by partition amounts to trace-data pre-reading carried out per partition, improving overall efficiency.
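A minimal C sketch of the partitioned pre-read, assuming fixed-size partitions and a hypothetical dma_read primitive; in a real device the partition size would follow the second memory's layout, and first_len is assumed to be nonzero.

```c
#include <stdint.h>

#define PART_SIZE 4096u  /* assumed partition size */

/* Stand-in for issuing one DMA read of [addr, addr + len). */
static void dma_read(uint64_t addr, uint32_t len) { (void)addr; (void)len; }

/* Fully read every partition that covers the first data's address range
 * [first_addr, first_addr + first_len): the second read operation. */
static void preread_covering_partitions(uint64_t first_addr, uint32_t first_len)
{
    uint64_t lo = first_addr / PART_SIZE;                    /* first partition */
    uint64_t hi = (first_addr + first_len - 1) / PART_SIZE;  /* last partition  */
    for (uint64_t p = lo; p <= hi; p++)
        dma_read(p * PART_SIZE, PART_SIZE);  /* full data read per partition */
}

int main(void)
{
    preread_covering_partitions(5000, 100);  /* covered by one partition here */
    return 0;
}
```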
In one possible implementation, the second direct memory access read operation initiated by the first device on the first data space includes reading the first data and the remaining data in the first data space relative to the first data, for storage in the first memory of the first device. When the remaining data is trace data, this again amounts to pre-reading it, with the savings described above. In some embodiments, a second direct memory access read message is the next direct memory access read message, relative to the first direct memory access read message, among the plurality of direct memory access read messages; it indicates at least access to the first data space of the second memory of the second device and the reading of second data lying within the remaining data of that space, and the first device reads the second data from the remaining data already stored in its first memory. Thus, when the second data targeted by a subsequent read message, such as the second direct memory access read message, lies within the pre-read remaining data, the first device can satisfy that message from the first memory, without initiating a direct memory access operation to read the trace data, saving per-operation processing bandwidth, computing power, memory overhead, and service delay. In some embodiments, the second direct memory access read message further indicates access to a second data space of the second memory, different from the first data space, and the reading of third data located there; the first device reads the third data by initiating a third direct memory access read operation. The pre-read trace data and the data to be obtained in the next DMA read operation, i.e., the third data, can then be combined into new data, so that the number and frequency of DMA read operations for trace data are reduced overall, service throughput is improved, service transmission delay is lowered, and service processing performance is improved.
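Serving a later read from the pre-read remainder might look like the following C sketch; the cache structure and names are illustrative assumptions, not the patent's design.

```c
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

/* Pre-read remainder held in the first memory, tagged with the
 * second-memory address range it came from. */
struct preread_cache {
    uint64_t base;      /* second-memory address of the cached bytes */
    uint32_t len;       /* number of cached bytes */
    uint8_t  data[256];
};

/* If the requested second data lies entirely inside the cached range,
 * copy it locally (no read/reply message pair needed); otherwise the
 * caller falls back to a normal DMA read. */
static bool try_local_read(const struct preread_cache *c,
                           uint64_t addr, uint32_t len, uint8_t *out)
{
    if (addr >= c->base && addr + len <= c->base + c->len) {
        memcpy(out, c->data + (addr - c->base), len);  /* cache hit */
        return true;
    }
    return false;  /* miss: issue a real DMA read */
}

int main(void)
{
    struct preread_cache c = { .base = 0x1000, .len = 8,
                               .data = {10, 11, 12, 13, 14, 15, 16, 17} };
    uint8_t out[4];
    return try_local_read(&c, 0x1002, 4, out) ? 0 : 1;  /* hit: returns 0 */
}
```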
In one possible implementation, the first device is a peripheral device, the second device is a host device, the first memory is the memory of the peripheral device, and the second memory is the memory of the host device. From the point of view of data production and data consumption, the first device consumes data, e.g., a peripheral device that is to consume data, and the second device produces data or provides the data to be consumed, e.g., a host that stores it. The first device generally has limited data storage space: peripherals such as notebook computers, mobile phones, and smart terminals have limited memory and must use that limited space for tasks beyond merely storing data. The second device typically has relatively abundant data storage space, as with data center servers, hosts, and cloud computing platforms. Therefore, the data to be consumed is generally stored in the second device, such as in host memory, and when the first device, such as a peripheral, needs to consume a certain amount of data, it initiates a direct memory access read operation toward the second device so as to obtain the corresponding data to be consumed from the second device for processing.
Fig. 3 is a schematic structural diagram of a computing device provided in an embodiment of the present application. The computing device 300 includes one or more processors 310, a communication interface 320, and a memory 330, interconnected by a bus 340. Optionally, the computing device 300 may further include an input/output interface 350 connected to an input/output device for receiving parameters set by a user, and the like. The computing device 300 can be used to implement some or all of the functionality of the device or system embodiments described above, and the processor 310 can be used to implement some or all of the operational steps of the method embodiments described above. For example, for specific implementations of the various operations performed by the computing device 300, refer to the details of the above embodiments, such as the processor 310 performing some or all of the steps or operations of the method embodiments. For another example, the computing device 300 may implement some or all of the functions of one or more components in the apparatus embodiments described above, with the communication interface 320 providing the communication functions those apparatuses and components require and the processor 310 providing the processing functions they require.
It should be appreciated that the computing device 300 of fig. 3 may include one or more processors 310, which may cooperate to provide processing power through parallel, serial, serial-parallel, or any other connection, or may form a processor sequence or processor array, or may be divided into primary and secondary processors, or may have different architectures, such as in a heterogeneous computing architecture. In addition, the structure of the computing device 300 shown in fig. 3 and the associated structural and functional description are exemplary and not limiting. In some example embodiments, the computing device 300 may include more or fewer components than shown in fig. 3, may combine or split certain components, or may have a different arrangement of components.
The processor 310 may take many forms; for example, it may include one or more combinations of a central processing unit (central processing unit, CPU), a graphics processing unit (graphic processing unit, GPU), a neural-network processing unit (neural-network processing unit, NPU), a tensor processing unit (tensor processing unit, TPU), or a data processing unit (data processing unit, DPU), and the embodiments of the present application are not specifically limited in this regard. The processor 310 may be a single-core or multi-core processor, or a combination of a CPU and a hardware chip. The hardware chip may be an application-specific integrated circuit (application-specific integrated circuit, ASIC), a programmable logic device (programmable logic device, PLD), or a combination thereof. The PLD may be a complex programmable logic device (complex programmable logic device, CPLD), a field-programmable gate array (field-programmable gate array, FPGA), generic array logic (generic array logic, GAL), or any combination thereof. The processor 310 may also be implemented solely with logic devices incorporating processing logic, such as an FPGA or a digital signal processor (digital signal processor, DSP). The communication interface 320 may be a wired interface, such as an ethernet interface or a local interconnect network (local interconnect network, LIN) interface, or a wireless interface, such as a cellular network interface or a wireless local area network interface, for communicating with other modules or devices.
The memory 330 may be a nonvolatile memory, such as a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The memory 330 may also be a volatile memory, i.e., a random access memory (RAM) used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM). The memory 330 may also be used to store program code and data, such that the processor 310 invokes the program code stored in the memory 330 to perform some or all of the operational steps of the method embodiments described above, or to perform the corresponding functions in the apparatus embodiments described above.
The bus 340 may be a peripheral component interconnect express (PCIe) bus, an extended industry standard architecture (EISA) bus, a unified bus (Ubus or UB), a compute express link (CXL), a cache coherent interconnect for accelerators (CCIX), or the like. The bus 340 may be divided into an address bus, a data bus, a control bus, and so on, and may include a power bus, a control bus, and a status signal bus in addition to the data bus. For clarity of illustration, the bus 340 is shown as a single bold line in Fig. 3, but this does not mean that there is only one bus or only one type of bus.
The method and the device provided in the embodiments of the present application are based on the same inventive concept. Because the principles by which the method and the device solve the problem are similar, their embodiments, implementations, and examples may refer to one another, and repeated descriptions are omitted. Embodiments of the present application also provide a system that includes a plurality of computing devices, each of which may be structured as described above. For the functions or operations that the system can implement, reference may be made to the specific implementation steps in the above method embodiments and/or the specific functions described in the above apparatus embodiments, which are not repeated here.
Embodiments of the present application also provide a computer-readable storage medium having stored therein computer instructions which, when executed on a computer device (e.g., one or more processors), implement the method steps in the above-described method embodiments. For the specific manner in which the processor executes the above method steps, reference may be made to the specific operations described in the above method embodiments and/or the specific functions described in the above apparatus embodiments, which are not repeated here.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. The present application may therefore take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Embodiments of the present application may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above-described embodiments may take the form of a computer program product embodied on one or more computer-usable storage media having computer-usable program code embodied therein. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another by wired means (e.g., coaxial cable, optical fiber, digital subscriber line) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that contains one or more collections of available media. The available media may be magnetic media (e.g., floppy disks, hard disks, tape), optical media, or semiconductor media. The semiconductor medium may be a solid state disk, or may be a random access memory, flash memory, read-only memory, erasable programmable read-only memory, electrically erasable programmable read-only memory, register, or any other suitable form of storage medium.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. Each flow and/or block of the flowchart and/or block diagrams, and combinations of flows and/or blocks in the flowchart and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments. It will be apparent to those skilled in the art that various modifications and variations can be made to the embodiments of the present application without departing from their spirit and scope. The steps in the methods of the embodiments of the present application may be reordered, combined, or deleted according to actual needs; the modules in the systems of the embodiments of the present application may be divided, combined, or deleted according to actual needs. Such modifications and variations are intended to be covered by the present application provided they fall within the scope of the claims and their equivalents.

Claims (17)

1. A data pre-reading method for a direct memory access read operation, the data pre-reading method comprising:
transmitting, by a first device, a plurality of direct memory access read messages to a second device, wherein a first direct memory access read message is any one of the plurality of direct memory access read messages, the first direct memory access read message indicating access to a first data space of a second memory of the second device and reading of first data located in the first data space for storage in a first memory of the first device;
transmitting, by the second device, to the first device a plurality of direct memory access response messages in one-to-one correspondence with the plurality of direct memory access read messages, wherein a first direct memory access response message corresponds to the first direct memory access read message, and the first direct memory access response message indicates a data length of remaining data in the first data space relative to the first data;
and selectively initiating, by the first device, a first direct memory access read operation for the first data or a second direct memory access read operation for the first data space based on a data length of remaining data in the first data space relative to the first data indicated by the first direct memory access response message.
2. The data pre-reading method of claim 1, wherein the second direct memory access read operation comprises performing a full data read of the first data space.
3. The data pre-reading method of claim 1, wherein the second direct memory access read operation comprises reading the first data and the remaining data in the first data space relative to the first data.
4. The data pre-reading method of claim 1, wherein selectively initiating, by the first device, the first direct memory access read operation for the first data or the second direct memory access read operation for the first data space based on the data length of the remaining data in the first data space relative to the first data indicated by the first direct memory access response message comprises:
judging, by the first device, whether the remaining data is trace data; initiating the first direct memory access read operation for the first data when the remaining data is not trace data; and initiating the second direct memory access read operation for the first data space when the remaining data is trace data.
5. The data pre-reading method of claim 4, wherein the second device does not send the remaining data to the first device when the remaining data is not trace data.
6. The data pre-reading method of claim 4, wherein the first device judges that the remaining data is trace data when the data length of the remaining data is smaller than a preset data length.
7. The data pre-reading method of claim 4, wherein the first device judges that the remaining data is trace data when the data length of the remaining data is smaller than the data length of the header of the first direct memory access read message.
8. The data pre-reading method of claim 4, wherein the first device judges that the remaining data is trace data when the data length of the remaining data is smaller than the data length of the header of the first direct memory access response message.
9. The data pre-reading method of claim 4, wherein the second direct memory access read operation initiated by the first device for the first data space when the remaining data is trace data comprises performing a data pre-read of the remaining data for storage in the first memory of the first device.
10. The data pre-reading method of claim 1, wherein the first data is located in a first memory address region of the second memory of the second device, the first data space being one or more partitions in the second memory, the one or more partitions collectively comprising the first memory address region.
11. The data pre-reading method of claim 10, wherein the second direct memory access read operation for the first data space comprises performing a full data read of each of the one or more partitions.
12. The data pre-reading method of claim 4, wherein the second direct memory access read operation initiated by the first device for the first data space comprises reading the first data and the remaining data in the first data space relative to the first data for storage in the first memory of the first device.
13. The data pre-reading method of claim 12, wherein a second direct memory access read message is the next direct memory access read message, relative to the first direct memory access read message, among the plurality of direct memory access read messages, the second direct memory access read message indicating at least access to the first data space of the second memory of the second device and reading of second data in the remaining data located in the first data space, and the first device reads the second data from the remaining data stored in the first memory of the first device.
14. The data pre-reading method of claim 13 wherein the second direct memory access read message further indicates to access a second data space of the second memory of the second device that is different from the first data space and to read third data located in the second data space, the first device reading the third data by initiating a third direct memory access read operation on the third data.
15. The data pre-reading method according to any one of claims 1 to 14, wherein the first device is a peripheral and the second device is a host, the first memory being the memory of the peripheral and the second memory being the memory of the host.
16. A computer device, characterized in that the computer device comprises a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method according to any one of claims 1 to 15 when executing the computer program.
17. A computer readable storage medium storing computer instructions which, when run on a computer device, cause the computer device to perform the method of any one of claims 1 to 15.
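To make the claimed pre-read decision concrete, the following C sketch models the selective read of claims 1, 4, and 6 to 9 under stated assumptions: the fixed buffers, the offsets, the dma_read and issue_read helpers, and the 16-byte header threshold are illustrative inventions of this sketch, not definitions taken from the patent.

#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed "trace data" threshold: the header length of a response
 * message, as in claim 8. The value 16 is an illustrative guess. */
#define RESP_HDR_LEN ((size_t)16)

static unsigned char second_memory[4096]; /* stands in for the host's second memory */
static unsigned char first_memory[4096];  /* stands in for the peripheral's first memory */

/* Stand-in for the DMA engine: copy len bytes from the second memory
 * into the first memory. A real device would post descriptors on the bus. */
static void dma_read(size_t src_off, size_t len, size_t dst_off)
{
    memcpy(first_memory + dst_off, second_memory + src_off, len);
}

/* Decision taken by the first device on receiving a response message that
 * reports how many bytes remain in the first data space beyond the first data. */
static void issue_read(size_t space_off, size_t first_len, size_t remaining_len)
{
    if (remaining_len < RESP_HDR_LEN) {
        /* Remaining data is trace data: pre-read the whole space in one
         * operation, so a later read of the tail can be served locally
         * from first_memory (cf. claim 13) without another read message. */
        dma_read(space_off, first_len + remaining_len, 0);
        puts("second read operation: pre-read the whole first data space");
    } else {
        /* The tail is large enough to justify its own read message later:
         * fetch only the first data now. */
        dma_read(space_off, first_len, 0);
        puts("first read operation: read only the first data");
    }
}

int main(void)
{
    issue_read(0, 512, 8);   /* 8 bytes remain   -> whole-space pre-read */
    issue_read(0, 512, 256); /* 256 bytes remain -> read first data only */
    return 0;
}

Using a header length as the trace-data threshold (claims 7 and 8) is a natural cutoff: a tail shorter than one message header would presumably cost more in protocol overhead to fetch separately than it returns in payload.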
CN202310575644.2A 2023-05-22 2023-05-22 Data pre-reading method and medium for direct memory access read operation Active CN116340246B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310575644.2A CN116340246B (en) 2023-05-22 2023-05-22 Data pre-reading method and medium for direct memory access read operation

Publications (2)

Publication Number Publication Date
CN116340246A true CN116340246A (en) 2023-06-27
CN116340246B CN116340246B (en) 2023-08-18

Family

ID=86882645

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310575644.2A Active CN116340246B (en) 2023-05-22 2023-05-22 Data pre-reading method and medium for direct memory access read operation

Country Status (1)

Country Link
CN (1) CN116340246B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105893323A (en) * 2016-05-23 2016-08-24 华为技术有限公司 Data reading method and data reading equipment
US20190347043A1 (en) * 2018-05-09 2019-11-14 Micron Technology, Inc. Prefetch signaling in memory system or sub-system
CN114691024A (en) * 2020-12-31 2022-07-01 华为技术有限公司 Data prefetching method, device and equipment
WO2022218160A1 (en) * 2021-04-14 2022-10-20 华为技术有限公司 Data access system and method, and device and network card
CN114816240A (en) * 2022-03-30 2022-07-29 阿里巴巴(中国)有限公司 Data writing method and data reading method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YANG Hongzhang et al., "Research on Data Pre-reading Mechanism among Small Files Based on pNFS", Journal of Computer Research and Development, vol. 51, pp. 57-66 *

Also Published As

Publication number Publication date
CN116340246B (en) 2023-08-18

Similar Documents

Publication Publication Date Title
WO2022105805A1 (en) Data processing method and in-memory computing chip
CN109992405B (en) Method and network card for processing data message
US11922304B2 (en) Remote artificial intelligence (AI) acceleration system
CN115934623B (en) Data processing method, device and medium based on remote direct memory access
CN116049085A (en) Data processing system and method
US8135851B2 (en) Object request broker for accelerating object-oriented communications and method
CN115202573A (en) Data storage system and method
CN115934625B (en) Doorbell knocking method, equipment and medium for remote direct memory access
CN116340246B (en) Data pre-reading method and medium for direct memory access read operation
CN110750210B (en) Storage system
TW202008172A (en) Memory system
CN116450554A (en) Interrupt processing method, root complex device and electronic device
CN115576661A (en) Data processing system, method and controller
US20230409506A1 (en) Data transmission method, device, network system, and storage medium
CN115604198B (en) Network card controller, network card control method, equipment and medium
CN116032498A (en) Memory area registration method, device and equipment
CN117573602B (en) Method and computer device for remote direct memory access message transmission
US8176117B2 (en) Accelerator for object-oriented communications and method
CN112148453A (en) Computing chip for privacy computation and network computing system
CN117573603B (en) Data processing method and computer equipment for remote direct memory access
CN117527654B (en) Method and system for analyzing network traffic packet
CN117573602A (en) Method and computer device for remote direct memory access message transmission
CN116932454B (en) Data transmission method, device, electronic equipment and computer readable storage medium
WO2024041140A1 (en) Data processing method, accelerator, and computing device
CN117520607B (en) Stream table compression method, computer equipment and medium

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant