WO2023098031A1 - Data access method and computing device - Google Patents

Data access method and computing device

Info

Publication number
WO2023098031A1
Authority
WO
WIPO (PCT)
Prior art keywords
memory
access request
processing unit
data processing
access
Prior art date
Application number
PCT/CN2022/099520
Other languages
English (en)
French (fr)
Inventor
覃国
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2023098031A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 - Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 - Addressing or allocation; Relocation
    • G06F12/0223 - User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/023 - Free address space management
    • G06F12/08 - Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0877 - Cache access modes
    • G06F13/00 - Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 - Information transfer, e.g. on bus
    • G06F13/40 - Bus structure
    • G06F13/4004 - Coupling between buses
    • G06F13/4022 - Coupling between buses using switching circuits, e.g. switching matrix, connection or expansion network
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 - Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 - Multiprogramming arrangements
    • G06F9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 - Allocation of resources, e.g. of the central processing unit [CPU], to service a request
    • G06F9/5011 - Allocation of resources to service a request, the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016 - Allocation of resources to service a request, the resource being the memory
    • G06F2213/00 - Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F2213/0026 - PCI express

Definitions

  • The present application relates to the field of data storage, and in particular to a data access method and a computing device.
  • Data storage is usually performed by the central processing unit (CPU) on the host side accessing the memory on the host side, which continuously occupies and consumes processor resources. The working time of a processor purchased at high cost is better spent on high-value computing tasks, so having the processor carry out data storage is clearly not an optimal solution.
  • To address this, the prior art proposes adding a data processing unit (DPU) connected to the host-side processor to access the host-side memory and take over the data storage work originally performed by the host-side processor, so that the processor can devote as much computing power as possible to computing services such as the virtual machines running on it.
  • The data processing unit, however, is usually arranged outside the host, so the hardware needs to be modified before the data processing unit can exchange data with the memory on the host side.
  • To this end, the prior art proposes connecting the hardware memory on the host side to the data processing unit through hardware cables, so that the data processing unit can access the host's hardware memory.
  • However, arranging hardware cables increases the complexity of the data processing unit's hardware design.
  • The data access method of the embodiments of the present application is executed by the data processing unit of the computing device. With this method, the data processing unit can directly access the memory on the host side, the hardware connections are simplified, and the complexity of the hardware design is reduced.
  • In a first aspect, an embodiment of the present application provides a data access method. The method is executed by a data processing unit connected to a processor and includes: writing an access request into a first cache queue of the processor, where the access request includes an identifier of the memory to be accessed and the memory is connected to the processor but not directly connected to the data processing unit; and sending a first instruction through high-speed peripheral component interconnect (PCIE) point-to-point transmission to the memory pointed to by the identifier, where the first instruction includes the position information of the access request in the first cache queue and is used to instruct the memory to acquire the access request from the first cache queue and execute it.
  • In this way, the data processing unit first writes the access request, which includes the identifier of the memory to be accessed, into the first cache queue of the processor, so that the access request is stored in that queue; it then sends the first instruction through PCIE point-to-point transmission to the memory pointed to by the identifier, so that the memory to be accessed receives the first instruction. The first instruction includes the location information of the access request in the first cache queue and instructs the memory to obtain the access request from the first cache queue and execute it; after receiving the first instruction, the memory can therefore obtain the access request from the corresponding position in the first cache queue and execute it.
  • Therefore, the data processing unit can directly access the memory connected to the processor without going through the host's processor, and thus does not occupy host resources.
  • Moreover, this access method simplifies the hardware connections and reduces the complexity of the hardware design.
  • In a possible implementation, the access request further includes the type of the access request and an access memory address, where the access memory address is an address allocated by the processor for the memory of the data processing unit, the type of the access request is used to indicate whether the access request is a read request or a write request, and the access memory address is an address accessible to the memory.
  • In this way, the type of the access request and the address of the access object can be determined, so that the memory can decide, according to the type of the access request, whether to perform a read operation or a write operation, and the data processing unit's data access to the memory under the corresponding operation type is completed.
  • In a possible implementation, the method further includes: obtaining the execution result of the access request from a second cache queue of the processor.
  • Since the data processing unit is connected to the processor, it can acquire the execution result of the access request from the second cache queue in the processor and thereby determine the execution status of the access request. The data processing unit does not need to cache the execution result of the access request, which reduces the storage space it occupies.
  • In a possible implementation, the method further includes: receiving the execution result of the access request transmitted by the memory through PCIE point-to-point transmission.
  • In this way, the execution result of the access request does not need to be transmitted from the memory to the data processing unit via the processor but is transmitted from the memory to the data processing unit directly, which improves the transmission efficiency of the execution result.
  • In a possible implementation, the method further includes: transmitting the address of the memory of the data processing unit to the processor; and receiving the access memory address from the processor.
  • In this way, the data processing unit obtains an access memory address that the memory can access, and this address can serve as the destination address when the memory transmits data to the data processing unit through PCIE point-to-point transmission, so that the data processing unit and the memory can transfer data even though they are not directly connected.
  • In a possible implementation, the method further includes: receiving the addresses of the first cache queue and the second cache queue sent by the processor, where the address of the first cache queue is used to write the access request into the first cache queue of the processor, and the address of the second cache queue is used to obtain the execution result of the access request from the second cache queue.
  • In this way, the data processing unit can find the first cache queue according to the received address of the first cache queue and write the access request directly into it, and can find the second cache queue according to the received address of the second cache queue and obtain the execution result of the access request directly from it. This ensures that access requests are written, and their execution results obtained, promptly and accurately.
  • An embodiment of the present application further provides a data access method. The method is applied to a computing device that includes a processor, a memory connected to the processor, and a data processing unit connected to the processor and not directly connected to the memory. The method includes: the data processing unit writes an access request into a first cache queue of the processor, the access request including an identifier of the memory to be accessed; the data processing unit sends a first instruction through high-speed peripheral component interconnect (PCIE) point-to-point transmission to the memory pointed to by the identifier, the first instruction indicating the location of the access request in the first cache queue; and the memory acquires the access request from the first cache queue according to the first instruction and executes it.
  • In this way, when the memory is connected to the processor and not directly connected to the data processing unit, the computing device can still access the memory directly without going through the host's processor, and thus does not occupy the host's resources. Moreover, this access method simplifies the hardware connections and reduces the complexity of the hardware design.
  • In a possible implementation, the access request further includes an access memory address, which is an address allocated by the processor for the memory of the data processing unit and is accessible to the memory; executing the access request includes: the memory writes data into the memory of the data processing unit, or reads data from the memory of the data processing unit, through PCIE point-to-point transmission according to the access memory address.
  • In this way, the memory can locate the access object, that is, the memory of the data processing unit, according to the access memory address, and execute the access request by writing data to or reading data from the memory of the data processing unit, completing the data processing unit's access to the memory.
  • In a possible implementation, the access request further includes the type of the access request, which indicates whether the access request is a read request or a write request. When the access request is a read request, executing the access request includes: reading data from the memory according to the access request, and writing the data read from the memory into the memory of the data processing unit through PCIE point-to-point transmission according to the access memory address. When the access request is a write request, executing the access request includes: obtaining the data to be written into the memory from the memory of the data processing unit through PCIE point-to-point transmission according to the access memory address, and writing the obtained data into the memory.
  • In this way, when the access request is a read request, the memory writes data to the memory of the data processing unit, which is equivalent to the data processing unit reading the memory; when the access request is a write request, the memory reads data from the memory of the data processing unit, which is equivalent to the data processing unit writing to the memory. Thus, for different types of access requests, the memory can respond with the operation that the request requires.
  • In a possible implementation, the method further includes: the data processing unit sends the address of its memory to the processor; the processor assigns the access memory address to the memory of the data processing unit according to that address; and the processor sends the access memory address to the data processing unit.
  • In this way, the processor can allocate an access memory address for the memory of the data processing unit, and this address can serve as the destination address when data is transmitted through PCIE point-to-point transmission, so that the data processing unit and the memory can transfer data even though they are not directly connected. Allocating and sending the access memory address can be carried out in the configuration stage, so that once the software in the data processing unit generates an access request it can use the access memory address directly, which improves the efficiency with which the computing device executes access requests.
  • In a possible implementation, the method further includes: the memory writes the execution result of the access request into a second cache queue of the processor; and the data processing unit obtains the execution result of the access request from the second cache queue of the processor.
  • In a possible implementation, the method further includes: the data processing unit receives the execution result of the access request transmitted by the memory through PCIE point-to-point transmission.
  • In a possible implementation, the method further includes: the data processing unit receives the addresses of the first cache queue and the second cache queue sent by the processor, where the address of the first cache queue is used to write the access request into the first cache queue of the processor, and the address of the second cache queue is used to obtain the execution result of the access request from the second cache queue.
  • In this way, the data processing unit can find the first cache queue according to the received address, so that it can accurately and quickly write the access request into the first cache queue and obtain the execution result of the access request from the second cache queue.
  • Creating the first and second cache queues and sending their addresses can be carried out in the configuration stage, so that once the software in the data processing unit generates an access request it can use the addresses of the first and second cache queues directly, which improves the efficiency with which the computing device executes access requests.
  • An embodiment of the present application further provides a computing device, including: a processor; a memory connected to the processor; and a data processing unit connected to the processor and not directly connected to the memory, where the data processing unit is configured to execute the data access method of the above first aspect or of one or more of its possible implementations.
  • The embodiments of the present application further provide a non-volatile computer-readable storage medium on which computer program instructions are stored; when the computer program instructions are executed by a processor, the data access method of the above first aspect or of one or more of its possible implementations is realized.
  • The embodiments of the present application further provide a computer program product, including computer-readable code or a non-volatile computer-readable storage medium bearing computer-readable code; when the computer-readable code runs in an electronic device, the processor in the electronic device executes the data access method of the above first aspect or of one or more of its possible implementations.
  • Fig. 1 shows an implementation manner of a data storage function of a computing device in the prior art.
  • Fig. 2 shows another implementation of the data storage function of a computing device in the prior art.
  • Fig. 3 shows an exemplary application scenario of a computing device according to an embodiment of the present application.
  • FIG. 4 illustrates an exemplary workflow of a computing device according to an embodiment of the present application.
  • Fig. 5 shows an example of the data processing unit sending the memory address of the data processing unit to the processor according to the embodiment of the present application.
  • FIG. 6 shows an example in which a processor obtains an access memory address and sends an access memory address according to an embodiment of the present application.
  • FIG. 7 shows an example in which a processor creates a buffer queue and sends an address of the buffer queue according to an embodiment of the present application.
  • Fig. 8 shows an example in which a data processing unit writes an access request into a first cache queue of a processor according to an embodiment of the present application.
  • FIG. 9 shows an example of the data processing unit generating a first instruction and sending the first instruction to a memory according to an embodiment of the present application.
  • FIG. 10 shows an example of a memory acquiring an access request and executing the access request according to an embodiment of the present application.
  • FIG. 11 shows an example in which the memory writes the execution result of the access request into the second cache queue according to the embodiment of the present application.
  • FIG. 12 shows an example of an execution result of the data processing unit acquiring an access request according to an embodiment of the present application.
  • FIG. 13 shows another exemplary workflow of a computing device according to an embodiment of the present application.
  • FIG. 14 shows an example of writing the execution result of the access request into the data processing unit by the memory according to the embodiment of the present application.
  • FIG. 15 shows a schematic diagram of an exemplary workflow of a data access method according to an embodiment of the present application.
  • Fig. 16 shows a schematic diagram of an exemplary workflow of a data access method according to an embodiment of the present application.
  • Fig. 17 shows an exemplary structural diagram of a computing device according to an embodiment of the present application.
  • Fig. 1 shows an implementation manner of a data storage function of a computing device in the prior art.
  • the computing device shown in FIG. 1 is an example of a computing device under a hyper-converged infrastructure.
  • Hyper-converged infrastructure (HCI) is a technology that integrates virtualized computing and storage into the same system platform; storage and computing functions can be integrated into a single computing device (or a cluster of computing devices, each of which provides computing and storage capabilities).
  • In a computing device under a hyper-converged infrastructure, the host side includes a processor (such as a central processing unit), a SAS expander, a SAS disk array chip, and memory.
  • The memory may include, for example, serial attached SCSI hard disk drives/solid state drives (SAS HDD/SSD) and/or non-volatile memory express solid state drives (NVMe SSD).
  • The processor and the NVMe SSD memory are connected through a high-speed peripheral component interconnect express (PCIE) cable, the processor is also connected to the SAS disk array chip through a PCIE cable, and the SAS disk array chip is connected to the HDD/SSD memory through the SAS expander.
  • The processor runs a storage virtual machine and multiple user virtual machines, where the storage virtual machine runs distributed storage software and each user virtual machine runs its own application software. The distributed storage software issues access requests for the host-side memory, and according to these access requests the processor can directly access the memory on the host side through PCIE, so as to provide storage services for the application software in the user virtual machines.
  • Fig. 2 shows another way in the prior art to implement the data storage function of a computing device.
  • In this implementation, the data processing unit is connected to the host-side processor (such as the central processing unit) through a PCIE cable and replaces the host-side processor in running the distributed storage software, so as to provide data storage services to the application software of the user virtual machines in the host; this leaves as much processor computing power as possible for running the user virtual machines' application software. In this case, the problem of how the data processing unit accesses the memory on the host side must be solved.
  • For this purpose, the prior art proposes connecting the data processing unit to the host-side memory through hardware cables, so that the data processing unit can access the host-side memory.
  • For example, the data processing unit and the NVMe SSD memory can be connected through a PCIE cable, and the data processing unit and the HDD/SSD memory can be connected through a SAS cable.
  • However, this solution places the following requirements on the hardware of the data processing unit:
  • First, the SAS disk array chip is usually designed to be attached to the processor, so the data processing unit needs to integrate a new SAS disk array chip in order to connect to the SAS expander. Secondly, the data processing unit needs enough PCIE ports, or needs to integrate a PCIE switch chip (not shown), so as to connect multiple NVMe SSD memories through the PCIE ports or the PCIE switch chip.
  • The data access method of the embodiments of the present application is executed by the data processing unit of the computing device. With this method, the data processing unit can directly access the memory on the host side, the hardware connections are simplified, and the complexity of the hardware design is reduced.
  • Fig. 3 shows an exemplary application scenario of a computing device according to an embodiment of the present application.
  • The host side of the computing device of the embodiment of the present application includes a processor (such as a central processing unit) and a plurality of memories (which may include NVMe SSD memories and HDD/SSD memories); the data processing unit is connected to the processor through a PCIE cable and is not connected to any of the memories.
  • the data processing unit is not arranged on the host side of the computing device.
  • the processor can run at least one user virtual machine, and each user virtual machine also runs its own application software.
  • Application software has data storage requirements.
  • The processor and the NVMe SSD memory are connected through a PCIE cable, the processor is also connected to the SAS disk array chip through a PCIE cable, and the SAS disk array chip is connected to the HDD/SSD memory through the SAS expander.
  • The data processing unit can run distributed storage software. When the distributed storage software, or other software (not shown) run by the data processing unit, generates an access request, the data processing unit executes the data access method of the embodiment of the present application and accesses the memory on the host side through PCIE point-to-point transmission, so as to provide data storage services to the application software in the user virtual machines on the host-side processor.
  • The computing device in FIG. 3 can be applied, for example, to a hyper-converged infrastructure, and can also be applied to other scenarios in which a data processing unit that is connected to the host-side processor but not to the host-side memory accesses the host-side memory.
  • This application does not limit the specific application scenarios of the computing device.
  • FIG. 4 illustrates an exemplary workflow of a computing device according to an embodiment of the present application.
  • the following describes an exemplary method for implementing a data processing unit accessing a hardware memory on a host side in an embodiment of the present application with reference to FIG. 4 .
  • The computing device in the embodiment of the present application can be configured before it is used, so that the data processing unit can respond to access requests directly, without each device or module in the computing device having to be reconfigured to prepare for an access request whenever one is generated. To maximize the user experience, this configuration can be done, for example, before the computing device leaves the factory. When a user uses the computing device, the user can perform various operations on the application software running on the processor; some operations involve reading and writing data, and the application software then generates corresponding storage requirements.
  • the processor can instruct the distributed storage software or other software (not shown) run by the data processing unit connected to it to generate an access request.
  • the data processing unit may write the access request into the cache queue in the processor, so that the memory can obtain and execute the access request from the cache queue, so as to achieve the purpose of accessing the hardware memory on the host side.
  • the computing device may execute steps S1-S3 to configure the data processing unit, processor and memory. After the data processing unit, processor and memory are configured, when the distributed storage software generates an access request, the computing device can perform steps S4-S8 to enable the data processing unit to access the hardware memory on the host side.
  • An exemplary method for configuring a data processing unit, a processor, and a memory is firstly introduced below with reference to FIG. 4 .
  • Step S1: the data processing unit sends the address of its memory to the processor.
  • Fig. 5 shows an example of the data processing unit sending the memory address of the data processing unit to the processor according to the embodiment of the present application.
  • the data processing unit includes a memory for storing data used or generated by the data processing unit when executing tasks or instructions.
  • The memory of the data processing unit may include a data page area, which is used to store data that the data processing unit will write to other devices or modules (such as the memory) in the computing device, as well as data written into it by other devices or modules (such as the memory) in the computing device.
  • the address of the memory of the data processing unit may be the address of the data page area.
  • The memory is located on the host side; therefore, from the memory's perspective, any object with which it exchanges data through PCIE point-to-point transmission should also have a host-side address.
  • The data processing unit is not arranged on the host side; therefore, the address of its memory is not a host-side address and cannot serve as the address of the object with which the memory transmits data through PCIE point-to-point transmission. Since the data processing unit is connected to the host-side processor, in step S1 the data processing unit may first send the address of its memory to the host-side processor, so that the processor can produce a host-side address corresponding to the address of the data processing unit's memory (see step S2 below for an example).
  • Step S2: the processor assigns an access memory address to the memory of the data processing unit according to the address of the memory of the data processing unit, and sends the access memory address to the data processing unit.
  • FIG. 6 shows an example in which a processor obtains an access memory address and sends an access memory address according to an embodiment of the present application.
  • The processor is located on the host side of the computing device; therefore, an address generated by the processor is a host-side address and can be used as the address of the transfer object when the memory transfers data through PCIE point-to-point transmission.
  • The processor can allocate a corresponding storage area for the memory of the data processing unit in its storage space, and the access memory address can be the address of that storage area.
  • When the access memory address is used as the address of the object with which the memory transmits data through PCIE point-to-point transmission, the memory of the data processing unit corresponding to the storage area to which the access memory address belongs becomes the transmission object of the memory, and a PCIE point-to-point transmission channel between the memory and the data processing unit is thereby established.
  • the access memory address can be used when data is transmitted between the data processing unit and the memory through PCIE point-to-point transmission (see step S6 below for an example).
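  • The address exchange in steps S1-S2 can be illustrated with a short C sketch. All structure and function names below are hypothetical (the patent does not define a concrete interface); the sketch only shows the idea that the data processing unit reports the address of its data page area and the host processor replies with a host-side access memory address that maps onto it.

```c
/* Sketch of the S1/S2 configuration exchange (assumed names and values). */
#include <stdint.h>
#include <stdio.h>

struct dpu_mem_report {       /* step S1: DPU -> host processor */
    uint64_t dpu_local_addr;  /* address of the DPU's data page area */
    uint64_t length;          /* size of the area in bytes */
};

struct host_mem_grant {       /* step S2: host processor -> DPU */
    uint64_t access_mem_addr; /* host-side address the memory may target over PCIE */
    uint64_t length;
};

/* Stand-in for the platform mechanism (e.g. a PCIE window on the DPU) that
 * exposes the DPU memory at a host-visible address; the value is illustrative. */
static uint64_t map_dpu_window(uint64_t dpu_addr, uint64_t len)
{
    (void)len;
    return 0x8000000000ULL + (dpu_addr & 0xFFFFFFFFULL);
}

static struct host_mem_grant assign_access_address(const struct dpu_mem_report *rep)
{
    struct host_mem_grant g = {
        .access_mem_addr = map_dpu_window(rep->dpu_local_addr, rep->length),
        .length          = rep->length,
    };
    return g;
}

int main(void)
{
    struct dpu_mem_report rep = { .dpu_local_addr = 0x1000, .length = 4096 };
    struct host_mem_grant g = assign_access_address(&rep);
    printf("access memory address: 0x%llx\n", (unsigned long long)g.access_mem_addr);
    return 0;
}
```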
  • Step S3: the processor creates a first cache queue and a second cache queue corresponding to the memory, and sends the storage addresses of the first cache queue and the second cache queue in the processor to the data processing unit and the memory, respectively.
  • FIG. 7 shows an example in which a processor creates a buffer queue and sends an address of the buffer queue according to an embodiment of the present application.
  • The first cache queue may be a submission queue (SQ) for storing access requests, and the second cache queue may be a completion queue (CQ) for storing the execution results of the access requests.
  • The first cache queue and the second cache queue can be stored in the memory of the processor; since the processor is connected both to the data processing unit and to the memory, the storage addresses of the two queues in the processor can be transmitted to the data processing unit and the memory respectively.
  • The data processing unit can write the access request into the first cache queue according to the first cache queue's storage address in the processor (see step S4 below for an example), and can read the execution result of the access request from the second cache queue according to the second cache queue's storage address in the processor (see step S8 below for an example). The memory can read the access request from the first cache queue according to the first cache queue's storage address in the processor (see step S6 below for an example), and can write the execution result of the access request into the second cache queue according to the second cache queue's storage address in the processor (see step S7 below for an example). In this way, the access request and its execution result can be exchanged between the data processing unit and the memory.
  • Step S3 can also be executed before step S1 or step S2, or executed simultaneously with step S1 or step S2. This application does not limit the execution order of step S3 and step S1 and the execution order of step S3 and step S2.
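  • As a rough illustration of step S3, the sketch below (hypothetical types and queue depth) shows the processor allocating one submission queue and one completion queue in its own memory; the two queue addresses are what it then sends to the data processing unit and to the memory.

```c
/* Sketch: the processor creates the queue pair corresponding to one memory. */
#include <stdint.h>
#include <stdlib.h>

#define QUEUE_DEPTH 64            /* illustrative queue depth */

struct sqe { uint8_t raw[64]; };  /* submission queue entry (access request) */
struct cqe { uint8_t raw[16]; };  /* completion queue entry (execution result) */

struct queue_pair {
    struct sqe *sq;       /* first cache queue: stores access requests */
    struct cqe *cq;       /* second cache queue: stores execution results */
    uint32_t    depth;
    uint32_t    sq_tail;  /* pointer (doorbell value) of the first cache queue */
    uint32_t    cq_head;  /* next completion entry the DPU will read */
};

/* After this returns, the addresses qp->sq and qp->cq are sent to the data
 * processing unit and to the memory, which both reach them over PCIE. */
static int create_queue_pair(struct queue_pair *qp)
{
    qp->sq = calloc(QUEUE_DEPTH, sizeof(struct sqe));
    qp->cq = calloc(QUEUE_DEPTH, sizeof(struct cqe));
    if (!qp->sq || !qp->cq) {
        free(qp->sq);
        free(qp->cq);
        return -1;
    }
    qp->depth   = QUEUE_DEPTH;
    qp->sq_tail = 0;
    qp->cq_head = 0;
    return 0;
}
```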
  • Step S4: the data processing unit writes the access request into the first cache queue of the processor.
  • In step S3 the data processing unit received the storage address of the first cache queue; therefore, the location of the first cache queue in the processor is known to the data processing unit.
  • the data processing unit may write the access request into the first cache queue according to the address of the first cache queue.
  • Fig. 8 shows an example in which a data processing unit writes an access request into a first cache queue of a processor according to an embodiment of the present application.
  • The data processing unit can first convert the access request into the entry format of the submission queue (the first cache queue), that is, a submission queue entry (SQE); the converted entry can include an operation code (opcode), a scatter gather list (SGL), an identifier, and so on.
  • The first cache queue may include a plurality of entries; when the access request is written into the first cache queue of the processor, the entry converted from the access request may be written through direct memory access (DMA) to the position of an entry in the processor's first cache queue (for example, the entry at the tail of the queue).
  • The operation code can indicate the type of the access request; access requests can be divided into different types.
  • The access request can be a request for the data processing unit to read the memory (equivalent to the memory transferring data to the data processing unit); in this case the access request is a read request, and when the memory executes it the memory performs a write operation, that is, it writes data into the data processing unit (see step S6 below for an example).
  • The access request can also be a request for the data processing unit to write to the memory (equivalent to the data processing unit transmitting data to the memory); in this case the access request is a write request, and when the memory executes it the memory performs a read operation, that is, it reads the data in the data processing unit's memory (see step S6 below for an example).
  • In this way, the type of the access request can be written into the first cache queue of the processor, so that the memory can obtain the type of the access request from the first cache queue and determine whether executing the access request requires a read operation or a write operation.
  • The scatter gather list can store the access memory address. As described for step S2 above, the access memory address can be used as the address of the transfer object when the memory transfers data through PCIE point-to-point transmission. By storing the access memory address in the scatter gather list, the access memory address is written into the first cache queue of the processor, so that when the memory executes the access request it can find the object of the read or write operation according to the access memory address (in this example, the data page area of the data processing unit).
  • The identifier can be used to identify the memory to be accessed.
  • the memory to be accessed may be any of the memories on the host side.
  • Each memory on the host side may correspond to a first cache queue.
  • each first cache queue may also include an identifier of the memory corresponding to the first cache queue.
  • the converted entry of the access request may include more content, such as the length of the access request, and the present application does not limit the specific information included in the access request.
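  • As a concrete (purely hypothetical) illustration of such an entry, a submission queue entry could be laid out as follows; the field names and widths are assumptions rather than the patent's definition.

```c
/* Hypothetical layout of a submission queue entry (SQE) for step S4. */
#include <stdint.h>

enum sqe_opcode {
    SQE_OP_READ  = 1,  /* DPU reads the memory: the memory writes into DPU memory */
    SQE_OP_WRITE = 2,  /* DPU writes the memory: the memory reads from DPU memory */
};

struct sgl_entry {
    uint64_t access_mem_addr;  /* host-side address allocated in step S2 */
    uint32_t length;           /* number of bytes at that address */
    uint32_t reserved;
};

struct sqe {
    uint8_t          opcode;      /* type of the access request */
    uint8_t          reserved[3];
    uint32_t         memory_id;   /* identifier of the memory to be accessed */
    uint64_t         request_len; /* length of the access request */
    struct sgl_entry sgl;         /* scatter gather list (a single entry here) */
};
```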
  • Step S5: the data processing unit generates a first instruction that includes the location information of the access request in the first cache queue, and sends the first instruction through PCIE point-to-point transmission to the memory pointed to by the identifier in the access request; the first instruction is used to instruct the memory to obtain the access request from the first cache queue and execute it.
  • The high-speed peripheral component interconnect point-to-point transmission mode (that is, the PCIE point-to-point transmission mode) is an existing data transmission mode in the prior art that enables two devices to exchange data without being directly connected.
  • Here, PCIE point-to-point transmission is used so that the first instruction can be transmitted between the data processing unit and the memory to be accessed, which are not directly connected.
  • the memory to be accessed as the sending object of the first instruction may be the memory corresponding to the identifier included in the access request in step S4.
  • the first instruction may include location information of the access request in the first cache queue, and the first instruction may be used to instruct the memory to acquire the access request from a first cache queue corresponding to the memory and execute the access request. That is, the queue from which the memory acquires the access request is determined by the correspondence between the first cache queue and the memory.
  • the memory to be accessed may obtain the access request from a corresponding position in a first cache queue corresponding to the memory according to the instruction of the first instruction, and then execute the access request.
  • FIG. 9 shows an example of the data processing unit generating a first instruction and sending the first instruction to a memory according to an embodiment of the present application.
  • The location information of the access request in the first cache queue can be recorded through a pointer (doorbell); the pointer of the first cache queue (that is, the doorbell of the first cache queue) can be obtained, and the first instruction can include the pointer of the first cache queue.
  • For example, before step S4 the pointer of the first cache queue may be "3", indicating that the most recently written access request was written to the 3rd entry of the first cache queue.
  • After step S4 is executed, an access request is newly written at the 4th entry of the first cache queue.
  • Accordingly, the pointer of the first cache queue can be updated to "4".
  • The data processing unit can then generate the first instruction, and the pointer of the first cache queue included in the first instruction may be "4".
  • The data processing unit may send the first instruction, which includes the pointer "4" of the first cache queue, to the memory to be accessed.
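  • A minimal sketch of steps S4-S5 on the data processing unit side is given below, assuming the memory exposes a doorbell register that the data processing unit can reach over PCIE point-to-point (the names are assumptions): after copying the entry into the first cache queue, the data processing unit advances the queue pointer and writes the new value to the memory's doorbell.

```c
/* Sketch: submit an access request and ring the memory's doorbell. */
#include <stdint.h>
#include <string.h>

struct sqe { uint8_t raw[64]; };  /* submission queue entry, as sketched above */

struct dpu_queue_view {
    struct sqe        *sq_base;     /* address of the first cache queue (from step S3) */
    uint32_t           depth;
    uint32_t           tail;        /* pointer of the first cache queue */
    volatile uint32_t *sq_doorbell; /* doorbell of the memory, reached over PCIE P2P */
};

static void submit_access_request(struct dpu_queue_view *q, const struct sqe *req)
{
    /* Step S4: place the converted entry at the tail slot of the first cache queue
     * (in a real system this would be a DMA write into the processor's memory). */
    memcpy(&q->sq_base[q->tail % q->depth], req, sizeof(*req));

    /* Step S5: advance the pointer (e.g. from "3" to "4") and send the first
     * instruction by writing the new pointer value to the memory's doorbell. */
    q->tail = (q->tail + 1) % q->depth;
    *q->sq_doorbell = q->tail;
}
```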
  • Step S6: the memory acquires the access request from the first cache queue according to the first instruction and executes the access request.
  • FIG. 10 shows an example of a memory acquiring an access request and executing the access request according to an embodiment of the present application.
  • The memory receives the first instruction from the data processing unit, and the first instruction instructs the memory to acquire the access request from the first cache queue and execute it.
  • the memory obtains an access request from the position of the fourth entry in the first cache queue.
  • the acquired item may be an entry converted from the access request.
  • According to the operation code in the converted entry, the memory can determine the type of the access request; according to the scatter gather list in the converted entry, the memory can determine the address of the object for data transmission in PCIE point-to-point mode (that is, the access memory address). Based on the type of the access request and the access memory address, the memory can then execute the access request.
  • The way the memory executes the access request may be to write data into the memory of the data processing unit, or read data from the memory of the data processing unit, through PCIE point-to-point transmission according to the access memory address.
  • When the type of the access request is a read request, the way the memory executes the access request may be to read data from the memory according to the access request, and then write the data read from the memory into the memory of the data processing unit (for example, into the data page area of the data processing unit's memory) through PCIE point-to-point transmission according to the access memory address.
  • When the type of the access request is a write request, the way the memory executes the access request may be to obtain the data to be written into the memory from the memory of the data processing unit (for example, from the data page area of the data processing unit's memory) through PCIE point-to-point transmission according to the access memory address, and then write the obtained data into the memory.
  • The access request may also include information for accessing a certain storage area of the memory, for example, an identifier of that storage area. When the type of the access request is a read request, the data stored in the storage area corresponding to the storage area identifier included in the access request may be read from the memory according to the access request. When the type of the access request is a write request, the acquired data may be written into the storage area corresponding to the storage area identifier included in the access request.
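  • The dispatch performed by the memory in step S6 can be summarized by the sketch below. The helper functions are placeholders for device-internal operations (media access and PCIE point-to-point data movement) and the field names are assumptions; the point is only that a read request turns into a write toward the data processing unit's memory, and a write request turns into a read from it.

```c
/* Sketch of how the memory (storage device) could execute a fetched entry. */
#include <stdint.h>

enum sqe_opcode { SQE_OP_READ = 1, SQE_OP_WRITE = 2 };

struct fetched_sqe {
    uint8_t  opcode;           /* type of the access request */
    uint64_t storage_area;     /* which storage area of the memory to access */
    uint64_t access_mem_addr;  /* DPU memory address, reachable over PCIE P2P */
    uint32_t length;
};

/* Placeholder helpers: access the storage media and move data over PCIE P2P. */
static void media_read(uint64_t area, void *buf, uint32_t len)             { (void)area; (void)buf; (void)len; }
static void media_write(uint64_t area, const void *buf, uint32_t len)      { (void)area; (void)buf; (void)len; }
static void p2p_write_to_dpu(uint64_t addr, const void *buf, uint32_t len) { (void)addr; (void)buf; (void)len; }
static void p2p_read_from_dpu(uint64_t addr, void *buf, uint32_t len)      { (void)addr; (void)buf; (void)len; }

static int execute_access_request(const struct fetched_sqe *e, void *bounce)
{
    switch (e->opcode) {
    case SQE_OP_READ:   /* read request: the memory writes data into DPU memory */
        media_read(e->storage_area, bounce, e->length);
        p2p_write_to_dpu(e->access_mem_addr, bounce, e->length);
        return 0;
    case SQE_OP_WRITE:  /* write request: the memory reads data from DPU memory */
        p2p_read_from_dpu(e->access_mem_addr, bounce, e->length);
        media_write(e->storage_area, bounce, e->length);
        return 0;
    default:
        return -1;      /* unknown type of access request */
    }
}
```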
  • Step S7: the memory writes the execution result of the access request into the second cache queue of the processor.
  • The access request is generated by software run by the data processing unit, such as the distributed storage software. Therefore, corresponding to the access request, an execution result of the access request can be transmitted back to the software that generated it, so as to inform that software of the execution status of the access request it generated.
  • There are two types of execution results of the access request: one indicates that the access request has been executed, and the other indicates that the access request has not been executed.
  • FIG. 11 shows an example in which the memory writes the execution result of the access request into the second cache queue according to the embodiment of the present application.
  • The memory may convert the execution result of the access request into the entry format of the completion queue (the second cache queue), that is, a completion queue entry (CQE); the converted entry may include an identifier and the like.
  • Different identifier values can correspond to different execution results. For example, an identifier of "1" can indicate that the entry marks an access request that has been executed, and an identifier of "0" can indicate that the entry marks an access request that has not yet been executed.
  • The second cache queue may include a plurality of entries; when the execution result of the access request is written into the second cache queue of the processor, the entry converted from the execution result may be written through direct memory access (DMA) to the position of an entry in the processor's second cache queue (for example, the entry at the tail of the queue).
  • Similar to the first cache queue, each second cache queue may also include the identifier of the memory to which it corresponds. In this way, according to the identifier included in the access request, the second cache queue with the same identifier can be found among the multiple second cache queues, and the execution result of the access request is then written into that second cache queue.
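  • A matching sketch for step S7 (again with assumed field names): the memory converts the execution result into a completion entry and appends it to the second cache queue in the processor's memory.

```c
/* Sketch: the memory posts the execution result into the second cache queue. */
#include <stdint.h>
#include <string.h>

struct cqe {
    uint8_t  done;       /* "1": access request executed, "0": not executed */
    uint8_t  reserved[3];
    uint32_t memory_id;  /* identifier of the memory that executed the request */
};

struct cq_view {
    struct cqe *cq_base; /* address of the second cache queue (from step S3) */
    uint32_t    depth;
    uint32_t    tail;    /* next free entry, e.g. the tail of the queue */
};

static void post_completion(struct cq_view *cq, uint32_t memory_id, int executed)
{
    struct cqe e;
    memset(&e, 0, sizeof(e));
    e.done      = executed ? 1 : 0;
    e.memory_id = memory_id;
    /* In hardware this would be a DMA write over PCIE into the processor's
     * memory; here it is modeled as a plain copy. */
    memcpy(&cq->cq_base[cq->tail % cq->depth], &e, sizeof(e));
    cq->tail = (cq->tail + 1) % cq->depth;
}
```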
  • Step S8: the data processing unit obtains the execution result of the access request from the second cache queue of the processor.
  • FIG. 12 shows an example of an execution result of the data processing unit acquiring an access request according to an embodiment of the present application.
  • In step S7 the memory writes the execution result of the access request into the second cache queue in the processor, and the data processing unit is not notified of this write operation. Therefore, from the data processing unit's perspective, the time at which the processor's second cache queue is updated is unknown.
  • The data processing unit may therefore be configured to check at a preset interval whether the second cache queue in the processor has been updated. When it detects that a newly written execution result has appeared in the second cache queue, it acquires that execution result and can transmit it to the software that generated the access request. The data processing unit does not need to cache the execution result of the access request, which reduces the storage space it occupies.
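  • Step S8 can then be pictured as a periodic polling routine on the data processing unit (structure names follow the completion entry sketch above; the convention of clearing the entry after it is consumed is an assumption made only for this sketch):

```c
/* Sketch: the DPU checks the second cache queue for newly written results. */
#include <stdint.h>

struct cqe { uint8_t done; uint8_t reserved[3]; uint32_t memory_id; };

struct cq_poller {
    volatile struct cqe *cq_base;  /* second cache queue in the processor's memory */
    uint32_t             depth;
    uint32_t             head;     /* next entry the DPU expects to read */
};

/* Called every preset interval; returns 1 and fills *out if a new execution
 * result was found, 0 otherwise. */
static int poll_completion(struct cq_poller *p, struct cqe *out)
{
    volatile struct cqe *e = &p->cq_base[p->head % p->depth];
    if (e->done == 0)
        return 0;                  /* nothing newly written yet */
    out->done      = e->done;      /* copy the execution result */
    out->memory_id = e->memory_id;
    e->done = 0;                   /* mark the slot as consumed (sketch-only convention) */
    p->head = (p->head + 1) % p->depth;
    return 1;                      /* caller forwards the result to the requesting software */
}
```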
  • FIG. 13 shows another exemplary workflow of a computing device according to an embodiment of the present application.
  • For steps S1-S6, refer to the related descriptions of steps S1-S6 in FIG. 4 above, with the parts related to the second cache queue omitted; they are not repeated here.
  • Step S9: the memory writes the execution result of the access request into the data processing unit through PCIE point-to-point transmission.
  • FIG. 14 shows an example of writing the execution result of the access request into the data processing unit by the memory according to the embodiment of the present application.
  • After the memory finishes executing the access request, an execution result of the access request may be generated.
  • the execution result of the access request can be directly written into the data processing unit through PCIE point-to-point transmission.
  • A storage area may be allocated in the storage space of the data processing unit for caching the execution result of the access request. The system can be configured either to raise an interrupt to notify the data processing unit when the content of this area is updated, or to have the data processing unit poll the area to detect updates; both interrupts and polling can be implemented based on existing technologies.
  • the data processing unit can find the execution result of the newly written access request, and transmit the execution result to the software generating the access request.
  • the execution result of the access request does not need to be transmitted through the processor, which can improve the efficiency of the data processing unit in obtaining the execution result of the access request.
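  • For this variant (step S9), a small sketch of the result area reserved in the data processing unit's memory is given below; the valid-flag convention is an assumption. The memory writes the execution result directly into this area over PCIE point-to-point, and the data processing unit either receives an interrupt when the area is updated or polls the flag as shown.

```c
/* Sketch: execution results delivered straight into DPU memory, bypassing
 * the processor's second cache queue. */
#include <stdint.h>

struct result_slot {
    volatile uint32_t valid;     /* set by the memory once the result is written */
    uint32_t          memory_id; /* identifier of the memory that executed the request */
    uint32_t          status;    /* execution result of the access request */
};

/* DPU side, polling variant; an interrupt-driven variant would run the same
 * check from the handler raised when the area is updated. */
static int check_result_area(struct result_slot *slot, uint32_t *status_out)
{
    if (!slot->valid)
        return 0;            /* no newly written execution result */
    *status_out = slot->status;
    slot->valid = 0;         /* consume the slot */
    return 1;                /* caller forwards the result to the requesting software */
}
```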
  • FIG. 15 shows a schematic diagram of an exemplary workflow of a data access method according to an embodiment of the present application.
  • the embodiment of the present application proposes a data access method, the method is executed by a data processing unit connected to a processor, and the method includes steps S10-S11:
  • Step S10: write an access request into the first cache queue of the processor, where the access request includes an identifier of the memory to be accessed, and where the memory is connected to the processor and not directly connected to the data processing unit.
  • Step S11: send a first instruction to the memory pointed to by the identifier of the memory through high-speed peripheral component interconnect (PCIE) point-to-point transmission, where the first instruction includes the location information of the access request in the first cache queue and is used to instruct the memory to acquire the access request from the first cache queue and execute it.
  • For an exemplary implementation of step S10, reference may be made to the relevant descriptions above and to step S4 in FIG. 4 and FIG. 13.
  • For an exemplary implementation of step S11, reference may be made to the relevant descriptions above and to step S5 in FIG. 4 and FIG. 13.
  • In this way, the data processing unit first writes the access request, which includes the identifier of the memory to be accessed, into the first cache queue of the processor, so that the access request is stored in that queue; it then sends the first instruction through PCIE point-to-point transmission to the memory pointed to by the identifier, so that the memory to be accessed receives the first instruction. The first instruction includes the location information of the access request in the first cache queue and instructs the memory to obtain the access request from the first cache queue and execute it; after receiving the first instruction, the memory can therefore obtain the access request from the corresponding position in the first cache queue and execute it.
  • Therefore, the data processing unit can directly access the memory connected to the processor without going through the host's processor, and thus does not occupy host resources.
  • Moreover, this access method simplifies the hardware connections and reduces the complexity of the hardware design.
  • In a possible implementation, the access request further includes the type of the access request and an access memory address, where the access memory address is an address allocated by the processor to the memory of the data processing unit, the type of the access request is used to indicate whether the access request is a read request or a write request, and the access memory address is an address accessible to the memory.
  • For an example of the access request, refer to the description above and the related description of FIG. 8.
  • In this way, the type of the access request and the address of the access object can be determined, so that the memory can decide, according to the type of the access request, whether to perform a read operation or a write operation, and the data processing unit's data access to the memory under the corresponding operation type is completed.
  • In a possible implementation, the method further includes: acquiring the execution result of the access request from a second cache queue of the processor.
  • For an exemplary implementation thereof, reference may be made to the description of step S8 in FIG. 4 above, as well as the related description of FIG. 12.
  • Since the data processing unit is connected to the processor, it can acquire the execution result of the access request from the second cache queue in the processor and thereby determine the execution status of the access request. The data processing unit does not need to cache the execution result of the access request, which reduces the storage space it occupies.
  • In a possible implementation, the method further includes: receiving the execution result of the access request transmitted by the memory through PCIE point-to-point transmission.
  • For an exemplary implementation thereof, reference may be made to the description of step S9 in FIG. 13 above, as well as the related description of FIG. 14.
  • the execution result of the access request does not need to be transmitted from the memory to the data processing unit via the processor, but is directly transmitted from the memory to the data processing unit, which can improve the transmission efficiency of the execution result of the access request.
  • the method further includes: transmitting an address of a memory of the data processing unit to the processor; and receiving the access memory address from the processor.
  • for an exemplary implementation of transmitting the address of the memory of the data processing unit to the processor, reference may be made to the description of step S1 in FIG. 4 and FIG. 13 above, as well as the description of FIG. 5 above.
  • for an exemplary implementation of receiving the access memory address from the processor, reference may be made to the description of step S2 in FIG. 4 and FIG. 13 above, as well as the description of FIG. 6 above.
  • in this way, the data processing unit can obtain an access memory address that the memory is able to reach, so that this address can be used as the address for the data processing unit's data transfers over PCIE point-to-point transmission, making it possible for the data processing unit and the memory to transfer data even though they are not directly connected.
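  • The exchange can be pictured with the C sketch below: the data processing unit reports the address and size of its memory region, and the processor returns a host-side access memory address that maps onto it. The structure and function names (dpu_memory_region, allocate_access_address) are hypothetical and are only meant to make the two-step exchange concrete.

      #include <stdint.h>

      /* Reported by the data processing unit during configuration (assumed layout). */
      struct dpu_memory_region {
          uint64_t dpu_local_addr;   /* address of the region inside the DPU's memory */
          uint64_t size;             /* size of the region in bytes                   */
      };

      /* Returned by the processor: a host-side address range mapped onto the DPU
         region, usable by the storage device as a PCIE point-to-point target. */
      struct access_memory_mapping {
          uint64_t access_memory_addr;
          uint64_t size;
      };

      /* Assumed host-side allocator invoked after the DPU reports its region. */
      extern struct access_memory_mapping
      allocate_access_address(struct dpu_memory_region region);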
  • the method further includes: receiving addresses of the first cache queue and the second cache queue sent by the processor, where the address of the first cache queue is used to write the access request into the first cache queue of the processor, and the address of the second cache queue is used to obtain the execution result of the access request from the second cache queue.
  • in this way, the data processing unit can find the first cache queue from the received address of the first cache queue, so that when it writes the access request into the first cache queue it can write it directly; likewise, it can find the second cache queue from the received address of the second cache queue, so that when it obtains the execution result of the access request from the second cache queue it can obtain it directly. This guarantees that writing the access request and obtaining its result are timely and accurate.
  • Fig. 16 shows a schematic diagram of an exemplary workflow of a data access method according to an embodiment of the present application.
  • the present application proposes a data access method applied to a computing device, where the computing device includes a processor, a memory connected to the processor, and a data processing unit connected to the processor and not directly connected to the memory.
  • the method includes steps S12-S14:
  • Step S12: the data processing unit writes an access request into the first cache queue of the processor, where the access request includes an identifier of the memory to be accessed;
  • Step S13: the data processing unit sends a first instruction, through peripheral component interconnect express (PCIE) point-to-point transmission, to the memory pointed to by the identifier of the memory, where the first instruction indicates the position of the access request in the first cache queue;
  • Step S14: the memory acquires the access request from the first cache queue according to the first instruction and executes the access request.
  • for an exemplary implementation of step S12, reference may be made to the description of step S4 in FIG. 4 and FIG. 13 above.
  • for an exemplary implementation of step S13, reference may be made to the description of step S5 in FIG. 4 and FIG. 13 above.
  • for an exemplary implementation of step S14, reference may be made to the description of step S6 in FIG. 4 and FIG. 13 above.
  • in this way, when the memory is connected to the processor and is not directly connected to the data processing unit, the computing device can still access the memory connected to the processor directly, without going through the host processor, and therefore does not occupy host resources. Moreover, this access method simplifies the hardware connections and reduces the complexity of the hardware design.
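  • A minimal sketch of this submit path on the data processing unit side is given below, assuming the queue address and a doorbell register address were obtained during configuration; dma_write, pcie_p2p_write32 and the doorbell layout are assumptions for illustration rather than interfaces defined by this application.

      #include <stddef.h>
      #include <stdint.h>

      /* Assumed helpers: DMA write into host memory, and a PCIE point-to-point
         write into a register exposed by the storage device. */
      extern void dma_write(uint64_t host_addr, const void *src, size_t len);
      extern void pcie_p2p_write32(uint64_t device_reg_addr, uint32_t value);

      struct first_cache_queue {
          uint64_t base_addr;      /* address of the first cache queue in host memory */
          uint32_t entry_size;     /* size of one access request entry                */
          uint32_t tail;           /* next free entry index                           */
          uint64_t doorbell_addr;  /* doorbell register of the memory to be accessed  */
      };

      /* Steps S12/S13 (sketch): write the access request into the first cache queue,
         then send the first instruction (the new position) to the target memory. */
      static void submit_access_request(struct first_cache_queue *q, const void *entry)
      {
          dma_write(q->base_addr + (uint64_t)q->tail * q->entry_size,
                    entry, q->entry_size);
          q->tail++;                                    /* position of the new request */
          pcie_p2p_write32(q->doorbell_addr, q->tail);  /* first instruction           */
      }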
  • the access request further includes an access memory address, where the access memory address is an address allocated by the processor to the memory of the data processing unit and is an address accessible to the memory; in step S14, executing the access request includes: the memory, according to the access memory address, writes data into the memory of the data processing unit or reads data from the memory of the data processing unit through PCIE point-to-point transmission.
  • for the access memory address, refer to the example of the access memory address described above in connection with FIG. 6.
  • for an exemplary implementation of the memory writing data into, or reading data from, the memory of the data processing unit, reference may be made to the description of step S6 in FIG. 4 and FIG. 13 above.
  • in this way, the memory can locate the access object, that is, the memory of the data processing unit, from the access memory address, so that executing the access request writes data to the memory of the data processing unit or reads data from it, completing the access of the data processing unit's memory to the memory.
  • the access request further includes the type of the access request, which indicates whether the access request is a read request or a write request. When the access request is a read request, executing the access request includes: reading data from the memory according to the access request and, according to the access memory address, writing the data read from the memory into the memory of the data processing unit through PCIE point-to-point transmission. When the access request is a write request, executing the access request includes: according to the access memory address, obtaining the data to be written into the memory from the memory of the data processing unit through PCIE point-to-point transmission, and writing the obtained data into the memory.
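  • On the storage-device side, executing the request could look roughly like the sketch below: fetch the entry at the position given by the first instruction, then move the data between the device and the access memory address by PCIE point-to-point transfer. The entry layout repeats the one sketched earlier, and every helper name here (dma_read, p2p_copy_to_host, media_read, and so on) is an assumption for illustration.

      #include <stddef.h>
      #include <stdint.h>

      struct access_request_entry {   /* same illustrative layout as sketched earlier */
          uint8_t  opcode;            /* 0x01: read request, 0x02: write request      */
          uint8_t  memory_id;
          uint16_t reserved;
          uint32_t length;
          uint64_t sgl_addr;          /* access memory address in host space          */
          uint64_t storage_lba;       /* location inside the storage device           */
      };

      extern void dma_read(uint64_t host_addr, void *dst, size_t len);
      extern void p2p_copy_to_host(uint64_t access_mem_addr, const void *src, size_t len);
      extern void p2p_copy_from_host(uint64_t access_mem_addr, void *dst, size_t len);
      extern void media_read(uint64_t lba, void *buf, size_t len);
      extern void media_write(uint64_t lba, const void *buf, size_t len);

      /* Step S14 (sketch): the memory fetches the access request indicated by the
         first instruction and executes it against the data processing unit's memory. */
      static void execute_access_request(uint64_t sq_addr, uint32_t index)
      {
          struct access_request_entry req;
          uint8_t buf[4096];

          dma_read(sq_addr + (uint64_t)index * sizeof(req), &req, sizeof(req));
          if (req.length > sizeof(buf))
              req.length = sizeof(buf);        /* clamp for this single-buffer sketch */

          if (req.opcode == 0x01) {
              /* read request: read the storage medium, then write the data into the
                 data processing unit's memory over PCIE point-to-point transmission */
              media_read(req.storage_lba, buf, req.length);
              p2p_copy_to_host(req.sgl_addr, buf, req.length);
          } else {
              /* write request: fetch the data from the data processing unit's memory
                 over PCIE point-to-point transmission, then write it to the medium */
              p2p_copy_from_host(req.sgl_addr, buf, req.length);
              media_write(req.storage_lba, buf, req.length);
          }
      }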
  • for the type of the access request, refer to the examples of access request types described above in connection with FIG. 8.
  • for an exemplary implementation of executing the access request when it is a read request or a write request, reference may be made to the description of step S6 in FIG. 4 and FIG. 13 above.
  • when the access request is a read request, the memory writes data to the memory of the data processing unit, which is equivalent to the data processing unit's memory reading the memory; when the access request is a write request, the memory reads data from the memory of the data processing unit, which is equivalent to the data processing unit's memory writing the memory. In this way, for different types of access requests, the memory can respond with different operations that meet the requirements of each request.
  • the method further includes: the data processing unit sends the address of the memory of the data processing unit to the processor; the processor, according to the address of the memory of the data processing unit, allocates the access memory address for the memory of the data processing unit; and the processor sends the access memory address to the data processing unit.
  • in this way, the processor can allocate an access memory address for the memory of the data processing unit, so that this address can be used as the address for data transfers over PCIE point-to-point transmission, making it possible for the data processing unit and the memory to transfer data even though they are not directly connected. Allocating the access memory address for the memory of the data processing unit and sending it can be carried out in the configuration stage, so that once the software in the data processing unit generates an access request it can use the access memory address directly, which improves the efficiency with which the computing device executes access requests.
  • the method further includes: the memory writes the execution result of the access request into the second cache queue of the processor; and the data processing unit obtains the execution result of the access request from the second cache queue of the processor.
  • the method further includes: the data processing unit receiving an execution result of the access request transmitted by the memory through a PCIE point-to-point transmission manner.
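  • When the execution result is returned directly over PCIE point-to-point transmission rather than through the processor, the data processing unit can reserve a small result region in its own memory and watch it for updates, roughly as sketched below; the result_slot layout and the choice between interrupt and polling are assumptions for this sketch.

      #include <stdbool.h>
      #include <stdint.h>

      /* Assumed: a region of the data processing unit's memory reserved for execution
         results; the storage device writes into it over PCIE point-to-point (step S9). */
      struct result_slot {
          volatile uint32_t generation;  /* bumped by the device on each new result */
          volatile uint8_t  status;      /* 1: access request executed              */
      };

      /* Polling variant: returns true once a new result has been written. */
      static bool result_arrived(const struct result_slot *slot, uint32_t last_seen)
      {
          return slot->generation != last_seen && slot->status == 1;
      }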
  • the method further includes: receiving addresses of the first cache queue and the second cache queue sent by the processor, where the address of the first cache queue is used to write the access request into the first cache queue of the processor, and the address of the second cache queue is used to obtain the execution result of the access request from the second cache queue.
  • in this way, the data processing unit can find the first cache queue from the received address of the first cache queue, so that it can accurately and quickly perform the operations of writing the access request into the first cache queue and obtaining the execution result of the access request from the second cache queue.
  • the creation of the first cache queue and the second cache queue, and the sending of their addresses, can be carried out in the configuration stage, so that once the software in the data processing unit generates an access request it can use the addresses of the first and second cache queues directly, which improves the efficiency with which the computing device executes access requests.
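  • The configuration stage can be pictured with the host-side C sketch below, in which the processor allocates one queue pair per storage device and publishes the queue addresses to both the data processing unit and that device; the helper names, and the use of calloc in place of DMA-able pinned memory, are simplifying assumptions.

      #include <stdint.h>
      #include <stdlib.h>

      struct queue_pair_config {
          uint64_t sq_addr;    /* first cache queue (access requests)       */
          uint64_t cq_addr;    /* second cache queue (execution results)    */
          uint32_t depth;      /* number of entries in each queue           */
          uint8_t  memory_id;  /* the storage device this queue pair serves */
      };

      /* Assumed notification helpers. */
      extern void send_config_to_dpu(const struct queue_pair_config *cfg);
      extern void send_config_to_memory(uint8_t memory_id, const struct queue_pair_config *cfg);

      /* Step S3 (sketch): create the two queues in host memory and send their
         addresses to the data processing unit and to the corresponding memory. */
      static struct queue_pair_config create_queue_pair(uint8_t memory_id,
                                                        uint32_t depth,
                                                        uint32_t entry_size)
      {
          struct queue_pair_config cfg = {
              .sq_addr   = (uint64_t)(uintptr_t)calloc(depth, entry_size),
              .cq_addr   = (uint64_t)(uintptr_t)calloc(depth, entry_size),
              .depth     = depth,
              .memory_id = memory_id,
          };
          send_config_to_dpu(&cfg);
          send_config_to_memory(memory_id, &cfg);
          return cfg;
      }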
  • Fig. 17 shows an exemplary structural diagram of a computing device according to an embodiment of the present application.
  • an embodiment of the present application provides a computing device, including:
  • a processor 101; a memory 102 connected to the processor 101; and a data processing unit 103, connected to the processor 101 and not directly connected to the memory 102, configured to: write an access request into the first cache queue of the processor 101, the access request including the identifier of the memory 102 to be accessed; and send a first instruction, through peripheral component interconnect express (PCIE) point-to-point transmission, to the memory 102 pointed to by the identifier of the memory, the first instruction indicating the position of the access request in the first cache queue.
  • the memory 102 is configured to acquire the access request from the first cache queue according to the first instruction, and execute the access request.
  • for an exemplary implementation, reference may be made to the description of step S6 in FIG. 4 and FIG. 13 above, as well as the description of FIG. 10.
  • the processor 101 may correspond to the example of the central processing unit in FIG. 3, the memory 102 to the examples of the HDD/SSD and NVMe SSD memories in FIG. 3, and the data processing unit 103 to the example of the data processing unit in FIG. 3.
  • the access request further includes an access memory address, where the access memory address is an address allocated by the processor 101 to the memory of the data processing unit 103 and is an address accessible to the memory 102;
  • the memory 102 is configured to, according to the access memory address, write data into the memory of the data processing unit 103 or read data from the memory of the data processing unit 103 through PCIE point-to-point transmission.
  • the access request further includes a type of the access request, where the type of the access request is used to indicate whether the access request is a read request or a write request;
  • when the access request is a read request, the memory 102 is configured to: read data from the memory 102 according to the access request and, according to the access memory address, write the data read from the memory 102 into the memory of the data processing unit 103 through PCIE point-to-point transmission;
  • when the access request is a write request, the memory 102 is configured to: according to the access memory address, obtain the data to be written into the memory 102 from the memory of the data processing unit 103 through PCIE point-to-point transmission, and write the obtained data into the memory 102.
  • the processor 101 is configured to: receive the address of the memory of the data processing unit 103 sent by the data processing unit 103; allocate the access memory address for the memory of the data processing unit 103 according to that address; and send the access memory address to the data processing unit 103.
  • the data processing unit 103 is configured to: acquire the execution result of the access request from the second cache queue of the processor 101 .
  • the data processing unit 103 is configured to: receive an execution result of the access request transmitted by the memory 102 through a PCIE point-to-point transmission manner.
  • the computing device may include at least one of a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), an augmented reality (AR) device, a virtual reality (VR) device, an artificial intelligence (AI) device, a wearable device, a vehicle-mounted device, a smart home device, a smart city device, or a server device.
  • the computing device may include a processor 101, a memory 102 and a data processing unit 103. It can be understood that the structure illustrated in this embodiment of the present application does not constitute a specific limitation on the computing device. In other embodiments of the present application, the computing device may include more or fewer components than shown, or combine some components, or split some components, or have a different arrangement of components. The illustrated components can be realized in hardware, software, or a combination of software and hardware.
  • An embodiment of the present application provides a non-volatile computer-readable storage medium, on which computer program instructions are stored, and when the computer program instructions are executed by a processor, the foregoing method is realized.
  • An embodiment of the present application provides a computer program product, including computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code; when the computer-readable code runs in a processor of an electronic device, the processor in the electronic device executes the above method.
  • a computer readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device.
  • a computer readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • A non-exhaustive list of more specific examples of computer-readable storage media includes: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random-access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital video disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punched card or a raised structure in a groove having instructions stored thereon, and any suitable combination of the foregoing.
  • Computer readable program instructions or codes described herein may be downloaded from a computer readable storage medium to a respective computing/processing device, or downloaded to an external computer or external storage device over a network, such as the Internet, local area network, wide area network, and/or wireless network.
  • the network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
  • a network adapter card or a network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in each computing/processing device .
  • Computer program instructions for performing the operations of the present application may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
  • Computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • in cases involving a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • in some embodiments, electronic circuits such as programmable logic circuits, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs) can be personalized using state information of the computer-readable program instructions; these electronic circuits can execute the computer-readable program instructions, thereby realizing various aspects of the present application.
  • each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of instructions that contains one or more executable instructions for implementing the specified logical function.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented with hardware that performs the corresponding function or action, such as a circuit or an application-specific integrated circuit (ASIC), or can be realized by a combination of hardware and software, such as firmware.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The present application relates to a data access method and a computing device; the method is performed by a data processing unit connected to a processor. According to the method, after the data processing unit writes an access request into a first cache queue of the processor, it sends a first instruction, through peripheral component interconnect express (PCIE) point-to-point transmission, to the memory pointed to by the identifier of the memory; the first instruction is used to instruct the memory to obtain the access request from the first cache queue and execute it. The access request includes the identifier of the memory to be accessed, the memory is connected to the processor and is not directly connected to the data processing unit, and the first instruction includes the position information of the access request in the first cache queue. In this way, the data processing unit can directly access the host-side memory without going through the host processor, and therefore does not occupy host resources. Moreover, this access method simplifies the hardware connections and reduces the complexity of the hardware design.

Description

数据访问方法及计算设备
本申请要求于2021年11月30日提交中国专利局、申请号为202111441693.4、申请名称为“数据访问方法及计算设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及数据存储领域,尤其涉及一种数据访问方法及计算设备。
背景技术
随着数据存储技术的不断创新、完善、加速演进,数据存储的实现机制也越来越繁重。数据存储通常由主机侧中央处理器(central processing unit,CPU)访问主机侧存储器来执行。其执行过程持续占用并消耗处理器资源。可以理解的是,花费高昂成本购买的处理器的工作时间,在用于承担高价值的计算任务时效益更大,使处理器承担数据存储的实现工作显然并不是一个优选的方案。
基于此,现有技术提出增设连接主机侧处理器的数据处理单元(data processing unit,DPU),用于实现访问主机侧存储器来执行原本由主机侧处理器完成的数据存储的工作,这样就可以让处理器将尽可能多的算力投入到其上运行的虚拟机等计算业务中。然而,数据处理单元通常设置在主机外部,需要对硬件进行改进使得数据处理单元能够与主机侧的存储器进行数据交换。现有技术提出通过硬件线缆连接的方式,将主机侧的硬件存储器连接到数据处理单元,使得数据处理单元可以访问主机的硬件存储器。但设置硬件线缆会增加数据处理单元的硬件设计复杂度。
发明内容
有鉴于此,提出了一种数据访问方法及计算设备,本申请实施例的数据访问方法由计算设备的数据处理单元执行,根据本申请实施例的数据访问方法,在保证数据处理单元可以直接访问主机侧的硬件存储器的同时,简化硬件连接方式,降低硬件设计的复杂度。
第一方面,本申请的实施例提供了一种数据访问方法,所述方法由连接处理器的数据处理单元来执行,所述方法包括:将访问请求写入所述处理器的第一缓存队列,其中,所述访问请求包括待访问的存储器的标识,其中,所述存储器连接所述处理器且不直接连接所述数据处理单元;通过高速外围组件互联(PCIE)点对点传输方式向所述存储器的标识指向的所述存储器发送第一指令,其中,所述第一指令包括所述访问请求在所述第一缓存队列中的位置信息,所述第一指令用于指示所述存储器从所述第一缓存队列获取所述访问请求并执行所述访问请求。
根据本申请实施例的数据访问方法,先通过数据处理单元将包括待访问的存储器的标识的访问请求写入处理器的第一缓存队列,使得访问请求可以存储在处理器中的第一缓存队列中;再通过PCIE点对点传输方式向存储器的标识指向的存储器发送第一指令,使得待访问的存储器能够接收到第一指令;第一指令包括访问请求在第一缓存队列中的位置信息,第一指 令用于指示存储器从第一缓存队列获取访问请求并执行访问请求,使得存储器接收到第一指令后,能够按照第一指令的指示,从第一缓存队列中的相应位置处获取访问请求,并执行访问请求。通过这种方式,在存储器连接处理器且不直接连接数据处理单元时,数据处理单元无需通过主机的处理器对存储器进行访问,也能够实现对与处理器连接的存储器的直接访问,从而不会占用主机的资源。并且,这种访问方式可以简化硬件连接方式,降低硬件设计的复杂度。
根据第一方面,在所述数据访问方法的第一种可能的实现方式中,所述访问请求还包括所述访问请求的类型以及访问内存地址,所述访问内存地址为所述处理器为所述数据处理单元的内存分配的地址,所述访问请求的类型用于指示所述访问请求为读请求或写请求,所述访问内存地址为所述存储器可访问的地址。
通过这种方式,使得存储器从第一缓存队列获取访问请求时,可以确定访问请求的类型以及访问对象的地址信息,从而能根据访问请求的类型确定执行的是读取操作还是写入操作,并在执行访问请求时,根据访问对象的地址信息,完成对应操作类型下数据处理单元对于存储器的数据访问。
根据第一方面或第一方面的第一种可能的实现方式,在所述数据访问方法的第二种可能的实现方式中,所述方法还包括:从所述处理器的第二缓存队列中获取所述访问请求的执行结果。
由于数据处理单元连接处理器,使得数据处理单元能够从处理器中的第二缓存队列处获取访问请求的执行结果,从而确定访问请求的执行情况。数据处理单元不需对访问请求的执行结果进行缓存,可以降低对数据处理单元的存储空间的占用。
根据第一方面或第一方面的第一种可能的实现方式,在所述数据访问方法的第三种可能的实现方式中,所述方法还包括:接收所述存储器通过PCIE点对点传输方式传输的所述访问请求的执行结果。
通过这种方式,使得访问请求的执行结果不需由存储器经过处理器传输给数据处理单元,而是由存储器直接传输给数据处理单元,可以提升访问请求的执行结果的传输效率。
根据第一方面的第一种至第三种可能的实现方式中的任意一种可能的实现方式,在所述数据访问方法的第四种可能的实现方式中,所述方法还包括:将数据处理单元的内存的地址传输至所述处理器;接收来自所述处理器的所述访问内存地址。
通过这种方式,使得数据处理单元可以获取到存储器可以访问的访问内存地址,从而能使用该访问内存地址作为数据处理单元通过PCIE点对点传输方式进行数据传输的地址,使得数据处理单元和存储器在不直接连接的情况下传输数据成为可能。
根据第一方面,以及以上第一方面的任意一种可能的实现方式,在所述数据访问方法的第五种可能的实现方式中,所述方法还包括:接收所述处理器发送的第一缓存队列和第二缓存队列的地址,所述第一缓存队列的地址用于将访问请求写入所述处理器的第一缓存队列,所述第二缓存队列的地址用于从所述第二缓存队列中获取所述访问请求的执行结果。
通过这种方式,使得数据处理单元能够根据接收到的第一缓存队列的地址找到第一缓存队列,数据处理单元将访问请求写入第一缓存队列时,可以直接将访问请求写入;使得数据处理单元能够根据接收到的第二缓存队列的地址找到第二缓存队列,数据处理单元从第二缓存队列获取访问请求的执行结果时,能够直接获取访问请求的执行结果。从而保证写入访问 请求和获取访问请求结果的实时性和准确度。
第二方面,本申请的实施例提供了一种数据访问方法,所述方法应用于计算设备,所述计算设备包括处理器、连接所述处理器的存储器、以及连接所述处理器且与所述存储器不直接连接的数据处理单元,所述方法包括:所述数据处理单元将访问请求写入所述处理器的第一缓存队列,所述访问请求包括待访问的所述存储器的标识;所述数据处理单元通过高速外围组件互联(PCIE)点对点传输方式向所述存储器的标识指向的所述存储器发送第一指令,所述第一指令指示所述访问请求在所述第一缓存队列中的位置;所述存储器根据所述第一指令从所述第一缓存队列中获取所述访问请求并执行所述访问请求。
根据本申请实施例的数据访问方法,先使用数据处理单元将包括待访问的存储器的标识的访问请求写入处理器的第一缓存队列,使得访问请求可以存储在处理器中的第一缓存队列中;再使用数据处理单元通过PCIE点对点传输方式向存储器的标识指向的存储器发送第一指令,使得待访问的存储器能够接收到第一指令;第一指令包括访问请求在第一缓存队列中的位置信息,所述第一指令用于指示存储器从第一缓存队列获取访问请求并执行访问请求,使得存储器接收到第一指令后,能够按照第一指令的指示,从第一缓存队列中的相应位置处获取访问请求,并执行访问请求。通过这种方式,使得计算设备中,在存储器连接处理器且不直接连接数据处理单元时,计算设备无需通过主机的处理器对存储器进行访问,也能够实现对与处理器连接的存储器的直接访问,从而不会占用主机的资源。并且,这种访问方式可以简化硬件连接方式,降低硬件设计的复杂度。
根据第二方面,在所述数据访问方法的第一种可能的实现方式中,所述访问请求还包括访问内存地址,所述访问内存地址为所述处理器为所述数据处理单元的内存分配的地址,所述访问内存地址为所述存储器可访问的地址;所述执行所述访问请求包括:所述存储器根据所述访问内存地址,通过PCIE点对点传输方式将数据写入所述数据处理单元的内存或从所述数据处理单元的内存中读取数据。
通过这种方式,使得存储器可以根据访问内存地址找到访问对象也即数据处理单元的内存,从而能执行访问请求写入数据到数据处理单元的内存,或者,从数据处理单元的内存读取数据,完成数据处理单元的内存对存储器的访问。
根据第二方面的第一种可能的实现方式,在所述数据访问方法的第二种可能的实现方式中,所述访问请求还包括所述访问请求的类型,所述访问请求的类型用于指示所述访问请求为读请求或写请求;当所述访问请求为读请求时,所述执行所述访问请求包括:根据所述访问请求从所述存储器中读取数据,并根据所述访问内存地址,将所述存储器中读取的数据通过PCIE点对点传输方式写入所述数据处理单元的内存;当所述访问请求为写请求时,所述执行所述访问请求包括:根据所述访问内存地址,通过PCIE点对点传输方式从所述数据处理单元的内存中获取待写入所述存储器的数据,并将获取的数据写入所述存储器。
访问请求为读请求时,存储器能够写入数据到数据处理单元的内存,相当于数据处理单元的内存读存储器;访问请求为写请求时,存储器能够从数据处理单元的内存读取数据,相当于数据处理单元的内存写存储器。通过这种方式,使得对应于访问请求的不同类型,存储器能够以符合访问请求的需求的不同的操作进行响应。
根据第二方面的第一种或第二种可能的实现方式,在所述数据访问方法的第三种可能的实现方式中,所述方法还包括:所述数据处理单元向所述处理器发送所述数据处理单元的内 存的地址;所述处理器根据所述数据处理单元的内存的地址,为所述数据处理单元的内存分配所述访问内存地址;所述处理器向所述数据处理单元发送所述访问内存地址。
通过这种方式,使得处理器可以为数据处理单元的内存分配访问内存地址,从而能使用该访问内存地址作为数据处理单元通过PCIE点对点传输方式进行数据传输的地址,使得数据处理单元和存储器在不直接连接的情况下传输数据成为可能。为数据处理单元的内存分配访问内存地址并发送访问内存地址可以在配置阶段进行,使得数据处理单元中的软件产生访问请求之后,直接使用该访问内存地址即可,可以提升计算设备执行访问请求的效率。
根据第二方面,以及以上第二方面的任意一种可能的实现方式,在所述数据访问方法的第四种可能的实现方式中,所述方法还包括:所述存储器将所述访问请求的执行结果写入所述处理器的第二缓存队列;所述数据处理单元从所述处理器的第二缓存队列中获取所述访问请求的执行结果。
根据第二方面,以及第二方面的第一种至第三种可能的实现方式中的任意一种可能的实现方式,在所述数据访问方法的第五种可能的实现方式中,所述方法还包括:所述数据处理单元接收所述存储器通过PCIE点对点传输方式传输的所述访问请求的执行结果。
根据第二方面,以及以上第二方面的任意一种可能的实现方式,在所述数据访问方法的第六种可能的实现方式中,所述方法还包括:所述数据处理单元接收所述处理器发送的第一缓存队列和第二缓存队列的地址,所述第一缓存队列的地址用于将访问请求写入所述处理器的第一缓存队列,所述第二缓存队列的地址用于从所述第二缓存队列中获取所述访问请求的执行结果。
通过这种方式,使得数据处理单元能够根据接收到的第一缓存队列的地址找到第一缓存队列,以使得数据处理单元可以准确、迅速地执行将访问请求写入第一缓存队列、从第二缓存队列获取访问请求的执行结果的操作。第一缓存队列和第二缓存队列的创建,以及第一缓存队列和第二缓存队列的地址的发送可以在配置阶段进行,使得数据处理单元中的软件产生访问请求之后,直接使用第一缓存队列和第二缓存队列的地址即可,可以提升计算设备执行访问请求的效率。
第三方面,本申请的实施例提供了一种计算设备,包括:处理器;存储器,连接所述处理器;以及数据处理单元,连接所述处理器且与所述存储器不直接连接,所述数据处理单元用于执行上述第一方面或者第一方面的多种可能的实现方式中的一种或几种的数据访问方法。
第四方面,本申请的实施例提供了一种非易失性计算机可读存储介质,其上存储有计算机程序指令,所述计算机程序指令被处理器执行时实现上述第一方面或者第一方面的多种可能的实现方式中的一种或几种的数据访问方法。
第五方面,本申请的实施例提供了一种计算机程序产品,包括计算机可读代码,或者承载有计算机可读代码的非易失性计算机可读存储介质,当所述计算机可读代码在电子设备中运行时,所述电子设备中的处理器执行上述第一方面或者第一方面的多种可能的实现方式中的一种或几种的数据访问方法。
本申请的这些和其他方面在以下(多个)实施例的描述中会更加简明易懂。
附图说明
包含在说明书中并且构成说明书的一部分的附图与说明书一起示出了本申请的示例性实 施例、特征和方面,并且用于解释本申请的原理。
图1示出现有技术中计算设备的数据存储功能的一种实现方式。
图2示出现有技术中计算设备的数据存储功能的另一种实现方式。
图3示出根据本申请实施例的计算设备的示例性应用场景。
图4示出根据本申请实施例的计算设备的一种示例性工作流程。
图5示出根据本申请实施例的数据处理单元将数据处理单元的内存的地址发送至处理器的一个示例。
图6示出根据本申请实施例的处理器得到访问内存地址并发送访问内存地址的一个示例。
图7示出根据本申请实施例的处理器创建缓存队列并发送缓存队列地址的一个示例。
图8示出根据本申请实施例的数据处理单元将访问请求写入处理器的第一缓存队列的一个示例。
图9示出根据本申请实施例的数据处理单元产生第一指令并向存储器发送第一指令的一个示例。
图10示出根据本申请实施例的存储器获取访问请求并执行访问请求的一个示例。
图11示出根据本申请实施例的存储器将访问请求的执行结果写入第二缓存队列的一个示例。
图12示出根据本申请实施例的数据处理单元获取访问请求的执行结果的一个示例。
图13示出根据本申请实施例的计算设备的另一种示例性工作流程。
图14示出根据本申请实施例的存储器将访问请求的执行结果写入数据处理单元的一个示例。
图15示出根据本申请实施例的数据访问方法的示例性工作流程的示意图。
图16示出根据本申请实施例的数据访问方法的示例性工作流程的示意图。
图17示出根据本申请实施例的计算设备的示例性结构示意图。
具体实施方式
以下将参考附图详细说明本申请的各种示例性实施例、特征和方面。附图中相同的附图标记表示功能相同或相似的元件。尽管在附图中示出了实施例的各种方面,但是除非特别指出,不必按比例绘制附图。
在这里专用的词“示例性”意为“用作例子、实施例或说明性”。这里作为“示例性”所说明的任何实施例不必解释为优于或好于其它实施例。
另外,为了更好的说明本申请,在下文的具体实施方式中给出了众多的具体细节。本领域技术人员应当理解,没有某些具体细节,本申请同样可以实施。在一些实例中,对于本领域技术人员熟知的方法、手段、元件和电路未作详细描述,以便于凸显本申请的主旨。
图1示出现有技术中计算设备的数据存储功能的一种实现方式。图1所示的计算设备是以超融合基础架构下的计算设备为例。超融合基础架构(hyper converged infrastructure,HCI)是一种将虚拟化计算和存储整合到同一个系统平台的技术,可以将存储和计算功能集成到一个单一计算设备(或计算设备集群,每个计算设备都提供计算和存储功能)上。
如图1所示,对于超融合基础架构下的某一计算设备,其主机侧包括处理器(例如中央 处理器)、SAS扩展器、SAS磁盘阵列芯片以及存储器,存储器可例如包括串行连接SCSI接口标准硬盘驱动器/固态驱动器(serial attached SCSI hard disk drive/solid state drive,SAS HDD/SSD)和/或非易失性内存主机控制器接口标准固态驱动器(non-volatile memory express,NVMe SSD)等。其中,处理器和存储器NVMe SSD通过高速外围组件互联(peripheral component interconnect express,PCIE)线缆连接,处理器还通过PCIE线缆连接SAS磁盘阵列芯片,SAS磁盘阵列芯片还通过SAS扩展器连接存储器HDD/SSD。
处理器运行存储虚拟机以及多个用户虚拟机,其中存储虚拟机还运行分布式存储软件,每个用户虚拟机还运行各自的应用软件,分布式存储软件用于发出针对主机侧存储器的访问请求,处理器根据访问请求可以通过PCIE方式直接访问主机侧存储器,以为用户虚拟机中的应用软件提供存储服务。
这种方案的缺点是访问存储器实现数据存储功能由处理器来完成,会大量占用处理器资源,导致处理器对于用户虚拟机的支持能力下降。
为释放主机侧处理器的处理资源和内存,提升用户虚拟机的密度,节省系统授权费用,现有技术在图1的方案基础上提出了一种改进技术方案,图2示出现有技术中计算设备的数据存储功能的另一种实现方式。
如图2所示,数据处理单元通过PCIE线缆连接主机侧处理器(例如中央处理器),使用数据处理单元来代替主机侧处理器运行分布式存储软件,以向主机中的用户虚拟机中的应用软件提供数据存储服务,这样就可以让尽可能多的处理器计算能力来运行用户虚拟机的应用软件。在此情况下,需要解决数据处理单元如何访问主机侧存储器的问题。
现有技术提出通过硬件线缆连接的方式,将数据处理单元连接到主机侧存储器,使得数据处理单元能够访问主机侧存储器。其中,数据处理单元与存储器NVMe SSD可以通过PCIE线缆连接,数据处理单元与存储器HDD/SSD可以通过SAS线缆连接。然而,这种方案对于数据处理单元的硬件设施有如下的要求:
首先,SAS磁盘阵列芯片通常设计为插接在处理器上,因此,数据处理单元需要集成一个新的SAS磁盘阵列芯片来连接SAS扩展器。其次,要求数据处理单元具备足够的PCIE端口,或者需要集成PCIE Switch芯片(未示出),以通过PCIE端口或PCIE Switch芯片连接多个存储器NVMe SSD。
线缆和/或芯片的增加为数据处理单元带来额外的功耗、散热、空间布局、硬件成本等问题,使得数据处理单元的硬件连接方式和硬件设计复杂度大大增加。
有鉴于此,提出了一种数据访问方法及计算设备,本申请实施例的数据访问方法由计算设备的数据处理单元执行,根据本申请实施例的数据访问方法,在保证数据处理单元可以直接访问主机侧的硬件存储器的同时,简化硬件连接方式,降低硬件设计的复杂度。
图3示出根据本申请实施例的计算设备的示例性应用场景。
如图3所示,本申请实施例的计算设备主机侧包括处理器(例如中央处理器)以及多个存储器(可包括存储器NVMe SSD以及存储器HDD/SSD),数据处理单元通过PCIE线缆连接处理器,且不连接各存储器。其中,数据处理单元不设置在计算设备的主机侧。
在主机中,处理器可以运行至少一个用户虚拟机,每个用户虚拟机还分别运行各自的应用软件。应用软件具有数据存储需求。处理器和存储器NVMe SSD通过PCIE线缆连接,处理器还通过PCIE线缆连接SAS磁盘阵列芯片,SAS磁盘阵列芯片还通过SAS扩展器连接存储器 HDD/SSD。
数据处理单元可以运行分布式存储软件,在分布式存储软件产生访问请求时,或者数据处理单元运行的其他软件(未示出)产生访问请求时,数据处理单元执行本申请实施例的数据访问方法,通过PCIE点对点传输方式访问主机侧的存储器,以向主机侧处理器的用户虚拟机中的应用软件提供数据存储服务。
图3中的计算设备可例如应用于超融合基础架构中,也可以应用于其他数据处理单元连接主机侧处理器且不连接主机侧存储器时、数据处理单元访问主机侧存储器的应用场景,本申请实施例对计算设备的具体应用场景不作限制。
图4示出根据本申请实施例的计算设备的一种示例性工作流程。下面结合图4对本申请实施例实现数据处理单元访问主机侧的硬件存储器的示例性方法进行描述。
本申请实施例的计算设备在被用户使用之前可以先进行配置,使得计算设备中,数据处理单元可以直接响应访问请求,而不必在产生访问请求时再配置计算设备中的各装置或模块来为响应访问请求做准备。为了尽可能提升用户体验,配置可例如在计算设备出厂前完成。用户使用计算设备时,可以对处理器运行的应用软件进行各种操作。一些操作可能涉及到数据的读写,此时应用软件可以产生相应的存储需求。根据应用软件的存储需求,或者根据涉及到数据的读写的操作,处理器可以指示其连接的数据处理单元运行的分布式存储软件或其他软件(未示出)产生访问请求。此时数据处理单元可以将访问请求写入处理器中的缓存队列,使得存储器能够从缓存队列获取并执行访问请求,以达到访问主机侧的硬件存储器的目的。
如图4所示,根据本申请实施例的计算设备中,在分布式存储软件产生访问请求之前,计算设备可以执行步骤S1-S3,对数据处理单元、处理器以及存储器进行配置。在数据处理单元、处理器以及存储器配置好后,在分布式存储软件产生访问请求时,计算设备可以执行步骤S4-S8,实现数据处理单元访问主机侧的硬件存储器。
下面结合图4先介绍对数据处理单元、处理器以及存储器进行配置的示例性方法。
步骤S1,数据处理单元将数据处理单元的内存的地址发送至处理器。图5示出根据本申请实施例的数据处理单元将数据处理单元的内存的地址发送至处理器的一个示例。
举例来说,如图5所示,数据处理单元包括内存,用于存储数据处理单元在执行任务或指令时所使用的或者生成的数据。数据处理单元的内存中可以包括数据页面区,用于存储数据处理单元可以写出到计算设备中的其他装置或模块(例如存储器)的数据,以及由计算设备中的其他装置或模块(例如存储器)写入的数据。数据处理单元的内存的地址,可以是数据页面区的地址。存储器设置在主机侧,因此,对于存储器来说,能够通过PCIE点对点传输方式传输数据的对象也应具备主机侧的地址。而数据处理单元不设置在主机侧,因此,数据处理单元的内存的地址,并不是主机侧的地址,不能够作为存储器通过PCIE点对点传输方式传输数据的对象的地址。由于数据处理单元连接主机侧的处理器,在步骤S1中,数据处理单元可以先将该数据处理单元的内存的地址发送给主机侧的处理器。由处理器先处理得到对应于数据处理单元的内存的地址的主机侧的地址(其示例参见下文的步骤S2)。
步骤S2,处理器根据数据处理单元的内存的地址,为数据处理单元的内存分配访问内存地址,并将该访问内存地址发送给数据处理单元。图6示出根据本申请实施例的处理器得到访问内存地址并发送访问内存地址的一个示例。
举例来说,如图6所示,处理器设置在计算设备的主机侧,因此,由处理器生成的地址是主机侧的地址,是可以作为存储器通过PCIE点对点传输方式传输数据的传输对象的地址使用的。基于此,在接收到数据处理单元的内存的地址之后,处理器可以在其存储空间中,为数据处理单元的内存分配一块对应的存储区域,访问内存地址可以是该存储区域的地址。访问内存地址作为存储器通过PCIE点对点传输方式传输数据的对象的地址时,使得与该访问内存地址所属的存储区域对应的数据处理单元的内存,可以作为存储器通过PCIE点对点传输方式传输数据的传输对象,从而建立起存储器和数据处理单元之间的PCIE点对点传输方式的传输通道。访问内存地址可以在数据处理单元与存储器之间通过PCIE点对点传输方式的传输数据时(示例参见下文步骤S6)使用。
步骤S3,处理器创建对应于存储器的第一缓存队列、第二缓存队列,并将第一缓存队列、第二缓存队列在处理器中的存储地址分别发送给数据处理单元和存储器。图7示出根据本申请实施例的处理器创建缓存队列并发送缓存队列地址的一个示例。
举例来说,如图7所示,第一缓存队列可以是用于存储访问请求的提交队列(submission queue,SQ),第二缓存队列可以是存储访问请求的执行结果的完成队列(completion queue,CQ)。第一缓存队列、第二缓存队列可以储存在处理器的内存中,处理器同时连接数据处理单元和存储器,因此可以将第一缓存队列、第二缓存队列在处理器中的存储地址分别传输给数据处理单元和存储器。数据处理单元根据第一缓存队列在处理器中的存储地址可以写入访问请求到第一缓存队列(示例参见下文步骤S4),根据第二缓存队列在处理器中的存储地址可以从第二缓存队列读取访问请求的执行结果(示例参见下文步骤S8)。存储器根据第一缓存队列在处理器中的存储地址可以从第一缓存队列读取访问请求(示例参见下文步骤S6),根据第二缓存队列在处理器中的存储地址可以写入访问请求的执行结果到第二缓存队列(示例参见下文步骤S7)。通过这种方式,可以实现访问请求、访问请求的执行结果在数据处理单元和存储器之间的传输。
步骤S3也可以在步骤S1或步骤S2之前执行,或者,与步骤S1或步骤S2同时执行,本申请对步骤S3和步骤S1的执行顺序以及步骤S3和步骤S2的执行顺序不作限制。
执行步骤S1-S3之后,数据处理单元、处理器以及存储器已经配置完成。下面结合图4介绍数据处理单元、处理器以及存储器配置好后,计算设备实现数据处理单元访问主机侧的硬件存储器的示例性方法。
步骤S4,数据处理单元将访问请求写入处理器的第一缓存队列。
举例来说,在步骤S3中,数据处理单元接收到第一缓存队列的存储地址,因此,第一缓存队列在处理器中的位置对于数据处理单元来说是已知的。在此情况下,在数据处理单元中,分布式存储软件或其他软件产生访问请求时,数据处理单元可以根据第一缓存队列的地址,将访问请求写入第一缓存队列中。
图8示出根据本申请实施例的数据处理单元将访问请求写入处理器的第一缓存队列的一个示例。如图8所示,数据处理单元可以将访问请求先转换为提交队列(第一缓存队列)中的条目的形式(submission queue entry,SQE),转换后的条目可以包括操作码(operation code)、分散聚合表(scatter gather list,SGL)、标识等。第一缓存队列可以包括多个条目,将访问请求写入处理器的第一缓存队列时,可以是将根据访问请求转换后的条目通过直接存储器存取(direct memory access,DMA)方式写入到处理器的第一缓存队列的某条目的 位置(例如队列尾部的条目)。
其中,操作码可以指示访问请求的类型,访问请求可以分为不同类型,例如访问请求可以是数据处理单元读存储器的请求(相当于存储器传输数据到数据处理单元),此时访问请求可以是读请求,存储器执行访问请求时是写入操作,即存储器写入数据到数据处理单元(示例参见下文的步骤S6)。访问请求还可以是数据处理单元写存储器的请求(相当于数据处理单元传输数据到存储器),此时访问请求可以是写请求,存储器执行访问请求时是读取操作,即存储器读取数据处理单元的内存的数据(示例参见下文的步骤S6)。通过使用操作码指示访问请求的类型,从而能将访问请求的类型写入处理器的第一缓存队列,以使得存储器能够从第一缓存队列中获取到访问请求的类型,确定执行访问请求是读取操作还是写入操作。
分散聚合表可以存储访问内存地址。由上文步骤S2的相关描述可知,访问内存地址是可以在存储器通过PCIE点对点传输方式传输数据时作为传输对象的地址使用的。通过使用分散聚合表存储访问内存地址,从而能将访问内存地址写入处理器的第一缓存队列,以使得存储器执行访问请求时,能够根据访问内存地址找到读取操作或写入操作的对象(本示例中是数据处理单元的数据页面区)。
标识可以用于识别待访问的存储器。待访问的存储器可以是主机侧的存储器中的任一存储器。对于主机侧的每一存储器,可以分别对应一个第一缓存队列,在此情况下,每一第一缓存队列也可以包括对应于该第一缓存队列的存储器的标识。将访问请求写入处理器的第一缓存队列时,可以根据访问请求包括的标识,在多个第一缓存队列中,找到具有相同标识的第一缓存队列,再将访问请求写入第一缓存队列。
本领域技术人员应理解,访问请求转换后的条目可以包括更多内容,例如访问请求的长度等等,本申请对于访问请求包括的具体信息不作限制。
步骤S5,数据处理单元产生包括访问请求在第一缓存队列中的位置信息的第一指令,并通过PCIE点对点传输方式向访问请求的标识指向的存储器发送第一指令,第一指令用于指示存储器从第一缓存队列获取访问请求并执行访问请求。
高速外围组件互联点对点传输方式(也即PCIE点对点传输方式)是现有技术已有的一种数据传输方式,可以使得两个设备在不直接连接的情况下也能进行数据传输。本申请实施例中使用PCIE点对点传输方式使得不直接连接的数据处理单元和待访问的存储器之间能够传输第一指令。
作为第一指令的发送对象的待访问的存储器,可以是步骤S4中,访问请求包括的标识对应的存储器。第一指令可以包括访问请求在第一缓存队列中的位置信息,第一指令可以用于指示存储器从存储器对应的一个第一缓存队列获取访问请求并执行访问请求。也即,存储器从哪个队列获取访问请求是由第一缓存队列和存储器的对应关系决定的。在此情况下,待访问的存储器接收到第一指令后,可以根据第一指令的指示,从存储器对应的一个第一缓存队列中的对应位置处获取到访问请求,再执行访问请求。
图9示出根据本申请实施例的数据处理单元产生第一指令并向存储器发送第一指令的一个示例。如图9所示,可以通过指针(doorbell)记录访问请求在第一缓存队列中的位置信息,得到第一缓存队列的指针(即第一缓存队列的doorbell地址),第一指令可以包括第一缓存队列的指针。例如,执行步骤S4之前,第一缓存队列的指针可以是“3”,表示前次写入的访问请求写入到第一缓存队列的第3个条目处,执行步骤S4之后,在第一缓存队列的第4 个条目处新写入了访问请求,在此情况下,第一缓存队列的指针可以更新为“4”,在步骤S5中,处理器可以产生第一指令,第一指令包括的第一缓存队列的指针可以是“4”。处理器可以向待访问的存储器发送包括第一缓存队列的指针“4”的第一指令。
步骤S6,存储器根据第一指令的指示,从第一缓存队列获取访问请求并执行访问请求。
图10示出根据本申请实施例的存储器获取访问请求并执行访问请求的一个示例。
举例来说,如图10所示,存储器接收到来自处理器的第一指令,第一指令指示存储器从第一缓存队列获取访问请求并执行访问请求。以第一指令包括的第一缓存队列的指针信息是“4”为例,根据第一指令的指示,存储器从第一缓存队列的第4个条目的位置处获取访问请求。结合步骤S5的相关描述,获取到的可以是由访问请求转换后的条目。
根据转换后的条目中的操作码,存储器可以确定访问请求的类型。根据转换后的条目中的分散聚合表,存储器可以确定PCIE点对点传输方式进行数据传输的对象的地址(即访问内存地址)。根据访问请求的类型以及访问内存地址,存储器可以执行访问请求。
存储器执行访问请求的方式,可以是根据访问内存地址,通过PCIE点对点传输方式将数据写入数据处理单元的内存或从数据处理单元的内存中读取数据。其中,访问请求的类型是读请求时,存储器执行访问请求的方式,可以是根据访问请求从存储器中读取数据,并根据访问内存地址,将存储器中读取的数据通过PCIE点对点传输方式写入数据处理单元的内存,例如,可以写入数据处理单元的内存的数据页面区。访问请求的类型是写请求时,存储器执行访问请求的方式,可以是根据访问内存地址,通过PCIE点对点传输方式从数据处理单元的内存中获取待写入存储器的数据,例如,可以从数据处理单元的内存的数据页面区获取数据,并将获取的数据写入存储器。
其中,访问请求还可以包括访问存储器的某一存储区域的信息,例如存储器的某一存储区域的标识。访问请求的类型是读请求时,可以是根据访问请求从存储器中读取存储在访问请求包括的存储区域标识对应的存储区域中的数据。访问请求的类型是写请求时,可以将获取的数据写入存储器中访问请求包括的存储区域标识对应的存储区域。
步骤S7,存储器将访问请求的执行结果写入处理器的第二缓存队列。
访问请求是分布式存储软件等数据处理单元运行的软件产生的,因此,对应于访问请求,可以有一个访问请求的执行结果来传输至产生访问请求的软件,以告知产生访问请求的软件其产生的访问请求的执行情况。访问请求的执行结果可以有两种类型,一种是访问请求已执行完成,一种是访问请求尚未执行完成。
图11示出根据本申请实施例的存储器将访问请求的执行结果写入第二缓存队列的一个示例。如图11所示,存储器可以将访问请求的执行结果转换为完成队列(第二缓存队列)中的条目(completion queue entry,CQE)的形式,转换后的条目可以包括标识等。标识的不同类型可以对应不同的执行结果,例如标识是“1”可以表示具备该标识的条目是表示访问请求已执行完成的条目,标识是“0”可以表示具备该标识的条目是表示访问请求尚未执行完成的条目。第二缓存队列可以包括多个条目,将访问请求的执行结果写入处理器的第二缓存队列时,可以是将根据访问请求的执行结果转换后的条目通过直接存储器存取(direct memory access,DMA)方式写入到处理器的第二缓存队列的某条目的位置(例如队列尾部的条目)。
对于主机侧的每一存储器,在分别对应一个第一缓存队列时,还可以分别对应一个第二缓存队列,在此情况下,每一第二缓存队列也可以包括对应于该第二缓存队列的存储器的标 识。将访问请求的执行结果写入处理器的第二缓存队列时,可以根据访问请求包括的标识,在多个第二缓存队列中,找到具有相同标识的第二缓存队列,再将访问请求的执行结果写入第二缓存队列。
步骤S8,数据处理单元从处理器的第二缓存队列获取访问请求的执行结果。
图12示出根据本申请实施例的数据处理单元获取访问请求的执行结果的一个示例。
举例来说,步骤S7中,存储器将访问请求的执行结果写入处理器中的第二缓存队列,在写入时,是没有通知数据处理单元这一写入操作的。因此,对于数据处理单元来说,处理器的第二缓存队列的更新时间是未知的。为了获取访问请求的执行结果,如图12所示,可以设置数据处理单元每间隔预设时间段监测处理器中第二缓存队列是否有更新。在监测到第二缓存队列中出现新写入的访问请求的执行结果时,获取该访问请求的执行结果。数据处理单元可以将执行结果传输给产生访问请求的软件。数据处理单元不需要缓存访问请求的执行结果,可以降低数据处理单元的存储空间的占用率。
图13示出根据本申请实施例的计算设备的另一种示例性工作流程。如图13所示,其中步骤S1-S6可以参见上文图4中步骤S1-S6的相关描述,删去与第二缓存队列相关的部分即可,在此不再赘述。
步骤S6完成后,计算设备可以执行步骤S9:存储器通过PCIE点对点传输方式将访问请求的执行结果写入数据处理单元。图14示出根据本申请实施例的存储器将访问请求的执行结果写入数据处理单元的一个示例。
举例来说,步骤S6中,存储器执行访问请求之后,可以产生一个访问请求的执行结果。如图14所示,数据处理单元具备缓存访问请求的执行结果的功能时,访问请求的执行结果可以直接通过PCIE点对点传输方式写入数据处理单元。例如,可以在数据处理单元的存储空间中分配一块存储区域,用于缓存访问请求的执行结果。可以设置该区域的存储内容更新时产生中断通知数据处理单元,或者设置数据处理单元轮询该区域监测存储内容是否有更新。其中,中断以及轮询可以基于现有技术来实现。在新写入的访问请求的执行结果使得存储内容更新时,数据处理单元可以找到新写入的访问请求的执行结果,并将该执行结果传输给产生访问请求的软件。访问请求的执行结果不需要通过处理器传输,可以提高数据处理单元获取访问请求的执行结果的效率。
图15示出根据本申请实施例的数据访问方法的示例性工作流程的示意图。
如图15所示,本申请实施例提出一种数据访问方法,所述方法由连接处理器的数据处理单元来执行,所述方法包括步骤S10-S11:
步骤S10,将访问请求写入所述处理器的第一缓存队列,其中,所述访问请求包括待访问的存储器的标识,其中,所述存储器连接所述处理器且不直接连接所述数据处理单元。
步骤S11,通过高速外围组件互联(PCIE)点对点传输方式向所述存储器的标识指向的所述存储器发送第一指令,其中,所述第一指令包括所述访问请求在所述第一缓存队列中的位置信息,所述第一指令用于指示所述存储器从所述第一缓存队列获取所述访问请求并执行所述访问请求。
其中,步骤S10的示例性实现方式,可以参照上文以及图4、图13中步骤S4的相关描述。步骤S11的示例性实现方式,可以参照上文以及图4、图13中步骤S5的相关描述。
根据本申请实施例的数据访问方法,先通过数据处理单元将包括待访问的存储器的标识 的访问请求写入处理器的第一缓存队列,使得访问请求可以存储在处理器中的第一缓存队列中;再通过PCIE点对点传输方式向存储器的标识指向的存储器发送第一指令,使得待访问的存储器能够接收到第一指令;第一指令包括访问请求在第一缓存队列中的位置信息,第一指令用于指示存储器从第一缓存队列获取访问请求并执行访问请求,使得存储器接收到第一指令后,能够按照第一指令的指示,从第一缓存队列中的相应位置处获取访问请求,并执行访问请求。通过这种方式,在存储器连接处理器且不直接连接数据处理单元时,数据处理单元无需通过主机的处理器对存储器进行访问,也能够实现对与处理器连接的存储器的直接访问,从而不会占用主机的资源。并且,这种访问方式可以简化硬件连接方式,降低硬件设计的复杂度。
在一种可能的实现方式中,所述访问请求还包括所述访问请求的类型以及访问内存地址,所述访问内存地址为所述处理器为所述数据处理单元的内存分配的地址,所述访问请求的类型用于指示所述访问请求为读请求或写请求,所述访问内存地址为所述存储器可访问的地址。访问请求可以参照上文以及图8的相关描述中的访问请求的示例。
通过这种方式,使得存储器从第一缓存队列获取访问请求时,可以确定访问请求的类型以及访问对象的地址信息,从而能根据访问请求的类型确定执行的是读取操作还是写入操作,并在执行访问请求时,根据访问对象的地址信息,完成对应操作类型下数据处理单元对于存储器的数据访问。
在一种可能的实现方式中,所述方法还包括:从所述处理器的第二缓存队列中获取所述访问请求的执行结果。其示例性实现方式,可以参照上文以及图4中步骤S8的相关描述,以及上文及图12的相关描述。
由于数据处理单元连接处理器,使得数据处理单元能够从处理器中的第二缓存队列处获取访问请求的执行结果,从而确定访问请求的执行情况。数据处理单元不需对访问请求的执行结果进行缓存,可以降低对数据处理单元的存储空间的占用。
在一种可能的实现方式中,所述方法还包括:接收所述存储器通过PCIE点对点传输方式传输的所述访问请求的执行结果。其示例性实现方式,可以参照上文以及图13中步骤S9的相关描述,以及上文及图14的相关描述。
通过这种方式,使得访问请求的执行结果不需由存储器经过处理器传输给数据处理单元,而是由存储器直接传输给数据处理单元,可以提升访问请求的执行结果的传输效率。
在一种可能的实现方式中,所述方法还包括:将数据处理单元的内存的地址传输至所述处理器;接收来自所述处理器的所述访问内存地址。其中,将数据处理单元的内存的地址传输至所述处理器的示例性实现方式,可以参照上文以及图4、图13中步骤S1的相关描述,以及上文及图5的相关描述。接收来自所述处理器的所述访问内存地址的示例性实现方式,可以参照上文以及图4、图13中步骤S2的相关描述,以及上文及图6的相关描述。
通过这种方式,使得数据处理单元可以获取到存储器可以访问的访问内存地址,从而能使用该访问内存地址作为数据处理单元通过PCIE点对点传输方式进行数据传输的地址,使得数据处理单元和存储器在不直接连接的情况下传输数据成为可能。
在一种可能的实现方式中,所述方法还包括:接收所述处理器发送的第一缓存队列和第二缓存队列的地址,所述第一缓存队列的地址用于将访问请求写入所述处理器的第一缓存队列,所述第二缓存队列的地址用于从所述第二缓存队列中获取所述访问请求的执行结果。其 示例性实现方式,可以参照上文以及图4中步骤S3的相关描述,以及上文及图7的相关描述。
通过这种方式,使得数据处理单元能够根据接收到的第一缓存队列的地址找到第一缓存队列,数据处理单元将访问请求写入第一缓存队列时,可以直接将访问请求写入;使得数据处理单元能够根据接收到的第二缓存队列的地址找到第二缓存队列,数据处理单元从第二缓存队列获取访问请求的执行结果时,能够直接获取访问请求的执行结果。从而保证写入访问请求和获取访问请求结果的实时性和准确度。
图16示出根据本申请实施例的数据访问方法的示例性工作流程的示意图。
如图16所示,本申请提出一种数据访问方法,所述方法应用于计算设备,所述计算设备包括处理器、连接所述处理器的存储器、以及连接所述处理器且与所述存储器不直接连接的数据处理单元,所述方法包括步骤S12-S14:
步骤S12,所述数据处理单元将访问请求写入所述处理器的第一缓存队列,所述访问请求包括待访问的所述存储器的标识;
步骤S13,所述数据处理单元通过高速外围组件互联(PCIE)点对点传输方式向所述存储器的标识指向的所述存储器发送第一指令,所述第一指令指示所述访问请求在所述第一缓存队列中的位置;
步骤S14,所述存储器根据所述第一指令从所述第一缓存队列中获取所述访问请求并执行所述访问请求。
其中,步骤S12的示例性实现方式,可以参照上文以及图4、图13中步骤S4的相关描述。步骤S13的示例性实现方式,可以参照上文以及图4、图13中步骤S5的相关描述。步骤S14的示例性实现方式,可以参照上文以及图4、图13中步骤S6的相关描述。
根据本申请实施例的数据访问方法,先使用数据处理单元将包括待访问的存储器的标识的访问请求写入处理器的第一缓存队列,使得访问请求可以存储在处理器中的第一缓存队列中;再使用数据处理单元通过PCIE点对点传输方式向存储器的标识指向的存储器发送第一指令,使得待访问的存储器能够接收到第一指令;第一指令包括访问请求在第一缓存队列中的位置信息,所述第一指令用于指示存储器从第一缓存队列获取访问请求并执行访问请求,使得存储器接收到第一指令后,能够按照第一指令的指示,从第一缓存队列中的相应位置处获取访问请求,并执行访问请求。通过这种方式,使得计算设备中,在存储器连接处理器且不直接连接数据处理单元时,计算设备无需通过主机的处理器对存储器进行访问,也能够实现对与处理器连接的存储器的直接访问,从而不会占用主机的资源。并且,这种访问方式可以简化硬件连接方式,降低硬件设计的复杂度。
在一种可能的实现方式中,所述访问请求还包括访问内存地址,所述访问内存地址为所述处理器为所述数据处理单元的内存分配的地址,所述访问内存地址为所述存储器可访问的地址;步骤S14中,所述执行所述访问请求包括:所述存储器根据所述访问内存地址,通过PCIE点对点传输方式将数据写入所述数据处理单元的内存或从所述数据处理单元的内存中读取数据。
其中,访问内存地址可以参见上文以及图6的相关描述中的访问内存地址的示例。存储器将数据写入所述数据处理单元的内存或从所述数据处理单元的内存中读取数据的示例性实现方式,可以参照上文以及图4、图13中步骤S6的相关描述。
通过这种方式,使得存储器可以根据访问内存地址找到访问对象也即数据处理单元的内 存,从而能执行访问请求写入数据到数据处理单元的内存,或者,从数据处理单元的内存读取数据,完成数据处理单元的内存对存储器的访问。
在一种可能的实现方式中,所述访问请求还包括所述访问请求的类型,所述访问请求的类型用于指示所述访问请求为读请求或写请求;当所述访问请求为读请求时,所述执行所述访问请求包括:根据所述访问请求从所述存储器中读取数据,并根据所述访问内存地址,将所述存储器中读取的数据通过PCIE点对点传输方式写入所述数据处理单元的内存;当所述访问请求为写请求时,所述执行所述访问请求包括:根据所述访问内存地址,通过PCIE点对点传输方式从所述数据处理单元的内存中获取待写入所述存储器的数据,并将获取的数据写入所述存储器。
其中,访问请求的类型可以参见上文以及图8的相关描述中的访问请求的类型的示例。访问请求为读请求或写请求时,执行所述访问请求的示例性实现方式,可以参照上文以及图4、图13中步骤S6的相关描述。
访问请求为读请求时,存储器能够写入数据到数据处理单元的内存,相当于数据处理单元的内存读存储器;访问请求为写请求时,存储器能够从数据处理单元的内存读取数据,相当于数据处理单元的内存写存储器。通过这种方式,使得对应于访问请求的不同类型,存储器能够以符合访问请求的需求的不同的操作进行响应。
在一种可能的实现方式中,所述方法还包括:所述数据处理单元向所述处理器发送所述数据处理单元的内存的地址;所述处理器根据所述数据处理单元的内存的地址,为所述数据处理单元的内存分配所述访问内存地址;所述处理器向所述数据处理单元发送所述访问内存地址。其示例性实现方式,可以参照上文以及图4、图13中步骤S1、S2的相关描述。
通过这种方式,使得处理器可以为数据处理单元的内存分配访问内存地址,从而能使用该访问内存地址作为数据处理单元通过PCIE点对点传输方式进行数据传输的地址,使得数据处理单元和存储器在不直接连接的情况下传输数据成为可能。为数据处理单元的内存分配访问内存地址并发送访问内存地址可以在配置阶段进行,使得数据处理单元中的软件产生访问请求之后,直接使用该访问内存地址即可,可以提升计算设备执行访问请求的效率。
在一种可能的实现方式中,所述方法还包括:所述存储器将所述访问请求的执行结果写入所述处理器的第二缓存队列;所述数据处理单元从所述处理器的第二缓存队列中获取所述访问请求的执行结果。其示例性实现方式,可以参照上文以及图4中步骤S7、S8的相关描述。
在一种可能的实现方式中,所述方法还包括:所述数据处理单元接收所述存储器通过PCIE点对点传输方式传输的所述访问请求的执行结果。其示例性实现方式,可以参照上文以及图13中步骤S9的相关描述。
在一种可能的实现方式中,所述方法还包括:接收所述处理器发送的第一缓存队列和第二缓存队列的地址,所述第一缓存队列的地址用于将访问请求写入所述处理器的第一缓存队列,所述第二缓存队列的地址用于从所述第二缓存队列中获取所述访问请求的执行结果。其示例性实现方式,可以参照上文以及图4中步骤S3的相关描述,以及上文及图7的相关描述。
通过这种方式,使得数据处理单元能够根据接收到的第一缓存队列的地址找到第一缓存队列,以使得数据处理单元可以准确、迅速地执行将访问请求写入第一缓存队列、从第二缓存队列获取访问请求的执行结果的操作。第一缓存队列和第二缓存队列的创建,以及第一缓存队列和第二缓存队列的地址的发送可以在配置阶段进行,使得数据处理单元中的软件产生 访问请求之后,直接使用第一缓存队列和第二缓存队列的地址即可,可以提升计算设备执行访问请求的效率。
图17示出根据本申请实施例的计算设备的示例性结构示意图。
如图17所示,本申请实施例提供一种计算设备,包括:
处理器101;
存储器102,连接所述处理器101;以及
数据处理单元103,连接所述处理器101且与所述存储器102不直接连接,用于:将访问请求写入所述处理器101的第一缓存队列,所述访问请求包括待访问的所述存储器102的标识;通过高速外围组件互联(PCIE)点对点传输方式向所述存储器的标识指向的所述存储器102发送第一指令,所述第一指令指示所述访问请求在所述第一缓存队列中的位置;其示例性实现方式,可以参照上文及图4、图13中步骤S4、S5的相关描述,以及图8、图9的相关描述。
所述存储器102,用于根据所述第一指令从所述第一缓存队列中获取所述访问请求,并执行所述访问请求。其示例性实现方式,可以参照上文及图4、图13中步骤S6的相关描述,以及图10的相关描述。
其中处理器101可以参见图3中的中央处理器的示例,存储器102可以参见图3中的存储器HDD/SSD、NVMe SSD的示例,数据处理单元103可以参见图3中的数据处理单元的示例。
在一种可能的实现方式中,所述访问请求还包括访问内存地址,所述访问内存地址为所述处理器101为所述数据处理单元103的内存分配的地址,所述访问内存地址为所述存储器102可访问的地址;
所述存储器102用于根据所述访问内存地址,通过PCIE点对点传输方式将数据写入所述数据处理单元103的内存或从所述数据处理单元103的内存中读取数据。
在一种可能的实现方式中,所述访问请求还包括所述访问请求的类型,所述访问请求的类型用于指示所述访问请求为读请求或写请求;
当所述访问请求为读请求时,所述存储器102用于:根据所述访问请求从所述存储器102中读取数据,并根据所述访问内存地址,将所述存储器102中读取的数据通过PCIE点对点传输方式写入所述数据处理单元103的内存;
当所述访问请求为写请求时,所述存储器102用于:根据所述访问内存地址,通过PCIE点对点传输方式从所述数据处理单元103的内存中获取所述待写入所述存储器102的数据,并将获取的数据写入所述存储器102。
在一种可能的实现方式中,所述处理器101用于:
接收所述数据处理单元103发送的所述数据处理单元103的内存的地址;
根据所述数据处理单元103的内存的地址,为所述数据处理单元103的内存分配所述访问内存地址;
向所述数据处理单元103发送所述访问内存地址。
在一种可能的实现方式中,所述数据处理单元103用于:从所述处理器101的第二缓存队列中获取所述访问请求的执行结果。
在一种可能的实现方式中,所述数据处理单元103用于:接收所述存储器102通过PCIE点对点传输方式传输的所述访问请求的执行结果。
计算设备可以包括桌面型计算机、膝上型计算机、手持计算机、笔记本电脑、超级移动个人计算机(ultra-mobile personal computer,UMPC)、上网本、个人数字助理(personal digital assistant,PDA)、增强现实(augmented reality,AR)设备、虚拟现实(virtual reality,VR)设备、人工智能(artificial intelligence,AI)设备、可穿戴式设备、车载设备、智能家居设备、或智慧城市设备、服务器设备中的至少一种。本申请实施例对该计算设备的具体类型不作特殊限制。
计算设备可以包括处理器101,存储器102,数据处理单元103。可以理解的是,本申请实施例示意的结构并不构成对计算设备的具体限定。在本申请另一些实施例中,计算设备可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。
本申请的实施例提供了一种非易失性计算机可读存储介质,其上存储有计算机程序指令,所述计算机程序指令被处理器执行时实现上述方法。
本申请的实施例提供了一种计算机程序产品,包括计算机可读代码,或者承载有计算机可读代码的非易失性计算机可读存储介质,当所述计算机可读代码在电子设备的处理器中运行时,所述电子设备中的处理器执行上述方法。
计算机可读存储介质可以是可以保持和存储由指令执行设备使用的指令的有形设备。计算机可读存储介质例如可以是――但不限于――电存储设备、磁存储设备、光存储设备、电磁存储设备、半导体存储设备或者上述的任意合适的组合。计算机可读存储介质的更具体的例子(非穷举的列表)包括:便携式计算机盘、硬盘、随机存取存储器(Random Access Memory,RAM)、只读存储器(Read Only Memory,ROM)、可擦式可编程只读存储器(Electrically Programmable Read-Only-Memory,EPROM或闪存)、静态随机存取存储器(Static Random-Access Memory,SRAM)、便携式压缩盘只读存储器(Compact Disc Read-Only Memory,CD-ROM)、数字多功能盘(Digital Video Disc,DVD)、记忆棒、软盘、机械编码设备、例如其上存储有指令的打孔卡或凹槽内凸起结构、以及上述的任意合适的组合。
这里所描述的计算机可读程序指令或代码可以从计算机可读存储介质下载到各个计算/处理设备,或者通过网络、例如因特网、局域网、广域网和/或无线网下载到外部计算机或外部存储设备。网络可以包括铜传输电缆、光纤传输、无线传输、路由器、防火墙、交换机、网关计算机和/或边缘服务器。每个计算/处理设备中的网络适配卡或者网络接口从网络接收计算机可读程序指令,并转发该计算机可读程序指令,以供存储在各个计算/处理设备中的计算机可读存储介质中。
用于执行本申请操作的计算机程序指令可以是汇编指令、指令集架构(Instruction Set Architecture,ISA)指令、机器指令、机器相关指令、微代码、固件指令、状态设置数据、或者以一种或多种编程语言的任意组合编写的源代码或目标代码,所述编程语言包括面向对象的编程语言—诸如Smalltalk、C++等,以及常规的过程式编程语言—诸如“C”语言或类似的编程语言。计算机可读程序指令可以完全地在用户计算机上执行、部分地在用户计算机上执行、作为一个独立的软件包执行、部分在用户计算机上部分在远程计算机上执行、或者完全在远程计算机或服务器上执行。在涉及远程计算机的情形中,远程计算机可以通过任意种类的网络—包括局域网(Local Area Network,LAN)或广域网(Wide Area Network,WAN)—连接到用户计算机,或者,可以连接到外部计算机(例如利用因特网服务提供商来通过因 特网连接)。在一些实施例中,通过利用计算机可读程序指令的状态信息来个性化定制电子电路,例如可编程逻辑电路、现场可编程门阵列(Field-Programmable Gate Array,FPGA)或可编程逻辑阵列(Programmable Logic Array,PLA),该电子电路可以执行计算机可读程序指令,从而实现本申请的各个方面。
这里参照根据本申请实施例的方法、装置(系统)和计算机程序产品的流程图和/或框图描述了本申请的各个方面。应当理解,流程图和/或框图的每个方框以及流程图和/或框图中各方框的组合,都可以由计算机可读程序指令实现。
附图中的流程图和框图显示了根据本申请的多个实施例的装置、系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段或指令的一部分,所述模块、程序段或指令的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个连续的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。
也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行相应的功能或动作的硬件(例如电路或ASIC(Application Specific Integrated Circuit,专用集成电路))来实现,或者可以用硬件和软件的组合,如固件等来实现。
尽管在此结合各实施例对本发明进行了描述,然而,在实施所要求保护的本发明过程中,本领域技术人员通过查看所述附图、公开内容、以及所附权利要求书,可理解并实现所述公开实施例的其它变化。在权利要求中,“包括”(comprising)一词不排除其他组成部分或步骤,“一”或“一个”不排除多个的情况。单个处理器或其它单元可以实现权利要求中列举的若干项功能。相互不同的从属权利要求中记载了某些措施,但这并不表示这些措施不能组合起来产生良好的效果。
以上已经描述了本申请的各实施例,上述说明是示例性的,并非穷尽性的,并且也不限于所披露的各实施例。在不偏离所说明的各实施例的范围的情况下,对于本技术领域的普通技术人员来说许多修改和变更都是显而易见的。本文中所用术语的选择,旨在最好地解释各实施例的原理、实际应用或对市场中的技术的改进,或者使本技术领域的其它普通技术人员能理解本文披露的各实施例。

Claims (17)

  1. 一种数据访问方法,其特征在于,所述方法由连接处理器的数据处理单元来执行,所述方法包括:
    将访问请求写入所述处理器的第一缓存队列,其中,所述访问请求包括待访问的存储器的标识,其中,所述存储器连接所述处理器且不直接连接所述数据处理单元;
    通过高速外围组件互联(PCIE)点对点传输方式向所述存储器的标识指向的所述存储器发送第一指令,其中,所述第一指令包括所述访问请求在所述第一缓存队列中的位置信息,所述第一指令用于指示所述存储器从所述第一缓存队列获取所述访问请求并执行所述访问请求。
  2. 根据权利要求1所述的方法,其特征在于,所述访问请求还包括所述访问请求的类型以及访问内存地址,所述访问内存地址为所述处理器为所述数据处理单元的内存分配的地址,所述访问请求的类型用于指示所述访问请求为读请求或写请求,所述访问内存地址为所述存储器可访问的地址。
  3. 根据权利要求1或2所述的方法,其特征在于,所述方法还包括:
    从所述处理器的第二缓存队列中获取所述访问请求的执行结果。
  4. 根据权利要求1或2所述的方法,其特征在于,所述方法还包括:
    接收所述存储器通过PCIE点对点传输方式传输的所述访问请求的执行结果。
  5. 根据权利要求2-4中任一项所述的方法,其特征在于,所述方法还包括:
    将数据处理单元的内存的地址传输至所述处理器;
    接收来自所述处理器的所述访问内存地址。
  6. 一种数据访问方法,其特征在于,所述方法应用于计算设备,所述计算设备包括处理器、连接所述处理器的存储器、以及连接所述处理器且与所述存储器不直接连接的数据处理单元,所述方法包括:
    所述数据处理单元将访问请求写入所述处理器的第一缓存队列,所述访问请求包括待访问的所述存储器的标识;
    所述数据处理单元通过高速外围组件互联(PCIE)点对点传输方式向所述存储器的标识指向的所述存储器发送第一指令,所述第一指令指示所述访问请求在所述第一缓存队列中的位置;
    所述存储器根据所述第一指令从所述第一缓存队列中获取所述访问请求并执行所述访问请求。
  7. 根据权利要求6所述的方法,其特征在于,所述访问请求还包括访问内存地址,所述访问内存地址为所述处理器为所述数据处理单元的内存分配的地址,所述访问内存地址为所述存储器可访问的地址;
    所述执行所述访问请求包括:所述存储器根据所述访问内存地址,通过PCIE点对点传输方式将数据写入所述数据处理单元的内存或从所述数据处理单元的内存中读取数据。
  8. 根据权利要求7所述的方法,其特征在于,所述访问请求还包括所述访问请求的类型,所述访问请求的类型用于指示所述访问请求为读请求或写请求;
    当所述访问请求为读请求时,所述执行所述访问请求包括:根据所述访问请求从所述存储器中读取数据,并根据所述访问内存地址,将所述存储器中读取的数据通过PCIE点对点传输方式写入所述数据处理单元的内存;
    当所述访问请求为写请求时,所述执行所述访问请求包括:根据所述访问内存地址,通过PCIE点对点传输方式从所述数据处理单元的内存中获取待写入所述存储器的数据,并将获取的数据写入所述存储器。
  9. 根据权利要求7或8所述的方法,其特征在于,所述方法还包括:
    所述数据处理单元向所述处理器发送所述数据处理单元的内存的地址;
    所述处理器根据所述数据处理单元的内存的地址,为所述数据处理单元的内存分配所述访问内存地址;
    所述处理器向所述数据处理单元发送所述访问内存地址。
  10. 根据权利要求6-9中任一项所述的方法,其特征在于,所述方法还包括:
    所述存储器将所述访问请求的执行结果写入所述处理器的第二缓存队列;
    所述数据处理单元从所述处理器的第二缓存队列中获取所述访问请求的执行结果。
  11. 根据权利要求6-9中任一项所述的方法,其特征在于,所述方法还包括:
    所述数据处理单元接收所述存储器通过PCIE点对点传输方式传输的所述访问请求的执行结果。
  12. 一种计算设备,其特征在于,包括:
    处理器;
    存储器,连接所述处理器;以及
    数据处理单元,连接所述处理器且与所述存储器不直接连接,用于:
    将访问请求写入所述处理器的第一缓存队列,所述访问请求包括待访问的所述存储器的标识;
    通过高速外围组件互联(PCIE)点对点传输方式向所述存储器的标识指向的所述存储器发送第一指令,所述第一指令指示所述访问请求在所述第一缓存队列中的位置;
    所述存储器,用于根据所述第一指令从所述第一缓存队列中获取所述访问请求并执行所述访问请求。
  13. 根据权利要求12所述的设备,其特征在于,所述访问请求还包括访问内存地址,所述访问内存地址为所述处理器为所述数据处理单元的内存分配的地址,所述访问内存地址为所述存储器可访问的地址;
    所述存储器用于根据所述访问内存地址,通过PCIE点对点传输方式将数据写入所述数据处理单元的内存或从所述数据处理单元的内存中读取数据。
  14. 根据权利要求13所述的设备,其特征在于,所述访问请求还包括所述访问请求的类型,所述访问请求的类型用于指示所述访问请求为读请求或写请求;
    当所述访问请求为读请求时,所述存储器用于:根据所述访问请求从所述存储器中读取数据,并根据所述访问内存地址,将所述存储器中读取的数据通过PCIE点对点传输方式写入所述数据处理单元的内存;
    当所述访问请求为写请求时,所述存储器用于:根据所述访问内存地址,通过PCIE点对点传输方式从所述数据处理单元的内存中获取所述待写入所述存储器的数据,并将获取的数据写入所述存储器。
  15. 根据权利要求13或14所述的设备,其特征在于,所述处理器用于:
    接收所述数据处理单元发送的所述数据处理单元的内存的地址;
    根据所述数据处理单元的内存的地址,为所述数据处理单元的内存分配所述访问内存地址;
    向所述数据处理单元发送所述访问内存地址。
  16. 根据权利要求12-15中任一项所述的设备,其特征在于,所述数据处理单元用于:
    从所述处理器的第二缓存队列中获取所述访问请求的执行结果。
  17. 根据权利要求12-15中任一项所述的设备,其特征在于,所述数据处理单元用于:
    接收所述存储器通过PCIE点对点传输方式传输的所述访问请求的执行结果。
PCT/CN2022/099520 2021-11-30 2022-06-17 数据访问方法及计算设备 WO2023098031A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111441693.4 2021-11-30
CN202111441693.4A CN116204456A (zh) 2021-11-30 2021-11-30 数据访问方法及计算设备

Publications (1)

Publication Number Publication Date
WO2023098031A1 true WO2023098031A1 (zh) 2023-06-08

Family

ID=86509960

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/099520 WO2023098031A1 (zh) 2021-11-30 2022-06-17 数据访问方法及计算设备

Country Status (2)

Country Link
CN (1) CN116204456A (zh)
WO (1) WO2023098031A1 (zh)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116521097B (zh) * 2023-07-03 2023-09-08 摩尔线程智能科技(北京)有限责任公司 存储器访问电路及存储器访问方法、集成电路和电子设备
CN116578245B (zh) * 2023-07-03 2023-11-17 摩尔线程智能科技(北京)有限责任公司 存储器访问电路及存储器访问方法、集成电路和电子设备
CN116719479B (zh) * 2023-07-03 2024-02-20 摩尔线程智能科技(北京)有限责任公司 存储器访问电路及存储器访问方法、集成电路和电子设备
CN116820344B (zh) * 2023-07-03 2024-04-26 摩尔线程智能科技(北京)有限责任公司 存储器访问电路及存储器访问方法、集成电路和电子设备
CN116795605B (zh) * 2023-08-23 2023-12-12 珠海星云智联科技有限公司 一种外围器件互联扩展设备异常自动恢复系统以及方法
CN116991593B (zh) * 2023-09-26 2024-02-02 芯来智融半导体科技(上海)有限公司 操作指令处理方法、装置、设备及存储介质
CN117112044B (zh) * 2023-10-23 2024-02-06 腾讯科技(深圳)有限公司 基于网卡的指令处理方法、装置、设备和介质
CN117858035B (zh) * 2024-03-06 2024-05-17 金锐同创(北京)科技股份有限公司 用于远程协助的数据处理方法、装置、计算机设备及介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180121120A1 (en) * 2016-10-27 2018-05-03 Samsung Electronics Co., Ltd. Scaling out architecture for dram-based processing unit (dpu)
CN110892380A (zh) * 2017-07-10 2020-03-17 芬基波尔有限责任公司 用于流处理的数据处理单元
CN111602117A (zh) * 2018-01-19 2020-08-28 龙加智科技有限公司 具有记录及回放支持的任务关键型ai处理器
CN112131164A (zh) * 2020-09-23 2020-12-25 山东云海国创云计算装备产业创新中心有限公司 应用于加速板卡的数据调度方法、装置及加速板卡和介质
US20210250285A1 (en) * 2020-02-11 2021-08-12 Fungible, Inc. Scaled-out transport as connection proxy for device-to-device communications
CN113574656A (zh) * 2020-02-28 2021-10-29 华为技术有限公司 一种数据处理装置及方法
CN113676416A (zh) * 2021-10-22 2021-11-19 浙江锐文科技有限公司 一种在高速网卡/dpu内提升网络服务质量的方法

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180121120A1 (en) * 2016-10-27 2018-05-03 Samsung Electronics Co., Ltd. Scaling out architecture for dram-based processing unit (dpu)
CN110892380A (zh) * 2017-07-10 2020-03-17 芬基波尔有限责任公司 用于流处理的数据处理单元
CN111602117A (zh) * 2018-01-19 2020-08-28 龙加智科技有限公司 具有记录及回放支持的任务关键型ai处理器
US20210250285A1 (en) * 2020-02-11 2021-08-12 Fungible, Inc. Scaled-out transport as connection proxy for device-to-device communications
CN113574656A (zh) * 2020-02-28 2021-10-29 华为技术有限公司 一种数据处理装置及方法
CN112131164A (zh) * 2020-09-23 2020-12-25 山东云海国创云计算装备产业创新中心有限公司 应用于加速板卡的数据调度方法、装置及加速板卡和介质
CN113676416A (zh) * 2021-10-22 2021-11-19 浙江锐文科技有限公司 一种在高速网卡/dpu内提升网络服务质量的方法

Also Published As

Publication number Publication date
CN116204456A (zh) 2023-06-02

Similar Documents

Publication Publication Date Title
WO2023098031A1 (zh) 数据访问方法及计算设备
US9645956B2 (en) Delivering interrupts through non-transparent bridges in a PCI-express network
US9285995B2 (en) Processor agnostic data storage in a PCIE based shared storage environment
CN109471833B (zh) 用于最大化PCIe对等连接的带宽的系统和方法
JP2019153297A (ja) Fpgaベースの加速のための新たなssd基本構造
US8645594B2 (en) Driver-assisted base address register mapping
EP2889780A1 (en) Data processing system and data processing method
CN108984465B (zh) 一种消息传输方法及设备
CN112130748B (zh) 一种数据访问方法、网卡及服务器
US8966130B2 (en) Tag allocation for queued commands across multiple devices
US9495172B2 (en) Method of controlling computer system and computer system
US20130166849A1 (en) Physically Remote Shared Computer Memory
US11201836B2 (en) Method and device for managing stateful application on server
WO2021063160A1 (zh) 访问固态硬盘的方法及存储设备
KR20110123541A (ko) 데이터 저장 장치 및 그것의 동작 방법
WO2016101856A1 (zh) 数据访问方法及装置
US20240061802A1 (en) Data Transmission Method, Data Processing Method, and Related Product
US11093175B1 (en) Raid data storage device direct communication system
JP2009199428A (ja) ストレージ装置及びアクセス命令送信方法
JP5728088B2 (ja) 入出力制御装置及び入出力制御装置のフレーム処理方法
US10860334B2 (en) System and method for centralized boot storage in an access switch shared by multiple servers
US20230385213A1 (en) Method and system for a disaggregated persistent memory system using persistent memory servers
US11960417B2 (en) Input/output queue hinting for resource utilization
WO2022141250A1 (zh) 数据传输方法和相关装置
US11847316B2 (en) System and method for managing data storage in network interface controllers

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22899851

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2022899851

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2022899851

Country of ref document: EP

Effective date: 20240603