WO2023105603A1 - Transfer management device, transfer management method, and transfer management program - Google Patents

Transfer management device, transfer management method, and transfer management program Download PDF

Info

Publication number
WO2023105603A1
WO2023105603A1 (PCT/JP2021/044832)
Authority
WO
WIPO (PCT)
Prior art keywords
transfer
priority
request
queue
size
Prior art date
Application number
PCT/JP2021/044832
Other languages
English (en)
Japanese (ja)
Inventor
育生 大谷
圭 藤本
奨悟 斎藤
Original Assignee
日本電信電話株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電信電話株式会社 filed Critical 日本電信電話株式会社
Priority to PCT/JP2021/044832 priority Critical patent/WO2023105603A1/fr
Publication of WO2023105603A1 publication Critical patent/WO2023105603A1/fr

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/06Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/20Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/28Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal

Definitions

  • the present invention relates to a transfer management device, a transfer management method, and a transfer management program.
  • DMA (Direct Memory Access; Non-Patent Document 1) hardware transfer enables high-speed, large-capacity data transfer (Non-Patent Document 2).
  • DMA often uses an application (APL) on a server as a transfer source and an accelerator or network device on the same server as a transfer destination.
  • Non-Patent Document 3 describes the XDMA specifications of Xilinx Corporation, which is an example of a transfer engine.
  • the Xilinx driver has multiple channels of queues that store requests to be transferred, and takes requests out of each channel in a round-robin (equal) manner and transfers them.
  • FIG. 26 is a configuration diagram of a conventional physical server 100z.
  • a plurality of threads 11z each issues a request (illustrated as a hatched rectangle) requesting DMA transfer of data to the host DRAM 12z.
  • the DMA transfer driver 1z reads each request and sequentially transfers them to the device DRAM 51z in the device 50z via the PCIe (Peripheral Component Interconnect Express) 59z.
  • An IP core 52z which is an FPGA (Field Programmable Gate Array) in the device 50z, reads and processes each request transferred to the device DRAM 51z.
  • FIG. 27 is a configuration diagram of the DMA transfer driver 1z in FIG.
  • the DMA transfer driver 1z has a reception queue 20z, a transfer queue 30z, and a transfer engine 40z.
  • the reception queue 20z stores a request issued from each thread 11z as a pointer pointing to a data storage destination in the host DRAM 12z.
  • the transfer queue 30z stores the pointer of the request read from the reception queue 20z to notify the transfer engine 40z.
  • the transfer engine 40z sequentially reads request data in the host DRAM 12z from the pointer in the transfer queue 30z and transfers the data to the device DRAM 51z.
  • a transfer wait for the next request occurs while a certain request is being transferred.
  • the transfer wait problem arises in the following procedure.
  • first, the file transfer application A1 (first thread 11z) issues a request R1 for a file with a large data size to the transfer engine 40z.
  • the transfer engine 40z starts transferring the request R1 because there are no other requests to be transferred at this time.
  • next, the call application A2 (second thread 11z) issues a high-priority call-data request R2 with a short-delay (TAT: Turnaround Time) requirement to the transfer engine 40z. Because the transfer engine 40z is still occupied with the large request R1, the high-priority request R2 must wait until the transfer of R1 completes.
  • the main object of the present invention is to satisfy the transfer efficiency and delay requirements of the transferred data in a well-balanced manner when transferring data from a plurality of transfer sources to the same transfer destination.
  • the present invention is a transfer management apparatus having a host section having a host memory in which transfer source data is stored, and a device section having a device memory as a data transfer destination,
  • the host unit has a transfer control unit that performs DMA transfer from the host memory to the device memory, and a plurality of transfer queues that store requests, each indicating a transfer request specifying data to be DMA-transferred,
  • and the transfer control unit creates one or more subsets from the data in the host memory specified by the request in each of the transfer queues, and sequentially DMA-transfers the selected subsets of each of the transfer queues to the device memory.
  • the present invention when transferring data from a plurality of transfer sources to the same transfer destination, it is possible to satisfy the transfer efficiency and delay requirements of the transferred data in a well-balanced manner.
  • FIG. 1 is a configuration diagram of a physical server according to this embodiment;
  • FIG. 2 is a configuration diagram of a DMA transfer driver in FIG. 1 according to the present embodiment;
  • FIG. 3 is a time chart when high-priority requests are kept waiting when the conventional DMA transfer driver shown in FIG. 26 is used;
  • FIG. 4 is a time chart when the waiting time of high-priority requests is shortened by data division of requests when the DMA transfer driver of the present embodiment shown in FIG. 1 is used;
  • FIG. 5 is a time chart when the waiting time of high-priority requests is further reduced than in the case of FIG. 4 by changing the size of the subset of requests according to the present embodiment;
  • FIG. 6 is a time chart when the subset size of the own request according to this embodiment is large;
  • FIG. 7 is a time chart when the subset size of the own request according to the present embodiment is small;
  • FIG. 8 is a detailed configuration diagram of a DMA transfer driver according to this embodiment;
  • FIG. 9 is a table showing the state of transfer processing of the DMA transfer driver of FIG. 8 according to the present embodiment at each time;
  • FIG. 10 is a configuration diagram of a queue ID table according to this embodiment;
  • FIG. 11 is a configuration diagram of a transfer queue management table according to the embodiment;
  • FIG. 12 is a configuration diagram of a subset size table according to the embodiment;
  • FIG. 13 is a configuration diagram of a request counter table according to the embodiment;
  • FIG. 14 is a configuration diagram of a transfer engine table according to the embodiment;
  • FIG. 15 is a configuration diagram of a priority table according to the embodiment;
  • FIG. 16 is a hardware configuration diagram of a physical server according to the embodiment;
  • FIG. 17 is a sequence diagram showing request transfer processing of a physical server according to the embodiment;
  • FIG. 18 is a sequence diagram showing the request transfer processing of the physical server following FIG. 17;
  • FIG. 19 is a flow chart showing processing of notifying a request from a reception queue to a transfer queue according to the embodiment;
  • FIG. 20 is a flow chart showing processing for notifying a transfer engine of a subset of requests from a transfer queue according to the present embodiment;
  • FIG. 21 is a flowchart showing a process of DMA-transferring a subset from a transfer engine to a device according to this embodiment;
  • FIG. 22 is a flowchart illustrating a first example of processing for changing the subset size of requests according to the present embodiment;
  • FIG. 23 is a time-series graph of subset sizes of high priority and low priority in the first example of FIG. 22;
  • FIG. 24 is a flowchart illustrating a second example of processing for changing the subset size of requests according to the present embodiment;
  • FIG. 25 is a time-series graph of subset sizes of high priority and low priority in the second example of FIG. 24;
  • FIG. 26 is a configuration diagram of a conventional physical server;
  • FIG. 27 is a configuration diagram of a DMA transfer driver in FIG. 26.
  • FIG. 1 is a configuration diagram of a physical server 100.
  • a physical server (transfer management device) 100 has a host (host unit) 10 having a host DRAM (host memory) 12 in which transfer source data is stored, and a device (device unit) 50 having a device DRAM (device memory) 51 as a data transfer destination.
  • the thread 11 is, for example, a network application that speeds up data processing using the accelerator device 50, and operates according to the vRAN (virtual radio access network) standard.
  • a plurality of threads 11 each issues a request, which is a data transfer request, to the host DRAM 12 .
  • a DMA transfer driver (transfer control unit) 1 reads the data of each request and sequentially transfers the data to the device DRAM 51 in the device 50 via the PCIe 59 .
  • An IP core 52 which is an FPGA within the device 50, reads and processes the data of each request transferred to the device DRAM51.
  • FIG. 2 is a configuration diagram of the DMA transfer driver 1 of FIG.
  • the DMA transfer driver 1 has a reception queue 20 , a transfer queue 30 and a transfer engine 40 .
  • the DMA transfer driver 1 creates one or more subsets from the data in the host DRAM 12 specified by the request in each transfer queue 30 and DMA-transfers the sequentially selected subsets of each transfer queue 30 to the device DRAM 51 .
  • the reception queue 20 stores requests issued from each thread 11 as pointers in the host DRAM 12 .
  • the transfer queue 30 stores the pointer of the request read from the reception queue 20 to notify the transfer engine 40 .
  • in FIG. 27, one transfer queue 30z is provided in one DMA transfer driver 1z, whereas in FIG. 2, one DMA transfer driver 1 is provided with a plurality of transfer queues 30.
  • Each transfer queue 30 is prepared, for example, for each priority of a request, and one or more requests with the same priority are stored in the same transfer queue 30. In this specification, a request with a higher priority value is prioritized over other requests in the DMA transfer process.
  • each queue in FIG. 2 is expanded from each queue in FIG. 27 as follows.
  • the reception queue 20 corresponds to a plurality of transfer queues 30, and a request can be distributed to any one of the transfer queues 30 according to an instruction from the transfer queue determination unit 21 of FIG.
  • Individual priorities are assigned to the plurality of transfer queues 30, and requests of the priorities are stored.
  • Each transfer queue 30 is extended to enqueue (insert into the queue) a single request and then dequeue (remove from the queue) the request divided into multiple subsets. Therefore, the transfer queue 30 may have separate areas for storing requests before division and areas for storing subsets after division.
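  • one possible in-memory layout for such an extended transfer queue is sketched below in C; the structure and field names, the fixed ring capacity, and the field choices are illustrative assumptions, not the layout used by the patent itself.

```c
/* Illustrative layout of an extended transfer queue; the names, the fixed
 * ring capacity, and the field choices are assumptions, not the patent's. */
#include <stddef.h>
#include <stdint.h>

#define QUEUE_DEPTH 64

struct request {             /* one enqueued transfer request */
    uint64_t src_addr;       /* pointer into host DRAM (transfer source) */
    uint64_t dst_addr;       /* destination offset in device DRAM */
    size_t   size;           /* total data size of the request */
};

struct subset {              /* one split request produced from a request */
    uint64_t src_addr;       /* start of this fragment in host DRAM */
    uint64_t dst_addr;       /* destination of this fragment in device DRAM */
    size_t   size;           /* at most the subset size chosen for this queue */
};

struct transfer_queue {
    int            priority;                 /* individual priority of this queue */
    struct request pending[QUEUE_DEPTH];     /* area for requests before division */
    unsigned       pend_head, pend_tail;
    struct subset  ready[QUEUE_DEPTH];       /* area for subsets after division */
    unsigned       ready_head, ready_tail;
};
```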
  • the transfer engine 40 is extended so that subsets from a plurality of transfer queues 30 can be sequentially read and transferred according to instructions from the transfer queue selector 41 of FIG.
  • a "subset" of the request that is, one I will call it a split request.
  • the transfer engine 40 sequentially reads requests in the host DRAM 12 from pointers in the transfer queue 30 and transfers the requests to the device DRAM 51 . Each transfer engine 40 does not accept the next request until the transfer of the request accepted by itself is completed. When the transfer engine 40 enters a ready state in which no transfer is being performed, if there is only one transfer queue 30 with a subset, the transfer engine 40 takes out the subset from the transfer queue 30 and transfers it.
  • the transfer engine 40 takes out and transfers the subsets from the transfer queues 30 selected in order by a method such as round robin.
  • the selection method of the transfer queue 30 may be flexibly changed to round robin, weighted round robin, or the like, but logic with relatively light load is preferable.
  • a plurality of transfer engines 40 may be provided within one DMA transfer driver 1 . Each transfer engine 40 can perform transfer processing in parallel unless requests are transferred to the same transfer destination at the same time.
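  • the round-robin dequeue described above can be pictured with the following sketch, which reuses the transfer_queue structure from the earlier sketch; the helper name select_next_subset and the ring-index bookkeeping are assumptions made for illustration.

```c
/* Round-robin selection of the next transfer queue that holds a subset,
 * performed when a transfer engine becomes idle. Hypothetical helper;
 * reuses struct transfer_queue from the sketch above. */
static unsigned next_queue;  /* queue index to try first on the next call */

struct subset *select_next_subset(struct transfer_queue q[], unsigned num_queues)
{
    for (unsigned i = 0; i < num_queues; i++) {
        unsigned idx = (next_queue + i) % num_queues;
        if (q[idx].ready_head != q[idx].ready_tail) {      /* subset waiting */
            struct subset *s = &q[idx].ready[q[idx].ready_head % QUEUE_DEPTH];
            q[idx].ready_head++;                           /* dequeue it */
            next_queue = (idx + 1) % num_queues;           /* rotate fairly */
            return s;                       /* hand this subset to the idle engine */
        }
    }
    return NULL;   /* no transfer queue currently holds a subset */
}
```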
  • FIG. 3 is a time chart when high-priority requests R2 and R3 are kept waiting when the conventional DMA transfer driver 1z shown in FIG. 26 is used.
  • the horizontal axis of the time chart indicates time slots (time t0 to t9), and the vertical axis indicates requests stored in each component in physical server 100z at each time.
  • at time t1, pointers are issued to the reception queue 20z in the order of the three types of requests R1, R2, and R3.
  • the request R1 has a large data size but has a low priority, and the requests R2 and R3 have a small data size but have a high priority (with strict TAT requirements).
  • the pointers of requests R1, R2, and R3 are enqueued from the reception queue 20z into the transfer queue 30z in this order.
  • the transfer engine 40z reads the request R1 from the transfer queue 30z and DMA transfers it from the host DRAM 12z to the device DRAM 51z.
  • the request R1 is first stored in the IP core 52z(#1) after time t6.
  • the transfer engine 40z reads the remaining requests R2 and R3 from the transfer queue 30z and DMA transfers them.
  • the request R2 arrives at the IP core 52z(#1) at time t6
  • the request R3 arrives at the IP core 52z(#2) at time t7.
  • the high-priority requests R2 and R3 failed to comply with the delay requirement due to the long waiting time caused by the request R1 from time t3 to t5.
  • FIG. 4 is a time chart when the waiting time of high-priority requests R2 and R3 is reduced by data division of requests when the DMA transfer driver 1 of this embodiment shown in FIG. 1 is used.
  • at time t0, similarly to time t1 in FIG. 3, pointers are issued to the reception queue 20 in the order of the three requests R1, R2, and R3 (the priority and size of each request, and the illustrated hatching pattern, are also identical to FIG. 3).
  • a low-priority transfer queue 30 (#1) is assigned a low-priority request R1
  • a high-priority transfer queue 30 (#2) is assigned high-priority requests R2 and R3.
  • the pointer of each request is enqueued in the transfer queue 30, so the size of the requests is not directly shown in the time chart.
  • the requests in the transfer queue 30 are then divided into a plurality of subsets of the same determined size before they are dequeued.
  • Each of the high priority requests R2 and R3 is divided into two, and the low priority request R1 is divided into twelve.
  • the transfer engine 40 sequentially (round robin) reads subsets from the two transfer queues 30 and DMA transfers them from the host DRAM 12 to the device DRAM 51 .
  • in this way, the requests R2 and R3 that entered the DMA transfer driver 1 after the request R1 at time t0 are transferred through the gaps between the divided subsets of the request R1 that entered earlier, thereby suppressing their delay.
  • FIG. 5 is a time chart when the waiting time of high-priority requests is further reduced than in the case of FIG. 4 by changing the subset size of requests.
  • the DMA transfer driver 1 refers to the priority of each request to determine the size of the subset. Times t0 and t1 are the same as in FIG. 4. One or more subsets are generated from the requests in the transfer queue 30 according to the size corresponding to the priority of each request.
  • the DMA transfer driver 1 (transfer size determination unit 33 in FIG. 8) increases the subset size as the priority is higher.
  • the high-priority requests R2 and R3 are not divided and become subsets one by one.
  • a low-priority request R1 is divided into twelve.
  • the transfer engine 40 sequentially (round robin) reads subsets from the two transfer queues 30 and DMA transfers them from the host DRAM 12 to the device DRAM 51 .
  • FIG. 6 is a time chart when the subset size of own request R1 is large.
  • since the subset size of the own request R1 is large, it is divided into two subsets and DMA-transferred.
  • the other request R2 started DMA transfer at time t3 and arrived at the IP core 52 (#2) at time t4. Therefore, the amount of one-time transfer of the own request R1 increases, and the other request R2 cannot be transferred during that time, so transfer interruption (preemption) is less likely to occur. Therefore, the throughput of the own request R1 increases, and the TAT of the other request R2 deteriorates.
  • FIG. 7 is a time chart when the subset size of own request R1 is small.
  • since the subset size of the own request R1 is small, it is divided into four subsets and DMA-transferred.
  • the other request R2 started DMA transfer at time t2 and arrived at the IP core 52 (#2) at time t3. Therefore, the amount of one-time transfer of the own request R1 is reduced, and the other request R2 can be transferred in between, making the transfer more likely to be interrupted. Therefore, the throughput of the own request R1 is reduced, and the TAT of the other request R2 is improved.
  • the subset size is a trade-off between throughput and TAT. Therefore, the DMA transfer driver 1 attempts optimization by flexibly changing the subset size. By changing the subset size, the frequency with which the request is interrupted by transfer requests from other requests is adjusted (details are shown in FIGS. 22 to 25).
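  • to make the trade-off concrete, a rough illustrative model (an assumption introduced here for exposition, not a formula from this specification) can be used: let B be the DMA bandwidth, c the fixed per-subset setup cost, D the total size of the own (low-priority) request, and s its subset size.

$$W_{\text{other}} \;\lesssim\; c + \frac{s}{B}, \qquad T_{\text{own}} \;\approx\; \left\lceil \tfrac{D}{s} \right\rceil \left( c + \tfrac{s}{B} \right) \;\approx\; \frac{D}{B} + \left\lceil \tfrac{D}{s} \right\rceil c$$

  • here W_other is the worst-case extra wait of another request's subset (the time to finish the own subset already in flight) and T_own is the un-preempted transfer time of the own request; shrinking s lowers W_other (better TAT for the other request, as in FIG. 7) but inflates the per-subset overhead term (lower throughput for the own request), while enlarging s does the opposite (as in FIG. 6).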
  • FIG. 8 is a detailed block diagram of the DMA transfer driver 1.
  • in addition to the reception queue 20, the M transfer queues 30, and the N transfer engines 40 described with reference to FIG. 2, the DMA transfer driver 1 has a transfer queue determination unit 21, a priority determination unit 22, a request counter 31, a request division unit 32, a transfer size determination unit 33, a transfer queue selection unit 41, an integrated control unit 42, and an external controller IF 61.
  • the meaning of each arrow in FIG. 8 is as follows: a bold solid arrow is a request path, a bold dashed arrow is a management signal (m-plane) path, and a thin solid arrow is a control signal (c-plane) path.
  • the transfer queue determination unit 21 dequeues the request in the reception queue 20 and determines which transfer queue 30 to enqueue.
  • the request counter 31 counts the number of requests stored in each transfer queue 30 .
  • the priority determination unit 22 determines the priority of the request based on the content of the request or information from the external controller IF61.
  • the transfer size determination unit 33 determines the size of subsets when splitting a request based on the priority of the request and the communication requirements (TAT, throughput, etc.) required for the request. For example, the transfer size determining unit 33 desirably increases the size of the subset as the priority of the request is higher. As a result, the higher the priority of the request, the more data can be transferred at once, so the throughput and TAT of the high priority request can be improved.
  • the request division unit 32 divides the request based on the size determined by the transfer size determination unit 33 .
  • the transfer queue selector 41 determines from which transfer queue 30 an empty transfer engine 40 dequeues a subset.
  • the integrated control unit 42 controls the transfer size determination unit 33 and the transfer queue selection unit 41 by setting their configuration information (designation of the minimum and maximum subset sizes, and designation of the read count policy, respectively).
  • a control signal from the external controller 60 is notified to components inside the DMA transfer driver 1, such as the priority determination unit 22, via the external controller IF 61.
  • external cooperation can be executed, such as the external controller 60 notifying the priority determining unit 22 of the priority of each request.
  • since the DMA transfer driver 1 includes the request division unit 32 and the transfer queue selection unit 41, each request is divided into a plurality of subsets of a fixed size and then DMA-transferred, as described with reference to FIG. 4.
  • in this case, the processing does not refer to the priority of each request, so the development cost of modifying the DMA transfer driver 1 can be eliminated.
  • furthermore, since the DMA transfer driver 1 includes the priority determination unit 22 and the transfer size determination unit 33 in addition to the request division unit 32 and the transfer queue selection unit 41, each request is divided into one or more subsets according to its priority and then DMA-transferred, as described with reference to FIG. 5. As a result, the completion of the DMA transfer of the requests R2 and R3 can be made earlier than in FIG. 4. On the other hand, some development cost is required to modify the DMA transfer driver 1 so that it can recognize the priority of each request.
  • FIG. 9 is a table 70 showing the state of transfer processing of the DMA transfer driver 1 of FIG. 8 at each time.
  • Table 70 shows the data stored in the reception queue 20, the transfer queues 30 (#1 for low priority, #2 for high priority), the transfer engine 40, and the device DRAM 51, which are the components, at each of the times T1 to T7.
  • the components of the DMA transfer driver 1 shown in FIG. 8 will be clarified by explaining the contents of the table 70 along the time T1 to T7.
  • request B (low priority) is received in the reception queue 20 .
  • the two transfer queues 30 (#1, #2) are both empty (no stored entries), and the transfer engine 40 is also empty.
  • the transfer queue determination unit 21 enqueues the request B dequeued from the reception queue 20 to the transfer queue 30 (#1) determined by referring to the priority of the request determined by the priority determination unit 22 .
  • the transfer size determining unit 33 refers to the subset size table 73 of FIG. 12 and determines the division size and division number (here, 2) of the request B in the transfer queue 30 (#1).
  • the request dividing unit 32 divides the request B into two subsets of requests B1 and B2.
  • the transfer queue selection unit 41 dequeues the request B1 from the transfer queue 30 (#1) selected by referring to the transfer engine table 75 of FIG. 14, and requests the transfer engine 40 to perform DMA transfer.
  • the transfer engine 40 DMA-transfers the request B1 to the device DRAM 51 .
  • request A (high priority) is received in the reception queue 20 .
  • the transfer queue determination unit 21 enqueues the request A dequeued from the reception queue 20 to the transfer queue 30 (#2) determined by referring to the priority of the request determined by the priority determination unit 22 .
  • the request dividing unit 32 divides the request A in the transfer queue 30 (#2) into one subset because the original size is small.
  • the transfer queue selection unit 41 refers to the transfer engine table 75 and dequeues the request A of the transfer queue 30 (#2) instead of the request B2 of the transfer queue 30 (#1) as the next processing target.
  • the transfer engine 40 DMA-transfers the request A to the device DRAM 51 .
  • in this way, the request A, which arrived at the reception queue 20 later than the request B, was transferred with priority over the subset B2, thereby keeping its TAT.
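  • the division of request B into the subsets B1 and B2 can be pictured with the following sketch, reusing the request and subset structures from the earlier sketch; splitting into equal-size fragments with a possibly shorter final fragment is an assumption consistent with FIG. 4, and divide_request is a hypothetical helper name.

```c
/* Split one request into subsets of at most `subset_size` bytes (the last
 * fragment may be shorter). Hypothetical helper reusing the structures above;
 * returns the number of subsets produced. */
unsigned divide_request(const struct request *req, size_t subset_size,
                        struct subset *out, unsigned max_out)
{
    unsigned n = 0;
    size_t offset = 0;

    while (offset < req->size && n < max_out) {
        size_t chunk = req->size - offset;
        if (chunk > subset_size)
            chunk = subset_size;              /* cap each fragment at the subset size */
        out[n].src_addr = req->src_addr + offset;
        out[n].dst_addr = req->dst_addr + offset;
        out[n].size     = chunk;
        offset += chunk;
        n++;
    }
    return n;   /* e.g. request B with twice the subset size yields B1 and B2 */
}
```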
  • FIG. 10 is a configuration diagram of the queue ID table 71. A transfer queue ID is registered in the queue ID table 71 for each priority.
  • the transfer queue determination unit 21 refers to the queue ID table 71 and selects the transfer queue 30 of the transfer destination.
  • the integrated control unit 42 writes entries in the queue ID table 71 in advance.
  • FIG. 11 is a configuration diagram of the transfer queue management table 72. The transfer queue management table 72 registers the usage status for each transfer queue ID.
  • the transfer queue determination unit 21 refers to the transfer queue management table 72, which is held by the integrated control unit 42, and checks whether the transfer queue 30 can be used.
  • the integrated control unit 42 updates the latest status of the transfer queue 30 as an entry in the transfer queue management table 72 as needed.
  • FIG. 12 is a configuration diagram of the subset size table 73.
  • the subset size table 73 registers a subset size range (minimum size, maximum size) and a subset size for each time slot (slot 0, slot 1, . . . ) for each transfer queue ID.
  • the request dividing unit 32 refers to the subset size table 73 to determine the size to divide each request.
  • the integrated control unit 42 sets the subset size range of the subset size table 73 in advance. Based on the request counter table 74 (FIG. 13), the transfer size determining unit 33 writes the subset size for each time slot to the subset size table 73 within the range of the subset size.
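  • the subset size table 73 can be imagined as the following kind of per-queue record (the field names and the slot count are illustrative assumptions): the integrated control unit 42 fills in the minimum/maximum range, and the transfer size determination unit 33 writes the per-slot size within that range.

```c
#include <stddef.h>

#define NUM_SLOTS 16   /* number of recorded time slots; an assumption */

/* Illustrative in-memory form of one row of the subset size table 73
 * (one row per transfer queue ID); not the patent's actual layout. */
struct subset_size_entry {
    size_t min_size;                  /* lower bound, set by the integrated control unit 42 */
    size_t max_size;                  /* upper bound, set by the integrated control unit 42 */
    size_t size_per_slot[NUM_SLOTS];  /* written by the transfer size determination unit 33 */
};

/* Clamp a candidate size into the configured range before recording it. */
static size_t clamp_size(const struct subset_size_entry *e, size_t s)
{
    if (s < e->min_size) return e->min_size;
    if (s > e->max_size) return e->max_size;
    return s;
}

/* Record the size chosen for time slot `slot`. */
static void set_slot_size(struct subset_size_entry *e, unsigned slot, size_t s)
{
    e->size_per_slot[slot % NUM_SLOTS] = clamp_size(e, s);
}
```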
  • FIG. 13 is a configuration diagram of the request counter table 74.
  • the request counter table 74 stores, for each transfer queue ID (that is, for each priority), the total number of requests arriving at the transfer queue 30 and the number of requests in each time slot (slot 0, slot 1, ...).
  • the transfer size determination unit 33 refers to the request counter table 74 to determine the size of the subset.
  • FIG. 14 is a configuration diagram of the transfer engine table 75.
  • a transfer engine table 75 is prepared for each transfer engine 40 .
  • in the transfer engine table 75, the subset read count and the size of the subset are associated with each transfer queue ID. These contents are set by the integrated control unit 42.
  • the transfer queue selection unit 41 refers to the transfer engine table 75 to determine from which transfer queue 30 a specific transfer engine 40 should dequeue a subset. For example, in the transfer engine table 75 of FIG. 14, the read count is 1 for each queue. Therefore, the transfer queue selection unit 41 reads one subset from the transfer queue (0) this time, and then reads one subset from the transfer queue (1) next time. Note that, instead of the number of reads, when the total size of the subsets read out reaches the size registered in the transfer engine table 75, the transfer queue selection unit 41 may switch to dequeuing subsets from another transfer queue 30 next time.
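  • the read-count policy of the transfer engine table 75 amounts to a weighted round robin over transfer queue IDs; the sketch below is an assumed illustration of that bookkeeping, not the patent's implementation.

```c
/* One row of a transfer engine table 75: how many subsets may be read in a
 * row from a given transfer queue before moving on. Layout is an assumption. */
struct engine_policy_row {
    unsigned queue_id;      /* transfer queue ID */
    unsigned read_count;    /* e.g. 1 for both rows in FIG. 14 */
};

/* Pick the transfer queue to dequeue from next, honouring the read counts
 * (a weighted round robin). `row` and `served` persist between calls. */
unsigned pick_queue(const struct engine_policy_row *rows, unsigned num_rows,
                    unsigned *row, unsigned *served)
{
    if (*served >= rows[*row].read_count) {      /* quota for this queue used up */
        *row = (*row + 1) % num_rows;            /* advance to the next queue */
        *served = 0;
    }
    (*served)++;
    return rows[*row].queue_id;
}
```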
  • FIG. 15 is a configuration diagram of the priority table 76.
  • the priority determining unit 22 determines the priority of requests issued from each thread 11 as exemplified in (Method 1) to (Method 3) below.
  • (Method 1) the thread 11 individually passes the priority to the DMA transfer driver 1 as a parameter for each request, without referring to the priority table 76.
  • An API (Application Programming Interface) example of the thread 11 for such parameter passing is a data write command "write(priority, *buf, size);".
  • this write command API means that data of data size size, starting from the pointer position *buf in the host DRAM 12, is written to the device DRAM 51 with the priority priority.
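  • if the parameter-passing API of (Method 1) were exposed to applications, a declaration and call along the following lines could be imagined; only the write(priority, *buf, size) form above comes from this specification, so the function name dma_write (chosen here to avoid clashing with the POSIX write) and the exact signature are assumptions.

```c
#include <stddef.h>

/* Hypothetical transfer-request API for (Method 1): write `size` bytes
 * starting at `buf` in host DRAM to device DRAM, tagging the request with
 * `priority`. The name dma_write and the return type are assumptions. */
int dma_write(int priority, const void *buf, size_t size);

/* Example caller: a latency-sensitive thread tags its call data with a high
 * priority so the request is stored in the high-priority transfer queue. */
void issue_call_data(const void *call_frame, size_t frame_len)
{
    int rc = dma_write(/* priority = */ 2, call_frame, frame_len);
    (void)rc;   /* error handling omitted in this sketch */
}
```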
  • (Method 2) By referring to the base priority of the priority table 76, a fixed priority is set for each request of a specific thread ID, regardless of time. For example, in the priority table 76 of FIG. 15, thread #1 is determined to have priority "0" and thread #2 to have priority "2". (Method 3) By referring to the priority of each time slot (t1, t2, t3, ...) in the priority table 76, the priority is changed dynamically depending on the time slot. For example, in the priority table 76 of FIG. 15, when the time slot is t2, thread #1 is assigned priority "1" and thread #2 is assigned priority "2".
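  • a minimal sketch of how (Method 2) and (Method 3) could consult the priority table 76 is shown below; the array layout, the table dimensions, and the lookup helper names are assumptions.

```c
#define NUM_THREADS   2   /* table dimensions are assumptions */
#define NUM_TIMESLOTS 4

/* Illustrative in-memory form of the priority table 76. */
struct priority_table {
    int base_priority[NUM_THREADS];                  /* (Method 2): fixed per thread ID */
    int slot_priority[NUM_THREADS][NUM_TIMESLOTS];   /* (Method 3): per time slot */
};

/* (Method 2): time-independent priority for a given thread ID. */
int lookup_base_priority(const struct priority_table *t, unsigned thread_id)
{
    return t->base_priority[thread_id];
}

/* (Method 3): priority that changes dynamically with the current time slot. */
int lookup_slot_priority(const struct priority_table *t,
                         unsigned thread_id, unsigned slot)
{
    return t->slot_priority[thread_id][slot];
}
```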
  • FIG. 16 is a hardware configuration diagram of the physical server 100.
  • Physical server 100 is configured as a computer 900 having a CPU 901, a RAM 902, a ROM 903, an HDD 904, an accelerator 905, an input/output I/F 906, a media I/F 907, and a communication I/F 908.
  • The accelerator 905 is the device 50 (FIG. 1) that processes at least one of data from the communication I/F 908 and data from the RAM 902 at high speed.
  • As the accelerator 905, a look-aside type that receives processing from the CPU 901 or the RAM 902, executes it, and returns the execution result to the CPU 901 or the RAM 902 may be used.
  • Alternatively, as the accelerator 905, an in-line type that performs processing by being inserted between the communication I/F 908 and the CPU 901 or the RAM 902 may be used.
  • Accelerator 905 is connected to external device 915 via communication I/F 908 .
  • Input/output I/F 906 is connected to input/output device 916 .
  • a media I/F 907 reads and writes data from a recording medium 917 .
  • the CPU 901 controls each processing unit by executing a program (also called an application or an app for short) read into the RAM 902 .
  • This program can be distributed via a communication line or recorded on a recording medium 917 such as a CD-ROM for distribution.
  • FIG. 17 is a sequence diagram showing request transfer processing of the physical server 100.
  • the priority determining unit 22 reads the metadata (description information) of the request from the reception queue 20 (S201), and notifies the transfer queue determining unit 21 of the extracted priority information of the request (S202). Based on the priority information of the request, the transfer queue determining unit 21 determines the storage destination of the request as the transfer queue information (#0) (S203). The transfer queue determination unit 21 enqueues the request dequeued from the reception queue 20 to the transfer queue 30 (#0) (S204).
  • FIG. 18 is a sequence diagram showing the request transfer processing of the physical server 100 following FIG. 17.
  • the transfer queue selector 41 checks whether or not a request is stored in the transfer queue 30 (#0) (S211), and receives notification of the stored request (S212).
  • the transfer engine 40 (#0) is in an idle state, not in transfer, and notifies the transfer queue selector 41 that new requests can be accepted (S213).
  • the transfer queue selection unit 41 requests the transfer queue 30 (#0) to transfer the request notified in S212 to the transfer engine 40 (#0) notified in S213 (S214).
  • the request dividing unit 32 receives the request requested to be transferred in S214 (S215), divides it into subsets, and notifies the transfer engine 40 (#0) of the subsets (S216).
  • FIG. 19 is a flow chart showing the process of notifying the transfer queue 30 of a request from the reception queue 20 .
  • the transfer queue determination unit 21 determines whether or not a new request has been received from the thread 11 to the reception queue 20 (S101). If Yes in S101, the metadata (description information) of the request is transmitted from the reception queue 20 to the priority determination unit 22 (S102). If No in S101, proceed to S103.
  • the transfer queue determination unit 21 determines whether or not there is a request for which transfer queue information to be enqueued has been determined (S103). If No in S103, the process returns to S101.
  • the transfer queue determining unit 21 determines whether or not there is a vacancy in the transfer queue 30 indicated by the determined transfer queue information (S104). If No in S104, the process returns to S101. If Yes in S104, the transfer queue determining unit 21 transmits (enqueues) the request to the transfer queue 30 based on the transfer queue information (S105).
  • FIG. 20 is a flow chart showing the process of notifying the transfer engine 40 of a subset of requests from the transfer queue 30 .
  • the transfer queue selection unit 41 selects the transfer queue 30 from which to dequeue a request. Then, the transfer queue selection unit 41 determines whether or not it has received a notification from a transfer engine 40 that it is available to accept (S111). If No in S111, the process returns to S111. If Yes in S111, the request division unit 32 divides the requests in the transfer queue 30 into subsets. The transfer size determination unit 33 determines the size of the subsets. Then, the transfer queue selection unit 41 transmits the divided subsets to the transfer engine 40 that issued the availability notification in S111 (S112). Further, the transfer queue selection unit 41 notifies the request counter 31 of the number of transmitted subsets (the number of requests for each priority) to update the request counter table 74 (S113).
  • FIG. 21 is a flow chart showing the process of DMA transferring a subset from transfer engine 40 to device 50 .
  • the transfer engine 40 determines whether or not the previous subset is being transferred (S121). If Yes in S121, the process returns to S121. If No in S121, the transfer queue selection unit 41 determines whether or not data of a subset to be transferred exists in any of the transfer queues 30 serving as read destinations (S122). If No in S122, the process returns to S121. If Yes in S122, the transfer queue selection unit 41 refers to the transfer engine table 75 and the previous transfer history to determine the transfer queue 30 of the read destination this time (S123).
  • the transfer queue selection unit 41 transmits a subset transfer request to the transfer queue 30 (S124), and receives the subset from the transfer queue 30 (S125).
  • the transfer queue selector 41 transfers the subset of S125 to the transfer engine 40 (S126). As a result, the subset is DMA-transferred from the transfer engine 40 to the device 50 .
  • FIG. 22 is a flowchart illustrating a first example of processing for changing the subset size of requests.
  • in the first example, as the size of the subset created from the request data, the transfer size determination unit 33 expands the size of the low-priority ("1") subset during periods when no high-priority ("2") request enters the transfer queue 30, and reduces the size of the low-priority subset when high-priority requests enter the transfer queue 30. The details of the subset size control are described below along the flowchart, and a code sketch follows the step description.
  • the transfer size determination unit 33 extracts the count value obtained by counting the number of requests currently stored in each transfer queue 30 (S301). When the count value of priority "2" has increased compared to the previous count value (Yes in S302), that is, when a burst of high-priority requests occurs, the transfer size determination unit 33 returns the subset size of the transfer queue 30 corresponding to priority "1" to its initial value (S303). If No in S302 and the count value of priority "1" has increased by a certain amount or more compared to the previous count value (Yes in S304), the transfer size determination unit 33 increases the subset size of the transfer queue 30 corresponding to priority "1" (S305). This subset size increase is performed by a predetermined value (+n) within a range not exceeding the maximum value.
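  • steps S301 to S305 can be summarized by the following control-loop sketch; the function name, the burst-detection threshold, and the treatment of the counters as plain integers are assumptions, and only the shrink-to-initial-value / grow-by-n behavior comes from the flowchart described above.

```c
#include <stddef.h>

/* Sketch of the first example of subset size control (S301-S305).
 * c1/c2 are the current request counts of priority "1"/"2"; prev_c1/prev_c2
 * are the counts from the previous check. Names, the growth step `step_n`
 * and the threshold `grow_threshold` are assumptions. */
size_t update_low_priority_subset_size(size_t s1, size_t s1_init, size_t s1_max,
                                       unsigned c1, unsigned prev_c1,
                                       unsigned c2, unsigned prev_c2,
                                       size_t step_n, unsigned grow_threshold)
{
    if (c2 > prev_c2) {
        /* S302 Yes -> S303: a burst of high-priority requests arrived,
         * so return the low-priority subset size to its initial value. */
        return s1_init;
    }
    if (c1 >= prev_c1 + grow_threshold) {
        /* S304 Yes -> S305: low-priority load grew, so enlarge the subset
         * size by +n within a range not exceeding the maximum. */
        s1 += step_n;
        if (s1 > s1_max)
            s1 = s1_max;
    }
    return s1;
}
```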
  • FIG. 23 is a time-series graph of the subset sizes of high priority and low priority in the first example of FIG. 22.
  • the upper graph shows the count value c2(k) in time slot k with priority "2".
  • the lower graph shows the subset size s1(k) at time slot k with priority "1". This s1(k) increases to the maximum subset size s1max of priority "1" when throughput is prioritized.
  • when the TAT of the other priority "2" is prioritized, it is reduced to the minimum subset size s1min of priority "1".
  • in period T11, priority "2" requests do not occur, so the subset size of priority "1" is increased to s1max to give priority to throughput.
  • in period T12, priority "2" requests occur, so the subset size of priority "1" is reduced to s1min, giving priority to the TAT of priority "2".
  • in period T13, priority "2" requests do not occur, so throughput is prioritized by gradually increasing the subset size of priority "1" to s1max.
  • in period T14, priority "2" requests occur, so the subset size of priority "1" is reduced to s1min, giving priority to the TAT of priority "2".
  • in period T15, priority "2" requests do not occur, so throughput is prioritized by gradually increasing the subset size of priority "1" to s1max.
  • FIG. 24 is a flowchart illustrating a second example of processing for changing the subset size of requests.
  • in the second example, as the size of the subset created from the request data, the transfer size determination unit 33 reduces the size of the low-priority ("1") subset while the number of high-priority ("2") requests entering the transfer queue 30 increases, and expands the size of the low-priority subset during periods when the number of high-priority requests entering the transfer queue 30 decreases.
  • the transfer size determining unit 33 extracts the count value of the transfer queue 30 corresponding to the priority "2" of the time slot k (S402), and calculates s1(k) by (Equation 1) (S403).
  • the transfer size determination unit 33 updates the subset size in time slot k with priority "1" to s1(k) calculated in S403 (S404).
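  • since (Equation 1) itself is not reproduced in this text, the sketch below substitutes an assumed linear interpolation between s1max and s1min driven by the priority-"2" count c2(k) and the threshold c2P of FIG. 25; this stand-in only illustrates the intended behavior (a larger c2(k) gives a smaller low-priority subset) and is not the patent's actual formula.

```c
#include <stddef.h>

/* Sketch of the second example (S402-S404) with an ASSUMED stand-in for
 * (Equation 1): shrink s1(k) linearly from s1max toward s1min as the
 * priority-"2" count c2(k) grows, clamping at s1min once c2(k) reaches the
 * threshold c2P at which priority "2" must be served with highest priority. */
size_t compute_s1(unsigned c2_k, unsigned c2P, size_t s1_min, size_t s1_max)
{
    if (c2P == 0 || c2_k >= c2P)
        return s1_min;                       /* priority "2" dominates */
    /* linear interpolation: c2_k == 0 -> s1_max, c2_k == c2P -> s1_min */
    double frac = (double)c2_k / (double)c2P;
    return s1_max - (size_t)((double)(s1_max - s1_min) * frac);
}
```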
  • FIG. 25 is a time-series graph of the subset sizes of high priority and low priority in the second example of FIG. 24. Similar to FIG. 23, the upper graph shows c2(k) and the lower graph shows s1(k). The threshold c2P in the upper graph is a count threshold at or above which priority "2" should be processed with the highest priority. In period T21, since there is no request with priority "2", priority "1" is set to the maximum size s1max. In period T22, the number of requests with priority "2" increases, so the size of priority "1" is reduced. This improves the TAT of priority "2".
  • the present invention is a physical server 100 having a host 10 having a host DRAM 12 in which transfer source data is stored and a device 50 having a device DRAM 51 as a data transfer destination,
  • a host 10 has a DMA transfer driver 1 that performs DMA transfer from a host DRAM 12 to a device DRAM 51, and a plurality of transfer queues 30 that store requests indicating transfer requests for data to be DMA transferred,
  • the DMA transfer driver 1 creates one or more subsets from the data in the host DRAM 12 specified by the request in each transfer queue 30, and sequentially DMA-transfers the selected subsets of each transfer queue 30 to the device DRAM 51.
  • the transfer delay of each request is suppressed by dividing the request data before the DMA transfer.
  • even when the threads 11 request DMA transfer to the same device 50, while one thread 11 is transferring a large request, it becomes possible to transfer a request from another thread 11 that arrives later. Therefore, since fairness among the threads 11 is maintained, threads 11 with strict delay requirements can operate on the same physical server 100 as other threads 11 (noisy neighbors). In this way, regardless of the data size of the request from each thread 11, by providing fair data transfer opportunities among the threads 11, the transfer efficiency and the delay requirements of the transferred data can be satisfied in a well-balanced manner when data is transferred from a plurality of transfer sources to the same transfer destination.
  • each transfer queue 30 is assigned an individual priority and stores requests of that priority
  • the DMA transfer driver 1 is characterized in that the size of the subset created from the data of the request is increased as the priority of the request is higher.
  • since the DMA transfer driver 1 has the versatility to handle the priority of each request, the development cost of implementing priority control (designation of transfer order and required bandwidth) by direct coordination between the threads 11 can be reduced. Moreover, since threads 11 with different priorities can be accommodated in the same physical server 100, the housing cost and power consumption can be reduced compared with a method of operating a separate physical server 100 for each priority.
  • each transfer queue 30 is assigned an individual priority and stores requests of that priority
  • as the size of the subset created from the request data, the DMA transfer driver 1 expands the size of the low-priority subset while no high-priority request enters the transfer queue 30, and reduces the size of the low-priority subset when high-priority requests enter the transfer queue 30.
  • each transfer queue 30 is assigned an individual priority and stores requests of that priority, and as the size of the subset created from the request data, the DMA transfer driver 1 reduces the size of the low-priority subset while the number of high-priority requests entering the transfer queue 30 increases, and expands the size of the low-priority subset during periods when the number of high-priority requests entering the transfer queue 30 decreases.
  • the subset size can be dynamically controlled to meet the transfer requirements of both high-priority and low-priority requests in a well-balanced manner.
  • Reference signs: 1 DMA transfer driver (transfer control unit); 10 host (host unit); 11 thread; 12 host DRAM (host memory); 20 reception queue; 21 transfer queue determination unit; 22 priority determination unit; 30 transfer queue; 31 request counter; 32 request division unit; 33 transfer size determination unit; 40 transfer engine; 41 transfer queue selection unit; 42 integrated control unit; 50 device (device unit); 51 device DRAM (device memory); 52 IP core; 59 PCIe; 60 external controller; 61 external controller IF; 71 queue ID table; 72 transfer queue management table; 73 subset size table; 74 request counter table; 75 transfer engine table; 76 priority table; 100 physical server (transfer management device)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer And Data Communications (AREA)

Abstract

A host (10) of a physical server (100) comprises a DMA transfer driver (1) that performs DMA transfer from a host DRAM (12) to a device DRAM (51), and a plurality of transfer queues (30) that store requests for data transfers to be performed by DMA. The DMA transfer driver (1) creates one or more subsets from the data in the host DRAM (12), as specified by the request in each transfer queue (30), and sequentially DMA-transfers the selected subset of each of the transfer queues (30) to the device DRAM (51).
PCT/JP2021/044832 2021-12-07 2021-12-07 Transfer management device, transfer management method, and transfer management program WO2023105603A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/044832 WO2023105603A1 (fr) 2021-12-07 2021-12-07 Transfer management device, transfer management method, and transfer management program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/044832 WO2023105603A1 (fr) 2021-12-07 2021-12-07 Transfer management device, transfer management method, and transfer management program

Publications (1)

Publication Number Publication Date
WO2023105603A1 true WO2023105603A1 (fr) 2023-06-15

Family

ID=86729777

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/044832 WO2023105603A1 (fr) 2021-12-07 2021-12-07 Transfer management device, transfer management method, and transfer management program

Country Status (1)

Country Link
WO (1) WO2023105603A1 (fr)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH03191453A (ja) * 1989-12-15 1991-08-21 Internatl Business Mach Corp <Ibm> Data processing system
JP2010262526A (ja) * 2009-05-08 2010-11-18 Canon Inc Memory control device
JP2011175621A (ja) * 2010-01-27 2011-09-08 Fujitsu Semiconductor Ltd Data transfer control device and data transfer control method
US20150067087A1 (en) * 2013-08-29 2015-03-05 International Business Machines Corporation Automatic pinning and unpinning of virtual pages for remote direct memory access

Similar Documents

Publication Publication Date Title
US8615629B2 (en) Access scheduler
US7350004B2 (en) Resource management device
JP4723260B2 (ja) ソースデバイスに対するリクエストをスケジューリングする装置及び方法
US8838853B2 (en) Access buffer
JP6072783B2 (ja) メモリコントローラおよびかかるメモリコントローラの動作方法
EP2241979A1 (fr) Arbitrage d&#39;interruption pour multiprocesseurs
JP4034969B2 (ja) 共通メモリのメモリ管理システム
KR20050020942A (ko) 연속 매체 우선순위 인식 저장장치 스케줄러
US11093352B2 (en) Fault management in NVMe systems
US6745262B1 (en) Method, system, program, and data structure for queuing requests having different priorities
JP2016195375A5 (fr)
JP6146128B2 (ja) データ処理装置
US20060047874A1 (en) Resource management apparatus
KR20080071840A (ko) 복수의 dma 요청 블록들을 가지는 직접 메모리 액세스장치의 우선 순위를 결정하는 방법 및 장치
EP2423824A1 (fr) Dispositif de transfert de données, procédé de transfert de données et appareil de formation d&#39;images
KR20020008955A (ko) 버스 시스템 및 그 실행 순서 조정방법
CN116664377A (zh) 数据传输方法及相关装置
EP3440547B1 (fr) Prise en charge de demandes basées sur une classe de qualité de service destinées à une ressource partagée
WO2023105603A1 (fr) Dispositif de gestion de transfert, procédé de gestion de transfert et programme de gestion de transfert
JP2011232917A (ja) 半導体集積回路、及びリクエスト制御方法
JP2000194683A (ja) 共有メモリの調停回路およびその調停方法
JP6036806B2 (ja) バスアクセス調停回路およびバスアクセス調停方法
JP4872942B2 (ja) ストレージシステム、ストレージ装置、優先度制御装置および優先度制御方法
KR20090128851A (ko) 버스 중재 방법 및 장치
US9977751B1 (en) Method and apparatus for arbitrating access to shared resources

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21967109

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2023565710

Country of ref document: JP

Kind code of ref document: A