CN110865953B - Asynchronous copying method and device - Google Patents

Publication number
CN110865953B
Authority
CN
China
Prior art keywords
channel
asynchronous
channels
candidate
copy request
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910950581.8A
Other languages
Chinese (zh)
Other versions
CN110865953A (en
Inventor
李丁丁
陈振科
刘海
汤庸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China Normal University
Original Assignee
South China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China Normal University filed Critical South China Normal University
Priority to CN201910950581.8A priority Critical patent/CN110865953B/en
Publication of CN110865953A publication Critical patent/CN110865953A/en
Application granted granted Critical
Publication of CN110865953B publication Critical patent/CN110865953B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00: Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14: Handling requests for interconnection or transfer
    • G06F13/20: Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/28: Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bus Control (AREA)

Abstract

The invention discloses an asynchronous copying method and an asynchronous copying device. The method comprises the following steps: acquiring an asynchronous copy request; performing channel allocation with a channel allocation algorithm according to the asynchronous copy request to obtain a channel set, wherein the channel set comprises at least one channel; and performing asynchronous copying according to the channel set and the asynchronous copy request. Because channels are allocated by the channel allocation algorithm, suitable channels can be provided for each asynchronous copy request and used to carry out the asynchronous copy task, which maximizes the advantages of multiple channels and improves asynchronous copy performance. The asynchronous copying method and device can be widely applied in the technical field of data transmission.

Description

Asynchronous copying method and device
Technical Field
The invention relates to the field of data transmission, in particular to an asynchronous copying method and an asynchronous copying device.
Background
The Storage Performance Development Kit (SPDK) provides a DMA driver based on I/OAT technology. I/OAT is a DMA controller, and through SPDK and I/OAT, asynchronous copy work can be carried out by using the channels (DMA channels) of dedicated DMA hardware. Asynchronous copy means that, after receiving a request, the CPU does not perform the copy itself but hands the copy operation over to the DMA controller. After receiving an asynchronous copy request, the existing SPDK performs the asynchronous copy through the DMA controller using DMA channels. However, the existing SPDK cannot allocate the multiple DMA channels reasonably: using too many channels greatly increases the overhead of acquiring channel control rights, and if the user program cannot handle this overhead, some channels become blocked, so that the advantages of multiple channels cannot be exploited and asynchronous copy performance ultimately suffers.
Disclosure of Invention
In view of the above, in order to solve the above technical problems, an object of the present invention is to provide an asynchronous copying method and apparatus for improving asynchronous copying performance.
The technical scheme adopted by the invention is as follows: an asynchronous copying method, comprising the steps of:
acquiring an asynchronous copy request;
according to the asynchronous copy request, channel allocation is carried out by using a channel allocation algorithm to obtain a channel set, wherein the channel set comprises at least one channel;
and carrying out asynchronous copying according to the channel set and the asynchronous copying request.
Further, the step of performing channel allocation by using a channel allocation algorithm according to the asynchronous copy request to obtain a channel set includes the following steps:
acquiring permission;
and after the permission is acquired, performing channel allocation with the channel allocation algorithm according to the asynchronous copy request to obtain the channel set.
Further, the step of performing channel allocation by using a channel allocation algorithm according to the asynchronous copy request to obtain a channel set includes the following steps:
determining the number of channels to be distributed according to the asynchronous copy request and a preset threshold value;
acquiring a heat coefficient of a channel;
and obtaining a channel set according to the number of the channels to be distributed and the heat coefficient.
Further, the method also comprises the following steps: and updating the heat coefficient of the channel according to the channel set.
Further, the step of obtaining the channel set according to the number of the channels to be distributed and the heat coefficient includes the following steps:
determining a candidate channel combination according to the heat coefficient and a preset proportion;
determining whether the number of the candidate channels in the candidate channel combination is greater than or equal to the number of the channels to be distributed;
if yes, selecting candidate channels with the same number as the channels to be distributed from the candidate channels as the channel set;
otherwise, adding a supplementary channel, and taking the supplementary channel and the candidate channel as the channel set.
Further, the step of adding a supplemental channel and taking the supplemental channel and the candidate channel as the channel set includes the following steps:
acquiring a source address and a destination address corresponding to an asynchronous copy request memory;
acquiring the physical distance between a channel and a node where the source address and/or the destination address are/is located;
selecting a supplementary channel from channels except the candidate channel according to the physical distance;
and forming the channel set by the supplementary channel and the candidate channel.
Further, the step of performing asynchronous copy according to the channel set and the asynchronous copy request includes the following steps:
dividing the asynchronous copy request into tasks with the same number as the number of channels according to the number of the channels in the channel set;
and correspondingly allocating one task to each channel in the channel set to perform asynchronous copying.
Further, the step of correspondingly allocating one task to each channel in the channel set and performing asynchronous copy includes the following steps:
refining the task into a refined task;
determining the number of required descriptors according to the refinement tasks, wherein one descriptor corresponds to one refinement task;
acquiring an enqueue control right;
calculating the offset of a source page and a destination page according to the result of obtaining the enqueue control right, wherein a source address memory corresponding to the asynchronous copy request has at least one source page, and a destination address memory corresponding to the asynchronous copy request has at least one destination page;
filling a descriptor queue according to the number of descriptors and the offset calculation result, wherein the descriptor queue has at least one descriptor;
and performing asynchronous copying according to the filling result of the descriptor queue.
The present invention also provides an asynchronous copying apparatus, comprising:
the acquisition module is used for acquiring the asynchronous copy request;
the distribution module is used for carrying out channel distribution by utilizing a channel distribution algorithm according to the asynchronous copy request to obtain a channel set, wherein the channel set comprises at least one channel;
and the processing module is used for carrying out asynchronous copying according to the channel set and the asynchronous copying request.
The present invention also provides an asynchronous copying apparatus, comprising:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the asynchronous copy method.
The invention has the beneficial effects that: according to the received asynchronous copy request, channel allocation is carried out by utilizing a channel allocation algorithm to obtain a channel set, and asynchronous copy is carried out according to the channel set and the asynchronous copy request; the invention carries out channel allocation through the channel allocation algorithm, can provide a proper channel according to the asynchronous copy request, carries out the task of asynchronous copy by utilizing the proper channel, can maximize the advantages of multiple channels and improve the performance of asynchronous copy.
Drawings
FIG. 1 is a schematic flow chart of the steps of the method of the present invention;
FIG. 2 is a framework hierarchy diagram of an embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments in the description. The step numbers in the embodiments of the present invention are set only for convenience of illustration and do not limit the order of the steps; the execution order of the steps in the embodiments can be adjusted according to the understanding of those skilled in the art.
As shown in fig. 1, the asynchronous copying method includes the following steps:
acquiring an asynchronous copy request;
according to the asynchronous copy request, channel allocation is carried out by using a channel allocation algorithm to obtain a channel set, wherein the channel set comprises at least one channel;
and carrying out asynchronous copying according to the channel set and the asynchronous copying request.
In this embodiment, the channels are physical DMA channels, which SPDK abstracts as a channel structure (struct spdk_ioat_chan); each channel structure represents a unique DMA channel.
The asynchronous copy interface provided by the existing SPDK has the following defects: 1) it cannot handle the case where the same channel is used by multiple threads concurrently; 2) a user program that sends an asynchronous copy request must be aware of the DMA channels; 3) when several user threads call the asynchronous copy interface to use the same DMA channel at the same time, the queue data of the DMA channel can be deleted and modified arbitrarily by different threads; 4) it is difficult for a user program to use multiple channels reasonably.
In this embodiment, a new interface ioat_copy is designed using the enqueue and dequeue algorithms of the Linux kernel kfifo lock-free queue (or of the DPDK lock-free queue). It provides user programs on a NUMA (Non-Uniform Memory Access) architecture with an asynchronous NUMA remote memory copy interface and supports lock-free, concurrent enqueuing and dequeuing by multiple threads on the same channel. In a NUMA architecture, the memory and the processors are logically divided into several NUMA nodes; each node contains a certain number of processors and a certain amount of memory, and the nodes are connected and communicate through internal interconnect (IC) links. The new interface is declared as follows:
struct spdk_ioat_chan *ioat_copy(void *cb_arg,
                                 void (*cb_fn)(void *),
                                 void *dst,
                                 const void *src,
                                 uint64_t nbytes);
where dst and src are virtual addresses in local or remote memory (for a given processor, local memory is memory on the same node as the processor, and remote memory is memory on a different node) and denote the destination address and the source address respectively. The parameter nbytes is the length of the copy in bytes: the source data segment at virtual addresses src to src + nbytes is copied to the destination data segment at virtual addresses dst to dst + nbytes. cb_fn is the callback function and cb_arg is the argument passed to it; void, const void and uint64_t are the parameter types. The callback function is executed once polling detects that the asynchronous copy task has completed.
In this embodiment, when the user program calls the new interface ioat_copy to send an asynchronous copy request, the asynchronous copy request is acquired and collected.
Further as a preferred embodiment, the step of performing channel allocation by using a channel allocation algorithm according to the asynchronous copy request to obtain a channel set includes the following steps:
acquiring permission;
and after the permission is acquired, performing channel allocation with the channel allocation algorithm according to the asynchronous copy request to obtain the channel set.
In this embodiment, asynchronous copy requests may come from several different threads. Before its request can be copied asynchronously, a thread must first acquire permission, that is, acquire the lock; only after acquiring the lock can it run the channel allocation algorithm, which ensures the reliability of the channel allocation algorithm. The lock is released after channel allocation is finished, and other threads then perform the same steps after acquiring the lock.
Further as a preferred embodiment, the step of performing channel allocation by using a channel allocation algorithm according to the asynchronous copy request to obtain a channel set includes the following steps:
determining the number of channels to be distributed according to the asynchronous copy request and a preset threshold value;
acquiring a heat coefficient of a channel;
and obtaining a channel set according to the number of the channels to be distributed and the heat coefficient.
Further as a preferred embodiment, the heat coefficient of the channel is updated according to the channel set.
Further as a preferred embodiment, the step of obtaining the channel set according to the number of channels to be allocated and the heat coefficient includes the following steps:
determining a candidate channel combination according to the heat coefficient and a preset proportion;
determining whether the number of the candidate channels in the candidate channel combination is greater than or equal to the number of the channels to be distributed;
if yes, selecting candidate channels with the same number as the channels to be distributed from the candidate channels as the channel set;
otherwise, adding a supplementary channel, and taking the supplementary channel and the candidate channel as the channel set.
Further, as a preferred embodiment, the step of adding a supplemental channel and taking the supplemental channel and the candidate channel as the channel set includes the following steps:
acquiring a source address and a destination address corresponding to an asynchronous copy request memory;
acquiring the physical distance between a channel and a node where the source address and/or the destination address are/is located;
selecting a supplementary channel from channels except the candidate channel according to the physical distance;
and forming the channel set by the supplementary channel and the candidate channel.
In this embodiment, the preset threshold is 1 GB; in other embodiments, a different value may be set according to actual needs. One channel is allocated for each 1 GB of copy size, and a remainder of less than 1 GB is counted as 1 GB. For example, an asynchronous copy request with a copy size of 5.5 GB is allocated 6 channels, that is, the number of channels to be allocated is 6. This limits the number of channels and prevents too many channels from being allocated.
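The rounding rule in this example is a ceiling division of the copy size by the threshold. A minimal sketch in C, assuming the threshold is passed in bytes (1 GB taken as 2^30 bytes); the name `channels_needed` is hypothetical and is not part of SPDK:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of channels to allocate for a copy of
 * nbytes, i.e. nbytes / threshold rounded up. */
uint64_t channels_needed(uint64_t nbytes, uint64_t threshold)
{
    assert(threshold > 0);
    uint64_t num = nbytes / threshold;
    if (nbytes % threshold > 0) {
        num++; /* a partial threshold unit still needs one channel */
    }
    return num;
}
```

With a threshold of `UINT64_C(1) << 30`, a 5.5 GB request (5905580032 bytes) yields 6, which matches the example in the text.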
In this embodiment, a heat coefficient is introduced to measure the contention on each channel. The heat coefficient is defined as the number of threads that are using the channel or enqueuing on the channel at a given moment; the higher the heat coefficient, the more intense the contention on the channel. The preset proportion is 80%; other embodiments may use other proportions, adjusted according to actual needs.
In practice, for a given asynchronous copy request in a multithreaded environment, channels are preemptible resources, so allocating too many channels increases the overhead of acquiring channel control rights and offsets the benefit brought by multiple channels. It is therefore necessary to allocate suitable channels for the asynchronous copy request while guaranteeing a certain number of channels. In this embodiment, the (DMA) channel allocation algorithm is as follows:
Input: dst, src, nbytes
Output: channel set C_select
1: Lock()  /* acquire the lock */
2: /* set the number of channels to be allocated */
3: num = nbytes / 2^30
4: if nbytes % 2^30 > 0 then
5:   num++
6: end if
7: /* find the value of the highest heat coefficient */
8: Maxtemperature = 0
9: for all Chan ∈ C_all do
10:   Maxtemperature = max(Maxtemperature, Chan->temperature)
11: end for
12: /* first select the channels with lower heat */
13: for all Chan ∈ C_all do
14:   if Chan->temperature < Maxtemperature * 80% then
15:     add Chan to C_select
16:     Chan->temperature++
17:   end if
18: end for
19: /* if the low-heat channels are not sufficient, select channels close to the source node and the destination node */
20: if Num(C_select) < num then
21:   randomly select n - Num(C_select) channels from C_node[Node(dst)] ∪ C_node[Node(src)], none of which is already in C_select
22: end if
23: Unlock()  /* release the lock */
24: return C_select
where Num() returns the number of channels in a channel set; C_select is the set of channels selected by the scheduling algorithm; C_all is the set of channels on all nodes; C_node[i] is the set of channels on node i; Node(adr) is the NUMA node where the virtual address adr is located; Chan is a pointer to a channel; n is the number of channels to be allocated (the value num computed in steps 3 to 6); and Node is the NUMA node where a DMA channel is located.
The method comprises the following specific steps:
1) first, traverse all channels to obtain the value of the highest heat coefficient;
2) take all channels whose heat coefficient is less than 80% of the highest heat coefficient, that is, the channels with lower heat, as candidate channels to form a candidate channel combination;
3) determine whether the number of candidate channels in the candidate channel combination is greater than or equal to the number of channels to be allocated. If so, for example when there are 6 or 7 candidate channels and 6 channels are to be allocated, use the candidate channel combination directly as the channel set or extract 6 channels from it as the channel set. If not, for example when there are only 5 candidate channels, use the 5 candidate channels plus 1 supplementary channel as the channel set.
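The candidate selection in steps 1) and 2) can be sketched in C as follows. This is only an illustration: `struct chan_info` is a hypothetical, simplified stand-in for the real channel structure, the 80% proportion is hard-coded, and the caller is assumed to already hold the allocation lock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified channel record (not the SPDK structure). */
struct chan_info {
    int numa_node;        /* NUMA node the DMA channel sits on */
    uint32_t temperature; /* heat coefficient of the channel */
    int selected;         /* 1 once the channel is in the chosen set */
};

/* Mark every channel whose heat coefficient is below 80% of the current
 * maximum as a candidate, increment its heat coefficient, and return the
 * number of candidates marked. */
size_t select_cool_channels(struct chan_info *chans, size_t count)
{
    assert(chans != NULL || count == 0);

    uint32_t max_temp = 0;
    for (size_t i = 0; i < count; i++) {
        if (chans[i].temperature > max_temp) {
            max_temp = chans[i].temperature;
        }
    }

    size_t picked = 0;
    for (size_t i = 0; i < count; i++) {
        /* temperature < max_temp * 80%, written in integer arithmetic */
        if ((uint64_t)chans[i].temperature * 100 < (uint64_t)max_temp * 80) {
            chans[i].selected = 1;
            chans[i].temperature++;
            picked++;
        }
    }
    return picked;
}
```

One consequence of the strict comparison is that when all channels have the same heat coefficient, no channel qualifies as a candidate, so under this rule the whole channel set would have to come from the supplementary step.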
In this embodiment, according to the characteristics of the NUMA architecture, the performance with which DMA hardware accesses memory depends on the distance between the DMA hardware and the physical memory: the longer the distance, the worse the performance. The physical distance between a channel and the node where the source address and/or the destination address is located is therefore used as the criterion for supplementing channels. Specifically: determine the physical distance between each channel and the node where the memory source address and/or destination address is located, select supplementary channels from the channels other than the candidate channels (preferring the channel with the shortest physical distance), and combine the supplementary channels and the candidate channels to form the channel set.
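A possible way to pick a supplementary channel by distance is sketched below in C. The node distance table `dist`, the bound `MAX_NODES`, and the type `struct chan_loc` are assumptions made for this illustration; how the distances are obtained on a real system is not specified in the text.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define MAX_NODES 4 /* assumed upper bound on NUMA nodes for this sketch */

/* Hypothetical, simplified channel record (not the SPDK structure). */
struct chan_loc {
    int numa_node; /* NUMA node the DMA channel sits on */
    int selected;  /* 1 once the channel is in the chosen set */
};

/* Pick one not-yet-selected channel with the smallest distance to the
 * source node or the destination node and mark it as selected.
 * dist[a][b] is an assumed distance between node a and node b.
 * Returns the channel index, or -1 if every channel is already taken. */
int pick_supplementary(struct chan_loc *chans, size_t count,
                       int dist[MAX_NODES][MAX_NODES],
                       int src_node, int dst_node)
{
    assert(src_node >= 0 && src_node < MAX_NODES);
    assert(dst_node >= 0 && dst_node < MAX_NODES);

    int best = -1;
    int best_dist = INT_MAX;
    for (size_t i = 0; i < count; i++) {
        if (chans[i].selected) {
            continue; /* candidate channels are skipped */
        }
        int node = chans[i].numa_node;
        assert(node >= 0 && node < MAX_NODES);
        int d_src = dist[node][src_node];
        int d_dst = dist[node][dst_node];
        int d = d_src < d_dst ? d_src : d_dst;
        if (d < best_dist) {
            best_dist = d;
            best = (int)i;
        }
    }
    if (best >= 0) {
        chans[best].selected = 1;
    }
    return best;
}
```

Calling the function repeatedly until enough channels are selected (or it returns -1) gives the "shortest distance first" preference described above; the random choice among channels on the source and destination nodes used in the pseudocode is a simpler variant of the same idea.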
After the channel set is determined, the heat coefficient is updated. The channel set finally obtained by the channel allocation algorithm is the C_select returned in step 24 above.
Further as a preferred embodiment, the step of performing asynchronous copy according to the channel set and the asynchronous copy request includes the following steps:
dividing the asynchronous copy request into tasks with the same number as the number of channels according to the number of the channels in the channel set;
and correspondingly allocating one task to each channel in the channel set to perform asynchronous copying.
Further as a preferred embodiment, the step of correspondingly allocating one task to each channel in the channel set and performing asynchronous copy includes the following steps:
refining the task into a refined task;
determining the number of required descriptors according to the refinement tasks, wherein one descriptor corresponds to one refinement task;
acquiring an enqueue control right;
calculating the offset of a source page and a destination page according to the result of obtaining the enqueue control right, wherein a source address memory corresponding to the asynchronous copy request has at least one source page, and a destination address memory corresponding to the asynchronous copy request has at least one destination page;
filling a descriptor queue according to the number of descriptors and the offset calculation result, wherein the descriptor queue has at least one descriptor;
and performing asynchronous copying according to the filling result of the descriptor queue.
In this embodiment, the task of the asynchronous copy request is divided into several tasks according to the number of channels in the channel set C_select and handed to the channels in C_select for completion. Each task is assigned to one channel, so that the multiple channels copy asynchronously and concurrently and the advantages of multiple channels are exploited. Specifically:
1) dividing the source data segment (the data segment with virtual addresses from src to src + nbytes) and the destination data segment (the data segment with virtual addresses from dst to dst + nbytes) into n data segments in ascending address order (n is the number of channels in the channel set C_select), so that each channel copies one data segment;
2) allocating, by calling the new interface, one copy task to each channel in C_select, where one copy task copies one data segment. This ensures that the source and destination addresses of each task correspond one to one with the original source and destination data segments, which guarantees data consistency.
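The division into n data segments can be sketched in C as follows. The names `struct segment` and `split_copy` are hypothetical. The text does not say how a copy size that is not a multiple of n is handled; this sketch assumes the last segment absorbs the remainder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One per-channel copy task, described relative to src and dst. */
struct segment {
    uint64_t offset; /* byte offset applied to both src and dst */
    uint64_t length; /* number of bytes this channel copies */
};

/* Split nbytes into n contiguous segments in ascending address order.
 * Assumption of this sketch: the last segment takes the remainder of
 * nbytes / n so that no byte is left uncopied. */
void split_copy(uint64_t nbytes, size_t n, struct segment *out)
{
    assert(n > 0 && out != NULL);
    uint64_t base = nbytes / n;
    for (size_t i = 0; i < n; i++) {
        out[i].offset = base * i;
        out[i].length = (i == n - 1) ? nbytes - base * i : base;
    }
}
```

Because both the source and destination use the same offset, segment i of the source always lands in segment i of the destination, which is the one-to-one correspondence the text relies on for data consistency.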
Then the copy tasks are refined and enqueued lock-free into the queues of the channels in C_select. After a task has been allocated to each channel, the channel's descriptor queue must be filled by the lock-free enqueue algorithm. In this embodiment, the descriptor queue ring (a FIFO lock-free circular descriptor queue) is a circular queue implemented with an array. The queue consists of 2^15 descriptor structures, and each descriptor represents a copy request whose copy granularity is at most one page. The ring array has several indices: pro_head (head of enqueuing), pro_tail (tail of enqueuing), cons_head (head of dequeuing) and cons_tail (tail of dequeuing); all indices are visible to all threads. The (DMA channel) lock-free enqueue algorithm is as follows:
Input: Chan, cb_arg, cb_fn, dst, src, length (length of the copy, in bytes)
Output:
25: calculate the number n of required descriptors according to whether dst and src are page-aligned
26: ret = 1
27: while ret ≠ 0 do  /* update Chan->pro_head first, then fill the descriptors */
28:   old_head = Chan->pro_head
29:   new_head = old_head + n
30:   ret = CAS(&Chan->pro_head, old_head, new_head)  /* returns 0 on success and 1 on failure */
31: end while
32: orig_head = old_head  /* array index of the descriptor to enqueue */
33: while length > 0 do
34:   op_size = min(PAGE_SIZE - PAGE_OFFSET(src),  /* page size minus source address page offset */
                    PAGE_SIZE - PAGE_OFFSET(dst),  /* page size minus destination address page offset */
                    PAGE_SIZE)
35:   length -= op_size
36:   fill descriptor(&Chan->ring[orig_head],  /* fill descriptor orig_head */
                      SPDK address translation(dst), SPDK address translation(src), op_size)
37:   orig_head++
38:   src += op_size
39:   dst += op_size
40: end while
41: Chan->ring[orig_head-1].cb_fn = cb_fn
42: Chan->ring[orig_head-1].cb_arg = cb_arg
43: /* update Chan->pro_tail */
44: while Chan->pro_tail ≠ old_head do
45:   execute pause instruction
46: end while
47: Chan->pro_tail = new_head
48: Chan->temperature--  /* update the channel heat coefficient */
The steps are as follows:
1) refine the task of each channel into refined tasks, where each descriptor describes one refined task whose copy granularity is at most the page size. To manage memory, the operating system divides it into pages of equal size and manages memory in units of pages; the page size is the size of one page, usually 4 KB;
2) calculate the number n of required descriptors from the refined tasks, according to whether dst and src are page-aligned;
3) acquire the enqueue control right: when different threads issue asynchronous requests, each thread uses a CAS instruction (line 30 of the code) to atomically compete for control of pro_head and, on success, advances pro_head (lines 27 to 31 of the code). For the thread that wins, its local variable old_head holds the previous value of pro_head and its local variable new_head holds old_head + n (n being the number of descriptors);
4) since a descriptor does not support copying across pages, calculate the offsets for the source page and the destination page (line 34 of the code). In addition, descriptors are consumed directly by the DMA hardware, which only recognizes physical addresses, so the address translation interface of SPDK (line 36 of the code) is used to translate the source and destination addresses from virtual addresses into physical addresses;
5) fill the descriptors whose descriptor queue ring index lies between old_head and new_head - 1 (lines 33 to 40 of the code);
6) add the callback function and its argument to the last descriptor (lines 41 to 42 of the code) so that the callback function can be called during subsequent polling;
7) advance pro_tail to inform the DMA controller that the descriptors with indices between old_head and new_head - 1 have been filled, after which the DMA controller performs the asynchronous copy based on these descriptors (lines 44 to 46 of the code);
8) update pro_tail to new_head to complete the enqueue;
9) update the channel heat coefficient.
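The page-boundary rule behind steps 2) and 4) can be illustrated with a short C sketch. It assumes 4 KB pages and works on plain integer addresses; the names `next_op_size` and `count_descriptors` are hypothetical. Unlike the pseudocode, the sketch also clamps each sub-copy to the remaining length, which is an addition made here so that the loop ends exactly at zero.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES UINT64_C(4096) /* assumed 4 KB pages */

static uint64_t min_u64(uint64_t a, uint64_t b)
{
    return a < b ? a : b;
}

/* Size of the next sub-copy: it must not cross a page boundary on the
 * source side or on the destination side, and (an addition of this
 * sketch) it must not exceed the remaining length. */
uint64_t next_op_size(uint64_t src, uint64_t dst, uint64_t remaining)
{
    uint64_t to_src_end = PAGE_SIZE_BYTES - (src % PAGE_SIZE_BYTES);
    uint64_t to_dst_end = PAGE_SIZE_BYTES - (dst % PAGE_SIZE_BYTES);
    return min_u64(min_u64(to_src_end, to_dst_end), remaining);
}

/* Number of descriptors needed to copy length bytes from src to dst,
 * with one descriptor per sub-copy. */
uint64_t count_descriptors(uint64_t src, uint64_t dst, uint64_t length)
{
    uint64_t n = 0;
    while (length > 0) {
        uint64_t op = next_op_size(src, dst, length);
        assert(op > 0);
        src += op;
        dst += op;
        length -= op;
        n++;
    }
    return n;
}
```

When src and dst are both page-aligned, an 8 KB copy needs 2 descriptors; when their page offsets differ, the same copy needs more, which is why the descriptor count depends on alignment.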
A thread must update pro_tail only after it has finished filling its descriptors. The update of pro_tail is visible to all threads: when a thread finds that its local variable old_head equals pro_tail, it leaves the while loop (lines 44 to 46 of the code) and updates pro_tail to its new_head. In this way pro_tail is advanced in turn by the threads using the channel.
The above overall process steps can be summarized as follows:
Input: cb_arg, cb_fn, dst, src, nbytes
Output: channel set C_select
49: C_select ← DMA channel allocation algorithm(dst, src, nbytes)
50: length = nbytes / Num(C_select)
51: for all Chan ∈ C_select do
52:   DMA channel lock-free enqueue algorithm(Chan, cb_arg, cb_fn, dst, src, length)
53:   dst += length
54:   src += length
55: end for
56: return C_select
Further as a preferred embodiment, the method further includes a polling request flow. After the copy tasks have been refined and enqueued lock-free into the channel queues, the DMA controller executes the corresponding copy operations. Each time a DMA channel completes a descriptor in its queue, the DMA controller updates the cons_head variable of the corresponding channel structure. The value of cons_head is never greater than pro_tail; descriptors with indices between cons_head and pro_tail have not yet been completed, while descriptors with indices between cons_tail and cons_head have been copied but not yet polled. Once all channels have been polled (no pollable descriptor remains), the task of the asynchronous copy request is complete. The purpose of polling is to execute the callback functions of completed descriptors.
The user program can poll the channel set returned by the new interface ioat_copy at any time to check whether the requests it submitted have completed. The invention provides a polling algorithm based on the dequeue algorithm of the Linux kernel kfifo lock-free queue (or the DPDK dequeue algorithm), which supports several threads polling one channel simultaneously. The framework provides an interface ioat_poll for user programs to poll with. The polling algorithm is as follows:
Input: channel set C_select
Output:
57: for all Chan ∈ C_select do
58:   if Chan->cons_tail = Chan->cons_head  /* no pollable descriptor */
59:   then continue
60:   end if
61:   ret = 1
62:   while ret ≠ 0 do  /* update Chan->cons_tail */
63:     old_tail = Chan->cons_tail
64:     new_tail = Chan->cons_head
65:     ret = CAS(&Chan->cons_tail, old_tail, new_tail)  /* returns 0 on success and 1 on failure */
66:   end while
67:   orig_tail = old_tail  /* array index of the descriptor to dequeue */
68:   while orig_tail < new_tail do
69:     if Chan->ring[orig_tail].cb_fn ≠ NULL  /* the callback function of the descriptor is not NULL */
70:     then Chan->ring[orig_tail].cb_fn(Chan->ring[orig_tail].cb_arg)  /* dequeue descriptor orig_tail */
71:     end if
72:     orig_tail++
73:   end while
74:   while Chan->pro_tail ≠ old_head do
75:     execute pause instruction
76:   end while
77:   Chan->cons_tail = new_tail
78: end for
Wherein old _ tail and new _ tail are variables of the thread.
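The core of the polling algorithm, claiming a window of completed descriptors with a compare-and-swap and then running their callbacks, can be illustrated with a minimal Python sketch. It is not the patented implementation. The classes Descriptor and Channel, the lock-backed cas_cons_tail helper (a stand-in for a hardware CAS), the modulo wrap on the ring index and the returned count are all assumptions made for illustration. The sketch also re-checks for an empty window on every retry and leaves out the final wait loop of the listing.

```python
import threading
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class Descriptor:
    cb_fn: Optional[Callable[[Any], None]] = None  # completion callback, may be absent
    cb_arg: Any = None


@dataclass
class Channel:
    ring: List[Descriptor]
    cons_head: int = 0  # advanced by the DMA controller as descriptors complete
    cons_tail: int = 0  # advanced by pollers; everything below it has been polled
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def cas_cons_tail(self, expected: int, new: int) -> bool:
        """Hypothetical stand-in for an atomic compare-and-swap on cons_tail."""
        with self._lock:
            if self.cons_tail != expected:
                return False
            self.cons_tail = new
            return True


def ioat_poll(channels: List[Channel]) -> int:
    """Run callbacks of completed but unpolled descriptors; return how many were dequeued."""
    polled = 0
    for chan in channels:
        while True:
            old_tail = chan.cons_tail
            new_tail = chan.cons_head
            if old_tail == new_tail:  # no pollable descriptor on this channel
                break
            if chan.cas_cons_tail(old_tail, new_tail):
                # The CAS succeeded, so this thread alone owns [old_tail, new_tail).
                for i in range(old_tail, new_tail):
                    desc = chan.ring[i % len(chan.ring)]  # assumed wrap-around indexing
                    if desc.cb_fn is not None:
                        desc.cb_fn(desc.cb_arg)
                    polled += 1
                break
            # CAS failed: another thread claimed the window first, so retry.
    return polled
```

Because only the thread whose CAS succeeds processes a given window, two threads polling the same channel never run the same callback twice, which is the property the lock-free dequeue is meant to provide.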
In summary, the steps of the asynchronous copy method are as follows:
1) Asynchronous copy request flow:
First, an asynchronous copy request is acquired;
Second, channel allocation is performed with the channel allocation algorithm according to the asynchronous copy request, yielding a channel set C_select;
Third, the copy work of the asynchronous copy request is divided and distributed: according to the channel set C_select, the asynchronous copy request is divided into several copy tasks, which are handed to the channels in C_select, one copy task per channel;
Fourth, each copy task is refined into refined tasks, and the descriptor ring of the channel is filled with them, following the lock-free enqueue algorithm.
2) Polling request flow: after the asynchronous copy request flow has run, the DMA controller performs the copy operations, and the user program polls with the polling algorithm (a dequeue algorithm based on the Linux kernel kfifo lock-free queue) to check whether the copy operations of the asynchronous copy request have completed.
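The dividing and refining steps of the request flow can be sketched as follows. This is only an illustration under stated assumptions: the even split across channels, the 4096-byte PAGE_SIZE, and the function names split_request and refine_task are hypothetical, and refinement is assumed to mean cutting a task so that no piece crosses a page boundary in either the source or the destination.

```python
from typing import List, Tuple

PAGE_SIZE = 4096  # assumed page size; a hugepage setup would use a larger value

Task = Tuple[int, int, int]  # (source address, destination address, length in bytes)


def split_request(src: int, dst: int, length: int, n_channels: int) -> List[Task]:
    """Divide one copy request into one contiguous task per channel."""
    base, extra = divmod(length, n_channels)
    tasks, offset = [], 0
    for i in range(n_channels):
        size = base + (1 if i < extra else 0)  # spread the remainder bytes
        tasks.append((src + offset, dst + offset, size))
        offset += size
    return tasks


def refine_task(task: Task) -> List[Task]:
    """Cut a task into pieces that cross no page boundary on either side."""
    src, dst, remaining = task
    pieces = []
    while remaining > 0:
        to_src_edge = PAGE_SIZE - src % PAGE_SIZE  # bytes left in the source page
        to_dst_edge = PAGE_SIZE - dst % PAGE_SIZE  # bytes left in the destination page
        size = min(remaining, to_src_edge, to_dst_edge)
        pieces.append((src, dst, size))
        src, dst, remaining = src + size, dst + size, remaining - size
    return pieces
```

Each piece returned by refine_task would correspond to one descriptor in the channel's ring, so the length of the returned list gives the number of descriptors the enqueue step has to reserve.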
The present invention also provides an asynchronous copying apparatus, comprising:
an acquisition module for acquiring an asynchronous copy request;
an allocation module for performing channel allocation with a channel allocation algorithm according to the asynchronous copy request to obtain a channel set, wherein the channel set comprises at least one channel;
and a processing module for performing asynchronous copying according to the channel set and the asynchronous copy request.
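As an illustration of what the allocation module does, the sketch below picks a channel set from a channel count, heat coefficients and a distance-based supplement. The concrete policies are assumptions chosen for the example, not the patented algorithm: one channel is requested per threshold bytes, a channel counts as a candidate when its heat is at most hot_limit, the difference between NUMA node ids stands in for physical distance, and heat is raised by one on selection. The names Chan, allocate_channels, threshold and hot_limit are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Chan:
    chan_id: int
    node: int      # NUMA node the channel is attached to
    heat: int = 0  # hypothetical heat coefficient: higher means busier


def allocate_channels(channels: List[Chan], size: int, src_node: int,
                      threshold: int, hot_limit: int) -> List[Chan]:
    """Pick a channel set for one request (every policy here is an assumption)."""
    # 1. Number of channels wanted: one per `threshold` bytes, capped by supply.
    wanted = max(1, min(len(channels), math.ceil(size / threshold)))

    # 2. Candidates are channels with heat at or below hot_limit, coolest first.
    candidates = sorted((c for c in channels if c.heat <= hot_limit),
                        key=lambda c: (c.heat, c.chan_id))
    selected = candidates[:wanted]

    # 3. Too few candidates: supplement with the remaining channels nearest to
    #    the source node (node-id difference stands in for physical distance).
    if len(selected) < wanted:
        rest = sorted((c for c in channels if c.heat > hot_limit),
                      key=lambda c: (abs(c.node - src_node), c.heat, c.chan_id))
        selected += rest[:wanted - len(selected)]

    # 4. Update the heat coefficient of every selected channel.
    for c in selected:
        c.heat += 1
    return selected
```

With four channels of heat 5, 0, 1 and 9 and a hot_limit of 2, a request that wants three channels takes the two cool candidates and then supplements them with the hot channel closest to the source node.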
As shown in Fig. 2, in this embodiment an asynchronous copy framework is provided at the user layer and contains the acquisition module and the allocation module. The framework exposes the new interface ioat_copy, which gives the SPDK a new asynchronous copy interface, and implements the channel allocation algorithm behind that interface.
During operation, a user program calls the new interface of the asynchronous copy framework, which serves as a new asynchronous copy interface to the SPDK for NUMA architectures; the SPDK then carries out the asynchronous copy using I/OAT technology (the DMA controller).
As a further preferred embodiment, the apparatus also includes a uio module in the kernel layer, which exposes the physical addresses of the I/O device (the device attached to the DMA hardware) to the user program so that the user program can access the I/O device from user space, realizing a user-layer driver.
As a further preferred embodiment, the hardware layer is DMA hardware comprising a DMA controller (I/OAT chip) and DMA channels, and the DMA controller contains the processing module.
An embodiment of the invention also provides an asynchronous copying apparatus, comprising:
at least one processor;
at least one memory for storing at least one program;
wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the asynchronous copy method.
The contents in the above method embodiments are all applicable to the present system embodiment, the functions specifically implemented by the present system embodiment are the same as those in the above method embodiment, and the beneficial effects achieved by the present system embodiment are also the same as those achieved by the above method embodiment.
In summary, compared with the prior art, the invention has the following advantages:
1) Channels are allocated by a channel allocation algorithm, which provides suitable channels for each asynchronous copy request. Performing the asynchronous copy over those channels exploits the performance advantage of multi-channel concurrent copying and speeds up asynchronous copies of NUMA remote memory.
2) A new asynchronous copy framework is provided so that the user program does not need to be aware of DMA channels. The user program can perform asynchronous copies more simply and at lower cost, the barrier to using I/OAT is lowered, and a reliable asynchronous NUMA remote-memory copy service is provided.
In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.
Furthermore, while the invention is described in the context of functional modules and illustrated in the form of block diagrams, it should be understood that, unless otherwise stated to the contrary, one or more of the described functions and/or features may be integrated into a single physical device and/or software module or one or more functions and/or features may be implemented in a separate physical device or software module. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer, given the nature, function, and internal relationship of the modules. Accordingly, those skilled in the art can, using ordinary skill, practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is defined by the appended claims and their full scope of equivalents.
In the description herein, references to the description of the term "one embodiment," "the present embodiment," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (8)

1. An asynchronous copying method, comprising the steps of:
acquiring an asynchronous copy request;
according to the asynchronous copy request, channel allocation is carried out by using a channel allocation algorithm to obtain a channel set, wherein the channel set comprises at least one channel;
according to the channel set and the asynchronous copy request, carrying out asynchronous copy;
the step of obtaining the channel set by using the channel allocation algorithm to allocate the channels according to the asynchronous copy request comprises the following steps:
determining the number of channels to be distributed according to the asynchronous copy request and a preset threshold value;
acquiring a heat coefficient of a channel;
determining a candidate channel combination according to the heat coefficient;
determining whether the number of the candidate channels in the candidate channel combination is greater than or equal to the number of the channels to be distributed;
if yes, selecting candidate channels with the same number as the channels to be distributed from the candidate channels as the channel set;
otherwise, adding a supplementary channel, and taking the supplementary channel and the candidate channel as the channel set.
2. The asynchronous copying method of claim 1, wherein: the step of obtaining the channel set by using the channel allocation algorithm to allocate the channels according to the asynchronous copy request comprises the following steps:
acquiring a permission;
and after the permission has been acquired, performing channel allocation with the channel allocation algorithm according to the asynchronous copy request to obtain the channel set.
3. The asynchronous copying method of claim 1, wherein: further comprising the steps of:
and updating the heat coefficient of the channel according to the channel set.
4. The asynchronous copying method of claim 1, wherein: the step of adding the supplementary channel and taking the supplementary channel and the candidate channel as the channel set comprises the following steps:
acquiring a source address and a destination address corresponding to an asynchronous copy request memory;
acquiring the physical distance between a channel and a node where the source address and/or the destination address are/is located;
selecting a supplementary channel from channels except the candidate channel according to the physical distance;
and forming the channel set by the supplementary channel and the candidate channel.
5. The asynchronous copying method of claim 1, wherein: the step of performing asynchronous copying according to the channel set and the asynchronous copying request comprises the following steps:
dividing the asynchronous copy request into tasks with the same number as the number of channels according to the number of the channels in the channel set;
and correspondingly allocating one task to each channel in the channel set to perform asynchronous copying.
6. The asynchronous copying method of claim 5, wherein: the step of correspondingly allocating one task to each channel in the channel set and performing asynchronous copying comprises the following steps:
refining the task into a refined task;
determining the number of required descriptors according to the refinement tasks, wherein one descriptor corresponds to one refinement task;
acquiring an enqueue control right;
calculating the offset of a source page and a destination page according to the result of obtaining the enqueue control right, wherein a source address memory corresponding to the asynchronous copy request has at least one source page, and a destination address memory corresponding to the asynchronous copy request has at least one destination page;
filling a descriptor queue according to the number of descriptors and the offset calculation result, wherein the descriptor queue holds at least one descriptor;
and carrying out asynchronous copying according to the result of filling the descriptor queue.
7. An asynchronous copying apparatus, comprising:
an acquisition module for acquiring an asynchronous copy request;
an allocation module for performing channel allocation with a channel allocation algorithm according to the asynchronous copy request to obtain a channel set, wherein the channel set comprises at least one channel;
a processing module for performing asynchronous copying according to the channel set and the asynchronous copy request;
the step of obtaining the channel set by using the channel allocation algorithm to allocate the channels according to the asynchronous copy request comprises the following steps:
determining the number of channels to be distributed according to the asynchronous copy request and a preset threshold value;
acquiring a heat coefficient of a channel;
determining a candidate channel combination according to the heat coefficient;
determining whether the number of the candidate channels in the candidate channel combination is greater than or equal to the number of the channels to be distributed;
if yes, selecting candidate channels with the same number as the channels to be distributed from the candidate channels as the channel set;
otherwise, adding a supplementary channel, and taking the supplementary channel and the candidate channel as the channel set.
8. An asynchronous copying apparatus, comprising:
at least one processor;
at least one memory for storing at least one program;
wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the asynchronous copy method of any one of claims 1-6.
CN201910950581.8A 2019-10-08 2019-10-08 Asynchronous copying method and device Active CN110865953B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910950581.8A CN110865953B (en) 2019-10-08 2019-10-08 Asynchronous copying method and device

Publications (2)

Publication Number Publication Date
CN110865953A (en) 2020-03-06
CN110865953B (en) 2021-01-26

Family

ID=69652777

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910950581.8A Active CN110865953B (en) 2019-10-08 2019-10-08 Asynchronous copying method and device

Country Status (1)

Country Link
CN (1) CN110865953B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant