CN112817887B - Far memory access optimization method and system under separated combined architecture - Google Patents
- Publication number
- CN112817887B (application CN202110209483.6A)
- Authority
- CN
- China
- Prior art keywords
- data
- rdma
- read
- data block
- write
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/14—Handling requests for interconnection or transfer
- G06F13/20—Handling requests for interconnection or transfer for access to input/output bus
- G06F13/28—Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/542—Event management; Broadcasting; Multicasting; Notifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/544—Buffers; Shared memory; Pipes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/546—Message passing systems or structures, e.g. queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/54—Indexing scheme relating to G06F9/54
- G06F2209/544—Remote
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/54—Indexing scheme relating to G06F9/54
- G06F2209/548—Queue
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Bus Control (AREA)
Abstract
A far memory access optimization method and system under a separable and combinable architecture are disclosed. First, the writable working set is deployed on the local computing node and the read-only working set on the remote memory node, according to the application's memory read/write frequency. During data transfer, a suitable default data block size is selected according to the hardware resource characteristics, and transparent scattering and integration of data blocks is achieved by setting indexes for the data blocks combined with dynamic blocking during RDMA transfers. A bidirectional one-sided operation mechanism matched with local application reads and writes is realized using one-sided reads/writes and the queue-based RDMA mechanism. Finally, buffers are set up with an asynchronous read-write mechanism based on event notification, so that local computation and RDMA data reads/writes proceed asynchronously in parallel. The method fully exploits the performance potential of application-layer computing tasks that access remote memory via RDMA.
Description
Technical Field
The invention relates to a technology in the field of distributed data processing, in particular to a remote memory access optimization method and system under a separable and combinable architecture.
Background
Under existing separable and combinable memory architectures where memory resources are scarce, high-speed networks such as RDMA are used to read and write remote memory. Existing remote memory access solutions replace the back end of the Linux page-swap mechanism with RDMA to perform remote memory access transparently; they cannot avoid the extra overhead introduced by going through the kernel, and they do not consider the parallelism potential offered by the memory-access characteristics of the upper-layer application program.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a remote memory access optimization method and system under a separable and combinable architecture, which fully exploit the performance potential of application-layer computing tasks that access remote memory via RDMA.
The invention is realized by the following technical scheme:
The invention relates to an RDMA remote memory access method with application-parallel collaboration under a separable and combinable architecture. First, the writable working set is deployed on the local computing node and the read-only working set on the remote memory node, according to the application's memory read/write frequency. During data transfer, a default data block size is selected according to the hardware resource characteristics, and transparent scattering and integration of data blocks is achieved by setting indexes for the data blocks combined with dynamic blocking during RDMA transfers. A bidirectional one-sided operation mechanism matched with local application reads and writes is realized using one-sided reads/writes and the queue-based RDMA mechanism. Finally, buffers are set up with an asynchronous read-write mechanism based on event notification to realize asynchronous parallel processing of local computation and RDMA data reads/writes.
The separable and combinable architecture is: an architecture in which a data center flexibly combines and matches the CPUs and memories of multiple servers through network connections, where: servers whose function is computing tasks serve as compute nodes, and servers whose function is memory access serve as memory nodes.
The far memory architecture is: a distributed architecture comprising at least one compute node and at least one memory node, where: each compute node and each memory node comprises a server, and the compute nodes and memory nodes are connected by wire through their respective RDMA network cards.
Each server uses a CPU as the computing core and DRAM as the memory unit; the RDMA network card is connected to the server's motherboard via PCIe. Each server's CPU uses its local memory directly, and uses remote memory through the RDMA network card without occupying the remote CPU's resources.
The working set deployment specifically includes:
i) dividing a read-only working set according to the memory read-write frequency of the application;
ii) in the preprocessing process, the data blocks in the read-only working set divided in step i) are transferred, block by block, to the remote memory area via RDMA Write;
iii) during computation, the local application program continuously initiates requests to read remote data blocks, and the remote end, according to the received requests of the server program, returns the requested data blocks to the local machine via RDMA Read for use by the current program.
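As an illustration only (not the patent's implementation), the working-set split in step i) can be sketched as a partition by observed write frequency; the `pages` dict and `write_threshold` parameter are hypothetical names introduced for this sketch:

```python
def split_working_set(pages, write_threshold=0):
    """Partition application pages by observed write count:
    pages with writes above the threshold stay local (writable working set),
    the rest are pushed to the remote memory node (read-only working set)."""
    writable = {pid: p for pid, p in pages.items() if p["writes"] > write_threshold}
    read_only = {pid: p for pid, p in pages.items() if p["writes"] <= write_threshold}
    return writable, read_only
```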
The default data block size is Chunk = α × Channel × Frame ÷ Core, where: Channel is the number of PCIe channels the motherboard uses for one data transfer, Core is the number of CPUs on the motherboard, Frame is the number of data frames of the RDMA network card, and 1 ≤ α ≤ 1024.
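A minimal sketch of the formula above, assuming integer hardware parameters; the example values in the test are illustrative, not taken from the patent:

```python
def default_chunk_size(channel: int, frame: int, core: int, alpha: int = 1) -> int:
    """Chunk = alpha * Channel * Frame / Core, with 1 <= alpha <= 1024.
    channel: PCIe channels used per transfer; frame: RDMA NIC data frames;
    core: number of CPUs on the motherboard."""
    if not 1 <= alpha <= 1024:
        raise ValueError("alpha must be in [1, 1024]")
    return alpha * channel * frame // core
```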
Setting an index for a data block means: the index is the address of the memory region corresponding to the data block, together with its local key (lkey) and remote key (rkey).
The dynamic blocking of the RDMA transmission process refers to: when the data block Data_block that currently needs to be sent is larger than the currently set default size Chunk, it is split at send time into β = ⌈Data_block ÷ Chunk⌉ blocks that are sent separately; otherwise it is sent as a single Chunk-sized block, thereby achieving transparent scattering. On reception, the β data blocks are reassembled in their original order according to their indexes, thereby achieving integration.
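The dynamic blocking can be sketched as follows. This is a toy byte-level model (the patent transfers RDMA-registered memory regions, not Python bytes) with β = ⌈len(data) ÷ Chunk⌉; the function names are introduced here for illustration:

```python
import math

def scatter(data: bytes, chunk: int):
    """Split data into beta = ceil(len/chunk) indexed pieces for sending
    (transparent scattering)."""
    beta = max(1, math.ceil(len(data) / chunk))
    return [(i, data[i * chunk:(i + 1) * chunk]) for i in range(beta)]

def gather(pieces):
    """Reassemble received pieces in their original order using the index
    (transparent integration)."""
    return b"".join(part for _, part in sorted(pieces))
```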
The bidirectional one-sided operation mechanism matched with local application reads and writes is as follows: the server program sets up a buffer for receiving information and sends the index of the data to be read to the remote end; the remote end locates the corresponding data block according to the index and, through one-sided read/write operations based on the queue-based RDMA mechanism, writes the data directly into the server's buffer without any data copying.
In one-sided operation, each data block read into or written to the receive buffer is treated as a newly read block, so a later data block overwrites the previous block's contents in the buffer.
The queue-based RDMA mechanism refers to:
step 1: sending (receiving) a request for adding an event A in a queue;
step 2: executing the event A, and starting reading and writing data;
and step 3: a, popping up a sending (receiving) queue by an event; and adding the data into a completion queue;
and 4, step 4: the next send (receive) event B enters the send (receive) queue;
and 5: completing the popping of the event A in the queue, and scanning the state of the event A;
step 6: when the event state of the A is successful, starting to execute the B; and when the state is unsuccessful, an error is reported.
And 7: steps 2-6 are repeated until there is no new time to join the transmit (receive) queue.
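Steps 1-7 above can be modeled with two queues. This is a schematic simulation of the send/completion-queue discipline only, not real verbs-API code; the class and method names are assumptions of this sketch:

```python
from collections import deque

class QueuePairModel:
    """Toy model of the queue-based RDMA mechanism: posted events execute
    in order, move to the completion queue, and are polled for status."""
    def __init__(self):
        self.send_queue = deque()        # posted work requests (step 1)
        self.completion_queue = deque()  # finished work requests (step 3)

    def post(self, event):
        self.send_queue.append(event)    # step 1: request joins the queue

    def poll(self):
        event = self.send_queue.popleft()           # steps 2-3: execute, pop
        self.completion_queue.append((event, "success"))
        done = self.completion_queue.popleft()      # step 5: pop and scan status
        if done[1] != "success":                    # step 6: error on failure
            raise RuntimeError(f"event {done[0]} failed")
        return done
```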
The asynchronous read-write mechanism based on event notification is as follows: the moment the remote memory read/write starts, it is treated as successful and the event is moved from the send/receive queue to the completion queue; the actual duration of the read/write depends on the size of the data block and the current network bandwidth.
The buffers are located in both the local and remote memory regions and specifically comprise: an asynchronous parallel buffer, a send region, and a receive region, where: the send region and receive region perform the corresponding data send (write) or receive (read) operations, and the asynchronous parallel buffer temporarily stores transmitted data to support asynchronous reads; data written into the asynchronous parallel buffer is not overwritten until the next write.
The asynchronous parallel processing refers to: during local computation, the RDMA data transfer is prepared at the same time, i.e., each iteration prepares, one step ahead, the reception of the data to be used in the next iteration. The specific steps include:
i) computing an iteration start;
ii) copying data of the RDMA receiving buffer to a computing area, and reading information of the read-only working set;
iii) preparing the index of the data block to be accessed in the next round, and sending the index to the far end;
iv) opening the RDMA receive buffer to receive data in preparation for the next round of data;
v) executing the calculation part of the iteration;
vi) returning to the step i) until the algorithm is converged or no new remote data block needing to be accessed is available.
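The iteration above can be sketched as a one-step prefetch pipeline. Here `fetch` stands in for the RDMA read of a remote block and `compute` for the local computation of one iteration; in the real system the two overlap in time, which this sequential sketch only mimics structurally (the names are illustrative, not from the patent):

```python
def pipelined_iterations(block_ids, fetch, compute):
    """Each round consumes the block fetched in the previous round (step ii)
    and issues the fetch for the next round (steps iii-iv) before computing
    (step v), so transfer and computation can overlap."""
    results = []
    received = fetch(block_ids[0]) if block_ids else None   # initial fill
    for i, _ in enumerate(block_ids):
        current = received                                   # copy from recv buffer
        if i + 1 < len(block_ids):
            received = fetch(block_ids[i + 1])               # prefetch next block
        results.append(compute(current))                     # compute this round
    return results
```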
Technical effects
The invention as a whole resolves the performance bottleneck of the prior art, in which replacing the back end of the Linux page-swap mechanism with RDMA can neither avoid the extra overhead introduced by going through the kernel nor allow the kernel to exploit the memory-access characteristics of the upper-layer application program.
Compared with the prior art, the invention implements an application-layer remote memory read/write framework and performs fine-grained optimization for the application's read/write characteristics and the characteristics of the RDMA hardware: deploying the read-only working set on the remote memory node; selecting a suitable default data block size according to the hardware resource characteristics; setting indexes for the data blocks to achieve transparent scattering and integration; setting up a bidirectional one-sided operation mechanism matched with local application reads and writes; and designing asynchronous read/write buffers so that local computation and RDMA data reads/writes run asynchronously in parallel. This reduces bandwidth occupation, improves transfer efficiency, improves the overall performance of applications under the far memory architecture, and reduces overall latency, even approaching the performance of local memory processing.
Drawings
FIG. 1 is a schematic diagram of the system of the present invention;
FIG. 2 is a flow chart of the present invention;
FIG. 3 is a schematic diagram of an embodiment framework;
FIG. 4 is a schematic diagram of an embodiment data block partitioning and integration;
FIG. 5 is a block diagram illustrating an embodiment RDMA single-sided read and compute communication parallelism.
Detailed Description
In this embodiment, taking a graph-computing application as an example and using RDMA as the remote-memory medium, the system environment is as follows: two servers, each with two 20-core Intel(R) Xeon(R) Gold 6148 CPUs, 256GB of memory, a 21TB hard disk, and a dual-port Mellanox ConnectX-5 RDMA network card. One server serves as the compute node, and the other as the remote memory access node (remote node).
As shown in fig. 1, the far memory access optimization system under the separable and combinable architecture of this embodiment comprises: at least one local compute node and at least one remote memory node, connected by wire through their respective RDMA network cards for data exchange, where: each node contains a memory region consisting of a local area and a remote area, and the CPU of the local compute node exchanges data with the memory region through its cache.
As shown in fig. 3, the remote memory access optimization system includes: a compute node and a memory node, where: the compute node divides the application content into a read-only part and a non-read-only part according to the local application's read/write characteristics, blocks the data according to the local network card and memory hardware parameters, and interacts with the memory node through remote memory read/write operations. The memory node, according to the compute node's read/write requests, writes the data transmitted by the compute node into the remote memory via one-sided memory writes, and returns the data required by the compute node via index-based one-sided memory reads.
The computing node comprises: an application read-write separation module, a data block selection module, a data block scatter-gather module, a first bidirectional one-sided read-write module and a first asynchronous parallel module, wherein: the application read-write separation module divides the application content into a read-only part and a non-read-only part according to the local application's read/write characteristics, obtaining two classes of data blocks that it passes to the data block selection module; the data block selection module selects a data block size according to the local network card and memory hardware parameters and outputs it to the data block scatter-gather module; the data block scatter-gather module, using the size chosen by the data block selection module, passes data blocks to the first asynchronous parallel module in far memory write mode and reads data blocks from the asynchronous buffer of the first asynchronous parallel module in far memory read mode; the first asynchronous parallel module isolates sending from receiving through its asynchronous buffer: in far memory write mode it asynchronously passes data blocks to the first bidirectional one-sided read-write module, while in far memory read mode it reads the returned data blocks from the first bidirectional one-sided read-write module in real time and stores them in the asynchronous buffer for later use; the first bidirectional one-sided read-write module performs the far memory reads and writes of the data: in far memory write mode it writes the data from the first asynchronous parallel module into the remote memory via one-sided memory writes and keeps the corresponding data index locally; in far memory read mode it sends the reserved index of the required data to the remote end, where the corresponding data block is located, read back via a one-sided memory read, and output to the first asynchronous parallel module.
The memory node cooperates with the computing node and comprises: a second bidirectional one-sided read-write module and a second asynchronous parallel module, wherein: the second bidirectional one-sided read-write module handles the data writes and reads from the computing node: in far memory write mode it receives one-sided memory-write data blocks from the computing node, writes the data into the second asynchronous parallel module, and returns the corresponding index to the computing node; in far memory read mode it locates the corresponding data block according to the index from the computing node, obtains it from the second asynchronous parallel module, and returns it to the computing node via a far memory read. The second asynchronous parallel module, through its asynchronous buffer and send/receive isolation, reads the arriving data blocks from the second bidirectional one-sided read-write module in real time in far memory write mode and stores them in the asynchronous buffer, while in far memory read mode it asynchronously passes data blocks to the second bidirectional one-sided read-write module.
As shown in fig. 2, the present embodiment relates to a method for optimizing remote memory access under the split and combinable architecture of the above system, which includes the following steps:
step 1) dividing the edge data of the graph calculation into a read-only working set.
Step 2) deploying the read-only working set of the graph computation to the remote memory via RDMA Write during the preprocessing stage.
Step 3) selecting a default size for the transfer data block according to the hardware characteristics: the default block size is computed as Chunk = α × Channel × Frame ÷ Core from the number of PCIe channels (Channel) the motherboard uses for one data transfer, the number of CPUs (Core) on the motherboard, and the number of data frames (Frame) of the RDMA network card, with 1 ≤ α ≤ 1024.
Step 4) as shown in fig. 4, splitting and reassembling the data according to the block size determined in step 3): when the data block Data_block that currently needs to be sent is larger than the currently set default size Chunk, it is split into β = ⌈Data_block ÷ Chunk⌉ blocks that are sent separately; when Data_block is smaller than or equal to Chunk, it is sent as a single Chunk-sized block. On reception, if the data was split into β blocks, the blocks are reassembled in their original order.
Step 5) RDMA port bindings and connection setup are first opened locally and remotely as shown in fig. 5.
Step 6) setting up three buffers, namely an asynchronous parallel buffer, a send region and a receive region, in both the local and remote memory regions, where the local send region L-SB and the remote receive region R-RB together perform data send (write) operations, and the local receive region L-RB and the remote send region R-SB together perform data receive (read) operations.
Step 7) in the preprocessing stage, the system writes the data blocks of the read-only working set into the remote memory via RDMA one-sided Write, again following the block-transfer scheme of step 4).
Step 8) during the iterative computation, overlapping computation and communication in time to achieve parallelism; the specific steps include:
i) Computing an iteration start;
ii) copying data of the RDMA receiving buffer to a computing area, and reading information of the read-only working set;
iii) preparing the index of the data block to be accessed in the next round, and sending the index to the far end;
iv) opening the RDMA receive buffer to receive data in preparation for the next round of data;
v) executing the calculation part of the iteration;
vi) returning to step i) until the algorithm converges or no new remote data block needs to be accessed.
Step 9) after the computation finishes, reclaiming the memory regions occupied by the local and remote ends, and disconnecting the RDMA connection.
Through practical experiments, by rewriting the BFS and PageRank algorithms in the GridGraph graph-computing framework to access remote memory via RDMA in the manner above, and processing 4 data sets such as LiveJournal (ranging from 1G to 32G), the experimental results shown in the following table were obtained: while saving 80% of local memory, total time improves by about 8.3 times over the recent remote memory access framework Fastswap. Compared with the prior art, the method's performance gains are lower local memory occupation, lower transmission throughput, and lower total latency.
The foregoing embodiments may be modified in many different ways by those skilled in the art without departing from the spirit and scope of the invention, which is defined by the appended claims and all changes that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.
Claims (10)
1. An RDMA remote memory access method with application-parallel collaboration under a separable and combinable architecture, characterized in that: first, the writable working set is deployed on the local compute node and the read-only working set on the remote memory node, according to the application's memory read/write frequency; during data transfer, a suitable default data block size is selected according to the hardware resource characteristics, and transparent scattering and integration of data blocks is achieved by setting indexes for the data blocks combined with dynamic blocking during RDMA transfers; a bidirectional one-sided operation mechanism matched with local application reads and writes is realized using one-sided reads/writes and the queue-based RDMA mechanism; and buffers are set up with an asynchronous read-write mechanism based on event notification to realize asynchronous parallel processing of local computation and RDMA data reads and writes;
the said separated and combined structure is: the data center is provided with a framework for flexibly combining and matching a plurality of server CPUs and memories in a network connection mode, wherein: taking a server with a calculation task as a calculation node and a server with a memory access function as a memory node;
the far memory means: a distributed architecture comprising at least one compute node and at least one memory node, wherein: the computing node and the memory node respectively comprise a server, and the computing node and the memory node are in wired connection through respective RDMA network cards;
the server takes a CPU as a computing core and a DRAM as a memory unit, the RDMA network card is connected with a mainboard of the server through PCIe, the CPU of each server uses a local memory and uses a remote memory through the RDMA network card without occupying the resources of the remote CPU.
2. The RDMA remote memory access method applying parallel collaboration under a split and combinable architecture as claimed in claim 1, wherein said working set deployment specifically comprises:
i) dividing a read-only working set according to the memory read-write frequency of the application;
ii) in the preprocessing process, the data blocks in the read-only working set divided in step i) are transferred, block by block, to the remote memory area via RDMA Write;
and iii) in the calculation execution process, the local application continuously initiates a request for reading a remote data block, and the remote end returns the corresponding data block required by the server to the local machine in an RDMA Read mode according to the received request of the server program for use by the current program.
3. The RDMA remote memory access method applying parallel coordination under the split combinable architecture as claimed in claim 1, wherein the size of the default data block is Chunk = α × Channel × Frame ÷ Core, wherein: Channel is the number of PCIe channels for transmitting data once of the mainboard, Core is the number of CPUs of the mainboard, Frame is the number of data frames of the RDMA network card, and 1 ≤ α ≤ 1024.
4. The RDMA remote memory access method applying parallel collaboration under split and combinable architecture as claimed in claim 1, wherein said indexing data blocks is: the index is set as the address pair lkey and the key pair rkey of the memory area corresponding to the data block.
5. The RDMA remote memory access method applying parallel collaboration under a separable and combinable architecture as claimed in claim 1, wherein said RDMA transfer process dynamic blocking is: when the data block Data_block that currently needs to be sent is larger than the currently set default size Chunk, it is split at send time into β = ⌈Data_block ÷ Chunk⌉ blocks that are sent separately; otherwise it is sent as a single Chunk-sized block, thereby achieving transparent scattering; on reception, the β data blocks are reassembled in their original order according to their indexes, thereby achieving integration.
6. The RDMA remote memory access method of application parallel collaboration under the separate and combinable architecture as claimed in claim 1, wherein the bidirectional one-sided operation mechanism cooperating with local application read-write is: the server program sets a buffer area for receiving information, sends an index for reading data to a far end, the far end receives a corresponding data block according to the index and carries out unilateral read-write operation based on a RDMA mechanism of a queue, and the data is directly written into the buffer area of the server without data copying;
in the single-side operation, each time a data block read from or written to the receiving buffer is treated as a new read data block, the later data block will overwrite the previous data block information in the buffer.
7. The RDMA remote memory access method applying parallel collaboration under split and combinable architecture as claimed in claim 1, wherein said queue-based RDMA mechanism is:
step 1: sending (receiving) a request for adding an event A in a queue;
step 2: executing the event A, and starting reading and writing data;
and step 3: a, popping up a sending (receiving) queue by an event; and adding the data into a completion queue;
and 4, step 4: the next send (receive) event B enters the send (receive) queue;
and 5: completing the popping of the event A in the queue, and scanning the state of the event A;
step 6: when the event state of the A is successful, starting to execute the B; when the state is unsuccessful, reporting an error;
and 7: steps 2-6 are repeated until there is no new time to join the transmit (receive) queue.
8. The RDMA remote memory access method applying parallel collaboration under separate and combinable architecture as claimed in claim 1, wherein said asynchronous read-write mechanism based on event notification is: when the remote memory starts to read/write, i.e. represents that the read/write is successful, the event is moved from the sending/receiving queue to the completion queue, and the actual time of the read-write process depends on the size of the read-write data block and the current network bandwidth.
9. The RDMA remote memory access method applying parallel collaboration under a split and combinable architecture as recited in claim 1, wherein the buffer is located in a local memory region and a remote memory region, and specifically comprises: an asynchronous parallel buffer, a transmit region, and a receive region, wherein: the sending area and the receiving area correspondingly execute data sending or receiving operation, the asynchronous parallel buffer area is used for temporarily storing transmission data to support asynchronous reading, and the data written in the asynchronous parallel buffer area cannot be overwritten until the data is rewritten next time.
10. The RDMA remote memory access method applying parallel collaboration under split and combinable architecture as claimed in claim 1, wherein said asynchronous parallel processing is: in the process of local computation, a data transmission process of RDMA is prepared at the same time, that is, in the last iteration, one step of receiving data to be used next time is prepared, and the specific steps include:
i) starting a computation iteration;
ii) copying data from the RDMA receive buffer to the compute area and reading the read-only working set information;
iii) preparing the index of the data block to be accessed in the next round and sending the index to the remote end;
iv) opening the RDMA receive buffer to receive the next round of data;
v) executing the computation part of the iteration;
vi) returning to step i) until the algorithm converges or there is no new remote data block to be accessed.
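Steps i)-vi) can be sketched as the following pipelined loop. This is a simulation under stated assumptions: `remote_blocks` is a local dict standing in for remote memory, and the `remote_blocks.get(...)` call is a hypothetical stand-in for the send-index/receive-data round trip of steps iii)-iv), not the patent's actual RDMA calls.

```python
def pipelined_iterations(remote_blocks, max_iters=10):
    """Overlap local computation with fetching the next remote data block:
    while the current block is processed, the next one is already requested."""
    results = []
    recv_buffer = remote_blocks.get(0)            # data for the first round
    next_index = 1
    for _ in range(max_iters):                    # i) iteration start
        if recv_buffer is None:                   # vi) no new remote block
            break
        compute_area = recv_buffer                # ii) copy receive buffer
        prefetch = remote_blocks.get(next_index)  # iii)+iv) request/receive next
        next_index += 1
        results.append(sum(compute_area))         # v) compute on current data
        recv_buffer = prefetch
    return results

blocks = {0: [1, 2], 1: [3, 4], 2: [5]}
print(pipelined_iterations(blocks))  # -> [3, 7, 5]
```

The design point is that step v)'s computation never waits for step iv)'s transfer: by the time an iteration's compute finishes, the next block has (ideally) already arrived in the receive buffer.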
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110209483.6A CN112817887B (en) | 2021-02-24 | 2021-02-24 | Far memory access optimization method and system under separated combined architecture |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112817887A CN112817887A (en) | 2021-05-18 |
CN112817887B true CN112817887B (en) | 2021-09-17 |
Family
ID=75865550
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110209483.6A Active CN112817887B (en) | 2021-02-24 | 2021-02-24 | Far memory access optimization method and system under separated combined architecture |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112817887B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113448897B (en) * | 2021-07-12 | 2022-09-06 | 上海交通大学 | Optimization method suitable for pure user mode far-end direct memory access |
CN113395359B (en) * | 2021-08-17 | 2021-10-29 | 苏州浪潮智能科技有限公司 | File currency cluster data transmission method and system based on remote direct memory access |
CN115495246B (en) * | 2022-09-30 | 2023-04-18 | 上海交通大学 | Hybrid remote memory scheduling method under separated memory architecture |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7281030B1 (en) * | 1999-09-17 | 2007-10-09 | Intel Corporation | Method of reading a remote memory |
CN105426321A (en) * | 2015-11-13 | 2016-03-23 | 上海交通大学 | RDMA friendly caching method using remote position information |
CN106844048A (en) * | 2017-01-13 | 2017-06-13 | 上海交通大学 | Distributed shared memory method and system based on ardware feature |
CN108268208A (en) * | 2016-12-30 | 2018-07-10 | 清华大学 | A kind of distributed memory file system based on RDMA |
CN110262754A (en) * | 2019-06-14 | 2019-09-20 | 华东师范大学 | A kind of distributed memory system and lightweight synchronized communication method towards NVMe and RDMA |
CN111221773A (en) * | 2020-01-15 | 2020-06-02 | 华东师范大学 | Data storage architecture method based on RMDA high-speed network and skip list |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7539780B2 (en) * | 2003-12-01 | 2009-05-26 | International Business Machines Corporation | Asynchronous completion notification for an RDMA system |
US20060168094A1 (en) * | 2005-01-21 | 2006-07-27 | International Business Machines Corporation | DIRECT ACCESS OF SCSI BUFFER WITH RDMA ATP MECHANISM BY iSCSI TARGET AND/OR INITIATOR |
US8966195B2 (en) * | 2009-06-26 | 2015-02-24 | Hewlett-Packard Development Company, L.P. | Direct memory access and super page swapping optimizations for a memory blade |
CN105589664B (en) * | 2015-12-29 | 2018-07-31 | 四川中电启明星信息技术有限公司 | Virtual memory high speed transmission method |
CN111078607B (en) * | 2019-12-24 | 2023-06-23 | 上海交通大学 | Network access programming framework deployment method and system for RDMA (remote direct memory access) and nonvolatile memory |
CN111400307B (en) * | 2020-02-20 | 2023-06-23 | 上海交通大学 | Persistent hash table access system supporting remote concurrent access |
CN111400306B (en) * | 2020-02-20 | 2023-03-28 | 上海交通大学 | RDMA (remote direct memory Access) -and non-volatile memory-based radix tree access system |
CN111459418B (en) * | 2020-05-15 | 2021-07-23 | 南京大学 | RDMA (remote direct memory Access) -based key value storage system transmission method |
2021-02-24 CN CN202110209483.6A patent/CN112817887B/en active Active
Non-Patent Citations (1)
Title |
---|
"基于RDMA的分布式存储系统研究综述";陈游旻 等;《计算机研究与发展》;20190129;第227-238页 * |
Also Published As
Publication number | Publication date |
---|---|
CN112817887A (en) | 2021-05-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112817887B (en) | Far memory access optimization method and system under separated combined architecture | |
CN113485823A (en) | Data transmission method, device, network equipment and storage medium | |
CN111339192A (en) | Distributed edge computing data storage system | |
CN111708738B (en) | Method and system for realizing interaction of hadoop file system hdfs and object storage s3 data | |
CN107907867A (en) | A kind of real-time SAR quick look systems of multi-operation mode | |
CN111708719B (en) | Computer storage acceleration method, electronic equipment and storage medium | |
CN114201421B (en) | Data stream processing method, storage control node and readable storage medium | |
US11243714B2 (en) | Efficient data movement method for in storage computation | |
WO2023019800A1 (en) | Filecoin cluster data transmission method and system based on remote direct memory access | |
CN113590528A (en) | Multi-channel data acquisition, storage and playback card, system and method based on HP interface | |
CN101576912A (en) | System and reading and writing method for realizing asynchronous input and output interface of distributed file system | |
CN112445735A (en) | Method, computer equipment, system and storage medium for transmitting federated learning data | |
US20080201549A1 (en) | System and Method for Improving Data Caching | |
JP4208506B2 (en) | High-performance storage device access environment | |
US7600074B2 (en) | Controller of redundant arrays of independent disks and operation method thereof | |
US7409486B2 (en) | Storage system, and storage control method | |
CN103986771A (en) | High-availability cluster management method independent of shared storage | |
CN112003800B (en) | Method and device for exchanging and transmitting messages of ports with different bandwidths | |
CN116074179B (en) | High expansion node system based on CPU-NPU cooperation and training method | |
WO2023093608A1 (en) | Automatic distributed cloud storage scheduling interaction method and apparatus, and device | |
US11847049B2 (en) | Processing system that increases the memory capacity of a GPGPU | |
CN105740166A (en) | Cache reading and reading processing method and device | |
CN106502828A (en) | A kind of remote copy method based on LVM of optimization | |
US8054857B2 (en) | Task queuing methods and systems for transmitting frame information over an I/O interface | |
JPH0715670B2 (en) | Data processing device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||