WO2014210602A1 - Replicated database using one sided rdma - Google Patents


Info

Publication number
WO2014210602A1
WO2014210602A1 (PCT/US2014/044924)
Authority
WO
WIPO (PCT)
Prior art keywords
server
data
database
index structure
client
Prior art date
Application number
PCT/US2014/044924
Other languages
French (fr)
Inventor
Michael Andrew RAYMOND
Lance EVANS
Original Assignee
Silicon Graphics International Corp.
Priority date
Filing date
Publication date
Application filed by Silicon Graphics International Corp. filed Critical Silicon Graphics International Corp.
Publication of WO2014210602A1 publication Critical patent/WO2014210602A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor


Abstract

This innovation provides a method for a networked and replicated database management system (DBMS) using only one-sided remote direct memory access (RDMA). Replicated databases retain some access to the stored data in the face of server failure. In the prior state of the art, after the software in the DBMS on one of the servers acted on a client's request to update the database, it would contact the other replicas of the database and ensure that they had recorded the change, before responding to the client that the transaction was complete. This innovation describes a method whereby the database client directly interacts with each DBMS replica over the network using only RDMA to directly modify the stored data while maintaining the properties of database atomicity and consistency. This method reduces transactional latency by removing any need for the server DBMS software to respond to or forward requests for service.

Description

REPLICATED DATABASE USING ONE SIDED RDMA
BACKGROUND
Field of the Invention
The present invention relates to replication of data. In particular, the present invention relates to replication of data using memory to memory transfers.
Description of the Prior Art
Replication of data across database servers is a common safeguard for protecting data. Typically, when reading or writing data, a request to perform a data operation is sent from a client to a database. The database receives the request and processes the request. Processing the request in prior art systems may include the database management system (DBMS) taking control of the data access: detecting a request on the data, processing the request by searching for the data and performing an operation on it, generating a response, and transmitting the response. With large amounts of data requests, the DBMS handling of data replication related requests can cause latency issues.
Latency in memory access operations can cause database performance to suffer. To ensure that data is available and up to date as quickly as possible, any reduction in latency is highly desirable. What is needed is an improved method of replicating databases in which latency is reduced.
SUMMARY
The present technology may provide database replication with low latency using one-sided remote direct memory access. A client may communicate with a DBMS spread across more than one server. The database may include one or more collections of data, known as tables. Each table may be composed of one or more memory data blocks of storage. Memory blocks are either in use storing data, or free for later use. In some DBMSs an in-use block is known as a database row.
Each in-use block may be uniquely identified by a descriptor known as a key. Each table may have an index which may be used to find specific data blocks quickly based on their keys. The index structure may also indicate what data blocks are used and unused. To read the data from a table associated with a certain key, the index structure is accessed to find the specific block containing the data referenced by the key.
After the location is determined, the data is retrieved by reading from the block. After the data is retrieved, the index must be checked again to see if another client stored a new set of data associated with the key in a different block and updated the index to point to the new block.
An embodiment may perform a method for replicating data. A memory location may be allocated in a first database. A remote direct memory access command may be sent from a client to a first database and a second database to write data to the memory location. An index structure for each of the first database and second database may be updated with information regarding the data.
An embodiment may include a system for replicating data. The system may include a processor, a memory, and one or more modules stored in memory. The one or more modules may be executed by the processor to allocate a memory location in a first database, send a remote direct memory access command from a client to a first database and a second database to write data to the memory location, and update an index structure for each of the first database and second database with information regarding the data.
BRIEF DESCRIPTION OF THE DRAWINGS
FIGURE 1 is a system for replicating data.
FIGURE 2 is a block diagram of a database server.
FIGURE 3 is a method for writing data.
FIGURE 4 is a method for reading data.
FIGURE 5 provides a computing device for implementing the present technology.
DETAILED DESCRIPTION
The present technology may provide database replication with low latency using one-sided remote direct memory access. A client may communicate with a DBMS spread across more than one server. The database may include one or more collections of data, known as tables. Each table may be composed of one or more memory data blocks of storage. Memory blocks are either in use storing data, or free for later use. In some DBMSs an in-use block is known as a database row.
Each in-use block may be uniquely identified by a descriptor known as a key. Each table may have an index which may be used to find specific data blocks quickly based on their keys. The index structure may also indicate what data blocks are used and unused. To read the data from a table associated with a certain key, the index structure is accessed to find the specific block containing the data referenced by the key.
After the location is determined, the data is retrieved by reading from the block. After the data is retrieved, the index must be checked again to see if another client stored a new set of data associated with the key in a different block and updated the index to point to the new block.
FIGURE 1 is a system for replicating data. The system of FIGURE 1 includes database 110, network 120, and servers 130 and 140. Database 110 may be implemented as a computing device capable of accessing data and communicating over network 120, and may be, for example, a desktop, laptop, tablet or other computer, a mobile device, or other computing device. Database 110 may communicate with servers 130-140 through network 120.
In some embodiments, database 110 may communicate with the servers by remote direct memory access (RDMA). RDMA is a form of direct memory access from the memory of one computer to that of another without involving either computer's operating system. This form of access permits high-throughput, low-latency networking.
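The one-sided access described above can be pictured as the client's network hardware copying bytes into and out of a server memory region with no server-side process involved. The following Python sketch simulates that behavior; the names (RemoteRegion, rdma_write, rdma_read) are illustrative, not part of the patent or of any RDMA library.

```python
# Illustrative simulation only: a bytearray stands in for a server memory
# region registered for RDMA access. In real one-sided RDMA, the client's
# NIC performs these copies without the server CPU or OS participating.

class RemoteRegion:
    """Simulates a server memory region exposed for one-sided RDMA."""
    def __init__(self, size):
        self.mem = bytearray(size)

def rdma_write(region, offset, data):
    # One-sided write: client bytes land directly in server memory.
    region.mem[offset:offset + len(data)] = data

def rdma_read(region, offset, length):
    # One-sided read: bytes are copied out of server memory.
    return bytes(region.mem[offset:offset + length])

server = RemoteRegion(64)
rdma_write(server, 0, b"row-1 payload")
assert rdma_read(server, 0, 13) == b"row-1 payload"
```

Note that the server object never executes any request-handling logic here, which mirrors the patent's point that the server has no control over the transfer.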
The system of FIGURE 1 may include any application, software module, and process required to implement RDMA communications. For example, RDMA module 115 may reside on database 110. RDMA module 115 may include one or more software modules or processes which may use RDMA to directly perform operations such as reading, writing, and modifying the memory of servers 130-140. RDMA module 115 performs operations on server memory without passing control of data access to the operating systems of servers 130-140. Thus, database 110 may access, store and modify data stored in memory at servers 130 and 140 through RDMA. In some embodiments of the invention, the RDMA communications may be one-sided in that database 110 sends RDMA commands to servers 130 and 140, but servers 130-140 do not control access operations and do not send RDMA commands to database 110.
Network 120 may connect database 110, server 130, and server 140. Network 120 may be comprised of any combination of a private network, a public network, a local area network, a wide area network, the Internet, an intranet, a Wi-Fi network, a cellular network, or some other network.
Servers 130 and 140 may each include one or more servers for storing data. The data may be structured data or unstructured data, and may be replicated over the two servers. The memory of each of servers 130-140 may be accessible by RDMA module 115 and/or database 110 via RDMA commands.
FIGURE 2 is a block diagram of a database server. The database server 210 of FIGURE 2 includes data blocks 220 and a data table 230. Data blocks 220 may include blocks at which data may be stored, accessed, and modified. The data may be structured or unstructured data. The database server 210 may be used to implement each of databases 130-140 of FIGURE 1.
Data table 230 may include an index structure for storing information about data blocks within database server 210. In embodiments, the index structure of data table 230 may include pointers to data block locations in memory currently in use. If a particular data block is not being used, the index structure of data table 230 will not include a pointer. In some embodiments, the index structure for data table 230 may be a bit map.
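A bit-map index of the kind just mentioned can be sketched in a few lines: bit i set means block i is in use, and allocating a block means scanning for a clear bit. This is a hedged illustration of one possible encoding; the function names are hypothetical.

```python
# Sketch of a bit-map index structure: bit i set => data block i is in use.
# Finding an unused block (as in step 310 of FIGURE 3) scans for a clear bit.

def find_unused_block(bitmap, num_blocks):
    """Return the index of the first free block, or None if the table is full."""
    for i in range(num_blocks):
        if not (bitmap >> i) & 1:
            return i
    return None

def mark_used(bitmap, i):
    """Return a new bitmap with block i marked in use (step 320)."""
    return bitmap | (1 << i)

bitmap = 0b1011                    # blocks 0, 1, and 3 in use; block 2 free
free = find_unused_block(bitmap, 4)
assert free == 2
bitmap = mark_used(bitmap, free)
assert bitmap == 0b1111            # table now full
```

A bit map is compact enough that a client can fetch the whole structure with a single one-sided read, which fits the access pattern the patent describes.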
The DBMS may have a management process to coordinate security checks and to aid in setting up initial access to the table, such as to data block 220. This helps maintain serialization when writing data from multiple sources. A writer and reader may exist outside of the DBMS container. Each table in the DBMS can have only one writing client at a time, and may have any number of threads or other reading clients at a time.
FIGURE 3 is a method for writing data. The method of FIGURE 3 may be performed by database 110 using RDMA commands sent to one or more of servers 130 and 140. First, an unused data block may be found in the data table at step 310. To find the unused data block, database 110 may send an RDMA command to retrieve the index structure of the data table within the server receiving the request. The index structure will not include pointers for data blocks which are unused.
An unused data block may be marked as a used data block in the index structure of the data table at step 320. To mark a data block in the index structure, database 110 may send an RDMA command to a database write process to update the index structure for a particular data block. The data block that is marked used will be the data block that is being written to by database 110.
Data is written to the memory block of a first server using an RDMA command at step 330. Database 110 may send an RDMA command to the write process to write data to the memory block. By using the RDMA command, database 110 does not involve any processes of the server being written to. Rather, the data is written directly from the memory of database 110 to the memory of the particular database server. The server has no control over any portion of the process.
Data in the memory block of the second server may be written using RDMA commands at step 340. By writing the data in a memory block of a second server, the data is replicated for durability. The index structure of the tables at each database server is updated at step 350. The update may include adding a pointer to the memory block at which data was just written.
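The write path of steps 310-350 can be summarized in a small simulation. This is a sketch under simplifying assumptions, not the patented implementation: each "server" is a plain dict of blocks plus an index, one-sided RDMA writes are modeled as assignments, and the same block slot is assumed free on every replica.

```python
# Sketch of the FIGURE 3 write path across two replicas. All structures and
# names here are illustrative; real RDMA transfers replace the assignments.

NUM_BLOCKS = 4

def make_server():
    return {"blocks": [None] * NUM_BLOCKS, "index": {}, "used": set()}

def replicated_write(servers, key, data):
    # Step 310: find a block unused on every replica (a simplifying assumption).
    block = next(i for i in range(NUM_BLOCKS)
                 if all(i not in s["used"] for s in servers))
    for s in servers:
        s["used"].add(block)        # step 320: mark the block as used
        s["blocks"][block] = data   # steps 330/340: one-sided writes to replicas
        s["index"][key] = block     # step 350: point the index at the new block
    return block

servers = [make_server(), make_server()]
replicated_write(servers, "k1", "hello")
assert all(s["blocks"][s["index"]["k1"]] == "hello" for s in servers)
```

Because the second replica is written before the index is published, a reader following the index never observes a key whose replicated data is missing, which is the durability property the text describes.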
FIGURE 4 is a method for retrieving data. The method of FIGURE 4 may be performed by database 110 through the use of RDMA commands sent to servers 130 or 140. First, an index structure is accessed for desired data at step 410. The index structure may be accessed by sending an RDMA command to a server. The RDMA command instructs network hardware to perform a read from memory on the server and return the data. Next, a data block location is determined for desired data at step 420. The data block location may be determined from a pointer associated with the desired data in the index structure. Data is retrieved using RDMA commands sent by the client at step 430. The RDMA commands allow the client to retrieve data from a server without ever passing control over the retrieval operation to that server.
After receiving the data, the index structure may be accessed again and a determination is made as to whether there is a change in the index structure pointer associated with the memory block read at step 440. If any change occurred between the time when the index structure was first accessed and the time that data was retrieved, the data received by database 110 may not be the most up-to-date data. Therefore, if a change is detected, the method of FIGURE 4 returns to step 420 where the data block is retrieved again. If there is no change in the index structure, the retrieved data is up to date and the method of FIGURE 4 ends at step 450.
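This read-then-recheck pattern is an optimistic read: the client retries whenever the index pointer changed underneath it. A minimal sketch, with illustrative names and dicts standing in for RDMA-accessed memory:

```python
# Sketch of the FIGURE 4 read path: read the index pointer (410/420), read
# the block (430), then re-read the pointer (440); if a concurrent writer
# moved the key to a different block, retry. Names are illustrative.

def consistent_read(server, key):
    while True:
        block = server["index"][key]        # steps 410/420: locate the block
        data = server["blocks"][block]      # step 430: one-sided read of block
        if server["index"][key] == block:   # step 440: pointer unchanged?
            return data                     # step 450: data is up to date

server = {"index": {"k1": 0}, "blocks": ["v1", None]}
assert consistent_read(server, "k1") == "v1"
```

Writers never update a block in place (they write a new block, then swing the pointer), so an unchanged pointer is enough to conclude the retrieved data was current, with no server-side locking.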
FIGURE 5 provides a computing device for implementing the present technology.
Computing device 500 may be used to implement devices such as, for example, database servers 130 and 140 and database 110. FIGURE 5 illustrates an exemplary computing system 500 that may be used to implement a computing device for use with the present technology. The computing system 500 of FIGURE 5 includes one or more processors 510 and memory 520. Main memory 520 stores, in part, instructions and data for execution by processor 510. Main memory 520 can store the executable code when in operation. The system 500 of FIGURE 5 further includes a mass storage device 530, portable storage medium drive(s) 540, output devices 550, user input devices 560, a graphics display 570, and peripheral devices 580.
The components shown in FIGURE 5 are depicted as being connected via a single bus 590. However, the components may be connected through one or more data transport means. For example, processor unit 510 and main memory 520 may be connected via a local microprocessor bus, and the mass storage device 530, peripheral device(s) 580, portable storage device 540, and display system 570 may be connected via one or more input/output (I/O) buses. Mass storage device 530, which may be implemented with a magnetic disk drive or an optical disk drive, is a non-volatile storage device for storing data and instructions for use by processor unit 510. Mass storage device 530 can store the system software for implementing embodiments of the present invention for purposes of loading that software into main memory 520.
Portable storage device 540 operates in conjunction with a portable non-volatile storage medium, such as a floppy disk, compact disk or Digital video disc, to input and output data and code to and from the computer system 500 of FIGURE 5. The system software for implementing embodiments of the present invention may be stored on such a portable medium and input to the computer system 500 via the portable storage device 540.
Input devices 560 provide a portion of a user interface. Input devices 560 may include an alpha-numeric keypad, such as a keyboard, for inputting alpha-numeric and other information, or a pointing device, such as a mouse, a track ball, stylus, or cursor direction keys. Additionally, the system 500 as shown in FIGURE 5 includes output devices 550. Examples of suitable output devices include speakers, printers, network interfaces, and monitors.
Display system 570 may include a liquid crystal display (LCD) or other suitable display device. Display system 570 receives textual and graphical information, and processes the information for output to the display device.
Peripherals 580 may include any type of computer support device to add additional functionality to the computer system. For example, peripheral device(s) 580 may include a modem or a router.
The components contained in the computer system 500 of FIGURE 5 are those typically found in computer systems that may be suitable for use with embodiments of the present invention and are intended to represent a broad category of such computer components that are well known in the art. Thus, the computer system 500 of FIGURE 5 can be a personal computer, hand held computing device, telephone, mobile computing device, workstation, server, minicomputer, mainframe computer, or any other computing device. The computer can also include different bus configurations, networked platforms, multi-processor platforms, etc. Various operating systems can be used including Unix, Linux, Windows, Macintosh OS, Palm OS, and other suitable operating systems.
The foregoing detailed description of the technology herein has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the technology to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. The described embodiments were chosen in order to best explain the principles of the technology and its practical application to thereby enable others skilled in the art to best utilize the technology in various embodiments and with various modifications as are suited to the particular use contemplated. It is intended that the scope of the technology be defined by the claims appended hereto.

Claims

WHAT IS CLAIMED IS:
1. A method for replicating data, comprising:
allocating a memory location in a first server;
sending a remote direct memory access command from a client to a first server and a second server to write data to the memory location; and
updating an index structure for each of the first server and second server with information regarding the data.
2. The method of claim 1, wherein allocating includes finding an unused data block in a data structure within each of the first server and the second server.
3. The method of claim 1, wherein allocating includes marking a data block in a data structure within the first server and the second server as used.
4. The method of claim 1, wherein the information regarding the data includes an updated pointer to the memory block.
5. The method of claim 1, wherein the write at the first server memory location does not utilize a server process.
6. The method of claim 1, wherein each index structure is associated with a table, each table associated with a single write client.
7. The method of claim 1, further comprising:
finding desired data in the index structure of one of the first server and the second server;
determining the location of the data from a pointer in the index structure and associated with the data;
retrieving the data using a remote direct memory access command from a client to a first server; and
detecting whether the index structure changed.
8. A computer readable storage medium having embodied thereon a program, the program being executable by a processor to perform a method for replicating data, the method comprising:
allocating a memory location in a first server;
sending a remote direct memory access command from a client to a first server and a second server to write data to the memory location; and
updating an index structure for each of the first server and second server with information regarding the data.
9. The computer readable storage medium of claim 8, wherein allocating includes finding an unused data block in a data structure within each of the first server and the second server.
10. The computer readable storage medium of claim 8, wherein allocating includes marking a data block in a data structure within the first server and the second server as used.
11. The computer readable storage medium of claim 8, wherein the information regarding the data includes an updated pointer to the memory block.
12. The computer readable storage medium of claim 8, wherein the write at the first server memory location does not utilize a server process.
13. The computer readable storage medium of claim 8, wherein each index structure is associated with a table, each table associated with a single write client.
14. The computer readable storage medium of claim 8, the method further comprising:
finding desired data in the index structure of one of the first server and the second server;
determining the location of the data from a pointer in the index structure and associated with the data;
retrieving the data using a remote direct memory access command from a client to a first server; and
detecting whether the index structure changed.
15. A system for replicating data, comprising:
a processor;
memory; and
one or more modules stored in memory and executed by the processor to allocate a memory location in a first server, send a remote direct memory access command from a client to a first server and a second server to write data to the memory location, and update an index structure for each of the first server and second server with information regarding the data.
16. The system of claim 15, wherein allocating includes finding an unused data block in a data structure within each of the first server and the second server.
17. The system of claim 15, wherein allocating includes marking a data block in a data structure within the first server as used.
18. The system of claim 15, wherein allocating includes marking a data block in a data structure within the first server and the second server as used.
19. The system of claim 15, wherein the write at the first server memory location does not utilize a server process.
20. The system of claim 15, wherein each index structure is associated with a table, each table associated with a single write client.
21. The system of claim 15, further comprising:
finding desired data in the index structure of one of the first server and the second server;
determining the location of the data from a pointer in the index structure and associated with the data;
retrieving the data using a remote direct memory access command from a client to a first server; and
detecting whether the index structure changed.
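The write and read paths recited in the claims above can be illustrated with a minimal, hypothetical sketch. Plain Python objects stand in for each server's registered memory region and index structure; ordinary assignments and lookups stand in for one-sided RDMA WRITE and READ operations (no actual RDMA verbs are used), and all class, function, and key names are illustrative rather than taken from the disclosure:

```python
class Server:
    """Simulated server exposing a registered data region and an index.

    In the claimed approach the client writes these structures directly
    over the network with one-sided RDMA, so no server process is
    involved in the data path; here local assignments simulate that.
    """
    def __init__(self, num_blocks=16):
        self.blocks = [None] * num_blocks   # registered data region
        self.used = [False] * num_blocks    # allocation state per block
        self.index = {}                     # key -> block number (pointer)
        self.index_version = 0              # bumped on each index update

    def allocate(self):
        """Find an unused data block and mark it used (claims 2 and 3)."""
        for i, in_use in enumerate(self.used):
            if not in_use:
                self.used[i] = True
                return i
        raise MemoryError("no free blocks")


def replicated_write(key, data, first, second):
    """Write data to both replicas and update both indexes (claim 1)."""
    for server in (first, second):
        block = server.allocate()
        server.blocks[block] = data         # simulated one-sided RDMA WRITE
        server.index[key] = block           # updated pointer (claim 4)
        server.index_version += 1


def replicated_read(key, server):
    """Find the pointer, fetch the block, detect index changes (claim 7)."""
    version_before = server.index_version
    block = server.index[key]               # find desired data in the index
    data = server.blocks[block]             # simulated one-sided RDMA READ
    changed = server.index_version != version_before
    return data, changed


s1, s2 = Server(), Server()
replicated_write("row:42", b"hello", s1, s2)
value, index_changed = replicated_read("row:42", s2)
print(value, index_changed)  # b'hello' False
```

Because either replica's index holds a pointer to the same logical data, a client may read from whichever server it chooses, re-reading only if the index version changed underneath it.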
PCT/US2014/044924 2013-06-28 2014-06-30 Replicated database using one sided rdma WO2014210602A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US13/931,790 2013-06-28
US13/931,790 US20150006478A1 (en) 2013-06-28 2013-06-28 Replicated database using one sided rdma

Publications (1)

Publication Number Publication Date
WO2014210602A1 true WO2014210602A1 (en) 2014-12-31

Family

ID=52116645

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2014/044924 WO2014210602A1 (en) 2013-06-28 2014-06-30 Replicated database using one sided rdma

Country Status (2)

Country Link
US (1) US20150006478A1 (en)
WO (1) WO2014210602A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9986028B2 (en) * 2013-07-08 2018-05-29 Intel Corporation Techniques to replicate data between storage servers
US9558146B2 (en) * 2013-07-18 2017-01-31 Intel Corporation IWARP RDMA read extensions
US9412146B2 (en) 2013-10-25 2016-08-09 Futurewei Technologies, Inc. System and method for distributed virtualization of GPUs in desktop cloud
CN105446827B (en) 2014-08-08 2018-12-14 阿里巴巴集团控股有限公司 Date storage method and equipment when a kind of database failure
US10025628B1 (en) 2015-06-26 2018-07-17 Amazon Technologies, Inc. Highly available distributed queue using replicated messages
US10303646B2 (en) 2016-03-25 2019-05-28 Microsoft Technology Licensing, Llc Memory sharing for working data using RDMA
CN111221773B (en) * 2020-01-15 2023-05-16 华东师范大学 Data storage architecture method based on RDMA high-speed network and skip list
US11620254B2 (en) * 2020-06-03 2023-04-04 International Business Machines Corporation Remote direct memory access for container-enabled networks
CN114817232A (en) * 2021-01-21 2022-07-29 华为技术有限公司 Method and device for accessing data

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6061678A (en) * 1997-10-31 2000-05-09 Oracle Corporation Approach for managing access to large objects in database systems using large object indexes
US6785706B1 (en) * 1999-09-01 2004-08-31 International Business Machines Corporation Method and apparatus for simplified administration of large numbers of similar information handling servers
US20060230119A1 (en) * 2005-04-08 2006-10-12 Neteffect, Inc. Apparatus and method for packet transmission over a high speed network supporting remote direct memory access operations
US20070226331A1 (en) * 2000-09-12 2007-09-27 Ibrix, Inc. Migration of control in a distributed segmented file system

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7856421B2 (en) * 2007-05-18 2010-12-21 Oracle America, Inc. Maintaining memory checkpoints across a cluster of computing nodes
US20090144388A1 (en) * 2007-11-08 2009-06-04 Rna Networks, Inc. Network with distributed shared memory
US8069366B1 (en) * 2009-04-29 2011-11-29 Netapp, Inc. Global write-log device for managing write logs of nodes of a cluster storage system
US8601222B2 (en) * 2010-05-13 2013-12-03 Fusion-Io, Inc. Apparatus, system, and method for conditional and atomic storage operations
CN102598019B (en) * 2009-09-09 2015-08-19 才智知识产权控股公司(2) For equipment, the system and method for memory allocated
US8327102B1 (en) * 2009-10-21 2012-12-04 Netapp, Inc. Method and system for non-disruptive migration
US8364640B1 (en) * 2010-04-09 2013-01-29 Symantec Corporation System and method for restore of backup data
US20120011176A1 (en) * 2010-07-07 2012-01-12 Nexenta Systems, Inc. Location independent scalable file and block storage
US8856460B2 (en) * 2010-09-15 2014-10-07 Oracle International Corporation System and method for zero buffer copying in a middleware environment
US8650165B2 (en) * 2010-11-03 2014-02-11 Netapp, Inc. System and method for managing data policies on application objects
WO2012116369A2 (en) * 2011-02-25 2012-08-30 Fusion-Io, Inc. Apparatus, system, and method for managing contents of a cache
US8806160B2 (en) * 2011-08-16 2014-08-12 Pure Storage, Inc. Mapping in a storage system
US9043283B2 (en) * 2011-11-01 2015-05-26 International Business Machines Corporation Opportunistic database duplex operations

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6061678A (en) * 1997-10-31 2000-05-09 Oracle Corporation Approach for managing access to large objects in database systems using large object indexes
US6785706B1 (en) * 1999-09-01 2004-08-31 International Business Machines Corporation Method and apparatus for simplified administration of large numbers of similar information handling servers
US20070226331A1 (en) * 2000-09-12 2007-09-27 Ibrix, Inc. Migration of control in a distributed segmented file system
US20060230119A1 (en) * 2005-04-08 2006-10-12 Neteffect, Inc. Apparatus and method for packet transmission over a high speed network supporting remote direct memory access operations

Also Published As

Publication number Publication date
US20150006478A1 (en) 2015-01-01

Similar Documents

Publication Publication Date Title
US20150006478A1 (en) Replicated database using one sided rdma
US10754562B2 (en) Key value based block device
US10824673B2 (en) Column store main fragments in non-volatile RAM and the column store main fragments are merged with delta fragments, wherein the column store main fragments are not allocated to volatile random access memory and initialized from disk
CN110998557B (en) High availability database system and method via distributed storage
US9697247B2 (en) Tiered data storage architecture
US8392388B2 (en) Adaptive locking of retained resources in a distributed database processing environment
US8924357B2 (en) Storage performance optimization
US9424137B1 (en) Block-level backup of selected files
US10891074B2 (en) Key-value storage device supporting snapshot function and operating method thereof
CN103597440A (en) Method for creating clone file, and file system adopting the same
JP7062750B2 (en) Methods, computer programs and systems for cognitive file and object management for distributed storage environments
US20150193526A1 (en) Schemaless data access management
US10248668B2 (en) Mapping database structure to software
US20140223100A1 (en) Range based collection cache
US10048883B2 (en) Integrated page-sharing cache storing a single copy of data where the data is stored in two volumes and propagating changes to the data in the cache back to the two volumes via volume identifiers
US11663166B2 (en) Post-processing global deduplication algorithm for scaled-out deduplication file system
US10970175B2 (en) Flexible per-request data durability in databases and other data stores
US20200387412A1 (en) Method To Manage Database
KR102214697B1 (en) A computer program for providing space managrment for data storage in a database management system
US7051158B2 (en) Single computer distributed memory computing environment and implementation thereof
CN108694209B (en) Distributed index method based on object and client
US7130931B2 (en) Method, system, and article of manufacture for selecting replication volumes
CN115552391B (en) Zero-copy optimization of Select queries
US11537597B1 (en) Method and system for streaming data from portable storage devices
KR102227113B1 (en) A file processing apparatus based on a shared file system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14817512

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14817512

Country of ref document: EP

Kind code of ref document: A1