US11386078B2 - Distributed trust data storage system - Google Patents
- Publication number: US11386078B2 (Application No. US16/222,931)
- Authority: US (United States)
- Prior art keywords: transaction, data partition, data, log, storage system
- Prior art date
- Legal status: Active, expires (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G06F16/2379—Updates performed during online database operations; commit processing
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
- G06F3/0619—Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
- G06F3/0623—Securing storage systems in relation to content
- G06F3/0644—Management of space entities, e.g. partitions, extents, pools
- G06F3/065—Replication mechanisms
- G06F3/0659—Command handling arrangements, e.g. command buffers, queues, command scheduling
- G06F3/0673—Single storage device
- H04L9/0643—Hash functions, e.g. MD5, SHA, HMAC or f9 MAC
- H04L9/0891—Revocation or update of secret information, e.g. encryption key update or rekeying
- H04L9/0894—Escrow, recovery or storing of secret information, e.g. secret key escrow or cryptographic key storage
- G06F16/2365—Ensuring data consistency and integrity
Definitions
- the subject matter described herein relates generally to distributed computing and more specifically to logging transactions in a distributed data storage system.
- a distributed data storage system can store data across multiple computing nodes. These computing nodes can be located across different racks, availability zones, and/or data centers in the distributed data storage system. Furthermore, the distributed data storage system can be configured to store data from multiple tenants. Data from each individual tenant can be organized into one or more data partitions and stored in at least one data container. Moreover, each data partition can be stored in one of the computing nodes in the distributed data storage system. As such, locating data within the distributed data storage system, for example, in order to respond to a query (e.g., a structured query language (SQL) statement and/or the like), can require identifying the data partition and/or the data container holding the data. Alternatively and/or additionally, locating data within the distributed data storage system can require identifying the data center, availability zone, rack, and/or computing node storing the data.
- the system may include at least one data processor and at least one memory.
- the at least one memory may store instructions that result in operations when executed by the at least one data processor.
- the operations may include: receiving, at a node comprising a distributed trust data storage system, a request from a client to execute a first transaction modifying at least a portion of a first data partition, replicas of the first data partition being stored at a plurality of nodes comprising the distributed trust data storage system; responding to the request by at least sending, to the plurality of nodes storing replicas of the first data partition, the first transaction to modify at least the portion of the first data partition; and committing the first transaction based at least on a threshold quantity of the plurality of nodes reaching a consensus by determining a same cryptographic hash for the first transaction, the first transaction being committed by at least sending, to the plurality of nodes, an indication to add a first entry corresponding to the first transaction to a transaction log recording one or more transactions executed on at least the portion of the first data partition.
- the cryptographic hash of the first transaction may be determined based at least on the first transaction.
- the cryptographic hash of the first transaction may be further determined based on a cryptographic hash of at least a second transaction.
- the second transaction may be a previous transaction modifying a same data record from the first data partition as the first transaction.
- the entry corresponding to the first transaction may include a globally unique identifier associated with the first transaction and the cryptographic hash of at least the second transaction.
- the transaction log may further include a second entry corresponding to the second transaction.
- the second entry may include a globally unique identifier associated with the second transaction and a cryptographic hash of at least a third transaction preceding the second transaction and modifying a same data record as the second transaction.
- the first transaction may further modify at least a portion of a second data partition.
- the first transaction may be further committed by at least adding the first entry to another transaction log recording one or more transactions executed on at least the portion of the second data partition.
- At least one of the plurality of nodes may store a replica of the first data partition and a replica of the second data partition. None of the plurality of nodes may store more than one replica of a same data partition.
- the transaction log and the other transaction log may be interlinked by a mesh of transactions including the first transaction and a second transaction that modify a same data record from the first data partition and/or the second data partition as the first transaction.
- At least a portion of the plurality of nodes storing the first data partition may be queried.
- the portion of the plurality of nodes may be queried to at least retrieve one or more entries from the transaction log recording the one or more transactions executed on at least the portion of the first data partition.
- the one or more entries retrieved from the plurality of nodes may be returned to the client to at least enable a detection of a forged transaction.
- the forged transaction may be detected based at least on a mismatch between a first entry retrieved from a first node storing a first replica of the first data partition and a second entry received from a second node storing a second replica of the first data partition.
- a remaining portion of the plurality of nodes storing the first data partition may be queried to at least retrieve the one or more entries from the transaction log recording the one or more transactions executed on at least the portion of the first data partition.
- the response to the other request may be determined based on the one or more entries of the transaction log retrieved from the remaining portion of the plurality of nodes.
- the one or more entries retrieved from the plurality of nodes may include the first entry corresponding to the first transaction and a third entry corresponding to a second transaction preceding the first transaction.
- The forged transaction may be further detected based at least on a mismatch in a hash value of the second transaction stored in a key-value pair associated with the first transaction and an actual hash of the second transaction.
- a key of the key-value pair may include the globally unique identifier of the second transaction.
- a value of the key-value pair may include a cryptographic hash of at least a portion of the second entry corresponding to the second transaction.
- a method for a distributed trust data storage system may include: receiving, at a node comprising a distributed trust data storage system, a request from a client to execute a first transaction modifying at least a portion of a first data partition, replicas of the first data partition being stored at a plurality of nodes comprising the distributed trust data storage system; responding to the request by at least sending, to the plurality of nodes storing replicas of the first data partition, the first transaction to modify at least the portion of the first data partition; and committing the first transaction based at least on a threshold quantity of the plurality of nodes reaching a consensus by determining a same cryptographic hash for the first transaction, the first transaction being committed by at least sending, to the plurality of nodes, an indication to add a first entry corresponding to the first transaction to a transaction log recording one or more transactions executed on at least the portion of the first data partition.
- the cryptographic hash of the first transaction may be determined based at least on the first transaction.
- the cryptographic hash of the first transaction may be further determined based on a cryptographic hash of at least a second transaction.
- the second transaction may be a previous transaction modifying a same data record from the first data partition as the first transaction.
- the entry corresponding to the first transaction may include a globally unique identifier associated with the first transaction and the cryptographic hash of at least the second transaction.
- the transaction log may further include a second entry corresponding to the second transaction.
- the second entry may include a globally unique identifier associated with the second transaction and a cryptographic hash of at least a third transaction preceding the second transaction and modifying a same data record as the second transaction.
- the first transaction may further modify at least a portion of a second data partition.
- the first transaction may be further committed by at least adding the first entry to another transaction log recording one or more transactions executed on at least the portion of the second data partition.
- At least one of the plurality of nodes may store a replica of the first data partition and a replica of the second data partition. None of the plurality of nodes may store more than one replica of a same data partition.
- the transaction log and the other transaction log may be interlinked by a mesh of transactions including the first transaction and a second transaction that modify a same data record from the first data partition and/or the second data partition as the first transaction.
- The method may further include: responding to another request from the client to read the first data partition by querying at least a portion of the plurality of nodes storing the first data partition, the portion of the plurality of nodes being queried to at least retrieve one or more entries from the transaction log recording the one or more transactions executed on at least the portion of the first data partition; detecting a forged transaction based at least on a mismatch between a first entry retrieved from a first node storing a first replica of the first data partition and a second entry received from a second node storing a second replica of the first data partition; in response to detecting the forged transaction, querying a remaining portion of the plurality of nodes storing the first data partition to at least retrieve the one or more entries from the transaction log recording the one or more transactions executed on at least the portion of the first data partition; and in response to a match in the one or more entries of the transaction log retrieved from a quorum of the remaining portion of the plurality of nodes, responding to the other request based at least on the one or more entries of the transaction log retrieved from the remaining portion of the plurality of nodes.
- the one or more entries retrieved from the plurality of nodes may include the first entry corresponding to the first transaction and a third entry corresponding to a second transaction preceding the first transaction.
- The forged transaction may be further detected based at least on a mismatch in a hash value of the second transaction stored in a key-value pair associated with the first transaction and an actual hash of the second transaction.
- a key of the key-value pair may include the globally unique identifier of the second transaction.
- a value of the key-value pair may include a cryptographic hash of at least a portion of the second entry corresponding to the second transaction.
- a computer program product for a distributed trust data storage system.
- the computer program product may include a non-transitory computer readable medium storing instructions that cause operations when executed by at least one data processor.
- The operations may include: receiving, at a node comprising a distributed trust data storage system, a request from a client to execute a first transaction modifying at least a portion of a first data partition, replicas of the first data partition being stored at a plurality of nodes comprising the distributed trust data storage system; responding to the request by at least sending, to the plurality of nodes storing replicas of the first data partition, the first transaction to modify at least the portion of the first data partition; and committing the first transaction based at least on a threshold quantity of the plurality of nodes reaching a consensus by determining a same cryptographic hash for the first transaction, the first transaction being committed by at least sending, to the plurality of nodes, an indication to add a first entry corresponding to the first transaction to a transaction log recording one or more transactions executed on at least the portion of the first data partition.
- Implementations of the current subject matter can include, but are not limited to, methods consistent with the descriptions provided herein as well as articles that comprise a tangibly embodied machine-readable medium operable to cause one or more machines (e.g., computers, etc.) to result in operations implementing one or more of the described features.
- computer systems are also described that may include one or more processors and one or more memories coupled to the one or more processors.
- A memory, which can include a non-transitory computer-readable or machine-readable storage medium, may include, encode, store, or the like one or more programs that cause one or more processors to perform one or more of the operations described herein.
- Computer implemented methods consistent with one or more implementations of the current subject matter can be implemented by one or more data processors residing in a single computing system or multiple computing systems. Such multiple computing systems can be connected and can exchange data and/or commands or other instructions or the like via one or more connections, including, for example, to a connection over a network (e.g. the Internet, a wireless wide area network, a local area network, a wide area network, a wired network, or the like), via a direct connection between one or more of the multiple computing systems, etc.
- FIG. 1 depicts a system diagram illustrating a distributed data storage system consistent with some implementations of the current subject matter.
- FIG. 2 depicts replicas of a data partition being stored in a portion of a distributed data storage system consistent with some implementations of the current subject matter.
- FIG. 3 depicts a portion of a mesh of transactions interlinking multiple transaction logs consistent with some implementations of the current subject matter.
- FIG. 4A depicts a flowchart illustrating a process for performing an operation to modify a data partition stored in a distributed data storage system consistent with some implementations of the current subject matter.
- FIG. 4B depicts a flowchart illustrating a process for performing an operation to read a data partition stored in a distributed data storage system consistent with some implementations of the current subject matter.
- FIG. 5 depicts a block diagram illustrating a computing system consistent with some implementations of the current subject matter.
- A distributed trust data storage system can store data across a plurality of distributed data storage systems. For example, a data container holding an organized collection of data objects can be divided into one or more data partitions. Furthermore, each data partition can be replicated multiple times and a replica of each data partition can be stored at a distributed data storage system having one or more individual computing nodes. Consistency across the replicas of a single data partition can be maintained based on a state machine at each distributed data storage system. Each state machine may operate in accordance with a consensus protocol. For instance, a client of the distributed trust data storage system can request to execute a transaction, which may require the performance of one or more operations causing changes to data records held in one or more different data partitions.
- These changes to the one or more data partitions can be propagated to replicas of the same data partitions stored at various computing nodes within different distributed data storage systems across the distributed trust data storage system.
- the state machine at each of the affected distributed data storage systems can undergo one or more corresponding state transitions if the changes to the data partitions are successfully propagated to a threshold quantity of computing nodes at each distributed data storage system storing replicas of the data partitions.
- each replica of a data partition can include one or more snapshots of a state machine.
- a snapshot of the state machine can be created at a checkpoint and can therefore capture a state of the data partition at that checkpoint.
- each replica of the data partition can include a replica of a transaction log for recording transactions executed on data records in the data partition subsequent to the checkpoint.
- A transaction may include one or more operations, each of which is performed on a different data record from the same data partition and/or a different data partition. That is, a single transaction can affect multiple data partitions, the replicas of which are stored across a plurality of distributed data storage systems.
- a transaction affecting multiple data partitions can be added as a log entry to the transaction logs of these data partitions if the transaction is confirmed by a threshold quantity of the distributed data storage systems storing replicas of every data partition affected by the transaction.
- a log entry in the transaction log of one data partition can store the hashes of every previous transaction that affected the same data records, including data records from other data partitions, thereby creating an interlinked and immutable mesh of transactions in the transaction logs stored across the distributed trust data storage system.
- the sequence of transactions stored in every transaction log across all data partitions can be immune to modifications, as the transaction logs can be built as a cryptographically secured mesh of cross-linked transactions.
- this sequence of transactions may be immutable over all data and is linked to form a mesh of transactions that is resistant to malicious attacks.
- the transactions recorded in the transaction logs can be replayed on a snapshot of the corresponding state machine created prior to the crash to restore the replica of the data partition to the state prior to the crash.
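- As an illustrative sketch only (not part of the patent disclosure), the replay-based recovery described above can be modeled as applying each logged transaction, in order, to the last snapshot of the state machine; the dictionary-based snapshot and the `replay` helper below are assumed, simplified stand-ins.

```python
# Minimal sketch: restoring a data-partition replica after a crash by replaying
# logged transactions on top of the last snapshot. The types and names here
# (dict-based snapshot, `replay`) are illustrative assumptions.
from typing import Dict, List, Tuple

# A logged transaction is assumed to be a list of (key, new_value) writes.
Transaction = List[Tuple[str, str]]


def replay(snapshot: Dict[str, str], log: List[Transaction]) -> Dict[str, str]:
    """Apply every transaction recorded after the checkpoint to the snapshot."""
    state = dict(snapshot)              # start from the checkpointed state
    for transaction in log:             # log entries are applied in order
        for key, value in transaction:  # apply each write in the transaction
            state[key] = value
    return state


# Example: snapshot taken at a checkpoint, two transactions logged afterwards.
snapshot = {"record-1": "v0"}
log = [[("record-1", "v1")], [("record-2", "v2"), ("record-1", "v3")]]
assert replay(snapshot, log) == {"record-1": "v3", "record-2": "v2"}
```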
- replicas of a data partition can be represented as collections of key-value pairs held in key-value stores at various computing nodes in various distributed data storage systems across the distributed trust data storage system.
- each individual data record in a data partition can be associated with a key in a key-value pair.
- The execution of a transaction, which can include performing one or more corresponding operations on data records from at least one data partition, can modify the values in the key-value pairs representative of the data records.
- a log entry can be generated in response to the execution of a transaction which, as noted, can affect multiple data partitions by at least modifying data records from different data partitions.
- the log entry can include a cryptographic hash determined based on the current transaction that is being executed as well as one or more previous transactions affecting the same data records. Because a single transaction can modify data records from multiple data partitions, a corresponding log entry can be added to the transaction logs of multiple data partitions, thereby interlinking the transaction logs of these data partitions.
- the log entry corresponding to a transaction can be added to the transaction logs held at the distributed data storage systems storing replicas of every data partition affected by the transaction if a threshold quantity of these distributed data storage systems are able to reach a consensus by at least determining a same cryptographic hash.
- The state of an individual data partition can be tracked by multiple distributed data storage systems, including those storing replicas of different data partitions. For example, a chain of transactions affecting a data partition (e.g., by modifying data records within that data partition) can be determined by at least traversing the transaction logs at every distributed data storage system within the distributed trust data storage system.
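- The traversal described above can be sketched, under simplifying assumptions, as a backward walk over log entries that each reference the globally unique identifiers of the previous transactions touching the same records; the `LogEntry` model and `history` helper below are illustrative, not the patent's own data layout.

```python
# Minimal sketch: reconstructing the chain of transactions that affected a data
# record by walking backwards through interlinked log entries.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LogEntry:
    guid: str
    transaction: str
    previous_guids: List[str] = field(default_factory=list)


def history(entries_by_guid: Dict[str, LogEntry], start_guid: str) -> List[str]:
    """Return the GUIDs of all predecessor transactions reachable from start_guid."""
    seen, stack, chain = set(), [start_guid], []
    while stack:
        guid = stack.pop()
        if guid in seen or guid not in entries_by_guid:
            continue
        seen.add(guid)
        chain.append(guid)
        stack.extend(entries_by_guid[guid].previous_guids)
    return chain


entries = {
    "t3": LogEntry("t3", "update x", ["t2"]),
    "t2": LogEntry("t2", "update x and y", ["t1"]),  # spans two partitions
    "t1": LogEntry("t1", "insert x", []),
}
assert history(entries, "t3") == ["t3", "t2", "t1"]
```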
- FIG. 1 depicts a system diagram illustrating a distributed trust data storage system 100 consistent with implementations of the current subject matter as present at each participant of the distributed trust.
- the distributed trust data storage system 100 can include a plurality of distributed data storage systems including, for example, a first distributed data storage system 110 A, a second distributed data storage system 110 B, a third distributed data storage system 110 C, and/or the like.
- the distributed trust data storage system 100 can be communicatively coupled, via a network 140 , with one or more clients including, for example, a client 130 .
- the network 140 can be any wired and/or wireless network including, for example, a public land mobile network (PLMN), a local area network (LAN), a virtual local area network (VLAN), a wide area network (WAN), the Internet, and/or the like.
- the client 130 can be any processor-based device including, for example, a mobile device, a wearable device, a tablet computer, a desktop computer, a laptop computer, and/or the like.
- The distributed trust data storage system 100 can be configured to store one or more data containers, each of which holds an organized collection of data objects.
- a data container can be divided into one or more data partitions, for example, based on a partition key that includes a placement prefix and a cryptographic hash of a data key.
- data from the data container can be placed across the distributed trust data storage system 100 in a pseudo-random manner.
- each data partition can be replicated multiple times and a replica of each data partition can be stored at multiple computing nodes within an individual distributed data storage system, such as, for example, the first distributed data storage system 110 A, the second distributed data storage system 110 B, and/or the third distributed data storage system 110 C.
- a data container 180 can be divided into a plurality of data partitions including, for example, a first data partition 170 A, a second data partition 170 B, and/or the like.
- the first data partition 170 A and/or the second data partition 170 B can each be replicated a k quantity of times and each of the k quantity of replicas can be stored at one of a k quantity of distributed data storage systems within the distributed trust data storage system 100 .
- the first distributed data storage system 110 A and/or the second distributed data storage system 110 B can each store a single replica of the first data partition 170 A while the third distributed data storage system 110 C can store at least one replica of the second data partition 170 B.
- Although FIG. 1 shows the distributed trust data storage system 100 as including multiple distributed data storage systems that each include one or more clusters of computing nodes, it should be appreciated that the distributed trust data storage system 100 can also be formed from individual computing nodes. Furthermore, it should be appreciated that a single entity within the distributed trust data storage system 100 (e.g., the first distributed data storage system 110 A, the second distributed data storage system 110 B, the third distributed data storage system 110 C, and/or the like) can store no more than a single replica of a data partition. Accordingly, each of the k quantity of replicas of the first data partition 170 A and/or the second partition 170 B can be stored at and/or managed by a different entity, thereby preventing a single entity from assuming control over an entire data partition.
- the placement of data records across different data partitions can be performed based on the partition key assigned to each data record.
- The partition key associated with a data record can be formed by concatenating a placement prefix and a cryptographic hash of the data key associated with the data record. While the placement prefix portion of the partition key can ensure that the data record is placed at a specific location (e.g., a distributed data storage system located in a certain jurisdiction), the cryptographic hash portion of the partition key can preserve the pseudo-random nature of the placement.
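- A minimal sketch of this partition-key construction, assuming SHA-256 as the cryptographic hash and a "/" separator (both illustrative choices not specified above):

```python
# Minimal sketch: partition key = placement prefix + cryptographic hash of the
# data key. The hash function, separator, and modulo-based partition choice are
# illustrative assumptions.
import hashlib


def partition_key(placement_prefix: str, data_key: str) -> str:
    # Concatenate the placement prefix with a cryptographic hash of the data key.
    digest = hashlib.sha256(data_key.encode("utf-8")).hexdigest()
    return f"{placement_prefix}/{digest}"


def choose_partition(p_key: str, partition_count: int) -> int:
    # The hash portion spreads records pseudo-randomly across the partitions,
    # while the prefix can still pin records to a specific location.
    hash_part = p_key.split("/", 1)[1]
    return int(hash_part, 16) % partition_count


key = partition_key("eu-datacenter", "customer/42")
print(key, "-> partition", choose_partition(key, 8))
```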
- each replica of the first data partition 170 A can include one or more snapshots of a state machine configured to track changes to the first data partition 170 A.
- each replica of the second data partition 170 B can also include one or more snapshots of a state machine configured to track changes to the second data partition 170 B.
- a snapshot of the corresponding state machine can be created at a checkpoint to capture a state of the first data partition 170 A and/or the second data partition 170 B at that checkpoint.
- each replica of the data partition can include one or more transaction logs for recording operations performed on the first data partition 170 A and/or the second data partition 170 B subsequent to the checkpoint.
- Each replica of a data partition can be stored at a different distributed data storage system. Meanwhile, multiple replicas of the same data partition can be prevented from being stored at a single distributed data storage system. For example, as shown in FIGS. 1-2 , a first replica of the first data partition 170 A can be stored at the first distributed data storage system 110 A, a second replica of the first data partition 170 A can be stored at the second distributed data storage system 110 B, and a third replica of the first data partition 170 A can be stored at the third distributed data storage system 110 C.
- Replicas of a first transaction log 115 A, which records transactions affecting at least the first data partition 170 A, can be stored across the first distributed data storage system 110 A, the second distributed data storage system 110 B, and the third distributed data storage system 110 C, for example, at different data stores within the first distributed data storage system 110 A, the second distributed data storage system 110 B, and the third distributed data storage system 110 C.
- a single replica of the second data partition 170 B can also be stored at the first distributed data storage system 110 A, the second distributed data storage system 110 B, and/or the third distributed data storage system 110 C.
- FIGS. 1-2 show replicas of a second transaction log 115 B recording transactions that affect at least the second data partition 170 B as being stored at different data stores across the first distributed data storage system 110 A, the second distributed data storage system 110 B, and the third distributed data storage system 110 C.
- data records from the first data partition 170 A and/or the second data partition 170 B can be stored as key-value pairs in a key-value store.
- the first distributed data storage system 110 A, the second distributed data storage system 110 B, and/or the third distributed data storage system 110 C can each include multiple data stores. These data stores can each be a key-value store configured to store data records in the form of one or more key-value pairs (KVPs).
- the data stores forming the first distributed data storage system 110 A, the second distributed data storage system 110 B, and/or the third distributed data storage system 110 C can each be a hybrid key-value store in which data records that do not exceed a threshold size (e.g., 2 kilobytes and/or a different size) are stored in an in-memory key-value store and data records that do exceed the threshold size (e.g., 2 kilobytes and/or a different size) are stored in a secondary data store.
- an in-memory key-value store can be implemented using any type of persistence that supports low latency access including, for example, random access memory (RAM) and/or the like.
- the secondary data store can be implemented using any type of persistence that supports high capacity storage including, for example, hard disk and/or the like.
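- A minimal sketch of such a hybrid key-value store, assuming a 2-kilobyte threshold and plain in-process dictionaries standing in for the in-memory store and the secondary store:

```python
# Minimal sketch: records at or below a size threshold go to an in-memory
# store, larger records go to a secondary store. Both stores are plain dicts
# here purely for illustration.
class HybridKeyValueStore:
    def __init__(self, threshold_bytes: int = 2048):
        self.threshold_bytes = threshold_bytes
        self.in_memory = {}   # stands in for a low-latency RAM-backed store
        self.secondary = {}   # stands in for a high-capacity disk-backed store

    def put(self, key: str, value: bytes) -> None:
        target = self.in_memory if len(value) <= self.threshold_bytes else self.secondary
        other = self.secondary if target is self.in_memory else self.in_memory
        other.pop(key, None)          # keep exactly one copy of each record
        target[key] = value

    def get(self, key: str) -> bytes:
        if key in self.in_memory:
            return self.in_memory[key]
        return self.secondary[key]


store = HybridKeyValueStore()
store.put("small", b"x" * 100)        # stored in memory
store.put("large", b"x" * 10_000)     # stored in the secondary store
assert "small" in store.in_memory and "large" in store.secondary
```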
- the first transaction log 115 A can record transactions affecting at least the first data partition 170 A, for example, by modifying data records from the first data partition 170 A.
- the second transaction log 115 B can record transactions affecting at least the second data partition 170 B, for example, by modifying data records from the second data partition 170 B.
- The first transaction log 115 A and/or the second transaction log 115 B can each be stored as a page list in the data stores within the first distributed data storage system 110 A, the second distributed data storage system 110 B, and/or the third distributed data storage system 110 C. It should be appreciated that the first data partition 170 A and/or the second data partition 170 B can be associated with additional transaction logs.
- the first transaction log 115 A and/or the second transaction log 115 B can record transactions affecting only a portion of the corresponding data partition.
- the transaction logs recording transactions performed on other portions of the first data partition 170 A and/or the second data partition 170 B can be stored as different page lists in the first distributed data storage system 110 A, the second distributed data storage system 110 B, and/or the third distributed data storage system 110 C.
- the first transaction log 115 A can store an immutable sequence of transactions that affect at least the first data partition 170 A, for example, by modifying data records from the first data partition 170 A.
- the second transaction log 115 B can store an immutable sequence of transactions that affect at least the second data partition 170 B, for example, by modifying data records from the second data partition 170 B.
- A single transaction can include multiple operations, each of which modifies a data record from a same data partition and/or a different data partition.
- the distributed trust data storage system 100 can receive, from the client 130 , a request to execute a transaction that includes a first operation that modifies a first data record from the first data partition 170 A and a second operation that modifies a second data record from the second data partition 170 B. Accordingly, that transaction can be recorded as part of a log entry in the first transaction log 115 A and the second transaction log 115 B.
- the first transaction log 115 A and the second transaction log 115 B can each include one or more log entries.
- Each log entry can include a cryptographic hash determined based on a current transaction as well as one or more previous transactions that modify the same data records.
- a log entry can be added to the first transaction log 115 A and/or the second transaction log 115 B if a threshold quantity of the distributed data storage systems storing replicas of the data partitions affected by the corresponding transaction are able to reach a consensus, for example, by determining a same cryptographic hash for the transaction.
- For example, suppose that the distributed trust data storage system 100 is responding to a request to execute a transaction that affects the first data partition 170 A and the second data partition 170 B.
- A log entry corresponding to that transaction can be added to the first transaction log 115 A and the second transaction log 115 B if at least a k/2 quantity (or a different threshold quantity) of the distributed data storage systems storing replicas of the first data partition 170 A and the second data partition 170 B (e.g., the first distributed data storage system 110 A, the second distributed data storage system 110 B, the third distributed data storage system 110 C, and/or the like) determine a same cryptographic hash for that transaction.
- FIG. 3 depicts a portion of a mesh of transactions 300 interlinking multiple transaction logs consistent with some implementations of the current subject matter.
- the first transaction log 115 A can record transactions affecting at least the first data partition 170 A while the second transaction log 115 B can record transactions affecting at least the second data partition 170 B. Since a single transaction can affect multiple data partitions, the respective transaction logs for these data partitions can be interlinked by the corresponding log entries. Such transactions can form a mesh of transactions that interlink multiple transaction logs.
- a transaction can modify data records from the first data partition 170 A and the second data partition 170 B.
- the first transaction log 115 A and the second transaction log 115 B can at least be interlinked by the corresponding log entry of the transaction affecting both the first data partition 170 A and the second data partition 170 B.
- The basis for the mesh of transactions 300 can include a plurality of log entries including, for example, a first log entry 310 A, a second log entry 310 B, and/or a third log entry 310 C.
- Each of the first log entry 310 A, the second log entry 310 B, and the third log entry 310 C can correspond to a transaction.
- For example, the first log entry 310 A can correspond to a first transaction 322 A, the second log entry 310 B can correspond to a second transaction 322 B, and the third log entry 310 C can correspond to a third transaction 322 C.
- Each of the first transaction 322 A, the second transaction 322 B, and/or the third transaction 322 C can affect at least one data partition, for example, by modifying data records from that data partition.
- the transaction logs of the data partitions affected by the first transaction 322 A, the second transaction 322 B, and/or the third transaction 322 C can include log entries corresponding to the first transaction 322 A, the second transaction 322 B, and/or the third transaction 322 C.
- the first transaction 322 A, the second transaction 322 B, and/or the third transaction 322 C can each modify data records from one or more data partitions including, for example, the first data partition 170 A, the second data partition 170 B, and/or the like.
- the first transaction log 115 A and the second transaction log 115 B can each include the first log entry 310 A corresponding to the first transaction 322 A, the second log entry 310 B corresponding to the second transaction 322 B, and/or the third log entry 310 C corresponding to the third transaction 322 C.
- each transaction that is executed at the distributed trust data storage system 100 can be associated with a globally unique identifier (GUID) that enables a differentiation between different transactions.
- For example, the first transaction 322 A can be associated with a first globally unique identifier 325 A, the second transaction 322 B can be associated with a second globally unique identifier 325 B, and the third transaction 322 C can be associated with a third globally unique identifier 325 C.
- each transaction can have a payload that includes, for example, the transaction itself and one or more key-value pairs identifying previous transactions modifying the same data records.
- FIG. 3 shows the first transaction 322 A having a first payload 320 A, the second transaction 322 B having a second payload 320 B, and/or the third transaction 322 C having a third payload 320 C.
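- The log-entry layout described above (a globally unique identifier plus a payload carrying the transaction itself and key-value pairs referencing previous transactions) can be sketched as follows; the field names are illustrative assumptions, not the patent's own data format.

```python
# Minimal sketch of a log entry: a GUID plus a payload holding the transaction
# itself and key-value pairs that map the GUIDs of previous transactions on the
# same records to the cryptographic hashes of their log entries.
import uuid
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LogEntry:
    guid: str
    transaction: str                       # the transaction payload itself
    previous_hashes: Dict[str, str] = field(default_factory=dict)
    # key: GUID of a previous transaction on the same data records
    # value: cryptographic hash of that transaction's log entry


entry = LogEntry(
    guid=str(uuid.uuid4()),
    transaction="UPDATE record-1 SET value = 'v1'",
    previous_hashes={"guid-of-previous-tx": "hash-of-previous-log-entry"},
)
print(entry.guid, len(entry.previous_hashes), "linked predecessor(s)")
```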
- replicas of the first data partition 170 A can be stored at each of a k quantity of distributed data storage systems including, for example, the first distributed data storage system 110 A and the second distributed data storage system 110 B.
- replicas of the second data partition 170 B can be stored at another k quantity of distributed data storage systems including, for example, the third distributed data storage system 110 C.
- the first log entry 310 A corresponding to the first transaction 322 A, the second log entry 310 B corresponding to the second transaction 322 B, and/or the third log entry 310 C corresponding to the third transaction 322 C can each be propagated, in accordance with a consensus protocol, to each of the distributed data storage systems storing replicas of the data partitions (e.g., the first data partition 170 A, the second data partition 170 B, and/or the like) affected by the first transaction 322 A, the second transaction 322 B, and/or the third transaction 322 C.
- a node of the first distributed data storage system 110 A can act as a leader node in the consensus protocol while a node of the second distributed data storage system 110 B and/or the third distributed data storage system 110 C can each act as a follower node in the consensus protocol.
- the first distributed data storage system 110 A can receive, from the client 130 , a request to execute the first transaction 322 A, the second transaction 322 B, and/or the third transaction 322 C.
- the first distributed data storage system 110 A can determine whether the first transaction 322 A, the second transaction 322 B, and/or the third transaction 322 C should be committed, for example, by being added to the first transaction log 115 A and/or the second transaction log 115 B stored at each of the distributed data storage systems storing replicas of the first data partition 170 A and/or the second data partition 170 B including, for example, the first distributed data storage system 110 A, the second distributed data storage system 110 B, and/or the third distributed data storage system 110 C.
- The first distributed data storage system 110 A can determine that the first transaction 322 A, the second transaction 322 B, and/or the third transaction 322 C should be committed if a threshold quantity (e.g., k/2 or a different quantity) of the k quantity of distributed data storage systems storing replicas of the data partitions affected by the transactions can determine a same cryptographic hash for each operation.
- a transaction that affects multiple data partitions can appear in the transaction logs of multiple data partitions.
- the log entry for the transaction can further include a cryptographic hash of previous transactions that affect the same data records.
- the first transaction 322 A, the second transaction 322 B, and/or the third transaction 322 C can modify data records from at least the first data partition 170 A and the second data partition 170 B.
- the first log entry 310 A corresponding to the first transaction 322 A, the second log entry 310 B corresponding to the second transaction 322 B, and/or the third log entry 310 C corresponding to the third transaction 322 C can appear in the first transaction log 115 A associated with the first data partition 170 A as well as the second transaction log 115 B associated with the second data partition 170 B, thereby interlinking the first transaction log 115 A and the second transaction log 115 B.
- the first transaction log 115 A and the second transaction log 115 B can be further interlinked by the cryptographic hashes of previous transactions that are included in each of the first log entry 310 A, the second log entry 310 B, and/or the third log entry 310 C because the first transaction log 115 A and/or the second transaction log 115 B can further include log entries corresponding to these previous transactions.
- modifying any individual transaction log (e.g., the first transaction log 115 A, the second transaction log 115 B, and/or the like) by changing any one of the first log entry 310 A, the second log entry 310 B, and/or the third log entry 310 C can require a computation of the cryptographic hashes of all subsequent transactions recorded in the mesh of transactions 300 , which may interlink the first transaction log 115 A and the second transaction log 115 B. This computation can require a prohibitively large quantity of computational resources.
- the log entries included in the first transaction log 115 A and/or the second transaction log 115 B including, for example, the first log entry 310 A, the second log entry 310 B, and/or the third log entry 310 C can be immune to modification including, for example, via malicious attacks.
- The payload of a transaction modifying data records from at least one data partition can include one or more key-value pairs.
- the key of each key-value pair can correspond to a globally unique identifier (GUID) of a previous transaction modifying the same data records while the value of each key-value pair can correspond to a cryptographic hash of a log entry for the transaction associated with that globally unique identifier (GUID).
- the first payload 320 A of the first log entry 310 A can include first key-value pairs 324 A.
- the keys of the first key-value pairs 324 A can correspond to the second globally unique identifier 325 B of the second transaction 322 B corresponding to the second log entry 310 B and/or the third globally unique identifier 325 C of the third transaction 322 C corresponding to the third log entry 310 C.
- the values of the first key-value pairs 324 A can correspond to cryptographic hashes of the second log entry 310 B for the second transaction 322 B associated with the second globally unique identifier 325 B and/or the third log entry 310 C for the third transaction 322 C associated with the third globally unique identifier 325 C.
- A cryptographic hash for the first transaction 322 A can be determined based on the first transaction 322 A as well as a cryptographic hash of one or more previous transactions modifying the same data records as the first transaction 322 A including, for example, the second transaction 322 B corresponding to the second log entry 310 B, the third transaction 322 C corresponding to the third log entry 310 C, and/or the like. It should be appreciated that any hash function can be used to generate the cryptographic hash of the first transaction 322 A and/or the cryptographic hash of the previous transactions including, for example, MD5, SHA1, SHA2, and/or the like.
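- A minimal sketch of this hash chaining, assuming SHA-256 and a deterministic ordering of the predecessor hashes (the description above only requires that every replica compute the hash the same way):

```python
# Minimal sketch: the hash of the current transaction is derived from the
# transaction itself plus the hashes of the previous transactions that modified
# the same data records.
import hashlib
from typing import Iterable


def transaction_hash(transaction: str, previous_hashes: Iterable[str]) -> str:
    hasher = hashlib.sha256()
    hasher.update(transaction.encode("utf-8"))
    for prev in sorted(previous_hashes):     # deterministic order across replicas
        hasher.update(bytes.fromhex(prev))   # chain in the predecessors' hashes
    return hasher.hexdigest()


h1 = transaction_hash("insert record-1 = v0", [])
h2 = transaction_hash("update record-1 = v1", [h1])
h3 = transaction_hash("update record-1 = v2", [h2])
# Tampering with an earlier transaction changes every later hash in the chain.
assert transaction_hash("update record-1 = v1 (forged)", [h1]) != h2
print(h3)
```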
- The first log entry 310 A corresponding to the first transaction 322 A can be added to the first transaction log 115 A and/or the second transaction log 115 B if a threshold quantity (e.g., k/2 or a different quantity) of the distributed data storage systems storing replicas of the data partitions affected by the first transaction 322 A are able to determine a same cryptographic hash for the first transaction 322 A.
- A node of the first distributed data storage system 110 A, in its capacity as the leader node in the consensus protocol, can distribute the first log entry 310 A corresponding to the first transaction 322 A to follower nodes participating in the consensus protocol including, for example, a node of the second distributed data storage system 110 B and/or the third distributed data storage system 110 C.
- the first log entry 310 A can include the first globally unique identifier (GUID) 325 A of the first transaction 322 A as well as the first payload 320 A.
- the first payload 320 A can include the first transaction 322 A itself and one or more key-value pairs 324 A identifying previous transactions that modified the same data records as the first transaction 322 A (e.g., the second transaction 322 B, the third transaction 322 C, and/or the like).
- Each follower node (e.g., a node of the second distributed data storage system 110 B, the third distributed data storage system 110 C, and/or the like) can compute a cryptographic hash of one or more previous transactions modifying the same data records as the first transaction 322 A.
- the one or more previous transactions can be determined independently by each of the follower nodes, for example, by traversing the transaction logs (e.g., the first transaction log 115 A, the second transaction log 115 B, and/or the like) stored at each distributed data storage system within the distributed trust data storage system 100 because, as noted, the transaction logs can be interlinked by the mesh of transactions 300 stored across the transaction logs.
- The cryptographic hashes determined by the follower nodes can be returned to a node of the first distributed data storage system 110 A which, in its capacity as the leader node, can determine whether a threshold quantity (e.g., k/2 or a different quantity) of the follower nodes (e.g., one or more nodes of the second distributed data storage system 110 B, the third distributed data storage system 110 C, and/or the like) are able to reach a consensus by at least determining a same cryptographic hash.
- If a consensus was reached by a threshold quantity (e.g., k/2 or a different quantity) of the follower nodes, the first distributed data storage system 110 A can commit the first transaction 322 A by at least adding the corresponding first log entry 310 A to the replica of the first transaction log 115 A and/or the second transaction log 115 B stored at the first distributed data storage system 110 A. Furthermore, the first distributed data storage system 110 A can commit the first transaction 322 A by at least sending, to the second distributed data storage system 110 B and/or the third distributed data storage system 110 C, an indication to add the first log entry 310 A to the replicas of the first transaction log 115 A and/or the second transaction log 115 B stored at the second distributed data storage system 110 B and/or the third distributed data storage system 110 C.
- a cryptographic hash of the second transaction 322 B can be determined based on the second transaction 322 B as well as a cryptographic hash of one or more previous transactions modifying the same data records as the second transaction 322 B including, for example, the third transaction 322 C.
- The second log entry 310 B can be added to the replica of the first transaction log 115 A and/or the second transaction log 115 B if the first distributed data storage system 110 A, in its capacity as the leader node of the consensus protocol, determines that a threshold quantity (e.g., k/2 or a different quantity) of follower nodes storing the data partitions affected by the second transaction 322 B are able to reach a consensus by at least determining a same cryptographic hash for the second transaction 322 B.
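- The leader-side commit decision described above can be sketched as follows; the in-process follower callbacks and the strict more-than-k/2 majority rule are simplifying assumptions standing in for remote distributed data storage systems and the consensus protocol.

```python
# Minimal sketch: the leader collects the cryptographic hashes computed
# independently by the follower nodes and commits the transaction only if more
# than half of the k replicas agree on the leader's own hash.
import hashlib
from collections import Counter
from typing import Callable, List


def replica_hash(transaction: str, previous_hashes: List[str]) -> str:
    hasher = hashlib.sha256(transaction.encode("utf-8"))
    for prev in sorted(previous_hashes):
        hasher.update(prev.encode("utf-8"))
    return hasher.hexdigest()


def leader_commit(transaction: str, previous_hashes: List[str],
                  followers: List[Callable[[str], str]]) -> bool:
    """Commit only if a threshold quantity of replicas computed the same hash."""
    expected = replica_hash(transaction, previous_hashes)   # leader's own hash
    votes = Counter(compute(transaction) for compute in followers)
    k = len(followers)
    return votes[expected] > k // 2       # more than k/2 matching hashes


def honest(tx: str) -> str:
    return replica_hash(tx, ["hash-of-previous-entry"])


def tampered(tx: str) -> str:             # a forged replica computes a different hash
    return replica_hash(tx + " (forged)", ["hash-of-previous-entry"])


assert leader_commit("update record-1", ["hash-of-previous-entry"],
                     [honest, honest, tampered]) is True
assert leader_commit("update record-1", ["hash-of-previous-entry"],
                     [honest, tampered, tampered]) is False
```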
- the client 130 can request to perform an operation to read data stored in the distributed trust data storage system 100 including, for example, data records that are part of the first data partition 170 A and/or the second data partition 170 B.
- the operation to read data from the first data partition 170 A and/or the second data partition 170 B can be performed by at least determining a current state of at least a portion of the first data partition 170 A and/or the second data partition 170 B as well as a history of changes to those portions of the first data partition 170 A and/or the second data partition 170 B.
- the current state of the first data partition 170 A and/or the second data partition 170 B as well as the history of changes to the portion of the first data partition 170 A and/or the second data partition 170 B can be determined by querying the first transaction log 115 A and/or the second transaction log 115 B.
- Replicas of the first transaction log 115 A and/or the second transaction log 115 B can be stored across the distributed trust data storage system 100, including, for example, at the first distributed data storage system 110 A, the second distributed data storage system 110 B, the third distributed data storage system 110 C, and/or the like.
- the first transaction log 115 A and the second transaction log 115 B can be interlinked by at least storing a mesh of transactions (e.g., the mesh of transactions 300 ).
- Querying the first transaction log 115 A and the second transaction log 115 B can include traversing the first transaction log 115 A and the second transaction log 115 B in order to retrieve a sequence of transactions corresponding to the first log entry 310 A, the second log entry 310 B, and/or the third log entry 310 C.
- The operation to read data from the first data partition 170 A and/or the second data partition 170 B can be performed by querying one or more of the distributed data storage systems in the distributed trust data storage system 100 storing the replicas of the first data partition 170 A and/or the second data partition 170 B, including, for example, the replicas of the corresponding first transaction log 115 A and the second transaction log 115 B.
- the client 130 can query at least a k/2 quantity of the distributed data storage systems in order to retrieve, from each of the k/2 quantity of distributed data storage systems, one or more of the log entries (e.g., the first log entry 310 A, the second log entry 310 B, and/or the third log entry 310 C) from the replicas of the first transaction log 115 A and/or the second transaction log 115 B.
- the client 130 can verify the results received from each of the k/2 quantity of distributed data storage systems. Any discrepancy in the results received from the k/2 quantity of distributed data storage systems can indicate a presence of a forged transaction. These discrepancies can include a mismatch in the results received from two or more of the k/2 quantity of distributed data storage systems.
- the client 130 may query the remaining distributed data storage systems in the distributed trust data storage system 100 that are storing replicas of the first data partition 170 A and/or the second data partition 170 B. If a quorum can be found, for example, with more than half of the remaining distributed data storage systems agreeing on a same value, then this value can be used as the corrected result of the read operation. Thus, in order to successfully forge a transaction, an attacker must be able to propagate the forged transaction to at least a quorum of the distributed data storage systems storing replicas of the corresponding data partition.
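One possible client-side fallback consistent with this description is a simple majority vote over the remaining replicas; the helper below is a sketch under that assumption, with the quorum size supplied by the caller.

```python
from collections import Counter
from typing import List, Optional

def resolve_read(results: List[str], quorum: int) -> Optional[str]:
    """Return the value agreed upon by at least `quorum` replicas, or None if
    no quorum exists (in which case the read is rejected)."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count >= quorum else None

# Example: four of five remaining replicas agree, so their value is taken as
# the corrected result; a single forged replica cannot form a quorum.
corrected = resolve_read(["v1", "v1", "v1", "forged", "v1"], quorum=3)
```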
- the payload of a transaction included in each log entry can include one or more key-value pairs.
- the key in each of these key-value pairs can correspond to a globally unique identifier of a previous transaction while the value in each of these key-value pairs can correspond to cryptographic hashes of the log entries for the transactions associated with these globally unique identifiers.
- the first payload 320 A of the first transaction 322 A can include first key-value pairs 324 A.
- the keys of the first key-value pairs 324 A can correspond to the second globally unique identifier 325 B of the second transaction 322 B and/or the third globally unique identifier 325 C of the third transaction 322 C.
- the values of the first key-value pairs 324 A can correspond to cryptographic hashes of the second log entry 310 B for the second transaction 322 B associated with the second globally unique identifier 325 B and/or the third log entry 310 C for the third transaction 322 C associated with the third globally unique identifier 325 C.
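For illustration, the payload layout described above can be modeled as a small record type in which the keys of the key-value pairs name predecessor transactions by their globally unique identifiers and the values carry the cryptographic hashes of those predecessors' log entries. The class and field names below are assumptions, not terminology from the description.

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class LogEntry:
    """A log entry whose payload links it to earlier transactions."""
    transaction_id: str                 # globally unique identifier of the transaction
    data: dict                          # data records written by the transaction
    # Maps the globally unique identifier of a previous transaction to the
    # cryptographic hash of that transaction's log entry.
    predecessor_hashes: dict = field(default_factory=dict)

third = LogEntry(transaction_id=str(uuid.uuid4()), data={"r1": 1})
second = LogEntry(transaction_id=str(uuid.uuid4()), data={"r1": 2},
                  predecessor_hashes={third.transaction_id: "hash-of-third-entry"})
first = LogEntry(transaction_id=str(uuid.uuid4()), data={"r1": 3},
                 predecessor_hashes={second.transaction_id: "hash-of-second-entry",
                                     third.transaction_id: "hash-of-third-entry"})
```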
- the client 130 can verify the results received from the k/2 quantity of distributed data storage systems based on the first key-value pairs 324 A and/or the second key-value pairs 324 B.
- the client 130 can determine that the first transaction 322 A and/or the second transaction 322 B are forged based at least on a mismatch between the value stored in the first key-value pairs 324 A and a cryptographic hash of the second log entry 310 B for the second transaction 322 B associated with the second globally unique identifier 325 B.
- the client 130 can determine that the second transaction 322 B and/or the third transaction 322 C are forged based at least on a mismatch between the value of the second key-value pairs 324 B and a cryptographic hash of the third log entry 310 C for the third transaction 322 C associated with the third globally unique identifier 325 C.
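A client-side check along these lines might recompute the hash of each referenced predecessor entry and compare it with the hash stored in the key-value pairs; any mismatch flags a potentially forged transaction. The serialization and hash choices below are illustrative assumptions.

```python
import hashlib
import json

def entry_hash(entry: dict) -> str:
    """Recompute the cryptographic hash of a log entry (illustrative only)."""
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode("utf-8")).hexdigest()

def find_forged_links(entry: dict, log_entries: dict) -> list:
    """Return identifiers of predecessor transactions whose stored hash does
    not match a freshly computed hash of their log entry."""
    forged = []
    for prev_id, stored_hash in entry.get("predecessor_hashes", {}).items():
        prev_entry = log_entries.get(prev_id)
        if prev_entry is None or entry_hash(prev_entry) != stored_hash:
            forged.append(prev_id)
    return forged
```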
- FIG. 4A depicts a flowchart illustrating a process 400 for executing a transaction in a distributed trust data storage system consistent with some implementations of the current subject matter.
- the process 400 can be performed by the distributed trust data storage system 100 , for example, the first distributed data storage system 110 A (or another distributed data storage system) to modify at least a portion of the first data partition 170 A and/or the second data partition 170 B.
- replicas of the first data partition 170 A and/or the second data partition 170 B can be stored across the distributed trust data storage system 100 , for example, at a k quantity of distributed data storage systems including, for example, the first distributed data storage system 110 A, the second distributed data storage system 110 B, and/or the third distributed data storage system 110 C.
- a single distributed data storage system can store replicas of multiple data partitions
- multiple replicas of a single data partition, however, may not be stored at a single distributed data storage system.
- the process 400 can be performed in order to propagate, in accordance with a consensus protocol, the transaction to the k quantity of distributed data storage systems storing replicas of the first data partition 170 A and/or the second data partition 170 B affected by the transaction.
- the first distributed data storage system 110 A can receive, in its capacity as the leader node in the consensus protocol, a request from the client 130 to execute a transaction modifying at least a portion of the first data partition 170 A and/or the second data partition 170 B stored across the distributed trust data storage system 100 ( 402 ).
- a request from the client 130 to execute a transaction modifying at least a portion of first data partition 170 A and/or the second data partition 170 B can be routed to the first distributed data storage system 110 A (or another distributed data storage system) acting as the leader node in a consensus protocol.
- the first distributed data storage system 110 A can respond to the request by sending, to one or more follower nodes in the consensus protocol storing replicas of the first data partition 170 A and/or the second data partition 170 B, a log entry corresponding to the transaction to modify at least the portion of the first data partition 170 A and/or the second data partition 170 B ( 404 ).
- the first distributed data storage system 110 A can send the first log entry 310 A corresponding to the first transaction 322 A to one or more follower nodes participating in the consensus protocol including, for example, the second distributed data storage system 110 B and/or the third distributed data storage system 110 C.
- the first distributed data storage system 110 A can identify, based on a topology of the distributed trust data storage system 100 stored at a topology manager, the second distributed data storage system 110 B and/or the third distributed data storage system 110 C as storing replicas of the first data partition 170 A and/or the second data partition 170 B.
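The topology lookup mentioned here can be pictured as a mapping from partition identifiers to the systems holding replicas of each partition; the names below are purely illustrative placeholders.

```python
# Hypothetical topology: each data partition maps to the distributed data
# storage systems holding one of its replicas.
topology = {
    "partition_170A": ["system_110A", "system_110B", "system_110C"],
    "partition_170B": ["system_110A", "system_110B", "system_110C"],
}

def replica_holders(topology: dict, partition_id: str, exclude: str = "") -> list:
    """Return the systems storing replicas of the given partition, optionally
    excluding the leader node itself."""
    return [node for node in topology.get(partition_id, []) if node != exclude]

followers = replica_holders(topology, "partition_170A", exclude="system_110A")
```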
- the first distributed data storage system 110 A can determine whether a threshold quantity of the one or more follower nodes reached a consensus by at least determining the same cryptographic hash for the transaction ( 405 ).
- the cryptographic hash for a transaction can be determined based on the transaction as well as the cryptographic hashes of one or more previous transactions that modified the same data records. For instance, as shown in FIG. 3,
- the cryptographic hash for the first transaction 322 A can be determined based on the first transaction 322 A as well as the second log entry 310 B (e.g., the second payload 320 B) corresponding to the second transaction 322 B and/or the third log entry 310 C (e.g., the third payload 320 C) corresponding to the third transaction 322 C.
- the second log entry 310 B and/or the third log entry 310 C can be part of a mesh of transactions (e.g., the mesh of transactions 300 ) interlinking multiple transaction logs such as, for example, the first transaction log 115 A, the second transaction log 115 B, and/or the like.
- each of the second distributed data storage system 110 B and/or the third distributed data storage system 110 C can determine a cryptographic hash for the first transaction 322 A based on the first transaction 322 A and one or more previous transactions (e.g., the second transaction 322 B corresponding to the second log entry 310 B, the third transaction 322 C corresponding to the third log entry 310 C, and/or the like).
- the one or more previous transactions can be identified by at least traversing the mesh of transactions 300 interlinking the first transaction log 115 A, the second transaction log 115 B, and/or the like.
- the first transaction 322 A can be verified if a threshold quantity (e.g., k/2 or a different quantity) of the k quantity of distributed data storage systems storing replicas of the first data partition 170 A and/or the second data partition 170 B can reach a consensus by determining a same cryptographic hash for the first transaction 322 A.
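A minimal sketch of this threshold test, assuming the leader simply counts how many responses carry the same hash (with k/2 used as the example threshold, as in the text):

```python
from collections import Counter
from typing import List

def consensus_reached(follower_hashes: List[str], k: int, threshold: float = 0.5) -> bool:
    """Return True if at least k * threshold followers reported the same
    cryptographic hash for the proposed transaction."""
    if not follower_hashes:
        return False
    _, most_common_count = Counter(follower_hashes).most_common(1)[0]
    return most_common_count >= k * threshold

# Example: three of four replicas agree, which meets the k/2 threshold.
assert consensus_reached(["h1", "h1", "h1", "h2"], k=4)
```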
- the first distributed data storage system 110 A determines that a threshold quantity of the one or more follower nodes did not reach a consensus by determining a same cryptographic hash for the transaction ( 405 -N).
- the first distributed data storage system 110 A can send an error indication to the client 130 ( 406 ). For example, the first distributed data storage system 110 A can determine that the transaction requested by the client 130 has failed if fewer than the threshold quantity (e.g., k/2 or a different quantity) of the k quantity of distributed data storage systems storing replicas of the first data partition 170 A and/or the second data partition 170 B determined a same cryptographic hash for the first transaction 322 A.
- the threshold quantity (e.g., k/2 or a different quantity) can include and/or exclude the first distributed data storage system 110 A which, as noted, can be acting as the leader node in the consensus protocol.
- the first distributed data storage system 110 A can determine that a threshold quantity of the one or more follower nodes did reach a consensus by determining a same cryptographic hash for the transaction ( 405 -Y). As such, the first distributed data storage system 110 A can commit the transaction by at least sending, to the one or more follower nodes in the consensus protocol storing replicas of the first data partition 170 A and/or the second data partition 170 B, an indication to add a log entry corresponding to the transaction to a transaction log stored at each of the one or more follower nodes ( 408 ).
- the first distributed data storage system 110 A can send, to the client 130 , an indication that the transaction was successfully performed on at least the portion of the first data partition 170 A and/or the second data partition 170 B ( 410 ). For example, if the threshold quantity (e.g., k/2 or a different quantity) of the k quantity of distributed data storage systems storing replicas of the first data partition 170 A and/or the second data partition 170 B determined a same cryptographic hash for the first transaction 322 A,
- the first transaction 322 A can be committed by at least adding, to the first transaction log 115 A and/or the second transaction log 115 B, the corresponding first log entry 310 A.
- the first distributed data storage system 110 A can send, to the follower nodes participating in the consensus protocol, an indication to add the first log entry 310 A to the replicas of the first transaction log 115 A and/or the second transaction log 115 B stored at each of the follower nodes including, for example, the second distributed data storage system 110 B and/or the third distributed data storage system 110 C.
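Putting the steps of process 400 together, a leader-side handler might look roughly like the sketch below. The leader, follower, and client objects and their method names are hypothetical placeholders, not an interface defined by this description.

```python
def execute_transaction(leader, followers, client, transaction, threshold=0.5):
    """Illustrative leader flow: propose the log entry, collect hashes from the
    followers, then either commit everywhere or report an error to the client."""
    log_entry = leader.build_log_entry(transaction)           # hash over payload + predecessors
    hashes = [f.propose(log_entry) for f in followers]        # operation 404: send to followers
    agreeing = sum(1 for h in hashes if h == log_entry.hash)  # operation 405: count matching hashes
    if agreeing < len(followers) * threshold:
        client.notify_error(transaction)                      # operation 406: consensus not reached
        return False
    leader.append_to_log(log_entry)                           # operation 408: commit locally
    for follower in followers:
        follower.commit(log_entry)                            # operation 408: commit at followers
    client.notify_success(transaction)                        # operation 410: report success
    return True
```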
- the first transaction log 115 A and the second transaction log 115 B can be interlinked by the mesh of transactions 300 , which includes an entire sequence of transactions modifying the first data partition 170 A and/or the second data partition 170 B, up to the first transaction 322 A corresponding to the first log entry 310 A.
- This sequence of transactions can be immutable due to the prohibitively large quantity of computing resources required to modify any of the log entries in any of the transaction logs (e.g., the first transaction log 115 A, the second transaction log 115 B, and/or the like) interlinked by the mesh of transactions 300 .
- for example, modifying the first log entry 310 A, the second log entry 310 B, and/or the third log entry 310 C in the first transaction log 115 A and/or the second transaction log 115 B can require a computation of the cryptographic hashes of all subsequent transactions recorded in every transaction log that is interlinked by the mesh of transactions including, for example, the first transaction log 115 A, the second transaction log 115 B, and/or the like.
- FIG. 4B depicts a flowchart illustrating a process 450 for performing an operation to read a data partition stored in a distributed trust data storage system consistent with some implementations of the current subject matter.
- the process 450 can be performed by the client 130 to read at least a portion of the first data partition 170 A and/or the second data partition 170 B stored across the distributed trust data storage system 100 .
- replicas of the first data partition 170 A and/or the second data partition 170 B can be stored at different distributed data storage systems within the distributed trust data storage system 100 such as, for example, at a k quantity of distributed data storage systems that includes the first distributed data storage system 110 A, the second distributed data storage system 110 B, and/or the third distributed data storage system 110 C. Accordingly, in some implementations of the current subject matter, the process 450 can be performed in order for the client 130 to verify the responses from one or more of the k quantity of distributed data storage systems storing replicas of the first data partition 170 A and/or the second data partition 170 B.
- the client 130 can send, to the distributed trust data storage system 100 , a request to read the first data partition 170 A and/or the second data partition 170 B that includes querying one or more distributed data storage systems within the distributed trust data storage system 100 that are storing replicas of the first data partition 170 A and/or the second data partition 170 B ( 452 ).
- the client 130 can send, to the distributed trust data storage system 100 , a request to read the first data partition 170 A and/or the second data partition 170 B.
- Reading the first data partition 170 A and/or the second data partition 170 B can require determining a latest state of one or more portions of the first data partition 170 A and/or the second data partition 170 B as well as a history of changes to these portions of the first data partition 170 A and/or the second data partition 170 B.
- the latest state of a portion of the first data partition 170 A and/or the second data partition 170 B as well as the history of the changes to that portion of the first data partition 170 A and/or the second data partition 170 B can be determined by at least querying, for example, the first transaction log 115 A and/or the second transaction log 115 B recording the transactions that modified the first data partition 170 A and/or the second data partition 170 B.
- the request to read the first data partition 170 A and/or the second data partition 170 B can require querying each of the one or more of the k quantity of distributed data storage systems storing replicas of the first data partition 170 A and/or the second data partition 170 B.
- the request to read the first data partition 170 A and/or the second data partition 170 B can require querying at least a k/2 quantity of the k quantity of distributed data storage systems in order to retrieve, from each of the k/2 quantity of distributed data storage systems, one or more of the log entries from the replicas of the first transaction log 115 A and/or the second transaction log 115 B.
- the client 130 can receive, from the distributed trust data storage system 100 , a response to the request to read the first data partition 170 A and/or the second data partition 170 B that includes one or more results from querying the one or more distributed data storage systems storing replicas of the first data partition 170 A and/or the second data partition 170 B to at least traverse the first transaction log 115 A and/or the second transaction log 115 B ( 454 ).
- the client 130 can receive multiple results from each of the k/2 quantity of distributed data storage systems that include, for example, one or more of the first log entry 310 A, the second log entry 310 B, and/or the third log entry 310 C from the replicas of the first transaction log 115 A and/or the second transaction log 115 B stored at each of the k/2 quantity of distributed data storage systems.
- the client 130 can determine to reject the response based at least on one or more discrepancies in the one or more results ( 456 ). In some implementations of the current subject matter, the client 130 can verify the results received from each of the k/2 quantity of distributed data storage systems.
- the client 130 can further reject the response to the request to read the first data partition 170 A and/or the second data partition 170 B if one or more discrepancies are present in the results received from the k/2 quantity of distributed data storage systems. As noted, discrepancies in the results received from the k/2 quantity of distributed data storage systems can indicate a presence of at least one forged transaction.
- a discrepancy can include, for example, a mismatch in the results received from two or more of the k/2 quantity of distributed data storage systems and/or a lack of continuity between two or more log entries in the results.
- the client 130 can determine that the first transaction 322 A and/or the second transaction 322 B are forged based at least on a mismatch between the value of first key-value pairs 324 A and the cryptographic hash of the second log entry 310 B for the second transaction 322 B referenced by the second globally unique identifier 325 B.
- the client 130 can determine that the second transaction 322 B and/or the third transaction 322 C are forged based at least on a mismatch between the value of second key-value pairs 324 B and the cryptographic hash of the third log entry 310 C for the third transaction 322 C referenced by the third globally unique identifier 325 C.
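Combining the checks above, a client-side verification pass for process 450 might reject a response when either two replicas disagree or a stored predecessor hash fails to match a recomputed hash of the referenced entry. The function below is a simplified sketch with an assumed data shape (each result is a mapping from transaction identifier to log entry).

```python
import hashlib
import json
from typing import List

def entry_hash(entry: dict) -> str:
    """Recompute the cryptographic hash of a log entry (illustrative only)."""
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode("utf-8")).hexdigest()

def verify_read_response(results: List[dict]) -> bool:
    """Return True only if all replica results agree and every stored
    predecessor hash matches a recomputed hash of the referenced entry."""
    if not results:
        return False
    # Discrepancy check: every queried replica must return the same entries.
    if any(result != results[0] for result in results[1:]):
        return False
    # Continuity check: stored hashes must match the referenced log entries.
    entries = results[0]
    for entry in entries.values():
        for prev_id, stored_hash in entry.get("predecessor_hashes", {}).items():
            prev_entry = entries.get(prev_id)
            if prev_entry is None or entry_hash(prev_entry) != stored_hash:
                return False
    return True
```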
- FIG. 5 depicts a block diagram illustrating a computing system 500 consistent with implementations of the current subject matter.
- the computing system 500 can be used to implement the distributed trust data storage system 100 and/or any components therein.
- the computing system 500 can include a processor 510 , a memory 520 , a storage device 530 , and input/output devices 540 .
- the processor 510 , the memory 520 , the storage device 530 , and the input/output devices 540 can be interconnected via a system bus 550 .
- the processor 510 is capable of processing instructions for execution within the computing system 500 . Such executed instructions can implement one or more components of, for example, the distributed trust data storage system 100 .
- the processor 510 can be a single-threaded processor. Alternately, the processor 510 can be a multi-threaded processor.
- the processor 510 is capable of processing instructions stored in the memory 520 and/or on the storage device 530 to display graphical information for a user interface provided via the input/output device 540 .
- the memory 520 is a computer readable medium such as volatile or non-volatile that stores information within the computing system 500 .
- the memory 520 can store data structures representing configuration object databases, for example.
- the storage device 530 is capable of providing persistent storage for the computing system 500 .
- the storage device 530 can be a floppy disk device, a hard disk device, an optical disk device, a tape device, a solid-state device, and/or other suitable persistent storage means.
- the input/output device 540 provides input/output operations for the computing system 500 .
- the input/output device 540 includes a keyboard and/or pointing device.
- the input/output device 540 includes a display unit for displaying graphical user interfaces.
- the input/output device 540 can provide input/output operations for a network device.
- the input/output device 540 can include Ethernet ports or other networking ports to communicate with one or more wired and/or wireless networks (e.g., a local area network (LAN), a wide area network (WAN), the Internet).
- the computing system 500 can be used to execute various interactive computer software applications that can be used for organization, analysis and/or storage of data in various formats.
- the computing system 500 can be used to execute any type of software applications.
- These applications can be used to perform various functionalities, e.g., planning functionalities (e.g., generating, managing, editing of spreadsheet documents, word processing documents, and/or any other objects, etc.), computing functionalities, communications functionalities, etc.
- the applications can include various add-in functionalities (e.g., SAP Integrated Business Planning as an add-in for a spreadsheet and/or other type of program) or can be standalone computing products and/or functionalities.
- the functionalities can be used to generate the user interface provided via the input/output device 540 .
- the user interface can be generated and presented to a user by the computing system 500 (e.g., on a computer screen monitor, etc.).
- One or more aspects or features of the subject matter described herein can be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs, field programmable gate arrays (FPGAs), computer hardware, firmware, software, and/or combinations thereof.
- These various aspects or features can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which can be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
- the programmable system or computing system may include clients and servers.
- a client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
- machine-readable medium refers to any computer program product, apparatus and/or device, such as for example magnetic discs, optical disks, memory, and Programmable Logic Devices (PLDs), used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal.
- machine-readable signal refers to any signal used to provide machine instructions and/or data to a programmable processor.
- the machine-readable medium can store such machine instructions non-transitorily, such as for example as would a non-transient solid-state memory or a magnetic hard drive or any equivalent storage medium.
- the machine-readable medium can alternatively or additionally store such machine instructions in a transient manner, such as for example, as would a processor cache or other random access memory associated with one or more physical processor cores.
- one or more aspects or features of the subject matter described herein can be implemented on a computer having a display device, such as for example a cathode ray tube (CRT) or a liquid crystal display (LCD) or a light emitting diode (LED) monitor for displaying information to the user and a keyboard and a pointing device, such as for example a mouse or a trackball, by which the user may provide input to the computer.
- Other kinds of devices can be used to provide for interaction with a user as well.
- phrases such as “at least one of” or “one or more of” may occur followed by a conjunctive list of elements or features.
- the term "and/or" may also occur in a list of two or more elements or features. Unless otherwise implicitly or explicitly contradicted by the context in which it is used, such a phrase is intended to mean any of the listed elements or features individually or any of the recited elements or features in combination with any of the other recited elements or features.
- the phrases “at least one of A and B;” “one or more of A and B;” and “A and/or B” are each intended to mean “A alone, B alone, or A and B together.”
- a similar interpretation is also intended for lists including three or more items.
- the phrases “at least one of A, B, and C;” “one or more of A, B, and C;” and “A, B, and/or C” are each intended to mean “A alone, B alone, C alone, A and B together, A and C together, B and C together, or A and B and C together.”
- Use of the term “based on,” above and in the claims is intended to mean, “based at least in part on,” such that an unrecited feature or element is also permissible.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Databases & Information Systems (AREA)
- Computer Security & Cryptography (AREA)
- Data Mining & Analysis (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computing Systems (AREA)
- Power Engineering (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Claims (14)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US16/222,931 US11386078B2 (en) | 2018-12-17 | 2018-12-17 | Distributed trust data storage system |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US16/222,931 US11386078B2 (en) | 2018-12-17 | 2018-12-17 | Distributed trust data storage system |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20200192888A1 US20200192888A1 (en) | 2020-06-18 |
| US11386078B2 true US11386078B2 (en) | 2022-07-12 |
Family
ID=71071627
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US16/222,931 Active 2039-08-16 US11386078B2 (en) | 2018-12-17 | 2018-12-17 | Distributed trust data storage system |
Country Status (1)
| Country | Link |
|---|---|
| US (1) | US11386078B2 (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20220043830A1 (en) * | 2016-04-18 | 2022-02-10 | Amazon Technologies, Inc. | Versioned hierarchical data structures in a distributed data store |
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11086840B2 (en) | 2018-12-07 | 2021-08-10 | Snowflake Inc. | Transactional streaming of change tracking data |
| US11423013B2 (en) * | 2019-05-30 | 2022-08-23 | Ebay Inc. | Transactions on non-transactional database |
| US11243820B1 (en) * | 2021-04-30 | 2022-02-08 | Snowflake Inc. | Distributed deadlock detection and resolution in distributed databases |
| US12222964B2 (en) * | 2022-04-28 | 2025-02-11 | Snowflake Inc. | Database processing using hybrid key-value tables |
Citations (34)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5878431A (en) * | 1996-10-04 | 1999-03-02 | Hewlett-Packard Company | Method and apparatus for providing topology based enterprise management services |
| US20090150566A1 (en) * | 2007-12-07 | 2009-06-11 | Microsoft Corporation | Virtually synchronous paxos |
| US20100293140A1 (en) * | 2009-05-18 | 2010-11-18 | Shuhei Nishiyama | Distributed database system by sharing or replicating the meta information on memory caches |
| US8032498B1 (en) * | 2009-06-29 | 2011-10-04 | Emc Corporation | Delegated reference count base file versioning |
| US20140289358A1 (en) * | 2010-12-17 | 2014-09-25 | Facebook, Inc. | Distributed Storage System |
| US20140298034A1 (en) * | 2011-10-14 | 2014-10-02 | Hitachi, Ltd. | Data authenticity assurance method, management computer, and storage medium |
| US20150254264A1 (en) * | 2013-12-30 | 2015-09-10 | Huawei Technologies Co., Ltd. | Method for Recording Transaction Log, and Database Engine |
| US20160105471A1 (en) | 2014-10-14 | 2016-04-14 | Midokura Sarl | System and method for distributed flow state p2p setup in virtual networks |
| US20160308968A1 (en) | 2015-04-14 | 2016-10-20 | E8 Storage Systems Ltd. | Lockless distributed redundant storage and nvram cache in a highly-distributed shared topology with direct memory access capable interconnect |
| US20170124169A1 (en) * | 2015-10-30 | 2017-05-04 | Intuit Inc. | Managing synchronization issues between profile stores and sources of truth |
| US20170295061A1 (en) * | 2007-12-14 | 2017-10-12 | Nant Holdings Ip, Llc | Hybrid Transport - Application Network Fabric Apparatus |
| US9800517B1 (en) * | 2013-10-31 | 2017-10-24 | Neil Anderson | Secure distributed computing using containers |
| US20170364700A1 (en) * | 2015-06-02 | 2017-12-21 | ALTR Solutions, Inc. | Immutable logging of access requests to distributed file systems |
| US20170366353A1 (en) * | 2015-06-02 | 2017-12-21 | ALTR Solutions, Inc. | Generation of hash values within a blockchain |
| US20170364698A1 (en) * | 2015-06-02 | 2017-12-21 | ALTR Solutions, Inc. | Fragmenting data for the purposes of persistent storage across multiple immutable data structures |
| US20170364273A1 (en) * | 2016-06-16 | 2017-12-21 | Sap Se | Consensus protocol enhancements for supporting flexible durability options |
| US9875510B1 (en) * | 2015-02-03 | 2018-01-23 | Lance Kasper | Consensus system for tracking peer-to-peer digital records |
| US20180139278A1 (en) * | 2016-11-14 | 2018-05-17 | International Business Machines Corporation | Decentralized immutable storage blockchain configuration |
| US20180150230A1 (en) * | 2016-11-29 | 2018-05-31 | Sap Se | State machine abstraction for log-based consensus protocols |
| US20180165476A1 (en) * | 2016-12-09 | 2018-06-14 | International Business Machines Corporation | Interlocked blockchains to increase blockchain security |
| US20180335955A1 (en) * | 2017-05-22 | 2018-11-22 | Sap Se | Processing large requests in data storage systems with limited/constant buffer sizes |
| US10200197B1 (en) * | 2017-12-18 | 2019-02-05 | Nec Corporation | Scalable crash fault-tolerance consensus protocol with efficient message aggregation |
| US20190081793A1 (en) * | 2017-09-12 | 2019-03-14 | Kadena, LLC | Parallel-chain architecture for blockchain systems |
| US20190179939A1 (en) * | 2017-12-11 | 2019-06-13 | International Business Machines Corporation | Distributed database having blockchain attributes |
| US20190199515A1 (en) * | 2017-12-26 | 2019-06-27 | Akamai Technologies, Inc. | Concurrent transaction processing in a high performance distributed system of record |
| US10348817B2 (en) * | 2017-05-22 | 2019-07-09 | Sap Se | Optimizing latency and/or bandwidth of large client requests for replicated state machines |
| US20190258991A1 (en) * | 2018-02-22 | 2019-08-22 | Idlogiq Inc. | System and methods for querying the distribution path of product units within a supply chain |
| US10552069B2 (en) * | 2017-07-07 | 2020-02-04 | Sap Se | Caching the topology of a distributed data storage system |
| US20200050386A1 (en) * | 2018-08-07 | 2020-02-13 | International Business Machines Corporation | Private and fault-tolerant storage of segmented data |
| US10579974B1 (en) * | 2015-02-16 | 2020-03-03 | AI Coin Inc. | Systems, methods, and program products for a distributed digital asset network with rapid transaction settlements |
| US20200160336A1 (en) * | 2017-03-24 | 2020-05-21 | Alibaba Group Holding Limited | Method and apparatus for consensus verification |
| US20200169412A1 (en) * | 2018-11-26 | 2020-05-28 | Amazon Technologies, Inc. | Cryptographic verification of database transactions |
| US20200186355A1 (en) * | 2016-07-08 | 2020-06-11 | Kalypton International Limited | Distributed transaction processing and authentication system |
| US10868673B2 (en) * | 2017-09-25 | 2020-12-15 | Sap Se | Network access control based on distributed ledger |
2018
- 2018-12-17 US US16/222,931 patent/US11386078B2/en active Active
Patent Citations (36)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5878431A (en) * | 1996-10-04 | 1999-03-02 | Hewlett-Packard Company | Method and apparatus for providing topology based enterprise management services |
| US20090150566A1 (en) * | 2007-12-07 | 2009-06-11 | Microsoft Corporation | Virtually synchronous paxos |
| US7849223B2 (en) | 2007-12-07 | 2010-12-07 | Microsoft Corporation | Virtually synchronous Paxos |
| US20170295061A1 (en) * | 2007-12-14 | 2017-10-12 | Nant Holdings Ip, Llc | Hybrid Transport - Application Network Fabric Apparatus |
| US20100293140A1 (en) * | 2009-05-18 | 2010-11-18 | Shuhei Nishiyama | Distributed database system by sharing or replicating the meta information on memory caches |
| US8032498B1 (en) * | 2009-06-29 | 2011-10-04 | Emc Corporation | Delegated reference count base file versioning |
| US20140289358A1 (en) * | 2010-12-17 | 2014-09-25 | Facebook, Inc. | Distributed Storage System |
| US20140298034A1 (en) * | 2011-10-14 | 2014-10-02 | Hitachi, Ltd. | Data authenticity assurance method, management computer, and storage medium |
| US9800517B1 (en) * | 2013-10-31 | 2017-10-24 | Neil Anderson | Secure distributed computing using containers |
| US20150254264A1 (en) * | 2013-12-30 | 2015-09-10 | Huawei Technologies Co., Ltd. | Method for Recording Transaction Log, and Database Engine |
| US20160105471A1 (en) | 2014-10-14 | 2016-04-14 | Midokura Sarl | System and method for distributed flow state p2p setup in virtual networks |
| US9875510B1 (en) * | 2015-02-03 | 2018-01-23 | Lance Kasper | Consensus system for tracking peer-to-peer digital records |
| US10579974B1 (en) * | 2015-02-16 | 2020-03-03 | AI Coin Inc. | Systems, methods, and program products for a distributed digital asset network with rapid transaction settlements |
| US20160308968A1 (en) | 2015-04-14 | 2016-10-20 | E8 Storage Systems Ltd. | Lockless distributed redundant storage and nvram cache in a highly-distributed shared topology with direct memory access capable interconnect |
| US20170364698A1 (en) * | 2015-06-02 | 2017-12-21 | ALTR Solutions, Inc. | Fragmenting data for the purposes of persistent storage across multiple immutable data structures |
| US20170366353A1 (en) * | 2015-06-02 | 2017-12-21 | ALTR Solutions, Inc. | Generation of hash values within a blockchain |
| US20170364700A1 (en) * | 2015-06-02 | 2017-12-21 | ALTR Solutions, Inc. | Immutable logging of access requests to distributed file systems |
| US20170124169A1 (en) * | 2015-10-30 | 2017-05-04 | Intuit Inc. | Managing synchronization issues between profile stores and sources of truth |
| US20170364273A1 (en) * | 2016-06-16 | 2017-12-21 | Sap Se | Consensus protocol enhancements for supporting flexible durability options |
| US20200186355A1 (en) * | 2016-07-08 | 2020-06-11 | Kalypton International Limited | Distributed transaction processing and authentication system |
| US20180139278A1 (en) * | 2016-11-14 | 2018-05-17 | International Business Machines Corporation | Decentralized immutable storage blockchain configuration |
| US20180150230A1 (en) * | 2016-11-29 | 2018-05-31 | Sap Se | State machine abstraction for log-based consensus protocols |
| US20180165476A1 (en) * | 2016-12-09 | 2018-06-14 | International Business Machines Corporation | Interlocked blockchains to increase blockchain security |
| US20200160336A1 (en) * | 2017-03-24 | 2020-05-21 | Alibaba Group Holding Limited | Method and apparatus for consensus verification |
| US20180335955A1 (en) * | 2017-05-22 | 2018-11-22 | Sap Se | Processing large requests in data storage systems with limited/constant buffer sizes |
| US10348817B2 (en) * | 2017-05-22 | 2019-07-09 | Sap Se | Optimizing latency and/or bandwidth of large client requests for replicated state machines |
| US10788998B2 (en) * | 2017-07-07 | 2020-09-29 | Sap Se | Logging changes to data stored in distributed data storage system |
| US10552069B2 (en) * | 2017-07-07 | 2020-02-04 | Sap Se | Caching the topology of a distributed data storage system |
| US20190081793A1 (en) * | 2017-09-12 | 2019-03-14 | Kadena, LLC | Parallel-chain architecture for blockchain systems |
| US10868673B2 (en) * | 2017-09-25 | 2020-12-15 | Sap Se | Network access control based on distributed ledger |
| US20190179939A1 (en) * | 2017-12-11 | 2019-06-13 | International Business Machines Corporation | Distributed database having blockchain attributes |
| US10200197B1 (en) * | 2017-12-18 | 2019-02-05 | Nec Corporation | Scalable crash fault-tolerance consensus protocol with efficient message aggregation |
| US20190199515A1 (en) * | 2017-12-26 | 2019-06-27 | Akamai Technologies, Inc. | Concurrent transaction processing in a high performance distributed system of record |
| US20190258991A1 (en) * | 2018-02-22 | 2019-08-22 | Idlogiq Inc. | System and methods for querying the distribution path of product units within a supply chain |
| US20200050386A1 (en) * | 2018-08-07 | 2020-02-13 | International Business Machines Corporation | Private and fault-tolerant storage of segmented data |
| US20200169412A1 (en) * | 2018-11-26 | 2020-05-28 | Amazon Technologies, Inc. | Cryptographic verification of database transactions |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20220043830A1 (en) * | 2016-04-18 | 2022-02-10 | Amazon Technologies, Inc. | Versioned hierarchical data structures in a distributed data store |
| US12174854B2 (en) * | 2016-04-18 | 2024-12-24 | Amazon Technologies, Inc. | Versioned hierarchical data structures in a distributed data store |
Also Published As
| Publication number | Publication date |
|---|---|
| US20200192888A1 (en) | 2020-06-18 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US10754562B2 (en) | Key value based block device | |
| US11386078B2 (en) | Distributed trust data storage system | |
| Siddiqa et al. | Big data storage technologies: a survey | |
| Chandra | BASE analysis of NoSQL database | |
| US9830372B2 (en) | Scalable coordination aware static partitioning for database replication | |
| US10296498B2 (en) | Coordinated hash table indexes to facilitate reducing database reconfiguration time | |
| US10546021B2 (en) | Adjacency structures for executing graph algorithms in a relational database | |
| JP2023546249A (en) | Transaction processing methods, devices, computer equipment and computer programs | |
| EP3120261B1 (en) | Dependency-aware transaction batching for data replication | |
| US20090012932A1 (en) | Method and System For Data Storage And Management | |
| US10528262B1 (en) | Replication-based federation of scalable data across multiple sites | |
| US10176210B2 (en) | System and method for minimizing lock contention | |
| US10180812B2 (en) | Consensus protocol enhancements for supporting flexible durability options | |
| US20160283331A1 (en) | Pooling work across multiple transactions for reducing contention in operational analytics systems | |
| AU2016271618A1 (en) | Disconnected operation within distributed database systems | |
| US20210326359A1 (en) | Compare processing using replication log-injected compare records in a replication environment | |
| Merceedi et al. | A comprehensive survey for hadoop distributed file system | |
| Jayasekara et al. | Optimizing checkpoint‐based fault‐tolerance in distributed stream processing systems: Theory to practice | |
| US11531595B2 (en) | Non-blocking secondary reads | |
| US11048728B2 (en) | Dependent object analysis | |
| US20130132400A1 (en) | Incremental context accumulating systems with information co-location for high performance and real-time decisioning systems | |
| US8862544B2 (en) | Grid based replication | |
| US10353920B2 (en) | Efficient mirror data re-sync | |
| US11947994B2 (en) | Adaptive hardware transactional memory based concurrency control | |
| US10649976B1 (en) | Using a global sequence number for replicating mutating data |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
| AS | Assignment |
Owner name: SAP SE, GERMANY Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SCHRETER, IVAN;REEL/FRAME:048417/0904 Effective date: 20181214 |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
|
| STCF | Information on status: patent grant |
Free format text: PATENTED CASE |