WO2023043736A1 - System and apparatus of secure transaction processing and data stores - Google Patents


Info

Publication number
WO2023043736A1
Authority
WO
WIPO (PCT)
Prior art keywords
transaction
database
data
transaction data
tuple
Prior art date
Application number
PCT/US2022/043364
Other languages
French (fr)
Inventor
Justin Y. SHI
Original Assignee
Temple University- Of The Commonwealth System Of Higher Education
Priority date
Filing date
Publication date
Application filed by Temple University- Of The Commonwealth System Of Higher Education filed Critical Temple University- Of The Commonwealth System Of Higher Education
Publication of WO2023043736A1 publication Critical patent/WO2023043736A1/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60 Protecting data
    • G06F 21/64 Protecting data integrity, e.g. using checksums, certificates or signatures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F 16/278 Data partitioning, e.g. horizontal or vertical partitioning

Definitions

  • the blockchain protocols leverage secure and non-stop computing in large-scale systems over the Internet, without dedicated trusted servers.
  • the blockchain protocol uses traditional raw hop-to-hop data communication protocols (e.g., TCP/IP sockets and RPC (remote procedure call)).
  • An issue arises when a system attains scalability and reliability while sacrificing throughput.
  • developing a technology that better meets scalability requirements while minimizing the trade-offs between performance and reliability would be desirable in high-performance computing environments and parallel processing environments.
  • The present disclosure relates to a virtual network storage for a high-performance, scalable transaction system.
  • An issue arises as transaction systems scale because the systems tend to sacrifice performance.
  • high-performance systems and traditional databases suffer scalability challenges because adding database servers to process larger workloads forces a trade-off between performance and reliability.
  • the disclosed technology addresses the issue by providing a virtual data storage including gateway servers and database servers.
  • Respective database servers manage partitions of a database as replicated data stores on the database servers.
  • Each gateway server drives multiple database servers.
  • the gateway servers collectively form a unidirectional virtual ring (UVR).
  • the virtual network storage has no receiver “address” or “hop.”
  • IP addresses associated with nodes or gateway servers are invisible to client applications.
  • An Active Content Addressable Networking (ACAN) application includes a UVR of participating gateway servers. The UVR is formed when the ACAN application runs.
  • the disclosed technology provides a transaction pool of crypto transactions (e.g., a mempool of Bitcoin) as an ACAN application.
  • the virtual database provides scalability without loss of performance, reliability, or security as data grows in the ACAN.
  • the disclosed technology uses the phrase “service scalability” to mean the ability to expand processor/network/storage resources, physical and virtual, for handling ever-increasing workloads without loss in processing performance, reliability, or security.
  • FIG. 1 illustrates an overview of an example system for a virtual database in accordance with aspects of the present disclosure.
  • FIG. 2 illustrates an example system for storing data in unidirectional virtual ring with shift-mirrored data partitions in accordance with aspects of the present disclosure.
  • FIGS. 3A-B illustrate examples of data structures and commands in accordance with aspects of the present disclosure.
  • FIG. 4 illustrates an example of an ACAN application as a processing pipeline for processing crypto-currency transactions based on blockchain in accordance with aspects of the present disclosure.
  • FIG. 5 illustrates an example timeline of using ACAN for processing blockchain transactions in accordance with aspects of the present disclosure.
  • FIGS. 6A, 6B and 6C illustrate an example of a method for processing transactions in crypto-currency based on blockchain in accordance with aspects of the present disclosure.
  • FIG. 7 is a simplified diagram of a computing device with which aspects of the present disclosure may be practiced.
  • FIG. 8 illustrates a Unidirectional Virtual Ring.
  • FIG. 9 illustrates an ACAN database transaction processing network.
  • FIG. 10 illustrates a Parallel Block Processing Pipeline for blockchain applications.
  • FIG. 11 is a graph of a speedup under Amdahl’s Law.
  • FIGS. 12A and 12B illustrate the differences between Amdahl’s and Gustafson’s formulations.
  • FIG. 13 is a flowchart illustrating translating parallel measurement for Amdahl’s Law.
  • FIG. 14 is a graph illustrating Amdahl’s Law with a fixed problem size.
  • FIG. 15 is a graph illustrating Amdahl’s Law with an open problem size.
  • FIG. 16 is a graph illustrating the hidden behavior of Amdahl’s Speedup Bound.
  • FIG. 17 illustrates a conceptual CAN.
  • FIG. 18 illustrates an ACAN wrapper.
  • FIG. 19 is a graph illustrating wrapped against native MPI.
  • FIG. 20 is a graph illustrating wrapped against native MPI for test 2.
  • FIG. 21 is an illustration of a transaction replication switch.
  • FIG. 22 is an illustration of the scaling of transaction switches.
  • FIG. 23 is a table illustrating TPC-E benchmark results.
  • FIG. 24 is a graph illustrating hiding replication overheads.
  • FIG. 25 is a graph depicting read/write ACAN against HDFS.
  • FIG. 26 is a graph illustrating worst-case write performance.
  • the basic software safety requirement is zero single-point failure for all mission-critical services. To date, no enterprise services can meet this basic requirement. Traditional software development methodologies tend to assume 100% hardware reliability; the reliability, security, and scalability of the resulting services are afterthoughts. Attempted remedies fall short of meeting the basic software safety requirements.
  • the blockchain protocol is a high-level one-sided communication protocol built using raw hop-to-hop data communication protocols, such as TCP/IP sockets and RPC (remote procedure call).
  • the blockchain protocol constructs a virtual ledger processor powered by participating nodes without assuming reliable networks and trusted servers. The blockchain network has no single-point failures. Participating nodes are rewarded with digital tokens for contributing transaction-processing efforts. This seemingly revolutionary feature demonstrated the feasibility of non-stop lossless data services, enabling crypto-currencies and other applications where traditional trusted database servers fall short.
  • the blockchain transaction throughput is relatively low, estimated at 4-7 transactions per second.
  • the protocol requires a single public ledger for all transactions from the “genesis block,” and all transactions must be processed and stored on all nodes without a central authority but with network-wide consensus. Different consensus algorithms have been developed trying to increase the transaction throughput. To date, all proposals have introduced different risk factors that can weaken the integrity of the virtual ledger processor.
  • the blockchain network builds immutable digital assets that are not directly accessible by ordinary users via the Internet.
  • the off-chain networks and transaction processing “exchanges” help bring the blockchain network to ordinary users.
  • the crypto transaction processing exchanges deploy traditional databases or private blockchain for higher transaction processing throughputs.
  • the ultimate transaction settlements are done on the public blockchain. These exchanges suffer the scaling challenges in databases and both public blockchain and private blockchain.
  • the exchange security appears to be the most vulnerable.
  • the present disclosure relates to a virtual data storage that sustains scalability requirements of process-intensive and data intensive transactional applications including blockchain applications.
  • Transaction processing is typically done by database systems, SQL or Non-SQL databases.
  • the disclosed technology a) decouples programs from processors and networks, b) enables statistic multiplexing of hardware components, and c) requires a client-side timeout/retransmission discipline to enable potentially infinite scaling.
  • the disclosed technology effectively provides a paradigm shift from client-server computing to complete “serverless” computing.
  • FIG. 1 illustrates an overview of an example system of a virtual database in accordance with aspects of the present disclosure.
  • System 100 may represent a system in the cloud or an on-premises data center.
  • the system 100 includes a client application 102, a network 136, and active content addressable network (ACAN) 150.
  • ACAN 150 provides a virtual data storage where a virtual database includes multiple partitions.
  • each database gateway replicates and maintains shifted mirroring of its respective database partitions.
  • ACAN 150 is formed on gateway servers A-F (104A-F); the gateways drive multiple database servers A-F (110A-F).
  • the gateway network forms a unidirectional virtual ring 120.
  • the system 100 represents a virtual database including six partitions, each partition managed by a gateway server.
  • each of six gateway servers 104A-F connects to two database servers.
  • the gateway server A 104A connects to the database server A 110A and the database server B 110B.
  • the gateway server B 104B connects to the database server B 110B and the database server C 110C.
  • the gateway server C 104C connects to the database server C 110C and the database server D 110D.
  • the gateway server D 104D connects to the database server D 110D and the database server E 110E.
  • the gateway server E 104E connects to the database server E 110E and the database server F 110F.
  • the gateway server F 104F connects to the database server F 110F and the database server A 110A.
  • a unidirectional virtual ring 120 forms in a counterclockwise direction (i.e., from the gateway server A 104A toward the gateway server B 104B).
  • each gateway server manages a database partition by driving two synchronous database servers in a shifted mirror manner.
  • the database server A 110A also stores a replica of database partition F, and the database server B 110B stores a mirrored image of database partition A.
  • each gateway server has one or more backup servers configured on the DNS to automatically fail over when the primary gateway server becomes unresponsive.
  • the ACAN 150 represents an ACAN application.
  • a number of partitions and a number of replications for respective partitions are not limited to the example system 100.
  • the ACAN database network is a network of transaction processing gateways.
  • the ACAN network may deliver scalable transactional processing performance without reliability or security degradation.
  • each of the P partitions has K replicas in a shifted mirroring fashion on the UVR; as long as P ≫ K, the ACAN transaction network may deliver increasing performance without negative reliability and security impacts.
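  • As an illustration only (not the patented implementation), the shifted-mirror placement above can be sketched in Python, assuming P partitions, K replicas, and one database server per partition, with server j holding copies of partitions j, j-1, ..., j-K+1 (mod P):

      # Hypothetical sketch of shifted-mirror placement; with P partitions and
      # K replicas, database server j stores copies of partitions
      # j, j-1, ..., j-K+1 (mod P).
      def shifted_mirror_layout(P, K):
          return {j: [(j - r) % P for r in range(K)] for j in range(P)}

      # Example matching FIG. 2 with P=6, K=2 (0-indexed): server 0 holds
      # partitions 0 and 5 ("1a" and "6b"); server 1 holds 1 and 0 ("2a", "1b").
      print(shifted_mirror_layout(6, 2))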
  • a compute-intensive ACAN application includes a client (master) that creates tasks to be processed and a solver program runnable on any number of nodes (workers).
  • when the ACAN application runs, its runtime tuple matching system will automatically form types of parallel processing, including single instruction, multiple data (SIMD) and multiple instruction, multiple data (MIMD).
  • Pipelined parallel processing clusters leverage the UVR as the common data plane. Available processors and networks may process the tasks in parallel.
  • the client program has a retransmission protocol that will resend a task when the processing times out.
  • a critical difference between the ACAN parallel application and legacy parallel applications is the inclusion of the timeout/retransmission protocol in the clients and the hardware statistic multiplexing effects enabled by the ACAN protocol.
  • the client timeout/retransmission protocol will automatically recover all network and processor failures while allowing the best-effort performance of the entire processing platform.
  • the ACAN parallel application will eventually terminate regardless of partial network and node failures. Tuning the task size (granularity) can optimize the parallel performance according to the Brachistochrone Curve, an equilibrium of computing and communication overheads for solving problems of different sizes.
  • the ACAN application will enter an infinite wait cycle when all nodes and networks fail at the same time.
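  • A minimal sketch of the client timeout/retransmission discipline described above, with submit and wait_result as hypothetical stand-ins for the ACAN client API:

      # Hedged sketch: `submit` posts a task tuple one-sidedly; `wait_result`
      # blocks up to timeout_s for a matching result tuple.
      def run_with_retransmission(submit, wait_result, task_id, timeout_s):
          while True:
              submit(task_id)                    # resending is harmless: any
              result = wait_result(task_id, timeout_s)   # live worker may answer
              if result is not None:
                  return result
              # On timeout, loop and resend. Only if ALL nodes and networks
              # fail at once does this become the infinite wait cycle above.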
  • ACAN enables decentralized processing for traditionally centralized parallel processing applications.
  • Security protocol processing can also be parallelized to minimize the timing impact without reliability concerns.
  • the virtual database may need additional partitions as data grows.
  • the ACAN network may add gateway servers and corresponding database servers while maintaining a smaller number of data replications.
  • the UVR may expand by including the additional gateway servers accordingly.
  • FIG. 2 illustrates an example system for storing partitions of a database in accordance with aspects of the present disclosure.
  • the system 200 is a part of an ACAN (e.g., the ACAN 150 as shown in FIG. 1).
  • the system represents a virtual database with six partitions. Each partition is replicated by two database servers for reliability.
  • the system 200 includes gateway server A 202A, gateway server B 202B, gateway server C 202C, database server A 204A, database server B 204B, and database server C 204C.
  • the gateway server A 202A connects to the database server A 204A and database server B 204B.
  • the gateway server B 202B connects to the database server B 204B and the database server C 204C.
  • the gateway server C 202C connects to the database server C 204C. While not shown, the gateway server C 202C connects to another database server.
  • the database server A 204A manages partition 6b 206A and partition la 206B.
  • the database server B 204B manages partition lb 208A and partition 2a 208B.
  • the database server C 204C manages partition 2b 210A and partition 3a 210B, etc.
  • each gateway server manages a partition of the virtual database by driving two synchronous database servers in a shifted mirroring manner.
  • each gateway server performs three tasks: synchronous replication of data-changing transactions with dynamic serialization, dynamic load distribution for read-only transactions, and non-stop database resynchronization.
  • for synchronous replication, the database servers are forced to execute queries in the exact same order in synchrony if they update the same data. Otherwise, transactions are replicated at wire speed, synchronizing the corresponding database servers in real time.
  • Similar to the parallel application client, the ACAN database client also needs a transaction retransmission protocol for when a query times out. This retransmission protocol always checks the status of the timed-out transaction on the target server before re-sending the transaction. This eliminates the “double spending” risks and enables the ACAN transaction network to overcome gateway single-point failures by automatically deploying backup gateways and any live database servers under the same public gateway IP address (resolved by a global or local DNS server).
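  • A hedged sketch of this check-before-resend rule; the gateway methods shown (send, status, result_of) are illustrative assumptions, not the patent’s API:

      # Sketch: a timed-out transaction is resent only after confirming it did
      # not already commit, which eliminates the "double spending" risk.
      def safe_retransmit(gateway, txn_id, txn, timeout_s):
          while True:
              result = gateway.send(txn, timeout=timeout_s)
              if result is not None:
                  return result
              if gateway.status(txn_id) == "committed":
                  return gateway.result_of(txn_id)   # already applied once
              # not committed anywhere: safe to resend on the next iteration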
  • FIGS. 3A-B illustrate examples of data associated with a tuple system in accordance with aspects of the present disclosure.
  • FIG. 3A illustrates an example data structure 300A of a tuple that stores transactional data of a crypto-currency (e.g., Bitcoin) based on blockchain in accordance with aspects of the present disclosure.
  • the tuple has a form including a key 302 and a value 304.
  • the key 302 includes a transaction identifier (ID) 310 and R-count 312.
  • the R-count 312 represents a number of miners working on the same transaction. That is, the R-count 312 describes a number of readers (e.g., miners in a Bitcoin transaction) that may access the transaction data.
  • Each miner, according to the Bitcoin protocol, for example, has an asynchronous listener waiting for a verified block tuple’s broadcast.
  • Value 304 may include content of the transaction.
  • FIG. 3B illustrates protocols for accessing tuples.
  • the tuple space provides the following three protocols: put, get, and read.
  • Put(key, value) stores a tuple in the tuple space.
  • Get(&key, &value) retrieves a matching tuple and removes it from the tuple space.
  • the “&” sign indicates the mutability of the parameter when a key match is found in the tuple space.
  • Read(&key, &value) reads a tuple; the matching tuple remains in the tuple space after the tuple is read.
  • the three protocols may provide the following additional features when ACAN implements a transaction pool in processing a crypto-currency based on blockchain.
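  • For illustration only, a minimal in-memory sketch of the three protocols with the R-count semantics of the transaction pool (the patent’s tuple space is a replicated network service, not a local dictionary; removal at R-count zero follows the daemon behavior described elsewhere in this disclosure):

      # Hypothetical sketch: put/get/read with R-count-limited readers.
      class TupleSpaceSketch:
          def __init__(self):
              self._tuples = {}              # txn_id -> [r_count, value]

          def put(self, txn_id, r_count, value):
              self._tuples[txn_id] = [r_count, value]

          def read(self, txn_id):
              """Non-destructive read: the tuple stays in the space."""
              entry = self._tuples.get(txn_id)
              return None if entry is None else entry[1]

          def get(self, txn_id):
              """Destructive read: decrements R-count; the tuple is removed
              once R-count reaches zero, so at most R-count readers ever
              obtain the transaction."""
              entry = self._tuples.get(txn_id)
              if entry is None:
                  return None
              entry[0] -= 1
              if entry[0] <= 0:
                  del self._tuples[txn_id]
              return entry[1]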
  • FIG. 4 illustrates an example system for processing transactional blocks based on blockchain in accordance with aspects of the present disclosure.
  • System 400 includes a plurality of miners.
  • Miner A 402A, Miner B 402B, and Miner C 402C collectively represent miners working on respective transactions associated with blocks stored in the Active Content Addressable Network 410 (e.g., the ACAN 150 as shown in FIG. 1).
  • the Active Content Addressable Network 410 represents a transaction pool in crypto-currencies (e.g., a mempool of Bitcoin).
  • Blockchain 412 includes another set of transaction blocks (Block 406A, Block H-max-1 406B, and block H-max 406C).
  • the arrow 404 represents a block pipeline with a queue length of four as an example.
  • the system 400 represents a virtual ledger processor. Since the disclosed technology allows direct block retrieval by specifying a transaction or block ID via the ACAN, there may be no need to store the entire ledger on any one node. As more miners join the ACAN, the ledger may grow accordingly with ever increasing storage capabilities to accommodate it.
  • Transaction broadcasts are limited by R-count so that only R-count copies of the ledger exist in the blockchain network.
  • the transaction blocks (transaction logs) may be randomly distributed. Each validated transaction may be packed into R-count blocks that are to be group-mined in parallel. Each node packs its own transaction blocks; the distribution of transactions may automatically be randomized.
  • R-count may be adjusted based on security requirements of the ledger. Once R-count is determined, adding miners to the network will increase the blockchain transaction throughput by forming pipelined parallel processors with fixed broadcast/block synchronization overheads. Network storage capabilities also increase incrementally.
  • the virtual ledger can now grow due to the automatic constrained block replication and distribution. All nodes will still partake in the ledger processing and storage, but in a randomized fashion, thus improving blockchain security gradually.
  • the AC-Chain can scale as long as R-count ≪ the total number of miners.
  • FIG. 5 is a timing chart of communications among users 502, a pending transaction pool 504, a miner 506, and a blockchain 508.
  • the timing chart 500 starts with users performing transactions using a crypto-currency, followed by a put operation 510, and ends with an insert operation 528.
  • the timing chart 500 may include more or fewer steps or may arrange the order of the steps differently than those shown in FIG. 5.
  • the timing chart 500 can be executed as a set of computer-executable instructions executed by a computer system and encoded or stored on a computer-readable medium. Further, processing according to the timing chart 500 can be performed by gates or circuits associated with a processor, an ASIC, an FPGA, a SOC, or other hardware device.
  • the timing chart 500 shall be explained with reference to the systems, components, devices, modules, software, data structures, data characteristic representations, signaling diagrams, methods, etc. described in conjunction with FIGS. 1, 2, 3A-B, 4, 6, and 7.
  • the users 502 send a put operation 510 to the pending transaction pool 504.
  • the pending transaction pool 504 includes an ACAN-based tuple space (e.g., the ACAN 150 as shown in FIG. 1).
  • the put operation 510 may include a transaction ID and data associated with the transaction.
  • the pending transaction pool 504 stores the tuple in the virtual database.
  • a name of a tuple may be a combination of the transaction ID and R-count.
  • the R-count represents a number of miners working on the transaction. The R-count effectively limits the number of miners that can access the transaction data.
  • the pending transaction pool 504 replicates the partition(s) associated with the inserted tuple using the UVR of gateway servers and the database servers with shifted mirroring of data.
  • when the miner 506 generates a new block tuple with the R-count, listeners of miners associated with the transaction are notified. In aspects, the miner 506 sends a read operation 518 to the pending transaction pool 504 to read the transaction data by specifying the transaction ID. In some aspects, the read operation 518 retrieves data associated with the transaction without removing the tuple that represents the transaction. Accordingly, the pending transaction pool 504 sends the transaction data 520 to the miner 506. The miner validates (516) the transaction data using cryptographic signatures (e.g., hash values), then assembles the validated transactions into a block (526). The miner then sends a get operation (522) specifying the transaction ID. The pending transaction pool 504 removes the tuple and sends (524) the transaction data to the miner 506. Accordingly, by an insert operation (528), the miner 506 inserts the transaction block into the blockchain 508.
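  • The miner-side portion of this timing chart can be summarized with the following hedged sketch; pool, blockchain, validate, and assemble are illustrative stand-ins for the ACAN and blockchain interfaces:

      # Sketch of operations 516-528: read, validate, assemble, get, insert.
      def mine_once(pool, blockchain, txn_id, validate, assemble):
          data = pool.read(txn_id)        # read 518/520: tuple stays in pool
          if data is None or not validate(data):   # validate 516 (signatures)
              return None
          block = assemble([data])        # pack validated transactions (526)
          pool.get(txn_id)                # get 522/524: tuple is removed
          blockchain.insert(block)        # insert the new block (528)
          return block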
  • operations 510-528 are described for purposes of illustrating timing of the present method and systems and are not intended to limit the disclosure to a particular sequence of steps; e.g., steps may be performed in a different order, additional steps may be performed, and disclosed steps may be excluded without departing from the present disclosure.
  • FIG. 6A illustrates an example of a method of processing a crypto-currency using ACAN in accordance with aspects of the present disclosure.
  • a general sequence of operations of the method 600 is shown in FIG. 6A.
  • the method 600A starts with a start operation 602 and ends with an end operation 624.
  • the method 600A may include more or fewer steps or may arrange the order of the steps differently than those shown in FIG. 6A.
  • the method 600A can be executed as a set of computer-executable instructions executed by a computer system and encoded or stored on a computer readable medium. Further, the method 600A can be performed by gates or circuits associated with a processor, an ASIC, an FPGA, a SOC or other hardware device.
  • a receive operation 604 receives transaction data by the ACAN.
  • a buyer and a seller of a transaction insert the transaction data as a tuple of a virtual database (e.g., the ACAN 150 as shown in FIG. 1).
  • Determine operation 608 determines a value of R-count.
  • the R-count represents a number of miners working on the same transaction. That is, the R-count describes a number of readers (e.g., miners in a Bitcoin transaction) that may access the transaction data.
  • the ACAN may remove the transaction tuple when R-count becomes zero.
  • Store operation 610 stores the transaction data in the ACAN.
  • the transaction data is represented by a tuple stored in a partition of the virtual database.
  • miners may ignore the R-count when the miners pack transaction blocks. The miners proceed as in the original Bitcoin protocol.
  • the ACAN tuple space daemon may remove the transaction tuple when R-count reaches zero. The mechanism ensures that no more than R-count miners work on the same transaction at a time. Each miner also has an asynchronous listener waiting for a verified block tuple’s broadcast from other miners.
  • Generate operation 612 generates, by a miner, a new validated block tuple with a new cryptographic signature (e.g., a hash).
  • Notify operation 614 notifies the miners’ listeners so that the miners can fetch the validated block.
  • blocks are randomly distributed to miners. Each transaction may only be replicated R-count times.
  • the value of R-count may be based on the safety of the ledger. The value may be large enough to counter hostile takeover attempts but smaller than the total number of miners.
  • Verify operation 618 verifies, by the miner, the transaction block based on immutable cryptographic signatures (e.g., hash values).
  • Remove operation 620, performed by the ACAN (i.e., the virtual database), removes the tuple that corresponds to the transaction.
  • a get operation as provided by the ACAN removes the tuple when a value of R-count is zero.
  • the transaction data as a tuple corresponds to a partition of the virtual database.
  • Store operation 622 is performed by the miner. The miner validates the broadcast block, compares its POW (proof of work) with that of the local block, and stores the winning POW-verified transaction block in the local blockchain as the new tip. The losing block is discarded.
  • operations 602-624 are described for purposes of illustrating the present methods and systems and are not intended to limit the disclosure to a particular sequence of steps; e.g., steps may be performed in a different order, additional steps may be performed, and disclosed steps may be excluded without departing from the present disclosure.
  • FIG. 7 illustrates a simplified block diagram of a device with which aspects of the present disclosure may be practiced.
  • the device may be a mobile computing device, for example.
  • One or more of the present embodiments may be implemented in an operating environment 700. This is only one example of a suitable operating environment and is not intended to suggest any limitation as to the scope of use or functionality.
  • Other well-known computing systems, environments, and/or configurations that may be suitable for use include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, programmable consumer electronics such as smartphones, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
  • the operating environment 700 typically includes at least one processing unit 702 and memory 704.
  • memory 704 may store, among other things, instructions to perform the methods described herein.
  • memory 704 may be volatile (such as RAM), non-volatile (such as ROM, flash memory, etc.), or some combination of the two.
  • This most basic configuration is illustrated in FIG. 7 by dashed line 706.
  • the operating environment 700 may also include storage devices (removable, 708, and/or non-removable, 710) including, but not limited to, magnetic or optical disks or tape.
  • operating environment 700 may also have input device(s) 714 such as a remote controller, keyboard, mouse, pen, voice input, on-board sensors, etc., and/or output device(s) 716 such as a display, speakers, printer, motors, etc. Also included in the environment may be one or more communication connections 712, such as LAN, WAN, a near-field communications network, a cellular broadband network, point to point, etc.
  • Operating environment 700 typically includes at least some form of computer readable media.
  • Computer readable media can be any available media that can be accessed by processing unit 702 or other devices comprising the operating environment.
  • Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
  • Computer storage media includes RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other tangible, non-transitory medium which can be used to store the desired information.
  • Computer storage media does not include communication media.
  • Computer storage media does not include a carrier wave or other propagated or modulated data signal.
  • Communication media embodies computer readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
  • the term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
  • communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media.
  • the operating environment 700 may be a single computer operating in a networked environment using logical connections to one or more remote computers.
  • the remote computer may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above as well as others not so mentioned.
  • the logical connections may include any method supported by available communications media.
  • Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet.
  • FIG. 7 illustrates an example computer system with multiple network connections. In examples, all components in FIG. 7 have a non-deterministic finite service life.
  • some embodiments include a computer-implemented method of storing transaction data in a database, the method comprising: receiving transaction data; determining a maximum read count associated with the transaction data, wherein the maximum read count is based at least on a number of entities allowed to verify a transaction associated with the transaction data; storing the transaction data in a tuple, wherein the tuple includes a tuple key, and wherein the tuple key includes a transaction identifier and the maximum read count; receiving a request for the transaction data; responsive to receiving the request, transmitting the transaction data; and removing the transaction data.
  • One embodiment contemplates a non-transitory computer-readable medium with instructions stored thereon, that when executed by a processor, facilitate network-scale parallel computing by performing the following steps: executing a computational process from at least one client node; distributing a set of computational instructions and a set of input data derived from the computational process to a plurality of worker nodes via a content addressable service network; maintaining a global data store comprising the set of computational instructions, the set of input data, and a set of result data; and sending a first single-sided communication message from the at least one client node to the plurality of worker nodes over a content addressable UVR network; wherein the first single-sided communication message sent from the at least one client node is assigned an expected completion time; wherein the plurality of worker nodes competes on the content addressable UVR network; and wherein the at least one client node sends a retransmission of the first single-sided communication message if a first response is not received within the expected completion time, and the retransmission
  • the tuple space abstraction may be implemented with a single-sided communication interface.
  • At least one of the client nodes communicates with a DNS server that records domain information in a synchronously replicated network-scale storage using the content addressable UVR service network.
  • the first single-sided communication message comprises a key.
  • the steps further comprise: responding to the query message with a value or set of values that correspond to the key; and placing the value or set of values that correspond to the key in a shadow state; wherein the value or set of values remain in the shadow state until a second query message is received indicating that the value or set of values should be removed from the shadow state.
  • the instructions may further comprise a replication markup language comprising "lock” and “unlock” statements for queries updating data concurrently, configured to protect data replication consistency and, optionally, wherein the replication markup language has a form of (lock, name) and (unlock, name).
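  • A brief hedged sketch of how a client might bracket a concurrent update with these statements, with emit as an illustrative stand-in for whatever transport carries statements to the gateway:

      # Sketch: (lock, name) / (unlock, name) protect replication consistency
      # for queries that update the same data concurrently.
      def replicated_update(emit, name, update_query):
          emit(("lock", name))        # gateways serialize updates on `name`
          try:
              emit(update_query)      # replicated in the same order everywhere
          finally:
              emit(("unlock", name))  # release even if the update fails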
  • Further steps may include (i) providing a wrapped communication interface for an existing parallel processing application that uses an application programming interface selected from the group consisting of the Message Passing Interface (MPI), OpenMP, Remote Method Invocation (RMI) and Remote Procedure Call (RPC); (ii) translating a first direct message, received from the existing parallel processing application, into a first single-sided (key, value) Tuple Space operation; and (iii) translating a second single-sided (key, value) Tuple Space operation into a second direct message for delivery to the existing parallel processing application and, optionally, wherein the wrapped communication interface is capable of changing a processing granularity of the parallel process without recompiling the set of computational instructions, and is capable of fault tolerant non-stop operation without checkpoints.
  • a network-scale distributed storage system may comprise (i) at least one client node; (ii) a plurality of storage nodes connected by a set of redundant communication links in a content addressable UVR service network implemented with the Tuple Space abstraction using a Statistic Multiplexed Computing protocol; (iii) wherein each storage node comprises a transaction replication gateway, configured to provide dynamic serialization of synchronously replicated concurrent update queries, dynamic query load balancing, and non-stop database resynchronization.
  • the plurality of storage nodes may be configured to communicate with one another via a single-sided (key, value) API.
  • a DNS server may be included hosting a single public service URL mapped to all nodes in the content addressable UVR service network and, optionally, wherein the DNS server comprises a network-scale distributed storage system comprising a content addressable UVR service network.
  • At least one client node may implement a redundancy elimination protocol, and wherein the at least one client node uses the content addressable UVR service network for service.
  • a network-scale parallel computing system comprising: a client node; and a UVR content addressable service network, comprising: a plurality of distributed Statistic Multiplexed Computing nodes having a collective set of resources, and implementing a Tuple Space abstraction; and a plurality of redundant networks, wherein each of the plurality of distributed Statistic Multiplexed Computing nodes is communicatively connected to at least one of the plurality of redundant networks; wherein the client node is connected to the service content addressable network via a plurality of redundant network connections; and wherein the service content addressable network is configured to completely decouple programs and data from processors, storage, and communication hardware.
  • a partial failure in any processor, network and storage does not irreparably interrupt execution of the set of instructions.
  • the client node implements a retransmission and redundancy elimination protocol upon a timeout event on a task executing on a first node, the retransmission and redundancy elimination protocol configured to execute from different processors, networks and storage from the first node and/or, wherein each worker node comprises a data and transaction replication gateway, configured to ensure parallel synchronous replication on multiple nodes and non-stop data resynchronization for storage or database recovery.
  • a system comprising a client-side network and a service-side network; wherein the service-side network employs programming using a single-sided Statistic Multiplexed Computing protocol; and wherein the client-side network employs programming comprising: direct client to service protocols; at least one timeout retransmission; and redundancy elimination protocols.
  • the following solution solves the software safety challenge by resolving the blockchain transaction processing throughput and energy efficiency challenges.
  • This solution also includes transformation of legacy transactional processing infrastructures and compute-intensive applications for safer and more scalable services, paving the way for bridging the deployment gaps between public and private blockchains as well as permission-based legacy databases and traditional high performance computing systems. It enables a universal data plane from mobile to cloud, ready for integration with compute- and data-intensive artificial intelligence and machine learning applications, and quantum computer applications.
  • the blockchain protocol is a high-level one-sided communication protocol built using raw hop-to-hop data communication protocols, such as TCP/IP sockets and RPC (remote procedure call). Satoshi Nakamoto, “Bitcoin: A Peer-to-Peer Electronic Cash System”, www.bitcoin.org, 2008.
  • the blockchain protocol constructs a virtual ledger processor powered by participating nodes without assuming reliable networks and trusted servers. The blockchain network has no single-point failures. Participating nodes are rewarded with digital tokens for contributing transaction processing power. This seemingly revolutionary feature demonstrated the feasibility of non-stop lossless data services that enabled crypto-currencies and other applications where the traditional trusted database servers fall short.
  • Sam Daley, “30 Blockchain Applications and Real-World Use Cases Disrupting the Status Quo,” March 31, 2021, updated July 11, 2021 (https://builtin.com/blockchain/blockchain-applications).
  • the blockchain transaction throughput is low, estimated at 4-7 transactions per second.
  • Kenny L., “The Blockchain Scalability Problem & The Race for Visa-Like Transaction Speed,” Jan 30, 2019 (https://towardsdatascience.com/the-blockchain-scalability-problem-the-race-for-visa-like-transaction-speed-5cce48f9d44).
  • the protocol requires a single public ledger for all transactions from the “genesis block”, and all transactions must be processed and stored on all nodes. Different consensus algorithms have been developed trying to increase the transaction throughput. To date, all proposals introduced different risk factors that can weaken the integrity of the virtual ledger processor.
  • Jeff Nijsse and Alan Litchfield, “A Taxonomy of Blockchain Consensus Methods,” Cryptography 2020, 4(4), 32 (https://doi.org/10.3390/cryptography4040032), November 2020.
  • Bitcash is an attempt to increase the transaction block size from 1 MB to 4 MB (a SegWit-related hard fork from Bitcoin) to accelerate transaction processing throughput; there are also many off-chain transaction processing applications, such as the Raiden Network, the Lightning Network, and others.
  • although Bitcash is faster than Bitcoin, its transaction processing throughput also stagnates when more miners join the network.
  • the off-chain crypto applications only use the Bitcoin chain for final settlement, handling small transactions directly in private chains or traditional databases. Lorne Lantz, Daniel Cawrey, “Mastering Blockchain,” O’Reilly Media, Inc., ISBN: 9781492054702, November 2020.
  • Ledger Storage Limit: The blockchain ledger is a historical transaction log since the genesis block. Unlike traditional database servers, where adding more servers once a server’s capability is saturated can expand the services to serve more users, the blockchain ledger will eventually fail when the storage requirement exceeds that of every participating node. Currently, a full node in Bitcoin requires at least 340GB of storage, with expected growth of at least 1GB per month. Vitalik Buterin, “The Limits to Blockchain Scalability,” May 23, 2021 (https://vitalik.ca/general/2021/05/23/scaling.html).
  • the fundamental scalability challenge is the ability to deliver performance and reliability together when expanding the infrastructure (hardware processors, networks and storage) processing capabilities without compromising security and reliability.
  • the following solution is applicable to all life cycles of every mission critical service.
  • the blockchain network builds immutable digital assets that are not directly accessible by ordinary users via the Internet.
  • the off-chain networks and transaction processing “gateways”, such as Coinbase and Binance, help bring the blockchain network to ordinary users. Lorne Lantz, Daniel Cawrey, “Mastering Blockchain,” O’Reilly Media, Inc., ISBN: 9781492054702, November 2020.
  • the crypto transaction processing gateways deploy traditional databases or private blockchains for higher transaction processing throughputs.
  • the ultimate transaction settlements are done on the public blockchains.
  • These gateways suffer the scaling challenges in both worlds.
  • the ledger security proved the most vulnerable.
  • the recent FBI intercept of a ransomware payment at a Bitcoin exchange was proof of the security vulnerability at the crypto exchanges.
  • Department of Justice, “Department of Justice Seizes $2.3 Million in Cryptocurrency Paid to the Ransomware Extortionists Darkside,” June 2021 (https://www.justice.gov/opa/pr/department-justice-seizes-23-million-cryptocurrency-paid-ransomware-extortionists-darkside).
  • ACAN stands for Active Content Addressable Networking.
  • ACAN is a high level single-sided data communication and synchronization protocol based on the Tuple Space abstraction.
  • a Tuple Space is a virtual network memory for <key, value> tuples. Tuples are accessible via three protocols: put(key, value), get(&pattern, &buffer), and read(&pattern, &buffer).
  • Since ACAN runs on all nodes and controls all network connections, ACAN applications are completely decoupled from processing hardware and networks.
  • An ACAN application forms a UVR (unidirectional virtual ring) of participating nodes. The UVR is formed when the application runs.
  • FIG. 8 illustrates a running application’s UVR.
  • An ACAN application consists of a client (master) that creates tasks to be processed and a solver program runnable on any number of nodes (workers).
  • when an ACAN application runs, its runtime tuple matching system will automatically form SIMD, MIMD, and pipelined parallel processing clusters leveraging the UVR as the common data plane.
  • the tasks are processed by all available processors and networks in parallel.
  • the client program has a retransmission protocol that will resend a task when its expected result is not returned on time.
  • a critical difference between ACAN parallel application and legacy parallel applications is the inclusion of the timeout/retransmission protocol in the client.
  • the client timeout/retransmission protocol will automatically recover all network and processor failures while allowing the best-effort performance of the entire processing platform.
  • ACAN enables decentralized processing for traditionally centralized parallel processing applications.
  • Security protocol processing can also be parallelized to minimize the timing impact without reliability concerns.
  • SDPP Software Defined Parallel Processing
  • Transaction processing is typically done by database systems, SQL or Non-SQL.
  • the ACAN network can be exploited to deliver infinitely scalable transactional processing performance without reliability or security degradation.
  • Each gateway performs three tasks:
  a. Synchronous replication for data-changing transactions with dynamic serialization. This means that all servers are forced to execute queries in the exact same order in synchrony if they update the same data. Otherwise, all transactions will be replicated at wire speed. This ensures all K servers are in sync at all times.
  b. Dynamic load distribution for read-only transactions. This ensures all K servers contribute evenly, leveraging the K synchronously replicated database servers.
  c. Non-stop database resynchronization. When a replicated transaction results in inconsistency for any reason, the gateway’s voting algorithm determines the winner(s). The losers are instantly disconnected to ensure database consistency. The gateway is also responsible for rebuilding and reconnecting the disconnected servers in parallel with no more than 60 seconds of service downtime, independent of database size, using a “Mobius strip algorithm”.
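  • The three gateway tasks can be sketched as follows; this is an illustrative simplification (a single lock serializing all updates, a stubbed resynchronization) rather than the dynamic per-data serialization and Mobius strip rebuild described above:

      import itertools
      import threading

      class GatewaySketch:
          """Hypothetical sketch of gateway tasks (a)-(c) for K replicas."""
          def __init__(self, servers):
              self.servers = servers               # K synchronized databases
              self._serialize = threading.Lock()   # (a) one update order
              self._next_reader = itertools.cycle(servers)

          def execute(self, query, is_read_only):
              if is_read_only:                     # (b) dynamic load balancing
                  return next(self._next_reader).run(query)
              with self._serialize:                # (a) synchronous replication
                  results = [s.run(query) for s in self.servers]
              if len(set(results)) > 1:            # (c) vote on inconsistency
                  self._resynchronize(results)
              return results[0]

          def _resynchronize(self, results):
              # Placeholder: a majority vote would keep the winners; losers
              # would be disconnected and rebuilt in parallel per the text.
              pass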
  • the ACAN transaction network is a single logical database with K replicas. Similar to the parallel application client, the ACAN database client also needs a transaction retransmission protocol for when a query times out. This retransmission protocol always checks the status of the timed-out transaction on the target server before re-sending the transaction. This eliminates the “double spending” risks and enables the ACAN transaction network to overcome the gateway and database server single-point failures by automatically deploying backup gateways and servers under the same public gateway IP address (resolved by a global or local DNS server).
  • the disclosed methods are applicable to network storage where storage client software updates are trivial.
  • Using the ACAN storage in DNS will eliminate the last Internet single point of failure, the DNS itself, thus making network-based DDoS attacks far less lethal.
  • SFDC Software Defined Database Cluster
  • serverless transaction processor
  • the blockchain protocol requires a transaction pool, a bulletin board where all transactions are posted. In the case of Bitcoin, it is called the “mempool”. Sean O’Connor, “Mastering the Mempool: A How-To Guide,” hackernoon.com, April 19, 2020 (https://hackernoon.com/mastering-the-mempool-a-how-to-guide-zs7u32ou).
  • the mempool is used by all miners to build transaction blocks.
  • the mempool is implemented using the RPC (Remote Procedure Call) protocol.
  • RPC requires the receiver’s IP address to send data. It is one of the hop-to-hop protocols.
  • the Bitcoin protocol builds a sophisticated mechanism to circumvent the protocol-processor coupling by implementing internal DNS (domain name server) services.
  • Disclosed is an Active Content Addressable Networking (ACAN) based blockchain: AC-Chain.
  • AC-Chain replaces the legacy blockchain RPC transaction broadcast protocol with ACAN protocols.
  • AC-Chain implements a network-wide Tuple Space. It has a simpler DNS service leveraging its UVR (unidirectional virtual ring) topology.
  • Each transaction will be stored as a tuple.
  • the tuple name is the transaction ID suffixed by a replication count (r-count), i.e., the (transaction ID, r-count) key described above.
  • the ACAN Tuple Space replaces the mempool. When miners pack transaction blocks, they ignore the r-count. The miners proceed as in the original Bitcoin protocol. The ACAN Tuple Space daemon will remove the transaction tuple when r-count reaches zero. This ensures that no more than r-count miners work on the same transaction. Each miner has an asynchronous listener waiting for a verified block tuple’s broadcast.
  • Blocks are randomly distributed to all miners. Each transaction is only replicated r-count times.
  • Transaction broadcasts are limited by R-count so that only R-count copies of the ledger exist in the entire blockchain network.
  • the transaction blocks (transaction logs) are randomly distributed. Each transaction can only be packed into r-count blocks that are to be mined in parallel. Since each node packs its own transaction blocks, the distribution of transactions will be automatically randomized.
  • R-count can be adjusted based on security requirements of the ledger. Once r-count is determined, adding miners to the network will increase the blockchain transaction throughput by forming pipelined parallel processors with fixed broadcast/block synchronization overheads.
  • the virtual ledger can now grow indefinitely due to the automatic constrained block replication and distribution. All nodes will still partake in the ledger processing and storage, but in a randomized fashion, thus improving blockchain security gradually.
  • FIG. 10 illustrates the block processing pipeline on AC-Chain.
  • the ACAN enables centralized control over decentralized processes.
  • the two foundational improvements enable more miners to earn productive rewards without compromising ledger integrity using either POW or POS.
  • a slower miner can get ahead when r-count reaches zero.
  • Hostile takeover risk is reduced since it is highly unlikely that a high-power miner would be able to insert consecutive blocks.
  • a parallel block processing pipeline will form. The overall energy efficiency will be dramatically improved.
  • the AC-Chain can scale indefinitely as long as R-count ≪ the number of total miners.
  • AC-Chain is applicable to both public and private blockchains.
  • the blockchain per-transaction processing time must be throttled to ensure the integrity of the transactions (double-spending free).
  • the current industry standard is six confirmations. Since each block requires approximately 10 minutes to verify (the difficulty level is automatically adjusted periodically to maintain that time), the total elapsed time is about 60 minutes [18]. This time will not change even under AC-Chain.
  • the AC-Chain will improve the overall blockchain’s efficiency without compromising security as more miners join the network.
  • the new miners form a block processing pipeline as depicted in FIG. 10. Even though the blocks still need to be added sequentially, the waiting miners are in sleep mode with drastically lower power consumption; together with the reduced network bandwidth due to the r-count replication limit, the saved electricity consumption improves the overall efficiency of the chain. Since the efficiency increases monotonically without security/reliability compromises as new miners join the network, AC-Chain can scale indefinitely.
  • N is the number of transaction blocks to be processed and P is the number of pipeline stages (blocks being mined by “committees of miners”) formed by concurrently running miners regulated by r-count (FIG. 10). Since typically N ≫ P, the efficiency bound is P-fold.
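  • This P-fold bound is consistent with the standard pipeline speedup arithmetic, sketched here under the simplifying assumption that each of the N blocks passes through P equal-time stages of duration t:

      S(N, P) = \frac{\text{sequential time}}{\text{pipelined time}}
              = \frac{N P t}{(N + P - 1)\, t}
              \xrightarrow{\; N \gg P \;} P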
  • ACAN deployment enables more “democratic computing” by enabling low power computers to join the network.
  • AC-Chain Security: The AC-Chain minimal security calculus remains the same as the original Bitcoin network with R-count nodes, since no miner is allowed to expand the mining pool per transaction. Due to randomization of ledger blocks and mining, a hostile takeover is more difficult than in the original blockchain network.
  • Private blockchains are suitable for applications that need to be regulated either by government or some other authorities.
  • the AC-Chain protocol allows seamless integration of public and private blockchains via the content addressable networking protocols.
  • the public or private AC-Chain or the exchange gateway can seamlessly integrate HPC (high performance computing) hardware (GPUs and FPGAs) with ACAN-enabled AI and ML applications to form super-nodes, because the ACAN data plane is accessible from cloud to mobile devices.
  • This exemplary embodiment discloses a practical application of the Active Content Addressable Networking concept and implementations. These include compute and data intensive applications, transactional applications and blockchain applications.
  • the fundamental principles in delivering high performance, high reliability, and security for critical infrastructures include four necessary conditions:
  a. Complete decoupling of software and data from processors, networks, and storage using higher-level data communication protocols like the ACAN protocols.
  b. Full resource statistic multiplexing at runtime (ACAN runtime).
  c. Client programs must include the ACAN retransmission protocol.
  d. Infrastructure scaling must maintain higher parallel gains and slower overhead growth, especially replicated overheads (ACAN infrastructure discipline).
  The disclosed technology enables a computing paradigm shift from the traditional client-server paradigm to the serverless paradigm. In the era of quantum computing, the ACAN-powered serverless infrastructures are well suited to deploy quantum computers of fixed qubits for practical scalable applications.
  • This embodiment first examines the impacts of parallel scaling directions in speedup predictions.
  • Amdahl’s pessimistic speedup bound under the “fixed-scale” assumption informs the limitation of fine-grain parallel computing, but it is only half of the story. Gustafson’s model helped to quantify the elusive behavior of Amdahl’s speedup bound by revealing the other half.
  • Both processor scaling directions are useful in applications deploying multiple parallel accelerators. Once the problem sizes are clearly understood, both models converge to the same unlimited speedup bound. In other words, given unlimited resources and an open problem size, all applications should scale indefinitely. This propels the clouds capable of running these applications to the “quantum class”, since infinity is greater than any performance by a fixed-size quantum computer.
  • FIGS. 12A and 12B illustrate the differences between Amdahl’s and Gustafson’s formulations.
  • Gustafson’s is an inductive model based on actual measures (s’ and p’) of a parallel program using N processors. To calculate speedup, it needs to project the theoretical workload using a single processor. Generalizing this speedup mapping makes the model a recurrence relation with respect to N. A deductive argument is required in order to prove the bound of this recurrence model.
  • This is analogous to Bertrand Russell’s “inductive turkey” [6]: one cannot conclude that the turkey will live forever since it is fed every day. Similarly, the “fixed time” assumption s’ + p’ = 1 is insufficient for proving its speedup bound.
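  • For reference, the two formulations discussed here are, in their standard forms (with s + p = 1 the serial/parallel fractions of a fixed-size workload, and s’ + p’ = 1 the measured fractions of a fixed-time run on N processors):

      S_{\text{Amdahl}}(N) = \frac{1}{s + p/N} \le \frac{1}{s},
      \qquad
      S_{\text{Gustafson}}(N) = s' + N p'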
  • ACAN Active Content Addressable Network
  • ACAN is a network implementation of the Tuple Space data abstraction. Gelernter, D., Carriero, N., “Coordination Languages and Their Significance,” CACM, 35(2), pp. 97-107, 1992.
  • a Tuple Space is a transient memory of <key, value> objects supporting three operators: put(key, value), get(&pattern, &buffer) and read(&pattern, &buffer), as opposed to the legacy “send” and “receive” protocols.
  • the “&” sign indicates that the data will be modified once a match is found in the Tuple Space. Since read() and get() are blocking operators, applications built using this API enable data-parallel processing amongst communicating programs automatically. This simple parallel process coordination language facilitates automated parallel processing and latency hiding.
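  • The following is a minimal single-process C sketch of the three operators just described, intended only to illustrate their semantics; the operator names come from the document, everything else is illustrative. Pattern matching is reduced to a '*' suffix wildcard, the blocking behavior of read()/get() is omitted (a real implementation would wait for a match), and read() is renamed read_tuple() to avoid clashing with POSIX read():

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* One <key, value> object in the transient memory. */
    typedef struct Tuple {
        char key[64];
        char value[256];
        struct Tuple *next;
    } Tuple;

    static Tuple *space = NULL;   /* the Tuple Space, as a linked list */

    /* '*' suffix wildcard, e.g. pattern "Speaker*" matches key "Speaker1". */
    static int match(const char *pattern, const char *key) {
        size_t n = strlen(pattern);
        if (n > 0 && pattern[n - 1] == '*')
            return strncmp(pattern, key, n - 1) == 0;
        return strcmp(pattern, key) == 0;
    }

    /* put(key, value): insert a tuple into the space. */
    void put(const char *key, const char *value) {
        Tuple *t = malloc(sizeof(Tuple));
        snprintf(t->key, sizeof t->key, "%s", key);
        snprintf(t->value, sizeof t->value, "%s", value);
        t->next = space;
        space = t;
    }

    /* read(&pattern, &buffer): copy a matching tuple; it stays in the space. */
    int read_tuple(char *pattern, char *buffer) {
        for (Tuple *t = space; t; t = t->next)
            if (match(pattern, t->key)) {
                strcpy(pattern, t->key);   /* "&" semantics: args are modified */
                strcpy(buffer, t->value);
                return 1;
            }
        return 0;   /* a real implementation would block until a match arrives */
    }

    /* get(&pattern, &buffer): like read, but removes the matching tuple. */
    int get(char *pattern, char *buffer) {
        for (Tuple **pp = &space; *pp; pp = &(*pp)->next)
            if (match(pattern, (*pp)->key)) {
                Tuple *t = *pp;
                strcpy(pattern, t->key);
                strcpy(buffer, t->value);
                *pp = t->next;
                free(t);
                return 1;
            }
        return 0;
    }

    int main(void) {
        char key[64], buf[256];
        put("Speaker1", "Bob");           /* broadcast "Speaker1 = Bob"        */
        strcpy(key, "Speaker*");
        if (read_tuple(key, buf))         /* non-destructive read              */
            printf("read: %s = %s\n", key, buf);
        strcpy(key, "Speaker*");
        if (get(key, buf))                /* destructive "exclusive fetch"     */
            printf("get:  %s = %s\n", key, buf);
        return 0;
    }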
  • FIG. 17 shows the conceptual ACAN implementation via a unidirectional virtual ring (UVR). Shi, J.Y., US Patent Application Publication 20210297480.
  • the word “virtual” is important since it enables harnessing multiple interconnection networks at runtime.
  • the worst-case UVR traversal complexity is O(log_k P), where k is the fan-out degree of a ring broadcast protocol and P is the number of processors on the ring.
  • a Tuple Space daemon runs on every node; each daemon is responsible for first-class local resource exploitation for all parallel programs. It is also responsible for UVR maintenance and application controls. Each node may be deployed for a single parallel program or for multiple different parallel programs. This design allows first-class resource exploitation using multiple federated computing clouds without breaking security barriers, since the Tuple Space daemon is a user-privilege program. In other words, the UVR only forms when a user application is running.
  • A Quantum-Class Cloud Computing Architecture: Traditionally, hardware designs define the computing architectures of the running programs, because the application programming interface (API) is driven by the hardware design choices. Each running application is a finite state automaton executed by the hardware components. Hardware component correctness was assumed. Processor crash failure meant “game over” for any single-core architecture.
  • API application programming interface
  • the proposed unified data access model enables a new data parallel statistic multiplexed computing (SMC) architecture without memory size constraints.
  • SMC data parallel statistic multiplexed computing
  • the SMC programs are stateless. All hardware resources are multiplexed.
  • the SMC architecture enables the best-effort performance and reliability at the same time.
  • the architecture is equally applicable for parallel accelerators and supercomputer clusters for mission critical applications using the quantum-class features.
  • Tightly coupled bare metal parallel programs are suitable for parallel accelerators. However, they remain under the spell of the “fixed-size” Amdahl speedup bound and of increasing multicore crash failures.
  • Complete program and data decoupling from hardware enables easy scaling for performance and reliability at the same time with fixed or open problem sizes.
  • demonstrating a performance advantage of a decoupling protocol against an optimized tight-coupling protocol is never easy. For example, comparing the packet-switching protocol against the circuit-switching protocol at small scale is pointless. It would be equally difficult to compare parallel performance using the complex decoupling protocol against applications built using highly optimized, tightly coupled bare metal protocols, such as MPI (message passing interface).
  • MPI message passing interface
  • Each ACAN worker controls a native MPI program running on 4 cores.
  • the UVR is constructed between the SMC workers and master.
  • the ACAN wrappers communicate via the Tuple Space abstraction.
  • the wrappers bring two critical benefits: a) they provide fault tolerance without checkpoints, and b) they allow the MPI programs to run on different granularities (partition sizes) without re-compiling.
  • the end-to-end crash protection without checkpoints makes this small SMC experiment inductive. Adding processors (more resources) will only amplify the performance differences with increasing reliability guarantees. This setup establishes the relevance to extreme scale computing.
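  • A schematic sketch of the wrapper loop described above, building on the toy Tuple Space API sketched earlier (which supplies put()/get() and the needed headers); the tuple names, the granularity handling, and the run_mpi_partition() helper are illustrative assumptions, not the actual AnkaCom or Synergy4 code:

    /* Illustrative ACAN worker wrapper. If this worker crashes mid-task, no
       checkpoint is needed: the client's timeout/retransmission protocol
       simply re-posts the unfinished task tuple for another worker. */

    /* Hypothetical stand-in for launching the wrapped native MPI program on
       one task partition; the granularity travels inside the task tuple, so
       the MPI code runs on different partition sizes without re-compiling. */
    static void run_mpi_partition(const char *task, char *result) {
        snprintf(result, 256, "done(%s)", task);
    }

    void worker_loop(void) {
        char key[64], task[256], result_key[80], result[256];
        for (;;) {
            strcpy(key, "task*");        /* match any pending task tuple    */
            if (!get(key, task))         /* exclusive (destructive) fetch;  */
                break;                   /* a real get() would block here   */
            run_mpi_partition(task, result);
            snprintf(result_key, sizeof result_key, "result.%s", key);
            put(result_key, result);     /* the master get()s the results   */
        }
    }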
  • FIG. 18 shows the partitioned program configuration.
  • the test was a parallel dense matrix multiplication program written in MPICH2.
  • the MPI program was measured for multiplication of two 6000 x 6000 square matrices using all 16 cores.
  • the MPI program was loop-order and GCC -03 optimized.
  • the SMC wrapper used a Java implementation called AnkaCom.
  • the native MPI program yielded an average elapsed time of 23.5 seconds without using the mpi_scatterv( ) primitive.
  • the mpi_scatterv( ) primitive leverages the pre-loaded data on multiple nodes without physically broadcasting the matrix data to workers. In this test, both MPI and SMC wrapped MPI must broadcast data physically.
  • FIG. 19 shows that the best SMC tuned performance is 20 seconds at multiple points.
  • the granularity tuning allowed the application's state machine to align the networks and processors so that all tasks complete at approximately the same time. Larger problem sizes and/or more processors will only amplify the performance differences.
  • a more recent test compared an optimized MPI matrix code with the mpi_scatterv( ) zero-copy primitive against the same SMC wrapped MPI code without the benefits of optimized data distribution.
  • the MPI code is also GCC -03, loop order and mpi_scatterv( ) optimized.
  • the SMC wrapper is a new Synergy4 implementation in C.
  • FIG. 20 demonstrates that even with physical broadcast overheads, the ACAN-wrapped MPI program still delivered better performance than the unwrapped MPI program at multiple points.
  • the programs computed the products of two 9000 x 9000 matrices using 12 compute nodes with 288 cores in total (using only 24 cores per 48-core node to test the InfiniBand impacts).
  • the test environment was the NSF Chameleon bare metal cluster.
  • the optimal points lie on a cycloid. In other words, there are multiple optimal points that lead to the shortest computing time absent other overheads (such as network sessions).
  • the optimal size is much smaller than the typical fixed task distribution size.
  • the program/data decoupling effects enabled amortizing the processing overheads in parallel over time. Although the ACAN protocol costs are almost double those of bare metal protocols such as MPI, the performance-amortizing (automated “squish packing”) effects guarantee better performance.
  • FIG. 21 shows the transaction replication switch installed in front of multiple data servers.
  • the data clients connect to the switch for data accesses.
  • the switch performs three functions: a) dynamically serialized synchronous parallel transaction replication with real time data inconsistency test and instant disconnects, b) dynamic load balancing for read-only requests, and c) non-stop data resynchronization between data sources.
  • the complete end-to-end data multiplexing protocol requires all data clients to include the timeout/retransmission discipline. Thus, there is no need to maintain state information in the replication switches; they can be replaced arbitrarily without transaction losses.
  • the replication switch has an in-line parser to inspect each passing query for data changes. For data-changing queries with update conflicts, the switch will perform dynamic serialization to force all data servers to obey an identical commit order. Other queries will be replicated at wire speed. For read-only queries, the switch can either dynamically distribute the load or target a designated “master” to avoid transient inconsistencies caused by physical data transmission delays. As in the compute-intensive SMC architecture, this transaction replication/switching architecture is incomplete without the re-transmission discipline in the data clients. The re-transmission protocol enables multiplexing the replication switch such that it can crash arbitrarily without transaction losses. Multiple redundant switches can also boost transaction processing performance.
  • the non-stop re-synchronization algorithm follows the Mobius strip principle — a paper tape attached at both ends via a half-twist.
  • the mathematical property of this strip is that it has only a single boundary, giving it an infinite “walkable surface”. If we let the transaction service time lie on that single boundary, we then have a non-stop data re-synchronization algorithm (the decentralized atomic 2PC).
  • the idea can be described as follows: a) Create a full backup using one of the synchronously replicated servers as the source, in the background. b) Restore the backup set to all the targets to be resynchronized, in the background. c) Periodically scan the source for data changes; put all target servers on active duty if no change is found, since they are then all synchronized with identical data contents. d) If the scan repeats more than a threshold number of times, pause the switch. The switch will then automatically put all targets on active duty, causing no more than 60 seconds of service downtime regardless of database size. The switch pause stops all incoming transactions and automatically completes all pending transactions and replications. The switch pauses and restarts can also be automated.
  • The switch's strategic network position is crucial to delivering the seamless half-twist; the infinite service time then unfolds itself following the above four steps (a-d).
  • the correctness proof rests on the fact that transaction log time is shorter than transaction processing time. Therefore, the above algorithm produces a monotonically decreasing series of scan times (step c). However, at the end of the scans, the incoming traffic forms a direct queue with the target(s). This can result in an oscillating tail that may never terminate; the manual pause terminates the tail. This situation typically arises under heavy transaction loads.
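  • A schematic, self-contained C sketch of the scan loop in steps (a)-(d) above; every helper is a hypothetical stand-in for the switch's real operations, and the shrinking simulated delta mirrors the monotonically decreasing scan series just noted:

    #include <stdio.h>

    /* Hypothetical stand-ins for the switch's operations. The simulated delta
       halves on every pass because transaction log time is shorter than
       transaction processing time. */
    static int pending_changes = 8;
    static void full_backup_from_live_source(void) { puts("(a) full backup"); }
    static void restore_backup_to_targets(void)    { puts("(b) restore"); }
    static int  scan_source_for_changes(void)      { return pending_changes; }
    static void replicate_delta_to_targets(int n)  { pending_changes /= 2;
                                                     printf("replayed %d changes\n", n); }
    static void activate_targets(void)             { puts("targets live"); }
    static void pause_switch(void)                 { puts("switch paused: draining"); }
    static void resume_switch(void)                { puts("switch resumed"); }

    /* Non-stop resynchronization sketch following steps (a)-(d). */
    void resynchronize(int scan_threshold) {
        full_backup_from_live_source();                /* (a)                  */
        restore_backup_to_targets();                   /* (b)                  */
        for (int scans = 1; ; scans++) {
            int changes = scan_source_for_changes();   /* (c) periodic scan    */
            if (changes == 0) {                        /* identical contents   */
                activate_targets();
                return;
            }
            replicate_delta_to_targets(changes);
            if (scans >= scan_threshold) {             /* (d) oscillating tail */
                pause_switch();                        /* bounded: <= ~60s     */
                replicate_delta_to_targets(scan_source_for_changes());
                activate_targets();
                resume_switch();
                return;
            }
        }
    }

    int main(void) { resynchronize(3); return 0; }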
  • the original CAP conjecture included higher levels of A (availability) and P (partition tolerance) that require breaking data consistency.
  • A and P are satisfied within the confines of consistent data services under extreme partial failure conditions.
  • the decentralized non-stop data resynchronization protocol satisfies the application level end-to-end reliability and performance requirements.
  • the CAP Theorem remains correct for systems using hop-by-hop RPC (remote procedure call) protocols.
  • the proposed ACAN protocol can lift data intensive applications out of the performance and reliability traps.
  • the “shared-nothing” database ideal can indeed become feasible. Stonebraker, M., “The case for shared nothing architecture,” Database Engineering, 9(1), 1986.
  • the blockchain protocol already demonstrated that CAP can be fully satisfied in practice in decentralized environments once arbitrary message losses are eliminated. Nakamoto, S., “Bitcoin: A Peer-to-Peer Electronic Cash System,” https://bitcoin.org/bitcoin.pdf, 2009. Unfortunately, for cryptocurrency applications, the blockchain protocol sacrificed resource multiplexing in pursuit of non-scalable consensus algorithms.
  • the complete ACAN storage unit design with synchronous data replication contains three components: a) a stateless data replication switch, b) a non-stop data re-synchronization algorithm, and c) a client-side redundancy check and re-transmission protocol.
  • This unit design can be scaled indefinitely as described next.
  • load distribution (or data partitioning) is the only way to further expand the performance of the storage system. Without the SMC data decoupling features, every data partition becomes a single point of failure of the entire system, and scaling is limited. Under the SMC framework, each replication switch is responsible for one data partition with R synchronously replicated data servers. Scaling this transaction switching system is done by keeping P » R, where P is the number of data partitions. Higher P delivers better performance. Since R does not change, this infrastructure can scale up to deliver increasingly better performance indefinitely.
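  • As a back-of-the-envelope illustration of the P » R discipline (an assumed independent-failure model, not a formula from this disclosure): with per-replica availability a, each partition's availability depends only on the fixed replication factor R, while aggregate throughput scales with the partition count P:

    A_{\mathrm{partition}} = 1 - (1 - a)^{R} \quad (\text{independent of } P),
    \qquad
    \text{throughput} \approx P \cdot T_{\mathrm{partition}}.

  For instance, with a = 0.99 and R = 2, each partition is available 99.99% of the time, and adding partitions raises throughput without changing that figure.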
  • Each partition's switch can be replaced arbitrarily without transaction losses.
  • a DNS load-balancing function is needed for access to the storage system, since every node is now equal to any other node most of the time.
  • the transaction replication switches form a data intensive active content addressable network.
  • the performance losses in query parsing and statistical multiplexing are compensated by upscaling the infrastructure and amortizing the overheads in parallel.
  • adding resources can improve the transaction performance and reliability indefinitely.
  • a TPC-E benchmark (FIG. 23) was conducted using Microsoft SQL Servers.
  • the TPC-E benchmark simulates a brokerage firm's operation using relational databases.
  • the preliminary results demonstrated the feasibility of high data reliability with near linear transaction processing speedup at the same time.
  • P number of database servers
  • R number of synchronously replicated data sets. Since R defines the reliability of the data store independent of P, these experiments are also inductive.
  • FIG. 25 illustrates consistently superior read and write performance against HDFS. Note that file system meta-data management still requires a switched transactional storage replication layer (FIG. 22).
  • FIG. 26 reports the performance results of the worst-case write performances of a single 2GB file with 2-11 remote replicas via a single 1-Gbit Ethernet switch.
  • while the overall replication time grows as the number of replicas increases, the latency (time) degradation is dramatically slower than with legacy methods.
  • ACAN architecture is naturally suited for real time mission critical applications.
  • ACAN can offer far better performance, reliability and security than the current Controller Area Network (CAN) in self-driving vehicles on land, sea and in space.
  • CAN Controller Area Network
  • the stateless nature of ACAN applications isolates performance and reliability concerns, thus enabling formal functional verification and validation without runtime dynamics modeling.
  • an ACAN cloud application shifts power from cloud vendors back to the application owners, thus creating new business dynamics.
  • the ACAN applications can endure arbitrary infrastructure expansions and contractions without modifying programs, thus cutting project maintenance costs dramatically.
  • ACAN data access model also encourages independent hardware innovations for different parallel workloads using both fixed and open-sized APIs.
  • the ACAN architecture is a catalyst for delivering growing powers of computing clouds with complementing quantum, chemical or biological computers for specific tasks for decades to come.

Abstract

Systems and methods are provided for a virtual network storage as a transaction pool of a transactional application. In particular, the virtual network storage includes a set of gateway servers that form a unidirectional virtual ring as a virtual single-system image of an Active Content Addressable Network (ACAN). Each gateway server manages a partition of a database by driving replicated database servers in a shifted mirror fashion. The ACAN may represent a transaction pool for processing transactions of a blockchain (e.g., crypto-currency). Unlike traditional blockchain crypto-currency protocols based on lower level hop-to-hop networking protocols, the disclosed technology processes data in parallel without addresses or hops by decoupling applications from processing hardware and networks, thereby attaining performance while maintaining scalability and reliability incrementally without infrastructure size limitations. Accordingly, the disclosed technology enables quantum-class cloud computing.

Description

NON-PROVISIONAL PATENT APPLICATION
System and Apparatus of Secure Transaction Processing and Data Stores
CROSS REFERENCES
This application claims priority to and the benefit of U.S. Provisional Patent Application No. 63/261,179, filed on September 14, 2021, the disclosure of which is hereby incorporated herein by reference in its entirety.
BACKGROUND
[0001] Use of cloud computing for transaction processing has become increasingly popular. Examples of transactions in a distributed computing environment include use of blockchain technologies for processing big data, transaction data, and other high-intensity processing applications (e.g., crypto-currencies). As transactions increasingly rely upon distributed systems (e.g., blockchain), demands for improving all aspects of scalability, reliability, and performance in processing mission-critical transactions have risen. Traditionally, transaction systems include a combination of permission-based legacy databases and traditional high performance computing systems. The traditional transaction systems increasingly suffer degraded throughput as the systems scale up, primarily caused by a tradeoff between throughput and reliability.
[0002] In particular, the blockchain protocols leverage secure and non-stop computing in large-scale systems over the Internet, without dedicated trusted servers. The blockchain protocol uses traditional raw hop-to-hop data communication protocols (e.g., TCP/IP sockets and RPC (remote procedure call)). An issue arises when a system attains scalability and reliability while sacrificing throughput. Thus, developing a technology that better meets scalability requirements while minimizing the trade-offs between performance and reliability would be desirable in high- performance computing environments and parallel processing environments.
[0003] It is with respect to these and other general considerations that the aspects disclosed herein have been made. Also, although relatively specific problems may be discussed, it should be understood that the examples should not be limited to solving the specific problems identified in the background or elsewhere in this disclosure. SUMMARY
[0004] According to the present disclosure, the above and other issues are resolved by storing data in Active Content Addressable Networking (ACAN), a distributed tuple space in a virtual network memory using a high-level, single-sided data communication and synchronization protocol.
[0005] This present disclosure relates to a virtual network storage for a high-performance, scalable transaction system. An issue arises as transaction systems scale because the systems tend to sacrifice performance. In particular, high-performance systems (and traditional databases) suffer scalability challenges because adding database servers for processing larger workloads forces a choice between performance and reliability.
[0006] The disclosed technology addresses the issue by providing a virtual data storage including gateway servers and database servers. Respective database servers manage partitions of a database in replicated data stores in the database servers. Each gateway server drives multiple database servers. The gateway servers collectively form a unidirectional virtual ring (UVR). Unlike the traditional lower level hop-to-hop network protocols (e.g., Message Passing Interface, Remote Procedure Call, and the like), the virtual network storage has no receiver “address” or “hop.” Thus, IP addresses associated with nodes or gateway servers are invisible from client applications. An ACAN application includes a UVR of participating gateway servers. The UVR is formed when the ACAN application runs. For example, the disclosed technology provides a transaction pool of crypto transactions (e.g., a mempool of Bitcoin) as an ACAN application. The virtual database provides scalability without loss of performance, reliability, or security as data grows in the ACAN.
[0007] The disclosed technology uses the phrase “service scalability” to mean the ability to expand processors/networks/storage, physical and virtual, for handling ever-increasing workloads without loss in processing performance, reliability, and security.
[0008] This Summary is provided to introduce a selection of concepts in a simplified form, which is further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Additional aspects, features, and/or advantages of examples will be set forth in part in the following description and, in part, will be apparent from the description, or may be learned by practice of the disclosure. BRIEF DESCRIPTIONS OF THE DRAWINGS
[0009] Non- limiting and non-exhaustive examples are described with reference to the following figures.
[0010] FIG. 1 illustrates an overview of an example system for a virtual database in accordance to aspects of the present disclosure.
[0011] FIG. 2 illustrates an example system for storing data in unidirectional virtual ring with shift-mirrored data partitions in accordance with aspects of the present disclosure.
[0012] FIGS. 3A-B illustrate examples of data structures and commands in accordance with aspects of the present disclosure.
[0013] FIG. 4 illustrates an example of an ACAN application as a processing pipeline for processing crypto-currency transaction based on blockchain in accordance with aspects of the present disclosure.
[0014] FIG. 5 illustrates an example time line of using ACAN for processing blockchain transactions in accordance with aspects of the present disclosure.
[0015] FIGS. 6A, 6B and 6C illustrate an example of a method for processing transactions in crypto-currency based on blockchain in accordance with aspects of the present disclosure.
[0016] FIG. 7 is a simplified diagram of a computing device with which aspects of the present disclosure may be practiced.
[0017] FIG. 8 illustrates a Unidirectional Virtual Ring.
[0018] FIG. 9 illustrates an ACAN database transaction processing network.
[0019] FIG. 10 illustrates a Parallel Block Processing Pipeline for blockchain applications.
[0020] FIG. 11 is a graph of a speedup under Amdahl’s Law.
[0021] FIGS. 12A and 12B illustrate the differences between Amdahl’s and Gustafson’s formulations.
[0022] FIG. 13 is a flowchart illustrating translating parallel measurement for Amdahl’s Law.
[0023] FIG. 14 is a graph illustrating Amdahl’s Law with a fixed problem size.
[0024] FIG. 15 is a graph illustrating Amdahl’s Law with an open problem size.
[0025] FIG. 16 is a graph illustrating the hidden behavior of Amdahl’s Speedup Bound.
[0026] FIG. 17 illustrates a conceptual ACAN.
[0027] FIG. 18 illustrates an ACAN wrapper.
[0028] FIG. 19 is a graph illustrating wrapped against native MPI.
[0029] FIG. 20 is a graph illustrating wrapped against native MPI for test 2.
[0030] FIG. 21 is an illustration of a transaction replication switch.
[0031] FIG. 22 is an illustration of the scaling of transaction switches.
[0032] FIG. 23 is a table illustrating TPC-E benchmark results.
[0033] FIG. 24 is a graph illustrating hiding replication overheads.
[0034] FIG. 25 is a graph depicting read/write ACAN against HDFS.
[0035] FIG. 26 is a graph illustrating worst-case write performance.
DETAILED DESCRIPTION
[0036] Various aspects of the disclosure are described more fully below with reference to the accompanying drawings, which form a part hereof, and which show specific example aspects. However, different aspects of the disclosure may be implemented in many different ways and should not be construed as limited to the aspects set forth herein; rather, these aspects are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the aspects to those skilled in the art. Aspects may be practiced as methods, systems, or devices. Accordingly, aspects may take the form of a hardware implementation, an entirely software implementation, or an implementation combining software and hardware aspects. The following detailed description is, therefore, not to be taken in a limiting sense.
[0037] The basic software safety requirement is zero single-point failure for all mission critical services. To date, no enterprise services can meet this basic requirement. The traditional software development methodologies tend to assume 100% hardware reliability. The reliability, security, and scalability of the resulting services are “afterthoughts.” Attempted remedies fall short of meeting the basic software safety requirements.
[0038] The blockchain protocol is a high-level one-sided communication protocol built using raw hop-to-hop data communication protocols, such as TCP/IP sockets and RPC (remote procedure call). The blockchain protocol constructs a virtual ledger processor powered by participating nodes without assuming reliable networks and trusted servers. The blockchain network has no single-point failures. Participating nodes are rewarded with digital tokens for contributing transaction-processing efforts. This seemingly revolutionary feature demonstrated the feasibility of non-stop lossless data services, enabling crypto-currencies and other applications where the traditional trusted database servers fall short.
[0039] The blockchain transaction throughput is relatively low, estimated at 4-7 transactions per second. The protocol requires a single public ledger for all transactions from the “genesis block,” and all transactions must be processed and stored on all nodes without central authority but with network-wide consensus. Different consensus algorithms have been developed to try to increase the transaction throughput. To date, all proposals introduce different risk factors that can weaken the integrity of the virtual ledger processor.
[0040] The blockchain network builds immutable digital assets that are not directly accessible by ordinary users via the Internet. The off-chain networks and transaction processing “exchanges” help bring the blockchain network to ordinary users. The crypto transaction processing exchanges deploy traditional databases or private blockchains for higher transaction processing throughput. The ultimate transaction settlements are done on the public blockchain. These exchanges suffer the scaling challenges of databases and of both public and private blockchains. The exchange security appears to be the most vulnerable point.
[0041] Traditional databases enjoy millisecond transaction processing throughput. However, these databases suffer a different scalability challenge: when adding database servers for processing larger workloads, the infrastructure must choose between performance and reliability. Improving both performance and reliability while adding database servers for scalability has been an open challenge.
[0042] As discussed in more detail below, the present disclosure relates to a virtual data storage that sustains the scalability requirements of process-intensive and data intensive transactional applications, including blockchain applications. Transaction processing is typically done by database systems, SQL or non-SQL databases. Unlike the legacy parallel applications, the disclosed technology a) decouples programs from processors and networks, b) enables statistic multiplexing of hardware components, and c) requires a client-side timeout/retransmission discipline to enable the potentially infinite scaling. The disclosed technology effectively provides a paradigm shift from client-server computing to complete “serverless” computing.
[0043] FIG. 1 illustrates an overview of an example system of a virtual database in accordance with aspects of the present disclosure. System 100 may represent a system in the cloud or an on-premises data center. The system 100 includes a client application 102, a network 136, and an active content addressable network (ACAN) 150. In aspects, ACAN 150 provides a virtual data storage where a virtual database includes multiple partitions. Database gateways replicate and maintain shifted mirroring of respective database partitions. ACAN 150 is formed on gateway servers A-F (104A-F); the gateways drive multiple database servers A-F (110A-F). The gateway network forms a unidirectional virtual ring 120. In aspects, the system 100 represents a virtual database including six partitions, each partition managed by a gateway server. Each partition is replicated for reliability (e.g., two database servers). Accordingly, in this example, each of the six gateway servers 104A-F connects to two database servers. For example, the gateway server A 104A connects to the database server A 110A and the database server B 110B. The gateway server B 104B connects to the database server B 110B and the database server C 110C. The gateway server C 104C connects to the database server C 110C and the database server D 110D. The gateway server D 104D connects to the database server D 110D and the database server E 110E. The gateway server E 104E connects to the database server E 110E and the database server F 110F. And the gateway server F 104F connects to the database server F 110F and the database server A 110A. A unidirectional virtual ring 120 (UVR) forms in a counterclockwise direction (i.e., from the gateway server A 104A toward the gateway server B 104B).
[0044] In aspects, each gateway server manages a database partition by driving two synchronous database servers in a shifted mirror manner. For example, the database server A 110A also stores a replica of database partition F, and the database server B 110B also stores a mirrored image of database partition A. Further, each gateway server has one or more backup servers configured on the DNS to automatically fail over when the primary gateway server becomes unresponsive.
[0045] In some aspects, the ACAN 150 represents an ACAN application. The number of partitions and the number of replications for respective partitions are not limited to the example system 100. The ACAN database network is a network of transaction processing gateways. The ACAN network may deliver scalable transaction processing performance without reliability or security degradation. In the ACAN transaction network, for a database with P partitions, each partition has K replicas in a shifted mirroring fashion on the UVR; as long as P » K, the ACAN transaction network may deliver increasing performance without negative reliability and security impacts.
[0046] In certain aspects, a compute-intensive ACAN application includes a client (master) that creates tasks to be processed and a solver program runnable on any number of nodes (workers). When the ACAN application runs, its runtime tuple matching system will automatically form types of parallel processing, including single instruction, multiple data (SIMD) and multiple instruction, multiple data (MIMD). Pipelined parallel processing clusters leverage the UVR as the common data plane. Available processors and networks may process the tasks in parallel. The client program has a retransmission protocol that will resend a task when the processing times out.
[0047] A critical difference between the ACAN parallel application and legacy parallel applications is the inclusion of the timeout/retransmission protocol in the clients and the hardware statistic multiplexing effects enabled by the ACAN protocol. The client timeout/retransmission protocol will automatically recover all network and processor failures while allowing the best-effort performance of the entire processing platform.
[0048] The ACAN parallel application will eventually terminate regardless of partial network and node failures. Tuning the task size (granularity) can optimize the parallel performance according to the Brachistochrone curve, an equilibrium of computing and communication overheads for solving problems of different sizes. The ACAN application will enter an infinite wait cycle when all nodes and networks fail at the same time.
[0049] ACAN enables decentralized processing for traditionally centralized parallel processing applications. Security protocol processing can also be parallelized to minimize the timing impact without reliability concerns.
[0050] For parallel applications with a fixed problem size, adding nodes (e.g., gateway servers and corresponding database servers) to the ACAN network will improve performance subject to the economic law of diminishing returns. There are no negative consequences in application reliability and security. Accordingly, expanding these applications for solving bigger problems by employing more networking, processing and storage hardware can improve performance and reliability at the same time. For example, the virtual database may need additional partitions as data grows. The ACAN network may add gateway servers and corresponding database servers while maintaining a smaller number of data replications. The UVR may expand by including the additional gateway servers accordingly.
[0051] As should be appreciated, the various methods, devices, applications, features, etc., described with respect to FIG. 1 are not intended to limit the system 100 to being performed by the particular applications and features described. Accordingly, additional controller configurations may be used to practice the methods and systems herein and/or features and applications described may be excluded without departing from the methods and systems disclosed herein.
[0052] FIG. 2 illustrates an example system for storing partitions of a database in accordance with aspects of the present disclosure. The system 200 is a part of an ACAN (e.g., the ACAN 150 as shown in FIG. 1). The system represents a virtual database with six partitions. Two database servers replicate each partition for reliability. The system 200 includes gateway server A 202A, gateway server B 202B, gateway server C 202C, database server A 204A, database server B 204B, and database server C 204C. The gateway server A 202A connects to the database server A 204A and database server B 204B. The gateway server B 202B connects to the database server B 204B and the database server C 204C. The gateway server C 202C connects to the database server C 204C. While not shown, the gateway server C 202C also connects to another database server.
[0053] In aspects, the database server A 204A manages partition 6b 206A and partition 1a 206B. The database server B 204B manages partition 1b 208A and partition 2a 208B. The database server C 204C manages partition 2b 210A and partition 3a 210B, etc.
[0054] In aspects, the partition 1a 206B and the partition 1b 208A are mirrored, replicated images of partition 1 of a virtual database. Similarly, the partition 2a 208B and the partition 2b 210A are mirrored, replicated images of partition 2 of the virtual database. Accordingly, each gateway server manages a partition of the virtual database by driving two synchronous database servers in shifted mirroring.
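A minimal C sketch of the shifted-mirroring placement rule implied by FIG. 2, under which partition i keeps its primary copy on database server i and its mirror on the next server around the ring; the function and variable names are illustrative, not from the disclosure:

    #include <stdio.h>

    /* Shifted mirroring on a ring of P database servers: partition i keeps its
       primary copy ("ia") on server i and its mirror ("ib") on server
       (i + 1) mod P, so every server holds one primary plus the mirror of its
       predecessor's partition. */
    static void replica_servers(int partition, int num_servers, int out[2]) {
        out[0] = partition % num_servers;          /* primary copy   */
        out[1] = (partition + 1) % num_servers;    /* shifted mirror */
    }

    int main(void) {
        int s[2];
        for (int p = 0; p < 6; p++) {              /* six partitions, as in FIG. 2 */
            replica_servers(p, 6, s);
            printf("partition %d -> servers %c and %c\n",
                   p + 1, 'A' + s[0], 'A' + s[1]);
        }
        return 0;
    }

Running this prints, for example, "partition 1 -> servers A and B" and "partition 6 -> servers F and A", matching the layout described above.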
[0055] In aspects, each gateway server performs the following three tasks:
[0056] a) Synchronous replication for data changing transactions with dynamic serialization. The database servers are forced to execute queries in the exact same order in synchrony if they update the same data. Otherwise, transactions will be replicated at wire speed, synchronizing the corresponding database servers in real time.
[0057] b) Dynamic load distribution for read-only transactions. Corresponding database servers contribute evenly, leveraging the synchronously replicated database servers (e.g., two replicated database servers in the example system 200 as shown in FIG. 2).
[0058] c) Non-stop database resynchronization. When a replicated transaction results in inconsistency across multiple servers, the gateway’s voting algorithm determines the winner(s). The losers are instantly disconnected to ensure database consistency. The gateway is also responsible for rebuilding and reconnecting the disconnected servers in parallel. For example, the rebuilding process based on a “Mobius strip algorithm” takes no more than 60 seconds of service downtime, independent of database size. Thus, the gateway eliminates database server single-point failures.
[0059] Similar to the parallel application client, the ACAN database client also needs a transaction retransmission protocol for when a query times out. This retransmission protocol always checks the status of the timed-out transaction on the target server before re-sending the transaction. This eliminates the “double spending” risks and enables the ACAN transaction network to overcome gateway single-point failures by automatically deploying backup gateways and any live database servers under the same public gateway IP address (resolved by a global or local DNS server).
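An illustrative, self-contained C sketch of the client-side discipline just described: on a timeout the client first asks whether the timed-out transaction already committed on the target before re-sending, which is what eliminates the “double spending” risk. All helper names and the simulated transport are assumptions, not the disclosed implementation:

    #include <stdio.h>
    #include <stdlib.h>

    typedef enum { SENT_OK, TIMED_OUT } SendResult;
    typedef enum { COMMITTED, UNKNOWN } TxStatus;

    /* Hypothetical transport stubs: submit() times out at random to simulate
       gateway or network failures; query_status() asks the (possibly failed
       over) gateway whether the transaction already committed. */
    static SendResult submit(const char *txid, const char *query) {
        (void)txid; (void)query;
        return (rand() % 3 == 0) ? TIMED_OUT : SENT_OK;
    }
    static TxStatus query_status(const char *txid) {
        (void)txid;
        return (rand() % 4 == 0) ? COMMITTED : UNKNOWN;
    }

    /* Client-side timeout/retransmission discipline: never blindly re-send;
       first check whether the timed-out transaction already took effect. */
    int run_transaction(const char *txid, const char *query) {
        for (int attempt = 1; attempt <= 5; attempt++) {
            if (submit(txid, query) == SENT_OK)
                return 1;                    /* acknowledged                  */
            if (query_status(txid) == COMMITTED)
                return 1;                    /* it landed; re-sending would   */
                                             /* risk double spending          */
            fprintf(stderr, "attempt %d timed out, retransmitting %s\n",
                    attempt, txid);          /* safe to retry                 */
        }
        return 0;                            /* surface the failure           */
    }

    int main(void) {
        return run_transaction("tx-0001", "UPDATE accounts SET ...") ? 0 : 1;
    }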
[0060] Unless all gateways and replicated database servers crash at the same time, the domain name system (DNS) is compromised, or the entire network blacks out, database queries will be processed and delivered as long as there exists a single path connecting the client with at least one server.
[0061] As should be appreciated, the various methods, devices, applications, features, etc., described with respect to FIG. 2 are not intended to limit the system 200 to being performed by the particular applications and features described. Accordingly, additional controller configurations may be used to practice the methods and systems herein and/or features and applications described may be excluded without departing from the methods and systems disclosed herein.
[0062] FIGS. 3A-B illustrate examples of data associated with a tuple system in accordance with aspects of the present disclosure. FIG. 3A illustrates an example data structure 300A of a tuple that stores transactional data of a crypto-currency (e.g., Bitcoin) based on blockchain in accordance with aspects of the present disclosure. The tuple has a form including a key 302 and a value 304. In aspects, the key 302 includes a transaction identifier (ID) 310 and an R-count 312. In aspects, the R-count 312 represents a number of miners working on the same transaction. That is, the R-count 312 describes a number of readers (e.g., miners in a Bitcoin transaction) that access the transaction data. Each miner, according to the Bitcoin protocol, for example, has an asynchronous listener waiting for a verified block tuple’s broadcast. Value 304 may include the content of the transaction.
[0063] FIG. 3B illustrates protocols for accessing tuples. In some aspects, the tuple space provides the following three protocols: put, get, and read. Put (key, value) inserts a tuple based on a set of a key and a value as specified. Get (&key, &value) retrieves a tuple (“&” indicates the mutability of the parameter when a key match is found in the tuple space); once matched, the matching tuple is removed from the tuple space. Read (&key, &value) reads a tuple; the matching tuple remains in the tuple space after the tuple is read.
[0064] In aspects, the three protocols may provide the following additional features when ACAN implements a transaction pool in processing a crypto-currency based on blockchain. Put (Name, Tuple, &Count): where Count controls the number of read/get operations this tuple will allow. When Count reaches zero, the tuple is deleted from the space. Get (&Name, &Buffer): where & indicates the parameter mutability when a matching tuple is found. The matching tuple will be deleted from the space. Read (&Name, &Buffer): where & indicates the parameter mutability when a matching tuple is found. The matching tuple persists in the space if the tuple Count is not zero. Otherwise, it will be removed from the space.
[0065] For example, a sender can put (“Speaker1”, “Bob”) into the network. Any program running key=”Speaker*” and read(&key, &buffer) will result in key=”Speaker1” and buffer=”Bob”. The information “Speaker1 = Bob” is a broadcast to the entire network. Using put and get can implement “exclusive fetch” or the message passing functionality. Unlike the lower level hop-to-hop networking protocols, such as MPI (Message Passing Interface), RPC (Remote Procedure Call) and others, there is no concept of a receiver “address.” There is no “hop” in ACAN applications. In other words, the node IP addresses are invisible in ACAN applications. Since ACAN runs on all nodes and controls all network connections, ACAN applications are completely decoupled from processing hardware and networks. An ACAN application forms a UVR (unidirectional virtual ring) of participating nodes. The UVR forms when the application runs. FIG. 1 illustrates a running application’s UVR.
[0066] FIG. 4 illustrates an example system for processing transactional blocks based on blockchain in accordance with aspects of the present disclosure. System 400 includes a plurality of miners. Miner A 402A, Miner B 402B, and Miner C 402C collectively represent miners working on respective transactions associated with blocks stored in the Active Content Addressable Network 410 (e.g., the ACAN 150 as shown in FIG. 1). A plurality of blocks, including block H-max+1 404A, block H-max+2 404B, block H-max+3 404C, and block H-max+4 404D, represents a set of transaction blocks that are yet to be inserted into a blockchain. In aspects, the Active Content Addressable Network 410 represents a transaction pool in crypto-currencies (e.g., a mempool of Bitcoin). Blockchain 412 includes another set of transaction blocks (block 406A, block H-max-1 406B, and block H-max 406C). The arrow 404 represents a block pipeline with a queue length of four as an example.
[0067] In aspects, the system 400 represents a virtual ledger processor. Since the disclosed technology allows direct block retrieval by specifying a transaction or block ID via the ACAN, there may be no need to store the entire ledger on any one node. As more miners join the ACAN, the ledger may grow accordingly with ever increasing storage capabilities to accommodate it.
[0068] Transaction broadcasts are limited by R-count so that only R-count copies of the ledger exist in the blockchain network. The transaction blocks (transaction logs) may be randomly distributed. Each validated transaction may be packed into R-count blocks that are to be group-mined in parallel. Each node packs its own transaction blocks; the distribution of transactions may automatically be randomized.
[0069] The value of R-count may be adjusted based on security requirements of the ledger. Once R-count is determined, adding miners to the network will increase the blockchain transaction throughput by forming pipelined parallel processors with fixed broadcast/block synchronization overheads. Network storage capabilities also increase incrementally.
[0070] The virtual ledger can now grow due to the automatic constrained block replication and distribution. All nodes will still partake in the ledger processing and storage, but in a randomized fashion, thus improving blockchain security gradually.
[0071] In aspects, electricity waste may be dramatically reduced since the miners are automatically informed when the R-count is reached. The waiting miners enter a low-energy sleep mode. They become ready when the next transaction block is to be built, as soon as the pending block ID is confirmed. The implementation of R-count adjustments is facilitated by the UVR (Unidirectional Virtual Ring) protocol in ACAN, through the virtual single-system image delivered by the UVR. Accordingly, unlike the blockchain protocol, the ACAN enables centralized control over decentralized processes.
[0072] The AC-chain can scale as long as R-count « the number of total miners. In aspects, the minimal R-count is 7 (= 3f + 1 with f = 2). This Byzantine failure prevention formula prevents any f = 2 nodes from conspiring together for a hostile takeover attack.
[0073] As should be appreciated, the various methods, devices, applications, features, etc., described with respect to FIG. 4 are not intended to limit the system 400 to being performed by the particular applications and features described. Accordingly, additional controller configurations may be used to practice the methods and systems herein and/or features and applications described may be excluded without departing from the methods and systems disclosed herein.
[0074] FIG. 5 is a timing chart of communications among users 502, a pending transaction pool 504, a miner 506, and a blockchain 508. Generally, the timing chart 500 starts with users performing a transaction using a crypto-currency, followed by a put operation 510, and ends with an insert operation 528. The timing chart 500 may include more or fewer steps or may arrange the order of the steps differently than those shown in FIG. 5. The timing chart 500 can be executed as a set of computer-executable instructions executed by a computer system and encoded or stored on a computer readable medium. Further, processing according to the timing chart 500 can be performed by gates or circuits associated with a processor, an ASIC, an FPGA, a SOC or other hardware device. Hereinafter, the timing chart 500 shall be explained with reference to the systems, components, devices, modules, software, data structures, data characteristic representations, signaling diagrams, methods, etc. described in conjunction with FIGS. 1, 2, 3A-B, 4, 6, and 7.
[0075] The users 502 (e.g., a buyer and/or a seller using a crypto-currency) send a put operation 510 to the pending transaction pool 504. In aspects, the pending transaction pool 504 includes an ACAN-based tuple space (e.g., the ACAN 150 as shown in FIG. 1). The put operation 510 may include a transaction ID and data associated with the transaction.
[0076] The pending transaction pool 504 stores the tuple in the virtual database. In aspects, a name of a tuple may be a combination of the transaction ID and R-count. The R-count represents a number of miners working on the transaction. The R-count effectively limits a number of miners that can access the transaction data. The pending transaction pool 504 replicates the partition(s) associated with the inserted tuple using the UVR of gateway servers and the database servers with shifted mirroring of data.
[0077] When the miner 506 generates a new block tuple with the R-count, the listeners of miners associated with the transaction are notified. In aspects, the miner 506 sends a read operation 518 to the pending transaction pool 504 to read the transaction data by specifying the transaction ID. In some aspects, the read operation 518 retrieves data associated with the transaction without removing the tuple that represents the transaction. Accordingly, the pending transaction pool 504 sends the transaction data 520 to the miner 506. The miner validates (516) the transaction data using cryptographic signatures (e.g., hash values), then assembles the validated transactions into a block (526). The miner then sends a get operation (522) specifying the transaction ID. The pending transaction pool 504 removes the tuple and sends (524) the transaction data to the miner 506. Accordingly, by an insert operation (528), the miner 506 inserts the transaction block into the blockchain 508.
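The put/read/get exchanges above map directly onto the tuple operators. The following sketch, building on the toy Tuple Space API given earlier with the FIG. 3B discussion (which supplies read_tuple()/get() and the needed headers), shows one miner's side of the flow; the validation, block assembly, and chain append helpers are hypothetical stubs:

    #include <stdio.h>

    /* Hypothetical stubs for the mining steps (516, 526, 528). */
    static int  validate_tx(const char *tx)             { return tx[0] != '\0'; }
    static void assemble_block(const char *tx, char *b) { snprintf(b, 512, "block[%s]", tx); }
    static void append_to_chain(const char *b)          { printf("new tip: %s\n", b); }

    /* One miner's flow against the pending transaction pool:
       read (518/520) leaves the tuple in place, so up to R-count miners can
       work on it in parallel; get (522/524) removes it once settled. */
    int mine_pending_transaction(const char *tx_id) {
        char key[64], tx_data[256], block[512];

        snprintf(key, sizeof key, "%s", tx_id);
        if (!read_tuple(key, tx_data))     /* non-destructive read           */
            return 0;

        if (!validate_tx(tx_data))         /* check cryptographic signatures */
            return 0;

        assemble_block(tx_data, block);    /* pack validated tx into a block */

        snprintf(key, sizeof key, "%s", tx_id);
        get(key, tx_data);                 /* remove the settled transaction */
        append_to_chain(block);            /* insert operation (528)         */
        return 1;
    }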
[0078] As should be appreciated, operations 510-528 are described for purposes of illustrating timing of the present methods and systems and are not intended to limit the disclosure to a particular sequence of steps, e.g., steps may be performed in a different order, additional steps may be performed, and disclosed steps may be excluded without departing from the present disclosure.
[0079] FIG. 6A illustrates an example of a method of processing a crypto-currency using ACAN in accordance with aspects of the present disclosure. A general sequence of operations of the method 600A is shown in FIG. 6A. Generally, the method 600A starts with a start operation 602 and ends with an end operation 624. The method 600A may include more or fewer steps or may arrange the order of the steps differently than those shown in FIG. 6A. The method 600A can be executed as a set of computer-executable instructions executed by a computer system and encoded or stored on a computer readable medium. Further, the method 600A can be performed by gates or circuits associated with a processor, an ASIC, an FPGA, a SOC or other hardware device. Hereinafter, the method 600A shall be explained with reference to the systems, components, devices, modules, software, data structures, data characteristic representations, signaling diagrams, methods, etc. described in conjunction with FIGS. 1, 2, 3A-B, 4, 5, 6B-6C, and 7.
[0080] Following the start operation 602, a receive operation 604 receives transaction data by the ACAN. In aspects, either or both a buyer and a seller of a transaction insert the transaction data as a tuple of a virtual database (e.g., the ACAN 150 as shown in FIG. 1).
[0081] Determine operation 608 determines a value of R-count. In aspects, the R-count represents a number of miners working on the same transaction. That is, the R-count describes a number of readers (e.g., miners in Bitcoin transaction) that access the transaction data. The ACAN may remove the transaction tuple when R-count becomes zero.
[0082] Store operation 610 stores the transaction data in the ACAN. In aspects, the transaction data is represented by a tuple stored in a partition of the virtual database. In aspects, miners may ignore the R-count when the miners pack transaction blocks. The miners proceed as in the original Bitcoin protocol. The ACAN tuple space daemon may remove the transaction tuple when R-count reaches zero. The mechanism ensures that no more than R-count miners work on the same transaction at a time. Each miner also has an asynchronous listener waiting for a verified block tuple’s broadcast from other miners.
[0083] Generate operation 612 generates, by a miner, a new validated block tuple with a new cryptographic signature (e.g., a hash). Notify operation 614 notifies the miners’ listeners so that the miners can fetch the validated block. In aspects, blocks are randomly distributed to miners. Each transaction may only be replicated R-count times. In some aspects, the value of R-count may be based on the safety of a ledger. The value may be large enough to counter hostile takeover attempts but smaller than the total number of miners.
[0084] Read operation 616 reads, by the miner, the validated transaction block broadcast from another miner. Only R-count copies may be read by miners. Unsuccessful listeners may continue their low-energy waiting states.
[0085] Verify operation 618 verifies, by the miner, the transaction block based on immutable cryptographic signatures (e.g., hash values).
[0086] Remove operation 620 removes, by the ACAN (i.e., the virtual database), the tuple that corresponds to the transaction. In aspects, a get operation as provided by the ACAN removes the tuple when the value of R-count is zero. The transaction data as a tuple corresponds to a partition of the virtual database. [0087] Store operation 622 is performed by the miner: the miner validates the broadcast block, compares the POW (proof of work) against the local block, and stores the winning POW-verified transaction block in the local blockchain as the new tip. The losing block is discarded.
[0088] As should be appreciated, operations 602-624 are described for purposes of illustrating the present methods and systems and are not intended to limit the disclosure to a particular sequence of steps, e.g., steps may be performed in a different order, additional steps may be performed, and disclosed steps may be excluded without departing from the present disclosure.
[0089] FIG. 7 illustrates a simplified block diagram of a device with which aspects of the present disclosure may be practiced. The device may be a mobile computing device, for example. One or more of the present embodiments may be implemented in an operating environment 700. This is only one example of a suitable operating environment and is not intended to suggest any limitation as to the scope of use or functionality. Other well-known computing systems, environments, and/or configurations that may be suitable for use include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, programmable consumer electronics such as smartphones, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
[0090] In its most basic configuration, the operating environment 700 typically includes at least one processing unit 702 and memory 704. Depending on the exact configuration and type of computing device, memory 704 (storing, among other things, instructions to perform the methods described herein) may be volatile (such as RAM), non-volatile (such as ROM, flash memory, etc.), or some combination of the two. This most basic configuration is illustrated in FIG. 7 by dashed line 706. Further, the operating environment 700 may also include storage devices (removable, 708, and/or non-removable, 710) including, but not limited to, magnetic or optical disks or tape. Similarly, the operating environment 700 may also have input device(s) 714 such as remote controller, keyboard, mouse, pen, voice input, on-board sensors, etc. and/or output device(s) 716 such as a display, speakers, printer, motors, etc. Also included in the environment may be one or more communication connections, 712, such as LAN, WAN, a near-field communications network, a cellular broadband network, point to point, etc.
[0091] Operating environment 700 typically includes at least some form of computer readable media. Computer readable media can be any available media that can be accessed by processing unit 702 or other devices comprising the operating environment. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other tangible, non-transitory medium which can be used to store the desired information. Computer storage media does not include communication media. Computer storage media does not include a carrier wave or other propagated or modulated data signal.
[0092] Communication media embodies computer readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media.
[0093] The operating environment 700 may be a single computer operating in a networked environment using logical connections to one or more remote computers. The remote computer may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above as well as others not so mentioned. The logical connections may include any method supported by available communications media. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet. FIG. 7 illustrates an example computer system with multiple network connections. In examples, all components in FIG. 7 have non-deterministic finite service life.
[0094] The description and illustration of one or more aspects provided in this application are not intended to limit or restrict the scope of the disclosure as claimed in any way. The aspects, examples, and details provided in this application are considered sufficient to convey possession and enable others to make and use the best mode of the claimed disclosure. The claimed disclosure should not be construed as being limited to any aspect, example, or detail provided in this application. Regardless of whether shown and described in combination or separately, the various features (both structural and methodological) are intended to be selectively included or omitted to produce an embodiment with a particular set of features. Having been provided with the description and illustration of the present application, one skilled in the art may envision variations, modifications, and alternate aspects falling within the spirit of the broader aspects of the general inventive concept embodied in this application that do not depart from the broader scope of the claimed disclosure.
[0095] The present disclosure relates to systems and methods for storing transaction data in a virtual storage according to at least the examples provided in the sections below. In one aspect, some embodiments include a computer-implemented method of storing transaction data in a database, the method comprising: receiving transaction data; determining a maximum read count associated with the transaction data, wherein the maximum read count is based at least on a number of entities allowed to verify a transaction associated with the transaction data; storing the transaction data in a tuple, wherein the tuple includes a tuple key, and wherein the tuple key includes a transaction identifier and the maximum read count; receiving a request for the transaction data; responsive to receiving the request, transmitting the transaction data; and removing the transaction data.
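For illustration only, the following Python sketch shows one way the method above could behave; a plain in-memory dictionary stands in for the virtual storage, and all names (TupleStore, put, read) are hypothetical rather than the disclosed implementation.

# Illustrative sketch only: an in-memory dict stands in for the virtual
# storage; the class and method names are hypothetical.
class TupleStore:
    def __init__(self):
        self._tuples = {}  # tuple key -> [transaction data, remaining reads]

    def put(self, tx_id, data, max_read_count):
        # The tuple key combines the transaction identifier with the maximum
        # read count, i.e., the number of entities allowed to verify.
        key = f"{tx_id}|{max_read_count}"
        self._tuples[key] = [data, max_read_count]

    def read(self, tx_id, max_read_count):
        # Transmit the transaction data in response to a request, decrement
        # the remaining read count, and remove the tuple when it reaches zero.
        key = f"{tx_id}|{max_read_count}"
        entry = self._tuples[key]
        entry[1] -= 1
        if entry[1] == 0:
            del self._tuples[key]
        return entry[0]

store = TupleStore()
store.put("tx-001", b"signed transaction payload", 7)
payload = store.read("tx-001", 7)  # six verifying reads remain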
[0096] One embodiment contemplates a non-transitory computer-readable medium with instructions stored thereon, that when executed by a processor, facilitate network-scale parallel computing by performing the following steps: executing a computational process from at least one client node; distributing a set of computational instructions and a set of input data derived from the computational process to a plurality of worker nodes via a content addressable service network; maintaining a global data store comprising the set of computational instructions, the set of input data, and a set of result data; and sending a first single-sided communication message from the at least one client node to the plurality of worker nodes over a content addressable UVR network; wherein the first single-sided communication message sent from the at least one client node is assigned an expected completion time; wherein the plurality of worker nodes competes on the content addressable UVR network; and wherein the at least one client node sends a retransmission of the first single-sided communication message if a first response is not received within the expected completion time, and the retransmission of the first single-sided communication message is sent via a different network path than the first single-sided communication message, provided that at least one of the plurality of network nodes is active and connected to the content addressable UVR service network, wherein the content addressable UVR service network is implemented using a tuple space abstraction with (key, value) pairs and data parallel semantics.
[0097] In this embodiment, the tuple space abstraction may be implemented with a single-sided communication interface.
[0098] At least one of the client nodes communicates with a DNS server that records domain information in a synchronously replicated network-scale storage using the content addressable UVR service network.
[0099] The first single-sided communication message comprises a key, and the steps further comprise: responding to the query message with a value or set of values that correspond to the key; and placing the value or set of values that correspond to the key in a shadow state; wherein the value or set of values remain in the shadow state until a second query message is received indicating that the value or set of values should be removed from the shadow state.
[0100] The instructions may further comprise a replication markup language comprising "lock" and "unlock" statements for queries updating data concurrently, configured to protect data replication consistency and, optionally, wherein the replication markup language has the form of (lock, name) and (unlock, name).
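One plausible reading of the (lock, name) and (unlock, name) statements is sketched below in Python; the token-tuple approach, the single-threaded stand-in space, and all names are assumptions for illustration only.

# Hypothetical sketch: the lock is a token tuple that concurrent updaters
# must remove (get) before writing and restore (put) afterwards, so updates
# of the same data are serialized across replicas.
class ToyLockSpace:
    def __init__(self):
        self.tokens = set()

    def put(self, token):
        self.tokens.add(token)

    def get(self, token):
        # A real space would block here until the token tuple appears;
        # this toy version assumes single-threaded use.
        self.tokens.remove(token)

def replicated_update(space, name, apply_update):
    space.get(("lock", name))      # (lock, name): acquire by removing the token
    try:
        apply_update()             # replicas apply the update in the same order
    finally:
        space.put(("lock", name))  # (unlock, name): release by restoring it

space = ToyLockSpace()
space.put(("lock", "accounts"))    # token created once per protected name
replicated_update(space, "accounts", lambda: print("balance updated"))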
[0101] Further steps may include (i) providing a wrapped communication interface for an existing parallel processing application that uses an application programming interface selected from the group consisting of the Message Passing Interface (MPI), OpenMP, Remote Method Invocation (RMI) and Remote Procedure Call (RPC); (ii) translating a first direct message, received from the existing parallel processing application, into a first single-sided (key, value) Tuple Space operation; and (iii) translating a second single-sided (key, value) Tuple Space operation into a second direct message for delivery to the existing parallel processing application and, optionally, wherein the wrapped communication interface is capable of changing a processing granularity of the parallel process without recompiling the set of computational instructions, and is capable of fault tolerant non-stop operation without checkpoints.
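The translation in steps (ii) and (iii) can be pictured with a short Python sketch; mpi_send and mpi_recv below are hypothetical stand-ins for the intercepted legacy point-to-point calls, not real MPI bindings, and a plain dict stands in for the network Tuple Space.

# Hypothetical wrapper sketch: a dict-based space stands in for the
# network Tuple Space; the function names are illustrative only.
space = {}

def mpi_send(dest_rank, tag, payload):
    # A direct, addressed message becomes a single-sided (key, value) put;
    # the receiver is named in the key rather than by an IP address.
    space[f"msg/{dest_rank}/{tag}"] = payload

def mpi_recv(my_rank, tag):
    # A blocking receive becomes a single-sided get that consumes the
    # matching tuple, preserving message-passing semantics.
    return space.pop(f"msg/{my_rank}/{tag}")

mpi_send(dest_rank=1, tag=0, payload=b"halo-exchange data")
print(mpi_recv(my_rank=1, tag=0))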
[0102] In another embodiment a network-scale distributed storage system may comprise (i) at least one client node; (ii) a plurality of storage nodes connected by a set of redundant communication links in a content addressable UVR service network implemented with the Tuple Space abstraction using a Statistic Multiplexed Computing protocol; (iii) wherein each storage node comprises a transaction replication gateway, configured to provide dynamic serialization of synchronously replicated concurrent update queries, dynamic query load balancing, and non-stop database resynchronization.
[0103] In an embodiment, the plurality of storage nodes may be configured to communicate with one another via a single-sided (key, value) API.
[0104] A DNS server may be included hosting a single public service URL mapped to all nodes in the content addressable UVR service network and, optionally, wherein the DNS server comprises a network-scale distributed storage system comprising a content addressable UVR service network.
[0105] In an embodiment, at least one client node may implement a redundancy elimination protocol, and wherein the at least one client node uses the content addressable UVR service network for service.
[0106] In another embodiment a network-scale parallel computing system is disclosed that comprises: a client node; and a UVR content addressable service network, comprising: a plurality of distributed Statistic Multiplexed Computing nodes having a collective set of resources, and implementing a Tuple Space abstraction; and a plurality of redundant networks, wherein each of the plurality of distributed Statistic Multiplexed Computing nodes is communicatively connected to at least one of the plurality of redundant networks; wherein the client node is connected to the content addressable service network via a plurality of redundant network connections; and wherein the content addressable service network is configured to completely decouple programs and data from processors, storage, and communication hardware.
[0107] In an embodiment, a partial failure in any processor, network and storage does not irreparably interrupt execution of the set of instructions.
[0108] In any embodiment, the client node implements a retransmission and redundancy elimination protocol upon a timeout event on a task executing on a first node, the retransmission and redundancy elimination protocol configured to execute from different processors, networks and storage from the first node, and/or wherein each worker node comprises a data and transaction replication gateway, configured to ensure parallel synchronous replication on multiple nodes and non-stop data resynchronization for storage or database recovery.
[0109] In a further embodiment a system is disclosed comprising a client-side network and a service-side network; wherein the service-side network employs programming using a single-sided Statistic Multiplexed Computing protocol; and wherein the client-side network employs programming comprising: direct client to service protocols; at least one timeout retransmission; and redundancy elimination protocols.
[0110] Any of the one or more above aspects in combination with any other of the one or more aspects. Any of the one or more aspects as described herein.
Exemplary Embodiment 1
[0111] A.l Mission Critical Energy-Efficient Infinite Scale Blockchain
[0112] The proliferation of Internet-enabled services has made software safety the leading research and development challenge in recent history. The public blockchain protocol experiments demonstrated that secure and non-stop computing, considered impossible by IT experts, is practically feasible at large scale without dedicated trusted servers. The immutable virtual ledger built on unreliable networks and computers not only enabled financial services for people who were out of the reach of traditional banks, but also inspired many enterprise asset management applications. From high-value asset tracking, cryptocurrencies, blockchain passports, and medical records to blockchain IoT (Internet of Things) applications, all promise transformative changes to the existing technology landscape. [0113] However, it was reported that as many as 92% of enterprise blockchain experiments have failed. These failures exposed fundamental flaws in the protocol design and practical deployments.
[0114] The most visible blockchain shortcoming is its scalability. The requirements that a single ledger record all transactions and that all nodes process and store the same ledger make the protocol appear impossible to scale from the outset.
[0115] Applying a complex protocol, such as blockchain, to solving practical problems requires a thorough understanding of the protocol's strengths and weaknesses, and of the potential unintended consequences that may arise. Blockchain, like other technologies, does not live in a vacuum devoid of any significant linkage to organizational and societal norms, dysfunctions and purposes.
[0116] Traditional software development methodologies assume 100% hardware reliability. The reliability, security and scalability of the resulting services are "afterthoughts". To date, all remedies fall short of meeting the basic software safety requirements.
[0117] The following solution addresses the software safety challenge by resolving the blockchain transaction processing throughput and energy efficiency challenges. This solution also includes the transformation of legacy transactional processing infrastructures and compute intensive applications for safer and more scalable services, paving the way for bridging the deployment gaps between public and private blockchains, as well as permission-based legacy databases and traditional high performance computing systems. It enables a universal data plane from mobile to cloud, ready for integration with compute and data intensive artificial intelligence and machine learning applications, and quantum computer applications.
[0118] The basic software safety requirement is zero single-point failure for all mission critical services. To date, no enterprise services can meet this basic requirement. The problem is rooted in the unrealistic assumptions made in the design of software development methodologies: reliable processors, networks and storage, despite the programming fallacy warnings of the 1990s.
[0119] The blockchain protocol is a high-level one-sided communication protocol built using raw hop-to-hop data communication protocols, such as TCP/IP sockets and RPC (remote procedure call). Satoshi Nakamoto, "Bitcoin: A Peer-to-Peer Electronic Cash System", www.bitcoin.org, 2008. The blockchain protocol constructs a virtual ledger processor powered by participating nodes without reliable networks and trusted servers. The blockchain network has no single point of failure. Participating nodes are rewarded with digital tokens for contributing transaction processing power. This seemingly revolutionary feature demonstrated the feasibility of non-stop lossless data services, which enabled cryptocurrencies and other applications where the traditional trusted database servers fall short. Sam Daley, "30 Blockchain Applications and Real-World Use Cases Disrupting the Status Quo," March 31, 2021, Updated July 11, 2021 (https://builtin.com/blockchain/blockchain-applications).
[0120] The blockchain transaction throughput is low, estimated at 4-7 transactions per second. Kenny L., "The Blockchain Scalability Problem & The Race for Visa-Like Transaction Speed," Jan 30, 2019 (https://towardsdatascience.com/the-blockchain-scalability-problem-the-race-for-visa-like-transaction-speed-5cce48f9d44). The protocol requires a single public ledger for all transactions from the "genesis block", and all transactions must be processed and stored on all nodes. Different consensus algorithms have been developed in attempts to increase the transaction throughput. To date, all proposals have introduced different risk factors that can weaken the integrity of the virtual ledger processor. Jeff Nijsse and Alan Litchfield, "A Taxonomy of Blockchain Consensus Methods," Journal of Cryptography 2020, 4(4), 32: https://doi.org/10.3390/cryptography4040032, November 2020.
[0121] Traditional databases enjoy millisecond transaction processing times; Visa processing, for example, is estimated to handle at least 1,700 transactions per second. However, these databases suffer a different scalability challenge: when adding database servers for processing larger workloads, the infrastructure must choose between performance and reliability. Achieving better performance and reliability together while adding database servers has been an open challenge. Matt Allen, "Relational Databases Are Not Designed For Scale," MarkLogic.com (https://www.marklogic.com/blog/relational-databases-scale/) and Vaidehi Joshi, "Scalability Problems: Hidden Challenges of Growing a System," February 27, 2019 (https://medium.com/baseds/scalability-problems-hidden-challenges-of-growing-a-system-f74313b063c3).
[0122] Other crypto developments include Bitcash, an attempt to increase the transaction block size from 1MB to 4MB (a SegWit-related hard fork from Bitcoin) to accelerate transaction processing throughput, and many off-chain transaction processing applications such as Raiden Network, Lightning Network, and others. Although Bitcash is faster than Bitcoin, its transaction processing throughput also stagnates when more miners join the network. The off-chain crypto applications only use the Bitcoin chain for final settlement, handling small transactions directly in private chains or traditional databases. Lorne Lantz, Daniel Cawrey, "Mastering Blockchain," O'Reilly Media, Inc. ISBN: 9781492054702, November 2020.
[0123] Ethereum was created with the intention of being a global open-source platform for custom asset management applications. The latest Ethereum 2.0 plan includes a consensus algorithm change to POS (proof of stake) and ledger sharding to improve transaction verification performance. Vitalik Buterin, "Why Sharding is Great: Demystifying the Technical Properties," April 7, 2021 (https://vitalik.ca/general/2021/04/07/sharding.html). Since all nodes must still store all transaction logs, the scaling challenges remain. In other words, the ledger growth will eventually saturate all nodes, leading to a complete chain shutdown.
[0124] A.2. Blockchain Scalability Limits
[0125] Ledger Storage Limit: The blockchain ledger is a historical transaction log since the genesis block. Unlike traditional database deployments, where adding more servers once a server's capacity is saturated can expand the services to serve more users, the blockchain ledger will eventually fail when the storage requirement exceeds the capacity of every participating node. Currently, a full node in Bitcoin requires at least 340GB of storage, with expected growth of at least 1GB per month. Vitalik Buterin, "The Limits to Blockchain Scalability," May 23, 2021 (https://vitalik.ca/general/2021/05/23/scaling.html).
[0126] Electricity Consumption Limit: Since all nodes must process all transactions, all nodes blindly burn electricity for transaction verification using Proof of Work (POW) or Proof of Stake (POS), and electricity consumption soars monotonically as the number of miners increases. Since more mining nodes do not contribute to better transaction processing throughput, the electricity consumption limit of local grids will eventually be reached. Amanda Shendruk, Tim McDonnell, "How Much Energy Does Bitcoin Use?", Quartz.com (qz.com), June 25, 2021 (https://qz.com/2023032/how-much-energy-does-bitcoin-use/).
[0127] Security Limit: The most dangerous security attack on the blockchain network is the hostile takeover attack. Individual-miner DoS (Denial of Service) attacks serve to assist takeover attacks. The current protocol incentive policy has resulted in centralized mining pools for profits. There are real hostile protocol takeover possibilities if the largest mining rigs conspire together. Since the blockchain cannot tell how many unique users are in the network, there are also socially coordinated takeover risks. Jeff Nijsse and Alan Litchfield, "A Taxonomy of Blockchain Consensus Methods," Journal of Cryptography 2020, 4(4), 32: https://doi.org/10.3390/cryptography4040032, November 2020.
[0128] Transaction Processing Efficiency Limit: The average Bitcoin transaction processing time is about ten minutes, with an expected confirmation waiting time of 60 minutes for six confirmations. These times are necessary for security reasons. The challenge is the worsening efficiency on the same ledger, since more miners consume more electricity and network bandwidth with negative impacts on processing throughput.
[0129] A.3. A Scalability Solution
[0130] The fundamental scalability challenge is the ability to deliver performance and reliability together when expanding the infrastructure's (hardware processors, networks and storage) processing capabilities, without compromising security. The following solution is applicable to all life cycles of every mission critical service.
[0131] The blockchain network builds immutable digital assets that are not directly accessible by ordinary users via the Internet. The off-chain networks and transaction processing "gateways", such as Coinbase and Binance, help bring the blockchain network to ordinary users. Lorne Lantz, Daniel Cawrey, "Mastering Blockchain," O'Reilly Media, Inc. ISBN: 9781492054702, November 2020.
[0132] The crypto transaction processing gateways deploy traditional databases or private blockchains for higher transaction processing throughput. The ultimate transaction settlements are done on the public blockchains. These gateways suffer the scaling challenges of both worlds. Security at the gateways has proved the most vulnerable. The recent FBI intercept of a ransomware payment at a Bitcoin exchange was proof of the security vulnerability at the crypto exchanges. Department of Justice, "Department of Justice Seizes $2.3 Million in Cryptocurrency Paid to the Ransomware Extortionists Darkside," June 2021 (https://www.justice.gov/opa/pr/department-justice-seizes-23-million-cryptocurrency-paid-ransomware-extortionists-darkside). There have also been many high-profile cryptocurrency heists at the exchanges. Since 2008, however, except for one protocol hack attempt that was fixed before any damage, no successful Bitcoin attack has been reported on the blockchain itself. Lorne Lantz, Daniel Cawrey, "Mastering Blockchain," O'Reilly Media, Inc. ISBN: 9781492054702, November 2020. [0133] The phrase "service scalability" has different meanings to different people. Technically, the scalability of a robust service infrastructure should be its ability to expand processors/networks/storage, physical and virtual, for handling ever-increasing workloads without loss in processing performance, reliability or security. The following discussion is based on this definition of scalability.
[0134] For legacy databases, security is centrally controlled. Hardware scaling can improve performance without compromising security. For blockchain, however, hardware scaling will alter the network security calculus due to the changed bases in the consensus algorithm risk measures.
[0135] The prerequisite for achieving scalable performance and reliability together when expanding hardware capabilities is to decouple software from hardware devices. This is because all hardware components are the "first responders", to use the human social network analogy. Regardless of manufacturing processes, all hardware has limited hazard tolerance ranges and typically non-deterministic service lives.
[0136] The blockchain protocol design naturally decoupled the protocol software from the running processors and networks. However, the design of the consensus algorithm and process violated the scalability requirements: an increase in participating mining nodes does not increase transaction processing throughput. In practical applications, this has resulted in large electricity waste, since the majority of the mining nodes compete blindly.
[0137] The blockchain network's integrity is protected only by the consensus algorithm, without relying on any trusted third-party servers. There must be enough mining nodes to ensure the integrity of the virtual ledger in order to prevent a potential hostile takeover. DoS (denial of service) attacks can aid the takeover attacks. The current blockchain scalability challenge is the direct consequence of this security-only design.
[0138] The following discloses use of ACAN (Active Content Addressable Networking) technology for the infinite scaling of compute intensive, transactional and blockchain applications.
[0139] A.3.1 ACAN for Infinitely Scalable Parallel Processing
[0140] ACAN stands for Active Content Addressable Networking. ACAN is a high-level single-sided data communication and synchronization protocol based on the Tuple Space abstraction. [0141] A Tuple Space is a virtual network memory for <key, value> tuples. Tuples are accessible via three protocols:
1. put(key, value)
2. get(&key, &value) - where “&” indicates the mutability of the parameter when a key match is found in the Tuple Space. Once matched, the matching tuple is removed from the space.
3. read(&key, &value) - where “&” indicates the mutability of the parameter when a key match is found in the Tuple Space. The matching tuple remains in the space.
[0142] For example, a sender can put("Speaker1", "Bob") into the network. Any program running key="Speaker*" and read(&key, &buffer) will result in key="Speaker1" and buffer="Bob". The information "Speaker1 = Bob" is a broadcast to the entire network. Using put and get can implement "exclusive fetch" or the message passing functionality. Unlike the lower level hop-to-hop networking protocols, such as MPI (Message Passing Interface), RPC (Remote Procedure Call) and others, there is no concept of a receiver "address". There is no "hop" in ACAN applications. In other words, the node IP addresses are invisible in ACAN applications. Since ACAN runs on all nodes and controls all network connections, ACAN applications are completely decoupled from processing hardware and networks. An ACAN application forms a UVR (unidirectional virtual ring) of participating nodes. The UVR is formed when the application runs. FIG. 8 illustrates a running application's UVR.
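The example above can be made concrete with a toy, in-process stand-in for the space; a real ACAN Tuple Space is a network service, so the dictionary-backed class below is an assumption for illustration only.

# Toy in-process Tuple Space illustrating broadcast (read) versus
# exclusive fetch (get); not the networked ACAN implementation.
import fnmatch

class ToySpace:
    def __init__(self):
        self.tuples = {}

    def put(self, key, value):
        self.tuples[key] = value

    def _match(self, pattern):
        for key in self.tuples:
            if fnmatch.fnmatch(key, pattern):
                return key
        raise KeyError(pattern)

    def read(self, pattern):
        key = self._match(pattern)
        return key, self.tuples[key]        # tuple remains: broadcast semantics

    def get(self, pattern):
        key = self._match(pattern)
        return key, self.tuples.pop(key)    # tuple removed: exclusive fetch

space = ToySpace()
space.put("Speaker1", "Bob")
print(space.read("Speaker*"))   # ('Speaker1', 'Bob'), still visible to others
print(space.get("Speaker*"))    # ('Speaker1', 'Bob'), now consumed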
[0143] An ACAN application consists of a client (master) that creates tasks to be processed and a solver program runnable on any number of nodes (workers). When the ACAN application runs, its runtime tuple matching system will automatically form SIMD, MIMD and pipelined parallel processing clusters leveraging the UVR as the common data plane. The tasks are processed by all available processors and networks in parallel. The client program has a retransmission protocol that will resend a task when its expected result is not returned on time.
[0144] A critical difference between ACAN parallel applications and legacy parallel applications is the inclusion of the timeout/retransmission protocol in the client. The client timeout/retransmission protocol will automatically recover from all network and processor failures while allowing the best-effort performance of the entire processing platform.
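A minimal sketch of that client discipline follows; the space API (put, try_get) and the timeout value are assumptions for illustration, not the disclosed implementation.

# Sketch of the client-side timeout/retransmission discipline; the space
# object is assumed to expose put() and a non-blocking try_get().
import time

def submit_with_retransmission(space, task_id, task, expected_seconds=30.0):
    space.put(f"task/{task_id}", task)
    deadline = time.monotonic() + expected_seconds
    while True:
        result = space.try_get(f"result/{task_id}")  # non-blocking probe
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            # The timeout may mean a node or network failed; resend the task
            # tuple so any surviving worker on another path can claim it.
            space.put(f"task/{task_id}", task)
            deadline = time.monotonic() + expected_seconds
        time.sleep(0.1)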
[0145] The ACAN parallel application will eventually terminate regardless of partial network and node failures. This ensures the "liveness" of the application. Tuning the task size (granularity) can optimize the parallel performance according to the Brachistochrone Curve, an equilibrium of computing and communication overheads for solving problems of different sizes. Justin Y. Shi, "Program Scalability Analysis for HPC Cloud: Applying Amdahl's Law to NAS Benchmarks," Supercomputing 2012 Companion, November 2012.
(https://ieeexplore.ieee.org/document/6495929) and Justin Y. Shi, Moussa Taifi, A. Khreishah, and Jie Wu, “Tuple Switching Network: When Slower Maybe Better," Journal of Parallel and Distributed Computing, 2012 (https://www.sciencedirect.com/science/article/abs/pii/S0743731512000263). The ACAN application will enter an infinite wait if all nodes and networks fail at the same time.
[0146] ACAN enables decentralized processing for traditionally centralized parallel processing applications. Security protocol processing can also be parallelized to minimize the timing impact without reliability concerns.
[0147] For parallel applications with a fixed problem size, adding nodes to the ACAN network will improve performance subject to the economic law of diminishing returns. Since there are no negative consequences for application reliability and security, expanding these applications for solving bigger problems can deliver better performance and reliability indefinitely.
[0148] The above statements have been proved in practical experiments. Justin Y. Shi, "Program Scalability Analysis for HPC Cloud: Applying Amdahl's Law to NAS Benchmarks," Supercomputing 2012 Companion, November 2012. They can also be proved formally using Amdahl's Law and Gustafson's parallel performance quantification method. Justin Y. Shi, Moussa Taifi, A. Khreishah, and Jie Wu, "Tuple Switching Network: When Slower Maybe Better," Journal of Parallel and Distributed Computing, 2012 (https://www.sciencedirect.com/science/article/abs/pii/S0743731512000263). Note that Amdahl's Law does not account for communication overheads in parallel processing. Timing models can be built to calibrate for accurate performance predictions. Justin Y. Shi, "Reevaluating Amdahl's Law and Gustafson's Law," 1996 (https://www.researchgate.net/publication/228367369_Reevaluating_Amdahls_law_and_Gustafsons_law).
[0149] The three fundamental differences between ACAN parallel application and legacy parallel applications are a) complete decoupling of programs from processors and networks, b) statistic multiplexing of all hardware components, and c) client-side timeout/retransmission discipline. It may be called Software Defined Parallel Processing (SDPP). It is a paradigm shift from client-server to serverless computing.
[0150] A.3.2 ACAN for Infinitely Scalable Transaction Processing
[0151] Transaction processing is typically done by database systems, SQL or NoSQL. The ACAN network can be exploited to deliver infinitely scalable transaction processing performance without reliability or security degradation.
[0152] The ACAN transactional network is a network of transaction processing gateways. Each transaction processing gateway represents a partition of the database. The gateway connects to K database servers in a shifted circular fashion (FIG. 9). The entire ACAN transaction network represents a single virtual database with P partitions. In FIG. 9, each gateway drives K=2 database servers.
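The shifted circular connection can be pictured with a short sketch; the numbering scheme below is an assumption for illustration, with FIG. 9's layout corresponding to K=2.

# Illustrative numbering for the shifted circular layout: partition i is
# replicated on servers i, i+1, ..., i+K-1 (mod P).
def replica_servers(partition, P, K):
    return [(partition + j) % P for j in range(K)]

P, K = 5, 2
for p in range(P):
    print(f"partition {p} -> database servers {replica_servers(p, P, K)}")
# Each server ends up carrying K partitions, so losing any one server
# still leaves every partition with a surviving replica.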
[0153] Each gateway performs three tasks: a. Synchronous replication for data-changing transactions with dynamic serialization. This means that all servers are forced to execute queries in the exact same order, in synchrony, if they update the same data. Otherwise, all transactions will be replicated at wire speed. This ensures all K servers are in sync at all times. b. Dynamic load distribution for read-only transactions. This ensures all K servers contribute evenly, leveraging the K synchronously replicated database servers. c. Non-stop database resynchronization. When the replicated transaction results in inconsistency for any reason, the gateway's voting algorithm determines the winner(s). The losers are instantly disconnected to ensure database consistency. The gateway is also responsible for rebuilding and reconnecting the disconnected servers in parallel, with no more than 60 seconds of service downtime independent of database size, using a "Mobius strip algorithm".
[0154] The ACAN transaction network is a single logical database with K replicas. Similar to the parallel application client, the ACAN database client also needs a transaction retransmission protocol for when a query times out. This retransmission protocol always checks the status of the timed-out transaction on the target server before re-sending the transaction. This eliminates the "double spending" risks and enables the ACAN transaction network to overcome gateway and database server single-point failures by automatically deploying backup gateways and servers under the same public gateway IP address (resolved by a global or local DNS server).
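The check-before-resend rule can be sketched as follows; the gateway API names (execute, transaction_status, fetch_result) are hypothetical stand-ins, not the disclosed interface.

# Sketch only: before resending a timed-out update, ask the gateway whether
# the transaction already committed, which removes the double-spending risk.
def resilient_execute(gateway, tx_id, query, timeout_s=5.0):
    try:
        return gateway.execute(tx_id, query, timeout=timeout_s)
    except TimeoutError:
        # The query may have committed even though the reply was lost.
        status = gateway.transaction_status(tx_id)
        if status == "COMMITTED":
            return gateway.fetch_result(tx_id)   # do not re-apply the update
        return gateway.execute(tx_id, query, timeout=timeout_s)  # safe resend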
[0155] Unless all K servers crash at the same time, or the DNS is compromised in combination with an entire network blackout, all database queries will always be processed and delivered, as long as there exists a single path connecting the client with at least one server.
[0156] Scaling up the ACAN transaction network for larger workloads is done by increasing the number of partitions P; enhancing data resilience is done by increasing K. In legacy database servers, horizontal partitioning into P partitions improves service performance but increases service downtime risks by the same factor of P. In the ACAN transaction network, each partition has K replicas in a shifted mirroring fashion on the UVR; as long as P >> K, the ACAN transaction network can deliver increasing performance indefinitely without negative reliability and security impacts.
[0157] The disclosed methods are applicable to network storage, where storage client software updates are trivial. Using the ACAN storage in DNS will eliminate the last Internet single point of failure: DNS, thus making network-based DDoS attacks far less lethal.
[0158] This may be called a Software Defined Database Cluster (SDDC) or a serverless transaction processor.
[0159] A.3.3 AC-Chain: ACAN Blockchain
[0160] The blockchain protocol requires a transaction pool, a bulletin board where all transactions are posted. In the case of Bitcoin, it is called the "mempool". Sean O'Connor, "Mastering Mempool," Hackernoon.com, April 19, 2020 (https://hackernoon.com/mastering-the-mempool-a-how-to-guide-zs7u32ou).
[0161] The mempool is used by all miners to build transaction blocks. In the case of Bitcoin, the mempool is implemented using the RPC (Remote Procedure Call) protocol. RPC requires the receiver's IP address to send data. It is one of the hop-to-hop protocols. The Bitcoin protocol builds a sophisticated mechanism to circumvent the protocol-processor coupling by implementing internal DNS (domain name server) services.
[0162] In this embodiment, an Active Content Addressable Networking (ACAN) based blockchain, AC-Chain, is proposed. The AC-Chain replaces the legacy blockchain RPC transaction broadcast protocol with ACAN protocols. AC-Chain implements a network-wide Tuple Space. It has a simpler DNS service leveraging its UVR (unidirectional virtual ring) topology. [0163] AC-Chain also has three protocols:
1. put(Name, Tuple, &Count): where Count controls the number of read/get operations this tuple allows. When Count reaches zero, the tuple is deleted from the space.
2. get(&Name, &Buffer): where & indicates the parameter mutability when a matching tuple is found. The matching tuple will be deleted from the space.
3. read(&Name, &Buffer): where & indicates the parameter mutability when a matching tuple is found. The matching tuple persists in the space if the tuple Count is not zero. Otherwise, it will be removed from the space.
[0164] Each transaction will be stored as a tuple. The tuple name is the transaction ID suffixed by a replication count:
61a123026477e6b53c4423d23fd85954894627a2d0204982971e1a82902980b0c | r-count
[0165] The ACAN Tuple Space replaces the mempool. When miners pack transaction blocks, they ignore the r-count. The miners proceed as in the original Bitcoin protocol. The ACAN Tuple Space daemon will remove the transaction tuple when its r-count reaches zero. This ensures that no more than r-count miners work on the same transaction. Each miner has an asynchronous listener waiting for a verified block tuple's broadcast.
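A minimal sketch of the daemon's r-count bookkeeping follows; the class, the locking scheme and all names are assumptions for illustration, not the AC-Chain implementation.

# Sketch of r-count bookkeeping for the mempool replacement: each read of a
# transaction tuple decrements r-count, and the tuple disappears at zero so
# no more than r-count miners ever see it.
import threading

class MempoolSpace:
    def __init__(self):
        self._lock = threading.Lock()
        self._tuples = {}  # "txid|r-count" -> [tx bytes, remaining reads]

    def put(self, name, tx, count):
        with self._lock:
            self._tuples[name] = [tx, count]

    def read(self, name):
        with self._lock:
            entry = self._tuples.get(name)
            if entry is None:
                return None              # already consumed by r-count miners
            entry[1] -= 1
            if entry[1] == 0:
                del self._tuples[name]   # daemon removes the tuple at zero
            return entry[0]

pool = MempoolSpace()
pool.put("tx-001|3", b"raw transaction", 3)
for _ in range(3):
    pool.read("tx-001|3")      # the third read removes the tuple
print(pool.read("tx-001|3"))   # None: the r-count limit has been enforced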
[0166] After a miner finds the hash for a block target, it puts a new block tuple with the same r-count into the space. The miners' listeners will be triggered, but only r-count copies will be read. Unsuccessful listeners will continue in their low-energy waiting states.
[0167] When a miner reads a block tuple and has verified its validity, it stores the block to the blockchain along with its r-index. The block is rejected if the verification fails.
[0168] Blocks are randomly distributed to all miners. Each transaction is only replicated r-count times.
[0169] The value of r-count is determined by the ledger safety calculus. It must be big enough to counter the hostile takeover attempts but smaller than the total number of miners.
[0170] Since the AC-Chain protocol allows direct block retrieval by transaction ID via the network, it is not necessary to store the entire ledger on any one node. As more miners join the network, the ledger can grow indefinitely.
[0171] Transaction broadcasts are limited by R-count, so that only R-count copies of the ledger exist in the entire blockchain network. The transaction blocks (transaction logs) are randomly distributed. Each transaction can only be packed into r-count blocks that are to be mined in parallel. Since each node packs its own transaction blocks, the distribution of transactions will be automatically randomized.
[0172] The value of R-count can be adjusted based on security requirements of the ledger. Once r-count is determined, adding miners to the network will increase the blockchain transaction throughput by forming pipelined parallel processors with fixed broadcast/block synchronization overheads.
[0173] The virtual ledger can now grow indefinitely due to the automatic constrained block replication and distribution. All nodes will still partake in the ledger processing and storage, but in a randomized fashion, thus gradually improving blockchain security.
[0174] Electricity waste will be dramatically reduced since the miners are automatically informed when the R-count is reached. The waiting miners enter a low-energy sleep mode. They will become ready to build the next transaction block as soon as the pending block ID is confirmed. FIG. 10 illustrates the block processing pipeline on AC-Chain.
[0175] The implementation of r-count adjustments is facilitated by the virtual single-system image delivered by the UVR (Unidirectional Virtual Ring) protocol in ACAN.
[0176] Unlike the blockchain protocol, ACAN enables centralized control over decentralized processes. The two foundational improvements enable more miners to earn productive rewards without compromising ledger integrity, using either POW or POS. A slower miner can get ahead when r-count reaches zero. The hostile takeover risk is reduced since it is highly unlikely that a high-power miner would be able to insert consecutive blocks. A parallel block processing pipeline will form. The overall energy efficiency will be dramatically improved.
[0177] The AC-Chain can scale indefinitely as long as R-count << the total number of miners. The minimal R-count is 7 (= 3f + 1, with f = 2). This Byzantine failure prevention formula prevents any f = 2 nodes from conspiring together in a hostile takeover attack [19].
[0178] AC-Chain is applicable for both public and private blockchains.
[0179] A.3.3.1 AC-Chain Transaction Processing Time
[0180] The blockchain per-transaction processing time must be throttled to ensure the integrity of the transactions (double-spending free). The current industry standard is six confirmations. Since each block requires approximately 10 minutes to verify (the difficulty level is automatically adjusted periodically to maintain that time), the total elapsed time is about 60 minutes [18]. This time will not change even under AC-Chain.
[0181] A.3.3.2 AC-Chain Transaction Processing Efficiency
[0182] The AC-Chain will improve the overall blockchain's efficiency without compromising security as more miners join the network. The new miners form a block processing pipeline as depicted in FIG. 10. Even though the blocks still need to be added sequentially, the waiting miners are in sleep mode with drastically lower power consumption, and the reduced network bandwidth due to the r-count replication limit, together with the saved electricity consumption, improves the overall efficiency of the chain. Since the efficiency increases monotonically without security/reliability compromises as new miners join the network, AC-Chain can scale indefinitely. The pipeline speedup is bounded above by NP/(N+P-1), where N is the number of transaction blocks to be processed and P is the number of pipeline stages (blocks being mined by "committees of miners") formed by concurrently running miners regulated by r-count (FIG. 10). Since typically N >> P, the speedup bound is P-fold.
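For concreteness, the stated bound can be written out with illustrative numbers (the figures below are examples only, not measurements):

\[
  S(N, P) \;=\; \frac{NP}{N + P - 1} \;\longrightarrow\; P
  \qquad \text{as } N \gg P .
\]

For example, N = 10000 blocks and P = 10 stages give S = 100000/10009, approximately 9.99, essentially the full P-fold gain.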
[0183] A.3.3.3 Mining Node Requirements and Reduced Processing Costs
[0184] ACAN deployment enables more "democratic computing" by enabling low-power computers to join the network. A computer with a 500GB SSD and 8GB of memory, with a consumer-grade Internet connection that could handle 1-3 MB blocks every 10 seconds, would be able to earn rewards when joining the network. More miners joining and more efficient processing will eventually lead to lower transaction costs.
[0185] A.3.3.4 UVR Overhead
[0186] The AC-Chain UVR overhead is bounded above by O(lg(P)), where P is the total number of miners. For a million miners, the overhead is about 20 multicasts. Currently, the largest Bitcoin network once reached about 14,000 miner nodes. It took approximately 15 seconds to populate 98% of the Bitcoin network. Kenny L., "The Blockchain Scalability Problem & The Race for Visa-Like Transaction Speed," Jan 30, 2019 (https://towardsdatascience.com/the-blockchain-scalability-problem-the-race-for-visa-like-transaction-speed-5cce48f9d44).
[0187] A.3.3.5 AC-Chain Security [0188] The AC-Chain minimal security calculus remains the same as that of the original Bitcoin network with R-count nodes, since no miner is allowed to expand the mining pool per transaction. Due to the randomization of ledger blocks and mining, a hostile takeover is more difficult than in the original blockchain network.
[0189] A.3.3.6 Public and Private Chain Integration
[0190] Private blockchains are suitable for applications that need to be regulated, either by governments or by other authorities. The AC-Chain protocol allows seamless integration of public and private blockchains via the content addressable networking protocols.
[0191] A.3.3.7 Exchange Gateway Database Integration
[0192] The vast majority of financial and web applications use traditional databases for ordinary users. The AC-Chain protocol also allows seamless integration of public and private crypto infrastructures with ACAN transaction networks.
[0193] A.3.3.8 HPC Engine Integration
[0194] The public or private AC-Chain, or the exchange gateway, can seamlessly integrate HPC (high performance computing) hardware (GPUs and FPGAs) with ACAN-enabled AI and ML applications to form super-nodes, because the ACAN data plane is accessible from cloud to mobile devices.
[0195] Summary
[0196] This exemplary embodiment discloses a practical application of the Active Content Addressable Networking concept and implementations. These include compute and data intensive applications, transactional applications and blockchain applications.
[0197] The fundamental principles for delivering high performance, high reliability and security for critical infrastructures include four necessary conditions: a. Complete decoupling of software and data from processors, networks and storage using higher level data communication protocols like the ACAN protocols. b. Full resource statistic multiplexing at runtime (ACAN runtime). c. Client programs must include the ACAN retransmission protocol. d. Infrastructure scaling must maintain higher parallel gains and slower overhead growth, especially for replicated overheads (ACAN infrastructure discipline). [0198] The disclosed technology enables a computing paradigm shift from the traditional client-server paradigm to serverless paradigms. In the era of quantum computing, the ACAN powered serverless infrastructures are well suited to deploying quantum computers of fixed qubits for practical scalable applications.
Exemplary Embodiment 2
[0199] Quantum-class Cloud Computing
[0200] Resource efficiency is the driving force behind the growing cloud applications. The insatiable needs of AI and extreme-scale data intensive high performance applications are driving GPU supercomputing and quantum computing developments. All efforts, quantum and cloud, face the same technical challenge: how to produce high performance results reliably at any scale, a fundamental challenge in general computing architectures.
[0201] B.l Introduction
[0202] Reliable computing using unreliable components has been a known theory since the 1950s. Actual reliable computing architectures, however, have been elusive, causing the common scalability dilemma seen today. This white paper reports the Active Content Addressable Networking (ACAN) computing architecture designed to solve this common dilemma, promising to deliver reliable high performance results at any scale. An immediately related challenge is the theoretical scaling limit of any cloud parallel application. By showing the mathematical convergence of Amdahl's Law and Gustafson's Law, it follows that Amdahl's Law has enough power to prove unlimited scaling when solving problems with open sizes. Since a quantum computer with any number of qubits will have a fixed performance upper bound, the reported ACAN architecture would be an ideal quantum-class cloud computing architecture subsuming any number of clouds and quantum computers. Proof-of-concept results are included demonstrating better performance with increasingly better reliability at any scale for compute and data intensive applications and transactional applications.
[0203] The first theoretical challenge in computer architecture design is to determine the ultimate application scaling limits. For estimating a parallel application's speedup, Amdahl's Law and Gustafson's Law each draw a different speedup bound. They have been labeled as "fixed-size" and "scaled-size" models, respectively. Historically, Amdahl's Law was used as an argument against massively parallel processing (MPP), whereas Gustafson's Law has been used to justify MPP. A 1996 analysis found that these laws are mathematically equivalent based on Amdahl's model. However, the analysis did not explain why the two models would yield different speedup bounds while producing identical predictions. The myth persists to this day.
[0204] This embodiment first examines the impacts of parallel scaling directions on speedup predictions. Amdahl's pessimistic speedup bound under the "fixed-size" assumption informs the limitation of fine-grain parallel computing. It is only half of the story. Gustafson's model helped to quantify the elusive behavior of Amdahl's speedup bound by revealing the other half. Both processor scaling directions are useful in applications deploying multiple parallel accelerators. Once the problem sizes are clearly understood, both models converge to the same unlimited speedup bound. In other words, given unlimited resources and open problem size, all applications should scale indefinitely. This propels the clouds capable of running these applications to the "quantum class", since infinity is greater than any performance by a fixed-size quantum computer.
[0205] In order to harness the increasingly available resources, a proprietary decentralized active content addressable network (ACAN) data access model is reported. A critical criterion of this design is the complete decoupling of programs and data from hardware resources in order to deliver the best-effort performance and reliability in deployments of any scale. This decentralized data model should enable "first-class" resource exploitation of all available sources, including federated clouds and quantum computers.
[0206] Since most existing parallel programs were built using legacy stateful (hop-by-hop) APIs, this paper also includes exploratory experiments comparing ACAN-wrapped (protected) legacy programs against unwrapped ones in performance and reliability. Decentralized ACAN storage experiments are also included to demonstrate the power of automatic deep latency hiding against the Hadoop Distributed File System (HDFS).
[0207] B.2. The Amdahl’s and Gustafson’s Speedup Myth
[0208] In the early days of parallel system developments, a parallel program was modeled to contain a pure sequential part (s) and a pure parallel part (p) under a theoretical single processor condition, in order to model its theoretical peak performance using N processors. For algebraic simplicity, the total of the parts is normalized to the unit value of one as the worst-case sequential processing time (Tseq). The speedup = Tseq/Tpar then becomes (s+p)/(s+p/N) = 1/(s+p/N) [1, 2]. As N approaches infinity, the theoretical speedup is bounded above by (1/s), a steep function that declines rapidly as the value of (s) increases (FIG. 11). Amdahl, G.M., "Validity of single-processor approach to achieving large-scale computing capability", Proceedings of AFIPS Conference, Reston, VA., pp. 483-485, 1967. Amdahl's Law was an argument against massively parallel processing or MPP.
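Restated in display form (with s + p normalized to one, as above):

\[
  \mathrm{speedup}(N) \;=\; \frac{T_{seq}}{T_{par}}
  \;=\; \frac{s + p}{\,s + p/N\,}
  \;=\; \frac{1}{\,s + p/N\,}
  \;\le\; \frac{1}{s}
  \qquad (s + p = 1).
\]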
[0209] In 1988, Gustafson and colleagues conducted numerous practical parallel experiments using thousands of processors at the Sandia National Laboratories. Gustafson, J.L., “Gustafson’s Law”, Encyclopedia of Parallel Computing, Springer Link: https://link.springer.com/referenceworkentry/10.1007%2F978-0-387-09766-4_78 , 2019. They obtained measures of the sequential and parallel running times using multiple processors. They found Amdahl’s predictions were far off the actual measurements. Gustafson, J.L., “Reevaluating Amdahl's Law”, CACM, 31(5), pp. 532-533, 1988. Since the measured times were obtained under parallel processing environments, Gustafson proposed normalizing the measured parallel run’s sequential part (s’) and parallel part (p’) as the unit value of one in order to derive the theoretical peak sequential processing workload using a single processor. FIGs. 12A and 12B illustrate the differences between Amdahl’s and Gustafson’s formulations. The “Scaled-Size Model” formula, now called Gustafson’s Law [3]: speedup = Tseq / Tpar = s’ + Np’, for N processors. The speedup seems to linearly approach infinity as N approaches infinity, thus the term “scaled speedup” in the literature. Gustafson’s Law Entry, Wikipedia: https://en.wikipedia.org/wiki/Gustafson%27s_law, Retrieved 2019.
[0210] B.3. Problem Size and the Inductive Turkey
[0211] The problem size is absent from the speedup formulas, even though it is assumed in both models. Amdahl's model is universally applicable to any problem size. Scaling the number of processors under a fixed problem size undermined the instruction-level parallelism (ILP) research, such as the Thinking Machines or Dataflow Machines. Taubes, G.A., "The Rise and Fall of Thinking Machines," Inc.com. Online: https://www.inc.com/magazine/19950915/2622.html, Retrieved 2019, and Dennis, J. B., Misunas D.P., "A Preliminary Architecture for a Basic Data-Flow Processor," Proceedings of the 2nd Annual Symposium on Computer Architecture (ISCA'75), 1975. Gustafson's is an inductive model based on actual measures (s' and p') of a parallel program using N processors. To calculate speedup, it needs to project the theoretical workload using a single processor. Generalizing this speedup mapping makes the model a recurrence relation with respect to N. A deductive argument is required in order to prove the bound of this recurrence model. A famous example requiring deductive reasoning is Bertrand Russell's "inductive turkey", described as follows [6]: one cannot conclude that the turkey will live forever since it is fed every day. Similarly, the "fixed time" assumption s' + p' = 1 is insufficient for proving its speedup bound.
[0212] B.4. Building the Deductive Proof
[0213] From Section B.1, it is evident that (s, p) and (s', p') are different percentage measures, with (s, p) applicable to any problem size but (s', p') dependent on the problem size solved using N processors. A deductive proof can be constructed using Amdahl's serial percentage definition based on Gustafson's serial workload model: the N-processor theoretical serial workload(N) = Speedup(N) = s' + Np'. For each N increment, N+1, Amdahl's workload(N+1) = s' + (N+1)p', while Gustafson's mapping becomes a recurrence relation. Taking the limit of Gustafson's model with respect to N leads to an indeterminate form.
[0214] Starting from the very first un-normalized parallel measurements (s’, p’, N), we can compute Amdahl’s sequential workload percentages as follows:
[0215] Given: workload(N) = s’ + Np’.
[0216] Applying Amdahl’s definition for this N-processor workload: s(N) = s’ and p(N) = Np’.
[0217] Normalizing: s(N)=s’/(s’+Np’)
[0218] The N+1 processor workload(N+1) = s' + (N+1)p', since s(N+1) = s' and p(N+1) = (N+1)p'.
[0219] Normalizing: s(N+1) = s'/(s'+(N+1)p'), ...
[0220] In other words, the “inductive turkey” (s) shrinks monotonically with each N increment.
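In display form, the recalibrated serial fraction and its limit (assuming p' > 0):

\[
  s(N) \;=\; \frac{s'}{\,s' + N p'\,},
  \qquad
  \lim_{N \to \infty} s(N) \;=\; 0 ,
\]

so Amdahl's bound 1/s(N) grows without limit as the problem size scales with N.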
[0221] Unfortunately, Gustafson’s model does not contain (s) for completing the deductive proof. Amdahl’s model provides a path since it is applicable to any problem size and has a tight speedup upper bound. However, its bound (1/s) was never examined under the open problem size conditions while taking the limit of N.
[0222] B.5. The Translation
[0223] For any parallel measurements (s', p') using N processors, Amdahl's theoretical sequential workload of the parallel part is (p = Np'), and (s = s'). Since (s) and (p) in Amdahl's model are independent of N, after normalization one can then scale N freely. These steps show that the theoretical speedup bound will remain (1/s), which means that there was really no new "law" as one had hoped (FIG. 13).
[0224] For example, for N=10 processors, the sequential run time measured 5 seconds and parallel 6 seconds. After normalization: s’= 5/(5+6)=0.454, p’=6/(5+6)=0.545. Gustafson’s Law will predict Speedup = 0.454 + 0.545*10=5.9. If N=50, Gustafson’s Law will give: Speedup = 0.454 + 0.545*50 = 27.72.
[0225] To translate for Amdahl's Law with N=10: s=5, p=6N=60. Normalizing: s=5/(5+60)=0.077, p=60/(5+60)=0.923. Amdahl's Law predicts: Speedup = 1/(0.077 + 0.923/10) = 5.9, which is identical to Gustafson's Law. If N=50, the total theoretical sequential workload becomes s=5, p=6*N = 300. Normalizing again: s=5/(5+300)=0.0164, p=300/(5+300)=0.9836, and Amdahl's Law produces the identical result: Speedup = 1/(0.0164 + 0.9836/50) = 27.72.
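The arithmetic above can be checked mechanically; the following snippet is a plain verification aid (not part of the disclosure) reproducing both predictions from the measured s'=5 and p'=6:

# Verification aid: sequential part measured at 5 s, parallel part at 6 s.
def gustafson(s_meas, p_meas, N):
    total = s_meas + p_meas
    return s_meas / total + N * (p_meas / total)

def amdahl(s_meas, p_meas, N):
    # Translate to Amdahl's model: the serial workload stays s, the parallel
    # workload scales as p = N * p_meas, then renormalize before predicting.
    s, p = s_meas, N * p_meas
    s_n, p_n = s / (s + p), p / (s + p)
    return 1.0 / (s_n + p_n / N)

for N in (10, 50):
    print(N, round(gustafson(5, 6, N), 2), round(amdahl(5, 6, N), 2))
# 10 5.91 5.91
# 50 27.73 27.73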
[0226] A common mistake in applying Amdahl's model for speedup prediction based on parallel measurements is using (s', p') in place of (s, p), forgetting to recalibrate (s) against the total theoretical serial workload under a single processor in the predicted state.
[0227] B.6. Solving the Mystery
[0228] If both models predict the same speedup, then why does Gustafson's model allude to infinite speedup while Amdahl's is bounded above by (1/s)? What about the "unforgiving curve" of Amdahl's Law that was used against MPP?
[0229] The problem stems from the direction of processor scaling. Scaling the processors against a fixed problem size emphasizes the serial processing and informs the limitation of massively parallel fine-grained supercomputers. Taubes, G.A., "The Rise and Fall of Thinking Machines," Inc.com. Online: https://www.inc.com/magazine/19950915/2622.html, Retrieved 2019, and Dennis, J. B., Misunas D.P., "A Preliminary Architecture for a Basic Data-Flow Processor," Proceedings of the 2nd Annual Symposium on Computer Architecture (ISCA'75), 1975. Scaling the number of processors with open problem size allows amplifying the parallel benefits by solving bigger problems using modern clusters. Gustafson's model helped to reveal the diminishing (s) as the problem size increases, but it failed to include (s) in its model. Putting the two models together exposed the hidden numerical behaviors of Amdahl's speedup bound.
[0230] Under the same definitions and assumptions proposed by Gustafson, mathematically, the (1/s) "unforgiving curve" is only applicable to fine-grain parallel processing or ILP machines. Although (1/s) declines quickly when (s) increases, (1/s) approaches infinity equally quickly as (s) approaches zero (FIG. 11). Decades of hardware advances under Moore's Law and growing massively parallel application sizes have resulted in rapidly diminishing (s) values that were unimaginable in earlier days. Gustafson's model quantitatively revealed that (s) will asymptotically approach zero if we recalibrate (s) after each N increase. A complete speedup analysis for Amdahl's model is then possible: under the open problem size assumption, as N approaches infinity, with (p>0), (s+p/N) will approach 0, and the speedup = 1/(s+p/N) approaches infinity. This also concludes the deductive proof for Gustafson's speedup bound.
[0231] Therefore, the decades-old Amdahl's Law image (FIG. 14) needs an update to show the full power of the law (FIG. 15). In FIGs. 14 and 15, (s) and (p) are normalized, p = (1-s). With a fixed problem size, the value of (p) is limited by the nature of the problem and its size (FIG. 14). With open problem size, when N approaches infinity, (p) will approach one (1) and the speedup will approach infinity (FIG. 15).
[0232] B.7. Lessons Learned
[0233] There are indeed two types of massively parallel computing: fixed-size massively parallel (fine-grain parallel computing or parallel accelerators) and scaled-size massively parallel (coarse-grain parallel computing or parallel clusters). Amdahl's fixed-size speedup bound (1/s) informs the limitation of fine-grain parallel accelerators due to the fixed memory size. The model is so "economical" that quantifying (s, p) in practice was impossible until Gustafson's proposal. Mixing the two types of serial percentages led to widespread confusion. Gustafson's model quantitatively revealed the dynamics of (s) as N scales to infinity with growing problem sizes. The mystery disappears using deductive reasoning over the changing problem sizes in both models. A more formal treatment of the translation between the two models can be found in Shi, Y., "Reevaluating Amdahl's Law and Gustafson's Law", Semanticscholar.org: https://pdfs.semanticscholar.org/bd5a/7d5be926a00a9c3fa1fa2bbd1fa18a74509e.pdf, 1996. Therefore, in addition to the historical "fixed-size" interpretation, Amdahl's model is also useful for showing the diminishing impact of serial computing as the problem size grows. The model is still relevant today for applications using supercomputers with parallel accelerators, such as general-purpose GPUs (Graphics Processing Units), where both scaling directions are useful for crafting an effective parallel solution by running small parallel experiments first. In theory, however, there is only one law. [0234] In other words, Gustafson's model enabled the deductive proof that demystified the hidden behavior of Amdahl's elusive speedup bound (1/s) (FIG. 16). This study enables infrastructure research for quantum-class HPC clouds.
[0235] B.8. Unified Data Access Model: Active Content Addressable Network
[0236] As illustrated by Amdahl's model, a fixed problem size limits the potential speedup in a downward trend as the sequential percentage (s) grows. A decentralized data access model is therefore an essential requirement for enabling upward speedup for applications with open sizes, in order to shrink (s) indefinitely. To harness the increasingly available resources, a unified network data access model is essential for delivering performance and reliability at the same time without complex programming.
[0237] Active Content Addressable Network (ACAN) is a network implementation of the Tuple Space data abstraction. Gelernter, D., Carriero, N., "Coordination Languages and Their Significance," CACM, 35(2), pp 97-107, 1992. Unlike legacy communication networks, a Tuple Space is a transient memory of <key, value> objects supporting three operators: put(key, value), get(&pattern, &buffer) and read(&pattern, &buffer), as opposed to the legacy "send" and "receive" protocols. The "&" sign indicates that the data will be modified once a match is found in the Tuple Space. Since read() and get() are blocking operators, applications built using this API enable data-parallel processing amongst communicating programs automatically. This simple parallel process coordination language facilitates automated parallel processing and latency hiding.
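As a rough illustration of these semantics, the following minimal single-process Python sketch models the three operators with blocking read() and get(). It is a sketch under simplifying assumptions (exact-key matching, one process); a real ACAN Tuple Space distributes this store over the network.

```python
# Minimal single-process sketch of the Tuple Space operators described
# above. Exact-key matching and the class shape are assumptions; a real
# ACAN implementation is distributed and supports pattern matching.
import threading

class TupleSpace:
    def __init__(self):
        self._store = {}                     # key -> value
        self._cond = threading.Condition()   # wakes blocked readers

    def put(self, key, value):
        with self._cond:
            self._store[key] = value
            self._cond.notify_all()

    def read(self, pattern):
        """Blocking, non-destructive match on the key."""
        with self._cond:
            while pattern not in self._store:
                self._cond.wait()
            return self._store[pattern]

    def get(self, pattern):
        """Blocking, destructive match: the tuple is removed."""
        with self._cond:
            while pattern not in self._store:
                self._cond.wait()
            return self._store.pop(pattern)
```

Because get() and read() block until a matching put() arrives, producers and consumers coordinate automatically, which is the property the text credits with enabling data-parallel processing and latency hiding.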
[0238] FIG. 17 shows the conceptual ACAN implementation via a unidirectional virtual ring (UVR). Shi, J.Y., US Patent Application Publication 20210297480. The word "virtual" is important since it enables harnessing multiple interconnection networks at runtime. The worst-case UVR traversal complexity is O(log_k P), where k is the fan-out degree of the ring broadcast protocol and P is the number of processors on the ring. A Tuple Space daemon runs on every node; each daemon is responsible for first-class local resource exploitation for all parallel programs. It is also responsible for UVR maintenance and application controls. Each node can be deployed for a single parallel program or shared by multiple different parallel programs. This design allows first-class resource exploitation using multiple federated computing clouds without breaking security barriers, since the Tuple Space daemon is a user-privilege program. In other words, the UVR only forms when a user application is running.
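For a sense of scale, a short sketch (hypothetical node counts) computes the stated worst-case traversal depth for a fan-out-k ring broadcast.

```python
# Worst-case UVR broadcast depth O(log_k P) for a fan-out-k broadcast,
# per the complexity stated above. Node counts here are hypothetical.

def uvr_depth(p: int, k: int) -> int:
    depth, reached = 0, 1
    while reached < p:
        reached *= k   # each hop fans out to k more nodes
        depth += 1
    return depth

for p in (16, 256, 4096):
    print(p, uvr_depth(p, k=4))  # e.g., 4096 nodes reached in 6 hops at k=4
```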
[0239] B.9. A Quantum-Class Cloud Computing Architecture

[0240] Traditionally, hardware designs define the computing architectures of the running programs, because the application programming interface (API) is driven by the hardware design choices. Each running application is a finite state automaton executed by the hardware components. Hardware component correctness was assumed. A processor crash failure meant "game over" for any single-core architecture.
[0241] Once the hardware architecture entered the multicore and multiprocessor era, this paradigm was no longer appropriate. Despite significant efforts in the past, a computing architecture inflection point was recognized in 2012. Computing Community White Paper, "21st Century Computer Architecture," [Online] http://csl.stanford.edu/~christos/publications/2012.21stcenturyarchitecture.whitepaper.pdf, 2012. Applications built using extensions of existing APIs, such as MPI, OpenMP, RPC and RMI, hit the scaling limits as seemingly insurmountable "walls". Mathematically, the probability of application failure grows as the size of the "multi" part grows, since preventing crash failure was not part of the API designs. All existing APIs assume the interconnection networks between processors are reliable; this is the top fallacy of distributed and parallel computing. Joy, B., Deutsch, P., "Fallacies of Distributed Computing," [online] http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing, Retrieved November 2019. These are the root causes of the commonly known scalability dilemma. Shi, J.Y., "The Scalability Dilemma and the Case for Decoupling," Invited Article, HPCWIRE. [online] https://www.hpcwire.com/2016/03/30/scalability-dilemma-and-case-for-decoupling/. From the viewpoint of state machines, each running parallel application's state machine is so fragile that any single component crash will halt the entire application.
Despite industry's efforts to improve component reliability, the growth of core counts outstripped component reliability improvements. The API now defines the computing infrastructure for every application. Currently, checkpoint/restart has become the de facto failure prevention routine for large-scale HPC applications. The consequence is that, cumulatively, at least 50% of energy is wasted on checkpointing and unplanned idling.
[0242] The proposed unified data access model enables a new data-parallel statistic multiplexed computing (SMC) architecture without memory size constraints. See US Patent Application Publication 20210297480. Unlike applications built using stateful APIs, SMC programs are stateless. All hardware resources are multiplexed. Thus, like the Internet, the SMC architecture enables best-effort performance and reliability at the same time. The architecture is equally applicable to parallel accelerators and supercomputer clusters for mission-critical applications using the quantum-class features.
[0243] B.10. Proofs of Concepts
[0244] This section reports computational results for completely decoupled compute- and data-intensive applications. Contrary to the common wisdom of "either performance or reliability" when upscaling the number of processors, these experiments show that it is indeed possible to "have the cake and eat it too" under the proposed unified data access model.
[0245] B.10.1 Extreme Scale Computing
[0246] Tightly coupled bare-metal parallel programs are suitable for parallel accelerators. However, they are under the spell of the speedup limit imposed by the "fixed size" Amdahl bound and increasing multicore crash failures. Complete program and data decoupling from hardware enables easy scaling for performance and reliability at the same time, with fixed or open problem sizes. However, demonstrating a performance advantage for a decoupling protocol against an optimized tight-coupling protocol is never easy. For example, comparing the packet-switching protocol against the circuit-switching protocol at small scale is pointless. It would be equally difficult to compare parallel performance using the complex decoupling protocol against applications built using highly optimized, tightly coupled bare-metal protocols, such as MPI (Message Passing Interface).
[0247] It turned out, however, that parallel computing has inherent volatilities such that the tightly coupled hop-by-hop protocol actually prevents best-effort performance. We built a small controlled experiment using 16 cores of a 48-core Intel XEON processor in the NSF Chameleon bare-metal cloud. The idea is to compare the performance of a native MPI parallel program running on all 16 cores directly, without checkpoint protection, against four partitioned, wrapped MPI programs (from the same source), each running on 4 cores, also without checkpointing but with a timeout-retransmission protocol to exploit available resources when crashes occur. The ACAN wrapper consists of a master and four workers such that the master retransmission discipline will enter an infinite wait only when all workers crash at the same time. Each ACAN worker controls a native MPI program running on 4 cores. The UVR is constructed between the SMC workers and the master. The ACAN wrappers communicate via the Tuple Space abstraction. The wrappers bring two critical benefits: a) they provide fault tolerance without checkpoints, and b) they allow the MPI programs to run at different granularities (partition sizes) without recompiling. The end-to-end crash protection without checkpoints makes this small SMC experiment inductive. Adding processors (more resources) will only amplify the performance differences with increasing reliability guarantees. This setup establishes the relevance to extreme-scale computing. FIG. 18 shows the partitioned program configuration.
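The following sketch suggests one shape the master's timeout-retransmission discipline could take. The queue-based Tuple Space stand-in, the function names and the timeout value are assumptions for illustration, not the tested implementation.

```python
# Sketch of a master timeout-retransmission discipline as described
# above: work tuples are re-posted on timeout, so a crashed worker's
# partition is simply re-taken by a surviving worker. The ts_put /
# ts_get_result interface and timing are assumptions.
import queue

def master(ts_put, ts_get_result, partitions, timeout_s=30.0):
    pending = set(partitions)
    for p in partitions:
        ts_put(("work", p))                  # post one work tuple per partition
    while pending:
        try:
            pid, result = ts_get_result(timeout=timeout_s)
            pending.discard(pid)             # idempotent: late duplicates ignored
        except queue.Empty:                  # assumed timeout signal of the stand-in
            for p in pending:                # retransmit all unfinished work
                ts_put(("work", p))
    # The loop waits forever only if every worker has crashed, matching
    # the failure condition stated in the text.
```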
[0248] The test was a parallel dense matrix multiplication program written in MPICH2. The MPI program was measured multiplying two 6000 x 6000 square matrices using all 16 cores. The MPI program was loop-order and GCC -O3 optimized. The SMC wrapper used a Java implementation called AnkaCom. The fixed granularity for each processor is 6000/16 = 375 rows. The native MPI program yielded an average elapsed time of 23.5 seconds without using the mpi_scatterv() primitive. The mpi_scatterv() primitive leverages pre-loaded data on multiple nodes without physically broadcasting the matrix data to workers. In this test, both the MPI and SMC-wrapped MPI programs must broadcast data physically.
[0249] The granularity of the SMC-wrapped MPI program was tuned from 100 to 1500 (= 6000/4) rows. FIG. 19 shows that the best SMC-tuned performance is 20 seconds, reached at multiple points. The granularity tuning allowed the application's state machine to align the networks and processors so that all tasks complete at approximately the same time. Larger problem sizes and/or more processors will only amplify the performance differences.
[0250] A more recent test compared an optimized MPI matrix code using the mpi_scatterv() zero-copy primitive against the same SMC-wrapped MPI code without the benefit of optimized data distribution. The MPI code is also GCC -O3, loop-order and mpi_scatterv() optimized. The SMC wrapper is a new Synergy4 implementation in C. FIG. 20 demonstrates that even with physical broadcast overheads, the ACAN-wrapped MPI program still delivered better performance than the unwrapped MPI program at multiple points. In these tests, the programs computed the products of two 9000 x 9000 matrices using 12 compute nodes with a total of 288 cores (using only 24 cores per 48-core node to test the Infiniband impacts). The test environment was the NSF Chameleon bare-metal cluster.
[0251] The experiments in FIG. 20 also included a test of a single Tuple Space daemon with direct communications to all programs, without the decentralized UVR, in order to measure the UVR ring traversal overheads ("Synergy Wrapper (with Ring)" vs. "Synergy Wrapper"). As shown in FIG. 20, due to automatic component parallelization effects, the UVR overheads were successfully hidden in all cases.

[0252] The cyclic nature of the SMC performance curve can be explained using the Brachistochrone equation for solving an ancient gravity (computing) versus normal force (communication) puzzle in physics and mathematics. Ashby, N., Brittin, W.E., Love, W.F., and Wyss, W., "Brachistochrone with Coulomb friction," American Journal of Physics, 43:902-905, 1975. The optimal points are on a cycloid. In other words, there are multiple optimal points that lead to the shortest computing time absent other overheads (such as network sessions). The optimal size is much smaller than the typical fixed task distribution size. The program/data decoupling effects enabled amortizing the processing overheads in parallel over time. Although the ACAN protocol costs are almost double those of bare-metal protocols such as MPI, the performance-amortizing (automated "squish packing") effects guarantee better performance.
[0253] Technically, in pure communication performance tests, a statistic multiplexing (packet switching) protocol should be significantly slower than a direct circuit-switched protocol. This is why the Internet core still employs a growing number of circuit-switching "backbones". The packet switching protocols are the "edge" protocols that enabled infinite scaling of the Internet, delivering best-effort performance and reliability at the same time. In parallel computing, the biggest performance bottleneck for a large-scale application is the parallel task synchronization overhead, even for regularly shaped applications like matrix multiplication on homogeneous processors. The reason is that all parallel tasks compete for the limited interconnection networks. The network load distribution is never even, regardless of the uniformity of the processor architectures. The subtask synchronization overhead grows as the number of parallel tasks increases. The wrappers delivered "edge" protocol effects similar to those of Internet applications.
[0254] Once the parallel programs are decoupled from physical processors and networks, the infrastructure volatility is isolated. Hard real-time applications can be easily supported with resource over-provisioning. Formal validation of distributed and parallel applications becomes practically meaningful. As prescribed by Amdahl's speedup bound, with fixed resources, once the application is parallelized, the maximal deliverable application performance is bounded above by the law of diminishing returns. Infinite application scaling is possible with open problem sizes, continued infrastructure expansion and application performance tuning, a tool only available to ACAN parallel programs.
[0255] B.10.2 Extreme Scale Transactional Processing

[0256] All computation results must be stored in some stable storage. An efficient stable data storage is the ultimate parallel processing performance and reliability bottleneck. Transactional storage (with concurrent updates) is typically mission critical and much more difficult to protect. Data replication is the only means of defense against arbitrary data losses. Tanenbaum, A.S., Steen, M.V., "Distributed Systems: Principles and Paradigms," Prentice Hall PTR, Upper Saddle River, NJ, USA, 1st edition, 2001.
[0257] For storage A and B to hold real-time synchronized copies of transactional data is a long-standing challenge for distributed computing. Gray, J. et al., "The dangers of replication and a solution," ACM SIGMOD International Conference on Management of Data, pages 173-182, Montreal, Quebec, Canada, 1996. Currently, synchronized atomic data replication is implemented using the two-phase-commit (2PC) replication protocol. Mohan, C. and Lindsay, B., "Efficient commit protocols for the tree of processes model of distributed transactions," ACM SIGOPS Operating Systems Review, 1985. With 2PC, any transient failure must immediately roll back the entire transaction. Thus, both service availability (A) and network partition tolerance (P) are lost if one desires data consistency. 2PC provides end-to-end data protection but with a severe performance penalty. Today, the vast majority of transactional data is asynchronously replicated. Asynchronous data replication cannot eliminate arbitrary data losses when the storage fails. This is why today every financial institution still employs an auditor checking daily transaction logs manually.
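For reference, a minimal 2PC coordinator sketch illustrates the instant-rollback behavior just described: a single failed or missing vote aborts the whole transaction. The participant interface is a placeholder assumed for illustration, not an implementation from this disclosure.

```python
# Minimal two-phase-commit (2PC) coordinator sketch, illustrating the
# instant-rollback property discussed above. Participants are assumed
# to expose prepare/commit/rollback; any transient failure aborts all.

def two_phase_commit(participants, txn) -> bool:
    # Phase 1: prepare. Every participant must vote yes.
    for p in participants:
        try:
            if not p.prepare(txn):
                break                    # a "no" vote aborts the transaction
        except Exception:
            break                        # a transient failure counts as "no"
    else:
        # Phase 2: all voted yes, so commit everywhere.
        for p in participants:
            p.commit(txn)
        return True
    # Any failure: roll back the entire transaction immediately.
    for p in participants:
        p.rollback(txn)
    return False
```

The sketch makes the trade-off visible: consistency is preserved, but availability is lost the moment any participant stalls or fails, which is exactly the penalty the text attributes to classic 2PC.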
[0258] These difficulties were reported in the literature as the dangers of transaction replication, the CAP conjecture and the CAP theorem. Brewer, E., "Towards robust distributed systems," In Proceedings of the Annual ACM Symposium on Principles of Distributed Computing, ACM, 2000. The proof of the CAP conjecture linked the difficulties to the impossibility results for hop-by-hop protocols. Gilbert, S. and Lynch, N., "Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services," ACM SIGACT News, 33(2), page 59, ACM, 2002.
[0259] Applying the end-to-end SMC concept to networked transactional storage design, data replication consistency is an absolute necessity. Eventual data consistency leads to massive transient data inconsistencies with uncertain durations. Vogels, W., "Eventually consistent," ACM Queue, 6, 2008. In practice, the lack of data consistency has real-life consequences, even for non-mission-critical services.

[0260] Technically, under extreme partial failure conditions, any data state is semantically acceptable to all data clients if the inconsistent states are instantly isolated. Following the trajectory of storage quality improvements, data re-synchronization should be deferred without service interruption. Mission-critical consistent data services can then be provided under extreme partial system failures as long as there exists one path to at least one server with trustworthy data.
[0261] This insight allowed a transformation from the instant-rollback 2PC to a discrete non-stop data re-synchronization 2PC via a transaction replication switch or gateway. FIG. 21. The transaction replication switch is installed in front of multiple data servers. The data clients connect to the switch for data access. The switch performs three functions: a) dynamically serialized synchronous parallel transaction replication with real-time data inconsistency testing and instant disconnects, b) dynamic load balancing for read-only requests, and c) non-stop data re-synchronization between data sources. Note that the complete end-to-end data multiplexing protocol requires all data clients to include the timeout/retransmission discipline. Thus, there is no need to maintain state information in the replication switches; they can be replaced arbitrarily without transaction losses.
[0262] The replication switch has an in-line parser to inspect each passing query for data changes. For data-changing queries with update conflicts, the switch performs dynamic serialization to force all data servers to obey an identical commit order. Other queries are replicated at wire speed. For read-only queries, the switch can either dynamically distribute the load or target a designated "master" to avoid transient inconsistencies caused by physical data transmission delays. As in the compute-intensive SMC architecture, this transaction replication/switching architecture is incomplete without the retransmission discipline in the data clients. The retransmission protocol enables multiplexing the replication switch such that it can crash arbitrarily without transaction losses. Multiple redundant switches can also boost transaction processing performance.
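A simplified sketch of this dispatch logic follows. The query classification, server API and divergence test are deliberately reduced assumptions; a production switch would parse queries for genuine update conflicts and replicate non-conflicting writes at wire speed.

```python
# Sketch of the replication switch dispatch described above:
# data-changing queries are serialized and replicated to all servers,
# reads are load-balanced. The prefix-based classifier and the
# server.execute() interface are simplifying assumptions.
import itertools
import threading

class ReplicationSwitch:
    def __init__(self, servers):
        self.servers = servers
        self._reads = itertools.cycle(servers)  # round-robin for reads
        self._order = threading.Lock()          # dynamic serialization

    def handle(self, query: str):
        if query.lstrip().upper().startswith(("INSERT", "UPDATE", "DELETE")):
            with self._order:                   # identical commit order everywhere
                results = [s.execute(query) for s in self.servers]
            if len(set(results)) != 1:          # real-time inconsistency test
                raise RuntimeError("replica divergence: disconnect stragglers")
            return results[0]
        return next(self._reads).execute(query)  # read-only: load balance
```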
[0263] The non-stop re-synchronization algorithm follows the Mobius strip principle: a paper tape attached at both ends via a half-twist. The mathematical property of this strip is that it has only a single boundary; it has an infinite "walkable surface". If we let transaction service time lie on that single boundary, we then have a non-stop data re-synchronization algorithm (the decentralized atomic 2PC). The idea can be described as follows (see the sketch after this paragraph): a) Create a full backup in the background using one of the synchronously replicated servers as the source. b) Restore the backup set to all targets to be resynchronized in the background. c) Periodically scan the source for data changes; put all target servers on active duty if no change is found; they are then all synchronized with identical data contents. d) If the scan lasts more than a threshold number of times, pause the switch. The switch will then automatically put all targets on active duty, causing no more than 60 seconds of service downtime regardless of database size. The switch pause stops all incoming transactions and automatically completes all pending transactions and replications. The switch pauses and restarts can also be automated.
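The sketch below renders steps a) through d) as a loop. All backup, restore, scan and switch-control primitives are placeholders assumed for illustration only.

```python
# Sketch of the four-step non-stop re-synchronization loop described
# above (steps a-d). Every method on source, targets and switch is a
# placeholder assumption, not an API from this disclosure.
import time

def resynchronize(source, targets, switch, max_scans=10, poll_s=5.0):
    checkpoint = source.full_backup()          # (a) background full backup
    for t in targets:
        t.restore(checkpoint)                  # (b) background restore
    for _ in range(max_scans):                 # (c) repeated change scans
        delta = source.changes_since(checkpoint)
        if not delta:
            switch.activate(targets)           # identical contents: go live
            return
        for t in targets:
            t.apply(delta)
        checkpoint = source.checkpoint()       # advance the scan horizon
        time.sleep(poll_s)
    switch.pause()                             # (d) terminate the oscillating tail
    switch.drain_pending()                     # complete in-flight transactions
    switch.activate(targets)
    switch.resume()                            # bounded pause regardless of size
```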
[0264] The switch's strategic network position is crucial to delivering the seamless half-twist; the infinite service time then unfolds itself following the above four steps of logic. The correctness proof rests on the fact that transaction logging time is faster than transaction processing time. Therefore, the above algorithm results in a monotonically decreasing series of scan times (step c). However, at the end of the scans, the incoming traffic forms a direct queue with the target(s). This can result in an oscillating tail that may never terminate. The manual pause terminates the tail. This situation typically happens under heavy transaction loads.
[0265] In theory, with instant-rollback atomic data replication, the original CAP conjecture holds: higher levels of A and P require breaking data consistency. With the decentralized two-phase commit, A and P are satisfied within the confines of consistent data services under extreme partial failure conditions. The decentralized non-stop data re-synchronization protocol satisfies the application-level end-to-end reliability and performance requirements.
[0266] The CAP Theorem remains correct for systems using hop-by-hop RPC (remote procedure call) protocols. The proposed ACAN protocol can lift data-intensive applications out of the performance and reliability traps. The "share-nothing" database ideal can indeed become feasible. Stonebraker, M., "The case for shared nothing architecture," Database Engineering, 9(1), 1986. In fact, the blockchain protocol has already demonstrated that CAP can be fully satisfied in practice in decentralized environments once arbitrary message losses are eliminated. Nakamoto, S., "Bitcoin: A peer-to-peer electronic cash system," https://bitcoin.org/bitcoin.pdf, 2009. Unfortunately, for cryptocurrency applications, the blockchain protocol sacrificed resource multiplexing in pursuit of non-scalable consensus algorithms.

[0267] In summary, the complete ACAN storage unit design with synchronous data replication contains three components: a) a stateless data replication switch, b) a non-stop data re-synchronization algorithm, and c) a client-side redundancy check and retransmission protocol. This unit design can be scaled indefinitely, as described next.
[0268] Given the maximal capability of physical storage, load distribution (or data partitioning) is the only way to further expand the performance of the storage system. Without the SMC data decoupling features, every data partition becomes a single point of failure for the entire system, and scaling is limited. Under the SMC framework, each replication switch is responsible for one data partition with R synchronously replicated data servers. Scaling this transaction switching system is done by keeping P >> R, where P is the number of data partitions. Higher P delivers better performance. Since R does not change, this infrastructure can upscale to deliver increasingly better performance indefinitely. One rendition is to place the P partitions, each replicated R times, unidirectionally (similar to "striping" in RAID 0 systems) through the UVR to offer a "single-system" image (FIG. 22). In this configuration, each node holds its own data and copies of its R-1 predecessors' data (thus the entire dataset for local access). This storage system only starts losing data when R consecutive nodes crash at the same time, and only ceases operation when all P nodes crash at the same time.
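A small sketch shows this shifted placement under the assumption that partition i resides on node i and its R-1 ring successors; the exact layout rule is an assumption for illustration, not the patented algorithm's definition.

```python
# Sketch of shifted-partition placement on a ring, as described above:
# partition i lives on node i and its R-1 successors, so data is lost
# only if R consecutive nodes crash together. Layout is an assumption.

def placement(P: int, R: int) -> dict[int, list[int]]:
    """Map each node to the partitions it stores (its own plus copies)."""
    holds = {n: [] for n in range(P)}
    for part in range(P):
        for r in range(R):
            holds[(part + r) % P].append(part)  # replica r on the r-th successor
    return holds

print(placement(P=5, R=3))
# Node 0 holds partitions [0, 3, 4]: its own plus its two predecessors'.
```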
[0269] Each partition's switch can be replaced arbitrarily without transaction losses. A DNS load-balancing function is needed for access to the storage system, since every node is now equal to any other node most of the time.
[0270] With unlimited resources, this architecture can scale indefinitely without a performance upper bound. This seemingly outrageous claim has already been demonstrated by the Internet architecture for communication applications. It is further confirmed by the convergence of Amdahl's and Gustafson's Laws. For ACID-compliant transaction processing, a Replication Markup Language (RML) is necessary. U.S. Patent Application Publication 2008/018969. Networked storage does not need this level of complexity.
[0271] Technically, like the packet-switching networks, the transaction replication switches form a data-intensive active content addressable network. The performance losses in query parsing and statistical multiplexing are compensated by upscaling the infrastructure and amortizing the overheads in parallel. Most importantly, like the Internet, adding resources can improve transaction performance and reliability indefinitely.

[0272] To demonstrate the potential of such a counterintuitive framework, we conducted performance tests using the TPC-E benchmark (FIG. 23) on Microsoft SQL Servers. The TPC-E benchmark simulates a brokerage firm's operation using relational databases. The preliminary results demonstrated the feasibility of high data reliability with near-linear transaction processing speedup at the same time. In Table 1, P = number of database servers, R = number of synchronously replicated data sets. Since R defines the reliability of the data store independent of P, these experiments are also inductive.
[0273] For non-transactional file systems, decoupling data from storage also offers surprising benefits. For example, big-file replication overhead is the major headache for file systems with automated synchronous replication. A decentralized ACAN file system (named Anka) demonstrated a solution to the latency problem by optimizing all available resources in parallel. This means that sufficient resources under the ACAN framework can make the synchronous data replication overheads invisible. FIG. 24 shows the performance of replicating R=2 to R=4 data copies of different sizes under the proposed framework. FIG. 25 illustrates consistently superior read and write performance against HDFS. Note that file system metadata management still requires a switched transactional storage replication layer. FIG. 22.
[0274] A recent UFS (UVR File System) implementation delivered more illustrative results. FIG. 26 reports the worst-case write performance for a single 2 GB file with 2-11 remote replicas via a single 1-Gbit Ethernet switch. The overall replication performance grows as the number of replicas increases; the latency (time) degradation is dramatically slower than with legacy methods.
[0275] B.11. Summary and Broader Impacts
[0276] Infinity is a concept bigger than any number. Unraveling the myth of Amdahl's and Gustafson's Laws exposed the true theoretical scaling limits of all parallel applications with growing sizes. It is conceivable that multiple fixed-size qubit quantum computers can be harnessed in the quantum cloud. Although the theoretical speedup model does not account for implementation overheads, smarter implementations will continue to narrow the speedup gap asymptotically.
[0277] The ACAN architecture is naturally suited for real-time mission-critical applications. ACAN can offer far better performance, reliability and security than the current Controller Area Network (CAN) in self-driving vehicles on land, at sea and in space. The stateless nature of ACAN applications isolates performance and reliability concerns, thus enabling formal functional verification and validation without runtime dynamics modeling. ACAN cloud applications shift power from cloud vendors back to application owners, thus creating new business dynamics. ACAN applications can endure arbitrary infrastructure expansions and contractions without program modification, thus cutting project maintenance costs dramatically.
[0278] The decentralized nature of ACAN applications is ideally suited for cryptocurrency transaction processing without performance bounds. The ACAN data access model also encourages independent hardware innovations for different parallel workloads using both fixed- and open-sized APIs.
[0279] The ACAN architecture is a catalyst for delivering the growing power of computing clouds, with complementary quantum, chemical or biological computers for specific tasks, for decades to come.

Claims

CLAIMS

What is claimed is:
1. A computer-implemented method of storing transaction data in a database, the method comprising:
receiving transaction data;
determining a maximum replication count associated with verifying the transaction data for preventing an attack;
storing the transaction data in a tuple, wherein the tuple includes a tuple key, and wherein the tuple key includes a transaction identifier and the maximum replication count;
replicating a partition of the database among a set of replicated database servers, wherein the partition includes a memory space storing the tuple associated with the transaction data;
receiving a request for the transaction data;
responsive to the receiving the request, transmitting the transaction data; and
removing the transaction data.
2. The computer-implemented method according to claim 1, further comprising:
partitioning the database into P partitions;
determining the maximal number (K) of replications;
configuring one database gateway for each partition;
configuring K database servers for each gateway;
configuring a shifted partition replication algorithm such that K-1 database failures cannot compromise the system's data integrity;
configuring backup database gateways to hold initial private IP addresses and to fail over to public production gateway IP addresses on demand;
marking up a client application with concurrent update data lock and unlock statements;
completing the client transaction timeout retransmission discipline to verify a status of a current transaction before retransmission to prevent double-spending;
configuring a domain name service (DNS) server to automatically fail over gateway servers using backup servers; and
increasing partitions P to spread a workload to more gateway servers and database servers while keeping replication factor K constant to enable scaling without reliability and security losses.
3. The computer-implemented method according to claim 1, further comprising:
determining a maximum replication count associated with the transaction data, wherein the maximum replication count is based at least on a number of entities allowed to verify a transaction associated with the transaction data for preventing an attack;
receiving transactions at a plurality of participating nodes, wherein the plurality of participating nodes includes P partitions;
validating and packing the received transactions into blocks;
broadcasting the packed transactions for network validation;
selecting R miners to validate a transaction block, one at a time;
informing waiting miners to enter low energy mode;
validating the received block using locally stored blocks and ACAN stored blocks;
adding validated blocks to randomly selected decentralized local chains, or discarding the received block if any node validation fails;
informing waiting miners when a block validation is completed to initiate a next block validation;
adjusting R; and
adding more miners (P) to accommodate more transaction processing needs, wherein P is substantially larger than R to increase transaction workload and storage.
4. The computer-implemented method according to claim 1, further comprising:
determining a maximum replication count associated with the transaction data, wherein the maximum replication count is based at least on a number of entities allowed to verify a transaction associated with the transaction data for preventing an attack.
5. The computer-implemented method according to claim 1, wherein the database represents a scalable transaction pool associated with blockchain.
6. The computer-implemented method according to claim 1, wherein the database represents a scalable pool of one or more unconfirmed transactions according to a cryptocurrency protocol.
7. The computer-implemented method according to claim 1, wherein the attack includes Byzantine attacks.
8. A device for secure scalable data transaction, the device comprising a processor configured to execute a method comprising:
receiving transaction data;
determining a maximum replication count associated with verifying the transaction data for preventing an attack;
storing the transaction data in a tuple, wherein the tuple includes a tuple key, and wherein the tuple key includes a transaction identifier and the maximum replication count;
replicating a partition of a database among a set of replicated database servers, wherein the partition includes a memory space storing the tuple associated with the transaction data;
receiving a request for the transaction data;
responsive to the receiving the request, transmitting the transaction data; and
removing the transaction data.
9. The device according to claim 8, the processor further configured to execute a method comprising:
partitioning the database into P partitions;
determining the maximal number (K) of replications;
configuring one database gateway for each partition;
configuring K database servers for each gateway;
configuring a shifted partition replication algorithm such that K-1 database failures cannot compromise the system's data integrity;
configuring backup database gateways to hold initial private IP addresses and to fail over to public production gateway IP addresses on demand;
marking up a client application with concurrent update data lock and unlock statements;
completing the client transaction timeout retransmission discipline to verify a status of a current transaction before retransmission to prevent double-spending;
configuring a domain name service (DNS) server to automatically fail over gateway servers using backup servers; and
increasing partitions P to spread a workload to more gateway servers and database servers while keeping replication factor K constant to enable scaling without reliability and security losses.
10. The device according to claim 8, the processor further configured to execute a method comprising:
determining a maximum replication count associated with the transaction data, wherein the maximum replication count is based at least on a number of entities allowed to verify a transaction associated with the transaction data for preventing an attack;
receiving transactions at a plurality of participating nodes, wherein the plurality of participating nodes includes P partitions;
validating and packing the received transactions into blocks;
broadcasting the packed transactions for network validation;
selecting R miners to validate a transaction block, one at a time;
informing waiting miners to enter low energy mode;
validating the received block using locally stored blocks and ACAN stored blocks;
adding validated blocks to randomly selected decentralized local chains, or discarding the received block if any node validation fails;
informing waiting miners when a block validation is completed to initiate a next block validation;
adjusting R miners; and
adding more miners (P) to accommodate more transaction processing needs, wherein P is substantially larger than R to increase transaction workload and storage.
11. The device according to claim 8, the processor further configured to execute a method comprising:
determining a maximum replication count associated with the transaction data, wherein the maximum replication count is based at least on a number of entities allowed to verify a transaction associated with the transaction data for preventing an attack.
12. The device according to claim 8, wherein the database represents a scalable transaction pool associated with blockchain.
13. The device according to claim 8, wherein the database represents a scalable pool of one or more unconfirmed transactions according to a cryptocurrency protocol.
14. The device according to claim 8, wherein the attack includes Byzantine attacks.
15. A system for secure scalable data transaction, the system comprising a processor configured to execute a method comprising:
receiving transaction data;
determining a maximum replication count associated with verifying the transaction data for preventing an attack;
storing the transaction data in a tuple, wherein the tuple includes a tuple key, and wherein the tuple key includes a transaction identifier and the maximum replication count;
replicating a partition of a database among a set of replicated database servers, wherein the partition includes a memory space storing the tuple associated with the transaction data;
receiving a request for the transaction data;
responsive to the receiving the request, transmitting the transaction data; and
removing the transaction data.
16. The system according to claim 15, the processor further configured to execute a method comprising:
partitioning the database into P partitions;
determining the maximal number (K) of replications;
configuring one database gateway for each partition;
configuring K database servers for each gateway;
configuring a shifted partition replication algorithm such that K-1 database failures cannot compromise the system's data integrity;
configuring backup database gateways to hold initial private IP addresses and to fail over to public production gateway IP addresses on demand;
marking up a client application with concurrent update data lock and unlock statements;
completing the client transaction timeout retransmission discipline to verify a status of a current transaction before retransmission to prevent double-spending;
configuring a domain name service (DNS) server to automatically fail over gateway servers using backup servers; and
increasing partitions P to spread a workload to more gateway servers and database servers while keeping replication factor K constant to enable scaling without reliability and security losses.
17. The system according to claim 15, the processor further configured to execute a method comprising:
determining a maximum replication count associated with the transaction data, wherein the maximum replication count is based at least on a number of entities allowed to verify a transaction associated with the transaction data for preventing an attack;
receiving transactions at a plurality of participating nodes, wherein the plurality of participating nodes includes P partitions;
validating and packing the received transactions into blocks;
broadcasting the packed transactions for network validation;
selecting R miners to validate a transaction block, one at a time;
informing waiting miners to enter low energy mode;
validating the received block using locally stored blocks and ACAN stored blocks;
adding validated blocks to randomly selected decentralized local chains, or discarding the received block if any node validation fails;
informing waiting miners when a block validation is completed to initiate a next block validation;
adjusting R miners; and
adding more miners (P) to accommodate more transaction processing needs, wherein P is substantially larger than R to increase transaction workload and storage.
18. The system according to claim 15, the processor further configured to execute a method comprising:
determining a maximum replication count associated with the transaction data, wherein the maximum replication count is based at least on a number of entities allowed to verify a transaction associated with the transaction data for preventing a Byzantine attack.
19. The system according to claim 15, wherein the database represents a scalable transaction pool associated with blockchain.
20. The system according to claim 15, wherein the database represents a scalable pool of one or more unconfirmed transactions according to a cryptocurrency protocol.
PCT/US2022/043364 2021-09-14 2022-09-13 System and apparatus of secure transaction processing and data stores WO2023043736A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202163261179P 2021-09-14 2021-09-14
US63/261,179 2021-09-14

Publications (1)

Publication Number Publication Date
WO2023043736A1

Family

ID=85602012

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2022/043364 WO2023043736A1 (en) 2021-09-14 2022-09-13 System and apparatus of secure transaction processing and data stores

Country Status (1)

Country Link
WO (1) WO2023043736A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160188591A1 (en) * 2014-12-31 2016-06-30 Nexenta Systems, Inc. Methods and systems for key-value-tuple-encoded storage
US10846279B2 (en) * 2015-01-29 2020-11-24 Hewlett Packard Enterprise Development Lp Transactional key-value store
US20190251199A1 (en) * 2018-02-14 2019-08-15 Ivan Klianev Transactions Across Blockchain Networks


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22870573

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE