CA2718136A1 - Computing infrastructure

Info

Publication number: CA2718136A1
Application number: CA2718136A
Authority: CA
Inventors: David Duchesneau
Original assignee: SCRUTINY Inc
Current assignee: SCRUTINY Inc
Priority date: 2007-04-23
Filing date: 2008-04-23
Publication date: 2008-10-30
Also published as: EP2145362A4; WO2008131446A3; EP2145362A2; WO2008131446A2

Abstract

'An affordable, highly trustworthy, survivable and available, operationally efficient distributed supercomputing infrastructure for processing, sharing and protecting both structured and unstructured information.' A primary objective of the SHADOWS infrastructure is to establish a highly survivable, essentially maintenance-free shared platform for extremely high-performance computing (i.e., supercomputing) - with 'high performance' defined both in terms of total throughput and in terms of very low latency (although not every problem or customer necessarily requires very low latency) - while achieving unprecedented levels of affordability. At its simplest, the idea is to use distributed 'teams' of nodes in a self-healing network as the basis for managing and coordinating both the work to be accomplished and the resources available to do the work. The SHADOWS concept of 'teams' is responsible for its ability to 'self-heal' and 'adapt' its distributed resources in an 'organic' manner. Furthermore, the 'teams' themselves are at the heart of decision-making, processing, and storage in the SHADOWS infrastructure. Everything that's important is handled under the auspices and stewardship of a team.

Description


COMPUTING INFRASTRUCTURE
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to U.S. Provisional Appl. No. 60/913,502, filed April 23, 2007, which is hereby incorporated by reference in its entirety.

1 Systems and Methods for Self-Healing Adaptive Distributed Organic Working Storage

1.1 Background of the Invention

There was a time when anything that could be done on a computer could be done faster on a supercomputer. However, because supercomputers could address challenges that were well beyond the capabilities of ordinary computers, they became increasingly specialized, with emphasis on compute-bound problems, making them somewhat less suitable for general purpose computing.
Business computers also evolved, while retaining their general purpose nature, and also became faster.
In order to address a broader range of high-performance computing (HPC) needs, supercomputers need to become more general purpose, because the largest potential HPC markets are, by far, associated with business needs. Likewise, to address those large potential HPC markets, business computers need to gain much more performance.
Finally, the largest potential HPC markets are likely to remain unaddressable until the associated needs can be met, and one of the key needs is affordability.
There is an ongoing and increasing demand, possibly an insatiable demand, for affordable computer processing power. Supercomputers - or alternatively, high-performance computing (HPC) systems - have historically been very expensive, and thus confined to a relatively small set of applications (e.g., weather modeling, academic and government-based research, etc.) paid for by well-funded customers and users, and thus have been out of reach of many potential customers. The customer set has been so limited that for many years the highest performing systems have been tracked on a list presently known as the "Top 500" (www.top500.org). Such a list would not be practical if even a reasonable fraction of the customers who could take advantage of supercomputers actually purchased and operated them.
The potential market for supercomputers is essentially untapped, primarily because it is mostly not addressable with the supercomputers available today (nor is it apparently addressable with today's business servers, including "mainframes").
Today's supercomputers and business servers not only miss the mark in terms of affordability, but also in terms of their fitness for purpose. While there are clearly a few enterprise applications that are a good fit for supercomputers as they are presently designed, the large majority of commercial applications are not a good fit at present, partly due to a requirements mismatch. While business servers are already better-suited to running today's commercial applications, they cannot provide the computer power needed for the next generation of HPC-class business applications. Ultimately, the vendors of supercomputers and business servers are both racing to address the same markets, but from different starting points...
Beyond the applications themselves, there are very real issues associated with achieving high levels of affordable system survivability, disaster recovery, and security (including data confidentiality, integrity, availability, etc.), which (we assert) are not addressed well by any contender:
• The cost of electricity to power a datacenter exceeds the cost of the datacenter itself, and the cost of power is not only not going down, but is anticipated to rise sharply over the next few decades.
• Datacenters rarely store more than 72 hours' worth of fuel, so they typically contract with local fuel suppliers to commence refueling deliveries within 24 hours of an extended power outage. In the event of a regional disaster that drops the utility power grids and renders key roads impassable, timely fuel delivery is unlikely. Of course, the contracted fuel sources may also be out of commission.
• The actual power densities of datacenters may, sooner or later, exceed their intended designs, due to increased electronics density (the trend toward smaller, lower-power chips is offset by the insatiable demand for computing power).
• There's an increased awareness of the need for the conservation of non-renewable resources, and vendors are striving to produce equipment that consumes less energy. The current reality is that heat energy rejected into the atmosphere by datacenters is simply wasted.
• Datacenters represent a high concentration of assets, and thus make excellent targets for thieves, espionage, and terrorists.
• With few exceptions, typical "manned" or "guarded" datacenters are designed to prevent the unauthorized admission of anyone who is not carrying a weapon. However, typical security staff present little deterrent to armed attackers, much less to well-organized, well-funded attackers armed with inside information, tools, and automatic weapons - thus, the claimed security associated with typical hardened datacenters is illusory.
• Datacenters represent a single point of failure (the datacenter itself), regardless of the level of internal redundancy. Thus, a single regional disaster or terrorist attack may effectively destroy companies whose livelihood depends on a failed (or destroyed) datacenter. Of course, a datacenter may fail without actually being attacked.
• "There are two kinds of datacenters; those which have failed, and those that will."
• Despite a general awareness of Byzantine failure scenarios, disaster recovery preparedness rarely extends beyond having one or two backup datacenters, if any. Synchronously connected datacenters typically must be colocated relatively nearby (e.g., 10 to 60 miles is typical, with 300 miles or so as an upper limit), which means they may be subject to the same regional threats, and thus, simultaneous failures. Asynchronously connected datacenters may be geographically located at arbitrary distances, but may lag in data currency. Also, if Byzantine failures are considered, and a datacenter is taken as a single process, backed up by others coordinating asynchronously, then accommodating a single faulty datacenter would require a minimum of four datacenters (i.e., 3f+1, where f is the number of faulty datacenters to be tolerated; see Fischer, Lynch, and Paterson, 1985).
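The 3f+1 bound in the last item can be illustrated with a minimal sketch. The function names below are hypothetical and are not part of the described system; the code only evaluates the arithmetic stated above, treating each datacenter as a single process.

```python
def min_datacenters(f: int) -> int:
    """Minimum number of datacenters needed to tolerate f Byzantine-faulty ones.

    Uses the 3f+1 bound quoted in the text; the function name is illustrative.
    """
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def max_faulty_tolerated(n: int) -> int:
    """Largest f such that n datacenters still satisfy n >= 3f + 1."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1) // 3


if __name__ == "__main__":
    # One faulty datacenter requires four in total, as stated in the text.
    print(min_datacenters(1))       # 4
    # A primary plus two backups (three sites) tolerates no Byzantine fault.
    print(max_faulty_tolerated(3))  # 0
```

Under this bound, the common arrangement of a primary site with one or two backups cannot mask even a single Byzantine-faulty datacenter.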
Collectively, these concerns bring us back to affordability, not only in terms of capital expense (i.e., the cost of asset acquisition), but more importantly, the operational expense. It is well-known in the industry that, despite the fact that the acquisition cost of supercomputing assets is very high, it is quickly surpassed by the cost of operating those assets. Together, these asset acquisition and operational costs comprise the total cost of ownership (TCO), which is a key factor in any return-on-investment (ROI) calculation. However, as we move forward, the trend toward an increased demand for supercomputing may not be merely to achieve a particular ROI, but rather, to survive. If done right, which includes affordability as a prerequisite, supercomputing may enjoy network effects and become indispensable (i.e., highly competitive companies may, in all likelihood, need access to supercomputing), and ROI may become far less relevant, making the buying decision a more obvious choice. Thus, for the addressable markets, affordable supercomputing may become a necessary component in the business survival kit.
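The relationship between acquisition cost, operating cost, and TCO described above can be sketched as simple arithmetic. The dollar figures and function names below are purely hypothetical placeholders chosen for illustration; they are not taken from the source.

```python
def total_cost_of_ownership(capex: float, annual_opex: float, years: float) -> float:
    """TCO = acquisition cost plus operating cost accumulated over the period."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return capex + annual_opex * years


def years_until_opex_exceeds_capex(capex: float, annual_opex: float) -> float:
    """Time after which cumulative operating cost surpasses the acquisition cost."""
    if annual_opex <= 0:
        raise ValueError("annual_opex must be positive")
    return capex / annual_opex


if __name__ == "__main__":
    capex = 10_000_000.0       # hypothetical acquisition cost
    annual_opex = 4_000_000.0  # hypothetical yearly operating cost
    print(total_cost_of_ownership(capex, annual_opex, years=5))  # 30000000.0
    print(years_until_opex_exceeds_capex(capex, annual_opex))    # 2.5
```

With these assumed figures, operating cost overtakes acquisition cost after two and a half years, which is the pattern the paragraph describes.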
Summary of Key Problems with Datacenters Today (all are addressed by SHADOWS and SUREFIRE):
• High acquisition cost (space/real estate, construction, equipment, integration)
• High and accelerating power consumption
• Sprawling layouts, high space requirements
• Physical security becoming less effective
• Requires manpower, operational overhead
• Mediocre survivability (many vulnerabilities)
• Concentration of assets increases risk (makes datacenters an important target, etc.)
• Heterogeneous, difficult-to-manage mix of everything (computers, network gear, power, cooling, etc.)

"Impossibility of Distributed Consensus with One Faulty Process," by Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. Journal of the ACM, Vol. 32, No. 2, April 1985, pp. 374-382.

1.2 Detailed Description of the Invention

"An affordable, highly trustworthy, survivable and available, operationally efficient distributed supercomputing infrastructure for processing, sharing and protecting both structured and unstructured information."
Over time, however, there has been increasing impetus to extend the applicability of supercomputers to a much broader range of commercial problem areas. In a preferred embodiment the system establishes a supercomputing platform that meets the needs of the broad but unaddressed market for high-performance computing - both in terms of total throughput, but also in terms of low-latency (although not every problem or customer necessarily requires low latency) - while achieving unprecedented levels of affordability (both capital and operational expense). Affordability, trustworthiness, survivability, and fault tolerance may be among the essential needs, yet these requirements have historically been in conflict with each other, and resolving them requires a new approach.
The design of the system began with a focus on built-in security, survivability, and trustworthiness - the key ingredients for business-critical / mission-critical computing -- and the associated implications have permeated every aspect of the invention.
Recognizing that labor is a significant operational expense, and thus affects affordability from a TCO perspective, the system was intentionally designed to operate in a "lights-out," "hands-off," "unattended," "maintenance-free" environment. In a preferred embodiment, suitably configured, the system is designed to operate unattended, in low-cost remote locations - for years.
The system incorporates design concepts that address next-generation needs, so that supercomputer manufacturing and production can occur on a scale appropriate for a radically enlarged addressable market, resulting in the lowest possible cost structure, while maximizing production flexibility. This sort of quantum reduction in cost structure is preferred, if only to achieve the levels of affordability necessary to jumpstart the otherwise unaddressable broadest markets.

1.2.1 Invention Highlights

Due to the start-from-scratch system design approach, and the myriad details associated with that approach, the invention is best summarized by highlighting its extreme benefits. In a preferred embodiment, the system offers:
• Extremely affordable supercomputing (due to novel design and the novel use of commodity components)
• Extremely low power dissipation, achieved technologically (i.e., despite the use of commodity components)
• Extremely low dependence on utility power (renewable energy, self-contained multifuel power plant)
• Extremely low levels of system maintenance anticipated (i.e., upgrades only, no preventive maintenance)
• Extremely high performance per TCO dollar (i.e., in terms of both capital and operational expenses)
• Extremely high online and nearline internal storage capacity and throughput (without sacrificing scalability)
• Extremely high security (self-defense, resistance to cyber attack, physical attack, tampering, theft)
• Extremely high survivability expected (secured, hardened, designed to resist Byzantine failures)
• Extremely high levels of interoperability with other systems (generous and diverse I/O capacity)
• Extremely useful ability to marshal external computing resources (further improving utilization and capacity)

1.2.2 Organization of the System Description

Although the system is relatively modular by design, there are many components, and thus a commensurately large number of potential interconnections, dependencies, and interactions.
Nonetheless, for the purpose of teaching the aforementioned systems and methods, the Table of Contents that follows attempts to summarize and group the diverse components and their relationships, and to organize them in a somewhat linear way, despite their distinctly non-linear nature.

2 Table of Contents 1 Systems and Methods for Self-Healing Adaptive Distributed Organic Working Storage ........................... 1 1.1 Background of the Invention ...............................................................................
............................................1 1.2 Detailed Description of the Invention ...............................................................................
...............................3 1.2.1 Invention Highlights ...............................................................................
.....................................................3 1.2.2 Organization of the System Description ...............................................................................
......................3 2 Table of Contents ...............................................................................
......................................................... 4 3 Glossary of Terms ...............................................................................
........................................................ 7 4 SHADOWSTM - Architectural Overview & Motivations ..............................................................................

4.1 The Goal, in No Uncertain Terms ...............................................................................
..................................26 4.2 Historically Conflicting Requirements ...............................................................................
.............................26 4.3 SHADOWS as a Distributed, Decentralized Centralized Architecture ...........................................................26 4.4 SUREFIRE Sites as Survivable Mini-Datacenters ...............................................................................
.........27 4.5 How Distributed Machines Are Organized at Multiple Sites.. ..... ___ ............ ..................... ...... 27 ..............5 SERVANT (Service Executor, Repository, & Voluntary Agent - Non-Trusted) ........................................ 29

5.1 MARSHAL (Multi-Agent Routing, Synchronization, Handling, & Aggregation Layer) ....................................30 5.2 DELEGATE (Distributed Execution via Local Emulation GATEway).............................................................30

6 SCRAM - Survivable Computing, Routing, & Associative Memory ......................................................... 31 6.1 SCRAM Subsystem ...............................................................................
.......................................................34 6.2 SCRAM Processing Node ...............................................................................
..............................................34

7 SELF - Secure Emergent Learning of Friends. ...... -................................................................
.............. 34 7.1 SELF Concepts.-...............................................................................
..........................................................34 7.1.1 SELF - Resource Management Via Teams. ................ ... ___ ................... - ...................... ____ .... __.34 7.1.2 SELF - Software Rejuvenation & Process-Port Combinations.. ..........
................... ____ ... - ................. 38 7.1.3 BOSS - Asynchronous Byzantine Agreement ...............................................................................
..........39 7.1.4 MASTER - Relationship of MASTER to BOSS
...............................................................................
.........40 7.2 BOSS (Byzantine Object & Subject Security) ...............................................................................
................43 7.2.1 Minimum Redundancy for Byzantine Agreement ...............................................................................
......43 7.2.2 Byzantine Agreement Among Peers ...............................................................................
.........................44 7.2.3 Byzantine Agreement Among Peers, as Viewed by Third Parties ............................................................44 7.3 MASTER (Multiprocessor Adaptive Scheduler & Task Executor/Redirector) ................................................46 7.3.1 Load-Balancing SHADOWS Native Processes... .................... - ...
... - ................... .................. ____46 7.3.2 Forces Influencing SHADOWS Adaptive Load-Balancing.. .....
.................. ...................... ...... 46 ...........7.4 SLAVE (Storage-Less Adaptive Virtual Environment) ...............................................................................
....48

8 CHARM - Compressed Hierarchical Associative & Relational Memory ..................................................49 8.1 CHARM Concepts ...............................................................................
..........................................................52 8.1.1 CHARM Object Characteristics... ....... __ .................... - ......
....... __ .................. ____ ........ ......... 52 .8.1.2 Storage & Communications - Slices and Slivers ...............................................................................
......53 8.1.3 CHARM - FEC Pseudo-Random Ordinals (PRO) Encoding Concept .....................................................54 8.1.4 CHARM - Representation of Infinite Precision Floating Point Numbers ..................................................55 8.1.5 CHARM - Word and Phrase Tables ...............................................................................
.........................57 8.2 PUMP - Parallel Universal Memory Processor ...............................................................................
..............59 8.2.1 Overview ...............................................................................
................................................................... 59 8.2.2 Principle of Operation ...............................................................................
................................................59 8.2.3 Inter-Quadrant Connectivity ...............................................................................
......................................60 8.2.4 Inter-Lobe Connectivity ...............................................................................
.............................................60 8.3 FLAMERouter - Firewall, Link-Aggregator/Multiplexer & Edge Router .........................................................62 8.4 FIRE - Fast Index & Repository Emulator.. ....................
................. ........................................... ___62 8.5 NEAR - Nearline Emulation & Archival Repository ...............................................................................
........62 8.5.1 CENTRAL CONCEPT
...............................................................................
...............................................62 8.5.2 BASIC CONCEPTS
...............................................................................
..................................................63 8.5.3 NEARdrive - Preferred Embodiments.. ..... ..................... __ ............................................. __ ...... ....... 64 8.5.4 NEARdrive Thermal Stabilization to Avoid Thermal Stress ......................................................................65 8.5.5 NEARdrive Thermal Stabilization to Prevent Thermally Induced Read Errors .......................................65 8.5.6 Periodic Analysis of Drive SMART Data ...............................................................................
...................66 8.5.7 Predictive Statistical Properties of Disk Drive Failures .............................................................................66 8.5.8 Load-Shifting Away from Failed and At-Risk Drives.
................................. __ ........................................... 67 8.5.9 Pre-Spin-Down Drive Analysis and Maintenance ...............................................................................
......68 8.5.10 On-The-Fly Drive Analysis and Maintenance ...............................................................................
............68

9 CORE - Computation, Optimization, & Reasoning Engines ....................................................................69 9.1 CORE Concepts ...............................................................................
............................................................69 9.2 FACTUAL - Frequency-Adaptive Computation Table & Use-Adaptive Lookup ............................................69 9.3 FASTpage (Fast Associative Search Tree, pageable) ...............................................................................
...70 9.3.1 KEY DEFINITIONS
...............................................................................
...................................................70 9.3.2 CENTRAL CONCEPT
...............................................................................
...............................................71 9.3.3 BASIC CONCEPTS
...............................................................................
..................................................71 9.3.4 KEY APPLICATION AREAS
...............................................................................
.....................................72 9.3.5 APPLICATION CONSIDERATIONS
...............................................................................
.........................73 9.3.6 IMPLEMENTATION CONSIDERATIONS.....
......................................................................_..._....
..........75 9.3.7 EXAMPLE
...............................................................................
.................................................................76 9.4 RECAP - Reliably Efficient Computation, Adaptation, & Persistence.
............ ............................. ..... 76 .....9.4.1 RECAP - Resource-Sharing Concepts ...............................................................................
.....................76 9.5 RUSH - Rapid Universal Secure Handling ...............................................................................
....................78 9.5.1 CENTRAL CONCEPT
...............................................................................
...............................................78 9.5.2 RUSH - Dynamic Inter-Site Path Characterization ...............................................................................
...81 9.5.3 RUSH - Energy Considerations for Routing ...............................................................................
.............82 9.5.4 RUSH - Inter-Node Messaging Plan ...............................................................................
.........................82 9.5.5 RUSH - Pre-Validation of Session Traffic.
...............................................................................
................83 9.5.6 RUSH - Using Bloom Filters to Pre-Validate RUSH Traffic.....
................. ..................... ..... _ ..... ......... - 87 9.5.7 RUSH - Time Stamping & Synchronization, Effects of Congestion, Tampering & Attack ........................88 9.5.8 RUSH - Example RUSH Messages (subset) ...............................................................................
............89 9.6 VOCALE -Vocabulary-Oriented Compression & Adaptive Length Encoding ...............................................90 9.6.1 KEY DEFINITIONS
...............................................................................
...................................................90 9.6.2 CENTRAL CONCEPT
...............................................................................
...............................................90 9.6.3 BASIC CONCEPTS
...............................................................................
..................................................91 9.7 UMA - UpdateMovingAverages(iVa/ue) ...............................................................................
.......................92 9.7.1 PSEUDOCODE
...............................................................................
.........................................................93 FRAME (Forced Recapture, Aggregation & Movement of Energy)...
......................................... ...... ..... .95

10.1 SLAM - SCADA, Logging, Analysis & Maintenance.. ........................
____ .......... _ ........................... ____97 10.2 STEER - Steerable Thermal Energy Economizing Router ...........................................................................99 10.2.1 STEER - Latching Digital Flow Rate Control Valve. ............. __ ................. ------- ............ ............ 99 10.2.2 STEER -Para Ilei-Series Reconfigurator ...............................................................................
................101 10.3 RUBE - Recuperative Use of Boiling Energy ...............................................................................
..............103 10.3.1 RUBE - Heat Energy Recuperation Cycle Overview .............................................................................10 10.3.2 RUBE - Double Boiler ...............................................................................
.............................................109 10.3.3 RUBE - Inner Boiler ...............................................................................
................................................110 10.3.4 RUBE Vapor Injector ...............................................................................
...............................................115 10.3.5 RUBE Air-Cooled Subcooler ...............................................................................
...................................117 10.3.6 RUBE Liquid-Cooled Subcooler ...............................................................................
..............................117 10.3.7 RUBE Recuperator Assembly ...............................................................................
.................................120 10.3.8 RUBE Recuperator Tube ...............................................................................
........................................121 10.3.9 RUBE Condenser-Separator Tube. ........ ___ ...........
................... ____ ......... ................................ _ 122 10.4 PERKS - Peak Energy Reserve, Kilowatt-Scale..........................................................................
..............124 10.4.1 Electrical Power Conditioning and Electrical Energy Storage ................................................................125 10.5 FORCE - Frictionless Organic Rankine Cycle Engine ...............................................................................
.126 10.5.1 FORCE
Turboalternator................................................................
.........................................................127 10.5.2 FORCE Post-Turboalternator Recuperator ...............................................................................
.............129 10.5.3 FORCE Catalytic Vaporizer ...............................................................................
.....................................130 10.5.4 FORCE External Thermal Energy Vaporizer ...............................................................................
...........130 10.5.5 FORCE Exhaust Dehumidifier.
...............................................................................
................................130 10.6 SOLAR - Self-Orienting Light-Aggregating Receiver ...............................................................................
...131 10.6.1 SOLAR Parabolic Dish for Concentrating Solar Power- Back-of-the-Envelope Calculations ...............131 10.6.2 Non-Concentrating Solar Power Considerations ...............................................................................
.....134 10.6.3 SOLAR Parabolic Dish for Concentrating Solar Power - Candidate Phase-Change Working Fluids ....135 10.6.4 FORCE Nanoturbine Considerations ...............................................................................
......................136

11 SUREFIRE - Survivable Unmanned Renewably Energized Facility & Independent Reconfigurable Environment... .......................................... _ ................................. .....................................
.......................... 136 11.1 SUREFIRE Freestanding Vault ...............................................................................
....................................137 11.2 SUREFIRE Mini-Silo ...............................................................................
....................................................137 11.3 SUREFIRE Single-Level Underground Vault ...............................................................................
...............137 11.4 SUREFIRE Multi-Level Underground Vault ...............................................................................
.................138 11.4.1 SUREFIRE Colocated with a 108 KW Wind Turbine (Preferred Embodiment)...
.... ............................. 138

12 DEFEND - Deterrent/Emergency Force Especially for Node Defense ..................................................139

13 WARN - Weather & Advance Risk Notification
13.1 LISTEN - Locate, Identify, & Scrutinize Threats Emerging Nearby
13.1.1 Moxon Rectangle Directional Beams for 49, 52.25, and 59 MHz
13.1.2 A Set of Contingencies for the Use of Moxon Rectangles as Null-Based Direction-Finding Arrays
13.1.3 Basic Properties of the Lindenblad Omni-Directional Elliptically Polarized Dipole Array
13.1.4 Lindenblad Omni-Directional Elliptically Polarized Dipole Arrays With Full-Length & Shortened-Capped Elements
13.1.5 Direction-Finding Options: The Moxon Rectangle, Doppler Arrays, And Adcock Arrays
13.2 PODIUM - Pneumatically Operated Directional Intelligent Unmanned Masthead

14 CLAIMS
14.1 SHADOWS: Systems and Methods for Self-Healing Adaptive Distributed Organic Working Storage
14.2 SCRAM: Survivable Computation, Routing, & Associative Memory
14.3 SUREFIRE: Survivable Unmanned Renewably Energized Facility & Independent Reconfigurable Environment
14.4 SELF: Secure Emergent Learning of Friends

15 ABSTRACT

16 FIGURES
16.1 PEERS - Packet Engines Enabling Routing & Switching

3 Glossary of Terms

ADSL. Asymmetric DSL. A DSL communications link characterized by its asymmetric download and upload rates (e.g., 1.5 Mbps download, 900 Kbps upload). See also: DSL.
AFR. Annualized Failure Rate. The percentage of disk drives in a population that fail in a test scaled to a per-year estimation. The AFR of a new product is typically estimated based on accelerated life and stress tests, or based on field data from earlier products (commonly on the assumption that the drives are 100% powered on). AFR estimates are typically included in vendor datasheets (e.g., 0.88%, 0.73%, 0.63%, etc., for high quality disks with MTTFs in the range of 1 million hours to 1.4 million hours). However, the datasheet AFR and the field AFR differ widely. According to a 2006 study of large-scale supercomputer clusters and ISPs that analyzed drives over a 5-year period, the field AFR exceeds 1%, with 2% to 4% common, and up to 12% observed on some systems. See also: ISP, NEAR, MTTF.
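The datasheet percentages quoted above appear consistent with the simple ratio of powered-on hours per year to MTTF. The sketch below illustrates that relationship only; the function name, the 8766-hour year (365.25 days), and the intermediate 1.2 million hour MTTF are assumptions made for illustration, not values taken from any vendor datasheet.

```python
# Assumption: drives are powered on 100% of the time, 365.25 days per year.
HOURS_PER_YEAR = 8766

def datasheet_afr_percent(mttf_hours: float) -> float:
    """Approximate annualized failure rate (in percent) from an MTTF in hours.

    Uses the linear approximation AFR ~= hours_per_year / MTTF, which is
    reasonable when the MTTF is much longer than one year.
    """
    return 100.0 * HOURS_PER_YEAR / mttf_hours

# 1.2 million hours is a hypothetical mid-range value chosen for illustration.
for mttf in (1_000_000, 1_200_000, 1_400_000):
    print(f"MTTF {mttf:>9,} h -> AFR {datasheet_afr_percent(mttf):.2f}%")
```

Field AFRs of 2% to 4% would correspond, under the same approximation, to effective MTTFs far below the datasheet range, which is one way to read the gap described above.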
API. Application Programming Interface.
ASCII. American Standard Code for Information Interchange.
ASIC. Application-Specific Integrated Circuit. A hardware device containing fixed (not reconfigurable) logic and other circuitry. See also: FPGA.
Availability. Ability of a component or service to perform its required function at a stated instant or over a stated period of time. It is usually expressed as an availability ratio or percentage, i.e., the proportion of time that a system can be used for productive work. Availability over a particular period of time is calculated as ((Total Time - Unavailable Time) / Total Time), where the Unavailable Time is closely related to the MTTR.
Availability goals are often expressed as 99.9% ("three nines" availability), or 99.99% ("four nines" availability), and so on. "Three nines" (99.9%) implies a maximum downtime of 8 hours and 46 minutes per year, whereas "four nines" (99.99%) limits the downtime to 53 minutes per year. Also: In the context of security, availability refers to the property of a system or a system resource that ensures it is accessible and usable upon demand by an authorized system user. Availability is one of the core characteristics of a secure system. See also: AFR, MTTF, MTTR.
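The downtime figures above follow directly from the availability ratio. A minimal sketch, assuming a year of 8766 hours (365.25 days); the function names are illustrative only.

```python
# Assumption: one year is 365.25 days, i.e. 8766 hours.
HOURS_PER_YEAR = 8766

def availability(total_time: float, unavailable_time: float) -> float:
    """Availability ratio: (Total Time - Unavailable Time) / Total Time."""
    return (total_time - unavailable_time) / total_time

def max_downtime_minutes_per_year(availability_ratio: float) -> float:
    """Largest yearly downtime (in minutes) that still meets the given ratio."""
    return (1.0 - availability_ratio) * HOURS_PER_YEAR * 60

for label, ratio in (("three nines", 0.999), ("four nines", 0.9999)):
    hours, minutes = divmod(round(max_downtime_minutes_per_year(ratio)), 60)
    print(f"{label}: about {hours} h {minutes} min of downtime per year")
```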
BB-RAM. Battery-Backed RAM. A type of NVRAM implemented by combining a conventional RAM (e.g., SRAM or SDRAM) with a battery backup, in order to prevent data loss in case of power failure. Since the battery backup capability can typically sustain a relatively short period of operation, BB-RAM systems sometimes include low-cost secondary storage (such as one or more magnetic disk drives, depending on redundancy requirements). Alternative NVRAM implementations (e.g., MRAM) are available that do not require battery backup capability. The SHADOWS CHARM technology uses diverse NVRAM technologies, including BB-RAM. However, CHARM's primary rationale for using BB-RAM (in conjunction with the FIRE
and NEAR technologies) is to be able to better control its secure erasure in case of intrusion, in addition to its traditional NVRAM use. See also: CHARM, FIRE, MRAM, NEAR, NVRAM, RAM.
BFT. Byzantine Fault Tolerance.
BLOB. Binary Large OBject.
Footnote to Availability: Based on 365.25 days per year, or 8766 hours per year.
Bloom Filter. A probabilistic algorithm to quickly test membership in a large set using multiple hash functions into a single array of bits. Bloom filters are a space-efficient set membership structure whose key characteristic is that they never yield false negatives, although they may yield false positives. Bloom's algorithm lets one determine whether a value has been previously handled (e.g., processed, stored, etc.), although one must allow for the possibility of a false positive when extracting this information. Given a Bloom bit array 2^N bits long, and a numeric value (e.g., a high-quality hash) whose presence we wish to store, we take the first N bits of the numeric value, treat them as an index into the bit array, and set the bit at that index to TRUE. We repeat this about 5 to 20 times, setting 5 to 20 bits, respectively, in the bit array, each time using the next consecutive N bits from the numeric value. The bit array starts out reset (FALSE), before we start to populate it with data. As the bit array becomes more populated, sometimes we may set a bit that has already been set. This is the root of the false positive cases we examine later. When we wish to test whether a numeric value has been previously handled, we proceed in almost exactly the same way, except that we read the bits from the bit array instead of setting them. As we read the bits, if any of them are zero (FALSE), the numeric value is guaranteed to have never been handled. If all of the bits are set (TRUE), the numeric value has probably been handled and other means must be used to obtain a definitive, authoritative answer. The saturation of a Bloom filter is defined to be the percent of bits set to TRUE in the bit array. The phase of a Bloom filter is defined to be the number of times we attempt to set a bit in the array (5 to 20 times in this example). Both of these variables can be modified to change the accuracy and capacity of the Bloom filter.
Generally speaking, the larger the size of the bit array (2^N) and the higher the phase, the smaller the probability of false positive responses occurring. Statistically, the chance of a false positive can be determined by taking the saturation (which decreases with increasing filter size) and raising it to the power of the phase.
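The procedure described in the Bloom Filter entry can be sketched as follows. This is an illustration under stated assumptions, not the SHADOWS implementation: it assumes the bit array holds 2^N bits so that an N-bit slice of the hash can serve as an index, uses SHA-256 as the "high-quality hash", and takes consecutive N-bit slices starting from the low-order end of the digest. The class and method names are hypothetical.

```python
import hashlib

class BloomFilter:
    """Bloom filter over a 2**index_bits bit array, probed `phase` times per item."""

    def __init__(self, index_bits: int = 16, phase: int = 10):
        # SHA-256 yields 256 bits, so all N-bit slices must fit inside it.
        if index_bits * phase > 256:
            raise ValueError("index_bits * phase must not exceed 256")
        self.index_bits = index_bits
        self.phase = phase
        self.size = 1 << index_bits
        self.bits = bytearray((self.size + 7) // 8)  # all bits start FALSE

    def _indexes(self, item: bytes):
        digest = int.from_bytes(hashlib.sha256(item).digest(), "big")
        mask = self.size - 1
        for i in range(self.phase):
            # Each probe uses the next consecutive N-bit slice of the hash.
            yield (digest >> (i * self.index_bits)) & mask

    def add(self, item: bytes) -> None:
        for idx in self._indexes(item):
            self.bits[idx >> 3] |= 1 << (idx & 7)

    def might_contain(self, item: bytes) -> bool:
        # False means "guaranteed never added"; True means "probably added".
        return all(self.bits[idx >> 3] & (1 << (idx & 7))
                   for idx in self._indexes(item))

    def saturation(self) -> float:
        """Fraction (not percent) of bits currently set to TRUE."""
        return sum(bin(b).count("1") for b in self.bits) / self.size

    def false_positive_estimate(self) -> float:
        """Saturation raised to the power of the phase."""
        return self.saturation() ** self.phase

seen = BloomFilter()
seen.add(b"object-42")
print(seen.might_contain(b"object-42"), seen.false_positive_estimate())
```

A `True` result still needs confirmation from an authoritative store, as the entry notes; only `False` is definitive.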
BLOOMER™. Banked & Layered Orthogonally Optimized Matrix of Energy Receivers.
An apparatus for deploying and controlling a multiplicity of energy receivers in orthogonal banks and layers so as to maximize their ability to collect energy from their intended source (each is typically designed for a particular renewable or high capacity energy resource, such as the sun, the wind, a river, or the ocean). The basic idea is that, in lieu of deploying an array of one or more "large" receivers, to be able to deploy one or more arrays (possibly distributed geographically) of "smaller" receivers that can more easily be automatically repositioned and/or reconfigured. The goal of automatic repositioning and reconfiguration is at least two-fold:
1) to maximize each receiver's ability to collect energy and thereby maximize the collected energy in aggregate, and 2) to maximize the survivability of each receiver array and thereby maximize the survivability of the maximal energy collection capability. Conceptually, when "closed" the apparatus minimizes the exposure of the energy receivers to various primary and/or secondary threats, and when "bloomed" the apparatus maximizes the exposure of the energy receivers to their intended energy source.
BOSS™. Byzantine Object & Subject Security. A distributed, timely, trusted computing base (TCB) and object/subject security system that incorporates Byzantine agreement logic (from the classic "Byzantine generals" problem) in its decision-making process, and collectively makes security decisions in a "fail-silent"
manner that provides survivability even in the face of multiple failures and/or corrupted nodes. BOSS is implemented and instantiated only in conjunction with a MASTER, and works in conjunction with CHARM to control who gets access to what, and when, while ensuring that unauthorized information is not exposed (not even to other internal systems). BOSS is designed to enable the SHADOWS
infrastructure to support both classified and unclassified information processing and storage (e.g., to meet or exceed Common Criteria (CC) Protection Profiles (PP) such as the U.S. DoD Remote Access Protection Profile for High Assurance Environments Version 1.0, June 2000, nominally at EAL5, or potentially at EAL6 if implemented by a single, qualified development organization). Any BOSS node that fails or becomes corrupted may be restarted or replaced, and in any case should not be trusted until its trustworthiness can be re-established from scratch to the satisfaction of the surviving trusted nodes, including, at a minimum, other MASTERs with which it previously participated as a team member. See also: CC, CHARM, DoD, EAL, MASTER, PP, TCB.
Cache. A small fast memory (relative to a larger, slower memory) holding recently accessed data, designed to speed up subsequent access to the same data. Most often applied to processor-memory access but also used for a local copy of data accessible over a network, etc. See also: Cache Conflict, Direct Mapped Cache, Fully Associative Cache, Set Associative Cache.
Cache Conflict. A sequence of accesses to memory repeatedly overwriting the same cache entry. This can happen if two blocks of data, which are mapped to the same set of cache locations, need to be read or written simultaneously. See also: Cache Line.
Cache Line. (Or cache block). The smallest unit of memory that can be transferred between the main memory and the cache. Rather than reading from a larger but slower memory one word or byte at a time, each cache entry usually holds a certain number of words, known as a "cache line" or "cache block," and a whole line is read and cached at once. This takes advantage of the principle of locality of reference - if one location is read then nearby locations (particularly following locations) are likely to be read soon afterwards. See also: Cache.
Caching. Caching is a useful general technique that sometimes makes programs run faster. It does this by exchanging space for time: Caching tries to save previously accessed or computed results in an attempt to reuse them later rather than recomputing them. Caching is useful in all kinds of situations, including almost any kind of searching (cache the results of the search so that you can skip it next time), HTML generation (cache the results of the generation process so that you don't have to generate the page next time), and numeric computation (cache the results of the computation). See also: Cache, FACTUAL, Memoization.
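As a small illustration of exchanging space for time, the sketch below caches the results of a numeric computation using Python's standard `functools.lru_cache`. The function `slow_square` and the counter are hypothetical stand-ins for any expensive, deterministic computation.

```python
from functools import lru_cache

call_count = 0  # counts how often the underlying computation really runs

@lru_cache(maxsize=None)
def slow_square(n: int) -> int:
    """Stand-in for an expensive deterministic computation."""
    global call_count
    call_count += 1
    return n * n

first = slow_square(12)   # computed and stored in the cache
second = slow_square(12)  # served from the cache, no recomputation
print(first, second, call_count)
```

The cache holds one entry per distinct argument, so memory use grows with the number of distinct inputs; that is the space being traded for time.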
CC. Common Criteria. Here, this refers to Common Criteria (CC) for Information Technology Security Evaluations, Version 2.1, August 1999. See also: PP.
CHARM™. Compressed Hierarchical, Associative, & Relational Memory. An associative memory system that provides high-capacity, highly available and survivable persistent storage, and secure, rapid recall of information. Incorporates local FIREblades and NEARblades (or FIREdrives and NEARdrives), and collaborates with CHARM systems of other local and remote SCRAM nodes. Unlike relational databases, all information is indexed, and, from a hardware perspective, all index storage is electronic (no latency due to spinning media), but without the expense associated with general-purpose SSD.
Unlike relational databases, records or objects not meeting security constraints are never even accessed (e.g., in order to check security attributes). Hard disks store only fractional archival data.
All geographically colocated equipment and data are expendable without information loss or disclosure. See also: FACTUAL, FIRE, NEAR, SCRAM, SSD.
CNC. Computer Numerical Control. A computer-aided manufacturing technology using computers to control cutting machines such as milling machines and lathes to cut specified three-dimensional shapes.
CNC has been used since the early 1970s. Prior to this, machines were controlled by prepared tapes and the process was called simply Numerical Control (NC).
Codec. A complementary pair of functions comprising an encoder and a decoder, such that an input can be provided to the encoder, the output of which is fed to the decoder, the output of which is the original input.
See also: FEC.

CORE™. Communications, Optimization, & Reasoning Engines. A collection of "engines" whose purpose is to encapsulate and securely execute high-performance and/or hardware-assisted general purpose implementations of critical compute-intensive processes, to minimize latency, maximize overall throughput, and reduce operational costs. These engines (e.g., FACTUAL, FASTpage) are typically closely associated with other systems, and are thus described in those contexts. See also: FACTUAL, FASTpage.
CPU. Central Processing Unit.
CRC. Cyclic Redundancy Check.
Critical Heat Flux. The heat flux beyond which boiling cannot be sustained because the liquid working fluid no longer wets the surface of the heat source. Since heat flux is typically given in W/cm², those skilled in the art may recognize that the critical heat flux can be extended by numerous means, including any that increase the effective surface area of the heat source (the denominator) for a given level of power (the numerator), but also including the use of increased turbulence, special coatings, etc. See also: Heat Flux, RUBE.
CWS. Chilled Water System.
DBMS. Data Base Management System. See also: DELEGATE.
DDoS. Distributed DoS. A DoS involving distributed attack sources, such as when multiple compromised systems cooperate to flood the bandwidth or resources of a targeted system, usually one or more web servers. The systems involved in a DDoS attack become compromised by the DDoS perpetrators via a wide variety of methods, and, depending on the nature and extent of the compromise, may contain relatively static hardcoded attack vectors (e.g., "MyDoom," which involved hardcoding the IP address in advance of an attack), or may contain sophisticated control mechanisms such that the compromised systems collectively form one or more "botnets." Unlike hardcoded attacks, botnets can be controlled dynamically, and thus targeted at any IP address at any time. The DDoS strategy provides an attack perpetrator with numerous advantages: 1) orders of magnitude more attack traffic than a simple DoS, 2) increased stealth and detection avoidance, and 3) significant defense challenges for the targeted victims, since the aggregate bandwidth of a large botnet can easily exceed any practical amount of surplus bandwidth a defender might purchase to mitigate DDoS attacks (and perpetrators can always add more compromised systems to their botnets). In a preferred embodiment of the SHADOWS infrastructure, there are strong defenses against DDoS attacks, implemented primarily in the SHADOWS FLAMERouter technology (the essence of which is also in common with the SHADOWS RUSHrouter technology, and by inclusion, in the SHADOWS MARSHALs).
The most important defenses include the ability to efficiently and accurately distinguish "self" (bona fide traffic) from "non-self," the ability to impose adaptive flow control, the ability to provide "hidden" services and stealthy interfaces, the ability to present a continually moving multi-point target (thus dividing and effectively decimating the attack resources), and significant means for load-shedding and adaptive redistribution of defensive resources. See also: DoS, FLAMERouter, MARSHAL, RUSHrouter, SELF.
DEFEND™. Deterrent/Emergency Force Especially for Node Defense.
DELEGATE™. Distributed Execution via Local Emulation GATEway. A distinguished SERVANT node having the responsibilities of fulfilling a DELEGATE role. The DELEGATE role implements a secure client-side "proxy" agent that appears to locally implement a particular service which would normally be implemented elsewhere, such as on a local or remote server, but instead is actually implemented within the SHADOWS network cloud. The DELEGATE concept is described further in section 5.2. See also: API, DBMS, DNS, LDAP, MPI, POP3, RUSH, RUSHrouter, SIP, SLA, SMTP, VoIP.
Delta Compression. Whereas any object can always be expressed in terms of another object (the reference object) plus the set of differences or "deltas" between them, such expression is only beneficial if a suitable reference object can be found easily, and the resulting expression (in terms of deltas) achieves a useful compression ratio. In the art, delta compression attempts to recognize the differences between successive or nearly successive versions of a file, in order to express one in terms of the other plus some differences (the "delta"). This works well in practice because the reference object (i.e., the earlier file version) is already known, and the relationship between the files is already established - the files are expected to be similar, so processing is straightforward and efficient. When differences are small, the compression ratio is excellent, such that storing two nearly identical files with delta compression would consume little more space than storing one of them (the ideal is reached when the files are identical).
However, applying the technique much more widely - such that an arbitrary data sequence can be efficiently expressed in terms of some other arbitrary data sequence whose identity is unknown but must be efficiently discovered during the compression process - is a very challenging problem that is well beyond the state of the practice. The compression in CHARM directly addresses this broader challenge, by using its FASTpage search mechanisms and massively parallel implementation to efficiently locate fine-grained candidate components (i.e., sub-objects) of objects whose content is most similar to the various components of the object to be compressed. CHARM recognizes the differences between the similar content of any two objects or subjects, which trivially includes versions of the same object - an extremely frequent occurrence in CHARM because its objects are immutable, so every version is a new object. More importantly, however, CHARM's compression can take advantage of different objects with partially common content, such as objects having one or more shared vocabularies, or having similar components (sub-objects).
CHARM allows multiple objects to be used as reference objects, so wherever commonality exists, any object can be efficiently expressed in terms of the selected reference object(s) plus a set of deltas.
Note that prior to applying delta compression, CHARM applies other compression techniques, such as RLE-1, RLE-2, and RLE-3. While all the CHARM algorithms can be implemented in software, in a preferred embodiment they are implemented in hardware, in the CHARM PUMP. See also: CHARM, FASTpage, PUMP, RLE.
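The conventional case described above (one known reference object plus a set of deltas) can be sketched with Python's standard `difflib`. This is only an illustration of the basic idea: it uses a single reference, and it is not the CHARM method, which locates multiple reference objects via FASTpage and runs in the PUMP hardware. The function names and the tuple-based delta format are assumptions made for this example.

```python
from difflib import SequenceMatcher

def make_delta(reference: bytes, target: bytes) -> list:
    """Express `target` as copies from `reference` plus literal data."""
    delta = []
    matcher = SequenceMatcher(None, reference, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))           # reuse reference[i1:i2]
        elif tag in ("replace", "insert"):
            delta.append(("data", target[j1:j2]))    # new bytes not in reference
        # "delete" needs no entry: those reference bytes are simply not copied
    return delta

def apply_delta(reference: bytes, delta: list) -> bytes:
    """Rebuild the target from the reference object and its delta."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            out += reference[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)

v1 = b"The quick brown fox jumps over the lazy dog"
v2 = b"The quick brown cat jumps over the lazy dog!"
delta = make_delta(v1, v2)
print(delta)
print(apply_delta(v1, delta) == v2)
```

When the two versions are identical, the delta collapses to a single copy instruction, which corresponds to the ideal case mentioned in the entry.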
Demand Paging. In a virtual memory system, a technique to conserve relatively scarce physical memory by loading virtual memory pages into physical memory only as they are accessed. See also: Page Fault, Swapping.
Direct Mapped Cache. A cache where the cache location for a given address is determined from the middle address bits. If the cache line size is 2^n then the bottom n address bits correspond to an offset within a cache entry. If the cache can hold 2^m entries then the next m address bits give the cache location. The remaining top address bits are stored as a "tag" along with the entry. In this scheme, there is no choice of which block to flush on a cache miss since there is only one place for any block to go.
This simple scheme has the disadvantage that if the program alternately accesses different addresses which map to the same cache location then it may suffer a cache miss on every access to these locations.
This kind of cache conflict is quite likely on a multi-processor. See also: Cache, Fully Associative Cache, Set Associative Cache.
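The address split described for a direct mapped cache, and the conflict behavior it leads to, can be sketched as follows. The sketch assumes a line size of 2^n bytes and 2^m cache entries; the names `split_address` and `DirectMappedCache`, and the example values n = 6 and m = 8, are illustrative choices rather than anything specified in the text.

```python
def split_address(address: int, n: int, m: int):
    """Split an address into (tag, index, offset) for a direct mapped cache."""
    offset = address & ((1 << n) - 1)        # bottom n bits: offset within the line
    index = (address >> n) & ((1 << m) - 1)  # next m bits: the cache location
    tag = address >> (n + m)                 # remaining top bits: stored as the tag
    return tag, index, offset

class DirectMappedCache:
    """Tracks only tags, which is enough to tell hits from misses."""

    def __init__(self, n: int, m: int):
        self.n, self.m = n, m
        self.tags = [None] * (1 << m)

    def access(self, address: int) -> bool:
        tag, index, _ = split_address(address, self.n, self.m)
        hit = self.tags[index] == tag
        self.tags[index] = tag  # only one possible place, so no replacement choice
        return hit

cache = DirectMappedCache(n=6, m=8)
a = 0x12345
b = a + (1 << 14)  # same index as `a`, different tag
print(split_address(a, 6, 8), split_address(b, 6, 8))
print([cache.access(x) for x in (a, b, a, b)])  # alternating accesses keep missing
```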
DMZ. De-Militarized Zone. In networking, this is a buffer zone situated between a protected LAN and a WAN, and occupied by bastion servers, firewalls, or other devices that are sufficiently hardened so as to safely withstand direct exposure to the Internet.
DNS. Domain Name System. See also: DELEGATE.
DoD. Department of Defense.
DoS. Denial of Service. In computer security, a denial-of-service attack (DoS attack) is an attempt to make a computer resource unavailable to its intended users. A DoS attack is characterized by an explicit attempt by attackers to prevent legitimate users of a service from using that service, typically via one of the following methods:
1) consumption of computational resources, such as bandwidth, disk space, or CPU time; 2) disruption of configuration information, such as routing information; 3) disruption of physical network components.
Attacks can be directed at any network device, including attacks on network routers and servers (e.g., web servers, email servers, DNS servers, etc.). Examples of DoS attacks include:
1) flooding a network, thereby preventing legitimate network traffic; 2) disrupting a server by sending more requests than it can possibly handle, thereby preventing access to a service; 3) preventing a particular individual from accessing a service; and 4) disrupting service to a specific system or person. See also: DDoS.

DRAM. Dynamic RAM. Volatile RAM that is characterized by its need to be continually refreshed, in order to prevent data loss, and by its relatively high storage density, relatively high performance, and relatively low cost. The dynamic nature of DRAM, in conjunction with its rapidly increasing storage density, creates a situation where there's a significant probability of multiple SEUs coinciding in the same access, resulting in data loss that is detected but cannot be corrected, or worse yet, undetectable data loss. See also: CHARM, MRAM, NVRAM, RAM, SEU, SRAM.
DSL. Digital Subscriber Line. A half-duplex communications link typically superimposed on a standard telephone line pair, such that the ordinary analog voice signal (if present) on the same line is not affected.
DSL links are normally considered to be "broadband," especially for download speeds of at least 256 Kbps.
See also: ADSL.
DSSA. Direct Spread-Spectrum Addressing.
EAL. Evaluation Assurance Level. A package consisting of assurance components from the Common Criteria (CC), Part 3 that represents a point on the Common Criteria predefined assurance scale (e.g., EAL5 or EAL6).
EMP. Electro-Magnetic Pulse.
Emulate. Generally, to imitate exactly. Specifically, the capacity of one computer system to imitate another, or to imitate the interfaces and environment of another, such that relative to a particular set of expectations there is no difference between the emulator and that which it emulates. See also: Simulate.
Emulation. The situation in which one computer behaves like another, or imitates another's interfaces and environment. See also: Simulation.

FACTUAL™. Frequency-Adaptive Computation Table & Use-Adaptive Lookup. A process-oriented memoization ("memo table") capability that retrieves previously computed, "vetted" results for arbitrary deterministic processes and functions. All values that can affect the output (including the identification of the exact process and any parameters) may be provided as input, along with a timeout value and a list of intended recipients, and a signed and certified result can be sent to them. FACTUAL implements a race ("looking up" vs. "recomputing" vetted results), but lookup typically starts before the request even reaches the head of the request queue for the target process. In the event a process starts due to timeout, if there's a "hit" and the looked-up result becomes available in time, it may be used as an oracle instead, to check the process.
Misses cause no latency penalty. Unlike a memoized function (which is responsible for caching its own results), FACTUAL is a global, process-based capability that takes advantage of the persistent associative memory of the CHARM subsystem. See also: Cache, CHARM, Memoization, Memoize, Memoized Function.
FASTpage™. Fast Associative Search Tree, pageable. A fast, highly scalable, associative memory mechanism that can adapt to the information to be remembered, in order to optimize both time and space.
FASTpage index size is limited only by the availability of system-wide resources. FASTpage is well-suited to both transient in-memory data (generally faster than hash-table searching) and persistent data (designed for extremely fast searches with indexes in flash memory). The FASTpage search mechanism is based on a hybrid comprising a novel "pageable" Ternary Search Tree (TST) (having compressed, variable-length nodes and exhibiting locality of reference) and a novel "pageable" digital search Trie (having vectored, compressed, variable-length off-page references). The FASTpage storage mechanism is optimal for flash-based storage, and intended for use in hierarchical memory systems such as those involving DRAM, NVRAM or MRAM, flash-memory, and magnetic disk. In a preferred embodiment, DRAM is used for caching FASTpage pages, NVRAM or MRAM is used for building new pages, flash-memory is used to store FASTpage pages, and magnetic disk is used to store referenced content. Each FASTpage implementation supports an arbitrary number of independent local search spaces, limited only by local storage capacity. Each FASTpage search space may be individually defined to be either transient or persistent, with individually specifiable survival requirements. In a preferred embodiment, FASTpage is implemented in hardware as a CORE engine, within one or more PUMP devices, and also in software that executes on MASTERs, SLAVEs, and SERVANTs.
See also: CORE, DRAM, MASTER, MRAM, NVRAM, PUMP, SERVANT, SLAVE, TST.
FEC. Forward Error Correction. A form of error correction that encodes redundancy into data in order to recover the original (intended) data in the event of partial data loss or corruption. The "forward" nature of FEC stems from the fact that corrections can be applied while still making progress - i.e., without having to "go backward" by retrying a communication or retrieval operation. CHARM uses FEC for communications, and also for data distributed both locally and remotely on both transient media (e.g., in DRAM) and persistent media (e.g., flash or magnetic storage). Because CHARM's storage formats are already FEC-encoded, stored data can be transmitted without further FEC encoding, as appropriate.
In a preferred embodiment of SHADOWS, the FEC coders are implemented as part of the CORE functions embedded in the PUMP devices. In a preferred embodiment, the general purpose processors in SHADOWS also implement the FEC codecs. See also: CHARM, Codec, CORE, DRAM, ECC, FIRE, NEAR, PUMP.
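The "forward" property can be illustrated with the simplest possible erasure code: a single XOR parity block, which lets a receiver rebuild any one lost block without asking for a retransmission. The source does not specify which codes the CHARM FEC codecs use, so this sketch is a generic stand-in, and the function names are hypothetical. It assumes all blocks have equal length.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def encode(data_blocks):
    """Append one parity block so that any single lost block is recoverable."""
    return list(data_blocks) + [xor_blocks(data_blocks)]

def recover(received):
    """Return the data blocks; `received` may contain at most one None (a lost block)."""
    missing = [i for i, block in enumerate(received) if block is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost block")
    repaired = list(received)
    if missing:
        survivors = [block for block in received if block is not None]
        # XOR of all surviving blocks (data and parity) equals the lost block.
        repaired[missing[0]] = xor_blocks(survivors)
    return repaired[:-1]  # drop the parity block

blocks = [b"SHAD", b"OWS!", b"node"]
coded = encode(blocks)
damaged = list(coded)
damaged[1] = None  # simulate losing one block in transit or storage
print(recover(damaged) == blocks)
```

Practical systems generally use stronger codes that tolerate multiple losses; the single-parity case is shown only because it makes the recover-without-retry idea easy to follow.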
FIRE™. Fast Index & Repository Emulator. The technology underlying a FIREblade™ or FIREdrive™, and used by CHARM as its primary online persistent storage. FIRE combines quickly securable DRAM and BB-RAM for high-speed storage of rapidly changing data, with the DRAM used for caching already-stored (and therefore expendable) in-the-clear data, for example, and the BB-RAM used for buffering committed transactional data (in conjunction with a set of other suitably distributed instantiations of FIRE with which it communicates) not yet written to long-term, persistent data storage. FIRE uses flash-based storage (or its equivalent) rather than magnetic storage, and this provides high-performance all-electronic, long-term, persistent data storage that is immune to mechanical wear and vibration (including seismic events). The flash-based storage also operates at very low power (typically less than 1 microwatt per TOPS, vs. more than 20 to 40 milliwatts per TOPS for low-power and/or high-performance magnetic disk drives). The persistently stored data (whether in BB-RAM or flash memory) is safe from intruders even if stolen. In the case of power failure, information secured in the BB-RAM can be written directly to its reserved locations in long-term (e.g., flash-based) storage. In a preferred embodiment, with hundreds of input/output channels, the number of read/write accesses per second, per FIRE channel, is orders of magnitude faster than the per-channel rate of hard disk drives. See also: BB-RAM, DRAM, CHARM, NEAR.
FIFO. First In, First Out. A queuing discipline intuitively equivalent to "first come, first served."
FLAMERouter™. Firewall, Link-Aggregator/Multiplexer & Edge Router. (aka FLAMER) A special MASTER that serves as a gateway and tunneling router between the LAN fabrics of a SHADOWS node and one or more wide-area networks (WANs). Automatically tunnels SHADOWS
communications protocols (e.g., RUSH, RECAP, UNCAP) over existing LAN and/or WAN protocols as necessary. See also: HSLS, RUSHrouter, LAN, Tunneling Router, WAN.

FORCE™. Frictionless Organic Rankine Cycle Engine. A kilowatt-scale turboalternator (heat engine) consisting primarily of an efficient, low-temperature (130 °C), low-pressure (6-8 bar) vapor turbine connected to an alternator. The FORCE turboalternator has only one moving part (the shaft), which spins at very high speed (e.g., nominally at 62,000 RPM in a preferred embodiment) and rides on hydrodynamic "vapor bearings" - essentially a vapor layer created by its rotating foils, due to the Bernoulli effect. The vapor bearings are optimally advantageous to reduce friction losses to near zero during normal operation (there is still some residual friction due to colliding vapor molecules). A novel optional embodiment of the FORCE
turbine also engages separate pneumatic-like vapor bearings during spin-up and spin-down (but not during normal operation), and thus completely avoids any wear-inducing friction within the turbo machinery. In the absence of separate spin-up/spin-down vapor bearings, the foils of such turbo-machinery incur friction (and thus, wear) during spin-up and spin-down (whenever the turbine drops below, say, for example, about 2500 RPM).
FPGA. Field-Programmable Gate Array. A type of reconfigurable logic, based in hardware, that may also include specialized, embedded devices that provide enhanced functionality and/or performance while minimizing the use of reconfigurable logic gates. Contrast with ASIC. See also: ASIC.
FPSC. Free Piston Stirling Cooler.
FPSE. Free Piston Stirling Engine.

FRAME™. Forced Recuperation, Aggregation & Movement of Energy. A power production and/or peak-shaving energy management capability whose goal is to reduce operational costs and enhance or enable survivability. FRAME works by significantly reducing the energy required to operate a heat-dissipating system (such as a computing system), through the recuperative use of energy in general, and by time-shifting the generation and consumption of power to the most effective and/or efficient time-frames.
FSLS. Fuzzy Sighted Link State. A family of wireless routing algorithms (e.g., for a wireless mesh) that depend on the observation that changes in links that are far away (i.e., relative to the mesh) are less important than changes in those links that are nearby. With FSLS, any changes in link states are propagated quickly to nearby nodes, and much less quickly to distant nodes (because the distant nodes don't directly use the nearby link state updates in their link state calculations). See also: HSLS.
Footnote to FIRE: This does not imply that the already-stored data is stored in the clear (it is not), but rather, that the in-the-clear data is allowed to exist only in DRAM, where it can be rapidly erased (via de-powering the DRAM, for example) in case of a security breach. Note that de-powering DRAM is insufficient to prevent recovery of the most recent content by a sophisticated (e.g., state-sponsored) attacker, but it is the best available move against a powered-on probe attack.
Fully Associative Cache. A cache where data from any address can be stored in any cache location. The whole address must be used as the tag. All tags must be compared simultaneously (associatively) with the requested address and if one matches then its associated data is accessed.
This requires an associative memory to hold the tags which makes this form of cache more expensive. It does however solve the problem of contention for cache locations (cache conflict) since a block need only be flushed when the whole cache is full and then the block to flush can be selected in a more efficient way. See also: Cache, Direct Mapped Cache, Set Associative Cache.
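The behavior described above can be illustrated with a minimal software sketch. It is only an illustration, not part of the described infrastructure: the class name, the use of a dictionary lookup in place of simultaneous hardware tag comparison, and the least-recently-used replacement policy are all assumptions made for the example.

```python
from collections import OrderedDict


class FullyAssociativeCache:
    """Toy fully associative cache: any address may occupy any cache line."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        # Maps the whole address (used as the tag) to its data, oldest use first.
        self.lines = OrderedDict()

    def read(self, address, backing_store):
        """Return (data, hit) for the requested address."""
        if address in self.lines:              # a tag matches: cache hit
            self.lines.move_to_end(address)    # mark as most recently used
            return self.lines[address], True
        if len(self.lines) >= self.num_lines:  # flush only when the whole cache is full
            self.lines.popitem(last=False)     # assumed policy: evict least recently used
        self.lines[address] = backing_store[address]
        return self.lines[address], False
```

Because no address is tied to a particular line, two frequently used addresses never evict each other merely because they map to the same location, which is the contention problem that a direct mapped cache exhibits.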
Gbps. Giga-bits per second. A measure that often refers to a serial communications rate, in billions of bits per second (i.e., one thousand Mbps). Although there are technically 8 bits per byte, in serial communications there is usually a synchronization overhead (e.g., 1 start bit and 1 stop bit for every 8 bits of data), resulting in a 10:1 ratio of bits to bytes when calculating raw throughput (i.e., ignoring compression and additional protocol overheads). Thus, for example a 10 Gbps link might yield only 1 GBps (10/10 = 1) rather than 1.25 GBps (10/8=1.25). See also: Mbps.
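The 10:1 rule of thumb above can be written out as a short calculation. This is a sketch only; the function name and its default parameter are hypothetical and simply restate the start-bit/stop-bit assumption from the definition.

```python
def raw_bytes_per_second(bits_per_second, bits_on_wire_per_byte=10):
    """Raw byte throughput of a serial link.

    The default of 10 bits per byte assumes 8 data bits plus 1 start bit and
    1 stop bit; pass 8 to model a link with no framing overhead.
    """
    return bits_per_second / bits_on_wire_per_byte


link = 10e9  # a 10 Gbps link
print(raw_bytes_per_second(link) / 1e9)     # 1.0  (GBps with framing overhead)
print(raw_bytes_per_second(link, 8) / 1e9)  # 1.25 (GBps without framing overhead)
```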
GWP. Global Warming Potential.

HANDLER™. Host Abstraction for Named Devices & Layered Executable Resources. In a preferred embodiment, a MASTER assigns virtualized computing, storage, and communications resources to a set of SLAVEs over which it has authority, and a HANDLER implements the physical interfaces of these resources. In a preferred embodiment, the HANDLER interfaces and logic are implemented within the SLAVE PUMP device(s) to which the SLAVE processors are attached, such that the HANDLER hardware provides functionality similar to a software-based isolation kernel. In a preferred embodiment, the HANDLER's hardware implementation supports dedicated per-process registers and FIFO devices that enable user-space input/output without system call overhead, within the security constraints set by the MASTER. See also: MASTER, PUMP, SELF, SERVANT, SLAVE.
Heat Flux. The flow rate of heat across or through a material, or the quantity of thermal energy transferred to a unit area per unit time, typically given in units of W/cm². See also: Critical Heat Flux, RUBE.
HMAC. Hashed Message Authentication Code. (Sometimes just MAC, although this can be confusing.) A one-way hash computed from a message and some secret data, for the purpose of detecting whether a message has been altered. It is difficult to forge without knowing the secret data. See RFC 2104, which defines HMAC, and RFC 2402, which applies it in the IP Authentication Header. See also: MAC.
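As a brief illustration of the definition above, the following sketch computes and checks an HMAC using Python's standard library. The choice of SHA-256 as the underlying hash, and the function names, are assumptions made for the example; the definition itself does not specify a hash function.

```python
import hashlib
import hmac


def make_tag(secret, message):
    """Compute an authentication tag from a message and shared secret data."""
    return hmac.new(secret, message, hashlib.sha256).digest()


def verify(secret, message, tag):
    """Return True only if the tag matches the message under this secret."""
    # compare_digest runs in constant time to avoid leaking where a mismatch occurs.
    return hmac.compare_digest(make_tag(secret, message), tag)
```

A receiver holding the same secret recomputes the tag and compares; any alteration of the message, or use of a different secret, causes the comparison to fail.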
HPC. High Performance Computing.
HSLS. Hazy-Sighted Link State. A routing algorithm (invented by researchers at BBN Technologies) in the family of wireless routing algorithms called FSLS. Its designers sought to minimize global network waste, the total overhead of which they defined as, "the amount of bandwidth used in excess of the minimum amount of bandwidth required to forward packets over the shortest distance (in number of hops) by assuming that the nodes had instantaneous full-topology information." The network overhead associated with HSLS is theoretically optimal, utilizing both proactive and reactive link-state routing to limit network updates in space and time, and on larger networks HSLS begins to exceed the efficiencies of the best-known other routing algorithms. Unlike traditional methods, HSLS does not flood the network with link-state information to attempt to cope with moving nodes that change connections with the rest of the network, nor does it require each node to have the same view of the network. In the SHADOWS infrastructure, a variant of HSLS may be used generally within any subsystems where distributed resource information is relevant to the distribution of resource flows (e.g., information, working fluids, energy, etc.), including specifically within the implementation of the FRAME subsystems (e.g., STEER) and the RUSH protocol (e.g., within each RUSHrouter) in particular. However, whereas HSLS chooses a single path such as within a wireless (radio) mesh network, the RUSH protocol chooses multiple paths, and is not limited to any particular types of networks (e.g., wired vs. wireless, mesh vs. point-to-point, etc.). 
Also, whereas HSLS chooses its path based on performance, the RUSH protocol views performance as only one of several indicators, and also considers resource consumption (e.g., channel types, bandwidth quotas, energy usage, energy reserves), service factors (e.g., type of service, SLAs, QoS), and security issues (e.g., risk posture, channel safety, stealth, visibility to traffic analysis), etc. See also: FLAMERouter, FSLS, QoS, RUSH, RUSHrouter, SLA.
HVAC. Heating, Ventilation, & Air Conditioning.

I2P. "Garlic Router". An open-source, anonymizing overlay network based on establishing secure, multi-hop connections among intentionally selected I2P nodes. Although I2P incorporates lessons learned from TOR, an alternative anonymizing network that predates it, I2P is fundamentally a packet switched network, while TOR is fundamentally a circuit switched one, allowing I2P to transparently route around congestion or other network failures, operate redundant pathways, and load-balance the data across available resources.
TOR and I2P complement each other in their focus - TOR works towards offering high speed anonymous Internet outproxying, while I2P works towards offering a decentralized, resilient, low-latency network in itself.
One goal of I2P is to be suitable for use in hostile regimes against state-level adversaries. I2P uses a technique called "garlic routing" - layered encryption of messages, passing through routers selected by the original sender. I2P sends messages by taking a message, encrypting it with the recipient's public key, taking that encrypted message and encrypting it (along with instructions specifying the next hop), and then taking that resulting encrypted message and so on, until it has one layer of encryption per hop along the path. Furthermore, at each layer, any number of messages can be contained, not just a single message. In addition to the "cloves" (individual messages), each unwrapped garlic message contains a sender-specified amount of padding data, allowing the sender to take active countermeasures against traffic analysis. I2P makes a strict separation between the software participating in the network (a "router") and the anonymous endpoints ("destinations") associated with individual applications. Any SHADOWS nodes that implement the RUSH protocol can participate in the I2P network both as one or more I2P routers and as one or more I2P endpoints or destinations. Although SHADOWS does not depend on I2P, participating in the I2P network provides a source of mix-in traffic that helps to prevent traffic analysis by a sophisticated attacker, while also helping the I2P network. See also: TOR, RUSH.
IEEE. Institute of Electrical and Electronics Engineers.
IGMP. Internet Group Management Protocol. A standard protocol for managing multicast groups.
Integrity. The quality of an information system reflecting the logical correctness and reliability of the operating system; the logical completeness of the hardware and software implementing the protection mechanisms; and the consistency of the data structures and occurrence of the stored data. In a formal security mode, integrity is interpreted more narrowly to mean protection against unauthorized modification or destruction of information.
Internet. The largest collection of networks in the world, interconnected in such a way as to allow them to function as a single virtual network. See also: IP.
IOPS. Input-output Operations Per Second. A measure of storage device random access performance.
Storage devices such as magnetic disk drives support on the order of 100 IOPS, assuming one 512-byte sector for each I/O (input-output) operation. Because the number of IOPS is partly determined by track-to-track latency and partly by rotational latency, and the latter is tied to the rotational rate of the drive, performance is much less than 100 IOPS for low-cost drives spinning at 4200 to 7200 RPM, and only slightly more than 100 IOPS for expensive top-of-the-line drives spinning at 15,000 RPM. In contrast, an inexpensive USB flash drive capable of reading or writing at 20 MBps can achieve on the order of 40,000 IOPS (an improvement of 400x). Storage devices, however, are also rated in terms of sustained throughput (e.g., MBps), such as when streaming a large file with a single access, and in this context a single magnetic disk drive provides on the order of 60 MBps sustained vs. a USB flash drive throughput on the order of 20 MBps sustained. See also: MBps, USB.
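The relationship between rotational rate and IOPS described above can be sketched numerically. The model below is a simplification under stated assumptions: average rotational latency is taken as half a revolution, the seek time is a caller-supplied illustrative value rather than a figure from this document, and the flash estimate is an upper bound that ignores per-access latency. All function names are hypothetical.

```python
def rotational_latency_ms(rpm):
    """Average rotational latency (half a revolution), in milliseconds."""
    return 0.5 * 60_000.0 / rpm


def disk_iops(rpm, avg_seek_ms):
    """Approximate random IOPS for a disk: one seek plus half a revolution per I/O."""
    return 1000.0 / (avg_seek_ms + rotational_latency_ms(rpm))


def flash_iops_upper_bound(bytes_per_second, sector_bytes=512):
    """Upper bound on IOPS when each I/O transfers one sector at the full transfer rate."""
    return bytes_per_second / sector_bytes
```

For example, `flash_iops_upper_bound(20e6)` evaluates to 39062.5, consistent with the "on the order of 40,000" figure above, while `disk_iops` yields values in the tens to low hundreds for any plausible seek time, with the exact result depending on the seek time assumed.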
IP. Internet Protocol. See RFC 791 and RFC 2460. See also: TCP, UDP.
ISP. Internet Service Provider.
LAN. Local Area Network.
LDAP. Lightweight Directory Access Protocol. See also: DELEGATE.
LEB128. Little Endian Base 128. A scheme (generally known in the art) for encoding integers densely that exploits the assumption that most integers are small in magnitude. This encoding is equally suitable whether the target machine architecture represents data in big-endian or little-endian order. It is little-endian only in the sense that it avoids using space to represent the "big" end of an unsigned integer when the big end is all zeroes or sign-extension bits.
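A minimal sketch of the unsigned variant of this encoding follows. Each output byte carries 7 bits of the value, least significant group first, with the high bit set on every byte except the last. The function names are illustrative; the signed variant (which stops at sign-extension bits) is not shown.

```python
def encode_uleb128(value):
    """Encode a non-negative integer as unsigned LEB128 bytes."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F          # take the low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more groups follow: set the continuation bit
        else:
            out.append(byte)         # final group: high-order zeroes are never emitted
            return bytes(out)


def decode_uleb128(data):
    """Decode one unsigned LEB128 integer from the start of a byte sequence."""
    result = 0
    for position, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * position)
        if not byte & 0x80:          # continuation bit clear: last group
            return result
    raise ValueError("truncated LEB128 sequence")
```

Small values such as 0 through 127 occupy a single byte, and 624485 encodes as the three bytes E5 8E 26, which illustrates the density gained when most integers are small.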
Footnote: Little Endian describes a processor architecture for which the least significant byte of a multibyte value has the smallest address.
MAC. Message Authentication Code. Used to validate information transmitted between two parties that share a secret key. Also: Media Access Control, the globally unique hardware address of an Ethernet network interface card. Also: Mandatory Access Control, a security criterion that contrasts with Discretionary Access Control (DAC). See also: DAC, HMAC.
MASTER™. Multiprocessor Adaptive Scheduler & Task Executor/Redirector. A distinguished capability that is responsible for participating in security decisions, including resource allocation, under the auspices of a trusted BOSS. A MASTER (or would-be MASTER, i.e., a "Candidate MASTER") maintains its distinguished bona fide MASTER status only under the auspices of other MASTERs (which is part of the function of the SELF system). Bona fide MASTERs self-organize into local and distributed teams that are collectively responsible for getting work done (including the computation and storage of data). In a preferred embodiment, a MASTER may have a number of dedicated, trusted, attached (and therefore local) SLAVE resources over which it enjoys complete control, via a HANDLER, and any number of "volunteer" SERVANT resources that are not trusted. See also: BOSS, HANDLER, SELF, SERVANT, SLAVE.
MARSHAL™. Multi-Agent Routing, Synchronization, Handling, & Aggregation Layer. A distinguished SERVANT node having the responsibilities of fulfilling a MARSHAL role. Any node, authenticated as having a MARSHAL role, that serves as a gateway for system users to access SHADOWS services via a network (e.g., the Internet). A MARSHAL may also communicate with other MARSHALs, under the auspices and control of a MASTER-led team, in order to implement one or more overlay networks and/or network fabrics whose purposes and characteristics are determined by the MASTER-led team (but are opaque to the MARSHALs). By design, a MARSHAL is not trusted, and the role is typically fulfilled by a SERVANT node (which is also inherently untrusted). Occasionally the MARSHAL role is fulfilled by a SLAVE (emulating a MARSHAL) that is operating under the auspices and control of a MASTER, through a HANDLER, and is therefore trusted, but this fact is never known to those communicating with the MARSHAL. A MARSHAL may reside virtually anywhere (e.g., at an ISP, on customer premises, at a telco central office, at a datacenter, on a utility pole, within a server or PC, etc.). See also: HANDLER, ISP, MASTER, PC, SELF, SERVANT, SLAVE.
Mbps. Mega-bits per second. A measure that often refers to a serial communications rate, in millions of bits per second. Although there are technically 8 bits per byte, in serial communications there is usually a synchronization overhead (e.g., 1 start bit and 1 stop bit for every 8 bits of data), resulting in a 10:1 ratio of bits to bytes when calculating raw throughput (i.e., ignoring compression and additional protocol overheads).
Thus, for example a 10 Mbps link might yield only 1 MBps (10/10 = 1) rather than 1.25 MBps (10/8=1.25).
See also: MBps.
MBps. Mega-Bytes per second. A measure that often refers to a storage device throughput rate, in millions of bytes per second, where 1 byte equals 8 bits. A storage device may have a peak rate that is constrained by its interface, and this rate is normally achieved only for short bursts, when the associated read or write request can be satisfied via the device's cache memory. A storage device also has a sustained rate that corresponds to the maximum rate at which the device can continuously read or write data, and this rate is tied to the accessibility of the underlying storage media (i.e., the media rate). See also: Mbps.
MDS. Maximum Distance Separable. Refers to a class of space-optimal erasure codes specified as (n,k), where n-k specially coded extra symbols are created from k original symbols, and any k out of n original and extra symbols is sufficient to reconstruct the original k symbols, which means that up to e erasures can be tolerated, where e = n-k. Such a code may also be equivalently represented as (n+m,m), where n specially coded extra symbols are created from m original symbols, and any m out of n+m original and extra symbols is sufficient to reconstruct the original m symbols, which means that up to e erasures can be tolerated, where e = n. By definition, erasures are symbols missing from known locations (i.e., the symbols are not known, but their position is). If, instead of e erasures, there are up to f faulty symbols, but their positions are unknown, then a system that can correct up to e erasures can correct at most f faulty symbols, where f= e/2 (intuitively, half the redundant codes are used to locate the errors, and the other half to correct them).
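The simplest MDS code is a single XOR parity symbol appended to k data symbols, giving an (n,k) code with n = k+1 that tolerates e = n-k = 1 erasure. The sketch below illustrates that special case only; general (n,k) MDS codes such as Reed-Solomon require finite-field arithmetic that is not shown, and the function names are illustrative.

```python
from functools import reduce


def xor_bytes(a, b):
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(blocks):
    """Append one XOR parity block to k equal-length data blocks (n = k + 1)."""
    return list(blocks) + [reduce(xor_bytes, blocks)]


def recover(symbols):
    """Rebuild the k data blocks; an erased symbol is marked by None at its known position."""
    missing = [i for i, s in enumerate(symbols) if s is None]
    if len(missing) > 1:
        raise ValueError("a single-parity code tolerates only e = n - k = 1 erasure")
    repaired = list(symbols)
    if missing:
        present = [s for s in symbols if s is not None]
        # XOR of all surviving symbols reproduces the erased one.
        repaired[missing[0]] = reduce(xor_bytes, present)
    return repaired[:-1]  # drop the parity block
```

Any k of the n symbols suffice to reconstruct the data, which is the defining property stated above. Note that recovery relies on knowing which position was erased; locating a faulty symbol at an unknown position is a harder problem, as the definition explains.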
Memoization. A technique by which an existing function can be transformed into one that "remembers"
previous arguments and their associated results. See also: FACTUAL, Memoize, Memoized Function.
Memoize. To modify a function such that re-computation of previously computed results is avoided in favor of retrieving and substituting the previously computed results themselves.
Memoization essentially augments a computational function with a cache of previously computed results, indexed by the arguments of (i.e., inputs to) the previous computations. Memoization, since it is based on caching, therefore trades space for time. Memoization is only appropriate for pure functions (those with no side effects, whose return value depends only on the values of their arguments). Memoization is useful in all kinds of situations, including: almost any kind of searching (cache the results of the search so that you can skip it next time), HTML generation (cache the results of the generation process so that you don't have to generate the page next time), and numeric computation (cache the results of the computation).
The word "memoize" was coined by Donald Michie in 1968. See also: Cache, Caching, FACTUAL, Memoization.
Memoized Function. A function that remembers which arguments it has been called with and the result returned and, if called with the same arguments again, returns the result from its memory rather than recalculating it. A memoized function (i.e., one with caching) may run faster than one without caching, but it uses up more memory. This same principle is found at the hardware level in computer architectures which use a cache to store recently accessed memory locations. See also: Cache, Caching, FACTUAL, Memoize.
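The three related entries above can be illustrated with a short sketch of a memoizing wrapper applied to a pure function. The decorator name and the exposed `cache` attribute are choices made for this example; Python's standard library offers `functools.lru_cache` for the same purpose.

```python
import functools


def memoize(func):
    """Return a version of func that remembers results, indexed by its arguments."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:        # compute only on the first call with these arguments
            cache[args] = func(*args)
        return cache[args]           # otherwise substitute the previously computed result

    wrapper.cache = cache            # exposed so the space cost can be inspected
    return wrapper


@memoize
def fib(n):
    """A pure function: the result depends only on n."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)
```

Calling `fib(30)` returns 832040 after computing each of the 31 distinct subproblems once, whereas the unmemoized recursion would repeat the same subcomputations more than a million times. The cache holds one entry per distinct argument, which is the space given up in exchange for time.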
MPI. Message-Passing Interface. See also: DELEGATE.
MRAM. Magnetic RAM. A relatively new type of NVRAM. See also: BB-RAM, NVRAM, RAM.
MTBF. Mean Time Before Failure. Older but synonymous term for MTTF. See also: MTTF, MTTR.
MTTF. Mean Time To Failure. When applied to disk drives, the MTTF is estimated as the number of power-on hours per year (usually assumed at 100% power on) divided by the AFR.
Thus, a server-class disk drive with a manufacturer-specified AFR of 0.63% would have an estimated MTTF of about 1.4 million hours (8,760 hours per year divided by 0.0063). PC-class disk drives typically have much lower AFR values, which are also calculated with a much lower number of power-on hours per year. Note, however, that even for server-class drives, observed AFR values in the field exceed 1%, with 2% to 4% common, and up to 12% observed in some systems, so the estimated MTTF needs to be carefully considered. See also: AFR, Availability, MTBF, MTTR.
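The estimate above is a single division, sketched here so that the effect of field-observed AFR values can be seen. The function name is hypothetical, and a 365-day year at 100% power-on is assumed, as in the definition.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 power-on hours at 100% power on


def estimated_mttf_hours(afr, power_on_hours_per_year=HOURS_PER_YEAR):
    """Estimate MTTF from an annualized failure rate given as a fraction (0.0063 = 0.63%)."""
    return power_on_hours_per_year / afr


for afr in (0.0063, 0.02, 0.04, 0.12):
    print(f"AFR {afr:.2%}: about {estimated_mttf_hours(afr):,.0f} hours")
```

With the manufacturer-specified AFR of 0.63% the estimate is roughly 1.39 million hours, while an observed AFR of 3% gives 292,000 hours, which shows why the specified figure needs to be treated with care.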
MTTR. Mean Time To Repair. The average time (usually determined through empirical measurement) required to restore service after a breakdown or loss. See also: Availability, MTTF.
NaN. Not a Number. A value or symbol that is usually produced as the result of an operation on invalid input operands, especially in floating-point calculations. For example, most floating-point units are unable to explicitly calculate the square root of negative numbers, and may instead indicate that the operation was invalid and return a NaN result. In floating-point calculations, NaN is not the same as infinity, although both are typically handled as special cases in floating-point representations of real numbers as well as in floating-point operations. An invalid operation is also not the same as an arithmetic overflow (which might return an infinity) or an arithmetic underflow (which would return the smallest normal number, a denormal number, or zero). A NaN does not compare equal to any floating-point number or NaN, even if the latter has an identical representation. One can therefore test whether a variable has a NaN value by comparing it to itself (i.e. if x != x then x is NaN). In the IEEE floating-point standard (IEEE 754), arithmetic operations involving NaN
always produce NaN, allowing the value to propagate through a calculation (there are exceptions to this behavior in a proposed future standard). See also: IEEE.
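The self-comparison test and the propagation behavior described above can be demonstrated briefly. This sketch uses Python floats, which follow IEEE 754 on common platforms; the helper name is illustrative.

```python
import math

nan = float("nan")
inf = float("inf")


def is_nan(x):
    """A NaN is the only floating-point value that does not compare equal to itself."""
    return x != x


print(is_nan(nan))           # True
print(is_nan(inf))           # False: infinity is not NaN
print(is_nan(inf - inf))     # True: an invalid operation yields NaN
print(math.isnan(nan + 1))   # True: NaN propagates through arithmetic
```

Note that Python's `math.sqrt(-1.0)` raises a `ValueError` rather than returning NaN, which is one example of an environment signaling the invalid operation instead of returning the special value.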

NEAR™. Nearline Emulation & Archival Repository. Used by CHARM for nearline storage, NEAR is the technology underlying a NEARblade™ or NEARdrive™. It provides high-capacity, electronically assisted long-term data storage that is subject to minimal mechanical risk (including wear, vibration, and seismic events), due to a significantly reduced mechanical duty cycle. The NEAR technology attempts to minimize the number of spinning disk drives while providing full accessibility to data.
Prior to spin-down, the NEAR technology performs extensive analysis and maintenance, after which it may reconfigure the system as necessary in accordance with the analysis and maintenance results. Data stored in NEAR is safe from intruders even if stolen. As a fringe benefit of the NEAR storage approach, the number of read and/or write accesses per second is orders of magnitude higher than for unassisted hard disk drives. See also: CHARM, FIRE, SMART.
NVRAM. Non-Volatile RAM. Contrast with DRAM and SRAM, which are volatile. BB-RAM and MRAM are types of NVRAM. See also: BB-RAM, DRAM, MRAM, RAM, SRAM.
Object. An entity that contains or receives information and upon which subjects perform operations.
Packet. An ordered group of data and control signals transmitted through a network as a subset of a larger message.
Packet Switching. A communications paradigm in which packets (messages or fragments of messages) are individually routed between nodes, with no previously established communication path. Packets are routed to their destination through the most expedient route (as determined by some routing algorithm). Not all packets traveling between the same two hosts, even those from a single message, necessarily follow the same route.

Page Fault. The condition that occurs in a virtual memory system when there is an attempt to access a virtual memory page that is not currently present in physical memory. See also: Demand Paging, Swapping.
PC. Personal Computer.

PERKS™. Peak Energy Reserve, Kilowatt-Scale. A peak-shaving system that directly captures excess or low-cost electrical energy from a multiplicity of sources (when it is cheapest or most readily available) and stores it for later reuse, such as during peak periods (when power is most expensive or less available).
Unlike a UPS which remains charged "just in case," the PERKS capability continually captures and discharges stored energy "just in time," as needed, so as to reduce the overall energy cost and maximize full-processing availability. Depending on capacity and load, PERKS may also serve as an extended runtime UPS.
POL. Point-of-Load. Point-of-load (POL) DC-DC converters enable electronic developers to overcome the challenges caused by the high peak current demands and low noise margins of high-performance semiconductor devices, by placing individual, non-isolated, DC power sources near their point of use, thereby minimizing losses caused by voltage drops and ensuring tight voltage regulation under dynamic load conditions. POL devices also reduce noise sensitivity and EMI emissions by significantly shortening potential radiators and RF-susceptible conductors.
Policy-based Management. A method of managing system behavior or resources by setting "policies"
(often in the form of "if-then" rules) that the system interprets.
POP3. Post Office Protocol, v.3. See also: DELEGATE.
PP. Protection Profile. An implementation-independent set of security requirements and objectives for a category of products or systems which meet similar consumer needs for IT security. A PP is intended to be reusable and to define requirements which are known to be useful and effective in meeting the identified objectives. Also: A reusable set of either functional or assurance components (e.g., an EAL), combined together to satisfy a set of identified security objectives. Information about Protection Profiles can be found on the Internet at http://www.iatf.net. See also: CC.
Priority Loads. In power plants where load management schemes are used, a priority is assigned to each load center. Loads with the highest priority are powered first and shed last.
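The "powered first, shed last" rule can be sketched as a simple allocation over available capacity. Everything specific here is an assumption for illustration: the tuple layout, the convention that a larger number means higher priority, and the choice to skip a load that does not fit while still considering lower-priority loads.

```python
def powered_loads(loads, capacity_kw):
    """Return the names of loads to power, highest priority first.

    loads: iterable of (name, priority, demand_kw); a larger priority number
    is assumed to mean a more important load.
    """
    powered = []
    remaining = capacity_kw
    for name, _priority, demand_kw in sorted(loads, key=lambda load: -load[1]):
        if demand_kw <= remaining:   # power this load if capacity allows
            powered.append(name)
            remaining -= demand_kw
    return powered


loads = [("lighting", 1, 5.0), ("servers", 3, 10.0), ("cooling", 2, 8.0)]
print(powered_loads(loads, 20.0))  # ['servers', 'cooling']
print(powered_loads(loads, 12.0))  # ['servers']
```

As capacity falls from 20 kW to 12 kW, the lower-priority loads are shed first and the highest-priority load remains powered.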
PRNG. Pseudo-Random Number Generator. A mechanism for generating pseudo-random numbers on a computer. They're called pseudo-random, because you can't get truly random numbers from a completely non-random thing like a computer. A pseudo-random number generator is a computational or physical device designed to generate a sequence of numbers that does not have any easily discernible pattern, so that the sequence can be treated as being random. In reality, however, if a computer generates the number, another computer can reproduce the process. Random number generators have existed since ancient times, in the form of dice and coin flipping, the shuffling of playing cards, the use of yarrow stalks in the I Ching, and many other methods. See also: Pseudo-Random Number, Random Number Generator.
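The reproducibility noted above is easy to demonstrate: two generators started from the same seed produce the same sequence. The sketch uses Python's standard `random` module; the function name and the range of 0 to 99 are arbitrary choices for the example.

```python
import random


def sequence(seed, count=5):
    """Generate count pseudo-random integers in [0, 100) from a given seed."""
    rng = random.Random(seed)   # an independent generator with its own state
    return [rng.randrange(100) for _ in range(count)]


# Anyone who knows the seed and the algorithm can reproduce the sequence exactly.
print(sequence(42) == sequence(42))  # True
```

This is why a general-purpose PRNG of this kind is unsuitable where unpredictability matters, such as key generation.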
Protocol. A formal set of conventions governing the formatting and relative timing of message exchange between two or more communicating systems or devices.
Proxy. A software agent, often a firewall mechanism, that performs a function or operation on behalf of another application or system while hiding the details involved. See also: FLAMERouter, MARSHAL, RUSHrouter.
Proxy Server. A firewall component that manages Internet traffic to and from a LAN and that can provide other features, such as document caching and access control. A proxy server can improve performance by supplying frequently requested data, such as a popular Web page, and it can filter and discard requests that the owner does not consider appropriate, such as requests for unauthorized access to proprietary files.
Pseudo-Random Number. One of a sequence of numbers generated by some algorithm so as to have an even distribution over some range of values and minimal correlation between successive values. See also: PRNG.
PUMP™. Parallel Universal Memory Processor. See also: HANDLER, MASTER, SLAVE.
PV. Photo-Voltaic. A type of solar cell that produces electrical energy on exposure to sufficiently bright light sources.

QoS. Quality of Service. A term used in an SLA denoting a guaranteed level of performance (e.g., response times less than 1 second). Also: A group of service classes that define the performance of a given circuit.
RAM. Random Access Memory. A computer's direct access memory that can be accessed very quickly and overwritten with new information. With the exception of NVRAM (which is specifically non-volatile), RAM loses its content when power is turned off, but not so much that it cannot be reconstructed by a sophisticated (e.g., state-funded) adversary (the longer that a particular memory bit maintains its value, the more recoverable it is by an adversary that gains physical access, so powering off, or making a few passes of rewriting random data, really has little effect). In a preferred embodiment of the CHARM PUMP subsystem, memory accessible by the PUMP (and via the PUMP) can be protected by the PUMP's ability to maintain complementary states in memory - a technique where memory locations are invisibly toggled to and from their complementary states such that each state has a duty cycle of approximately 50% (which means that an adversary gaining physical access cannot determine previous contents after a power-off).
See also: BB-RAM, CHARM, DRAM, MRAM, NVRAM, PUMP, SRAM.

RECAP™. Reliably Efficient Computation, Adaptation, & Persistence. A proprietary asynchronous real-time protocol used by MASTERs to communicate with other MASTERs, and with any SLAVEs under their control. The RECAP protocol is never used in the clear, even locally, and it is assumed to be subject to Byzantine failures. RECAP may be safely tunneled via other protocols, especially RUSH, but such tunneling is performed only by a specially authorized device called a FLAMERouter (described elsewhere), which also contains a MASTER. RECAP is used for communication among what are "hoped" to be trusted parties, in contrast to UNCAP. See also: FLAMERouter, RUSH, Tunneling Router, UNCAP.
RF. Radio Frequency.
RLE. Run-Length Encoding. A very simple form of lossless data compression in which runs of data (that is, sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count, rather than as the original run. This is most useful on data that contains many such runs; for example, simple graphic images such as icons and line drawings, but can provide compression as long as the encoded sequence is shorter than the original run.
Compression in CHARM is augmented by RLE. CHARM uses a back-to-back encoding sequence RLE-1, RLE-2, and RLE-3, to first encode repeating single-byte runs (such as a string of blanks), then repeating double-byte runs (including pairs produced by RLE-1), then repeating triple-byte runs (including triplets produced by RLE-2). After CHARM does RLE-encoding, additional compression techniques are used. While all the CHARM algorithms can be implemented in software, in a preferred embodiment they are implemented in hardware, in the CHARM PUMP. See also: CHARM, Delta Compression.
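The basic idea of run-length encoding can be shown with a generic single-byte sketch. This is not CHARM's RLE-1, RLE-2, or RLE-3 format, which is not specified here; the (value, count) pair representation, the cap of 255 repeats per run, and the function names are all assumptions made for illustration.

```python
def rle_encode(data):
    """Encode bytes as a list of (value, count) runs, with count capped at 255."""
    runs = []
    for byte in data:
        if runs and runs[-1][0] == byte and runs[-1][1] < 255:
            runs[-1] = (byte, runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((byte, 1))               # start a new run
    return runs


def rle_decode(runs):
    """Expand (value, count) runs back into the original bytes."""
    return b"".join(bytes([value]) * count for value, count in runs)


print(rle_encode(b"aaaabbc"))  # [(97, 4), (98, 2), (99, 1)]
```

A string of 300 blanks collapses to two runs, whereas data with no repeats produces one run per byte and therefore grows, which is why the benefit depends on the data containing many runs.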
RNG. Random Number Generator. A random number generator is a computational or physical device designed to generate a sequence of numbers that does not have a pattern. In theory, true random numbers only come from truly random sources, such as atmospheric noise and radioactive decay.
RUBE™. Recuperative Use of Boiling Energy. A system using a relatively low-temperature phase-change working fluid in conjunction with heat exchanger surfaces that promote heterogeneous nucleation, in order to separately recuperate heat energy ("boiling energy") from hot spots and warm spots for immediate or subsequent reuse. "Boiling energy" in this context refers to energy that can be used immediately (or stored for later use) to help effect a liquid/vapor phase-change, without approaching the critical heat flux. In a preferred embodiment, recuperated energy heats and expands the working fluid (causing a phase-change to vapor if the temperature is sufficiently high), which, in conjunction with optional vapor injection, creates a motive force that helps to circulate the working fluid among system components (in order to thermally stabilize the system, to further extract re-usable energy for immediate reuse or storage, and to efficiently exhaust waste energy without overly subcooling the working fluid). In a preferred embodiment, a small, continuous, positively pressurized liquid flow is maintained, ensured via a low-power pump means, in order to prevent dryout, eliminate local hot spots, and assure thermal stability - as an asset-protection mechanism that serves to reduce or eliminate dependency on thermal expansion, nucleation and vapor injection as the only motive forces (the pump may be optionally powered off when the required flow can be maintained without it, e.g., due to vapor injection or other means). In a preferred embodiment, the low-power pump means is doubly or triply redundant, due to its nature as an asset-protection mechanism. An overview of the RUBE Heat Energy Recuperation Cycle is depicted below, for a preferred embodiment. See also: Critical Heat Flux, Heat Flux, RUBE Double-Boiler, RUBE Inner Boiler, RUBE Vapor Injector.

RUBE Double-Boiler. The RUBE Double-Boiler apparatus is part of a closed-loop system, that, in a preferred embodiment is connected to other components as shown in the figure, "RUBE - Heat Energy Recuperation Cycle Overview." The RUBE Double-Boiler apparatus comprises an "inner boiler" and an "outer boiler," such that the former is fully enclosed within the latter, in order to maximize the recuperation of heat energy (thermal energy) dissipated by the aggregation of enclosed heat sources, and optionally, to separate the recuperated heat energy into two or more "grades" according to desired or observed temperatures. In a preferred embodiment, the "hot" heat sources (i.e., those components with a relatively higher heat flux, such as CPUs) are placed within the inner boiler (or at least have their "hot" surfaces within the inner boiler), and the "warm" heat sources (i.e., those components with a relatively lower heat flux, such as flash memory chips) are placed within the outer boiler. Both the inner and outer boilers are pressure vessels intended to withstand a maximum of 7-bar operating pressures (100 PSI) under normal conditions.
Leaks within the inner boiler cause only a reduction in efficiency, but leaks in the outer boiler can cause a loss of working fluid and a subsequent reduction in local survivability. In a preferred embodiment, such as for electronics thermal stabilization applications, the working fluid may be an organic dielectric fluid with a boiling point between 20 C and 40 C, such as 1-methoxy-heptafluoropropane (C3F7OCH3). Other working fluids may also be suitable, some examples of which are listed in section 10.3. In a preferred embodiment, the RUBE Double-Boiler apparatus has an outer shell of cast aluminum (although other construction methods and materials are possible), and its external shape and form factor is such that it can mate with guide channels extruded into a vertically oriented cylindrical or partly cylindrical aluminum extrusion designed to contain a multiplicity of RUBE Double-Boiler units. Given the aforementioned vertically oriented extrusion, the intent is to be able to easily align and slide the RUBE Double-Boiler apparatus from the extrusion upper opening, downward into the extrusion until it reaches a bulkhead, where couplings and connectors on the bottom of the Double-Boiler apparatus mate with complementary couplings and connectors within the extrusion. In a preferred embodiment, the RUBE Double-Boiler apparatus is a pressure-sealed, field-replaceable unit having blind-mating, quick-disconnect inlet and outlet couplings with double EPDM seals, and capable of operating at 100 PSI, such as those available from Colder (the extrusion would contain mating couplings). In a preferred embodiment, the RUBE Double-Boiler apparatus is also electrically sealed and EMP-hardened, having blind-mating, quick-disconnect electrical connectors with a multiplicity of conductors appropriate for the ingress and egress of electrical power feeds and various high-frequency signals such as are common in computer and telecommunications devices. 
In a preferred embodiment, the RUBE Double-Boiler apparatus connects to a "bottom plane" or equivalent connector arrangement in the vertical extrusion by means of a proprietary, pin-free connector designed by Morgan Johnson, and having the property of providing an extremely high quality, nearly noise-free connection. See also: RUBE, RUBE Inner Boiler.
RUBE Inner Boiler. A means for recuperating the heat energy dissipated by the relatively high-heat-flux heat-producing devices so that, to the extent practical, it can be converted to usable mechanical and/or electrical energy. The inner boiler apparatus is colocated with the "hot" surfaces (the surfaces with the largest heat flux) of the "hottest" of the heat-producing devices, which are so arranged that such placement is possible with a minimum (or otherwise convenient) number of manifolds. In a preferred embodiment, partly depicted below, the inner boiler apparatus is oriented vertically (although it is depicted horizontally here, for convenience) such that the liquid inlet 0 and vapor outlet 0 are at the top, and the liquid outlet 0 is at the bottom. Once normal steady-state operation is reached, working fluid vapor is expelled through vapor outlet 0 and liquid outlet 0 is not used. Liquid working fluid is forced into liquid inlet 0, where it is equitably distributed within the injection-molded manifold chamber 0 and 0 to each heat exchanger's 0 inlet check valve, which it can then enter, since the working fluid is under pressure. For each heat exchanger 0, once the working fluid passes the corresponding inlet check valve, it enters the heat exchanger 0, where it circulates among the heat exchanger's fins, pins, or other heat exchange surfaces. Depending on the then-current temperature and pressure, the working fluid may acquire heat energy, causing all or part of it to evaporate. In a preferred embodiment, such as for electronics thermal stabilization applications, the working fluid may be an organic dielectric fluid with a boiling point between 20 C and 40 C, such as 1-methoxy-heptafluoropropane (C3F7OCH3). Other working fluids may also be suitable, some examples of which are listed in section 10.3. In a preferred embodiment, the working fluid expands substantially when heated.
Since the inlet is check-valved, this expansion greatly pressurizes the heat exchanger and the working fluid is expelled through the outlet check-valve (where it makes its way to vapor outlet 0 and/or liquid outlet 0), thereby creating a partial vacuum within the heat exchanger 0 under discussion (which helps to pull in more liquid working fluid). The hotter the system gets, the higher the pressure at which it can operate, up to the maximum desired target temperature of the heat-producing devices, or the useful upper limit of the working fluid, whichever is lower. In a preferred embodiment, one set of manifolds operates in the 30 C to 40 C range for a particular class of heat-producing electronic chips, while another set operates simultaneously in the 90 C to 110 C range for a different class of heat-producing electronic chips. The same working fluid is used for both - in fact, the cooler system can "feed" the hotter system (however, this would typically require a boost in pressure, which can be accomplished externally via pumps, or via the RUBE Vapor Injector). See also: Critical Heat Flux, Heat Flux, RUBE, RUBE Double-Boiler, RUBE Vapor Injector.
Footnote (EPDM seals): Although other seal materials are possible, EPDM is preferred for its compatibility with the preferred working fluid.
Footnote (manifolds): One of the factors determining the maximum size of the manifolds is the desire to take advantage of "Rapid Injection Molding" techniques, in order to reduce the cost and lead times normally associated with injection-molded components.
RUBE Vapor Injector. Inspired by the Giffard Steam Injector (invented in 1858), the RUBE Vapor Injector is a means to: 1) maintain a load (the "boiler") within a desired temperature range, and 2) recuperate as much energy as possible from the heat dissipated by the load, in order to convert the recuperated heat energy into mechanical energy (specifically, pressure energy) that can be used as motive force to reduce or eliminate the energy that would otherwise be needed for circulation pumps in a phase-change heating, cooling, and/or power generation system. In a preferred embodiment, such as for electronics thermal stabilization applications, the working fluid may be an organic dielectric fluid with a boiling point between 20 C and 40 C, such as 1-methoxy-heptafluoropropane (C3F7OCH3). Other working fluids may also be suitable, some examples of which are listed in section 10.3. In a preferred embodiment, the working fluid expands substantially when heated. See also: RUBE, RUBE Double-Boiler, RUBE Inner Boiler.

RUSH™. Rapid Universal Secure Handling. A multi-level proprietary communications protocol that has both asynchronous and synchronous characteristics and can stand alone or be tunneled over existing WAN protocols (whether synchronous or asynchronous). RUSH is used as the primary carrier protocol among FLAMERouters, MARSHALs, and client-side RUSHrouter software or hardware. RUSH can directly incorporate flows from the RECAP and UNCAP protocols, and also tunnels them, along with various industry-standard protocols. The RUSH protocol can take advantage of other protocols (e.g., I2P, TOR) as necessary to prevent (or reduce the threat of) traffic analysis, and can also tunnel other protocols, for the same reasons. A key characteristic of RUSH is its propensity for simultaneously utilizing multiple network channels, interfaces, gateways, routes, etc., such that a single conceptual source and destination pair effectively becomes multiple targets and destinations that tend not to be apparently related unless an adversary has truly global visibility (in which case, such an adversary still faces a multiplicity of overwhelming cryptographic and traffic analysis challenges). RUSH incorporates statistical information for resource management (load balancing, energy usage, QoS, etc.). See also: DoD, FLAMERouter, I2P, MARSHAL, RECAP, RUSH, RUSHrouter, TOR, Tunneling Router, UNCAP, WAN.

RUSHrouter™. An untrusted software or hardware tunneling router that implements only a subset of FLAMERouter capability, and in particular, can communicate with the SHADOWS infrastructure only via the RUSH protocol (which also embeds the UNCAP protocol). RUSHrouters are untrusted because of the lack of control over their environments; the designation has nothing to do with their inherent trustworthiness. Any computing system containing a SHADOWS non-trusted component (e.g., DELEGATE, SERVANT) must also include at least one RUSHrouter to facilitate communication with the SHADOWS infrastructure. In a preferred embodiment, each outbound channel interface (e.g., a physical network interface, wireless adapter, etc.) has a dedicated RUSHrouter operating in its own VM; a separate RUSHrouter, also in its own VM, serves as the default gateway for the host computer, interfacing any hosted applications to the SHADOWS infrastructure by appropriately routing communications through the RUSHrouters that control the channel interfaces. See also: DELEGATE, FLAMERouter, RUSH, SERVANT, Tunneling Router, UNCAP, VM.
SAS. Serial Attached SCSI. A disk drive interface standard that supersedes parallel SCSI and can accept either SAS or SATA disk drives. See also: SATA.
SATA. Serial ATA. A disk drive interface standard that supersedes parallel ATA. SATA disk drives can be used with either SAS or SATA disk host adapters, but a SATA host adapter can communicate only with SATA drives. See also: SAS.
SBU. Sensitive-but-Unclassified. See also: Sensitive Information.
SCADA. Supervisory Control And Data Acquisition. A category of mechanisms for process control that includes hardware and software components. SCADA provides for the collection of data in real time from sensors and machines in order to control equipment and conditions, and typically includes transmitting the data to one or more central locations for logging and/or analysis.
SCRAM™. Survivable Computing, Routing, & Associative Memory. A SHADOWS network building block including computation, routing, and associative memory ("working storage") and implementing at least a particular minimum configuration of the CHARM, SELF, CORE, and FRAME technologies. An individual SCRAM machine is intended to be self-contained and capable of operating on or off the electrical grid for extended durations, and without human attention or maintenance. By design, a SCRAM machine is its own miniature datacenter that can be located in out-of-the-way places such as underground, on a pole or roof, etc., as easily as in an office, warehouse, or datacenter. See also: CHARM, CORE, FRAME, SCRAMnet, SELF, SUREFIRE.
SCRAMnet™. A SCRAM-based network comprising any number of geographically proximate MASTERs, SLAVEs, and SERVANTs. On a WAN level, SCRAMnets are the basis of the SHADOWS infrastructure, but must always operate under the auspices of a distributed team that includes multiple MASTERs. Each SHADOWS node is a SCRAMnet, but not necessarily vice-versa. A SCRAMnet must meet specific requirements to become a SHADOWS node. Geographically proximate SERVANTs can organize themselves into SCRAMnets without having a local MASTER, but only for the purpose of establishing communication with the SHADOWS network, at which point they may be assigned to a MultiMASTER team (which always has multiple MASTERs, by definition). SERVANTs must be able to communicate with the SHADOWS network, either individually or collectively, and they may cooperate extensively to do so. Any "Candidate MASTER" that is unable to establish itself as a MASTER (i.e., a full peer with other MASTERs) retains its candidacy but is unable to fulfill any of the responsibilities of a MASTER. Rather than waste its resources, a "Candidate MASTER" may "volunteer" (or attempt to volunteer) to operate under the auspices of a team of MASTERs, in the role of SERVANT. See also: MASTER, SCRAM, SERVANT.
SEC-DED. Single Error Correction, Double-Error Detection. A form of ECC that can correct a single memory error or SEU, and detect two. See also: DRAM, ECC, SEU.
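The SEC-DED behavior described above can be illustrated with a minimal sketch. It assumes an extended Hamming (8,4) code, which is one common way to obtain SEC-DED; the source does not state which code SHADOWS memory uses, and the function names `encode` and `decode` are hypothetical.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming codeword (list of bits).

    Index 0 holds the overall parity bit; indices 1..7 hold a Hamming(7,4)
    codeword with parity bits at positions 1, 2, 4 and data at 3, 5, 6, 7.
    """
    c = [0] * 8
    c[3], c[5], c[6], c[7] = [(nibble >> i) & 1 for i in range(4)]
    c[1] = c[3] ^ c[5] ^ c[7]  # covers positions whose index has bit 0 set
    c[2] = c[3] ^ c[6] ^ c[7]  # covers positions whose index has bit 1 set
    c[4] = c[5] ^ c[6] ^ c[7]  # covers positions whose index has bit 2 set
    c[0] = c[1] ^ c[2] ^ c[3] ^ c[4] ^ c[5] ^ c[6] ^ c[7]  # overall parity
    return c


def decode(codeword):
    """Return (nibble, status); status is 'ok', 'corrected', or 'double-error'."""
    c = list(codeword)  # work on a copy so the caller's data is untouched
    syndrome = 0
    for position in range(1, 8):
        if c[position]:
            syndrome ^= position  # XOR of the positions of all set bits
    overall = 0
    for bit in c:
        overall ^= bit  # 0 when the total number of set bits is even
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd parity: treat as a single-bit error. The syndrome names the
        # flipped position (0 means the overall parity bit itself).
        c[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even parity: two bits flipped, uncorrectable.
        return None, "double-error"
    nibble = c[3] | (c[5] << 1) | (c[6] << 2) | (c[7] << 3)
    return nibble, status


word = encode(0b1011)
word[5] ^= 1  # simulate a single event upset
print(decode(word))
```

Three or more simultaneous bit errors are outside what this code can reliably detect, which is consistent with the limitation noted under SEU.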

SELF™. Secure Emergent Learning of Friends. An automated identity- and role-oriented "immune system" that differentiates "self" and "non-self", "friend" and "foe" - i.e., between authorized and unauthorized objects, subjects, and interactions. The focus of SELF is on the recognition of a relatively small set of correct behaviors rather than the recognition of any of an infinitely large set of counterfeit behaviors (by definition, all non-self behavior is assumed malicious). SELF is the basis for establishing and maintaining trust among the interdependent systems, subsystems, and components in a SHADOWS infrastructure. SELF includes novel Byzantine agreement logic in its decision-making process. SELF is highly integrated with BOSS (which is the definitive authority on trust and correctness), and with the RECAP, UNCAP, and RUSH protocols. Any anomalous, "non-self" behavior activates an appropriate immune system response. See also: BOSS, MASTER, RECAP, RUSH, SCRAMnet, SERVANT, UNCAP.
Sensitive Information. Information that, as determined by a competent authority, must be protected because its unauthorized disclosure, alteration, loss, or destruction can at least cause perceivable damage to someone or something. (DoD 5200.28-STD). See also: SBU, Sensitivity Label.
Sensitivity Label. A piece of information that represents the security level of an object and that describes the sensitivity (e.g., classification) of the data in the object. Sensitivity labels are used by the TCB as the basis for mandatory access control decisions. (DoD 5200.28-STD). In the SHADOWS infrastructure, an object's sensitivity label (and other security properties) is available to CHARM (the SHADOWS associative memory system), and therefore to BOSS (the SHADOWS TCB), without having to retrieve the object itself, via its FASTpage index entries. See also: BOSS, CHARM, FASTpage, Sensitive Information, TCB.
SERVANT™. Service Executor, Repository, & Voluntary Agent - Non-Trusted. A cooperative computing and/or storage node that is untrusted (usually due to potential threat exposure). A MASTER that is not recognized as a MASTER by other MASTERs may operate as a SERVANT (but to do so, it must use the UNCAP protocol, tunneled via RUSH, rather than the RECAP protocol). See also: BOSS, MASTER, RECAP, RUSH, SCRAMnet, SELF, UNCAP.
Set Associative Cache. A compromise between a direct mapped cache and a fully associative cache where each address is mapped to a certain set of cache locations. The address space is divided into blocks of 2^m bytes (the cache line size), discarding the bottom m address bits. An "n-way set associative" cache with S sets has n cache locations in each set. Block b is mapped to set "b mod S" and may be stored in any of the n locations in that set with its upper address bits as a tag. To determine whether block b is in the cache, set "b mod S" is searched associatively for the tag. A direct mapped cache could be described as "one-way set associative" (i.e., one location in each set), whereas a fully associative cache is N-way associative (where N is the total number of blocks in the cache). Performance studies have shown that it is generally more effective to increase the number of entries rather than associativity and that 2- to 16-way set associative caches perform almost as well as fully associative caches at little extra cost over direct mapping. See also: Cache, Direct Mapped Cache, Fully Associative Cache.
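The mapping described in this entry can be sketched as follows. The class name, method name, and the least-recently-used replacement policy are assumptions made for illustration; the entry itself does not specify how a victim is chosen when a set is full.

```python
class SetAssociativeCache:
    """Toy n-way set associative cache that tracks tags only (no data)."""

    def __init__(self, num_sets, ways, line_bits):
        self.num_sets = num_sets    # S in the text
        self.ways = ways            # n in the text
        self.line_bits = line_bits  # m in the text (line size is 2**m bytes)
        # One list of tags per set, ordered least to most recently used.
        self.sets = [[] for _ in range(num_sets)]

    def access(self, address):
        """Return True on a hit, False on a miss (the block is then loaded)."""
        block = address >> self.line_bits  # discard the bottom m address bits
        index = block % self.num_sets      # set "b mod S"
        tag = block // self.num_sets       # remaining upper address bits
        tags = self.sets[index]
        if tag in tags:                    # associative search within one set
            tags.remove(tag)
            tags.append(tag)               # mark as most recently used
            return True
        if len(tags) == self.ways:
            tags.pop(0)                    # assumed policy: evict the LRU tag
        tags.append(tag)
        return False


cache = SetAssociativeCache(num_sets=4, ways=2, line_bits=6)
for addr in (0, 256, 0, 512, 0, 256):
    print(hex(addr), "hit" if cache.access(addr) else "miss")
```

With `ways=1` the same class behaves as a direct mapped cache, and with `num_sets=1` it behaves as a fully associative cache, matching the two limiting cases named in the entry.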
SEU. Single Event Upset. A probabilistic, localized error in computer memory (e.g., DRAM) or logic, typically caused by cosmic rays or alpha particles striking a transistor or memory cell and causing it to change state. The primary goal of ECC mechanisms is to detect and/or correct the inevitable occurrence of one or more SEUs. Consumer computers rarely have ECC at all, but server computers often protect their main memory systems with SEC-DED ECC (which is capable of correcting a single error per access), and sometimes have a "Chipkill" type of ECC that can detect a single chip failure and some multiple SEU combinations. Because SEUs are probabilistic, however, as memory capacities and densities increase, and as average chip temperatures increase, the likelihood of SEUs increases even more quickly than one might expect. SEU likelihood has now increased to the point that failure due to uncorrectable SEU is becoming a relatively common event, even when SEC-DED ECC is used. See also: CHARM, DRAM, ECC, FEC, SEC-DED.
SFF. Small Form Factor.

SHADOWS™. Self-Healing Adaptive Distributed Organic Working Storage. A SHADOWS network consists of a combination of terrestrial and space-based SHADOWS nodes and singleton SCRAM machines (described later, along with SERVANTs and MARSHALs). In general, a geographically proximate collection of SCRAM machines may self-organize into a geographically proximate SCRAMnet comprising a SHADOWS node. SCRAM machines that are unable to join a SHADOWS node remain as singletons until they can join one, if ever. Singletons act as SERVANTs to bona-fide SHADOWS nodes, and as MARSHALs between SHADOWS nodes and system users (however, non-singletons also volunteer for these roles on a part-time basis).
Simulate. Representing the functions of one system by another (e.g., using a computerized system to represent a physical system). See also: Emulate.
Simulation. Generally, the process of representing one system by another (e.g., representing the real world by a mathematical model solved by a computer). See also: Emulation.
SIP. Session Initiation Protocol. See also: DELEGATE.
SLA. Service Level Agreement.
SLAVE™. Storage-Less Adaptive Virtual Environment. A trusted cooperative computing, memory, and/or storage capability under total control of a MASTER, which delegates authority and resources to the SLAVE as needed. Every SLAVE must be physically attached to and co-located with at least one local MASTER in order to operate at all.
SMART. Self Monitoring Analysis & Reporting Technology. Also: S.M.A.R.T. A monitoring system and signaling interface for magnetic disk drives to detect and report on various indicators of reliability. SMART enables a host processor to receive analytical information from the disk drive that may be useful for anticipating failures. See also: NEAR.

SOLAR™. Self-Orienting Light-Aggregating Receiver. In a preferred embodiment, a system using a relatively low-temperature phase-change working fluid to receive heat energy from the sun for immediate use (in which case it acts as a "boiler") or subsequent use, and especially for the primary purpose of generating electricity. In an alternative embodiment, a system using a relatively low-vapor-pressure working fluid (for example, an appropriate Paratherm thermal oil) to receive heat energy from the sun for immediate or subsequent use. The heat energy in this context refers to energy that can be used immediately (or stored for later use) to effect or help effect a liquid/vapor phase-change, such as occurs, by design, in a "boiler." Received energy heats and expands the phase-change working fluid (which may have been preheated via RUBE, above), and which, in conjunction with optional vapor injection (see RUBE Vapor Injector, described elsewhere) in the "boiler" feed circuit, and in conjunction with a FORCE nanoturbine or FPSE (Free Piston Stirling Engine) in the "boiler" output circuit, can be used to accomplish work, and particularly, to generate electricity.
SRAM. Static RAM. A type of volatile RAM whose cells do not need to be continually refreshed, but which may lose data if power is removed. Contrast with DRAM and NVRAM. See also: DRAM, NVRAM, RAM.
SSD. Solid-State Disk. A general-purpose electronic storage device that emulates a traditional "spinning" disk drive, but actually contains no moving parts, and thus incurs no performance penalty due to rotational latency or track-to-track seek latency. Historically, SSDs have been expensive, and thus relegated to special purpose applications requiring the lowest possible disk access latency. Implementations based on NVRAM or SRAM are much more expensive per byte stored than those based on DRAM, largely due to their higher speed and lower storage density. It is generally accepted that SSD can comprise a battery-backed RAM (BB-RAM) with a backup disk drive, but there is less consensus as to whether "flash memory" on its own can constitute a general purpose SSD, due to the fact that flash memory technology currently supports only a finite number of write cycles (typically 10,000, or 100,000, or a million) to a particular location. Relatively new technologies such as MRAM hold promise, since they have the potential to be dense, fast, and relatively inexpensive. See also: BB-RAM, MRAM, NVRAM.
STEER™. Steerable Thermal Energy Economizing Router.
STP. Standard Temperature & Pressure. A temperature of zero degrees Celsius (0 C) and a pressure of one atmosphere.
Subject. An entity that causes operations to be performed.
SUREFIRE™. Survivable Unmanned Renewably Energized Facility & Independent Reconfigurable Environment. A miniature, self-contained, unmanned, secure, outdoor (often underground) supercomputing datacenter designed to be physically visited for maintenance purposes at most only once or twice a year (and these may be combined with scale-up visits). SUREFIRE sites can be located on virtually any outdoors property, but also in basements or on rooftops, etc. SUREFIRE sites usually include one or more renewable energy systems, in addition to conventional energy sources. SUREFIRE sites are designed for maximal energy efficiency, and emit very little waste heat. All SUREFIRE sites may be expendable without data loss, and penetration can never yield useful information to an attacker. See also: SCRAM.
Swap File. A special file in a virtual memory system which is used to temporarily store "dirty" memory pages. Swap files, although typically disk-based, are often organized for relatively rapid access compared to writing dirty pages back to their original location. See also: Demand Paging, Swapping.
Swapping. In a virtual memory system, a technique to remove virtual pages from physical memory in order to replace them with others that are currently needed. "Dirty" pages (those which came from an executable image or data file and have been modified but not yet written back) are written to a "swap file" temporarily (unless they've been written previously and are unchanged, in which case they can simply be deleted). Non-dirty pages can simply be deleted, since they can be reread on demand. Pages are swapped out only if the data in them cannot be retrieved another way. See also: Swap File.
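The page-out decision described in this entry can be sketched as below. This is a simplified illustration, not an operating-system implementation; the `Page` fields, the `evict` function, and the dictionary standing in for the swap file are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Page:
    number: int
    data: bytes
    dirty: bool = False            # modified since it was read from its source
    swap_is_current: bool = False  # swap file already holds an identical copy


def evict(page, swap_file):
    """Remove a page from physical memory, saving it only when necessary.

    swap_file is a dict standing in for the on-disk swap file (assumption).
    Returns a short string describing the action taken.
    """
    if page.dirty and not page.swap_is_current:
        # The only up-to-date copy lives in memory, so it must be preserved.
        swap_file[page.number] = page.data
        page.swap_is_current = True
        return "written-to-swap"
    # Clean pages can be reread on demand; dirty pages whose swap copy is
    # still current can be restored from the swap file.
    return "discarded"


swap = {}
print(evict(Page(1, b"code"), swap))
print(evict(Page(2, b"edited", dirty=True), swap))
```

The design point illustrated is the one stated in the entry: a page is written out only if its contents cannot be retrieved another way.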
Tcase. The temperature of the case (package) enclosing an integrated circuit chip at a particular point in time.
TCB. Trusted Computing Base. The TCB is a useful concept because it identifies, within a system, the subsystem which owns the security (in the SHADOWS infrastructure, BOSS implements the TCB). The rest of the components may communicate with this TCB and rely on it to make correct security decisions. Thus, the TCB must exist and it must make 100% of the security decisions. The DoD defines the TCB as the totality of protection mechanisms within a computer system - including hardware, firmware, and software - the combination of which is responsible for enforcing a security policy. A TCB consists of one or more components that together enforce a unified security policy over a product or system. The ability of a trusted computing base to correctly enforce a security policy depends solely on the mechanisms within the TCB and on the correct input by system administrative personnel of parameters (e.g., a user's clearance) related to the security policy. (DoD 5200.28-STD). TCSEC1983 defines the TCB as "the totality of protection mechanisms within a computer system, including hardware, firmware, and software, the combination of which is responsible for enforcing a security policy. Note: The ability of a TCB to enforce correctly a unified security policy depends on the correctness of the mechanisms within the TCB, the protection of those mechanisms to ensure their correctness, and the correct input of parameters related to the security policy." See also: BOSS, DoD, TCSEC.
TCP. Transmission Control Protocol. A set of IP-based networking protocols widely used on the Internet that provides communications across interconnected networks of computers with diverse hardware architectures and various operating systems. TCP over IP (TCP/IP) includes standards for how computers communicate and conventions for connecting networks and routing traffic. See RFC 793. See also: UDP.
TCS. Trusted Computer System. A system that employs sufficient hardware and software integrity measures to allow its use for processing simultaneously a range of sensitive or classified information. (DoD 5200.28-STD).

TCSEC. Trusted Computer System Evaluation Criteria.
TDP. Thermal Design Power. For power-hungry integrated circuit chips, there is sometimes an observable or even specified relationship between Tcase and TDP. See also: Tcase.
TLB. Translation Look-aside Buffer.
TOR. The Onion Router. An open-source, anonymizing overlay network based on establishing secure, multi-hop TCP connections among randomly selected TOR nodes. Any SHADOWS node that implements the RUSH protocol can participate in the TOR network as a TOR node. Although SHADOWS does not depend on TOR, participating in the TOR network provides a source of mix-in traffic that helps to prevent traffic analysis by a sophisticated attacker, while also helping the TOR network. See also: I2P, RUSH.
Trap Door. A hidden software or hardware mechanism that permits system protection mechanisms to be circumvented. It is activated in some non-apparent manner (e.g., special "random" key sequence at a terminal). (DoD 5200.28-STD).
Trojan Horse. A computer program with an apparently or actually useful function that contains additional (hidden) functions that surreptitiously exploit the legitimate authorizations of the invoking process to the detriment of security. For example, making a "blind copy" of a sensitive file for the creator of the Trojan Horse. (DoD 5200.28-STD).
Trusted. A Trusted system or component is one whose failure can break the security policy. See also: Trustworthy.
Trusted Path. A mechanism by which a person at a terminal can communicate directly with the TCB. This mechanism can only be activated by the person or the TCB and cannot be imitated by untrusted software. (DoD 5200.28-STD). See also: TCB, Trusted Software.
Trusted Software. The software portion of a TCB. (DoD 5200.28-STD). See also: TCB.
Trustworthy. A trustworthy system or component is one that won't fail. [R.J. Anderson, "Security Engineering: A Guide to Building Dependable Distributed Systems," Wiley (2001) ISBN 0-471-38922-6]. See also: TCB, TCG, Trusted, Trusted Path, Trusted Software.
Tunneling. Refers to the encapsulation of protocol A within protocol B, such that A treats B as though it were a data link layer. See also: Tunneling Router.
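Encapsulation in this sense can be sketched in a few lines. The framing used here (a two-byte carrier marker followed by a two-byte length) is purely an assumption for illustration and is not the RUSH, RECAP, or UNCAP wire format, none of which is specified in this glossary.

```python
import struct

CARRIER_MARKER = b"B1"  # hypothetical identifier for carrier protocol B


def encapsulate(inner_packet):
    """Wrap a complete protocol-A packet as the payload of a protocol-B frame."""
    return CARRIER_MARKER + struct.pack("!H", len(inner_packet)) + inner_packet


def decapsulate(frame):
    """Recover the original protocol-A packet from a protocol-B frame."""
    if not frame.startswith(CARRIER_MARKER):
        raise ValueError("not a protocol-B frame")
    (length,) = struct.unpack_from("!H", frame, len(CARRIER_MARKER))
    start = len(CARRIER_MARKER) + 2
    payload = frame[start:start + length]
    if len(payload) != length:
        raise ValueError("truncated frame")
    return payload


packet_a = b"protocol A header|protocol A body"
frame_b = encapsulate(packet_a)
print(decapsulate(frame_b) == packet_a)
```

Protocol A never inspects the carrier framing; it simply hands over bytes and receives the same bytes at the far end, which is what it means for A to treat B as a data link layer. A tunneling router, as defined in the next entry, would additionally encrypt the inner packet before encapsulation.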
Tunneling Router. Router or system capable of routing traffic by encrypting it and encapsulating it for transmission across an untrusted network, for eventual de-encapsulation and decryption. The FLAMERouter and RUSHrouter are both tunneling routers. See also: FLAMERouter, RUSHrouter, Tunneling.
UDP. User Datagram Protocol. A Transmission Control Protocol (TCP) complement that offers a connectionless datagram service that guarantees neither delivery nor correct sequencing of delivered packets, much like Internet Protocol (IP) upon which it depends. See RFC 768. See also: TCP.
UNCAP™. Untrusted Node Computation, Adaptation, & Persistence. The secure, proprietary protocol used for communication between MASTER-led teams and the SERVANTs (i.e., untrusted nodes) that "belong" to them. UNCAP appears to be used for RUSHrouter-to-RUSHrouter communication also, but this is only coincidental, since every RUSHrouter comprises at least one SERVANT. UNCAP is always tunneled via the RUSH protocol, but unlike RECAP, there is no expectation of trustworthiness among its participants. See also: FLAMERouter, RECAP, RUSH, RUSHrouter, Tunneling Router.
Usability. The usability of a system involves three potentially conflicting factors: how quickly users can do what they want to do, how correctly they can do it, and how much they enjoy doing it. The underlying design of a computer system can affect its usability. Designing usability into a system involves analyzing users' needs, and then designing around those needs while optimizing the three factors.
USB. Universal Serial Bus. A tri-speed (high, full, low) signaling standard. High-speed USB 2.0 allows data transfer up to 480 Mbps, which is 40 times faster than full-speed USB. Due to signaling overhead, the USB 2.0 standard appears to have a throughput limitation of around 25 to 30 MBps, or about half of what is implied by the raw data rate.
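The figures in this entry can be checked with simple arithmetic. The 12 Mbps full-speed rate is not stated in the entry; it is taken here from the USB specification.

```python
HIGH_SPEED_MBPS = 480  # raw high-speed signaling rate, megabits per second
FULL_SPEED_MBPS = 12   # raw full-speed rate (from the USB spec, not the entry)

# Ratio of high-speed to full-speed signaling rates.
speed_ratio = HIGH_SPEED_MBPS / FULL_SPEED_MBPS

# Raw high-speed rate expressed in megabytes per second (8 bits per byte).
raw_mbytes_per_s = HIGH_SPEED_MBPS / 8

# Observed throughput range quoted in the entry, as a fraction of the raw rate.
observed_mbytes_per_s = (25, 30)
fractions = tuple(x / raw_mbytes_per_s for x in observed_mbytes_per_s)

print(speed_ratio, raw_mbytes_per_s, fractions)
```

The raw rate works out to 60 MBps, so the quoted 25 to 30 MBps corresponds to roughly 42% to 50% of it, in line with the "about half" estimate.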
User. Any person who interacts directly with a computer system. (DoD 5200.28-STD). Also: Any entity (human user or external IT entity) outside of the computer system that interacts with it.

User Data. Data created by and for the user, that does not affect the operation of the system's security functions.
WAN. Wide Area Network.
WLAN. Wireless LAN.
VM. Virtual Machine. (In other contexts, "Virtual Memory").
VMM. Virtual Machine Monitor. Equivalent to hypervisor. Responsible for supervising virtual machines. In SHADOWS, the VMM is part of the BOSS role.
VLAN. Virtual LAN.
VoIP. Voice over IP. See also: DELEGATE.
VPN. Virtual Private Network.

4 SHADOWS™ - Architectural Overview & Motivations
4.1 The Goal, in No Uncertain Terms
To achieve "An affordable, highly trustworthy, survivable and available, operationally efficient supercomputing infrastructure for processing, sharing and protecting both structured and unstructured information."

4.2 Historically Conflicting Requirements
A primary objective of the SHADOWS infrastructure is to establish a highly survivable, essentially maintenance-free shared platform for extremely high-performance computing (i.e., supercomputing) - with "high performance" defined in terms of both total throughput and very low latency (although not every problem or customer necessarily requires very low latency) - while achieving unprecedented levels of affordability (both capital and operational expense) - and that is capable of earning a deserved reputation for trustworthiness, survivability, and fault tolerance. These requirements have historically been in conflict with each other, and resolving them requires a new approach.

4.3 SHADOWS as a Distributed, Decentralized Centralized Architecture
At its simplest, the idea is to use distributed "teams" of nodes in a self-healing network as the basis for managing and coordinating both the work to be accomplished and the resources available to do the work. The SHADOWS concept of "teams" is responsible for its ability to "self-heal" and "adapt" its distributed resources in an "organic" manner. Furthermore, the "teams" themselves are at the heart of decision-making, processing, and storage in the SHADOWS infrastructure. Everything that's important is handled under the auspices and stewardship of a team.
Think: "The Borg"
The idea is to achieve an apparently centralized supercomputing infrastructure (SHADOWS), with all the advantages of centralization (but not the disadvantages), through the implementation of a highly distributed, decentralized organic network of cooperating nodes (working storage) that self-organize into teams, dynamically partition work and resources among the nodes and teams, and - importantly - hold each other accountable.
While it is straightforward to achieve high throughput via a large number of distributed nodes, it is not possible to do so with very low latency and at very low cost. Thus, there must be collections of nodes that are sufficiently localized to reach a "critical mass" of computing power, in order to achieve the lowest possible latency for those problems and/or customers that require it, and to do so at the lowest possible cost (without sacrificing trustworthiness, survivability, and fault tolerance). A new kind of supercomputing machine - SCRAM - was conceived as the means to reconcile the conflicting requirements, including that of achieving low acquisition and operating costs. Each SCRAM machine is a self-contained supercomputer in its own right, but can be colocated with other SCRAM machines to multiply its capacity and performance without sacrificing latency, and SCRAM machines can be distributed to achieve arbitrary levels of survivability.
In SHADOWS terminology, "working storage" is not passive - it's active - the working storage actually does the work. A "node" is the smallest addressable unit of intelligent storage, or working storage. A node comprises at least one processor (to do computational work), along with some mix of volatile and non-volatile memory (to provide information storage).
SHADOWS nodes can be organized into "machines," and machines can be organized into "sites" - and these are the basis of the two primary conceptual SHADOWS building blocks:
• Machines Comprise Nodes
• Sites Comprise Machines
Footnote (The Borg): A being comprising life-like yet robot-like beings with a collective, distributed consciousness (from the science-fiction television show, "Star Trek").

In a preferred embodiment, the subject machines are SCRAM machines, and the subject sites are SUREFIRE sites.
SCRAM machines are miserly in their energy usage and are self-contained (including computing, networking, persistent storage, power generation, etc.). SCRAM machines do not need computer-friendly environments (you could safely drop one into a lake without damaging it), so they can easily be distributed to multiple sites, which need not be data centers (any physically secure location may be appropriate). SCRAM machines are essentially very small, self-contained datacenters, except that they require external power and, to some degree (depending on the threat profile), physical protection.
In conjunction with SUREFIRE sites, SCRAM machines are designed for deployment to unmanned/unattended locations (e.g., underground) and require no routine maintenance.
4.4 SUREFIRE Sites as Survivable Mini-Datacenters
A SHADOWS "site" is defined as a group of SHADOWS machines (whether or not they are SCRAM machines) that share the same or approximately the same GPS coordinates (within some radius and/or margin of error) and are interconnected with a multiplicity of switching and/or routing communications fabrics.
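The location criterion in this definition can be illustrated with a minimal Python sketch. It assumes a greedy grouping in which a machine joins the first site whose anchor position lies within a chosen radius; the function names, the 100 m default radius, and the greedy strategy are all illustrative assumptions, and the interconnection requirement is not modelled.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres

def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))

def group_into_sites(machines, radius_m=100.0):
    """Greedy grouping (hypothetical): a machine joins the first site whose
    anchor (the first machine placed there) is within radius_m.

    machines: dict mapping machine name -> (lat, lon).
    Returns a list of sites, each a list of machine names.
    """
    sites = []  # list of (anchor_position, [member names])
    for name, pos in machines.items():
        for anchor, members in sites:
            if haversine_m(anchor, pos) <= radius_m:
                members.append(name)
                break
        else:
            sites.append((pos, [name]))
    return [members for _, members in sites]
```

For example, two machines about 11 m apart would fall into one site, while a machine roughly 111 km away would form its own.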
In a preferred embodiment, one or more SCRAM machines would be co-located at a site - in a particular kind of highly survivable facility referred to as a SUREFIRE site.
There are numerous SUREFIRE site configurations possible, providing a basis for meeting a diverse set of needs. The four exemplary configurations described here are:
= SUREFIRE Freestanding Vault (preferred embodiment)
= SUREFIRE Mini-Silo (preferred embodiment)
= SUREFIRE Single-Level Underground Vault (alternate embodiment)
= SUREFIRE Multi-Level Underground Vault (alternate embodiment)
The SUREFIRE Freestanding Vault is a preferred embodiment, and by design its minimal configuration would enjoy the lowest cost of the four example unmanned configurations if deployed in volume, which would enable affordable, widespread deployment. The packaging of all of its major components makes it equally at home in a datacenter, office building, warehouse, basement, or on the roof.
Even though it contains its own multifuel power plant, it requires less than 50 square feet of floor space, including room for maintenance access. The SUREFIRE Freestanding Vault can be configured to support various levels of performance in the sub-TFLOPS to 20 TFLOPS range, per vault.
The SUREFIRE Mini-Silo is a preferred embodiment, and by design its minimal configuration would enjoy the lowest cost of the three example underground configurations if deployed in volume, which would enable affordable, widespread deployment. The packaging of all of its major components is tailored especially to a silo configuration (a cylindrical shape approximately 3 feet in diameter). The SUREFIRE Mini-Silo can be configured to support various levels of performance in the sub-TFLOPS to 10 TFLOPS range, per silo.
The SUREFIRE Single-Level Underground Vault is an alternate embodiment - a larger diameter silo - that could be affordably produced in fairly low quantities (relative to the SUREFIRE Mini-Silo), and is able to accommodate a higher degree of conventional equipment than the SUREFIRE Mini-Silo. The SUREFIRE
Single-Level Underground Vault is especially well-suited to supercomputing accompanied by significant radio communications (the silo itself serves as the base for relatively lightweight communications towers). The SUREFIRE Single-Level Underground Vault can be configured to support various levels of performance in the 0.5 TFLOPS to 10 TFLOPS range, per silo.
The SUREFIRE Multi-Level Underground Vault is an alternate embodiment - also in a silo configuration - that is likely to require a somewhat substantial level of site engineering and preparation prior to deployment.
A typical deployment scenario would be underneath (literally) a commercial-class wind turbine (100 KW or more). While the basic design is straightforward to replicate, its site preparation is unlikely to be, due to the facility depth and likely permitting issues. The SUREFIRE Multi-Level Underground Vault can be configured to support various levels of performance in the 2 TFLOPS to 50 TFLOPS range, per silo.

4.5 How Distributed Machines Are Organized at Multiple Sites
A SHADOWS "mesh" (which may also be a "neighborhood" and/or "community") is a group of SHADOWS sites in the same locale, sharing proximate GPS coordinates and interconnected with a meshed network of point-to-point and point-to-multipoint links, augmented by WAN links (in a preferred embodiment, a diverse multiplicity of terrestrial and satellite channels are used to achieve specific survivability goals).
A SHADOWS "region" is typically (but this may be defined by policy) the collection of WAN-connected (at least) SHADOWS sites supplied (or potentially supplied) by the same utility power grid (thus, in the U.S., for example, there are four regions under this definition, but other definitions are possible also). Adjacent regions may also enjoy mesh-like point-to-point or point-to-multipoint interconnections, which may have the effect of collapsing two or more physical (or policy-defined) regions into a single logical region.
A SHADOWS "theater" is a collection of WAN-connected sites which, for our purposes, is essentially distinguished by some combination of geographical, political, military, legal, and technical considerations that force special or self-similar treatment throughout the collection. Examples of theaters are North America, Western Europe, China, Japan, Australia, the stratosphere, the troposphere, LEO satellites, MEO satellites, the moon, and Mars (this is clearly a non-exhaustive list).
Finally, the SHADOWS "universe" is the total collection of SHADOWS theaters, whether interconnected by any means whatsoever, or even disconnected.
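The containment relationships defined above (nodes in machines, machines in sites, sites grouped into meshes, regions, and theaters, and theaters forming the universe) can be sketched as plain data structures. The following Python sketch is only illustrative: the class and field names are assumptions, and a single `SiteGroup` type is used for meshes, regions, and theaters because the text defines each of them as a collection of sites.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    name: str

@dataclass
class Machine:
    name: str
    nodes: List[Node] = field(default_factory=list)

@dataclass
class Site:
    name: str
    machines: List[Machine] = field(default_factory=list)

@dataclass
class SiteGroup:
    """A mesh, region, or theater: each is defined as a collection of sites."""
    kind: str  # "mesh", "region", or "theater" (hypothetical labels)
    name: str
    sites: List[Site] = field(default_factory=list)

@dataclass
class Universe:
    """The total collection of theaters, connected or not."""
    theaters: List[SiteGroup] = field(default_factory=list)

def count_nodes(universe: Universe) -> int:
    """Count nodes reachable through theaters -> sites -> machines."""
    return sum(len(machine.nodes)
               for theater in universe.theaters
               for site in theater.sites
               for machine in site.machines)
```

Because a site can belong to a mesh, a region, and a theater at the same time, the groupings overlap rather than nest strictly; the sketch keeps them as independent lists for that reason.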
Although traditional route optimizers and link-balancing devices can optimize for some combination of link performance and/or link cost, they generally consider the network only from the device's point of view, or with respect to a set of relatively local properties. This means, for example, that there is nothing to prevent such a device from choosing a low-cost outbound link that corresponds to a high-cost inbound link at the ultimate destination. This is usually the best that can be expected, especially when only the near end device is under local administrative control and responsibility.
In stark contrast, SHADOWS considers its entire network (i.e., the SHADOWS
universe) as the basis for optimization. When optimizing for cost, for example, SHADOWS considers both the sending and receiving links for every SHADOWS node along a path.
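The difference between near-end and end-to-end cost optimization can be shown with a small Python sketch. It assumes each hop is described by a pair of hypothetical costs (the cost to send on the hop and the cost to receive at its far end); the cost model and function names are illustrative only and are not taken from the SHADOWS routing design.

```python
from typing import Callable, Dict, List, Tuple

# (cost to send on this hop, cost to receive at the far end); hypothetical units
Hop = Tuple[float, float]

def near_end_cost(path: List[Hop]) -> float:
    """What a device sees when it considers only its own outbound link."""
    return path[0][0]

def end_to_end_cost(path: List[Hop]) -> float:
    """Sum of sending and receiving costs for every hop along the path."""
    return sum(send + receive for send, receive in path)

def cheapest(paths: Dict[str, List[Hop]],
             cost_fn: Callable[[List[Hop]], float]) -> str:
    """Return the name of the path with the lowest cost under cost_fn."""
    return min(paths, key=lambda name: cost_fn(paths[name]))

paths = {
    "A": [(1.0, 9.0)],  # cheap to send, expensive to receive
    "B": [(3.0, 2.0)],  # dearer to send, cheap to receive
}
```

With these example figures, a near-end optimizer would select path A, whereas an optimizer that counts both ends of every link would select path B.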
Key SHADOWS drivers include the current and probable future availability of resources, and the maintenance of adequate reserves to ensure appropriate levels of survivability.
SHADOWS network routing is further complicated by the need to intentionally thwart traffic analysis by potential attackers. Thus, in addition to the functional boundaries, roles, and optimizations noted above, there are non-functional ones as well.
In particular, once communications exits the virtual world of SHADOWS, such that connection to the "real world" is required, new types of special capabilities are called for.
The Firewall, Link-Aggregator-Multiplexer, & Edge Router (FLAMERouter) capability lives at the interface between a SHADOWS supercomputing node and all external network connections (LAN and WAN). One of its primary responsibilities is to cooperate with the FLAMERouters of other SHADOWS nodes in order to transparently and logically connect each SHADOWS node to the others, optimally, using the Scrutiny RECAP (Reliably Efficient Computation, Adaptation, & Persistence) protocol over any and all channels available (private and public). A key goal is to handle traffic as though all the nodes were connected on an amalgam of VLANs and VPNs (but without the VLANs and VPNs), taking extraordinary measures as necessary to avoid partitioning of the "virtual network."
Another key role of the SHADOWS FLAMERouters is to safeguard the SHADOWS
communications channels, not only to prevent denial of service (which includes resisting DDOS
attacks), but also to prevent traffic analysis, so as to render the SHADOWS network opaque. FLAMERouters use active techniques, in conjunction with the SELF subsystem, to classify both inbound and outbound traffic as friendly, benign, or malicious. Friendly traffic (as determined by SELF) is granted the highest priorities. Benign and malicious traffic are both allowed, depending on the properties of the traffic itself, but are closely managed by the FLAMERouters so as to meet the specific needs of SHADOWS (non-self traffic is desirable for mixing purposes, as part of defending against traffic analysis by attackers, but must be limited to exactly the desired bandwidths, while ensuring that no malicious traffic is allowed to propagate).
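One way to picture the traffic handling described above is a per-class admission policy: friendly traffic is always admitted, while benign and malicious (non-self) traffic is admitted only up to a configured byte budget. The Python sketch below is an assumption-laden illustration; the class names, the byte-budget mechanism, and the budget values are hypothetical, and the sketch does not model classification itself or the prevention of malicious propagation.

```python
from enum import Enum
from typing import Dict

class TrafficClass(Enum):
    FRIENDLY = "friendly"
    BENIGN = "benign"
    MALICIOUS = "malicious"

class MixingPolicy:
    """Admit friendly traffic freely; cap non-self traffic per class (hypothetical)."""

    def __init__(self, non_self_budget_bytes: Dict[TrafficClass, int]):
        # Remaining bytes that may still be admitted for each non-self class.
        self.remaining = dict(non_self_budget_bytes)

    def admit(self, cls: TrafficClass, size: int) -> bool:
        if cls is TrafficClass.FRIENDLY:
            return True  # friendly traffic is never limited in this sketch
        left = self.remaining.get(cls, 0)
        if size <= left:
            self.remaining[cls] = left - size
            return True
        return False  # budget exhausted: drop or defer
```

A budget of zero for a class means none of that traffic is admitted, which corresponds to the case where the FLAMERouters desire no mixing traffic of that kind.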
As a fringe benefit of defending a SHADOWS network from DDOS attacks, a wide deployment of FLAMERouters is expected to have the ability to help mitigate the effects of "botnets" across the Internet in general. FLAMERouters can execute behavior-appropriate countermeasures.
The FLAMERouter processes can be implemented in software and/or hardware, but in a preferred embodiment are implemented primarily in reconfigurable hardware, under the auspices of dynamic configuration software, and under the control of the BOSS (Byzantine Object & Subject Security) and SELF (Secure Emergent Learning of Friends) subsystems.
The SHADOWS RUSHrouter behaves much like the FLAMERouter, but is designed for deployment to client locations, where it can serve as a host-resident proxy or default gateway, or live in the DMZ as a server, edge router, and default gateway. The primary role of the RUSHrouter is to enable and manage secure communications between client machines and RUSHrouters, between RUSHrouters and FLAMERouters (indirectly, because a RUSHrouter never knows when it is communicating with a FLAMERouter, which can emulate RUSHrouters), and among RUSHrouters, all under the auspices and control of the FLAMERouters.
RUSHrouters communicate natively using Scrutiny's RUSH and UNCAP protocols.
The RUSH (Rapid Universal Secure Handling) protocol focuses on meeting the needs of clients (i.e., on the client-side of the RUSHrouters). The UNCAP (Untrusted Node Computation, Adaptation, &
Persistence) protocol is a subset of RUSH and focuses on communications between the SHADOWS infrastructure and any SERVANTs that are implemented on client machines.
Because they are in essence client-side gateways and firewalls, with built-in proxy and server functions, RUSHrouters can also communicate (like a residential gateway/firewall) with arbitrary Internet destinations, including to and through overlay networks (e.g., the anonymizing networks TOR, I2P, etc.), and can do so by using any and all available connections (like a FLAMERouter). Client preferences (especially firewall and bandwidth preferences) can be set by authenticated clients, but all such configuration changes actually occur only at the behest of the FLAMERouters, based on client requests to configuration control processes in the SHADOWS infrastructure.
Although RUSHrouters are under the control of FLAMERouters, they do not actually communicate with them directly, since FLAMERouters are generally invisible except to specially privileged devices (and specifically, not to RUSHrouters). Instead, RUSHrouters communicate with a multiplicity of what they "think" are FLAMERouters, but which are in actuality SHADOWS MARSHALs (Multiprocessor Adaptive Scheduler & Task Executor/Redirector).
A MARSHAL is much like a RUSHrouter, except that it lives not on the client side, but out in the Internet itself, typically in data centers or network hubs where multiple high-bandwidth connections are available.
RUSHrouters and MARSHALs work together to route, mix, aggregate, and manage traffic, under the auspices and control of the FLAMERouters. Note that RUSHrouters and MARSHALs may be directed to send traffic to FLAMERouters (thinking they're sending it to another RUSHrouter or MARSHAL, because the destinations aren't recognizable as FLAMERouters). Only legitimate (i.e., authorized) traffic is ever directed to the FLAMERouters, although this may include both benign and malicious traffic (if desired by the FLAMERouters, but only to the extent so desired). A compromised RUSHrouter or MARSHAL that directs unwanted traffic (malicious or not) toward the FLAMERouters may face appropriate countermeasures.
The key differences among RUSHrouters, MARSHALs, and FLAMERouters are their roles, purposes, locations, location-induced vulnerabilities, configurations, and implementations. Otherwise, they are conceptually more alike than different, from a process point of view.
RUSHrouters are oriented to client-side functions, MARSHALs are oriented to "middleman" functions, and FLAMERouters are oriented to server-side functions, yet they all can at least appear to emulate each other, to a point.
Note: Any FLAMERouter can emulate any number of RUSHrouters and MARSHALs, and so can communicate directly with them without revealing itself.

5 SERVANT (Service Executor, Repository, & Voluntary Agent - Non-Trusted)
SERVANT™. Service Executor, Repository, & Voluntary Agent - Non-Trusted. A cooperative computing and/or storage node that is untrusted (usually due to potential threat exposure). A MASTER that is not recognized as a MASTER by other MASTERs may operate as a SERVANT (but to do so, it must use the UNCAP protocol, tunneled via RUSH, rather than the RECAP protocol). See also: BOSS, MASTER, RECAP, RUSH, SCRAMnet, SELF, UNCAP.

5.1 MARSHAL (Multi-Agent Routing, Synchronization, Handling, & Aggregation Layer)
MARSHAL™. Multi-Agent Routing, Synchronization, Handling, & Aggregation Layer.
A distinguished SERVANT node having the responsibilities of fulfilling a MARSHAL role. Any node, authenticated as having a MARSHAL role, that serves as a gateway for system users to access SHADOWS
services via a network (e.g., the Internet). A MARSHAL may also communicate with other MARSHALs, under the auspices and control of a MASTER-led team, in order to implement one or more overlay networks and/or network fabrics whose purposes and characteristics are determined by the MASTER-led team (but are opaque to the MARSHALs). By design, a MARSHAL is not trusted, and the role is typically fulfilled by a SERVANT node (which is also inherently untrusted). Occasionally the MARSHAL role is fulfilled by a SLAVE (emulating a MARSHAL) that is operating under the auspices and control of a MASTER, through a HANDLER, and is therefore trusted, but this fact is never known to those communicating with the MARSHAL. A MARSHAL
may reside virtually anywhere (e.g., at an ISP, on customer premises, at a telco central office, at a datacenter, on a utility pole, within a server or PC, etc.). See also:
HANDLER, ISP, MASTER, PC, SELF, SERVANT, SLAVE.

5.2 DELEGATE (Distributed Execution via Local Emulation GATEway)
DELEGATE™. Distributed Execution via Local Emulation GATEway. A distinguished SERVANT node having the responsibilities of fulfilling a DELEGATE role. The DELEGATE role implements a secure client-side "proxy" agent that appears to locally implement a particular service which would normally be implemented elsewhere, such as on a local or remote server, but instead may actually be implemented within the SHADOWS network cloud.
The DELEGATE proxy handles both stateless and stateful communication (the latter may be expected to be "chatty") with the client-side software requesting service, such that the DELEGATE proxy translates requests to and from the RUSH protocol as needed. In one embodiment, for example, an open-source DBMS API like that of, say, MySQL or PostgreSQL is implemented as a DELEGATE; the MySQL or PostgreSQL DELEGATE can then be run locally on an arbitrary machine (e.g., a PC or server), and any software applications that expect the selected DBMS may run as though it were present.
Although the selected DBMS may appear to be local, its operations may actually be carried out on the SHADOWS supercomputing infrastructure; there is no need for database replication, because the survival and integrity of distributed data is intrinsic to the SHADOWS architecture.
Any number of authorized subjects at any authorized locations can similarly instantiate the selected DBMS
DELEGATE, and they may all be sharing the same database (if that is what is called for), or diverse databases, as required. Furthermore, if one application requires one DBMS, say MySQL, and another requires a different DBMS, say PostgreSQL, and a third application requires an OpenLDAP server, and a fourth requires an Apache web server, then four appropriately selected DELEGATEs can be instantiated on the local machine. Each DELEGATE may implement the requisite local API, but can communicate (via the RUSH protocol) with a local set of virtual RUSHrouters, which can communicate (again, via the RUSH
protocol) with the distributed SHADOWS infrastructure, where the actual computing operations can be carried out in accordance with an appropriate SLA.
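The DELEGATE pattern described above (a local API whose calls are actually carried out elsewhere) can be sketched in Python. Everything in this sketch is a hypothetical stand-in: the key-value API replaces a real DBMS API for brevity, the JSON framing is not the RUSH protocol, and the in-memory handler merely plays the part of the remote infrastructure.

```python
import json
from typing import Any, Callable

# Hypothetical stand-in for the link to a RUSHrouter: bytes in, bytes out.
Transport = Callable[[bytes], bytes]

class KeyValueDelegate:
    """Looks like a local key-value service; forwards every call over the transport."""

    def __init__(self, transport: Transport):
        self._send = transport

    def _call(self, op: str, **args: Any) -> Any:
        # Translate the local API call into a request message (illustrative framing).
        reply = self._send(json.dumps({"op": op, "args": args}).encode())
        return json.loads(reply.decode())["result"]

    def put(self, key: str, value: Any) -> bool:
        return self._call("put", key=key, value=value)

    def get(self, key: str) -> Any:
        return self._call("get", key=key)

def make_fake_infrastructure() -> Transport:
    """In-memory handler standing in for the remote infrastructure (assumption)."""
    store = {}

    def handle(raw: bytes) -> bytes:
        msg = json.loads(raw.decode())
        args = msg["args"]
        if msg["op"] == "put":
            store[args["key"]] = args["value"]
            result = True
        elif msg["op"] == "get":
            result = store.get(args["key"])
        else:
            result = None
        return json.dumps({"result": result}).encode()

    return handle
```

Two delegates created over the same transport see the same data, which mirrors the case where several authorized subjects instantiate the same DELEGATE and share one database.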
In a preferred embodiment, the DELEGATE concept can be applied to common Internet-based services, including DNS, email (POP3, SMTP, etc.), VoIP (SIP), and so forth.
In a preferred embodiment, the DELEGATE concept is applied to HPC-class interprocessor communications by implementing an MPI API.
See also: API, DBMS, DNS, LDAP, MPI, POP3, RUSH, RUSHrouter, SIP, SLA, SMTP, VoIP.

6 SCRAM - Survivable Computing, Routing, & Associative Memory
A SHADOWS "machine" comprises one or more nodes sharing a common chassis or other container of some sort, without regard to specific packaging. In a preferred embodiment, "SCRAM" is one such machine; its extruded aluminum chassis is cylindrical in shape, comprising a set of Quadrants, each of which comprises a set of Lobes and an optional set of Blades.

Main Section. In a preferred embodiment, the main section (i.e., the vertical upright portion) is a single large aluminum extrusion with an overall diameter of about 25" (including cooling fins not shown). However, there are only a handful of extruders in the world capable of handling a diameter approaching 25" (and the associated tonnage of press capacity required), so in an alternate embodiment, the main section is split into three identical interlocking sections (one per 90° quadrant), each of which has a maximum diameter of <20".
In an alternative small-form-factor embodiment (not shown), the main section is a single aluminum extrusion with an overall diameter of about 12" to 13" (including cooling fins not shown), with the other dimensions and capacities scaled as needed (while maintaining similar aspect ratios).
Faceplates and/or I/O panels that attach via the quadrant interlocking mechanism are used to cover surfaces exposed by the "missing" fourth quadrant.
Inner Diameter. The "inner" diameter is smaller than depicted in order to increase the interior room, and is assigned a cooling function.
Lower Section. The lower extruded section comprises three interlocking "outrigger" sections (one per quadrant) that are identical large extrusions with a maximum "diameter" (cross-sectional length) of about 28" in a preferred embodiment (or somewhat less than 17" in an alternative small-form-factor embodiment not shown), or six interlocking "outrigger" sections (two per quadrant) that are identical large extrusions with a maximum "diameter" (cross-sectional length) of about 20" in a preferred embodiment (or about 12" in an alternative small-form-factor embodiment not shown, with values proportional to dimensions shown).
In a preferred embodiment, the coolant sump and pumps are accessed from the "open" side (where the missing quadrant is). In an alternate embodiment, the unit is serviced from the top. Note that the sump is normally dry, except in the rare case of accidental spills (all the working fluid couplings are blind-mating and self-sealing).
Depending on the selected pump, there may be as many as four high-reliability (>= 50,000 hours MTBF) pumps in a quadruple modular redundant arrangement, where each such arrangement is responsible for a certain percentage of the necessary flow. Under normal loads, and depending on the ambient temperature (or other cooling temperature), only one pump is typically operating (some conditions require no pumps at all). In a preferred embodiment based on variable-voltage DC pumps, the pumps are small and nominally dissipate less than 25 watts each, while pumping up to 1200 LPH (~317 GPH, or ~5.3 GPM) or providing pressures up to 3.5 bar (50 PSI).
In a preferred embodiment, the SCRAM Supercomputer is designed to be self-contained storage-wise, with up to 32 full-size (3.5-inch) disk drives per quadrant, or (preferably) 128 small-form factor (2.5-inch) disk drives per quadrant. An Introductory Limited Edition might ship with 2 nearline outrigger blades, each populated with 16 drives of 80 GB, or 1.28 TB per blade, for a total of 2.56 TB. Although we could easily use higher-capacity drives, the selected 80GB drives are at a sweet spot for price and performance. Given a fixed budget, far more performance can be had with the lower capacity drives, because many more drives can be purchased for the same amount of money, and more spindles means higher levels of parallel access.
For 2007, the highest density 2.5-inch SAS disk drive has a raw (uncompressed) capacity of 146 GB, so the maximum hard disk storage capacity possible with 128 drives is 18.7 TB per quadrant (146GB x128), or 56 TB for the chassis. With dual-ported SAS drives, however, there are 256 channels of access (300 MBps each), rather than 128 channels (all SATA drives are only single-ported).
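The capacity and channel figures above follow from simple multiplication, which the Python sketch below reproduces. It assumes decimal units (1 TB = 1000 GB) and, for the chassis total, three populated quadrants; the latter is an inference from the stated 56 TB figure rather than something the text states outright.

```python
GB_PER_TB = 1000  # decimal units, as drive capacities are normally quoted

def capacity_tb(drives: int, drive_gb: int) -> float:
    """Raw capacity in TB for a number of identical drives."""
    return drives * drive_gb / GB_PER_TB

def access_channels(drives: int, ports_per_drive: int) -> int:
    """Number of independent access channels for a set of drives."""
    return drives * ports_per_drive

# Introductory configuration: 2 blades, each with 16 drives of 80 GB.
blade_tb = capacity_tb(16, 80)           # 1.28 TB per blade
intro_tb = 2 * blade_tb                  # 2.56 TB in total

# 2007 maximum: 128 drives of 146 GB per quadrant.
quadrant_tb = capacity_tb(128, 146)      # 18.688 TB, quoted as 18.7 TB
chassis_tb = 3 * quadrant_tb             # ASSUMPTION: three populated quadrants

sas_channels = access_channels(128, 2)   # dual-ported SAS: 256 channels
sata_channels = access_channels(128, 1)  # single-ported SATA: 128 channels
```

The same helper makes the spindle-count trade-off easy to explore: for a fixed budget, more lower-capacity drives yield more channels of parallel access.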
In a preferred embodiment, each NEARblade is a 16-drive SAS/SATA hybrid, consisting of 4 to 8 dual-ported SAS drives for speed and 8 to 12 single-ported SATA drives for high capacity and cost reduction. Note that despite the fact that typical SAS drives (10K RPM) are much faster than high-capacity SATA drives (5400 or 7200 RPM), both are considered "only" nearline storage in a SCRAM
Supercomputer.
= A 4-SAS, 12-SATA hybrid with the drives noted above would have 572 GB of high-performance drives (via 8 channels) and 3.6 TB of high-capacity drives (via 12 channels), for a maximum total capacity of just over 4 TB per NEARblade (via 20 channels). A full complement of 24 such blades would yield 96 TB of hybrid storage (13.7 TB SAS, 86.4 TB SATA) with 2007 technology.
= An 8-SAS, 8-SATA hybrid with the drives noted above would have almost 1.2 TB
(via 16 channels) of high-performance drives, plus 2.4 TB (via 8 channels) of high-capacity drives, for a total capacity of 3.6 TB per NEARblade (via 24 channels). A full complement of 24 such blades would yield 86.4 TB of hybrid storage (28.8 TB SAS, 57.6 TB SATA).
Each storage blade is likely to weigh 15 to 20 pounds (16 drives plus frame, thermal conductors and coolant). If all the drives in a bay were spinning at once, they would require 160 to 300 watts of power, depending on the mix of SATA and SAS drives.
In a typical Scrutiny configuration, much less than 20% of the drives would normally be spinning, reducing the power load to the neighborhood of 32 to 60 watts maximum.
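The reduced power figures can be checked with a short Python sketch. It assumes, as the stated numbers imply, that drives which are not spinning contribute nothing to the load; the function name and the percentage parameter are illustrative.

```python
def bay_power_w(full_power_w: float, spinning_percent: float) -> float:
    """Power drawn by a drive bay when only a percentage of its drives spin.

    ASSUMPTION: idle (spun-down) drives are treated as drawing no power.
    """
    return full_power_w * spinning_percent / 100

low_w = bay_power_w(160, 20)   # lower bound of the all-spinning range
high_w = bay_power_w(300, 20)  # upper bound of the all-spinning range
```

With 20% of the drives spinning, the 160 to 300 watt range scales to 32 to 60 watts, and since "much less than 20%" is typical, those values act as upper bounds.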
In a preferred embodiment, blades are either top-loaded or front-loaded, but must be selected for maintenance and powered down before removal. This is a matter of authenticating, making a menu selection, and waiting for a light to indicate that the blade is ready to be removed, and that the solenoid-controlled blade-latching mechanism is unlocked. An outrigger blade such as a drive bay can be removed without shutting down the SCRAM lobes in the corresponding quadrant.
Because a phase-change coolant is in use, removing a SCRAM lobe requires an authorized power-down of the quadrant containing it, and likewise waiting for a light to indicate that the quadrant is ready to be opened, and that its solenoid-controlled module latching mechanism is also unlocked.
The same solenoid controls prevent tampering and other unauthorized access.

Due to the very large surface area, the outer fins provide substantial cooling even in the absence of data center-style air conditioning. Phase-change working fluid is circulated in the outer walls, causing the vapor to condense under normal circumstances.
The walls containing the optional inner fins also incorporate fluid circulation channels, and can provide cooling when forced air is available (say, from a data center underfloor air conditioning system). A high-reliability, low-noise blower is also contained in the base (as a backup) to supplement other means of cooling during over-temperature conditions.
Note 1: The fluid channels in the inner walls are distinct from the fluid channels in the outer walls, and may be used separately, although there is a relatively low-resistance conduction path in the current design because they're contained in the same all-aluminum extrusion.
Note 2: Although it is not shown here (because it's not related to the extrusion), there is also a heat exchanger and couplings for connecting with a building or datacenter chilled water system. In most cases, the already-hot return water is sufficient for cooling a SCRAM node, which has substantial economic implications, especially for overloaded datacenters.

In a preferred embodiment, a SCRAM node is composed of 1 to 4 quadrants.
Each quadrant contains 4 lobes that are fully connected to each other and to the lobes in the other quadrants. Each quadrant controls up to 8 optional "outrigger blades" (discussed elsewhere), in any combination, and each blade is fully connected to each lobe in the corresponding quadrant.
Note: In the illustration above, the particular internal configuration details of each of the individual lobes are not significant, except that the PEERS fabric local to each lobe connects with the PEERS fabric in the other lobes, and also with the "outrigger blades" (none of these connectivity details is shown here anyway). As a matter of convenience, the configuration shown above reflects that depicted on the slide SCRAM "Lobe" - Conceptual Interaction Diagram #2, rather than that depicted on the slide SCRAM "Lobe" - Conceptual Interaction Diagram #1 (the latter is the preferred embodiment).
In a preferred embodiment, one or more of the blocks depicted above as (optional) "outrigger blades" also are implemented internally (i.e., within a lobe) in a non-bladed manner, so that the specific means are also built into the lobe and provide the corresponding capability inherently (i.e., without the need for optional outrigger blades), in order to reduce the cost of a basic configuration.
Each lobe's workload is handled by SELF/CHARM blocks that function symbiotically to securely store, retrieve, and process information using an associative memory hierarchy. In particular, the SELF roles of BOSS, MASTER, and SLAVE are each paired with a CHARM PUMP capability that is tailored for the particular role. In the diagram above, the pairings (BOSS & PUMP, MASTER & PUMP, and multiple SLAVEs with multiple PUMPs) are depicted without arrows to emphasize the symbiotic coupling. Each pairing includes one or more means for processing, along with one or more levels of local memory and/or cache. Note that, in a preferred embodiment, the multiple SLAVEs with multiple PUMPs in a one-to-one configuration are replaced with one or more PUMPs, each having a multiplicity of SLAVEs.
In each SELF/CHARM pairing, the SELF means and the CHARM means may each be implemented via one or more traditional CPUs (SMP or not), programmable and/or reconfigurable logic (e.g., FPGAs, ASICs, etc.), or even discrete logic, or any combination thereof, including implementation of a pairing or multiple pairings on a single chip using any combination of means.
In a preferred embodiment, the BOSS/PUMP and MASTER/PUMP pairings are implemented via a single CPU handling the BOSS & MASTER functionality, and a single FPGA or Structured ASIC handling both their respective PUMP functionalities. The SLAVE/PUMP pairings are each implemented via a single CPU handling the SLAVE functionality and a single FPGA or Structured ASIC handling the corresponding PUMP functionality.
Logically, each lobe has a PEERS switching & routing fabric, but in a preferred embodiment there are actually at least two redundant fabrics working together in an active/active configuration.
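The connectivity described earlier in this section (every lobe connected to every other lobe, and every outrigger blade connected to each lobe of its own quadrant) implies a countable number of logical links. The Python sketch below works them out; it counts one logical link per pair and does not account for the redundant active/active fabrics, and the function names are illustrative.

```python
def lobe_links(quadrants: int, lobes_per_quadrant: int = 4) -> int:
    """Logical links in a full mesh over all lobes: n * (n - 1) / 2."""
    n = quadrants * lobes_per_quadrant
    return n * (n - 1) // 2

def blade_links(quadrants: int, blades_per_quadrant: int,
                lobes_per_quadrant: int = 4) -> int:
    """Each blade connects to every lobe in its own quadrant only."""
    return quadrants * blades_per_quadrant * lobes_per_quadrant

# A fully populated node: 4 quadrants, 4 lobes each, 8 blades per quadrant.
full_lobe_links = lobe_links(4)        # 16 lobes -> 120 lobe-to-lobe links
full_blade_links = blade_links(4, 8)   # 32 blades x 4 lobes -> 128 links
```

The quadratic growth of the lobe mesh is one reason a switching and routing fabric, rather than dedicated point-to-point wiring, is a natural fit for this topology.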
SCRAM machines provide a solid foundation for the SHADOWS infrastructure, which is highly distributed, with inter-node communications occurring globally over WANs, quasi-locally within a locale via WLANs, and locally (within a site) via a multiplicity of LAN switch fabrics and/or meshes. Nonetheless, the SHADOWS
infrastructure is designed to "play nice," which allows it to safely participate in other networks, in various roles (e.g., supercomputer, NAS appliance, a complete SAN deployment, etc.) - all as a first-class citizen.
Furthermore, the SHADOWS infrastructure is designed to take advantage of idle or unused computing, storage, and communications resources associated with the networks to which it is attached, as authorized, in order to maximize its supercomputing throughput while minimizing the cost of doing so. The SCRAM
machines provide the magic that makes it possible.
Regardless of the physical implementation, a SCRAM machine comprises four major logical functions, and thus four major types of means: SELF, CHARM, CORE, and FRAME.
= The SELF means defines roles for key architectural entities and enables secure, trustworthy, high-performance cooperation among those entities in the SHADOWS infrastructure.
= The CHARM means comprises a local hardware implementation of a secure, distributed (i.e., local node plus multiple remote nodes) hierarchical and associative memory processing system, with overlaid relational capabilities and a compressed persistent store.
= The CORE means comprises the processes and protocols related to the implementation of an associative memory, reasoning and belief systems, and cooperative processing and communications protocols.
= The FRAME means comprises the hardware and processes for survivably and securely energizing and maintaining the system.
The high level logical building blocks of a SCRAM machine are depicted below:
Internally, each of a SCRAM machine's Lobes (and optionally any Blade) comprises at least one MASTER
and typically at least one SLAVE, and both MASTERs and SLAVEs typically comprise multi-core general purpose processors, but may optionally comprise special-purpose processors, including without limitation, devices or modules comprising fixed or reconfigurable logic such as ASICs, FPGAs, and so forth.
Each MASTER is further distinguished by its isomorphic association with unique instantiations of BOSS and SELF (which are implemented at least partly in secure, immutable hardware).
Thus, in this aforementioned embodiment, a "node" could refer to the SCRAM machine itself, or any of the Quadrants, Lobes, Blades, MASTERs or SLAVEs, or even the processors, whereas they collectively determine the Machine.

6.1 SCRAM Subsystem
6.2 SCRAM Processing Node
7 SELF -- Secure Emergent Learning of Friends
In a preferred embodiment, SELF is an automated role-oriented "immune system" that differentiates "self" and "non-self", "friend" and "foe" - thus, said system may distinguish between authorized and unauthorized objects, subjects, and interactions.
In a preferred embodiment, SELF may establish and maintain trust among interdependent systems, subsystems, and components.
In a preferred embodiment, SELF may integrate with BOSS (see section 7.1.3) to incorporate Byzantine agreement logic (from the classic "Byzantine generals" problem) into its decision-making process, so that it may make correct decisions in the face of overt or covert attack, collusion, and corruption.
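For readers unfamiliar with the classic Byzantine generals result, the Python sketch below shows a generic quorum rule of the kind used in Byzantine fault-tolerant voting: with n participants, at most f = (n - 1) // 3 faulty members can be tolerated, and a value is accepted only when enough members report it that any two accepting quorums share at least one honest member. This is a textbook-style illustration under those assumptions, not the BOSS decision logic, and the function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Optional

def max_faulty(n: int) -> int:
    """Largest number of faulty members tolerable among n (requires n >= 3f + 1)."""
    return (n - 1) // 3

def quorum_size(n: int) -> int:
    """Smallest quorum such that any two quorums overlap in an honest member."""
    return (n + max_faulty(n)) // 2 + 1

def decide(votes: List[Hashable]) -> Optional[Hashable]:
    """Return the value backed by a quorum of votes, or None if there is none."""
    if not votes:
        return None
    value, count = Counter(votes).most_common(1)[0]
    return value if count >= quorum_size(len(votes)) else None
```

With four voters, for instance, one faulty member can be tolerated and three matching votes are needed before a decision is accepted.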
In a preferred embodiment, SELF may be highly integrated with BOSS, and with the RECAP, UNCAP, and/or RUSH protocols.
In a preferred embodiment, any anomalous behavior detected by SELF, or of which SELF becomes aware, may trigger an appropriate "immune system" response.

7.1 SELF Concepts
7.1.1 SELF - Resource Management Via Teams
CENTRAL CONCEPT
At its simplest, the idea is to use distributed "teams" of nodes in a self-healing network as the basis for managing and coordinating both the work to be accomplished and the resources available to do the work.
The SHADOWS concept of "teams" is responsible for its ability to "self-heal"
and "adapt" its distributed resources in an "organic" manner. Furthermore, the "teams" themselves are at the heart of decision-making, processing, and storage in the SHADOWS infrastructure. Anything that may be important may be handled under the auspices and stewardship of a team.
The purpose of having teams is at least five-fold: 1) to distribute the automated resource management overhead, 2) to partition, parallelize, and distribute the actual processing load and improve overall performance, 3) to increase the fault-tolerance of the system, 4) to increase the inherent survivability of the system, and 5) to increase the difficulty of successfully attacking the system.
BASIC CONCEPTS
1. Every MASTER Leads a Team.
2. Not Every Potential MASTER Becomes a MASTER.
3. All Stored Information is Immutable and has an Identity.
4. Almost Everything has an Identity, and Anything with an Identity "Belongs" to a Team.
5. A SHADOWS Team Comprises Members with No Common Regional Threats.
6. Teams are Stewards of Information to be Stored or Handled.
7. Teams are Stewards of Processes, including Memoized Results.
Although distributed, the SHADOWS infrastructure cannot be correctly described as strictly centralized or strictly decentralized. It is definitely not centralized in the sense that a traditional mainframe or supercomputer is intentionally centralized. Neither is it decentralized in the sense that a peer-to-peer network, or perhaps a grid network, is intentionally decentralized (so as to avoid centralized functionality, which often requires significant trade-offs). Rather, SHADOWS is a little of both, in a "Borg"-like way. SHADOWS might best be described as having a conceptually centralized function that happens to have local representation, but a highly decentralized implementation.

7.1.1.1 Every MASTER Leads A Team
In the SHADOWS infrastructure, every MASTER is the leader of at least one team to which other MASTERs, both local and remote, are also assigned. Depending upon the nature of a particular team, there may also be non-MASTER participants, and these may be voluntary (SERVANTs) or non-voluntary (SLAVEs).
Given a set of MASTERs cooperating as a team, a specific MASTER is always the team leader (if present and functioning correctly), and each of the other MASTERs has a specific (but potentially dynamic) role relative to the current team leader. There are at least as many teams as there are MASTERs, so that every competent MASTER leads at least one team, and also participates in subordinate roles in other teams. The more MASTERs there are, the more powerful the system is.

7.1.1.2 Not Every Potential MASTER Becomes a MASTER
Although every MASTER leads a team, not every "potential" MASTER may actually become a MASTER.
The state of being a MASTER is neither automatic nor assured -- it requires establishing identities and relationships with other potential MASTERs, and/or with actual MASTERs, until a sort of "critical mass" of relationships, qualifications, trustworthiness and "actual trust" is reached -- enabling the state of being a MASTER to be achieved. Until then, a potential MASTER can be a SERVANT (i.e., it can "volunteer"), but cannot lead a team. A SERVANT is a useful, but untrusted, "working storage" resource - it is capable of storing, retrieving, and forwarding encoded, encrypted information (but not decrypting or decoding it). In general, a SERVANT doesn't possess enough information to make decrypting and decoding possible, regardless of the computing resources available to a would-be attacker. A SERVANT is also capable of executing in-memory processes against information securely received but not stored, under the auspices of a MASTER-led team.
Some SERVANTs are assigned a MARSHAL role, which adds to their responsibilities, but not to their trustworthiness (like the SERVANT, the MARSHAL role is inherently untrusted).

7.1.1.3 All Stored Information is Immutable and has an Identity
In the SHADOWS infrastructure, by design, any information that is intentionally stored is deemed immutable (this does not apply to transient information existing only in memory).
Immutable data content can never be changed, and has an identity that is determined by the content itself - a cryptographic digest that is somewhat like a DNA signature. This digest, or content-based identity, is known by various names, but in this document may be referred to as simply the "ContentDigest." The ContentDigest is calculated with a cryptographic one-way function and is sufficiently random that it is useful for quasi-randomly assigning the content to the team currently responsible for the logical "universal partition" to which the subject ContentDigest belongs. In addition to the ContentDigest, all stored information is also given a universally unique "ContentAlias" that can remain forever associated with the ContentDigest, and is more convenient and efficient to use. The ContentAlias is permanently assigned by the same team that is responsible for the logical "universal partition" to which the subject ContentDigest belongs, and the team's identity is embedded in the ContentAlias. Thus, both the ContentDigest and the ContentAlias implicitly or explicitly identify the same team, which essentially becomes the "ContentStewardTeam" that is accountable for knowing "about" the content (its logical whereabouts and other potentially privileged information may not actually be known by the team acting as content steward, but it serves as the focal point), but especially the bidirectional mapping of ContentDigest and ContentAlias.
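By way of illustration only, the relationship among the ContentDigest, the logical "universal partition," and the ContentAlias might be sketched as follows. This is a minimal sketch, not the described implementation: the choice of SHA-256, the partition count, the alias format, and the names content_digest, partition_of, and ContentStewardTeam's methods are all assumptions introduced for the example.

```python
import hashlib
from itertools import count

NUM_PARTITIONS = 4096  # assumed number of logical "universal partitions"


def content_digest(content: bytes) -> str:
    """Content-based identity: a one-way cryptographic digest (SHA-256 assumed)."""
    return hashlib.sha256(content).hexdigest()


def partition_of(digest: str) -> int:
    """Quasi-random partition assignment derived from the digest itself."""
    return int(digest, 16) % NUM_PARTITIONS


class ContentStewardTeam:
    """Hypothetical steward of one partition; keeps the bidirectional mapping."""

    def __init__(self, team_id: int):
        self.team_id = team_id
        self._serial = count(1)
        self.alias_by_digest = {}
        self.digest_by_alias = {}

    def assign_alias(self, digest: str) -> str:
        """Assign an alias that embeds the team's identity; never reassign."""
        if digest in self.alias_by_digest:
            return self.alias_by_digest[digest]
        alias = f"{self.team_id:04x}-{next(self._serial):012x}"
        self.alias_by_digest[digest] = alias
        self.digest_by_alias[alias] = digest
        return alias
```

In this sketch, a team created as ContentStewardTeam(partition_of(d)) for a digest d returns the same alias on every later call to assign_alias(d), reflecting the permanence of the pairing.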

Note (on the "Borg"-like description in section 7.1.1): The SHADOWS "Borg-like" operational team concept may be vaguely reminiscent of the physics concept of "quantum entanglement" -- a quantum mechanical phenomenon in which the quantum states of two or more objects have to be described with reference to each other, even though the individual objects may be spatially separated.
Note (on immutable content, section 7.1.1.3): The internal storage format of the information may, however, be modified without changing the content, and thus without changing the identity.

7.1.1.4 Almost Everything has an Identity, and Anything with an Identity "Belongs" to a Team
Every SHADOWS resource, task, and identifiable entity of virtually any kind (including, without limiting the generality of the foregoing, processes, objects, subjects, and records), is assigned to a team, as are all users and/or actors that exhibit any kind of producer and/or consumer behavior with respect to the SHADOWS infrastructure or its mission.

7.1.1.5 A SHADOWS Team Comprises Members with No Common Regional Threats
From a rudimentary viewpoint, in a preferred embodiment, a SHADOWS team comprises, for example, at least five active MASTERs: two colocated MASTERs, LocalMASTER_1 (the team leader) and LocalMASTER_2, and three non-colocated MASTERs, RemoteMASTER_1, RemoteMASTER_2, and RemoteMASTER_3. This minimal team is sufficient to maintain Byzantine agreement in the face of one Byzantine fault (e.g., a single corrupt MASTER) or one failed site (e.g., due to a regional disaster).
Minimum Redundancy for Byzantine Agreement
It is accepted in the art that the minimum number n of team members required for Byzantine agreement is 3f+1, where f is the number of faults to be tolerated, and no more than one-third of the team members are faulty (whether benign or malicious). However, SHADOWS uses coding theory rather than voting to implement Byzantine agreement. Thus, Byzantine agreement among k out of n MASTERs on the same SHADOWS team is sufficient to tolerate f faults, where f = (n-k)/2 and n > k in the general case of f faulty and/or malicious team members, assuming that it is not known which f of the n MASTERs are faulty and/or malicious. This means that for the case when f = 1, n = k+2.
If, instead, it is allowed that up to c MASTERs have simply crashed or failed to respond, and it is known which ones these are, then SHADOWS may tolerate a combination of up to c known crashed or unresponsive MASTERs and up to f faulty or malicious (but unknown) MASTERs, where (c+2f) ≤ (n-k). The mechanisms for accomplishing this are further explained in section 7.1.
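The fault-tolerance arithmetic described above may be illustrated with a short sketch. It assumes the usual decoding bound for a linear MDS code, in which each unknown faulty member counts twice and each known crashed member counts once against the n-k redundant members, with a non-strict inequality. The function names are hypothetical and introduced only for this example.

```python
def max_unknown_faults(n: int, k: int) -> int:
    """Largest f with 2f <= n - k, for faulty members whose identity is unknown."""
    if not 0 < k < n:
        raise ValueError("require 0 < k < n")
    return (n - k) // 2


def tolerates(n: int, k: int, crashed: int, faulty: int) -> bool:
    """True if c known-crashed plus f unknown-faulty members satisfy c + 2f <= n - k."""
    return crashed >= 0 and faulty >= 0 and crashed + 2 * faulty <= n - k


def min_members_voting(f: int) -> int:
    """Classic voting-based minimum team size, n = 3f + 1."""
    return 3 * f + 1
```

Under these assumptions, a team with n = 5 and k = 3 tolerates either one unknown faulty MASTER or two known crashed MASTERs, but not one of each at the same time.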
Colocated vs. Remote MASTERs
In general, there must be at least two colocated MASTERs (both of which must have already qualified to lead teams), such that one of them can lead the team, and the other can serve as local backup (simultaneous failure of both is equivalent to failure of that team in the local geographic region).
There must also be at least three MASTERs that are remotely located (not in the same geographic region as the local MASTERs, and not in the same geographic region as each other), such that at least three additional geographic regions are represented, none of which shares any regional threats with the others. This can be considered as a special case of Byzantine agreement, except that agreement is among regions, and at least n regions are required for Byzantine agreement, where n = 3f+1, and where f is the number of faulty (or failed) regions to be tolerated. Once the basic requirements of geographic diversity are met, any number of additional MASTERs, whether colocated or remote, can be added to any team as needed.
The definitions of "geographic regions" and "regional threats" as used here are determined by policy decisions that are outside the scope of this document. There may also be other team membership requirements that are likewise determined by policy.
Once the minimum team membership requirements have been met, a SHADOWS team can form and begin "rounding itself out," by virtue of extending its membership as the SHADOWS infrastructure grows. In particular, potential MASTERs that cannot yet participate in a MASTER role (for whatever reason) may volunteer as SERVANTs, and thus become immediately usable by any and all existing SHADOWS teams.
At some point, potential MASTERs may qualify to become MASTERs, in which case they can be assigned to one or more SHADOWS teams in subordinate (non-team-leader) roles, and can also be assigned teams of their own as new teams are formed. Note that MASTERs in non-team-leader roles can become "acting" team leaders at any time, if their superiors are unable to perform their roles.
Whenever a team's leadership capacity becomes diminished, either in absolute terms (e.g., diminished capacity, fewer team members through attrition, failure, eviction, etc.), or in relative terms (e.g., team member overload, unacceptable risk profile, etc.), then additional team members are aggressively recruited as necessary (without "lowering the bar" for qualifications, however).

Note (on a failed site): In the event of a failed site, individual subsystems may still function sufficiently to be able to "call home" and contact remaining portions of the SHADOWS infrastructure. In such a scenario, the surviving resources will be assimilated back into the infrastructure as SERVANTs if they cannot qualify or re-qualify as MASTERs.
Note (on the 3f+1 minimum for Byzantine agreement): The SHADOWS architecture acknowledges this as a starting point, although there is reason to believe that 3f+1 may be overly conservative. However, because survivability and trust are key to SHADOWS, conservatism is quite acceptable. In any case, if 3f+1 is too conservative, then achieving 3f+1 means that a larger number of faults may be tolerated with no actual changes. On the other hand, SHADOWS uses a linear MDS code (e.g., a variant of Reed-Solomon) to achieve Byzantine agreement.

7.1.1.6 Teams are Stewards of Information to be Stored or Handled
When any team (here, the "SubmittingTeam") receives an artifact containing information to be stored or handled, say from another team, or an external source, it is analyzed at least sufficiently to classify the information boundaries if not already known (for example, it is helpful to know the granularity of the object, such as whether it is a file, database record, or email message, etc.).
Conceptually, the artifact's ContentDigest is computed at the coarsest granularity, and then looked up in a local "RecognizedContentIndex" to find out if the artifact or its content has been previously handled. The actual lookup occurs by first checking the SubmittingTeam's "local copy" of the RecognizedContentIndex, and if not found, sending a lookup request message containing the ContentDigest to the accountable ContentStewardTeam. If found either way, then the ContentAlias is now known, and, from a simplistic viewpoint, the storage request has essentially been "fulfilled," since the content has already been stored (of course, there's a little more to it, in terms of tracking accessing parties, etc., but that sort of detail is well known in the art and out of scope for this document).
If the ContentDigest is neither in the local RecognizedContentIndex (which may not be completely up-to-date) nor in the ContentStewardTeam's RecognizedContentIndex, then the information to be stored is "new" by definition. Note that the ContentStewardTeam may eventually be responsible for assigning a ContentAlias to the information to be stored, pairing it with the associated ContentDigest, and "publishing" it to the SHADOWS infrastructure. However, such assignment cannot occur until the artifact and its information content are received and vetted according to the ContentStewardTeam's rules, because the assignment of a ContentAlias is both automatic and permanent, and thus, by design, cannot be changed later.
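The two-stage lookup described above may be sketched as follows, by way of example only. Plain dictionaries stand in for the local copy of the index and for the steward team's index, the dictionary access stands in for the lookup request message, and refreshing the local copy after a remote hit is an assumption of this sketch rather than a stated behavior. The name lookup_alias is hypothetical.

```python
def lookup_alias(digest, local_index, steward_index):
    """Return (alias, source) for a recognized digest, or (None, "new") otherwise."""
    alias = local_index.get(digest)
    if alias is not None:
        return alias, "local"
    # Stands in for sending a lookup request to the accountable steward team.
    alias = steward_index.get(digest)
    if alias is not None:
        # Assumption: the possibly stale local copy is refreshed on a remote hit.
        local_index[digest] = alias
        return alias, "steward"
    # Unrecognized content is "new"; alias assignment must wait for vetting.
    return None, "new"
```

A result of (None, "new") corresponds to the case in which the information has not been previously handled and no ContentAlias exists yet.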
Once it has been determined that the information to be stored is actually new, then it is further analyzed to determine if there are any recognizable finer granularities (this can occur in parallel with the initial lookups, if there are sufficient processing resources, and simply aborted if the coarser-grained artifact is subsequently recognized as having been stored already). Because changing even a single bit of an artifact's information content results in a different ContentDigest, by design this means that the resulting artifact is a different artifact, from the SHADOWS viewpoint. However, given the high degree of overlap between two artifacts that differ in as little as one bit, this fact can be revealed (if not already known) by performing successively finer-grained analyses, and any discovered overlap in content can be used to great advantage by SHADOWS.
By way of explanation, consider an artifact such as a book, which contains unstructured information from the viewpoint of a DBMS (database management system), for example, but yet clearly has some sort of structure based on its inherent natural boundaries and granularities (e.g., entire book, chapters, pages, paragraphs, sentences, etc.). In this example, the entire book has a single identity. Each of the chapters also has an identity, as do each of the pages, each of the paragraphs, each of the sentences, and so on. At some point, the difference in identities between the content of two editions of a particular book, for example, may boil down to the specific areas where they differ in content, and this may occur anywhere along the granularity spectrum. This is likewise true for artifacts that are purported to be different - their actual differences can be discovered and revealed.
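The book example may be made concrete with a small sketch that computes identities at several granularities and measures how much two editions share at each level. SHA-256, the three granularity levels chosen, the separators used to join text, and the names digest, identities, and shared_fraction are assumptions made for illustration.

```python
import hashlib


def digest(text):
    """Identity of a piece of text (SHA-256 assumed)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def identities(book):
    """Identities at book, chapter, and paragraph granularity.

    `book` is a list of chapters; each chapter is a list of paragraph strings.
    """
    chapters = ["\n\n".join(paragraphs) for paragraphs in book]
    return {
        "book": {digest("\n\n\n".join(chapters))},
        "chapter": {digest(c) for c in chapters},
        "paragraph": {digest(p) for chapter in book for p in chapter},
    }


def shared_fraction(a, b, level):
    """Fraction of a's identities at `level` that also occur in b."""
    return len(a[level] & b[level]) / len(a[level])


# Two editions that differ in a single paragraph.
edition_1 = identities([["p1", "p2"], ["p3", "p4"]])
edition_2 = identities([["p1", "p2"], ["p3", "p4 revised"]])
```

Here the two editions share nothing at book granularity, half of their chapters, and three of four paragraphs, so successively finer analysis localizes the difference to one paragraph.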
The problem (and process) of analyzing content in order to identify it is well-suited to the SHADOWS infrastructure, and in fact was one of its architectural drivers. From the outset, such analysis lends itself well to a cooperative parallel processing configuration, and the more fine-grained the analysis, the more "embarrassingly parallel" the problem becomes. In the SHADOWS infrastructure, each problem to be solved is assigned to a team, and highly parallel problems naturally involve the use of teams operating in a highly parallel fashion.

Note (on the SubmittingTeam's "local copy" of the RecognizedContentIndex, section 7.1.1.6): Typically, the SubmittingTeam is actually part of a computing cluster of some sort, so the "local copy" of the RecognizedContentIndex is most likely distributed over the local cluster, meaning that even a local lookup entails sending a message to the appropriate local team responsible for that particular slice of the RecognizedContentIndex.

7.1.1.7 Teams are Stewards of Processes, including Memoized Results
Every SHADOWS process is an artifact, and thus has an identity, and thus is assigned to one or more teams, each of which has a particular role with respect to that process. In simplistic terms, SHADOWS teams cooperatively share the management responsibilities of each artifact, and process artifacts are no exception. One SHADOWS team is responsible for storing a particular process (i.e., its executable image is an artifact), another for verifying it prior to distribution or execution, another for executing it, another for monitoring its execution, etc. Thus, when "software rejuvenation" is called for, multiple SHADOWS teams are involved on a cooperative basis.
Another area of process-specific cooperation among teams is in the area of "memoization," which is essentially the capability of looking up known results of deterministic processes and/or functions rather than recomputing them from scratch. We've already noted that each artifact and each process (down to the bit-level) has its own identity, and that each existent combination of artifacts also has its own identity (within the limitation of acceptable granularity). Accordingly, whenever a deterministic process or function accepts a particular set of input values and produces a deterministic set of output values, we can treat the set of input values and the specific process as a new artifact, with an identity. We can also treat the set of output values as an artifact, with an identity. This done, "memoization" is a conceptually simple matter of establishing a "pairing" between the input/process identity and the output identity, such that any already-known output can be looked up and identified. Thus, given any input/process identity, it can be determined (through a lookup) whether the result has been previously computed, and if so, what its identity is. Conversely, it can be directly determined which input/process identities, if any, have generated a particular output identity.
The SHADOWS FACTUAL capability is conceptually "just" a memoization system, but one that is designed to operate at global scale and supercomputing speed, with the high levels of security and survivability commensurate with the SHADOWS infrastructure. Teams are used to perform the processing required to arrive at previously unknown results, and to reach consensus on "vetted"
results prior to memoization (which is particularly important for FACTUAL, because memoized results can be reused as authoritative results that sidestep process execution). As with any artifact, the various content and identities associated with memoized results need to be stored, which involves teams on a SHADOWS-wide basis, as does the lookup of memoized results. If it cannot be readily determined (on a local basis) whether a memoized result exists, the problem to be solved is queued for processing, but can normally be dequeued if a vetted, memoized result is obtained prior to the start of execution. A memoized result that is obtained after execution has already started can be used as a test oracle to verify the result, thereby serving as a built-in system integrity check. Memoization of results, and whether to use lookup of memoized results, is context-specific and configurable at the process level or process-family level. In general, lookups of memoized results may not be utilized when such lookups consume more resources than would be required to simply recompute the results, unless such lookups reduce a local processing load by shifting the lookup elsewhere. The lookups of memoized (and therefore already-known) results are also vetted, by virtue of the fact that lookups (like other operations) are handled by geographically distributed teams that are difficult to attack. Not only must a distributed team reach consensus on the identity of the memoized result, but other distributed teams are typically involved in moving a copy of the content of the identified result to where it is needed, and in all cases the recipient(s) can determine the degree to which consensus was reached in each step. The availability of memoized results is also very helpful in cases of Byzantine failure that would otherwise hamper the achievement of vetted results.
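The pairing of an input/process identity with an output identity, including the reverse lookup, may be sketched as follows. This is an illustrative, single-process sketch only: it omits the team consensus and vetting steps entirely, assumes SHA-256 over a canonical JSON encoding as the identity function, and uses the hypothetical names identity and MemoIndex.

```python
import hashlib
import json
from collections import defaultdict


def identity(obj):
    """Identity of a JSON-serializable value (SHA-256 of a canonical encoding)."""
    data = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


class MemoIndex:
    """Pairs input/process identities with output identities, in both directions."""

    def __init__(self):
        self.output_by_call = {}
        self.calls_by_output = defaultdict(set)
        self.content = {}

    def record(self, process_id, inputs, output):
        """Store a result and its pairing; returns (call identity, output identity)."""
        call_id = identity({"process": process_id, "inputs": inputs})
        out_id = identity(output)
        self.output_by_call[call_id] = out_id
        self.calls_by_output[out_id].add(call_id)
        self.content[out_id] = output
        return call_id, out_id

    def lookup(self, process_id, inputs):
        """Return (output identity, output) if previously computed, else None."""
        call_id = identity({"process": process_id, "inputs": inputs})
        out_id = self.output_by_call.get(call_id)
        return None if out_id is None else (out_id, self.content[out_id])

    def producers(self, output):
        """Reverse lookup: input/process identities that produced this output."""
        return set(self.calls_by_output.get(identity(output), set()))
```

A result obtained from lookup after execution has already started could be compared against the freshly computed output, in the manner of the test oracle described above.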

7.1.2 SELF- Software Rejuvenation & Process-Port Combinations
Software rejuvenation coordinates heartbeat rekeying and process-port changes with process version updates and restarts. Actual rejuvenation is managed by each MASTER's BOSS role, which comprises a virtual machine (VM) with special privileges and responsibilities as a "timely, trusted supervisor," one of which is starting a new VM/process pair and migrating its essential state to it through the hypervisor/VMM.
Rejuvenation can include a new version of executable from the same source, with no functional changes (using 1-way translation to deter reverse-engineering).
Globally Active Process-Port Combinations
Each node maintains a bitmap of globally active process-port combinations, including ports in transition (e.g., due to version update). Assuming one bit per process-port, this requires at most 64 Kbits or 8 KB. An encrypted, authenticated bitmap is distributed periodically and upon request via RECAP. Active process-port updates are also distributed periodically via RECAP, as are periodic authentication requests to verify a non-corrupted image at each node. Incorrect or missing responses trigger SELF reporting and likely escalation.
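The storage figure for the process-port bitmap can be checked with a minimal sketch. It assumes a 16-bit process-port index (65,536 combinations, one bit each) and omits the encryption, authentication, and RECAP distribution described in the text. The name PortBitmap is hypothetical.

```python
PORT_COUNT = 65536  # one bit per process-port: 65536 bits = 8192 bytes (8 KB)


class PortBitmap:
    """Bitmap of active process-port combinations, one bit per combination."""

    def __init__(self):
        self.bits = bytearray(PORT_COUNT // 8)

    def set_active(self, port, active=True):
        """Mark a process-port combination as active or inactive."""
        byte, bit = divmod(port, 8)
        if active:
            self.bits[byte] |= 1 << bit
        else:
            self.bits[byte] &= ~(1 << bit) & 0xFF

    def is_active(self, port):
        """True if the process-port combination is currently marked active."""
        byte, bit = divmod(port, 8)
        return bool(self.bits[byte] & (1 << bit))
```

In this sketch, a message arriving for a port whose is_active result is False would be the kind of event treated as a diagnostic clue and/or a SELF clue.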
Any message received on a globally non-active port constitutes behavior that is a diagnostic clue and/or a SELF clue, as is any message received on an active but not-ready port at a particular node. The latter could be legitimate within a short window corresponding to propagation delay, if the sender did not receive a not-ready update in time to prevent message transmission. In that case, the sender must immediately follow up with a retraction message within a specific time period if the difference in message timestamps (request time minus not-ready time) exceeds the allowable maximum (which is designed to accommodate propagation and update delay). The timely receipt of an authenticated retraction message (say, within a second, or some other policy-specified threshold) prevents escalation.
Site-Local Active Volunteer Nodes (SERVANTs)
Each node maintains a map (e.g., a bitmap) of site-local volunteer nodes -- nodes whose load is sufficiently light (both absolutely and relatively) that they can accommodate a higher-than-average load (which means that "ready" virtual SERVANT processes and/or SERVANT VMs, possibly running on some combination of MASTERs and SLAVEs, can be granted execution resources).
Given, for example, a maximum of 8K nodes at a particular site, this bitmap of site-local volunteer nodes can be represented in only 1 KB. Each node on a multi-node "street" updates others on its street via street-local multicast, and they take turns updating their neighborhood. Each node in a neighborhood takes its turn updating its community, and each node in a community takes its turn updating the other communities in the site.
When aggregated updates are applied, overwriting of newer data is avoided (the part to be avoided is simply skipped over). Local data (one bit) is always most up-to-date, then street-local, neighborhood-local, community-local, and site-local.
The site-local volunteer bitmap can be ANDed with the site-local process-port bitmap for a particular process-port combo (which is updated in the same manner) in order to find volunteer nodes for the process-port. Volunteers are typically sought at a higher rate than draftees (which can be any node with a ready process-port).
Typically, a random or pseudo-random number is generated to find a starting bitmap offset, and the next available bit is selected (or next N bits if more are needed). A full word size of bits can be read at once.
Compressed bitmaps are also possible (see bit-sliced index manipulation).
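The volunteer-selection steps described above (ANDing the two bitmaps, choosing a random starting offset, and taking the next available bits) may be sketched as follows. Python integers stand in for the bitmaps, the scan wraps around at the end of the bitmap, and the bit-by-bit loop is chosen for clarity rather than the word-at-a-time reads mentioned in the text. These choices, and the name find_volunteers, are assumptions of the sketch.

```python
import random


def find_volunteers(volunteers, ready, size, wanted=1, rng=random):
    """Pick up to `wanted` node indices that are both volunteers and ready.

    `volunteers` and `ready` are integer bitmaps; bit i set means node i qualifies.
    The scan starts at a random offset and wraps around once.
    """
    candidates = volunteers & ready      # nodes that volunteer AND have the port ready
    start = rng.randrange(size)          # random starting bitmap offset
    picked = []
    for step in range(size):
        node = (start + step) % size
        if (candidates >> node) & 1:
            picked.append(node)
            if len(picked) == wanted:
                break
    return picked
```

With a maximum of 8K nodes, `size` would be 8192 and each bitmap would occupy 1 KB, consistent with the figure given above.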
No Default Route
- Eliminate default route.
- Change pseudo default route periodically with key changes, etc.
- CSP must specify pseudo default route.

7.1.3 BOSS - Asynchronous Byzantine Agreement
BOSS is a distributed, timely, trusted computing base (TCB) and object/subject security system that incorporates Byzantine agreement logic (from the classic "Byzantine generals" problem) in its decision-making process, and collectively makes security decisions in a "fail-silent" manner that provides survivability even in the face of multiple failures and/or corrupted nodes. BOSS is implemented and instantiated only in conjunction with a MASTER, and works in conjunction with CHARM to control who gets access to what, and when, while ensuring that unauthorized information is not exposed (not even to other internal systems).
In the SHADOWS infrastructure, BOSS implements the TCB, and thus owns the security of the system. The rest of the components rely on BOSS to make correct security decisions - and it must make 100% of the security decisions.
Any BOSS node that fails or becomes corrupted can be restarted or replaced, and in any case cannot be trusted until its trustworthiness can be re-established from scratch to the satisfaction of the surviving trusted nodes, including, at a minimum, other MASTERs with which it previously participated as a team member.
Keeping in mind that every MASTER is associated with a BOSS component (and that BOSS is a distributed function), refer back to "A SHADOWS Team Comprises Members with No Common Regional Threats" (section 7.1.1.5) for more information on Byzantine agreement.
BOSS is designed to enable the SHADOWS infrastructure to support both classified and unclassified information processing and storage (e.g., to meet or exceed Common Criteria (CC) Protection Profiles (PP) such as the U.S. DoD Remote Access Protection Profile for High Assurance Environments Version 1.0, June 2000, nominally at EAL5, or potentially at EAL6 if implemented by a single, qualified development organization).
The DoD defines the TCB as the totality of protection mechanisms within a computer system - including hardware, firmware, and software - the combination of which is responsible for enforcing a security policy. A
TCB consists of one or more components that together enforce a unified security policy over a product or system. The ability of a trusted computing base to correctly enforce a security policy depends solely on the mechanisms within the TCB and on the correct input by system administrative personnel of parameters (e.g., a user's clearance) related to the security policy. (DoD 5200.28-STD).
TCSEC1983 defines the TCB as "the totality of protection mechanisms within a computer system, including hardware, firmware, and software, the combination of which is responsible for enforcing a security policy.
Note: The ability of a TCB to enforce correctly a unified security policy depends on the correctness of the mechanisms within the TCB, the protection of those mechanisms to ensure their correctness, and the correct input of parameters related to the security policy."

7.1.4 MASTER- Relationship of MASTER to BOSS
See also: 7.1.3 and 7.2
In the beginning, "Candidate MASTERs" in a system seek to establish trust relationships with existing MASTERs, and failing that, with other "Candidate MASTERs," and if successful, self-organize to become full-fledged MASTERs.
Any "Candidate MASTER" that is unable to establish itself as a MASTER (i.e., a full peer with other MASTERs) may retain its candidacy but is unable to fulfill any of the responsibilities of a MASTER. Rather than waste the resources of such a candidate, it may "volunteer" (or attempt to volunteer) to operate under the auspices of a team of MASTERs, in the role of SERVANT (Service Executor, Repository, & Volunteer Agent -- Non-Trusted).

7.1.4.1 Prerequisites for Being a MASTER
Each MASTER is distinguished from other MASTERs and from non-MASTERs by a set of inherent traits and capabilities possessed only by MASTERs and "Candidate MASTERs" (which are singleton, would-be MASTERs that have not been accepted and deemed trustworthy by a sufficient quorum of other MASTERs and/or Candidate MASTERs, and thus have not yet attained "MASTER-hood").
Conceptually, one could think of MASTERs and "Candidate MASTERs" as being genetically and behaviorally related in ways that are mutually detectable.
Genetics. At the hardware level, each MASTER has a one-to-one correspondence with, and physical attachment to: 1) a BOSS device or subsystem that has a universally unique cryptographic identity, and 2) a SELF device or subsystem that can cryptographically establish whether the BOSS device or subsystem and any other arbitrary entity claiming to be part of the same system are indeed parts of the same "self." In a preferred embodiment, "self" in this context refers to a bona fide SHADOWS infrastructure. This test is somewhat like a DNA-based identification test where parts of the same "self" share a common DNA sequence, so that in concept, your nose and your right hand could both "claim" to be part of the same self (e.g., "you") - and the claim could be definitively verified.
Behaviors. A MASTER or "Candidate MASTER" is also behaviorally related to other MASTERs and "Candidate MASTERs" that are part of the same SHADOWS infrastructure, and these behaviors are intended to be collectively inimitable. By way of analogy, there's a helpful saying, "If it walks like a duck and talks like a duck, it's a duck." However, in the SHADOWS infrastructure an ability to imitate behavior intended to be inimitable is merely inconclusive - only the converse is true:
"If it does not walk exactly like a duck, OR it does not talk exactly like a duck, then it is not a duck." As a consequence, any non-self behavior by a MASTER or "Candidate MASTER" is taken as evidence of counterfeit. There is no concept of "once trusted, always trusted" - a trusted MASTER can become untrusted and therefore shunned at the first sign of misbehavior. In a preferred embodiment, a shunned MASTER can be rejuvenated, put "on probation" as a closely watched "Candidate MASTER," and rehabilitated to the extent possible.
During rehabilitation, it may fulfill roles typically assigned to a SERVANT (which is inherently non-trusted), or possibly, the roles of a "Probationary MASTER" (whereby it "thinks" it's a MASTER and is apparently allowed a voice in decisions, without being able to actually affect their outcomes, and its decisions are closely monitored for correctness).
In a preferred embodiment, a shunned MASTER (now a closely watched "Probationary MASTER") that fails to rehabilitate fully may either be shut down (turned off) or continue to be shunned, but in the latter case may not recognize that it has been shunned (a context is created, as part of putting it on probation, that "keeps up appearances" in such a way as to marginalize the shunned MASTER while consuming minimal resources).

Note (on evidence of counterfeit): Non-self behavior by a MASTER is distinguished from the misbehavior of, for example, a communication channel used for MASTER-to-MASTER communication. The SHADOWS infrastructure attempts to determine and isolate the actual source(s) of misbehavior - not doing so would render it much more vulnerable to denial-of-service (DoS) attacks.
Note (on the probation context): Somewhat like a "honeypot" or "honeynet" - configurations used by researchers and system administrators to monitor attackers.

7.1.4.2 Resources
All system resources are partitioned in such a way as to allocate the management of them among all the MASTERs in the system. Every MASTER leads at least one team, and also participates on multiple teams led by other MASTERs.
Each MASTER is the primary steward of several sets of resources, and for each such set, leads a team of MASTERs that is collectively responsible for that set of resources, despite the simultaneous failure or corruption of any number of MASTERs (up to a policy-specified threshold).
Failed and/or corrupted MASTERs (including the team leader) are adaptively tolerated until detected, at which point they are replaced.
A system's resources essentially refer to its capacity as a network of "working storage" comprising the areas of communications, processing, storage, and energy. Each of these resource areas can be further refined in terms of understanding their capacities as resources, constraints on their use (or non-use), and other resource-specific aspects. For example, the communications resource area comprises connectivity and bandwidth, as well as quantitative quality levels for each (connectivity comprises availability and reliability, for example, and bandwidth comprises rate, latency, and jitter, among others). Similarly, the processing resource area comprises the ability and readiness to accomplish particular tasks (with accompanying arrival rates, service rates, etc., as well as quantitative quality levels). The storage resource area comprises the ability and capacity to store information to transient and/or persistent memory and subsequently retrieve it (further comprising various addressing means and rates, with accompanying quantitative quality levels). The energy resource area comprises the various energy sources and sinks (for example, having sufficient energy to power a combination of system components during a particular time window, and to absorb, store, or reject any waste energy produced during that same window), with accompanying quantitative quality levels.
Each MASTER maintains a viewpoint of the resources claimed to be available in the system, both locally and elsewhere, including its own, in a radial fashion. In a preferred embodiment, each resource claim is associated with a reputation that can be used to weight that resource claim.
Relative proximity to the center (as represented by distance from a set of local MASTERs) determines relative update detail and frequency.
For example, local resources (those comprising the center) are the most detailed and frequently updated, whereas nearby (but non-local) resources are less detailed and less frequently updated, and remote resources are the least detailed and least frequently updated.
Locally, each MASTER summarizes the resources for which it is responsible, normalizes the summary to a format that is standardized among the local MASTERs, and shares it with its immediate peers (i.e., the other local MASTERs) on a mutually agreeable schedule. In a preferred embodiment, the schedule of local updates is both event-driven and periodic, but the period is actually time-varying on a prearranged basis, as agreed among the local MASTERs (failure to meet the time-varying requirements provides a hint to SELF that may trigger an "auto-immune" response).
In a preferred embodiment, each MASTER also uses the local resource summaries provided to it by its immediate peers (the same set of peers referred to in the previous paragraph) and creates a further summary comprising their collective local resources, then normalizes the collective summary to a format that is standardized among those peers, and shares the collective summary with non-local-but-nearby MASTERs on a mutually agreeable schedule. (Conceptually, in a set of concentric rings centered on the local MASTERs, these non-local-but-nearby MASTERs would correspond to the nearest larger ring.) In a preferred embodiment, the schedule of next-ring updates is both event-driven and periodic, but the period is actually time-varying on a prearranged basis, as agreed among the local MASTERs (failure to meet the time-varying requirements provides a hint to SELF that may trigger an "auto-immune" response).

The notion of radial proximity can be substituted with a hierarchical notion based on fixed granularity - e.g., neighborhood/town/state/country.

7.1.4.3 Reputation
In a preferred embodiment, Byzantine agreement via BOSS is used locally by the BELIEF (Bayesian Emergent Learning & Intelligent Evaluation of Facts) subsystem to create a 4-bit reputation estimate for each process at each node, including its own, based on its belief in reputation estimates proffered by others, which are weighted by their own reputations and normalized to a 4-bit result. The local BOSS subsystem maintains its own view of the 4-bit reputation estimate for each process at each node, including its own, as a rolling average that can be queried at a rate independent of its update rate.
A reputation vector of length b bits contains the last (b/r) r-bit reputation estimates. Thus, if b=128 and r=8, the vector is (b/8) = 16 bytes long and contains (b/r) = 16 reputation estimates. If r=4, then the same vector would contain the last (b/r) = 32 reputation estimates. Alternatively, with r=4, 16 reputation estimates could be stored in only 8 bytes.
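The packing described above can be illustrated with a minimal Python sketch. It assumes that the vector is held as a b-bit integer with the newest estimate in the low-order bits and that the oldest estimate is discarded when the vector is full; the function names and the bit ordering are illustrative assumptions, not part of the described embodiment.

```python
def push_estimate(vector: int, estimate: int, b: int = 128, r: int = 4) -> int:
    """Shift a new r-bit estimate into a b-bit vector, dropping the oldest one.

    Assumption: newest estimate occupies the low-order r bits.
    """
    if not 0 <= estimate < (1 << r):
        raise ValueError("estimate does not fit in r bits")
    return ((vector << r) | estimate) & ((1 << b) - 1)


def unpack_estimates(vector: int, b: int = 128, r: int = 4) -> list:
    """Return the b // r stored estimates, newest first."""
    mask = (1 << r) - 1
    return [(vector >> (i * r)) & mask for i in range(b // r)]


# Record three successive 4-bit reputation estimates.
vector = 0
for estimate in (3, 15, 7):
    vector = push_estimate(vector, estimate)
```

With b=128 and r=8, `unpack_estimates` returns 16 entries, matching the 16-byte example in the text.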

7.1.4.4 Weighting Claims by Reputation
Given a c-bit ClaimedValue and an r-bit ReputationForClaimedValue proportional to the confidence in the claimant with respect to such claims (or perhaps overall), where 0 is worst-case, and (2^c)-1 and (2^r)-1 are the respective best-case values for each variable, the w-bit ClaimWeightedByReputation value can be calculated as:
ClaimWeightedByReputation = (ClaimedValue * ReputationForClaimedValue) / (2^(c+r-w))
where all variables are integer, w < (c+r), and (c+r-w) is usually a constant.
The division by a power of 2 can be accomplished with a simple right-shift by its exponent, yielding
ClaimWeightedByReputation = (ClaimedValue * ReputationForClaimedValue) >> (c+r-w)
For example, given a 4-bit ClaimedValue (c=4) and a 4-bit ReputationForClaimedValue (r=4) proportional to the confidence in the claimant with respect to such claims (or perhaps overall), where 0 is worst-case and 15 is best-case for each variable, their 4-bit weighted product, ClaimWeightedByReputation, can be calculated as:
ClaimWeightedByReputation = (ClaimedValue * ReputationForClaimedValue) >> (4 + 4 - 4)
Although the interim product is (c+r) bits wide (4+4=8 in this case), shifting it right by (c+r-w) bits normalizes it back to the desired w-bit result, where, in this case, w=4.
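The shift-based weighting above can be sketched in Python as follows. The function name and the range checks are illustrative assumptions; only the multiply-and-shift arithmetic comes from the text.

```python
def claim_weighted_by_reputation(claimed_value: int, reputation: int,
                                 c: int = 4, r: int = 4, w: int = 4) -> int:
    """Weight a c-bit claim by an r-bit reputation and normalize to w bits."""
    if not 0 <= claimed_value < (1 << c):
        raise ValueError("claimed_value does not fit in c bits")
    if not 0 <= reputation < (1 << r):
        raise ValueError("reputation does not fit in r bits")
    if not 0 < w < c + r:
        raise ValueError("w must satisfy 0 < w < c + r")
    # The interim product is (c + r) bits wide; the right shift divides it
    # by 2^(c+r-w), truncating toward zero.
    return (claimed_value * reputation) >> (c + r - w)


# A claim of 15 from a claimant with a mid-range reputation of 8.
weighted = claim_weighted_by_reputation(15, 8)
```

Because the shift truncates, best-case inputs of 15 and 15 yield 14 rather than 15 in the 4-bit example (225 >> 4 = 14).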

See also:
CORE -- MOVING AVERAGE CALCULATIONS -- UpdateMovingAverages(iValue) in section 9.7.
7.1.4.5 CENTRAL CONCEPT
The idea is to complement - but avoid the necessity of - conventional synchronous (lockstep) execution of identical instructions on identical CPUs at exactly the same time, in a high-availability (HA), duplicate modular redundancy (DMR) or triple modular redundancy (TMR) configuration with voting logic to determine the correct outcome.
The lockstep approach is useful for quickly detecting and handling transient errors and hardware problems, including software errors associated with their proper handling (by masking such errors when possible). This approach can greatly improve the availability of a particular machine in a friendly environment, but it does nothing for the availability in a hostile one. Thus, if the HA server or site is compromised or taken down, the system immediately becomes either untrustworthy or unavailable.
Note that being untrustworthy - but still available - is the worst possible outcome, unless the system is specifically designed to assume the presence of untrustworthy nodes.
In contrast, the SHADOWS "Asynchronous Byzantine Agreement" approach removes the need for lockstep execution (although it can still be used, but with less benefit, since the ROI would be greatly diminished).
The need for maximal asynchronous operation cannot be overestimated in a high-survivability system, since all manner of network traffic problems (and corresponding mitigations) can occur due to anticipated or actual attacks, or even just normal congestion.
Instead of fine-grained, instruction-level voting, SHADOWS assumes that there are no completely trustworthy individual nodes, but that consensus among a policy-determined quorum is sufficient to warrant trust. In particular, consensus is reached on the final result rather than on each instruction involved in its calculation.

This approach has the significant advantage of being able to incorporate arbitrary levels of diversity in multiple dimensions, such as geographic locations, political environments, security mechanisms, algorithms, software versions (e.g., differentiating among authors, skill levels, code versions, programming languages, build environments, certification levels, etc.), CPUs, memory systems, EMP/radiation hardening, physical access controls, etc.
By comparing only the results (or more precisely, a representation of the results) through the use of a Byzantine agreement algorithm (from the classic "Byzantine generals" problem), the overhead associated with voting logic can be minimized while still establishing a nearly arbitrary level of trust. Even nodes with intermittent hardware and/or error-prone or corrupted software can contribute useful results.
In a SHADOWS or SCRAM network, voters may actually be data consumers not involved in the calculation (that is, they're not the "Byzantine generals"). In this scenario, the voters do not have the actual results, and must first receive results from the producers (i.e., from the Byzantine generals). First, the producers can each calculate the appropriate result (including compression, encryption, and a CRC or message digest as appropriate). However, rather than each producer transmitting the entire result to each consumer, the producer then computes an FEC-encoding of the result message (with a suitable rate code) and extracts its "share" of the FEC-encoded message, which it also encrypts, and to which it may add a MAC (message authentication code) and/or digital signature. The "share" is then transmitted to any consumers that need it, along with identifying information as appropriate for the communications protocols in use. The code rate (n,k) used determines the number of uncorrupted "shares" (i.e., k of n) that must be received in order to decode a result and determine its validity.
Data consumers may be addressed individually via unicast, or collectively via multicast, but in both cases the ability of a group of authorized (but not necessarily trusted) producers to send FEC-encoded slices (or "slivers") to the consumers greatly increases the likelihood that each consumer receives the correct desired data at the maximum rate it can be received (such as when limited by the consumer's aggregated inbound bandwidth, which may be greater than the individual outbound rates of the individual senders).

7.2 BOSS (Byzantine Object & Subject Security)
The BOSS subsystem is built on the principles of a Trusted Computing Base (TCB) as are known by those skilled in the art. In addition, however, BOSS is further constructed with accurate time-keeping mechanisms and hardware support for the logic and processing required to implement a Timely TCB (TTCB), whose principles are also known in the art, but to a lesser degree, and are well described elsewhere. The BOSS subsystem may also be implemented (or emulated) in software, given an environment that is sufficient to meet the particular set of needs.
The novel BOSS hardware is tamper-proof or tamper-resistant and enables synchronization and reconciliation of multiple time-keeping sources that are authoritative to varying degrees (e.g., local atomic clocks, local crystal-controlled oscillators, terrestrial radio or satellite-based signals such as WWV, GPS, etc.).
The BOSS hardware also securely implements various cryptographic processes and provides secure encrypted storage of associated variables, keys, and so forth. The BOSS hardware also securely implements the error and/or erasure coding mechanisms that enable the use and application of forward error correction (FEC) as described below.

7.2.1 Minimum Redundancy for Byzantine Agreement
It is accepted in the art that the minimum number n of team members required for Byzantine agreement is 3f+1, where f is the number of faults to be tolerated, and no more than one-third of the team members are faulty (whether benign or malicious). However, SHADOWS uses coding theory rather than voting to implement Byzantine agreement. Thus, Byzantine agreement among k out of n MASTERs on the same SHADOWS team is sufficient to tolerate f faults, where f = (n-k)/2 and n>k in the general case of f faulty and/or malicious team members, assuming that it is not known which f of the n MASTERs are faulty and/or malicious. This means that for the case when f=1, n=k+2.

The SHADOWS architecture acknowledges the 3f+1 bound as a starting point, although there is reason to believe that 3f+1 may be overly conservative. However, because survivability and trust are key to SHADOWS, conservatism is quite acceptable. In any case, if 3f+1 is too conservative, then achieving 3f+1 means that a larger number of faults may be tolerated with no actual changes. On the other hand, SHADOWS uses a linear MDS code (e.g., a variant of Reed-Solomon) to achieve Byzantine agreement.

If, instead, it is allowed that up to c MASTERs have simply crashed or failed to respond, and it is known which ones these are, then SHADOWS can tolerate a combination of up to c known crashed or unresponsive MASTERs and up to f faulty or malicious (but unknown) MASTERs, where (c+2f) <= (n-k).
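The fault budget can be made concrete with a small Python sketch. It assumes the usual errors-and-erasures decoding bound for an MDS code, c + 2f <= n - k, which reduces to f = (n-k)/2 when no MASTERs are known to have crashed; the function name is hypothetical.

```python
def max_unknown_faults(n: int, k: int, known_crashed: int = 0) -> int:
    """Largest f such that known_crashed + 2*f <= n - k.

    Assumption: each known crash costs one redundant slice (an erasure),
    and each unknown faulty or malicious member costs two (an error).
    """
    if not 0 < k <= n:
        raise ValueError("require 0 < k <= n")
    spare = n - k - known_crashed
    if spare < 0:
        raise ValueError("too many crashed members to reconstruct a result")
    return spare // 2


# A team of 7 MASTERs with k = 3, of which 2 are known to have crashed.
tolerable = max_unknown_faults(7, 3, known_crashed=2)
```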
7.2.2 Byzantine Agreement Among Peers
In a preferred embodiment, a multiplicity of peers (say, n of them) representing only a portion of the peers that are competent, ready, and willing to perform, is responsible for a particular computation, and consensus is required among at least k of them (where k <= n). Each of the n peers uses the same information basis to independently perform the computation (whose result should be identical to those produced by the other n-1 participating peers). The computational result is then compressed, encrypted, sliced, and FEC-encoded with a systematic (n,k) code, such that any k of the slices (where k <= n) is sufficient to correctly retrieve the consensus result.
Each of the n peers shares only one slice, which means that the threshold value k (which may vary with context) determines how many correct slices - and thus, how many correct peers - are required to reconstruct the consensus result. This technique (which, in a preferred embodiment, is also used in other contexts) contributes to Byzantine fault-tolerance, since up to (n-k) faulty contributors can be ignored (however, the SELF and BOSS subsystems take note of such failures).
In the case of each of the peers needing to know the consensus result, each peer can simply share a single slice with the others, and the specific slice to be shared is tied to the relative number of each peer within the set of n collaborating peers (e.g., peer 1 shares slice 1, peer 2 shares slice 2, and so on, up to peer n, which shares slice n). Each peer digitally signs its slice so that recipients can verify whose it is (i.e., that it was actually provided by the peer with which it is identified). Any peer that shares the "wrong" slice, or a "corrupted" slice, or fails to share a slice, becomes included in the set of up to (n-k) faulty contributors whose slice can be ignored during this round of computation (but the responsible peer is noted, of course). Once a peer has received at least (k-1) of the possible n slices from the other peers (or k slices if the peer doesn't have its own slice), then the consensus result can be independently reconstructed locally without further communication.
The use of this method, while not avoiding the intended redundancy of computation, does eliminate unnecessary communications overhead. Instead of each peer sharing a copy of the computational result, or alternatively a copy of a digest of the computational result, for a total of n copies - each peer shares only a fraction (1/n) of a copy that has been enlarged slightly (to n/k of its original size), so that in the aggregate only 1 copy is shared at most, and that copy is n/k of its original size (where k <= n). Each slice is appropriately encrypted and digitally signed by each sending peer prior to distributing it to the other peers, in order to assure accountability. Consensus can be reached despite the Byzantine failure of up to (n-k) peers.
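The slicing step can be illustrated with a minimal Python sketch of a systematic (n,k) erasure code. It is a Reed-Solomon-style construction over the prime field GF(257), chosen here only for brevity and not claimed to be the code used by SHADOWS. It recovers from missing slices only; compression, encryption, digital signatures, and the location of corrupted slices are omitted. All names are hypothetical.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def _lagrange_eval(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through points (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode_slices(data: bytes, n: int, k: int) -> dict:
    """Return {slice_number: symbols} for slices 1..n.

    Systematic: slices 1..k carry the data symbols unchanged; slices
    k+1..n carry parity symbols (integers in 0..256).
    """
    if not 1 <= k <= n < P:
        raise ValueError("require 1 <= k <= n < 257")
    padded = data + bytes(-len(data) % k)  # zero-pad to a multiple of k
    slices = {i: [] for i in range(1, n + 1)}
    for off in range(0, len(padded), k):
        block = padded[off:off + k]
        points = [(x + 1, block[x]) for x in range(k)]
        for i in range(1, n + 1):
            slices[i].append(block[i - 1] if i <= k else _lagrange_eval(points, i))
    return slices


def decode_slices(received: dict, k: int, length: int) -> bytes:
    """Rebuild the original data from any k uncorrupted slices."""
    chosen = sorted(received.items())[:k]
    if len(chosen) < k:
        raise ValueError("fewer than k slices received")
    out = bytearray()
    for b in range(len(chosen[0][1])):
        points = [(i, symbols[b]) for i, symbols in chosen]
        out.extend(_lagrange_eval(points, x) for x in range(1, k + 1))
    return bytes(out[:length])


result = b"consensus result"
slices = encode_slices(result, n=6, k=4)
# Peers 1 and 4 fail to share a slice; any 4 of the 6 still suffice.
rebuilt = decode_slices({i: slices[i] for i in (2, 3, 5, 6)}, k=4, length=len(result))
```

Each slice holds 1/k of the symbols, so the n slices together amount to n/k of the original size, as stated above.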
7.2.3 Byzantine Agreement Among Peers, as Viewed by Third Parties
The use of this method is a tremendous benefit where one or more authorized third-party entities need a consensus result from, say, n peers that are collectively providing a service, and the third parties not only need to know that consensus is reached, but also what the actual result is.
The particulars of the method are the same as those stated above, except that each peer also sends one or more suitably encoded (encrypted, etc.) slices of the consensus result to the authorized third parties.
In cases involving non-local communications, the communications mechanisms and current operational profile are fundamentally tied to the level of redundancy required, as are the encryption mechanisms, so the same exact slice used for consensus is not sent to the third party, but rather, each peer creates a new set of n' slices from its own unencrypted slice of the consensus result, and these are FEC-encoded with a systematic (n',k') code, such that any k' of the n' slices (where k' <= n') is sufficient for third-party entities to correctly reconstruct a single slice of the k slices needed to reconstruct the original consensus result. In this scenario, each sending peer's values for n' and k' are independent of those used by the other peers, and may be heavily influenced by the properties of its communications channels, since extra redundancy may be appropriate.

Each peer represents a single potential Byzantine fault or failure, and thus gets only one vote in the original consensus result.

For example, based on the current levels of network congestion between itself and the destination third-parties, one peer may independently decide to send 20 slices to a third party, such that any 15 slices is sufficient to reconstruct the sender's original slice of the consensus result.
If there are 4 channels to be used for the transmission, for example, the sender may opt to split the slices up among the available channels such that each channel handles a few of the slices, according to its individual data rate, congestion, reliability, etc. The same principles apply, however, and consequently, the authorized third-party entities need only to receive any legitimate k' of n' slices from a given sender to reconstruct that sender's single slice of the consensus result. Further, the authorized third-party entities need only to reconstruct legitimate slices from any k of n senders to reconstruct the original consensus result.
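The two-level reconstruction condition above can be expressed as a short Python sketch. It only counts slices and assumes every counted slice has already been verified as legitimate; the function names and the dictionary layout are illustrative assumptions.

```python
def recoverable_senders(received: dict, k_prime: dict) -> set:
    """Senders whose single slice can be rebuilt (at least k' of n' sub-slices arrived)."""
    return {sender for sender, count in received.items() if count >= k_prime[sender]}


def consensus_recoverable(received: dict, k_prime: dict, k: int) -> bool:
    """True if slices from at least k senders can be rebuilt."""
    return len(recoverable_senders(received, k_prime)) >= k


# Four senders each send 20 sub-slices, of which any 15 suffice (k' = 15).
k_prime = {"peer1": 15, "peer2": 15, "peer3": 15, "peer4": 15}
received = {"peer1": 17, "peer2": 15, "peer3": 9, "peer4": 20}
ok = consensus_recoverable(received, k_prime, k=3)
```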

7.3 MASTER (Multiprocessor Adaptive Scheduler & Task Executor/Redirector)
7.3.1 Load-Balancing SHADOWS Native Processes
In general, SHADOWS native processes do not push data around as loads are shifted and requests are made, etc. Instead, IDs are pushed around, and if a process actually needs the associated data, it can request it (on a "pull" basis), or, if there are no other operands, just forward the request to the team that owns the data (resources permitting). The act of pushing an ID, however, has the effect of putting the team owning the associated data on notice that it may be needed soon, essentially identifying the ID as a speculative prefetch opportunity.
In its simplest form, a SHADOWS native process has an input queue and an output queue, as depicted above, on the left. Ignoring security issues, the input queue accepts tuples of the form { TxID, Operand ID List }; the process performs its work, which is to generate one or more Result IDs, and then enqueues them for distribution. The transaction ID (TxID) ties the Operand IDs (received as input) to the Result IDs associated with the processing results.
Under the covers of a SHADOWS native process, however, there are actually a number of latency-hiding, asynchronous parallel processes whose purpose is to keep a "simple process" busy doing actual work for as long as there are queued up requests. This is depicted in Fig. 7.3.1, on the right.
In a preferred embodiment, as depicted in Fig. 7.3.1, the input queue [1] accepts tuples of the form { TxID, Operand ID List }. An interior process immediately fires off requests [2] for the actual data associated with the Operand IDs, requesting the teams that have the specific data to send it along to a specific destination team (which may be the current team or some other one). A message may also be sent to the specified destination team to put it on notice that data for the particular TxID may soon be arriving (unexpected data may trigger defensive behavior).
When the data is retrieved by the team that owns it, it may be sent to the specified destination team [3]; thus, for a given TxID, the outbound message [2] and the inbound message [3] are unlikely to occur on the same machine, unless the sender of [2] specifically wishes to receive and process the data [3] within its own team for some reason. When the specified destination team collects the operands [3] and stores them in RAM [4], it may then enqueue process descriptors [5] (including pointers to the operand data in RAM) into the input queue of the embedded "simple process." The simple process may later dequeue the process descriptor [6] and associated data, perform its process and enqueue the "raw" results [7] to its output queue. A postprocessor may dequeue the raw results [8] in order to create one or more digests of the results, thereby generating one or more Result IDs. The Result IDs and raw data may be "pushed" to the appropriate teams [9] (i.e., to the owners/stewards, based on the Result IDs), and then a tuple { TxID, Operand ID List, Result ID List } (or equivalent) may be enqueued to the output queue [10].
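The flow of IDs through a native process can be sketched in Python as follows. This is a single-threaded simplification: the asynchronous, latency-hiding stages are collapsed into sequential calls, operand fetching is a caller-supplied function, and SHA-256 stands in for the unspecified result digest. All names are hypothetical.

```python
import hashlib
from collections import deque


def run_native_process(input_queue, fetch_operand, simple_process):
    """Drain (TxID, operand ID list) tuples and emit (TxID, operand IDs, result IDs)."""
    output_queue = deque()
    while input_queue:
        tx_id, operand_ids = input_queue.popleft()               # step [1]
        operands = [fetch_operand(oid) for oid in operand_ids]   # steps [2]-[4]
        raw_results = simple_process(operands)                   # steps [5]-[7]
        # Step [8]: a digest of each raw result serves as its Result ID
        # (assumption: SHA-256; the text does not name a digest).
        result_ids = [hashlib.sha256(raw).hexdigest() for raw in raw_results]
        # Step [9], pushing results to their owning teams, is omitted here.
        output_queue.append((tx_id, operand_ids, result_ids))    # step [10]
    return output_queue


store = {"a": b"foo", "b": b"bar"}  # stand-in for data held by owning teams
pending = deque([("tx1", ["a", "b"])])
done = run_native_process(pending, store.__getitem__, lambda ops: [b"".join(ops)])
```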

7.3.2 Forces Influencing SHADOWS Adaptive Load-Balancing
Each MASTER may have its own viewpoint of the entire system based on its own local statistics and a global (i.e., non-local) statistical summary of the rest of the system, so that it may "think globally, but act locally."
Statistics may be summarized into simple percentiles (a preferred embodiment may use quartiles, such that any statistic may be summarized in just four states - requiring just two bits -for the purpose of decision-making). Each MASTER may communicate its local statistics to its immediate neighbors, both periodically and whenever a state-change occurs. In a preferred embodiment, the neighboring MASTERs may individually calculate the statistics for their "neighborhood" and communicate them upward to a higher aggregation level. In an alternate embodiment, the neighboring MASTERs may "take turns" rolling up the statistics for their "neighborhood" and communicating them upward to a higher aggregation level.
A systems thinking diagram as depicted in Fig. 7.3.2-1 may help teach how each MASTER's work-scheduling decision-making may be influenced by the "forces" associated with the current values of the system variables. In the diagram, the arrows marked with "S" represent forces or trends that may cause one system variable to affect another in the same direction (i.e., the pointed-to variable may be influenced to go up or down as the other one does). Likewise, the arrows marked with "O" represent forces or trends that may cause one system variable to affect another in the opposite direction (i.e., the pointed-to variable may tend to go down if the other one goes up, or up if the other one goes down).
In understanding the operational description that follows, it may be useful to know that cooperating MASTERs in the SHADOWS infrastructure may volunteer to do work, thus making themselves eligible to be delegated to by other MASTERs. Any MASTER may delegate work to any volunteer without requiring further permission. Volunteers who receive temporary overloads may re-delegate the work (if authorized), or push it back, or ignore it (which is effectively equivalent to "crashing," which may be noticed by the delegator).
MASTERs may also delegate work to SLAVEs and/or SERVANTs, but only those for which they are responsible.
Given any resource capacity of interest (e.g., processing capacity, memory capacity, storage capacity, energy capacity, etc.), the two primary variables driving resource-balancing are the AverageLocalNodeUtilization 0 and the AverageNonLocalUtilization 0. Examples:
A utilization of 100% means that the node is operating exactly at the desired utilization goal (which may be somewhat less than its raw capacity), whereas 70% means that its capacity is underutilized by 30%. Likewise, 150% means that 50% more work is queued at the node than it was intended to handle all at once. It could still be accomplished eventually, but service level agreements (SLAs) might not be met, and customer satisfaction might suffer. An average utilization that continually exceeds 100% is one indication that system capacity should be increased.
For simplicity, in the following discussion we'll conceptually aggregate all of a node's resources into the concept of a relative workload capacity - the ratio of the local node's capacity to accomplish requested work compared to the average capacity of the other nodes. We'll use the AverageLocalNodeUtilization 0 and the AverageNonLocalUtilization 0 variables to represent their respective resource utilizations. We can then define an important dependent variable we'll call LocalNodeRelativeWorkload 0 (i.e., the node's workload, as a percentage, compared to what other nodes are currently experiencing, on average). The LocalNodeRelativeWorkload 0 is calculated as:
((AverageLocalNodeUtilization / AverageNonLocalUtilization) - 1) * 100
Thus, a LocalNodeRelativeWorkload 0 value of +20% means that the local workload is 20% above the average workload of the other nodes (i.e., excluding itself), whereas -20% means that the local workload is 20% below the average of the other nodes. This percentage can then be compared to the current percentile (or quartile, etc.) thresholds, in order to classify the workload of the local node relative to the other nodes.
One goal is to determine which "load category" the local node fits best, such that one of four values (which requires 2 bits to represent) can be assigned for classification purposes, e.g.: Very Heavily Loaded (3), Heavily Loaded (2), Lightly Loaded (1), Very Lightly Loaded (0). While any eligible node can be delegated to, those with the lightest load may receive a statistically larger fraction of any delegated work, and those with the heaviest load may receive the smallest fraction (and possibly zero).
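The relative-workload calculation and the four-way classification can be sketched in Python as follows. The threshold values (-25, 0, +25) are placeholders chosen for illustration; in the described system they would be the current quartile thresholds, which the text does not specify. The function names are hypothetical.

```python
from bisect import bisect_right

LOAD_CATEGORIES = ("Very Lightly Loaded", "Lightly Loaded",
                   "Heavily Loaded", "Very Heavily Loaded")


def local_node_relative_workload(avg_local: float, avg_non_local: float) -> float:
    """Percentage by which local utilization exceeds the non-local average."""
    return ((avg_local / avg_non_local) - 1) * 100


def load_category(relative_workload: float,
                  thresholds=(-25.0, 0.0, 25.0)) -> int:
    """Map a relative workload to a 2-bit category, 0 (lightest) to 3 (heaviest).

    Assumption: thresholds are illustrative stand-ins for quartile boundaries.
    """
    return bisect_right(thresholds, relative_workload)


# A node at 150% utilization while the other nodes average 100%.
category = load_category(local_node_relative_workload(150, 100))
label = LOAD_CATEGORIES[category]
```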
Workload adaptation is continuous, but enjoys relatively low overhead due to the hysteresis induced by basing local adaptation decisions on relative workload, and by classifying the relative workload into a small number of statistical categories that correlate well with the actions to be taken.

Two additional dependent variables are the LocalWillingnessToVolunteer 0 and the LocalWillingnessToDelegate 0, both of which are intuitively related to the relative workload.
Volunteering: The trend of the local node's willingness to "volunteer" to take on additional load (LocalWillingnessToVolunteer 0) is opposite the trend of LocalNodeRelativeWorkload 0. Thus, if a local node's relative workload increases, then its willingness to volunteer decreases, and vice-versa. Over time (i.e., there's a delay), any change in the local node's volunteer efforts may also drive its average utilization (AverageLocalNodeUtilization 0) in the same direction (increased volunteering may increase its own utilization, and vice-versa). However, over time, the local node's volunteer efforts may also drive the utilization of the other (i.e., non-local) nodes in the opposite direction. The more that the local node volunteers to do the work of other nodes, the less work the other nodes have to do, which decreases their utilization, and vice-versa.

Delegation: The trend of the local node's willingness to "delegate" part of its workload is in the same direction as the trend of LocalNodeRelativeWorkload 0. Thus, if a local node's relative workload increases, then its willingness to delegate increases, and vice-versa. Over time (i.e., there's a delay), any change in the local node's delegation efforts may also drive its average utilization in the opposite direction (increased delegation may decrease its own utilization, and vice-versa).
However, over time, the local node's delegation efforts may also drive the utilization of the other (i.e., non-local) nodes in the same direction. The more that the local node delegates its work to other nodes, the more work the other nodes have to do, which increases their utilization, and vice-versa.

7.4 SLAVE (Storage-Less Adaptive Virtual Environment)
The virtual machine (VM) sandboxes and software layers depicted above have been or could be implemented in any of several open source and/or commercially available virtual machine environments.
Examples of systems that could be tailored to the SLAVE PUMP in a straightforward manner may include the open source software "Xen" (now a part of Linux), "OpenVZ," and others, as well as the experimental software "Denali" (and its variants, from the University of Washington), both of which are well-suited to Unix, Linux, and various BSD environments.
The key novel capabilities introduced by interfacing to the SLAVE PUMP rather than to traditional computer hardware, include:
• A dedicated set of memory-mapped, "user-space" registers, per SLAVE CPU process
• Direct, user-space access to dedicated "phantom" peripherals as projected by the SLAVE PUMP
• Direct, user-space access to system time, timers
• Direct, user-space access to process accelerators implemented in hardware by SLAVE PUMP
• Reduction in hypervisor overhead (shifts away from SLAVE CPU to hardware and/or specialized processors)
• Absolute control over the software and hardware operating environment (including BIOS)
Each PUMP implements two inter-PUMP HT links. Ideally, these are 32-bit HT links, but could be 16-bit if FPGA I/O pins are insufficient. Here, PUMP 0 is distinguished in that it terminates both ends of the daisy chain, while the other PUMPs each implement an HT tunnel (with a cave for each PUMP's local functions).
Thus, in the diagram above, one of PUMP 0's links connects to PUMP 1 and the other to PUMP 2. PUMP 1 and PUMP 2 also connect to each other. In theory, the number of PUMPs is limited primarily by requirements of the type of bus chosen (e.g., HT is limited to 31 devices).
Each PUMP also emulates at least one HT tunnel (with a cave for PUMP functions) between a pair of HT links connected to pairs of corresponding processors (represented here as a pair of MASTERs and multiple pairs of SLAVEs). Here, PUMP 0 is shown with only one pair of processor-to-PUMP HT links for the MASTER processors (plus a ClearSpeed ClearConnect Bus interface), whereas the other PUMPs are depicted with two pairs of processor-to-PUMP HT links (two pairs of SLAVEs per PUMP are hoped for).
Ideally, these processor-to-PUMP HT links are 16-bit, but could be 8-bit if FPGA I/O pins are insufficient (16-bit processor-to-PUMP HT links are more important to PUMP 0 than to the other PUMPs).
In general, when only one coherent (i.e., cache-coherent) inter-processor HT link is available on each processor (such as in a 2-way Opteron configuration), that link is connected directly to a PUMP (rather than to its mating processor). The PUMP then emulates each processor's mating processor. This allows each processor to see all of the PUMP's memory as belonging to the other processor.
Since the PUMPs collaborate, their collective memory is shared among all the connected processors, regardless of which PUMP each processor connects to. "Local" (non-PUMP) processor memory is shared between directly connected processors only if the links connecting them are coherent; otherwise, such processors are connected only for I/O and inter-processor message-passing.
Note: Given a pair of SLAVEs (say, 1w and 1e, for example), one of the processors could be replaced by a specialized processor adhering to the processor's bus protocol. Examples would include replacing an AMD
Opteron processor with a DRC or XtremeData coprocessor. Such a coprocessor would have the same access to memory as the replaced processor.
Ideally all PUMP devices are identical - at least in hardware part number, if not design. Failing that, PUMP 0 may be unique and all others (1..n) must be identical.
PUMP 0 is associated with the MASTER processors, which are responsible for module initialization, etc.

8 CHARM - Compressed Hierarchical Associative & Relational Memory
In a preferred embodiment, a SCRAM node is composed of 1 to 4 fully connected quadrants; each quadrant contains 4 lobes and controls up to 8 optional "blades" (discussed elsewhere), in any combination, and each blade is fully connected to each lobe in the corresponding quadrant. Each lobe comprises a number of means whose conceptual interaction is depicted above. In a preferred embodiment, one or more of the blocks depicted above as (optional) "blades" also are implemented internally (i.e., within a lobe) in a non-bladed manner, so that the specific means are also built into the lobe and provide the corresponding capability inherently.
In a preferred embodiment, as depicted above, a MASTER CPU or SMP 0 works cooperatively and symbiotically with a MASTER PUMP 0 via a direct communication path (for example, HyperTransport). The MASTER CPU or SMP 0 typically has a multiplicity of volatile DRAM memory channels (preferably SECDED or Chipkill ECC) that it makes partly available to the MASTER PUMP 0 (e.g., some memory is set aside for local use, and some is allocated to the MASTER PUMP 0). The MASTER
PUMP 0 has a multiplicity of non-volatile, high-reliability low-power DRAM memory channels that it makes partly available (e.g., as a block device) to the MASTER CPU or SMP 0.
In a preferred embodiment, a MASTER CPU or SMP 0 works cooperatively and symbiotically with one or more SLAVE PUMPs 0 via a direct communication path (for example, HyperTransport). Each SLAVE PUMP 0 typically has a multiplicity of SLAVE CPU/SMP devices 0 associated with it, which are entirely dependent on the SLAVE PUMP 0 for all non-local input/output. In a preferred embodiment, each SLAVE PUMP 0 emulates any devices required to bootstrap each of its dependent SLAVE CPU/SMP devices 0, as well as all communications and storage devices, so that all aspects of the software execution environment for the SLAVE CPU/SMP devices 0 are under control of the SLAVE PUMP 0, which is acting on behalf of the cooperative pairing of MASTER CPU or SMP 0 and MASTER PUMP 0.
Each SLAVE CPU/SMP device 0 typically has a multiplicity of volatile DRAM memory channels (preferably SECDED or Chipkill ECC) that are entirely available to the SLAVE PUMP 0, which is acting on behalf of the cooperative pairing of MASTER CPU or SMP 0 and MASTER PUMP 0. Some memory is set aside for local use by SLAVE CPU/SMP devices 0, and some is allocated to the MASTER PUMP 0, which may delegate it back to its various SLAVE PUMP devices 0 in any allocation. In general, the various CPU/SMP devices 0 and 0 use local memory in traditional ways. However, the local memory is limited to SECDED or Chipkill ECC (and the latter is costly), and it is well known in the art that high-density DRAM cannot be relied upon for data held in memory long term (even if there is no risk of power failure), because the relatively high single-event upset (SEU) probability leads to an accumulation of uncorrectable errors. Thus, in a preferred embodiment, only a portion of the potentially large DRAM capacity is allocated for local memory use, and only to processors whose processes are fault-tolerant to an appropriate degree (e.g., checkpointed and/or executed redundantly). The remainder of the DRAM capacity is allocated to the various PUMP devices 0 and 0, which construct a multiplicity of very high speed virtual block storage devices from aggregations of the corresponding memory channels, using a suitable (n,k) FEC to encode and decode block data stored into (and retrieved from) the virtual block storage devices.
For example, if commodity AMD64 CPUs are used for all CPU/SMP devices 0 and 0, and there is one MASTER CPU or SMP 0, and four SLAVE CPU/SMP devices 0, and each CPU has a dual-channel memory configuration, then there are ten (10) memory channels (averaging 3.2 GB/second each, with low-end memory devices), for a maximum aggregate rate of about 32 GB/second. In this configuration, for example, an (n,k) FEC code where n=10 and k=8 would yield an effective throughput of about 25 GB/second (80% of 32 GB/second), and any 8 of 10 channels would be adequate to preserve data integrity (in other words, any two memory channels - or an entire CPU with both its channels - could fail entirely, and yet no data loss would occur in the virtual block device).
For performance, all FEC encoding and decoding would occur in the various PUMP devices 0 and 0, which would typically be implemented in reconfigurable logic, ASICs, or another hardware implementation. In a preferred embodiment, FEC encoding is similarly applied to remote (non-colocated) memory systems, such that communications can be used to combine (n,k) FEC codes advantageously (for example, a local (10,8) code could combine with a remote (10,8) code to create a (20,16) code that would preserve data integrity as long as any 16 of the 20 channels were available). In a preferred embodiment, FEC encoding is similarly applied to the hard disk storage devices in the system, as well as to the non-hard disk storage devices in the system (for example, a large number of USB flash memory or SD flash memory devices), such as those accessible via PEERS 0 (Packet Engines Enabling Routing & Switching) and via the various "outrigger blades" 0.
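By way of illustration only, the channel arithmetic in the example above can be sketched as follows. The function names (fec_summary, combine) are hypothetical and not part of the disclosure; the sketch merely assumes one FEC symbol stream per memory channel, so that n equals the number of channels.

```python
from fractions import Fraction

def fec_summary(channels, k, gb_per_s_per_channel):
    """Summarize an (n, k) code striped over `channels` memory channels.

    Assumption for illustration: one code symbol stream per channel (n == channels).
    """
    n = channels
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    raw = n * gb_per_s_per_channel        # aggregate raw bandwidth
    effective = raw * Fraction(k, n)      # only k/n of the traffic carries payload
    return {
        "n": n,
        "k": k,
        "raw_gb_s": raw,
        "effective_gb_s": effective,
        "tolerated_channel_losses": n - k,
    }

def combine(a, b):
    """Combine two codes as in the local (10,8) + remote (10,8) -> (20,16) example."""
    n, k = a["n"] + b["n"], a["k"] + b["k"]
    return {"n": n, "k": k, "tolerated_channel_losses": n - k}

# Five dual-channel CPUs -> ten channels at 3.2 GB/second each.
local = fec_summary(10, 8, Fraction(32, 10))
remote = fec_summary(10, 8, Fraction(32, 10))
combined = combine(local, remote)
print(float(local["raw_gb_s"]), float(local["effective_gb_s"]), combined)
```

The exact effective figure is 25.6 GB/second, which the text rounds to "about 25 GB/second."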

In a preferred embodiment of a SCRAM "Lobe," one or more of the means collectively labeled here as "Blades" also are implemented internally (i.e., within a lobe) in a non-bladed manner, so that the specific means are also built into the lobe and provide the corresponding capability inherently.
In a preferred embodiment, as depicted above, HyperTransport (HT) is used to create a system bus for connecting a MASTER CPU or SMP 0 (comprising at least 3 HT links) with a MASTER PUMP 0 and two SLAVE PUMPs 0. The MASTER CPU or SMP 0 is responsible for initializing the various HT devices it can reach, up to any non-transparent bridges, as well as any bridged non-HT devices such as may be attached to or reached via the PEERS fabrics 0, up to any non-transparent bridges. In a preferred embodiment, the MASTER CPU or SMP 0 has a multiplicity of volatile DRAM memory channels (preferably with SECDED or Chipkill ECC) that can be accessed by the PUMP devices 0 and 0 as appropriate, via the HT links. The MASTER PUMP 0 has a multiplicity of non-volatile, high-reliability low-power DRAM memory channels that it makes partly available (e.g., as a block device) via any of its HT links (in a preferred embodiment, the MASTER PUMP 0 includes at least four such links - 2 tunneled and 2 bridged).
In a preferred embodiment, various combinations of single or multiple MASTER
CPU or SMPs 0, MASTER
PUMPs 0, SLAVE PUMPs 0, and HT-bridges (which are implicit in the PEERS
fabrics 0 and "Blades" 0, due to the presence of PCI Express), up to the maximum number of addressable HT devices (note that, by design, the number of SLAVE CPU/SMP devices 0 is not included in this count), can be coupled together in a double-ended daisy-chain fashion. The HT connection from MASTER PUMP 0 to CRAY SeaStar 0 (optional) is exemplary of a connection to a vendor-specific interface, in this case to any of a family of CRAY-specific communications chips (SeaStar, SeaStar2, etc.) designed to provide high-performance communications between components of a supercomputing system. Note that the HT
connection to CRAY
SeaStar 0 (or some other bridged interface) could alternatively originate at either (or both) of the SLAVE
PUMPs 0 rather than the MASTER PUMP 0 (any combination of "available" PUMP
interfaces is feasible); it could also alternatively originate at the MASTER CPU or SMP 0 if a processor with 4 HT links is used, or an SMP comprising at least a pair of 3-link CPUs is configured (leaving at least one HT link free).
In a preferred embodiment, a MASTER CPU or SMP 0 works cooperatively and symbiotically with one or more SLAVE PUMPs 0 via its direct HT paths, for the primary purpose of implementing a high-performance computing cluster using "slaved" commodity processors (e.g., the SLAVE CPU/SMP devices 0), which need not be homogeneous. Each of the SLAVE CPU/SMP devices 0 typically has (or in any case, is required to have) only a single HT link, if any (and if none, then is interfaced to a device that can supply at least one).
Thus, a key aspect of the SLAVE PUMP 0 is its ability to directly interface to a multiplicity of HT devices having only singleton HT links, and to provide communications with and among them and with other devices in the system as authorized, without requiring that the attached devices have multiple HT links of their own (which is normally required for HT-based multiprocessor communication).
In a preferred embodiment, the SLAVE PUMPs 0 implement external interfaces comprising a 16-bit HT
tunnel pair, and at least five 16-bit non-transparent bridged HT device ports (which can either be internally switched, such as with a crossbar switch, or implemented internally as a set of connected HT tunnels, where each non-transparent HT device is simply an HT bridge with a tunnel, and the tunnels are connected in series). In addition to the bridged HT device ports, the SLAVE PUMP 0 also implements an HT cave with its own functionality comprising logic, internal local memory, and input/output queues, all operating under the auspices of the cooperative pairing of MASTER CPU or SMP 0 and MASTER PUMP 0.
In a preferred embodiment, at least four of the five 16-bit non-transparent bridged HT device ports can be alternatively implemented as eight 8-bit non-transparent bridged HT device ports. In a preferred embodiment, any internal HT tunnels and paths within the SLAVE PUMPs 0 are of maximum width (as of this writing, the HT standard specifies a maximum width of 32 bits).
In a preferred embodiment, the MASTER PUMP 0 implements external interfaces comprising a 16-bit HT
tunnel pair, and at least one 16-bit bridged HT device port that can be configured as either a transparent or non-transparent bridge, depending on the intended use. In addition to the bridged HT device port, the MASTER PUMP 0 also implements an HT cave with its own functionality comprising logic, internal local memory, external local memory, and input/output queues, all operating under the auspices of its cooperative pairing with the MASTER CPU or SMP 0.

In a preferred embodiment, its adaptive transparent/non-transparent bridged HT
device port can be alternatively implemented as two 8-bit adaptive transparent/non-transparent bridged HT device ports. In a preferred embodiment, any internal HT tunnels and paths within the MASTER PUMP
0 are of maximum width (as of this writing, the HT standard specifies a maximum width of 32 bits).

In an alternative embodiment (relative to that depicted in Conceptual Interaction Diagram #1), the BOSS/PUMPO and MASTER/PUMPO pairings are implemented via a single CPU handling the BOSS &
MASTER functionality, and a single FPGA or Structured ASIC handling both their respective PUMP
functionalities. The SLAVE/PUMPO pairings are each implemented via a single CPU handling the SLAVE
functionality and a single FPGA or Structured ASIC handling the corresponding PUMP functionality.

Each processor (each is designated here as either a MASTER or SLAVE) has a relatively small, but dedicated, "local" memory (shown as a pair of red bars above) that is closely matched to the needs of the specific processor. For an Opteron processor, a pair of dual-channel DIMMs well-matched to the clock rate would be anticipated.
Each PUMP has nine (9) parallel 72-bit ECC-protected memory channels, typically implemented with commodity DIMMs that lie at the economic sweet spot of performance, capacity, and price. When possible, each PUMP supports at least a second bank, for a total of 18 DIMMs accessible 9 at a time. On each access, the 72 bits from each of the 9 DIMMs are corrected to 8 bytes (64 bits) of data per DIMM. One bit from each such byte is used to form a second 72-bit ECC word (1 bit/byte x 8 bytes/DIMM x 9 DIMMs = 72 bits), which is then corrected to a 64-bit data word, and this is done 8 times per access, yielding a total of eight (8) orthogonally error-corrected 64-bit words (64 bytes total) per access. Note that 64 bytes is both the size of an Opteron cache line and the minimum payload in an HT packet. The internal PUMP buffers must handle blocks of at least 648 bytes (8 accesses of 9 channels at 9 bytes per channel), netting out to 512 bytes of orthogonally error-corrected data.
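The byte counts in the preceding paragraph can be checked with a short sketch. The constant and function names below are hypothetical, chosen only for this illustration; the figures themselves are taken from the text.

```python
DIMMS = 9                 # parallel memory channels per PUMP
RAW_BITS_PER_DIMM = 72    # ECC-protected channel width
DATA_BITS_PER_DIMM = 64   # data bits after first-level (per-DIMM) correction
ACCESSES_PER_BLOCK = 8    # accesses per buffered block

def per_access_geometry():
    """Return the raw and net byte counts for one 9-DIMM access."""
    raw_bytes = DIMMS * RAW_BITS_PER_DIMM // 8        # 9 channels x 9 bytes
    bytes_per_dimm = DATA_BITS_PER_DIMM // 8          # 8 bytes per DIMM
    # Second-level word: one bit from each byte of each DIMM.
    second_level_word_bits = 1 * bytes_per_dimm * DIMMS
    # One such word per bit position in a byte, each corrected to 64 data bits.
    words_per_access = 8
    net_bytes = words_per_access * 64 // 8
    return raw_bytes, second_level_word_bits, net_bytes

raw_bytes, word_bits, net_bytes = per_access_geometry()
block_raw_bytes = ACCESSES_PER_BLOCK * raw_bytes
block_net_bytes = ACCESSES_PER_BLOCK * net_bytes
print(raw_bytes, word_bits, net_bytes, block_raw_bytes, block_net_bytes)
```

The net payload of one access (64 bytes) matches the cache line and minimum HT payload sizes cited in the text.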
Each PUMP can deliver a 64-byte cache line refill in a single memory access, using a single minimal HT
packet. A multi-line cache refill can require the same number of accesses as cache lines, which provides a 4x throughput improvement over a dual-channel memory configuration - plus another dimension of error correction - and the data still fits in a single HT packet. Since the processors mostly operate out of their own local dual-channel memory banks, there is a surplus of memory bandwidth, and this is used to support FPGA-based accelerators within the PUMP as well as accelerators connected to the PUMP (such as the ClearSpeed chips, e.g., CSX600, that connect via the ClearConnect Bus).
Each PUMP also has several banks of 72-bit ECC-protected NVRAM (shown above as an array of red dots), some of which are likely to be implemented on stacked daughterboards attached to the CHARM module via stacking connectors (1 or 2 banks per stack). The appropriate buffer sizes depend on the typical currently available chips, which are currently limited to on the order of 128K x 8 each, so each bank would have a capacity on the order of 1 MB.

Because it need only be large enough for an appropriate "working set," each processor's local memory (shown as a pair of red bars above) can be implemented with components that trade off density for speed, while optimizing for the economic "sweet spot." From the perspective of a processor's memory controller and/or MMU, this local memory is the processor's "main memory" (or, in the case of the Opteron processor, a slice of it), and it can serve this purpose well as long as it is significantly larger than the largest cache supporting it.
From the system's viewpoint, however, such local memory is just another cache level, whereas the PUMP's memory fulfills the role of "main memory." In the case of coherent processor-to-PUMP HT links, each processor thinks the PUMP's memory belongs to a peer processor, so it can be accessed directly via HT bus requests. Otherwise, virtual memory page faults in a processor's local-but-small physical memory can normally be satisfied from the PUMP's large physical memory rather than from an actual disk (the PUMP can emulate a paging disk of arbitrary size, limited only by the overall memory and storage capacity of the entire distributed system).
The PUMP's NVRAM is a somewhat limited resource intended primarily for internal use by the PUMP logic to maintain metadata related to flash and disk-based storage, and to buffer critical data until it can be safely distributed and stored in a higher capacity distributed memory. For example, the FPGA-based associative memory algorithm logic (including the CHARM FASTpage logic) resides in each PUMP. In particular, the FASTpage logic uses NVRAM to persistently maintain the ternary search tree metadata and to buffer updates to search tree memory pages before they're written to flash memory (flash is the primary storage medium for persistent associative memory).

8.1 CHARM Concepts
8.1.1 CHARM Object Characteristics
8.1.1.1 Mutable vs. Immutable Objects
One bit of a CHARM object id is used to determine whether or not an object is immutable (i.e., not mutable).
An immutable object is one whose content cannot be changed (for any reason).
Once an object becomes immutable it can never again be mutable. A mutable object is essentially an incomplete work in progress - once the work has been completed, the object becomes immutable. For example, transactional contributions from distributed processes building a search tree would all be targeted to the immutable object id of the tree being constructed (the targeted immutable object id would be known to the distributed processes).
Mutable object ids are NOT reusable. After an object has become immutable, the associated mutable object id is no longer valid (except in an audit context), and its use would be automatically detected as a security issue.

8.1.1.2 Transient vs. Persistent Objects
One bit of a CHARM object id is used to determine whether or not an object is transient (i.e., not persistent).
In this context, transient means "temporary" and persistent means "permanent," and these terms may be used interchangeably.
Transient ids are used in two different contexts:
1. Whenever an object id is needed for an intermediate result that need not be (i.e., cannot be) persistently stored.
2. Whenever an object id is needed for a persistent (i.e., permanent) result that may or may not already exist but, in either case, is not yet known. Any number of transient ids may be mapped to the same persistent id.
Transient ids are ultimately reusable, but only after their previous use has been verifiably flushed from the system. Active transient ids are managed (issued and revoked) in large-ish blocks to minimize overhead.
8.1.1.3 Mutability vs. Persistence
Legitimate uses exist for all four combinations of mutability and persistence, but there are constraints on where the associated objects may be stored, as indicated in the table below:

T/P  M/I  Description            Uses and Where Stored
0    0    Transient Mutable      Temporary objects (volatile RAM only)
0    1    Transient Immutable    Aliased permanent objects (online NV only)
1    0    Persistent Mutable     In-work permanent objects (online NV only)
1    1    Persistent Immutable   Completed permanent objects (anywhere)

Transient Mutable (T/P=0, M/I=0) objects are used only for temporary, discardable objects and can only be stored in volatile RAM (CHARM cannot allow them to be stored in persistent storage, including NVRAM).
For security reasons, all decrypted objects fall into this category (no object can be stored in the clear, ever) - which means that a first-pass deletion of all in-the-clear objects can occur instantaneously by de-powering (i.e., removing the power from) the volatile RAM where they're temporarily stored.
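As a non-limiting sketch, the table above can be expressed as a lookup keyed by the two flag bits. The positions of the T/P and M/I bits within a CHARM object id are not specified here, so the sketch takes the two bits as separate arguments; the names ObjectClass, classify, and may_store_in_nvram are hypothetical.

```python
from typing import NamedTuple

class ObjectClass(NamedTuple):
    description: str
    uses: str
    storage: str

# Keyed by (T/P bit, M/I bit), transcribed from the table above.
CLASSES = {
    (0, 0): ObjectClass("Transient Mutable", "Temporary objects", "volatile RAM only"),
    (0, 1): ObjectClass("Transient Immutable", "Aliased permanent objects", "online NV only"),
    (1, 0): ObjectClass("Persistent Mutable", "In-work permanent objects", "online NV only"),
    (1, 1): ObjectClass("Persistent Immutable", "Completed permanent objects", "anywhere"),
}

def classify(tp_bit, mi_bit):
    """Return the object class for a given pair of flag bits."""
    return CLASSES[(tp_bit, mi_bit)]

def may_store_in_nvram(tp_bit, mi_bit):
    """Transient Mutable objects are confined to volatile RAM; all others may use NVRAM."""
    return (tp_bit, mi_bit) != (0, 0)

for bits in sorted(CLASSES):
    print(bits, classify(*bits).description, may_store_in_nvram(*bits))
```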

In a preferred embodiment, the memory is re-powered after a brief period and at least secondary and tertiary passes through the volatile memory occur, in order to write "white noise" patterns into the memory, in keeping with the security assumptions stated elsewhere (which anticipate the possibility that an attacker has significant state-sponsored resources for accessing high-value assets). More passes may be specified in the applicable security policy.

Transient Immutable (T/P=0, M/I=1) objects are really temporary aliases for Persistent Immutable (T/P=1, M/I=1) objects whose object ids are not yet (and might never be) known locally. Such objects are kept in online NVRAM rather than RAM, but cannot be stored in nearline storage. If the Persistent Immutable id becomes known locally, it subsumes every occurrence of the Transient Immutable id, which is immediately released. A Transient Immutable id may also be revoked without ever mapping it to a Persistent Immutable id. A bidirectional mapping is maintained such that given either the Transient Immutable object id or a Persistent Immutable object id, the other(s) can be determined, until such time as the Transient Immutable object id has been flushed from the system.
Persistent Mutable (T/P=1, M/I=0) objects are those which are destined to become immutable, but are not yet complete. Such objects are kept in online NVRAM rather than RAM, but cannot be stored in nearline storage.
Persistent Immutable (T/P=1, M/I=1) objects are those which are already complete and can never be changed. Each version of an externally supplied artifact, for example, has its own Persistent Immutable object id. Every Persistent Immutable object has a corresponding message digest that serves as its digital "fingerprint." A bidirectional mapping is maintained such that given either the Persistent Immutable object id or the message digest, the other can be determined.

8.1.2 Storage & Communications - Slices and Slivers
Due to the thresholding and FEC scheme used, only "slices" and "slivers" of pre-compressed data are stored. As described below, a "slice" is a fraction of the original data, and a "sliver" is a fraction of a "slice."
Given an object or other data to be stored (or communicated), it can be divided up into k fractions of the whole, such as k packets, or k "slices." In a preferred embodiment, a cryptographic message digest (its dna_tag) is computed for the object to be stored (which may or may not be in the clear), and then the object is compressed and encrypted, at which point a second, "outer" cryptographic message digest is computed for the encrypted result and concatenated to the encrypted result, which concatenation then becomes the basis for the FEC encoding process described below, by treating it as the "original data" and dividing it k ways, i.e., into k slices. Thus, reconstruction of this "original data" from its k slices (or their FEC-encoded siblings) does not actually make the original in-the-clear data available, but only a compressed, encrypted (and therefore still secure) isomorphism of it. Successful reconstruction of the still-encrypted data from its k slices can be verified only if the key used to generate the "outer" cryptographic message digest is known, along with the length of the digest.
Given a systematic (n,k) FEC code, the original k slices of data are used to encode up to n redundant "slices," any k of which are sufficient to reconstruct the original data. Since a systematic (n,k) code is used, the original k slices (which are included in the set of n slices) are also sufficient to reconstruct the original data.
In a preferred embodiment, no more than one slice of data from a single object is allowed to be stored on a particular device, precluding the loss of more than one of an object's slices due to a single device failure or theft. Note that multiple slices (each from a different object) may be stored on each device, without penalty.
In a preferred embodiment, any node or device entrusted with a slice can further encode the data for sub-distribution using an (n',k') FEC code, thereby encoding up to n' redundant "slivers," any k' of which are sufficient to reconstruct the original slice. Slivers are particularly useful for highly distributed "nearline" storage, such as when data is distributed to non-trusted devices (PC-based storage, servers, or any SERVANT, or device running or emulating a SERVANT).
In a preferred embodiment, no more than one sliver of data from a single object is allowed to be stored in a single data-handling unit (e.g., a disk sector, file, or record, etc.), precluding the loss of more than one of an object's slivers due to localized hard or soft failures (e.g., bad disk sector, corrupted file, etc.).
The number of slices or slivers that can be co-located in the same facility and/or within a geographic region is constrained to be less than some policy-specified threshold. Thus, the capacity required on a particular device, or at a particular facility, is only a fraction of what would be required to store an arbitrary object.
As a performance enhancement (but at the risk of decreased survivability, depending on the exact configuration and policy-specified thresholds), SHADOWS nearline activities can include co-locating slices of new versions of objects with slices of recent versions of the same objects, by combining them into volumes or clusters. When a particular object is recalled from storage, all or part of the corresponding sliver volume can be retrieved all at once, in order to reduce latency, especially in the case where the versions are related through reversible delta-compression operations.
Note: As a consequence of its FEC-based slicing and associated constraints, SHADOWS storage needs no separate defragging procedures.
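The preparation of an object for slicing, as described in this section, can be sketched as follows under stated assumptions. The text does not name a digest, compressor, or cipher; SHA-256, HMAC-SHA-256 (as the keyed "outer" digest), and zlib are stand-ins chosen only because they are in the Python standard library. The encrypt argument defaults to an identity placeholder that performs NO encryption and would have to be replaced by a real cipher. The function names are hypothetical.

```python
import hashlib
import hmac
import zlib

def make_slices(obj, k, outer_key, encrypt=lambda data: data):
    """Digest, compress, 'encrypt', append a keyed outer digest, and split k ways."""
    dna_tag = hashlib.sha256(obj).digest()           # inner digest of the object
    sealed = encrypt(zlib.compress(obj))             # placeholder encrypt: identity
    outer = hmac.new(outer_key, sealed, hashlib.sha256).digest()
    payload = sealed + outer                         # the "original data" for FEC
    slice_len = -(-len(payload) // k)                # ceiling division
    padded = payload.ljust(slice_len * k, b"\0")     # zero-pad to a multiple of k
    slices = [padded[i * slice_len:(i + 1) * slice_len] for i in range(k)]
    return dna_tag, len(payload), slices

def verify_reassembly(slices, payload_len, outer_key, digest_len=32):
    """Check the outer digest; requires both the key and the digest length."""
    payload = b"".join(slices)[:payload_len]
    sealed, outer = payload[:-digest_len], payload[-digest_len:]
    expected = hmac.new(outer_key, sealed, hashlib.sha256).digest()
    return hmac.compare_digest(outer, expected)

obj = b"example object contents " * 50
dna_tag, payload_len, slices = make_slices(obj, 8, b"outer-digest-key")
print(len(slices), len(slices[0]), verify_reassembly(slices, payload_len, b"outer-digest-key"))
```

The k slices produced here would then be the input to the systematic (n,k) encoder, which is outside the scope of this sketch.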

8.1.3 CHARM - FEC Pseudo-Random Ordinals (PRO) Encoding Concept
In a preferred embodiment, CHARM uses FEC (forward error correction), and specifically systematic codes such as Reed-Solomon (RS), Cauchy Reed-Solomon (CRS), and/or others, with encoders and decoders implemented in software and/or hardware. A key property of the systematic (n,k) codes in CHARM is that, given a set of n redundantly "encoded" data packets (or other chunks of data), any k of them are sufficient to decode and thereby reconstruct the original data (k < n).
If a Reed-Solomon code variant (or similar) is used, where n packets are generated and any k of them can enable reconstruction of the original data, then the redundancy can be given as r = (n-k), and the code can support up to r "erasures" (missing packets where it is known which packets are missing), or r/2 errors (where some packets are in error, but which ones they are is unknown). In CHARM, storage-based data and data-in-transit tend to require only erasure correction (because there are other means for identifying missing or corrupted packets), whereas volatile memory-based data tend to require full error detection.
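The "any k of n" property and the erasure/error budget can be illustrated with a deliberately minimal stand-in. The sketch below is NOT Reed-Solomon: it uses a single XOR parity packet, which is the simplest systematic code (n = k+1, so r = 1), purely to show the systematic fast path and single-erasure repair. All function names are hypothetical.

```python
from functools import reduce

def xor_bytes(a, b):
    """Bytewise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(packets):
    """Systematic (k+1, k) code: the k originals followed by one parity packet."""
    return list(packets) + [reduce(xor_bytes, packets)]

def decode(received, k):
    """Rebuild the k originals from a dict {index: packet} holding any k of k+1."""
    missing = [i for i in range(k) if i not in received]
    if not missing:
        return [received[i] for i in range(k)]       # systematic: no decoding needed
    if len(missing) > 1 or k not in received:
        raise ValueError("a single-parity code can repair only one erasure")
    survivors = [received[i] for i in range(k) if i in received] + [received[k]]
    repaired = dict(received)
    repaired[missing[0]] = reduce(xor_bytes, survivors)
    return [repaired[i] for i in range(k)]

def correction_capacity(n, k):
    """Erasure and error budgets for redundancy r = n - k, as stated in the text."""
    r = n - k
    return {"erasures": r, "errors": r // 2}

packets = [b"ab", b"cd", b"ef"]
coded = encode(packets)
survivors = {0: coded[0], 2: coded[2], 3: coded[3]}   # packet 1 erased
print(decode(survivors, 3), correction_capacity(10, 8))
```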
In CHARM, FEC is applied in multiple dimensions, with different parameters, as appropriate to the purpose and nature of the data, which may differ, for example, among the various contexts associated with local vs.
distributed data, transient vs. persistent data, mutable vs. immutable data, data in transit vs. data at rest, public data vs. private data, etc. Other considerations and requirements may apply as well, such as requisite levels of security, integrity, availability, persistence, survivability, etc.
In general, the higher the required levels of redundancy for a particular context, the larger the value of n is likely to be, relatively speaking. Also, the higher the required levels of security for a particular context, the larger the value of k is likely to be, relative to a particular value of n, where (k < n).
When using the known Luigi Rizzo FEC algorithms with 8-bit wide symbols (i.e., a Galois Field GF(p^w) where p=2 and w=8, as described by Rizzo in 1997, which implies a maximum of n = p^w = 2^8 = 256), up to n=256 packets can be generated for any given chunk of data in such a way that any k packets out of the 256 are sufficient to reconstruct the original data.
Because Rizzo's algorithm is based on a systematic code, the first k packets of the n=256 maximum can be directly aggregated to represent the original data without decoding. The remaining 256-k packets (or n-k packets, in general) contain redundant data in accordance with the selected FEC algorithm. If any of the first k packets are missing, any packets from the remaining 256-k may be substituted, but FEC decoding must occur in order to reconstruct the original data.
For decoding efficiency, it is preferable to use only the original k packets, if available, thereby completely avoiding the FEC decoding algorithm. However, left as-is, this would reduce security somewhat, because data could be reconstructed with less effort by an attacker (by not adding FEC
decoding to the cryptographic burden).
The Scrutiny Pseudo-Random Ordinals (PRO) encoding concept is to pseudo-randomly distribute the original k packets among the n=256 packets. A key associated with (but, in a preferred embodiment, distinct from) the data's underlying security key can be used to seed a PRNG. When the key is known, the ordinal positions of the k original packets can be determined directly, allowing aggregation without the overhead of FEC decoding (assuming all the original k packets are available).
The same technique can be applied to group the remaining 256-k packets according to accessibility (locale, storage level, etc.). For example, the next most-accessible m packets can be distributed among the remaining 256-k packets in exactly the same way.
Note that the PRNG sequence-generation algorithm must automatically skip over (and not emit) duplicate ordinals within a given sequence, which implies that memory must be associated with sequence generation (in order to keep track of which ordinals have been emitted). A Bloom filter would be excellent in this respect, since it is both compact and relatively opaque (which aids its resistance to analysis and attack), yet the lack of a particular entry corresponding to a candidate ordinal is clear evidence that the candidate is not a duplicate of a previously occurring ordinal. The presence of a matching entry, however, is an indication that the candidate "might" be a duplicate of an ordinal that has occurred already, so the candidate can simply be skipped in favor of a new candidate.
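A minimal sketch of the duplicate-skipping ordinal sequence follows. It makes two substitutions that should be read as assumptions: Python's random.Random (a Mersenne Twister, not cryptographically secure) stands in for whatever PRNG an implementation would use, and an exact set stands in for the Bloom filter suggested above. The function name pro_ordinals and the key value are hypothetical.

```python
import random

def pro_ordinals(seed_key, count, n=256):
    """Return `count` distinct pseudo-random ordinals in range(n), seeded by a key."""
    if not 0 <= count <= n:
        raise ValueError("count must be between 0 and n")
    rng = random.Random(seed_key)     # stand-in PRNG; not suitable for real security
    emitted = set()                   # exact membership; the text suggests a Bloom filter
    ordinals = []
    while len(ordinals) < count:
        candidate = rng.randrange(n)
        if candidate in emitted:
            continue                  # skip duplicates, never emit them
        emitted.add(candidate)
        ordinals.append(candidate)
    return ordinals

k, m = 8, 4
sequence = pro_ordinals(b"per-object placement key", k + m)
original_positions = sequence[:k]     # where the k original packets are placed
next_tier_positions = sequence[k:]    # the next most-accessible m packets
print(original_positions, next_tier_positions)
```

Anyone holding the key can regenerate the same sequence and locate the k original packets directly, which is what permits aggregation without FEC decoding.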

8.1.4 CHARM - Representation of Infinite Precision Floating Point Numbers
In the CHARM implementation of infinite precision floating point numbers, each number is a transreal value, and thus, in addition to the set of integers and real numbers, also includes +/- infinity and NULL. In a preferred embodiment, the implementation is further enhanced to include a small set of signed, infinite precision, symbolic constants such as π (pi) and e, along with a small set of others, plus a means for referring to an extended set of mathematical and/or scientific constants (typically irrational numbers) which are generally known in the art.
In a preferred embodiment, the CHARM infinite precision floating point representation requires variable length word size on 1-byte boundaries, with at least an 8-bit word (1 byte), which may be represented as having bits numbered from 0 to 7, left to right. The first bit is the sign bit, 'S', the second bit is the exponent extension flag, 'e', the third bit is the exponent bit, 'E', the fourth bit is the fraction extension flag, 'f', and the final four bits are the fraction bits, 'FFFF':
S e E f FFFF

When e=0, no exponent extension is required (i.e., there are no additional exponent bytes).
When e=1, the next byte begins a field of 1 or more exponent extension bytes (signed LEB128 format).
When f=0, no fraction extension is required (i.e., no additional fraction bytes).
When f=1, the byte after the exponent extension, if any, begins a field of 1 or more fraction extension bytes (unsigned LEB128 format).
When e=0 and E=0, the value of the exponent is 0 (i.e., the fraction is also unnormalized and contains an integer value); a single byte may contain the signed values 0 to 15 (the sign bit applies).
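The first-byte layout and the single-byte integer case can be sketched as follows. The sketch assumes that "left to right" means most significant bit first, so bit 0 ('S') is the high-order bit of the byte; that reading, and the names Header, parse_header, and small_integer, are assumptions made for illustration.

```python
from typing import NamedTuple, Optional

class Header(NamedTuple):
    S: int   # sign bit
    e: int   # exponent extension flag
    E: int   # exponent bit
    f: int   # fraction extension flag
    F: int   # four fraction bits

def parse_header(byte):
    """Split the first byte into S, e, E, f, FFFF (assumes bit 0 is the MSB)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte")
    return Header(
        S=(byte >> 7) & 1,
        e=(byte >> 6) & 1,
        E=(byte >> 5) & 1,
        f=(byte >> 4) & 1,
        F=byte & 0x0F,
    )

def small_integer(byte) -> Optional[int]:
    """Decode the one-byte integer case (e=0, E=0, f=0); return None otherwise."""
    h = parse_header(byte)
    if h.e == 0 and h.E == 0 and h.f == 0:
        return -h.F if h.S else h.F
    return None

print(parse_header(0x2C), small_integer(0x05), small_integer(0x83))
```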

Note: Irrational numbers are numbers which can only be written as a never-ending, non-repeating decimal fraction. Irrational numbers cannot be written in the form of a fraction where the numerator and denominator are both integers (whole numbers).

When e=0 and E=1 (E_MAX) and f=0, the value of the exponent is 0 and the fraction represents special values (the sign bit S differentiates all values, and all values except 0 and 1 are signed):
FFFF Field   Special Value
0            NULL or Nullity (unsigned) when S=0, and Rational_Number_Indicator when S=1 (ratio of integers to follow)
1            RESERVED for OID_Indicator when S=0 (OID† to follow), and for BLOB_Indicator when S=1 (BLOB descriptors & content to follow)
2            RESERVED for Extended_Constant_Indicator (extended constant code to follow)
3            RESERVED (TBD)
4            Euler (gamma) = 0.57721566490153286060651209008240243104215933593992...
5            log(2) = 0.69314718055994530941723212145817656807550013436025...
6            log(pi) = 1.14472988584940017414342735135305871164729481291531...
7            sqrt(2) = 1.41421356237309504880168872420969807856967187537694...
8            sqrt(e) = 1.64872127070012814684865078781416357165377610071014...
9            sqrt(pi) = 1.77245385090551602729816748334114518279754945612238...
10           log(10) = 2.30258509299404568401799145468436420760110148862877...
11           e = 2.71828182845904523536028747135266249775724709369995...
12           pi = 3.1415926535897932384626433832795028841971693993751...
13           e^e = 15.15426224147926418976043027262991190552854853685613...
14           e^pi = 23.14069263277926900572908636794854738026610624260021...
15           INFINITY (signed)

When e=1 in the first byte, then one or more of the following bytes contribute to the exponent, where the first bit of each such byte indicates whether additional bytes are required, and the other 7 bits of the byte are appended to the right of previous exponent bits (beginning with the 'E' bit in the first byte).
Likewise, when f=1 in the first byte, then one or more of the bytes following the exponent extension bytes, if any, contribute to the fraction, where the first bit of each such byte indicates whether additional bytes are required, and the other 7 bits of the byte are appended to the right of previous fraction bits (beginning with the four 'F' bits in the first byte).
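
The bit-appending rule described in the two paragraphs above can be sketched as follows. This is an illustrative reading only: it follows the "appended to the right" wording literally (most significant group first), does not model the signed interpretation of the exponent, and the function name `read_extended_field` and its parameters are hypothetical.

```python
from typing import Tuple


def read_extended_field(first_bits: int, first_width: int,
                        data: bytes, pos: int = 0) -> Tuple[int, int, int]:
    """Extend a field that starts in the first byte with continuation bytes.

    first_bits / first_width: the bits already taken from the first byte
    (1 bit for the exponent, 4 bits for the fraction).
    Returns (value, total_width_in_bits, next_position).
    """
    value, width = first_bits, first_width
    while True:
        if pos >= len(data):
            raise ValueError("truncated extension field")
        byte = data[pos]
        pos += 1
        # The low 7 bits are appended to the right of the bits gathered so far.
        value = (value << 7) | (byte & 0x7F)
        width += 7
        # The first (top) bit of each byte says whether another byte follows.
        if not byte & 0x80:
            return value, width, pos
```

For example, starting from an 'E' bit of 1 and reading the bytes `0x81 0x05` yields a 15-bit exponent field with the value 16517.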
S e E f FFFF (eEEEEEEE)(eEEEEEEE) ... (eEEEEEEE) (fFFFFFFF)(fFFFFFFF) ... (fFFFFFFF)

8.1.4.1 Example Determinations of Represented Values
The value 'V' represented by the word may be determined as follows:
If e=0 and E=1 and F is 0, and S is 0, then V= Nullity
If e=0 and E=1 and F is 0, and S is 1, then V= (the ratio of integers specified by the following number pair)
If e=0 and E=1 and F is 1, and S is 0, then V= (the value of the dereferenced OID)
If e=0 and E=1 and F is 1, and S is 1, then V= (the value of the dereferenced BLOB descriptor and content)
If e=0 and E=1 and F is 2, and S is 0, then V= (+) (the extended constant indexed by the following integer)
If e=0 and E=1 and F is 2, and S is 1, then V= (-) (the extended constant indexed by the following integer)
If e=0 and E=1 and F is 3, then V= (TBD) [Note: This value of F is RESERVED]
If e=0 and E=1 and F is 4, and S is 0, then V= (+) (Euler's Constant = 0.577215664901532...)
If e=0 and E=1 and F is 4, and S is 1, then V= (-) (Euler's Constant = 0.577215664901532...)
... (and so on)
If e=0 and E=1 and F is 11, and S is 0, then V= (+) e (2.718281828459045235360...)
If e=0 and E=1 and F is 11, and S is 1, then V= (-) e (2.718281828459045235360...)
If e=0 and E=1 and F is 12, and S is 0, then V= (+) pi (3.14159265358979323846...)
If e=0 and E=1 and F is 12, and S is 1, then V= (-) pi (3.14159265358979323846...)
... (and so on)
If e=0 and E=1 and F is F_MAX, and S is 0, then V= (+) Infinity
If e=0 and E=1 and F is F_MAX, and S is 1, then V= (-) Infinity
If e>0 then V= (-1)**S * 2 ** (E) * (1.F), where "1.F" is intended to represent the binary number created by prefixing F with an implicit leading 1 and a binary point.
If e=0 and E=0 and F is nonzero, then V= (-1)**S * F
If e=0 and E=0 and F is ZERO, then V= 0 (regardless of S, which is always normalized by masking it to 0 also)

Rational numbers can be represented by a ratio of integers (i.e., a pair of numbers corresponding to a numerator and a denominator). The Rational_Number_Indicator byte indicates that a pair of numbers follows, each of which is a variable-length signed integer.
† An OID is a variable-length value in a format somewhat similar to the numeric format. The E (exponent) field is replaced by a T (type) field, and the F (fractional) field is replaced by an I (identifier) field. Specific field values are also different. In the case of a Perspex matrix, the OID can be dereferenced, resulting in a value compatible with the expected numeric value.
‡ A BLOB descriptor is a variable-length value in a format somewhat similar to the numeric format (followed by the described content). The E (exponent) field is replaced by a T (type) field, and the F (fractional) field is replaced by a C (content) field. Specific field values are also different. In the case of a Perspex matrix, the BLOB content can be decoded, resulting in a value compatible with the expected numeric value.
Nullity is unsigned (minus Nullity cannot occur). "Nullity" is something like NaN, or "Not a number", but better defined - essentially a NULL value for numbers.

8.1.4.2 Comparison to IEEE Double Precision Floating Point
Unlike the CHARM floating point format, the IEEE double precision floating point standard representation requires a 64-bit word, which may be represented as numbered from 0 to 63, left to right. The first bit is the sign bit, 'S', the next eleven bits are the exponent bits, 'E', and the final 52 bits are the fraction 'F':
S EEEEEEEEEEE FFFFFFFFFFFFFFFFF'FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

The value V represented by the word may be determined as follows:
If E=2047 and F is nonzero, then V=NaN ("Not a number")
If E=2047 and F is zero and S is 1, then V=-Infinity
If E=2047 and F is zero and S is 0, then V=Infinity
If 0<E<2047 then V=(-1)**S * 2 ** (E-1023) * (1.F), where "1.F" is intended to represent the binary number created by prefixing F with an implicit leading 1 and a binary point.
If E=0 and F is nonzero, then V=(-1)**S * 2 ** (-1022) * (0.F). These are "unnormalized" values.
If E=0 and F is zero and S is 1, then V=-0
If E=0 and F is zero and S is 0, then V=0

Whereas frequently occurring small values (e.g., -15 to +15) require only 1 byte in the CHARM format, 8 bytes are required in the IEEE double-precision floating point format (4 bytes are required in the IEEE single-precision format, which is otherwise not discussed here). Furthermore, larger or less frequent values occupy only as many bytes as their actual representation requires in the CHARM format, and as many bytes as are needed can be used (including for the representation of rational numbers), whereas the IEEE floating point format is fixed at 8 bytes, regardless of whether 8 bytes is too many or not enough.
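
The IEEE rules listed above can be checked with a short sketch. The helper names `decode_ieee_double` and `bits_of` are hypothetical; the sketch uses only Python's standard `struct` module to obtain the 64-bit pattern of a double.

```python
import struct


def bits_of(x: float) -> int:
    """Return the 64-bit IEEE double pattern of x as an unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def decode_ieee_double(word: int) -> float:
    """Apply the S / E / F rules listed above to a 64-bit word."""
    s = word >> 63                  # 1 sign bit
    e = (word >> 52) & 0x7FF        # 11 exponent bits
    f = word & ((1 << 52) - 1)      # 52 fraction bits
    sign = -1.0 if s else 1.0
    if e == 2047:
        return float("nan") if f else sign * float("inf")
    if e == 0:
        # "Unnormalized" values (and signed zero): 0.F with a fixed exponent.
        return sign * 2.0 ** -1022 * (f / 2 ** 52)
    # Normalized values: implicit leading 1 in front of the fraction.
    return sign * 2.0 ** (e - 1023) * (1 + f / 2 ** 52)
```

Note that `struct.pack(">d", -15.0)` always yields 8 bytes, which is the fixed cost the comparison above refers to for a value that the CHARM format describes as fitting in a single byte.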

8.1.4.3 CHARM Implementation of Perspex Transreal Values
CHARM, and particularly, PUMP and CORE, can include Perspex processing within its native capabilities. A Perspex matrix is a matrix of 16 numbers arranged in 4 rows and 4 columns that can be handled as a single operand, with a minimum of 16 bytes (1 byte per number).
In the CHARM (CORE) implementation of Perspex, each number in a Perspex matrix is a transreal value, and thus, in addition to the set of integers and real numbers, must include +/- infinity and nullity (NULL in CHARM). The CHARM implementation is further enhanced to include a small set of signed, infinite precision, symbolic constants such as π (pi) and e, along with a small set of others, plus a means for referring to an extended set of constants. The aforementioned math constants are generally known in the art.
8.1.5 CHARM - Word and Phrase Tables
8.1.5.1 CENTRAL CONCEPT
In a preferred embodiment, a unique code value is assigned to every unique textual word occurring within selected lexicons anywhere in the system, and the code assigned is determined by word length and frequency of occurrence.
In a preferred embodiment, word phrases are similarly assigned a code value.
In a preferred embodiment, word codes and phrase codes are used to maximize internal compression and throughput.

8.1.5.2 BASIC CONCEPTS
1. Lowercase is used as the canonical form of every word.
2. Every word has an unsigned LEB128 code, where longer codes are used for lower frequencies.
3. One-byte codes are reserved for ASCII 0 to 127.
4. The first (16,384 - 128) two-byte codes are reserved for the most frequently occurring lowercase dictionary words that are three characters or longer.
5. The remaining 16,384 two-byte codes are reserved for the most frequently occurring case-variations (as encountered) of dictionary words that are three characters or longer and already encoded.
6. The first 128 three-byte codes are reserved for ASCII 128 to 255.
7. The next 1,048,512 three-byte codes are reserved for the most frequently occurring lowercase dictionary and corpus words that are three characters or longer and not already encoded.
8. The remaining 1,048,512 three-byte codes are reserved for the most frequently occurring but not yet encoded case-variations (as encountered) of dictionary and corpus words that are three characters or longer and already encoded.
9. The first half of the 268,435,456 four-byte codes, the first half of the 34,359,738,368 five-byte codes, and the first half of each distinct length of the successive multi-byte codes are reserved for additional lowercase words in arbitrary order (as they're encountered, for instance).
10. The second half of the 268,435,456 four-byte codes, the second half of the 34,359,738,368 five-byte codes, and the second half of each distinct length of the successive multi-byte codes are reserved for the most frequently occurring but not yet encoded case-variations (as encountered) of dictionary and corpus words that are three characters or longer and already encoded.
11. No database entries are required for ASCII values 0 to 255.
12. Canonical entries (i.e., lowercase words) are defined by their code, the associated text string, and a list of non-canonical entries.
13. Non-canonical entries (i.e., mixed-case and uppercase words) are defined by their code, the code of the associated canonical entry, and a variable-length bit pattern (LEB128) that defines which characters need to be uppercase (one case bit per text character).
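
Items 2 and 13 above can be illustrated with a brief sketch. It shows standard unsigned LEB128 encoding of a word code and one possible layout for the case bit pattern. The function names are hypothetical, and the choice that bit i of the pattern corresponds to character i of the word is an assumption, since the text states only that there is one case bit per character.

```python
def leb128_encode(n: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128 (7 bits per byte, low group first)."""
    if n < 0:
        raise ValueError("unsigned LEB128 requires n >= 0")
    out = bytearray()
    while True:
        group = n & 0x7F
        n >>= 7
        if n:
            out.append(group | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(group)
            return bytes(out)


def leb128_decode(data: bytes) -> int:
    """Decode an unsigned LEB128 value from the start of data."""
    result = 0
    for index, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * index)
        if not byte & 0x80:
            break
    return result


def case_pattern(word: str) -> int:
    """Assumed layout: bit i is 1 when character i of the word is uppercase."""
    pattern = 0
    for i, ch in enumerate(word):
        if ch.isupper():
            pattern |= 1 << i
    return pattern


def apply_case(canonical: str, pattern: int) -> str:
    """Rebuild a case variation from its lowercase canonical form and bit pattern."""
    return "".join(
        ch.upper() if (pattern >> i) & 1 else ch
        for i, ch in enumerate(canonical)
    )
```

With this layout, a non-canonical entry such as "Shadows" would store the code of "shadows" plus the pattern 1, which itself encodes as a single LEB128 byte.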

8.2 PUMP - Parallel Universal Memory Processor
See (also) glossary entries for: CHARM, Delta Compression, PUMP, RLE.
8.2.1 Overview
The figure above depicts a specific dual-switching-fabric implementation of PEERS, as would be contained within 1 lobe from 1 quadrant of a SCRAM node (there are 4 such lobes in each quadrant). The MASTER and BOSS blocks depicted here are intended only to indicate PEERS connectivity, and are external to PEERS itself.
This particular implementation assumes the use of a processor configuration with at least 2 external HyperTransport interfaces available (1 for each of the 2 independent switch fabrics). Any AMD Opteron processor trivially meets this requirement. However, a nearly identical configuration can be achieved with a single external HyperTransport interface (meaning that any processor with a HyperTransport interface may do, not just an AMD Opteron), as long as the HyperTransport-bridge chips selected (e.g., the nVidia chips in the implementation above) implement a HyperTransport tunnel (dual bi-directional interfaces on the bridge chip, so they can be daisy-chained). In this example, a pair of nVidia n3600 chips is already daisy-chained for each fabric, so a processor with only 1 HyperTransport link would require the same number of chips, but with all 4 of them (2 pairs) in the same daisy chain.
Many variants of this PEERS implementation are possible (including the use of entirely different components, and alternative connections and numbers of connections at its external interfaces), but there are a number of distinctive features that are common to an acceptable PEERS implementation, and these are noted in section 8.2.2.
Logically, each lobe has a PEERS switching & routing fabric, but in a preferred embodiment there are actually several redundant fabrics working together in an active/active configuration. In a preferred embodiment, each lobe has at least a dual fabric working in conjunction with other lobes and quadrants that also have at least a dual fabric.

8.2.2 Principle of Operation
There are at least 2 independent PEERS switching fabrics servicing a particular Lobe's processor/PUMP configuration. In a preferred embodiment, HyperTransport is used to connect the processor/PUMP configuration to each PEERS fabric, which is primarily PCI Express (PCIe) to simplify off-board switching and routing of input/output (I/O), so the primary connection to PEERS is via one or more HyperTransport Bridge/Tunnel interface chips, which connect directly to the processor(s) and PUMP(s). Each HyperTransport interface is bidirectional and also double-ended (allowing configuration and control from either end).
In the configuration above, a representative pair of nVidia n3600 chips is daisy-chained to achieve a sufficient number of bridged HyperTransport-to-PCIe connections (56 lanes) for each fabric. The particular chip pair also includes additional I/O (12 SATA lanes, 4 GBE ports, and 20 USB ports) that improves the economics of the system by being available "for free" and thereby eliminating the need for some chips that may otherwise be needed.
The 12 SATA lanes from each fabric (24 lanes total) are distributed (not shown here) to the 8 Outrigger Blade positions as follows: 4 distinguished blade positions each receive a set of 4 lanes (i.e., from each of 4 lobes, for a total of 16 lanes per distinguished blade position), and all 8 blade positions each receive 1 lane (i.e., from each of 4 lobes, for a total of 4 lanes in any blade position).
This supports a low-cost configuration with 8 JBOD ("just a bunch of disks") storage blades where all 8 blades can each have 4 high-capacity full-size (3.5") SATA drives, or alternatively, 4 can have 4 full-size SATA drives and 4 can have 16 small-form-factor (2.5") SATA drives.
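
The SATA lane distribution can be sanity-checked with a little arithmetic. The sketch below assumes one reading of the text, namely that each distinguished blade position receives its 16-lane set in addition to the 4 lanes that every position receives; under that reading the lanes supplied and consumed balance. The constant and function names are illustrative only.

```python
LOBES_PER_QUADRANT = 4
FABRICS_PER_LOBE = 2
SATA_LANES_PER_FABRIC = 12
BLADE_POSITIONS = 8
DISTINGUISHED_POSITIONS = 4


def lanes_supplied() -> int:
    """SATA lanes offered by all lobes in one quadrant."""
    return LOBES_PER_QUADRANT * FABRICS_PER_LOBE * SATA_LANES_PER_FABRIC


def lanes_at_position(distinguished: bool) -> int:
    """Lanes arriving at one blade position (assumed reading, see lead-in)."""
    lanes = 1 * LOBES_PER_QUADRANT          # 1 lane from each of 4 lobes
    if distinguished:
        lanes += 4 * LOBES_PER_QUADRANT     # plus a set of 4 lanes from each lobe
    return lanes


def lanes_consumed() -> int:
    """Total lanes used across all 8 blade positions."""
    ordinary = BLADE_POSITIONS - DISTINGUISHED_POSITIONS
    return (DISTINGUISHED_POSITIONS * lanes_at_position(True)
            + ordinary * lanes_at_position(False))
```

Both totals come to 96 lanes per quadrant, which is consistent with 16 small-form-factor drives fitting on a distinguished position and 4 drives fitting on any position.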
Each of the 4 GBE ports is connected to a separate 5-port GBE switch chip (not shown), with 1 such chip for each of the 4 lobes in a quadrant, leaving 1 unused switch port which is then provided as an external port.
This allows all 4 lobes in a quadrant to invisibly monitor and share the load for each external GBE port, and each quadrant externalizes 4 such ports (1 per lobe).

The 20 USB ports on each pair of nVidia n3600 chips are distributed to a set of 20 internal USB connectors that provide low-latency access to (cheap) flash-based storage. Up to 4 PCIe/PCI-X bridges (e.g., PEX 8114) account for 8 PCIe lanes from the pair of nVidia n3600 chips in each fabric.
Each of the bridges supports 4 NEC µPD720101 USB 2.0 host controllers, each of which provides a root hub with 5 downstream-facing ports, for a total of 80 additional USB ports per fabric, or 160 additional USB ports per lobe. For modularity, each set of 80 additional USB ports would typically be on a separate PCB, which could be optionally omitted from a particular configuration (say, to achieve cost savings).
By default, the CHARM technology (discussed elsewhere) stores only fractional, compressed, encrypted, FEC-encoded data on each flash drive and disk drive, using multiple lobes and quadrants (plus external nodes) to distribute the information.
In each fabric of a preferred embodiment, 16 PCIe lanes from the pair of nVidia n3600 chips (8 from each chip) are connected to a PCIe switch (e.g., PEX 8548) whose primary purpose is to provide an 8-lane communications path to each of the 3 other lobes in the same quadrant that the lobe under discussion is in.
A reserve of 8 additional PCIe lanes is provided and may be flexibly configured as needed using 1 to 4 ports (e.g., 8x1, 4x2, 4x1 +2x2, 2x4, 4x1 +1x3, 1x4, etc.). There is 1 such PCIe switch for each of the lobe's switch fabrics.
In each fabric of a preferred embodiment, 16 PCIe lanes from 1 chip in the pair of nVidia n3600 chips (i.e., all 16 lanes from 1 chip) are connected to a PCIe switch (e.g., PEX 8548) whose purpose is to provide a 4-lane communications path to the 4 other lobes in each of the 2 other quadrants (for a total of 8 lobes).
There is 1 such PCIe switch for each of the lobe's switch fabrics.
In each fabric of a preferred embodiment, 16 bidirectional PCIe lanes from 1 chip in the pair of nVidia n3600 chips (i.e., all 16 lanes from 1 chip) are connected to a PCIe switch (e.g., PEX 8548) whose purpose is to provide a 4-lane communications path to each of the 8 Outrigger Blades that share the same quadrant as the lobe under discussion. There is 1 such PCIe switch for each of the lobe's switch fabrics. In this preferred embodiment, each of the 8 Outrigger Blades is therefore directly connected to each of the 4 lobes in the same quadrant via a 4-lane communications path on each of the lobe's switch fabrics. Given just 2 switch fabrics as depicted above, this means each blade has 16 lanes (4 lanes to each lobe) on each of 2 fabrics, for a total of 32 bidirectional lanes. At this writing, readily available parts allow for a rate of 2.5 Gbps in each direction, per lane, aggregating to a 32-lane total of 80 Gbps (8 GB/second) in each direction, per Outrigger Blade. In the relatively near term, this can double to a per-blade total of 160 Gbps (16 GB/second) each way as the newest PCIe standard is implemented and deployed, and this embodiment may likely enjoy further speed increases over time. (Note that the per-lane PCIe rates described in this paragraph also apply everywhere else that PCIe is named.)
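
The per-blade bandwidth figures can be reproduced with a short calculation. The sketch assumes that the step from 80 Gbps to 8 GB/second reflects the 8b/10b line encoding used by PCIe at 2.5 Gbps per lane (10 line bits per data byte); the text itself does not state this. The function name and parameters are hypothetical.

```python
def blade_bandwidth(lobes: int = 4, lanes_per_lobe: int = 4, fabrics: int = 2,
                    gbps_per_lane: float = 2.5):
    """Return (lane count, raw Gbps, payload GB/s) per Outrigger Blade, per direction."""
    lanes = lobes * lanes_per_lobe * fabrics
    raw_gbps = lanes * gbps_per_lane
    # Assumption: 8b/10b encoding, so 8 of every 10 line bits carry data,
    # and 8 data bits make one byte.
    payload_gbytes = raw_gbps * 8 / 10 / 8
    return lanes, raw_gbps, payload_gbytes
```

With the defaults this gives 32 lanes, 80 Gbps, and 8 GB/second; doubling `gbps_per_lane` to 5.0 gives the 160 Gbps (16 GB/second) figure mentioned for the newer standard.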
In a preferred embodiment, HyperTransport is used to connect the processor/PUMP configuration in each lobe to at least 1 CRAY SeaStar (or SeaStar2, etc.) chip, which can be used to interconnect any number of SCRAM nodes with each other, and/or with other CRAY systems. In the case of one or more SCRAM nodes acting in the role of an intelligent storage server to a much larger CRAY supercomputer, a minimum of 4 SeaStar-based interfaces is available per quadrant, and each one offers 6 high-speed links with a sustained bidirectional throughput of 6 GB/second (by comparison, a fast FC link - such as a SAN interface - is 4 Gbps, which is more than an order of magnitude slower).

8.2.3 Inter-Quadrant Connectivity
8.2.4 Inter-Lobe Connectivity
The 4 interfaces labeled A,B,C,D on each of the Quadrant 1 modules (depicted above) are available with COTS hardware, on a single-chip system I/O controller, such as the Broadcom HT-2100. In the preferred embodiment, at least four independent switch fabrics are desired, with 4 or more aggregated links to each fabric. Although fewer or more links and fabrics can be used, a minimum of 4 provides robustness and enables graceful degradation in the case of hardware faults or failures. In the figure above each line depicted between the blocks represents either 4 or 8 aggregated links.
For the purposes of both incremental cost scalability and graceful degradation, in the preferred embodiment, the design is intentionally partitioned in such a way that adding modules to the system also increases the capacity for communication among the modules. This is accomplished by ensuring that the necessary fraction of the total capacity is directly available on each module, in contrast with the usual practice of placing the switching fabric on its own modules. Although the wiring of the switch fabric is more complex (especially between the "North" and "South" switches) than would be otherwise necessary, in the preferred embodiment the complexity can be relegated to an entirely passive wiring harness, flex circuit, or PCB.
The HT-2100 has 24 PCIe links with support for up to 5 PCIe controllers, and can thus offer up to 5 independent ports, of which only 4 are needed in the preferred embodiment. Two controllers would each aggregate 8 PCIe links and two would each aggregate 4 PCIe links, using a total of 24 links (out of 24 possible), but only 4 of the 5 available controllers.
The 4 interfaces labeled A,B,C,D above correspond to the 4 PCIe controllers to be used (such as 4 of the 5 available on an HT-2100), and would each comprise either 4 or 8 aggregated PCIe links. Each of the 4 controllers would be connected to a different switch fabric chip (all 4 or 8 of the links assigned to a particular PCIe controller would connect to the same switch fabric chip).
The HT-2100 has two HyperTransport ports (16x) with an integrated tunnel. In a preferred embodiment, the HT-2100 would be interposed on the HyperTransport interface between two Scrutiny PUMP devices, or alternatively, between a MASTER and a PUMP, or between two MASTERs.

In an alternate embodiment, there are a multiplicity of fabrics interconnecting the lobes and quadrants, and for each fabric there are 4 interfaces labeled A,B,C,D on each of the Quadrant 1 lobes (depicted above, and likewise for the lobes not depicted) that can be implemented via a single system I/O controller chip, such as the Broadcom HT-2000.
The number of interfaces (and lanes per interface) is highly dependent upon the particular combination of HyperTransport-to-PCIe bridge chips used. As an example, the HT-2000 has two HyperTransport ports (16x and 8x) with an integrated tunnel. Given a homogeneous combination of HT-2000 chips, for example, its 16x HyperTransport port would be connected upstream to the processor array, and downstream to an optional HyperTransport device (not shown). The HT-2000 also has 17 PCIe lanes with support for up to 4 controllers. In this example, the 4 controllers would each aggregate 4 PCIe lanes, for a total of 16 lanes used. Thus, the 4 interfaces labeled A,B,C,D above would each have its own PCIe controller and each interface would comprise 4 aggregated PCIe lanes. Each of the 4 PCIe controllers would be connected to a different PCIe switch fabric chip (each set of 4 lanes would connect to the same switch fabric chip), as depicted.

8.3 FLAMERouter - Firewall, Link-Aggregator/Multiplexer & Edge Router
8.4 FIRE - Fast Index & Repository Emulator
Read Performance: 640 drives/quadrant @ 25 MBps (assuming somewhat better-than-average USB flash drives of any storage capacity) = 16 GBps (128 Gbps) per quadrant, or 384 Gbps sustained throughput per 3-quadrant chassis (6.1 Tbps per 16-chassis system). The performance constraint here is each individual USB flash drive and link, not the aggregate bandwidths along the other communications paths.
Write Performance: 640 drives/quadrant @ 10 MBps (minimum) = 6.4 GBps (49 Gbps) per quadrant, or 147 Gbps sustained throughput per chassis (2.3 Tbps per 16-chassis system).
The write performance with a "better" device would be approximately 50% higher, at 9.6 GBps minimum per quadrant (based on 15 MBps typical, per drive).
IOPS Performance @ 4KB: Assuming 4KB per I/O and a conservative 10 MBps per drive for both read & write, each drive would be capable of about 2500 IOPS. 640 drives/quadrant @ 2500 IOPS/drive = 1.6 million IOPS per quadrant or over 4.8 million IOPS per 3-quadrant chassis.
These rates compare favorably with the 7680 IOPS per quadrant achievable with high-performance SAS drives, which are 166x slower.
IOPS Performance @ 1KB: When using smaller I/O sizes the number of IOPS increases accordingly. For example, a normal SHADOWS FASTpage index access is 1KB, not 4KB, so the number of IOPS per quadrant is increased by 4X, to 6.4 million IOPS per quadrant. Furthermore, given a "better" USB flash drive (say, for example, 23 MBps vs. 10 MBps, assuming 80% reads and 20% writes), this could improve further by a factor of 2.3X, to more than 14 million IOPS per quadrant. (Note that some of the best drives claim speeds of up to 34 MBps read and 21 MBps write.)
Note that these approximations of I/O rates are tied to the read/write access speeds of the USB flash drives, regardless of the storage capacity of the drives. Thus, small, cheap flash drives can be very effective for improving throughput. Since the system was designed to accommodate a very large number of flash drives, high capacities can be achieved by aggregating many small, cheap drives.
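
The throughput and IOPS estimates above follow from simple arithmetic, sketched here for reference. The sketch assumes decimal units (1 MB = 1,000,000 bytes, 4KB = 4,000 bytes), which is what reproduces the 2500 IOPS-per-drive figure; the function names are illustrative.

```python
DRIVES_PER_QUADRANT = 640


def iops_per_drive(mbytes_per_s: int, io_bytes: int) -> int:
    """I/O operations per second for one drive (decimal units assumed)."""
    return mbytes_per_s * 1_000_000 // io_bytes


def quadrant_iops(mbytes_per_s: int, io_bytes: int) -> int:
    """Aggregate IOPS across all flash drives in one quadrant."""
    return DRIVES_PER_QUADRANT * iops_per_drive(mbytes_per_s, io_bytes)


def quadrant_read_gbps(mbytes_per_s: int) -> float:
    """Aggregate sequential throughput of one quadrant in Gbps."""
    return DRIVES_PER_QUADRANT * mbytes_per_s * 8 / 1000
```

For instance, `quadrant_iops(10, 4000)` gives 1,600,000, `quadrant_iops(10, 1000)` gives 6,400,000, and `quadrant_read_gbps(25)` gives 128.0, matching the per-quadrant figures quoted above.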
Affordability: A single quadrant fully populated with 640 cheap 2GB flash drives ($15 each) would yield more than 1.2 TB of flash memory performing 1.6 million IOPS for under $10,000. If high-performance 4GB drives were used instead (about $30 each), a single quadrant would yield 2.5 TB of flash for under $20,000. However, the better drives are also at least twice as fast (23 MBps vs. 10 MBps, assuming 80% reads and 20% writes), so the performance would double to 3.2 million IOPS for the same $20,000.
Although this slide contrasts the capacities achievable using 2GB and 4GB USB flash drives (yielding 1.2 TB and 2.5 TB of storage, respectively, per quadrant), these are by no means the maximum capacities. Rather, these capacities represent the "sweet spot" where useful capacity is available at a reasonable price. As 8GB flash drives move into the sweet spot, the increased flash-based total capacity is enough to match the total capacity available when using high-performance SAS 64 GB hard disk drives (see previous slide). However, in terms of IOPS, the flash drives are orders of magnitude faster. Even today, however, flash drives are available in capacities up to 64 GB per drive, and these would increase the flash-based storage capacity to more than 40 TB per quadrant, and more than 120 TB for a 3-quadrant configuration.

8.5 NEAR - Nearline Emulation & Archival Repository
8.5.1 CENTRAL CONCEPT
Nominally (excluding internal self-maintenance activities), only one minimally redundant logical copy of the nearline data is spun up anywhere in the system; extra redundant storage is spun down. In a preferred embodiment, the duty cycle approaches k/n, where the storage is FEC-encoded with an (n,k) erasure code to establish redundancy, and where the survival of any k of the n fragments is sufficient to guarantee successful retrieval. Nearline storage is maintained on a spun-down basis.
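
The relationship between the erasure code parameters and the drive duty cycle can be expressed in a few lines. This is only a sketch of the ratio described above; the function names and the example parameters (n=10, k=6) are hypothetical and not taken from the text.

```python
from fractions import Fraction


def nearline_duty_cycle(n: int, k: int) -> Fraction:
    """Fraction of fragment-holding drives that need to be spun up: k of n."""
    if not 0 < k <= n:
        raise ValueError("an (n,k) code requires 0 < k <= n")
    return Fraction(k, n)


def can_retrieve(surviving_fragments: int, k: int) -> bool:
    """Any k surviving fragments are sufficient for retrieval."""
    return surviving_fragments >= k
```

With the illustrative (10,6) code, the duty cycle approaches 3/5, and retrieval still succeeds after the loss of any 4 fragments.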

8.5.2 BASIC CONCEPTS
1. In a preferred embodiment, SHADOWS NEARdrives are used only for nearline storage, and contain only immutable objects (and more precisely, only fragments of objects).
2. Storage drives are preferably dual-ported, or connect to a dual-ported multiplexer.
3. Drives are usually spun down, especially those used by SHADOWS. After a drive is no longer needed for the current session (i.e., after hand-off has occurred, but prior to actual spin-down), the drive's SMART data is analyzed to characterize its failure potential, and then partly based on the results, the drive is tested further, maintained, and possibly repaired, under automated control.
Afterward, it is spun down. Additional detail on this process appears in a later section.
4. Two local physical drives are distinguished as the active drives. Each user's space is available as compressed, secure (encrypted) virtual volumes on these two drives. The two drives are "owned" and managed by separate MASTERs on separate system boards.
5. Active user space is always duplexed/mirrored across the two active drives, which optionally spin down when idle (in a preferred embodiment, this is the default behavior), but with a fairly long delay period (modified dynamically by predictive heuristics that track user behavior).
6. Only a portion of any drive is available as active space (say, for example, 200 GB out of 300 GB
total).
7. All data in user space is also pushed to SHADOWS in accordance with the SLA
and policy, preferences, etc.
8. In the event of an active drive failure, one of the non-active drives is immediately spun up and synchronized.
9. All writes intended for non-active drives are queued onto the active drives. When a safe (i.e., adequately short, but not too frequent) delay has elapsed for a particular non-active drive, it is spun up and synchronized with the active drives (which are then cleared of queued data).
10. Non-active drives are spun up occasionally for data retrieval, if the information cannot be retrieved from the two active drives.
11. The active drives cache mutually exclusive entries (objects) that already exist on the non-active drives. In a preferred embodiment, this is complemented by the presence of flash-based (or equivalent) caching of high-demand objects, also on a mutually exclusive basis. Mutual exclusivity maximizes the caching effect - preventing redundant caching enables more objects to be cached in a given amount of space.
12. The active drives redundantly cache all objects that do not exist on the non-active drives. Such objects are also cached in higher levels of memory and storage, at least until safely distributed and stored in accordance with storage policy (and any applicable SLAs).
13. Periodically (and with a fairly long period), the active drives are rotated (one at a time), so that, except initially, there is always an old one and a new one. The long period minimizes start/stop cycles, and the rotation levels the wear and MTTF.
14. If a system board fails, its buddy assumes its role and spins up one of its non-active drives (making it active), then queues its buddy's data to its own newly active drive. Although it could potentially write directly to its buddy's drive (if dual-ported), the point is that its buddy might not really be dead, and conflict is to be avoided. A buddy's drives can be read-accessed at any time, however, if necessary.
15. After further analysis determines that a buddy has truly failed (and this requires voting, etc., and coordination via BOSS), a system board can indeed take over the buddy's drives, after consensus (under the auspices of BOSS) has put the buddy into a state such that it "can do no harm."
16. During high-performance activities, such as when the nearline storage system is needed to behave like an "online" storage system, more (perhaps all) drives may be spun up (but staggered to reduce inrush current) in order to obtain an amplified striping effect.
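
Items 9 and 10 above (queueing writes for spun-down drives, then synchronizing) can be modeled with a toy sketch. The class `NearlineDriveSet`, its fields, and its methods are hypothetical; the model ignores mirroring, timing heuristics, and failure handling, and simply counts spin-ups to show why queueing keeps them low.

```python
class NearlineDriveSet:
    """Toy model: active drives hold queued writes destined for spun-down drives."""

    def __init__(self, non_active_names):
        # Writes waiting on the active drives, keyed by destination drive.
        self.queued = {name: {} for name in non_active_names}
        # Fragments already synchronized onto each non-active drive.
        self.stored = {name: {} for name in non_active_names}
        self.spin_ups = 0  # start/stop cycles, which the design aims to minimize

    def write(self, drive, object_id, fragment):
        """Queue a write on the active drives; the target stays spun down."""
        self.queued[drive][object_id] = fragment

    def synchronize(self, drive):
        """Spin the target up once, flush its queue, then clear the queue."""
        self.spin_ups += 1
        self.stored[drive].update(self.queued[drive])
        self.queued[drive].clear()

    def read(self, drive, object_id):
        """Serve from the active drives if possible, else spin the target up."""
        if object_id in self.queued[drive]:
            return self.queued[drive][object_id]
        self.spin_ups += 1
        return self.stored[drive][object_id]
```

In this model, any number of writes to the same non-active drive cost a single spin-up at synchronization time, and reads of recently written objects cost none.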

8.5.3 NEARdrive - Preferred Embodiments
Three primary preferred NEARdrive embodiments are envisioned as most useful: a small, vacuum-sealed steel can, a "full-sized" storage "blade" or module, and a "Mini" version that has approximately the same dimensions as a single full-sized 3.5-inch disk drive.

Embodiment         Description
NEARdrive Can™     SAS and/or SATA interface(s), with 2^n drives (2<=n<=3)
NEARdrive Blade™   Multi-fabric PCIe interface, with 2^n drives (n>=2, but typically n=4)
NEARdrive Mini™    SAS and/or SATA interface(s), with 2^n drives (2<=n<=3)

8.5.3.1 NEARdrive Can™
In a preferred embodiment, NEARdrive Can implementations for the following storage configurations would be typical:
Configuration             Description
4-drive, Low cost         4 matching 2.5" SAS or SATA drives, any capacity
6-drive, Hybrid           Same as above, plus 2 matching 1.8" SATA drives
8-drive (double-height)   8 matching 2.5" SAS or SATA drives, any capacity
12-drive, Hybrid          Same as above, plus 4 matching 1.8" SATA drives

Each NEARdrive Can has the same form factor, with the approximate dimensions of 3.25" diameter x 6"H (12"H for double-height can). The standard height can is sufficient to accommodate at least 4 small form factor (SFF) drives (2.5" or smaller), plus 2 smaller 1.8" drives (optional).
In a preferred embodiment, each NEARdrive Can communications interface comprises a single 12-lane data connector that can support any combination of single-ported or dual-ported drives (up to 300 Gbps each in the current state of the practice, but this is not limited by the invention) requiring 12 ports or less. The number of lanes supported is arbitrary and can be reduced or increased as necessary, of course. In the preferred embodiment, one of the goals is to maximize the data throughput and IOPS during peak periods, which requires a minimum of 1 lane per physical drive (2 for dual-ported drives).
In an alternate embodiment that minimizes the host interface requirements, the number of connector lanes is reduced to 1 or 2, and 1 or 2 multiplexers are embedded inside the NEARdrive Can. The multiplexers allow switching between the contained disk drives based on software control mechanisms (SATA or SAS protocols).
Each NEARdrive Can requires an upstream host adapter channel for each lane in the data connector.
Each NEARdrive Blade may optionally be configured to be "intelligent," with its own NEARdrive controller and switching logic, in which case it has its own local MASTER (and possibly includes CHARM processing logic), or it may be "switched only," in which case it operates under the control of a nearby MASTER.

8.5.3.2 NEARdrive Blade™
In a preferred embodiment, NEARdrive Blade implementations for the following storage configurations would be typical:
Configuration        Description
4-drive, Low cost    4 matching full-size (3.5 inch) SAS or SATA drives, any capacity
4-drive, Hybrid      4 matching full-size (3.5 inch) NEARdrive Mini drives, any capacity
16-drive, Typical    16 matching 2.5 inch SAS or SATA drives
16-drive, Hybrid     16 matching 2.5 inch drives (4 SAS, 12 SATA)

Each NEARdrive Blade has the same form factor, with the approximate dimensions of 7"H x 2.5"W (thick) x 9"D. This is sufficient to accommodate up to four full-sized (3.5 inch) drives or at least 16 small form factor (SFF) drives (2.5" or smaller).
Sufficient ports are available so that the drives on each NEARdrive Blade may be single-ported or dual-ported (up to 300 Gbps each in the current state of the practice, but this is not limited by the invention).

Each NEARdrive Blade includes redundant SAS controllers, such that dual-ported drives are connected to independent controllers and switches within the blade.
Each NEARdrive Blade may optionally be configured to be "intelligent," with its own NEARdrive controller and switching logic, in which case it has its own local MASTER (and possibly includes CHARM processing logic), or it may be "switched only," in which case it operates under the control of a nearby MASTER.

8.5.3.3 NEARdrive Mini™
Each NEARdrive Mini typically comprises a NEARdrive controller, and either four matching 2.5-inch SAS/SATA drives or eight matching smaller SAS/SATA drives. With eight smaller drives, a hybrid configuration such as 4 SAS and 4 SATA drives is possible.
Each NEARdrive Mini has a conceptual or actual "buddy" (which may not be available).
Each NEARdrive Mini is responsible for its own 4 or 8 drives, but can, in a preferred embodiment, directly access the 4 or 8 drives of its buddy.

8.5.3.4 NEARFIRE - Hybrid Blade with NEAR & FIRE
NEARFIRE Technology Sketch:

8.5.4 NEARdrive Thermal Stabilization to Avoid Thermal Stress
Given the data from the Google study of 100,000 disk drives, we draw somewhat different conclusions than those of the authors with regard to the effect of temperature on the annual failure rate (AFR) of disk drives in a data center environment. Our view of the Google data is that it indicates that the lowest failure rates occur in the moderate temperature range of 30 °C to 45 °C, and particularly in the range 35 °C to 40 °C. The Google authors conclude that at such moderate temperature ranges it is likely that there are other effects which affect failure rates much more than temperature, and we concur.
The Google authors further observe that temperatures outside this moderate range tend to increase the failure rate. While it is obvious that increased temperatures can increase the failure rate, they had no explanation for why lower temperatures would also apparently increase the failure rate. However, the problematic lower average temperatures cited (15 °C to 30 °C) seem unlikely to occur in spinning drives in a busy data center - it is far more likely that the lower average temperatures indicate the occurrence of thermal cycling (such as what would occur when systems or drives are powered up and down, or when drives are spun down by BIOS power-saving settings). Since there is no apparent electronics phenomenon that would account for an increase in failure rate due to such moderately reduced temperatures (in fact, moderate temperature reductions are normally expected to decrease the failure rate of electronic devices), it is far more likely that the significantly increased failure rate is due to thermally induced stress.
In a preferred embodiment, the NEAR technology can be integrated with the SHADOWS RUBE technology (described elsewhere), so that a working fluid is actively circulated through the NEARdrive (Blade or Mini) and among its components, in order to provide thermal stabilization and thereby minimize thermal stress. In a preferred embodiment, the boiling point of the working fluid is 34 °C (at STP, and slightly higher at mildly elevated pressures, making the effective range approximately 34 °C to 40 °C), and the RUBE technology supplies an appropriate combination of liquid and vapor according to how much cooling or heating is needed.
If the NEARdrive components approach or exceed a target temperature, they act as a heat source and are efficiently cooled by the fluid as it changes phases; on the other hand, if the components drop below this temperature (such as when they are spun down), they act as a heat sink and are efficiently kept heated by the fluid as it phase-changes the other direction - thus, the fluid greatly increases thermal stability. When drives are spun up after a period of non-use, they are already warmed up and ready to go without thermal stress.

8.5.5 NEARdrive Thermal Stabilization to Prevent Thermally Induced Read Errors
Another important impact of NEARdrive's thermal stabilization, with respect to drive reliability, has less to do with outright drive failure and more to do with preventing read errors in the first place. Thermal variation can affect the relative head alignment between writing and reading operations. If the head is directly aligned with the track, performance is relatively good; as the head moves off-track, the performance drops markedly as the magnetic remnant components of previously written data are read back along with the newly-written signal, leading to the potential for increased read errors. Thermal stabilization helps to sidestep this particular threat vector.

Note (thermally induced stress, section 8.5.4): In fact, intentionally induced thermal stress is a primary technique used during accelerated life cycle testing (to force failures).
Note: RUBE (Recuperative Use of Boiling Energy) is part of the FRAME (Forced Recuperation, Aggregation & Movement of Energy) subsystem.

8.5.6 Periodic Analysis of Drive SMART Data
Industry-standard disk drives provide various self-monitoring signals that are available through the SMART standard interface. SMART detects and reports on various indicators of reliability. SMART enables a host processor to receive analytical information from the disk drive that may be useful for anticipating failures.
Industry-based empirical analysis of a very large number of drives in a well characterized data center environment indicates that some signals appear to be more relevant to the study of failures than others, and this is confirmed by the Google study referred to in the previous section.
In a preferred embodiment, the NEARdrive analysis of SMART data available from each disk drive focuses first on indicators whose "critical threshold" values were established with high confidence (>95%) by the Google study, as summarized in Table 8.5.6-1:
Table 8.5.6-1 SMART Indicators with High-Confidence Critical Threshold Values
Indicator Description            Critical Threshold           Observed Consequence
Background disk surface scans    Scan Errors > 0              39x more likely to fail within 60 days
Background scrubbing             Offline Reallocations > 0    21x more likely to fail within 60 days
Suspected bad sectors            Probational Count > 0        16x more likely to fail within 60 days
Sectors remapped on-the-fly      Reallocation Count > 0       14x more likely to fail within 60 days

The Google study noted, however, that it is unlikely that SMART data alone can be effectively used to build models that predict failures of individual drives, given that over 36% of all failed drives had zero counts on all four of these SMART variables. Thus, while a non-zero count is highly predictive of imminent failure of the corresponding drive, a zero count does not ensure that all is well. However, the NEAR technology can put this information to good use, as described in section 8.5.8.
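The threshold test of Table 8.5.6-1 can be sketched as follows. This is a minimal illustration, not the NEARdrive implementation: the field names, the `SmartCounts` class and the helper functions are hypothetical, and the way the counters are read from a drive is out of scope. Only the four indicators, their "> 0" thresholds and the observed multipliers come from the table.

```python
from dataclasses import dataclass

# Observed relative likelihood of failure within 60 days when the
# corresponding counter is non-zero (values from Table 8.5.6-1).
RISK_MULTIPLIER = {
    "scan_errors": 39,
    "offline_reallocations": 21,
    "probational_count": 16,
    "reallocation_count": 14,
}


@dataclass
class SmartCounts:
    """Hypothetical container for the four counters of one drive."""
    scan_errors: int = 0
    offline_reallocations: int = 0
    probational_count: int = 0
    reallocation_count: int = 0


def tripped_indicators(counts: SmartCounts) -> list:
    """Names of indicators whose critical threshold (> 0) is exceeded."""
    return [name for name in RISK_MULTIPLIER if getattr(counts, name) > 0]


def is_at_risk(counts: SmartCounts) -> bool:
    """True if any indicator is non-zero.

    A False result only means there is no SMART evidence of risk; it does
    not mean the drive is healthy, since many failed drives had zero counts.
    """
    return bool(tripped_indicators(counts))
```

A drive with `SmartCounts(scan_errors=2)` would be flagged at-risk, while a drive with all-zero counts would simply be reported as having no SMART evidence of risk.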

8.5.7 Predictive Statistical Properties of Disk Drive Failures
In a 2006 study of large-scale supercomputer clusters and ISPs, CMU researchers analyzed the data from a 765-node supercomputer cluster with 3,060 CPUs, 3,060 DIMMs, 765 motherboards, 3,406 disk drives, and other components - over a 5-year period (the useful life of a disk drive). The CMU researchers were able to draw a number of important conclusions (which are applied in a novel way by the NEAR technology, as described in section 8.5.8):
• Disk drive failure was the third most likely cause of node outage, accounting for 16% of such failures, with approximately 90% of the drive failures being permanent, and thus requiring time-consuming and expensive repair actions. Although CPU failures and DIMM (memory) failures accounted for more node outages (44% and 29%, respectively), they were "only" transient failures mostly triggered by parity errors* that required "just" a reboot to bring the failed node back up.
• Comparing the relative frequency of hardware component failures that required replacement, the four known hardware components that failed most frequently, in descending order, were disk drives (30.6%), memory (28.5%†), CPUs (12.4%), and motherboards (4.9%). Miscellaneous and unknown replacements accounted for another 14.4%.
• Even during the first few years (<3 years) of a system's lifetime, when wear-out is not expected to be a significant factor, the datasheet MTTF and observed MTTF can vary by as much as a factor of 6.
• Contrary to common and proposed models, disk drive failures don't enter steady state after the first year of operation. Instead, failure rates seem to steadily increase over time.
• Early onset of wear-out seems to have a much stronger impact on lifecycle failure rates than infant mortality, even when considering only the first 3 or 5 years of a system's lifetime. The underrepresentation (in datasheets) of the early onset of wear-out is a much more serious factor than the underrepresentation of infant mortality.
• Disk drive failures exhibit significant levels of autocorrelation and long-range dependence, so their statistical properties do not form a Poisson process as is commonly assumed. The failure rate in one time interval is predictive of the failure rate in the following time interval. Thus, a week that follows a week with a "small" number of failures is more likely to see a small number of failures than a week that follows a week with a "large" number of failures.
• Disk drive failures are not realistically modeled by an exponential distribution as is commonly assumed, but rather, are characterized by higher levels of variability and decreasing hazard rates (the empirical distributions are fit well by a Weibull distribution with shape parameter less than 1).
The decreasing hazard rate function predicts that the expected remaining time until the next failure grows with the time since the last failure. It is observed, for example, that right after a failure, the expected time until the next failure is around 4 days. After surviving for 10 days without failures, the expected remaining time until the next failure grows from 4 days initially to 10 days. After surviving a total of 20 days without failures, the expected time until the next failure grows to 15 days.

* According to the CMU researchers, the number of errors was too large to be corrected by the embedded ECC. This fact is particularly relevant for SHADOWS, since it provides further justification for the SHADOWS ECC and EEC-based error correction strategies, described elsewhere.
† From our own experience with memory systems, we believe that the failure-induced memory replacement rates are likely to be artificially high, since DIMMs are often replaced in an attempt to stem an unexpectedly high number of ECC errors, despite the fact that the errors are actually transient and can only be overcome by improving the ECC correction capability, which is the very premise of Chipkill-style ECC, with its hundred-fold reduction in uncorrectable errors. It is very likely that with an appropriate level of ECC, memory failure rates would drop significantly below CPU failure rates, which is highly intuitive, since CPUs tend to run hot and are more likely to suffer from thermal-induced failures. In the event that inferior ECC does not sufficiently account for the memory failure rates, there is a strong likelihood that memory overheating is a significant factor, and such a problem usually must be dealt with as a design consideration.
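The relationship between a Weibull shape parameter below 1, a decreasing hazard rate, and a growing expected remaining time can be checked numerically. The sketch below is illustrative only: the shape and scale values used in the usage note are arbitrary assumptions and are not fitted to the CMU data, and the mean residual life is obtained by simple trapezoidal integration of the survival function rather than by any method named in the study.

```python
import math


def survival(t: float, shape: float, scale: float) -> float:
    """Weibull survival function S(t) = exp(-(t/scale)^shape)."""
    return math.exp(-((t / scale) ** shape))


def hazard(t: float, shape: float, scale: float) -> float:
    """Weibull hazard rate for t > 0; decreasing in t when shape < 1."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def mean_residual_life(t: float, shape: float, scale: float,
                       step: float = 0.01, tail_cutoff: float = 1e-12) -> float:
    """Expected remaining time until failure, given survival up to t.

    Computes (1/S(t)) * integral of S(u) du from t to infinity with the
    trapezoid rule, stopping once the integrand is negligible.
    """
    s_t = survival(t, shape, scale)
    total = 0.0
    u = t
    prev = s_t
    while prev > tail_cutoff * s_t:
        u += step
        cur = survival(u, shape, scale)
        total += 0.5 * (prev + cur) * step
        prev = cur
    return total / s_t
```

With an assumed shape of 0.7 and scale of 3.0 days, `mean_residual_life` grows as `t` goes from 0 to 10 to 20 days, whereas with shape 1.0 (the exponential case) it stays equal to the scale regardless of `t`.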

8.5.8 Load-Shifting Away from Failed and At-Risk Drives
In a preferred embodiment of the NEAR technology, the highly predictive non-zero SMART data counts referred to in section 8.5.6 and inferences from the findings summarized in section 8.5.7 both mandate and enable direct preventive action (rather than only remedial action) for the corresponding drive, in advance of drive failure. In particular, such a drive can be (and should be) treated exactly as if it has just encountered an actual failure, with the exception that the drive may not have actually failed yet, and such actual failure may in fact be preventable, or at least deferrable.
In a preferred embodiment of the NEAR technology, a predicted or actual drive failure causes the data storage and retrieval responsibilities of the failed or "at-risk" drive to be immediately shifted to other drives.
In a preferred embodiment, if said responsibilities cannot be shifted for all at-risk drives relatively immediately, then load-shifting occurs for any failed drives first, and the relative risk among the at-risk drives can be used to determine the load-shifting order among the at-risk drives. In a preferred embodiment, the at-risk drives may be prioritized by the relative risk apparently (but not actually) implied by their relative SMART Indicator values, with consideration given to other indicators and risk information that may be available.
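The ordering rule described above (failed drives first, then at-risk drives by descending relative risk) can be sketched as a simple sort. The `DriveStatus` type and the `relative_risk` score are hypothetical; how a risk score is derived from SMART indicators and other information is left open by the text and is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class DriveStatus:
    drive_id: str
    failed: bool
    relative_risk: float  # hypothetical score; 0 means no known risk


def load_shift_order(drives):
    """Drives needing load-shifting: failed first, then highest risk first."""
    candidates = [d for d in drives if d.failed or d.relative_risk > 0]
    # False sorts before True, so `not d.failed` places failed drives first;
    # the negated risk gives descending order among the at-risk drives.
    return sorted(candidates, key=lambda d: (not d.failed, -d.relative_risk))
```

Drives that are neither failed nor at risk are omitted from the result, since they are the ones that receive the shifted responsibilities.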
In a preferred embodiment, failed or at-risk drives are left spun up, if possible, and then subjected to a pre-spin-down drive analysis and maintenance cycle as described in section 8.5.9.
For each drive, if the cycle is successful, the drive is spun down the same as if no SMART errors or failure had been detected, and it is left in the normal drive rotation for later use. For any drive where the cycle is unsuccessful, including mechanical failure or burn-out, the drive is permanently de-powered and taken out of the normal drive rotation.
In a preferred embodiment of the NEAR technology, the highly predictive non-zero SMART data counts referred to in section 8.5.6 can be used in conjunction with any observed failures to trigger an elevated risk level, by predicting the increased relative risk of disk drive failures, especially among those drives sharing one or more common dependencies (e.g., same module, same thermal environment, same vibrational environment, same power environment, same EMP environment, etc.).
In particular, if any drive (e.g., in a NEARdrive Blade or NEARdrive Mini) has experienced an actual failure, or such a failure is (or was) predicted by a non-zero count for any of the four SMART Indicators listed in Table 8.5.6-1, then there is an increased risk that some number of other drives (at the very least, those with mutual dependencies) may fail as well, within the predictive time period.
In a preferred embodiment of the NEAR technology, this situation triggers a drive analysis and maintenance cycle for each of the drives in the collective group (e.g., those in the affected drive rotation). One-by-one, the data storage and retrieval responsibilities of each such drive are immediately shifted to other drives that are known-good or otherwise not at risk. If the cycle is successful, the drive under test is restored to normal operation and left in the normal drive rotation for immediate or later use.
For any drive where the cycle is unsuccessful, including mechanical failure or burn-out, the drive is permanently de-powered and taken out of the normal drive rotation.

Note (expected time until the next failure, section 8.5.7): This contrasts with the prediction under an exponential distribution, where the expected remaining time stays constant.

8.5.9 Pre-Spin-Down Drive Analysis and Maintenance
In a preferred embodiment of the NEAR technology, an automated pre-spin-down drive analysis and maintenance of every disk drive is executed on a periodic basis as described herein, and also executed when triggered in accordance with proactive risk management activity such as that described in 8.5.8.
It is generally accepted among knowledgeable intelligence professionals that it is effectively impossible to "sanitize" (i.e., "securely erase") disk storage locations by simply overwriting them, no matter how many overwrite passes are made or what data patterns are written. Each track contains an image of everything ever written to it, but the contribution from each "layer" gets progressively smaller the further back (in time) it was made.
Although we conceptualize writing each bit to a disk drive as either a logical one or a zero, the actual effect is closer to obtaining a 0.95 when a zero is overwritten with a one, and a 1.05 when a one is overwritten with a one. Normal disk circuitry is set up so that both these values are read as ones, but using specialized algorithms, drive capabilities, and/or specialized circuitry, it is possible to determine the information stored in previous "layers" (i.e., due to previous writes). In a preferred embodiment of the NEAR technology, this fact is exploited by painstakingly microstepping the drive and reading and rereading the signal from the analog head electronics (essentially oversampling it by rereading tracks with slightly changed data threshold and window offsets and varying the head positioning by a few percent to either side of the track), synthesizing the oversampled waveform, and analyzing it in software (possibly with the help of reconfigurable or dedicated logic) to generate an "ideal" read signal and subtract it from what was actually read, leaving as the difference the remnant of the previous signal (i.e., the "recovered" data).
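The subtraction step described above can be illustrated with a toy model. This is only a sketch under strong assumptions: real read channels are far more complex, the function name is hypothetical, and the behavior for a currently stored zero (about 0.05 over a previous one, about -0.05 over a previous zero) is a symmetric assumption that the text does not state. Only the 0.95 and 1.05 figures for a currently stored one come from the description.

```python
from statistics import fmean


def recover_previous_layer(rereads, threshold=0.5):
    """Estimate current and previous bits from oversampled analog reads.

    `rereads` holds, for each bit position, a list of analog values obtained
    by rereading that position. The rereads are averaged, the current bit is
    decided by thresholding, an "ideal" signal (exactly 0.0 or 1.0) is
    subtracted, and the sign of the remnant indicates the previous bit.
    """
    current, previous = [], []
    for samples in rereads:
        level = fmean(samples)          # synthesize the oversampled value
        bit = 1 if level >= threshold else 0
        remnant = level - float(bit)    # what is left after removing the ideal signal
        current.append(bit)
        previous.append(1 if remnant > 0 else 0)
    return current, previous
```

Under this model, averaged levels of 1.05, 0.95, 0.05 and -0.05 decode to current bits 1, 1, 0, 0 and previous-layer bits 1, 0, 1, 0.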
As each sector is sampled, its apparent content is saved, so that it can be restored or moved as needed:
• In the case of a sector with read errors, meaning that the current data is unreadable by normal means, the analysis process essentially described in the previous paragraph is used to determine what the sector's data should have been (i.e., what it used to be, before the errant bits occurred). Once this is known, the sector can be carefully re-written, this time with improved margins (or moved, if its present location needs to be mapped out as a "bad sector"), thereby completing the recovery and heading off the risk of future read errors.
• In the case of a sector having no errors, the sector can still be carefully re-written with improved margins, thus proactively heading off the risk of future read errors.
The aforementioned process can easily be combined with, or replaced by, other data recovery algorithms and processes known in the industry, as appropriate, in order to enhance the survivability of the data stored via the NEAR technology, and without detracting from its most important properties, namely, the capability to recover from some number of otherwise unrecoverable errors, the capability to proactively prevent some number of errors that may otherwise be encountered, and the capability of accomplishing said error recovery and error prevention on a fully automated basis without encountering any unplanned downtime or interruption of service.

8.5.10 On-The-Fly Drive Analysis and Maintenance
In a preferred embodiment of the NEAR technology, a disk read error (i.e., one that has not yet caused the corresponding drive to be labeled as having failed) triggers an "on-the-fly" drive analysis and maintenance cycle that is limited to the immediate sector(s) corresponding to the read error, beginning with the errant sector, while also triggering the full drive analysis process as described in the previous section. The idea is to attempt to complete the current access, even if a delay is required.
Because every NEARdrive stores only fragments of objects, and a sector error can affect at most one such fragment, the analysis-induced delay cannot add any latency to the overall storage/retrieval operation, except in the case where it corresponds to the "swing vote" (i.e., an extremely unlikely scenario involving multiple failures, where no other fragments are available to help complete the operation).

Note that, unlike offline intelligence analysis which can take advantage of sophisticated equipment to recover data from multiple previous writes, the NEARdrive recovery analysis is limited by the sensitivity and precision of the drive electronics, and thus, under normal circumstances, can recover only the previous layer.

9 CORE - Computation, Optimization, & Reasoning Engines
9.1 CORE Concepts
See glossary entry.

9.2 FACTUAL - Frequency-Adaptive Computation Table & Use-Adaptive Lookup
FACTUAL capability is a "memoization" system designed to operate at global scale and supercomputing speed, with the high levels of security and survivability commensurate with the SHADOWS infrastructure.
"Memoization" is essentially the capability of looking up known results of deterministic processes and/or functions rather than recomputing them from scratch.
Because each SHADOWS artifact and each process has its own identity, whenever a deterministic process or function accepts a particular set of input values and produces a deterministic set of output values, we can treat the set of input values and the specific process identity as a new artifact having an identity of its own. We can likewise treat the set of output values as an artifact, with an identity. "Memoization" then becomes a conceptually simple matter of establishing a "pairing" between the input/process identity and the output identity, such that any already-known output can be looked up and identified. Thus, given any input/process identity, it can be determined (through a lookup) whether the result has been previously computed, and if so, what its identity is.
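The pairing between an input/process identity and an output identity can be sketched with a small in-memory table. This is a local, single-node illustration only: the `MemoTable` class and `identity` function are hypothetical names, and deriving identities from a SHA-256 hash of a canonical JSON encoding is an assumption made for the example, since the text does not say how SHADOWS identities are formed. Team-based vetting and distribution are not modeled.

```python
import hashlib
import json


def identity(obj) -> str:
    """Assumed identity scheme: SHA-256 over a canonical JSON encoding."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class MemoTable:
    """Pairs an input/process identity with the identity of its output."""

    def __init__(self):
        self._pairings = {}  # input/process identity -> output identity
        self._content = {}   # output identity -> output values

    def lookup(self, process_id, inputs):
        """Return the output identity if previously computed, else None."""
        return self._pairings.get(identity({"process": process_id, "inputs": inputs}))

    def record(self, process_id, inputs, outputs):
        """Store a result and pair it with its input/process identity."""
        out_id = identity(outputs)
        self._pairings[identity({"process": process_id, "inputs": inputs})] = out_id
        self._content[out_id] = outputs
        return out_id

    def fetch(self, out_id):
        """Retrieve the content behind an output identity."""
        return self._content[out_id]
```

Note that a lookup returns only the identity of the result; fetching the content is a separate step, which mirrors the distinction the text draws between identifying a memoized result and moving its content to where it is needed.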
Teams are used to perform the processing required to arrive at previously unknown results, and to reach consensus on "vetted" results prior to memoization (which is particularly important for FACTUAL, because memoized results can be reused as authoritative results that sidestep process execution). As with any artifact, the various content and identities associated with memoized results need to be stored, which involves teams on a SHADOWS-wide basis, as does the lookup of memoized results.
If it cannot be readily determined (on a local basis) whether a memoized result exists, the problem to be solved is queued for processing, but can normally be dequeued if a vetted, memoized result is obtained prior to the start of execution. A memoized result that is obtained after execution has already started can be used as a test oracle to verify the result, thereby serving as a built-in system integrity check. Memoization of results, and whether to use lookup of memoized results, is context-specific and configurable at the process level or process-family level. In general, lookups of memoized results may not be utilized when such lookups consume more resources than would be required to simply recompute the results, unless such lookups reduce a local processing load by shifting the lookup elsewhere. The lookups of memoized (and therefore already-known) results are also vetted, by virtue of the fact that lookups (like other operations) are handled by geographically distributed teams that are difficult to attack. Not only must a distributed team reach consensus on the identity of the memoized result, but other distributed teams are typically involved in moving a copy of the content of the identified result to where it is needed, and in all cases the recipient(s) can determine the degree to which consensus was reached in each step. The availability of memoized results is also very helpful in cases of Byzantine failure that would otherwise hamper the achievement of vetted results.
As noted in section 7.3.1, "Load-Balancing SHADOWS Native Processes," on page 46, in general, SHADOWS native processes do not push data around as loads are shifted and requests are made, etc.
Instead, IDs are pushed around, and if a process actually needs the associated data, it can request it (on a "pull" basis), or, if there are no other operands, just forward the request to the team that owns the data (resources permitting). The act of pushing an ID, however, has the effect of putting the team owning the associated data on notice that it may be needed soon, essentially identifying the ID as a speculative prefetch opportunity.
As previously noted, in its simplest form, a SHADOWS native process has an input queue and an output queue, as depicted above, on the left. Ignoring security issues, the process accepts tuples of the form { TxID, Operand ID List } on its input queue, performs the work of the process, which is to generate one or more Result IDs, then enqueues them for distribution. The transaction ID (TxID) ties the Operand IDs (received as input) to the Result IDs associated with the processing results.
The FACTUAL system works as depicted above, on the right, by intercepting the input queue before it is seen by the aforementioned process (i.e., the one described earlier in section 7.3.1, which is depicted above as a thumbnail illustration). The input queue accepts tuples of the form { TxID, Operand ID List }. An interior lookup process immediately fires off requests to find out if the current process (which must be deterministic) has previously computed a result for the Operand IDs, requesting that the results (i.e., a "hit" with results, or a "miss" with nothing) be sent along to a specific destination team responsible for collecting results (which may be the current team or some other one). A message including the { TxID, Operand ID List } tuple is also sent to the specified "collect results" destination team to put it on notice that data for the particular TxID may soon be arriving (unexpected data can trigger defensive behavior), and to the team responsible for throttling the actual execution of the process if already-computed results aren't found "soon enough" (note that the throttling team is co-located with the actual execution process, and has read and update access to its input queue).
As lookup results, if any, are received, they are collected and a "hit" or "miss" determination is made, with a message to that effect being sent to the team responsible for throttling the actual execution of the process, and also to a "prefetch" process (if the result is a hit, then the ID of the results is known, but not the actual results, so the prefetch sends a speculative prefetch message to the appropriate team; if the result is a miss, then a (non-speculative) prefetch message with the { TxID, Operand ID List } tuple is sent to the appropriate team, because the team responsible for actual process execution may be requesting the operands shortly).
When the team responsible for throttling the actual execution of the process receives the { TxID, Operand ID List } tuple, it immediately enqueues it internally to the co-located execution process input queue. If a "hit" message occurs, the input queue is checked. If processing has not started, the entry is dequeued because the result is already known. If the processing has already started, it is left to run, and its result can be verified against the known result. In any case, the "hit" message can be forwarded to the "Vet Results" process team. If the co-located execution completes, its results are also forwarded to the "Vet Results" process team. Validated results can be placed into the normal process output queue. If results were calculated that were previously not seen, they can be posted for update, in order to support a future computation.
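The throttling behavior described above (dequeue on a hit if execution has not started, otherwise let execution finish and compare) can be sketched on a single node. The `Throttle` class and its method names are hypothetical, and the `vet` method is a deliberately crude stand-in that only checks whether all reported Result IDs for a TxID agree; the team-based consensus that the text describes is not modeled.

```python
from collections import OrderedDict


class Throttle:
    """Single-node stand-in for the throttling team and its input queue."""

    def __init__(self):
        self.pending = OrderedDict()  # TxID -> operand IDs, not yet started
        self.running = {}             # TxID -> operand IDs, execution started
        self.to_vet = []              # (TxID, source, result ID) for "Vet Results"

    def enqueue(self, tx_id, operand_ids):
        self.pending[tx_id] = operand_ids

    def start_next(self):
        """Begin executing the oldest pending entry; return its TxID or None."""
        if not self.pending:
            return None
        tx_id, operand_ids = self.pending.popitem(last=False)
        self.running[tx_id] = operand_ids
        return tx_id

    def on_hit(self, tx_id, result_id):
        """Handle a "hit": dequeue if not started; always forward for vetting."""
        dequeued = self.pending.pop(tx_id, None) is not None
        self.to_vet.append((tx_id, "lookup", result_id))
        return dequeued

    def on_executed(self, tx_id, result_id):
        """Handle completion of local execution; forward for vetting."""
        self.running.pop(tx_id, None)
        self.to_vet.append((tx_id, "execution", result_id))

    def vet(self, tx_id):
        """Simplified vetting: all reported result IDs for tx_id must agree."""
        ids = {rid for t, _, rid in self.to_vet if t == tx_id}
        return len(ids) == 1
```

When a hit arrives for an entry that is already running, the entry is left to run, and the lookup result then acts as a test oracle for the executed result, as described in the text.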

9.3 FASTpage (Fast Associative Search Tree, pageable)
9.3.1 KEY DEFINITIONS
Child. A node of a tree referred to by a parent node. Every node, except the root, is the child of some parent.
Internal node. A "simple" node of a tree that has one or more child nodes; equivalently, one that is not a leaf. All FASTpage internal nodes are allocated beginning with an ordinate of 0 and incrementing their ordinate value, and therefore exhibit heap-like growth behavior.
Node. (1) A unit of reference in a data structure. Also called a vertex in graphs and trees. (2) A collection of information which must be kept at a single memory location.
String. A list of characters, usually implemented as an array. Informally a word, phrase, sentence, etc.
Since text processing is so common, a special type with substring operations is often available. Note: The term string usually refers to a small sequence of characters, such as a name or a sentence. The term text usually refers to a large sequence of characters, such as an article or a book.
Tree. (1) A data structure accessed beginning at the root node. Each node (including the root) is either a leaf or an internal node. An internal node has one or more child nodes and is called the parent of its child nodes. All children of the same node are siblings. Contrary to a physical tree, the root is usually depicted at the top of the structure, and the leaves are depicted at the bottom. (2) A connected, undirected, acyclic graph. It is rooted and ordered unless otherwise specified.
Trie. (pronounced "tree"). Also: "Digital Search Trie." A tree for storing strings in which there is one node for every common prefix. The strings are stored in extra leaf nodes. Note: The name comes from reTRiEval and is pronounced, "tree."
Root. The distinguished initial or fundamental node of a tree. The only node which has no parent.

Parent. The tree node conceptually above or closer to the root than a particular node (the child node) and which has a link to the child node.
Leaf. A node in a tree, but without any children. In a FASTpage index, a leaf is also a "compound" node that may include both an external reference (i.e., the proper data of a "leaf") and also an internal reference (i.e., a "follow-on" reference to the next internal node). Note: Every node in a tree is either a leaf or an internal node. All FASTpage leaf nodes are allocated beginning with an ordinate of 255 and decrementing their ordinate value, and therefore exhibit stack-like growth behavior.
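The two allocation rules in the definitions above (internal nodes counting up from ordinate 0, leaf nodes counting down from ordinate 255) can be sketched as a two-ended allocator. The class name is hypothetical, and two points are assumptions rather than statements from the text: that both node kinds share a single range of 256 ordinates per page, and that the page is full when the two ends meet.

```python
class FastPageNodeAllocator:
    """Two-ended ordinate allocator for one page (assumed 256 ordinates)."""

    def __init__(self):
        self.next_internal = 0    # internal nodes grow upward from 0
        self.next_leaf = 255      # leaf nodes grow downward from 255

    def free_slots(self) -> int:
        """Ordinates not yet handed out (assumes a shared 0..255 range)."""
        return self.next_leaf - self.next_internal + 1

    def alloc_internal(self) -> int:
        if self.free_slots() == 0:
            raise MemoryError("page full")
        ordinate = self.next_internal
        self.next_internal += 1
        return ordinate

    def alloc_leaf(self) -> int:
        if self.free_slots() == 0:
            raise MemoryError("page full")
        ordinate = self.next_leaf
        self.next_leaf -= 1
        return ordinate
```

Under these assumptions, any mix of internal and leaf nodes can fill the page, because neither kind has a fixed share of the ordinate range.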

9.3.2 CENTRAL CONCEPT
The idea behind FASTpage™ is to create a fast, highly scalable, associative memory mechanism that can adapt to the information to be remembered, in order to optimize both time and space. Each FASTpage implementation supports an arbitrary number of independent local search spaces, limited only by local storage capacity. Each FASTpage search space may be individually defined to be either transient or persistent, with individually specifiable survival requirements.
In essence, FASTpage™ is a fast, efficient associative memory mechanism that can also persist indefinitely.
Although the persistence properties of a FASTpage index can be achieved by any common data storage means, the concept is designed to capitalize on the very high performance of solid state disk (SSD) drives (and SSD-accelerated storage systems) in general, and the Scrutiny FIRE and NEAR technologies in particular. In a preferred embodiment, FASTpage indexes take advantage of minimal NVRAM resources and efficiently use flash memory to amplify their high performance. The fact that flash memory can typically endure only a limited number of write-cycles is fully accommodated within the internal FASTpage mechanisms (which cannot approach such limits, by design), and does not negatively impact FASTpage indexes in any way. FASTpage indexes can also take full advantage of means that have no such limits.
The FASTpage concepts as described in this brief assume a strictly local implementation (no distributed properties are described). However, any FASTpage implementation can participate in higher level distributed architectures, such as SHADOWS and CHARM§ (of which FASTpage is a component). Given a FASTpage implementation that is participating in such a distributed architecture, then each FASTpage search space can also participate, on an individually selectable basis, in a higher-level, distributed search space.
FASTpage indexes can be implemented relatively easily in hardware or software, while avoiding the negative attributes of various traditional associative search mechanisms. Nonetheless, the FASTpage mechanism was inspired by the respective individual benefits normally attributed to memory tries, binary search, binary trees, splay trees, ternary search trees, hash tables, distributed hash tables, and Bayer-tree variants. In particular, the ternary search tree (TST) serves as the conceptual jumping-off point for understanding the FASTpage search concepts. It is assumed that the reader is generally familiar with the properties of these traditional search mechanisms.

9.3.3 BASIC CONCEPTS
1. FASTpage indexes combine the properties of a ternary search tree (TST) and a digital search trie (spelled "Trie," but pronounced "tree"), taking advantage of their in-memory search performance, while adding persistence with a page-sized storage unit (typically a convenient multiple of a 512-byte sector).
2. TST concepts are central to FASTpage, especially the property that "not found" conditions in string searches are determined, on average, faster than equivalent searches with a hash table.
Also important is that, unlike hash tables, FASTpage requires no reorganization to accommodate growth.

* FIRE (Fast Index & Repository Emulator) is the technology underlying a FIREblade™ or FIREdrive™. It provides high-performance all-electronic, long-term data storage that is immune to mechanical wear and vibration (including seismic events). The stored data is safe from intruders even if stolen. The number of read/write accesses per second is orders of magnitude faster than hard disk drives.
† NEAR (Nearline Emulation & Archival Repository) is the technology underlying a NEARblade™ or NEARdrive™. It provides high-capacity, electronically assisted long-term data storage that is subject to minimal mechanical risk (including wear, vibration, and seismic events), due to significantly reduced mechanical duty cycle. The stored data is safe from intruders even if stolen. The number of read and/or write accesses per second is orders of magnitude faster than unassisted hard disk drives.
‡ NVRAM (Non-Volatile Random Access Memory) tends to be expensive and of fairly low capacity, and is therefore somewhat of a precious resource. A key NVRAM property is that it usually supports an unlimited number of read/write cycles and potentially long-term data retention (always long enough to survive a reboot, and sometimes as long as 10 years).
§ CHARM (Compressed Hierarchical Associative Relational Memory) is itself a component of SHADOWS (Self-Healing Adaptive Distributed Organic Working Storage). FASTpage, CHARM, and SHADOWS are trademarks of Scrutiny, Inc.
The three most well-known variants of Bayer trees are commonly known as B-trees, B+trees, or B*trees.

3. Unlike traditional TSTs, a FASTpage TST requires only 4 bytes per internal node (1 byte for each of the four indices: split, left, equal, right). Each FASTpage page has sufficient space for exactly 256 nodes, so that each index can refer to any node. A typical FASTpage TST page requires only 1 KB of memory and/or storage.
4. Trie algorithms are known for their extremely fast in-memory search speeds, but at the expense of explosive sparse memory requirements. When the set of keys is sparse, i.e. when the actual keys form a small subset of the set of potential keys, as is very often the case, many (most) of the internal nodes in the Trie have only one child, which wastes memory. FASTpage Trie pages usually start out as FASTpage TST pages, however, and these may be densely populated (each node requires only about 4 bytes on average, which is perhaps one-third the space of a classic TST node). When sufficient leading-byte diversity exists, the TST is converted to a Trie.
5. FASTpage attempts to diversify the nodes on a page (given a common string up to a particular character position), with the ultimate goal of collecting all of their descendents into a Trie node. A FASTpage TST page is converted to a FASTpage Trie page at the point when the leading byte of its first node corresponds to the lowest possible byte value (in the associated set of possible keys) and the number of keys with diverse but contiguously sequential leading bytes in the same node exceeds a threshold.
Any nodes with non-contiguous leading bytes are allowed to remain in the Trie page, but are moved to their respective "proper" locations (based on their leading byte value).
6. As part of the conversion from a FASTpage TST page to FASTpage Trie page, any successor keys based on non-leading bytes (regardless of whether their corresponding predecessors were contiguous) are relegated to their respective lower-level successor pages.
7. A "complete" FASTpage Trie page is obtained when the highest-valued possible differing byte is also included in the contiguous sequence. Any byte at that position can then serve as an 8-bit index into the FASTpage Trie page, and thus it can be used to directly obtain a reference or pointer to the appropriate descendent page.
8. Once a "complete" FASTpage Trie page is created, it becomes immutable (never changes). Immutability of Trie pages provides significant performance benefits for caches and survivability benefits for limited-write devices such as flash-based storage.
9. FASTpage metadata can be embedded in each page, but in a preferred embodiment is stored elsewhere. In particular, the metadata is maintained in a separate set of pageable, persistent storage (co-located with the corresponding FASTpage pages) that can be indexed by the FASTpage page number.
10. Each FASTpage page has an associated metadata descriptor comprising a page type (e.g., TST, Trie, etc.), page size (optional), node size (e.g., 4 or 8 bytes), reference size (e.g., 2, 4, 8, or 16 bytes), the most-significant portion held in "common" by the majority of leaf node reference addresses, current indices for the stack and heap, access control & security barrier information, data validation information (optional), and one or more indicators of progress toward TST-to-Trie conversion.
11. The metadata describing the most-significant "common" portion, if any, of the external references associated with the majority of leaf nodes on the page is used to factor out that common portion, which typically reduces leaf node memory requirements by half, thereby allowing more nodes to be stored and increasing the page density.
12. All of the FASTpage pages at a particular location may be subordinated to pages held elsewhere, such that the location itself (to an arbitrary level of detail) may comprise a portion of the actual search key. One effect of this is that of natural key partitioning. Another effect is storage space conservation by not having to store (or process) a portion of a key that is held in common by all the keys at a particular location.
13. Leaf nodes are not fixed in size, but may consume 1 or 2 (or more) 4-byte entries, as required, in order to contain their variable-length external reference information (plus a 1-byte follow-on index to an internal node). A single 4-byte entry contains a 1-byte internal index plus a 1-, 2-, or 3-byte external reference. A double 4-byte entry (8 bytes) contains a 1-byte internal index plus a 4-, 5-, 6-, or 7-byte external reference. Similarly, a triple 4-byte entry (12 bytes) can contain up to an 11-byte external reference, and so on. In a preferred embodiment, numerical references are encoded with a variable-length unsigned LEB128* binary number (1 bit per byte is a flag, with 7 bits per byte of numerical information, so each byte contributes a factor of $2^7$ to the addressable range).
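The variable-length reference encoding and the leaf sizing rule in item 13 can be illustrated with a minimal Python sketch. It assumes the conventional unsigned LEB128 layout (low-order 7-bit groups first, high bit set on every byte except the last); the function names and the `entry_size` parameter are hypothetical and are not taken from the FASTpage description.

```python
def leb128_encode(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128 (7 data bits per byte)."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative values")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # flag bit set: more bytes follow
        else:
            out.append(low7)         # flag bit clear: this is the last byte
            return bytes(out)


def leb128_decode(data: bytes) -> int:
    """Decode an unsigned LEB128 byte string back to an integer."""
    result = 0
    for position, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * position)
        if not byte & 0x80:
            return result
    raise ValueError("truncated LEB128 sequence")


def leaf_entries_needed(reference: bytes, entry_size: int = 4) -> int:
    """Count the 4-byte entries a leaf consumes: 1 follow-on index byte plus the reference."""
    total = 1 + len(reference)
    return max(1, -(-total // entry_size))  # ceiling division


# A 3-byte reference fits a single entry; a 4-byte reference needs a double entry.
print(leaf_entries_needed(leb128_encode(624485)), leaf_entries_needed(b"\x01\x02\x03\x04"))
```

Under these assumptions, a reference of up to 3 bytes fits in one entry, 4 to 7 bytes in two entries, and up to 11 bytes in three entries, matching the sizes listed in item 13.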

9.3.4 KEY APPLICATION AREAS
Because a FASTpage index is associative, it can serve wherever there is a need for an arbitrary n-to-m mapping (e.g., 1-to-1, 1-to-many, many-to-1, many-to-many), which corresponds to a very large application space. Although a FASTpage index excels with keys of any size, it is particularly well-suited to long and/or variable-length keys that may be problematic for other lookup means.

* LEB128 is a relatively well known data format that refers to a "Little Endian Base 128" integer with 128 possible values per byte.

Here is a non-exhaustive list of some examples of applications where a FASTpage index can be optimal, both in-memory and on disk:
1. Replacement for almost any hash table* (contra-indicated only for fixed-size tables with pre-optimized "perfect hash" keys)
2. Replacement for any disk-based index† such as those based on B-trees, B+trees, or B*trees
3. Replacement of any of several conventional indexes with a single FASTpage index
4. Metadata and configuration data storage and lookup
5. Identification, authentication, and ACL (access control list) functions
6. File system/directory lookups
7. Symbol table and other dictionary functions
8. Memo tables (a special type of cache for looking up previously computed, "memoized" results)
9. "Longest-match" IP-routing tables
10. DNS (domain name system) forward and reverse lookup functions
11. Blacklisting and whitelisting functions
12. LDAP (lightweight directory access protocol) lookup functions
13. Full-text search, content management functions
14. Data-squashing functions
15. Data aggregation, sorting, and grouping functions
16. CBR (case-based reasoning) case look-ups
17. CAM (content-addressable memory)
18. CAS (content-addressable storage)

9.3.5 APPLICATION CONSIDERATIONS
File System Applications. FASTpage keys are variable-length, and can be of any length, without penalty, so hierarchical file systems can be implemented without arbitrarily restricting the length of file names and directories (folders), and each directory (or folder) can contain any number of entries. Because FASTpage keys are variable-length, without restriction, it is possible to implement path names of unlimited length, such that there is just a single index for an entire file system. Nonetheless, a typical approach would be to implement a "nested" index where each directory (or folder) has its own secondary FASTpage index, because it offers a number of advantages (the discussion of which is outside the scope of this document).
Fully Indexed Database Applications. A FASTpage index can be substituted wherever a traditional database index can be used, such as on an index-per-field and/or index-per-key basis, for each table. With a FASTpage index, it is also quite reasonable to index EVERY field or column, rather than just a selected few, in every table, with a single database-level index. In a preferred embodiment (and notably, in Scrutiny's CHARM technology), a single index can easily be used to subsume other indexes by prefixing each key value with both a table identifier and a field or column identifier‡. The table identifier, the field/column identifier, and each unique value are automatically factored out and stored only once.
By so doing, a multi-table search can be carried out easily, and tables not containing a particular field or column cannot contain the associated field or column identifier and thus can yield no relevant records.
Likewise, fields containing null values naturally occupy no space at all, and if none of the records in a table have a value for that field, even the identifier itself need not be stored (a search for that field, or values in that field, can yield no relevant records). When the key values associated with a particular field or column are defined to be UNIQUE (not duplicated), the result of each successful record-oriented database index search is typically a record number or row number; for object-oriented databases, the result of each successful index search is typically an object identifier. However, when DUPLICATE key values are allowed, the result of both record-oriented and object-oriented database search is either a reference to an array (or list) of record/row numbers or object identifiers, respectively, or a recursive reference to a secondary FASTpage index containing further order-related information (e.g., GROUP BY).

* A FASTpage index is about the same speed as a hash table for a successful lookup, but often much faster for an unsuccessful lookup, especially with long keys (this is important, because hash tables are often used to determine that a key is not present). Unlike a hash table, a FASTpage index can traverse key information in sorted order (forwards or backwards) and perform "nearest match" searches. Unlike a hash table, a FASTpage index never needs wholesale reorganization to account for growth. A FASTpage index maintains high-density key pages to optimize time and space, and becomes even more efficient over time as FASTpage TST pages are converted to FASTpage Trie pages.
† A key goal of disk-based indexes is to reduce the number of disk accesses required to locate a key, since disk access is usually a major performance bottleneck. Accordingly, their disk-based index nodes tend to be "fat" and contain many keys, in order to minimize the number of fetches required. Likewise, a FASTpage index contains many keys, but is much smaller and finer-grained by design, so as to be able to cache many more nodes that have a high probability of relevance. Although a FASTpage index will potentially incur more disk accesses, one should expect less actual disk I/O overall, because of the higher probability that useful nodes are cached early on. Also, because flash memory is the primary FASTpage persistence mechanism (by design), in a preferred embodiment the FASTpage lookup rate in a very large database will easily exceed that of a typical disk-based index by as many as several orders of magnitude.
‡ In a preferred embodiment, each table identifier, and each field or column identifier, would be a variable-length numeric value (typically only one byte) that is mapped to the corresponding table name, or field or column name, respectively. As a prefix to the key value, each identifier itself would only be stored once in a FASTpage index, due to its inherent key compression properties (each such identifier would be common to all key values appearing after it).

Compressed Database Applications. In addition to using one or more FASTpage indexes to replace traditional database indexes, they can also be used to achieve significant database compression (a technique used in Scrutiny's CHARM technology). The idea is to achieve compression by factoring out the "vocabulary" associated with a particular database. One way to achieve this is to create a non-duplicated index of key values comprising the vocabulary of the database, including at least all non-BLOB, non-numeric values, but possibly numeric values as well. In a preferred embodiment, the key value is prefixed by a data type identifier before index insertion (which means search keys need to be prefixed likewise). As each non-duplicate key value is inserted into the vocabulary index, a variable-length code (e.g., LEB128) is automatically assigned based on its predicted or actual likelihood of occurrence (frequency), such that the highest-frequency key values may be assigned the shortest codes*, and vice-versa. A reverse-mapping entry is also inserted. Once the vocabulary is mapped, all database values can be replaced by their vocabulary codes, and the database becomes compressed, and speedier. It is a policy decision as to whether speculative vocabulary insertions may be allowed (not recommended). It makes sense to use a representative corpus to extract a useful vocabulary. Either way, if a key is not in the vocabulary, then by definition, it cannot be in the database either. Likewise, if a key is found in the vocabulary, but has no external reference, then it is nonetheless not (yet) in the database. However, if a key is found in the vocabulary, and there is an external reference, then it refers to a secondary FASTpage index that reveals all matching database locations via follow-keys {table code, field or column code, record or object id}.

* Unless codes are specially assigned, this means that results cannot be sorted or grouped alphanumerically by vocabulary codes (which are assigned by probable frequency). Instead, once the results are available, the vocabulary codes can be mapped back to traditional data values for grouping, sorting, and presentation purposes.

Relational Database Applications. Given the compressed database application environment of the previous paragraph, it would be straightforward to construct an RDBMS (relational database management system) over it. Most importantly, SQL (structured query language) queries would need to be translated to incorporate the appropriate vocabulary coding, so that any corresponding FASTpage indexes can be properly searched. Data manipulations (e.g., joins) occur normally, except that coded values are used, which generally makes the searches much faster. If the database is fully indexed (recommended), search results can be MUCH faster. Eventually, in many cases the search results must be mapped back from their encoded values to their traditional equivalents, for presentation purposes.
Content Management Applications. A FASTpage index is well-suited to full-text indexing in general, and indexing of arbitrary content in particular, since its variable-length keys with leading compression provide a great deal of flexibility. Some of the same techniques described above for compressed databases are also directly applicable to content management, regardless of whether the content repository is a file system, a database management system, or something else. The vocabulary compression technique is particularly useful, since it also allows search keys to be mapped from vocabulary words to coded "concepts." Concept-coding and tagging can supplement simple text searching by incorporating thesauri and other external concept-oriented information that can help a searcher optimize precision and recall. External classifiers and reasoning engines can also contribute key pairs for a given chunk of content.
In addition to using a FASTpage index for key data, multiple temporary FASTpage indexes can be used during content analysis and also during queries for quickly cross-matching and correlating interim results.
High-Security Applications. A FASTpage index is well-suited to high-security applications for two primary reasons: 1) designed-in, fine-grained access control, and 2) elimination of the need to retrieve the target records and/or objects to process queries and make security determinations.
MAC (mandatory access control) and/or DAC (discretionary access control) security "barriers*" can be inserted into the stored key information (as part of the key itself) in one or more various locations, according to the desired effect. In a database application, for example, a security barrier "token" can be inserted into the key just before a table identifier (e.g., before each table identifier, or perhaps just one of them), and this would have the effect of "skipping over" any table that should be invisible to a particular query (based on the security context of the query itself). Similar security barriers (which are tied to security policy) may be placed at other important locations within any key, and also within the area containing the target of any key, as well as being associated with all keys on a particular FASTpage index page and/or its descendents. It is typical that database systems which offer very fine-grained access control policies (e.g., "only personnel reps can view salaries over $30,000") must first fetch the candidate target records in order to determine which records are in scope for a particular query (assuming the non-security criteria are otherwise met). FASTpage allows "fully indexed" information, and one useful consequence of this is that all access-oriented security decisions can be made before the corresponding records and/or objects are actually retrieved (i.e., before their risk of exposure becomes increased by accessing them).
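As an illustration of the single database-level index described under Fully Indexed Database Applications, the following Python sketch prefixes each key value with a variable-length table identifier and column identifier. It is only a sketch under stated assumptions: a sorted list and a dictionary stand in for the FASTpage index itself, the identifiers use a LEB128-style encoding, and every name (`varint`, `make_index_key`, `SingleIndex`) is hypothetical.

```python
import bisect


def varint(n: int) -> bytes:
    """LEB128-style identifier: 7 data bits per byte, high bit set when more bytes follow."""
    out = bytearray()
    while True:
        low7, n = n & 0x7F, n >> 7
        out.append(low7 | (0x80 if n else 0))
        if not n:
            return bytes(out)


def make_index_key(table_id: int, column_id: int, value: bytes) -> bytes:
    """Composite key: table identifier, then column identifier, then the key value."""
    return varint(table_id) + varint(column_id) + value


class SingleIndex:
    """Sorted-list stand-in for one database-wide index (NOT a FASTpage structure)."""

    def __init__(self):
        self._keys = []   # composite keys kept in sorted byte order
        self._rows = {}   # composite key -> list of row numbers (duplicates allowed)

    def insert(self, table_id, column_id, value, row):
        key = make_index_key(table_id, column_id, value)
        if key not in self._rows:
            bisect.insort(self._keys, key)
            self._rows[key] = []
        self._rows[key].append(row)

    def lookup(self, table_id, column_id, value):
        """Exact-match search; an absent table or column simply yields no rows."""
        return self._rows.get(make_index_key(table_id, column_id, value), [])

    def scan_column(self, table_id, column_id):
        """Yield (value, rows) for one column in sorted order via a prefix scan."""
        prefix = varint(table_id) + varint(column_id)
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            yield key[len(prefix):], self._rows[key]


index = SingleIndex()
index.insert(1, 2, b"smith", 10)
index.insert(1, 2, b"jones", 11)
index.insert(2, 2, b"smith", 13)
print(index.lookup(1, 2, b"smith"), list(index.scan_column(1, 2)))
```

Because the identifier encoding is self-delimiting, all keys for one table and column share a common leading byte sequence, which is the property that leading key compression can factor out.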

9.3.6 IMPLEMENTATION CONSIDERATIONS
Synchronous/Blocking vs. Asynchronous/Non-Blocking/Queued Interface. The FASTpage processes can be implemented completely in software or hardware as an API (application programming interface) comprising synchronous (blocking) function calls or system calls. In a preferred embodiment, however, a software implementation would comprise a set of asynchronous (non-blocking) message-oriented transactional services accessible via a queued messaging interface, and a hardware implementation would comprise at least a non-blocking, transactional, packet-oriented queued interface such as might be implemented with PCI Express or HyperTransport (e.g., with retrieval requests, posted writes).
Software vs. Hardware. In a preferred embodiment, the FASTpage processes would be implemented in both software and hardware, due to their overall utility. The idea is to standardize on FASTpage indexes and use them wherever they're applicable. General purpose CPUs would use software implementations when appropriate (especially for temporary or transient indexes), but would also have access to hardware-accelerated implementations. The hardware implementations† would be a shared resource, accessible to multiple processors, and would largely be responsible for all persistent data.
Local vs. Distributed Operation. In a preferred embodiment (e.g., the SHADOWS™ infrastructure), all persistent data is distributed over a large number of globally distributed processes and devices that cooperate to effect a secure, survivable, persistent, associative memory with significant computing power.
In such a context, the data is both encrypted and widely scattered in such a way as to render all persistent data unusable as stored (i.e., if stolen it would be worthless). Quite a bit of the local processing takes place on encrypted data, without bringing it into the clear. Each local node has partial responsibility, however, for some fraction of the global key space that it must process in the clear, even if it appears scrambled (scrambled data does not present a significant hurdle for a well-funded, sophisticated attacker).
Such in-the-clear data is created only as needed, and exists only in protected, volatile memory that can be erased on demand, such as when an attacker or intruder is detected. In general, due to the highly distributed nature of the preferred embodiment, any in-the-clear data captured by an attacker at one or even a few locations would be of little utility.
Compression. A FASTpage index enjoys leading key compression quite naturally.
Also, from a disk space viewpoint, many keys can be stored on each page, so pages can be densely rather than sparsely populated.
Furthermore, space usage in general is quite low, because internal references are all page-relative, and external references are variable-length.
In a preferred embodiment, the page-relative internal references consume $w$ bits each, thus a search node consumes $4w/8 = w/2$ bytes and $2^w$ such nodes are accommodated on each page, where the page size is determined as $2^w \cdot w/2$ bytes, which simplifies to $w \cdot 2^{w-1}$ bytes. Leaf nodes require $w/2$ bytes in the normal case (1 byte for a follow-on internal reference and up to $(w/2)-1$ bytes per short external reference), and up to $w-1$ bytes per long external reference.
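The page-size arithmetic above can be checked with a short Python sketch. The function name `page_geometry` and the restriction to even values of w (so that a node occupies a whole number of bytes) are assumptions made for illustration only.

```python
def page_geometry(w: int) -> dict:
    """Derive page parameters from the internal-reference width w, in bits."""
    if w <= 0 or w % 2:
        # ASSUMPTION: only even w is handled, so that w/2 is a whole number of bytes.
        raise ValueError("this sketch assumes a positive, even w")
    node_bytes = 4 * w // 8                 # four w-bit indices per search node = w/2 bytes
    nodes_per_page = 2 ** w                 # every w-bit index can address every node
    page_bytes = nodes_per_page * node_bytes
    assert page_bytes == w * 2 ** (w - 1)   # the simplified closed form
    return {
        "node_bytes": node_bytes,
        "nodes_per_page": nodes_per_page,
        "page_bytes": page_bytes,
        "short_ref_bytes": w // 2 - 1,      # as stated in the text: up to (w/2)-1
        "long_ref_bytes": w - 1,            # as stated in the text: up to w-1
    }


print(page_geometry(8))
```

For w = 8 this gives 4-byte nodes, 256 nodes per page, and a 1024-byte page, with short and long external references of up to 3 and 7 bytes respectively.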

* The security "barrier" is a special coded token that the FASTpage index can discern from the otherwise expected bytes in an index key sequence. When security barriers are enabled at a particular level of granularity, there is a 1-bit overhead to flag the security barrier at that level. Thus, a table-level barrier will incur an overhead of one bit per table, even if a particular table has no barrier. Barriers are available for at least the following levels: database, table, field or column, data type, vocabulary code, and target (e.g., data or external reference).
† In a preferred embodiment, multiple FASTpage processes would be instantiated within each PUMP (Parallel Universal Memory Processor) device (described elsewhere). The PUMP device would initially be implemented with reconfigurable logic (e.g., FPGA) or "Structured ASIC."

In a preferred embodiment, the page-relative internal references consume only 8 bits (one byte) each (w=8), thus a search node consumes only 4 bytes (rather than the 13 bytes required by a "traditional" ternary search tree) and up to 256 such nodes are accommodated on each 1 K (1024-byte) page.
Leaf nodes also require 4 bytes in the normal case (1 byte for a follow-on internal reference and up to 3 bytes of external reference), or 8 bytes if more addressing capacity is required (1 byte for a follow-on internal reference and up to 7 bytes of external reference).
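To make the 4-byte node layout concrete, the following Python sketch packs a ternary search tree into a single 1024-byte buffer, one byte each for the split character and the left, equal, and right indices. It is a simplified illustration and not the FASTpage algorithm: it tests membership only, omits leaf nodes and external references, and relies on two labeled assumptions (node 0 is always the root, so index 0 can mean "no child", and byte value 0 terminates each key). All names are hypothetical.

```python
import struct

NODE = struct.Struct("4B")   # split, left, equal, right: one byte each
PAGE_NODES = 256             # an 8-bit index can address 256 nodes
NIL = 0                      # ASSUMPTION: node 0 is the root, so 0 can mean "no child"
END = 0                      # ASSUMPTION: byte 0 ends every key and never occurs inside one


class TstPage:
    """Membership-only TST stored in one 1 KB page of 4-byte nodes."""

    def __init__(self):
        self.buf = bytearray(PAGE_NODES * NODE.size)  # 256 * 4 = 1024 bytes
        self.used = 0                                 # next free node ordinate

    def _alloc(self, split: int) -> int:
        if self.used == PAGE_NODES:
            raise MemoryError("page is full")  # a real index would spill to another page
        index = self.used
        NODE.pack_into(self.buf, index * NODE.size, split, NIL, NIL, NIL)
        self.used += 1
        return index

    def _walk(self, key: bytes, create: bool) -> bool:
        if END in key:
            raise ValueError("keys must not contain the terminator byte")
        chars = key + bytes([END])
        if self.used == 0:
            if not create:
                return False
            self._alloc(chars[0])
        index, pos = 0, 0
        while True:
            node = list(NODE.unpack_from(self.buf, index * NODE.size))
            char = chars[pos]
            if char == node[0]:
                pos += 1
                if pos == len(chars):
                    return True            # terminator matched: key is present
                link, next_char = 2, chars[pos]
            else:
                link, next_char = (1 if char < node[0] else 3), char
            if node[link] == NIL:
                if not create:
                    return False
                node[link] = self._alloc(next_char)
                NODE.pack_into(self.buf, index * NODE.size, *node)
            index = node[link]

    def insert(self, key: bytes) -> None:
        self._walk(key, create=True)

    def contains(self, key: bytes) -> bool:
        return self._walk(key, create=False)


page = TstPage()
for word in (b"cat", b"car", b"dog"):
    page.insert(word)
print(page.contains(b"car"), page.contains(b"cab"), len(page.buf))
```

A failed lookup stops at the first missing link, which reflects the early "not found" behavior attributed to TSTs earlier in this section. If the page fills part-way through an insertion, this sketch raises an error and leaves the partial path in place.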
In an alternate embodiment, the page-relative internal references are extended to w bits each (w>8), in order to allow for more nodes per page, and larger page sizes.
In an alternate embodiment, the page-relative internal references are reduced to w bits each (w<8), in order to allow for fewer nodes per page, and smaller page sizes.
In a preferred embodiment, the page-relative internal references are specified on a per-page basis to w bits each, in order to flexibly and dynamically determine the nodes per page, and the page size, for a specific application scenario.
In a preferred embodiment, each FASTpage index page has an associated offset (somewhat like a base address) that can be added to any external references to extend them.
Security Barriers. In a preferred embodiment, an LEB128-like code would be used to identify tables, fields or columns, etc., within an index key, where one bit of an 8-bit byte is used as a "stop" bit for variable-length values, with the consequence that only 7 bits per byte remain for data, yielding $2^7$ or 128 possible values (hence the name). When a security barrier is enabled at a particular level in such an embodiment, the LEB128-like code would be replaced with a similar but modified code where the first byte is special, by virtue of having a bit dedicated to flag the presence of a security barrier, and with the rest of the bytes, if any, being LEB128-like, as before. With the extra flag bit dedicated to the security barrier, the first byte can now take on only $2^6$ or 64 possible values. When the security barrier flag is NOT SET, it means that the byte sequence is NOT a security barrier, and is therefore processed normally. When the security barrier flag IS SET, it means that the first byte (i.e., the one containing the security barrier flag) and any continuation bytes, collectively comprise a byte sequence that represents a security barrier, and accordingly identifies a security policy that must be complied with - after which (i.e., if and only if the security policy is complied with) the next immediately following byte sequence may be processed normally (up to the next security barrier, if any).
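The modified first-byte code can be sketched in Python as follows. The text does not specify bit positions, so this sketch assumes bit 7 of every byte signals that more bytes follow, bit 6 of the first byte is the security barrier flag, and the remaining 6 bits of the first byte (then 7 bits per later byte) carry the value, low-order groups first. For simplicity the key is assumed to consist only of such codes. The function names are hypothetical.

```python
def encode_id(value: int, barrier: bool = False) -> bytes:
    """Encode an identifier (or a barrier's policy number) with a flagged first byte."""
    if value < 0:
        raise ValueError("identifiers must be non-negative")
    first = value & 0x3F            # only 6 data bits remain in the first byte
    value >>= 6
    if barrier:
        first |= 0x40               # ASSUMED position of the security barrier flag
    if value:
        first |= 0x80               # more bytes follow
    out = bytearray([first])
    while value:
        low7, value = value & 0x7F, value >> 7
        out.append(low7 | (0x80 if value else 0))
    return bytes(out)


def decode_id(data: bytes, pos: int = 0):
    """Return (value, is_barrier, next_pos) for the code starting at pos."""
    first = data[pos]
    value, shift = first & 0x3F, 6
    is_barrier = bool(first & 0x40)
    more = bool(first & 0x80)
    pos += 1
    while more:
        byte = data[pos]
        value |= (byte & 0x7F) << shift
        shift += 7
        more = bool(byte & 0x80)
        pos += 1
    return value, is_barrier, pos


def read_identifiers(key: bytes, allowed_policies: set):
    """Yield plain identifiers, stopping at the first barrier whose policy is not allowed."""
    pos = 0
    while pos < len(key):
        value, is_barrier, pos = decode_id(key, pos)
        if is_barrier:
            if value not in allowed_policies:
                return              # policy not satisfied: nothing past the barrier is read
            continue
        yield value


key = encode_id(9, barrier=True) + encode_id(1) + encode_id(2)
print(list(read_identifiers(key, {9})), list(read_identifiers(key, set())))
```

In this sketch a query whose security context satisfies policy 9 reads the identifiers behind the barrier, while any other query reads nothing past it.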
9.3.7 EXAMPLE
In the following example, a spreadsheet is used to depict how a series of words is inserted into a FASTpage index. A label indicates the word whose insertion is depicted, and the spreadsheet snippet depicts the content of the index.

9.4 RECAP - Reliably Efficient Computation, Adaptation, & Persistence

9.4.1 RECAP - Resource-Sharing Concepts
Load-balancing and other resource-sharing information is shared as typed block data in standard heartbeat messages. The information content and frequency of distribution varies according to a hierarchy of "granularity" that reflects the degree of locality most affected by the information.

9.4.1.1 Hierarchical Granularities of Locality
In a preferred embodiment, SHADOWS recognizes several hierarchical granularities of locality that can be configured as required to appropriately represent resource distributions, and comprising at least the following notional levels:
• Machine (more fine-grained)
• Site
• Neighborhood
• Community
• Region
• World (less fine-grained)
In this discussion, "Machine" may be taken to be the most fine-grained locality, because it is sufficient for teaching purposes, but there are usually also finer-grained localities, and the same principles apply (the hierarchy actually extends in both directions). For example, in a preferred embodiment, the Machine (which is a SCRAM node that is described elsewhere) comprises a set of Quadrants, each of which comprises a set of Lobes and an optional set of Blades, where each Lobe (and optionally any Blade) comprises at least one MASTER and typically at least one SLAVE, and both MASTERs and SLAVEs are typically multi-core processors.

9.4.1.2 Information Roll-Up by Locality
Sharing of intra-locality-specific load-balancing information occurs within each hierarchical granularity, and sharing of summarized load-balancing information occurs by pushing it to the next less-fine-grained level.
[Note: In a preferred embodiment, this information sharing is implemented with secure multicasting wherever and whenever such multicasting is feasible, for efficiency, and with secure "simulated" multicasting otherwise.]
For example, within a specific Machine (e.g., among the multiple processors of a multiprocessor machine), information sharing is more fine-grained (i.e., more detailed and shared at a higher frequency) than across the Site containing the Machine, or the Neighborhood containing the Site, etc.
Similarly, within a "Neighborhood," information sharing is more fine-grained than across the Community containing the Neighborhood, or the Region containing the Community, etc.
Every Machine shares load and resource information with its Site at a relatively high frequency (compared to the summarization of the Site's information). Likewise, every Site shares load and resource information with its Neighborhood at a relatively high frequency, compared to the summarization of the Neighborhood's information, and so on.
Accordingly, load-balancing information is fresher within a Machine than within a Site or Neighborhood, but fresher within a Neighborhood than within a Community, and so on.

9.4.1.3 Scope of Information Roll-Ups
Note that the information roll-up technique described here is not limited to load-balancing information, but is generally applicable to other information related to resource-sharing, and is especially useful for quantified classification of resource availability (i.e., relative capacity available rather than relative current load).
Resource information can also be much finer-grained than the availability of a general resource such as "computing capacity" -- it may extend, for example, to the capacity for handling a very specific task, or to the level of energy production, fuel reserves, network bandwidth, etc. The resulting information is particularly actionable when used in conjunction with "Think Globally, Act Locally" decision-making processes such as those used by the MASTER (described elsewhere) to determine its immediate propensity to offload tasks through delegation vs. handling them locally vs. volunteering to take on even more tasks.

9.4.1.4 Regularity of Information Roll-Ups
On an event-driven basis, every Machine shares only significant changes in its load or resources, where significance is statistically designated by, for example, a change in quartile. In a preferred embodiment, load classification by quartile is used, and each quartile class is represented by just two bits. By using quartiles, changes in classification occur relatively slowly and provide natural hysteresis, which is very desirable.
When a quartile change does occur, it represents a substantive (and therefore usually actionable) change in load classification. Causal factors (e.g., load spike, failed CPU, etc.) may optionally be shared if known, and if allowed/required by policy, at the cost of increased communication overhead (in a preferred embodiment, such information is not shared generally, but rather, is shared only with those processes, or other entities, that have a "need to know").
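The event-driven sharing described above can be illustrated with a minimal sketch. It assumes that load is expressed as a fraction between 0 and 1 and that the four classes are equal-width bands; the source states only that quartile classes are used and that each fits in two bits. The names `quartile`, `LoadReporter`, and `pack` are hypothetical.

```python
def quartile(load: float) -> int:
    """Map a load fraction in [0.0, 1.0] to a class 0..3 (fits in two bits).

    Equal-width bands are an assumption made for this sketch.
    """
    if not 0.0 <= load <= 1.0:
        raise ValueError("load must be a fraction between 0 and 1")
    return min(int(load * 4), 3)


class LoadReporter:
    """Emits an update only when the quartile class changes (hypothetical helper)."""

    def __init__(self):
        self.last = None

    def observe(self, load: float):
        q = quartile(load)
        if q == self.last:
            return None      # same class as before: nothing worth sharing
        self.last = q
        return q             # the 2-bit class that would be shared


def pack(classes):
    """Pack a list of 2-bit quartile classes, four per byte."""
    out = bytearray((len(classes) + 3) // 4)
    for i, q in enumerate(classes):
        out[i // 4] |= (q & 0b11) << (2 * (i % 4))
    return bytes(out)
```

Because a report is produced only when the class changes, small fluctuations inside one band generate no traffic at all, which is the economy the text attributes to quartile classification.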
On a periodic basis, every Machine re-establishes a baseline for its load and available resources, by sharing fine-grained information with its Site (which includes its partner), and by keeping track of information shared by affiliated Machines ("peers") within its Site. Thus, every Machine has Machine-level load info for all its peers (by definition, all the affiliated Machines within its Site).
On a predetermined basis (in a preferred embodiment, round-robin turn-taking is used, determined by assigned time slot), each Machine also summarizes its Site's load info and shares it with both its Site (i.e., with all the peer Machines for whom it is summarizing information) and with its Neighborhood (i.e., with all the Sites that are peers of its Site). Each Machine takes a turn periodically, in order to amortize the overhead across the Site's multiple Machines. The summary includes a list (expressed or implied) of the Site's Machines and their Machine indices, ranked by quartile (in a preferred embodiment, quartile is used, but other classification schemes are usable).
This summarization process occurs at each hierarchical level. Thus, every Machine has access to summarized Machine-level load and resource information for every affiliated Machine in the Site, and Site-level information for every affiliated Site in the Neighborhood, and Neighborhood-level information for every affiliated Neighborhood in the Community, and so on.
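As a rough illustration of the Site-level summary, the following sketch picks the Machine whose turn it is and produces the ranked list. The mapping from time slot to Machine (slot number modulo the count of Machines, over sorted indices) and the tie-break by Machine index are assumptions; the source says only that turns are round-robin by assigned time slot and that Machines are ranked by quartile.

```python
def summarizer_for_slot(slot: int, machine_indices):
    """Return the index of the Machine whose turn it is in this time slot."""
    ordered = sorted(machine_indices)
    return ordered[slot % len(ordered)]


def site_summary(machine_quartiles: dict):
    """Return (machine_index, quartile) pairs, least-loaded class first.

    `machine_quartiles` maps a Machine index to its 2-bit quartile class.
    """
    return sorted(machine_quartiles.items(), key=lambda item: (item[1], item[0]))
```

The same two functions could be applied one level up, with Sites in place of Machines, to mirror the per-level summarization the text describes.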

9.4.1.5 Sharing the Overhead Associated with Creating Roll-Ups
In a preferred embodiment, a multiplicity of peers at each level, but representing only a portion of the peers at that level (say, n of them), is responsible for each roll-up operation.
Each such peer uses the same information basis to independently create a roll-up dataset (which should be identical to those created by the other n-1 participating peers). The dataset is then compressed, encrypted, sliced, and FEC-encoded with a systematic (n,k) code, such that any k of the slices (where k <= n) are sufficient to retrieve the dataset. Each of the n peers shares only one slice, which means that the threshold value k (which may vary with context) determines how many correct slices have to be received to reconstruct the dataset. This technique (which, in a preferred embodiment, is also used in other contexts) contributes to Byzantine fault-tolerance, since up to n-k faulty contributors can be ignored (however, the SELF and BOSS subsystems take note of such failures).
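The "any k of n slices" property can be demonstrated with a small systematic erasure code. The sketch below uses polynomial interpolation over the prime field GF(257), which is a stand-in chosen for brevity and not necessarily the code used in SHADOWS; compression and encryption are omitted. Slices are lists of integers because a parity symbol can take the value 256. It requires n < 257 and Python 3.8 or later (for the modular inverse via `pow`).

```python
P = 257  # prime just above the byte range; symbols are ints in [0, 256]


def _lagrange_eval(points, x):
    """Evaluate at x the unique polynomial through points [(xi, yi), ...] over GF(P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data: bytes, n: int, k: int):
    """Return (slices, original_length); slices 0..k-1 hold the data itself."""
    padded = data + bytes((-len(data)) % k)
    width = len(padded) // k
    slices = [list(padded[i * width:(i + 1) * width]) for i in range(k)]
    for x in range(k, n):                     # parity slices
        slices.append([
            _lagrange_eval([(i, slices[i][col]) for i in range(k)], x)
            for col in range(width)
        ])
    return slices, len(data)


def decode(received: dict, k: int, length: int) -> bytes:
    """Rebuild the data from any k slices, given as {slice_index: slice}."""
    chosen = sorted(received.items())[:k]
    width = len(chosen[0][1])
    out = bytearray()
    for i in range(k):                        # recover data slices 0..k-1
        for col in range(width):
            out.append(_lagrange_eval([(x, s[col]) for x, s in chosen], i))
    return bytes(out[:length])
```

With n = 5 and k = 3, for instance, the dataset can be rebuilt from slices 1, 3, and 4 alone, so two missing or discarded contributors are tolerated. Detecting which slices are corrupt (as opposed to missing) would need an additional integrity check that is not shown here.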

9.5 RUSH - Rapid Universal Secure Handling
9.5.1 CENTRAL CONCEPT
An "untrusted node" such as an end-user PC or non-Scrutiny server (also referred to as the "subject machine") can be configured with SHADOWS software processes that enable its participation in the SHADOWS supercomputing infrastructure. Any PC or server machine, regardless of its "PC" or "server" label, can serve both as a SHADOWS client (for one or more end-users) and as a SHADOWS server.
9.5.1.1 Who Can Talk With Whom?

Process      Long Name                                                           Process Type
DELEGATE     Distributed Execution via Local Emulation Gateway                   Agent
FLAMERouter  Firewall, Link-Aggregator/Multiplexer & Edge Router                 Agent
MARSHAL      Multi-Agent Routing, Synchronization, Handling & Aggregation Layer  Agent
RUSH         Rapid Universal Secure Handling                                     Protocol
RUSHrouter   Rapid Universal Secure Handling router                              Agent
SERVANT      Service Executor, Repository, & Voluntary Agent -- Non-Trusted      Agent
SHADOWS      Self-Healing Adaptive Distributed Organic Working Storage           System
UNCAP        Untrusted Node Computation, Aggregation, & Persistence              Protocol

Note that no end-user software processes running on an untrusted node are ever allowed to communicate with the SHADOWS infrastructure directly. All communication must take place via one or more agent processes (SHADOWS agents) installed and/or executing on the user's machine.
In particular, non-SHADOWS software (e.g., user applications) can communicate only with local SHADOWS DELEGATEs, which in turn communicate only with their local RUSHrouter. Locally installed SHADOWS SERVANTs (SERVANT agents), if any, can also communicate with the local RUSHrouter, which is responsible for all external communications. The user-local RUSHrouter further implements one or more MARSHAL roles that communicate with their assigned MARSHAL teams (more on this later) via one or more wide-area networks (WANs). Each MARSHAL may also communicate with other "nearby" MARSHALs as specifically instructed by its MARSHAL team. The MARSHAL team communicates amongst itself and with other MARSHAL teams as necessary and permitted, in order to reach the WAN-facing FLAMERouters of the "back-end" SHADOWS infrastructure (which themselves act as MARSHALs, such that a "mere" MARSHAL doesn't actually know when or if it is communicating with a FLAMERouter).
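The communication rules above amount to an allow-list over process roles. The following sketch encodes one reading of those rules; the role names are taken from the text, but the table itself and the `may_talk` helper are illustrative assumptions (for example, the MARSHAL role that the RUSHrouter implements is modeled here as a separate peer for simplicity).

```python
# Who may send directly to whom, per one reading of the rules above.
ALLOWED = {
    "UserApp": {"DELEGATE"},
    "DELEGATE": {"UserApp", "RUSHrouter"},
    "SERVANT": {"RUSHrouter"},
    "RUSHrouter": {"DELEGATE", "SERVANT", "MARSHAL"},
    "MARSHAL": {"RUSHrouter", "MARSHAL", "FLAMERouter"},
    "FLAMERouter": {"MARSHAL"},
}


def may_talk(sender: str, receiver: str) -> bool:
    """Return True if `sender` is allowed to address `receiver` directly."""
    return receiver in ALLOWED.get(sender, set())
```

Under this table a user application cannot reach the RUSHrouter or any MARSHAL directly, and a SERVANT has no path other than through its local RUSHrouter.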

9.5.1.2 Quid Pro Quo SLA
The SHADOWS infrastructure consists of a widely distributed cloud of dedicated "back-end" supercomputing and storage nodes (a discussion of which is beyond the scope of this brief), augmented by a collection of user-supplied computing and storage resources ("untrusted nodes"). The back-end supercomputing and storage nodes provide the resources necessary to achieve a basic level of service under the terms of a basic service level agreement (SLA).
However, users can extend this basic service via a "Quid Pro Quo" SLA, which provides a means to leverage the actual capacity of their local resources. For example, a user typically consumes much less than 10% of the available computing capacity over a 24-hour period, leaving more than 90% unused, and therefore wasted. Under the terms of the Quid Pro Quo SLA, and with the support of the SHADOWS infrastructure, a user can not only take advantage of nearly 100% of the available capacity, but can do so when it is needed most. The combination of a Quid Pro Quo SLA and the SHADOWS infrastructure essentially allows a user to "bank" the unused or unneeded resources and recall them on demand.
For example, we can measure compute time in CPU-seconds, CPU-minutes, or CPU-hours. Compiling a simple software source code file might consume 100% of a CPU for anywhere from a fraction of a second to several minutes. However, compiling an operating system like Linux with, say, 17,000 files might take 3 hours or more on a fast machine, and the CPU might not be at 100% the whole time, depending on the speed of the machine's disk drives. For the sake of discussion, let's say that compiling Linux locally requires 180 CPU-minutes (3 hours at 100% CPU, and we'll ignore the number of CPUs, CPU speed, disk speed, etc., for now). Under the terms of the Quid Pro Quo SLA, if the SHADOWS infrastructure had already been able to take advantage of 180 CPU-minutes of idle computing resources on the user's machine, then the user would have "banked" sufficient resources to recall them all at once and apply them to a single task (all on an automated basis). In this case, the task of compiling Linux would be carried out by the Quid Pro Quo-augmented SHADOWS infrastructure, which means that all 17,000 files would be compiled in parallel, with the user-specified options, and the results returned to the user's machine. The compilation itself might take only a second or two, but let's just say "less than a minute" (rather than 3 hours), which is a significant speedup, and would exhaust the banked 180 CPU-minutes all at once. Additional resources would be consumed for communicating and storing requests and results, but this is also done optimally under the SHADOWS infrastructure.
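The banking arithmetic in this example can be made concrete with a small ledger. The class and function names below are hypothetical, security overhead and communication costs are ignored, and the even split of work across files is an assumption used only to show why "a second or two" is plausible.

```python
class QuidProQuoLedger:
    """Tracks CPU-minutes contributed while idle and recalled on demand."""

    def __init__(self):
        self.banked = 0.0

    def bank_idle(self, cpu_minutes: float) -> None:
        """Credit idle capacity that the infrastructure was able to use."""
        self.banked += cpu_minutes

    def recall(self, cpu_minutes: float) -> float:
        """Draw down the balance; returns the CPU-minutes actually granted."""
        granted = min(cpu_minutes, self.banked)
        self.banked -= granted
        return granted


def seconds_per_file(total_cpu_minutes: float, files: int) -> float:
    """Average CPU-seconds per file if the work divides evenly (an assumption)."""
    return total_cpu_minutes * 60 / files
```

With the figures from the text, 180 CPU-minutes spread evenly over 17,000 files is about 0.64 CPU-seconds per file, so a fully parallel compile is bounded by the slowest single file rather than by the 3-hour total.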
A user's computation and data storage resources cannot be trusted by the SHADOWS infrastructure, which mandates special processes in order to achieve the high level of security required. This extra processing creates an overhead that must be accounted for in the Quid Pro Quo SLA, but otherwise provides the same benefits (high security and data integrity) to the user contributing the resource as for other users.
When talking about "untrusted nodes" in the context of the SHADOWS infrastructure, it is useful to distinguish between the foreground processing, background processing, and communications processing that occurs locally - i.e., on the subject machine (e.g., the user's PC, or a server belonging to the user's employer). In this context, foreground processing refers to any SHADOWS processing that is performed locally to satisfy the immediate request(s) of a bona fide user. Background processing refers to any SHADOWS processing that is performed locally to satisfy the Scrutiny SHADOWS Quid Pro Quo SLA associated with the subject machine. Communications processing refers to any local processing and communications required to satisfy the combined communications needs of foreground processing and background processing.

9.5.1.3 Foreground Processing
End-user software applications can integrate with the SHADOWS infrastructure through a SHADOWS DELEGATE, which is essentially an application-specific proxy, or gateway, that provides the necessary interface. The user-facing side of the DELEGATE implements one or more APIs (application programming interfaces) and/or protocols needed by the user's software applications (and to be provided by SHADOWS).
Examples would include various file systems, version control systems, database management systems, directory systems, email systems, instant messaging, VoIP, etc.
In general, each DELEGATE provides a single, minimalist user-facing API that implements a particular API and/or protocol. For example, one DELEGATE might implement the MAPI email/messaging protocol, while another DELEGATE implements the IMAP email/messaging protocol. If IMAP isn't needed, then the IMAP DELEGATE is not installed. Likewise, if the user needs both the proprietary Oracle DBMS and the open source MySQL DBMS to be available, then the appropriate DELEGATE can be installed for each.

Regardless of the user-facing API or protocol implemented by a particular DELEGATE, the SHADOWS-facing side of the DELEGATE implements the SHADOWS RUSH protocol. Thus, the DELEGATE is essentially a protocol and data translation process interposed between a user's software application and the SHADOWS infrastructure. In each case, the user's software application depends on some functionality external to itself (which is "normally" provided by another local or remote software application or service), and it is the role of the appropriate DELEGATE to emulate that functionality. The DELEGATE itself rarely implements the functionality on its own, but rather, simply provides bi-directional translations of requests and responses, while communicating with the SHADOWS infrastructure to do most of the actual work.
A DELEGATE need not be installed as a service on the subject machine -- it may run on-demand as an application. For example, a software developer could locally run a "compiler" from a make file. The actual compiler can be replaced with a DELEGATE that is responsible for the compilation, but which securely communicates with the SHADOWS infrastructure to do the actual work and return exactly the same results that would have been returned by the local compiler, but with improvements in one or more dimensions (e.g., less elapsed time, deeper analysis, etc.).
Note that if a user requests that changed files be automatically archived to the SHADOWS infrastructure, this is still considered foreground processing (although it may occur at reduced priority), because it is requested by the user, on behalf of the user.

9.5.1.4 Background Processing
First, note that very little, if any, background processing occurs except under the terms of the Quid Pro Quo SLA.
In the SHADOWS infrastructure, background processing on an untrusted node is always carried out by a SERVANT agent (or simply, SERVANT). Using the UNCAP protocol, instructions and data are sent from the SHADOWS infrastructure back-end via FLAMERouter(s) to multiple MARSHALs, across WANs, to the RUSHrouter-acting-as-MARSHAL on the subject machine. The RUSHrouter routes the communications to the SERVANT, which acts on the received instructions and data. In general, all or some of the data is cached in a local associative memory structure, but portions may also be stored persistently (only as directed) in a specific encrypted container intended for that purpose. In general, the encrypted persistent store contains insufficient information to accomplish any task, including basic data retrieval. The UNCAP protocol depends heavily on forward error correction (FEC) and bits of information supplied only on an as-needed, just-in-time (JIT) basis. Most UNCAP instructions to a SERVANT are provided in terms of operators (actions to take) and operands (specified objects on which to operate), and the information necessary to construct either one may not arrive until just before it is needed, at which point it is cached in memory rather than stored persistently.
FEC is also used to return actual results of a particular operation. For results that seem to be novel, FEC can be used (as instructed) to encode any results, which may be partially cached and partially stored, and -- more importantly -- only partially returned to the SHADOWS infrastructure (or forwarded elsewhere as requested). By using FEC, each SERVANT sends only a fraction of the result, taking advantage of the almost-always-smaller-uplink-capacity associated with the subject machine, while also taking advantage of the almost-always-larger-downlink-capacity of the target recipient by receiving an aggregation of inputs from diverse SERVANTs. The ability to capitalize on asymmetric uplink/downlink capacities is particularly beneficial for communicating interim results among a large collection of SERVANTs, under the direct control of the SHADOWS infrastructure. Corrupt and/or unresponsive SERVANTs are easily detected and worked around by the combination of FEC and encryption, among other techniques. SERVANTs are not provided with communication capability except that required to communicate with their local (interior) MARSHAL.
One of the operations of a SERVANT process is to launch a subordinate SERVANT process and register it with its MARSHAL (the subordinate SERVANT must also register directly). The subordinate SERVANT process "executable" is first created from cached and/or stored objects (just like any other result), under the auspices of the SHADOWS infrastructure. Once the synthesis is approved, it is then launched as a separate request (which also requires concurrence of the local RUSHrouter, which cannot withhold concurrence unless something isn't quite right). The local MARSHAL may kill any local SERVANT process, and may also cause its own virtual machine to be restarted (by crashing itself, if need be). SERVANT processes may thus be created and deleted on the fly, but each is completely expendable. In fact, from the SHADOWS infrastructure viewpoint, every aspect of a user's machine is expendable.

Background processing is always subordinate to foreground processing, and thus the SERVANT may relinquish control whenever there is foreground processing to do (which could be due to any DELEGATE, or any other application under user control).

9.5.1.5 Communications Processing Actors
When servicing users, RUSH is the only visible protocol after the user's application programming interface (API) up to and including the SHADOWS FLAMERouter. On the user side, the API is terminated locally by the associated resident DELEGATE process, which serves as a stateful protocol translator (API to RUSH) and application gateway:
         API        RUSH          RUSH       RUSH       RUSH       RUSH           RECAP
User App <> DELEGATE <> RUSHrouter <> MARSHAL <> MARSHAL <> MARSHAL <> FLAMERouter <> MASTER
Note that the FLAMERouter on the server side has an embedded MASTER, and can fully terminate the RUSH protocol. The FLAMERouter's MASTER can communicate with other internal MASTERs using RECAP.
When a user's machine (or actually, a SERVANT on the user's machine) is servicing SHADOWS, the UNCAP protocol is tunneled through RUSH, such that RUSH is still the only visible protocol:
       RUSH          RUSH       RUSH       RUSH       RUSH      UNCAP & RECAP
SERVANT <> RUSHrouter <> MARSHAL <> MARSHAL <> MARSHAL <> FLAMERouter <> MASTER
In this scenario the SHADOWS SERVANT natively uses the RUSH protocol, so no user API is involved. On the server side, which is actually the originating end, UNCAP, which is tunneled over RUSH, is used to direct the SERVANT. RECAP traffic occurs only between the MASTER and the FLAMERouter (which has an embedded MASTER), but does not propagate to RUSH.

9.5.2 RUSH - Dynamic Inter-Site Path Characterization
The SHADOWS infrastructure frequently needs to take advantage of one-way links (unicast or multicast), because requiring a return path imposes an unnecessary constraint on the system. Thus, all route planning is based on one-way routes. This also turns out to be very advantageous when asymmetric links are to be used, such as ADSL. In any case, RUSH models all bidirectional links as two unidirectional links, because the presence of one direction (e.g., reception) does not imply the correct functioning of the other (e.g., transmission). This same principle holds for radio frequency and optical traffic used for RUSH communications.
Given a need for one-way communication between two SHADOWS sites A and B, there are likely to be multiple paths (each consisting of one or more links), any or all of which could apparently be utilized. The effect of actually selecting and utilizing one particular path over another may have significant consequences in terms of delay, cost, and/or reliability.
Any link that is defined in terms of one or more usage and/or capacity thresholds (e.g., bursting limits, capacity windows, etc.) is modeled in SHADOWS as a set of related sub-links, each defined by its own set of thresholds. Given a set of sub-links, the sub-links must be labeled in ascending order of sequential use (e.g., A1 is used before A2, etc.).
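A minimal sketch of the sub-link model follows. It assumes that each sub-link carries a single capacity threshold in MB and that the caller supplies the sub-links already in order of use (A1 before A2, and so on); the `SubLink` fields and the `allocate` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SubLink:
    label: str           # e.g. "A1", "A2"
    capacity_mb: float   # traffic this tier accepts before the next tier applies


def allocate(traffic_mb: float, sublinks):
    """Fill sub-links in the order given; return (plan, unplaced_mb).

    `plan` is a list of (label, mb_assigned) pairs.
    """
    plan = []
    remaining = traffic_mb
    for sub in sublinks:
        if remaining <= 0:
            break
        used = min(remaining, sub.capacity_mb)
        plan.append((sub.label, used))
        remaining -= used
    return plan, remaining
```

For a link with a 100 MB base tier (A1) and a 200 MB burst tier (A2), a 150 MB transfer would place 100 MB on A1 and 50 MB on A2; any traffic left over after the last tier is reported as unplaced so that another link can be considered.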
The SHADOWS infrastructure dynamically characterizes the paths (especially the major ingress/egress points, and intermediate SHADOWS nodes) between its various sites, and creates a plan for each site that allocates one-way traffic (to each possible destination site) along its outbound links, in such a way as to globally optimize the use of each link. The allocation of traffic is dynamic in that it may be re-planned on a periodic or event-driven basis, but is almost always event-driven (expiration of a plan is considered an event, as is any major perturbation in SHADOWS network status).
Route planning generally strives to achieve several potentially conflicting goals. Conflicts among goals may not occur (or may be of no consequence) as long as link utilization stays below some "bottleneck" threshold.
One of the desirable side effects of route planning is to identify and monitor actual or potential inter-site bottlenecks in the SHADOWS network, and to recommend (or ideally, to execute) provisioning changes.
For route-planning purposes, at any point in time there is a set of parameters that statistically characterizes a particular path from site A to site B, and this set depends on a similar set of parameters that characterizes each of the links in the path. The statistical path (and link) parameters comprise at least some combination of the following:
- Plan Expiry (timestamp)
- Capacity Remaining RMS & Variance (MB)
- Drawdown Rate RMS & Variance (MB/h)
- Utilization RMS & Variance (%)
- Packet Loss RMS & Variance (%)
- Latency RMS & Variance (ms)
- Jitter RMS & Variance (ms) (see note 1)
- Transit Time RMS & Variance (s/MB)
- Operational Cost ($/MB)
- Infrastructure Cost ($/MB)

9.5.3 RUSH - Energy Considerations for Routing
Energy usage is a key consideration for survivability, especially for systems that must continue to operate when the utility power grid is down, and for systems that routinely operate off-grid. Although a given site may load-shed to an extreme degree, such as getting rid of all processing and storage tasks, leaving only communications (in order to avoid a potential network partition), there can still be a critical shortage of energy, especially if prolonged periods must be endured.
Communications, and transmissions in particular, can be highly consumptive of scarce energy resources. For that reason, SHADOWS (and specifically, the RECAP and RUSH protocols) considers energy use as an optionally advantageous part of its resource information. The RUSH protocol, in particular, considers energy usage in its routing algorithms, such that during certain resource scenarios, routing occurs in such a way as to maximize network coverage while minimizing energy usage. In this context, the overriding goal is to:
Conserve as much energy as possible, but expend as much energy as necessary to prevent a network partition.
In a preferred embodiment of the SHADOWS infrastructure, each SHADOWS site has multiple external communication channels that connect it directly or indirectly to all other SHADOWS sites via a diverse variety of networks (VLAN, WAN, WLAN, etc.) over a combination of optical, wired, terrestrial wireless, and satellite wireless links. Normally, only a few of these links (as few as one) are needed to prevent a network partition.
In a strictly site-local energy crisis, survival of the site (preventing a network partition) may be a simple matter of choosing the link with the lowest power requirement. However, in the case of a site whose presence is pivotal in keeping several other sites connected, the problem is more complex.
Such a problem could occur, for example, if the local site is the only survivor capable of connecting two parts of a mesh, and both of the parts depend on the local site for WAN connectivity. In this case, the system uses the minimum number of links needed (at appropriate power levels) to safely prevent a network partition, and the determination of a minimalist configuration requires understanding of the present nearby network topology, since lower power can potentially be achieved by introducing more hops and a potentially more indirect path, thereby trading away minimal latency.
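The trade-off between hop count and energy can be illustrated with a shortest-path search in which the edge weight is energy rather than latency. This is a minimal sketch using Dijkstra's algorithm over one-way links; the topology and energy figures in the usage note are invented for illustration, and the source does not state which algorithm RUSH uses.

```python
import heapq


def min_energy_path(links, src, dst):
    """Return (path, total_energy) for the lowest-energy one-way route.

    `links` maps a node to a list of (neighbor, energy_per_packet) pairs,
    one entry per one-way link. Returns (None, inf) if dst is unreachable.
    """
    best = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == dst:
            break
        if cost > best.get(node, float("inf")):
            continue                      # stale heap entry
        for nxt, energy in links.get(node, []):
            new_cost = cost + energy
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    if dst not in best:
        return None, float("inf")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], best[dst]
```

For example, with a direct A-to-B link costing 9 units and a three-hop detour A, C, D, B costing 2 units per hop, the search prefers the detour at 6 units, accepting the extra latency in exchange for lower energy use.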
In a preferred embodiment, the power requirements for data transmission are determined and normalized into a cost per 512-byte packet. (Other packet sizes are possible, but 512 bytes is used as a normalization unit since it is a convenient and frequently occurring size in the SHADOWS infrastructure, in anonymization networks, in various real-time protocols, and so forth.) A table of representative costs is depicted below:

9.5.4 RUSH - Inter-Node Messaging Plan
A SHADOWS inter-node messaging plan is a means for globally optimal use of locally available network communications links, while dynamically adapting to a frequently changing network state.
Inter-node messaging plans are created and maintained in quasi-real-time, and are sufficiently event-driven to account for network perturbations and state changes. However, "event-driven" doesn't mean "message-driven." Plans are not calculated on demand according to current outbound message load. Rather, a single plan is expected to apply to a very large number of messages, so that the amortized computational overhead can be relatively small.
Each messaging plan accommodates both the recommended outbound link capacity and the maximum outbound link capacity of a particular node. When plan recommendations are approached and/or exceeded, actionable messages occur to request automated workload-shifting and/or provisioning requests.
A messaging plan for one-way A-to-B communications is simply, for each QoS priority, a list of next-hop links (or sub-links) that are to be used, along with the percentage of data that is to be sent via each link.
Example: The messaging plan data structure for point A (the origin of a one-way communication from point A to point B) is conceptually similar to the following:

Plan Expiry={timestamp}
Origin=A
    Destination=B
        QoS=1
            {Link 1, 45%}
            {Link 2, 23%}
            {Link 3, 20%}
            {Link 5, 11%}
            {Link 6, 1%}
        QoS=2
            {Link 3, 50%}
            {Link 4, 28%}
            {Link 5, 9%}
            {Link 2, 8%}
            {Link 1, 6%}
        QoS=3
    Destination=C
        QoS=...
    Destination=...
Origin=B
    Destination=A
        QoS=...
    Destination=C
        QoS=...
    Destination=...
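One way to picture how such a plan might be consulted is as a nested mapping from (origin, destination) and QoS priority to weighted next-hop links. The sketch below reproduces only the QoS=1 entries for origin A and destination B; selecting a link per message at random in proportion to the listed percentages is an assumption, since the text specifies percentages of data and not a selection method. The names `PLAN` and `pick_link` are hypothetical.

```python
import random

# (origin, destination) -> QoS priority -> [(link, percent), ...]
PLAN = {
    ("A", "B"): {
        1: [("Link 1", 45), ("Link 2", 23), ("Link 3", 20),
            ("Link 5", 11), ("Link 6", 1)],
    },
}


def pick_link(plan, origin, destination, qos, rng=random):
    """Choose a next-hop link at random, weighted by the plan's percentages."""
    shares = plan[(origin, destination)][qos]
    links = [link for link, _ in shares]
    weights = [percent for _, percent in shares]
    return rng.choices(links, weights=weights, k=1)[0]
```

Because the plan is looked up rather than recomputed per message, the per-message cost is a single weighted draw, which is in keeping with the amortization argument made for messaging plans.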
9.5.5 RUSH - Pre-Validation of Session Traffic
9.5.5.1 CENTRAL CONCEPT
The idea is to pre-validate session setup traffic in a way that shifts the burden of proof (and the overhead of verifying that proof) away from the SHADOWS infrastructure, while simultaneously challenging inbound traffic originators in a manner that mitigates the threat of DDoS, improves auditability, and minimizes the use of SHADOWS resources.

9.5.5.2 BASIC CONCEPTS
1. SHADOWS presents itself as a small target on a slowly moving "tar baby"
2. Bona fide SHADOWS clients can easily hit the target, others cannot
3. SHADOWS can separate hits from misses with very little overhead
4. Attempts that miss the target tend to stick to the "tar baby" with little SHADOWS overhead

9.5.5.2.1 A Slow Target on a Slowly Moving "Tar Baby"
On networks used to enable and facilitate automated communications with "new" or previously unconnected entities ("supplicants"), a beacon is transmitted that allows legitimate devices to sync up with, and connect to, the SHADOWS network, in order to initiate additional steps in an identification and authentication process. A very important aspect of the SHADOWS beaconing mechanism is that the burden of proof is shifted away from SHADOWS to the devices desiring a connection. Although it may appear from the description below that there is significant overhead on the SHADOWS side, it is illusory, because the computations occur so rarely (no more often than once per tens or hundreds of seconds) and are then amortized over, say, thousands of inbound connections. The outbound beacon fragments are multicast (or returned in special Hashcash-enabled DNS requests), and require little transmission overhead.
Beacons provide a low-overhead defense layer intended to quickly obtain a pre-validation estimate that answers the question: "friend or foe?" If the estimated answer is "foe," then it's actually definitive, and low-overhead traffic management techniques can be used to cause each such connection attempt to get "stuck" (usually for a long time as determined by SHADOWS, or by security policy, but generally for long enough to decimate a DDoS attack). If the estimated answer is "friend," then enough access is allowed such that the next level of verification is possible.
In a preferred embodiment, beacons are used for "almost" all initial communications with the SHADOWS network, but different beaconing tactics are used to help limit access to different resources. If beacons are in use to limit access to a resource, then the resource cannot be accessed successfully without first interpreting the beacons correctly.
In a preferred embodiment, a sequence of beacons is transmitted via a multiplicity of time-varying communications channels, with legitimate beacons camouflaged among bogus beacons, both on a given channel, and also among different channels. In a preferred embodiment, the mix of communications channels used for beacon transmissions changes over time, so that at any point in time a variety of legitimate channels is available for hearing a mix of legitimate and bogus beacons, along with a variety of channels where only bogus beacons are heard. Any listening device can potentially hear the beacons, but only bona fide communications partners can easily separate the legitimate beacons from the bogus beacons - doing so requires both detailed process/protocol knowledge typically built into hardware, and also knowledge of several public and private cryptographic keys. In the next two paragraphs, the values of s, p, and d - referred to respectively in the context of 1-of-s, 1-of-p, and 1-of-d - are all values that can be computed by a supplicant based on a reasonably accurate knowledge of time, and a completely correct knowledge of the aforementioned process/protocol. The correct values of s, p, and d are needed to select from among the sets of cryptographic keys that are supposedly known to the supplicant.
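The idea that s, p, and d can be derived from the time plus protocol knowledge can be pictured with a purely hypothetical derivation. The source does not disclose the actual function; hashing a label together with the minute-rounded timestamp using SHA-256, as below, is an assumption made only to show how two parties with roughly synchronized clocks could arrive at the same key index. A real scheme would presumably also mix in secret protocol parameters, which are omitted here.

```python
import hashlib
from datetime import datetime, timedelta


def round_to_minute(t: datetime) -> datetime:
    """Round a timestamp to the nearest whole minute."""
    t = t + timedelta(seconds=30)
    return t.replace(second=0, microsecond=0)


def select_index(t: datetime, label: str, pool_size: int) -> int:
    """Derive which of `pool_size` keys is current for `label` ("s", "p" or "d").

    Hypothetical: the hash input and the modulo reduction are illustrative only.
    """
    stamp = round_to_minute(t).strftime("%Y-%m-%dT%H:%M")
    digest = hashlib.sha256(f"{label}|{stamp}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % pool_size
```

Two clocks that differ by a few seconds around the same minute round to the same timestamp and so select the same index, while a listener without the derivation rule has no basis for choosing among the candidate keys.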
In the SHADOWS network, a legitimate beacon comprises the current date, current time, a nonce, and a 1-of-s digital signature associated with the SHADOWS time-keeping authority (a BOSS team). Although the nonce value is, by definition, a random value, it is tracked internally, and must be used by successful supplicants. A bogus beacon is very similar, except that the nonce field and digital signature are both filled with random data values that are not tracked, and where the nonce field is guaranteed to differ from any currently legitimate nonce value.
Prior to transmission, a legitimate beacon is encrypted with a 1-of-p private key associated with the SHADOWS beacon service, then with a 1-of-d private key associated with supplicant devices (many supplicant devices share the same 1-of-d keys, so these aren't the same as the actual - and unique - private keys assigned to such devices).
Prior to transmission, a bogus beacon is similarly encrypted, but with any randomly selected 1-of-p private key associated with the SHADOWS beacon service except for the presently correct one, then with any randomly selected 1-of-d private key associated with supplicant devices except for the presently correct one.
Finally, the legitimate beacon is divided into k packets and FEC-encoded into n legitimate beacon packets with a (n,k) erasure code. One or more bogus beacons are similarly FEC-encoded into bogus beacon packets.
After encoding, the n legitimate beacon packets are then transmitted periodically, one at a time, on the various legitimate beacon channels, at a dynamically determined rate and transmission pattern, interspersed with bogus beacon packets, with a dynamically determined (and appropriate) percentage of packets going over each legitimate channel. The bogus beacon packets are also transmitted over one or more of the currently non-legitimate beacon channels.
Note (on the use of beacons for initial communications): Security policy determines the degree to which none, some, or all of the initial communications depend on successful use of beaconed information.
When a bona fide communications partner is ready to initiate communications with the SHADOWS network, it first obtains a reasonably reliable estimate of the current date and time (rounded to the nearest minute), and then uses it to compute a message digest of a sequence comprising data from the following tuple:
- date and time
- 1-of-s known public keys associated with the SHADOWS time-keeping authority (where s is a function of the current date and time)

9.5.5.2.2 Easy Target for Bona Fide Clients, but Not Others
Secret knock: increasingly sophisticated pattern involving: sync, knowledge, crypto, client-side burden, stickiness.
Client-side burden of proof is essentially effortless and invisible to bona fide clients.
9.5.5.2.3 Low-Overhead Classification of Inbound Hits and Misses
Classification as self vs. non-self begins with: address/port, protocol, date/time, Hashcash, consistency, public/private key.
"Close" doesn't count in pre-validation classification.
A low-saturation Bloom filter can be used to partially validate RUSHrouter traffic against the virtual channel combinations that occur, by detecting those which are clearly invalid. The idea is to use a Bloom filter at the network edge to reject bogus traffic without further processing. If traffic appears bogus (e.g., attempted communication on an invalid virtual channel is detected via a Bloom filter), then by definition it is bogus (there are no false negatives). However, if traffic appears legitimate, it requires further validation.
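The asymmetry described here (a "no" is definitive, a "yes" is only provisional) is the defining property of a Bloom filter, as the following minimal sketch shows. The filter size, the number of hash functions, the use of SHA-256, and the byte-string encoding of a virtual channel are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Set-membership filter: never reports a stored item as absent."""

    def __init__(self, size_bits: int = 1024, hashes: int = 3):
        self.size = size_bits
        self.hashes = hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Derive `hashes` bit positions by salting the item with a counter.
        for i in range(self.hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        """False means definitely not stored; True means further checks are needed."""
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

An edge device would load the currently valid virtual channel combinations with `add` and drop any packet for which `might_contain` returns False. Keeping the filter at low saturation (few bits set relative to its size) keeps the rate of false "maybe" answers small, so little bogus traffic reaches the more expensive validation steps.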
9.5.5.2.4 Originators of Target Misses Risk Getting Stuck
The SHADOWS infrastructure actively manages inbound traffic, including throttling it as necessary to ensure that the SLAs for various classes of service are maintained. Traffic that is apparently or actually bogus has an associated class of service (below all others), and thus is also actively managed, which includes deciding whether and to what extent connections and/or packets can be dropped, bandwidth-reduced, delayed by latency, de-prioritized, etc.
Attackers may be able to compromise client-side components to various degrees, and thus may be able to get "closer" to hitting a particular target more quickly.
Although some clients may get further along in the pre-validation process than others, they and their traffic are internally classified as "non-self" (i.e., originated by an attacker) as soon as pre-validation fails. Pre-validation may be allowed to continue temporarily after such a failure (as though no failure occurred) in order to mask the step in which the failure occurred and/or to collect additional information that may be useful for characterizing an attack or attacker. In no case, however, can continued pre-validation (after a failure) lead to successful validation.

9.5.5.3 PROCESS OVERVIEW
RUSHrouters (and other SHADOWS entities) use the RUSH protocol, which takes advantage of a SHADOWS-specific flavor of directed spread-spectrum addressing (DSSA) to balance the communications load across multiple virtual channels while also helping to validate RUSH traffic. DSSA exhibits an apparently random "hopping around" behavior that cannot be replicated without knowledge of the configuration parameters and cryptographic keys used to generate the behavior.
Incorrect hopping behavior triggers SELF reporting and likely escalation.
In conjunction with DSSA, the RUSH protocol also takes advantage of a distributed efficiently amortizable CPU cost-function with no trap-door (e.g., a Hashcash-like algorithm) in order to reduce the risk of DDoS by creating an asymmetric initialization burden (i.e., the client-side RUSH
protocol initiator has a much greater burden than the server-side connect point).
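The asymmetric initialization burden can be illustrated with a minimal Hashcash-style sketch. This is not the RUSH protocol's actual cost function; the use of SHA-256, the function names, the challenge format, and the difficulty value are all assumptions made for illustration. The client must search for a counter, while the server checks the result with a single hash.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def mint(challenge: bytes, difficulty: int) -> int:
    """Client side (expensive): search for a counter whose hash has
    at least `difficulty` leading zero bits."""
    for counter in count():
        digest = hashlib.sha256(challenge + counter.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= difficulty:
            return counter


def verify(challenge: bytes, counter: int, difficulty: int) -> bool:
    """Server side (cheap): one hash is enough to check the client's work."""
    digest = hashlib.sha256(challenge + counter.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty


# Hypothetical challenge string; on average the client performs about
# 2**12 hashes here, the server exactly one.
stamp = mint(b"rush-init|example-client", 12)
print(verify(b"rush-init|example-client", stamp, 12))
```

Raising the difficulty by one bit doubles the expected client-side work while leaving the server-side check unchanged.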

Footnote: A similar technique can be used with certain types of internal traffic, with suitably adapted DSSA configuration parameters.

The need for further validation if the traffic appears legitimate is partly due to the inherent properties of Bloom filters (which could be easily circumvented with a FASTpage index, at higher resource cost), but mostly due to the fact that the RUSH protocol's DSSA hopping behavior is intended to be a low-overhead, front-end mechanism that simply helps to determine "friend" or "foe" at the network edge. Thus, it is like a series of secret handshakes that must succeed before other, more costly, validation tests are attempted (e.g., exchanging digital credentials initially, but also just the ongoing encryption and decryption associated with routine communications). Also, because the hopping behavior is ongoing, it serves as a low-cost mechanism to help assure continuous traffic validation (there's no point in using cryptographic techniques to validate a message if one already knows it cannot be valid).
The presence of bogus traffic is a direct indication to SELF that one or more "non-self" communications agents are engaging (or attempting to engage) in unauthorized communications with the SHADOWS infrastructure.
RUSHrouters obtain sets of DSSA configuration parameters from the MARSHALs to which they connect (under the auspices and control of a MASTER-led team), and they do so using Byzantine agreement (see also: BOSS). Every agent may have its own parameters, but, in general, multiple agents (a non-unity fraction of the total) may safely share the same parameters (without knowing it), thus reducing the number of concurrent parameter sets that must be maintained on the server side.
In the SHADOWS implementation of DSSA for protocol validation, the DSSA configuration parameters include an internal seed value to be used in conjunction with a high-quality PRNG to generate sets of sequences of destination addresses, ports, and nonces that drive the adaptive "hopping" behavior of communications based on the RUSH protocol. The DSSA configuration parameters also include the percentage of communications (of each type) to be transmitted via each channel, and approximate time-windows, along with a mask that specifies which fields are required during the specified time window (those not required are "don't care"). Note that the DSSA configuration parameters are relatively long-lived (e.g., perhaps minutes or hours), compared to the durations of each DSSA "hop" (e.g., seconds).
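As a rough illustration of seed-driven hopping, the sketch below derives a (destination, port, nonce) triple for each hop window. It is a simplified assumption, not the SHADOWS algorithm: HMAC-SHA256 keyed with the seed stands in for the "high-quality PRNG", and the `DssaParams` fields, the address list, and the port range are hypothetical. Both ends holding the same parameters compute the same hop for the same time window, while an observer without the seed cannot.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class DssaParams:
    """Hypothetical container for one set of DSSA configuration parameters."""
    seed: bytes            # internal seed value (kept secret)
    destinations: tuple    # candidate destination addresses
    port_base: int         # first port of the hopping range
    port_count: int        # number of ports in the range
    hop_seconds: int       # duration of one hop


def hop_for(params: DssaParams, unix_time: int):
    """Return the (destination, port, nonce) expected during the hop
    window that contains `unix_time`."""
    window = unix_time // params.hop_seconds
    block = hmac.new(params.seed, window.to_bytes(8, "big"), hashlib.sha256).digest()
    dest = params.destinations[int.from_bytes(block[0:4], "big") % len(params.destinations)]
    port = params.port_base + int.from_bytes(block[4:8], "big") % params.port_count
    nonce = block[8:16].hex()
    return dest, port, nonce


params = DssaParams(
    seed=b"example-seed",
    destinations=("192.0.2.10", "192.0.2.11", "192.0.2.12"),  # documentation addresses
    port_base=40000,
    port_count=1000,
    hop_seconds=5,
)
print(hop_for(params, 1_000_000))
```

The long-lived parameters (seed, address set, port range) change rarely, whereas the derived hop changes every `hop_seconds`, mirroring the minutes-or-hours versus seconds distinction described above.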
The destinations are aware of the specific DSSA configuration parameters for each set of RUSHrouters that shares parameters (each set may include any number of RUSHrouters). At any point in time, the concurrently active DSSA configuration parameters for each set of RUSHrouters can be aggregated into a single Bloom filter, or into one of several Bloom filters, as long as they share the same parameter definitions and format. The ability to merge Bloom filters enables the set of concurrently active DSSA configuration parameters to be both maintained individually (so that obsolete ones can be deleted and new ones can be added), and also combined into a merged filter that can be used for rapid traffic validation.
The DSSA parameters to be reflected in the Bloom filter may include any or all of the following:
- Nonce
- RUSHrouter group number
- Source MAC
- Source IP
- Source Port
- Destination MAC
- Destination IP
- Destination Port
- Operation Requested
Note that, for security reasons, these parameters are suggestive, rather than being precisely defined here.
DSSA parameters not reflected in the Bloom filter may include any or all of the following:
- Transmission Timestamp
- Receipt Timestamp
When checking for a bogus message, if any of the Bloom filter bits are zero (FALSE), the message is guaranteed to be bogus, so validation processing can cease as soon as zero bits are detected. If all of the bits are set (TRUE), the message is likely legitimate, but further processing is required before validation can be confirmed.
Footnote: A relatively static (large window) set of DSSA configuration parameters can be used prior to authentication (these may be somewhat vulnerable, but still create a useful entry barrier). Another set of DSSA configuration parameters can be issued and used after authentication.
Footnote: A nonce is a randomly chosen value, different from previous choices, inserted in a message to protect against replays.

9.5.6 RUSH - Using Bloom Filters to Pre-Validate RUSH Traffic
9.5.6.1 CENTRAL CONCEPT
A low-saturation Bloom filter can be used to partially validate RUSHrouter traffic against virtual channel combinations that occur, by detecting those which are clearly invalid. The idea is to use a Bloom filter at the network edge to reject bogus traffic without further processing. If traffic appears bogus (e.g., attempted communication on an invalid virtual channel is detected via a Bloom filter), then by definition it is bogus (there are no false negatives). However, if traffic appears legitimate, it requires further validation.
9.5.6.2 BASIC CONCEPTS
1. SHADOWS presents itself as a small target on a slowly moving "tar baby"
2. Bona fide SHADOWS clients can easily hit the target, others cannot
3. SHADOWS can separate hits from misses with very little overhead
4. Attempts that miss the target tend to stick to the "tar baby" with little SHADOWS overhead
RUSHrouters (and other SHADOWS entities) use the RUSH protocol, which takes advantage of a SHADOWS-specific flavor of directed spread-spectrum addressing (DSSA) to balance the communications load across multiple virtual channels while also helping to validate RUSH traffic. DSSA exhibits an apparently random "hopping around" behavior that cannot be replicated without knowledge of the configuration parameters and cryptographic keys used to generate the behavior.
Incorrect hopping behavior triggers SELF reporting and likely escalation.
In conjunction with DSSA, the RUSH protocol also takes advantage of a distributed efficiently amortizable CPU cost-function with no trap-door (e.g., a Hashcash-like algorithm) in order to reduce the risk of DDoS by creating an asymmetric initialization burden (i.e., the client-side RUSH
protocol initiator has a much greater burden than the server-side connect point).
The need for further validation if the traffic appears legitimate is partly due to the inherent properties of Bloom filters (which could be easily circumvented with a FASTpage index, at higher resource cost), but mostly due to the fact that the RUSH protocol's DSSA hopping behavior is intended to be a low-overhead, front-end mechanism that simply helps to determine "friend" or "foe" at the network edge. Thus, it is like a series of secret handshakes that must succeed before other, more costly, validation tests are attempted (e.g., exchanging digital credentials initially, but also just the ongoing encryption and decryption associated with routine communications). Also, because the hopping behavior is ongoing, it serves as a low-cost mechanism to help assure continuous traffic validation (there's no point in using cryptographic techniques to validate a message if one already knows it cannot be valid).
The presence of bogus traffic is a direct indication to SELF that one or more "non-self" communications agents are engaging (or attempting to engage) in unauthorized communications with the SHADOWS infrastructure.
RUSHrouters obtain sets of DSSA configuration parameters from the MARSHALs to which they connect (under the auspices and control of a MASTER-led team), and they do so using Byzantine agreement (see also: BOSS). Every agent may have its own parameters, but, in general, multiple agents (a non-unity fraction of the total) may safely share the same parameters (without knowing it), thus reducing the number of concurrent parameter sets that must be maintained on the server side.
In the SHADOWS implementation of DSSA for protocol validation, the DSSA configuration parameters include an internal seed value to be used in conjunction with a high-quality PRNG to generate sets of sequences of destination addresses, ports, and nonces that drive the adaptive "hopping" behavior of communications based on the RUSH protocol. The DSSA configuration parameters also include the percentage of communications (of each type) to be transmitted via each channel, and approximate time-windows, along with a mask that specifies which fields are required during the specified time window (those not required are "don't care"). Note that the DSSA configuration parameters are relatively long-lived (e.g., perhaps minutes or hours), compared to the durations of each DSSA "hop" (e.g., seconds).
Footnote: A similar technique can be used with certain types of internal traffic, with suitably adapted DSSA configuration parameters.
Footnote: A relatively static (large window) set of DSSA configuration parameters can be used prior to authentication (these may be somewhat vulnerable, but still create a useful entry barrier). Another set of DSSA configuration parameters can be issued and used after authentication.
Footnote: A nonce is a randomly chosen value, different from previous choices, inserted in a message to protect against replays.
The destinations are aware of the specific DSSA configuration parameters for each set of RUSHrouters that share parameters (each set may include any number of RUSHrouters). At any point in time, the concurrently active DSSA configuration parameters for each set of RUSHrouters can be aggregated into a single Bloom filter, or into one of several Bloom filters, as long as they share the same parameter definitions and format.
The ability to merge Bloom filters enables the set of concurrently active DSSA
configuration parameters to be both maintained individually (so that obsolete ones can be deleted and new ones can be added), and also combined into a merged filter that can be used for rapid traffic validation.
The DSSA parameters to be reflected in the Bloom filter may include any or all of the following:
- Nonce
- RUSHrouter group number
- Source MAC
- Source IP
- Source Port
- Destination MAC
- Destination IP
- Destination Port
- Operation Requested
Note that, for security reasons, these parameters are suggestive, rather than being precisely defined here.
DSSA parameters not reflected in the Bloom filter may include any or all of the following:
- Transmission Timestamp
- Receipt Timestamp
When checking for a bogus message, if any of the Bloom filter bits are zero (FALSE), the message is guaranteed to be bogus, so validation processing can cease as soon as zero bits are detected. If all of the bits are set (TRUE), the message is likely legitimate, but further processing is required before validation can be confirmed.
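The pre-validation check and the merging of per-group filters can be sketched as follows. This is a minimal illustration under stated assumptions: the filter size, the number of hash functions, the SHA-256-based bit positions, and the `channel_key` field encoding are all hypothetical choices, not values taken from the SHADOWS design.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter; a Python int serves as the bit array."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item: bytes):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: bytes) -> bool:
        # all() stops at the first zero bit: the item is then certainly absent.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def merged_with(self, other: "BloomFilter") -> "BloomFilter":
        """Bitwise OR of two filters that share the same size and hashes."""
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("filters must share the same definition and format")
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = self.bits | other.bits
        return merged


def channel_key(nonce, group, src_ip, src_port, dst_ip, dst_port, operation) -> bytes:
    """Hypothetical encoding of the fields reflected in the filter."""
    return "|".join(map(str, (nonce, group, src_ip, src_port, dst_ip, dst_port, operation))).encode()


# One filter per RUSHrouter group is maintained individually, then merged.
group_a, group_b = BloomFilter(), BloomFilter()
group_a.add(channel_key("n1", 7, "198.51.100.4", 5000, "192.0.2.10", 40123, "fetch"))
group_b.add(channel_key("n2", 9, "198.51.100.8", 5001, "192.0.2.11", 40456, "upload"))
edge_filter = group_a.merged_with(group_b)

bogus = channel_key("zz", 1, "203.0.113.5", 1234, "192.0.2.10", 22, "fetch")
print(edge_filter.might_contain(bogus))  # a False result means: reject without further processing
```

Keeping the per-group filters separate is what allows an obsolete parameter set to be dropped: the merged filter is simply rebuilt from the remaining individual filters, since bits cannot be removed from a merged Bloom filter.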

9.5.7 RUSH - Time Stamping & Synchronization, Effects of Congestion, Tampering & Attack
SHADOWS RUSHrouters running on user PCs and/or servers receive digitally signed beacons with embedded time signals in packets originating from the SHADOWS network at variable, but regular intervals.
The data to be uploaded regarding security events originating at the user's PC, such as scanning a fingerprint or reading a SmartCard, is timestamped with both the local time and the last n SHADOWS time signals received (where n is a configurable parameter, but usually n >= 2 for authentication-oriented security events). Thus, the time associated with a security event can be guaranteed to be accurate within an epsilon that is controlled by the interval between SHADOWS time signals, unless corruption is detected.
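A minimal sketch of this dual timestamping follows. The class and field names are hypothetical, and signature verification of the beacons is omitted; the sketch only shows an event being stamped with the local time together with the last n received time signals.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class StampedEvent:
    kind: str                # e.g., "fingerprint_scan" (illustrative label)
    local_time: float        # local clock reading at the user's PC
    shadows_signals: tuple   # last n SHADOWS time signals, oldest first


class EventStamper:
    """Keeps the last n time signals and attaches them to security events."""

    def __init__(self, n: int = 2):
        self._signals = deque(maxlen=n)

    def on_time_signal(self, network_time: float) -> None:
        # Assumes the beacon's digital signature was already verified.
        self._signals.append(network_time)

    def stamp(self, kind: str, local_time: float) -> StampedEvent:
        return StampedEvent(kind, local_time, tuple(self._signals))


stamper = EventStamper(n=2)
for signal in (100.0, 110.0, 120.0):
    stamper.on_time_signal(signal)
print(stamper.stamp("smartcard_read", local_time=123.4))
```

Because the record carries both clocks, a receiver can compare the local time against the embedded signals when judging whether the local clock was adjusted.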
9.5.7.1 Local Tampering and Attack
By considering the local time at a user's PC as well as one or more SHADOWS time stamps, SHADOWS can more easily detect local attempts to defeat security (such as by adjusting a PC's clock and/or interfering with network transmissions). Heartbeat events supply clues here also, based on whether any are missing, especially in the same timeframe that a security event occurs.
Footnote: Any computing system containing a SHADOWS non-trusted component (e.g., DELEGATE, SERVANT) must also include at least one RUSHrouter to facilitate communication with the SHADOWS infrastructure. In a preferred embodiment, each outbound channel interface (e.g., a physical network interface, wireless adapter, etc.) has a dedicated RUSHrouter operating in its own VM; a separate RUSHrouter, also in its own VM, serves as the default gateway for the host computer, interfacing any hosted applications to the SHADOWS infrastructure by appropriately routing communications through the RUSHrouters that control the channel interfaces.
Footnote: In a preferred embodiment, the time signals originate from a set of "stratum 1" NTP time servers embedded within SHADOWS machines (e.g., sites with SCRAM nodes), since these have internal atomic clocks (with crystal-controlled backup clocks) and are designed to survive and remain accurate in the absence of national or international time sources (e.g., GPS satellites). Alternatively, any "stratum 1" NTP time server will do.

9.5.7.2 Network Congestion and Attack, with Local Detection and Response
During known or extended periods of network congestion, or in order to conserve bandwidth or server capacity, local RUSHrouters (which also have some ability to locally detect attempted local security breaches) can be instructed (ultimately by the SHADOWS SELF and BOSS functionality) as to how they are to respond.
For instance, a RUSHrouter may fail silently (i.e., force a crash or restart of itself and/or any SHADOWS
components, shut down the user's PC or server, etc.), or report back to the SHADOWS network with a specified set of information, or some combination thereof. The RUSHrouter's ability to respond is also influenced to a large degree by whether the RUSHrouter itself has been corrupted, or just the platform upon which it is operating. In the case of a RUSHrouter implemented within a self-contained VM (which is a preferred embodiment), the RUSHrouter may continue to operate normally, despite corruption in its physical host.
If the SERVANT is accompanied by, and associated with, a RUSHrouter (described elsewhere), which may also be implemented within a self-contained virtual machine, then in some configurations all of the physical host's communication may be directed toward the RUSHrouter (acting as the default gateway), and inbound communication may be directed to the RUSHrouter as a DMZ machine. In either or both of these scenarios, the RUSHrouter accompanying the local SHADOWS SERVANT(s) can act locally and defensively to mitigate malicious outbound behavior by the physical host, and inbound malicious traffic targeted at the physical host (or at the SERVANT(s), for that matter), and thereby possibly enable the SERVANT(s) to continue operating.
Self-detected corruption always causes a SERVANT to fail silently.

9.5.8 RUSH - Example RUSH Messages (subset)
Application Messages
 - Apply: Apply transform to message (all outputs are transient)
Infrastructure Messages
 - Cache: Prefetch id(s) but do NOT forward them
 - Fetch: Fetch id(s) and forward them
 - Prefetch: Prefetch id(s) and forward them
 - Reassign: Reassign new transient id(s) to NEW persistent id(s)
 - Resolve: Resolve new transient id(s) to EXISTING persistent id(s)
Ingress-Egress Messages
 - Upload: Upload inbound data
 - Demux: Demultiplex message(s) from other(s) and forward them
 - Mux: Multiplex message(s) into other(s) and forward them
Input type T1, Output type T2
N-way INPUT Rendezvous on common TX: Input n1 of m1 instances of type T1, n2 of m2 instances of type T2, etc.
N-way OUTPUT Split: Output n1 instances of type T1, n2 instances of type T2, etc.
Every message type is associated with a Process Input Control List (PICL), a Process Output Control List (POCL), and a set of corresponding access control masks.
The PICL identifies each process that is allowed to input the message type, and the POCL identifies each process that is allowed to output it. Each input (or output) access control mask is a variable-length bitstring with one bit allocated for every process specified in the PICL (or POCL).
Access Control Mask Types:
Input Message Mask (IMM): Configuration-dependent
Input Access Mask (IAM): Access privileges-dependent
Input Preferences Mask (IPM): Preferences-dependent
Output Message Mask (OMM): Configuration-dependent
Output Access Mask (OAM): Access privileges-dependent
Output Preferences Mask (OPM): Preferences-dependent
There is exactly one IMM and one OMM for each message type in the system. The IMM and OMM are each represented by a quasi-static value that can vary only when the system is reconfigured in one or more of the following four ways:
1. Add one or more additional processes
2. Disable one or more existing processes
3. Enable one or more existing (but disabled) processes
4. Remove one or more existing processes
There is one ICM and one OCM for each subscriber, for each message type in the system. Each ICM (OCM) mask has the same number of bits as the corresponding IMM (OMM), which associates each bit with a specific process. The IPM and OPM masks are each represented by a quasi-static value that can vary only when the subscriber changes his or her effective preferences in some way.
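The mask structure can be illustrated with a short sketch in which each mask is an integer whose bit i corresponds to the i-th process in the PICL. How the masks are combined is not stated in the text; the bitwise AND of the message, access, and preferences masks used below is an assumption, as are the function names and the example process names.

```python
def effective_mask(message_mask: int, access_mask: int, preferences_mask: int) -> int:
    """Assumed combination rule: a process qualifies only if its bit is
    set in all three masks (configuration, access privileges, preferences)."""
    return message_mask & access_mask & preferences_mask


def allowed_processes(control_list, mask: int):
    """Map the set bits of a mask back to the processes in a PICL or POCL."""
    return [proc for i, proc in enumerate(control_list) if (mask >> i) & 1]


# Hypothetical PICL for one message type; bit 0 is the first entry.
picl = ["parser", "indexer", "archiver"]
imm = 0b111   # all three processes are configured and enabled
iam = 0b011   # subscriber has access to parser and indexer
ipm = 0b110   # subscriber prefers indexer and archiver

print(allowed_processes(picl, effective_mask(imm, iam, ipm)))  # ['indexer']
```

Disabling a process then amounts to clearing one bit in the message mask, which leaves the per-subscriber masks untouched.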

9.6 VOCALE - Vocabulary-Oriented Compression & Adaptive Length Encoding
9.6.1 KEY DEFINITIONS
Atomic Word. A variable length sequence of characters comprising the largest word or partial word which can be encoded and handled as a single unit. Examples: "My" "Dog" "has" "fleas".
Compound Word. A single word comprised of a sequence of two or more atomic words, such that the sequence encountered hints at more complexity than would normally be encoded as a single atomic word.
Examples:
"MyDogHasFleas" "JSmith43" "iVarName" "Max Val"
Permanently Assigned Token (PAT). A variable-length, unique identifier that is permanently assigned to a particular data value, with the characteristics that the length of the PAT is relatively shorter than the length of the data value it identifies, and thus can be used to uniquely represent the relatively longer data value. A
PAT is always at least four bytes in length, but has no natural maximum size (although it can be artificially constrained).
Vocabulary-Relative Token (VoRT). A variable-length, unique identifier (implemented as an unsigned LEB128 integer) that is permanently assigned either to a raw data value, or to a particular PAT, with the characteristics that the length of a VoRT is relatively shorter than the length of the PAT it is associated with, if any, and thus can be used to uniquely represent the relatively longer PAT.
A VoRT is always at least one byte in length, but less than four bytes in length (i.e., always shorter than a PAT).
Leader Byte. The first byte in an atomic word.
Trailer Byte. The last byte in an atomic word.
Vocabulary. A specific, associatively addressable collection of token-pairs where each pair contains both a PAT and a VoRT. Partial pairs are not allowed (i.e., for every PAT in the vocabulary there must be a VoRT, and vice-versa). Given either a PAT or a VoRT, the vocabulary can be queried to determine its mate within the vocabulary, providing that such a pairing exists. There can be any number of vocabularies, and each has a permanent, unique identifier which serves to identify its namespace (VoRT values can overlap among vocabularies, so whenever VoRTs are used, the identifier of the vocabulary they're relative to must be specified).
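Since a VoRT is defined above as an unsigned LEB128 integer of one to three bytes, a short sketch of that encoding may help. The function names are illustrative; the encoding itself is the standard unsigned LEB128 scheme (7 payload bits per byte, least significant group first, high bit set on every byte except the last).

```python
def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)          # final byte: continuation bit clear
            return bytes(out)


def decode_uleb128(data: bytes) -> int:
    """Decode one unsigned LEB128 integer from the start of `data`."""
    result = 0
    for shift, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * shift)
        if not byte & 0x80:
            return result
    raise ValueError("truncated LEB128 sequence")


# Three bytes carry 21 payload bits, so a VoRT of at most three bytes
# can take 2**21 distinct values (0 through 2**21 - 1).
print(encode_uleb128(128).hex())        # 8001
print(len(encode_uleb128(2**21 - 1)))   # 3
```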

9.6.2 CENTRAL CONCEPT
The idea is to enable the creation of a level of indirection between a document and its word indices by referencing every encoded document element to a particular vocabulary whenever appropriate. In subsequent usage (e.g., when compressing a document), there may be multiple concurrent vocabularies for each document (a variable number of bits in the first byte of each compression tag provide for on-the-fly vocabulary switching).
Footnote (to the Vocabulary definition in 9.6.1): A pairing either exists or not; there are no unpaired (singleton) entries.
Note: The concepts described here do not relate to an individual document, but rather, to an entire corpus or lexicon. Individual documents or other artifacts can then be specified, synthesized, analyzed, and/or compressed relative to the available vocabularies.

9.6.3 BASIC CONCEPTS
1. Virtual Global Word List. A list of globally unique words is maintained, such that every word ever encountered is added to a "virtual" global list and given a permanently assigned token (PAT) in the system. The PAT is the "long" mechanism for uniquely specifying a word using a variable-length identifier. Each PAT is at least 4 bytes in length and specifies exactly one word. Responsibility for list updates is distributed globally among SHADOWS MASTER teams (partitioning is such that every possible word belongs to exactly one MASTER team, and that team is responsible for adding new words to the team's partition of the global word list, assigning a PAT, and distributing the new { word, PAT } tuple to the various SHADOWS sites, so they can update their non-authoritative local copies of the "virtual" global word list). In a preferred embodiment, each list (as described here) is maintained via a distributed FASTpage index.
2. Virtual Local Copy of Virtual Global Word List. In the same way that the "virtual" global word list is distributed globally among SHADOWS MASTER teams, a "virtual" local copy of that list is distributed locally (i.e., at a site, or within a neighborhood) among local SHADOWS MASTER teams, so that each such team is responsible for maintaining (and searching) a specific partition of the local copy on behalf of the other MASTERs in that locale. If a local word search in the correct partition doesn't yield a matching { word, PAT } tuple, then the locally responsible team passes the word to the responsible global team. In a preferred embodiment, each list (as described here) is maintained via a distributed FASTpage index.
3. Encryption of Tuples. In a preferred embodiment, the {word, PAT } tuples are encrypted prior to insertion into the list. In this case, a target word or PAT to be searched for must first be encrypted, and then the encrypted value is sought.
4. Multiple Vocabularies. There may be many vocabularies, the point of which is to enable words that occur together naturally to also occur together in a vocabulary (any word may occur in multiple vocabularies, however). Vocabularies are particularly useful for data compression, and for creating implicit relationships among artifacts. In a preferred embodiment, each vocabulary (as described below) is maintained via a distributed FASTpage index. From a process perspective, the team-based handling of vocabularies exactly parallels the handling of the virtual global word list, as described in 1, 2, and 3 above (the actual teams, partitioning, and encryption may differ, as appropriate).
5. Vocabulary-Specific Encoding. Rather than storing the actual word in each vocabulary, the word's associated PAT is stored instead. To add a word to a vocabulary, a relatively short vocabulary-specific code (a VoRT) is assigned to the word (or rather, to the relatively longer PAT associated with the word), resulting in a { VoRT, PAT } tuple that is simply inserted into the vocabulary, analogously to the process in 1, 2, and 3 above.
6. Most Valuable Characters. Each vocabulary also includes a list of the 63 most frequently used (or otherwise most valuable) characters (which may include multibyte Unicode characters) in that vocabulary, each of which is assigned a 6-bit non-zero code (see Leader Byte, below).
7. Structure of a PAT. The first byte in a PAT is the "leader" byte, and the last byte is the "trailer" byte.
In a PAT of n bytes (where n > 3), the first n-1 bytes describe the "root"
word or stem, and the following "trailer" byte describes the details (suffix, capitalization, accents, etc.).
8. Compound Words. In the case of an irregularly capitalized word like "ScrutinyAgent," it is encoded as a single PAT sequence describing a compound word. The PAT sequence is formed by juxtaposing each PAT of the individual words comprising the compound word with its counterparts, in the same relative order as the parts of the compound word, and setting the compound word bit in each individual word but the last.
Footnote (item 1, distribution to sites): This distribution responsibility includes both initial distribution as words are added, and on-demand distribution as local copies of the global list are discovered to need "new" words added. Note that on-demand distribution includes words that are truly new as well as existing words the site did not know about (for whatever reason).
Footnote (item 1, non-authoritative local copies): Actually, the local copy of the global word list is authoritative for words that are present in the list, but not words that are absent.
Footnote (item 4, implicit relationships among artifacts): The mere fact that two artifacts share the same vocabulary serves as a point of commonality.
9. Leader Byte. The leader byte determines how the data immediately following it is encoded. The allocation of information in the leader byte is as follows: 1 bit - Numeric flag (indicates whether the atomic word is a number or an alphabetic string); 6 bits - either a signed LEB64 Numeric value (0-63) which may be combined with subsequent signed LEB128 bytes to form a number of arbitrary precision, or a "Small Alphabet" character (any of the 63 "most valuable" characters, or if not, then a null value as an "escape" to indicate that the next byte - which may be any 8-bit value - contains the first character in place of the 6-bit value); 1 bit - Last-byte flag (i.e., signals the last byte of an atomic word in the case where the leader byte is the only byte, including the Small Alphabet scenario where the next byte stands in as the first byte). Each vocabulary can optionally specify the associated 63-character alphabet, each character of which has a 6-bit identifier that can represent an arbitrary Unicode character (even a multi-byte character) within a single leader byte.
10. Trailer Byte. The trailer byte describes the details (suffix, capitalization, accents, etc.) of how the preceding bytes of the word are to be interpreted, and what the suffix is, if any. By simply ignoring the trailer byte, a word-oriented operation can compare canonical word roots.
The allocation of information in the trailer byte is as follows: 2 bits - Capitalization specification (irregular, lower, first only, upper); 4 bits - Normal but vocabulary-specific word-suffix specification (such as, "ed", "ee", "er", "es", "ie", "ies", "ing", "or", "y", etc.); 1 bit - Compound word flag (includes next word); 1 bit - Last-byte flag (i.e., signals last byte of an atomic word).
11. Overlapped Vocabulary Namespace. In a preferred embodiment, the first 2 bytes of a PAT
overlap with the 2-byte namespace allowed for each vocabulary, which allows 1 or 2 bytes per word and uses 1 bit per byte to signal continuation (thus, such a vocabulary is limited to 16K of the most valuable words).
12. Tri-Word Phrases. In a preferred embodiment, a dictionary of three-word phrases is maintained.
Each three-word phrase is represented as a sorted tuple {smallest PAT, middle PAT, largest PAT, trailer byte}, where 3 bits of the trailer byte are needed to specify the order of the words encountered (there are 6 possible orderings relative to the default sorted order). There would be a pairing of a phrase-PAT value with a triplet of sorted word-PAT values. Searching for a word-PAT triplet would return a phrase-PAT value (with trailer byte), if one exists. Searching for a phrase-PAT value with trailer byte would return a triplet of sorted word-PAT values, if one exists.
Searching for a phrase-PAT value without a trailer byte would return all the corresponding triplets of sorted word-PAT
values, if any. There can only be 3! = 6 such triplets, at most.
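Two of the encodings in the list above lend themselves to a short sketch: the trailer byte of item 10 and the ordering code of the tri-word phrases in item 12. The text gives the field widths but not the bit positions, so the layout below (capitalization in the top two bits, then suffix, compound flag, last-byte flag) is an assumption, as are the function names, the numbering of the six orderings, and the use of integers to stand in for PAT values.

```python
from itertools import permutations

CAPS = ("irregular", "lower", "first_only", "upper")   # 2-bit field
ORDERINGS = tuple(permutations(range(3)))              # the 3! = 6 orderings


def pack_trailer(caps: str, suffix_index: int, compound: bool, last: bool) -> int:
    """Pack trailer fields into one byte (assumed bit layout: CCSSSSKL)."""
    if not 0 <= suffix_index < 16:
        raise ValueError("suffix index must fit in 4 bits")
    return (CAPS.index(caps) << 6) | (suffix_index << 2) | (int(compound) << 1) | int(last)


def unpack_trailer(byte: int):
    return (CAPS[(byte >> 6) & 0b11], (byte >> 2) & 0b1111,
            bool((byte >> 1) & 1), bool(byte & 1))


def encode_phrase(pats):
    """Return (sorted PAT triple, order code) for three PATs as encountered.
    The order code (0-5) fits in the 3 bits the text reserves for it."""
    perm = tuple(sorted(range(3), key=lambda i: pats[i]))
    return tuple(pats[i] for i in perm), ORDERINGS.index(perm)


def decode_phrase(sorted_pats, order_code: int):
    """Restore the encountered word order from the sorted triple."""
    original = [None] * 3
    for k, i in enumerate(ORDERINGS[order_code]):
        original[i] = sorted_pats[k]
    return tuple(original)


print(pack_trailer("first_only", 5, True, False))   # 150
print(encode_phrase((30, 10, 20)))                  # ((10, 20, 30), 3)
print(decode_phrase((10, 20, 30), 3))               # (30, 10, 20)
```

Storing the triple in sorted order means that all orderings of the same three words share one dictionary key, with the order code distinguishing them.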

9.7 UMA - UpdateMovingAverages(iValue)
One of the CORE methods implemented in a PUMP device is that associated with the UpdateMovingAverages(iValue) message.
In a preferred embodiment, the CORE method corresponding to the UpdateMovingAverages(iValue) message uses a 16-byte data structure (4 of them fit in a 64-byte cache line, and 32 of them fit in a 512-byte sector). The data structure comprises a set of bit fields, as follows:

RawDataArray     7 bytes   (8 nibbles + 4 nibbles + 2 nibbles)
Sums & Indexes   3 bytes   (7+6+5 bits and 3+2+1 bits, respectively)
CycleCounts      4 bytes   (16+16 bits, but reducible if more fields are needed)
Averages         2 bytes   (4 averages at 4 bits each: WMA, CMA, RMA, and HMA)
---------------------------------------------
TOTAL           16 bytes

As should be apparent to anyone skilled in the art, the format (including specific fields, field widths, and number of data points, etc.) and resulting space requirements of the 16-byte data structure were chosen for convenience and are somewhat arbitrary; the essence of the process is easily adaptable to other formats and space requirements.

The UpdateMovingAverages(iValue) method is stateless; it uses the integer parameter iValue (modulo 2^n, where n = 4) to update the object data structure. Specifically, the data structure contains the information necessary to maintain a circular buffer of 8 raw data points (with an index to indicate where the head/tail interface is) that drives the current moving average (CMA), as well as a quadruple of slower-moving "recent moving average" (RMA) data points, and a pair of "historical moving average" (HMA) data points.
The RMA and HMA are essentially averages of averages; after every so many cycles of updating the CMA's 8 raw data points, the RMA is updated with the CMA, and after every so many cycles of updating the RMA's data points, the HMA is updated with the RMA.
The update frequency of the CMA is driven by the frequency of the UpdateMovingAverages(iValue) messages sent. With each invocation, the oldest data point is dropped and the newest is added, and a new CMA is calculated. The CMA has an associated 16-bit CMA cycle count that is incremented every time the CMA buffer index wraps around to zero (i.e., after every 8th CMA update). Whenever this occurs, the set of 4 "recent moving average" (RMA) data points is updated with the latest CMA, dropping the oldest RMA data point in the process, and a new RMA is then calculated. Likewise, the RMA has an associated 16-bit RMA cycle count that is incremented every time the RMA buffer index wraps around to zero (i.e., after every 4th RMA update). Whenever this occurs, the pair of "historical moving average" (HMA) data points is updated with the latest RMA, dropping the oldest HMA data point in the process, and a new HMA is then calculated.
In addition to the three moving averages, a "weighted moving average" (WMA) is also calculated, and this occurs with every invocation of the UpdateMovingAverages(iValue), according to the following formula:
WMA = (2 * CMA + RMA + HMA) / 4
which is equivalent to:
WMA = (CMA + CMA + RMA + HMA) >> 2
This means that the 8 most recent data points carry 50% of the weight, with 25% going to the recent moving averages, and 25% to the historical moving averages.
In operation, all four averages (WMA, CMA, RMA, HMA) are available and can be used as needed.
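The cascade described above can be illustrated with a short Python sketch. It follows the prose description (the RMA is fed the latest CMA each time the CMA buffer index wraps, and the HMA is fed the latest RMA each time the RMA buffer index wraps); the class name, attribute names, and the use of ordinary Python integers instead of packed bit fields are assumptions made for readability, and the cycle counters are kept only as bookkeeping with the 16-bit width given in the field layout.

```python
class MovingAverages:
    """Cascaded 4-bit moving averages: CMA (8 points), RMA (4), HMA (2), WMA."""

    def __init__(self):
        # All state starts at zero, as the constructor note in the text implies.
        self.cma_data = [0] * 8
        self.rma_data = [0] * 4
        self.hma_data = [0] * 2
        self.cma_sum = self.rma_sum = self.hma_sum = 0
        self.cma_index = self.rma_index = self.hma_index = 0
        self.cma_cycles = self.rma_cycles = 0
        self.cma = self.rma = self.hma = self.wma = 0

    def update(self, value):
        """Fold one new data point (taken modulo 16) into all four averages."""
        value &= 0xF
        # Drop the oldest point from the running sum, then store the new one.
        self.cma_sum += value - self.cma_data[self.cma_index]
        self.cma_data[self.cma_index] = value
        self.cma = self.cma_sum >> 3              # divide by 8
        self.cma_index = (self.cma_index + 1) % 8
        if self.cma_index == 0:                   # every 8th CMA update
            self.cma_cycles = (self.cma_cycles + 1) % (1 << 16)
            self._update_rma(self.cma)
        # Weighted average: CMA counts twice, RMA and HMA once each.
        self.wma = (self.cma + self.cma + self.rma + self.hma) >> 2
        return self.wma

    def _update_rma(self, value):
        self.rma_sum += value - self.rma_data[self.rma_index]
        self.rma_data[self.rma_index] = value
        self.rma = self.rma_sum >> 2              # divide by 4
        self.rma_index = (self.rma_index + 1) % 4
        if self.rma_index == 0:                   # every 4th RMA update
            self.rma_cycles = (self.rma_cycles + 1) % (1 << 16)
            self._update_hma(self.rma)

    def _update_hma(self, value):
        self.hma_sum += value - self.hma_data[self.hma_index]
        self.hma_data[self.hma_index] = value
        self.hma = self.hma_sum >> 1              # divide by 2
        self.hma_index = (self.hma_index + 1) % 2


averages = MovingAverages()
for _ in range(8):
    averages.update(15)
print(averages.cma, averages.rma, averages.hma, averages.wma)
```

Because the slower buffers start out filled with zeros, the RMA and HMA lag well behind the CMA after a step change in the input; under this sketch a constant input needs 64 updates before all four averages agree.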
9.7.9 PSEUDOCODE
definition: object MovingAverages
   // Note: Field widths conveniently chosen for 16-byte structure size
   visible fields
      WMA        field of  4 bits as unsigned                 // Weighted Moving Average
      CMA        field of  4 bits as unsigned                 // Current Moving Average
      RMA        field of  4 bits as unsigned                 // Recent Moving Average
      HMA        field of  4 bits as unsigned                 // Historical Moving Average
   hidden fields
      CMAdata    field of 32 bits as array [8] of unsigned    // Last 8 CMA updates
      RMAdata    field of 16 bits as array [4] of unsigned    // Last 4 RMA updates
      HMAdata    field of  8 bits as array [2] of unsigned    // Last 2 HMA updates
      CMAsum     field of  7 bits as unsigned                 // Running total of CMA updates
      RMAsum     field of  6 bits as unsigned                 // Running total of RMA updates
      HMAsum     field of  5 bits as unsigned                 // Running total of HMA updates
      CMAindex   field of  3 bits as unsigned                 // Next CMA position to update
      RMAindex   field of  2 bits as unsigned                 // Next RMA position to update
      HMAindex   field of  1 bits as unsigned                 // Next HMA position to update
      CMAcycles  field of 16 bits as unsigned                 // Wrap-around CMA counter (update RMA when = 0)
      RMAcycles  field of 16 bits as unsigned                 // Wrap-around RMA counter (update HMA when = 0)
end definition

method: Update( iValue : field of 4 bits as unsigned )
   // Note: Constructor initializes all fields to 0
   CMAsum -= ( CMAdata[ CMAindex ] );             // Drop last slot value from sum
   CMAsum += ( CMAdata[ CMAindex ] = iValue );    // Update slot and add new value to sum
   CMA = CMAsum >> 3;                             // Divide new sum by 2^3=8 to re-average
   if CMAindex == 0
      if CMAcycles == 0
         UpdateRMA( CMA );                        // RMA is an average of CMA values
      endif
      CMAcycles = ( CMAcycles + 1 ) % (2^16);     // Increment cycles with 16-bit wrap-around
   endif
   CMAindex = ( CMAindex + 1 ) % 8;               // Increment index with 3-bit wrap-around
   WMA = ( CMA + CMA + RMA + HMA ) >> 2;          // Weighted average, per the WMA formula
end method

private method: UpdateRMA( iValue : field of 4 bits as unsigned )
   RMAsum -= ( RMAdata[ RMAindex ] );             // Drop last slot value from sum
   RMAsum += ( RMAdata[ RMAindex ] = iValue );    // Update slot and add new value to sum
   RMA = RMAsum >> 2;                             // Divide new sum by 2^2=4 to re-average
   if RMAindex == 0
      if RMAcycles == 0
         UpdateHMA( RMA );                        // HMA is an average of RMA values
      endif
      RMAcycles = ( RMAcycles + 1 ) % 16;         // Increment cycles with 4-bit wrap-around
   endif
   RMAindex = ( RMAindex + 1 ) % 4;               // Increment index with 2-bit wrap-around
end method

private method: UpdateHMA( iValue : field of 4 bits as unsigned )
   HMAsum -= ( HMAdata[ HMAindex ] );             // Drop last slot value from sum
   HMAsum += ( HMAdata[ HMAindex ] = iValue );    // Update slot and add new value to sum
   HMA = HMAsum >> 1;                             // Divide new sum by 2^1=2 to re-average
   HMAindex = ( HMAindex + 1 ) % 2;               // Increment index with 1-bit wrap-around
end method

FRAME (Forced Recapture, Aggregation & Movement of Energy)
The FRAME (Forced Recuperation, Aggregation & Movement of Energy) subsystem comprises the following identifiable sub-subsystems (abbreviated with acronyms) as means, and these are depicted in Fig. 10-1:
- SLAM (SCADA, Logging, Analysis & Maintenance)
- STEER (Steerable Thermal Energy Economizing Router)
- RUBE (Recuperative Use of Boiling Energy)
- PERKS (Peak Energy Reserve, Kilowatt-Scale)
- FORCE (Frictionless Organic Rankine Cycle Engine)
- SOLAR (Self-Orienting Light-Aggregating Receiver)
FRAME is an energy production and/or peak-shaving energy management subsystem whose goal is to reduce operational costs and enhance or enable survivability. FRAME may significantly reduce the energy required to operate a heat-dissipating system (such as a computing system), through the recuperative use of energy in general, and by time-shifting the generation and consumption of power to the most effective and/or efficient time-frames.
The FRAME subsystem may acquire energy renewably when possible and appropriate, use energy efficiently to generate power, conserve energy through recapture and recycling, and efficiently maintain energy storage reserves. FRAME is described here with respect to the "node" with which it is associated.
FRAME may significantly reduce the energy required to operate a heat-dissipating system (such as a computing system in general, and in particular the SCRAM subsystem [6] depicted in Fig. 10-1), through the recuperative use of energy in general, and by shifting the generation and consumption of power to the most effective and/or efficient time-frames. FRAME may also reduce the cost of energy required for operation, by selecting from among the available energy sources and including economic considerations.
The basic idea underlying FRAME is to maximize the survivability of a site (for example, a computing and/or communications facility, whether manned or unmanned) by adaptively minimizing its dependence on external energy sources and supplies. FRAME does so with an integrated, automated system that controls and provides energy generation, consumption, conservation, and storage, along with automated interfaces for external replenishment and repair (thereby minimizing human attention and energy expenditures as well).
In essence, FRAME is like a highly integrated "co-generation" power plant that is internal to the local node, that is, the node which directly depends on FRAME for power. As depicted in Fig. 10-1, the box labeled "SCRAM (etc.)" represents just such an example (where the SCRAM apparatus implements the primary functionality of the node containing both SCRAM and FRAME, and where said functionality is also the node's primary power consumer). SCRAM [7] is described in section 5.
However, FRAME and its dependent node are actually co-dependent, because FRAME also depends on the node for waste energy. Maximal power optimization is possible through co-operation by design and integration (which are static) and co-operation by cooperative collaboration (which is dynamic) while the system is running. In a preferred embodiment, parts of FRAME are so tightly integrated with the dependent node as to significantly blur the boundaries between producing and consuming subsystems, because of their symbiotic relationships (such as exists between FRAME's RUBE sub-subsystem and the SCRAM subsystem to which it connects, via the STEER means [2], as further described in section 10.3).
More precisely, FRAME is a power production and/or peak-shaving energy management subsystem whose goal is to enhance or enable survivability. In at least some contexts, economy of operation is a fringe benefit rather than the primary driver. By design, survivability and availability are primary drivers, with economy of operation being a consequence of meeting two key survivability constraints:
1. Effective conservation of available energy
2. Independence from the need for timely maintenance and repair
Survivability is enhanced by significantly reducing operational costs (especially labor and energy costs) of both the local node and some larger system (with which the local node is typically associated, and in which the local node typically plays a part).

Summary of FRAME Subsystems
FRAME comprises some combination of six primary subsystems, as depicted in Fig. 10-1; these are summarized below, and described in their respective sections:
SLAM: A subsystem that may monitor, track, and control a node's physical environment (including energy production, storage, and consumption), maintain system time and geolocation, authenticate and communicate with maintenance staff, and "call home." The SLAM apparatus [1] is described more fully in section 10.1.
STEER: A subsystem of manifolds, valves, and motive devices that may be computer-controlled, and that may work somewhat like a crossbar switch, in order to dynamically control working fluid flow in a manner that may optimize the exchange of thermal energy between working fluids of different temperatures in such a way as to meet specific goals for the availability of the working fluids at specified temperature ranges. The STEER apparatus [2] is described more fully in section 10.2.
RUBE: A subsystem that may recuperate thermal energy ("boiling energy," in the form of heated working fluids) from "hot spots" and "warm spots" that may be exchanged for cooled working fluids, with the thermal energy ("heat") being transferred elsewhere (where it may be put to good use). The RUBE apparatus [3] is described more fully in section 10.3.
PERKS: The PERKS apparatus [4] may capture excess or low-cost energy from a multiplicity of sources (e.g., opportunistically, such as when it is cheapest or most readily available) and store it for later (i.e., time-shifted) use, such as during peak periods (e.g., when power is relatively more expensive or less available). In a preferred embodiment, as depicted in the context of Fig. 10-1, the PERKS apparatus [4] may intermediate the supplies of fuels and/or electrical power from external sources (including, for example, from, or via, a facility in which it may be located or co-located), storing a portion of the associated flow in convenient form and passing along the rest to internal consumers (other subsystems). The PERKS apparatus [4] is described more fully in section 10.4.
FORCE: As depicted in Fig. 10-1, the FORCE apparatus [5] is a kilowatt-scale (e.g., 0.5 kW to 50 kW) modified Rankine cycle heat engine that may comprise some combination of various heat sources, working fluids (including at least one appropriate organic working fluid for two-phase liquid/vapor operation), vaporizers, superheaters, low-temperature/low-pressure vapor turbines, generators and/or alternators, recuperators, desuperheaters, preheaters, dehumidifiers, condensers, subcoolers, and STEER apparatus [2] interfaces. In a preferred embodiment, the primary object of the FORCE apparatus [5] may be to convert externally supplied electrical energy, chemical energy (e.g., one or more types of fuel), and/or thermal energy (e.g., heat contained in some type of working fluid) into electrical energy and/or thermal energy that may then be provided as an output to other subsystems. In a preferred embodiment, said electrical energy may be output directly to the PERKS apparatus [4] for subsequent further conversion, storage, and/or distribution. In a preferred embodiment, high-quality thermal energy may be provided as an output in addition to, or in lieu of, electrical energy. In a preferred embodiment, said thermal energy may be output to the STEER apparatus [2] for subsequent further transport, conversion, storage, and/or distribution. The FORCE apparatus [5] is described more fully in section 10.5.
SOLAR: The SOLAR apparatus [6] comprises some combination of apparatus for tracking and/or concentrating solar energy and directing it to a receiver, where it may be collected and converted to thermal energy and transferred to a working fluid. In a preferred embodiment, the SOLAR apparatus [6] may also comprise a STEER apparatus [2] interface for accepting and delivering working fluid to one or more companion subsystems (e.g., the RUBE apparatus [3], or FORCE apparatus [5], etc.). The SOLAR apparatus [6] is described more fully in section 10.6.

The HVAC, CWS, and other non-storage thermal interfaces to the FRAME means, as depicted in Fig. 10-1, are described with the RUBE apparatus [3], in section 10.3.

10.1 SLAM - SCADA, Logging, Analysis & Maintenance
SLAM: A subsystem that may monitor, track, and control a node's physical environment (including energy production, storage, and consumption), maintain system time and geolocation, authenticate and communicate with maintenance staff, and "call home."
In a preferred embodiment, when the SELF (see section 7), FRAME (see section 10), DEFEND (see section 12), and/or WARN (see section 13) subsystems are present, the SLAM apparatus [1] may integrate with one or more of them to implement cooperative functionality. In a preferred embodiment, the SLAM apparatus [1] may be an integral part of the FRAME subsystem.
In a preferred embodiment, the SLAM apparatus [1] may comprise a multiplicity of SLAM devices, each of which may independently provide the full functionality of the SLAM apparatus.
In a preferred embodiment, a SLAM device may implement the SELF functionality described in section 7. In a preferred embodiment, at least one host processor within a SLAM device implementing the SELF functionality described in section 7 may specifically be enabled for the role of candidate MASTER and/or MASTER as described in section 7.3.
In a preferred embodiment, when a SLAM device is embedded into a SCRAM system (where it may be reasonably secure from a physical point of view, and where it may have nearby MASTERs to communicate with, including other SLAM devices), then it may have or acquire internal MASTER and BOSS capabilities that become enabled, as described in section 7, in which case individual SLAM devices - or the SLAM apparatus [1] as a whole - may participate in system-wide security decisions.
In a preferred embodiment, SLAM devices comprising the SLAM apparatus [1] may be sufficient, through cooperation, Byzantine agreement logic (as described in section 7.2), and mutual agreement that may result, to enable each other to become MASTERs as described in section 7.3, even absent the participation of non-SLAM MASTERs and non-SLAM "Candidate MASTERs."
In a preferred embodiment, when a SLAM device becomes a MASTER as described in section 7.3, it may delegate work to others, including to other SLAM devices, and thus offload a portion of its tasks. In a preferred embodiment, offloaded tasks may typically include those which may be compute-bound (e.g., analysis and optimization). In this way, SLAM devices may take advantage of the computational capabilities inherent in the system of which they may be a part.
In a preferred embodiment, the SLAM apparatus [1] depicted in Fig. 10-1 may comprise decision-making, control and supervisory functions to oversee the operation of the FRAME subsystem, as well as non-FRAME subsystems to which FRAME may be connected. SLAM may also participate in the monitoring, tracking, and control of other (i.e., non-FRAME) subsystems that depend on the FRAME apparatus within the local node.
SLAM may be involved in the interfaces to external subsystems such as Facility HVAC/CWS facilities [7] and External Thermal Exchange/Storage facilities [8].
In a preferred embodiment, the SLAM apparatus [1] depicted in Fig. 10-1 monitors, tracks, and controls (to the extent possible) the physical environment of the node incorporating the FRAME apparatus, including controlling the sibling FRAME subsystems (i.e., STEER [2], RUBE [3], PERKS [4], FORCE [5], and SOLAR [6]) comprising said FRAME apparatus as described herein. In a preferred embodiment, SLAM provides optionally advantageous SCADA (supervisory control and data acquisition) functions, particularly with respect to establishing and maintaining the proper system thermal parameters and power generation/usage parameters, in order to optimize the use of the system's energy resources.
In a preferred embodiment, the SLAM devices of the local node may collaborate with, and coordinate with, the SLAM devices in other nodes, both local and remote, in order to enhance survivability of both the local node and the SHADOWS network as a whole.
In a preferred embodiment, the SLAM apparatus [1] depicted in Fig. 10-1 may provide steering and/or other control signals (not shown) to the STEER apparatus [2] flow control devices, in order to optimize the temperature and pressure ranges associated with each of its embedded manifolds and pseudo-reservoirs.
The SLAM apparatus [1] may interact with other subsystems, and especially other FRAME subsystem components (i.e., [3] through [6]), to carry out the desired energy resource usage policies. In a preferred embodiment, said control signals may be generated through cooperation among MASTERs (and especially SLAM devices that are MASTERs), Byzantine agreement logic (as described in section 7.2), and mutual agreement that may result from Byzantine agreement with respect to said control signals. In this way, there may be increased assurance that the said control signals can maximize system survivability and optimize the overall operation of the system.
In a preferred embodiment, the SLAM apparatus [1] depicted in Fig. 10-1 may contain a crystal-controlled oscillator for maintaining relatively accurate time in the absence of external time synchronization signals. In a preferred embodiment, said apparatus may synchronize with a GPS or other satellite-originated signal and also with an external PPS (pulse-per-second) signal such as that available with a local or remote atomic (e.g., Rubidium- or Cesium-based) clock, and which may provide a reconciled PPS output (not shown). In a preferred embodiment, high-quality PPS inputs may originate with a commercially available miniature atomic clock module that may be internal to, and integrated with, the local SCRAM system.
In a preferred embodiment, a SLAM device may be equipped with geolocation devices, radio devices, and/or other devices that may be capable of acquiring or approximating the current time, location, and bearing through unidirectional or bidirectional communications from external sources, including LEO (low-earth orbit) or MEO (mid-earth orbit) satellites, communications towers, wireless access points, homing beacons, and other signal transmitting or transceiving devices. In a preferred embodiment, information acquired or inferred from said external geolocation sources may serve to inform a SLAM device's "belief" as to the current time, location, and bearing (as well as its confidence in said "belief"), and the SLAM device may use said belief to synchronize internal timekeeping and geolocation devices accordingly.
In a preferred embodiment, a SLAM device may be equipped with internal timekeeping and geolocation devices that may be capable of approximating the current time, location, and bearing even in a "GPS-denied" environment. In a preferred embodiment, said internal geolocation devices may comprise a combination of oscillators, electronic compasses, inertial reference units, three-axis accelerometers, and other suitable devices as may be available.
In a preferred embodiment, SLAM's three-axis accelerometers may sense movement and may thus be able to determine orientation of the system of which the SLAM may be a part, as well as the direction and rate of movement, if any. In a preferred embodiment, SLAM's magnetic compass may help In a preferred embodiment, said accelerometers may also be useful for sensing vibration (which may include vibration due to failed or failing components), physical attacks and/or other impacts to the equipment, and seismic P-waves (which may provide sufficient warning, often on the order of a few seconds or tens of seconds), and may therefore enable the triggering of some emergency action. In a preferred embodiment, if the WARN and/or LISTEN subsystems are present, the SLAM may integrate with them in order to exchange sensory inputs and analyses (threat, time, location, accelerometer readings, etc.), and may also cooperate on monitoring and/or maintenance functions.
In a preferred embodiment, various sensory input signals may be made available simultaneously to multiple SLAM devices and MASTERs, such that Byzantine agreement with respect to the correctness of said signals may be reached through cooperation among MASTERs (and especially SLAM devices that are MASTERs), Byzantine agreement logic (as described in section 7.2), and mutual agreement that may result from Byzantine agreement with respect to said sensory input signals. In this way, there may be increased assurance that the said sensory input signals can maximize system survivability and optimize the overall operation of the system.
In a preferred embodiment, SLAM devices may provide monitoring and tracking of high-value assets, said assets normally being the system itself or a subsystem to which a SLAM device may be attached or with which it may be co-located.
In a preferred embodiment, a SLAM device may be equipped with one or more communications channels that enable external communications, including communications with authorized personnel. In a preferred embodiment, said communications channels include a combination of wired and wireless channels suitable for secure communications with authorized personnel, and with other authorized systems (which may include the ability to "call home" to one or more authorized destinations to report intrusion attempts, geolocation, system status, asset-tracking information, or other authorized information).
One possible embodiment of a SLAM device configuration is depicted in Fig. 10.1-1. In an alternate preferred embodiment, there may be no actual "SLAM device," but rather the various SLAM I/O devices may be distributed among the system's existing electronics modules, for example, with the SLAM device's processing responsibilities relegated to tasks or processes or virtual machines that may execute as part of the inherent workload of other processors within the system of which the SLAM device would otherwise be part.

In a preferred embodiment, when a SLAM device is embedded into a SCRAM subsystem, the preferred means of communications with maintenance personnel may be via a suitably equipped personal computer, PDA, or other device having reliable means of biometric sensory inputs (e.g., fingerprint, iris, etc.) for authentication - over a multiplicity of secure wired and/or wireless channels, each such channel bearing a portion of the data stream with independent encryption keys. In an alternative embodiment, any acceptably secure communications devices and/or channels may be utilized for said purpose.

10.2 STEER - Steerable Thermal Energy Economizing Router
STEER: A system of manifolds, valves, and motive devices that may be computer-controlled, and that may work somewhat like a crossbar switch, in order to dynamically control working fluid flow in a manner that may optimize the exchange of thermal energy between working fluids of different temperatures in such a way as to meet specific goals for the availability of the working fluids at specified temperature ranges.
In a preferred embodiment, the STEER apparatus [2] depicted as part of the FRAME subsystem in Fig. 10-1 may comprise dynamically reconfigurable "plumbing" interconnectivity, such that some or all of the fluid-based devices that may comprise its sibling subsystems (such as pumps, heat exchangers, etc.), may be adaptively and dynamically connected, disconnected, reconnected, and/or operated in reconfigurable patterns that may achieve particular purposes at a particular point in time.
In a preferred embodiment, such reconfigurations and operations may occur under the control of the SLAM apparatus [1], so that thermal energy may be optimally routed to the various devices in the system with minimized energy loss or waste.
The STEER apparatus [2] comprises a combination of pipes, tubes, joints, manifolds, and valves (a combination of fixed and statically or dynamically adjustable valves, under system control as necessary) that may collectively work somewhat like a crossbar switch, in order to dynamically control working fluid flow between various sources and sinks of thermal energy, mixing working fluid of different temperature ranges as needed to meet specific goals for moving compatible working fluids in accordance with appropriate temperature and pressure ranges.
In the depiction of Fig. 10-1, only three broad (and non-specific) temperature ranges are shown as examples, but in principle as many ranges as needed may be used, with separate means provided for each type of working fluid and operating regime. The key limitation driving the mixing may be the conservation of energy (inputs and outputs), and the need to blend changing temperatures and pressures adaptively while respecting the desired operating ranges (which may vary over time). Note that although the STEER apparatus is depicted as a "centralized" box in Fig. 10-1 in order to better visualize its role conceptually, in a preferred embodiment its components may be conveniently distributed around the local system, in order to more easily co-locate them with their associated devices and connectors.
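The energy-conservation constraint on mixing can be made concrete with a simple energy balance. The sketch below is in Python and rests on simplifying assumptions that are not stated in the text: the streams are the same single-phase working fluid, the specific heat is constant over the temperature range, and there are no heat losses. The function names are hypothetical.

```python
def mixed_temperature(flows):
    """Temperature of the blend of several streams of the same fluid.

    flows is a sequence of (mass_flow, temperature) pairs. With a constant
    specific heat, the energy balance reduces to a mass-weighted average.
    """
    flows = list(flows)
    total_flow = sum(mass_flow for mass_flow, _ in flows)
    if total_flow <= 0:
        raise ValueError("total mass flow must be positive")
    return sum(mass_flow * temp for mass_flow, temp in flows) / total_flow


def hot_fraction(t_cold, t_hot, t_target):
    """Fraction of the blended flow that must come from the hot stream."""
    if not t_cold < t_hot:
        raise ValueError("t_hot must exceed t_cold")
    if not t_cold <= t_target <= t_hot:
        raise ValueError("target must lie between the two source temperatures")
    return (t_target - t_cold) / (t_hot - t_cold)


# Equal flows at 20 and 60 degrees blend to 40 degrees.
print(mixed_temperature([(1.0, 20.0), (1.0, 60.0)]))
# Reaching 30 degrees from the same two sources takes one part hot in four.
print(hot_fraction(20.0, 60.0, 30.0))
```

The second function shows why a target temperature outside the range of the available sources cannot be met by mixing alone, which is one reason the desired operating ranges constrain the routing.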

10.2.1 STEER - Latching Digital Flow Rate Control Valve
In order to control the rate of flow to various devices by using control signals from the SLAM apparatus [1] depicted in Fig. 10-1, the STEER apparatus [2] may incorporate a "Latching Digital Flow Rate Control Valve" apparatus (hereafter, simply "Rate Control Valve"), an example of which is depicted in Fig. 10.2.1-1.
As depicted in Fig. 10.2.1-1, in a preferred embodiment, each Rate Control Valve comprises a set of binary latching valves [3] that may be individually latched open or closed, and at least a common inlet manifold [2] with inlet connection [1], a common outlet manifold [4] with outlet connection [5], along with signaling connections (e.g., depicted in Fig. 10.2.1-1 as "A," "B," "C," and "D") - and optionally, sensory connections (not shown) - appropriate to each binary latching valve.
In a preferred embodiment, the inlet connection [1] and outlet connection [5] may be arranged to be on diagonally opposite sides of the Rate Control Valve (i.e., in a "reverse return" configuration) such that the path from connection [1] to connection [5] through any individual valve in the set of binary latching valves [3] is the same length as the path through any other such individual valve in the set.
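The equal-path property of the "reverse return" arrangement can be checked with a small calculation. The Python sketch below assumes an idealized geometry that the text does not specify: valves tapped at even spacing along two straight manifolds, the inlet connection at the end nearest valve 0, and the outlet connection at the end nearest the last valve. The function name and parameters are hypothetical.

```python
def path_lengths(n_valves, spacing, valve_length, reverse_return=True):
    """Path length from inlet to outlet through each valve in turn.

    With reverse return, the outlet sits at the opposite end from the inlet,
    so a long run in the inlet manifold is offset by a short run in the
    outlet manifold. Without it, both connections sit at the same end.
    """
    lengths = []
    for i in range(n_valves):
        inlet_run = i * spacing
        if reverse_return:
            outlet_run = (n_valves - 1 - i) * spacing
        else:
            outlet_run = i * spacing
        lengths.append(inlet_run + valve_length + outlet_run)
    return lengths


print(path_lengths(4, spacing=1, valve_length=2))
print(path_lengths(4, spacing=1, valve_length=2, reverse_return=False))
```

Under these assumptions every path in the reverse-return case has the same length, whereas the same-end arrangement gives a different length for each valve and would therefore tend to favor the valves nearest the connections.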
In a preferred embodiment, a Rate Control Valve may comprise magnetic latching valves as binary latching valves, specifically to conserve electrical energy, since they can be automated, yet relatively little electrical energy may be required to operate them, and only to toggle them from one position (e.g., open or closed) to the other (e.g., closed or open, respectively). In an alternative embodiment, one or more non-latching valves or motor valves may be substituted instead. In yet another alternative embodiment, which may not be fully automatable, one or more manually operated valves may be substituted.

In a preferred embodiment, a Rate Control Valve may be constructed by connecting the inlets of a set of N binary latching valves to a common inlet manifold, and connecting the outlets of said binary latching valves to a common outlet manifold. The example depicted in Fig. 10.2.1-1 comprises a set of four binary latching valves [3], so for this example, N=4. A truth table for this example, depicting the flow rates for the various input combinations, is depicted in Table 10.2.1-1.
In a preferred embodiment, the individual binary latching valves may be selected or constructed to accommodate the working pressures associated with their intended use, and to ensure suitability with respect to the requisite control signaling and any relevant sensory requirements.
In a preferred embodiment, a Rate Control Valve may be constructed such that the aggregate flow rates of the binary latching valves are the constraining factor. In an alternative embodiment, something else may be the constraining factor. In either embodiment, given a Rate Control Valve containing a set of N binary latching valves (N > 0) that are functioning correctly, the rate of flow may be controllable in N+1 discrete steps over a range of 0% to 100%, with a step size equal to (100% / N). For example, a Rate Control Valve comprising four binary latching valves (i.e., N=4) may have five (4+1=5) steps ranging from 0% to 100%, at 25% intervals (100%/4=25%), corresponding to flow rates of 0%, 25%, 50%, 75%, and 100%, with built-in redundancy for all intermediate steps (but not for the zero-flow and full-flow settings). In a preferred embodiment, redundancy for the zero-flow and full-flow settings may be created straightforwardly using known modular redundancy techniques (e.g., such as TMR, or triple modular redundancy).
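The step count and the built-in redundancy of the intermediate steps can be enumerated directly. The following Python sketch assumes N identical valves whose aggregate flow is the constraining factor, as in the preferred embodiment above; the function names are hypothetical.

```python
from itertools import product


def flow_steps(n_valves):
    """The N+1 achievable flow rates, as percentages of full flow."""
    return [100.0 * k / n_valves for k in range(n_valves + 1)]


def patterns_for(n_valves, n_open):
    """Every open/closed pattern (True = open) with exactly n_open valves open."""
    return [
        pattern
        for pattern in product([False, True], repeat=n_valves)
        if sum(pattern) == n_open
    ]


# For N=4: five steps, with 1, 4, 6, 4, and 1 patterns respectively.
for n_open, rate in enumerate(flow_steps(4)):
    print(f"{rate:5.1f}% flow: {len(patterns_for(4, n_open))} pattern(s)")
```

Only the zero-flow and full-flow steps have a single pattern each, which is why those two settings have no built-in redundancy.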

Table 10.2.1-1 STEER Latching Digital Flow Rate Control Valve - Flow-Rate Truth Table

Effect of Control Variables    A     B     C     D
= 0% flow (off)                Off   Off   Off   Off
= 25% flow                     Off   Off   Off   On
= 25% flow                     Off   Off   On    Off
= 25% flow                     Off   On    Off   Off
= 25% flow                     On    Off   Off   Off
= 50% flow                     Off   Off   On    On
= 50% flow                     Off   On    Off   On
= 50% flow                     Off   On    On    Off
= 50% flow                     On    Off   Off   On
= 50% flow                     On    Off   On    Off
= 50% flow                     On    On    Off   Off
= 75% flow                     Off   On    On    On
= 75% flow                     On    Off   On    On
= 75% flow                     On    On    Off   On
= 75% flow                     On    On    On    Off
= 100% flow                    On    On    On    On

Given a set of suitable binary latching valves, constructed and configured as taught here, the signaling connections may then be used to force the individual binary latching valves to a desired state, according to the flow rate desired. When all the binary latching valves are actually closed, the corresponding flow rate is zero, which is clearly a trivial case. When all the binary latching valves are commanded closed, the corresponding flow rate may be zero; that is, there is some possibility that at least one individual valve in the set may fail open, in which case the flow is not zero. In a preferred embodiment, apparatus that requires variable flow and also, for example, assured zero flow and/or full flow may be created straightforwardly using known TMR (triple modular redundancy) techniques.
In a preferred embodiment, when all the individual binary latching valves are open, the corresponding flow rate of the Rate Control Valve may be constrained by whichever flow rate is most constraining - i.e., the rates of the inlet and/or outlet connections to their respective manifolds, the rates of the manifolds themselves, or the aggregate flow rates of the open binary latching valves.
In a preferred embodiment, the optional sensory signals of the individual binary latching valves may be used to determine whether each particular valve is properly opened or closed. In an alternative preferred embodiment, a flow sensor may be used to determine the aggregate flow, thereby allowing the proper functioning of the binary latching valves to be inferred. In yet another preferred embodiment, both techniques may be used together, in order to increase confidence that the actual state is known.
In the case of individual binary latching valve malfunction, significant redundancy accrues as N becomes larger, and this is a fringe benefit of making the controllable flow rates increasingly fine-grained (i.e., smaller increments). In particular, a valve that refuses to open (or close) can be worked around by opening (or closing) a different one instead. However, the special cases of zero flow and full flow cannot be made redundant without additional valves and a different configuration.
Nonetheless, it is straightforward (and known in the art) to accommodate the two special cases, if necessary, through modular redundancy, where the entire Rate Control Valve is treated as a single modular device.
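The work-around for a malfunctioning valve can be sketched as a small planning routine. This Python example assumes that the set of valves stuck open and the set stuck closed are already known (for instance from the optional sensory connections); the function name, its parameters, and the choice of which healthy valves to open are hypothetical.

```python
def plan_valves(n_valves, target_open, stuck_open=(), stuck_closed=()):
    """Choose an open/closed command per valve so that target_open are open.

    Returns a list of booleans (True = open), or None when the target cannot
    be met with the valves that still respond.
    """
    stuck_open, stuck_closed = set(stuck_open), set(stuck_closed)
    healthy = [
        i for i in range(n_valves)
        if i not in stuck_open and i not in stuck_closed
    ]
    # Valves stuck open already count toward the target.
    still_needed = target_open - len(stuck_open)
    if still_needed < 0 or still_needed > len(healthy):
        return None
    to_open = set(healthy[:still_needed]) | stuck_open
    return [i in to_open for i in range(n_valves)]


# Valve 0 refuses to open: 50% flow is still reachable via valves 1 and 2.
print(plan_valves(4, 2, stuck_closed={0}))
# Full flow is not reachable with a valve stuck closed.
print(plan_valves(4, 4, stuck_closed={0}))
# Zero flow is not reachable with a valve stuck open.
print(plan_valves(4, 0, stuck_open={3}))
```

The two `None` results correspond to the two special cases named above, which is where modular redundancy of the whole Rate Control Valve would come in.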
In a preferred embodiment of the STEER apparatus [2] depicted in Fig. 10-1, the Rate Control Valves are monitored and controlled by the SLAM means [1], and TMR (triple modular redundancy), which is known in the art, is used to implement a high-availability configuration of Rate Control Valves when system criticality warrants it.
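The text does not specify how the three modules of a TMR configuration are plumbed or how their states are reconciled. As one illustration of the general technique only, the following Python sketch shows a two-out-of-three majority vote over the states reported by three replicated modules; the function name and the use of reported flow percentages as the voted value are assumptions.

```python
from collections import Counter


def tmr_vote(a, b, c):
    """Return the value reported by at least two of three replicas.

    Raises ValueError when all three disagree, since no majority exists
    and the caller must treat the state as unknown.
    """
    value, count = Counter([a, b, c]).most_common(1)[0]
    if count < 2:
        raise ValueError("no majority among the three replicas")
    return value


# One replica reports a wrong flow step; the other two outvote it.
print(tmr_vote(50, 50, 75))
```

A single faulty module is masked by the other two, which is the property that makes the arrangement suitable when system criticality warrants it.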

10.2.2 STEER - Parallel-Series Reconfigurator
In a preferred embodiment, in order to dynamically reconfigure the flow to various fluid-based devices without actually modifying the physical plumbing connections, for example by using control signals from the SLAM apparatus [1] depicted in Fig. 10-1, the STEER apparatus [2] incorporates one or more "Parallel-Series Reconfigurator" devices (hereafter, each is simply called a "Reconfigurator"). For simplicity, in a preferred embodiment, each Reconfigurator supports only two devices, and larger configurations may be supported by treating each two-device Reconfigurator as a single device that can be substituted into a separate Reconfigurator apparatus operating at a higher level. In an alternate embodiment, and with increased complexity, additional devices may be supported within a single Reconfigurator by introducing additional control variables as desired and defining the appropriate control states.
In a preferred embodiment, with fluid-handling components connected as depicted in Fig. 10.2.2-1, each Reconfigurator comprises a set of three binary latching valves (i.e., valves that are latchable in either the open or closed position and require no electrical power to sustain their presently latched setting, as depicted by [3], [9], and [10] in Fig. 10.2.2-1), and four devices suitable for splitting and/or merging liquid flows (e.g., "Y" couplings, "T" couplings, manifolds, etc., such as those depicted by [4] and [6], for example, in Fig. 10.2.2-1), optional check valves (not shown), along with signaling connections (e.g., depicted in Fig. 10.2.2-1 as "A," "B," and "C" on latching valves [3], [9], and [10], respectively) - and optionally, sensory connections (not shown) - appropriate to each binary latching valve.
In the Reconfigurator depicted in Fig. 10.2.2-1, the flow enters at [1]. In a preferred embodiment, the flow capacities at entry point [1] and exit point [6] are normally at least equal to the aggregate capacity of the branches that split from [1] into points [2] and [7], and ultimately, the branches that merge again at point [6].
In a preferred embodiment, the branches at [2] and [7], and all other internal flow points, including latches [3], [9], and [10], each have a flow capacity at least equal to the greater of the flow capacities of Device #1 [8] and Device #2 [5]. In a preferred embodiment, flow capacities of Device #1 [8] and Device #2 [5] may be identical.
In a preferred embodiment, a Rate Control Valve may comprise magnetic latching valves as binary latching valves, specifically to conserve electrical energy, since they can be automated, yet relatively little electrical energy may be required to operate them, and only to toggle them from one position (e.g., open or closed) to the other (e.g., closed or open, respectively). In an alternative embodiment, one or more non-latching valves or motor valves may be substituted instead. In yet another alternative embodiment, which may not be fully automatable, one or more manually operated valves may be substituted.
In a preferred embodiment, a Reconfigurator's configuration may be determined by a truth table such as one having three binary control variables (i.e., logical control inputs), and thus having eight (2^3 = 8) possible states, of which five are generally valid (the other three states are not normally needed, but may be used for special circumstances or unusual devices). Table 10.2.2-1 depicts a truth table relating to the configuration of Fig. 10.2.2-1, corresponding to three latch-control variables and eight possible configuration states.
In an alternative preferred embodiment, one or more of the individual binary latching valves depicted in Fig. 10.2.2-1 may each be substituted with a Latching Digital Flow Rate Control Valve apparatus as described in section 10.2.1 (an example of which is depicted in Fig. 10.2.1-1). While such substitution may make the configuration significantly more complex conceptually, it may be modeled easily and may afford opportunity to provide additional dynamic balancing of system flows, including the possibility of taking advantage of one or more partial-flow configurations listed in Table 10.2.2-1.

Table 10.2.2-1 STEER Parallel-Series Reconfigurator - Latch-Control Truth Table

Effect of Latching Variables                                   A    B    C
No flow                                                        Off  Off  Off
Flow through Device #1 only                                    Off  Off  On
SERIES FLOW through Devices #1 and #2                          Off  On   Off
Full flow through Device #1, partial flow through Device #2    Off  On   On
Flow through Device #2 only                                    On   Off  Off
PARALLEL FLOW through Devices #1 and #2                        On   Off  On
Partial flow through Device #1, full flow through Device #2    On   On   Off
Partial flow through Devices #1 and #2, with bypass            On   On   On

In a preferred embodiment, a Reconfigurator may be used to dynamically "re-plumb" a mated pair of devices (e.g., Device #1 and Device #2 of Fig. 10.2.2-1) from a parallel configuration (i.e., A=On, B=Off, C=On) to a serial configuration (i.e., A=Off, B=On, C=Off), or vice-versa. In a preferred embodiment, other combinations may be useful, as indicated in Table 10.2.2-1. In a preferred embodiment, said dynamic reconfiguration may be automated and/or unattended.
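The truth table of Table 10.2.2-1 may be expressed as a lookup, sketched below in Python with On represented as True and Off as False in the order (A, B, C). The names `RECONFIGURATOR_TABLE`, `settings_for`, and `toggles_needed` are hypothetical. Because latching valves consume energy only when toggled, the number of latches that change state between two configurations may be of interest to the controlling logic.

```python
# Keys are (A, B, C) latch settings; values are the effects listed in Table 10.2.2-1.
RECONFIGURATOR_TABLE = {
    (False, False, False): "No flow",
    (False, False, True): "Flow through Device #1 only",
    (False, True, False): "SERIES FLOW through Devices #1 and #2",
    (False, True, True): "Full flow through Device #1, partial flow through Device #2",
    (True, False, False): "Flow through Device #2 only",
    (True, False, True): "PARALLEL FLOW through Devices #1 and #2",
    (True, True, False): "Partial flow through Device #1, full flow through Device #2",
    (True, True, True): "Partial flow through Devices #1 and #2, with bypass",
}


def settings_for(effect):
    """Return the (A, B, C) latch settings that produce the named effect."""
    for settings, name in RECONFIGURATOR_TABLE.items():
        if name == effect:
            return settings
    raise KeyError(effect)


def toggles_needed(current, target):
    """Count the latches that must change state to move between two settings."""
    return sum(1 for a, b in zip(current, target) if a != b)
```

Under this representation, switching from the parallel configuration to the series configuration toggles all three latches.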
For example, a pair of pumps presently operating in parallel to maximize flow rate may be dynamically configured to operate in series instead, perhaps in order to increase pressure as part of a work-around for a failed (or failing) downstream component. In this particular example one might expect such a reconfiguration to cause the flow rate to be reduced, as is commonly the case, but this is not necessarily so, because in a preferred embodiment, the FRAME apparatus (of which the STEER apparatus taught here may be a part) may comprise variable-speed pumps wherever pumps are used. Thus, the tradeoff may be increased power consumption by the affected pumps, rather than necessitating a change in operating pressures and/or flows.
There is, of course, the possibility that there would be no tradeoff at all, such as in the case where a reduced flow rate is a feature of the intended reconfiguration (such as with a modified power usage profile).
As another example, a pair of heat exchangers that are operating in series to maximize the heat exchange for a particular scenario, may be dynamically configured to operate in parallel instead, perhaps in order to decrease pressure drop to compensate for a failed upstream pump, or valve, etc., or to simply achieve a different energy consumption rate, or a different thermal profile.
In a preferred embodiment of the STEER apparatus [2] depicted in Fig. 10-1, multiple Reconfigurators may be combined as necessary to achieve nearly arbitrary parallel-series combinations. For example, in an environment where the energy availability, power requirements, cooling load, ambient temperature, etc., are all changing dynamically, and possibly dramatically, it may be very difficult to "tune" the system to a configuration that is optimal using conventional means. However, the use of Reconfigurators may enable dynamic reconfiguration and tuning of the system to match changing real-world requirements (for example, in conjunction with external control logic, such as may be provided by SLAM apparatus [1] depicted in Fig. 10-1).
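The hierarchical combination of Reconfigurators, in which a two-device Reconfigurator is itself treated as a single device at a higher level, may be sketched as a small tree. This is an illustrative sketch only: the class name, the mode names, and the device labels are hypothetical, the partial-flow states are omitted for brevity, and only the question of which devices receive flow is modeled (not pressures or flow rates).

```python
from dataclasses import dataclass
from typing import Union

# Mode name -> (Device #1 receives flow, Device #2 receives flow).
MODES = {
    "off": (False, False),
    "first_only": (True, False),
    "second_only": (False, True),
    "series": (True, True),
    "parallel": (True, True),
}


@dataclass
class Reconfigurator:
    first: Union[str, "Reconfigurator"]   # a device label or a nested Reconfigurator
    second: Union[str, "Reconfigurator"]
    mode: str = "off"


def active_devices(node):
    """List the leaf devices that receive flow under the current modes."""
    if isinstance(node, str):
        return [node]
    use_first, use_second = MODES[node.mode]
    result = []
    if use_first:
        result += active_devices(node.first)
    if use_second:
        result += active_devices(node.second)
    return result
```

A controller could change the `mode` of any node in such a tree to re-plumb the corresponding pair without affecting the structure above or below it.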
In a preferred embodiment, a specific type of Reconfigurator may be constructed as an assembly of a pair of mated devices, plus other requisite parts, such that the resulting assembly may be seen as having a single fluid input and a single fluid output, along with a set of control and/or sensory signals, and having the function of a single device whose characteristics at a point in time are equivalent to the pair of mated devices operating in one of the desired configurations (which may be a subset of those technically possible).
In an alternate preferred embodiment, a specific type of Reconfigurator may be constructed as an assembly of a pair of non-mated devices, plus other requisite parts, such that the resulting assembly may be seen as having a single fluid input and a single fluid output, along with a set of control and/or sensory signals, and having the function of a single device whose characteristics at a point in time are equivalent to the pair of non-mated devices operating in one of the desired configurations (which may be a subset of those technically possible).

In an alternative preferred embodiment, as depicted in Fig. 10.2.2-2, a generalized Virtual Reconfigurator may be constructed without actually embedding or otherwise including the devices [5] and [8], but rather, by providing two extra pairs of inlet/outlet connections to accommodate a pair of said devices that are external and may be attached later, plus the other requisite parts previously described, such that the resulting assembly has a single fluid input and a single fluid output, along with a set of control and/or optional sensory signals, plus two pairs of inlet/outlet connections for the two devices to be attached later.

10.3 RUBE - Recuperative Use of Boiling Energy
RUBE: A subsystem that may recuperate thermal energy ("boiling energy," in the form of heated working fluids) from "hot spots" and "warm spots" that may be exchanged for cooled working fluids, with the thermal energy ("heat") being transferred elsewhere (where it may be put to good use).
The RUBE apparatus [3] depicted as part of the FRAME subsystem in Fig. 10-1 may comprise devices for recuperating, transferring, and exchanging thermal energy contained in working fluids, and these may be referred to hereafter as "RUBE devices."
In a preferred embodiment, the RUBE apparatus [3] may be co-located and closely integrated with its thermal energy sources and sinks. Said sources and sinks may be any devices which may emit or absorb thermal energy, respectively, to be recuperated, transferred, and/or exchanged by RUBE.
In a preferred embodiment, origins of said "boiling energy" may be co-located subsystems that may need to be kept to specific desired (or maximum) operating temperatures or temperature ranges - and typically, these may be subsystems which also generate waste heat that otherwise may ordinarily need to be rejected from the system.
In a preferred embodiment, the RUBE apparatus [3] may be tightly integrated with a combination of power-dissipating components such that the cooling of temperature-sensitive components can be aided through the thermal energy contributed by relatively hotter and/or potentially less temperature-sensitive components.
In a preferred embodiment, the RUBE apparatus [3] depicted as being part of the FRAME subsystem in Fig. 10-1 may be tightly integrated with the SCRAM apparatus [7] described in section 5. In another preferred embodiment, the RUBE apparatus [3] is also integrated with the FORCE apparatus [5]. In still another preferred embodiment, the RUBE apparatus [3] may also be integrated with the SOLAR apparatus [6] described in section 10.6. In a preferred embodiment, the RUBE apparatus [3] may be integrated with external sources of thermal energy, which may include sources of energy that may otherwise be wasted.
In a preferred embodiment, the RUBE apparatus [3] may use a relatively low-temperature phase-change working fluid to recuperate thermal energy, and as a medium for transferring thermal energy. In a preferred embodiment, such as for electronics thermal stabilization applications, the working fluid is preferentially a non-flammable, non-ozone-depleting, low-GWP, organic dielectric fluid with a boiling point between 20 °C and 40 °C, and a useful upper limit of at least 125 °C, such as 1-methoxy-heptafluoropropane (C3F7OCH3), which is practically non-toxic and currently not regulated for transport or use.
Other suitable working fluids may include, for example, C5F12, C5F14, C4F9OCH3, C4F9CH3, C4F9OC2H5, and C4F9C5H5, as well as others, and may also include combinations of said fluids, some of which may not be organic dielectric fluids having boiling points within the exemplary range. In a preferred embodiment, the working fluid expands substantially when heated and vaporizes easily.
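The exemplary selection criteria for the working fluid may be sketched as a simple screening predicate. The property values in any candidate record are to be supplied by the user from the fluid manufacturer's data; the field names below are hypothetical, the boiling-point range is assumed to be inclusive, and "low-GWP" is treated as a yes/no flag because no numeric threshold is stated.

```python
from dataclasses import dataclass


@dataclass
class Fluid:
    name: str
    boiling_point_c: float   # boiling point at the operating pressure of interest
    upper_limit_c: float     # useful upper temperature limit
    flammable: bool
    ozone_depleting: bool
    low_gwp: bool
    dielectric: bool


def meets_exemplary_criteria(fluid):
    """True if the fluid falls within the exemplary range described above."""
    return (
        20.0 <= fluid.boiling_point_c <= 40.0
        and fluid.upper_limit_c >= 125.0
        and not fluid.flammable
        and not fluid.ozone_depleting
        and fluid.low_gwp
        and fluid.dielectric
    )
```

As noted above, fluids or combinations of fluids that fail this screen may still be suitable in other embodiments.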
In a preferred embodiment, the RUBE apparatus [3] may integrate with and/or connect to a variety of thermal energy sources and sinks over a wide temperature range, but there may be others external to the RUBE apparatus [3] that may be accessible via the STEER apparatus [2].
As depicted in Fig. 10-1, and in addition to the thermal energy they may directly recuperate from the subsystems with which they integrate, RUBE [3] devices may accept working fluids that are relatively cool (well below the fluid's boiling point, but nowhere near freezing) for their own use, and may deliver working fluids that are relatively warm (at or below the fluid's boiling point). In a preferred embodiment, the RUBE apparatus [3] may accept working fluid from the STEER apparatus [2] depicted in Fig. 10-1, for cooling purposes.
Internally, the temperatures to which the RUBE apparatus [3] may be exposed may be well above the working fluid's boiling point. In a preferred embodiment, it is an object of the RUBE apparatus [3] to provide integral sources and sinks for said thermal energy within non-FRAME subsystems to which the FRAME apparatus may be connected and integrated (and of which it may be a part). In a preferred embodiment the SCRAM [7] apparatus depicted in Fig. 10-1, and in the Venn diagram of Fig. 10.3-1, may exemplify one such non-FRAME subsystem. In the Venn diagram of Fig. 10.3-1, the overlapping areas between the two circles (depicting RUBE and SCRAM) represent interfaces and integrations that may exist between various RUBE and SCRAM devices, such as where a SCRAM heat source may mate with a RUBE heat sink, for example.
However, thermal sources and sinks may also occur within RUBE's sibling FRAME subsystems (i.e., SLAM [1], STEER [2], PERKS [4], and FORCE [5]), and these may be opportunistically utilized by RUBE [3].
In a preferred embodiment, RUBE [3] may exchange working fluids with its sibling FRAME subsystems only via the STEER [2] apparatus depicted in Fig. 10-1, rather than directly, and so may take advantage of any available thermal sources and sinks, including those external to the node itself, that are under the control of the SLAM apparatus [1] and accessible via the STEER [2] apparatus.
In a preferred embodiment, RUBE [3] may utilize a relatively low-temperature phase-change working fluid in conjunction with heat exchanger surfaces that may promote heterogeneous nucleation, so that it may more easily acquire and recuperate heat energy ("boiling energy") from hot spots and warm spots for immediate or subsequent reuse.
In a preferred embodiment, recuperated "boiling energy" heats and expands the working fluid, which, by natural convection or thermosiphon, or in conjunction with a vapor injection mechanism, may implement a type of thermal pump or thermocompressor that may deliver motive force that may circulate or help to circulate said working fluid. "Boiling energy" in this context refers to energy that may be used immediately (or stored for later use) and that may help effect a liquid/vapor phase-change, without approaching the working fluid's critical heat flux. In a preferred embodiment, said vapor injection mechanism may also implement a phase, temperature, and pressure conversion capability suitable for merging streams of working fluid that may differ in phase, temperature, and/or pressure. In a preferred embodiment, said stream-merging and conversion capability may be utilized by the STEER [2] apparatus depicted in Fig. 10-1.
In a preferred embodiment, recuperated energy heats and expands the working fluid (possibly involving a complete or partial phase-change, depending on the temperature and pressure), which, in conjunction with optional vapor injection and adequate subcooling, may create a motive force that may help to circulate the working fluid among system components. In a preferred embodiment, said circulated working fluid may help to thermally stabilize the system, to further extract re-usable energy for immediate reuse or storage, and/or to efficiently reject waste energy without overly subcooling the working fluid.
In a preferred embodiment, a relatively small, continuous, positively pressurized liquid flow may be maintained among selected subsystems or components, which may be ensured via one or more low-power pumps, in order to prevent dryout, eliminate local hot spots, and assure thermal stability - as an asset-protection mechanism that may serve to reduce or eliminate dependency on thermal expansion, nucleation and vapor injection as the only motive forces. In a preferred embodiment, said pump(s) may operate at reduced power levels or may be powered off completely when the required flow can be maintained without them.
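The power management of said low-power pump(s) may be sketched as follows. This is a minimal sketch under stated assumptions: the passive (thermally driven) flow and the required minimum flow are taken to be measurable in the same units, and pump output is assumed, purely for illustration, to scale linearly with the commanded duty. The function and parameter names are hypothetical.

```python
def pump_command(required_flow, passive_flow, max_pump_flow):
    """Return a pump duty in [0, 1] that tops up the passive circulation.

    A result of 0.0 means the pump may be powered off because the required
    flow is already maintained without it.
    """
    if max_pump_flow <= 0:
        raise ValueError("max_pump_flow must be positive")
    shortfall = required_flow - passive_flow
    if shortfall <= 0:
        return 0.0
    # Assumed linear pump characteristic; saturate at full duty.
    return min(1.0, shortfall / max_pump_flow)
```

An actual controller would likely add hysteresis and sensor validation so that the asset-protection flow is never interrupted by a transient reading.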
In a preferred embodiment, said low-power pumps may be of a high-reliability variable-voltage direct-current sealless magnetic type, most preferably with a spherical ceramic bearing or other means of minimizing friction and mechanical wear, thereby contributing to reliability and availability. In a preferred embodiment, said low-power pumps may be configured to be at least doubly or triply redundant, partly due to their role as an asset-protection mechanism.

10.3.1 RUBE - Heat Energy Recuperation Cycle Overview
An overview of the RUBE Heat Energy Recuperation Cycle is depicted in Fig. 10.3.1-1, Fig. 10.3.1-2, and Fig. 10.3.1-3 for three different preferred embodiments, labeled "v.1," "v.2," and "v.3" for convenience within the next three subsections.
In a preferred embodiment common to all three embodiments described, the RUBE Double-Boiler Inner Boiler [5] preferentially receives relatively cooler working fluid because it contains a combination of potentially temperature-sensitive and/or high-heat-flux electronic devices whose respective operating temperature ranges must be appropriately maintained in order to retain the optionally advantageous properties of the system. The RUBE Double-Boiler Inner Boiler [5] is part of the RUBE Double-Boiler described in 10.3.2, and is further described in 10.3.3.
In a preferred embodiment common to all three embodiments described, the RUBE Double-Boiler Outer Boiler Chamber & Reservoir [10] receives relatively warmer working fluid because it contains a combination of potentially temperature-insensitive and/or low-heat-flux electronic devices whose respective operating temperature ranges are sufficiently lax as to need no extraordinary attention in order to retain the optionally advantageous properties of the system. The RUBE Double-Boiler Outer Boiler Chamber & Reservoir [10] is part of the RUBE Double-Boiler described in 10.3.2.

10.3.1.1 RUBE - Heat Energy Recuperation Cycle Overview, v.1
In the following description of a preferred embodiment, labeled for convenience as "RUBE Heat Energy Recuperation Cycle v.1" and depicted in Fig. 10.3.1-1, any of the devices may each be replaced by a multiplicity of units with the same or altered characteristics, plumbed in series or parallel so as to modify flow or pressure as desired, or to effect an optimal use of a variety of heat or cold sources of differing characteristics. In a preferred embodiment, said devices may be part of and/or integral to the STEER apparatus [2] depicted in Fig. 10-1, and thus may be dynamically reconfigurable.
As depicted in Fig. 10.3.1-1, the flow control valves [2], [4], [6], [11], [12], and [14] may each be either a simple check valve (the minimum requirement), or an optional hybrid apparatus ("hybrid flow control valve") comprising a check valve combined with some type of tap valve, proportional valve, or other means of flow control that can be used to optimally balance the system, and which may be optionally dynamically controllable, for example via integration with the STEER apparatus [2] depicted in Fig. 10-1, or via other electronic and/or computer-controlled mechanisms.
In a preferred embodiment, said flow control valves may be deployed on aggregated flows as depicted above, or alternatively, on individual flows when a multiplicity of one or more of the devices are present. In a preferred embodiment, the optional hybrid flow control valves may be present only if they're under the control of a system monitoring and control function, in which case it may be presumed that some combination of temperature, pressure, and/or flow sensors may also be placed at appropriate points to provide feedback to the system monitoring and control function.
As depicted in Fig. 10.3.1-1, pump [1] pulls working fluid from the RUBE Double-Boiler outer boiler chamber & reservoir [10] and pushes it through flow control valve [2] into the liquid inlet [3] of the RUBE Vapor Injector, out the delivery outlet, and through another flow control valve [4] into the RUBE Double-Boiler inner boiler apparatus [5], which may add heat to the fluid and may cause it to partially or fully evaporate.
The heated working fluid, which may be any combination of liquid and vapor, then may exit the inner boiler apparatus [5] through either or both of two flow control valves [6] and/or [12], where it may proceed along either or both of the two downstream paths, following the path of least resistance (determined pseudo-statically by the configuration, or dynamically when hybrid flow control valves are present).
In an alternate preferred embodiment (not shown), the output of pump [1] may connect to a dynamically controllable splitter or diverter valve (not shown), one of whose outputs may connect as previously described to flow control valve [2], and the other of which may connect into an additional inlet (not shown) on the RUBE condenser apparatus [8], in order to mitigate the risk associated with the possibility that all working fluid circulating to the inner boiler apparatus [5] may be inadvertently leaked into the outer boiler chamber & reservoir [10], thereby bypassing the RUBE condenser apparatus [8]. In a variant of said alternative preferred embodiment, a check-valved separate pump (not shown) may comprise the bypass mechanism. In either of said alternative preferred embodiments, fluid may be cooled and/or condensed directly from the outer boiler chamber & reservoir [10], which may thereby allow the system to function (at a lower efficiency) and/or survive for a longer period.
From flow control valve [6] the working fluid and/or vapor may flow into the optional PERKS or RUBE load-shaver apparatus [7] if present, and then into the condenser apparatus [8], or else into the condenser apparatus [8] directly. In the RUBE condenser apparatus [8], any vapor present may condense to fluid and may return via pump [9] to the outer boiler chamber & reservoir [10], where it may be circulated and preheated in preparation for repeating the cycle through pump [1], and/or through the pump bypass path directly into flow control valve [11].
From flow control valve [12] the working fluid and/or vapor may flow into the vapor inlet [13] of the RUBE Vapor Injector where it may be mixed with liquid working fluid from liquid inlet [3]. When little or no vapor is present, the RUBE Vapor Injector may simply serve as a mixer, with no particular contribution to thermal efficiency, and this situation may occur when the system is operating at sufficiently low power levels that the working fluid is still below its boiling point after leaving the inner boiler apparatus [5] (this may be the normal startup scenario - the energy recuperation apparatus may be initialized to begin the flow of working fluid before power may be applied to the inner boiler apparatus [5]). Under high-power scenarios, and/or with sufficient pressure in the outer boiler chamber & reservoir [10], vapor may vent through flow control valve [14] into vapor inlet [13].
When sufficient vapor is presented at the vapor inlet [13] of the RUBE Vapor Injector, so as to activate its normal "vapor injection" operating mode (as described in section 10.3.4), a suction may be created at liquid inlet [3] and a positive pressure may be created at the delivery outlet, which may cause working fluid to flow from flow control valves [2] and/or [11] according to the path of least resistance.
Depending on the actual working temperatures and pressures within the system, including the settings of the flow control valves [6], [12], and/or [14], it may be possible to power-down pump [1] while retaining the full operation of the system, thereby adding to the overall efficiency of the system. In the case of a powered-down pump [1], it may also be possible to close flow control valve [2] (i.e., if it is an optional hybrid as described earlier), thereby preventing flow-induced wear on pump [1] (even though such wear may be minimal).
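The connectivity of the v.1 cycle described above may be sketched as a directed graph, which allows a simple check that a feed path to the inner boiler [5] remains when pump [1] is powered down and flow control valve [2] is closed. This is a topological sketch only and says nothing about whether pressures and temperatures would actually sustain the flow. The node names are hypothetical, and the edge from flow control valve [11] to liquid inlet [3] is an assumption inferred from the text (the figure itself is not reproduced here).

```python
from collections import deque

# Directed edges transcribed from the v.1 description; numbers match the figure labels.
V1_EDGES = {
    "outer_boiler_10": ["pump_1", "valve_11", "valve_14"],
    "pump_1": ["valve_2"],
    "valve_2": ["liquid_inlet_3"],
    "valve_11": ["liquid_inlet_3"],   # assumption: pump bypass joins at the liquid inlet
    "valve_14": ["vapor_inlet_13"],
    "liquid_inlet_3": ["injector"],
    "vapor_inlet_13": ["injector"],
    "injector": ["valve_4"],
    "valve_4": ["inner_boiler_5"],
    "inner_boiler_5": ["valve_6", "valve_12"],
    "valve_6": ["load_shaver_7", "condenser_8"],
    "load_shaver_7": ["condenser_8"],
    "valve_12": ["vapor_inlet_13"],
    "condenser_8": ["pump_9"],
    "pump_9": ["outer_boiler_10"],
}


def reachable(edges, start, goal, blocked=()):
    """Breadth-first search from start to goal, skipping blocked nodes."""
    blocked = set(blocked)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in edges.get(node, []):
            if nxt not in seen and nxt not in blocked:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

Calling `reachable(V1_EDGES, "outer_boiler_10", "inner_boiler_5", blocked={"pump_1", "valve_2"})` confirms that, in this model, the bypass and vapor paths still connect the reservoir to the inner boiler.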

10.3.1.2 RUBE - Heat Energy Recuperation Cycle Overview, v.2
In the following description of the embodiment depicted in Fig. 10.3.1-2, the operation is nearly identical to that of Fig. 10.3.1-1, except for an altered path between the outer and inner boilers (from the outer boiler to inner boiler) of the RUBE Double-Boiler apparatus, depicted as [10] and [5], respectively, and the inclusion of an optional recuperative vapor path from said outer boiler [10] to the optional PERKS or RUBE Load Shaver [7] if present, or to the RUBE Condenser [8] otherwise.
In the following description of a preferred embodiment, labeled for convenience as "RUBE Heat Energy Recuperation Cycle v.2" and depicted in Fig. 10.3.1-2, any of the devices may each be replaced by a multiplicity of units with the same or altered characteristics, plumbed in series or parallel so as to modify flow or pressure as desired, or to effect an optimal use of a variety of heat or cold sources of differing characteristics. In a preferred embodiment, said devices may be part of and/or integral to the STEER apparatus [2] depicted in Fig. 10-1, and thus may be dynamically reconfigurable.
As depicted in Fig. 10.3.1-2, the flow control valves [2], [4], [6], [11], [12], [14], and [15] may each be either a simple check valve (the minimum requirement), or an optional hybrid apparatus ("hybrid flow control valve") comprising a check valve combined with some type of tap valve, proportional valve, or other means of flow control that can be used to optimally balance the system, and which may be optionally dynamically controllable, for example via integration with the STEER apparatus [2] depicted in Fig. 10-1, or via other electronic and/or computer-controlled mechanisms.
In a preferred embodiment, said flow control valves may be deployed on aggregated flows as depicted above, or alternatively, on individual flows when a multiplicity of one or more of the devices are present. In a preferred embodiment, the optional hybrid flow control valves may be present only if they're under the control of a system monitoring and control function, in which case it may be presumed that some combination of temperature, pressure, and/or flow sensors may also be placed at appropriate points to provide feedback to the system monitoring and control function.
As depicted in Fig. 10.3.1-2, two paths feed the RUBE Double-Boiler Inner Boiler [5]. In a first path, pump [1] may pull working fluid from the RUBE Double-Boiler Outer Boiler Chamber & Reservoir [10] and push it through flow control valve [11] into the RUBE Double-Boiler inner boiler apparatus [5]. In a second path, a combination of pressure from RUBE Double-Boiler Outer Boiler Chamber & Reservoir [10] and suction induced by RUBE Vapor Injector at liquid inlet [3] may pull working fluid from the RUBE Double-Boiler Outer Boiler Chamber & Reservoir [10] through flow control valve [2] into the liquid inlet [3] of the RUBE Vapor Injector, out the delivery outlet, and through a flow control valve [4] into the RUBE Double-Boiler inner boiler apparatus [5].
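The two liquid feed paths just described may be enumerated from a small directed graph, sketched below. Only the liquid feed edges named in the preceding paragraph are modeled (the vapor paths are deliberately omitted), and the node names are hypothetical labels keyed to the figure's reference numerals.

```python
# Liquid feed edges of the v.2 cycle, as described in the text above.
V2_LIQUID_FEED = {
    "outer_boiler_10": ["pump_1", "valve_2"],
    "pump_1": ["valve_11"],
    "valve_11": ["inner_boiler_5"],
    "valve_2": ["liquid_inlet_3"],
    "liquid_inlet_3": ["injector"],
    "injector": ["valve_4"],
    "valve_4": ["inner_boiler_5"],
}


def simple_paths(edges, start, goal, path=None):
    """Return every path from start to goal that visits no node twice."""
    path = (path or []) + [start]
    if start == goal:
        return [path]
    found = []
    for nxt in edges.get(start, []):
        if nxt not in path:
            found += simple_paths(edges, nxt, goal, path)
    return found
```

Enumerating the paths from `"outer_boiler_10"` to `"inner_boiler_5"` yields the pumped path through flow control valve [11] and the injector path through flow control valves [2] and [4].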
The RUBE Double-Boiler inner boiler apparatus [5] may add heat to the working fluid and thereby cause it to partially or fully evaporate. The heated working fluid, which may be any combination of liquid and vapor, then exits the inner boiler apparatus [5] through either or both of two flow control valves [6] and/or [12], where it may proceed along either or both of the two downstream paths, following the path of least resistance (determined pseudo-statically by the configuration, or dynamically when hybrid flow control valves are present).
From flow control valve [6] the working fluid and/or vapor may flow into the optional PERKS or RUBE load-shaver apparatus [7] if present, and then into the condenser apparatus [8], or else into the condenser apparatus [8] directly. In the RUBE condenser apparatus [8], any vapor present can condense to fluid and return via pump [9] to the outer boiler chamber & reservoir [10], where it can provide cool fluid to pump [1] and/or RUBE Vapor Injector [3] (via flow control valve [2]), and also be circulated, preheated, and/or vaporized, in which case it can exit through flow control valve [14] into the vapor inlet [13] of the RUBE Vapor Injector [3].
From flow control valve [12] the working fluid and/or vapor may flow into the vapor inlet [13] of the RUBE Vapor Injector where it may be mixed with liquid working fluid from liquid inlet [3]. When little or no vapor is present, the RUBE Vapor Injector may simply serve as a mixer, with no particular contribution to thermal efficiency, and this situation may occur when the system is operating at sufficiently low power levels that the working fluid may still be below its boiling point after leaving the inner boiler apparatus [5] (this may be the normal startup scenario - the energy recuperation apparatus may be initialized to begin the flow of working fluid before power may be applied to the inner boiler apparatus [5]).
In a preferred embodiment, under high-power scenarios, and/or with sufficient pressure in the outer boiler chamber & reservoir [10], vapor may vent through flow control valve [14] into vapor inlet [13].
When sufficient vapor is presented at the vapor inlet [13] of the RUBE Vapor Injector, so as to activate its normal "vapor injection" operating mode (as described in section 10.3.4), a suction may be created at liquid inlet [3] and a positive pressure may be created at the delivery outlet, which may cause working fluid to flow from flow control valve [2] into the RUBE Double-Boiler inner boiler apparatus [5] via flow control valve [4].
Depending on the actual working temperatures and pressures within the system, including the settings of the flow control valves [6], [12], and/or [14], it may be possible to power-down pump [1] while retaining the full operation of the system, thereby adding to the overall efficiency of the system. In the case of a powered-down pump [1], it may also be possible to close flow control valve [11] (i.e., if it is an optional hybrid as described earlier), thereby preventing flow-induced wear on pump [1] (even though such wear may be minimal).

10.3.1.3 RUBE - Heat Energy Recuperation Cycle Overview, v.3
In the following description of the embodiment depicted in Fig. 10.3.1-3, the operation is nearly identical to that of Fig. 10.3.1-2, except for the addition of a new path between the outer and inner boilers (from the inner boiler to outer boiler) of the RUBE Double-Boiler apparatus, depicted as [5] and [10], respectively, and the elimination of both the flow control valve at [12] and the path from [12] to RUBE Vapor Injector [13], leaving an altered feed path to the RUBE Vapor Injector, depicted as the path from [14] to [13].
In the following description of a preferred embodiment, labeled for convenience as "RUBE Heat Energy Recuperation Cycle v.3" and depicted in Fig. 10.3.1-3, any of the devices may each be replaced by a multiplicity of units with the same or altered characteristics, plumbed in series or parallel so as to modify flow or pressure as desired, or to effect an optimal use of a variety of heat or cold sources of differing characteristics. In a preferred embodiment, said devices may be part of and/or integral to the STEER apparatus [2] depicted in Fig. 10-1, and thus may be dynamically reconfigurable.
As depicted in Fig. 10.3.1-3, the flow control valves [2], [4], [6], [11], [14], and [15] may each be either a simple check valve (the minimum requirement), or an optional hybrid apparatus ("hybrid flow control valve") comprising a check valve combined with some type of tap valve, proportional valve, or other means of flow control that can be used to optimally balance the system, and which may be optionally dynamically controllable, for example via integration with the STEER apparatus [2] depicted in Fig. 10-1, or via other electronic and/or computer-controlled mechanisms.
In a preferred embodiment, said flow control valves may be deployed on aggregated flows as depicted above, or alternatively, on individual flows when a multiplicity of one or more of the devices are present. In a preferred embodiment, the optional hybrid flow control valves may be present only if they're under the control of a system monitoring and control function, in which case it may be presumed that some combination of temperature, pressure, and/or flow sensors may also be placed at appropriate points to provide feedback to the system monitoring and control function.
As depicted in Fig. 10.3.1-3, two paths feed the RUBE Double-Boiler Inner Boiler [5]. In a first path, pump [1] may pull working fluid from the RUBE Double-Boiler Outer Boiler Chamber & Reservoir [10] and push it through flow control valve [11] into the RUBE Double-Boiler inner boiler apparatus [5]. In a second path, a combination of pressure from the RUBE Double-Boiler Outer Boiler Chamber & Reservoir [10] and suction induced by the RUBE Vapor Injector at liquid inlet [3] may pull working fluid from the RUBE Double-Boiler Outer Boiler Chamber & Reservoir [10] through flow control valve [2] into the liquid inlet [3] of the RUBE Vapor Injector, out the delivery outlet, and through a flow control valve [4] into the RUBE Double-Boiler inner boiler apparatus [5].
The RUBE Double-Boiler inner boiler apparatus [5] may add heat to the working fluid and thereby cause it to partially or fully evaporate. The heated working fluid, which may be any combination of liquid and vapor, then exits the inner boiler apparatus [5] through either or both of flow control valve [6] and the egress path at point [12], where it may proceed along either or both of the two downstream paths, following the path of least resistance (determined pseudo-statically by the configuration, or dynamically when hybrid flow control valves are present).
From flow control valve [6] the working fluid and/or vapor may flow into the optional PERKS or RUBE load-shaver apparatus [7] if present, and then into the condenser apparatus [8], or else into the condenser apparatus [8] directly. In the RUBE condenser apparatus [8], any vapor present can condense to fluid and return via pump [9] to the outer boiler chamber & reservoir [10], where it can provide cool fluid to pump [1] and/or RUBE Vapor Injector [3] (via flow control valve [2]), and also be circulated, preheated, and/or vaporized, in which case it can exit through flow control valve [14] into the vapor inlet [13] of the RUBE Vapor Injector.
From the RUBE Double-Boiler inner boiler apparatus [5] the working fluid may exit under pressure at point [12] and flow directly into the main reservoir of the outer boiler apparatus [10]. The omission of a flow control valve at point [12] is an intentional departure from the preferred embodiment described in the previous section (10.3.1.2). Keeping in mind that the inner boiler apparatus [5] is actually contained within the outer boiler apparatus [10], the omission of a flow control valve at point [12] (which may actually represent a multiplicity of control valves) enables the path from the inner boiler apparatus [5] to the outer boiler apparatus [10] to simply be a pattern of convenient egress points (e.g., "holes") in the inner boiler apparatus [5].
In a preferred embodiment, some or all of the egress points from the inner boiler apparatus [5] can be distributed over the interfacing surfaces of the inner boiler apparatus [5] so as to evenly (or otherwise) distribute the escaping fluid into the outer boiler apparatus [10], thereby improving turbulence and enhancing the mixing of fluid temperatures with the outer boiler apparatus [10], while also reducing or smoothing the thermal gradients that may otherwise be present within the outer boiler apparatus [10].
In a preferred embodiment, some or all of said egress points, particularly those most likely to handle relatively hotter working fluid (and thus relatively higher vapor content) from the inner boiler apparatus [5], can be preferentially distributed over the interfacing surfaces of the inner boiler apparatus [5] so as to distribute the fluid with relatively higher vapor content into specific areas of the outer boiler apparatus [10].
In a preferred embodiment, said specific areas are those where the escaping fluid is less likely to encounter relatively cooler fluid (which would have a cooling and/or condensing effect), thereby improving the rate at which working fluid is converted to vapor, which consequently increases the pressure within the outer boiler apparatus [10], which may beneficially raise the boiling point of the working fluid and/or increase the motive force available at the vapor inlet [13] of the RUBE Vapor Injector (the latter may occur only if the fluid is allowed to vent through flow control valve [14] into vapor inlet [13]).
In a preferred embodiment, under high-power scenarios, and/or with sufficient pressure in the outer boiler chamber & reservoir [10], vapor may vent through flow control valve [14] into vapor inlet [13].

When sufficient vapor is presented at the vapor inlet [13] of the RUBE Vapor Injector, so as to activate its normal "vapor injection" operating mode (as described in section 10.3.4), a suction may be created at liquid inlet [3] and a positive pressure may be created at the delivery outlet, which may cause working fluid to flow from flow control valve [2] into the RUBE Double-Boiler inner boiler apparatus [5] via flow control valve [4].
Depending on the actual working temperatures and pressures within the system, including the settings of the flow control valves [6], [12], and/or [14], it may be possible to power-down pump [1] while retaining the full operation of the system, thereby adding to the overall efficiency of the system. In the case of a powered-down pump [1], it may also be possible to close flow control valve [11] (i.e., if it is an optional hybrid as described earlier), thereby preventing flow-induced wear on pump [1] (even though such wear may be minimal).
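By way of illustration only, the v.3 flow topology described above may be sketched as a small directed graph. The node labels, the "liquid"/"vapor"/"mixed" edge tags, and the helper function are hypothetical names chosen for this sketch and do not appear in the figures; flow control valve [15], whose position is not stated in the text, is omitted. Enumerating loop-free liquid paths from the outer boiler [10] to the inner boiler [5] recovers the two feed paths named in the description, and enumerating paths in the other direction recovers the return through the condenser [8] and the direct egress at [12].

```python
from typing import Dict, List, Set, Tuple

# Directed edges of the v.3 loop as read from the description.
# Each edge is tagged with the phase it mainly carries (an assumption of this sketch).
EDGES: Dict[str, List[Tuple[str, str]]] = {
    "outer_boiler_10": [("pump_1", "liquid"), ("valve_2", "liquid"), ("valve_14", "vapor")],
    "pump_1": [("valve_11", "liquid")],
    "valve_11": [("inner_boiler_5", "liquid")],
    "valve_2": [("injector_liquid_inlet_3", "liquid")],
    "valve_14": [("injector_vapor_inlet_13", "vapor")],
    "injector_liquid_inlet_3": [("injector_delivery_outlet", "liquid")],
    "injector_vapor_inlet_13": [("injector_delivery_outlet", "vapor")],
    "injector_delivery_outlet": [("valve_4", "liquid")],
    "valve_4": [("inner_boiler_5", "liquid")],
    "inner_boiler_5": [("valve_6", "mixed"), ("egress_12", "mixed")],
    "valve_6": [("load_shaver_7", "mixed"), ("condenser_8", "mixed")],
    "load_shaver_7": [("condenser_8", "mixed")],
    "condenser_8": [("pump_9", "liquid")],
    "pump_9": [("outer_boiler_10", "liquid")],
    "egress_12": [("outer_boiler_10", "mixed")],
}


def simple_paths(start: str, goal: str, allowed: Set[str]) -> List[List[str]]:
    """Return every loop-free path from start to goal using only edges with an allowed phase tag."""
    found: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        if node == goal:
            found.append(path)
            return
        for nxt, phase in EDGES.get(node, []):
            if phase in allowed and nxt not in path:
                walk(nxt, path + [nxt])

    walk(start, [start])
    return found


feed_paths = simple_paths("outer_boiler_10", "inner_boiler_5", {"liquid"})
return_paths = simple_paths("inner_boiler_5", "outer_boiler_10", {"liquid", "mixed"})
for p in feed_paths + return_paths:
    print(" -> ".join(p))
```

Removing the "pump_1" entry from the graph leaves only the injector-driven feed path, which corresponds to the powered-down pump case discussed above.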

10.3.2 RUBE - Double Boiler
The RUBE Double-Boiler apparatus is part of a closed-loop system that, in a preferred embodiment, is connected to other components as depicted in Fig. 10.3.1-1, Fig. 10.3.1-2, and Fig. 10.3.1-3, corresponding to three different preferred embodiments of the RUBE Heat Energy Recuperation Cycle described in section 10.3.1.
Each RUBE Double-Boiler apparatus comprises one or more "inner boiler" units and an "outer boiler," such that the former are fully enclosed within the latter, in order to maximize the recuperation of heat energy (thermal energy) dissipated by the aggregation of enclosed heat sources, and optionally, to separate the recuperated heat energy into two or more "grades" according to desired or observed temperatures.
In a preferred embodiment common to all three embodiments described in section 10.3.1, the RUBE Double-Boiler Inner Boiler apparatus preferentially receives relatively cooler working fluid, while the RUBE Double-Boiler Outer Boiler Chamber & Reservoir receives relatively warmer working fluid.
In a preferred embodiment, potentially temperature-sensitive and/or high-heat-flux electronic devices whose respective operating temperature ranges must be appropriately maintained in order to retain the optionally advantageous properties of the system are preferentially placed within an inner boiler (or at least have their "hot" surfaces within an inner boiler). Consequently, potentially temperature-insensitive and/or low-heat-flux electronic devices (i.e., those devices whose respective operating temperature ranges are sufficiently lax as to need no extraordinary attention in order to retain the optionally advantageous properties of the system) are placed within the enclosing outer boiler.
In an alternative preferred embodiment, without special concern for temperature sensitivity (such as when the components under consideration are not particularly temperature-sensitive), the relatively "hot" heat sources (e.g., those components with a relatively higher heat flux, such as CPUs and point-of-load power regulator components) are preferentially placed within the inner boiler (or at least have their "hot" surfaces within an inner boiler), and the "warm" heat sources (i.e., those components with a relatively lower heat flux, such as DRAM and flash memory chips) are placed within the enclosing outer boiler.
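The placement rule of the two preceding paragraphs may be sketched as a simple classification. This is a minimal sketch only: the `Device` type, the heat-flux figures, and the 10 W/cm2 cut-off between "hot" and "warm" heat sources are hypothetical values chosen for illustration and are not taken from this description.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    temperature_sensitive: bool
    heat_flux_w_per_cm2: float  # illustrative figure, not from the description


# Hypothetical cut-off separating "hot" from "warm" heat sources.
HIGH_HEAT_FLUX_W_PER_CM2 = 10.0


def place(device: Device) -> str:
    """Inner boiler for temperature-sensitive and/or high-heat-flux devices, outer boiler otherwise."""
    if device.temperature_sensitive or device.heat_flux_w_per_cm2 >= HIGH_HEAT_FLUX_W_PER_CM2:
        return "inner boiler"
    return "outer boiler"


devices = [
    Device("CPU", True, 50.0),
    Device("point-of-load regulator", False, 20.0),
    Device("DRAM", False, 1.0),
    Device("flash memory", False, 0.5),
]
for d in devices:
    print(f"{d.name}: {place(d)}")
```

With these illustrative figures the CPU and the point-of-load regulator land in the inner boiler and the memory devices in the outer boiler, matching the examples given in the text.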
Both inner and outer boilers may be pressure vessels intended to withstand a maximum of 7-bar operating pressures (100 PSI) under normal conditions, plus a margin of safety.
Unintentional leaks within an inner boiler cause only a reduction in efficiency, but leaks in the outer boiler can cause a loss of working fluid and a subsequent reduction in local survivability. In a preferred embodiment the normal operating pressure for both the inner and outer boilers does not exceed 2 bar, and may be substantially less.
In a preferred embodiment, it is optionally advantageous for the RUBE Double-Boiler Outer Boiler Chamber & Reservoir to have a somewhat vertical orientation with a distinct "top" having a vapor dome to simplify the collection of vapor evolving from the working fluid.
In a preferred embodiment, such as for electronics thermal stabilization applications, the working fluid may be an organic dielectric fluid with a boiling point between 20 C and 40 C, such as 1-methoxy-heptafluoropropane (C3F7OCH3). Other working fluids may also be suitable, some examples of which are listed in section 10.3. In a preferred embodiment, the working fluid expands substantially when heated and vaporizes easily.
In a preferred embodiment, the RUBE Double-Boiler apparatus has an outer shell of cast aluminum (although other construction methods and materials are possible), and its external shape and form factor are such that it can mate with guide channels extruded into a vertically oriented cylindrical or partly cylindrical aluminum extrusion designed to contain a multiplicity of RUBE Double-Boiler units.

Given the aforementioned vertically oriented extrusion, in a preferred embodiment, the intent is to be able to easily align and slide the RUBE Double-Boiler apparatus from the extrusion upper opening, downward into the extrusion until it reaches a bulkhead, where couplings and connectors on the bottom of the Double-Boiler apparatus mate with complementary couplings and connectors within the extrusion. In an alternate embodiment, the RUBE Double-Boiler apparatus aligns and slides downward from the extrusion upper opening, into the extrusion until it reaches a mechanical stop; at that point a lever or cam means accessible from the top can be exercised such that it pulls the RUBE Double-Boiler apparatus toward a nearby vertical interior wall, such that couplings and connectors on the side of the RUBE Double-Boiler apparatus mate with complementary couplings and connectors on the extrusion's interior wall.
In a preferred embodiment, the RUBE Double-Boiler apparatus is a pressure-sealed, field-replaceable unit having blind-mating, quick-disconnect inlet and outlet couplings with double EPDM seals, and capable of operating at 100 PSI, such as those available from Colder (the extrusion would contain mating couplings).
In a preferred embodiment, the RUBE Double-Boiler apparatus is also electrically sealed and EMP-hardened, having blind-mating, quick-disconnect electrical connectors with a multiplicity of conductors appropriate for the ingress and egress of electrical power feeds and various high-frequency signals such as are common in computer and telecommunications devices.
In a preferred embodiment, the RUBE Double-Boiler apparatus connects to a "bottom plane," "mid-plane," or "backplane," or equivalent connector arrangement in the vertical extrusion by means of a proprietary, pin-free connector designed by Morgan Johnson, and having the property of providing an extremely high quality, nearly noise-free connection. In an alternate embodiment, the same connector arrangement is used, but is placed on the side (or back) of the RUBE Double-Boiler apparatus, rather than on the bottom.
See also: RUBE, RUBE Inner Boiler.

Fig. 10.3.2-1 is intended to further clarify the relationship of the inner and outer boilers. It uses the same numbering as the previous figure. Although a preferred embodiment may include multiple inner boilers, only one is depicted here, for clarity.

In a preferred embodiment of the RUBE Double Boiler, an example of which is depicted in Fig. 10.3.2-2, the outer boiler [1] shown on the left is a pressure vessel containing a dielectric working fluid and the electronics module shown on the right. In this simple example, the memory modules [2] are representative of electronics that are immersed in the working fluid of the outer boiler. Notice that they appear on both sides of the assembly. However, they're not on opposite sides of the same PCB, but rather on two different PCBs.
The reverse side of each PCB contains "hot" chips like CPUs, etc., and these are placed back-to-back with a pair of injection-molded manifolds [3] between them. Heat exchangers affixed to the "hot" chips are immersed in the path of turbulent working fluid moving through the manifold assembly or "inner boiler." As a special example, the devices at [4] represent two-sided modules that are "extra hot" and participate in the inner boiler just like the other hot chips. However, one surface of the module faces the inner boiler, and the other faces the outer boiler, so cooling (or heating, if you have the boiler's point of view) can occur from both sides at the same time.

10.3.3 RUBE - Inner Boiler
In a preferred embodiment, the primary objective of the RUBE Inner Boiler apparatus is to ensure that the maximum case temperatures (Tcase) of high-heat-flux heat-producing devices do not exceed particular thresholds, in order to ensure that the devices do not produce more heat than their individual or collective desired target thresholds. The importance of this cannot be overstated, because the ability to stay below said thresholds enables drastic reductions in power consumption for an entire class of integrated circuit devices. The idea here is that avoiding energy waste preemptively is a great improvement over recuperating a portion of the energy that would otherwise be wasted.
In a preferred embodiment, the secondary objective of the RUBE Inner Boiler apparatus is to recapture thermal energy dissipated by the electronic devices contained within, so that said energy may be put to good use rather than wasting it (e.g., by rejecting it to the environment as heat).

Although other seal materials are possible, EPDM is preferred for its compatibility with the preferred working fluid.

Fig. 10.3.3-1 depicts the basic mechanical concept of the inner boiler, namely, that there is an assembly or other mechanism capable of containing and isolating a set of selected heat-producing chips from those not selected, while preferentially cooling said chips by efficiently circulating coolant to them, capturing the heat energy they emit, and passing it downstream for reuse.
In a preferred embodiment, the RUBE Double-Boiler's inner boiler apparatus comprises one or more check-valved manifold assemblies (6 are shown above), each with any number of heat-exchanger seals 0, baseplate heat exchangers 0 (an example is shown, but many commercial off-the-shelf units are suitable, including those with non-rectangular shapes), heat-producing devices (not shown, but typically electronics devices on a printed circuit board, or PCB), and backing/pressure plates to aid in providing clamping force (such as would be placed on the reverse side of a PCB, but also not shown).
The check-valved manifold assemblies further comprise a liquid inlet 0, vapor and liquid outlet 0, two-piece injection-molded manifold chamber 0 and 0 with one or more seals between them, an inlet and outlet check-valve pair (either as individual components, or individually constructed as an injection-molded channel with an attached check-valve means such as a ball-and-spring or flapper as depicted by 0) for each baseplate heat-exchanger 0, and suitable molded-in working fluid flow guides and channels within the two-piece injection-molded manifold chamber 0 and 0.

In a preferred embodiment, the RUBE Inner Boiler accommodates the cooling of devices that may, for the purposes of this discussion, be conveniently categorized as either "temperature-sensitive" or "non-temperature-sensitive." In this context, the former refers to devices whose operating characteristics (e.g., power dissipation) may vary considerably over their specified allowable temperature ranges, whereas the operating characteristics of the latter do not. The two categories of devices typically coexist and may be colocated, thus necessitating a strategy for dealing with both their respective needs within the same RUBE Inner Boiler (nonetheless, there may be multiple instances of the RUBE Inner Boiler, and they may operate independently with differing devices and/or at differing temperatures).
In a preferred embodiment, the working fluid flows within a RUBE Inner Boiler are configured such that the temperature-sensitive devices receive priority servicing, in order to specifically drive such devices toward the desired operating characteristics and/or thresholds (i.e., by forcing such devices to operate within a specific sub-range of the otherwise allowable range). As a second priority, the non-temperature-sensitive devices are then serviced (i.e., only after the needs of the temperature-sensitive devices have been met). At all times, both categories of devices must be kept within their respective operating temperature ranges, and preferably, well below the upper end of their individual ranges.
The types of temperature-sensitive devices for which the RUBE Inner Boiler is well-suited tend to be commercially available in multiple speed and temperature grades, such as integrated circuits (e.g., a CPU or "processor" having temperature-variable power dissipation, where increased operating temperature results in increased power dissipation). In general, higher speed devices produce more heat than their otherwise equivalent counterparts, and cost more than their lower speed counterparts.
Furthermore, devices that consume less power (and produce less heat) cost more than their otherwise equivalent (and in particular, speed-equivalent) counterparts. Thus, high-speed, low-power components tend to cost the most. Also, the devices with the highest speeds are often unavailable as low-power devices, and certainly not in the lowest power grades (by definition). Finally, the fastest devices often have variable power dissipation, as depicted in Table 10.3.3-1 for three different processors, a commodity-priced device ("A") intended for consumer-class PCs and two much more expensive premium devices ("B" and "C") intended for server-class computers, where "B" is a standard temperature device, and "C" is a low-power (and higher-priced) "premium" device.

In Table 10.3.3-1, which depicts three different temperature-sensitive processors, the desired power dissipation targets (say, under 30 watts) are highlighted with shading and bold boxes. The power dissipations just outside this range (say, under 40 watts) are also shaded (but without bold boxes), with other values not shaded. All power dissipation values shown are rounded to the nearest integer.

Table 10.3.3-1. Specified Power Dissipation of Three Different Processors at Various Case Temperatures

Tcase Max (°C)   Tcase Max (°F)   Processor "A"   Processor "B"   Processor "C"
                                  Power (W)       Power (W)       Power (W)
49               120.2            22              28              13
51               123.8            28              36              17
53               127.4            34              44              20
57               134.6            47              60              28
59               138.2            53              68              32
61               141.8            59              76              36
63               145.4            66              84              40
67               152.6            78              95              47
69               156.2            84              N/A             51
71               159.8            89              N/A             55

Although the allowable upper limit of the temperature range is specified to be at least 67 C for all three of this example's target processors (Tcase Max is 71 C for processors "A" and "C", and 67 C for processor "B"), the temperature sub-ranges required to keep the processor operating at a power dissipation target of 30 watts or less would be defined more strictly (e.g., Tcase Max must be no more than about 51 C for processors "A" and "C", and 49 C for processor "B"). Thus, in this example, flows within the RUBE Inner Boiler would be prioritized to ensure that the more restrictive temperature subrange is achieved, by way of directing the least-heated working fluid to the higher-priority temperature-sensitive devices first (e.g., one or more processors as described in this example), and to the lowest-priority, non-temperature-sensitive devices last. If there are multiple, significantly differing temperature ranges among the devices, those with the highest maximum ranges are placed last within their category (e.g., a non-temperature-sensitive device with an upper operating limit of 70 C would receive working fluid before another non-temperature-sensitive device with an upper operating limit of 100 C), so that the working fluid can absorb maximal heat energy (without endangering the corresponding devices) before exiting the RUBE Inner Boiler.
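The servicing order described above may be sketched as a two-level sort: temperature-sensitive devices first, and within each category the device with the lowest upper operating limit first. This is a minimal sketch under stated assumptions; the `Device` type and the example device names are hypothetical, and only the 67 C, 70 C, 71 C, and 100 C limits are taken from the text.

```python
from typing import List, NamedTuple


class Device(NamedTuple):
    name: str
    temperature_sensitive: bool
    upper_limit_c: float  # upper operating limit in degrees C


def service_order(devices: List[Device]) -> List[Device]:
    """Order in which the least-heated working fluid should reach the devices."""
    # False sorts before True, so temperature-sensitive devices come first;
    # ties within a category are broken by the lower upper operating limit.
    return sorted(devices, key=lambda d: (not d.temperature_sensitive, d.upper_limit_c))


devices = [
    Device("non-sensitive device, 100 C limit", False, 100.0),
    Device('processor "A"', True, 71.0),
    Device("non-sensitive device, 70 C limit", False, 70.0),
    Device('processor "B"', True, 67.0),
]
for d in service_order(devices):
    print(d.name)
```

The working fluid therefore reaches the 100 C device last, after it has already absorbed heat from every device with a tighter limit.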
Table 10.3.3-1 illustrates that a power dissipation target of, for example, under 30 watts can only be achieved by holding the processor's maximum case temperature (Tcase) value to about 51 C for processors "A" and "C", and 49 C for processor "B". This is precisely the primary goal of the RUBE Inner Boiler, which in a preferred embodiment, uses a phase-change working fluid that has a normal boiling point of 34 C. Under increased operating pressure the boiling point can go up somewhat while remaining well under a Tcase of 49 C, and we can use this fact to great advantage when rejecting heat to a warm ambient environment or heat sink. The reverse is also true; the boiling point can be reduced if the internal operating pressure is reduced, and we use this fact to increase the temperature delta between the boiling point and the target device temperature.
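As a worked check of the thresholds discussed above, the following sketch looks up, for each processor, the highest tabulated Tcase Max at which the specified power stays under a target. The values are transcribed from Table 10.3.3-1 (the 49 C entry for processor "C" is read here as 13 W, which is an assumption about a garbled cell); the function and variable names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Tcase Max (deg C) -> specified power (W) for processors "A", "B", "C".
# None marks an N/A entry in the table.
POWER_W: Dict[int, Tuple[Optional[int], Optional[int], Optional[int]]] = {
    49: (22, 28, 13),
    51: (28, 36, 17),
    53: (34, 44, 20),
    57: (47, 60, 28),
    59: (53, 68, 32),
    61: (59, 76, 36),
    63: (66, 84, 40),
    67: (78, 95, 47),
    69: (84, None, 51),
    71: (89, None, 55),
}
COLUMN = {"A": 0, "B": 1, "C": 2}


def max_tcase_under(processor: str, target_w: float) -> Optional[int]:
    """Highest tabulated Tcase Max at which the processor's power is below target_w."""
    col = COLUMN[processor]
    ok = [t for t, row in POWER_W.items() if row[col] is not None and row[col] < target_w]
    return max(ok) if ok else None


for name in "ABC":
    print(name, max_tcase_under(name, 30))
```

With these values the lookup gives 51 C for processor "A" and 49 C for processor "B". For processor "C" the tabulated power stays under 30 watts up to the 57 C row, so the 51 C figure quoted in the text is conservative for that device.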
Furthermore, it is clear that the RUBE Inner Boiler enables any of the processors depicted to be selected (and still meet the example's "under-30-watt" goal), which means that lower cost devices can be used without penalty. And finally, while not depicted specifically in this table (which is speed agnostic), it means that the highest speed device can be used, if desired, which - in this example - is only available in a commodity part (rather than as a premium-priced, lower-power part).

For example, heat may be rejected into the return loop of a chilled water system, which contains water that has already been heated, and thus will be further heated by the SHADOWS FRAME/RUBE system, which improves the efficiency of the chilled water system's external cooling devices.
The RUBE Inner Boiler is also a means for recuperating the heat energy dissipated by the relatively high-heat-flux heat-producing devices so that, to the extent practical, it can be converted downstream to usable mechanical and/or electrical energy. In this case, the ability to reject the heat energy to ambient is not a RUBE consideration at all, since RUBE can serve to preheat working fluid for the downstream power production system, thereby increasing overall system efficiency.
Basic Fluid Flow Concept
The basic fluid flow concept is depicted in Fig. 10.3.3-2. The inner boiler apparatus is colocated with the "hot" surfaces (the surfaces with the largest heat flux) of the most temperature-sensitive "hot" devices and "hottest" of the heat-producing devices, which are so arranged that such placement is possible with a minimum pressure drop, minimum (or otherwise convenient) number of manifolds, or other possibly constraining criteria. In this basic scenario, devices may be differentiated as to whether or not they are within the inner boiler apparatus, or outside of it. Devices within the inner boiler apparatus may be relatively undifferentiated from each other with respect to how the basic flow accounts for any possible temperature sensitivities they may have (i.e., whether one device is more temperature sensitive than another).
In a preferred embodiment, once normal steady-state operation is reached, working fluid vapor may be expelled through outlet 0 and little or no liquid is present. Liquid working fluid is forced into liquid inlet 0, where it is equitably distributed within the injection-molded manifold chamber 0 and 0 to each heat exchanger's 0 inlet check valve 0, which it can then enter, since the working fluid is under pressure.
For each heat exchanger 0, once the working fluid passes the corresponding inlet check valve, it enters the heat exchanger 0, where it circulates among the heat exchanger's fins, pins, or other heat exchange surfaces. Depending on the then-current temperature and pressure, the working fluid can acquire heat energy, causing all or part of it to evaporate.
In a preferred embodiment, such as for electronics thermal stabilization applications, the working fluid may be an organic dielectric fluid with a boiling point between 20 C and 40 C, such as 1-methoxy-heptafluoropropane (C3F7OCH3). Other working fluids may also be suitable, some examples of which are listed in section 10.3. In a preferred embodiment, the working fluid expands substantially when heated and vaporizes easily. Since the inlet is check-valved, this expansion greatly pressurizes the heat exchanger and the working fluid is expelled through the outlet check-valve (where it makes its way to outlet 0), thereby creating a partial vacuum within the heat exchanger 0 under discussion (which helps to pull in more liquid working fluid). The hotter the system gets, the higher the pressure at which it can operate, up to the maximum desired target temperature of the heat-producing devices, or the maximum allowable enclosure pressure, or the useful upper limit of the working fluid, whichever is most constraining.
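The coupling between operating temperature and operating pressure noted above can be estimated, very roughly, with the Clausius-Clapeyron relation. The following is a sketch under stated assumptions, not a design calculation: it takes the 34 C normal boiling point quoted earlier in this section, and it assumes a latent heat of about 142 kJ/kg and a molar mass of about 200 g/mol for the working fluid. Those two property values are not given in this description and should be checked against the fluid manufacturer's data sheet. The relation also treats the latent heat as constant, which is only an approximation.

```python
import math

R_J_PER_MOL_K = 8.314            # universal gas constant
P_REF_BAR = 1.01325              # reference pressure for the normal boiling point
T_BOIL_K = 34.0 + 273.15         # normal boiling point quoted in the text

# Assumed, approximate fluid properties (not from this description).
LATENT_HEAT_J_PER_KG = 142e3
MOLAR_MASS_KG_PER_MOL = 0.200
MOLAR_LATENT_HEAT = LATENT_HEAT_J_PER_KG * MOLAR_MASS_KG_PER_MOL  # J/mol


def saturation_pressure_bar(temp_c: float) -> float:
    """Estimated absolute pressure at which the fluid boils at temp_c."""
    t_k = temp_c + 273.15
    exponent = -(MOLAR_LATENT_HEAT / R_J_PER_MOL_K) * (1.0 / t_k - 1.0 / T_BOIL_K)
    return P_REF_BAR * math.exp(exponent)


def boiling_point_c(pressure_bar: float) -> float:
    """Estimated boiling temperature at the given absolute pressure (inverse of the above)."""
    inv_t = 1.0 / T_BOIL_K - R_J_PER_MOL_K * math.log(pressure_bar / P_REF_BAR) / MOLAR_LATENT_HEAT
    return 1.0 / inv_t - 273.15


for temp_c in (30.0, 34.0, 40.0, 49.0):
    print(f"{temp_c:5.1f} C -> {saturation_pressure_bar(temp_c):.2f} bar (estimate)")
```

Raising the enclosure pressure moves the estimated boiling point up, and lowering it moves the boiling point down, which is the behavior the text relies on when matching the boiling point to a target device temperature.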

Advanced Fluid Flow Concept
The inner boiler apparatus is colocated with the "hot" surfaces (the surfaces with the largest heat flux) of the most temperature-sensitive "hot" devices and "hottest" of the heat-producing devices, which are so arranged that such placement is possible with a minimum (or otherwise convenient) number of manifolds.
In keeping with the primary goal of driving the temperature-sensitive devices toward a specific temperature threshold, the internal routes of the working fluid can be configured to encounter these devices prior to encountering the non-temperature-sensitive devices.

In a preferred embodiment, partly depicted in Fig. 10.3.3-3, the inner boiler apparatus is oriented vertically (although it is depicted horizontally here, for convenience) such that both the liquid inlet 0 and vapor and liquid outlet 0 are at the top.

One of the factors determining the maximum size of the manifolds is the desire to take advantage of "Rapid Injection Molding" techniques, in order to reduce the cost and lead times normally associated with injection-molded components.

In a preferred embodiment, once normal steady-state operation is reached, working fluid vapor may be expelled through outlet 0 and little or no liquid is present. Liquid working fluid is forced into liquid inlet 0, where it is equitably distributed within the injection-molded manifold chamber 0 and 0 to each heat exchanger's 0 inlet check valve 0, which it can then enter, since the working fluid is under pressure.
For each heat exchanger 0, once the working fluid passes the corresponding inlet check valve, it enters the heat exchanger 0, where it circulates among the heat exchanger's fins, pins, or other heat exchange surfaces. Depending on the then-current temperature and pressure, the working fluid can acquire heat energy, causing all or part of it to evaporate.
In a preferred embodiment, such as for electronics thermal stabilization applications, the working fluid may be an organic dielectric fluid with a boiling point between 20 C and 40 C, such as 1-methoxy-heptafluoropropane (C3F7OCH3). Other working fluids may also be suitable, some examples of which are listed in section 10.3. In a preferred embodiment, the working fluid expands substantially when heated and vaporizes easily. Since the inlet is check-valved, this expansion greatly pressurizes the heat exchanger and the working fluid is expelled through the outlet check-valve (where it makes its way to outlet 0), thereby creating a partial vacuum within the heat exchanger 0 under discussion (which helps to pull in more liquid working fluid). The hotter the system gets, the higher the pressure at which it can operate, up to the maximum desired target temperature of the heat-producing devices, or the maximum allowable enclosure pressure, or the useful upper limit of the working fluid, whichever is most constraining.
In a preferred embodiment, one set of manifolds operates in the 30 C to 40 C range for a particular class of heat-producing electronic chips, while another set operates simultaneously in the 90 C to 110 C range for a different class of heat-producing electronic chips. The same working fluid is used for both - in fact, the cooler system can "feed" the hotter system (however, this would typically require a boost in pressure, which can be accomplished externally via pumps, or via the RUBE Vapor Injector, or a combination thereof).
See also: Critical Heat Flux, Heat Flux, RUBE, RUBE Double-Boiler, RUBE Vapor Injector.

10.3.3.1 Description of Assembly
In a preferred embodiment, the RUBE Double-Boiler's inner boiler apparatus comprises one or more check-valved manifold assemblies (6 are shown above), each with any number of heat-exchanger seals 0, baseplate heat exchangers 0 (an example is shown, but many commercial off-the-shelf units are suitable, including those with non-rectangular shapes), heat-producing devices (not shown, but typically electronics devices on a printed circuit board, or PCB), and backing/pressure plates to aid in providing clamping force (such as would be placed on the reverse side of a PCB, but also not shown).
The check-valved manifold assemblies further comprise a liquid inlet 0, vapor and liquid outlet 0, two-piece injection-molded manifold chamber 0 and 0 with one or more seals between them, an inlet and outlet check-valve pair (either as individual components, or individually constructed as an injection-molded channel with an attached check-valve means such as a ball-and-spring or flapper as depicted by 0) for each baseplate heat-exchanger 0, and suitable molded-in working fluid flow guides and channels within the two-piece injection-molded manifold chamber 0 and 0.
In a preferred embodiment, each heat exchanger O is of a suitable design and construction (in conjunction with the design of the corresponding injection-molded manifold cavity) so as to promote nucleation of an impinging working fluid, and of suitable size and thickness (which may vary among the potentially diverse or otherwise non-homogeneous mix of heat exchangers within the inner boiler apparatus) so that when the manifold assembly is mounted and secured to the PCB or other apparatus containing heat-producing devices, each heat exchanger 0 is pressed directly and firmly against its corresponding heat-producing device, and fully covers the primary high-heat-flux surface of the device.
In an alternate embodiment, fewer check valves are used, possibly omitting either the individual ingress or egress check valves (or both) in favor of a single shared ingress or egress check valve, respectively, or possibly eliminating such valves altogether.
In an alternate embodiment, all or some of the mechanical check valves are substituted with, or augmented by, electrical or electro-mechanical valves (for example, from the class of valves that are similar to those used in fuel injection systems to control flow).
In a preferred embodiment, care is exercised to ensure that all seals in this assembly are compatible with the selected working fluid. In a preferred embodiment, such as for electronics thermal stabilization applications, the working fluid may be an organic dielectric fluid with a boiling point between 20 C and 40 C, such as 1-methoxy-heptafluoropropane (C3F7OCH3), in which case the seals preferentially comprise EPDM having minimal plasticizer content (ideally zero). Other working fluids may also be suitable, some examples of which are listed in section 10.3, and selection of other such fluids should be accompanied by analysis and selection of fluid-appropriate seal materials. Each heat exchanger seal 0 may be clamped between the two-piece injection-molded manifold chamber 0 and 0 and the heat exchanger 0. In a preferred embodiment, the heat exchanger 0 may be attached to the two-piece injection-molded manifold chamber 0 and 0 by suitable screws or other fasteners, using the 4 corner holes. In a preferred embodiment, in order to improve reliability and maintainability, and to reduce manufacturing costs, the two pieces 0 and 0 of the injection-molded manifold chamber may be injection molded out of Black Acetal (Delrin), which allows mass production, and thus economies of scale. In a preferred embodiment, "rapid injection molding" may be used to manufacture low volumes of the manifold chambers 0 and 0, in order to reduce costs and lead times.
10.3.3.2 Baseplate Heat Exchanger In a preferred embodiment, primary heat transfer from hot surfaces or "hot spots" is effected via a baseplate heat-exchanger 0, which is attached to the underlying hot surfaces (e.g., integrated circuit chip packages, or other heat exchange surfaces) by means of an epoxy having the properties of:
= Compatibility with any organic solvents that may be used during manufacturing,
= Compatibility with the working fluid used during operation,
= High temperature resistance,
= Resistance to repeated thermal cycling, and
= Exceptionally high thermal conductivity.
In a preferred embodiment, the baseplate heat-exchanger 0 comprises a commodity "water block" baseplate of high quality, such as the CNC-machined C110 copper baseplate that is a component of the commercially available Swiftech (www.Swiftech.com) Apogee GT water block (patent pending), which has a thickness of 3mm to promote a high surface compliance factor. The commercially available Koolance (www.Koolance.com) family of water blocks is similarly acceptable. Such water blocks are typically designed using CFD (computational fluid dynamics) to specifically increase surface area, coolant velocity, and surface compliance (with its mating hot surface), while minimizing thermal resistance, pressure drop, and cost.
In an alternative embodiment, a custom baseplate heat-exchanger 0 can be used to achieve specific packaging, heat transfer, weight, cost, availability, manufacturing, or other goals, in accordance with necessary design trade-offs, without diverging from the concept taught here.
In a preferred embodiment, it is desirable to apply special coatings or textures to the baseplate heat-exchanger 0 in order to improve surface area and increase the number of nucleation sites for the phase-change working fluid. In a preferred embodiment, the baseplate is treated by acid etching (such as to achieve a rough 40-60 grit, possibly in combination with other treatments, including those which are mechanical or optical rather than chemical), in order to create a large number of microfeatures on all surfaces, thereby further increasing the surface area available for heat exchange. On the surfaces exposed to phase-change working fluid, this also serves to promote nucleate boiling.
In an electronics application, such as a computing system, the hottest surfaces are typically associated with the electronic chips with the highest transistor counts (CPUs, FPGAs, switches, network interfaces, radios, etc.), and also with various power-handling devices.
In a preferred embodiment with phase-change working fluid both inside and outside the manifold (the manifold and electronics are immersed in it), the working fluid can fill any minute gaps between the hot surfaces and the baseplate, and then boil off, which means that surface compliance is much less important than it would be in a conventional implementation (such as one requiring thermal grease, which is specifically omitted here). Because a preferred baseplate is CNC-machined from copper (and then possibly gold-plated in an alternative embodiment), different thicknesses can easily be created if needed, to accommodate potentially different heights of hot surfaces to be mated with. This allows a fixed manifold profile to be used, with any variability shifted to differing baseplate heights as necessary.

10.3.4 RUBE Vapor Injector Inspired by the Gifford Steam Injector invented in 1858, the RUBE Vapor Injector is a means to: 1) maintain a load (the "boiler") within a desired temperature range, and 2) recuperate as much energy as possible from the heat dissipated by the load, in order to convert the recuperated heat energy into mechanical energy (specifically, pressure energy) that can be used as motive force to reduce or eliminate the energy that would otherwise be needed for circulation pumps in a phase-change heating, cooling, and/or power generation system.
In a preferred embodiment, such as for electronics thermal stabilization applications, the working fluid may be an organic dielectric fluid with a boiling point between 20 C and 40 C, such as 1-methoxy-heptafluoropropane (C3F7OCH3). Other working fluids may also be suitable, some examples of which are listed in section 10.3. In a preferred embodiment, the working fluid expands substantially when heated and vaporizes easily. See also: RUBE, RUBE Double-Boiler, RUBE Inner Boiler in the glossary.

10.3.4.1 How the RUBE Vapor Injector Differs from Prior Art Compared to a steam injector:
= The steam injector was designed to operate at high temperatures (e.g., superheated steam, 300 F to 700 F+).
The RUBE Vapor Injector operates at considerably lower temperatures (e.g., typically from 90 F saturated vapor up to 250 F or so for superheated vapor).
= The steam injector requires superheated steam.
The RUBE Vapor Injector does not.
= The steam injector requires an overflow gap.
The RUBE Vapor Injector has no overflow gap.
= The steam injector requires an overflow valve ("clack valve") and waste pipe.
The RUBE Vapor Injector has neither overflow valve nor waste pipe.
= The steam injector fails (overflows and vents externally) if conditions are not close to "perfect."
The RUBE Vapor Injector continues to function (possibly suboptimally), no matter what.
= The steam injector is designed to vent to the atmosphere.
The RUBE Vapor Injector does not vent externally, but rather, is part of a closed-loop system.
= The steam injector requires an attending engineer or control mechanism.
The RUBE Vapor Injector does not.
Compared to an eductor, ejector, or jet pump ("eductor"):
= The eductor has no thermodynamic effect; the RUBE Vapor Injector has primarily thermodynamic effects.
= The eductor's primary (only) effect is due to Venturi effect; the RUBE Vapor Injector sees this as a beneficial - but secondary - effect with a relatively minor performance contribution.
= The eductor construction must be "tuned" to achieve the Venturi effect at a specific set of pressure and flow parameters, and doesn't work at other settings.
The RUBE Vapor Injector functions well over a relatively wider set of parameters, and always beneficially.

10.3.4.2 The RUBE Vapor Injector - Principle of Operation In essence, the RUBE Vapor Injector comprises three cones (vapor cone 0, combining cone 0 and delivery cone 0 ), with a throat or bottleneck 0 between the latter two (but specifically no overflow gap or overflow valve). The idea is to use a jet of working fluid vapor, when available, to augment the flow of working fluid into the boiler, heating it up in the process.
In a preferred embodiment, such as for electronics thermal stabilization applications, the working fluid may be an organic dielectric fluid with a boiling point between 20 C and 40 C, such as 1-methoxy-heptafluoropropane (C3F7OCH3). Other working fluids may also be suitable, some examples of which are listed in section 10.3. In a preferred embodiment, the working fluid expands substantially when heated and vaporizes easily.

A. Working fluid (with a boiling point below the desired upper threshold) enters at check-valved liquid inlet 0 (it is initially pumped, but once the process gets going, the working fluid is actually sucked from the inlet 0, due to the thermodynamic effect of the partial vacuum created by condensing vapor in step C, and to a lesser extent, the Venturi effect). Depending on the actual thermodynamic conditions (which, in a preferred embodiment, are actively monitored and controlled), the feed pump(s) may continue to operate, but at a reduced load.
B. Vapor from check-valved vapor inlet 0 enters the converging vapor cone 0 where partial condensation occurs, a partial vacuum is created and pressure energy is converted into velocity (kinetic) energy, resulting in a high velocity jet at the nozzle of vapor cone 0, but with a drop in pressure.
C. High velocity vapor from the nozzle of the vapor cone 0 enters the converging combining cone 0 where the vapor contacts and thoroughly mixes with liquid working fluid, resulting in a high vacuum as complete condensation occurs. (With a preferred working fluid, such as methoxy-nonafluorobutane, the volume of the vapor is more than 100 times greater (about 116x for methoxy-nonafluorobutane) than the volume of the liquid from which it was produced, so when condensation occurs in the combining cone 0, vapor returns to liquid with a typical reduction in volume of more than 100:1, resulting in a partial vacuum, which provides suction at liquid inlet 0.) In combining cone 0 the vapor's kinetic energy is transferred to the liquid, which results in a jet of heated liquid rushing through the throat of the combining cone 0 and into the divergent delivery cone 0. Note that, unlike a steam injector, there is no overflow gap 0, and thus no overflow outlet 0, or downstream overflow valve.
D. The diverging shape of the delivery cone 0 converts the kinetic energy of the heated liquid into pressure that is at least slightly higher than boiler pressure; the liquid then traverses delivery pipe 0 and opens the check-valved flow 0 of working fluid into the boiler via delivery outlet 0.
Disabling conditions such as insufficient vapor speed, imperfect vapor condensation (say, due to overly warm fluid at liquid inlet 0 or an overly hot valve body), cannot occur in the RUBE vapor injector, because its function is to optimize the energy required for pumping, rather than to enable pumping in the first place. On the other hand, the hotter the load becomes, the more efficiently the RUBE vapor injector operates, and the greater "free" motive force it supplies. Working fluid can always be delivered to the delivery outlet 0 if it is supplied at either inlet. Because the RUBE vapor injector requires much less precision than a steam injector, it is expected to be relatively cheaper to manufacture (less precision machining, if any).
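The suction described above follows from the vapor-to-liquid volume ratio. As a minimal sketch, assuming the roughly 116:1 ratio quoted for methoxy-nonafluorobutane and complete condensation, the following shows how much volume disappears when a given quantity of vapor condenses in the combining cone. The function names and the 1-liter sample volume are illustrative assumptions, not part of the disclosure.

```python
def condensed_volume(vapor_volume_l: float, expansion_ratio: float = 116.0) -> float:
    """Liquid volume (liters) remaining after the vapor condenses completely.

    expansion_ratio is the vapor-to-liquid volume ratio; 116 is the figure
    quoted in the text for methoxy-nonafluorobutane.
    """
    if expansion_ratio <= 1.0:
        raise ValueError("expansion_ratio must be greater than 1")
    return vapor_volume_l / expansion_ratio


def volume_reduction_fraction(expansion_ratio: float = 116.0) -> float:
    """Fraction of the original vapor volume that vanishes on condensation."""
    if expansion_ratio <= 1.0:
        raise ValueError("expansion_ratio must be greater than 1")
    return 1.0 - 1.0 / expansion_ratio


if __name__ == "__main__":
    vapor_l = 1.0  # hypothetical sample: 1 liter of vapor entering the combining cone
    liquid_l = condensed_volume(vapor_l)
    print(f"{vapor_l:.3f} L of vapor condenses to {liquid_l * 1000:.2f} mL of liquid")
    print(f"volume reduction: {volume_reduction_fraction() * 100:.1f} %")
```

A reduction of this size is what produces the partial vacuum at the liquid inlet. The sketch ignores temperature and pressure dependence of the ratio.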

10.3.5 RUBE Air-Cooled Subcooler In a preferred embodiment, a facility that provides air-based equipment cooling (for example, via HVAC ductwork and/or CRAC-cooled raised-floor plenums) connects to the FRAME means directly via a RUBE air-cooled subcooler heat-exchanger apparatus. This connection can occur in three different ways:
The RUBE apparatus can reject additional heat into the hot return side of the facility system (increasing overall efficiency) before the air or fluid is actually returned, rather than creating an additional load on the cold supply side of the facility system. In another preferred embodiment, the facility connects to the FRAME means or directly (or indirectly through a heat exchanger) into an embodiment of the STEER apparatus [2] to which an embodiment of the RUBE apparatus [3] is optionally attached.

10.3.6 RUBE Liquid-Cooled Subcooler An energy production and/or peak-shaving energy management capability whose goal is to reduce operational costs and enhance or enable survivability. FRAME works by significantly reducing the energy required to operate a heat-dissipating system (such as a computing system), through the recuperative use of energy in general, and by time-shifting the generation and consumption of power to the most effective and/or efficient time-frames.

Typically, the chilled-water system (CWS) is one of three basic designs (constant volume, variable volume with constant evaporator flow, or variable primary flow), but all ultimately provide both a chilled water source and a warm water return. In a typical CWS, a chiller cools water to between 40 F and 45 F (4 C and 7 C). The chilled water is distributed throughout the facility in a piping system and connected to local cooling units as needed.
Typically, a CWS-cooled datacenter also distributes chilled underfloor air via raised flooring; in this case the local cooling units may include a number of CRAC (computer room air conditioning) units comprising heat exchangers and air movers; these move air over heat exchanger coils that have chilled water circulating through them, thus chilling the air.
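This description mixes Fahrenheit and Celsius figures, so a small conversion helper can be useful when checking them. The sketch below is illustrative only; the function names are assumptions.

```python
def f_to_c(temp_f: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def c_to_f(temp_c: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return temp_c * 9.0 / 5.0 + 32.0


if __name__ == "__main__":
    # Chilled-water supply range quoted in the text: 40 F to 45 F.
    for temp_f in (40.0, 45.0):
        print(f"{temp_f:.0f} F = {f_to_c(temp_f):.1f} C")
```

The 40 F to 45 F supply range corresponds to about 4.4 C to 7.2 C, which the text rounds to 4 C and 7 C.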

In a preferred embodiment, given a FRAME subsystem (means) operating with an interface to a co-located facility [7] as depicted in Fig. 10-1, where the facility is using air conditioning (e.g., CRAC or HVAC) or chilled water for cooling datacenter-like heat loads (e.g., a "post-use" chilled water return temperature of 80 F to 85 F or less), the facility can route some portion (possibly all) of its return air or return water directly or indirectly into a FRAME means, as depicted in Fig. 10-2 and 10-3, respectively.
Fig. 10-2. FRAME Interface to Air Conditioning System (CRAC or HVAC)
Fig. 10-3. FRAME Interface to Chilled Water System (CWS)
In a preferred embodiment, the facility connects to the FRAME means directly via a RUBE heat-exchanger apparatus [3], so that the RUBE apparatus [3] can reject additional heat into the hot return side of the facility system (increasing overall efficiency) before the air or fluid is actually returned, rather than creating an additional load on the cold supply side of the facility system. In another preferred embodiment, the facility connects to the FRAME means or directly (or indirectly through a heat exchanger) into an embodiment of the STEER apparatus [2] to which an embodiment of the RUBE apparatus [3] is optionally attached.
In another preferred embodiment, facilities with roof access or other access to outside air, or with access to a ground loop, can easily reject heat energy while completely avoiding the additional energy cost associated with a CWS or HVAC system (FRAME, and specifically, the RUBE apparatus [3], gives up its waste heat at least partially via phase-change, which is thermally efficient since no compressor would be required for ambient temperatures up to 90 F or more). Of course, where applicable, the waste heat could also be put to good use in other facility heating or preheating applications (hot water heating, snow removal, etc.). The waste heat temperature optionally available from FRAME can be significantly higher (by 10 F to 30 F or more) than typical data center waste heat, and therefore may be potentially more useful.
In a facility using CRAC units (or equivalently for this description, centralized air conditioning without chilled water), chilled air is typically forced under a computer room raised floor to where it is needed, with warmed return air moved back to the CRAC unit via a path near the ceiling (in a well-designed system, there may also be hot and cold aisles, but these have no bearing on this description). In a preferred embodiment, the FRAME means is connected to said facility CRAC units in the warm "return-air" path, where the warm air provides a cooling effect that is utilized by the FRAME's RUBE liquid-to-air condenser means, while also raising the average temperature of the air that is actually passed on to said CRAC units. In an alternate preferred embodiment, the FRAME means is connected to the facility CRAC units in the cold-air path, where the air provides a cooling effect that is utilized by the RUBE liquid-to-air condenser, in a manner not unlike typical air-cooled datacenter equipment. In another preferred embodiment, the FRAME means is connected to the facility CRAC units in both the cold-air and warm "return-air" paths, with one or more dynamically controllable dampers that provide selectivity as to which air source path is primary, and to what degree, so as to be capable of accepting control inputs that set the relative quantities of cooling air drawn from each path and directed to the FRAME means, thereby regulating the temperature of the input air. In a preferred embodiment, said dampers are controlled by the SLAM apparatus [1] depicted in Fig. 10-1.
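The damper arrangement above amounts to blending two air streams to reach a desired inlet temperature. As a rough sketch, assuming ideal mixing of two streams with equal specific heat and density, the fraction of cold-path air needed for a target temperature can be estimated as follows. The function names and the sample temperatures are hypothetical and are not taken from the SLAM control logic.

```python
def cold_air_fraction(t_cold_f: float, t_return_f: float, t_target_f: float) -> float:
    """Fraction of cold-path air (0..1) needed to reach t_target_f.

    Assumes ideal mixing, so the mixed temperature is a linear blend of the
    two source temperatures. Targets outside the achievable range are clamped.
    """
    if t_return_f <= t_cold_f:
        raise ValueError("return air must be warmer than cold supply air")
    fraction = (t_return_f - t_target_f) / (t_return_f - t_cold_f)
    return min(1.0, max(0.0, fraction))


def mixed_temperature(t_cold_f: float, t_return_f: float, cold_fraction: float) -> float:
    """Temperature of the blended stream for a given cold-path fraction."""
    return cold_fraction * t_cold_f + (1.0 - cold_fraction) * t_return_f


if __name__ == "__main__":
    # Hypothetical conditions: 60 F underfloor supply, 90 F ceiling return.
    fraction = cold_air_fraction(60.0, 90.0, 80.0)
    print(f"cold-path damper fraction: {fraction:.2f}")
    print(f"resulting inlet temperature: {mixed_temperature(60.0, 90.0, fraction):.1f} F")
```

A real controller would also have to account for airflow limits and damper nonlinearity, which this sketch ignores.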

A setup similar to CRAC units can be used with the HVAC (Heating, Ventilation, & Air Conditioning) units often deployed in buildings (other than datacenters), except that traditional HVAC systems tend to utilize duct work rather than depending on a raised floor with underfloor air movement.
Nonetheless, in a preferred embodiment, the FRAME means is connected to a facility HVAC system with the same set of air source constraints (i.e., warm return air, cold air, or dynamically selectable) as for a system based on one or more CRAC units, but using ductwork rather than underfloor spaces for the cold air supply.

Given a facility that uses a chilled water system (CWS) for cooling, and which optionally also has CRAC units comprising heat exchangers with air movers, in a preferred embodiment the connection of the FRAME means to the air system can be strictly optional, substituting instead a connection to the CWS itself, with the FRAME means emulating a CRAC unit, either directly, or via a liquid-to-liquid heat exchanger.

Given a facility that uses a chilled water system (CWS) for cooling, but which provides only CRAC units comprising heat exchangers with air movers without providing for chilled water distribution to non-CRAC units, in a preferred embodiment the connection of the FRAME means to the air system can be strictly optional, substituting instead a connection to the chilled water system itself, either directly, or via a liquid-to-liquid heat exchanger.

Given a facility that uses a CWS for cooling, and has no air movers, said FRAME means connects to the chilled water system itself, either directly, or via a liquid-to-liquid heat exchanger. Typically, the CWS is one of three basic designs.

= Constant volume chilled water system
= Variable volume chilled water system with constant evaporator flow
= Variable primary flow chilled water system (VPF)
In a preferred embodiment, the FRAME means is connected to the facility CWS units in the warm "return-water" path, where the warm return water provides a cooling effect that is utilized by the FRAME's RUBE condenser means, while also raising the average temperature of the air that is actually passed on to said CRAC units. In an alternate preferred embodiment, the FRAME means is connected to the facility CRAC units in the cold-air path, where the air provides a cooling effect that is utilized by the RUBE condenser, in the same way as typical air-cooled datacenter equipment. In another preferred embodiment, the FRAME means is connected to the facility CRAC units in both the cold-air and warm "return-air" paths, with one or more dynamically controllable dampers that provide selectivity as to which path is primary, and to what degree, so as to regulate the temperature and quantity of cooling air directed to the FRAME means. In a preferred embodiment, said dampers are controlled by the SLAM apparatus [1] depicted in Fig. 10-1.

= RUBE Condenser may operate stand-alone (air-cooled) in ambient environments exceeding 100 F.
= RUBE Condenser may tap into the return (hot side) of datacenter's chilled water loop, just before it returns to chiller.
= RUBE Condenser may normally accept hot water (typically 85 F) being returned to the chiller as its cold input, and transfer additional heat energy to it (which further raises its temperature and increases chiller efficiency).
= RUBE Condenser may connect into both sides (chilled supply & hot return) of a chiller loop (or alternatively, air management system), adaptively using both sides for cooling.
= RUBE Condenser may use chilled water (supply-side) only as a backup measure to temper overly hot (>100F) return water. Alternatively, any small, reliable, energy-efficient compressor can be used (a Stirling cycle compressor would be used in a preferred embodiment) to increase the temperature and pressure of the RUBE Condenser working fluid enough to reject it into the chilled water system's overly hot return water, in which case no connection to the chilled water supply is required.
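The options listed above can be read as a simple selection rule for the condenser's heat sink. The sketch below is one possible reading, assuming the 100 F threshold mentioned in the list and assuming that tempering with chilled supply water is tried before the compressor option. The mode names, function name, and priority order are illustrative assumptions and do not come from the disclosure.

```python
from enum import Enum


class SinkMode(Enum):
    RETURN_ONLY = "reject heat into return water only"
    TEMPER_WITH_SUPPLY = "temper return water with chilled supply"
    COMPRESSOR_BOOST = "boost working fluid with compressor, return water only"


OVERLY_HOT_RETURN_F = 100.0  # threshold quoted in the list above


def choose_sink_mode(return_temp_f: float,
                     supply_connected: bool,
                     compressor_available: bool) -> SinkMode:
    """Pick a heat-rejection mode for the condenser (hypothetical policy)."""
    if return_temp_f <= OVERLY_HOT_RETURN_F:
        # Normal case: return water (typically about 85 F) serves as the cold input.
        return SinkMode.RETURN_ONLY
    if supply_connected:
        # Backup measure: blend in chilled supply water to temper the return water.
        return SinkMode.TEMPER_WITH_SUPPLY
    if compressor_available:
        # Alternative: raise working-fluid temperature and pressure instead.
        return SinkMode.COMPRESSOR_BOOST
    raise ValueError("return water is overly hot and no backup measure is available")


if __name__ == "__main__":
    print(choose_sink_mode(85.0, supply_connected=False, compressor_available=False))
    print(choose_sink_mode(105.0, supply_connected=True, compressor_available=True))
    print(choose_sink_mode(105.0, supply_connected=False, compressor_available=True))
```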

10.3.7 RUBE Recuperator Assembly At its simplest, a RUBE Recuperator Assembly comprises two inlet manifolds and two outlet manifolds at the bottom (shared by all the pairs of tubes), a structural member at the top, and a set of RUBE Recuperator Tube pairs spanning them, where each tube pair resembles an inverted "U" as can be seen in Fig. 10.3.7-1.
From left to right, Fig. 10.3.7-1 depicts the end, isometric, and side views of a simple half-height assembly capable of supporting two working fluid streams in a counter-current arrangement.
Looking at the end view at the left of Fig. 10.3.7-1, the working fluids enter at the bottom, with one working fluid stream entering from the left, and the other from the right, and each stream proceeds, within its own tube, in the opposite direction from one another, up over the U-bend and down the other side to its respective outlet manifold. Although each tube pair appears to be a single U-shaped tube, the tube you can see is actually the plastic (preferentially polycarbonate) outer tube, and it fully encloses a metal tube within.
In the vernacular of shell-and-tube heat exchangers, the outer plastic tube is the "shell," and the inner metal tube is the "tube." The left-most view in Fig. 10.3.7-2 is a cutaway depicting the spiraling ribs extruded into the wall of the outer plastic tube. These ribs serve multiple purposes that result in a significantly improved heat exchanger capacity and efficiency. The novel construction of these special tube pairs is described further in section 10.3.8.
The fact that this assembly can support "only" two working fluid streams is precisely what makes it a "simple" assembly. Actually, with only the three views provided in Fig. 10.3.7-1, it is impossible to determine whether this is a "simple" two-stream assembly or something more complex. Given an array of tube pairs such as that depicted in this assembly, each tube pair could conceivably handle a pair of working fluid streams that is completely independent of all the others in the assembly. Of course, that would imply that, rather than all the tube pairs sharing a common set of inlet and outlet manifolds, each such tube pair would require its own set of inlet and outlet manifolds (a total of four manifolds for each tube pair).
In practice, most complex assemblies may work with only a few different working fluid streams, in which case the real issue is one of combining multiple, smaller simple assemblies into a larger assembly, in order to solve a packaging or space density problem, for example.
In a preferred embodiment, multiple simple assemblies are packaged together to construct a more sophisticated assembly containing multiple sets of inlet and outlet manifolds (a total of four manifolds for each simple assembly). Each simple assembly then becomes a device whose working fluid inputs and outputs can be routed by the STEER subsystem described in section 10.2, under the control of the SLAM subsystem described in section 10.1. Under the control of SLAM and STEER, two sets of apparently independent heat exchanger assemblies can be directed to operate in parallel with the same set of working fluids, or they can be dynamically (and automatically) reconfigured by SLAM and STEER to operate in series, for example, in order to increase the effective tube length.
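The trade-off between parallel and series operation can be illustrated with a small calculation. This is a minimal sketch, assuming identical simple assemblies and a per-assembly flow path of about twelve feet (the total U-tube length given in section 10.3.8). The function name and the string mode values are hypothetical and do not represent the SLAM or STEER interfaces.

```python
def effective_tube_length(n_assemblies: int,
                          tube_length_ft: float,
                          mode: str) -> tuple[float, int]:
    """Return (flow path length in feet, number of parallel flow paths).

    In "parallel" mode each assembly carries its own share of the flow over
    one tube length. In "series" mode the same flow passes through every
    assembly in turn, so the tube lengths add up.
    """
    if n_assemblies < 1 or tube_length_ft <= 0:
        raise ValueError("need at least one assembly with a positive tube length")
    if mode == "parallel":
        return tube_length_ft, n_assemblies
    if mode == "series":
        return tube_length_ft * n_assemblies, 1
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    for mode in ("parallel", "series"):
        length_ft, paths = effective_tube_length(2, 12.0, mode)
        print(f"{mode}: {length_ft:.0f} ft per path, {paths} path(s)")
```

Series operation doubles the path length for a pair of assemblies at the cost of halving the number of parallel paths, which would generally raise pressure drop. That pressure-drop effect is not modeled here.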
In a preferred embodiment, the FRAME subsystem utilizes only a single working fluid shared by all internal systems, and this enables maximum flexibility in the reconfiguration options available to the SLAM and STEER subsystems. An additional working fluid or two may be needed to interface to external subsystems, but represents only a fraction of the heat exchange streams that may be active in a typical system.
In a preferred embodiment, in order to maximize both the aggregate heat exchanger capacity and packaging density within the FRAME subsystem, the normal orientation for RUBE Recuperator assemblies is vertical. In a preferred embodiment, each recuperator assembly is over six feet in height (the tube pairs alone are in excess of six feet), approximately two feet deep (i.e., front to back), and several inches wide (around three inches in width is needed just for the tubes, depending on the chosen outer tube diameter, preferentially 1.25").
Since said recuperator assemblies are preferentially approximately two feet deep (i.e., front to back) and over six feet high, they can easily fit within the equipment profile of a SUREFIRE Mini-Silo miniature underground datacenter, which, in a preferred embodiment, requires equipment to fit within a cylinder approximately three feet in diameter. The SUREFIRE Mini-Silo is described in section 11.2. In a preferred embodiment, said assemblies are deployed with telescoping rails at the top and bottom, so that in a datacenter-style equipment rack/cabinet, the assemblies would span most of the vertical space, which means that rack space would be allocated horizontally rather than vertically. For more examples of such configurations, refer to the SUREFIRE Freestanding Vault described in section 11.1.
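The fit within a three-foot cylinder can be checked with simple geometry. The sketch below treats a side-by-side bank of assemblies as a centered rectangle inside the circular cross-section and estimates an upper bound on how many fit. It assumes a depth of 24 inches, a width of 3 inches per assembly, and a 36-inch cylinder, and it ignores rails, manifolds, and clearances. The function names are illustrative.

```python
import math


def max_bank_width_in(depth_in: float, cylinder_diameter_in: float) -> float:
    """Widest centered rectangle of the given depth that fits inside the circle."""
    if depth_in >= cylinder_diameter_in:
        return 0.0
    return math.sqrt(cylinder_diameter_in ** 2 - depth_in ** 2)


def assemblies_that_fit(depth_in: float = 24.0,
                        width_each_in: float = 3.0,
                        cylinder_diameter_in: float = 36.0) -> int:
    """Upper bound on the number of side-by-side assemblies in the cylinder."""
    if width_each_in <= 0:
        raise ValueError("assembly width must be positive")
    return int(max_bank_width_in(depth_in, cylinder_diameter_in) // width_each_in)


if __name__ == "__main__":
    width = max_bank_width_in(24.0, 36.0)
    print(f"maximum bank width: {width:.1f} in")
    print(f"assemblies that fit (upper bound): {assemblies_that_fit()}")
```

Under these assumptions the geometric bound is a bank just under 27 inches wide. A practical layout would hold fewer assemblies once mounting hardware is included.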
In the case of datacenter-style equipment rack/cabinets, the amount of horizontal rack space to be allocated to recuperator assemblies depends on various factors beyond the scope of discussion here, but ultimately will be driven by the capacity needed. In the case of the aforementioned SUREFIRE Freestanding Vault, which includes a number of self-contained power plants and a supercomputer, at least two recuperator assemblies are needed. In a preferred embodiment, said assemblies are thus packaged in pairs as depicted in Fig. 10.3.7-2, with each pair sharing a set of telescoping rails.

10.3.8 RUBE Recuperator Tube The RUBE Recuperator Tube pair depicted in Fig. 10.3.8-1 (hereafter, in this section, simply "recuperator tube pair") is a modular U-shaped heat exchanger whose function is to transfer heat energy between two streams of working fluids, with the transfer of heat being toward the cooler fluid. Because its primary role is that of a thermal energy recovery device (i.e., "recuperator") its design accommodates the temperatures and fluids contemplated for its intended applications, rather than those of a general purpose heat exchanger.
Nonetheless, its range of suitable applications is relatively wide, the primary constraint being that of maximum operating temperature, due to the materials of construction, namely, the plastic outer shell.

This section primarily addresses the properties and construction of the recuperator tube pair, rather than specific applications, because the individual devices are designed to be incorporated into other assemblies.
In particular, it is relatively straightforward to construct a high-capacity recuperator array from said tube pairs.
An important property of the recuperator tube pair is for it to work particularly well in scenarios where one of the streams is a liquid (or mostly so) and the other is relatively low-temperature, low pressure vapor (e.g., typically <2 bar, although higher pressures can be accommodated), which in general calls for an efficient vapor path that can minimize pressure loss.

In a preferred embodiment, this is partly accomplished via the construction of the shell, such that said vapor is preferentially transported in the interstitial space between the plastic shell and the finned metal tube it encloses.
In a preferred embodiment, the shell is constructed of a polycarbonate plastic that accommodates working fluids up to somewhat less than 130 C (266 F). In alternate embodiments, other shell materials can be used that may have other temperature characteristics. In a preferred embodiment, the enclosed metal tube is a commercially available doubly enhanced copper alloy tube having integral helical ridges on the inside of the tube to increase turbulence of the tube-side fluid, as well as internal surface area. The outside of said tube has external integral fins spaced approximately 1.3mm apart, to increase external surface area and enhance performance when the shell-side resistance is controlling. In a preferred embodiment, said commercially available copper alloy tube is further enhanced via acid etching of both the inner and outer surfaces, in order to increase surface roughness, and specifically in order to increase the number and variety of nucleation sites in order to promote nucleate boiling. In an alternate preferred embodiment, said acid etching is performed only on one of the surfaces (preferentially the inner surface, which is where liquids to be heated may be preferentially assigned).
In a preferred embodiment, each U-shaped recuperator tube assembly is slightly more than six feet long, which means that the total tube length is in excess of twelve feet. At this length, thermal expansion can be a significant problem for typical shell-and-tube heat exchanger designs.
However, because of its U-shaped, single-tube-per-shell design, the recuperator tolerates thermal stresses well.
In particular, the U-shape effectively cuts the expansion travel in half, and the inner metal tube is free to expand or contract significantly within the plastic shell without encountering the problems that may otherwise occur in a shell-and-tube exchanger, such as when one or more tubes break free of their tube sheet due to thermal expansion.
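The halving of expansion travel can be illustrated with the standard linear expansion relation, delta_L = alpha * L * delta_T. The sketch below assumes a coefficient of about 17e-6 per degree C, which is typical of copper (the actual alloy may differ), and a hypothetical 100 C temperature swing. Neither figure comes from the disclosure.

```python
FT_TO_M = 0.3048
ALPHA_COPPER_PER_C = 17e-6  # assumed typical value for copper; alloy-dependent


def linear_expansion_mm(length_ft: float,
                        delta_t_c: float,
                        alpha_per_c: float = ALPHA_COPPER_PER_C) -> float:
    """Length change in millimeters for a tube of length_ft heated by delta_t_c."""
    return alpha_per_c * length_ft * FT_TO_M * delta_t_c * 1000.0


if __name__ == "__main__":
    delta_t_c = 100.0  # hypothetical temperature swing
    straight_mm = linear_expansion_mm(12.0, delta_t_c)  # one straight 12 ft tube
    u_leg_mm = linear_expansion_mm(6.0, delta_t_c)      # travel of each 6 ft leg
    print(f"straight 12 ft tube: {straight_mm:.2f} mm of travel")
    print(f"U-shaped tube, per 6 ft leg: {u_leg_mm:.2f} mm of travel")
```

With these assumed values a straight twelve-foot tube would grow by roughly 6 mm, while each leg of the U-shaped tube moves about 3 mm. That matches the halving described above.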
In an alternate embodiment, shorter or longer recuperators can be easily constructed. The commercially available metal tube material is specified with appropriate-length plain ends and lands, and in a preferred embodiment, with a land where the U-bend is to be made. In a preferred embodiment with the U-bend at a land, the U-bend can be made with a bend radius equal to 1.5 times the diameter of the metal tube (this is much tighter than conventional wisdom would suggest). In an alternate embodiment, the U-bend can be made without a land, although doing so increases the shell-side pressure drop associated with the U-bend.
In a preferred embodiment, any further surface treatment of the metal tube that is to occur (e.g., acid-etching) takes place after the U-bend is made, since this reduces the length of the tube by half, thereby simplifying the surface treatment process. An appropriately sized straight length of the plastic shell is then slipped over each leg of the U-shaped metal tube (preferentially, the end of the plastic shell occurs approximately where the U-bend land begins).
In a preferred embodiment, the U-bend of the plastic shell is fabricated by means of a two-piece end that fits together over the metal tube and fits against the ends of the straight lengths of plastic shell. In a preferred embodiment, said plastic U-bend halves are fabricated of the same material as the shell's straight lengths, and can be plastic-welded/cement-welded to each other and to the straight lengths (this is an automatable process). In alternate embodiments, other means of fastening may be used, and enhancements may be made to adjoining surfaces (e.g., lips, necks, snaps, etc.) to promote mechanical strength and/or to simplify manual assembly.
In a preferred embodiment, the same basic plastic U-bend fabrication process can be used with variations of the plastic U-bend, in order to achieve specific purposes. For example, the cutaway view of Fig. 10.3.8-2 depicts a smooth continuous U-bend such as may be used for normal heat exchange between two streams of working fluid. However, the cutaway view of Fig. 10.3.9-1 depicts a modification such as may be used to provide a collection and egress point for condensate in an alternative application.
In a preferred embodiment, as depicted in the cutaway views of Fig. 10.3.8-2, the recuperator comprises an extruded, spirally ribbed, plastic tube (preferentially polycarbonate) enclosing a one-piece U-shaped metal tube (preferentially copper alloy) having outer fins and inner spiral grooves that are part of the metal itself, with an interstitial space between the finned metal tube and the ribbed plastic shell. The spiral ribs extruded into the plastic tube provide evenly distributed spacing along its length, so that the finned metal tube is kept centered within the plastic tube. Importantly, the spiral ribs guide the working fluid in a relatively shallow - but also relatively turbulent - spiral around the finned metal tubes (preferentially, said fins are not spiraled), thus forcing the flow to be distinctly non-laminar.
In a preferred embodiment, as depicted in the cutaway views of Fig. 10.3.8-3, four triangular shaped ribs are extruded as the plastic shell itself is extruded, with rotation during extrusion, so as to force the ribs into a spiral pattern. In a preferred embodiment, the base and height of each triangular rib is approximately equal to the wall thickness of the shell, which is preferentially 0.125 inch, and each rib makes three revolutions over a span of approximately six feet, or approximately one-half revolution per foot. In alternative embodiments, other wall thicknesses and rib dimensions (i.e., base, height) may be used, with fewer or more ribs, and fewer or more revolutions per foot.

10.3.9 RUBE Condenser-Separator Tube
The RUBE Condenser-Separator Tube depicted in Fig. 10.3.9-1 is an energy recovery and condenser apparatus comprising primarily a heat exchanger and a condensate collector, whose goal is to extract excess heat energy from a low-pressure stream comprising both condensable vapor and non-condensable gases, transfer the heat energy to an alternative stream of cooler working fluid (i.e., the "coolant," which is preferentially a liquid suitable for such purposes), and also separate out and emit a third stream comprising condensate from any condensable vapor in the low-pressure stream.

In a preferred embodiment, the RUBE Condenser-Separator Tube depicted in Fig. 10.3.9-1 is physically very similar to the RUBE Recuperator Tube described in section 10.3.8 and depicted in Fig. 10.3.9-2, and in fact they share the same basic design, that being an extruded, spirally ribbed, plastic tube (preferentially polycarbonate) enclosing a one-piece U-shaped metal tube (preferentially copper alloy) having outer fins and inner spiral grooves that are part of the metal itself.
The primary differences in construction between the condenser-separator tube and the basic recuperator stem from the differences in their respective applications, with the recuperator needing to deal with two fluid streams and the condenser-separator needing to deal with three. Besides simple counter-current heat exchange between two streams, the condenser-separator tube has the additional task of condensing and collecting condensate from the outer plastic tube without obstructing the flow of the non-condensing stream containing it.
The intended use of this device as a condenser results in an operational orientation that is inverted from that of the recuperator tube of 10.3.8, specifically in order to facilitate condensation and simplify the collection and egress of condensate. Another difference is the modification of the outer channel at the U-bend to facilitate condensate collection, and the addition of a third stream port to accommodate condensate egress.

An important use of this device is to dehumidify a low-pressure exhaust stream containing water vapor resulting from a catalytic oxidation reaction such as occurs in the catalytic vaporizer described in section 10.5.3. Thus, the stream containing the vapor to be cooled and condensed is typically a relatively slow-moving exhaust stream from a flameless catalytic heating process, and an excessive pressure drop is undesirable. In contrast, the intended use of this device also presumes a coolant that is under external motive force which results in a pressure and velocity that is in any case significantly greater than that of the low-pressure exhaust stream.

As designed, the interstitial space between the outer tube and the inner tube's outer surface, as depicted in the cutaway view of Fig. 10.3.9-3, has slightly higher cross-sectional area, and thus higher volume, than the inside channel of the metal inner tube, and thus enjoys slightly less pressure drop than the channel inside the inner tube (making it more suitable for low-pressure vapor flow, whereas the coolant, which is presumably a liquid, can easily be pumped in either space). Additionally, the outside of the metal tube has greater surface area for heat exchange than the inside of the metal tube (despite the spirally grooved walls inside the metal tube, which are not visible on the plain ends), and thus is more suitable for acquiring heat from a vapor stream than would be the channel inside the metal tube. Since the coolant is presumably a liquid, or mostly so, and is in any case under pressure due to an external motive force, and since liquid can typically transfer heat much more efficiently than vapor due to lower thermal resistance at the heat transfer surface, relegating the coolant to the inside channel of the metal tube does not create a heat exchange "bottleneck."
Accordingly, in a preferred embodiment, the exhaust stream containing the vapor to be condensed can be circulated from one end of the U-shaped condenser-separator tube to the other, through the interstitial area, so that it can contact the outer surface of the metal tube as well as the heat exchange fins on said outer surface. In said embodiment, the coolant stream, preferentially a liquid, is circulated in the opposite direction, starting at the opposite end of the U-shaped condenser-separator tube and proceeding in counter-current fashion to the other, in order to maximize the transfer of heat from the vapor to the coolant, thereby maximizing condensation as well. Condensate may form on either "leg" of the U-shaped condenser-separator tube, and under the force of gravity will move downward toward the condensate collection area at the bottom (i.e., at the U-bend), where it can drain through the egress port and be removed from the tube.
In a preferred embodiment, during manufacturing the two-part plastic U-bend component at one end of the recuperator described in section 10.3.8 can be replaced with a similar piece having a special condensate collection area, in addition to a condensate egress port. Thus, the initial manufacturing steps of the condenser-separator tube are identical with the initial manufacturing steps of said recuperator, until the final step. At the final step, the process is essentially the same, except that an alternative component is substituted, said part being the two-part plastic U-bend component intended to fit over the U-shaped metal tube at one end of the recuperator, thus connecting the two previously unconnected plastic tubes that at this manufacturing step are fitted like sleeves over the U-shaped metal tube ends.
Said substitutable components (i.e., the two types of two-part plastic U-bend components) differ primarily in the presence of a special pre-formed condensate collection area with accompanying condensate egress port. If desired, the selection of which type of U-shaped heat exchanger tube to produce in a given manufacturing cell can be made on a just-in-time basis, as late as the final manufacturing step.
In an alternative preferred embodiment (depicted in Fig. 10.3.9-1), the as-built recuperator tube of section 10.3.8 can serve as-is as the basis for construction of a condenser-separator tube, and the existing plastic U-bend at one end of the recuperator can simply be drilled to provide one or more drain holes [3] as condensate egress ports. A separate tube [4] referred to here as the condensate channel and comprising the same type of material [2] as the recuperator, and having one or more corresponding drain holes compatible with those of the recuperator, and which therefore may provide condensate originating in the recuperator tube with ingress to the condensate channel tube, can be fastened directly to the recuperator, aligning their respective drain holes, thereby providing both a condensate egress port and a channel for transferring the condensate.

10.4 PERKS - Peak Energy Reserve, Kilowatt-Scale
See glossary description.
- FRAME Interfaces to Facility Power
- FRAME Interfaces to Facility Fuel Supplies
- FRAME Interfaces to External Thermal Exchange/Storage
External Thermal Exchange/Storage
In a preferred embodiment, thermal energy can be transferred to or from working fluids by pre-heating or pre-chilling them, respectively, at convenient points in time, and storing said fluids into their respective locations, in preparation for their subsequent use (i.e., at a later time).
In a preferred embodiment, relatively "hot" working fluid, preheated or otherwise obtained from a source of relatively low-grade heat (e.g., above 85 C) can be stored externally in an insulated tank or other low heat-loss storage means [8], and such storage can be connected to the STEER apparatus [2] directly or indirectly via a heat exchanger means, then subsequently used to directly or indirectly supply thermal energy to the FORCE apparatus [5] as a means to help generate electrical power on demand. In a preferred embodiment, an engineered fluid with a relatively low boiling point (e.g., 93 F) is used as the working fluid for temperatures up to approximately 260 F. In an alternative preferred embodiment, a non-toxic, high-grade, low-vapor-pressure, low-viscosity thermal oil (e.g., Paratherm) is used for thermal energy storage and transfer for fluid temperatures up to approximately 650 F.
In a preferred embodiment, relatively "cold" working fluid, pre-chilled or otherwise obtained from a source of at least low-grade cold (e.g., below 15 C) can be stored in an insulated tank or other low heat-gain storage means. In a preferred embodiment, an engineered phase-change working fluid is used directly for this, especially if very low temperatures are available to justify the expense (such fluids are often pumpable to below minus 100 C, but are relatively expensive). In an alternative embodiment, a more conventional (and less expensive) water/glycol solution or other suitable fluid is used. In a preferred embodiment, an engineered phase-change working fluid having a very low freezing point (e.g., below minus 100 C) is stored directly, and maintained at a temperature well below the ambient temperatures anticipated to occur subsequently.
In a preferred embodiment, opportunistic thermal sinks (e.g., cool or cold ambient air, ground loops, liquid tanks, etc.) can be used directly (e.g., as is) or indirectly (by cooling an intermediate working fluid) to subcool or otherwise "pre-cool" working fluids that can be stored in an insulated tank or other low heat-gain storage means, and such storage can be connected to the STEER apparatus [2] directly or via a heat exchanger means, then subsequently used to optimally supply thermal cooling in a low-cost, time-shifted manner. In a preferred embodiment, a combination of ground loops and underground liquid tanks serves not only to optimally supply thermal cooling (by virtue of the intentional subterranean surface area), but also serves as an important storage means.
PERKS
As depicted in the context of Fig. 10-1, the PERKS apparatus [4] directly captures excess or low-cost energy from a multiplicity of sources (e.g., opportunistically, such as when it is cheapest or most readily available) and stores it for later (i.e., time-shifted) use, such as during peak periods (e.g., when power is relatively more expensive or less available).
In a preferred embodiment, to directly capture available electrical energy, batteries based on nano-structured lithium titanate spinel oxide (LTO) electrode materials (which replace the graphite electrode materials found in negative electrodes of conventional Li-Ion batteries) are used, in order to achieve a high capacity battery array with a high cycle life (thousands of full-depth battery charge/discharge cycles), quick charge and discharge without heat issues or out-gassing, and relative insensitivity to ambient temperatures (with no safety issues, and no need to consume energy to control the ambient temperatures seen by the batteries).
In an alternative preferred embodiment, zinc-bromine-based flow batteries are used, and these can source warm working fluid and sink both warm and cool working fluid to maintain the desired electrolyte temperature range. In alternative embodiments, other battery technologies are also feasible, with commensurate enhancements or reductions in specific parameters.

In a preferred embodiment, as depicted in Fig. 10-1, the PERKS apparatus [4] typically exchanges working fluids that are relatively warm (at or below the fluid's boiling point, which may be only 93 F in a preferred embodiment) or cool (i.e., well below the fluid's boiling point, but nowhere near freezing), but these are primarily related to internal operation rather than as sources of thermal energy to be stored. In an alternate embodiment, more extreme temperature ranges (e.g., relatively hot and/or cold) may be accepted, thermally stored, and subsequently delivered, thereby providing an energy reserve in thermal form.

10.4.1 Electrical Power Conditioning and Electrical Energy Storage
Wind energy is another form of solar energy, and often tends to peak during daylight hours (diurnal cycle) due to solar heating, with dips in wind energy at night (nocturnal cycle). Many SUREFIRE sites may be intentionally located where wind energy can be taken advantage of, even if only on a small scale. Wind energy can be used to directly generate electricity for immediate use, and excess electrical energy can be used to charge a PERKS battery array (which can later provide electrical power on demand). Because wind speed is highly variable, wind turbines tend to generate variable-voltage, variable-frequency AC power ("wild AC"), which is subsequently "conditioned" (i.e., rectified to DC, and then optionally inverted back to "stable" AC, with optional voltage and/or phase changes along the way). All of these power conditioning actions generate heat that must normally be viewed as energy loss, and thus a loss of efficiency.
In the Scrutiny SUREFIRE system, however, most or all of the power conditioning apparatus (such as PERKS, described elsewhere) is colocated with (or near) the other electrical loads in the system, so that any generated (and otherwise lost) heat energy associated with power conditioning can be recaptured and used.
For example, whereas the alternators frequently found in turbomachinery and wind turbines typically provide local rectification (i.e., within the alternator itself), in a preferred embodiment remotely located SUREFIRE alternators output only "wild AC," with any necessary rectification taking place within the confines of FRAME energy recapturing apparatus. This provides three significant benefits: 1) the alternators are simpler, lighter, and potentially more reliable, 2) the wild AC typically incurs much lower distribution losses than rectified DC, which is especially important for wind turbines atop a tall tower (or otherwise located such that power distribution is a consideration), and 3) the heat energy generated by rectification can be recaptured rather than being lost.
For another example, battery charge/discharge cycles also typically generate heat, and this heat must normally be rejected from the system, while also maintaining battery temperatures within strict limits for optimum life, and ensuring that excessive discharge (e.g., more than 50% discharged, typically) doesn't occur. However, in a preferred embodiment of the Scrutiny SUREFIRE PERKS subsystem, a commercially available ZBB (zinc-bromine battery) flow battery array is colocated with the electrical loads, and the heat energy from its charge/discharge cycles is recaptured. Not only does this conserve energy, but the ZBB array becomes highly tolerant of ambient temperature swings, and by its nature, can be routinely discharged to 0% (i.e., 100% discharged) without damage. Furthermore, during periods with no charge/discharge activity, heat energy from the various other parts of the system (already described) can be used to maintain, for "free", a thermally stable environment for the ZBB array, whose electrolyte freezes at 10 C (50 F), and which operates best in the temperature range 26 C (80 F) to 32 C (90 F), with an absolute operating range of 21 C (70 F) minimum to 49 C (120 F) maximum. In a preferred embodiment, thermal stability of the temperature-sensitive battery array is maintained via thermal and sensory interfaces to the STEER apparatus, in order to recapture and redirect heat energy by sourcing or sinking it as appropriate.
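The sourcing-or-sinking decision described above can be sketched as a simple classification of electrolyte temperature against the stated ranges. The temperature limits below come from the text; the function name, the action labels, and the idea of returning an "operable" flag are hypothetical and only illustrate the logic, not the actual STEER interface.

```python
from typing import Tuple

# Temperature limits for the ZBB array, taken from the text above.
PREFERRED_MIN_C, PREFERRED_MAX_C = 26.0, 32.0   # range of best operation
ABSOLUTE_MIN_C, ABSOLUTE_MAX_C = 21.0, 49.0     # absolute operating range


def thermal_action(electrolyte_temp_c: float) -> Tuple[str, bool]:
    """Return (action, operable) for a measured electrolyte temperature.

    action   -- hypothetical label: 'source_heat', 'sink_heat', or 'hold'
    operable -- True if the temperature is within the absolute range
    """
    operable = ABSOLUTE_MIN_C <= electrolyte_temp_c <= ABSOLUTE_MAX_C
    if electrolyte_temp_c < PREFERRED_MIN_C:
        action = "source_heat"   # request relatively warm working fluid
    elif electrolyte_temp_c > PREFERRED_MAX_C:
        action = "sink_heat"     # request relatively cool working fluid
    else:
        action = "hold"          # already in the preferred band
    return action, operable


for temp_c in (15.0, 24.0, 29.0, 40.0, 55.0):
    print(temp_c, thermal_action(temp_c))
```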

The PERKS subsystem uses the power from SCRAM's multi-rail redundant AC power supplies to charge batteries when everything is fine, thereby requiring no additional capacity.

10.5 FORCE - Frictionless Organic Rankine Cycle Engine
As depicted in Fig. 10-1, the FORCE apparatus [5] is a kilowatt-scale (e.g., 0.5KW to 50KW) modified Rankine cycle heat engine that may comprise the following in some combination: electrical energy sources, fuel or chemical energy sources, thermal energy sources, low-temperature/low-pressure vapor turbines, generators or alternators, heaters, working fluids (including at least one appropriate organic working fluid for two-phase liquid/vapor operation), heat exchangers (including, for example, vaporizers, superheaters, recuperators, desuperheaters, heaters, preheaters, dehumidifiers, condensers, and subcoolers), insulation, reflectors, sensors, valves, manifolds, pumps, miscellaneous plumbing apparatus, etc.
In a preferred embodiment, the FORCE apparatus [5] may interface with the SLAM apparatus [1] depicted in Fig. 10-1 for some or all of its control and/or sensory inputs and outputs.
In a preferred embodiment, the FORCE apparatus [5] may interface with the STEER apparatus [2] depicted in Fig. 10-1 for some or all of its working fluid inputs and outputs, where said working fluids provide a controllable means for thermal energy exchange. In a preferred embodiment, the FORCE apparatus [5] may interface and integrate with the STEER apparatus [2] depicted in Fig. 10-1 for some or all of the internal connectivity (i.e., the working fluid inputs and outputs among its internal subsystems), where said working fluids provide a controllable means for thermal energy exchange. In an alternative preferred embodiment (i.e., not involving the STEER apparatus [2] depicted in Fig. 10-1), the FORCE apparatus [5] may interface directly or indirectly to the various apparatus depicted in Fig. 10-1 for some or all of its working fluid inputs and outputs, where said working fluids provide a controllable means for thermal energy exchange (not depicted).
In a preferred embodiment, the FORCE apparatus [5] may interface with the PERKS apparatus [4] depicted in Fig. 10-1 for some or all of its electrical energy inputs and outputs and/or chemical energy (e.g., fuel) inputs. In an alternative preferred embodiment, the FORCE apparatus [5] may interface with the SCRAM apparatus [7] depicted in Fig. 10-1 for some or all of its electrical energy outputs (not depicted).
In a preferred embodiment, a primary object of the FORCE apparatus [5] is to convert externally supplied electrical energy, chemical energy (e.g., one or more types of fuel), and/or thermal energy (e.g., heat contained in some type of working fluid) into electrical energy and/or thermal energy that may then be provided as an output to other subsystems. In a preferred embodiment, said electrical energy may be output directly to the PERKS apparatus [4] for subsequent further conversion, storage, and/or distribution. In a preferred embodiment, high-quality thermal energy may be provided as an output in addition to, or in lieu of, electrical energy. In a preferred embodiment, said thermal energy may be output to the STEER apparatus [2] for subsequent further transport, conversion, storage, and/or distribution.
In a preferred embodiment, the FORCE apparatus [5] depicted in Fig. 10-1 may comprise closed-loop thermodynamic circuits involving a single phase-change working fluid that may be interchanged among the various subsystems of the FRAME apparatus depicted in Fig. 10-1. In a preferred embodiment, such as when FORCE is integrated with electronics thermal stabilization applications, the working fluid may be an organic dielectric fluid with a boiling point between 20 C and 40 C, such as 1-methoxy-heptafluoropropane (C3F7OCH3). Other working fluids may also be suitable, some examples of which are listed in section 10.3.
In a preferred embodiment, the working fluid expands substantially when heated and vaporizes easily.
In a preferred embodiment, the FORCE apparatus [5] depicted in Fig. 10-1 may be augmented with additional closed-loop thermodynamic circuits involving one or more non-phase-change working fluids that may be interchanged among selected subsystems of the FRAME apparatus depicted in Fig. 10-1. In a preferred embodiment, one said non-phase-change working fluid may be a thermal oil with a low vapor pressure (e.g., less than 5 PSIA) within a full operational temperature range of, for example, 49 C (120 F) to 315 C (600 F), such as commercially available Paratherm NF heat transfer fluid (available from Paratherm Corp., 4 Portland Road, West Conshohocken PA 19428 USA).

10.5.1 FORCE Turboalternator
In a preferred embodiment, the vapor turbine and generator or alternator means may be combined into a single turboalternator or turbogenerator unit (hereafter referred to as a "turboalternator" for simplicity) - an integrated unit comprising a turbine means and a generator-or-alternator means, such that the combination shares a common direct-drive shaft (i.e., the turbine shares the same shaft with the alternator or generator, forcing them to spin together).
In a preferred embodiment, said turboalternator means may be constructed so as to have only one moving part, that being the shared shaft, such that during operation the shared shaft may rotate at an essentially constant ("fixed") rate in the range of 50,000 to 250,000 RPM, and during said rotation may float hydrodynamically on a vapor layer created by its foils, thus implementing quasi-frictionless "vapor bearings" or "gas bearings" that may need no lubrication or maintenance (wear occurs only during spin-up and spin-down, such as when the rotational speed drops below a particular physical threshold that may be, for example, equivalent to 3% of its normal fixed rate, at which point the foils of the turboalternator may begin to incur friction). In a preferred embodiment, said turboalternator may be designed and constructed, using techniques known to those skilled in the art, so as to enable on the order of 25,000 to 50,000 spin-up/spin-down cycles without maintenance, and an operating life on the order of 100,000 hours (more than 11 years of continuous operation).
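The two figures quoted above (the 3% wear threshold and the 100,000-hour operating life) can be checked with simple arithmetic. The helper names below are hypothetical, and the 50,000 RPM value is just the low end of the stated range used as an example.

```python
# Arithmetic check of the figures quoted in the text.
HOURS_PER_YEAR = 24 * 365  # 8,760 hours in a non-leap year


def years_of_continuous_operation(operating_hours: float) -> float:
    """Convert an operating life in hours to years of continuous running."""
    return operating_hours / HOURS_PER_YEAR


def wear_threshold_rpm(fixed_rpm: float, threshold_fraction: float = 0.03) -> float:
    """Speed below which the foils may begin to incur friction.

    The 3% default is the example fraction given in the text.
    """
    return fixed_rpm * threshold_fraction


print(f"{years_of_continuous_operation(100_000):.1f} years")
print(f"{wear_threshold_rpm(50_000):.0f} RPM")
```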
In a preferred embodiment, the turbine portion of said turboalternator may be of the radial inflow type, specifically designed, using techniques known to those skilled in the art, so as to work well with a selection from among the preferred superheated working fluids, preferred inlet temperatures (e.g., 120 C to 160 C), and preferred pressure ranges (e.g., 3 bar to 8 bar), and with an electrical load range appropriate for the intended purpose, such that the characterizations of both the design point and selected "off-design" points may be known, and such that the relationship between inlet pressure and output power may be sufficiently understood, so as to enable proper control.
In a preferred embodiment, said turboalternator may be designed to operate at a specific, fixed rotational rate subrange within an overall range of 50,000 to 250,000 RPM, while allowing both the differential pressure (i.e., the difference between inlet pressure and outlet pressure) and the electrical load to vary under external control, as long as the fixed rotational rate within said subrange is maintained (i.e., a control function may be required so that, for example, if the electrical load is reduced, then the differential pressure may also be reduced (e.g., by reducing the inlet pressure, as is common practice), in order to ensure that the specified fixed rotational rate range is not exceeded). In a preferred embodiment, the differential pressure may be reduced by some combination of decreasing the inlet pressure and/or increasing the outlet pressure (e.g., by creating backpressure downstream of the turboalternator outlet), thereby increasing the range of external control options available. Referring to Fig. 10-1, in a preferred embodiment said external control may be provided by a combination of the SLAM apparatus [1] and the STEER apparatus [2].
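The control function described above can be illustrated with a minimal proportional controller that trims inlet pressure to hold the shaft at its fixed rate. This is only a sketch under stated assumptions: the class name, the gain, and the pressure limits are hypothetical, and a real controller (per the text, provided by the SLAM and STEER apparatus) could also act on outlet backpressure.

```python
from dataclasses import dataclass


@dataclass
class SpeedController:
    """Hypothetical proportional controller for turboalternator speed."""
    target_rpm: float
    gain_bar_per_rpm: float = 1e-5   # assumed gain, for illustration only
    min_inlet_bar: float = 1.0       # assumed floor for inlet pressure
    max_inlet_bar: float = 8.0       # assumed ceiling for inlet pressure

    def next_inlet_pressure(self, inlet_bar: float, measured_rpm: float) -> float:
        """Return a new inlet pressure setpoint.

        If the load drops and the shaft overspeeds, the error is negative
        and the inlet pressure (hence differential pressure) is reduced.
        """
        error_rpm = self.target_rpm - measured_rpm
        proposed = inlet_bar + self.gain_bar_per_rpm * error_rpm
        return max(self.min_inlet_bar, min(self.max_inlet_bar, proposed))


controller = SpeedController(target_rpm=60_000.0)
# Load was reduced and the shaft sped up to 63,000 RPM: pressure is lowered.
print(controller.next_inlet_pressure(inlet_bar=6.0, measured_rpm=63_000.0))
```

A proportional-only law is used here purely for brevity; the source does not specify the control algorithm.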
In a preferred embodiment, the system may comprise a multiplicity of said turboalternators, having the same, similar, or widely differing capacities, so that a control function within the system of which said turboalternators are a part may dynamically reconfigure and control said turboalternators in order to meet the needs of said system at a point in time. Referring to Fig. 10-1, in a preferred embodiment said control function may be provided by a combination of the SLAM apparatus [1] and the STEER apparatus [2].
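One simple way to picture the dynamic reconfiguration described above is a largest-first selection of units to cover a power demand. The greedy strategy, the function name, and the example capacities are assumptions for illustration; the source does not describe the selection algorithm.

```python
from typing import List, Tuple


def select_units(demand_kw: float, capacities_kw: List[float]) -> Tuple[List[float], float]:
    """Greedily pick turboalternator capacities, largest first.

    Returns (selected capacities, unmet demand in kW). A unit is selected
    only if its full capacity fits within the remaining demand, so some
    demand may remain unmet; a fuller controller might partially load
    one additional unit to cover it.
    """
    selected: List[float] = []
    remaining = demand_kw
    for capacity in sorted(capacities_kw, reverse=True):
        if remaining <= 0.0:
            break
        if capacity <= remaining + 1e-9:   # small tolerance for rounding
            selected.append(capacity)
            remaining -= capacity
    return selected, max(remaining, 0.0)


# Example pool: one 10 KW unit, two 1.5 KW units, two 0.5 KW units.
units, unmet = select_units(12.0, [10.0, 1.5, 1.5, 0.5, 0.5])
print(units, unmet)
```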
In a preferred embodiment, a family of turboalternators (hereafter "turboalternator family") may be designed and constructed to accept a variable set of working fluids with a range of compatible properties. In a preferred embodiment, said turboalternator family may be optimally designed for organic working fluids, and most preferentially an organic dielectric fluid with a relatively low boiling point between, for example, 20 C and 40 C, such as 1-methoxy-heptafluoropropane (C3F7OCH3). Other suitable working fluids may include, for example, C5F12, C6F14, C4F9OCH3, C4F9CH3, C4F9OC2H5, C4F9C5H5, and CCl2FCH3, as well as others, and may also include combinations of said fluids, some of which may not be organic dielectric fluids having boiling points within the exemplary range.

In an alternative embodiment, said turboalternator family members may be constructed at low cost by using off-the-shelf radial turbine and bearings means (for example, by repurposing a suitable mass-produced refrigerator compressor turbine and operating it in reverse such that it effectively becomes a radial inflow turbine). In said alternative embodiment, the availability of suitable components may severely constrain the turboalternator design and construction of said family members, especially with respect to performance, efficiency, cycle life, and runtime.

In a preferred embodiment, said turboalternator family comprises a set of relatively miniature "nanoturbine" turboalternator units at the 10 KW, 1.5 KW, and 500-watt electrical output design points (many other design points are possible, including both much larger and much smaller power outputs, and may be selected according to intended use), operating with a common design-point inlet temperature of 125 C, an outlet temperature exceeding 100 C, an inlet pressure of approximately 6 bar (87 PSIA), and an outlet pressure of approximately 1 bar (14.5 PSIA), with overall efficiency (i.e., that being the product of adiabatic efficiency, mechanical efficiency, and alternator or generator efficiency) typically greater than 60%. The mass flow rates, outlet temperatures, rotational speeds, and relative outputs at off-design points (e.g., at 5 bar, 4 bar, etc.) may vary widely among the family members, due to their inherent differences in electrical output and anticipated loads; Table 10.5.1-1 provides an approximate cross-family characterization that may be achieved for said preferred embodiment, at said design point.

Table 10.5.1-1 FORCE Turboalternator Family Differences for a Preferred Embodiment
Approx. Output at 6 bar:   10 KW      1.5 KW     0.5 KW
Approximate Mass flow:     600 g/s    100 g/s    35 g/s
Approximate RPM:           >60,000    >60,000    100,000

In a preferred embodiment, said turboalternator families may output alternating current (AC) rather than rectifying it to direct current (DC), thereby avoiding rectification when AC is an acceptable output (and also avoiding, in this case, an unnecessary, energy-wasting conversion), or enabling rectification to occur remotely or at some distance from the turboalternator (for example, in order to simplify recapturing of waste heat energy from rectification). In a preferred embodiment, said turboalternator families output single-phase "wild AC," or alternating current (AC) voltage where neither the voltage nor the current is constant; but for a given AC frequency, both the voltage and current may vary from one power output level to another. For example, as the power output increases, the current may increase while the voltage decreases, as is depicted in Table 10.5.1-2 for one of the members of said turboalternator family. In said embodiment, the output frequency (e.g., 1000 Hz) may be a direct consequence of, and therefore as stable as (i.e., as constant as) the turboalternator's actual rotational rate (e.g., 50,000 RPM).
In an alternative preferred embodiment, an AC-to-DC rectification circuit may be co-located with, or attached to, the turboalternator.
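As a rough illustration of why the output frequency tracks shaft speed, the sketch below assumes a synchronous alternator whose electrical frequency equals the number of pole pairs times the mechanical revolutions per second. The pole-pair count and the function names are assumptions made for illustration; the specification does not state the alternator's pole count, and its 1000 Hz and 50,000 RPM figures are offered only as separate examples.

```python
def electrical_frequency_hz(rpm: float, pole_pairs: int = 1) -> float:
    """Electrical frequency of a synchronous alternator (hypothetical helper)."""
    return pole_pairs * rpm / 60.0


def rpm_for_frequency(freq_hz: float, pole_pairs: int = 1) -> float:
    """Shaft speed needed for a given electrical frequency (hypothetical helper)."""
    return 60.0 * freq_hz / pole_pairs


if __name__ == "__main__":
    # With one pole pair (an assumption), 60,000 RPM corresponds to 1000 Hz.
    print(electrical_frequency_hz(60_000))
    print(rpm_for_frequency(1000.0))
```

Because the relation is linear, any drift in rotational rate shows up as a proportional drift in output frequency, which is the sense in which the frequency is only as stable as the shaft speed.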
Table 10.5.1-2 FORCE Turboalternator Family - Examples of "Wild AC" Electrical Outputs
                           Power    Voltage    Current
At Design Point:           100%     84%        100%
At Off-Design Point #1:    92%      85%        91%
At Off-Design Point #2:    58%      89%        55%
At Off-Design Point #3:    30%      92%        27%
At Off-Design Point #4:    0%       100%       0%
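The figures in Table 10.5.1-2 can be cross-checked with a short sketch. It assumes that power scales with the product of voltage and current (i.e., a constant power factor), that voltage is normalized to its no-load value, and that power and current are normalized to the design point. These normalizations are an interpretation of the table, not something the specification states, and the names below are hypothetical.

```python
# Rows of Table 10.5.1-2 as (power %, voltage %, current %).
WILD_AC_POINTS = {
    "design": (100, 84, 100),
    "off-design 1": (92, 85, 91),
    "off-design 2": (58, 89, 55),
    "off-design 3": (30, 92, 27),
    "off-design 4": (0, 100, 0),
}


def implied_power_pct(voltage_pct, current_pct, design=WILD_AC_POINTS["design"]):
    """Power implied by V * I, relative to the design point (assumed model)."""
    _, v_design, i_design = design
    return 100.0 * (voltage_pct * current_pct) / (v_design * i_design)


def max_table_deviation():
    """Largest gap, in percentage points, between tabulated and implied power."""
    return max(
        abs(implied_power_pct(v, i) - p) for p, v, i in WILD_AC_POINTS.values()
    )


if __name__ == "__main__":
    for name, (p, v, i) in WILD_AC_POINTS.items():
        print(name, p, round(implied_power_pct(v, i), 1))
```

Under these assumptions every tabulated power value agrees with the voltage-current product to within half a percentage point.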

We use the "nanoturbine" label to denote a class of turbine machinery that is quantitatively smaller than the turbine machinery commonly referred to as "microturbines," rather than to denote the class of still smaller turbine machines that one might fabricate by integrating mechanical elements, sensors, actuators, and electronics on a common silicon substrate, using microfabrication technology commonly referred to as MEMS (Micro-Electro-Mechanical Systems). To maintain a monotonically decreasing machine class nomenclature, a MEMS-based machine would most probably be referred to as a "picoturbine," which is the label used herein.

In an alternative preferred embodiment, a turboalternator family may be constructed (e.g., by using known MEMS microfabrication techniques in conjunction with known turbomachinery design principles) that comprises a set of quantitatively smaller "picoturbine" turboalternator units at the 10-watt, 1.5-watt, and 500-milliwatt electrical output design points (many other design points are possible, and may be selected according to intended use), operating with a common design-point inlet temperature of 125 C, outlet temperature exceeding 100 C, inlet pressure of up to 8 bar (116 PSIA), and outlet pressure of 1 bar (14.5 PSIA).

10.5.2 FORCE Post-Turboalternator Recuperator
The FORCE Post-Turboalternator Recuperator (hereafter, simply "recuperator" when unambiguous) is a counter-current heat exchanger whose purpose is to recuperate the thermal energy remaining in the still-superheated working fluid after an upstream turboalternator has reduced the pressure of said fluid. The temperature of the fluid at the recuperator's inlet is effectively equal to the temperature of the fluid at the turboalternator's outlet.
In the art, the turbine component of a turboalternator is modeled as an adiabatic process (i.e., one in which no heat is gained or lost by the system) that accomplishes work by converting a pressure differential into mechanical energy (which the alternator/generator component subsequently converts into electrical energy).
However, in the real world, "adiabatic" process efficiency is less than 100%, so some heat may be lost, and this partly accounts for a difference in the working fluid temperatures between the turbine's inlet and outlet, with the outlet seeing a reduced temperature. In a preferred embodiment, the turboalternator family described in 10.5.1 may have a common inlet temperature of approximately 125 C
(257 F) and an outlet temperature of approximately 100 C (212 F) or more, which may subsequently be the lowest temperature at the recuperator's inlet.
In a preferred embodiment, the recuperator accepts a superheated working fluid in approximately the 100 C
(212 F) to 125 C (257 F) temperature range as the "hot stream" to be cooled, and accepts working fluid in liquid and/or saturated vapor form in approximately the 20 C (68 F) to 40 C
(104 F) temperature range as the "cold stream" to be heated (an example is depicted in Fig. 10.5.2-1).
In a preferred embodiment, such as for electronics thermal stabilization applications, the working fluid may be an organic dielectric fluid with a boiling point between 20 C and 40 C, such as 1-methoxy-heptafluoropropane (C3F7OCH3). Other working fluids may also be suitable, some examples of which are listed in section 10.3. In a preferred embodiment, the working fluid expands substantially when heated and vaporizes easily.
In a preferred embodiment, said hot and cold streams use the same type of working fluid and are intentionally part of the same circuit, thus enabling the convenient mixing of streams to accomplish specific objectives. In an alternative preferred embodiment, said hot and cold streams use the same type of working fluid, but are not part of the same circuit (i.e., the hot and cold streams cannot mix under normal circumstances).
In a preferred embodiment, the RUBE Recuperator Assembly as described in section 10.3.7, can be used to implement the FORCE Post-Turboalternator Recuperator. In an alternative preferred embodiment, a flat-plate heat exchanger with adequate surface area and not excessive pressure drop may be used. In an alternative embodiment, some other fluid-compatible heat exchanger with adequate surface area and not excessive pressure drop may be used.

FORCE
The FORCE apparatus [5] is a kilowatt-scale (e.g., 0.5KW to 50KW) turboalternator means (e.g., one or more differential pressure "heat engines" with mechanically coupled alternators or generators) and heat source means - designed and configured such that it can accept relatively low-temperature (e.g., 95 C to 130 C), low-pressure (e.g., 3-8 bar) working fluid (e.g., in a preferred embodiment, from the STEER apparatus [2]) in order to generate electrical power (in a preferred embodiment, the output is "wild AC"). In a preferred embodiment, the slightly cooled (e.g., by 20 C to 30 C), pressure-reduced (e.g., to 1 bar) working fluid is returned to the STEER apparatus [2], where it can be mixed and/or redistributed where needed in order to efficiently recuperate its residual heat energy. In an alternative embodiment, the slightly cooled working fluid is routed through a heat exchanger means in order to reject unwanted heat energy to another system, or to the ambient environment, etc.
We use the "picoturbine" label to denote a class of turbine machinery that is quantitatively smaller than the turbine machinery we refer to as "nanoturbines."
In a preferred embodiment involving the FORCE nanoturbine system, a multiplicity of nanoturbines may be used when the total heat energy available exceeds the capacity of a single unit, and/or to achieve a specific level of redundancy (e.g., in order to achieve an availability threshold).
In an alternative preferred embodiment, the FORCE nanoturbine system indicated above can be combined (e.g., in a fixed configuration, or dynamically via the STEER apparatus) with one or more FPSE (Free Piston Stirling Engine) devices, such that waste heat still present in the nanoturbine outlet stream is captured and used to heat one or more FPSE devices (which convert the heat to mechanical energy and can therefore do useful work, including the generation of electricity). In yet another alternative embodiment, one or more FPSE devices take the place of single or multiple FORCE nanoturbine devices as described above.

10.5.3 FORCE Catalytic Vaporizer
The FORCE Catalytic Vaporizer is an energy conversion apparatus, comprising an infrared radiation collector and multi-stage heat exchanger means, whose goal is to extract the radiant energy from a radiant energy source and transfer the heat energy to a stream of working fluid. FORCE has two types of internal heaters:
- Diffused Air Source Catalytic Heater for gaseous fuels (e.g., NG, LPG, butane)
- Pre-Mixed Air-Fuel Catalytic Heater for liquid fuels (e.g., methanol, ethanol, mixtures)
A peek at the energy conversion apparatus:

10.5.4 FORCE External Thermal Energy Vaporizer
- Uses hi-temp thermal oil as working fluid.
- Use flat-plate heat exchanger, or RUBE Recuperator Assembly with thermal oil in inner tube, HFE-7000 in shell.
An energy conversion apparatus, comprising a radiant energy source (i.e., an infrared radiation emitter means), whose goal is to convert fuel-based latent energy into radiant energy that can be subsequently harnessed (e.g., directed to a collector means and subsequently utilized).

10.5.5 FORCE Exhaust Dehumidifier
- Object is to avoid heat loss.
- Object is to avoid injecting H2O into the room air.
- Use hot exhaust air as hot fluid.
- Use HFE-7000 and/or water as coolant fluids.
- Use RUBE recuperator in inverted orientation (i.e., actual U-shape) with condensate collection at the bottom (i.e., where the bend is).
- Route desuperheated exhaust through inner tube.
- Route HFE-7000 and/or water through shell.
An energy conversion apparatus, comprising a radiant energy source (i.e., an infrared radiation emitter means), whose goal is to convert fuel-based latent energy into radiant energy that can be subsequently harnessed (e.g., directed to a collector means and subsequently utilized).

10.6 SOLAR - Self-Orienting Light-Aggregating Receiver
SOLAR(TM). Self-Orienting Light-Aggregating Receiver. In a preferred embodiment, a system using a relatively low-temperature phase-change working fluid to receive heat energy from the sun for immediate use (in which case it acts as a "boiler") or subsequent use, and especially for the primary purpose of generating electricity. In an alternative embodiment, a system using a relatively low-vapor-pressure working fluid (for example, an appropriate Paratherm thermal oil) to receive heat energy from the sun for immediate or subsequent use. The heat energy in this context refers to energy that can be used immediately (or stored for later use) to effect or help effect a liquid/vapor phase-change, such as occurs, by design, in a "boiler." Received energy heats and expands the phase-change working fluid (which may have been preheated via RUBE, above), which, in conjunction with optional vapor injection (see RUBE Vapor Injector, described elsewhere) in the "boiler" feed circuit, and in conjunction with a FORCE nanoturbine or FPSE (Free Piston Stirling Engine) in the "boiler" output circuit, can be used to accomplish work, and particularly, to generate electricity.
SOLAR
The SOLAR apparatus [6] comprises some combination of means for tracking and/or concentrating solar energy and directing it to a receiver means where it is collected and converted to thermal energy and transferred to a working fluid. In a preferred embodiment, the SOLAR apparatus [6] may also comprise a STEER apparatus [2] interface for accepting and delivering working fluid to one or more companion subsystems (e.g., the RUBE
apparatus [3], or FORCE apparatus [5], etc.).
In an alternate preferred embodiment, a means such as the FORCE apparatus [5], or a subset of it, may be co-located with (and possibly connected directly to) the SOLAR apparatus [6]
in order to directly generate electrical power without the potential thermal energy losses associated with transporting working fluid. In another preferred embodiment, the SOLAR apparatus [6] may comprise some combination of concentrating and non-concentrating photo-voltaic (PV) means for generating electrical power, accompanied by a means for recuperating thermal energy from said PV means (and while also beneficially reducing the operating temperature of the PV means). In a preferred embodiment, the means for recuperating thermal energy from said PV means may be provided by incorporating a RUBE recuperator assembly [3]
into the SOLAR
apparatus [6]. In still another preferred embodiment, one or more, and possibly all, of the aforementioned energy capture and power generation means may be present, allowing for dynamic reconfiguration and repurposing of the energy collection means and maximum flexibility in the generation and distribution of power from said energy. In a preferred embodiment, the SOLAR apparatus [6] may also comprise a thermal energy dissipation device or apparatus (e.g., a radiator or heat exchanger) for dissipating thermal energy (e.g., through radiation, convection, and/or conduction) to the environment.
In a preferred embodiment, said thermal dissipation device or apparatus may take advantage of exposed metallic surfaces associated with the SOLAR apparatus [6]. In a preferred embodiment, said exposed metallic surfaces may include multi-use surfaces capable of providing an optically reflective and/or light-concentrating surface on one side, and dissipative and/or physically protective surface on the other side. In a preferred embodiment, the SOLAR
apparatus [6] may comprise a means for reorienting said multi-use surfaces such that at least one configuration may protect the solar energy receiver while enabling simultaneous thermal energy dissipation.
In a preferred embodiment, the SOLAR apparatus [6] comprises a means for reorienting said multi-use surfaces such that one configuration may track the sun while reflecting and concentrating solar energy onto a receiver means, while also still enabling thermal energy dissipation.

10.6.1 SOLAR Parabolic Dish for Concentrating Solar Power - Back-of-the-Envelope Calculations
In a preferred embodiment, each SHADOWS site - and to a lesser extent, any arbitrary datacenter - is equipped with one or more SHADOWS SOLAR concentrating solar power systems, in order to increase survivability, and to decrease dependence on fuel reserves during off-grid operation.
The sun occupies 32 minutes of arc (i.e., approximately 0.53 degrees) and is not a point source. The sun's 0.53 degree source is a good match for a 2.4 meter (8-foot) parabolic dish having an aperture area of 4.5 m^2 (i.e., pi * (1.2 m)^2 = 4.5 m^2, approximately) and a 3 dB beamwidth of 0.71 degrees (C-band).
If such a dish is covered with (or composed of) a highly reflective material (in the optical sense) and used as a parabolic solar reflector, this corresponds to a theoretical maximum of about 4.5 KW of collectible solar energy (given a solar insolation of 1000 W/m^2 and a tracking system to keep the dish "on sun"). The 2.4 m dish is the largest low-cost one-piece dish available. Larger dishes are disproportionately more expensive, both in CAPEX (capital expense) and OPEX (operational expense), and require significant labor and logistics just to handle them. (Of course, a multiplicity of smaller, low-cost dishes could also be used, such as the 24-inch diameter polished aluminum dishes from Edmunds Scientific).
Normally the aperture opening where the feedhorn would be (if the parabolic dish were used as an antenna) is around 1 inch (2.54 cm) in diameter for a 2.4 m dish, corresponding to a focal area of about 5.08 cm^2 (0.000508 m^2). This corresponds to a solar concentration ratio of almost 9,000 suns ((4.5 - 0.000508) / 0.000508 = 8,857). The maximum solar concentration feasible with today's most advanced concentrator PV
cells is about 1000 suns, so we would need to defocus to a concentration of "only" 1000 suns. In a preferred embodiment, defocusing would be achieved partly (probably mostly) by moving the receiver away from the focal point (and away from the dish, for our purposes), and in an alternative embodiment, partly by using a highly reflective (97.4%) material that is slightly diffuse (e.g., spacesuit material, which is reflective but does not have a mirror finish).
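The aperture, focal-spot, and concentration figures quoted above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only; the function names are hypothetical, and the concentration ratio uses the same (aperture - spot) / spot form and the same rounded inputs as the text.

```python
import math


def disk_area_m2(diameter_m: float) -> float:
    """Area of a circle of the given diameter, in square meters."""
    return math.pi * (diameter_m / 2.0) ** 2


def concentration_ratio(aperture_m2: float, spot_m2: float) -> float:
    """Concentration in 'suns', using the text's (aperture - spot) / spot form."""
    return (aperture_m2 - spot_m2) / spot_m2


if __name__ == "__main__":
    aperture = disk_area_m2(2.4)      # 2.4 m dish
    spot = disk_area_m2(0.0254)       # 1-inch focal opening
    print(round(aperture, 2), "m^2 aperture")
    print(round(spot * 1e4, 2), "cm^2 focal spot")
    # Rounded inputs as quoted in the text.
    print(round(concentration_ratio(4.5, 0.000508)), "suns")
```

The exact circle areas differ slightly from the rounded values in the text (4.5 m^2 and 5.08 cm^2), which is in keeping with a back-of-the-envelope estimate.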
In another alternative preferred embodiment, a Cassegrain reflector apparatus is used, where each parabolic reflector is accompanied by a hyperbolic subreflector at the focus point, which reflects back through an opening in the center of the parabolic reflector, and directs light onto the concentrator PV cells and other apparatus (cooling, etc.) situated behind the parabolic reflector.
Depending on the materials used for the hyperbolic subreflector, and the precision and quality of finish, it could be sized to approximate the natural focal area of the parabolic dish (in which case it would need to efficiently reflect a solar flux of almost 9,000 suns), or in a preferred embodiment would be sized (and shaped) to be slightly defocused, so as to reflect a solar flux approximately equal to, or slightly greater than, the maximum solar flux to be handled by the concentrator PV cells (allowing for "oversplash," which is discussed further below), after taking into account the hyperbolic subreflector efficiency. In a preferred embodiment, the hyperbolic subreflector is highly polished, but mass-produced, and would likely become extremely hot (i.e., beyond the melting point of the subreflector) without active cooling -- therefore, in such an embodiment, the subreflector would be treated as an intended receiver of heat energy, and active cooling would be achieved by transferring the heat energy to an intermediate heat transfer fluid (described further below).
Assuming that we defocus to 1000 suns, the defocused "hot spot" area must be approximately equal to S (m^2), where:
(4.5 - S) / S = 1000
=> 4.5 - S = (1000 * S) m^2
=> 4.5 = (1000 * S) + S m^2
=> 4.5 = 1001 * S m^2
=> S = 4.5 / 1001 m^2
=> S = 0.004496 m^2
=> S ~ 0.0045 m^2
=> S ~ 45 cm^2 (which corresponds to a circular area approximately 3 inches in diameter)
At a solar insolation of 1000 W/m^2 (or about 100 W/cm^2 once concentrated to 1000 suns), Spectrolab's Improved Triple-Junction (ITJ) PV cells, for example, are estimated to achieve an efficiency of at least 28%, or 28 W/cm^2. Ignoring reflector efficiency for now, with a cell area of 45 cm^2 this corresponds to 1260 W of direct PV-generated power (28 * 45 = 1260). Most of the remaining solar energy (100% - 28% leaves about 72%) is either absorbed by the PV cells or reflected from the PV cells and absorbed by the enclosing receiver's "black body" interior. Either way, this leaves about 3240W of collected heat energy (4500W - 1260W = 3240W) that must be removed from the receiver via a heat transfer fluid. Spectrolab's ITJ PV cells can operate at 1000 suns only if they're on a ceramic substrate that is actively cooled to around 105 C (221 F) or less, which can be trivially accomplished via liquid cooling and rejecting the waste heat to the air or into a ground loop, or to some other heat sink. Note: Some concentrating PV cells are maximally efficient at other intensities (e.g., 500 suns), and the effect of reducing intensity is to allow further defocusing, which creates a larger area (possibly allowing more cells), while also possibly reducing their cooling requirements.
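The defocusing and power-split arithmetic can be sketched as follows. This is only an illustration of the estimate above: the function names are hypothetical, the 4.5 m^2 aperture, 1000-sun target, 4500 W of collected energy, and 28% cell efficiency are the text's figures, and reflector losses are ignored as they are in the text.

```python
import math


def hot_spot_area_m2(aperture_m2: float, suns: float) -> float:
    """Solve (aperture - S) / S = suns for the hot-spot area S."""
    return aperture_m2 / (suns + 1.0)


def equivalent_diameter_in(area_m2: float) -> float:
    """Diameter, in inches, of a circle with the given area."""
    return 2.0 * math.sqrt(area_m2 / math.pi) / 0.0254


def split_pv_and_heat_w(collected_w: float, pv_efficiency: float):
    """Split collected power into PV electricity and heat to be removed."""
    pv_w = collected_w * pv_efficiency
    return pv_w, collected_w - pv_w


if __name__ == "__main__":
    s = hot_spot_area_m2(4.5, 1000.0)
    print(round(s, 6), "m^2 hot spot")
    print(round(equivalent_diameter_in(0.0045), 2), "inch diameter")
    print(split_pv_and_heat_w(4500.0, 0.28))
```

Changing the `suns` argument (e.g., to 500) shows directly how a lower target intensity enlarges the hot spot, which is the trade-off noted for cells that are most efficient at other intensities.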
It is typically not possible to direct concentrated light onto "only" the PV cells -- and any such "oversplashed" light causes local heating wherever it touches, so in a preferred embodiment, a high-quality black body is used as a secondary receiver of heat energy, in order to capture it for downstream use.
In a preferred embodiment, however, the waste heat from the PV system above (and, in the case of an embodiment having an actively cooled Cassegrain subreflector, any heat the subreflector produces via absorption and transfer to a working fluid) is used to directly or indirectly drive our FORCE nanoturbine (FORCE stands for Frictionless Organic Rankine Cycle Engine), as part of an integration of our supercomputing system or other heat-producing subsystem(s) with our local power generation capability.
A preferred embodiment comprises a combination of said PV system with a FORCE low-temperature closed-loop nanoturbine system, along with a supercomputing or other heat-producing subsystem(s) whose waste heat serves to preheat and partly or completely vaporize the phase-change working fluid. In a preferred embodiment, the working fluid may be an organic dielectric fluid with a boiling point between 20 C and 40 C, such as 1-methoxy-heptafluoropropane (C3F7OCH3). Other working fluids may also be suitable, some examples of which are listed in section 10.3. In a preferred embodiment, the working fluid expands substantially when heated and vaporizes easily.
The hotter the various component parts of the system can be allowed to get, the higher the pressure at which the thermal system can operate, up to the maximum desired pressure (in a preferred embodiment based on the FORCE nanoturbine, the maximum desired pressure is that which is required as the turbine's inlet pressure, which is only 3 to 6 bar) and/or target temperature of the heat-producing devices, or the useful upper limit of the working fluid, whichever is lower. In a preferred embodiment, one or more sets of RUBE
"manifolds" (described elsewhere) in the supercomputer or other heat-producing subsystem(s) preferably operate in the 30 C to 40 C range for a particular class of heat-producing electronic chips (in order to maximize reliability and minimize the power dissipation that occurs due to "leaky" transistors as such chips approach their maximum TDP temperatures, the avoidance of which requires maintaining Tcase well below that), while another set operates simultaneously (and safely) in the 90 C to 110 C range for a different class of heat-producing electronic chips (the PV devices described fall into the latter range, along with certain power supply components and other devices). Any waste heat captured from the "oversplash" of concentrated light is expected to be below 130 C if the preferred phase-change fluid is used, and possibly well above that if an intermediate heat transport fluid is used. In a preferred embodiment, the same working fluid is used for all three temperature ranges - in fact, the cooler system can "feed" the hotter systems (however, this would typically require a boost in inlet pressure for the downstream "hotter" systems, which can be accomplished externally via pumps, or via the RUBE Vapor Injector (described elsewhere)).
In a preferred embodiment of our low-temperature closed-loop FORCE nanoturbine system, we can either use the phase-change working fluid directly to cool the PV cells to 105 C or less, or we can use an intermediate heat transfer/transport fluid with a very low vapor pressure (for example, a thermal oil such as Paratherm) to transport the heat to a point where it can be transferred to the phase-change fluid via a heat exchanger. The choice of fluids (and whether to use an intermediate heat transport fluid) is largely driven by the proximity of the FORCE nanoturbine to the PV cells. In a preferred embodiment, the PV cells, FORCE
nanoturbine, and power-consuming/heat-producing subsystem(s) are all colocated, and no intermediate heat transport fluid is necessary for heating. In an alternative embodiment, an intermediate fluid is used to actively cool the PV cells to 105 C (221 F) or less while heating the intermediate heat transport fluid to around 93 C (200 F), and then using it to preheat and/or "boil" the phase-change working fluid for the nanoturbine system. The intermediate heat transport fluid may be used immediately and/or stored thermally for later use, depending on the volume available, the operational status of the FORCE nanoturbine system, immediate electrical power demand, etc. Normally, however, the intermediate heat transport fluid is supplied directly to the FORCE nanoturbine (which, in a preferred embodiment is colocated with the SOLAR
collector apparatus), thereby eliminating the energy loss that would occur if the nanoturbine were located further away.
Depending on the temperature of the "cold source" (e.g., cooling water or other intermediate heat transport fluid, which could range, for example, from 24 C (55 F) down to -18 C (0 F), or lower, according to climate, season, etc.), the Carnot efficiency of the nanoturbine ranges from a typical 24% to an infrequent 34% or more. Thus, ignoring SOLAR collector efficiency for now, and also ignoring energy recapture elsewhere in the system, the typical nanoturbine power available from a single 2.4-meter solar reflector dish would range from ~775W (24% of 3240W) to ~1100W (34% of 3240W). However, for each solar reflector that generates heat from the sun there is also an electrical load that generates heat, and in a preferred embodiment this heat is collected and used to preheat the working fluid supplied to the PV cells. As a consequence, at least 2500W per reflector must be added to the energy provided to the nanoturbine system, raising the pre-Carnot figures from 3240W to 5740W (3240 + 2500 = 5740). Recalculating the theoretical nanoturbine output based solely on Carnot efficiency yields a range of ~1375W (24% of 5740W) to ~1950W (34% of 5740W).
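The Carnot-limited output range can be sketched numerically. The 24% and 34% efficiencies and the 3240 W and 5740 W heat inputs are the text's figures; the function names are hypothetical. The `carnot_efficiency` helper is included only to show the standard relation 1 - Tcold/Thot in kelvin, and the 105 C hot-side value used in the example is an assumption borrowed from the PV cooling limit, since the text does not state which hot-side temperature underlies its efficiency range.

```python
def carnot_efficiency(t_hot_c: float, t_cold_c: float) -> float:
    """Ideal Carnot efficiency for hot and cold temperatures given in Celsius."""
    t_hot_k = t_hot_c + 273.15
    t_cold_k = t_cold_c + 273.15
    return 1.0 - t_cold_k / t_hot_k


def turbine_output_range_w(heat_w: float, eff_low: float = 0.24,
                           eff_high: float = 0.34):
    """Output range for a given heat input, using the text's efficiency bounds."""
    return heat_w * eff_low, heat_w * eff_high


if __name__ == "__main__":
    print(turbine_output_range_w(3240.0))   # solar heat only
    print(turbine_output_range_w(5740.0))   # solar heat plus 2500 W load heat
    # Assumed 105 C hot side with a 55 F (about 12.8 C) cold source.
    print(round(carnot_efficiency(105.0, 12.8), 3))
```

The exact products (777.6 W, 1101.6 W, 1377.6 W, 1951.6 W) are slightly above the rounded values quoted in the text.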

10.6.2 Non-Concentrating Solar Power Considerations
Operating principles similar to those described in section 10.6.1 also hold for simpler, non-concentrating PV arrays, and to a lesser extent, for non-PV arrays. To the extent such systems are not self-orienting (i.e., sun-tracking), they typically must be oriented toward the equator and angled to match the latitude, plus or minus a few degrees. From a SHADOWS viewpoint, two key considerations are that:
1) Due to their non-concentrating nature, and therefore lower efficiency per square foot of collector area, such arrays tend to be much larger than their concentrating (e.g., SOLAR) counterparts, and therefore may require more sun-facing area than is readily available (or affordable).
2) Due to their size (relative to a SOLAR/BLOOMER dish array, for example) such arrays may not be able to be easily protected from weather and other external risks.
Heat absorption can be an issue for non-concentrating PV arrays, since they tend to operate more efficiently at cooler temperatures, and in any case, the absorption of heat implies wasted energy. Depending on the specifics of a particular non-concentrating PV array, and the importance of collecting energy relative to the cost of doing so, active cooling based on FRAME/RUBE technology may be very appropriate.
Non-PV solar arrays are designed specifically to collect heat energy, and thus fit quite naturally with FRAME/RUBE technology. Depending on the availability of adequate space for a non-PV solar array, integration with FRAME/RUBE technology may be very appropriate.

10.6.3 SOLAR Parabolic Dish for Concentrating Solar Power - Candidate Phase-Change Working Fluids
PROPERTIES: C3F7OCH3; n-Pentane; Fluorinert FC-72; Distilled Water
Boiling Point: 34 C; 36 C; 50-60 (56) C; 100 C
Pour Point: -122.5 C; -130 C; -90 C; 0 C
Kinematic Viscosity (centistokes): 0.32 @ 25 C; 0.42 @ 20 C
Absolute Viscosity (centipoise): 0.47; 0.64
Vapor Pressure (PSI): 7.8 @ 20 C, 9.3757 @ 25 C; 8.28 @ 20 C; 4.5 @ 20 C, 4.5 @ 25 C; 0.34 @ 20 C
Vapor Pressure (Pa): 64.6x10^3; 57.1x10^3; 30.9x10^3
Density (kg/m^3): 1400; 641; 1680; 998 @ 20 C
Specific Gravity: 1.6; 0.626; 1.7; 1
Molecular Weight (g/mol): 200; 72.2; 25 C; 18.0
Coefficient of Volume Expansion (per C): 0.00222; 0.00158; 0.0016; 0.00021
Specific Heat (J/(kg C)): 1300; 1668; 1100; 4182
Heat of Vaporization at Boiling Point (J/g): 142; 88; 2260
Global Warming Potential (GWP): 370; 3; 0
Ozone Depletion Potential (ODP): 0; 0; 0
Dielectric Strength (kV, 0.1" gap): ~40; 38
Dielectric Constant (1 KHz): 7.4; 1.8-1.84; 1.75; 80.4 @ 20 C, 55.3 @ 100 C (other entries: 25 C; 25 C)
Volume Resistivity (ohm cm): 10^8; 10^15; 10^4-10'
Flash Point: N/A; -49 C; N/A; N/A
Autoignition Temperature: 415 C; 309 C; N/A; N/A
Flammability: Nonflammable per; Highly Flammable; Nonflammable per; Nonflammable
Toxicology, Environmental Hazards: Mild eye irritant; Harmful by inhalation, ingestion or skin absorption. Irritant. Narcotic in high concentration; No immediate health, physical, or environmental hazards known. Not regulated by EPA; Gastrointestinal irritant. Prolonged or repeated exposure may cause liver effects. Not regulated by EPA; Hazardous decomposition at elevated temperature (over 200 C); Hazardous decomposition at elevated temperature (over 200 C).

Flash point is an indication of the combustibility of the vapors of a substance, and is defined as the lowest temperature at which the vapor can be ignited under specified conditions. Flash point is clearly related to safety.

10.6.4 FORCE Nanoturbine Considerations
Assuming maximum insolation, and ignoring reflector inefficiency (and other system inefficiencies) for now, the maximum total power available from a single solar reflector, including recapture of energy from its corresponding load, is in the range of ~2635W (1260 + 1375) to ~3210W (1260 + 1950). Thus, during periods of high insolation, a single reflector provides slightly more power than its corresponding 2500W load, with a system efficiency in the range of 58% (2635/4500) to 71% (3210/4500).
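The per-reflector totals can be tallied with a short sketch. The PV output, nanoturbine range, 4500 W insolation, and 2500 W load are the text's figures; the function name and dictionary keys are hypothetical. As in the text, efficiency is expressed relative to the 4500 W of collected solar energy only, even though the turbine input also includes recaptured load heat.

```python
def system_summary(pv_w: float, turbine_w: float,
                   insolation_w: float = 4500.0, load_w: float = 2500.0):
    """Total output, efficiency versus insolation, and surplus over the load."""
    total_w = pv_w + turbine_w
    return {
        "total_w": total_w,
        "efficiency": total_w / insolation_w,
        "surplus_w": total_w - load_w,
    }


if __name__ == "__main__":
    for turbine_w in (1375.0, 1950.0):
        print(system_summary(1260.0, turbine_w))
```

The surplus over the 2500 W load works out to roughly 135 W to 710 W per reflector under these figures, which illustrates how little capacity a single dish leaves for storage.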
Excess electrical energy is used to charge a commercially available battery array (for example, a ZBB
array), which can then provide power on demand -- both peak-shaving and backup power when it cannot be otherwise generated.
Energy storage, whether thermal or electrical, is an important part of the SUREFIRE concept. However, even during periods of high insolation, a single solar reflector doesn't provide enough excess capacity for storage (except to provide ride-through for passing clouds, etc.). In general, each 2500W continuous load requires at least a second solar reflector dish, nanoturbine, etc. By associating a FORCE nanoturbine with a solar reflector, it is simple to scale out the power capacity on a plug-and-play basis, making it easy to provide excess daytime/good weather capacity that can be stored for later use. Even when there is no sunshine, the heat collected from electrical loads can be stored thermally for later use, and/or used to pre-heat an intermediate heat transport fluid, a portion of which is subsequently heated (as necessary) by a fuel-fired boiler (e.g., such as one that burns fuel, whether renewable or not), and then used to vaporize the selected phase-change working fluid that feeds the FORCE nanoturbine.

11 SUREFIRE - Survivable Unmanned Renewably Energized Facility & Independent Reconfigurable Environment
There are numerous SUREFIRE site configurations possible, in order to provide the basis of meeting a diverse set of needs. The exemplary configurations described here are:
- SUREFIRE Freestanding Vault (a preferred embodiment)
- SUREFIRE Mini-Silo (a preferred embodiment)
- SUREFIRE Single-Level Underground Vault (an alternate embodiment)
- SUREFIRE Multi-Level Underground Vault (an alternate embodiment)
The SUREFIRE Freestanding Vault is a preferred embodiment, ...
The SUREFIRE Mini-Silo is a preferred embodiment, and by design its minimal configuration would enjoy the lowest cost of the three example configurations if deployed in volume, which would enable affordable, widespread deployment. The packaging of all of its major components is tailored especially to a silo configuration (a cylindrical shape approximately 3 feet in diameter). The SUREFIRE Mini-Silo can be configured to support various levels of performance in the 0.5 TFLOPS to 10 TFLOPS range, per silo.
The SUREFIRE Single-Level Underground Vault is an alternate embodiment - a larger diameter silo - that could be affordably produced in fairly low quantities (relative to the SUREFIRE Mini-Silo), and is able to accommodate a higher degree of conventional equipment than the SUREFIRE Mini-Silo. The SUREFIRE
Single-Level Underground Vault is especially well-suited to supercomputing accompanied by significant radio communications (the silo itself serves as the base for relatively lightweight communications towers). The SUREFIRE Single-Level Underground Vault can be configured to support various levels of performance in the 0.5 TFLOPS to 10 TFLOPS range, per silo.
The SUREFIRE Multi-Level Underground Vault is an alternate embodiment - also in a silo configuration - that is likely to require a somewhat substantial level of site engineering and preparation prior to deployment. A typical deployment scenario would be underneath (literally) a commercial-class wind turbine (100 kW or more). While the basic design is straightforward to replicate, its site preparation may not be, due to the facility depth and likely permitting issues. The SUREFIRE Multi-Level Underground Vault can be configured to support various levels of performance in the 2 TFLOPS to 50 TFLOPS range, per silo.
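The per-silo performance ranges quoted above can be collected into a small lookup, sketched here for illustration. The dictionary and function names are hypothetical; the Freestanding Vault is left out because the text gives no range for it.

```python
# Per-silo performance ranges quoted in the text, in TFLOPS.
# The Freestanding Vault is omitted: no range is stated for it.
SUREFIRE_RANGES_TFLOPS = {
    "Mini-Silo": (0.5, 10.0),
    "Single-Level Underground Vault": (0.5, 10.0),
    "Multi-Level Underground Vault": (2.0, 50.0),
}


def configurations_for(target_tflops):
    """Return the configurations whose quoted range covers the target."""
    return [
        name
        for name, (low, high) in SUREFIRE_RANGES_TFLOPS.items()
        if low <= target_tflops <= high
    ]
```

A target outside every quoted range returns an empty list, which would indicate that more than one silo is needed.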


Claims

normally incurs significant signal strength loss that the equipment must restore within its amplification chain.
Commercial systems using Adcock principles are available.
The Adcock DF system has the following list of features and challenges:
1. Like Doppler systems, the Adcock system is vertically polarized, with low sensitivity to line-of-sight horizontally polarized signals.
2. The antenna system requires continuous rotation during DF service, with or without the addition of a sensing antenna.
3. The antenna elements form a small package that may easily fit within small RF-transparent pods for weather protection.
4. The relative sensitivity of the Adcock antennas to nearby conductive materials in any direction is unknown. This factor might affect the placement of such a system on the SUREFIRE tower assemblies.
5. The Adcock antennas exhibit a very low feedpoint impedance and thus require intervening impedance transformation components prior to equipment entry.
6. It is likely that the Adcock system requires specialized receiving equipment, with or without a sensing antenna element, in order to maintain calibration. Hence, such a system may require purchase from a vendor with possibly prohibitive pricing. However, such purchases would likely reduce the amount of software development necessary for the project.

13.1.5.7 Conclusion

These notes are not designed to reach a recommendation as to which DF system might be the best for the SUREFIRE project. Indeed, the list of candidates is not exhaustive, but only representative of the candidates that seem most apt to the project goals. In making the decision, the first decisive factor should be a review of the most likely polarization of the target signals for DF work. The remaining factors are likely to fall into a "best compromise" matrix related to both the goals of and the constraints upon the overall project.
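The two-stage decision described in the conclusion (polarization first, then a "best compromise" matrix) might be sketched as follows. This is an illustration only: the function name, the criterion names, and all scores and weights are invented placeholders, and no evaluation of any real DF system is implied.

```python
def rank_df_candidates(candidates, target_polarization, weights):
    """Rank direction-finding (DF) candidates in two stages.

    Stage 1: keep only candidates matching the expected signal polarization.
    Stage 2: order the survivors by a weighted sum of their criterion scores.
    """
    eligible = [c for c in candidates if c["polarization"] == target_polarization]

    def weighted_score(candidate):
        return sum(weight * candidate["scores"][name] for name, weight in weights.items())

    return sorted(eligible, key=weighted_score, reverse=True)


# Placeholder data: names, scores, and weights are invented for illustration.
EXAMPLE_CANDIDATES = [
    {"name": "system_a", "polarization": "vertical",
     "scores": {"cost": 2, "size": 5, "software_effort": 4}},
    {"name": "system_b", "polarization": "vertical",
     "scores": {"cost": 4, "size": 3, "software_effort": 3}},
    {"name": "system_c", "polarization": "horizontal",
     "scores": {"cost": 5, "size": 5, "software_effort": 5}},
]
EXAMPLE_WEIGHTS = {"cost": 3, "size": 1, "software_effort": 2}
```

Treating polarization as a hard filter rather than one more weighted criterion mirrors the text's statement that it should be the first decisive factor.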

13.2 PODIUM - Pneumatically Operated Directional Intelligent Unmanned Masthead

14.1 SHADOWS: Systems and Methods for Self-Healing Adaptive Distributed Organic Working Storage

1. A system comprising geographically distributed computing subsystems ("SHADOWS nodes") that are relatively distant from one another, each of which is capable of carrying out at least a limited set of functions to include communications, data and cryptographic processing, memoization, and data storage and retrieval, and where each such node recognizes, establishes trust, and collaborates with other nodes of the same system. In general, SHADOWS nodes communicate via WAN (wide-area network) links (to include MAN, or metro-area network links, and RAN, or regional-area network links, and other links to relatively non-local destinations), including both terrestrial and non-terrestrial wireless communications.

2. The system described in [1], where each SHADOWS node mutually associates with selected other qualified and trusted SHADOWS nodes in order to form a multiplicity of higher-level collaborative system survival units ("SHADOWS teams"), where each SHADOWS node is, geographically speaking, relatively closer to some team members and relatively much more distant to others, and where each such team is jointly and severally accountable for a portion of the aggregate responsibilities of the overall system.

3. The system described in [1] and [2], where each SHADOWS node timely publishes selected general resource information to other SHADOWS nodes, and also publishes team-specific resource information to its fellow team members for each team to which it belongs.

4. The system described in [3], where each SHADOWS node "volunteers" its readiness (according to its operational profile and relative "survivability" factors, such as resource availability, vulnerabilities, threat exposure, etc.) to accept selected portions of the immediate responsibilities of other SHADOWS nodes as its own, while also delegating selected portions of its immediate responsibilities to other qualified SHADOWS nodes on a similar basis, as appropriate, so as to meet outstanding SLAs (service level agreements) and/or other operational goals (e.g., such as maximize throughput and/or survivability, minimize latency, risk, and operational expense, etc.).

5. The system described in [4], where SHADOWS nodes dynamically and autonomically self-modify or otherwise adapt their collective operational profile and associated readiness so as to economize operational resources and/or optimize their collective survivability while maintaining the necessary trust relationships and continuing to fulfill SLA-related commitments as appropriate. (Note that a SHADOWS node or other resource that is no longer trusted in one or more operational roles is effectively removed from those roles, but might still be trustworthy in other roles, or may re-establish its trustworthiness, and thus dynamically changing trust relationships directly bear on the operational configuration, profile, readiness, and survivability of the system's components, but not necessarily on the externally visible operation, functionality, performance, or survivability of the overall system.)

6. The system described in [5], where the periodic and/or temporary availability of lower cost or otherwise opportunistic resources (such as insolation for a solar-powered energy plant, strong winds for a wind-powered energy plant, waste methane for a gas-powered energy plant, surplus bandwidth on a network communications plan, etc.) are automatically taken advantage of by autonomically shifting the resource utilization (e.g., computing load) among SHADOWS nodes to follow the resource availability, to the extent it can be accomplished without jeopardizing committed levels of survivability.

7. The system described in any of claims [1] through [6], where each SHADOWS node is itself a subsystem comprising a set of geographically distributed computing subsystems ("SCRAM nodes" as claimed in 14.2) that are generally in relatively closer proximity to each other than the SHADOWS nodes are to each other, and where each SCRAM node is capable of carrying out at least a limited set of functions to include communications, data and cryptographic processing, memoization, and data storage and retrieval.
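The publish/volunteer/delegate behavior described in [3] and [4], together with the role-scoped trust noted in [5], might be sketched as follows. This is a minimal illustration under stated assumptions: the `Node` fields, the readiness formula, and the `delegate` function are hypothetical and are not defined anywhere in the claims.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_capacity: float      # published resource information (hypothetical units)
    threat_exposure: float    # 0.0 (safe) to 1.0 (highly exposed); hypothetical scale
    trusted_roles: set = field(default_factory=set)  # roles the team trusts this node in

    def readiness(self):
        # Hypothetical score: more free capacity and less exposure means more ready.
        return self.free_capacity * (1.0 - self.threat_exposure)


def delegate(task_role, task_load, team):
    """Hand a responsibility to the team member volunteering the highest readiness.

    Only members trusted in the task's role and with enough free capacity qualify.
    Returns the chosen node, or None if nobody qualifies.
    """
    qualified = [
        n for n in team
        if task_role in n.trusted_roles and n.free_capacity >= task_load
    ]
    if not qualified:
        return None
    chosen = max(qualified, key=Node.readiness)
    chosen.free_capacity -= task_load  # the accepted responsibility consumes capacity
    return chosen
```

Because trust is tracked per role, a node removed from one role stays eligible for the others, as the note in [5] describes.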

14.2 SCRAM: Survivable Computation, Routing, & Associative Memory

8. A system comprising distributed computing subsystems ("SCRAM nodes") that are relatively local to one another (in contrast to SHADOWS nodes, which are relatively distant to each other), each of which is capable of carrying out at least a limited set of functions to include communications, data and cryptographic processing, memoization, and data storage and retrieval, and where each such node recognizes, establishes trust, and collaborates with other nodes of the same system. In general, SCRAM nodes communicate via a combination of LAN (local-area network) and WAN (wide-area network) links, including both terrestrial and non-terrestrial wireless communications, although non-LAN links are used primarily to augment LAN links and provide increased inter-connectivity, bandwidth and survivability, and thus are typically short-circuited by exploiting localized paths held in common to the extent possible.

9. The system described in [8], where each SCRAM node mutually associates with selected other qualified and trusted SCRAM nodes in order to form a single, relatively local SHADOWS node (see 14.1), and to participate in a multiplicity of higher-level collaborative system survival units ("SHADOWS teams", see 14.1). (NOTE: If such a set of SCRAM nodes is so localized that it may be concentrated and physically packaged into a single entity comprising a multiplicity of logical SCRAM nodes, it may be referred to as a SCRAM machine or SCRAM unit. A less-dense configuration of SCRAM nodes that cannot be physically packaged into a single entity may be referred to as a SCRAM "mesh," which is essentially a logical SCRAM machine. Either configuration may be sufficient to collectively form a local SHADOWS node, but acceptance of the configuration as a SHADOWS node is strictly up to the other SHADOWS nodes with which it wishes to become a peer.)

10. The system described in [8] and [9], where each SCRAM node timely publishes selected general resource information to other SCRAM nodes, and also publishes team-specific resource information to its fellow team members for each team to which it belongs.

11. The system described in [10], where each SCRAM node "volunteers" its readiness (according to its operational profile and relative "survivability" factors, such as resource availability, vulnerabilities, threat exposure, etc.) to accept selected portions of the immediate responsibilities of other SCRAM nodes as its own, while also delegating selected portions of its immediate responsibilities to other qualified SCRAM nodes on a similar basis, as appropriate, so as to meet outstanding SLAs (service level agreements) and/or other operational goals (e.g., such as maximize throughput and/or survivability, minimize latency, risk, and operational expense, etc.).

12. The system described in [11], where SCRAM nodes dynamically and autonomically self-modify or otherwise adapt their collective operational profile and associated readiness so as to economize operational resources and/or optimize their collective survivability while maintaining the necessary trust relationships and continuing to fulfill SLA-related commitments as appropriate. (Note that a SCRAM node or other resource that is no longer trusted in one or more operational roles is effectively removed from those roles, but might still be trustworthy in other roles, or may re-establish its trustworthiness, and thus dynamically changing trust relationships directly bear on the operational configuration, profile, readiness, and survivability of the system's components, but not necessarily on the externally visible operation, functionality, performance, or survivability of the overall system.)

13. The system described in [12], where the periodic and/or temporary availability of lower cost or otherwise opportunistic resources (such as insolation for a solar-powered energy plant, strong winds for a wind-powered energy plant, waste methane for a gas-powered energy plant, surplus bandwidth on a network communications plan, etc.) are automatically taken advantage of by autonomically shifting the resource utilization (e.g., computing load) among SCRAM nodes to follow the resource availability, to the extent it can be accomplished without jeopardizing committed levels of survivability.

14. The system described in any of claims [8] through [13], where each SCRAM node is itself a subsystem comprising a set of locally distributed computing subsystems ("MASTER/SLAVE nodes") that are generally in very close proximity to each other (much more so than the SCRAM nodes are to each other), and where each MASTER/SLAVE node is capable of carrying out at least a limited set of functions to include communications, data and cryptographic processing, memoization, and data storage and retrieval.
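The opportunistic load shifting described in [13] might be sketched as a greedy rebalancing pass. This is an illustration under stated assumptions: the function name, the three mappings, and the idea of expressing a survivability commitment as a per-node minimum load are all hypothetical simplifications, not part of the claims.

```python
def shift_load(loads, cheap_capacity, min_load):
    """Move computing load toward nodes with spare opportunistic capacity.

    loads:          {node: current load}
    cheap_capacity: {node: load the node can carry on opportunistic resources}
    min_load:       {node: load that must stay put to honor survivability commitments}
    Returns a new {node: load} mapping; the total load is preserved.
    """
    new = dict(loads)
    for donor in sorted(new):
        # A donor keeps whatever its cheap resources cover and whatever is committed.
        floor = max(cheap_capacity.get(donor, 0), min_load.get(donor, 0))
        movable = max(0, new[donor] - floor)
        for receiver in sorted(new):
            if movable <= 0:
                break
            if receiver == donor:
                continue
            # A receiver only accepts load up to its opportunistic capacity.
            room = cheap_capacity.get(receiver, 0) - new[receiver]
            if room <= 0:
                continue
            moved = min(movable, room)
            new[donor] -= moved
            new[receiver] += moved
            movable -= moved
    return new
```

The `min_load` floor stands in for the claim's condition that shifting happens only to the extent it does not jeopardize committed levels of survivability.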

14.3 SUREFIRE: Survivable Unmanned Renewably Energized Facility & Independent Reconfigurable Environment

15. A system implementing a miniature, self-contained, unmanned, secure (often outdoors or underground) supercomputing "datacenter," comprising a multiplicity of SCRAM nodes (see 14.2), along with one or more self-contained power plants capable of operating off-grid (off the utility power grid) for extended periods (to include permanently), and designed to be physically visited for maintenance purposes as infrequently as every few years, but at most only once or twice a year (and these may be combined with scale-up visits). (NOTE: Depending upon its available resources, a single SUREFIRE system may fully implement a single SHADOWS node as described in 14.1).
NOTE: SUREFIRE sites can be located on virtually any outdoors property, but also in basements or on rooftops, etc. SUREFIRE sites usually include one or more renewable energy systems, in addition to conventional energy sources. SUREFIRE sites are designed for maximal energy efficiency, and emit very little waste heat. All SUREFIRE sites are expendable without data loss, and penetration can never yield useful information to an attacker.

16. The system described in [15], augmented by a FRAME subsystem ("Forced Recapture, Aggregation & Movement of Energy"), where the heat energy caused by the various SUREFIRE subsystems is captured and reused, with transformation as necessary, such that it either reduces the load on internal or external power generation, or contributes to internal power generation, or both.

17. The system described in [16], where one or more of the self-contained power plants is implemented through the use of renewable energy sources, such as through incorporating any combination of the BLOOMER, FLOWER, and SOLAR subsystems.

18. The system described in [17], where the exposure to hazards introduced by the use of renewable energy sources is reduced or otherwise mitigated, and the production of power is improved or otherwise managed, with the help of any combination of the DEFEND, WARN, LISTEN, and PODIUM subsystems.

19. The system described in any of claims [15] through [18], where the analytic, planning, and other computing and reasoning capabilities of the system (and in particular, the system's internal SCRAM nodes), in conjunction with similar capabilities of peer systems located elsewhere, are engaged to recognize threats to its own survival and the survival of its peers, to protect itself (including both the physical asset and its information content), and to protect the overall system of which it is a part.
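The effect of the heat reuse described in [16] on power generation can be illustrated with a steady-state estimate. This is a simplified sketch: it assumes that all electrical load ends up as heat, and the recapture fraction and heat-to-power efficiency are hypothetical inputs, not values from this document.

```python
def net_generation_needed(load_w, heat_fraction_recaptured, heat_to_power_efficiency):
    """Power that must still be generated after recaptured heat is converted back.

    Simplifying assumption: the whole electrical load is dissipated as heat.
    Both fractional parameters are hypothetical and must lie in [0, 1].
    """
    for value in (heat_fraction_recaptured, heat_to_power_efficiency):
        if not 0 <= value <= 1:
            raise ValueError("fractions must be in [0, 1]")
    recovered_w = load_w * heat_fraction_recaptured * heat_to_power_efficiency
    return load_w - recovered_w
```

With no recapture the full load must be generated; any recaptured and converted heat lowers that figure, which corresponds to the claim's "reduces the load on internal or external power generation".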

14.4 SELF: Secure Emergent Learning of Friends

20. A system comprising distributed computing subsystems ("SCRAM nodes") that are relatively local to one another (in contrast to SHADOWS nodes, which are relatively distant to each other), each of which is capable of carrying out at least a limited set of functions to include communications, data and cryptographic processing, memoization, and data storage and retrieval, and where each such node recognizes, establishes trust, and collaborates with other nodes of the same system. In general, SCRAM nodes communicate via a combination of LAN (local-area network) and WAN (wide-area network) links, including both terrestrial and non-terrestrial wireless communications, although non-LAN links are used primarily to augment LAN links and provide increased inter-connectivity, bandwidth and survivability, and thus are typically short-circuited by exploiting localized paths held in common to the extent possible.