WO2001090851A2 - Systems and methods for voting on multiple messages - Google Patents

Systems and methods for voting on multiple messages

Info

Publication number
WO2001090851A2
Authority
WO
WIPO (PCT)
Prior art keywords
voting
parameters
messages
voter
message
Prior art date
Application number
PCT/US2001/016830
Other languages
French (fr)
Other versions
WO2001090851A3 (en
Inventor
David E. Bakken
David A. Karr
Christopher C. Jones
Original Assignee
Bbnt Solutions Llc
Washington State University Research Foundation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bbnt Solutions Llc, Washington State University Research Foundation filed Critical Bbnt Solutions Llc
Priority to AU2001263410A priority Critical patent/AU2001263410A1/en
Publication of WO2001090851A2 publication Critical patent/WO2001090851A2/en
Publication of WO2001090851A3 publication Critical patent/WO2001090851A3/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/40 Network security protocols
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/10015 Access to distributed or replicated servers, e.g. using brokers

Definitions

  • the present invention relates generally to distributed systems and, more particularly, to systems and methods that vote on multiple requests to or replies from a server.
  • Replication is a commonly used technique to increase the availability of these services and lower the delay in accessing them.
  • Replication typically includes replicating clients and/or servers so that requests and replies may be generated and transmitted in an expedited manner.
  • The active replication scheme multicasts a service request from a client to all server replicas atomically and in order. Each replica then processes the request and sends back a reply. One of these replicas' replies must be chosen for the client. This process of choosing one reply from many is called "voting." The client uses the reply chosen by the voting process as needed (e.g., the reply may contain data the client has requested from the server). Current support for voting is quite limited, however, in both research projects and in commercial products and standards. This may be the result of the way such active replication schemes tend to handle the parameters of requests and replies, namely, as opaque blocks of data that can only be compared for equality.
  • voting among active replicas generally consists of delivering a reply from the server replicas when some quorum of the servers (e.g., one, a majority, or all) have sent replies exactly identical byte-for-byte to the one being delivered.
  • the return value (e.g., "out parameters" and "inout parameters")
  • a similarly limited functionality may be provided to vote on client requests before delivering one to the server.
  • a system in accordance with the purpose of the invention as embodied and broadly described herein, includes an unmarshal module, a voter core, and a marshal module.
  • the unmarshal module receives multiple messages and extracts at least one parameter from each of the messages.
  • the voter core votes on each of the parameters based on a current voting policy.
  • the marshal module constructs a message using the voted parameters and outputs the constructed message.
  • a portable voting system usable in a plurality of different types of middleware systems includes an unmarshal module, a voter core, and a marshal module.
  • the unmarshal module receives multiple messages in a first format, translates the messages to a second format, and extracts at least one parameter from each of the messages.
  • the voter core votes on the parameters based on a voting policy.
  • the marshal module generates a message using the voted parameters, translates the generated message from the second format to a third format, and outputs the generated message.
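  • The unmarshal, vote, and marshal sequence described above might be sketched as follows. This is only an illustration under stated assumptions: JSON stands in for the wire format, the "params" field name is hypothetical, and a per-position median stands in for whatever the current voting policy would dictate.

```python
import json
import statistics


def unmarshal(raw_message):
    """Decode one wire message into its list of parameters."""
    return json.loads(raw_message)["params"]


def marshal(params):
    """Encode the voted parameters as one outgoing wire message."""
    return json.dumps({"params": params})


def vote_on_messages(raw_messages):
    """Unmarshal every message, vote position by position, marshal the result."""
    ballots = [unmarshal(raw) for raw in raw_messages]
    # zip(*ballots) groups the i-th parameter of every ballot together.
    voted = [statistics.median_low(column) for column in zip(*ballots)]
    return marshal(voted)


replies = ['{"params": [1, 10]}', '{"params": [2, 10]}', '{"params": [3, 99]}']
result = vote_on_messages(replies)
```

  • Because the voter core only ever sees decoded parameters, swapping the two JSON helpers for another encoding would leave the voting step unchanged, which is the portability property the description is aiming at.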
  • a voting system includes a vote manager and a voter.
  • the vote manager obtains a current voting policy from multiple voting policies.
  • the voter receives multiple messages directed to at least one destination, extracts at least one parameter from each of the messages, votes on each of the parameters based on the current voting policy, constructs a message using the voted parameters, and outputs the constructed message to the at least one destination.
  • a method for voting includes receiving multiple ballots having associated values; excluding at least one of the ballots based on the values associated with the at least one ballot after a predetermined number of the ballots have been received; and selecting a value from the values associated with ones of the ballots remaining after the exclusion.
  • In another implementation consistent with the present invention, a voting system includes a voter and at least one interface.
  • the voter performs operations, including receiving multiple messages directed to at least one destination, extracting at least one parameter from each of the messages, voting on each of the parameters based on a current voting policy, constructing a message using the voted parameters, and outputting the constructed message to the at least one destination.
  • the interface permits at least one external entity to control the operations of the voter.
  • a voting state machine includes a quorum component, an exclusion component, and a collation component.
  • the quorum component waits for a number of ballots to arrive. Each of the ballots includes at least one value.
  • the exclusion component excludes at least one of the ballots after the number of the ballots arrive.
  • the collation component selects a value from the values included in the non-excluded ballots as representative of the values included in the ballots.
  • Fig. 1 is a diagram of an exemplary system in which systems and methods consistent with the present invention may be implemented;
  • Fig. 2 is an exemplary diagram of a computer device upon which systems and methods consistent with the present invention may be implemented;
  • Fig. 3 is a diagram of an exemplary voting system consistent with the present invention;
  • Fig. 4 is a diagram of an exemplary voter of Fig. 3 in an implementation consistent with the present invention;
  • Fig. 5 is a flowchart of voting stages entered into by the voter core of Fig. 4 in an implementation consistent with the present invention;
  • Fig. 6 is a diagram of the voter support of Fig. 3 in an implementation consistent with the present invention;
  • Fig. 7 is a diagram of the voter status service of Fig. 3 in an implementation consistent with the present invention;
  • Figs. 8A and 8B are flowcharts of system processing for voting on a message in an implementation consistent with the present invention;
  • Fig. 9 is an exemplary diagram of an alternate system in which systems and methods consistent with the present invention may be implemented; and
  • Fig. 10 is an exemplary diagram of another alternate system in which systems and methods consistent with the present invention may be implemented.
  • Systems and methods consistent with the present invention provide a voting mechanism that allows voting policies to be changed at runtime and works with presentation layers, such as those for distributed objects, message-oriented middleware, and other kinds of middleware.
  • the systems and methods support both dynamic collation voting, where voting algorithms can be influenced by group membership changes, and static collation voting, where voting algorithms cannot be influenced by group membership changes.
  • EXEMPLARY SYSTEM
  • Fig. 1 is an exemplary diagram of a system 100 in which systems and methods consistent with the present invention may be implemented.
  • The system 100 includes a client 110 communicating with multiple replicated servers [1, . . . , N] 120 via a voting system 130.
  • the client 110, servers 120, and voting system 130 may be implemented as one or more devices and may communicate via any communications medium (e.g., by wired or wireless communication; via a network, such as the Internet; via a storage device; etc.).
  • The client 110 may include any device or object that can communicate with a server 120, such as a personal computer, a laptop, a personal digital assistant (PDA), or a process running on one of these devices.
  • the client 110 may request data from a server 120.
  • Each of the servers 120 may include any conventional server device or object.
  • The server 120, for example, may supply data for use by the client 110.
  • a single client has been shown for simplicity.
  • One skilled in the art would recognize that the system 100 may include any number of clients 110 and servers 120.
  • the voting system 130 may include a device, such as a computer, or a process operating on one or more devices.
  • the voting system 130 receives replies from each of the replicated servers 120, generates a reply from the received replies based on a current voting policy, and transmits the reply to the client 110.
  • Fig. 2 is an exemplary diagram of a device 200 that may incorporate client 110, one or more of the servers 120, and/or the voting system 130 in one implementation consistent with the present invention.
  • One skilled in the art would recognize that other configurations of the device 200 are possible.
  • the device 200 may include a bus 210, a processor 220, a main memory 230, a read only memory (ROM) 240, a storage device 250, an input device 260, an output device 270, and a communication interface 280.
  • the bus 210 permits communication among the components of the device 200.
  • the processor 220 may include any type of conventional processor or microprocessor that interprets and executes instructions.
  • the main memory 230 may include a random access memory (RAM) or another dynamic storage device that stores information and instructions for execution by the processor 220.
  • the ROM 240 may include a conventional ROM or another type of static storage device that stores static information and instructions for use by the processor 220.
  • the storage device 250 may include a magnetic and/or optical recording medium and its corresponding drive.
  • the input device 260 may include any conventional mechanism that permits an operator to input information to the device 200, such as a keyboard, a mouse, a pen, voice recognition and/or biometric mechanisms, etc.
  • the output device 270 may include any conventional mechanism that outputs information to the operator, including a display, a printer, a pair of speakers, etc.
  • the communication interface 280 may include any transceiver-like mechanism that enables the device 200 to communicate with other devices and/or systems.
  • the communication interface 280 may include mechanisms for communicating with another device or system via a network.
  • a device 200 performs tasks in response to processor 220 executing sequences of instructions contained in a computer-readable medium, such as memory 230.
  • a computer-readable medium may include one or more memory devices and/or carrier waves.
  • processor 220 executes the sequences of instructions contained in the computer-readable medium to perform processes that will be described later.
  • hardwired circuitry may be used in place of or in combination with software instructions to implement processes consistent with the present invention.
  • the present invention is not limited to any specific combination of hardware circuitry and software.
  • Fig. 3 is an exemplary diagram of a voting system 130 consistent with the present invention.
  • the voting system 130 may include a voter 310, voter support 320, and voting status service 330.
  • the voter 310 performs the voting process.
  • Fig. 4 is an exemplary diagram of the voter 310.
  • the voter 310 may include an unmarshal module 410, a voter core 420, and a marshal module 430.
  • the unmarshal module 410 may include conventional mechanisms that receive network messages from each of the replicas and convert them into sets of parameters in a language used by the voter core 420.
  • CORBA Common Object Request Broker Architecture
  • For example, the unmarshal module 410 may receive network messages "flattened" for transmission into the common data representation (CDR) format of the CORBA General Inter-Object Request Broker (ORB) Protocol (GIOP), which is used with the Internet Inter-ORB Protocol (IIOP), and convert them into sets of parameters in the voter core 420 language, such as Java.
  • the marshal module 430 may include conventional mechanisms that construct a message from the set of parameters chosen or constructed by the voter core 420 and transmit the message to the client in the language in which it was received or the language used by the client.
  • the voter core 420 selects or constructs one set of parameters, from the received sets of parameters, to be returned to the client.
  • the voter core 420 may receive inputs from the voter support 320 that inform it of the current voting policy, as well as notifying it when a failure occurs.
  • the voter core 420 may output status information on the voting to the voting status service 330 to help detect performance problems, potential intrusions, or other anomalies.
  • Fig. 5 is a flowchart of voting stages entered into by the voter core 420 in an implementation consistent with the present invention.
  • A "ballot" is a request or reply message sent by a single replica.
  • A "vote" is the process of choosing (or constructing) one message from among all of the ballots.
  • the voter core 420 may operate in three stages in the processing of a single vote in one implementation consistent with the present invention. At each stage, the current voting policy dictates the action of the voter core 420.
  • the three stages include a quorum stage 510, an exclusion stage 520, and a collation stage 530. Other stages may be possible.
  • In the quorum stage 510, the voter core 420 waits for enough ballots to arrive to begin the actual voting process. For example, the voter core 420 may wait for:
      a) N ballots
      b) All but N ballots
      c) x% of the ballots
      d) N identical ballots
  • the first operation involves static voting, while the second, third, and fourth involve dynamic voting.
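  • The four quorum conditions listed above might be expressed as predicates over the ballots received so far. The following is a minimal sketch, not the described implementation; the function names and the group_size argument (the number of replicas currently in the group) are assumptions introduced for illustration.

```python
from collections import Counter


def quorum_n(n):
    """Quorum is met once n ballots have arrived."""
    return lambda ballots, group_size: len(ballots) >= n


def quorum_all_but(n):
    """Quorum is met once all but n members of the group have voted."""
    return lambda ballots, group_size: len(ballots) >= group_size - n


def quorum_percent(x):
    """Quorum is met once x percent of the group has voted."""
    return lambda ballots, group_size: len(ballots) * 100 >= x * group_size


def quorum_identical(n):
    """Quorum is met once some single value has been reported by n ballots."""
    def check(ballots, group_size):
        # most_common(1) yields the value with the highest count so far.
        return bool(ballots) and Counter(ballots).most_common(1)[0][1] >= n
    return check
```

  • Only the "all but N" and percentage forms read group_size, so only they change their threshold when the group membership changes.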
  • After sufficient ballots arrive, the voter core 420 enters the exclusion stage 520. In this stage, the voter core 420 may exclude a number of ballots from further consideration to enable the voter 310 to tolerate value failures, such as failures caused by a faulty (and undetected) floating point processor or a breach in security (e.g., an adversary could break into either the voter 310 or the middleware to insert messages that look like correct request or reply messages, but contain incorrect data).
  • The voter core 420 may exclude:
      a) The lowest n values
      b) The highest n values
      c) The furthest n values (from the median, for integers; can be high or low)
      d) All values e distance from the mean (for floating point values) or median (for integers)
      e) All values x sigma from the mean (for floating point values)
      f) None
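  • The exclusion operators listed above might be sketched as simple list filters. The function names are hypothetical, and two readings are assumptions: the "e distance" and "x sigma" operators are taken to drop values lying more than that far from the mean, and the population standard deviation is used for sigma.

```python
import statistics


def exclude_lowest(values, n):
    """Drop the n lowest values."""
    return sorted(values)[n:]


def exclude_highest(values, n):
    """Drop the n highest values."""
    ordered = sorted(values)
    return ordered[:len(ordered) - n]


def exclude_furthest(values, n):
    """Drop the n values furthest from the median, high or low."""
    median = statistics.median(values)
    by_distance = sorted(values, key=lambda v: abs(v - median))
    return sorted(by_distance[:len(by_distance) - n])


def exclude_beyond_distance(values, e):
    """Drop values more than e away from the mean (assumed reading)."""
    mean = statistics.mean(values)
    return [v for v in values if abs(v - mean) <= e]


def exclude_beyond_sigma(values, x):
    """Drop values more than x standard deviations from the mean."""
    mean = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) <= x * sigma]
```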
  • After the exclusion stage 520, the voter core 420 enters the collation stage 530. In this stage, the voter core 420 chooses or constructs a value from those values that were not excluded in the exclusion stage 520. For example, the voter core 420 may:
      a) Choose the median value
      b) Choose the mean value (for floating point parameters)
      c) Choose the mode value (i.e., the most common one)
      d) Construct a value
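  • The collation choices listed above might be sketched as a small dispatch table. The table and function names are assumptions; the low median is used here so that the median choice always returns a value that some replica actually sent, which is one possible reading of "choose" as opposed to "construct".

```python
import statistics

# Hypothetical mapping from a policy keyword to a collation function.
COLLATORS = {
    "median": statistics.median_low,  # always one of the submitted values
    "mean": statistics.mean,          # may construct a value nobody sent
    "mode": statistics.mode,          # the most common value
}


def collate(values, how):
    """Choose or construct one value from the ballots that survived exclusion."""
    return COLLATORS[how](values)
```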
  • the voter core 420 may choose the "best" parameter for each parameter in a message, then combine these "best" values into one request/reply message.
  • The actual vector selected by the voter core 420 may not have been generated by any client or server. For example, if server [1] returned {1,2} and server [2] returned {3,4}, the voter core 420 might create a reply message of {3,2} by choosing the "best" of each parameter.
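  • The {3,2} example above can be reproduced with a per-parameter vote. In this sketch the function name is hypothetical, and "best" is assumed, purely for illustration, to mean the largest first parameter and the smallest second parameter; the actual meaning of "best" would come from the voting policy.

```python
def vote_per_parameter(ballots, choosers):
    """Apply one chooser per parameter position across all ballots."""
    columns = zip(*ballots)  # i-th parameter of every ballot, grouped together
    return tuple(choose(list(column)) for choose, column in zip(choosers, columns))


# Server [1] returned {1,2} and server [2] returned {3,4}.
reply = vote_per_parameter([(1, 2), (3, 4)], [max, min])
```

  • Here reply is (3, 2), a vector that neither server produced.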
  • the current voting policy might specify what is best. Other types of voting may also be possible. For example, the voting policy may instruct the voter core 420 to perform random voting or weighted voting operations.
  • Random voting operations may be performed by the voter core 420 at any or all of the three stages to help thwart an adversary.
  • the voter core 420 may be instructed to wait for a random percentage of the maximum ballots (i.e., the size of the group sending the ballots) to arrive. This number could change with each vote or could be the same for all votes the voter core 420 manages.
  • the voter core 420 may exclude a random number or percentage of the ballots or all but a random number of the ballots. This number could potentially be reset at different time intervals.
  • the voter core 420 may randomly choose a value. This may occur in some applications where the particularly bad data has already been discarded in the exclusion stage 520.
  • Weighted voting improves security by giving different replicas varying amounts of trust in the different stages of the voting. For example, in the quorum stage 510, instead of waiting for a given number of ballots, the voter core 420 may wait for some number of "points," where each replica is assigned a different number of points. When a ballot arrives, that ballot's arrival counts towards the number of points dictated by its quorum weighting. In the collation stage 530, the voter core 420 may perform its operations after the remaining ballots are expanded based on a weighting. For example, suppose there are ballots with values {2,3,4} from replicas 1-3, respectively, and the weighting is {1,2,2}. After expansion, the ballots will be {2,3,3,4,4} and this is what the voter core 420 operates upon to choose a value, such as median, mode, mean, or random.
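  • The weighted quorum and the ballot expansion described above might be sketched as follows. The function names are assumptions, and a single weight table is reused for both stages only for brevity; the description allows a separate quorum weighting.

```python
import statistics


def weighted_quorum_met(arrived, weights, points_needed):
    """Quorum stage: count points, not ballots."""
    return sum(weights[replica] for replica in arrived) >= points_needed


def expand(ballots, weights):
    """Collation stage: repeat each replica's value according to its weight."""
    expanded = []
    for replica, value in ballots.items():
        expanded.extend([value] * weights[replica])
    return sorted(expanded)


ballots = {1: 2, 2: 3, 3: 4}  # replica -> value, as in the example above
weights = {1: 1, 2: 2, 3: 2}
expanded = expand(ballots, weights)
chosen = statistics.median(expanded)
```

  • With these inputs, expanded is [2, 3, 3, 4, 4] and the median chosen from it is 3.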
  • the voter core 420 may throw exceptions as directed by the current voting policy. In this case, the voter 310 may return an exception to the client that would normally receive the vote. For example, in the quorum stage 510, the voter core 420 may throw an exception if the quorum is not met within a predetermined period of time. In the exclusion stage 520, the voter core 420 may throw an exception if too many members (i.e., replicas) are excluded. In the collation stage 530, the voter core 420 may throw an exception if the number of ballots that included the voted-upon value was too small.
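  • The three exception points described above might be sketched in one function that runs a single vote. The exception class names, the parameter names, and the use of the mode for collation are all assumptions made for illustration; the timer itself is abstracted into a timed_out flag.

```python
from collections import Counter


class QuorumTimeout(Exception):
    """Quorum was not reached within the allowed time."""


class TooManyExcluded(Exception):
    """The exclusion stage removed more ballots than the policy allows."""


class WeakAgreement(Exception):
    """Too few ballots carried the voted-upon value."""


def run_vote(ballots, quorum, exclude, max_excluded, min_support, timed_out=False):
    # Quorum stage: keep waiting (return None) unless the timer has expired.
    if len(ballots) < quorum:
        if timed_out:
            raise QuorumTimeout(f"{len(ballots)} of {quorum} ballots arrived")
        return None
    # Exclusion stage: apply the policy's operator, then check how many were dropped.
    kept = exclude(ballots)
    if not kept or len(ballots) - len(kept) > max_excluded:
        raise TooManyExcluded(f"{len(ballots) - len(kept)} ballots excluded")
    # Collation stage: pick the most common value and check its support.
    value, support = Counter(kept).most_common(1)[0]
    if support < min_support:
        raise WeakAgreement(f"only {support} ballots agreed on {value!r}")
    return value


def drop_highest(ballots):
    """Example exclusion operator: discard the single highest value."""
    return sorted(ballots)[:-1]
```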
  • Fig. 6 is an exemplary diagram of voter support 320 in an implementation consistent with the present invention.
  • In Fig. 6, thin lines denote runtime interactions that may occur synchronously with the arrival of a ballot, dashed lines denote runtime interactions that may occur asynchronously to the arrival of a ballot, and thick lines denote interactions that may occur at compile time.
  • Voter support 320 may include server files 610, repository 620, preload modules 630 and 640, lookup table 650, voting description language (VDL) files 660, compiler 670, analysis tools 680, and vote manager 690.
  • the server files 610 and repository 620 store information regarding types of application-level parameters used by particular interfaces.
  • the preload modules 630 and 640 access the server files 610 and repository 620, respectively, to load parameter types into the lookup table 650. To save time, the preload modules 630 and 640 may load the parameter types at compile time rather than runtime.
  • the lookup table 650 stores the parameter types from the server files 610 and the repository 620.
  • the unmarshal module 410 may access the lookup table 650 to determine the types of application-level parameters in a received request or reply. If the lookup table 650 does not have the information, the unmarshal module 410 may directly access the repository 620.
  • the VDL files 660 store voting policies for use by the voter core 420.
  • The following exemplary VDL code fragments may be used:
      a) "wait until 4 ballots have arrived, exclude the lowest one, then choose a random one from those left"
             quorum 4 exclusion lowest 1 collation random
      b) "wait until half of the ballots have arrived, exclude a random one, then choose the median of those left"
             quorum 50 percent exclusion random 1 collation median
      c) "wait until all but 2 of the messages have arrived, exclude the two highest-valued ones, then choose the most common one left"
             quorum all but 2 exclusion highest 2 collation mode
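  • A compiler front end for fragments like the three above might begin by splitting a policy into its stages. The sketch below infers the grammar from those three examples only (three stage keywords, each followed by its arguments); the full VDL syntax is not given here, so the function name and the returned dictionary shape are assumptions.

```python
STAGE_KEYWORDS = ("quorum", "exclusion", "collation")


def parse_policy(text):
    """Split a VDL fragment into {stage keyword: [argument tokens]}."""
    policy = {}
    current = None
    for token in text.split():
        if token in STAGE_KEYWORDS:
            current = token
            policy[current] = []
        elif current is None:
            raise ValueError(f"unexpected token before any stage: {token!r}")
        else:
            policy[current].append(token)
    missing = [stage for stage in STAGE_KEYWORDS if stage not in policy]
    if missing:
        raise ValueError(f"missing stages: {missing}")
    return policy


policy = parse_policy("quorum all but 2 exclusion highest 2 collation mode")
```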
  • The compiler 670 reads a voting policy file from the VDL files 660 and loads it into the vote manager 690.
  • The analysis tools 680 analyze the voting policy file to provide the vote manager 690 with quantitative tradeoffs of the policy, such as its costs and benefits under various operating conditions, and with various group memberships and performance. Based on the inputs from the compiler 670 and the analysis tools 680, the vote manager 690 provides a current voting policy to the voter core 420, so that the voter core 420 may determine the appropriate parameters to supply to the marshal module 430.
  • Fig. 7 is an exemplary diagram of voting status service 330 in an implementation consistent with the present invention.
  • the voter core 420 tracks various kinds of information to implement its functionality. Some of this information may be useful to other entities, so the voter core 420 provides the information to the voting status service 330.
  • the voting status service 330 receives various voting statistics from the voter core 420, including, for example, the progress of each ballot through the voter core 420, such as whether the ballot arrived after the quorum arrived and whether it was excluded or chosen in the collation stage 530.
  • the voter core 420 may also provide information to the voting status service 330 regarding whether a replica sends a duplicate ballot or one which is early, and so forth.
  • The voting status service 330 collects this information and maintains various kinds of moving averages, called status conditions. Different entities may then register to be informed when a given status condition crosses a threshold. For example, a security management system may want to know if a given replica (or host or domain) is frequently giving bad data (i.e., its ballots are being excluded by non-random exclusion operators). A performance management system may want to track the performance of the system (e.g., whether ballots from particular replicas are frequently late).
  • the voting status service 330 may provide information regarding group communication systems. Group communication systems that provide virtual synchrony may suffer severe performance degradations when even a single group member is overloaded. It is typically very difficult or impossible for such a system to exclude a slow member without knowing how the group is being used by the client application or other information not generally available to the system.
  • the voting status service 330 may, however, provide the needed information. For example, the voting status service 330 may send an alert to the group communication system when it receives replies from a particular member that are sufficiently late compared to the others over some recent moving average of votes.
  • This alert can be based, for example, on the following conditions:
      a) a given replica in a given group is late "too often";
      b) any replica in a given group is late "too often";
      c) any replica on a given host is late "too often";
      d) too many replicas on any one host are late "too often";
      e) too many replicas in any one domain are late "too often";
    where "too often" is a moving average that the subscriber can parameterize. Given such an alert, then, the group communication system can exclude the slow member to improve performance.
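  • A status condition of the kind described above, for a single replica, might be sketched as a windowed lateness rate with subscribers that are notified when the rate crosses a threshold. The class name, the fixed-size window, and the callback interface are assumptions; the description does not specify how the moving average is computed.

```python
from collections import deque


class StatusCondition:
    """Moving average of 'late' observations over the last `window` votes."""

    def __init__(self, window, threshold):
        self.samples = deque(maxlen=window)  # oldest samples fall off automatically
        self.threshold = threshold           # the subscriber's "too often"
        self.subscribers = []
        self.above = False

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def record(self, late):
        self.samples.append(1 if late else 0)
        rate = sum(self.samples) / len(self.samples)
        crossed = rate > self.threshold
        # Notify only on the upward crossing, not on every late ballot.
        if crossed and not self.above:
            for callback in self.subscribers:
                callback(rate)
        self.above = crossed
        return rate


alerts = []
condition = StatusCondition(window=4, threshold=0.5)
condition.subscribe(alerts.append)
for late in [False, True, True, True]:
    condition.record(late)
```

  • In this run the rate first exceeds 0.5 on the third vote, so exactly one alert is delivered; a group communication system subscribed in this way could then exclude the slow member.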
  • the voter core 420 may also handle group membership changes in the group communication system. When a given member of the server group fails, that group member usually cannot send a ballot to the voting system 130. The group communication system typically discards any messages after having declared the member to have failed. In this case, upon receipt of a failure notification, the voter core 420 may safely reduce the number of ballots required to achieve a quorum (e.g., to meet 50%) in the quorum stage 510. If, however, the voter core 420 is past the quorum stage 510 and a ballot was already received (i.e., before the failure notification), the voter core 420 may use that ballot as it normally would.
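  • The quorum adjustment on a failure notification might be sketched as follows. The class and method names are hypothetical, and keeping a ballot that arrived before the failure notification follows the behavior described above for votes already past the quorum stage.

```python
import math


class QuorumTracker:
    """Percentage quorum that shrinks when group members are declared failed."""

    def __init__(self, members, percent):
        self.members = set(members)
        self.percent = percent
        self.ballots = {}

    def required(self):
        return math.ceil(len(self.members) * self.percent / 100)

    def on_ballot(self, member, value):
        self.ballots[member] = value

    def on_failure(self, member):
        # The group shrinks; ballots already received are kept and used as usual.
        self.members.discard(member)

    def quorum_met(self):
        return len(self.ballots) >= self.required()


tracker = QuorumTracker({"a", "b", "c", "d"}, percent=50)
tracker.on_ballot("a", 1)
met_before = tracker.quorum_met()  # 1 ballot of 2 required
tracker.on_failure("c")
tracker.on_failure("d")
met_after = tracker.quorum_met()   # 1 ballot of 1 required
```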
  • the group communication system may inform the voter core 420 of the invocation sequence number on which the new member is expected to first vote. Alternatively, the voter core 420 may simply wait for the first ballot from that member to arrive.
  • Figs. 8A and 8B are flowcharts of system processing for voting on a message in an implementation consistent with the present invention.
  • the following description will assume that a single client 110 (Fig. 1) sends a request that is multicast to multiple replicated servers 120.
  • the description also applies to multiple replicated clients that send requests to one or more servers.
  • The unmarshal module 410 (Fig. 4) receives each of the messages [step 805] (Fig. 8A) and reads the message's header to obtain, for example, the interface name, the method name, and the direction (i.e., reply or request) [step 810]. Using this information, the unmarshal module 410 accesses the lookup table 650 to determine the types of application-level parameters included in the message [step 815].
  • Prior to access by the unmarshal module 410, the preload modules 630 and 640 load parameter types into the lookup table 650. The preload modules 630 and 640 may do this at compile time so that this information will be available when needed by the unmarshal module 410. If the lookup table 650 does not have the information on a particular interface, the unmarshal module 410 may directly access the repository 620. If the repository 620 also lacks the information, the voter 310 may throw an exception.
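  • The lookup sequence described above (preloaded table first, repository second, exception last) might be sketched as follows. Plain dictionaries stand in for the lookup table 650 and the repository 620, and the key layout, the exception name, and the "Sensor" interface are hypothetical.

```python
class UnknownInterface(Exception):
    """Neither the lookup table nor the repository describes the message."""


def parameter_types(interface, method, direction, lookup_table, repository):
    """Return the parameter types for one message header."""
    key = (interface, method, direction)
    if key in lookup_table:   # preloaded entries, the fast path
        return lookup_table[key]
    if key in repository:     # fallback when the table lacks the interface
        return repository[key]
    raise UnknownInterface(f"no parameter types for {key}")


lookup_table = {("Sensor", "read", "reply"): ["double"]}
repository = {("Sensor", "reset", "request"): ["long"]}
types = parameter_types("Sensor", "reset", "request", lookup_table, repository)
```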
  • Once the unmarshal module 410 determines the parameter types, it extracts the parameters from the message [step 820] and converts them, if necessary, to the language used by the voter core 420 [step 825]. For example, the unmarshal module 410 may map the parameters from CORBA Interface Definition Language (IDL) types to their corresponding Java types. The unmarshal module 410 then sends the parameters and the parameter types to the voter core 420 [step 830].
  • the voter core 420 votes on the parameters based on the current voting policy supplied by the vote manager 690 [step 835] (Fig. 8B). As described above, the voter core 420 may perform the voting in many different ways, such as taking the median, mean, or mode value or using weighted or random voting, to generate a voted set of parameters. Once the voting has been completed, the voter core 420 may send voting statistics for tracking by the voting status service 330 (Fig. 3) [step 840].
  • the voter core 420 sends the voted set of parameters and the parameter types to the marshal module 430 [step 845].
  • the marshal module 430 uses this information to construct a message that it transmits to the client 110 [steps 850 and 855].
  • the marshal module 430 may also provide a confidence value to the client 110 [step 860].
  • The confidence value might be generated by the voter core 420 and indicate how good the outcome was. For example, the value may depend on: the number of values that were "equal to" (for integers) or "close to" (for floating point values) the other values, the arrival distribution (jitter or standard deviation) of the messages, etc. This would permit the client to decide how to use a reply based on how good it is perceived to be.
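  • One possible confidence measure, covering only the "equal to" and "close to" criterion mentioned above, is the fraction of ballots that agree with the chosen value. The function name, the tolerance parameter, and the choice of a simple fraction are assumptions; arrival jitter is not modeled here.

```python
def confidence(values, chosen, tolerance=0.0):
    """Fraction of ballots within `tolerance` of the chosen value.

    tolerance=0.0 gives the "equal to" test suggested for integers; a small
    positive tolerance gives a "close to" test for floating point values.
    """
    if not values:
        return 0.0
    agreeing = sum(1 for v in values if abs(v - chosen) <= tolerance)
    return agreeing / len(values)
```

  • A client could, for instance, treat a reply with low confidence as advisory and retry, although how the value is used is left to the client.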
  • Fig. 9 is an exemplary diagram of a system 900 in which systems and methods consistent with the present invention may be implemented.
  • the system 900 includes multiple replicated clients [1, . . . , M] 910 communicating with multiple replicated servers [1, . . . , N] 920 via a voting system 930.
  • the clients 910, servers 920, and voting system 930 may be implemented as one or more devices and may communicate via any communications medium (e.g., by wired or wireless communication; via a network, such as the Internet; via a storage device; etc.).
  • the voting system 930 is similar in structure and operation to the voting system 130 described above with regard to Figs. 1-8B. In this case, however, the voting system 930 receives multiple replicated requests for service from the replicated clients 910. The voting system 930 votes on the requests in the manner described above to generate a request message for sending to the servers 920. The voting system 930 multicasts the request message to the servers 920 and receives replicated replies. The voting system 930 votes on the replies to generate a reply message for sending to the replicated clients 910. The voting system 930 multicasts the reply message to the clients 910 to satisfy their requests.
  • Fig. 10 is an exemplary diagram of a system 1000 in which systems and methods consistent with the present invention may be implemented.
  • The system 1000 includes multiple replicated clients [1, . . . , M] 1010 communicating with multiple replicated servers [1, . . . , N] 1020 via multiple replicated voting systems [1, . . . , L] 1030.
  • the clients 1010, servers 1020, and voting systems 1030 may be implemented as one or more devices and may communicate via any communications medium (e.g., by wired or wireless communication; via a network, such as the Internet; via a storage device; etc.).
  • the voting systems 1030 are similar in structure and operation to the voting systems 130 and 930 described above with regard to Figs. 1-9. In this case, however, each of the voting systems 1030 receives all messages (requests/replies) in the same order to eliminate the need to synchronize between them. In the case of random voting, it may be necessary to seed all of the replicated voting systems 1030 with the exact same random seed, at least on a per-connection (client-server relationship) basis. The same may be true for weighted voting. Other restrictions may also apply so that multiple replicated voting systems [1, . . . , L] 1030 process each identical ballot in the same deterministic manner.
  • the voting system 130 may take the form of a gateway within the client-server request-reply path.
  • a normal CORBA client may obtain an object reference from its ORB by calling, for example, bind() that contacts a naming service for it.
  • the ORB may create a proxy (stub) object and then return a pointer or reference (depending on the language) for that proxy to the client. This is what the client considers the object reference.
  • the client may call the proxy object just like it would a local object.
  • the gateway approach modifies this by having the client not call the ORB's bind(), but rather a similar routine, such as a routine called connect().
  • This routine "forges" the object reference to "point" to the gateway (e.g., using the correct Internet protocol (IP) address and port).
  • the method connect() may call a method of the client-side ORB, such as a string_to_object() method that creates a proxy for the client, a pointer or reference to which is returned to the client as the return value from connect().
  • the client can now make invocations on this object reference just as it would one returned by the ORB's bind(), and the IIOP messages emitted by the ORB are delivered to the gateway.
  • the voting system 130 is not, however, restricted to using only the gateway approach.
  • the voting system 130 may use other approaches so long as it is passed a marshaled message or a vector of application-level parameters (i.e., iiop_msg[ ⁇ ...N] o ⁇ para ⁇ ik).
  • CORBA interceptors may be used to provide the system builder the ability to insert code at predefined locations in the ORB, either after or before the linearized request buffer has been created.
  • a tool taking as inputs the IDL and VDL, to generate a voting algorithm in a smart proxy (or smart stubs) (i.e., an optional layer containing user-level code above the ORB's proxy but below the client).
  • Systems and methods consistent with the present invention provide voting at a high level of abstraction, generality, and portability.
  • middleware technology such as CORBA or quality objects (QuO)
  • these systems and methods can provide a powerful, well-specified, flexible, and adaptable tool for transmitting correct requests and replies between clients and servers, with wide-ranging applications including active replication, security, and N-version programming.
  • the ability of the systems and methods to monitor and report trends in failure modes has potential applications in performance, dependability, and security.
  • the voting system may use an application program interface (API) to define the voting system operations and pass a notification callback.
  • the API may provide a direct mapping of the voting system stages or higher-level functionality.
  • One such API might allow the specification of a failure mode, the maximum number of faults to be tolerated, and the initial number of replicas for the service.
  • Another API might offer simple choices, such as majority, all, consensus, etc.

Abstract

A voting system includes an unmarshal module (410), a voter core (420), and a marshal module (430). The unmarshal module (410) receives multiple messages and extracts at least one parameter from each of the messages. The voter core (420) votes on each of the parameters based on a current voting policy. The marshal module (430) constructs a message using the voted parameters and outputs the constructed message.

Description

SYSTEMS AND METHODS FOR VOTING ON MULTIPLE MESSAGES
A. Technical Field
The present invention relates generally to distributed systems and, more particularly, to systems and methods that vote on multiple requests to or replies from a server.
B. Background
The explosive growth of the Internet in the last few years has thrust distributed systems programs from the realm of research projects and hand-crafted specialty applications to common, almost ubiquitous, applications. Simultaneously, computer hardware has become much cheaper. These two trends have resulted in the rapid growth of programs being distributed across corporate networks, intranets, and extranets. As a result, society is relying much more heavily on services provided by distributed programs.
Replication is a commonly used technique to increase the availability of these services and lower the delay in accessing them. Replication typically includes replicating clients and/or servers so that requests and replies may be generated and transmitted in an expedited manner.
One major form of replication includes active replication. The active replication scheme multicasts a service request from a client to all server replicas atomically and in order. Each replica then processes the request and sends back a reply. One of these replicas' replies must be chosen for the client. This process of choosing one reply from many is called "voting." The client uses the reply chosen by the voting process as needed (e.g., the reply may contain data the client has requested from the server). Current support for voting is quite limited, however, in both research projects and in commercial products and standards. This may be the result of the way such active replication schemes tend to handle the parameters of requests and replies, namely, as opaque blocks of data that can only be compared for equality. As a result, voting among active replicas generally consists of delivering a reply from the server replicas when some quorum of the servers (e.g., one, a majority, or all) have sent replies exactly identical byte-for-byte to the one being delivered. For the purpose of this voting procedure, the return value (e.g., "out parameters" and "inout parameters") are all included in the reply without distinction. In recent systems that also support replicated clients, a similarly limited functionality may be provided to vote on client requests before delivering one to the server.
In the absence of any voting algorithm capable of distinguishing the different parameters in a message, the technique of byte-by-byte comparison depends on properties of data marshaling that in some cases are not guaranteed by the relevant standards. These conventional systems often ignore important semantics of the components of the data being voted on, which depend on the signature of the method being implemented (i.e., the order and data types of parameters and return value). These systems also tend to be very inflexible in dealing with floating point values, as well as inexact voting techniques. Moreover, little progress has been made toward the ability to change voting algorithms at runtime in response to changing conditions or requirements and tradeoffs.
As a result, a need exists for a system that overcomes the deficiencies in the conventional systems and performs voting at a high level of abstraction, generality, and portability.
SUMMARY
Systems and methods consistent with the present invention address this need by providing a voting system that provides a powerful, well-specified, flexible, and adaptable tool for transmitting correct requests and replies between clients and servers using wide-ranging applications and supporting middleware standards and products.
In accordance with the purpose of the invention as embodied and broadly described herein, a system includes an unmarshal module, a voter core, and a marshal module. The unmarshal module receives multiple messages and extracts at least one parameter from each of the messages. The voter core votes on each of the parameters based on a current voting policy. The marshal module constructs a message using the voted parameters and outputs the constructed message.
In another implementation consistent with the present invention, a portable voting system usable in a plurality of different types of middleware systems includes an unmarshal module, a voter core, and a marshal module. The unmarshal module receives multiple messages in a first format, translates the messages to a second format, and extracts at least one parameter from each of the messages. The voter core votes on the parameters based on a voting policy. The marshal module generates a message using the voted parameters, translates the generated message from the second format to a third format, and outputs the generated message.
In yet another implementation consistent with the present invention, a voting system includes a vote manager and a voter. The vote manager obtains a current voting policy from multiple voting policies. The voter receives multiple messages directed to at least one destination, extracts at least one parameter from each of the messages, votes on each of the parameters based on the current voting policy, constructs a message using the voted parameters, and outputs the constructed message to the at least one destination.
In a further implementation consistent with the present invention, a method for voting includes receiving multiple ballots having associated values; excluding at least one of the ballots based on the values associated with the at least one ballot after a predetermined number of the ballots have been received; and selecting a value from the values associated with ones of the ballots remaining after the exclusion.
In another implementation consistent with the present invention, a voting system includes a voter and at least one interface. The voter performs operations, including receiving multiple messages directed to at least one destination, extracting at least one parameter from each of the messages, voting on each of the parameters based on a current voting policy, constructing a message using the voted parameters, and outputting the constructed message to the at least one destination. The interface permits at least one external entity to control the operations of the voter.
In yet another implementation consistent with the present invention, a voting state machine includes a quorum component, an exclusion component, and a collation component. The quorum component waits for a number of ballots to arrive. Each of the ballots includes at least one value. The exclusion component excludes at least one of the ballots after the number of the ballots arrive. The collation component selects a value from the values included in the non-excluded ballots as representative of the values included in the ballots.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate the invention and, together with the description, explain the invention. In the drawings,
Fig. 1 is a diagram of an exemplary system in which systems and methods consistent with the present invention may be implemented;
Fig. 2 is an exemplary diagram of a computer device upon which systems and methods consistent with the present invention may be implemented;
Fig. 3 is a diagram of an exemplary voting system consistent with the present invention;
Fig. 4 is a diagram of an exemplary voter of Fig. 3 in an implementation consistent with the present invention;
Fig. 5 is a flowchart of voting stages entered into by the voter core of Fig. 4 in an implementation consistent with the present invention;
Fig. 6 is a diagram of the voter support of Fig. 3 in an implementation consistent with the present invention;
Fig. 7 is a diagram of the voter status service of Fig. 3 in an implementation consistent with the present invention;
Figs. 8A and 8B are flowcharts of system processing for voting on a message in an implementation consistent with the present invention;
Fig. 9 is an exemplary diagram of an alternate system in which systems and methods consistent with the present invention may be implemented; and
Fig. 10 is an exemplary diagram of another alternate system in which systems and methods consistent with the present invention may be implemented.
DETAILED DESCRIPTION
The following detailed description of the invention refers to the accompanying drawings. The same reference numbers in different drawings identify the same or similar elements. Also, the following detailed description does not limit the invention. Instead, the scope of the invention is defined by the appended claims and equivalents. Systems and methods consistent with the present invention provide a voting mechanism that allows voting policies to be changed at runtime and works with presentation layers, such as those for distributed objects, message-oriented middleware, and other kinds of middleware. The systems and methods support both dynamic collation voting, where voting algorithms can be influenced by group membership changes, and static collation voting, where voting algorithms cannot be influenced by group membership changes.
EXEMPLARY SYSTEM
Fig. 1 is an exemplary diagram of a system 100 in which systems and methods consistent with the present invention may be implemented. The system 100 includes a client 110 communicating with multiple replicated servers [1, . . . , N] 120 via a voting system 130. The client 110, servers 120, and voting system 130 may be implemented as one or more devices and may communicate via any communications medium (e.g., by wired or wireless communication; via a network, such as the Internet; via a storage device; etc.).
The client 110 may include any device or object that can communicate with a server 120, such as a personal computer, a laptop, a personal digital assistant (PDA), or a process running on one of these devices. The client 110 may request data from a server 120. Each of the servers 120 may include any conventional server device or object. The server 120, for example, may supply data for use by the client 110. A single client has been shown for simplicity. One skilled in the art would recognize that the system 100 may include any number of clients 110 and servers 120.
The voting system 130 may include a device, such as a computer, or a process operating on one or more devices. The voting system 130 receives replies from each of the replicated servers 120, generates a reply from the received replies based on a current voting policy, and transmits the reply to the client 110.
Fig. 2 is an exemplary diagram of a device 200 that may incorporate client 110, one or more of the servers 120, and/or the voting system 130 in one implementation consistent with the present invention. One skilled in the art would recognize that other configurations of the device 200 are possible.
The device 200 may include a bus 210, a processor 220, a main memory 230, a read only memory (ROM) 240, a storage device 250, an input device 260, an output device 270, and a communication interface 280. The bus 210 permits communication among the components of the device 200.
The processor 220 may include any type of conventional processor or microprocessor that interprets and executes instructions. The main memory 230 may include a random access memory (RAM) or another dynamic storage device that stores information and instructions for execution by the processor 220. The ROM 240 may include a conventional ROM or another type of static storage device that stores static information and instructions for use by the processor 220. The storage device 250 may include a magnetic and/or optical recording medium and its corresponding drive. The input device 260 may include any conventional mechanism that permits an operator to input information to the device 200, such as a keyboard, a mouse, a pen, voice recognition and/or biometric mechanisms, etc. The output device 270 may include any conventional mechanism that outputs information to the operator, including a display, a printer, a pair of speakers, etc. The communication interface 280 may include any transceiver-like mechanism that enables the device 200 to communicate with other devices and/or systems. For example, the communication interface 280 may include mechanisms for communicating with another device or system via a network.
As will be described in detail below, a device 200, consistent with the present invention, performs tasks in response to processor 220 executing sequences of instructions contained in a computer-readable medium, such as memory 230. A computer-readable medium may include one or more memory devices and/or carrier waves.
Execution of the sequences of instructions contained in the computer-readable medium causes processor 220 to perform processes that will be described later. Alternatively, hardwired circuitry may be used in place of or in combination with software instructions to implement processes consistent with the present invention. Thus, the present invention is not limited to any specific combination of hardware circuitry and software.
EXEMPLARY VOTING SYSTEM
Fig. 3 is an exemplary diagram of a voting system 130 consistent with the present invention. The voting system 130 may include a voter 310, voter support 320, and voting status service 330. The voter 310 performs the voting process.
A. Voter
Fig. 4 is an exemplary diagram of the voter 310. The voter 310 may include an unmarshal module 410, a voter core 420, and a marshal module 430. The unmarshal module 410 may include conventional mechanisms that receive network messages from each of the replicas and convert them into sets of parameters in a language used by the voter core 420. For example, in a Common Object Request Broker Architecture (CORBA) system, the unmarshal module 410 may receive network messages "flattened" for transmission into CORBA General Inter-Object Request Broker (ORB) protocol (GIOP)'s common data representation (CDR) format that is used with Internet Inter-ORB protocol (IIOP) and convert them into sets of parameters in the voter core 420 language, such as Java. The marshal module 430 may include conventional mechanisms that construct a message from the set of parameters chosen or constructed by the voter core 420 and transmit the message to the client in the language in which it was received or the language used by the client. The voter core 420 selects or constructs one set of parameters, from the received sets of parameters, to be returned to the client. The voter core 420 may receive inputs from the voter support 320 that inform it of the current voting policy, as well as notifying it when a failure occurs. The voter core 420 may output status information on the voting to the voting status service 330 to help detect performance problems, potential intrusions, or other anomalies.
Fig. 5 is a flowchart of voting stages entered into by the voter core 420 in an implementation consistent with the present invention. For the description that follows, a "ballot" is a request or reply message sent by a single replica. A "vote" is the process of choosing (or constructing) one message from among all of the ballots.
The voter core 420 may operate in three stages in the processing of a single vote in one implementation consistent with the present invention. At each stage, the current voting policy dictates the action of the voter core 420. The three stages include a quorum stage 510, an exclusion stage 520, and a collation stage 530. Other stages may be possible.
While in the quorum stage 510, the voter core 420 waits for enough ballots to arrive to begin the actual voting process. For example, the voter core 420 may wait for:
a) N ballots
b) All but N ballots
c) x% of the ballots
d) N identical ballots
The first operation involves static voting, while the second, third, and fourth involve dynamic voting.
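As an illustration only, the four quorum rules listed above could be checked with a function like the following Python sketch. The function name `quorum_met`, the `(kind, amount)` policy tuple, and the choice to round a percentage up to a whole ballot are assumptions made for this example, not part of the described system.

```python
import math


def quorum_met(policy, received, group_size):
    """Return True once enough ballots have arrived for the quorum rule.

    policy     -- hypothetical (kind, amount) pair, e.g. ("count", 4)
    received   -- list of ballot values that have arrived so far
    group_size -- number of replicas expected to send a ballot
    """
    kind, amount = policy
    if kind == "count":        # a) N ballots
        return len(received) >= amount
    if kind == "all_but":      # b) all but N ballots
        return len(received) >= group_size - amount
    if kind == "percent":      # c) x% of the ballots (rounded up: an assumption)
        return len(received) >= math.ceil(group_size * amount / 100)
    if kind == "identical":    # d) N identical ballots
        return any(received.count(v) >= amount for v in received)
    raise ValueError(f"unknown quorum kind: {kind}")
```

Rules b) and c) depend on `group_size`, which is why they are affected by group membership changes while rule a) is not.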
After sufficient ballots arrive, the voter core 420 enters the exclusion stage 520. In this stage, the voter core 420 may exclude a number of ballots from further consideration to enable the voter 310 to tolerate value failures, such as failures caused by a faulty (and undetected) floating point processor or a breach in security (e.g., an adversary could break into either the voter 310 or the middleware to insert messages that look like correct request or reply messages, but contain incorrect data). For example, the voter core 420 may exclude:
a) The lowest n values
b) The highest n values
c) The furthest n values (from the median, for integers; can be high or low)
d) All values e distance from the mean (for floating point values) or median (for integers)
e) All values x sigma from the mean (for floating point values)
f) None
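The exclusion rules listed above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name `exclude`, the rule names, and the `n` and `limit` parameters are hypothetical, "sigma" is taken to be the population standard deviation, and the "distance" rule uses the mean only (the text also allows the median for integers).

```python
import statistics


def exclude(values, rule, n=0, limit=0.0):
    """Return the sorted values that survive the exclusion stage."""
    ordered = sorted(values)
    if rule == "none":
        return ordered
    if rule == "lowest":       # drop the n lowest values
        return ordered[n:]
    if rule == "highest":      # drop the n highest values
        return ordered[:len(ordered) - n]
    if rule == "furthest":     # drop the n values furthest from the median
        med = statistics.median(ordered)
        by_distance = sorted(ordered, key=lambda v: abs(v - med))
        return sorted(by_distance[:len(ordered) - n])
    if rule == "distance":     # drop values more than `limit` from the mean
        center = statistics.mean(ordered)
        return [v for v in ordered if abs(v - center) <= limit]
    if rule == "sigma":        # drop values more than `limit` sigmas from the mean
        center = statistics.mean(ordered)
        sigma = statistics.pstdev(ordered)
        return [v for v in ordered if abs(v - center) <= limit * sigma]
    raise ValueError(f"unknown exclusion rule: {rule}")
```

For instance, `exclude([10, 10, 10, 10, 50], "sigma", limit=1.5)` drops the outlier 50, because it lies two standard deviations from the mean of 18.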
After the exclusion stage 520, the voter core 420 enters the collation stage 530. In this stage, the voter core 420 chooses or constructs a value from those values that were not excluded in the exclusion stage 520. For example, the voter core 420 may:
a) Choose the median value
b) Choose the mean value (for floating point parameters)
c) Choose the mode value (i.e., the most common one)
d) Construct a value
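A minimal Python sketch of the collation choices above, using the standard `statistics` module. The function name `collate` and the rule names are assumptions for illustration. `median_low` is used so that the median is always one of the ballot values actually received; that is a design choice of this sketch, not something the text specifies.

```python
import statistics


def collate(values, rule):
    """Choose one value from the ballots that survived exclusion."""
    if rule == "median":
        # median_low always returns a member of the data, even for an
        # even number of ballots.
        return statistics.median_low(values)
    if rule == "mean":
        # The mean is generally a constructed value, not a received one.
        return statistics.mean(values)
    if rule == "mode":
        return statistics.mode(values)
    raise ValueError(f"unknown collation rule: {rule}")
```

Note that the mean is an example of a value the voter constructs rather than chooses, since no replica need have sent it.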
In some cases, it may be useful for the voter core 420 to choose the "best" parameter for each parameter in a message, then combine these "best" values into one request/reply message. In this case, the actual vector selected by the voter core 420 may not have been generated by any client or server. For example, if server [1] returned {1,2} and server [2] returned {3,4}, the voter core 420 might create a reply message of {3,2} by choosing the "best" of each parameter. The current voting policy might specify what is best.
Other types of voting may also be possible. For example, the voting policy may instruct the voter core 420 to perform random voting or weighted voting operations. Random voting operations may be performed by the voter core 420 at any or all of the three stages to help thwart an adversary. For example, in the quorum stage 510, the voter core 420 may be instructed to wait for a random percentage of the maximum ballots (i.e., the size of the group sending the ballots) to arrive. This number could change with each vote or could be the same for all votes the voter core 420 manages. In the exclusion stage 520, the voter core 420 may exclude a random number or percentage of the ballots or all but a random number of the ballots. This number could potentially be reset at different time intervals. In the collation stage 530, the voter core 420 may randomly choose a value. This may occur in some applications where the particularly bad data has already been discarded in the exclusion stage 520.
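The per-parameter voting described above, in which the "best" value of each parameter is combined into one message, might be sketched as follows. The function name `vote_per_parameter` and the idea of passing one chooser function per parameter position are assumptions for this example. Using `max` for the first parameter and `min` for the second is merely one hypothetical policy that reproduces the {3,2} result from the text.

```python
def vote_per_parameter(ballots, choosers):
    """Vote on each parameter position independently.

    ballots  -- list of parameter vectors, one per replica
    choosers -- one function per parameter position; each takes the list
                of values for that position and returns the "best" one
    """
    columns = zip(*ballots)  # regroup the values by parameter position
    return [choose(list(column)) for choose, column in zip(choosers, columns)]


# Server [1] returned {1,2} and server [2] returned {3,4}.
reply = vote_per_parameter([[1, 2], [3, 4]], [max, min])
```

Here `reply` is `[3, 2]`, a vector that neither server produced.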
Weighted voting improves security by giving different replicas varying amounts of trust in the different stages of the voting. For example, in the quorum stage 510, instead of waiting for a given number of ballots, the voter core 420 may wait for some number of "points," where each replica is assigned a different number of points. When a ballot arrives, that ballot's arrival counts towards the number of points dictated by its quorum weighting. In the collation stage 530, the voter core 420 may perform its operations after the remaining ballots are expanded based on a weighting. For example, suppose there are ballots with values {2,3,4} from replicas 1-3, respectively, and the weighting is {1,2,2}. After expansion, the ballots will be {2,3,3,4,4} and this is what the voter core 420 operates upon to choose a value, such as median, mode, mean, or random.
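The weighting example above can be reproduced with a short Python sketch. The function names `expand_by_weight` and `quorum_points_met`, and the replica identifiers used in the usage lines, are hypothetical.

```python
import statistics


def expand_by_weight(values, weights):
    """Repeat each ballot value according to its collation weight."""
    expanded = []
    for value, weight in zip(values, weights):
        expanded.extend([value] * weight)
    return expanded


def quorum_points_met(arrived, points, needed):
    """Weighted quorum: sum the points of the replicas whose ballots arrived."""
    return sum(points[replica] for replica in arrived) >= needed


# Ballots {2,3,4} from replicas 1-3 with weighting {1,2,2}.
ballots = expand_by_weight([2, 3, 4], [1, 2, 2])
chosen = statistics.median(ballots)
```

After expansion, `ballots` is `[2, 3, 3, 4, 4]` and the median `chosen` is 3. Any of the collation operators could be applied to the expanded list in the same way.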
At each of the three stages 510-530, the voter core 420 may throw exceptions as directed by the current voting policy. In this case, the voter 310 may return an exception to the client that would normally receive the vote. For example, in the quorum stage 510, the voter core 420 may throw an exception if the quorum is not met within a predetermined period of time. In the exclusion stage 520, the voter core 420 may throw an exception if too many members (i.e., replicas) are excluded. In the collation stage 530, the voter core 420 may throw an exception if the number of ballots that included the voted-upon value was too small.
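One way to picture the per-stage exceptions is the Python sketch below. The exception classes, the check functions, and their parameters (`timeout`, `max_excluded`, `min_support`) are all hypothetical names chosen for this illustration; the text only says that the policy directs when an exception is thrown.

```python
class QuorumTimeout(Exception):
    """The quorum was not met within the allowed time."""


class TooManyExcluded(Exception):
    """The exclusion stage removed more ballots than the policy allows."""


class WeakAgreement(Exception):
    """Too few ballots contained the voted-upon value."""


def check_quorum(met, elapsed, timeout):
    if not met and elapsed > timeout:
        raise QuorumTimeout(f"no quorum after {elapsed} time units")


def check_exclusion(total, kept, max_excluded):
    if total - kept > max_excluded:
        raise TooManyExcluded(f"{total - kept} ballots excluded")


def check_collation(values, chosen, min_support):
    support = values.count(chosen)
    if support < min_support:
        raise WeakAgreement(f"only {support} ballots matched {chosen!r}")
```

In a gateway arrangement, the voter would catch these and return a corresponding exception to the client in place of the voted reply.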
B. Voter Support
Fig. 6 is an exemplary diagram of voter support 320 in an implementation consistent with the present invention. In the figure, thin lines denote runtime interactions that may occur synchronously to the arrival of a ballot, dashed lines denote runtime interactions that may occur asynchronously to the arrival of a ballot, and thick lines denote interactions that may occur at compile time.
Voter support 320 may include server files 610, repository 620, preload modules 630 and 640, lookup table 650, voting description language (VDL) files 660, compiler 670, analysis tools 680, and vote manager 690. The server files 610 and repository 620 store information regarding types of application-level parameters used by particular interfaces. The preload modules 630 and 640 access the server files 610 and repository 620, respectively, to load parameter types into the lookup table 650. To save time, the preload modules 630 and 640 may load the parameter types at compile time rather than runtime.
The lookup table 650 stores the parameter types from the server files 610 and the repository 620. The unmarshal module 410 may access the lookup table 650 to determine the types of application-level parameters in a received request or reply. If the lookup table 650 does not have the information, the unmarshal module 410 may directly access the repository 620.
The VDL files 660 store voting policies for use by the voter core 420. To perform the following voting policies, the following exemplary VDL code fragments (preceded by English translations) may be used:
a) "wait until 4 ballots have arrived, exclude the lowest one, then choose a random one from those left"
   quorum 4 exclusion lowest 1 collation random
b) "wait until half of the ballots have arrived, exclude a random one, then choose the median of those left"
   quorum 50 percent exclusion random 1 collation median
c) "wait until all but 2 of the messages have arrived, exclude the two highest-valued ones, then choose the most common one left"
   quorum all but 2 exclusion highest 2 collation mode
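The full VDL grammar is not given here, so the following Python sketch only assumes what the three fragments above show: a policy is a sequence of whitespace-separated tokens in which the keywords `quorum`, `exclusion`, and `collation` each introduce the arguments for one stage. The function name `parse_policy` and the dictionary output are hypothetical.

```python
STAGE_KEYWORDS = ("quorum", "exclusion", "collation")


def parse_policy(text):
    """Split a VDL-like fragment into per-stage argument lists."""
    stages = {}
    current = None
    for token in text.split():
        if token in STAGE_KEYWORDS:
            current = token
            stages[current] = []
        elif current is None:
            raise ValueError(f"unexpected token before any stage: {token}")
        else:
            stages[current].append(token)
    return stages


policy = parse_policy("quorum all but 2 exclusion highest 2 collation mode")
```

For fragment c), `policy` maps `quorum` to `["all", "but", "2"]`, `exclusion` to `["highest", "2"]`, and `collation` to `["mode"]`. A real compiler would go on to validate and interpret these arguments.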
The compiler 670 reads a voting policy file from the VDL files 660 and loads it into the vote manager 690. The analysis tools 680 analyze the voting policy file to provide the vote manager 690 with quantitative tradeoffs of the policy, such as its costs and benefits under various operating conditions, and with various group memberships and performance. Based on the inputs from the compiler 670 and the analysis tools 680, the vote manager 690 provides a current voting policy to the voter core 420, so that the voter core 420 may determine the appropriate parameters to supply to the marshal module 430.
C. Voting Status Service
Fig. 7 is an exemplary diagram of voting status service 330 in an implementation consistent with the present invention. The voter core 420 tracks various kinds of information to implement its functionality. Some of this information may be useful to other entities, so the voter core 420 provides the information to the voting status service 330.
The voting status service 330 receives various voting statistics from the voter core 420, including, for example, the progress of each ballot through the voter core 420, such as whether the ballot arrived after the quorum arrived and whether it was excluded or chosen in the collation stage 530. The voter core 420 may also provide information to the voting status service 330 regarding whether a replica sends a duplicate ballot or one which is early, and so forth.
The voting status service 330 collects this information and maintains various kinds of moving averages, called status conditions. Different entities may then register to be informed when a given status condition crosses a threshold. For example, a security management system may want to know if a given replica (or host or domain) is frequently giving bad data (i.e., its ballots are being excluded by non-random exclusion operators). A performance management system may want to track the performance of the system (e.g., whether ballots from particular replicas are frequently late).
The voting status service 330 may provide information regarding group communication systems. Group communication systems that provide virtual synchrony may suffer severe performance degradations when even a single group member is overloaded. It is typically very difficult or impossible for such a system to exclude a slow member without knowing how the group is being used by the client application or other information not generally available to the system. The voting status service 330 may, however, provide the needed information. For example, the voting status service 330 may send an alert to the group communication system when it receives replies from a particular member that are sufficiently late compared to the others over some recent moving average of votes. This alert can be based, for example, on the following conditions:
a) a given replica in a given group is late "too often;"
b) any replica in a given group is late "too often;"
c) any replica on a given host is late "too often;"
d) too many replicas on any one host are late "too often;"
e) too many replicas in any one domain are late "too often;"
where the "too often" is a moving average that the subscriber can parameterize. Given such an alert, then, the group communication system can exclude the slow member to improve performance.
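A status condition of the first kind above (a given replica is late "too often") might be tracked with a sliding window, as in this Python sketch. The class name `LatenessMonitor`, the fixed-size window as the form of moving average, and the callback signature are assumptions for illustration.

```python
from collections import deque


class LatenessMonitor:
    """Track, per replica, the fraction of recent ballots that were late."""

    def __init__(self, window, threshold, on_alert):
        self.window = window        # number of recent votes to average over
        self.threshold = threshold  # subscriber-chosen "too often" rate
        self.on_alert = on_alert    # callback registered by the subscriber
        self._history = {}

    def record(self, replica, late):
        history = self._history.setdefault(replica, deque(maxlen=self.window))
        history.append(1 if late else 0)
        rate = sum(history) / len(history)
        # Only alert once a full window of votes has been observed.
        if len(history) == self.window and rate > self.threshold:
            self.on_alert(replica, rate)


alerts = []
monitor = LatenessMonitor(window=4, threshold=0.5,
                          on_alert=lambda r, rate: alerts.append((r, rate)))
for late in (True, True, False, True):
    monitor.record("replica-1", late)
```

After the fourth vote, three of the last four ballots from `replica-1` were late, so the subscriber is called once with a rate of 0.75. The host-level and domain-level conditions could aggregate the same per-replica histories.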
The voter core 420 may also handle group membership changes in the group communication system. When a given member of the server group fails, that group member usually cannot send a ballot to the voting system 130. The group communication system typically discards any messages after having declared the member to have failed. In this case, upon receipt of a failure notification, the voter core 420 may safely reduce the number of ballots required to achieve a quorum (e.g., to meet 50%) in the quorum stage 510. If, however, the voter core 420 is past the quorum stage 510 and a ballot was already received (i.e., before the failure notification), the voter core 420 may use that ballot as it normally would.
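As a hedged illustration of how a failure notification lowers a percentage-based quorum, consider the Python sketch below. The class name `QuorumTracker` and the rounding of the required count up to a whole ballot are assumptions made for this example.

```python
import math


class QuorumTracker:
    """Recompute a percentage quorum as group members fail."""

    def __init__(self, percent, group_size):
        self.percent = percent
        self.group_size = group_size

    def required(self):
        # Rounding up is an assumption; the text does not specify it.
        return math.ceil(self.group_size * self.percent / 100)

    def member_failed(self):
        # Called on receipt of a failure notification.
        self.group_size -= 1

    def met(self, ballots_received):
        return ballots_received >= self.required()


tracker = QuorumTracker(percent=50, group_size=5)
before = tracker.required()
tracker.member_failed()
after = tracker.required()
```

With five members a 50% quorum needs three ballots (`before`); once one member is declared failed it needs two (`after`), so a vote that was stalled at two ballots can proceed.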
When a new member joins the server group, the group communication system may inform the voter core 420 of the invocation sequence number on which the new member is expected to first vote. Alternatively, the voter core 420 may simply wait for the first ballot from that member to arrive.
EXEMPLARY SYSTEM PROCESSING
Figs. 8A and 8B are flowcharts of system processing for voting on a message in an implementation consistent with the present invention. The following description will assume that a single client 110 (Fig. 1) sends a request that is multicast to multiple replicated servers 120. One skilled in the art would recognize that the description also applies to multiple replicated clients that send requests to one or more servers.
Processing begins when the client 110 sends a request that is multicast to the replicated servers 120. Each of the servers 120 services the request and returns a reply message that is received by the voting system 130. The unmarshal module 410 (Fig. 6) receives each of the messages [step 805] (Fig. 8A) and reads the message's header to obtain, for example, the interface name, the method name, and the direction (i.e., reply or request) [step 810]. Using this information, the unmarshal module 410 accesses the lookup table 650 to determine the types of application-level parameters included in the message [step 815].
Prior to access by the unmarshal module 410, the preload modules 630 and 640 load parameter types into the lookup table 650. The preload modules 630 and 640 may do this at compile time so that this information will be available when needed by the unmarshal module 410. If the lookup table 650 does not have the information on a particular interface, the unmarshal module 410 may directly access the repository 620. If the repository 620 also lacks the information, the voter 310 may throw an exception.
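The lookup sequence of steps 810 and 815, together with the fallback to the repository 620, might be sketched as below. The function name, the dictionary-based table and repository, the header field names, and the caching of repository results are all assumptions made for illustration.

```python
class UnknownInterfaceError(Exception):
    """Raised when neither the lookup table nor the repository has the entry."""


def parameter_types(header, lookup_table, repository):
    """Return the parameter types for the message described by `header`.

    `header` is assumed to be a dict with "interface", "method" and
    "direction" keys; both stores are assumed to be dicts keyed by that triple.
    """
    key = (header["interface"], header["method"], header["direction"])
    if key in lookup_table:
        return lookup_table[key]      # preloaded entry
    if key in repository:
        types = repository[key]
        lookup_table[key] = types     # cache for later messages (assumption)
        return types
    raise UnknownInterfaceError(key)  # the voter may throw an exception
```

For example, a reply header for a hypothetical Sensor.read method would resolve directly from the preloaded table, while an interface missing from the table is resolved from the repository on first use.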
Once the unmarshal module 410 determines the parameter types, it extracts the parameters from the message [step 820] and converts them, if necessary, to the language used by the voter core 420 [step 825]. For example, the unmarshal module 410 may map the parameters from CORBA Interface Definition Language (IDL) type to their corresponding Java types. The unmarshal module 410 then sends the parameters and the parameter types to the voter core 420 [step 830].
The voter core 420 votes on the parameters based on the current voting policy supplied by the vote manager 690 [step 835] (Fig. 8B). As described above, the voter core 420 may perform the voting in many different ways, such as taking the median, mean, or mode value or using weighted or random voting, to generate a voted set of parameters. Once the voting has been completed, the voter core 420 may send voting statistics for tracking by the voting status service 330 (Fig. 3) [step 840].
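As a rough illustration of step 835, the sketch below votes position by position on the parameters extracted from each ballot, using the median, mean, or mode named above. The function name vote, the policy strings, and the representation of a ballot as a tuple of numeric parameters are assumptions; weighted and random voting are omitted.

```python
import statistics

# Collation functions keyed by a hypothetical policy name.
COLLATORS = {
    "median": statistics.median,
    "mean": statistics.mean,
    "mode": statistics.mode,
}


def vote(ballots, policy):
    """Return the voted set of parameters.

    `ballots` is a list of equal-length tuples, one tuple of parameters per
    replica. Each parameter position is voted on independently.
    """
    collate = COLLATORS[policy]
    # zip(*ballots) groups the i-th parameter of every ballot together.
    return tuple(collate(column) for column in zip(*ballots))
```

With the ballots (10, 2.0), (10, 2.2), and (99, 2.1), a median policy yields (10, 2.1), so the single outlying value 99 does not affect the voted result.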
The voter core 420 sends the voted set of parameters and the parameter types to the marshal module 430 [step 845]. The marshal module 430 uses this information to construct a message that it transmits to the client 110 [steps 850 and 855]. The marshal module 430 may also provide a confidence value to the client 110 [step 860]. The confidence value might be generated by the voter core 420 and indicate how good the outcome was. For example, the value may depend on: the number of values that were "equal to" (for integers) or "close to" (for floating point values) the other values, the arrival distribution (jitter or standard deviation) of the messages, etc. This would permit the client to decide how to use a reply based on how good it is perceived to be.
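One possible confidence measure is sketched below. It is a simplification under stated assumptions: it reports the fraction of ballot values that agree with the voted value, using exact equality for integers and a relative tolerance for floating point values. The function name, the tolerance, and the choice to compare against the voted value (rather than against the other values, or the arrival distribution) are hypothetical.

```python
import math


def confidence(values, voted, rel_tol=1e-6):
    """Fraction of `values` that agree with the voted value (0.0 to 1.0)."""

    def agrees(value):
        # "Close to" for floating point values, "equal to" otherwise.
        if isinstance(value, float) or isinstance(voted, float):
            return math.isclose(value, voted, rel_tol=rel_tol)
        return value == voted

    return sum(1 for value in values if agrees(value)) / len(values)
```

A client receiving a confidence of 0.75 for the values 7, 7, 7, 9 with voted value 7 could, for instance, accept the reply, while a lower value might prompt a retry.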
CLIENT REPLICATION
The preceding description referred to replicated servers and indicated that replicated clients may also be possible. Fig. 9 is an exemplary diagram of a system 900 in which systems and methods consistent with the present invention may be implemented. The system 900 includes multiple replicated clients [1, . . . , M] 910 communicating with multiple replicated servers [1, . . . , N] 920 via a voting system 930. The clients 910, servers 920, and voting system 930 may be implemented as one or more devices and may communicate via any communications medium (e.g., by wired or wireless communication; via a network, such as the Internet; via a storage device; etc.).
The voting system 930 is similar in structure and operation to the voting system 130 described above with regard to Figs. 1-8B. In this case, however, the voting system 930 receives multiple replicated requests for service from the replicated clients 910. The voting system 930 votes on the requests in the manner described above to generate a request message for sending to the servers 920. The voting system 930 multicasts the request message to the servers 920 and receives replicated replies. The voting system 930 votes on the replies to generate a reply message for sending to the replicated clients 910. The voting system 930 multicasts the reply message to the clients 910 to satisfy their requests.
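The two voting passes described above can be sketched as follows. The sketch assumes a simple most-common-value policy and models each server as a callable; the names majority and relay are hypothetical and do not appear in the description.

```python
from collections import Counter


def majority(messages):
    """Return the most common message (the voting policy assumed here)."""
    return Counter(messages).most_common(1)[0][0]


def relay(client_requests, servers):
    """Vote on replicated requests, multicast the result, vote on the replies."""
    request = majority(client_requests)              # vote on the requests
    replies = [serve(request) for serve in servers]  # multicast to the servers
    return majority(replies)                         # vote on the replies
```

If two of three clients request 3 and two of three servers double the request, the reply multicast back to the clients is 6 even though one client and one server disagree.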
VOTER REPLICATION
The preceding description referred to replicated servers and clients. Replicating voting systems may also be desirable in some configurations. Fig. 10 is an exemplary diagram of a system 1000 in which systems and methods consistent with the present invention may be implemented. The system 1000 includes multiple replicated clients [1, . . . , M] 1010 communicating with multiple replicated servers [1, . . . , N] 1020 via multiple replicated voting systems [1, . . . , L] 1030. The clients 1010, servers 1020, and voting systems 1030 may be implemented as one or more devices and may communicate via any communications medium (e.g., by wired or wireless communication; via a network, such as the Internet; via a storage device; etc.).
The voting systems 1030 are similar in structure and operation to the voting systems 130 and 930 described above with regard to Figs. 1-9. In this case, however, each of the voting systems 1030 receives all messages (requests/replies) in the same order to eliminate the need to synchronize between them. In the case of random voting, it may be necessary to seed all of the replicated voting systems 1030 with the exact same random seed, at least on a per-connection (client-server relationship) basis. The same may be true for weighted voting. Other restrictions may also apply so that multiple replicated voting systems [1, . . . , L] 1030 process each identical ballot in the same deterministic manner.
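The seeding requirement for random voting can be illustrated with the sketch below. It assumes each replicated voter keeps a private pseudo-random generator per connection, seeded with a value shared by all replicas; the class name RandomVoter and the way the seed is distributed are assumptions.

```python
import random


class RandomVoter:
    """Random voting that is reproducible across replicated voting systems."""

    def __init__(self, seed):
        # A private generator, so unrelated code cannot disturb the sequence.
        self.rng = random.Random(seed)

    def vote(self, ballots):
        # Every replica must see the same ballots in the same order.
        return self.rng.choice(ballots)
```

Two replicas constructed with the same seed and fed the same ballots in the same order select the same value on every invocation, so no synchronization between them is needed.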
CORBA EXAMPLE
In a CORBA system, the voting system 130 may take the form of a gateway within the client-server request-reply path. In this case, a normal CORBA client may obtain an object reference from its ORB by calling, for example, bind() that contacts a naming service for it. The ORB may create a proxy (stub) object and then return a pointer or reference (depending on the language) for that proxy to the client. This is what the client considers the object reference.
After this, the client may call the proxy object just like it would a local object. The gateway approach modifies this by having the client not call the ORB's bind(), but rather a similar routine, such as a routine called connect(). This routine "forges" the object reference to "point" to the gateway (e.g., using the correct Internet protocol (IP) address and port). The method connect() may call a method of the client-side ORB, such as a string_to_object() method that creates a proxy for the client, a pointer or reference to which is returned to the client as the return value from connect(). The client can now make invocations on this object reference just as it would one returned by the ORB's bind(), and the IIOP messages emitted by the ORB are delivered to the gateway.
The voting system 130 is not, however, restricted to using only the gateway approach. The voting system 130 may use other approaches so long as it is passed a marshaled message or a vector of application-level parameters (i.e., iiop_msg[1...N] or param_k). For example, CORBA interceptors may be used to provide the system builder the ability to insert code at predefined locations in the ORB, either after or before the linearized request buffer has been created.
It may also be possible to create a tool, taking as inputs the IDL and VDL, to generate a voting algorithm in a smart proxy (or smart stubs) (i.e., an optional layer containing user-level code above the ORB's proxy but below the client).
CONCLUSION
Systems and methods consistent with the present invention provide voting at a high level of abstraction, generality, and portability. When added to appropriate middleware technology, such as CORBA or quality objects (QuO), these systems and methods can provide a powerful, well-specified, flexible, and adaptable tool for transmitting correct requests and replies between clients and servers, with wide-ranging applications including active replication, security, and N-version programming. Furthermore, the ability of the systems and methods to monitor and report trends in failure modes has potential applications in performance, dependability, and security.
The foregoing description of preferred embodiments of the present invention provides illustration and description, but is not intended to be exhaustive or to limit the invention to the precise form disclosed.
Modifications and variations are possible in light of the above teachings or may be acquired from practice of the invention. For example, while a series of steps has been provided with regard to Figs. 8A and 8B, the order of the steps may change in other implementations consistent with the present invention. Further, the preceding description related to client-server "pull" communication, where the client initiates a request-reply invocation with the server. Systems and methods consistent with the present invention may operate equally well with other styles of communication, such as "push" communication. "Push" communication involves a producer (or publisher) asynchronously "pushing" an event to its consumers (or subscribers). This is similar to a class of commercial software known as message-oriented middleware.
In addition, while a voting system has been described as operating upon a voting language (VDL), in other implementations consistent with the present invention, the voting system interacts with other types of applications. In this case, the voting system may use an application program interface (API) to define the voting system operations and pass a notification callback. The API may provide a direct mapping of the voting system stages or higher-level functionality. One such API might allow the specification of a failure mode, the maximum number of faults to be tolerated, and the initial number of replicas for the service. Another API might offer simple choices, such as majority, all, consensus, etc.
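A simple-choice API with a notification callback might look like the sketch below. Everything in it is hypothetical: the names SimpleVoter and agreement_needed, the interpretation of "majority" and "all" as the number of identical ballots required, and the omission of "consensus", whose meaning the description does not define.

```python
from collections import Counter


def agreement_needed(choice, replicas):
    """Number of identical ballots required for the given simple choice."""
    if choice == "majority":
        return replicas // 2 + 1
    if choice == "all":
        return replicas
    raise ValueError(f"unsupported choice: {choice}")


class SimpleVoter:
    """Collects ballots and invokes a callback once enough of them agree."""

    def __init__(self, choice, replicas, on_result):
        self.needed = agreement_needed(choice, replicas)
        self.on_result = on_result   # notification callback passed by the caller
        self.ballots = []
        self.done = False

    def submit(self, value):
        if self.done:
            return                   # late ballots are ignored in this sketch
        self.ballots.append(value)
        winner, count = Counter(self.ballots).most_common(1)[0]
        if count >= self.needed:
            self.done = True
            self.on_result(winner)
```

An application would construct the voter with its chosen policy and a callback, then submit each ballot as it arrives; the callback fires once, when the required agreement is reached.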
The scope of the invention is defined by the claims and their equivalents.

Claims

WHAT IS CLAIMED IS:
1. A voting system, comprising: an unmarshal module configured to receive a plurality of messages and extract at least one parameter from each of the messages; a voter core configured to vote on each of the parameters based on a current voting policy; and a marshal module configured to construct a message using the voted parameters and output the constructed message.
2. The voting system of claim 1, wherein the unmarshal module is configured to receive a plurality of reply messages from a plurality of replicated servers.
3. The voting system of claim 1, wherein the unmarshal module is configured to receive a plurality of request messages from a plurality of replicated clients.
4. The voting system of claim 1, wherein the unmarshal module is configured to read a header of the messages to obtain at least one of an interface name, a method name, and a message direction.
5. The voting system of claim 4, wherein the unmarshal module is configured to determine types of the parameters included in the messages from the message headers.
6. The voting system of claim 5, wherein the unmarshal module is further configured to access a lookup table to identify the types of parameters included in the messages.
7. The voting system of claim 1, wherein the unmarshal module is further configured to translate the parameters from a language in which the messages were received to a language used by the voter core.
8. The voting system of claim 1, wherein the voter core is further configured to wait for a predetermined number of messages to be received by the unmarshal module before voting on the parameters.
9. The voting system of claim 8, wherein the voter core is configured to wait for a random number of messages.
10. The voting system of claim 1, wherein the voter core is further configured to exclude a number of messages before voting on the parameters.
11. The voting system of claim 10, wherein the voter core is configured to exclude a random number of messages.
12. The voting system of claim 1, wherein the voter core is configured to select one of the parameters when voting on the parameters.
13. The voting system of claim 12, wherein the voter core is configured to randomly select one of the parameters.
14. The voting system of claim 1, wherein the voter core is configured to construct a parameter from the parameters when voting on the parameters.
15. The voting system of claim 1, wherein the voter core is configured to weight at least one of the parameters before voting on the parameters.
16. The voting system of claim 1, wherein the voter core is further configured to generate voting statistics corresponding to the voting and output the voting statistics for use by one or more external entities.
17. The voting system of claim 1, wherein the voter core is further configured to generate a confidence value based on an outcome of the voting; and wherein the marshal module is further configured to output the confidence value with the constructed message.
18. The voting system of claim 1, wherein the marshal module is further configured to translate the voted parameters from a language used by the voter core to a language in which the messages were received.
19. The voting system of claim 1, wherein the marshal module is further configured to translate the voted parameters from a language used by the voter core to a language used by a recipient of the constructed message.
20. A voting system, comprising: means for receiving a plurality of messages; means for extracting a parameter from each of the messages; means for voting on each of the parameters based on a current voting policy; means for constructing a message using the voted parameters; and means for outputting the constructed message.
21. A method for voting, comprising: receiving at least one message; extracting at least one parameter from each of the messages; voting on each of the parameters based on a current voting policy; constructing a message using the voted parameters; and outputting the constructed message.
22. The method of claim 21, wherein the receiving includes: receiving a plurality of reply messages from a plurality of replicated servers.
23. The method of claim 21, wherein the receiving includes: receiving a plurality of request messages from a plurality of replicated clients.
24. The method of claim 21, wherein the extracting includes: reading a header of the messages to obtain at least one of an interface name, a method name, and a message direction.
25. The method of claim 24, wherein the extracting further includes: determining types of the parameters included in the messages from the message headers.
26. The method of claim 25, wherein the determining includes: accessing a lookup table to identify the types of parameters included in the messages.
27. The method of claim 21, further comprising: translating the parameters from a language in which the messages were received to a language used to perform the voting.
28. The method of claim 21, further comprising: waiting for a predetermined number of messages to be received before voting on the parameters.
29. The method of claim 28, wherein the waiting includes: waiting for a random number of messages.
30. The method of claim 21, further comprising: excluding a number of messages before voting on the parameters.
31. The method of claim 30, wherein the excluding includes: excluding a random number of messages.
32. The method of claim 21, wherein the voting includes: selecting one of the parameters.
33. The method of claim 32, wherein the selecting includes: randomly choosing one of the parameters.
34. The method of claim 21, wherein the voting includes: constructing a parameter from the parameters.
35. The method of claim 21, wherein the voting includes: weighting at least one of the parameters before voting on the parameters.
36. The method of claim 21, further comprising: generating voting statistics corresponding to the voting; and outputting the voting statistics for use by one or more external entities.
37. The method of claim 21, further comprising: generating a confidence value based on an outcome of the voting; and outputting the confidence value with the constructed message.
38. The method of claim 21, further comprising: translating the voted parameters from a language used for the voting to a language in which the messages were received.
39. The method of claim 21, further comprising: translating the voted parameters from a language used for the voting to a language used by a recipient of the constructed message.
40. A computer readable medium containing instructions for causing at least one processor to perform a voting method, the method comprising: extracting at least one parameter from each of a plurality of received messages; voting on each of the parameters based on a current voting policy; and constructing a message using the voted parameters.
41. A portable voting system usable in a plurality of different types of middleware systems, comprising: an unmarshal module configured to receive a plurality of messages in a first format, translate the messages to a second format, and extract at least one parameter from each of the messages; a voter core configured to vote on the parameters based on a voting policy; and a marshal module configured to generate a message using the voted parameters, translate the generated message from the second format to a third format, and output the generated message.
42. The portable voting system of claim 41, wherein the first format and the third format are a same format.
43. A voting system, comprising: a vote manager configured to obtain a current voting policy from a plurality of voting policies; and a voter configured to receive a plurality of messages directed to at least one destination, extract at least one parameter from each of the messages, vote on each of the parameters based on the current voting policy, construct a message using the voted parameters, and output the constructed message to the at least one destination.
44. The voting system of claim 43, further comprising: a voting policy database configured to store the plurality of voting policies, the vote manager selecting one of the voting policies from the voting policy database as the current voting policy to be used by the voter.
45. A method for voting, comprising: receiving a plurality of ballots having associated values; excluding at least one of the ballots based on the values associated with the at least one ballot after a predetermined number of the ballots have been received; and selecting a value from the values associated with ones of the ballots remaining after the exclusion.
46. The method of claim 45, wherein the selecting includes: constructing a value from the values associated with the remaining ballots.
47. In a network connecting at least one client to a plurality of replicated servers via a plurality of voting systems, each of the voting systems comprising: an unmarshal module configured to receive a plurality of messages from the replicated servers and extract at least one parameter from each of the messages; a voter core configured to vote on each of the parameters based on a current voting policy; and a marshal module configured to construct a message using the voted parameters and output the constructed message to the at least one client.
48. In a network connecting a plurality of replicated clients to at least one server via a plurality of voting systems, each of the voting systems comprising: an unmarshal module configured to receive a plurality of messages from the replicated clients and extract at least one parameter from each of the messages; a voter core configured to vote on each of the parameters based on a current voting policy; and a marshal module configured to construct a message using the voted parameters and output the constructed message to the at least one server.
49. A voting system, comprising: a voter configured to perform a plurality of operations, including receiving a plurality of messages directed to at least one destination, extracting at least one parameter from each of the messages, voting on each of the parameters based on a current voting policy, constructing a message using the voted parameters, and outputting the constructed message to the at least one destination; and at least one interface configured to permit at least one external entity to control the operations of the voter.
50. A voting state machine, comprising: a quorum component configured to wait for a number of ballots to arrive, each of the ballots including at least one value; an exclusion component configured to exclude at least one of the ballots after the number of the ballots arrive; and a collation component configured to select a value from the values included in the non-excluded ballots as representative of the values included in the ballots.
51. The voting state machine of claim 50, wherein the quorum component is configured to wait for one of a predetermined number of the ballots, all but a predetermined number of the ballots, a percentage of a total number of the ballots, a predetermined number of identical ones of the ballots, and a random percentage of the total number of the ballots.
52. The voting state machine of claim 50, wherein the exclusion component is configured to exclude one of a lowest number of the values, a highest number of the values, a predetermined number of the values furthest from a median value, all of the values that are a predetermined distance from a mean or median value, all values some deviation from the mean value, and a random number or percentage of the values.
53. The voting state machine of claim 50, wherein the collation component is configured to construct a value from the values included in the non-excluded ballots.
54. The voting state machine of claim 50, wherein the collation component is configured to choose one of a median value from the values included in the non-excluded ballots, a mean value from the values included in the non-excluded ballots, a mode value from the values included in the non-excluded ballots, and a random one of the values included in the non-excluded ballots.
PCT/US2001/016830 2000-05-25 2001-05-23 Systems and methods for voting on multiple messages WO2001090851A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2001263410A AU2001263410A1 (en) 2000-05-25 2001-05-23 Systems and methods for voting on multiple messages

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US58050100A 2000-05-25 2000-05-25
US09/580,501 2000-05-25

Publications (2)

Publication Number Publication Date
WO2001090851A2 WO2001090851A2 (en) 2001-11-29
WO2001090851A3 WO2001090851A3 (en) 2003-02-06

Family

ID=24321352

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/016830 WO2001090851A2 (en) 2000-05-25 2001-05-23 Systems and methods for voting on multiple messages

Country Status (2)

Country Link
AU (1) AU2001263410A1 (en)
WO (1) WO2001090851A2 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5335324A (en) * 1985-08-30 1994-08-02 Hitachi, Ltd. Distributed processing system and method for job execution using a plurality of processors and including identification of replicated data
US5956489A (en) * 1995-06-07 1999-09-21 Microsoft Corporation Transaction replication system and method for supporting replicated transaction-based services
US6049872A (en) * 1997-05-06 2000-04-11 At&T Corporation Method for authenticating a channel in large-scale distributed systems
US6052718A (en) * 1997-01-07 2000-04-18 Sightpath, Inc Replica routing
US6233623B1 (en) * 1996-01-11 2001-05-15 Cabletron Systems, Inc. Replicated resource management system for managing resources in a distributed application and maintaining a relativistic view of state
US20020046273A1 (en) * 2000-01-28 2002-04-18 Lahr Nils B. Method and system for real-time distributed data mining and analysis for network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DATABASE DERWENT [Online] XP002953790 Retrieved from EAST Database accession no. 1998-581044 & RESEARCH DISCLOSURE vol. 415, no. 85, 20 September 1998, *
INGOLS KYLE W.: 'Availability study on dynamic voting algorithms', 05 February 2000, MIT DEPARTMENT OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE XP002953791 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013033827A1 (en) 2011-09-07 2013-03-14 Tsx Inc. High availability system, replicator and method
CN103782545A (en) * 2011-09-07 2014-05-07 多伦多证券交易所 High availability system, replicator and method
EP2754265A1 (en) * 2011-09-07 2014-07-16 TSX Inc. High availability system, replicator and method
EP2754265A4 (en) * 2011-09-07 2015-04-29 Tsx Inc High availability system, replicator and method
AU2012307047B2 (en) * 2011-09-07 2016-12-15 Tsx Inc. High availability system, replicator and method

Also Published As

Publication number Publication date
WO2001090851A3 (en) 2003-02-06
AU2001263410A1 (en) 2001-12-03

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ CZ DE DE DK DK DM DZ EE EE ES FI FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
NENP Non-entry into the national phase

Ref country code: JP