WO2000008823A1 - Load balance and fault tolerance in a network system - Google Patents

Load balance and fault tolerance in a network system

Info

Publication number
WO2000008823A1
Authority
WO
WIPO (PCT)
Prior art keywords
server
servers
work
peer group
computer
Prior art date
Application number
PCT/US1999/002154
Other languages
French (fr)
Inventor
John D. Keene
Jeffrey L. Farris
Original Assignee
E2 Software Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by E2 Software Corporation filed Critical E2 Software Corporation
Priority to AU24909/99A priority Critical patent/AU2490999A/en
Publication of WO2000008823A1 publication Critical patent/WO2000008823A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/107 Computer-aided management of electronic mailing [e-mailing]

Definitions

  • A software program comprising computer readable code on a computer readable medium is loaded onto a plurality of servers.
  • The software program additionally comprises a front-end application that allows users to access a variety of the features designed to automate load sharing and fault tolerance.
  • The techniques described here may be implemented in hardware or software, or a combination of the two.
  • The techniques are implemented in computer programs executing on one or more programmable computers, each including a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), and suitable input and output devices.
  • The programmable computers may be either general-purpose computers or special-purpose, embedded systems. In either case, program code is applied to data entered with or received from an input device to perform the functions described and to generate output information. The output information is applied to one or more output devices.
  • Each program is preferably implemented in a high level procedural or object-oriented programming language to communicate with a computer system.
  • The programs can be implemented in assembly or machine language, if desired.
  • The language may be a compiled or interpreted language.
  • Each such computer program is preferably stored on a storage medium or device (e.g., CD-ROM, hard disk, magnetic diskette, or memory chip) that is readable by a general or special purpose programmable computer for configuring and operating the computer when the storage medium or device is read by the computer to perform the procedures described.
  • The system also may be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner.

Abstract

In the present invention, a method and apparatus are provided for load balancing and fault tolerance amongst a plurality of servers on a computer network, such as the Internet or a private network. A track of work is claimed by a server. Thereafter, the server processes work pertaining to that track and also allocates blocks of work to other servers on the network that have advertised for work. In the event a server ceases to communicate with the other servers on the network, all uncompleted work assigned to the server ceasing to communicate is reallocated to another server. The group of servers comprises a peer group, and a peer group elects a master. The presence of the master indicates that the peer group is functioning properly.

Description

LOAD BALANCE AND FAULT TOLERANCE IN A NETWORK SYSTEM
Cross Reference To Related Applications
This application claims priority to U.S. Provisional Application No. 60/095,652, filed August 7, 1998.
Background of the Invention
The present invention relates to load balancing and fault tolerance amongst computer servers functioning to track Internet/Intranet transactions. In particular, the present invention relates to a system of load balancing and fault tolerance utilizing a lightweight algorithm and continually cycling processes to reduce exchange of server state information.
A mechanism is described to achieve both load balancing and fault tolerance. The backup systems provide load balancing services while active. When a system fails, the remaining available systems take over the failed system's load. A master system determines which participating system owns a decision track. Ownership of a decision track indicates responsibility for executing a contact gathering process and an event evaluation process. Step evaluation processes are distributed among available systems within the same peer group.
By way of example, access to distributed networks such as the internet has increased greatly in recent years and challenged commerce to use the internet advantageously. Thousands of internet and intranet (hereinafter "Inet") sites have been added to networks. A great expenditure of time and effort has been invested in creating a myriad of resources available to Inet browsers. As a means to benefit from the Inet forum, it is useful to have tools to interact with those browsing the Inet, such as tools able to track parties contacting a particular Inet-site. It is important that these tools be reliable and responsive, as an Inet contact may be the first and possibly only type of contact made with an individual. The creation of virtual worlds online has further increased the importance of reliability and responsiveness. Purveyors of the Inet desire interactions that further emulate a real life commercial experience. Virtual storefront owners, corporate home pages, online catalogue vendors, and a myriad of other Inet-site owners find it useful to be able to emulate the real life experience. As the complexity of an Inet interaction increases, the expectations of an individual making contact via the Inet also increase. Contact requires a fast, reliable response.
A traditional method of increasing transaction speed is to increase the speed of processor units running the application. Processors, however, have limits on the maximum throughput available. Increasing demands cannot always be met by a faster processing box. Another method of increasing transaction speed is through shared processing amongst a plurality of processing units. Typically, however, this has involved very complicated hardware and software solutions requiring a sizable investment in man-hours and expense. Often these types of solutions are not warranted for a dedicated Inet application.
To be effective, a system needs to be right-sized to the task at hand. Consequently, there remains a need for a simple, cost-effective means of sharing a processing load and also providing fault tolerance.
Summary of the Invention
Accordingly, this invention provides a method of load balancing and fault tolerance amongst a plurality of servers on a computer network, such as the internet or a private network. In a preferred embodiment of the invention, a programmed computer server divides processing work into tracks of work that may be referred to as decision tracks. Each decision track comprises a series of conditions that are to be tested against records of a database. As conditions of the decision track are tested and met, an appropriate action is taken in response to the condition met. In addition, actions can be sequenced so as to achieve a desired result, as illustrated in Table 1 below.
Table 1: Decision Track
Condition      If Condition is Met, Then
Condition 1    Action 1
Condition 2    Action 2
Condition 3    Action 3
Condition 4    Action 4
Condition 5    Action 5
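The condition/action structure of a decision track can be sketched as an ordered list of steps applied to a database record. This is a minimal illustration only: the names `Step` and `run_track`, and the sample record fields `email` and `visits`, are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]

@dataclass
class Step:
    """One row of a decision track: a condition and the action it triggers."""
    condition: Callable[[Record], bool]
    action: Callable[[Record], str]

def run_track(track: List[Step], record: Record) -> List[str]:
    """Test each condition in order; collect the result of every action whose condition is met."""
    results = []
    for step in track:
        if step.condition(record):
            results.append(step.action(record))
    return results

# Hypothetical two-step track over a contact record (illustrative data only).
track = [
    Step(lambda r: r["visits"] >= 1, lambda r: f"send welcome to {r['email']}"),
    Step(lambda r: r["visits"] >= 5, lambda r: f"send offer to {r['email']}"),
]
```

Each step is independent here, so a record that meets several conditions triggers several actions in sequence.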
Decision tracks are constructed so that each can be claimed by a computer server. A plurality of computer servers is arranged into a peer group networked together. The network enables the computer servers to communicate with each other. Servers coordinate within the peer group to claim individual decision tracks. Thereafter, the server owning a decision track processes initial work pertaining to that track and also allocates blocks of work from that decision track to other servers on the network that have advertised for work. In the event a server ceases to communicate with the other servers in the peer group, all uncompleted work assigned to the non-communicating server is reallocated to another server.
Peer groups also elect a master server. The presence of a master server signifies that a group is functioning and able to handle common tasks. In addition to normal peer group server functions, a master server handles tasks pertaining to the peer group as a whole, such as e-mail. The master is typically elected by a simple criterion, such as selecting the server with the lowest machine number.
For the purposes of this disclosure, an Internet can refer to, for example, a network comprising computers exceeding the boundaries of a private network. An Intranet can refer to, for example, computers within a private network. An Inet can refer to an Internet and/or an Intranet adhering to an internet protocol or similar protocol. An Inet-site is, for example, a site available on either an Internet or an Intranet. A network, for example, can have a computer acting as a server and a computer acting as a client. A contact can, for example, be an access to an electronic interface such as a web site, or other contents of a stored memory such as a hard drive or dynamic random access memory of a server. A client can be a person, a node operator, or broadly, a machine or electronic device making such contact, or causing a node of a network to make such a contact. Real time is meant to be read broadly to signify on a basis timely to or in relation to an individual event.
Other advantages and features of the present invention will become apparent from the following description, including the drawings and the claims.
Brief Description of the Drawing
FIG. 1 illustrates a typical configuration supporting this invention.
FIG. 2 illustrates the query process of a decision track.
FIG. 3 illustrates a load balancing sequence.
Description of the Preferred Embodiments
According to the present invention, an apparatus, method, and system are described for load balancing and fault tolerance comprising a plurality of computer servers 110 networked together into a peer group 120 and also networked to a database server 130. The network provides a means of communication between servers. Work is divided into tracks 135 and distributed amongst the peer group servers according to the availability of each server to accommodate additional work. Utilizing a multitude of servers to process work lessens the work required of any single server and speeds the response of the system. The ability of a peer group to allocate work amongst available servers, and then reallocate work if a particular server should become unavailable, provides fault tolerance.
Servers periodically notify other peer group servers of their presence on a network by way of a well-known device such as a "hello" message or an "advertisement." Such advertisements are performed on a periodic cycle. A preferred periodic cycle is about 15 seconds. However, periodic cycles may be any length that is appropriate based on network characteristics, such as the number of nodes, the speed of communication, and the speed of the processor units. Generally, any periodic cycle between 5 seconds and 120 seconds is acceptable. If an advertisement is not received from a server for a predetermined number of periodic cycles, for example 4 cycles of 15 seconds each (60 seconds in total), the other peer group servers will consider the mute server unavailable. Any work previously allocated to a server subsequently determined unavailable is reallocated amongst available servers.
In a preferred embodiment of the present invention, work is structured so that it may be executed by decision tracks. Each decision track comprises a series of queries to be made against records of a database. If the conditions of a query are met 210, then an appropriate action may be taken; if the conditions are not met, then a next record, or a next set of conditions, is queried.
During operation, a peer group computer server 110 will claim ownership of one or more decision tracks 135. After determination of ownership 310, a computer server 110 performs initial work such as, for example, contact gathering 320. Contact gathering comprises creation of a set of contact records 145 that are to be put on a particular step of the decision track 135. After the contact gathering is complete, blocks of work comprising steps are created and can either be distributed to other computer servers 110 in the peer group 330 or performed by the owning server. A block of work may consist of, by way of example, a set of contact records ready for the next step of a decision track to be performed on them, or a list of steps to be executed on a particular record. After the distribution, the server evaluates events 340 for any changes in conditions and cycles through the process again.
Distribution of work is effectuated by a response to advertisements or requests for work sent out by various computer servers 110 included in a peer group 120. When a server is capable of accepting additional work, it will send an advertisement to the other servers in the peer group requesting work, such as, for example, a step list block 350. The requesting computer server 350 then executes the indicated steps 360. An owner computer server 110 that receives such an advertisement may send a block of work to the advertising computer server 110 to be processed. In this manner there is a continual load sharing of available work.
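The pull-style distribution described above, in which idle servers advertise for work and the owner answers with a block, might be sketched as follows. The class `TrackOwner`, its method names, and the string block identifiers are assumptions made for illustration; the disclosure does not specify a data structure or message format.

```python
from collections import deque

class TrackOwner:
    """Hypothetical owner of one decision track that hands out blocks of work on request."""

    def __init__(self, server_id, blocks):
        self.server_id = server_id
        self.pending = deque(blocks)  # blocks of work not yet handed out
        self.assigned = {}            # block -> ID of the server processing it

    def on_work_request(self, requester_id):
        """Answer an advertisement for work with the next block, or None if nothing is pending."""
        if not self.pending:
            return None
        block = self.pending.popleft()
        self.assigned[block] = requester_id
        return block

    def on_server_unavailable(self, server_id):
        """Return uncompleted blocks held by an unavailable server to the pending queue."""
        lost = [b for b, s in self.assigned.items() if s == server_id]
        for b in lost:
            del self.assigned[b]
            self.pending.append(b)
        return lost
```

Because work is only sent in response to a request, a busy server never receives more than it asked for, and blocks held by a silent server can simply be queued again.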
In one preferred embodiment, decision track ownership is claimed by attaching a claim counter to an advertisement broadcast by a server. A server will claim a decision track and set a counter to a predetermined interval, for example two. Each time the server broadcasts an advertisement, the counter decrements by one. When the counter reaches zero, the decision track is authoritatively owned by the claiming server. Other peer group servers may challenge the claim for a decision track by claiming it for themselves during the counter interval.
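A minimal sketch of the claim counter follows, assuming the counter starts at two as in the example above. The class `Claim` and its method names are hypothetical, and how a challenge is resolved is left outside this sketch.

```python
class Claim:
    """Hypothetical record of one server's pending claim on a decision track."""

    def __init__(self, track_id, server_id, interval=2):
        self.track_id = track_id
        self.server_id = server_id
        self.counter = interval   # advertisements remaining before the claim is final
        self.challenged = False

    def on_advertisement_sent(self):
        """Decrement the counter each time the claimant broadcasts an advertisement."""
        if self.counter > 0:
            self.counter -= 1

    def on_rival_claim(self, rival_id):
        """Record a competing claim received while the counter is still running."""
        if self.counter > 0 and rival_id != self.server_id:
            self.challenged = True

    @property
    def owned(self):
        """True once the counter has reached zero without a challenge."""
        return self.counter == 0 and not self.challenged
```

An unchallenged claim becomes ownership after two advertisements; a challenged claim would instead be handed to arbitration.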
If two or more servers claim ownership of the same decision track, ownership election reverts to an arbitration routine. Arbitration determines ownership by a simple criterion, such as awarding the track to the server with the fewest owned tracks. In the instance where two or more servers have an equal number of tracks, ownership is awarded to the server with the lowest machine ID.
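The arbitration rule can be expressed as a single ordering: fewest owned tracks first, lowest machine ID as the tie-breaker. The function name `arbitrate` and its arguments are illustrative assumptions.

```python
def arbitrate(claimants, owned_counts):
    """Pick the claimant with the fewest owned tracks; break ties by lowest machine ID.

    claimants: iterable of machine IDs contesting one decision track.
    owned_counts: mapping of machine ID -> number of tracks already owned
                  (servers missing from the mapping are treated as owning none).
    """
    return min(claimants, key=lambda sid: (owned_counts.get(sid, 0), sid))
```

Since every server applies the same deterministic rule to the same advertised data, each can reach the same result without exchanging further state.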
In a preferred embodiment, each server 110 maintains a table 155 that stores the time of the most recent advertisement from each server and the decision tracks owned by each server. Each server queries the table to test whether a predetermined period has elapsed without notification from any of the peer group servers. If a predetermined period has elapsed without notification from a particular server, the non-communicating server is deemed to be unavailable. All decision tracks owned by unavailable servers are reallocated to the remaining servers. Reallocation is accomplished in much the same manner as initial election. A server will advertise a claim of ownership of a decision track belonging to a server determined to be unavailable. If the advertisement is not challenged within a predetermined number of advertisement cycles, ownership is awarded to the advertising server.
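One way to picture table 155 is as two mappings, one for advertisement times and one for track ownership. The sketch below uses the example figures given earlier in the description (4 cycles of 15 seconds); the class `PeerTable`, its method names, and the choice to treat exactly 60 seconds of silence as a timeout are assumptions.

```python
CYCLE_SECONDS = 15   # preferred periodic cycle
MISSED_CYCLES = 4    # example number of silent cycles before a server is deemed unavailable
TIMEOUT_SECONDS = CYCLE_SECONDS * MISSED_CYCLES  # 60 seconds

class PeerTable:
    """Hypothetical per-server table of advertisement times and track ownership."""

    def __init__(self):
        self.last_seen = {}  # server ID -> time of most recent advertisement
        self.owner_of = {}   # decision track ID -> owning server ID

    def record_advertisement(self, server_id, now, owned_tracks=()):
        """Store the arrival time of an advertisement and any ownership it announces."""
        self.last_seen[server_id] = now
        for track in owned_tracks:
            self.owner_of[track] = server_id

    def orphaned_tracks(self, now, timeout=TIMEOUT_SECONDS):
        """Tracks whose owner has been silent for at least `timeout` seconds."""
        return sorted(
            track for track, owner in self.owner_of.items()
            if now - self.last_seen.get(owner, float("-inf")) >= timeout
        )
```

The tracks returned by `orphaned_tracks` are the ones a surviving server would then advertise a claim for.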
The allocation and reallocation process provides fault tolerance. A decision track 135 will not be without an owner for more than the predetermined period. After the predetermined period has elapsed, another server 110 takes ownership and the work of the unavailable server resumes. Each peer group server 110 includes a copy of each decision track 135 as well as the table recording ownership of the various decision tracks. As a server 110 begins functioning as an owner, it records ownership in the table and commences to perform the work allocated to the owner of that decision track.
A database server 130 stores the contact data records 145 referenced in the various blocks of work performed by peer group servers executing decision tracks. Typically, there is only one database server 130 from which all records are processed. In this manner all peer group servers have access to the same data. A peer group 120 will also elect a master server 140. The advertised presence of a master server declares that network connectivity exists, that the peer group is communicating properly, and that operations may commence. Elections for a master server 140 are based on a simple criterion such as the lowest ID of the servers involved. In a periodic cycle the master will broadcast an "advertisement" or hello message, declaring its presence to other servers in the peer group. One preferred embodiment of a periodic cycle is 15 seconds. Another preferred embodiment of a periodic cycle is between 5 seconds and 120 seconds. The duration of the periodic cycle will depend on the speed of the network and the processing power of the servers.
If a peer group 120 has not detected the presence of a master server 140, through receipt of an advertisement from the master server 140, for some number of periodic cycles, for example 4 cycles, the peer group elects a new master server. A period may comprise more or fewer cycles depending on the criticality of the timing for the work being performed and the processing power of the servers.
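The master election and re-election behavior can be sketched as follows. This is an illustrative sketch only: the names `elect_master` and `MasterMonitor` are hypothetical, and removing the silent master from the candidate set before re-electing is an assumption, since the specification states only that a new master is elected.

```python
def elect_master(server_ids):
    """Pick the master by the simple criterion named in the text: lowest ID."""
    return min(server_ids)


class MasterMonitor:
    """Counts periodic cycles that pass without a master advertisement."""

    def __init__(self, my_id, peer_ids, missed_limit=4):
        self.peers = set(peer_ids) | {my_id}
        self.missed_limit = missed_limit  # for example 4 cycles, per the text
        self.master = elect_master(self.peers)
        self.missed = 0

    def on_cycle(self, heard_master):
        """Call once per periodic cycle (for example every 15 seconds).

        Returns the ID of the server currently regarded as master.
        """
        if heard_master:
            self.missed = 0
            return self.master
        self.missed += 1
        if self.missed >= self.missed_limit:
            # Assumption: the silent master is excluded from the new election.
            candidates = self.peers - {self.master}
            if candidates:
                self.peers = candidates
                self.master = elect_master(candidates)
            self.missed = 0
        return self.master
```

With servers 1, 2 and 3 in the peer group, server 1 is master initially. After four consecutive cycles with no advertisement from server 1, `on_cycle(False)` returns 2.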
Decision tracks 135 and the criteria for each step of a decision track 135 can be created and manipulated via a user interface 165. In a preferred embodiment, a graphical representation is created for each step of a decision making process, correlating to each step of a decision track. The graphical representation can facilitate accurate processing of data and ease of use. Another method for creating decision tracks would include a written language statement defining criteria for each decision. In a preferred embodiment of this invention, a software program on a computer readable medium is loaded on a plurality of servers. The software program comprises a front-end application that allows users to access a variety of the features designed to load balance and provide fault tolerance. Features are grouped into different categories according to the type of user. A security scheme allows user access to a feature according to category.
An administrator can be responsible for secure configuration and maintenance of the decision track software. The administrator can configure databases and external access methods and define the access rights of various users. The administrator is also responsible for defining the synchronization relationships with other servers.
Decision tracks 135 may also define a series of actions to take based on different trigger events. Trigger events may be time-based single events, time-based recurring events, or external input and query result events. Queries may be directed to a database. In addition, queries may be run against external Structured Query Language (SQL) accessible databases. Conditionals control the transition of individual query results to the next state in the decision track.
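As one possible illustration of a query result event followed by conditionals, the sketch below runs an SQL query and routes each result row to a next state. The function `run_step`, the `contacts` table, its columns, and the state names are all hypothetical; the specification does not define a schema or state names. Python's standard `sqlite3` module stands in for an SQL accessible database, and time-based triggers are not shown.

```python
import sqlite3


def run_step(conn, query, transitions, default_state):
    """Run one query step and route each result row to its next state.

    transitions is a list of (predicate, next_state) pairs tested in order;
    a row that matches no predicate goes to default_state.
    """
    routed = {}
    for row in conn.execute(query):
        state = default_state
        for predicate, next_state in transitions:
            if predicate(row):
                state = next_state
                break
        routed.setdefault(state, []).append(row)
    return routed


# Hypothetical contact data held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contacts (id INTEGER PRIMARY KEY, days_since_contact INTEGER)"
)
conn.executemany("INSERT INTO contacts VALUES (?, ?)", [(1, 3), (2, 45), (3, 120)])

routed = run_step(
    conn,
    "SELECT id, days_since_contact FROM contacts ORDER BY id",
    [
        (lambda r: r[1] > 90, "escalate"),       # conditional tested first
        (lambda r: r[1] > 30, "send_reminder"),  # tested only if the first fails
    ],
    "wait",
)
```

Ordering the conditionals from most to least restrictive keeps each row in exactly one next state, which matches the idea of an individual query result transitioning to a single state.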
The methods and mechanisms described here are not limited to any particular hardware or software configuration, or to any particular communications modality, but rather they may find applicability in any communications or computer network environment. In a preferred embodiment of this invention, a software program comprising computer readable code on a computer readable medium is loaded onto a plurality of servers. The software program additionally comprises a front-end application that allows users to access a variety of the features designed to automate load sharing and fault tolerance.
The techniques described here may be implemented in hardware or software, or a combination of the two. Preferably, the techniques are implemented in computer programs executing on one or more programmable computers that each include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), and suitable input and output devices. The programmable computers may be either general-purpose computers or special-purpose, embedded systems. In either case, program code is applied to data entered with or received from an input device to perform the functions described and to generate output information. The output information is applied to one or more output devices.
Each program is preferably implemented in a high level procedural or object-oriented programming language to communicate with a computer system. However, the programs can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language.
Each such computer program is preferably stored on a storage medium or device (e.g., CD-ROM, hard disk, magnetic diskette, or memory chip) that is readable by a general or special purpose programmable computer for configuring and operating the computer when the storage medium or device is read by the computer to perform the procedures described. The system also may be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner.
The invention described has broad application to a wide range of electronic interaction environments and a number of embodiments based upon the principles disclosed are possible.
What is claimed is:

Claims

1. A method of load balancing or fault tolerance in a system of computer servers comprising:
a) dividing a server workload into separate tracks of work;
b) communicating an advertisement requesting work from a server residing in a peer group of servers;
c) allocating a track of work to said server in said peer group requesting work;
d) communicating on a periodic cycle the presence of each of the servers in said peer group to other servers in the peer group; and
e) reallocating a track of work previously allocated to a server that fails to communicate on a periodic cycle within a predetermined time period.
2. A method of load balancing and fault tolerance in a system of computer servers as recited in claim 1 wherein allocating a block of work further comprises:
a) claiming ownership of decision tracks by servers wherein the server becomes the owner server of that decision track;
b) performing preliminary work on a decision track by the owner server; and
c) transferring a block of work by the owner server to a server advertising for work.
3. A method of load balancing and fault tolerance in a system of computer servers as recited in claim 1 or claim 2 wherein a track of work is a decision track.
4. A method of load balancing and fault tolerance in a system of computer servers as recited in claim 1 wherein the network is an Inet.
5. A method of load balancing and fault tolerance in a system of computer servers as recited in claim 1 further comprising election of a master server to indicate a peer group is operational.
6. A method of load balancing and fault tolerance in a system of computer servers as recited in claim 5 wherein election is based on the binary name of the peer group servers.
7. A method of load balancing and fault tolerance in a system of computer servers as recited in claim 5, further comprising:
a) monitoring the peer group for the presence of a master server; and
b) electing a new master server if no master server is advertised as present for a predetermined period.
8. A method of load balancing and fault tolerance in a system of computer servers as recited in claim 1 wherein the predetermined period is 4 periodic cycles of about 30 seconds each cycle.
9. A method of load balancing and fault tolerance in a system of computer servers as recited in claim 2 further comprising the step of storing the ownership of a decision track in a table on a peer group server.
10. An apparatus for providing fault tolerance or load balancing in a network of computer servers, the apparatus comprising:
a) a peer group of computer servers networked together;
b) a software program implementing the following in said peer group of computer servers:
i) declare a peer group operational through the election of a master server from amongst said peer group of computer servers;
ii) claim ownership of a decision track by one computer server of said peer group of computer servers creating an owner server;
iii) enable the owner server to notify other servers comprising said peer group of the presence of the owner server on the network;
c) request work by a peer group server other than the owner server;
d) create a block of work by an owner server;
e) distribute a block of work to a peer group server requesting work;
f) monitor the presence of an owner server; and
g) claim decision tracks previously owned by an owner server failing to be present on the network for a predetermined period.
11. The apparatus of claim 10 for providing fault tolerance and load balancing in a network of computer servers wherein the network is an Inet.
12. The apparatus of claim 10 for providing fault tolerance and load balancing in a network of computer servers wherein the election is based on the binary name of the peer servers.
13. The apparatus of claim 10 for providing fault tolerance and load balancing in a network of computer servers wherein the means for an owner server of notifying the other servers comprising a peer group of the presence of the owner server is an advertisement from the owner server to other servers comprising a peer group.
14. The apparatus of claim 10 for providing fault tolerance and load balancing in a network of computer servers further comprising:
a) a means for monitoring the peer group for the presence of a master server; and
b) a means for electing a new master server if no master server is present for a predetermined period.
15. The apparatus of claim 14 for providing fault tolerance and load balancing in a network of computer servers wherein the means for monitoring a peer group for the presence of a master server is a periodic advertisement by a master.
16. The apparatus of claim 14 for providing fault tolerance and load balancing in a network of computer servers wherein the predetermined period is 4 periodic cycles of about 30 seconds each cycle.
17. Software, stored on a computer-readable medium, for gathering and disseminating information on a network, the software comprising instructions to cause a computer system to perform the following operations:
a) provide a plurality of servers comprising a peer group;
b) provide communications between said servers;
c) divide a workload into separate tracks;
d) communicate an advertisement from a server requesting work;
e) allocate a track of work to a server;
f) communicate the presence of a server to the peer group on a periodic cycle; and
g) reallocate a track of work previously allocated to a non-communicating server if an advertisement stating the presence of said non-communicating server is not communicated to the peer group within a predetermined time period.
18. The article of manufacture of claim 17 further comprising:
a) computer readable code means for claiming ownership of decision tracks by a server wherein the server becomes the owner server of that decision track;
b) computer readable code means for performing preliminary work on a decision track by the owner server; and
c) computer readable code means for transferring a block of work by the owner server to a server advertising for work.
19. A programmed computer server for providing load balancing or fault tolerance in a peer group of servers, the computer server comprising:
a) a memory having at least one region for storing computer executable program code;
b) a processor for executing program code stored in said memory;
c) a program code stored in said memory, said program code implementing the following operations in said computer server:
i) divide server workload into separate tracks of work;
ii) communicate the presence of each server in the workgroup;
iii) divide a server workload into separate tracks of work;
iv) communicate an advertisement from a server in said peer group requesting work;
v) allocate a track of work to said server in said peer group requesting work;
vi) communicate the presence of each server comprising said peer group to other servers comprising said peer group on a periodic cycle; and
vii) reallocate a track of work previously allocated to a non-communicating server if an advertisement stating the presence of said non-communicating server is not received by the peer group within a predetermined time period.
20. The programmed computer of claim 19 wherein the tracks of work comprise decision tracks.
PCT/US1999/002154 1998-08-07 1999-02-01 Load balance and fault tolerance in a network system WO2000008823A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU24909/99A AU2490999A (en) 1998-08-07 1999-02-01 Load balance and fault tolerance in a network system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US9565298P 1998-08-07 1998-08-07
US60/095,652 1998-08-07

Publications (1)

Publication Number Publication Date
WO2000008823A1 (en) 2000-02-17

Family

ID=22252983

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/US1999/002154 WO2000008823A1 (en) 1998-08-07 1999-02-01 Load balance and fault tolerance in a network system
PCT/US1999/002155 WO2000008583A1 (en) 1998-08-07 1999-02-01 Network contact tracking system

Family Applications After (1)

Application Number Title Priority Date Filing Date
PCT/US1999/002155 WO2000008583A1 (en) 1998-08-07 1999-02-01 Network contact tracking system

Country Status (5)

Country Link
EP (1) EP1019858A1 (en)
JP (1) JP2002522829A (en)
AU (2) AU2572699A (en)
CA (1) CA2306175A1 (en)
WO (2) WO2000008823A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6950855B2 (en) 2002-01-18 2005-09-27 International Business Machines Corporation Master node selection in clustered node configurations
GB2421602A (en) * 2004-12-21 2006-06-28 Hewlett Packard Development Co Managing the failure of a master workload management process
FR3118843A1 (en) * 2021-01-13 2022-07-15 Dassault Aviation SECURE AIRCRAFT DIGITAL DATA TRANSFER SYSTEM COMPRISING REDUNDANT DATA PRODUCING SYSTEMS, ASSOCIATED ASSEMBLY AND METHOD

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
IL133489A0 (en) 1999-12-13 2001-04-30 Almondnet Inc A descriptive-profile mercantile method
JP2001222601A (en) * 2000-02-09 2001-08-17 Nec Corp System and method for information communication and information providing business method
AU2001236650A1 (en) * 2000-02-24 2001-09-03 Zack Network, Inc. Modifying contents of a document during delivery
AU2001245408A1 (en) * 2000-03-03 2001-09-17 Merinta, Inc. System and method for tracking user interaction with a graphical user interface
JP2001325521A (en) * 2000-05-18 2001-11-22 Nec Soft Ltd System and method for mail delivery by internet
JP2002049855A (en) * 2000-05-24 2002-02-15 Sony Computer Entertainment Inc Server system
JP2002083223A (en) * 2000-06-21 2002-03-22 Olympus Optical Co Ltd Medical equipment sales system and sales method of medical equipment
GB2371644B (en) * 2000-09-25 2004-10-06 Mythink Technology Co Ltd Method and system for real-time analyzing and processing data over the internet
NL1016388C2 (en) * 2000-10-11 2002-04-12 O L M E Commerce Services B V Internet operating method, uses database and recognition components associated with host site server to monitor and record Internet usage by registered users
US6832207B1 (en) 2000-11-28 2004-12-14 Almond Net, Inc. Super saturation method for information-media
US7343313B2 (en) * 2002-10-01 2008-03-11 Motorola, Inc. Method and apparatus for scheduling a meeting
JP5990598B2 (en) 2011-12-31 2016-09-14 トムソン ライセンシングThomson Licensing Method and user device for providing a web page
US11669584B2 (en) 2013-02-10 2023-06-06 Wix.Com Ltd. System and method for third party application activity data collection
IL292474A (en) * 2013-12-04 2022-06-01 Wix Com Ltd System and method for third party application activity data collection

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU4902096A (en) * 1995-02-01 1996-08-21 Freemark Communications, Inc. System and method for providing end-user free email
US5794210A (en) * 1995-12-11 1998-08-11 Cybergold, Inc. Attention brokerage
US5823879A (en) * 1996-01-19 1998-10-20 Sheldon F. Goldberg Network gaming system
CN1146781C (en) * 1996-01-23 2004-04-21 环球媒介股份有限公司 Information display system
US6285987B1 (en) * 1997-01-22 2001-09-04 Engage, Inc. Internet advertising system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DAMANI O P ET AL: "ONE-IP: techniques for hosting a service on a cluster of machines", COMPUTER NETWORKS AND ISDN SYSTEMS, vol. 29, no. 8-13, 1 September 1997 (1997-09-01), pages 1019-1027, XP004095300, ISSN: 0169-7552 *
YOSHIDA A: "MOWS: distributed Web and cache server in Java", COMPUTER NETWORKS AND ISDN SYSTEMS, vol. 29, no. 8-13, 1 September 1997 (1997-09-01), pages 965-975, XP004095295, ISSN: 0169-7552 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6950855B2 (en) 2002-01-18 2005-09-27 International Business Machines Corporation Master node selection in clustered node configurations
GB2421602A (en) * 2004-12-21 2006-06-28 Hewlett Packard Development Co Managing the failure of a master workload management process
GB2421602B (en) * 2004-12-21 2009-07-01 Hewlett Packard Development Co System and method for replacing an inoperable master workload management process
US7979862B2 (en) 2004-12-21 2011-07-12 Hewlett-Packard Development Company, L.P. System and method for replacing an inoperable master workload management process
FR3118843A1 (en) * 2021-01-13 2022-07-15 Dassault Aviation SECURE AIRCRAFT DIGITAL DATA TRANSFER SYSTEM COMPRISING REDUNDANT DATA PRODUCING SYSTEMS, ASSOCIATED ASSEMBLY AND METHOD
EP4030330A1 (en) * 2021-01-13 2022-07-20 Dassault Aviation System for secure transfer of digital data of an aircraft including systems producing redundant data, associated assembly and method

Also Published As

Publication number Publication date
EP1019858A1 (en) 2000-07-19
CA2306175A1 (en) 2000-02-17
WO2000008583A1 (en) 2000-02-17
JP2002522829A (en) 2002-07-23
AU2572699A (en) 2000-02-28
AU2490999A (en) 2000-02-28

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
REF Corresponds to

Ref document number: 10082178

Country of ref document: DE

Date of ref document: 20020814

Format of ref document f/p: P