EP2021910A2 - Next generation clustering - Google Patents

Next generation clustering

Info

Publication number
EP2021910A2
Authority
EP
European Patent Office
Prior art keywords
cluster
computer implemented
application server
lease
master
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP07709948A
Other languages
German (de)
French (fr)
Other versions
EP2021910A4 (en)
Inventor
Naresh Revanuru
Priscilla C. Fung
Venkatesan Ranganathan
Aaron Fiske
Dean Bernard Jacobs
Prasad Peddada
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Oracle International Corp
Original Assignee
BEA Systems Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US11/425,784 (US7536581B2)
Priority claimed from US11/548,239 (US7661015B2)
Priority claimed from US11/550,551 (US8122108B2)
Application filed by BEA Systems Inc
Publication of EP2021910A2
Publication of EP2021910A4
Withdrawn (current legal status)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/60 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
    • H04L67/62 Establishing a time schedule for servicing the requests

Definitions

  • enterprise software applications can use application servers, such as J2EE application servers like the WebLogic Server™ available from BEA Systems, Inc., of San Jose, California. These application servers can be used in clusters that can interact with one another.
  • singleton services should be run on only one application server of a cluster.
  • These singleton services can include JMS servers, transaction recovery services or any other software that should be only run in a single instance.
  • Figure 1 shows a database-based leasing system.
  • Figure 2 shows a database-less leasing system of one embodiment of the present invention.
  • Figures 3A and 3B show a database-less leasing system of one embodiment of the present invention.
  • Figures 4A-4C illustrate an automatic migratable service system of one embodiment of the present invention.
  • Figures 5A and 5B illustrate a job scheduler system.
  • Figure 1 shows an example of a leasing system using a database 102.
  • application servers 104, 106 and 108 of the cluster 110 can rely on the database to provide access to a lease table 102.
  • Leases at the lease table 102 can be used to indicate what application server should run a singleton service.
  • the leases can be updated by the application server running the singleton service. In case of a crash, the lease will no longer be updated and will become invalid. This can allow one of the application servers of the cluster 110 to take over for a crashed or partitioned application server that was controlling the lease system. In some cases, it is desired to avoid the requirement of a High Availability (HA) database for leasing.
  • Embodiments of the present invention comprise a database-less leasing system.
  • One embodiment of the present invention is a computer-implemented method comprising a cluster 202 of application servers 204, 206, 208 and 210.
  • the method can include determining a cluster leader 212 and using the cluster leader 212 to set up a lease table.
  • because the lease table is stored at the application servers, no database is required.
  • copies of the lease table are maintained at each application server in the cluster so that the copy of the lease table is available in case of a crash or partition.
  • the lease tables can be used to allow automatic migration of the singleton service.
  • Node managers can be used to determine the state of application servers in the cluster.
  • the node manager can be a software program running on application server hosts.
  • the node manager can be used to start and stop instances of the application servers.
  • the application server of the cluster that was started earliest can be selected to have the cluster leader.
  • the cluster leader is selected by a kind of competition. Every server in the cluster can periodically try to be the cluster leader. For example, every server in the cluster can try to be the cluster leader once every 30 seconds. If the cluster leader already exists, the attempt is rejected. If no cluster leader currently exists, the first server to try to claim the role becomes cluster leader, thus preventing any other server from becoming cluster leader. In this way, the application server of the cluster that was started earliest can be selected to have the cluster leader.
  • the system can be designed such that a cluster leader could be selected by another method.
  • the cluster leader 212 can heartbeat other application servers of the cluster.
  • the cluster leader 212 can store copies of the lease table in the other application servers of the cluster 202 to operate in case of a crash or partition of one or more application servers. In one embodiment, if the current cluster leader 212 fails to heartbeat the other application servers, the other application servers can select another cluster leader.
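The competition for cluster leadership described above can be sketched roughly as follows. This is a minimal single-process model, not the patented implementation: the class `LeaderSlot` and its method names are hypothetical, and a shared in-memory slot stands in for whatever messaging the real cluster members would use.

```java
/**
 * Hypothetical sketch: a leader slot that the first claimant wins.
 * A real cluster has no shared memory; this only models the rule
 * "if a leader exists the attempt is rejected, otherwise first claim wins".
 */
final class LeaderSlot {
    // null means that no cluster leader currently exists
    private String leader;

    /** Returns true if serverName became the leader, false if a leader already exists. */
    synchronized boolean tryClaim(String serverName) {
        if (leader != null) {
            return false; // attempt rejected: the role is already taken
        }
        leader = serverName;
        return true;
    }

    /** Called when the other servers stop receiving heartbeats from failedLeader. */
    synchronized boolean release(String failedLeader) {
        if (failedLeader.equals(leader)) {
            leader = null; // the next periodic attempt by any server can now succeed
            return true;
        }
        return false;
    }

    synchronized String current() {
        return leader;
    }
}
```

In this model each server would call `tryClaim` on its periodic timer (every 30 seconds in the example from the text), and `release` would be triggered once heartbeats from the current leader are missed.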
  • One embodiment of the present invention comprises a cluster 202 of the application servers 204, 206, 208 and 210.
  • a cluster leader is selected based on the first application server up.
  • the cluster leader 212 is used to set up a lease table 214 at one of the application servers 204.
  • One embodiment of the present invention comprises a computer-implemented system wherein a lease table 214 is maintained at an application server 204 of a cluster 202 of application servers. Other application servers of the cluster can use the lease table 214 to maintain at least one lease 216 for a singleton service 218.
  • Figure 3A shows a cluster leader heartbeating data to the other application servers of the cluster.
  • Figure 3B shows another cluster leader being selected in the case of a crash of the application server having the current cluster leader.
  • Figure 3C shows another cluster leader being selected in the case of a partition of the network that makes the first application server unavailable.
  • One embodiment of the present invention is a computer-implemented system comprising a first application server 402 of a cluster 404 that runs a singleton service 406.
  • the first application server 402 maintains a lease 408 for the singleton service 406 at a lease table 410.
  • a migration master 412 checks the lease table 410 and reassigns the singleton service 406 to a second application server 414 of a cluster 404 if the first application server 402 fails to maintain the lease 408.
  • the lease table 410 can be maintained in a database or by using database-less leasing as described above.
  • the first application server 402 can fail to update the lease because of a crash of the first application server as shown in figure 4B or the first application server 402 can fail to update the lease because the first application server 402 is partitioned from the lease table as shown in figure 4C.
  • the first application server 402 can heartbeat the lease 408 to maintain control of the singleton service 406.
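The interaction between the lease holder and the migration master can be illustrated with a small in-memory sketch. All names here (`LeaseTable`, `MigrationMasterSketch`, `renew`, `owner`, `check`) are hypothetical, times are plain millisecond values passed in by the caller, and for brevity the migration master writes the new lease directly, whereas in the text the newly activated server claims it.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory lease table keyed by singleton service name. */
final class LeaseTable {
    private static final class Lease {
        String owner;
        long expiresAt;
    }

    private final Map<String, Lease> leases = new HashMap<>();
    private final long leaseMillis;

    LeaseTable(long leaseMillis) {
        this.leaseMillis = leaseMillis;
    }

    /** Heartbeat: succeeds if the lease is free, expired, or already held by this server. */
    synchronized boolean renew(String service, String server, long now) {
        Lease lease = leases.get(service);
        if (lease != null && now < lease.expiresAt && !lease.owner.equals(server)) {
            return false; // a different server still holds a valid lease
        }
        if (lease == null) {
            lease = new Lease();
            leases.put(service, lease);
        }
        lease.owner = server;
        lease.expiresAt = now + leaseMillis;
        return true;
    }

    /** Returns the current owner, or null if the lease is missing or has expired. */
    synchronized String owner(String service, long now) {
        Lease lease = leases.get(service);
        return (lease != null && now < lease.expiresAt) ? lease.owner : null;
    }
}

/** Hypothetical migration master: reassigns a service whose lease was not maintained. */
final class MigrationMasterSketch {
    static String check(LeaseTable table, String service, String candidate, long now) {
        if (table.owner(service, now) == null) {
            table.renew(service, candidate, now); // lease lapsed: move the service
        }
        return table.owner(service, now);
    }
}
```

A crash and a network partition look the same to this sketch: in both cases the first server stops calling `renew`, the lease expires, and the next `check` hands the service to the candidate.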
  • the singleton service can be a JMS server, a timer master or any other software that should be run in a single instance.
  • the second application server 414 can run a predetermined activation script before getting the singleton service.
  • the first application server 402 can run a predetermined deactivation script after giving up the singleton service.
  • the migration master 412 can select the next application server to run the singleton service, such as by selecting the next application server.
  • the singleton service is a Java Messaging System (JMS) service. If the singleton service is a JMS service, the migration manager can attempt a restart on the first application server before any migration.
  • One embodiment is a computer implemented method or computer readable media containing code to do the steps of updating a lease 408 at a lease table 410 for a singleton service.
  • first application server 402 checking the lease table 410 with a migration master 412.
  • One embodiment of the present invention is a timer master 502 at an application server 504 of a cluster 506.
  • the timer master 502 assigns scheduled jobs to other application servers 508, 510 and 512 of the cluster.
  • the application server 504 maintains a lease 514 for the timer master from a lease table 516.
  • the timer master 502 stores job info 520 for the scheduled jobs in a database.
  • another application server 510 of the cluster 506 can be assigned the timer master 502, which can use the job info to assign scheduled jobs.
  • the scheduled jobs can include reports, such as database reports. Such reports can require a large number of database accesses and thus can take a lot of system resources.
  • the scheduled jobs can thus be scheduled to run at an off-peak time so as to not reduce the performance of other applications.
  • the lease table can be in the database or alternately a database-less leasing system can be used.
  • the timer master 502 can be a singleton service.
  • the timer master 502 can be assigned to the application server 510 by a migration master.
  • Other application servers can request jobs from the timer master 502.
  • One embodiment of the present invention is a computer-implemented system comprising a timer master 502 at an application server 504 of a cluster.
  • the timer master 502 can assign scheduled jobs to other application servers 508, 510 and 512 of the cluster
  • another application server 510 of the cluster 506 can be assigned the timer master that can assign scheduled jobs.
  • One embodiment of the present invention is an application server 504 of a cluster
  • Advanced clustering features like automatic server and service migration, cluster wide singleton and lock manager can use leasing and lease management. Leasing can guarantee that only one member in a cluster gets ownership of the lease for a certain period of time that can be renewed. The lease owner is then able to execute certain privileged operations like migrating failed servers knowing that it has exclusive ownership of the lease.
  • This specification describes how leasing and cluster master type of functionality can be implemented without any dependency on an external arbitrator like a high availability database.
  • LeaseManagers can be used by subsystems to obtain leases, register interest in getting a lease when one becomes available, find out the current owner of the lease etc.
  • One type of leasing basis used for automatic server migration requires the presence of a High Availability (HA) database.
  • the lease table can be hosted in one of the servers in the cluster and not in the database. This means that the cluster members can elect a server that will host the lease table and become the cluster Leader.
  • This elected cluster Leader can be responsible for giving out leases, updating the lease table and replicating the updates atomically to the cluster. Replication is important for failover purposes.
  • the members can start another round of voting to elect a new cluster master.
  • the newly elected master can take ownership of the lease table.
  • the cluster master owner can also perform automatic server migration of failed cluster nodes.
  • consensus based leasing can meet the following requirements:
  • heartbeat timeout (next poll time). This is the time period during which the cluster members have not received any heartbeat from the cluster master. By default this period is 30 seconds.
  • Members of the cluster that can host the cluster master and participate in the consensus algorithm can be marked with a special ConsensusProcessIdentifier in the config.xml.
  • This identifier can be a unique integer value. This can be an attribute on ServerMBean. Customers should just be able to mark the servers that can host the cluster master and the product should be able to generate the identifiers automatically.
  • the console can allow customers to choose which cluster nodes can host the cluster master and automatically generate the consensus process identifier. It can also set the value of ClusterMBean.setConsensusParticipants() based on the number of servers chosen in the cluster.
  • the consensus leasing basis like all other implementations of the LeasingBasis interface can be hidden from subsystems and external users.
  • Subsystems can ask for a singleton service by implementing the weblogic.cluster.singleton.SingletonService interface and then registering with a SingletonService Manager.
  • LockManager can also be implemented on top of leasing.
  • When a migratable service becomes unavailable for any reason (a bug in the service code, server crashing, network partition) it can be deactivated at its current location and activated on a new server. Should there be a failure while activating on the new server, it can be deactivated on that server and migrated again.
  • a service's liveness can be proven by the maintenance of a lease. The server that the service lives on can be responsible for keeping the lease alive, via a heartbeating mechanism. A server crash can naturally result in the lease timing out. In one embodiment, there is only one lease per Migratable Target.
  • All services on that target can share the lease. In one embodiment, all services on a target can be assumed to be somewhat dependent on each other. (Or at least, the user should tolerate one failing service on a target causing the entire target to migrate.) In one embodiment, the admin server does not need to be active for automatic migration.
  • the Migration Master can keep track of all services it is supposed to keep alive.
  • the information can be available from the configuration. This is useful because if a service is unleased, then it will no longer exist in the table to be monitored.
  • the configuration can be the same across all the services in the cluster.
  • Servers can be in Manager Service Independence (MSI) mode and still participate in automatic migration.
  • the only restriction is that new services cannot be started.
  • the service will _not_ be automatically migrated.
  • migration can be enabled for any newly added services (and can persist even if the admin server is later shut down).
  • the server can deactivate its services.
  • the MM can start the service somewhere else. This may result in redundant deactivation calls if the network connection is restored after deactivation but before MM notices the lease timeout, but deactivation is idempotent.
  • If the service is unhealthy yet not disconnected, it will communicate to the Migratable Target and tell it to relinquish the lease. The MM will notice the lease disappearing/timing out, and will migrate it.
  • the following method can be added to the MigratableTarget: /** Called by Migratable classes when they detect a failure and need to stop and start on a different server. Should only be used for unrecoverable failures in a Migratable object. If shutdownServer is true (as it would be for JTA), then the server will be shut down and the service deactivated as a consequence of this. */ public void failedService(String serviceName, boolean shutdownServer)
  • the Migration Master, upon noticing the expired lease, can start a migration. It can set a flag noting that the migration has begun for a particular service. This can prevent re-noticing an expired lease and migrating again in the middle of a previous migration. (The same mechanism is used in Server migration.)
  • the current location of the service (if it is still available) can deactivate itself. Then the new location can call activate on the target. This can be the same code path as in the original migratable services. There can be additional steps introduced, however.
  • its first action can be to claim the lease. This can stop the migration master from constantly checking for its liveness. It can also provide an atomicity lock on the operation; no one else can be activating while this one holds the lease.
  • the service can check if a named Node Master (NM) pre-migration script is specified for it. Then it can check for the existence of the node master on the current machine. If it is not there, but there is a script specified, it can stop the migration. If it is there, it can check to see if the node master has performed a specified pre-migration script. If it hasn't run the script already, it can tell the node master to run the pre-migration script. Extra flags can be passed to this script, to allow the script to do something different if we're migrating JTA, JMS or something else entirely. Placeholder scripts can be provided, but specific tlog migration, for example, need not be done.
  • activation will not proceed until the node master responds positively that it has run the pre-migration script. We can make repeated attempts to run the pre-migration script. If it cannot be run for some reason, the migration will stop, and we will let the migration master migrate us to a new server.
  • Deactivation is essentially the inverse of activation.
  • the deactivating server can call deactivate on all the services in the specified order. Exceptions will be logged, but no action can be taken.
  • the service can check if a node master post-migration script is specified for it. Then it can check for the existence of the node master on the current machine. If it is there, check to see if the node master has performed a specified post-migration script. If it hasn't run the script already, tell the node master to run the post-migration script. Extra flags can be passed to this script, to allow the script to do something different if we're migrating JTA, JMS or something else entirely. In one embodiment, we can provide placeholder scripts, but specific tlog migration, for example, cannot be done.
  • a kill script will be run, if available. If the kill script fails, activation can continue as normal. In the worst case, we can deactivate everywhere and let the admin reactivate it when the issues have been addressed. Finally, when the script part is complete, the service will give up the lease. Scripts will be run during manual migration, if specified.
  • the service can live in its new location forever, by default. Administrators may manually migrate the target to a new (or, in the case of failback, old) server at any time in the same exact manner as they did before.
  • MigratableTargetMBean can control how many attempts to make in terms of the number of complete cluster loops. Note that existing migratable target limitations can still apply: if candidate servers are specified, only servers in the candidate server list will be tried.
  • the AdditionalMigrationAttempts can default to zero. It can control the number of times we will try to migrate the service across every server in the cluster (or candidate- server list, if specified.) For example, if a cluster has 3 members, and the AdditionalMigrationAttempts is set to 2, we can try to start it on every server in the cluster, then pause, try again, pause, and try one final time. In this example, this means each server can have 3 opportunities to bring up the service successfully.
  • the pause between migration attempts can be controlled by a value, such as MillisToSleepBetweenAttempts (for example, MillisToSleepBetweenAttempts="12000"). In one embodiment, this ONLY controls the pause that happens when the service fails to come up on any server, so we start back at the first and try again. While doing normal migrations there need be no delays.
  • a migratable service could fail to come up on every possible configured server. This attribute controls how many further attempts, after the service has failed on every server at least once, should be tried. Note that each attempt specified here indicates another full circuit of migrations amongst all the configured servers. So for a 3-server cluster, and a value of 2, a total of 4 additional migrations will be attempted (the original server is never a valid destination).
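The arithmetic in the example above can be checked with a short sketch. The class and method names are hypothetical; the formula assumes, as the text states, that each additional attempt is one full circuit over every configured server except the original, and that one pause of MillisToSleepBetweenAttempts precedes each additional circuit.

```java
/** Hypothetical helper that reproduces the worked example from the text. */
final class MigrationAttempts {
    /** Additional migrations: each extra circuit visits every server except the original. */
    static int additionalMigrations(int servers, int additionalMigrationAttempts) {
        if (servers < 2 || additionalMigrationAttempts < 0) {
            throw new IllegalArgumentException("need at least 2 servers and a non-negative attempt count");
        }
        return additionalMigrationAttempts * (servers - 1);
    }

    /** Total time spent pausing, assuming one pause before each additional circuit. */
    static long totalPauseMillis(int additionalMigrationAttempts, long millisToSleepBetweenAttempts) {
        return additionalMigrationAttempts * millisToSleepBetweenAttempts;
    }
}
```

For a 3-server cluster with a value of 2, `additionalMigrations(3, 2)` gives 2 × (3 − 1) = 4, matching the figure quoted in the text.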
  • JMS should be able to restart itself without undergoing an actual migration (for performance purposes, doing a full migration would be a waste of time for the problem in question).
  • a method will be added to MigrationManager that will request a service restart for the named service, or a deactivation and reactivation on the same server; no resources released or acquired. Requests a 'soft migration'.
  • the specified migratable service will be deactivated and then reactivated on the same server. Nodemanager scripts will NOT be invoked. Services that are dependent on this migratable will be restarted as well. The method signature is restartMigratable(Migratable m). Repeated, rapid restart attempts on one server will be interpreted as an error after a certain threshold is met and the target will be migrated.
  • Hidden get/set methods on ServerMBean can control how many attempts can be made and how long the period is. (For example, it could be set to allow up to 3 restarts within a 12-hour period.) get/setAllowedRestartAttempts() controls how many times a service may be restarted within the interval specified in getIntervalForRestartAttemptThrottling. get/setIntervalForRestartAttemptThrottling() controls how long the interval is for throttling restart attempts; see getAllowedRestartAttempts.
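One way to picture the restart throttling is a sliding window over recent restart times. The sketch below is an assumption about how such a threshold could be enforced, not the product's code: `RestartThrottle` and `allowRestart` are hypothetical names, and the two constructor arguments merely mirror the AllowedRestartAttempts and IntervalForRestartAttemptThrottling settings mentioned in the text.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sliding-window throttle for in-place service restarts. */
final class RestartThrottle {
    private final int allowedRestartAttempts;
    private final long intervalMillis;
    private final Deque<Long> recentRestarts = new ArrayDeque<>();

    RestartThrottle(int allowedRestartAttempts, long intervalMillis) {
        this.allowedRestartAttempts = allowedRestartAttempts;
        this.intervalMillis = intervalMillis;
    }

    /**
     * Records a restart request made at time now (milliseconds).
     * Returns true if the service may restart in place, or false if the
     * threshold is exceeded and the target should be migrated instead.
     */
    synchronized boolean allowRestart(long now) {
        // Forget restarts that have fallen out of the throttling interval.
        while (!recentRestarts.isEmpty() && now - recentRestarts.peekFirst() >= intervalMillis) {
            recentRestarts.removeFirst();
        }
        if (recentRestarts.size() >= allowedRestartAttempts) {
            return false;
        }
        recentRestarts.addLast(now);
        return true;
    }
}
```

With 3 allowed attempts and a 12-hour interval, a fourth restart request inside the same 12 hours returns false, which is the point at which the text says the target would be migrated.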
  • the Migration Master can be a service similar to a Cluster Master. It can be a lightweight singleton, stateless and maintained by lease competition, in the same manner as the Cluster Master. Each server can register a permanent interest in obtaining the Migration Master lease. Whatever server currently holds it can perform the starting and stopping of migration tasks. If the current Migration Master crashes or is shutdown, one of the waiting servers will be chosen by the leasing infrastructure to take over the lease, becoming the new Migration Master. The Migration Master does not have to be collocated with the Cluster Master.
  • the Migration Master can be the repository of migration information. It can keep records of all the migrations it has done (target name, source server, destination server, timestamp). If the admin server is available, it can report migrations to the admin server for console/JMX/WLST display.
  • Non-debug level logging can be added to provide information to the user when migrations happen. Activation and Deactivation of a target on a server can be logged.
  • the current Migration Master can log the details of the migration: Source, Destination, Target, Time. It can also log whether or not each migration passed or failed. A failed migration can be logged as a warning, not an error. If a service cannot be started on any server successfully, we will log an error.
  • MigratableTarget can be modified to extend SingletonService.
  • MigratableTarget can provide additional functionality in the way of pre/post activation scripts required for some of the current services that live on MigratableTargets. Note that the Migratable interface that some services may implement is not a SingletonService. Migratable merely means a class can be targeted to a MigratableTarget. It is the MigratableTarget itself that is actually operated upon by the code. MigratableTargets can start/stop Migratable classes as appropriate, as they always have.
  • the SingletonService interface may be implemented by customers or internal users looking for a lightweight cluster-wide singleton. It does not have as many features as the MigratableTarget (which will support scripts, candidate machines, etc.), but is much easier to configure and create.
  • a SingletonService can request immediate migration by calling deactivate on itself.
  • the MigratableTargetMBean can have extra, optional attributes: PreScript, PostScript and AutoMigratable. For example: <MigratableTarget Cluster="mycluster" ...>
  • the console migratable target page can require an extra checkbox to allow enabling of automatic migration. PreScript and PostScript are not required, but will be executed if they exist.
  • the console editing page for Migratable Targets can have these options settable there. Multiple Migratable Services targeted to one Migratable Target can specify the order of their activation, in case there are dependencies. Targets that are auto-migratable need not be ordered with respect to each other. Service order will still be respected whether or not the target is automatically migratable.
  • the Migratable Target infrastructure in general is internal only.
  • Migratable services can be allowed to specify a deployment order in their MBean.
  • the behavior is modeled on deployment order.
  • When a target is asked to activate its component services, it can do so in order of smallest Order to largest Order. If no Order is specified, a default value can be assigned. For consistency, this can be the same default that deployment order uses: 100. If two services have the same Order number, there is no guarantee of their order of activation with regard to each other.
  • When a target is asked to deactivate its component services, it can do so in order of largest Order to smallest Order. Note that if two services have the same Order number, their deactivation order is not guaranteed to be the reverse of their activation order.
  • Order can be a dynamic value.
  • the current value of Order is always the one used. This means that if the Order changes between activation and deactivation, the sequences may not be exact reverses of each other.
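The ordering rules above can be sketched as two sorts over the same list. The types `OrderedService` and `OrderSketch` are hypothetical stand-ins for the real Migratable classes; only the default of 100 and the ascending/descending rule come from the text.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for a Migratable with an Order value. */
final class OrderedService {
    static final int DEFAULT_ORDER = 100; // same default as deployment order, per the text

    final String name;
    final int order;

    OrderedService(String name) {
        this(name, DEFAULT_ORDER);
    }

    OrderedService(String name, int order) {
        this.name = name;
        this.order = order;
    }
}

final class OrderSketch {
    /** Activation runs from smallest Order to largest Order. */
    static List<String> activationOrder(List<OrderedService> services) {
        return names(services, Comparator.comparingInt((OrderedService s) -> s.order));
    }

    /** Deactivation runs from largest Order to smallest Order. */
    static List<String> deactivationOrder(List<OrderedService> services) {
        return names(services, Comparator.comparingInt((OrderedService s) -> s.order).reversed());
    }

    private static List<String> names(List<OrderedService> services, Comparator<OrderedService> cmp) {
        List<OrderedService> sorted = new ArrayList<>(services);
        sorted.sort(cmp); // stable sort: services with equal Order keep their input order
        List<String> result = new ArrayList<>();
        for (OrderedService s : sorted) {
            result.add(s.name);
        }
        return result;
    }
}
```

Because the sort is stable, two services with the same Order keep their input order in both directions, so their deactivation sequence is not the reverse of their activation sequence. That is one concrete way the caveat about equal Order numbers can show up.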
  • the weblogic.cluster.migration.Migratable interface can have the following method added:
  • the implementation and MBeans can be augmented with an additional setOrder method to allow user configuration of this value. This is NOT required, however. It is up to each individual implementor to decide if they want the order configurable.
  • a default order variable can be provided in the base interface: DEFAULT_ORDER.
  • By default, all current implementing classes will return DEFAULT_ORDER from the getOrder() call. This can assure that the current behavior will not be changed, until people make a specific effort to change their orderings.
  • Job Scheduler can make the timers cluster aware and provide the ability to execute them anywhere in the cluster. Timers are no longer tied to the server that created them. The purpose of this specification is to:
  • Timers should be able to execute anywhere in the cluster and failover as needed.
  • 2. Provide cron job type of execution within the application server cluster. Users should be able to specify things like "execute this job repeatedly somewhere in the cluster. The job should run if there is at least one running member in the cluster". There is no dependency on the server that actually created the timer. The timer execution is load balanced across the cluster and is able to failover to another running member in case of failures.
  • There can be two types of timers that can be differentiated based on their lifecycle.
  • a local timer can be scheduled within a server Java Virtual Machine (JVM).
  • the timer runs as long as the JVM is alive and dies when the JVM exits.
  • the application needs to reschedule the timer on subsequent server startup.
  • Cluster-wide timers: a cluster-wide timer can be aware of the other server JVMs that form part of the same cluster and is able to load balance and fail over. The timer's lifecycle is not bound to the server that created it but to the lifecycle of the cluster. As long as at least one cluster member is alive, the timer can execute. Such timers are able to survive a complete cluster restart. Cluster-wide timers are created and handled by the Job Scheduler.
  • Job Scheduler can meet the following requirements:
  • Job Scheduler is dependent on a database and cannot function without it.
  • Oracle, DB2, Informix, MySQL, Sybase, MSSQL are supported.
  • Job Scheduler will only work in a cluster.
  • Submitted jobs can run anywhere in the cluster. Two consecutive executions of a job can run on the same server or on different servers.
  • Job Scheduler is dependent on Leasing. Leasing support is needed to elect the TimerMaster. Each server can also use leasing to claim ownership on the job before executing it. 5. Job Scheduler can use the same leasing basis as Server Migration and
  • Job Scheduler can be bound into the global JNDI tree of each server using a well defined name.
  • the JNDI name can be "weblogic.JobScheduler". Users can cast the looked-up object to commonj.timers.TimerManager and use its methods to create jobs.
  • ClusterMBean can expose an attribute called DataSourceForJobScheduler that will be used to access the database.
  • Job Scheduler functionality is only available when the data source is configured.
  • Job Scheduler will only support schedule at fixed delay functionality. Two consecutive job executions are separated by an 'interval' period. 10. In one embodiment, only round-robin load balancing of jobs is supported. Every cluster member will periodically poll the TimerMaster (which is just another cluster member) for ready jobs to execute. The TimerMaster will give a fraction of the total ready jobs to each member for execution.
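The polling scheme can be sketched as a queue of ready jobs from which each polling member receives a share. The text only says that the TimerMaster gives "a fraction" of the ready jobs to each member; the choice of one cluster-size-th of the queue, rounded up, is an assumption made here for illustration, and `TimerMasterSketch`, `addReadyJob` and `poll` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical TimerMaster that hands out ready jobs to polling cluster members. */
final class TimerMasterSketch {
    private final Deque<String> readyJobs = new ArrayDeque<>();
    private final int clusterSize;

    TimerMasterSketch(int clusterSize) {
        if (clusterSize < 1) {
            throw new IllegalArgumentException("cluster must have at least one member");
        }
        this.clusterSize = clusterSize;
    }

    synchronized void addReadyJob(String jobId) {
        readyJobs.addLast(jobId);
    }

    /** Gives the polling member about 1/clusterSize of the currently ready jobs (rounded up). */
    synchronized List<String> poll() {
        int share = (readyJobs.size() + clusterSize - 1) / clusterSize; // ceiling division
        List<String> assigned = new ArrayList<>();
        for (int i = 0; i < share; i++) {
            assigned.add(readyJobs.removeFirst());
        }
        return assigned;
    }
}
```

In the described system each member would additionally use leasing to claim ownership of a job before executing it; that step is left out of this sketch.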
  • Job Scheduler can require a database for persisting timers. All databases supported by Server Migration functionality can be supported by Job Scheduler as well. Job Scheduler can access the database using ClusterMBean.getDataSourceForJobScheduler(). Users can create a table called "weblogic_timers" with the following fields:
  • Job Scheduler only functions in a cluster. All cluster nodes can participate in executing jobs without discrimination. In one embodiment, Job Scheduler will be turned on only if the DataSourceForJobScheduler ClusterMBean attribute is set to a valid data source in config.xml.
  • Job Scheduler can be looked up using the JNDI name "weblogic.JobScheduler" and cast to commonj.timers.TimerManager:
  • commonj.timers.TimerManager jobScheduler = (commonj.timers.TimerManager) ic.lookup("weblogic.JobScheduler");
  • commonj.timers.TimerListener timerListener = new MySerializableTimerListener();
  • jobScheduler.schedule(timerListener, 0, 30*1000); // execute this job every 30 seconds
  • class MySerializableTimerListener implements commonj.timers.TimerListener, java.io.Serializable { public void timerExpired(Timer timer) { ... } }
  • Job scheduler can use leasing functionality to claim ownership of individual timers before execution and to select a Timer Master.
  • the Timer Master can be running on exactly one cluster member and is responsible for allocating timers to individual servers.
  • The leasing basis can be dependent on the ClusterMBean.getLeasingBasis() attribute. If the LeasingBasis is set to database then the configuration associated with database leasing can be set up just like in Server Migration. If the LeasingBasis is set to "consensus" then no database support is required for leasing. Console can provide an option to set ClusterMBean.setDataSourceForJobScheduler().
  • The data source can be inherited from server migration or session persistence during shutdown. If customers configure a data source for one, they should be able to reuse it for Job Scheduler functionality as well.
  • One embodiment may be implemented using a conventional general purpose or a specialized digital computer or microprocessor(s) programmed according to the teachings of the present disclosure, as will be apparent to those skilled in the computer art.
  • Appropriate software coding can readily be prepared by skilled programmers based on the teachings of the present disclosure, as will be apparent to those skilled in the software art.
  • The invention may also be implemented by the preparation of integrated circuits or by interconnecting an appropriate network of conventional component circuits, as will be readily apparent to those skilled in the art.
  • One embodiment includes a computer program product which is a storage medium (media) having instructions stored thereon/in which can be used to program a computer to perform any of the features present herein.
  • The storage medium can include, but is not limited to, any type of disk including floppy disks, optical discs, DVD, CD-ROMs, micro drive, and magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, DRAMs, flash memory, or any media or device suitable for storing instructions and/or data. Stored on any one of the computer readable medium (media), the present invention can include software for controlling both the hardware of the general purpose/specialized computer or microprocessor, and for enabling the computer or microprocessor to interact with a human user or other mechanism utilizing the results of the present invention.
  • Such software may include, but is not limited to, device drivers, operating systems, execution environments/containers, and user applications.
  • Embodiments of the present invention can include providing code for implementing processes of the present invention.
  • the providing can include providing code to a user in any manner.
  • the providing can include transmitting digital signals containing the code to a user; providing the code on a physical media to a user; or any other method of making the code available.
  • Embodiments of the present invention can include a computer implemented method for transmitting code which can be executed at a computer to perform any of the processes of embodiments of the present invention.
  • the transmitting can include transfer through any portion of a network, such as the Internet; through wires, the atmosphere or space; or any other type of transmission.
  • the transmitting can include initiating a transmission of code; or causing the code to pass into any region or country from another region or country.
  • transmitting includes causing the transfer of code through a portion of a network as a result of previously addressing and sending data including the code to a user.
  • a transmission to a user can include any transmission received by the user in any region or country, regardless of the location from which the transmission is sent.
  • Embodiments of the present invention can include a signal containing code which can be executed at a computer to perform any of the processes of embodiments of the present invention.
  • the signal can be transmitted through a network, such as the Internet; through wires, the atmosphere or space; or any other type of transmission.
  • the entire signal need not be in transit at the same time.
  • the signal can extend in time over the period of its transfer. The signal is not to be considered as a snapshot of what is currently in transit.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Hardware Redundancy (AREA)
  • Pharmaceuticals Containing Other Organic And Inorganic Compounds (AREA)

Abstract

One embodiment of the present invention comprises determining a cluster leader and using the cluster leader to set up a lease table at an application server of a cluster of application servers. The lease table can be used to maintain at least one lease for a singleton service.

Description

NEXT GENERATION CLUSTERING
COPYRIGHT NOTICE
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
CLAIM OF PRIORITY
U.S. Provisional Patent Application No. 60/747,364 entitled "Next Generation Clustering", by Naresh Revanuru et al., filed May 16, 2006 [Atty. Docket No. BEAS-01937US0].
U.S. Patent Application No. 11/425,784 entitled "Automatic Migratable Services", by Aaron Fiske, filed June 22, 2006 [Atty. Docket No. BEAS-02030US0].
U.S. Patent Application No. 11/548,239 entitled "Job Scheduler", by Naresh Revanuru et al., filed October 10, 2006 [Atty. Docket No. BEAS-02031US0].
U.S. Patent Application No. 11/550,551 entitled "Database-Less Leasing", by Naresh Revanuru et al., filed October 18, 2006 [Atty. Docket No. BEAS-02029US0].
BACKGROUND OF INVENTION
In order to handle a large number of interactions, enterprise software applications can use application servers, such as J2EE application servers like the WebLogic Server™ available from BEA Systems, Inc., of San Jose, California. These application servers can be used in clusters that can interact with one another.
Some of the services of the application servers, called singleton services, should be run on only one application server of a cluster. These singleton services can include JMS servers, transaction recovery services or any other software that should only be run in a single instance.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 shows a database-based leasing system.
Figure 2 shows a database-less leasing system of one embodiment of the present invention.
Figures 3A and 3B show a database-less leasing system of one embodiment of the present invention.
Figures 4A-4C illustrate an automatic migratable service system of one embodiment of the present invention.
Figures 5A and 5B illustrate a job scheduler system.
DETAILED DESCRIPTION
Database-less leasing
Figure 1 shows an example of a leasing system using a database 102. In this example, application servers 104, 106 and 108 of the cluster 110 can rely on the database to provide access to a lease table 102. Leases at the lease table 102 can be used to indicate what application server should run a singleton service. The leases can be updated by the application server running the singleton service. In case of a crash, the lease will no longer be updated and will become invalid. This can allow one of the application servers of the cluster 110 to take over for a crashed or partitioned application server that was controlling the lease system. In some cases, it is desired to avoid the requirement of a High Availability (HA) database for leasing. Embodiments of the present invention comprise a database-less leasing system.
One embodiment of the present invention is a computer-implemented method comprising a cluster 202 of application servers 204, 206, 208 and 210. The method can include determining a cluster leader 212, using the cluster leader 212 to set up a lease table 214 at one of the application servers, and using the lease table 214 to maintain at least one lease 216 for a singleton service 218.
Since the lease table is stored at the application servers, no database is required. In one embodiment, copies of the lease table are maintained at each application server in the cluster so that the copy of the lease table is available in case of a crash or partition.
The lease tables can be used to allow automatic migration of the singleton service.
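The lease behavior described above (a lease is renewed by its owner and becomes invalid once it stops being updated) can be illustrated with a minimal in-memory sketch. The class name LeaseTableSketch, its methods, and the explicit clock parameter are assumptions made for illustration; they are not taken from the patent or from any WebLogic API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a lease table held in application server memory.
final class LeaseTableSketch {
    private static final class Lease {
        final String owner;
        final long expiresAtMillis;

        Lease(String owner, long expiresAtMillis) {
            this.owner = owner;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Lease> leases = new HashMap<>();
    private final long leaseDurationMillis;

    LeaseTableSketch(long leaseDurationMillis) {
        this.leaseDurationMillis = leaseDurationMillis;
    }

    // Grants the lease if it is free or expired, and renews it if the caller already owns it.
    synchronized boolean acquireOrRenew(String service, String owner, long nowMillis) {
        Lease current = leases.get(service);
        boolean free = current == null || current.expiresAtMillis <= nowMillis;
        if (free || current.owner.equals(owner)) {
            leases.put(service, new Lease(owner, nowMillis + leaseDurationMillis));
            return true;
        }
        return false;
    }

    // Returns the current valid owner, or null if the lease is missing or has expired.
    synchronized String ownerOf(String service, long nowMillis) {
        Lease current = leases.get(service);
        if (current == null || current.expiresAtMillis <= nowMillis) {
            return null;
        }
        return current.owner;
    }
}
```

Passing the time in as a parameter keeps the sketch deterministic; a real implementation would also have to replicate the table to the other servers, which is not shown here.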
Node managers can be used to determine the state of application servers in the cluster. The node manager can be a software program running on application server hosts. The node manager can be used to start and stop instances of the application servers.
The application server of the cluster that was started earliest can be selected to have the cluster leader. In one embodiment, the cluster leader is selected by a kind of competition. Every server in the cluster can periodically try to be the cluster leader. For example, every server in the cluster can try to be the cluster leader once every 30 seconds. If the cluster leader already exists, the attempt is rejected. If the cluster leader currently does not exist, the first server to try to claim it becomes cluster leader, thus preventing anyone else from becoming cluster leader. In this way, the application server of the cluster that was started earliest can be selected to have the cluster leader. Alternately, the system can be designed such that a cluster leader could be selected by another method.
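The competition described above can be sketched as a claim that succeeds only while no leader exists. This is a single-process illustration under stated assumptions: ClusterLeaderClaim and its methods are hypothetical names, and a real cluster would need the claim to be arbitrated across servers rather than through a shared object.

```java
// Hypothetical sketch: models the "first server to claim wins" rule in one process.
final class ClusterLeaderClaim {
    // Name of the current cluster leader, or null while the cluster has no leader.
    private String leader;

    // Called periodically by every server (for example, once every 30 seconds).
    // Returns true only for the first server that claims a vacant leadership.
    synchronized boolean tryClaim(String serverName) {
        if (leader != null) {
            return false; // a leader already exists, so the attempt is rejected
        }
        leader = serverName;
        return true;
    }

    // Called when the given leader is known to have crashed or been partitioned.
    synchronized void leaderLost(String serverName) {
        if (serverName.equals(leader)) {
            leader = null;
        }
    }

    synchronized String currentLeader() {
        return leader;
    }
}
```

Because the earliest-started server is the first to make an attempt, it wins the claim, which matches the outcome described in the text.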
The cluster leader 212 can heartbeat other application servers of the cluster. The cluster leader 212 can store copies of the lease table in the other application servers of the cluster 202 to operate in case of a crash or partition of one or more application servers. In one embodiment, if the current cluster leader 212 fails to heartbeat the other application servers, the other application servers can select another cluster leader.
One embodiment of the present invention comprises a cluster 202 of the application servers 204, 206, 208 and 210. A cluster leader is selected based on the first application server up. The cluster leader 212 is used to set up a lease table 214 at one of the application servers 204.
One embodiment of the present invention comprises a computer-implemented system wherein a lease table 214 is maintained at an application server 204 of a cluster 202 of application servers. Other application servers of the cluster can use the lease table 214 to maintain at least one lease 216 for a singleton service 218. Figure 3A shows a cluster leader heartbeating data to the other application servers of the cluster. Figure 3B shows another cluster leader being selected in the case of a crash of the application server having the current cluster leader. Figure 3C shows another cluster leader being selected in the case of a partition of the network that makes the first application server unavailable.
Automatic Migratable Service
One embodiment of the present invention is a computer-implemented system comprising a first application server 402 of a cluster 404 that runs a singleton service 406. The first application server 402 maintains a lease 408 for the singleton service 406 at a lease table 410. A migration master 412 checks the lease table 410 and reassigns the singleton service 406 to a second application server 414 of the cluster 404 if the first application server 402 fails to maintain the lease 408. The lease table 410 can be maintained in a database or by using database-less leasing as described above.
The first application server 402 can fail to update the lease because of a crash of the first application server as shown in figure 4B or the first application server 402 can fail to update the lease because the first application server 402 is partitioned from the lease table as shown in figure 4C. The first application server 402 can heartbeat the lease 408 to maintain control of the singleton service 406. The singleton service can be a JMS server, a timer master or any other software that should be run in a single instance.
The second application server 414 can run a predetermined activation script before getting the singleton service. The first application server 402 can run a predetermined deactivation script after giving up the singleton service. The migration master 412 can select the next application server to run the singleton service, such as by selecting the next application server.
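The reassignment step can be sketched as follows. The sketch assumes that "selecting the next application server" means moving to the next entry in a fixed server list, wrapping around at the end; that ordering policy, and the names MigrationMasterSketch and check, are assumptions for illustration only.

```java
import java.util.List;

// Hypothetical sketch of a migration master choosing where a singleton service should run.
final class MigrationMasterSketch {
    private final List<String> servers;

    MigrationMasterSketch(List<String> servers) {
        this.servers = servers;
    }

    // If the lease is still valid the service stays where it is; otherwise it is
    // reassigned to the next server in the list (wrapping around at the end).
    String check(String currentHost, boolean leaseValid) {
        if (leaseValid) {
            return currentHost;
        }
        int index = servers.indexOf(currentHost); // -1 if unknown, which maps to the first server
        return servers.get((index + 1) % servers.size());
    }
}
```

The activation and deactivation scripts mentioned above would run around this decision and are left out of the sketch.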
In one embodiment, there can be a special rule if the singleton service is a Java Message Service (JMS) service. If the singleton service is a JMS service, the migration manager can attempt a restart on the first application server before any migration. One embodiment is a computer implemented method or computer readable media containing code to do the steps of: updating, at a first application server 402, a lease 408 at a lease table 410 for a singleton service; checking the lease table 410 with a migration master 412; and reassigning the singleton service 406 to a second application server if the first application server does not maintain the lease 408.
Job Scheduler
One embodiment of the present invention is a timer master 502 at an application server 504 of a cluster 506. The timer master 502 assigns scheduled jobs to other application servers 508, 510 and 512 of the cluster. The application server 504 maintains a lease 514 for the timer master from a lease table 516. The timer master 502 stores job info 520 for the scheduled jobs in a database. In the case of a crash of the application server 504, another application server 510 of the cluster 506 can be assigned the timer master 502, which can use the job info to assign scheduled jobs. The scheduled jobs can include reports, such as database reports. Such reports can require a large number of database accesses and thus can take a lot of system resources. The scheduled jobs can thus be scheduled to run at an off-peak time so as to not reduce the performance of other applications. The lease table can be in the database or alternately a database-less leasing system can be used. The timer master 502 can be a singleton service. The timer master 502 can be assigned to the application server 510 by a migration master. Other application servers can request jobs from the timer master 502.
One embodiment of the present invention is a computer-implemented system comprising a timer master 502 at an application server 504 of a cluster. The timer master 502 can assign scheduled jobs to other application servers 508, 510 and 512 of the cluster 506. In the case of a crash of the application server 504, another application server 510 of the cluster 506 can be assigned the timer master that can assign scheduled jobs.
One embodiment of the present invention is a method in which an application server 504 of a cluster 506 assigns scheduled jobs to other application servers of the cluster 506. In the case of a crash of the application server 504, another application server 510 is assigned the timer master 502. Thereafter, scheduled jobs are assigned using the timer master 502 at the other application server 510.
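The way other application servers request jobs from the timer master can be sketched as a polling interface that hands each caller a share of the jobs that are currently ready. The share size used here (the ready jobs divided by the cluster size, rounded up) is an assumed policy chosen for illustration, as are the names TimerMasterSketch, addReadyJob and pollForJobs; persistence of job info to the database is not shown.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of a timer master distributing ready jobs to polling cluster members.
final class TimerMasterSketch {
    private final Deque<String> readyJobs = new ArrayDeque<>();
    private final int clusterSize;

    TimerMasterSketch(int clusterSize) {
        this.clusterSize = clusterSize;
    }

    synchronized void addReadyJob(String jobId) {
        readyJobs.addLast(jobId);
    }

    // Called by a polling member; returns roughly 1/clusterSize of the jobs ready right now.
    synchronized List<String> pollForJobs() {
        int share = (readyJobs.size() + clusterSize - 1) / clusterSize; // ceiling division
        List<String> batch = new ArrayList<>();
        for (int i = 0; i < share; i++) {
            batch.add(readyJobs.removeFirst());
        }
        return batch;
    }
}
```

With three members and six ready jobs, the first poll in this sketch receives two jobs, and later polls receive a share of whatever remains.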
DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENT
Details of one exemplary embodiment are described below. These details give one example of how to implement the claimed invention and are not meant to limit the scope of the invention or to narrow the scope of any of the claimed terms.
Advanced clustering features like automatic server and service migration, cluster wide singleton and lock manager can use leasing and lease management. Leasing can guarantee that only one member in a cluster gets ownership of the lease for a certain period of time that can be renewed. The lease owner is then able to execute certain privileged operations like migrating failed servers knowing that it has exclusive ownership of the lease. This specification describes how leasing and cluster master type of functionality can be implemented without any dependency on an external arbitrator like a high availability database.
LeaseManagers can be used by subsystems to obtain leases, register interest in getting a lease when one becomes available, find out the current owner of the lease, etc. One type of leasing basis used for automatic server migration requires the presence of a High Availability (HA) database. In other words, the lease table is always hosted in the database and the database should be highly available for the cluster to query and update the leases. Alternatively, the lease table can be hosted in one of the servers in the cluster and not in the database. This means that the cluster members can elect a server that will host the lease table and become the cluster leader. This elected cluster leader can be responsible for giving out leases, updating the lease table and replicating the updates atomically to the cluster. Replication is important for failover purposes. If the cluster leader becomes unavailable and does not send a heartbeat to the group asserting its presence, the members can start another round of voting to elect a new cluster master. The newly elected master can take ownership of the lease table. Apart from taking ownership and hosting the lease table, the cluster master owner can also perform automatic server migration of failed cluster nodes. In one embodiment, consensus based leasing can meet the following requirements:
1. There will be at most one cluster master at any given point in time. This means that there will never be more than one cluster master but there could be short periods where the cluster is without a cluster master.
2. There will be a short period just after the cluster startup when there is no cluster master. The period could be on the order of minutes.
3. When the current cluster master dies, the election of a new cluster master can take time equal to the sum of:
   1. Heartbeat timeout (next poll time). This is the time period during which the cluster members have not received any heartbeat from the cluster master. By default this period is 30 seconds.
   2. Time taken by the algorithm to reach consensus.
4. Users can mark a subset of the cluster on which the cluster master can be hosted. This subset is better suited for participating in consensus due to the presence of redundant network connections and such. But this means that if all the members in the consensus list die, then the cluster will be without a cluster master. It is strongly recommended that the members of the consensus list be on separate machines.
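The heartbeat timeout in requirement 3 can be sketched as a small monitor that each cluster member keeps for the cluster master. The 30 second default comes from the text; the class name LeaderHeartbeatMonitorSketch, its methods, and the explicit clock parameter are assumptions for illustration, and the consensus round that follows a timeout is not shown.

```java
// Hypothetical sketch: a member tracks the last heartbeat received from the cluster master.
final class LeaderHeartbeatMonitorSketch {
    static final long DEFAULT_TIMEOUT_MILLIS = 30_000L; // default period stated in the text

    private final long timeoutMillis;
    private long lastHeartbeatMillis;

    LeaderHeartbeatMonitorSketch(long timeoutMillis, long startMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastHeartbeatMillis = startMillis;
    }

    // Records a heartbeat received from the current cluster master.
    synchronized void onHeartbeat(long nowMillis) {
        lastHeartbeatMillis = nowMillis;
    }

    // True once no heartbeat has arrived for the whole timeout period,
    // at which point the members can start a new round of voting.
    synchronized boolean shouldStartElection(long nowMillis) {
        return nowMillis - lastHeartbeatMillis >= timeoutMillis;
    }
}
```

Under this sketch, the total failover time is the timeout above plus however long the consensus algorithm takes, as the list states.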
Members of the cluster that can host the cluster master and participate in the consensus algorithm can be marked with a special ConsensusProcessIdentifier in the config.xml. This identifier can be a unique integer value. This can be an attribute on ServerMBean. Customers should just be able to mark the servers that can host the cluster master and the product should be able to generate the identifiers automatically. There can be another attribute on ClusterMBean that specifies the total number of consensus participants. In one embodiment, this attribute is called ConsensusParticipants. It can be the sum total of servers in a cluster that have the ConsensusProcessIdentifier.
Reaching an agreement on who is going to be the cluster leader can be a time consuming process. Once a cluster leader is elected, requests for leases can be arbitrated by the cluster master directly without going through a round of consensus. The cluster leader will update the lease table and replicate the updates to all other members in the consensus list. This can be different from database leasing. In database leasing, all the leases including the lease for the cluster leader can be maintained in the database as peers. In consensus leasing basis, the cluster leader lease can be used to grant other leases quickly. It can be possible to choose the default leasing basis. Consensus leasing basis can be the default setting. Customers can override the default with the database leasing basis if they want to. A value, such as ClusterMBean.setMigrationBasis(), can control the default.
The console can allow customers to choose which cluster nodes can host the cluster master and automatically generate the consensus process identifier. It can also set the value ClusterMBean.setConsensusParticipants() based on the number of servers chosen in the cluster.
The consensus leasing basis like all other implementations of the LeasingBasis interface can be hidden from subsystems and external users. Subsystems can ask for a singleton service by implementing the weblogic.cluster.singleton.SingletonService interface and then registering with a SingletonService Manager. LockManager can also be implemented on top of leasing.
When a migratable service becomes unavailable for any reason (a bug in the service code, server crashing, network partition) it can be deactivated at its current location and activated on a new server. Should there be a failure while activating on the new server, it can be deactivated on that server and migrated again. By default, we can attempt to start on every candidate server in the cluster until it has either started or failed on every one. If it fails on every single one, it can log an error and leave it deactivated. A service's liveness can be proven by the maintenance of a lease. The server that the service lives on can be responsible for keeping the lease alive, via a heartbeating mechanism. A server crash can naturally result in the lease timing out. In one embodiment, there is only one lease per Migratable Target. All services on that target can share the lease. In one embodiment, all services on a target can be assumed to be somewhat dependent on each other. (Or at least, the user should tolerate one failing service on a target causing the entire target to migrate.) In one embodiment, the admin server does not need to be active for automatic migration.
The Migration Master (MM) can keep track of all services it is supposed to keep alive. The information can be available from the configuration. This is useful because if a service is unleased, then it will no longer exist in the table to be monitored. The configuration can be the same across all the services in the cluster.
Servers can be in Managed Server Independence (MSI) mode and still participate in automatic migration. The only restriction is that new services cannot be started. In one embodiment, if they are deployed to a single server, without an admin server to distribute the configuration change, the service will _not_ be automatically migrated. When the admin server starts and synchronizes everyone's configuration, migration can be enabled for any newly added services (and can persist even if the admin server is later shut down).
If the leases are lost (for example, network issues may cause the leases to be lost), the server can deactivate its services. The MM can start the service somewhere else. This may result in redundant deactivation calls if the network connection is restored after deactivation but before MM notices the lease timeout, but deactivation is idempotent.
If the service is unhealthy yet not disconnected, it will communicate to the Migratable Target and tell it to relinquish the lease. The MM will notice the lease disappearing/timing out, and will migrate it. The following method can be added to the MigratableTarget:
/**
 * Called by Migratable classes when they detect a failure and need to stop and start on a different server. Should only be used for unrecoverable failures in a Migratable object. If shutdownServer is true (as it would be for JTA), then the server will be shut down and the service deactivated as a consequence of this.
 */
public void failedService(String serviceName, boolean shutdownServer)
The handling of fencing can be difficult in the case of JTA. Should the deactivation take a long time, there is no way of guaranteeing the service will be deactivated by the time the next server tries to activate it. In this case, if a graceful shutdown takes longer than the lease period, the server can immediately and abruptly exit. Its services can be taken over and recovered by the newly chosen home for the migratable services.
The Migration Master, upon noticing the expired lease, can start a migration. It can set a flag noting that the migration has begun for a particular service. This can prevent re-noticing an expired lease and migrating again in the middle of a previous migration. (The same mechanism is used in Server migration.) The current location of the service (if it is still available) can deactivate itself. Then the new location can call activate on the target. This can be the same code path as in the original migratable services. There can be additional steps introduced, however.
When the target is being activated, its first action can be to claim the lease. This can stop the migration master from constantly checking for its liveness. It can also provide an atomicity lock on the operation; no one else can be activating while this one holds the lease.
Next, the service can check if a named Node Master (NM) pre-migration script is specified for it. Then it can check for the existence of the node master on the current machine. If it is not there, but there is a script specified, it can stop the migration. If it is there, it can check to see if the node master has performed a specified pre-migration script. If it hasn't run the script already, it can tell the node master to run the pre-migration script. Extra flags can be passed to this script, to allow the script to do something different if we're migrating JTA, JMS or something else entirely. Placeholder scripts can be provided, but specific tlog migration, for example, need not be done. In one embodiment, activation will not proceed until the node master responds positively that it has run the pre-migration script. We can make repeated attempts to run the pre-migration script. If it cannot for some reason, the migration will stop, and we will let the migration master migrate us to a new server.
At this point, we can call the migratable services' activate() methods in order. If they all run without exception, migration is now complete. If there is an error during activation, we can stop and enter deactivation mode.
Deactivation is essentially the inverse of activation. First, the deactivating server can call deactivate on all the services in the specified order. Exceptions will be logged, but no action can be taken.
Once all the services have had deactivate called, we will perform another node master check. The service will check if a named node master post-migration script is specified for it. Then it can check for the existence of the node master on the current machine. If it is there, check to see if the node master has performed a specified post-migration script. If it hasn't run the script already, tell the node master to run the post-migration script. Extra flags can be passed to this script, to allow the script to do something different if we're migrating JTA, JMS or something else entirely. In one embodiment, we can provide placeholder scripts, but specific tlog migration, for example, cannot be done.
If the post-migration script fails, a kill script will be run, if available. If the kill script fails, activation can continue as normal. In the worst case, we can deactivate everywhere and let the admin reactivate it when the issues have been addressed. Finally, when the script part is complete, the service will give up the lease. Scripts will be run during manual migration, if specified.
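The ordering of activation and deactivation described above can be sketched as follows: services are activated in order, and an error during activation switches to deactivation mode, where deactivate is called on all services and exceptions are only logged. The interface MigratableSketch and the classes RecordingService and TargetActivatorSketch are hypothetical names for illustration; lease claiming and the pre-migration, post-migration and kill scripts are deliberately omitted.

```java
import java.util.List;

// Hypothetical stand-in for a service hosted on a migratable target.
interface MigratableSketch {
    void activate() throws Exception;
    void deactivate();
}

// Small helper that records calls so the ordering can be observed.
final class RecordingService implements MigratableSketch {
    private final String name;
    private final boolean failOnActivate;
    private final List<String> log;

    RecordingService(String name, boolean failOnActivate, List<String> log) {
        this.name = name;
        this.failOnActivate = failOnActivate;
        this.log = log;
    }

    public void activate() throws Exception {
        log.add("activate " + name);
        if (failOnActivate) {
            throw new Exception("activation failed for " + name);
        }
    }

    public void deactivate() {
        log.add("deactivate " + name);
    }
}

final class TargetActivatorSketch {
    // Returns true if every service activated; on the first error, deactivates all
    // services in the given order (deactivation is assumed idempotent) and returns false.
    static boolean activateAll(List<MigratableSketch> services) {
        try {
            for (MigratableSketch service : services) {
                service.activate();
            }
            return true;
        } catch (Exception activationError) {
            for (MigratableSketch service : services) {
                try {
                    service.deactivate();
                } catch (RuntimeException ignored) {
                    // A real system would log this; no further action is taken.
                }
            }
            return false;
        }
    }
}
```

A false result would leave the target deactivated on this server so that the migration master can try another one.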
In one embodiment, there is no automatic failback mechanism. The service can live in its new location forever, by default. Administrators may manually migrate the target to a new (or, in the case of failback, old) server at any time in the same exact manner as they did before.
Should a service be migrated to every server and never come up successfully, it can be left deactivated. Optional settings on the MigratableTargetMBean can control how many attempts to make in terms of the number of complete cluster loops. Note that existing migratable target limitations can still apply: if candidate servers are specified, only servers in the candidate server list will be tried.
The AdditionalMigrationAttempts can default to zero. It can control the number of times we will try to migrate the service across every server in the cluster (or candidate-server list, if specified). For example, if a cluster has 3 members, and the AdditionalMigrationAttempts is set to 2, we can try to start it on every server in the cluster, then pause, try again, pause, and try one final time. In this example, this means each server can have 3 opportunities to bring up the service successfully. The pause between migration attempts can be controlled by a value, such as MillisToSleepBetweenAttempts. In one embodiment, this ONLY controls the pause that happens when the service fails to come up on any server, so we start back at the first and try again. While doing normal migrations there need be no delays.
<MigratableTarget Cluster="mycluster"
ConstrainedCandidateServers="server1,server2" Name="MIG-TAR-1" UserPreferredServer="server1" AdditionalMigrationAttempts="2" MillisToSleepBetweenAttempts="12000"
AutoMigratable="true" />
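The interaction between AdditionalMigrationAttempts and MillisToSleepBetweenAttempts can be sketched as a nested loop: one full pass over the candidate servers, plus one further pass per additional attempt, pausing only between passes. The class MigrationRetrySketch, the migrate method and the tryStartOn callback are hypothetical names; how a start attempt actually succeeds or fails is left to the caller.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of the retry policy; not a WebLogic API.
final class MigrationRetrySketch {
    // Returns the server on which the service started, or null if every attempt failed
    // (in which case the service would be left deactivated and an error logged).
    static String migrate(List<String> candidates,
                          int additionalMigrationAttempts,
                          long millisToSleepBetweenAttempts,
                          Predicate<String> tryStartOn) {
        for (int pass = 0; pass <= additionalMigrationAttempts; pass++) {
            if (pass > 0) {
                try {
                    // The pause applies only after a complete failed pass over all servers.
                    Thread.sleep(millisToSleepBetweenAttempts);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return null;
                }
            }
            for (String server : candidates) {
                if (tryStartOn.test(server)) {
                    return server; // no delay between ordinary migrations
                }
            }
        }
        return null;
    }
}
```

With three candidates and AdditionalMigrationAttempts set to 2, the callback in this sketch is invoked at most nine times, three per server, which matches the example in the text.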
The following methods can be added to the MigratableTargetMBean:
/**
 * A migratable service could fail to come up on every possible configured server. This attribute controls how many further attempts, after the service has failed on every server at least once, should be tried. Note that each attempt specified here indicates another full circuit of migrations amongst all the configured servers. So for a 3-server cluster, and a value of 2, a total of 4 additional migrations will be attempted (the original server is never a valid destination).
 */
get/setAdditionalMigrationAttempts()
/**
 * Controls how long of a pause there should be between the migration attempts described in getAdditionalMigrationAttempts(). Note that this delay only happens when the service has failed to come up on every server. It does not cause any sort of delay between attempts to migrate otherwise.
 */
get/setMillisToSleepBetweenAttempts()
JMS should be able to restart itself without undergoing an actual migration (for performance purposes, doing a full migration would be a waste of time for the problem in question). For this purpose, a method will be added to MigrationManager that will request a service restart for the named service, that is, a deactivation and reactivation on the same server, with no resources released or acquired.
/**
 * Requests a 'soft migration'. The specified migratable service will be deactivated and then reactivated on the same server. Nodemanager scripts will NOT be invoked. Services that are dependent on this migratable will be restarted as well.
 */
restartMigratable(Migratable m)
Repeated, rapid restart attempts on one server will be interpreted as an error after a certain threshold is met, and the target will be migrated. Since this is a method only designed for internal use, external configuration of the threshold need not be provided. Hidden get/set methods on ServerMBean can control how many attempts can be made and how long the period is. (For example, it could be set to allow up to 3 restarts within a 12 hour period.)
/**
 * Controls how many times a service may be restarted within the interval specified in getIntervalForRestartAttemptThrottling.
 */
get/setAllowedRestartAttempts()
/**
 * Controls how long the interval is for throttling restart attempts. See getAllowedRestartAttempts.
 */
get/setIntervalForRestartAttemptThrottling()
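The restart throttling described above can be sketched with a sliding window over recent restart times. The sliding-window approach, the class name RestartThrottleSketch and the tryRestart method are assumptions for illustration; the text only states that a count and an interval are configurable.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: allows a limited number of in-place restarts per interval.
final class RestartThrottleSketch {
    private final int allowedRestartAttempts;
    private final long intervalMillis;
    private final Deque<Long> recentRestarts = new ArrayDeque<>();

    RestartThrottleSketch(int allowedRestartAttempts, long intervalMillis) {
        this.allowedRestartAttempts = allowedRestartAttempts;
        this.intervalMillis = intervalMillis;
    }

    // Returns true if an in-place restart is allowed now. A false result means the
    // threshold was met, so the target should be migrated instead of restarted.
    synchronized boolean tryRestart(long nowMillis) {
        // Forget restarts that fall outside the throttling interval.
        while (!recentRestarts.isEmpty()
                && nowMillis - recentRestarts.peekFirst() >= intervalMillis) {
            recentRestarts.removeFirst();
        }
        if (recentRestarts.size() >= allowedRestartAttempts) {
            return false;
        }
        recentRestarts.addLast(nowMillis);
        return true;
    }
}
```

Configured with 3 attempts and a 12 hour interval, this sketch permits three quick restarts, rejects the fourth, and permits another once the oldest restart is 12 hours old.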
The Migration Master can be a service similar to a Cluster Master. It can be a lightweight singleton, stateless and maintained by lease competition, in the same manner as the Cluster Master. Each server can register a permanent interest in obtaining the Migration Master lease. Whatever server currently holds it can perform the starting and stopping of migration tasks. If the current Migration Master crashes or is shut down, one of the waiting servers will be chosen by the leasing infrastructure to take over the lease, becoming the new Migration Master. The Migration Master does not have to be collocated with the Cluster Master. The Migration Master can be the repository of migration information. It can keep records of all the migrations it has done (target name, source server, destination server, timestamp). If the admin server is available, it can report migrations to the admin server for console/JMX/WLST display.
Non-debug level logging can be added to provide information to the user when migrations happen. Activation and deactivation of a target on a server can be logged. The current Migration Master can log the details of the migration: Source, Destination, Target Name, Time. It can also log whether each migration passed or failed. A failed migration can be logged as a warning, not an error. If a service cannot be started successfully on any server, an error will be logged.
There can be a new interface, SingletonService. MigratableTarget can be modified to extend SingletonService. MigratableTarget can provide additional functionality in the way of pre/post activation scripts required for some of the current services that live on MigratableTargets. Note that the Migratable interface that some services may implement is not a SingletonService. Migratable merely means a class can be targeted to a MigratableTarget. It is the MigratableTarget itself that is actually operated upon by the code. MigratableTargets can start/stop Migratable classes as appropriate, as they always have.
The SingletonService interface may be implemented by customers or internal users looking for a lightweight cluster-wide singleton. It does not have as many features as the MigratableTarget (which will support scripts, candidate machines, etc.), but is much easier to configure and create.
A SingletonService can request immediate migration by calling deactivate on itself. The Migration Master (MM) will notice the disappeared lease and will migrate the service to a new location.
interface SingletonService {
    /*
     * This is called upon server start and during the activation stage of migrating. It should obtain any system resources and start any services required for the SingletonService to begin serving requests.
     */
    public void activate();
    /*
     * This is called upon server shutdown and during the deactivation stage of migration. It should release any resources obtained in activate, and stop any services that should only be available from one provider in the cluster.
     */
    public void deactivate();
}
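As an illustration of the activate/deactivate contract, the following sketch implements the two-method interface for a periodic task that should run on only one server at a time. The class CleanupSingleton, its counter, and the one-second period are hypothetical; the interface is restated locally so that the example is self-contained.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** The two-method contract described in the text, restated for this sketch. */
interface SingletonService {
    void activate();
    void deactivate();
}

/** Hypothetical service: a periodic task that must run on one server only. */
class CleanupSingleton implements SingletonService {
    private final AtomicLong runs = new AtomicLong();
    private ScheduledExecutorService executor; // resource obtained in activate()

    @Override
    public synchronized void activate() {
        if (executor != null) {
            return; // already active on this server
        }
        executor = Executors.newSingleThreadScheduledExecutor();
        // Start the work that only one provider in the cluster should perform.
        executor.scheduleWithFixedDelay(runs::incrementAndGet, 0, 1, TimeUnit.SECONDS);
    }

    @Override
    public synchronized void deactivate() {
        if (executor == null) {
            return; // nothing to release
        }
        // Release everything obtained in activate() so another server can take over.
        executor.shutdownNow();
        executor = null;
    }

    synchronized boolean isActive() {
        return executor != null;
    }
}
```

Making both methods idempotent is a design choice of this sketch, not a requirement stated in the text; it keeps a repeated activate or deactivate call from leaking or double-releasing the executor.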
The MigratableTargetMBean can have extra, optional attributes: PreScript, PostScript and AutoMigratable.
<MigratableTarget Cluster="mycluster"
    ConstrainedCandidateServers="server1,server2"
    Name="MTG-TAR-1"
    UserPreferredServer="server1"
    PreScript="../scriptdir/runMeBeforeActivation"
    PostScript="../scriptdir/runMeAfterDeactivation"
    KillScript="../anotherscriptdir/runMeInCaseOfFailure"
    AutoMigratable="true" />
The following methods can be added to the MigratableTargetMBean:
/**
 * Sets the auto migratable value. If a Migratable Target is automatically migratable, it will be migrated automatically upon the shutdown or failure of the server where it currently lives.
 */
get/setAutoMigratable()
/**
 * Sets the script to run before a Migratable Target is actually activated. Before the target is activated, if there is a script specified and NodeManager available, we will run the script. Setting a script without a NodeManager available will result in an error upon migration. If the script fails or cannot be found, migration will not proceed on the current server, and will be tried on the next suitable server. (The next server in the candidate server list, or the cluster, if there is no candidate list.)
 */
get/setPreScriptFileName()
/**
 * Sets the script to run after a Migratable Target is fully deactivated. After the target is deactivated, if there is a script specified and NodeManager available, we will run the script. Setting a script without a NodeManager available will result in an error upon migration. If the script fails or cannot be found, migration will still proceed.
 */
get/setPostScriptFileName()
/**
 * Sets the script to run in case a Migratable Target's post script fails. Setting a script without a NodeManager available will result in an error upon migration. If the script fails or cannot be found, migration will still proceed.
 */
get/setKillScriptFileName()
The console migratable target page can require an extra checkbox to allow enabling of automatic migration. PreScript and PostScript are not required, but will be executed if they exist. The console editing page for Migratable Targets can have these options settable there. Multiple Migratable Services targeted to one Migratable Target can specify the order of their activation, in case there are dependencies. Targets that are auto-migratable need not be ordered with respect to each other. Service order will still be respected whether or not the target is automatically migratable.
Ordering need not be exposed to customers. The Migratable Target infrastructure in general is internal only.
Migratable services can be allowed to specify a deployment order in their MBean. The behavior is modeled on deployment order. There can be a value called 'Order' that accepts integers (including negative values). When a target is asked to activate its component services, it can do so in order of smallest Order to largest Order. If no Order is specified, a default value can be assigned. For consistency, this can be the same default that deployment order uses: 100. If two services have the same Order number, there is no guarantee of their order of activation with regard to each other. When a target is asked to deactivate its component services, it can do so in order of largest Order to smallest Order. Note that if two services have the same Order number, their deactivation order is not guaranteed to be the reverse of their activation order.
Order can be a dynamic value. The current value of Order is always the one used. This means that if the Order changes between activation and deactivation, the sequences may not be exact reverses of each other.
The case of a failed activation can follow the same rules as normal activation and deactivation. Deactivation of the successfully activated services can happen in reverse order, unless Order numbers are identical. In that case, the deactivation order may not be a reverse of the activation order.
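The ordering rules above (ascending Order on activation, descending Order on deactivation, and rollback of already-started services when an activation fails) can be sketched as follows. OrderedService, RecordingService and OrderedTarget are hypothetical names used only for illustration; they are stand-ins, not the actual Migratable or MigratableTarget types.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for a migratable service that exposes getOrder(). */
interface OrderedService {
    int DEFAULT_ORDER = 100; // same default as deployment order, per the text
    int getOrder();
    void activate();
    void deactivate();
}

/** Simple service that records each call in a shared log. */
class RecordingService implements OrderedService {
    private final String name;
    private final int order;
    private final List<String> log;

    RecordingService(String name, int order, List<String> log) {
        this.name = name;
        this.order = order;
        this.log = log;
    }

    public int getOrder() { return order; }
    public void activate() { log.add("+" + name); }
    public void deactivate() { log.add("-" + name); }
}

/** Hypothetical target that starts and stops its services by Order. */
class OrderedTarget {
    private final List<OrderedService> services = new ArrayList<>();

    void add(OrderedService s) { services.add(s); }

    /** Activates from smallest to largest Order; rolls back on failure. */
    void activateAll() {
        List<OrderedService> byOrder = new ArrayList<>(services);
        byOrder.sort(Comparator.comparingInt(OrderedService::getOrder));
        List<OrderedService> started = new ArrayList<>();
        try {
            for (OrderedService s : byOrder) {
                s.activate();
                started.add(s);
            }
        } catch (RuntimeException e) {
            deactivate(started); // stop only what was successfully started
            throw e;
        }
    }

    /** Deactivates from largest to smallest Order, read at call time. */
    void deactivateAll() { deactivate(services); }

    private static void deactivate(List<OrderedService> toStop) {
        List<OrderedService> byOrder = new ArrayList<>(toStop);
        byOrder.sort(Comparator.comparingInt(OrderedService::getOrder).reversed());
        for (OrderedService s : byOrder) {
            s.deactivate();
        }
    }
}
```

Because getOrder() is read again at deactivation time, a service whose Order changed after activation is stopped according to its new value, which matches the note that the two sequences need not be exact reverses of each other.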
The weblogic.cluster.migration.Migratable interface can have the following method added:
/**
 * Returns the order value for this particular migratable object. This controls in which order this object will be activated and deactivated with regard to all the other migratable objects deployed on a migratable target.
 */
public int getOrder()
The implementation and MBeans can be augmented with an additional setOrder method to allow user configuration of this value. This is NOT required, however. It is up to each individual implementor to decide if they want the order configurable.
A default order variable can be provided in the base interface: DEFAULT_ORDER. By default, all current implementing classes will return it from the getOrder() call. This can ensure that the current behavior will not be changed until people make a specific effort to change their orderings.
Job Scheduler can make timers cluster aware and provide the ability to execute them anywhere in the cluster. Timers are no longer tied to the server that created them. The purpose of this specification is to:
1. Make timers cluster aware. Timers should be able to execute anywhere in the cluster and fail over as needed.
2. Provide cron job type of execution within the application server cluster. Users should be able to specify things like "execute this job repeatedly somewhere in the cluster. The job should run if there is at least one running member in the cluster". There is no dependency on the server that actually created the timer. The timer execution is load balanced across the cluster and is able to fail over to another running member in case of failures.
There can be two types of timers that can be differentiated based on their lifecycle.
• Local Timer: A local timer can be scheduled within a server Java Virtual Machine (JVM) and lives within the same JVM forever. The timer runs as long as the JVM is alive and dies when the JVM exits. The application needs to reschedule the timer on subsequent server startup.
• Cluster Wide Timers: A cluster wide timer can be aware of other server JVMs that form part of the same cluster and is able to load balance and fail over. The timer's lifecycle is not bound to the server that created it but to the lifecycle of the cluster. As long as at least one cluster member is alive, the timer is able to execute. Such timers are able to survive a complete cluster restart. Cluster wide timers are created and handled by the Job Scheduler.
Each type can have its own advantages and disadvantages. Local timers can handle fine grained periodicity in the order of milliseconds. Job Scheduler cannot handle fine grained periodicity with precision, as the timers need to be persisted. Cluster wide timers work well with coarse grained intervals in the order of a few seconds or more. Job Scheduler can be used to schedule jobs like running reports every day or at the end of every week. It can be important to run the job even if the server that created it is no longer available. Other cluster members can ensure that the job continues to execute. Job Scheduler can meet the following requirements:
1. Use customer configured database to persist timers and make them available to the entire cluster. Job Scheduler is dependent on a database and cannot function without it. Oracle, DB2, Informix, MySQL, Sybase, MSSQL are supported.
2. In one embodiment, the Job Scheduler will only work in a cluster.
3. Submitted jobs can run anywhere in the cluster. Two consecutive executions of a job can run on the same server or on different servers. Only one server can execute the job at any given point in time.
4. Job Scheduler is dependent on Leasing. Leasing support is needed to elect the TimerMaster. Each server can also use leasing to claim ownership of the job before executing it.
5. Job Scheduler can use the same leasing basis as Server Migration and Singleton Services.
6. Job Scheduler can be bound into the global JNDI tree of each server using a well defined name. The JNDI name can be "weblogic.JobScheduler". Users can cast the looked up object to commonj.timers.TimerManager and use its methods to create jobs.
7. Only Serializable jobs are accepted by the Job Scheduler. Non-Serializable jobs can be rejected with an IllegalArgumentException.
8. ClusterMBean can expose an attribute called DataSourceForJobScheduler that will be used to access the database. In one embodiment, Job Scheduler functionality is only available when the data source is configured.
9. In one embodiment, Job Scheduler will only support schedule at fixed delay functionality. Two consecutive job executions are separated by an 'interval' period.
10. In one embodiment, only round-robin load balancing of jobs is supported. Every cluster member will periodically poll the TimerMaster (which is just another cluster member) for ready jobs to execute. The TimerMaster will give a fraction of the total ready jobs to each member for execution.
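The polling scheme in requirement 10 can be illustrated with a small in-memory sketch. The names TimerMasterSketch, addReadyJob and pollReadyJobs are hypothetical, and the size of the "fraction" is an assumption: here each poll receives the currently ready jobs divided by the cluster size, rounded up. The text does not specify how the fraction is computed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical TimerMaster that hands out a share of ready jobs per poll. */
class TimerMasterSketch {
    private final Deque<String> readyJobs = new ArrayDeque<>();
    private final int clusterSize;

    TimerMasterSketch(int clusterSize) {
        if (clusterSize < 1) {
            throw new IllegalArgumentException("clusterSize must be at least 1");
        }
        this.clusterSize = clusterSize;
    }

    /** Marks a job (identified by a hypothetical string id) as ready to run. */
    synchronized void addReadyJob(String jobId) {
        readyJobs.addLast(jobId);
    }

    /**
     * Called by a polling cluster member. Removes and returns this member's
     * share: ceil(ready / clusterSize) jobs, oldest first (assumed policy).
     */
    synchronized List<String> pollReadyJobs() {
        int share = (readyJobs.size() + clusterSize - 1) / clusterSize;
        List<String> batch = new ArrayList<>(share);
        for (int i = 0; i < share; i++) {
            batch.add(readyJobs.pollFirst());
        }
        return batch;
    }
}
```

Because a handed-out job is removed from the ready queue, no two members receive the same job from the master in this sketch; in the described system that guarantee would additionally rest on each server claiming a lease on the job before executing it.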
Job Scheduler can require a database for persisting timers. All databases supported by Server Migration functionality can be supported by Job Scheduler as well. Job Scheduler can access the database using ClusterMBean.getDataSourceForJobScheduler(). Users can create a table called "weblogic_timers" with the following fields:
Name                   Type
TIMER_ID               NUMBER
TIMER_INFO             VARCHAR2(100)
TIMER_MANAGER_NAME     VARCHAR2(100)
CLUSTER_NAME           VARCHAR2(100)
DOMAIN_NAME            VARCHAR2(100)
TIMER_LISTENER         BLOB
NEXT_EXECUTION_TIME    NUMBER
INTERVAL               NUMBER
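The NEXT_EXECUTION_TIME and INTERVAL columns can be read together with the fixed-delay rule stated in the requirements. The sketch below models one row in memory. The class TimerRow and its methods are hypothetical, and two details are assumptions not stated in the text: both columns are treated as milliseconds, and the next execution is computed from the time the previous execution actually ran.

```java
/** Hypothetical in-memory view of one weblogic_timers row (subset of columns). */
final class TimerRow {
    final long timerId;            // TIMER_ID
    final long nextExecutionTime;  // NEXT_EXECUTION_TIME, assumed epoch milliseconds
    final long interval;           // INTERVAL, assumed milliseconds

    TimerRow(long timerId, long nextExecutionTime, long interval) {
        this.timerId = timerId;
        this.nextExecutionTime = nextExecutionTime;
        this.interval = interval;
    }

    /** A timer is ready once the current time reaches its next execution time. */
    boolean isReady(long nowMillis) {
        return nowMillis >= nextExecutionTime;
    }

    /**
     * Fixed-delay rescheduling: the following execution is placed one interval
     * after the time this execution actually ran (assumed reference point).
     */
    TimerRow rescheduledAfter(long executedAtMillis) {
        return new TimerRow(timerId, executedAtMillis + interval, interval);
    }
}
```

For a row with a next execution time of 1000 and an interval of 30000, an execution that runs late at 1500 moves the next execution to 31500 rather than 31000, which is the difference between fixed delay and fixed rate.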
In one embodiment, the Job Scheduler only functions in a cluster. All cluster nodes can participate in executing jobs without discrimination. In one embodiment, Job Scheduler will be turned on only if the DataSourceForJobScheduler ClusterMBean attribute is set to a valid data source in config.xml. Here is an example:
<domain>
  <cluster>
    <name>Cluster-0</name>
    <multicast-address>239.192.0.0</multicast-address>
    <multicast-port>7466</multicast-port>
    <data-source-for-job-scheduler>JDBC Data Source-0</data-source-for-job-scheduler>
  </cluster>
  <jdbc-system-resource>
    <name>JDBC Data Source-0</name>
    <target>myserver,server-0</target>
    <descriptor-file-name>jdbc/JDBC_Data_Source-0-3407-jdbc.xml</descriptor-file-name>
  </jdbc-system-resource>
</domain>
Job Scheduler can be looked up using the JNDI name "weblogic.JobScheduler" and cast to commonj.timers.TimerManager. Here is an example:
InitialContext ic = new InitialContext();
commonj.timers.TimerManager jobScheduler = (commonj.timers.TimerManager) ic.lookup("weblogic.JobScheduler");
commonj.timers.TimerListener timerListener = new MySerializableTimerListener();
jobScheduler.schedule(timerListener, 0, 30*1000); // execute this job every 30 seconds

private static class MySerializableTimerListener implements commonj.timers.TimerListener, java.io.Serializable {
    public void timerExpired(commonj.timers.Timer timer) { ... }
}
Job Scheduler can use leasing functionality to claim ownership of individual timers before execution and to select a Timer Master. The Timer Master can run on exactly one cluster member and is responsible for allocating timers to individual servers. The leasing basis can depend on the ClusterMBean.getLeasingBasis() attribute. If the LeasingBasis is set to "database", then the configuration associated with database leasing can be set up just as in Server Migration. If the LeasingBasis is set to "consensus", then no database support is required for leasing. The console can provide an option to set ClusterMBean.setDataSourceForJobScheduler(). The data source can be inherited from server migration or session persistence during shutdown. If customers configure a data source for one, they should be able to reuse it for Job Scheduler functionality as well.
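Claiming ownership of a timer through a lease can be sketched with an in-memory lease table. This is an illustration only: LeaseTableSketch, its methods, the lease names, and the fixed lease duration are hypothetical, and an in-memory map stands in for the database or consensus leasing basis that the text describes.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for a lease table. */
class LeaseTableSketch {
    private static final class Lease {
        final String owner;
        final long expiresAtMillis;

        Lease(String owner, long expiresAtMillis) {
            this.owner = owner;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Lease> leases = new HashMap<>();
    private final long leaseMillis; // assumed fixed lease duration

    LeaseTableSketch(long leaseMillis) {
        this.leaseMillis = leaseMillis;
    }

    /**
     * Tries to take or renew the named lease for the given server.
     * Fails only while another server holds an unexpired lease.
     */
    synchronized boolean tryAcquire(String leaseName, String server, long nowMillis) {
        Lease current = leases.get(leaseName);
        boolean heldByOther = current != null
                && !current.owner.equals(server)
                && nowMillis < current.expiresAtMillis;
        if (heldByOther) {
            return false;
        }
        leases.put(leaseName, new Lease(server, nowMillis + leaseMillis));
        return true;
    }

    /** Returns the current owner, or null if the lease is free or expired. */
    synchronized String ownerOf(String leaseName, long nowMillis) {
        Lease current = leases.get(leaseName);
        if (current == null || nowMillis >= current.expiresAtMillis) {
            return null;
        }
        return current.owner;
    }
}
```

A server would call tryAcquire for a timer's lease before executing it and skip the job when the call returns false; a server that stops renewing (for example, because it crashed) loses the lease once it expires, at which point another server can claim it.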
One embodiment may be implemented using a conventional general purpose or a specialized digital computer or microprocessor(s) programmed according to the teachings of the present disclosure, as will be apparent to those skilled in the computer art. Appropriate software coding can readily be prepared by skilled programmers based on the teachings of the present disclosure, as will be apparent to those skilled in the software art. The invention may also be implemented by the preparation of integrated circuits or by interconnecting an appropriate network of conventional component circuits, as will be readily apparent to those skilled in the art.
One embodiment includes a computer program product which is a storage medium (media) having instructions stored thereon/in which can be used to program a computer to perform any of the features presented herein. The storage medium can include, but is not limited to, any type of disk including floppy disks, optical discs, DVD, CD-ROMs, micro drive, and magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, DRAMs, flash memory, or any media or device suitable for storing instructions and/or data. Stored on any one of the computer readable medium (media), the present invention can include software for controlling both the hardware of the general purpose/specialized computer or microprocessor, and for enabling the computer or microprocessor to interact with a human user or other mechanism utilizing the results of the present invention. Such software may include, but is not limited to, device drivers, operating systems, execution environments/containers, and user applications.
Embodiments of the present invention can include providing code for implementing processes of the present invention. The providing can include providing code to a user in any manner. For example, the providing can include transmitting digital signals containing the code to a user; providing the code on a physical media to a user; or any other method of making the code available.
Embodiments of the present invention can include a computer implemented method for transmitting code which can be executed at a computer to perform any of the processes of embodiments of the present invention. The transmitting can include transfer through any portion of a network, such as the Internet; through wires, the atmosphere or space; or any other type of transmission. The transmitting can include initiating a transmission of code; or causing the code to pass into any region or country from another region or country. For example, transmitting includes causing the transfer of code through a portion of a network as a result of previously addressing and sending data including the code to a user. A transmission to a user can include any transmission received by the user in any region or country, regardless of the location from which the transmission is sent.
Embodiments of the present invention can include a signal containing code which can be executed at a computer to perform any of the processes of embodiments of the present invention. The signal can be transmitted through a network, such as the Internet; through wires, the atmosphere or space; or any other type of transmission. The entire signal need not be in transit at the same time. The signal can extend in time over the period of its transfer. The signal is not to be considered as a snapshot of what is currently in transit.
The foregoing description of preferred embodiments of the present invention has been provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations will be apparent to one of ordinary skill in the relevant arts. For example, steps performed in the embodiments of the invention disclosed can be performed in alternate orders, certain steps can be omitted, and additional steps can be added. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, thereby enabling others skilled in the art to understand the invention for various embodiments and with various modifications that are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims and their equivalents.

Claims

What is claimed is:
1. A computer implemented method comprising: at a cluster of application servers, determining a cluster leader; using the cluster leader to set up a lease table at one of the application servers; using the lease table to maintain at least one lease for a singleton service.
2. The computer implemented method of claim 1, wherein the lease table is used to allow automatic migration of the singleton service.
3. The computer implemented method of claim 1, wherein the cluster leader is selected in a competition.
4. The computer implemented method of claim 1, wherein each application server periodically attempts to be the cluster leader.
5. The computer implemented method of claim 1, wherein the determining step comprises selecting the application server that was started earliest.
6. The computer implemented method of claim 1, wherein the cluster leader heartbeats other application servers of the cluster.
7. The computer implemented method of claim 6, wherein if the cluster leader fails to heartbeat the other application servers, the other application servers select another cluster leader.
8. The computer implemented method of claim 1, wherein the lease table is set up at the same application server as the cluster leader.
9. A computer implemented system comprising: a first application server of a cluster to run a singleton service, the first application server maintaining a lease for the singleton service at a lease table; and a migration master to check the lease table and to re-assign the singleton service to a second application server of the cluster if the first application server fails to maintain the lease.
10. The computer implemented system of claim 9, wherein the re-assignment of the singleton service is because the first application server crashed.
11. The computer implemented system of claim 9, wherein the re-assignment of the singleton service is because the first application server is partitioned from the lease table.
12. The computer implemented system of claim 9, wherein the first application server heartbeats the lease to maintain control of the singleton service.
13. The computer implemented system of claim 9, wherein the singleton service is a JMS server.
14. The computer implemented system of claim 9, wherein the singleton service is a transaction recovery service.
15. The computer implemented system of claim 9, wherein the singleton service is a timer master.
16. The computer implemented system of claim 9, wherein the second application server runs a predetermined activation script.
17. The computer implemented system of claim 9, wherein the first application server runs a predetermined deactivation script.
18. The computer implemented system of claim 9, wherein the migration master selects the next application server to run the singleton service.
19. The computer implemented system of claim 9, wherein the singleton service is a JMS service and wherein the migration manager attempts a restart on the first application server before any migration.
20. A computer implemented system comprising: a timer master at an application server of a cluster, the timer master assigning scheduled jobs to other application servers of the cluster; the application server maintaining a lease for the timer master from a lease table; the timer master storing job information for the scheduled jobs on a database, wherein in case of a crash of the application server another application server of the cluster is assigned the timer master and uses the job information to assign scheduled jobs.
21. The computer implemented system of claim 20, wherein the scheduled jobs include reports.
22. The computer implemented system of claim 21, wherein the reports are database reports.
23. The computer implemented system of claim 20, wherein the lease table is in the database.
24. The computer implemented system of claim 20, wherein the timer master is a singleton service.
25. The computer implemented system of claim 24, wherein the timer master is assigned to another application server by a migration master.
26. The computer implemented system of claim 20, wherein the other application servers request jobs from the timer master.
27. The computer implemented system of claim 20, wherein the jobs are scheduled to run at an off-peak time.
EP07709948.9A 2006-05-16 2007-01-04 Next generation clustering Withdrawn EP2021910A4 (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US74736406P 2006-05-16 2006-05-16
US11/425,784 US7536581B2 (en) 2006-05-16 2006-06-22 Automatic migratable services
US11/548,239 US7661015B2 (en) 2006-05-16 2006-10-10 Job scheduler
US11/550,551 US8122108B2 (en) 2006-05-16 2006-10-18 Database-less leasing
PCT/US2007/060102 WO2007136883A2 (en) 2006-05-16 2007-01-04 Next generation clustering

Publications (2)

Publication Number Publication Date
EP2021910A2 EP2021910A2 (en) 2009-02-11
EP2021910A4 EP2021910A4 (en) 2015-05-06

Family

ID=38725393

Family Applications (1)

Application Number Title Priority Date Filing Date
EP07709948.9A Withdrawn EP2021910A4 (en) 2006-05-16 2007-01-04 Next generation clustering

Country Status (5)

Country Link
EP (1) EP2021910A4 (en)
CN (2) CN101460921B (en)
AU (1) AU2007254088A1 (en)
CA (1) CA2652147A1 (en)
WO (1) WO2007136883A2 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2505229B (en) * 2012-08-23 2019-10-16 Metaswitch Networks Ltd Upgrading nodes
US9411628B2 (en) 2014-11-13 2016-08-09 Microsoft Technology Licensing, Llc Virtual machine cluster backup in a multi-node environment
CN117033092A (en) * 2023-10-10 2023-11-10 北京大道云行科技有限公司 Single-instance service failover method and system, electronic equipment and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005502957A (en) * 2001-09-06 2005-01-27 ビーイーエイ システムズ, インコーポレイテッド Exactly one-time cache framework
US7392302B2 (en) * 2002-02-21 2008-06-24 Bea Systems, Inc. Systems and methods for automated service migration
US6944788B2 (en) * 2002-03-12 2005-09-13 Sun Microsystems, Inc. System and method for enabling failover for an application server cluster
CN1151635C (en) * 2002-07-09 2004-05-26 华中科技大学 General dispatching system based on content adaptive for colony network service
US20040153558A1 (en) * 2002-10-31 2004-08-05 Mesut Gunduc System and method for providing java based high availability clustering framework
CN100452797C (en) * 2005-07-15 2009-01-14 清华大学 High-available distributed boundary gateway protocol system based on cluster router structure

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO2007136883A2 *

Also Published As

Publication number Publication date
CN103327066A (en) 2013-09-25
CN103327066B (en) 2016-08-17
CA2652147A1 (en) 2007-11-29
EP2021910A4 (en) 2015-05-06
CN101460921A (en) 2009-06-17
WO2007136883A2 (en) 2007-11-29
CN101460921B (en) 2013-05-22
WO2007136883A3 (en) 2008-04-24
AU2007254088A1 (en) 2007-11-29

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20081208

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC NL PL PT RO SE SI SK TR

AX Request for extension of the european patent

Extension state: AL BA HR MK RS

RIN1 Information on inventor provided before grant (corrected)

Inventor name: PEDDADA, PRASAD

Inventor name: JACOBS, DEAN, BERNARD

Inventor name: FISKE, AARON

Inventor name: RANGANATHAN, VENKATESAN

Inventor name: FUNG, PRISCILLA, C.

Inventor name: REVANURU, NARESH

RBV Designated contracting states (corrected)

Designated state(s): DE FR GB NL

DAX Request for extension of the european patent (deleted)
RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: ORACLE INTERNATIONAL CORPORATION

RIC1 Information provided on ipc code assigned before grant

Ipc: G06F 7/00 20060101AFI20141111BHEP

Ipc: G06F 11/20 20060101ALI20141111BHEP

Ipc: H04L 29/08 20060101ALI20141111BHEP

Ipc: G06F 11/14 20060101ALI20141111BHEP

A4 Supplementary search report drawn up and despatched

Effective date: 20150409

RIC1 Information provided on ipc code assigned before grant

Ipc: G06F 7/00 20060101AFI20150401BHEP

Ipc: H04L 29/08 20060101ALI20150401BHEP

Ipc: G06F 11/14 20060101ALI20150401BHEP

Ipc: G06F 11/20 20060101ALI20150401BHEP

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20151110