MXPA98005490A

MXPA98005490A - Dynamic changes in configurac

Info

Publication number: MXPA98005490A
Application number: MXPA/A/1998/005490A
Authority: MX
Inventors: W Arendt James; Chao Chingyun; David Kistler Michael; Daniel Lawlor Frank; Augusto Mancisidor Rodolfo; Ramanathan Jayashree; Raymond Strong Hovey
Original assignee: International Business Machines Corporation
Priority date: 1997-07-07
Filing date: 1998-07-07
Publication date: 1999-09-01

Abstract

Configuration changes are dynamically applied to a group multiprocessing system by placing a configuration change event on the event waiting list. When the configuration change event is processed, the previous configuration is backed up and each computer program component applies its relevant portion of a configuration change transaction in a synchronized and orderly fashion. Each computer program component applies its portion of the transaction either by reinitialization or by means of a logged transition operation. If the configuration change transaction fails, the computer program components roll back the portions of the configuration change already applied, in a synchronized and orderly manner, to restore the previous configuration. Multiple events for different configuration changes can be placed on the waiting list.

Description

redundant, regardless of whether the component is a processor, memory card, hard disk drive, adapter, power supply, etc. While providing seamless cutover and continuous operation, fault tolerant systems are expensive due to the requirement of redundant computing equipment.

access to shared resources. A node can "own" a set of resources (disks, volume groups, file systems, networks, network addresses and/or applications) as long as that node is available. When that node is deactivated, access to the resources is provided through a different node.

An active configuration comprises a set of computer equipment and computer program entities together with a set of relationships between these entities, the combination of entities and relationships providing services to users. Computer equipment entities specify nodes, adapters, shared disks, etc., while computer program entities specify redirection and reintegration policies. For example, a particular computer program entity may specify that an application server must be redirected to node B when node A fails. It may also specify whether the application server should return to node A when node A is reinstated.

Within group multiprocessing systems, it would be advantageous to reconfigure mission-critical systems that cannot be deactivated for long periods of time (and preferably without deactivating them at all). An example of a situation that requires uninterrupted support for dynamic configuration changes is a computer equipment upgrade within a group of four nodes (nodes A, B, C, and D). A user might need to deactivate a node, such as node D, upgrade its computer equipment, reconnect node D to the group, and possibly make configuration changes. If node D were equipped with a faster processor and/or additional memory, for example, the user might wish node D to become the primary system for an application server that previously ran on a different node.
The user will want to make these changes and will want the changes to be preserved through power outages and group restarts. Another example of a situation that requires dynamic configuration changes involves transient configuration changes. If the workload of a node increases temporarily, the user may wish to move an application server that previously ran on that node to another node. Since the increase in the workload is not normal, the change need not be preserved across group restarts.
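The redirection and reintegration policies, and the difference between changes that survive a restart and transient ones, can be pictured with a small sketch. This is an illustration under stated assumptions, not the patented implementation: the names FailoverPolicy, ClusterConfig, owner, change and restart are hypothetical, as is the choice to model "preserved across restarts" by writing a change into a stored default configuration as well as the active one.

```python
from dataclasses import dataclass


@dataclass
class FailoverPolicy:
    """Hypothetical record of one redirection/reintegration policy."""
    resource: str     # for example, an application server
    primary: str      # node that normally owns the resource
    takeover: str     # node the resource is redirected to on failure
    fall_back: bool   # return to the primary once it is reinstated?

    def owner(self, up_nodes, current=None):
        """Return the node that should own the resource, or None."""
        stays_home = current in (None, self.primary)
        if self.primary in up_nodes and (self.fall_back or stays_home):
            return self.primary
        if current in up_nodes:
            return current            # keep running where it already is
        if self.takeover in up_nodes:
            return self.takeover      # redirection
        return None


class ClusterConfig:
    """Hypothetical split between a stored default and the active copy."""

    def __init__(self, default):
        self.default = dict(default)
        self.active = dict(default)

    def change(self, key, value, persistent):
        # Every dynamic change reaches the active configuration; only
        # persistent changes are also written to the stored default.
        self.active[key] = value
        if persistent:
            self.default[key] = value

    def restart(self):
        # At start-up the active configuration is rebuilt from the default.
        self.active = dict(self.default)
```

With fall_back set to False, the application server stays on node B after node A is reinstated. A transient change made with persistent=False disappears at the next restart, which matches the temporary-workload case described above.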

There is at least one group computer program, HACMP for AIX®, which can be obtained from International Business Machines Corporation of Armonk, New York, that provides some dynamic reconfiguration capabilities. Each node includes a default configuration that is copied into the active configuration for the respective node at group start-up. The default configuration can be modified while the group is active and copied to the default configurations of other nodes. This modified default configuration is subsequently copied to a staging configuration on each active node. The new configuration is verified and, when the daemons are refreshed for each group node, it is copied into the active configuration for the active nodes. The group services for inactive nodes added by the reconfiguration can then be started.

The existing or state-of-the-art system for dynamic reconfiguration has several limitations. First, multiple reconfigurations cannot be synchronized. When a second reconfiguration is initiated while a dynamic reconfiguration is in progress, the presence of a staging configuration on any group node acts as a lock preventing the initiation of a new dynamic reconfiguration event.

Second, the state-of-the-art system cannot be used to effect dynamic changes when multiple computer program components are involved in applying different parts of the changes to the configuration. When a dynamic configuration change involving multiple computer program components fails, the changes already made up to the time of the failure must be rolled back. This is much more complex than dynamically changing a single component and reverting to a previous configuration if the attempted configuration change fails. Therefore, the changes that can be made dynamically are limited.

It would be desirable, therefore, to provide a group multiprocessing system with support for dynamic changes involving multiple computer program components, and for synchronization of multiple dynamic reconfigurations. It would also be desirable to coordinate dynamic configuration changes with other events in a system and to make dynamic changes in a manner that is safe in case of failure. It is therefore an object of the present invention to provide an improved group multiprocessing system.

Ethernet, Token-Ring, FDDI, or a serial optical channel connector (SOCC) network. A serial network may also provide point-to-point communication between nodes 104-110, used for control messages and heartbeat traffic in the event that an alternative subsystem fails. As described in the exemplary embodiment, the system 102 may include some level of redundancy to eliminate single points of failure. For example, each node 104-110 can be connected to each public network 112-114 by means of two network adapters: a service adapter that provides the primary active connection between a node and the network, and a standby adapter that replaces the service adapter in the event that the service adapter fails. In this way, when a resource within the system 102 becomes unavailable, alternative resources can quickly be substituted for the resource that has failed. Those of ordinary skill in the art will appreciate that the computer equipment described in the exemplary embodiment of Figure 1 may vary.

For example, a system may include more or fewer nodes, additional clients, and / or other connections not shown.

Furthermore, the present invention can be implemented within any computer program that uses configuration data and needs to support dynamic changes in such data. Systems that provide high availability are used solely for the purpose of illustrating and explaining the invention.

With reference to Figure 2, a waiting list structure is illustrated which can be employed by a process for dynamically reconfiguring a highly available multiprocessing system that involves multiple computer program components in accordance with a preferred embodiment of the present invention.

requires coordination in the processing of events, typically failure (or "redirection") events and recovery (or "reintegration") events, related to highly available resources. Such coordination is provided by a replicated event waiting list. The event waiting list in the exemplary embodiment is a replicated waiting list maintained by a coordination component of the high availability group computer program. The coordination component is a distributed entity that has a daemon running on each node within the group. The coordination component subscribes to other components of the high availability group computer program, such as a component that handles adapter and node failures, a component that handles redirections forced by a system administrator, and/or a component for

which it is being used in various phases of the processing of a given event, resulting in inconsistent or incorrect behavior. The waiting list structure 202 described, which can be extended, includes a plurality of waiting list entries 204 together with a pointer 206 to a first waiting list entry and a pointer 208 to a last waiting list entry.

The waiting list structure 202 also includes flags 210, which can be used in dynamically reconfiguring a highly available data processing system that involves multiple computer program components. Each waiting list entry 204 may include an event name (such as "node_up") and a priority. Priority classes can be used so that, for example, all events that relate to nodes are assigned a first priority, all events that relate to adapters are assigned a second priority, and all events that relate to application servers are assigned a third priority. Each waiting list entry 204 may also include a node identification, a time stamp, pointers to the next waiting list entry and the previous waiting list entry, a type

of configuration in the replicated event waiting list. In the exemplary embodiment, the replicated event waiting list is the same waiting list used for recovery and failure events. A separate event waiting list could be used for configuration change events, but it would still require coordination with the existing event waiting list. The waiting list may contain other events that have already been scheduled and/or other events that have a higher associated priority than the configuration change event. For example, a previous configuration change event may be in progress, or a recovery or failure event may be assigned a higher priority and processed before configuration change events.

The process then proceeds to step 308, which illustrates a determination of whether the configuration change event can be processed next. At a minimum, this step requires a determination of whether processing is complete for any event whose processing had already begun at the time the configuration change event was initiated.
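The entry layout and the check made at step 308 can be sketched as a small doubly linked waiting list ordered by priority class. This is a minimal single-process sketch, assuming that a lower number means an earlier turn and that configuration change events rank below node, adapter and application server events. The names Entry, EventQueue, PRIORITY, can_process, start_next and finish, and the event name "adapter_fail", are hypothetical; replication of the list across nodes is not modeled.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical priority classes: a lower number is processed first.
PRIORITY = {"node": 1, "adapter": 2, "app_server": 3, "config_change": 4}


@dataclass(eq=False)
class Entry:
    name: str
    priority: int
    node_id: str
    timestamp: float = field(default_factory=time.time)
    prev: Optional["Entry"] = field(default=None, repr=False)
    next: Optional["Entry"] = field(default=None, repr=False)


class EventQueue:
    def __init__(self):
        self.first = None        # reference to the first entry
        self.last = None         # reference to the last entry
        self.in_progress = None  # event whose processing has begun

    def __iter__(self):
        cur = self.first
        while cur is not None:
            yield cur
            cur = cur.next

    def enqueue(self, entry):
        # Walk back from the tail past lower-priority entries, so that
        # entries of the same class keep first-in, first-out order.
        cur = self.last
        while cur is not None and cur.priority > entry.priority:
            cur = cur.prev
        entry.prev = cur
        entry.next = cur.next if cur is not None else self.first
        if entry.prev is not None:
            entry.prev.next = entry
        else:
            self.first = entry
        if entry.next is not None:
            entry.next.prev = entry
        else:
            self.last = entry

    def can_process(self, entry):
        # Nothing may be in progress, and no earlier or higher-priority
        # event may still be waiting ahead of this entry.
        return self.in_progress is None and self.first is entry

    def start_next(self):
        entry = self.first
        if entry is None or self.in_progress is not None:
            return None
        self.first = entry.next
        if self.first is not None:
            self.first.prev = None
        else:
            self.last = None
        entry.prev = entry.next = None
        self.in_progress = entry
        return entry

    def finish(self):
        self.in_progress = None
```

A configuration change event enqueued before a "node_up" event still waits behind it, and can_process returns True for the configuration change only after the earlier events have finished.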
Depending on the particular implementation, this step may also require a determination of whether the waiting list contains other events that have a higher priority than the configuration change event, such as

parts of the configuration change, then they can apply the portion of the configuration change

undo the transaction and can be achieved in a similar way. Depending on the method used to apply a configuration change, a computer program component can be reinitialized under the old configuration or can roll back through a logged transition operation. Once the configuration is restored, the process passes to step 320, which illustrates notification of the user of the failed configuration change transaction. The indication used to provide such a notice may include information regarding the reason why the configuration change transaction failed, including an identification of the computer program component in which the transaction failed. From this information, a system administrator can correct the problem and restart the configuration change. From step 320, the process passes to step 322, which illustrates the resumption of event processing. Within this path of the described process, event processing resumes under the old configuration. Referring again to step 314, once the transaction is successfully completed, the process continues to step 322, described above.
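The apply-then-roll-back behaviour described above can be sketched as follows. This is a simplified single-process illustration, assuming each component records the old value of every setting it changes (standing in for a logged transition operation) and that a failure is raised before the failing component changes anything. The names Component, run_transaction, roll_back and commit are hypothetical. Reinitialization under the old configuration and cross-node synchronization with flags are not modeled.

```python
import copy

_MISSING = object()  # marks a setting that did not exist before the change


class Component:
    """Hypothetical component that applies one portion of a change."""

    def __init__(self, name, fail=False):
        self.name = name
        self.fail = fail
        self.config = {}
        self.log = []  # logged transition operations: (key, old value)

    def apply(self, portion):
        if self.fail:
            raise RuntimeError(f"{self.name} could not apply its portion")
        for key, value in portion.items():
            self.log.append((key, self.config.get(key, _MISSING)))
            self.config[key] = value

    def roll_back(self):
        # Undo logged operations newest first.
        while self.log:
            key, old = self.log.pop()
            if old is _MISSING:
                self.config.pop(key, None)
            else:
                self.config[key] = old

    def commit(self):
        self.log.clear()


def run_transaction(components, portions, active_config):
    """Apply portions in order; on failure undo them in reverse order.

    Returns (configuration now in force, name of failed component or None).
    """
    backup = copy.deepcopy(active_config)  # copy of the previous configuration
    applied = []
    for comp in components:
        try:
            comp.apply(portions.get(comp.name, {}))
            applied.append(comp)
        except RuntimeError:
            for done in reversed(applied):
                done.roll_back()
            return backup, comp.name  # the name can go into the user notice
    for comp in applied:
        comp.commit()
    new_config = copy.deepcopy(active_config)
    for portion in portions.values():
        new_config.update(portion)
    return new_config, None
```

The returned component name corresponds to the identification that the failure notice at step 320 may carry. In both outcomes the caller then resumes event processing, under the old configuration after a failure and under the new one after success.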

In this access route of the process described, however,

Claims

synchronize each portion within the sequence of ordered portions using flags.

The method according to claim 1, further characterized in that the step of effecting the configuration change transaction in a sequence of ordered portions further comprises: reinitializing at least one computer program component within the plurality of computer program components with a new configuration.

The method according to claim 1, further characterized in that the step of effecting the configuration change transaction in a sequence of ordered portions further comprises: executing a transition operation from the previous configuration to a new configuration in at least one computer program component within the plurality of computer program components; and logging the transition operation.

6. The method according to claim 1, further characterized in that the step of initiating a configuration change transaction involving a plurality of computer program components further comprises: creating a copy of the previous configuration;

An apparatus for supporting dynamic configuration changes in a group multiprocessing system, characterized in that it comprises: transaction initiation means for initiating a configuration change transaction involving a plurality of computer program components while the group multiprocessing system is running; transaction execution means for effecting the configuration change transaction in a sequence of ordered portions, each portion being applied by a computer program component within the plurality of computer program components; and restoration means, responsive to detection of a failure in the configuration change transaction, for restoring a previous configuration.

10. The apparatus according to claim 9, further characterized in that the restoration means further comprises: means for performing in reverse order the sequence of ordered portions.

The apparatus according to claim 9, further characterized in that the transaction execution means further comprises:

17. A computer program product for use with a data processing system, characterized in that it comprises: a computer usable medium; first instructions on the computer usable medium for initiating a configuration change transaction involving a plurality of computer program components; second instructions on the computer usable medium for effecting the configuration change transaction in a sequence of ordered portions, each portion being executed by a computer program component within the plurality of computer program components; and third instructions on the computer usable medium, responsive to detection that the configuration change transaction failed, for restoring a previous configuration.

18. The computer program product according to claim 17, further characterized in that the third instructions further comprise: instructions for performing in reverse order the sequence of ordered portions.

19. A group multiprocessing system, characterized in that it comprises: a plurality of nodes connected by at least one network, each node within the plurality of nodes including a memory containing configuration information for the group multiprocessing system; and a group multiprocessing system computer program running on each node, wherein the computer program: initiates a configuration change transaction involving a plurality of components of the computer program while the group multiprocessing system is running; and in response to detection that the configuration change transaction failed, restores a previous configuration.

20. The group multiprocessing system according to claim 19, further characterized in that the group multiprocessing system computer program effects the configuration change transaction in a sequence of ordered portions, each portion being applied by a computer program component within the plurality of computer program components.