GB2492320A

GB2492320A - Recovering the state of a computer system from a saved state and a copy of the input signals since the last saving of the state.

Info

Publication number: GB2492320A
Application number: GB1110491.6A
Authority: GB
Inventors: Edmund Richard James Pringle
Original assignee: Metaswitch Networks Ltd
Current assignee: Metaswitch Networks Ltd
Priority date: 2011-06-21
Filing date: 2011-06-21
Publication date: 2013-01-02
Anticipated expiration: 2031-06-21
Also published as: GB201110491D0; WO2012175951A1; GB2492320B

Abstract

Disclosed is a method of recovering a computer-implemented process, the process being computer code loaded into a working memory and processed by at least one processor. The process receives signals that initiate internal procedures. The method has the steps of saving a state of the process by copying the data from the working memory allocated to the process, and recording the signals input to the process since the saving of the state. When a command to recover the process is issued, the process is restored to the saved state and the recorded signals are applied to the restored process to complete the recovery. The recorded signals may be filtered and/or reordered before being applied to the restored process.

Description

Process Recovery Method and System

Field of the Invention

The present invention relates to process recovery in computing systems.

It relates more specifically, but not exclusively, to a method and system for recovering a process that involve saving a state of the process together with signals input to the process. It may be applied to control processes in data communication networks.

Background of the Invention

Data communications networks typically require capabilities to cope with system failure. This system failure may take the form of an error in a control process being implemented by computing hardware. For example, the control process may form part of routing software for an internet protocol (IP) network. The control process may be implemented by a server or embedded processor.

In network applications there is typically a requirement for high availability of network systems. For example, a network system may need to be available for 99.999% or 99.9999% of the time it is operational per year (so-called "five nines" and "six nines" availability). This applies to both data and voice communications. However, due to the complexity of modern computing systems, system failure cannot be completely avoided. This is especially true in data communications networks, as these may comprise a multitude of hardware devices operating in real-time on large traffic loads at multiple control levels.
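By way of illustration only, the downtime permitted by these availability targets can be worked out directly. The following Python sketch assumes a 365-day year; the function name is hypothetical and does not appear in the original disclosure.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: a 365-day year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the downtime per year, in minutes, permitted by an availability target."""
    return (1.0 - availability_percent / 100.0) * MINUTES_PER_YEAR


five_nines = allowed_downtime_minutes(99.999)   # a little over five minutes per year
six_nines = allowed_downtime_minutes(99.9999)   # a little over half a minute per year
```

On these assumptions, "five nines" leaves a budget of roughly five minutes of downtime per year and "six nines" roughly half a minute, which is why recovery needs to be fast as well as reliable.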

To achieve a high level of availability a number of methods have been proposed to handle system failure.

A first method of handling system failure is known as "graceful restart".

According to this method, when a control process encounters an error, it is restarted. Typically, this involves initiating a specific restart routine developed as part of the control-process program code. After being restarted, the control process then communicates with other concurrent processes to determine its state at the time of the error. Once this information is received from the other concurrent processes the control process continues its operation. As a control process typically performs a system function, an engineer would need to examine the function of the process to determine the restart communications that need to be performed to restore the control process to its functional state before the failure. This examination is typically performed when developing the program code for the process, before the process is operational.

A second method of handling system failure is through fault tolerance.

According to this method, a second, backup copy of the control process may be run in parallel with the original process. The original process may then replicate important state information and send it at regular intervals to the backup copy. If the original process fails the backup copy is able to take over. A computer system that uses this method of fault tolerance is described in US Patent 6,769,073 B1.

These methods of handling system failure have been successful in the past but they also both present a number of problems. A first problem is that bespoke program code often needs to be written for each control process to handle failures of its peers and its own recovery. For example, the restart procedure and the communications with other concurrent processes may vary depending on the control process in question, which in turn may depend on the hardware being controlled. When program code for a new process is written, corresponding recovery program code needs to be written. When program code for an existing process is updated, the recovery program code for both the existing process and any peer processes also needs to be re-written to handle any changes to the process function. As the state of a process changes over time, the recovery program code needs to include code that depends on the state that the process must learn from its peers in order to recover sufficient internal state to function.

In complex control functions, it may be laborious to determine all the interactions a control process requires for a restart and the sequence required to correctly restore a state of the process. In certain implementations it may be difficult to successfully inform all concurrent processes of a process failure. A second problem is that any restart or backup process may still fail. As the original control process has failed, it is likely that the cause of the failure will be reflected in the current state, or some prior state, of the process. For example, as the control process communicates with other concurrent processes it may receive data or a sequence of data that precipitates another error. Alternatively, even if a backup process takes over, it may encounter the same conditions that caused the original process to fail.

In practice, developing program code to successfully restart a control process is time-consuming and requires engineers to determine and write a bespoke recovery sequence. This bespoke recovery sequence is then applied to future failures. There is no guarantee though that a solution developed in this way will always work for future failures without further modification.

It is therefore desirable to provide an improved method and system for handling system failure to enable high availability.

Summary of the Invention

In accordance with a first aspect of the present invention, there is provided a method for recovering a computer-implemented process, the process being implemented by computer program code loaded into working memory and processed by one or more processors, the process receiving one or more signals to initiate one or more internal procedures, the method comprising: saving a state of the process by copying data from working memory allocated to the process; and recording one or more signals input to the process since the time of saving the state; whereby, following a command to recover the process, said process is restored to the saved state and one or more of the recorded signals are applied to the restored process.

One or more of the recorded signals may be applied in a number of different orders and/or following a filtering step. An order for the application of the signals may be predetermined or determined dynamically, i.e. based on changing data.

The method may further comprise: following the command to recover the process, queuing one or more subsequent signals input to the process; forwarding said subsequent signals to the process once the one or more recorded signals have been applied.
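Purely as an illustrative sketch, the queuing of subsequent signals during recovery might be arranged as follows in Python. The class and method names are hypothetical and are not taken from the disclosure; signals are represented as arbitrary Python objects.

```python
from collections import deque
from typing import Any, Callable, Deque, Iterable


class RecoveryGate:
    """Holds back new signals while recorded signals are being reapplied."""

    def __init__(self, apply: Callable[[Any], None]) -> None:
        self._apply = apply                      # delivers one signal to the process
        self._recovering = False
        self._pending: Deque[Any] = deque()      # signals that arrive during recovery

    def begin_recovery(self) -> None:
        self._recovering = True

    def submit(self, signal: Any) -> None:
        if self._recovering:
            self._pending.append(signal)         # queue instead of delivering
        else:
            self._apply(signal)

    def end_recovery(self, recorded: Iterable[Any]) -> None:
        for signal in recorded:                  # first reapply the recorded signals
            self._apply(signal)
        while self._pending:                     # then forward the queued signals
            self._apply(self._pending.popleft())
        self._recovering = False


applied = []
gate = RecoveryGate(applied.append)
gate.submit("a")
gate.begin_recovery()
gate.submit("c")            # arrives during recovery, so it is queued
gate.end_recovery(["b"])    # "b" was recorded before the failure
gate.submit("d")
```

In this sketch the restored process sees the recorded signal before the signal that arrived during recovery, so arrival order is preserved.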

The steps of saving a state of the process and recording one or more signals may be repeated at predetermined time intervals. The predetermined time intervals may be configurable by a user. They may be based on an interval determined by a predetermined number of processor cycles or an interval determined by a predetermined number of signals input to the process.

The step of saving a state of the process may comprise saving any control data that has changed from a previous saved state.

The recorded signals input to the process since the time of saving a selected state may be discarded when saving a subsequent state. The recorded signals may be filtered before being applied to the restored process.

The method may further comprise following the command to restore the process, rescheduling one or more other processes to accommodate the recovery of the process. Rescheduling one or more other processes may comprise one or more of pausing, stopping, starting or restarting one or more other processes.

The process may be one of a group of processes and the one or more rescheduled processes belong to said group, the one or more recorded signals that are applied being one or more recorded signals received from outside of the group.

One or more of the recorded signals may be re-ordered before being applied to the restored process or applied in the order in which they were recorded. The one or more of the recorded signals may be delayed before being applied to the restored process.

The method may further comprise: monitoring the process for an error; and on detection of an error, sending the command to recover the process.

The process may comprise a control routine for a data communications network and may be implemented by an operating support system, the operating support system handling memory management and signalling for the process.

In accordance with a second aspect of the present invention, there is provided a system for recovering a computer-implemented process comprising: an operating support system with access to memory and signalling for the process, the operating support system comprising: a state manager arranged to save a state of the process from memory; a signalling component arranged to record one or more signals input to the process since the time of saving the state; and a restoration manager arranged to, responsive to a command to recover the process, restore the process to the state saved by the state manager and apply one or more of the signals recorded by the signalling component to the restored process.

The operating support system may handle memory management and signalling for the process. It may provide an interface between the process and an operating system and/or one or more operating system services.

The system may be adapted to utilise the method variations set out above.

In accordance with a third aspect of the present invention, there is provided a computer program comprising computer program code adapted to perform the method and method variations when the program is run on a computer.

In accordance with a fourth aspect of the present invention, there is provided a node in a data communications network comprising: one or more processors; a working memory; and an operating platform arranged to provide an implementation of a plurality of processes by loading computer program code associated with said processes into the working memory for execution by the one or more processors, wherein, in operation, each process is configured to receive one or more signals from one or more of the plurality of processes, the operating platform being further arranged to, for a particular one of the plurality of processes: save a state of the process by copying data from the working memory allocated to a process; record one or more signals input to the process since the time of saving the state; and responsive to a failure in the implementation of the process, restart the implementation of the process with the saved state and subsequently apply one or more of the recorded signals.

One of the plurality of processes may comprise an interface for receiving one or more signals from one or more other nodes in the data communications network. The node may form part of a network system wherein each node may be arranged to receive one or more signals from one or more of the plurality of nodes. The network node(s) may be arranged to utilise the method and system variations.

Further features and advantages of the invention will become apparent from the following description of preferred embodiments of the invention, given by way of example only, which is made with reference to the accompanying drawings.

Brief Description of the Drawings

Figure 1 is a schematic illustration of an exemplary operating support system that may be used in the implementation of an embodiment of the present invention;

Figure 2 is a schematic illustration of communications between two exemplary processes;

Figure 3 is a schematic illustration of communications between an exemplary process and an external component;

Figure 4 is a schematic illustration of memory management performed by the operating support system;

Figure 5 is a schematic illustration of exemplary elements of the operating support system according to an embodiment of the present invention;

Figures 6A and 6B are flow diagrams showing two exemplary methods for storing process information for process recovery according to an embodiment of the present invention;

Figure 7 is a flow diagram showing an exemplary method of recovering a process according to an embodiment of the present invention;

Figure 8 shows examples of stored process information according to an embodiment of the present invention;

Figure 9 is a schematic illustration of the life of an example process to which an embodiment of the present invention has been applied;

Figure 10 is a schematic illustration of an exemplary method of recovering a process according to an embodiment of the present invention; and

Figure 11 is a schematic illustration of an exemplary method of recovering a process using parallel processing according to an embodiment of the present invention.

Detailed Description of the Invention

Embodiments of the present invention provide an improved method and system for handling system failure to enable high availability. In one embodiment, state data for a computer-implemented process is copied from memory and stored. Inputs into the process are then recorded to generate a "rewind queue". If the process fails, for example due to an error, the state data may be used to restore the process, in one embodiment by copying the state data back into the memory space for the process. The recorded inputs may then be forwarded to the restored process in order to return the restored process to a functioning state without interrupting concurrent processes. The recorded inputs may be selectively applied to the restored process. By reapplying the recorded inputs, in some embodiments one at a time in order, the "rewind queue" automatically generates program code to appropriately recover the process, avoiding the need for laborious bespoke solutions. In some embodiments, the features described herein contribute to an improved data communications network, for example a node in the network with higher availability, by recovering a process in a manner that is reliable and hidden from, i.e. not detected by, other concurrent control processes.
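The save, record and replay cycle outlined above may be sketched, by way of example only, in Python. This is a minimal toy model rather than the disclosed implementation: the process state is assumed to be a dictionary, signals are assumed to be dictionaries, and the names RecoverableProcess, checkpoint, deliver, recover and add_route are all hypothetical.

```python
import copy
from typing import Any, Callable, Dict, List

Signal = Dict[str, Any]
Handler = Callable[[Dict[str, Any], Signal], None]


class RecoverableProcess:
    """Toy process with a saved state and a rewind queue of recorded inputs."""

    def __init__(self, handler: Handler) -> None:
        self.state: Dict[str, Any] = {}
        self._handler = handler
        self._saved_state: Dict[str, Any] = {}
        self._rewind_queue: List[Signal] = []

    def checkpoint(self) -> None:
        # Copy the state and discard inputs recorded before this save.
        self._saved_state = copy.deepcopy(self.state)
        self._rewind_queue = []

    def deliver(self, signal: Signal) -> None:
        # Record the input before the process acts on it.
        self._rewind_queue.append(copy.deepcopy(signal))
        self._handler(self.state, signal)

    def recover(self, keep: Callable[[Signal], bool] = lambda s: True) -> None:
        # Optionally filter the recorded inputs, restore the saved state,
        # then reapply the remaining inputs one at a time in recorded order.
        self._rewind_queue = [s for s in self._rewind_queue if keep(s)]
        self.state = copy.deepcopy(self._saved_state)
        for signal in self._rewind_queue:
            self._handler(self.state, signal)


def add_route(state: Dict[str, Any], signal: Signal) -> None:
    # Hypothetical procedure that fails on one particular input.
    if signal["prefix"] == "bad":
        raise ValueError("malformed route")
    state.setdefault("routes", []).append(signal["prefix"])


proc = RecoverableProcess(add_route)
proc.deliver({"prefix": "10.0.0.0/8"})
proc.checkpoint()
proc.deliver({"prefix": "192.168.0.0/16"})
try:
    proc.deliver({"prefix": "bad"})
except ValueError:
    proc.recover(keep=lambda s: s["prefix"] != "bad")
```

In this sketch the offending input is filtered out of the rewind queue, so the restored process ends up holding both valid routes without any bespoke restart code.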

A number of embodiments will now be described in the context of an exemplary system. However, the present invention is not limited to the exemplary system and other systems that provide similar functionality may alternatively be used.

Figure 1 shows an operating environment 100 provided by an operating support system 120. The operating support system 120 may be used to implement embodiments of the present invention. The operating support system provides support for one or more processes 130. In the present example, three processes 130A, 130B and 130C are shown. A process generally performs a well-defined function, such as control processing for network data. The operating support system 120 is arranged to interact with system facilities of an operating system 110 through interface 115. In this manner, the operating support system 120 isolates the processes 130 from the operating system 110.

The processes 130 thus do not need to be adapted to run on a particular operating system 110 as they run within the operating support system 120. The operating support system 120 comprises a number of components. These components provide operating functionality for the processes 130. In the present example, the operating support system 120 comprises a messaging and scheduling component 120A, a memory management component 120B, a diagnostics component 120C, a set of utilities 120D and a testing component 120E. Depending on the implementation, not all components need be provided and additional components may be provided as well as, or instead of, those illustrated in Figure 1. An internal interface 125 may be provided between the processes 130 and the components of the operating support system 120.

The processes 130 of Figure 1 are configured to be run within the operating environment 100 provided by the operating support system 120. For example, the processes may comprise computer program code that is arranged to be processed by one or more central processing units (CPUs). As these CPUs are under control of the operating system 110, the operating support system 120 is arranged to provide a first level of processing (i.e. a platform) for the computer program code implementing a process 130 and then pass appropriate instructions to the operating system 110 via interface 115. These instructions may be stored in working memory managed by the operating system 110 in the usual manner. In certain embodiments, operating system 110 may not be required, as the operating support system 120 may additionally be arranged to provide operating system services in place of the operating system. In other embodiments, a combination of these two approaches may be used: for certain services the operating support system 120 may pass the required information to the operating system 110 and for other services the operating support system may interact directly with hardware, bypassing the operating system 110.

The operating support system 120 is responsible for deciding how processes 130 implemented in the operating support system 120 are mapped to processes or services provided by the operating system 110. This mapping may be one-to-one, one-to-many, many-to-many or many-to-one. The operating system 110 may comprise, amongst others, Linux, UNIX, Solaris, Windows, Chorus, Nucleus, and VxWorks. The operating system 110 and/or the operating support system 120 may be adapted to operate on a wide range of hardware devices, from embedded processors such as the Motorola 68000 family to server-based processors such as those based on the SPARC architecture. In some embodiments the operating support system 120 may comprise the operating system 110.

The processes 130 typically operate independently within the operating environment 100 provided by the operating support system 120. Processes 130 may communicate with one another using a signalling scheme implemented by the operating support system 120. In this signalling scheme messages are passed between processes 130 under the control of the operating support system 120.

These messages may be referred to as Inter-Process Signals or IPSs. Typically, the format of the signalling is set by the operating support system 120 and the processes 130 are developed to use this format. In the example of Figure 1, the messaging and scheduling component 120A is responsible for co-ordinating the sending of IPSs between processes. It is shown in more detail in Figure 2.

The operating support system 120 also provides an interface to external components through one or more components referred to herein as "stubs" 140.

The external component may be another hardware device, i.e. a hardware device other than the one or more hardware devices implementing the operating system 110, operating support system 120 and processes 130. For example, the hardware device may be, amongst others, another network node or a routing device. The stubs 140 themselves may comprise functions for specific interactions with these hardware devices, such as sending and/or receiving packets from other nodes or updating routing tables (discussed below).

Processes 130 may communicate with the stubs 140 using IPSs. Communication with external components is shown in more detail in Figure 3. In the example of Figure 3, the stub has two main subcomponents: a stub process 165 that may operate in the same way as processes 130 and a stub initiator 170 that handles communications with external components 180. Stubs may be written to interface with a particular external device. The stub process 165 translates IPSs into appropriate communications for the external device and the stub initiator receives communications from the external device and translates them into IPSs. Hence, communications from external components or devices are accessible by the operating support system 120 in the form of IPSs generated by the stub initiator 170.
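As an illustrative sketch only, the two translation directions of a stub might look as follows in Python. The dictionary form of an IPS, the use of JSON as the external wire format and all names are assumptions made for the example; the disclosure does not specify them.

```python
import json
from typing import Any, Callable, Dict

IPS = Dict[str, Any]  # hypothetical in-memory form of an inter-process signal


class Stub:
    """Toy stub pairing an outbound stub process with an inbound stub initiator."""

    def __init__(self,
                 send_external: Callable[[bytes], None],
                 deliver_ips: Callable[[IPS], None]) -> None:
        self._send_external = send_external  # transmits bytes to the external device
        self._deliver_ips = deliver_ips      # hands an IPS to the support system

    def stub_process(self, ips: IPS) -> None:
        # Outbound: translate an IPS into a communication for the external device.
        self._send_external(json.dumps(ips["payload"]).encode("utf-8"))

    def stub_initiator(self, raw: bytes) -> None:
        # Inbound: translate an external communication into an IPS.
        payload = json.loads(raw.decode("utf-8"))
        self._deliver_ips({"source": "stub", "payload": payload})


sent = []
received = []
stub = Stub(sent.append, received.append)
stub.stub_process({"payload": {"hello": 1}})  # process -> external device
stub.stub_initiator(sent[0])                  # external device -> process
```

Because inbound traffic reaches processes only as IPSs in this arrangement, external inputs can be recorded in the same way as signals between processes.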

Three exemplary processes will now be described. These provide an example of the kind of function a process may perform but should not be seen as limiting. Three processes are described for conciseness and many different kinds of process may be implemented in real-world systems.

A first exemplary process is an Open Shortest Path First (OSPF) neighbour-manager component. This is an Internet Protocol (IP) routing component for handling communications with neighbour nodes in a data communications network. It operates based on the OSPF adaptive routing protocol. The neighbour nodes are neighbours in a communications sense to a node upon which the process is implemented, such as a routing device or a communications server. The OSPF neighbour-manager component sends and receives IP packets to neighbours via a "sockets" stub component 140. These IP packets are packaged as one or more IPSs. It may also provide OSPF network information to other processes, again as one or more IPSs.

A second exemplary process is an OSPF protocol manager. This process holds a database of OSPF information, for example that received from one or more OSPF neighbour-manager components. This OSPF information is used to calculate routing information. The OSPF protocol manager sends network information to, and receives network information from, the OSPF neighbour manager. Again, this network information is packaged as one or more IPSs.

A third exemplary process is a routing manager. This process holds network routes generated by the OSPF protocol and programs an internal forwarding table for a routing device. It may receive routes from an OSPF protocol manager in the form of IPSs. The routing manager then comprises internal procedures to program the received routes into a forwarding table of a routing device via a stub component 140 adapted to provide forwarding table information to a routing device. The routing manager may also be arranged to receive manually programmed routes through a user interface. These manually programmed routes may be sent to an OSPF protocol manager via one or more IPSs. The OSPF protocol manager may then advertise these routes using its own procedures and IPSs.

As the operating support system 120 provides memory management for processes 130, it has access to the memory contents for those processes.

Similarly, through the use of IPSs and messaging and scheduling component 120A, the operating support system 120 has access to signalling inputs provided to a process 130. These two properties make the operating support system 120 suitable to implement an embodiment of the present invention.

Figure 2 schematically shows communication between two processes 130A and 130B using the operating support system 120. Each process has process data 135, which may comprise an internal state of the process that is stored in memory allocated to the process. In the example of Figure 2, the processes have a number of procedures that can be initiated by a signal from another process or the operating support system 120. These may be one or more of initialisation procedures, create procedures, receive procedures, verification procedures, destroy procedures and timer procedures. In Figure 2, process B 130B has three initiating procedures: X, Y and Z. Each of these procedures may, for example, signal to process A 130A using an IPS. These signals are forwarded to process A 130A via internal interface 125 and messaging and scheduling component 120A. Process A 130A has three corresponding ports or channels for receipt of signals: X', Y' and Z'. Process A 130A also has a self-signalling procedure W. In use, process B 130B may signal to process A 130A using the signalling pathway X-X'. This may represent, for example, one of: a "create call" from process B 130B, which initiates a corresponding "create procedure" in process A 130A; a "send data" signal from process B 130B, which sends data and is received by a "receive procedure" in process A 130A; or a "destroy call" from process B 130B, which initiates a "destroy procedure" that may destroy process A 130A. One or more of the signals sent between the two processes 130 may be queued using a queue 150 implemented by messaging and signalling component 120A. As the messaging and signalling component 120A receives IPSs, in some cases all messages and signals between processes, it is in a position to record those messages and signals to implement embodiments of the present invention.
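The position of the messaging component between sender and receiver may be illustrated with the following Python sketch. It is an assumption-laden toy: processes and ports are identified by strings, procedures are plain callables, and the names MessagingComponent, register and send are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

Procedure = Callable[[Any], None]


class MessagingComponent:
    """Toy router that records every signal before forwarding it to a port."""

    def __init__(self) -> None:
        self._ports: Dict[Tuple[str, str], Procedure] = {}
        # Per-process log of (port, data) pairs, in order of arrival.
        self.recorded: Dict[str, List[Tuple[str, Any]]] = defaultdict(list)

    def register(self, process: str, port: str, procedure: Procedure) -> None:
        # Associate a receiving port of a process with the procedure it initiates.
        self._ports[(process, port)] = procedure

    def send(self, process: str, port: str, data: Any) -> None:
        # Record first, then initiate the receiving procedure.
        self.recorded[process].append((port, data))
        self._ports[(process, port)](data)


log = []
component = MessagingComponent()
component.register("A", "X'", log.append)   # port X' of process A
component.send("A", "X'", "create")         # e.g. a create call from process B
```

Since every signal passes through send in this sketch, recording requires no cooperation from the sending or receiving process.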

As described above, a process 130 may be arranged to perform one or more procedures in response to an IPS. These procedures may, for example: directly or indirectly store at least a portion of data passed in an IPS; modify an internal state of a process as a result of an IPS, while not directly storing data contained in the IPS; examine and return information from an internal state of a process in a further IPS, for example, an IPS may indicate a query for data that the process is storing; output a further IPS dependent on an internal state of a process: for example, a process in a softswitch handling a Session Initiation Protocol (SIP) call, receiving an IPS indicating a SIP "180 ringing" message from a callee may cause a process to look up where to send the message on to and to generate an IPS informing a further process of the "180 ringing" state to send on to the caller without modifying the internal state of the process; or change how a process behaves on receipt of future IPSs.

These are simply a representative example of the functions performed by a procedure of a process. Procedures may include other functions, including any combinations of any of the above examples.

Figure 4 shows an exemplary memory management system that is provided by the operating support system 120. Typically processes 130 only have access to a limited set of data, for example one or more of: a fixed sized control and/or data block allocated to the process when it is created by the operating support system 120; IPSs it has generated or received; and dynamically allocated, self-managed control block information. The operating support system 120 may be configured to support either static or dynamic memory allocation by the operating system 110, depending on which operating system is used. Figure 4 shows two exemplary components that may form part of memory management component 120B: buffer manager 210 and memory manager 220. Processes 130 within the operating environment 100, and external components 180 via stubs 140, may obtain and release control block memory using memory manager 220. Typically the memory manager 220 is arranged to request portions of working memory, such as random access memory (RAM), virtual memory or other volatile storage, managed by the operating system 110.

In some embodiments, the memory manager 220 may be adapted to reserve memory directly. The memory manager 220 performs memory mapping between process memory such as control blocks and memory as managed by the operating system 110. Typically, memory for a process in the form of control blocks is managed using handles as identifiers for the control blocks, but in alternate embodiments direct pointer references may be used. These control blocks may represent the internal state of a process. They may, for example, comprise the data space in memory that is allocated to a process. Transient memory allocation in the form of buffer storage may also be handled in the operating support system 120 by buffer manager 210, either alone or in cooperation with memory manager 220. As memory manager 220 has access to memory, in some cases all memory used and allocated to a process, it is in a position to save or copy the contents of memory at a particular time for a particular process and hence implement the saving of a process state according to embodiments of the present invention.
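A handle-based memory manager that can copy out and restore the control blocks of a process may be sketched as follows. This Python example is illustrative only; integer handles, string process identifiers and the method names are assumptions, and real control blocks would be regions of working memory rather than Python objects.

```python
import copy
import itertools
from typing import Any, Dict


class MemoryManager:
    """Toy manager that owns every control block and hands out handles."""

    def __init__(self) -> None:
        self._blocks: Dict[str, Dict[int, Any]] = {}  # process -> handle -> block
        self._next_handle = itertools.count(1)

    def allocate(self, process: str, contents: Any) -> int:
        handle = next(self._next_handle)
        self._blocks.setdefault(process, {})[handle] = contents
        return handle

    def read(self, process: str, handle: int) -> Any:
        return self._blocks[process][handle]

    def release(self, process: str, handle: int) -> None:
        del self._blocks[process][handle]

    def snapshot(self, process: str) -> Dict[int, Any]:
        # Copy all control blocks currently allocated to the process.
        return copy.deepcopy(self._blocks.get(process, {}))

    def restore(self, process: str, saved: Dict[int, Any]) -> None:
        # Replace the process's control blocks with a copy of the saved ones.
        self._blocks[process] = copy.deepcopy(saved)


manager = MemoryManager()
handle = manager.allocate("A", {"neighbours": 1})
saved = manager.snapshot("A")
manager.read("A", handle)["neighbours"] = 99  # the process mutates its state
manager.restore("A", saved)                   # roll back to the saved state
```

Because handles rather than raw pointers identify the blocks in this sketch, a restored block can be found under the same identifier the process used before the failure.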

Buffer memory is typically used for messaging purposes, for example to transport and/or store IPSs. IPSs may be stored in buffer memory in the form of a contiguous control part and a non-contiguous data part. The data part typically contains data that either has been received or will be transmitted across a physical connection to another device, while the control part contains information for internal processing by the operating support system 120 and its processes 130. The data part can take any system-specific format and may be omitted if not required. The control area of buffer memory may store an IPS header, a packet header and the control part. The IPS header may comprise information identifying a recipient and/or destination process that may, for example, be used to implement the filtering described below. The packet header may provide a reference to information in the header of a data part stored in a data buffer. The data buffer may also store information in a tail or footer. In some embodiments, to send data, signals and/or messages from a first process to a second process, rather than copying data between processes, the ownership of portions of the buffer memory may be changed from the first process to the second process. In other embodiments, or in addition to the example above, an IPS may comprise a programming function call from one process to another with a number of parameters, wherein an operating system or operating support system is able to intercept this and store any parameter information to record a signal.
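One possible in-memory shape for such a buffer, given purely as an illustrative sketch, is shown below in Python. The field names, the helper signals_for and the representation of the data part as a list of byte strings are assumptions; the disclosure does not define a concrete layout.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IPSBuffer:
    """Toy buffer holding one IPS: header fields, a control part and a data part."""
    owner: str        # process that currently owns the buffer
    recipient: str    # destination process named in the IPS header
    control: bytes    # contiguous control part
    data_parts: List[bytes] = field(default_factory=list)  # may be empty

    def transfer(self, new_owner: str) -> None:
        # Pass the signal on by changing ownership instead of copying the data.
        self.owner = new_owner


def signals_for(process: str, buffers: List[IPSBuffer]) -> List[IPSBuffer]:
    # Select signals by the recipient recorded in the IPS header.
    return [b for b in buffers if b.recipient == process]


buffer = IPSBuffer(owner="B", recipient="A", control=b"create")
buffer.transfer("A")   # process B hands the buffer to process A
```

A recipient field of this kind is what would allow recorded signals to be selected per process when a rewind queue is filtered.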

Figure 5 is a schematic illustration of an exemplary system for recovering a process according to an embodiment of the present invention. The system is implemented as part of the operating support system 120 and comprises a checkpoint manager 310, a signalling component 320 and a restoration manager 330. The checkpoint manager 310 may form part of memory management component 120B, the signalling component 320 may form part of messaging and signalling component 120A and the restoration manager may form part of either the diagnostics component 120C or utilities 120D. In other embodiments, the checkpoint manager 310, signalling component 320 and restoration manager 330 may be provided as stand-alone components, either as part of or separate from the operating support system 120, or their functionality may be combined in a single implementing component. If the components are implemented separately from the operating support system 120 they are arranged to access memory and signalling for a process, either through communication with the operating support system 120 or by direct access to memory space for the process and its signals.

The checkpoint manager 310 is arranged to save a state of a process. In some embodiments, the checkpoint manager 310 may be arranged to copy the entire contents of all memory assigned to the process. For example, the checkpoint manager 310 may request from memory manager 220 the memory contents for a particular process. In other embodiments, the checkpoint manager 310 may be arranged to copy a subset of the memory assigned to the process, for example all essential control blocks or all operating data. The memory contents need to enable the state of a process to be restored. When the term "from memory" is used this may refer to data retrieved from memory control structures managed by the memory manager 220 or data retrieved from memory via the operating system 110.

To reduce the amount of processing and the amount of memory required to save a state, the checkpoint manager 310 may be arranged to only copy control blocks that have changed from a previous saved state. If the state of a process is reasonably stable over time, this variation may enhance efficiency.
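
By way of illustration only, the incremental saving described above might be sketched as follows. The dictionary layout of control blocks and the names `save_delta` and `apply_delta` are assumptions made for this sketch and do not appear in the described embodiments; removal of control blocks is not handled.

```python
from typing import Dict

# Hypothetical layout: control block identifier -> raw memory contents.
ControlBlocks = Dict[str, bytes]


def save_delta(current: ControlBlocks, previous: ControlBlocks) -> ControlBlocks:
    """Return only the control blocks that are new or changed since `previous`."""
    return {
        block_id: data
        for block_id, data in current.items()
        if previous.get(block_id) != data
    }


def apply_delta(base: ControlBlocks, delta: ControlBlocks) -> ControlBlocks:
    """Rebuild a full saved state from an earlier full state plus a delta."""
    restored = dict(base)
    restored.update(delta)
    return restored
```

Under these assumptions, a process whose state is largely stable between checkpoints yields a small delta, at the cost of needing the earlier full state to be retained for restoration.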

The checkpoint manager 310 is typically arranged to save the state of a process at predetermined time intervals. These time intervals may be adjustable by a user such as a system administrator or may be set dynamically, for example, depending on processor and/or memory load. The time intervals may be set in reference to a particular number of processor cycles, e.g. every 10,000 cycles across one or more CPUs, or in reference to a particular number of messages received by a process, e.g. after every 1000 messages. They may also be defined as a preset time value, e.g. every X minutes or seconds. In other embodiments, the time the checkpoint manager 310 saves a state may be dependent on the activity of the process; for example, a state may be saved when the process is idle or in a steady state. This variation helps increase the likelihood that the saved state is stable. Different methods of calculating the time intervals may be logically combined: e.g. every 10,000 cycles or whenever idle for more than 1 minute.
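
As a minimal sketch of how such triggers might be logically combined, assuming hypothetical counters that are supplied by the surrounding system (the class name `CheckpointPolicy` and its fields are illustrative only):

```python
from dataclasses import dataclass


@dataclass
class CheckpointPolicy:
    """Illustrative thresholds; the defaults mirror the examples in the text."""

    max_cycles: int = 10_000        # processor cycles since the last save
    max_messages: int = 1_000       # messages received since the last save
    max_idle_seconds: float = 60.0  # current idle time of the process

    def should_save(self, cycles: int, messages: int, idle_seconds: float) -> bool:
        """Logical OR of the individual triggers."""
        return (
            cycles >= self.max_cycles
            or messages >= self.max_messages
            or idle_seconds > self.max_idle_seconds
        )
```

A different logical combination (for example requiring both a cycle count and an idle period) could be obtained by changing the expression in `should_save`.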

Setting the length of the time interval between process-state saves is typically a trade-off between the speed of recovery and resource availability. To decrease storage requirements certain previously-saved states may be discarded when a new state is saved. The number of saved states to maintain in storage is again dependent on available resources. This differs from known solutions that comprise bespoke code for a restart operation; in these solutions the code was developed then run once, and there was no option to configure the operation. In practice, the checkpoint manager 310 is likely to be responsible for saving the states of multiple processes, in one case all operational processes within the operating environment 100.

The signalling manager 320 is arranged to save the signals that are input to a process following the saving of a state by the checkpoint manager 310. For example, the checkpoint manager 310 may notify the signalling manager 320 once a state has been saved, and the signalling manager 320 is then arranged to save a copy of all IPSs sent to the process in question. The signalling manager 320 may have access to the IPSs through messaging and signalling component 120A, as, for example, all IPSs may be arranged to be transmitted via messaging and signalling component 120A. In this case, the signalling component 320 need only monitor for IPSs tagged with an address of a process. If such IPSs are detected they are copied to a storage location or queue structure.

If the checkpoint manager 310 saves a subsequent state, for example after a predetermined time interval, it signals to the signalling manager 320. The signalling manager 320 may then discard all signals recorded during the time interval from the storage location or queue structure and begin recording a new set of signals. In some embodiments, longer signal histories may be stored, for example relating to N previous states, again depending on available resources.
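
The recording behaviour described above might be sketched as follows. This is an illustrative sketch only: the class `SignalRecorder`, its method names and the representation of a signal as a string are assumptions, and a bounded `collections.deque` is simply one convenient way of retaining the signal lists for the N most recent checkpoints.

```python
from collections import deque
from typing import Deque, List


class SignalRecorder:
    """Keeps one list of recorded signals per retained checkpoint."""

    def __init__(self, histories_to_keep: int = 1) -> None:
        # Each entry holds the signals recorded after one checkpoint.
        self._histories: Deque[List[str]] = deque(maxlen=histories_to_keep)
        self._histories.append([])

    def on_checkpoint_saved(self) -> None:
        """Start a new list; the bounded deque drops the oldest list when full."""
        self._histories.append([])

    def record(self, signal: str) -> None:
        self._histories[-1].append(signal)

    def since_last_checkpoint(self) -> List[str]:
        return list(self._histories[-1])

    def all_kept(self) -> List[List[str]]:
        return [list(history) for history in self._histories]
```

With `histories_to_keep=1` the recorder discards all earlier signals each time a state is saved; larger values retain longer signal histories at the cost of additional storage.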

Figure 5 further shows a restoration manager 330. The restoration manager 330 is arranged to recover a process. The restoration manager 330 may be initiated following a command from a component (not shown) arranged to monitor for process failure or error. This component may detect an error in a process and automatically send a command to the restoration manager 330 to recover the said process. Alternatively, this functionality may be implemented as part of the restoration manager 330. The restoration manager 330 may also be arranged to recover a process for a reason other than an error or failure of the process, for example, it may be initiated based on the failure of a different process, based on a manually-entered command or as part of a testing and/or diagnostics routine.

To recover a process the restoration manager 330 may first be arranged to stop the process, if it has not been stopped by a failure or error. This may be achieved using a standard command of the operating support system 120. The restoration manager 330 is then arranged to restart or restore the process using the most recent saved state. This may comprise overwriting memory reserved for the process with the saved memory contents and may be achieved using memory manager 220. The restoration manager 330 may also be arranged to divert and/or queue all new signal inputs into the process. For example, it may send an instruction to the message and signalling component 120A to implement a queue 150 for all signals or messages addressed to the process.

As well as restoring a process to a saved state, the restoration manager 330 is also arranged to apply one or more input signals recorded by the signalling component 320 to the restored process. The recorded signals are applied in order. The restoration manager 330 may be arranged to apply all the signals in order or to filter the recorded signals before application. Variations such as these are described with regard to Figures 10 and 11. When applying the recorded signals any outputs produced by the process may either be forwarded to other processes or discarded. The message and signalling manager 120A may be arranged to keep a record of IPSs sent to concurrent processes, for example using a unique process identifier. The message and signalling manager 120A may thus be adapted to discard duplicate signals, forwarding only those that were not initially sent by the restored process, or to only forward signals that have changed in some way as a result of filtering applied to the signals. Once the recorded input signals or messages are applied to the restored process, the restoration manager 330 is arranged to instruct the removal of the diversion and/or queuing of new input signals for the process received following the failure or stopping of the process. These new signals are then applied in order to the restored process following the application of the recorded signals. For example, the restoration manager 330 may instruct the messaging and signalling component 120A to begin to clear, in order, any queue 150 of signals for the process.
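
The discarding of duplicate outputs during replay might, purely as a sketch, look like the following. The representation of an output as a (destination, payload) pair and the function name `forward_new_outputs` are assumptions; a practical implementation would need a more robust notion of signal identity than simple equality.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical shape of an output signal: (destination process id, payload).
Output = Tuple[str, str]


def forward_new_outputs(replayed: Iterable[Output], already_sent: Set[Output]) -> List[Output]:
    """Return, in order, the replayed outputs that were not sent before the failure.

    `already_sent` is the record of previously sent signals; it is updated so
    that an output is forwarded at most once.
    """
    forwarded: List[Output] = []
    for output in replayed:
        if output in already_sent:
            continue  # duplicate of a signal sent before the failure
        already_sent.add(output)
        forwarded.append(output)
    return forwarded
```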

In variations of the present embodiment, as described below with regard to Figure 11, the restoration manager 330 may be arranged to perform multiple recovery operations in parallel or serially and/or use a state other than the last state to recover the process. The state that is used to restore a process may be configurable based on particular operating conditions for an implementation; for example, if five previous states are saved, the system may be configured to go back three states, and apply the signals recorded in the following three time intervals, and to change this if availability or some other metric varies from a threshold value.

The restoration manager 330, in association with the operating support system 120, may also be arranged to re-schedule other concurrent processes to facilitate the recovery operation. Re-scheduling may comprise, amongst others, pausing, restarting, stopping and starting processes. For example, the restoration manager 330 may send a request to the operating support system 120 to pause one or more concurrent processes so that processing load related to one or more CPUs can be directed to the recovery process. This can reduce the time required to recover a process, resulting in quicker process recovery.

The restoration manager 330, in association with the operating support system 120, may further be arranged to selectively re-schedule and/or restart associated processes. Processes may be grouped, for example using one or more group identifiers. These groups may be related to process function; for example, all routing components for a particular network area may be grouped. Filtering and/or re-ordering may then be performed based on groups. Often a group of interrelated processes will send many signals within the group but may receive and/or send few signals outside of the group. For example, setting up a particular telephone call may be initiated by one signal from a process outside of a group but may result in many signals being passed within the group. A recovery operation performed by the restoration manager 330 may thus comprise restarting all the processes within the group then filtering the recorded signals so as to only apply the signal from the process outside of the group. In this example, only a single signal needs to be applied, as opposed to applying all the many signals communicated within the group. In other examples, one or more signals originating from outside the group may be applied, in both cases resulting in greater processing efficiency. The recorded intra-group signals need not be re-applied (or may be filtered out) as these will be re-produced by the group of processes following the application of the one or more recorded signals that are external to the group. This in turn results in quicker process recovery.
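
A minimal sketch of such group-based filtering is given below. The `Signal` record with `sender` and `recipient` fields, the process names and the function `external_signals` are hypothetical and serve only to illustrate the selection of signals that originate outside a group.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Signal:
    sender: str     # identifier of the sending process (assumed field)
    recipient: str  # identifier of the receiving process (assumed field)
    payload: str


def external_signals(recorded: List[Signal], group: Set[str]) -> List[Signal]:
    """Keep, in recorded order, signals sent into the group from outside it."""
    return [s for s in recorded if s.recipient in group and s.sender not in group]


group = {"call_ctl", "media", "billing"}
log = [
    Signal("sip_front", "call_ctl", "setup"),  # from outside the group
    Signal("call_ctl", "media", "alloc"),      # within the group
    Signal("media", "billing", "start"),       # within the group
]
to_apply = external_signals(log, group)  # only the "setup" signal remains
```

In this sketch the two signals passed within the group are omitted on the assumption that the restarted group will regenerate them when the external signal is applied.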

Checkpoint manager 310, signalling component 320 and restoration manager 330 have the combined effect of recovering the process in such a way that surrounding processes are unaware of the recovery or any precipitating failure. No additional communications with concurrent processes are required to determine the state of a process before failure; instead the combination of the "checkpoint" saved states and the applied recorded signals return the process to an operational level. The diversion or queuing of new signals following the down-time of the process avoids the need for any further input from the concurrent processes, and this can be implemented outside of said concurrent processes using the messaging framework of the operating support system 120.

The functionality of these components may be programmed once and yet applied to the recovery of all processes, avoiding the need for unnecessary bespoke coding or trial and error approaches. For example, rather than writing recovery program code anew for each process in order to request state from peer processes in order to rebuild an internal state of a process, the internal state at a point in time (a "checkpoint") is restored based on copied memory contents and external signals subsequent to the checkpoint are applied. According to one embodiment of the present invention a method of recovering a computer-implemented process is provided. Examples of this embodiment are shown in Figures 6A and 6B, and an exemplary recovery cycle is shown in Figure 7.

These methods may be implemented as computer program code that is arranged to be stored in memory and processed by one or more processors. These methods may be adapted to include or exclude any of the variations discussed above with respect to the system embodiments.

Turning to Figure 6A, at step 610 a state of a process is saved, e.g. a "checkpoint" is recorded. This may be performed as described above with regard to checkpoint manager 310. At step 620, inputs to the process such as signals or messages are recorded. This may be performed as described above with regard to signalling component 320. The method may then be repeated at predetermined time intervals, illustrated by the loop from step 620 to step 610.

Figure 6B shows a variation of this method that has the additional step of discarding previous state and/or signalling data at step 615. This may be performed for the immediately preceding state and/or signalling data, or for state and/or signalling data recorded N checkpoints ago.

Figure 7 illustrates how this stored state and recorded signal data may be used to recover a failed process. Figure 7 shows an exemplary process and certain steps may be omitted if necessary. At step 710, the failure of a process is detected. At step 720, any input signals directed to the failed process are queued.

At step 730, the last saved state of the process, for example, the memory contents for the process as saved in step 610, is used to restart or restore the process. In other embodiments, a saved state other than the last saved state may be used. At step 740, the signals to the process recorded at step 620 are applied to the process in the order they were recorded. These recorded signals may be filtered in some embodiments. Once this is complete, at step 750, the signals queued at step 720 are unblocked or released and applied to the process. Following step 750, the process is restored and can continue to operate as usual.
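
The recovery cycle of Figure 7 might be sketched as follows for a toy process whose state is a dictionary and whose signals are strings. The function `recover`, the `Handler` type and the example handler `count` are assumptions made for illustration; failure detection (step 710) and the queuing itself (step 720) are taken to have happened already.

```python
import copy
from typing import Callable, Dict, List

State = Dict[str, int]
Handler = Callable[[State, str], None]  # updates the state for one input signal


def recover(saved_state: State, recorded: List[str], queued: List[str], handle: Handler) -> State:
    """Restore a checkpoint, replay recorded signals, then release queued ones."""
    state = copy.deepcopy(saved_state)  # step 730: restore from the saved state
    for signal in recorded:             # step 740: apply in the order recorded
        handle(state, signal)
    while queued:                       # step 750: release the queue built at step 720
        handle(state, queued.pop(0))
    return state


def count(state: State, signal: str) -> None:
    """Example handler: counts how often each signal has been received."""
    state[signal] = state.get(signal, 0) + 1
```

The saved state is copied rather than modified in place so that the same checkpoint remains available for a further recovery attempt.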

As described through these exemplary methods, certain embodiments of the present invention need not be limited to the operating support system 120 shown in Figure 1, but may be applied in a variety of contexts to improve the recovery of a computer-implemented process.

Figure 8A shows an example of at least a portion of a saved state. This example is presented to explain the features of embodiments of the present invention; it is a necessary simplification of one implementation of a saved state, other implementations may differ at least in structure, format and scale. Figure 8A, for example, may represent a control block 800 that is stored in memory for a process. The state of a process may comprise one or more control blocks.

Typically, the control block 800 is a memory structure maintained by memory manager 220. Memory manager 220 may be arranged to communicate with operating system 110 in order to store the control block 800 in memory locations indicated and reserved by operating system 110. The example of Figure 8A shows one or more control variable identifiers 810 and the associated control variable values 820. These control variables may represent, amongst others, internal variables used by the process, values indicating the state of external hardware managed by the process, values or data relating to network messages being processed by the process, and timing and configuration variables. In the example of Figure 8A, the control variable values are stored in hexadecimal format, but in practice any format suitable for memory storage may be used. The combined values of the variables indicated in Figure 8A may be said to comprise the state of the process in memory. For example, a process may be restored to a state by overwriting values 820 in memory relating to a failed state with values representative of a previous-saved state. The control values for each saved state may be the same or may vary, for example as objects within the process are created additional control values may be added to a control block such as 800.

Figure 8B shows an example of a "rewind" list 830 of recorded signals for a process. This example is presented to explain the features of embodiments of the present invention; it is a necessary simplification of one implementation of recorded signals, other implementations may differ at least in structure, format and scale. Following the saving of a state, for example copying at least the control block 800, signals to a process, for example IPSs, may be copied and added to the list 830. The list 830 records the signal 850 and the time 840 the signal is received. The time 840 need not be recorded in all embodiments and the order in the list 830 may be representative of the order in time the signals were received by a process and/or sent by other concurrent processes. Signals may comprise, amongst others, one or more of: data, messages, packets, flag values, commands, function calls, and control information. Their function is to pass information between processes and components. In some embodiments the signals may comprise a message with a header specifying a recipient process and/or a particular sub-process, port, channel or queue within the recipient process. This header information may be used to filter signals such that those relating to a particular process can be identified, copied and added to a particular list 830 for the process.

As described above, in a variation of an embodiment of the present invention, recorded signals, such as those in list 830, may be filtered before being applied to a restored process. This filtering may occur at any stage between recordal and application. In some embodiments, the filtering is performed when the signals are recorded, e.g. signalling component 320 may be arranged to only record signals with pre-determined or dynamically determined properties. In other embodiments, the filtering is performed when recovering a process before applying the recorded signals. The time of filtering may be set by a systems engineer based on the control functions of the processes and/or system resources; for example, filtering non-essential signals at the recordal stage reduces the amount of memory required to implement the recovery system.

In some embodiments, filtering is accomplished based on information within the signal. For example, data identifying a particular telephone call in an Asynchronous Transfer Mode (ATM) or Voice-Over-Internet Protocol (VoIP) network may be included in signals passed between processes. When Session Initiation Protocol (SIP) is used, this data may comprise a call identifier (Call-ID). If the process comprises a call handling routine within a telecommunications network and the process fails or crashes handling a particular call, the recorded signals may be filtered to remove all inputs relating to that particular call. This is illustrated in Figure 8C, wherein signals "Kill(G,T)" and "Data(pj,t)" are both related to an entity "T/t". These signals are then removed from the list, illustrated by strikeouts 860, by filtering out all signals containing a "T/t" identifier. This filtering may also apply to, amongst others, signals relating to packets within a particular data stream, packets for a particular destination or packets being sent by a particular route. As well as filtering based on a particular entity or item being processed, filtering may also be performed based on whether a signal is essential or required for consistency.
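
Filtering a "rewind" list by call identifier might be sketched as follows. The representation of a recorded signal as a dictionary with `name` and `call_id` keys, the example values and the function `drop_call` are assumptions for illustration only and do not reflect the signal format of Figure 8C.

```python
from typing import Dict, List

# Hypothetical shape of a recorded signal: {"name": ..., "call_id": ...}.
RecordedSignal = Dict[str, str]


def drop_call(recorded: List[RecordedSignal], failed_call_id: str) -> List[RecordedSignal]:
    """Remove every recorded signal that relates to the call being handled at failure."""
    return [s for s in recorded if s.get("call_id") != failed_call_id]


rewind_list = [
    {"name": "Setup", "call_id": "call-1"},
    {"name": "Data", "call_id": "call-2"},
    {"name": "Kill", "call_id": "call-2"},
    {"name": "Tick"},  # a signal not associated with any call is retained
]
filtered = drop_call(rewind_list, "call-2")
```

The relative order of the remaining signals is preserved, so the filtered list can still be applied in the order recorded.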

Any non-essential or non-critical inputs to a process may be removed from a "rewind" list before the recorded inputs are applied to a restored process. For example, if the process represents a routing process in a data communications network that has an optional encryption mode, but that mode is not being used in a current implementation, any inputs relating to the encryption mode may be filtered from the list. In another example, filtered signals may relate to a particular network or network area. Typically, signals that result in output to other system components are maintained if this output is required for the stability of the system.

As well as, or instead of, filtering the recorded signals, certain variations of described embodiments may comprise re-ordering or delaying recorded input signals. This is useful in the case when one or more processes fail because they are attempting to perform two mutually exclusive or incompatible procedures at the same time. For example, one or more processes may handle a SIP call. If devices handling the SIP call at both ends terminate the call at the same time, messages or signals relating to the call may be input into a collection of one or more processes at the same time. This may cause conflict leading to the failure of one or more of the processes; for example, it may not be possible to handle two termination procedures from opposite ends of a SIP call within a common time window (even if that time window is measured in milliseconds). Following the restoration of a checkpoint state of the one or more failed processes, the recorded signals relating to a first process may be applied as normal but the recorded signals relating to a second process may be delayed such that a conflict within the time window is avoided. The ordering or re-ordering may be performed dynamically in response to system conditions.

In another example, two SIP calls may be placed within a certain time period. A common set of processes may handle both calls. However, the first call may result in a process state that leads to a failure when the second call begins. Recorded signals may be isolated, for example using the filtering described above, that relate to each call. These recorded signals may then be re-ordered such that the signals relating to the second call are applied to a restored process before signals relating to the first call. Re-ordering the calls by re-ordering the application of the recorded signals may avoid the process state that led to the initial failure.
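
Such re-ordering might, as a sketch under stated assumptions, be expressed as a stable partition of the recorded signals. The pairing of each signal with a call identifier and the function name `apply_call_first` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical shape of a recorded signal: (call identifier, signal name).
Recorded = Tuple[str, str]


def apply_call_first(recorded: List[Recorded], call_id: str) -> List[Recorded]:
    """Move all signals for `call_id` ahead of the rest, keeping relative order."""
    first = [s for s in recorded if s[0] == call_id]
    rest = [s for s in recorded if s[0] != call_id]
    return first + rest


log = [
    ("call1", "INVITE"),
    ("call2", "INVITE"),
    ("call1", "BYE"),
    ("call2", "BYE"),
]
reordered = apply_call_first(log, "call2")
```

Within each call the original order of signals is kept; only the interleaving between the two calls is changed.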

Figure 9 shows a worked example applying an embodiment of the present invention. Line 900 represents the operation of a process over time, with time running from left to right within the Figure. At certain "checkpoints" 910 in time a state of the process is saved. In the example, a state is saved at regular intervals: at checkpoints 910A, 910B and 910C. Memory values 800A, 800B and 800C represent the contents of memory at the three points 910A, 910B and 910C. When these values are copied and saved they represent a saved state of process 900. During normal operation in the time between the checkpoints, signals are input into the process 900, as represented by arrows 930. The signals or other internal procedures in the process may lead to a change in state. The signals are recorded in lists 830.

At time 940 the process 900 fails. This may be identified by a component monitoring the health or progress of the process 900. For example, a lack of activity within a process, a certain flag or a signal from the operating system or operating support system 120 may indicate an error has occurred. Process 900 may also be adapted to send a notification on encountering an error before it aborts and/or is destroyed. On detection of the error or failure, inputs 930D directed to the process that are received after the process 900 has stopped at time 940 are queued by a queuing component 950 and, for example, stored in list 960.

Figure 10 shows a worked example of recovering a process according to an embodiment of the present invention. Following the failure or error at time 940, the last recorded state 800C is used by a restoration component 1010, which may be implemented by restoration manager 330, to restart the process 900. The recorded signals 830C between checkpoint 910C and time of failure 940 are then applied to the restored or restarted process. As described above, in certain embodiments, the outputs produced by the process following the application of the recorded signals 830C may be filtered or discarded to avoid duplicating signals that are sent to concurrent processes. In some embodiments, recorded signals 830C may be filtered before being applied. The result of restoration component 1010 is a restored process 900'. Queued signals 960 are then applied to the restored process 900' to complete the recovery. Restored process 900' thus continues its operation, represented by the continuation of the process timeline, and the checkpoints are restored, as indicated by checkpoint 910D. Other concurrent processes and even a user or system engineer are unaware that the process was interrupted between points 900 and 900'. From an external viewpoint the process 900-900' appears continuous. This provides high availability, i.e. inputs 930D are still received and operated upon by process 900' and no data is lost.

Figure 11 shows a variation of the example of Figure 10 based on an embodiment of the present invention that restores multiple versions of a process in parallel. In Figure 11, the process of Figure 10 is applied in parallel such that two recovery operations are performed. Unless indicated, the reference numerals from Figure 10 apply. In a first operation, a copy of the process is recovered in a similar manner to Figure 10. The recorded signals 830C are filtered such that all the signals are maintained, which produces input list 1030A. Signals from this list are applied to a restored copy of the process using the state information 800C to produce a first recovered process 900'. In parallel, e.g. at substantially the same time, a second recovery operation is initiated. This is equivalent to the first recovery operation with the exception that different filtering is applied. In the second operation, the filtering produces a different list of recorded signals 1030B. For example, signals relating to a particular item of data being processed may be filtered from the list of signals. In Figure 11, signal F is filtered from list 1030B. As in the first operation, the filtered recorded signals 1030B are applied to a process that has been restored using state information 800C to produce a second recovered process 900". Both recovered processes 900' and 900" are monitored for a time following recovery to check that future failure is avoided.

In the example of Figure 11, the first recovered process 900' encounters an error and fails at time 940'. The second recovered process 900", however, continues to operate successfully. The second recovered process 900" is thus selected as the official process version. Queued signals 960 are then applied and further checkpoints 910D are implemented. In other examples, the queued signals 960 may be applied to both recovered processes (e.g. 900' and 900") before monitoring for failure and/or deciding upon an "official" version of the process.

The example of Figure 11 is a necessary simplification. In practical implementations there may be any number of recovery operations being performed in parallel, with recovered versions of the process that avoid future error or failure being candidate versions for selection of a final recovered process. In other embodiments the operations may also be applied serially rather than in parallel, i.e. the first recovery operation may be attempted; if that is unsuccessful, the second recovery operation may be attempted and so on. The multiple versions of the recovered operations may further use different saved states for the restoration. For example, the first recovery operation may use saved state 800C and recorded signals 830C; if this fails, then a second recovery operation may use saved state 800B and apply recorded signal sets 830B and 830C in time order. The recorded signal sets 830B and/or 830C may be filtered before application. For example, if signal C is a cause of the failure, a first recovery operation using saved state 800C and any combination of recorded signals 830C would fail (e.g. the combinations shown in Figure 11). However, if the methods of Figure 11 were applied to saved state 800B and recorded signals 830B, a second recovery operation that filtered signals 830B to filter out signal C would result in a successful recovery. Recorded signals 830C and queued signals 960 may then be applied to complete the recovery of the process. Hence, embodiments of the present invention allow a logical approach to process recovery that may be applied in a common manner to all process failures, avoiding the need for trial and error to recover a process.
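
The serial variation described above might be sketched as follows. The sketch is illustrative only: the function `recover_serially`, the pairing of each candidate saved state with the signals to be applied after it, and the modelling of a process failure as a raised exception are all assumptions rather than features of the described embodiments.

```python
import copy
from typing import Callable, Dict, List, Optional, Sequence, Tuple

State = Dict[str, int]
Attempt = Tuple[State, List[str]]  # (saved state, signals to apply after it)


def recover_serially(
    attempts: Sequence[Attempt],
    handle: Callable[[State, str], None],
) -> Optional[State]:
    """Try each candidate in turn; return the first state that replays cleanly."""
    for saved_state, signals in attempts:
        state = copy.deepcopy(saved_state)  # leave the checkpoint untouched
        try:
            for signal in signals:
                handle(state, signal)
        except Exception:
            continue  # this recovery operation failed; try the next candidate
        return state
    return None  # no candidate produced a working process
```

For the example in the text, the first candidate would pair saved state 800C with recorded signals 830C, and the second would pair saved state 800B with signals 830B (with signal C filtered out) followed by signals 830C.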

The above embodiments are to be understood as illustrative examples of the invention. Exemplary variations are described above and these may be combined in any order or configuration. It is to be understood that any feature described in relation to any one embodiment may be used alone, or in combination with other features described, and may also be used in combination with one or more features of any other of the embodiments, or any combination of any other of the embodiments. Furthermore, equivalents and modifications not described above may also be employed without departing from the scope of the invention, which is defined in the accompanying claims.

Claims

<claim-text>1. A method for recovering a computer-implemented process, the process being implemented by computer program code loaded into working memory and processed by one or more processors, the process receiving one or more signals to initiate one or more internal procedures, the method comprising: saving a state of the process by copying data from working memory allocated to the process; and recording one or more signals input to the process since the time of saving the state; whereby, following a command to recover the process, said process is restored to the saved state and one or more of the recorded signals are applied to the restored process.</claim-text> <claim-text>2. The method of claim 1, further comprising: following the command to recover the process, queuing one or more subsequent signals input to the process; forwarding said subsequent signals to the process once the one or more recorded signals have been applied.</claim-text> <claim-text>3. The method of claim 1 or claim 2, wherein the steps of saving a state of the process and recording one or more signals are repeated at predetermined time intervals.</claim-text> <claim-text>4. The method of claim 3, wherein the predetermined time intervals at which the steps are repeated are configurable by a user.</claim-text> <claim-text>5. The method of claim 3 or claim 4, wherein the predetermined time intervals are based on an interval determined by a predetermined number of processor cycles or an interval determined by a predetermined number of signals input to the process.</claim-text> <claim-text>6. The method of any one of claims 3 to 5, wherein the step of saving a state of the process comprises saving any control data that has changed from a previous saved state.</claim-text> <claim-text>7. The method of any one of claims 3 to 6, wherein the recorded signals input to the process since the time of saving a selected state are discarded when saving a subsequent state.</claim-text> <claim-text>8. 
The method of any one of the preceding claims, wherein the recorded signals are filtered before being applied to the restored process.</claim-text> <claim-text>9. The method of any one of the preceding claims, further comprising: following the command to restore the process, rescheduling one or more other processes to accommodate the recovery of the process.</claim-text> <claim-text>10. The method of claim 9, wherein rescheduling one or more other processes comprises one or more of pausing, stopping, starting or restarting one or more other processes.</claim-text> <claim-text>11. The method of claim 9 or claim 10, wherein the process is one of a group of processes and the one or more rescheduled processes belong to said group, the one or more recorded signals that are applied being one or more recorded signals received from outside of the group.</claim-text> <claim-text>12. The method of any one of the preceding claims, wherein one or more of the recorded signals are re-ordered before being applied to the restored process.</claim-text> <claim-text>13. The method of any one of claims 1 to 12, wherein said one or more of the recorded signals are applied in the order in which they were recorded. 14. The method of any one of the preceding claims, wherein one or more of the recorded signals are delayed before being applied to the restored process. 15. The method of any one of the preceding claims, further comprising: monitoring the process for an error; and on detection of an error, sending the command to recover the process. 16. The method of any one of the preceding claims, wherein the process comprises a control routine for a data communications network. 17. The method of any one of the preceding claims, wherein the process is implemented by an operating support system, the operating support system handling memory management and signalling for the process. 18. 
A system for recovering a computer-implemented process comprising: an operating support system with access to memory and signalling for the process, the operating support system comprising: a state manager arranged to save a state of the process from memory; a signalling component arranged to record one or more signals input to the process since the time of saving the state; and a restoration manager arranged to, responsive to a command to recover the process, restore the process to the state saved by the state manager and apply one or more of the signals recorded by the signalling component to the restored process. 19. The system of claim 18, wherein the operating support system handles memory management and signalling for the process. 20. The system of claim 18 or claim 19, wherein the operating support system provides an interface between the process and an operating system. 21. The system of any one of claims 18 to 20, wherein the operating support system provides one or more operating system services. 22. The system of any one of claims 18 to 21, wherein the restoration manager is further arranged to queue one or more input signals to the process that are received following the command to recover the process for forwarding to the process once the recorded signals are applied. 23. The system of any one of claims 18 to 22, wherein the state manager and signalling component are arranged to respectively save a state and record one or more signals at predetermined time intervals. 24. The system of claim 23, wherein the predetermined time intervals are configurable by a user. 25. The system of claim 23 or claim 24, wherein the predetermined time intervals comprise intervals determined by a predetermined number of processor cycles or intervals determined by a predetermined number of signals input to the process. 26. 
The system of any one of claims 23 to 25, wherein the state manager is arranged to save control data that has changed from a previous saved state.27. The system of any one of claims 23 to 26, wherein the signalling component is arranged to discard recorded signals relating to a selected state in response to the state manager saving a subsequent state.28. The system of any one of claims 18 to 27, wherein the restoration manager is arranged to filter the recorded signals before applying them to the restored process.29. The system of any one of claims 18 to 28, wherein the restoration manager is arranged to instruct the rescheduling of one or more other processes to accommodate the recovery of the process.30. The system of claim 29, wherein the restoration manager is arranged to instruct one or more of pausing, stopping, starting or restart[ng one or more other processes.31. The system of claim 29 or claim 30, wherein the process is one of a group of processes and the one or more rescheduled processes belong to said group, the one or more recorded signals that arc applied being one or more recorded signals received from outside of the group.32. The system of any one of claims i 8 to 31, wherein the restoration manager is arranged to re-order one or more of the recorded signals before they arc applied to the restored process.33. The system of any one of claims 18 to 31, wherein the restoration manager is arranged to apply the one or more recorded signals in the order in which they were recorded.34. The system of any one of claims i 8 to 31, wherein the restoration manager is arranged to delay one or more of the recorded signals before they are applied to the restored process.35. The system of any one of claims 18 to 34, further comprising: a monitoring component to detect an error in the process and send the command to recover the process to the restoration manager.36. 
The system of any one of claims 18 to 35, wherein the process comprises a control routine for a data communications nctwork.37. A computer program comprising computer program code adapted to perform all the steps of claims I to 17 when the program is run on a computer.38. A node in a data communications network comprising: one or more processors; a working memory; and an operating platform arranged to provide an implementation of a plurality of processes by loading computer program code associated with said processes into the worldng memory for execution by the one or more processors, wherein, in operation, each process is configured to receive one or more signals from one or more of the plurality of processes, the operating platform being further arranged to, for a particular one of the plurality of processes: save a state of the process by copying data from the working memory allocated to a process; record one or more signals input to the process since the time of saving the state; and responsive to a failure in the implementation of the process, restart the implementation of the process with the saved state and subsequently apply one or more of the recorded signals.39. The node of claim 38, wherein one of the plurality of processes comprises an interface for receiving one or more signals from one or more other nodes in the data communications network.40. A network system comprising a plurality of nodes according to claim 39, wherein each aodc is arranged to receive onc or more signals from one or more of the plurality of nodes.</claim-text>
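By way of illustration only, the save, record and replay arrangement recited in claim 18, together with the discarding, filtering and re-ordering options of claims 27, 28 and 32, might be sketched in Python as follows. This is a minimal sketch and not the implementation described in this specification. The names `ToyProcess`, `RecoveryHarness`, `save_state`, `deliver` and `recover` are hypothetical, a deep copy of a dictionary is assumed to stand in for copying the working memory allocated to the process, and the three claimed components are collapsed into a single class for brevity.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

Signal = Dict[str, Any]


class ToyProcess:
    """Hypothetical process whose entire state is one dictionary."""

    def __init__(self) -> None:
        self.state: Dict[str, Any] = {"total": 0, "seen": []}

    def handle(self, signal: Signal) -> None:
        # Each input signal initiates an internal procedure that updates state.
        self.state["total"] += signal["value"]
        self.state["seen"].append(signal["value"])


@dataclass
class RecoveryHarness:
    """Hypothetical stand-in for the state manager, signalling component
    and restoration manager, merged into one class (an assumption)."""

    process: ToyProcess
    saved_state: Dict[str, Any] = field(default_factory=dict)
    recorded: List[Signal] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Take an initial snapshot so that recovery is always possible.
        self.save_state()

    def save_state(self) -> None:
        # State manager role: copy the process data, then discard the
        # signals that related to the previous saved state.
        self.saved_state = copy.deepcopy(self.process.state)
        self.recorded.clear()

    def deliver(self, signal: Signal) -> None:
        # Signalling component role: record the signal, then forward it.
        self.recorded.append(copy.deepcopy(signal))
        self.process.handle(signal)

    def recover(
        self,
        keep: Optional[Callable[[Signal], bool]] = None,
        order_key: Optional[Callable[[Signal], Any]] = None,
    ) -> None:
        # Restoration manager role: restore the saved state, then apply
        # the recorded signals, optionally filtered and/or re-ordered.
        self.process.state = copy.deepcopy(self.saved_state)
        signals = [s for s in self.recorded if keep is None or keep(s)]
        if order_key is not None:
            signals.sort(key=order_key)
        for s in signals:
            self.process.handle(s)


if __name__ == "__main__":
    harness = RecoveryHarness(ToyProcess())
    harness.deliver({"value": 1})
    harness.save_state()
    harness.deliver({"value": 2})
    harness.deliver({"value": 3})
    harness.process.state["total"] = -999  # simulate a corrupted process
    harness.recover()
    print(harness.process.state)
```

In this sketch the recorded signals are kept after recovery, so `recover` can be called again with a different `keep` predicate or `order_key`; whether a real system would retain them is a design choice the claims leave open.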