CA2284573A1 - Process management infrastructure

Info

Publication number: CA2284573A1
Application number: CA002284573A
Authority: CA
Inventors: Richard Waclawik
Original assignee: Individual
Current assignee: Crosskeys Systems Corp
Priority date: 1997-03-14
Filing date: 1998-03-16
Publication date: 1998-09-24

Abstract

A process management infrastructure for use in a system for monitoring network performance comprising process tags for uniquely identifying each instance of a process in the system.

Description

Process Management Infrastructure
This invention relates to a process management infrastructure system for use in an object-oriented programming environment, and in particular for use in a system for monitoring compliance with service level agreements in a telecommunications network.
There is a need for a system to manage service level agreements (SLAs) between telecommunications service providers and their business customers. Part of the management process that relates to SLAs is the comparison of the service provider's performance against the specific guarantees that it may provide to its customers.
In packet switched networks, unlike circuit switched networks, customers are not given a dedicated circuit; their data is statistically multiplexed with data from other sources.
Each customer pays for a particular level of service, and it is therefore important to ensure that the customer is receiving the level of service he has paid for. Our co-pending application of even date describes a system for monitoring network performance relative to customer service agreements.
The availability requirements for the system are very high. Downtime must be minimized at all costs. The system collects time-based information from various network management systems and operation support systems. While the system is down, critical information can be lost. Various strategies are used within the system to minimize the possibility of losing information. One such strategy is to minimize downtime.
The system indicates to various parts of the service provider's organization the level of service they are providing to their customers. Individuals in customer support, sales, network operations and senior executives rely on this information to make decisions.
The system is also expected to evolve to a point where the service provider's customers will have access to the system. System downtime can negatively impact the customer's perception of the quality of the service provider's operation.
To meet the above requirements, it is important that during system failures, the system automatically attempt to recover. Failing this, the system should notify system operators of the failure. The system should also have minimal downtime during deployment of new system functionality.

The sheer volume of data involved makes the task of managing the data quite daunting.
Typically a system might monitor several thousand customers involving the collection of ten million rows of data per day. Detailed real time events are typically kept for 180 days, summarized daily reports may be kept for 180 days, and summarized monthly reports for 18 months.
Object-oriented database programming techniques are employed to handle such large volumes of data. The data is received from the network management system in real time through objects known as event collectors, which run processes under the control of a director.
In a typical system, a director must store details about each instance of a process it is to run, such as the executable file name, configuration and the like. This can lead to considerable downtime when a process fails or it is desired to change the system.
An object of the invention is to provide a process management infrastructure which is conducive to providing high availability of such a system with improved flexibility.
According to the present invention there is provided a process management infrastructure for use in an object-oriented programming environment, comprising means for running a plurality of processes, a director for monitoring the operation of said processes, and a database for storing a plurality of process tags, each process tag uniquely identifying each instance of a process, said means for running said processes obtaining information required to run said processes by looking up the respective process tags therefor in said database.
The basic concept underpinning this infrastructure is thus the ptag, or process tag, which uniquely identifies an instance of a program. Each back-end process, such as a database monitor or an event collector that gathers information from a network management system (NMS) such as a Newbridge Networks 46020, will generally have its own ptag recording relevant information about the process, such as the executable name, start-up indicator and arguments required to run the process.
The invention may run, for example, on a Unix-based Sun Sparc Ultra 2 workstation.
Normally dual processors should be used with a minimum of 256 Megabytes of RAM.
The invention can run, for example, over a TCP/IP network.

The invention is particularly applicable to a system for monitoring compliance with service level agreements in a telecommunications network, but also has application in other environments. It allows added flexibility and permits fine control over different elements of the system. For example, a new network manager can be added to the system on the fly simply by starting up another instance of a process, without the need to copy and rename executable files.
The invention represents an important technical advance in the management of communications networks.
The invention also provides a method of managing processes in an object-oriented programming environment, wherein process tags are used to uniquely identify each instance of a process in the system.
The invention will now be described in more detail, by way of example only, with reference to the accompanying drawings, in which:
Figure 1 is an overview of a system to which the management infrastructure may be applied; and
Figure 2 is a block diagram of a process management infrastructure in accordance with the invention.
Referring to Figure 1, the service level management system comprises a command interface 1 and a Director 2 forming part of a system controller 20. The system controller 20 communicates with a Newbridge Networks 46020 network manager 14 and a data management framework 15 through back-end processes 13, for example, event collectors and database monitors. The network manager 14 manages a packet switched network, or a fast packet switched network, such as an ATM or frame relay network. The system controller 20 as well as other processes write to log files 18. The log files are used by system utilities 19.
The process management infrastructure is suitable for managing service level agreements in packet switched networks to ensure, for example, that customers are receiving the quality of service for which they have contracted. Unlike circuit switched networks, where bandwidth is dedicated to a particular customer, in such networks bandwidth is statistically shared among a number of users, and it is important to ensure that customers are receiving the quality of service that they have paid for, for example, as determined by average throughput, peak rate and the like.
The infrastructure is implemented in object-oriented software, for example using C++, and is designed to extract event information, for example relating to the setting up of virtual connections, from a network manager. It comprises a command interface (rci) 1; a daemon process, which provides the Director 2; a simple inter-process communication component 3 to permit the exchange of messages between processes; a process configuration component 4; and a software logging component 5.
The system command interface 1 is the main user command interface to the process management infrastructure. From this interface, a system operator can issue commands to:
- start the system
- shut down the system
- get status information on the processes currently running
- start a specific process
- stop a specific process
- tell a process to reread its configuration
- tell a process to change its logging level
The following table sets out the effect of the various commands issued by the rci 1.
Command | Description
status | The status command with the all option passes an IPC (Inter Process Communication) message to the Director requesting the status. The Director returns the status in the form of an IPC message, formatted so that rci can print it directly to the standard output.

The status command with a specified PTag passes an IPC message directly to the process identified by that PTag. That process then returns an IPC message containing specific information related to that process. The message is formatted for rci to print out to the standard output.

startup | All startup IPC messages go through the Director process.

Upon issuing the startup command with the all option, rci checks whether the Director is running. If the Director is not running, rci starts it, and the Director then starts up all the other back-end processes specified in its master process table.
If the Director is already running, rci does not try to start it again.

When the startup command is issued with a specific PTag, rci checks whether the Director is running. If the Director is not running, rci cannot issue the command to the Director and an error message is logged. If the Director is running, an IPC message is sent to the Director to start up the specified PTag.

If rci receives an error sending the message to the Director, an error message is recorded in its log file.
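The startup rules above can be sketched as a small decision function. This is only an illustration in Python, not the patented implementation: the function name `plan_startup` and the action labels it returns are hypothetical, and the "all with Director already running" branch assumes that rci forwards a startup all message to the Director, as the Start Up Indicator description suggests.

```python
from typing import List


def plan_startup(target: str, director_running: bool) -> List[str]:
    """Return the actions rci would take for 'rci startup <target>'.

    The action labels are illustrative names, not identifiers from the source.
    """
    if target == "all":
        if not director_running:
            # rci starts the Director, which then starts the back-end
            # processes listed in its master process table.
            return ["start_director", "director_starts_table_processes"]
        # Director already running: rci does not start it again and only
        # forwards the request (assumed behaviour).
        return ["send_ipc_startup:all"]
    if not director_running:
        # A specific PTag cannot be started without a running Director.
        return ["log_error"]
    # All startup messages go through the Director.
    return [f"send_ipc_startup:{target}"]
```

Keeping the decision separate from the actual process launching makes the four cases easy to check one by one.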

shutdown | Message passing for this command is very similar to that for the startup command.

All shutdown messages go through the Director process. If all is specified, an IPC message is passed to the Director instructing it to shut down all the back-end processes; the Director then shuts down as well. If a PTag is specified, an IPC message is passed to the Director instructing it to shut down only the process that corresponds to that PTag.

reconfig | If all is specified, an IPC message is sent directly to all processes. If a PTag is specified, an IPC message is sent directly to the specified process. No acknowledgment IPC message is returned.

log | This command sends an IPC message to the process specified by the PTag to change the logging level of that process.

The logging levels are:

String | Value
debug1 | 1 (most verbose)
debug2 | 2
debug3 | 3
info | 4
warning | 5
serious | 6
fatal | 7
operator | 8 (least verbose)

Example: If a process is running at the info level and receives a message to change logging level without a specific level to change to, it changes to the debug3 logging level. If the process is running at the debug1 level and receives such a message, the level wraps around and changes to operator.
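The wrap-around behaviour in the example can be illustrated with a short sketch. The level names and their order come from the table above; the function name `next_level` and the handling of unknown level names are assumptions made for illustration.

```python
from typing import Optional

# Level names in order of value 1 (most verbose) to 8 (least verbose).
LEVELS = ["debug1", "debug2", "debug3", "info",
          "warning", "serious", "fatal", "operator"]


def next_level(current: str, requested: Optional[str] = None) -> str:
    """Return the logging level a process adopts after a 'log' message."""
    if requested is not None:
        if requested not in LEVELS:
            raise ValueError(f"unknown logging level: {requested}")
        return requested
    # No level given: step one level more verbose, wrapping from
    # debug1 (value 1) around to operator (value 8).
    index = LEVELS.index(current)
    return LEVELS[(index - 1) % len(LEVELS)]
```

With this sketch, a process at info moves to debug3, and a process at debug1 wraps to operator, matching the two cases in the example.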

help | If no specific command is specified, rci lists the built-in commands and their acceptable syntax. If a built-in command is specified after the help command word, then additional help is displayed for that built-in command. If an invalid command or an invalid command line argument is entered, the general help will be displayed automatically.

<extended command> | Allows for processing of specific (extended) commands, such as the sync command for the event collector. Rci checks for invalid commands using a header file, so if commands are added or removed, rci must be modified. Syntax checking is left to the back-end process. If the command is invalid, a general help message is sent to the standard output.

The main purpose of the director daemon 2 is to start up the system, restart processes that terminate, and shut down the system. On startup, the director 2 reads in a master process table 6, which contains the names of the processes that are to be started and monitored by the director. The director starts, stops and monitors back-end processes, such as event collectors and database monitors.
The master process table has the format:
PTag | Executable Name | Start Up Indicator | Process Priority | Process Args
The format of the configuration file which will specify the processes to start is outlined in the following table.
Field | Description
PTag | Process Tag. It is used to uniquely identify each instance of a process in the system. The PTag is to be used as the name of the config file (with the extension .cfg) and as the name of the FIFO (with the extension .fifo).
Executable Name | The name of the executable without path. The path name for executables is in a variable in the system setup config file (systemSetup.cfg).

Start Up Indicator | This field indicates whether a process is to be started up by director.

This field contains either a Y or an N. When the command program starts up director, director reads its process table and will start up all processes that have a Y in this field. If the command program gives director a startup all command, director starts up all processes which contain a Y in this field. If director gets a specific request to start up a process which contains an N in this field, then it will start up the process, however it will not restart the process if the process should exit with a restart code.

Process Priority | The nice priority level of the process.

Process Arguments | The argument string for the process.

Note: director will put the PTag for the process into the argument string for the process with a -PTag flag. All processes may get the PTag from the config file class which will parse the command line for process variables.
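To make the table format concrete, the following Python sketch parses a master process table and derives the per-ptag names described above. It is an illustration under stated assumptions: the on-disk layout (one whitespace-separated entry per line), the sample executable names, the `-nms` argument and the names `ProcessEntry`, `parse_table` and `startup_all` are all hypothetical; only the five fields, the .cfg and .fifo extensions and the -PTag flag come from the description.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ProcessEntry:
    ptag: str
    executable: str
    start_up: bool   # True for "Y", False for "N"
    priority: int    # nice priority level
    args: str        # argument string, may be empty

    @property
    def config_file(self) -> str:
        return f"{self.ptag}.cfg"

    @property
    def fifo_name(self) -> str:
        return f"{self.ptag}.fifo"

    def command_line(self, exe_dir: str) -> List[str]:
        # The director adds the ptag to the arguments with a -PTag flag.
        return [f"{exe_dir}/{self.executable}", "-PTag", self.ptag,
                *self.args.split()]


def parse_table(text: str) -> Dict[str, ProcessEntry]:
    """Parse an assumed one-entry-per-line master process table."""
    table: Dict[str, ProcessEntry] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        parts = line.split(None, 4)
        if len(parts) < 4:
            raise ValueError(f"incomplete entry: {line!r}")
        ptag, executable, flag, priority = parts[:4]
        if flag not in ("Y", "N"):
            raise ValueError(f"bad start up indicator for {ptag}: {flag}")
        if ptag in table:
            raise ValueError(f"duplicate ptag: {ptag}")
        args = parts[4] if len(parts) == 5 else ""
        table[ptag] = ProcessEntry(ptag, executable, flag == "Y",
                                   int(priority), args)
    return table


def startup_all(table: Dict[str, ProcessEntry]) -> List[str]:
    """Ptags the director starts on 'startup all' (indicator Y only)."""
    return [entry.ptag for entry in table.values() if entry.start_up]


# Hypothetical table: two instances of one executable, told apart by ptag.
SAMPLE = """\
A1 eventCollector N 0 -nms nmsA
B2 eventCollector Y 5 -nms nmsB
DBM dbMonitor Y 10
"""
```

The sample shows the point of the ptag: A1 and B2 run the same executable file, so adding an instance needs a new table entry rather than a renamed copy of the executable.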

The inter process communication component 3 is a simple mechanism used to send messages to various processes in order to control the operation of the system.
This component is implemented using the UNIX FIFO mechanism. The address of a specific process is determined by the ptag associated with that process. Two components are provided, one to read messages and one to write them.
A message has the following components:
- source process: the ptag of the process that is sending the message
- destination process: the ptag of the process that is receiving the message
- message id: an integer value which indicates to the receiver what the message content is
- response requested: an integer value which indicates whether the process is expecting a reply to the message
- size: an integer value which indicates the size of the body
- body: the body of the message for the process
The process configuration component 4 is a set of object-oriented classes which are embedded into each process for the purpose of managing configuration parameters associated with that process. The process configuration component uses the ptag associated with the process to determine which configuration file in the configuration table 7 should be read.
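The message components listed above (source ptag, destination ptag, message id, response requested, size and body) can be sketched as a simple byte encoding suitable for writing to a FIFO. The wire layout here is an assumption made for illustration, since the description does not specify one: ptags are length-prefixed UTF-8 strings and the integers are 32-bit values in network byte order. The class name `Message` and the message id value used in the usage note are hypothetical.

```python
import struct
from dataclasses import dataclass


@dataclass
class Message:
    source: str               # ptag of the sending process
    destination: str          # ptag of the receiving process
    message_id: int           # tells the receiver what the content is
    response_requested: int   # non-zero if a reply is expected
    body: bytes = b""

    def encode(self) -> bytes:
        src = self.source.encode("utf-8")
        dst = self.destination.encode("utf-8")
        # Assumed layout: two 16-bit ptag lengths, the ptags, then the
        # message id, response flag and body size, then the body itself.
        lengths = struct.pack("!HH", len(src), len(dst))
        fixed = struct.pack("!iiI", self.message_id,
                            self.response_requested, len(self.body))
        return lengths + src + dst + fixed + self.body

    @classmethod
    def decode(cls, data: bytes) -> "Message":
        src_len, dst_len = struct.unpack_from("!HH", data, 0)
        offset = struct.calcsize("!HH")
        source = data[offset:offset + src_len].decode("utf-8")
        offset += src_len
        destination = data[offset:offset + dst_len].decode("utf-8")
        offset += dst_len
        message_id, response, size = struct.unpack_from("!iiI", data, offset)
        offset += struct.calcsize("!iiI")
        body = data[offset:offset + size]
        if len(body) != size:
            raise ValueError("truncated message body")
        return cls(source, destination, message_id, response, body)
```

For example, `Message("rci", "B2", 3, 1, b"info").encode()` yields bytes that `Message.decode` turns back into an equal message; the explicit size field lets the reader detect a body that was cut short.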
The software logging component 5 is a set of object oriented classes which are embedded into each process for the purpose of providing a status and error logging facility. The software logging component 5 uses the ptag associated with the process to determine which file in the log database 8 the log data should be written to.
A specific example will now be described with reference to Figure 2.
In a first scenario, process B2 is running and suddenly dies. The following events occur to restart process B2.
- The director is notified of the process death via Unix signals.
- The director finds the ptag associated with the dead process in the master process table 6.
- The director rereads the process information associated with this ptag from the master process file.
- The director issues a log message to inform the system operator that the process died and that it will be restarted.
- The director restarts the process.
For the second scenario, assume that the system was configured with process B2 running.
It is desirable to add the functionality provided by process A1 without shutting down the system. The following steps are undertaken.
- An entry is added to the master process table for the particular ptag.
- Using rci, a startup command for that ptag is issued. Specifically, the command is "rci startup A1".
- The startup message is sent by the rci command to the director. The director reads the master process table to obtain the process information associated with the ptag A1.
- The director starts the process A1 and issues a log message to this effect using the software logging component.
In the third scenario, the configuration of process B2 must be modified. As an example, process B2 has a configuration parameter that specifies a specific data store.
The space remaining in this data store exceeds a maximum threshold and new information must be placed in a new data store. The following steps are undertaken.
- The parameter in the configuration file associated with process B2 is modified to reflect the new data store where the new data items must be stored.
- Using rci, a reread config command is issued to process B2. Specifically, the following command is issued: "rci rereadConfig B2".
- The message is sent along to process B2.
- Upon receiving this message, the process configuration component rereads the config file and issues a log message to this effect using the software logging component.
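The third scenario can be sketched as a small configuration class that locates its file from the ptag and rereads it on request. This is a minimal Python illustration, not the component described in the patent: the `key = value` file syntax, the class name `ProcessConfig`, the `dataStore` parameter name and the `log` list standing in for the software logging component are all assumptions.

```python
import os
from typing import Dict, List


class ProcessConfig:
    """Holds the parameters read from <ptag>.cfg in a config directory."""

    def __init__(self, ptag: str, config_dir: str):
        # The ptag determines which configuration file is read.
        self.path = os.path.join(config_dir, f"{ptag}.cfg")
        self.values: Dict[str, str] = {}
        self.log: List[str] = []   # stand-in for the logging component
        self.reread()

    def reread(self) -> None:
        """Reload all parameters, as done on a reread config message."""
        values: Dict[str, str] = {}
        with open(self.path, encoding="utf-8") as handle:
            for line in handle:
                # Assumed syntax: one "key = value" pair per line.
                key, sep, value = line.partition("=")
                if sep:
                    values[key.strip()] = value.strip()
        self.values = values
        self.log.append(f"configuration read from {self.path}")
```

Editing the file alone changes nothing in the running process; the new data store takes effect only when `reread()` is called, which mirrors the separate "modify the file" and "issue the reread config command" steps above.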
In the fourth scenario, the functionality provided by process B2 is no longer desired. It is desirable to remove the functionality provided by process B2 without shutting down the system. The following steps are undertaken.
- The restart flag for this particular ptag in the master process list is set to 'N' for no restart.
- Using rci, a shutdown command for that ptag is issued. Specifically, the command is "rci shutdown B2".
- The shutdown message is sent by the rci command to process B2.
- Process B2 shuts down.
- The director is notified of the process death via Unix signals.
- The director finds the ptag associated with the dead process.
- The director rereads the process information associated with this ptag from the master process file and notices that the restart flag is set to no.
- The director issues a log message stating that process B2 died and will not be restarted.
- The system operator can then delete the entry associated with the process B2 from the master process table.
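The first and fourth scenarios follow the same director logic and differ only in the restart flag. The Python sketch below illustrates that decision under stated assumptions: process creation and table loading are passed in as callables so the logic can be shown without real child processes, the class and method names are hypothetical, and in a real daemon `on_child_exit` would be driven by the Unix signal that reports a child's death.

```python
from typing import Callable, Dict, List


class Director:
    def __init__(self,
                 load_table: Callable[[], Dict[str, bool]],
                 spawn: Callable[[str], int]):
        self._load_table = load_table   # returns ptag -> restart flag
        self._spawn = spawn             # starts a process, returns its pid
        self.running: Dict[int, str] = {}
        self.log: List[str] = []        # stand-in for the logging component

    def start(self, ptag: str) -> int:
        pid = self._spawn(ptag)
        self.running[pid] = ptag
        return pid

    def on_child_exit(self, pid: int) -> None:
        # Find the ptag associated with the dead process.
        ptag = self.running.pop(pid)
        # Reread the table so that operator edits made while the
        # system is running are honoured.
        restart = self._load_table().get(ptag, False)
        if restart:
            self.log.append(f"{ptag} died and will be restarted")
            self.start(ptag)
        else:
            self.log.append(f"{ptag} died and will not be restarted")
```

Because the table is reread at the moment of death rather than cached at startup, setting the flag to 'N' before issuing the shutdown is enough to stop the director from bringing the process back.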
The described system minimizes the downtime and enhances the flexibility of the system.

Claims

1. A process management infrastructure for use in an object-oriented programming environment, comprising means for running a plurality of processes, a director for monitoring the operation of said processes, and a database for storing a plurality of process tags, each process tag uniquely identifying each instance of a process, said means for running said processes obtaining information required to run said processes by looking up the respective process tags therefor in said database.
2. A process management infrastructure as claimed in claim 1, further comprising an interprocess communication component for sending messages to various processes in order to control the operation of the system, said processes being identified by the process tag associated therewith.
3. A process management infrastructure as claimed in claim 2, comprising two said interprocess communication components, one for reading messages and the other for writing them.
4. A process management infrastructure as claimed in any one of claims 1 to 3, wherein said process tags contain at least the name of the executable file and arguments for the process.
5. A process management infrastructure as claimed in any one of claims 1 to 3, which is implemented in a service level management system for data networks.
6. A method of managing processes in an object-oriented programming environment, wherein process tags are used to uniquely identify each instance of a process in the system, information pertaining to the process is associated with the process tag in a database, and each instance of a process is run by extracting said information from the database.
7. A method as claimed in claim 1, wherein messages are sent to various processes in order to control the operation of the system, said processes being identified by the process tag associated therewith.
8. A method as claimed in claim 1, wherein said process tags are associated with at least the name of the executable file for the process and the process arguments.
9. A method of monitoring the operation of a packet switched network with a network manager, comprising receiving messages from said network manager containing data relating to the operation of said network, running processes to store and analyze said information, characterized in that said processes are uniquely identified by process tags stored in a database along with other information about said processes.
10. A method as claimed in claim 9, characterized in that messages are exchanged between said running processes, said messages identifying said processes by the process tag associated therewith.