MXPA98007722A - Method and apparatus for fault-tolerant call processing

Info

Publication number
MXPA98007722A
MXPA98007722A, MXPA/A/1998/007722A, MX9807722A
Authority
MX
Mexico
Prior art keywords
call
server
mon
control computer
active
Application number
MXPA/A/1998/007722A
Other languages
Spanish (es)
Inventor
G Blum Andrea
A Potochniak Paul
Original Assignee
At&T Corp
Application filed by At&T Corp
Publication of MXPA98007722A

Abstract

The present invention relates to a method and apparatus for processing call data. A first server, which is in an active mode, replicates the call data to a second server that is in a standby mode. The first server is monitored for a fault condition by the second server, as well as by other network devices. If a fault condition is detected, the first server is switched to the standby mode and the second server is switched to the active mode.

Description

METHOD AND APPARATUS FOR FAULT-TOLERANT CALL PROCESSING

FIELD OF THE INVENTION

The invention relates to call processing in general. More particularly, the invention relates to a method and apparatus for automatically switching call processing from an active call processor to a standby call processor in the event of a failure of the active call processor.
BACKGROUND OF THE INVENTION

Given the current state of telephone technology, telephone calls in modern telecommunication networks are relatively reliable in terms of speed of call completion, meeting quality of service requirements, and maintaining a call connection during the course of a conversation. The last category, maintaining a call connection, is largely provided by integrating redundancy into the network, especially in the call processing platform. The call processing platform generally controls the setup and teardown of a call connection and ensures that billing for the call is accurately maintained. This redundancy in the call processing platform ensures that a call connection is maintained even if there is a hardware failure in the equipment used to establish the call connection, and is sometimes referred to as "fault-tolerant call processing".

The conventional technology and methods for integrating redundancy into a call processing platform, however, are less than desirable for a variety of reasons. For example, a call processing platform typically has a call control computer that is responsible for implementing the call flow by coordinating and allocating the resources of the other components of the platform, such as a switching matrix, voice response computers and database computers. Given its central function, the operation of the call control computer is extremely important to maintaining a call connection. As a result, the call control computer is usually a specialized computer designed with redundant hardware components, such as a backup microprocessor, memory, power supply and so on. However, this specialized call control computer is very expensive. In addition, a single call control computer, even with redundant hardware components, is susceptible to common mode failures. Common mode failures occur when the failure of a single system component causes a total system failure.
In addition, the specialized call control computer is difficult to scale, upgrade and maintain. In an attempt to avoid the above problems, some call processing platforms use multiple call control computers instead of a single specialized call control computer with redundant hardware. However, the use of multiple call control computers presents a new set of problems. Normally, one of the call control computers is designated as the active call control computer and a second is designated as the standby call control computer. The active call control computer actively controls the call processing functions for the call processing platform, while the standby call control computer remains ready to take control of the call processing platform in the event that the active call control computer experiences a hardware or software failure. To ensure that calls are not dropped when the active call control computer fails, it is necessary to duplicate all call processing data to the standby call control computer. In addition, it is necessary to implement a monitoring scheme to monitor the active call control computer and determine when it is necessary to switch to the standby call control computer. There are conventional techniques for duplicating call processing data from an active call control computer to a standby call control computer, such as the technique described in a paper by Rachid Guerraoui et al., entitled "Software Based Replication for Fault Tolerance", Computer Journal, IEEE, April 1997. The technique described in the Guerraoui paper, however, is unsatisfactory for a variety of reasons. For example, the Guerraoui paper fails to describe a monitoring and switching scheme that minimizes dropped calls in the event of a failure of the active call control computer. In addition, the Guerraoui paper does not describe means for synchronizing call processing data throughout the call processing platform.
In addition, the Guerraoui paper does not teach how to ensure that the standby computer has accurate records with respect to static call data. Normally, a call processing platform requires two types of data to process a call: (1) dynamic call data; and (2) static call data. Dynamic call data consists of information about the caller or the call connection that changes for each call. For example, a destination telephone number is considered dynamic call data since it usually changes from call to call. Static call data consists of information about a caller that is relatively stable, that is, it does not change on a call-by-call basis. An example of static call data would be a billing address for a caller or perhaps a Personal Identification Number. The Guerraoui paper fails to discuss the duplication of static call data to the standby call control computer. In view of the foregoing, it can be appreciated that there is a substantial need for a fault-tolerant call processing method and apparatus that solves the problems discussed above.
BRIEF DESCRIPTION OF THE INVENTION

The present invention includes a method and apparatus for processing call data. A first server that is in an active mode replicates the call data to a second server that is in a standby mode. The first server is monitored for a fault condition by the second server, as well as by other network devices. If a fault condition is detected, the first server is switched to the standby mode and the second server is switched to the active mode. With these and other advantages and features of the invention that will become apparent hereinafter, the nature of the invention can be understood more clearly by reference to the following detailed description of the invention, the appended claims and the several drawings attached hereto.
BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 illustrates a communication system suitable for practicing one embodiment of the invention.
Figure 2 illustrates a call processing platform according to one embodiment of the invention.
Figure 3 is a block diagram of a call control computer according to one embodiment of the invention.
Figure 4 illustrates a block flow diagram of the steps carried out by a dynamic data replication module according to one embodiment of the invention.
Figure 5 (a) illustrates a first block flow diagram of a High Availability Daemon (HAD) module according to one embodiment of the invention.
Figure 5 (b) illustrates a second block flow diagram of an HAD module according to one embodiment of the invention.
Figure 6 (a) illustrates a first block flow diagram of a Monitoring Service (MON) module according to one embodiment of the invention.
Figure 6 (b) illustrates a second block flow diagram of a MON module according to one embodiment of the invention.
DETAILED DESCRIPTION

The invention includes a method and apparatus for fault-tolerant call processing. More particularly, the invention includes a method and apparatus for automatically switching from an active call control computer to a standby call control computer in the event of a hardware or software failure of the active call control computer, without interrupting the active call connections currently processed by the active call control computer. Two key elements required to carry out this automatic switching are call data synchronization and communication monitoring. One embodiment of the invention comprises a call processing platform built on general-purpose computing devices. Non-specialized, general-purpose computing devices are combined with voice response units (VRUs) and a switching matrix to create a call processing platform that is easily maintained, easily distributed and fault tolerant, and that provides high service availability through the use of a software-based distributed monitoring system, "live" backup, full data sharing, replication and database synchronization. It is worth noting that although the distributed monitoring system of this embodiment of the invention is implemented in software, it can be appreciated that the distributed monitoring system could be implemented in hardware or software and still fall within the scope of the invention.
The call processing platform performs call control and resource management using non-specialized, general-purpose computing devices. The use of non-specialized, general-purpose computing devices significantly reduces the cost of the call processing platform in general and of the call control computers in particular. This embodiment of the invention uses a pair of non-specialized, general-purpose computing devices as call control computers: one of the computers actively controls call processing for the call processing platform (the "active call control computer") and the other is placed in a standby mode (the "standby call control computer"), ready to take over call processing responsibilities in the event that the active call control computer experiences a hardware or software failure. Switching from the active call control computer to the standby call control computer can be carried out on demand or automatically upon failure of the active call control computer. On-demand active/standby switching of the call control computers allows a platform administrator to request either an ON_DEMAND GRACEFUL switch or an ON_DEMAND QUICK switch. The ON_DEMAND GRACEFUL switch synchronizes the entire call processing platform by temporarily stopping call processing and clearing all switching resources currently in use. The ON_DEMAND QUICK switch operates in a similar way to the automatic active/standby switching described below. Automatic active/standby switching of the call control computers is carried out using two key elements. The first key element is platform monitoring. The second key element is the synchronization of call state information. Platform monitoring is carried out using distributed monitors for the call control computers and other critical processes.
Each call control computer is equipped with a communications monitor to monitor the internal processes of the call control computer, as well as the status of the other network devices that are part of the call processing platform. In addition, each network device is equipped with a communications monitor to monitor the internal processes of that network device, as well as the call control computers. Each communications monitor can detect failures of the device on which the monitor runs, as well as failures of other devices external to that device. Thus, each network device, including the call control computers, is capable of detecting device failures and reporting them to the active call control computer. Additionally, each communications monitor remote from the active call control computer can detect or confirm communication failures of the active call control computer and alert the standby call control computer to the need for a possible switchover. In this embodiment of the invention, platform monitoring is carried out using two sets of monitoring processes. These processes monitor the platform for hardware and software failures, so that call processing is maintained by activating the standby call control computer upon failure of the active call control computer. The first set of monitoring processes are referred to as High Availability Daemon (HAD) processes. HAD processes run on the call control computers, one HAD per computer. The HADs are responsible for: (1) coordinating the startup and shutdown of call processing on the platform; (2) tracking the status of the applications local to their own processors; (3) tracking the communication status and system status of the other components of the platform; and (4) monitoring the status of the other call control computer.
The HAD process is described in more detail with reference to Figures 3, 5 (a) and 5 (b). The second set of monitoring processes are referred to as Monitoring Service (MON) processes. The MON processes run on the other components of the platform, for example, the VRUs and the database computer. Each component has a MON process. In general, MONs are responsible for: (1) tracking the status of the applications local to their own processor; (2) reporting the status of the local processor to the two call control computers; and (3) directing the call flow to the active call control computer. The MON process is described in more detail with reference to Figures 3, 6 (a) and 6 (b). If any of the monitoring processes (HAD or MON) detects a failure that affects the call processing capabilities of the active call control computer, it registers a vote-to-switch with the standby call control computer. After receiving two such votes, the standby call control computer activates itself. First, the standby call control computer tells its mate (formerly active) call control computer to enter standby mode. Then, the standby call control computer informs the other components of the platform to redirect the call flow to itself, as the new active call control computer. The other key component of automatic active/standby switching is the full synchronization of each call state data structure held in the active call control computer with its replicated call state data structure in the standby call control computer. As part of normal operation, the call control computer maintains call information on a call-by-call basis, that is, dynamic call data. This information covers the switching and VRU resources currently assigned to a call and caller data, such as a destination number and billing instrument data (e.g., a calling card).
As this information is collected by the active call control computer from the other platform elements, the data is synchronized in real time to the standby call control computer. By this method, the standby call control computer always has all the call information necessary to continue call processing if the monitoring processes determine that the active call control computer has failed. Thus, the call control computers are fully synchronized with respect to the call data used for call processing. The active call control computer immediately shares all call state updates with the live standby call control computer, so that upon failure of the active call control computer, the standby call control computer can accept the redirected call flow with minimal loss of active calls or queuing delay. Database synchronization and replication of static call data are also carried out for both call control computers. A database computer stores static call data in a static call data profile and then replicates the static call data to the active and standby call control computers whenever static call data is accessed or modified. This ensures that if the data is lost on one unit or the other, it can be easily recovered from a replica. Replication of the static call data for this embodiment of the invention uses an Advanced Replication product provided by Oracle Corporation. The call server copies of the database are read-only and are propagated to the call servers using the Oracle Read-Only Snapshots product. Periodic data audits of the dynamic and static call records on both call control computers are carried out to confirm that all data is synchronized. This ensures that both call control computers have up-to-date call records for a particular call, so that the call is not dropped or interrupted in the event of a failure of the active call control computer.
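The vote-to-switch rule described above can be illustrated with a minimal Python sketch. The class name, method names and log strings are hypothetical and do not come from the specification. The sketch also assumes that repeated votes from the same monitor count only once, which the text does not state.

```python
from dataclasses import dataclass, field

VOTES_REQUIRED = 2  # the text states that the standby activates after two votes


@dataclass
class StandbyController:
    """Hypothetical model of vote handling on the standby call control computer."""
    mode: str = "standby"
    votes: set = field(default_factory=set)
    log: list = field(default_factory=list)

    def register_vote(self, monitor_id: str) -> bool:
        """Record a vote-to-switch; return True if this vote triggers activation."""
        if self.mode != "standby":
            return False
        # Assumption: a monitor that votes twice is counted once.
        self.votes.add(monitor_id)
        if len(self.votes) >= VOTES_REQUIRED:
            self._take_over()
            return True
        return False

    def _take_over(self) -> None:
        # Order taken from the text: demote the mate first, then redirect call flow.
        self.log.append("tell mate to enter standby")
        self.log.append("tell platform components to redirect call flow")
        self.mode = "active"
        self.votes.clear()
```

Requiring more than one vote means a single monitor with a broken network link cannot force a switchover by itself.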
Referring now in detail to the drawings, wherein like parts are designated by like reference numerals throughout, Figure 1 illustrates a communications system suitable for practicing one embodiment of the invention. As shown in Figure 1, terminals A and B (each labeled 7) are connected to a Public Switched Telephone Network (PSTN) 9. PSTN 9 also connects to a call service center (CSC) 8. A calling party initiates a telephone call from terminal A. The call is processed by CSC 8 and a call connection is completed to the called party at terminal B via PSTN 9.
The CSC 8 includes a call processing platform (CPP) 10, which is described in more detail with reference to Figure 2. Figure 2 illustrates a call processing platform according to one embodiment of the invention. The CPP 10 includes a computer-controlled switching matrix 12, a first call control computer 14, a second call control computer 20, a plurality of VRUs 16 and a database computer 18. The switching matrix 12 is interconnected with the pair of call control computers via a local area network (LAN) 44. The switching matrix 12 is responsible for providing all the network terminations to the PSTN. The call control computers 14 and 20 are responsible for implementing the call flow between an originating number and a destination number. The call control computers 14 and 20 coordinate and allocate the resources of the other components of the platform, such as the switching matrix 12, the VRUs 16 and the database computer 18. Each call control computer has an active mode and a standby mode. An active call control computer actively controls call processing for the CPP 10, while the other call control computer is placed in a standby mode as a backup for the call control computer in the active mode. The VRUs 16 are computers capable of providing voice and touch-tone resources used to interact with the caller. The VRUs 16 are connected to the switching matrix 12 via a network, such as a Primary Rate Integrated Services Digital Network (ISDN-PRI), and to the call control computer 14 via another network, such as LAN 44. The database computer 18 is a general-purpose computer that contains a relational database for use in call processing. The database computer 18 is connected to the call control computers via the LAN 44. Figure 3 is a block diagram of a call control computer according to one embodiment of the invention. For purposes of clarity, the following description will refer to the call control computer 14.
However, the call control computers 14 and 20 are similar and therefore any discussion regarding one call control computer is equally applicable to the other call control computer. The call control computer 14 comprises a main memory module 24, a central processing unit (CPU) 26, a system control module 28, a bus adapter 30, a High Availability Daemon (HAD) module 32 and a dynamic data replication module 34, each of which is connected to a CPU/memory bus 22 and an Input/Output (I/O) bus 38 via the bus adapter 30. In addition, the call control computer 14 contains multiple I/O controllers 40, as well as an external memory 46 and a network interface 48, each of which is connected to the I/O bus 38 via the I/O controllers 40. The overall operation of the call control computer 14 is controlled by the CPU 26, which operates under the control of executed computer program instructions that are stored in the main memory 24 or the external memory 46. The main memory 24 and the external memory 46 are machine-readable storage devices. The difference between the main memory 24 and the external memory 46 is that the CPU 26 can normally access information stored in the main memory 24 faster than information stored in the external memory 46. Thus, for example, the main memory 24 can be any machine-readable storage device, such as random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM) or electronically erasable programmable read-only memory (EEPROM). The external memory 46 can be any type of machine-readable storage device, such as magnetic storage media (e.g., a magnetic disk) or optical storage media (e.g., a CD-ROM).
In addition, the call control computer 14 can contain various combinations of machine-readable storage devices, attached through other I/O controllers, that are accessible by the CPU 26 and that are capable of storing a combination of computer program instructions and data.
The CPU 26 includes any processor of sufficient processing power to perform the HAD and data replication functionality found in the call control computer 14. Examples of CPUs suitable for carrying out the invention include the INTEL family of processors, such as the Pentium®, Pentium® PRO and Pentium® II microprocessors. The network interface 48 is used for communications between the call control computer 14 and a communication network, such as the LAN 44. The network interface 48 supports appropriate signaling and voltage levels, in accordance with techniques well known in the art. The I/O controllers 40 are used to control the flow of information between the call control computer 14 and a variety of devices or networks, such as the external memory 46 and the network interface 48. The system control module 28 provides the user interface for system control and operation by a human user. The bus adapter 30 is used to transfer data back and forth between the CPU/memory bus 22 and the I/O bus 38. The VRUs 16 and the database computer 18 are similar to the call control computer 14 described with reference to Figure 3. The VRUs 16 and the database computer 18, however, replace the HAD module 32 with a Monitoring Service (MON) module 50 (not shown in Figure 3). The MON 50 can also be implemented in other network devices internal or external to the CPP 10. The HAD 32, the MON 50 and the dynamic data replication module 34 implement the main functionality for this embodiment of the invention. It will be noted that the HAD module 32 and the dynamic data replication module 34 shown in Figure 3, and the MON module 50, are described as separate functional modules.
However, it can be appreciated that the functions carried out by these modules can be further separated into more modules, combined into a single module or distributed throughout the system and still fall within the scope of the invention. In addition, the functionality of these modules can be implemented in hardware, software or a combination of hardware and software, using well-known signal processing techniques. The HAD 32 and the MON 50 share responsibility for four central functions: (1) coordinating startup and shutdown of the call control computers 14 and 20; (2) tracking and recording communication and activity states for the call control computers 14 and 20; (3) detecting and raising alarms for any hardware, software or other problems or failures; and (4) monitoring each other's operation. The HAD 32 runs on both call control computers 14 and 20. The call control computers 14 and 20 have two primary modes: (1) an active mode; and (2) a standby mode. When a call control computer is in the active mode, it is actively controlling the call processing functions for the CPP 10 and is referred to as the active call control computer. Similarly, the HAD 32 running on the active call control computer is referred to as the active HAD (HAD-CurrActy). When a call control computer is in standby mode, it remains ready to take active control of the call processing functions for the CPP 10, either on demand or automatically, with minimal impact on currently active calls. A call control computer that is in the standby mode is referred to as the standby call control computer, and the HAD 32 running on the standby call control computer is referred to as the standby HAD (HAD-Stand). At any time, only one of the two call control computers can be in active control of the CPP 10. The HAD 32 provides the following functionality for the call control computers 14 and 20:

1. Bring up and shut down the critical processes of the active call control computer in a particular order during platform startup and shutdown.
2. Notify the MONs running on the other network devices to bring up or shut down the critical processes on those devices.
3. Carry out on-demand or automatic switching of platform control between the call control computers 14 and 20.
4. In the standby HAD, recognize the need for, and initiate, automatic switching of platform control from a failed active call control computer to the standby call control computer, with minimal loss of currently active calls.
5. Track the status of the critical processes of the call server.
6. Track the status of the critical processes of the other network devices.
7. Recognize the default active call control computer after a cold start or reboot and initialize the default active call control computer automatically.
8. Respond to any MON heartbeats or status queries from other network devices.

The MON 50 runs on all network devices other than the call control computers 14 and 20, such as the VRUs 16 and the database computer 18. The MON 50 provides the following functionality for these other network devices:

1. Recognize which call control computer is currently active by communicating with the HAD-CurrActy.
2. Respond to HAD heartbeats, status queries, status change reports and state transition requests.
3. Track the status of the critical processes of the other network devices.
4. Notify the currently active HAD of any state change or alarm.
5. Monitor the communication status of the HAD-CurrActy and notify the standby HAD of any problem.
To properly implement automatic switching, the HAD 32 or MON 50 processes must detect and act on system failures within a short period of time, for example, within 5 seconds of their occurrence. The types of failure that can be detected by the HAD 32 or MON 50 include:

1. The failure of a critical process on a call server;
2. The loss of heartbeat messages from a critical process; or
3. The loss of the active call server due to network or operating system failures.

Additional details for the HAD 32 and the MON 50 are described later in this specification. The dynamic data replication module 34 is responsible for replicating the call data received at the active call control computer to the standby call control computer. Thus, if the active call control computer fails, the standby call control computer can handle the call processing operations for the CPP 10 while minimizing the number of calls dropped during the switching process. The dynamic data replication module 34 is described in more detail with reference to Figure 4.
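As an illustration of the second failure type listed above (loss of heartbeat messages from a critical process), the following Python sketch flags any process whose last heartbeat is older than a timeout. The class and method names are hypothetical, and using the 5-second example figure directly as the timeout is an assumption. The specification does not give the actual timer values or message formats.

```python
import time

FAILURE_WINDOW_S = 5.0  # example detection window mentioned in the text


class HeartbeatTracker:
    """Hypothetical tracker for heartbeats received from critical processes."""

    def __init__(self, timeout_s=FAILURE_WINDOW_S, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock      # injectable so the logic can be exercised without sleeping
        self.last_seen = {}     # process name -> time of last heartbeat

    def beat(self, process_name):
        """Record a heartbeat from a critical process."""
        self.last_seen[process_name] = self.clock()

    def failed(self):
        """Return, sorted, the processes whose heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(name for name, seen in self.last_seen.items()
                      if now - seen > self.timeout_s)
```

A monotonic clock is used so that wall-clock adjustments cannot produce false heartbeat losses.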
Figure 4 illustrates a block flow diagram of the steps carried out by a dynamic data replication module according to one embodiment of the invention. As shown in Figure 4, call data is received in step 52. In step 54, the system determines whether the active call control computer or the standby call control computer is to receive the call data. If the active call control computer is to receive the call data in step 54, the active call control computer processes the call data in step 56. The active call control computer accesses a call data record and compares the received call data with the call data stored in the call data record in step 60. If the call data differs from the call data stored in the call data record in step 60, the call data is replicated and sent to the standby call control computer in step 62. If the call data does not differ from the call data stored in the call data record in step 60, the system looks for the next set of call data in step 52. If the standby call control computer is to receive the call data in step 54, the system determines whether the call data is from the active call control computer in step 64. If so, the call data record for the standby call control computer is updated with the new call data in step 66. If the call data is not from the active call control computer in step 64, the system looks for the next set of call data in step 52. The database computer 18 is a general-purpose computer that contains a relational database for use in call processing. Like the other network devices described with reference to the CPP 10, the database computer 18 includes a MON 50 module to monitor the call control computers 14 and 20. The database computer 18 also includes a module for the replication of static data. The static data replication module receives the static call data and stores it in a static call data profile in the relational database.
Each time the static call data profile is updated, the static data replication module replicates the static call data stored in the static call data profile to the call control computers 14 and 20.
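The Figure 4 flow for dynamic call data (steps 52 to 66) can be sketched in Python as follows. The function name, the dictionary-based call data record and the `send_to_standby` callback are hypothetical. The sketch also assumes that the active computer updates its own record when the data has changed, which the text implies but does not state.

```python
def handle_call_data(role, record, call_id, data, from_active=False, send_to_standby=None):
    """One pass through the Figure 4 flow; returns a label for the action taken."""
    if role == "active":
        # Steps 56-62: compare with the stored record and replicate only on change.
        if record.get(call_id) == data:
            return "unchanged"          # back to step 52 for the next call data set
        record[call_id] = dict(data)    # assumption: the active record is updated here
        if send_to_standby is not None:
            send_to_standby(call_id, dict(data))
        return "replicated"
    # Standby path, steps 64-66: accept only data sent by the active computer.
    if from_active:
        record[call_id] = dict(data)
        return "updated"
    return "ignored"
```

A short usage example, with the standby's record updated through the callback:

```python
active_record, standby_record = {}, {}

def send(call_id, data):
    handle_call_data("standby", standby_record, call_id, data, from_active=True)

handle_call_data("active", active_record, "c1", {"dest": "555-0100"}, send_to_standby=send)
```

Comparing before sending means unchanged call data generates no replication traffic.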
The CPP 10 periodically audits the call data records and the static call data profiles. The data audits help to ensure data synchronization between the call control computers 14 and 20. Figure 5 (a) illustrates a first block flow diagram of a High Availability Daemon (HAD) module according to one embodiment of the invention. The CPP 10 has two call control computers, a first call control computer and a second call control computer. Each call control computer executes an HAD process, and the two HAD processes communicate with each other. For purposes of clarity, an HAD process that runs on the first call control computer will be referred to as "the first HAD process" and an HAD process that runs on the second call control computer will be referred to as "the second HAD process". Similarly, an HAD process that runs on the active call control computer will be referred to as "the active HAD process" and an HAD process that runs on the standby call control computer will be referred to as "the standby HAD process". As shown in Figure 5 (a), each HAD process executes steps 70, 72, 74, 76, 78, 80, 82 and 84. In step 70, the HAD process is started. After starting, the HAD process brings up the call control computer on which it is running, in step 72. In step 74, the HAD is placed out of service. In step 76, the HAD process determines whether the call control computer on which it is running is the default active call control computer. In this embodiment of the invention, this determination is carried out by querying the data stored in step 78 and receiving a response to the query in step 76. Alternatively, other means could be implemented to choose the default active call processor, such as an alternating or random selection process, and still fall within the scope of the invention.
In step 80, the HAD process exchanges heartbeats with the internal processes running on the same call control computer as the HAD process. In step 82, the HAD process determines whether all internal processes are operating within normal performance parameters. If they are not, the HAD process is placed out of service again at step 74. If they are, the HAD process is placed in standby mode in step 84. Thus, at step 84, both HAD processes are in standby mode.

The default active HAD process is initialized in step 86. The default active HAD process then determines, in step 88, whether the other HAD process (the "HAD-Mate") is already in active mode. If the HAD-Mate is already active, the default active HAD process is placed in standby mode, in step 84. If the HAD-Mate is not yet active, the default active HAD process is placed in a wait mode, in step 90. In step 92, the default active HAD process activates the VRU 16 and sends a "MON go active" message 132 to the MON 50.

Figure 5(b) illustrates a second block flow diagram of an HAD module according to one embodiment of the invention. In step 94, the default active HAD process determines whether a threshold number of VRUs has been activated. If not, the default active HAD process is placed in standby mode, in step 84. If so, the default active HAD process inspects the state of the switch in step 96 and determines, in step 98, whether the switch is ready to carry out switching functions. If the switch is not ready, the default active HAD process is placed in standby mode, in step 84. If the switch is ready, the default active HAD process is placed in active mode, in step 100. Once it has been placed in active mode, the HAD process announces its active state to all other network devices in step 102.

A function shared by the HAD processes and the MON processes is to monitor the internal processes of the computer running the HAD or MON process, and also the network devices external to that computer. In steps 104, 106 and 108, the active HAD process queries the internal processes of the active call control computer, as well as the other network devices, such as the VRU 16 and the Switching Matrix 12. In step 104, the active HAD process sends status queries to the internal processes, the VRU 16 and the Switching Matrix 12. It receives their responses in step 106. In step 108, the HAD process determines whether the internal processes, the VRU 16 and the Switching Matrix 12 are functioning properly. If they are, steps 104, 106 and 108 are repeated until the HAD process determines, in step 108, that one of the internal processes, the VRU 16 or the Switching Matrix 12 is not functioning properly.

If a fault is found in step 108, the HAD process determines in step 110 whether it is the internal processes that have failed. If the internal processes have not failed, the HAD process determines in step 112 whether the call processing platform 10 has lost a threshold number of VRUs 16. If a threshold number of VRUs is not present, an alarm is raised in step 114 and the active HAD process is placed in standby mode in step 84. If the active HAD process determines in step 110 that an internal process has failed, it notifies the standby HAD process to go active and then instructs the active call control computer to proceed to an out-of-service mode, in step 116. The active HAD process is then placed out of service in step 118.
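The start-up decisions of Figures 5(a) and 5(b) can be condensed into a single function. The following Python sketch is illustrative only: the names `Mode` and `startup_mode` and all parameter names are assumptions introduced here, not identifiers from the patent, and the intermediate wait mode of step 90 is folded into the final result.

```python
from enum import Enum


class Mode(Enum):
    OUT_OF_SERVICE = "out_of_service"
    STANDBY = "standby"
    ACTIVE = "active"


def startup_mode(*, internal_ok, is_default_active, mate_active,
                 vrus_active, vru_threshold, switch_ready):
    """Return the mode an HAD process settles in after start-up.

    Hypothetical condensation of steps 74 to 100; comments give the
    flow-chart step each branch is modeled on.
    """
    if not internal_ok:
        return Mode.OUT_OF_SERVICE   # step 82 fails -> back to step 74
    if not is_default_active:
        return Mode.STANDBY          # step 86: waits for a go-active message
    if mate_active:
        return Mode.STANDBY          # step 88: HAD-Mate is already active
    if vrus_active < vru_threshold:
        return Mode.STANDBY          # step 94: too few VRUs activated
    if not switch_ready:
        return Mode.STANDBY          # step 98: switch not ready
    return Mode.ACTIVE               # step 100


if __name__ == "__main__":
    print(startup_mode(internal_ok=True, is_default_active=True,
                       mate_active=False, vrus_active=2,
                       vru_threshold=1, switch_ready=True))
```

Every branch that does not reach step 100 falls back to standby rather than out of service, which mirrors the flow chart: only a failure of the internal processes takes the HAD process out of service.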
Figure 6(a) illustrates a first block flow diagram of a Monitoring Service (MON) module according to one embodiment of the invention. Figure 6(a) shows a MON process, which may be running on any of the network devices that are part of the CPP 10. In step 120, a MON process is initiated. In step 122, the MON process activates the VRU 16. In step 124, the MON process is placed out of service. The MON process then inspects, in step 126, the state of the internal processes of the device on which it is running. In step 128, the MON process determines whether all internal processes are running properly. If they are not, the MON process is placed out of service in step 124. If they are, the MON process is placed in standby mode in step 130. In step 134, the MON process determines whether it has received a "MON go active" message 132. If it has not, the MON process remains in standby mode, in step 130. If a "MON go active" message 132 has been received, the MON process is placed in a wait mode in step 136.
Figure 6(b) illustrates a second block flow diagram of a MON module according to one embodiment of the invention. In step 138, the MON process inspects the state of the VRU 16. In step 140, the MON process determines whether the ports of the VRU 16 are ready. If they are not, steps 138 and 140 are repeated until the ports of the VRU are ready. If they are, the MON process is placed in active mode in step 142. The active MON process announces its active state to the other network devices in step 144.

Steps 146, 148 and 150 carry out the monitoring for the active MON process. In step 146, the active MON process sends status queries to the internal processes running on the same device and also inspects the status of the active and standby call control computers. The responses of the internal processes and of the active and standby call control computers are received in step 148. In step 150, the active MON process determines whether the internal processes and the active and standby call control computers are operating within normal performance parameters. If they are, steps 148 and 150 are repeated until the MON process determines, in step 150, that the internal processes, the active call control computer or the standby call control computer has failed. When a fault is detected in step 150, the MON process determines in step 152 whether an internal process has failed. If so, the MON process is placed out of service in step 124. If not, the active MON process determines in step 154 whether the active call processor has failed.
If the active call processor has failed, in step 154, the active MON process sends the standby HAD process a vote to switch ("HAD go active" message 119) in step 156 and then returns to standby mode in step 130.

With reference again to Figure 5(b), a "HAD go active" message 119 is sent to the standby HAD process, which is at step 84. In step 86, the standby HAD process determines whether it was initialized as the default active HAD process. Since it was not, the standby HAD process determines, in step 160, whether it has received an appropriate "HAD go active" message 119. If it has not, the standby HAD process remains in standby mode at step 84. If it has, the standby HAD process carries out steps 90 to 118 as the newly active HAD process.

The operation of the CPP 10 can be better understood by means of the following example. Suppose a passenger in an airplane wishes to make a telephone call. The passenger removes the handset from an air terminal and presses an ON button. When the handset is switched on, the air terminal seizes a radio channel to a ground station. The ground station makes a network connection to the CPP 10. A message is sent to the active call control computer of the CPP 10 indicating that the Switching Matrix 12 has detected a trunk seizure. The active call control computer then opens a new call record and replicates it to the standby call control computer. Next, the Switching Matrix 12 notifies the VRU 16 of an incoming call request. The VRU 16 performs answer supervision and requests identifying information from the ground station.
The ground station sends ground station/air terminal (GS/AT) identifiers to the Switching Matrix 12, which passes them to the VRU 16. The VRU 16 sends the GS/AT identifiers to the active call control computer. The active call control computer updates its call data record and replicates the GS/AT identifiers to the standby call control computer, so that the latter can update its own call data record. The active call control computer then accesses a static call data profile on the database computer 18 to validate the GS/AT identifiers. If the GS/AT identifiers are valid, the active call control computer updates its call data record and replicates the validated GS/AT identifiers to the standby call control computer, so that the latter can update its call data record. The active call control computer then sends a message to the VRU 16 that it is authorized to collect the call information. The VRU 16 passes the same message to the Switching Matrix 12. The Switching Matrix 12 sends an acknowledgment to the ground station that the GS/AT identifiers are valid. The air terminal cuts through a voice path to the ground station, which in turn cuts through a voice path to the Switching Matrix 12. The Switching Matrix 12 then cuts through a voice path to the VRU 16. The VRU 16 plays a dial tone that is sent to the air terminal.

Once the passenger receives the dial tone, the passenger is asked to swipe a credit card to pay for the telephone call. The credit card information is received by the VRU 16, which passes it to the active call control computer to update its call data record. The active call control computer then replicates the credit card information to the standby call control computer, so that the standby call control computer can update its call data record. The active processor then checks the static call data profile to determine whether the credit card number is valid.
If the credit card number is valid, the active processor sends a message to the VRU 16 that it is authorized to collect a destination number from the passenger. The active call control computer also replicates the validation message to the standby call control computer. Once the VRU 16 receives authorization to collect a destination number, it plays the dial tone again for the passenger using the air terminal. The passenger enters a destination telephone number to complete a call connection. The destination telephone number is sent from the air terminal to the VRU 16, which in turn passes it to the active call control computer, so that it can update its call record. The active call control computer then replicates the destination telephone number to the standby call control computer. As with the GS/AT identifiers and the credit card information, the active call control computer validates the destination telephone number by accessing the static call data profile stored by the database computer 18. If the destination telephone number is valid, the active call control computer sends a message to the VRU 16 that it is OK to place the call. The active call control computer also updates its own call record and replicates the validation of the destination telephone number to the standby call control computer, so that the latter can update its own call data record. The VRU 16 then plays a message to the passenger indicating that the call is being connected, such as "Thank you for using AT&T".
At this point, the VRU 16 sends a message to the active call control computer that the new call is complete. The active call control computer sends a message to the Switching Matrix 12 to establish a communication link to the called party. The Switching Matrix 12 initiates a communication link and waits for an answer from the called party. If the Switching Matrix 12 receives an answer, it sends a message to the active call control computer to begin billing. The active call control computer then updates its call data records and replicates the call data to the standby call control computer. Once a call connection is established, the passenger can start a conversation with the called party.

Suppose that at some point during the conversation, the HAD 32 running on the standby call control computer, or a MON 50 running on the VRU 16, the Switching Matrix 12 or the Database Computer 18, detects a hardware or software failure in the active call control computer. The HAD 32 or MON 50 sends a "Vote-To-Switch" message to the standby HAD. If the standby HAD receives two such messages within a predetermined period of time, the standby HAD sends a message to the active HAD telling it to enter standby mode. The standby HAD then places the standby call control computer in active mode. The newly active call control computer retrieves the call data for this particular call from its call data record and sends a message to the Switching Matrix 12 instructing it to send all future data for this particular call to the newly active call control computer. Since the formerly standby call control computer has an up-to-date call data record, the passenger and the called party can continue their conversation without any interruption.
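The two-vote rule in this example can be illustrated with a short Python sketch. Everything named here is hypothetical: `VoteCollector`, its parameters and the sender labels are introduced for illustration, the 5-second window is borrowed from the SWITCH_INTERVAL default quoted later in this description, and counting only votes from distinct senders is an assumption.

```python
class VoteCollector:
    """Collects "Vote-To-Switch" messages on the standby HAD (illustrative)."""

    def __init__(self, window_seconds=5.0, votes_needed=2):
        self.window_seconds = window_seconds
        self.votes_needed = votes_needed
        self._votes = {}  # sender name -> time of its most recent vote

    def record_vote(self, sender, now):
        """Register a vote and return True if a switchover should start."""
        # Forget votes that are older than the allowed window.
        self._votes = {s: t for s, t in self._votes.items()
                       if now - t <= self.window_seconds}
        self._votes[sender] = now
        return len(self._votes) >= self.votes_needed


if __name__ == "__main__":
    collector = VoteCollector()
    print(collector.record_vote("MON on VRU", now=0.0))       # one vote so far
    print(collector.record_vote("MON on database", now=3.0))  # second vote in window
```

Requiring a second, independent vote keeps a single device with a faulty network link from forcing a switchover on its own.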
HAD MODULE 32

The HAD module 32 (hereinafter simply "HAD") runs on a call control computer as a message-driven state transition program, designed to coordinate the call processing states of a Resource Manager (REM) module running on the active call control computer with a Remote Access Dip (RAD) module running on the VRU 16. The HAD communicates with a MON-CV module running on the VRU 16 to exchange information about platform start/stop. The MON-CV in turn relays the messages to the RAD. The HAD message types listed here are generically named for simplicity when used to describe the HAD activities and state transitions that follow.

The HAD receives the following messages:
ImAlive - from MONs, HAD-Mate and critical processes when they have been initialized; from critical processes as a heartbeat response.
ImDead - from MONs, HAD-Mate and critical processes when they shut down gracefully.
MonState - from MONs as a heartbeat response and report of the current activity level.
RemState - from the REM (resource manager) critical processes to report their current call processing level.
HadState - from the HAD-Mate as a heartbeat response and report of the current activity level.
StateQuery - from MONs and HAD-Mate to request a heartbeat response in the form of a current activity level report.
GoActive - from the UI (user interface) for activation on demand; from the HAD during a switchover.
GoStandby - from the UI for deactivation on demand; from the HAD during a switchover.
VoteToSwitch - from a MON to HAD-Stand when the MON has detected missed heartbeat responses from HAD-CurrActy.
ImController - from its HAD-Mate when the HAD-Mate has advanced to the active state.
The HAD sends the following messages:
ImAlive - to MONs, HAD-Mate and critical processes when the HAD has been initialized.
ImDead - to MONs, HAD-Mate and critical processes when the HAD shuts down gracefully.
AliveQuery - to critical processes to request a heartbeat response in the form of an ImAlive report.
StateQuery - to HAD-Mate and MONs to request a heartbeat response in the form of a report of the current activity level.
HadState - to HAD-Mate and MONs as a heartbeat response and report of the current activity level.
GoActive - to HAD-Stand from HAD-CurrActy, to bring the standby HAD to the fully active call processing level when HAD-CurrActy has lost a critical process; to MON-CVs to bring them to the fully active state; to REM to bring it to the fully active state.
ImController - to its HAD-Mate and MONs when it is advancing to the active state.

After initialization, the HAD proceeds by reading its parameter files, one of which designates the default active call control server, DEF_ACTIVE_CS. If the HAD notes that its own call control server is the default active call control server, then the HAD registers itself as the default active HAD, or HAD-DefActy, and knows to go active automatically, without receiving a manual "GoActive" order from the User Interface (UI). The rules are: If HAD-DefActy notes that its HAD-Mate is off line, or is on line but not active, it advances toward the active state. If HAD-DefActy notes that its HAD-Mate is advancing to the active state or is already active, it advances only to the "standby, prevented" state.
If an HAD notes that it is not the default active, then it also knows to advance only to the "standby, prevented" state. An HAD that is not default active can be brought to the active state by an explicit command from the UI during a manual switchover, by a "GoActive" command from its HAD-Mate, or through the "VoteToSwitch" automatic switchover scenario.

If an HAD process or any other process dies or is restarted, it proceeds through the same initialization as it would during a cold start. The restarted HAD reads the DEF_ACTIVE_CS parameter and proceeds as indicated above.

So that the HAD can begin tracking critical processes, all critical processes on the call control computer send an ImAlive report to the HAD at startup. The HAD then creates a process record that contains updatable information about the communication status of the process and, where appropriate, its process status. When the other monitors, MON-Op, MON-CV and HAD-Mate, start up, they also report ImAlive to the HAD.

The HAD uses an internal alarm routine to send regular heartbeats to the critical processes of the call control computer. The HAD sends AliveQueries as heartbeats to all its critical processes every PROC_HB_INTERVAL seconds. All recipients of an AliveQuery from the HAD must respond with an ImAlive. After receiving an ImAlive response from a critical process, the HAD updates that process's heartbeat record. The HAD keeps track of unanswered AliveQueries. If a process fails to respond to PROC_HB_MISSES AliveQueries from the HAD, the HAD can trigger alarms or undergo a state transition. The parameters PROC_HB_INTERVAL and PROC_HB_MISSES are tunable. The UI maintains the HAD's list of critical processes in a parameter file. The HAD reads these parameters at startup or when the UI sends a ReadParms ("re-read parameters") command.

The HAD uses an internal alarm routine to send regular heartbeats to the remote MON servers.
The HAD sends StateQueries as heartbeats to the MONs, which include MON-CV and MON-Op. However, only the current activity level of the MON-CVs matters to the HAD's monitoring of the platform state and call processing capacity. (The MON-Op operates independently of the rest of the platform.) The HAD sends these heartbeats every MON_HB_INTERVAL seconds. MONs must answer the StateQueries with a MonState report. After receiving a MonState report, the HAD updates the MON's heartbeat and status record. The HAD keeps track of unanswered StateQueries. If a MON fails to respond to MON_HB_MISSES StateQueries from the HAD, the HAD triggers an alarm but does not itself undergo any state transition. Both MON_HB_INTERVAL and MON_HB_MISSES are tunable parameters.

A MON state can be:
MON_OOS (OOS stands for out-of-service)
MON_STANBY
MON_WAIT_RAD_ACTIVE
MON_ACTIVE
MON_WAIT_RAD_OOS
MON_MAINT_STANBY

The HAD, whenever it is active or standby, tracks its HAD-Mate with StateQuery heartbeats. The HAD-Mate must respond with a HadState report. After receiving a HadState report, the HAD updates the heartbeat and status record of the HAD-Mate. The HAD keeps track of unanswered StateQueries. If the HAD-CurrActy (currently active HAD) fails to respond to HAD_HB_MISSES StateQueries from the HAD-Stand (standby HAD), then the HAD-Stand begins to look for confirmation of the HAD-CurrActy communication problem, in the form of a "VoteToSwitch" notification from any MON that has also detected communication failures with HAD-CurrActy. If HAD-Stand obtains this VoteToSwitch confirmation from a MON within SWITCH_INTERVAL seconds of the first missed heartbeats detected by the standby HAD, it initiates a rapid automatic platform switchover and brings its call control computer to full activation.
If HAD-Stand obtains a "VoteToSwitch" from a MON before HAD-Stand itself has detected missed HAD-CurrActy heartbeats, then HAD-Stand starts counting SWITCH_INTERVAL and waits for another "VoteToSwitch" from another MON before starting the rapid automatic switchover. If HAD-Stand switches and becomes the currently active HAD, it sends the other monitors an ImController announcement, so that the call flow can be redirected to the newly active call control computer. The parameters SWITCH_INTERVAL and HAD_HB_MISSES are tunable. Note that a rapid automatic switchover takes place without any regard to which call control computer is designated as the default active call control computer.

The state of the HAD can be:
HAD_OOS
HAD_STANDBY
HAD_WAIT_MONS
HAD_WAIT_REM
HAD_ACTIVE

The heartbeat state of a tracked critical process, MON or HAD-Mate can be:
ALIVE - if the process continues to answer the HAD's AliveQueries.
MISSED_HTBT - if the process has failed to respond to one or more successive AliveQueries, up to PROC_HB_MISSES of them.
NOT_RESPONDING - if the process has failed to respond to PROC_HB_MISSES successive AliveQueries, but the process is found not to be dead using kill(0).
DEAD - if an unresponsive process is found to be dead using kill(0). Also, an ImDead from a critical process causes an immediate transition to this heartbeat state.

The HAD depends on the following variables:
DEF_ACTIVE_CS - This is the machine name of the designated default active call control computer. It can be changed at any time. It is used by an HAD when it initializes or when it tries to resolve conflicting activity levels with its HAD-Mate.
PROC_HB_INTERVAL - This is the interval, in seconds, between AliveQuery heartbeats sent by the HAD to its critical processes and StateQuery heartbeats sent to its HAD-Mate. The default is 1 second.
PROC_HB_MISSES - This is the number of successive missed responses to AliveQueries that the HAD allows a critical process before declaring it NOT_RESPONDING. The default is 2.
REM_INIT_TIMER - This is the number of successive missed responses to AliveQueries that HAD-CurrActy allows the REMs (resource manager modules) after HAD-CurrActy has transitioned to the HAD_WAIT_REM state and sent a GoActive order to the REM.
MON_HB_INTERVAL - This is the interval, in seconds, between the StateQuery heartbeats sent by the HAD to the MONs. The default is 10 seconds.
MON_HB_MISSES - This is the number of successive missed responses to StateQueries that the HAD allows a MON before declaring it NOT_RESPONDING. The default is 2.
SWITCH_INTERVAL - This is the interval, in seconds, that starts when HAD-Stand detects, or is notified by a MON of, a failure of HAD-CurrActy to respond. Within this interval, HAD-Stand must receive confirmation of the HAD-CurrActy problem from another MON in order to start a rapid automatic switchover. The default is 5 seconds.

The following is a description of the states and transitions of the HAD.

HAD_OOS: When the HAD initializes, it starts in the out-of-service state, HAD_OOS. When all of its critical call control computer processes have initialized and sent ImAlive reports, the HAD transitions to the next state, standby.

HAD_STANDBY: In the HAD_STANDBY state, all the critical processes of the call control computer respond to AliveQueries, and the HAD communicates with the remote MONs in all the CRIS units and with its HAD-Mate. The HAD is considered to be in the "standby, prevented" condition. If the HAD recognizes itself as the default active HAD, it transitions to the next state, a wait state, provided its HAD-Mate is not currently active.
If the HAD is ordered to "GoActive" by the UI or by its HAD-Mate, it transitions to the next wait state regardless of which HAD is the default active HAD. If the HAD receives a "VoteToSwitch" notification and confirmation, it transitions to the next wait state to begin a rapid automatic switchover. If the HAD is able to go active, it notifies everyone with an ImController announcement.

HAD_WAIT_MONS: In this state, the HAD is considered to be "going active". If the HAD has reached this state during an automatic initialization or during a rapid switchover, then the HAD looks for messages from all MON-CVs acknowledging that the HAD is going active, and for at least one MON-CV in the standby state. When these have been received, the HAD transitions to the next state and sends a RemGoActive command to the REM. If the HAD has reached the HAD_WAIT_MONS state during a graceful switchover on demand, the HAD tells the MON-CVs to go into standby mode. The HAD then looks for messages from all MON-CVs acknowledging the HAD as active, and for at least one MON-CV in the standby state. When these have been received, the HAD transitions to the next state and sends a RemGoActive command to its critical REM processes.

HAD_WAIT_REM: In this state, the HAD is still going active. The HAD is waiting for the REM to respond to the RemGoActive command with a RemGoneActive report. When the HAD obtains this report, it transitions to the fully active HAD_ACTIVE state and sends a GoActive command to the MON-CVs. (The MON-CVs may already be in an active state if a rapid switchover is already under way.)

HAD_ACTIVE: In this state, the RAD is processing calls and interacting with the REM on the call control computer. The HAD sends periodic AliveQuery and StateQuery heartbeats to update the communication and status records and to keep track of missed heartbeat responses. If the HAD notices that a critical process does not respond to its AliveQueries, or if a critical process sends an ImDead report, the HAD transitions to the HAD_OOS state and sends a "GoActive" command to its HAD-Mate so that the HAD-Mate takes over.

The following tables provide detailed information regarding the functions of the HAD in response to certain conditions.
Table 1: HAD state HAD_OOS (messages from REM or critical processes; from HAD-Mate or UI; from MON; from HAD-Mate)
Table 2: HAD state HAD_STANDBY (messages from REM or critical processes; from HAD-Mate or UI; from MON; from HAD-Mate; initialization determination is only done in the HAD_STANDBY state)
Table 3: HAD state HAD_WAIT_MONS
Table 4: HAD state HAD_WAIT_REM
Table 5: HAD state HAD_ACTIVE

MON 50 MODULE

The MON 50 module can be tailored to run on different network devices, for example, a VRU or a database computer. This detailed description provides a general summary of two types of MON modules used in an embodiment of the invention: the first designed for a VRU (MON-CV) and the second designed for a ground-to-air server (GTAS) (MON-OP).
MON-CV

The MON-CV runs on the VRUs 16 as a message-driven state transition program, designed to monitor and coordinate the call processing states of the RAD module running on the VRUs 16 with the call processing states of the REM module running on the active call control computer. The MON-CV communicates with the HAD-CurrActy server to receive platform start/stop commands and other platform updates. The MON-CV also keeps track of the communication status of the HAD-CurrActy. If the MON-CV detects any problem communicating with the HAD-CurrActy, it immediately notifies the standby HAD to be alert for a possible switchover. The MON-CV message types listed here are generically named for simplicity when used to describe the MON-CV activities and state transitions that follow.

The MON-CV receives the following messages:
ImAlive - from both HAD and its critical process RAD when they have been initialized, and from RAD regularly as a heartbeat response to confirm viability.
ImDead - from either HAD or RAD when they have shut down gracefully.
StateQuery - from HADs to request a heartbeat response in the form of a status report.
HadState - from HADs as a heartbeat response in the form of a status report.
RadGoneOos - from RAD to report its current call processing level as out of service.
RadGoneActive - from RAD to report its current call processing level as active.
RadGoneMoos - from RAD to report itself in a maintenance state.
GoActive - from HAD-CurrActy for normal activation; from the UI for manual activation.
[name missing in source] - from the UI or HAD for deactivation.
ImController - from the HAD that is going active during initialization or a switchover.
The MON-CV sends the following messages:
AliveQuery - to RAD to request a heartbeat response indicating viability.
StateQuery - to HAD to request a heartbeat response in the form of a status report.
MonState - to HAD as a heartbeat response and report of the current activity level.
GoOos - to RAD to move it from active call processing to out of service.
GoActive - to RAD to bring it to the fully active call processing level.
VoteToSwitch - to HAD-Stand when the MON-CV has detected a communication problem with the currently active HAD.

The MON-CV uses an internal alarm routine to send an AliveQuery heartbeat to RAD every PROC_HB_INTERVAL seconds. RAD must respond with an ImAlive report. After receiving an ImAlive from RAD, the MON-CV updates the communication status of RAD. If RAD fails to respond to PROC_HB_MISSES AliveQueries, the MON-CV raises an alarm and transitions to an out-of-service state. The parameters PROC_HB_INTERVAL and PROC_HB_MISSES are tunable.

To track the communication status of the HADs, the MON-CV uses an internal alarm to send regular StateQuery heartbeats every HAD_HB_INTERVAL seconds. HADs must respond with HadState reports. After receiving a HadState report from an HAD, the MON-CV updates the communication status of that HAD. The HAD_HB_INTERVAL parameter is tunable. The MON-CV keeps track of missing StateQuery responses. If the currently active HAD fails to respond to HAD_HB_MISSES StateQueries from the MON-CV, the MON-CV immediately notifies the standby HAD to be alert for a possible rapid switchover by sending it a VoteToSwitch notification. If, within SWITCH_INTERVAL seconds, the voting MON-CV has neither detected renewed heartbeats from the unresponsive HAD-CurrActy nor received notification from HAD-Stand of a switchover in progress, then the voting MON-CV puts itself into a standby state, because it cannot tell its critical RAD processes where to direct the call flow. The parameters HAD_HB_MISSES and SWITCH_INTERVAL are tunable.
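The MON-CV's handling of an unresponsive active HAD can be illustrated with a small Python sketch. The class and method names (`HadWatcher`, `on_query_result`, `on_im_controller`) and the returned action strings are hypothetical and introduced only for illustration. The miss limit of 2 matches the HAD_HB_MISSES default given in this description; the 5-second interval is borrowed from the HAD's SWITCH_INTERVAL default and is an assumption for the MON-CV.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HadWatcher:
    """Tracks StateQuery responses from the currently active HAD (illustrative)."""
    had_hb_misses: int = 2
    switch_interval: float = 5.0
    misses: int = 0
    vote_sent_at: Optional[float] = None

    def on_query_result(self, answered: bool, now: float) -> str:
        """Process the outcome of one StateQuery and return the action to take."""
        if answered:
            # Renewed heartbeat: clear the miss count and any pending vote.
            self.misses = 0
            self.vote_sent_at = None
            return "ok"
        self.misses += 1
        if self.vote_sent_at is None and self.misses >= self.had_hb_misses:
            self.vote_sent_at = now
            return "send_vote_to_switch"
        if (self.vote_sent_at is not None
                and now - self.vote_sent_at >= self.switch_interval):
            return "go_standby"
        return "waiting"

    def on_im_controller(self) -> None:
        """A switchover was announced, so start tracking the new active HAD."""
        self.misses = 0
        self.vote_sent_at = None


if __name__ == "__main__":
    watcher = HadWatcher()
    for t, answered in [(0.0, True), (1.0, False), (2.0, False), (7.0, False)]:
        print(t, watcher.on_query_result(answered, now=t))
```

The vote is sent once, at the moment the miss limit is reached; afterwards the watcher only waits for either a renewed heartbeat, a switchover announcement, or the expiry of the interval.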
The heartbeat state of a tracked critical process or tracked HAD can be:
ALIVE - if the process [HAD] continues to answer the MON-CV's AliveQueries [StateQueries].
MISSED_HTBT - if the process [HAD] has failed to respond to one or more successive AliveQueries [StateQueries], up to the number PROC_HB_MISSES [HAD_HB_MISSES].
NOT_RESPONDING - if the process [HAD] has failed to respond to PROC_HB_MISSES [HAD_HB_MISSES] successive AliveQueries [StateQueries] but is found not to be dead when checked with kill(0).
DEAD - if an unresponsive process [HAD] is found to be dead when checked with kill(0). Also, an ImDead report from the process [HAD] causes an immediate transition to this heartbeat state.
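The four heartbeat states can be expressed as a small classification function. The following is a sketch under stated assumptions, not the code of the described system: the function names heartbeat_state and pid_exists are hypothetical, the boundary between MISSED_HTBT and NOT_RESPONDING is assumed to fall exactly at PROC_HB_MISSES, and the existence probe uses signal 0 as on POSIX systems (it should not be used on Windows, where os.kill behaves differently).

```python
import os
from typing import Callable

PROC_HB_MISSES = 2  # default stated in the description


def pid_exists(pid: int) -> bool:
    """POSIX-only probe: signal 0 checks for existence without delivering a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def heartbeat_state(misses: int, pid: int, got_im_dead: bool = False,
                    exists: Callable[[int], bool] = pid_exists) -> str:
    """Map a count of successive missed heartbeats to a heartbeat state."""
    if got_im_dead:
        return "DEAD"            # an ImDead report forces DEAD immediately
    if misses == 0:
        return "ALIVE"
    if misses < PROC_HB_MISSES:  # assumed boundary
        return "MISSED_HTBT"
    # Misses exhausted: distinguish a hung process from one that no longer exists.
    return "NOT_RESPONDING" if exists(pid) else "DEAD"
```

The existence probe is injected as a parameter so that the classification can be exercised without touching real processes.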
The MON-CV depends on the following parameters:
HAD_HB_INTERVAL - the interval in seconds between the StateQuery heartbeats sent by the MON-CV to the HADs. The default is 1 second.
HAD_HB_MISSES - the number of successive missed responses to StateQueries that the MON-CV allows a HAD before declaring it NOT_RESPONDING. The default is 2.
PROC_HB_INTERVAL - the interval in seconds between the AliveQuery heartbeats sent by the MON-CV to the RAD. The default is 1 second.
PROC_HB_MISSES - the number of successive missed responses to AliveQueries that the MON-CV allows the RAD before declaring it NOT_RESPONDING. The default is 2.
SWITCH_INTERVAL - the number of seconds that the MON-CV waits for a switchover notification or for renewed HAD-CurrActy heartbeats immediately after the MON-CV sends a VoteToSwitch notification to the HAD-Stand. If SWITCH_INTERVAL expires without either, the MON-CV goes into the standby state.

The following provides a description of the states and transitions of the MON-CV.
MON_OOS: When the MON-CV is initialized or reset, it starts in the out-of-service state, MON_OOS. When the RAD has initialized and sent an ImAlive report, the MON-CV transitions to the standby state.
MON_STANDBY: In the MON_STANDBY state, the MON-CV knows that the RAD is up, but the RAD circuits are still OOS (out of service). The MON-CV is in communication with the remote HADs in the call control computers and with the RAD. When the MON-CV receives an ImController announcement from the HAD that is activating, the MON-CV records its identity so that it can tell the RAD which call control computer is currently active. When the HAD-CurrActy completes activation and sends a GoActive command to the MON-CV, the MON-CV transitions to the next state and sends a RadGoActive command to the RAD.
MON_WAIT_RAD_ACTIVE: In this state, the MON-CV waits for the RAD to respond to the RadGoActive command with a RadGoneActive report. When the RAD sends this report, the MON-CV transitions to the fully active MON_ACTIVE state.
MON_ACTIVE: In this state, the RAD processes calls in the CRIS unit and interacts with the REM in the call control computer. The MON-CV sends periodic AliveQuery heartbeats to the RAD and StateQueries to both HADs, to update communication records and keep track of missed heartbeat responses. If the MON-CV notices that the currently active HAD is not responding to the StateQueries, the MON-CV immediately notifies the standby HAD (with a VoteToSwitch notification) to be alert for a possible fast switchover, as described above. If such a switchover occurs, the newly activated HAD sends the MON-CV an ImController announcement. The MON-CV records which HAD is the new HAD-CurrActy and notifies the RAD, with no change to the current state of the RAD. At any time in the MON_ACTIVE state, if the MON-CV sees that the RAD is not responding to its AliveQueries, the MON-CV transitions to the MON_OOS state and notifies the HADs.
MON_WAIT_RAD_OOS: If the MON-CV must leave active call processing because it has received a GoStandby command from the UI or a HAD, the MON-CV first sends a RadGoOos command to the RAD. When the RAD answers with a RadGoneOos report, the MON-CV transitions to the MON_STANDBY state.
MON_MAINT_STANDBY: If the RAD sends a RadGoneMoos report to the MON-CV, indicating the need for maintenance, the MON-CV transitions to MON_MAINT_STANDBY. When the RAD must go to MOOS, it handles all CRIS circuits as they become idle, in anticipation of CRIS deactivation. The MON-CV does not try to reactivate the RAD even if there is a currently active call control computer. The RAD can leave its MOOS state only by a manual command or a restart.

In all MON states, the MON-CV responds to an ImDead report from the RAD, a RadGoneOos report, or PROC_HB_MISSES successive missed heartbeat responses from the RAD by transitioning to the MON_OOS state and sending a MonGonOos report to the HADs. The following Tables 6 to 10 provide a detailed description of the functions of the MON-CV in response to certain conditions.
Table 6: Status of MON-CV: MON_OOS
Table 7: Status of MON-CV: MON_STANDBY
Table 8: Status of MON-CV: MON_WAIT_RAD_ACTIVE
Table 9: Status of MON-CV: MON_ACTIVE
Table 10: Status of MON-CV: MON_WAIT_RAD_OOS

In any state, the MON-CV answers the HADs' StateQueries with a MonState report. If the MON-CV bounces, it is restarted and reset to initial values as if undergoing a cold start. The MON-CV makes no assumptions about any previous state. Currently, RAD bounces are recognized only if the MON-CV receives an ImDead report from the RAD when the RAD shuts down gracefully. In that case, the MON-CV transitions to the MON_OOS state. Once the RAD has reset and sent an ImAlive report to the MON-CV, the MON-CV transitions to MON_STANDBY and proceeds in the usual way.
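The MON-CV state transitions described above can be summarized as a lookup table. The sketch below is only an illustration and does not reproduce Tables 6 to 10. The function name step and the event name HeartbeatsLost (standing for PROC_HB_MISSES successive missed responses) are hypothetical, and it is an assumption of this sketch that an explicit table entry, such as RadGoneOos in MON_WAIT_RAD_OOS, takes precedence over the general rule that sends the monitor to MON_OOS.

```python
from typing import Dict, Optional, Tuple

# (current state, incoming event) -> (next state, message to send, if any)
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, Optional[str]]] = {
    ("MON_OOS", "ImAlive"): ("MON_STANDBY", None),
    ("MON_STANDBY", "GoActive"): ("MON_WAIT_RAD_ACTIVE", "RadGoActive"),
    ("MON_WAIT_RAD_ACTIVE", "RadGoneActive"): ("MON_ACTIVE", None),
    ("MON_ACTIVE", "GoStandby"): ("MON_WAIT_RAD_OOS", "RadGoOos"),
    ("MON_WAIT_RAD_OOS", "RadGoneOos"): ("MON_STANDBY", None),
}

# Events that send the monitor out of service from any state.
# "HeartbeatsLost" is a hypothetical name for PROC_HB_MISSES successive misses.
FAILURE_EVENTS = {"ImDead", "RadGoneOos", "HeartbeatsLost"}


def step(state: str, event: str) -> Tuple[str, Optional[str]]:
    """Return the next state and the message to send for one incoming event."""
    if (state, event) in TRANSITIONS:      # explicit entries win (assumption)
        return TRANSITIONS[(state, event)]
    if event == "RadGoneMoos":             # RAD needs maintenance
        return ("MON_MAINT_STANDBY", None)
    if event in FAILURE_EVENTS:            # RAD failure in any state
        return ("MON_OOS", "MonGonOos")
    return (state, None)                   # all other events leave the state unchanged
```

Starting from MON_OOS, the events ImAlive, GoActive and RadGoneActive lead in turn to MON_ACTIVE, with a RadGoActive command sent on the second step.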
MON-OP (GTAS)

The MON-Op runs on the GTAS as a message-driven state transition program designed to verify and coordinate the states of the critical processes of the GTAS. The MON-Op also keeps track of the current states of the active and standby HAD monitors on the call control computers. If the MON-Op notices any problem in communicating with the currently active HAD, it immediately notifies the standby HAD to be alert for a possible switchover. Because the GTAS can run independently of the rest of the platform, the MON-Op needs only two states, out of service and active. The MON-Op message types listed here are named generically for simplicity in the description of the MON-Op activities and state transitions that follows.

The MON-Op receives the following messages:
ImAlive - from both HADs and from its critical processes when they have initialized, and from the critical processes regularly as a heartbeat to confirm their viability.
ImDead - from either HAD or from critical processes when they shut down gracefully.
StateQuery - from both HADs, to request a heartbeat response in the form of a status report.
HadState - from both HADs, as a heartbeat response and a report of the current activity level.
ImController - from the HAD that is activating during initialization, reset, or switchover.

The MON-Op sends the following messages:
AliveQuery - to critical processes, to request a heartbeat response indicating viability.
StateQuery - to both HADs, to request a heartbeat response in the form of a status report.
MonState - to both HADs, as a heartbeat response indicating viability.
VoteToSwitch - to the HAD-Stand, when the MON-Op has detected a communication problem with the currently active HAD.

To maintain a current record of the states of the call control computers, the MON-Op uses an internal alarm routine to send StateQuery heartbeats to the HADs every HAD_HB_INTERVAL seconds. Each HAD must respond with a HadState report. If the currently active HAD fails to respond to HAD_HB_MISSES of the MON-Op's StateQueries, the MON-Op immediately notifies the standby HAD to be alert for a possible switchover by sending it a VoteToSwitch notification.
The HAD_HB_INTERVAL and HAD_HB_MISSES parameters are tunable. It is worth noting that, because the GTAS runs independently of the rest of the platform, the MON-Op does not need to change its state or redirect any call flow after a failure of the HAD-CurrActy, unlike the MON-CV, which must take some action within SWITCH_INTERVAL seconds of sending a VoteToSwitch.

The status of a HAD can be:
HAD_OOS
HAD_STANDBY
HAD_WAIT_MONS
HAD_WAIT_REM
HAD_ACTIVE

To keep track of the communication status of the critical servers in the GTAS, the MON-Op uses the internal alarm to send AliveQuery heartbeats every PROC_HB_INTERVAL seconds. All recipients of the MON-Op's AliveQueries must respond with ImAlive reports. The MON-Op keeps track of missed AliveQuery responses. If any of its critical processes fails to respond to PROC_HB_MISSES of the MON-Op's AliveQueries, the MON-Op sounds an alarm and transitions to an out-of-service state. Both the PROC_HB_INTERVAL and PROC_HB_MISSES parameters are tunable.

The MON-Op may need to handle multiple instances of the same critical process server. The simplifying assumption is that one active instance of a multi-instance critical server is sufficient for the platform to keep call processing active. This allows the MON-Op to assign an aggregate ALIVE status to any multi-instance process that has at least one such active instance.

The communication status of a tracked process or HAD can have one of the following values:
ALIVE - if the process [HAD] continues to answer the MON-Op's AliveQueries [StateQueries].
MISSED_HTBT - if the process [HAD] has failed to respond to one or more successive AliveQueries [StateQueries], up to the number PROC_HB_MISSES [HAD_HB_MISSES].
NOT_RESPONDING - if the process [HAD] has failed to respond to PROC_HB_MISSES [HAD_HB_MISSES] successive AliveQueries [StateQueries] but is found not to be dead when checked with kill(0).
DEAD - if an unresponsive process [HAD] is found to be dead when checked with kill(0). Also, receiving an ImDead message from the process causes an immediate transition to this heartbeat state.
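The simplifying assumption for multi-instance critical servers, namely that one ALIVE instance is enough, can be sketched as follows. The function names and the severity ordering used when no instance is ALIVE are assumptions of this sketch; the description only states the ALIVE case.

```python
from typing import Dict, List

# Heartbeat states ordered from healthiest to least healthy (ordering is assumed).
SEVERITY = ["ALIVE", "MISSED_HTBT", "NOT_RESPONDING", "DEAD"]


def aggregate_status(instance_states: List[str]) -> str:
    """Aggregate status of one multi-instance server: the healthiest instance wins."""
    if not instance_states:
        raise ValueError("at least one instance is required")
    # If any instance is ALIVE, min() by severity index returns ALIVE.
    return min(instance_states, key=SEVERITY.index)


def all_critical_alive(servers: Dict[str, List[str]]) -> bool:
    """True when every critical server has at least one ALIVE instance."""
    return all(aggregate_status(states) == "ALIVE" for states in servers.values())
```

Under this rule a server with one DEAD and one ALIVE instance still counts as ALIVE, so the monitor would not need to leave its active state.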
The MON-Op depends on the following parameters:
HAD_HB_INTERVAL - the interval in seconds between the StateQuery heartbeats sent by the MON-Op to the HADs. The default is 1 second.
HAD_HB_MISSES - the number of successive missed responses to StateQueries that the MON-Op allows a HAD before declaring it NOT_RESPONDING. The default is 2.
PROC_HB_INTERVAL - the interval in seconds between the AliveQuery heartbeats sent by the MON-Op to its critical processes. The default is 1 second.
PROC_HB_MISSES - the number of successive missed responses to AliveQueries that the MON-Op allows a critical process before declaring it NOT_RESPONDING. The default is 2.

The following provides a general description of the states and transitions of the MON-Op.
MON_OOS: When the MON-Op is initialized or reset, it starts in the out-of-service state, MON_OOS. When each critical process has initialized and sent an ImAlive report, the MON-Op transitions to the fully active state.
MON_ACTIVE: In this state, the MON-Op knows that the GTAS is capable of processing calls. The MON-Op sends periodic AliveQuery heartbeats to the critical processes and StateQuery heartbeats to the HADs. If the MON-Op notices that the currently active HAD is not responding to the StateQueries, the MON-Op immediately notifies the standby HAD, with a VoteToSwitch, to be alert for a possible switchover. If the MON-Op notices that a critical process is not responding to its AliveQueries, the MON-Op transitions to the MON_OOS state and notifies the HADs. The following Tables 11 and 12 describe in detail the functions of the MON-Op in response to certain conditions.
Table 11: Status of MON-Op: MON_OOS
Table 12: Status of MON-Op: MON_ACTIVE

Although various embodiments are specifically illustrated and described herein, it will be appreciated that modifications and variations of the present invention are covered by the above teachings and are within the scope of the appended claims, without departing from the spirit and scope of the invention. For example, although a pair of call control computers is used for CPP 10, it can be appreciated that any number of call control computers can be used and still fall within the scope of the invention. In addition, although the communication verification processes are described with reference to CPP 10, it can be appreciated that these processes can be implemented on other network devices and still fall within the scope of the invention. It is noted that, as of this date, the best method known to the applicant for carrying out the invention is that which is clear from the present description of the invention. Having described the invention as above, the content of the following claims is claimed as property:

Claims (23)

1. A method for processing call data, characterized in that it comprises the steps of: replicating the call data from a first server that is in an active mode to a second server that is in a standby mode; verifying the first server, by the second server and other network devices, for a defective condition; and switching the first server to the standby mode and the second server to the active mode if a defective condition is detected.

2. The method according to claim 1, characterized in that the step of replicating the call data comprises the steps of: receiving the call data at the first server; processing the call data at the first server; updating a call data record for the first server to reflect the call data; sending the call data to the second server; and updating a call data record for the second server to reflect the call data.

3. The method according to claim 1, characterized in that the verification step comprises the steps of: interrogating the first server, by the network devices, to detect a defective condition; and sending a message from the network devices to the second server reporting a detected defective condition.

4. The method according to claim 3, characterized in that the switching step comprises the steps of: receiving the messages at the second server; determining whether the messages reach a predetermined threshold number and, if so: switching the second server from the standby mode to the active mode; and sending a message from the second server to the first server so that it switches to the standby mode.

5. The method according to claim 4, characterized in that it further comprises the step of sending a message to the network devices to redirect the call data to the second server.

6. The method according to claim 1, characterized in that it further comprises the steps of: receiving static call data in a database; storing the static call data in a static call data profile in the database; and replicating the static call data to the first and second servers if the static call data is updated.

7. The method according to claim 6, characterized in that the replication step comprises the steps of: receiving the static call data at the first and second servers; and updating a static call data profile for the first server and a static call data profile for the second server.

8. The method according to claim 7, characterized in that it further comprises the step of auditing the call data records and the static call data profiles on a periodic basis, to ensure data synchronization.

9. A method for processing call data, characterized in that it comprises the steps of: receiving the call data at a first server that is in an active mode; processing the call data at the first server; updating a call data record for the first server to reflect the call data; replicating the call data to a second server that is in a standby mode; verifying the first server for a defective condition; and switching the first server to the standby mode and the second server to the active mode if a defective condition is detected.

10. The method according to claim 9, characterized in that it further comprises the steps of: receiving the replicated call data at the second server; and updating a call data record for the second server to reflect the replicated call data.

11. The method according to claim 10, characterized in that it further comprises the step of sending a message that the first server has switched to the standby mode and that the second server has switched to the active mode.

12. An apparatus for processing calls, characterized in that it comprises: a first call control computer in active mode for receiving call data; a second call control computer in standby mode, coupled to the first call control computer; means for replicating the call data from the first call control computer to the second call control computer; means for verifying the first call control computer, to detect failures of the first call control computer; and means for switching the second call control computer to the active mode and the first call control computer to the standby mode if failures occur.

13. The apparatus according to claim 12, characterized in that it further comprises a database coupled to the first and second call control computers.

14. The apparatus according to claim 13, characterized in that the call information comprises static call information and dynamic call information, and the database stores the static call information.

15. The apparatus according to claim 14, characterized in that it further comprises means for replicating the static call information to the first and second call control computers.

16. The apparatus according to claim 12, characterized in that the replication means replicate the static call information to the first and second call control computers whenever the static call information is modified.

17. The apparatus according to claim 12, characterized in that the verification means comprise: means for remotely verifying the first and second call control computers; and means for locally verifying the first and second call control computers.

18. The apparatus according to claim 17, characterized in that the local verification means comprise: means for setting the first call control computer to the active mode and the second call control computer to the standby mode; means for initializing the first call control computer in active mode; means for determining whether a set of internal processes within the first call control computer is running within normal parameters; and means for sending a message to the second call control computer to switch from standby mode to active mode if the set of internal processes is not running within normal parameters.

19. The apparatus according to claim 17, characterized in that the remote verification means comprise: means for determining whether a set of internal processes within the first call control computer is running within normal parameters; and means for sending a message to the second call control computer to switch the second call control computer from the standby mode to the active mode if the set of internal processes is not running within normal parameters.

20. The apparatus according to claim 12, characterized in that the switching means comprise: means for receiving switch vote messages at the second server; means for determining whether the messages reach a predetermined threshold number and, if so: means for switching the second server from standby mode to active mode; and means for sending a message from the second server to the first server to switch to standby mode.

21. The apparatus according to claim 20, characterized in that it further comprises means for sending a message to the network devices to redirect the call data to the second server.

22. A computer for carrying out call processing, characterized in that it comprises: a memory containing: a computer program for replicating the call data from a first server that is in an active mode to a second server that is in a standby mode; a set of computer programs for verifying the first server, by the second server and other network devices, for a defective condition; and a computer program for switching the first server to the standby mode and the second server to the active mode if a defective condition is detected; and a processor for running the programs.

23. A computer-readable medium whose contents cause a computer system to carry out a remote call procedure, the computer system having a computer program that, when executed, carries out the steps of: replicating the call data from a first server that is in active mode to a second server that is in standby mode; verifying the first server, by the second server and other network devices, for a defective condition; and switching the first server to the standby mode and the second server to the active mode if a defective condition is detected.
MXPA/A/1998/007722A 1997-09-25 1998-09-22 Method and apparatus for the processing of defective tolerant calls MXPA98007722A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US08937762 1997-09-25

Publications (1)

Publication Number Publication Date
MXPA98007722A (en) 2000-01-01

