TELECOMMUNICATIONS SYSTEM FAILURE RECOVERY
FIELD OF THE INVENTION
The present invention relates to telephony systems, and in particular to apparatus
and methods for providing recovery from failures in telephony systems.
BACKGROUND OF THE INVENTION
Telephony switches are well known in the art. Typical prior art telephony switches
which might be used on private premises, as opposed to being used as part of the
public switched telephone network (PSTN), include switches available from Lucent
Technologies and the AS5300 switch available from Cisco. Telephony switches are
generally used to provide telephony services to end units such as conventional
telephones, the services typically including basic services such as call establishment,
routing of calls to other switches or to the PSTN, and other well known services.
Advanced services may also be included, as is well known in the art.
One example of a prior art protocol which can be used in a telephony system
including telephone switches is the H.323 v. 2 protocol of the International
Telecommunications Union (ITU), published in 1998.
The problem of failure of a telephony system component, such as a telephony
switch, is well known in the art. When a switch fails, various problems typically occur,
including: inability of the switch to receive calls; loss of existing calls; inability of
establishing new calls; long failure periods; and inability to recover in case of a site
disaster.
Existing solutions to the problem of switch failure are typically based on the use of
a shadow switch, including a backup switch which shadows ongoing operations and,
at the time of failure, takes over for the failed switch. Generally, shadow machines are
limited by a maximum physical distance from the master machine. In a case where a shadow machine is limited to be at the same physical location as the main machine, the shadow machine may also become inoperative in the case of a physical disaster. Generally, failure detection and recovery time for prior art shadow based systems is
approximately 10 minutes.
Technologies related to those of the present invention are described in a co-pending
application entitled "System for Building a Telephony Application", filed on the same
day as the present application, having the same inventors and assigned to the same
assignee as the present application. The "System for Building a Telephony
Application" application is hereby incorporated herein by reference.
The disclosures of all references mentioned above and throughout the present
specification are hereby incorporated herein by reference.
SUMMARY OF THE INVENTION
The present invention seeks to provide an improved telephony system. In one preferred embodiment of the present invention, an improved telephony system
including an improved telephony switch is provided. In another preferred
embodiment, the telephony switch may be provided alone.
In the present invention, a disaster recovery mechanism which does not require the
use of redundant shadow machines is provided. In the present invention, one switch
uses peer-to-peer communication, as is well known in the art, to redundantly store
important system information relating to the one switch on another active switch. In
case of a failure of the first switch, operation is moved to the second switch, using the
redundant stored information in the second switch or in another appropriate location.
Preferably, failures are detected using a keep-alive system.
There is thus provided in accordance with a preferred embodiment of the present
invention a telephony system including a data network, a plurality of telephony
devices each operatively associated with the data network and operative to request
telephony services, via the data network, in accordance with a telephony protocol, at
least two telephony switches including a first telephony switch and a second telephony
switch, each of the at least two telephony switches being operatively associated with
the data network and operative to fulfill telephony service requests received via the
data network in accordance with the telephony protocol and each including a system
information database, the system information database being operative to store
configuration information defining a plurality of characteristics of each of the
telephony devices, wherein the first telephony switch includes a first data replication
subsystem operative to replicate the stored configuration information to the system
information database included in the second telephony switch.
Further in accordance with a preferred embodiment of the present invention the
second telephony switch includes a second data replication subsystem operative to
replicate the stored configuration information to the system information database
included in the first telephony switch.
Still further in accordance with a preferred embodiment of the present invention the
at least two telephony switches also include a third telephony switch, and the second
telephony switch includes a second data replication subsystem operative to replicate
the stored configuration information to the system information database included in
the third telephony switch.
Additionally in accordance with a preferred embodiment of the present invention
the telephony protocol includes H.323.
There is also provided in accordance with another preferred embodiment of the
present invention a telephony switch for use in a telephony system including a data
network and a plurality of telephony devices each operatively associated with the data
network and operative to request telephony services, via the data network, in
accordance with a telephony protocol, the telephony switch including a network
adapter operative to provide communications via a data network between the
telephony switch and a plurality of telephony devices and to receive telephony service
requests from the telephony devices, a system information database, the system
information database being operative to store configuration information defining a
plurality of characteristics of each of the telephony devices, and a first data replication
subsystem operative to replicate the stored configuration information to a second
system information database external to the telephony switch.
Further in accordance with a preferred embodiment of the present invention the
second system information database external to the telephony switch is included in a second telephony switch.
Still further in accordance with a preferred embodiment of the present invention the
telephony protocol includes H.323.
There is also provided in accordance with another preferred embodiment of the
present invention a method for providing recovery in a telephony system after the
failure of a telephony switch included in the system, the method including providing a
first telephony switch having an internal switch database for storing information and
replication capability to replicate the stored information in another database external to
the telephony switch, operating the first telephony switch, including storing system
information in the internal switch database, replicating the stored system information
to an external database external to the first telephony switch, detecting failure of the
first telephony switch, in response to a result of the detecting step, failing over
operations of the first telephony switch to a second telephony switch, using replicated
system information in the external database.
Further in accordance with a preferred embodiment of the present invention the
second telephony switch is selected based, at least in part, on a load of the second
telephony switch, from a plurality of available telephony switches.
Still further in accordance with a preferred embodiment of the present invention the
method also includes performing load balancing between the second telephony switch
and at least one other telephony switch.
Additionally in accordance with a preferred embodiment of the present invention
the step of detecting failure includes detecting failure using a keep-alive count.
Moreover in accordance with a preferred embodiment of the present invention the external database is included in the second telephony switch.
There is also provided in accordance with another preferred embodiment of the present invention a method for use in a telephony system including a data network and
a plurality of telephony devices each operatively associated with the data network and
operative to request telephony services, via the data network, in accordance with a
telephony protocol, the method including providing communications via a data
network between a telephony switch and a plurality of telephony devices and receiving
telephony service requests from the telephony devices, storing configuration
information defining a plurality of characteristics of each of the telephony devices, and
replicating the stored configuration information to a backup system information
database.
Further in accordance with a preferred embodiment of the present invention the
storing step includes storing in a system information database included in a first
telephony switch.
Still further in accordance with a preferred embodiment of the present invention the
backup system information database is included in a second telephony switch.
Additionally in accordance with a preferred embodiment of the present invention
the telephony protocol includes H.323.
There is also provided in accordance with another preferred embodiment of the
present invention, in a telephony system operative to provide failover of a telephony
switch upon failure of the telephony switch to update a keep-alive count, a load
balancing method comprising providing a plurality of telephony switches, identifying
one of the plurality of telephony switches as an overloaded telephony switch, and
failing to respond to at least one keep-alive message directed to the overloaded telephony switch, thus causing at least one device associated with the overloaded
telephony switch to fail over to another one of the plurality of telephony switches.
Further in accordance with a preferred embodiment of the present invention the
method also includes determining whether a sufficient number of devices have failed
over, and responding to at least one keep-alive message based, at least in part, on a
result of the determining step.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention will be understood and appreciated more fully from the following detailed description, taken in conjunction with the drawings in which:
Fig. 1 is a simplified partly pictorial, partly block diagram illustration of a
telephony system having failure recovery characteristics, the system being constructed
and operative in accordance with a preferred embodiment of the present invention;
Fig. 2 is a simplified flowchart illustration of a preferred method of operation of the
apparatus of Fig. 1;
Fig. 3 is a simplified flowchart illustration of a preferred load balancing method of
operation of the apparatus of Fig. 1; and
Fig. 4 is a simplified block diagram illustration of a preferred implementation of a
portion of the apparatus of Fig. 1.
DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT
Reference is now made to Fig. 1 which is a simplified partly pictorial, partly block diagram illustration of a telephony system having failure recovery characteristics, the
system being constructed and operative in accordance with a preferred embodiment of
the present invention. The system of Fig. 1 preferably comprises a plurality of
telephony switches 100, comprising at least a first switch 110 and a second switch
120.
It is appreciated that a larger number of telephony switches 100 may be provided,
only two telephony switches 100 being shown in Fig. 1 for sake of simplicity of
description. Furthermore, it is appreciated that an alternative preferred embodiment of
the present invention comprises a subcombination of the system of Fig. 1 comprising a
single telephony switch 100, typically without the additional components shown in the
system of Fig. 1.
The system of Fig. 1 also preferably comprises a data network 130, which may
comprise any appropriate data network as is well known in the art, such as, for
example, a LAN or a WAN.
The system of Fig. 1 also preferably comprises a plurality of telephony devices
140; a preferred implementation of the telephony devices 140 is described below with
reference to Fig. 4. Each of the telephony devices 140 is preferably operatively
associated with a plurality of end units 150 such as conventional telephones or other
communication devices. It is appreciated that a wide variety of appropriate telephony
devices 140 and end units 150, including telephony devices and end units which are
well known in the art, may be used. It is further appreciated that other types of
telephony device 140, including devices associated only with a single end unit or
themselves comprising an end unit, may be used.
The system of Fig. 1 also preferably comprises a gateway 155, typically comprising
any appropriate gateway, as is well known in the art, operative to connect the
remainder of the system of Fig. 1 to the public switched telephone network (PSTN)
160 or to another outside telephone network. Preferably, the gateway 155 also has
failover capabilities similar to those described below with reference to the telephony
switch 100.
Each telephony switch 100 preferably comprises, in addition to conventional
components well known in the art of telephone switches, the following elements,
typically in operative association with each other and with the conventional
components:
1. a network adapter 170, the network adapter 170 being preferably operative to
mediate bi-directional communications between the telephony switch 100 and the
network 130;
2. a system information database 180, typically implemented in a combination of
computer hardware and software as is well known in the art and preferably comprising
an appropriate commercially available database management system, such as, for
example: Oracle v. 8, commercially available from Oracle Corporation World
Headquarters, 500 Oracle Parkway, Redwood Shores, CA 94065 USA.
3. a replication subsystem 190, typically implemented in software and preferably
being operative to replicate, as is well known in the field of database management
systems, at least a portion of the system information database 180 to a different
database not comprised in the same telephony switch 100 as the replication subsystem
190. The replication subsystem 190 may comprise suitable commercially available
software, typically supplied together with a commercially available database management system, as mentioned above with reference to the system information database 180; typically the commercially available software is included in a
commercially available database management system, as described above.
It is appreciated that various components of the system of Fig. 1, including each
telephony switch 100 and each telephony device 140, may be implemented in a
suitable combination of hardware and software, as is well known in the art. It is
further appreciated that one or more of the plurality of telephony switches 100
comprised in the system of Fig. 1 may comprise conventional telephony switches (not
shown), which conventional switches may not comprise at least unit 190 of the
telephony switches 100 shown in Fig. 1.
The operation of the system of Fig. 1 is now briefly described. The first telephony
switch 110 and the second telephony switch 120 each operate, similarly to
conventional telephone switches, to provide telephony services in response to
telephony service requests received via the network 130 from one or more of the end
units 140. During provision of these services, each of the first telephony switch 110
and the second telephony switch 120 stores in the respective system information
database 180 comprised therein information regarding the end units 140.
Typically the stored information comprises identification and configuration
information sufficient to maintain connection with and provide services to the end
units 140; additional information may also be stored. When information,
predetermined to be a type of information important for maintaining service in the
event of component failure, is stored in the system information database 180, the
replication subsystem 190 replicates the stored information to another external
database. For example, when information is stored in the system information database 180 of the first switch 110 concerning end units 200 serviced by the first switch 110, the information may preferably be replicated to the system information database 180
of the second switch 120.
The case of storing in a database comprised in the second switch 120 is by way of
example only, and is not meant to limit the generality of the foregoing.
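By way of non-limiting illustration only, the selective replication described above may be sketched as follows. The sketch uses in-memory dictionaries in place of a commercially available database management system; the class names, the record kinds and the REPLICATED_KINDS set are hypothetical and are introduced here for illustration only.

```python
from dataclasses import dataclass, field

# Hypothetical record kinds predetermined to be important for maintaining
# service in the event of component failure (an assumption of this sketch).
REPLICATED_KINDS = {"identification", "configuration"}


@dataclass
class SystemInfoDatabase:
    """In-memory stand-in for a system information database 180."""
    records: dict = field(default_factory=dict)

    def store(self, device_id, kind, value):
        self.records[(device_id, kind)] = value


@dataclass
class ReplicationSubsystem:
    """Stand-in for a replication subsystem 190."""
    local: SystemInfoDatabase
    external: SystemInfoDatabase

    def store(self, device_id, kind, value):
        # Every record is stored locally; only the predetermined kinds
        # are also copied to the external database.
        self.local.store(device_id, kind, value)
        if kind in REPLICATED_KINDS:
            self.external.store(device_id, kind, value)


# Database of the first switch 110 replicating to that of the second switch 120.
db_110 = SystemInfoDatabase()
db_120 = SystemInfoDatabase()
replicator = ReplicationSubsystem(local=db_110, external=db_120)

replicator.store("device-1", "configuration", {"extension": "2001"})
replicator.store("device-1", "call_statistics", {"calls_today": 7})
```

In this sketch the configuration record appears in both databases, while the call statistics record, not being of a replicated kind, remains only in the database of the first switch.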
Upon failure of the first switch 110, operations may thus be easily continued by the
second switch 120. Any appropriate method may be used to determine that the first
switch 110 has failed such as, for example, a keep alive method as is well known in the art.
Upon failure to update keep alive, either spontaneously by the first switch 110 or by
failure to respond to a message from another component of the system such as an end
unit 140 or the second switch 120, the other component of the system which detects
the failure to update keep-alive preferably initiates fail-over to the second switch 120.
Alternatively, a component which detects failure to update keep-alive may notify
another component which initiates fail-over to the second switch 120. It will be
appreciated that an appropriate choice of keep alive parameters may thus allow rapid
failover in the system of Fig. 1.
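By way of non-limiting illustration only, one possible keep-alive check is sketched below. The sketch assumes that each switch periodically records a keep-alive update with a monitoring component, and that a switch is deemed to have failed when no update has been recorded within a timeout. The names KeepAliveMonitor, timeout_seconds and select_active_switch are hypothetical, and the timeout value is an arbitrary example of a keep alive parameter.

```python
class KeepAliveMonitor:
    """Tracks the time of the last keep-alive update seen from each switch."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_update = {}

    def record_update(self, switch_id, now):
        self.last_update[switch_id] = now

    def has_failed(self, switch_id, now):
        # A switch that has never updated, or whose last update is older
        # than the timeout, is treated as failed.
        last = self.last_update.get(switch_id)
        return last is None or (now - last) > self.timeout_seconds


def select_active_switch(monitor, primary, backup, now):
    """Return the switch to use: the backup if the primary has failed."""
    return backup if monitor.has_failed(primary, now) else primary


monitor = KeepAliveMonitor(timeout_seconds=3.0)
monitor.record_update("switch-110", now=0.0)
monitor.record_update("switch-120", now=0.0)

# At t=2.0 the first switch is still within its timeout.
before = select_active_switch(monitor, "switch-110", "switch-120", now=2.0)

# Only the second switch keeps updating; at t=5.0 the first switch has timed out.
monitor.record_update("switch-120", now=4.0)
after = select_active_switch(monitor, "switch-110", "switch-120", now=5.0)
```

In this sketch the failure detection time is bounded by the chosen timeout, which is the sense in which the choice of keep alive parameters governs how rapidly failover can begin.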
Reference is now made to Fig. 2, which is a simplified flowchart illustration of a
preferred method of operation of the system of Fig. 1. The method of Fig. 2 preferably
comprises the following steps:
A switch is operated normally, preferably in a manner similar to that of prior art
switches; during operation, system information is stored in an internal switch database
(step 210). During normal operation as described in step 210, stored data in the
internal switch database is replicated to another database external to the switch, typically a database in another switch (step 220). Upon loss of keep-alive update or upon another appropriate indication of failure of the switch which has been replicating stored data, switch operations are failed over to another switch, using replicated data
in the other database to continue operations (step 230).
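By way of non-limiting illustration only, steps 210, 220 and 230 may be sketched as follows. The sketch assumes that the external database is the internal database of the second switch, and that the failover target is chosen as the live switch serving the fewest devices. The Switch class and the helper functions are hypothetical and are introduced for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Switch:
    name: str
    alive: bool = True
    database: dict = field(default_factory=dict)  # internal switch database
    devices: set = field(default_factory=set)     # devices currently served


def store_and_replicate(switch, external_db, device_id, config):
    # Step 210: store system information in the internal switch database.
    switch.database[device_id] = config
    switch.devices.add(device_id)
    # Step 220: replicate the stored information to the external database.
    external_db[device_id] = config


def fail_over(failed, candidates, external_db):
    # Step 230: move operations to a live switch, using the replicated data.
    live = [s for s in candidates if s.alive]
    target = min(live, key=lambda s: len(s.devices))
    for device_id in failed.devices:
        target.database[device_id] = external_db[device_id]
        target.devices.add(device_id)
    failed.devices = set()
    return target


first = Switch("switch-110")
second = Switch("switch-120")
third = Switch("switch-3", devices={"x", "y"})
external_db = second.database  # assumption: replica is held by the second switch

store_and_replicate(first, external_db, "a", {"extension": "2001"})
store_and_replicate(first, external_db, "b", {"extension": "2002"})

first.alive = False  # e.g. loss of keep-alive update has been detected
target = fail_over(first, [second, third], external_db)
```

Here the second switch serves no devices before the failure while the third serves two, so the devices of the failed first switch are attached to the second switch.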
Referring back to Fig. 1, it is appreciated that the apparatus of Fig. 1 may also
perform load balancing as is well known in the art. In the present invention, a
particular type of load balancing is possible in that an overloaded switch, such as a
switch which receives connection of a plurality of devices upon failure of another
switch, may cause load balancing to occur by failing to respond to keep-alive
messages for a period of time, such as a predetermined period of time or until load on
the overloaded switch drops to a predetermined load. Preferably, devices become
attached to other switches according to the load of the switch to which attachment is
made; this may also be true in the method of Fig. 2, in that failover may occur to the
least loaded switch.
Reference is now made to Fig. 3, which is a simplified flowchart illustration of a
preferred load balancing method of operation of the system of Fig. 1. The method of
Fig. 3 is self-explanatory in light of the above explanation.
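By way of non-limiting illustration only, the load balancing behavior described above may be sketched as follows. The sketch assumes that a switch considers itself overloaded when it serves more than a predetermined number of devices, and that a device whose keep-alive message goes unanswered fails over to another switch. For brevity the overloaded switch removes such a device from its own table at the moment it withholds the response. The names LoadSheddingSwitch, max_devices and handle_keep_alive are hypothetical.

```python
class LoadSheddingSwitch:
    """Switch that sheds load by not answering keep-alive messages."""

    def __init__(self, name, max_devices):
        self.name = name
        self.max_devices = max_devices  # predetermined load threshold
        self.devices = set()

    def is_overloaded(self):
        return len(self.devices) > self.max_devices

    def handle_keep_alive(self, device_id):
        """Return True if the keep-alive is answered, False if withheld."""
        if device_id in self.devices and self.is_overloaded():
            # Withhold the response; the device is expected to fail over,
            # so it is dropped from this switch's table (simplification).
            self.devices.discard(device_id)
            return False
        # Load is acceptable: respond normally.
        return True


overloaded = LoadSheddingSwitch("switch-120", max_devices=2)
overloaded.devices = {"d1", "d2", "d3", "d4"}
moved_to_other_switch = set()

for device_id in ["d1", "d2", "d3", "d4"]:
    if not overloaded.handle_keep_alive(device_id):
        # The device sees no response and attaches to another switch.
        moved_to_other_switch.add(device_id)
```

In this sketch the switch stops withholding responses as soon as its load has dropped to the threshold, which corresponds to determining whether a sufficient number of devices have failed over.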
Reference is now made to Fig. 4, which is a simplified block diagram illustration of
a preferred implementation of any of the telephony devices 140 of Fig. 1. The
components of the apparatus of Fig. 4 may be implemented in any suitable
combination of hardware and software, as is well known in the art.
The apparatus of Fig. 4 preferably comprises a digitized voice unit 240 and an
audio packetizer 250 operatively associated therewith. The digitized voice unit 240 is
preferably operative to receive and send audio from and to an associated end unit such as any one or more of the end units 150 of Fig. 1 on one side; to digitize the audio if
necessary; and to forward the digitized audio to or receive digitized audio from the audio packetizer 250. The audio packetizer 250 is preferably operative to packetize
received digitized audio and to send the packetized digitized audio over a network
such as the data network 130 of Fig. 1. The audio packetizer 250 is also preferably
operative to receive packets from the network and to deliver digitized audio to the
digitized voice unit 240.
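By way of non-limiting illustration only, the two directions of operation of an audio packetizer may be sketched as follows. The packet layout used here, a sequence number paired with a fixed-size payload, is an assumption of the sketch and is not specified by the present description; the function names and the payload size are likewise hypothetical.

```python
def packetize(digitized_audio: bytes, payload_size: int):
    """Split digitized audio into (sequence_number, payload) packets."""
    offsets = range(0, len(digitized_audio), payload_size)
    return [
        (seq, digitized_audio[offset:offset + payload_size])
        for seq, offset in enumerate(offsets)
    ]


def depacketize(packets):
    """Reassemble digitized audio from packets, ordering by sequence number."""
    ordered = sorted(packets, key=lambda packet: packet[0])
    return b"".join(payload for _, payload in ordered)


audio = bytes(range(10))
packets = packetize(audio, payload_size=4)

# Packets may arrive out of order over a data network; sequence numbers
# allow the receiving side to restore the original order.
received = list(reversed(packets))
restored = depacketize(received)
```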
The apparatus of Fig. 4 also preferably comprises the following components:
- telephony signaling apparatus 260;
- a state machine 270;
- an interface and failover layer 280; and
- an H.323 stack 290.
Preferably, the telephony signaling apparatus 260 is operatively associated with the
state machine 270, which is operatively associated with the interface and failover layer
280, which is operatively associated with the H.323 stack 290, which is operatively
associated with a network such as the data network 130 of Fig. 1. The units 260, 270,
280, and 290 preferably each operate bi-directionally, similarly to the digitized voice
unit 240 and the audio packetizer 250 as previously explained. For the sake of
simplicity of description only and without limiting the generality of the present
invention, the units 260, 270, 280, and 290 will be described below as if they operated
uni-directionally.
Telephony signaling apparatus 260 is preferably operative to receive telephony
signals, as are well known in the art, from the end unit associated therewith and to
report the received signals to the state machine 270. The state machine 270 is preferably operative to track calls and telephony states. The interface and failover
layer 280 is preferably operative to provide the state machine 270 with a convenient interface to the H.323 stack 290, and is responsible for connecting calls to a switch,
such as the first telephony switch 110 of Fig. 1. Preferably, failover functions of the
telephony device 140, as described above with reference to Figs. 1 - 3, are
implemented in the interface and failover layer 280, said implementation being
self-explanatory with reference to the above discussion of Figs. 1 - 3.
The H.323 stack 290 preferably comprises any appropriate standard H.323 stack, as
is well known in the art, the H.323 stack 290 being operatively associated with the
network.
It is appreciated that various features of the invention which are, for clarity,
described in the contexts of separate embodiments may also be provided in
combination in a single embodiment. Conversely, various features of the invention
which are, for brevity, described in the context of a single embodiment may also be
provided separately or in any suitable subcombination.
It will be appreciated by persons skilled in the art that the present invention is not
limited by what has been particularly shown and described hereinabove. Rather the
scope of the invention is defined only by the claims which follow: