SE515348C2

SE515348C2 - Processor redundancy in a distributed system

Info

Publication number: SE515348C2
Application number: SE9504396A
Authority: SE
Inventors: Lars Ulrik Jensen
Original assignee: Ericsson Telefon Ab L M
Priority date: 1995-12-08
Filing date: 1995-12-08
Publication date: 2001-07-16
Also published as: SE9703132A0; SE9703132L; WO1997022054A3; WO1997022054A2; SE9703132D0; AU1048897A; SE9504396D0; SE9504396L

Abstract

A method of automatically recovering from multiple permanent failures of processors in a distributed processor system, in particular a software-driven telecommunication system. The method involves the creation of an initial configuration describing each processor and the software objects executing on it and, for each processor, the creation of a catastrophe plan to be followed if that processor fails. A catastrophe plan contains information on how to redistribute the software objects executing on the faulty processor to the operating processors of the processor system. If a processor goes down, its software objects are transferred to operating processors following the catastrophe plan for the faulty processor. A hardware model and a software model of the processor system and its software are presented. A software object that has a hardware dependency is handled by the model.

Description

[...] between repairs is desirable and allows repairs to be carried out on working days (Monday to Friday) and during working hours (08.00 to 17.00), and also allows scheduled maintenance. These requirements mean that the telecommunication system must tolerate one processor going down and, before that processor has been repaired, a second processor going down as well, while remaining operational. In the worst case, before the first failed processors have been repaired, the system must also tolerate a third, a fourth and further processors going down, and must still be available despite the broken processors.

U.S. Patent No. 4,710,926 relates to a fault recovery method in a distributed processor system in which spare processors take over the functions of failed processors. A spare processor serves as a replacement for one or more active processors. Once a spare processor is put into operation, it no longer serves as a spare for any other processor in the system. During fault recovery, all functions executing on the failed processor are transferred to the spare processor. The old spare processor's role as a spare is transferred to a second spare processor in the system.

This known system thus requires two or more spare processors. While a spare processor is idle it performs no jobs. When activated it starts performing jobs, provided that it works, i.e. has no faults. Test programs must therefore be run to verify that the spare processors work. This, however, is not mentioned in the patent specification.

U.S. Patent No. 4,412,281 describes a distributed, fault-tolerant, reconfigurable signal processing system comprising many identical, interconnected subsystem elements that share the overall processing work. The subsystem elements are redundantly interconnected by dual buses. Some subsystem elements serve as spare elements and are ready to take over the jobs performed on a failed subsystem element. A spare subsystem element also has the task of checking all of its own functions to verify that they work. By periodically rotating spare and active subsystem elements, a distributed operating system ensures that all subsystem elements perform this self-check, so that even minor faults can be detected. Failed elements in the subsystem are taken out of service and replaced with a spare element without the system going down.

The signal processing system thus uses spare subsystem elements that do not take part in the overall processing tasks. The reconfiguration method used when a failed element is detected and replaced by a spare element relies on distinct connector addresses for each element in the system. A connector address is associated with a virtual address, which replaces the connector address when a fault condition is detected.

SUMMARY OF THE INVENTION

An object of the invention is to provide a method for automatic recovery from multiple permanent processor failures in a distributed processor system used in an application environment in a telecom system that has high availability requirements, while the telecom system must also permit system maintenance, whether scheduled or not.

Another object of the invention is to make use of available processor resources while permitting a heterogeneous processor environment, which arises partly from the introduction of new technology in a system that grows over time and partly from the individual requirements set by different parts of an application running on the processor system.

Yet another object of the invention is to provide a method for rapid recovery in the event that several processors fail in a distributed processor system used in a telecom system environment, by providing an initial configuration of all processors and by providing, for each processor in the system, a catastrophe plan to be used if the processor in question fails. A catastrophe plan is the means by which software objects installed on a failed processor are distributed to a number of processors in the system so that the load is shared between the processors.

Yet another object of the invention is to provide new catastrophe plans for the system of working processors, some of which may have software objects from a failed processor installed on them, in order to prepare the system for rapid recovery in the event that yet another processor in the system fails.

Yet another object of the invention is to provide a method of the above kind which returns the system to its original configuration of processors and software objects when the failed processor or processors are reinserted into the system after repair or replacement.

Yet another object of the invention is to provide a software model which allows software objects to be moved from a failed processor to a working processor by restarting the software objects on the working processor. The software model can also kill a software object installed on a processor and restart it on a repaired, previously failed, processor that has been put back into the system. The latter case arises primarily when the system returns to its initial configuration and the working processors hold objects that are to be handed back to the repaired processors.

In accordance with the invention, a model of the telecommunication system is created. The model comprises a hardware model of the controlling processors and of the controlled hardware equipment, and a software model that supports and fits into the hardware model of the telecommunication system.

In accordance with the invention, a first algorithm is used to calculate the catastrophe plans for each of the working processors, based either on the initial configuration or on any of the configurations that arise after yet another processor has failed.

In accordance with the invention, a second algorithm is used which, given a current configuration, calculates a delta configuration that, when applied to the current configuration, restores the initial configuration of the system.

In the software model used for the software objects, no configuration dependencies are encapsulated in the software objects, which means that the software objects can be moved from one processor to another in any processor configuration.

BRIEF DESCRIPTION OF THE DRAWINGS

An illustrative embodiment of the invention is described below with reference to the accompanying drawings, in which

Fig. 1 is a block diagram showing a distributed processor system in an initial configuration,
Fig. 2 is a block diagram showing the processor system of Fig. 1 in a current configuration after one of the processors has failed,
Fig. 3 is a block diagram of the processor system of Fig. 1 in a second current configuration after two processors have failed,
Fig. 4 is a flow chart showing the method in accordance with the invention,
Fig. 5 is a block diagram of a distributed processor system in which some of the processors control hardware equipment,
Fig. 6A is a schematic view of a modularized software object,
Fig. 6B-D are block diagrams showing three different types of software objects,
Fig. 7 is a block diagram showing how the hardware and software models in accordance with the invention fit together in a single model of the telecommunication system, and
Fig. 8 is a block diagram showing the hardware model in accordance with the invention.

DETAILED DESCRIPTION OF THE INVENTION

Fig. 1 shows a number of distributed processors P1, P2, P3 and P4 which communicate with each other over a network N1. The processors form part of a telecommunication network (not shown). The network N1 may form part of that telecommunication network. Each processor contains a processor unit PE and memory M. Software objects 1, 2, 3 ... 18 are installed on the processors: objects 1, 2, 3 on processor P1, objects 4-7 on processor P2, objects 8-12 on P3 and objects 13-18 on P4.

The software for an application running in the telecommunication network comprises software objects (Fig. 6B-D), which are contained in software modules (Fig. 6A). The modularized software objects are allocation-independent objects which can be moved freely between the processors. A modularized software object is independent of the other modularized software objects. A software object typically contains a process and persistent data. Persistent data is data that survives a restart of the software object. Software objects can communicate with each other. A job requested by an application typically involves many software objects on different processors and is executed by some or all of the processes. The application does not know how the software objects are distributed over the processors. Modularized persistent data can be stored in a database. Just as the processor system is distributed, the database can also be distributed over a number of memories M on many processors, preferably over the memories in all processors P1, P2, P3 and P4. These database partitions are designated DB1, DB2, DB3 and DB4 and comprise a working memory (RAM). From an application's point of view, the distributed nature of the database is transparent. Persistent data must be stored safely. This can be achieved with conventional backup techniques, for example by storing the data on a disk.

A new and preferred alternative, however, is to store a mirror copy of each modularized software object in a database partition on a processor other than the one on which the object is installed. More specifically, the mirror copy of each modularized software object is stored in the database partition of the processor designated in the catastrophe plan for the processor on which the original modularized software object executes. In this way, copies of the modularized persistent data are kept safely on another processor in case the processor holding the original fails.
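The mirror placement rule can be illustrated with a minimal Python sketch. It assumes that a catastrophe plan is held as a mapping from object number to target processor, and that partition DBn resides in the memory of processor Pn; both the data layout and the function name `mirror_partitions` are hypothetical and chosen only for illustration. The plan for P1 uses the targets given for the P1 failure example in the description.

```python
# Hypothetical layout: one catastrophe plan per processor, mapping each
# object installed on that processor to the processor that takes it over.
catastrophe_plans = {
    "P1": {1: "P2", 2: "P4", 3: "P2"},
}


def mirror_partitions(processor, plans):
    """Return {object: partition} telling where each mirror copy is kept.

    Assumption: database partition DBn lives in the memory of processor Pn,
    so the mirror of an object goes to the partition of its takeover target.
    """
    return {obj: "DB" + target[1:] for obj, target in plans[processor].items()}


placement = mirror_partitions("P1", catastrophe_plans)
```

With this placement the persistent data of an object is already present on the processor that will restart the object if P1 fails.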

The way in which objects 1-18 are distributed over the processors P1-P4 is called the initial configuration of the distributed processor system. It is shown in the attached Table 1, from which it can be seen that objects 1, 2 and 3 are installed on processor P1.

The initial configuration must not be lost if a processor fails. For this reason the initial configuration and its mirror copy are stored in the manner described above. Instead of implementing the initial configuration as a table, it can be implemented as so-called pairs. For example, the pairs (1,1), (1,2), (1,3) correspond to the information given in the first row of Table 1.

For the recovery time of the processor system, and therefore also of the telecommunication system, to be short when a processor fails, there must be a catastrophe plan for the processor that fails. Since it is not possible to predict which processor will fail, there must be a catastrophe plan for each processor. A catastrophe plan contains directives stating the processors to which the software objects on the failed processor are to be moved. A catastrophe plan, shown in Fig. 2, designates the processors to which the objects installed on processor P1 are to be moved in the event that processor P1 fails. Another catastrophe plan states where the objects installed on processor P2 are to be moved if P2 fails. Similarly, there is a catastrophe plan to be followed if processor P3 fails. This catastrophe plan is shown in Table 4. Finally, there is a catastrophe plan, Table 5, which handles the case in which processor P4 fails.
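As an illustration of the pair representation, the following Python sketch converts the initial configuration between a table form and a list of (processor, object) pairs. The dictionary layout, the string names "P1" to "P4" and the two helper functions are assumptions made for this example; only the object distribution and the pairs (1,1), (1,2), (1,3) come from the description.

```python
# Initial configuration as a table: processor -> installed objects
# (distribution taken from the description of Fig. 1).
initial_table = {
    "P1": [1, 2, 3],
    "P2": [4, 5, 6, 7],
    "P3": [8, 9, 10, 11, 12],
    "P4": [13, 14, 15, 16, 17, 18],
}


def table_to_pairs(table):
    """Flatten the table into sorted (processor number, object number) pairs."""
    return sorted((int(proc[1:]), obj) for proc, objs in table.items() for obj in objs)


def pairs_to_table(pairs):
    """Rebuild the table form from (processor number, object number) pairs."""
    table = {}
    for proc_no, obj in pairs:
        table.setdefault("P%d" % proc_no, []).append(obj)
    return table


pairs = table_to_pairs(initial_table)
```

The first three pairs are (1, 1), (1, 2) and (1, 3), matching the first row of the table, and `pairs_to_table(pairs)` gives back the table form.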

Processor P1 fails

Assume that processor P1 fails. The processor system must recover quickly from the failure, and the software objects 1, 2, 3 installed on processor P1 are therefore to be moved to processors P2 and P4 in accordance with the catastrophe plan for P1.

More specifically, software objects 1 and 3 are to be moved to processor P2 and software object 2 is to be moved to processor P4. This is achieved by removing software objects 1-3 from processor P1, creating and starting software objects 1 and 3 on processor P2, and creating and starting software object 2 on processor P4. This of course requires that execution capacity is available on processors P2 and P4. If such capacity is not available, the system cannot recover from the processor failure. In the following it is assumed that the necessary processor capacity is available and that the system thus recovers from the failure. The system will now have the configuration shown in Fig. 2 and Table 6. Starting from this configuration, which is now called the current configuration, new catastrophe plans must be created so that the system can recover quickly if any of the remaining processors fails. For this purpose, new catastrophe plans are produced which state the processors to which the software objects installed on a failed processor are to be moved. Since it is not possible to predict which of the three operational processors P2-P4 will fail, catastrophe plans must be created for each of them. Table 8 is the new catastrophe plan (CP-P2') for processor P2, Table 9 the new catastrophe plan (CP-P3') for processor P3 and Table 10 the new catastrophe plan (CP-P4') for processor P4.

Like the original catastrophe plans, the new catastrophe plans and their mirror copies are stored in the manner described above.
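The failover step for P1 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function `apply_catastrophe_plan`, the dictionary layout and the `capacity` limit (a maximum number of objects per processor, standing in for "execution capacity") are assumptions introduced for the example.

```python
def apply_catastrophe_plan(config, failed, plan, capacity):
    """Return the configuration obtained after `failed` goes down.

    config:   processor -> list of installed objects
    plan:     object -> target processor (catastrophe plan of `failed`)
    capacity: processor -> max number of objects (hypothetical capacity model)
    """
    # The failed processor and its objects disappear from the configuration.
    new_config = {p: list(objs) for p, objs in config.items() if p != failed}
    for obj in config[failed]:
        target = plan[obj]
        if len(new_config[target]) >= capacity[target]:
            # Without spare capacity the system cannot recover.
            raise RuntimeError("no capacity on %s for object %d" % (target, obj))
        # Create and start the object on its designated target.
        new_config[target].append(obj)
    return new_config


initial = {
    "P1": [1, 2, 3],
    "P2": [4, 5, 6, 7],
    "P3": [8, 9, 10, 11, 12],
    "P4": [13, 14, 15, 16, 17, 18],
}
plan_p1 = {1: "P2", 2: "P4", 3: "P2"}        # targets stated in the text
capacity = {"P2": 8, "P3": 8, "P4": 8}       # assumed values

current = apply_catastrophe_plan(initial, "P1", plan_p1, capacity)
```

Here `current` places objects 1 and 3 on P2 and object 2 on P4, while P3 is unchanged. Lowering the assumed capacity of P4 to 6 makes the call raise, which corresponds to the case where the system cannot recover.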

It should be noted that producing new catastrophe plans for the current configuration saves memory compared with the following theoretically conceivable scheme. Assume that the system consists of four processors and is designed to tolerate the failure of two processors. Since it is not possible to predict which processor will fail first and which will fail second, one would have to create as many catastrophe plans as there are ways of selecting two elements from a group of four elements. These catastrophe plans would have to be stored in the database. Thus, twenty-four catastrophe plans would have to be created and stored. This storage requires a great deal of memory space. The memory space grows faster than exponentially with the number of processor failures the system is to tolerate.
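By contrast, recomputing plans for the current configuration means that only one plan per surviving processor is held at any time. The Python sketch below shows one way this could look. The description does not disclose the plan-calculation algorithm, so the greedy "least-loaded processor first" rule used here is purely an assumption, as are the function name and data layout; it does not reproduce Tables 8-10.

```python
def compute_catastrophe_plans(config):
    """Return one catastrophe plan per processor in `config`.

    Assumed heuristic (not taken from the patent): each object of the
    hypothetically failing processor goes to the surviving processor that
    currently holds the fewest objects, so that load is shared.
    """
    plans = {}
    for failing, objs in config.items():
        # Object counts on the processors that would survive this failure.
        load = {p: len(o) for p, o in config.items() if p != failing}
        plan = {}
        for obj in objs:
            # Ties are broken by processor name to keep the result deterministic.
            target = min(load, key=lambda p: (load[p], p))
            plan[obj] = target
            load[target] += 1
        plans[failing] = plan
    return plans


# Current configuration after P1 has failed and its objects were moved.
current = {
    "P2": [4, 5, 6, 7, 1, 3],
    "P3": [8, 9, 10, 11, 12],
    "P4": [13, 14, 15, 16, 17, 18, 2],
}
new_plans = compute_catastrophe_plans(current)
```

Only three plans are produced for the three surviving processors, and each plan covers exactly the objects currently installed on its processor.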

During the period from the moment the system has recovered until new catastrophe plans have been prepared and stored in the database, the system is vulnerable. No processor may fail during this period. Should a processor fail during this period, the system will not be available.

When a processor P1, after having been removed and repaired, is reinstalled in the system, the system shall return to the initial configuration. One way of doing this is to kill all software objects in the current configuration and then create and start all software objects on the processors of the system. In the proposed embodiment of the invention, however, only the objects that were moved away from the first processor and now execute on other processors are killed first. These are then recreated and started on processor P1.

To find these objects, a delta configuration table is created by subtracting the initial configuration from the current configuration, excluding processor P1. Subtracting Table 1 from Table 6 yields the delta configuration shown in Fig. 7. The row that refers to the failed processor P1 is not included in the subtraction. The delta configuration shows that objects 1 and 3 on processor P2 and object 2 on processor P4 are to be killed on their respective processors. They are then recreated and started on the repaired processor P1. Once the objects have been recreated, the system runs in the same way as it did in the initial configuration, and the system's recovery time is short.
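The subtraction described above can be sketched in Python as a set difference per processor. This is only an illustration: the function name `delta_configuration` and the dictionary layout are hypothetical, and the placement of objects 8-18 on P3 and P4 is an assumption, since the text only states where objects 1-7 reside.

```python
def delta_configuration(initial, current, repaired):
    """Return {processor: objects to kill there} by subtracting the initial
    configuration from the current one, skipping the repaired processors."""
    delta = {}
    for proc, objects in current.items():
        if proc in repaired:
            continue  # the row of the failed processor is not part of the subtraction
        extra = set(objects) - set(initial.get(proc, ()))
        if extra:
            delta[proc] = sorted(extra)
    return delta


# Initial configuration: objects 1-7 as in the description, 8-18 assumed.
initial = {
    "P1": [1, 2, 3],
    "P2": [4, 5, 6, 7],
    "P3": [8, 9, 10, 11, 12],
    "P4": [13, 14, 15, 16, 17, 18],
}
# Current configuration after P1 failed: objects 1 and 3 run on P2, object 2 on P4.
current = {
    "P1": [],
    "P2": [1, 3, 4, 5, 6, 7],
    "P3": [8, 9, 10, 11, 12],
    "P4": [2, 13, 14, 15, 16, 17, 18],
}

print(delta_configuration(initial, current, {"P1"}))  # {'P2': [1, 3], 'P4': [2]}
```

The objects listed in the result are killed where they currently run; the initial configuration then says on which repaired processor each of them is recreated.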

Processor P1 fails and then processor P2 fails.

In the following example it is assumed that processor P1 fails and that processor P2 fails thereafter. Following on from the previous example, it is first assumed that the system is in operation with the configuration of Fig. 1, that catastrophe plans have been created for each of the processors P1-P4, that processor P1 fails, that the software objects on processor P1 are moved to the operational processors in accordance with the catastrophe plan in Table 2, that the system recovers and is in operation, that new catastrophe plans are created for processors P2, P3 and P4, and that processor P1 is removed and sent for repair. It is now assumed that processor P2 fails. The new catastrophe plan associated with processor P2, i.e. the new catastrophe plan in Table 8, must therefore be followed. According to this plan, objects 1, 3 and 4 are to be moved to processor P3 and objects 5-7 are to be moved to processor P4. In the same way as before, the software objects on processor P2 are removed and moved to processors P3 and P4. The system is now operational and runs with the configuration shown in Fig. 3 and Table 11. In accordance with the invention, new catastrophe plans must now be created for each of the processors P3 and P4.
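Following a catastrophe plan amounts to removing the failed processor from the configuration and recreating its objects on the targets the plan names. The Python sketch below replays the P2 failure from this example. The function name and data layout are hypothetical, the placement of objects 8-18 is assumed, and the sketch assumes that the plan covers every object of the failed processor.

```python
def apply_catastrophe_plan(config, failed, plan):
    """Return the new configuration after `failed` goes down.

    `plan` maps each target processor to the objects recreated on it."""
    lost = set(config[failed])
    planned = {obj for objs in plan.values() for obj in objs}
    if planned != lost:
        raise ValueError("plan must cover exactly the objects of the failed processor")
    # The failed processor disappears from the configuration.
    new_config = {p: list(objs) for p, objs in config.items() if p != failed}
    for target, objs in plan.items():
        if target not in new_config:
            raise ValueError(f"{target} is not an operational processor")
        new_config[target] = sorted(new_config[target] + list(objs))
    return new_config


# Configuration after P1 failed and was removed (objects 8-18 are assumed).
after_p1_failure = {
    "P2": [1, 3, 4, 5, 6, 7],
    "P3": [8, 9, 10, 11, 12],
    "P4": [2, 13, 14, 15, 16, 17, 18],
}
# Plan for P2 as described for Table 8: objects 1, 3, 4 to P3 and 5-7 to P4.
plan_for_p2 = {"P3": [1, 3, 4], "P4": [5, 6, 7]}

after_p2_failure = apply_catastrophe_plan(after_p1_failure, "P2", plan_for_p2)
print(after_p2_failure["P3"])  # [1, 3, 4, 8, 9, 10, 11, 12]
print(after_p2_failure["P4"])  # [2, 5, 6, 7, 13, 14, 15, 16, 17, 18]
```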

During the period from the failure of processor P2 until new catastrophe plans for processors P3 and P4 have been created and stored in the system, the system is vulnerable. If either P3 or P4 fails during this period, the system will be unavailable. It is now assumed that this does not happen. Instead, the system is in operation with the configuration shown in Fig. 3 and Table 11.

Processor P2 is now removed from the system and sent for repair. It is then assumed that processors P1 and P2 are repaired and put back into the system. It is now necessary to kill objects 1-7 on processors P3 and P4 and to recreate and start them on their respective original processors P1 and P2. Using a delta configuration obtained by subtracting the initial configuration from the current configuration in Fig. 1, this time excluding the failed processors P1 and P2, the objects that must be killed are identified; in this case objects 1-7. More specifically, objects 1, 3 and 4 on processor P3 and objects 2, 5, 6 and 7 on processor P4 are to be moved. The way in which these objects are to be distributed over processors P1 and P2 is given by the initial configuration table. Thus objects 1, 2 and 3 are created and started on processor P1 and objects 4-7 are created and started on processor P2. The system has now recovered from the reconnection of the two repaired processors and is assumed to be in operation.

In the above examples a processor system with four processors has been described. The invention is equally applicable to processor systems containing two, three, five or more processors. The last example described a processor system that tolerated two processor failures. The method of the invention is equally applicable to processor systems that tolerate three or more processor failures. Each time a processor fails, its software objects are moved to other operational processors in the system and new catastrophe plans are created for the operational processors. The last example illustrates that a processor system consisting of four processors can be in operation with 50% of its processors failed. The application will still execute, albeit with reduced capacity. If the processor system is a switch in a telephone exchange, the telephone traffic will still be running, and blocking will occur at a lower traffic volume. This is a new and unique feature that is not found in any of the above-mentioned US patents and, as far as the applicant is aware, no one has previously been able to achieve it.

A first algorithm is used to create catastrophe plans from the initial configuration in the event that a first processor fails, or from the current configuration in the event that a further processor fails. The first algorithm takes parameters relating to the capacity of a processor, parameters relating to the size of the memory of a processor, parameters relating to how much processor capacity (machine cycles per process to be executed) and memory the individual moved objects require, and parameters relating to the quality of service.
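The description names the inputs of the first algorithm but not its internals. The following Python sketch is therefore only one possible heuristic, not the patented algorithm: a greedy placement that moves the most demanding objects first and always picks the operational processor with the most spare processing capacity that also has enough free memory. All names, the units and the figures in the example are hypothetical, and the quality-of-service parameter is left out.

```python
from dataclasses import dataclass


@dataclass
class Processor:
    name: str
    cpu: int   # processing capacity, arbitrary units (assumed)
    mem: int   # memory size, arbitrary units (assumed)


@dataclass
class SoftwareObject:
    ident: int
    cpu: int   # processing capacity the object requires
    mem: int   # memory the object requires


def create_catastrophe_plan(failed, processors, placement, objects):
    """Greedy sketch: return {target processor: [object ids]} for `failed`."""
    free = {}
    for p in processors:
        if p.name == failed:
            continue
        used_cpu = sum(objects[i].cpu for i in placement[p.name])
        used_mem = sum(objects[i].mem for i in placement[p.name])
        free[p.name] = [p.cpu - used_cpu, p.mem - used_mem]

    plan = {}
    # Place the most demanding objects first.
    for ident in sorted(placement[failed], key=lambda i: objects[i].cpu, reverse=True):
        obj = objects[ident]
        candidates = [n for n, (c, m) in free.items() if c >= obj.cpu and m >= obj.mem]
        if not candidates:
            raise RuntimeError(f"no operational processor can host object {ident}")
        target = max(candidates, key=lambda n: free[n][0])
        free[target][0] -= obj.cpu
        free[target][1] -= obj.mem
        plan.setdefault(target, []).append(ident)
    return plan


processors = [Processor("P1", 100, 100), Processor("P2", 100, 100), Processor("P3", 100, 100)]
objects = {
    1: SoftwareObject(1, 30, 10),
    2: SoftwareObject(2, 20, 10),
    3: SoftwareObject(3, 10, 10),
    4: SoftwareObject(4, 40, 20),
    5: SoftwareObject(5, 20, 20),
}
placement = {"P1": [1, 2, 3], "P2": [4], "P3": [5]}

print(create_catastrophe_plan("P1", processors, placement, objects))
```

Calling the same function for every processor of the current configuration gives one plan per processor, which matches the requirement that there be as many catastrophe plans as processors.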

A second algorithm is used to return the system to its initial configuration. This second algorithm has already been described above and was referred to as the delta configuration.

Various methods can be used to detect a failed processor, for example the "heartbeat method" in accordance with the above-mentioned U.S. Patent 4,710,926. In a typical telecom network, however, a preferred method is to monitor the links that connect the processors to each other in the network N1. In the two examples described above, the software objects installed on a failed processor were moved to two processors. The objects of the failed processor can of course also be distributed over three or more processors in the system. In exceptional cases all software objects can be moved to a single processor, namely when the system consists of two operational processors and one of them fails.
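As a rough illustration of heartbeat-style detection in general (not of the method of the cited patent, whose details are not given here), a supervisor can record when each processor was last heard from and treat a processor as failed once it has been silent for longer than a timeout. The function name, the timestamps and the timeout below are hypothetical.

```python
def find_silent_processors(last_heartbeat, now, timeout):
    """Return the processors whose last heartbeat is older than `timeout` seconds."""
    return sorted(p for p, seen in last_heartbeat.items() if now - seen > timeout)


# Time (in seconds) at which each processor was last heard from; assumed values.
last_heartbeat = {"P1": 10.0, "P2": 14.5, "P3": 14.8, "P4": 14.9}

print(find_silent_processors(last_heartbeat, now=15.0, timeout=2.0))  # ['P1']
```

Link monitoring, the preferred method in the text, could feed the same decision: the supervisor would record link-state changes instead of heartbeat timestamps.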

When the system is in operation, all processors are operational and have more capacity and more memory than they need to perform their jobs at the rated capacity; that is, they have capacity and memory to spare for taking over objects from one or more processors and for taking over application jobs that are being executed. This ensures that all processors are operational, and no test program needs to be run to verify it. Furthermore, the capacity of the system exceeds the rated capacity, which means that the system has "reserve capacity" that can be used to handle additional application jobs; there is no dead "reserve" capacity. For the application, a failed processor only reduces the capacity of the system; the failure does not kill the system.

As an alternative to dimensioning the system with an active "reserve" capacity that can be used to handle additional application jobs, the system can be designed with no active "reserve" capacity and with all processors working at their dimensioned capacity. When a processor fails, the system then operates at reduced capacity.

Fig. 4 is a flow chart showing the method steps performed in accordance with the invention. There is always an initial configuration in which all modularized software objects are mapped onto individual processors. The initial configuration is created by the system vendor or the system operator and is stored in the system. This is indicated in box 20. Catastrophe plans are then created in accordance with the first algorithm. There shall be as many catastrophe plans as there are processors in the system. Mirror copies of persistent database objects are also created. This is indicated in box 21. In a preferred embodiment each processor creates its own catastrophe plan, i.e. the plan that the system is to use in the event that this processor fails. This ensures that the work of creating the catastrophe plans is fully distributed. A processor then fails, box 22. The software objects on the failed processor are moved to operational processors using the catastrophe plan for the failed processor. To "move" objects means that new copies of the software objects of the failed processor are created and started on the processors to which they are to be moved in accordance with the catastrophe plan. This is indicated in box 23. Box 23 thus represents the system's recovery from the failed processor.

The system is now in operation and a new configuration, referred to as the current configuration, arises. The current configuration is also stored in a memory of a distributed processor. While the system is in operation, new catastrophe plans are created for the operational processors, box 24. The system has now regained its ability to withstand a new processor failure. Mirror copies of the new catastrophe plans are also stored in the database. If a further processor fails, the method returns to operation 22, as shown by arrow 25. The failed processor or processors are then repaired and put back into the system, box 26. If two or more processors have failed, it is assumed that they are repaired and put back into the system at the same time. In theory it is of course possible to repair failed processors one at a time and to put them back into the system one at a time. From a practical point of view, however, this is a detour. The last step of the process, box 27, is to return the system to its initial configuration using the second algorithm.
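The flow of Fig. 4 can be read as a loop: boxes 20 and 21 run once, boxes 22-24 repeat for every failure (arrow 25), and boxes 26 and 27 close the cycle. The Python sketch below traces that control flow only. The round-robin `make_plans` is a stand-in for the first algorithm, and box 27 is modelled as simply reloading the initial configuration rather than performing the delta step; both are simplifying assumptions, as are all names.

```python
def make_plans(config):
    """Stand-in for the first algorithm: one plan per processor, spreading
    its objects round-robin over the other operational processors."""
    plans = {}
    for proc, objs in config.items():
        others = [p for p in config if p != proc]
        plan = {p: [] for p in others}
        for i, obj in enumerate(objs):
            if others:
                plan[others[i % len(others)]].append(obj)
        plans[proc] = plan
    return plans


def run_lifecycle(initial, failures):
    """Return (boxes visited, degraded configuration, restored configuration)."""
    trace = [20]                                   # box 20: initial configuration stored
    current = {p: list(objs) for p, objs in initial.items()}
    plans = make_plans(current)
    trace.append(21)                               # box 21: one plan per processor
    for failed in failures:
        trace.append(22)                           # box 22: a processor fails
        plan = plans[failed]
        del current[failed]
        for target, objs in plan.items():
            current[target].extend(objs)
        trace.append(23)                           # box 23: objects moved, system recovers
        plans = make_plans(current)
        trace.append(24)                           # box 24: new plans for the survivors
    degraded = {p: sorted(objs) for p, objs in current.items()}
    trace.append(26)                               # box 26: repaired processors reinserted
    restored = {p: list(objs) for p, objs in initial.items()}
    trace.append(27)                               # box 27: back to the initial configuration
    return trace, degraded, restored


initial = {"P1": [1, 2, 3], "P2": [4, 5, 6, 7], "P3": [8], "P4": [9]}
trace, degraded, restored = run_lifecycle(initial, ["P1", "P2"])
print(trace)  # [20, 21, 22, 23, 24, 22, 23, 24, 26, 27]
```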

In the examples described above it has been assumed that the software objects 1-18 do not control any hardware equipment. Examples of hardware equipment controlled by software modules are I/O devices, subscriber line interfaces, subscriber line processors, tone decoders, voice announcement equipment and conference equipment. Hardware dependencies of this kind place restrictions on the software modules. A software module that is involved in controlling hardware equipment connected to one or more processors cannot be moved to an arbitrary processor in the system; it must be moved to a processor that has access to this hardware equipment. Catastrophe plans must be created with this in mind.

A telecom system can usually continue to operate despite the loss of certain devices, but the services that the telecom system provides may be impaired.

As far as possible, a catastrophe plan must therefore allocate device-controlling software objects to processors that have access to the controlled devices. If this is not possible, the modularized software objects that control the devices must be excluded from the catastrophe plan. An excluded software object is always of the type shown in Fig. 6C.

Fig. 6B illustrates a software object that contains a function part (execution part) and a part with persistent data (persistent data part). The software object in Fig. 6C contains the function part of the software object in Fig. 6B, and the software object in Fig. 6D contains the persistent data part of the same software object. The software objects in Figs. 6C and 6D together form a pair and have the same key. The key is the logical address in the database of the software object shown in Fig. 6B, and it is therefore also the logical address in the database of the two software objects in Figs. 6C and 6D. It is the software object in Fig. 6C that controls a device, and it is in this software object that the hardware dependency lies.

An excluded software object in a current configuration must be blocked, i.e. other software objects must not be able to communicate with it. In a preferred embodiment, blocking is achieved by putting the persistent data part of the software object, Fig. 6D, in a blocked state. A blocked state is marked by setting a flag in the software object. It should be noted that it is not the device-controlling software object of Fig. 6C that is blocked but its persistent twin software object of Fig. 6D. By examining the state of a software object before starting to communicate with it, the application can handle blocked devices in an orderly manner.
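A minimal Python sketch of this flag-based blocking is given below. It assumes a simple in-memory stand-in for the database, keyed by the logical address shared by the 6C/6D pair; the class names, the key strings and the `send` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PersistentPart:
    """Persistent twin (the Fig. 6D role) of a device-controlling object."""
    key: str               # logical address shared with the function part
    blocked: bool = False  # flag marking the blocked state


class Database:
    """In-memory stand-in for the database holding the persistent parts."""

    def __init__(self):
        self._rows = {}

    def store(self, part):
        self._rows[part.key] = part

    def lookup(self, key):
        return self._rows[key]


def block_excluded(db, excluded_keys):
    """Mark every object excluded from the catastrophe plan as blocked."""
    for key in excluded_keys:
        db.lookup(key).blocked = True


def send(db, key, message, deliver):
    """Check the state first; deliver only if the target is not blocked."""
    if db.lookup(key).blocked:
        return False       # the caller can handle the blocked device in an orderly way
    deliver(key, message)
    return True


db = Database()
db.store(PersistentPart("object-1"))   # controls a device that became unreachable
db.store(PersistentPart("object-2"))
block_excluded(db, ["object-1"])

delivered = []
print(send(db, "object-1", "ping", lambda k, m: delivered.append((k, m))))  # False
print(send(db, "object-2", "ping", lambda k, m: delivered.append((k, m))))  # True
```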

As an alternative to blocking a software object by setting a flag in its database representation, the software object can be blocked in the address tables held in the operating systems of the respective processors. The operating system of a processor has an address table for its own software objects. The address tables are used when messages are sent to the operating system's software objects.

The concept of quality of service includes the fact that the processors of a processor system can be of two kinds: fault-tolerant processors (FTP processors) and non-FTP processors.

An FTP processor, which usually comprises duplicated processors, continues to execute its jobs even if a single hardware fault occurs in the hardware that the FTP processor controls. When the fault occurs an alarm is triggered, but the program does not crash. With the aid of means that are not described here, since they form no part of this invention, it is possible to take an FTP processor out of service for repair in a controlled manner using the catastrophe plans. The services provided by the application will not be interrupted. In a telecom system, for example, no traffic disturbances will occur and calls in progress will not be disconnected. System recovery following the removal of an FTP processor will thus not interrupt the services delivered. By combining the present invention with the FTP processors described above, it is possible to achieve very high system availability as well as very high service reliability when hardware faults occur. In short, the purpose of the quality-of-service parameter is to support FTP processors for software that requires very high service reliability. An FTP processor will thus mask an internal hardware fault, but the FTP processor must be repaired before further internal hardware faults occur.

Fig. 5 shows a processor system similar to that of Fig. 1, with a network N1 to which the processors P1-P4 have access and over which they can communicate with each other. Although not shown in Fig. 5, the software objects 1-18 are assumed to be distributed over the processors P1-P4 in the same way as in Fig. 1. A device D1 is shown connected to the processor P1. There is also a second network N2 to which the processors P2 and P4 have access. A device processor D6 connects the devices D2 and D3 to the network N2 and thus controls them. Another device processor D7 connects the devices D4 and D5 to the network N2 and thus controls them. There are software objects that connect the devices D1, D2, D3 ... D7 to the processor system. As an example, the software object 1 is assumed to control the device D1. If P1 fails, none of the processors P2-P4 can control the device D1, so the object 1 cannot be transferred to any of them. For the devices D2-D7, however, software objects that control one or more of these devices must be installed on a processor that can communicate with them. The devices D2-D5 can be controlled via the network N2, so software objects that control one or more of these devices can be installed on either of the processors P2 and P4.

Processors P1 and P3, on the other hand, cannot be used. In summary, a software object that connects hardware to the system must be installed on a processor that has access to that hardware.
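The access rule can be illustrated with a minimal sketch. The dictionaries below are a hypothetical encoding of Fig. 5 and the function name `eligible_processors` is an assumption made for illustration; neither comes from the patent itself.

```python
# Hypothetical encoding of Fig. 5: the processors that can reach each network.
NETWORK_ACCESS = {
    "N1": {"P1", "P2", "P3", "P4"},
    "N2": {"P2", "P4"},
}

# Processors from which each device can be controlled (assumed from the text):
# D1 is attached directly to P1, while D2-D7 are reached through network N2.
DEVICE_REACHABLE_FROM = {
    "D1": {"P1"},
    "D2": NETWORK_ACCESS["N2"],
    "D3": NETWORK_ACCESS["N2"],
    "D4": NETWORK_ACCESS["N2"],
    "D5": NETWORK_ACCESS["N2"],
    "D6": NETWORK_ACCESS["N2"],
    "D7": NETWORK_ACCESS["N2"],
}


def eligible_processors(controlled_devices, operational):
    """Return the operational processors that can reach every listed device.

    An object that controls no device may run on any operational processor.
    """
    candidates = set(operational)
    for device in controlled_devices:
        candidates &= DEVICE_REACHABLE_FROM[device]
    return sorted(candidates)


all_up = {"P1", "P2", "P3", "P4"}
print(eligible_processors(["D2", "D3"], all_up))        # objects controlling D2 and D3
print(eligible_processors(["D1"], all_up - {"P1"}))     # object 1 after P1 has failed
```

With P1 down, the object controlling D1 has no eligible processor, which is why it is blocked rather than moved.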

A typical example of a device of the D1 type is the processor P1 itself. Another example is a hardware device such as a UART. If the processor P1 fails, or if the hardware device that P1 controls fails, neither the software object representing P1 nor a software object representing the failed hardware can be transferred to any of the processors P2-P4, because these cannot gain control of the failed hardware or of the processor P1. Nevertheless, the processor system must tolerate a failure of P1 if the system is to be redundant. If the software object representing the processor P1 fails, it must be blocked so that no other software object can access it.

Consider Fig. 1. Each of the processors P1-P4 has an object that describes the processor. More specifically, object 1 represents the processor P1, object 4 the processor P2, object 8 the processor P3 and object 13 the processor P4. All these hardware dependencies are described exactly by the hardware model on which the software model in Fig. 7 is installed. The model thus shows that none of the objects 1, 4, 8 and 13 can be transferred to any other processor in the system. The first algorithm works on the hardware model with installed software and will therefore take all hardware dependencies into account when it creates new disaster plans. The disaster plan of processor 1, shown in Table 2, will therefore remain the same except that object 1 disappears. In the disaster plan for processor 2 (Table 3), object 4 will disappear, and in the disaster plan for processor 3, object 8 will disappear. In the disaster plan for the processor P4, object 13 will disappear. Fewer objects than described previously will thus remain on the respective processors when a processor fails.

If, for example, the processor P1 fails, object 1 must be blocked against access from other software objects. This means that no other software object is allowed to communicate with software object 1.
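As an illustration only, the following sketch builds a disaster plan that leaves out the objects representing processors and lists them for blocking instead. The round-robin placement and the names `make_disaster_plan` and `blocked_objects` are assumptions; the patent's first algorithm is not specified in this passage, and this sketch does not reproduce the placements of Tables 2-5.

```python
from itertools import cycle

# Initial configuration (Table 1): processor id -> object ids.
INITIAL = {
    1: [1, 2, 3],
    2: [4, 5, 6, 7],
    3: [8, 9, 10, 11, 12],
    4: [13, 14, 15, 16, 17, 18],
}

# Objects that represent the processors P1-P4 and therefore cannot be moved.
PINNED = {1, 4, 8, 13}


def make_disaster_plan(failing, config, pinned):
    """Map each movable object on `failing` to a surviving processor.

    Round-robin placement is an assumption made for this sketch.
    """
    survivors = sorted(p for p in config if p != failing)
    if not survivors:
        raise ValueError("no surviving processor to receive objects")
    targets = cycle(survivors)
    return {obj: next(targets) for obj in config[failing] if obj not in pinned}


def blocked_objects(failing, config, pinned):
    """Objects on the failing processor that must be blocked rather than moved."""
    return [obj for obj in config[failing] if obj in pinned]


print(make_disaster_plan(1, INITIAL, PINNED))   # object 1 is absent from the plan
print(blocked_objects(1, INITIAL, PINNED))      # object 1 is blocked instead
```

A fuller version would also consult the hardware model for objects that control devices on N2, not only the objects that represent processors.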

The model according to the invention will always pretend that the system is in operation, even if hardware equipment breaks down and cannot be controlled by its software object or objects. If, however, it turns out at a higher level of the system that the system is not in operation and cannot even deliver services of reduced quality, the cause lies in insufficient redundancy, and the model cannot change this fact.

Fig. 6A shows the software model of the modularized software object. The software model consists of a class named configObj, which has the operations construct() and destruct().

construct() and destruct() are used to create and kill, respectively, an individual modular software object. The modular software object has been described above in connection with Figs. 6B-6D.
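A minimal sketch of such a class might look as follows. Only the operation names construct() and destruct() come from the text; the `processor` parameter, the attributes and the `move` helper are assumptions added for illustration.

```python
class ConfigObj:
    """Hypothetical stand-in for the configObj class of Fig. 6A."""

    def __init__(self, object_id):
        self.object_id = object_id
        self.processor = None  # None while the object is not running anywhere

    def construct(self, processor):
        """Create and start the software object on the given processor."""
        if self.processor is not None:
            raise RuntimeError(f"object {self.object_id} is already running")
        self.processor = processor

    def destruct(self):
        """Kill the software object wherever it is currently running."""
        self.processor = None


def move(obj, target):
    """Relocate an object by killing it and creating it again elsewhere."""
    obj.destruct()
    obj.construct(target)


obj2 = ConfigObj(2)
obj2.construct("P1")
move(obj2, "P4")       # object 2 now runs on P4
print(obj2.processor)
```

Giving every object the same two operations lets the recovery logic move any object without knowing what the object does.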

The hardware model of the system shown in Fig. 1 is described with reference to Fig. 8. The hardware model 30 describes the physical devices and their locations in the system. It comprises a class Processor 31, which generates objects representing the processors P1-P4; a class CPPool 32, which generates objects representing the network N1; a class DevX 33, which generates objects representing the network N2; and a class ConfigObj 34, which generates objects representing the software objects, shown in Fig. 6B, of the above-mentioned devices, including the device processors. The class ConfigObj 35 generates objects representing the software objects of the modularized software object in Fig. 6A but does not form part of the hardware model.

The model shows that the processors are connected to N1 and to N2. It also shows that software that controls no devices can be installed on any processor that can be connected to N1, whereas software that controls devices must be installed on processors that can be connected to N2. The model defines the restrictions for each hardware-dependent software object. By connecting ConfigObj to OrgX, not only are the hardware devices included in the hardware model, but the device-controlling software objects are installed in the model at the same time.

Table 1. INITIAL CONFIGURATION

PROCESSOR ID    OBJECT ID
1               1, 2, 3
2               4, 5, 6, 7
3               8, 9, 10, 11, 12
4               13, 14, 15, 16, 17, 18

Table 2. DISASTER PLAN for processor 1 (KP-P1)

OBJECT ID    PROCESSOR ID
1            2
2            4
3            2

Table 3. DISASTER PLAN for processor 2 (KP-P2)

OBJECT ID    PROCESSOR ID
4            1
5            3
6            3
7            4

Table 4. DISASTER PLAN for processor 3 (KP-P3)

OBJECT ID    PROCESSOR ID
8            1
9            1
10           2
11           4
12           4

Table 5. DISASTER PLAN for processor 4 (KP-P4)

OBJECT ID    PROCESSOR ID
13           2
14           1
15           2
16           1
17           2
18           1

Table 6. CURRENT CONFIGURATION TABLE when processor 1 is broken

PROCESSOR ID    OBJECT ID
2               1, 3, 4, 5, 6, 7
3               8, 9, 10, 11, 12
4               2, 13, 14, 15, 16, 17, 18

Table 7. DELTA CONFIGURATION for processor 1

PROCESSOR ID    OBJECT ID
2               1, 3
3               -

Table 8. NEW DISASTER PLAN for processor 2 (KP-P2')

OBJECT ID    PROCESSOR ID
1            3
3            3
4            3
5            4
6            4
7            4

Table 9. NEW DISASTER PLAN for processor 3 (KP-P3')

OBJECT ID    PROCESSOR ID
8            2
9            2
10           4
11           4
12           4

Table 10. NEW DISASTER PLAN for processor 4 (KP-P4')

OBJECT ID    PROCESSOR ID
2            2
13           2
14           2
15           2
16           3
17           3
18           3

Table 11. NEW CURRENT CONFIGURATION TABLE when processor 2 is broken

PROCESSOR ID    OBJECT ID
3               1, 3, 4, 8, 9, 10, 11, 12
4               2, 5, 6, 7, 13, 14, 15, 16, 17, 18
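The relationship between the tables can be checked with a short sketch: applying the disaster plan of Table 2 to the initial configuration of Table 1 gives the configuration of Table 6, and applying the new plan of Table 8 to that result gives Table 11. The function name `apply_disaster_plan` and the dictionary layout are assumptions made for illustration.

```python
# Table 1: processor id -> object ids.
INITIAL = {
    1: [1, 2, 3],
    2: [4, 5, 6, 7],
    3: [8, 9, 10, 11, 12],
    4: [13, 14, 15, 16, 17, 18],
}

# Table 2 (KP-P1) and Table 8 (KP-P2'): object id -> receiving processor id.
KP_P1 = {1: 2, 2: 4, 3: 2}
KP_P2_NEW = {1: 3, 3: 3, 4: 3, 5: 4, 6: 4, 7: 4}


def apply_disaster_plan(config, failed, plan):
    """Return the configuration after `failed` goes down and `plan` is followed."""
    new_config = {p: list(objs) for p, objs in config.items() if p != failed}
    for obj in config[failed]:
        new_config[plan[obj]].append(obj)
    return {p: sorted(objs) for p, objs in new_config.items()}


after_p1 = apply_disaster_plan(INITIAL, 1, KP_P1)        # corresponds to Table 6
after_p2 = apply_disaster_plan(after_p1, 2, KP_P2_NEW)   # corresponds to Table 11
print(after_p1)
print(after_p2)
```

Note that the second failure uses the new plan created after the first recovery, because the original plan for processor 2 (Table 3) still points at the failed processor 1.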

Claims


1. A method for automatic recovery from multiple permanent processor failures in a distributed processor system of a software-driven telecommunication system, characterized by
- (a) creating an initial configuration describing each processor and the software objects executing on it,
- (b) creating, for each processor, a disaster plan that shall be followed if the processor fails, which disaster plan contains information on how the software objects executing on the failed processor are to be redistributed to operational processors in the processor system,
- (c) ... the disaster plan of the processor,
- (d) running the redistributed software objects on their respective processors, whereby the processor system recovers from the failed processor,
- (e) creating, for each now operational processor in the current configuration of the system, new disaster plans that shall be followed if any of the now operational processors fails, whereby the processor system regains its ability to withstand a new processor failure,
- (f) repeating steps (c)-(e) for each processor that fails,
- (h) repairing the failed processors and reinserting them into the processor system,
- (i) returning to the initial configuration by distributing said redistributed software objects back to their respective processors.

2. A method for automatic recovery from multiple permanent processor failures according to claim 1, characterized in that the initial configuration is created by mapping the software objects executing on an individual processor to that processor and repeating said mapping for each of the processors in the processor system.

3. Method for automatic recovery from multiple permanent processor failures according to claim 2, characterized in that the redistribution step comprises the creation and start of the software objects redistributed in accordance with the disaster plan for the failed processor.

4. Method for automatic recovery from multiple permanent processor failures according to claim 3, characterized in that first the objects controlling hardware equipment which the failed processor controls, and then the software objects not controlling hardware equipment, are redistributed.

5. Method according to claim 4, characterized by blocking of software objects which exhibit a hardware dependency and which execute on a failed processor against access from software objects which execute on other operational processors in the processor system.

6. Method according to claim 5, characterized in that a processor having a hardware dependency is represented by a software object.

7. A method for automatic recovery from multiple permanent processor failures according to claim 1, characterized by storing said initial disaster plans in a database distributed in working memories associated with the individual processors in the distributed processor system.

8. Method for automatic recovery from multiple permanent processor failures according to claim 1, characterized in that the step of returning to the initial configuration comprises
- producing information regarding which software objects are to be moved back to the repaired and installed processor,
- killing the software objects to be moved back, and
- generating and starting the killed software objects on the repaired, installed processor.

9. Method for automatic recovery from multiple permanent processor failures according to claims 2 and 8, characterized in that said retrieval takes place by subtracting the initial configuration from the current configuration with the exclusion of the failed processor(s).
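The subtraction can be sketched as a per-processor set difference. This is an illustration under assumptions: the function names `delta_configuration` and `moves_back` are hypothetical, and the data are taken from Table 1 and Table 6.

```python
# Table 1 (initial) and Table 6 (current, with processor 1 broken).
INITIAL = {
    1: [1, 2, 3],
    2: [4, 5, 6, 7],
    3: [8, 9, 10, 11, 12],
    4: [13, 14, 15, 16, 17, 18],
}
CURRENT = {
    2: [1, 3, 4, 5, 6, 7],
    3: [8, 9, 10, 11, 12],
    4: [2, 13, 14, 15, 16, 17, 18],
}


def delta_configuration(current, initial, failed):
    """Subtract the initial from the current configuration, skipping failed processors."""
    delta = {}
    for proc, objs in current.items():
        if proc in failed:
            continue
        extra = sorted(set(objs) - set(initial.get(proc, [])))
        if extra:
            delta[proc] = extra
    return delta


def moves_back(delta, initial, repaired):
    """List (object, current processor) pairs that belong on the repaired processor."""
    home = set(initial[repaired])
    return [(obj, proc) for proc, objs in delta.items() for obj in objs if obj in home]


delta = delta_configuration(CURRENT, INITIAL, failed={1})
print(delta)
print(moves_back(delta, INITIAL, repaired=1))
```

Each pair returned by `moves_back` names an object to be killed on its current processor and then created and started again on the repaired processor.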

10. Method for automatic recovery from multiple permanent processor failures according to claim 1, characterized in that the disaster plan to be followed in the event of a processor failure is created by the processor in question.

11. Method for automatic recovery from multiple permanent processor failures according to claim 8, characterized in that each software module has at least two functions, one that creates the software object and one that kills it.