WO2000055953A1 - System and method of event management and early fault detection - Google Patents

Info

Publication number
WO2000055953A1
Authority
WO
WIPO (PCT)
Prior art keywords
event
client
fault
block
list
Prior art date
1999-03-15
Application number
PCT/US2000/006919
Other languages
English (en)
Inventor
Kumar Gajjar
Nghiep Tran
Original Assignee
Smartsan Systems, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
1999-03-15
Filing date
2000-03-15
Publication date
2000-09-21
Application filed by Smartsan Systems, Inc.
Priority to AU38892/00A (AU3889200A)
Publication of WO2000055953A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0681 Configuration of triggering conditions
    • H04L41/069 Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
    • H04L41/22 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks comprising specially adapted graphical user interfaces [GUI]
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/16 Threshold monitoring
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/40 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass for recovering from a failure of a protocol instance or entity, e.g. service redundancy protocols, protocol state redundancy or protocol service redirection

Definitions

  • The present invention relates generally to network fault management via a software event manager (EM) inserted in the network at a central point in the system and controlled by the user through a Graphical User Interface (GUI).
  • EM: software event manager
  • GUI: Graphical User Interface
  • Fibre channel: The introduction and proliferation of fibre channel has greatly increased network connectivity between central servers and local storage, so that many more devices can be connected to a network over wider geographical areas.
  • Fibre channel is an ANSI-standard, high-speed data communications technology providing gigabit-per-second transmission rates for server/storage and large-scale, high-performance, geographically dispersed networking environments. Increases in computer network speed, size and connectivity require that early fault detection and fault management controls be embedded in the central server, or elsewhere with connections to all the devices and storage that make up the network.
  • The main components or functions of the fault manager or EM are: 1. Event Table; 2. Registration
  • The present invention provides a software system and method that lets the users or clients of the system set and change, as needed, the fault reporting, fault logging, fault notification, and fault trigger point thresholds for any event in the network or system.
  • A "point and click" graphical user interface (GUI) can allow users to perform these tasks, or they can be performed by calling API functions.
  • Another advantage of the present invention is the integration into a central point, the EM, of all appropriate fault management functions as follows:
  • FIG. 1 is a block diagram of one embodiment of a controller device according to the invention embodying an Event Manager (EM) for managing events and faults in a computer network;
  • EM: Event Manager
  • FIG. 2 is a block diagram illustrating one embodiment of the process of client registration;
  • FIG. 3 is a block diagram of nested hierarchical blocks illustrating one embodiment of the format and the ordered information content in the Client Event Table;
  • FIG. 4 is a block diagram illustrating one embodiment of the Event Notification Registration List;
  • FIG. 5 is a flow chart diagram of the Event Notification process in accordance with the present invention;
  • FIG. 6 is a block diagram illustrating one embodiment of the Event Threshold Registration (ETR) function;
  • FIG. 7 is a flow chart diagram of Event Thresholding in accordance with the present invention;
  • FIG. 8 is a flow chart diagram of Ordered Event Thresholding in accordance with the present invention;
  • FIG. 9 is a block diagram of one embodiment of the Event Reporting feature; and
  • FIG. 10 is a block diagram of an Event Reporting example in accordance with the present invention.
  • The present invention is a novel system and method of providing fault management and early fault detection, reporting and system response in a computer or logic device network that reaches all the way down to the device level, including logical devices.
  • FIG. 1 is a schematic block diagram illustrating one embodiment of the controller or EM 100 and identifies its key elements. These include the Processor Module 110, comprising a Processor 120 connected to a random access memory (RAM) 130, a non-volatile memory 140, a read-only memory (ROM) 150, a Cache/Staging memory 170, and the input/output connections to all the relevant components of the network (FC I/Os 172, 174, etc., and I/Os 182, 184, 186, etc.).
  • RAM: random access memory
  • ROM: read-only memory
  • In FIG. 2, a block diagram illustrates one of the ways that a client XYZ registers with the EM.
  • The client first assembles an event/fault table, shown as block A, that lists at the required level of detail the possible or anticipated events that can occur to the client and its components. This table is discussed in detail in connection with FIG. 3.
  • In step B, client XYZ registers with the EM by passing its Client Identification (ID) and a pointer to its Event Table.
  • ID: Client Identification
  • FIG. 3 is a set of nested hierarchical blocks of lists illustrating the format and the ordered information content in the Client Event Table 300.
  • All the event elements of one client (e.g., 301 to 309) are listed in ascending numerical order.
  • Tags 361-366 each correspond to an entire block of ordered lists of choices, attributes and actions, down to the level of detail required to identify the component, the fault and its severity, and to take component and system fault remedial actions, as illustrated by the information inside the right-hand blocks 361-366 of FIG. 3. A designer skilled in the art can add further options, choices and list members to the lists in blocks 300 and 360-366 as suitable for or required by a specific application.
  • The Event Notification Registration feature allows a client to register itself with the EM in order to be notified after a specified event occurs.
  • FIG. 4 illustrates in detail the Event Notification Registration (ENR) function of the EM.
  • ENR: Event Notification Registration
  • The EM creates an ENR element (450, 136, 142), adds it to its ENR List 436, and increments the ENRCount.
  • The ENR function (block 436 of FIG. 4) stores this information for a given client when the client sends block 436 an ENR in the format of blocks 450, 451, etc. of FIG. 4.
  • The format of a client ENR, e.g. block 450, includes the Client ID, the Event Code, data on the previous event occurrence (ENRPrevP), data on the next event occurrence (ENRNextP) and a Callback Function List, whose contents are shown in block 480. For every event it receives, the EM 120 checks the ENR List.
  • FIG. 5 illustrates in detail, as a flow chart, the Event Notification process.
  • The flow chart starts when the EM receives an event. The EM then walks the notification entries in the list of block 136 or block 436: if there is no next entry, it exits. Otherwise it checks whether the received event matches the event stored in that entry. If it does not match, the EM moves on to the next entry; if it does, the EM calls the callback functions in the entry's Callback Function List 480 and then moves on to the next entry.
  • FIG. 6 illustrates in detail the Event Threshold Registration (ETR) function of the controller 100, shown as blocks 138 and 142 in FIG. 1 and as block 638 in FIG. 6.
  • ETR: Event Threshold Registration
  • The EM creates an ETR element (650, 138, 142), adds it to its ETR List 638, and increments the ETRCount.
  • The format of a client ETR, e.g. block 650, includes the Client ID, Event Code, data on the previous event occurrence (ETRPrevP), data on the next (current) event occurrence (ETRNextP), Occurrence Count, Timestamp, Threshold Type, Threshold Duration, Threshold Event Count, Callback Function, Event Count and Event Code List.
  • The Event Code List in block 650 is further broken down into a Threshold Event List (block 680) that tags the threshold events.
  • Each threshold event in block 680 is further broken down into a Threshold Element List (block 690) containing the Element Type, Event Number, Client ID, Severity Level, Component Type and Component ID.
  • The ETR feature of EM 100 allows clients to register one or more events with the EM, so that the EM notifies the client when the threshold parameters are met for those events. This lets a client monitor activity in the system: for failure analysis, for example, a client can request to be notified when 5 "Media error" events occur within 2 seconds, and then decide what to do with the device.
  • Example 1: The user (the client) sets the trigger parameter as: "Notify the user via SNMP Trap if 3 Bad Block Errors occur within a 10-second time interval from Storage Device 0."
  • The EM will monitor all Bad Block errors generated by Device 0, log the time each error occurred, and check whether 3 errors occurred within the 10-second interval. If so, it will notify the user by sending an SNMP Trap to the management station.
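  • Example 2: The Fibre Channel Driver (the client) sets the trigger parameter as: call fcdInit(port 1) if 5 LIP Resets occur within a 15-second time interval on Fibre Channel Port 1.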
  • The EM will monitor all LIP Resets detected on Fibre Channel Port 1, log the time each reset occurred, and check whether 5 resets occurred within the 15-second interval. If so, it will call the function fcdInit(port 1).
  • The function emETR() returns a unique ID that can be used to de-register the ETR.
  • FIG. 7 is a flow chart of the steps of a method for checking whether the threshold parameters are met for the specified event(s), which allows a client to monitor activity in the system.
  • A Threshold Event List is created as shown in FIG. 6 and placed in step 702 of FIG. 7.
  • In step 710 the event thresholding program starts or continues evaluating threshold entries. If there are no more threshold entries in the list, the program exits its evaluation process. If there is another threshold entry, the program proceeds to step 720, where it compares the time elapsed since the entry's Timestamp against the preset Threshold Duration. If the elapsed time is greater than the Threshold Duration, then in step 722 it resets the Timestamp and the Counter, and proceeds to step 730.
  • In step 730 the event is compared with the preset event for a match. If it does not match, the program returns to step 710 to look for the next entry to evaluate. If it does match, the program increments the Counter and proceeds to step 740, where it checks whether the Counter equals one (1). If so, it proceeds to step 742, where it resets the Timestamp, and returns to step 710 to fetch the next entry. If not, it proceeds to step 750, where it checks whether the Counter value is greater than or equal to the Threshold Event Count. If not, it returns to step 710 to begin testing the next entry. If so, it continues to step 760, where it calls the Callback function, resets the Timestamp and the Counter, and returns to step 710 to evaluate the next entry.
  • FIG. 8 addresses the case where the Threshold Event List is ordered, as shown in block 680 of FIG. 6.
  • The only difference between FIGS. 7 and 8 is the insertion of steps 831 and 833 between steps 830 and 832. The Yes branch of step 830 leads to the new step 831, where the matched index is compared with the Counter. If they are not equal, the Timestamp and the Counter are reset and the program returns to step 810 to evaluate the next entry. If they are equal (an ordered event), the remaining steps are identical to the corresponding steps of FIG. 7.
  • FIG. 9 illustrates in detail how a client reports an event 900 to the EM.
  • The client calls emReportEvent() 910 in the EM with the following parameters: client ID 920, event number 930, component ID 940 and software context 950.
  • The software context block 950 contains the File Name, Line Number and Version Number.
  • The remaining blocks 960, 970, 980, 991, 992, 993, 994, 995 and 996 are identical in format to those in FIGS. 2 and 3.
  • When the EM receives an Event Reporting request, it indexes into the Client Table using the Client ID to find the Client Event Table. Then, using the Event Number, the EM indexes into the Client Event Table and gets the Event Element of FIGS. 4 and 6.
  • FIG. 10 illustrates an event reporting example from the FC driver: the AL Loop Up event.
  • Block 1000 identifies the event from the event element.
  • Block 1010 identifies the event element.
  • Block 1020 identifies the relevant Correction Description Table.
  • Block 1030 identifies the two actions that are enabled on this event as specified in the first two elements.
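
The registration and event-reporting path described above for FIGS. 2, 3 and 9 can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the names em_register_client, em_report_event, EventElement, ClientEntry, SoftwareContext and MAX_CLIENTS, and the fields of each struct, are hypothetical. Only the overall flow comes from the text: a client registers its Client ID and a pointer to its Event Table, and on an event report the EM indexes the Client Table by Client ID and then finds the Event Element by Event Number. Because the event elements are kept in ascending numerical order, the sketch locates an element with the C standard library's bsearch().

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_CLIENTS 16  /* hypothetical size of the EM's Client Table */

/* Hypothetical severity levels carried by an event element. */
typedef enum { SEV_INFO, SEV_WARNING, SEV_ERROR, SEV_FATAL } Severity;

/* One entry of a client's Event Table (FIG. 3); fields are illustrative. */
typedef struct {
    int event_number;          /* table is sorted ascending on this field */
    Severity severity;
    const char *description;
} EventElement;

/* Software context sent with each report (FIG. 9, block 950). */
typedef struct {
    const char *file_name;
    int line_number;
    const char *version;
} SoftwareContext;

/* Client Table slot, indexed by Client ID. */
typedef struct {
    int in_use;
    const EventElement *events;  /* pointer to the client's Event Table */
    size_t event_count;
} ClientEntry;

static ClientEntry client_table[MAX_CLIENTS];

/* Registration (FIG. 2, step B): record the pointer to the Event Table.
 * Returns 0 on success, -1 for a bad or already registered Client ID. */
int em_register_client(int client_id, const EventElement *events, size_t count)
{
    if (client_id < 0 || client_id >= MAX_CLIENTS || client_table[client_id].in_use)
        return -1;
    client_table[client_id].in_use = 1;
    client_table[client_id].events = events;
    client_table[client_id].event_count = count;
    return 0;
}

/* bsearch() comparator: key is an event number, elem an EventElement. */
static int compare_event(const void *key, const void *elem)
{
    int k = *(const int *)key;
    int e = ((const EventElement *)elem)->event_number;
    return (k > e) - (k < e);
}

/* Event reporting (FIG. 9): Client Table by Client ID, then Event Element
 * by Event Number. Returns NULL if either lookup fails. A fuller EM would
 * go on to log, notify and threshold using the element it found. */
const EventElement *em_report_event(int client_id, int event_number,
                                    int component_id, const SoftwareContext *ctx)
{
    (void)component_id;  /* unused in this sketch; would be logged with ctx */
    (void)ctx;
    if (client_id < 0 || client_id >= MAX_CLIENTS || !client_table[client_id].in_use)
        return NULL;
    const ClientEntry *c = &client_table[client_id];
    return bsearch(&event_number, c->events, c->event_count,
                   sizeof(EventElement), compare_event);
}
```

For instance, a client with ID 3 whose table holds events 301, 305 and 309 registers once with em_register_client(3, table, 3); a later em_report_event(3, 305, ...) returns the element for event 305, while an unknown event number returns NULL. Direct array indexing would also work if event numbers were contiguous; binary search is simply an assumption that fits an ordered but possibly sparse table.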
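
The ENR List of FIG. 4 and the notification walk of FIG. 5 map naturally onto a doubly linked list whose nodes each carry a callback list. The C sketch below assumes that ENRPrevP and ENRNextP act as the list's previous/next links; the names em_enr_register, em_enr_add_callback, em_notify, count_calls and MAX_CALLBACKS, and the callback signature, are hypothetical.

```c
#include <stdlib.h>

#define MAX_CALLBACKS 4  /* hypothetical capacity of a Callback Function List */

/* Hypothetical callback signature: registering client, event, user data. */
typedef void (*EmCallback)(int client_id, int event_code, void *arg);

/* ENR element (FIG. 4, block 450); ENRPrevP/ENRNextP read as list links. */
typedef struct EnrElement {
    int client_id;
    int event_code;
    struct EnrElement *prev;               /* ENRPrevP */
    struct EnrElement *next;               /* ENRNextP */
    EmCallback callbacks[MAX_CALLBACKS];   /* Callback Function List (block 480) */
    void *callback_args[MAX_CALLBACKS];
    int callback_count;
} EnrElement;

static EnrElement *enr_head;  /* ENR List (block 436) */
static EnrElement *enr_tail;
static int enr_count;         /* ENRCount */

/* Create an ENR element, append it to the ENR List, increment ENRCount. */
EnrElement *em_enr_register(int client_id, int event_code)
{
    EnrElement *e = calloc(1, sizeof *e);
    if (e == NULL)
        return NULL;
    e->client_id = client_id;
    e->event_code = event_code;
    e->prev = enr_tail;
    if (enr_tail != NULL)
        enr_tail->next = e;
    else
        enr_head = e;
    enr_tail = e;
    enr_count++;
    return e;
}

/* Append a callback to an element's Callback Function List. */
int em_enr_add_callback(EnrElement *e, EmCallback fn, void *arg)
{
    if (e->callback_count == MAX_CALLBACKS)
        return -1;
    e->callbacks[e->callback_count] = fn;
    e->callback_args[e->callback_count] = arg;
    e->callback_count++;
    return 0;
}

/* Event Notification (FIG. 5): visit every entry; on a match, call all of
 * that entry's callbacks, then continue. Returns the number of matches. */
int em_notify(int event_code)
{
    int matched = 0;
    for (EnrElement *e = enr_head; e != NULL; e = e->next) {
        if (e->event_code != event_code)
            continue;
        matched++;
        for (int i = 0; i < e->callback_count; i++)
            e->callbacks[i](e->client_id, event_code, e->callback_args[i]);
    }
    return matched;
}

/* Example callback: counts how many times it has been invoked. */
void count_calls(int client_id, int event_code, void *arg)
{
    (void)client_id;
    (void)event_code;
    ++*(int *)arg;
}
```

Every matching entry fires all of its callbacks, and the walk always continues to the next entry, as in the FIG. 5 flow chart. Removal of ENR elements is omitted for brevity.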
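
The FIG. 7 thresholding loop, together with a registration call in the role of emETR() that returns a unique ID for later de-registration, can be sketched in C as below. This is a hedged illustration: em_etr_register, em_etr_deregister, em_threshold_event and count_trigger are hypothetical names, the real signature of emETR() is not given in the text, times are plain seconds supplied by the caller, and the step numbers in the comments follow the description of FIG. 7. The "duration" test of step 720 is read here as the time elapsed since the entry's Timestamp compared with its Threshold Duration.

```c
#include <stdlib.h>

/* Callback invoked when a threshold is met (e.g. send an SNMP Trap). */
typedef void (*EtrCallback)(int etr_id, void *arg);

/* ETR element (FIG. 6, block 650), reduced to the fields FIG. 7 uses. */
typedef struct EtrElement {
    int id;                  /* unique ID returned at registration */
    int event_code;          /* event to count */
    double timestamp;        /* start of the current counting window, seconds */
    double duration;         /* Threshold Duration, seconds */
    int threshold_count;     /* Threshold Event Count */
    int counter;             /* matching events seen in the current window */
    EtrCallback callback;
    void *arg;
    struct EtrElement *next;
} EtrElement;

static EtrElement *etr_list;   /* ETR List (block 638) */
static int etr_count;          /* ETRCount */
static int next_etr_id = 1;    /* source of unique IDs */

/* Hypothetical stand-in for emETR(): returns a unique ID, or -1 on failure. */
int em_etr_register(int event_code, int threshold_count, double duration,
                    EtrCallback cb, void *arg)
{
    EtrElement *e = calloc(1, sizeof *e);
    if (e == NULL)
        return -1;
    e->id = next_etr_id++;
    e->event_code = event_code;
    e->threshold_count = threshold_count;
    e->duration = duration;
    e->callback = cb;
    e->arg = arg;
    e->next = etr_list;        /* push onto the front of the list */
    etr_list = e;
    etr_count++;
    return e->id;
}

/* De-register using the ID returned by em_etr_register(). */
int em_etr_deregister(int id)
{
    for (EtrElement **pp = &etr_list; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->id == id) {
            EtrElement *dead = *pp;
            *pp = dead->next;
            free(dead);
            etr_count--;
            return 0;
        }
    }
    return -1;
}

/* Event thresholding (FIG. 7) for one event arriving at time `now`.
 * A callback may de-register its own entry, but not other entries. */
void em_threshold_event(int event_code, double now)
{
    for (EtrElement *e = etr_list, *next; e != NULL; e = next) {  /* 710 */
        next = e->next;
        if (now - e->timestamp > e->duration) {   /* 720: window expired */
            e->timestamp = now;                   /* 722 */
            e->counter = 0;
        }
        if (e->event_code != event_code)          /* 730: no match */
            continue;
        e->counter++;
        if (e->counter == 1) {                    /* 740: first event */
            e->timestamp = now;                   /* 742: open the window */
            continue;
        }
        if (e->counter >= e->threshold_count) {   /* 750 */
            e->timestamp = now;                   /* 760: reset, then call */
            e->counter = 0;
            e->callback(e->id, e->arg);
        }
    }
}

/* Example callback standing in for "send SNMP Trap" or fcdInit(port 1). */
void count_trigger(int etr_id, void *arg)
{
    (void)etr_id;
    ++*(int *)arg;
}
```

Example 1 would roughly correspond to registering the Bad Block event code with a count of 3 and a duration of 10.0 seconds and a callback that sends the SNMP Trap; filtering on Storage Device 0 is omitted here. Note that the window is fixed, opening at the first matching event (step 742), rather than sliding, and that, following the flow chart literally, a Threshold Event Count of 1 never reaches step 750.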
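
For the ordered case of FIG. 8, the only change is the extra check that a matching event arrives at the position given by the Counter. The C sketch below assumes a 0-based matched index compared with the Counter before it is incremented, and distinct event codes within one ordered list; OrderedEtr, em_ordered_threshold_event, count_ordered_trigger and MAX_ORDERED are hypothetical names, and entries are held in a plain array rather than the linked ETR List for brevity. Only steps 810, 830 and 831 are numbered in the comments because the text does not spell out the rest.

```c
#include <stddef.h>

#define MAX_ORDERED 8  /* hypothetical capacity of an ordered Threshold Event List */

typedef void (*OrderedCallback)(void *arg);

/* One ordered threshold entry: its events must arrive in list order. */
typedef struct {
    int events[MAX_ORDERED];  /* ordered Threshold Event List (block 680) */
    int event_count;          /* number of valid entries in events[] */
    int threshold_count;      /* Threshold Event Count */
    double duration;          /* Threshold Duration, seconds */
    double timestamp;         /* start of the current window, seconds */
    int counter;              /* in-order events matched so far */
    OrderedCallback callback;
    void *arg;
} OrderedEtr;

/* Step 830: 0-based index of event_code in the ordered list, or -1. */
static int match_index(const OrderedEtr *e, int event_code)
{
    for (int i = 0; i < e->event_count; i++)
        if (e->events[i] == event_code)
            return i;
    return -1;
}

/* Ordered event thresholding (FIG. 8) for one event arriving at `now`. */
void em_ordered_threshold_event(OrderedEtr *entries, size_t n,
                                int event_code, double now)
{
    for (size_t k = 0; k < n; k++) {                /* 810: next entry */
        OrderedEtr *e = &entries[k];
        if (now - e->timestamp > e->duration) {      /* window expired, as in FIG. 7 */
            e->timestamp = now;
            e->counter = 0;
        }
        int idx = match_index(e, event_code);        /* 830 */
        if (idx < 0)
            continue;
        if (idx != e->counter) {                     /* 831: out of order */
            e->timestamp = now;                      /* reset and move on */
            e->counter = 0;
            continue;
        }
        e->counter++;                                /* from here on, as in FIG. 7 */
        if (e->counter == 1) {
            e->timestamp = now;
            continue;
        }
        if (e->counter >= e->threshold_count) {
            e->timestamp = now;
            e->counter = 0;
            e->callback(e->arg);
        }
    }
}

/* Example callback: counts how many times the ordered threshold fired. */
void count_ordered_trigger(void *arg)
{
    ++*(int *)arg;
}
```

With the ordered list {10, 20, 30} and a Threshold Event Count of 3, the callback fires only when the three events arrive in that order within the Threshold Duration; a match at the wrong position resets the Timestamp and Counter, as step 831 describes.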

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Computer Security & Cryptography (AREA)
  • Debugging And Monitoring (AREA)
  • Computer And Data Communications (AREA)

Abstract

The users or clients of a computer system can set and change, as needed, the fault reporting, fault logging, fault notification and fault trigger point thresholds for any event occurring in a network or system. A 'point and click' graphical user interface lets users perform these tasks conveniently; where appropriate, they can also be performed using API functions. A further advantage is the integration of an Event Manager at a central point for all appropriate fault management functions, including the 'event table', 'registration', 'event thresholds', 'registration and notification', and 'recovery operations' or 'actions to be taken' functions.
PCT/US2000/006919 1999-03-15 2000-03-15 System and method of event management and early fault detection WO2000055953A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU38892/00A AU3889200A (en) 1999-03-15 2000-03-15 System and method of event management and early fault detection

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US12449499P 1999-03-15 1999-03-15
US60/124,494 1999-03-15

Publications (1)

Publication Number Publication Date
WO2000055953A1 (fr) 2000-09-21

Family

ID=22415207

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/006919 WO2000055953A1 (fr) 1999-03-15 2000-03-15 System and method of event management and early fault detection

Country Status (2)

Country Link
AU (1) AU3889200A (fr)
WO (1) WO2000055953A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7546488B2 (en) 2004-07-02 2009-06-09 Seagate Technology Llc Event logging and analysis in a software system
US7546489B2 (en) 2005-01-25 2009-06-09 Seagate Technology Llc Real time event logging and analysis in a software system
US20130198573A1 (en) * 2000-07-18 2013-08-01 Apple Inc. Event logging and performance analysis system for applications

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5029170A (en) * 1989-11-30 1991-07-02 Hansen Robert G Assembly language programming potential error detection scheme which recognizes incorrect symbolic or literal address constructs
US5119377A (en) * 1989-06-16 1992-06-02 International Business Machines Corporation System and method for software error early detection and data capture
US5132972A (en) * 1989-11-29 1992-07-21 Honeywell Bull Inc. Assembly language programming potential error detection scheme sensing apparent inconsistency with a previous operation
US5383201A (en) * 1991-12-23 1995-01-17 Amdahl Corporation Method and apparatus for locating source of error in high-speed synchronous systems
US5432795A (en) * 1991-03-07 1995-07-11 Digital Equipment Corporation System for reporting errors of a translated program and using a boundry instruction bitmap to determine the corresponding instruction address in a source program
US5594861A (en) * 1995-08-18 1997-01-14 Telefonaktiebolaget L M Ericsson Method and apparatus for handling processing errors in telecommunications exchanges

Also Published As

Publication number Publication date
AU3889200A (en) 2000-10-04

Similar Documents

Publication Publication Date Title
US7525422B2 (en) Method and system for providing alarm reporting in a managed network services environment
US7426654B2 (en) Method and system for providing customer controlled notifications in a managed network services system
US6529784B1 (en) Method and apparatus for monitoring computer systems and alerting users of actual or potential system errors
US6434616B2 (en) Method for monitoring abnormal behavior in a computer system
US8812649B2 (en) Method and system for processing fault alarms and trouble tickets in a managed network services system
US8738760B2 (en) Method and system for providing automated data retrieval in support of fault isolation in a managed services network
US8924533B2 (en) Method and system for providing automated fault isolation in a managed services network
US8676945B2 (en) Method and system for processing fault alarms and maintenance events in a managed network services system
EP0831617B1 (fr) Flexible mechanism for an SNMP trap
US5276529A (en) System and method for remote testing and protocol analysis of communication lines
US20040205689A1 (en) System and method for managing a component-based system
US7818283B1 (en) Service assurance automation access diagnostics
US7469287B1 (en) Apparatus and method for monitoring objects in a network and automatically validating events relating to the objects
US20050038888A1 (en) Method of and apparatus for monitoring event logs
US20040006619A1 (en) Structure for event reporting in SNMP systems
US20020188568A1 (en) Systems and methods of containing and accessing generic policy
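CN113810366A (zh) Security identification system and method for files uploaded to a website
CN106685744A (zh) Troubleshooting method, apparatus and system
WO2000055953A1 (fr) System and method of event management and early fault detection
CN115242621B (zh) Network leased-line monitoring method, apparatus, device and computer-readable storage medium
CN110521233B (zh) Method for identifying an outage, access point, method for remote configuration, system and medium
WO2019241199A1 (fr) System and method for predictive maintenance of networked devices
JP2003132019A (ja) Fault monitoring method for a computer system
CN111259383A (zh) Security management center system
CN110489690B (zh) Method, server, device and storage medium for monitoring a government service application system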

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 EP: The EPO has been informed by WIPO that EP was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 EP: PCT application non-entry in European phase