WO2000055953A1 - System and method of event management and early fault detection - Google Patents
System and method of event management and early fault detection
- Publication number
- WO2000055953A1 (application PCT/US2000/006919)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- event
- client
- fault
- block
- list
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0681—Configuration of triggering conditions
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/069—Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/22—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks comprising specially adapted graphical user interfaces [GUI]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/16—Threshold monitoring
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
- H04L69/40—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass for recovering from a failure of a protocol instance or entity, e.g. service redundancy protocols, protocol state redundancy or protocol service redirection
Definitions
- the present invention relates generally to network fault management via a software event manager (EM) inserted in the network at a central point in the system and controlled by the user through a Graphical User Interface (GUI).
- EM software event manager
- GUI Graphical User Interface
- The introduction and proliferation of fibre channel has greatly increased network connectivity between central servers and local storage, so that many more devices can be connected to a network over wider geographical areas.
- Fibre channel is an ANSI-standard, high-speed data communications technology providing gigabit-per-second transmission rates for server/storage and large-size, high-performance, geographically dispersed networking environments. Increases in computer network speed, size and connectivity require that early fault detection and fault management controls be embedded in the central server or elsewhere with connections to all devices and storage comprising the network.
- the main components or functions of the fault or EM are: 1. Event Table 2. Registration
- the present invention provides a software system and method for the users or clients of the system to set and change, as needed, the fault reporting, fault logging, fault notification, and fault trigger point thresholds for any event in the network or system.
- a "point and click" graphical user interface (GUI) can allow users to perform these tasks, or they can be performed by calling API functions.
- Another advantage of the present invention is the integration into a central point, the EM, of all appropriate fault management functions as follows:
- FIG. 1 is a block diagram of one embodiment of a controller device according to the invention embodying an Event Manager (EM) for managing events and faults in a computer network;
- EM Event Manager
- Figure 2 is a block diagram illustrating one embodiment of the process of client registration
- Figure 3 is a block diagram of nested hierarchical blocks illustrating one embodiment of the format and the ordered information content in the Client Event Table;
- Figure 4 is a block diagram illustrating one embodiment of the Event Notification Registration List
- FIG. 5 is a flow chart diagram of the Event Notification process in accordance with the present invention.
- Figure 6 is a block diagram of one embodiment illustrating the Event
- FIG. 7 is a flow chart diagram of Event Thresholding in accordance with the present invention.
- Figure 8 is a flow chart diagram of Ordered Event Thresholding in accordance with the present invention.
- Figure 9 is a block diagram of one embodiment of the Event Reporting feature; and Figure 10 is a block diagram of an Event Reporting example in accordance with the present invention.
- the present invention is a novel system and method of providing fault management and early fault detection, reporting and system response in a computer or logic device network that reaches all the way down to the device level, including logical devices.
- FIG. 1 is a schematic block diagram illustrating one embodiment of the controller or EM 100 and identifying its key elements. These elements include the Processor Module 110, comprising a Processor 120 connected to a random access memory (RAM) 130, a non-volatile memory 140, a read-only memory (ROM) 150, a Cache/Staging memory 170, and the input/output connections to all the relevant components of the network (FC I/Os 172, 174, etc. and I/Os 182, 184, 186, etc.).
- RAM random access memory
- ROM read-only memory
- FIG. 2 is a block diagram illustrating one of the ways that a client XYZ registers with the EM.
- The client assembles an event/fault table, as shown in block A, listing in the required level of detail the possible or anticipated events that can occur to the client and its components. This table is described in detail with reference to Figure 3.
- In step B, the client XYZ registers with the EM by supplying its Client Identification (ID) and a pointer to its Event Table.
- ID Client Identification
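The registration step described above can be sketched roughly as follows. This is a minimal Python illustration, not the patent's interface: the names `EventManager`, `register_client` and `client_table` are hypothetical, the table contents are invented, and a Python object reference stands in for the "pointer" to the Event Table.

```python
class EventManager:
    """Hypothetical stand-in for the EM; not the patent's actual interface."""

    def __init__(self):
        # Client Table: maps a Client ID to a reference to that client's Event Table.
        self.client_table = {}

    def register_client(self, client_id, event_table):
        # Step B: store the reference, not a copy, so the table the EM
        # later consults is the one the client assembled.
        self.client_table[client_id] = event_table


# Step A: client XYZ assembles its event/fault table (contents invented here).
xyz_event_table = {1: "Bad Block Error", 2: "Media Error"}

em = EventManager()
em.register_client("XYZ", xyz_event_table)
```

Storing a reference rather than a copy is one possible reading of "a pointer to its Event Table"; a real controller might equally validate or copy the table at registration time.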
- FIG. 3 is a set of nested hierarchical blocks of lists illustrating the format and the ordered information content in the Client Event Table 300.
- All the event elements, say 301 to 309, are listed in ascending numerical order for one client.
- Tags 361-366 each correspond to an entire block of ordered lists of choices, attributes and actions, down to the level of detail required to identify the component, the fault and its severity and to take component and system fault remedial actions, as illustrated by the information inside the right-hand side blocks 361-366 of Figure 3. A designer skilled in the art can add further options, choices and list members to the lists in blocks 300 and 360-366 as suitable or required by a specific application.
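One way to picture a Client Event Table kept in ascending order is the sketch below. The field names (`description`, `severity`, `component_type`, `actions`) and the sample rows are assumptions made for illustration; the source only states that each element carries enough detail to identify the component, the fault, its severity and the remedial actions.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class EventElement:
    # Illustrative fields only; the real layout is defined by Figure 3.
    event_number: int
    description: str
    severity: str
    component_type: str
    actions: list = field(default_factory=list)


def find_event(table, event_number):
    """Binary search over a table kept in ascending event-number order."""
    numbers = [e.event_number for e in table]
    i = bisect_left(numbers, event_number)
    if i < len(table) and table[i].event_number == event_number:
        return table[i]
    return None


# Hypothetical table for one client, sorted by event number.
client_event_table = [
    EventElement(1, "Bad Block Error", "warning", "storage device", ["log"]),
    EventElement(2, "Media Error", "critical", "storage device", ["log", "notify"]),
]

element = find_event(client_event_table, 2)
```

Keeping the elements sorted is what makes the binary search valid; if event numbers are dense, direct indexing would work just as well.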
- The Event Notification Registration feature allows a client to register itself with the EM in order to be notified after the occurrence of a specified event.
- FIG. 4 illustrates in detail the Event Notification Registration (ENR) function of the EM.
- ENR Event Notification Registration
- the EM creates an ENR Element, 450, 136, 142, and adds it to its ENR List, 436, and increments the ENRCount.
- The ENR function, block 436 of Figure 4, stores this information for the given client when the client sends to block 436 an ENR in the format of block 450, 451, etc. of Figure 4.
- The format of the client ENR, say block 450, includes the Client ID, the Event Code, data on the previous event occurrence (ENRPrevP), data on the next event occurrence (ENRNextP) and a Callback Function List, the contents of which are shown in block 480. For every event received by the EM 120, it checks the ENR
- FIG. 5 illustrates in detail via a flow chart the Event Notification process.
- The flow chart starts with the EM receiving an event. The EM then checks whether there is a next notification entry in the list of block 136 or block 436. If not, it exits. If so, it checks whether the event matches the one stored in that entry. If it does not match, it returns to the start of the event notification flow chart to test the next entry. If it matches, it calls the Callback functions in the callback functions list 480 and then returns to the beginning of the flow chart to check the next notification entry.
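The notification loop described above can be sketched as follows. This is an assumption-laden Python illustration: `ENRElement`, `NotificationList`, `register` and `notify` are hypothetical names, and an ordinary Python list replaces the previous/next-linked ENR List of Figure 4.

```python
from dataclasses import dataclass, field


@dataclass
class ENRElement:
    client_id: str
    event_code: int
    callbacks: list = field(default_factory=list)


class NotificationList:
    def __init__(self):
        self.entries = []   # stands in for the linked ENR List
        self.enr_count = 0  # mirrors the ENRCount kept by the EM

    def register(self, client_id, event_code, *callbacks):
        self.entries.append(ENRElement(client_id, event_code, list(callbacks)))
        self.enr_count += 1

    def notify(self, event_code):
        # Walk every notification entry; skip the ones that do not match.
        for entry in self.entries:
            if entry.event_code != event_code:
                continue
            # Matching entry: call each function in its callback list.
            for callback in entry.callbacks:
                callback(event_code)


received = []
enr_list = NotificationList()
enr_list.register("XYZ", 7, received.append)
enr_list.register("ABC", 9, received.append)
enr_list.notify(7)  # matches the first entry only
enr_list.notify(8)  # matches nothing
```

The loop ends when the list is exhausted, which corresponds to the "no next entry" exit of the flow chart.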
- FIG. 6 illustrates in detail the Event Threshold Registration (ETR) function of the controller 100, shown as block 138, 142 in Figure 1 and as block 638 in Figure 6.
- ETR Event Threshold Registration
- the EM creates an ETR element, 650, 138, 142, adds it to its ETR List 638, and increments the ETRCount.
- The format of the client ETR, say block 650, includes the Client ID, the Event Code, data on the previous event occurrence (ETRPrevP), data on the next (current) event occurrence (ETRNextP), Occurrence Count, Timestamp, Threshold Type, Threshold Duration, Threshold Event Count, Callback Function, Event Count and Event Code List.
- the Event Code list in block 650 is further delineated into a Threshold Event List, block 680 that tags the threshold events.
- Each threshold event in block 680 is further delineated into a Threshold Element List, block 690 containing information on the Element Type, Event Number, Client ID, Severity Level, Component Type and Component ID.
- The ETR feature of EM 100 allows clients to register event(s) with the EM, so that the EM will notify the client if the threshold parameters are met for the specified event(s). This allows a client to monitor the activity in the system (e.g., for failure analysis, a client can request to be notified when 5 "Media error" events occur within 2 seconds; when this happens it can decide what to do with the device).
- Example 1 User, the client, sets the trigger parameter as: "Notify the user via SNMP Trap if 3 Bad Block Errors occur within 10 seconds time interval from Storage Device 0".
- The EM will monitor all Bad Block errors generated by Device 0, log the time the errors occurred and check whether 3 errors occurred within the 10-second time interval. If so, it will notify the user by sending an SNMP Trap to the Management station.
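Example 1 can be illustrated with a small sliding-window counter. Note that this is a simplified technique chosen for clarity, not the reset-based flow chart of FIG. 7, and that `WindowTrigger`, `record` and the `on_trigger` callback are hypothetical names; the callback merely stands in for sending an SNMP Trap.

```python
from collections import deque


class WindowTrigger:
    def __init__(self, count, window_seconds, on_trigger):
        self.count = count
        self.window = window_seconds
        self.on_trigger = on_trigger
        self.times = deque()  # logged error times, oldest first

    def record(self, timestamp):
        self.times.append(timestamp)
        # Drop errors that fall outside the window ending at this timestamp.
        while timestamp - self.times[0] > self.window:
            self.times.popleft()
        if len(self.times) >= self.count:
            self.on_trigger(list(self.times))
            self.times.clear()


traps = []
trigger = WindowTrigger(count=3, window_seconds=10, on_trigger=traps.append)
# Five Bad Block errors; only the last three fall within one 10-second window.
for t in (0.0, 4.0, 20.0, 23.0, 27.0):
    trigger.record(t)
```

Passing timestamps in explicitly, rather than reading a clock inside `record`, keeps the sketch deterministic and easy to test.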
- Example 2 Fibre Channel Driver, the client, sets the trigger parameter as:
- The EM will monitor all LIP Resets detected on Fibre Channel Port 1, log the time the errors occurred and check whether 5 errors occurred within the 15-second time interval. If so, it will call the function fcdInit(port 1).
- the function emETR () returns a unique ID which can be used to de-register the ETR.
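The idea of returning a unique ID that later serves to de-register can be sketched as below. The source does not give the signature of emETR(), so `ThresholdRegistry`, `register` and `deregister` are hypothetical names and the parameter list is an assumption.

```python
import itertools


class ThresholdRegistry:
    def __init__(self):
        self._next_id = itertools.count(1)  # source of unique IDs
        self._entries = {}

    def register(self, event_code, count, duration, callback):
        etr_id = next(self._next_id)
        self._entries[etr_id] = (event_code, count, duration, callback)
        return etr_id

    def deregister(self, etr_id):
        # True if an entry was removed, False if the ID was unknown.
        return self._entries.pop(etr_id, None) is not None

    def __len__(self):
        return len(self._entries)


registry = ThresholdRegistry()
first = registry.register("LIP_RESET", 5, 15, lambda: None)
second = registry.register("BAD_BLOCK", 3, 10, lambda: None)
removed = registry.deregister(first)
```

IDs are never reused in this sketch, so a stale ID cannot accidentally remove a newer registration.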
- FIG. 7 is a flowchart of steps in a method for checking whether Threshold parameters are met for the specified event(s). This will allow a client to monitor the activity in the system.
- a Threshold event list is created as shown in Figure 6 and placed in step 702 of Figure 7.
- The event thresholding program in step 710 initiates or continues the evaluation of threshold entries. If there are no more threshold entries in the list, the program exits its evaluation process. If there is an additional threshold entry in the list, it proceeds in step 720 to compare its duration against a preset duration. If the given threshold duration is greater than the preset duration, then in step 722 it resets the Timestamp and the Counter and proceeds to the next step 730.
- In step 730 the event is compared to the preset event for a match. If it does not match, the program returns to the initial step 710, where it looks for a next entry to evaluate. If it does match, the program increments the Counter and proceeds to step 740, where it checks whether the Counter is equal to one (1). If yes, it proceeds to step 742, where it resets the timestamp, and then returns to the initializing step 710, where it calls for a next entry to be tested. If the answer in step 740 is No, it proceeds to step 750, where it checks whether the counter value is greater than or equal to the threshold event count. If the answer is No, it returns to step 710 to begin testing a next entry. If the answer is Yes, it continues to step 760, where it calls the Callback function, resets the timestamp, resets the counter and returns to the initializing step 710 to evaluate the next entry.
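The steps above can be sketched in Python as follows. Two assumptions are made: the duration test of step 720 is read as "time elapsed since the entry's timestamp exceeds its threshold duration", and the names `ThresholdEntry` and `evaluate` are hypothetical. The sample values (5 LIP Resets within 15 seconds, calling a stand-in for fcdInit) follow Example 2.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ThresholdEntry:
    event_code: str
    threshold_count: int
    threshold_duration: float
    callback: Callable[[], None]
    counter: int = 0
    timestamp: float = 0.0


def evaluate(entries, event_code, now):
    for entry in entries:                                      # step 710
        if now - entry.timestamp > entry.threshold_duration:   # step 720 (assumed reading)
            entry.timestamp = now                              # step 722
            entry.counter = 0
        if entry.event_code != event_code:                     # step 730: no match
            continue
        entry.counter += 1
        if entry.counter == 1:                                 # step 740
            entry.timestamp = now                              # step 742
        elif entry.counter >= entry.threshold_count:           # step 750
            entry.callback()                                   # step 760
            entry.timestamp = now
            entry.counter = 0


calls = []
entries = [ThresholdEntry("LIP_RESET", 5, 15.0, lambda: calls.append("fcdInit"))]
for t in (1.0, 3.0, 5.0, 7.0, 9.0):
    evaluate(entries, "LIP_RESET", t)
```

Because step 740 is checked before step 750, a threshold event count of 1 never fires in this sketch; that follows the flow as described rather than being a deliberate design choice.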
- Figure 8 addresses the case when the threshold event list is ordered as shown in block 680 of Figure 6.
- The only difference between FIGS. 7 and 8 is the insertion of steps 831 and 833 between steps 830 and 832. The Yes option of step 830 leads to a new step 831, where the matched index is compared to the counter. If they are not equal, the timestamp and the counter are reset and the program returns to the initializing step 810 to evaluate the next entry. If they are equal (ordered event), the remaining steps are identical to the corresponding ones of Figure 7.
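The ordered variant can be sketched for a single entry as below. Several things are assumed for illustration: the names `OrderedEntry` and `evaluate_ordered`, the use of the event list's length as the threshold event count, and the reading of "matched index" as the position of the incoming event in the ordered list.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class OrderedEntry:
    event_list: List[str]          # ordered threshold event list
    threshold_duration: float
    callback: Callable[[], None]
    counter: int = 0
    timestamp: float = 0.0


def evaluate_ordered(entry, event_code, now):
    if now - entry.timestamp > entry.threshold_duration:
        entry.timestamp, entry.counter = now, 0
    if event_code not in entry.event_list:         # step 830: no match
        return
    matched_index = entry.event_list.index(event_code)
    if matched_index != entry.counter:             # step 831: out of order
        entry.timestamp, entry.counter = now, 0    # reset, then back to step 810
        return
    entry.counter += 1                             # remaining steps as in FIG. 7
    if entry.counter == 1:
        entry.timestamp = now
    elif entry.counter >= len(entry.event_list):
        entry.callback()
        entry.timestamp, entry.counter = now, 0


fired = []
entry = OrderedEntry(["A", "B", "C"], 10.0, lambda: fired.append("ordered"))
# "C" arrives out of order after the first "A", so the sequence restarts.
for t, code in enumerate(["A", "C", "A", "B", "C"]):
    evaluate_ordered(entry, code, float(t))
```

The sketch assumes each event code appears only once in the ordered list, since `list.index` returns the first position.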
- Figure 9 illustrates in detail how a client reports an event 900 to the EM.
- the client will call emReportEvent () 910 in EM with the following parameters inserted: client ID 920, event number 930, component ID 940 and software context 950.
- the software context block 950 contains File Name, Line Number and Version Number.
- the remaining blocks 960, 970, 980, 991, 992, 993, 994, 995 and 996 are identical in format to those in FIGS. 2 and 3.
- When the EM receives an Event Reporting request, it will index into the Client Table using the Client ID and find the Client Event Table. Then, using the Event Number, the EM will index into the Client Event Table and get the Event Element of FIGS. 4 and 6.
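The two-level lookup performed on an Event Reporting request can be sketched as follows. This is not the signature of emReportEvent(), which the source does not give; `report_event`, `SoftwareContext`, the dictionary layout and the sample values (file name, line, version, event number 12) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SoftwareContext:
    file_name: str
    line_number: int
    version_number: str


def report_event(client_table, client_id, event_number, component_id, context):
    event_table = client_table[client_id]   # index the Client Table by Client ID
    element = event_table[event_number]     # index the Client Event Table by Event Number
    return {
        "element": element,
        "component_id": component_id,
        "where": f"{context.file_name}:{context.line_number} (v{context.version_number})",
    }


# Hypothetical tables: one client with one event.
client_table = {"FC_DRIVER": {12: "Loop Up"}}
report = report_event(
    client_table, "FC_DRIVER", 12, component_id=1,
    context=SoftwareContext("fcd.c", 210, "1.0"),
)
```

An unknown Client ID or Event Number raises `KeyError` here; a real EM would presumably report such a request as an error of its own.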
- Figure 10 illustrates an event reporting example from the FC driver: the Al Loop Up Event.
- Block 1000 identifies the event from the event element.
- Block 1010 identifies the event element.
- Block 1020 identifies the relevant Correction Description Table.
- Block 1030 identifies the two actions that are enabled on this event as specified in the first two elements.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Human Computer Interaction (AREA)
- Computer Security & Cryptography (AREA)
- Debugging And Monitoring (AREA)
- Computer And Data Communications (AREA)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU38892/00A AU3889200A (en) | 1999-03-15 | 2000-03-15 | System and method of event management and early fault detection |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12449499P | 1999-03-15 | 1999-03-15 | |
US60/124,494 | 1999-03-15 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2000055953A1 (fr) | 2000-09-21 |
Family
ID=22415207
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2000/006919 WO2000055953A1 (fr) | 1999-03-15 | 2000-03-15 | System and method of event management and early fault detection |
Country Status (2)
Country | Link |
---|---|
AU (1) | AU3889200A (fr) |
WO (1) | WO2000055953A1 (fr) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7546488B2 (en) | 2004-07-02 | 2009-06-09 | Seagate Technology Llc | Event logging and analysis in a software system |
US7546489B2 (en) | 2005-01-25 | 2009-06-09 | Seagate Technology Llc | Real time event logging and analysis in a software system |
US20130198573A1 (en) * | 2000-07-18 | 2013-08-01 | Apple Inc. | Event logging and performance analysis system for applications |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5029170A (en) * | 1989-11-30 | 1991-07-02 | Hansen Robert G | Assembly language programming potential error detection scheme which recognizes incorrect symbolic or literal address constructs |
US5119377A (en) * | 1989-06-16 | 1992-06-02 | International Business Machines Corporation | System and method for software error early detection and data capture |
US5132972A (en) * | 1989-11-29 | 1992-07-21 | Honeywell Bull Inc. | Assembly language programming potential error detection scheme sensing apparent inconsistency with a previous operation |
US5383201A (en) * | 1991-12-23 | 1995-01-17 | Amdahl Corporation | Method and apparatus for locating source of error in high-speed synchronous systems |
US5432795A (en) * | 1991-03-07 | 1995-07-11 | Digital Equipment Corporation | System for reporting errors of a translated program and using a boundry instruction bitmap to determine the corresponding instruction address in a source program |
US5594861A (en) * | 1995-08-18 | 1997-01-14 | Telefonaktiebolaget L M Ericsson | Method and apparatus for handling processing errors in telecommunications exchanges |
-
2000
- 2000-03-15 WO PCT/US2000/006919 patent/WO2000055953A1/fr active Application Filing
- 2000-03-15 AU AU38892/00A patent/AU3889200A/en not_active Abandoned
Also Published As
Publication number | Publication date |
---|---|
AU3889200A (en) | 2000-10-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7525422B2 (en) | Method and system for providing alarm reporting in a managed network services environment | |
US7426654B2 (en) | Method and system for providing customer controlled notifications in a managed network services system | |
US6529784B1 (en) | Method and apparatus for monitoring computer systems and alerting users of actual or potential system errors | |
US6434616B2 (en) | Method for monitoring abnormal behavior in a computer system | |
US8812649B2 (en) | Method and system for processing fault alarms and trouble tickets in a managed network services system | |
US8738760B2 (en) | Method and system for providing automated data retrieval in support of fault isolation in a managed services network | |
US8924533B2 (en) | Method and system for providing automated fault isolation in a managed services network | |
US8676945B2 (en) | Method and system for processing fault alarms and maintenance events in a managed network services system | |
EP0831617B1 (fr) | Flexible SNMP trap mechanism | |
US5276529A (en) | System and method for remote testing and protocol analysis of communication lines | |
US20040205689A1 (en) | System and method for managing a component-based system | |
US7818283B1 (en) | Service assurance automation access diagnostics | |
US7469287B1 (en) | Apparatus and method for monitoring objects in a network and automatically validating events relating to the objects | |
US20050038888A1 (en) | Method of and apparatus for monitoring event logs | |
US20040006619A1 (en) | Structure for event reporting in SNMP systems | |
US20020188568A1 (en) | Systems and methods of containing and accessing generic policy | |
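CN113810366A (zh) | Website upload file security identification system and method | |
CN106685744A (zh) | Troubleshooting method, device and system | |
WO2000055953A1 (fr) | System and method of event management and early fault detection | |
CN115242621B (zh) | Network private line monitoring method, apparatus, device and computer-readable storage medium | |
CN110521233B (zh) | Method for identifying an interruption, access point, method for remote configuration, system and medium | |
WO2019241199A1 (fr) | System and method for predictive maintenance of networked devices | |
JP2003132019A (ja) | Fault monitoring method for computer system | |
CN111259383A (zh) | Security management center system | |
CN110489690B (zh) | Method, server, device and storage medium for monitoring government service application systems |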
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states | Kind code of ref document: A1 Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW |
AL | Designated countries for regional patents | Kind code of ref document: A1 Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
REG | Reference to national code | Ref country code: DE Ref legal event code: 8642 |
122 | Ep: pct application non-entry in european phase | |