WO2001050263A1 - System and method for device failure recognition - Google Patents

System and method for device failure recognition

Info

Publication number
WO2001050263A1
Authority
WO
WIPO (PCT)
Prior art keywords
monitoring
requests
ofthe
recited
received
Prior art date
Application number
PCT/US2000/035722
Other languages
English (en)
Inventor
William Gaske
Original Assignee
Computer Associates Think, Inc.
Priority date
Filing date
Publication date
Application filed by Computer Associates Think, Inc.
Priority to AU26125/01A (AU2612501A)
Publication of WO2001050263A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy
    • G06F11/0754 Error or fault detection not based on redundancy by exceeding limits
    • G06F11/0757 Error or fault detection not based on redundancy by exceeding limits by exceeding a time limit, i.e. time-out, e.g. watchdogs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3041 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is an input/output interface
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/10 Active monitoring, e.g. heartbeat, ping or trace-route

Definitions

  • the present disclosure relates to device monitoring, and in particular to a system and
  • a thread is a code sequence that runs multi-
  • the processor can still be processing
  • Threads provide an efficient way to improve the throughput and scalability of
  • monitoring by allowing multiple monitoring requests to execute concurrently, each in different
  • monitoring operation that can be performed by a thread is a "ping" to remote network devices.
  • An example of a ping is a UNIX utility used to determine whether a specified address can be
  • a ping command can use the Internet Control Message Protocol (ICMP) to determine whether
  • ICMP Internet Control Message Protocol
  • a node can respond.
  • APIs application program interfaces
  • monitoring such as the monitoring of system event logs, log files, and
  • APIs which are specific to the OS (Operating System) platform
  • RPC remote procedure call
  • RPCs are protocols that enable computers to transmit data or to request services of other computers or devices.
  • one or more ports may be monitored, one or more log files may be monitored, one or
  • application log files may be monitored, one or more services and processes may be monitored,
  • Another method may select a
  • a method of monitoring a device including sending a plurality of monitoring
  • processing is not performed for any of the monitoring requests unless a valid response to at least
  • the method may further comprise
  • Each of the plurality of monitoring requests may run via
  • each of the respective monitoring threads may be instructed to process
  • the respective monitoring threads may be instructed to discard results of their respective requests.
  • the method may further include running a status checking thread, wherein if a valid response to
  • the response may be forwarded to the status checking thread.
  • the plurality of requests sent to the device may be for monitoring at least one of physical network interface availability, TCP/IP port
  • the method may
  • a monitoring request may be considered to have timed out, if a response is not
  • a monitoring system for monitoring at least one device on a network may include storage
  • the at least one monitoring application sending a plurality of monitoring requests to
  • the device and processing responses to the plurality of monitoring requests only after a response
  • processing of responses is not performed for any of the monitoring requests unless a valid
  • a storage medium storing a monitoring application includes computer executable code
  • each monitoring thread monitoring a request made
  • each of the monitoring threads process the results of their
  • a method of monitoring a device includes sending a plurality of monitoring requests to
  • each monitoring request having a time out time associated therewith, wherein a
  • monitoring request is considered to have timed out, if a response to the request is not received prior to the time out time expiring and processing responses to the plurality of monitoring
  • a method of monitoring a device may include sending a set of monitoring requests to the
  • the set of monitoring requests being commenced in multiple threads at substantially the
  • Fig. 1 is a block diagram of a network including a monitoring system according to an embodiment
  • Figs. 2A, 2B and 2C depict a flowchart for describing a monitoring method according to an embodiment
  • Fig. 3 is a block diagram of a monitoring system according to an embodiment
  • Fig. 4 is a block diagram depicting exemplary components capable of being monitored
  • Figs. 5A, 5B and 5C depict a flowchart for describing a monitoring method according to
  • Fig. 6 depicts a monitoring queue according to an embodiment.
  • each specific element includes all technical equivalents which operate in a similar manner.
  • Fig. 1 depicts an exemplary network to which the present system and method may be
  • a monitoring system 102 may be connected to remote
  • devices such as a network printer 104, network facsimile device 106, server 108, computer
  • system 102 can communicate with these other remote devices on the network 116 via suitable
  • Network 116 may be, for example, a local area network (LAN).
  • LAN local area network
  • WAN wide area network
  • Internet the Internet
  • monitoring system 102 includes a monitoring application that may be used to
  • the monitoring application can be provided on one of the other remote devices shown in Fig. 1 or on a device
  • monitoring device may itself be monitored by another device running a monitoring application.
  • Monitoring system 102 may be a standard PC, laptop, mainframe, etc. capable of running
  • FIG. 3 depicts a block
  • monitoring system 102 may include. Of course, monitoring
  • system 102 may not include each component shown and/or may include additional components
  • monitoring system 102 may include a central processing unit (CPU) 2,
  • controller 12 a LAN interface 14
  • network controller 16 an internal bus 18 and one or more
  • input devices 20 such as, for example, a keyboard and mouse.
  • CPU 2 controls the operation of system 102 and is capable of running applications stored
  • Memory 4 may include, for example, RAM, ROM, removable CDROM, DVD,
  • Memory 4 may also store various types of data necessary for the execution of the
  • Clock circuit 6 may include a
  • circuit for generating information indicating the present time may be capable of being
  • the LAN interface 14 allows communication between the network 116, which in this example is a LAN, and the LAN data transmission controller 12.
  • controller 12 uses a predetermined protocol suite to exchange information and data with the other
  • Monitoring system 102 may also be capable of communicating with
  • System 102 may also be capable of
  • PSTN Public Switched Telephone Network
  • Each of the devices on network 116 including server 108, computers 110, 112, facsimile
  • printer 104 may include one or more of the components shown in Fig. 4
  • a device may
  • include one or more system event logs 40, one or more system error log files 42, one or more
  • Monitoring system 102 may include an operating system capable of running different
  • system 102 is capable of simultaneously
  • Each thread performing a monitoring request will typically execute until a blocking
  • the monitoring thread performs the monitoring request.
  • the monitoring thread also forwards
  • thread may wait indefinitely for a result to be returned from the device being monitored.
  • in order to avoid an indefinite wait state, a timer
  • the device is not forthcoming. After the time expires, the monitoring request or requests that
  • a failure e.g., an error, failure or invalid message
  • responses to each of the threads are processed in a normal manner.
  • step S1 the monitoring application provided on system 102 is started.
  • monitoring application may be manually started by an operator or system 102 may be
  • step S2 a set of the specific types of monitoring requests to be performed can
  • step S3 an overall monitoring time can be input and set by the operator or
  • time-out timer e.g., Figure 3, clock circuit 6
  • each monitoring request in the set of requests is performed by a
  • step S4 each of the monitoring request threads is commenced at
  • step S6 the time-out timer is started.
  • Each of the monitoring threads then monitors for a result to its request or for a time-out (Step S8). If a monitoring result is received,
  • Step S10 the result is forwarded by the monitoring thread to the status checking thread
  • Step S14 If a monitoring thread result is not received
  • Step S10 a determination is made whether the time-out timer has timed out. If the time-out
  • Step S16 the process returns to Step S8 and monitoring
  • Step S16 If the time-out timer has timed out (Yes, Step S16), monitoring requests that have not
  • Step S22 examines the request results (Step S22).
  • Step S24 a determination is made whether each of
  • catastrophic failure may then be reported, for example, to an operator via the display or other
  • requests are processed in a normal manner and can be reported, for example, to the operator via
  • the display or other output device and/or stored.
  • Each of the components capable of being monitored may not take the same amount of
  • FIG. 5 depicts a method for monitoring according to
  • each monitoring request has a time-out time associated therewith.
  • each monitoring request is monitored for a result. If a result is not received within the
  • each monitoring request is arranged in a monitor queue such as that
  • each monitoring request includes a "Time-out" time associated
  • step S50 the monitoring application provided on
  • the monitoring application may be manually started by an operator or
  • system 102 may be programmed to automatically run the monitoring application periodically or
  • step S52 a set of the specific types of monitoring requests to
  • step S53 a monitoring time can be input and set by the
  • step S56 the time-out timer is started. In this
  • the time-out timer may be, for example, an elapsed time counter.
  • monitoring threads then monitors for a result to its request or for a time-out (Step S58).
  • Step S60 a
  • Step S60 information indicating the result (e.g., valid message or error message) is forwarded to the status checking thread and stored (Step S60).
  • Step S62 the next monitoring request in the queue is selected (Step S64). That monitoring
  • Step S58 If a result for the monitoring request is not
  • Step S60 a determination is made whether the request has timed out. This
  • determination can be made by determining whether the time-out time associated with that
  • Step S56 If the request has not timed out (No, Step S66), the next monitoring request
  • Step S67 it is checked whether a result for that request has been
  • Step S58 if the monitoring request timed out (Yes, Step S66),
  • time-out information identifying the monitoring request that timed out is forwarded to the status
  • Step S70 a determination is made whether each monitoring request in the queue has either
  • Step S70 the next monitoring request in the queue is
  • Step S67 the process returns to Step S58. If all monitoring requests have either
  • Step S70 the status checking thread then examines the request
  • Step S72 a determination is made whether all requests have failed, either
  • Step S74 If all requests failed (Yes, Step S74), it is determined that a catastrophic failure of the
  • Step S76 Notification may then
  • Step S78 sent to all processing threads to process all results and the results are processed in a
  • start times can be set up, and any particular monitoring request or requests can be assigned to the
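
The queue-based variant described above for Figs. 5A to 5C, in which each monitoring request carries its own time-out, can be sketched roughly as follows. This is a minimal illustration in Python and not the application's implementation: the names `MonitorRequest`, `run_queue` and `is_catastrophic`, the `poll` callback, and the polling interval `tick_s` are all assumptions introduced here.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical result shape: (is_valid, message), e.g. (True, "port open").
Result = Tuple[bool, str]


@dataclass
class MonitorRequest:
    name: str
    timeout_s: float                        # per-request "Time-out" time
    poll: Callable[[], Optional[Result]]    # returns a result once available, else None


def run_queue(queue: List[MonitorRequest], tick_s: float = 0.01) -> Dict[str, Result]:
    """Cycle through the queue until every request has a result or has timed out."""
    start = time.monotonic()                # the elapsed-time counter is started once
    status: Dict[str, Result] = {}
    pending = list(queue)
    while pending:
        still_pending = []
        for req in pending:
            result = req.poll()
            if result is not None:
                status[req.name] = result                 # forward result to status store
            elif time.monotonic() - start >= req.timeout_s:
                status[req.name] = (False, "timed out")   # forward time-out information
            else:
                still_pending.append(req)                 # check again on the next pass
        pending = still_pending
        if pending:
            time.sleep(tick_s)
    return status


def is_catastrophic(status: Dict[str, Result]) -> bool:
    """All requests failed (error or time-out): treat the device as failed."""
    return not any(valid for valid, _ in status.values())
```

In this sketch a caller would process the individual results only when `is_catastrophic` returns `False`; otherwise a single device failure would be reported instead of one alarm per request.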

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer And Data Communications (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A method of monitoring a device is described, the method comprising sending a plurality of monitoring requests (S2) to the device and processing the responses to the plurality of monitoring requests after a response to each request has been received (S10) or the request has timed out (S16), wherein the processing is not performed for any of the monitoring requests unless a valid response to at least one of the plurality of monitoring requests is received.
PCT/US2000/035722 1999-12-30 2000-12-29 System and method for device failure recognition WO2001050263A1 (en)
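
The method summarized in the abstract (several monitoring requests issued concurrently, a shared time-out, and no per-request processing unless at least one valid response arrives) might look like the following sketch. It is written in Python under stated assumptions: `monitor_device`, `evaluate`, and the convention that a probe returns a string for a valid response and `None` for a failure are hypothetical and are not taken from the application.

```python
import threading
import time
from typing import Callable, Dict, Optional, Tuple

# Hypothetical convention: a probe returns a string for a valid response,
# or None for a failure (error or invalid message).
Probe = Callable[[], Optional[str]]


def monitor_device(probes: Dict[str, Probe], timeout_s: float) -> Dict[str, Optional[str]]:
    """Run each probe in its own thread and collect results until the shared time-out."""
    results: Dict[str, Optional[str]] = {}
    lock = threading.Lock()

    def run(name: str, probe: Probe) -> None:
        try:
            value = probe()
        except Exception:
            value = None                      # an exception counts as a failed request
        with lock:
            results[name] = value

    threads = [
        threading.Thread(target=run, args=(name, probe), daemon=True)
        for name, probe in probes.items()
    ]
    for t in threads:                         # start all requests at about the same time
        t.start()

    deadline = time.monotonic() + timeout_s
    for t in threads:
        t.join(max(0.0, deadline - time.monotonic()))

    with lock:
        # Requests with no result by the deadline are recorded as failed (None).
        return {name: results.get(name) for name in probes}


def evaluate(results: Dict[str, Optional[str]]) -> Tuple[str, Dict[str, Optional[str]]]:
    """Release results for processing only if at least one request succeeded."""
    if all(value is None for value in results.values()):
        return "catastrophic failure", {}     # discard individual results
    return "ok", dict(results)
```

The design point illustrated is that a dead device produces one failure report rather than a separate alarm from every monitoring thread.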

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU26125/01A AU2612501A (en) 1999-12-30 2000-12-29 System and method for device failure recognition

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US17408599P 1999-12-30 1999-12-30
US60/174,085 1999-12-30

Publications (1)

Publication Number Publication Date
WO2001050263A1 2001-07-12

Family

ID=22634767

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/035722 System and method for device failure recognition (WO2001050263A1) 1999-12-30 2000-12-29

Country Status (2)

Country Link
AU (1) AU2612501A (fr)
WO (1) WO2001050263A1 (fr)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1659753A1 (en) * 2004-11-22 2006-05-24 LG Electronics Inc. Monitoring control system and method
EP1962538A1 (en) * 2005-12-09 2008-08-27 Huawei Technologies Co., Ltd. Method for managing a terminal device
CN101727378B (en) * 2009-12-31 2013-04-17 深圳联友科技有限公司 Method and system for controlling stable operation of an application service program
WO2016055893A1 (en) * 2014-10-09 2016-04-14 Telefonaktiebolaget L M Ericsson (Publ) Method, traffic monitor (TM), request router (RR) and system for monitoring a content delivery network (CDN)
EP3678023A4 (en) * 2017-09-30 2020-08-26 Huawei Technologies Co., Ltd. System service timeout processing method, and apparatus

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5237684A (en) * 1991-08-12 1993-08-17 International Business Machines Corporation Customized and versatile event monitor within event management services of a computer system
US5355484A (en) * 1991-08-12 1994-10-11 International Business Machines Corporation Dynamically established event monitors in event management services of a computer system
US5437046A (en) * 1993-02-24 1995-07-25 Legent Corporation System and method for resolving the location of a station on a local area network
US5953530A (en) * 1995-02-07 1999-09-14 Sun Microsystems, Inc. Method and apparatus for run-time memory access checking and memory leak detection of a multi-threaded program
US6098169A (en) * 1997-12-23 2000-08-01 Intel Corporation Thread performance analysis by monitoring processor performance event registers at thread switch
US6138249A (en) * 1997-12-11 2000-10-24 Emc Corporation Method and apparatus for monitoring computer systems during manufacturing, testing and in the field

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1659753A1 (en) * 2004-11-22 2006-05-24 LG Electronics Inc. Monitoring control system and method
JP2009060633A (en) * 2004-11-22 2009-03-19 Lg Electronics Inc Monitor device for mobile communication terminal and monitor control method
US7688813B2 (en) 2004-11-22 2010-03-30 Lg Electronics Inc. Monitoring control system and method
EP1962538A1 (en) * 2005-12-09 2008-08-27 Huawei Technologies Co., Ltd. Method for managing a terminal device
EP1962538B1 (en) * 2005-12-09 2013-05-08 Huawei Technologies Co., Ltd. Method for managing a terminal device
CN101727378B (en) * 2009-12-31 2013-04-17 深圳联友科技有限公司 Method and system for controlling stable operation of an application service program
WO2016055893A1 (en) * 2014-10-09 2016-04-14 Telefonaktiebolaget L M Ericsson (Publ) Method, traffic monitor (TM), request router (RR) and system for monitoring a content delivery network (CDN)
CN106797330A (en) * 2014-10-09 2017-05-31 瑞典爱立信有限公司 Method, traffic monitor (TM), request router (RR) and system for monitoring a content delivery network (CDN)
US10498626B2 (en) 2014-10-09 2019-12-03 Telefonaktiebolaget Lm Ericsson (Publ) Method, traffic monitor (TM), request router (RR) and system for monitoring a content delivery network (CDN)
CN106797330B (en) * 2014-10-09 2020-07-10 瑞典爱立信有限公司 Method, traffic monitor (TM), request router (RR) and system for monitoring a content delivery network (CDN)
EP3678023A4 (en) * 2017-09-30 2020-08-26 Huawei Technologies Co., Ltd. System service timeout processing method, and apparatus
US11693701B2 (en) 2017-09-30 2023-07-04 Huawei Technologies Co., Ltd. System service timeout processing method, and apparatus

Also Published As

Publication number Publication date
AU2612501A (en) 2001-07-16

Similar Documents

Publication Publication Date Title
JP4174057B2 (en) Method for adding a new member to a first group of two or more nodes
US6453430B1 (en) Apparatus and methods for controlling restart conditions of a faulted process
US5396613A (en) Method and system for error recovery for cascaded servers
US7415519B2 (en) System and method for prevention of boot storms in a computer network
US7613801B2 (en) System and method for monitoring server performance using a server
US6115393A (en) Network monitoring
US5805785A (en) Method for monitoring and recovery of subsystems in a distributed/clustered system
US6510478B1 (en) Method and apparatus for coordination of a shared object in a distributed system
EP1697843B1 (en) System and method for managing protocol network failures in a cluster system
WO2000041377A1 (en) Input/output (I/O) scanner for a control system with peer determination
WO1998003912A1 (en) Method and apparatus for coordinating access to a shared object in a distributed system
WO2001050263A1 (en) System and method for device failure recognition
US6286111B1 (en) Retry mechanism for remote operation failure in distributed computing environment
US7222174B2 (en) Monitoring control network system
US5894547A (en) Virtual route synchronization
US20040039816A1 (en) Monitoring method of the remotely accessible resources to provide the persistent and consistent resource states
US6938086B1 (en) Auto-detection of duplex mismatch on an ethernet
Cisco Troubleshooting the Mainframe Application
JP4479564B2 (en) Channel control method at network system initialization, program, and computer system
Cisco Troubleshooting the Mainframe Application
US7039680B2 (en) Apparatus and method for timeout-free waiting for an ordered message in a clustered computing environment
US6990668B1 (en) Apparatus and method for passively monitoring liveness of jobs in a clustered computing environment
Cisco Troubleshooting the Mainframe Application
KR100427149B1 (en) Method, system, and program storage device for monitoring a service connection between a client process and a server process
US12063160B1 (en) Automated VPN load balancer

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP