EP1381952A2 - Panic message analyzer - Google Patents

Panic message analyzer

Info

Publication number
EP1381952A2
EP1381952A2 EP01973104A EP01973104A EP1381952A2 EP 1381952 A2 EP1381952 A2 EP 1381952A2 EP 01973104 A EP01973104 A EP 01973104A EP 01973104 A EP01973104 A EP 01973104A EP 1381952 A2 EP1381952 A2 EP 1381952A2
Authority
EP
European Patent Office
Prior art keywords
message
bugs
customer
database
version
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
EP01973104A
Other languages
English (en)
French (fr)
Inventor
Roderick E. Bagg
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NetApp Inc
Original Assignee
Supachill Technologies Pty Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Supachill Technologies Pty Ltd

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/36Prevention of errors by analysis, debugging or testing of software
    • G06F11/362Debugging of software
    • G06F11/366Debugging of software using diagnostics
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/22Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2294Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing by remote test
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/32Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324Display of status information
    • G06F11/327Alarm or error message display

Definitions

  • This invention relates to analysis of panic messages from network servers.
  • A first known method to enable reporting of a software application error is to provide a pre-public release of a software package to a select group of customers for "beta testing." During this trial period, customers report to the company any problems that they encounter, and the software engineers at the company fix the bugs and provide updated versions of the software to the beta testers, who continue testing with the new version. This process continues for a short testing period until the software is, ideally, error free. While this first known method provides reporting of software bugs to a manufacturer, it suffers from several drawbacks. First, it provides no method for automatically reporting the problem to the manufacturer. It relies solely on the beta tester to inform the manufacturer. Second, it provides no automated analysis of a problem identified by a beta tester. That is, it requires an employee at the manufacturer to determine whether the problem has already been reported, has been fixed, or is a new problem. Third, it provides no method for delivery of updated software to a user who is determined to be using older software with an identified and fixed problem.
  • a second known method of reporting computer system errors is to rely on the end user to call the manufacturer and report a problem when it occurs.
  • the customer is provided a customer support line that they may call to report problems they are having.
  • the manufacturer may conclude there is a problem with some portion of a program.
  • While this second known method provides reporting of software bugs to a manufacturer it suffers from several drawbacks.
  • the customer may decide not to call, as customer support calls tend to involve long waits on hold listening to Muzak and often provide no relief because the manufacturer has no formal structure in place to coordinate and analyze the calls it receives.
  • the customer may not be knowledgeable enough to provide the manufacturer with the necessary information they need to diagnose the problem, or worse, they may misinform the manufacturer as to the origin of the problem.
  • the invention includes a system and method for analyzing panic messages from computer systems that have suffered failures.
  • a filer server dedicated to file storage and retrieval
  • This message is indicative of the problem that caused the filer to crash.
  • This message is sent to the manufacturer via a communications network such as the Internet.
  • the message also includes other information, such as the user's name, the version of the software, a back trace, and a mini core dump.
  • automatic analysis commences to determine if the bug can be identified.
  • the panic message is analyzed by comparing it against a database of panic messages that correspond with known bugs. If successful, automated housekeeping occurs which includes updating this instance in a tracking database, delivery of an answer to the customer (including solutions), updating analysis statistics, and additional activities. If unsuccessful the process continues.
  • a back trace analyzer analyzes the back trace using an expression algorithm that looks for exact matches on function names and recognized sequences of matches that correspond to known bugs. If successful, automated housekeeping occurs as indicated above. If unsuccessful, the process continues.
  • a core script analyzer analyzes a core dump for recognizable patterns of code that correspond to known bugs. If successful, automated housekeeping occurs as indicated above. If unsuccessful, the process continues.
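The staged analysis described above (panic message first, then back trace, then core dump) can be sketched as a simple cascade that stops at the first stage that recognizes a bug. This is a minimal illustration, not the patented implementation: the report layout, the toy analyzers, the function names, and the bug identifiers are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical report layout; the keys used below are assumptions.
Report = Dict[str, object]
Analyzer = Callable[[Report], Optional[str]]


def analyze(report: Report, analyzers: List[Tuple[str, Analyzer]]) -> dict:
    """Run each analyzer in order and stop at the first one that names a bug."""
    for stage, analyzer in analyzers:
        bug_id = analyzer(report)
        if bug_id is not None:
            # A match is where housekeeping would run (tracking, answer, statistics).
            return {"resolved": True, "stage": stage, "bug_id": bug_id}
    # No stage recognized the failure, so the case stays open.
    return {"resolved": False, "stage": None, "bug_id": None}


# Toy stand-ins for the three analyzers; their contents are invented.
def panic_message_analyzer(report: Report) -> Optional[str]:
    known = {"fs.c:10: bad block pointer": "BUG-1"}
    return known.get(report.get("panic_message"))


def back_trace_analyzer(report: Report) -> Optional[str]:
    trace = report.get("back_trace", [])
    return "BUG-2" if "raid_rebuild" in trace else None


def core_script_analyzer(report: Report) -> Optional[str]:
    return None  # placeholder: this toy example knows no core patterns


PIPELINE = [
    ("panic message", panic_message_analyzer),
    ("back trace", back_trace_analyzer),
    ("core script", core_script_analyzer),
]

result = analyze(
    {"panic_message": "unknown", "back_trace": ["main", "raid_rebuild"]},
    PIPELINE,
)
print(result)
```

Ordering the stages from cheapest to most expensive means the larger artifacts are only examined when the smaller ones fail to identify the bug.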
  • Figure 1 illustrates a block diagram of a system for a panic message analyzer.
  • Figure 2 illustrates a panic message analyzer process in a system for a panic message analyzer.
  • Figure 3 illustrates an auto support process in a system for a panic message analyzer.
  • Figure 4 illustrates a core dump process in a system for a panic message analyzer.
  • Embodiments of the invention can be implemented using general purpose processors or special purpose processors operating under program control, or other circuits, adapted to particular process steps and data structures described herein. Implementation of the process steps and data structures described herein would not require undue experimentation or further investigation.
  • filer - This term refers to a file server.
  • a file server is a computer and storage device dedicated to data storage and retrieval.
  • Core dump - A core dump is the printing, or the copying to a more permanent medium (such as a hard disk), of the contents of random access memory at one moment in time.
  • Figure 1 shows a block diagram of a system for a panic message analyzer.
  • a system 100 includes a client device 110 associated with a customer, a communications link 120, a communications network 130, a server device 140 associated with a manufacturer, a mass storage 150, a housekeeping database 151, a bugs database 152, and a core dump 160.
  • the client device 110 includes a processor, a main memory, and software for executing instructions (not shown, but understood by one skilled in the art). Although the client device 110 and server device 140 are shown as separate devices there is no requirement that they be separate devices.
  • the communications link 120 operates to couple the client device 110 to the communications network 130.
  • the server device 140 includes a processor, a main memory, software for executing instructions (not shown, but understood by one skilled in the art), and a mass storage 150.
  • Although the client device 110 and server device 140 are shown as separate devices, there is no requirement that they be separate devices.
  • Although the server device 140 and mass storage 150 are shown as combined, there is no requirement that they be combined. They could be separate devices.
  • the mass storage 150 includes the housekeeping database 151 and bugs database 152.
  • the core dump 160 includes a mini core dump 161, a back-trace 162, and a panic message 163.
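As a rough sketch of how the items carrying reference numerals 160 to 163 relate, the following Python dataclasses model a core dump that bundles a mini core dump, a back-trace, and a panic message. The field names, types, and sample values are assumptions chosen for illustration; the description only names the components.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PanicMessage:            # panic message 163
    text: str
    source_file: str           # assumed field: source code filename
    line_number: int           # assumed field: line number in that file
    address: int               # assumed field: where the system was last operating


@dataclass
class MiniCoreDump:            # mini core dump 161
    raw: bytes                 # assumed: opaque memory snapshot


@dataclass
class CoreDump:                # core dump 160
    mini_core_dump: MiniCoreDump
    back_trace: List[str]      # back-trace 162, modeled as function names
    panic_message: PanicMessage


dump = CoreDump(
    mini_core_dump=MiniCoreDump(raw=b"\x00\x01"),
    back_trace=["main", "fs_read_block", "panic"],
    panic_message=PanicMessage("bad block pointer", "fs.c", 20, 0x1F40),
)
print(dump.panic_message.source_file, dump.back_trace[-1])
```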
  • FIG. 2 illustrates a panic message analyzer process, indicated by general reference character 200.
  • the manual panic message analyzer process 200 initiates at a 'start' terminal 201.
  • the panic message analyzer process 200 continues to a 'panic message created' procedure 203 which allows the customer's device to create a panic message 163 prior to failure.
  • a 'customer submits panic message' procedure 205 allows the customer to submit the panic message 163 for analysis utilizing the client device 110 to transmit the panic message 163 to the server device 140.
  • the customer submits the message via interaction and transfer over an Internet connection, which is well known in the art. There is, however, no requirement that the panic message 163 be transferred by this method as long as it is delivered to the manufacturer.
  • An 'analyze panic message' procedure 207 allows the panic message 163 to be analyzed by comparing recognized data elements it contains (a panic message includes the address of where a system was last operating, line numbers, text and source code filenames, and other data) against known data elements that correspond to known bugs in the bugs database 152 on the server device 140.
  • a 'known bug?' decision procedure 209 determines whether the panic message identifies a known bug. If the 'known bug?' decision procedure 209 determines that the bug is a known bug, the panic message analyzer process 200 continues to a 'solution to customer' procedure 213.
  • the 'solution to customer' procedure 213 extracts from the database the solution associated with the bug identified by the 'known bug?' decision procedure 209.
  • the solution provided to the customer can be written instructions detailing how to fix and avoid further occurrences, a copy of a software program to fix the problem, or recommendations for the purchase of additional products from the manufacturer that fix the problem.
  • An 'automatic housekeeping' procedure 215 records all relevant information regarding identification/non-identification of the bug, the solution sent to the customer (if any), and statistics relating to these events in the housekeeping database 151. If the panic message analyzer failed to diagnose the problem, the 'automatic housekeeping' procedure leaves the case active (i.e. marked as unresolved).
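The flow of process 200 (exact comparison of data elements, a 'known bug?' decision, a solution lookup, and housekeeping) might look roughly like the sketch below. It assumes a signature made of source filename, line number, and text, and uses in-memory dictionaries and lists as stand-ins for the bugs database 152 and housekeeping database 151; the entries, names, and record layout are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Assumed signature: the data elements that are compared exactly.
Signature = Tuple[str, int, str]  # (source filename, line number, text)


@dataclass
class KnownBug:
    bug_id: str
    solution: str


# Stand-in for the bugs database 152; the entry is invented.
BUGS_DB: Dict[Signature, KnownBug] = {
    ("fs.c", 10, "bad block pointer"): KnownBug("BUG-1", "Upgrade to the fixed release."),
}

# Stand-in for the housekeeping database 151.
HOUSEKEEPING: List[dict] = []


def handle_panic(customer: str, signature: Signature) -> Optional[str]:
    bug = BUGS_DB.get(signature)  # the 'known bug?' decision
    HOUSEKEEPING.append({
        "customer": customer,
        "signature": signature,
        "bug_id": bug.bug_id if bug else None,
        "solution": bug.solution if bug else None,
        # Undiagnosed cases are left active, i.e. marked as unresolved.
        "status": "resolved" if bug else "active",
    })
    return bug.solution if bug else None


print(handle_panic("acme", ("fs.c", 10, "bad block pointer")))
print(handle_panic("acme", ("fs.c", 20, "bad block pointer")))
print([row["status"] for row in HOUSEKEEPING])
```

Every submission is recorded whether or not it is identified, which matches the description of housekeeping as covering both identification and non-identification.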
  • FIG. 3 illustrates an auto support process, indicated by general reference character 300.
  • the auto support process 300 initiates at a 'start' terminal 301.
  • the auto support process 300 continues to an 'auto support message sent' procedure 303 which allows the client device 110 to automatically send a message to the server device 140 containing a copy of the panic message 163 and mini core dump 161.
  • An 'auto support message received' procedure 305 allows the server device 140 to receive the panic message 163 and mini core dump 161 from the client device 110.
  • An 'analyze panic message' procedure 307 allows the panic message 163 to be analyzed by comparing recognized data elements it contains (a panic message includes the address of where a system was last operating, line numbers, text and source code filenames, and other data) against known data elements that correspond to known bugs in the bugs database 152 on the server device 140.
  • a 'known panic bug?' decision procedure 309 determines whether the panic message identifies a known bug. If the 'known panic bug?' decision procedure 309 determines that the bug is a known bug, the auto support process 300 continues to a 'discard mini core dump' procedure 321.
  • An 'extract back-trace' procedure 311 extracts the back-trace 162 from the mini core dump 161.
  • An 'analyze back-trace' procedure 313 allows the back-trace 162 to be analyzed using an expression algorithm that looks for exact matches on function names and recognized sequences of function names that correspond to known bugs in the bugs database 152 on the server device 140.
  • a 'known back-trace bug?' decision procedure 315 determines whether the back-trace 162 identifies a known bug. If the 'known back-trace bug?' decision procedure 315 determines that the bug is a known bug, the auto support process 300 continues to a 'discard mini core dump' procedure 321.
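The description does not spell out the expression algorithm, so the following is only one plausible reading of "exact matches on function names and recognized sequences of function names": a known bug is stored as an ordered list of function names, and a back-trace matches if those names appear in it in the same order, possibly with other frames in between. The function names and the bug identifier are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def contains_in_order(trace: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if every name in pattern appears in trace, in the same order."""
    frames = iter(trace)
    # `name in frames` consumes the iterator, so each later name must be
    # found after the position where the previous name matched.
    return all(name in frames for name in pattern)


# Assumed store of known sequences; a real system would query the bugs database.
KNOWN_TRACES: Dict[str, List[str]] = {
    "BUG-7": ["disk_io_done", "raid_stripe_write", "panic"],
}


def match_back_trace(trace: Sequence[str]) -> Optional[str]:
    for bug_id, pattern in KNOWN_TRACES.items():
        if contains_in_order(trace, pattern):
            return bug_id
    return None


print(match_back_trace(["main", "disk_io_done", "log_event", "raid_stripe_write", "panic"]))
print(match_back_trace(["main", "raid_stripe_write", "disk_io_done", "panic"]))
```

Names are compared with exact equality, while the gaps allowed between them make the match tolerant of unrelated frames. A stricter variant could require the names to be adjacent.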
  • a 'request core dump' procedure 317 notifies the customer that a core dump
  • This notification includes all the instructions necessary to create the core dump 160 and deliver it to the manufacturer.
  • the notification would be sent electronically to the customer; however, there is no requirement that notification be accomplished in this manner.
  • An 'automatic housekeeping' procedure 319 records all relevant information regarding identification/non-identification of the bug, the solution sent to the customer (if any), and statistics relating to these events in the housekeeping database 151. If the panic message analyzer failed to diagnose the problem, the 'automatic housekeeping' procedure leaves the case active (i.e. marked as unresolved).
  • the panic message analyzer would not identify it in version two if the bug now appeared at line 20 due to the exact matching methodology used.
  • the back-trace analyzer might identify the bug as it uses a more sophisticated approach, and it would then pass this information to the panic message analyzer.
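The limitation of exact matching can be illustrated with a small sketch. It assumes that the known signature was recorded at line 10 in version one (a value invented for the example; the text above only gives line 20 for version two) and that the back-trace analyzer keys on function names, which do not change when code moves within a file. Here the back-trace check is simplified to require only that all names be present.

```python
# Known entries; line 10 and all names are assumptions for illustration.
KNOWN_PANICS = {("fs.c", 10, "bad block pointer"): "BUG-1"}
KNOWN_TRACES = {"BUG-1": ("fs_read_block", "fs_verify_pointer")}


def by_panic(signature):
    # Exact match on (file, line, text): any change in line number misses.
    return KNOWN_PANICS.get(signature)


def by_back_trace(trace):
    # Simplified match: every known function name must be present in the trace.
    for bug_id, names in KNOWN_TRACES.items():
        if all(name in trace for name in names):
            return bug_id
    return None


v2_signature = ("fs.c", 20, "bad block pointer")   # same bug, shifted to line 20
v2_trace = ["main", "fs_read_block", "fs_verify_pointer", "panic"]

print(by_panic(v2_signature))    # None
print(by_back_trace(v2_trace))   # BUG-1
```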
  • the auto support process 300 terminates through an 'end' terminal 325.
  • a 'discard mini core dump' procedure 321 causes the mini core dump 161 to be discarded as it is no longer needed due to identification of the bug.
  • a 'solution sent to customer' procedure 323 causes a solution to be extracted from the bugs database 152 which is associated with the identified bug.
  • the solution provided to the customer varies depending on the bug identified. For example, it can be written instructions detailing how to fix and avoid further occurrences, a copy of a software program to fix the problem, or recommendations for the purchase of additional products from the manufacturer that fix the problem.
  • the auto support process 300 continues to an 'automatic housekeeping' procedure 319.
  • FIG. 4 illustrates a core dump process, indicated by general reference character 400.
  • the core dump process 400 initiates at a 'start' terminal 401.
  • the core dump process 400 continues to a 'core arrives from customer' procedure 403 which allows analysis of the core dump 160 to begin.
  • the core dump 160 is requested by a 'request core dump' procedure 317 (illustrated in Figure 3) when prior analysis of the panic message 163 and back-trace 162 has failed.
  • An 'analyze panic message' procedure 405 allows the panic message 163 to be analyzed by comparing recognized data elements it contains (a panic message includes the address of where a system was last operating, line numbers, text and source code filenames, and other data) against known data elements that correspond to known bugs in the bugs database 152 on the server device 140.
  • a 'known panic bug?' decision procedure 407 determines whether the panic message identifies a known bug. If the 'known panic bug?' decision procedure 407 determines that the bug is a known bug, the core dump process 400 continues to a 'store core dump' procedure 423.
  • An 'extract back-trace' procedure 409 extracts the back-trace 162 from the core dump 160.
  • An 'analyze back-trace' procedure 411 allows the back-trace 162 to be analyzed using an expression algorithm that looks for exact matches on function names and recognized sequences of function names that correspond to known bugs within the bugs database 152.
  • a 'known back-trace bug?' decision procedure 413 determines whether the back-trace 162 identifies a known bug. If the 'known back-trace bug?' decision procedure 413 determines that the bug is a known bug, the core dump process 400 continues to a 'store core dump' procedure 423.
  • a 'core script analyzer' procedure 415 automatically analyzes the core dump
  • a 'known core bug?' decision procedure 417 determines whether core script analysis has identified a known bug. If the 'known core bug?' decision procedure 417 determines it has identified a known core bug, the core dump process 400 continues to a 'store core dump' procedure 423.
  • a 'manual core dump analysis' procedure 419 allows the core dump 160 to be analyzed manually by personnel at the manufacturer.
  • a 'manual solution sent to customer' procedure 421 allows personnel at the manufacturer to send a solution to the customer based on the manual analysis of the core dump 160.
  • the core dump process 400 continues to an 'automatic housekeeping' procedure 427.
  • a 'store core dump' procedure 423 allows the mini core dump 161 to be moved to a storage location.
  • a 'solution sent to customer' procedure 425 causes a solution to be extracted from the bugs database 152 which is associated with the identified bug.
  • the solution provided to the customer varies depending on the bug identified. For example, it can be written instructions detailing how to fix and avoid further occurrences, a copy of a software program to fix the problem, or recommendations for the purchase of additional products from the manufacturer that fix the problem.
  • An 'automatic housekeeping' procedure 427 records all relevant information regarding identification/non-identification of the bug, the solution sent to the customer (if any), statistics relating to these events, and any entries necessary to the bugs database 152.
  • functionality exists that allows the back-trace analyzer to teach the panic message analyzer about the bug. This allows future instances of the bug to be resolved at an earlier stage.
  • functionality exists that allows the core to teach the back-trace analyzer and panic message analyzer about the bug. This allows future instances of the bug to be resolved at an earlier stage.
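One way to picture the "teaching" described above is for a later stage, once it identifies a bug, to register the corresponding signature with the earlier stage. The sketch below shows this for the back-trace analyzer teaching the panic message analyzer. The class names, method names, and signature layout are assumptions, and the same idea would extend to a core-level stage teaching both earlier stages.

```python
from typing import Dict, Optional, Tuple

Signature = Tuple[str, int, str]   # assumed: (source filename, line number, text)
Trace = Tuple[str, ...]            # assumed: function names from the back-trace


class PanicMessageAnalyzer:
    def __init__(self) -> None:
        self.known: Dict[Signature, str] = {}

    def identify(self, signature: Signature) -> Optional[str]:
        return self.known.get(signature)

    def learn(self, signature: Signature, bug_id: str) -> None:
        self.known[signature] = bug_id


class BackTraceAnalyzer:
    def __init__(self, earlier_stage: PanicMessageAnalyzer) -> None:
        self.known: Dict[Trace, str] = {}
        self.earlier_stage = earlier_stage

    def identify(self, signature: Signature, trace: Trace) -> Optional[str]:
        bug_id = self.known.get(trace)
        if bug_id is not None:
            # Teach the earlier stage so the next instance resolves sooner.
            self.earlier_stage.learn(signature, bug_id)
        return bug_id


panic_stage = PanicMessageAnalyzer()
trace_stage = BackTraceAnalyzer(panic_stage)
trace_stage.known[("fs_read_block", "fs_verify_pointer")] = "BUG-1"

signature = ("fs.c", 20, "bad block pointer")
trace = ("fs_read_block", "fs_verify_pointer")

first = panic_stage.identify(signature)          # not yet known at this stage
found = trace_stage.identify(signature, trace)   # identified, then taught upstream
second = panic_stage.identify(signature)         # now resolved at the earlier stage
print(first, found, second)
```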
  • the core dump process 400 terminates through an 'end' terminal 429.
  • the invention has general applicability to various fields of use, not necessarily related to the services described above.
  • these fields of use can include one or more of, or some combination of, the following:

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)
  • Automatic Analysis And Handling Materials Therefor (AREA)
  • Stored Programmes (AREA)
  • Information Transfer Between Computers (AREA)
EP01973104A 2000-09-08 2001-09-10 Panic message analyzer Ceased EP1381952A2 (de)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US65820800A 2000-09-08 2000-09-08
US658208 2000-09-08
PCT/US2001/029049 WO2002021281A2 (en) 2000-09-08 2001-09-10 Panic message analyzer

Publications (1)

Publication Number Publication Date
EP1381952A2 (de) 2004-01-21

Family

ID=24640348

Family Applications (1)

Application Number Title Priority Date Filing Date
EP01973104A Ceased EP1381952A2 (de) 2000-09-08 2001-09-10 Panic message analyzer

Country Status (4)

Country Link
EP (1) EP1381952A2 (de)
JP (1) JP4979176B2 (de)
CA (1) CA2420008C (de)
WO (1) WO2002021281A2 (de)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7174352B2 (en) 1993-06-03 2007-02-06 Network Appliance, Inc. File system image transfer
US6138126A (en) 1995-05-31 2000-10-24 Network Appliance, Inc. Method for allocating files in a file system integrated with a raid disk sub-system
US7343529B1 (en) 2004-04-30 2008-03-11 Network Appliance, Inc. Automatic error and corrective action reporting system for a network storage appliance
EP2232367A4 (de) 2007-12-12 2011-03-09 Univ Washington Deterministic multiprocessing
WO2009114645A1 (en) * 2008-03-11 2009-09-17 University Of Washington Efficient deterministic multiprocessing
US8453120B2 (en) 2010-05-11 2013-05-28 F5 Networks, Inc. Enhanced reliability using deterministic multiprocessing-based synchronized replication
CN109542657A (zh) * 2018-10-16 2019-03-29 深圳壹账通智能科技有限公司 System exception handling method and server

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5111384A (en) * 1990-02-16 1992-05-05 Bull Hn Information Systems Inc. System for performing dump analysis
US5293612A (en) * 1989-05-11 1994-03-08 Tandem Computers Incorporated Selective dump method and apparatus
EP0586767A1 (de) * 1992-09-11 1994-03-16 International Business Machines Corporation Selective data capture for software exception conditions

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0291735A (ja) * 1988-09-28 1990-03-30 Tohoku Nippon Denki Software Kk Remote fault maintenance management system
JPH04335449A (ja) * 1991-05-13 1992-11-24 Nec Corp Terminal fault information collection method
SE470031B (sv) * 1991-06-20 1993-10-25 Icl Systems Ab System and method for monitoring and changing the operation of a computer system
JPH05334135A (ja) * 1992-05-28 1993-12-17 Nec Corp Error information display method upon abnormal program termination
US5761407A (en) * 1993-03-15 1998-06-02 International Business Machines Corporation Message based exception handler
JP2701807B2 (ja) * 1995-09-13 1998-01-21 日本電気株式会社 Fault notification device
JPH10228395A (ja) * 1997-02-17 1998-08-25 Sekisui Chem Co Ltd Abnormality diagnosis device for a controller
US6073255A (en) * 1997-05-13 2000-06-06 Micron Electronics, Inc. Method of reading system log
JPH1124961A (ja) * 1997-07-08 1999-01-29 Nippon Denki Joho Service Kk Computer maintenance system
JPH1139259A (ja) * 1997-07-15 1999-02-12 Casio Comput Co Ltd Information processing device and recording medium storing a program
JP3525410B2 (ja) * 1998-12-16 2004-05-10 富士通株式会社 Fault recovery method and computer-readable program recording medium therefor
JP2000181734A (ja) * 1998-12-16 2000-06-30 Fujitsu Ltd Method and system for repairing a program reference area, program execution-side device, program fault handling device, and computer-readable program recording medium therefor

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of WO0221281A3 *

Also Published As

Publication number Publication date
WO2002021281A3 (en) 2003-11-06
CA2420008C (en) 2012-04-03
CA2420008A1 (en) 2002-03-14
WO2002021281A2 (en) 2002-03-14
JP4979176B2 (ja) 2012-07-18
JP2004524596A (ja) 2004-08-12

Similar Documents

Publication Publication Date Title
US7475387B2 (en) Problem determination using system run-time behavior analysis
US7984007B2 (en) Proactive problem resolution system, method of proactive problem resolution and program product therefor
US7328376B2 (en) Error reporting to diagnostic engines based on their diagnostic capabilities
US6859893B2 (en) Service guru system and method for automated proactive and reactive computer system analysis
US8140565B2 (en) Autonomic information management system (IMS) mainframe database pointer error diagnostic data extraction
US7080287B2 (en) First failure data capture
US8250563B2 (en) Distributed autonomic solutions repository
US8244792B2 (en) Apparatus and method for information recovery quality assessment in a computer system
US7007200B2 (en) Error analysis fed from a knowledge base
US7305465B2 (en) Collecting appliance problem information over network and providing remote technical support to deliver appliance fix information to an end user
US20050081118A1 (en) System and method of generating trouble tickets to document computer failures
US20050022176A1 (en) Method and apparatus for monitoring compatibility of software combinations
US20040236843A1 (en) Online diagnosing of computer hardware and software
US20160026547A1 (en) Generating predictive diagnostics via package update manager
US20070038896A1 (en) Call-stack pattern matching for problem resolution within software
JPH0644242B2 (ja) Problem-solving method in a computer system
CN101918922A (zh) System and method for automatic data anomaly correction in a computer network
NZ526097A (en) Online diagnosing of computer hardware and software from a remote location without requiring human assistance
US6957366B1 (en) System and method for an interactive web-based data catalog for tracking software bugs
US20060088027A1 (en) Dynamic log for computer systems of server and services
CN111444101A (zh) Method and device for automatically creating product test defects
CA2420008C (en) Panic message analyzer
US20070011541A1 (en) Methods and systems for identifying intermittent errors in a distributed code development environment
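JP2003345628A (ja) Method for collecting fault investigation data, system for implementing it, and processing program therefor
CN114371870B (zh) Code scanning and submission methods, and code scanning server, client, and server side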

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20030404

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): DE FR GB IT NL

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: NETWORK APPLIANCE, INC.

REG Reference to a national code

Ref country code: DE

Ref legal event code: R003

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED

18R Application refused

Effective date: 20111206