EP1381952A2 - Panic message analyzer - Google Patents
Info
- Publication number
- EP1381952A2 EP01973104A
- Authority
- EP
- European Patent Office
- Prior art keywords
- message
- bugs
- customer
- database
- version
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Prevention of errors by analysis, debugging or testing of software
- G06F11/362—Debugging of software
- G06F11/366—Debugging of software using diagnostics
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/22—Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
- G06F11/2294—Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing by remote test
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/32—Monitoring with visual or acoustical indication of the functioning of the machine
- G06F11/324—Display of status information
- G06F11/327—Alarm or error message display
Abstract
The invention provides a method and system for automatically obtaining and analyzing error messages from end users of software and hardware products. Additionally, the invention provides a method and system of providing solutions that automatically and manually correct the errors. An ever-growing database of errors and solutions is maintained so that future identical or similar problems are expedited without human intervention. Successful analysis at any but the lowest level automatically allows previous levels to be taught greater efficiency for future analysis of the same or similar errors.
Description
PANIC MESSAGE ANALYZER
Background of the Invention
1. Field of the Invention
This invention relates to analysis of panic messages from network servers.
2. Related Art
Computers rely on programmed instructions to direct their operation. General purpose computers most often receive these instructions from software that executes within their memory. Despite the fact that software engineers vigorously test programs to eliminate the presence of coded instructions that may cause errors (commonly known as bugs), the presence of bugs is practically unavoidable in simple programs and a foregone conclusion in complex programs.
When a computer executes a program instruction that causes an error, the error may have relatively little effect on the system, or it may cause the system to crash. All errors are of importance to software engineers; however, those that cause a catastrophic result, such as a crash, are of primary importance. Generally, systems are designed to alert a system operator that they have suffered some type of failure. Error messages provide this alert, and these messages are useful when reporting errors to a manufacturer of the software.
A first known method to enable reporting of a software application error is to provide a pre-public release of a software package to a select group of customers for "beta testing." During this trial period, customers report to the company any problems that they encounter, and the software engineers at the company fix the bugs and provide updated versions of the software to the beta testers, who continue testing with the new version. This process continues for a short testing period until the software is, ideally, error free.
While this first known method provides reporting of software bugs to a manufacturer, it suffers from several drawbacks. First, it provides no method for automatically reporting the problem to the manufacturer. It relies solely on the beta tester to inform the manufacturer. Second, it provides no automated analysis of a problem identified by a beta tester. That is, it requires an employee at the manufacturer to determine whether the problem has already been reported, has been fixed, or is a new problem. Third, it provides no method for delivery of updated software to a user who is determined to be using older software with an identified and fixed problem.
A second known method of reporting computer system errors is to rely on the end user to call the manufacturer and report a problem when it occurs. The customer is provided a customer support line that they may call to report problems they are having.
Based on the frequency and subject matter of calls received, the manufacturer may conclude there is a problem with some portion of a program.
While this second known method provides reporting of software bugs to a manufacturer, it suffers from several drawbacks. First, the customer may decide not to call, as customer support calls tend to involve long waits on hold listening to Muzak and often provide no relief because the manufacturer has no formal structure in place to coordinate and analyze the calls it receives. Second, the customer may not be knowledgeable enough to provide the manufacturer with the information it needs to diagnose the problem, or worse, the customer may misinform the manufacturer as to the origin of the problem.
Accordingly, it would be advantageous to provide a method by which computer system errors are reliably reported to a manufacturer, with the process automated to the level of determining whether the problem is known or unknown. If the problem is known, the method would provide channels for delivery of updated software to clients; if unknown, it would provide a way to obtain and analyze the necessary information from the suspect system to enable diagnosis and correction of the errors.
Summary of the Invention
The invention includes a system and method for analyzing panic messages from computer systems that have suffered failures. One of the last processes a filer (server dedicated to file storage and retrieval) performs before it crashes is to render a panic message. This message is indicative of the problem that caused the filer to crash. This message is sent to the manufacturer via a communications network such as the Internet. The message also includes other information, such as the user's name, the version of the software, a back trace, and a mini core dump.
At the manufacturer, automatic analysis commences to determine if the bug can be identified. First, the panic message is analyzed by comparing it against a database of panic messages that correspond with known bugs. If successful, automated housekeeping occurs, which includes updating this instance in a tracking database, delivering an answer to the customer (including solutions), updating analysis statistics, and additional activities. If unsuccessful, the process continues.
Second, a back trace analyzer analyzes the back trace using an expression algorithm that looks for exact matches on function names and recognized sequences of matches that correspond to known bugs. If successful, automated housekeeping occurs as indicated above. If unsuccessful, the process continues.
Third, a core script analyzer analyzes a core dump for recognizable patterns of code that correspond to known bugs. If successful, automated housekeeping occurs as indicated above. If unsuccessful, the process continues.
Fourth, if the automated methods above fail to identify the problem as a known bug, the core is manually analyzed to detect known and unknown bugs. Manual and automatic housekeeping is performed following this manual analysis.
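The staged escalation summarized above can be sketched as an ordered list of analyzers that is tried until one reports a known bug. This is only an illustrative sketch, not the patented implementation: the names `Report`, `TIERS`, `analyze`, the lookup tables, and the bug identifiers are all hypothetical, and the core script tier is a placeholder.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# A report is modeled as a plain dict; an analyzer returns a bug ID or None.
Report = Dict[str, Any]
Analyzer = Callable[[Report], Optional[str]]

# Hypothetical lookup tables standing in for the database of known bugs.
KNOWN_PANIC_TEXT = {"panic: bad block pointer": "BUG-0001"}
KNOWN_TRACE_TAIL = {("fs_write", "raid_io"): "BUG-0002"}


def panic_tier(report: Report) -> Optional[str]:
    """First stage: exact lookup of the panic message text."""
    return KNOWN_PANIC_TEXT.get(str(report.get("panic_message", "")))


def back_trace_tier(report: Report) -> Optional[str]:
    """Second stage: lookup on the last two function names of the back trace."""
    trace = tuple(report.get("back_trace", ()))
    return KNOWN_TRACE_TAIL.get(trace[-2:])


def core_script_tier(report: Report) -> Optional[str]:
    """Third stage: placeholder; a real script would inspect the core dump."""
    return None


TIERS: List[Tuple[str, Analyzer]] = [
    ("panic message", panic_tier),
    ("back trace", back_trace_tier),
    ("core script", core_script_tier),
]


def analyze(report: Report) -> Tuple[str, Optional[str]]:
    """Return (stage name, bug ID) for the first stage that identifies a bug."""
    for name, tier in TIERS:
        bug_id = tier(report)
        if bug_id is not None:
            return name, bug_id
    # No automated stage succeeded, so the case falls to manual analysis.
    return "manual analysis", None
```

Ordering the stages from cheapest to most expensive means most reports would be resolved without a full core dump ever being requested, which appears to be the intent of the four steps.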
Brief Description of the Drawings
Figure 1 illustrates a block diagram of a system for a panic message analyzer.
Figure 2 illustrates a panic message analyzer process in a system for a panic message analyzer.
Figure 3 illustrates an auto support process in a system for a panic message analyzer.
Figure 4 illustrates a core dump process in a system for a panic message analyzer.
Detailed Description of the Preferred Embodiment
In the following description, a preferred embodiment of the invention is described with regard to preferred process steps and data structures. Embodiments of the invention can be implemented using general purpose processors or special purpose processors operating under program control, or other circuits, adapted to particular process steps and data structures described herein. Implementation of the process steps and data structures described herein would not require undue experimentation or further investigation.
Lexicography
The following terms refer to or relate to aspects of the invention as described below. The descriptions of general meanings of these terms are not intended to be limiting, only illustrative.
• filer - This term refers to a file server. A file server is a computer and storage device dedicated to data storage and retrieval.
• Core dump - A core dump is the printing, or the copying to a more permanent medium (such as a hard disk), of the contents of random access memory at one moment in time.
• Mini core dump - A subset of data from a core dump.
• Back-trace - A list of computer instructions in the reverse order they were executed.
As noted above, these descriptions of general meanings of these terms are not intended to be limiting, only illustrative. Other and further applications of the invention, including extensions of these terms and concepts, would be clear to those of ordinary skill in the art after perusing this application. These other and further applications are part of the scope and spirit of the invention, and would be clear to those of ordinary skill in the art, without further invention or undue experimentation.
System Elements
Figure 1 shows a block diagram of a system for a panic message analyzer.
A system 100 includes a client device 110 associated with a customer, a communications link 120, a communications network 130, a server device 140 associated with a manufacturer, a mass storage 150, a housekeeping database 151, a bugs database 152, and a core dump 160.
The client device 110 includes a processor, a main memory, and software for executing instructions (not shown, but understood by one skilled in the art). Although the client device 110 and server device 140 are shown as separate devices, there is no requirement that they be separate devices.
The communications link 120 operates to couple the client device 110 to the communications network 130.
The server device 140 includes a processor, a main memory, software for executing instructions (not shown, but understood by one skilled in the art), and a mass storage 150. Although the client device 110 and server device 140 are shown as separate devices, there is no requirement that they be separate devices. Additionally, although the server device 140 and mass storage 150 are shown as combined, there is no requirement that they be combined. They could be separate devices.
The mass storage 150 includes the housekeeping database 151 and bugs database 152.
The core dump 160 includes a mini core dump 161, a back-trace 162, and a panic message 163.
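As a rough illustration of how the core dump 160 and its parts might be modeled, the sketch below groups the three named elements into one record. The class name, field names, and the `auto_support_payload` helper are assumptions made for illustration; the patent does not specify any data layout.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class CoreDump:
    """Hypothetical model of core dump 160 and the elements it includes."""

    panic_message: str                                    # panic message 163
    back_trace: List[str] = field(default_factory=list)   # back-trace 162
    mini_core_dump: bytes = b""                           # mini core dump 161
    full_image: bytes = b""                               # rest of the memory image

    def auto_support_payload(self) -> Dict[str, Any]:
        """Return only the small pieces a client might send automatically."""
        return {
            "panic_message": self.panic_message,
            "mini_core_dump": self.mini_core_dump,
        }
```

Keeping the full memory image separate from the smaller elements reflects the idea that the complete core dump is only needed when the lighter-weight data fails to identify the bug.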
Method of Operation - Manual Message Processing
Figure 2 illustrates a panic message analyzer process, indicated by general reference character 200. The manual panic message analyzer process 200 initiates at a 'start' terminal 201. The panic message analyzer process 200 continues to a 'panic message created' procedure 203 which allows the customer's device to create a panic message 163 prior to failure.
A 'customer submits panic message' procedure 205 allows the customer to submit the panic message 163 for analysis, utilizing the client device 110 to transmit the panic message 163 to the server device 140. In a preferred embodiment, the customer submits the message via interaction and transfer over an Internet connection, which is well-known in the art. There is, however, no requirement that the panic message 163 be transferred by this method as long as it is delivered to the manufacturer.
An 'analyze panic message' procedure 207 allows the panic message 163 to be analyzed by comparing recognized data elements it contains (a panic message includes the address of where a system was last operating, line numbers, text and source code filenames, and other data) against known data elements that correspond to known bugs in the bugs database 152 on the server device 140.
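The comparison of recognized data elements against known data elements could look something like the following minimal sketch. It assumes the elements have already been parsed out of the panic message, and it uses a small in-memory dictionary as a stand-in for the bugs database 152. The `PanicSignature` fields, file names, and bug identifiers are hypothetical.

```python
from typing import Dict, NamedTuple, Optional


class PanicSignature(NamedTuple):
    """Data elements recognized in a panic message (hypothetical selection)."""

    source_file: str
    line: int
    text: str


# Hypothetical stand-in for the bugs database 152.
KNOWN_PANICS: Dict[PanicSignature, str] = {
    PanicSignature("raid.c", 10, "bad stripe"): "BUG-0001",
}


def match_panic(signature: PanicSignature) -> Optional[str]:
    """Return the known bug ID only if every data element matches exactly."""
    return KNOWN_PANICS.get(signature)
```

Because every element must match exactly, the same defect reported from a different line number would not be found by this lookup, which is one reason later analysis stages are useful.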
A 'known bug?' decision procedure 209 determines whether the panic message identifies a known bug. If the 'known bug?' decision procedure 209 determines that the bug is a known bug, the panic message analyzer process 200 continues to a 'solution to customer' procedure 213.
An 'investigate via auto support' procedure 211 takes note that analysis of the customer-submitted panic message 163 failed to identify the problem with the affected system and that further investigation is required. The panic message analyzer process 200 continues to an 'automatic housekeeping' procedure 215.
The 'solution to customer' procedure 213 extracts a solution from the database which is associated with the bug identified by the 'known bug?' decision procedure 209. The solution provided to the customer can be written instructions detailing how to fix and avoid further occurrences, a copy of a software program to fix the problem, or recommendations for the purchase of additional products from the manufacturer that fix the problem.
An 'automatic housekeeping' procedure 215 records all relevant information regarding identification/non-identification of the bug, the solution sent to the customer (if any), and statistics relating to these events in the housekeeping database 151. If the panic message analyzer failed to diagnose the problem, the 'automatic housekeeping' procedure leaves the case active (i.e. marked as unresolved).
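One way to picture the housekeeping step is as a small log that stores one record per case and keeps running counts. The sketch below is an assumption-laden illustration: `CaseRecord`, `HousekeepingLog`, and the statistic names are invented here, and a real housekeeping database 151 would presumably be persistent rather than in memory.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CaseRecord:
    case_id: int
    bug_id: Optional[str]    # None when the bug was not identified
    solution: Optional[str]  # None when no solution was sent
    active: bool             # True while the case remains unresolved


class HousekeepingLog:
    """Hypothetical in-memory stand-in for the housekeeping database 151."""

    def __init__(self) -> None:
        self.cases: List[CaseRecord] = []
        self.stats: Dict[str, int] = {"identified": 0, "unidentified": 0}

    def record(self, bug_id: Optional[str], solution: Optional[str]) -> CaseRecord:
        """Store the outcome of one analysis and update the statistics."""
        identified = bug_id is not None
        case = CaseRecord(
            case_id=len(self.cases) + 1,
            bug_id=bug_id,
            solution=solution,
            active=not identified,  # undiagnosed cases stay active
        )
        self.cases.append(case)
        self.stats["identified" if identified else "unidentified"] += 1
        return case
```

A diagnosed case would be recorded with `log.record("BUG-0001", "install patch")`, while `log.record(None, None)` leaves the case marked active for further investigation.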
Method of Operation - Auto Support Analysis
Figure 3 illustrates an auto support process, indicated by general reference character 300. The auto support process 300 initiates at a 'start' terminal 301. The auto support process 300 continues to an 'auto support message sent' procedure 303 which allows the client device 110 to automatically send a message to the server device 140 containing a copy of the panic message 163 and mini core dump 161.
An 'auto support message received' procedure 305 allows the server device 140 to receive the panic message 163 and mini core dump 161 from the client device 110.
An 'analyze panic message' procedure 307 allows the panic message 163 to be analyzed by comparing recognized data elements it contains (a panic message includes the address of where a system was last operating, line numbers, text and source code filenames, and other data) against known data elements that correspond to known bugs in the bugs database 152 on the server device 140.
A 'known panic bug?' decision procedure 309 determines whether the panic message identifies a known bug. If the 'known panic bug?' decision procedure 309 determines that the bug is a known bug, the auto support process 300 continues to a 'discard mini core dump' procedure 321.
An 'extract back-trace' procedure 311 extracts the back-trace 162 from the mini core dump 161.
An 'analyze back-trace' procedure 313 allows the back-trace 162 to be analyzed using an expression algorithm that looks for exact matches on function names and recognized sequences of function names that correspond to known bugs in the bugs database 152 on the server device 140.
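The specification does not define the expression algorithm further. One plausible reading, sketched below, treats each known bug as an ordered list of function names and reports a match when that list appears as a contiguous run in the back-trace, so that a one-element list amounts to an exact match on a single function name. The function names, bug IDs, and the contiguity assumption are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def contains_sequence(trace: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if `pattern` occurs as a contiguous run of function names in `trace`."""
    n, m = len(trace), len(pattern)
    if m == 0:
        return False
    return any(list(trace[i:i + m]) == list(pattern) for i in range(n - m + 1))


def match_back_trace(trace: Sequence[str],
                     known: Dict[str, List[str]]) -> Optional[str]:
    """Return the first bug ID whose function-name pattern appears in the trace."""
    for bug_id, pattern in known.items():
        if contains_sequence(trace, pattern):
            return bug_id
    return None


# Hypothetical back-trace and known-bug patterns.
trace = ["panic", "raid_write", "stripe_lock", "disk_io"]
known_bugs = {"BUG-7": ["raid_write", "stripe_lock"]}
```

Because the comparison uses function names rather than line numbers, a match of this kind is unaffected when code shifts between software versions.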
A 'known back-trace bug?' decision procedure 315 determines whether the back-trace 162 identifies a known bug. If the 'known back-trace bug?' decision procedure 315 determines that the bug is a known bug, the auto support process 300 continues to a "discard mini core dump" procedure 321.
A 'request core dump' procedure 317 notifies the customer that a core dump 160 of the affected system is required. This notification includes all the instructions necessary to create the core dump 160 and deliver it to the manufacturer. In a preferred embodiment, the notification would be sent electronically to the customer; however, there is no requirement that notification be accomplished in this manner.
An 'automatic housekeeping' procedure 319 records all relevant information regarding identification/non-identification of the bug, the solution sent to the customer (if any), and statistics relating to these events in the housekeeping database 151. If the panic message analyzer failed to diagnose the problem, the 'automatic housekeeping' procedure leaves the case active (i.e. marked as unresolved).
Additionally, if the bug was identified by analysis of the back-trace, functionality exists that allows the back-trace analyzer to teach the panic message analyzer about the bug. This allows future instances of the bug to be resolved at an earlier stage.
For example, if the bug first appeared in version one of the software at line 10, the panic message analyzer would not identify it in version two if the bug now appeared at line 20 due to the exact matching methodology used. The back-trace analyzer, however, might identify the bug as it uses a more sophisticated approach, and it would then pass this information to the panic message analyzer. The auto support process 300 terminates through an 'end' terminal 325.
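The version-one/version-two example above can be traced with a small sketch. The `PanicAnalyzer` class, its key layout of (version, filename, line), and the bug ID are hypothetical. The sketch assumes that "teaching" amounts to adding a new exact-match entry to the panic message analyzer once the back-trace analyzer has identified the bug.

```python
from typing import Dict, Optional, Tuple

# Panic-message signatures keyed by (version, filename, line): hypothetical layout.
PanicKey = Tuple[str, str, int]


class PanicAnalyzer:
    def __init__(self) -> None:
        self.signatures: Dict[PanicKey, str] = {}

    def analyze(self, key: PanicKey) -> Optional[str]:
        """Exact match only: any change in version or line number misses."""
        return self.signatures.get(key)

    def learn(self, key: PanicKey, bug_id: str) -> None:
        """Record a signature supplied by a later analysis stage."""
        self.signatures[key] = bug_id


analyzer = PanicAnalyzer()
analyzer.learn(("v1", "raid.c", 10), "BUG-1")   # bug first seen in version one, line 10

v2_key = ("v2", "raid.c", 20)                   # same bug, now at line 20 in version two
first_try = analyzer.analyze(v2_key)            # exact match fails

# Suppose the back-trace analyzer identified the same bug for this case.
bug_from_back_trace = "BUG-1"
analyzer.learn(v2_key, bug_from_back_trace)     # teach the earlier stage

second_try = analyzer.analyze(v2_key)           # resolved at the panic-message stage
```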
A 'discard mini core dump' procedure 321 causes the mini core dump 161 to be discarded as it is no longer needed due to identification of the bug.
A 'solution sent to customer' procedure 323 causes a solution to be extracted from the bugs database 152 which is associated with the identified bug. The solution provided to the customer varies depending on the bug identified. For example, it can be written instructions detailing how to fix and avoid further occurrences, a copy of a software program to fix the problem, or recommendations for the purchase of additional products from the manufacturer that fix the problem.
The auto support process 300 continues to an 'automatic housekeeping' procedure 319.
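The order of decisions in the auto support process 300 can be summarized in a short sketch. The function name, the action strings, and the stand-in analyzers are assumptions for illustration; each analyzer is modeled as a callable that returns a bug ID or `None`.

```python
from typing import Callable, Optional, Tuple

Analyzer = Callable[[str], Optional[str]]


def auto_support(panic_message: str, back_trace: str,
                 analyze_panic: Analyzer,
                 analyze_back_trace: Analyzer) -> Tuple[str, Optional[str]]:
    """Return (action, bug_id) following the order of decisions in Figure 3."""
    bug = analyze_panic(panic_message)            # procedures 307 and 309
    if bug is None:
        bug = analyze_back_trace(back_trace)      # procedures 311, 313 and 315
    if bug is None:
        return ("request_core_dump", None)        # procedure 317
    # Procedures 321 and 323: the mini core dump is no longer needed.
    return ("discard_mini_core_dump_and_send_solution", bug)


# Hypothetical stand-in analyzers backed by small lookup tables.
panic_table = {"raid.c:10": "BUG-1"}
trace_table = {"raid_write>stripe_lock": "BUG-7"}

known_by_panic = auto_support("raid.c:10", "", panic_table.get, trace_table.get)
known_by_trace = auto_support("raid.c:20", "raid_write>stripe_lock",
                              panic_table.get, trace_table.get)
unknown = auto_support("fs.c:5", "fs_read", panic_table.get, trace_table.get)
```

In every branch the process then proceeds to the 'automatic housekeeping' procedure 319, which is omitted from the sketch.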
Method of Operation - Core Dump Analysis
Figure 4 illustrates a core dump process, indicated by general reference character 400. The core dump process 400 initiates at a 'start' terminal 401. The core dump process 400 continues to a 'core arrives from customer' procedure 403 which allows analysis of the core dump 160 to begin. The core dump 160 is requested by a 'request core dump' procedure 317 (illustrated in Figure 3) when prior analysis of the panic message 163 and back-trace 162 has failed. These two analysis techniques, however, are duplicated during the core dump process 400, further providing fail-safe systematic analysis.
An 'analyze panic message' procedure 405 allows the panic message 163 to be analyzed by comparing recognized data elements it contains (a panic message includes the address of where a system was last operating, line numbers, text and source code filenames, and other data) against known data elements that correspond to known bugs in the bugs database 152 on the server device 140.
A 'known panic bug?' decision procedure 407 determines whether the panic message identifies a known bug. If the 'known panic bug?' decision procedure 407 determines that the bug is a known bug, the core dump process 400 continues to a 'store core dump' procedure 423.
An 'extract back-trace' procedure 409 extracts the back-trace 162 from the core dump 160.
An 'analyze back-trace' procedure 411 allows the back-trace 162 to be analyzed using an expression algorithm that looks for exact matches on function names and recognized sequences of function names that correspond to known bugs within the bugs database 152.
A 'known back-trace bug?' decision procedure 413 determines whether the back-trace 162 identifies a known bug. If the 'known back-trace bug?' decision procedure 413 determines that the bug is a known bug, the core dump process 400 continues to a 'store core dump' procedure 423.
A 'core script analyzer' procedure 415 automatically analyzes the core dump 160 by searching for data elements in the core dump 160 that correspond to known bugs within the bugs database 152.
A 'known core bug?' decision procedure 417 determines whether core script analysis has identified a known bug. If the 'known core bug?' decision procedure 417 determines it has identified a known core bug, the core dump process 400 continues to a 'store core dump' procedure 423.
A 'manual core dump analysis' procedure 419 allows the core dump 160 to be analyzed manually by personnel at the manufacturer.
A 'manual solution sent to customer' procedure 421 allows personnel at the manufacturer to send a solution to the customer based on the manual analysis of the core dump 160. The core dump process 400 continues to an 'automatic housekeeping' procedure 427.
A 'store core dump' procedure 423 allows the mini core dump 161 to be moved to a storage location.
A 'solution sent to customer' procedure 425 causes a solution to be extracted from the bugs database 152 which is associated with the identified bug. The solution provided to the customer varies depending on the bug identified. For example, it can be written instructions detailing how to fix and avoid further occurrences, a copy of a software program to fix the problem, or recommendations for the purchase of additional products from the manufacturer that fix the problem.
An 'automatic housekeeping' procedure 427 records all relevant information regarding identification/non-identification of the bug, the solution sent to the customer (if any), statistics relating to these events, and any entries necessary to the bugs database 152.
Additionally, if the bug was identified by analysis of the back-trace, functionality exists that allows the back-trace analyzer to teach the panic message analyzer about the bug. This allows future instances of the bug to be resolved at an earlier stage.
Furthermore, if the bug was identified by analysis of the core, functionality exists that allows the core to teach the back-trace analyzer and panic message analyzer about the bug. This allows future instances of the bug to be resolved at an earlier stage.
The core dump process 400 terminates through an 'end' terminal 429.
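The core dump process 400 can be viewed as a cascade of analysis stages in which a later stage that identifies a bug teaches every earlier stage. The sketch below illustrates that idea only. The `Stage` class, the string signatures, and the return values are hypothetical, and each stage is reduced to a lookup table for brevity.

```python
from typing import Dict, List, Optional


class Stage:
    """One analysis stage with its own table of known signatures (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.known: Dict[str, str] = {}   # signature -> bug ID

    def analyze(self, signature: str) -> Optional[str]:
        return self.known.get(signature)


def run_core_dump_process(stages: List[Stage], evidence: Dict[str, str]) -> str:
    """Try stages in order; on a hit, teach every earlier stage and stop."""
    for i, stage in enumerate(stages):
        bug = stage.analyze(evidence[stage.name])
        if bug is not None:
            for earlier in stages[:i]:
                earlier.known[evidence[earlier.name]] = bug
            return f"{stage.name}:{bug}"
    return "manual_analysis"              # no automatic stage recognized the bug


# Hypothetical case: only the core script analyzer knows the bug at first.
stages = [Stage("panic"), Stage("back_trace"), Stage("core")]
stages[2].known["corrupt-inode-pattern"] = "BUG-9"
evidence = {
    "panic": "raid.c:20",
    "back_trace": "raid_write>stripe_lock",
    "core": "corrupt-inode-pattern",
}

first = run_core_dump_process(stages, evidence)    # identified by the core stage
second = run_core_dump_process(stages, evidence)   # now caught by the panic stage
```

After the first run, a repeat of the same failure is recognized at the panic-message stage, which corresponds to resolving future instances at an earlier stage.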
Generality of the Invention
The invention has general applicability to various fields of use, not necessarily related to the services described above. For example, these fields of use can include one or more of, or some combination of, the following:
• In addition to general applicability to file servers the invention has broad applicability to networks, network devices, and all types of software. Other and further applications of the invention, in its most general form, will be clear to those skilled in the art after perusal of this application, and are within the scope and spirit of the invention.
Alternate Embodiments
Although preferred embodiments are disclosed herein, many variations are possible which remain within the concept, scope, and spirit of the invention, and these variations would become clear to those skilled in the art after perusal of this application.
Claims
1. A method of analyzing computer error messages comprising;
receiving a message including information regarding an extraordinary event comprising at least an address value, source code, and other information associated with said event; and applying a set of rules to said message; and generating an action in response to said message.
2. The method of claim 1, wherein said other information includes some text other than said address value.
3. The method of claim 1, wherein said message is responsive to an operation taken by a customer.
4. The method of claim 3, wherein said operation is electronic transfer of said message from said customer to a manufacturer via a communications network.
5. The method of claim 4, wherein said electronic transfer of said message includes a product release version as entered by said customer.
6. The method of claim 1, wherein said applying a set of rules compares said message against a first database of known bugs.
7. The method of claim 6, wherein said applying a set of rules further comprises identifying said message as matching a known bug in a first database.
8. The method of claim 7, wherein said known bug is linked to solutions associated with said version.
9. The method of claim 8, wherein a solution is extracted from said solutions by matching said version entered by said customer to said version associated with said known bug.
10. The method of claim 1, wherein said action is delivery of said solution to said customer.
11. The method of claim 10, wherein said delivery is by transmission over a communications network.
12. The method of claim 10, wherein said delivery is by a mail service.
13. The method of claim 1, wherein said action further comprises recording statistics to a second database.
14. The method of claim 13, wherein said statistics further comprise information related to said message.
15. A method of analyzing computer error messages comprising;
receiving a message including information comprising a sequence of executed program instructions; applying a set of rules to said sequence; and generating an action in response to said message.
16. The method of claim 15, wherein said message is responsive to a problem with a device utilized by said customer and manufactured by said manufacturer.
17. The method of claim 16, wherein said message further comprises an address value and a product release version.
18. The method of claim 17, wherein said message is electronically transferred via said communications network.
19. The method of claim 15 wherein said message is compared against a first database of known bugs.
20. The method of claim 19, wherein said applying a set of rules to said sequence comprises identifying some portion of said sequence as matching at least one of said known bugs in said first database.
21. The method of claim 20, wherein said known bugs are associated with solutions categorized by said version.
22. The method of claim 20, wherein a solution is extracted from said solutions by further matching said version received from said customer to said version associated with said solutions.
23. The method of claim 15, wherein said action is delivery of said solution to said customer.
24. The method of claim 23, wherein said delivery is by transmission over a communications network.
25. The method of claim 23, wherein said delivery is by a mail service.
26. The method of claim 15, wherein said action further comprises recording statistics to a second database.
27. The method of claim 26, wherein said statistics further comprise information related to said message.
28. The method of claim 26, wherein said action teaches, when possible, a prior identification process to recognize said known bugs.
29. A method for analyzing computer error messages comprising;
receiving a message including information comprising the entire contents of a random access memory of a device; applying a set of rules to said contents; and generating an action in response to said message.
30. The method of claim 29, wherein said message is responsive to a request by the manufacturer and to an initial problem with a device owned by said customer and manufactured by said manufacturer.
31. The method of claim 29, wherein said contents further comprises a sequence of executed program instructions, an address value, and a product release version.
32. The method of claim 29, wherein said message is electronically transferred via a communications network.
33. The method of claim 29, wherein said applying a set of rules further comprises scanning said contents for patterns of data that indicate known bugs.
34. The method of claim 33, wherein said scanning identifies some portion of said contents as matching at least one of said known bugs in said first database.
35. The method of claim 34, wherein said known bugs are associated with solutions categorized by said version.
36. The method of claim 35, wherein a solution is extracted from said solutions by matching said version received from said customer to said version associated with said known bug.
37. The method of claim 29, wherein said action is delivery of said solution to said customer.
38. The method of claim 37, wherein said delivery is by transmission over a network.
39. The method of claim 37, wherein said delivery is by a mail service.
40. The method of claim 29, wherein said action further comprises recording statistics to said second database.
41. The method of claim 40, wherein said statistics further comprise information related to said message.
42. The method of claim 29, wherein said action teaches, when possible, prior identification processes to recognize said known bugs.
43. Apparatus including a memory, said memory having storage capable of holding information, said information including;
information identifying a product release version; information identifying a set of bugs in an earliest release; and information regarding duplicate bugs.
44. The apparatus of claim 43, wherein said version is stored in a first database.
45. The apparatus of claim 43, wherein said set of bugs is stored in said first database.
46. The apparatus of claim 43, wherein said version and said set of bugs are associated.
47. The apparatus of claim 43, wherein said duplicate bugs are stored in said first database.
48. The apparatus of claim 47, wherein duplicate like bugs are removed from said first database.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US65820800A | 2000-09-08 | 2000-09-08 | |
US658208 | 2000-09-08 | ||
PCT/US2001/029049 WO2002021281A2 (en) | 2000-09-08 | 2001-09-10 | Panic message analyzer |
Publications (1)
Publication Number | Publication Date |
---|---|
EP1381952A2 (en) | 2004-01-21 |
Family
ID=24640348
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP01973104A Ceased EP1381952A2 (en) | 2000-09-08 | 2001-09-10 | Panic message analyzer |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP1381952A2 (en) |
JP (1) | JP4979176B2 (en) |
CA (1) | CA2420008C (en) |
WO (1) | WO2002021281A2 (en) |
2001
- 2001-09-10 WO PCT/US2001/029049 patent/WO2002021281A2/en active Application Filing
- 2001-09-10 EP EP01973104A patent/EP1381952A2/en not_active Ceased
- 2001-09-10 JP JP2002524828A patent/JP4979176B2/en not_active Expired - Fee Related
- 2001-09-10 CA CA2420008A patent/CA2420008C/en not_active Expired - Fee Related
Also Published As
Publication number | Publication date |
---|---|
JP2004524596A (en) | 2004-08-12 |
JP4979176B2 (en) | 2012-07-18 |
CA2420008A1 (en) | 2002-03-14 |
CA2420008C (en) | 2012-04-03 |
WO2002021281A3 (en) | 2003-11-06 |
WO2002021281A2 (en) | 2002-03-14 |
Legal Events
Code | Title | Description |
---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
17P | Request for examination filed | Effective date: 20030404 |
AK | Designated contracting states | Kind code of ref document: A2; Designated state(s): DE FR GB IT NL |
RAP1 | Party data changed (applicant data changed or rights of an application transferred) | Owner name: NETWORK APPLIANCE, INC. |
REG | Reference to a national code | Ref country code: DE; Ref legal event code: R003 |
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED |
18R | Application refused | Effective date: 20111206 |