CN103577273A - Second failure data capture in co-operating multi-image systems - Google Patents
- Publication number
- CN103577273A (application number CN201310343980.0A)
- Authority
- CN
- China
- Prior art keywords
- information
- software
- fault
- images
- thread
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3668—Software testing
- G06F11/3672—Test management
- G06F11/3688—Test management for test execution, e.g. scheduling of test suites
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0706—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
- G06F11/0715—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a system implementing multitasking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3466—Performance evaluation by tracing or monitoring
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0706—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0706—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
- G06F11/0712—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a virtual computing platform, e.g. logically partitioned systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0751—Error or fault detection not based on redundancy
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0766—Error or fault reporting or storing
- G06F11/0778—Dumping, i.e. gathering error/state information after a fault for later diagnosis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/079—Root cause analysis, i.e. error or fault diagnosis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0793—Remedial or corrective actions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/362—Software debugging
- G06F11/3636—Software debugging by tracing the execution of the program
Abstract
A method, computer system, and computer program capture diagnostic trace information in a computer system having a plurality of software images. Information is received that is associated with a first failure in a first one of the plurality of software images. The received information is distributed to others of the plurality of software images. Further information is captured that is associated with a second failure in another one of the plurality of software images. Distributing the information comprises distributing a first part of the information to a first plurality of the software images and distributing a second part of the information to a second plurality of the software images.
Description
Technical field
The present invention relates to the automatic capture of diagnostic data in computer systems, and in particular to the automatic capture of diagnostic data in co-operating multi-image computer systems.
Background
The automatic capture of diagnostic data is well known in computer systems. In particular, it is commonly used in complex and/or long-running applications to allow problems to be resolved quickly without having to reproduce the failure on a live or backup system. A known solution is to provide First Failure Data Capture (FFDC) in the form of dumps, logs, and trace files, with data capture triggered when a problem is detected.
A problem with this known solution is the trade-off between the need to obtain enough diagnostic information to analyze and resolve a problem and the cost of producing that diagnostic information. The cost of producing diagnostic information can include (a) the performance cost to the application of continuous logging and tracing, (b) the time taken to produce dumps on failure (which may delay restarting the application), and (c) the amount of disk space required to store the diagnostic output.
WO2012/026035A discloses a fault processing system comprising: a storage location information acquiring unit, which obtains, from a storage unit of the component in which a fault has occurred, storage location information indicating where the fault information generated when the fault occurred is stored; a fault information acquiring unit, which, based on the storage location information, obtains from a storage device the fault information generated when the fault occurred in an information processing device, the storage device being connected so that it can communicate with both the information processing device and the fault handling device; and a configuration control unit, which modifies the configuration of the fault handling device to match the information processing device, based on the obtained fault information. The fault processing system can thus easily reproduce a fault that occurred in the information processing device, so that reproduction tests can be carried out efficiently.
Therefore, there is a need in the art to address the above problem.
Summary of the invention
Embodiments of the invention provide a method for capturing diagnostic trace information, for use in a computer system having a plurality of software images, the method comprising the steps of: receiving information associated with a first failure in a first one of the plurality of software images; distributing the information to others of the plurality of software images; and capturing information associated with a second failure in another one of the plurality of software images. The advantage of the method is that the cost of capturing trace diagnostic information is minimized until a first failure occurs, after which the value of the captured trace diagnostic information is maximized; and because only detailed trace diagnostic information related to the first failure is captured, the cost of capturing it is minimized.
In an embodiment, the step of distributing the information is performed in a load balancer, a hypervisor, an operating system, monitoring software, or a peer-to-peer communication mechanism.
In a preferred embodiment, the step of distributing the information to others of the plurality of software images comprises: distributing a first part of the information to a first plurality of the software images, and distributing a second part of the information to a second plurality of the software images. This has the advantage that the load of collecting diagnostic trace information is spread across the software images while still allowing comprehensive trace diagnostic information to be collected.
In a preferred embodiment, the step of capturing information expires after a predetermined period of time. In an alternative embodiment, the step of capturing information expires after the second failure. These embodiments have the advantage of limiting the period during which additional diagnostic trace information is captured, and therefore limiting the additional cost of capturing it.
In another embodiment, each of the software images further comprises processes or threads; the received information is associated with a first failure in a first one of the processes or threads; the distributed information is distributed to others of the processes or threads; and the captured information is associated with a second failure in another one of the processes or threads.
In another embodiment, the received diagnostic trace information identifies a factor external to the software image as the cause of the first failure. This has the advantage that a failure caused by an external factor (such as a network failure) can cause additional trace diagnostic information related to that external factor to be collected in each software image.
In another embodiment, the method further comprises the step of: after the receiving step, checking whether one or more others of the plurality of software images are executing the same software as the first one of the plurality of software images.
In another embodiment, the method further comprises the steps of: combining the information associated with the first failure in the first one of the plurality of software images with the information associated with the second failure in the other one of the plurality of software images; and analyzing the combined information to determine the cause of the first failure. Combining and analyzing the trace diagnostic information in this way allows the cause of the failure to be determined without having to reproduce the failure on a live or backup system.
In another embodiment, the step of capturing information continues until the step of analyzing the combined information to determine the cause of the first failure has completed. This allows information from any further failures to be captured while the trace diagnostic information from earlier failures is being combined and analyzed, and allows capture to stop when the analysis completes.
Embodiments of the invention also provide a computer system and a computer program for implementing the above method of capturing diagnostic trace information.
Viewed from a further aspect, the invention provides a computer program product for capturing diagnostic trace information, the computer program product comprising a computer-readable storage medium readable by a processing circuit and storing instructions for execution by the processing circuit for performing a method for performing the steps of the invention.
Viewed from a further aspect, the invention provides a computer program stored on a computer-readable medium and loadable into the internal memory of a digital computer, comprising software code portions for performing the steps of the invention when the program is run on a computer.
Viewed from a further aspect, the invention provides a method substantially as described with reference to the figures.
Viewed from a further aspect, the invention provides a system substantially as described with reference to the figures.
Brief description of the drawings
Preferred embodiments of the present invention will now be described in more detail, by way of example only, with reference to the accompanying drawings, in which:
Fig. 1 is a block diagram of a plurality of software images with a communication mechanism, in which the present invention may be used;
Fig. 2 is a block diagram of one of the software images of Fig. 1;
Fig. 3 is a block diagram of the application software of Fig. 2;
Fig. 4 shows the temporal relationship between the plurality of images of Fig. 1, a first failure event, and a second failure event;
Fig. 5 is a flowchart of capturing diagnostic trace information according to an embodiment of the present invention; and
Fig. 6 is a flowchart of analyzing the diagnostic trace information captured by the embodiment of Fig. 5.
Detailed description
With reference to Fig. 1, application servers having software images 102-112 each operate independently to process data and communicate with each other using communication mechanism 120. Communication mechanism 120 may be a load balancer, a hypervisor, an operating system, or monitoring software. In another embodiment, communication mechanism 120 may simply be a peer-to-peer communication mechanism.
Fig. 2 shows one of the software images 102 of Fig. 1. Typically, a software image comprises an operating system 202, middleware 204, and application software 206. Any of these elements may be absent from a software image, and other components not mentioned above may be present. In a preferred embodiment, each software image is identical to the other software images. In other embodiments, each software image has components in common with the other software images.
Fig. 3 shows the application software of Fig. 2. Typically, the application software is implemented as a plurality of processes 302, each of which has a plurality of threads 304. Although Fig. 3 shows only one process 302 with one thread 304, any number of processes may be executing, and each process may have any number of threads. Each executing process 302 may have a different number of threads 304.
Fig. 4 shows a timeline of the system of Fig. 1. Image 2 104, image 3 106, image 5 110, and image 6 112 each start executing and run continuously without failure. Image 1 102 starts executing at time 406. It runs continuously until time 408, when a failure occurs. The failure causes a failure event. The failure event causes trace diagnostic information to be logged to log file 402. The trace diagnostic information is typically First Failure Data Capture (FFDC) data that is always switched on; that is, it is a general selection of trace diagnostic information optimized so that the failing software component and any external cause of the failure (such as a process signal or an I/O error) can be identified. Because of the cost of producing diagnostic information, such as the performance cost, the time taken to produce dumps on failure, and the amount of disk space required to store the diagnostic output, detailed trace diagnostic information is not captured all the time.
With reference to Fig. 5, the method of an embodiment of the invention starts at step 502. At step 504, first failure data is received by the communication mechanism. At step 506, a check is made as to whether there are any other images running the same software. As noted above, in other embodiments each software image has components in common with the other software images. If there are no other images running the same software and, optionally, no other images that have components in common with the failing image, the method ends at step 512.
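The check at step 506 can be pictured as a comparison of component sets. The following is a minimal sketch only, not taken from the patent: the `Image` class, the `images_to_notify` function, and the `require_identical` flag are hypothetical names, and representing an image's software as a set of component names is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    """Hypothetical model of a software image as a named set of components."""
    name: str
    components: frozenset


def images_to_notify(failed, others, require_identical=False):
    """Return the images that should receive the first failure data.

    With require_identical=True only images running exactly the same
    software qualify; otherwise any image that shares at least one
    component with the failing image qualifies.
    """
    if require_identical:
        return [img for img in others if img.components == failed.components]
    return [img for img in others if img.components & failed.components]


image1 = Image("image1", frozenset({"os", "middleware", "app"}))
image2 = Image("image2", frozenset({"os", "middleware", "app"}))
image3 = Image("image3", frozenset({"os", "database"}))
image4 = Image("image4", frozenset({"cache"}))

shared = images_to_notify(image1, [image2, image3, image4])
identical = images_to_notify(image1, [image2, image3, image4], require_identical=True)
```

An empty result corresponds to the branch in which the method ends at step 512 without distributing anything.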
If there are other images running the same software or, optionally, having components in common, then at step 508 the failure event also causes information related to the failure to be passed from image 1 102 to the other images 2 to 6 104-112 through communication mechanism 120. Images 2 to 6 104-112 run at least some of the same software components as were running in image 1 102 when it failed at time 408. Images 2 to 6 104-112 can then anticipate the same failure occurring in those images as occurred in image 1 102 and adjust their diagnostic configuration accordingly. For example, if a particular software component in image 1 102 has been identified as causing the failure, images 2 to 6 104-112 may log the operation of that particular software component in more detail. This may include switching on additional tracing in the software component. As another example, if the cause of the failure in image 1 102 was a shortage of memory, images 2 to 6 104-112 can start logging more detail about memory usage in those images.
Fig. 4 also shows a second failure occurring at time 410 in image 4 108. This failure causes a failure event. At step 510 in Fig. 5, the failure event causes trace diagnostic information to be logged to log file 404. Log file 404 contains more detailed trace diagnostic information about the software component that failed in image 1 102 at time 408, or about the cause of the failure in image 1 102 at time 408. If the cause of the failure in software image 4 108 is the same as, or similar to, the cause of the earlier failure in image 1 102, the more detailed trace diagnostic information captured is likely to be of considerable help in identifying the cause of the failure and in identifying what action should be taken to prevent further failures. In Fig. 5, the method ends at step 512.
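One way to picture step 508 is a small broadcast mechanism that forwards the failure information to every image except the one that failed, with each image raising the trace level of the implicated component. This is an illustrative sketch under assumptions: the class names `ImageDiagnostics` and `CommunicationMechanism`, the `component` and `cause` keys, and the use of Python `logging` levels to stand for trace detail are all hypothetical and not specified by the patent.

```python
import logging


class ImageDiagnostics:
    """Hypothetical per-image diagnostic configuration."""

    def __init__(self, name, components):
        self.name = name
        # Every component starts at a cheap, low-detail level.
        self.levels = {c: logging.WARNING for c in components}
        self.extra_logging = set()

    def on_failure_info(self, info):
        """Adjust diagnostics in anticipation of the same failure here."""
        component = info.get("component")
        if component in self.levels:
            # Switch on detailed tracing for the implicated component only.
            self.levels[component] = logging.DEBUG
        if info.get("cause") == "out_of_memory":
            self.extra_logging.add("memory_usage")


class CommunicationMechanism:
    """Hypothetical stand-in for communication mechanism 120."""

    def __init__(self):
        self.images = []

    def register(self, image):
        self.images.append(image)

    def report_first_failure(self, source, info):
        """Forward failure information to every image except the source."""
        notified = []
        for image in self.images:
            if image is not source:
                image.on_failure_info(info)
                notified.append(image.name)
        return notified


mechanism = CommunicationMechanism()
img1 = ImageDiagnostics("image1", ["os", "middleware", "app"])
img2 = ImageDiagnostics("image2", ["os", "middleware", "app"])
img4 = ImageDiagnostics("image4", ["os", "middleware", "app"])
for img in (img1, img2, img4):
    mechanism.register(img)

notified = mechanism.report_first_failure(
    img1, {"component": "middleware", "cause": "out_of_memory"}
)
```

Only the implicated component is traced in detail, which mirrors the stated aim of keeping capture cost low until a first failure indicates where to look.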
In another embodiment, which might be called a "theory" or "ladder" embodiment, the increased level of capture of trace diagnostic information is load-balanced across images 102-112. Each image is configured to capture more comprehensive trace diagnostic information for a specific part, or specific parts, of the software stack. Between them, images 102-112 capture trace diagnostic information for all of the required parts of the software stack. The images can also be configured to capture any desired subset of trace diagnostic information, with coverage of that subset divided among some or all of the images.
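The load-balanced division of coverage could be sketched as follows. The patent does not say how parts of the software stack are assigned to images; round-robin assignment is an assumption chosen here for simplicity, and `assign_trace_coverage` is a hypothetical name.

```python
def assign_trace_coverage(components, images):
    """Divide detailed tracing of stack components among images, round-robin.

    Each component is traced in detail by exactly one image, so the whole
    stack is covered while no single image bears the full tracing cost.
    """
    if not images:
        raise ValueError("at least one image is required")
    coverage = {image: [] for image in images}
    for index, component in enumerate(components):
        coverage[images[index % len(images)]].append(component)
    return coverage


stack = ["os", "middleware", "app", "network", "storage"]
coverage = assign_trace_coverage(stack, ["image2", "image3"])
```

Other policies, such as weighting by image load or assigning a component to several images for redundancy, would fit the same description.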
In another embodiment, the above method may be applied not across images 102-112 but across processes 302 or across threads 304. The first process to fail captures trace diagnostic information, which is used to reconfigure what trace diagnostic information is captured by the other processes if and when they fail. Similarly, the first thread to fail can capture trace diagnostic information, which is used to reconfigure what trace diagnostic information is captured by the other threads if and when they fail. The cross-process and cross-thread methods can be combined with the cross-image method or used on their own.
In another embodiment, the reconfigured capture of trace diagnostic information may be applied across the other images, processes, or threads for a predetermined period of time after the first failure event, after which the level of capture returns to its level before the first failure or is set to another predetermined level.
In another embodiment, after a second or subsequent failure event has occurred and/or after sufficient trace diagnostic information has been captured, the level of capture of trace diagnostic information on all images returns to its level before the first failure event.
In another embodiment, the above method may be applied to software stacks or workloads that are not identical. For example, for a failure caused by a common external factor (such as a network failure), one or more images, processes, or threads may be configured to capture additional trace diagnostic information, with a different configuration for each image, process, or thread as appropriate to the network failure anticipated there.
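The two expiry conditions described above (a predetermined period, or the occurrence of a second failure) can be sketched as a small state holder. The class name `ElevatedCapture` and its methods are hypothetical, and the injectable clock is an assumption added so the behaviour can be exercised without waiting.

```python
import time


class ElevatedCapture:
    """Hypothetical tracker for how long detailed capture stays switched on."""

    def __init__(self, duration_seconds, clock=time.monotonic):
        self._clock = clock
        self._deadline = clock() + duration_seconds
        self._second_failure_seen = False

    def record_second_failure(self):
        """Mark that a second failure has been captured."""
        self._second_failure_seen = True

    def is_active(self):
        """Detailed capture stays on until the deadline or a second failure."""
        if self._second_failure_seen:
            return False
        return self._clock() < self._deadline


# A fake clock stands in for real time in this example.
now = [0.0]
timed = ElevatedCapture(60, clock=lambda: now[0])
active_at_start = timed.is_active()
now[0] = 61.0
active_after_timeout = timed.is_active()

by_failure = ElevatedCapture(60, clock=lambda: 0.0)
by_failure.record_second_failure()
active_after_second_failure = by_failure.is_active()
```

When `is_active()` returns false, an image would restore its earlier capture level or move to another predetermined level.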
Turning to Fig. 6, analysis of the failure using the trace diagnostic information begins at step 602. At step 604, the first failure data and the second failure data are combined. At step 606, the combined information is analyzed. The analysis ends at step 608. In another embodiment, the first failure data is analyzed first, and the second failure data is then analyzed in light of the findings from the first failure data. The analysis may be performed in the first image 102, or may be performed by the other images 104-112 once they have received the failure information from the first image 102.
In another embodiment, images 102-112 that are started or restarted after a failure may also be configured to capture an increased level of trace diagnostic information.
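Steps 604 and 606 can be illustrated with a deliberately simple analysis. The patent does not specify a record format or an analysis technique; the dictionary fields (`time`, `component`, `level`) and the heuristic of counting error records per component are assumptions made purely for illustration.

```python
from collections import Counter


def combine_failure_data(first, second):
    """Step 604 (sketch): merge both sets of trace records in time order."""
    return sorted(first + second, key=lambda record: record["time"])


def most_implicated_component(records):
    """Step 606 (sketch): pick the component with the most error records."""
    counts = Counter(r["component"] for r in records if r["level"] == "ERROR")
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical records standing in for log file 402 and log file 404.
first_failure = [
    {"time": 408, "component": "middleware", "level": "ERROR"},
]
second_failure = [
    {"time": 409, "component": "middleware", "level": "DEBUG"},
    {"time": 410, "component": "middleware", "level": "ERROR"},
    {"time": 410, "component": "os", "level": "ERROR"},
]

combined = combine_failure_data(first_failure, second_failure)
suspect = most_implicated_component(combined)
```

A real analysis would use the detailed records from the second failure far more thoroughly; the point here is only that the two captures are examined together.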
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method, computer program product, or computer program. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, microcode, etc.), or an embodiment combining software and hardware aspects, all of which may generally be referred to herein as a "circuit", "module", or "system". Furthermore, in some embodiments, aspects of the present invention may take the form of a computer program product embodied in one or more computer-readable media having computer-readable program code embodied thereon.
Any combination of one or more computer-readable media may be used. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of these. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random-access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of these. In the context of this document, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a propagated data signal with computer-readable program code embodied in it, for example in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination of these. A computer-readable signal medium may be any computer-readable medium that is not a computer-readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including, but not limited to, wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of these.
Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. Where a remote computer is involved, it may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
The present invention is described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instructions that implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other devices to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code that comprises one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
For the avoidance of doubt, the term "comprising", as used herein throughout the description and claims, is not to be construed as meaning "consisting only of".
Claims (15)
1. A method for capturing diagnostic trace information, for use in a computer system having a plurality of software images, the method comprising the steps of:
receiving information associated with a first failure in a first one of the plurality of software images;
distributing the information to others of the plurality of software images; and
capturing information associated with a second failure in another one of the plurality of software images.
2. The method according to claim 1, wherein the step of distributing the information is performed in a load balancer, a hypervisor, an operating system, monitoring software, or a peer-to-peer communication mechanism.
3. The method according to claim 1 or claim 2, wherein the step of distributing the information to other software images of the plurality of software images comprises: distributing a first portion of the information to a first plurality of software images among the plurality of software images, and distributing a second portion of the information to a second plurality of software images among the plurality of software images.
4. The method according to any one of claims 1 to 3, wherein the step of capturing information expires after a predetermined period of time.
5. The method according to any one of claims 1 to 3, wherein the step of capturing information expires after the second fault.
6. The method according to any one of claims 1 to 5, wherein:
each of the software images further comprises processes or threads; and
the received information relates to a first fault in a first one of the processes or threads;
the distributed information is distributed to other ones of the processes or threads; and
the captured information relates to a second fault in another of the processes or threads.
7. The method according to any one of claims 1 to 6, wherein the received information identifies a factor external to the software images as the cause of the first fault.
8. The method according to any one of claims 1 to 7, further comprising the step of: after the receiving step, checking whether one or more other software images of the plurality of software images are executing the same software as the first software image of the plurality of software images.
9. The method according to any one of claims 1 to 8, further comprising the steps of:
combining the information relating to the first fault in the first software image of the plurality of software images with the information relating to the second fault in another of the plurality of software images; and
analyzing the combined information to determine the cause of the first fault.
10. The method according to claim 9, wherein the step of capturing information continues until the step of analyzing the combined information to determine the cause of the first fault has finished.
11. A computer system, comprising:
a plurality of software images;
a log file comprising diagnostic trace information relating to a first fault in a first software image of the plurality of software images; and
a communication mechanism for distributing information from the log file to other software images of the plurality of software images;
wherein the other software images of the plurality of software images capture information relating to a second fault in another of the plurality of software images.
12. The computer system according to claim 11, wherein the communication mechanism distributes a first portion of the information to a first plurality of software images among the plurality of software images, and distributes a second portion of the information to a second plurality of software images among the plurality of software images.
13. The computer system according to claim 11 or claim 12, wherein:
each of the software images further comprises processes or threads; and
the received information relates to a first fault in a first one of the processes or threads;
the distributed information is distributed to other ones of the processes or threads; and
the captured information relates to a second fault in another of the processes or threads.
14. The computer system according to any one of claims 11 to 13, wherein the communication mechanism or one of the plurality of software images:
combines the information relating to the first fault in the first software image of the plurality of software images with the information relating to the second fault in another of the plurality of software images; and
analyzes the combined information to determine the cause of the first fault.
15. A system for capturing diagnostic trace information according to any one of claims 1 to 10.
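The claimed steps (receiving first-fault information, distributing it to the other software images, capturing second-fault information until a time limit or the second fault is reached, and combining both records for analysis) can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the patented implementation: the names `FaultInfo`, `SoftwareImage`, `arm`, `on_fault`, `distribute`, and `combine` are hypothetical, a fault is reduced to a text description, time is passed in as a plain number instead of being read from a clock, and distribution is assumed to be limited to images running the same software as the failing image.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FaultInfo:
    """Diagnostic information about one fault (hypothetical structure)."""
    image_id: str
    detail: str


@dataclass
class SoftwareImage:
    """One co-operating software image (hypothetical model)."""
    image_id: str
    software: str                        # identifies what the image is running
    armed_with: Optional[FaultInfo] = None
    deadline: float = 0.0                # time at which capture expires
    captured: List[FaultInfo] = field(default_factory=list)

    def arm(self, first: FaultInfo, ttl: float, now: float) -> None:
        """Accept distributed first-fault information and enable capture."""
        self.armed_with = first
        self.deadline = now + ttl

    def on_fault(self, detail: str, now: float) -> Optional[FaultInfo]:
        """Capture second-fault information if capture is still enabled."""
        if self.armed_with is None or now > self.deadline:
            self.armed_with = None       # never armed, or time limit passed
            return None
        second = FaultInfo(self.image_id, detail)
        self.captured.append(second)
        self.armed_with = None           # capture expires after the second fault
        return second


def distribute(first: FaultInfo, images: List[SoftwareImage],
               ttl: float, now: float) -> List[SoftwareImage]:
    """Send first-fault information to other images running the same software."""
    source = next(i for i in images if i.image_id == first.image_id)
    targets = [i for i in images
               if i is not source and i.software == source.software]
    for image in targets:
        image.arm(first, ttl, now)
    return targets


def combine(first: FaultInfo, second: FaultInfo) -> dict:
    """Pair the two fault records so they can be analyzed together."""
    return {"first_fault": first, "second_fault": second}


# Sample run with made-up image names and fault text.
images = [SoftwareImage("img-1", "app-v1"),
          SoftwareImage("img-2", "app-v1"),
          SoftwareImage("img-3", "other-app")]
first = FaultInfo("img-1", "null pointer in request handler")
targets = distribute(first, images, ttl=60.0, now=0.0)
second = images[1].on_fault("null pointer in request handler", now=10.0)
report = combine(first, second)
```

In the sample run, `img-2` is armed because it runs the same software as `img-1`, while `img-3` is not. The sketch applies both expiry conditions together (time limit and second fault) purely for brevity; the claims present them as alternatives.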
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB1214159.4 | 2012-08-08 | ||
GB1214159.4A GB2504728A (en) | 2012-08-08 | 2012-08-08 | Second failure data capture in co-operating multi-image systems |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103577273A (en) | 2014-02-12 |
CN103577273B (en) | 2017-06-06 |
Family ID: 46935094
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310343980.0A Active CN103577273B (en) | 2012-08-08 | 2013-08-08 | Method and computer system for capturing diagnosis tracking information |
Country Status (3)
Country | Link |
---|---|
US (4) | US9436590B2 (en) |
CN (1) | CN103577273B (en) |
GB (1) | GB2504728A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105988882A (en) * | 2015-02-12 | 2016-10-05 | 广东欧珀移动通信有限公司 | Application software fault recovery method and terminal equipment |
CN109757771A (en) * | 2019-02-22 | 2019-05-17 | 红云红河烟草(集团)有限责任公司 | Filter-stick forming device shuts down duration calculation method and computing device |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2013162596A1 (en) * | 2012-04-27 | 2013-10-31 | Hewlett-Packard Development Company, L.P. | Mapping application dependencies at runtime |
US10970152B2 (en) | 2017-11-21 | 2021-04-06 | International Business Machines Corporation | Notification of network connection errors between connected software systems |
US10684910B2 (en) * | 2018-04-17 | 2020-06-16 | International Business Machines Corporation | Intelligent responding to error screen associated errors |
JP7367495B2 (en) * | 2019-11-29 | 2023-10-24 | 富士通株式会社 | Information processing equipment and communication cable log information collection method |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1077037A (en) * | 1992-03-06 | 1993-10-06 | 国际商业机器公司 | Multi-media computer diagnostic system |
CN101226495A (en) * | 2007-01-19 | 2008-07-23 | 国际商业机器公司 | System and method for the capture and preservation of intermediate error state data |
US20080222456A1 (en) * | 2007-03-05 | 2008-09-11 | Angela Richards Jones | Method and System for Implementing Dependency Aware First Failure Data Capture |
Family Cites Families (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5761739A (en) * | 1993-06-08 | 1998-06-02 | International Business Machines Corporation | Methods and systems for creating a storage dump within a coupling facility of a multisystem enviroment |
US6651183B1 (en) * | 1999-10-28 | 2003-11-18 | International Business Machines Corporation | Technique for referencing failure information representative of multiple related failures in a distributed computing environment |
CA2315449A1 (en) * | 2000-08-10 | 2002-02-10 | Ibm Canada Limited-Ibm Canada Limitee | Generation of runtime execution traces of applications and associated problem determination |
US6813731B2 (en) * | 2001-02-26 | 2004-11-02 | Emc Corporation | Methods and apparatus for accessing trace data |
US7120685B2 (en) * | 2001-06-26 | 2006-10-10 | International Business Machines Corporation | Method and apparatus for dynamic configurable logging of activities in a distributed computing system |
US6779132B2 (en) * | 2001-08-31 | 2004-08-17 | Bull Hn Information Systems Inc. | Preserving dump capability after a fault-on-fault or related type failure in a fault tolerant computer system |
US7080287B2 (en) * | 2002-07-11 | 2006-07-18 | International Business Machines Corporation | First failure data capture |
US7840856B2 (en) * | 2002-11-07 | 2010-11-23 | International Business Machines Corporation | Object introspection for first failure data capture |
CA2433750A1 (en) * | 2003-06-27 | 2004-12-27 | Ibm Canada Limited - Ibm Canada Limitee | Automatic collection of trace detail and history data |
GB0412104D0 (en) | 2004-05-29 | 2004-06-30 | Ibm | Apparatus method and program for recording diagnostic trace information |
US7519510B2 (en) * | 2004-11-18 | 2009-04-14 | International Business Machines Corporation | Derivative performance counter mechanism |
US7383471B2 (en) * | 2004-12-28 | 2008-06-03 | Hewlett-Packard Development Company, L.P. | Diagnostic memory dumping |
US20060195731A1 (en) * | 2005-02-17 | 2006-08-31 | International Business Machines Corporation | First failure data capture based on threshold violation |
US7487407B2 (en) * | 2005-07-12 | 2009-02-03 | International Business Machines Corporation | Identification of root cause for a transaction response time problem in a distributed environment |
WO2007088575A1 (en) * | 2006-01-31 | 2007-08-09 | Fujitsu Limited | System monitor device control method, program, and computer system |
US8949671B2 (en) * | 2008-01-30 | 2015-02-03 | International Business Machines Corporation | Fault detection, diagnosis, and prevention for complex computing systems |
US8381014B2 (en) * | 2010-05-06 | 2013-02-19 | International Business Machines Corporation | Node controller first failure error management for a distributed system |
WO2012026035A1 (en) | 2010-08-27 | 2012-03-01 | 富士通株式会社 | Fault processing method, fault processing system, fault processing device and fault processing program |
JP5252014B2 (en) * | 2011-03-15 | 2013-07-31 | オムロン株式会社 | Control device, control system, tool device, and collection instruction program |
US8615676B2 (en) * | 2011-03-24 | 2013-12-24 | International Business Machines Corporation | Providing first field data capture in a virtual input/output server (VIOS) cluster environment with cluster-aware vioses |
- 2012-08-08: GB application GB1214159.4A, patent GB2504728A, status: Withdrawn (not active)
- 2013-06-28: US application US13/930,875, patent US9436590B2, status: Expired - Fee Related (not active)
- 2013-08-08: CN application CN201310343980.0A, patent CN103577273B, status: Active
- 2014-08-29: US application US14/473,089, patent US9424170B2, status: Expired - Fee Related (not active)
- 2016-03-14: US application US15/068,832, patent US9852051B2, status: Active
- 2016-03-14: US application US15/068,910, patent US9921950B2, status: Active
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105988882A (en) * | 2015-02-12 | 2016-10-05 | 广东欧珀移动通信有限公司 | Application software fault recovery method and terminal equipment |
CN105988882B (en) * | 2015-02-12 | 2019-08-27 | Oppo广东移动通信有限公司 | A kind of application software fault repairing method and terminal device |
CN109757771A (en) * | 2019-02-22 | 2019-05-17 | 红云红河烟草(集团)有限责任公司 | Filter-stick forming device shuts down duration calculation method and computing device |
Also Published As
Publication number | Publication date |
---|---|
US20160196177A1 (en) | 2016-07-07 |
US20140047280A1 (en) | 2014-02-13 |
GB201214159D0 (en) | 2012-09-19 |
US20160203037A1 (en) | 2016-07-14 |
US9852051B2 (en) | 2017-12-26 |
GB2504728A (en) | 2014-02-12 |
US9921950B2 (en) | 2018-03-20 |
US9436590B2 (en) | 2016-09-06 |
CN103577273B (en) | 2017-06-06 |
US9424170B2 (en) | 2016-08-23 |
US20140372808A1 (en) | 2014-12-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103577273A (en) | Second failure data capture in co-operating multi-image systems | |
CN111752799A (en) | Service link tracking method, device, equipment and storage medium | |
CN105556482A (en) | Monitoring mobile application performance | |
US20160274997A1 (en) | End user monitoring to automate issue tracking | |
US9276819B2 (en) | Network traffic monitoring | |
CN107045475B (en) | Test method and device | |
CN103544095A (en) | Server program monitoring method and system of server program | |
CN108762966A (en) | System exception hold-up interception method, device, computer equipment and storage medium | |
CN112333044B (en) | Shunting equipment performance test method, device and system, electronic equipment and medium | |
CN103514075A (en) | Method and device for monitoring API function calling in mobile terminal | |
CN112860569A (en) | Automatic testing method and device, electronic equipment and storage medium | |
CN110515821A (en) | Based on the event-handling method, electronic equipment and computer storage medium buried a little | |
CN114745295A (en) | Data acquisition method, device, equipment and readable storage medium | |
CN111970151A (en) | Flow fault positioning method and system for virtual and container network | |
CN116431443A (en) | Log recording method, device, computer equipment and computer readable storage medium | |
US10462234B2 (en) | Application resilience system and method thereof for applications deployed on platform | |
CN115658500A (en) | Vue-based front-end error log uploading method and system in hybrid development | |
US10432472B1 (en) | Network operation center (NOC) tool pattern detection and trigger to real-time monitoring operation mode | |
KR101828156B1 (en) | Transaction Monitoring System and Operating method thereof | |
CN112799910A (en) | Hierarchical monitoring method and device | |
US20080154657A1 (en) | System for monitoring order fulfillment of telecommunication services | |
CN113542796B (en) | Video evaluation method, device, computer equipment and storage medium | |
CN107577546B (en) | Information processing method and device and electronic equipment | |
CN117687870A (en) | Mobile terminal white screen monitoring method, system, electronic equipment and medium | |
CN114245052A (en) | Video data storage method and device, storage medium and electronic device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||