CN105306272A - Method and system for collecting fault scene information of information system - Google Patents


Info

Publication number
CN105306272A
Authority
CN
China
Prior art keywords
information
parameter
information system
running status
fault
Legal status
Granted
Application number
CN201510763286.3A
Other languages
Chinese (zh)
Other versions
CN105306272B (en)
Inventor
邬大卫
安卫杰
信怀义
贺媛
顾涛
Current Assignee
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
Application filed by China Construction Bank Corp
Priority to CN201510763286.3A (CN105306272B)
Priority to EP15893168.3A (EP3148116B1)
Priority to PCT/CN2015/098824 (WO2016188100A1)
Priority to ES15893168.3T (ES2687384T3)
Publication of CN105306272A
Priority to US15/388,865 (US10545807B2)
Application granted
Publication of CN105306272B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F11/00 Error detection; Error correction; Monitoring
            • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
              • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
                • G06F11/0751 Error or fault detection not based on redundancy
                • G06F11/0766 Error or fault reporting or storing
                  • G06F11/0778 Dumping, i.e. gathering error/state information after a fault for later diagnosis
                  • G06F11/0781 Error filtering or prioritizing based on a policy defined by the user or on a policy defined by a hardware/software module, e.g. according to a severity level
                  • G06F11/0787 Storage of error reports, e.g. persistent data storage, storage using memory protection
            • G06F11/30 Monitoring
              • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • H ELECTRICITY
      • H04 ELECTRIC COMMUNICATION TECHNIQUE
        • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
          • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
            • H04L41/06 Management of faults, events, alarms or notifications
              • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
              • H04L41/0645 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis by additionally acting on or stimulating the network after receiving notifications
              • H04L41/069 Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
          • H04L43/00 Arrangements for monitoring or testing data switching networks
            • H04L43/04 Processing captured monitoring data, e.g. for logfile generation
            • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
              • H04L43/0805 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
                • H04L43/0817 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking functioning
          • H04L63/00 Network architectures or network communication protocols for network security
            • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
              • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
                • H04L63/1416 Event detection, e.g. attack signature detection
                • H04L63/1425 Traffic logging, e.g. anomaly detection

Abstract

Embodiments of the invention disclose a method and a system for collecting fault scene information of an information system. The method comprises: periodically collecting, at a preset time interval, information on each parameter in a first preset parameter set of the information system; monitoring the running status of the information system and determining whether a fault has occurred; and, when a fault occurs, collecting information on each parameter in a second preset parameter set of the information system. With this method and system, the first preset parameter set is collected at the preset time interval, and the second preset parameter set is collected automatically when the information system fails, so manual participation in collection is no longer required. This ensures that the collected information is comprehensive and timely enough for subsequent analysis and localization of the fault, and avoids the risk of operator error during manual information collection in an emergency.

Description

Method and system for collecting fault scene information of an information system
Technical field
The present invention relates to the field of information collection technology, and in particular to a method and system for collecting fault scene information of an information system.
Background
Fault scene information of an information system is essential for later analysis of the cause of a fault. Because faults in an information system are sporadic, transient and complex, it is difficult to reproduce a fault and to analyze and localize the problem unless comprehensive fault scene information is collected promptly at the moment the fault occurs.
At present, because there are no tools for collecting fault scene information, operations staff generally collect it manually. However, operators differ in technical ability, and production operations demand that service be restored promptly after a fault. As a result, when an information system fails, the fault scene information collected is often incomplete or late, key information is missing, and the fault scene information collected with the prior art cannot meet the needs of subsequent fault analysis.
Summary of the invention
To solve the above technical problems, embodiments of the present invention provide a method and system for collecting fault scene information of an information system, so that fault scene information is collected promptly and comprehensively and the needs of subsequent fault analysis and localization are met.
To this end, embodiments of the present invention provide the following technical solutions:
A method for collecting fault scene information of an information system, comprising:
Periodically collecting, at a preset time interval, information on each parameter in a first preset parameter set of the information system;
Monitoring the running status of the information system and determining whether a fault has occurred;
When a fault occurs in the running status of the information system, collecting information on each parameter in a second preset parameter set of the information system.
Preferably, collecting information on each parameter in the second preset parameter set when a fault occurs comprises:
When a fault occurs in the running status of the information system, matching the fault information against preset information to obtain the preset fault scene type corresponding to the fault information;
Collecting, according to the fault scene type corresponding to the fault information, information on each parameter in the second preset parameter set corresponding to that fault scene type.
Preferably, the method further comprises: when matching the fault information against the preset information yields no corresponding preset fault scene type, adding a fault scene type corresponding to that fault information.
Preferably, the preset time interval ranges from 1 min to 5 min, endpoints included.
Preferably, the method further comprises:
Storing the collected information on each parameter in the first preset parameter set and in the second preset parameter set.
A system for collecting fault scene information of an information system, applied to any of the above methods, comprising:
A first acquisition module, configured to periodically collect, at a preset time interval, information on each parameter in the first preset parameter set of the information system;
A monitoring module, configured to monitor the running status of the information system and determine whether a fault has occurred;
A second acquisition module, configured to collect, when a fault occurs in the running status of the information system, information on each parameter in the second preset parameter set of the information system.
Preferably, the second acquisition module comprises:
An information matching unit, configured to match, when a fault occurs, the fault information against preset information to obtain the preset fault scene type corresponding to the fault information, the information matching unit being preconfigured with a plurality of fault scene types and their corresponding preset information;
An information collection unit, configured to collect, according to the fault scene type corresponding to the fault information, information on each parameter in the second preset parameter set corresponding to that fault scene type.
Preferably, the second acquisition module further comprises:
An information prompt unit, configured to send prompt information when matching the fault information against the preset information yields no corresponding preset fault scene type, the prompt information indicating that no corresponding fault scene type exists in the current system.
Preferably, the second acquisition module further comprises:
An information supplement unit, configured to, when the information system fails and no corresponding fault scene type exists in the current system, collect information on the current fault, summarize and organize it into a corresponding fault scene type, and add that type to the information matching unit.
Preferably, the system further comprises:
A storage module, configured to store the collected information on each parameter in the first preset parameter set and in the second preset parameter set.
Compared with the prior art, the above technical solutions have the following advantages:
The method and system provided by the embodiments of the present invention periodically collect, at a preset time interval, information on each parameter in the first preset parameter set of the information system, monitor the running status of the information system, determine whether a fault has occurred, and, when a fault occurs, collect information on each parameter in the second preset parameter set. The second preset parameter set is therefore collected automatically when the information system fails, with no manual participation, which ensures that the collected information is comprehensive and timely, meets subsequent needs for fault analysis and localization, and avoids the risk of operator error during manual collection in an emergency.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a method for collecting fault scene information of an information system according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a method for collecting fault scene information of an information system according to another embodiment of the present invention;
Fig. 3 is a structural diagram of a system for collecting fault scene information of an information system according to an embodiment of the present invention;
Fig. 4 is a structural diagram of a system for collecting fault scene information of an information system according to another embodiment of the present invention.
Detailed description of the embodiments
As described in the Background section, the fault scene information collected with the prior art cannot meet the needs of subsequent fault analysis.
In view of this, embodiments of the present invention provide a method for collecting fault scene information of an information system, comprising:
Periodically collecting, at a preset time interval, information on each parameter in a first preset parameter set of the information system;
Monitoring the running status of the information system and determining whether a fault has occurred;
When a fault occurs in the running status of the information system, collecting information on each parameter in a second preset parameter set of the information system.
Correspondingly, embodiments of the present invention further provide a system for collecting fault scene information of an information system, applied to the above method, comprising:
A first acquisition module, configured to periodically collect, at a preset time interval, information on each parameter in the first preset parameter set of the information system;
A monitoring module, configured to monitor the running status of the information system and determine whether a fault has occurred;
A second acquisition module, configured to collect, when a fault occurs in the running status of the information system, information on each parameter in the second preset parameter set of the information system.
The method and system provided by the embodiments of the present invention collect information on each parameter in the first preset parameter set at the preset time interval and, when the information system fails, automatically collect information on each parameter in the second preset parameter set, with no manual participation. This ensures that the information is comprehensive and timely, meets subsequent needs for fault analysis and localization, and avoids the risk of operator error during manual collection in an emergency.
The above is the core idea of the present application. The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Many specific details are set forth in the following description to provide a thorough understanding of the present invention, but the present invention can also be implemented in other ways different from those described here, and those skilled in the art can make similar generalizations without departing from its substance. The present invention is therefore not limited by the specific embodiments disclosed below.
An embodiment of the present invention provides a method for collecting fault scene information of an information system. As shown in Fig. 1, the method comprises:
Step S1: periodically collect, at a preset time interval, information on each parameter in the first preset parameter set of the information system.
It should be noted that, in embodiments of the present invention, the first preset parameter set is collected at the preset time interval whether or not the information system has failed. In a specific embodiment, the first preset parameter set includes operating system information, WebLogic information and the like. The operating system information includes information on the processes running in the information system, resource usage, network interface card (NIC) usage, the number of files opened by each system process, and so on; the WebLogic information includes service process information, garbage collection logs, service logs, JVM information and so on. The present invention does not limit this; the specific contents depend on the circumstances.
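A minimal Python sketch of step S1 follows, assuming each item of the first preset parameter set is read by a pluggable collector function. The class name `FirstSetSampler`, the `history` ring buffer and the collector keys are illustrative assumptions, not part of the patent; real collectors for process lists, NIC usage, garbage collection logs or JVM data would have to be supplied by the deployment. Keeping the last few snapshots means data from just before a fault is still on hand when the fault is detected.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, Optional

# A collector returns the current value of one parameter (hypothetical interface).
Collector = Callable[[], object]


class FirstSetSampler:
    """Periodically samples every parameter of the first preset parameter set."""

    def __init__(self, collectors: Dict[str, Collector], history: int = 10,
                 clock: Callable[[], float] = time.time) -> None:
        self.collectors = collectors
        # Ring buffer: only the most recent `history` snapshots are kept.
        self.snapshots: Deque[dict] = deque(maxlen=history)
        self.clock = clock

    def sample_once(self) -> dict:
        snapshot: dict = {"timestamp": self.clock()}
        for name, collect in self.collectors.items():
            try:
                snapshot[name] = collect()
            except Exception as exc:
                # A failing collector is recorded instead of aborting the sample.
                snapshot[name] = f"error: {exc}"
        self.snapshots.append(snapshot)
        return snapshot

    def run(self, interval_s: float, iterations: Optional[int] = None,
            sleep: Callable[[float], None] = time.sleep) -> None:
        """Sample every `interval_s` seconds, regardless of system health.

        `iterations=None` loops forever; a finite value is useful in tests.
        """
        done = 0
        while iterations is None or done < iterations:
            self.sample_once()
            done += 1
            sleep(interval_s)
```

In a real deployment `run()` would be started in a background thread with `interval_s` in the 60 to 300 second range preferred by the description; the injectable `clock` and `sleep` arguments exist only so the loop can be tested without waiting.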
It should also be noted that, in embodiments of the present invention, JVM is the abbreviation of Java Virtual Machine, a specification for computing devices: an imaginary computer implemented by emulating computer functions on an actual computer. WebLogic is middleware based on the Java EE architecture: a Java application server for developing, integrating, deploying and managing large-scale distributed Web applications, network applications and database applications. Since JVM and WebLogic are well known to those skilled in the art, they are not described in detail here.
On the basis of any of the above embodiments, in a preferred embodiment of the present invention, the preset time interval ranges from 1 min to 5 min, endpoints included. The present invention does not limit this, and the value depends on the circumstances, provided that the preset time interval is shorter than the time from the moment the information system fails to the moment it is restarted. This ensures that, when the information system fails, the information on each parameter in the first preset parameter set at the moment of the fault can be collected.
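The two constraints on the interval can be written as simple checks. In this sketch the function names are hypothetical; the 1 to 5 minute range is treated as a preference and the requirement that the interval be shorter than the fault-to-restart window as the hard condition, mirroring how the embodiment words them. How the fault-to-restart window is estimated is not specified in the description, so it is simply an input here.

```python
from datetime import timedelta
from typing import List

# Preferred range from the embodiment: 1 min to 5 min, endpoints included.
PREFERRED_MIN = timedelta(minutes=1)
PREFERRED_MAX = timedelta(minutes=5)


def in_preferred_range(interval: timedelta) -> bool:
    """True if the interval lies in the preferred 1-5 min range (inclusive)."""
    return PREFERRED_MIN <= interval <= PREFERRED_MAX


def covers_fault_window(interval: timedelta, fault_to_restart: timedelta) -> bool:
    """Hard condition: the interval must be shorter than the time between a
    fault and the restart, so that at least one periodic sample falls
    between the fault and the restart."""
    return interval < fault_to_restart


def check_interval(interval: timedelta, fault_to_restart: timedelta) -> List[str]:
    """Return human-readable problems; an empty list means the interval is fine."""
    problems = []
    if not covers_fault_window(interval, fault_to_restart):
        problems.append("interval is not shorter than the fault-to-restart window")
    if not in_preferred_range(interval):
        problems.append("interval is outside the preferred 1-5 min range")
    return problems
```

For example, a 2 minute interval with a 10 minute fault-to-restart window passes both checks, while a 5 minute interval with a 3 minute window fails the hard condition.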
Step S2: monitor the running status of the information system and determine whether a fault has occurred.
It should be noted that, in embodiments of the present invention, the running status of the information system needs to be monitored comprehensively while the system is running. In a specific embodiment, monitoring the running status and determining whether a fault has occurred comprises:
Monitoring the running status of the information system and determining whether an anomaly has occurred;
When an anomaly occurs, collecting information about the anomaly;
Determining whether the anomaly information satisfies a preset condition;
When the anomaly information satisfies the preset condition, determining that a fault has occurred in the running status of the information system.
It should also be noted that, in embodiments of the present invention, the preset condition may be whether the processor utilization of the information system exceeds a threshold, whether a keyword appears in the error log of the information system, some other criterion, or a combination of several criteria. The present invention does not limit this; it depends on the circumstances.
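One way to express such preset conditions is as composable predicates. The sketch below assumes the anomaly information has been reduced to a CPU percentage and a list of error-log lines; the thresholds (80 and 90 percent), the keyword `OutOfMemoryError` and all helper names are illustrative assumptions. It follows the steps in order: an anomaly check first, then the preset condition applied to the anomaly information.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AnomalyInfo:
    """Information collected once an anomaly is seen (assumed fields)."""
    cpu_percent: float
    error_log_lines: List[str] = field(default_factory=list)


Condition = Callable[[AnomalyInfo], bool]


def cpu_above(threshold: float) -> Condition:
    """Processor utilization exceeds a threshold."""
    return lambda info: info.cpu_percent > threshold


def log_contains(keyword: str) -> Condition:
    """A keyword appears in the error log."""
    return lambda info: any(keyword in line for line in info.error_log_lines)


def any_of(*conds: Condition) -> Condition:
    return lambda info: any(c(info) for c in conds)


def all_of(*conds: Condition) -> Condition:
    return lambda info: all(c(info) for c in conds)


def judge_fault(info: AnomalyInfo, anomaly: Condition, preset: Condition) -> bool:
    """No anomaly means no fault; otherwise the anomaly information is
    tested against the preset condition."""
    if not anomaly(info):
        return False
    return preset(info)


# Illustrative rules (thresholds and keyword are assumptions).
ANOMALY = any_of(cpu_above(80.0), lambda info: bool(info.error_log_lines))
PRESET = any_of(cpu_above(90.0), log_contains("OutOfMemoryError"))
```

With these rules, 85 percent CPU and an empty error log counts as an anomaly but not a fault, while an `OutOfMemoryError` line in the log is a fault even at low CPU.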
Step S3: when a fault occurs in the running status of the information system, collect information on each parameter in the second preset parameter set of the information system.
In a preferred embodiment of the present invention, collecting information on each parameter in the second preset parameter set when a fault occurs comprises:
Step S301: when a fault occurs in the running status of the information system, match the fault information against preset information to obtain the preset fault scene type corresponding to the fault information;
Step S302: according to the fault scene type corresponding to the fault information, collect information on each parameter in the second preset parameter set corresponding to that fault scene type.
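A minimal sketch of steps S301 and S302, assuming the preset information is a keyword list per fault scene type and that a scene matches when every one of its keywords occurs in the fault message. The description does not specify the matching rule; the scene names, keywords and parameter names below are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Presets = Dict[str, Tuple[List[str], List[str]]]

# Hypothetical preset information:
# scene type -> (keywords that identify it, parameters of its second preset set)
PRESET_SCENES: Presets = {
    "weblogic_port_down": (["port", "refused"], ["app_log_backup", "thread_dump"]),
    "db_pool_exhausted": (["pool", "exhausted"], ["app_log_backup", "app_config"]),
}


def match_scene(fault_message: str, presets: Presets = PRESET_SCENES) -> Optional[str]:
    """S301: return the first scene type whose keywords all occur in the
    lower-cased fault message, or None if no preset scene type matches."""
    text = fault_message.lower()
    for scene, (keywords, _params) in presets.items():
        if all(k in text for k in keywords):
            return scene
    return None


def collect_second_set(scene: str,
                       collectors: Dict[str, Callable[[], object]],
                       presets: Presets = PRESET_SCENES) -> Dict[str, object]:
    """S302: run only the collectors listed for this scene type."""
    _keywords, params = presets[scene]
    return {p: collectors[p]() for p in params}
```

Because each scene type lists its own parameters, a port-down fault triggers a thread dump but not a configuration copy, which is the targeted collection the description aims for.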
On the basis of the above embodiment, in one embodiment of the present invention, the method further comprises:
Step S303: when matching the fault information against the preset information yields no corresponding preset fault scene type, send prompt information. The prompt information tells the operations staff of the information system that the system has failed and that no corresponding fault scene type exists in the information system, so the fault information cannot be collected automatically. In another embodiment of the present invention, when matching yields no corresponding preset fault scene type, the method instead adds a corresponding fault scene type, thereby refining the preset fault scene types so that fault scene information for every fault that may occur in the information system is collected as promptly and comprehensively as possible, providing strong support for subsequent fault analysis and localization.
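The two variants of step S303, prompting the operations staff or adding a new fault scene type, could sit on a small registry like the one below. `SceneRegistry`, `SceneType` and the single-substring matching rule are assumptions made for illustration; `notify` defaults to `print` but stands in for whatever alerting channel the operations team actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class SceneType:
    keyword: str        # substring identifying the scene (assumed matching rule)
    params: List[str]   # second preset parameter set for this scene


@dataclass
class SceneRegistry:
    scenes: Dict[str, SceneType] = field(default_factory=dict)
    notify: Callable[[str], None] = print  # stand-in for a real alert channel

    def match(self, message: str) -> Optional[str]:
        for name, scene in self.scenes.items():
            if scene.keyword in message:
                return name
        return None

    def handle(self, message: str) -> Optional[str]:
        """Return the matching scene type; if there is none, prompt the
        operations staff (first variant of S303)."""
        name = self.match(message)
        if name is None:
            self.notify(
                f"Fault detected but no preset scene type matches {message!r}; "
                "scene information cannot be collected automatically."
            )
        return name

    def add_scene(self, name: str, scene: SceneType) -> None:
        """Second variant of S303: register a scene type built from the
        analysed fault so that later occurrences are matched."""
        self.scenes[name] = scene
```

After an operator analyses an unmatched fault, calling `add_scene` with its keyword and parameters means the next occurrence is collected automatically rather than prompting again.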
On the basis of any of the above embodiments, in one embodiment of the present invention, the second preset parameter set includes the application log information and application configuration information of the fault scene type, and so on; the present invention does not limit this, and the contents depend on the circumstances. For example, in a specific embodiment, when monitoring of a WebLogic service port of the information system detects that the port has gone down or its service has failed, the second preset parameter set includes a backup of the application log of that service port, a dump of that service port (the dump shows the function-call relationships of the threads executing in the Java thread pool), a backup of the dump file (which preserves the running state of system processes and is used by driver developers for debugging), and so on.
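For the WebLogic port-down example, the second preset parameter set can be turned into a collection plan. This sketch assumes a Linux host, that the PID of the affected WebLogic JVM and the paths of its application log and dump file are already known, and that the JDK's `jstack` tool (whose `-l` option adds lock information to the thread dump) is on the `PATH`. The function names and the file naming scheme are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List, Optional, Tuple

# A step is a command plus an optional file that receives its stdout.
Step = Tuple[List[str], Optional[Path]]


def build_port_down_plan(pid: int, app_log: Path, dump_file: Path,
                         backup_dir: Path, stamp: str) -> List[Step]:
    """Second preset parameter set for a hypothetical 'WebLogic port down'
    scene: back up the application log, take a thread dump, back up the
    dump file. Assumes a Linux host with `cp` and the JDK `jstack` tool."""
    return [
        (["cp", str(app_log), str(backup_dir / f"{app_log.name}.{stamp}")], None),
        (["jstack", "-l", str(pid)], backup_dir / f"threads.{pid}.{stamp}.txt"),
        (["cp", str(dump_file), str(backup_dir / f"{dump_file.name}.{stamp}")], None),
    ]


def run_plan(plan: List[Step]) -> List[int]:
    """Run each step, continuing past failures; returns the exit codes
    (-1 if the executable or the output location was not found)."""
    codes = []
    for cmd, out in plan:
        try:
            if out is None:
                result = subprocess.run(cmd, check=False)
            else:
                with open(out, "w", encoding="utf-8") as fh:
                    result = subprocess.run(cmd, stdout=fh, check=False)
            codes.append(result.returncode)
        except FileNotFoundError:
            codes.append(-1)
    return codes
```

Building the plan separately from running it keeps each scene definition inspectable and testable without a live JVM, and one failed step does not stop the remaining evidence from being gathered.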
It should be noted that, in embodiments of the present invention, the parameter types in the second preset parameter sets of different preset fault scene types may be the same or different; the present invention does not limit this. The aim is to collect information specifically targeted at the fault scene type that applies when a fault occurs, so that fault information is collected quickly and effectively.
It should also be noted that, in embodiments of the present invention, when a fault occurs in the running status of the information system, collection of the first preset parameter set and collection of the second preset parameter set may be performed simultaneously or one after the other; the present invention does not limit this.
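Whether the two sets are collected simultaneously or in sequence can be a single switch. This sketch uses `concurrent.futures.ThreadPoolExecutor` from the Python standard library; the function name and the `concurrent` flag are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

SetCollector = Callable[[], dict]


def collect_on_fault(first: SetCollector, second: SetCollector,
                     concurrent: bool = True) -> Dict[str, dict]:
    """Collect both parameter sets after a fault, either in parallel threads
    or one after the other (first set, then second set)."""
    if concurrent:
        with ThreadPoolExecutor(max_workers=2) as pool:
            f_first = pool.submit(first)
            f_second = pool.submit(second)
            # result() waits for completion and re-raises any collector exception.
            return {"first_set": f_first.result(), "second_set": f_second.result()}
    return {"first_set": first(), "second_set": second()}
```

Threads are a reasonable fit on the assumption that both collectors are mostly I/O-bound (reading logs, waiting on dump tools); the sequential mode gives a deterministic order when collectors contend for the same resource.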
On the basis of any of the above embodiments, in one embodiment of the present invention, the method further comprises:
Step S4: store the collected information on each parameter in the first preset parameter set and in the second preset parameter set, so that operations staff of the information system can query and retrieve it.
It should be noted that, on the basis of the above embodiment, in one embodiment of the present invention, before storage the method further comprises classifying and organizing the collected information on each parameter in the first preset parameter set and the second preset parameter set, and then storing it by category. The classification may be determined by fault scene type, or by any other scheme that makes querying or analysis easier; the present invention does not limit this, and it depends on the circumstances.
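One possible classified layout for step S4 is one directory per fault scene type and one sub-directory per fault, holding the two parameter sets as JSON files. The directory scheme, file names and functions below are assumptions for illustration; the description leaves the classification method open.

```python
import json
from pathlib import Path
from typing import List


def store_fault_record(root: Path, scene_type: str, fault_id: str,
                       first_set: dict, second_set: dict) -> Path:
    """Store both parameter sets under root/<scene_type>/<fault_id>/."""
    target = root / scene_type / fault_id
    target.mkdir(parents=True, exist_ok=True)
    for name, data in (("first_set", first_set), ("second_set", second_set)):
        # default=str stores non-JSON values (e.g. datetimes) as strings.
        (target / f"{name}.json").write_text(
            json.dumps(data, indent=2, sort_keys=True, default=str),
            encoding="utf-8",
        )
    return target


def list_faults(root: Path, scene_type: str) -> List[str]:
    """Fault ids stored for one scene type, for operators' queries."""
    folder = root / scene_type
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.iterdir() if p.is_dir())
```

Using a sortable timestamp as `fault_id` makes `list_faults` return faults in chronological order within each scene type.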
In summary, the method provided by the embodiments of the present invention collects information on each parameter in the first preset parameter set at the preset time interval and, when the information system fails, automatically collects information on each parameter in the second preset parameter set, with no need for manual intervention or waiting. This reduces manual workload, ensures that fault scene information is collected at the very moment the fault occurs, keeps that information timely and comprehensive enough for subsequent fault analysis and localization, and avoids the risk of human error in an emergency.
Correspondingly, an embodiment of the present invention further provides an information system fault scenario information collection system, which applies the information system fault scenario information collection method provided by any of the above embodiments. As shown in Figure 3, the system comprises:
First acquisition module 100, configured to periodically collect, at a preset time interval, the information of each parameter in the first preset parameter set of the information system;
Monitoring module 200, configured to monitor the running status of the information system and judge whether the running status has become faulty;
Second acquisition module 300, configured to collect, when the running status of the information system becomes faulty, the information of each parameter in the second preset parameter set of the information system.
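How the three modules cooperate can be sketched as a single `tick` driven by a clock. This is a minimal sketch under stated assumptions: the three callables stand in for the first acquisition module, the monitoring module and the second acquisition module, and the class name `FaultSceneCollector` is hypothetical, not taken from the patent.

```python
from typing import Callable, Optional

Snapshot = dict[str, object]


class FaultSceneCollector:
    """Minimal sketch of the three-module structure (names hypothetical)."""

    def __init__(self,
                 collect_first: Callable[[], Snapshot],
                 detect_fault: Callable[[], Optional[str]],
                 collect_second: Callable[[str], Snapshot],
                 interval_s: float = 60.0):
        self.collect_first = collect_first    # first acquisition module
        self.detect_fault = detect_fault      # monitoring module: fault info or None
        self.collect_second = collect_second  # second acquisition module
        self.interval_s = interval_s
        self._next_first = 0.0
        self.first_samples: list[tuple[float, Snapshot]] = []
        self.fault_samples: list[tuple[float, str, Snapshot]] = []

    def tick(self, now: float) -> None:
        # First parameter set: sampled on schedule whether or not a fault exists.
        if now >= self._next_first:
            self.first_samples.append((now, self.collect_first()))
            self._next_first = now + self.interval_s
        # Second parameter set: collected only when the monitor reports a fault.
        fault = self.detect_fault()
        if fault is not None:
            self.fault_samples.append((now, fault, self.collect_second(fault)))
```

Calling `tick` from a loop with `time.monotonic()` keeps the first-set schedule independent of fault detection, while a fault triggers second-set collection on the same tick it is detected.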
In the embodiments of the present invention, the first acquisition module 100 periodically collects the information of each parameter in the first preset parameter set at the preset time interval regardless of whether the information system is faulty. In one specific embodiment of the present invention, the first preset parameter set includes operating system information, WebLogic information, and the like. The operating system information includes information on the processes running in the information system, resource occupancy, network interface card usage, and the number of files opened by each system process; the WebLogic information includes service process information, garbage collection log information, service log information, JVM information, and the like. The present invention does not limit this, and it depends on the circumstances.
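Collecting a first preset parameter set can be sketched as running a table of named collector functions and recording the results in one timestamped snapshot. This is a minimal sketch, assuming one collector per parameter; only three portable stand-ins are shown (the `FIRST_PARAMETER_SET` contents are hypothetical), the `/proc/<pid>/fd` count works only on Linux, and WebLogic items such as GC logs or JVM information would need collectors of their own, not shown here.

```python
import os
import shutil
import time
from typing import Any, Callable


def open_file_count(pid: int) -> int:
    # Linux-specific: each entry in /proc/<pid>/fd is one open descriptor.
    return len(os.listdir(f"/proc/{pid}/fd"))


# Hypothetical stand-ins for part of the first preset parameter set.
FIRST_PARAMETER_SET: dict[str, Callable[[], Any]] = {
    "cpu_count": os.cpu_count,
    "disk_usage": lambda: shutil.disk_usage(".")._asdict(),
    "self_open_files": lambda: open_file_count(os.getpid()),
}


def collect_snapshot(collectors: dict[str, Callable[[], Any]]) -> dict[str, Any]:
    """Run every collector once and return a timestamped snapshot."""
    snapshot: dict[str, Any] = {"timestamp": time.time()}
    for name, collector in collectors.items():
        try:
            snapshot[name] = collector()
        except Exception as exc:
            # One failing parameter must not lose the rest of the snapshot.
            snapshot[name] = f"error: {exc!r}"
    return snapshot
```

Isolating each collector matters most while a system is degrading, which is exactly when some readings may fail.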
On the basis of any of the above embodiments, in a preferred embodiment of the present invention, the preset time interval ranges from 1 min to 5 min, endpoints included. The present invention does not limit this, and it depends on the circumstances, provided that the preset time interval is shorter than the time between the information system failing and the information system being restarted. This ensures that, when the information system fails, the information of each parameter in the first preset parameter set at the moment of failure can be collected.
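The two interval constraints above can be checked mechanically. This is a minimal sketch, assuming the fault-to-restart gap is known to the operator as a number of seconds; the function name `validate_interval` and its parameters are hypothetical. If samples are taken every `interval_s` seconds and `interval_s` is shorter than the gap, any window of that gap's length contains at least one sample.

```python
MIN_INTERVAL_S = 60   # 1 min, endpoint included
MAX_INTERVAL_S = 300  # 5 min, endpoint included


def validate_interval(interval_s: float, restart_gap_s: float,
                      enforce_preferred_range: bool = True) -> float:
    """Check a sampling interval against the constraints described above.

    restart_gap_s is an assumed, operator-supplied estimate of the time
    between a fault and the system being restarted.
    """
    if enforce_preferred_range and not (MIN_INTERVAL_S <= interval_s <= MAX_INTERVAL_S):
        raise ValueError(f"interval {interval_s}s outside preferred 60-300 s range")
    if interval_s >= restart_gap_s:
        # Otherwise a fault could be followed by a restart before any sample.
        raise ValueError("interval must be shorter than the fault-to-restart gap")
    return interval_s
```

The preferred range can be switched off, matching the statement that the invention does not limit the interval, while the restart-gap check stays mandatory.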
Preferably, on the basis of any of the above embodiments, in one embodiment of the present invention, the monitoring module 200 monitors the running status of the information system continuously for as long as the system is running. Specifically, in one embodiment of the present invention, the monitoring module 200 comprises:
Monitoring unit, configured to monitor the running status of the information system and judge whether the running status has become abnormal;
Collecting unit, configured to collect information on the abnormality when the running status of the information system becomes abnormal;
First judging unit, configured to judge whether the abnormality information of the information system meets a preset condition;
Second judging unit, configured to, when the abnormality information meets the preset condition, determine that the running status of the information system is faulty, send early-warning information to the second acquisition module 300 to trigger it, and at the same time send the monitored abnormality information to the second acquisition module 300.
It should be noted that, in the embodiments of the present invention, the preset condition may be whether the processor utilization of the information system exceeds a threshold, whether a keyword appears in the error log of the information system, some other judging condition, or a combination of several judging conditions. The present invention does not limit this, and it depends on the circumstances.
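The preset condition can be sketched as composable predicates over the collected abnormality information. This is a minimal sketch, assuming the abnormality is summarized as a CPU percentage plus recent error-log lines; the names `Anomaly`, `cpu_above`, `log_contains` and `is_fault`, and the example keyword, are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Anomaly:
    # Hypothetical summary of what the collecting unit gathered.
    cpu_percent: float
    error_log_lines: list[str]


Condition = Callable[[Anomaly], bool]


def cpu_above(threshold: float) -> Condition:
    # Processor utilization exceeds a threshold.
    return lambda a: a.cpu_percent > threshold


def log_contains(pattern: str) -> Condition:
    # A keyword (regular expression) appears in the error log.
    regex = re.compile(pattern)
    return lambda a: any(regex.search(line) for line in a.error_log_lines)


def is_fault(anomaly: Anomaly, conditions: list[Condition],
             require_all: bool = False) -> bool:
    """First judging unit: does the anomaly meet the preset condition?"""
    results = (cond(anomaly) for cond in conditions)
    return all(results) if require_all else any(results)
```

`require_all` covers the case where several conditions must hold at once; the default treats any single condition as sufficient.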
On the basis of any of the above embodiments, in one embodiment of the present invention, the second acquisition module 300 comprises:
Information matching unit, configured to, when the running status of the information system becomes faulty, match the fault information against preset information to obtain the preset fault scenario type corresponding to that fault information. Multiple fault scenario types and their corresponding preset information are configured in the information matching unit in advance, so that when the information matching unit receives the abnormality information sent by the monitoring module, it can quickly determine the corresponding fault scenario type by looking up those fault scenario types and their preset information;
Information collection unit, configured to collect, according to the fault scenario type corresponding to the fault information, the information of each parameter in the second preset parameter set corresponding to that fault scenario type.
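The matching and collection units can be sketched as two lookup tables: preset keywords per scenario type, and a second preset parameter set per scenario type. This is a minimal sketch, assuming fault information arrives as text and matching is by case-insensitive keyword; the scenario names, keywords and parameter names are all hypothetical.

```python
from typing import Callable, Optional

# Hypothetical preset information: scenario type -> identifying keywords.
PRESET_SCENARIOS: dict[str, list[str]] = {
    "weblogic_port_down": ["connection refused", "port unreachable"],
    "jvm_out_of_memory": ["OutOfMemoryError"],
}

# Hypothetical second preset parameter sets, one per scenario type.
SECOND_PARAMETER_SETS: dict[str, list[str]] = {
    "weblogic_port_down": ["app_log_backup", "thread_dump", "dump_file_backup"],
    "jvm_out_of_memory": ["gc_log", "heap_usage", "thread_dump"],
}


def match_scenario(fault_info: str) -> Optional[str]:
    """Information matching unit: first scenario whose keyword appears."""
    text = fault_info.lower()
    for scenario, keywords in PRESET_SCENARIOS.items():
        if any(k.lower() in text for k in keywords):
            return scenario
    return None


def collect_for_fault(fault_info: str,
                      collectors: dict[str, Callable[[], object]]
                      ) -> Optional[dict[str, object]]:
    """Information collection unit: run only this scenario's collectors."""
    scenario = match_scenario(fault_info)
    if scenario is None:
        return None  # no preset scenario; handled by the prompt/supplement units
    return {name: collectors[name]() for name in SECOND_PARAMETER_SETS[scenario]}
```

Because each scenario lists its own parameters, different scenarios can share parameters (here `thread_dump`) or differ entirely, as the description allows.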
On the basis of any of the above embodiments, in one embodiment of the present invention, the second preset parameter set includes the application log information and application configuration information for the fault scenario type, and the like; the present invention does not limit this, and it depends on the circumstances. For example, in one specific embodiment of the present invention, when monitoring of a WebLogic service port of the information system detects that the port has crashed or its service has failed, the second preset parameter set includes: a backup of the application log of that service port; whether a dump has occurred on that service port (a dump shows the function-call relationships of the threads executing in the Java thread pool); a backup of the dump file (which preserves the running state of the system process and is used by driver developers to debug drivers); and so on. The present invention does not limit this, and it depends on the circumstances.
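For the service-port example, collecting the second preset parameter set amounts to copying files into a timestamped archive. This is a minimal sketch under stated assumptions: application logs are `*.log` files in one directory, a dump "has occurred" if any file matches a dump filename pattern, and `log_dir`, `dump_glob` and `archive_root` are hypothetical parameters whose real values depend on how the WebLogic domain is laid out.

```python
import shutil
import time
from pathlib import Path


def collect_port_down_scene(log_dir: Path, dump_glob: str, archive_root: Path) -> dict:
    """Back up logs and dump files for a 'service port down' scenario."""
    # One archive directory per collection, named by local time.
    target = archive_root / time.strftime("%Y%m%d-%H%M%S")
    target.mkdir(parents=True, exist_ok=True)

    # 1. Back up the application logs of the failed service port.
    logs = sorted(log_dir.glob("*.log"))
    for f in logs:
        shutil.copy2(f, target / f.name)  # copy2 also preserves timestamps

    # 2. Record whether a dump was produced, and 3. back up any dump files.
    dumps = sorted(log_dir.glob(dump_glob))
    for f in dumps:
        shutil.copy2(f, target / f.name)

    return {"archive": target,
            "log_backups": [f.name for f in logs],
            "dump_occurred": bool(dumps),
            "dump_backups": [f.name for f in dumps]}
```

Copying rather than moving leaves the live files in place for the running (or restarting) service, and `copy2` keeps modification times, which helps later correlation of events.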
It should be noted that, in the embodiments of the present invention, the parameter types in the second preset parameter sets of different preset fault scenario types may be the same or different; the present invention does not limit this, and it depends on the circumstances. The aim is to collect information specific to the fault scenario type when a fault occurs, so that fault information is collected quickly and effectively.
On the basis of the above embodiment, in one embodiment of the present invention, the second acquisition module 300 further comprises an information prompting unit, configured to send prompt information when matching the fault information against the preset information yields no corresponding preset fault scenario type. The prompt information indicates that the current system has no fault scenario type corresponding to the fault, so the information collection unit cannot be started to collect the information of this fault scenario automatically.
On the basis of the above embodiment, in one embodiment of the present invention, the second acquisition module 300 further comprises:
Information supplementing unit, configured to, when the information system fails and no corresponding fault scenario type exists in the current system, collect the information of the current fault, summarize and organize it into a corresponding fault scenario type, and add that type to the information matching unit. This progressively completes the preset fault scenario types stored in the matching unit, so that all fault scenario information that may arise in the information system is collected as promptly and completely as possible, providing strong support for subsequent fault analysis and localization.
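The prompting and supplementing units can be sketched together as a scenario registry that warns on an unmatched fault, keeps the fault text for later analysis, and accepts new scenario types. This is a minimal sketch, assuming keyword matching and a standard-library logger as the prompt channel; `ScenarioRegistry` and its methods are hypothetical names.

```python
import logging
from typing import Optional

logger = logging.getLogger("fault_scene")


class ScenarioRegistry:
    """Hypothetical registry combining the prompt and supplement units."""

    def __init__(self) -> None:
        self.keywords: dict[str, list[str]] = {}
        self.second_sets: dict[str, list[str]] = {}
        self.unmatched: list[str] = []  # fault info kept for later analysis

    def match(self, fault_info: str) -> Optional[str]:
        for scenario, kws in self.keywords.items():
            if any(k in fault_info for k in kws):
                return scenario
        # Prompt unit: tell operators automatic collection cannot start.
        logger.warning("no preset fault scenario matches %r; "
                       "automatic collection not started", fault_info)
        self.unmatched.append(fault_info)
        return None

    def supplement(self, scenario: str, keywords: list[str],
                   second_set: list[str]) -> None:
        # Supplement unit: register the newly organized scenario so that the
        # next occurrence of this fault is matched and collected automatically.
        self.keywords[scenario] = keywords
        self.second_sets[scenario] = second_set
```

In use, an unmatched fault first produces a warning and an entry in `unmatched`; once operators call `supplement` with keywords derived from it, the same fault text matches the new scenario.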
It should be noted that, in the embodiments of the present invention, when the running status of the information system becomes faulty, the collection of parameter information for the first preset parameter set and for the second preset parameter set may be performed simultaneously or one after the other; the present invention does not limit this, and it depends on the circumstances.
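Both orderings can be sketched in one function. This is a minimal sketch, assuming each set's collection is wrapped in a zero-argument callable; `collect_on_fault` is a hypothetical name. The concurrent path uses a standard-library thread pool, which suits I/O-bound work such as reading logs or copying dump files.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def collect_on_fault(collect_first: Callable[[], Any],
                     collect_second: Callable[[], Any],
                     concurrent: bool = True) -> tuple[Any, Any]:
    """Collect both parameter sets when a fault is detected."""
    if concurrent:
        # Simultaneous: neither collection waits for the other to finish.
        with ThreadPoolExecutor(max_workers=2) as pool:
            f1 = pool.submit(collect_first)
            f2 = pool.submit(collect_second)
            return f1.result(), f2.result()
    # Sequential: first set, then second set.
    return collect_first(), collect_second()
```

The sequential path gives a deterministic order and lower resource use on an already stressed host; the concurrent path shortens the time to capture both sets.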
On the basis of any of the above embodiments, in one embodiment of the present invention, the system further comprises:
Memory module 400, configured to store the collected information of each parameter in the first preset parameter set and of each parameter in the second preset parameter set, so that operations staff of the information system can query and retrieve it.
It should be noted that, on the basis of the above embodiment, in one embodiment of the present invention, the memory module 400 is further configured to classify and organize the collected information of each parameter in the first and second preset parameter sets and then store it by category. The specific classification may be determined by fault scenario type, or by any other scheme that makes querying or analysis easier; the present invention does not limit this, and it depends on the circumstances.
In summary, in the information system fault scenario information collection system provided by the embodiments of the present invention, the first acquisition module 100 collects the information of each parameter in the first preset parameter set of the information system at a preset time interval, and, when the running status of the information system becomes faulty, the second acquisition module 300 automatically collects the information of each parameter in the second preset parameter set, with no manual intervention or waiting. This not only reduces manual workload but also ensures that fault scenario information is collected at the very moment the fault occurs, guaranteeing its timeliness and completeness to meet the needs of subsequent fault analysis and localization, and it avoids the risk of human misoperation in an emergency.
The embodiments in this specification are described progressively; each embodiment focuses on its differences from the others, and for the parts they have in common, the embodiments may be referred to one another.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. An information system fault scenario information collection method, characterized by comprising:
Periodically collecting, at a preset time interval, the information of each parameter in a first preset parameter set of an information system;
Monitoring the running status of the information system and judging whether the running status of the information system has become faulty;
When the running status of the information system becomes faulty, collecting the information of each parameter in a second preset parameter set of the information system.
2. The information collection method according to claim 1, characterized in that, when the running status of the information system becomes faulty, collecting the information of each parameter in the second preset parameter set of the information system comprises:
When the running status of the information system becomes faulty, matching the fault information against preset information to obtain the preset fault scenario type corresponding to the fault information;
Collecting, according to the fault scenario type corresponding to the fault information, the information of each parameter in the second preset parameter set corresponding to that fault scenario type.
3. The information collection method according to claim 2, characterized in that the method further comprises: when matching the fault information against the preset information yields no corresponding preset fault scenario type, adding a fault scenario type corresponding to the fault information.
4. The information collection method according to claim 1, characterized in that the preset time interval ranges from 1 min to 5 min, endpoints included.
5. The information collection method according to claim 1, characterized in that the method further comprises:
Storing the collected information of each parameter in the first preset parameter set and of each parameter in the second preset parameter set.
6. An information system fault scenario information collection system, applying the information system fault scenario information collection method according to any one of claims 1-5, characterized in that the system comprises:
A first acquisition module, configured to periodically collect, at a preset time interval, the information of each parameter in a first preset parameter set of an information system;
A monitoring module, configured to monitor the running status of the information system and judge whether the running status of the information system has become faulty;
A second acquisition module, configured to collect, when the running status of the information system becomes faulty, the information of each parameter in a second preset parameter set of the information system.
7. The system according to claim 6, characterized in that the second acquisition module comprises:
An information matching unit, configured to, when the running status of the information system becomes faulty, match the fault information against preset information to obtain the preset fault scenario type corresponding to the fault information, wherein multiple fault scenario types and their corresponding preset information are configured in the information matching unit in advance;
An information collection unit, configured to collect, according to the fault scenario type corresponding to the fault information, the information of each parameter in the second preset parameter set corresponding to that fault scenario type.
8. The system according to claim 7, characterized in that the second acquisition module further comprises:
An information prompting unit, configured to send prompt information when matching the fault information against the preset information yields no corresponding preset fault scenario type, the prompt information indicating that no corresponding fault scenario type exists in the current system.
9. The system according to claim 8, characterized in that the second acquisition module further comprises:
An information supplementing unit, configured to, when the information system fails and no corresponding fault scenario type exists in the current system, collect the information of the current fault, summarize and organize it into a corresponding fault scenario type, and add that type to the information matching unit.
10. The system according to claim 6, characterized in that the system further comprises:
A memory module, configured to store the collected information of each parameter in the first preset parameter set and of each parameter in the second preset parameter set.

Priority Applications (5)

Application Number Priority Date Filing Date Title
CN201510763286.3A CN105306272B (en) 2015-11-10 2015-11-10 Information system fault scenes formation gathering method and system
EP15893168.3A EP3148116B1 (en) 2015-11-10 2015-12-25 Information system fault scenario information collection method and system
PCT/CN2015/098824 WO2016188100A1 (en) 2015-11-10 2015-12-25 Information system fault scenario information collection method and system
ES15893168.3T ES2687384T3 (en) 2015-11-10 2015-12-25 Method and system for collecting fault scenario information for an information system
US15/388,865 US10545807B2 (en) 2015-11-10 2016-12-22 Method and system for acquiring parameter sets at a preset time interval and matching parameters to obtain a fault scenario type

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510763286.3A CN105306272B (en) 2015-11-10 2015-11-10 Information system fault scenes formation gathering method and system

Publications (2)

Publication Number Publication Date
CN105306272A true CN105306272A (en) 2016-02-03
CN105306272B CN105306272B (en) 2019-01-25

Family

ID=55203056

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510763286.3A Active CN105306272B (en) 2015-11-10 2015-11-10 Information system fault scenes formation gathering method and system

Country Status (5)

Country Link
US (1) US10545807B2 (en)
EP (1) EP3148116B1 (en)
CN (1) CN105306272B (en)
ES (1) ES2687384T3 (en)
WO (1) WO2016188100A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107704392A (en) * 2017-09-30 2018-02-16 华为技术有限公司 The processing method and server of a kind of test case
WO2019214010A1 (en) * 2018-05-08 2019-11-14 网宿科技股份有限公司 Method and device for monitoring for equipment failure
CN111130941A (en) * 2019-12-26 2020-05-08 口碑(上海)信息技术有限公司 Network error detection method and device
CN112041820A (en) * 2018-03-20 2020-12-04 奥普塔姆软件股份有限公司 Automatic root cause analysis based on matching sets
CN113535506A (en) * 2020-04-21 2021-10-22 上海际链网络科技有限公司 Service system monitoring method and device, storage medium and computer equipment

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108880850A (en) * 2017-10-26 2018-11-23 北京视联动力国际信息技术有限公司 A kind of method and apparatus regarding networked terminals fault detection
CN108803545A (en) * 2018-08-10 2018-11-13 北京天安智慧信息技术有限公司 Multi-parameter conjoint analysis alarm method and system
CN109460432B (en) * 2018-11-14 2020-06-26 腾讯科技(深圳)有限公司 Data processing method and system
CN109672565A (en) * 2018-12-29 2019-04-23 北京奇安信科技有限公司 Data stream monitoring method, device, equipment and medium
CN110191011A (en) * 2019-04-15 2019-08-30 厦门科灿信息技术有限公司 Smart machine monitoring method, device and equipment based on data center's monitoring system
CN110366031B (en) * 2019-07-24 2021-11-26 长春融成智能设备制造股份有限公司 Vision-based abnormal state monitoring and fault diagnosis method for MES (manufacturing execution system) of digital workshop
CN111475370A (en) * 2020-03-06 2020-07-31 平安科技(深圳)有限公司 Operation and maintenance monitoring method, device and equipment based on data center and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101556679A (en) * 2009-05-21 2009-10-14 中国建设银行股份有限公司 Method for processing failures in integrated front-end system and computer equipment
US20100204960A1 (en) * 2009-02-06 2010-08-12 Wireless Data Procurement Remote fault detection and condition monitoring
US20100318846A1 (en) * 2009-06-16 2010-12-16 International Business Machines Corporation System and method for incident management enhanced with problem classification for technical support services
EP1733506B1 (en) * 2004-03-18 2012-08-15 ADVA AG Optical Networking Fault management in an ethernet based communication system
CN103368973A (en) * 2013-07-25 2013-10-23 浪潮(北京)电子信息产业有限公司 Safety system for cloud operating system
CN103368771A (en) * 2013-06-24 2013-10-23 华为技术有限公司 Collecting method and device for fault site information of multi-node server system
CN103428026A (en) * 2012-05-14 2013-12-04 国际商业机器公司 Method and system for problem determination and diagnosis in shared dynamic clouds
US20140075239A1 (en) * 2012-09-07 2014-03-13 Oracle International Corporation Failure handling in the execution flow of provisioning operations in a cloud environment
US20150149541A1 (en) * 2013-11-26 2015-05-28 International Business Machines Corporation Leveraging Social Media to Assist in Troubleshooting

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4922491A (en) * 1988-08-31 1990-05-01 International Business Machines Corporation Input/output device service alert function
US20020152305A1 (en) * 2000-03-03 2002-10-17 Jackson Gregory J. Systems and methods for resource utilization analysis in information management environments
US7681195B2 (en) * 2004-04-02 2010-03-16 International Business Machines Corporation System, method, and service for efficient allocation of computing resources among users
US7561877B2 (en) * 2005-03-18 2009-07-14 Qualcomm Incorporated Apparatus and methods for managing malfunctions on a wireless device
US20080010531A1 (en) * 2006-06-12 2008-01-10 Mks Instruments, Inc. Classifying faults associated with a manufacturing process
US7509539B1 (en) * 2008-05-28 2009-03-24 International Business Machines Corporation Method for determining correlation of synchronized event logs corresponding to abnormal program termination
US8307435B1 (en) * 2010-02-18 2012-11-06 Symantec Corporation Software object corruption detection
BR112013033917A2 (en) * 2011-06-28 2019-09-24 Visual Physics Llc optical film laminate weak or null winding paper
US8996916B2 (en) * 2011-08-16 2015-03-31 Future Dial, Inc. System and method for identifying problems via a monitoring application that repetitively records multiple separate consecutive files listing launched or installed applications
US20130297603A1 (en) * 2012-05-01 2013-11-07 Fujitsu Technology Solutions Intellectual Property Gmbh Monitoring methods and systems for data centers
CN103929320A (en) * 2013-01-15 2014-07-16 中国银联股份有限公司 Integration platform for IT system disaster recovery
CN104731664A (en) * 2013-12-23 2015-06-24 伊姆西公司 Method and device for processing faults

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GARY G. YEN et al.: "Improving the performance of globalized dual heuristic programming for fault tolerant control through an online learning supervisor", IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107704392A (en) * 2017-09-30 2018-02-16 华为技术有限公司 The processing method and server of a kind of test case
CN107704392B (en) * 2017-09-30 2021-05-18 华为技术有限公司 Test case processing method and server
CN112041820A (en) * 2018-03-20 2020-12-04 奥普塔姆软件股份有限公司 Automatic root cause analysis based on matching sets
WO2019214010A1 (en) * 2018-05-08 2019-11-14 网宿科技股份有限公司 Method and device for monitoring for equipment failure
CN111130941A (en) * 2019-12-26 2020-05-08 口碑(上海)信息技术有限公司 Network error detection method and device
CN113535506A (en) * 2020-04-21 2021-10-22 上海际链网络科技有限公司 Service system monitoring method and device, storage medium and computer equipment

Also Published As

Publication number Publication date
EP3148116B1 (en) 2018-09-05
EP3148116A1 (en) 2017-03-29
ES2687384T3 (en) 2018-10-24
CN105306272B (en) 2019-01-25
US10545807B2 (en) 2020-01-28
US20170132063A1 (en) 2017-05-11
WO2016188100A1 (en) 2016-12-01
EP3148116A4 (en) 2017-05-03

Similar Documents

Publication Publication Date Title
CN105306272A (en) Method and system for collecting fault scene information of information system
CN101197621B (en) Method and system for remote diagnosing and locating failure of network management system
CN102355368B (en) Fault processing method of network equipment and system
CN108234170B (en) Monitoring method and device for server cluster
CN103607297B (en) Fault processing method of computer cluster system
CN103200050B (en) The hardware state monitoring method and system of server
CN107508722B (en) Service monitoring method and device
CN105337765A (en) Distributed hadoop cluster fault automatic diagnosis and restoration system
CN106331098A (en) Server cluster system
CN106789306B (en) Method and system for detecting, collecting and recovering software fault of communication equipment
CN103699063B (en) The harvester of off-line data and method in a kind of Manufacturing Executive System MES
CN105165054A (en) Method for processing network service faults, service management system and system management module
CN105302661A (en) System and method for implementing virtualization management platform high availability
CN104301136A (en) Method and equipment for reporting and processing fault information
CN101556679A (en) Method for processing failures in integrated front-end system and computer equipment
CN102479113A (en) Abnormal self-adapting processing method and system
CN103425645A (en) Monitoring system and monitoring method for single point of failure of database cluster
CN104038373A (en) Information early warning and self repairing system and method
CN105554074A (en) NAS resource monitoring system and monitoring method based on RPC communication
CN109284294A (en) Acquire method and device, the storage medium, processor of data
CN106294795A (en) A kind of data base's changing method and system
CN107204868B (en) Task operation monitoring information acquisition method and device
CN105025179A (en) Method and system for monitoring service agents of call center
CN104506939A (en) Information reporting method and television terminal
CN111162938A (en) Data processing system and method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant