CN108737182A - The processing method and system of system exception - Google Patents
- Publication number: CN108737182A
- Application number: CN201810496049.9A
- Authority: CN (China)
- Prior art keywords: communication ends, monitoring node, processing, central server, operation information
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0631—Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0654—Management of faults, events, alarms or notifications using network fault recovery
- H04L41/0659—Management of faults, events, alarms or notifications using network fault recovery by isolating or reconfiguring faulty entities
- H04L41/0661—Management of faults, events, alarms or notifications using network fault recovery by isolating or reconfiguring faulty entities by reconfiguring faulty entities
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention belongs to the field of computer technology, and in particular relates to a processing method and system for system exceptions. The method includes: a communication end collects its system operation information in real time and reports the system operation information to a monitoring node; the monitoring node generates an alarm email according to the system operation information and sends the alarm email to a central server, the alarm email recording the communication end on which the system exception occurred and the operation data indicating that system exception; the central server imports the operation data in the alarm email into a preset processing scheme database for matching and obtains a processing script that matches the operation data; the central server pushes the processing script to the communication end on which the system exception occurred, and that communication end executes the processing script automatically upon receipt in order to handle the system exception. The invention achieves automated system operation and maintenance (O&M) while also ensuring the timeliness of system O&M.
Description
Technical field
The invention belongs to the field of computer technology, and in particular relates to a processing method and system for system exceptions.
Background
With the continuous development of network technology, network devices such as servers and gateways have been put into use on a large scale, and both network capacity and topological complexity keep growing. As a result, network systems inevitably encounter various system exceptions during operation.
At present, systems are mainly monitored with monitoring tools. Once a system exception occurs, alarm information is typically passed to the relevant O&M personnel by email or telephone, and the O&M personnel then handle the exception. However, many system faults recur and can be handled in the same way each time, so the existing approach generates a large amount of tedious, repetitive work and lowers the O&M efficiency of the system.
Summary of the invention
In view of this, embodiments of the present invention provide a processing method and system for system exceptions, to solve the problem of low O&M efficiency when system exceptions occur on current network devices.
A first aspect of the embodiments of the present invention provides a processing method for system exceptions, the processing method including:
A communication end collects its system operation information in real time and reports the system operation information to a monitoring node;
The monitoring node generates an alarm email according to the system operation information and sends the alarm email to a central server, the alarm email recording the communication end on which the system exception occurred and the operation data indicating that communication end's system exception;
The central server imports the operation data in the alarm email into a preset processing scheme database for matching and obtains a processing script that matches the operation data;
The central server pushes the processing script to the communication end on which the system exception occurred, and the processing script is executed automatically after being received by that communication end, in order to handle the system exception.
A second aspect of the embodiments of the present invention provides a processing system for system exceptions, including a central server and multiple communication ends and multiple monitoring nodes deployed in a distributed manner, wherein:
The communication end is configured to collect its system operation information in real time and report the system operation information to the monitoring node;
The monitoring node is configured to generate an alarm email according to the system operation information and send the alarm email to the central server, the alarm email recording the communication end on which the system exception occurred and the operation data indicating that communication end's system exception;
The central server is configured to import the operation data in the alarm email into a preset processing scheme database for matching and obtain a processing script that matches the operation data;
The central server is further configured to push the processing script to the communication end on which the system exception occurred, and the processing script is executed automatically after being received by that communication end, in order to handle the system exception.
Compared with the prior art, the embodiments of the present invention have the following beneficial effects. A central server is deployed in the existing network, and multiple monitoring nodes are deployed in the network in a distributed manner. The existing communication ends collect their system operation information in real time and report it to the monitoring nodes. Based on that information, a monitoring node generates an alarm email for any communication end on which a system exception has occurred and sends it to the central server, so that the central server matches a corresponding processing script in the preset processing scheme database and pushes it to the communication end for automatic handling. From the occurrence of a system exception to recovery from it, the whole process is completed automatically among the communication end, the monitoring node, and the central server. This achieves automated system O&M, ensures its timeliness, and saves the time and effort of O&M personnel.
Description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the network topology of the processing system for system exceptions provided by an embodiment of the present invention;
Fig. 2 is an interaction flowchart of the processing method for system exceptions provided by an embodiment of the present invention;
Fig. 3 is an implementation flowchart of the processing method for system exceptions provided by another embodiment of the present invention;
Fig. 4 is an implementation flowchart of the processing method for system exceptions provided by another embodiment of the present invention;
Fig. 5 is an implementation flowchart of the processing method for system exceptions provided by another embodiment of the present invention;
Fig. 6 is an implementation flowchart of the processing method for system exceptions provided by another embodiment of the present invention;
Fig. 7 is an implementation flowchart of the processing method for system exceptions provided by another embodiment of the present invention;
Fig. 8 is an implementation flowchart of the processing method for system exceptions provided by another embodiment of the present invention;
Fig. 9 is an interaction flowchart of the processing method for system exceptions provided by another embodiment of the present invention;
Fig. 10 is a schematic diagram of a network node provided by an embodiment of the present invention.
Detailed description
To make the objectives, features, and advantages of the present invention clearer and easier to understand, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the embodiments described below are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Fig. 1 is a schematic diagram of the network topology of the processing system for system exceptions provided by an embodiment of the present invention. For ease of description, only the parts relevant to this embodiment are shown.
Referring to Fig. 1, the system has a central server deployed in the network, together with multiple communication ends and multiple monitoring nodes deployed in a distributed manner.
A communication end may be any network node already deployed in the network, for example a network device such as a server, gateway, or router, or a terminal device such as a computer, smart appliance, or smartphone. In the embodiments of the present invention, O&M is performed on the system installed and running on a communication end: once a system exception occurs, it is recovered automatically using the processing method for system exceptions provided by the embodiments. While its system is running, a communication end collects system operation information in real time and reports it to a monitoring node.
The central server and the monitoring nodes are devices deployed in the network to implement the automatic recovery from system exceptions in the embodiments of the present invention. Monitoring nodes are deployed in the network in a distributed manner and may take the form of servers with relatively high data processing capability. A monitoring node generates an alarm email according to the system operation information reported by the communication ends and sends the alarm email to the central server. The alarm email records the communication end on which the system exception occurred and the operation data indicating that communication end's system exception. In the embodiments of the present invention, one or more communication ends may be deployed under one monitoring node, and each monitoring node is responsible for monitoring the system operation of the communication ends deployed under it. Only one central server is set up in a network area, and it can communicate with all monitoring nodes and communication ends deployed in that area at the same time. A processing scheme database is provided on the central server. After receiving an alarm email reported by a monitoring node, the central server imports the operation data in the alarm email into the processing scheme database for matching, obtains the corresponding processing script, and pushes the processing script to the communication end on which the system exception occurred. After receiving the processing script, the communication end executes it automatically, thereby recovering from the system exception automatically.
On the basis of the embodiment shown in Fig. 1, each monitoring node is further located under the same gateway as the communication ends that report system operation information to it, so that the monitoring node can obtain that information in a timely and accurate manner. Reporting to a monitoring node under the same gateway also gives better communication reliability and rate, improves reporting efficiency, and simplifies operation management.
Next, based on the embodiment shown in Fig. 1, the processing method for system exceptions provided by the embodiments of the present invention is described in detail. Fig. 2 shows the interaction flow of the processing method; the communication entities involved in this interaction flow are the central server, monitoring node, and communication end described above.
As shown in Fig. 2, the processing method for system exceptions includes:
S1: The communication end collects its system operation information in real time and reports the system operation information to the monitoring node.
In the embodiments of the present invention, a program for collecting system operation information is installed on the communication end in advance, and the communication end uses this pre-installed program to collect system operation information in real time while its system is running. The collected system operation information includes, but is not limited to, the business data processed by the system, system operation logs, the basic resource usage of the communication end, the database performance of the communication end, and middleware performance. After collecting the system operation information, the communication end reports it to the monitoring node either periodically or in real time.
As an embodiment of the present invention, before S1 the communication end needs to determine in advance the monitoring node to which it will report system operation information, as shown in Fig. 3:
S301: The communication end obtains a monitoring node list, which records each gateway in the system and the monitoring nodes deployed under each gateway.
The monitoring node list is issued to each communication end by the central server. It records each gateway in the system and the monitoring nodes deployed under each gateway; in the list, each gateway and each monitoring node may be represented by its IP address. The list is maintained by the central server. When its content changes, the central server reissues it to the communication ends, and each communication end updates its locally stored monitoring node list after receiving the new one.
S302: The communication end finds the gateway where it is located in the monitoring node list.
As described in the foregoing embodiment, each monitoring node is preferably located under the same gateway as the communication ends that report system operation information to it. Therefore, in this embodiment, the communication end first looks up its own gateway in the monitoring node list.
S303: The communication end determines a monitoring node deployed under the gateway it found as the monitoring node to which the system operation information is to be reported.
After finding its gateway in the monitoring node list, the communication end selects any one of the monitoring nodes deployed under that gateway and determines it as the monitoring node to which it reports system operation information.
In the embodiment corresponding to Fig. 3, the communication end reports its system operation information to a monitoring node under the same gateway, which gives better communication reliability and rate, improves reporting efficiency, and simplifies operation management.
After S303, as shown in Fig. 4, the method further includes:
S304: The communication end records all monitoring nodes deployed under the gateway it found.
The communication end records all monitoring nodes deployed under its gateway. For example, if 5 monitoring nodes are deployed under the gateway where the communication end is located, then in addition to configuring one of them as the monitoring node to which system operation information is reported, the communication end records the address information, node identifiers, and so on of the other 4 monitoring nodes.
S305: If it detects that reporting the system operation information has failed, the communication end selects another monitoring node under the gateway it found as the monitoring node to which the system operation information is to be reported.
After reporting system operation information to the monitoring node, the communication end normally receives a response from the monitoring node confirming successful receipt. If it does not receive this response within a certain period of time, the communication end treats the report as failed by default and, according to the information recorded in S304, selects another monitoring node under its gateway to report the system operation information to. The embodiment corresponding to Fig. 4 takes into account the possibility that a monitoring node or communication link fails, and establishes a backup reporting mechanism so that system operation information is reported smoothly, effectively ensuring the timeliness of system O&M.
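The backup reporting mechanism of S304 and S305 can be sketched as a loop over the recorded monitoring nodes. In this example the transport is abstracted behind a caller-supplied `send` function, which is an assumption: it is expected to return True when the monitoring node acknowledges receipt within the timeout and to return False or raise `OSError` otherwise.

```python
from typing import Callable, Iterable, Optional


def report_with_failover(
    payload: bytes,
    nodes: Iterable[str],
    send: Callable[[str, bytes], bool],
) -> Optional[str]:
    """Try each recorded monitoring node in turn until one acknowledges.

    Returns the address of the node that accepted the report, or None if
    every node under the gateway failed.
    """
    for node in nodes:
        try:
            if send(node, payload):
                return node
        except OSError:
            # Treat a link or node failure like a missing acknowledgement.
            continue
    return None
```

Passing the currently configured node first and the other recorded nodes after it reproduces the order described in the text.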
S2: The monitoring node generates an alarm email according to the system operation information and sends the alarm email to the central server, the alarm email recording the communication end on which the system exception occurred and the operation data indicating that communication end's system exception.
The monitoring node analyzes the system operation information reported by each communication end to detect program exceptions or business data exceptions in the system running on each communication end. Based on the monitoring result and the information related to the system exception, it generates an alarm email and sends it to the central server. The alarm email mainly records the device identifier or network address of the communication end on which the system exception occurred, together with the operation data indicating that communication end's system exception.
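One possible shape for such an alarm email, built with Python's standard `email.message.EmailMessage`, is sketched below. The subject line, the `key: value` body layout, and the function name `build_alarm_email` are assumptions for illustration; the source only states which information the email records.

```python
from email.message import EmailMessage


def build_alarm_email(sender, recipient, end_address, operation_data):
    """Build an alarm email naming the faulty communication end and its data."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"System exception alarm: {end_address}"
    # Hypothetical body layout: one "key: value" pair per line.
    lines = [f"communication_end: {end_address}"]
    lines += [f"{key}: {value}" for key, value in operation_data.items()]
    msg.set_content("\n".join(lines))
    return msg
```

The resulting message could then be handed to any mail transport for delivery to the central server.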
As an embodiment of the present invention, before S2 the monitoring node may establish a system normal operation model in advance and feed system operation information into that model to judge whether the system on the corresponding communication end is running normally, as shown in Fig. 5:
S501: Within a preset time period, the monitoring node collects the system operation information of different communication ends.
The monitoring node may collect the system operation information of each communication end over a period of time in advance and store it in an operation information set for subsequent modeling and analysis.
S502: The monitoring node clusters the collected system operation information to obtain multiple clusters.
The monitoring node uses a clustering algorithm, such as the CURE clustering algorithm, to cluster the collected system operation information and obtain multiple clusters; the system operation information within each cluster has the same or similar data characteristics.
S503: The monitoring node marks, among the multiple clusters, the clusters that indicate normal system operation.
Based on a preset empirical value, the monitoring node marks the clusters that indicate normal system operation among the generated clusters. The system operation information in these clusters indicates that no system exception has occurred on the corresponding communication ends.
As an embodiment of the present invention, S503 is implemented as shown in Fig. 6:
S601: The monitoring node sorts the multiple clusters in descending order of cluster size.
After clustering, the amount of system operation information gathered in each cluster differs. The monitoring node therefore first sorts the generated clusters in descending order of size, that is, by the amount of system operation information in each cluster.
S602: The monitoring node reads a preset ratio parameter, which indicates the proportion of communication ends whose systems are normal among all communication ends at the same moment.
The ratio parameter is determined from empirical values or past operating conditions. It indicates the proportion of all communication ends whose systems are normal at a given moment, that is, the share of the whole operation information set made up of system operation information that indicates normal operation.
S603: Based on the preset ratio parameter, the monitoring node marks the top N clusters as clusters indicating normal system operation.
After obtaining the preset ratio parameter, the monitoring node marks the top N clusters as clusters indicating normal system operation, based on the ratio parameter, the amount of system operation information reported by the communication ends in the current statistical period, and the amount of system operation information in each cluster. The ratio of the total amount of system operation information in the marked clusters to the total amount in all clusters is approximately equal to the preset ratio parameter.
In the embodiment corresponding to Fig. 6, empirical values and a clustering algorithm are used to screen the system operation information and identify the information that indicates normal system operation, which is then used in modeling the system normal operation model.
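The selection in S601 to S603 can be sketched as follows. The source only requires the marked share to be approximately equal to the ratio parameter; this example makes the assumption that N is the smallest number of leading clusters whose combined share reaches the ratio. The function name `mark_normal_clusters` is hypothetical.

```python
from typing import List, Sequence


def mark_normal_clusters(clusters: Sequence[Sequence], ratio: float) -> List[Sequence]:
    """Mark the largest clusters until their share of all records reaches ratio."""
    # S601: sort clusters in descending order of size.
    ordered = sorted(clusters, key=len, reverse=True)
    total = sum(len(cluster) for cluster in ordered)
    marked: List[Sequence] = []
    covered = 0
    # S603: take leading clusters until the preset ratio (S602) is covered.
    for cluster in ordered:
        if total == 0 or covered / total >= ratio:
            break
        marked.append(cluster)
        covered += len(cluster)
    return marked
```

For example, with clusters of 50, 30, 15, and 5 records and a ratio of 0.8, the two largest clusters are marked.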
S604: The monitoring node generates a system normal operation model based on the marked clusters. The monitoring node uses this model to judge whether system operation information reported by a communication end indicates that the communication end's system is running normally.
For the marked clusters, the monitoring node takes the system operation information in them and builds a model from it, generating the system normal operation model. The model may be built on a neural network: the system operation information in the clusters serves as the input samples, and the system operation status that the information represents (system normal or system exception) serves as the output, and the model is trained on these. The trained model is then used to judge whether system operation information reported by a communication end indicates that the communication end's system is running normally.
S3: The central server imports the operation data in the alarm email into a preset processing scheme database for matching and obtains a processing script that matches the operation data.
In the embodiments of the present invention, the central server may set up a timed task in the background to fetch the alarm emails generated by the monitoring nodes at regular intervals. In an alarm email, the operation data indicating the communication end's system exception may be attached to the email as a text attachment or included in the message body. The central server parses the relevant text content of the alarm email sent by the monitoring node. This includes segmenting the text, finding the character strings that characterize system operation indicators, and reading the data corresponding to each string, thereby converting the text into a data table that describes the system operation status. The key names in the data table are the character strings that characterize system operation indicators, and the key values are the data corresponding to each string. These character strings include, but are not limited to, server number, server address, time of the system exception, description of the system exception, and the operating parameters at the time of the exception.
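The text-to-table conversion described above can be sketched as follows, under the assumption that the alarm text carries one `key: value` pair per line. The key names in `KNOWN_KEYS` and the function name `parse_alarm_text` are hypothetical stand-ins for the indicator strings the central server recognizes.

```python
import re

# Hypothetical indicator strings recognized by the central server.
KNOWN_KEYS = {
    "server_number",
    "server_address",
    "exception_time",
    "exception_description",
}

# One "key: value" pair per line; the key is letters and underscores only.
LINE_PATTERN = re.compile(r"^\s*([A-Za-z_]+)\s*:\s*(.+?)\s*$")


def parse_alarm_text(text: str) -> dict:
    """Convert alarm text into a key/value table of operation indicators."""
    table = {}
    for line in text.splitlines():
        match = LINE_PATTERN.match(line)
        # Lines whose key is not a known indicator string are ignored.
        if match and match.group(1) in KNOWN_KEYS:
            table[match.group(1)] = match.group(2)
    return table
```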
A processing scheme database is created on the central server, and it must be created at least before S3 is executed, as shown in Fig. 7:
S701: The central server enters a configuration mode.
After the configuration mode is triggered, the central server presents a configuration page to the O&M user, on which O&M personnel can configure the processing scheme database.
S702: In the configuration mode, the central server receives the feature parameters describing a system exception and the corresponding processing script entered by the O&M user.
On the configuration page, the O&M user enters feature parameters that describe a system exception and the corresponding processing script. For each type of system exception, the character strings that characterize system operation indicators take different values, and these strings with their values make up the feature parameters that describe the exception.
S703: The central server stores the feature parameters entered by the O&M user in association with the corresponding processing script; the central server uses the feature parameters for matching against the operation data.
The central server stores the feature parameters that the O&M user entered on the configuration page to describe a type of system exception in association with the processing script for recovering from that type of exception. After parsing out of the alarm email the operation data indicating the communication end's system exception, the central server matches the data corresponding to the indicator strings in the operation data against the data corresponding to the same indicator strings in the feature parameters. This determines the type of the system exception and, in turn, the processing script for handling that type of exception.
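A minimal sketch of the matching in S703 follows. It assumes the processing scheme database is a list of (feature parameters, script name) pairs and that an entry matches when every one of its feature parameters equals the corresponding value in the parsed operation data. The sample entries, script names, and the function name `match_script` are hypothetical, and exact equality is only one possible matching rule.

```python
from typing import Optional

# Hypothetical processing scheme database: feature parameters -> script name.
SCHEME_DATABASE = [
    ({"exception_description": "disk full", "server_role": "database"}, "purge_logs.sh"),
    ({"exception_description": "thread blocked"}, "restart_worker.sh"),
]


def match_script(operation_data: dict, database=SCHEME_DATABASE) -> Optional[str]:
    """Return the script of the first entry whose feature parameters all match."""
    for features, script in database:
        if all(operation_data.get(key) == value for key, value in features.items()):
            return script
    return None
```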
S4: The central server pushes the processing script to the communication end on which the system exception occurred, and the processing script is executed automatically after being received by that communication end, in order to handle the system exception.
After determining in its processing scheme database the processing script that matches the communication end's system exception, the central server pushes the processing script to the communication end according to the information about that communication end recorded in the alarm email, such as its network address. After receiving the script pushed by the central server, the communication end executes it automatically, thereby recovering from the system exception.
Further, on the communication end, timely recovery from the system exception is ensured by setting the priority of the processing script, as shown in Fig. 8:
S801: After detecting the processing script pushed by the central server, the communication end automatically creates an execution thread for the processing script.
On the communication end, the push of a processing script is set in advance as a trigger condition: once a processing script pushed by the central server is detected, a thread creation action is triggered and an execution thread for the processing script is automatically created locally.
S802: The communication end sets the priority of the execution thread to the highest, so that the processing script is automatically executed first.
After the execution thread is created, the communication end sets its priority to the highest. All other threads previously created on the communication end then have a lower priority than the execution thread, so the communication end can run the execution thread immediately to execute the processing script and recover from the system exception promptly.
Further, as an embodiment of the present invention, after having handled system exception, as shown in Figure 9:
S5:The communication ends are after system exception is handled successfully, the execution to the central server feedback processing script
Daily record.
Communication ends execute handle script during, implementation procedure is recorded, generates execution journal, and to center
The successful result of server feedback system exception processing and execution journal.
S6:The central server is for statistical analysis to the execution journal received every prefixed time interval,
The prediction address to system operation situation is generated according to the result of statistical analysis.
According to the execution logs received, the central server records the communication ends that have successfully handled a system exception. At regular intervals, it performs statistical analysis on the system exceptions of the communication ends and on the corresponding handling results, and generates a prediction report on the system operation status according to the result of the statistical analysis. The report helps operation and maintenance personnel understand how the system is running, so that they can improve system functions and system stability.
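The periodic statistical analysis of step S6 is not specified further in the text. A minimal sketch, assuming each execution log has been reduced to a record with the hypothetical keys `end` (address of the communication end) and `anomaly` (type of system exception), might simply count how often each exception type and each communication end appears:

```python
from collections import Counter


def summarize_execution_logs(logs):
    """Summarize a list of execution-log records.

    Each record is a dict with the hypothetical keys
    'end' (communication end address) and 'anomaly' (exception type).
    """
    total = len(logs)
    by_anomaly = Counter(record["anomaly"] for record in logs)
    by_end = Counter(record["end"] for record in logs)
    return {
        "total": total,
        # Most frequent exception type, or None when no logs were received.
        "most_frequent_anomaly": by_anomaly.most_common(1)[0][0] if total else None,
        "anomalies_per_type": dict(by_anomaly),
        "anomalies_per_end": dict(by_end),
    }


logs = [
    {"end": "10.0.0.7", "anomaly": "thread_block"},
    {"end": "10.0.0.7", "anomaly": "thread_block"},
    {"end": "10.0.0.9", "anomaly": "disk_full"},
    {"end": "10.0.0.9", "anomaly": "thread_block"},
]
report = summarize_execution_logs(logs)
print(report["most_frequent_anomaly"])
```

A real prediction report would presumably go beyond such counts, for example by tracking how they change between intervals; the patent leaves the analysis method open.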
Based on the method for recovering from system exceptions described above, automatic recovery from a system exception can be achieved. For example, suppose a thread block occurs on a certain communication end in the system. By parsing the system operation information reported by that communication end, the monitoring node confirms that a system exception has occurred on it and therefore sends an alarm email to the central server. The central server parses the text of the alarm email and imports the parsing result into the processing scheme database for matching, in order to find the processing script for handling a thread block. According to the address of the communication end given in the alarm email, the central server pushes the processing script to the communication end, which executes the script automatically and thereby completes recovery from the system exception.
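The central server's part of this example (parse the alarm email, match against the processing scheme database, decide where to push) can be sketched as below. The email layout (`Key: value` lines), the field names `address` and `anomaly`, the database contents, and the script file names are all assumptions made for illustration; the patent does not define them.

```python
# Hypothetical processing scheme database: exception type -> processing script.
SCHEME_DB = {
    "thread_block": "restart_blocked_threads.sh",
    "disk_full": "clean_disk.sh",
}


def parse_alarm_email(mail_text):
    """Parse 'Key: value' lines of an alarm email into a dict with lowercase keys."""
    fields = {}
    for line in mail_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore lines without a colon
            fields[key.strip().lower()] = value.strip()
    return fields


def handle_alarm(mail_text, scheme_db):
    """Return (communication end address, script) to push, or None if nothing matches."""
    fields = parse_alarm_email(mail_text)
    address = fields.get("address")
    script = scheme_db.get(fields.get("anomaly"))
    if address is None or script is None:
        return None
    return address, script


mail = "Address: 10.0.0.7\nAnomaly: thread_block"
print(handle_alarm(mail, SCHEME_DB))  # ('10.0.0.7', 'restart_blocked_threads.sh')
```

Returning `None` for an unmatched alarm leaves the decision about what to do next (for instance, notifying operation and maintenance personnel) to the caller.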
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and the sequence numbers should not constitute any limitation on the implementation of the embodiments of the present invention.
Fig. 10 is a schematic diagram of a network node provided by an embodiment of the present invention. Here, the network node may be the central server, a communication end, or a monitoring node in Fig. 1. As shown in Fig. 10, the network node 10 of this embodiment includes a processor 100, a memory 101, and a computer program 102 that is stored in the memory 101 and can run on the processor 100. When executing the computer program 102, the processor 100 implements the steps that correspond to the respective network node in the above embodiments of the method for recovering from system exceptions: for a communication end, the processor 100 executes step S1 shown in Fig. 2; for a monitoring node, the processor 100 executes step S2 shown in Fig. 2; and for the central server, the processor 100 executes steps S3 and S4 shown in Fig. 2.
Illustratively, the computer program 102 may be divided into one or more modules/units, which are stored in the memory 101 and executed by the processor 100 to carry out the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution process of the computer program 102 in its corresponding network node.
The processor 100 may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 101 may be an internal storage unit of the corresponding network node, such as the hard disk or main memory of a communication end. The memory 101 may also be an external storage device of the corresponding network node, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash card (Flash Card) fitted to a communication end. Further, the memory 101 may include both an internal storage unit and an external storage device of the corresponding network node. The memory 101 is used to store the computer program and the other programs and data required by the network node. The memory 101 may also be used to temporarily store data that has been output or is about to be output.
All or part of the flow in the methods of the above embodiments of the present invention may also be completed by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium, and when executed by a processor it can implement the steps of each of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, or certain intermediate forms. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so on. It should be noted that the content covered by the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a jurisdiction. For example, in certain jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunication signals.
The embodiments of the present invention deploy a central server in an existing network and deploy multiple monitoring nodes in a distributed manner. The communication ends already present in the network collect their system operation information in real time and report it to a monitoring node. According to the system operation information, the monitoring node generates an alarm email for a communication end on which a system exception has occurred and sends the email to the central server, so that the central server matches the corresponding processing script in the preset processing scheme database and pushes it to the communication end for automatic handling. From the occurrence of a system exception to recovery from it, the whole process is completed automatically among the communication end, the monitoring node, and the central server. This achieves automated system operation and maintenance, ensures that operation and maintenance is timely, and saves the time and effort of operation and maintenance personnel.
The embodiments described above are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments can still be modified, and some of their technical features can be replaced by equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A method for processing a system exception, characterized in that the processing method comprises:
a communication end collects system operation information in real time and reports the system operation information to a monitoring node;
the monitoring node generates an alarm email according to the system operation information and sends the alarm email to a central server, wherein the alarm email records the communication end on which a system exception has occurred and operation data indicating the system exception of that communication end;
the central server exports the operation data in the alarm email to a preset processing scheme database for matching, and obtains a processing script that matches the operation data;
the central server pushes the processing script to the communication end on which the system exception has occurred, wherein the processing script is executed automatically after being received by that communication end and is used to handle the system exception.
2. The processing method according to claim 1, characterized in that, before the communication end collects its system operation information in real time and reports the system operation information to the monitoring node, the method further comprises:
the communication end obtains a monitoring node list, wherein the monitoring node list records each gateway in the system and the monitoring nodes deployed under each gateway;
the communication end finds the gateway where it is located in the monitoring node list;
the communication end determines a monitoring node deployed under the gateway found as the monitoring node to which the system operation information is to be reported.
3. The processing method according to claim 1, characterized by further comprising:
the communication end records all monitoring nodes deployed under the gateway found;
if a failure to report the system operation information is detected, the communication end selects another monitoring node under the gateway found as the monitoring node to which the system operation information is to be reported.
4. The processing method according to claim 1, characterized in that, before the monitoring node generates the alarm email according to the system operation information, the method further comprises:
within a preset time period, the monitoring node collects the system operation information of different communication ends;
the monitoring node clusters the collected system operation information to obtain a plurality of clusters;
the monitoring node marks, among the plurality of clusters, the clusters that indicate normal system operation;
the monitoring node generates a normal system operation model based on the marked clusters, wherein the normal system operation model is used by the monitoring node to judge whether the system operation information reported by a communication end indicates that the system of that communication end is operating normally.
5. The processing method according to claim 4, characterized in that the monitoring node marking, among the plurality of clusters, the clusters that indicate normal system operation comprises:
the monitoring node arranges the plurality of clusters in descending order of cluster size;
the monitoring node reads a preset ratio parameter, wherein the ratio parameter indicates the proportion, at a given moment, of communication ends whose systems are normal among all communication ends;
based on the preset ratio parameter, the monitoring node marks the clusters ranked in the top N as clusters that indicate normal system operation.
6. The processing method according to claim 1, characterized in that, before the central server exports the feature data in the alarm email to the preset processing scheme database for matching and obtains the processing script corresponding to the feature data, the method further comprises:
the central server enters a configuration mode;
in the configuration mode, the central server receives operation and maintenance feature parameters input by a user for describing a system exception, together with the corresponding processing script;
the central server stores the operation and maintenance feature parameters input by the user in association with the corresponding processing script, wherein the feature parameters are used by the central server for matching against the operation data.
7. The processing method according to claim 1, characterized by further comprising:
after the system exception has been handled successfully, the communication end feeds back the execution log of the processing script to the central server;
at preset time intervals, the central server performs statistical analysis on the execution logs received;
the central server generates a prediction report on the system operation status according to the result of the statistical analysis.
8. The processing method according to claim 1, characterized by further comprising:
after detecting the processing script pushed by the central server, the communication end automatically creates an execution thread for the processing script;
the communication end sets the priority of the execution thread to the highest, so that the processing script is executed automatically and preferentially.
9. A system for processing a system exception, characterized by comprising a central server, and a plurality of communication ends and a plurality of monitoring nodes in distributed deployment, wherein:
the communication end is configured to collect its system operation information in real time and report the system operation information to the monitoring node;
the monitoring node is configured to generate an alarm email according to the system operation information and send the alarm email to the central server, wherein the alarm email records the communication end on which a system exception has occurred and operation data indicating the system exception of that communication end;
the central server is configured to export the operation data in the alarm email to a preset processing scheme database for matching, and to obtain a processing script that matches the operation data;
the central server is further configured to push the processing script to the communication end on which the system exception has occurred, wherein the processing script is executed automatically after being received by that communication end and is used to handle the system exception.
10. The processing system according to claim 9, characterized in that the monitoring node and the communication ends that report the system operation information to it are located under the same gateway.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810496049.9A CN108737182A (en) | 2018-05-22 | 2018-05-22 | The processing method and system of system exception |
PCT/CN2018/093707 WO2019223062A1 (en) | 2018-05-22 | 2018-06-29 | Method and system for processing system exceptions |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810496049.9A CN108737182A (en) | 2018-05-22 | 2018-05-22 | The processing method and system of system exception |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108737182A true CN108737182A (en) | 2018-11-02 |
Family
ID=63938832
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810496049.9A Pending CN108737182A (en) | 2018-05-22 | 2018-05-22 | The processing method and system of system exception |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108737182A (en) |
WO (1) | WO2019223062A1 (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109828884A (en) * | 2018-12-14 | 2019-05-31 | 深圳壹账通智能科技有限公司 | Carry additionally service data processing method, system, computer equipment and storage medium |
CN111447329A (en) * | 2020-03-31 | 2020-07-24 | 携程旅游信息技术(上海)有限公司 | Method, system, device and medium for monitoring state server in call center |
CN111756778A (en) * | 2019-03-26 | 2020-10-09 | 京东数字科技控股有限公司 | Server disk cleaning script pushing method and device and storage medium |
WO2020238415A1 (en) * | 2019-05-29 | 2020-12-03 | 深圳前海微众银行股份有限公司 | Method and apparatus for monitoring model training |
CN113676356A (en) * | 2021-08-27 | 2021-11-19 | 创新奇智(青岛)科技有限公司 | Alarm information processing method and device, electronic equipment and readable storage medium |
CN113747171A (en) * | 2021-08-06 | 2021-12-03 | 天津津航计算技术研究所 | Self-recovery video decoding method |
CN114077525A (en) * | 2020-08-17 | 2022-02-22 | 鸿富锦精密电子(天津)有限公司 | Abnormal log processing method and device, terminal equipment, cloud server and system |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113495820A (en) * | 2020-04-03 | 2021-10-12 | 北京沃东天骏信息技术有限公司 | Method and device for collecting and processing abnormal information and abnormal monitoring system |
CN113765685A (en) * | 2020-06-05 | 2021-12-07 | 腾讯科技(深圳)有限公司 | Abnormity management method and device |
CN111915452A (en) * | 2020-08-28 | 2020-11-10 | 平安国际智慧城市科技股份有限公司 | Monitoring system, method and device, monitoring processing equipment and storage medium |
CN112214409B (en) * | 2020-10-13 | 2023-11-24 | 中国工商银行股份有限公司 | Operation and maintenance method and device used in test environment |
CN112561385A (en) * | 2020-12-24 | 2021-03-26 | 平安银行股份有限公司 | Risk monitoring method and system |
CN115225534A (en) * | 2022-07-26 | 2022-10-21 | 雷沃工程机械集团有限公司 | Method for monitoring running state of monitoring server |
CN117458722B (en) * | 2023-12-26 | 2024-03-08 | 西安民为电力科技有限公司 | Data monitoring method and system based on electric power energy management system |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101561878A (en) * | 2009-05-31 | 2009-10-21 | 河海大学 | Unsupervised anomaly detection method and system based on improved CURE clustering algorithm |
CN103532795A (en) * | 2013-10-30 | 2014-01-22 | 蓝盾信息安全技术股份有限公司 | Monitoring system and method for detecting availability of WEB business system |
CN104184819A (en) * | 2014-08-29 | 2014-12-03 | 城云科技(杭州)有限公司 | Multi-hierarchy load balancing cloud resource monitoring method |
CN105337765A (en) * | 2015-10-10 | 2016-02-17 | 上海新炬网络信息技术有限公司 | Distributed hadoop cluster fault automatic diagnosis and restoration system |
WO2017088681A1 (en) * | 2015-11-24 | 2017-06-01 | 阿里巴巴集团控股有限公司 | Fault handling method and apparatus for gateway device |
CN107135156A (en) * | 2017-06-07 | 2017-09-05 | 努比亚技术有限公司 | Call chain collecting method, mobile terminal and computer-readable recording medium |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9928148B2 (en) * | 2014-08-21 | 2018-03-27 | Netapp, Inc. | Configuration of peered cluster storage environment organized as disaster recovery group |
CN104699759B (en) * | 2015-02-10 | 2018-05-15 | 上海新炬网络信息技术股份有限公司 | A kind of data base automatic operation and maintenance method |
WO2017044772A1 (en) * | 2015-09-09 | 2017-03-16 | Convida Wireless, Llc | Methods for enabling context-aware coap messaging |
CN105721304A (en) * | 2016-04-05 | 2016-06-29 | 网宿科技股份有限公司 | Adaptive routing adjustment method and system and service device |
CN107632918B (en) * | 2017-08-30 | 2020-09-11 | 中国工商银行股份有限公司 | Monitoring system and method for computing storage equipment |
-
2018
- 2018-05-22 CN CN201810496049.9A patent/CN108737182A/en active Pending
- 2018-06-29 WO PCT/CN2018/093707 patent/WO2019223062A1/en active Application Filing
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109828884A (en) * | 2018-12-14 | 2019-05-31 | 深圳壹账通智能科技有限公司 | Carry additionally service data processing method, system, computer equipment and storage medium |
CN111756778A (en) * | 2019-03-26 | 2020-10-09 | 京东数字科技控股有限公司 | Server disk cleaning script pushing method and device and storage medium |
WO2020238415A1 (en) * | 2019-05-29 | 2020-12-03 | 深圳前海微众银行股份有限公司 | Method and apparatus for monitoring model training |
CN111447329A (en) * | 2020-03-31 | 2020-07-24 | 携程旅游信息技术(上海)有限公司 | Method, system, device and medium for monitoring state server in call center |
CN114077525A (en) * | 2020-08-17 | 2022-02-22 | 鸿富锦精密电子(天津)有限公司 | Abnormal log processing method and device, terminal equipment, cloud server and system |
CN113747171A (en) * | 2021-08-06 | 2021-12-03 | 天津津航计算技术研究所 | Self-recovery video decoding method |
CN113747171B (en) * | 2021-08-06 | 2024-04-19 | 天津津航计算技术研究所 | Self-recovery video decoding method |
CN113676356A (en) * | 2021-08-27 | 2021-11-19 | 创新奇智(青岛)科技有限公司 | Alarm information processing method and device, electronic equipment and readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2019223062A1 (en) | 2019-11-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108737182A (en) | The processing method and system of system exception | |
CN105159964B (en) | A kind of log monitoring method and system | |
CN107196804B (en) | Alarm centralized monitoring system and method for terminal communication access network of power system | |
US7043661B2 (en) | Topology-based reasoning apparatus for root-cause analysis of network faults | |
CN106326219B (en) | Method, device and system for checking business system data | |
CN108170580A (en) | A kind of rule-based log alarming method, apparatus and system | |
CN107508722B (en) | Service monitoring method and device | |
CN108880847A (en) | A kind of method and device of positioning failure | |
CN103220173A (en) | Alarm monitoring method and alarm monitoring system | |
CN111162949A (en) | Interface monitoring method based on Java byte code embedding technology | |
CN101925039A (en) | Prewarning method and device of billing ticket | |
CN110224865A (en) | A kind of log warning system based on Stream Processing | |
CN112769605B (en) | Heterogeneous multi-cloud operation and maintenance management method and hybrid cloud platform | |
CN115809183A (en) | Method for discovering and disposing information-creating terminal fault based on knowledge graph | |
CN109345131A (en) | A kind of enterprise management condition monitoring method and system | |
CN106878038A (en) | Fault Locating Method and device in a kind of communication network | |
CN102664760A (en) | Alarming method for communication system, equipment and communication system | |
CN113271224A (en) | Node positioning method and device, storage medium and electronic device | |
CN113505048A (en) | Unified monitoring platform based on application system portrait and implementation method | |
CN104794013B (en) | Alignment system running status, the method and device for establishing system running state model | |
CN104765672A (en) | Error code monitoring method, device and equipment | |
CN114172921A (en) | Log auditing method and device for scheduling recording system | |
CN109818808A (en) | Method for diagnosing faults, device and electronic equipment | |
CN113079186A (en) | Industrial network boundary protection method and system based on industrial control terminal feature recognition | |
CN110609761B (en) | Method and device for determining fault source, storage medium and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20181102 |