CN108737182A - Method and system for handling system exceptions - Google Patents

Method and system for handling system exceptions


Publication number
CN108737182A
Authority
CN
China
Prior art keywords
communication ends
monitoring node
processing
central server
operation information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810496049.9A
Other languages
Chinese (zh)
Inventor
陈天豪
杨海勇
谢晓华
袁少雄
金鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201810496049.9A
Priority to PCT/CN2018/093707 (WO2019223062A1)
Publication of CN108737182A


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/0659 Management of faults, events, alarms or notifications using network fault recovery by isolating or reconfiguring faulty entities
    • H04L41/0661 Management of faults, events, alarms or notifications using network fault recovery by isolating or reconfiguring faulty entities by reconfiguring faulty entities

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention belongs to the field of computer technology and relates in particular to a method and system for handling system exceptions. The method includes: a communication end collects its system operation information in real time and reports it to a monitoring node; the monitoring node generates an alarm email according to the system operation information and sends the alarm email to a central server, the alarm email recording the communication end at which the system exception has occurred and the operation data indicating that exception; the central server exports the operation data in the alarm email to a preset processing scheme database for matching and obtains the processing script that matches the operation data; the central server pushes the processing script to the communication end at which the system exception has occurred, and that communication end executes the script automatically on receipt in order to handle the exception. The invention automates system operations and maintenance (O&M) while keeping it timely.

Description

Method and system for handling system exceptions
Technical field
The invention belongs to the field of computer technology and relates in particular to a method and system for handling system exceptions.
Background
With the continuous development of network technology, network devices such as servers and gateways have been deployed on a large scale, and both network capacity and topological complexity keep growing. As a result, network systems inevitably encounter various system exceptions during operation.
At present, systems are mainly monitored with monitoring tools. Once a system exception occurs, alarm information is typically passed to the relevant O&M personnel by email or telephone, and the O&M personnel then handle the exception. However, many system faults recur and can be handled in the same way each time, so the existing approach generates a large amount of tedious, repetitive work and lowers the O&M efficiency of the system.
Summary of the invention
In view of this, embodiments of the present invention provide a method and system for handling system exceptions, to solve the problem of low O&M efficiency when system exceptions occur in current network devices.
A first aspect of the embodiments of the present invention provides a method for handling system exceptions, the method including:
A communication end collects its system operation information in real time and reports the system operation information to a monitoring node;
The monitoring node generates an alarm email according to the system operation information and sends the alarm email to a central server, the alarm email recording the communication end at which a system exception has occurred and the operation data indicating the system exception of that communication end;
The central server exports the operation data in the alarm email to a preset processing scheme database for matching and obtains the processing script that matches the operation data;
The central server pushes the processing script to the communication end at which the system exception has occurred; the processing script is executed automatically after that communication end receives it, in order to handle the system exception.
A second aspect of the embodiments of the present invention provides a system for handling system exceptions, including a central server together with multiple communication ends and multiple monitoring nodes deployed in a distributed manner, wherein:
The communication end is configured to collect its system operation information in real time and report the system operation information to the monitoring node;
The monitoring node is configured to generate an alarm email according to the system operation information and send the alarm email to the central server, the alarm email recording the communication end at which a system exception has occurred and the operation data indicating the system exception of that communication end;
The central server is configured to export the operation data in the alarm email to a preset processing scheme database for matching and obtain the processing script that matches the operation data;
The central server is further configured to push the processing script to the communication end at which the system exception has occurred; the processing script is executed automatically after that communication end receives it, in order to handle the system exception.
Compared with the prior art, the embodiments of the present invention have the following beneficial effects. A central server is deployed in the existing network, together with multiple monitoring nodes deployed in a distributed manner. The communication ends already present in the network collect their system operation information in real time and report it to a monitoring node. Based on that information, the monitoring node generates an alarm email for each communication end at which a system exception has occurred and sends it to the central server, so that the central server can match the corresponding processing script in the preset processing scheme database and push it to the communication end for automatic handling. From the occurrence of a system exception to recovery from it, the whole process is completed automatically among the communication end, the monitoring node and the central server. This automates system O&M, keeps it timely, and saves the time and effort of O&M personnel.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the network topology of the system for handling system exceptions provided by an embodiment of the present invention;
Fig. 2 is an interaction flowchart of the method for handling system exceptions provided by an embodiment of the present invention;
Fig. 3 is an implementation flowchart of the method for handling system exceptions provided by another embodiment of the present invention;
Fig. 4 is an implementation flowchart of the method for handling system exceptions provided by another embodiment of the present invention;
Fig. 5 is an implementation flowchart of the method for handling system exceptions provided by another embodiment of the present invention;
Fig. 6 is an implementation flowchart of the method for handling system exceptions provided by another embodiment of the present invention;
Fig. 7 is an implementation flowchart of the method for handling system exceptions provided by another embodiment of the present invention;
Fig. 8 is an implementation flowchart of the method for handling system exceptions provided by another embodiment of the present invention;
Fig. 9 is an interaction flowchart of the method for handling system exceptions provided by another embodiment of the present invention;
Fig. 10 is a schematic diagram of a network node provided by an embodiment of the present invention.
Detailed description
To make the objects, features and advantages of the present invention clearer and easier to understand, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The embodiments described below are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
Fig. 1 is a schematic diagram of the network topology of the system for handling system exceptions provided by an embodiment of the present invention. For ease of explanation, only the parts related to this embodiment are shown.
Referring to Fig. 1, the system deploys a central server in the network, together with multiple communication ends and multiple monitoring nodes deployed in a distributed manner.
A communication end may be any network node already deployed in the network, for example a network device such as a server, gateway or router, or a terminal device such as a computer, smart appliance or smartphone. In the embodiments of the present invention, O&M is performed on the system installed and running in the communication end: once a system exception occurs, it is recovered automatically using the method for handling system exceptions provided by the embodiments. While its system is running, a communication end collects system operation information in real time and reports it to a monitoring node.
The central server and the monitoring nodes are devices deployed in the network to implement the automatic recovery from system exceptions described in the embodiments of the present invention. Monitoring nodes are deployed in the network in a distributed manner and may take the form of servers with relatively high data processing capability. A monitoring node generates an alarm email according to the system operation information reported by a communication end and sends the alarm email to the central server; the alarm email records the communication end at which the system exception has occurred and the operation data indicating that exception. One or more communication ends may be deployed under a single monitoring node, and each monitoring node is responsible for monitoring the system operation status of the communication ends deployed under it. Only one central server is provided per network area, and it can communicate simultaneously with all monitoring nodes and communication ends deployed in that area. A processing scheme database is provided on the central server. After receiving an alarm email reported by a monitoring node, the central server exports the operation data in the alarm email to the processing scheme database for matching, obtains the corresponding processing script, and pushes the script to the communication end at which the system exception has occurred. On receiving the processing script, the communication end executes it automatically, which realizes automatic recovery from the system exception.
On the basis of the embodiment shown in Fig. 1, each monitoring node is further located under the same gateway as the communication ends that report system operation information to it, so that the monitoring node can obtain that information promptly and accurately. Reporting to a monitoring node under the same gateway also gives better communication reliability and communication rate, improves reporting efficiency, and simplifies O&M management.
Next, based on the embodiment shown in Fig. 1, the method for handling system exceptions provided by the embodiments of the present invention is described in detail. Fig. 2 shows the interaction flow of the method; the communicating entities involved in this flow are the central server, the monitoring nodes and the communication ends described above.
As shown in Fig. 2, the method for handling system exceptions includes:
S1: The communication end collects its system operation information in real time and reports the system operation information to the monitoring node.
In the embodiments of the present invention, a program for collecting system operation information is installed in the communication end in advance; while the system is running, the communication end uses this pre-installed program to collect system operation information in real time. The collected system operation information includes, but is not limited to, the business data processed by the system, system operation logs, basic resource usage of the communication end, database performance of the communication end, and middleware performance. After collecting the system operation information, the communication end reports it to the monitoring node either periodically or in real time.
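The collection and reporting loop of S1 can be pictured with a minimal Python sketch. It is an illustration only, not the patented implementation: the field names (`node_id`, `cpu_load`, `log_tail`), the placeholder values, and the `send` callback that stands in for the link to the monitoring node are all assumptions introduced here.

```python
import json
import time
from typing import Callable, Dict


def collect_operation_info(node_id: str) -> Dict[str, object]:
    """Take one snapshot of system operation information (placeholder values)."""
    return {
        "node_id": node_id,               # hypothetical identifier of the communication end
        "timestamp": time.time(),
        "cpu_load": 0.42,                 # placeholder; a real agent would read OS counters
        "log_tail": ["service started"],  # placeholder for recent system log lines
    }


def report(info: Dict[str, object], send: Callable[[bytes], bool]) -> bool:
    """Serialize a snapshot and pass it to the transport; True means acknowledged."""
    return send(json.dumps(info).encode("utf-8"))


def report_periodically(node_id: str, send: Callable[[bytes], bool],
                        rounds: int, interval_s: float = 0.0) -> int:
    """Periodic reporting: collect and report `rounds` times, return the acknowledged count."""
    acknowledged = 0
    for _ in range(rounds):
        if report(collect_operation_info(node_id), send):
            acknowledged += 1
        time.sleep(interval_s)
    return acknowledged
```

Passing a different `send` function is enough to switch transports, and setting `interval_s` close to zero approximates real-time rather than periodic reporting.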
As an embodiment of the present invention, before S1 the communication end needs to determine in advance the monitoring node to which it reports system operation information, as shown in Fig. 3:
S301: The communication end obtains a monitoring node list, which records each gateway in the system and the monitoring nodes deployed under each gateway.
The monitoring node list is issued to each communication end by the central server. It records each gateway in the system and the monitoring nodes deployed under each gateway; in the list, each gateway and each monitoring node may be represented by its IP address. The list is maintained by the central server: when its recorded content changes, the central server reissues it to the communication ends, and each communication end updates its locally stored copy after receiving the new list.
S302: The communication end finds the gateway under which it is located in the monitoring node list.
As described in the foregoing embodiment, each monitoring node is preferably located under the same gateway as the communication ends that report system operation information to it. In this embodiment, therefore, the communication end first looks up its own gateway in the monitoring node list.
S303: The communication end determines a monitoring node deployed under the gateway it has found as the monitoring node to which the system operation information is to be reported.
After finding its own gateway in the monitoring node list, the communication end selects any one of the monitoring nodes deployed under that gateway and designates it as the monitoring node to which it reports system operation information.
In the embodiment corresponding to Fig. 3, the communication end reports its system operation information to a monitoring node under the same gateway, which gives better communication reliability and communication rate, improves reporting efficiency, and simplifies O&M management.
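Steps S301 to S303 amount to a lookup in the monitoring node list followed by an arbitrary choice. The sketch below assumes, purely for illustration, that the list is a dictionary mapping a gateway IP address to the IP addresses of the monitoring nodes deployed under it; the patent does not prescribe a data layout, and the function names are hypothetical.

```python
import random
from typing import Dict, List, Optional

# Assumed layout: gateway IP -> IPs of the monitoring nodes deployed under that gateway.
MonitorList = Dict[str, List[str]]


def find_candidates(monitor_list: MonitorList, own_gateway: str) -> List[str]:
    """S302: look up the communication end's own gateway in the list."""
    return list(monitor_list.get(own_gateway, []))


def choose_monitor(monitor_list: MonitorList, own_gateway: str,
                   rng: Optional[random.Random] = None) -> Optional[str]:
    """S303: pick any one monitoring node under the same gateway, or None if there is none."""
    candidates = find_candidates(monitor_list, own_gateway)
    if not candidates:
        return None
    return (rng or random).choice(candidates)
```

Returning `None` when the gateway is missing leaves the fallback policy (for example, waiting for an updated list from the central server) to the caller.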
After S303, as shown in Fig. 4, the method further includes:
S304: The communication end records all monitoring nodes deployed under the gateway it has found.
The communication end records all monitoring nodes deployed under its gateway. For example, if 5 monitoring nodes are deployed under the gateway, then in addition to configuring one of them as the monitoring node to which it reports system operation information, the communication end records the address information, node identifiers and so on of the other 4 monitoring nodes.
S305: If a failure to report the system operation information is detected, the communication end selects another monitoring node under the gateway it has found as the monitoring node to which the system operation information is to be reported.
After reporting system operation information to a monitoring node, the communication end normally receives a response from the monitoring node confirming successful receipt. If no such response arrives within a certain period, the communication end treats the report as failed and, using the information recorded in S304, selects another monitoring node under its gateway to report to. The embodiment corresponding to Fig. 4 allows for possible failure of a monitoring node or of the communication link and establishes a backup reporting mechanism so that system operation information is reported smoothly, which effectively ensures the timeliness of system O&M.
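The backup reporting mechanism of S304 and S305 can be sketched as a loop over the recorded monitoring nodes. This is a hedged illustration: the `send` callback is a hypothetical transport that returns `True` when the monitoring node acknowledges receipt and `False` (or raises `OSError`) when no acknowledgement arrives in time.

```python
from typing import Callable, List, Optional


def report_with_failover(payload: bytes, monitors: List[str],
                         send: Callable[[str, bytes], bool]) -> Optional[str]:
    """Try each recorded monitoring node in turn; return the address that acknowledged."""
    for address in monitors:
        try:
            if send(address, payload):
                return address   # report acknowledged, stop here
        except OSError:
            # A transport error is treated like a missing acknowledgement.
            continue
    return None                  # every monitoring node under the gateway failed
```

For example, with `monitors = ["10.0.0.11", "10.0.0.12"]` and a first node that never answers, the function returns `"10.0.0.12"`, which the communication end can then keep as its reporting target.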
S2: The monitoring node generates an alarm email according to the system operation information and sends the alarm email to the central server, the alarm email recording the communication end at which a system exception has occurred and the operation data indicating the system exception of that communication end.
The monitoring node analyzes the system operation information reported by each communication end to monitor for program exceptions or business data exceptions in the system running on that communication end. Based on the monitoring result and the information related to the system exception, it generates an alarm email and sends it to the central server. The alarm email mainly records the device identifier or network address of the communication end at which the system exception has occurred, together with the operation data indicating that exception.
As an embodiment of the present invention, before S2 the monitoring node may establish a system normal operation model in advance and feed system operation information into that model to judge whether the system in the corresponding communication end is running normally, as shown in Fig. 5:
S501: Within a preset time period, the monitoring node collects the system operation information of the different communication ends.
The monitoring node may collect the system operation information of each communication end over a period of time in advance and store it in an operation information set for subsequent modeling and analysis.
S502: The monitoring node clusters the collected system operation information to obtain multiple clusters.
The monitoring node applies a clustering algorithm, such as the CURE clustering algorithm, to the collected system operation information and obtains multiple clusters; the system operation information within each cluster has the same or similar data characteristics.
S503: The monitoring node marks, among the multiple clusters, the clusters that indicate normal system operation.
According to a preset empirical value, the monitoring node marks, among the generated clusters, those that indicate normal system operation. The system operation information in these clusters indicates that no system exception has occurred in the corresponding communication ends.
As an embodiment of the present invention, S503 is implemented as shown in Fig. 6:
S601: The monitoring node sorts the multiple clusters in descending order of cluster size.
After clustering, the clusters contain different amounts of system operation information. The monitoring node therefore first sorts the generated clusters in descending order of size, that is, by the number of items of system operation information in each cluster.
S602: The monitoring node reads a preset proportion parameter, which indicates the proportion of all communication ends whose systems are normal at a given moment.
The proportion parameter is determined from an empirical value or from past operating conditions. It indicates the proportion of all communication ends whose systems are normal at a given moment, that is, the share of the whole operation information set made up of system operation information indicating normal system operation.
S603: Based on the preset proportion parameter, the monitoring node marks the top N clusters as clusters indicating normal system operation.
After obtaining the preset proportion parameter, the monitoring node marks the top N clusters as clusters indicating normal system operation, based on the proportion parameter, the amount of system operation information reported by the communication ends in the current statistical period, and the amount of system operation information in each cluster. N is chosen so that the ratio of the total amount of system operation information in the marked clusters to the total amount in all clusters is approximately equal to the preset proportion parameter.
In the embodiment corresponding to Fig. 6, the system operation information is screened using an empirical value and a clustering algorithm to identify the information that indicates normal system operation, which is then used in modeling the system normal operation model.
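The marking rule of S601 to S603 can be made concrete with a short sketch. It assumes the clustering step has already produced a size for each cluster, and it reads "approximately equal" as "the N whose cumulative share is closest to the proportion parameter"; that reading, like the function name, is an assumption of this example rather than something the text specifies.

```python
from typing import Dict, List


def mark_normal_clusters(cluster_sizes: Dict[str, int], normal_ratio: float) -> List[str]:
    """Return the top-N clusters whose cumulative share best matches normal_ratio."""
    # S601: sort clusters by size, largest first.
    ordered = sorted(cluster_sizes, key=cluster_sizes.get, reverse=True)
    total = sum(cluster_sizes.values())
    if total == 0:
        return []
    best_n, best_gap, running = 0, float("inf"), 0
    # S603: grow N and keep the value whose share is closest to the preset ratio.
    for n, name in enumerate(ordered, start=1):
        running += cluster_sizes[name]
        gap = abs(running / total - normal_ratio)
        if gap < best_gap:
            best_n, best_gap = n, gap
    return ordered[:best_n]
```

With cluster sizes of 60, 25, 10 and 5 items and a proportion parameter of 0.85, the two largest clusters are marked, because together they hold 85 of the 100 items.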
S604: The monitoring node generates a system normal operation model based on the marked clusters; the monitoring node uses the model to judge whether the system operation information reported by a communication end indicates that the system of that communication end is running normally.
The monitoring node obtains the system operation information in the marked clusters and uses it for modeling to generate the system normal operation model. The model may be built on a neural network: the system operation information in the clusters serves as the input samples, and the system operation status it represents, that is, system normal or system exception, serves as the output, and the model is trained accordingly. Once trained, the model is retained to judge whether the system operation information reported by a communication end indicates that the system of that communication end is running normally.
S3: The central server exports the operation data in the alarm email to the preset processing scheme database for matching and obtains the processing script that matches the operation data.
In the embodiments of the present invention, the central server may run a scheduled task in the background to fetch, at regular intervals, the alarm emails generated by the monitoring nodes. In an alarm email, the operation data indicating the system exception of a communication end may be attached as a text attachment or included in the message body. The central server parses the relevant text content of the alarm email sent by the monitoring node: it segments the text, finds the character strings that characterize system operation indicators, and reads the data corresponding to each string, thereby converting the text into a data table that describes the system operation status. The keys of the data table are the character strings that characterize system operation indicators, and the values are the data corresponding to each string. Such character strings include, but are not limited to, server number, server address, time of the system exception, description of the system exception, and the operating parameters at the time of the system exception.
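The conversion from alarm email text to a key-value data table might look like the following sketch. It assumes a very simple, hypothetical email layout in which the body carries one `key: value` pair per line, and the key names in `KNOWN_KEYS` are invented for the example; a real alarm format would need its own segmentation rules. The standard-library `email` package is used only to separate headers from the body.

```python
from email import message_from_string
from typing import Dict

# Hypothetical indicator names; the patent lists server number, server address,
# exception time and exception description as examples of such keys.
KNOWN_KEYS = {"server_id", "server_address", "exception_time", "exception_description"}


def parse_alarm_email(raw: str) -> Dict[str, str]:
    """Turn the body of a plain-text alarm email into an indicator -> value table."""
    body = message_from_string(raw).get_payload()
    table: Dict[str, str] = {}
    for line in body.splitlines():
        key, sep, value = line.partition(":")   # split at the first colon only
        key = key.strip().lower()
        if sep and key in KNOWN_KEYS:
            table[key] = value.strip()
    return table
```

Splitting at the first colon keeps values such as timestamps intact, and lines whose key is not a known indicator are ignored rather than guessed at.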
A processing scheme database is created in the central server; it must be created before S3 is executed, as shown in Fig. 7:
S701: The central server enters a configuration mode.
After the configuration mode is triggered, the central server presents a configurable page to the O&M user, on which the O&M user can configure the processing scheme database.
S702: In the configuration mode, the central server receives characteristic parameters, entered by the O&M user, that describe a system exception, together with the corresponding processing script.
On the configurable page, the O&M user enters the characteristic parameters that describe a system exception and the corresponding processing script. For each class of system exception, the character strings that characterize system operation indicators take particular values, and these strings together with their values form the characteristic parameters that describe the system exception.
S703: The central server stores the characteristic parameters entered by the O&M user in association with the corresponding processing script; the central server uses the characteristic parameters for matching against the operation data.
The central server stores the characteristic parameters describing a given class of system exception, as entered by the O&M user on the configurable page, in association with the processing script for recovering from that class of exception. After parsing out of the alarm email the operation data indicating the system exception of a communication end, the central server matches the values of the indicator strings in the operation data against the values of the corresponding indicator strings in the characteristic parameters. This determines the type of the system exception and, in turn, the processing script for handling that type of exception.
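The association between characteristic parameters and processing scripts, and the matching step of S3, can be sketched with an in-memory stand-in for the processing scheme database. The list-of-pairs layout, the exact-equality matching rule and the script names are assumptions made for this example; the patent does not fix a storage format or a matching criterion beyond comparing indicator values.

```python
from typing import Dict, List, Optional, Tuple

# Assumed layout: each entry pairs characteristic parameters with a processing script.
SchemeDB = List[Tuple[Dict[str, str], str]]


def match_script(operation_data: Dict[str, str], scheme_db: SchemeDB) -> Optional[str]:
    """Return the first script whose characteristic parameters all match the operation data."""
    for features, script in scheme_db:
        if all(operation_data.get(key) == value for key, value in features.items()):
            return script
    return None  # no known handling scheme for this exception


# Hypothetical entries an O&M user might configure.
EXAMPLE_DB: SchemeDB = [
    ({"exception_description": "thread block"}, "restart_worker.sh"),
    ({"exception_description": "disk full"}, "clean_tmp.sh"),
]
```

Operation data that matches no entry yields `None`, a case in which falling back to a manual alarm would be one reasonable design choice.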
S4: The central server pushes the processing script to the communication end at which the system exception has occurred; the processing script is executed automatically after that communication end receives it, in order to handle the system exception.
After determining, in its processing scheme database, the processing script that matches the system exception of the communication end, the central server pushes the script to that communication end according to the information about it recorded in the alarm email, such as its network address. On receiving the script pushed by the central server, the communication end executes it automatically, which recovers the system from the exception.
Further, the communication end ensures timely recovery from the system exception by setting the priority of the processing script, as shown in Fig. 8:
S801: After detecting the processing script pushed by the central server, the communication end automatically creates an execution thread for the processing script.
In the communication end, the push of a processing script is configured in advance as a trigger condition: once a processing script pushed by the central server is detected, thread creation is triggered and an execution thread for the script is created locally and automatically.
S802: The communication end sets the priority of the execution thread to the highest level, so that the processing script is executed automatically and preferentially.
After the execution thread has been created, the communication end sets its priority to the highest level. All other threads previously created by the communication end then have lower priority than the execution thread, so the communication end can run the execution thread immediately to execute the processing script and recover from the system exception in time.
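The effect of S801 and S802, running the processing script ahead of all other pending work, can be approximated in Python. Python's `threading` module does not expose thread priorities, so this sketch swaps in a different technique: a priority queue drained by a worker thread, where a smaller number means higher priority. The class name and the `HIGHEST` constant are hypothetical, and on a platform with real thread priorities the patent's approach would set the priority on the thread itself.

```python
import itertools
import queue
import threading
from typing import Callable

HIGHEST = 0  # in this sketch, a smaller number means a higher priority


class PriorityRunner:
    """Runs queued tasks on a worker thread, highest priority first."""

    def __init__(self) -> None:
        self._tasks: queue.PriorityQueue = queue.PriorityQueue()
        self._order = itertools.count()  # tie-breaker: FIFO within one priority level

    def submit(self, priority: int, func: Callable[[], None]) -> None:
        self._tasks.put((priority, next(self._order), func))

    def run_all(self) -> None:
        """Drain the queue on a worker thread and wait for it to finish."""
        def worker() -> None:
            while True:
                try:
                    _, _, func = self._tasks.get_nowait()
                except queue.Empty:
                    return
                func()

        thread = threading.Thread(target=worker)
        thread.start()
        thread.join()
```

Submitting the processing script with `HIGHEST` after routine tasks have already been queued still makes it the first task the worker picks up.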
Further, as an embodiment of the present invention, after the system exception has been handled, as shown in Fig. 9:
S5: After the system exception has been handled successfully, the communication end feeds back the execution log of the processing script to the central server.
While executing the processing script, the communication end records the execution process to generate an execution log, and feeds back to the central server both the result indicating that the system exception was handled successfully and the execution log.
S6: At preset time intervals, the central server performs statistical analysis on the received execution logs and generates a prediction report on the system's operating condition according to the result of the statistical analysis.
Based on the received execution logs, the central server records the communication ends that have successfully handled system exceptions and, at regular intervals, performs statistical analysis on the system exceptions of the communication ends and the corresponding handling results. From the result of the statistical analysis it generates a prediction report on the system's operating condition, which helps operation and maintenance personnel understand how the system is running, improve its functions, and improve its stability.
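The periodic statistical analysis of step S6 could look roughly like the following sketch. The patent does not specify the log format or which statistics are computed, so the record fields (`end_address`, `exception_type`) and the contents of the summary are assumptions made purely for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutionLog:
    """Hypothetical execution-log record fed back by a communication end."""
    end_address: str     # network address of the communication end
    exception_type: str  # kind of system exception that was handled


def summarize(logs):
    """Count handled exceptions per type and per communication end."""
    by_type = Counter(log.exception_type for log in logs)
    by_end = Counter(log.end_address for log in logs)
    return {
        "total": len(logs),
        "by_type": dict(by_type),
        "by_end": dict(by_end),
        # The most frequent entries hint at where exceptions are likely to recur.
        "most_common_exception": by_type.most_common(1)[0][0] if by_type else None,
        "most_affected_end": by_end.most_common(1)[0][0] if by_end else None,
    }


logs = [
    ExecutionLog("10.0.0.5", "thread_block"),
    ExecutionLog("10.0.0.5", "disk_full"),
    ExecutionLog("10.0.0.7", "thread_block"),
]
report = summarize(logs)
print(report)
```

A central server could call such a function once per preset interval on the logs received since the previous run and hand the resulting summary to operation and maintenance personnel.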
Based on the above recovery method, automatic recovery from system exceptions can be realized. For example, suppose a thread-blocking exception occurs at a certain communication end in the system. By parsing the system operation information reported by the communication end, the monitoring node confirms that a system exception has occurred at the communication end and therefore sends an alarm email to the central server. The central server parses the text of the alarm email and imports the parsing result into the processing scheme database for matching, so as to find the processing script for handling thread blocking. According to the communication-end address in the alarm email, it then pushes the processing script to the communication end, which executes the script automatically and completes recovery from the system exception.
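The thread-blocking example can be sketched on the central-server side as follows. The alarm-email text format (`key=value` pairs), the contents of the scheme database, the script file names, and the `push` callback are all hypothetical; the patent does not define them.

```python
import re

# Hypothetical processing scheme database: exception type -> processing script.
SCHEME_DATABASE = {
    "thread_block": "restart_blocked_threads.sh",
    "disk_full": "clean_tmp_files.sh",
}


def parse_alarm(text):
    """Extract the communication-end address and exception type from the alarm text.

    Assumes the alarm email body carries 'key=value' pairs separated by ';'.
    """
    fields = dict(re.findall(r"(\w+)=([^;\s]+)", text))
    return fields["address"], fields["exception"]


def handle_alarm(text, push):
    """Match the alarm against the scheme database and push the matching script.

    'push' is a caller-supplied function (address, script) that delivers the script.
    Returns the script name, or None when no scheme matches.
    """
    address, exception = parse_alarm(text)
    script = SCHEME_DATABASE.get(exception)
    if script is None:
        return None
    push(address, script)
    return script


pushed = []
result = handle_alarm(
    "address=10.0.0.5; exception=thread_block",
    lambda address, script: pushed.append((address, script)),
)
print(result, pushed)
```

Passing the delivery step in as a callback keeps the matching logic independent of how scripts are actually transported to the communication end, which the text above leaves open.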
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
Fig. 10 is a schematic diagram of a network node provided by an embodiment of the present invention; here, the network node may be the central server, a communication end, or a monitoring node in Fig. 1. As shown in Fig. 10, the network node 10 of this embodiment includes a processor 100, a memory 101, and a computer program 102 stored in the memory 101 and executable on the processor 100. When executing the computer program 102, the processor 100 implements the steps corresponding to the respective network node in the above embodiments of the system-exception recovery method: for a communication end, the processor 100 executes step S1 shown in Fig. 2; for a monitoring node, the processor 100 executes step S2 shown in Fig. 2; and for the central server, the processor 100 executes steps S3 and S4 shown in Fig. 2.
Illustratively, the computer program 102 may be divided into one or more modules/units, which are stored in the memory 101 and executed by the processor 100 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, the instruction segments being used to describe the execution of the computer program 102 in its corresponding network node.
The processor 100 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 101 may be an internal storage unit of the corresponding network node, such as a hard disk or internal memory of a communication end. The memory 101 may also be an external storage device of the corresponding network node, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on a communication end. Further, the memory 101 may include both an internal storage unit and an external storage device of the corresponding network node. The memory 101 is used to store the computer program and other programs and data required by the server. The memory 101 may also be used to temporarily store data that has been output or is to be output.
All or part of the flow in the above method embodiments may also be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content covered by the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a given jurisdiction; for example, in certain jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunication signals.
In the embodiments of the present invention, a central server is deployed in an existing network, together with a plurality of monitoring nodes deployed in a distributed manner. The communication ends already present in the network collect their system operation information in real time and report it to a monitoring node. Based on the system operation information, the monitoring node generates an alarm email for a communication end at which a system exception has occurred and sends it to the central server, so that the central server matches the corresponding processing script in the preset processing scheme database and pushes it to the communication end for automatic handling. From the occurrence of a system exception to recovery from it, the whole process is completed automatically among the communication end, the monitoring node and the central server. This realizes automated system operation and maintenance, ensures its timeliness, and saves the time and effort of operation and maintenance personnel.
The embodiments described above are merely intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for processing a system exception, characterized in that the processing method comprises:
a communication end collecting its system operation information in real time and reporting the system operation information to a monitoring node;
the monitoring node generating an alarm email according to the system operation information and sending the alarm email to a central server, the alarm email recording the communication end at which a system exception has occurred and operation data indicating the system exception of the communication end;
the central server exporting the operation data in the alarm email to a preset processing scheme database for matching, to obtain a processing script matching the operation data;
the central server pushing the processing script to the communication end at which the system exception has occurred, the processing script being executed automatically after being received by that communication end, so as to handle the system exception.
2. The processing method according to claim 1, characterized in that, before the communication end collects its system operation information in real time and reports the system operation information to the monitoring node, the method further comprises:
the communication end obtaining a monitoring node list, the monitoring node list recording each gateway in the system and the monitoring nodes deployed under each gateway;
the communication end finding, in the monitoring node list, the gateway under which it is located;
the communication end determining a monitoring node deployed under the found gateway as the monitoring node to which the system operation information is to be reported.
3. The processing method according to claim 1, characterized by further comprising:
the communication end recording all monitoring nodes deployed under the found gateway;
if a failure to report the system operation information is detected, the communication end selecting another monitoring node under the found gateway as the monitoring node to which the system operation information is to be reported.
4. The processing method according to claim 1, characterized in that, before the monitoring node generates the alarm email according to the system operation information, the method further comprises:
within a preset time period, the monitoring node collecting the system operation information of different communication ends;
the monitoring node clustering the collected system operation information to obtain a plurality of clusters;
the monitoring node marking, among the plurality of clusters, the clusters that indicate normal system operation;
the monitoring node generating a normal system operation model based on the marked clusters, the normal system operation model being used by the monitoring node to judge whether the system operation information reported by a communication end indicates that the system of the communication end is operating normally.
5. The processing method according to claim 4, characterized in that the monitoring node marking, among the plurality of clusters, the clusters that indicate normal system operation comprises:
the monitoring node arranging the plurality of clusters in descending order of cluster size;
the monitoring node reading a preset proportion parameter, the proportion parameter indicating the proportion of communication ends whose systems are normal at a given moment to all communication ends;
the monitoring node marking, based on the preset proportion parameter, the top N clusters in the arrangement as the clusters that indicate normal system operation.
6. The processing method according to claim 1, characterized in that, before the central server exports the characteristic data in the alarm email to the preset processing scheme database for matching to obtain the processing script corresponding to the characteristic data, the method further comprises:
the central server entering a configuration mode;
in the configuration mode, the central server receiving operation-and-maintenance characteristic parameters, input by a user, that describe a system exception, together with the corresponding processing script;
the central server storing the operation-and-maintenance characteristic parameters input by the user in association with the corresponding processing script, the characteristic parameters being used by the central server for matching against the operation data.
7. The processing method according to claim 1, characterized by further comprising:
the communication end, after the system exception has been handled successfully, feeding back the execution log of the processing script to the central server;
the central server performing statistical analysis on the received execution logs at preset time intervals;
the central server generating a prediction report on the system's operating condition according to the result of the statistical analysis.
8. The processing method according to claim 1, characterized by further comprising:
after detecting the processing script pushed by the central server, the communication end automatically creating an execution thread for the processing script;
the communication end setting the priority of the execution thread to the highest, so that the processing script is executed automatically and with priority.
9. A system for processing a system exception, characterized by comprising a central server, and a plurality of communication ends and a plurality of monitoring nodes deployed in a distributed manner, wherein:
the communication end is configured to collect its system operation information in real time and report the system operation information to the monitoring node;
the monitoring node is configured to generate an alarm email according to the system operation information and send the alarm email to the central server, the alarm email recording the communication end at which a system exception has occurred and operation data indicating the system exception of the communication end;
the central server is configured to export the operation data in the alarm email to a preset processing scheme database for matching, to obtain a processing script matching the operation data;
the central server is further configured to push the processing script to the communication end at which the system exception has occurred, the processing script being executed automatically after being received by that communication end, so as to handle the system exception.
10. The processing system according to claim 9, characterized in that the monitoring node and the communication end that reports the system operation information to it are located under the same gateway.
CN201810496049.9A 2018-05-22 2018-05-22 The processing method and system of system exception Pending CN108737182A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810496049.9A CN108737182A (en) 2018-05-22 2018-05-22 The processing method and system of system exception
PCT/CN2018/093707 WO2019223062A1 (en) 2018-05-22 2018-06-29 Method and system for processing system exceptions

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810496049.9A CN108737182A (en) 2018-05-22 2018-05-22 The processing method and system of system exception

Publications (1)

Publication Number Publication Date
CN108737182A true CN108737182A (en) 2018-11-02

Family

ID=63938832

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810496049.9A Pending CN108737182A (en) 2018-05-22 2018-05-22 The processing method and system of system exception

Country Status (2)

Country Link
CN (1) CN108737182A (en)
WO (1) WO2019223062A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109828884A (en) * 2018-12-14 2019-05-31 深圳壹账通智能科技有限公司 Carry additionally service data processing method, system, computer equipment and storage medium
CN111447329A (en) * 2020-03-31 2020-07-24 携程旅游信息技术(上海)有限公司 Method, system, device and medium for monitoring state server in call center
CN111756778A (en) * 2019-03-26 2020-10-09 京东数字科技控股有限公司 Server disk cleaning script pushing method and device and storage medium
WO2020238415A1 (en) * 2019-05-29 2020-12-03 深圳前海微众银行股份有限公司 Method and apparatus for monitoring model training
CN113676356A (en) * 2021-08-27 2021-11-19 创新奇智(青岛)科技有限公司 Alarm information processing method and device, electronic equipment and readable storage medium
CN113747171A (en) * 2021-08-06 2021-12-03 天津津航计算技术研究所 Self-recovery video decoding method
CN114077525A (en) * 2020-08-17 2022-02-22 鸿富锦精密电子(天津)有限公司 Abnormal log processing method and device, terminal equipment, cloud server and system

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113495820A (en) * 2020-04-03 2021-10-12 北京沃东天骏信息技术有限公司 Method and device for collecting and processing abnormal information and abnormal monitoring system
CN113765685A (en) * 2020-06-05 2021-12-07 腾讯科技(深圳)有限公司 Abnormity management method and device
CN111915452A (en) * 2020-08-28 2020-11-10 平安国际智慧城市科技股份有限公司 Monitoring system, method and device, monitoring processing equipment and storage medium
CN112214409B (en) * 2020-10-13 2023-11-24 中国工商银行股份有限公司 Operation and maintenance method and device used in test environment
CN112561385A (en) * 2020-12-24 2021-03-26 平安银行股份有限公司 Risk monitoring method and system
CN115225534A (en) * 2022-07-26 2022-10-21 雷沃工程机械集团有限公司 Method for monitoring running state of monitoring server
CN117458722B (en) * 2023-12-26 2024-03-08 西安民为电力科技有限公司 Data monitoring method and system based on electric power energy management system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101561878A (en) * 2009-05-31 2009-10-21 河海大学 Unsupervised anomaly detection method and system based on improved CURE clustering algorithm
CN103532795A (en) * 2013-10-30 2014-01-22 蓝盾信息安全技术股份有限公司 Monitoring system and method for detecting availability of WEB business system
CN104184819A (en) * 2014-08-29 2014-12-03 城云科技(杭州)有限公司 Multi-hierarchy load balancing cloud resource monitoring method
CN105337765A (en) * 2015-10-10 2016-02-17 上海新炬网络信息技术有限公司 Distributed hadoop cluster fault automatic diagnosis and restoration system
WO2017088681A1 (en) * 2015-11-24 2017-06-01 阿里巴巴集团控股有限公司 Fault handling method and apparatus for gateway device
CN107135156A (en) * 2017-06-07 2017-09-05 努比亚技术有限公司 Call chain collecting method, mobile terminal and computer-readable recording medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9928148B2 (en) * 2014-08-21 2018-03-27 Netapp, Inc. Configuration of peered cluster storage environment organized as disaster recovery group
CN104699759B (en) * 2015-02-10 2018-05-15 上海新炬网络信息技术股份有限公司 A kind of data base automatic operation and maintenance method
WO2017044772A1 (en) * 2015-09-09 2017-03-16 Convida Wireless, Llc Methods for enabling context-aware coap messaging
CN105721304A (en) * 2016-04-05 2016-06-29 网宿科技股份有限公司 Adaptive routing adjustment method and system and service device
CN107632918B (en) * 2017-08-30 2020-09-11 中国工商银行股份有限公司 Monitoring system and method for computing storage equipment

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109828884A (en) * 2018-12-14 2019-05-31 深圳壹账通智能科技有限公司 Carry additionally service data processing method, system, computer equipment and storage medium
CN111756778A (en) * 2019-03-26 2020-10-09 京东数字科技控股有限公司 Server disk cleaning script pushing method and device and storage medium
WO2020238415A1 (en) * 2019-05-29 2020-12-03 深圳前海微众银行股份有限公司 Method and apparatus for monitoring model training
CN111447329A (en) * 2020-03-31 2020-07-24 携程旅游信息技术(上海)有限公司 Method, system, device and medium for monitoring state server in call center
CN114077525A (en) * 2020-08-17 2022-02-22 鸿富锦精密电子(天津)有限公司 Abnormal log processing method and device, terminal equipment, cloud server and system
CN113747171A (en) * 2021-08-06 2021-12-03 天津津航计算技术研究所 Self-recovery video decoding method
CN113747171B (en) * 2021-08-06 2024-04-19 天津津航计算技术研究所 Self-recovery video decoding method
CN113676356A (en) * 2021-08-27 2021-11-19 创新奇智(青岛)科技有限公司 Alarm information processing method and device, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
WO2019223062A1 (en) 2019-11-28

Similar Documents

Publication Publication Date Title
CN108737182A (en) The processing method and system of system exception
CN105159964B (en) A kind of log monitoring method and system
CN107196804B (en) Alarm centralized monitoring system and method for terminal communication access network of power system
US7043661B2 (en) Topology-based reasoning apparatus for root-cause analysis of network faults
CN106326219B (en) Method, device and system for checking business system data
CN108170580A (en) A kind of rule-based log alarming method, apparatus and system
CN107508722B (en) Service monitoring method and device
CN108880847A (en) A kind of method and device of positioning failure
CN103220173A (en) Alarm monitoring method and alarm monitoring system
CN111162949A (en) Interface monitoring method based on Java byte code embedding technology
CN101925039A (en) Prewarning method and device of billing ticket
CN110224865A (en) A kind of log warning system based on Stream Processing
CN112769605B (en) Heterogeneous multi-cloud operation and maintenance management method and hybrid cloud platform
CN115809183A (en) Method for discovering and disposing information-creating terminal fault based on knowledge graph
CN109345131A (en) A kind of enterprise management condition monitoring method and system
CN106878038A (en) Fault Locating Method and device in a kind of communication network
CN102664760A (en) Alarming method for communication system, equipment and communication system
CN113271224A (en) Node positioning method and device, storage medium and electronic device
CN113505048A (en) Unified monitoring platform based on application system portrait and implementation method
CN104794013B (en) Alignment system running status, the method and device for establishing system running state model
CN104765672A (en) Error code monitoring method, device and equipment
CN114172921A (en) Log auditing method and device for scheduling recording system
CN109818808A (en) Method for diagnosing faults, device and electronic equipment
CN113079186A (en) Industrial network boundary protection method and system based on industrial control terminal feature recognition
CN110609761B (en) Method and device for determining fault source, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181102
