CN106506196A - Enterprise-level online troubleshooting method and system - Google Patents

Info

Publication number
CN106506196A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610911552.7A
Other languages
Chinese (zh)
Inventor
赵银宏
毛建民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Ctrip Business Co Ltd
Original Assignee
Shanghai Ctrip Business Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Ctrip Business Co Ltd
Priority to CN201610911552.7A
Publication of CN106506196A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0659 Management of faults, events, alarms or notifications using network fault recovery by isolating or reconfiguring faulty entities
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0852 Delays
    • H04L43/16 Threshold monitoring
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H04L67/025 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP] for remote control or remote monitoring of applications

Abstract

The invention discloses an enterprise-level internet access troubleshooting method and system. The method comprises the following steps: S1, mapping out the enterprise's internet access architecture, system functions, and service configuration; S2, configuring monitoring alarm thresholds for systems, devices, and traffic; S3, taking each computer in the enterprise as a unit, simulating user access operations in real time and probing them; S4, raising a monitoring alarm for any abnormal information in which a probe result exceeds the monitoring alarm threshold. The invention can monitor links, systems, devices, and applications in a timely manner and probe performance in real time, so that problems are found early and users are not affected. Troubleshooting in response to user queries becomes much more efficient, less error-prone, and less likely to miss something. Operation is also easier: the probing steps are simplified so that even ordinary users can carry them out, which lowers the cost of learning the system.

Description

Enterprise-level online troubleshooting method and system
Technical field
The present invention relates to network technology, and in particular to an enterprise-level internet access troubleshooting method and system.
Background
In a large enterprise the internet access environment is complex. To serve different applications and different users, the company provides a high-availability scheme with multiple egress links and multiple proxies. Each system administrator must keep continuous track of the running status of multiple links, each system, and every device, understand that status promptly, and switch over promptly when a problem occurs so that the system keeps running stably.
Many factors on the application side and the user side also play a part: service type, service priority, restrictions on the internet access behavior and permissions of employees in each department, bandwidth, and so on all directly affect the usability and performance of the public network.
A complex architecture is hard to manage, and the efficiency with which the point of failure is identified when a problem first appears also suffers. At present, companies generally check components one by one, which is inefficient, hard to carry out, and prone to omissions. No product on the market meets the need for enterprise-specific customization, and there is no good solution for quickly troubleshooting internet access failures.
Summary of the invention
The technical problem to be solved by the present invention is the lack, in the prior art, of a good solution for quickly troubleshooting enterprise-level internet access failures. To overcome this defect, an enterprise-level internet access troubleshooting method and system are provided.
The present invention solves the above technical problem through the following technical solutions:
The invention provides an enterprise-level internet access troubleshooting method, characterized in that it comprises the following steps:
S1, mapping out the enterprise's internet access architecture, system functions, and service configuration;
S2, configuring monitoring alarm thresholds for systems, devices, and traffic;
S3, taking each computer in the enterprise as a unit, simulating user access operations in real time and probing them;
S4, raising a monitoring alarm for any abnormal information in which a probe result exceeds the monitoring alarm threshold.
Preferably, step S3 includes: on each computer, simulating access to key URLs through a generic link, an optimized routing link, and a locality protection link respectively, and measuring the latency that occurs during access;
step S4 includes: raising a monitoring alarm when the latency is detected to exceed a time threshold.
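As a minimal sketch of steps S3 and S4, the code below times one access to a key URL over each link type and reports every link whose latency exceeds its threshold. The link names, the threshold values in `LATENCY_THRESHOLD_MS`, and the per-link `fetch` callables are illustrative assumptions; the patent does not say how a request is bound to a particular link.

```python
import time
from typing import Callable, Dict, List

# Hypothetical per-link latency thresholds in milliseconds (configured in step S2).
LATENCY_THRESHOLD_MS: Dict[str, float] = {
    "generic": 800.0,
    "optimized_routing": 500.0,
    "locality_protection": 300.0,
}


def measure_latency_ms(fetch: Callable[[str], None], url: str) -> float:
    """Time a single simulated user access to `url`, in milliseconds."""
    start = time.perf_counter()
    fetch(url)
    return (time.perf_counter() - start) * 1000.0


def probe_links(fetchers: Dict[str, Callable[[str], None]], url: str) -> Dict[str, float]:
    """Access `url` once through every link and record the latency (step S3)."""
    return {link: measure_latency_ms(fetch, url) for link, fetch in fetchers.items()}


def find_alarms(latencies: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Return the links whose latency exceeds their threshold (step S4)."""
    return [link for link, ms in latencies.items() if ms > thresholds[link]]
```

Passing the access function in as a callable keeps the probe logic independent of how traffic is actually steered onto a link, which might be a per-link proxy or source address in a real deployment.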
Preferably, step S4 further includes:
for a generic link, switching to an optimized routing link;
for an optimized routing link, switching to another optimized routing link;
for a locality protection link, switching to another locality protection link.
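The switching rules can be sketched as a small selection function. The link-type names and the `links` inventory are hypothetical; the patent states only the target link type for each case, not how a specific replacement is chosen, so this sketch simply takes the first healthy candidate.

```python
from typing import Dict, List, Optional

# Link type to switch to when a link of the given type raises an alarm.
FAILOVER_TARGET_TYPE: Dict[str, str] = {
    "generic": "optimized_routing",
    "optimized_routing": "optimized_routing",
    "locality_protection": "locality_protection",
}


def choose_failover(failed: str, links: Dict[str, str], healthy: List[str]) -> Optional[str]:
    """Pick a replacement for the link named `failed`.

    `links` maps link name -> link type; `healthy` lists the link names that
    are currently below their alarm threshold. Returns None if no healthy
    link of the required type exists.
    """
    target_type = FAILOVER_TARGET_TYPE[links[failed]]
    for name in healthy:
        if name != failed and links[name] == target_type:
            return name
    return None
```

Returning `None` rather than raising leaves the decision about what to do when no replacement exists to the caller, for example escalating the alarm.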
Preferably, step S4 is followed by: S5, displaying the probe results.
Preferably, the network performance of the generic link, the optimized routing link, and the locality protection link increases in that order.
Another object of the present invention is to provide an enterprise-level internet access troubleshooting system, characterized in that it comprises:
a combing module, for mapping out the enterprise's internet access architecture, system functions, and service configuration;
a configuration module, for configuring monitoring alarm thresholds for systems, devices, and traffic;
a simulation module, for simulating user access operations in real time and probing them, taking each computer in the enterprise as a unit;
an alarm module, for raising a monitoring alarm for any abnormal information in which a probe result exceeds the monitoring alarm threshold.
Preferably, the simulation module is used for simulating, on each computer, access to key URLs through a generic link, an optimized routing link, and a locality protection link respectively, and for measuring the latency that occurs during access;
the alarm module is used for raising a monitoring alarm when the latency is detected to exceed a time threshold.
Preferably, the alarm module is further used for:
for a generic link, switching to an optimized routing link;
for an optimized routing link, switching to another optimized routing link;
for a locality protection link, switching to another locality protection link.
Preferably, the enterprise-level internet access troubleshooting system further comprises:
a display module, for displaying the probe results.
Preferably, the network performance of the generic link, the optimized routing link, and the locality protection link increases in that order.
The positive effects of the present invention are as follows. The invention can monitor links, systems, devices, and applications in a timely manner and probe performance in real time, so that problems are found early and users are not affected. Troubleshooting in response to user queries becomes much more efficient, less error-prone, and less likely to miss something. Operation is also easier: the probing steps are simplified so that even ordinary users can carry them out, which lowers the cost of learning the system.
Description of the drawings
Fig. 1 is a module diagram of the enterprise-level internet access troubleshooting system of a preferred embodiment of the present invention.
Fig. 2 is a flowchart of the enterprise-level internet access troubleshooting method of a preferred embodiment of the present invention.
Detailed description
The present invention is further illustrated below by way of an embodiment, but is not thereby limited to the scope of the described embodiment.
As shown in Fig. 1, the enterprise-level internet access troubleshooting system of the present invention comprises a combing module 1, a configuration module 2, a simulation module 3, an alarm module 4, and a display module 5.
The combing module 1 maps out the enterprise's internet access architecture, system functions, and service configuration. The configuration module 2 configures monitoring alarm thresholds for systems, devices, and traffic. The simulation module 3 takes each computer in the enterprise as a unit, simulates user access operations in real time, and probes them. Specifically, on each computer the simulation module 3 simulates access to key URLs through a generic link, an optimized routing link, and a locality protection link respectively, and measures the latency that occurs during access. The network performance of the generic link, the optimized routing link, and the locality protection link increases in that order.
The alarm module 4 raises a monitoring alarm for any abnormal information in which a probe result exceeds the monitoring alarm threshold; specifically, it raises a monitoring alarm when the latency is detected to exceed a time threshold. In addition, the alarm module 4 is further used for:
for a generic link, switching to an optimized routing link;
for an optimized routing link, switching to another optimized routing link;
for a locality protection link, switching to another locality protection link.
The display module 5 displays the probe results.
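One simple way to present probe results for side-by-side comparison is a plain-text table, sketched below. The column layout and link names are assumptions for illustration only; the patent mentions a visual Web page for display but does not describe its format.

```python
from typing import Dict


def format_results(latencies_ms: Dict[str, float], thresholds_ms: Dict[str, float]) -> str:
    """Render one row per link with its latency, threshold, and alarm status."""
    lines = [f"{'link':<22}{'latency_ms':>12}{'threshold_ms':>14}  status"]
    for link, ms in latencies_ms.items():
        limit = thresholds_ms[link]
        status = "ALARM" if ms > limit else "ok"
        lines.append(f"{link:<22}{ms:>12.1f}{limit:>14.1f}  {status}")
    return "\n".join(lines)
```

The same rows could equally be passed to a web template; the point is only that each link's latency is shown next to its threshold so that an out-of-range link stands out.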
The present invention gathers statistics on the internet access requirements of each business department and deploys each important node between the server side and the client as a probe point. Multi-dimensional monitoring nodes are deployed, the related troubleshooting modules are gathered together, and the point of failure is quickly identified through multi-dimensional probing presented on a visual Web page.
The present invention provides real-time probing and alarming for the system, so that the performance of the multiple egress links, the devices, and each proxy system is known promptly and faults are handled, as far as possible, before users notice a problem, reducing the impact on users and application systems. The invention also gives applications and users a tool for comparing performance visually and for carrying out investigation and analysis. When a company application system or an employee's internet access runs into access performance problems, such as a target being unreachable or access latency being high, the system can quickly probe and analyze based on the fault information, locate the point of failure, and start troubleshooting immediately.
In implementing the present invention:
(1) For the links, high-availability systems, devices, and applications involved in the system architecture, scripts are written with zabbix, python, and bash (zabbix is a monitoring tool; python and bash are scripting languages), and these work together with an application display console to configure and display comparisons.
(2) Many factors affect internet access, each factor needs a high-availability architecture, and there are many devices, so when a problem occurs the point of failure must be identified immediately. With the company's overall internet access architecture clearly mapped out, each factor is monitored through modularized functions in a B/S (browser/server) model. This is implemented with Python and Django (an open-source Web application framework).
(3) Comparison data are collected and analyzed from each system engineer's long-term troubleshooting experience, and criteria are defined for what counts as a normal state for each function. This experience and these criteria are built into modules by technical means so that the whole system is clear and orderly. A multi-dimensional probing approach is used to quickly rule out factors that may have caused a fault. Examples include: simulating different users accessing the internet through each system egress and comparing the results returned at different probe points; comparing the results returned when each office location probes the same target; comparing the results returned when each location in the company goes through different proxy systems; monitoring the multiple DNS (domain name system) servers in the company and judging DNS service status from resolution results; judging the state of the remote application from the state of the target application port as seen from each location; and judging connectivity through routing.
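Two of the checks listed in (3), DNS resolution and target port status, can be sketched with Python's standard `socket` module. This is an assumption about how such probes might look, not the patent's implementation. `resolve_host` uses the system resolver only; querying a specific one of the company's DNS servers would need a DNS client library and is not shown.

```python
import socket
from typing import List


def resolve_host(hostname: str) -> List[str]:
    """Return the IP addresses the system resolver gives for `hostname`.

    An empty list means resolution failed, which points to a DNS problem.
    """
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each entry's sockaddr starts with the address string; drop duplicates.
    return sorted({info[4][0] for info in infos})


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running the same two checks from probe points in different locations and comparing the answers is what separates a local problem (one location fails) from a remote one (all locations fail).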
The present invention also provides an enterprise-level internet access troubleshooting method, implemented with the enterprise-level internet access troubleshooting system described above. As shown in Fig. 2, the method comprises the following steps:
Step 101, mapping out the enterprise's internet access architecture, system functions, and service configuration;
Step 102, configuring monitoring alarm thresholds for systems, devices, and traffic;
Step 103, taking each computer in the enterprise as a unit, simulating user access operations in real time and probing them;
Step 104, raising a monitoring alarm for any abnormal information in which a probe result exceeds the monitoring alarm threshold;
Step 105, displaying the probe results.
Specifically, step 103 includes: on each computer, simulating access to key URLs through a generic link, an optimized routing link, and a locality protection link respectively, and measuring the latency that occurs during access. Step 104 includes: raising a monitoring alarm when the latency is detected to exceed a time threshold; and, for a generic link, switching to an optimized routing link; for an optimized routing link, switching to another optimized routing link; for a locality protection link, switching to another locality protection link. The network performance of the generic link, the optimized routing link, and the locality protection link increases in that order.
The present invention can also be applied to other operation and maintenance scenarios. Where probing tools are scarce and cumbersome and troubleshooting means are scattered, they can all be integrated into one system by technical means, replacing the command line, so that results are obtained quickly through monitoring, a performance comparison console, and visual operation tool pages.
Although specific embodiments of the present invention have been described above, those skilled in the art will understand that these are merely illustrative, and that the scope of protection of the present invention is defined by the appended claims. Those skilled in the art may make various changes or modifications to these embodiments without departing from the principle and essence of the present invention, and all such changes and modifications fall within the scope of protection of the present invention.

Claims (10)

1. An enterprise-level internet access troubleshooting method, characterized in that it comprises the following steps:
S1, mapping out the enterprise's internet access architecture, system functions, and service configuration;
S2, configuring monitoring alarm thresholds for systems, devices, and traffic;
S3, taking each computer in the enterprise as a unit, simulating user access operations in real time and probing them;
S4, raising a monitoring alarm for any abnormal information in which a probe result exceeds the monitoring alarm threshold.
2. The enterprise-level internet access troubleshooting method of claim 1, characterized in that step S3 includes: on each computer, simulating access to key URLs through a generic link, an optimized routing link, and a locality protection link respectively, and measuring the latency that occurs during access;
step S4 includes: raising a monitoring alarm when the latency is detected to exceed a time threshold.
3. The enterprise-level internet access troubleshooting method of claim 2, characterized in that step S4 further includes:
for a generic link, switching to an optimized routing link;
for an optimized routing link, switching to another optimized routing link;
for a locality protection link, switching to another locality protection link.
4. The enterprise-level internet access troubleshooting method of claim 1, characterized in that step S4 is followed by: S5, displaying the probe results.
5. The enterprise-level internet access troubleshooting method of claim 2, characterized in that the network performance of the generic link, the optimized routing link, and the locality protection link increases in that order.
6. An enterprise-level internet access troubleshooting system, characterized in that it comprises:
a combing module, for mapping out the enterprise's internet access architecture, system functions, and service configuration;
a configuration module, for configuring monitoring alarm thresholds for systems, devices, and traffic;
a simulation module, for simulating user access operations in real time and probing them, taking each computer in the enterprise as a unit;
an alarm module, for raising a monitoring alarm for any abnormal information in which a probe result exceeds the monitoring alarm threshold.
7. The enterprise-level internet access troubleshooting system of claim 6, characterized in that the simulation module is used for simulating, on each computer, access to key URLs through a generic link, an optimized routing link, and a locality protection link respectively, and for measuring the latency that occurs during access;
the alarm module is used for raising a monitoring alarm when the latency is detected to exceed a time threshold.
8. The enterprise-level internet access troubleshooting system of claim 7, characterized in that the alarm module is further used for:
for a generic link, switching to an optimized routing link;
for an optimized routing link, switching to another optimized routing link;
for a locality protection link, switching to another locality protection link.
9. The enterprise-level internet access troubleshooting system of claim 6, characterized in that the system further comprises:
a display module, for displaying the probe results.
10. The enterprise-level internet access troubleshooting system of claim 7, characterized in that the network performance of the generic link, the optimized routing link, and the locality protection link increases in that order.
CN201610911552.7A 2016-10-19 2016-10-19 Enterprise-level online troubleshooting method and system Pending CN106506196A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610911552.7A CN106506196A (en) 2016-10-19 2016-10-19 Enterprise-level online troubleshooting method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610911552.7A CN106506196A (en) 2016-10-19 2016-10-19 Enterprise-level online troubleshooting method and system

Publications (1)

Publication Number Publication Date
CN106506196A true CN106506196A (en) 2017-03-15

Family

ID=58294826

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610911552.7A Pending CN106506196A (en) 2016-10-19 2016-10-19 Enterprise-level online troubleshooting method and system

Country Status (1)

Country Link
CN (1) CN106506196A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111464601A (en) * 2020-03-24 2020-07-28 新浪网技术(中国)有限公司 Node service scheduling system and method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102185709A (en) * 2011-04-22 2011-09-14 赛特斯网络科技(南京)有限责任公司 Integrated network quality of service assurance and management system
CN102223251A (en) * 2011-06-14 2011-10-19 重庆市电力公司江北供电局 Collecting and analyzing method for network operation and maintenance and business processing device
CN104270268A (en) * 2014-09-28 2015-01-07 曙光信息产业股份有限公司 Network performance analysis and fault diagnosis method of distributed system
CN105045700A (en) * 2015-07-08 2015-11-11 国网辽宁省电力有限公司信息通信分公司 Method for monitoring user experience index of application system in real time
CN105471656A (en) * 2015-12-10 2016-04-06 国家电网公司 Abstraction method specific to operation and maintenance information model of intelligent substation automation system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170315