CN105320585A - Method and device for achieving application fault diagnosis - Google Patents
- Publication number
- CN105320585A (application CN201410324069.XA)
- Authority
- CN
- China
- Prior art keywords
- data
- application
- service
- diagnosis
- relevant
- Prior art date
- Legal status
- Granted
Landscapes
- Debugging And Monitoring (AREA)
Abstract
The invention discloses a method and device for application fault diagnosis. The method comprises: collecting multi-dimensional application data; when a service application becomes abnormal, obtaining, according to the service exception type, the associated diagnosis data related to the exception from the collected multi-dimensional application data, based on the temporal and spatial association of the exception; and comparing the obtained associated diagnosis data with the historical diagnosis data of each item to determine the application fault type. Because service application exceptions are diagnosed from multi-dimensional application data, the one-sided results that arise when diagnosis relies on a single data source are avoided, service faults are determined more comprehensively, and service exceptions can be resolved.
Description
Technical field
The present invention relates to the field of computer applications, and in particular to a method and device for application fault diagnosis.
Background
With the development of IT applications, enterprises' business processes have become increasingly tied to Internet technology, and application information systems composed of servers, databases, middleware, and so on have grown increasingly complex. Even as the skill requirements for technical staff are gradually raised, troubleshooting keeps getting harder. The operating quality of a service application (its ability to complete services, its speed, and its stability) directly determines the level of service an enterprise can offer its users. Monitoring and managing the performance of mission-critical applications, and analyzing and diagnosing the problems found during performance monitoring promptly and effectively, is therefore an urgent requirement for improving the availability of customer service applications.
At present, performance monitoring and management of service applications mainly covers the following: 1. monitoring application access; 2. when a performance exception occurs in a service application, determining whether it is caused by a network system performance exception; 3. when an access exception occurs in a service application, determining whether it is caused by an attack on the network or the application. Diagnosing service application faults effectively helps technical staff restore the service application promptly.
Existing fault diagnosis for service applications mainly analyzes faults from a single type of data, such as flow data or monitoring data (for example, application logs). Because only a single data source is analyzed, the resulting diagnosis tends to be one-sided or incomplete, so considerable manual involvement is needed to complete the diagnosis.
Summary of the invention
To solve the above technical problem, the invention provides a method and device for application fault diagnosis that can diagnose service faults comprehensively from multi-dimensional data and reduce manual involvement.
To achieve this object, the invention discloses a method for application fault diagnosis, comprising:
collecting multi-dimensional application data;
when a service application becomes abnormal, obtaining, according to the service exception type, the associated diagnosis data related to the service exception from the collected multi-dimensional application data, based on the temporal and spatial association of the service exception;
comparing each item of the obtained associated diagnosis data with its historical diagnosis data to determine the application fault type.
Further, the multi-dimensional application data comprises: monitoring data extracted according to the service application server IP; flow data extracted according to the service application server IP and destination address; and application performance data extracted according to the service application server IP and destination address.
Further, the monitoring data comprises at least one of: IP address; monitoring time; CPU utilization; disk utilization; disk input/output (io); memory-related information; swap-space-related information; network-interface-related information; database response time; swap memory swapped in from disk to memory (si); swap memory swapped out from memory to disk (so); amount written from memory to disk (bo); amount read from disk into memory (bi); and service state.
Further, the flow data describes sessions uniquely identified by the same five-tuple and comprises at least one of: collection time; source/destination address; source/destination port; protocol; number of SYN packets sent (the handshake packet used when establishing a TCP/IP connection); number of FIN packets sent (FIN is a code-bit field in the TCP header); TCP-related information; number of RST packets sent; and abnormal total traffic accessing a specified service per unit time.
Further, the application performance data comprises at least one of: source/destination address; destination port; request time; server response time; load time; page-related information; HTTP-related information; Tomcat overall access slowdown; abnormal database access volume per unit time; and abnormal number of current WebLogic sessions.
The application performance data is collected from the performance data of the HTTP protocol, of the ORACLE database service, and/or of the MySQL database server.
Further, comparing each item of the obtained associated diagnosis data with its historical diagnosis data to determine the application fault type specifically comprises:
comparing each item of the associated diagnosis data related to the service exception with its historical diagnosis data against a periodic baseline or a moving-window baseline, and determining the application fault type according to a preset threshold range for each item.
Further, the historical diagnosis data is: monitoring data within a first preset duration, flow data within a second preset duration, and real-time application performance data.
Further, when the fault diagnosis yields no result, the method further comprises: storing the multi-dimensional data related to the exception, and determining the application fault type again after the historical data has been updated.
Further, the method further comprises: providing fault recovery suggestions from the historical diagnosis data according to the determined application fault type.
In another aspect, the application also provides a device for application fault diagnosis, comprising a collecting unit, an acquiring unit, and a fault diagnosis unit, wherein:
the collecting unit is configured to collect multi-dimensional application data;
the acquiring unit is configured to, when a service application becomes abnormal, obtain according to the service exception type the associated diagnosis data related to the service exception from the collected multi-dimensional application data, based on the temporal and spatial association of the service exception;
the fault diagnosis unit is configured to compare each item of the obtained associated diagnosis data with its historical diagnosis data to determine the application fault type.
Further, the multi-dimensional application data comprises: monitoring data extracted according to the service application server IP; flow data extracted according to the service application server IP and destination address; and application performance data extracted according to the service application server IP and destination address.
Further, the monitoring data comprises at least one of: IP address; monitoring time; CPU utilization; disk utilization; disk input/output (io); memory-related information; swap-space-related information; network-interface-related information; database response time; swap memory swapped in from disk to memory (si); swap memory swapped out from memory to disk (so); amount written from memory to disk (bo); amount read from disk into memory (bi); and service state.
Further, the flow data describes sessions uniquely identified by the same five-tuple and comprises at least one of: collection time; source/destination address; source/destination port; protocol; number of SYN packets sent (the handshake packet used when establishing a TCP/IP connection); number of FIN packets sent (FIN is a code-bit field in the TCP header); TCP-related information; number of RST packets sent; and abnormal total traffic accessing a specified service per unit time.
Further, the application performance data comprises at least one of: source/destination address; destination port; request time; server response time; load time; page-related information; HTTP-related information; Tomcat overall access slowdown; abnormal database access volume per unit time; and abnormal number of current WebLogic sessions.
The application performance data is collected from the performance data of the HTTP protocol, of the ORACLE database service, and/or of the MySQL database server.
Further, the fault diagnosis unit is specifically configured to compare each item of the associated diagnosis data related to the service exception with its historical diagnosis data against a periodic baseline or a moving-window baseline, and to determine the application fault type according to a preset threshold range for each item.
Further, the historical diagnosis data is: monitoring data within a first preset duration, flow data within a second preset duration, and real-time application performance data.
Further, the device also comprises a follow-up diagnosis unit configured to store the multi-dimensional data related to the exception when the fault diagnosis yields no result, and to determine the application fault type again after the historical data has been updated.
Further, the device also comprises a recovery suggestion unit configured to provide fault recovery suggestions from the historical diagnosis data according to the determined application fault type.
The technical solution of the invention comprises: collecting multi-dimensional application data; when a service application becomes abnormal, obtaining, according to the service exception type, the associated diagnosis data related to the service exception from the collected multi-dimensional application data, based on the temporal and spatial association of the service exception; and comparing each item of the obtained associated diagnosis data with its historical diagnosis data to determine the application fault type and analyze the cause of the fault. Because the invention diagnoses service application exceptions from multi-dimensional application data, it avoids the one-sided diagnosis that results from relying on a single data source, determines service faults more comprehensively, and resolves service exceptions.
Brief description of the drawings
The accompanying drawings described herein provide a further understanding of the invention and form part of the application. The illustrative embodiments of the invention and their descriptions serve to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of the method for application fault diagnosis according to the invention;
Fig. 2 is a structural block diagram of the device for application fault diagnosis according to the invention.
Detailed description
Fig. 1 is a flowchart of a method for application fault diagnosis. As shown in Fig. 1, the method comprises:
Step 100: collect multi-dimensional application data.
In this step, the collected multi-dimensional application data comprises: monitoring data extracted according to the service application server IP; flow data extracted according to the service application server IP and destination address; and application performance data extracted according to the service application server IP and destination address.
Further, the monitoring data comprises at least one of: IP address; monitoring time; CPU utilization; disk utilization; disk input/output (io); memory-related information; swap-space-related information; network-interface-related information; database response time; swap memory swapped in from disk to memory (si); swap memory swapped out from memory to disk (so); amount written from memory to disk (bo); amount read from disk into memory (bi); and service state.
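Several of the monitoring fields above (free memory, si, so, bi, bo) share their names with columns printed by the Linux `vmstat` tool, which reports si/so as memory swapped in/out per second and bi/bo as blocks read/written per second. The patent does not say how its collector obtains these values; the sketch below only assumes a collector that samples `vmstat` and shows how one sample line might become a monitoring record. The function names `parse_vmstat` and `monitoring_record` and the chosen field subset are hypothetical.

```python
from __future__ import annotations


def parse_vmstat(header: str, sample: str) -> dict[str, int]:
    """Map vmstat column names (r, b, swpd, free, ..., si, so, bi, bo, ...) to integers.

    `header` is the column-name line printed by vmstat and `sample` is one data
    line. Pairing by name avoids hard-coding a column order, which varies
    slightly between procps versions.
    """
    names = header.split()
    values = sample.split()
    if len(names) != len(values):
        raise ValueError("header and sample have different column counts")
    return {name: int(value) for name, value in zip(names, values)}


# Hypothetical subset of vmstat columns kept as monitoring fields.
MONITOR_FIELDS = ("free", "si", "so", "bi", "bo")


def monitoring_record(ip: str, timestamp: float, sample: dict[str, int]) -> dict:
    """Build one monitoring record: IP address, monitoring time, memory/swap/io fields."""
    record: dict = {"ip": ip, "time": timestamp}
    record.update({k: sample[k] for k in MONITOR_FIELDS if k in sample})
    return record


header = "r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st"
sample = "1  0      0 120000  20000 300000  850  920   640   610  500  900 10  5 80  5  0"
print(monitoring_record("10.0.0.5", 1404180000.0, parse_vmstat(header, sample)))
```

Fields such as CPU or disk utilization would come from other sources and are left out of this sketch.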
The flow data describes sessions uniquely identified by the same five-tuple and comprises at least one of: collection time; source/destination address; source/destination port; protocol; number of SYN packets sent (the handshake packet used when establishing a TCP/IP connection); number of FIN packets sent (FIN is a code-bit field in the TCP header); TCP-related information; number of RST packets sent; and abnormal total traffic accessing a specified service per unit time. Here, TCP-related information includes the number of TCP retransmissions, the number of TCP checksum errors, the number of abnormally closed TCP connections, and so on.
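To make "sessions uniquely identified by the same five-tuple" concrete, here is a minimal sketch of how packet records might be grouped into flow sessions with per-session SYN/FIN/RST counts. It assumes packets arrive as dictionaries with hypothetical keys (`src`, `dst`, `sport`, `dport`, `proto`, `length`, `flags`); the flag bit values are the standard TCP header bits, but the aggregation scheme is illustrative, not the patent's collector.

```python
from collections import defaultdict
from dataclasses import dataclass

# TCP header flag bits: FIN, SYN and RST occupy the three lowest bits.
FIN, SYN, RST = 0x01, 0x02, 0x04


@dataclass
class SessionStats:
    packets: int = 0
    byte_count: int = 0
    syn: int = 0
    fin: int = 0
    rst: int = 0


def aggregate_sessions(packets):
    """Group packet records into sessions keyed by the five-tuple
    (src_ip, dst_ip, src_port, dst_port, protocol) and count packets,
    bytes and SYN/FIN/RST flags per session.

    The key is directional: reply packets form their own session unless
    the caller normalizes the tuple first.
    """
    sessions = defaultdict(SessionStats)
    for p in packets:
        key = (p["src"], p["dst"], p["sport"], p["dport"], p["proto"])
        s = sessions[key]
        s.packets += 1
        s.byte_count += p["length"]
        flags = p.get("flags", 0)  # non-TCP packets carry no flags
        if flags & SYN:
            s.syn += 1
        if flags & FIN:
            s.fin += 1
        if flags & RST:
            s.rst += 1
    return dict(sessions)
```

Per-unit-time indicators such as "SYN packets sent per unit time" could then be computed by summing these counters over sessions within a collection interval.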
The application performance data comprises at least one of: source/destination address; destination port; request time; server response time; load time; page-related information; HTTP-related information; Tomcat overall access slowdown; abnormal database access volume per unit time; and abnormal number of current WebLogic sessions.
Here, Tomcat is an existing web application server, and WebLogic is web middleware for Java applications.
The application performance data is collected from the performance data of the HTTP protocol, of the ORACLE database service, and/or of the MySQL database server.
Here, page-related information includes the page download time, the page slowdown ratio, and so on.
HTTP-related information includes the HTTP access rate, the HTTP error rate, abnormal numbers of HTTP accesses per unit time, and so on.
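The description names the HTTP access rate, HTTP error rate, and page slowdown ratio without defining them. The sketch below uses one plausible set of definitions, all of which are assumptions: access rate as requests per second in the window, error rate as the share of responses with status code 400 or above, and slowdown ratio as the share of requests slower than a hypothetical `slow_ms` threshold.

```python
def http_metrics(requests, window_s, slow_ms):
    """Compute HTTP indicators for one collection window.

    requests: list of (status_code, response_ms) observed within window_s seconds.
    Assumed definitions: access rate = requests per second; error rate =
    share of responses with status >= 400; slow ratio = share of requests
    whose response time exceeds slow_ms.
    """
    n = len(requests)
    if n == 0:
        return {"access_rate": 0.0, "error_rate": 0.0, "slow_ratio": 0.0}
    errors = sum(1 for status, _ in requests if status >= 400)
    slow = sum(1 for _, ms in requests if ms > slow_ms)
    return {
        "access_rate": n / window_s,
        "error_rate": errors / n,
        "slow_ratio": slow / n,
    }
```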
Step 101: when a service application becomes abnormal, obtain, according to the service exception type, the associated diagnosis data related to the service exception from the collected multi-dimensional application data, based on the temporal and spatial association of the service exception.
It should be noted that the temporal and spatial association of a service exception works as follows: using the time at which the service exception occurred, associated diagnosis data are obtained from the time information that can be determined in the multi-dimensional data; and associated diagnosis data are also obtained from the information of the protocol layers involved.
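The temporal and spatial association described above can be read as a filter: keep records whose time falls near the exception and which involve the affected server and the relevant protocol layers. A minimal sketch, assuming each record is a dictionary with hypothetical keys `time` (a `datetime`), `ips` (set of IP addresses involved) and `layer` (data source or protocol layer); the window widths are caller-supplied parameters, not values from the patent.

```python
from datetime import datetime, timedelta


def associate(records, server_ip, anomaly_time, before, after, layers=None):
    """Select records tied to an anomaly in time and 'space'.

    Time: the record falls in [anomaly_time - before, anomaly_time + after].
    Space: the record involves server_ip and, if `layers` is given, comes
    from one of those protocol layers / data sources.
    """
    start, end = anomaly_time - before, anomaly_time + after
    return [
        r for r in records
        if start <= r["time"] <= end
        and server_ip in r["ips"]
        and (layers is None or r["layer"] in layers)
    ]


t0 = datetime(2014, 7, 1, 10, 0)
records = [
    {"time": t0 - timedelta(minutes=2), "ips": {"10.0.0.5"}, "layer": "host"},
    {"time": t0 + timedelta(minutes=3), "ips": {"10.0.0.5", "10.0.0.7"}, "layer": "flow"},
]
print(associate(records, "10.0.0.5", t0, timedelta(minutes=5), timedelta(minutes=5)))
```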
Because service application exceptions are complex, those skilled in the art will understand that they cannot be listed exhaustively. To describe the invention clearly, several common service application exceptions are illustrated here, with a brief list of some of the associated diagnosis data involved.
It should be noted that service exception types are categories that those skilled in the art have summarized from experience in analyzing service exceptions. The following are common service exception types and the associated diagnosis data involved:
1. Service availability exception of the service application, including abnormal availability of hosts, databases, middleware, service access, and so on. The associated diagnosis data mainly include: service state (start/stop), CPU utilization, disk utilization, memory utilization parameters, and so on; these exceptions are observed mainly in monitoring data.
2. Server response exception of the service application. The associated diagnosis data mainly include: application request time, application page download time, page slowdown ratio, HTTP access rate, HTTP error rate, server response time, database response time, swap memory swapped in from disk to memory (si), swap memory swapped out from memory to disk (so), free memory, amount written from memory to disk (bo), amount read from disk into memory (bi), CPU utilization, and so on. Of these indicators, the first 6 are application performance data and the remaining ones are monitoring data.
3. Service access exception of the service application. The associated diagnosis data mainly include: abnormal total traffic accessing a specified service per unit time, abnormal number of HTTP accesses per unit time, Tomcat overall access slowdown, abnormal database access volume per unit time, and abnormal number of current WebLogic sessions. Of these indicators, the first comes from the flow collector and the others come from the application collector.
4. Traffic exception of the service application. The associated diagnosis data mainly include: abnormal protocol proportions (TCP/UDP/ICMP/IGMP) and abnormal traffic (bps, pps, sessions). These indicators come mainly from the flow collector.
5. Service performance exception of the service application. The associated diagnosis data mainly include: abnormal service performance monitoring results.
6. Service state exception of the service application. The associated diagnosis data mainly include: service state (start/stop) and abnormal service state monitoring results.
7. Exceptions of the service application caused by network attacks. The associated diagnosis data mainly include: abnormal number of SYN packets sent per unit time, abnormal average packet length, and worm event alarms on the link: CodeRed, Hard Disk Killer, SQL Slammer, Blaster (Shock Wave), Blaster Killer, Sasser, mail worms, WinNuke attacks, and UdpFragmentFlood. These indicators come mainly from the flow collector.
8. Link exception of the service application. The associated diagnosis data mainly include: abnormal layer-2 traffic, TCP packet retransmission rate, TCP checksum error rate, number of abnormally closed TCP connections, and so on. These indicators come from the flow collector and the application collector.
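The list above amounts to a lookup from service exception type to the indicators worth pulling for diagnosis. A minimal sketch of that lookup follows; the type keys and metric names are hypothetical identifiers chosen for illustration, and the lists paraphrase the eight categories rather than reproduce a schema from the patent.

```python
# Hypothetical identifiers paraphrasing the eight exception categories above.
EXCEPTION_METRICS = {
    "availability": ["service_state", "cpu_util", "disk_util", "mem_util"],
    "response": ["request_time", "page_download_time", "page_slow_ratio",
                 "http_access_rate", "http_error_rate", "server_response_time",
                 "db_response_time", "si", "so", "free_mem", "bo", "bi", "cpu_util"],
    "access": ["service_total_traffic", "http_access_count", "tomcat_slowdown",
               "db_access_volume", "weblogic_sessions"],
    "traffic": ["protocol_ratio", "bps", "pps", "sessions"],
    "service_performance": ["service_perf_monitor"],
    "service_state": ["service_state", "service_state_monitor"],
    "attack": ["syn_per_unit_time", "avg_packet_len", "worm_alarm"],
    "link": ["l2_traffic", "tcp_retrans_rate", "tcp_checksum_err_rate", "tcp_abnormal_close"],
}


def select_diagnosis_data(exception_type, snapshot):
    """Keep only the collected metrics relevant to the given exception type."""
    wanted = EXCEPTION_METRICS[exception_type]
    return {m: snapshot[m] for m in wanted if m in snapshot}
```

Keeping the mapping as data rather than code makes it easy to extend as new exception types are summarized from experience.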
Step 102: compare each item of the obtained associated diagnosis data with its historical diagnosis data to determine the application fault type.
Specifically, each item of the associated diagnosis data related to the service exception is compared with its historical diagnosis data against a periodic baseline or a moving-window baseline, and the application fault type is determined according to a preset threshold range for each item.
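The comparison in step 102 can be sketched in two parts: flag each metric whose current value falls outside its preset threshold range around the baseline, then map the set of flagged metrics to a fault type. Expressing the threshold range as `(low, high)` multiples of the baseline, and the `FAULT_RULES` table itself, are assumptions made for illustration; the patent leaves both unspecified.

```python
# Hypothetical rule table: a fault type is reported when every metric in its set is abnormal.
FAULT_RULES = {
    "memory_pressure": {"si", "so", "free_mem"},
    "disk_io_overload": {"bi", "bo"},
    "slow_page_response": {"server_response_time"},
}


def abnormal_metrics(current, baseline, ratio_range):
    """Return metrics whose current value lies outside [low * baseline, high * baseline].

    ratio_range maps metric -> (low, high), standing in for the preset
    threshold range. Metrics without a baseline or a range are skipped.
    """
    flagged = set()
    for name, value in current.items():
        if name not in baseline or name not in ratio_range:
            continue
        low, high = ratio_range[name]
        if not low * baseline[name] <= value <= high * baseline[name]:
            flagged.add(name)
    return flagged


def classify(flagged):
    """Fault types whose required metrics are all flagged, in sorted order."""
    return sorted(fault for fault, needed in FAULT_RULES.items() if needed <= flagged)
```

For example, an si value of 850 against a baseline of 60 with a `(0, 3)` range is flagged because it exceeds 180. A baseline of zero makes any positive value abnormal, so a real deployment would likely need absolute floors as well.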
In this step, the historical diagnosis data is: monitoring data within a first preset duration, flow data within a second preset duration, and real-time application performance data.
Here, monitoring data consists mainly of log-like data collected over a day, so the first preset duration generally covers several collection periods of monitoring data. The period of the monitoring data depends on the type of monitoring data designed for the actual fault conditions, and is generally measured in minutes as the smallest unit.
Flow data is compared using short-term flow parameters to determine exceptions, so the second preset duration is generally about 20 s.
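The short-term comparison for flow data suggests a time-based moving window of about 20 s. A minimal sketch, assuming samples arrive as `(timestamp_seconds, value)` pairs in non-decreasing time order; the class name `MovingWindowBaseline` is hypothetical. A new sample would typically be compared against `mean()` before it is added.

```python
from collections import deque


class MovingWindowBaseline:
    """Mean of the samples seen in the last `window_s` seconds.

    Assumes timestamps are added in non-decreasing order.
    """

    def __init__(self, window_s=20.0):
        self.window_s = window_s
        self.samples = deque()  # (timestamp, value) pairs, oldest first

    def add(self, ts, value):
        self.samples.append((ts, value))
        # Evict samples older than the window, measured from the newest timestamp.
        while self.samples[0][0] < ts - self.window_s:
            self.samples.popleft()

    def mean(self):
        if not self.samples:
            return None
        return sum(v for _, v in self.samples) / len(self.samples)
```

Recomputing the sum on each `mean()` call keeps the example simple and avoids floating-point drift from repeated add/subtract; a high-rate collector might keep a running total instead.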
Of course, the first and second preset durations can be adjusted according to the actual application scenario and requirements.
When the fault diagnosis yields no result, the method of the invention further comprises: storing the multi-dimensional data related to the exception, and determining the application fault type again after the historical data has been updated.
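The deferred re-diagnosis step can be sketched as a small queue: snapshots that yield no result are kept, and all of them are retried whenever the history is updated. The class `FollowUpDiagnosis` and the `diagnose(snapshot, history)` callable it wraps are hypothetical stand-ins for whatever diagnosis routine is in use.

```python
class FollowUpDiagnosis:
    """Keep exception snapshots that could not be diagnosed and retry them
    after the historical data is updated."""

    def __init__(self, diagnose):
        # diagnose: callable(snapshot, history) -> fault type, or None if undetermined
        self.diagnose = diagnose
        self.pending = []

    def submit(self, snapshot, history):
        result = self.diagnose(snapshot, history)
        if result is None:
            self.pending.append(snapshot)  # store for a later attempt
        return result

    def on_history_updated(self, history):
        """Re-run diagnosis on pending snapshots; return resolved (snapshot, fault) pairs."""
        resolved, still_pending = [], []
        for snap in self.pending:
            result = self.diagnose(snap, history)
            if result is None:
                still_pending.append(snap)
            else:
                resolved.append((snap, result))
        self.pending = still_pending
        return resolved
```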
The method of the invention further comprises: providing fault recovery suggestions from the historical diagnosis data according to the determined application fault type and cause.
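The patent does not say how suggestions are drawn from historical diagnosis data. One simple reading, offered only as an assumption, is to rank the recovery actions that most often resolved the same fault type in past cases. The record layout `(fault_type, action, resolved)` and the function name are hypothetical.

```python
from collections import Counter


def suggest_recovery(fault_type, history, top_n=2):
    """Return up to top_n recovery actions that most often resolved this fault type.

    history: iterable of past cases as (fault_type, action, resolved) tuples.
    """
    counts = Counter(action for ft, action, ok in history if ft == fault_type and ok)
    return [action for action, _ in counts.most_common(top_n)]


history = [
    ("memory_pressure", "add physical memory", True),
    ("memory_pressure", "add physical memory", True),
    ("memory_pressure", "restart service", True),
    ("memory_pressure", "restart service", False),
    ("disk_io_overload", "defragment disk", True),
]
print(suggest_recovery("memory_pressure", history))
```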
Fig. 2 is a structural block diagram of the device for application fault diagnosis according to the invention. As shown in Fig. 2, the device comprises:
a collecting unit, an acquiring unit, and a fault diagnosis unit, wherein:
the collecting unit is configured to collect multi-dimensional application data.
Here, the multi-dimensional application data comprises: monitoring data extracted according to the service application server IP; flow data extracted according to the service application server IP and destination address; and application performance data extracted according to the service application server IP and destination address.
The monitoring data comprises at least one of: IP address; monitoring time; CPU utilization; disk utilization; disk input/output (io); memory-related information; swap-space-related information; network-interface-related information; database response time; swap memory swapped in from disk to memory (si); swap memory swapped out from memory to disk (so); amount written from memory to disk (bo); amount read from disk into memory (bi); and service state.
The flow data describes sessions uniquely identified by the same five-tuple and comprises at least one of: collection time; source/destination address; source/destination port; protocol; number of SYN packets sent (the handshake packet used when establishing a TCP/IP connection); number of FIN packets sent (FIN is a code-bit field in the TCP header); TCP-related information; number of RST packets sent; and abnormal total traffic accessing a specified service per unit time.
The application performance data comprises at least one of: source/destination address; destination port; request time; server response time; load time; page (URL)-related information; HTTP-related information; Tomcat overall access slowdown; abnormal database access volume per unit time; and abnormal number of current WebLogic sessions.
The application performance data is collected from the performance data of the HTTP protocol, of the ORACLE database service, and/or of the MySQL database server.
The acquiring unit is configured to, when a service application becomes abnormal, obtain according to the service exception type the associated diagnosis data related to the service exception from the collected multi-dimensional application data, based on the temporal and spatial association of the service exception.
The fault diagnosis unit is configured to compare each item of the obtained associated diagnosis data with its historical diagnosis data to determine the application fault type.
Specifically, the fault diagnosis unit compares each item of the associated diagnosis data related to the service exception with its historical diagnosis data against a periodic baseline or a moving-window baseline, and determines the application fault type according to a preset threshold range for each item.
The historical diagnosis data is: monitoring data within a first preset duration, flow data within a second preset duration, and real-time application performance data.
The device of the invention further comprises a follow-up diagnosis unit configured to store the multi-dimensional data related to the exception when the fault diagnosis yields no result, and to determine the application fault type again after the historical data has been updated.
The device of the invention further comprises a recovery suggestion unit configured to provide fault recovery suggestions from the historical diagnosis data according to the determined application fault type.
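The unit structure of Fig. 2 can be sketched as three cooperating classes wired together by a device object. Everything here is illustrative: the record layout (`ip`, `time`, `metric`, `value`), the single `max_ratio` threshold, and all class names are hypothetical simplifications of the units described above.

```python
class CollectingUnit:
    """Collects multi-dimensional records (monitoring, flow, performance)."""

    def __init__(self):
        self.records = []

    def collect(self, record):
        self.records.append(record)


class AcquiringUnit:
    """Selects records related to an exception by type, server IP and time window."""

    def __init__(self, metrics_by_type):
        self.metrics_by_type = metrics_by_type

    def acquire(self, records, exception_type, server_ip, start, end):
        wanted = set(self.metrics_by_type[exception_type])
        # Later records overwrite earlier ones, so records should be in time order.
        return {r["metric"]: r["value"] for r in records
                if r["ip"] == server_ip and start <= r["time"] <= end
                and r["metric"] in wanted}


class FailureDiagnosisUnit:
    """Flags metrics exceeding max_ratio times their historical baseline."""

    def __init__(self, baseline, max_ratio):
        self.baseline = baseline
        self.max_ratio = max_ratio

    def diagnose(self, data):
        return sorted(m for m, v in data.items()
                      if m in self.baseline and v > self.max_ratio * self.baseline[m])


class DiagnosisDevice:
    def __init__(self, collector, acquirer, diagnoser):
        self.collector, self.acquirer, self.diagnoser = collector, acquirer, diagnoser

    def on_exception(self, exception_type, server_ip, start, end):
        data = self.acquirer.acquire(self.collector.records, exception_type,
                                     server_ip, start, end)
        return self.diagnoser.diagnose(data)
```

The follow-up diagnosis and recovery suggestion units would attach to `DiagnosisDevice` in the same way, consuming the output of `on_exception`.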
The invention is described in detail below through a specific embodiment. The embodiment serves only to illustrate the content of the invention clearly and does not limit its scope of protection.
Embodiment 1
A business application system had been running stably online for a long time. Over a period of time, data operations in one business data module were found to become sporadically and gradually slower, and the slowdown then spread to other modules (to a relatively smaller degree). The cause of the fault was unknown.
Be below the method for traditional application and trouble diagnosis, mainly through application daily record, system application and trouble progressively diagnosed:
First, by checking application daily record, checking switch and router state and configuration in application, and checking equipment packet loss, the data such as Packet Error Rate, find that the network equipment is acted normally; Check simultaneously and find that slowly situation does not appear obviously in other application, get rid of the possibility that network goes wrong.
Owing to adopting above-mentioned single application daily record, cannot diagnostic application fault type, therefore existing method needs to adopt the following artificial mode participated in carry out fault diagnosis:
Checked the system cpu of application place main frame by utility command row, internal memory, system cache, disk io situation, find that above parameter is acted normally.Owing to not checking out exception,
Further, operation maintenance personnel utility command row checks system cpu, internal memory, system cache, the disk io situation of problem application data base place main frame, through repeatedly to check and to compare disk io between discovery system lag phase frequent, apparently higher than the system normal moment, this problem is classified as suspicious item.
Communication between operation maintenance personnel inspection application and database facility, continues capture packet and analyze by bag analysis tool, and the system that finds occurs first about 20-40 minute slowly, and communication data amount increases, and this is classified as the suspicious item of failure exception.
Operation maintenance personnel checks out above two suspicious items, and suspect that system is slack-off relevant with application, notice application research staff shows up research.
For determining failure exception problem, carrying out application operating daily record and attending a day school and code walk-through, and continuing to monitor applied host machine, database host, database operational factor.In code walk-through, find may there is the problem reading raw data when running long-time interval report data, to solve application and trouble problem.
Above process adopts single data to carry out effective fault diagnosis, participates in just achieving fault diagnosis in failure diagnostic process by a large amount of thinking.
With the application fault diagnosis system of the invention, the associated diagnosis data from the first 5 minutes after the system becomes slow are analyzed. Here, suppose that, based on the working experience of those skilled in the art, the collection period of the monitoring data is 1 minute; the monitoring data of 5 consecutive periods are then analyzed. Generally, the same period can also be used as the alarm period for system fault exceptions.
The time at which the slow-response fault occurred and the business system IP are used as the temporal and spatial association, and monitoring data, including the following memory-related indicators, are extracted:
The virtual memory utilization in the memory-related information exceeds 70%, whereas its historical reference value is below 10%.
The swap memory swapped in from disk to memory (si) exceeds 800, whereas its historical reference range is about 0-120.
The swap memory swapped out from memory to disk (so) exceeds 900, whereas its historical reference range is about 0-100.
Free physical memory is about 80-140 MB, whereas its historical reference range is 400-500 MB.
The amount written from memory to disk (bo) often exceeds 600, whereas its historical reference range is 20-100.
The amount read from disk into memory (bi) often exceeds 600, whereas its historical reference range is 40-70.
While the system is slow, the database access volume per unit time rises markedly, whereas the access rate in the HTTP-related information shows no significant change.
When the system starts to slow down, the URLs that slow markedly in the HTTP-related information belong to the operation pages of a particular business (querying the system URL list shows that this URL is the report operation page). The server response time of these pages grows from the historical reference of 50-200 ms to more than 3500 ms.
All historical reference values above in this embodiment are periodic-baseline values.
A moving-window baseline is the mean response time over the most recent short period, while a periodic baseline is the response data at the same moment in the previous unit period (a working day, a week, or a month).
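Following the definition above, a periodic baseline looks up the value recorded at the same moment one unit period earlier. A minimal sketch, assuming the history is a mapping from `datetime` to value and that a "month" is approximated as 30 days (both assumptions); the hypothetical `tolerance` parameter absorbs small collection jitter so that the nearest sample within it is used.

```python
from datetime import datetime, timedelta


def periodic_baseline(history, at, period, tolerance):
    """Value recorded at the same moment one period before `at`.

    history: dict mapping datetime -> value. The sample closest to
    (at - period) is used if it lies within `tolerance`; otherwise None.
    """
    target = at - period
    best = None  # (distance, value)
    for ts, value in history.items():
        distance = abs(ts - target)
        if distance <= tolerance and (best is None or distance < best[0]):
            best = (distance, value)
    return None if best is None else best[1]


DAY, WEEK, MONTH = timedelta(days=1), timedelta(weeks=1), timedelta(days=30)
history = {datetime(2014, 7, 1, 10, 0): 120, datetime(2014, 7, 7, 10, 0): 3000}
print(periodic_baseline(history, datetime(2014, 7, 8, 10, 0), WEEK, timedelta(minutes=2)))
```

Averaging several previous periods instead of one would smooth out a single abnormal day; that variant is a design choice, not something the description specifies.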
After slack-off from above data certainty annuity, obtain the response time of the page of other business from application performance data, its page response time variations is to about 1500ms.
The application fault causes were determined to be:
1. Frequent operations on large volumes of disk data.
2. A disk buffer that is too small, or excessive disk fragmentation.
3. Insufficient physical memory, leading to excessive physical memory usage that slows data reads.
4. Occasional anomalies on URL pages associated with the business system, caused by improper use by operations and maintenance staff. (The system has organized the application's URLs, so each URL access can be mapped to an application operation, such as Report Operations.)
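The step from anomalous indicators to causes can be expressed as a small rule table. The sketch below is an illustrative assumption and not the patented rule set. The indicator labels (`si_high`, `url_slow`, and so on) are hypothetical, and the rules paraphrase the four causes listed above. Because the source does not say how causes 1 and 2 are told apart, the same disk evidence supports both of them as candidates.

```python
from typing import FrozenSet, List, Tuple

# Each rule: indicators that must all be anomalous, and the candidate cause.
# Hypothetical rules paraphrasing the four causes listed in the embodiment.
RULES: List[Tuple[FrozenSet[str], str]] = [
    (frozenset({"bi_high", "bo_high"}), "frequent operations on large volumes of disk data"),
    (frozenset({"bi_high", "bo_high"}), "disk buffer too small or disk too fragmented"),
    (frozenset({"si_high", "so_high", "free_mem_low"}), "insufficient physical memory"),
    (frozenset({"db_access_high", "url_slow"}), "interference from a specific business operation"),
]


def infer_causes(anomalies: FrozenSet[str]) -> List[str]:
    """Return every candidate cause whose required indicators are all anomalous."""
    return [cause for required, cause in RULES if required <= anomalies]
```

Requiring all indicators of a rule to be present means a single noisy metric cannot trigger a cause on its own.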
Fault diagnosis recommendations:
1. Reduce the frequency of operations on disk data.
2. Enlarge the disk buffer or defragment the disk.
3. Add physical memory to reduce physical memory occupancy.
4. Determine whether the interfering operations are tied to a particular operation type, and adjust the items causing the interference.
The diagnostic results above show the limits of existing methods. A diagnosis based on monitoring data alone could only identify the memory and disk anomalies. A diagnosis based on performance data alone could only identify the occasional URL anomalies and the associated pages. Existing methods therefore give a one-sided diagnosis, which delays the recovery of the service application from the anomaly.
Although the embodiments of this application are disclosed above, the content described is only the embodiments adopted to facilitate understanding of the application and is not intended to limit it. Any person skilled in the art to which this application pertains may make modifications and changes to the form and details of implementation without departing from the spirit and scope disclosed in this application. The scope of patent protection of this application, however, shall still be defined by the appended claims.
Claims (18)
1. A method for implementing application fault diagnosis, characterized in that it comprises:
collecting multidimensional application data;
when a service application becomes abnormal, obtaining from the collected multidimensional application data, according to the service anomaly type, the associated diagnosis data related to the service anomaly, based on their temporal and spatial correlation with the service anomaly;
comparing the obtained associated diagnosis data related to the service anomaly with the historical diagnosis data of each item of associated diagnosis data, respectively, to determine the application fault type.
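Steps 2 and 3 of claim 1 can be illustrated with a compact sketch. It is an assumption-laden illustration, not the claimed implementation, and it assumes that step 1 (collection) has already produced a list of records. Each record carries a source, a host, a timestamp, a metric, and a value. "Temporal correlation" is approximated as a window of plus or minus `window` seconds around the anomaly time. "Spatial correlation" is approximated as membership in a set of related hosts, such as the service application server IP and destination addresses. The mappings `SOURCES_BY_ANOMALY` and `FAULT_TYPE_BY_METRIC` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Record:
    source: str       # "monitor", "flow" or "performance"
    host: str         # IP address the record was extracted for
    timestamp: float  # seconds since the epoch
    metric: str
    value: float


# Hypothetical: which data sources to examine for each anomaly type.
SOURCES_BY_ANOMALY: Dict[str, Set[str]] = {
    "slow_response": {"monitor", "flow", "performance"},
    "connection_failure": {"monitor", "flow"},
}

# Hypothetical: fault type implied by an out-of-range metric.
FAULT_TYPE_BY_METRIC: Dict[str, str] = {
    "si": "memory", "so": "memory", "bi": "disk", "bo": "disk",
    "response_ms": "application",
}


def associated_data(records: Iterable[Record], anomaly_type: str, anomaly_time: float,
                    hosts: Set[str], window: float) -> List[Record]:
    """Step 2: keep records related to the anomaly in time, space and source."""
    wanted = SOURCES_BY_ANOMALY.get(anomaly_type, set())
    return [
        r for r in records
        if r.source in wanted
        and r.host in hosts
        and abs(r.timestamp - anomaly_time) <= window
    ]


def fault_types(selected: Iterable[Record],
                history: Dict[str, Tuple[float, float]]) -> Set[str]:
    """Step 3: compare each record with its historical (low, high) range."""
    found = set()
    for r in selected:
        low, high = history.get(r.metric, (float("-inf"), float("inf")))
        if not low <= r.value <= high:
            found.add(FAULT_TYPE_BY_METRIC.get(r.metric, "unknown"))
    return found
```

Because records from all sources pass through the same filter, a memory anomaly from the monitoring data and a response-time anomaly from the performance data can surface together. This is the multidimensional view that the embodiment argues for.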
2. The method according to claim 1, characterized in that the multidimensional application data comprise: monitoring data extracted according to the IP address of the service application server; flow data extracted according to the service application server IP address and the destination address; and application performance data extracted according to the service application server IP address and the destination address.
3. The method according to claim 2, characterized in that the monitoring data comprise at least: the IP address, and/or the monitoring time, and/or CPU utilization, and/or disk utilization, and/or disk input/output (io), and/or memory-related information, and/or swap-space-related information, and/or network-interface-related information, and/or database response time, and/or the amount of memory swapped in from disk (si), and/or the amount of memory swapped out to disk (so), and/or the volume written from memory to disk (bo), and/or the volume read from disk into memory (bi), and/or the service state.
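Several monitoring fields in claim 3 (si, so, bi, bo) share their names with columns of the Linux `vmstat` command. In `vmstat`, `si`/`so` are the amounts of memory swapped in from and out to disk per second, and `bi`/`bo` are the blocks received from and sent to a block device per second. The source does not say that `vmstat` is used. The sketch below only assumes it as one possible collector. The column set varies between procps versions, so the short-name header line is parsed rather than hard-coded. The sample values are hypothetical.

```python
from typing import Dict

# Fields of interest for claim 3 (column names as printed by procps vmstat).
MONITOR_FIELDS = ("free", "si", "so", "bi", "bo")


def parse_vmstat(header_line: str, value_line: str) -> Dict[str, int]:
    """Map vmstat column names to integer values for the fields of interest.

    `header_line` is the second header line of vmstat output (the one with
    the short column names); `value_line` is one sample line.
    """
    names = header_line.split()
    values = value_line.split()
    if len(names) != len(values):
        raise ValueError("header and sample lines have different column counts")
    row = dict(zip(names, (int(v) for v in values)))
    return {name: row[name] for name in MONITOR_FIELDS if name in row}


if __name__ == "__main__":
    header = " r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st"
    # Hypothetical sample resembling the slowdown described in the embodiment.
    sample = " 2  1 524288 112640  20480 409600  812  903   640   615  900 1500 20 10 50 20  0"
    print(parse_vmstat(header, sample))
```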
4. The method according to claim 2, characterized in that the flow data relate to a session uniquely identified by an identical five-tuple and comprise at least: the capture time, and/or the source/destination address, and/or the source/destination port, and/or the protocol, and/or the number of SYN handshake packets sent when establishing a TCP/IP connection, and/or the number of packets sent with the FIN code bit set in the TCP header, and/or TCP-related information, and/or the number of abnormal RST packets sent when accessing a specified service, and/or the total traffic per unit time.
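Claim 4 groups traffic into sessions keyed by the five-tuple (source address, destination address, source port, destination port, protocol). A minimal sketch of that aggregation follows. It assumes that packets have already been decoded into the hypothetical `Packet` tuple below. It counts SYN, FIN, and RST flags using their standard bit positions in the TCP flags byte. For simplicity it keeps the two directions of a connection, whose five-tuples are mirrored, as separate sessions. Dividing `total_bytes` by the capture interval would give the traffic per unit time.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, NamedTuple, Tuple

FIN, SYN, RST = 0x01, 0x02, 0x04  # bit positions in the TCP flags byte

FiveTuple = Tuple[str, str, int, int, str]


class Packet(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    flags: int   # TCP flags byte (0 for non-TCP traffic)
    length: int  # packet length in bytes


@dataclass
class SessionStats:
    syn: int = 0
    fin: int = 0
    rst: int = 0
    total_bytes: int = 0


def aggregate(packets: Iterable[Packet]) -> Dict[FiveTuple, SessionStats]:
    """Group packets into sessions keyed by their five-tuple and count flags."""
    sessions: Dict[FiveTuple, SessionStats] = defaultdict(SessionStats)
    for p in packets:
        stats = sessions[(p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.protocol)]
        stats.syn += bool(p.flags & SYN)
        stats.fin += bool(p.flags & FIN)
        stats.rst += bool(p.flags & RST)
        stats.total_bytes += p.length
    return dict(sessions)
```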
5. The method according to claim 2, characterized in that the application performance data comprise at least: the source/destination address, and/or the destination port, and/or the request time, and/or the server response time, and/or the load time, and/or page-related information, and/or HTTP-related information, and/or an overall slowdown in tomcat access speed, and/or an abnormal database access volume per unit time, and/or an abnormal number of current Weblogic sessions;
the application performance data are collected from performance data of the HTTP protocol, and/or performance data of the ORACLE database service, and/or performance data of the MYSQL database server.
6. The method according to claim 1, characterized in that comparing the obtained associated diagnosis data related to the service anomaly with the historical diagnosis data of each item of associated diagnosis data, respectively, to determine the application fault type specifically comprises:
comparing the obtained associated diagnosis data related to the service anomaly with the historical diagnosis data of each item of associated diagnosis data, respectively, by means of a periodic baseline or a moving window baseline, and determining the application fault type according to a preset threshold range for each item of associated diagnosis data.
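Claim 6 combines a baseline (periodic or moving window) with a preset threshold range for each indicator. One plausible reading expresses each threshold range as the allowed ratio of the current value to its baseline. The sketch below is written under that assumption. The metric names, the ratio bounds, and the fault-type labels are all hypothetical.

```python
from typing import Dict, List, NamedTuple


class Threshold(NamedTuple):
    min_ratio: float  # lowest acceptable current/baseline ratio
    max_ratio: float  # highest acceptable current/baseline ratio
    fault_type: str   # fault type reported when the range is violated


# Hypothetical preset threshold ranges.
THRESHOLDS: Dict[str, Threshold] = {
    "response_ms": Threshold(0.0, 3.0, "application response degradation"),
    "free_mem_mb": Threshold(0.5, float("inf"), "insufficient physical memory"),
    "db_access_per_min": Threshold(0.0, 2.0, "abnormal database load"),
}


def determine_fault_types(current: Dict[str, float], baseline: Dict[str, float]) -> List[str]:
    """Return the fault types whose preset threshold range is violated."""
    faults = []
    for metric, value in current.items():
        rule = THRESHOLDS.get(metric)
        base = baseline.get(metric)
        if rule is None or not base:
            continue  # no preset range, or no usable (non-zero) baseline
        ratio = value / base
        if not rule.min_ratio <= ratio <= rule.max_ratio:
            faults.append(rule.fault_type)
    return faults
```

A ratio-based range lets the same threshold serve both baseline types. Only the computation of `baseline` changes between the periodic and moving window variants.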
7. The method according to any one of claims 1 to 6, characterized in that the historical diagnosis data are: monitoring data within a first preset duration; and flow data and real-time application performance data within a second preset duration.
8. The method according to claim 1, characterized in that, when fault diagnosis yields no result, the method further comprises: storing the multidimensional data related to the anomaly, and determining the application fault type again after the historical data have been updated.
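Claim 8 defers cases that cannot yet be diagnosed. A minimal sketch follows. It assumes a `diagnose` callable that returns a fault type, or `None` when the history is insufficient, and that an updated history is delivered as a new diagnoser. The class and method names are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Diagnoser = Callable[[Dict[str, float]], Optional[str]]


class DeferredDiagnosis:
    """Stores anomaly data that could not be diagnosed and retries after history updates."""

    def __init__(self, diagnose: Diagnoser) -> None:
        self._diagnose = diagnose
        self._pending: List[Dict[str, float]] = []

    def submit(self, anomaly_data: Dict[str, float]) -> Optional[str]:
        """Diagnose now; keep the data for later if no result is found."""
        result = self._diagnose(anomaly_data)
        if result is None:
            self._pending.append(anomaly_data)
        return result

    def on_history_updated(self, diagnose: Diagnoser) -> List[str]:
        """Re-run pending cases with a diagnoser built from the updated history."""
        self._diagnose = diagnose
        still_pending: List[Dict[str, float]] = []
        resolved: List[str] = []
        for data in self._pending:
            result = diagnose(data)
            if result is None:
                still_pending.append(data)
            else:
                resolved.append(result)
        self._pending = still_pending
        return resolved

    @property
    def pending_count(self) -> int:
        return len(self._pending)
```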
9. The method according to any one of claims 1 to 8, characterized in that the method further comprises: providing fault recovery recommendations from the historical diagnosis data according to the determined application fault type.
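Claim 9 draws recovery recommendations from historical diagnosis data. One simple approach is sketched below. It assumes that past diagnoses are stored as (fault type, recommendation) pairs and returns the recommendations most often recorded for the same fault type. The fault-type labels are hypothetical. The recommendation texts are reused from the embodiment.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def recommend(fault_type: str, history: Iterable[Tuple[str, str]], top_n: int = 3) -> List[str]:
    """Return the recommendations most often recorded for `fault_type`, most frequent first."""
    counts = Counter(rec for ftype, rec in history if ftype == fault_type)
    return [rec for rec, _ in counts.most_common(top_n)]


if __name__ == "__main__":
    # Hypothetical history; the recommendation texts come from the embodiment.
    past = [
        ("disk", "Reduce the frequency of operations on disk data"),
        ("disk", "Enlarge the disk buffer or defragment the disk"),
        ("disk", "Reduce the frequency of operations on disk data"),
        ("memory", "Add physical memory to reduce physical memory occupancy"),
    ]
    print(recommend("disk", past))
```

Ranking by frequency favors the remedies that have been applied most often for that fault type. A real system might weight recommendations by whether they actually resolved the fault.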
10. A device for implementing application fault diagnosis, characterized in that it comprises: a collecting unit, an acquiring unit, and a fault diagnosis unit; wherein
the collecting unit is configured to collect multidimensional application data;
the acquiring unit is configured to, when a service application becomes abnormal, obtain from the collected multidimensional application data, according to the service anomaly type, the associated diagnosis data related to the service anomaly, based on their temporal and spatial correlation with the service anomaly;
the fault diagnosis unit is configured to compare the obtained associated diagnosis data related to the service anomaly with the historical diagnosis data of each item of associated diagnosis data, respectively, to determine the application fault type.
11. The device according to claim 10, characterized in that the multidimensional application data comprise: monitoring data extracted according to the IP address of the service application server; flow data extracted according to the service application server IP address and the destination address; and application performance data extracted according to the service application server IP address and the destination address.
12. The device according to claim 10, characterized in that the monitoring data comprise at least: the IP address, and/or the monitoring time, and/or CPU utilization, and/or disk utilization, and/or disk input/output (io), and/or memory-related information, and/or swap-space-related information, and/or network-interface-related information, and/or database response time, and/or the amount of memory swapped in from disk (si), and/or the amount of memory swapped out to disk (so), and/or the volume written from memory to disk (bo), and/or the volume read from disk into memory (bi), and/or the service state.
13. The device according to claim 10, characterized in that the flow data relate to a session uniquely identified by an identical five-tuple and comprise at least: the capture time, and/or the source/destination address, and/or the source/destination port, and/or the protocol, and/or the number of SYN handshake packets sent when establishing a TCP/IP connection, and/or the number of packets sent with the FIN code bit set in the TCP header, and/or TCP-related information, and/or the number of abnormal RST packets sent when accessing a specified service, and/or the total traffic per unit time.
14. The device according to claim 10, characterized in that the application performance data comprise at least: the source/destination address, and/or the destination port, and/or the request time, and/or the server response time, and/or the load time, and/or page-related information, and/or HTTP-related information, and/or an overall slowdown in tomcat access speed, and/or an abnormal database access volume per unit time, and/or an abnormal number of current Weblogic sessions;
the application performance data are collected from performance data of the HTTP protocol, and/or performance data of the ORACLE database service, and/or performance data of the MYSQL database server.
15. The device according to claim 10, characterized in that the fault diagnosis unit is specifically configured to compare the obtained associated diagnosis data related to the service anomaly with the historical diagnosis data of each item of associated diagnosis data, respectively, by means of a periodic baseline or a moving window baseline, and to determine the application fault type according to a preset threshold range for each item of associated diagnosis data.
16. The device according to any one of claims 10 to 15, characterized in that the historical diagnosis data are: monitoring data within a first preset duration; and flow data and real-time application performance data within a second preset duration.
17. The device according to claim 10, characterized in that the device further comprises a follow-up diagnosis unit configured to store the multidimensional data related to the anomaly when fault diagnosis yields no result, and to determine the application fault type again after the historical data have been updated.
18. The device according to any one of claims 10 to 17, characterized in that the device further comprises a recovery recommendation unit configured to provide fault recovery recommendations from the historical diagnosis data according to the determined application fault type.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410324069.XA | 2014-07-08 | 2014-07-08 | Method and device for achieving application fault diagnosis |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105320585A | 2016-02-10 |
CN105320585B | 2019-04-02 |
Family ID: 55248005
Family Applications (1)
Application Number | Status | Title | Priority Date | Filing Date |
---|---|---|---|---|
CN201410324069.XA | Active | Method and device for achieving application fault diagnosis | 2014-07-08 | 2014-07-08 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105320585B |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |