CN105320585A

CN105320585A - Method and device for achieving application fault diagnosis

Info

Publication number: CN105320585A
Application number: CN201410324069.XA
Authority: CN
Inventors: 谌颐; 胡盛华
Original assignee: Beijing Venus Information Security Technology Co Ltd; Beijing Venus Information Technology Co Ltd
Current assignee: Beijing Venus Information Security Technology Co Ltd; Venus Info Tech Inc; Beijing Venus Information Technology Co Ltd
Priority date: 2014-07-08
Filing date: 2014-07-08
Publication date: 2016-02-10
Anticipated expiration: 2034-07-08
Also published as: CN105320585B

Abstract

The invention discloses a method and device for implementing application fault diagnosis. The method comprises: collecting multi-dimensional application data; when a service application becomes abnormal, obtaining, according to the service anomaly type, the associated diagnosis data related to the anomaly from the collected multi-dimensional application data, based on the temporal and spatial association of the anomaly; and comparing the obtained associated diagnosis data with the historical diagnosis data of each item to determine the application fault type. Because the fault diagnosis is performed on multi-dimensional application data, the one-sided results that arise when diagnosis relies on a single data source are avoided, service faults are identified more comprehensively, and the service anomaly can be resolved.

Description

Method and device for implementing application fault diagnosis

Technical field

The present invention relates to the field of computer applications, and in particular to a method and device for implementing application fault diagnosis.

Background

With the development of IT applications, enterprises' business processes are increasingly tightly integrated with Internet technology, and application information systems composed of servers, databases, middleware and so on are becoming ever more complex. Even as the skill requirements for technical staff are gradually raised, troubleshooting keeps getting harder. The operating quality of a service application (its ability to complete services, its speed and its stability) directly determines the level of service an enterprise can offer its users. Monitoring and managing the performance of mission-critical applications, and promptly and effectively analyzing and diagnosing the problems found during performance monitoring, is therefore an urgent requirement for improving the availability of service applications.

At present, performance monitoring and management of service applications mainly covers the following aspects: 1. monitoring access to the application; 2. when the service application exhibits a performance anomaly, determining whether it is caused by an anomaly in network system performance; 3. when the service application exhibits an access anomaly, determining whether it is caused by an attack on the network or the application. Diagnosing service application faults effectively helps technical staff restore the service application promptly.

Existing fault diagnosis for service applications mainly analyzes faults from a single data source, such as traffic data or monitoring data (for example, application logs). Because the data used for fault analysis come from a single source, the diagnosis result tends to be one-sided or insufficient, and considerable manual involvement is needed to complete the diagnosis.

Summary of the invention

To solve the above technical problem, the invention provides a method and device for implementing application fault diagnosis that can diagnose service faults comprehensively from multi-dimensional data and reduce manual involvement.

To achieve the above object, the invention discloses a method for implementing application fault diagnosis, comprising:

collecting multi-dimensional application data;

when a service application becomes abnormal, obtaining, according to the service anomaly type, the associated diagnosis data related to the service anomaly from the collected multi-dimensional application data, based on the temporal and spatial association of the service anomaly;

comparing the obtained associated diagnosis data related to the service anomaly with the historical diagnosis data of each item of associated diagnosis data, respectively, to determine the application fault type.

Further, the multi-dimensional application data comprise: monitoring data extracted according to the service application server IP; traffic data extracted according to the service application server IP and the destination address; and application performance data extracted according to the service application server IP and the destination address.

Further, the monitoring data comprise one or more of the following: IP address, monitoring time, CPU utilization, disk utilization, disk input/output (io), memory-related information, swap space information, network interface information, database response time, memory swapped in from disk (si), memory swapped out to disk (so), amount written from memory to disk (bo), amount read from disk into memory (bi), and service state.

Further, the traffic data describe sessions, each uniquely identified by the same five-tuple, and comprise one or more of the following: collection time, source/destination address, source/destination port, protocol, number of SYN packets sent (the handshake used when establishing a TCP/IP connection), number of FIN packets sent (a code-bit field of the TCP header), TCP-related information, number of RST packets sent, and abnormal total traffic to a specified service per unit time.

Further, the application performance data comprise one or more of the following: source/destination address, destination port, request time, server response time, load time, page-related information, HTTP-related information, a drop in overall Tomcat access speed, abnormal database access volume per unit time, and an abnormal number of current WebLogic sessions;

the application performance data are collected from the performance data of the HTTP protocol and/or of the ORACLE database service and/or of the MySQL database server.

Further, comparing the obtained associated diagnosis data related to the service anomaly with the historical diagnosis data of each item, respectively, to determine the application fault type specifically comprises:

comparing the associated diagnosis data related to the obtained service anomaly with the historical diagnosis data of each item, respectively, by means of a periodic baseline or a moving-window baseline, and determining the application fault type according to a preset threshold range for each item of associated diagnosis data.

Further, the historical diagnosis data are: the monitoring data within a first preset duration; the traffic data within a second preset duration; and real-time application performance data.

Further, when the fault diagnosis yields no result, the method further comprises: storing the multi-dimensional data related to the anomaly, and determining the application fault type again after the historical data have been updated.

Further, the method further comprises: providing fault recovery suggestions from the historical diagnosis data according to the determined application fault type.

In another aspect, the application further provides a device for implementing application fault diagnosis, comprising a collecting unit, an acquiring unit and a fault diagnosis unit, wherein:

the collecting unit is configured to collect multi-dimensional application data;

the acquiring unit is configured to, when a service application becomes abnormal, obtain, according to the service anomaly type, the associated diagnosis data related to the service anomaly from the collected multi-dimensional application data, based on the temporal and spatial association of the service anomaly;

the fault diagnosis unit is configured to compare the obtained associated diagnosis data related to the service anomaly with the historical diagnosis data of each item, respectively, to determine the application fault type.

Further, the fault diagnosis unit is specifically configured to compare the associated diagnosis data related to the obtained service anomaly with the historical diagnosis data of each item, respectively, by means of a periodic baseline or a moving-window baseline, and to determine the application fault type according to a preset threshold range for each item of associated diagnosis data.

Further, the historical diagnosis data are: the monitoring data within a first preset duration; the traffic data within a second preset duration; and real-time application performance data.

Further, the device also comprises a follow-up diagnosis unit, configured to store the multi-dimensional data related to the anomaly when the fault diagnosis yields no result, and to determine the application fault type again after the historical data have been updated.

Further, the device also comprises a recovery suggestion unit, configured to provide fault recovery suggestions from the historical diagnosis data according to the determined application fault type.

The technical solution comprises: collecting multi-dimensional application data; when a service application becomes abnormal, obtaining, according to the service anomaly type, the associated diagnosis data related to the service anomaly from the collected multi-dimensional application data, based on the temporal and spatial association of the anomaly; and comparing the obtained associated diagnosis data with the historical diagnosis data of each item, respectively, to determine the application fault type and analyze the cause of the fault. By diagnosing service application anomalies from multi-dimensional application data, the invention avoids the one-sided diagnosis that results from relying on a single data source, identifies service faults more comprehensively, and resolves the service anomaly.

Brief description of the drawings

The accompanying drawings described herein are provided for a further understanding of the invention and form part of this application. The illustrative embodiments of the invention and their description serve to explain the invention and do not constitute an undue limitation of it. In the drawings:

Fig. 1 is a flowchart of the method for implementing application fault diagnosis according to the invention;

Fig. 2 is a structural block diagram of the device for implementing application fault diagnosis according to the invention.

Detailed description

Fig. 1 is a flowchart of the method for implementing application fault diagnosis. As shown in Fig. 1, the method comprises:

Step 100: collecting multi-dimensional application data.

In this step, the collected multi-dimensional application data comprise: monitoring data extracted according to the service application server IP; traffic data extracted according to the service application server IP and the destination address; and application performance data extracted according to the service application server IP and the destination address.

Further, the monitoring data comprise one or more of the following: IP address, monitoring time, CPU utilization, disk utilization, disk input/output (io), memory-related information, swap space information, network interface information, database response time, memory swapped in from disk (si), memory swapped out to disk (so), amount written from memory to disk (bo), amount read from disk into memory (bi), and service state.
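The si, so, bi and bo items appear to correspond to the columns of the same names printed by the Linux `vmstat` utility (memory swapped in, memory swapped out, blocks read in, blocks written out). As a minimal sketch, assuming a Linux host and the standard two-line `vmstat` header, a monitoring collector could read those fields as follows. The function name `parse_vmstat` is hypothetical and the sample text is synthetic, not a measurement from the patent.

```python
def parse_vmstat(output: str) -> dict:
    """Map each vmstat column name to the value on the last sample line."""
    lines = [ln for ln in output.strip().splitlines() if ln.strip()]
    # Line 0 is the group banner (procs/memory/swap/io/...); line 1 holds the column names.
    header = lines[1].split()
    values = lines[-1].split()
    return {name: int(value) for name, value in zip(header, values)}


# Synthetic sample in the usual vmstat layout (illustrative numbers only).
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  1 524288 102400  20480 307200  850  920   640   610  900 1500 20 10 50 20  0
"""

sample = parse_vmstat(SAMPLE)
# Keep only the swap and block-I/O indicators named in the monitoring data.
swap_and_io = {k: sample[k] for k in ("si", "so", "bi", "bo")}
print(swap_and_io)  # {'si': 850, 'so': 920, 'bi': 640, 'bo': 610}
```

In `vmstat`, si/so are reported per second of swap traffic and bi/bo in blocks per second, so any thresholds on these fields should use the same units.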

The traffic data describe sessions, each uniquely identified by the same five-tuple, and comprise one or more of the following: collection time, source/destination address, source/destination port, protocol, number of SYN packets sent (the handshake used when establishing a TCP/IP connection), number of FIN packets sent (a code-bit field of the TCP header), TCP-related information, number of RST packets sent, and abnormal total traffic to a specified service per unit time. Here, the TCP-related information comprises the number of TCP retransmissions, the number of TCP checksum errors, the number of abnormally closed TCP connections, and so on.
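Grouping traffic into sessions by five-tuple can be sketched as below, assuming packet records arrive as dictionaries with hypothetical fields `src`, `dst`, `sport`, `dport`, `proto`, `len` and `flags` (TCP flag letters). The two endpoints are sorted so that both directions of a conversation fall into the same session; the patent does not say how direction is handled, so this normalization is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, NamedTuple


class FiveTuple(NamedTuple):
    addr_a: str
    addr_b: str
    port_a: int
    port_b: int
    protocol: str


@dataclass
class SessionStats:
    packets: int = 0
    byte_count: int = 0
    syn: int = 0  # packets carrying SYN (a SYN-ACK counts too)
    fin: int = 0
    rst: int = 0


def session_key(pkt: dict) -> FiveTuple:
    """Build a direction-independent five-tuple by ordering the two endpoints."""
    a = (pkt["src"], pkt["sport"])
    b = (pkt["dst"], pkt["dport"])
    lo, hi = sorted([a, b])
    return FiveTuple(lo[0], hi[0], lo[1], hi[1], pkt["proto"])


def aggregate(packets: Iterable[dict]) -> Dict[FiveTuple, SessionStats]:
    """Accumulate per-session packet, byte and SYN/FIN/RST counts."""
    sessions: Dict[FiveTuple, SessionStats] = defaultdict(SessionStats)
    for pkt in packets:
        stats = sessions[session_key(pkt)]
        flags = pkt.get("flags", "")
        stats.packets += 1
        stats.byte_count += pkt["len"]
        stats.syn += int("S" in flags)
        stats.fin += int("F" in flags)
        stats.rst += int("R" in flags)
    return dict(sessions)


packets = [
    {"src": "10.0.0.5", "sport": 51000, "dst": "10.0.0.9", "dport": 80, "proto": "TCP", "len": 60, "flags": "S"},
    {"src": "10.0.0.9", "sport": 80, "dst": "10.0.0.5", "dport": 51000, "proto": "TCP", "len": 60, "flags": "SA"},
    {"src": "10.0.0.5", "sport": 51000, "dst": "10.0.0.9", "dport": 80, "proto": "TCP", "len": 40, "flags": "R"},
]
for key, stats in aggregate(packets).items():
    print(key, stats)
```

Per-unit-time indicators such as "SYN packets sent" or "RST packets sent" would then be sums of these per-session counters over the collection interval.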

The application performance data comprise one or more of the following: source/destination address, destination port, request time, server response time, load time, page-related information, HTTP-related information, a drop in overall Tomcat access speed, abnormal database access volume per unit time, and an abnormal number of current WebLogic sessions;

Here, Tomcat is an existing web application server, and WebLogic is web middleware used for Java applications.

The application performance data are collected from the performance data of the HTTP protocol and/or of the ORACLE database service and/or of the MySQL database server.

Here, the page-related information comprises the page download time, the proportion of pages that have slowed down, and so on.

The HTTP-related information comprises the HTTP access rate, the HTTP error rate, abnormal numbers of HTTP accesses per unit time, and so on.

Step 101: when a service application becomes abnormal, obtaining, according to the service anomaly type, the associated diagnosis data related to the service anomaly from the collected multi-dimensional application data, based on the temporal and spatial association of the service anomaly.

It should be noted that the temporal and spatial association of a service anomaly means the following: according to the time at which the anomaly occurred, associated diagnosis data are obtained from the time information that can be determined in the multi-dimensional data, and further associated diagnosis data are obtained from the information of the protocol layers involved.
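The temporal and spatial association can be sketched as a filter: keep the records whose timestamp falls in a window around the anomaly and whose server IP matches the affected server. This is an illustration under assumptions: the `Record` type, its field names and the window parameters are hypothetical, and matching on server IP stands in for the protocol-layer association described above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Record:
    source: str          # hypothetical labels: "monitor", "flow" or "apm"
    timestamp: datetime
    server_ip: str
    metric: str
    value: float


def associate(records: List[Record], anomaly_time: datetime, server_ip: str,
              before: timedelta, after: timedelta) -> List[Record]:
    """Keep records from the affected server inside [anomaly_time - before, anomaly_time + after]."""
    start, end = anomaly_time - before, anomaly_time + after
    return [r for r in records
            if r.server_ip == server_ip and start <= r.timestamp <= end]


t0 = datetime(2014, 7, 8, 10, 0)
records = [
    Record("monitor", t0 + timedelta(minutes=2), "10.0.0.9", "si", 850.0),
    Record("apm", t0 + timedelta(minutes=3), "10.0.0.9", "server_response_time", 3500.0),
    Record("monitor", t0 - timedelta(minutes=30), "10.0.0.9", "si", 90.0),  # outside the window
    Record("flow", t0 + timedelta(minutes=1), "10.0.0.7", "bps", 1.0e6),    # different server
]
related = associate(records, t0, "10.0.0.9", before=timedelta(0), after=timedelta(minutes=5))
print([r.metric for r in related])  # ['si', 'server_response_time']
```

The five-minute window after the slowdown mirrors Embodiment 1; records from the monitoring, traffic and application performance collectors pass through the same filter, which is what lets them be diagnosed together.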

Because service application anomalies are complex, those skilled in the art will understand that they cannot be listed exhaustively. To illustrate the invention clearly, several common service application anomalies are described here, together with a brief list of some of the associated diagnosis data involved.

It should be noted that the service anomaly types are a summary of anomaly categories derived by those skilled in the art from experience. The common service anomaly types and the associated diagnosis data they involve are as follows:

1. Service availability anomaly of the service application, including diagnosis of availability anomalies of hosts, databases, middleware, service access and so on. The associated diagnosis data mainly involved include service state (start/stop), CPU utilization, disk utilization, memory utilization parameters, etc. These anomalies are mainly detected from the monitoring data.

2. Server response anomaly of the service application. The associated diagnosis data mainly involved include application request time, application page download time, page slowdown ratio, HTTP access rate, HTTP error rate, server response time, database response time, memory swapped in from disk (si), memory swapped out to disk (so), free memory, amount written from memory to disk (bo), amount read from disk into memory (bi), CPU utilization, etc. Of these indicators, the first six are application performance data and the remaining ones are monitoring data.

3. Service access anomaly of the service application. The associated diagnosis data mainly involved include abnormal total traffic to a specified service per unit time, an abnormal number of HTTP accesses per unit time, a drop in overall Tomcat access speed, abnormal database access volume per unit time, and an abnormal number of current WebLogic sessions. Of these indicators, the first comes from the traffic collector and the others come from the application collector.

4. Traffic anomaly of the service application. The associated diagnosis data mainly involved include abnormal protocol-proportion events (abnormal TCP/UDP/ICMP/IGMP proportions) and traffic anomalies (bps, pps, sessions). These indicators mainly come from the traffic collector.

5. Service performance anomaly of the service application. The associated diagnosis data mainly involved include service performance monitoring anomalies.

6. Service state anomaly of the service application. The associated diagnosis data mainly involved include service state (start/stop) and service state monitoring anomalies.

7. Anomaly of the service application caused by a network attack. The associated diagnosis data mainly involved include an abnormal number of SYN packets sent per unit time, an abnormal average packet length, and worm alarms on the link: CodeRed, Hard Disk Killer, SqlSlammer, Blaster, Blaster Killer (Welchia), Sasser, mail worms, WinNuke attacks, and UdpFragmentFlood. These indicators mainly come from the traffic collector.

8. Link anomaly of the service application. The associated diagnosis data mainly involved include Layer 2 traffic anomalies, TCP packet retransmission rate, TCP checksum error rate, number of abnormally closed TCP connections, etc. These indicators come from the traffic collector and the application collector.
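The anomaly types above act as a lookup from anomaly type to the indicators worth fetching. A minimal sketch, assuming hypothetical metric identifiers (the patent names the indicators in prose only), is a table plus a selector that keeps only the relevant metrics from a snapshot:

```python
from typing import Dict, List

# Hypothetical metric identifiers for the eight anomaly types listed above.
ANOMALY_INDICATORS: Dict[str, List[str]] = {
    "availability": ["service_state", "cpu_util", "disk_util", "mem_util"],
    "server_response": ["request_time", "page_download_time", "page_slowdown_ratio",
                        "http_access_rate", "http_error_rate", "server_response_time",
                        "db_response_time", "si", "so", "free_mem", "bo", "bi", "cpu_util"],
    "service_access": ["service_total_traffic", "http_access_count", "tomcat_access_speed",
                       "db_access_count", "weblogic_sessions"],
    "traffic": ["protocol_ratio", "bps", "pps", "sessions"],
    "service_performance": ["service_perf_alarm"],
    "service_state": ["service_state", "service_state_alarm"],
    "network_attack": ["syn_count", "avg_packet_len", "worm_alarm"],
    "link": ["l2_traffic", "tcp_retrans_rate", "tcp_checksum_err_rate", "tcp_abnormal_close"],
}


def relevant_metrics(anomaly_type: str, snapshot: Dict[str, float]) -> Dict[str, float]:
    """Keep only the metrics in `snapshot` that the given anomaly type points to."""
    wanted = ANOMALY_INDICATORS[anomaly_type]
    return {name: snapshot[name] for name in wanted if name in snapshot}


snapshot = {"cpu_util": 35.0, "si": 850.0, "bps": 1.2e6}
print(relevant_metrics("server_response", snapshot))  # {'si': 850.0, 'cpu_util': 35.0}
```

Keeping the table as data rather than code matches the note that anomaly types are an experience-based summary: new types can be added without changing the selection logic.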

Step 102: comparing the obtained associated diagnosis data related to the service anomaly with the historical diagnosis data of each item, respectively, to determine the application fault type.

Specifically, the associated diagnosis data related to the obtained service anomaly are compared with the historical diagnosis data of each item by means of a periodic baseline or a moving-window baseline, and the application fault type is determined according to a preset threshold range for each item of associated diagnosis data.
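Step 102 can be sketched as a ratio test against whichever baseline is in use, followed by a rule table that maps flagged metrics to fault types. The patent specifies a preset threshold range per item but not its form; expressing it as a (low, high) band on the current/baseline ratio, the rule table, and all metric names and numbers below are assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple


def flag_deviations(current: Dict[str, float], baseline: Dict[str, float],
                    ranges: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Return metric -> current/baseline ratio for ratios outside the preset (low, high) range."""
    flagged = {}
    for name, value in current.items():
        base = baseline.get(name)
        if base is None or base <= 0 or name not in ranges:
            continue  # no usable baseline or no preset range for this metric
        low, high = ranges[name]
        ratio = value / base
        if not low <= ratio <= high:
            flagged[name] = ratio
    return flagged


# Hypothetical rules: a fault type is reported if any of its metrics is flagged.
FAULT_RULES: List[Tuple[str, Set[str]]] = [
    ("memory/disk pressure", {"si", "so", "bi", "bo", "free_mem"}),
    ("slow page response", {"server_response_time"}),
    ("link anomaly", {"tcp_retrans_rate"}),
]


def classify(flagged: Dict[str, float]) -> List[str]:
    """List the fault types whose metric sets overlap the flagged metrics."""
    return [fault for fault, metrics in FAULT_RULES if not metrics.isdisjoint(flagged)]


current = {"si": 850.0, "free_mem": 110.0, "server_response_time": 3500.0, "tcp_retrans_rate": 0.01}
baseline = {"si": 60.0, "free_mem": 450.0, "server_response_time": 125.0, "tcp_retrans_rate": 0.01}
ranges = {"si": (0.0, 3.0), "free_mem": (0.5, 2.0),
          "server_response_time": (0.0, 3.0), "tcp_retrans_rate": (0.0, 2.0)}
flagged = flag_deviations(current, baseline, ranges)
print(sorted(flagged))    # ['free_mem', 'server_response_time', 'si']
print(classify(flagged))  # ['memory/disk pressure', 'slow page response']
```

A two-sided band is used so that drops (such as free memory) are caught as well as rises.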

In this step, the historical diagnosis data are: the monitoring data within a first preset duration; the traffic data within a second preset duration; and real-time application performance data.

Here, because the monitoring data used mainly consist of log-like data collected over a day, the first preset duration generally covers the monitoring data of several collection cycles. The length of a cycle depends on the type of monitoring data designed for the actual fault conditions, and is generally measured in minutes, with one minute as the smallest unit;

Traffic data are assessed by comparing short-term traffic parameters to detect anomalies, so the second preset duration is generally about 20 seconds.

Of course, the first and second preset durations can be adjusted according to the actual application scenario and requirements.

When the fault diagnosis yields no result, the method of the invention further comprises: storing the multi-dimensional data related to the anomaly, and determining the application fault type again after the historical data have been updated.
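The follow-up diagnosis can be sketched as a holding queue: a case that cannot be diagnosed is parked and re-run whenever the historical data are refreshed. The class name, the callable interface and the example rule (swap-in above three times its historical level) are assumptions for illustration, not the patent's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Case = Dict[str, float]
History = Dict[str, float]
Diagnoser = Callable[[Case, History], Optional[str]]


@dataclass
class FollowUpDiagnosis:
    diagnose: Diagnoser
    pending: List[Case] = field(default_factory=list)

    def submit(self, case: Case, history: History) -> Optional[str]:
        """Diagnose now; park the case if no fault type can be determined yet."""
        result = self.diagnose(case, history)
        if result is None:
            self.pending.append(case)
        return result

    def on_history_updated(self, history: History) -> List[Tuple[Case, str]]:
        """Re-run parked cases against the refreshed history; return those now resolved."""
        resolved, still_pending = [], []
        for case in self.pending:
            result = self.diagnose(case, history)
            if result is None:
                still_pending.append(case)
            else:
                resolved.append((case, result))
        self.pending = still_pending
        return resolved


def simple_diagnoser(case: Case, history: History) -> Optional[str]:
    # Hypothetical rule: swap-in more than 3x its historical level means memory pressure.
    if "si" not in history:
        return None  # no history for this metric yet, so no verdict
    return "memory pressure" if case["si"] > 3 * history["si"] else "normal"


unit = FollowUpDiagnosis(simple_diagnoser)
print(unit.submit({"si": 850.0}, history={}))   # None
print(unit.on_history_updated({"si": 100.0}))   # [({'si': 850.0}, 'memory pressure')]
print(unit.pending)                             # []
```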

The method of the invention further comprises: providing fault recovery suggestions from the historical diagnosis data according to the determined application fault type and cause.
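One simple way to draw recovery suggestions from historical diagnosis data is to look up the actions previously recorded for the same fault type and rank them by how often they were used. The ranking by frequency, the `suggest` helper and the history entries (worded after the suggestions in Embodiment 1) are assumptions for illustration.

```python
from collections import Counter
from typing import List, Tuple


def suggest(fault_type: str, history: List[Tuple[str, str]]) -> List[str]:
    """Return recovery actions previously recorded for this fault type, most frequent first."""
    counts = Counter(action for ftype, action in history if ftype == fault_type)
    return [action for action, _ in counts.most_common()]


# Hypothetical (fault type, recovery action) history entries.
HISTORY = [
    ("disk_io_pressure", "Reduce the frequency of disk data operations"),
    ("disk_io_pressure", "Enlarge the disk buffer or defragment the disk"),
    ("disk_io_pressure", "Reduce the frequency of disk data operations"),
    ("memory_shortage", "Add physical memory"),
]
print(suggest("disk_io_pressure", HISTORY))
```

An unknown fault type returns an empty list, which a caller could treat as "no precedent, escalate to operations staff".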

Fig. 2 is a structural block diagram of the device for implementing application fault diagnosis according to the invention. As shown in Fig. 2, the device comprises:

a collecting unit, an acquiring unit and a fault diagnosis unit, wherein:

the collecting unit is configured to collect multi-dimensional application data;

Here, the multi-dimensional application data comprise: monitoring data extracted according to the service application server IP; traffic data extracted according to the service application server IP and the destination address; and application performance data extracted according to the service application server IP and the destination address.

The monitoring data comprise one or more of the following: IP address, monitoring time, CPU utilization, disk utilization, disk input/output (io), memory-related information, swap space information, network interface information, database response time, memory swapped in from disk (si), memory swapped out to disk (so), amount written from memory to disk (bo), amount read from disk into memory (bi), and service state.

The traffic data describe sessions, each uniquely identified by the same five-tuple, and comprise one or more of the following: collection time, source/destination address, source/destination port, protocol, number of SYN packets sent (the handshake used when establishing a TCP/IP connection), number of FIN packets sent (a code-bit field of the TCP header), TCP-related information, number of RST packets sent, and abnormal total traffic to a specified service per unit time.

The application performance data comprise one or more of the following: source/destination address, destination port, request time, server response time, load time, page (URL) related information, HTTP-related information, a drop in overall Tomcat access speed, abnormal database access volume per unit time, and an abnormal number of current WebLogic sessions;

The fault diagnosis unit is specifically configured to compare the associated diagnosis data related to the obtained service anomaly with the historical diagnosis data of each item, respectively, by means of a periodic baseline or a moving-window baseline, and to determine the application fault type according to a preset threshold range for each item of associated diagnosis data.

The historical diagnosis data are: the monitoring data within a first preset duration; the traffic data within a second preset duration; and real-time application performance data.

The device of the invention further comprises a follow-up diagnosis unit, configured to store the multi-dimensional data related to the anomaly when the fault diagnosis yields no result, and to determine the application fault type again after the historical data have been updated.

The device of the invention further comprises a recovery suggestion unit, configured to provide fault recovery suggestions from the historical diagnosis data according to the determined application fault type.
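The three core units of the device can be sketched as small classes wired in sequence. This is a sketch under assumptions: the sample tuple layout, the class and method names, and the use of a plain historical (low, high) range in the diagnosis unit are illustrative choices, not the patent's design.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Sample = Tuple[float, str, str, float]  # (timestamp in seconds, server_ip, metric, value)


@dataclass
class CollectingUnit:
    samples: List[Sample] = field(default_factory=list)

    def collect(self, sample: Sample) -> None:
        self.samples.append(sample)


@dataclass
class AcquiringUnit:
    indicators: Dict[str, List[str]]  # anomaly type -> metric names

    def acquire(self, samples: List[Sample], anomaly_type: str, server_ip: str,
                t0: float, window: float) -> Dict[str, float]:
        wanted = set(self.indicators[anomaly_type])
        # Later samples overwrite earlier ones, so each metric keeps its latest value in the window.
        return {m: v for ts, ip, m, v in samples
                if ip == server_ip and m in wanted and t0 <= ts <= t0 + window}


@dataclass
class FaultDiagnosisUnit:
    history: Dict[str, Tuple[float, float]]  # metric -> historical (low, high)
    fault_of: Dict[str, str]                 # metric -> fault type it indicates

    def diagnose(self, metrics: Dict[str, float]) -> List[str]:
        faults = {self.fault_of[m] for m, v in metrics.items()
                  if m in self.history and m in self.fault_of
                  and not self.history[m][0] <= v <= self.history[m][1]}
        return sorted(faults)


collector = CollectingUnit()
for s in [(10.0, "10.0.0.9", "si", 850.0), (20.0, "10.0.0.9", "bo", 60.0),
          (30.0, "10.0.0.9", "server_response_time", 3500.0)]:
    collector.collect(s)
acquirer = AcquiringUnit({"server_response": ["si", "bo", "server_response_time"]})
metrics = acquirer.acquire(collector.samples, "server_response", "10.0.0.9", t0=0.0, window=300.0)
diagnoser = FaultDiagnosisUnit(
    history={"si": (0, 120), "bo": (20, 100), "server_response_time": (50, 200)},
    fault_of={"si": "memory/disk pressure", "bo": "memory/disk pressure",
              "server_response_time": "slow page response"},
)
print(diagnoser.diagnose(metrics))  # ['memory/disk pressure', 'slow page response']
```

The follow-up diagnosis and recovery suggestion units would sit after `FaultDiagnosisUnit`, consuming its empty or non-empty result.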

The invention is described in detail below by way of a specific embodiment. The embodiment is intended only to illustrate the content of the invention clearly and does not limit its scope of protection.

Embodiment 1

A business application system had been running stably online for a long time. Over a period of time, data operations in one business data module were found to become sporadically and gradually slower, and the service anomaly gradually spread to other modules, which also began to slow down (though to a relatively smaller degree). The cause of the fault was unknown.

The traditional approach to application fault diagnosis, which diagnoses the application fault step by step mainly through application logs, proceeds as follows:

First, the application logs were checked, the state and configuration of the switches and routers used by the application were checked, and device packet loss, packet error rate and similar data were examined; the network devices were found to be working normally. It was also found that other applications showed no obvious slowdown, which ruled out a network problem.

Because the fault type cannot be diagnosed from the application logs alone, the existing approach has to rely on the following manual steps:

The CPU, memory, system cache and disk I/O of the host running the application were checked from the command line and found to be normal, so no anomaly was detected there.

Next, the operations staff used the command line to check the CPU, memory, system cache and disk I/O of the host running the affected application's database. After repeated checks and comparisons, disk I/O was found to be frequent during the slowdowns and clearly higher than at normal times, so this was recorded as a suspicious item.

The operations staff then examined the communication between the application and the database, continuously capturing packets and analyzing them with a packet analysis tool, and found that the volume of communication data increased about 20 to 40 minutes before each slowdown. This was recorded as a second suspicious item of the fault.

Having identified these two suspicious items, the operations staff suspected that the slowdown was related to the application and asked the application developers to come on site to investigate.

To pin down the fault, the application operation logs were reviewed and the code was walked through, while the application host, the database host and the database operating parameters were monitored continuously. The code walk-through revealed a possible problem with reading raw data when long-interval reports were run, and the application fault was resolved accordingly.

In the process above, the single data source did not permit effective fault diagnosis; the diagnosis was achieved only through a great deal of manual effort.

With the application fault diagnosis system of the invention, the associated diagnosis data for the first 5 minutes after the system slowed down are analyzed. Here, it is assumed that, based on the experience of those skilled in the art, the collection cycle of the monitoring data is 1 minute, so the monitoring data of 5 consecutive cycles are analyzed. In general, when this cycle is set, the alarm cycle for system fault anomalies can also be set to the same value.

The time at which the slow-response fault occurred and the IP address of the business system are used as the temporal and spatial association, and monitoring data are extracted, including the following memory-related and other indicators:

The virtual memory utilization in the memory-related information exceeds 70%, whereas its historical value is below 10%.

The memory swapped in from disk (si) exceeds 800, whereas its historical value is about 0-120.

The memory swapped out to disk (so) exceeds 900, whereas its historical value is about 0-100.

Free physical memory is about 80-140 MB, whereas its historical value is 400-500 MB.

The amount written from memory to disk (bo) is often greater than 600, whereas its historical value is 20-100.

The amount read from disk into memory (bi) often exceeds 600, whereas its historical value is 40-70.

During the slowdown, the database access volume per unit time rises noticeably, while the access rate in the HTTP-related information shows no significant change.

When the system begins to slow down, the URLs that slow down significantly in the HTTP-related information belong to the operation pages of a particular business (querying the system's URL list shows that this URL is the report operation page). The server response time of these pages gradually increases from the historical 50-200 ms to more than 3500 ms;

All of the historical values given above for this embodiment are periodic baseline values.

The moving-window baseline is the mean response time over the most recent short period, while the periodic baseline refers to the response data at the same point in time within a unit period (a working day, a week, a month);
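Following the definitions above, the two baselines can be sketched as below. The window size, the fixed-interval sampling layout and the names `MovingWindowBaseline` and `periodic_baseline` are assumptions; for traffic data, a moving window covering roughly 20 seconds of samples would match the second preset duration mentioned earlier.

```python
from collections import deque
from statistics import mean
from typing import Deque, List, Optional


class MovingWindowBaseline:
    """Mean of the most recent `size` samples (short-term baseline)."""

    def __init__(self, size: int) -> None:
        self.window: Deque[float] = deque(maxlen=size)  # oldest samples drop out automatically

    def update(self, value: float) -> None:
        self.window.append(value)

    def value(self) -> float:
        return mean(self.window)


def periodic_baseline(series: List[float], period: int, t: int) -> Optional[float]:
    """Mean of the samples at the same position in earlier periods.

    `series` is sampled at a fixed interval and `period` is the number of samples
    per period (for example 1440 one-minute samples per day)."""
    past = [series[i] for i in range(t - period, -1, -period)]
    return mean(past) if past else None


mw = MovingWindowBaseline(size=3)
for v in [100, 200, 300, 400]:
    mw.update(v)
print(mw.value())                                                   # 300
print(periodic_baseline([10, 50, 12, 55, 11, 60], period=2, t=5))  # 52.5
```

The moving window adapts quickly to recent conditions, while the periodic baseline preserves daily or weekly patterns, so a rise that is normal at a busy hour is not mistaken for a fault.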

After the system slowdown has been confirmed from the data above, the response times of the pages of other businesses are obtained from the application performance data; their page response times rise to about 1500 ms.
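The embodiment's own figures can be checked mechanically against their historical ranges. In this sketch the observed values are the bounds quoted above (for example, "exceeds 800" is entered as 800 and free memory as the upper end, 140 MB), the metric names are hypothetical, and the source labels follow the text: memory and disk figures are monitoring data, the report page response time is application performance data.

```python
# metric: (source, historical low, historical high, observed value) -- figures from Embodiment 1
EMBODIMENT_1 = {
    "virtual_mem_util_pct":    ("monitor", 0, 10, 70),
    "swap_in_si":              ("monitor", 0, 120, 800),
    "swap_out_so":             ("monitor", 0, 100, 900),
    "free_phys_mem_mb":        ("monitor", 400, 500, 140),
    "write_to_disk_bo":        ("monitor", 20, 100, 600),
    "read_from_disk_bi":       ("monitor", 40, 70, 600),
    "report_page_response_ms": ("apm", 50, 200, 3500),
}


def out_of_range(table):
    """Return metric -> (source, 'above' or 'below') for observations outside the historical range."""
    result = {}
    for metric, (source, low, high, observed) in table.items():
        if observed < low:
            result[metric] = (source, "below")
        elif observed > high:
            result[metric] = (source, "above")
    return result


anomalies = out_of_range(EMBODIMENT_1)
sources = {source for source, _ in anomalies.values()}
print(len(anomalies), sorted(sources))  # 7 ['apm', 'monitor']
```

Every indicator falls outside its range, and the anomalies span both the monitoring and the application performance data, which is why a single-source diagnosis in this embodiment would see only part of the picture.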

The application fault causes determined are:

1. Frequent operations on large amounts of disk data.

2. The disk buffer is too small or the disk is heavily fragmented.

3. Physical memory is too small, so its utilization is too high, which affects data reads.

4. Sporadic anomalies of the URL pages associated with the business system, caused by improper use by operations staff. (The system has organized the URLs, so application URL accesses can be mapped to application operations, such as report operations.)

Fault diagnosis suggestions:

1. Reduce the frequency of disk data operations.

2. Enlarge the disk buffer or defragment the disk.

3. Add physical memory to reduce physical memory utilization.

4. Determine whether the interfering operations are related to a particular type of operation, and adjust the items causing the interference.

As the diagnosis results above show, the existing approach could diagnose only the memory and disk anomalies from the monitoring data, or, using performance data, only the sporadic anomalies of the URL and its associated pages. The results of the existing approach are therefore one-sided, which hinders the timely recovery of the service application from the anomaly.

Although the embodiments disclosed in this application are as described above, they are adopted only to facilitate understanding of the application and are not intended to limit it. Any person skilled in the art to which this application pertains may make modifications and changes in the form and details of implementation without departing from the spirit and scope disclosed by the application, but the scope of patent protection of the application shall still be defined by the appended claims.

Claims

1. A method for implementing application fault diagnosis, characterized by comprising:

collecting multi-dimensional application data;

2. The method according to claim 1, characterized in that the multi-dimensional application data comprise: monitoring data extracted according to the service application server IP; traffic data extracted according to the service application server IP and the destination address; and application performance data extracted according to the service application server IP and the destination address.

3. The method according to claim 2, characterized in that the monitoring data comprise one or more of the following: IP address, monitoring time, CPU utilization, disk utilization, disk input/output (io), memory-related information, swap space information, network interface information, database response time, memory swapped in from disk (si), memory swapped out to disk (so), amount written from memory to disk (bo), amount read from disk into memory (bi), and service state.

4. The method according to claim 2, characterized in that the flow data describe a session uniquely identified by the same five-tuple and comprise at least: the acquisition time, and/or the source/destination address, and/or the source/destination port, and/or the protocol, and/or the number of handshake SYN packets sent when establishing a TCP/IP connection, and/or the number of FIN packets sent (FIN being a code-bit field of the TCP header), and/or TCP-related information, and/or the number of RST packets sent, and/or the total traffic accessing the specified service per unit time.
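Claim 4 keys flow data by the five-tuple of a session. The following is a minimal sketch of that aggregation. It assumes packets have already been decoded into (five-tuple, TCP flag set, byte count) records. `FiveTuple`, `aggregate_sessions` and `service_traffic_per_second` are hypothetical names, and the addresses come from the documentation ranges.

```python
from collections import defaultdict
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


def aggregate_sessions(packets):
    """Aggregate (five_tuple, tcp_flags, byte_count) packet records into one
    counter set per session, keyed by the five-tuple that identifies it."""
    sessions = defaultdict(lambda: {"syn": 0, "fin": 0, "rst": 0, "bytes": 0})
    for key, flags, size in packets:
        stats = sessions[key]
        # Count packets whose TCP header has the corresponding flag set.
        for flag in ("syn", "fin", "rst"):
            if flag.upper() in flags:
                stats[flag] += 1
        stats["bytes"] += size
    return dict(sessions)


def service_traffic_per_second(sessions, dst_ip, dst_port, interval_s):
    """Total bytes sent to one service (dst_ip:dst_port) over the
    collection interval, divided by the interval length in seconds."""
    total = sum(s["bytes"] for k, s in sessions.items()
                if k.dst_ip == dst_ip and k.dst_port == dst_port)
    return total / interval_s


conn = FiveTuple("192.0.2.10", "198.51.100.1", 51000, 8080, "TCP")
packets = [
    (conn, {"SYN"}, 60),
    (conn, {"ACK", "PSH"}, 940),
    (conn, {"FIN", "ACK"}, 60),
    (FiveTuple("192.0.2.11", "198.51.100.1", 51001, 8080, "TCP"), {"RST"}, 40),
]
sessions = aggregate_sessions(packets)
print(sessions[conn])  # {'syn': 1, 'fin': 1, 'rst': 0, 'bytes': 1060}
print(service_traffic_per_second(sessions, "198.51.100.1", 8080, 10))  # 110.0
```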

5. The method according to claim 2, characterized in that the application performance data comprise at least: the source/destination address, and/or the destination port, and/or the request time, and/or the server response time, and/or the load time, and/or page-related information, and/or HTTP-related information, and/or a drop in Tomcat global access speed, and/or an abnormal database access volume per unit time, and/or an abnormal number of current WebLogic sessions;

6. The method according to claim 1, characterized in that comparing the obtained associated diagnosis data related to the service abnormality with the historical diagnosis data of each item of associated diagnosis data, respectively, to determine the application fault type specifically comprises:

7. The method according to any one of claims 1 to 6, characterized in that the historical diagnosis data are: monitoring data within a first preset duration; flow data within a second preset duration; and real-time application performance data.
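Claim 7 states only that monitoring and flow history are bounded by two preset durations, while application performance data are used in real time. The following is a minimal sketch of the window selection. It assumes seven days and 24 hours as the first and second preset durations. Both values, and the `select_history` helper, are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical values: the patent leaves both preset durations unspecified.
FIRST_PRESET = timedelta(days=7)      # look-back for monitoring data
SECOND_PRESET = timedelta(hours=24)   # look-back for flow data


def select_history(records, now, lookback):
    """Return the (timestamp, value) records with now - lookback <= timestamp <= now.
    Real-time application performance data would bypass this filter."""
    start = now - lookback
    return [(ts, value) for ts, value in records if start <= ts <= now]


now = datetime(2014, 7, 8, 12, 0)
monitor = [(now - timedelta(days=d), 40 + d) for d in range(10)]
flow = [(now - timedelta(hours=h), 1000 * h) for h in (1, 12, 30)]
print(len(select_history(monitor, now, FIRST_PRESET)))  # 8
print([v for _, v in select_history(flow, now, SECOND_PRESET)])  # [1000, 12000]
```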

8. The method according to claim 1, characterized in that, when fault diagnosis yields no result, the method further comprises: storing the multi-dimensional data related to the abnormality, and determining the application fault type again after the historical data have been updated.
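Claim 8 defers undiagnosable cases until the historical data are updated. The following is a minimal sketch of that retry loop. It assumes the diagnosis itself is supplied as a function that returns a fault type or `None`. The `FollowUpDiagnosis` class, the toy `lookup` rule and the fault label are hypothetical.

```python
class FollowUpDiagnosis:
    """Keeps abnormality cases that could not be diagnosed and retries them
    whenever the historical data are updated."""

    def __init__(self, diagnose):
        # diagnose(case, history) -> fault type string, or None if undetermined
        self.diagnose = diagnose
        self.pending = []

    def submit(self, case, history):
        fault = self.diagnose(case, history)
        if fault is None:
            self.pending.append(case)  # store the abnormal data for later
        return fault

    def on_history_updated(self, history):
        """Re-diagnose stored cases; return (case, fault) pairs now resolved."""
        resolved, still_pending = [], []
        for case in self.pending:
            fault = self.diagnose(case, history)
            if fault is None:
                still_pending.append(case)
            else:
                resolved.append((case, fault))
        self.pending = still_pending
        return resolved


def lookup(case, history):
    # Toy rule: a case becomes diagnosable once its symptom appears in history.
    return history.get(case)


fd = FollowUpDiagnosis(lookup)
print(fd.submit("session count spike", {}))  # None
print(fd.on_history_updated({"session count spike": "connection pool exhausted"}))
# [('session count spike', 'connection pool exhausted')]
print(fd.pending)  # []
```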

9. The method according to any one of claims 1 to 8, characterized in that the method further comprises: providing, according to the determined application fault type, a fault recovery suggestion drawn from the historical diagnosis data.
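Claim 9 derives recovery suggestions from historical diagnosis data but does not specify how. One simple reading, sketched here as an assumption, ranks the recovery actions that resolved earlier incidents of the same fault type by how often they worked. The record layout, the sample history and the `recovery_advice` function are hypothetical.

```python
from collections import Counter


def recovery_advice(fault_type, history, top=3):
    """Rank recovery actions that resolved past incidents of the same fault type."""
    counts = Counter(
        record["action"]
        for record in history
        if record["fault_type"] == fault_type and record["resolved"]
    )
    return [action for action, _ in counts.most_common(top)]


history = [
    {"fault_type": "memory leak", "action": "restart application server", "resolved": True},
    {"fault_type": "memory leak", "action": "increase heap size", "resolved": False},
    {"fault_type": "memory leak", "action": "restart application server", "resolved": True},
    {"fault_type": "memory leak", "action": "roll back release", "resolved": True},
    {"fault_type": "disk full", "action": "clean up logs", "resolved": True},
]
print(recovery_advice("memory leak", history))
# ['restart application server', 'roll back release']
```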

10. A device for implementing application fault diagnosis, characterized by comprising: a collecting unit, an acquiring unit and a fault diagnosis unit; wherein

the collecting unit is configured to collect multi-dimensional application data;

11. The device according to claim 10, characterized in that the multi-dimensional application data comprise: monitoring data extracted according to the IP address of the service application server; flow data extracted according to the service application server IP address and the destination address; and application performance data extracted according to the service application server IP address and the destination address.

12. The device according to claim 10, characterized in that the monitoring data comprise at least: the IP address, and/or the monitoring time, and/or CPU utilization, and/or disk utilization, and/or disk input/output (io), and/or memory-related information, and/or swap-space-related information, and/or network-interface-related information, and/or the database response time, and/or the amount of memory swapped in from disk (si), and/or the amount of memory swapped out to disk (so), and/or the amount written from memory to disk (bo), and/or the amount read from disk into memory (bi), and/or the service state.

13. The device according to claim 10, characterized in that the flow data describe a session uniquely identified by the same five-tuple and comprise at least: the acquisition time, and/or the source/destination address, and/or the source/destination port, and/or the protocol, and/or the number of handshake SYN packets sent when establishing a TCP/IP connection, and/or the number of FIN packets sent (FIN being a code-bit field of the TCP header), and/or TCP-related information, and/or the number of RST packets sent, and/or the total traffic accessing the specified service per unit time.

14. The device according to claim 10, characterized in that the application performance data comprise at least: the source/destination address, and/or the destination port, and/or the request time, and/or the server response time, and/or the load time, and/or page-related information, and/or HTTP-related information, and/or a drop in Tomcat global access speed, and/or an abnormal database access volume per unit time, and/or an abnormal number of current WebLogic sessions;

15. The device according to claim 10, characterized in that the fault diagnosis unit is specifically configured to compare the obtained associated diagnosis data related to the service abnormality with the historical diagnosis data of each item of associated diagnosis data, respectively, using a periodic baseline or a sliding-window baseline, and to determine the application fault type according to a preset threshold range for each item of associated diagnosis data.
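Claim 15 names two ways of building the comparison baseline. The following is a minimal sketch under three assumptions: the baseline is the mean of past samples, the periodic baseline uses samples at the same phase of a fixed cycle, and each preset threshold range is a (low, high) pair of ratios to the baseline. The function names, ratios and fault-type labels are hypothetical, and the periodic variant needs at least one full period of history.

```python
from statistics import mean


def periodic_baseline(history, period):
    """Mean of past samples at the same phase of the cycle as the next sample,
    e.g. the same hour on previous days when period = 24 hourly samples."""
    same_phase = history[-period::-period]
    return mean(same_phase)


def sliding_window_baseline(history, window):
    """Mean of the most recent `window` samples."""
    return mean(history[-window:])


def diagnose(current, histories, thresholds, fault_types, baseline):
    """Flag each metric whose current value leaves its preset range, expressed
    as (low, high) ratios of the baseline, and return the matching fault types."""
    faults = set()
    for metric, value in current.items():
        base = baseline(histories[metric])
        low, high = thresholds[metric]
        if not (base * low <= value <= base * high):
            faults.add(fault_types[metric])
    return sorted(faults)


histories = {
    "response_ms": [100, 300, 110, 290, 90, 310],  # alternating quiet/busy slots
    "cpu_pct": [40, 42, 38, 41, 39, 40],
}
current = {"response_ms": 250, "cpu_pct": 41}
thresholds = {"response_ms": (0.5, 1.5), "cpu_pct": (0.5, 1.5)}
fault_types = {"response_ms": "application response slowdown", "cpu_pct": "CPU overload"}

print(diagnose(current, histories, thresholds, fault_types,
               lambda h: periodic_baseline(h, 2)))        # ['application response slowdown']
print(diagnose(current, histories, thresholds, fault_types,
               lambda h: sliding_window_baseline(h, 4)))  # []
```

The choice of baseline matters here. A 250 ms response in a normally quiet slot is flagged against the periodic baseline (100 ms), but it passes against a sliding window (200 ms) that mixes quiet and busy slots.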

16. The device according to any one of claims 10 to 15, characterized in that the historical diagnosis data are: monitoring data within a first preset duration; flow data within a second preset duration; and real-time application performance data.

17. The device according to claim 10, characterized in that the device further comprises a follow-up diagnosis unit, configured to store the multi-dimensional data related to the abnormality when fault diagnosis yields no result, and to determine the application fault type again after the historical data have been updated.

18. The device according to any one of claims 10 to 17, characterized in that the device further comprises a recovery suggestion unit, configured to provide, according to the determined application fault type, a fault recovery suggestion drawn from the historical diagnosis data.