CN108390793A - Method and device for analyzing system stability - Google Patents

Method and device for analyzing system stability

Info

Publication number
CN108390793A
Authority
CN
China
Prior art keywords
data
operation data
monitoring index
fluctuation range
correlation coefficient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810083390.1A
Other languages
Chinese (zh)
Inventor
孙迁
叶国华
刘发亮
马翔
杜中原
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suning Commerce Group Co Ltd
Original Assignee
Suning Commerce Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suning Commerce Group Co Ltd
Priority to CN201810083390.1A
Publication of CN108390793A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/04 Processing captured monitoring data, e.g. for logfile generation

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Mining & Analysis (AREA)
  • Environmental & Geological Engineering (AREA)
  • Debugging And Monitoring (AREA)

Abstract

An embodiment of the invention discloses a method and device for analyzing system stability. It relates to the field of computer technology and can improve the degree of intelligence and the accuracy of system stability monitoring. The invention includes: collecting operation data associated with monitoring indices; using the correlation between different monitoring indices to select the operation data to be processed from the collected operation data, and determining a fluctuation range; and obtaining, according to the fluctuation range, the abnormal conditions of the system's current operation data. The invention is used for analyzing the stability of a system.

Description

Method and device for analyzing system stability
Technical field
The present invention relates to the field of computer technology, and in particular to a method and device for analyzing system stability.
Background
With the development of computer and Internet technology, the scale of China's Internet industry keeps expanding and a large number of online services are continually being designed. To ensure that these online services run normally, the operating condition of the systems hosting them must be monitored in real time.
At present, most system monitoring sets a threshold for a given system performance metric and judges whether the system is running normally by comparing the runtime value with that threshold. However, this approach of statically configured monitoring indices can only handle fairly coarse-grained monitoring, such as CPU load or network port congestion, and can only determine whether the system is overloaded. In practice such monitoring is neither intelligent nor flexible enough: current monitoring strategies usually cover only a single scenario and apply rigid decision rules, and it is difficult for them to judge the system's operating condition correctly, especially in the more complex scenarios.
The most common way to improve system stability is to expand the system's capacity. When applying for a new system or expanding an existing one, index monitoring is also consulted to estimate the configuration and number of machines required. However, the thresholds used in such index monitoring are usually set from human experience, are therefore subject to personal judgment, and are quite inaccurate.
Summary of the invention
Embodiments of the present invention provide a method and device for analyzing system stability, which can improve the degree of intelligence and the accuracy of system stability monitoring.
Existing techniques typically detect system anomalies through indices that are set directly by hand, which are often influenced by personal experience, and coarse-grained index monitoring can hardly guarantee monitoring accuracy. Low monitoring accuracy means that the system usually still needs debugging after capacity expansion, and debugging before and after expansion takes considerable time. It also means that, even after debugging, online services remain prone to operational faults and incidents, so staff must be assigned to troubleshoot them, which raises the operator's costs and ties up substantial human resources.
Traditional system monitoring that judges the system's operating condition by thresholds exposes several defects, such as covering only a single monitoring scenario, rigid decision rules, and judgments that do not match the facts. To address them, this embodiment collects data for mutually associated system monitoring items, analyzes the data together, and builds a mathematical model; the system's operating condition is then judged by whether newly collected monitoring data conforms to the model. This abandons the earlier practice of judging the operating condition from a threshold set on a single monitoring item and makes system monitoring more accurate and comprehensive. For example, it becomes possible to combine the monitoring index of static business data such as order volume with the monitoring indices of other, dynamic data from system operation, so that monitoring indices of multiple dimensions are merged and quantified as correlation coefficients, and the system's operating state is then analyzed through correlation.
Because the analysis is a multi-index statistical analysis based on the system's historical performance, technicians no longer need to perform the tedious work of manually adjusting the threshold of each monitoring item for every business scenario. This avoids inaccurate monitoring alarms under different business scenarios and improves the degree of intelligence and the accuracy of existing monitoring means.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly introduced below. The drawings described below are evidently only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the system architecture provided by an embodiment of the present invention;
Fig. 2a is a schematic flowchart of the method provided by an embodiment of the present invention;
Fig. 2b is a schematic diagram of a specific example provided by an embodiment of the present invention;
Fig. 3 and Fig. 4 are schematic structural diagrams of the device provided by an embodiment of the present invention.
Detailed description
To help those skilled in the art better understand the technical solutions of the present invention, the invention is described in further detail below with reference to the drawings and specific embodiments. Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the present invention and are not to be construed as limiting it. Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said", and "the" used herein may also include the plural. It should be further understood that the word "comprising" used in this specification refers to the presence of the stated features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It should be understood that when an element is said to be "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intervening elements may be present. In addition, "connected" or "coupled" as used herein may include wireless connection or coupling. The term "and/or" as used herein includes any unit and all combinations of one or more of the associated listed items. Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. It should also be understood that terms such as those defined in general dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the prior art and, unless defined as such herein, will not be interpreted in an idealized or overly formal sense.
The method flow in this embodiment can be executed by computer software in a system such as that shown in Fig. 1. Specifically, it involves computer software performance monitoring, software algorithm programming, integrated analysis of monitoring data, and the building of mathematical models.
The system includes a business system, an analysis server, and a background database. The devices at each end of the system can establish channels with one another over the Internet and exchange data through their respective data transmission ports.
At the hardware level, the analysis server disclosed in this embodiment may be a device such as a workstation or supercomputer, or a server cluster for data processing made up of multiple servers. Alternatively, the functions of the analysis server may be integrated into the background database, the business system, or another hardware system; that is, the background database, business system, or other hardware system implements the functions of the analysis server by allocating a certain amount of hardware resources, and current virtual machine or distributed computing technology may be used to integrate the different computing functions on one hardware system. The analysis server can collect monitoring data in real time from monitoring platforms. A monitoring platform monitors the running state of the business system and records monitoring data about the business system's operation, such as logs or system snapshots taken while the business system runs. Monitoring data can be distinguished according to the monitoring indices configured on each monitoring platform. For example, the monitoring platforms that may be involved in this embodiment include, but are not limited to: Zabbix (an open-source enterprise-level solution that provides distributed system monitoring and network monitoring through a web interface), a cross-system synchronous communication framework (RSF), and a cross-system asynchronous communication framework (ESB).
The background database stores the operation data produced while the business system runs, such as price data, logistics data, and order data. The background database may use any currently common database architecture and type.
At the hardware level, the business system may consist of multiple hardware devices with computing capability, such as servers and supercomputers. It is a system for operating online services, such as the inventory system, order system, and notification system operated on an online shopping platform.
An embodiment of the present invention provides a method for analyzing system stability which, as shown in Fig. 2a, includes:
S1. Collect operation data associated with monitoring indices.
Specifically, the monitoring indices include at least one of the following: indices of the computing resources of the hardware devices that run the business system, such as the processor's idle time percentage, the processor's read/write wait time percentage, the percentage of processor time occupied by user programs, the memory usage percentage, and the utilization of the disk read/write ports; and indices of the communication resources of the hardware devices, such as the data traffic sent by the network interface card (NIC) and the data traffic received by the NIC. They may also include data that the business system generates while running and that can mostly be recorded in logs for later retrieval, for example at least one of: the system's exception count, the system's service call volume, the system's response time, the system's service exception volume, and the system's order volume. A monitoring index can serve as the label of the operation data associated with it.
For example, as shown in Fig. 2b, the analysis server uses scheduled tasks to access monitoring systems such as Zabbix, the interface call volume data collection system, and the various monitoring platforms, obtains multiple kinds of data over a period of time, and persists them to storage. For instance, it collects the Zabbix system monitoring indices over a period of time (such as the average system load over 1 min). The Zabbix monitoring indices include: the processor's idle time percentage, the processor's read/write wait time percentage, the percentage of processor time occupied by user programs, the memory usage percentage, the utilization of the disk read/write ports, the data traffic sent by the NIC, and the data traffic received by the NIC. Specifically, repeated collection over a period of time forms a historical sample set, which is recorded as a monitoring comparison table, so that the system's current operation data can be compared against the existing monitoring comparison table to quickly find the monitoring indices that are abnormal.
The system's service call volume can be understood as the number of times each service is called while the business system runs, for example calls to business services such as the query service, price comparison service, and broadcast service.
The system's response time can be understood as the business system's response time for particular business functions at runtime.
The system's service exception volume can be understood as the number of service exceptions reported by the business system's built-in real-time self-checks at runtime.
The system's order volume can be understood as the number of orders handled by the business system within a given period, where the orders handled may be the orders already completed, the orders being processed, or the sum of the two.
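The repeated collection described for step S1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: `collect_samples`, the `fetch` callable, and the index names used in the usage note are hypothetical, and no real monitoring-platform API is called.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, List


def collect_samples(
    fetch: Callable[[], Dict[str, float]],
    rounds: int,
    interval_s: float = 0.0,
) -> Dict[str, List[float]]:
    """Poll `fetch` several times and build a historical sample set.

    `fetch` is a hypothetical callable that returns one reading per
    monitoring index, keyed by index name (the index acts as the label
    of its operation data). The result maps each index to its samples.
    """
    history: Dict[str, List[float]] = defaultdict(list)
    for round_no in range(rounds):
        for index_name, value in fetch().items():
            history[index_name].append(value)
        # Wait between polls, but not after the last one.
        if interval_s > 0 and round_no < rounds - 1:
            time.sleep(interval_s)
    return dict(history)
```

In use, `fetch` would wrap whatever query the monitoring platform offers; the returned dictionary plays the role of the monitoring comparison table, with one list of historical samples per monitoring index.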
S2. Using the correlation between different monitoring indices, select the operation data to be processed from the collected operation data, and determine a fluctuation range.
Here, based on the correlation between monitoring indices, a corresponding data model is built between indices whose correlation is significant, and the value of their correlation coefficient is determined, so that the correlation between monitoring indices is quantified by the value of the correlation coefficient. A reasonable fluctuation range is then set for the model's output in light of the actual application, and this range is used to detect whether the operation data to be processed is abnormal. Specifically, each monitoring index points to one class of operation data, and the operation data to be processed can be understood as the operation data corresponding to two monitoring indices that are correlated with each other.
The operation data to be processed includes N groups of operation data, and among them there is at least one pair of groups whose monitoring indices are correlated; that is, the monitoring index associated with the i-th group is correlated with the monitoring index associated with the j-th group, where N ≥ 2, 1 ≤ i ≤ N, 1 ≤ j ≤ N, and i ≠ j.
For example, it becomes possible to combine the monitoring index of static business data such as order volume with the monitoring indices of other, dynamic data from system operation, so that monitoring indices of multiple dimensions are merged and quantified as correlation coefficients, and the system's operating state is then analyzed through correlation.
In this embodiment, to unify units and simplify analysis, the collected operation data may be flattened and stretched (rescaled), and the variance, covariance, and correlation coefficient between the groups of data are then calculated.
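The unit unification and the statistics mentioned for step S2 can be sketched as below. The source does not say which rescaling is used, so min-max scaling to [0, 1] is an assumption here, and the function names are illustrative only.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale a series to [0, 1] so that series in different units are comparable.

    Assumption: the source's flatten/stretch step is treated as min-max scaling.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant series carries no variation
    return [(v - lo) / (hi - lo) for v in values]


def covariance(xs, ys):
    """Population covariance of two equal-length series."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def correlation(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0  # undefined for a constant series; treated as uncorrelated
    return covariance(xs, ys) / (sx * sy)
```

Because the Pearson coefficient is unchanged by positive linear rescaling, the scaling step mainly helps when the scaled series are also compared or plotted directly.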
S3. According to the fluctuation range, obtain the abnormal conditions of the system's current operation data.
Existing techniques typically detect system anomalies through indices that are set directly by hand, which are often influenced by personal experience, and coarse-grained index monitoring can hardly guarantee monitoring accuracy. Low monitoring accuracy means that the system usually still needs debugging after capacity expansion, and debugging before and after expansion takes considerable time. It also means that, even after debugging, online services remain prone to operational faults and incidents, so staff must be assigned to troubleshoot them, which raises the operator's costs and ties up substantial human resources.
This embodiment collects data for mutually associated system monitoring items, analyzes the data together, and builds a mathematical model; the system's operating condition is then judged by whether newly collected monitoring data conforms to the model. This abandons the earlier practice of judging the operating condition from a threshold set on a single monitoring item and makes system monitoring more accurate and comprehensive. For example, it becomes possible to combine the monitoring index of static business data such as order volume with the monitoring indices of other, dynamic data from system operation, so that monitoring indices of multiple dimensions are merged and quantified as correlation coefficients, and the system's operating state is then analyzed through correlation.
Traditional system monitoring that judges the system's operating condition by thresholds exposes several defects, such as covering only a single monitoring scenario, rigid decision rules, and judgments that do not match the facts. To address them, the present invention integrates multiple kinds of monitoring data of the system over a period of time, builds a correlation-based mathematical model, feeds the system's monitoring data into the model, and analyzes the output to reach a conclusion about the system's running state. Because the analysis is a multi-index statistical analysis based on the system's historical performance, technicians no longer need to perform the tedious work of manually adjusting the threshold of each monitoring item for every business scenario. This avoids inaccurate monitoring alarms under different business scenarios and improves the degree of intelligence and the accuracy of existing monitoring means.
In this embodiment, step S2 (using the correlation between different monitoring indices, selecting the operation data to be processed from the collected operation data, and determining a fluctuation range) may include:
Build a data model of the operation data to be processed. Determine the value of the correlation coefficient through the data model, and set the fluctuation range of the correlation coefficient.
Here, the correlation between the groups of data is analyzed by considering the indices together. For example, the fluctuation rate of CPU load is significantly related to order volume and is also related to the NIC, but is not directly related to the exception rate of the business system; the correlation coefficients of CPU load fluctuation with order volume and with the NIC are therefore higher than its correlation coefficient with the business system's exception rate. A corresponding data model is built between indices whose correlation is significant (for example, whose correlation coefficient is high), the specific value of the correlation coefficient is determined, and a reasonable fluctuation range is set for the model's output in light of the actual application.
After the units of the collected data are unified, the correlation coefficients between the groups of data are calculated and the monitoring items with high correlation coefficients are screened out. An equation between the associated monitoring items is built according to the correlation coefficient. Operation data from the system's various running states is fed into the model equation to calculate the value of the correlation coefficient, and from these results the range of values under normal conditions and the range of values under abnormal conditions are determined and used as the basis for judging the system's operating condition.
Specifically, building the data model of the operation data to be processed includes:
Collect at least two different groups of operation data, and obtain the correlation coefficient between every two different groups. If the correlation coefficient of two groups exceeds a preset value, build a data model of those two groups of operation data. For example, a specific implementation scenario for judging the system's running state from the data model is as follows:
Collect the monitored operation data from a period of normal system operation, labeled data1, data2, data3, data4, and so on. Take these data as input, unify their units, and calculate the correlation coefficient for every two groups, outputting all the correlation coefficients p12, p13, p14, p23, p24, p34.
The correlation coefficient is calculated as shown in calculation formula 1 (the formula is not reproduced in this text).
Here, E is the mathematical expectation, cov denotes the covariance, σX and σY are standard deviations, ∑ denotes summation, X and Y denote two system parameters (which may be set according to the specific type of business system), N denotes the number of values, and μ denotes the conversion coefficient.
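The symbols listed for formula 1 (E, cov, σX, σY, ∑, N) match the standard Pearson correlation coefficient, so the formula can plausibly be read as follows. This is an assumption based on those symbols, not a transcription of the patent's formula; in particular, reading μ as the mean of each parameter follows the standard definition, whereas the text above calls it a conversion coefficient.

```latex
\rho_{X,Y}
  = \frac{\operatorname{cov}(X,Y)}{\sigma_X\,\sigma_Y}
  = \frac{E\left[(X-\mu_X)(Y-\mu_Y)\right]}{\sigma_X\,\sigma_Y},
\qquad
r = \frac{\sum_{i=1}^{N}(X_i-\bar{X})(Y_i-\bar{Y})}
         {\sqrt{\sum_{i=1}^{N}(X_i-\bar{X})^2}\;\sqrt{\sum_{i=1}^{N}(Y_i-\bar{Y})^2}}
```

The left form is the population definition; the right form is the sample estimate over N collected values, which is what would be computed from a finite set of monitoring samples.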
1. Take all correlation coefficients as input, screen them, and output the two groups of data corresponding to each high-valued correlation coefficient. For example, p14 is 0.77 and its corresponding data are data1 and data4. A correlation coefficient is considered high when its absolute value exceeds a preset value, which indicates that the corresponding parameters are strongly correlated and follow a linear relationship. The preferred preset value is 0.6.
2. Substitute each such pair of data groups into a formula for calculation; calculation formula 2 may be used (the formula is not reproduced in this text).
The symbols of formula 2, which are missing from this text, denote in order: the mean; the summation; a quantity whose value is taken as the value of the correlation coefficient; and an intermediate parameter used in the calculation. A linear regression equation is also given. x and y denote two system parameters (which may be set according to the specific type of business system; the parameters denoted by lowercase x, y and by uppercase X, Y may be the same or different, depending on the type of business system), and N denotes the conversion coefficient in calculation formula 2.
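Steps 1 and 2 above can be sketched as follows. The screening uses the preset value 0.6 from the text. Because formula 2 is missing, the fit shown is ordinary least squares for y = b·x + a, which is an assumption suggested by the mention of a linear regression equation; all function names are illustrative.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for a constant series; treated as uncorrelated
    return sxy / sqrt(sxx * syy)


def screen_pairs(groups, preset=0.6):
    """Step 1: keep the pairs whose |correlation| exceeds the preset value."""
    selected = {}
    for (name_a, xs), (name_b, ys) in combinations(groups.items(), 2):
        r = pearson(xs, ys)
        if abs(r) > preset:
            selected[(name_a, name_b)] = r
    return selected


def fit_line(xs, ys):
    """Step 2 (assumed): least-squares slope b and intercept a of y = b*x + a."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x is constant; the slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return b, a
```

Under this reading, the slope `b` of each retained pair is the quantity tracked over time, which would explain why values such as 100.2 and 50.6 appear later in the text even though a Pearson coefficient never exceeds 1 in absolute value.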
Specifically, steps 1 and 2 are carried out on operation data collected in different time periods. After repeated calculation, the values obtained for a correlation coefficient form a distribution. With a large amount of data, a relatively stable distribution is obtained, which serves as the fluctuation range; that is, the distribution of the same correlation coefficient over repeated calculations is taken as the fluctuation range.
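One way to turn the distribution of a coefficient into a numeric fluctuation range is sketched below. The source only says that the distribution serves as the range; summarizing it as mean ± k standard deviations, and the default k = 2, are assumptions made for illustration.

```python
from statistics import mean, pstdev


def fluctuation_range(coefficients, k=2.0):
    """Summarize historical coefficient values as a (lower, upper) interval.

    Assumption: the interval is mean ± k standard deviations. A percentile
    band or the observed min/max would be equally valid readings of the text.
    """
    if len(coefficients) < 2:
        raise ValueError("need at least two historical values")
    center = mean(coefficients)
    spread = pstdev(coefficients)
    return center - k * spread, center + k * spread
```

A wider k tolerates more variation before a value counts as out of range, so k would normally be tuned against periods that are known to be normal.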
In actual operation, the data of the preceding period can be collected at regular intervals and the values of the correlation coefficient calculated, for example 100.2 and 50.6. Each value is compared with the corresponding fluctuation range. If a value falls outside the range and the gap exceeds the configured excess, it is judged abnormal and a corresponding alarm is raised; for example, 100.2 exceeds the upper limit of 70.5 by about 42.1%, which is more than the configured 10%. When an alarm is raised, alert messages are sent to the configured mobile numbers, email addresses, and so on of the relevant personnel. As another example, when the system is abnormal, the current operation data is collected in the form of a snapshot and fed into the corresponding mathematical model; if the output exceeds the configured fluctuation range, it is judged to be an abnormal condition. For example, if in the historical data CPU load is only 30% at an order volume of 100,000, and CPU load suddenly rises to 50% at the same order volume of 100,000, there is a problem. In the traditional scheme of manually set thresholds, however, the alarm threshold is usually above 50%, so simple threshold detection alone would still conclude that the CPU load is fine and that there is no anomaly.
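The comparison in the example above can be sketched as follows. The upper limit 70.5 and the 10% tolerance come from the text; the lower limit 40.0 used in the usage note is hypothetical, since the source states only the upper limit.

```python
def excess_ratio(value, lower, upper):
    """Relative distance by which `value` lies outside [lower, upper]; 0.0 if inside."""
    if value > upper:
        bound, gap = upper, value - upper
    elif value < lower:
        bound, gap = lower, lower - value
    else:
        return 0.0
    # A zero bound makes the relative excess undefined; treat it as unbounded.
    return gap / abs(bound) if bound != 0 else float("inf")


def is_abnormal(value, lower, upper, tolerance=0.10):
    """Abnormal when the value is out of range by more than the configured tolerance."""
    return excess_ratio(value, lower, upper) > tolerance
```

With an assumed range of (40.0, 70.5), `excess_ratio(100.2, 40.0, 70.5)` is roughly 0.421, so 100.2 is flagged, while 50.6 lies inside the range and is not.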
In this embodiment, a specific implementation of step S3 (obtaining the abnormal conditions of the system's current operation data according to the fluctuation range) may include:
Collect the system's current operation data, and output the calculation result for that data through the established data model. When the calculation result does not conform to the fluctuation range, determine that the system's current operation data is abnormal.
Here, "the calculation result does not conform to the fluctuation range" can be understood in either of two ways. In the first, the result does not conform when its values fall outside the numerical interval of the fluctuation range; if the values fall partly or entirely within the interval, the result conforms. In the second, the result does not conform when its values do not fall entirely within the numerical interval of the fluctuation range; only when the values fall entirely within the interval does the result conform.
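The two readings of conformity can be sketched as two predicates over a calculation result that consists of several values. The function names are illustrative, and treating the result as a list of numbers is an assumption.

```python
def conforms_lenient(values, lower, upper):
    """First reading: the result conforms if at least one value lies in the range."""
    return any(lower <= v <= upper for v in values)


def conforms_strict(values, lower, upper):
    """Second reading: the result conforms only if every value lies in the range."""
    return all(lower <= v <= upper for v in values)
```

The two predicates differ only for results that straddle the interval: such a result passes the lenient check and fails the strict one. For an empty result, the lenient check returns False and the strict check returns True, so callers would normally reject empty results before applying either.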
Further, the method also includes:
When the system's current operation data is judged abnormal, extract exception information, and issue an early warning according to the exception information.
The exception information includes at least the host IP address, the monitoring index, and the interface information of the system corresponding to the abnormal operation data. For example, the analysis server feeds the collected data into the corresponding mathematical model; if the output exceeds the configured fluctuation range, it is judged to be an abnormal condition, the relevant host IP, monitoring index, and corresponding interface class are recorded, an alarm is raised, and the relevant information is sent to the person responsible for the system by email or SMS.
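The exception information described above can be sketched as a small record plus a message formatter. The field names, the message layout, and the example values are hypothetical; actual delivery by email or SMS is left out because the source does not specify a gateway.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExceptionInfo:
    """Minimum exception information named in the text (field names are illustrative)."""
    host_ip: str
    monitoring_index: str
    interface: str


def format_alert(info: ExceptionInfo, value: float, lower: float, upper: float) -> str:
    """Build the alert text that would be handed to an email or SMS sender."""
    return (
        f"[ALERT] host={info.host_ip} index={info.monitoring_index} "
        f"interface={info.interface} value={value} "
        f"outside range [{lower}, {upper}]"
    )
```

For example, `format_alert(ExceptionInfo("192.0.2.10", "cpu_load", "OrderService.create"), 100.2, 40.0, 70.5)` yields a single line that carries the three required fields together with the offending value and range.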
In practical applications the method suits many business scenarios. The data model is implemented in code for the specific business scenario, and a scheduled task is then set up to collect the data of the system monitoring items involved in the data model and feed it into the model. The value of the correlation coefficient is obtained and checked against the normal range; if it is outside the normal range, the early-warning modules are called, including an SMS early-warning module and an email early-warning module, which send the warning information to recipients that can be configured dynamically. The judgment of the system's running state thus changes from a static comparison of a single monitoring value with a threshold to a combined analysis of whether the correlations among multiple monitoring data are normal, which makes system monitoring more intelligent, comprehensive, and accurate. A typical scenario: before and after a new code release, no individual index exceeds its threshold, but system resource usage rises markedly while the business volume is unchanged. Combined index analysis can then conclude that the program itself may have a problem and raise a timely warning. This overcomes the shortcomings of traditional monitoring methods, namely a single applicable business scenario, no dynamic monitoring, and inaccurate analysis results, and improves the flexibility of monitoring.
This embodiment also provides a scheme that, based on the abnormal situations identified, further determines from their frequency of occurrence whether system resources need capacity expansion or reduction. Specifically:
Obtain the historical abnormal data of the system and calculate the system's capacity expansion or reduction demand from it. Then adjust the quantity of resources allocated to the system according to that demand.
The historical abnormal data includes the abnormal situations that occurred in the system within a specified period and the exception information corresponding to them. The capacity expansion or reduction demand can be understood as an adjustment value for the system's hardware resources, for example the number of processors, the amount of memory, or the amount of disk space to add or remove. A positive adjustment value means resources need to be added; a negative value means they need to be reduced. For example, the analysis server can use the intrinsic relationship between order volume and each indicator to compute a model of the correspondence between server scale and order volume, which assists server expansion and later requests for new machines. When calculating server scale, it roughly estimates the required machine configuration and quantity from the input order volume together with historical data on each system's call volume, server scale, data read/write patterns, and server resource utilization. Technical staff thus quickly receive a capacity expansion or reduction plan for reference.
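A signed adjustment value of this kind could be estimated roughly as sketched below. The sketch assumes a simple linear relationship between order volume and server count fitted by least squares, which the text does not prescribe; `fit_line`, `server_adjustment`, and the sample history are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally sized series with at least 2 samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("order history must contain at least two distinct values")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def server_adjustment(order_history: Sequence[float], server_history: Sequence[float],
                      expected_orders: float, current_servers: int) -> int:
    """Signed adjustment: positive means add servers, negative means remove servers."""
    slope, intercept = fit_line(order_history, server_history)
    # Small tolerance so floating-point noise does not round up a whole extra server.
    needed = max(0, math.ceil(slope * expected_orders + intercept - 1e-9))
    return needed - current_servers
```

For a history in which 10,000, 20,000, and 30,000 orders were served by 5, 10, and 15 servers, an expected volume of 40,000 orders with 15 servers in place yields an adjustment of +5, while an expected volume of 10,000 yields -10.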
The present invention also provides a device for analyzing system stability, which may run on the analysis server shown in Figure 1. As shown in Figure 3, the device includes:
A data acquisition module, for acquiring operating data associated with monitoring indicators;
A preprocessing module, for selecting the operating data to be processed from the acquired operating data by using the correlation between different monitoring indicators, and for determining a fluctuation range;
An analysis module, for obtaining the abnormal situation of the system's current operating data according to the fluctuation range.
The preprocessing module is specifically configured to acquire at least two groups of different operating data and obtain the correlation coefficient between every two groups of different operating data; if the correlation coefficient of two of the groups is greater than a preset value, to establish a data model of the two groups of operating data whose correlation coefficient is greater than the preset value; and then to determine the value of the correlation coefficient through the data model and set the fluctuation range of the correlation coefficient;
The monitoring indicators include at least one of the following: the idle time percentage of the processor, the write/read wait time percentage of the processor, the user program occupation time percentage of the processor, the memory usage percentage, the utilization of the disk read/write port, the data traffic sent by the network card, the data traffic received by the network card, the service call volume of the system, the response time of the system, the service exception count of the system, and the order volume of the system;
The analysis module is specifically configured to acquire the current operating data of the system and output, through the established data model, the calculation result for the current operating data of the system; when the calculation result does not fall within the fluctuation range, the current operating data of the system is judged abnormal.
Further, as shown in Figure 4, the device also includes:
An alarm module, for extracting exception information when the current operating data of the system is judged abnormal, the exception information including at least the host IP address of the system, the monitoring indicator, and the interface information corresponding to the abnormal operating data; and for issuing an early warning according to the exception information;
A calibration module, for obtaining the historical abnormal data of the system and calculating the capacity expansion or reduction demand of the system from it, wherein the historical abnormal data includes the abnormal situations that occurred in the system within a specified period and the exception information corresponding to them; and for adjusting the quantity of resources allocated to the system according to that demand.
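The pair-selection step performed by the preprocessing module can be sketched as follows. This is an illustration only: it assumes the Pearson coefficient, treats a constant series as uncorrelated, and uses hypothetical names (`correlation`, `select_pairs`) and hypothetical indicator keys.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson coefficient; returns 0.0 when either series is constant."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_pairs(series: Dict[str, Sequence[float]],
                 preset: float) -> List[Tuple[str, str, float]]:
    """Return every pair of indicators whose coefficient exceeds the preset value."""
    selected = []
    for a, b in combinations(sorted(series), 2):
        r = correlation(series[a], series[b])
        if r > preset:
            selected.append((a, b, r))
    return selected


# Hypothetical samples collected over the same four intervals.
samples = {
    "order_volume": [100, 200, 300, 400],
    "cpu_user_pct": [11, 19, 31, 39],
    "disk_util_pct": [50, 10, 40, 20],
}
pairs = select_pairs(samples, preset=0.8)
```

Following the text literally, only coefficients greater than the preset value are kept, so strongly negative correlations are not selected; comparing the absolute value instead would be a variation, not something the source states.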
In this embodiment, mutually associated system monitoring items are collected and analyzed together, a mathematical model is established from the data, and the running condition of the system is judged by whether the collected monitoring data fits the model. This abandons the earlier approach of judging the running condition by setting a threshold for a single monitoring item, and makes system monitoring more accurate and comprehensive. For example, a monitoring indicator for relatively static business data such as order volume can be combined with monitoring indicators for other dynamic data of the running system, so that indicators from multiple dimensions are merged and quantified as a correlation coefficient, and the running state of the system is then analyzed through correlation.
Traditional system monitoring judges the running condition by thresholds, which exposes defects such as a single monitoring scenario, a rigid decision procedure, and judgments that do not match the facts. To address them, the present invention integrates multiple monitoring data items of the system over a period of time, establishes a mathematical model of their correlation, inputs the system's monitoring data into the model, and analyzes the output to reach a conclusion about the system's running state. Because the analysis is a comprehensive multi-indicator statistical analysis based on the system's historical performance, technical staff no longer need to manually adjust the threshold of each monitoring item for each business scenario. This avoids inaccurate monitoring alarms under different business scenarios and improves the intelligence and accuracy of existing monitoring.
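The text says that the fluctuation range of the correlation coefficient is set, but not how. One possible way to derive it from historical performance is sketched below, assuming a band of the historical mean plus or minus k population standard deviations; this rule, the default k = 3, and the name `fluctuation_range` are assumptions for illustration only.

```python
import statistics
from typing import Sequence, Tuple


def fluctuation_range(coefficients: Sequence[float], k: float = 3.0) -> Tuple[float, float]:
    """Derive (low, high) from coefficients observed during normal operation."""
    if len(coefficients) < 2:
        raise ValueError("need at least 2 historical coefficient values")
    center = statistics.fmean(coefficients)
    spread = statistics.pstdev(coefficients)
    # A correlation coefficient cannot leave [-1, 1], so clamp the band to it.
    low = max(-1.0, center - k * spread)
    high = min(1.0, center + k * spread)
    return low, high


# Hypothetical coefficients computed for five past periods of normal operation.
history = [0.90, 0.92, 0.88, 0.91, 0.89]
low, high = fluctuation_range(history)
```

A coefficient computed from current data would then be compared against `low` and `high`, so the range adapts to each business scenario without hand-tuned per-indicator thresholds.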
The embodiments in this specification are described progressively; identical or similar parts of the embodiments may be cross-referenced, and each embodiment focuses on its differences from the others. Because the device embodiment is substantially similar to the method embodiment, it is described briefly; for related details, see the corresponding parts of the method embodiment. The above is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited to it. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of the claims.

Claims (10)

1. A method for analyzing system stability, characterized by comprising:
acquiring operating data associated with monitoring indicators;
using the correlation between different monitoring indicators, selecting the operating data to be processed from the acquired operating data, and determining a fluctuation range, wherein the operating data to be processed comprises N groups of operating data, and at least one pair of correlated monitoring indicators exists among the N groups of operating data, that is, the monitoring indicator associated with the i-th group of operating data is correlated with the monitoring indicator associated with the j-th group of operating data, where N ≥ 2, 1 ≤ i ≤ N, 1 ≤ j ≤ N, and i ≠ j;
obtaining the abnormal situation of the current operating data of the system according to the fluctuation range.
2. The method according to claim 1, characterized in that the monitoring indicators include at least one of the following: the idle time percentage of the processor, the write/read wait time percentage of the processor, the user program occupation time percentage of the processor, the memory usage percentage, the utilization of the disk read/write port, the data traffic sent by the network card, the data traffic received by the network card, the service call volume of the system, the response time of the system, the service exception count of the system, and the order volume of the system.
3. The method according to claim 1 or 2, characterized in that said using the correlation between different monitoring indicators, selecting the operating data to be processed from the acquired operating data, and determining a fluctuation range comprises:
establishing a data model of the operating data to be processed;
determining the value of the correlation coefficient through the data model, and setting the fluctuation range of the correlation coefficient.
4. The method according to claim 3, characterized in that said establishing a data model of the operating data to be processed comprises:
acquiring at least two groups of different operating data, and obtaining the correlation coefficient between every two groups of different operating data;
if the correlation coefficient of two of the groups of data is greater than a preset value, establishing a data model of the two groups of operating data whose correlation coefficient is greater than the preset value.
5. The method according to claim 1, characterized in that said obtaining the abnormal situation of the current operating data of the system according to the fluctuation range comprises:
acquiring the current operating data of the system, and outputting, through the established data model, the calculation result for the current operating data of the system;
when the calculation result does not fall within the fluctuation range, judging the current operating data of the system to be abnormal.
6. The method according to claim 1 or 5, characterized by further comprising:
when the current operating data of the system is judged abnormal, extracting exception information, the exception information including at least the host IP address of the system, the monitoring indicator, and the interface information corresponding to the abnormal operating data;
issuing an early warning according to the exception information.
7. The method according to claim 6, characterized by further comprising:
obtaining the historical abnormal data of the system, and calculating the capacity expansion or reduction demand of the system from the historical abnormal data, wherein the historical abnormal data includes the abnormal situations that occurred in the system within a specified period and the exception information corresponding to them;
adjusting the quantity of resources allocated to the system according to the capacity expansion or reduction demand of the system.
8. A device for analyzing system stability, characterized by comprising:
a data acquisition module, for acquiring operating data associated with monitoring indicators;
a preprocessing module, for selecting the operating data to be processed from the acquired operating data by using the correlation between different monitoring indicators, and for determining a fluctuation range;
an analysis module, for obtaining the abnormal situation of the current operating data of the system according to the fluctuation range.
9. The device according to claim 8, characterized in that the preprocessing module is specifically configured to acquire at least two groups of different operating data and obtain the correlation coefficient between every two groups of different operating data; if the correlation coefficient of two of the groups of data is greater than a preset value, to establish a data model of the two groups of operating data whose correlation coefficient is greater than the preset value; and then to determine the value of the correlation coefficient through the data model and set the fluctuation range of the correlation coefficient;
the monitoring indicators include at least one of the following: the idle time percentage of the processor, the write/read wait time percentage of the processor, the user program occupation time percentage of the processor, the memory usage percentage, the utilization of the disk read/write port, the data traffic sent by the network card, the data traffic received by the network card, the service call volume of the system, the response time of the system, the service exception count of the system, and the order volume of the system;
the analysis module is specifically configured to acquire the current operating data of the system and output, through the established data model, the calculation result for the current operating data of the system; when the calculation result does not fall within the fluctuation range, the current operating data of the system is judged abnormal.
10. The device according to claim 8, characterized by further comprising:
an alarm module, for extracting exception information when the current operating data of the system is judged abnormal, the exception information including at least the host IP address of the system, the monitoring indicator, and the interface information corresponding to the abnormal operating data; and for issuing an early warning according to the exception information;
a calibration module, for obtaining the historical abnormal data of the system and calculating the capacity expansion or reduction demand of the system from the historical abnormal data, wherein the historical abnormal data includes the abnormal situations that occurred in the system within a specified period and the exception information corresponding to them; and for adjusting the quantity of resources allocated to the system according to the capacity expansion or reduction demand of the system.
CN201810083390.1A 2018-01-29 2018-01-29 A kind of method and device of analysis system stability Pending CN108390793A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810083390.1A CN108390793A (en) 2018-01-29 2018-01-29 A kind of method and device of analysis system stability

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810083390.1A CN108390793A (en) 2018-01-29 2018-01-29 A kind of method and device of analysis system stability

Publications (1)

Publication Number Publication Date
CN108390793A true CN108390793A (en) 2018-08-10

Family

ID=63074226

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810083390.1A Pending CN108390793A (en) 2018-01-29 2018-01-29 A kind of method and device of analysis system stability

Country Status (1)

Country Link
CN (1) CN108390793A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104820630A (en) * 2015-05-22 2015-08-05 上海新炬网络信息技术有限公司 System resource monitoring device based on business variable quantity
CN106600115A (en) * 2016-11-28 2017-04-26 湖北华中电力科技开发有限责任公司 Intelligent operation and maintenance analysis method for enterprise information system

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109522325A (en) * 2018-09-28 2019-03-26 中国平安人寿保险股份有限公司 Business impact analysis method, apparatus, electronic equipment and storage medium
WO2020237433A1 (en) * 2019-05-24 2020-12-03 李玄 Method and apparatus for monitoring digital certificate processing device, and device, medium and product
US11924194B2 (en) 2019-05-24 2024-03-05 Antpool Technologies Limited Method and apparatus for monitoring digital certificate processing device, and device, medium, and product
CN112423032A (en) * 2020-10-21 2021-02-26 当趣网络科技(杭州)有限公司 Data monitoring method and device based on smart television, electronic equipment and medium
CN112600705A (en) * 2020-12-14 2021-04-02 国网四川省电力公司信息通信公司 Method for automatic operation and maintenance of network equipment
CN114493378A (en) * 2022-04-06 2022-05-13 树根互联股份有限公司 Index acquisition method and device of industrial equipment and computer equipment

Similar Documents

Publication Publication Date Title
CN108390793A (en) A kind of method and device of analysis system stability
CN106951984B (en) Dynamic analysis and prediction method and device for system health degree
CA2756198C (en) Digital analytics system
Hong Estimating quantile sensitivities
US20020194042A1 (en) Method of business analysis
Gopinath et al. A waste relationship model and center point tracking metric for lean manufacturing systems
EP2278502A1 (en) Deleting data stream overload
CN110532152A (en) A kind of monitoring alarm processing method and system based on Kapacitor computing engines
CN106612216A (en) Method and apparatus of detecting website access exception
CN106656557A (en) Service state processing method and device
CN114500339B (en) Node bandwidth monitoring method and device, electronic equipment and storage medium
CN111984442A (en) Method and device for detecting abnormality of computer cluster system, and storage medium
Wang et al. A motifs-based Maximum Entropy Markov Model for realtime reliability prediction in System of Systems
CN117041017A (en) Intelligent operation and maintenance management method and system for data center
CN108039971A (en) A kind of alarm method and device
CN113342939B (en) Data quality monitoring method and device and related equipment
CN112416590A (en) Server system resource adjusting method and device, computer equipment and storage medium
CN113434270A (en) Data resource scheduling method and device, electronic equipment and storage medium
CN114662952A (en) Behavior data evaluation method, behavior data evaluation device, behavior data evaluation equipment and storage medium
CN112491585A (en) Micro-service health degree evaluation method and device
CN114140241A (en) Abnormity identification method and device for transaction monitoring index
CN113254781A (en) Model determination method and device in recommendation system, electronic equipment and storage medium
CN110879770A (en) Terminal performance evaluation and field fault self-detection method and system
CN109117449A (en) Method based on non-linear least square calculation using models Internet bar installation rate
CN112148491B (en) Data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180810