CN108390793A - Method and device for analyzing system stability - Google Patents
- Publication number
- CN108390793A (application CN201810083390.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- operation data
- monitoring index
- fluctuation range
- correlation coefficient
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/04—Processing captured monitoring data, e.g. for logfile generation
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Mining & Analysis (AREA)
- Environmental & Geological Engineering (AREA)
- Debugging And Monitoring (AREA)
Abstract
Embodiments of the invention disclose a method and device for analyzing system stability. They relate to the field of computer technology and improve both the degree of automation and the accuracy of system stability monitoring. The invention includes: collecting operation data associated with monitoring indices; using the correlation between different monitoring indices to select the operation data to be processed from the collected data, and determining a fluctuation range; and, according to the fluctuation range, identifying abnormal conditions in the system's current operation data. The method is used to analyze system stability.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a method and device for analyzing system stability.
Background
As computer and Internet technology develop, China's Internet industry keeps growing and a large number of online services are being built. Keeping these services running normally requires real-time monitoring of the operating condition of the systems that host them.
Currently, most system monitoring sets a threshold for a particular performance measure and decides whether the system is running normally by comparing the runtime value with that threshold. Statically configured monitoring indices of this kind can only handle fairly coarse-grained indices, such as CPU load or congestion on a network port, and can only determine whether the system is overloaded. In practice such monitoring is neither precise nor flexible. Current monitoring strategies tend to cover a single scenario and apply rigid decision rules, and in many complex scenarios they struggle to judge the system's operating condition correctly.
The most common way to improve system stability is to scale out (expand capacity). When a new system is deployed or capacity is expanded, index monitoring is also used to estimate the configuration and number of machines required. However, thresholds for these indices are usually set from human experience. They are therefore subject to individual bias and are often quite inaccurate.
Summary of the invention
Embodiments of the present invention provide a method and device for analyzing system stability that improve the degree of automation and the accuracy of system stability monitoring.
In the existing technology, system anomalies are typically monitored through indices whose thresholds are set manually. These thresholds depend on personal experience, and coarse-grained indices cannot guarantee accurate monitoring. Low monitoring accuracy means the system often still has to be debugged after capacity expansion, and that debugging takes considerable time. Even after debugging, online services remain prone to operational faults and incidents. Staff must then be assigned to troubleshoot them, which ties up a large amount of manpower and raises the operator's costs.
Traditional system monitoring judges the operating condition by thresholds, which leads to single-scenario monitoring, rigid decision rules, and judgments that do not match reality. In this embodiment, mutually correlated system monitoring item data is collected, integrated and analyzed, and a mathematical model is built from it. The operating condition is then judged by whether newly collected monitoring data fits the model. This replaces the former approach of setting a threshold on each individual monitoring item and makes system monitoring more accurate and comprehensive. For example, the monitoring index of a static business quantity such as order volume can be combined with the monitoring indices of other dynamic runtime data. Indices from multiple dimensions are thereby fused and quantified as correlation coefficients, and the system's operating state is analyzed through correlation.
The analysis is a comprehensive multi-index statistical analysis based on the system's historical behavior. Technicians therefore no longer need to tune the threshold of every monitoring item by hand for each business scenario. This avoids inaccurate monitoring alarms across business scenarios and improves the automation and accuracy of existing monitoring.
Description of the drawings
The drawings used in the embodiments are briefly introduced below to describe the technical solutions more clearly. These drawings show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the system architecture provided by an embodiment of the invention;
Fig. 2a is a schematic flowchart of the method provided by an embodiment of the invention;
Fig. 2b is a schematic diagram of a specific example provided by an embodiment of the invention;
Fig. 3 and Fig. 4 are schematic structural diagrams of the device provided by embodiments of the invention.
Detailed description of the embodiments
The invention is described in further detail below with reference to the drawings and specific embodiments, so that those skilled in the art can better understand its technical solution. The embodiments are illustrated in the drawings. Throughout, identical or similar reference numerals denote identical or similar elements, or elements with identical or similar functions. The embodiments described with reference to the drawings are exemplary, serve only to explain the invention, and are not to be construed as limiting it.
Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "the" and "said" used herein may also include the plural. The word "comprising" in this specification indicates the presence of the stated features, integers, steps, operations, elements and/or components. It does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof. When an element is said to be "connected" or "coupled" to another element, it may be directly connected or coupled to it, or intervening elements may be present. "Connected" or "coupled" as used herein may also include wireless connection or coupling. The wording "and/or" includes any one of, and all combinations of, one or more of the associated listed items.
Unless otherwise defined, all terms used herein (including technical and scientific terms) have the meaning commonly understood by a person of ordinary skill in the field of the invention. Terms such as those defined in general dictionaries should be interpreted consistently with their meaning in the context of the prior art. Unless specifically defined here, they will not be interpreted in an idealized or overly formal sense.
The method flow in this embodiment may be executed by computer software in a system such as the one shown in Fig. 1. It involves computer software performance monitoring, software algorithm programming, integrated analysis of monitoring data, and the building of mathematical models.
The system includes a business system, an analysis server and a back-end database. The devices in the system can establish channels with one another over the Internet and exchange data through their respective data transmission ports.
At the hardware level, the analysis server in this embodiment may be a workstation, a supercomputer or similar equipment, or a server cluster for data processing made up of multiple servers. Its functions may also be integrated into the back-end database, the business system or another hardware system. In that case the host system allocates a share of its hardware resources to implement the analysis server's functions, for example by integrating different computing functions on one hardware system through virtual machine or distributed computing technology.
The analysis server can collect monitoring data in real time from monitoring platforms. A monitoring platform monitors the running state of the business system and records monitoring data about it, such as logs of business system operation data or snapshots of the business system taken while it runs. Monitoring data can be distinguished by the monitoring indices configured on each platform. Monitoring platforms that may be involved in this embodiment include, but are not limited to:
- Zabbix, an open-source, enterprise-grade solution with a web interface that provides distributed system monitoring and network monitoring;
- a cross-system synchronous communication framework (RSF);
- a cross-system asynchronous communication framework (ESB).
The back-end database stores the business data produced while the business system runs, such as price data, logistics data and order data. It may use any common database architecture and type.
At the hardware level, the business system may consist of multiple servers, supercomputers or other hardware with computing capability. It is the system that operates online services, for example the inventory system, order system or notification system of an online shopping platform.
An embodiment of the present invention provides a method for analyzing system stability which, as shown in Fig. 2a, includes:
S1: collect operation data associated with monitoring indices.
Specifically, the monitoring indices include at least one of the following:
- indices describing the computing resources of the hardware device that runs the business system: the processor's idle-time percentage, the processor's I/O (read/write) wait-time percentage, the percentage of processor time used by user programs, the memory usage percentage, and the utilization of the disk read/write ports;
- indices describing its communication resources: the data traffic sent and the data traffic received by the network card;
- data that the business system generates while running and records in retrievable logs: the system's exception count, service call volume, response time, service exception count and order volume.
A monitoring index can serve as the label of the operation data associated with it.
For example, as shown in Figure 2 b, Analysis server accesses the monitoring systems such as zabbix by way of timed task is arranged,
Interface calling amount data collecting system, all kinds of monitor supervision platforms etc., to obtain the multinomial data of a period of time and land storage, than
Such as:Acquire the zabbix system monitorings index (the system average load in such as 1min) in a period of time, the prison of zabbix systems
Controlling index includes:It is the free time percentage of processor, the write/read stand-by period percentage of the processor, described
User program holding time percentage, the memory of processor are sent using percentage, the utilization rate of disk read-write port, network interface card
Data traffic and the network interface card receive data traffic.Specifically, can be adopted by the multi collect in a period of time, history of forming
Sample set, and be recorded as monitoring contrast table, in order to carry out pair the current operation data of system and existing monitoring contrast table
Than analysis, to rapidly find out the monitor control index being abnormal.
The service call volume of the system is the number of times each service is called while the business system runs, for example calls to business services such as a query service, an exchange-rate service or a broadcast service.
The response time of the system is the time the business system takes to respond for particular business functions while running.
The service exception count of the system is the number of service exceptions that the business system reports through its built-in real-time self-detection while running.
The order volume of the system is the number of orders handled by the business system within a certain period. "Orders handled" may mean the orders already completed, the orders currently being processed, or the sum of the two.
S2: using the correlation between different monitoring indices, select the operation data to be processed from the collected operation data, and determine a fluctuation range.
For indices whose correlation is significant, a corresponding data model is built and the value of their correlation coefficient is determined. This value quantifies the correlation between the monitoring indices. Taking actual conditions into account, a reasonable fluctuation range is then set for the model's output and used to detect whether the operation data to be processed is abnormal. Each monitoring index points to one kind of operation data, so the operation data to be processed is the operation data corresponding to two correlated monitoring indices.
The operation data to be processed comprises N groups of operation data. Among them is at least one pair whose monitoring indices are correlated: the monitoring index associated with the i-th group and the monitoring index associated with the j-th group are correlated, where N ≥ 2, 1 ≤ i ≤ N, 1 ≤ j ≤ N and i ≠ j.
For example, the monitoring index of a static business quantity such as order volume can be combined with the monitoring indices of other dynamic runtime data. Indices from multiple dimensions are thus fused and quantified as correlation coefficients, and the system's operating state is analyzed through correlation.
Overall, in this embodiment, the collected operation data can be rescaled onto a common scale (flattened and stretched) so that units are unified for analysis. The variance, covariance and correlation coefficient between each pair of data groups are then calculated.
S3: according to the fluctuation range, determine the abnormal conditions in the system's current operation data.
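The unit-unification and statistics step can be sketched in Python as shown below. The patent does not say which rescaling it uses. This sketch assumes z-score standardization and population (divide-by-N) statistics, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def variance(xs: Sequence[float]) -> float:
    """Population variance (divides by N)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def covariance(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Population covariance of two equally long series."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation: cov(X, Y) / (sigma_X * sigma_Y)."""
    return covariance(xs, ys) / math.sqrt(variance(xs) * variance(ys))


def standardize(xs: Sequence[float]) -> List[float]:
    """Assumed unit unification: z-score, giving mean 0 and variance 1."""
    m, sd = mean(xs), math.sqrt(variance(xs))
    return [(x - m) / sd for x in xs]
```

Pearson correlation is unchanged by positive linear rescaling. The choice of rescaling therefore affects the variance and covariance but not the correlation, and after z-scoring the covariance of two series equals their correlation.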
In this embodiment, mutually correlated system monitoring item data is collected, integrated and analyzed to build a mathematical model. The system's operating condition is judged by whether the collected monitoring data fits the model. This replaces the former practice of judging the operating condition from a threshold on each single monitoring item and makes monitoring more accurate and comprehensive. For example, the monitoring index of a static business quantity such as order volume can be combined with the indices of other dynamic runtime data. Indices from multiple dimensions are thereby fused into correlation coefficients, which are used to analyze the operating state.
Threshold-based judgment in traditional monitoring suffers from single-scenario monitoring, rigid decision rules, and judgments that do not match reality. To address these defects, the invention integrates multiple kinds of monitoring data collected over a period and builds a mathematical model of their correlation. The system's monitoring data is fed into the model, and the output is analyzed to reach a conclusion about the system's running state. The analysis is a comprehensive multi-index statistical analysis based on the system's historical behavior, so technicians no longer need to tune each monitoring item's threshold manually for each business scenario. This avoids inaccurate alarms across business scenarios and improves the automation and accuracy of existing monitoring.
In this embodiment, step S2 (using the correlation between different monitoring indices to select the operation data to be processed from the collected operation data, and determining a fluctuation range) may include:
building a data model of the operation data to be processed, determining the value of the correlation coefficient through the data model, and setting the fluctuation range of that coefficient.
The correlation between each pair of data groups is analyzed comprehensively across indices. Suppose, for example, that the volatility of CPU load is significantly related to order volume, is also related to network card traffic, and is not directly related to the business system's exception rate. Then its correlation coefficients with order volume and with network card traffic will be higher than its coefficient with the exception rate. A data model is built between indices whose correlation is more significant (for example, whose correlation coefficient is higher), and the specific value of their coefficient is determined. A reasonable fluctuation range is then set for the model output according to actual conditions.
First, the units of all collected data are unified and the correlation coefficients between data groups are calculated. The monitoring items with high correlation coefficients are selected, and an equation relating each associated pair is built from those coefficients. Operation data collected under the system's various running states is fed into the model equations to calculate coefficient values. From these values, the range under normal conditions and the range under abnormal conditions are determined and used as the basis for judging the system's operating condition.
Specifically, building the data model of the operation data to be processed includes:
collecting at least two different groups of operation data and obtaining the correlation coefficient between every two different groups; and, if the correlation coefficient of two groups is greater than a preset value, building a data model for those two groups. For example, a concrete scenario for judging the system's running state with a data model is as follows:
Collect the monitored operation data of the system running normally over a period, and label the groups data1, data2, data3, data4, .... Feed in these data and unify their units. Then calculate the correlation coefficient for every pair of groups and output all coefficients p12, p13, p14, p23, p24, p34.
The correlation coefficient is calculated with formula 1:
$$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X\sigma_Y} = \frac{E[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X\sigma_Y} = \frac{N\sum XY - \sum X\sum Y}{\sqrt{N\sum X^2 - \left(\sum X\right)^2}\,\sqrt{N\sum Y^2 - \left(\sum Y\right)^2}} \tag{1}$$
where E is the mathematical expectation, cov denotes covariance, σ_X and σ_Y are the standard deviations, Σ denotes summation, X and Y denote two system parameters (chosen according to the type of business system), N is the number of values, and μ_X, μ_Y are the means of X and Y.
1. Input all correlation coefficients, screen them, and output the two data groups corresponding to each high coefficient. For example, if p14 is 0.77, the corresponding data are data1 and data4. A coefficient counts as high when its absolute value exceeds the preset value. This indicates that the two parameters are strongly dependent and approximately satisfy a linear relationship. The preferred preset value is 0.6.
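Screening by the preferred preset value of 0.6 can be sketched as follows. Apart from p14 = 0.77, the coefficient values in the test are made up for illustration. `screen` is a hypothetical helper that assumes single-digit labels such as `p14`.

```python
from typing import Dict, List, Tuple

PRESET = 0.6  # preferred preset value from the text


def screen(coeffs: Dict[str, float], preset: float = PRESET) -> List[Tuple[str, str, float]]:
    """Return (data_i, data_j, r) for every pair with |r| > preset, strongest first.

    Assumes single-digit labels such as "p14", so label[1] and label[2] are i and j.
    """
    kept = []
    for label, r in coeffs.items():
        if abs(r) > preset:  # absolute value: strong negative dependence also counts
            i, j = label[1], label[2]
            kept.append((f"data{i}", f"data{j}", r))
    kept.sort(key=lambda item: abs(item[2]), reverse=True)
    return kept
```

Comparing the absolute value keeps strongly negative pairs as well, which matches the text's use of the coefficient's absolute value.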
2. Substitute each screened pair of data groups into formula 2, the least-squares fit:
$$\hat{b} = \frac{\sum_{i=1}^{N} x_i y_i - N\bar{x}\bar{y}}{\sum_{i=1}^{N} x_i^2 - N\bar{x}^2}, \qquad \hat{a} = \bar{y} - \hat{b}\,\bar{x} \tag{2}$$
where $\bar{x}$ and $\bar{y}$ are averages, Σ denotes summation, $\hat{b}$ is the value the text treats as the pair's correlation coefficient, and $\hat{a}$ is an intermediate parameter used in the calculation. The linear regression equation is $\hat{y} = \hat{b}x + \hat{a}$. Here x and y denote two system parameters chosen according to the type of business system. Depending on the business system, the lowercase x, y may or may not represent the same parameters as the uppercase X, Y. N is the number of sample pairs in formula 2.
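Step 2 fits a linear regression equation between each screened pair. The sketch below assumes that formula 2 is the ordinary least-squares fit of ŷ = b̂x + â, the standard form built from averages and sums. `fit_line` is a hypothetical helper name.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = b*x + a; returns (b, a)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # numerator and denominator of b, written as in the averages-and-sums form
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    sxx = sum(x * x for x in xs) - n * x_bar ** 2
    b = sxy / sxx
    a = y_bar - b * x_bar  # intercept from the means
    return b, a
```

Under this reading, repeating the fit for each collection period yields a series of b̂ values, one per period.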
Specifically, steps 1-2 are applied to operation data collected in different time periods. Over many calculations, the values obtained for a correlation coefficient form a distribution, which becomes relatively stable once a large amount of data has been processed. This distribution of values that the same coefficient takes over many calculations is used as its fluctuation range.
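The patent derives the fluctuation range from the distribution of a coefficient's values over many periods. It does not give a rule for turning that distribution into bounds. The sketch below assumes a mean ± k·σ band with a hypothetical k = 3; empirical percentiles would be an equally reasonable choice. `fluctuation_range` is a hypothetical name.

```python
import math
from typing import Sequence, Tuple


def fluctuation_range(history: Sequence[float], k: float = 3.0) -> Tuple[float, float]:
    """Band [mean - k*sd, mean + k*sd] over coefficient values from past periods.

    k = 3.0 is an illustrative choice, not a value given in the patent.
    """
    n = len(history)
    m = sum(history) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in history) / n)  # population std dev
    return m - k * sd, m + k * sd
```

A wider k trades fewer false alarms for slower detection. The band should be recomputed as new normal-period data accumulates.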
In actual operation, data for the preceding period can be collected at regular intervals and the coefficient values calculated, for example 100.2 and 50.6. Each coefficient is compared with its fluctuation range. If a value falls outside the range and the gap exceeds the configured excess value, it is judged abnormal and an alarm is raised. For example, 100.2 exceeds the upper limit of 70.5 by about 42.1%, more than the configured 10%, so it is judged abnormal. The warning is then sent to the mobile phone numbers, email addresses and other contacts configured for the relevant personnel.
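The runtime comparison with a relative tolerance can be sketched as follows. The 10% tolerance and the upper limit of 70.5 come from the example. The lower limit of 40.0 used in the test is an assumption, since the text does not state it, and the helper names are hypothetical. Because the excess is measured relative to the violated bound, the sketch also assumes non-zero bounds.

```python
from typing import Tuple


def excess_ratio(value: float, rng: Tuple[float, float]) -> float:
    """Relative distance beyond the violated bound (0.0 if inside the range).

    Assumes non-zero bounds, since the excess is divided by the bound.
    """
    lo, hi = rng
    if value > hi:
        return (value - hi) / abs(hi)
    if value < lo:
        return (lo - value) / abs(lo)
    return 0.0


def is_abnormal(value: float, rng: Tuple[float, float], tolerance: float = 0.10) -> bool:
    """Abnormal only if the value leaves the range by more than `tolerance`."""
    return excess_ratio(value, rng) > tolerance
```

With these assumptions, 100.2 yields an excess ratio of 29.7 / 70.5 and is flagged, while 50.6 lies inside the assumed range and is not.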
As another example, when the system is abnormal, its current operation data is captured as a snapshot and fed into the corresponding mathematical model. If the output exceeds the configured fluctuation range, the situation is judged abnormal. Suppose that historically CPU load was only 30% at an order volume of 100,000 (10W). If CPU load suddenly rises to 50% at the same order volume, this indicates a problem. With traditional manually set thresholds, however, the alarm threshold is usually above 50%. A simple threshold check would therefore still consider the CPU load normal and report no anomaly.
In this embodiment, step S3 (determining the abnormal conditions of the system's current operation data according to the fluctuation range) may be implemented as follows:
collect the system's current operation data and use the established data model to output a calculation result for it. When the calculation result does not conform to the fluctuation range, the system's current operation data is judged abnormal.
"The calculation result does not conform to the fluctuation range" can be understood in either of two ways:
- the values of the calculation result fall entirely outside the numerical interval of the fluctuation range. When the values fall only partly within the interval, or entirely within it, the result conforms;
- the values of the calculation result do not fall entirely within the interval. Only when all values fall within the interval does the result conform.
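The two readings of "does not conform" differ only when a calculation result contains several values, some inside and some outside the range. A minimal sketch of both policies follows, with hypothetical function names and an assumed closed interval.

```python
from typing import Sequence, Tuple

Range = Tuple[float, float]


def inside(v: float, rng: Range) -> bool:
    lo, hi = rng
    return lo <= v <= hi  # closed interval assumed


def conforms_lenient(values: Sequence[float], rng: Range) -> bool:
    """First reading: non-conforming only if every value is outside the range."""
    return any(inside(v, rng) for v in values)


def conforms_strict(values: Sequence[float], rng: Range) -> bool:
    """Second reading: conforming only if every value is inside the range."""
    return all(inside(v, rng) for v in values)
```

The strict policy raises more alarms and the lenient one fewer. Which one fits depends on how costly a missed anomaly is compared with a false alarm.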
Further, the method also includes:
When the current operation data of the system is judged to be abnormal, extracting exception information and issuing an early warning according to that information.
The exception information includes at least the host IP address of the system, the monitoring indicator, and the interface information corresponding to the abnormal operation data. For example, the analysis server feeds the collected data into the corresponding mathematical model for calculation. If the output exceeds the set fluctuation range, the situation is judged to be abnormal: the relevant host IP, monitoring indicator, and interface class are recorded, an alarm is raised, and the relevant information is sent by email or SMS to the person responsible for the system.
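The exception-information record and the alert step can be sketched as below. The field names, the message format, and the `senders` mapping are hypothetical; actual SMS or email delivery is left to injected callables rather than any specific gateway or mail library, so the sketch stays runnable without network access.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# A sender delivers one message to one recipient (hypothetical SMS gateway or mail client).
Sender = Callable[[str, str], None]


@dataclass
class ExceptionInfo:
    host_ip: str      # host IP address of the monitored system
    indicator: str    # monitoring indicator (or indicator pair) that failed
    interface: str    # interface that produced the abnormal operation data
    value: float      # model output that fell outside the range
    low: float
    high: float

    def message(self) -> str:
        return (f"[ALERT] host={self.host_ip} indicator={self.indicator} "
                f"interface={self.interface} value={self.value:.3f} "
                f"outside [{self.low:.3f}, {self.high:.3f}]")


def extract_exception(value: float, rng: tuple, host_ip: str,
                      indicator: str, interface: str) -> Optional[ExceptionInfo]:
    """Return exception information if the value is outside the fluctuation range, else None."""
    low, high = rng
    if low <= value <= high:
        return None
    return ExceptionInfo(host_ip, indicator, interface, value, low, high)


def send_alert(info: ExceptionInfo, recipients: Dict[str, List[str]],
               senders: Dict[str, Sender]) -> int:
    """Send the alert on every configured channel; return the number of messages sent."""
    text = info.message()
    sent = 0
    for channel, people in recipients.items():
        send = senders.get(channel)
        if send is None:  # channel configured but no sender registered: skip it
            continue
        for person in people:
            send(person, text)
            sent += 1
    return sent
```

Because recipients are passed in as plain data rather than code, the recipient list can be reloaded from configuration between runs without redeploying the monitoring task.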
In practical applications, the scheme suits multiple business scenarios. For a specific business scenario, the data model is implemented in code and a scheduled task is then configured: the task collects the system monitoring items involved in the data model, feeds them into the model to obtain the value of the correlation coefficient, and checks whether that value lies within the normal range. If it does not, the task calls the early-warning module, which includes an SMS alert module and an email alert module; the recipients of the alert information can be configured dynamically. The judgment of system running status thus changes from statically comparing a single monitoring item with its threshold to a comprehensive analysis of whether the correlations among several monitoring items are normal, making system monitoring more intelligent, comprehensive, and accurate. A typical scenario: before and after a new code release, no indicator exceeds its threshold, yet system resource usage rises noticeably while business volume stays unchanged. A comprehensive analysis of the indicators can reveal that the program itself may be at fault, so that an early warning is issued in time.
This overcomes the shortcomings of traditional monitoring methods, namely a single applicable business scenario, no dynamic monitoring, and inaccurate analysis results, and makes monitoring more flexible.
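The typical scenario above can be made concrete with made-up numbers. The sketch below assumes the data model is the Pearson correlation coefficient between order volume and CPU usage, a static CPU alarm threshold of 80%, and a fluctuation range of [0.85, 1.0]; all of these values are hypothetical. After the simulated release, CPU rises while orders follow the same pattern: no sample crosses the static threshold, but the correlation drops to about 0.71 and falls outside the range.

```python
import math
from typing import List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient, clipped to [-1, 1] against rounding error."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series has no defined correlation; treat as uncorrelated
    return max(-1.0, min(1.0, sxy / math.sqrt(sxx * syy)))


CPU_THRESHOLD = 80.0             # hypothetical static alarm threshold, percent
FLUCTUATION_RANGE = (0.85, 1.0)  # hypothetical normal range for corr(orders, cpu)

orders = [100, 200, 150, 120, 180, 160]          # orders per interval, same before and after
cpu_before = [10 + 0.2 * o for o in orders]       # CPU tracks order volume
drift = [0, 6, 12, 18, 24, 30]                    # extra load introduced by the new release
cpu_after = [c + d for c, d in zip(cpu_before, drift)]


def threshold_alarm(cpu: List[float]) -> bool:
    """Traditional check: alarm only if some sample exceeds the static threshold."""
    return any(v > CPU_THRESHOLD for v in cpu)


def correlation_alarm(xs, ys, rng=FLUCTUATION_RANGE) -> Tuple[bool, float]:
    """Correlation check: alarm if the coefficient leaves the fluctuation range."""
    r = pearson(xs, ys)
    low, high = rng
    return not (low <= r <= high), r


if __name__ == "__main__":
    for label, cpu in (("before", cpu_before), ("after", cpu_after)):
        flagged, r = correlation_alarm(orders, cpu)
        print(label, threshold_alarm(cpu), flagged, round(r, 2))
```

A scheduled task would run a check like `correlation_alarm` on each new window of data and pass any violation to the early-warning module.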
In this embodiment, a scheme is also provided that, based on the abnormal conditions judged above, further determines from their frequency of occurrence whether system resources need to be scaled up or scaled down. It specifically includes:
Obtaining the historical abnormal data of the system and calculating the system's scale-up or scale-down demand from it; and adjusting the amount of resources allocated to the system according to that demand.
Here, the historical abnormal data includes the abnormal conditions that occurred in the system within a specified period and the exception information corresponding to each of them. The scale-up or scale-down demand can be understood as an adjustment value for the system's hardware resources, for example the number of processors, the amount of memory, or the disk space to be added or removed: a positive value indicates an increase and a negative value a decrease. For example, the analysis server can use the intrinsic relationships between order volume and each indicator to compute a model relating server count to order volume, which supports server expansion and later requests for new machines. When calculating server scale, it roughly extrapolates the required machine configuration and quantity from the input order volume together with historical data on each system's call volume, server count, data read/write patterns, and server resource utilization. This quickly gives technical staff a scale-up or scale-down plan for reference.
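A minimal sketch of the scale-up/scale-down calculation, assuming a linear relation between order volume and server count fitted by ordinary least squares on historical data. The patent does not specify the model form; the anomaly-frequency rule (only add capacity when at least `min_anomalies` abnormal conditions occurred in the recent window), the seven-day window, and all names and numbers are hypothetical. The returned value follows the sign convention above: positive means add servers, negative means remove them.

```python
import math
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a * x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def scaling_demand(hist_orders: Sequence[float], hist_servers: Sequence[float],
                   forecast_orders: float, current_servers: int,
                   anomaly_times: List[datetime], now: datetime,
                   window: timedelta = timedelta(days=7),
                   min_anomalies: int = 3) -> int:
    """Signed server adjustment: > 0 add servers, < 0 remove servers, 0 keep as is."""
    a, b = fit_line(hist_orders, hist_servers)
    # small epsilon so floating-point noise such as 10.000000000000002 does not round up
    needed = max(1, math.ceil(a * forecast_orders + b - 1e-9))
    # historical abnormal data: count abnormal conditions inside the recent window
    recent = sum(1 for t in anomaly_times if now - window <= t <= now)
    if needed > current_servers and recent < min_anomalies:
        return 0  # too few recent anomalies to justify adding capacity
    return needed - current_servers
```

For example, with history of 2, 4, 6, and 8 servers at 1,000 to 4,000 orders, a forecast of 5,000 orders, 6 current servers, and four anomalies in the past week, the sketch returns +4.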
The present invention also provides a device for analyzing system stability, which may run on the analysis server shown in Figure 1. As shown in Figure 3, the device includes:
A data acquisition module, for collecting operation data associated with the monitoring indicators;
A preprocessing module, for selecting the operation data to be processed from the collected operation data using the correlation between different monitoring indicators, and for determining the fluctuation range;
An analysis module, for obtaining the abnormal conditions of the system's current operation data according to the fluctuation range.
The preprocessing module is specifically used to collect at least two groups of different operation data and compute the correlation coefficient between every two groups; if the correlation coefficient of two groups exceeds a preset value, it builds a data model of those two groups of operation data, then determines the value of the correlation coefficient through the data model and sets the fluctuation range of the correlation coefficient.
The monitoring indicators include at least one of the following: the processor's idle time percentage, the processor's I/O (write/read) wait time percentage, the processor's user-program time percentage, memory usage percentage, disk read/write port utilization, data traffic sent by the network card, data traffic received by the network card, the system's service call volume, the system's response time, the system's service exception count, and the system's order volume.
The analysis module is specifically used to collect the current operation data of the system and output, through the established data model, a calculation result for that data; when the calculation result does not conform to the fluctuation range, it judges the current operation data of the system to be abnormal.
Further, as shown in Figure 4, the device also includes:
An alarm module, for extracting exception information when the current operation data of the system is judged to be abnormal, the exception information including at least the host IP address of the system, the monitoring indicator, and the interface information corresponding to the abnormal operation data, and for issuing an early warning according to the exception information;
A calibration module, for obtaining the historical abnormal data of the system and calculating the system's scale-up or scale-down demand from it, wherein the historical abnormal data includes the abnormal conditions that occurred in the system within a specified period and the exception information corresponding to those abnormal conditions, and for adjusting the amount of resources allocated to the system according to that demand.
In this embodiment, mutually correlated system monitoring items are collected and analyzed together to build a mathematical model, and the system's running status is judged by whether the collected monitoring data conform to that model. This abandons the former practice of setting a threshold for each individual monitoring item to judge system status, and makes system monitoring more accurate and comprehensive. For example, it becomes possible to combine a static business indicator such as order volume with the system's dynamic runtime indicators, so that indicators from multiple dimensions are fused and quantified as correlation coefficients, and the system's running status is then analyzed through correlation.
Traditional system monitoring judges the system's running status by thresholds, which exposes defects such as a single monitoring scenario, a rigid decision method, and judgments that do not match the facts. To address these, the present invention integrates multiple kinds of monitoring data collected from the system over a period of time, builds a correlation-based mathematical model, feeds the system's monitoring data into the model, and analyzes the output to reach a conclusion about the system's running status. Because the approach rests on comprehensive multi-indicator statistical analysis of the system's historical performance, technical staff no longer need to tediously hand-tune the threshold of each monitoring item for different business scenarios. This avoids inaccurate monitoring alarms across business scenarios and improves the intelligence and accuracy of existing monitoring.
The embodiments in this specification are described progressively; identical or similar parts of the embodiments can be cross-referenced, and each embodiment focuses on its differences from the others. The device embodiment is substantially similar to the method embodiment and is therefore described more briefly; for relevant details, refer to the description of the method embodiment. The above description is merely a specific embodiment, but the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A method for analyzing system stability, characterized by comprising:
Collecting operation data associated with monitoring indicators;
Selecting, using the correlation between different monitoring indicators, operation data to be processed from the collected operation data, and determining a fluctuation range, wherein the operation data to be processed comprises N groups of operation data, among which there is at least one pair of correlated monitoring indicators, i.e., the monitoring indicator associated with the i-th group of operation data is correlated with the monitoring indicator associated with the j-th group of operation data, where N ≥ 2, 1 ≤ i ≤ N, 1 ≤ j ≤ N, and i ≠ j;
Obtaining, according to the fluctuation range, the abnormal conditions of the current operation data of the system.
2. The method according to claim 1, characterized in that the monitoring indicators include at least one of: the processor's idle time percentage, the processor's I/O (write/read) wait time percentage, the processor's user-program time percentage, memory usage percentage, disk read/write port utilization, data traffic sent by the network card, data traffic received by the network card, the system's service call volume, the system's response time, the system's service exception count, and the system's order volume.
3. The method according to claim 1 or 2, characterized in that selecting, using the correlation between different monitoring indicators, operation data to be processed from the collected operation data and determining the fluctuation range comprises:
Building a data model of the operation data to be processed;
Determining the value of the correlation coefficient through the data model, and setting the fluctuation range of the correlation coefficient.
4. The method according to claim 3, characterized in that building the data model of the operation data to be processed comprises:
Collecting at least two groups of different operation data, and obtaining the correlation coefficient between every two groups of different operation data;
If the correlation coefficient of two of the groups exceeds a preset value, building a data model of the two groups of operation data whose correlation coefficient exceeds the preset value.
5. The method according to claim 1, characterized in that obtaining, according to the fluctuation range, the abnormal conditions of the current operation data of the system comprises:
Collecting the current operation data of the system, and outputting, through the established data model, a calculation result for the current operation data of the system;
When the calculation result does not conform to the fluctuation range, judging the current operation data of the system to be abnormal.
6. The method according to claim 1 or 5, characterized by further comprising:
When the current operation data of the system is judged to be abnormal, extracting exception information, the exception information including at least the host IP address of the system, the monitoring indicator, and the interface information corresponding to the abnormal operation data;
Issuing an early warning according to the exception information.
7. The method according to claim 6, characterized by further comprising:
Obtaining the historical abnormal data of the system, and calculating the system's scale-up or scale-down demand according to the historical abnormal data, wherein the historical abnormal data includes the abnormal conditions that occurred in the system within a specified period and the exception information corresponding to those abnormal conditions;
Adjusting the amount of resources allocated to the system according to the system's scale-up or scale-down demand.
8. A device for analyzing system stability, characterized by comprising:
A data acquisition module, for collecting operation data associated with monitoring indicators;
A preprocessing module, for selecting, using the correlation between different monitoring indicators, operation data to be processed from the collected operation data, and determining a fluctuation range;
An analysis module, for obtaining, according to the fluctuation range, the abnormal conditions of the current operation data of the system.
9. The device according to claim 8, characterized in that the preprocessing module is specifically used to collect at least two groups of different operation data and obtain the correlation coefficient between every two groups of different operation data; if the correlation coefficient of two of the groups exceeds a preset value, to build a data model of the two groups of operation data whose correlation coefficient exceeds the preset value; and then to determine the value of the correlation coefficient through the data model and set the fluctuation range of the correlation coefficient;
The monitoring indicators include at least one of: the processor's idle time percentage, the processor's I/O (write/read) wait time percentage, the processor's user-program time percentage, memory usage percentage, disk read/write port utilization, data traffic sent by the network card, data traffic received by the network card, the system's service call volume, the system's response time, the system's service exception count, and the system's order volume;
The analysis module is specifically used to collect the current operation data of the system and output, through the established data model, a calculation result for the current operation data of the system; and, when the calculation result does not conform to the fluctuation range, to judge the current operation data of the system to be abnormal.
10. The device according to claim 8, characterized by further comprising:
An alarm module, for extracting exception information when the current operation data of the system is judged to be abnormal, the exception information including at least the host IP address of the system, the monitoring indicator, and the interface information corresponding to the abnormal operation data; and for issuing an early warning according to the exception information;
A calibration module, for obtaining the historical abnormal data of the system and calculating the system's scale-up or scale-down demand according to the historical abnormal data, wherein the historical abnormal data includes the abnormal conditions that occurred in the system within a specified period and the exception information corresponding to those abnormal conditions; and for adjusting the amount of resources allocated to the system according to the system's scale-up or scale-down demand.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810083390.1A CN108390793A (en) | 2018-01-29 | 2018-01-29 | A kind of method and device of analysis system stability |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108390793A true CN108390793A (en) | 2018-08-10 |
Family ID: 63074226
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810083390.1A Pending CN108390793A (en) | 2018-01-29 | 2018-01-29 | A kind of method and device of analysis system stability |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104820630A (en) * | 2015-05-22 | 2015-08-05 | 上海新炬网络信息技术有限公司 | System resource monitoring device based on business variable quantity |
CN106600115A (en) * | 2016-11-28 | 2017-04-26 | 湖北华中电力科技开发有限责任公司 | Intelligent operation and maintenance analysis method for enterprise information system |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109522325A (en) * | 2018-09-28 | 2019-03-26 | 中国平安人寿保险股份有限公司 | Business impact analysis method, apparatus, electronic equipment and storage medium |
WO2020237433A1 (en) * | 2019-05-24 | 2020-12-03 | 李玄 | Method and apparatus for monitoring digital certificate processing device, and device, medium and product |
US11924194B2 (en) | 2019-05-24 | 2024-03-05 | Antpool Technologies Limited | Method and apparatus for monitoring digital certificate processing device, and device, medium, and product |
CN112423032A (en) * | 2020-10-21 | 2021-02-26 | 当趣网络科技(杭州)有限公司 | Data monitoring method and device based on smart television, electronic equipment and medium |
CN112600705A (en) * | 2020-12-14 | 2021-04-02 | 国网四川省电力公司信息通信公司 | Method for automatic operation and maintenance of network equipment |
CN114493378A (en) * | 2022-04-06 | 2022-05-13 | 树根互联股份有限公司 | Index acquisition method and device of industrial equipment and computer equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20180810 |