CN103178990A - Network device performance monitoring method and network management system - Google Patents

Publication number
CN103178990A
Authority
CN
China
Application number
CN2011104303495A
Other languages
Chinese (zh)
Inventor
单建业
刘武升
王明昭
石国章
刘涛
赵浩然
邓小红
Original Assignee
中国移动通信集团青海有限公司
中国移动通信集团公司

Abstract

The invention discloses a network device performance monitoring method and a network management system. In the method, the network management system periodically collects key performance indicator (KPI) data of a network device; aggregates the collected KPI data by configured time windows to obtain an aggregated KPI value for each time window; uses a linear regression model to predict the KPI data of N future time windows, including the current time window, from the aggregated KPI values of the N time windows before the current time window; judges whether the predicted KPI data of the N future time windows exceed a configured alarm threshold; and raises an alarm if so. The method and system enable early warning based on historical KPI data.

Description

Network device performance monitoring method and network management system

Technical field

The present invention relates to network management technology in the communications field, and in particular to a network device performance monitoring method and a network management system.

Background

As the telecommunications industry has grown, business support systems have come to play an increasingly important role in customer service, service fulfillment, service assurance, billing and accounting, and forecasting and planning. As the market develops and new services are continually launched, the business support system keeps growing and improving along with the company. The basic nodes that keep the system running normally are its hosts, and host safety problems are becoming increasingly serious.

Traditional host performance monitoring sets thresholds on the KPIs of a running host (KPI stands for Key Performance Indicator, for example CPU usage, memory usage, I/O throughput, disk usage, and database usage). When the system observes a KPI exceeding its threshold, it raises an alert. The advantage of this method is that it is simple and easy to implement.

Long-term operations and maintenance (O&M) practice shows that the prior art has obvious limitations, mainly in the following respects:

(1) Traditional threshold-based monitoring does not consider the trend of the system's KPIs. If a KPI changes abruptly while the system is running but has not yet reached the configured threshold, no alarm is produced, even though the situation already deserves the attention of maintenance personnel, who need to intervene proactively to keep the KPI from rising further.

(2) Traditional threshold-based monitoring is passive: it alarms only after a problem has occurred, by which time normal operation of the system may already be affected. It cannot serve the purpose of proactive, preventive monitoring.

In short, traditional host performance monitoring is threshold-based, and by the time a system KPI exceeds its threshold the system is usually already overloaded. An alarm at that point is hard for the customer to handle, and the operational risk to the system is also high.

Summary of the invention

Embodiments of the present invention provide a network device performance monitoring method and a network management system, in order to provide early warning based on historical KPI data.

The network device performance monitoring method provided by an embodiment of the present invention comprises:

The network management system periodically collects KPI data of a network device;

The network management system aggregates the collected KPI data by configured time windows to obtain an aggregated KPI value for each time window;

The network management system uses a linear regression model to predict, from the aggregated KPI values of the N time windows before the current time window, the KPI data of N future time windows including the current time window;

The network management system judges whether the KPI data of the N future time windows exceed a configured alarm threshold, and raises an alarm if so.

The network device performance monitoring apparatus provided by an embodiment of the present invention comprises:

An acquisition module, configured to periodically collect KPI data of a network device;

A summarizing module, configured to aggregate the collected KPI data by configured time windows to obtain an aggregated KPI value for each time window;

A prediction module, configured to use a linear regression model to predict, from the aggregated KPI values of the N time windows before the current time window, the KPI data of N future time windows including the current time window;

An alarm module, configured to judge whether the predicted KPI data of the N future time windows exceed a configured alarm threshold, and to raise an alarm if so.

In the above embodiments, the collected KPI data are aggregated by configured time windows, and a linear regression model predicts the KPI data of N future time windows, including the current time window, from the aggregated KPI values of the N preceding time windows. Historical KPI data are thus used to predict future KPI data, and alarms are raised according to the prediction, so that an early warning is issued before a problem actually occurs and the system can be analyzed and handled in advance.

Description of drawings

Fig. 1 is a schematic flowchart of network device performance monitoring provided by an embodiment of the present invention;

Fig. 2 and Fig. 3 are schematic diagrams of medium-to-long-term KPI data prediction in an embodiment of the present invention;

Fig. 4 and Fig. 5 are schematic diagrams of long-term KPI data prediction in an embodiment of the present invention;

Fig. 6 is a schematic structural diagram of the network management system provided by an embodiment of the present invention.

Detailed description

To address the problems of the prior art, embodiments of the present invention use the monitored KPI data to predict system resource usage over a coming period, and perform proactive monitoring and alarming based on the prediction, so that an early warning is issued before a problem actually occurs and the system can be analyzed and handled in advance.

The KPI data referred to here may include any one or any combination of the parameters that characterize the performance of a network device or business system, such as CPU usage, memory usage, and network throughput.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings.

Referring to Fig. 1, a schematic flowchart of network device performance monitoring provided by an embodiment of the present invention, the flow can be implemented by a network management system and may comprise:

Step 101: periodically collect KPI data of the network device.

In a specific implementation, the various KPI data can be collected from a traditional monitoring system.

Step 102: aggregate the collected KPI data by configured time windows to obtain an aggregated KPI value for each time window.

In a specific implementation, the peak KPI value collected within a time window can be used as that window's aggregated KPI value. When a time window contains several data statistics periods, the peak KPI values collected in each statistics period of the window are averaged, and that mean of per-period peaks is used as the window's aggregated KPI value.
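
The two aggregation options in step 102 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names and the sample values are hypothetical, and each window (or statistics period) is assumed to arrive as a plain list of numeric samples.

```python
from statistics import mean


def window_peak(samples):
    """Aggregated value of one time window: the peak of its samples."""
    return max(samples)


def window_mean_of_peaks(periods):
    """Aggregated value of a window that spans several statistics periods:
    the mean of each period's peak."""
    return mean(max(period) for period in periods)


# Hypothetical CPU usage samples (%), for illustration only.
print(window_peak([71.0, 73.2, 72.5]))
print(window_mean_of_peaks([[60.0, 70.0], [80.0, 75.0]]))
```

Taking the peak rather than the mean of the raw samples keeps short spikes visible in the aggregated series, which suits an early-warning use.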

Step 103: use a linear regression model to predict, from the aggregated KPI values of the N time windows before the current time window, the KPI data of N future time windows including the current time window. The N time windows here are N consecutive time windows.

Step 104: output the predicted KPI data of the N future time windows.

Further, the flow also comprises:

Step 105: judge whether the KPI data of the N future time windows satisfy the alarm conditions, and raise an alarm if so. There is no strict ordering requirement between this step and step 104.

In a specific implementation, if the KPI data collected in step 101 include several types, for example CPU usage and memory usage, then in this step the predicted CPU usage of the N future time windows is judged against the alarm conditions for CPU usage, the predicted memory usage of the N future time windows is judged against the alarm conditions for memory usage, and alarms are raised according to the results. Whether to alarm under the alarm conditions can be decided in any of the following ways:

Mode 1: if any value in the KPI data of the N future time windows exceeds the threshold for that KPI (a preset fixed value), raise an alarm;

Mode 2: judge as in mode 1, except that the KPI threshold is obtained by dynamically adjusting a preset threshold according to the KPI data of the N future time windows and the slope of the linear regression model fitted to the aggregated KPI values of the N time windows before the current time window;

Mode 3: if any value in the KPI data of the N future time windows exceeds the KPI threshold (which can be preset, or dynamically adjusted as in mode 2) and the slope of the linear regression model satisfies a configured condition, raise an alarm;

Mode 4: a combination of the above modes.

Depending on the time window length, embodiments of the present invention can provide near-real-time predictive monitoring and alarming, medium-to-long-term predictive monitoring and alarming, and long-term predictive monitoring and alarming.

Near-real-time predictive monitoring has a short prediction period, so it can forecast the performance of the network device promptly and surface problems as early as possible. Its time window length can be set between 10 and 20 minutes, for example 10, 15, or 20 minutes, with 15 minutes preferred; the window length can also be set per KPI type. The 10-to-20-minute range is an empirical value drawn from long-term O&M work and, in repeated tests, gave the best prediction results.

In the near-real-time flow, when computing the aggregated KPI values of the N time windows before the current time window, the peak KPI value collected within each window can be used as that window's aggregated KPI value.

When predicting the KPI data of the N future time windows, a univariate linear regression equation can be fitted to the aggregated KPI values of the N time windows before the current time window, with the time window sequence number as the independent variable and the aggregated KPI value as the dependent variable; the KPI data of the N future time windows are then predicted from this equation.

When judging whether the KPI data of the N future time windows satisfy the alarm conditions for the KPI, the KPI threshold used can be obtained by adjusting a preset KPI threshold according to the KPI data of the N future time windows and the slope of the linear regression model fitted to the aggregated KPI values of the N preceding time windows. The early-warning decision usually needs to consider both the threshold and the slope; for example, the following alarm rule can be configured:

(VALUE > 90) OR (VALUE > 80 AND SLOPE > 0.5)

This rule means: an alarm is raised if the predicted value exceeds 90% (threshold 1), or if the predicted value exceeds 80% (threshold 2) and the trend slope is greater than 0.5.
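
The rule above translates directly into a predicate. The sketch below is illustrative only: the function and parameter names are hypothetical, and the defaults simply mirror the 90, 80, and 0.5 values of the example rule.

```python
def should_alarm(value, slope, hard_limit=90.0, soft_limit=80.0, slope_limit=0.5):
    """Evaluate (VALUE > 90) OR (VALUE > 80 AND SLOPE > 0.5).

    value: predicted KPI value in percent.
    slope: slope of the fitted regression line.
    """
    return value > hard_limit or (value > soft_limit and slope > slope_limit)


print(should_alarm(91.0, 0.0))   # above the hard limit, slope irrelevant
print(should_alarm(85.0, 0.76))  # above the soft limit with a steep trend
print(should_alarm(85.0, 0.20))  # above the soft limit but a flat trend
```

Keeping the three limits as parameters makes it straightforward to tune them per KPI type.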

Further, these values can be adjusted and optimized continually in use, mainly according to the following principles:

(1) If the original monitoring system produced an alarm and this early-warning system did not warn beforehand, analyze the KPI data preceding the alarm and lower the absolute threshold or the slope threshold as appropriate.

(2) If a large number of early warnings are produced and most are false positives, raise the absolute threshold or the slope threshold.

(3) Ideally, this system issues an early warning in the 1 to 2 hours before 80% of the alarms produced by the original monitoring system, and the accuracy of the early warnings is above 70%.
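
One way to check principle (3) against logged data is sketched below. Several things here are assumptions rather than part of the source: timestamps are plain minutes from an arbitrary origin, an alarm counts as covered when any early warning falls within the preceding 120 minutes, and "accuracy" is read as the share of early warnings that are followed by an alarm within the same lead time. All names and sample values are hypothetical.

```python
def early_warning_stats(alarm_times, warning_times, lead=120):
    """Return (coverage, accuracy) for early warnings versus real alarms.

    coverage: share of alarms preceded by a warning within `lead` minutes.
    accuracy: share of warnings followed by an alarm within `lead` minutes.
    """
    def preceded(alarm):
        return any(0 < alarm - w <= lead for w in warning_times)

    def followed(warning):
        return any(0 < a - warning <= lead for a in alarm_times)

    coverage = (sum(preceded(a) for a in alarm_times) / len(alarm_times)
                if alarm_times else 0.0)
    accuracy = (sum(followed(w) for w in warning_times) / len(warning_times)
                if warning_times else 0.0)
    return coverage, accuracy


# Hypothetical log: three alarms and four early warnings (minutes).
coverage, accuracy = early_warning_stats([300, 900, 1500], [220, 850, 1100, 1450])
print(coverage, accuracy)
```

Coverage below 0.8 suggests lowering a threshold as in principle (1); accuracy below 0.7 suggests raising one as in principle (2).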

Medium-to-long-term predictive monitoring has a moderate prediction period: it can forecast the performance of the network device in good time without predicting and alarming as frequently as near-real-time monitoring. Its time window length can be set to one or several days, for example 1, 2, or 5 days, with 1 day preferred; the window length can also be set per KPI type. The 1-day window length is an empirical value drawn from long-term O&M work and, in repeated tests, gave the best prediction results.

In the medium-to-long-term flow, when computing the aggregated KPI values of the N time windows before the current time window, the peak KPI values collected in each data statistics period of a time window can be averaged, and the mean of the per-period peaks is used as that window's aggregated KPI value.

When predicting the KPI data of the N future time windows: within the N time windows before the current time window, subtract from each window's aggregated KPI value the aggregated KPI value of its preceding window, giving an array of N-1 increments. Using the linear regression model, compute from this array the regression slope of the N-1 increments (that is, fit a univariate linear regression equation with the element's index in the array as the independent variable and the element's value as the dependent variable). Using this model and slope, compute for each of the N future time windows, including the current time window, the increment of its KPI data relative to the preceding window. From each future window's increment and the predicted KPI value of its preceding window, obtain the predicted KPI value of each of the N future time windows; the predicted KPI value of the first of these windows is the sum of its increment and the aggregated KPI value of its preceding window.
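
The increment-based prediction can be sketched as below. The text does not spell out how the fitted line is extrapolated, so this sketch assumes the increments are indexed 1 to N-1, that future increments continue at indices N, N+1, and so on, and that both slope and intercept of the fitted line are used. The helper names and the sample history are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast_from_increments(history, horizon=None):
    """Predict future values by extrapolating window-to-window increments.

    history: aggregated values of the N preceding windows (N >= 3).
    horizon: number of future windows; defaults to N.
    Returns (increment slope, list of predicted values).
    """
    n = len(history)
    horizon = n if horizon is None else horizon
    increments = [b - a for a, b in zip(history, history[1:])]  # N-1 values
    slope, intercept = fit_line(list(range(1, n)), increments)
    forecasts = []
    previous = history[-1]  # the first forecast builds on the last real value
    for k in range(horizon):
        previous = previous + slope * (n + k) + intercept
        forecasts.append(previous)
    return slope, forecasts


# Hypothetical history whose increments grow by 1 each window: 1, 2, 3.
slope, predicted = forecast_from_increments([50, 51, 53, 56])
print(slope, predicted)
```

Because each forecast accumulates on the previous one, a positive increment slope produces an accelerating curve, which makes a developing trend easier to spot.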

When judging whether the KPI data of the N future time windows satisfy the alarm conditions for the KPI, if the increment slope is greater than 0 (indicating a steep upward trend), an alarm can be raised, or the prediction data can be flagged, to draw the network administrator's attention.

Long-term predictive monitoring has a long prediction period and can forecast the performance of the network device over a longer future period, so that a handling strategy can be chosen according to the KPI trend. Its time window length can be set to one or several months, with 1 month preferred; the window length can of course also be set per KPI type. The 1-month window length is an empirical value drawn from long-term O&M work and, in repeated tests, gave the best prediction results.

In the long-term flow, when computing the aggregated KPI values of the N time windows before the current time window, the peak KPI values collected in each data statistics period of a time window can be averaged, and the mean of the per-period peaks is used as that window's aggregated KPI value.

When predicting the KPI data of the N future time windows, a univariate linear regression equation can be fitted to the aggregated KPI values of the N time windows before the current time window, with the time window sequence number as the independent variable and the aggregated KPI value as the dependent variable; the KPI data of the N future time windows are then predicted from this equation.

For long-term predictive monitoring, an alarm mechanism can be configured. Long-term predictive monitoring is mainly used to view long-term performance trends and can serve as a reference for capacity analysis; for example, if CPU usage stays high for a long time and is trending upward, hardware expansion should be considered.

The implementation flows of near-real-time, medium-to-long-term, and long-term predictive monitoring and alarming are illustrated below with concrete examples.

Example one: near-real-time predictive monitoring and alarming

Taking monitoring of a network device's CPU usage with a 15-minute time window as an example, the near-real-time predictive monitoring and alarming flow may comprise:

Data are collected from the traditional monitoring system once every 5 to 15 minutes, for example every 5, 10, or 15 minutes. Each time window is 15 minutes, so one or more CPU usage samples of a network device are collected in each window, and the maximum CPU usage collected in each window is used as that window's aggregated CPU usage. Then the 8 time windows preceding the current time window are taken, i.e. the CPU usage of the previous 2 hours; the aggregated CPU usage of these 8 windows may be as shown in Table 1.

Table 1

Time     CPU usage (%)
20:00    73.2
20:15    72.1
20:30    74.3
20:45    74.6
21:00    75.1
21:15    75.8
21:30    78.3
21:45    77.2

The aggregated CPU usage of these 8 time windows is used below to predict the CPU usage of the following 8 time windows:

The CPU usage of these 8 time windows is arranged as independent variable X and dependent variable Y, as shown in Table 2:

Table 2

X    Y
1    73.2
2    72.1
3    74.3
4    74.6
5    75.1
6    75.8
7    78.3
8    77.2

The linear regression formula is y = a*x + b. Before predicting, the values of a and b are first computed from Table 2. The embodiment of the present invention uses the LinearRegression model of Weka for this calculation; the two Excel functions below can be used instead:

a = SLOPE(dependent variable array, independent variable array)

b = INTERCEPT(dependent variable array, independent variable array)

Applying the linear regression model to the data in Table 2 gives a = 0.7619 and b = 71.65.
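
For reference, SLOPE and INTERCEPT return the ordinary least-squares estimates. The worked arithmetic below applies the standard closed-form formulas to the Table 2 data; the symbols for the sample means are introduced here for illustration and do not appear in the original text.

```latex
\begin{align*}
\bar{x} &= \tfrac{1}{8}\sum_{i=1}^{8} x_i = 4.5, \qquad
\bar{y} = \tfrac{1}{8}\sum_{i=1}^{8} y_i = \tfrac{600.6}{8} = 75.075 \\
a &= \frac{\sum_{i=1}^{8}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{8}(x_i-\bar{x})^{2}}
   = \frac{32.0}{42} \approx 0.7619 \\
b &= \bar{y} - a\,\bar{x} \approx 75.075 - 0.7619 \times 4.5 \approx 71.65
\end{align*}
```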

Then the CPU usage of the following 8 time windows can be predicted from the formula y = a*x + b. Specifically, the CPU usage of the 9th time window is:

Y9 = a*x + b = 0.7619*9 + 71.65 = 78.50 (computed from the unrounded coefficients and rounded to 2 decimal places)

The CPU usage of the 10th time window is:

Y10 = a*x + b = 0.7619*10 + 71.65 = 79.27 (rounded to 2 decimal places)

Continuing in the same way, the CPU usage of the 9th to 18th time windows can be predicted, as shown in Table 3:

Table 3

X     Y
9     78.50
10    79.27
11    80.03
12    80.79
13    81.55
14    82.31
15    83.08
16    83.84
17    84.60
18    85.36
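
The fit and forecast of Tables 2 and 3 can be reproduced with a few lines of Python. This is a sketch using a hand-written least-squares helper in place of Weka or Excel; the helper name `fit_line` and the variable names are hypothetical, while the input data and the 80% threshold come from this example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a * x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


# Table 2: window sequence number X and aggregated CPU usage Y (%).
usage = [73.2, 72.1, 74.3, 74.6, 75.1, 75.8, 78.3, 77.2]
a, b = fit_line(list(range(1, 9)), usage)

# Table 3: predictions for windows 9 to 18.
forecast = {x: a * x + b for x in range(9, 19)}

# Windows whose prediction exceeds the preset 80% threshold.
over = [x for x, y in forecast.items() if y > 80.0]

print(f"a = {a:.4f}, b = {b:.2f}")
for x, y in forecast.items():
    print(x, f"{y:.2f}")
print("windows above 80%:", over)
```

With these inputs the prediction first crosses 80% at window 11, that is, about 45 minutes after the last observed window.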

Whether the CPU usage threshold needs adjusting is decided according to the preset CPU usage threshold adjustment strategy. If it does, the threshold is adjusted starting from the preset CPU usage threshold, and whether to alarm is decided from the adjusted threshold and the prediction data in Table 3.

For example, the preset CPU usage threshold in this flow is 80%. From the linear regression model y = 0.7619*x + 71.65, the slope is 0.7619. This value is fairly large, indicating a pronounced upward trend in CPU usage. With a 15-minute time window, usage would reach the preset threshold very soon and keep rising quickly; Table 3 further shows that most predicted values exceed 80% and approach 85%. In this case the CPU usage threshold need not be adjusted: the prediction shows CPU usage rising quickly, so an alarm should be raised promptly to remind the network administrator to act.

Example two: medium-to-long-term predictive monitoring and alarming

Taking monitoring of a network device's CPU usage with a 1-day time window as an example, the medium-to-long-term predictive monitoring and alarming flow may comprise:

CPU usage is collected from the traditional monitoring system at the configured collection period, for example every 5, 10, or 15 minutes. For each hour, the maximum CPU usage collected in that hour is recorded; the hourly CPU usage peaks of a day are averaged to give the day's CPU usage peak mean.

Then the CPU usage peak means of the 15 time windows preceding the current time window are taken, i.e. the CPU usage peak means of the previous 15 days, which may be as shown in Fig. 2. Fig. 2 shows the daily CPU usage peak mean from May 16 to May 30, 2009.

Next, within the CPU usage peak means of the previous 15 days, each day's CPU usage minus the previous day's CPU usage gives an array of 14 increments. The linear regression model is then used to compute the regression slope of these 14 increments, and with this model and slope the increment of each of the next 15 days relative to its previous day is computed.

Then each future day's CPU usage increment is added to the predicted CPU usage of its previous day to give that day's predicted CPU usage. For the first of the 15 future days, the predicted CPU usage is that day's increment plus the CPU usage peak mean of the previous day.

Following this prediction process, the predicted CPU usage for May 31 to June 14, 2009 is obtained, as shown in Fig. 2.

Predicting KPI data from the slope of the increments, in effect the slope of the slope, amplifies the trend in the KPI data and so draws the attention of monitoring staff.

Fig. 3 shows the actual CPU usage from May 31 to June 14. Compared with the earlier prediction, CPU usage did not exceed 75%, but it rose markedly relative to the previous 15 days. Although the monitoring alarm threshold was not reached, the machine's CPU usage was rising quickly, so monitoring staff need to pay attention and investigate the cause of the rapid rise as soon as possible, to avoid affecting system operation in the near future.

Example three: long-term predictive monitoring and alarming

Taking monitoring of a network device's CPU usage with a 1-month time window as an example, the long-term predictive monitoring and alarming flow may comprise:

CPU usage is collected from the traditional monitoring system at the configured collection period, for example every 5, 10, or 15 minutes. For each hour, the maximum CPU usage collected in that hour is recorded; the hourly CPU usage peaks of a day are averaged to give the day's CPU usage peak mean; the daily peak means of a month are then averaged to give the month's CPU usage peak mean.
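
The hour, day, and month roll-up described above can be sketched as follows. It assumes samples arrive as (timestamp, value) pairs; the function names and the sample data are hypothetical and serve only to show the grouping.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_peak_means(samples):
    """Map each date to the mean of its hourly peak values."""
    hourly = defaultdict(list)
    for ts, value in samples:
        hourly[(ts.date(), ts.hour)].append(value)
    per_day = defaultdict(list)
    for (day, _hour), values in hourly.items():
        per_day[day].append(max(values))  # peak of each hour
    return {day: mean(peaks) for day, peaks in per_day.items()}


def monthly_peak_means(samples):
    """Map each (year, month) to the mean of its daily peak means."""
    per_month = defaultdict(list)
    for day, value in daily_peak_means(samples).items():
        per_month[(day.year, day.month)].append(value)
    return {month: mean(values) for month, values in per_month.items()}


# Hypothetical CPU usage samples (%), for illustration only.
samples = [
    (datetime(2009, 5, 16, 10, 0), 60.0),
    (datetime(2009, 5, 16, 10, 5), 70.0),
    (datetime(2009, 5, 16, 11, 0), 80.0),
    (datetime(2009, 5, 17, 9, 0), 50.0),
]
print(daily_peak_means(samples))
print(monthly_peak_means(samples))
```

The daily values feed the 1-day windows of example two, and the monthly values feed the 1-month windows of this example.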

Then the CPU usage peak means of the 3 months preceding the current month are taken, which may be as shown in Fig. 4. Fig. 4 shows the monthly CPU usage peak mean for May to July 2009.

Then, from the monthly CPU usage peak means for May to July 2009, linear regression is used to predict the CPU usage of the next three months.

As Fig. 4 shows, the CPU usage of the next three months changes little from the preceding months and is essentially stable, so this server can be expected to meet capacity requirements over the next 3 months.

Taking the CPU usage of one of the company's application servers as an example, as shown in Fig. 5:

This example predicts the CPU usage of the next three months from the mean CPU usage of May, June, and July 2009. As Fig. 5 shows, CPU usage is clearly trending downward and will fall below 50% in the coming months, so reallocating resources can be considered to use this machine more reasonably and effectively.

Based on identical technical conceive, the embodiment of the present invention also provides a kind of performance monitoring apparatus of realizing above-mentioned flow process.

Referring to Fig. 6, the structural representation of the network management system that provides for the embodiment of the present invention.This device can comprise:

Acquisition module 601 is for the KPI performance index data of cycle collection network equipment;

Summarizing module 602, the KPI performance index data that are used for collecting gather according to the time window of setting, and the KPI performance index data that obtain each time window gather value;

Prediction module 603 is used for utilizing the linear regression algorithm model, and gathers value according to the KPI performance index data of N time window before the current time window, and prediction comprises the KPI performance index data of following N time window of current time window;

Alarm module 604 is used for judging whether the KPI performance index data of a pre-described following N time window surpass the setting alarm threshold, and sends alarm when being judged as YES.

Further, also can comprise output module 605, be used for the KPI performance index data of the described following N time window that the prediction of output arrives.

Concrete, the KPI performance index data that summarizing module 602 gathers described each time window that obtains gather value, be the KPI performance index data peaks that collects in each time window, the mean value of the KPI performance index data peaks that perhaps collects for each collection period in each time window, the length of described collection period is less than the length of described time window.

Concrete, prediction module 603 can be utilized the linear regression algorithm model, and gathering value according to the KPI performance index data of N time window before the current time window, the KPI performance index data that calculate this N time window gather the linear regression algorithm model slope of value; Utilize described linear regression algorithm model, and calculate respectively the predicted value of the KPI performance index data of following N the time window that comprises the current time window according to this slope.Its specific implementation can be described with reference to aforementioned flow process.

Concrete, prediction module 603 can N time window before the current time window in, gather with the KPI performance index data of each time window the KPI performance index data that value deducts its previous time window and gather value, obtain including the array of N-1 increment size; Utilize the linear regression algorithm model, and according to this array, calculate the linear regression algorithm model slope of this N-1 increment size; Utilize described linear regression algorithm model, and calculate respectively according to this slope the increment size that the KPI performance index data of following N the time window that comprises the current time window are compared with previous time window; Increment size according to the KPI performance index data of each time window in a described following N time window, and the KPI performance index data prediction value of previous time window, obtain the KPI performance index data prediction value of each time window in described following N time window; Wherein, the KPI performance index data prediction value of first time window in N time window in this future gathers the value sum for the KPI performance index data of corresponding increment size time window previous with it.Its specific implementation can be described with reference to aforementioned flow process.

Specifically, the alarm module 604 is configured to apply one of the following rules: send an alarm if the predicted KPI data of the next N time windows exceed the KPI threshold; or send an alarm if the regression slope of the aggregated KPI values of the N time windows preceding the current time window is greater than a set slope threshold; or send an alarm if that regression slope is greater than the set slope threshold and the predicted KPI data of the next N time windows also exceed the KPI threshold.
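The three alarm rules can be expressed as a small decision function. This sketch assumes that "the predicted KPI data exceed the threshold" means at least one predicted window exceeds it; the text does not say whether one or all windows must exceed it. The `Rule` enum and `should_alarm` are hypothetical names.

```python
from enum import Enum


class Rule(Enum):
    VALUE = "value"   # alarm on predicted values only
    SLOPE = "slope"   # alarm on regression slope only
    BOTH = "both"     # alarm only when both conditions hold


def should_alarm(predicted, slope, kpi_threshold, slope_threshold, rule=Rule.VALUE):
    """Return True if the configured rule calls for an alarm."""
    # Assumption: a single predicted window above the threshold is enough
    over_value = any(value > kpi_threshold for value in predicted)
    over_slope = slope > slope_threshold
    if rule is Rule.VALUE:
        return over_value
    if rule is Rule.SLOPE:
        return over_slope
    return over_value and over_slope
```

Requiring both conditions is the most conservative choice: a KPI that sits above the threshold but is no longer rising does not trigger an alarm under `Rule.BOTH`.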

Specifically, the KPI threshold is obtained by adjusting a preset KPI threshold according to the predicted KPI data of the next N time windows and the regression slope of the aggregated KPI values of the N time windows preceding the current time window.
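The text does not give a formula for this adjustment. One possible policy, shown purely as a hypothetical illustration, raises the threshold just above the predicted values when the KPI is persistently above the preset threshold but essentially flat. The function name and the `flat_slope` and `margin` parameters are assumptions, not values from the description.

```python
def adjust_threshold(base_threshold, predicted, slope, flat_slope=0.5, margin=0.05):
    """Hypothetical threshold adjustment policy.

    If every predicted value is above the preset threshold and the trend is
    nearly flat (|slope| <= flat_slope), return a threshold slightly above the
    highest predicted value; otherwise keep the preset threshold.
    """
    if not predicted:
        raise ValueError("predicted must contain at least one value")
    is_flat = abs(slope) <= flat_slope
    persistently_high = min(predicted) > base_threshold
    if is_flat and persistently_high:
        return max(predicted) * (1 + margin)
    return base_threshold
```

Under this policy a steadily rising KPI still alarms against the preset threshold, while a stable but busy system stops generating repeated alarms.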

Specifically, the length of the time window is measured in minutes, in days, or in months.

As the above description shows, the embodiments of the present invention have the following advantages over the prior art:

(1) The embodiments turn real-time monitoring into proactive early-warning monitoring, avoiding the problems of the prior art. Conventional monitoring data are mined and organized in depth, the slope of the historical data is computed to predict usage over a coming period, and proactive early warning is achieved through a configurable alarm mechanism. Based on the trend of the system's KPIs, early-warning information can be sent ahead of an abrupt change in system performance, notifying maintenance staff to take effective measures to prevent performance-related faults.

(2) The embodiments support a configurable analysis time window. The default is the standard 15 minutes, which long-term operations and maintenance experience has shown to be the best configuration and which gives good prediction results. The time window can also be customized for the systems of different users. Changes are simple to make, so the approach can be widely adopted.

(3) The embodiments provide a near-real-time prediction process. Monitoring results from production systems show that a fault can usually be located 60 to 120 minutes before the system's KPI data exceed the threshold, so alarm information reaches administrators in advance and problems are prevented before they occur.

(4) The embodiments provide a medium- to long-term (15-day) prediction process, which can predict KPI trend data for roughly the next 15 days and provides a basis for medium- to long-term system monitoring.

(5) The embodiments provide capacity prediction (6 to 12 months). Long-term prediction over 6 to 12 months provides a valuable basis for host capacity planning of key business systems and lays a foundation for system expansion.

(6) The embodiments make the monitored KPIs configurable. Hundreds of KPIs, from CPU and memory to storage and network, are available for configuration, so the monitoring scope is broad. Relying on historical monitoring data, predictive monitoring can be applied to hundreds of KPIs covering CPU, memory, network and so on. The monitored KPIs can also be fully customized, and a good human-machine interface is provided. Configuration is simple, which lays a good foundation for porting and promoting the system.

(7) The embodiments are simple and convenient to deploy, consume very few system resources, cost little, and are easy to promote and use.

(8) The early-warning rules of the embodiments are definable in terms of the slope and the predicted-value threshold. Both the KPI slope threshold and the KPI value threshold can be predefined to suit the user's business characteristics, fully supporting customized advance fault location. When system resources are tight and a KPI stays above the operating threshold for a long time without significant change, the system can to some extent be considered to be running normally. Adjusting the KPI threshold used for alarms then avoids generating a large number of alarms and reduces the burden on maintenance staff who handle them.

From the above description of the embodiments, those skilled in the art will clearly understand that the present invention can be implemented in hardware, or in software running on a necessary general-purpose hardware platform. On this understanding, the technical solution of the present invention can be embodied as a software product. The software product can be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a portable hard disk) and includes instructions that cause a computer device (such as a personal computer, a server, or a network device) to execute the methods described in the embodiments of the present invention.

Those skilled in the art will appreciate that the accompanying drawings are schematic diagrams of a preferred embodiment, and that the modules or flows shown in them are not necessarily required to implement the present invention.

Those skilled in the art will appreciate that the modules of the apparatus in an embodiment can be distributed within the apparatus as described in that embodiment, or can be rearranged into one or more apparatuses different from that embodiment. The modules of the above embodiments can be merged into a single module or further split into multiple submodules.

The sequence numbers of the above embodiments are for description only and do not indicate the relative merits of the embodiments.

The above discloses only several specific embodiments of the present invention. The present invention is not limited to them, and any variation that a person skilled in the art can conceive of shall fall within the protection scope of the present invention.

Claims (16)

1. A network device performance monitoring method, characterized by comprising:
a network management system periodically collecting KPI data of a network device;
the network management system aggregating the collected KPI data according to set time windows to obtain an aggregated KPI value for each time window;
the network management system using a linear regression model to predict, from the aggregated KPI values of the N time windows preceding the current time window, the KPI data of the next N time windows, including the current time window;
the network management system judging whether the KPI data of the next N time windows exceed a set alarm threshold, and sending an alarm if so.
2. The method of claim 1, characterized in that the aggregated KPI value of each time window is the maximum KPI value collected within that time window; or
the aggregated KPI value of each time window is the mean of the maximum KPI values collected in each collection period within that time window, the collection period being shorter than the time window.
3. The method of claim 1, characterized in that using the linear regression model to predict, from the aggregated KPI values of the N time windows preceding the current time window, the KPI data of the next N time windows, including the current time window, comprises:
using the linear regression model to compute, from the aggregated KPI values of the N time windows preceding the current time window, the regression slope of those N aggregated KPI values;
using the linear regression model and this slope to compute a predicted KPI value for each of the next N time windows, including the current time window.
4. The method of claim 1, characterized in that using the linear regression model to predict, from the aggregated KPI values of the N time windows preceding the current time window, the KPI data of the next N time windows, including the current time window, comprises:
for the N time windows preceding the current time window, subtracting from the aggregated KPI value of each time window the aggregated KPI value of the time window before it, to obtain an array of N-1 increments;
using the linear regression model to compute, from this array, the regression slope of the N-1 increments;
using the linear regression model and this slope to compute, for each of the next N time windows, including the current time window, the increment of its KPI data relative to the preceding time window;
obtaining the predicted KPI value of each of the next N time windows from its increment and the predicted KPI value of the preceding time window; wherein the predicted KPI value of the first of the next N time windows is the sum of its increment and the aggregated KPI value of the time window before it.
5. The method of claim 1, characterized in that the network management system judges that the KPI data of the next N time windows exceed the set alarm threshold in one of the following cases:
the KPI data of the next N time windows exceed the KPI threshold;
the regression slope of the aggregated KPI values of the N time windows preceding the current time window is greater than a set slope threshold;
the regression slope of the aggregated KPI values of the N time windows preceding the current time window is greater than the set slope threshold, and the KPI data of the next N time windows exceed the KPI threshold.
6. The method of claim 5, characterized in that the KPI threshold is obtained by adjusting a preset KPI threshold according to the KPI data of the next N time windows and the regression slope of the aggregated KPI values of the N time windows preceding the current time window.
7. The method of any one of claims 1-6, characterized in that the length of the time window is measured in minutes; or
the length of the time window is measured in days; or
the length of the time window is measured in months.
8. The method of any one of claims 1-6, characterized in that the KPI data comprise one or any combination of the following: CPU usage, memory usage, and network throughput.
9. A network management system, characterized by comprising:
an acquisition module, configured to periodically collect KPI data of a network device;
a summarizing module, configured to aggregate the collected KPI data according to set time windows to obtain an aggregated KPI value for each time window;
a prediction module, configured to use a linear regression model to predict, from the aggregated KPI values of the N time windows preceding the current time window, the KPI data of the next N time windows, including the current time window;
an alarm module, configured to judge whether the predicted KPI data of the next N time windows exceed a set alarm threshold, and to send an alarm if so.
10. The network management system of claim 9, characterized in that the aggregated KPI value that the summarizing module obtains for each time window is the maximum KPI value collected within that time window, or the mean of the maximum KPI values collected in each collection period within that time window, the collection period being shorter than the time window.
11. The network management system of claim 9, characterized in that the prediction module is specifically configured to: use the linear regression model to compute, from the aggregated KPI values of the N time windows preceding the current time window, the regression slope of those N aggregated KPI values; and use the linear regression model and this slope to compute a predicted KPI value for each of the next N time windows, including the current time window.
12. The network management system of claim 9, characterized in that the prediction module is specifically configured to: for the N time windows preceding the current time window, subtract from the aggregated KPI value of each time window the aggregated KPI value of the time window before it, to obtain an array of N-1 increments; use the linear regression model to compute, from this array, the regression slope of the N-1 increments; use the linear regression model and this slope to compute, for each of the next N time windows, including the current time window, the increment of its KPI data relative to the preceding time window; and obtain the predicted KPI value of each of the next N time windows from its increment and the predicted KPI value of the preceding time window; wherein the predicted KPI value of the first of the next N time windows is the sum of its increment and the aggregated KPI value of the time window before it.
13. The network management system of claim 12, characterized in that the alarm module is specifically configured to: send an alarm if the KPI data of the next N time windows exceed the KPI threshold; or send an alarm if the regression slope of the aggregated KPI values of the N time windows preceding the current time window is greater than a set slope threshold; or send an alarm if that regression slope is greater than the set slope threshold and the KPI data of the next N time windows exceed the KPI threshold.
14. The network management system of claim 13, characterized in that the KPI threshold is obtained by adjusting a preset KPI threshold according to the KPI data of the next N time windows and the regression slope of the aggregated KPI values of the N time windows preceding the current time window.
15. The network management system of any one of claims 9-14, characterized in that the length of the time window is measured in minutes; or
the length of the time window is measured in days; or
the length of the time window is measured in months.
16. The network management system of any one of claims 9-14, characterized in that the KPI data comprise one or any combination of the following: CPU usage, memory usage, and network throughput.
CN2011104303495A 2011-12-20 2011-12-20 Network device performance monitoring method and network management system CN103178990A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011104303495A CN103178990A (en) 2011-12-20 2011-12-20 Network device performance monitoring method and network management system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2011104303495A CN103178990A (en) 2011-12-20 2011-12-20 Network device performance monitoring method and network management system

Publications (1)

Publication Number Publication Date
CN103178990A true CN103178990A (en) 2013-06-26

Family

ID=48638622

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011104303495A CN103178990A (en) 2011-12-20 2011-12-20 Network device performance monitoring method and network management system

Country Status (1)

Country Link
CN (1) CN103178990A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101541016A (en) * 2009-05-06 2009-09-23 华为技术有限公司 Method for predicting data and equipment
CN102004671A (en) * 2010-11-15 2011-04-06 北京航空航天大学 Resource management method of data center based on statistic model in cloud computing environment
CN102111284A (en) * 2009-12-28 2011-06-29 北京亿阳信通软件研究院有限公司 Method and device for predicting telecom traffic

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10298464B2 (en) 2013-09-06 2019-05-21 Huawei Technologies Co., Ltd. Network performance prediction method and apparatus
WO2015032252A1 (en) * 2013-09-06 2015-03-12 华为技术有限公司 Prediction method and device for network performance
CN103490948A (en) * 2013-09-06 2014-01-01 华为技术有限公司 Method and device for predicting network performance
CN104731690A (en) * 2013-11-13 2015-06-24 奈飞公司 Adaptive metric collection, storage, and alert thresholds
US10498628B2 (en) 2013-11-13 2019-12-03 Netflix, Inc. Adaptive metric collection, storage, and alert thresholds
CN104731690B (en) * 2013-11-13 2019-10-15 奈飞公司 Adaptive metrology collection, storage and warning threshold
WO2015090022A1 (en) * 2013-12-18 2015-06-25 中兴通讯股份有限公司 Resource scheduling method and device, and computer storage medium
CN103945442A (en) * 2014-05-07 2014-07-23 东南大学 System anomaly detection method based on linear prediction principle in mobile communication system
WO2015172508A1 (en) * 2014-05-16 2015-11-19 中兴通讯股份有限公司 Performance data processing method and device
CN105101281A (en) * 2014-05-16 2015-11-25 中兴通讯股份有限公司 Performance data processing method and device
CN105634787A (en) * 2014-11-26 2016-06-01 华为技术有限公司 Evaluation method, prediction method and device and system for network key indicator
CN105634787B (en) * 2014-11-26 2018-12-07 华为技术有限公司 Appraisal procedure, prediction technique and the device and system of network key index
CN104468206B (en) * 2014-11-28 2019-04-05 华为技术服务有限公司 The method and apparatus of performance alarm
CN104468206A (en) * 2014-11-28 2015-03-25 华为技术服务有限公司 Performance warning method and device
CN105871575A (en) * 2015-01-21 2016-08-17 中国移动通信集团河南有限公司 Load early warning method and device for core network elements
CN107534570A (en) * 2015-06-16 2018-01-02 慧与发展有限责任合伙企业 Virtualize network function monitoring
CN105071968A (en) * 2015-08-18 2015-11-18 大唐移动通信设备有限公司 Method and device for repairing hidden failures of service plane and control plane of communication device
CN106487571A (en) * 2015-09-02 2017-03-08 中国移动通信集团公司 A kind of method and device of assessment network performance index variation tendency
CN106487571B (en) * 2015-09-02 2020-02-14 中国移动通信集团公司 Method and device for evaluating network performance index change trend
CN106533730A (en) * 2015-09-15 2017-03-22 中兴通讯股份有限公司 Method and device for acquiring index of Hadoop cluster component
CN106559813A (en) * 2015-09-28 2017-04-05 中兴通讯股份有限公司 A kind of network estimation method and device
CN105323111B (en) * 2015-11-17 2018-08-10 南京南瑞集团公司 A kind of O&M automated system and method
CN105323111A (en) * 2015-11-17 2016-02-10 南京南瑞集团公司 Operation and maintenance automation system and method
CN105491599A (en) * 2015-12-21 2016-04-13 南京华苏科技股份有限公司 Novel regression system for predicting LTE network performance indexes
CN105491599B (en) * 2015-12-21 2019-03-08 南京华苏科技有限公司 Predict the novel regression system of LTE network performance indicator
CN106095639A (en) * 2016-05-30 2016-11-09 中国农业银行股份有限公司 A kind of cluster subhealth state method for early warning and system
WO2018103524A1 (en) * 2016-12-08 2018-06-14 Huawei Technologies Co., Ltd. Prediction of performance indicators in cellular networks
CN106713029A (en) * 2016-12-20 2017-05-24 中国银联股份有限公司 Method and apparatus for determining resource monitoring thresholds
CN106452931A (en) * 2016-12-27 2017-02-22 中国建设银行股份有限公司 Monitoring index, domain value discovery method, domain value adjusting method and automatic monitoring system
CN106452931B (en) * 2016-12-27 2019-09-17 中国建设银行股份有限公司 Monitor control index and thresholding discovery method, thresholding method of adjustment and automatic monitored control system
CN106886485A (en) * 2017-02-28 2017-06-23 深圳市华傲数据技术有限公司 Power system capacity analyzing and predicting method and device
CN107426019A (en) * 2017-07-06 2017-12-01 国家电网公司 Network failure determines method, computer equipment and computer-readable recording medium
CN107608870A (en) * 2017-09-22 2018-01-19 郑州云海信息技术有限公司 A kind of statistical method and system of system resource utilization rate

Legal Events

Date Code Title Description
PB01 Publication
C06 Publication
SE01 Entry into force of request for substantive examination
C10 Entry into substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20130626
