CN106021082A - Big data based host performance capacity estimation method - Google Patents


Info

Publication number
CN106021082A
Authority
CN
China
Prior art keywords
performance data
coefficient
data
sample
target variable
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610318916.0A
Other languages
Chinese (zh)
Other versions
CN106021082B (en)
Inventor
庄磊
腾腾
张宏亮
于鹏
孙哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp
Priority to CN201610318916.0A
Publication of CN106021082A
Application granted
Publication of CN106021082B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G06F11/3452 Performance evaluation by statistical analysis

Abstract

The invention discloses a big data based host performance capacity estimation method. The method first obtains the performance data of each sample of a host, where the performance data of each sample are the data that affect the estimation of the host capacity. It then deletes the abnormal data from the host performance data, that is, from the influencing factors, by means of a periodic 7-difference confidence interval; this way of deleting abnormal data can be combined with actual transaction behavior, so that the capacity estimate is not distorted by abnormal data. Finally, it performs a correlation analysis between the performance data with abnormal data deleted and a target variable and, based on the result, builds an estimation model of the target variable in combination with a time series. Because the technical scheme takes the influence of each influencing factor into account and reflects actual service conditions, the accuracy of the host capacity estimation is improved.

Description

A host performance capacity estimation method based on big data
Technical Field
The present invention relates to the field of IBM mainframes, and in particular to a host performance capacity estimation method based on big data.
Background
With the development of big data technology, neither the emerging Internet industry nor traditional manufacturing can do without big data. The biggest technical problem of big data is the storage of massive data. In particular, when the major banks consolidate massive transaction data and build data centers, almost all of them adopt an architecture based on IBM mainframes and their Parallel Sysplex to process that data. Storing a bank's massive transaction data on a mainframe, however, incurs very high storage costs. Host capacity therefore needs to be estimated from historical data in order to reduce unnecessary waste of host resources.
At present, a bank's transaction data contain many abnormal growth points caused by factors such as holidays and policy changes. A simple linear performance capacity estimation method that only considers abnormal growth points is hard to match with actual conditions, and it lacks any correlation analysis between the multiple influencing factors and the performance indicator to be predicted, which greatly reduces the accuracy of the performance estimate.
Therefore, how to combine actual conditions with the various influencing factors to improve the accuracy of host capacity estimation is the technical problem that currently needs to be solved.
Summary of the Invention
In view of this, the invention provides a host performance capacity estimation method based on big data, which can be combined with actual business conditions and performs a correlation analysis of the various influencing factors, thereby improving the accuracy of host capacity estimation.
The invention discloses a host performance capacity estimation method based on big data, comprising:
obtaining the performance data of each sample of a host, where the performance data of each sample are sample data that affect host capacity estimation;
deleting the abnormal data from the performance data of each sample using the periodic 7-difference confidence interval method, to obtain the performance data of each sample with abnormal data deleted;
calculating the correlation coefficient between the performance data of each sample with abnormal data deleted and a target variable;
obtaining, according to the correlation coefficients, the sample performance data corresponding to the maximum correlation coefficient;
building a prediction model of the target variable from the sample performance data corresponding to the maximum correlation coefficient, in combination with a time series.
Preferably, deleting the abnormal data from the performance data of each sample using the periodic 7-difference confidence interval method, to obtain the performance data of each sample with abnormal data deleted, comprises:
calculating, for the performance data of each day in the performance data of each sample, the change differences relative to 7 days before and 7 days after;
using a frequency histogram to check whether the change differences follow a normal distribution;
if the performance data of each sample include sample performance data whose change differences follow a normal distribution, calculating the confidence interval of those change differences from their sample mean and standard deviation;
locating, according to the confidence interval, the time points corresponding to the abnormal change differences among the change differences that follow a normal distribution;
deleting each item of performance data at the time points corresponding to those abnormal change differences.
Preferably, after locating, according to the confidence interval, the time points corresponding to the abnormal change differences, the method further comprises:
verifying, using the percentile method, the confidence interval of the change differences corresponding to each item of performance data;
where, if the verification succeeds, the step of deleting each item of performance data at the time points corresponding to the abnormal change differences is performed.
Preferably, calculating the correlation coefficient between the performance data of each sample with abnormal data deleted and the target variable comprises:
calculating the linear correlation coefficient between the target variable and each item of performance data with abnormal data deleted;
determining whether the maximum of the linear correlation coefficients is greater than or equal to a preset threshold;
if the maximum linear correlation coefficient is greater than or equal to the preset threshold, obtaining the sample performance data corresponding to the maximum linear correlation coefficient; in this case, obtaining the sample performance data corresponding to the maximum correlation coefficient according to the correlation coefficients means obtaining the sample performance data corresponding to the maximum linear correlation coefficient according to the linear correlation coefficients;
if the maximum linear correlation coefficient is less than the preset threshold, calculating the coefficient of determination obtained by fitting the performance data of each sample with abnormal data deleted to each given curve equation;
determining whether the maximum of the coefficients of determination is greater than or equal to the preset threshold;
if the maximum coefficient of determination is greater than or equal to the preset threshold, obtaining the sample performance data corresponding to the maximum coefficient of determination; in this case, obtaining the sample performance data corresponding to the maximum correlation coefficient according to the correlation coefficients means obtaining the sample performance data corresponding to the maximum coefficient of determination according to the coefficients of determination.
Preferably, obtaining the sample performance data corresponding to the maximum correlation coefficient according to the correlation coefficients comprises:
if the maximum coefficient of determination is less than the preset threshold, obtaining the sample performance data corresponding to the largest coefficient among the linear correlation coefficients and the coefficients of determination.
Preferably, building the prediction model of the target variable from the sample performance data corresponding to the maximum correlation coefficient, in combination with a time series, comprises:
building an initial model of the target variable from the performance data corresponding to the maximum correlation coefficient, in combination with the time series;
adjusting the model parameters of the initial model to obtain a new prediction model;
calculating the correlation coefficient of the new prediction model;
if the correlation coefficient is greater than or equal to a preset value, taking the new prediction model as the prediction model of the target variable;
otherwise, adjusting the parameters of the new prediction model again until the correlation coefficient is greater than or equal to the preset value, thereby determining the prediction model of the target variable.
Preferably, after obtaining the sample performance data corresponding to the maximum correlation coefficient according to the correlation coefficients, the method further comprises:
performing a hypothesis test on the correlation between the sample performance data corresponding to the maximum correlation coefficient and the target variable;
where, if the verification succeeds, the step of building the prediction model of the target variable from the sample performance data corresponding to the maximum correlation coefficient, in combination with the time series, is performed.
Preferably, performing the hypothesis test on the correlation between the sample performance data corresponding to the maximum correlation coefficient and the target variable comprises:
assuming that the two populations, the performance data corresponding to the maximum correlation coefficient and the target variable, have no significant correlation;
calculating the associated probability (p-value) of the test statistic for the performance data corresponding to the maximum correlation coefficient and the target variable;
if the associated probability is less than or equal to the set significance level, rejecting the hypothesis, in which case the verification succeeds;
otherwise, accepting the hypothesis, in which case the verification fails.
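The hypothesis test above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the text does not name the test statistic, so the p-value here is approximated with Fisher's z-transform instead of the exact t-test, and the function names and the significance level of 0.05 are hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist

def correlation_p_value(r, n):
    """Approximate two-sided p-value for H0 "no correlation" (Fisher z-transform).

    r is the sample correlation coefficient (-1 < r < 1), n the sample size (n > 3).
    This is an approximation chosen for illustration, not the source's procedure.
    """
    z = atanh(r) * sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

def verification_succeeds(r, n, alpha=0.05):
    """True when H0 is rejected, i.e. the p-value is at or below alpha (assumed 0.05)."""
    return correlation_p_value(r, n) <= alpha
```

With 30 daily samples, a coefficient of 0.9 rejects the no-correlation hypothesis, while a coefficient of 0.1 does not.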
Compared with the prior art, the beneficial effects of the invention are as follows. The invention first obtains the performance data of each sample of the host, which include multiple kinds of performance data that affect host capacity estimation, such as the transaction rates of different channels and the usage of the host's real memory and virtual memory. It then deletes the abnormal data from each item of host performance data, that is, from the influencing factors, using the periodic 7-difference confidence interval method; this deletion can be combined with actual transaction behavior, so that the capacity estimate is not affected by abnormal data. Next, it performs a correlation analysis between each influencing factor and the target variable and, from the performance data most strongly correlated with the target variable, builds the prediction model of the target variable in combination with a time series. The technical scheme takes the influence of each influencing factor into account and reflects actual business conditions, so the accuracy of host capacity estimation is improved.
Brief Description of the Drawings
To illustrate the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely embodiments of the invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of a host performance capacity estimation method based on big data disclosed in an embodiment of the invention;
Fig. 2 is a flow chart of a periodic 7-difference confidence interval method disclosed in another embodiment of the invention;
Fig. 3 is a flow chart of a host performance capacity estimation method based on big data disclosed in another embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.
The invention discloses a host performance capacity estimation method based on big data. The method comprises:
Step S101: obtain the performance data of each sample of the host, where the performance data of each sample are sample data that affect host capacity estimation;
The performance data of each sample include the daily online-period transaction rate (TPS), the daily online total transaction volume, the daily online-period average virtual memory utilization, the daily online-period average real memory utilization, the daily online-period disk response time, the daily online-period average database buffer pool hit rate, the daily online-period average SQL execution efficiency of the database, and so on. The performance data thus cover not only the transaction rates of different channels but also the daily online utilization of the host's real memory, virtual memory, database buffer pool and so on, so the various factors that affect host capacity estimation are taken into account and the estimate is more accurate. In this embodiment, the daily online-period MIPS utilization is the target variable, and the performance data that affect it are the influencing factors;
In this embodiment, the performance data of each sample are obtained from an established big data platform. The performance data stored on the host must first be offloaded to the big data platform, so that the data analysis does not consume host resources;
Step S102: delete the abnormal data from the performance data of each sample using the periodic 7-difference confidence interval method, to obtain the performance data of each sample with abnormal data deleted;
Owing to social factors such as holidays and policy changes (for example Double 11 and the Spring Festival), the performance data of each sample contain many abnormal data. A bank's daily transaction behavior follows a periodic pattern with a period of 7 days. The periodic 7-difference confidence interval method calculates, for the performance data of each day, the change difference relative to 7 days before and the change difference relative to 7 days after, then calculates the confidence interval of each of these two change differences and uses the confidence intervals to locate abnormal change differences. It then compares the abnormal change differences 7 days before and 7 days after the same time point to decide whether the performance data at that time point are abnormal, so that abnormal performance data can be deleted accurately.
Step S103: calculate the correlation coefficient between the performance data of each sample with abnormal data deleted and the target variable;
Calculating these correlation coefficients amounts to a correlation analysis between the performance data of each sample with abnormal data deleted and the target variable, which finds the performance data most strongly correlated with the target variable;
The correlation analysis includes a linear correlation analysis and a nonlinear correlation analysis. The linear correlation analysis calculates the linear correlation coefficient r between the performance data of each sample with abnormal data deleted and the target variable. The linear correlation coefficient measures the degree of linear correlation between the performance data and the target variable and lies in [-1, +1]: r = 1 indicates perfect positive correlation and r = -1 perfect negative correlation, and in both cases there is a functional relationship between the performance data and the target variable; r = 0 indicates no linear relationship; |r| > 0.8 indicates strong correlation. If the linear correlation is not strong, a nonlinear correlation analysis between the performance data and the target variable is needed;
The nonlinear correlation analysis calculates the coefficient of determination R² obtained by fitting the performance data of each sample with abnormal data deleted to each given curve equation. R² ranges from 0 to 1; the closer it is to 1, the stronger the explanatory power of the equation's variables for y. If R² is less than 0.8, the fit is not considered optimal; the target variable is then considered to be affected not only by the influencing factors but also by the time series, so the time series needs to be added to the influencing factors when building the prediction model.
Step S104: obtain, according to the correlation coefficients, the sample performance data corresponding to the maximum correlation coefficient;
The sample performance data corresponding to the maximum correlation coefficient are the influencing factor most strongly correlated with the host capacity estimate;
Step S105: build the prediction model of the target variable from the sample performance data corresponding to the maximum correlation coefficient, in combination with a time series.
In this embodiment, the performance data of each sample that affect host capacity estimation are first obtained from the established big data platform, and the abnormal data are deleted using the periodic 7-difference confidence interval method, so that host capacity is estimated according to the daily pattern and in line with actual transaction conditions. Deleting abnormal growth points keeps them from affecting the capacity estimate and greatly improves its accuracy. A correlation analysis is then performed between the various performance data that affect host capacity estimation and the target variable to find the performance data with the highest correlation, and finally the prediction model of the target variable is built from those performance data in combination with the time series. Compared with a simple linear performance capacity estimation method based on abnormal growth points, the invention combines actual transaction conditions with a comprehensive set of influencing factors, and building the prediction model through a mathematical model is more scientific and more accurate.
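Steps S103 and S104 can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the factor names and daily values are made up, the function names are hypothetical, and ranking by the absolute value of r is an assumption drawn from the "|r| > 0.8" criterion above.

```python
from math import sqrt

def pearson(x, y):
    """Pearson linear correlation coefficient r of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def most_correlated(factors, target):
    """Return (name, r) for the factor whose |r| with the target is largest."""
    scores = {name: pearson(values, target) for name, values in factors.items()}
    best = max(scores, key=lambda name: abs(scores[name]))
    return best, scores[best]

# Hypothetical daily samples after abnormal data were deleted (illustrative numbers).
mips = [52.0, 55.0, 61.0, 58.0, 66.0, 70.0, 64.0]
factors = {
    "tps": [510.0, 540.0, 600.0, 575.0, 650.0, 690.0, 640.0],
    "disk_response_ms": [4.1, 3.9, 4.3, 4.0, 4.2, 3.8, 4.4],
}
name, r = most_correlated(factors, mips)
```

In this made-up data the transaction rate tracks MIPS utilization closely, so it is the factor that would be carried forward to the modelling step.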
Preferably, in another embodiment, it is considered that in practical scenarios the acquired performance data contain abnormal points caused by social factors such as holidays and policy changes, and these abnormal points affect the accuracy of the host performance prediction model. The invention therefore proposes a periodic 7-difference confidence interval method for deleting the abnormal data from the performance data of each sample. As shown in Fig. 2, the method comprises:
Step S201: calculate, for the performance data of each day in the performance data of each sample, the change differences relative to 7 days before and 7 days after;
Observation of bank transaction behavior shows that the transaction data follow a 7-day periodic pattern, and comparing data for the same day of each week improves the accuracy of abnormal point deletion. Therefore, for each item of performance data, the 7-difference relative to 7 days before and the 7-difference relative to 7 days after are calculated;
Step S202: use a frequency histogram to check whether the change differences follow a normal distribution; if they do, perform step S203;
The abscissa of the frequency histogram is the continuous range of values of the change difference Δ, and the ordinate is the frequency of occurrence of a given change difference Δ. If the frequency histogram of the 7-difference Δ corresponds to an expected value of 0 and a standard deviation of 1, the change differences follow a normal distribution;
Step S203: calculate the confidence interval of the change differences that follow a normal distribution from their sample mean and standard deviation;
The confidence level is set to 80%, and the mean M and standard deviation ST of the sample statistic are calculated; the upper and lower confidence limits of the change differences that follow a normal distribution are calculated from this mean and standard deviation;
Step S204: locate, according to the confidence interval, the time points corresponding to the abnormal change differences among the change differences that follow a normal distribution;
The confidence interval includes the confidence intervals of the change differences relative to 7 days before and 7 days after. Abnormal change differences are located with each confidence interval, and the abnormal change differences 7 days before and after the same time point are then compared to decide whether the performance data at that time point are abnormal;
Step S205: use the percentile method to verify the confidence interval of the change differences corresponding to the performance data of each sample; if the verification succeeds, perform step S206;
The percentile method comprises:
sorting the statistics of the change differences that follow a normal distribution;
with the confidence level set to 80%, the upper limit value is 90 and the lower limit value is 10; if the upper confidence limit lies at the 90th percentile of the sorted values and the lower confidence limit lies at the 10th percentile, the confidence interval calculated from the normal distribution is shown to be accurate;
Step S206: delete each item of performance data at the time points corresponding to the abnormal change differences.
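Steps S201, S203, S204 and S206 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the normality check of step S202 is assumed to have passed, the interval is taken as M ± z·ST with z the standard normal quantile for the 80% level, and all function names and the test series are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def lag7_diffs(series, lag=7):
    """Change differences versus 7 days before and 7 days after (None at the edges)."""
    n = len(series)
    back = [series[t] - series[t - lag] if t >= lag else None for t in range(n)]
    fwd = [series[t + lag] - series[t] if t + lag < n else None for t in range(n)]
    return back, fwd

def confidence_interval(diffs, confidence=0.80):
    """Two-sided interval M - z*ST .. M + z*ST, assuming normally distributed differences."""
    values = [d for d in diffs if d is not None]
    m, st = mean(values), stdev(values)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return m - z * st, m + z * st

def abnormal_days(series, lag=7, confidence=0.80):
    """Indices whose difference is outside the interval on both the before and after side."""
    back, fwd = lag7_diffs(series, lag)
    lo_b, hi_b = confidence_interval(back, confidence)
    lo_f, hi_f = confidence_interval(fwd, confidence)
    flagged = []
    for t in range(len(series)):
        if back[t] is None or fwd[t] is None:
            continue  # not enough history or future to compare both sides
        if not (lo_b <= back[t] <= hi_b) and not (lo_f <= fwd[t] <= hi_f):
            flagged.append(t)
    return flagged

# Hypothetical 8-week series: fixed weekly pattern, small wiggle, one spike on day 30.
pattern = [100, 102, 101, 103, 104, 90, 88]
series = [pattern[t % 7] + ((2 * t) % 5 - 2) * 0.5 for t in range(56)]
series[30] += 50
flagged = abnormal_days(series)
cleaned = [v for t, v in enumerate(series) if t not in flagged]
```

Requiring both sides to be abnormal is what separates a spike on a given day from a dip on the day it is compared against: days 23 and 37 each see one abnormal difference because of the spike, but only day 30 sees two.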
As a practical example, take the online-period transaction rate (TPS) on April 13, 2013. Let Δ1 be its 7-difference relative to 7 days before (April 6, 2013), and let Δ2 be its 7-difference relative to 7 days after (April 20, 2013). If the confidence interval identifies Δ1 as an abnormal 7-difference, then either the transaction rate on April 13, 2013 is an abnormally high spike or the transaction rate on April 6, 2013 is an abnormally low dip. The 7-difference Δ2 for April 20, 2013 is then examined; if it is also an abnormal 7-difference, the transaction rate on April 13, 2013 is an abnormal value and should be screened out.
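The percentile check of step S205 can be sketched in Python as follows. The text does not say how close the confidence limits must be to the 10th and 90th percentiles, so the `tolerance` parameter (a fraction of the standard deviation) is an assumption, as is the function name.

```python
from statistics import NormalDist, mean, stdev, quantiles

def percentile_check(diffs, confidence=0.80, tolerance=0.25):
    """Compare the normal-theory interval with the empirical 10th and 90th percentiles.

    Returns (ok, (lower, upper), (p10, p90)). `tolerance` is a hypothetical
    closeness criterion expressed in standard deviations.
    """
    m, st = mean(diffs), stdev(diffs)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    lower, upper = m - z * st, m + z * st
    cuts = quantiles(diffs, n=100)  # 99 cut points; cuts[k - 1] is the k-th percentile
    p10, p90 = cuts[9], cuts[89]
    ok = abs(lower - p10) <= tolerance * st and abs(upper - p90) <= tolerance * st
    return ok, (lower, upper), (p10, p90)
```

For differences that really are close to normal, the two pairs of limits nearly coincide; for data dominated by a few extreme values, the standard deviation is inflated and the check fails.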
In another embodiment, a host performance capacity estimation method based on big data is disclosed. Referring to Fig. 3, the method comprises:
Step S301: obtain the performance data of each sample of the host;
Step S302: delete the abnormal data from the performance data of each sample using the periodic 7-difference confidence interval method, to obtain the performance data of each sample with abnormal data deleted;
Step S303: calculate the linear correlation coefficient r between the target variable and each item of performance data with abnormal data deleted;
The linear correlation coefficient r is calculated automatically by the SPSS tool;
Step S304: determine whether the maximum linear correlation coefficient r_max among the linear correlation coefficients r is greater than or equal to a preset threshold; if r_max is greater than or equal to the preset threshold, perform step S305; otherwise, perform step S306;
The preset threshold is 0.8; if the linear correlation coefficient r ≥ 0.8, the sample performance data corresponding to r_max are strongly linearly correlated with the target variable;
Step S305: obtain, according to the linear correlation coefficients, the sample performance data corresponding to the maximum linear correlation coefficient;
Step S306: calculate the coefficient of determination obtained by fitting the performance data of each sample with abnormal data deleted to each given curve equation;
The given curve equations are listed in a table that is not reproduced in this text.
When the performance data of each sample with abnormal data deleted are fitted to each given curve equation, the corresponding coefficient of determination is calculated for each fit; the coefficient of determination R² is calculated automatically by the SPSS tool;
Step S307: determine whether the maximum of the coefficients of determination is greater than or equal to the preset threshold; if it is, perform step S308; otherwise, perform step S309;
The preset threshold is 0.8; if the coefficient of determination R² ≥ 0.8, the sample performance data corresponding to the maximum coefficient of determination are strongly nonlinearly correlated with the target variable;
Step S308: obtain, according to the coefficients of determination, the sample performance data corresponding to the maximum coefficient of determination;
Step S309: obtain the sample performance data corresponding to the largest coefficient among the linear correlation coefficients and the coefficients of determination;
Step S310: perform a hypothesis test on the correlation between the sample performance data corresponding to the maximum correlation coefficient and the target variable; if the verification succeeds, perform step S311;
Using the small-probability proof-by-contradiction principle, this step verifies whether the sample performance data are representative of the overall performance data; if the verification fails, the correlation between the performance data of each sample and the target variable is analyzed again;
Step S311: build the prediction model of the target variable from the sample performance data corresponding to the maximum correlation coefficient, in combination with a time series.
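The branching in steps S303 to S309 can be sketched in Python as follows. The source performs these calculations in SPSS and gives its curve equations in a table that is not available here, so the two curve types below (logarithmic and inverse) are hypothetical stand-ins, chosen because each is linear in its parameters and its R² equals the squared correlation of the transformed factor with the target. All names are illustrative.

```python
from math import log, sqrt

THRESHOLD = 0.8  # preset threshold given in the text

def pearson(x, y):
    """Pearson linear correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Hypothetical curve family (requires positive factor values): y = a + b*g(x).
CURVES = {
    "logarithmic": lambda v: log(v),
    "inverse": lambda v: 1.0 / v,
}

def r_squared(x, y, transform):
    """R^2 of the least-squares fit y = a + b*g(x); with one regressor this is r^2."""
    return pearson([transform(v) for v in x], y) ** 2

def select_factor(factors, target):
    """Return (factor name, relation kind, coefficient) following S304 to S309."""
    linear = {k: abs(pearson(v, target)) for k, v in factors.items()}
    best_lin = max(linear, key=linear.get)
    if linear[best_lin] >= THRESHOLD:                  # S304 -> S305
        return best_lin, "linear", linear[best_lin]
    fits = {(k, c): r_squared(v, target, g)
            for k, v in factors.items() for c, g in CURVES.items()}
    best_fit = max(fits, key=fits.get)
    if fits[best_fit] >= THRESHOLD:                    # S307 -> S308
        return best_fit[0], best_fit[1], fits[best_fit]
    if linear[best_lin] >= fits[best_fit]:             # S309: largest coefficient overall
        return best_lin, "linear", linear[best_lin]
    return best_fit[0], best_fit[1], fits[best_fit]
```

Checking the linear coefficients first mirrors the simple-to-complex order of the embodiment: curve fitting is only attempted when no factor is strongly linearly correlated with the target.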
In this embodiment, the abnormal data in the performance data of each sample are first deleted in line with actual transaction behavior, which improves the accuracy of host capacity estimation. A correlation analysis is then performed between the performance data with abnormal data deleted and the target variable. The correlation analysis includes a linear and a nonlinear correlation analysis and proceeds from simple to complex: when the performance data with abnormal data deleted are strongly linearly correlated with the target variable, their nonlinear correlation with the target variable is not analyzed, which reduces the complexity of the correlation analysis. After the correlation between the performance data with abnormal data deleted and the target variable has been determined, a hypothesis test is performed on the correlation between the sample performance data corresponding to the maximum correlation coefficient and the target variable, which further improves the accuracy of the correlation analysis and hence of host capacity estimation.
Preferably, in another embodiment, establishing the prediction model of the target variable in combination with a time series according to the sample performance data corresponding to the maximum correlation coefficient comprises:
establishing an initialization model of the target variable in combination with a time series, according to the performance data corresponding to the maximum correlation coefficient among the performance data;
adjusting the parameters of the initialization model to obtain a new prediction model;
wherein the parameters include the autoregressive terms of the model, the number of differencing operations used to make the time series stationary, and the like;
calculating the correlation coefficient between the new prediction model and the target variable;
if the correlation coefficient is greater than or equal to the preset value, the new prediction model is the prediction model of the target variable;
otherwise, adjusting the parameters of the new prediction model again until the correlation coefficient is greater than or equal to the preset value, thereby determining the prediction model of the target variable.
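The adjust-and-check loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the model of the patent: the function names, the synthetic series, and the stand-in model (a lag forecast whose only parameter is the lag) are hypothetical, and a real implementation would fit an autoregressive time-series model instead. The preset value of 0.8 is the one given in this embodiment.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def lag_forecast(series, lag):
    """Stand-in model (assumption): predict each day as the value `lag` days earlier."""
    return series[:-lag], series[lag:]  # aligned (predicted, actual) values


def tune_model(series, candidate_lags, preset=0.8):
    """Try parameter settings in order until the correlation reaches `preset`."""
    for lag in candidate_lags:
        predicted, actual = lag_forecast(series, lag)
        r = pearson(predicted, actual)
        if r >= preset:
            return lag, r  # this setting is kept as the prediction model
    return None, None  # no candidate reached the preset value


# Synthetic MIPS-like series (hypothetical): weekly pattern plus a slow trend.
weekly = [100, 120, 125, 130, 128, 60, 50]
mips = [weekly[t % 7] + 0.5 * t for t in range(56)]
best_lag, best_r = tune_model(mips, candidate_lags=[1, 2, 3, 7])
```

With this synthetic weekly data, the short lags fall below the preset value and the loop stops at the 7-day lag.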
Wherein, taking the transaction rate during the daily online period (TPS) as the performance data with the highest correlation, the prediction model equation of the target variable is as follows:
wherein MIPS_t is the target variable and represents the MIPS usage on day t; TPS_t is the input variable and represents the TPS usage on day t; MIPS_{t-1} is an autoregressive term (in general, the MIPS usage on day t-n); AR(1-7) are the autoregressive term coefficients; MA(1-7) are the moving-average parameters; and ε_t to ε_{t-7} are white noise;
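For orientation only, one ARMAX-style form that is consistent with the symbol definitions above can be written as follows. This is an assumption and not the equation of the patent: the constant $c$, the TPS coefficient $\beta$, and the indexed names $AR_i$ and $MA_j$ are hypothetical labels introduced here.

```latex
MIPS_t = c + \beta \, TPS_t
       + \sum_{i=1}^{7} AR_i \, MIPS_{t-i}
       + \varepsilon_t
       + \sum_{j=1}^{7} MA_j \, \varepsilon_{t-j}
```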
In this embodiment, the prediction model of the target variable is established in combination with a time series from the sample performance data corresponding to the obtained maximum correlation coefficient, and the prediction results of the model are then evaluated: the correlation coefficient between the new prediction model and the target variable is calculated and required to be greater than or equal to the preset value of 0.8, which ensures the accuracy of the finally determined prediction model of the target variable.
Preferably, in another embodiment, performing the hypothesis test on the correlation between the target variable and the sample performance data corresponding to the maximum correlation coefficient comprises:
assuming that the two populations, namely the performance data corresponding to the maximum correlation coefficient and the target variable, have no significant correlation;
calculating the concomitant probability (p-value) corresponding to the test statistic of the performance data corresponding to the maximum correlation coefficient and the target variable;
wherein the concomitant probability represents the probability that the sample data and the population data have no linear correlation, which is treated as a small-probability event;
if the concomitant probability is less than or equal to the set significance level, the hypothesis is rejected and the verification succeeds;
wherein the significance level is a small-probability standard, determined in advance when performing a hypothesis test, that serves as the admissible boundary for the judgment; a probability less than or equal to 0.05 is generally regarded as a small probability, and in this embodiment 1% is set as the small-probability standard;
otherwise, the hypothesis is accepted and the verification fails.
In this embodiment, the small-probability principle of proof by contradiction is applied: by comparing the concomitant probability with the significance level, it is verified whether the sample is representative of the population, which improves the accuracy of the correlation analysis.
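The test above can be sketched in a few lines. The text does not name the test statistic, so this sketch assumes the Fisher z approximation for a Pearson correlation coefficient; the function names are hypothetical, and the default significance level of 0.01 is the 1% standard of this embodiment. The approximation requires |r| < 1 and n > 3.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_p_value(r, n):
    """Two-sided p-value for the hypothesis "no correlation" (Fisher z approximation)."""
    z = atanh(r) * sqrt(n - 3)  # approximately standard normal under the hypothesis
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def verify_correlation(r, n, significance=0.01):
    """Return True (verification succeeds) when the hypothesis is rejected."""
    return correlation_p_value(r, n) <= significance
```

For example, a coefficient of 0.9 computed from 30 daily samples leads to rejection of the hypothesis, while a coefficient of 0.1 from the same number of samples does not.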
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (8)

1. A big data based host performance capacity estimation method, characterized in that the method comprises:
obtaining each sample performance data of a host, wherein each sample performance data is sample data that affects host capacity estimation;
deleting the abnormal data of each sample performance data by using a periodic 7-jump confidence interval method, to obtain each sample performance data with the abnormal data removed;
calculating the correlation coefficients between the target variable and each sample performance data with the abnormal data removed;
obtaining, according to the correlation coefficients, the sample performance data corresponding to the maximum correlation coefficient;
establishing a prediction model of the target variable in combination with a time series, according to the sample performance data corresponding to the maximum correlation coefficient.
2. The method according to claim 1, characterized in that deleting the abnormal data of each sample performance data by using the periodic 7-jump confidence interval method, to obtain each sample performance data with the abnormal data removed, comprises:
calculating, for each day's performance data in each sample performance data, the change difference compared with the data 7 days before and after;
verifying with a frequency histogram whether the change differences follow a normal distribution;
if, among each sample performance data, there are sample performance data whose change differences follow a normal distribution, calculating the confidence interval of those normally distributed change differences from their sample mean and standard deviation;
locating, according to the confidence interval, the time points corresponding to the abnormal change differences among the normally distributed change differences;
deleting each performance data at the time points corresponding to the abnormal change differences.
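The steps of claim 2 can be illustrated with a minimal sketch. Several points are assumptions rather than part of the claim: the function names and the synthetic data are hypothetical, the interval is taken as mean ± 1.96 standard deviations, the histogram-based normality check is omitted, and "7 days before and after" is read as requiring an abnormal change against both the earlier and the later neighbour.

```python
from statistics import mean, stdev


def abnormal_time_points(series, period=7, z=1.96):
    """Locate days whose change against both 7-day neighbours is abnormal."""
    # Change difference of each day relative to the value `period` days earlier.
    diffs = {t: series[t] - series[t - period] for t in range(period, len(series))}
    values = list(diffs.values())
    mu, sigma = mean(values), stdev(values)
    low, high = mu - z * sigma, mu + z * sigma  # interval for the differences

    def outside(t):
        return not (low <= diffs[t] <= high)

    # A spike on day t distorts the difference at t and at t + period, so both
    # must fall outside the interval (only days with both neighbours are checked).
    return [t for t in range(period, len(series) - period)
            if outside(t) and outside(t + period)]


def remove_abnormal(series, period=7, z=1.96):
    """Drop the data points at the located abnormal time points."""
    bad = set(abnormal_time_points(series, period, z))
    return [x for t, x in enumerate(series) if t not in bad]


# Synthetic daily metric (hypothetical): weekly pattern, small ripple, one spike.
weekly = [100, 120, 125, 130, 128, 60, 50]
series = [weekly[t % 7] + (t % 3) for t in range(56)]
series[30] += 200
bad = abnormal_time_points(series)
cleaned = remove_abnormal(series)
```

Comparing against the same weekday one period away keeps the regular weekly cycle from being mistaken for an anomaly, which is presumably why the difference is taken over 7 days.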
3. The method according to claim 2, characterized in that, after locating according to the confidence interval the time points corresponding to the abnormal change differences among the normally distributed change differences, the method further comprises:
verifying, by a percentage method, the confidence interval of the change differences corresponding to each performance data;
wherein, if the verification succeeds, the step of deleting each performance data at the time points corresponding to the abnormal change differences is performed.
4. The method according to claim 1, characterized in that calculating the correlation coefficients between the target variable and each sample performance data with the abnormal data removed comprises:
calculating the linear correlation coefficients between the target variable and each performance data with the abnormal data removed;
judging whether the value of the maximum linear correlation coefficient among the linear correlation coefficients is greater than or equal to a preset threshold;
if the value of the maximum linear correlation coefficient is greater than or equal to the preset threshold, obtaining the sample performance data corresponding to the maximum linear correlation coefficient; wherein obtaining, according to the correlation coefficients, the sample performance data corresponding to the maximum correlation coefficient is: obtaining, according to the linear correlation coefficients, the sample performance data corresponding to the maximum linear correlation coefficient;
if the value of the maximum linear correlation coefficient is less than the preset threshold, calculating the coefficients of determination obtained by fitting each sample performance data with the abnormal data removed to each given curve equation;
judging whether the value of the maximum coefficient of determination among the coefficients of determination is greater than or equal to the preset threshold;
if the maximum coefficient of determination is greater than or equal to the preset threshold, obtaining the sample performance data corresponding to the maximum coefficient of determination; wherein obtaining, according to the correlation coefficients, the sample performance data corresponding to the maximum correlation coefficient is: obtaining, according to the coefficients of determination, the sample performance data corresponding to the maximum coefficient of determination.
5. The method according to claim 4, characterized in that obtaining, according to the correlation coefficients, the sample performance data corresponding to the maximum correlation coefficient comprises:
if the maximum coefficient of determination is less than the preset threshold, obtaining, according to the largest coefficient among the linear correlation coefficients and the coefficients of determination, the sample performance data corresponding to that largest coefficient.
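The selection cascade of claims 4 and 5 can be sketched as follows. The sketch rests on assumptions that the claims leave open: the function and metric names are hypothetical, the threshold of 0.8 is only an illustrative default, linear strength is judged by the magnitude of the coefficient, and the "given curve equations" are represented by two example curves of the form y = a + b·f(x), for which the coefficient of determination equals the squared correlation between f(x) and y.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


# Hypothetical curve family; each curve is linear in f(x).
CURVES = {"quadratic": lambda x: x ** 2, "cubic": lambda x: x ** 3}


def select_metric(metrics, target, threshold=0.8):
    """Return (metric name, kind, coefficient), trying the linear stage first."""
    linear = {name: abs(pearson(xs, target)) for name, xs in metrics.items()}
    best_lin = max(linear, key=linear.get)
    if linear[best_lin] >= threshold:
        return best_lin, "linear", linear[best_lin]  # strong linear correlation

    # Non-linear stage: best coefficient of determination over the given curves.
    r2 = {name: max(pearson([f(x) for x in xs], target) ** 2
                    for f in CURVES.values())
          for name, xs in metrics.items()}
    best_cur = max(r2, key=r2.get)
    if r2[best_cur] >= threshold:
        return best_cur, "curve", r2[best_cur]

    # Neither stage reached the threshold: take the largest coefficient overall.
    if linear[best_lin] >= r2[best_cur]:
        return best_lin, "linear", linear[best_lin]
    return best_cur, "curve", r2[best_cur]


# Hypothetical data: metric_a is related to the target quadratically.
xs = [-3, -2, -1, 0, 1, 2, 3]
target = [x * x + 1 for x in xs]
metrics = {"metric_a": xs, "metric_b": [1, 2, 1, 2, 1, 2, 1]}
choice = select_metric(metrics, target)
```

In this example neither metric is linearly correlated with the target strongly enough, so the curve-fitting stage is entered and selects `metric_a`.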
6. The method according to claim 1, characterized in that establishing the prediction model of the target variable in combination with a time series according to the sample performance data corresponding to the maximum correlation coefficient comprises:
establishing an initialization model of the target variable in combination with a time series, according to the performance data corresponding to the maximum correlation coefficient among the performance data;
adjusting the model parameters of the initialization model to obtain a new prediction model;
calculating the correlation coefficient of the new prediction model;
if the correlation coefficient is greater than or equal to the preset value, the new prediction model is the prediction model of the target variable;
otherwise, adjusting the parameters of the new prediction model again until the correlation coefficient is greater than or equal to the preset value, thereby determining the prediction model of the target variable.
7. The method according to claim 1, characterized in that, after obtaining according to the correlation coefficients the sample performance data corresponding to the maximum correlation coefficient, the method further comprises:
performing a hypothesis test on the correlation between the target variable and the sample performance data corresponding to the maximum correlation coefficient;
wherein, if the verification succeeds, it is proved that the correlation, calculated from the sample, between the target variable and the sample performance data corresponding to the maximum correlation coefficient conforms to the overall pattern of variation, and the step of establishing the prediction model of the target variable in combination with a time series according to the sample performance data corresponding to the maximum correlation coefficient is performed.
8. The method according to claim 7, characterized in that performing the hypothesis test on the correlation between the target variable and the sample performance data corresponding to the maximum correlation coefficient comprises:
assuming that the two populations, namely the performance data corresponding to the maximum correlation coefficient and the target variable, have no significant correlation;
calculating the concomitant probability (p-value) corresponding to the test statistic of the performance data corresponding to the maximum correlation coefficient and the target variable;
if the concomitant probability is less than or equal to the set significance level, the hypothesis is rejected and the verification succeeds;
otherwise, the hypothesis is accepted and the verification fails.
CN201610318916.0A 2016-05-13 2016-05-13 A kind of host performance capacity predictor method based on big data Active CN106021082B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610318916.0A CN106021082B (en) 2016-05-13 2016-05-13 A kind of host performance capacity predictor method based on big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610318916.0A CN106021082B (en) 2016-05-13 2016-05-13 A kind of host performance capacity predictor method based on big data

Publications (2)

Publication Number Publication Date
CN106021082A true CN106021082A (en) 2016-10-12
CN106021082B CN106021082B (en) 2018-10-19

Family

ID=57100116

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610318916.0A Active CN106021082B (en) 2016-05-13 2016-05-13 A kind of host performance capacity predictor method based on big data

Country Status (1)

Country Link
CN (1) CN106021082B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110941541A (en) * 2019-11-06 2020-03-31 北京百度网讯科技有限公司 Method and device for problem grading of data stream service

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8069240B1 (en) * 2007-09-25 2011-11-29 United Services Automobile Association (Usaa) Performance tuning of IT services
CN102411757A (en) * 2011-08-05 2012-04-11 中国工商银行股份有限公司 Method and system for forecasting capacity of large host central processing unit (CPU)
JP2012128771A (en) * 2010-12-17 2012-07-05 Mitsubishi Electric Corp Information processing apparatus and program
CN105069296A (en) * 2015-08-10 2015-11-18 国网浙江省电力公司电力科学研究院 Determination method and system of equipment threshold value

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
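Lin Sen et al.: "Predicting communication network indicators with the P-BP prediction network model", Journal of Computer Applications (《计算机应用》) *
Jiao Long et al.: "IBM SPSS Statistics intelligent capacity planning solution, Part 2: multivariate predictive modeling", 《HTTPS://WWW.IBM.COM/DEVELOPERWORKS/CN/DATA/LIBRARY/BA/BA-SPSS-STATISTICS2/》 *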

Also Published As

Publication number Publication date
CN106021082B (en) 2018-10-19

Similar Documents

Publication Publication Date Title
CN103024762B (en) Service feature based communication service forecasting method
CN103827826B (en) Adaptively determining response time distribution of transactional workloads
CN104424598A (en) Cash demand quantity predicating device and method
CN105989441A (en) Model parameter adjustment method and device
CN109460908A (en) The construction cost assessment method of soft project
CA2756619A1 (en) Method and system for computerized tracking, analyzing and reporting of information specific to residential and commercial tenancy histories
CN110119948B (en) Power consumer credit evaluation method and system based on time-varying weight dynamic combination
CN106227765B (en) The accumulative implementation method of time window
WO2021098384A1 (en) Data abnormality detection method and apparatus
CN108460521A (en) The recommendation method and system of the audit target
CN109784627A (en) Capital budgeting management method, device, equipment and computer readable storage medium
CN108805338A (en) A kind of stable variable determines method, apparatus, server and storage medium
CN106021082A (en) Big data based host performance capacity estimation method
CN103530190B (en) A kind of load predicting method and device
CN116739742A (en) Monitoring method, device, equipment and storage medium of credit wind control model
CN106487570A (en) A kind of method and apparatus of assessment network performance index variation tendency
CN107832267A (en) A kind of statistics method of summary and device
CN103761442B (en) Forecasting method and device for flow parameters of micro areas
TWI626550B (en) Processing system and method for predicting system defect hotspot prediction
CN108090776A (en) The update method and system of credit evaluation model, credit estimation method and system
CN106301880A (en) One determines that cyberrelationship degree of stability, Internet service recommend method and apparatus
CN113743532B (en) Abnormality detection method, abnormality detection device, abnormality detection apparatus, and computer storage medium
CN114338429B (en) Network bandwidth determining method and device and electronic equipment
CN117391644B (en) Parameter adjustment method, device, equipment and medium in contract management process
CN116225882B (en) Command information system state monitoring and evaluating method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CB03 Change of inventor or designer information

Inventor after: Zhuang Lei

Inventor after: Teng Teng

Inventor after: Zhang Hongliang

Inventor after: Yu Peng

Inventor after: Sun Zhe

Inventor before: Zhuang Lei

Inventor before: Teng Teng

Inventor before: Zhang Hongliang

Inventor before: Yu Peng

Inventor before: Sun Zhe