CN109951327B - Network fault data synthesis method based on Bayesian hybrid model - Google Patents


Info

Publication number
CN109951327B
Authority
CN
China
Prior art keywords
distribution
network
alm
generated
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910165006.7A
Other languages
Chinese (zh)
Other versions
CN109951327A (en
Inventor
阴法明
杜庆波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing College of Information Technology
Original Assignee
Nanjing College of Information Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing College of Information Technology filed Critical Nanjing College of Information Technology
Priority to CN201910165006.7A priority Critical patent/CN109951327B/en
Publication of CN109951327A publication Critical patent/CN109951327A/en
Application granted granted Critical
Publication of CN109951327B publication Critical patent/CN109951327B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Complex Calculations (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a network fault data synthesis method based on a Bayesian hybrid model, which addresses the drop in prediction performance that scarce fault data causes in conventional network fault prediction. The method accurately captures the characteristics of imbalanced network data sets and effectively improves the accuracy of network fault prediction.

Description

Network fault data synthesis method based on Bayesian hybrid model
Technical Field
The invention relates to a Bayesian hybrid model-based network fault data synthesis method, and belongs to the technical field of unbalanced data processing.
Background
With the development of Internet technology, more and more users rely on various network services, and network operators strive to provide higher-quality, more stable streaming video services. Network faults readily degrade the user's quality of experience. If an operator can accurately predict a network fault in advance and take measures against the problems that may arise, the user experience can be effectively improved. Predicting and promptly handling user-side faults is therefore crucial for network operators.
In a real system, network fault data make up only a small proportion of the collected network data set; in other words, a network fault is far less probable than normal operation. The network data set is therefore imbalanced. An imbalanced data set is one in which one class of data is significantly smaller than the other. Here, the amount of network fault data (minority class samples) is much smaller than the amount of network normal data (majority class samples). A conventional classifier trained on imbalanced data is usually biased: it predicts the majority class with high accuracy and the minority class with low accuracy. Methods for handling imbalanced data sets are typically sampling-based: they turn the imbalanced data set into a balanced one by changing its distribution.
Most existing methods handle imbalanced data by generating new minority samples directly from existing samples, for example the Synthetic Minority Oversampling Technique (SMOTE). These methods are intuitive, but they do not mine the distribution characteristics of the minority class in depth, so the generated samples do not necessarily help classification and often harm it; the new minority samples are also not representative. Such methods therefore cannot be applied well to network fault prediction.
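To make concrete what "generating new minority samples directly from existing samples" means, the following is a simplified sketch of SMOTE-style interpolation. It is not a full SMOTE implementation; the function name, the neighbour count `k` and the toy data are assumptions introduced here for illustration only.

```python
import random

def smote_like_sample(minority, k=2, rng=random):
    """Create one synthetic point between a minority sample and a near neighbour."""
    base = rng.choice(minority)
    # Rank the other minority samples by squared Euclidean distance to `base`.
    others = [p for p in minority if p is not base]
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
    neighbour = rng.choice(others[:k])  # one of the k nearest neighbours
    gap = rng.random()                  # interpolation factor in [0, 1)
    return [a + gap * (b - a) for a, b in zip(base, neighbour)]

# Hypothetical two-dimensional minority samples (corners of the unit square).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
random.seed(0)
synthetic = smote_like_sample(minority)
```

Every synthetic point lies on a line segment between two existing samples, which illustrates why such methods cannot reflect distribution characteristics beyond those segments.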
Disclosure of Invention
The invention aims to overcome the defects in the existing network fault data processing, and provides a network fault data synthesis method based on a Bayesian hybrid model.
The technical scheme of the invention is as follows: a network fault data synthesis method based on a Bayesian hybrid model comprises the following steps:
Step 1: Let the collected network data set be $X=\{x_n\}_{n=1}^{N}$, where each $x_n$ comprises six attributes: packet loss rate, terminal download rate, transmission delay, jitter, video transmission quality, and end-user experience score. The data set corresponds to a label set $Y=\{y_n\}_{n=1}^{N}$ with $y_n=0$ or $1$; that is, $X$ carries two classes of labels, where $y_n=0$ is the network-normal class label and $y_n=1$ is the network-fault class label. Because the network-normal class has far more data than the network-fault class, the set of $x_n$ with $y_n=1$ is defined as the minority class $X_{alm}=\{x_i^{alm}\}_{i=1}^{N_{alm}}$, where $x_i^{alm}$ is a minority class sample and $N_{alm}$ is the number of minority class samples; the set of $x_n$ with $y_n=0$ is the majority class $X_{maj}=\{x_i^{maj}\}_{i=1}^{N_{maj}}$, where $x_i^{maj}$ is a majority class sample and $N_{maj}$ is the number of majority class samples;
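The partition in step 1 can be sketched as follows. This is a minimal illustration: the function name and the attribute values are hypothetical, and only the label convention (1 for fault, 0 for normal) comes from the text.

```python
def split_by_label(X, y):
    """Split samples into the minority (fault, y=1) and majority (normal, y=0) sets."""
    X_alm = [x for x, label in zip(X, y) if label == 1]
    X_maj = [x for x, label in zip(X, y) if label == 0]
    return X_alm, X_maj

# Hypothetical samples with six attributes each, in the order: packet loss rate,
# download rate, transmission delay, jitter, video quality, user experience score.
X = [
    [0.01, 8.2, 35.0, 2.1, 4.5, 4.0],
    [0.20, 1.1, 180.0, 25.0, 1.5, 1.0],
    [0.02, 7.9, 40.0, 3.0, 4.2, 4.0],
]
y = [0, 1, 0]
X_alm, X_maj = split_by_label(X, y)
```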
Step 2: A Bayesian mixture model is chosen to represent the distribution of $X_{alm}$; its probability density function is:
Figure GDA0003140014980000027
where $K$ is the number of mixture components, and $\pi_j(V)$, $\mu_j$, $\Lambda_j$ and $v_j$ denote the weight, mean, covariance matrix and degrees-of-freedom parameter of the $j$-th mixture component, respectively;
Figure GDA0003140014980000028
is the probability density function of the t distribution, expressed as:
Figure GDA0003140014980000029
where $N(\cdot)$ and $\mathrm{Gam}(\cdot)$ denote the Gaussian and Gamma distribution functions, respectively, and $u_{nj}$ is a hidden variable associated with $x_n$ and the $j$-th mixture component. The weights $\pi_j(V)$ satisfy
Figure GDA00031400149800000210
and are given by:
Figure GDA00031400149800000211
The variable $V_j$ in the formula above obeys a Beta distribution, $p(V_j)=\mathrm{Beta}(V_j\mid 1,\alpha)$, where $\alpha$ is the hyperparameter of the Beta distribution, and $(\mu_j,\Lambda_j)$ obey a joint Gaussian-Wishart distribution, i.e. the product $N(\cdot)W(\cdot)$ of a Gaussian distribution and a Wishart distribution:
$p(\mu_j,\Lambda_j)=N(\mu_j\mid m_j,\lambda_j\Lambda_j)\,W(\Lambda_j\mid W_j,\rho_j)$
where
Figure GDA00031400149800000212
are the hyperparameters of the joint Gaussian-Wishart distribution: $m_j$ is a six-dimensional column vector, $\lambda_j$ and $\rho_j$ are scalars, and $W_j$ is a $6\times 6$ matrix. A hidden variable
Figure GDA00031400149800000213
is introduced, where $z_n$ indicates which component of the t mixture model generated the current data $x_n$: when $x_n$ is generated by the $j$-th mixture component, $z_{nj}=1$. Based on the above, the hyperparameters of the entire model are:
Figure GDA00031400149800000214
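Step 2 builds the weights from variables $V_j$ with a Beta(1, α) prior. The weight formula itself survives only as an image, so the sketch below assumes the standard stick-breaking construction, π_j = V_j · Π_{i<j}(1 − V_i), truncated at K components. The function name and the choice to set the last V to 1 are assumptions made here, not statements from the patent.

```python
import random

def stick_breaking_weights(alpha, K, rng=random):
    """Truncated stick-breaking: pi_j = V_j * prod_{i<j}(1 - V_i), V_j ~ Beta(1, alpha)."""
    weights = []
    remaining = 1.0  # length of the stick that has not been assigned yet
    for j in range(K):
        # Assumption: the last V is fixed to 1 so that the K weights sum to 1.
        v = 1.0 if j == K - 1 else rng.betavariate(1.0, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights

random.seed(0)
pi = stick_breaking_weights(alpha=2.0, K=5)
```

Under this construction a small α concentrates the weight on the first few components, which is how such a model can leave surplus components nearly empty.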
and step 3: by using XalmPerforming parameter estimation on the hybrid model, specifically as follows:
3-1) production of NalmObey [1, K]Random integers are uniformly distributed in the interval, and the probability of each integer in the interval is counted; i.e. if N is generatedjAn integer j, then δj=Nj/Nalm(ii) a For each
Figure GDA0003140014980000031
Corresponding hidden variable znIs initially distributed as
Figure GDA0003140014980000032
znIs a K-dimensional vector, which is in each dimension znjA value on (j ═ 1.., K) is {0,1 };
3-2) Set the initial values of the hyperparameters
Figure GDA0003140014980000033
and of $\alpha$: for all $j$ ($j=1,\ldots,K$), $m_j=0$, $\lambda_j=1$, $\rho_j$ is any number between 3 and 20, $W_j=I$ where $I$ is the identity matrix, $v_j$ is any number between 1 and 100, and $\alpha$ is any number between 1 and 10; in addition, the iteration counter is initialized to $itr=1$;
3-3) Update the distribution of the hidden variables
Figure GDA0003140014980000034
that is,
Figure GDA0003140014980000035
whose hyperparameters
Figure GDA0003140014980000036
are updated by:
Figure GDA0003140014980000037
Figure GDA0003140014980000038
where
Figure GDA0003140014980000039
When computing
Figure GDA00031400149800000310
in the first iteration,
Figure GDA00031400149800000311
Figure GDA00031400149800000312
3-4) Update the distribution of the random variables
Figure GDA00031400149800000313
that is,
Figure GDA00031400149800000314
whose hyperparameters
Figure GDA00031400149800000315
are updated by:
Figure GDA00031400149800000316
Figure GDA00031400149800000317
Figure GDA00031400149800000318
Figure GDA00031400149800000319
where
Figure GDA00031400149800000320
3-5) Update the distribution of the random variables
Figure GDA0003140014980000041
that is,
Figure GDA0003140014980000042
whose hyperparameters
Figure GDA0003140014980000043
are updated by:
Figure GDA0003140014980000044
Figure GDA0003140014980000045
3-6) Update the distribution of the hidden variables
Figure GDA0003140014980000046
namely
Figure GDA0003140014980000047
where
Figure GDA0003140014980000048
The expectations $\langle\cdot\rangle$ in the formula above are computed as follows:
Figure GDA0003140014980000049
Figure GDA00031400149800000410
Figure GDA00031400149800000411
Figure GDA00031400149800000412
where $\Gamma(\cdot)$ is the standard gamma function and $\Gamma(\cdot)'$ is its derivative; in addition, the computation of
Figure GDA00031400149800000413
and $\langle u_{nj}\rangle$ has been given in steps 3-3) and 3-4), respectively;
3-7) Update the degrees-of-freedom parameters
Figure GDA00031400149800000414
that is, solve the following equation in $v_j$:
Figure GDA00031400149800000415
Newton's method is used to obtain the solution $v_j$ of the equation;
3-8) Compute the likelihood value $LIK_{itr}$ after the current iteration, where $itr$ is the current iteration number:
Figure GDA0003140014980000051
3-9) Compute the difference between the likelihood values after the current and the previous iteration, $\Delta LIK=LIK_{itr}-LIK_{itr-1}$. If $\Delta LIK\le\delta$, the parameter estimation ends; otherwise, increase $itr$ by 1, return to step 3-3), and continue with the next iteration. The threshold $\delta$ takes a value in the range $10^{-5}$ to $10^{-4}$;
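The random initialization in step 3-1 can be sketched as follows. This is an illustration only: the function name is hypothetical, and the way δ_j enters the initial distribution of z_n is given by the formula in the source, which is not reproduced here.

```python
import random
from collections import Counter

def initial_assignment_probs(n_alm, K, rng=random):
    """Draw n_alm uniform integers in [1, K] and return delta_j = N_j / n_alm."""
    draws = [rng.randint(1, K) for _ in range(n_alm)]  # randint includes both ends
    counts = Counter(draws)
    return [counts.get(j, 0) / n_alm for j in range(1, K + 1)]

random.seed(0)
delta = initial_assignment_probs(n_alm=100, K=5)
```

The resulting δ_j are close to 1/K but not identical, so the components start from slightly different assignment probabilities.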
Step 4: Generate a new network data set $(X_{alm})'$ using the estimated Bayesian mixture model. With $N'$ the number of samples to be generated, proceed as follows:
4-1) Generate a random number $\varepsilon$ uniformly distributed between 0 and 1;
4-2) Randomly generate, from the distribution
Figure GDA0003140014980000052
the variables
Figure GDA0003140014980000053
4-3) Compute
Figure GDA0003140014980000054
4-4) Randomly generate, from the distribution
Figure GDA0003140014980000055
the variables
Figure GDA0003140014980000056
4-5) Using the estimated
Figure GDA0003140014980000057
if $\varepsilon\in[0,\pi_1]$, generate a sample obeying the t distribution $t(\mu_1,\Lambda_1,v_1)$; if
Figure GDA0003140014980000058
generate a sample obeying the t distribution $t(\mu_k,\Lambda_k,v_k)$; if
Figure GDA0003140014980000059
generate a sample obeying the t distribution $t(\mu_K,\Lambda_K,v_K)$;
4-6) Repeat steps 4-1) to 4-5) $N'$ times to obtain $(X_{alm})'$. The final network fault data set is
Figure GDA00031400149800000510
and the total data set after synthesis is
Figure GDA00031400149800000511
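The component selection and sampling in step 4 can be sketched as below. This is a simplified illustration, not the patented procedure: it takes fixed weights, means and degrees of freedom as given instead of drawing them from the estimated posteriors, assumes a diagonal scale matrix, and draws t samples through the Gamma scale mixture described in step 2. All names and numbers are hypothetical.

```python
import random

def pick_component(eps, weights):
    """Return the index k whose cumulative-weight interval contains eps."""
    cumulative = 0.0
    for k, w in enumerate(weights):
        cumulative += w
        if eps <= cumulative:
            return k
    return len(weights) - 1  # guard against rounding when eps is close to 1

def sample_student_t(mean, scale, dof, rng=random):
    """Draw one t-distributed vector (diagonal scale) via a Gamma scale mixture."""
    # u ~ Gam(shape=dof/2, rate=dof/2); gammavariate takes (shape, scale).
    u = rng.gammavariate(dof / 2.0, 2.0 / dof)
    return [m + rng.gauss(0.0, s) / u ** 0.5 for m, s in zip(mean, scale)]

def synthesize(n_new, weights, means, scales, dofs, rng=random):
    """Repeat: draw eps, select a component, sample from its t distribution."""
    samples = []
    for _ in range(n_new):
        k = pick_component(rng.random(), weights)
        samples.append(sample_student_t(means[k], scales[k], dofs[k], rng))
    return samples

random.seed(0)
new_faults = synthesize(
    n_new=10,
    weights=[0.2, 0.5, 0.3],
    means=[[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]],
    scales=[[1.0, 1.0], [0.5, 0.5], [2.0, 2.0]],
    dofs=[4.0, 10.0, 30.0],
)
```

Small degrees of freedom give heavier tails, so components with small $v_j$ occasionally produce samples far from their mean.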
The invention has the following beneficial effects:
1. By generating network fault data, the invention effectively addresses the insufficient accuracy of classification and prediction on imbalanced data in network fault prediction tasks.
2. The invention models the distribution of network fault data with a Bayesian mixture model and captures the characteristics of the data well; compared with conventional methods, the new network fault data it generates are more representative and more discriminative for classification.
3. The Bayesian hybrid model designed by the invention can adaptively determine the optimal model structure according to minority class data.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a distribution diagram of an artificially generated sample after fitting with a Bayesian mixture model in accordance with the present invention.
FIG. 3 is a likelihood value variation curve of the Bayesian mixture model iterative process of the present invention.
FIG. 4 is a comparison of G values for the Kmeans-SMOTE method, the GMM oversampling method and the method of the present invention.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings.
As shown in fig. 1, the present invention provides a network fault data synthesis method based on a bayesian mixture model, which comprises the following steps:
Step 1: Let the collected network data set be $X=\{x_n\}_{n=1}^{N}$, where each $x_n$ comprises six attributes: packet loss rate, terminal download rate, transmission delay, jitter, video transmission quality, and end-user experience score. The data set corresponds to a label set $Y=\{y_n\}_{n=1}^{N}$ with $y_n=0$ or $1$; that is, $X$ carries two classes of labels, where $y_n=0$ is the network-normal class label and $y_n=1$ is the network-fault class label. Because the network-normal class has far more data than the network-fault class, the set of $x_n$ with $y_n=1$ is defined as the minority class $X_{alm}=\{x_i^{alm}\}_{i=1}^{N_{alm}}$, where $x_i^{alm}$ is a minority class sample and $N_{alm}$ is the number of minority class samples; the set of $x_n$ with $y_n=0$ is the majority class $X_{maj}=\{x_i^{maj}\}_{i=1}^{N_{maj}}$, where $x_i^{maj}$ is a majority class sample and $N_{maj}$ is the number of majority class samples;
Step 2: A Bayesian mixture model is chosen to represent the distribution of $X_{alm}$; its probability density function is:
Figure GDA0003140014980000067
where $K$ is the number of mixture components, and $\pi_j(V)$, $\mu_j$, $\Lambda_j$ and $v_j$ denote the weight, mean, covariance matrix and degrees-of-freedom parameter of the $j$-th mixture component, respectively.
Figure GDA0003140014980000068
is the probability density function of the t distribution, which can be expressed as:
Figure GDA0003140014980000069
where $N(\cdot)$ and $\mathrm{Gam}(\cdot)$ denote the Gaussian and Gamma distribution functions, respectively, and $u_{nj}$ is a hidden variable associated with $x_n$ and the $j$-th mixture component. The weights $\pi_j(V)$ satisfy
Figure GDA00031400149800000610
and are given by:
Figure GDA00031400149800000611
The variable $V_j$ in the formula above obeys a Beta distribution, $p(V_j)=\mathrm{Beta}(V_j\mid 1,\alpha)$, where $\alpha$ is the hyperparameter of the Beta distribution. In addition, $(\mu_j,\Lambda_j)$ obey a joint Gaussian-Wishart distribution, i.e. the product $N(\cdot)W(\cdot)$ of a Gaussian distribution and a Wishart distribution:
$p(\mu_j,\Lambda_j)=N(\mu_j\mid m_j,\lambda_j\Lambda_j)\,W(\Lambda_j\mid W_j,\rho_j)$
where
Figure GDA0003140014980000071
are the hyperparameters of the joint Gaussian-Wishart distribution: $m_j$ is a six-dimensional column vector, $\lambda_j$ and $\rho_j$ are scalars, and $W_j$ is a $6\times 6$ matrix. A hidden variable
Figure GDA0003140014980000072
must also be introduced, where $z_n$ indicates which component of the t mixture model generated the current data $x_n$: when $x_n$ is generated by the $j$-th mixture component, $z_{nj}=1$. Based on the above, the hyperparameters of the whole model are:
Figure GDA0003140014980000073
and step 3: by using XalmPerforming parameter estimation on the hybrid model, specifically as follows:
(3-1) production of NalmObey [1, K]Random integers are uniformly distributed in the interval, and the probability of each integer in the interval is counted; i.e. if N is generatedjAn integer j, then δj=Nj/Nalm(ii) a For each
Figure GDA0003140014980000074
Corresponding hidden variable znThe initial distribution of (a) is:
Figure GDA0003140014980000075
in addition, z isnIs a K-dimensional vector, which is in each dimension znjA value on (j ═ 1.., K) is {0,1 };
(3-2) setting of hyper-parameters
Figure GDA0003140014980000076
An initial value of α; for all j (j ═ 1.. times, K), mj=0,λj=1,ρjCan be any number between 3 and 20, WjI is a unity matrix, vjAny number between 1 and 100 can be taken, and any number between 1 and 10 can be taken as alpha; further, the iteration number count variable k is 1;
(3-3) Update the distribution of the hidden variables
Figure GDA0003140014980000077
that is,
Figure GDA0003140014980000078
whose hyperparameters
Figure GDA0003140014980000079
are updated by:
Figure GDA00031400149800000710
Figure GDA00031400149800000711
where:
Figure GDA00031400149800000712
When computing
Figure GDA00031400149800000713
in the first iteration,
Figure GDA00031400149800000714
Figure GDA00031400149800000715
(3-4) Update the distribution of the random variables
Figure GDA00031400149800000716
that is,
Figure GDA00031400149800000717
whose hyperparameters
Figure GDA00031400149800000718
are updated by:
Figure GDA0003140014980000081
Figure GDA0003140014980000082
Figure GDA0003140014980000083
Figure GDA0003140014980000084
where
Figure GDA0003140014980000085
(3-5) Update the distribution of the random variables
Figure GDA0003140014980000086
that is,
Figure GDA0003140014980000087
whose hyperparameters
Figure GDA0003140014980000088
are updated by:
Figure GDA0003140014980000089
Figure GDA00031400149800000810
(3-6) Update the distribution of the hidden variables
Figure GDA00031400149800000811
namely
Figure GDA00031400149800000812
where:
Figure GDA00031400149800000813
The expectations $\langle\cdot\rangle$ in the formula above are computed as follows:
Figure GDA00031400149800000814
Figure GDA00031400149800000815
Figure GDA00031400149800000816
Figure GDA00031400149800000817
where $\Gamma(\cdot)$ is the standard gamma function and $\Gamma(\cdot)'$ is its derivative; in addition, the computation of
Figure GDA00031400149800000818
and $\langle u_{nj}\rangle$ has been given in steps (3-3) and (3-4), respectively;
(3-7) Update the degrees-of-freedom parameters
Figure GDA0003140014980000091
that is, solve the following equation in $v_j$:
Figure GDA0003140014980000092
The solution $v_j$ of the equation can be obtained quickly with a common numerical method such as Newton's method;
(3-8) Compute the likelihood value $LIK_{itr}$ after the current iteration, where $itr$ is the current iteration number:
Figure GDA0003140014980000093
(3-9) Compute the difference between the likelihood values after the current and the previous iteration, $\Delta LIK=LIK_{itr}-LIK_{itr-1}$. If $\Delta LIK\le\delta$, the parameter estimation ends; otherwise, increase $itr$ by 1, return to step (3-3), and continue with the next iteration. The threshold $\delta$ takes a value in the range $10^{-5}$ to $10^{-4}$;
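The stopping rule of step (3-9) can be sketched as a loop. This is a minimal skeleton under stated assumptions: `step` is a hypothetical callable standing in for the updates (3-3) to (3-8) and returning the likelihood value after iteration `itr`; the `max_iter` guard is an addition made here and is not part of the described method.

```python
def run_until_converged(step, tol=1e-5, max_iter=500):
    """Iterate `step` until the likelihood gain is at most `tol`."""
    previous = None
    for itr in range(1, max_iter + 1):
        current = step(itr)  # hypothetical: performs one sweep, returns LIK_itr
        if previous is not None and current - previous <= tol:
            return itr, current
        previous = current
    return max_iter, previous

# Toy likelihood curve that rises toward 1; the gain at iteration itr is 0.5**itr.
iterations, final_lik = run_until_converged(lambda itr: 1.0 - 0.5 ** itr)
```

With this toy curve the gain first drops to $10^{-5}$ or below at iteration 17, which is where the loop stops.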
Step 4: Generate a new network data set $(X_{alm})'$ using the estimated Bayesian mixture model. With $N'$ the number of samples to be generated, proceed as follows:
(4-1) Generate a random number $\varepsilon$ uniformly distributed between 0 and 1;
(4-2) Randomly generate, from the distribution
Figure GDA0003140014980000094
the variables
Figure GDA0003140014980000095
(4-3) Compute
Figure GDA0003140014980000096
(4-4) Randomly generate, from the distribution
Figure GDA0003140014980000097
the variables
Figure GDA0003140014980000098
(4-5) Using the estimated
Figure GDA0003140014980000099
if $\varepsilon\in[0,\pi_1]$, generate a sample obeying the t distribution $t(\mu_1,\Lambda_1,v_1)$; if
Figure GDA00031400149800000910
generate a sample obeying the t distribution $t(\mu_k,\Lambda_k,v_k)$; if
Figure GDA00031400149800000911
generate a sample obeying the t distribution $t(\mu_K,\Lambda_K,v_K)$;
(4-6) Repeat steps (4-1) to (4-5) $N'$ times to obtain $(X_{alm})'$. The final network fault data set is
Figure GDA00031400149800000912
and the total data set after synthesis is
Figure GDA00031400149800000913
Performance comparison:
The clustering performance of the Bayesian mixture model (DPMM) was tested first. The idea is as follows: the DPMM clustering algorithm performs unsupervised learning on samples drawn from several clusters whose cluster labels are unknown, and the clustering result is then compared with the original sample labels to show the classification performance.
In the experiment, 1000 three-dimensional samples are generated from three single Gaussian models, and the experiment runs for 200 iterations. FIG. 2 shows the distribution of the sample points after fitting with the Bayesian mixture model designed by the invention. 942 samples are classified correctly, so the fitting accuracy reaches 94.2%. FIG. 3 shows a line graph of the number of classes over the 200 iterations: the number of mixture components K of the model generally fluctuates around 3 and finally converges after approximately 160 iterations. The experimental results show that the Bayesian mixture model can determine the model structure automatically from the given samples.
Next, the method of the invention is verified on network data provided by a network operator. The method synthesizes new samples and adds them to the minority class so that the new data set is relatively balanced; a naive Bayes classifier is then trained as the base classifier on the new data set and evaluated on a test data set. The conventional Kmeans-SMOTE method and the GMM oversampling method are used for comparison. The test data set uses raw data, with minority-to-majority ratios of 1:30, 1:60 and 1:89 chosen for the training and test runs. The results are shown in FIG. 4: compared with the Kmeans-SMOTE algorithm and the GMM oversampling method, the G value of the method of the invention (DPMM) is improved by 16% and 4.8%, respectively. The method therefore effectively improves classification and prediction on imbalanced network data.
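FIG. 4 reports G values, which the text does not define. The sketch below assumes that G refers to the geometric mean of the true-positive rate and the true-negative rate, a measure commonly used for imbalanced classification; this reading, the function name and the toy labels are assumptions made here.

```python
import math

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity (fault class, 1) and specificity (normal class, 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)

# Toy labels: one of two faults detected, three of four normal samples kept.
score = g_mean([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])
```

Unlike plain accuracy, this measure falls to zero when a classifier ignores the minority class entirely, which is why it suits comparisons on imbalanced test sets.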

Claims (1)

1. A network fault data synthesis method based on a Bayesian hybrid model is characterized by comprising the following steps:
Step 1: Let the collected network data set be $X=\{x_n\}_{n=1}^{N}$, where each $x_n$ comprises six attributes: packet loss rate, terminal download rate, transmission delay, jitter, video transmission quality, and end-user experience score. The data set corresponds to a label set $Y=\{y_n\}_{n=1}^{N}$ with $y_n=0$ or $1$; that is, $X$ carries two classes of labels, where $y_n=0$ is the network-normal class label and $y_n=1$ is the network-fault class label. Because the network-normal class has far more data than the network-fault class, the set of $x_n$ with $y_n=1$ is defined as the minority class $X_{alm}=\{x_i^{alm}\}_{i=1}^{N_{alm}}$, where $x_i^{alm}$ is a minority class sample and $N_{alm}$ is the number of minority class samples; the set of $x_n$ with $y_n=0$ is the majority class $X_{maj}=\{x_i^{maj}\}_{i=1}^{N_{maj}}$, where $x_i^{maj}$ is a majority class sample and $N_{maj}$ is the number of majority class samples;
Step 2: A Bayesian mixture model is chosen to represent the distribution of $X_{alm}$; its probability density function is:
Figure FDA0003140014970000017
where $K$ is the number of mixture components, and $\pi_j(V)$, $\mu_j$, $\Lambda_j$ and $v_j$ denote the weight, mean, covariance matrix and degrees-of-freedom parameter of the $j$-th mixture component, respectively;
Figure FDA0003140014970000018
is the probability density function of the t distribution, expressed as:
Figure FDA0003140014970000019
where $N(\cdot)$ and $\mathrm{Gam}(\cdot)$ denote the Gaussian and Gamma distribution functions, respectively, and $u_{nj}$ is a hidden variable associated with $x_n$ and the $j$-th mixture component. The weights $\pi_j(V)$ satisfy
Figure FDA00031400149700000110
and are given by:
Figure FDA00031400149700000111
The variable $V_j$ in the formula above obeys a Beta distribution, $p(V_j)=\mathrm{Beta}(V_j\mid 1,\alpha)$, where $\alpha$ is the hyperparameter of the Beta distribution, and $(\mu_j,\Lambda_j)$ obey a joint Gaussian-Wishart distribution, i.e. the product $N(\cdot)W(\cdot)$ of a Gaussian distribution and a Wishart distribution:
$p(\mu_j,\Lambda_j)=N(\mu_j\mid m_j,\lambda_j\Lambda_j)\,W(\Lambda_j\mid W_j,\rho_j)$
where
Figure FDA00031400149700000112
are the hyperparameters of the joint Gaussian-Wishart distribution: $m_j$ is a six-dimensional column vector, $\lambda_j$ and $\rho_j$ are scalars, and $W_j$ is a $6\times 6$ matrix. A hidden variable
Figure FDA00031400149700000113
is introduced, where $z_n$ indicates which component of the t mixture model generated the current data $x_n$: when $x_n$ is generated by the $j$-th mixture component, $z_{nj}=1$. Based on the above, the hyperparameters of the entire model are:
Figure FDA00031400149700000114
and step 3: by using XalmPerforming parameter estimation on the hybrid model, specifically as follows:
3-1) production of NalmObey [1, K]Random integers are uniformly distributed in the interval, and the probability of each integer in the interval is counted; i.e. if N is generatedjAn integer j, then δj=Nj/Nalm(ii) a For each
Figure FDA0003140014970000021
Corresponding hidden variable znIs initially distributed as
Figure FDA0003140014970000022
znIs a K-dimensional vector, which is in each dimension znjA value on (j ═ 1.., K) is {0,1 };
3-2) setting the hyper-parameters
Figure FDA0003140014970000023
An initial value of α; for all j, j ═ 1j=0,λj=1,ρjTaking any number between 3 and 20, WjI is a unity matrix, vjTaking any number between 1 and 100, and taking any number between 1 and 10 for alpha; further, the iteration number count variable k is 1;
3-3) Update the distribution of the hidden variables
Figure FDA0003140014970000024
that is,
Figure FDA0003140014970000025
whose hyperparameters
Figure FDA0003140014970000026
are updated by:
Figure FDA0003140014970000027
Figure FDA0003140014970000028
where
Figure FDA0003140014970000029
When computing
Figure FDA00031400149700000210
in the first iteration,
Figure FDA00031400149700000211
3-4) Update the distribution of the random variables
Figure FDA00031400149700000212
that is,
Figure FDA00031400149700000213
whose hyperparameters
Figure FDA00031400149700000214
are updated by:
Figure FDA00031400149700000215
Figure FDA00031400149700000216
Figure FDA00031400149700000217
Figure FDA00031400149700000218
where
Figure FDA0003140014970000031
3-5) Update the distribution of the random variables
Figure FDA0003140014970000032
that is,
Figure FDA0003140014970000033
whose hyperparameters
Figure FDA0003140014970000034
are updated by:
Figure FDA0003140014970000035
Figure FDA0003140014970000036
3-6) Update the distribution of the hidden variables
Figure FDA0003140014970000037
namely
Figure FDA0003140014970000038
where
Figure FDA0003140014970000039
The expectations $\langle\cdot\rangle$ in the formula above are computed as follows:
Figure FDA00031400149700000310
Figure FDA00031400149700000311
Figure FDA00031400149700000312
Figure FDA00031400149700000313
where $\Gamma(\cdot)$ is the standard gamma function and $\Gamma(\cdot)'$ is its derivative; in addition, the computation of
Figure FDA00031400149700000314
and $\langle u_{nj}\rangle$ has been given in steps 3-3) and 3-4), respectively;
3-7) Update the degrees-of-freedom parameters
Figure FDA00031400149700000315
that is, solve the following equation in $v_j$:
Figure FDA00031400149700000316
Newton's method is used to obtain the solution $v_j$ of the equation;
3-8) Compute the likelihood value $LIK_{itr}$ after the current iteration, where $itr$ is the current iteration number:
Figure FDA0003140014970000041
3-9) Compute the difference between the likelihood values after the current and the previous iteration, $\Delta LIK=LIK_{itr}-LIK_{itr-1}$. If $\Delta LIK\le\delta$, the parameter estimation ends; otherwise, increase $itr$ by 1, return to step 3-3), and continue with the next iteration. The threshold $\delta$ takes a value in the range $10^{-5}$ to $10^{-4}$;
Step 4: Generate a new network data set $(X_{alm})'$ using the estimated Bayesian mixture model. With $N'$ the number of samples to be generated, proceed as follows:
4-1) Generate a random number $\varepsilon$ uniformly distributed between 0 and 1;
4-2) Randomly generate, from the distribution
Figure FDA0003140014970000042
the variables
Figure FDA0003140014970000043
4-3) Compute
Figure FDA0003140014970000044
4-4) Randomly generate, from the distribution
Figure FDA0003140014970000045
the variables
Figure FDA0003140014970000046
4-5) Using the estimated
Figure FDA0003140014970000047
if $\varepsilon\in[0,\pi_1]$, generate a sample obeying the t distribution $t(\mu_1,\Lambda_1,v_1)$; if
Figure FDA0003140014970000048
generate a sample obeying the t distribution $t(\mu_k,\Lambda_k,v_k)$; if
Figure FDA0003140014970000049
generate a sample obeying the t distribution $t(\mu_K,\Lambda_K,v_K)$;
4-6) Repeat steps 4-1) to 4-5) $N'$ times to obtain $(X_{alm})'$. The final network fault data set is
Figure FDA00031400149700000410
and the total data set after synthesis is
Figure FDA00031400149700000411
CN201910165006.7A 2019-03-05 2019-03-05 Network fault data synthesis method based on Bayesian hybrid model Active CN109951327B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910165006.7A CN109951327B (en) 2019-03-05 2019-03-05 Network fault data synthesis method based on Bayesian hybrid model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910165006.7A CN109951327B (en) 2019-03-05 2019-03-05 Network fault data synthesis method based on Bayesian hybrid model

Publications (2)

Publication Number Publication Date
CN109951327A CN109951327A (en) 2019-06-28
CN109951327B true CN109951327B (en) 2021-08-20

Family

ID=67008458

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910165006.7A Active CN109951327B (en) 2019-03-05 2019-03-05 Network fault data synthesis method based on Bayesian hybrid model

Country Status (1)

Country Link
CN (1) CN109951327B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110688484B (en) * 2019-09-24 2021-12-31 北京工商大学 Microblog sensitive event speech detection method based on unbalanced Bayesian classification
CN111652375B (en) * 2020-06-02 2023-06-06 中南大学 Intelligent detection and diagnosis method and device for cooling coil faults based on Bayesian reasoning and virtual sensing
CN115037634B (en) * 2022-05-30 2024-04-16 中电信数智科技有限公司 K8s network fault prediction method based on Markov chain and Bayesian network

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103226595B (en) * 2013-04-17 2016-06-15 南京邮电大学 The clustering method of the high dimensional data of common factor analyzer is mixed based on Bayes
CN103955709B (en) * 2014-05-13 2017-04-19 西安电子科技大学 Weighted synthetic kernel and triple markov field (TMF) based polarimetric synthetic aperture radar (SAR) image classification method
US10409789B2 (en) * 2016-09-16 2019-09-10 Oracle International Corporation Method and system for adaptively imputing sparse and missing data for predictive models
CN107180246A (en) * 2017-04-17 2017-09-19 南京邮电大学 A kind of IPTV user's report barrier data synthesis method based on mixed model
CN109327404B (en) * 2018-09-30 2022-06-07 武汉思普崚技术有限公司 P2P prediction method and system based on naive Bayes classification algorithm, server and medium

Also Published As

Publication number Publication date
CN109951327A (en) 2019-06-28

Similar Documents

Publication Publication Date Title
CN109931678B (en) Air conditioner fault diagnosis method based on deep learning LSTM
CN109951327B (en) Network fault data synthesis method based on Bayesian hybrid model
CN110542819B (en) Transformer fault type diagnosis method based on semi-supervised DBNC
CN109086799A (en) Crop leaf disease recognition method based on an improved AlexNet convolutional neural network model
CN110398650B (en) Transformer fault diagnosis method based on k-adjacent SMOTE and deep learning
CN107169527B (en) Medical image classification method based on collaborative deep learning
CN108875772B (en) Fault classification model and method based on stacked sparse Gaussian Bernoulli limited Boltzmann machine and reinforcement learning
CN110349597A (en) Speech detection method and device
CN115131347B (en) Intelligent control method for processing zinc alloy parts
CN110008404B (en) Latent semantic model optimization method based on NAG momentum optimization
CN108647772B (en) Method for removing gross errors of slope monitoring data
CN113673679A (en) Cut tobacco drying process parameter selection method based on particle swarm optimization neural network
CN112464984A (en) Automatic feature construction method based on attention mechanism and reinforcement learning
CN112597687B (en) Turbine disk structure mixed reliability analysis method based on few-sample learning
CN111191823A (en) Production logistics prediction method based on deep learning
CN111126560A (en) Method for optimizing BP neural network based on cloud genetic algorithm
CN112395558B (en) Improved unbalanced data mixed sampling method suitable for historical fault data of intelligent electric meter
CN109978023A (en) Feature selection method and computer storage medium for high-dimensional big data analysis
CN109342862A (en) Transformer fault diagnosis method based on unsupervised clustering and SVM classification
CN111474905B (en) Parameter drift fault diagnosis method in manufacturing process of electromechanical product
CN114692507A (en) Counting data soft measurement modeling method based on stacking Poisson self-encoder network
CN113449471A (en) Wind power output simulation generation method for continuously improving MC (multi-channel) by utilizing AP (access point) clustering-skipping
CN113077853B (en) Global optimization method and system for mechanical parameters of double loss value network deep reinforcement learning KVFD model
Hanh et al. Applying the meta-heuristic algorithms for mutation-based test data generation for Simulink models
CN114330924B (en) Complex product change strength prediction method based on generating type countermeasure network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant