CN105005629A - SDN stream clustering method based on gaussian mixture - Google Patents
- Publication number
- CN105005629A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
- Complex Calculations (AREA)
Abstract
The invention relates to an SDN flow clustering method based on a Gaussian mixture. The method improves the basic Gaussian mixture model algorithm by introducing side information about the flows and constructing a Gaussian mixture model constrained by equivalence sets derived from that side information, which improves the clustering effect; the model is then applied to SDN data flow clustering. The method greatly improves both the accuracy of the clustering result and the clustering speed.
Description
Technical field
The present invention relates to SDN data flow clustering, and in particular to an SDN flow clustering method based on a Gaussian mixture.
Background
A software-defined network (Software Defined Network, SDN) is a novel network architecture from Emulex and one implementation of network virtualization. Its core technology, OpenFlow, separates the control plane of network devices from the data plane, which enables flexible control of network traffic and makes the network, as a pipeline, more intelligent.
At present, in the software-defined network (SDN) environment, research on classifying SDN flows efficiently and accurately has not yet made notable progress.
Summary of the invention
In view of this, the object of the invention is to propose an SDN flow clustering method based on a Gaussian mixture that greatly improves both the accuracy of the clustering result and the clustering speed.
The invention adopts the following scheme: an SDN flow clustering method based on a Gaussian mixture, comprising the following steps:
Step S1: record the five-tuple of the original SDN data, and use the K-Means clustering algorithm to establish the mapping between SDN data flows and users;
Step S2: use a Gaussian mixture model (GMM) and the formula

$$p(x \mid \Theta) = \sum_{i=1}^{K} a_i \, p_i(x \mid \theta_i)$$

to estimate the probability density distribution of the SDN data flows, where $K$ is the number of Gaussian components, $a_i$ is the weight of the $i$-th Gaussian component, and $p_i(x \mid \theta_i)$ is the probability density function of the $i$-th Gaussian component, with mean $\mu_i$ and variance $\sigma_i$; $\theta_i = (\mu_i, \Sigma_i)$, where $\mu_i$ and $\Sigma_i$ are the parameters of the data generation model to be solved;
Step S3: use flow duration, packet count, flow size, packet size, and packet inter-arrival time as the attributes of the SDN flow vector, and obtain the SDN flow equivalence sets from the side information;
Step S4: adjust the data generation model of the GMM;
Step S5: use pairwise must-link constraints and pairwise cannot-link constraints to assist the clustering.
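The mixture density of step S2 can be illustrated with a minimal sketch. It is not taken from the patent: the function names `gaussian_pdf` and `gmm_density` are hypothetical, and the sketch assumes full covariance matrices and NumPy as the numerical library.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Density of a multivariate Gaussian N(mean, cov) at point x."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    d = mean.shape[0]
    diff = x - mean
    # Normalizing constant sqrt((2*pi)^d * det(cov)).
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    # Quadratic form diff^T cov^{-1} diff, computed without an explicit inverse.
    quad = diff @ np.linalg.solve(cov, diff)
    return float(np.exp(-0.5 * quad) / norm)

def gmm_density(x, weights, means, covs):
    """Mixture density: sum over components of a_i * p_i(x | mu_i, Sigma_i)."""
    return sum(a * gaussian_pdf(x, m, c)
               for a, m, c in zip(weights, means, covs))
```

In this sketch `x` would be one flow vector built from the attributes of step S3 (duration, packet count, flow size, packet size, inter-arrival time), and `weights` is expected to sum to one.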
Further, step S4 comprises the following steps:
Step S41: express the value space of the SDN flow side information as [equation not reproduced in the source], where $Y = \{y_1, \ldots, y_i, \ldots, y_N\}$ and $y_i$ denotes the cluster centre of the $i$-th data point; $y_i^s$ denotes the cluster centre of the $i$-th data point in the $s$-th equivalence set; $Y_s$ denotes the probability distribution of the $s$-th equivalence set $X_s$; $X_s$ contains $N_s$ data flows, and the equivalence sets together contain $N = \sum_{s=1}^{M} N_s$ data flows; $x = \{x_1, \ldots, x_N\}$ denotes the $N$ data flows; $X = \{X_1, \ldots, X_M\}$ denotes the $M$ equivalence sets, and $X_s$ is one of the equivalence sets $X_1$ to $X_M$; $y_i \in \{1, \ldots, K\}$, where $K$ is the number of cluster centres;
Step S42: establish the constrained log-likelihood function to be maximized: [equation not reproduced in the source]. From the data generation model one obtains [equation not reproduced in the source], and the marginal probability distribution is [equation not reproduced in the source], where $\theta^g$ is the current parameter estimate, $\theta$ is the parameter estimate after the iterative computation, $X$ denotes the equivalence sets, $Y$ is the distribution probability of the equivalence sets over the cluster centres, $y = y_i$, and a further term of the formula denotes the prior probability of each cluster centre;
Step S43: simplify the log-likelihood function to [equation not reproduced in the source], where the posterior probability of each equivalence set is defined as [equation not reproduced in the source]. The terms of this definition are the weight of the current class $l$, the probability of $x$ under the current parameter estimate, the elements of the equivalence set $X_s$, and the old parameter estimate; $\theta_l$ denotes the new parameter estimate, and $l$ indexes the cluster centres;
Step S44: solve for the parameters of the data generation model by constrained maximum likelihood estimation so that the value of $Q_c(\theta, \theta^g)$ is maximized, where [equation not reproduced in the source] and the remaining symbol denotes an element of the equivalence set $X_s$;
Further, the five-tuple comprises the source IP, source port, destination IP, destination port, and protocol.
Further, the SDN flow equivalence sets are independent and identically distributed.
Further, the side information is the destination IP, destination port, and protocol of the SDN flow.
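One way to read the statement that the side information is the destination IP, destination port, and protocol is that flows sharing these three fields fall into the same equivalence set. The sketch below illustrates that reading only; the patent does not spell out the grouping rule, and the dictionary keys `dst_ip`, `dst_port`, and `protocol` and the function name `build_equivalence_sets` are hypothetical.

```python
from collections import defaultdict

def build_equivalence_sets(flows):
    """Group flow indices by their (dst_ip, dst_port, protocol) side information.

    Each flow is a dict holding at least those three keys. The result is a list
    of index lists, one per equivalence set, in order of first appearance.
    """
    groups = defaultdict(list)
    for idx, flow in enumerate(flows):
        key = (flow["dst_ip"], flow["dst_port"], flow["protocol"])
        groups[key].append(idx)
    return list(groups.values())
```

For example, two flows to `10.0.0.2:80` over TCP would land in one set, while a flow to `10.0.0.3:53` over UDP would form a set of its own.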
Compared with the prior art, the invention has the following beneficial effects. It introduces a semi-supervised clustering algorithm that analyzes packets and data correlation features on the basis of users' historical data. By improving the basic Gaussian mixture model algorithm and introducing the side information of the flows, it constructs a Gaussian mixture model constrained by side-information equivalence sets, improves the clustering effect, and applies the model to SDN data flow clustering. Compared with the plain Gaussian mixture model and with K-Means, the equivalence-set-constrained Gaussian mixture model achieves a sizeable gain in both the accuracy of the clustering result and the clustering speed.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the method of the invention.
Embodiment
The invention is further described below with reference to the drawings and an embodiment.
As shown in Fig. 1, this embodiment provides an SDN flow clustering method based on a Gaussian mixture, comprising the following steps:
Step S1: record the five-tuple of the original SDN data, and use the K-Means clustering algorithm to establish the mapping between SDN data flows and users;
Step S2: use a Gaussian mixture model (GMM) and the formula

$$p(x \mid \Theta) = \sum_{i=1}^{K} a_i \, p_i(x \mid \theta_i)$$

to estimate the probability density distribution of the SDN data flows, where $K$ is the number of Gaussian components, $a_i$ is the weight of the $i$-th Gaussian component, and $p_i(x \mid \theta_i)$ is the probability density function of the $i$-th Gaussian component, with mean $\mu_i$ and variance $\sigma_i$; $\theta_i = (\mu_i, \Sigma_i)$, where $\mu_i$ and $\Sigma_i$ are the parameters of the data generation model to be solved;
Step S3: use flow duration, packet count, flow size, packet size, and packet inter-arrival time as the attributes of the SDN flow vector, and obtain the SDN flow equivalence sets from the side information;
Step S4: adjust the data generation model of the GMM;
Step S5: use pairwise must-link constraints and pairwise cannot-link constraints to assist the clustering.
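The pairwise constraints of step S5 can be illustrated as follows. This is a sketch under assumptions, not the patent's procedure: it assumes must-link pairs are all pairs of flows inside one equivalence set, that cannot-link pairs are supplied separately, and that a clustering is checked by counting violated pairs. The function names `must_link_pairs` and `count_violations` are hypothetical.

```python
from itertools import combinations

def must_link_pairs(equivalence_sets):
    """Expand each equivalence set (a list of flow indices) into must-link pairs."""
    pairs = []
    for members in equivalence_sets:
        pairs.extend(combinations(sorted(members), 2))
    return pairs

def count_violations(labels, must_link, cannot_link):
    """Count the constraints that a cluster labelling breaks.

    A must-link pair is broken when its two flows get different labels;
    a cannot-link pair is broken when they get the same label.
    """
    broken_ml = sum(1 for i, j in must_link if labels[i] != labels[j])
    broken_cl = sum(1 for i, j in cannot_link if labels[i] == labels[j])
    return broken_ml + broken_cl
```

A count of zero means the labelling is consistent with every supplied constraint; a non-zero count can serve as a simple diagnostic when comparing clusterings.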
In this embodiment, step S4 comprises the following steps:
Step S41: express the value space of the SDN flow side information as [equation not reproduced in the source], where $Y = \{y_1, \ldots, y_i, \ldots, y_N\}$ and $y_i$ denotes the cluster centre of the $i$-th data point; $y_i^s$ denotes the cluster centre of the $i$-th data point in the $s$-th equivalence set; $Y_s$ denotes the probability distribution of the $s$-th equivalence set $X_s$; $X_s$ contains $N_s$ data flows, and the equivalence sets together contain $N = \sum_{s=1}^{M} N_s$ data flows; $x = \{x_1, \ldots, x_N\}$ denotes the $N$ data flows; $X = \{X_1, \ldots, X_M\}$ denotes the $M$ equivalence sets, and $X_s$ is one of the equivalence sets $X_1$ to $X_M$; $y_i \in \{1, \ldots, K\}$, where $K$ is the number of cluster centres;
Step S42: establish the constrained log-likelihood function to be maximized: [equation not reproduced in the source]. From the data generation model one obtains [equation not reproduced in the source], and the marginal probability distribution is [equation not reproduced in the source], where $\theta^g$ is the current parameter estimate, $\theta$ is the parameter estimate after the iterative computation, $X$ denotes the equivalence sets, $Y$ is the distribution probability of the equivalence sets over the cluster centres, $y = y_i$, and a further term of the formula denotes the prior probability of each cluster centre;
Step S43: simplify the log-likelihood function to [equation not reproduced in the source], where the posterior probability of each equivalence set is defined as [equation not reproduced in the source]. The terms of this definition are the weight of the current class $l$, the probability of $x$ under the current parameter estimate, the elements of the equivalence set $X_s$, and the old parameter estimate; $\theta_l$ denotes the new parameter estimate, and $l$ indexes the cluster centres;
Step S44: solve for the parameters of the data generation model by constrained maximum likelihood estimation so that the value of $Q_c(\theta, \theta^g)$ is maximized, where [equation not reproduced in the source] and the remaining symbol denotes an element of the equivalence set $X_s$;
In this embodiment, the five-tuple comprises the source IP, source port, destination IP, destination port, and protocol.
In this embodiment, the SDN flow equivalence sets are independent and identically distributed.
In this embodiment, the side information is the destination IP, destination port, and protocol of the SDN flow.
The foregoing is only a preferred embodiment of the invention; all equivalent changes and modifications made within the scope of the claims of this application fall within the scope of the invention.
Claims (5)
1. An SDN flow clustering method based on a Gaussian mixture, characterized by comprising the following steps:
Step S1: record the five-tuple of the original SDN data, and use the K-Means clustering algorithm to establish the mapping between SDN data flows and users;
Step S2: use a Gaussian mixture model (GMM) and the formula

$$p(x \mid \Theta) = \sum_{i=1}^{K} a_i \, p_i(x \mid \theta_i)$$

to estimate the probability density distribution of the SDN data flows, where $K$ is the number of Gaussian components, $a_i$ is the weight of the $i$-th Gaussian component, and $p_i(x \mid \theta_i)$ is the probability density function of the $i$-th Gaussian component, with mean $\mu_i$ and variance $\sigma_i$; $\theta_i = (\mu_i, \Sigma_i)$, where $\mu_i$ and $\Sigma_i$ are the parameters of the data generation model to be solved;
Step S3: use flow duration, packet count, flow size, packet size, and packet inter-arrival time as the attributes of the SDN flow vector, and obtain the SDN flow equivalence sets from the side information;
Step S4: adjust the data generation model of the GMM;
Step S5: use pairwise must-link constraints and pairwise cannot-link constraints to assist the clustering.
2. The SDN flow clustering method based on a Gaussian mixture according to claim 1, characterized in that step S4 comprises the following steps:
Step S41: express the value space of the SDN flow side information as [equation not reproduced in the source], where $Y = \{y_1, \ldots, y_i, \ldots, y_N\}$ and $y_i$ denotes the cluster centre of the $i$-th data point; $y_i^s$ denotes the cluster centre of the $i$-th data point in the $s$-th equivalence set; $Y_s$ denotes the probability distribution of the $s$-th equivalence set $X_s$; $X_s$ contains $N_s$ data flows, and the equivalence sets together contain $N = \sum_{s=1}^{M} N_s$ data flows; $x = \{x_1, \ldots, x_N\}$ denotes the $N$ data flows; $X = \{X_1, \ldots, X_M\}$ denotes the $M$ equivalence sets, and $X_s$ is one of the equivalence sets $X_1$ to $X_M$; $y_i \in \{1, \ldots, K\}$, where $K$ is the number of cluster centres;
Step S42: establish the constrained log-likelihood function to be maximized: [equation not reproduced in the source]. From the data generation model one obtains [equation not reproduced in the source], and the marginal probability distribution is [equation not reproduced in the source], where $\theta^g$ is the current parameter estimate, $\theta$ is the parameter estimate after the iterative computation, $X$ denotes the equivalence sets, $Y$ is the distribution probability of the equivalence sets over the cluster centres, $y = y_i$, and a further term of the formula denotes the prior probability of each cluster centre;
Step S43: simplify the log-likelihood function to [equation not reproduced in the source], where the posterior probability of each equivalence set is defined as [equation not reproduced in the source]. The terms of this definition are the weight of the current class $l$, the probability of $x$ under the current parameter estimate, the elements of the equivalence set $X_s$, and the old parameter estimate; $\theta_l$ denotes the new parameter estimate, and $l$ indexes the cluster centres;
Step S44: solve for the parameters of the data generation model by constrained maximum likelihood estimation so that the value of $Q_c(\theta, \theta^g)$ is maximized, where [equation not reproduced in the source] and the remaining symbol denotes an element of the equivalence set $X_s$;
3. The SDN flow clustering method based on a Gaussian mixture according to claim 1, characterized in that the five-tuple comprises the source IP, source port, destination IP, destination port, and protocol.
4. The SDN flow clustering method based on a Gaussian mixture according to claim 1, characterized in that the SDN flow equivalence sets are independent and identically distributed.
5. The SDN flow clustering method based on a Gaussian mixture according to claim 1, characterized in that the side information is the destination IP, destination port, and protocol of the SDN flow.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510488828.0A CN105005629B (en) | 2015-08-11 | 2015-08-11 | SDN flow clustering method based on a Gaussian mixture |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510488828.0A CN105005629B (en) | 2015-08-11 | 2015-08-11 | SDN flow clustering method based on a Gaussian mixture |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105005629A (en) | 2015-10-28 |
CN105005629B (en) | 2017-07-04 |
Family ID: 54378305
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510488828.0A Active CN105005629B (en) | SDN flow clustering method based on a Gaussian mixture | 2015-08-11 | 2015-08-11 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105005629B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1787076A (en) * | 2005-12-13 | 2006-06-14 | 浙江大学 | Speaker identification method based on hybrid support vector machine |
CN101127029A (en) * | 2007-08-24 | 2008-02-20 | 复旦大学 | Method for training SVM classifier in large scale data classification |
CN103927412A (en) * | 2014-04-01 | 2014-07-16 | 浙江大学 | Real-time learning debutanizer soft measurement modeling method on basis of Gaussian mixture models |
CN104506435A (en) * | 2014-12-12 | 2015-04-08 | 杭州华为数字技术有限公司 | SDN (Software Defined Network) controller and method for determining shortest path in SDN |
Also Published As
Publication number | Publication date |
---|---|
CN105005629B (en) | 2017-07-04 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||