CN105005629A - SDN stream clustering method based on gaussian mixture - Google Patents


Info

Publication number
CN105005629A
Authority
CN
China
Legal status
Granted
Application number
CN201510488828.0A
Other languages
Chinese (zh)
Other versions
CN105005629B (en)
Inventor
郑相涵
陈锋情
Current Assignee
Fuzhou University
Original Assignee
Fuzhou University
Priority date
2015-08-11
Filing date
2015-08-11
Publication date
2015-10-28
Application filed by Fuzhou University
Priority to CN201510488828.0A
Publication of CN105005629A
Application granted
Publication of CN105005629B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Complex Calculations (AREA)

Abstract

The invention relates to an SDN flow clustering method based on Gaussian mixtures. The method improves the basic Gaussian mixture model (GMM) algorithm by introducing side information about the flows and constructing a GMM constrained by equivalence sets derived from that side information, which improves the clustering result; the method is applied to SDN data flow clustering. It greatly improves both the accuracy of the clustering result and the clustering speed.

Description

SDN flow clustering method based on Gaussian mixtures
Technical field
The present invention relates to SDN data flow clustering, and in particular to an SDN flow clustering method based on Gaussian mixtures.
Background
Software Defined Network (SDN) is a new, innovative network architecture and one way of implementing network virtualization. Its core technology, OpenFlow, separates the control plane of network devices from the data plane, which enables flexible control of network traffic and makes the network, as a transport pipe, more intelligent.
At present, research on classifying SDN flows efficiently and accurately in software-defined networking environments has not made significant progress.
Summary of the invention
In view of this, the object of the invention is to propose an SDN flow clustering method based on Gaussian mixtures that greatly improves both the accuracy of the clustering result and the clustering speed.
The invention is implemented by the following scheme: an SDN flow clustering method based on Gaussian mixtures, specifically comprising the following steps:
Step S1: record the five-tuple of the raw SDN data and apply the K-Means clustering algorithm to establish the mapping between SDN data flows and users;
Step S2: use a Gaussian mixture model (GMM), $p(x\mid\theta)=\sum_{i=1}^{K}a_i\,p_i(x\mid\theta_i)$ with $\sum_{i=1}^{K}a_i=1$, to estimate the probability density of the SDN data flows, where $K$ is the number of Gaussian components, $a_i$ is the weight of the $i$-th component, and $p_i(x\mid\theta_i)$ is the probability density function of the $i$-th component, with mean $\mu_i$ and covariance $\Sigma_i$; $\theta_i=(\mu_i,\Sigma_i)$ are the parameters of the data-generation model to be solved;
Step S3: use the flow duration, number of packets, flow size, packet size and packet inter-arrival time as the attributes of the SDN flow vector, and obtain SDN flow equivalence sets from the side information;
Step S4: adjust the data-generation model of the Gaussian mixture model (GMM);
Step S5: use pairwise must-link and cannot-link constraints to assist the clustering.
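The mixture density of step S2 can be evaluated directly. The sketch below is a minimal NumPy illustration, assuming full covariance matrices $\Sigma_i$; the function names `gaussian_pdf` and `gmm_density` are hypothetical and not taken from the patent.

```python
import numpy as np


def gaussian_pdf(x, mean, cov):
    """Multivariate normal density N(x; mean, cov) for one d-dimensional point x."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    d = mean.shape[0]
    diff = x - mean
    # Solve cov @ z = diff rather than inverting cov explicitly.
    maha = diff @ np.linalg.solve(cov, diff)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return float(np.exp(-0.5 * maha) / norm)


def gmm_density(x, weights, means, covs):
    """p(x | theta) = sum_i a_i * p_i(x | theta_i), with theta_i = (mu_i, Sigma_i)."""
    if not np.isclose(np.sum(weights), 1.0):
        raise ValueError("mixture weights a_i must sum to 1")
    return sum(a * gaussian_pdf(x, mu, cov)
               for a, mu, cov in zip(weights, means, covs))


# Hypothetical 2-D example with K = 2 components.
weights = [0.3, 0.7]
means = [[0.0, 0.0], [3.0, 3.0]]
covs = [np.eye(2), 2.0 * np.eye(2)]
print(gmm_density([1.0, 1.0], weights, means, covs))
```

In practice the five S3 attributes would be the dimensions of `x`, so `d = 5`.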
Further, step S4 specifically comprises the following steps:
Step S41: express the value space of the SDN flow side information as $\Omega=\{Y\mid y^s_1=\dots=y^s_i=\dots=y^s_{N_s}=Y_s,\ s=1,\dots,M\}$, where $Y=\{y_1,\dots,y_i,\dots,y_N\}$ and $y_i\in\{1,\dots,K\}$ is the cluster centre to which the $i$-th data point is assigned; $y^s_i$ is the cluster centre of the $i$-th data point in the $s$-th equivalence set; $Y_s$ is the assignment of the $s$-th equivalence set $X_s$, which contains $N_s$ data flows; all equivalence sets together contain $N=\sum_{s=1}^{M}N_s$ data points; $x=\{x_1,\dots,x_N\}$ denotes the $N$ data flows; $X=\{X_1,\dots,X_M\}$ denotes the $M$ equivalence sets, $X_s$ being any one of $X_1,\dots,X_M$; and $K$ is the number of cluster centres;
Step S42: set up the constrained log-likelihood function to be maximized:
$$Q_C(\theta,\theta^g)=E\left[\log p(X,Y\mid Y\in\Omega,\theta)\,\middle|\,X,\,Y\in\Omega,\,\theta^g\right]=\sum_{y\in\Omega}\log p(X,y\mid y\in\Omega,\theta)\,P(y\mid X,y\in\Omega,\theta^g).$$
From the data-generation model,
$$\log p(X,y\mid y\in\Omega,\theta)=\log\left[p(y\mid y\in\Omega,\theta)\,p(X\mid y,y\in\Omega,\theta)\right]=\sum_{s=1}^{M}\log a_{Y_s}+\sum_{s=1}^{M}\log p(X_s\mid Y_s,y\in\Omega,\theta),$$
and the posterior distribution of the labels is
$$P(y\mid X,y\in\Omega,\theta^g)=\frac{P(y\in\Omega\mid X,y,\theta^g)\,P(y\mid X,\theta^g)}{P(y\in\Omega\mid X,\theta^g)}=\frac{\prod_{s=1}^{M}\delta_{Y_s}P(Y_s\mid X_s,\theta^g)}{\sum_{Y_1}\cdots\sum_{Y_M}\prod_{j=1}^{M}\delta_{Y_j}P(Y_j\mid X_j,\theta^g)},\qquad \delta_{Y_j}=\begin{cases}1, & y^j_1=\dots=y^j_{N_j}\\ 0, & \text{otherwise,}\end{cases}$$
where $\theta^g$ is the current parameter estimate, $\theta$ is the parameter estimate after the iteration, $X$ denotes the equivalence sets, $Y$ is the assignment of the equivalence sets to the cluster centres, $y=\{y_i\}$, and $a_{Y_s}$ is the prior probability of the cluster centre assigned to $X_s$;
Step S43: simplify the log-likelihood function to
$$Q_C(\theta,\theta^g)=\sum_{s=1}^{M}\sum_{l=1}^{K}P(Y_s=l\mid X_s,y\in\Omega,\theta^g)\sum_{n=1}^{N_s}\log p_l(x^s_n\mid\theta_l)+\sum_{s=1}^{M}\sum_{l=1}^{K}P(Y_s=l\mid X_s,y\in\Omega,\theta^g)\,N_s\log a_l,$$
where the posterior probability of each equivalence set is defined as
$$P(Y_s=l\mid X_s,y\in\Omega,\theta^g)\equiv P(y^s_1=l,\dots,y^s_{N_s}=l\mid X_s,y\in\Omega,\theta^g)=\frac{\prod_{n=1}^{N_s}\left[a^g_l\,p_l(x^s_n\mid\theta^g_l)\right]}{\sum_{j=1}^{K}\prod_{n=1}^{N_s}\left[a^g_j\,p_j(x^s_n\mid\theta^g_j)\right]},$$
in which $a^g_l$ is the weight of class $l$ under the current estimate, $p_l(x^s_n\mid\theta^g_l)$ is the probability of $x^s_n$ under the current parameter estimate, $x^s_n$ is an element of the equivalence set $X_s$, $\theta^g_l$ is the old parameter estimate, $\theta_l$ is the new parameter estimate, and $l$ indexes the $l$-th cluster centre;
Step S44: solve for the parameters of the data-generation model by constrained maximum-likelihood estimation, maximizing $Q_C(\theta,\theta^g)$:
$$\mu_l=\frac{\sum_{s=1}^{M}P(Y_s=l\mid X_s,y\in\Omega,\theta^g)\sum_{n=1}^{N_s}x^s_n}{\sum_{s=1}^{M}P(Y_s=l\mid X_s,y\in\Omega,\theta^g)\,N_s},$$
$$\Sigma_l=\frac{\sum_{s=1}^{M}P(Y_s=l\mid X_s,y\in\Omega,\theta^g)\sum_{n=1}^{N_s}(x^s_n-\mu_l)(x^s_n-\mu_l)^{T}}{\sum_{s=1}^{M}P(Y_s=l\mid X_s,y\in\Omega,\theta^g)\,N_s},$$
$$a_l=\frac{1}{M}\sum_{s=1}^{M}P(Y_s=l\mid X_s,y\in\Omega,\theta^g),$$
where $x^s_n$ denotes an element of the equivalence set $X_s$.
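Steps S43 and S44 together form one iteration of an EM algorithm over equivalence sets. The following is a minimal NumPy sketch under stated assumptions: each equivalence set $X_s$ is an $(N_s, d)$ array, components have full covariances, and a small ridge `reg` (a hypothetical addition, not in the patent) keeps $\Sigma_l$ invertible. The weight update follows the patent's $a_l=\frac{1}{M}\sum_s P(Y_s=l\mid\cdot)$; maximizing the $N_s\log a_l$ term of $Q_C$ literally would instead weight each set's posterior by $N_s$ and divide by $N$.

```python
import numpy as np


def log_gauss(X, mean, cov):
    """Row-wise log N(x; mean, cov) for an (n, d) array X."""
    d = mean.shape[0]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def e_step(sets, weights, means, covs):
    """S43: P(Y_s = l | X_s, y in Omega, theta^g) for every equivalence set.

    The score of set s under class l is sum_n [log a_l + log p_l(x_n^s)],
    i.e. the log of the numerator product; normalising over l gives the posterior.
    """
    K = len(weights)
    post = np.empty((len(sets), K))
    for s, Xs in enumerate(sets):
        scores = np.array([
            np.sum(np.log(weights[l]) + log_gauss(Xs, means[l], covs[l]))
            for l in range(K)
        ])
        scores -= scores.max()              # log-sum-exp shift for stability
        post[s] = np.exp(scores) / np.exp(scores).sum()
    return post


def m_step(sets, post, reg=1e-6):
    """S44: update mu_l, Sigma_l and a_l from the set posteriors."""
    M, K = post.shape
    d = sets[0].shape[1]
    sizes = np.array([len(Xs) for Xs in sets], dtype=float)      # N_s
    means = np.empty((K, d))
    covs = np.empty((K, d, d))
    for l in range(K):
        denom = np.sum(post[:, l] * sizes)                          # sum_s P(l|X_s) N_s
        means[l] = sum(post[s, l] * sets[s].sum(axis=0) for s in range(M)) / denom
        scatter = sum(post[s, l] * (sets[s] - means[l]).T @ (sets[s] - means[l])
                      for s in range(M))
        covs[l] = scatter / denom + reg * np.eye(d)                 # ridge keeps Sigma_l invertible
    weights = post.sum(axis=0) / M                                  # a_l = (1/M) sum_s P(l|X_s)
    return weights, means, covs


def fit(sets, weights, means, covs, n_iter=50):
    """Alternate S43 and S44 for a fixed number of iterations."""
    post = e_step(sets, weights, means, covs)
    for _ in range(n_iter):
        weights, means, covs = m_step(sets, post)
        post = e_step(sets, weights, means, covs)
    return weights, means, covs, post
```

Because every point of a set shares one posterior row, a whole equivalence set moves between clusters as a unit, which is how the side-information constraint enters the EM updates.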
Further, the five-tuple comprises the source IP, source port, destination IP, destination port and protocol.
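The five-tuple used in step S1 is the usual key for separating packets into flows. A minimal sketch, assuming packets arrive as dictionaries with the hypothetical keys `src_ip`, `src_port`, `dst_ip`, `dst_port`, `protocol`, `ts` (seconds) and `size` (bytes); this record layout is an assumption, not something the patent specifies.

```python
from collections import defaultdict
from typing import NamedTuple


class FiveTuple(NamedTuple):
    """Flow key: source IP, source port, destination IP, destination port, protocol."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str


def group_into_flows(packets):
    """Group packet records into flows keyed by their five-tuple.

    Each flow maps to a list of (timestamp, size) pairs in arrival order.
    """
    flows = defaultdict(list)
    for p in packets:
        key = FiveTuple(p["src_ip"], p["src_port"], p["dst_ip"], p["dst_port"], p["protocol"])
        flows[key].append((p["ts"], p["size"]))
    return dict(flows)
```

Since `FiveTuple` is a `NamedTuple`, the keys also compare equal to plain tuples, so later steps can unpack them positionally.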
Further, the SDN flow equivalence sets are independent and identically distributed.
Further, the side information of an SDN flow is its destination IP, destination port and protocol.
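Step S3 and the side-information definition can be combined: each flow is reduced to the five S3 attributes, and flows sharing the same (destination IP, destination port, protocol) are placed in one equivalence set. This is a sketch assuming that reading of the patent; taking the *mean* packet size and *mean* inter-arrival time as the "packet size" and "packet interval time" attributes is also an assumption, as are the names `flow_features` and `equivalence_sets`.

```python
from collections import defaultdict

import numpy as np


def flow_features(packets):
    """S3 attributes of one flow from its list of (timestamp, size) pairs.

    Returns [duration, packet count, flow size, mean packet size,
    mean inter-arrival time]; using means for the last two is an assumption.
    """
    ts = np.array(sorted(t for t, _ in packets), dtype=float)
    sizes = np.array([s for _, s in packets], dtype=float)
    duration = ts[-1] - ts[0]
    gaps = np.diff(ts)
    mean_gap = gaps.mean() if gaps.size else 0.0     # single-packet flow has no gaps
    return np.array([duration, len(sizes), sizes.sum(), sizes.mean(), mean_gap])


def equivalence_sets(flows):
    """Group flow feature vectors by side information (dst IP, dst port, protocol).

    `flows` maps (src_ip, src_port, dst_ip, dst_port, protocol) keys to packet lists.
    Each value of the result is an (N_s, 5) array, i.e. one equivalence set X_s.
    """
    groups = defaultdict(list)
    for (_, _, dst_ip, dst_port, proto), pkts in flows.items():
        groups[(dst_ip, dst_port, proto)].append(flow_features(pkts))
    return {key: np.vstack(vecs) for key, vecs in groups.items()}
```

The resulting arrays have the shape expected by an EM loop over equivalence sets, one row per flow.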
Compared with the prior art, the invention has the following beneficial effects: it introduces a semi-supervised clustering algorithm that analyses packets and their correlation features based on users' historical data. By improving the basic Gaussian mixture model algorithm and introducing the side information of flows, the invention constructs a Gaussian mixture model constrained by equivalence sets derived from side information, which improves the clustering result, and applies it to SDN data flow clustering. Compared with both the standard Gaussian mixture model and K-Means, the equivalence-set-constrained Gaussian mixture model of the invention achieves considerably higher clustering accuracy and clustering speed.
Brief description of the drawings
Fig. 1 is a schematic flow diagram of the method of the invention.
Embodiment
The invention is further described below with reference to the drawings and embodiments.
As shown in Fig. 1, this embodiment provides an SDN flow clustering method based on Gaussian mixtures, specifically comprising the following steps:
Step S1: record the five-tuple of the raw SDN data and apply the K-Means clustering algorithm to establish the mapping between SDN data flows and users;
Step S2: use a Gaussian mixture model (GMM), $p(x\mid\theta)=\sum_{i=1}^{K}a_i\,p_i(x\mid\theta_i)$ with $\sum_{i=1}^{K}a_i=1$, to estimate the probability density of the SDN data flows, where $K$ is the number of Gaussian components, $a_i$ is the weight of the $i$-th component, and $p_i(x\mid\theta_i)$ is the probability density function of the $i$-th component, with mean $\mu_i$ and covariance $\Sigma_i$; $\theta_i=(\mu_i,\Sigma_i)$ are the parameters of the data-generation model to be solved;
Step S3: use the flow duration, number of packets, flow size, packet size and packet inter-arrival time as the attributes of the SDN flow vector, and obtain SDN flow equivalence sets from the side information;
Step S4: adjust the data-generation model of the Gaussian mixture model (GMM);
Step S5: use pairwise must-link and cannot-link constraints to assist the clustering.
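The pairwise constraints of step S5 can be represented as index pairs. The sketch below is illustrative only: it derives must-link pairs from the members of each equivalence set and counts how many must-link and cannot-link constraints a hard label assignment violates. The patent does not say where cannot-link pairs come from, so they are taken as a hypothetical input; `must_link_pairs` and `count_violations` are hypothetical names.

```python
from itertools import combinations


def must_link_pairs(set_members):
    """Every index pair inside each equivalence set (points that must share a cluster)."""
    return [pair for members in set_members for pair in combinations(members, 2)]


def count_violations(labels, must_link, cannot_link):
    """Return (#broken must-link, #broken cannot-link) for a hard labelling."""
    broken_ml = sum(labels[i] != labels[j] for i, j in must_link)
    broken_cl = sum(labels[i] == labels[j] for i, j in cannot_link)
    return broken_ml, broken_cl
```

For example, with sets `[[0, 1, 2], [3, 4]]`, labels `[0, 0, 1, 1, 1]` and cannot-link pairs `[(0, 3), (2, 4)]`, point 2 breaks two must-links and the pair (2, 4) breaks one cannot-link.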
In this embodiment, step S4 specifically comprises the following steps:
Step S41: express the value space of the SDN flow side information as $\Omega=\{Y\mid y^s_1=\dots=y^s_i=\dots=y^s_{N_s}=Y_s,\ s=1,\dots,M\}$, where $Y=\{y_1,\dots,y_i,\dots,y_N\}$ and $y_i\in\{1,\dots,K\}$ is the cluster centre to which the $i$-th data point is assigned; $y^s_i$ is the cluster centre of the $i$-th data point in the $s$-th equivalence set; $Y_s$ is the assignment of the $s$-th equivalence set $X_s$, which contains $N_s$ data flows; all equivalence sets together contain $N=\sum_{s=1}^{M}N_s$ data points; $x=\{x_1,\dots,x_N\}$ denotes the $N$ data flows; $X=\{X_1,\dots,X_M\}$ denotes the $M$ equivalence sets, $X_s$ being any one of $X_1,\dots,X_M$; and $K$ is the number of cluster centres;
Step S42: set up the constrained log-likelihood function to be maximized:
$$Q_C(\theta,\theta^g)=E\left[\log p(X,Y\mid Y\in\Omega,\theta)\,\middle|\,X,\,Y\in\Omega,\,\theta^g\right]=\sum_{y\in\Omega}\log p(X,y\mid y\in\Omega,\theta)\,P(y\mid X,y\in\Omega,\theta^g).$$
From the data-generation model,
$$\log p(X,y\mid y\in\Omega,\theta)=\log\left[p(y\mid y\in\Omega,\theta)\,p(X\mid y,y\in\Omega,\theta)\right]=\sum_{s=1}^{M}\log a_{Y_s}+\sum_{s=1}^{M}\log p(X_s\mid Y_s,y\in\Omega,\theta),$$
and the posterior distribution of the labels is
$$P(y\mid X,y\in\Omega,\theta^g)=\frac{P(y\in\Omega\mid X,y,\theta^g)\,P(y\mid X,\theta^g)}{P(y\in\Omega\mid X,\theta^g)}=\frac{\prod_{s=1}^{M}\delta_{Y_s}P(Y_s\mid X_s,\theta^g)}{\sum_{Y_1}\cdots\sum_{Y_M}\prod_{j=1}^{M}\delta_{Y_j}P(Y_j\mid X_j,\theta^g)},\qquad \delta_{Y_j}=\begin{cases}1, & y^j_1=\dots=y^j_{N_j}\\ 0, & \text{otherwise,}\end{cases}$$
where $\theta^g$ is the current parameter estimate, $\theta$ is the parameter estimate after the iteration, $X$ denotes the equivalence sets, $Y$ is the assignment of the equivalence sets to the cluster centres, $y=\{y_i\}$, and $a_{Y_s}$ is the prior probability of the cluster centre assigned to $X_s$;
Step S43: simplify the log-likelihood function to
$$Q_C(\theta,\theta^g)=\sum_{s=1}^{M}\sum_{l=1}^{K}P(Y_s=l\mid X_s,y\in\Omega,\theta^g)\sum_{n=1}^{N_s}\log p_l(x^s_n\mid\theta_l)+\sum_{s=1}^{M}\sum_{l=1}^{K}P(Y_s=l\mid X_s,y\in\Omega,\theta^g)\,N_s\log a_l,$$
where the posterior probability of each equivalence set is defined as
$$P(Y_s=l\mid X_s,y\in\Omega,\theta^g)\equiv P(y^s_1=l,\dots,y^s_{N_s}=l\mid X_s,y\in\Omega,\theta^g)=\frac{\prod_{n=1}^{N_s}\left[a^g_l\,p_l(x^s_n\mid\theta^g_l)\right]}{\sum_{j=1}^{K}\prod_{n=1}^{N_s}\left[a^g_j\,p_j(x^s_n\mid\theta^g_j)\right]},$$
in which $a^g_l$ is the weight of class $l$ under the current estimate, $p_l(x^s_n\mid\theta^g_l)$ is the probability of $x^s_n$ under the current parameter estimate, $x^s_n$ is an element of the equivalence set $X_s$, $\theta^g_l$ is the old parameter estimate, $\theta_l$ is the new parameter estimate, and $l$ indexes the $l$-th cluster centre;
Step S44: solve for the parameters of the data-generation model by constrained maximum-likelihood estimation, maximizing $Q_C(\theta,\theta^g)$:
$$\mu_l=\frac{\sum_{s=1}^{M}P(Y_s=l\mid X_s,y\in\Omega,\theta^g)\sum_{n=1}^{N_s}x^s_n}{\sum_{s=1}^{M}P(Y_s=l\mid X_s,y\in\Omega,\theta^g)\,N_s},$$
$$\Sigma_l=\frac{\sum_{s=1}^{M}P(Y_s=l\mid X_s,y\in\Omega,\theta^g)\sum_{n=1}^{N_s}(x^s_n-\mu_l)(x^s_n-\mu_l)^{T}}{\sum_{s=1}^{M}P(Y_s=l\mid X_s,y\in\Omega,\theta^g)\,N_s},$$
$$a_l=\frac{1}{M}\sum_{s=1}^{M}P(Y_s=l\mid X_s,y\in\Omega,\theta^g),$$
where $x^s_n$ denotes an element of the equivalence set $X_s$.
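The indicator $\delta$ of step S41 and the reduced objective $Q_C$ of step S43 can be evaluated directly, for example to compare successive parameter estimates. A minimal NumPy sketch follows; `in_omega` and `q_c` are hypothetical names, `labels` is a hard assignment of every data point, and `post` is assumed to hold $P(Y_s=l\mid X_s,y\in\Omega,\theta^g)$ as an $(M, K)$ array.

```python
import numpy as np


def log_gauss(X, mean, cov):
    """Row-wise log N(x; mean, cov) for an (n, d) array X."""
    d = mean.shape[0]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def in_omega(labels, set_members):
    """True iff delta = 1 for every set, i.e. all points of X_s share one label Y_s."""
    return all(len({labels[i] for i in members}) == 1 for members in set_members)


def q_c(sets, post, weights, means, covs):
    """Reduced S43 objective:
    sum_s sum_l P(Y_s=l|.) * [ sum_n log p_l(x_n^s | theta_l) + N_s * log a_l ].
    """
    total = 0.0
    for s, Xs in enumerate(sets):
        for l in range(len(weights)):
            ll = log_gauss(Xs, means[l], covs[l]).sum() + len(Xs) * np.log(weights[l])
            total += post[s, l] * ll
    return float(total)
```

Working in log space avoids the underflow that the raw products over $n$ in step S43 would cause for large equivalence sets.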
In this embodiment, the five-tuple comprises the source IP, source port, destination IP, destination port and protocol.
In this embodiment, the SDN flow equivalence sets are independent and identically distributed.
In this embodiment, the side information of an SDN flow is its destination IP, destination port and protocol.
The foregoing is only a preferred embodiment of the invention; all equivalent changes and modifications made within the scope of the claims of this application shall fall within the scope of the invention.

Claims (5)

1. An SDN flow clustering method based on Gaussian mixtures, characterized by comprising the following steps:
Step S1: record the five-tuple of the raw SDN data and apply the K-Means clustering algorithm to establish the mapping between SDN data flows and users;
Step S2: use a Gaussian mixture model (GMM), $p(x\mid\theta)=\sum_{i=1}^{K}a_i\,p_i(x\mid\theta_i)$ with $\sum_{i=1}^{K}a_i=1$, to estimate the probability density of the SDN data flows, where $K$ is the number of Gaussian components, $a_i$ is the weight of the $i$-th component, and $p_i(x\mid\theta_i)$ is the probability density function of the $i$-th component, with mean $\mu_i$ and covariance $\Sigma_i$; $\theta_i=(\mu_i,\Sigma_i)$ are the parameters of the data-generation model to be solved;
Step S3: use the flow duration, number of packets, flow size, packet size and packet inter-arrival time as the attributes of the SDN flow vector, and obtain SDN flow equivalence sets from the side information;
Step S4: adjust the data-generation model of the Gaussian mixture model (GMM);
Step S5: use pairwise must-link and cannot-link constraints to assist the clustering.
2. The SDN flow clustering method based on Gaussian mixtures according to claim 1, characterized in that step S4 specifically comprises the following steps:
Step S41: express the value space of the SDN flow side information as $\Omega=\{Y\mid y^s_1=\dots=y^s_i=\dots=y^s_{N_s}=Y_s,\ s=1,\dots,M\}$, where $Y=\{y_1,\dots,y_i,\dots,y_N\}$ and $y_i\in\{1,\dots,K\}$ is the cluster centre to which the $i$-th data point is assigned; $y^s_i$ is the cluster centre of the $i$-th data point in the $s$-th equivalence set; $Y_s$ is the assignment of the $s$-th equivalence set $X_s$, which contains $N_s$ data flows; all equivalence sets together contain $N=\sum_{s=1}^{M}N_s$ data points; $x=\{x_1,\dots,x_N\}$ denotes the $N$ data flows; $X=\{X_1,\dots,X_M\}$ denotes the $M$ equivalence sets, $X_s$ being any one of $X_1,\dots,X_M$; and $K$ is the number of cluster centres;
Step S42: set up the constrained log-likelihood function to be maximized:
$$Q_C(\theta,\theta^g)=E\left[\log p(X,Y\mid Y\in\Omega,\theta)\,\middle|\,X,\,Y\in\Omega,\,\theta^g\right]=\sum_{y\in\Omega}\log p(X,y\mid y\in\Omega,\theta)\,P(y\mid X,y\in\Omega,\theta^g).$$
From the data-generation model,
$$\log p(X,y\mid y\in\Omega,\theta)=\log\left[p(y\mid y\in\Omega,\theta)\,p(X\mid y,y\in\Omega,\theta)\right]=\sum_{s=1}^{M}\log a_{Y_s}+\sum_{s=1}^{M}\log p(X_s\mid Y_s,y\in\Omega,\theta),$$
and the posterior distribution of the labels is
$$P(y\mid X,y\in\Omega,\theta^g)=\frac{P(y\in\Omega\mid X,y,\theta^g)\,P(y\mid X,\theta^g)}{P(y\in\Omega\mid X,\theta^g)}=\frac{\prod_{s=1}^{M}\delta_{Y_s}P(Y_s\mid X_s,\theta^g)}{\sum_{Y_1}\cdots\sum_{Y_M}\prod_{j=1}^{M}\delta_{Y_j}P(Y_j\mid X_j,\theta^g)},\qquad \delta_{Y_j}=\begin{cases}1, & y^j_1=\dots=y^j_{N_j}\\ 0, & \text{otherwise,}\end{cases}$$
where $\theta^g$ is the current parameter estimate, $\theta$ is the parameter estimate after the iteration, $X$ denotes the equivalence sets, $Y$ is the assignment of the equivalence sets to the cluster centres, $y=\{y_i\}$, and $a_{Y_s}$ is the prior probability of the cluster centre assigned to $X_s$;
Step S43: simplify the log-likelihood function to
$$Q_C(\theta,\theta^g)=\sum_{s=1}^{M}\sum_{l=1}^{K}P(Y_s=l\mid X_s,y\in\Omega,\theta^g)\sum_{n=1}^{N_s}\log p_l(x^s_n\mid\theta_l)+\sum_{s=1}^{M}\sum_{l=1}^{K}P(Y_s=l\mid X_s,y\in\Omega,\theta^g)\,N_s\log a_l,$$
where the posterior probability of each equivalence set is defined as
$$P(Y_s=l\mid X_s,y\in\Omega,\theta^g)\equiv P(y^s_1=l,\dots,y^s_{N_s}=l\mid X_s,y\in\Omega,\theta^g)=\frac{\prod_{n=1}^{N_s}\left[a^g_l\,p_l(x^s_n\mid\theta^g_l)\right]}{\sum_{j=1}^{K}\prod_{n=1}^{N_s}\left[a^g_j\,p_j(x^s_n\mid\theta^g_j)\right]},$$
in which $a^g_l$ is the weight of class $l$ under the current estimate, $p_l(x^s_n\mid\theta^g_l)$ is the probability of $x^s_n$ under the current parameter estimate, $x^s_n$ is an element of the equivalence set $X_s$, $\theta^g_l$ is the old parameter estimate, $\theta_l$ is the new parameter estimate, and $l$ indexes the $l$-th cluster centre;
Step S44: solve for the parameters of the data-generation model by constrained maximum-likelihood estimation, maximizing $Q_C(\theta,\theta^g)$:
$$\mu_l=\frac{\sum_{s=1}^{M}P(Y_s=l\mid X_s,y\in\Omega,\theta^g)\sum_{n=1}^{N_s}x^s_n}{\sum_{s=1}^{M}P(Y_s=l\mid X_s,y\in\Omega,\theta^g)\,N_s},$$
$$\Sigma_l=\frac{\sum_{s=1}^{M}P(Y_s=l\mid X_s,y\in\Omega,\theta^g)\sum_{n=1}^{N_s}(x^s_n-\mu_l)(x^s_n-\mu_l)^{T}}{\sum_{s=1}^{M}P(Y_s=l\mid X_s,y\in\Omega,\theta^g)\,N_s},$$
$$a_l=\frac{1}{M}\sum_{s=1}^{M}P(Y_s=l\mid X_s,y\in\Omega,\theta^g),$$
where $x^s_n$ denotes an element of the equivalence set $X_s$.
3. The SDN flow clustering method based on Gaussian mixtures according to claim 1, characterized in that the five-tuple comprises the source IP, source port, destination IP, destination port and protocol.
4. The SDN flow clustering method based on Gaussian mixtures according to claim 1, characterized in that the SDN flow equivalence sets are independent and identically distributed.
5. The SDN flow clustering method based on Gaussian mixtures according to claim 1, characterized in that the side information of an SDN flow is its destination IP, destination port and protocol.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510488828.0A CN105005629B (en) 2015-08-11 2015-08-11 SDN flow clustering method based on Gaussian mixture

Publications (2)

Publication Number Publication Date
CN105005629A 2015-10-28
CN105005629B 2017-07-04

Family

ID=54378305

Country Status (1)

Country Link
CN (1) CN105005629B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1787076A (en) * 2005-12-13 2006-06-14 浙江大学 Method for distinguishing speakers based on a hybrid support vector machine
CN101127029A (en) * 2007-08-24 2008-02-20 复旦大学 Method for training SVM classifier in large scale data classification
CN103927412A (en) * 2014-04-01 2014-07-16 浙江大学 Real-time learning debutanizer soft measurement modeling method on basis of Gaussian mixture models
CN104506435A (en) * 2014-12-12 2015-04-08 杭州华为数字技术有限公司 SDN (Software Defined Network) controller and method for determining shortest path in SDN

Also Published As

Publication number Publication date
CN105005629B (en) 2017-07-04

Similar Documents

Publication Publication Date Title
WO2017157183A1 (en) Automatic multi-threshold characteristic filtering method and apparatus
Song et al. Training deep neural networks via direct loss minimization
Arora et al. A convergence analysis of gradient descent for deep linear neural networks
CN109711544A (en) Method, apparatus, electronic equipment and the computer storage medium of model compression
CN105703954A (en) Network data flow prediction method based on ARIMA model
CN107358293A (en) A kind of neural network training method and device
CN108776812A (en) Multiple view clustering method based on Non-negative Matrix Factorization and various-consistency
CN105868773A (en) Hierarchical random forest based multi-tag classification method
EP3905141A1 (en) Estimating the implicit likelihoods of generative adversarial networks
CN101231702A (en) Categorizer integration method
CN107133607A (en) Demographics' method and system based on video monitoring
CN110490236A (en) Automatic image marking method, system, device and medium neural network based
CN110188673A (en) Expression recognition method and device
Gao et al. Piecewise function approximation and vertex partitioning schemes for multi-dividing ontology algorithm in AUC criterion setting (I)
CN102637199B (en) Image marking method based on semi-supervised subject modeling
CN106203628A (en) A kind of optimization method strengthening degree of depth learning algorithm robustness and system
Li et al. On the effectiveness of partial variance reduction in federated learning with heterogeneous data
Wang et al. Knowledge-enhanced semi-supervised federated learning for aggregating heterogeneous lightweight clients in iot
CN103616021A (en) Global localization method and device
Li et al. Class balanced adaptive pseudo labeling for federated semi-supervised learning
CN105005629A (en) SDN stream clustering method based on gaussian mixture
CN106339072A (en) Distributed large data real-time processing system and method based on left and right brain model
Li et al. Exponential family restricted Boltzmann machines and annealed importance sampling
CN103198052A (en) Active learning method based on support vector machine
CN103903267B (en) Image partition method based on average template and student's t mixed models

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant