CN110210504A - Method and device for identifying network traffic data - Google Patents

Publication number
CN110210504A
Authority
CN
China
Prior art keywords
traffic data
cluster
data collection
current traffic
membership
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810167055.XA
Other languages
Chinese (zh)
Inventor
苏龙华
徐军
董琪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Suzhou Software Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Suzhou Software Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Suzhou Software Technology Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN201810167055.XA priority Critical patent/CN110210504A/en
Publication of CN110210504A publication Critical patent/CN110210504A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for identifying network traffic data, which address the technical problem in the prior art that network traffic data is identified inaccurately. The method comprises: selecting a historical traffic data set and clustering it with a clustering algorithm to obtain the historical-domain class center of the historical traffic data set, where the historical-domain class center is the cluster center of the historical traffic data set; taking the historical-domain class center as the reference cluster center of a current traffic data set to be identified; generating, from the reference cluster center, a constraint function that constrains the cluster center of the current traffic data set; clustering the current traffic data set based on the constraint function; and identifying the type of each traffic data item in the current traffic data set from the clustering result. The constraint function pulls the cluster center of the current traffic data set toward the historical-domain class center.

Description

Method and device for identifying network traffic data
Technical field
The present invention relates to the technical field of computer networks, and in particular to a method and a device for identifying network traffic data.
Background
With the rapid development of Internet technology, new network applications keep emerging, and intelligent management of network traffic data is becoming increasingly important. Managing traffic data intelligently first requires identifying its type.
At present, traffic data is identified mainly with traditional clustering algorithms, which depend on a large amount of undistorted traffic data. Traffic data, however, is highly susceptible to environmental noise during transmission, which causes data loss or distortion. Consequently, when the amount of traffic data is small or the data is disturbed by environmental noise, the identification accuracy of traditional clustering algorithms drops severely.
The prior art therefore suffers from the technical problem of inaccurate network traffic data identification.
Summary of the invention
Embodiments of the present invention provide a method and a device for identifying network traffic data that can improve the identification accuracy of network traffic data.
In a first aspect, a method for identifying network traffic data is provided, the method comprising:
selecting a historical traffic data set and clustering it with a clustering algorithm to obtain the historical-domain class center of the historical traffic data set, where the historical-domain class center is the cluster center of the historical traffic data set;
taking the historical-domain class center as the reference cluster center of a current traffic data set to be identified, generating from the reference cluster center a constraint function that constrains the cluster center of the current traffic data set, clustering the current traffic data set based on the constraint function, and identifying the type of each traffic data item in the current traffic data set from the clustering result, where the constraint function pulls the cluster center of the current traffic data set toward the historical-domain class center.
With the identification method provided by the embodiments of the present invention, the historical-domain class center of historical traffic data that is available in sufficient quantity in the network guides the clustering of the current traffic data to be identified. That is, the historical-domain class center serves as the reference cluster center of the current traffic data set, and the current traffic data set is clustered under a generated constraint function that keeps its cluster center as close as possible to the historical-domain class center. This improves the clustering of the current traffic data and, in turn, its identification accuracy.
Optionally, the generated constraint function is:
$$\beta \sum_{i=1}^{C}\sum_{j=1}^{N} \mu_{ij}\,\lVert v_i-\hat{v}_i\rVert^2$$
Clustering the current traffic data set based on the constraint function comprises:
determining an objective function from the constraint function and the clustering algorithm, the objective function being:
$$J(U,V)=\sum_{i=1}^{C}\sum_{j=1}^{N}\mu_{ij}\,\lVert x_j-v_i\rVert^2+\gamma\sum_{i=1}^{C}\sum_{j=1}^{N}\mu_{ij}\ln\mu_{ij}+\beta\sum_{i=1}^{C}\sum_{j=1}^{N}\mu_{ij}\,\lVert v_i-\hat{v}_i\rVert^2$$
where $C$ is the total number of classes, $N$ is the total number of traffic samples, $x_j$ is the $j$-th traffic sample, $\gamma$ is a balance parameter with $\gamma\in(0,+\infty)$, $\lVert x_j-v_i\rVert^2$ is the squared distance between the $j$-th sample and the $i$-th cluster center, $\beta$ is a balance parameter with $\beta\in[0,+\infty)$, $\hat{v}_i$ is the $i$-th cluster center of the historical-domain class center, $v_i$ is the cluster center of the $i$-th class, and $\mu_{ij}$ is the membership degree of the $j$-th sample in the $i$-th class;
clustering the current traffic data set based on the objective function.
Optionally, clustering the current traffic data set based on the objective function comprises:
selecting the value of the total number of classes C for clustering the current traffic data, and selecting the values of the balance parameters γ and β;
feeding the values of C, γ, and β, together with the current traffic data set as the traffic samples, into the objective function for clustering to obtain the optimal membership degrees of the current data set, where the optimal membership degrees comprise a plurality of membership degrees for each traffic data item in the current data set.
Optionally, identifying the type of each traffic data item in the current traffic data set from the clustering result comprises:
determining, from the optimal membership degrees, the target membership degree of each traffic data item, namely the one of its membership degrees that is greater than a threshold;
classifying each traffic data item into the type corresponding to its target membership degree.
The three optional implementations above describe the specific process of using the historical-domain class center as the reference center for clustering the current traffic data set, so as to guide the clustering of the current traffic data, and of identifying the class of the current traffic data from the clustering result. Even when the current traffic data is distorted or insufficient in quantity, a good clustering effect can still be achieved and the accuracy of traffic identification improved.
Optionally, the clustering algorithm is a fuzzy partition clustering algorithm or a maximum entropy clustering algorithm.
In a second aspect, a device for identifying network traffic data is provided, the device comprising:
a clustering unit, configured to select a historical traffic data set and cluster it with a clustering algorithm to obtain the historical-domain class center of the historical traffic data set, where the historical-domain class center is the cluster center of the historical traffic data set;
an identification unit, configured to take the historical-domain class center as the reference cluster center of a current traffic data set to be identified, generate from the reference cluster center a constraint function that constrains the cluster center of the current traffic data set, cluster the current traffic data set based on the constraint function, and identify the type of each traffic data item in the current traffic data set from the clustering result, where the constraint function pulls the cluster center of the current traffic data set toward the historical-domain class center.
Optionally, the generated constraint function is:
$$\beta \sum_{i=1}^{C}\sum_{j=1}^{N} \mu_{ij}\,\lVert v_i-\hat{v}_i\rVert^2$$
The identification unit is further configured to:
determine an objective function from the constraint function and the clustering algorithm, the objective function being:
$$J(U,V)=\sum_{i=1}^{C}\sum_{j=1}^{N}\mu_{ij}\,\lVert x_j-v_i\rVert^2+\gamma\sum_{i=1}^{C}\sum_{j=1}^{N}\mu_{ij}\ln\mu_{ij}+\beta\sum_{i=1}^{C}\sum_{j=1}^{N}\mu_{ij}\,\lVert v_i-\hat{v}_i\rVert^2$$
where $C$ is the total number of classes, $N$ is the total number of traffic samples, $x_j$ is the $j$-th traffic sample, $\gamma$ is a balance parameter with $\gamma\in(0,+\infty)$, $\lVert x_j-v_i\rVert^2$ is the squared distance between the $j$-th sample and the $i$-th cluster center, $\beta$ is a balance parameter with $\beta\in[0,+\infty)$, $\hat{v}_i$ is the $i$-th cluster center of the historical-domain class center, $v_i$ is the cluster center of the $i$-th class, and $\mu_{ij}$ is the membership degree of the $j$-th sample in the $i$-th class;
cluster the current traffic data set based on the objective function.
Optionally, the identification unit is further configured to:
select the value of the total number of classes C for clustering the current traffic data, and select the values of the balance parameters γ and β;
feed the values of C, γ, and β, together with the current traffic data set as the traffic samples, into the objective function for clustering to obtain the optimal membership degrees of the current data set, where the optimal membership degrees comprise a plurality of membership degrees for each traffic data item in the current data set.
Optionally, the identification unit is further configured to
identify the type of each traffic data item in the current traffic data set from the clustering result by:
determining, from the optimal membership degrees, the target membership degree of each traffic data item, namely the one of its membership degrees that is greater than a threshold;
classifying each traffic data item into the type corresponding to its target membership degree.
Optionally, the clustering algorithm is a fuzzy partition clustering algorithm or a maximum entropy clustering algorithm.
For the technical effects of the device for identifying network traffic data provided by the present application, refer to the technical effects of the implementations of the first aspect above; they are not repeated here.
In a third aspect, a device is provided, the device comprising:
at least one processor, and
a memory connected to the at least one processor;
where the memory stores instructions executable by the at least one processor, and the at least one processor performs the method of the first aspect by executing the instructions stored in the memory.
In a fourth aspect, a computer-readable storage medium is provided:
the computer-readable storage medium stores computer instructions which, when run on a computer, cause the computer to perform the method of the first aspect.
In a fifth aspect, a computer program product comprising instructions is further provided which, when run on a computer, causes the computer to perform the method of the first aspect.
The identification method provided by the embodiments of the present invention selects historical traffic data that is available in sufficient quantity in the network and clusters it to obtain the historical-domain class center. The historical-domain class center then guides the clustering of the current traffic data to be identified, so that the cluster center of the current traffic data set stays as close as possible to the historical-domain class center. This improves the clustering of the current traffic data and, in turn, its identification accuracy.
Brief description of the drawings
To explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention.
Fig. 1 is a flowchart of the method for identifying network traffic data in an embodiment of the present invention;
Fig. 2 is a structural diagram of a device for identifying network traffic data in an embodiment of the present invention;
Fig. 3 is a structural diagram of another device for identifying network traffic data in an embodiment of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the drawings of the embodiments. The described embodiments are some, not all, of the embodiments of the technical solutions of the present invention. All other embodiments that a person of ordinary skill in the art obtains from the embodiments recorded in this specification without creative effort fall within the scope of protection of the technical solutions of the present invention.
With the rapid development of Internet technology, new network applications keep emerging. PaaS (Platform as a Service) platforms arose to meet the demand for rapid deployment, automatic maintenance, and automatic scaling of web applications, and with them came the problem of how to identify the traffic data of a PaaS platform.
Existing network traffic data identification relies mainly on traditional clustering algorithms, which depend on a large amount of undistorted traffic data. Traffic data in a network, however, is highly susceptible to environmental noise during transmission, which causes data loss or distortion. In practice, therefore, when a traditional clustering algorithm is applied to a small amount of traffic data or to traffic data disturbed by environmental noise, its identification accuracy is severely affected.
To solve the technical problem of inaccurate network traffic identification in the prior art, the embodiments of the present invention provide a method and a device for identifying network traffic data.
It should be noted that, besides identifying the traffic data of a PaaS platform, the method and device provided by the embodiments of the present invention can also be applied to any other type of network platform that requires traffic data identification, such as an IaaS (Infrastructure as a Service) platform or a SaaS (Software as a Service) platform. In this embodiment, the method is described in detail using a PaaS platform as the example.
Referring to Fig. 1, a method for identifying network traffic data provided by an embodiment of the present invention comprises:
Step S101: select a historical traffic data set and cluster it with a clustering algorithm to obtain the historical-domain class center of the historical traffic data set.
The historical-domain class center is the cluster center of the historical traffic data set.
In the embodiments of the present invention, the selected historical traffic data set may be the traffic data generated by the PaaS platform over the past year, or over the past several years (for example, 2 or 3 years); that is, the selected historical traffic data set must contain a sufficient amount of data. Here, the traffic data generated by the PaaS platform over the past year is taken as the historical traffic data set by way of example.
After the historical traffic data set is selected, the method clusters it with a clustering algorithm. The clustering algorithm may be a maximum entropy clustering algorithm, a fuzzy partition clustering algorithm, or another type of clustering algorithm. Taking the maximum entropy clustering (MEC) algorithm as the example, the objective function of the MEC algorithm is:
$$J_{MEC}(U,V)=\sum_{i=1}^{C}\sum_{j=1}^{N}\mu_{ij}\,\lVert x_j-v_i\rVert^2+\gamma\sum_{i=1}^{C}\sum_{j=1}^{N}\mu_{ij}\ln\mu_{ij}\qquad(1)$$
where $\mu_{ij}\in[0,1]$, $1\le i\le C$, $1\le j\le N$, and $\sum_{i=1}^{C}\mu_{ij}=1$; $C$ is the total number of classes, $N$ is the total number of traffic samples, $x_j$ is the $j$-th traffic sample, $\gamma$ is a balance parameter with $\gamma\in(0,+\infty)$, $\lVert x_j-v_i\rVert^2$ is the squared distance between the $j$-th sample and the $i$-th cluster center, $\lVert\cdot\rVert$ denotes the Euclidean distance, $v_i$ is the cluster center of the $i$-th class, and $\mu_{ij}$ is the membership degree of the $j$-th sample in the $i$-th class. A membership degree characterizes the extent to which a sample belongs to a class.
The selected historical traffic data set serves as the traffic samples of the MEC objective function. For example, when the selected historical traffic data set contains 10000 traffic data items, the total number of traffic samples N is 10000; when it contains 50000 traffic data items, N is 50000. Further cases are not enumerated here.
In practice, the total number of classes C can be chosen as needed. For example, when traffic data is to be classified into two broad classes, useful traffic and useless traffic, C may be 2; when traffic data is to be classified into hacker-attack traffic, language-class traffic, and image-class traffic, C may be 3.
In this embodiment, C takes the value 2, so that traffic data is classified into useful traffic and useless traffic, and the selected historical traffic data set contains 50000 traffic data items. After the parameters of the MEC objective function (formula (1)) are set according to the size of the selected historical traffic data set and the value of C, the partial derivatives of formula (1) with respect to U and V are taken and all set to 0, which yields the cluster center of the historical traffic data set, i.e. the historical-domain class center $\hat{V}$. The historical-domain class center $\hat{V}$ obtained in this way is usually in matrix form.
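The MEC step of S101 can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes the standard MEC objective with the memberships of each sample summing to 1 over the classes, for which setting the partial derivatives to zero gives the alternating updates μ_ij ∝ exp(−‖x_j − v_i‖²/γ) and v_i = Σ_j μ_ij x_j / Σ_j μ_ij. The names `mec_cluster` and `sq_dist`, the `init` parameter, the iteration limit, and the tolerance are hypothetical choices, and traffic records are assumed to have already been converted to numeric feature vectors.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def mec_cluster(X, C, gamma, init=None, n_iter=100, tol=1e-9, seed=0):
    """Maximum entropy clustering by alternating closed-form updates.

    X is a list of feature vectors, C the number of classes, gamma > 0 the
    balance parameter. Returns (U, V) where U[i][j] is the membership of
    sample j in class i and V[i] is the center of class i.
    """
    N, d = len(X), len(X[0])
    if init is None:
        # Assumption: start from C randomly chosen samples.
        init = random.Random(seed).sample(X, C)
    V = [list(v) for v in init]
    U = [[1.0 / C] * N for _ in range(C)]
    for _ in range(n_iter):
        # Membership update: mu_ij proportional to exp(-||x_j - v_i||^2 / gamma).
        for j, x in enumerate(X):
            costs = [sq_dist(x, v) for v in V]
            low = min(costs)  # shift by the minimum for numerical stability
            weights = [math.exp(-(c - low) / gamma) for c in costs]
            total = sum(weights)
            for i in range(C):
                U[i][j] = weights[i] / total
        # Center update: membership-weighted mean of the samples.
        V_new = []
        for i in range(C):
            mass = sum(U[i])
            if mass == 0.0:  # class lost all membership: keep its old center
                V_new.append(V[i])
                continue
            V_new.append([sum(U[i][j] * X[j][k] for j in range(N)) / mass
                          for k in range(d)])
        shift = max(sq_dist(a, b) for a, b in zip(V, V_new))
        V = V_new
        if shift < tol:
            break
    return U, V
```

Run on the historical traffic data set with C = 2, the returned `V` would play the role of the historical-domain class center, with one row per class.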
Step S102: take the historical-domain class center as the reference cluster center of the current traffic data set to be identified, generate from the reference cluster center a constraint function that constrains the cluster center of the current traffic data set, cluster the current traffic data set based on the constraint function, and identify the type of each traffic data item in the current traffic data set from the clustering result.
The constraint function pulls the cluster center of the current traffic data set toward the historical-domain class center.
In the embodiments of the present invention, the historical-domain class center $\hat{V}$ obtained from the selected historical traffic data serves as the reference cluster center of the PaaS platform's current traffic data set to be identified. From this reference cluster center $\hat{V}$, a constraint function is generated that constrains the cluster center of the current traffic data set so that it stays as close as possible to $\hat{V}$:
$$\beta\sum_{i=1}^{C}\sum_{j=1}^{N}\mu_{ij}\,\lVert v_i-\hat{v}_i\rVert^2\qquad(2)$$
where $\beta$ is a balance parameter with $\beta\in[0,+\infty)$, $C$ is the total number of classes, $N$ is the total number of traffic samples, $\hat{v}_i$ is the $i$-th cluster center in the historical-domain class center $\hat{V}$, $v_i$ is the cluster center of the $i$-th class, and $\mu_{ij}$ is the membership degree of the $j$-th sample in the $i$-th class.
In practice, the constraint function generated from the historical-domain class center $\hat{V}$ may take the form $\beta\sum_{i=1}^{C}\sum_{j=1}^{N}\mu_{ij}\lVert v_i-\hat{v}_i\rVert^2$. Of course, in practical applications, those skilled in the art may combine the actual application scenario with the design idea provided by the embodiments of the present invention (namely, taking the historical-domain class center $\hat{V}$ obtained from the selected historical traffic data as the reference cluster center of the current traffic data set, and generating from it a constraint that keeps the cluster center of the current traffic data set as close as possible to $\hat{V}$) to generate constraint functions of other forms, which are not enumerated here.
In the embodiments of the present invention, the generated constraint function is taken to be $\beta\sum_{i=1}^{C}\sum_{j=1}^{N}\mu_{ij}\lVert v_i-\hat{v}_i\rVert^2$ by way of example. Clustering the current traffic data set based on this constraint function may further comprise:
determining a new objective function from the constraint function and the clustering algorithm, the new objective function being:
$$J(U,V)=\sum_{i=1}^{C}\sum_{j=1}^{N}\mu_{ij}\,\lVert x_j-v_i\rVert^2+\gamma\sum_{i=1}^{C}\sum_{j=1}^{N}\mu_{ij}\ln\mu_{ij}+\beta\sum_{i=1}^{C}\sum_{j=1}^{N}\mu_{ij}\,\lVert v_i-\hat{v}_i\rVert^2\qquad(3)$$
That is, the generated constraint function is added as a new term to the clustering algorithm (which may be a maximum entropy clustering algorithm, a fuzzy partition clustering algorithm, or another type of clustering algorithm). This yields a new clustering algorithm with transfer-learning capability, in which the historical class center guides the clustering task of the current domain. The current traffic data set to be identified is clustered according to the objective function of this new clustering algorithm, i.e. formula (3).
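To make the role of each term concrete, the new objective can be evaluated for given memberships and centers. The sketch below is illustrative only and assumes the objective is the MEC objective plus the penalty β Σ_i Σ_j μ_ij ‖v_i − v̂_i‖²; the function names and argument layout (`U[i][j]`, `V[i]`, `V_hist[i]`) are hypothetical.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def transfer_objective(X, U, V, V_hist, gamma, beta):
    """Value of the assumed objective: distance term + entropy term + penalty.

    X[j] is sample j, U[i][j] its membership in class i, V[i] the current
    center of class i, V_hist[i] the historical (reference) center of class i.
    """
    total = 0.0
    for i, (v, v_ref) in enumerate(zip(V, V_hist)):
        # Penalty for class i drifting away from its reference center.
        penalty = beta * sq_dist(v, v_ref)
        for j, x in enumerate(X):
            mu = U[i][j]
            if mu > 0.0:  # mu * ln(mu) tends to 0 as mu tends to 0
                total += mu * (sq_dist(x, v) + penalty) + gamma * mu * math.log(mu)
    return total
```

With β = 0 the penalty vanishes and the value reduces to the plain MEC objective; a larger β makes any drift of a center away from its historical counterpart more expensive.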
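In this embodiment, assume the current traffic data set to be clustered contains 100 traffic data items. These 100 items serve as the traffic samples of the new objective function, formula (3), so N is 100. The total number of classes C for the current traffic data may be chosen as 2, meaning the current traffic data to be identified is classified into useful traffic and useless traffic, and the values of the balance parameters γ and β are chosen. The chosen values of C, γ, and β, together with the current traffic data set of 100 items as the traffic samples, are then substituted into the objective function for clustering.
Specifically, the partial derivatives of the new objective function J, formula (3), with respect to U and V are taken and all set to 0, which yields the optimal membership degrees of the current traffic data set to be identified. The derivation is as follows:
(1) Iterative formula for the class center $v_i$:
Let
$$\frac{\partial J}{\partial v_i}=0\qquad(4)$$
Then
$$\sum_{j=1}^{N}\mu_{ij}\left[-2\,(x_j-v_i)+2\beta\,(v_i-\hat{v}_i)\right]=0\qquad(5)$$
That is, the iterative formula for the class center is:
$$v_i=\frac{\sum_{j=1}^{N}\mu_{ij}\,(x_j+\beta\hat{v}_i)}{(1+\beta)\sum_{j=1}^{N}\mu_{ij}}\qquad(6)$$
(2) Iterative formula for the membership degree $\mu_{ij}$:
Introduce a Lagrange multiplier $\lambda_j$ for the constraint $\sum_{i=1}^{C}\mu_{ij}=1$ and let $\frac{\partial}{\partial\mu_{ij}}\left[J+\sum_{j=1}^{N}\lambda_j\left(\sum_{i=1}^{C}\mu_{ij}-1\right)\right]=0$. Then $\lVert x_j-v_i\rVert^2+\gamma\,(\ln\mu_{ij}+1)+\beta\,\lVert v_i-\hat{v}_i\rVert^2+\lambda_j=0$, which gives:
$$\mu_{ij}=\exp\!\left(-\frac{\lVert x_j-v_i\rVert^2+\beta\,\lVert v_i-\hat{v}_i\rVert^2}{\gamma}\right)\exp\!\left(-1-\frac{\lambda_j}{\gamma}\right)\qquad(7)$$
Because $\sum_{i=1}^{C}\mu_{ij}=1$,
$$\sum_{k=1}^{C}\exp\!\left(-\frac{\lVert x_j-v_k\rVert^2+\beta\,\lVert v_k-\hat{v}_k\rVert^2}{\gamma}\right)\exp\!\left(-1-\frac{\lambda_j}{\gamma}\right)=1\qquad(8)$$
That is:
$$\exp\!\left(-1-\frac{\lambda_j}{\gamma}\right)=\frac{1}{\sum_{k=1}^{C}\exp\!\left(-\dfrac{\lVert x_j-v_k\rVert^2+\beta\,\lVert v_k-\hat{v}_k\rVert^2}{\gamma}\right)}\qquad(9)$$
Substituting formula (9) into formula (7) gives the iterative formula for the membership degree:
$$\mu_{ij}=\frac{\exp\!\left(-\dfrac{\lVert x_j-v_i\rVert^2+\beta\,\lVert v_i-\hat{v}_i\rVert^2}{\gamma}\right)}{\sum_{k=1}^{C}\exp\!\left(-\dfrac{\lVert x_j-v_k\rVert^2+\beta\,\lVert v_k-\hat{v}_k\rVert^2}{\gamma}\right)}\qquad(10)$$
Iterating these formulas finally yields the optimal class centers V and the optimal membership degrees U of the current data set to be identified.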
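The alternating iteration can be sketched as follows. This is a hedged illustration, not the patent's program: it assumes the objective is the MEC objective plus the penalty β Σ_i Σ_j μ_ij ‖v_i − v̂_i‖², so that the membership update is μ_ij ∝ exp(−(‖x_j − v_i‖² + β‖v_i − v̂_i‖²)/γ) and the center update is v_i = Σ_j μ_ij (x_j + β v̂_i) / ((1 + β) Σ_j μ_ij). The name `transfer_mec`, the choice to initialise the centers at the historical centers, the iteration limit, and the tolerance are assumptions.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def transfer_mec(X, V_hist, gamma, beta, n_iter=100, tol=1e-9):
    """Cluster X while keeping each center close to its historical center.

    X is a list of feature vectors, V_hist the list of historical class
    centers, gamma > 0 and beta >= 0 the balance parameters.
    Returns (U, V) with U[i][j] the membership of sample j in class i.
    """
    C, N, d = len(V_hist), len(X), len(X[0])
    V = [list(v) for v in V_hist]  # assumption: start at the reference centers
    U = [[1.0 / C] * N for _ in range(C)]
    for _ in range(n_iter):
        # Per-class penalty beta * ||v_i - v_hat_i||^2 enters every sample's cost.
        penalty = [beta * sq_dist(V[i], V_hist[i]) for i in range(C)]
        for j, x in enumerate(X):
            costs = [sq_dist(x, V[i]) + penalty[i] for i in range(C)]
            low = min(costs)  # shift by the minimum for numerical stability
            weights = [math.exp(-(c - low) / gamma) for c in costs]
            total = sum(weights)
            for i in range(C):
                U[i][j] = weights[i] / total
        # Center update: blend of the weighted sample mean and the reference center.
        V_new = []
        for i in range(C):
            mass = sum(U[i])
            if mass == 0.0:  # class lost all membership: keep its old center
                V_new.append(V[i])
                continue
            V_new.append([
                (sum(U[i][j] * X[j][k] for j in range(N)) / mass
                 + beta * V_hist[i][k]) / (1.0 + beta)
                for k in range(d)
            ])
        shift = max(sq_dist(a, b) for a, b in zip(V, V_new))
        V = V_new
        if shift < tol:
            break
    return U, V
```

The center update shows how β trades off the two sources of information: with β = 0 each center is the membership-weighted mean of the current samples, and as β grows the center is pulled toward its historical counterpart, which is what keeps the clustering stable when the current data is scarce or noisy.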
Optionally, in practice, the objective function of the new clustering algorithm, formula (3), can be implemented in a programming language. The language may be C, Java, C++, or any other programming language; no restriction is imposed here.
When a program implementing the objective function of the new clustering algorithm clusters the current traffic data to be identified, the chosen values of C, γ, and β and the current traffic data set of 100 traffic data items are supplied as inputs, and the program outputs the optimal membership degrees of the current traffic data to be identified. The optimal membership degrees comprise a plurality of membership degrees for each traffic data item in the current data set.
Continuing the example in which the current traffic data to be identified contains 100 traffic data items and C is 2, so that the data is classified into useful traffic and useless traffic, the output optimal membership degrees comprise 2 membership degrees for each of the 100 traffic data items: the degree to which the item belongs to the useful-traffic class and the degree to which it belongs to the useless-traffic class. The output optimal membership degrees U are usually in matrix form, so they may also be called the optimal membership matrix.
From the output optimal membership degrees, the target membership degree of each of the 100 traffic data items is determined, namely the one of its 2 membership degrees that is greater than a threshold. The elements of the optimal membership matrix usually lie in [0,1], and the threshold can be set as needed, for example to 0.7, 0.8, or 0.88. Taking a threshold of 0.7 as the example, the target membership degree greater than the threshold is determined for each of the 100 traffic data items, and each item is classified into the type corresponding to its target membership degree.
For example, with 100 traffic data items in the current traffic data to be identified and C equal to 2 (useful traffic and useless traffic), the output optimal membership matrix U may be a 2×100 matrix, $U_{2\times100}$, or a 100×2 matrix, $U_{100\times2}$. When the optimal membership matrix is the 2×100 matrix $U_{2\times100}$, the two elements of its first column are the 2 membership degrees of the first traffic data item, the two elements of its second column are the 2 membership degrees of the second traffic data item, and so on, up to the two elements of the 100th column, which are the 2 membership degrees of the 100th traffic data item.
Then, in each column of $U_{2\times100}$, the element greater than the threshold 0.7 is determined to be the target membership degree of the traffic data item corresponding to that column. For instance, if the element in the first row of the first column is 0.9 and the element in the second row of the first column is 0.1 (the two membership degrees of one item sum to 1), the element in the first row of the first column is the target membership degree. Assuming that the elements of the first row of $U_{2\times100}$ are the degrees of membership in useful traffic and the elements of the second row are the degrees of membership in useless traffic, the first traffic data item is classified into the type corresponding to its target membership degree, i.e. the element in the first row of the first column, and is therefore classified as useful traffic.
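The thresholding step can be sketched in a few lines of Python. The function name `classify_by_membership` and the label strings are hypothetical, and two behaviours are assumptions the text does not specify: an item whose largest membership degree does not exceed the threshold is left unclassified (`None`), and if several degrees exceed a low threshold the largest one is used.

```python
def classify_by_membership(U, labels, threshold=0.7):
    """Assign each traffic item the class whose membership exceeds the threshold.

    U is a C x N membership matrix (U[i][j] = membership of item j in class i)
    and labels[i] is the type name of class i. Returns a list of N entries,
    each a label or None when no membership degree exceeds the threshold.
    """
    n_items = len(U[0])
    result = []
    for j in range(n_items):
        column = [row[j] for row in U]  # the C membership degrees of item j
        best = max(range(len(column)), key=column.__getitem__)
        result.append(labels[best] if column[best] > threshold else None)
    return result


# Example: three items, two classes (row 0 = useful, row 1 = useless).
U_example = [[0.9, 0.4, 0.2],
             [0.1, 0.6, 0.8]]
types = classify_by_membership(U_example, ["useful", "useless"], threshold=0.7)
```

In this example the first item is classified as useful and the third as useless, while the second item (0.4 and 0.6) clears neither side of the 0.7 threshold and stays unclassified under the assumption stated above.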
In the method above, the selected historical traffic data set is clustered with a clustering algorithm to obtain its historical-domain class center; the historical-domain class center is taken as the reference cluster center of the current traffic data set to be identified; a constraint function that constrains the cluster center of the current traffic data set is generated from the reference cluster center; and the current traffic data set is clustered based on the constraint function, with the type of each traffic data item identified from the clustering result. This effectively solves the technical problem of inaccurate network traffic identification in the prior art and improves the accuracy of network traffic data identification. In addition, because the method uses only the historical-domain class center of the historical traffic data set, the historical traffic data itself is not exposed, so the method also improves the security of data use.
Based on the same inventive concept, an embodiment of the present invention provides a device for network traffic data identification. For the specific implementation of the network traffic data identification method performed by the device, refer to the description of the method embodiment above; repeated content is not described again. Referring to FIG. 2, the device includes:
A clustering unit 20, configured to select a historical traffic data set and cluster the historical traffic data using a clustering algorithm to obtain the history domain class centers of the historical traffic data set, where the history domain class centers are the cluster centers of the historical traffic data set;
An identification unit 21, configured to use the history domain class centers as the reference cluster centers of the current traffic data set to be identified; generate, according to the reference cluster centers, a constraint function that constrains the cluster centers of the current traffic data set; cluster the current traffic data set based on the constraint function; and identify the type of each traffic data item in the current traffic data set according to the clustering result, where the constraint function is used to keep the cluster centers of the current traffic data set close to the history domain class centers.
Optionally, the generated constraint function is:
The identification unit is further configured to:
Determine an objective function according to the constraint function and the clustering algorithm, the objective function being:
Where C is the total number of cluster classes, N is the total number of traffic samples, x_j is the j-th traffic sample, γ is a balance parameter with γ ∈ (0, +∞), ||x_j - v_i||^2 is the distance between the j-th sample and the i-th cluster center, β is a balance parameter with β ∈ [0, +∞), the function also involves the i-th cluster center of the history domain class centers, v_i is the cluster center of the i-th class, and μ_ij is the membership degree of the j-th sample in the i-th class;
Cluster the current traffic data set based on the objective function.
Optionally, the identification unit is further configured to:
Select the value of the total number C of cluster classes for the current traffic data, and select the values of the balance parameters γ and β respectively;
Input the value of C, the value of γ, the value of β, and the current traffic data set (as the traffic samples) into the objective function for clustering, to obtain the optimal membership degrees of the current data set, where the optimal membership degrees include multiple membership degrees for each traffic data item in the current data set.
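The clustering step above can be sketched as an alternating optimization. Because the objective function itself is not reproduced in this text, the sketch assumes one plausible form: a maximum entropy clustering objective plus a penalty that pulls each current center toward its history domain class center, J = Σ_i Σ_j μ_ij ||x_j - v_i||^2 + γ Σ_i Σ_j μ_ij ln μ_ij + β Σ_i ||v_i - h_i||^2, with Σ_i μ_ij = 1 for every sample. The symbol h_i for the i-th history domain class center, the exact penalty term, and all function names are assumptions and may differ from the patented formula. In this sketch C is simply the number of history centers supplied.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def update_memberships(samples, centers, gamma):
    """u[i][j] = softmax over i of -||x_j - v_i||^2 / gamma (columns sum to 1)."""
    u = [[0.0] * len(samples) for _ in centers]
    for j, x in enumerate(samples):
        d = [sq_dist(x, v) for v in centers]
        d_min = min(d)  # shift exponents for numerical stability
        w = [math.exp(-(di - d_min) / gamma) for di in d]
        total = sum(w)
        for i, wi in enumerate(w):
            u[i][j] = wi / total
    return u


def update_centers(samples, u, hist_centers, beta):
    """v_i = (sum_j u_ij * x_j + beta * h_i) / (sum_j u_ij + beta)."""
    dim = len(samples[0])
    centers = []
    for i, h in enumerate(hist_centers):
        weight = sum(u[i]) + beta
        if weight == 0:  # degenerate class with beta = 0: fall back to h_i
            centers.append(list(h))
            continue
        centers.append([
            (sum(u[i][j] * x[k] for j, x in enumerate(samples)) + beta * h[k]) / weight
            for k in range(dim)
        ])
    return centers


def constrained_max_entropy_clustering(samples, hist_centers, gamma=1.0,
                                       beta=1.0, iters=100, tol=1e-9):
    """Alternate membership and center updates, starting from the history centers."""
    centers = [list(h) for h in hist_centers]
    for _ in range(iters):
        u = update_memberships(samples, centers, gamma)
        new_centers = update_centers(samples, u, hist_centers, beta)
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return update_memberships(samples, centers, gamma), centers


current = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
u, v = constrained_max_entropy_clustering(current, [[0.5, 0.5], [9.5, 9.5]])
print(v)
```

Under this assumed objective, β = 0 reduces the procedure to ordinary maximum entropy clustering on the current data, while a larger β keeps the current centers closer to the history domain class centers. The returned matrix `u` plays the role of the optimal membership degrees, with one column of C values per traffic data item.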
Optionally, the identification unit is further configured to:
Determine, according to the optimal membership degrees, the target membership degree that is greater than a threshold among the multiple membership degrees of each traffic data item;
Classify each traffic data item into the type corresponding to its target membership degree.
Optionally, the clustering algorithm is a fuzzy partition clustering algorithm or a maximum entropy clustering algorithm.
Based on the same inventive concept, an embodiment of the present invention further provides a device. Referring to FIG. 3, the device includes:
At least one processor 30, and
A memory 31 connected to the at least one processor 30;
Where the memory 31 stores instructions executable by the at least one processor 30, and the at least one processor 30 implements the network traffic data identification method of the above embodiments of the present invention by executing the instructions stored in the memory 31.
Based on the same inventive concept, an embodiment of the present invention further provides a computer-readable storage medium. The computer-readable storage medium stores computer instructions that, when run on a computer, cause the computer to execute the network traffic data identification method of the above embodiments of the present invention.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage and optical storage) that contain computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction apparatus, the instruction apparatus implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims (10)

1. A method for identifying network traffic data, characterized by comprising:
Selecting a historical traffic data set and clustering the historical traffic data using a clustering algorithm to obtain history domain class centers of the historical traffic data set, wherein the history domain class centers are the cluster centers of the historical traffic data set;
Using the history domain class centers as reference cluster centers of a current traffic data set to be identified; generating, according to the reference cluster centers, a constraint function that constrains the cluster centers of the current traffic data set; clustering the current traffic data set based on the constraint function; and identifying the type of each traffic data item in the current traffic data set according to the clustering result, wherein the constraint function is used to keep the cluster centers of the current traffic data set close to the history domain class centers.
2. The method according to claim 1, characterized in that the generated constraint function is:
The clustering of the current traffic data set based on the constraint function comprises:
Determining an objective function according to the constraint function and the clustering algorithm, the objective function being:
Wherein C is the total number of cluster classes, N is the total number of traffic samples, x_j is the j-th traffic sample, γ is a balance parameter with γ ∈ (0, +∞), ||x_j - v_i||^2 is the distance between the j-th sample and the i-th cluster center, β is a balance parameter with β ∈ [0, +∞), the function also involves the i-th cluster center of the history domain class centers, v_i is the cluster center of the i-th class, and μ_ij is the membership degree of the j-th sample in the i-th class;
Clustering the current traffic data set based on the objective function.
3. The method according to claim 2, characterized in that the clustering of the current traffic data set based on the objective function comprises:
Selecting the value of the total number C of cluster classes for the current traffic data, and selecting the values of the balance parameters γ and β respectively;
Inputting the value of C, the value of γ, the value of β, and the current traffic data set (as the traffic samples) into the objective function for clustering, to obtain the optimal membership degrees of the current data set, wherein the optimal membership degrees include multiple membership degrees for each traffic data item in the current data set.
4. The method according to claim 3, characterized in that the identifying of the type of each traffic data item in the current traffic data set according to the clustering result comprises:
Determining, according to the optimal membership degrees, the target membership degree that is greater than a threshold among the multiple membership degrees of each traffic data item;
Classifying each traffic data item into the type corresponding to its target membership degree.
5. The method according to any one of claims 1 to 4, characterized in that the clustering algorithm is a fuzzy partition clustering algorithm or a maximum entropy clustering algorithm.
6. A device for network traffic data identification, characterized by comprising:
A clustering unit, configured to select a historical traffic data set and cluster the historical traffic data using a clustering algorithm to obtain history domain class centers of the historical traffic data set, wherein the history domain class centers are the cluster centers of the historical traffic data set;
An identification unit, configured to use the history domain class centers as reference cluster centers of a current traffic data set to be identified; generate, according to the reference cluster centers, a constraint function that constrains the cluster centers of the current traffic data set; cluster the current traffic data set based on the constraint function; and identify the type of each traffic data item in the current traffic data set according to the clustering result, wherein the constraint function is used to keep the cluster centers of the current traffic data set close to the history domain class centers.
7. The device according to claim 6, characterized in that the generated constraint function is:
The identification unit is further configured to:
Determine an objective function according to the constraint function and the clustering algorithm, the objective function being:
Wherein C is the total number of cluster classes, N is the total number of traffic samples, x_j is the j-th traffic sample, γ is a balance parameter with γ ∈ (0, +∞), ||x_j - v_i||^2 is the distance between the j-th sample and the i-th cluster center, β is a balance parameter with β ∈ [0, +∞), the function also involves the i-th cluster center of the history domain class centers, v_i is the cluster center of the i-th class, and μ_ij is the membership degree of the j-th sample in the i-th class;
Cluster the current traffic data set based on the objective function.
8. The device according to claim 7, characterized in that the identification unit is further configured to:
Select the value of the total number C of cluster classes for the current traffic data, and select the values of the balance parameters γ and β respectively;
Input the value of C, the value of γ, the value of β, and the current traffic data set (as the traffic samples) into the objective function for clustering, to obtain the optimal membership degrees of the current data set, wherein the optimal membership degrees include multiple membership degrees for each traffic data item in the current data set.
9. A device, characterized by comprising:
At least one processor, and
A memory connected to the at least one processor;
Wherein the memory stores instructions executable by the at least one processor, and the at least one processor executes the method according to any one of claims 1 to 5 by executing the instructions stored in the memory.
10. A computer-readable storage medium, characterized in that:
The computer-readable storage medium stores computer instructions that, when run on a computer, cause the computer to execute the method according to any one of claims 1 to 5.
CN201810167055.XA 2018-02-28 2018-02-28 A kind of recognition methods and equipment of network flow data Pending CN110210504A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810167055.XA CN110210504A (en) 2018-02-28 2018-02-28 A kind of recognition methods and equipment of network flow data

Publications (1)

Publication Number Publication Date
CN110210504A true CN110210504A (en) 2019-09-06

Family

ID=67778688

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810167055.XA Pending CN110210504A (en) 2018-02-28 2018-02-28 A kind of recognition methods and equipment of network flow data

Country Status (1)

Country Link
CN (1) CN110210504A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101252541A (en) * 2008-04-09 2008-08-27 中国科学院计算技术研究所 Method for establishing network flow classified model and corresponding system thereof
US20130100849A1 (en) * 2011-10-20 2013-04-25 Telefonaktiebolaget Lm Ericsson (Publ) Creating and using multiple packet traffic profiling models to profile packet flows
CN102881019A (en) * 2012-10-08 2013-01-16 江南大学 Fuzzy clustering image segmenting method with transfer learning function
CN103209169A (en) * 2013-02-23 2013-07-17 北京工业大学 Network flow filtering system and method based on field programmable gate array (FPGA)
CN106254321A (en) * 2016-07-26 2016-12-21 中国人民解放军防空兵学院 A kind of whole network abnormal data stream sorting technique
CN106789359A (en) * 2017-02-15 2017-05-31 广东工业大学 A kind of net flow assorted method and device based on grey wolf algorithm
CN107733937A (en) * 2017-12-01 2018-02-23 广东奥飞数据科技股份有限公司 A kind of Abnormal network traffic detection method

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
PENGJIANG QIAN 等: "Cluster Prototypes and Fuzzy Memberships Jointly Leveraged Cross-Domain Maximum Entropy Clustering", 《IEEE TRANS CYBERN. AUTHOR MANUSCRIPT》 *
SHOUWEI SUN 等: "Transfer Learning Based Maximum Entropy Clustering", 《2014 4TH IEEE INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND TECHNOLOGY》 *
孙寿伟 等: "具备迁移能力的类中心距离极大化聚类算法", 《计算机工程与应用》 *
李明 等: "借鉴历史知识的类中心距离极大化聚类算法", 《计算机工程与设计》 *
蒋亦樟 等: "基于知识利用的迁移学习一般化增强模糊划分聚类算法", 《模式识别与人工智能》 *
钱鹏江 等: "知识迁移极大熵聚类算法", 《控制与决策》 *
陆伟宙 等: "基于半监督聚类的Web流量分类", 《计算机科学》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20190906)