CN109309630A - Network traffic classification method, system and electronic device - Google Patents
Network traffic classification method, system and electronic device
- Publication number
- CN109309630A (application number CN201811113686.XA)
- Authority
- CN
- China
- Prior art keywords
- network flow
- address
- network
- feature set
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/24—Traffic characterised by specific attributes, e.g. priority or QoS
- H04L47/2441—Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The present application relates to a network traffic classification method, system and electronic device. The method comprises: step a: collecting network traffic data and labeling the network traffic data; step b: extracting a bidirectional flow feature set from the labeled network traffic data; step c: constructing a classification model based on the bidirectional flow feature set, and outputting the classification results of the network traffic data through the classification model. The application classifies network traffic using the bidirectional flow features in the network traffic data, so that the large number of new applications on the Internet can be accurately identified and classified, which improves classification accuracy and effectively ensures high-precision, high-performance network traffic classification.
Description
Technical field
The present application belongs to the technical field of network traffic classification, and in particular relates to a network traffic classification method, system and electronic device.
Background
With the rapid spread of the Internet and the emergence of a large number of new applications, modern network environments have become increasingly complex and diverse. Traffic classification and network application identification play an important role in network management services and security systems, such as quality of service, intrusion detection systems and traffic control systems. Accurately classifying the traffic in a network system and identifying its applications not only greatly improves network security and the efficiency of network management services, but also reduces system time and memory overhead.
At present, existing network traffic classification methods mainly include:
1. Network traffic classification based on representation learning: the acquired network traffic data is preprocessed, a representation learning algorithm extracts features from the preprocessed data to turn each network flow into a network flow vector, and the network traffic data is classified according to these vectors, which allows network traffic to be classified efficiently.
2. Network traffic classification based on semi-supervised learning: network flows of labeled and unlabeled types are obtained, and a preset fixed number of flow features is extracted from each network flow to obtain network flow feature vectors. Based on the labeled network flows, the information gain of each of the preset flow features is calculated, and each flow feature is weighted according to its information gain. The labeled and unlabeled network flows are mixed, and the mixed network flows are clustered with the k-means algorithm to obtain k clusters. For each of the k clusters, the number of labeled network flow feature vectors is obtained and the proportion of each type in the cluster is determined, where the proportion equals the number of labeled network flow feature vectors of that type divided by the total number of labeled network flow feature vectors in the cluster. When the total number of labeled network flow feature vectors in a cluster is less than a preset network flow threshold, the cluster is determined to be an unknown protocol cluster; otherwise the cluster is assigned the type with the largest proportion among its labeled network flow feature vectors. The above two steps are repeated until all k clusters have been assigned a traffic type. The traffic clusters with determined traffic types are then used as training data to train the traffic classifier at the egress. By exploiting semi-supervised learning, this method achieves better accuracy and stability than traditional supervised learning algorithms that train the model using only labeled data.
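The cluster-to-type assignment in the second method above can be sketched as follows. This is only a minimal illustration, not the cited method's implementation: the function name `assign_cluster_types`, the `min_labeled` threshold parameter and the use of `None` for unlabeled flows are assumptions, and the k-means clustering itself is taken as already done.

```python
from collections import Counter, defaultdict

def assign_cluster_types(cluster_ids, labels, min_labeled=2, unknown="unknown"):
    """Map each cluster id to a traffic type.

    cluster_ids[i] is the cluster of flow i (clustering assumed done elsewhere);
    labels[i] is its known type, or None if the flow is unlabeled.
    """
    labeled_by_cluster = defaultdict(list)
    for cluster, label in zip(cluster_ids, labels):
        labeled_by_cluster[cluster]  # register the cluster even without labeled flows
        if label is not None:
            labeled_by_cluster[cluster].append(label)

    result = {}
    for cluster, cluster_labels in labeled_by_cluster.items():
        if len(cluster_labels) < min_labeled:
            # Too few labeled flows: treat as an unknown protocol cluster.
            result[cluster] = unknown
        else:
            # Otherwise pick the type with the largest share of labeled flows.
            result[cluster] = Counter(cluster_labels).most_common(1)[0][0]
    return result
```

For example, a cluster holding two "web" flows and one "dns" flow would be assigned "web", while a cluster with a single labeled flow falls below the assumed threshold of 2 and is marked unknown.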
3. Adaptive semi-supervised network traffic classification: network flows of labeled and unlabeled types are obtained, and a preset fixed number of flow features is extracted from each network flow to obtain network flow feature vectors. From the labeled network flow feature vectors, the centroid of the feature vector set of each type is calculated, giving a vector set M. With the vector set M as the initial center points of k-means clustering, adaptive semi-supervised k-means clustering is performed on the mixed set X of labeled and unlabeled network flow feature vectors, and the k-means cluster partition is output. According to the maximum a posteriori probability of the labeled network flow feature vectors in each cluster of the output partition, the network flows in each resulting cluster are mapped to the traffic type they belong to, giving traffic clusters of known type. The traffic clusters of known type are then used as training data to train the traffic classifier at the egress.
In conclusion existing net flow assorted method mainly focuses on the net flow assorted of algorithm level, it is all pair
The sorting algorithm part of training stage proposes various optimizations and innovatory algorithm, and not addressing how but can be from network number
According to a large amount of related effective feature set problems are extracted in packet, new opplication a large amount of in internet can not accurately be identified
And classification.
Summary of the invention
The present application provides a network traffic classification method, system and electronic device, intended to solve, at least to some extent, one of the above technical problems in the prior art.
To solve the above problems, the present application provides the following technical solutions:
A network traffic classification method, comprising the following steps:
Step a: collecting network traffic data, and labeling the network traffic data;
Step b: extracting a bidirectional flow feature set from the labeled network traffic data;
Step c: constructing a classification model based on the bidirectional flow feature set, and outputting the classification results of the network traffic data through the classification model.
The technical solution adopted in the embodiments of the present application further includes: in step a, collecting the network traffic data and labeling the network traffic data specifically comprises:
Step a1: selecting the application categories in the network traffic;
Step a2: collecting the network traffic packets corresponding to each application and the network logs of the corresponding time period;
Step a3: analyzing the network traffic packets to find the inherent attributes of each application and the IP addresses and transport protocols it uses to communicate with other applications;
Step a4: extracting the IP endpoints and transmission packet counts associated with each application from the network logs, and associating and fusing them with the IP addresses and transport protocols to complete the labeling of the network traffic data.
The technical solution adopted in the embodiments of the present application further includes: in step b, extracting the bidirectional flow feature set from the labeled network traffic data specifically comprises:
Step b1: analyzing the labeled network traffic data and, for each {source IP address, destination IP address} pair in the network traffic data, separately collecting the bidirectional network flow information for {source IP address -> destination IP address} and {destination IP address -> source IP address} based on the different port numbers;
Step b2: finding the forward network flows of each {source IP address -> destination IP address} pair, and extracting all forward network flow feature sets from the forward network flows;
Step b3: finding the reverse network flows of each {destination IP address -> source IP address} pair, and extracting all reverse network flow feature sets from the reverse network flows;
Step b4: combining the forward and reverse network flow feature sets of each {source IP address, destination IP address} pair to form a bidirectional flow feature set of M-dimensional features.
The technical solution adopted in the embodiments of the present application further includes: step b further comprises optimizing the bidirectional flow feature set using a maximum variance explanation mechanism.
The technical solution adopted in the embodiments of the present application further includes: optimizing the bidirectional flow feature set using the maximum variance explanation mechanism specifically comprises:
Step b5: applying standard normalization to the network traffic data;
Step b6: computing, over the network traffic data, the mean of each feature in the bidirectional flow feature set;
Step b7: subtracting the corresponding feature mean from the normalized network traffic data to obtain a new result for each feature, and applying variance normalization to the new result of each feature;
Step b8: computing the covariance matrix of the bidirectional flow feature set, and sorting the variance values of the features on the main diagonal of the covariance matrix in ascending order to obtain the N-dimensional features that are most closely associated in the bidirectional flow feature set;
Step b9: computing the eigenvalues and eigenvectors of the covariance matrix, sorting the eigenvalues by magnitude, and selecting the eigenvectors corresponding to the top N optimized bidirectional flow features;
Step b10: projecting the network traffic data onto the N eigenvectors;
Step b11: the M-dimensional bidirectional flow feature set of the network traffic data is thereby optimized into an N-dimensional bidirectional flow feature set.
Another technical solution adopted in the embodiments of the present application is a network traffic classification system, comprising:
Data acquisition module: for collecting network traffic data;
Data preprocessing module: for labeling the network traffic data;
Feature extraction module: for extracting a bidirectional flow feature set from the labeled network traffic data;
Model construction module: for constructing a classification model based on the bidirectional flow feature set, and outputting the classification results of the network traffic data through the classification model.
The technical solution adopted in the embodiments of the present application further includes:
The data acquisition module collecting network traffic data specifically comprises: selecting the application categories in the network traffic, and collecting the network traffic packets corresponding to each application and the network logs of the corresponding time period;
The data preprocessing module labeling the network traffic data specifically comprises: analyzing the network traffic packets to find the inherent attributes of each application and the IP addresses and transport protocols it uses to communicate with other applications; and extracting the IP endpoints and transmission packet counts associated with each application from the network logs, and associating and fusing them with the IP addresses and transport protocols to complete the labeling of the network traffic data.
The technical solution adopted in the embodiments of the present application further includes: the feature extraction module extracting the bidirectional flow feature set from the labeled network traffic data specifically comprises:
Analyzing the labeled network traffic data and, for each {source IP address, destination IP address} pair in the network traffic data, separately collecting the bidirectional network flow information for {source IP address -> destination IP address} and {destination IP address -> source IP address} based on the different port numbers;
Finding the forward network flows of each {source IP address -> destination IP address} pair, and extracting all forward network flow feature sets from the forward network flows;
Finding the reverse network flows of each {destination IP address -> source IP address} pair, and extracting all reverse network flow feature sets from the reverse network flows;
Combining the forward and reverse network flow feature sets of each {source IP address, destination IP address} pair to form a bidirectional flow feature set of M-dimensional features.
The technical solution adopted in the embodiments of the present application further includes a feature optimization module, which is used to optimize the bidirectional flow feature set using the maximum variance explanation mechanism.
The technical solution adopted in the embodiments of the present application further includes: the feature optimization module optimizing the bidirectional flow feature set using the maximum variance explanation mechanism specifically comprises:
Applying standard normalization to the network traffic data;
Computing, over the network traffic data, the mean of each feature in the bidirectional flow feature set;
Subtracting the corresponding feature mean from the normalized network traffic data to obtain a new result for each feature, and applying variance normalization to the new result of each feature;
Computing the covariance matrix of the bidirectional flow feature set, and sorting the variance values of the features on the main diagonal of the covariance matrix in ascending order to obtain the N-dimensional features that are most closely associated in the bidirectional flow feature set;
Computing the eigenvalues and eigenvectors of the covariance matrix, sorting the eigenvalues by magnitude, and selecting the eigenvectors corresponding to the top N optimized bidirectional flow features;
Projecting the network traffic data onto the N eigenvectors;
The M-dimensional bidirectional flow feature set of the network traffic data is thereby optimized into an N-dimensional bidirectional flow feature set.
Yet another technical solution adopted in the embodiments of the present application is an electronic device, comprising:
At least one processor; and
A memory communicatively connected to the at least one processor; wherein
The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the following operations of the above network traffic classification method:
Step a: collecting network traffic data, and labeling the network traffic data;
Step b: extracting a bidirectional flow feature set from the labeled network traffic data;
Step c: constructing a classification model based on the bidirectional flow feature set, and outputting the classification results of the network traffic data through the classification model.
Compared with the prior art, the beneficial effects of the embodiments of the present application are as follows: the network traffic classification method, system and electronic device of the embodiments classify network traffic using the bidirectional flow features in the network traffic data, so that the large number of new applications on the Internet can be accurately identified and classified; at the same time, the maximum variance explanation mechanism is used to optimize and associate the bidirectional flow features, which ensures high cohesion of the bidirectional flow features, improves classification accuracy, and effectively ensures high-precision, high-performance network traffic classification.
Brief description of the drawings
Fig. 1 is a flowchart of the network traffic classification method of an embodiment of the present application;
Fig. 2 is a schematic diagram of the collection and labeling process for network traffic data;
Fig. 3 is a schematic diagram of the extraction and optimization process for the bidirectional flow feature set;
Fig. 4 is a schematic structural diagram of the network traffic classification system of an embodiment of the present application;
Fig. 5 is a schematic structural diagram of the hardware device for the network traffic classification method provided by an embodiment of the present application.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the application, not to limit it.
Refer to Fig. 1, a flowchart of the network traffic classification method of an embodiment of the present application. The network traffic classification method of the embodiment comprises the following steps:
Step 100: collecting network traffic data, and labeling the network traffic data;
In step 100, the collection and labeling process for the network traffic data is shown in Fig. 2, with the following specific steps:
Step 101: selecting the application categories in the network traffic;
Step 102: continuously capturing traffic of the fixed application categories with high-performance network monitoring software;
Step 103: collecting the network traffic packets corresponding to each application category and the network logs of the corresponding time period;
Step 104: analyzing the network traffic packets to find the inherent attributes of each application and the key information it exchanges with other applications, such as IP addresses and transport protocols;
Step 105: extracting the IP endpoints and transmission packet counts associated with each application from the network logs, and associating and fusing them with the IP addresses and transport protocols to complete the labeling of the network traffic data.
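The association in step 105 can be pictured as a lookup from log-derived endpoints to application labels. The sketch below is only an illustration under stated assumptions: the record fields (`ip`, `protocol`, `app`, `src_ip`, `dst_ip`) and the function names are hypothetical, since the application does not specify a log or packet format.

```python
def build_label_index(log_entries):
    """Index hypothetical log entries by (IP endpoint, transport protocol)."""
    return {(e["ip"], e["protocol"]): e["app"] for e in log_entries}

def label_packets(packets, index, unknown="unknown"):
    """Attach an application label to each packet record via its endpoints."""
    labeled = []
    for p in packets:
        # A packet matches an application if either endpoint appears in the logs.
        label = (index.get((p["src_ip"], p["protocol"]))
                 or index.get((p["dst_ip"], p["protocol"]))
                 or unknown)
        labeled.append({**p, "label": label})
    return labeled
```

Packets whose endpoints never appear in the logs receive the placeholder label "unknown" rather than being dropped, which keeps the labeled set aligned with the captured traffic.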
Step 200: extracting a bidirectional flow feature set from the labeled network traffic data, and optimizing the bidirectional flow feature set using the maximum variance explanation mechanism;
In step 200, the extraction and optimization process for the bidirectional flow feature set is shown in Fig. 3 and specifically comprises the following steps:
Step 201: analyzing the labeled network traffic data and, for each {source IP address, destination IP address} pair in the network traffic data, separately collecting the bidirectional (forward and reverse) network flow information for {source IP address -> destination IP address} and {destination IP address -> source IP address} based on the different port numbers; at this point each {source IP address, destination IP address} pair has network flow information in two opposite directions;
Step 202: finding the forward network flows of each {source IP address -> destination IP address} pair, and extracting from them the forward network flow feature set F1 covering every forward network flow;
Step 203: finding the reverse network flows of each {destination IP address -> source IP address} pair, and extracting from them the reverse network flow feature set F2 covering every reverse network flow;
Step 204: combining the forward and reverse network flow feature sets {F1, F2} of each {source IP address, destination IP address} pair to form the bidirectional flow feature set F of M-dimensional features, denoted F = {F1, F2};
In step 204, combining all forward and reverse network flow feature sets allows them to be optimized uniformly.
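Steps 201 to 204 can be sketched as grouping packets by IP pair, splitting them by direction, and concatenating the two per-direction feature lists. This is a minimal sketch under assumptions: the `Packet` record, the four per-direction features (packet count, total bytes, mean size, duration) and the rule that the first packet seen defines the forward direction are all illustrative choices, and port numbers are left out for brevity.

```python
from typing import NamedTuple

class Packet(NamedTuple):
    src_ip: str
    dst_ip: str
    size: int         # bytes
    timestamp: float  # seconds

def direction_features(packets):
    """Illustrative per-direction features: packet count, bytes, mean size, duration."""
    if not packets:
        return [0, 0, 0.0, 0.0]
    sizes = [p.size for p in packets]
    times = [p.timestamp for p in packets]
    return [len(sizes), sum(sizes), sum(sizes) / len(sizes), max(times) - min(times)]

def bidirectional_features(packets):
    """Return {(src_ip, dst_ip): F1 + F2} for every IP pair.

    The direction of the first packet seen for a pair is taken as "forward".
    """
    flows = {}  # (src_ip, dst_ip) -> (forward packets, reverse packets)
    for p in sorted(packets, key=lambda pkt: pkt.timestamp):
        if (p.dst_ip, p.src_ip) in flows:
            flows[(p.dst_ip, p.src_ip)][1].append(p)  # reverse direction
        else:
            flows.setdefault((p.src_ip, p.dst_ip), ([], []))[0].append(p)  # forward
    return {pair: direction_features(fwd) + direction_features(rev)
            for pair, (fwd, rev) in flows.items()}
```

With four features per direction, the combined vector F = {F1, F2} has M = 8 dimensions in this sketch; a real feature set would likely be much larger.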
Step 205: applying standard normalization to the network traffic data, normalizing the network traffic data set into a data set with mean 0 and variance 1; the normalization formula is x* = (x - u) / δ, where u is the mean of all the network traffic data and δ is the standard deviation of all the network traffic data;
Step 206: computing, over the network traffic data, the mean of each feature in the bidirectional flow feature set F;
Step 207: subtracting the corresponding feature mean from the normalized network traffic data to obtain a new result for each feature, and applying variance normalization to the new result of each feature;
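The normalization in steps 205 to 207 can be sketched with the standard library. One assumption is made here: u and δ are computed per feature column, which is the usual reading of x* = (x - u) / δ, although the text describes them as statistics of all the data. The function name `standardize` is illustrative.

```python
from statistics import fmean, pstdev

def standardize(rows):
    """Apply x* = (x - u) / δ to every feature column.

    u and δ are taken per feature (an assumption); constant columns use δ = 1
    to avoid division by zero.
    """
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]
    return [[(x - u) / d for x, u, d in zip(row, means, stds)] for row in rows]
```

After this step every non-constant column has mean 0 and variance 1, so the mean subtraction described in steps 206 and 207 leaves the values unchanged.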
Step 208: computing the covariance matrix of the bidirectional flow feature set F, and sorting the variance values of the features on the main diagonal of the covariance matrix in ascending order to obtain the N-dimensional features that are most closely associated in the bidirectional flow feature set F;
In step 208, the main diagonal of the covariance matrix holds the variance of each feature, and the off-diagonal entries are the pairwise covariances between features. A covariance greater than 0 indicates a positive correlation between two features; a covariance less than 0 indicates a negative correlation; a covariance equal to 0 indicates that the two features are uncorrelated; the larger the absolute value of the covariance, the closer the relationship between the two features, and the smaller the absolute value, the weaker it is. From these five conditions, the N-dimensional features that are most closely associated in the bidirectional flow feature set F can be computed. The application uses the maximum variance explanation mechanism to preferentially combine the most closely associated features of the bidirectional network flow feature set in the network traffic data, selecting the feature set that best reflects the network flow categories.
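The covariance matrix of step 208 and the reading of its entries can be sketched as follows. The helper names are illustrative, and the sample (p - 1) divisor is an assumption, since the text does not say which estimator is used.

```python
def covariance_matrix(rows):
    """Sample covariance matrix (q x q) of a p x q data set."""
    p, q = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / p for j in range(q)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (p - 1)
             for j in range(q)] for i in range(q)]

def trend(cov_ij, eps=1e-12):
    """Classify the sign of one covariance entry."""
    if cov_ij > eps:
        return "positive"
    if cov_ij < -eps:
        return "negative"
    return "uncorrelated"

def rank_by_variance(cov):
    """Feature indices sorted by diagonal variance, ascending as in step 208."""
    return sorted(range(len(cov)), key=lambda i: cov[i][i])
```

For three samples (1, 2, 5), (2, 4, 5), (3, 6, 5), the first two features have covariance 2.0 (a positive trend), while the constant third feature has variance 0 and therefore comes first in the ascending ranking.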
Step 209: computing the eigenvalues and eigenvectors of the covariance matrix, sorting the eigenvalues by magnitude, and selecting the eigenvectors corresponding to the top N optimized bidirectional flow features;
Step 210: projecting the network traffic data onto the N selected eigenvectors: assuming the number of network traffic data samples is p and the number of features is q, the sample matrix obtained by subtracting the feature means from the network traffic data is DataTransform (p*q), the covariance matrix of the bidirectional flow feature set is q*q, and the matrix formed by the N selected eigenvectors is EigenVectors (q*N); the projected network traffic data is then OptimizeData (p*N) = DataTransform (p*q) × EigenVectors (q*N);
In step 210, projecting the network traffic data onto the eigenvectors corresponding to the optimized bidirectional flow features increases the cohesion of the data, reduces the influence of noisy data, and improves classification precision.
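Steps 209 and 210 amount to an eigen-decomposition of the covariance matrix followed by a matrix product. The sketch below uses only the standard library and a classical Jacobi rotation scheme for the symmetric eigenproblem; that solver is a stand-in chosen for self-containment, not something the application specifies, and a linear algebra library would normally be used instead. The function names are illustrative.

```python
import math

def jacobi_eigen(matrix, sweeps=50, tol=1e-18):
    """Eigen-decomposition of a symmetric matrix by Jacobi rotations.

    Returns (eigenvalues, vectors); column i of `vectors` is the eigenvector
    belonging to eigenvalues[i].
    """
    n = len(matrix)
    a = [list(map(float, row)) for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for m in (a, v):  # columns p, q of A and V: M <- M J
                    for k in range(n):
                        mp, mq = m[k][p], m[k][q]
                        m[k][p], m[k][q] = c * mp - s * mq, s * mp + c * mq
                for k in range(n):  # rows p, q of A: A <- J^T A
                    ap, aq = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * ap - s * aq, s * ap + c * aq
    return [a[i][i] for i in range(n)], v

def pca_project(centered_rows, n_components):
    """Project p x q mean-centered data onto the top-N eigenvectors of its covariance."""
    p, q = len(centered_rows), len(centered_rows[0])
    cov = [[sum(r[i] * r[j] for r in centered_rows) / (p - 1) for j in range(q)]
           for i in range(q)]  # q x q covariance matrix
    eigenvalues, vectors = jacobi_eigen(cov)
    order = sorted(range(q), key=lambda i: eigenvalues[i], reverse=True)[:n_components]
    # OptimizeData (p x N) = DataTransform (p x q) x EigenVectors (q x N)
    projected = [[sum(r[k] * vectors[k][i] for k in range(q)) for i in order]
                 for r in centered_rows]
    return projected, [eigenvalues[i] for i in order]
```

The sign of each eigenvector is arbitrary, so projected coordinates are only defined up to a sign flip per component; this does not affect a downstream classifier trained and applied with the same projection.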
Step 211: the M-dimensional bidirectional flow feature set of the network traffic data is thereby optimized into an N-dimensional bidirectional flow feature set.
Step 300: based on the optimized bidirectional flow feature set, constructing a classification model using the supervised random forest algorithm, and outputting the classification results of the network traffic data through the classification model;
In step 300, the model is built with the supervised random forest algorithm: the optimized bidirectional flow feature set is fed into the classification model for classification training, and the performance of the classification model is evaluated and optimized. The trained classification model is tested with the test data set of the validation stage. The test results show that the classification model built on the optimized bidirectional flow feature set has high classification precision and can improve classification efficiency while maintaining high classification accuracy, thereby improving overall performance.
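The random forest idea in step 300 (bootstrap samples, random feature subsets, majority vote) can be shown with a compact from-scratch sketch. It is a teaching illustration under assumptions, not the application's model: the tree depth, the number of trees, the Gini split criterion and the square-root feature subset size are all assumed, class labels are assumed to be strings, and a library implementation would normally be used in practice.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, feature_ids):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in feature_ids:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between neighbouring values
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, rng, max_depth=5):
    """A leaf is a label string; an inner node is (feature, threshold, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1:
        return majority
    n_features = len(rows[0])
    k = max(1, int(n_features ** 0.5))  # random feature subset per split
    split = best_split(rows, labels, rng.sample(range(n_features), k))
    if split is None:
        return majority
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], rng, max_depth - 1),
            build_tree([r for r, _ in right], [y for _, y in right], rng, max_depth - 1))

def predict_tree(tree, row):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def train_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], rng))
    return forest

def predict(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]
```

In this sketch the rows passed to `train_forest` would be the N-dimensional projected feature vectors and the labels the application categories assigned during labeling.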
Refer to Fig. 4, a structural diagram of the network traffic classification system of an embodiment of the present application. The network traffic classification system of the embodiment includes a data acquisition module, a data preprocessing module, a feature extraction module, a feature optimization module and a model construction module.
Data acquisition module: for collecting network traffic data; the network traffic data is collected by selecting the application categories in the network traffic, continuously capturing traffic of the fixed application categories with high-performance network monitoring software, and collecting the network traffic packets corresponding to each application category and the network logs of the corresponding time period.
Data preprocessing module: for labeling the network traffic data; the labeling process specifically comprises: analyzing the network traffic packets to find the inherent attributes of each application and the key information it exchanges with other applications, such as IP addresses and transport protocols; and extracting the IP endpoints and transmission packet counts associated with each application from the network logs, and associating and fusing them with the IP addresses and transport protocols to complete the labeling of the network traffic data.
Feature extraction module: for extracting a bidirectional flow feature set from the labeled network traffic data; specifically, the bidirectional flow feature set is extracted as follows:
A. Analyzing the labeled network traffic data and, for each {source IP address, destination IP address} pair in the network traffic data, separately collecting the bidirectional (forward and reverse) network flow information for {source IP address -> destination IP address} and {destination IP address -> source IP address} based on the different port numbers; at this point each {source IP address, destination IP address} pair has network flow information in two opposite directions;
B. Finding the forward network flows of each {source IP address -> destination IP address} pair, and extracting from them the forward network flow feature set F1 covering every forward network flow;
C. Finding the reverse network flows of each {destination IP address -> source IP address} pair, and extracting from them the reverse network flow feature set F2 covering every reverse network flow;
D. Combining the forward and reverse network flow feature sets {F1, F2} of each {source IP address, destination IP address} pair to form the bidirectional flow feature set F of M-dimensional features, denoted F = {F1, F2}.
Feature optimization module: for optimizing the extracted bidirectional flow feature set using the maximum variance explanation mechanism; specifically, the bidirectional flow feature set is optimized as follows:
A. Applying standard normalization to the network traffic data, normalizing the network traffic data set into a data set with mean 0 and variance 1; the normalization formula is x* = (x - u) / δ, where u is the mean of all the network traffic data and δ is the standard deviation of all the network traffic data;
B. Computing, over the network traffic data, the mean of each feature in the bidirectional flow feature set F;
C. Subtracting the corresponding feature mean from the normalized network traffic data to obtain a new result for each feature, and applying variance normalization to the new result of each feature;
D. Computing the covariance matrix of the bidirectional flow feature set F, and sorting the variance values of the features on the main diagonal of the covariance matrix in ascending order to obtain the N-dimensional features that are most closely associated in the bidirectional flow feature set F. Here the main diagonal of the covariance matrix holds the variance of each feature, and the off-diagonal entries are the pairwise covariances between features. A covariance greater than 0 indicates a positive correlation between two features; a covariance less than 0 indicates a negative correlation; a covariance equal to 0 indicates that the two features are uncorrelated; the larger the absolute value of the covariance, the closer the relationship between the two features, and the smaller the absolute value, the weaker it is. From these five conditions, the N-dimensional features that are most closely associated in the bidirectional flow feature set F can be computed. The application uses the maximum variance explanation mechanism to preferentially combine the most closely associated features of the bidirectional network flow feature set in the network traffic data, selecting the feature set that best reflects the network flow categories.
E. Computing the eigenvalues and eigenvectors of the covariance matrix, sorting the eigenvalues by magnitude, and selecting the eigenvectors corresponding to the top N optimized bidirectional flow features;
F. Projecting the network traffic data onto the N selected eigenvectors: assuming the number of network traffic data samples is p and the number of features is q, the sample matrix obtained by subtracting the feature means from the network traffic data is DataTransform (p*q), the covariance matrix of the bidirectional flow feature set is q*q, and the matrix formed by the N selected eigenvectors is EigenVectors (q*N); the projected network traffic data is then OptimizeData (p*N) = DataTransform (p*q) × EigenVectors (q*N);
G. The M-dimensional bidirectional flow feature set of the network traffic data is thereby optimized into an N-dimensional bidirectional flow feature set.
Model construction module: for constructing a classification model using the supervised random forest algorithm based on the optimized bidirectional flow feature set, and outputting the classification results of the network traffic data through the classification model. The model is built with the supervised random forest algorithm: the optimized bidirectional flow feature set is fed into the classification model for classification training, and the performance of the classification model is evaluated and optimized. The trained classification model is tested with the test data set of the validation stage. The test results show that the classification model built on the optimized bidirectional flow feature set has high classification precision and can improve classification efficiency while maintaining high classification accuracy, thereby improving overall performance.
Fig. 5 is a schematic structural diagram of the hardware device for the network traffic classification method provided by an embodiment of the present application. As shown in Fig. 5, the device includes one or more processors and a memory. Taking one processor as an example, the device may further include an input system and an output system.
The processor, memory, input system and output system may be connected by a bus or by other means; Fig. 5 takes a bus connection as an example.
As a non-transitory computer-readable storage medium, the memory can be used to store non-transitory software programs, non-transitory computer-executable programs, and modules. By running the non-transitory software programs, instructions and modules stored in the memory, the processor executes the various functional applications and data processing of the electronic device, that is, it implements the processing method of the above method embodiments.
The memory may include a program storage area and a data storage area, where the program storage area can store an operating system and the application program required by at least one function, and the data storage area can store data and the like. In addition, the memory may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and such remote memory can be connected to the processing system through a network. Examples of such networks include, but are not limited to, the Internet, an enterprise intranet, a local area network, a mobile communication network, and combinations thereof.
The input system can receive input numeric or character information and generate signal input. The output system may include a display device such as a display screen.
The one or more modules are stored in the memory and, when executed by the one or more processors, perform the following operations of any of the above method embodiments:
Step a: collect network traffic data, and label the network traffic data;
Step b: extract a bidirectional flow feature set from the labeled network traffic data;
Step c: build a classification model based on the bidirectional flow feature set, and output the classification results of the network traffic data through the classification model.
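Step b depends on pairing the two directions of traffic between each {source IP address, destination IP address} pair. The sketch below illustrates that pairing under simplifying assumptions: the packet records, field layout and feature names (fwd_packets, rev_bytes and so on) are hypothetical, port numbers are ignored, and the first packet observed for a pair is taken to define its forward direction.

```python
from collections import defaultdict

# Hypothetical packet records: (source IP, destination IP, size in bytes).
packets = [
    ("10.0.0.1", "10.0.0.2", 120),
    ("10.0.0.2", "10.0.0.1", 900),
    ("10.0.0.1", "10.0.0.2", 60),
    ("10.0.0.3", "10.0.0.1", 300),
]


def bidirectional_features(records):
    """Combine forward and reverse statistics for each IP address pair."""
    stats = defaultdict(
        lambda: {"fwd_packets": 0, "fwd_bytes": 0, "rev_packets": 0, "rev_bytes": 0}
    )
    first_direction = {}
    for src, dst, size in records:
        pair = frozenset((src, dst))
        # The first packet seen for a pair defines its forward direction.
        forward = first_direction.setdefault(pair, (src, dst))
        prefix = "fwd" if (src, dst) == forward else "rev"
        stats[forward][prefix + "_packets"] += 1
        stats[forward][prefix + "_bytes"] += size
    return dict(stats)


features = bidirectional_features(packets)
for pair, values in features.items():
    print(pair, values)
```

Each resulting dictionary is one row of a (here 4-dimensional) bidirectional flow feature set; a real feature set would carry many more forward and reverse statistics.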
The above product can execute the method provided by the embodiments of the present application, and has the functional modules and beneficial effects corresponding to executing the method. For technical details not described in detail in this embodiment, refer to the method provided by the embodiments of the present application.
An embodiment of the present application provides a non-transitory (non-volatile) computer storage medium storing computer-executable instructions, which can perform the following operations:
Step a: collect network traffic data, and label the network traffic data;
Step b: extract a bidirectional flow feature set from the labeled network traffic data;
Step c: build a classification model based on the bidirectional flow feature set, and output the classification results of the network traffic data through the classification model.
An embodiment of the present application provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the following operations:
Step a: collect network traffic data, and label the network traffic data;
Step b: extract a bidirectional flow feature set from the labeled network traffic data;
Step c: build a classification model based on the bidirectional flow feature set, and output the classification results of the network traffic data through the classification model.
The network traffic classification method, system and electronic device of the embodiments of the present application classify network traffic using the bidirectional flow features in network traffic data, and can accurately identify and classify the large number of new applications on the Internet. At the same time, the maximum variance explanation method is used to optimize and associate the bidirectional flow features, which ensures high cohesion of the bidirectional flow features, improves classification accuracy, and effectively guarantees high precision and high performance of network traffic classification.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (11)
1. A network traffic classification method, characterized by comprising the following steps:
Step a: collecting network traffic data, and labeling the network traffic data;
Step b: extracting a bidirectional flow feature set from the labeled network traffic data;
Step c: building a classification model based on the bidirectional flow feature set, and outputting the classification results of the network traffic data through the classification model.
2. The network traffic classification method according to claim 1, characterized in that, in step a, collecting the network traffic data and labeling the network traffic data specifically comprises:
Step a1: selecting the application categories in the network traffic;
Step a2: collecting the network traffic data packets corresponding to each application and the network logs of the corresponding time period;
Step a3: analyzing the network traffic data packets to find the inherent attributes of each application and the IP addresses and transport protocols it uses to communicate with other applications;
Step a4: extracting, from the network logs, the IP endpoints and transmitted packet counts associated with each application, and associating and fusing them with the IP addresses and transport protocols to complete the labeling of the network traffic data.
3. The network traffic classification method according to claim 2, characterized in that, in step b, extracting the bidirectional flow feature set from the labeled network traffic data specifically comprises:
Step b1: analyzing the labeled network traffic data and, for each {source IP address, destination IP address} pair in the network traffic data, counting the bidirectional network flow information between {source IP address -> destination IP address} and {destination IP address -> source IP address} based on different port numbers;
Step b2: finding the forward network flows of each {source IP address -> destination IP address} pair, and extracting all forward network flow feature sets from the forward network flows;
Step b3: finding the reverse network flows of each {destination IP address -> source IP address} pair, and extracting all reverse network flow feature sets from the reverse network flows;
Step b4: combining the forward and reverse network flow feature sets of each {source IP address, destination IP address} pair to form a bidirectional flow feature set of M-dimensional features.
4. The network traffic classification method according to claim 3, characterized in that step b further comprises: optimizing the bidirectional flow feature set using the maximum variance explanation method.
5. The network traffic classification method according to claim 4, characterized in that optimizing the bidirectional flow feature set using the maximum variance explanation method specifically comprises:
Step b5: performing standard normalization on the network traffic data;
Step b6: computing, over the network traffic data, the mean of each feature in the bidirectional flow feature set;
Step b7: subtracting the corresponding mean of each feature from the normalized network traffic data to obtain a new result for each feature, and applying variance normalization to the new result of each feature;
Step b8: computing the covariance matrix of the bidirectional flow feature set, and sorting the variances of the features on the main diagonal of the covariance matrix in ascending order to obtain the N-dimensional features with the highest and closest degree of association in the bidirectional flow feature set;
Step b9: computing the eigenvalues and eigenvectors of the covariance matrix, sorting the eigenvalues by magnitude, and selecting the eigenvectors corresponding to the top N optimized bidirectional flow features;
Step b10: projecting the network traffic data onto the N eigenvectors;
Step b11: optimizing the M-dimensional bidirectional flow feature set of the network traffic data into an N-dimensional bidirectional flow feature set.
6. A network traffic classification system, characterized by comprising:
Data acquisition module: for collecting network traffic data;
Data preprocessing module: for labeling the network traffic data;
Feature extraction module: for extracting a bidirectional flow feature set from the labeled network traffic data;
Model construction module: for building a classification model based on the bidirectional flow feature set, and outputting the classification results of the network traffic data through the classification model.
7. The network traffic classification system according to claim 6, characterized in that:
the data acquisition module collecting network traffic data specifically comprises: selecting the application categories in the network traffic, and collecting the network traffic data packets corresponding to each application and the network logs of the corresponding time period;
the data preprocessing module labeling the network traffic data specifically comprises: analyzing the network traffic data packets to find the inherent attributes of each application and the IP addresses and transport protocols it uses to communicate with other applications; and extracting, from the network logs, the IP endpoints and transmitted packet counts associated with each application, and associating and fusing them with the IP addresses and transport protocols to complete the labeling of the network traffic data.
8. The network traffic classification system according to claim 7, characterized in that the feature extraction module extracting the bidirectional flow feature set from the labeled network traffic data specifically comprises:
analyzing the labeled network traffic data and, for each {source IP address, destination IP address} pair in the network traffic data, counting the bidirectional network flow information between {source IP address -> destination IP address} and {destination IP address -> source IP address} based on different port numbers;
finding the forward network flows of each {source IP address -> destination IP address} pair, and extracting all forward network flow feature sets from the forward network flows;
finding the reverse network flows of each {destination IP address -> source IP address} pair, and extracting all reverse network flow feature sets from the reverse network flows;
combining the forward and reverse network flow feature sets of each {source IP address, destination IP address} pair to form a bidirectional flow feature set of M-dimensional features.
9. The network traffic classification system according to claim 8, characterized by further comprising a feature optimization module, the feature optimization module being used to optimize the bidirectional flow feature set using the maximum variance explanation method.
10. The network traffic classification system according to claim 9, characterized in that the feature optimization module optimizing the bidirectional flow feature set using the maximum variance explanation method specifically comprises:
performing standard normalization on the network traffic data;
computing, over the network traffic data, the mean of each feature in the bidirectional flow feature set;
subtracting the corresponding mean of each feature from the normalized network traffic data to obtain a new result for each feature, and applying variance normalization to the new result of each feature;
computing the covariance matrix of the bidirectional flow feature set, and sorting the variances of the features on the main diagonal of the covariance matrix in ascending order to obtain the N-dimensional features with the highest and closest degree of association in the bidirectional flow feature set;
computing the eigenvalues and eigenvectors of the covariance matrix, sorting the eigenvalues by magnitude, and selecting the eigenvectors corresponding to the top N optimized bidirectional flow features;
projecting the network traffic data onto the N eigenvectors;
optimizing the M-dimensional bidirectional flow feature set of the network traffic data into an N-dimensional bidirectional flow feature set.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the following operations of the network traffic classification method according to any one of claims 1 to 5:
Step a: collecting network traffic data, and labeling the network traffic data;
Step b: extracting a bidirectional flow feature set from the labeled network traffic data;
Step c: building a classification model based on the bidirectional flow feature set, and outputting the classification results of the network traffic data through the classification model.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811113686.XA CN109309630B (en) | 2018-09-25 | 2018-09-25 | Network traffic classification method and system and electronic equipment |
PCT/CN2018/112401 WO2020062390A1 (en) | 2018-09-25 | 2018-10-29 | Network traffic classification method and system, and electronic device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811113686.XA CN109309630B (en) | 2018-09-25 | 2018-09-25 | Network traffic classification method and system and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109309630A (en) | 2019-02-05 |
CN109309630B (en) | 2021-09-21 |
Family
ID=65225067
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811113686.XA, Active, CN109309630B (en) | 2018-09-25 | 2018-09-25 | Network traffic classification method and system and electronic equipment |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109309630B (en) |
WO (1) | WO2020062390A1 (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110097120A (en) * | 2019-04-30 | 2019-08-06 | 南京邮电大学 | Network flow data classification method, equipment and computer storage medium |
CN110149280A (en) * | 2019-05-27 | 2019-08-20 | 中国科学技术大学 | Net flow assorted method and apparatus |
CN110365603A (en) * | 2019-06-28 | 2019-10-22 | 西安交通大学 | A kind of self adaptive network traffic classification method open based on 5G network capabilities |
CN112995063A (en) * | 2021-04-19 | 2021-06-18 | 北京智源人工智能研究院 | Flow monitoring method, device, equipment and medium |
CN113114672A (en) * | 2021-04-12 | 2021-07-13 | 常熟市国瑞科技股份有限公司 | Video transmission data fine measurement method |
CN113746686A (en) * | 2020-05-27 | 2021-12-03 | 阿里巴巴集团控股有限公司 | Network flow state determination method, computing device and storage medium |
CN117197591A (en) * | 2023-11-06 | 2023-12-08 | 青岛创新奇智科技集团股份有限公司 | Data classification method based on machine learning |
WO2024065185A1 (en) * | 2022-09-27 | 2024-04-04 | 西门子股份公司 | Device classification method and apparatus, electronic device, and computer-readable storage medium |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111698223B (en) * | 2020-05-22 | 2022-02-22 | 哈尔滨工程大学 | Encrypted WEB fingerprint identification method based on automatic feature engineering |
CN111817971B (en) * | 2020-06-12 | 2023-03-24 | 华为技术有限公司 | Data center network flow splicing method based on deep learning |
CN111970305B (en) * | 2020-08-31 | 2022-08-12 | 福州大学 | Abnormal flow detection method based on semi-supervised descent and Tri-LightGBM |
CN112448868B (en) * | 2020-12-02 | 2022-09-30 | 新华三人工智能科技有限公司 | Network traffic data identification method, device and equipment |
CN112804253B (en) * | 2021-02-04 | 2022-07-12 | 湖南大学 | Network flow classification detection method, system and storage medium |
CN112839055B (en) * | 2021-02-04 | 2022-08-23 | 北京六方云信息技术有限公司 | Network application identification method and device for TLS encrypted traffic and electronic equipment |
CN113098735B (en) * | 2021-03-31 | 2022-10-11 | 上海天旦网络科技发展有限公司 | Inference-oriented application flow and index vectorization method and system |
CN113141357B (en) * | 2021-04-19 | 2022-02-18 | 湖南大学 | Feature selection method and system for optimizing network intrusion detection performance |
CN113315721B (en) * | 2021-05-26 | 2023-01-17 | 恒安嘉新(北京)科技股份公司 | Network data feature processing method, device, equipment and storage medium |
CN113556317B (en) * | 2021-06-07 | 2022-10-11 | 中国科学院信息工程研究所 | Abnormal flow detection method and device based on network flow structural feature fusion |
CN114928560B (en) * | 2022-05-16 | 2023-01-31 | 珠海市鸿瑞信息技术股份有限公司 | Big data based network flow and equipment log cooperative management system and method |
CN116647877B (en) * | 2023-06-12 | 2024-03-15 | 广州爱浦路网络技术有限公司 | Flow category verification method and system based on graph convolution model |
CN116662817B (en) * | 2023-07-31 | 2023-11-24 | 北京天防安全科技有限公司 | Asset identification method and system of Internet of things equipment |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080162088A1 (en) * | 2005-05-03 | 2008-07-03 | Devaul Richard W | Method and system for real-time signal classification |
CN102394827A (en) * | 2011-11-09 | 2012-03-28 | 浙江万里学院 | Hierarchical classification method for internet flow |
CN104052639A (en) * | 2014-07-02 | 2014-09-17 | 山东大学 | Real-time multi-application network flow identification method based on support vector machine |
CN106874879A (en) * | 2017-02-21 | 2017-06-20 | 华南师范大学 | Handwritten Digit Recognition method based on multiple features fusion and deep learning network extraction |
CN107967311A (en) * | 2017-11-20 | 2018-04-27 | 阿里巴巴集团控股有限公司 | A kind of method and apparatus classified to network data flow |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103973589B (en) * | 2013-09-12 | 2017-04-12 | 哈尔滨理工大学 | Network traffic classification method and device |
CN104767692B (en) * | 2015-04-15 | 2018-05-29 | 中国电力科学研究院 | A kind of net flow assorted method |
CN106487535B (en) * | 2015-08-24 | 2020-04-28 | 中兴通讯股份有限公司 | Method and device for classifying network traffic data |
US10785247B2 (en) * | 2017-01-24 | 2020-09-22 | Cisco Technology, Inc. | Service usage model for traffic analysis |
US20200211721A1 (en) * | 2017-03-02 | 2020-07-02 | Singapore University Of Technology And Design | METHOD AND APPARATUS FOR DETERMINING AN IDENTITY OF AN UNKNOWN INTERNET-OF-THINGS (IoT) DEVICE IN A COMMUNICATION NETWORK |
2018
- 2018-09-25 CN CN201811113686.XA patent/CN109309630B/en active Active
- 2018-10-29 WO PCT/CN2018/112401 patent/WO2020062390A1/en active Application Filing
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110097120A (en) * | 2019-04-30 | 2019-08-06 | 南京邮电大学 | Network flow data classification method, equipment and computer storage medium |
CN110097120B (en) * | 2019-04-30 | 2022-08-26 | 南京邮电大学 | Network flow data classification method, equipment and computer storage medium |
CN110149280A (en) * | 2019-05-27 | 2019-08-20 | 中国科学技术大学 | Net flow assorted method and apparatus |
CN110149280B (en) * | 2019-05-27 | 2020-08-28 | 中国科学技术大学 | Network traffic classification method and device |
CN110365603A (en) * | 2019-06-28 | 2019-10-22 | 西安交通大学 | A kind of self adaptive network traffic classification method open based on 5G network capabilities |
CN113746686A (en) * | 2020-05-27 | 2021-12-03 | 阿里巴巴集团控股有限公司 | Network flow state determination method, computing device and storage medium |
CN113114672B (en) * | 2021-04-12 | 2023-02-28 | 常熟市国瑞科技股份有限公司 | Video transmission data fine measurement method |
CN113114672A (en) * | 2021-04-12 | 2021-07-13 | 常熟市国瑞科技股份有限公司 | Video transmission data fine measurement method |
CN112995063B (en) * | 2021-04-19 | 2021-10-08 | 北京智源人工智能研究院 | Flow monitoring method, device, equipment and medium |
CN112995063A (en) * | 2021-04-19 | 2021-06-18 | 北京智源人工智能研究院 | Flow monitoring method, device, equipment and medium |
WO2024065185A1 (en) * | 2022-09-27 | 2024-04-04 | 西门子股份公司 | Device classification method and apparatus, electronic device, and computer-readable storage medium |
CN117197591A (en) * | 2023-11-06 | 2023-12-08 | 青岛创新奇智科技集团股份有限公司 | Data classification method based on machine learning |
CN117197591B (en) * | 2023-11-06 | 2024-03-12 | 青岛创新奇智科技集团股份有限公司 | Data classification method based on machine learning |
Also Published As
Publication number | Publication date |
---|---|
CN109309630B (en) | 2021-09-21 |
WO2020062390A1 (en) | 2020-04-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||