CN109309630A - Network traffic classification method, system, and electronic device - Google Patents

Network traffic classification method, system, and electronic device

Info

Publication number
CN109309630A
Authority
CN
China
Prior art keywords
network flow
address
network
feature set
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811113686.XA
Other languages
Chinese (zh)
Other versions
CN109309630B (en)
Inventor
叶可江
赵世林
须成忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS
Priority to CN201811113686.XA (CN109309630B)
Priority to PCT/CN2018/112401 (WO2020062390A1)
Publication of CN109309630A
Application granted
Publication of CN109309630B
Status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2441 Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

This application relates to a network traffic classification method, system, and electronic device. The method comprises: step a: collecting network traffic data and labeling the network traffic data; step b: extracting a bidirectional flow feature set from the labeled network traffic data; step c: constructing a classification model based on the bidirectional flow feature set, and outputting the classification results of the network traffic data through the classification model. The application classifies network traffic using the bidirectional flow features in the network traffic data, can accurately identify and classify the large number of new applications on the Internet, improves classification accuracy, and can effectively guarantee high precision and high performance in network traffic classification.

Description

Network traffic classification method, system, and electronic device
Technical field
This application belongs to the technical field of network traffic classification, and in particular relates to a network traffic classification method, system, and electronic device.
Background
With the rapid spread of the Internet and the emergence of a large number of new applications, the modern network environment is becoming increasingly complex and diverse. Traffic classification and network application identification play an important role in network management services and security systems, such as quality of service, intrusion detection systems, and traffic control systems. If the traffic in a network system can be accurately classified and its applications identified, network security and the efficiency of network management services improve considerably, and system time and memory overhead can also be reduced.
Currently, existing network traffic classification methods mainly include:
1. Network traffic classification based on representation learning: the acquired network traffic data is preprocessed, a representation learning algorithm extracts features from the preprocessed data to turn the network traffic data into network flow vectors, and the network traffic data is classified according to those vectors, which allows network traffic to be classified efficiently.
2. Network traffic classification based on semi-supervised learning: obtain labeled and unlabeled network flows, and extract a preset fixed number of flow features from each network flow to obtain network flow feature vectors; from the labeled network flows, compute the information gain of each of the preset flow features and weight each flow feature by its information gain; mix the labeled and unlabeled network flows and cluster the mixed flows with the k-means algorithm to obtain k clusters; for each of the k clusters, count the labeled network flow feature vectors and determine the proportion of each type in the cluster, where the proportion equals the number of labeled network flow feature vectors of that type divided by the total number of labeled network flow feature vectors in the cluster; when the total number of labeled network flow feature vectors in a cluster is less than a preset network flow threshold, the cluster is assigned to the unknown protocol family, otherwise the cluster is assigned to the type with the largest proportion among its labeled network flow feature vectors; repeat the above two steps until all k clusters have been assigned a traffic type; use the clusters with assigned traffic types as training data to train the traffic classifier at the network egress. This method exploits the advantages of semi-supervised learning and is more accurate and more stable than traditional supervised learning algorithms that train the model on labeled data only.
3. Adaptive semi-supervised network traffic classification: obtain labeled and unlabeled network flows, and extract a preset fixed number of flow features from each network flow to obtain network flow feature vectors; from the labeled network flow feature vectors, compute the centroid of the set of network flow feature vectors of each type to obtain a vector set M; using the vector set M as the initial center points of k-means clustering, perform adaptive semi-supervised k-means clustering on the mixed labeled and unlabeled network flow feature vector set X, and output the k-means clusters; according to the maximum a posteriori probability of the labeled network flow feature vectors in each output cluster, map the network flows in each cluster to the traffic type they belong to, obtaining traffic clusters of known type; use the traffic clusters of known type as training data to train the traffic classifier at the network egress.
In summary, existing network traffic classification methods focus on the algorithm level: they all propose optimizations and improved algorithms for the classification stage of training, but they do not address how to extract a large, relevant, and effective feature set from network packets, and they therefore cannot accurately identify and classify the large number of new applications on the Internet.
Summary of the invention
This application provides a network traffic classification method, system, and electronic device, intended to solve, at least to some extent, one of the above technical problems in the prior art.
To solve the above problems, this application provides the following technical solutions:
A network traffic classification method, comprising the following steps:
Step a: collect network traffic data and label the network traffic data;
Step b: extract a bidirectional flow feature set from the labeled network traffic data;
Step c: construct a classification model based on the bidirectional flow feature set, and output the classification results of the network traffic data through the classification model.
The technical solution adopted by the embodiments of this application further includes: in step a, collecting the network traffic data and labeling the network traffic data specifically includes:
Step a1: select the application categories in the network traffic;
Step a2: collect the network traffic data packets corresponding to each application and the network logs for the corresponding time period;
Step a3: analyze the network traffic data packets to find the inherent attributes of each application and the IP addresses and transport protocols it uses to communicate with other applications;
Step a4: extract the IP endpoints and transmission packet numbers associated with each application from the network logs, associate and merge them with the IP addresses and transport protocols, and thereby complete the labeling of the network traffic data.
The technical solution adopted by the embodiments of this application further includes: in step b, extracting the bidirectional flow feature set from the labeled network traffic data specifically includes:
Step b1: analyze the labeled network traffic data and, for each {source IP address, destination IP address} pair in the network traffic data, collect the bidirectional network flow information between {source IP address -> destination IP address} and {destination IP address -> source IP address} based on the different port numbers;
Step b2: find the forward network flows for each {source IP address -> destination IP address}, and extract all forward network flow feature sets from the forward network flows;
Step b3: find the reverse network flows for each {destination IP address -> source IP address}, and extract all reverse network flow feature sets from the reverse network flows;
Step b4: combine the forward and reverse network flow feature sets of each {source IP address, destination IP address} pair to form an M-dimensional bidirectional flow feature set.
The technical solution adopted by the embodiments of this application further includes: step b further includes optimizing the bidirectional flow feature set using a maximum variance explanation mechanism.
The technical solution adopted by the embodiments of this application further includes: optimizing the bidirectional flow feature set using the maximum variance explanation mechanism specifically includes:
Step b5: apply standard normalization to the network traffic data;
Step b6: on the network traffic data, compute the mean of each feature in the bidirectional flow feature set;
Step b7: subtract the corresponding mean of each feature from the normalized network traffic data to obtain a new result for each feature, and apply variance normalization to each new result;
Step b8: compute the covariance matrix of the bidirectional flow feature set, sort the features in ascending order by the variance of each feature on the main diagonal of the covariance matrix, and obtain the N-dimensional features with the highest and closest degree of association in the bidirectional flow feature set;
Step b9: compute the eigenvalues and eigenvectors of the covariance matrix, sort the eigenvalues by size, and select the eigenvectors corresponding to the top N optimized bidirectional flow features;
Step b10: project the network traffic data onto the N eigenvectors;
Step b11: the M-dimensional bidirectional flow feature set of the network traffic data is thereby optimized into an N-dimensional bidirectional flow feature set.
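Read together, steps b5 to b11 resemble a standard principal component analysis. The following is a compact restatement under that reading, not the applicant's own notation: X is assumed to be the p-by-M matrix of p samples and M bidirectional flow features, and the 1/(p-1) factor is an assumed sample-covariance convention that the source does not specify.

```latex
\begin{aligned}
x^{*} &= \frac{x - u}{\delta}
  && \text{(step b5: standard normalization)} \\
\tilde{X} &= X^{*} - \mathbf{1}\,\bar{x}^{\top}
  && \text{(steps b6, b7: subtract per-feature means)} \\
C &= \frac{1}{p-1}\,\tilde{X}^{\top}\tilde{X} \in \mathbb{R}^{M \times M}
  && \text{(step b8: covariance matrix)} \\
C\,v_i &= \lambda_i v_i, \quad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_M
  && \text{(step b9: sorted eigenpairs)} \\
Y &= \tilde{X}\,[\,v_1 \;\cdots\; v_N\,] \in \mathbb{R}^{p \times N}
  && \text{(steps b10, b11: projection to } N \text{ dimensions)}
\end{aligned}
```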
Another technical solution adopted by the embodiments of this application is a network traffic classification system, comprising:
Data acquisition module: used to collect network traffic data;
Data preprocessing module: used to label the network traffic data;
Feature extraction module: used to extract a bidirectional flow feature set from the labeled network traffic data;
Model construction module: used to construct a classification model based on the bidirectional flow feature set and to output the classification results of the network traffic data through the classification model.
The technical solution adopted by the embodiments of this application further includes:
The data acquisition module collecting network traffic data specifically includes: selecting the application categories in the network traffic, and collecting the network traffic data packets corresponding to each application and the network logs for the corresponding time period;
The data preprocessing module labeling the network traffic data specifically includes: analyzing the network traffic data packets to find the inherent attributes of each application and the IP addresses and transport protocols it uses to communicate with other applications; extracting the IP endpoints and transmission packet numbers associated with each application from the network logs, associating and merging them with the IP addresses and transport protocols, and thereby completing the labeling of the network traffic data.
The technical solution adopted by the embodiments of this application further includes: the feature extraction module extracting the bidirectional flow feature set from the labeled network traffic data specifically includes:
Analyzing the labeled network traffic data and, for each {source IP address, destination IP address} pair in the network traffic data, collecting the bidirectional network flow information between {source IP address -> destination IP address} and {destination IP address -> source IP address} based on the different port numbers;
Finding the forward network flows for each {source IP address -> destination IP address}, and extracting all forward network flow feature sets from the forward network flows;
Finding the reverse network flows for each {destination IP address -> source IP address}, and extracting all reverse network flow feature sets from the reverse network flows;
Combining the forward and reverse network flow feature sets of each {source IP address, destination IP address} pair to form an M-dimensional bidirectional flow feature set.
The technical solution adopted by the embodiments of this application further includes a feature optimization module, which is used to optimize the bidirectional flow feature set using a maximum variance explanation mechanism.
The technical solution adopted by the embodiments of this application further includes: the feature optimization module optimizing the bidirectional flow feature set using the maximum variance explanation mechanism specifically includes:
Applying standard normalization to the network traffic data;
On the network traffic data, computing the mean of each feature in the bidirectional flow feature set;
Subtracting the corresponding mean of each feature from the normalized network traffic data to obtain a new result for each feature, and applying variance normalization to each new result;
Computing the covariance matrix of the bidirectional flow feature set, sorting the features in ascending order by the variance of each feature on the main diagonal of the covariance matrix, and obtaining the N-dimensional features with the highest and closest degree of association in the bidirectional flow feature set;
Computing the eigenvalues and eigenvectors of the covariance matrix, sorting the eigenvalues by size, and selecting the eigenvectors corresponding to the top N optimized bidirectional flow features;
Projecting the network traffic data onto the N eigenvectors;
The M-dimensional bidirectional flow feature set of the network traffic data is thereby optimized into an N-dimensional bidirectional flow feature set.
A further technical solution adopted by the embodiments of this application is an electronic device, comprising:
At least one processor; and
A memory communicatively connected to the at least one processor; wherein
The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the following operations of the above network traffic classification method:
Step a: collect network traffic data and label the network traffic data;
Step b: extract a bidirectional flow feature set from the labeled network traffic data;
Step c: construct a classification model based on the bidirectional flow feature set, and output the classification results of the network traffic data through the classification model.
Compared with the prior art, the embodiments of this application have the following beneficial effects: the network traffic classification method, system, and electronic device of the embodiments classify network traffic using the bidirectional flow features in the network traffic data, and can accurately identify and classify the large number of new applications on the Internet; at the same time, the maximum variance explanation mechanism is used to optimize and associate the bidirectional flow features, which ensures high cohesion of the bidirectional flow features, improves classification accuracy, and can effectively guarantee high precision and high performance in network traffic classification.
Brief description of the drawings
Fig. 1 is a flowchart of the network traffic classification method of an embodiment of this application;
Fig. 2 is a schematic diagram of the collection and labeling process for network traffic data;
Fig. 3 is a schematic diagram of the extraction and optimization process for the bidirectional flow feature set;
Fig. 4 is a structural diagram of the network traffic classification system of an embodiment of this application;
Fig. 5 is a schematic diagram of the hardware structure of a device for the network traffic classification method provided by an embodiment of this application.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the application and are not intended to limit it.
Refer to Fig. 1, which is a flowchart of the network traffic classification method of an embodiment of this application. The method comprises the following steps:
Step 100: collect network traffic data and label the network traffic data;
In step 100, the collection and labeling process for network traffic data is shown in Fig. 2, with the following specific steps:
Step 101: select the application categories in the network traffic;
Step 102: continuously capture the traffic of the fixed application categories using high-performance network monitoring software;
Step 103: collect the network traffic data packets corresponding to each application category and the network logs for the corresponding time period;
Step 104: analyze the network traffic data packets to find the inherent attributes of each application and the key information it exchanges with other applications, such as IP addresses and transport protocols;
Step 105: extract the IP endpoints and transmission packet numbers associated with each application from the network logs, associate and merge them with the IP addresses and transport protocols, and thereby complete the labeling of the network traffic data.
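The labeling in steps 104 and 105 can be pictured as a lookup that ties each packet to an application through its IP endpoint and transport protocol. The following is a minimal sketch under that assumption; the `Packet` record, the `build_label_index` and `label_packets` helpers, and the sample endpoints are all hypothetical and do not come from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    protocol: str  # transport protocol, e.g. "TCP" or "UDP"
    size: int      # packet size in bytes


def build_label_index(app_endpoints):
    """Map (ip, protocol) -> application name (assumed shape of the log data)."""
    index = {}
    for app, endpoints in app_endpoints.items():
        for ip, protocol in endpoints:
            index[(ip, protocol)] = app
    return index


def label_packets(packets, index, unknown="unknown"):
    """Attach an application label to each packet by matching either endpoint."""
    labeled = []
    for p in packets:
        label = (index.get((p.src_ip, p.protocol))
                 or index.get((p.dst_ip, p.protocol))
                 or unknown)
        labeled.append((p, label))
    return labeled


# Hypothetical endpoints that would be extracted from the network logs.
app_endpoints = {
    "video": [("203.0.113.10", "UDP")],
    "mail": [("198.51.100.7", "TCP")],
}
packets = [
    Packet("10.0.0.2", "203.0.113.10", "UDP", 1200),
    Packet("198.51.100.7", "10.0.0.2", "TCP", 300),
    Packet("10.0.0.2", "192.0.2.55", "TCP", 60),
]
labeled = label_packets(packets, build_label_index(app_endpoints))
labels = [label for _, label in labeled]
```

Packets that match no known endpoint receive a placeholder label here; how the source treats such traffic is not stated.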
Step 200: extract the bidirectional flow feature set from the labeled network traffic data, and optimize the bidirectional flow feature set using the maximum variance explanation mechanism;
In step 200, the extraction and optimization process for the bidirectional flow feature set is shown in Fig. 3 and specifically includes the following steps:
Step 201: analyze the labeled network traffic data and, for each {source IP address, destination IP address} pair in the network traffic data, collect the bidirectional (forward and reverse) network flow information between {source IP address -> destination IP address} and {destination IP address -> source IP address} based on the different port numbers; each {source IP address, destination IP address} pair then has network flow information in two opposite directions;
Step 202: find the forward network flows for each {source IP address -> destination IP address}, and extract from them the forward network flow feature set F1 covering every forward network flow;
Step 203: find the reverse network flows for each {destination IP address -> source IP address}, and extract from them the reverse network flow feature set F2 covering every reverse network flow;
Step 204: combine the forward and reverse network flow feature sets {F1, F2} of each {source IP address, destination IP address} pair to form the M-dimensional bidirectional flow feature set F, denoted F = {F1, F2};
In step 204, combining all forward and reverse network flow feature sets provides a unified basis for the subsequent optimization.
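Steps 201 to 204 can be sketched as grouping packets by unordered IP pair, splitting them by direction, and concatenating per-direction statistics. The sketch below is illustrative only: the three statistics per direction (packet count, total bytes, mean packet size, giving M = 6) and the rule that the first packet seen defines the forward direction are assumptions, and port numbers are ignored for brevity.

```python
from collections import defaultdict
from statistics import mean


def direction_stats(sizes):
    """Return [packet count, total bytes, mean packet size] for one direction."""
    return [len(sizes), sum(sizes), mean(sizes) if sizes else 0.0]


def bidirectional_features(packets):
    """packets: iterable of (src_ip, dst_ip, size_bytes).

    Returns {(src_ip, dst_ip): F1 + F2}, where the key is the direction of the
    first packet seen for the pair (assumed to be the forward direction).
    """
    forward_of = {}  # unordered IP pair -> (src, dst) of its first packet
    flows = defaultdict(lambda: {"fwd": [], "rev": []})
    for src, dst, size in packets:
        pair = forward_of.setdefault(frozenset((src, dst)), (src, dst))
        side = "fwd" if (src, dst) == pair else "rev"
        flows[pair][side].append(size)
    # F = {F1, F2}: forward statistics followed by reverse statistics.
    return {pair: direction_stats(d["fwd"]) + direction_stats(d["rev"])
            for pair, d in flows.items()}


packets = [
    ("10.0.0.2", "203.0.113.10", 100),
    ("203.0.113.10", "10.0.0.2", 1400),
    ("10.0.0.2", "203.0.113.10", 300),
    ("10.0.0.3", "198.51.100.7", 60),
]
features = bidirectional_features(packets)
```

A pair with traffic in only one direction still yields a full-length vector, with zeros for the missing direction, so every row has the same dimension M.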
Step 205: apply standard normalization to the network traffic data, normalizing the network traffic data set into a data set with mean 0 and variance 1; the normalization formula is x* = (x - u) / δ, where u is the mean of all network traffic data and δ is the standard deviation of all network traffic data;
Step 206: on the network traffic data, compute the mean of each feature in the bidirectional flow feature set F;
Step 207: subtract the corresponding mean of each feature from the normalized network traffic data to obtain a new result for each feature, and apply variance normalization to each new result;
Step 208: compute the covariance matrix of the bidirectional flow feature set F, sort the features in ascending order by the variance of each feature on the main diagonal of the covariance matrix, and obtain the N-dimensional features with the highest and closest degree of association in F;
In step 208, the main diagonal of the covariance matrix holds the variance of each feature, and the off-diagonal entries hold the pairwise covariances between features. A covariance greater than 0 indicates a positive correlation trend between two features; a covariance less than 0 indicates a negative correlation trend; a covariance equal to 0 indicates that the two features are linearly uncorrelated; the larger the absolute value of the covariance, the closer the relationship between the two features, and the smaller the absolute value, the weaker the relationship. From these five conditions, the N-dimensional features with the highest and closest degree of association in F can be computed. The application uses the maximum variance explanation mechanism to preferentially combine the most closely associated features of the bidirectional network flow feature set in the network traffic data, selecting the feature set that best reflects the network flow category.
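Steps 205 to 208 can be illustrated with NumPy on a toy matrix. This is a sketch under assumptions rather than the applicant's implementation: normalization is applied per feature (column-wise), which is one reading of the formula x* = (x - u) / δ, and the sample data is invented for illustration.

```python
import numpy as np


def standardize(X):
    """Column-wise z-score, x* = (x - u) / delta, applied per feature (assumed)."""
    u = X.mean(axis=0)
    delta = X.std(axis=0)
    delta[delta == 0] = 1.0  # leave constant features unscaled
    return (X - u) / delta


# Toy data: 4 samples, 3 features. Feature 1 rises with feature 0,
# feature 2 falls as feature 0 rises.
X = np.array([[1.0, 2.0, 10.0],
              [2.0, 4.0, 8.0],
              [3.0, 6.0, 6.0],
              [4.0, 8.0, 4.0]])
Z = standardize(X)

# Covariance matrix of the features: variances on the main diagonal,
# pairwise covariances off the diagonal. bias=True divides by the sample count.
C = np.cov(Z, rowvar=False, bias=True)

positive_pair = C[0, 1] > 0   # features 0 and 1 move together
negative_pair = C[0, 2] < 0   # features 0 and 2 move in opposite directions
```

Because the data is standardized first, every diagonal entry of `C` equals 1 and the off-diagonal entries are correlation coefficients, which makes the sign and magnitude conditions of step 208 easy to read off.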
Step 209: compute the eigenvalues and eigenvectors of the covariance matrix, sort the eigenvalues by size, and select the eigenvectors corresponding to the top N optimized bidirectional flow features;
Step 210: project the network traffic data onto the N selected eigenvectors. Assume the number of network traffic data samples is p and the number of features is q. The sample matrix obtained by subtracting the feature means from the network traffic data is DataTransform (p×q), the covariance matrix of the bidirectional flow feature set is q×q, and the matrix formed by the N selected eigenvectors is EigenVectors (q×N). The projected network traffic data is then: OptimizeData (p×N) = DataTransform (p×q) × EigenVectors (q×N);
In step 210, projecting the network traffic data onto the eigenvectors corresponding to the optimized bidirectional flow features increases the cohesion of the data, reduces the influence of noisy data, and improves classification precision.
Step 211: the M-dimensional bidirectional flow feature set of the network traffic data is thereby optimized into an N-dimensional bidirectional flow feature set.
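The eigendecomposition and projection of steps 209 to 211 can be sketched with NumPy as follows. The variable names mirror DataTransform, EigenVectors, and OptimizeData from step 210; the function name `pca_project`, the random test data, and the choice of `numpy.linalg.eigh` (suitable because a covariance matrix is symmetric) are assumptions for illustration.

```python
import numpy as np


def pca_project(X, n_components):
    """Project X (p x q) onto the top-N eigenvectors of its covariance matrix."""
    data_transform = X - X.mean(axis=0)               # p x q, feature means removed
    cov = np.cov(data_transform, rowvar=False)        # q x q covariance matrix
    eigenvalues, eigenvectors = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigenvalues)[::-1][:n_components]  # indices of the N largest
    eigen_vectors = eigenvectors[:, order]            # q x N
    optimize_data = data_transform @ eigen_vectors    # (p x q) @ (q x N) -> p x N
    return optimize_data, eigenvalues[order]


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))   # p = 50 samples, q = M = 6 features (synthetic)
X[:, 3] = 2.0 * X[:, 0]        # make one feature redundant with another
reduced, top_eigenvalues = pca_project(X, n_components=3)  # N = 3
```

The variance of each projected column equals the corresponding eigenvalue, so keeping the N largest eigenvalues keeps the directions that explain the most variance.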
Step 300: based on the optimized bidirectional flow feature set, construct a classification model using a supervised random forest algorithm, and output the classification results of the network traffic data through the classification model;
In step 300, modeling uses a supervised random forest algorithm: the optimized bidirectional flow feature set is fed to the classification model for classification training, and the model's performance is evaluated and optimized. The trained classification model is tested with the test data set of the validation stage. The test results show that the classification model built on the optimized bidirectional flow feature set achieves high classification precision and can improve classification efficiency while maintaining high classification accuracy, thereby improving overall performance.
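A supervised random forest of the kind named in step 300 could be trained as in the sketch below, which uses scikit-learn. The source does not name a library, hyperparameters, or a data split, so the use of `RandomForestClassifier`, 100 trees, a 75/25 split, and the synthetic two-class data standing in for the optimized N-dimensional feature set are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the optimized feature set: two application
# classes with N = 4 features each, drawn around different centers.
n_per_class = 200
X = np.vstack([rng.normal(loc=0.0, size=(n_per_class, 4)),
               rng.normal(loc=3.0, size=(n_per_class, 4))])
y = np.array([0] * n_per_class + [1] * n_per_class)

# Hold out a test set for the validation stage.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

predictions = model.predict(X_test)      # classification results
accuracy = model.score(X_test, y_test)   # fraction of correct predictions
```

The accuracy on this toy data says nothing about the accuracy reported in the source; it only shows where the evaluation step fits.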
Refer to Fig. 4, which is a structural diagram of the network traffic classification system of an embodiment of this application. The system includes a data acquisition module, a data preprocessing module, a feature extraction module, a feature optimization module, and a model construction module.
Data acquisition module: used to collect network traffic data. The collection procedure includes: selecting the application categories in the network traffic, continuously capturing the traffic of the fixed application categories using high-performance network monitoring software, and collecting the network traffic data packets corresponding to each application category and the network logs for the corresponding time period.
Data preprocessing module: used to label the network traffic data. The labeling procedure specifically includes: analyzing the network traffic data packets to find the inherent attributes of each application and the key information it exchanges with other applications, such as IP addresses and transport protocols; extracting the IP endpoints and transmission packet numbers associated with each application from the network logs, associating and merging them with the IP addresses and transport protocols, and thereby completing the labeling of the network traffic data.
Feature extraction module: used to extract the bidirectional flow feature set from the labeled network traffic data. Specifically, the extraction procedure includes:
a. Analyze the labeled network traffic data and, for each {source IP address, destination IP address} pair in the network traffic data, collect the bidirectional (forward and reverse) network flow information between {source IP address -> destination IP address} and {destination IP address -> source IP address} based on the different port numbers; each {source IP address, destination IP address} pair then has network flow information in two opposite directions;
b. Find the forward network flows for each {source IP address -> destination IP address}, and extract from them the forward network flow feature set F1 covering every forward network flow;
c. Find the reverse network flows for each {destination IP address -> source IP address}, and extract from them the reverse network flow feature set F2 covering every reverse network flow;
d. Combine the forward and reverse network flow feature sets {F1, F2} of each {source IP address, destination IP address} pair to form the M-dimensional bidirectional flow feature set F, denoted F = {F1, F2}.
Characteristic optimization module: for being optimized using bidirectional flow feature set of the maximum variance explanation facility to extraction;Tool Body, bidirectional flow feature set optimal way includes:
A, standard normalization is carried out to network flow data, network flow data collection is normalized to mean value is 0, variance is 1 data set;Normalize formula are as follows: x*=(x-u)/δ, wherein u is the mean value of all-network data on flows, and δ is all-network The standard deviation of data on flows;
B, on network flow data, the average value of each feature on bidirectional flow feature set F is found out;
C, the corresponding average value of each feature is subtracted with normalized network flow data, obtains the new knot of each feature Fruit, and normalized square mean is done to the new result of each feature;
D, the covariance matrix of bidirectional flow feature set F is calculated, and according in covariance matrix each of on leading diagonal The variance yields of feature carries out ascending sequence, obtains the most close N-dimensional feature of degree of association highest in bidirectional flow feature set F;Its In, it is the covariance two-by-two between feature on leading diagonal, covariance is greater than 0, indicates the trend that is positively correlated between two features; Covariance indicates negatively correlated trend between two features less than 0;Covariance is equal to 0, indicates independent between two features;Association Variance absolute value is bigger, contacted between two features it is closer, otherwise it is smaller.According to this 5 conditions, can be calculated two-way Flow the most close N-dimensional feature of degree of association highest in feature set F.The application is using maximum variance explanation facility to network flow data In the most close feature of bilateral network stream feature set degree of being associated preferentially combined, filter out and best embody network flow The feature set of classification.
E, the characteristic value and feature vector of covariance matrix are calculated, and characteristic value is ranked up by size, selects top n The corresponding feature vector of bidirectional flow feature after optimization;
F, network flow data is projected in N number of feature vector of selection: assuming that network flow data sample number is p, Characteristic is q, and it is DataTransform (p*q), bidirectional flow feature that network flow data, which subtracts the sample matrix after characteristic mean, The covariance matrix of collection is p*q, and the matrix of N number of feature vector composition of selection is EigenVectors (q*N), then after projection Network flow data are as follows: OptimizeData (p*N)=DataTransform (p*q) X EigenVectors (q*N);
G. The M-dimensional bidirectional flow feature set of the network traffic data is thereby optimized into an N-dimensional bidirectional flow feature set.
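Steps C to G can be sketched in Python as follows. This is a minimal illustration that assumes NumPy is available; it is not the applicants' implementation. The function name `optimize_features` and its parameters are hypothetical, while the variables `data_transform`, `eigen_vectors`, and `optimize_data` mirror DataTransform, EigenVectors, and OptimizeData in step F. The sketch ranks components by eigenvalue, in the manner of a standard principal component analysis.

```python
import numpy as np


def optimize_features(data, n_components):
    """Reduce a (p, q) feature matrix to (p, N), following steps C to G."""
    data = np.asarray(data, dtype=float)

    # Step C: subtract each feature's mean, then scale to unit variance.
    centered = data - data.mean(axis=0)
    std = centered.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # assumption: leave constant features unscaled
    data_transform = centered / std  # shape (p, q)

    # Step D: covariance matrix of the q features, shape (q, q).
    cov = np.cov(data_transform, rowvar=False)

    # Step E: eigh handles symmetric matrices and returns eigenvalues in
    # ascending order, so reverse the ordering to put the largest first.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1][:n_components]
    eigen_vectors = eigenvectors[:, order]  # shape (q, N)

    # Steps F and G: project the p samples onto the N selected eigenvectors.
    optimize_data = data_transform @ eigen_vectors  # shape (p, N)
    return optimize_data, eigenvalues[order]
```

Calling `optimize_features(features, 2)` on a 50×4 matrix would return a 50×2 matrix together with the two largest eigenvalues; the choice of N is left to the caller.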
Model construction module: builds a classification model from the optimized bidirectional flow feature set using a supervised-learning random forest algorithm, and outputs the classification results of the network traffic data through the classification model. Specifically, the model is built with the supervised-learning random forest algorithm, the optimized bidirectional flow feature set is fed into the classification model for classification training, and the performance of the classification model is evaluated and then optimized. The trained classification model is tested on the test data set of the validation stage. The test results show that the classification model built on the optimized bidirectional flow feature set achieves high classification accuracy and improves classification efficiency while maintaining that accuracy, thereby improving overall performance.
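The training and evaluation described above could look roughly like the following sketch, which assumes scikit-learn is installed. The function name `train_flow_classifier`, the 70/30 train/test split, and the choice of 100 trees are illustrative assumptions and do not come from the application.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train_flow_classifier(features, labels, seed=0):
    """Train a random forest on flow features and report held-out accuracy."""
    # Hold out part of the labeled data for the validation stage
    # (the 30% share is an assumption made for this sketch).
    x_train, x_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.3, random_state=seed, stratify=labels
    )
    # Supervised random forest; the tree count is an illustrative choice.
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(x_train, y_train)
    # Evaluate the trained model on the held-out flows.
    accuracy = accuracy_score(y_test, model.predict(x_test))
    return model, accuracy
```

Here `features` would be the N-dimensional optimized bidirectional flow feature set and `labels` the application categories assigned during labeling.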
Fig. 5 is a schematic diagram of the hardware device structure for the network traffic classification method provided by an embodiment of the present application. As shown in Fig. 5, the device includes one or more processors and a memory. Taking one processor as an example, the device may further include an input system and an output system.
The processor, memory, input system, and output system may be connected by a bus or by other means; Fig. 5 takes a bus connection as an example.
As a non-transitory computer-readable storage medium, the memory can store non-transitory software programs, non-transitory computer-executable programs, and modules. By running the non-transitory software programs, instructions, and modules stored in the memory, the processor executes the various functional applications and data processing of the electronic device, that is, it implements the processing method of the above method embodiments.
The memory may include a program storage area and a data storage area. The program storage area can store the operating system and the application programs required by at least one function; the data storage area can store data and the like. In addition, the memory may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and such remote memory can be connected to the processing system through a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
The input system can receive input numeric or character information and generate signal inputs. The output system may include display devices such as a display screen.
The one or more modules are stored in the memory and, when executed by the one or more processors, perform the following operations of any of the above method embodiments:
Step a: acquire network traffic data and label the network traffic data;
Step b: extract a bidirectional flow feature set from the labeled network traffic data;
Step c: build a classification model based on the bidirectional flow feature set, and output the classification results of the network traffic data through the classification model.
The above product can execute the method provided by the embodiments of the present application and has the functional modules and beneficial effects corresponding to executing the method. For technical details not described in detail in this embodiment, refer to the method provided by the embodiments of the present application.
An embodiment of the present application provides a non-transitory (non-volatile) computer storage medium that stores computer-executable instructions, and the computer-executable instructions can perform the following operations:
Step a: acquire network traffic data and label the network traffic data;
Step b: extract a bidirectional flow feature set from the labeled network traffic data;
Step c: build a classification model based on the bidirectional flow feature set, and output the classification results of the network traffic data through the classification model.
An embodiment of the present application provides a computer program product that includes a computer program stored on a non-transitory computer-readable storage medium. The computer program includes program instructions that, when executed by a computer, cause the computer to perform the following operations:
Step a: acquire network traffic data and label the network traffic data;
Step b: extract a bidirectional flow feature set from the labeled network traffic data;
Step c: build a classification model based on the bidirectional flow feature set, and output the classification results of the network traffic data through the classification model.
The network traffic classification method, system, and electronic device of the embodiments of the present application classify network flows using the bidirectional flow features in the network traffic data, and can accurately identify and classify the large number of new applications on the Internet. At the same time, the maximum-variance explanation tool is used to optimize the association among the bidirectional flow features, which ensures high cohesion of the bidirectional flow features, improves classification accuracy, and effectively guarantees both high accuracy and high performance of network traffic classification.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (11)

1. A network traffic classification method, characterized by comprising the following steps:
Step a: acquire network traffic data and label the network traffic data;
Step b: extract a bidirectional flow feature set from the labeled network traffic data;
Step c: build a classification model based on the bidirectional flow feature set, and output the classification results of the network traffic data through the classification model.
2. The network traffic classification method according to claim 1, characterized in that, in step a, acquiring the network traffic data and labeling the network traffic data specifically comprises:
Step a1: select the application categories in the network traffic;
Step a2: collect the network traffic data packets corresponding to each application and the network logs for the corresponding time period;
Step a3: analyze the network traffic data packets to find the natural attributes of each application and the IP addresses and transport protocols it uses to communicate with other applications;
Step a4: extract from the network logs the IP endpoints and the number of transmitted packets associated with each application, and fuse them with the IP addresses and transport protocols to complete the labeling of the network traffic data.
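As a rough illustration of steps a1 to a4, the sketch below labels packets by matching them against log records. The `Packet` and `LogEntry` record layouts, the function names, and the matching rule (endpoint IP address plus transport protocol) are hypothetical simplifications; the claim does not prescribe a data format.

```python
from typing import NamedTuple, Optional


class Packet(NamedTuple):
    src_ip: str
    dst_ip: str
    protocol: str  # transport protocol, e.g. "TCP" or "UDP"


class LogEntry(NamedTuple):
    app: str          # application category selected in step a1
    endpoint_ip: str  # IP endpoint the log associates with the application
    protocol: str


def build_label_index(log_entries):
    """Map (endpoint IP, protocol) to the application named in the log."""
    return {(e.endpoint_ip, e.protocol): e.app for e in log_entries}


def label_packet(packet, index) -> Optional[str]:
    """Return the application label for a packet, or None if nothing matches."""
    # Check the destination first, then the source, so that traffic in
    # either direction of the same conversation receives the same label.
    for ip in (packet.dst_ip, packet.src_ip):
        app = index.get((ip, packet.protocol))
        if app is not None:
            return app
    return None


def label_packets(packets, log_entries):
    """Fuse packet data with log data and return (packet, label) pairs."""
    index = build_label_index(log_entries)
    return [(p, label_packet(p, index)) for p in packets]
```

Packets that match no log record are returned with the label `None`, so a caller can decide whether to discard them or inspect them manually.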
3. The network traffic classification method according to claim 2, characterized in that, in step b, extracting the bidirectional flow feature set from the labeled network traffic data specifically comprises:
Step b1: analyze the labeled network traffic data and, for each {source IP address, destination IP address} pair in the network traffic data, collect the bidirectional network flow information between {source IP address -> destination IP address} and {destination IP address -> source IP address} based on different port numbers;
Step b2: find the forward network flow for each {source IP address -> destination IP address}, and extract all forward network flow feature sets from the forward network flow;
Step b3: find the reverse network flow for each {destination IP address -> source IP address}, and extract all reverse network flow feature sets from the reverse network flow;
Step b4: combine the forward and reverse network flow feature sets of each {source IP address, destination IP address} pair to form a bidirectional flow feature set with M-dimensional features.
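Steps b1 to b4 can be illustrated with a small sketch that groups packets into bidirectional flows and builds a combined feature vector. The `FlowPacket` layout, the rule that the first packet seen defines the forward direction, and the four example features (so M = 4 here) are assumptions made for illustration only.

```python
from typing import NamedTuple


class FlowPacket(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    size: int  # packet length in bytes


def bidirectional_features(packets):
    """Return {flow key: [fwd_packets, fwd_bytes, rev_packets, rev_bytes]}.

    The flow key is the (src_ip, src_port, dst_ip, dst_port) tuple of the
    first packet seen, which this sketch treats as the forward direction.
    """
    flows = {}
    for p in packets:
        fwd_key = (p.src_ip, p.src_port, p.dst_ip, p.dst_port)
        rev_key = (p.dst_ip, p.dst_port, p.src_ip, p.src_port)
        if fwd_key in flows:
            stats, offset = flows[fwd_key], 0   # forward direction (step b2)
        elif rev_key in flows:
            stats, offset = flows[rev_key], 2   # reverse direction (step b3)
        else:
            stats, offset = [0, 0, 0, 0], 0     # new flow, first packet is forward
            flows[fwd_key] = stats
        stats[offset] += 1           # packet count for this direction
        stats[offset + 1] += p.size  # byte count for this direction
    # Step b4: each value already holds forward and reverse features together.
    return flows
```

A fuller feature set would add further per-direction statistics, such as inter-arrival times or packet size distributions, to the same vector.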
4. The network traffic classification method according to claim 3, characterized in that step b further comprises: optimizing the bidirectional flow feature set using a maximum-variance explanation tool.
5. The network traffic classification method according to claim 4, characterized in that optimizing the bidirectional flow feature set using the maximum-variance explanation tool specifically comprises:
Step b5: apply standard normalization to the network traffic data;
Step b6: on the network traffic data, compute the mean of each feature in the bidirectional flow feature set;
Step b7: subtract the corresponding mean of each feature from the normalized network traffic data to obtain a new result for each feature, and apply variance normalization to the new result of each feature;
Step b8: compute the covariance matrix of the bidirectional flow feature set, and sort the variance values of the features on the main diagonal of the covariance matrix in ascending order to obtain the N-dimensional features with the highest and closest degree of association in the bidirectional flow feature set;
Step b9: compute the eigenvalues and eigenvectors of the covariance matrix, sort the eigenvalues by magnitude, and select the eigenvectors corresponding to the top N optimized bidirectional flow features;
Step b10: project the network traffic data onto the N eigenvectors;
Step b11: optimize the M-dimensional bidirectional flow feature set of the network traffic data into an N-dimensional bidirectional flow feature set.
6. A network traffic classification system, characterized by comprising:
Data acquisition module: for acquiring network traffic data;
Data preprocessing module: for labeling the network traffic data;
Feature extraction module: for extracting a bidirectional flow feature set from the labeled network traffic data;
Model construction module: for building a classification model based on the bidirectional flow feature set and outputting the classification results of the network traffic data through the classification model.
7. The network traffic classification system according to claim 6, characterized in that
the data acquisition module acquiring network traffic data specifically comprises: selecting the application categories in the network traffic, and collecting the network traffic data packets corresponding to each application and the network logs for the corresponding time period;
the data preprocessing module labeling the network traffic data specifically comprises: analyzing the network traffic data packets to find the natural attributes of each application and the IP addresses and transport protocols it uses to communicate with other applications; and extracting from the network logs the IP endpoints and the number of transmitted packets associated with each application, and fusing them with the IP addresses and transport protocols to complete the labeling of the network traffic data.
8. The network traffic classification system according to claim 7, characterized in that the feature extraction module extracting the bidirectional flow feature set from the labeled network traffic data specifically comprises:
analyzing the labeled network traffic data and, for each {source IP address, destination IP address} pair in the network traffic data, collecting the bidirectional network flow information between {source IP address -> destination IP address} and {destination IP address -> source IP address} based on different port numbers;
finding the forward network flow for each {source IP address -> destination IP address}, and extracting all forward network flow feature sets from the forward network flow;
finding the reverse network flow for each {destination IP address -> source IP address}, and extracting all reverse network flow feature sets from the reverse network flow;
combining the forward and reverse network flow feature sets of each {source IP address, destination IP address} pair to form a bidirectional flow feature set with M-dimensional features.
9. The network traffic classification system according to claim 8, characterized by further comprising a feature optimization module, the feature optimization module being used to optimize the bidirectional flow feature set using a maximum-variance explanation tool.
10. The network traffic classification system according to claim 9, characterized in that the feature optimization module optimizing the bidirectional flow feature set using the maximum-variance explanation tool specifically comprises:
applying standard normalization to the network traffic data;
on the network traffic data, computing the mean of each feature in the bidirectional flow feature set;
subtracting the corresponding mean of each feature from the normalized network traffic data to obtain a new result for each feature, and applying variance normalization to the new result of each feature;
computing the covariance matrix of the bidirectional flow feature set, and sorting the variance values of the features on the main diagonal of the covariance matrix in ascending order to obtain the N-dimensional features with the highest and closest degree of association in the bidirectional flow feature set;
computing the eigenvalues and eigenvectors of the covariance matrix, sorting the eigenvalues by magnitude, and selecting the eigenvectors corresponding to the top N optimized bidirectional flow features;
projecting the network traffic data onto the N eigenvectors;
optimizing the M-dimensional bidirectional flow feature set of the network traffic data into an N-dimensional bidirectional flow feature set.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the following operations of the network traffic classification method according to any one of claims 1 to 5:
Step a: acquire network traffic data and label the network traffic data;
Step b: extract a bidirectional flow feature set from the labeled network traffic data;
Step c: build a classification model based on the bidirectional flow feature set, and output the classification results of the network traffic data through the classification model.
CN201811113686.XA 2018-09-25 2018-09-25 Network traffic classification method and system and electronic equipment Active CN109309630B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811113686.XA CN109309630B (en) 2018-09-25 2018-09-25 Network traffic classification method and system and electronic equipment
PCT/CN2018/112401 WO2020062390A1 (en) 2018-09-25 2018-10-29 Network traffic classification method and system, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811113686.XA CN109309630B (en) 2018-09-25 2018-09-25 Network traffic classification method and system and electronic equipment

Publications (2)

Publication Number Publication Date
CN109309630A true CN109309630A (en) 2019-02-05
CN109309630B CN109309630B (en) 2021-09-21

Family

ID=65225067

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811113686.XA Active CN109309630B (en) 2018-09-25 2018-09-25 Network traffic classification method and system and electronic equipment

Country Status (2)

Country Link
CN (1) CN109309630B (en)
WO (1) WO2020062390A1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110097120A (en) * 2019-04-30 2019-08-06 南京邮电大学 Network flow data classification method, equipment and computer storage medium
CN110149280A (en) * 2019-05-27 2019-08-20 中国科学技术大学 Network traffic classification method and device
CN110365603A (en) * 2019-06-28 2019-10-22 西安交通大学 A kind of self adaptive network traffic classification method open based on 5G network capabilities
CN112995063A (en) * 2021-04-19 2021-06-18 北京智源人工智能研究院 Flow monitoring method, device, equipment and medium
CN113114672A (en) * 2021-04-12 2021-07-13 常熟市国瑞科技股份有限公司 Video transmission data fine measurement method
CN113746686A (en) * 2020-05-27 2021-12-03 阿里巴巴集团控股有限公司 Network flow state determination method, computing device and storage medium
CN117197591A (en) * 2023-11-06 2023-12-08 青岛创新奇智科技集团股份有限公司 Data classification method based on machine learning
WO2024065185A1 (en) * 2022-09-27 2024-04-04 西门子股份公司 Device classification method and apparatus, electronic device, and computer-readable storage medium

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111698223B (en) * 2020-05-22 2022-02-22 哈尔滨工程大学 Encrypted WEB fingerprint identification method based on automatic feature engineering
CN111817971B (en) * 2020-06-12 2023-03-24 华为技术有限公司 Data center network flow splicing method based on deep learning
CN111970305B (en) * 2020-08-31 2022-08-12 福州大学 Abnormal flow detection method based on semi-supervised descent and Tri-LightGBM
CN112448868B (en) * 2020-12-02 2022-09-30 新华三人工智能科技有限公司 Network traffic data identification method, device and equipment
CN112804253B (en) * 2021-02-04 2022-07-12 湖南大学 Network flow classification detection method, system and storage medium
CN112839055B (en) * 2021-02-04 2022-08-23 北京六方云信息技术有限公司 Network application identification method and device for TLS encrypted traffic and electronic equipment
CN113098735B (en) * 2021-03-31 2022-10-11 上海天旦网络科技发展有限公司 Inference-oriented application flow and index vectorization method and system
CN113141357B (en) * 2021-04-19 2022-02-18 湖南大学 Feature selection method and system for optimizing network intrusion detection performance
CN113315721B (en) * 2021-05-26 2023-01-17 恒安嘉新(北京)科技股份公司 Network data feature processing method, device, equipment and storage medium
CN113556317B (en) * 2021-06-07 2022-10-11 中国科学院信息工程研究所 Abnormal flow detection method and device based on network flow structural feature fusion
CN114928560B (en) * 2022-05-16 2023-01-31 珠海市鸿瑞信息技术股份有限公司 Big data based network flow and equipment log cooperative management system and method
CN116647877B (en) * 2023-06-12 2024-03-15 广州爱浦路网络技术有限公司 Flow category verification method and system based on graph convolution model
CN116662817B (en) * 2023-07-31 2023-11-24 北京天防安全科技有限公司 Asset identification method and system of Internet of things equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080162088A1 (en) * 2005-05-03 2008-07-03 Devaul Richard W Method and system for real-time signal classification
CN102394827A (en) * 2011-11-09 2012-03-28 浙江万里学院 Hierarchical classification method for internet flow
CN104052639A (en) * 2014-07-02 2014-09-17 山东大学 Real-time multi-application network flow identification method based on support vector machine
CN106874879A (en) * 2017-02-21 2017-06-20 华南师范大学 Handwritten Digit Recognition method based on multiple features fusion and deep learning network extraction
CN107967311A (en) * 2017-11-20 2018-04-27 阿里巴巴集团控股有限公司 A kind of method and apparatus classified to network data flow

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103973589B (en) * 2013-09-12 2017-04-12 哈尔滨理工大学 Network traffic classification method and device
CN104767692B (en) * 2015-04-15 2018-05-29 中国电力科学研究院 A network traffic classification method
CN106487535B (en) * 2015-08-24 2020-04-28 中兴通讯股份有限公司 Method and device for classifying network traffic data
US10785247B2 (en) * 2017-01-24 2020-09-22 Cisco Technology, Inc. Service usage model for traffic analysis
US20200211721A1 (en) * 2017-03-02 2020-07-02 Singapore University Of Technology And Design METHOD AND APPARATUS FOR DETERMINING AN IDENTITY OF AN UNKNOWN INTERNET-OF-THINGS (IoT) DEVICE IN A COMMUNICATION NETWORK

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110097120A (en) * 2019-04-30 2019-08-06 南京邮电大学 Network flow data classification method, equipment and computer storage medium
CN110097120B (en) * 2019-04-30 2022-08-26 南京邮电大学 Network flow data classification method, equipment and computer storage medium
CN110149280A (en) * 2019-05-27 2019-08-20 中国科学技术大学 Network traffic classification method and device
CN110149280B (en) * 2019-05-27 2020-08-28 中国科学技术大学 Network traffic classification method and device
CN110365603A (en) * 2019-06-28 2019-10-22 西安交通大学 A kind of self adaptive network traffic classification method open based on 5G network capabilities
CN113746686A (en) * 2020-05-27 2021-12-03 阿里巴巴集团控股有限公司 Network flow state determination method, computing device and storage medium
CN113114672B (en) * 2021-04-12 2023-02-28 常熟市国瑞科技股份有限公司 Video transmission data fine measurement method
CN113114672A (en) * 2021-04-12 2021-07-13 常熟市国瑞科技股份有限公司 Video transmission data fine measurement method
CN112995063B (en) * 2021-04-19 2021-10-08 北京智源人工智能研究院 Flow monitoring method, device, equipment and medium
CN112995063A (en) * 2021-04-19 2021-06-18 北京智源人工智能研究院 Flow monitoring method, device, equipment and medium
WO2024065185A1 (en) * 2022-09-27 2024-04-04 西门子股份公司 Device classification method and apparatus, electronic device, and computer-readable storage medium
CN117197591A (en) * 2023-11-06 2023-12-08 青岛创新奇智科技集团股份有限公司 Data classification method based on machine learning
CN117197591B (en) * 2023-11-06 2024-03-12 青岛创新奇智科技集团股份有限公司 Data classification method based on machine learning

Also Published As

Publication number Publication date
CN109309630B (en) 2021-09-21
WO2020062390A1 (en) 2020-04-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant