CN108494620B - Network service flow characteristic selection and classification method - Google Patents

Info

Publication number
CN108494620B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810169202.7A
Other languages
Chinese (zh)
Other versions
CN108494620A (en)
Inventor
董育宁
张咪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN201810169202.7A
Publication of CN108494620A
Application granted
Publication of CN108494620B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/06 Generation of reports
    • H04L43/062 Generation of reports related to network traffic
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/02 Capturing of monitoring data
    • H04L43/026 Capturing of monitoring data using flow identification
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/02 Capturing of monitoring data
    • H04L43/028 Capturing of monitoring data by filtering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0876 Network utilisation, e.g. volume of load or congestion level
    • H04L43/0894 Packet rate

Abstract

The invention discloses a service flow feature selection and classification method based on a multi-objective adaptive evolutionary algorithm. Adaptive crossover and mutation maintain the diversity of the population and ensure the convergence of the algorithm. The invention also uses a purpose-designed three-layer KNN classifier model to classify six multimedia service flows: online standard-definition live video, web browsing (Baidu), online audio, web browsing (Sina), network voice chat, and online standard-definition non-live video. Experimental results show that the method achieves higher classification accuracy than existing methods.

Description

Network service flow characteristic selection and classification method
Technical Field
The invention belongs to the technical field of pattern recognition and classification, and particularly relates to a network service flow feature selection and classification method based on a multi-objective adaptive evolutionary algorithm.
Background
With the rapid development of the Internet in recent years, accurate and efficient classification of network flows has become an important basis for network management. The diversity of network multimedia service flow types makes their classification and identification a significant challenge. Traditional flow classification falls into three categories: port-based methods, deep packet inspection methods, and methods based on the statistical characteristics of multimedia flows. However, with the spread of data encryption, new applications, and dynamic ports, the first two categories are no longer applicable. Most researchers now focus on machine learning classifiers such as decision trees, support vector machines (SVM), and C5.0.
In practical applications the feature dimension is often very high, and irrelevant and redundant features lead to long training times, high model complexity, and poor generalization. Feature selection filters out irrelevant and redundant features, which quickly reduces dimensionality and improves model accuracy. According to the evaluation function used, feature selection algorithms are divided into filter, wrapper, and embedded types. Filter-type feature selection is independent of any specific classifier. The wrapper type combines feature selection with classifier design and uses classification accuracy to evaluate candidate features and select the optimal subset. The embedded type makes feature selection part of classifier training and selects a subset by analyzing the classification results of the resulting model. Common feature selection criteria include the information gain ratio (GR), the Pearson correlation coefficient, and the chi-square statistic. When the feature dimension is too high, a search algorithm is needed to improve efficiency, and many search algorithms have been applied to feature selection in recent years, such as sequential forward selection (SFS), sequential backward selection (SBS), and the plus-L minus-R selection algorithm. Intelligent optimization search algorithms, such as evolutionary algorithms (EA) and particle swarm optimization, have become a research hotspot and are widely applied to feature selection. However, these methods consider only a single criterion when searching for feature subsets and ignore the cardinality of the selected subset; they are all single-objective feature selection methods.
Multi-objective optimization evaluates the quality of feature subsets from several angles and optimizes the evaluation indexes simultaneously as objective functions. Inspired by natural biological evolution, researchers have proposed multi-objective evolutionary algorithms, such as the evolutionary non-dominated radial slots based algorithm (ENORA), for solving multi-objective optimization problems. However, when the feature dimension is high, irrelevant and redundant features increase the time complexity of multi-objective optimization. For evolutionary algorithms, poorly chosen population initialization and crossover and mutation probabilities reduce both the final classification accuracy and the convergence of the algorithm. Moreover, most existing multi-objective feature selection algorithms use classifier accuracy as one objective function, which leads to slow convergence and long running times.
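To make the idea of optimizing several objectives at once concrete, the following minimal Python sketch reduces a set of candidate feature subsets to its Pareto front. It assumes both objectives are minimized and uses made-up (error rate, feature count) pairs; the names `dominates` and `pareto_front` are illustrative and do not come from the patent.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates: (error rate, number of selected features).
candidates = [(0.10, 12), (0.05, 20), (0.10, 15), (0.02, 25), (0.06, 20)]
front = pareto_front(candidates)
print(front)
```

A single-objective search would return only one of these candidates, whereas the front keeps every trade-off between error and subset size that is not strictly beaten by another candidate.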
Disclosure of Invention
The technical problem to be solved by the invention is as follows:
In order to overcome the defects of the above algorithms, the invention provides a network service flow feature selection and classification method based on a multi-objective adaptive evolutionary algorithm.
The invention adopts the following technical scheme to solve the technical problem:
The invention provides a network service flow feature selection and classification method based on a multi-objective adaptive evolutionary algorithm, comprising the following steps:
(1) data collection and preprocessing: collecting data flow samples of various multimedia services on the Internet and preprocessing them;
(2) feature selection and analysis: analyzing the statistical features of the network data flow samples and selecting the feature combination that effectively distinguishes the service flows;
(3) service flow classification and verification: running classification experiments on the network multimedia service flows with the three-layer KNN classifier, obtaining the classification results, and calculating the overall classification accuracy.
Further, in the network multimedia service flow feature selection and classification method based on the multi-objective adaptive evolutionary algorithm provided by the invention, the data collection and preprocessing specifically comprise the following steps:
(2.1) capturing required multimedia service flow data through network packet analysis software WireShark in an open internet environment, and then converting the original data into a standard five-tuple text format, wherein the five-tuple text format comprises the arrival time of a data packet, a source IP address, a destination IP address, a protocol and the packet size of the data packet;
(2.2) performing basic statistical feature calculation on a standard five-tuple file of the original multimedia service stream, wherein the statistical features comprise: uplink/downlink packet size, entropy of uplink/downlink packet size information, overall packet size, uplink/downlink packet arrival time interval, downlink data packet rate, downlink byte rate, and ratio of uplink and downlink byte number.
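As an illustration of step (2.2), the sketch below computes a few of the listed statistics from five-tuple records in Python. The record layout follows the five-tuple format described above; using a known client IP address to separate uplink from downlink packets, the function names, and the sample packets are assumptions made for this example and are not taken from the patent.

```python
import math
from collections import Counter
from statistics import mean


def size_entropy(sizes):
    """Shannon entropy (in bits) of the empirical packet-size distribution."""
    if not sizes:
        return 0.0
    total = len(sizes)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(sizes).values())


def flow_features(packets, client_ip):
    """packets: list of (arrival_time_s, src_ip, dst_ip, protocol, size_bytes)."""
    up_sizes = [p[4] for p in packets if p[1] == client_ip]
    down = [p for p in packets if p[2] == client_ip]
    down_sizes = [p[4] for p in down]
    duration = max(p[0] for p in packets) - min(p[0] for p in packets)
    down_times = sorted(p[0] for p in down)
    gaps = [b - a for a, b in zip(down_times, down_times[1:])]
    return {
        "down_max_size": max(down_sizes, default=0),
        "down_size_entropy": size_entropy(down_sizes),
        "up_bytes": sum(up_sizes),
        "down_mean_gap": mean(gaps) if gaps else 0.0,
        "down_packet_rate": len(down) / duration if duration > 0 else 0.0,
        "down_byte_rate": sum(down_sizes) / duration if duration > 0 else 0.0,
        "up_down_byte_ratio": (sum(up_sizes) / sum(down_sizes)
                               if down_sizes else 0.0),
    }


# Hypothetical capture: one uplink request followed by three downlink packets.
packets = [
    (0.0, "10.0.0.2", "93.184.216.34", "TCP", 100),
    (0.5, "93.184.216.34", "10.0.0.2", "TCP", 1500),
    (1.0, "93.184.216.34", "10.0.0.2", "TCP", 1500),
    (2.0, "93.184.216.34", "10.0.0.2", "TCP", 500),
]
features = flow_features(packets, client_ip="10.0.0.2")
print(features)
```

Each flow yields one such feature dictionary; the same pattern extends to the remaining statistics named in step (2.2).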
Further, in the network multimedia service flow feature selection and classification method based on the multi-objective adaptive evolutionary algorithm provided by the invention, the feature selection and analysis specifically comprise the following steps:
(3.1) ranking all features by information gain ratio and filtering out the features whose value is below the correlation threshold;
(3.2) encoding: a binary code whose length is the number of features N is used, and each encoded individual consists of a string of bits; each bit takes one of two values, where 1 means the corresponding feature is selected and 0 means it is not selected; each individual is represented as:
$I = (b_1^I, b_2^I, \ldots, b_N^I, c_I, m_I)$
wherein
$b_i^I \in \{0, 1\}$ for $i = 1, \ldots, N$, $c_I \in \{0, 1\}$, and $m_I \in \{0, 1\}$; $c_I$ and $m_I$ are the discrete parameters that control adaptive crossover and mutation in each encoded individual;
(3.3) population initialization: initialize an empty population $P_0$; while the number of individuals in the population is less than the population size popsize, repeat the following: randomly draw a value q in the range [1, N]; the individual selects the q features ranked highest by information gain ratio, that is, its first q bits are set to 1 and bits q+1 to N are set to 0; add the individual to population $P_0$;
(3.4) each individual I has two fitness functions $f_1(I)$ and $f_2(I)$, corresponding to the two objective functions of the multi-objective optimization, where $f_1(I)$ is the inconsistency rate and $f_2(I)$ is the number of selected features;
(3.5) selecting a parent: selecting a parent based on the crowding distance of the individual;
(3.6) adaptive crossover:
fix the crossover probability $p_c$; for any two individuals I and J of generation t, if a Bernoulli random variable that takes the value 1 with probability $p_c$ does take the value 1, $c_J$ is randomly set to 0 or 1 and the value of $c_J$ is assigned to $c_I$; if $c_J$ is 0, no crossover is performed, and if $c_J$ is 1, uniform crossover is performed;
the new individuals generated by crossover are added to the auxiliary population $Q_t$;
(3.7) adaptive mutation:
fix the mutation probability $p_m$; for an individual I of generation t, if a Bernoulli random variable that takes the value 1 with probability $p_m$ does take the value 1, $m_I$ is randomly set to 0 or 1; if $m_I$ is 0, no mutation is performed, and if $m_I$ is 1, single-point flip mutation is performed;
the new individuals generated by mutation are added to $Q_t$, and the parent population $P_t$ and $Q_t$ are merged into an auxiliary population $R_t$;
all individuals in $R_t$ are sorted by rank and crowding distance with respect to the objective functions, and the first popsize individuals are selected to survive into the next generation $P_{t+1}$;
set t = t + 1;
(3.8) if the maximum number of iterations gen is reached or the inconsistency rate remains unchanged during iteration, output the optimal feature subset; otherwise, repeat steps (3.4) to (3.7).
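The adaptive operators of steps (3.6) and (3.7) can be sketched in Python as follows. This is one reading of the text, not the patented implementation: it assumes the operator type stored in each individual is applied whether or not the Bernoulli trial reset it, and the dictionary layout and function names are hypothetical.

```python
import random

P_C = 0.1  # crossover probability p_c
P_M = 0.1  # mutation probability p_m


def make_individual(bits, c=0, m=0):
    """Feature mask plus the two discrete operator parameters c_I and m_I."""
    return {"bits": list(bits), "c": c, "m": m}


def adaptive_crossover(ind_i, ind_j, rng, p_c=P_C):
    """Step (3.6): possibly redraw c_J, copy it to c_I, then cross if it is 1."""
    child_i = make_individual(ind_i["bits"], ind_i["c"], ind_i["m"])
    child_j = make_individual(ind_j["bits"], ind_j["c"], ind_j["m"])
    if rng.random() < p_c:                 # Bernoulli(p_c) trial returned 1
        child_j["c"] = rng.randint(0, 1)
        child_i["c"] = child_j["c"]
    if child_j["c"] == 1:                  # 1 = uniform crossover, 0 = none
        for k in range(len(child_i["bits"])):
            if rng.random() < 0.5:         # swap this position between children
                child_i["bits"][k], child_j["bits"][k] = (
                    child_j["bits"][k], child_i["bits"][k])
    return child_i, child_j


def adaptive_mutation(ind, rng, p_m=P_M):
    """Step (3.7): possibly redraw m_I, then flip one random bit if it is 1."""
    child = make_individual(ind["bits"], ind["c"], ind["m"])
    if rng.random() < p_m:                 # Bernoulli(p_m) trial returned 1
        child["m"] = rng.randint(0, 1)
    if child["m"] == 1:                    # 1 = single-point flip, 0 = none
        k = rng.randrange(len(child["bits"]))
        child["bits"][k] ^= 1
    return child


rng = random.Random(0)
parent_a = make_individual([1, 1, 1, 0, 0, 0], c=1, m=1)
parent_b = make_individual([0, 0, 0, 1, 1, 1], c=1, m=0)
child_a, child_b = adaptive_crossover(parent_a, parent_b, rng)
mutant = adaptive_mutation(parent_a, rng)
print(child_a["bits"], child_b["bits"], mutant["bits"])
```

Storing the operator choice inside the individual lets the population itself evolve how often crossover and mutation are applied, which is the sense in which the operators are adaptive.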
Further, in the network multimedia service flow feature selection and classification method based on the multi-objective adaptive evolutionary algorithm provided by the invention, the inconsistency rate is defined as follows: a feature combination in a sample instance is called a pattern; the inconsistency count of a pattern is the total number of samples in which the pattern occurs minus the number of those samples that carry the most frequent class label; and the inconsistency rate is the sum of the inconsistency counts over all patterns of the feature subset divided by the total number of samples.
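The definition above translates almost directly into code. The following Python sketch assumes the feature values have already been discretized so that identical patterns can be counted; the function name, the 0/1 mask argument, and the toy samples are illustrative only.

```python
from collections import Counter, defaultdict


def inconsistency_rate(samples, labels, mask):
    """samples: rows of discrete feature values; mask: 1 = feature selected."""
    labels_by_pattern = defaultdict(Counter)
    for row, label in zip(samples, labels):
        pattern = tuple(v for v, keep in zip(row, mask) if keep)
        labels_by_pattern[pattern][label] += 1
    # Per pattern: occurrences minus occurrences of its most frequent label.
    inconsistent = sum(sum(c.values()) - max(c.values())
                       for c in labels_by_pattern.values())
    return inconsistent / len(samples)


# Hypothetical discretized flows: (uplink byte level, inter-arrival level).
samples = [("high", "long"), ("high", "short"), ("high", "long"), ("low", "short")]
labels = ["video", "audio", "video", "web"]

rate_first_only = inconsistency_rate(samples, labels, [1, 0])
rate_both = inconsistency_rate(samples, labels, [1, 1])
print(rate_first_only, rate_both)  # 0.25 0.0
```

With only the first feature, the pattern ("high",) covers two video flows and one audio flow, so one of four samples is inconsistent; adding the second feature separates them and the rate drops to zero. Because no classifier has to be trained to evaluate a candidate subset, this objective is cheap to compute.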
Further, in the network multimedia service flow feature selection and classification method based on the multi-objective adaptive evolutionary algorithm provided by the invention, the correlation threshold in step (3.1) is 0.4; N in step (3.2) is 25, 26, and 13 for the three classifier layers in sequence; the crossover probability $p_c$ in step (3.6) and the mutation probability $p_m$ in step (3.7) are both 0.1; popsize in step (3.7) is 100; and the maximum number of iterations gen in step (3.8) is 10.
Further, in the network multimedia service flow feature selection and classification method based on the multi-objective adaptive evolutionary algorithm provided by the invention, the service flow classification step specifically comprises the following steps:
(5.1) selecting features of the original multimedia service flows with the feature selection method, and classifying the flows with the first-layer KNN into 4 classes: C1, C2, C3, and C4, where C1 is online audio, C2 is online video, C3 is web browsing, and C4 is network voice chat;
(5.2) applying the feature selection method again to the features of the C2 video flows obtained from the previous layer, and performing second-layer KNN classification to obtain classification results C21 and C22;
(5.3) applying the feature selection method again to the features of the C3 flows obtained in step (5.1), and performing the second KNN classification of the second layer to obtain classification results C31 and C32;
(5.4) collecting the classification outputs and calculating the overall classification accuracy.
Further, in the network multimedia service flow feature selection and classification method based on the multi-objective adaptive evolutionary algorithm, classification result C21 is online live video and C22 is online non-live video; C31 is web pages whose content is text and pictures, and C32 is web pages whose content is text, pictures, and videos.
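The cascade in steps (5.1) to (5.3) can be sketched in Python as below. The KNN here is a plain Euclidean nearest-neighbour vote written for the example; the value k = 1, the three toy feature columns, and all class and function names are assumptions, since the patent does not state k or give code.

```python
import math
from collections import Counter


class KNN:
    """Plain k-nearest-neighbour classifier with Euclidean distance."""

    def __init__(self, k=1):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict_one(self, x):
        nearest = sorted(zip(self.X, self.y),
                         key=lambda pair: math.dist(pair[0], x))[:self.k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]


def project(x, cols):
    """Keep only the feature columns selected for one classifier."""
    return [x[c] for c in cols]


def coarse(label):
    """Map a fine label to its first-level class, e.g. 'C21' -> 'C2'."""
    return label[:2]


class Cascade:
    """First-level KNN, then a dedicated KNN for the classes listed in sub."""

    def __init__(self, top, top_cols, sub):
        self.top, self.top_cols, self.sub = top, top_cols, sub

    def predict_one(self, x):
        label = self.top.predict_one(project(x, self.top_cols))
        if label in self.sub:              # C2 and C3 are refined further
            clf, cols = self.sub[label]
            return clf.predict_one(project(x, cols))
        return label


def fit_on(rows, cols, relabel=lambda label: label):
    return KNN(k=1).fit([project(x, cols) for x, _ in rows],
                        [relabel(label) for _, label in rows])


# Hypothetical training flows with three made-up feature columns.
train = [
    ([1.0, 0.0, 0.0], "C1"),
    ([5.0, 1.0, 0.0], "C21"), ([5.0, 9.0, 0.0], "C22"),
    ([9.0, 0.0, 1.0], "C31"), ([9.0, 0.0, 8.0], "C32"),
    ([13.0, 0.0, 0.0], "C4"),
]
top = fit_on(train, [0], coarse)
video = fit_on([r for r in train if coarse(r[1]) == "C2"], [1])
web = fit_on([r for r in train if coarse(r[1]) == "C3"], [2])
cascade = Cascade(top, [0], {"C2": (video, [1]), "C3": (web, [2])})

flows = [[5.2, 8.0, 0.0], [8.8, 0.0, 2.0], [12.0, 0.0, 0.0]]
predictions = [cascade.predict_one(x) for x in flows]
print(predictions)  # ['C22', 'C31', 'C4']
```

Giving each classifier its own column list mirrors the text, where feature selection is rerun for every level so that each KNN sees only the features that separate its own classes.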
Compared with the prior art, the invention adopting the technical scheme has the following technical effects:
1. Compared with single-objective feature selection algorithms, the feature selection method using the multi-objective adaptive evolutionary algorithm considers not only the classification accuracy but also the number of selected features; compared with existing multi-objective feature selection algorithms, it has lower computational complexity and faster convergence, effectively reduces the time and space overhead of feature selection, and improves its efficiency.
2. The invention adopts a multi-layer classification method for multimedia services and designs a three-layer KNN cascade classifier: effective feature combinations are first selected with the feature selection method of the invention and are then classified with the three-layer classifier of the invention. Compared with the existing multi-layer SVM classification method, the method achieves better classification accuracy.
3. The method performs feature selection on six multimedia service flows (online standard-definition live video, web browsing (Baidu), online audio, web browsing (Sina), network voice chat, and online standard-definition non-live video) and then classifies the service flows with a three-layer KNN classifier. The experimental results show that the method achieves a higher recognition rate than the GR, EA, and ENORA feature selection algorithms, with an overall accuracy of 98.6%.
Drawings
FIG. 1 is a flow chart of the classification method of the present invention.
Fig. 2 verifies the effectiveness of the feature combinations selected by the invention, in which (a) is a two-dimensional distribution of the four network service flows over the maximum downlink packet size and the number of uplink bytes, (b) is a two-dimensional distribution of the two video types over the maximum downlink packet size and the number of uplink bytes, and (c) is a two-dimensional distribution of the two web browsing types over the maximum downlink packet size and the downlink byte rate skewness.
Fig. 3 compares the accuracy of the present invention with that of GR, EA, and ENORA on the six multimedia service classes.
Detailed Description
The technical scheme of the invention is further explained in detail by combining the attached drawings:
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
As shown in fig. 1, the present invention provides a method for selecting and classifying multimedia service stream features based on a multi-objective adaptive evolution algorithm, the method comprises data acquisition and preprocessing of multimedia service streams, multimedia service stream feature selection based on the multi-objective adaptive evolution algorithm, three-layer KNN cascade classification output statistical results, and the like, and comprises the following steps:
Step 1: data collection and preprocessing, with the following specific steps:
(1) in an open internet environment, capturing required multimedia service stream data through network packet analysis software WireShark, and then converting original data into a standard five-tuple text format, namely arrival time of a data packet, a source IP address, a destination IP address, a protocol and packet size of the data packet;
(2) performing basic statistical feature calculation on a standard five-tuple file of an original multimedia service stream, wherein the features comprise: uplink/downlink packet size, entropy of uplink/downlink packet size information, overall packet size, uplink/downlink packet arrival time interval, downlink data packet rate, downlink byte rate, and ratio of uplink and downlink byte number.
Step 2: feature selection and analysis, with the following specific steps:
(1) rank all features by information gain ratio and filter out the features whose value is below the correlation threshold of 0.4;
(2) encoding: a binary code is used, and each individual consists of a string of bits whose length is the number of features N. Each bit takes one of two values, where 1 means the corresponding feature is selected and 0 means it is not selected;
(3) population initialization: randomly draw a value q in the range [1, N], and select the q features ranked highest by information gain ratio to form an individual of the initial population $P_0$, that is, set the first q bits to 1 and bits q+1 to N to 0;
(4) each individual I has two fitness functions $f_1(I)$ and $f_2(I)$, corresponding to the two objective functions of the multi-objective optimization. $f_1(I)$ is the inconsistency rate: a feature combination in a sample instance is called a pattern; the inconsistency count of a pattern is the total number of samples in which the pattern appears minus the number of those samples that carry the most frequent class label; and the inconsistency rate is the sum of the inconsistency counts over all patterns of the feature subset divided by the total number of samples. $f_2(I)$ is the number of selected features;
(5) parent selection: select parents based on the crowding distance of the individuals;
(6) adaptive crossover: first fix the crossover probability $p_c = 0.1$; then, for any two individuals I and J of generation $P_t$, if a Bernoulli random variable that takes the value 1 with probability $p_c$ does take the value 1, $c_J$ is randomly set to 0 or 1 and the value of $c_J$ is assigned to $c_I$. If $c_J$ is 0, no crossover is performed; if it is 1, uniform crossover is performed. The new individuals generated by crossover are added to generation $P_{t+1}$;
(7) adaptive mutation: first fix the mutation probability $p_m = 0.1$; for an individual I of generation $P_t$, if a Bernoulli random variable that takes the value 1 with probability $p_m$ does take the value 1, $m_I$ is randomly set to 0 or 1. If $m_I$ is 0, no mutation is performed; if it is 1, single-point flip mutation is performed. The new individuals generated by mutation are added to generation $P_{t+1}$, and t = t + 1 is executed;
(8) if the maximum number of iterations gen = 10 is reached or the inconsistency rate remains unchanged during iteration, output the optimal feature subset; otherwise, repeat steps (4) to (7).
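Steps (1) to (3) above can be sketched in Python as follows. The gain ratio is computed with the usual definition for discrete features (information gain divided by split information); the assumption that features are already discretized, the function names, and the toy data are illustrative and not taken from the patent.

```python
import math
import random
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def gain_ratio(column, labels):
    """Information gain of a discrete feature divided by its split information."""
    total = len(labels)
    conditional = 0.0
    for value, count in Counter(column).items():
        subset = [lab for v, lab in zip(column, labels) if v == value]
        conditional += (count / total) * entropy(subset)
    split_info = entropy(column)
    if split_info <= 0:                    # constant feature carries no information
        return 0.0
    return (entropy(labels) - conditional) / split_info


def rank_and_filter(columns, labels, threshold=0.4):
    """Indices of features with gain ratio >= threshold, best first."""
    scored = [(gain_ratio(col, labels), idx) for idx, col in enumerate(columns)]
    kept = [(score, idx) for score, idx in scored if score >= threshold]
    return [idx for score, idx in sorted(kept, key=lambda t: -t[0])]


def init_population(n_features, popsize, rng):
    """Each individual selects the q top-ranked features, q drawn from [1, N]."""
    population = []
    while len(population) < popsize:
        q = rng.randint(1, n_features)
        population.append([1] * q + [0] * (n_features - q))
    return population


# Hypothetical discretized features, one list per feature.
labels = ["video", "video", "audio", "audio"]
columns = [
    ["big", "big", "small", "small"],      # separates the classes perfectly
    ["a", "b", "a", "b"],                  # uninformative
    ["x", "x", "y", "z"],                  # informative but over-split
]
ranked = rank_and_filter(columns, labels)
population = init_population(len(ranked), popsize=5, rng=random.Random(0))
print(ranked, population)
```

Bit k of an individual refers to the k-th feature in `ranked`, so every initial individual is a prefix of the gain-ratio ranking, as step (3) describes.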
In the experiment, a three-layer KNN cascade classifier model is designed; using the feature combinations selected by the method, each level of the classifier identifies certain specific types of application services. The first-layer KNN classifier mainly identifies online audio (QQ Music), online video (live and non-live), web browsing, and network voice chat (Skype); its optimal feature combination is the maximum downlink packet size and the number of uplink bytes. For ease of observation, a logarithmic transform was applied in Fig. 2(a). As Fig. 2(a) shows, Skype is interactive audio, so its number of uplink bytes is higher than that of web browsing and QQ Music but lower than that of online live video; Skype and QQ Music can therefore be identified effectively using the maximum downlink packet size and the number of uplink bytes.
The second-layer KNN classifier further divides the video flows obtained from the first layer into online live video and online non-live video. The best feature combination is the number of uplink bytes. CBox is a live video service, and it exchanges noticeably more data between client and server than the non-live video service Youku; as Fig. 2(b) shows, the number of uplink bytes therefore separates live and non-live video completely.
The third-layer KNN classifier further divides the web browsing flows obtained from the first layer into Baidu (web content is text and pictures) and Sina (web content is text, pictures, and videos). The best feature combination is the maximum downlink packet size and the downlink byte rate skewness. Because video data packets are larger than those of other service flows and Sina pages include video content, the maximum downlink packet size of Sina is slightly larger than that of Baidu. As Fig. 2(c) shows, the maximum downlink packet size and the downlink byte rate skewness accurately identify Sina and Baidu service flows.
Step 3: service flow classification and verification, with the following specific steps:
(1) select features of the original multimedia service flows with the feature selection method and perform first-layer KNN classification, dividing the flows into 4 classes: C1, C2, C3, and C4, where C1 is online audio (QQ Music), C2 is online video (live and non-live), C3 is web browsing, and C4 is network voice chat (Skype);
(2) apply the feature selection method again to the features of the C2 video flows obtained from the previous layer, and perform second-layer KNN classification to obtain classification results C21 and C22, where C21 is online live video and C22 is online non-live video;
(3) apply the feature selection method again to the features of the C3 flows obtained in step (1), and perform the second KNN classification of the second layer to obtain classification results C31 and C32, where C31 is Baidu (web content is text and pictures) and C32 is Sina (web content is text, pictures, and videos);
(4) collect the classification outputs and calculate the overall classification accuracy.
In the experiment, two-fold cross validation is adopted, and the classification result of the invention is compared with the results of GR, EA and ENORA. As can be seen from FIG. 3, the method of the present invention has the highest overall classification accuracy, which is as high as 98.6%.
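Two-fold cross validation, as used in the experiment, can be sketched in Python as below. The patent does not say how the folds were drawn, so the interleaved split, the function names, and the one-feature nearest-neighbour stand-in classifier are assumptions made for this example.

```python
def two_fold_accuracy(samples, labels, train_and_predict):
    """Train on one half, test on the other, swap, and pool the results."""
    indices = list(range(len(samples)))
    folds = [indices[0::2], indices[1::2]]     # interleaved split (assumption)
    correct = 0
    for test_idx, train_idx in ((folds[0], folds[1]), (folds[1], folds[0])):
        predicted = train_and_predict(
            [samples[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [samples[i] for i in test_idx],
        )
        correct += sum(p == labels[i] for p, i in zip(predicted, test_idx))
    return correct / len(samples)


def nearest_neighbor(train_x, train_y, test_x):
    """Toy 1-NN on a single numeric feature, standing in for the real classifier."""
    return [train_y[min(range(len(train_x)),
                        key=lambda j: abs(train_x[j] - x))]
            for x in test_x]


# Hypothetical one-feature flows from two well-separated classes.
samples = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3]
labels = ["audio"] * 4 + ["video"] * 4
accuracy = two_fold_accuracy(samples, labels, nearest_neighbor)
print(accuracy)  # 1.0
```

Every sample is used for testing exactly once, so the pooled fraction of correct predictions corresponds to the overall classification accuracy computed in step (4).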
The foregoing describes only some embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and such modifications and refinements shall also fall within the protection scope of the present invention.

Claims (6)

1. A network service flow feature selection and classification method based on a multi-objective adaptive evolutionary algorithm, characterized by comprising the following steps:
(1) data collection and preprocessing: collecting data flow samples of various multimedia services on the Internet, and then carrying out a preprocessing operation;
(2) feature selection and analysis: analyzing the statistical features of the network data flow samples, and selecting a feature combination which effectively distinguishes the service flows, specifically comprising the following steps:
(3.1) ranking all features by information gain ratio, and filtering out the features below the correlation threshold;
(3.2) encoding: a binary code whose length is the number of features N is used, and each encoded individual consists of a string of bits; each bit takes one of two values, where 1 means the corresponding feature is selected and 0 means it is not selected; each individual is represented as:
$I = (b_1^I, b_2^I, \ldots, b_N^I, c_I, m_I)$
wherein
$b_i^I \in \{0, 1\}$ for $i = 1, \ldots, N$, $c_I \in \{0, 1\}$, and $m_I \in \{0, 1\}$; $c_I$ and $m_I$ are the discrete parameters that control adaptive crossover and mutation in each encoded individual;
(3.3) population initialization: initializing an empty population $P_0$; while the number of individuals in the population is less than the population size popsize, repeating the following: randomly drawing a value q in the range [1, N]; the individual selects the q features ranked highest by information gain ratio, that is, its first q bits are set to 1 and bits q+1 to N are set to 0; adding the individual to population $P_0$;
(3.4) each individual I has two fitness functions $f_1(I)$ and $f_2(I)$, corresponding to the two objective functions of the multi-objective optimization, where $f_1(I)$ is the inconsistency rate and $f_2(I)$ is the number of selected features;
(3.5) selecting a parent: selecting a parent based on the crowding distance of the individual;
(3.6) adaptive crossover:
fixing the crossover probability $p_c$; for any two individuals I and J of generation t, if the value 1 is drawn with probability $p_c$, $c_J$ is randomly set to 0 or 1 and the value of $c_J$ is assigned to $c_I$; if $c_J$ is 0, no crossover is performed, and if $c_J$ is 1, uniform crossover is performed;
adding the new individuals generated by crossover to the auxiliary population $Q_t$;
(3.7) adaptive mutation:
fixing the mutation probability $p_m$; for an individual I of generation t, if the value 1 is drawn with probability $p_m$, $m_I$ is randomly set to 0 or 1; if $m_I$ is 0, no mutation is performed, and if $m_I$ is 1, single-point flip mutation is performed;
adding the new individuals generated by mutation to $Q_t$, and merging the parent population $P_t$ and $Q_t$ into an auxiliary population $R_t$;
sorting all individuals in $R_t$ by rank and crowding distance with respect to the objective functions, and selecting the first popsize individuals to survive into the next generation $P_{t+1}$;
setting t = t + 1;
(3.8) if the maximum number of iterations gen is reached or the inconsistency rate remains unchanged during iteration, outputting an optimal feature subset; otherwise, repeating steps (3.4) to (3.7);
(3) classifying and checking the service flow: and carrying out classification experiments on the network multimedia service flows by utilizing the three layers of KNN classifiers to obtain classification results and calculate the integral classification accuracy.
2. The method for selecting and classifying network traffic flow characteristics based on multi-objective adaptive evolution algorithm according to claim 1, wherein the data collection and preprocessing operation specifically comprises:
(2.1) capturing required multimedia service flow data through network packet analysis software WireShark in an open internet environment, and then converting the original data into a standard five-tuple text format, wherein the five-tuple text format comprises the arrival time of a data packet, a source IP address, a destination IP address, a protocol and the packet size of the data packet;
(2.2) performing basic statistical feature calculation on a standard five-tuple file of the original multimedia service stream, wherein the statistical features comprise: uplink/downlink packet size, entropy of uplink/downlink packet size information, overall packet size, uplink/downlink packet arrival time interval, downlink data packet rate, downlink byte rate, and ratio of uplink and downlink byte number.
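To make the statistical features of step (2.2) concrete, the following Python sketch computes a few of them from five-tuple records. It is illustrative only: the record layout `(arrival_time, src_ip, dst_ip, protocol, size)`, the use of a known `client_ip` to separate uplink from downlink, the choice of mean values, and the dictionary keys are all assumptions not stated in the claims.

```python
import math
from collections import Counter

def size_entropy(sizes):
    """Shannon entropy (bits) of the empirical packet-size distribution."""
    total = len(sizes)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in Counter(sizes).values())

def mean(values):
    return sum(values) / len(values) if values else 0.0

def gaps(packets):
    """Inter-arrival times between consecutive packets."""
    return [b[0] - a[0] for a, b in zip(packets, packets[1:])]

def flow_features(packets, client_ip):
    """packets: time-ordered (arrival_time, src_ip, dst_ip, protocol, size) tuples."""
    up = [p for p in packets if p[1] == client_ip]    # sent by the client
    down = [p for p in packets if p[1] != client_ip]  # received by the client
    duration = packets[-1][0] - packets[0][0]
    up_bytes = sum(p[4] for p in up)
    down_bytes = sum(p[4] for p in down)
    return {
        "up_mean_size": mean([p[4] for p in up]),
        "down_mean_size": mean([p[4] for p in down]),
        "up_size_entropy": size_entropy([p[4] for p in up]),
        "down_size_entropy": size_entropy([p[4] for p in down]),
        "up_mean_gap": mean(gaps(up)),
        "down_mean_gap": mean(gaps(down)),
        "down_packet_rate": len(down) / duration if duration > 0 else 0.0,
        "down_byte_rate": down_bytes / duration if duration > 0 else 0.0,
        "up_down_byte_ratio": up_bytes / down_bytes if down_bytes else 0.0,
    }
```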
3. The method for selecting and classifying network traffic flow characteristics based on multi-objective adaptive evolution algorithm according to claim 1, wherein the inconsistency rate is defined as follows: the combination of feature values in a sample instance is called a pattern; for a given feature subset, the inconsistency count of a pattern is the total number of samples in which the pattern occurs minus the number of those samples carrying the most frequent class label; and the inconsistency rate is equal to the sum of the inconsistency counts over all patterns divided by the total number of samples.
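The inconsistency rate can be sketched in a few lines of Python. This example assumes that feature values are discrete (or have been discretized) so that identical patterns can be grouped, and that a feature subset is given as a 0/1 mask; the function name and argument layout are hypothetical.

```python
from collections import Counter, defaultdict

def inconsistency_rate(samples, labels, mask):
    """Inconsistency rate of the feature subset selected by a 0/1 mask."""
    label_counts = defaultdict(Counter)  # pattern -> class label counts
    for row, label in zip(samples, labels):
        pattern = tuple(v for v, keep in zip(row, mask) if keep)
        label_counts[pattern][label] += 1
    # Per pattern: occurrences minus the count of the most frequent label.
    inconsistent = sum(
        sum(counts.values()) - max(counts.values())
        for counts in label_counts.values()
    )
    return inconsistent / len(samples)

samples = [(0, 1), (0, 1), (0, 0), (1, 0)]
labels = ["a", "b", "a", "b"]
print(inconsistency_rate(samples, labels, [1, 1]))  # 0.25
print(inconsistency_rate(samples, labels, [0, 1]))  # 0.5
```

With both features selected, only the pattern (0, 1) is inconsistent (one "a" and one "b"), giving 1/4. With only the second feature selected, both patterns mix labels, giving 2/4.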
4. The method for selecting and classifying network traffic flow characteristics based on multi-objective adaptive evolution algorithm according to claim 1, wherein the correlation threshold in step (3.1) is 0.4, the values of N in step (3.2) for the three classifiers are 25, 26 and 13 in sequence, the crossover probability p_c in step (3.6) and the mutation probability p_m in step (3.7) are both 0.1, popsize in step (3.7) is 100, and the maximum number of iterations gen in step (3.8) is 10.
5. The method for selecting and classifying network traffic flow characteristics based on multi-objective adaptive evolution algorithm according to claim 1, wherein the traffic flow classification step specifically comprises:
(5.1) selecting the characteristics of the original multimedia service flow by adopting a characteristic selection method, and classifying the multimedia flow into 4 types by a first-layer KNN: C1, C2, C3, C4; wherein C1 is online audio, C2 is online video, C3 is web browsing, and C4 is network voice chat;
(5.2) applying the feature selection method again to the features of the C2 video streams obtained from the previous layer of classification, and carrying out a second-layer KNN classification to obtain classification results C21 and C22;
(5.3) applying the feature selection method again to the features of the C3 data streams obtained in step (5.1), and carrying out another second-layer KNN classification to obtain classification results C31 and C32;
(5.4) counting the output results of the classification and calculating the overall classification accuracy.
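The layered classification of steps (5.1) to (5.3) can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: a plain Euclidean-distance KNN, one feature mask per classifier (as produced by a feature selection step), and a dictionary mapping a first-layer label such as "C2" or "C3" to its second-layer training data. All function and parameter names are hypothetical.

```python
from collections import Counter

def project(x, mask):
    """Keep only the features selected by a 0/1 mask."""
    return [v for v, keep in zip(x, mask) if keep]

def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    nearest = sorted(zip(train_x, train_y), key=lambda pair: sq_dist(pair[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def hierarchical_predict(x, layer1, layer2, k=3):
    """layer1: (train_x, train_y, mask); layer2: {coarse_label: (train_x, train_y, mask)}."""
    tx, ty, mask = layer1
    coarse = knn_predict([project(t, mask) for t in tx], ty, project(x, mask), k)
    if coarse not in layer2:
        return coarse  # e.g. C1 or C4, which have no second layer
    tx2, ty2, mask2 = layer2[coarse]
    return knn_predict([project(t, mask2) for t in tx2], ty2, project(x, mask2), k)
```

Each classifier projects samples onto its own selected features before measuring distance, which mirrors the claim's use of a separate feature selection pass per layer.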
6. The method for selecting and classifying characteristics of network traffic streams based on multi-objective adaptive evolution algorithm according to claim 5, wherein the classification result C21 is online live video and C22 is online non-live video; C31 is web pages whose content is text and pictures, and C32 is web pages whose content is text, pictures and videos.
CN201810169202.7A 2018-02-28 2018-02-28 Network service flow characteristic selection and classification method Active CN108494620B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810169202.7A CN108494620B (en) 2018-02-28 2018-02-28 Network service flow characteristic selection and classification method

Publications (2)

Publication Number Publication Date
CN108494620A CN108494620A (en) 2018-09-04
CN108494620B CN108494620B (en) 2021-07-27

Family

ID=63341158

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810169202.7A Active CN108494620B (en) 2018-02-28 2018-02-28 Network service flow characteristic selection and classification method

Country Status (1)

Country Link
CN (1) CN108494620B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112580606B (en) * 2020-12-31 2022-11-08 安徽大学 Large-scale human body behavior identification method based on clustering grouping
CN113079427B (en) * 2021-04-28 2021-11-23 北京航空航天大学 ASON network service availability evaluation method based on network evolution model
CN115049019B (en) * 2022-07-25 2023-04-07 湖南工商大学 Method and device for evaluating arsenic adsorption performance of metal organic framework and related equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105260746A (en) * 2015-10-09 2016-01-20 乔善平 Expandable multilayer integrated multi-mark learning system
CN105787437A (en) * 2016-02-03 2016-07-20 东南大学 Vehicle brand type identification method based on cascading integrated classifier
CN106897733A (en) * 2017-01-16 2017-06-27 南京邮电大学 Video stream characteristics selection and sorting technique based on particle swarm optimization algorithm

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008150840A1 (en) * 2007-05-29 2008-12-11 University Of Iowa Research Foundation Methods and systems for determining optimal features for classifying patterns or objects in images
US9165051B2 (en) * 2010-08-24 2015-10-20 Board Of Trustees Of The University Of Illinois Systems and methods for detecting a novel data class

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Combining support vector machines and information gain ranking for classification of mars mcmurdo panorama images";Changjing Shang,;《Proceedings of 2010 IEEE 17th International Conference on Image Processing》;20100929;全文 *
"基于粒子群优化算法的视频流特征选择方法";冯茂,;《南京邮电大学学报》;20170605;第32卷(第2期);第80-85页 *

Also Published As

Publication number Publication date
CN108494620A (en) 2018-09-04

Similar Documents

Publication Publication Date Title
CN112398779B (en) Network traffic data analysis method and system
CN109951444B (en) Encrypted anonymous network traffic identification method
CN109151880B (en) Mobile application flow identification method based on multilayer classifier
CN111565156B (en) Method for identifying and classifying network traffic
CN108494620B (en) Network service flow characteristic selection and classification method
CN104244035B (en) Network video stream sorting technique based on multi-level clustering
CN108199863B (en) Network traffic classification method and system based on two-stage sequence feature learning
Song et al. Encrypted traffic classification based on text convolution neural networks
Zhang et al. Proword: An unsupervised approach to protocol feature word extraction
Kaur A comparison of two hybrid ensemble techniques for network anomaly detection in spark distributed environment
CN114615093A (en) Anonymous network traffic identification method and device based on traffic reconstruction and inheritance learning
CN114500396A (en) MFD chromatographic characteristic extraction method and system for distinguishing anonymous Tor application flow
Soleymanpour et al. An efficient deep learning method for encrypted traffic classification on the web
Chen et al. Ride: Real-time intrusion detection via explainable machine learning implemented in a memristor hardware architecture
CN109450876B (en) DDos identification method and system based on multi-dimensional state transition matrix characteristics
Hu et al. Towards early and accurate network intrusion detection using graph embedding
Shi et al. PSO-based community detection in complex networks
Dong et al. An efficient feature selection method for network video traffic classification
CN108307231B (en) Network video stream feature selection and classification method based on genetic algorithm
CN114358177B (en) Unknown network traffic classification method and system based on multidimensional feature compact decision boundary
CN116781341A (en) Decentralised network DDoS attack identification method based on large language model
CN110061869B (en) Network track classification method and device based on keywords
CN113256507B (en) Attention enhancement method for generating image aiming at binary flow data
CN115459937A (en) Method for extracting characteristics of encrypted network traffic packet in distributed scene
CN115134128A (en) Method for mining and utilizing new type encrypted network flow packet in distributed scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant