CN114006870A - Network traffic identification method based on a self-supervised convolutional subspace clustering network - Google Patents


Info

Publication number
CN114006870A
Authority
CN
China
Prior art keywords
network
data
clustering
self
supervision
Legal status
Withdrawn
Application number
CN202111270837.4A
Other languages
Chinese (zh)
Inventor
王艺杰
杨东
吕珍珍
王文庆
崔逸群
邓楠轶
朱博迪
介银娟
董夏昕
朱召鹏
崔鑫
Current Assignee
Xian Thermal Power Research Institute Co Ltd
Huaneng Group Technology Innovation Center Co Ltd
Original Assignee
Xian Thermal Power Research Institute Co Ltd
Huaneng Group Technology Innovation Center Co Ltd
Application filed by Xian Thermal Power Research Institute Co Ltd and Huaneng Group Technology Innovation Center Co Ltd
Priority to CN202111270837.4A
Publication of CN114006870A
Withdrawn (current legal status)


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2483 Traffic characterised by specific attributes, e.g. priority or QoS involving identification of individual flows
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a network traffic identification method based on a self-supervised convolutional subspace clustering network, comprising the following steps: preprocessing the raw network traffic data; initializing and pre-training an autoencoder; training a convolutional subspace clustering network to learn a sparse representation matrix of the data; adding a clustering module to the convolutional subspace clustering network, which measures the distance between two vectors by cosine similarity when constructing its similarity matrix and generates pseudo labels; realizing self-supervised learning with a classification module that classifies the data and uses the pseudo labels from the clustering module to compute the error between the classification result and the expected label, so that the self-supervision takes effect through back-propagation in the neural network; and finally identifying the network traffic type by maximum likelihood estimation. Because the method relies on the statistical characteristics of the traffic data rather than on the payload carried by the data frames, it identifies encrypted traffic and similar traffic well.

Description

Network traffic identification method based on a self-supervised convolutional subspace clustering network
Technical Field
The invention belongs to the technical fields of deep learning, cyberspace security and traffic identification, and particularly relates to a network traffic identification method based on a self-supervised convolutional subspace clustering network.
Background
As network applications multiply and network technologies continue to develop, a large volume of network traffic is generated at every moment, and this traffic is an important carrier of the information transmitted over networks. Massive network traffic poses great challenges to network security management and traffic supervision. Accurate traffic identification is an important prerequisite for both: it can improve the quality of network transmission and help ensure that network security functions operate normally. Existing network traffic identification methods mainly comprise port-based identification, identification based on behavioural feature matching, and deep packet inspection. Port-based identification is accurate only for protocol traffic that uses well-known and registered ports; behavioural feature matching has high time and space complexity; and deep packet inspection has poor intelligent identification capability. These conventional methods therefore cannot accomplish network traffic identification efficiently.
Disclosure of Invention
To overcome these technical problems, the invention provides a network traffic identification method based on a self-supervised convolutional subspace clustering network. A clustering module and a classification module are added to a deep neural network so that a self-supervision target is constructed from the intrinsic characteristics of the data, realizing self-supervision of the network. The clustering module generates pseudo labels, and the classification module uses the pseudo labels and a classification network to supervise the learning process. With self-supervision introduced, representation learning and clustering are fused and trained within one unified network framework, so representations better suited to the clustering task are learned and clustering accuracy improves. After the optimal clustering result is obtained, each cluster is matched to a specific network application type by likelihood estimation, completing the network traffic identification task.
To achieve this purpose, the invention adopts the following technical scheme:
a network traffic identification method based on a self-supervised convolutional subspace clustering network, comprising the following steps:
1) data preprocessing:
filtering the acquired network traffic data set according to a set strategy and converting raw network traffic data of various formats into a uniform data format, while avoiding loss of key data items during conversion;
2) initializing and pre-training the autoencoder:
initializing an autoencoder network, then feeding the data from step 1) into the encoder for pre-training;
3) training the convolutional subspace clustering network and learning a sparse representation matrix of the data:
training the convolutional subspace clustering network, whose autoencoder part is initialized with the autoencoder parameters learned in step 2), with the raw data fed into the network;
4) constructing a pseudo label:
constructing a similarity matrix from the sparse representation matrix learned by the convolutional subspace clustering network in step 3), then applying spectral clustering to the similarity matrix to obtain a cluster partition of the data samples. This partition serves as a pseudo label for the data set: although it is not correct for every sample, it still contains useful information provided the network has been sufficiently pre-trained, so the pseudo labels generated by clustering are used to supervise the network's feature extraction and sparse-matrix learning;
5) self-supervised learning:
the self-supervision structure is realized mainly by adding a classification network of the kind used in supervised learning. Because the convolutional subspace clustering network reconstructs the original data well, the extracted features, i.e. the sparse representation layer, contain enough information to predict the labels of the data samples. A classification network is therefore added after the sparse representation layer, and the pseudo labels generated by the clustering module in the previous step are used as the expected classification results, so that both feature extraction and subspace clustering learning are supervised;
6) identifying the type of network traffic:
determining, by maximum likelihood estimation, the mapping between the clusters obtained in step 5) and specific network types, and identifying the network traffic type.
The data set in step 1) is the UNB ISCX network traffic data set, collected from 13 applications in the major categories mail, instant messaging, streaming media, file transfer, VoIP and P2P. The specific applications covered are FileZilla, Hangouts, Skype, AIM, Facebook Chat, Gmail Chat, Mail, Torrent, Vimeo, YouTube, ICQ, Hangouts Audio and Skype Audio.
In step 1), the network traffic data are preprocessed by applying flow filtering and flow cleaning to the UNB ISCX data set, and the feature attributes of each flow record are mapped to the same number of pixels, converting noisy, incomplete and inconsistent raw data into suitable input data.
The autoencoder used in step 2) has an overall spindle (hourglass) shape, wide at both ends and narrow in the middle. It consists of an encoder and a decoder: the encoder maps the original data space to a latent space, and the decoder reconstructs the original data space from the latent space. The invention uses a convolutional autoencoder, in which each encoder layer is a convolutional layer and each decoder layer is a deconvolution (transposed convolution) layer. After the autoencoder parameters are randomly initialized, the data to be analysed are fed into the network for layer-by-layer pre-training.
Before the convolutional subspace clustering network is trained in step 3), its autoencoder part is initialized with the autoencoder parameters learned in the previous step, and the whole network is then trained until it converges.
The pseudo labels in step 4) are constructed by adding a clustering module to the convolutional subspace clustering network. When building the similarity matrix, the clustering module measures the distance between two vectors by cosine similarity, and clustering is then performed with a spectral clustering algorithm, which treats the data as a graph.
The self-supervision in step 5) is realized by adding a classification module after the sparse representation of the data learned by the convolutional subspace clustering network. The classification module is a conventional supervised-learning classification network that classifies the data, and the pseudo labels generated by the clustering module are used to compute the error between the classification result and the expected label; back-propagating this error through the neural network produces the self-supervision effect.
In step 6), the network traffic type is identified by determining the mapping between clusters and specific network application types through maximum likelihood estimation. Let $B = \{b_1, b_2, \ldots, b_n\}$ be the set of clusters obtained by clustering the data set, where $n$ is the number of clusters, and let $D = \{d_1, d_2, \ldots, d_m\}$ be the set of network traffic types to be identified, where $m$ is the number of application types and $m \le n$. A mapping $f$ is established by maximum likelihood estimation using the probability
$$P(d_j \mid b_i) = \frac{h_{ji}}{h_i}, \qquad 1 \le j \le m,\; 1 \le i \le n,$$
where $h_{ji}$ is the number of data flows in cluster $b_i$ that have been labelled as network application type $d_j$, $h_i$ is the total number of data objects in cluster $b_i$, and $P(d_j \mid b_i)$ is the probability of mapping cluster $b_i$ to the specific network application type $d_j$. The probability matching function between data traffic and network application types is:
[Formula image: probability matching function between data traffic and network application types]
Let $x$ be the lower probability limit of the maximum likelihood estimation. According to the formula above, if the known-type samples labelled $d_j$ make up the largest proportion of all data objects in cluster $b_i$ and that proportion exceeds $x$, the flows in the cluster are identified as data of network application type $d_j$. If the proportion does not reach $x$, the traffic type of the cluster is marked as an unknown network application type, i.e. the identified traffic is unknown traffic. This completes the network traffic identification.
The beneficial effects of the invention are as follows:
the invention uses a convolutional subspace clustering network to learn representations of network traffic data and reduce the dimensionality of the raw data. Deep-learning-based clustering algorithms suffer from two problems: representation learning and clustering are carried out separately, and the internal information of the samples is not used effectively during training. To address them, the invention introduces a self-supervised method, organically combining the process of learning the sparse representation through the network with the clustering process and thereby improving network traffic identification.
Drawings
FIG. 1 is a general flow diagram of the present invention.
FIG. 2 is a block diagram of the traffic identification framework of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples.
As shown in FIG. 1, network traffic identification based on the self-supervised convolutional subspace clustering network comprises six steps: data preprocessing; initializing and pre-training the autoencoder; training the convolutional subspace clustering network and learning a sparse representation matrix of the data; constructing pseudo labels; self-supervised learning; and identifying network traffic types.
FIG. 2 shows the framework for identifying network traffic types by self-supervised learning in the present invention: the self-supervised method is added to the convolutional subspace clustering network, and the constructed classification and clustering modules realize self-supervision of the network and optimize the overall performance of the method.
The invention provides a network traffic identification method based on a self-supervised convolutional subspace clustering network, comprising the following steps:
Step one: data preprocessing. The data set selected by the invention is the UNB ISCX network traffic data set, collected from 13 applications in the major categories mail, instant messaging, streaming media, file transfer, VoIP and P2P; the specific applications are FileZilla, Hangouts, Skype, AIM, Facebook Chat, Gmail Chat, Mail, Torrent, Vimeo, YouTube, ICQ, Hangouts Audio and Skype Audio. The method processes the data set by flow filtering and flow cleaning and maps the feature attributes of each flow record to the same number of pixels, converting noisy, incomplete and inconsistent raw data into input data suitable for the model of the invention.
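The cleaning and pixel-mapping steps can be sketched as follows. This is a minimal illustration only: the patent does not specify the feature set, the scaling, or the image size, so the min-max scaling to 0-255, the zero padding to a square image, and the helper names `clean_records` and `flow_record_to_image` are assumptions.

```python
import math


def clean_records(records, n_features):
    """Drop incomplete records: wrong length, or any None/NaN value."""
    cleaned = []
    for rec in records:
        if len(rec) != n_features:
            continue
        if any(v is None or (isinstance(v, float) and math.isnan(v)) for v in rec):
            continue
        cleaned.append(list(rec))
    return cleaned


def flow_record_to_image(features, feature_min, feature_max, side=None):
    """Min-max scale one flow record to 0-255 and lay it out row-major
    as a square single-channel image, zero-padded at the end."""
    if not (len(features) == len(feature_min) == len(feature_max)):
        raise ValueError("feature, min and max vectors must have equal length")
    pixels = []
    for value, lo, hi in zip(features, feature_min, feature_max):
        span = hi - lo
        scaled = 0.0 if span == 0 else (value - lo) / span
        scaled = min(max(scaled, 0.0), 1.0)  # clip values outside the observed range
        pixels.append(round(scaled * 255))
    if side is None:
        side = math.ceil(math.sqrt(len(pixels)))
    if side * side < len(pixels):
        raise ValueError("image side too small for the number of features")
    pixels += [0] * (side * side - len(pixels))  # pad to side*side pixels
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


records = [[2, 10, 0], [1, None, 3], [4, 5]]
rows = clean_records(records, n_features=3)
print(flow_record_to_image(rows[0], [0, 0, 0], [10, 10, 10]))  # [[51, 255], [0, 0]]
```

In practice the per-feature minimum and maximum would be taken from the training portion of the data set so that every flow is scaled the same way.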
Step two: initializing and pre-training the autoencoder. The autoencoder has an overall spindle (hourglass) shape, wide at both ends and narrow in the middle, and consists of an encoder and a decoder: the encoder maps the original data space to a latent space, and the decoder reconstructs the original data space from the latent space. The invention uses a convolutional autoencoder, in which each encoder layer is a convolutional layer and each decoder layer is a deconvolution (transposed convolution) layer. After the autoencoder parameters are randomly initialized, the data to be analysed are fed into the network for layer-by-layer pre-training.
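The spindle shape comes from convolutions that shrink the spatial size and deconvolutions that restore it. The sketch below only traces feature-map sizes using the standard output-size formulas for a non-dilated convolution and transposed convolution; the 28×28 input and the kernel/stride/padding settings in `ENCODER` and `DECODER` are hypothetical, not values from the patent.

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a non-dilated 2-D convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv2d_out(size, kernel, stride=1, padding=0, output_padding=0):
    """Spatial output size of a non-dilated 2-D transposed convolution."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


# Hypothetical layer settings: (kernel, stride, padding[, output_padding]).
ENCODER = [(3, 2, 1), (3, 2, 1)]
DECODER = [(3, 2, 1, 1), (3, 2, 1, 1)]


def trace_sizes(side):
    """Feature-map side length after each layer: shrinks, then grows back."""
    sizes = [side]
    for kernel, stride, padding in ENCODER:
        sizes.append(conv2d_out(sizes[-1], kernel, stride, padding))
    for kernel, stride, padding, out_pad in DECODER:
        sizes.append(deconv2d_out(sizes[-1], kernel, stride, padding, out_pad))
    return sizes


print(trace_sizes(28))  # [28, 14, 7, 14, 28]
```

These formulas agree with the output-shape formulas documented for PyTorch's `Conv2d` and `ConvTranspose2d` when dilation is 1, so the same layer settings can be carried over if the autoencoder is built in that framework.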
Step three: training the convolutional subspace clustering network and learning a sparse representation matrix of the data. Before the convolutional subspace clustering network is trained, its autoencoder part is initialized with the autoencoder parameters learned in the previous step; the raw data are then fed into the network and the whole network is trained until it converges.
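The patent does not give the network's loss, but convolutional subspace clustering networks are commonly built around a self-expressive layer in which each latent vector is reconstructed as a combination of the others, $Z \approx ZC$. As a swapped-in illustration, the sketch below solves a Frobenius-norm (least-squares) version of that objective in closed form, $C = (Z^\top Z + \lambda I)^{-1} Z^\top Z$, obtained by setting the gradient of $\lVert Z - ZC\rVert_F^2 + \lambda\lVert C\rVert_F^2$ to zero. Unlike the sparse representation described here it does not enforce sparsity, and the name `self_expressive_coefficients` and the value of `lam` are assumptions.

```python
import numpy as np


def self_expressive_coefficients(Z, lam=0.1):
    """Closed-form minimiser of ||Z - Z C||_F^2 + lam * ||C||_F^2.

    Z is d x N with one latent vector per column; returns the N x N matrix C,
    where column j holds the weights that rebuild sample j from the others.
    """
    G = Z.T @ Z                                  # N x N Gram matrix
    n = G.shape[0]
    C = np.linalg.solve(G + lam * np.eye(n), G)  # (G + lam I) C = G
    np.fill_diagonal(C, 0.0)                     # heuristic: discard self-weights
    return C


# Two orthogonal 1-D subspaces: samples 0,1 lie on axis x, samples 2,3 on axis y.
Z = np.array([[1.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 3.0],
              [0.0, 0.0, 0.0, 0.0]])
C = self_expressive_coefficients(Z)
print(np.round(C, 3))  # entries linking the two subspaces are zero
```

Because the data come from independent subspaces, $C$ is block diagonal: samples are only expressed by samples from their own subspace, which is exactly the structure the later clustering step exploits. In the network itself, $C$ would be a trainable layer optimized jointly with the reconstruction loss rather than solved in closed form.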
Step four: constructing pseudo labels. A similarity matrix is constructed from the sparse representation matrix learned by the convolutional subspace clustering network, and spectral clustering is applied to it to obtain a cluster partition of the data samples. This partition can serve as a pseudo label for the data set: although it is not correct for every sample, it still contains useful information when the network has been sufficiently pre-trained, so the pseudo labels generated by clustering can be used to supervise the network's feature extraction and sparse-matrix learning. Concretely, a clustering module is added to the convolutional subspace clustering network; when building the similarity matrix it measures the distance between two vectors by cosine similarity, and clustering is then performed with a spectral clustering algorithm, which treats the data as a graph.
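A minimal sketch of the pseudo-label step, assuming NumPy: cosine similarity between the coefficient vectors, a symmetric KNN graph (the KNN construction is mentioned in claim 6), a normalized spectral embedding, and k-means. The neighbour count `k`, clipping negative similarities to zero, row normalization of the embedding, and the farthest-point k-means initialization are assumptions for illustration, not details from the patent.

```python
import numpy as np


def cosine_similarity_matrix(V):
    """Pairwise cosine similarity between the rows of V, negatives clipped to 0."""
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    U = V / np.maximum(norms, 1e-12)
    return np.clip(U @ U.T, 0.0, None)


def knn_graph(S, k):
    """Keep each node's k most similar neighbours, then symmetrise."""
    n = S.shape[0]
    W = np.zeros_like(S)
    for i in range(n):
        neighbours = [j for j in np.argsort(-S[i]) if j != i][:k]
        W[i, neighbours] = S[i, neighbours]
    return np.maximum(W, W.T)


def kmeans(X, k, iters=100):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(d2, axis=1)
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                        for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def spectral_pseudo_labels(C, n_clusters, k=5):
    """Pseudo labels from a self-expression matrix C (column j describes sample j)."""
    W = knn_graph(cosine_similarity_matrix(C.T), k)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)                 # eigenvalues in ascending order
    U = vecs[:, :n_clusters]                        # smallest-eigenvalue eigenvectors
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return kmeans(U, n_clusters)


C_demo = np.array([[0, 1, 1, 0, 0, 0],
                   [1, 0, 1, 0, 0, 0],
                   [1, 1, 0, 0, 0, 0],
                   [0, 0, 0, 0, 1, 1],
                   [0, 0, 0, 1, 0, 1],
                   [0, 0, 0, 1, 1, 0]], dtype=float)
print(spectral_pseudo_labels(C_demo, n_clusters=2, k=2))  # [0 0 0 1 1 1]
```

The cluster indices are arbitrary ids, which is all a pseudo label needs to be: the classifier in the next step only has to agree with the partition, not with any particular numbering.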
Step five: self-supervised learning. Self-supervision is constructed mainly by adding a classification network of the kind used in supervised learning. Because the convolutional subspace clustering network reconstructs the original data well, the extracted features, i.e. the sparse representation layer, contain enough information to predict the labels of the data samples. A classification network, a conventional supervised-learning classifier, is therefore added after the sparse representation layer to classify the data. The pseudo labels generated by the clustering module in the previous step serve as the expected classification results, and the error between the classification result and the expected label is computed and back-propagated through the neural network. This produces the self-supervision effect, continuously supervising feature extraction and the learning of the subspace clustering network.
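The training signal from the pseudo labels can be illustrated with a softmax cross-entropy loss, a common choice for a classification network, though the patent does not name the loss. The sketch returns the loss and its gradient with respect to the classifier logits, which is the quantity back-propagation would pass into the network; the function names are hypothetical, and in a full model this term would presumably be added, with some weight, to the reconstruction and self-expression losses.

```python
import numpy as np


def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)  # avoid overflow in exp
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)


def pseudo_label_loss(logits, pseudo_labels):
    """Mean cross-entropy of classifier logits (N x K) against integer pseudo labels,
    and its gradient with respect to the logits."""
    n = logits.shape[0]
    probs = softmax(logits)
    rows = np.arange(n)
    loss = -np.mean(np.log(probs[rows, pseudo_labels] + 1e-12))
    grad = probs.copy()
    grad[rows, pseudo_labels] -= 1.0   # d(loss)/d(logits) = (softmax - one_hot) / N
    return loss, grad / n


loss, grad = pseudo_label_loss(np.zeros((2, 2)), np.array([0, 1]))
print(round(loss, 4))  # 0.6931 (= ln 2 for a uniform two-class prediction)
```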
Step six: identifying the network traffic type. The mapping between the clusters and specific network types is determined by maximum likelihood estimation, and the network traffic type is identified. Let $B = \{b_1, b_2, \ldots, b_n\}$ be the set of clusters obtained by clustering the data set, where $n$ is the number of clusters, and let $D = \{d_1, d_2, \ldots, d_m\}$ be the set of network traffic types to be identified, where $m$ is the number of application types and $m \le n$. For the classification of the data set there exists a mapping $f: B \to D$, which is established by maximum likelihood estimation using the probability
$$P(d_j \mid b_i) = \frac{h_{ji}}{h_i}, \qquad 1 \le j \le m,\; 1 \le i \le n,$$
where $h_{ji}$ is the number of data flows in cluster $b_i$ that have been labelled as network application type $d_j$, $h_i$ is the total number of data objects in cluster $b_i$, and $P(d_j \mid b_i)$ is the probability of mapping cluster $b_i$ to the specific network application type $d_j$. The probability matching function between data traffic and network application types is:
[Formula image: probability matching function between data traffic and network application types]
Let $x$ be the lower probability limit of the maximum likelihood estimation. According to the formula above, if the known-type samples labelled $d_j$ make up the largest proportion of all data objects in cluster $b_i$ and that proportion exceeds $x$, the flows in the cluster are identified as data of network application type $d_j$. If the proportion does not reach $x$, the traffic type of the cluster is marked as an unknown network application type, i.e. the identified traffic is unknown traffic. This completes the network traffic identification.
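The cluster-to-type mapping described above can be sketched in plain Python. Here `known_types` holds the labelled type $d_j$ of each flow, or `None` for unlabelled flows, and $h_i$ counts every flow in the cluster, labelled or not. Treating a proportion exactly equal to $x$ as unknown and breaking ties by first occurrence are assumptions, since the text only says the proportion must exceed $x$; the marker string `"unknown"` and the function name are hypothetical.

```python
from collections import Counter, defaultdict

UNKNOWN = "unknown"  # hypothetical marker for an unknown application type


def map_clusters_to_types(cluster_ids, known_types, x):
    """Map each cluster b_i to the type d_j maximising P(d_j | b_i) = h_ji / h_i,
    or to UNKNOWN when that maximum does not exceed the lower limit x."""
    h = Counter(cluster_ids)              # h_i: all flows in cluster b_i
    h_ji = defaultdict(Counter)           # h_ji: flows in b_i labelled d_j
    for b, d in zip(cluster_ids, known_types):
        if d is not None:
            h_ji[b][d] += 1
    mapping = {}
    for b, h_i in h.items():
        if h_ji[b]:
            d_best, count = h_ji[b].most_common(1)[0]   # ties: first seen wins
            mapping[b] = d_best if count / h_i > x else UNKNOWN
        else:
            mapping[b] = UNKNOWN                        # no labelled flows at all
    return mapping


clusters = [0, 0, 0, 0, 1, 1, 1, 2, 2]
types = ["Skype", "Skype", "Skype", "AIM", "Vimeo", None, None, "ICQ", "Torrent"]
print(map_clusters_to_types(clusters, types, x=0.5))
# {0: 'Skype', 1: 'unknown', 2: 'unknown'}
```

Cluster 0 maps to Skype because $3/4 > 0.5$; cluster 1 stays unknown because only one of its three flows is labelled ($1/3$), and cluster 2 stays unknown because its best proportion is exactly $0.5$.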
The identification performance of the method is evaluated mainly by the per-application identification accuracy and the overall identification accuracy. The per-application accuracy is the ratio of the number of flows correctly identified as a given application type to the number of flows identified as that type; the overall accuracy is the ratio of the number of flows correctly identified as their application types to the total number of flows in the target data set. The higher both values, the better the network traffic identification performance.
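Both evaluation metrics are simple ratios. A sketch, assuming flows predicted as the hypothetical label `"unknown"` are excluded from the per-application figures but still count in the overall denominator; the function name is also hypothetical.

```python
from collections import Counter


def identification_accuracy(true_types, predicted_types, unknown="unknown"):
    """Per-application accuracy (correct / predicted as that type) and overall
    accuracy (correct / all flows in the target data set)."""
    if not true_types or len(true_types) != len(predicted_types):
        raise ValueError("need two non-empty sequences of equal length")
    predicted = Counter(p for p in predicted_types if p != unknown)
    correct = Counter(t for t, p in zip(true_types, predicted_types) if t == p)
    per_app = {app: correct[app] / n for app, n in predicted.items()}
    overall = sum(correct.values()) / len(true_types)
    return per_app, overall


truth = ["Skype", "Skype", "AIM", "AIM"]
pred = ["Skype", "AIM", "AIM", "unknown"]
print(identification_accuracy(truth, pred))  # ({'Skype': 1.0, 'AIM': 0.5}, 0.5)
```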
The invention has the following characteristics:
1. the method does not depend on the payload carried by data frames and therefore identifies encrypted traffic and similar traffic well;
2. the introduced self-supervised learning method lets a certain proportion of pseudo-labelled data samples provide effective mapping guidance for the whole model, improving network traffic identification.
The convolutional subspace clustering network not only reduces the dimensionality of the input data effectively but can also learn latent features of the analysed data, by means such as adjusting the number of network layers and optimizing the training process, and recover the data through reconstruction. In self-supervised learning the labels come from the data themselves: a self-supervised task and objective are constructed to optimize the neural network, which supervises the learning process, improves the quality of the learned representation, and improves performance on downstream tasks.
For network traffic identification, the invention designs a method based on a self-supervised convolutional subspace clustering network that combines the convolutional subspace clustering network with self-supervised learning. It addresses the dependence on data-frame payload information and the difficulty of identifying encrypted traffic, and effectively accomplishes the traffic identification task.

Claims (8)

1. A network traffic identification method based on a self-supervised convolutional subspace clustering network, characterized by comprising the following steps:
1) data preprocessing:
filtering the acquired network traffic data set according to a set strategy and converting raw network traffic data of various formats into a uniform data format, while avoiding loss of key data items during conversion;
2) initializing and pre-training the autoencoder:
initializing an autoencoder network, then feeding the data from step 1) into the encoder for pre-training;
3) training the convolutional subspace clustering network and learning a sparse representation matrix of the data:
training the convolutional subspace clustering network, wherein its autoencoder part is initialized with the autoencoder parameters learned in step 2) and the raw data are fed into the network;
4) constructing a pseudo label:
constructing a similarity matrix from the sparse representation matrix learned by the convolutional subspace clustering network in step 3), then applying spectral clustering to the similarity matrix to obtain a cluster partition of the data samples, wherein the partition obtained by spectral clustering serves as a pseudo label of the data set; although it is not correct for every sample, it still contains useful information provided pre-training is sufficient, and the pseudo labels generated by clustering are accordingly used to supervise the network's feature extraction and sparse-matrix learning;
5) self-supervised learning:
the self-supervision structure is realized mainly by adding a classification network of the kind used in supervised learning; because the convolutional subspace clustering network reconstructs the original data well, the extracted features, i.e. the sparse representation layer, contain enough information to predict the labels of the data samples, so the classification network is added after the sparse representation layer and the pseudo labels generated by the clustering module in the previous step are used as the expected classification results, whereby feature extraction and subspace clustering learning are supervised;
6) identifying the type of network traffic:
determining, by maximum likelihood estimation, the mapping between the clusters obtained in step 5) and specific network types, and identifying the network traffic type.
2. The network traffic identification method based on a self-supervised convolutional subspace clustering network as claimed in claim 1, wherein the data set in step 1) is the UNB ISCX network traffic data set, collected from 13 applications in the major categories mail, instant messaging, streaming media, file transfer, VoIP and P2P, and the specific applications covered by the data set are FileZilla, Hangouts, Skype, AIM, Facebook Chat, Gmail Chat, Mail, Torrent, Vimeo, YouTube, ICQ, Hangouts Audio and Skype Audio.
3. The method as claimed in claim 1, wherein the preprocessing of the network traffic data in step 1) processes the UNB ISCX network traffic data set by flow filtering and flow cleaning and maps the feature attributes of each flow record to the same number of pixels, thereby converting noisy, incomplete and inconsistent raw data into suitable input data.
4. The network traffic identification method based on a self-supervised convolutional subspace clustering network as claimed in claim 1, wherein the autoencoder used in step 2) has an overall spindle shape, wide at both ends and narrow in the middle, and consists of an encoder and a decoder, i.e. a network that maps the original data space to a latent space and then reconstructs the original data space from the latent space.
5. The network traffic identification method based on a self-supervised convolutional subspace clustering network as claimed in claim 1, wherein before the convolutional subspace clustering network is trained in step 3), its autoencoder part is initialized with the autoencoder parameters learned in the previous step, and the whole network is then trained until it converges.
6. The method for identifying network traffic based on the unsupervised convolutional subspace clustering network as claimed in claim 1, wherein the pseudo label of the data constructed in step 4) is generated by adding a clustering module in the convolutional subspace clustering network, measuring the distance between two vectors by using cosine similarity in the similarity matrix construction of the clustering module, and further using a spectral clustering algorithm to realize clustering, wherein the spectral clustering is to convert the obtained data into a graph, construct a graph data structure by using a KNN method, and then realize spectral clustering on the basis of the graph data structure, and in the convolutional subspace clustering network, the pseudo label is generated by using the clustering module by using sparse representation of the data obtained in the training stage.
7. The network traffic identification method based on the self-supervised convolutional subspace clustering network according to claim 1, wherein the self-supervision in step 5) is realized by a classification module learned on top of the sparse representation of the data in the convolutional subspace clustering network. The classification module is a classification network of the kind used in conventional supervised learning: it classifies the data, the error between the classification result and the expected label is computed using the pseudo-labels generated by the clustering module, and this error is back-propagated through the neural network to achieve the self-supervision effect.
8. The network traffic identification method based on the self-supervised convolutional subspace clustering network according to claim 1, wherein, when the network traffic type is identified in step 6), the mapping between the resulting clusters and specific network application types is determined by maximum likelihood estimation. Let $B = \{b_1, b_2, \ldots, b_n\}$ be the set of clusters obtained by clustering the data set, where $n$ is the number of clusters, and let $D = \{d_1, d_2, \ldots, d_m\}$ be the set of network traffic types to be identified, where $m$ is the number of application types and $m \le n$. A mapping $f$ is established by maximum likelihood estimation through the probability formula

$$P(d_j \mid b_i) = \frac{h_{ji}}{h_i}, \qquad 1 \le j \le m,\ 1 \le i \le n,$$

where $h_{ji}$ is the number of data flows in cluster $b_i$ that have been labeled as network application type $d_j$, $h_i$ is the total number of data objects in cluster $b_i$, and $P(d_j \mid b_i)$ is the probability of mapping cluster $b_i$ to the specific network application type $d_j$. The probability matching function between data traffic and network application type is

$$f(b_i) = \arg\max_{d_j \in D} P(d_j \mid b_i).$$

Let $x$ be the lower probability bound of the maximum likelihood estimation. Using the formulas above, if the known-type samples labeled $d_j$ make up the largest proportion of all data objects in cluster $b_i$ and that proportion exceeds $x$, the data flows in the cluster are identified as network application type $d_j$. If the largest proportion falls below $x$, the network traffic type of the cluster is marked as an unknown network application type, i.e., the identified traffic is unknown traffic. This completes the network traffic identification.
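The preprocessing of claim 3 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and the cleaning rule (drop records with missing or non-finite values), the caller-supplied filter predicate `keep`, one pixel per feature, per-feature min-max scaling to 0..255, and zero padding to a square image are all assumptions.

```python
import math


def filter_and_clean(records, keep=lambda rec: True):
    """Flow cleaning, then flow filtering (hypothetical rules).

    Cleaning: drop any record with a missing (None) or non-finite value.
    Filtering: keep only records for which the predicate `keep` is True.
    """
    cleaned = [list(rec) for rec in records
               if all(v is not None and math.isfinite(v) for v in rec)]
    return [rec for rec in cleaned if keep(rec)]


def flows_to_pixels(records, side):
    """Map every flow record to a side x side grayscale image.

    Assumptions: one pixel per feature, per-feature min-max scaling to
    0..255, and zero padding so every record has the same pixel count.
    """
    n_feat = len(records[0])
    if n_feat > side * side:
        raise ValueError("side * side must be at least the number of features")
    lo = [min(rec[k] for rec in records) for k in range(n_feat)]
    hi = [max(rec[k] for rec in records) for k in range(n_feat)]
    images = []
    for rec in records:
        pixels = []
        for k, v in enumerate(rec):
            span = hi[k] - lo[k]
            # A constant feature carries no information; map it to 0.
            pixels.append(round(255 * (v - lo[k]) / span) if span else 0)
        pixels += [0] * (side * side - n_feat)  # pad to a full square
        images.append([pixels[r * side:(r + 1) * side] for r in range(side)])
    return images
```

For example, three-feature records with `side=2` become 2×2 images whose fourth pixel is padding; a record with a missing value is removed before scaling, so it does not distort the per-feature minimum and maximum.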
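The bottleneck autoencoder of claim 4 (data space, then latent space, then back to data space) can be illustrated with a toy forward pass. This sketch assumes fully connected layers, tanh hidden activations, an identity output layer, and untrained random weights; the patent's autoencoder is convolutional, and all names and layer widths here are hypothetical.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_autoencoder(widths, seed=0):
    """Build an untrained bottleneck autoencoder such as [16, 8, 4, 8, 16].

    The widths must be symmetric and smallest in the middle (wide at both
    ends, narrow in the middle). Returns (layers, index of the latent layer).
    """
    mid = len(widths) // 2
    if widths != widths[::-1] or widths[mid] != min(widths):
        raise ValueError("expected a symmetric bottleneck")
    rng = random.Random(seed)
    layers = [init_layer(a, b, rng) for a, b in zip(widths, widths[1:])]
    return layers, mid


def forward(model, x):
    """Encode x into the latent space, then decode it back to the data space."""
    layers, mid = model
    h, latent = x, None
    for i, (weights, bias) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(weights, bias)]
        # tanh on hidden layers, identity on the output layer
        h = z if i == len(layers) - 1 else [math.tanh(v) for v in z]
        if i + 1 == mid:
            latent = h  # encoder output: the point in the latent space
    return latent, h


def reconstruction_error(x, x_hat):
    """Mean squared error between the input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
```

Training would minimize `reconstruction_error` over the data set; a real implementation would use convolutional layers and a deep-learning framework rather than Python lists.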
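The two-stage training of claim 5 (initialize the autoencoder part from the pretrained autoencoder, then train the whole network to convergence) can be sketched with parameters stored in a dictionary. The parameter names, the plain-float parameters, the caller-supplied `step` and `loss` functions, and the convergence test on the change in loss are all hypothetical simplifications.

```python
import copy


def init_from_pretrained(full_params, pretrained_ae):
    """Return a copy of full_params whose autoencoder entries are replaced by
    the pretrained values; all other parameters keep their fresh values."""
    params = copy.deepcopy(full_params)
    for name, value in pretrained_ae.items():
        if name not in params:
            raise KeyError(f"pretrained parameter {name!r} has no counterpart")
        params[name] = copy.deepcopy(value)
    return params


def train_until_converged(params, step, loss, tol=1e-6, max_epochs=1000):
    """Update all parameters with `step` until the loss changes by less than tol."""
    prev = loss(params)
    for epoch in range(1, max_epochs + 1):
        params = step(params)
        cur = loss(params)
        if abs(prev - cur) < tol:
            return params, epoch
        prev = cur
    return params, max_epochs
```

For example, `{"encoder.w": 0.0, "decoder.w": 0.0, "self_expression.C": 1.0}` initialized from a pretrained `{"encoder.w": 0.8, "decoder.w": -0.3}` keeps the freshly initialized `self_expression.C` (a hypothetical name for the layer that learns the sparse representation) and then trains every entry together.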
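The graph construction step of claim 6 (cosine similarity, then a KNN graph) can be sketched as below. The input vectors stand in for rows of the learned sparse representation; the symmetric "either point is a neighbour of the other" rule, clipping negative similarities to 0, and the function names are assumptions. The eigen-decomposition part of spectral clustering is not shown.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_affinity(vectors, k):
    """Symmetric KNN graph weighted by cosine similarity.

    i and j are connected if j is among the k most similar points to i, or
    vice versa; negative similarities are clipped to 0 (an assumption).
    """
    n = len(vectors)
    sim = [[cosine_similarity(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: sim[i][j], reverse=True)
        for j in others[:k]:
            W[i][j] = W[j][i] = max(sim[i][j], 0.0)
    return W


def laplacian(W):
    """Unnormalized graph Laplacian L = D - W, the usual input to spectral clustering."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]
```

The affinity matrix `W` could then be handed to an off-the-shelf spectral clustering routine that accepts a precomputed affinity, such as scikit-learn's `SpectralClustering(affinity="precomputed")`, whose cluster assignments would serve as the pseudo-labels.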
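The self-supervision signal of claim 7 is a classification loss computed against the clustering module's pseudo-labels. The sketch below assumes a softmax (multinomial logistic) classifier, cross-entropy as the error, and full-batch gradient descent; it updates only the classifier head, whereas in the claimed network the same error would also be back-propagated into the layers that produce the sparse representation. All names are hypothetical.

```python
import math


def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def logits(W, b, x):
    return [sum(w * v for w, v in zip(row, x)) + bc for row, bc in zip(W, b)]


def cross_entropy(W, b, feats, labels):
    """Mean error between the classifier output and the (pseudo-)labels."""
    return -sum(math.log(softmax(logits(W, b, x))[y]) for x, y in zip(feats, labels)) / len(feats)


def train_on_pseudo_labels(feats, pseudo_labels, n_classes, lr=0.5, epochs=200):
    """Fit a softmax classifier to pseudo-labels by full-batch gradient descent."""
    d, n = len(feats[0]), len(feats)
    W = [[0.0] * d for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        gW = [[0.0] * d for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for x, y in zip(feats, pseudo_labels):
            p = softmax(logits(W, b, x))
            for c in range(n_classes):
                err = p[c] - (1.0 if c == y else 0.0)  # d(cross-entropy)/d(logit_c)
                gb[c] += err
                for k in range(d):
                    gW[c][k] += err * x[k]
        for c in range(n_classes):
            b[c] -= lr * gb[c] / n
            for k in range(d):
                W[c][k] -= lr * gW[c][k] / n
    return W, b


def predict(W, b, x):
    z = logits(W, b, x)
    return max(range(len(z)), key=z.__getitem__)
```

The term `p[c] - onehot[c]` is the standard gradient of softmax cross-entropy with respect to the logits; it is the error that back-propagation carries into earlier layers.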
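The cluster-to-application mapping of claim 8 can be computed directly from $P(d_j \mid b_i) = h_{ji}/h_i$. In this sketch the input format (one cluster id and one optional known type per flow), the name `UNKNOWN`, tie-breaking by first type encountered, and treating a proportion exactly equal to $x$ as unknown are assumptions.

```python
from collections import Counter

UNKNOWN = "unknown"


def map_clusters(cluster_ids, known_types, x):
    """Map each cluster b_i to an application type by maximum likelihood.

    cluster_ids[k] is the cluster of flow k; known_types[k] is its labeled
    application type, or None if the flow is unlabeled. Returns a dict
    {cluster: (type or UNKNOWN, P(d_j | b_i) of the most likely type)}.
    """
    h_i = Counter(cluster_ids)  # total data objects per cluster
    h_ji = Counter((c, t) for c, t in zip(cluster_ids, known_types) if t is not None)
    result = {}
    for c, total in h_i.items():
        counts = {t: n for (cc, t), n in h_ji.items() if cc == c}
        if not counts:
            result[c] = (UNKNOWN, 0.0)  # no labeled flows in this cluster
            continue
        best = max(counts, key=counts.get)  # ties: first type encountered
        p = counts[best] / total
        # "Exceeds x": a proportion equal to x is treated as unknown (assumption).
        result[c] = (best if p > x else UNKNOWN, p)
    return result
```

Note that the denominator $h_i$ counts all flows in the cluster, labeled or not, so a cluster dominated by unlabeled flows tends to be reported as unknown traffic.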
CN202111270837.4A 2021-10-29 2021-10-29 Network flow identification method based on self-supervision convolution subspace clustering network Withdrawn CN114006870A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111270837.4A CN114006870A (en) 2021-10-29 2021-10-29 Network flow identification method based on self-supervision convolution subspace clustering network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111270837.4A CN114006870A (en) 2021-10-29 2021-10-29 Network flow identification method based on self-supervision convolution subspace clustering network

Publications (1)

Publication Number Publication Date
CN114006870A 2022-02-01

Family

ID=79925094

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111270837.4A Withdrawn CN114006870A (en) 2021-10-29 2021-10-29 Network flow identification method based on self-supervision convolution subspace clustering network

Country Status (1)

Country Link
CN (1) CN114006870A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114679308A (en) * 2022-03-21 2022-06-28 山东大学 Unknown flow identification method and system based on double-path self-coding
CN114679308B (en) * 2022-03-21 2023-04-07 山东大学 Unknown flow identification method and system based on double-path self-coding
CN116827689A (en) * 2023-08-29 2023-09-29 成都雨云科技有限公司 Edge computing gateway data processing method based on artificial intelligence and gateway
CN116827689B (en) * 2023-08-29 2023-11-14 成都雨云科技有限公司 Edge computing gateway data processing method based on artificial intelligence and gateway
CN117527446A (en) * 2024-01-03 2024-02-06 上海人工智能网络系统工程技术研究中心有限公司 Network abnormal flow refined detection method
CN117527446B (en) * 2024-01-03 2024-03-12 上海人工智能网络系统工程技术研究中心有限公司 Network abnormal flow refined detection method

Similar Documents

Publication Publication Date Title
CN111581405B (en) Cross-modal generalization zero sample retrieval method for generating confrontation network based on dual learning
CN109949317B (en) Semi-supervised image example segmentation method based on gradual confrontation learning
CN114006870A (en) Network flow identification method based on self-supervision convolution subspace clustering network
CN109902740B (en) Re-learning industrial control intrusion detection method based on multi-algorithm fusion parallelism
CN109218223B (en) Robust network traffic classification method and system based on active learning
CN113177132B (en) Image retrieval method based on depth cross-modal hash of joint semantic matrix
CN109711483B (en) Spark Autoencoder-based power system operation mode clustering method
CN112733965B (en) Label-free image classification method based on small sample learning
CN112784929B (en) Small sample image classification method and device based on double-element group expansion
CN115037805B (en) Unknown network protocol identification method, system and device based on deep clustering and storage medium
Yang et al. One-class classification using generative adversarial networks
CN111428201A (en) Prediction method for time series data based on empirical mode decomposition and feedforward neural network
CN111680644B (en) Video behavior clustering method based on deep space-time feature learning
CN114333064A (en) Small sample behavior identification method and system based on multidimensional prototype reconstruction reinforcement learning
CN114998602A (en) Domain adaptive learning method and system based on low confidence sample contrast loss
CN116452862A (en) Image classification method based on domain generalization learning
CN112990371B (en) Unsupervised night image classification method based on feature amplification
CN114399055A (en) Domain generalization method based on federal learning
CN115348215B (en) Encryption network traffic classification method based on space-time attention mechanism
CN114168782B (en) Deep hash image retrieval method based on triplet network
CN115759205A (en) Negative sample sampling method based on multi-model cooperation contrast learning
CN115277888A (en) Method and system for analyzing message type of mobile application encryption protocol
CN115131605A (en) Structure perception graph comparison learning method based on self-adaptive sub-graph
Liu et al. A survey of image clustering: Taxonomy and recent methods
Chen et al. Semi-supervised convolutional neural networks with label propagation for image classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20220201