CN112270351A - Semi-supervised encryption traffic identification method for generating countermeasure network based on auxiliary classification - Google Patents


Info

Publication number
CN112270351A
Authority
CN
China
Prior art keywords
certificate
ssl
connection
record
log
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011150439.4A
Other languages
Chinese (zh)
Inventor
张明明
冒佳明
夏飞
赵俊峰
夏元轶
曾锃
许良杰
蒲强
马媛媛
陈璐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Global Energy Interconnection Research Institute
Anhui Jiyuan Software Co Ltd
Information and Telecommunication Branch of State Grid Jiangsu Electric Power Co Ltd
Original Assignee
Global Energy Interconnection Research Institute
Anhui Jiyuan Software Co Ltd
Information and Telecommunication Branch of State Grid Jiangsu Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Global Energy Interconnection Research Institute, Anhui Jiyuan Software Co Ltd, Information and Telecommunication Branch of State Grid Jiangsu Electric Power Co Ltd
Priority to CN202011150439.4A
Publication of CN112270351A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/18: Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Mathematical Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computational Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Biomedical Technology (AREA)
  • Computer Hardware Design (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Operations Research (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a semi-supervised encrypted traffic identification method based on an auxiliary classifier generative adversarial network (ACGAN). The method modifies the original ACGAN: the generator receives random noise, hidden variables and data labels, fuses them into a single vector, and generates data that carries the characteristics of real traffic; the discriminator receives both unlabeled and labeled samples from the real data, and three stacked MLP networks respectively distinguish real from generated flows, classify the flows, and extract the hidden variables. The method also modifies the loss function of the original ACGAN so that unlabeled data can be used for semi-supervised learning, which improves identification accuracy, reduces the cost of capturing and labeling network traffic, and raises the level of network management and security monitoring.

Description

Semi-supervised encryption traffic identification method for generating countermeasure network based on auxiliary classification
Technical Field
The invention relates to a semi-supervised encrypted traffic identification method based on an auxiliary classifier generative adversarial network, and belongs to the technical field of encrypted traffic identification.
Background
Traffic classification and identification are the basis for improving network management, security monitoring and quality of service, and are a prerequisite for activities such as network design and planning. As users become more aware of privacy protection and security, technologies such as SSL, SSH and VPN are used more and more widely, so encrypted traffic accounts for a growing proportion of network transmission.
Because of application-layer encryption, traditional port matching and DPI cannot accurately identify application traffic. Compared with traditional machine learning, deep learning can express the essential characteristics of data well, but it relies on a large number of labeled samples during training, and the accuracy of those samples directly determines the recognition rate of the trained model. However, capturing and labeling the traffic of encrypted applications is very difficult, so it is hard to obtain directly the sample size required to train a good model, which makes the cost high.
Most existing deep learning-based traffic identification methods use supervised learning and depend on a large amount of labeled data. Labeled data are difficult and costly to obtain, whereas unlabeled data are readily available. Combining a large amount of unlabeled traffic data with a small amount of labeled traffic data to complete the classification task in a semi-supervised manner would therefore greatly reduce the dependence on large labeled data sets.
Using a small amount of encrypted traffic data from real applications, the generation method described in the invention can simply and quickly generate encrypted traffic with good application characteristics; by also using unlabeled real traffic, which is easier to obtain, it can achieve a better identification effect than a supervised learning method under the same conditions, thereby greatly reducing the cost of traffic capture and labeling.
Disclosure of Invention
The technical problem to be solved by the invention is to overcome the defects of the prior art and provide a semi-supervised encrypted traffic identification method based on an auxiliary classifier generative adversarial network.
In order to achieve the above object, the present invention provides a semi-supervised encrypted traffic identification method based on an auxiliary classifier generative adversarial network, comprising the steps of:
1) monitoring a network card of the network device with capture code based on libpcap, capturing traffic that contains the traffic to be identified, and storing it as a pcap;
2) labeling the known traffic in the captured pcap to form a labeled pcap and an unlabeled pcap;
3) extracting three logs from the labeled and unlabeled pcaps, respectively, using the open-source traffic analysis tool zeek: the connection log, the SSL log and the certificate log;
4) extracting the defined features from the three logs obtained in step 3) to obtain a flow feature matrix, forming a real data set for training;
5) constructing the auxiliary classifier generative adversarial network structure: the generator receives a vector c synthesized from random noise, hidden variables and class labels and generates traffic; the discriminator receives the flow feature matrix obtained in step 4) and the traffic generated by the generator for recognition training, and its output is fed to three fully connected neural networks that respectively identify real versus generated traffic, classify the traffic, and extract the hidden parameters;
6) denoting the real data in the real data set by x and dividing it into real unlabeled data and real labeled data, define $P_{data\text{-}unlabel}$ as the probability distribution of the real unlabeled data, $P_{data\text{-}label}$ as the probability distribution of the real labeled data, and $P_g$ as the probability distribution of the generated data, where the generated data are obtained by feeding the synthesized vector to the generator; the adversarial loss $L_S$ of the discriminator uses the Wasserstein distance in place of the original log-likelihood function and is expressed as the Earth-Mover distance from $P_{data\text{-}unlabel}$ and $P_{data\text{-}label}$ to $P_g$:
[Equation image BDA0002741018440000021 not reproduced]
where sup denotes the least upper bound (supremum), $\|f\|_L$ is the Lipschitz constraint on f, $\mathbb{E}_{x \sim P_{data\text{-}label}}$ is the expectation over x drawn from $P_{data\text{-}label}$, $\mathbb{E}_{x \sim P_{data\text{-}unlabel}}$ is the expectation over x drawn from $P_{data\text{-}unlabel}$, $\mathbb{E}_{x \sim P_g}$ is the expectation over x drawn from $P_g$, and $P_g(x)$ is the probability distribution of the random variable x;
7) the classification loss uses the cross-entropy loss: let y denote the output of MLP_C, with $y_i$ the value of its i-th component; let y' denote the label of the flow that produced y, with $y'_i$ the value of its i-th component; and let m be the total number of classes; the classification loss is:
$L_c = -\sum_{i=1}^{m} y'_i \log y_i$
8) let the hidden parameter vector input to the generator be $h = (h_1, h_2, \ldots, h_n)$, and let the hidden parameter vector output by MLP_H be
[Equation image BDA0002741018440000026 not reproduced]
The reduction loss $L_h$ for the hidden parameters is defined as
[Equation image BDA0002741018440000027 not reproduced]
that is, the reduction loss is the sum of the distances between each component of the hidden parameter vector input to the generator and the corresponding component of the hidden parameter vector output by MLP_H;
9) training with a mini-batch algorithm: sampling an arbitrary number of random noise vectors z from the noise distribution, taking an arbitrary number of labels c from the labeled samples and an arbitrary number of hidden variables h, fusing them by matrix fusion into a synthesized vector, and feeding it to the generator for training to produce generated traffic that carries real traffic characteristics; feeding an arbitrary number of labeled and unlabeled real flows to the discriminator for training;
10) alternately training the discriminator and the generator, with the following parameter update rules when training the semi-supervised auxiliary classifier generative adversarial network: optimizing the adversarial loss $L_S$ updates the network parameters of the generator, the discriminator and MLP_S; optimizing the classification loss $L_c$ updates the network parameters of the generator, the discriminator and MLP_C; optimizing the reduction loss $L_h$ updates the network parameters of the generator, the discriminator and MLP_H;
11) repeating step 10) until Nash equilibrium is reached through the adversarial learning of the generator and the discriminator;
and performing the traffic identification and classification task on live network traffic using the data preprocessing of steps 1) to 4), the trained discriminator and MLP_C.
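The three loss terms named in steps 6) to 8) can be illustrated numerically. The following is a minimal pure-Python sketch, not the patented implementation: the function names are hypothetical, the adversarial term is shown only as the usual empirical difference of mean critic scores (the original equation image is not reproduced, so how labeled and unlabeled scores are combined is an assumption), and absolute difference is assumed as the per-component "distance" in the reduction loss.

```python
import math


def wasserstein_estimate(real_scores, fake_scores):
    """Mean critic score on real flows minus mean score on generated flows.

    Assumption: labeled and unlabeled real scores are pooled into real_scores.
    """
    return sum(real_scores) / len(real_scores) - sum(fake_scores) / len(fake_scores)


def classification_loss(y, y_label, eps=1e-12):
    """Cross entropy between the MLP_C output y and the one-hot label y_label."""
    return -sum(t * math.log(max(p, eps)) for p, t in zip(y, y_label))


def reduction_loss(h_in, h_out):
    """Sum of per-component distances between the hidden vector fed to the
    generator and the one recovered by MLP_H (absolute difference assumed)."""
    return sum(abs(a - b) for a, b in zip(h_in, h_out))


# Hypothetical values for a single sample and a tiny batch of critic scores.
l_c = classification_loss([0.5, 0.5], [1, 0])
l_h = reduction_loss([1.0, 2.0, 3.0], [1.0, 1.0, 5.0])
l_s = wasserstein_estimate([1.0, 3.0], [0.0, 2.0])
```

In an actual training loop these quantities would be computed on mini-batches by a deep learning framework and the Lipschitz constraint would have to be enforced on the critic; neither is shown here.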
Further, in step 4), the flow features in the pcap are extracted with the open-source tool zeek according to the defined features, forming 28 flow features; a flow feature matrix of the 28 features is obtained, forming a real data set for training.
Furthermore, the structure of the original auxiliary classifier generative adversarial network is modified by introducing hidden variables, so that the output of the generator can be controlled according to particular hidden features.
Further, the loss function of the original auxiliary classifier generative adversarial network is modified: the Wasserstein distance replaces the original log-likelihood function, and a reduction loss is defined.
Further, in step 3), the logs comprise conn.log (connection records), ssl.log (SSL connection records) and x509.log (certificate records).
conn.log: each line describes a connection between two endpoints and records information including the IP addresses, ports and protocol of the sender and receiver of the data packets, the connection state, the number of data packets, and the label;
ssl.log: each line describes an SSL/TLS handshake and encryption setup, including the SSL/TLS version, the cipher used, the server name, the certificate path, the subject and the issuer;
x509.log: each line is a certificate record describing certificate information including the certificate serial number, common name, validity period, subject, signature algorithm and key length in bits.
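As an illustration of how the three logs might be read before feature extraction, the sketch below parses Zeek's default tab-separated log format, in which header lines start with `#` and the `#fields` line names the columns. The function name `read_zeek_log` and the sample rows are hypothetical; the column names shown (`ts`, `uid`, `duration`, `conn_state`) are a small subset of conn.log chosen for the example.

```python
import io


def read_zeek_log(handle):
    """Yield one dict per record from a Zeek TSV log (conn.log, ssl.log, x509.log)."""
    fields = []
    for raw in handle:
        line = raw.rstrip("\n")
        if line.startswith("#fields"):
            # Column names follow the "#fields" token, separated by tabs.
            fields = line.split("\t")[1:]
        elif line and not line.startswith("#"):
            yield dict(zip(fields, line.split("\t")))


# Hypothetical two-record excerpt; "-" is how Zeek writes an unset field.
SAMPLE = "\n".join([
    "#fields\tts\tuid\tduration\tconn_state",
    "1.0\tC1\t0.5\tSF",
    "2.0\tC2\t-\tS0",
    "#close\t2020-01-01-00-00-00",
])
records = list(read_zeek_log(io.StringIO(SAMPLE)))
```

All values are returned as strings, so numeric columns such as `duration` would still need converting (and unset `-` values handling) before any statistics are computed.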
Further, the flow features include a connection feature, an SSL feature, and a certificate feature.
Further, the connection features include:
(1) number of SSL aggregations and connection records: each connection contains a certain number of SSL aggregations and connection records; this first feature is simply the sum of the number of SSL aggregations and the number of connection records;
(2) mean duration: each connection record contains a duration in seconds; for each incoming SSL aggregation and connection record, this duration value is stored in a list, from which the average is finally calculated; let $t = \{t_1, t_2, t_3, \ldots, t_n\}$ be the list of duration values; then the average value of t is:
$\bar{t} = \frac{1}{n}\sum_{i=1}^{n} t_i$
(3) standard deviation of duration: calculate the standard deviation of the duration list:
[Equation images BDA0002741018440000042 to BDA0002741018440000044 not reproduced]
(4) standard deviation range of duration: the percentage of all duration values that fall outside a range whose upper limit is the mean plus the standard deviation and whose lower limit is the mean minus the standard deviation;
(5) payload bytes from originator: the number of payload bytes sent by the originator over all connection records from conn.log;
(6) payload bytes from responder: the number of payload bytes sent by the responder over all connection records from conn.log;
(7) ratio of responder bytes to all bytes: all bytes are the bytes from the originator plus the bytes from the responder; the ratio of responder bytes to all bytes is:
$\frac{r}{r + o}$
where r is the number of bytes from the responder and o is the number of bytes from the originator;
(8) ratio of established connection states: each connection record contains a connection state; the 13 connection states are divided into established states and non-established states, which describe whether a TCP handshake took place or was only attempted; the established states are [SF, S1, S2, S3, RSTO, RSTR], which include a successful TCP handshake; the non-established states are [OTH, S0, REJ, SH, SHR, RSTOS0, RSTRH], which include an unsuccessful handshake; the specific meaning of each connection state is given in the zeek documentation, and the ratio is calculated as:
[Equation image BDA0002741018440000046 not reproduced]
where e is the number of established states and n is the number of non-established states;
(9) number of inbound messages: the number of upstream packets contained in the connection log;
(10) number of outgoing messages: the number of downstream packets contained in the connection log;
(11) cycle average: each connection record has a capture time, so the periodicity of a group of connection records can be measured; the first step is to calculate the time differences between consecutive connection records; the second step is to calculate the second time difference as the absolute value of the difference between consecutive first time differences, a second time difference of zero meaning that the associated connection records are periodic; the third step is to store the second time differences in a list D, from which the average is calculated; here $d_n$ denotes the difference between the capture times of the n-th and (n+1)-th connection records, and in the illustrated example the first, second and third values of the second time difference are 0, 0 and 15:
[Equation images BDA0002741018440000051 and BDA0002741018440000052 not reproduced]
(12) periodic standard deviation: using the list D of second time differences obtained for the previous feature, calculate the standard deviation of the list:
[Equation images BDA0002741018440000053 to BDA0002741018440000055 not reproduced]
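Several of the connection features above reduce to short list computations. The sketch below is illustrative only: the function names are hypothetical, the population standard deviation is assumed because the original equation images are not reproduced, the out-of-range value is returned as a fraction rather than a percentage, and the established-state ratio is assumed to be e / (e + n).

```python
import statistics

# Established states as listed in feature (8); every other state counts as non-established.
ESTABLISHED = {"SF", "S1", "S2", "S3", "RSTO", "RSTR"}


def duration_stats(durations):
    """Return (mean, standard deviation, fraction outside mean +/- one std)."""
    mean = statistics.fmean(durations)
    std = statistics.pstdev(durations)  # population form: an assumption
    lower, upper = mean - std, mean + std
    outside = sum(1 for t in durations if t < lower or t > upper)
    return mean, std, outside / len(durations)


def responder_ratio(resp_bytes, orig_bytes):
    """Feature (7): r / (r + o), with 0.0 when no payload bytes were seen."""
    total = resp_bytes + orig_bytes
    return resp_bytes / total if total else 0.0


def established_ratio(states):
    """Feature (8), assuming the ratio is e / (e + n)."""
    if not states:
        return 0.0
    return sum(1 for s in states if s in ESTABLISHED) / len(states)


def second_time_differences(capture_times):
    """Feature (11): absolute differences between consecutive first differences."""
    first = [b - a for a, b in zip(capture_times, capture_times[1:])]
    return [abs(b - a) for a, b in zip(first, first[1:])]


# Five hypothetical capture times chosen to give second differences 0, 0 and 15.
D = second_time_differences([0.0, 10.0, 20.0, 30.0, 55.0])
cycle_mean = statistics.fmean(D)
```

With these capture times the first differences are 10, 10, 10 and 25, so three evenly spaced records give zeros in D and the late fifth record produces the single non-zero entry.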
Further, the SSL features include:
(13) ratio of connection records and SSL aggregations: describes the ratio between non-SSL connection records and SSL connection records; the ratio R of connection records and SSL aggregations is:
[Equation image BDA0002741018440000061 not reproduced]
where $f_n$ is the number of connection records without SSL and $f_s$ is the number of connection records using SSL;
(14) ratio of TLS and SSL versions: every SSL connection record has a TLS or SSL protocol version used for encryption; the versions are SSL 1.0, SSL 2.0, SSL 3.0, TLS 1.0, TLS 1.1, TLS 1.2 and TLS 1.3, where the SSL protocol is earlier than TLS and almost all ordinary traffic uses TLS; this feature describes how many SSL connection records use a TLS protocol; the ratio R of TLS and SSL versions is:
[Equation image BDA0002741018440000062 not reproduced]
where TLS is the number of SSL connection records with a TLS protocol and SSL is the number of SSL connection records with an SSL protocol;
(15) SNI ratio: SNI is the server name in the SSL connection record; this feature describes how many SSL connection records contain an SNI, since the SSL connection records of malware have an empty SNI more often than those of normal software; the SNI ratio is:
$R = \frac{F_s}{F_a}$
where $F_s$ is the number of SSL connection records with an SNI and $F_a$ is the number of all SSL connection records;
(16) SNI-is-IP marker: sometimes an SSL connection record has an IP address as its SNI; in this case, the SNI should be the same as the destination IP address (DstIP); the feature is -1 if any SSL connection record in the connection log has an IP address as SNI that differs from DstIP; it is 0 if any SSL connection record has an IP address as SNI and the SNI is the same as DstIP; it is 1 if no SSL connection record has an IP address as SNI;
(17) average value of certificate path records: let $C = \{c_1, c_2, \ldots, c_n\}$ be the list of the number of certificates in the certificate path of each SSL connection record, where $c_1$ is the number of certificates in the certificate path of the first SSL connection record, $c_2$ that of the second SSL connection record, and $c_n$ that of the n-th SSL connection record; the average value of the certificate path records is calculated from the list:
$\bar{C} = \frac{1}{n}\sum_{i=1}^{n} c_i$
(18) self-signed certificate ratio: zeek can identify whether the end-user certificate is self-signed and stores this in the SSL connection record; the feature is the ratio of the number of self-signed certificates to the number of all end-user certificates in the log.
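The SSL features lend themselves to a short sketch as well. The code below is a hedged illustration with hypothetical function names: it assumes that the TLS ratio is TLS / (TLS + SSL) (the original equation image is not reproduced), that version strings begin with "TLS" or "SSL", and that an unset server name appears as an empty string or `-`.

```python
import ipaddress


def is_ip(text):
    """True if text parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False


def tls_ratio(versions):
    """Feature (14), assuming the ratio is TLS / (TLS + SSL)."""
    if not versions:
        return 0.0
    return sum(1 for v in versions if v.upper().startswith("TLS")) / len(versions)


def sni_ratio(server_names):
    """Feature (15): F_s / F_a, treating '' and '-' as an empty SNI."""
    if not server_names:
        return 0.0
    return sum(1 for s in server_names if s and s != "-") / len(server_names)


def sni_is_ip_marker(records):
    """Feature (16) for (sni, dst_ip) pairs: -1 mismatch, 0 match, 1 no IP as SNI."""
    marker = 1
    for sni, dst_ip in records:
        if is_ip(sni):
            if sni != dst_ip:
                return -1
            marker = 0
    return marker


# Hypothetical SSL connection records.
versions = ["TLSv12", "TLSv13", "SSLv3", "TLSv10"]
names = ["a.example", "-", "", "b.example"]
```

A mismatch takes precedence in `sni_is_ip_marker`, so a log that contains both a matching and a non-matching IP-valued SNI yields -1.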
Further, the certificate features include:
(19) public key mean: each certificate record contains the public key length of the certificate, and the value from each certificate record is added to a list; let $J = \{j_1, j_2, \ldots, j_n\}$ be this list, where $j_1$ is the value in the first certificate record, $j_2$ the value in the second certificate record, and $j_n$ the value in the n-th certificate record; the average is calculated from the list:
$\bar{J} = \frac{1}{n}\sum_{i=1}^{n} j_i$
(20) average value of certificate validity period: each certificate has a validity period stored in the certificate record in unix time; the validity period of each certificate, in seconds, is stored in a list; let $G = \{g_1, g_2, \ldots, g_n\}$ be this list, where $g_1$ is the validity period of the certificate in the first certificate record, $g_2$ that in the second certificate record, and $g_n$ that in the n-th certificate record; the average is calculated from the list:
$\bar{G} = \frac{1}{n}\sum_{i=1}^{n} g_i$
(21) certificate validity standard deviation: using the same list G of certificate validity periods as for the average value, calculate the standard deviation of G in seconds:
[Equation images BDA0002741018440000073 to BDA0002741018440000075 not reproduced]
(22) validity of the certificate period during capture: whether a certificate was valid during capture is determined from the capture time and the validity period of the certificate; a certificate is normal if the capture time lies within its validity period; the feature is the number of certificates that were outside their validity period while the traffic was captured, since malware may use invalid certificates rather than ordinary ones;
(23) mean value of certificate validity start time: the ratio of two lengths of time; the first length is the validity period of the certificate, $K = \{k_1, k_2, \ldots, k_n\}$, where $k_1$ is the validity period of the certificate in the first certificate record, $k_2$ that in the second certificate record, and $k_n$ that in the n-th certificate record; the second length is the time from the start of the certificate's validity to the capture, $P = \{p_1, p_2, \ldots, p_n\}$, where $p_1$ is this time for the first certificate record, $p_2$ for the second certificate record, and $p_n$ for the n-th certificate record, which indicates how old the certificate is; for each certificate the ratio of the two lengths is calculated and stored in a list, and the average Z is then calculated from the list:
[Equation image BDA0002741018440000076 not reproduced]
(24) number of certificates: a connection log usually contains one certificate, but sometimes more; this feature is simply the number of certificates for one connection;
(25) average number of domains in the certificate SAN DNS: the SAN (Subject Alternative Name) describes which domains belong to the certificate; for each new incoming certificate, the number of DNS entries in the SAN is stored in a list, and the average is then calculated from the list;
(26) ratio of certificate records to SSL connection records: describes how many SSL connection records have a certificate path, since a certificate record can be added to the SSL aggregation when it is the first certificate in the certificate path; the feature is the ratio of the number of certificate records to the number of SSL connection records of one connection log;
(27) whether the SNI is in the SAN DNS: the SNI is the server name indication contained in the SSL connection record, and the SAN DNS lists the domains in the certificate record that belong to the certificate; the SNI is normally part of the SAN DNS; if, in any SSL aggregation, the certificate log does not contain the SNI of the SSL connection record, the value is 0; if the SAN DNS of the certificate contains the SNI for every SSL aggregation in the connection log, the value is 1;
(28) whether the CN is in the SAN DNS: the CN is the common name, which is normally part of the SAN DNS; if none of the certificates contains the CN in its SAN DNS, the value is 0; if all certificates contain the CN in san.dns, the value is 1.
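Three of the certificate features can be sketched as follows. This is a hypothetical illustration rather than the patented procedure: the function names are invented, timestamps are plain unix seconds, feature (23) is assumed to be the mean of (capture time minus validity start) divided by the validity period, and SAN entries are compared by exact string match, so wildcard entries are not handled.

```python
def is_valid_during_capture(not_before, not_after, capture_time):
    """Feature (22) helper: True if capture_time lies inside the validity period."""
    return not_before <= capture_time <= not_after


def age_ratio(not_before, not_after, capture_time):
    """Feature (23) helper, assuming the ratio is p / k: time since validity start
    divided by the validity period."""
    validity = not_after - not_before
    return (capture_time - not_before) / validity if validity > 0 else 0.0


def sni_in_san_dns(pairs):
    """Feature (27) for (sni, san_dns_list) pairs: 1 if every SNI is listed, else 0."""
    return int(all(sni in san_dns for sni, san_dns in pairs))


# Two hypothetical certificates as (not_before, not_after), captured at t = 50.
certs = [(0, 100), (0, 40)]
capture = 50
invalid_count = sum(
    1 for nb, na in certs if not is_valid_during_capture(nb, na, capture)
)
mean_age = sum(age_ratio(nb, na, capture) for nb, na in certs) / len(certs)
```

Here the second certificate expired before capture, so it counts as invalid and its age ratio exceeds 1, which is the kind of signal features (22) and (23) are described as capturing.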
The invention achieves the following beneficial effects:
The method of the invention modifies the loss function of the original auxiliary classifier generative adversarial network so that unlabeled data can be used for semi-supervised learning, which improves identification accuracy, reduces the cost of capturing and labeling network traffic, and raises the level of network management and security monitoring.
Drawings
Fig. 1 is a flow diagram of the auxiliary classifier generative adversarial network structure.
Detailed Description
The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
It should be noted that, if there is a directional indication (such as up, down, left, right, front, and back) in the embodiment of the present invention, it is only used to explain the relative position relationship between the components, the motion situation, and the like in a certain posture, and if the certain posture is changed, the directional indication is changed accordingly.
In addition, if the description of "first", "second", etc. is referred to in the present invention, it is used for descriptive purposes only and is not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, technical solutions between various embodiments may be combined with each other, but must be realized by a person skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination should not be considered to exist, and is not within the protection scope of the present invention.
1. Connection characteristics:
(1) SSL aggregation and connection record number: each connection record contains a certain number of SSL aggregations and connection records. The first feature is simply the sum of the number of SSL aggregations and the number of connection records.
(2) Mean duration: each connection record contains a duration in seconds. For each incoming SSL aggregation and connection record in the number of connection records, this duration value is stored in a list, from which the average is finally calculated, setting t to contain the duration value, t ranging from { t } t1,t2,t3,…,tnThen the average value of t is:
Figure BDA0002741018440000091
(3) standard deviation of duration: calculate standard deviation of duration list:
Figure BDA0002741018440000092
Figure BDA0002741018440000093
Figure BDA0002741018440000094
standard deviation range of duration: percentage out of range of all duration values. The range has two limits, the upper limit being the mean of t + standard deviation and the lower limit being the mean of t-standard deviation.
(4) Payload bytes from originator: log records the number of payload bytes sent for all connections from conn.
(5) Payload bytes from responder: log records the number of payload bytes sent for all connections from conn.
(6) Ratio of responder bytes to all bytes: all bytes are bytes from the originator and bytes from the responder. The specific calculation method comprises the following steps:
Figure BDA0002741018440000095
where r is the number of bytes from the responder and o is the number of bytes from the originator.
(7) A ratio of establishing connection status, each connection record containing connection status. There are 13 of these states. These states are classified into established states and non-established states. These two sets of states describe whether there are any TCP handshakes or just attempt to do a TCP handshake. The established state is [ SF, S1, S2, S3, RSTO, RSTR ], which includes a successful TCP handshake; the non-established state is [ OTH, SO, REJ, SH, SHR, RSTOS0, RSTRH ], which includes an unsuccessful handshake. The special meaning of each state is in the zeek document. The specific calculation method comprises the following steps:
Figure BDA0002741018440000101
where e is the number of established states and n is the number of non-established states.
(8) Number of inbound messages: the number of upstream packets contained in the connection log.
(9) Number of outgoing messages: the number of downstream packets contained in the connection log.
(10) Cycle average: each connection record has a capture time. Thus, we can measure the periodicity of one of the groups. Five fictitious connection records are shown. The first step is to calculate the time difference between the connection records in sequence. The next step is to calculate a second time difference based on the first time difference in absolute value. If the value is zero, it means that the associated connection record is periodic. Finally, the value from the second time difference is stored in a list. Calculating an average value from the list, let D be the value of the second time difference, DnThe value obtained by the difference between the capture time of the nth connection record and the (n + 1) th connection record, d1A value representing the difference between the time of capture of the first connection record and the time of capture of the second connection record, d1Is the first value 0, d of 2nd time difference2Making a difference between the time of capture recorded for the second connection and the time of capture recorded for the third connection, d2Second value of 0, d of 2nd time difference3The difference between the capture time of the second connection record and the capture time of the third connection record, d3 is the third value of 2nd time difference 15:
Figure BDA0002741018440000102
Figure BDA0002741018440000103
(11) Periodic standard deviation: the standard deviation of the second time differences D is calculated from the list of second time differences obtained in the previous step:
Figure BDA0002741018440000104
Figure BDA0002741018440000111
Figure BDA0002741018440000112
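Two of the connection features above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, only the counts e and n are computed for feature (7) because the ratio formula itself is given in a figure, and the population standard deviation (`statistics.pstdev`) is an assumption, since the text does not say which variant is used.

```python
from statistics import mean, pstdev

# Zeek conn_state values, grouped as in feature (7) of the text.
ESTABLISHED = {"SF", "S1", "S2", "S3", "RSTO", "RSTR"}
NOT_ESTABLISHED = {"OTH", "S0", "REJ", "SH", "SHR", "RSTOS0", "RSTRH"}


def count_states(conn_states):
    """Return (e, n): counts of established and non-established states."""
    e = sum(1 for s in conn_states if s in ESTABLISHED)
    n = sum(1 for s in conn_states if s in NOT_ESTABLISHED)
    return e, n


def second_time_differences(capture_times):
    """Absolute differences between consecutive inter-record time gaps."""
    ts = sorted(capture_times)
    first = [b - a for a, b in zip(ts, ts[1:])]
    return [abs(b - a) for a, b in zip(first, first[1:])]


def periodicity(capture_times):
    """Return (mean, standard deviation) of the second time differences."""
    d = second_time_differences(capture_times)
    if not d:
        # Hypothetical fallback: fewer than three records give no
        # second time difference at all.
        return 0.0, 0.0
    # pstdev is an assumption; the source does not name the variant.
    return mean(d), pstdev(d)


# Five fictitious records whose second time differences are 0, 0 and 15.
e, n = count_states(["SF", "SF", "S0", "REJ", "RSTO"])
avg, std = periodicity([0, 10, 20, 30, 55])
```

With these inputs the first time differences are 10, 10, 10 and 25, so the second time differences match the 0, 0, 15 example in the text.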
2. SSL features:
(13) ratio of connection record and SSL aggregation: this feature describes the ratio between non-SSL connection records and SSL connection records. The ratio R is:
Figure BDA0002741018440000113
where f_n is the number of connection records without SSL and f_s is the number of connection records with SSL.
(14) Ratio of TLS and SSL versions: all SSL connection records have either a TLS or SSL protocol version for encryption. There are SSL 1.0, SSL 2.0, SSL 3.0, TLS 1.0, TLS 1.1, TLS 1.2 and TLS 1.3, where the SSL protocol is earlier than TLS and almost all ordinary traffic uses TLS. This feature describes how many SSL connection records have the TLS protocol. The ratio R is:
Figure BDA0002741018440000114
where TLS is the number of SSL connection records with TLS protocol and SSL is the number of SSL connection records with SSL protocol.
(15) SNI ratio: SNI is the server name in the SSL connection record. This feature describes how many SSL connection records contain SNIs, with the malware SSL connection records having more empty SNIs than normal SSL connection records. The ratio R is calculated as:
Figure BDA0002741018440000115
where Fs is the number of SSL connection records with SNI and Fa is the number of all SSL connection records.
(16) SNI-as-IP flag: sometimes an SSL connection record has an IP address as its SNI. In this case, the SNI IP address should be the same as the destination IP address (DstIP). This feature is -1 if any SSL connection record in the connection log has an IP address as SNI that differs from DstIP; 0 if any SSL connection record has an IP address as SNI and it is the same as DstIP; and 1 if no SSL connection record has an IP address as SNI.
(17) Average value of certificate path records: let C be the list that stores, for each SSL connection record, the number of certificates in its certificate path, where c_1 is the number of certificates in the certificate path of the first SSL connection record, c_2 that of the second SSL connection record, and c_n that of the nth SSL connection record. The average value of the certificate path records is calculated from this list:
Figure BDA0002741018440000121
(18) Self-signed certificate ratio: Zeek is able to identify whether an end-user certificate is self-signed, and this information is in the SSL connection record. This feature is the ratio of self-signed certificates to all end-user certificates in the log. The ratio R is:
Figure BDA0002741018440000122
where s is the number of self-signed certificates and c is the number of all certificates.
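The three-valued flag of feature (16) can be sketched as follows. This is only an illustration under stated assumptions: the function names and the (sni, dst_ip) pair layout are hypothetical, and a mismatch is given priority over a match when both occur in the same log, which the text does not spell out.

```python
import ipaddress


def is_ip(value):
    """True if the string parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def sni_as_ip_flag(ssl_records):
    """Compute the SNI-as-IP flag for one connection log.

    ssl_records: iterable of (sni, dst_ip) pairs (hypothetical layout);
    sni may be None or empty when the record carries no server name.
    """
    seen_ip_sni = False
    for sni, dst_ip in ssl_records:
        if sni and is_ip(sni):
            seen_ip_sni = True
            # Assumption: one mismatching record is enough to return -1.
            if ipaddress.ip_address(sni) != ipaddress.ip_address(dst_ip):
                return -1
    # 0: IP-valued SNIs exist and all match DstIP; 1: no IP-valued SNI.
    return 0 if seen_ip_sni else 1
```

For example, `sni_as_ip_flag([("example.com", "1.2.3.4")])` yields 1, because the only SNI is a host name rather than an IP address.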
3. Certificate features:
(19) Public key mean: each certificate record describing a certificate contains the public key of the certificate, and the public key value in each certificate record is added to a list. Let J be the list formed by the number of public keys in each certificate record, where j_1 is the number of public keys in the first certificate record, j_2 that in the second certificate record, and so on up to j_n for the nth certificate record. The average is calculated from the list:
Figure BDA0002741018440000123
(20) Average value of certificate validity period: each certificate has a validity period; for example, a certificate valid from 1/2010 to 1/2020 has a duration of 10 years. These boundary dates are stored in the certificate record as Unix times. For each certificate, the validity period is stored in a list in seconds, and the average is then calculated from the list:
Figure BDA0002741018440000124
(21) Certificate validity standard deviation: the standard deviation is calculated from the same list of certificate validity periods, in seconds, as in the previous feature:
Figure BDA0002741018440000125
Figure BDA0002741018440000126
Figure BDA0002741018440000127
(22) Validity of the certificate period during capture: from the capture time and the validity period of a certificate, it can be determined whether the certificate was valid during the capture. A certificate is normal if the capture time falls within its validity period. This feature is the number of certificates whose validity period had been exceeded when the traffic was captured. Malware often uses invalid certificates rather than ordinary ones.
(23) mean value of certificate validity start time: this feature is the ratio of the two time lengths. The first length is a certificate validity period and the second length is a period of time from the start of the certificate validity period to the capture. Thus, it can be calculated how old the certificate is. For each certificate, the ratio of these periods will be calculated and the result stored in a list. The average is then calculated from the list.
Figure BDA0002741018440000131
(24) Number of certificates: a connection log usually contains one certificate, but sometimes more. This feature is simply the number of certificates in one connection log.
(25) Domain number average in certificate SAN DNS: the SAN (Subject Alternative Name) describes which domains belong to the certificate. For each new incoming certificate, the number of DNS entries in the SAN is stored in a list, and the average is then calculated from the list. An example of part of the SAN DNS of a Google certificate: ["google.com", "google.co.com", "google-analytical.com", "google.ca", "google.cl", "google.co.in", "google.co.jp", "google.co.uk", "google.de"]
Figure BDA0002741018440000132
(26) Ratio of certificate record to SSL connection record: this feature describes the number of SSL connection records with a certificate path, since a certificate record can be added to the SSL aggregation in case it is included in the certificate path as the first certificate. The ratio R is:
Figure BDA0002741018440000133
where c is the number of certificate records and s is the number of SSL connection records for one connection log.
(27) Whether there is SNI in SAN DNS: the SNI is a server name indication contained in the SSL connection record. The SAN DNS is the domain in the certificate record that belongs to the certificate. Typically, the SNI is part of the SAN DNS. If any certificate log does not contain the SNI in the SSL connection record in one SSL aggregation, the value of this feature is 0; if all certificate logs contain SNI in SAN DNS in each pair of SSL aggregations in the connection log, the value of this feature is 1.
(28) Whether there is a CN in the SAN DNS: the CN is the common name, which is part of the certificate record. The CN should be part of the SAN DNS. If no certificate contains its CN in the SAN DNS, the value of this feature is 0; if all certificates contain their CN in the SAN DNS, the value of this feature is 1.
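Features (22) and (25) of the certificate group can be sketched as follows. The dictionary keys `not_before`, `not_after` and `san_dns` are hypothetical names for the validity bounds (Unix seconds) and the SAN DNS entries of a certificate record. Counting a certificate that is not yet valid as invalid is also an assumption, since the text only mentions certificates whose validity period has been exceeded.

```python
from statistics import mean


def count_invalid_at_capture(certs, capture_time):
    """Count certificates whose validity window excludes capture_time.

    certs: iterable of dicts with hypothetical keys 'not_before' and
    'not_after', both Unix timestamps in seconds.
    """
    return sum(
        1 for c in certs
        if not (c["not_before"] <= capture_time <= c["not_after"])
    )


def mean_san_dns_count(certs):
    """Average number of SAN DNS entries per certificate (0.0 if empty)."""
    counts = [len(c.get("san_dns", [])) for c in certs]
    return mean(counts) if counts else 0.0


# Fictitious certificate records, for illustration only.
certs = [
    {"not_before": 100, "not_after": 200, "san_dns": ["a.example", "b.example"]},
    {"not_before": 100, "not_after": 150, "san_dns": ["c.example"]},
]
invalid = count_invalid_at_capture(certs, capture_time=180)
avg_domains = mean_san_dns_count(certs)
```

In this example only the second certificate has expired at capture time 180, and the two certificates carry two and one SAN DNS entries respectively.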
Model training:
The original AC-GAN objective function consists of two parts: the loss L_S from discriminating real samples from generated samples, and the loss L_C from classifying the sample class. The original AC-GAN objective function uses only labeled data and does not use unlabeled data. In actual production, labeled data are often difficult to obtain while unlabeled samples are easy to obtain, so the loss function of the AC-GAN discriminator is modified to enable semi-supervised learning. At the same time, the Wasserstein distance is used in place of the original log-likelihood function to represent the EM (Earth-Mover) distance from the real data distributions P_data-unlabel and P_data-label to the generated data distribution P_g.
The loss of the discriminator is therefore:
Figure BDA0002741018440000141
For the classification loss, cross-entropy loss is adopted. Let y denote the output of MLP_C, with y_i the value of each component of y; let y' denote the label corresponding to the input flow, with y'_i the value of each component of y'; and let m be the total number of classes. The classification loss is then:
Figure BDA0002741018440000142
A hidden variable h is introduced at the same time. When the hidden variable and the random noise are independent of each other, the flow G(z') from the generator is passed through an encoder M to obtain the output z' = M(G(z')), from which each component of h is separated. If the hidden parameter input to the generator is h = (h_1, h_2, ..., h_n), the MLP_H output is
Figure BDA0002741018440000143
The restoration loss of the hidden parameter is defined as:
Figure BDA0002741018440000144
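The classification loss and the hidden-parameter loss described above can be sketched in plain Python. This is a hedged illustration, not the patent's exact formulas, which are given only as figures. The cross entropy uses the standard form -sum_i y'_i log(y_i). The hidden-parameter loss follows the description in the claims as a sum of per-component distances between the generator input h and the MLP_H output. Using the absolute difference as that distance is an assumption, and the function names are hypothetical.

```python
import math


def cross_entropy(y, y_true, eps=1e-12):
    """Standard cross entropy between predicted probabilities y (the
    MLP_C output) and a one-hot label y_true over m classes."""
    # eps guards against log(0) for zero-probability predictions.
    return -sum(t * math.log(max(p, eps)) for p, t in zip(y, y_true))


def restoration_loss(h, h_hat):
    """Sum of per-component distances between the hidden parameters h
    fed to the generator and the MLP_H output h_hat.

    The absolute difference is an assumed distance; the text only
    speaks of a 'distance' between components.
    """
    if len(h) != len(h_hat):
        raise ValueError("h and h_hat must have the same length")
    return sum(abs(a - b) for a, b in zip(h, h_hat))


l_c = cross_entropy([0.5, 0.25, 0.25], [1, 0, 0])
l_h = restoration_loss([1.0, 0.0, -1.0], [0.5, 0.0, -0.5])
```

Here `l_c` equals -log(0.5), because only the true class contributes to the sum, and `l_h` is 0.5 + 0 + 0.5 = 1.0.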
The following parameter update rules are used when training the entire model:
when optimizing the loss L_S, the network parameters of G, D and MLP_S are updated;
when optimizing the loss L_C, the network parameters of G, D and MLP_C are updated;
when optimizing the loss L_h, the network parameters of G, D and MLP_H are updated.
libpcap is an existing packet capture library for the unix/linux platform, and pcap is a common file format for storing captured datagrams. The SSL record protocol works by dividing a data stream into a series of segments, each of which is protected and transmitted independently. The network devices may be PCs, switches and servers.
Labeling known flows in the captured pcap, resulting in labeled and unlabeled pcaps: pcap is a common file format for storing captured datagrams. Traffic that is known and can be separated from the captured traffic is labeled by application and stored as labeled pcaps; captured traffic that contains known traffic which cannot be cleanly separated is stored as unlabeled pcaps. This is prior art.
In step (5), the discrimination of real and generated flows, the classification of classes and the extraction of hidden parameters are completed as follows: the flow feature matrix obtained in step (4) is input into the three networks, and the weight calculations of the neural networks output three vectors, which represent whether the flow is real or generated, the class probabilities, and the hidden parameter vector, respectively. The output process is prior art.
MLP_S refers to the fully connected MLP neural network used to judge whether a flow is a real flow or a generated flow, MLP_C refers to the fully connected neural network in the sample diagram used to classify the flow, and MLP_H refers to the fully connected neural network in the sample diagram used to output the hidden variables.
The existing network refers to a network which needs to perform traffic identification and monitoring tasks and is currently providing production services, such as a public telecommunication network, a home local area network, an internal company network and the like.
In the method, the whole network is composed of several neural networks. The generator is one neural network. The discriminator, the fully connected network and MLP_S form a stacked network that distinguishes real flows from generated flows. The discriminator, the fully connected network and MLP_C form a stacked network that performs the classification of flows, carrying out the classification task on the collected flows. The discriminator, the fully connected network and MLP_H form a stacked network that outputs the hidden variable, and the generated flow is controlled by adjusting the hidden variable. These networks must be trained simultaneously.
"Special hidden features" is a technical term first used in the field of image generation. Before model training is completed, the meaning of each hidden variable is unknown; after training, it is found that adjusting a particular value of the hidden variable changes the image produced by the generator. For example, changing the first feature of the hidden variable may change the hair color, and adjusting the second feature may change the hair style. SSL is an existing encryption technology. In the method of the present invention, the certificates are SSL certificates.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims (10)

1. A semi-supervised encrypted traffic identification method based on an auxiliary classifier generative adversarial network, characterized by comprising the following steps:
1) monitoring a network card of the network device with capture code based on libpcap, capturing traffic that contains the traffic to be identified, and storing it as a pcap;
2) labeling the known traffic in the captured pcap according to the known flows, to form labeled pcaps and unlabeled pcaps;
3) extracting three logs from the labeled and unlabeled pcaps, respectively, using the open-source traffic analysis tool Zeek: connection logs, SSL logs and certificate logs;
4) extracting features from the three logs obtained in step 3) according to the defined features to obtain a flow feature matrix containing the features, which forms the real data set for training;
5) constructing an auxiliary classifier generative adversarial network structure, in which the generator receives a vector c synthesized from random noise, hidden variables and class labels and generates flows; the discriminator receives the flow feature matrix obtained in step 4) and the flows generated by the generator and is trained to discriminate between them; and the discriminator outputs to three fully connected neural networks to complete the discrimination of real and generated flows, the classification of classes and the extraction of hidden parameters;
6) defining the real data in the real data set as x and dividing x into real unlabeled data and real labeled data; defining P_data-unlabel as the probability distribution of the real unlabeled data, P_data-label as the probability distribution of the real labeled data, and P_g as the probability distribution of the generated data, where the generated data are obtained by inputting the synthesized vector to the generator; and expressing the adversarial loss of the discriminator by using the Wasserstein distance, instead of the original log-likelihood function, as the Earth-Mover distance from P_data-unlabel and P_data-label to P_g:
Figure FDA0002741018430000011
where sup is the least upper bound and ||f||_L denotes the Lipschitz constraint,
Figure FDA0002741018430000012
is the expectation over x drawn from P_data-label,
Figure FDA0002741018430000013
is the expectation over x drawn from P_data-unlabel,
Figure FDA0002741018430000014
is the expectation over x drawn from P_g, and P(x) is the probability distribution of the random variable x;
7) the classification loss adopts cross-entropy loss: let y denote the output of MLP_C, with y_i the value of each component of y; let y' denote the label corresponding to the input flow, with y'_i the value of each component of y'; and let m be the total number of classes; the classification loss is:
Figure FDA0002741018430000015
8) the hidden parameter vector input to the generator is h = (h_1, h_2, ..., h_n), and the hidden parameter vector output by MLP_H is
Figure FDA0002741018430000021
The restoration loss of the hidden parameter is defined as
Figure FDA0002741018430000022
The restoration loss is the sum of the distances between each component of the hidden parameter vector input to the generator and the corresponding component of the hidden parameter vector output by MLP_H;
9) training with a mini-batch algorithm: taking any number of randomly distributed samples z from the random noise, the labels c of any labeled samples, and any number of hidden variables h, fusing them by matrix fusion into a synthesized vector, and sending the synthesized vector to the generator for training so as to generate flows that contain real flow characteristics; and sending any number of labeled and unlabeled real flows to the discriminator for training;
10) alternately training the discriminator and the generator, and adopting the following parameter update rules when training the semi-supervised auxiliary classifier generative adversarial network: when optimizing the adversarial loss L_S, updating the network parameters of the generator, the discriminator and MLP_S; when optimizing the classification loss L_C, updating the network parameters of the generator, the discriminator and MLP_C; when optimizing the restoration loss L_h, updating the network parameters of the generator, the discriminator and MLP_H;
11) repeating step 10) until Nash equilibrium is reached through the adversarial learning of the generator and the discriminator;
and performing the traffic identification and classification task on the traffic of the existing network based on the data preprocessing of steps 1) to 4), the trained discriminator and MLP_C.
2. The semi-supervised encrypted traffic identification method based on an auxiliary classifier generative adversarial network as claimed in claim 1, wherein in step 4), inter-flow features in the pcap are extracted with the open-source tool Zeek according to the defined features to form 28 flow features, and a flow feature matrix of the 28 features is obtained to form the real data set for training.
3. The semi-supervised encrypted traffic identification method based on an auxiliary classifier generative adversarial network as claimed in claim 1, wherein the structure of the original auxiliary classifier generative adversarial network is modified and hidden variables are introduced, so that the output of the generator can be controlled through certain special hidden features.
4. The semi-supervised encrypted traffic identification method based on an auxiliary classifier generative adversarial network as claimed in claim 1, wherein the loss function of the original auxiliary classifier generative adversarial network is modified, the Wasserstein distance is used in place of the original log-likelihood function, and a restoration loss is defined.
5. The semi-supervised encrypted traffic identification method based on an auxiliary classifier generative adversarial network as claimed in claim 1, wherein in step 3), the logs comprise the connection record log conn.log, the SSL connection record log ssl.log and the certificate record log x509.log.
6. The semi-supervised encrypted traffic identification method based on an auxiliary classifier generative adversarial network as claimed in claim 5, wherein each line of conn.log describes a connection between two network endpoints and records information including the IP addresses, ports and protocol of the sender and receiver of the data packets, the connection state, the number of data packets, and the label;
each line of ssl.log describes the SSL/TLS handshake and encryption establishment process, including the SSL/TLS version, the cipher used, the server name, the certificate path, the subject and the issuer;
each line of x509.log is a certificate record describing certificate information including the certificate serial number, common name, validity period, subject, signature algorithm and key length in bits.
7. The semi-supervised encrypted traffic identification method based on an auxiliary classifier generative adversarial network as claimed in claim 5, wherein the flow features comprise connection features, SSL features and certificate features.
8. The method of claim 7, wherein the connection feature comprises:
(1) number of SSL aggregations and connection records: each connection log contains a certain number of SSL aggregations and connection records, and this first feature is simply the sum of the number of SSL aggregations and the number of connection records;
(2) mean duration: each connection record contains a duration in seconds; for each incoming connection record, this duration value is stored in a list, from which the average is finally calculated; let t be the list of duration values, t = {t_1, t_2, t_3, …, t_n}; the average value of t is:
Figure FDA0002741018430000031
(3) standard deviation of duration: calculate standard deviation of duration list:
Figure FDA0002741018430000032
Figure FDA0002741018430000033
Figure FDA0002741018430000034
(4) standard deviation range of duration: the percentage of all duration values that fall outside a range whose upper limit is the mean of t plus the standard deviation and whose lower limit is the mean of t minus the standard deviation;
(5) payload bytes from originator: the number of payload bytes sent by the originator over all connection records from conn.log;
(6) payload bytes from responder: the number of payload bytes sent by the responder over all connection records from conn.log;
(7) ratio of responder bytes to all bytes: all bytes are bytes from the originator and bytes from the responder; the ratio of responder bytes to all bytes is:
Figure FDA0002741018430000041
where r is the number of bytes from the responder and o is the number of bytes from the originator;
(8) ratio of established connection states: each connection record contains a connection state; there are 13 connection states, divided into established states and non-established states, which describe whether a TCP handshake was completed or only attempted; the established states are [SF, S1, S2, S3, RSTO, RSTR], which involve a successful TCP handshake; the non-established states are [OTH, S0, REJ, SH, SHR, RSTOS0, RSTRH], which involve an unsuccessful handshake; the specific meaning of each connection state is given in the Zeek documentation, and the ratio of established connection states is:
Figure FDA0002741018430000042
where e is the number of established states and n is the number of non-established states;
(9) number of inbound messages: the number of upstream packets contained in the connection log;
(10) number of outgoing messages: the number of downstream packets contained in the connection log;
(11) cycle average: each connection record has a capture time, so the periodicity of a group of connection records can be measured; the first step is to calculate the time differences between consecutive connection records; the second step is to calculate the second time differences, that is, the absolute values of the differences between consecutive first time differences, and a second time difference of zero means that the associated connection records are periodic; the third step is to store the second time differences in a list, from which the average is calculated; D is the list of second time differences and d_n is its nth value, obtained from the capture times of the nth and subsequent connection records; in the example, the first value d_1 is 0, the second value d_2 is 0, and the third value d_3 is 15:
Figure FDA0002741018430000051
Figure FDA0002741018430000052
(12) periodic standard deviation: calculating the standard deviation of the value D of the second time difference by using the list of time difference values obtained in the previous step:
Figure FDA0002741018430000053
Figure FDA0002741018430000054
Figure FDA0002741018430000055
9. The method of claim 7, wherein the SSL features comprise:
(13) ratio of connection record and SSL aggregation: the ratio between non-SSL connection records and SSL connection records is described, the ratio R of connection records and SSL aggregations being:
Figure FDA0002741018430000056
where f_n is the number of connection records without SSL and f_s is the number of connection records using SSL;
(14) ratio of TLS and SSL versions: all SSL connection records have a TLS or SSL protocol version used for encryption; the versions include SSL 1.0, SSL 2.0, SSL 3.0, TLS 1.0, TLS 1.1, TLS 1.2 and TLS 1.3, where the SSL protocol is earlier than TLS; this feature describes how many SSL connection records use the TLS protocol, and the ratio R of TLS and SSL versions is:
Figure FDA0002741018430000057
wherein TLS is the number of SSL connection records with TLS protocol, SSL is the number of SSL connection records with SSL protocol;
(15) SNI ratio: SNI is the name of the server in the SSL connection record, and describes how many SSL connection records contain SNI, the SSL connection record of malware has more empty SNI than the SSL connection record of normal software, and the SNI ratio is:
Figure FDA0002741018430000061
where F_s is the number of SSL connection records with SNI and F_a is the number of all SSL connection records;
(16) SNI-as-IP flag: sometimes an SSL connection record has an IP address as its SNI; in this case, the SNI IP address should be the same as the destination IP address (DstIP); this feature is -1 if any SSL connection record in the connection log has an IP address as SNI that differs from DstIP; 0 if any SSL connection record has an IP address as SNI and it is the same as DstIP; and 1 if no SSL connection record has an IP address as SNI;
(17) average value of certificate path records: let C be the list that stores, for each SSL connection record, the number of certificates in its certificate path, where c_1 is the number of certificates in the certificate path of the first SSL connection record, c_2 that of the second SSL connection record, and c_n that of the nth SSL connection record; the average value of the certificate path records is calculated from this list:
Figure FDA0002741018430000062
(18) self-signed certificate ratio: Zeek can identify whether an end-user certificate is self-signed and stores this information in the SSL connection record; the self-signed certificate ratio is the ratio of the number of self-signed certificates to the number of all end-user certificates in the log.
10. The method of claim 7, wherein the certificate feature comprises:
(19) public key mean: each certificate record describing a certificate contains the public key of the certificate, and the public key value in each certificate record is added to a list; let J be the list formed by the number of public keys in each certificate record, where j_1 is the number of public keys in the first certificate record, j_2 that in the second certificate record, and so on up to j_n for the nth certificate record; the average is calculated from the list:
Figure FDA0002741018430000063
(20) average value of certificate validity period: each certificate has a validity period, stored in the certificate record as Unix time; for each certificate, the validity period is stored in a list in seconds; let G be the list of certificate validity periods from the certificate records, where g_1 is the certificate validity period in the first certificate record, g_2 that in the second certificate record, and g_n that in the nth certificate record; the average is then calculated from the list:
Figure FDA0002741018430000071
(21) certificate validity standard deviation: the standard deviation of G is calculated, in seconds, from the same list of certificate validity periods used for the average value of the certificate validity period:
Figure FDA0002741018430000072
Figure FDA0002741018430000073
Figure FDA0002741018430000074
(22) validity of the certificate period during capture: whether a certificate was valid during the capture is determined from the capture time and the validity period of the certificate; a certificate is normal if the capture time falls within its validity period; this feature is the number of certificates whose validity period had been exceeded when the traffic was captured, since malware often uses invalid certificates rather than ordinary ones;
(23) mean value of certificate validity start time: the ratio of two lengths of time; the first length is the certificate validity period, denoted K, where k_1 is the certificate validity period of the first certificate record, k_2 that of the second certificate record, and k_n that of the nth certificate record; the second length is the period of time from the start of the certificate validity period to the capture, denoted P, where p_1 is that period for the first certificate record, p_2 that for the second certificate record, and so on up to p_n for the nth certificate record; this indicates how old the certificate is; for each certificate, the ratio of these two periods is calculated and stored in a list, from which the average Z is then calculated:
Figure FDA0002741018430000075
(24) number of certificates: a connection log usually contains one certificate but sometimes contains more; the number of certificates is therefore the number of certificates in one connection log;
(25) average number of domains in the certificate SAN DNS: the SAN (Subject Alternative Name) lists the domains that belong to the certificate; for each newly arriving certificate, the number of DNS names in its SAN is stored in a list, and the average is then calculated from the list;
(26) ratio of certificate records to SSL connection records: describes the proportion of SSL connection records that have a certificate path, because a certificate record can be added to the SSL aggregation if it appears as the first certificate in the certificate path; the ratio is the number of certificate records divided by the number of SSL connection records in one connection log;
(27) whether the SNI is in the SAN DNS: the SNI is the Server Name Indication contained in the SSL connection record, and the SAN DNS is the set of domains in the certificate record that belong to the certificate; the SNI is expected to be part of the SAN DNS; if, in any SSL aggregation, a certificate log does not contain the SNI of the SSL connection record, the value of this feature is 0; if, in every SSL aggregation of the connection log, all certificate logs contain the SNI in the SAN DNS, the value of this feature is 1;
(28) whether the CN is in the SAN DNS: the CN is the Common Name, which is expected to be part of the SAN DNS; if none of the certificates contains its CN in the SAN DNS, the value of this feature is 0; if all certificates contain their CN in the SAN DNS, the value of this feature is 1.
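
Features (23) to (28) can be illustrated with a minimal Python sketch. It is not the claimed implementation: the record types `CertRecord` and `SslRecord`, their field names, and the function `certificate_features` are hypothetical, introduced only for this example. The sketch also makes several assumptions the text does not state: the ratio in (23) is taken as p/k (time since validity start divided by validity length), SNI and CN are compared with SAN DNS entries by exact string match (wildcards are ignored), a feature is 0 when no certificate is present, and (28) is 0 whenever not all certificates contain their CN.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Optional


@dataclass
class CertRecord:  # hypothetical certificate record
    not_before: float   # validity start, epoch seconds
    not_after: float    # validity end, epoch seconds
    captured_at: float  # capture time, epoch seconds
    common_name: str = ""
    san_dns: List[str] = field(default_factory=list)


@dataclass
class SslRecord:  # hypothetical SSL connection record
    sni: str = ""
    cert: Optional[CertRecord] = None  # first certificate of the path, if any


def certificate_features(ssl_records: List[SslRecord]) -> dict:
    """Compute features (23)-(28) for the SSL records of one connection log."""
    with_cert = [r for r in ssl_records if r.cert is not None]
    certs = [r.cert for r in with_cert]

    # (23) assumed ratio p_i / k_i per certificate, then the average Z
    ratios = [
        (c.captured_at - c.not_before) / (c.not_after - c.not_before)
        for c in certs
        if c.not_after > c.not_before  # skip degenerate validity periods
    ]

    return {
        "validity_ratio_mean": mean(ratios) if ratios else 0.0,           # (23)
        "cert_count": len(certs),                                         # (24)
        "san_dns_count_mean":                                             # (25)
            mean(len(c.san_dns) for c in certs) if certs else 0.0,
        "cert_to_ssl_ratio":                                              # (26)
            len(certs) / len(ssl_records) if ssl_records else 0.0,
        "sni_in_san_dns": int(                                            # (27)
            bool(with_cert)
            and all(r.sni in r.cert.san_dns for r in with_cert)
        ),
        "cn_in_san_dns": int(                                             # (28)
            bool(certs)
            and all(c.common_name in c.san_dns for c in certs)
        ),
    }
```

For example, a connection log with four SSL records, two of which carry a certificate, yields a certificate-to-SSL ratio of 0.5 under this sketch.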
CN202011150439.4A 2020-10-24 2020-10-24 Semi-supervised encryption traffic identification method for generating countermeasure network based on auxiliary classification Pending CN112270351A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011150439.4A CN112270351A (en) 2020-10-24 2020-10-24 Semi-supervised encryption traffic identification method for generating countermeasure network based on auxiliary classification

Publications (1)

Publication Number Publication Date
CN112270351A true CN112270351A (en) 2021-01-26

Family

ID=74342124

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011150439.4A Pending CN112270351A (en) 2020-10-24 2020-10-24 Semi-supervised encryption traffic identification method for generating countermeasure network based on auxiliary classification

Country Status (1)

Country Link
CN (1) CN112270351A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109740677A (en) * 2019-01-07 2019-05-10 湖北工业大学 It is a kind of to improve the semisupervised classification method for generating confrontation network based on principal component analysis
CN110689086A (en) * 2019-10-08 2020-01-14 郑州轻工业学院 Semi-supervised high-resolution remote sensing image scene classification method based on generating countermeasure network
CN111611280A (en) * 2020-04-29 2020-09-01 南京理工大学 Encrypted traffic identification method based on CNN and SAE

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
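FRANTISEK STRASAK: "Detection of HTTPS Malware Traffic", 《Czech Technical University in Prague》 *
YIN CHUANLONG: "Research on Network Anomaly Detection Technology Based on Deep Learning", 《China Doctoral Dissertations Full-text Database, Information Science and Technology》 *
ZHANG YANG et al.: "Anime Character Avatar Generation Algorithm Based on an Improved Generative Adversarial Network", 《Computer Science》 *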

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112884075A (en) * 2021-03-23 2021-06-01 北京天融信网络安全技术有限公司 Traffic data enhancement method, traffic data classification method and related device
CN113259313A (en) * 2021-03-30 2021-08-13 浙江工业大学 Malicious HTTPS flow intelligent analysis method based on online training algorithm
CN113205140A (en) * 2021-05-06 2021-08-03 中国人民解放军海军航空大学航空基础学院 Semi-supervised specific radiation source individual identification method based on generative countermeasure network
CN113612767A (en) * 2021-07-31 2021-11-05 中山大学 Encrypted malicious flow detection method and system based on multitask learning enhancement
CN114024721A (en) * 2021-10-13 2022-02-08 国网浙江省电力有限公司宁波供电公司 Traffic classification identification method based on transmission channel quality sequencing
CN113726615A (en) * 2021-11-02 2021-11-30 北京广通优云科技股份有限公司 Encryption service stability judgment method based on network behaviors in IT intelligent operation and maintenance system
CN113726615B (en) * 2021-11-02 2022-02-15 北京广通优云科技股份有限公司 Encryption service stability judgment method based on network behaviors in IT intelligent operation and maintenance system
CN115021981A (en) * 2022-05-18 2022-09-06 桂林电子科技大学 Industrial control system intrusion detection and tracing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210126