CN112134829B - Method and device for generating encrypted traffic feature set - Google Patents



Publication number
CN112134829B
Authority
CN
China
Prior art keywords
encrypted traffic
certificate
flow
feature
features
Legal status
Active
Application number
CN201910555508.0A
Other languages
Chinese (zh)
Other versions
CN112134829A (en)
Inventor
梁兴强
李波
Current Assignee
Beijing Guancheng Technology Co ltd
Original Assignee
Beijing Guancheng Technology Co ltd
Application filed by Beijing Guancheng Technology Co ltd
Priority to CN201910555508.0A
Publication of CN112134829A
Application granted
Publication of CN112134829B


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/145 Countermeasures against malicious traffic the attack involving the propagation of malware through the network, e.g. viruses, trojans or worms
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Abstract

The application discloses a method and a device for generating an encrypted traffic feature set. Malicious encrypted traffic sample data and normal encrypted traffic sample data are determined; deep protocol analysis is performed on the sample data to obtain an initial encrypted traffic feature set comprising a flow feature subset, a handshake feature subset, a certificate feature subset and a domain name feature subset; and features meeting a preset condition are selected from the initial encrypted traffic feature set to form a final encrypted traffic feature set. With this method and device, key features of encrypted traffic can be extracted comprehensively: features that distinguish malicious from normal encrypted traffic are drawn from four aspects (flow features, handshake features, certificate features and domain name features), so the generated feature set covers the encrypted traffic more completely, reflects its details objectively, and helps ensure detection accuracy when applied to detecting malicious encrypted traffic.

Description

Method and device for generating encrypted traffic feature set
Technical Field
The present invention relates to the field of malicious encrypted traffic analysis technologies, and in particular, to a method and an apparatus for generating an encrypted traffic feature set.
Background
Malicious traffic refers to computer network traffic that an attacker specially constructs to attack a particular target; it is typically generated by malicious programs and propagated through the network. Identifying malicious traffic accurately and promptly, and taking emergency response measures, is an important task in network security.
In recent years, malicious traffic using encrypted communication protocols has grown rapidly. Because such traffic cannot be decrypted, detection techniques built on plaintext protocol analysis face serious challenges: rule-based detection cannot be applied, because effective detection rules can basically not be extracted from malicious encrypted traffic; file-based detection cannot be applied, because files cannot be restored from malicious encrypted traffic; and behavior-based detection cannot be applied, because explicit behavior content is difficult to extract from malicious encrypted traffic.
With the development of machine learning, applying it to the detection of malicious encrypted traffic is a promising approach. However, the core of such an approach is the selection of the encrypted traffic feature set, and the quality of that feature set often directly determines the accuracy of malicious encrypted traffic detection.
However, there is currently no mature solution for constructing feature sets for encrypted traffic. How to construct a high-quality encrypted traffic feature set is therefore a technical problem to be solved by those skilled in the art.
Disclosure of Invention
In view of the foregoing problems, the present application provides a method and an apparatus for generating an encrypted traffic feature set that overcome, or at least partially solve, them. The specific scheme is as follows:
A method of generating an encrypted traffic feature set, the method comprising:
determining malicious encrypted traffic sample data and normal encrypted traffic sample data;
performing deep protocol analysis on the malicious encrypted traffic sample data and the normal encrypted traffic sample data to obtain an initial encrypted traffic feature set, wherein the initial encrypted traffic feature set comprises a flow feature subset, a handshake feature subset, a certificate feature subset and a domain name feature subset;
and selecting features meeting a preset condition from the features included in the flow feature subset, the handshake feature subset, the certificate feature subset and the domain name feature subset to form a final encrypted traffic feature set.
Optionally, the flow feature subset includes any one or more of the following flow features:
maximum, minimum, mean and standard deviation of unidirectional and bidirectional flow durations;
maximum, minimum, mean and standard deviation of unidirectional and bidirectional flow packet sizes;
byte distribution and byte distribution mean of unidirectional and bidirectional flow packets;
maximum, minimum, mean and standard deviation of unidirectional and bidirectional flow packet inter-arrival times;
packet size transition probabilities of unidirectional and bidirectional flows;
packet inter-arrival time transition probabilities of unidirectional and bidirectional flows.
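The transition-probability features above can be read as a first-order Markov chain over discretized values. The patent does not say how packet sizes or inter-arrival times are discretized, so the sketch below assumes fixed-width bins; `transition_matrix`, `bin_size` and `n_bins` are hypothetical names chosen for illustration. Entry [i][j] is the fraction of transitions leaving bin i that land in bin j.

```python
from typing import List, Sequence


def transition_matrix(values: Sequence[float], bin_size: float, n_bins: int) -> List[List[float]]:
    """Markov transition probabilities between consecutive binned values.

    Assumption: fixed-width bins; values beyond the last bin are clamped into it.
    """
    bins = [min(int(v // bin_size), n_bins - 1) for v in values]
    counts = [[0] * n_bins for _ in range(n_bins)]
    for current, nxt in zip(bins, bins[1:]):  # count each consecutive pair
        counts[current][nxt] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Normalize each row; rows with no outgoing transitions stay all zero.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix
```

The same function applies to packet sizes or to inter-arrival times, and to the client-side, server-side or bidirectional packet sequence; the flattened matrix then becomes a group of features.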
Optionally, the handshake feature subset includes any one or more of the following handshake features:
Secure Sockets Layer (SSL) protocol version;
encryption algorithm selected by the protocol;
public key length selected by the protocol;
number of protocol extensions;
protocol extension item length;
cipher suites offered by the client;
cipher suite selected by the server.
Optionally, the certificate feature subset includes any one or more of the following certificate features:
certificate version;
certificate serial number length;
signature algorithm;
signature public key length;
number of certificate issuer information items;
number of certificate subject (user) information items;
certificate validity start time and certificate expiration time;
certificate expiration days;
whether the certificate is self-signed.
Optionally, the domain name feature subset includes any one or more of the following domain name features:
domain name length;
domain name ranking;
subdomain length;
main domain length;
number of digits in the domain name;
number of special characters in the domain name;
whether the domain name contains wildcards.
Optionally, selecting features meeting a preset condition from the features included in the flow feature subset, the handshake feature subset, the certificate feature subset and the domain name feature subset to form a final encrypted traffic feature set includes:
selecting, from each of the flow feature subset, the handshake feature subset, the certificate feature subset and the domain name feature subset, the features meeting the preset condition by a comprehensive feature engineering method that combines basic statistical analysis, visual analysis and domain knowledge analysis, to form the final encrypted traffic feature set, wherein the features meeting the preset condition are features whose degree of distinction between malicious encrypted traffic and normal encrypted traffic exceeds a preset threshold.
Optionally, performing deep protocol analysis on the malicious encrypted traffic sample data and the normal encrypted traffic sample data includes:
performing deep protocol analysis on the key negotiation session process of the malicious encrypted traffic sample data and on the key negotiation session process of the normal encrypted traffic sample data.
An apparatus for generating an encrypted traffic feature set, comprising:
a sample data determining unit, configured to determine malicious encrypted traffic sample data and normal encrypted traffic sample data;
an initial encrypted traffic feature set obtaining unit, configured to perform deep protocol analysis on the malicious encrypted traffic sample data and the normal encrypted traffic sample data to obtain an initial encrypted traffic feature set, wherein the initial encrypted traffic feature set comprises a flow feature subset, a handshake feature subset, a certificate feature subset and a domain name feature subset;
and a final encrypted traffic feature set obtaining unit, configured to select, from each of the flow feature subset, the handshake feature subset, the certificate feature subset and the domain name feature subset, features meeting a preset condition to form a final encrypted traffic feature set.
A storage medium having stored thereon a program which, when executed by a processor, implements a method of generating an encrypted traffic feature set as described above.
An electronic device comprising a memory for storing a program and a processor for running the program, wherein the program is run to perform a method of generating an encrypted traffic feature set as described above.
With the above technical solution, the application discloses a method and a device for generating an encrypted traffic feature set: malicious encrypted traffic sample data and normal encrypted traffic sample data are determined; deep protocol analysis is performed on the sample data to obtain an initial encrypted traffic feature set comprising a flow feature subset, a handshake feature subset, a certificate feature subset and a domain name feature subset; and features meeting a preset condition are selected from these four subsets to form a final encrypted traffic feature set. With this method and device, key features of encrypted traffic can be extracted comprehensively: features that distinguish malicious from normal encrypted traffic are drawn from four aspects (flow features, handshake features, certificate features and domain name features), so the generated feature set covers the encrypted traffic more completely, reflects its details objectively, and helps ensure detection accuracy when applied to detecting malicious encrypted traffic.
The foregoing is only an overview of the technical solutions of the present application. To make the technical means of the present application clearer so that they can be implemented according to the specification, and to make the above and other objects, features and advantages of the present application more readily apparent, specific embodiments of the present application are described below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
Fig. 1 is a flowchart of a method for generating an encrypted traffic feature set according to an embodiment of the present application;
Fig. 2 is a schematic diagram of encrypted session features according to an embodiment of the present application;
Fig. 3 is a statistical comparison chart of the number of issuer items among the certificate features according to an embodiment of the present application;
Fig. 4 is a statistical comparison chart of the number of client protocol extension items among the handshake features according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an apparatus for generating an encrypted traffic feature set according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Specific embodiments provided in the embodiments of the present application are described in detail below.
Referring to Fig. 1, which is a flowchart of a method for generating an encrypted traffic feature set according to an embodiment of the present application, the method includes the following steps:
S101: determine malicious encrypted traffic sample data and normal encrypted traffic sample data.
There may be multiple pieces of malicious encrypted traffic sample data and of normal encrypted traffic sample data, and labels may be assigned to them for subsequent training; for example, malicious encrypted traffic sample data is labeled "malicious" and normal encrypted traffic sample data is labeled "normal".
After the sample data is determined, the malicious and normal encrypted traffic samples may be split into network flows according to the four-tuple.
S102: perform deep protocol analysis on the malicious encrypted traffic sample data and the normal encrypted traffic sample data to obtain an initial encrypted traffic feature set, wherein the initial encrypted traffic feature set comprises a flow feature subset, a handshake feature subset, a certificate feature subset and a domain name feature subset.
It should be noted that, in this step, deep protocol analysis may specifically be performed on the network flows generated by the flow splitting in the previous step. As one implementation, deep protocol analysis may be performed on the key negotiation session process of the malicious encrypted traffic sample data and on that of the normal encrypted traffic sample data.
Through in-depth analysis of encrypted traffic, the features of an entire encrypted session can be classified into flow features, handshake features, certificate features and domain name features. Accordingly, the initial encrypted traffic feature set comprises a flow feature subset, a handshake feature subset, a certificate feature subset and a domain name feature subset, each containing its own specific features, as shown in Fig. 2. Together, the four fine-grained feature subsets cover the whole encrypted session, so that its details can be reflected fully and objectively.
The four types of feature subsets are described in detail below:
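A minimal sketch of flow splitting and labeling, assuming the four-tuple is (source IP, source port, destination IP, destination port) and that packets travelling in both directions belong to the same flow. The `Packet` type, `flow_key`, `split_flows` and `label_flows` are hypothetical names; a practical implementation would probably also separate flows by transport protocol and idle timeout.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Packet:
    """Hypothetical minimal packet record."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    timestamp: float
    size: int


FlowKey = Tuple[Tuple[str, int], Tuple[str, int]]


def flow_key(p: Packet) -> FlowKey:
    """Direction-independent four-tuple key: the two endpoints in sorted order."""
    a = (p.src_ip, p.src_port)
    b = (p.dst_ip, p.dst_port)
    return (a, b) if a <= b else (b, a)


def split_flows(packets: List[Packet]) -> Dict[FlowKey, List[Packet]]:
    """Group packets into flows, keeping each flow in timestamp order."""
    flows: Dict[FlowKey, List[Packet]] = defaultdict(list)
    for pkt in sorted(packets, key=lambda q: q.timestamp):
        flows[flow_key(pkt)].append(pkt)
    return dict(flows)


def label_flows(flows: Dict[FlowKey, List[Packet]], label: str) -> List[Tuple[FlowKey, List[Packet], str]]:
    """Attach the sample label ("malicious" or "normal") to every flow."""
    return [(key, pkts, label) for key, pkts in flows.items()]
```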
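One way to hold the initial feature set in code is a container with one dictionary per subset. The `InitialFeatureSet` class and the dotted naming scheme in `flatten` are illustrative assumptions, not structures defined in the patent; flattening simply yields a single feature vector that later selection and training steps can consume.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class InitialFeatureSet:
    """Hypothetical container for the four feature subsets of one flow."""
    flow: Dict[str, float] = field(default_factory=dict)
    handshake: Dict[str, float] = field(default_factory=dict)
    certificate: Dict[str, float] = field(default_factory=dict)
    domain: Dict[str, float] = field(default_factory=dict)

    def flatten(self) -> Dict[str, float]:
        """Merge the subsets into one dict, prefixing each name with its subset."""
        out: Dict[str, float] = {}
        for prefix, subset in (
            ("flow", self.flow),
            ("handshake", self.handshake),
            ("certificate", self.certificate),
            ("domain", self.domain),
        ):
            for name, value in subset.items():
                out[f"{prefix}.{name}"] = value
        return out
```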
(1) Flow feature subset
Protocol-independent features are called flow features; they depend only on network packet sizes and arrival times. Flow features fall into two categories: unidirectional and bidirectional. Unidirectional flow features characterize the client side and the server side separately, while bidirectional flow features characterize the flow session as a whole. Some of the flow features are listed below:
maximum, minimum, mean and standard deviation of unidirectional and bidirectional flow durations;
maximum, minimum, mean and standard deviation of unidirectional and bidirectional flow packet sizes;
byte distribution and byte distribution mean of unidirectional and bidirectional flow packets;
maximum, minimum, mean and standard deviation of unidirectional and bidirectional flow packet inter-arrival times;
packet size transition probabilities of unidirectional and bidirectional flows;
packet inter-arrival time transition probabilities of unidirectional and bidirectional flows;
other flow features.
It should be noted that the subset of flow features includes any one or more of the flow features described above.
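The statistical flow features can be computed per direction (client to server, server to client) and for the bidirectional flow by calling the same function on different packet subsets. This sketch uses the population standard deviation and interprets the "byte distribution mean" as the mean byte value of the payloads; both are assumptions, as are all function names.

```python
import statistics
from typing import Dict, List, Sequence


def summary_stats(values: Sequence[float], prefix: str) -> Dict[str, float]:
    """Maximum, minimum, mean and population standard deviation (zeros if empty)."""
    if not values:
        return {f"{prefix}_{k}": 0.0 for k in ("max", "min", "mean", "std")}
    return {
        f"{prefix}_max": float(max(values)),
        f"{prefix}_min": float(min(values)),
        f"{prefix}_mean": statistics.fmean(values),
        f"{prefix}_std": statistics.pstdev(values),
    }


def byte_distribution(payloads: List[bytes]) -> List[float]:
    """Relative frequency of each byte value 0..255 across all payloads."""
    counts = [0] * 256
    for payload in payloads:
        for b in payload:
            counts[b] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def byte_distribution_mean(dist: List[float]) -> float:
    """Mean byte value under the distribution (assumed meaning of 'distribution mean')."""
    return sum(value * p for value, p in enumerate(dist))


def packet_features(timestamps: Sequence[float], sizes: Sequence[int], prefix: str) -> Dict[str, float]:
    """Packet size and inter-arrival time statistics for one packet sequence."""
    intervals = [t2 - t1 for t1, t2 in zip(timestamps, timestamps[1:])]
    features = summary_stats(sizes, f"{prefix}_pkt_size")
    features.update(summary_stats(intervals, f"{prefix}_iat"))
    return features
```

Flow-duration statistics follow the same pattern: `summary_stats(durations, "duration")` over the durations of the flows being summarized.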
(2) Handshake feature subset
Handshake features are features extracted during the handshake negotiation phase of the encryption protocol. Representative handshake features include:
SSL version;
encryption algorithm selected by the protocol;
public key length selected by the protocol;
number of protocol extensions;
protocol extension item length;
cipher suites offered by the client;
cipher suite selected by the server;
other handshake features.
It should be noted that the subset of handshake features includes any one or more of the above handshake features.
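The client-side handshake features can be read directly from the ClientHello record. The parser below is a simplified sketch: it assumes the whole ClientHello fits in a single TLS record, does no bounds checking beyond what `struct` and indexing enforce, and covers only client-side fields. The server-selected cipher suite, encryption algorithm and public key length would come from the ServerHello and later messages, which are not parsed here. Note also that TLS 1.3 clients advertise their real version in the supported_versions extension, so `client_version` alone may read 0x0303. The function name is hypothetical.

```python
import struct
from typing import Dict, List, Tuple

HANDSHAKE_RECORD = 0x16  # TLS record content type: handshake
CLIENT_HELLO = 0x01      # handshake message type: ClientHello


def parse_client_hello(record: bytes) -> Dict[str, object]:
    """Extract client-side handshake features from one TLS record holding a ClientHello."""
    if len(record) < 9 or record[0] != HANDSHAKE_RECORD or record[5] != CLIENT_HELLO:
        raise ValueError("not a ClientHello handshake record")
    pos = 9  # skip 5-byte record header and 4-byte handshake header
    (client_version,) = struct.unpack_from("!H", record, pos)
    pos += 2 + 32            # version field + 32-byte random
    pos += 1 + record[pos]   # session ID length byte + session ID
    (cs_len,) = struct.unpack_from("!H", record, pos)
    pos += 2
    cipher_suites = list(struct.unpack_from(f"!{cs_len // 2}H", record, pos))
    pos += cs_len
    pos += 1 + record[pos]   # compression methods length byte + methods
    extensions: List[Tuple[int, int]] = []  # (extension type, extension data length)
    if pos + 2 <= len(record):  # the extensions block is optional
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = pos + ext_total
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            extensions.append((ext_type, ext_len))
            pos += 4 + ext_len
    return {
        "client_version": client_version,
        "cipher_suites": cipher_suites,
        "num_cipher_suites": len(cipher_suites),
        "num_extensions": len(extensions),
        "extension_lengths": [length for _, length in extensions],
    }
```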
(3) Certificate feature subset
Certificate features are features of the certificates used during the SSL (Secure Sockets Layer) protocol exchange, and specifically include the following:
certificate version;
certificate serial number length;
signature algorithm;
signature public key length;
number of certificate issuer information items;
number of certificate subject (user) information items;
certificate validity start time and certificate expiration time;
certificate expiration days;
whether the certificate is self-signed;
other certificate features.
It should be noted that any one or more of the above certificate features are included in the certificate feature subset.
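Assuming the certificate has already been decoded into its fields (for example by an X.509 library), the certificate features reduce to simple computations. `CertInfo` and `certificate_features` are hypothetical; the patent does not define "certificate expiration days" precisely, so both the validity period and the days remaining at observation time are computed. Treating issuer == subject as self-signed is a common heuristic; a stricter check would verify the signature with the certificate's own public key.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass
class CertInfo:
    """Hypothetical pre-decoded X.509 certificate fields."""
    version: int
    serial_number: int
    signature_algorithm: str
    public_key_bits: int
    issuer: List[Tuple[str, str]]   # (attribute, value) pairs, e.g. ("CN", "example")
    subject: List[Tuple[str, str]]
    not_before: datetime
    not_after: datetime


def certificate_features(cert: CertInfo, observed_at: datetime) -> Dict[str, object]:
    return {
        "version": cert.version,
        # Serial number length in bytes (ignores any DER sign-padding byte).
        "serial_number_length": (cert.serial_number.bit_length() + 7) // 8,
        "signature_algorithm": cert.signature_algorithm,
        "public_key_bits": cert.public_key_bits,
        "issuer_item_count": len(cert.issuer),
        "subject_item_count": len(cert.subject),
        "not_before": cert.not_before,
        "not_after": cert.not_after,
        "validity_days": (cert.not_after - cert.not_before).days,
        "days_until_expiry": (cert.not_after - observed_at).days,  # negative if expired
        # Heuristic: issuer name identical to subject name.
        "self_signed": cert.issuer == cert.subject,
    }
```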
(4) Domain name feature subset
Domain name features are features of the SNI (Server Name Indication) information in the SSL session, and specifically include the following:
domain name length;
Alexa ranking of the domain name;
subdomain length;
main domain length;
number of digits in the domain name;
number of special characters in the domain name;
whether the domain name contains wildcards;
other domain name features.
It should be noted that the domain name feature subset includes any one or more of the above domain name features.
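A sketch of the domain name features for one SNI value, under several labeled assumptions: the main domain is taken to be the last two labels (wrong for multi-label public suffixes such as co.uk; a public suffix list would handle those), any character other than a letter, digit or dot counts as special, and the ranking comes from a caller-supplied table standing in for the Alexa ranking, with -1 meaning unranked. `domain_features` and `rank_table` are hypothetical names.

```python
from typing import Dict, Optional


def domain_features(sni: str, rank_table: Optional[Dict[str, int]] = None) -> Dict[str, object]:
    name = sni.lower().rstrip(".")
    labels = name.split(".")
    # Naive split: the last two labels form the main domain.
    main_domain = ".".join(labels[-2:])
    subdomain = ".".join(labels[:-2])
    return {
        "domain_length": len(name),
        "rank": (rank_table or {}).get(main_domain, -1),  # -1 = not in the table
        "subdomain_length": len(subdomain),
        "main_domain_length": len(main_domain),
        "digit_count": sum(ch.isdigit() for ch in name),
        "special_char_count": sum(not (ch.isalnum() or ch == ".") for ch in name),
        "has_wildcard": "*" in name,
    }
```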
S103: select features meeting the preset condition from the features included in the flow feature subset, the handshake feature subset, the certificate feature subset and the domain name feature subset to form a final encrypted traffic feature set.
After the initial encrypted traffic feature set is generated, the features included in the flow feature subset, the handshake feature subset, the certificate feature subset and the domain name feature subset are processed. To select the features that work best in the malicious encrypted traffic scenario, a comprehensive feature engineering method combining basic statistical analysis, visual analysis and domain knowledge analysis is applied to the features of the four subsets, and the features meeting the preset condition are selected to form the final encrypted traffic feature set. A feature meets the preset condition when its degree of distinction between malicious encrypted traffic and normal encrypted traffic exceeds a preset threshold.
Because the four feature subsets contain a large number of features, the effect of the feature engineering is illustrated here with two examples.
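The patent leaves both the "degree of distinction" metric and the threshold unspecified. As one possible stand-in, the sketch below uses the total variation distance between a feature's empirical value distributions in malicious and normal samples (0 means identical, 1 means disjoint). It suits discrete features such as the issuer item counts in Fig. 3; continuous features would need to be binned first. The metric choice and function names are assumptions.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def total_variation(malicious: Sequence[Hashable], normal: Sequence[Hashable]) -> float:
    """Total variation distance between two empirical distributions."""
    pm, pn = Counter(malicious), Counter(normal)
    support = set(pm) | set(pn)
    return 0.5 * sum(abs(pm[v] / len(malicious) - pn[v] / len(normal)) for v in support)


def select_features(malicious_rows: List[Dict[str, Hashable]],
                    normal_rows: List[Dict[str, Hashable]],
                    threshold: float) -> List[str]:
    """Keep features whose distinction degree exceeds the threshold."""
    selected = []
    for name in malicious_rows[0]:
        m = [row[name] for row in malicious_rows]
        n = [row[name] for row in normal_rows]
        if total_variation(m, n) > threshold:
            selected.append(name)
    return selected
```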
Example one:
Fig. 3 compares statistics of the number of issuer items in the certificate features, where light gray represents normal encrypted traffic and dark gray represents malicious encrypted traffic. The figure shows that more than 80% of malicious traffic has only one issuer item, while certificates in normal traffic tend to fill in more issuer items.
Example two:
Fig. 4 compares statistics of the number of client protocol extension items in the handshake features, where light gray represents normal encrypted traffic and dark gray represents malicious encrypted traffic. The figure shows that normal encrypted traffic has significantly more extension items than malicious encrypted traffic.
The final encrypted traffic feature set can be used for model training and parameter tuning to obtain a final model for detecting malicious encrypted traffic in practice.
This embodiment discloses a method for generating an encrypted traffic feature set: malicious encrypted traffic sample data and normal encrypted traffic sample data are determined; deep protocol analysis is performed on the sample data to obtain an initial encrypted traffic feature set comprising a flow feature subset, a handshake feature subset, a certificate feature subset and a domain name feature subset; and features meeting a preset condition are selected from these four subsets to form a final encrypted traffic feature set. With this method, key features of encrypted traffic can be extracted comprehensively: features that distinguish malicious from normal encrypted traffic are drawn from four aspects (flow features, handshake features, certificate features and domain name features), so the generated feature set covers the encrypted traffic more completely, reflects its details objectively, and helps ensure detection accuracy when applied to detecting malicious encrypted traffic.
Referring to Fig. 5, which is a schematic structural diagram of an apparatus for generating an encrypted traffic feature set according to an embodiment of the present application, the apparatus includes the following units:
a sample data determining unit 51 for determining malicious encrypted traffic sample data and normal encrypted traffic sample data;
an initial encrypted traffic feature set obtaining unit 52, configured to perform in-depth protocol analysis on the malicious encrypted traffic sample data and the normal encrypted traffic sample data, to obtain an initial encrypted traffic feature set, where the initial encrypted traffic feature set includes a flow feature subset, a handshake feature subset, a certificate feature subset, and a domain name feature subset;
a final encrypted traffic feature set obtaining unit 53, configured to select features that meet a preset condition from the features included in the flow feature subset, the handshake feature subset, the certificate feature subset, and the domain name feature subset, respectively, to form a final encrypted traffic feature set.
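The three units can be wired together as a simple pipeline. In this sketch the protocol analysis and feature selection logic are injected as callables, since the method embodiment describes them separately; all class and attribute names are hypothetical and only mirror the unit numbering of Fig. 5.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Features = Dict[str, float]


@dataclass
class SampleDataDeterminingUnit:  # unit 51
    def determine(self, malicious: Sequence[bytes], normal: Sequence[bytes]) -> List[Tuple[bytes, str]]:
        """Attach a "malicious" or "normal" label to each traffic sample."""
        return [(s, "malicious") for s in malicious] + [(s, "normal") for s in normal]


@dataclass
class InitialFeatureSetObtainingUnit:  # unit 52
    analyze: Callable[[bytes], Features]  # deep protocol analysis -> flattened four-subset features

    def obtain(self, samples: List[Tuple[bytes, str]]) -> List[Tuple[Features, str]]:
        return [(self.analyze(raw), label) for raw, label in samples]


@dataclass
class FinalFeatureSetObtainingUnit:  # unit 53
    select: Callable[[List[Features], List[Features]], List[str]]  # names meeting the preset condition

    def obtain(self, rows: List[Tuple[Features, str]]) -> List[str]:
        malicious = [f for f, label in rows if label == "malicious"]
        normal = [f for f, label in rows if label == "normal"]
        return self.select(malicious, normal)


@dataclass
class EncryptedTrafficFeatureSetApparatus:
    unit51: SampleDataDeterminingUnit
    unit52: InitialFeatureSetObtainingUnit
    unit53: FinalFeatureSetObtainingUnit

    def run(self, malicious: Sequence[bytes], normal: Sequence[bytes]) -> List[str]:
        samples = self.unit51.determine(malicious, normal)
        rows = self.unit52.obtain(samples)
        return self.unit53.obtain(rows)
```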
It should be noted that the specific implementation of each unit is described in detail in the method embodiment; refer to the related content there, which is not repeated in this embodiment.
The apparatus for generating the encrypted traffic feature set comprises a processor and a memory; the above units are stored in the memory as program units, and the processor executes the program units stored in the memory to implement the corresponding functions.
The processor includes a kernel, which fetches the corresponding program unit from the memory. One or more kernels may be provided, and the encrypted traffic feature set is generated by adjusting kernel parameters.
The memory may include volatile memory in computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
The embodiment of the application provides a storage medium, on which a program is stored, which when executed by a processor implements the method of generating an encrypted traffic feature set.
An embodiment of the application provides a processor for running a program, wherein the program, when run, performs the method for generating the encrypted traffic feature set.
An embodiment of the application provides an electronic device comprising a processor, a memory, and a program stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the program:
determining malicious encrypted traffic sample data and normal encrypted traffic sample data;
performing deep protocol analysis on the malicious encrypted traffic sample data and the normal encrypted traffic sample data to obtain an initial encrypted traffic feature set, wherein the initial encrypted traffic feature set comprises a flow feature subset, a handshake feature subset, a certificate feature subset and a domain name feature subset;
and selecting features meeting a preset condition from the features included in the flow feature subset, the handshake feature subset, the certificate feature subset and the domain name feature subset to form a final encrypted traffic feature set.
Optionally, the flow feature subset includes any one or more of the following flow features:
maximum, minimum, mean and standard deviation of unidirectional and bidirectional flow durations;
maximum, minimum, mean and standard deviation of unidirectional and bidirectional flow packet sizes;
byte distribution and byte distribution mean of unidirectional and bidirectional flow packets;
maximum, minimum, mean and standard deviation of unidirectional and bidirectional flow packet inter-arrival times;
packet size transition probabilities of unidirectional and bidirectional flows;
packet inter-arrival time transition probabilities of unidirectional and bidirectional flows.
Optionally, the handshake feature subset includes any one or more of the following handshake features:
Secure Sockets Layer (SSL) protocol version;
encryption algorithm selected by the protocol;
public key length selected by the protocol;
number of protocol extensions;
protocol extension item length;
cipher suites offered by the client;
cipher suite selected by the server.
Optionally, the certificate feature subset includes any one or more of the following certificate features:
certificate version;
certificate serial number length;
signature algorithm;
signature public key length;
number of certificate issuer information items;
number of certificate subject (user) information items;
certificate validity start time and certificate expiration time;
certificate expiration days;
whether the certificate is self-signed.
Optionally, the domain name feature subset includes any one or more of the following domain name features:
domain name length;
domain name ranking;
subdomain length;
main domain length;
number of digits in the domain name;
number of special characters in the domain name;
whether the domain name contains wildcards.
Optionally, selecting the features that meet the preset condition from the features included in the flow feature subset, the handshake feature subset, the certificate feature subset and the domain name feature subset to form the final encrypted traffic feature set includes:
selecting the features that meet the preset condition from each of these subsets by a comprehensive feature engineering method. This method combines basic statistical analysis, visual analysis and domain knowledge analysis, and the selected features form the final encrypted traffic feature set. The features that meet the preset condition are those whose degree of discrimination between malicious encrypted traffic and normal encrypted traffic exceeds a preset threshold.
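The application combines statistical analysis, visual analysis and domain knowledge, but only the statistical part lends itself to code. The sketch below makes two substitutions, and both it and the names `fisher_score` and `select_features` are assumptions:

- It uses one concrete discrimination measure, the Fisher score: the squared difference of the class means divided by the sum of the class variances. This stands in for the unspecified measure of how well a feature distinguishes malicious from normal encrypted traffic.
- It keeps the features whose score exceeds the preset threshold.

Categorical features such as cipher suites or signature algorithms would first have to be encoded numerically.

```python
from statistics import mean, pvariance


def fisher_score(malicious, normal):
    """(mean_m - mean_n)^2 / (var_m + var_n); larger means better class separation."""
    numerator = (mean(malicious) - mean(normal)) ** 2
    denominator = pvariance(malicious) + pvariance(normal)
    if denominator == 0:
        # Both classes constant: perfectly separable if the constants differ.
        return float("inf") if numerator > 0 else 0.0
    return numerator / denominator


def select_features(malicious_rows, normal_rows, threshold):
    """Return (features whose score exceeds threshold, best first; all scores).

    Each *_rows argument is a non-empty list of dicts mapping feature name -> number,
    all with the same keys.
    """
    scores = {
        name: fisher_score([row[name] for row in malicious_rows],
                           [row[name] for row in normal_rows])
        for name in malicious_rows[0]
    }
    selected = sorted((n for n, s in scores.items() if s > threshold),
                      key=lambda n: scores[n], reverse=True)
    return selected, scores
```

The threshold plays the role of the preset threshold in the text. In practice, the ranked scores would also be inspected visually and checked against domain knowledge before the final set is fixed.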
Optionally, performing deep protocol analysis on the malicious encrypted traffic sample data and the normal encrypted traffic sample data includes:
performing deep protocol analysis on the key negotiation session of the malicious encrypted traffic sample data and on the key negotiation session of the normal encrypted traffic sample data.
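Deep protocol analysis of the key negotiation session begins by isolating the handshake phase of each connection. The following sketch uses hypothetical function names:

- It splits a reassembled TCP payload into TLS records.
- It lists the handshake message types seen before ChangeCipherSpec or the first application data record, after which the traffic is encrypted.

It assumes that no handshake message is fragmented across records.

```python
CHANGE_CIPHER_SPEC, HANDSHAKE, APPLICATION_DATA = 20, 22, 23
HANDSHAKE_TYPES = {1: "client_hello", 2: "server_hello", 11: "certificate",
                   12: "server_key_exchange", 14: "server_hello_done",
                   16: "client_key_exchange"}


def split_records(stream):
    """Split a reassembled TCP payload into (content_type, version, fragment) TLS records."""
    records, pos = [], 0
    while pos + 5 <= len(stream):
        ctype = stream[pos]
        version = int.from_bytes(stream[pos + 1:pos + 3], "big")
        length = int.from_bytes(stream[pos + 3:pos + 5], "big")
        fragment = stream[pos + 5:pos + 5 + length]
        if len(fragment) < length:
            break                      # incomplete trailing record
        records.append((ctype, version, fragment))
        pos += 5 + length
    return records


def key_negotiation_messages(stream):
    """Names of plaintext handshake messages seen before encrypted traffic starts."""
    names = []
    for ctype, _version, fragment in split_records(stream):
        if ctype in (CHANGE_CIPHER_SPEC, APPLICATION_DATA):
            break                      # everything after this point is encrypted
        if ctype != HANDSHAKE:
            continue                   # e.g. alerts
        pos = 0
        while pos + 4 <= len(fragment):
            htype = fragment[pos]
            hlen = int.from_bytes(fragment[pos + 1:pos + 4], "big")
            names.append(HANDSHAKE_TYPES.get(htype, f"type_{htype}"))
            pos += 4 + hlen
    return names
```

The messages identified this way are the ones from which the handshake, certificate and domain name features are then extracted.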
The electronic device herein may be a server, a PC, a tablet (PAD), a mobile phone, or the like.
The present application also provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps:
determining malicious encrypted traffic sample data and normal encrypted traffic sample data;
performing deep protocol analysis on the malicious encrypted traffic sample data and the normal encrypted traffic sample data to obtain an initial encrypted traffic feature set, where the initial encrypted traffic feature set includes a flow feature subset, a handshake feature subset, a certificate feature subset and a domain name feature subset;
selecting the features that meet a preset condition from the features included in the flow feature subset, the handshake feature subset, the certificate feature subset and the domain name feature subset, to form a final encrypted traffic feature set.
Optionally, the flow feature subset includes any one or more of the following flow features:
maximum, minimum, mean and standard deviation of unidirectional and bidirectional flow durations;
maximum, minimum, mean and standard deviation of packet sizes in unidirectional and bidirectional flows;
byte distribution and byte distribution mean of unidirectional and bidirectional flow packets;
maximum, minimum, mean and standard deviation of packet inter-arrival times in unidirectional and bidirectional flows;
packet-size transition probabilities of unidirectional and bidirectional flows;
packet inter-arrival-time transition probabilities of unidirectional and bidirectional flows.
Optionally, the handshake feature subset includes any one or more of the following handshake features:
Secure Sockets Layer (SSL) protocol version;
the encryption algorithm selected by the protocol;
the public key length selected by the protocol;
number of protocol extensions;
protocol extension lengths;
cipher suites offered by the client;
the cipher suite selected by the server.
Optionally, the certificate feature subset includes any one or more of the following certificate features:
certificate version;
certificate serial number length;
signature algorithm;
signature public key length;
number of certificate issuer information items;
number of certificate subject information items;
certificate validity start time and certificate expiration time;
number of days to certificate expiration;
whether the certificate is self-signed.
Optionally, the domain name feature subset includes any one or more of the following domain name features:
domain name length;
domain name ranking;
subdomain length;
main domain length;
number of digits in the domain name;
number of special characters in the domain name;
whether the domain name contains wildcards.
Optionally, selecting the features that meet the preset condition from the features included in the flow feature subset, the handshake feature subset, the certificate feature subset and the domain name feature subset to form the final encrypted traffic feature set includes:
selecting the features that meet the preset condition from each of these subsets by a comprehensive feature engineering method. This method combines basic statistical analysis, visual analysis and domain knowledge analysis, and the selected features form the final encrypted traffic feature set. The features that meet the preset condition are those whose degree of discrimination between malicious encrypted traffic and normal encrypted traffic exceeds a preset threshold.
Optionally, performing deep protocol analysis on the malicious encrypted traffic sample data and the normal encrypted traffic sample data includes:
performing deep protocol analysis on the key negotiation session of the malicious encrypted traffic sample data and on the key negotiation session of the normal encrypted traffic sample data.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may take the form of computer-readable media such as volatile memory, random access memory (RAM) and/or non-volatile memory, for example read-only memory (ROM) or flash RAM. Memory is an example of a computer-readable medium.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to:
- phase-change memory (PRAM);
- static random access memory (SRAM);
- dynamic random access memory (DRAM);
- other types of random access memory (RAM);
- read-only memory (ROM);
- electrically erasable programmable read-only memory (EEPROM);
- flash memory or other memory technology;
- compact disc read-only memory (CD-ROM);
- digital versatile discs (DVD) or other optical storage;
- magnetic cassettes;
- magnetic tape or magnetic disk storage or other magnetic storage devices;
- any other non-transmission medium that can be used to store information accessible by a computing device.
As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion. Thus a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims (5)

1. A method of generating an encrypted traffic feature set, the method comprising:
determining malicious encrypted traffic sample data and normal encrypted traffic sample data;
performing deep protocol analysis on the malicious encrypted traffic sample data and the normal encrypted traffic sample data to obtain an initial encrypted traffic feature set, wherein the initial encrypted traffic feature set comprises a flow feature subset, a handshake feature subset, a certificate feature subset and a domain name feature subset; wherein:
the flow features are protocol-independent features related only to the size and arrival time of network data packets;
the handshake features are features extracted in the handshake negotiation stage of the encryption protocol;
the certificate features are features of the certificates used in the Secure Sockets Layer protocol process;
the domain name features are features of the Server Name Indication (SNI) information in the Secure Sockets Layer session;
selecting features that meet preset conditions from the features included in the flow feature subset, the handshake feature subset, the certificate feature subset and the domain name feature subset to form a final encrypted traffic feature set;
wherein performing deep protocol analysis on the malicious encrypted traffic sample data and the normal encrypted traffic sample data comprises:
performing deep protocol analysis on the key negotiation session of the malicious encrypted traffic sample data and on the key negotiation session of the normal encrypted traffic sample data;
the flow feature subset comprises any one or more of the following flow features:
maximum, minimum, mean and standard deviation of unidirectional and bidirectional flow durations;
maximum, minimum, mean and standard deviation of packet sizes in unidirectional and bidirectional flows;
byte distribution and byte distribution mean of unidirectional and bidirectional flow packets;
maximum, minimum, mean and standard deviation of packet inter-arrival times in unidirectional and bidirectional flows;
packet-size transition probabilities of unidirectional and bidirectional flows;
packet inter-arrival-time transition probabilities of unidirectional and bidirectional flows;
the handshake feature subset comprises any one or more of the following handshake features:
Secure Sockets Layer (SSL) protocol version;
the encryption algorithm selected by the protocol;
the public key length selected by the protocol;
number of protocol extensions;
protocol extension lengths;
cipher suites offered by the client;
the cipher suite selected by the server;
the certificate feature subset comprises any one or more of the following certificate features:
certificate version;
certificate serial number length;
signature algorithm;
signature public key length;
number of certificate issuer information items;
number of certificate subject information items;
certificate validity start time and certificate expiration time;
number of days to certificate expiration;
whether the certificate is self-signed;
the domain name feature subset comprises any one or more of the following domain name features:
domain name length;
domain name ranking;
subdomain length;
main domain length;
number of digits in the domain name;
number of special characters in the domain name;
whether the domain name contains wildcards.
2. The method according to claim 1, wherein selecting the features that meet the preset conditions from the features included in the flow feature subset, the handshake feature subset, the certificate feature subset and the domain name feature subset, respectively, to form the final encrypted traffic feature set comprises:
selecting the features that meet the preset conditions from each of these subsets by a comprehensive feature engineering method, which combines basic statistical analysis, visual analysis and domain knowledge analysis, to form the final encrypted traffic feature set; wherein the features that meet the preset conditions are features whose degree of discrimination between malicious encrypted traffic and normal encrypted traffic exceeds a preset threshold.
3. An apparatus for generating an encrypted traffic feature set, comprising:
a sample data determining unit, configured to determine malicious encrypted traffic sample data and normal encrypted traffic sample data;
an initial encrypted traffic feature set obtaining unit, configured to perform deep protocol analysis on the malicious encrypted traffic sample data and the normal encrypted traffic sample data to obtain an initial encrypted traffic feature set, where the initial encrypted traffic feature set includes a flow feature subset, a handshake feature subset, a certificate feature subset and a domain name feature subset; wherein:
the flow features are protocol-independent features related only to the size and arrival time of network data packets;
the handshake features are features extracted in the handshake negotiation stage of the encryption protocol;
the certificate features are features of the certificates used in the Secure Sockets Layer protocol process;
the domain name features are features of the Server Name Indication (SNI) information in the Secure Sockets Layer session;
a final encrypted traffic feature set obtaining unit, configured to select features satisfying a preset condition from features included in the flow feature subset, the handshake feature subset, the certificate feature subset, and the domain name feature subset, respectively, to form a final encrypted traffic feature set;
wherein the deep protocol analysis performed by the initial encrypted traffic feature set obtaining unit on the malicious encrypted traffic sample data and the normal encrypted traffic sample data comprises:
performing deep protocol analysis on the key negotiation session of the malicious encrypted traffic sample data and on the key negotiation session of the normal encrypted traffic sample data;
the flow feature subset comprises any one or more of the following flow features:
maximum, minimum, mean and standard deviation of unidirectional and bidirectional flow durations;
maximum, minimum, mean and standard deviation of packet sizes in unidirectional and bidirectional flows;
byte distribution and byte distribution mean of unidirectional and bidirectional flow packets;
maximum, minimum, mean and standard deviation of packet inter-arrival times in unidirectional and bidirectional flows;
packet-size transition probabilities of unidirectional and bidirectional flows;
packet inter-arrival-time transition probabilities of unidirectional and bidirectional flows;
the handshake feature subset comprises any one or more of the following handshake features:
Secure Sockets Layer (SSL) protocol version;
the encryption algorithm selected by the protocol;
the public key length selected by the protocol;
number of protocol extensions;
protocol extension lengths;
cipher suites offered by the client;
the cipher suite selected by the server;
the certificate feature subset comprises any one or more of the following certificate features:
certificate version;
certificate serial number length;
signature algorithm;
signature public key length;
number of certificate issuer information items;
number of certificate subject information items;
certificate validity start time and certificate expiration time;
number of days to certificate expiration;
whether the certificate is self-signed;
the domain name feature subset comprises any one or more of the following domain name features:
domain name length;
domain name ranking;
subdomain length;
main domain length;
number of digits in the domain name;
number of special characters in the domain name;
whether the domain name contains wildcards.
4. A storage medium having stored thereon a program which, when executed by a processor, implements the method of generating an encrypted traffic feature set according to claim 1 or 2.
5. An electronic device comprising a memory for storing a program and a processor for running the program, wherein the program, when run, performs the method of generating an encrypted traffic feature set according to claim 1 or 2.
CN201910555508.0A 2019-06-25 2019-06-25 Method and device for generating encrypted traffic feature set Active CN112134829B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910555508.0A CN112134829B (en) 2019-06-25 2019-06-25 Method and device for generating encrypted traffic feature set

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910555508.0A CN112134829B (en) 2019-06-25 2019-06-25 Method and device for generating encrypted traffic feature set

Publications (2)

Publication Number Publication Date
CN112134829A CN112134829A (en) 2020-12-25
CN112134829B CN112134829B (en) 2023-06-30

Family

ID=73849403

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910555508.0A Active CN112134829B (en) 2019-06-25 2019-06-25 Method and device for generating encrypted traffic feature set

Country Status (1)

Country Link
CN (1) CN112134829B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115529147A (en) * 2021-06-25 2022-12-27 安碁资讯股份有限公司 Data leakage detection method and device
CN114553605A (en) * 2022-04-26 2022-05-27 中国矿业大学(北京) Encrypted malicious flow detection method for voting strategy
CN115941361B (en) * 2023-02-16 2023-05-09 科来网络技术股份有限公司 Malicious traffic identification method, device and equipment

Citations (1)

Publication number Priority date Publication date Assignee Title
CN109379377A (en) * 2018-11-30 2019-02-22 极客信安(北京)科技有限公司 Encrypt malicious traffic stream detection method, device, electronic equipment and storage medium

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
IL248306B (en) * 2016-10-10 2019-12-31 Verint Systems Ltd System and method for generating data sets for learning to identify user actions
CN109495513B (en) * 2018-12-29 2021-06-01 极客信安(北京)科技有限公司 Unsupervised encrypted malicious traffic detection method, unsupervised encrypted malicious traffic detection device, unsupervised encrypted malicious traffic detection equipment and unsupervised encrypted malicious traffic detection medium

Also Published As

Publication number Publication date
CN112134829A (en) 2020-12-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant