CN112134829B - Method and device for generating encrypted traffic feature set - Google Patents
Method and device for generating encrypted traffic feature set
- Publication number
- CN112134829B (application number CN201910555508.0A)
- Authority
- CN
- China
- Prior art keywords
- encrypted traffic
- certificate
- flow
- feature
- features
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1441—Countermeasures against malicious traffic
- H04L63/145—Countermeasures against malicious traffic the attack involving the propagation of malware through the network, e.g. viruses, trojans or worms
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/50—Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate
Abstract
The application discloses a method and a device for generating an encrypted traffic feature set. Malicious encrypted traffic sample data and normal encrypted traffic sample data are determined; deep protocol analysis is performed on the malicious encrypted traffic sample data and the normal encrypted traffic sample data to obtain an initial encrypted traffic feature set, which comprises a flow feature subset, a handshake feature subset, a certificate feature subset and a domain name feature subset; and features meeting a preset condition are selected from the initial encrypted traffic feature set to form a final encrypted traffic feature set. With the method and the device, key features of encrypted traffic can be extracted comprehensively: features that distinguish malicious encrypted traffic from normal encrypted traffic are extracted from four aspects, namely flow features, handshake features, certificate features and domain name features. The generated encrypted traffic feature set covers more features and reflects the details of encrypted traffic objectively, so that detection accuracy can be ensured when the feature set is applied to detecting malicious encrypted traffic.
Description
Technical Field
The present invention relates to the field of malicious encrypted traffic analysis technologies, and in particular, to a method and an apparatus for generating an encrypted traffic feature set.
Background
Malicious traffic refers to computer network traffic that an attacker specially constructs to attack a particular target; it is typically generated by malicious programs and propagated through the network. Accurately and promptly identifying malicious traffic and taking emergency response measures is an important task in network security.
In recent years, malicious traffic that uses encrypted communication protocols has grown rapidly. Because such traffic cannot be decrypted, detection technologies built on plaintext protocol analysis face serious challenges. For example, rule-based detection cannot be applied because effective detection rules essentially cannot be extracted from malicious encrypted traffic; file-based detection cannot be applied because files cannot be restored from malicious encrypted traffic; and behavior-based detection cannot be applied because explicit behavior content is difficult to extract from malicious encrypted traffic.
With the development of machine learning, applying machine learning to the detection of malicious encrypted traffic is a promising approach. However, the core of machine learning is the selection of the encrypted traffic feature set, whose quality often directly determines the accuracy of malicious encrypted traffic detection.
However, there is currently no mature solution for constructing feature sets for encrypted traffic. How to construct a high-quality encrypted traffic feature set is therefore a technical problem to be solved by those skilled in the art.
Disclosure of Invention
In view of the foregoing, the present application has been developed to provide a method and apparatus for generating an encrypted traffic feature set that overcome, or at least partially solve, the foregoing problems. The specific scheme is as follows:
a method of generating an encrypted traffic feature set, the method comprising:
determining malicious encrypted traffic sample data and normal encrypted traffic sample data;
deep protocol analysis is carried out on the malicious encrypted traffic sample data and the normal encrypted traffic sample data to obtain an initial encrypted traffic feature set, wherein the initial encrypted traffic feature set comprises a flow feature subset, a handshake feature subset, a certificate feature subset and a domain name feature subset;
and selecting the features meeting a preset condition from the features included in the flow feature subset, the handshake feature subset, the certificate feature subset and the domain name feature subset to form a final encrypted traffic feature set.
Optionally, the flow feature subset includes any one or more of the following flow features:
maximum, minimum, mean and standard deviation of unidirectional and bidirectional flow durations;
maximum, minimum, mean and standard deviation of unidirectional and bidirectional flow packet sizes;
byte distribution and byte distribution mean of unidirectional and bidirectional flow packets;
maximum, minimum, mean and standard deviation of unidirectional and bidirectional flow packet inter-arrival times;
packet size transition probabilities of unidirectional and bidirectional flows;
packet inter-arrival time transition probabilities of unidirectional and bidirectional flows.
Optionally, the subset of handshake features includes any one or more of the following handshake features:
Secure Sockets Layer (SSL) protocol version;
encryption algorithm selected by the protocol;
length of the public key selected by the protocol;
number of protocol extension items;
length of the protocol extension items;
cipher suites offered by the client;
cipher suite selected by the server.
Optionally, the certificate feature subset includes any one or more of the following certificate features:
certificate version;
certificate serial number length;
signature algorithm;
signature public key length;
number of certificate issuer information items;
number of certificate subject (user) information items;
certificate validity start time and certificate expiration time;
certificate expiration days;
whether the certificate is self-signed.
Optionally, the domain name feature subset includes any one or more of the following domain name features:
domain name length;
domain name ranking;
subdomain name length;
main domain name length;
number of digits in the domain name;
number of special characters in the domain name;
whether the domain name contains a wildcard.
Optionally, the selecting features meeting a preset condition from the features included in the flow feature subset, the handshake feature subset, the certificate feature subset and the domain name feature subset to form a final encrypted traffic feature set includes:
selecting, by a comprehensive feature engineering method combining basic statistical analysis, visual analysis and domain knowledge analysis, the features meeting the preset condition from the features included in the flow feature subset, the handshake feature subset, the certificate feature subset and the domain name feature subset respectively, to form the final encrypted traffic feature set, wherein the features meeting the preset condition are features whose degree of discrimination between malicious encrypted traffic and normal encrypted traffic exceeds a preset threshold.
Optionally, the performing deep protocol analysis on the malicious encrypted traffic sample data and the normal encrypted traffic sample data includes:
and carrying out deep protocol analysis on the key negotiation session process of the malicious encrypted traffic sample data and the key negotiation session process of the normal encrypted traffic sample data.
An apparatus for generating an encrypted traffic feature set, comprising:
the sample data determining unit is used for determining malicious encrypted traffic sample data and normal encrypted traffic sample data;
an initial encrypted traffic feature set obtaining unit, configured to perform in-depth protocol analysis on the malicious encrypted traffic sample data and the normal encrypted traffic sample data, to obtain an initial encrypted traffic feature set, where the initial encrypted traffic feature set includes a flow feature subset, a handshake feature subset, a certificate feature subset, and a domain name feature subset;
and a final encrypted traffic feature set obtaining unit, configured to select features meeting a preset condition from the features included in the flow feature subset, the handshake feature subset, the certificate feature subset and the domain name feature subset respectively, to form a final encrypted traffic feature set.
A storage medium having stored thereon a program which, when executed by a processor, implements a method of generating an encrypted traffic feature set as described above.
An electronic device comprising a memory for storing a program and a processor for running the program, wherein the program is run to perform a method of generating an encrypted traffic feature set as described above.
By means of the above technical solution, the application discloses a method and a device for generating an encrypted traffic feature set. Malicious encrypted traffic sample data and normal encrypted traffic sample data are determined; deep protocol analysis is performed on the malicious encrypted traffic sample data and the normal encrypted traffic sample data to obtain an initial encrypted traffic feature set, which comprises a flow feature subset, a handshake feature subset, a certificate feature subset and a domain name feature subset; and features meeting a preset condition are selected from the features included in the flow feature subset, the handshake feature subset, the certificate feature subset and the domain name feature subset to form a final encrypted traffic feature set. With the method and the device, key features of encrypted traffic can be extracted comprehensively: features that distinguish malicious encrypted traffic from normal encrypted traffic are extracted from four aspects, namely flow features, handshake features, certificate features and domain name features. The generated encrypted traffic feature set covers more features and reflects the details of encrypted traffic objectively, so that detection accuracy can be ensured when the feature set is applied to detecting malicious encrypted traffic.
The foregoing is only an overview of the technical solutions of the present application. So that the technical means of the present application can be understood more clearly and implemented in accordance with the specification, and so that the above and other objects, features and advantages of the present application are more readily apparent, specific embodiments of the present application are set forth below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 is a flowchart of a method for generating an encrypted traffic feature set according to an embodiment of the present application;
FIG. 2 is a schematic diagram of encrypted session features provided in an embodiment of the present application;
FIG. 3 is a statistical comparison chart of the number of issuer items in the certificate features provided in an embodiment of the present application;
FIG. 4 is a statistical comparison chart of the number of client protocol extension items in the handshake features provided in an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an apparatus for generating an encrypted traffic feature set according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Specific embodiments provided in the embodiments of the present application are described in detail below.
Referring to fig. 1, fig. 1 is a flowchart of a method for generating an encrypted traffic feature set according to an embodiment of the present application, where the method includes the following steps:
S101: Determine malicious encrypted traffic sample data and normal encrypted traffic sample data.
There may be multiple items of malicious encrypted traffic sample data and of normal encrypted traffic sample data, and labels may be assigned to both for subsequent processing and training. For example, the label of the malicious encrypted traffic sample data is set to "malicious" and the label of the normal encrypted traffic sample data is set to "normal".
After the malicious encrypted traffic sample data and the normal encrypted traffic sample data are determined, the malicious encrypted traffic samples and the normal encrypted traffic samples may be segmented into flows according to the quadruple (four-tuple) to generate network flows.
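The flow segmentation step can be sketched as follows. The text does not spell out the members of the quadruple; this sketch assumes it is (source IP, source port, destination IP, destination port), and the `Packet` record and its field names are hypothetical, not taken from the application.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    # Hypothetical minimal packet record; field names are assumptions.
    ts: float       # capture timestamp in seconds
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    size: int       # packet length in bytes


def flow_key(pkt):
    """Direction-independent four-tuple, so both directions share one flow."""
    a = (pkt.src_ip, pkt.src_port)
    b = (pkt.dst_ip, pkt.dst_port)
    return (a, b) if a <= b else (b, a)


def split_flows(packets):
    """Group packets into bidirectional flows, ordered by timestamp."""
    flows = defaultdict(list)
    for pkt in sorted(packets, key=lambda p: p.ts):
        flows[flow_key(pkt)].append(pkt)
    return dict(flows)


# Toy packets, invented for illustration only.
packets = [
    Packet(0.00, "10.0.0.2", 50000, "93.184.216.34", 443, 517),
    Packet(0.05, "93.184.216.34", 443, "10.0.0.2", 50000, 1400),
    Packet(0.10, "10.0.0.2", 50001, "93.184.216.34", 443, 517),
]
flows = split_flows(packets)
```

Keying on an ordered pair of endpoints keeps the client-to-server and server-to-client packets in the same flow, which is what the later unidirectional and bidirectional features need.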
S102: Perform deep protocol analysis on the malicious encrypted traffic sample data and the normal encrypted traffic sample data to obtain an initial encrypted traffic feature set, wherein the initial encrypted traffic feature set comprises a flow feature subset, a handshake feature subset, a certificate feature subset and a domain name feature subset.
It should be noted that, in this step, deep protocol analysis may be performed on the malicious encrypted traffic sample data and the normal encrypted traffic sample data, which specifically may be performed on the network flow generated after the flow splitting in the previous step. As an implementation manner, the key negotiation session process of the malicious encrypted traffic sample data and the key negotiation session process of the normal encrypted traffic sample data may be subjected to deep protocol analysis.
Through deep analysis of encrypted traffic, the features of an entire encrypted session can be classified into flow features, handshake features, certificate features and domain name features. The initial encrypted traffic feature set therefore comprises a flow feature subset, a handshake feature subset, a certificate feature subset and a domain name feature subset, each containing its own specific features. As shown in FIG. 2, the four types of feature subsets cover the whole encrypted session at fine granularity, so that the details of the encrypted session are reflected fully and objectively.
The four types of feature subsets are described in detail below.
(1) Flow feature subset
Protocol-independent features are called flow features; they relate only to network packet sizes and arrival times. Flow features fall into two categories: unidirectional and bidirectional. Unidirectional flow features examine the client side and the server side separately, while bidirectional flow features examine the flow session as a whole. Some of the flow features are listed below:
maximum, minimum, mean and standard deviation of unidirectional and bidirectional flow durations;
maximum, minimum, mean and standard deviation of unidirectional and bidirectional flow packet sizes;
byte distribution and byte distribution mean of unidirectional and bidirectional flow packets;
maximum, minimum, mean and standard deviation of unidirectional and bidirectional flow packet inter-arrival times;
packet size transition probabilities of unidirectional and bidirectional flows;
packet inter-arrival time transition probabilities of unidirectional and bidirectional flows;
other flow features.
It should be noted that the subset of flow features includes any one or more of the flow features described above.
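A minimal sketch of how several of these flow features might be computed from one flow's packet sizes and timestamps. The application does not define how transition probabilities are computed; this sketch assumes a first-order Markov chain over fixed-width bins, and the function names, `bin_width` and `n_bins` are illustrative assumptions.

```python
import statistics


def summary(values):
    """Maximum, minimum, mean and population standard deviation."""
    return {
        "max": max(values),
        "min": min(values),
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),
    }


def inter_arrival_times(timestamps):
    """Differences between consecutive packet timestamps."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]


def transition_matrix(values, bin_width, n_bins):
    """Row-normalised transition probabilities between consecutive bins.

    Assumption: values are quantised into n_bins fixed-width bins and
    modelled as a first-order Markov chain.
    """
    bins = [min(int(v // bin_width), n_bins - 1) for v in values]
    counts = [[0] * n_bins for _ in range(n_bins)]
    for i, j in zip(bins, bins[1:]):
        counts[i][j] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Toy values for one direction of a flow, invented for illustration.
sizes = [100, 1400, 100, 1400, 1400]
timestamps = [0.0, 0.1, 0.3, 0.4, 0.8]

size_stats = summary(sizes)
iat_stats = summary(inter_arrival_times(timestamps))
size_tm = transition_matrix(sizes, bin_width=750, n_bins=2)
```

Running the same functions once on client-to-server packets, once on server-to-client packets, and once on all packets would give the unidirectional and bidirectional variants.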
(2) Handshake feature subset
The handshake feature refers to a feature extracted during a handshake negotiation phase of an encryption protocol. Representative handshaking features are:
SSL version;
encryption algorithm selected by the protocol;
length of the public key selected by the protocol;
number of protocol extension items;
length of the protocol extension items;
cipher suites offered by the client;
cipher suite selected by the server;
other handshake features.
It should be noted that the subset of handshake features includes any one or more of the above handshake features.
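As a rough illustration, handshake features like those above could be derived from already-parsed hello messages. The dictionary layout below is a hypothetical parser output, not a format defined by the application or by any particular library; the numeric codes in the toy data are standard TLS registry values.

```python
def handshake_features(client_hello, server_hello):
    """Build a small handshake feature record.

    Assumed input layout (hypothetical):
      client_hello["extensions"]    list of (extension_type, length_in_bytes)
      client_hello["cipher_suites"] list of cipher suite codes offered
      server_hello["version"]       negotiated protocol version code
      server_hello["cipher_suite"]  cipher suite code selected by the server
    """
    exts = client_hello["extensions"]
    return {
        "protocol_version": server_hello["version"],
        "n_client_extensions": len(exts),
        "client_extensions_len": sum(length for _, length in exts),
        "n_offered_cipher_suites": len(client_hello["cipher_suites"]),
        "selected_cipher_suite": server_hello["cipher_suite"],
    }


# Toy handshake, invented for illustration.
client_hello = {
    # server_name (0), supported_groups (10), signature_algorithms (13)
    "extensions": [(0, 18), (10, 8), (13, 20)],
    "cipher_suites": [0xC02F, 0xC030, 0x009C],
}
server_hello = {"version": 0x0303, "cipher_suite": 0xC02F}  # 0x0303 = TLS 1.2

features = handshake_features(client_hello, server_hello)
```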
(3) Certificate feature subset
The certificate features refer to features of the certificates used during the SSL (Secure Sockets Layer) protocol exchange, and specifically include the following:
certificate version;
certificate serial number length;
signature algorithm;
signature public key length;
number of certificate issuer information items;
number of certificate subject (user) information items;
certificate validity start time and certificate expiration time;
certificate expiration days;
whether the certificate is self-signed;
other certificate features.
It should be noted that any one or more of the above certificate features are included in the certificate feature subset.
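The following sketch derives a few certificate features from an already-decoded certificate. The dictionary layout is a hypothetical decoder output, and comparing issuer with subject is only a heuristic for self-signing (a strict check would verify the signature against the certificate's own public key). Treating "expiration days" as days remaining until expiry is also an assumption.

```python
from datetime import datetime, timezone


def certificate_features(cert, now):
    """Derive simple certificate features from a decoded certificate dict."""
    # Serial number length in bytes, from the integer's bit length.
    serial_len = (cert["serial_number"].bit_length() + 7) // 8
    return {
        "version": cert["version"],
        "serial_number_len": serial_len,
        "n_issuer_items": len(cert["issuer"]),
        "n_subject_items": len(cert["subject"]),
        "validity_days": (cert["not_after"] - cert["not_before"]).days,
        "days_to_expiry": (cert["not_after"] - now).days,
        # Heuristic only: issuer equals subject.
        "self_signed_guess": cert["issuer"] == cert["subject"],
    }


# Toy certificate, invented for illustration.
cert = {
    "version": 3,
    "serial_number": 0x01F4,
    "issuer": {"CN": "example"},
    "subject": {"CN": "example"},
    "not_before": datetime(2019, 1, 1, tzinfo=timezone.utc),
    "not_after": datetime(2020, 1, 1, tzinfo=timezone.utc),
}
now = datetime(2019, 6, 25, tzinfo=timezone.utc)

features = certificate_features(cert, now)
```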
(4) Domain name feature subset
The domain name features refer to features of the SNI (Server Name Indication) information in the SSL session, and specifically include the following:
domain name length;
Alexa ranking of the domain name;
subdomain name length;
main domain name length;
number of digits in the domain name;
number of special characters in the domain name;
whether the domain name contains a wildcard;
other DNS features.
It should be noted that the domain name feature subset includes any one or more of the above domain name features.
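Most of the domain name features can be computed directly from the SNI string, as in this sketch. Two simplifying assumptions are made: the main domain is taken to be the last two labels (multi-label public suffixes such as `co.uk` would need a public suffix list), and a "special character" is anything other than a letter, a digit or a dot. The ranking feature is omitted because it requires an external ranking list.

```python
import string


def domain_features(sni):
    """Lexical features of a server name taken from the SNI extension."""
    name = sni.rstrip(".").lower()
    labels = name.split(".")
    main = ".".join(labels[-2:])   # naive main-domain split (assumption)
    sub = ".".join(labels[:-2])
    plain = set(string.ascii_lowercase + string.digits + ".")
    return {
        "length": len(name),
        "main_domain_len": len(main),
        "subdomain_len": len(sub),
        "n_digits": sum(c.isdigit() for c in name),
        "n_special": sum(c not in plain for c in name),
        "has_wildcard": "*" in name,
    }


features = domain_features("a1-b2.cdn.example.com")
```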
S103: Select the features meeting a preset condition from the features included in the flow feature subset, the handshake feature subset, the certificate feature subset and the domain name feature subset to form a final encrypted traffic feature set.
After the initial encrypted traffic feature set is generated, the features included in the flow feature subset, the handshake feature subset, the certificate feature subset and the domain name feature subset are processed. To select the features that work best for the encrypted malicious traffic scenario, a comprehensive feature engineering method combining basic statistical analysis, visual analysis and domain knowledge analysis is applied to the features in the four subsets, and the features meeting the preset condition are selected to form the final encrypted traffic feature set. A feature meets the preset condition when its degree of discrimination between malicious encrypted traffic and normal encrypted traffic exceeds a preset threshold.
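The application does not define how the degree of discrimination is measured. As one possible reading of the statistical part of this step, the sketch below scores each feature by the total variation distance between its empirical distributions in the two classes and keeps features whose score exceeds the threshold. The metric, the function names and the toy values are all assumptions made for illustration.

```python
from collections import Counter


def discrimination(malicious, normal):
    """Total variation distance between two empirical distributions (0 to 1)."""
    pm, pn = Counter(malicious), Counter(normal)
    values = set(pm) | set(pn)
    return 0.5 * sum(
        abs(pm[v] / len(malicious) - pn[v] / len(normal)) for v in values
    )


def select_features(samples_mal, samples_norm, threshold):
    """Keep the names of features whose score exceeds the threshold."""
    return [
        name
        for name in samples_mal
        if discrimination(samples_mal[name], samples_norm[name]) > threshold
    ]


# Toy per-feature value lists, invented for illustration.
mal = {"n_issuer_items": [1, 1, 1, 1, 2], "cert_version": [3, 3, 3, 3, 3]}
norm = {"n_issuer_items": [3, 4, 3, 5, 1], "cert_version": [3, 3, 3, 3, 3]}

selected = select_features(mal, norm, threshold=0.5)
```

A score of 0 means the two classes have identical distributions for that feature, and 1 means they never share a value. Continuous features would need binning first, and the visual and domain knowledge analyses described above are not captured by this score.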
Because the flow feature subset, the handshake feature subset, the certificate feature subset and the domain name feature subset contain a large number of features, this application illustrates the effect of feature engineering with the following examples.
Example one:
FIG. 3 is a statistical comparison chart of the number of issuer items in the certificate features, where light gray represents normal encrypted traffic and dark gray represents malicious encrypted traffic. The figure shows that more than 80% of malicious traffic has only 1 issuer item, while certificates in normal traffic tend to fill in more issuer items.
Example two:
FIG. 4 is a statistical comparison chart of the number of client protocol extension items in the handshake features, where light gray represents normal encrypted traffic and dark gray represents malicious encrypted traffic. The figure shows that normal encrypted traffic has significantly more extension items than malicious encrypted traffic.
The final encrypted traffic feature set can be applied to model training and parameter adjustment, and a final model is obtained for actual malicious encrypted traffic detection.
This embodiment discloses a method for generating an encrypted traffic feature set. Malicious encrypted traffic sample data and normal encrypted traffic sample data are determined; deep protocol analysis is performed on the malicious encrypted traffic sample data and the normal encrypted traffic sample data to obtain an initial encrypted traffic feature set, which comprises a flow feature subset, a handshake feature subset, a certificate feature subset and a domain name feature subset; and features meeting a preset condition are selected from the features included in the flow feature subset, the handshake feature subset, the certificate feature subset and the domain name feature subset to form a final encrypted traffic feature set. With the method, key features of encrypted traffic can be extracted comprehensively: features that distinguish malicious encrypted traffic from normal encrypted traffic are extracted from four aspects, namely flow features, handshake features, certificate features and domain name features. The generated encrypted traffic feature set covers more features and reflects the details of encrypted traffic objectively, so that detection accuracy can be ensured when the feature set is applied to detecting malicious encrypted traffic.
Referring to fig. 5, fig. 5 is a schematic structural diagram of an apparatus for generating an encrypted traffic feature set according to an embodiment of the present application, where the apparatus includes the following units:
a sample data determining unit 51 for determining malicious encrypted traffic sample data and normal encrypted traffic sample data;
an initial encrypted traffic feature set obtaining unit 52, configured to perform in-depth protocol analysis on the malicious encrypted traffic sample data and the normal encrypted traffic sample data, to obtain an initial encrypted traffic feature set, where the initial encrypted traffic feature set includes a flow feature subset, a handshake feature subset, a certificate feature subset, and a domain name feature subset;
a final encrypted traffic feature set obtaining unit 53, configured to select features that meet a preset condition from the features included in the flow feature subset, the handshake feature subset, the certificate feature subset, and the domain name feature subset, respectively, to form a final encrypted traffic feature set.
It should be noted that the specific implementation of each unit has been described in detail in the method embodiment; refer to the related content there, which is not repeated in this embodiment.
The device for generating the encrypted traffic characteristic set comprises a processor and a memory, wherein each unit is stored in the memory as a program unit, and the processor executes the program unit stored in the memory to realize corresponding functions.
The processor includes a kernel, and the kernel fetches the corresponding program unit from the memory. The kernel may be provided with one or more kernel parameters to enable the generation of encrypted traffic feature sets.
The memory may include forms of computer-readable media such as volatile memory, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
The embodiment of the application provides a storage medium, on which a program is stored, which when executed by a processor implements the method of generating an encrypted traffic feature set.
The embodiment of the application provides a processor for running a program, wherein the program runs to execute the method for generating the encrypted traffic characteristic set.
The embodiment of the application provides an electronic device, which comprises a processor, a memory and a program stored on the memory and capable of running on the processor, wherein the following steps are realized when the processor executes the program:
determining malicious encrypted traffic sample data and normal encrypted traffic sample data;
deep protocol analysis is carried out on the malicious encrypted traffic sample data and the normal encrypted traffic sample data to obtain an initial encrypted traffic feature set, wherein the initial encrypted traffic feature set comprises a flow feature subset, a handshake feature subset, a certificate feature subset and a domain name feature subset;
and selecting the features meeting a preset condition from the features included in the flow feature subset, the handshake feature subset, the certificate feature subset and the domain name feature subset to form a final encrypted traffic feature set.
Optionally, the flow feature subset includes any one or more of the following flow features:
maximum, minimum, mean and standard deviation of unidirectional and bidirectional flow durations;
maximum, minimum, mean and standard deviation of unidirectional and bidirectional flow packet sizes;
byte distribution and byte distribution mean of unidirectional and bidirectional flow packets;
maximum, minimum, mean and standard deviation of unidirectional and bidirectional flow packet inter-arrival times;
packet size transition probabilities of unidirectional and bidirectional flows;
packet inter-arrival time transition probabilities of unidirectional and bidirectional flows.
Optionally, the subset of handshake features includes any one or more of the following handshake features:
Secure Sockets Layer (SSL) protocol version;
encryption algorithm selected by the protocol;
length of the public key selected by the protocol;
number of protocol extension items;
length of the protocol extension items;
cipher suites offered by the client;
cipher suite selected by the server.
Optionally, the certificate feature subset includes any one or more of the following certificate features:
certificate version;
certificate serial number length;
signature algorithm;
signature public key length;
number of certificate issuer information items;
number of certificate subject (user) information items;
certificate validity start time and certificate expiration time;
certificate expiration days;
whether the certificate is self-signed.
Optionally, the domain name feature subset includes any one or more of the following domain name features:
domain name length;
domain name ranking;
subdomain name length;
main domain name length;
number of digits in the domain name;
number of special characters in the domain name;
whether the domain name contains a wildcard.
Optionally, the selecting features meeting a preset condition from the features included in the flow feature subset, the handshake feature subset, the certificate feature subset and the domain name feature subset to form a final encrypted traffic feature set includes:
selecting, by a comprehensive feature engineering method combining basic statistical analysis, visual analysis and domain knowledge analysis, the features meeting the preset condition from the features included in the flow feature subset, the handshake feature subset, the certificate feature subset and the domain name feature subset respectively, to form the final encrypted traffic feature set, wherein the features meeting the preset condition are features whose degree of discrimination between malicious encrypted traffic and normal encrypted traffic exceeds a preset threshold.
Optionally, the performing deep protocol analysis on the malicious encrypted traffic sample data and the normal encrypted traffic sample data includes:
and carrying out deep protocol analysis on the key negotiation session process of the malicious encrypted traffic sample data and the key negotiation session process of the normal encrypted traffic sample data.
The electronic device herein may be a server, a PC, a PAD, a mobile phone, etc.
The present application also provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps:
determining malicious encrypted traffic sample data and normal encrypted traffic sample data;
performing deep protocol analysis on the malicious encrypted traffic sample data and the normal encrypted traffic sample data to obtain an initial encrypted traffic feature set, wherein the initial encrypted traffic feature set comprises a flow feature subset, a handshake feature subset, a certificate feature subset and a domain name feature subset;
and selecting features that meet a preset condition from the features included in the flow feature subset, the handshake feature subset, the certificate feature subset and the domain name feature subset to form a final encrypted traffic feature set.
Optionally, the flow feature subset includes any one or more of the following flow features:
maximum, minimum, mean and standard deviation of unidirectional and bidirectional flow durations;
maximum, minimum, mean and standard deviation of unidirectional and bidirectional flow packet sizes;
byte distribution and byte distribution mean of unidirectional and bidirectional flow packets;
maximum, minimum, mean and standard deviation of unidirectional and bidirectional flow packet inter-arrival times;
packet size transition probabilities of unidirectional and bidirectional flows;
packet inter-arrival time transition probabilities of unidirectional and bidirectional flows.
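As an illustration of the flow features listed above, the following minimal sketch computes summary statistics and a first-order transition probability matrix for the packets of one direction of a flow. The function names, the `(timestamp, size)` input layout, and the binning parameters (150-byte bins, 10 bins) are assumptions made for the example and are not specified in the present application.

```python
from statistics import mean, pstdev


def summary_stats(values):
    """Maximum, minimum, mean and population standard deviation."""
    return {"max": max(values), "min": min(values),
            "mean": mean(values), "std": pstdev(values)}


def transition_probabilities(values, bin_width, n_bins):
    """Row-normalized first-order transition matrix over binned values."""
    # Values beyond the last bin are clamped into it.
    bins = [min(int(v // bin_width), n_bins - 1) for v in values]
    counts = [[0] * n_bins for _ in range(n_bins)]
    for a, b in zip(bins, bins[1:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def flow_features(packets):
    """`packets` is a list of (timestamp_seconds, size_bytes) tuples for one
    direction of a flow, in arrival order; at least two packets are needed."""
    times = [t for t, _ in packets]
    sizes = [s for _, s in packets]
    gaps = [b - a for a, b in zip(times, times[1:])]
    return {
        "size": summary_stats(sizes),
        "inter_arrival": summary_stats(gaps),
        # Assumed binning: 150-byte bins, 10 bins.
        "size_transitions": transition_probabilities(sizes, 150, 10),
    }
```

The same `transition_probabilities` helper could be applied to the inter-arrival times with a time-based bin width, and bidirectional features could be obtained by merging the packets of both directions before calling `flow_features`.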
Optionally, the subset of handshake features includes any one or more of the following handshake features:
Secure Sockets Layer (SSL) protocol version;
the encryption algorithm selected by the protocol;
the public key length selected by the protocol;
number of protocol extensions;
protocol extension length;
the cipher suites offered by the client;
the cipher suite selected by the server.
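By way of example only, some of the handshake features above could be flattened into a feature record as sketched below. The layout of the input dictionaries is hypothetical and stands in for the output of whatever handshake parser is used; the present application does not prescribe one. Encoding the offered cipher suites as one binary indicator per suite is likewise an assumption of this sketch.

```python
def handshake_features(client_hello, server_hello, suite_vocabulary):
    """Build a flat feature dict from already-parsed handshake messages.

    The dict layout of `client_hello` and `server_hello` is hypothetical.
    `suite_vocabulary` is the list of cipher suite code points to encode.
    """
    offered = set(client_hello["cipher_suites"])
    extensions = client_hello["extensions"]  # list of (type, payload_bytes)
    features = {
        "version": server_hello["version"],
        "selected_suite": server_hello["cipher_suite"],
        "extension_count": len(extensions),
        "extension_total_length": sum(len(data) for _, data in extensions),
    }
    # One binary indicator per cipher suite in the chosen vocabulary.
    for suite in suite_vocabulary:
        features[f"offers_0x{suite:04x}"] = int(suite in offered)
    return features


client_hello = {
    "cipher_suites": [0x002F, 0xC02F],
    "extensions": [(0, b"example"), (10, b"\x00\x02")],
}
server_hello = {"version": 0x0303, "cipher_suite": 0xC02F}
features = handshake_features(client_hello, server_hello,
                              [0x002F, 0x0035, 0xC02F])
```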
Optionally, the certificate feature subset includes any one or more of the following certificate features:
a certificate version;
certificate serial number length;
a signature algorithm;
signature public key length;
the number of certificate issuer information items;
the number of certificate subject information items;
certificate validity start time and certificate expiration time;
certificate expiration days;
whether the certificate is a self-signed certificate.
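The certificate features above may be derived from fields that have already been parsed out of the certificate. The sketch below assumes a hypothetical `CertificateInfo` container for those fields. It interprets "certificate expiration days" as the length of the validity period in days and treats a certificate as self-signed when the issuer equals the subject; both are assumptions of this example rather than definitions given in the present application.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CertificateInfo:
    """Hypothetical container for fields already parsed from a certificate."""
    version: int
    serial_number: int
    signature_algorithm: str
    public_key_bits: int
    issuer: dict
    subject: dict
    not_before: datetime
    not_after: datetime


def certificate_features(cert):
    return {
        "version": cert.version,
        # Serial number length in bytes.
        "serial_number_length": (cert.serial_number.bit_length() + 7) // 8,
        "signature_algorithm": cert.signature_algorithm,
        "public_key_bits": cert.public_key_bits,
        "issuer_item_count": len(cert.issuer),
        "subject_item_count": len(cert.subject),
        # Assumed reading of "expiration days": length of the validity period.
        "validity_days": (cert.not_after - cert.not_before).days,
        # Heuristic only: a rigorous check would also verify the signature
        # against the certificate's own public key.
        "self_signed": cert.issuer == cert.subject,
    }
```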
Optionally, the domain name feature subset includes any one or more of the following domain name features:
domain name length;
ranking of the domain name;
subdomain name length;
main domain name length;
the number of digits in the domain name;
the number of special characters in the domain name;
whether the domain name contains wildcards.
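The domain name features above are simple lexical properties of the server name. A minimal sketch follows, under several assumptions that are not taken from the present application: the main domain is taken to be the last two labels of the name, any character other than a letter, a digit or a dot counts as a special character (so a hyphen counts), and `ranking` is a hypothetical mapping from main domain to a popularity rank.

```python
import string


def domain_name_features(name, ranking=None):
    """Compute lexical features for a server name such as an SNI value.

    `ranking` is a hypothetical mapping from main domain to popularity rank.
    """
    name = name.lower().rstrip(".")
    labels = name.split(".")
    # Assumption: the main domain is the last two labels. Real code would
    # consult a public-suffix list to handle names such as example.co.uk.
    main = ".".join(labels[-2:])
    sub = ".".join(labels[:-2])
    allowed = set(string.ascii_lowercase + string.digits + ".")
    return {
        "length": len(name),
        "rank": (ranking or {}).get(main),
        "subdomain_length": len(sub),
        "main_domain_length": len(main),
        "digit_count": sum(c.isdigit() for c in name),
        "special_char_count": sum(c not in allowed for c in name),
        "has_wildcard": "*" in name,
    }
```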
Optionally, selecting features that meet a preset condition from the features included in the flow feature subset, the handshake feature subset, the certificate feature subset and the domain name feature subset to form a final encrypted traffic feature set includes:
selecting, from the features included in each of the flow feature subset, the handshake feature subset, the certificate feature subset and the domain name feature subset, the features that meet the preset condition to form the final encrypted traffic feature set, using a comprehensive feature engineering method that combines basic statistical analysis, visual analysis and domain knowledge analysis, wherein the features that meet the preset condition are features whose degree of discrimination between malicious encrypted traffic and normal encrypted traffic exceeds a preset threshold.
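The threshold-based selection described above can be sketched as follows. The present application does not define how the degree of discrimination is measured, so the score used here (the absolute difference of the class means divided by the combined spread of the two classes) is only an illustrative stand-in for the statistical part of the method; the visual analysis and domain knowledge analysis are not modeled. The function names and the row layout are assumptions.

```python
from statistics import mean, pstdev


def discrimination_score(malicious, normal):
    """Absolute difference of class means, scaled by the combined spread.

    Illustrative stand-in; the measure of discrimination is an assumption.
    """
    spread = (pstdev(malicious) ** 2 + pstdev(normal) ** 2) ** 0.5
    gap = abs(mean(malicious) - mean(normal))
    if spread == 0:
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def select_features(malicious_rows, normal_rows, threshold):
    """Keep the feature names whose score exceeds `threshold`.

    Each row is a dict mapping a feature name to a numeric value.
    """
    selected = []
    for name in malicious_rows[0]:
        score = discrimination_score(
            [row[name] for row in malicious_rows],
            [row[name] for row in normal_rows],
        )
        if score > threshold:
            selected.append(name)
    return selected


malicious_rows = [{"self_signed": 1, "domain_length": 12},
                  {"self_signed": 1, "domain_length": 14}]
normal_rows = [{"self_signed": 0, "domain_length": 13},
               {"self_signed": 0, "domain_length": 13}]
final_feature_set = select_features(malicious_rows, normal_rows, 1.0)
```

In this toy input, `self_signed` separates the two classes perfectly and is kept, while `domain_length` has the same mean in both classes and is dropped.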
Optionally, performing deep protocol analysis on the malicious encrypted traffic sample data and the normal encrypted traffic sample data includes:
performing deep protocol analysis on the key negotiation session process of the malicious encrypted traffic sample data and on the key negotiation session process of the normal encrypted traffic sample data.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.
Claims (5)
1. A method of generating an encrypted traffic feature set, the method comprising:
determining malicious encrypted traffic sample data and normal encrypted traffic sample data;
performing deep protocol analysis on the malicious encrypted traffic sample data and the normal encrypted traffic sample data to obtain an initial encrypted traffic feature set, wherein the initial encrypted traffic feature set comprises a flow feature subset, a handshake feature subset, a certificate feature subset and a domain name feature subset; the flow features are protocol-independent features related only to the size and arrival time of network data packets, the handshake features are features extracted in the handshake negotiation stage of the encryption protocol, the certificate features are features of the certificates used in the Secure Sockets Layer protocol process, and the domain name features are features of the server name indication information in the Secure Sockets Layer session;
selecting features that meet a preset condition from the features included in the flow feature subset, the handshake feature subset, the certificate feature subset and the domain name feature subset to form a final encrypted traffic feature set;
wherein said performing deep protocol analysis on said malicious encrypted traffic sample data and said normal encrypted traffic sample data comprises:
performing deep protocol analysis on the key negotiation session process of the malicious encrypted traffic sample data and the key negotiation session process of the normal encrypted traffic sample data;
the flow feature subset comprises any one or more of the following flow features:
maximum, minimum, mean and standard deviation of unidirectional and bidirectional flow durations;
maximum, minimum, mean and standard deviation of unidirectional and bidirectional flow packet sizes;
byte distribution and byte distribution mean of unidirectional and bidirectional flow packets;
maximum, minimum, mean and standard deviation of unidirectional and bidirectional flow packet inter-arrival times;
packet size transition probabilities of unidirectional and bidirectional flows;
packet inter-arrival time transition probabilities of unidirectional and bidirectional flows;
the handshake feature subset comprises any one or more of the following handshake features:
Secure Sockets Layer (SSL) protocol version;
the encryption algorithm selected by the protocol;
the public key length selected by the protocol;
number of protocol extensions;
protocol extension length;
the cipher suites offered by the client;
the cipher suite selected by the server;
the certificate feature subset comprises any one or more of the following certificate features:
a certificate version;
certificate serial number length;
a signature algorithm;
signature public key length;
the number of certificate issuer information items;
the number of certificate subject information items;
certificate validity start time and certificate expiration time;
certificate expiration days;
whether the certificate is a self-signed certificate;
the domain name feature subset comprises any one or more of the following domain name features:
domain name length;
ranking of the domain name;
subdomain name length;
main domain name length;
the number of digits in the domain name;
the number of special characters in the domain name;
whether the domain name contains wildcards.
2. The method according to claim 1, wherein selecting features that meet a preset condition from the features included in the flow feature subset, the handshake feature subset, the certificate feature subset and the domain name feature subset, respectively, to form a final encrypted traffic feature set comprises:
selecting, from the features included in each of the flow feature subset, the handshake feature subset, the certificate feature subset and the domain name feature subset, the features that meet the preset condition to form the final encrypted traffic feature set, using a comprehensive feature engineering method that combines basic statistical analysis, visual analysis and domain knowledge analysis, wherein the features that meet the preset condition are features whose degree of discrimination between malicious encrypted traffic and normal encrypted traffic exceeds a preset threshold.
3. An apparatus for generating an encrypted traffic feature set, comprising:
a sample data determining unit, configured to determine malicious encrypted traffic sample data and normal encrypted traffic sample data;
an initial encrypted traffic feature set obtaining unit, configured to perform deep protocol analysis on the malicious encrypted traffic sample data and the normal encrypted traffic sample data to obtain an initial encrypted traffic feature set, wherein the initial encrypted traffic feature set comprises a flow feature subset, a handshake feature subset, a certificate feature subset and a domain name feature subset; the flow features are protocol-independent features related only to the size and arrival time of network data packets, the handshake features are features extracted in the handshake negotiation stage of the encryption protocol, the certificate features are features of the certificates used in the Secure Sockets Layer protocol process, and the domain name features are features of the server name indication information in the Secure Sockets Layer session;
a final encrypted traffic feature set obtaining unit, configured to select features that meet a preset condition from the features included in the flow feature subset, the handshake feature subset, the certificate feature subset and the domain name feature subset, respectively, to form a final encrypted traffic feature set;
wherein the initial encrypted traffic feature set obtaining unit performing deep protocol analysis on the malicious encrypted traffic sample data and the normal encrypted traffic sample data comprises:
performing deep protocol analysis on the key negotiation session process of the malicious encrypted traffic sample data and on the key negotiation session process of the normal encrypted traffic sample data;
the flow feature subset comprises any one or more of the following flow features:
maximum, minimum, mean and standard deviation of unidirectional and bidirectional flow durations;
maximum, minimum, mean and standard deviation of unidirectional and bidirectional flow packet sizes;
byte distribution and byte distribution mean of unidirectional and bidirectional flow packets;
maximum, minimum, mean and standard deviation of unidirectional and bidirectional flow packet inter-arrival times;
packet size transition probabilities of unidirectional and bidirectional flows;
packet inter-arrival time transition probabilities of unidirectional and bidirectional flows;
the handshake feature subset comprises any one or more of the following handshake features:
Secure Sockets Layer (SSL) protocol version;
the encryption algorithm selected by the protocol;
the public key length selected by the protocol;
number of protocol extensions;
protocol extension length;
the cipher suites offered by the client;
the cipher suite selected by the server;
the certificate feature subset comprises any one or more of the following certificate features:
a certificate version;
certificate serial number length;
a signature algorithm;
signature public key length;
the number of certificate issuer information items;
the number of certificate subject information items;
certificate validity start time and certificate expiration time;
certificate expiration days;
whether the certificate is a self-signed certificate;
the domain name feature subset comprises any one or more of the following domain name features:
domain name length;
ranking of the domain name;
subdomain name length;
main domain name length;
the number of digits in the domain name;
the number of special characters in the domain name;
whether the domain name contains wildcards.
4. A storage medium having stored thereon a program which when executed by a processor implements the method of generating an encrypted traffic feature set according to any one of claims 1 to 2.
5. An electronic device comprising a memory for storing a program and a processor for running the program, wherein the program when run performs the method of generating an encrypted traffic feature set according to any one of claims 1 to 2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910555508.0A CN112134829B (en) | 2019-06-25 | 2019-06-25 | Method and device for generating encrypted traffic feature set |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112134829A (en) | 2020-12-25 |
CN112134829B (en) | 2023-06-30 |
Family
ID=73849403
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910555508.0A Active CN112134829B (en) | 2019-06-25 | 2019-06-25 | Method and device for generating encrypted traffic feature set |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112134829B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115529147A (en) * | 2021-06-25 | 2022-12-27 | 安碁资讯股份有限公司 | Data leakage detection method and device |
CN114553605A (en) * | 2022-04-26 | 2022-05-27 | 中国矿业大学(北京) | Encrypted malicious flow detection method for voting strategy |
CN115941361B (en) * | 2023-02-16 | 2023-05-09 | 科来网络技术股份有限公司 | Malicious traffic identification method, device and equipment |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109379377A (en) * | 2018-11-30 | 2019-02-22 | 极客信安(北京)科技有限公司 | Encrypt malicious traffic stream detection method, device, electronic equipment and storage medium |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
IL248306B (en) * | 2016-10-10 | 2019-12-31 | Verint Systems Ltd | System and method for generating data sets for learning to identify user actions |
CN109495513B (en) * | 2018-12-29 | 2021-06-01 | 极客信安(北京)科技有限公司 | Unsupervised encrypted malicious traffic detection method, unsupervised encrypted malicious traffic detection device, unsupervised encrypted malicious traffic detection equipment and unsupervised encrypted malicious traffic detection medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||