CN110598774B

CN110598774B - Encrypted flow detection method and device, computer readable storage medium and electronic equipment

Info

Publication number: CN110598774B
Application number: CN201910827194.5A
Authority: CN
Inventors: 罗赟骞; 邬江; 戴方岳
Original assignee: China Power Great Wall Internetworking Safety Technology Research Institute Beijing Co ltd
Current assignee: China Power Great Wall Internetworking Safety Technology Research Institute Beijing Co ltd
Priority date: 2019-09-03
Filing date: 2019-09-03
Publication date: 2023-04-07
Anticipated expiration: 2039-09-03
Also published as: CN110598774A

Abstract

The invention provides an encrypted traffic detection method and device, a computer readable storage medium and electronic equipment. The method comprises the following steps: extracting features of network sessions from a target file to serve as training samples, and constructing a training sample set, wherein data in the training samples comprise data of at least two data types; setting the data type of predetermined training samples to a data type that a predetermined algorithm can identify, and obtaining a preprocessed training sample set, wherein the predetermined training samples comprise features of network sessions extracted from the target file whose data type before extraction was one that the predetermined algorithm can identify, and the predetermined algorithm can identify features of at least two data types; constructing an encrypted traffic detection model by adopting the predetermined algorithm; and detecting the object to be detected by using the constructed encrypted traffic detection model. The device is used for executing the encrypted traffic detection method. The invention constructs more comprehensive detection features, saves computing resources and improves detection accuracy.

Description

Encrypted flow detection method and device, computer readable storage medium and electronic equipment

Technical Field

The present invention relates to the field of network security, and in particular, to an encrypted traffic detection method, an encrypted traffic detection apparatus for performing the encrypted traffic detection method, a computer-readable storage medium, and an electronic device.

Background

With the rapid development of the Internet of Things, big data, cloud computing and high-speed mobile communication networks, information confidentiality has become increasingly important, security protocols that protect network communication are widely applied, and more and more internet traffic is encrypted. Encryption secures the communication of internet users and prevents information from being intercepted and read by a third party, but it also causes traditional security detection mechanisms to fail.

The wide application of artificial intelligence technology provides an important means of discovering malicious traffic attacks. At present, research on malicious encrypted traffic detection is mainly divided into session-based, session-statistics-based and certificate-based detection. Session-based detection extracts features from network flows and adopts methods such as random forest; session-statistics-based detection extracts statistical features from network flow statistics and adopts methods such as eXtreme Gradient Boosting (XGBoost) and Light Gradient Boosting Machine (LightGBM); certificate-based detection extracts features from certificates and constructs a detection model using methods such as the Support Vector Machine (SVM).

However, the existing detection model has incomplete features, occupies a large memory space, and has yet to be further improved in detection accuracy.

Disclosure of Invention

To solve at least one aspect of the above problems of the prior art, it is an object of the present invention to provide an encrypted traffic detection method, an encrypted traffic detection apparatus that performs the encrypted traffic detection method, a computer-readable storage medium, and an electronic device. The method aims to reduce the memory space occupied by the encryption flow detection model and further improve the accuracy of encryption flow detection.

To achieve the above object, as a first aspect of the present invention, there is provided an encrypted traffic detection method including:

extracting features of network sessions from a target file to serve as training samples, and constructing a training sample set, wherein data in the training samples comprise data of at least two data types;

preprocessing training samples in the training sample set to set the data types of preset training samples as the data types which can be identified by a preset algorithm, and obtaining the preprocessed training sample set, wherein the preset training samples comprise the features of network sessions, which are extracted from a target file and have the data types which can be identified by the preset algorithm, and the preset algorithm can identify the features of at least two data types;

constructing an encrypted flow detection model by using the pre-processed training sample set and adopting the predetermined algorithm;

and detecting the object to be detected by using the constructed encrypted flow detection model.

Optionally, the data in the training samples comprises numerical data and classification data, the predetermined algorithm being capable of identifying and processing the numerical data and the classification data.

Optionally, the predetermined algorithm comprises a LightGBM algorithm or a Catboost algorithm.

Optionally, the target file includes a static packet file and/or a real-time network traffic file.

Optionally, the characteristics of the network session include at least one of session connection characteristics, TLS/SSL session characteristics, X509 certificate characteristics, and DNS characteristics.

Optionally, a TLS/SSL session of the network session includes TLS/SSL handshake and certificate information.

Optionally, constructing the encrypted traffic detection model includes:

searching the optimal hyper-parameter of the preset algorithm by utilizing the preprocessed training sample set;

and training with the predetermined algorithm, using the optimal hyper-parameter and the preprocessed training sample set, to obtain the encrypted traffic detection model.

Optionally, the detecting the object to be detected by using the constructed encrypted traffic detection model includes:

extracting the characteristics of an object to be detected;

preprocessing the extracted features of the object to be detected, so that any extracted feature whose data type before extraction was a data type that the predetermined algorithm can identify is set to that data type;

inputting the preprocessed extracted characteristics of the object to be detected into the encrypted flow detection model for identification.

As a second aspect of the present invention, there is provided an encrypted traffic detection device including:

a feature extraction module, used for extracting features of network sessions from a target file to serve as training samples and constructing a training sample set, wherein data in the training samples comprise data of at least two data types;

the characteristic data processing module is used for preprocessing the training samples in the training sample set so as to set the data types of the preset training samples as the data types which can be identified by a preset algorithm and obtain the preprocessed training sample set, wherein the preset training samples comprise the characteristics of network sessions which are extracted from a target file and have the data types which can be identified by the preset algorithm, and the preset algorithm can identify the characteristics of at least two data types;

the model construction module is used for constructing an encrypted flow detection model by using the pre-processed training sample set and adopting the predetermined algorithm;

and the encrypted flow detection module is used for detecting the object to be detected by using the constructed encrypted flow detection model.

Optionally, the characteristics of the network session comprise at least one of session connection characteristics, TLS/SSL session characteristics, X509 certificate characteristics, and DNS characteristics.

Optionally, the model building module comprises:

the optimal hyper-parameter selection module is used for searching the optimal hyper-parameter of the preset algorithm by utilizing the preprocessed training sample set;

and a model training module, used for training with the predetermined algorithm, using the optimal hyper-parameter and the preprocessed training sample set, to obtain the encrypted traffic detection model.

Optionally, the feature extraction module is further configured to extract features of the object to be detected.

The feature data processing module is further used for preprocessing the extracted features of the object to be detected, so that any extracted feature whose data type before extraction was a data type that the predetermined algorithm can identify is set to that data type.

And the encrypted flow detection module is also used for inputting the preprocessed extracted characteristics of the object to be detected into the encrypted flow detection model for identification.

As a third aspect of the present invention, there is provided a computer-readable storage medium for storing an executable program capable of executing the above-described encrypted traffic detection method of the present invention.

As a fourth aspect of the present invention, there is provided an electronic apparatus comprising:

one or more processors;

a storage device, on which one or more programs are stored, which, when executed by the one or more processors, cause the one or more processors to implement the encrypted traffic detection method of the present invention described above.

In view of the characteristics of malicious encrypted traffic, the invention constructs the encrypted traffic detection model with an algorithm that can directly identify and process both numerical and non-numerical data, so that non-numerical data need not be converted into numerical data; this reduces the storage space occupied by the model and improves detection accuracy. Meanwhile, non-numerical feature data are extracted to construct more complete detection features that describe malicious encrypted traffic more comprehensively, which further improves detection accuracy.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:

FIG. 1 is a flow chart of a method of detecting encrypted traffic;

FIG. 2 is a flow chart of the construction of an encrypted traffic detection model using the predetermined algorithm;

FIG. 3 is a flow chart of detecting an object to be detected by using the constructed encryption traffic detection model;

FIG. 4 is a block diagram of the encrypted traffic detection apparatus.

Detailed Description

The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present invention, are given by way of illustration and explanation only, not limitation.

As a first aspect of the present invention, there is provided an encrypted traffic detection method. Fig. 1 is a flow chart of a method of detecting encrypted traffic. As shown in fig. 1, the encrypted traffic detection method according to this embodiment includes:

In step S110, features of network sessions are extracted from the target file as training samples, and a training sample set is constructed, where data in the training samples includes data of at least two data types.

In step S120, the training samples in the training sample set are preprocessed to set the data type of predetermined training samples to a data type that a predetermined algorithm can identify, and a preprocessed training sample set is obtained, where the predetermined training samples include features of network sessions extracted from the target file whose data type before extraction was one that the predetermined algorithm can identify, and the predetermined algorithm is capable of identifying features of at least two data types.

In step S130, the preprocessed training sample set is used to construct an encrypted traffic detection model by using the predetermined algorithm.

In step S140, the constructed encrypted traffic detection model is used to detect the object to be detected.

The inventors have found that existing models can only identify and process numerical data. Consequently, either only numerical features are extracted from encrypted traffic, in which case malicious encrypted traffic cannot be fully described and detection is not accurate enough; or non-numerical feature data are extracted and then converted into numerical data, which occupies a large amount of memory, slows detection, and further limits detection accuracy.

In view of the above, in order to overcome the problem that the existing models can only recognize and process numerical data, and in order to process non-numerical data, the invention adopts an algorithm capable of directly recognizing and processing data of at least two data types, so that malicious encrypted traffic can be described more comprehensively, and waste of memory resources caused by converting non-numerical characteristic data into numerical data can be avoided, thereby effectively improving the accuracy of encrypted traffic detection.

In addition, research finds that session connection features represent how malicious encrypted traffic manifests in connection traffic; Transport Layer Security (TLS)/Secure Sockets Layer (SSL) session features and X509 certificate features represent how malicious traffic manifests in its encryption attributes; and Domain Name System (DNS) features indicate whether the domain name used in a session is problematic, for example whether it may be a domain name generated by a Domain Generation Algorithm (DGA). These features include non-numerical features, which describe the distinctive behavior of malicious encrypted traffic on specific attributes and play an important role in describing it comprehensively. When the features of the network session are extracted, feature data of at least two data types are extracted together as training samples to construct relatively complete detection features, so that the accuracy of encrypted traffic detection can be further improved.

It should be noted that at least two types of feature data are extracted in the present invention. Depending on how the system processes the data, the data type of extracted feature data that are not numerical may be changed by the system. So that the predetermined algorithm can identify these feature data, step S120 sets their data type back to the data type they had before extraction.

As described above, among the features of the network session, non-numerical features play an important role in fully describing malicious encrypted traffic, and the non-numerical feature data is mainly classified data.

Existing encrypted traffic detection models cannot directly identify and process classification (categorical) features; they can handle such features only after one-hot encoding, which makes the data sparse. If a feature has too many categories, the one-hot encoded data become excessively sparse, which greatly increases the size of the training set and wastes computing resources. To avoid this waste, the invention adopts an algorithm that can directly identify and process both classification features and numerical features. Because such an algorithm is used, numerical features and classification features can be selected together as training samples, so that malicious encrypted traffic is described comprehensively and detection accuracy is improved.
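The growth in training set size described above can be illustrated with a minimal sketch. The feature values below are hypothetical examples of a classification feature, and the helper names `one_hot` and `category_codes` are illustrative only; the sketch simply counts how many cells each representation needs.

```python
def one_hot(values):
    """Encode values as rows with one 0/1 column per distinct category."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows


def category_codes(values):
    """Encode values as a single column of integer category codes."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    return categories, [index[v] for v in values]


# Hypothetical values of one classification feature (assumed for illustration).
ciphers = [
    "TLS_AES_128_GCM_SHA256",
    "TLS_AES_256_GCM_SHA384",
    "TLS_CHACHA20_POLY1305_SHA256",
    "TLS_AES_128_GCM_SHA256",
]

_, dense = one_hot(ciphers)
_, codes = category_codes(ciphers)

one_hot_cells = sum(len(row) for row in dense)  # samples x distinct categories
code_cells = len(codes)                         # samples x 1
```

With k distinct categories, one-hot encoding multiplies the width of that feature by k, whereas a single categorical column keeps it at one cell per sample.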

The LightGBM algorithm and the Catboost algorithm can directly identify and process the classification features, so that the encrypted traffic detection model can be constructed by using the algorithms.

The LightGBM algorithm is a novel Gradient Boosting Decision Tree (GBDT) algorithm that is currently widely applied in fields such as classification, regression and training. It mainly has the following advantages: 1. it uses Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB), which meet the requirements of efficiency and scalability for high-dimensional, large-scale data; 2. it uses a histogram-based algorithm to accelerate training and reduce memory consumption; 3. it adopts a leaf-wise tree growth strategy, which improves the generalization performance of the algorithm; 4. it can process classification features directly, avoiding the excessive sparsity and wasted computing resources caused by one-hot encoding.

The Catboost algorithm is a Boosting ensemble learning algorithm that mainly addresses learning from classification features and can directly process and learn string-typed classification features. It mainly has the following advantages: 1. it supports Graphics Processing Units (GPUs), making computation more efficient; 2. it provides visualization of the training process; 3. it supports modeling in multiple languages such as Python and R.

The inventors' experiments show that the encrypted traffic detection model constructed with the Catboost algorithm differs by about 0.05% from the model constructed with the LightGBM algorithm in accuracy, F1 value (F-measure), recall and Area Under the Curve (AUC).
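For reference, three of the indexes named above can be computed from predicted and true labels as in the following minimal sketch. It assumes the labeling used later in this embodiment (1 for malicious, 0 for normal); the function name and the sample labels are hypothetical, and AUC is omitted because it requires probability scores rather than hard labels.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = malicious)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 3 malicious and 5 normal sessions.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```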

Based on this difference, the LightGBM algorithm is selected to construct the encrypted traffic detection model in this embodiment. The LightGBM algorithm can directly identify and process classification features of the "category" type. However, depending on how the system processes the data, an extracted network session feature that was originally of the "category" type may become a character type or an "object" type. So that the LightGBM algorithm can identify such feature data, the data type of each extracted feature that was originally of the "category" type needs to be set to "category".
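A minimal sketch of this type-setting step follows, assuming the extracted features are held in a pandas DataFrame (the text does not name the library; the "object" and "category" type names suggest it). The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical extracted features; column names are illustrative assumptions.
df = pd.DataFrame({
    "duration": [0.42, 3.10, 0.07],                 # numerical feature
    "tls_version": ["TLSv12", "TLSv13", "TLSv12"],  # classification feature
})

# Text columns usually arrive with a generic string/object dtype after loading,
# so the classification columns are set back to "category" explicitly.
categorical_columns = ["tls_version"]
for col in categorical_columns:
    df[col] = df[col].astype("category")

dtype_after = str(df["tls_version"].dtype)
categories = list(df["tls_version"].cat.categories)
```

To the best of our knowledge, the LightGBM Python package treats pandas columns of the "category" type as categorical features by default, which is why restoring the type is sufficient and no one-hot encoding is needed.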

The inventor of the invention finds that the existing encrypted traffic detection model based on session statistics cannot detect malicious encrypted traffic in real time. In the invention, the characteristic data of the network session can be extracted from the PCAP packet, the real-time network interface or other network flow files, thereby realizing the real-time detection of the encrypted flow.

In the present embodiment, the feature data of the network session is extracted from the static PCAP packet and/or the real-time network traffic, and further, the feature data of the network session required by the present invention may be extracted using the open source software Zeek.
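As a rough illustration of how features might be collected from Zeek output, the sketch below parses Zeek's default tab-separated log format (a `#fields` header line followed by data rows) and joins connection records with TLS/SSL records on the shared `uid` field. The sample logs are heavily trimmed, hypothetical stand-ins for real `conn.log` and `ssl.log` files, and the helper names are illustrative only.

```python
def parse_zeek_tsv(text):
    """Parse Zeek's default TSV log text into a list of dicts keyed by field name."""
    fields, records = [], []
    for line in text.splitlines():
        if line.startswith("#fields"):
            fields = line.split("\t")[1:]
        elif line and not line.startswith("#"):
            records.append(dict(zip(fields, line.split("\t"))))
    return records


def join_by_uid(conn_records, ssl_records):
    """Attach TLS/SSL fields to the connection record with the same uid."""
    ssl_by_uid = {r["uid"]: r for r in ssl_records}
    sessions = []
    for conn in conn_records:
        ssl = ssl_by_uid.get(conn["uid"])
        if ssl is None:
            continue  # connection without a TLS/SSL record
        merged = dict(conn)
        merged.update({"ssl." + k: v for k, v in ssl.items() if k != "uid"})
        sessions.append(merged)
    return sessions


def make_log(fields, rows):
    """Build a tiny log in Zeek's TSV layout (sample data only)."""
    lines = ["\t".join(["#fields"] + fields)]
    lines += ["\t".join(row) for row in rows]
    lines.append("#close\t2019-09-03-00-00-00")
    return "\n".join(lines)


conn_text = make_log(["ts", "uid", "duration"],
                     [["1567468800.0", "C1", "1.5"], ["1567468801.0", "C2", "-"]])
ssl_text = make_log(["ts", "uid", "version", "established", "resumed"],
                    [["1567468800.1", "C1", "TLSv12", "T", "F"]])

sessions = join_by_uid(parse_zeek_tsv(conn_text), parse_zeek_tsv(ssl_text))
```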

As mentioned above, session connection features represent how malicious encrypted traffic manifests in connection traffic; TLS/SSL session features and X509 certificate features represent how malicious traffic manifests in its encryption attributes; and DNS features indicate whether the domain name used in the session is problematic, for example whether it may be a DGA domain name. To fully describe malicious encrypted traffic, the features used to construct the encrypted traffic detection model can be selected according to how malicious traffic manifests in these different attributes. HTTP features could also be used, but the inventors believe that HTTP will fall out of use in the future, so HTTP features are not included in this embodiment.

As an embodiment of the present invention, the feature of the network session may be selected as follows to construct the encrypted traffic detection feature:

A total of 62 features related to building a malicious encrypted traffic detection model are extracted from the network session, comprising session connection features, TLS/SSL session features, X509 certificate features and DNS features. The extracted features include numerical features and "category" type features, specifically as follows:

session connection characteristics refer to communication session characteristics associated with encrypted traffic communications. In the present embodiment, 5 features such as "session duration" are selected, as shown in table 1.

TABLE 1

TLS/SSL session characteristics refer to TLS/SSL handshake characteristic data generated in the process of carrying out encryption communication by using TLS/SSL protocol. The present embodiment selects 11 of the features, as shown in table 2.

TABLE 2

X509 certificate features refer to the certificate data transmitted by the server during encrypted communication using the TLS/SSL protocol. The present embodiment selects 33 of these features, as shown in Table 3.

TABLE 3

The DNS feature refers to the feature contained in the DNS requested before the session starts, and the DNS feature is selected mainly in consideration of the fact that the DNS domain name used by some malicious encrypted traffic is greatly different from a common normal domain name. 13 of these features were selected in this embodiment as shown in table 4.

TABLE 4

The present embodiment is directed to network sessions in which a TLS/SSL session is established for the first time and has been successfully established. The session information of such a session contains important features such as the TLS/SSL handshake and the certificate, whereas a TLS/SSL session resumed from previous session information does not contain this information. Therefore, in order to extract effective detection features, the network session must contain a TLS/SSL session that includes the handshake and certificate, that is, one that is established for the first time and has been successfully established.
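This selection rule can be sketched as a simple filter. The example assumes records shaped like rows of Zeek's `ssl.log`, whose `established` and `resumed` fields are written as `T`/`F` in the default text format; the function name and the sample records are hypothetical.

```python
def is_first_full_handshake(ssl_record):
    """True when the TLS/SSL session was established and was not resumed."""
    return (ssl_record.get("established") == "T"
            and ssl_record.get("resumed") == "F")


# Hypothetical records for illustration.
records = [
    {"uid": "C1", "established": "T", "resumed": "F"},  # full handshake: keep
    {"uid": "C2", "established": "T", "resumed": "T"},  # resumed: no certificate data
    {"uid": "C3", "established": "F", "resumed": "F"},  # handshake did not complete
]

usable = [r["uid"] for r in records if is_first_full_handshake(r)]
```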

Optionally, in order to use the extracted features of the network session for model training to obtain the encrypted traffic detection model, constructing a training sample set further includes: labeling each training sample as "malicious" or "normal" according to the nature of the network session, and constructing a training sample set of pairs (x_i, y_i), where x_i represents the feature data of the i-th sample and y_i represents the corresponding label data. In the present embodiment, 1 denotes malicious and 0 denotes normal; a custom labeling may also be used.

Optionally, as an error-proofing process, in this embodiment, the preprocessing the training samples in the training sample set may include: the feature number of the training sample is checked, and if the training sample does not meet the specified feature number (in the present embodiment, the specified feature number is 62, wherein, the session connection feature is 5, the TLS/SSL session feature is 11, the X509 certificate feature is 33, and the DNS feature is 13), the training sample is discarded as a problem sample.
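The feature-count check can be sketched as follows. The per-group counts come from this embodiment; representing each sample as a flat list of feature values, and the names used below, are assumptions made for illustration.

```python
# Per-group feature counts stated in this embodiment.
EXPECTED_COUNTS = {"session_connection": 5, "tls_ssl": 11, "x509": 33, "dns": 13}
EXPECTED_TOTAL = sum(EXPECTED_COUNTS.values())


def drop_problem_samples(samples):
    """Keep samples with exactly the expected number of features; count the rest."""
    kept = [s for s in samples if len(s) == EXPECTED_TOTAL]
    return kept, len(samples) - len(kept)


# Hypothetical samples: the second one is missing a feature.
samples = [[0.0] * 62, [0.0] * 61, [0.0] * 62]
kept, discarded = drop_problem_samples(samples)
```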

Optionally, fig. 2 is a flowchart for constructing an encrypted traffic detection model by using the predetermined algorithm. As shown in fig. 2, the constructing the encrypted traffic detection model by using the predetermined algorithm includes:

in step S131, the training sample set after the preprocessing is used to find the optimal hyper-parameter of the predetermined algorithm.

In general, the hyper-parameters have an important influence on the prediction accuracy. The hyper-parameters in the LightGBM algorithm determine the accuracy of the model, the speed of building the model and whether the model is over-fitted, so the number and the variation range of the hyper-parameters need to be determined, and the optimal hyper-parameters of the model are further obtained to build the optimal encrypted traffic detection model. In this embodiment, the parameters that the LightGBM algorithm needs to optimize are shown in table 5.

TABLE 5

Parameter name	Description
num_leaves	Number of leaves per tree; determines model accuracy
learning_rate	Controls the iteration speed; determines model accuracy
max_depth	Maximum depth of a tree; determines whether the model overfits
min_data_in_leaf	Minimum number of records a leaf may contain; determines whether the model overfits
feature_fraction	Proportion of features randomly selected in each tree-building iteration; determines model construction speed
bagging_fraction	Proportion of data used in each iteration; typically used to speed up training and avoid overfitting
max_bin	Maximum number of bins into which feature values are bucketed; determines model construction speed
bagging_freq	Frequency of bagging; determines whether the model overfits
n_estimators	Number of boosting iterations; determines model accuracy

Optionally, in this embodiment, all training samples in the training sample set that is preprocessed in step S120 are used to find the optimal hyper-parameter of the encrypted traffic detection model.

Optionally, in this embodiment, any one of a grid search method, a random search method, or a heuristic method is used to find the optimal hyper-parameter of the model; and when the optimal hyper-parameter is searched, an N-fold cross validation method is adopted.

The grid search method is an exhaustive search method for the designated parameter values, namely, the possible values of each parameter are arranged and combined, all the possible combination results are listed to generate a grid, and the parameters of the estimation function are optimized by a cross validation method to obtain the optimal hyper-parameters.

The random search method does not exhaust all parameter values, but extracts a fixed number of parameter values according to a specified distribution to find the optimal hyper-parameter.

Heuristic methods usually use optimization algorithms such as particle swarm optimization and differential evolution to find the optimal hyper-parameter.

The inventors have found that, in theory, grid search is the least efficient, random search is the next least efficient, and heuristic methods are the most efficient; in terms of implementation, grid search and random search are simpler, while heuristic methods are more complex.

The basic idea of cross validation is to partition the original data into groups, using one part as the training set and the other part as the validation set. The classifier is first trained on the training set, and the trained model is then tested on the validation set; the result serves as the performance index for evaluating the classifier. The purpose of cross validation is to obtain a reliable and stable model.
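The difference between grid search and random search can be sketched as below. The parameter names are taken from Table 5, but the candidate values are assumptions, and `toy_score` is a made-up stand-in for a cross-validated model score, not a real evaluation.

```python
import itertools
import random

# Hypothetical search space: names from Table 5, values assumed for illustration.
space = {
    "num_leaves": [15, 31, 63],
    "learning_rate": [0.01, 0.05, 0.1],
    "max_depth": [4, 6, 8],
}


def toy_score(params):
    """Stand-in objective (higher is better); NOT a real model evaluation."""
    return -(abs(params["num_leaves"] - 31) / 31
             + abs(params["learning_rate"] - 0.05)
             + abs(params["max_depth"] - 6) / 6)


def grid_search(space, score):
    """Evaluate every combination of candidate values."""
    names = list(space)
    candidates = [dict(zip(names, combo))
                  for combo in itertools.product(*space.values())]
    return max(candidates, key=score), len(candidates)


def random_search(space, score, n_iter, seed=0):
    """Evaluate a fixed number of randomly drawn combinations."""
    rng = random.Random(seed)
    candidates = [{k: rng.choice(v) for k, v in space.items()}
                  for _ in range(n_iter)]
    return max(candidates, key=score), len(candidates)


best_grid, n_grid = grid_search(space, toy_score)
best_rand, n_rand = random_search(space, toy_score, n_iter=10)
```

Grid search evaluates all 27 combinations here, while random search evaluates only the 10 it draws, at the cost of possibly missing the best combination.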
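A minimal sketch of how N-fold cross validation partitions the samples is given below. The function name is hypothetical, and the sketch splits indices in order without shuffling or stratification, which a practical implementation would likely add.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, validation_indices); each sample is validated once."""
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)  # spread the remainder evenly
        valid = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, valid
        start += size


folds = list(k_fold_indices(n_samples=10, n_folds=3))
all_validated = sorted(i for _, valid in folds for i in valid)
```

The model is trained N times, once per fold, and the N validation results are combined into one performance estimate.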

In step S132, the optimal hyper-parameter is adopted, the preprocessed training sample set is used, and the predetermined algorithm is used for training, so as to obtain the encrypted traffic detection model.

In this embodiment, the optimal hyper-parameter obtained in step S131 and all training samples in the training sample set preprocessed in step S120 are used to train with the LightGBM algorithm, and the detection model is obtained.

Optionally, fig. 3 is a flowchart for detecting an object to be detected by using the constructed encrypted traffic detection model. As shown in fig. 3, the detecting the object to be detected by using the constructed encrypted traffic detection model includes:

In step S141, the features of the object to be detected are extracted.

Optionally, the object to be detected may be a static PCAP data packet file or a dynamic real-time network traffic file.

Optionally, in this embodiment, the extracted features of the object to be detected comprise a total of 62 features of the network session to be detected: session connection features (as shown in Table 1), TLS/SSL session features (as shown in Table 2), X509 certificate features (as shown in Table 3), and DNS features (as shown in Table 4).

In step S142, the extracted features of the object to be detected are preprocessed, so that any feature whose data type before extraction was a data type that the predetermined algorithm can identify is set to that data type.

Optionally, in this embodiment, the LightGBM algorithm can directly identify and process features of the "category" type. However, depending on how the system processes the data, an extracted feature of the object to be detected that was originally of the "category" type may become a character type or an "object" type. So that the LightGBM algorithm can identify such feature data, the data type of each extracted feature that was originally of the "category" type needs to be set to "category".

In step S143, the preprocessed features of the object to be detected are input into the encrypted traffic detection model for identification.

Optionally, in this embodiment, the inputting the obtained feature of the object to be detected into the encrypted traffic detection model for identification further includes: and obtaining the abnormal probability value p of the object to be detected by the encryption detection model, comparing the abnormal probability value p with a set threshold value epsilon, if p is larger than epsilon, judging that the object to be detected is malicious flow, and otherwise, judging that the object to be detected is normal flow.

Because the false alarm rate of the algorithm can generate a plurality of false positives, safety analysis personnel can not obtain effective alarm, and the result of the algorithm loses significance. Therefore, a method for dynamically setting the threshold epsilon can be adopted, and a proper threshold is set by combining the size of the false alarm rate generated by the algorithm, so that the false alarm rate of the algorithm is reduced, and the accuracy of encrypted flow detection is improved.

Optionally, in this embodiment, the threshold is selected during training so that the false positive rate obtained by N-fold cross-validation is one in ten thousand.
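One possible way to derive such a threshold is sketched below. It assumes that the anomaly probabilities the model assigned to normal (benign) samples in the held-out folds of the N-fold cross-validation have been pooled into one list; the function name `select_threshold` and the parameter `one_in` are hypothetical, and the embodiment does not prescribe this particular procedure.

```python
def select_threshold(benign_scores: list[float], one_in: int = 10_000) -> float:
    """Pick epsilon so that at most 1 in `one_in` benign scores satisfies p > epsilon."""
    if not benign_scores:
        raise ValueError("benign_scores must not be empty")
    if one_in < 2:
        raise ValueError("one_in must be at least 2")
    # Number of benign samples that may be flagged without exceeding the target rate.
    allowed_false_positives = len(benign_scores) // one_in
    ranked = sorted(benign_scores, reverse=True)
    # At most `allowed_false_positives` scores are strictly greater than this value.
    return ranked[allowed_false_positives]
```

With fewer than `one_in` benign samples the sketch returns the highest benign score, so that no benign sample from the cross-validation is flagged.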

As a second aspect of the present invention, an encrypted traffic detection apparatus is provided, and Fig. 4 is a block diagram of the encrypted traffic detection apparatus. As shown in Fig. 4, the apparatus includes a feature extraction module 110, a feature data processing module 120, an encrypted traffic detection model building module 130, and an encrypted traffic detection module 140.

The feature extraction module 110 is configured to perform step S110. Specifically, the feature extraction module 110 is configured to extract features of network sessions from the target file as training samples and to construct a training sample set, where the data in the training samples includes data of at least two data types.

The feature data processing module 120 is configured to perform step S120. Specifically, the feature data processing module 120 is configured to preprocess the training samples in the training sample set so as to set the data type of each predetermined training sample to a data type that the predetermined algorithm can recognize, thereby obtaining a preprocessed training sample set. The predetermined training samples include features of network sessions, extracted from the target file, whose data type before extraction was a data type that the predetermined algorithm can recognize; the predetermined algorithm can recognize features of at least two data types.

The encrypted traffic detection model building module 130 is configured to execute step S130. Specifically, the encrypted traffic detection model building module 130 is configured to build the encrypted traffic detection model from the preprocessed training sample set by using the predetermined algorithm.

The encrypted traffic detection module 140 is configured to execute step S140. Specifically, the encrypted traffic detection module 140 is configured to detect the object to be detected by using the constructed encrypted traffic detection model.

Optionally, the encrypted traffic detection model building module 130 includes an optimal hyper-parameter selection module 150 and a model training module 160.

The optimal hyper-parameter selection module 150 is configured to execute step S131. Specifically, the optimal hyper-parameter selection module 150 is configured to find the optimal hyper-parameters of the predetermined algorithm by using the preprocessed training sample set.

The model training module 160 is configured to execute step S132. Specifically, the model training module 160 is configured to train the predetermined algorithm with the optimal hyper-parameters on the preprocessed training sample set, so as to obtain the encrypted traffic detection model.
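The division of work between modules 150 and 160 can be illustrated with an exhaustive grid search, which is one common way of selecting hyper-parameters; this passage does not state which search strategy is used, so the choice is an assumption. The names `find_best_hyperparameters`, `cv_score` and `toy_cv_score` are hypothetical, and the toy scoring function merely stands in for a real cross-validated evaluation of the predetermined algorithm.

```python
from itertools import product
from typing import Any, Callable, Mapping, Sequence


def find_best_hyperparameters(
    grid: Mapping[str, Sequence[Any]],
    cv_score: Callable[[dict], float],
) -> dict:
    """Return the grid point with the highest cross-validated score (step S131)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cv_score(params)
        if score > best_score:
            best_params, best_score = params, score
    if best_params is None:
        raise ValueError("grid must contain at least one candidate")
    return best_params


# Toy stand-in: a real cv_score would train the predetermined algorithm on
# N-1 folds of the preprocessed training set and evaluate it on the held-out fold.
def toy_cv_score(params: dict) -> float:
    return -abs(params["num_leaves"] - 31) - abs(params["learning_rate"] - 0.1)


best = find_best_hyperparameters(
    {"num_leaves": [15, 31, 63], "learning_rate": [0.05, 0.1]},
    toy_cv_score,
)
```

The selected values would then be passed to the final training run on the full preprocessed training sample set (step S132).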

Optionally, the feature extraction module 110 is further configured to execute step S141, that is, extract features of the object to be tested according to the features of the network session determined during model building.

Correspondingly, the feature data processing module 120 is further configured to execute step S142, that is, to preprocess the extracted features of the object to be detected and to set each extracted feature whose data type before extraction was a data type that the predetermined algorithm can recognize back to that data type.

Correspondingly, the encrypted traffic detection module 140 is further configured to perform step S143, that is, to input the preprocessed features of the object to be detected into the encrypted traffic detection model for identification.
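The way the modules cooperate at detection time (steps S141 to S143) can be sketched as a small pipeline. This is only an illustrative arrangement: the class `DetectionPipeline`, its field names, and the stub callables are hypothetical, and the final comparison follows the optional threshold rule described for step S143.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping

Features = Mapping[str, Any]


@dataclass
class DetectionPipeline:
    extract_features: Callable[[Any], Features]       # role of module 110, step S141
    preprocess: Callable[[Features], Features]        # role of module 120, step S142
    predict_probability: Callable[[Features], float]  # trained model used by module 140
    threshold: float                                   # epsilon from the threshold rule

    def detect(self, session: Any) -> bool:
        """Return True when the session is judged to be malicious traffic (step S143)."""
        features = self.preprocess(self.extract_features(session))
        return self.predict_probability(features) > self.threshold


# Stub callables stand in for real feature extraction and a trained model.
pipeline = DetectionPipeline(
    extract_features=lambda session: {"self_signed": session["self_signed"]},
    preprocess=lambda features: dict(features),
    predict_probability=lambda features: 0.99 if features["self_signed"] else 0.01,
    threshold=0.9,
)
```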

The working principle and the beneficial effects of the encrypted traffic detection method have been described in detail above and are not described again here.

Computer-readable storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer-readable storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage media, or any other medium which can be used to store the desired information and which can be accessed by a computer.

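The electronic device includes: one or more processors; and a storage device having one or more programs stored thereon which, when executed by the one or more processors, cause the one or more processors to implement the encrypted traffic detection method described above.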

It will be understood that the above embodiments are merely exemplary embodiments taken to illustrate the principles of the present invention, which is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.

Claims

1. An encrypted traffic detection method, characterized in that the encrypted traffic detection method comprises:

detecting the object to be detected by using the constructed encrypted flow detection model;

wherein the detecting the object to be detected by using the constructed encrypted flow detection model comprises:

extracting the characteristics of an object to be detected;

inputting the preprocessed extracted characteristics of the object to be detected into the encrypted flow detection model for identification so as to determine whether the object to be detected is malicious flow;

wherein the characteristics of the network session comprise session connection characteristics, TLS/SSL session characteristics, X509 certificate characteristics and DNS characteristics.

2. The encrypted flow detection method according to claim 1, wherein the data in the training samples includes numerical data and classification data, and the predetermined algorithm is capable of recognizing and processing the numerical data and the classification data.

3. The encrypted traffic detection method according to claim 2, wherein the predetermined algorithm includes a LightGBM algorithm or a CatBoost algorithm.

4. The encrypted traffic detection method according to claim 1, wherein the target file includes a static packet file and/or a real-time network traffic file.

5. The encrypted traffic detection method of claim 1, wherein a TLS/SSL session of the network session contains TLS/SSL handshake and certificate information.

6. The encrypted traffic detection method according to any one of claims 1 to 5, wherein constructing the encrypted traffic detection model includes:

7. An encrypted traffic detection device, characterized by comprising:

the encrypted flow detection module is used for detecting an object to be detected by using the constructed encrypted flow detection model;

the characteristic extraction module is also used for extracting the characteristics of the object to be detected;

the characteristic data processing module is further used for preprocessing the extracted characteristic of the object to be detected, and setting the data type of the extracted characteristic of the object to be detected, of which the data type before extraction is the data type which can be identified by a preset algorithm, as the data type which can be identified by the preset algorithm;

the encrypted flow detection module is further used for inputting the preprocessed extracted characteristics of the object to be detected into the encrypted flow detection model for identification so as to determine whether the object to be detected is malicious flow;

wherein the characteristics of the network session include a session connection characteristic, a TLS/SSL session characteristic, an X509 certificate characteristic, and a DNS characteristic.

8. The encrypted traffic detection device of claim 7, wherein the data in the training samples includes numerical data and classification data, and the predetermined algorithm is capable of identifying and processing the numerical data and the classification data.

9. The encrypted traffic detection device of claim 8, wherein the predetermined algorithm comprises a LightGBM algorithm or a CatBoost algorithm.

10. The encrypted traffic detection device of claim 7, wherein the target file comprises a static packet file and/or a real-time network traffic file.

11. The encrypted traffic detection device of claim 7, wherein a TLS/SSL session of the network session contains TLS/SSL handshake and certificate information.

12. The encrypted traffic detection device according to any one of claims 7 to 11, wherein the model construction module includes:

13. A computer-readable storage medium for storing an executable program capable of executing the encrypted traffic detection method according to any one of claims 1 to 6.

14. An electronic device, characterized in that the electronic device comprises:

one or more processors;

a storage device having one or more programs stored thereon, which when executed by the one or more processors, cause the one or more processors to implement the encrypted traffic detection method of any one of claims 1 to 6.