CN113660210A - Malicious TLS encrypted traffic detection model training method, detection method and terminal - Google Patents

Info

Publication number
CN113660210A
Authority
CN
China
Prior art keywords
tls
malicious
model
data
logistic regression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110819680.XA
Other languages
Chinese (zh)
Other versions
CN113660210B (en)
Inventor
安晓宁
潘季明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Topsec Technology Co Ltd
Beijing Topsec Network Security Technology Co Ltd
Beijing Topsec Software Co Ltd
Original Assignee
Beijing Topsec Technology Co Ltd
Beijing Topsec Network Security Technology Co Ltd
Beijing Topsec Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Beijing Topsec Technology Co Ltd, Beijing Topsec Network Security Technology Co Ltd, Beijing Topsec Software Co Ltd
Priority to CN202110819680.XA
Publication of CN113660210A
Application granted
Publication of CN113660210B
Active (current legal status)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/1483 Countermeasures against malicious traffic service impersonation, e.g. phishing, pharming or web spoofing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Abstract

The invention discloses a malicious TLS encrypted traffic detection model training method, a detection method and a terminal. The training method comprises the following steps: obtaining a TLS traffic sample that comprises a plurality of pieces of malicious TLS encrypted traffic data and a plurality of pieces of normal traffic data; extracting, for each piece of traffic data in the TLS traffic sample, its feature values of different categories in the handshake phase; establishing a corresponding logistic regression model for the feature values of each category; training each logistic regression model on its training set and adjusting its hyper-parameters so that the model performs best on the validation set; and taking the trained logistic regression models as the malicious TLS encrypted traffic detection model. With this method, traffic can be judged to be malicious in the detection stage as soon as the feature values of any one category of the traffic data are determined to meet the preset condition. The disclosed method can improve the generalization capability of the detection model and increase its detection speed.

Description

Malicious TLS encrypted traffic detection model training method, detection method and terminal
Technical Field
The invention relates to the field of information security, and in particular to a malicious TLS encrypted traffic detection model training method, a malicious TLS encrypted traffic detection method and a terminal.
Background
At present, machine-learning-based methods for detecting malicious TLS encrypted traffic usually adopt a classification algorithm. Such a method first collects TLS traffic together with the contextual DNS traffic related to it, and aggregates each TLS flow with its contextual DNS traffic according to the four-tuple and the DNS answer address. It then extracts features in four dimensions for each TLS flow (TLS parameter features, features of the certificate issued by the server, flow behavior features and flow context features) as the feature vector of the flow. Finally, it selects features by using a machine learning algorithm to recursively eliminate those with low correlation to the current data set, and trains a machine learning model such as a random forest on the selected features.
The existing scheme must collect the contextual DNS traffic corresponding to the TLS traffic and aggregate the two, so the detection of malicious TLS encrypted traffic is slow.
Disclosure of Invention
The embodiment of the invention provides a malicious TLS encrypted traffic detection model training method, a detection method and a terminal, which are used for improving the generalization capability of a detection model and improving the detection speed.
In a first aspect, an embodiment of the present invention provides a method for training a malicious TLS encrypted traffic detection model, where the method includes: obtaining a TLS traffic sample, wherein the TLS traffic sample comprises a plurality of pieces of malicious TLS encrypted traffic data and a plurality of pieces of normal traffic data; extracting, for each piece of traffic data in the TLS traffic sample, its feature values of different categories in the handshake phase; establishing a corresponding logistic regression model for the feature values of each category; for each logistic regression model: setting a training set and a validation set for the logistic regression model, wherein the training set and the validation set each comprise feature values of traffic data of the corresponding category; adding corresponding feature labels to the feature values of the traffic data in the training set; training the logistic regression model with the labeled feature values; adjusting the hyper-parameters of the logistic regression model so that the model performs best on the validation set; and taking the trained logistic regression models as the malicious TLS encrypted traffic detection model.
In a second aspect, an embodiment of the present invention provides a method for detecting malicious TLS encrypted traffic, including: acquiring unknown TLS encrypted traffic data, and extracting different types of characteristic values of the unknown TLS encrypted traffic data in a handshake stage; and loading the malicious TLS encrypted traffic detection model to respectively detect the different types of characteristic values of the unknown TLS encrypted traffic data in the handshake stage.
In a third aspect, an embodiment of the present invention provides a terminal, where the terminal includes a first processor, a first memory, and a first communication bus; the first communication bus is configured to enable connection communication between the first processor and the first memory; the first processor is configured to execute one or more computer programs stored in the first memory to implement the steps of the aforementioned malicious TLS encrypted traffic detection model training method.
In a fourth aspect, an embodiment of the present invention provides a terminal, where the terminal includes a second processor, a second memory, and a second communication bus; the second communication bus is configured to enable connection communication between the second processor and the second memory; the second processor is configured to execute one or more computer programs stored in the second memory to implement the steps of the aforementioned malicious TLS encrypted traffic detection method.
In a fifth aspect, the present invention provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the foregoing method.
The embodiments of the invention can complete detection by collecting only the data packets of the TLS handshake phase, which effectively improves the generalization capability of the detection model and the detection speed for malicious traffic.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a basic flow diagram of a method according to an embodiment of the present disclosure;
FIG. 2 is a method sub-flow diagram of an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a basic structure of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
It will be understood that various modifications may be made to the embodiments of the present application. Accordingly, the foregoing description should not be construed as limiting, but merely as exemplifications of embodiments. Other modifications will occur to those skilled in the art within the scope and spirit of the disclosure.
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the disclosure and, together with a general description of the disclosure given above, and the detailed description of the embodiments given below, serve to explain the principles of the disclosure.
These and other characteristics of the present disclosure will become apparent from the following description of preferred forms of embodiment, given as non-limiting examples, with reference to the attached drawings.
It should also be understood that, although the present disclosure has been described with reference to some specific examples, a person of skill in the art shall certainly be able to achieve many other equivalent forms of the disclosure, having the characteristics as set forth in the claims and hence all coming within the field of protection defined thereby.
The above and other aspects, features and advantages of the present disclosure will become more apparent in view of the following detailed description when taken in conjunction with the accompanying drawings.
Specific embodiments of the present disclosure are described hereinafter with reference to the accompanying drawings; however, it is to be understood that the disclosed embodiments are merely exemplary of the disclosure that may be embodied in various forms. Well-known and/or repeated functions and structures have not been described in detail so as not to obscure the present disclosure with unnecessary or redundant detail. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a basis for the claims and as a representative basis for teaching one skilled in the art to variously employ the present disclosure in virtually any appropriately detailed structure.
The specification may use the phrases "in one embodiment," "in another embodiment," "in yet another embodiment," or "in other embodiments," which may each refer to one or more of the same or different embodiments in accordance with the disclosure.
An embodiment of the present invention provides a malicious TLS encrypted traffic detection model training method, as shown in fig. 1, including:
S101, obtaining a TLS traffic sample, wherein the TLS traffic sample comprises a plurality of pieces of malicious TLS encrypted traffic data and a plurality of pieces of normal traffic data, and extracting, for each piece of traffic data in the TLS traffic sample, its feature values of different categories in the handshake phase. In this example, malicious TLS encrypted traffic data can be obtained directly from known malicious samples; for example, malicious TLS encrypted traffic can be collected by running the malicious samples in a sandbox. Each piece of malicious TLS encrypted traffic data is then parsed to extract its feature values of different categories in the handshake phase. Normal traffic data can be captured directly, and its feature values are extracted in the same way, which is not described again here.
S102, establishing a corresponding logistic regression model for the feature values of each category. Specifically, one logistic regression model can be established per category of feature values and trained on that category. Logistic regression is a linear model for binary classification, and it can also be applied to multi-class problems through one-vs-rest (OvR). The logistic regression model is highly interpretable, can output prediction probabilities, and represents sparse data well.
Then, the following steps can be executed for each logistic regression model to train it: S103, setting a training set and a validation set for the logistic regression model, wherein the training set and the validation set each comprise feature values of traffic data of the corresponding category. The feature values of the traffic data in the known TLS traffic sample can be divided into a training set and a validation set, each of which includes feature values of normal traffic data and feature values of malicious traffic data of the corresponding category. The model is trained on the training set and validated on the validation set.
S104, adding corresponding feature labels to the feature values of the traffic data in the training set. As an example, the traffic data includes different malicious TLS encrypted traffic, and each malicious TLS encrypted flow may be abnormal in different feature categories: the client information of one malicious flow may be abnormal while its other features are normal, and the certificate of another may be abnormal while its other features are normal. In this case, the feature values of all categories of a malicious flow may be labeled as abnormal, for example by setting the label to 1 to indicate an abnormal feature. The feature values of normal traffic data may all be labeled 0, indicating that the feature is normal.
S105, training the logistic regression model with the labeled feature values. Model training is thus performed on the feature values of each piece of traffic data after the feature labels have been added. For example, a logistic regression model with an L2 regularization term is trained on the labeled data.
S106 is executed during training: the hyper-parameters of the logistic regression model are adjusted so that the model performs best on the validation set, which yields the trained logistic regression model for that feature category. Training the logistic regression models for the other categories in the same way yields the complete set of logistic regression models for detecting malicious TLS encrypted traffic. For example, the logistic regression model with the L2 regularization term is adjusted based on the training results, and the trained logistic regression models are used as the malicious TLS encrypted traffic detection model.
In the detection stage, detection can be completed only by collecting data packets of unknown TLS flow data in the handshake stage, so that the generalization capability of the detection model is effectively improved, and the detection speed of malicious flow is improved.
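The labeling and splitting described in steps S103 and S104 can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function name `label_and_split`, the 20% validation fraction and the fixed random seed are hypothetical and do not come from the patent.

```python
import random

def label_and_split(malicious_rows, normal_rows, val_fraction=0.2, seed=0):
    """Label feature rows of one category and split them into train/validation sets.

    Rows from malicious flows get label 1, rows from normal flows get label 0.
    The split is done per class so both sets contain both kinds of traffic.
    """
    rng = random.Random(seed)  # fixed seed (assumption) so the split is repeatable
    train, val = [], []
    for rows, label in ((malicious_rows, 1), (normal_rows, 0)):
        labelled = [(list(row), label) for row in rows]
        rng.shuffle(labelled)
        n_val = max(1, int(len(labelled) * val_fraction))
        val.extend(labelled[:n_val])
        train.extend(labelled[n_val:])
    rng.shuffle(train)
    return train, val

# Toy feature rows standing in for one category (e.g. client features).
malicious = [[1.0, float(i)] for i in range(10)]
normal = [[0.0, float(i)] for i in range(20)]
train_set, val_set = label_and_split(malicious, normal)
```

The same helper would be called once per feature category, so that each logistic regression model receives its own training and validation sets.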
In some embodiments, extracting, for each piece of traffic data in the TLS traffic sample, its feature values of different categories in the handshake phase comprises:
as shown in FIG. 2, for each piece of traffic data:
S201, extracting and parsing the field data of the piece of traffic data in the Client Hello, Server Hello and Certificate data packets of the handshake phase;
S202, storing the field data of the piece of traffic data in a corresponding JSON file.
For example, TLS traffic may be collected by a traffic collection device. The Client Hello, Server Hello and Certificate data packets of each TLS flow are extracted, each field in the three handshake messages is parsed, and the parsed fields of each flow are saved as a separate JSON file on the device disk. Malicious TLS encrypted traffic is collected as malicious samples and cleaned; the Client Hello, Server Hello and Certificate data packets of each malicious TLS flow are then extracted, and each field of the three handshake messages is parsed and stored in a JSON file. In a specific implementation, this part may be implemented by a TLS traffic data acquisition and processing module. The module referred to in this example may be a software module, a hardware module, or a combination of software and hardware, which is not limited herein.
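As a hedged illustration of step S202, the following Python sketch stores the parsed handshake fields of one flow in its own JSON file and reads them back. The field names, the flow identifier and the file layout are assumptions made for the example; the patent does not specify a JSON schema, and packet parsing itself is outside this sketch.

```python
import json
import tempfile
from pathlib import Path

def save_handshake_fields(flow_id, fields, out_dir):
    """Write the parsed handshake fields of one flow to <out_dir>/<flow_id>.json."""
    out_path = Path(out_dir) / f"{flow_id}.json"
    with out_path.open("w", encoding="utf-8") as fh:
        json.dump(fields, fh, indent=2, sort_keys=True)
    return out_path

def load_handshake_fields(path):
    """Read the fields of one flow back from its JSON file."""
    with Path(path).open(encoding="utf-8") as fh:
        return json.load(fh)

# Hypothetical parsed fields of one flow (illustrative values only).
example_fields = {
    "client_hello": {
        "version": "TLS 1.2",
        "cipher_suites": ["0xc02f", "0xc030"],
        "extensions": ["server_name", "supported_groups"],
    },
    "server_hello": {
        "version": "TLS 1.2",
        "cipher_suite": "0xc02f",
        "extensions": ["renegotiation_info"],
    },
    "certificate": {"subject_cn": "example.com", "self_signed": False},
}

with tempfile.TemporaryDirectory() as tmp_dir:
    saved_path = save_handshake_fields("flow-0001", example_fields, tmp_dir)
    saved_name = saved_path.name
    round_trip = load_handshake_fields(saved_path)
```

Keeping one file per flow means a later feature extraction step can process flows independently of one another.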
In some embodiments, extracting the feature values of each piece of malicious TLS encrypted traffic data in the handshake phase further comprises:
as shown in FIG. 2, for each JSON file:
S203, extracting client features, server features and certificate features respectively, based on the field data in the JSON file;
S204, storing the extracted feature values of the client features, the server features and the certificate features in corresponding feature files.
In this example, the JSON files stored by the TLS traffic data acquisition and processing module may be obtained by the feature extraction module. That is, the feature extraction module reads the JSON file of each flow from the device disk and extracts features in two dimensions: the TLS parameter features of the flow and the features of the certificate issued by the server. The extracted features form a feature set F, which can then be split into client features F_c, server features F_s and certificate features F_cert, where F_c ∪ F_s ∪ F_cert = F. F_c may include the one-hot encoded TLS Version, Src Port, Dst Port, Cipher Suites List, Extension List, supported_groups List and ec_point_formats List, and the number of client extensions. F_s may include the one-hot encoded TLS Version, the chosen Cipher Suite and Extensions, and the number of server extensions. F_cert may include the certificate validity period, whether the certificate is an EV certificate, whether it is an OV certificate, whether it is a DV certificate, whether it is self-signed, whether the certificate is free, the number of certificate subjects and issuers, the number of certificate extensions, and the character entropy of the subject commonName. The module saves the feature values corresponding to F_c, F_s and F_cert of each flow to the feature files D_c, D_s and D_cert, respectively.
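Two of the feature types named above, one-hot encoding and the character entropy of the subject commonName, can be sketched in Python as follows. This is an illustrative sketch only: the patent does not define the entropy formula, so Shannon entropy in bits per character is assumed, and the vocabulary `TLS_VERSIONS` and the dictionary keys of the certificate record are hypothetical.

```python
import math
from collections import Counter

def char_entropy(text):
    """Shannon entropy (bits per character) of a string; 0.0 for an empty string."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return abs(-sum((n / total) * math.log2(n / total) for n in counts.values()))

def one_hot(value, vocabulary):
    """Return a 0/1 vector with a single 1 at the position of value, if present."""
    return [1 if value == item else 0 for item in vocabulary]

# Assumed vocabulary for the TLS Version field.
TLS_VERSIONS = ["TLS 1.0", "TLS 1.1", "TLS 1.2", "TLS 1.3"]

def certificate_features(cert):
    """Build a small certificate feature vector from a parsed certificate record."""
    return [
        float(cert["validity_days"]),
        1.0 if cert["self_signed"] else 0.0,
        float(cert["extension_count"]),
        char_entropy(cert["subject_cn"]),
    ]

# Hypothetical certificate record.
example_cert = {
    "validity_days": 90,
    "self_signed": True,
    "extension_count": 3,
    "subject_cn": "abab",
}
example_vector = certificate_features(example_cert)
```

A randomly generated commonName tends to have higher character entropy than a dictionary-word domain, which is one plausible reason for including this feature.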
In some embodiments, training the logistic regression model with the feature values after adding the feature labels comprises:
and iteratively solving the loss function of the logistic regression model by adopting a gradient descent method to determine the target parameters of the logistic regression model.
In this embodiment, the input feature vector of the exemplary model is x ∈ R^n with label y ∈ {0, 1}, and the logistic regression model can be expressed as:
Figure BDA0003171436330000071
The logistic regression model outputs h_θ(x) = P(y = 1 | x; θ), i.e. the probability that the label of the sample to be predicted is 1. The logistic regression model may be trained by gradient descent: the parameter θ is solved approximately by iteratively minimizing the loss function of the model, where the loss function of the logistic regression model may be:
Figure BDA0003171436330000072
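The two formulas above appear only as figure references in this text. As a hedged reconstruction, assuming the standard logistic regression hypothesis and a cross-entropy loss with the L2 regularization term mentioned in step S105, they can be written as follows. The symbols m (number of training samples), λ (regularization coefficient) and α (learning rate) are assumptions introduced here and may differ from the notation in the original figures.

```latex
h_\theta(x) = \frac{1}{1 + e^{-\theta^{\mathsf{T}} x}}

J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta\!\left(x^{(i)}\right)
          + \left(1 - y^{(i)}\right) \log\!\left(1 - h_\theta\!\left(x^{(i)}\right)\right) \right]
          + \frac{\lambda}{2} \sum_{j=1}^{n} \theta_j^{2}

\theta_j \leftarrow \theta_j - \alpha \, \frac{\partial J(\theta)}{\partial \theta_j}
```

The last line is the gradient descent update that is repeated until the loss stops decreasing.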
The model training may be implemented by a model training module. Training the client model may proceed as follows: first, the client feature file D_c is loaded and each record is labeled, with the label of normal TLS traffic data set to 0 and the label of malicious TLS traffic data set to 1; then a logistic regression model with an L2 regularization term is trained on the labeled data as the client model, denoted M_c; the regularization coefficient and other hyper-parameters are tuned so that the model performs best on the validation set; finally, the tuned client model M_c is persisted to the local disk.
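The training procedure described above can be sketched in dependency-free Python. This is a minimal sketch under stated assumptions, not the patented implementation: it uses batch gradient descent on a cross-entropy loss with an L2 penalty, selects the regularization coefficient by validation accuracy, and all names (`train_logreg`, `select_model`, `lam`, `lr`, `epochs`) and the toy data are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(theta, x):
    """P(y = 1 | x); theta[0] is the bias, theta[1:] are the feature weights."""
    return sigmoid(theta[0] + sum(t * v for t, v in zip(theta[1:], x)))

def train_logreg(X, y, lam=0.01, lr=0.5, epochs=500):
    """Batch gradient descent on cross-entropy loss with an L2 penalty (bias excluded)."""
    m, n = len(X), len(X[0])
    theta = [0.0] * (n + 1)
    for _ in range(epochs):
        grad = [0.0] * (n + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(theta, xi) - yi
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        for j in range(n + 1):
            penalty = lam * theta[j] if j > 0 else 0.0
            theta[j] -= lr * (grad[j] / m + penalty)
    return theta

def accuracy(theta, X, y, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum((predict_proba(theta, xi) > threshold) == bool(yi) for xi, yi in zip(X, y))
    return hits / len(X)

def select_model(X_train, y_train, X_val, y_val, lambdas=(0.0, 0.01, 0.1, 1.0)):
    """Train one model per candidate coefficient; keep the best on the validation set."""
    best = None
    for lam in lambdas:
        theta = train_logreg(X_train, y_train, lam=lam)
        score = accuracy(theta, X_val, y_val)
        if best is None or score > best[0]:
            best = (score, lam, theta)
    return best

# Toy data standing in for labeled client features (0 = normal, 1 = malicious).
X_train = [[0.0, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8]]
y_train = [0, 0, 1, 1]
X_val = [[0.1, 0.2], [0.8, 0.9]]
y_val = [0, 1]
best_score, best_lam, best_theta = select_model(X_train, y_train, X_val, y_val)
```

In practice a library implementation would normally be used; the hand-written loop is shown only to make the gradient descent and validation-based tuning steps explicit. The selected parameters could then be serialized to disk, corresponding to persisting M_c.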
In some embodiments, the logistic regression model includes a client model, a server model, and a certificate model; and the client model, the server model and the certificate model obtained through training are used as malicious TLS encrypted traffic detection models.
In the existing scheme, the extracted TLS parameter features (including client and server features) and the features of the certificate issued by the server are concatenated into one vector to train a single model. The whole process that generates the TLS encrypted traffic is identified as a unit to judge whether the traffic produced during the communication is malicious, and the features of the server, the client and the certificate are not distinguished. However, for much malicious encrypted traffic, one or two of the client, the server and the certificate are not malicious; if the features of a normal client, server or certificate are labeled as malicious, the model is misled and generalizes poorly. In this example, the client model M_c, the server model M_s and the certificate model M_cert may be trained in turn according to the above steps. The client message, the server message and the certificate message can thus be distinguished and inferred on separately, which improves the identification of malicious TLS encrypted traffic.
The method only needs to collect the data packets of the TLS handshake phase and does not use contextual DNS traffic features, so the overhead of traffic aggregation is avoided and the detection speed of the model is effectively improved. Secondly, the invention splits the feature set F of each flow, which covers the two dimensions of TLS parameters and the certificate issued by the server, into the client features F_c, the server features F_s and the certificate features F_cert, i.e. F = F_c ∪ F_s ∪ F_cert, and uses F_c, F_s and F_cert to train the client model M_c, the server model M_s and the certificate model M_cert separately. The three models identify abnormalities of the client, the server and the certificate respectively. The existing scheme, by contrast, uses only the feature set F (F = F_c ∪ F_s ∪ F_cert) to train a single model M that identifies the whole process of generating TLS traffic, rather than identifying abnormalities of the client, the server and the certificate separately. For much malicious encrypted traffic the client is normal, for example when a phishing website is visited using the Chrome browser. The existing scheme treats "the Chrome browser accessing the phishing website" as a whole and considers it malicious. The feature set F extracted from the whole corresponds to only one label, so the features F_c extracted from the Chrome browser (the client) are also regarded as malicious although they are in fact normal. This mislabeled part of the features affects the final output of the model and causes misclassification.
By contrast, when F_c, F_s and F_cert are predicted separately, the client is identified as normal and the server and the certificate as abnormal. Mislabeled client features cannot affect the predictions for the server and the certificate, so malicious TLS encrypted traffic generated by the communication between the client and the server can still be identified, which effectively improves the detection accuracy and recall of the model.
The embodiment of the invention also provides a malicious TLS encrypted traffic detection method, which comprises the following steps:
acquiring unknown TLS encrypted traffic data, and extracting different types of characteristic values of the unknown TLS encrypted traffic data in a handshake stage;
loading the malicious TLS encrypted traffic detection model according to any of claims 1-5 to detect the different classes of feature values of the unknown TLS encrypted traffic data in a handshake phase, respectively.
In some embodiments, the respectively detecting the different class feature values of the unknown TLS encrypted traffic data in the handshake phase includes:
comparing model outputs of different types of characteristic values of the unknown TLS encrypted traffic data in a handshake stage with corresponding preset thresholds respectively;
and under the condition that the output of any model is greater than a preset threshold value, judging that the unknown TLS encrypted traffic data is malicious TLS traffic.
As an example, for unknown TLS traffic, as shown in FIG. 3, the client feature file D_c, the server feature file D_s and the certificate feature file D_cert of the TLS traffic may first be extracted by the TLS traffic data acquisition and processing module and the feature extraction module. The module then loads the client model M_c, the server model M_s and the certificate model M_cert from the device disk. The client feature file D_c of the traffic to be predicted is input to the client model M_c, which outputs the probability p_c that the client is abnormal; the server feature file D_s is input to the server model M_s, which outputs the probability p_s that the server is abnormal; and the certificate feature file D_cert is input to the certificate model M_cert, which outputs the probability p_cert that the certificate is abnormal. Given manually set thresholds k_c, k_s and k_cert (each k defaults to 0.5 and may be set as required, without limitation here), if any p > k, the TLS traffic is judged to be malicious encrypted traffic; otherwise it is normal TLS traffic.
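The decision rule described above (malicious if any model output exceeds its threshold) can be sketched in Python as follows. The function names and the dictionary keys `client`, `server` and `cert` are assumptions for illustration; the probabilities would come from the three trained models.

```python
def flagged_models(probabilities, thresholds=None, default_threshold=0.5):
    """Return the names of the models whose output probability exceeds its threshold."""
    thresholds = thresholds or {}
    return [
        name
        for name, p in probabilities.items()
        if p > thresholds.get(name, default_threshold)
    ]

def is_malicious(probabilities, thresholds=None, default_threshold=0.5):
    """A flow is judged malicious if any one model exceeds its threshold."""
    return bool(flagged_models(probabilities, thresholds, default_threshold))

# Hypothetical outputs p_c, p_s, p_cert for one flow: a normal client talking
# to a suspicious server.
example_outputs = {"client": 0.10, "server": 0.70, "cert": 0.20}
example_verdict = is_malicious(example_outputs)
example_flags = flagged_models(example_outputs)
```

Returning the list of flagged models in addition to the verdict keeps the result interpretable, since it shows whether the client, the server or the certificate triggered the detection.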
The embodiment of the invention also provides a terminal, which comprises a first processor, a first memory and a first communication bus;
the first communication bus is configured to enable connection communication between the first processor and the first memory;
the first processor is configured to execute one or more computer programs stored in the first memory to implement the steps of the malicious TLS encrypted traffic detection model training method as described above.
The embodiment of the invention also provides a terminal, which comprises a second processor, a second memory and a second communication bus;
the second communication bus is configured to enable connection communication between the second processor and the second memory;
the second processor is configured to execute one or more computer programs stored in the second memory to implement the steps of the aforementioned malicious TLS encrypted traffic detection method.
An embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the method for training a malicious TLS encrypted traffic detection model and/or the method for detecting malicious TLS encrypted traffic are implemented.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (10)

1. A malicious TLS encrypted traffic detection model training method is characterized by comprising the following steps:
obtaining a TLS traffic sample, wherein the TLS traffic sample comprises a plurality of pieces of malicious TLS encrypted traffic data and a plurality of pieces of normal traffic data;
for each piece of traffic data in the TLS traffic sample, extracting feature values of different classes of the traffic data in a handshake phase;
establishing a corresponding logistic regression model for each class of feature values;
for each logistic regression model:
setting a training set and a validation set for the logistic regression model, wherein the training set and the validation set each comprise feature values of traffic data of the corresponding class;
adding corresponding feature labels to the feature values of the traffic data in the training set;
training the logistic regression model using the feature values with the added feature labels;
adjusting the hyper-parameters of the logistic regression model based on the validation set to optimize the performance of the logistic regression model;
and taking each trained logistic regression model as a malicious TLS encrypted traffic detection model.
2. The malicious TLS encrypted traffic detection model training method according to claim 1, wherein extracting feature values of different classes of each piece of traffic data in the TLS traffic sample in the handshake phase comprises:
for each piece of traffic data:
extracting and parsing the field data of the Client Hello packet, the Server Hello packet and the Certificate packet of the piece of traffic data in the handshake phase;
and storing the field data of the piece of traffic data into a corresponding JSON file.
3. The malicious TLS encrypted traffic detection model training method according to claim 2, wherein extracting, for each piece of traffic data in the TLS traffic sample, its different class of feature values in the handshake phase further comprises:
for each JSON file:
extracting client features, server features and certificate features, respectively, based on the field data in the JSON file;
and storing the extracted feature values of the client features, the server features and the certificate features into corresponding feature files.
4. The malicious TLS encrypted traffic detection model training method according to claim 3, wherein training the logistic regression model using the feature values with the added feature labels comprises:
iteratively solving the loss function of the logistic regression model by a gradient descent method to determine the target parameters of the logistic regression model.
5. The malicious TLS encrypted traffic detection model training method according to claim 3, wherein the logistic regression model comprises a client model, a server model, and a certificate model;
and the client model, the server model and the certificate model obtained through training are used as malicious TLS encrypted traffic detection models.
6. A malicious TLS encrypted traffic detection method is characterized by comprising the following steps:
acquiring unknown TLS encrypted traffic data, and extracting feature values of different classes of the unknown TLS encrypted traffic data in a handshake phase;
loading the malicious TLS encrypted traffic detection model obtained by the training method according to any of claims 1-5 to detect, respectively, the feature values of the different classes of the unknown TLS encrypted traffic data in the handshake phase.
7. The malicious TLS encrypted traffic detection method according to claim 6, wherein respectively detecting the feature values of the different classes of the unknown TLS encrypted traffic data in the handshake phase comprises:
comparing the model output for each class of feature values of the unknown TLS encrypted traffic data in the handshake phase with a corresponding preset threshold;
and determining that the unknown TLS encrypted traffic data is malicious TLS traffic when the output of any model is greater than its preset threshold.
8. A terminal, comprising a first processor, a first memory, and a first communication bus;
the first communication bus is configured to enable connection communication between the first processor and the first memory;
the first processor is configured to execute one or more computer programs stored in the first memory to implement the steps of the malicious TLS encrypted traffic detection model training method of any of claims 1-5.
9. A terminal, comprising a second processor, a second memory, and a second communication bus;
the second communication bus is configured to enable connection communication between the second processor and the second memory;
the second processor is configured to execute one or more computer programs stored in the second memory to implement the steps of the malicious TLS encrypted traffic detection method as claimed in claim 6 or 7.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
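
The training steps recited in claims 1 and 4 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes a cross-entropy loss, labels of 1 for malicious and 0 for normal traffic, batch gradient descent, and an L2 penalty as the hyper-parameter tuned on the validation set. All function names and the toy one-dimensional data are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    # Probability that feature vector x belongs to the malicious class.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logistic_regression(X, y, lr=0.1, epochs=500, l2=0.0):
    # Batch gradient descent on the mean cross-entropy loss (assumed loss).
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # gradient w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            w[j] -= lr * (grad_w[j] / n + l2 * w[j])
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, X, y, threshold=0.5):
    hits = sum((predict_proba(w, b, xi) > threshold) == bool(yi)
               for xi, yi in zip(X, y))
    return hits / len(X)


def select_model(X_train, y_train, X_val, y_val, l2_grid=(0.0, 0.01, 0.1)):
    # Keep the hyper-parameter value that scores best on the validation set.
    best = None
    for l2 in l2_grid:
        w, b = train_logistic_regression(X_train, y_train, l2=l2)
        score = accuracy(w, b, X_val, y_val)
        if best is None or score > best[0]:
            best = (score, l2, w, b)
    return best


# Toy data: one feature, label 1 = malicious, 0 = normal (illustrative only).
X_train, y_train = [[0.0], [0.2], [0.8], [1.0]], [0, 0, 1, 1]
X_val, y_val = [[0.1], [0.9]], [0, 1]
val_score, chosen_l2, w, b = select_model(X_train, y_train, X_val, y_val)
```

In the claimed method this procedure would be repeated once per feature class, yielding a separate client model, server model and certificate model.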
CN202110819680.XA 2021-07-20 2021-07-20 Training method, detection method and terminal for malicious TLS encrypted traffic detection model Active CN113660210B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110819680.XA CN113660210B (en) 2021-07-20 2021-07-20 Training method, detection method and terminal for malicious TLS encrypted traffic detection model

Publications (2)

Publication Number Publication Date
CN113660210A (en) 2021-11-16
CN113660210B (en) 2023-05-12

Family

ID=78477513

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110819680.XA Active CN113660210B (en) 2021-07-20 2021-07-20 Training method, detection method and terminal for malicious TLS encrypted traffic detection model

Country Status (1)

Country Link
CN (1) CN113660210B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114172748A (en) * 2022-02-10 2022-03-11 中国矿业大学(北京) Encrypted malicious traffic detection method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106786560A (en) * 2017-02-14 2017-05-31 中国电力科学研究院 Method and device for automatic extraction of power system stability characteristics
CN110417810A (en) * 2019-08-20 2019-11-05 西安电子科技大学 Malicious encrypted traffic detection method based on an enhanced logistic regression model
CN112217763A (en) * 2019-07-10 2021-01-12 四川大学 Hidden TLS communication flow detection method based on machine learning

Also Published As

Publication number Publication date
CN113660210B (en) 2023-05-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant