CN112822167B - Abnormal TLS encrypted traffic detection method and system


Publication number
CN112822167B
Authority
CN
China
Prior art keywords
tls
steps
message
flow
abnormal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011614293.4A
Other languages
Chinese (zh)
Other versions
CN112822167A (en)
Inventor
樊树胜
贺本彪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Zhongdian Anke Modern Technology Co ltd
Original Assignee
Hangzhou Zhongdian Anke Modern Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Zhongdian Anke Modern Technology Co ltd
Priority to CN202011614293.4A
Publication of CN112822167A
Application granted
Publication of CN112822167B
Legal status: Active

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416: Event detection, e.g. attack signature detection
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/243: Classification techniques relating to the number of classes
    • G06F18/24323: Tree-organised classifiers
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/50: Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Abstract

The invention provides a method for detecting abnormal TLS encrypted traffic, comprising the following steps: S1: acquiring traffic message data sets of abnormal encrypted traffic and of normal traffic, respectively; S2: preprocessing the information collected in the acquired TLS traffic message data set of the normal traffic; S3: establishing a decision tree prediction model, feeding the data samples into a random forest model, and training the decision tree prediction model; S4: classifying or predicting the preprocessed new data with a classification algorithm according to the generated rules. The method uses a random forest algorithm to identify abnormal traffic within encrypted traffic messages and can provide technical support for user privacy protection and network security. In identifying abnormal traffic messages, the TLS handshake information in the abnormal encrypted traffic messages plays an important role.

Description

Abnormal TLS encrypted traffic detection method and system
Technical Field
The invention relates to the field of network security technology, in particular to a method and a system for detecting abnormal TLS encrypted traffic.
Background
The statements of background art in this application, as they pertain to the present application, are provided solely for the purpose of illustration and description to facilitate an understanding of the present application, and are not to be construed as admissions or conjectures of applicants as prior art at the date of filing the present application for the first time.
With the development of artificial intelligence, some internet companies, in order to analyze user behavior, collect users' private data without the users' consent and encrypt the traffic messages generated during data collection. Some users also use illegal software to bypass GFW detection; because this software uses special encryption, traffic-limiting operations cannot currently be applied to its traffic. To identify such abnormal traffic data, this work collects traffic messages in a specific environment and analyzes them, reaching an abnormal traffic message identification rate of 99.3%.
As the technology develops, traffic encryption has matured and is applied at ever larger scale: most websites are authenticated with CA certificates and protect traffic data by transmitting it in non-plaintext form. This also makes identifying abnormal traffic much more difficult, because abnormal traffic identification tools developed for plaintext transmission cannot identify abnormal encrypted traffic.
On the one hand, the built-in services of some monopoly companies invade user privacy. For example, the default configuration of an applet company collects user location information; some mobile phone apps force users to provide location information, address book information and the like, and otherwise cannot be used; and in the Microsoft Windows 10 system, some private data is collected silently without the user's consent and sent to Microsoft servers as encrypted traffic. Users unknowingly become machines that supply data to these large companies.
On the other hand, following the mainstream approaches of bypassing GFW detection with Shadowsocks and v2rayN, a way of bypassing GFW detection by means of Trojan has now appeared on the market; its encryption uses conventional HTTPS access with CA certificate authentication. Because traffic encrypted this way is too similar to that of normally accessed web pages, the prior art cannot identify it, which poses a new challenge to the domestic network security environment.
Disclosure of Invention
In order to solve the above problems, the invention collects key information from encrypted traffic, screens the key information fields by feature engineering, trains a random forest algorithm on the screened data, and uses the trained algorithm to monitor bypass traffic and identify abnormal encrypted traffic.
The invention aims to provide a method for detecting abnormal TLS encrypted traffic, comprising the following steps:
S1: acquiring traffic message data sets of abnormal encrypted traffic and of normal traffic, respectively;
S2: preprocessing the information collected in the acquired TLS traffic message data set of the normal traffic;
S3: establishing a decision tree prediction model, feeding the data samples into a random forest model, and training the decision tree prediction model;
S4: classifying or predicting the preprocessed new data with a classification algorithm according to the generated rules.
Optionally, in step S1, the sources of the abnormal traffic specifically include:
TLS encrypted traffic messages that contain user information and are generated while the system is in a standing (idle) state; and
traffic messages whose handshake-request destination IP address differs from the real IP address of the network resource the user accesses.
Optionally, in step S1, acquiring the traffic data set of the normal traffic includes:
configuring all types of IP addresses contacted in the silent state as IP addresses that are not allowed to be accessed, then simulating the messages generated while users normally access the various types of mainstream websites through a browser, and marking these messages as normal traffic.
Optionally, in step S2, the information collected in the acquired TLS traffic message data set includes:
from the Client hello message: packet length, packet arrival interval time sequence, TLS record length, TLS record time, TLS content type, TLS handshake type, TLS cipher suite, TLS extension length, TLS extension type, TLS version number and TLS random number, 11 parameters in total;
from each of the Server hello, Certificate (optional), Certificate request (optional) and Server key exchange (optional) messages: packet length, packet arrival interval time sequence, TLS record length, TLS record time, TLS content type, TLS handshake type, TLS cipher suite, TLS version, TLS session ID and TLS random;
from each of the Certificate (optional), Client key exchange and Certificate verify (optional) messages: packet length, packet arrival interval time sequence, TLS record length, TLS content type, TLS handshake type, TLS version and TLS key length;
from the Change cipher spec message: packet length, packet arrival interval time sequence, TLS record length, TLS content type, TLS handshake type and TLS version.
Optionally, step S3 specifically includes:
S3.1) performing recursive analysis on the training set to generate an inverted decision tree structure;
S3.2) analyzing the paths of the tree from the root node to the leaf nodes to generate a series of rules;
S3.3) generating t decision trees, which then form a random forest model.
Optionally, in step S4, the random forest classification algorithm uses m = 7, calculated as follows:
Figure SMS_1
The invention also provides a system for detecting abnormal TLS encrypted traffic, comprising the following units:
an obtaining unit, configured to acquire traffic message data sets of abnormal encrypted traffic and of normal traffic, respectively;
an information preprocessing unit, configured to preprocess the information collected in the acquired TLS traffic message data set of the normal traffic;
a model establishing and training unit, configured to establish a decision tree prediction model, feed the data samples into a random forest model, and train the decision tree prediction model;
a classification prediction unit, configured to classify or predict the preprocessed new data with a classification algorithm according to the generated rules.
Optionally, the model establishing and training unit further includes:
a decision tree generation module, configured to perform recursive analysis on the training set to generate an inverted decision tree structure;
a rule generation module, configured to analyze the paths of the tree from the root node to the leaf nodes to generate a series of rules;
a random forest model generation module, configured to generate t decision trees, which then form a random forest model.
The invention also provides a computer-readable storage medium, on which a computer program is stored, characterized in that the program, when executed by a processor, carries out the steps of any of the methods described above.
The invention also provides a terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of any of the above methods when executing the program.
Compared with the prior art, the scheme of the invention has at least the following beneficial effects: the invention exploits the ability of a random forest to compute parameter importance and selects only a small number of important feature dimensions to approximately represent the original data, thereby reducing the dimensionality of the data. In addition, the method can be tailored to the characteristics of the abnormal data: when the data has many different features, it is used for feature selection, and the key features are selected for use in the algorithm, so that an accurate prediction result is obtained.
The invention exploits the natural parallelism of random forests, handles large-scale data well, and can easily be used in a distributed environment.
The invention can also identify abnormal traffic within encrypted traffic messages by using a random forest algorithm and can provide technical support for user privacy protection and network security. In identifying abnormal traffic messages, the TLS handshake information in the abnormal encrypted traffic messages plays an important role, and the random forest algorithm is well suited to identifying such messages.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
Fig. 1 (a) - (c) are schematic diagrams illustrating a normal, complete HTTPS request: the three-way handshake, the four-way wave and the data transmission process between a client and a server;
Fig. 2 (a) shows the general format of an HTTP request message;
Fig. 2 (b) shows common HTTP request headers and their meanings;
Fig. 3 (a) shows the general format of an HTTP response message;
Fig. 3 (b) shows common HTTP status codes and their meanings;
Fig. 3 (c) shows common HTTP response headers and their meanings;
Fig. 4 shows a flowchart of an embodiment of the abnormal TLS encrypted traffic detection method of the present invention;
Fig. 5 shows an example in which, in step S2 of the method, the data of each TLS handshake is stored as one record;
Fig. 6 is a flowchart of a specific embodiment of step S3 of the method;
Fig. 7 shows the flow of the random forest classification algorithm in an embodiment of the present invention;
Fig. 8 shows a block diagram of an embodiment of the abnormal TLS encrypted traffic detection system of the present invention.
Detailed description of the preferred embodiments
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and "a plurality" typically includes at least two.
It should be understood that the term "and/or" as used herein is merely one type of association that describes an associated object, meaning that three relationships may exist, e.g., a and/or B may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a good or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such good or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in the good or apparatus comprising the element.
Alternative embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Fig. 1 shows a schematic diagram of the interaction process between a client and a server for a normally complete https request.
As shown in fig. 1 (a) - (c), in a normal, complete HTTPS request the client and the server interact through a three-way handshake, data transmission and a four-way wave.
Fig. 1 (a) shows the 7 steps of a complete HTTP request and response.
Fig. 1 (b) shows the TCP three-way handshake process. A connection must be established between two parties before either can send data to the other. Among the TCP/IP protocols, TCP provides a reliable connection service, and the connection is initialized through a three-way handshake. The purpose of the three-way handshake is to synchronize the sequence numbers and acknowledgment numbers of both parties and to exchange TCP window size information.
First handshake: the client requests a connection. The client sends a connection request segment with the SYN bit set to 1 and Sequence Number x; the client then enters the SYN_SENT state and waits for the server's acknowledgment.
Second handshake: the server receives the client's SYN segment and must acknowledge it, setting the Acknowledgment Number to x + 1 (Sequence Number + 1). At the same time the server sends its own SYN request, with the SYN bit set to 1 and Sequence Number y. The server puts all of this into a single segment (the SYN + ACK segment) and sends it to the client; the server then enters the SYN_RECV state.
Third handshake: the client receives the SYN + ACK segment from the server, sets the Acknowledgment Number to y + 1, and sends an ACK segment to the server. Once this segment has been sent, both the client and the server enter the ESTABLISHED state, completing the TCP three-way handshake.
The three-way handshake effectively prevents a stale connection request segment that suddenly reaches the server at a later time from causing errors.
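The sequence and acknowledgment numbers exchanged in the three steps above can be traced with a minimal sketch. The function name `three_way_handshake` and the tuple layout are illustrative assumptions, not part of the patent; the sketch models only the numbers and states named in the text, not a real TCP stack.

```python
def three_way_handshake(x, y):
    """Return the three segments as (sender, flags, seq, ack) tuples.

    x and y are the initial sequence numbers chosen by the client and
    the server (hypothetical inputs used only for illustration).
    """
    syn = ("client", "SYN", x, None)            # client enters SYN_SENT
    syn_ack = ("server", "SYN+ACK", y, x + 1)   # server enters SYN_RECV
    ack = ("client", "ACK", x + 1, y + 1)       # both sides enter ESTABLISHED
    return [syn, syn_ack, ack]


segments = three_way_handshake(x=100, y=300)
for sender, flags, seq, ack in segments:
    print(sender, flags, seq, ack)
```

Each acknowledgment number is the peer's sequence number plus one, which is how the two sides confirm that they have synchronized.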
Fig. 2 shows the components of an HTTP request message and their explanation.
As shown in fig. 2 (a), an HTTP request message is composed of 4 parts: a request line, request headers, an empty line, and request data.
The request headers add extra information to the request message. They consist of name/value pairs, one pair per line, with the name and the value separated by a colon.
Fig. 2 (b) lists common HTTP request headers and their meanings.
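The four-part layout described above can be illustrated by splitting a raw request into its parts. This is a minimal sketch: the function name `parse_http_request` and the sample request are assumptions made for illustration, and real HTTP parsing has more cases (repeated headers, encodings) than are handled here.

```python
def parse_http_request(raw):
    """Split a raw HTTP request into request line, headers and data."""
    # The empty line separates the head (request line + headers) from the data.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, target, version = lines[0].split(" ", 2)  # the request line
    headers = {}
    for line in lines[1:]:
        # Each header line is a name/value pair separated by a colon.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return {"method": method, "target": target, "version": version,
            "headers": headers, "body": body}


request = parse_http_request(
    "POST /login HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Content-Length: 7\r\n"
    "\r\n"
    "user=ab"
)
print(request["method"], request["headers"]["Host"], request["body"])
```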
Fig. 1 (c) shows the TCP four-way wave process. After the client and the server have established a TCP connection through the three-way handshake and data transmission is complete, the connection must be closed. Closing a TCP connection takes a "four-way wave".
First wave: host 1 (which may be either the client or the server) sets a Sequence Number and sends a FIN segment to host 2; host 1 then enters the FIN_WAIT_1 state. This indicates that host 1 has no more data to send to host 2.
Second wave: host 2 receives the FIN segment sent by host 1 and returns an ACK segment whose Acknowledgment Number is the received Sequence Number plus 1; host 1 enters the FIN_WAIT_2 state. Host 2 is telling host 1 that it "agrees" to the close request.
Third wave: host 2 sends a FIN segment to host 1 to request closing the connection, and host 2 enters the LAST_ACK state.
Fourth wave: host 1 receives the FIN segment sent by host 2 and sends an ACK segment to host 2, then enters the TIME_WAIT state. After host 2 receives the ACK segment from host 1, it closes the connection. If host 1 receives no further reply after waiting for 2MSL, this shows that the other side has closed normally, and host 1 can then close the connection as well.
TCP is a connection-oriented, reliable, byte-stream-based transport layer protocol. TCP is full duplex, which means that when host 1 sends a FIN segment, it indicates only that host 1 has no more data to send; host 1 can still receive data from host 2. When host 2 returns an ACK segment, it indicates that host 2 knows host 1 has no more data to send, but host 2 can still send data to host 1. When host 2 also sends a FIN segment, it indicates that host 2 has no more data to send either, and both sides then close the TCP connection. This is why four waves are needed.
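The state changes during the four-way wave can be summarized as a small transition table. This is a sketch under stated assumptions: the names `TEARDOWN_STEPS` and `final_states` are hypothetical, and the `CLOSE_WAIT` state of host 2 comes from standard TCP rather than from the description above.

```python
# Each entry is (host, event, state entered), in the order described above.
TEARDOWN_STEPS = [
    ("host1", "send FIN", "FIN_WAIT_1"),   # first wave
    ("host2", "send ACK", "CLOSE_WAIT"),   # second wave (standard TCP state)
    ("host1", "recv ACK", "FIN_WAIT_2"),
    ("host2", "send FIN", "LAST_ACK"),     # third wave
    ("host1", "send ACK", "TIME_WAIT"),    # fourth wave
    ("host2", "recv ACK", "CLOSED"),
    ("host1", "wait 2MSL", "CLOSED"),
]


def final_states(steps):
    """Apply the steps in order and return the last state of each host."""
    states = {"host1": "ESTABLISHED", "host2": "ESTABLISHED"}
    for host, _event, new_state in steps:
        states[host] = new_state
    return states


print(final_states(TEARDOWN_STEPS))
```

Host 2 reaches `CLOSED` before host 1, which matches the text: host 1 waits 2MSL in `TIME_WAIT` before closing.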
As shown in fig. 3 (a), the HTTP response packet is mainly composed of a status line, a response header, an empty line, and response data.
The status line consists of 3 parts: the protocol version, the status code, and the status code description.
The protocol version is consistent with that of the request message, and the status code description is a brief description of the status code. The status code is a 3-digit number.
1xx: informational; the request has been received and processing continues.
2xx: success; the request has been successfully received, understood and accepted.
3xx: redirection; further action must be taken to complete the request.
4xx: client error; the request has a syntax error or cannot be fulfilled.
5xx: server error; the server failed to fulfill a legitimate request.
For example, common status codes are shown in fig. 3 (b), and common response headers are shown in fig. 3 (c).
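The three-part status line and the five status code classes can be sketched as follows. The names `STATUS_CLASSES` and `parse_status_line` are assumptions for illustration; the class is derived from the first digit of the 3-digit code.

```python
# First digit of the status code -> class, as listed above.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_status_line(line):
    """Split a status line into version, code, description and class."""
    version, code, description = line.split(" ", 2)
    code = int(code)
    return version, code, description, STATUS_CLASSES[code // 100]


print(parse_status_line("HTTP/1.1 404 Not Found"))
```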
Through the above three-way handshake and four-way waving steps, HTTP requests and responses are completed, and data transfer is possible.
Fig. 4 shows a flowchart of a first embodiment of the abnormal TLS encrypted traffic detection method of the present invention.
As shown in fig. 4, the method for detecting the abnormal TLS encrypted traffic of the present invention includes the following steps:
S1: acquiring traffic message data sets of abnormal encrypted traffic and of normal traffic, respectively.
This step specifically comprises:
S1.1) acquiring an abnormal traffic message data set;
The sources of the abnormal traffic are,
for example: TLS encrypted traffic messages that contain user information and are generated while the system is in a standing (idle) state;
and, as another example: traffic messages generated by illegal software, in which the destination IP address of the handshake request differs from the real IP address of the network resource the user accesses. The destination address of such a traffic message is a proxy server address, and the message is forwarded by the proxy server in TLS-encrypted form.
S1.2) acquiring a normal traffic message data set;
All types of IP addresses contacted in the silent state are configured as IP addresses that are not allowed to be accessed; the messages generated while users normally access the various types of mainstream websites through a browser are then simulated and marked as normal traffic.
Steps S1.1) and S1.2) may be performed in parallel.
S2: preprocessing the information collected in the TLS traffic message data set of the normal traffic.
For example,
the information collected from the Client hello message specifically comprises: packet length, packet arrival interval time sequence, TLS record length, TLS record time, TLS content type, TLS handshake type, TLS cipher suite, TLS extension length, TLS extension type, TLS version number and TLS random number, 11 parameters in total.
The information collected from each of the Server hello, Certificate (optional), Certificate request (optional) and Server key exchange (optional) messages specifically comprises: packet length, packet arrival interval time sequence, TLS record length, TLS record time, TLS content type, TLS handshake type, TLS cipher suite, TLS version, TLS session ID and TLS random.
The information collected from each of the Certificate (optional), Client key exchange and Certificate verify (optional) messages specifically comprises: packet length, packet arrival interval time sequence, TLS record length, TLS content type, TLS handshake type, TLS version and TLS key length.
The information collected from the Change cipher spec message specifically comprises: packet length, packet arrival interval time sequence, TLS record length, TLS content type, TLS handshake type and TLS version.
The result of saving the data of each TLS flow as one record is shown in fig. 5.
Optionally, the experimental data does not include information such as destination IP, source IP, destination port, source port, MAC address, protocol number and message generation time, so as to prevent overfitting and to avoid the influence of noisy features.
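One way to picture the collected record is as a fixed list of feature names per handshake message, flattened into one vector per TLS flow. The following is a minimal sketch, not the patent's implementation: all identifiers (`FIELDS_BY_MESSAGE`, `feature_names`, `to_vector`), the snake_case field names, the reading of the third message group as client Certificate / Client key exchange / Certificate verify, and the choice of 0 for fields of an optional message that is absent are assumptions.

```python
# Fields shared by every message type (the six Change cipher spec fields).
COMMON = ["packet_length", "arrival_interval", "tls_record_length",
          "tls_content_type", "tls_handshake_type", "tls_version"]
SERVER_FIELDS = COMMON + ["tls_record_time", "tls_cipher_suite",
                          "tls_session_id", "tls_random"]         # 10 fields
CLIENT_KEY_FIELDS = COMMON + ["tls_key_length"]                   # 7 fields

FIELDS_BY_MESSAGE = {
    "client_hello": COMMON + ["tls_record_time", "tls_cipher_suite",
                              "tls_extension_length", "tls_extension_type",
                              "tls_random"],                      # 11 fields
    "server_hello": SERVER_FIELDS,
    "server_certificate": SERVER_FIELDS,
    "certificate_request": SERVER_FIELDS,
    "server_key_exchange": SERVER_FIELDS,
    "client_certificate": CLIENT_KEY_FIELDS,
    "client_key_exchange": CLIENT_KEY_FIELDS,
    "certificate_verify": CLIENT_KEY_FIELDS,
    "change_cipher_spec": COMMON,
}

# Identifying fields that are deliberately left out of the record.
EXCLUDED = {"dst_ip", "src_ip", "dst_port", "src_port",
            "mac_address", "protocol_number", "timestamp"}


def feature_names():
    """Flat, ordered list of 'message.field' names for one TLS flow."""
    return [f"{msg}.{field}"
            for msg, fields in FIELDS_BY_MESSAGE.items() for field in fields]


def to_vector(handshake, missing=0):
    """Flatten {message: {field: value}} into one row; absent values -> missing."""
    return [handshake.get(msg, {}).get(field, missing)
            for msg, fields in FIELDS_BY_MESSAGE.items() for field in fields]


row = to_vector({"client_hello": {"packet_length": 571}})
print(len(feature_names()), row[0])
```

Under these assumptions each flow becomes one row of 78 values, and none of the excluded identifying fields appears among the feature names.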
S3: establishing a decision tree prediction model, introducing a data sample into a random forest model, and training the decision tree prediction model.
As shown in fig. 6, the step S3 specifically includes:
S3.1) performing recursive analysis on the training set to generate an inverted decision tree structure;
Each tree of the random forest is constructed independently, relying as little as possible on the construction of the other trees. The key to constructing each decision tree is deciding which judgment condition to place at each decision node.
In the invention, the decision tree is constructed recursively as follows:
S3.1.1) Let N be the number of training samples; the input to a single decision tree is then N training samples drawn at random with replacement from the training set.
S3.1.2) Let M be the number of input features of the training samples and let m be much smaller than M. When splitting at each node of each decision tree, m input features are randomly selected from the M input features, and the best of these m features is then chosen for the split. m does not change during the construction of the decision tree.
S3.1.3) Each tree is split until all training examples at a node belong to the same class. Pruning is not required.
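The two sources of randomness in steps S3.1.1) and S3.1.2) can be sketched with the standard library alone. The function names `bootstrap_sample` and `candidate_features` are hypothetical, and the fixed seed is used only so that the illustration is repeatable.

```python
import random


def bootstrap_sample(samples, rng):
    """Draw N samples with replacement from a training set of size N."""
    n = len(samples)
    return [samples[rng.randrange(n)] for _ in range(n)]


def candidate_features(num_features, m, rng):
    """Pick m distinct feature indices out of M for one node split."""
    return rng.sample(range(num_features), m)


rng = random.Random(0)              # fixed seed, illustration only
train = list(range(10))             # stand-in for N = 10 training samples
bag = bootstrap_sample(train, rng)  # may contain repeats, may miss some samples
feats = candidate_features(78, 7, rng)  # M = 78 features, m = 7 per split
print(bag, feats)
```

Because sampling is with replacement, a bag usually repeats some samples and omits others, which is what makes the trees differ from one another.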
Information entropy is often used as a quantitative measure of the information content of a system, and can therefore serve as an optimization target or as a criterion for parameter selection. In generating the decision tree, this scheme uses entropy as the criterion for choosing the best attribute on which to divide the samples.
The larger the information entropy, the higher the uncertainty of the event. When a decision tree is constructed using information entropy, the reduction in entropy produced by each judgment condition is compared; the condition that produces the largest reduction is selected and placed at the root node. This step is performed recursively until a complete decision tree is constructed.
In short, a randomly generated decision tree does not know which parameter to judge first, so the first step is to tell the decision tree the order of judgment. For example, when judging whether a piece of code is good or bad, whether it implements the required function is judged first; whether it has comments, whether it is redundant, and so on are judged afterwards.
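The entropy criterion can be made concrete with a short sketch that computes the entropy of a label set and the entropy reduction (information gain) of a threshold split, then picks the split with the largest reduction. The function names, the threshold-style condition `x[feature] <= threshold` and the toy data are assumptions for illustration; the patent does not specify the form of the judgment conditions.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, feature, threshold):
    """Entropy reduction from splitting on rows[i][feature] <= threshold."""
    left = [y for x, y in zip(rows, labels) if x[feature] <= threshold]
    right = [y for x, y in zip(rows, labels) if x[feature] > threshold]
    if not left or not right:
        return 0.0  # the condition does not separate anything
    n = len(labels)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - weighted


def best_split(rows, labels, features):
    """Return (gain, feature, threshold) with the largest entropy reduction."""
    return max((information_gain(rows, labels, f, r[f]), f, r[f])
               for f in features for r in rows)


# Toy data: feature 0 separates the classes, feature 1 is constant.
rows = [[1, 5], [2, 5], [8, 5], [9, 5]]
labels = ["normal", "normal", "abnormal", "abnormal"]
gain, feature, threshold = best_split(rows, labels, features=[0, 1])
print(gain, feature, threshold)
```

On the toy data the chosen condition is feature 0 with threshold 2, which removes all uncertainty; applying `best_split` recursively to each side is the recursion the text describes.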
S3.2) analyzing the path of the tree from the root node to the leaf node to generate a series of rules;
The relationship between the parameters and the results in the decision tree generated in step S3.1) is analyzed to generate rules for judging whether traffic is abnormal.
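Reading rules off root-to-leaf paths can be sketched as a recursive walk over a tree stored as nested dictionaries. The tree layout, the function name `extract_rules`, the feature names and the threshold values below are all made-up illustrations, not values from the patent.

```python
def extract_rules(node, path=()):
    """Return one (conditions, label) rule per root-to-leaf path."""
    if "label" in node:  # leaf: the accumulated path is a complete rule
        return [(list(path), node["label"])]
    f, t = node["feature"], node["threshold"]
    return (extract_rules(node["left"], path + (f"{f} <= {t}",))
            + extract_rules(node["right"], path + (f"{f} > {t}",)))


# Hypothetical tree; feature names and thresholds are illustrative only.
tree = {
    "feature": "client_hello.tls_extension_length", "threshold": 120,
    "left": {"label": "normal"},
    "right": {
        "feature": "server_hello.packet_length", "threshold": 90,
        "left": {"label": "abnormal"},
        "right": {"label": "normal"},
    },
}

rules = extract_rules(tree)
for conditions, label in rules:
    print(" AND ".join(conditions), "->", label)
```

A tree with three leaves yields three rules, each being the conjunction of the conditions met along its path.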
S3.3) generating t decision trees, and then forming a random forest model.
S4: classifying or predicting the preprocessed new data with a classification algorithm according to the generated rules.
Preferably, a random forest classification algorithm is used; its flow is shown in fig. 7.
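The flow in the figure is not reproduced in the text, so the following is only a sketch of how a random forest classifier commonly reaches a decision: each of the t trees classifies the sample, and the majority label wins. The tree layout (nested dictionaries), the helper `stump` and the feature names `a` and `b` are assumptions for illustration.

```python
from collections import Counter


def classify_with_tree(node, sample):
    """Walk one tree (nested dicts) from the root to a leaf label."""
    while "label" not in node:
        branch = "left" if sample[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["label"]


def forest_predict(trees, sample):
    """Return the label chosen by the majority of the trees."""
    votes = Counter(classify_with_tree(tree, sample) for tree in trees)
    return votes.most_common(1)[0][0]


def stump(feature, threshold):
    # Hypothetical one-split tree used only for this illustration.
    return {"feature": feature, "threshold": threshold,
            "left": {"label": "normal"}, "right": {"label": "abnormal"}}


trees = [stump("a", 3), stump("b", 2), stump("a", 4)]
prediction = forest_predict(trees, {"a": 5, "b": 1})
print(prediction)
```

Here two of the three trees vote "abnormal" and one votes "normal", so the forest outputs "abnormal".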
In the classification algorithm of fig. 7, m takes a value of, for example, 7, and is calculated as follows:
Figure SMS_2
The number of features on which this calculation is based is obtained as follows:
In step S2, the Client hello message contributes 11 parameters; the Server hello, Certificate (optional), Certificate request (optional) and Server key exchange (optional) messages contribute 10 parameters each, i.e. 10 × 4 = 40 parameters; the Certificate (optional), Client key exchange and Certificate verify (optional) messages contribute 7 parameters each; and the Change cipher spec message contributes 6 parameters. In total there are 11 + 40 + 3 × 7 + 6 = 78 parameters, which are used as the features of the decision tree.
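The parameter count can be checked with a few lines of arithmetic. The formula for m itself exists only as an image in the original and is not reproduced in this text, so the heuristics below are assumptions: floor(log2 M) + 1 is one common choice that happens to give 7 for M = 78, while floor(sqrt(M)) gives 8. Neither is confirmed as the formula used in the patent.

```python
from math import floor, log2, sqrt

# Parameters contributed by each group of handshake messages in step S2.
counts = {
    "client_hello": 11,
    "server_messages": 4 * 10,       # four messages with 10 parameters each
    "client_key_messages": 3 * 7,    # three messages with 7 parameters each
    "change_cipher_spec": 6,
}
M = sum(counts.values())

# Two common heuristics for the per-split feature count (assumptions).
m_log = floor(log2(M)) + 1
m_sqrt = floor(sqrt(M))
print(M, m_log, m_sqrt)
```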
Therefore, the invention utilizes the importance of the random forest energy calculation parameters, only selects a small number of important characteristics of several dimensions to approximately represent the original data, thereby having the function of reducing the dimension of the data. In addition, the method can be defined according to the abnormal data characteristics, and when the data has a plurality of different characteristics, the method is used for characteristic selection, and key characteristics are selected to be used in an algorithm, so that an accurate prediction result is obtained.
The invention utilizes the natural parallelism of the random forest, can well process large-scale data, and can be easily used in a distributed environment.
The invention can also identify abnormal traffic within encrypted traffic messages by using a random forest algorithm, which provides technical support for user privacy protection and network security. In the identification process, the TLS handshake information in the encrypted traffic messages plays an important role, and the random forest algorithm is well suited to identifying abnormal encrypted traffic messages.
As shown in fig. 8, the abnormal TLS encrypted traffic detection system provided by the present invention may include the following units:
an obtaining unit, configured to obtain traffic message data sets of abnormal encrypted traffic and normal traffic, respectively;
an information preprocessing unit, configured to preprocess the acquired TLS traffic message data set of the normal traffic;
a model establishing and training unit, configured to establish a decision tree prediction model, feed data samples into a random forest model, and train the decision tree prediction model;
a classification prediction unit, configured to classify or predict the preprocessed new data by using a classification algorithm according to the generated rules.
As shown in fig. 8, the model establishing and training unit further comprises:
a decision tree generation module, configured to recursively analyze the training set to generate an inverted decision tree structure;
a rule generation module, configured to analyze the paths of the tree from the root node to the leaf nodes to generate a series of rules;
a random forest model generation module, configured to generate t decision trees and then form a random forest model.
The present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above-described abnormal TLS encrypted traffic detection method. The computer-readable storage medium may include, but is not limited to, any type of disk, including floppy disks, optical disks, DVDs, CD-ROMs, microdrives and magneto-optical disks, as well as ROMs, RAMs, EPROMs, EEPROMs, DRAMs, VRAMs, flash memory devices, magnetic or optical cards, nanosystems (including molecular memory ICs), or any other type of media or device suitable for storing instructions and/or data.
The invention also provides computer equipment comprising a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the steps of the method for detecting the abnormal TLS encrypted traffic. In the embodiment of the present invention, the processor is a control center of a computer system, and may be a processor of a physical machine or a processor of a virtual machine.
The foregoing description is only exemplary of the preferred embodiments of the invention and is not intended to limit the invention in any way as to its nature or form. Although the present invention has been described with reference to the preferred embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. However, any simple modification, equivalent replacement, improvement and the like of the above embodiments according to the technical spirit of the present invention should be included in the protection scope of the present invention without departing from the spirit and principle of the present invention.

Claims (7)

1. An abnormal TLS encrypted traffic detection method is characterized by comprising the following steps:
s1: respectively acquiring flow message data sets of abnormal encrypted flow and normal flow;
s2: carrying out information preprocessing on the acquired TLS flow message data set of the normal flow;
in step S2, the collected TLS traffic packet data set includes:
information collected from the Client hello message, specifically: packet length, packet inter-arrival time sequence, TLS record length, TLS record time, TLS content type, TLS handshake type, TLS cipher suite, TLS extension length, TLS extension type, TLS version number and TLS random number;
information collected from the Server hello, Certificate (optional), Certificate request (optional) and Server key exchange (optional) messages, specifically: packet length, packet inter-arrival time sequence, TLS record length, TLS record time, TLS content type, TLS handshake type, TLS cipher suite, TLS version, TLS session ID and TLS random number;
information collected from the Certificate (optional), Client key exchange and Certificate verify (optional) messages, specifically: packet length, packet inter-arrival time sequence, TLS record length, TLS content type, TLS handshake type, TLS version and TLS key length;
information collected from the Change cipher spec message, specifically: packet length, packet inter-arrival time sequence, TLS record length, TLS content type, TLS handshake type and TLS version;
s3: establishing a decision tree prediction model, introducing a data sample into a random forest model, and training the decision tree prediction model;
wherein, the step S3 specifically includes:
s3.1) carrying out recursive analysis on the training set to generate an inverted decision tree structure;
s3.2) analyzing the path of the tree from the root node to the leaf node to generate a series of rules;
s3.3) generating t decision trees, and then forming a random forest model;
wherein, step S3.1) specifically includes:
s3.1.1) letting N be the number of training samples, wherein the number of input samples of a single decision tree is N, and the N training samples are randomly extracted from the training set;
s3.1.2) letting M be the number of input features of a training sample and m be far smaller than M, randomly selecting m input features from the M input features when splitting at each node of each decision tree, and then selecting the best of the m input features for splitting, wherein m does not change while the decision tree is constructed;
s3.1.3) splitting each tree in this way until all training examples at a node belong to the same class, without pruning;
s4: classifying or predicting the preprocessed new data by using a classification algorithm according to the generated rule.
2. The abnormal TLS encrypted traffic detection method of claim 1, wherein:
in step S1, the abnormal encrypted traffic specifically includes:
a TLS encrypted traffic message that includes user information while the system is in a silent state; and
a message whose handshake request has a destination IP address different from the real IP address of the network resource accessed by the user.
3. The abnormal TLS encrypted traffic detection method of claim 1, wherein:
in step S1, acquiring the traffic data set of the normal traffic includes:
configuring all IP address types in the silent state as IP addresses that are not allowed to be accessed, then simulating users normally accessing various types of mainstream websites through browsers, and marking the messages generated during that period as normal traffic.
4. The abnormal TLS encrypted traffic detection method of claim 1, wherein:
in step S4, in the random forest classification algorithm, m = 7, calculated as:
$m = \log_2 M + 1$.
5. An abnormal TLS encrypted traffic detection system, characterized by comprising the following units:
an obtaining unit, configured to obtain flow message data sets of an abnormal encrypted flow and a normal flow, respectively;
the information preprocessing unit is used for preprocessing the acquired TLS flow message data set of the normal flow;
the collected TLS flow message data set comprises:
information collected from the Client hello message, specifically: packet length, packet inter-arrival time sequence, TLS record length, TLS record time, TLS content type, TLS handshake type, TLS cipher suite, TLS extension length, TLS extension type, TLS version number and TLS random number;
information collected from the Server hello, Certificate (optional), Certificate request (optional) and Server key exchange (optional) messages, specifically: packet length, packet inter-arrival time sequence, TLS record length, TLS record time, TLS content type, TLS handshake type, TLS cipher suite, TLS version, TLS session ID and TLS random number;
information collected from the Certificate (optional), Client key exchange and Certificate verify (optional) messages, specifically: packet length, packet inter-arrival time sequence, TLS record length, TLS content type, TLS handshake type, TLS version and TLS key length;
information collected from the Change cipher spec message, specifically: packet length, packet inter-arrival time sequence, TLS record length, TLS content type, TLS handshake type and TLS version;
the model establishing and training unit is used for establishing a decision tree prediction model, introducing a data sample into a random forest model and training the decision tree prediction model;
wherein, the model establishing and training unit specifically comprises:
the recursion module is used for carrying out recursion analysis on the training set to generate an inverted decision tree structure;
the analysis module analyzes the path of the tree from the root node to the leaf node and generates a series of rules;
the generation model module is used for generating t decision trees and then forming a random forest model;
wherein, the recursion module specifically comprises:
letting N be the number of training samples, wherein the number of input samples of a single decision tree is N, and the N training samples are randomly extracted from the training set;
letting M be the number of input features of a training sample and m be far smaller than M, so that when splitting at each node of each decision tree, m input features are randomly selected from the M input features and the best of the m input features is then selected for splitting, wherein m does not change while the decision tree is constructed;
splitting each tree in this way until all training examples at a node belong to the same class, without pruning;
and the classification prediction unit is used for classifying or predicting the preprocessed new data by using a classification algorithm according to the generated rule.
6. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 4.
7. A terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1-4 when executing the program.
CN202011614293.4A 2020-12-31 2020-12-31 Abnormal TLS encrypted traffic detection method and system Active CN112822167B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011614293.4A CN112822167B (en) 2020-12-31 2020-12-31 Abnormal TLS encrypted traffic detection method and system

Publications (2)

Publication Number Publication Date
CN112822167A CN112822167A (en) 2021-05-18
CN112822167B true CN112822167B (en) 2023-04-07

Family

ID=75856369

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011614293.4A Active CN112822167B (en) 2020-12-31 2020-12-31 Abnormal TLS encrypted traffic detection method and system

Country Status (1)

Country Link
CN (1) CN112822167B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113364792B (en) * 2021-06-11 2022-07-12 奇安信科技集团股份有限公司 Training method of flow detection model, flow detection method, device and equipment
CN113765911A (en) * 2021-09-02 2021-12-07 恒安嘉新(北京)科技股份公司 Method, device, equipment and storage medium for detecting webshell encrypted flow
CN114338070B (en) * 2021-09-03 2023-05-30 中国电子科技集团公司第三十研究所 Shadowsocks (R) identification method based on protocol attribute
CN114866281B (en) * 2022-03-25 2023-07-21 中国科学院计算技术研究所 Method for deploying random forest model on P4 switch
CN114884715A (en) * 2022-04-27 2022-08-09 深信服科技股份有限公司 Flow detection method, detection model training method, device and related equipment
CN115314265B (en) * 2022-07-27 2023-07-18 天津市国瑞数码安全系统股份有限公司 Method and system for identifying TLS (transport layer security) encryption application based on traffic and time sequence
CN117411731B (en) * 2023-12-15 2024-03-01 江西师范大学 Encryption DDOS flow anomaly detection method based on LOF algorithm

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109379377A (en) * 2018-11-30 2019-02-22 极客信安(北京)科技有限公司 Encrypt malicious traffic stream detection method, device, electronic equipment and storage medium
CN111385145A (en) * 2020-03-04 2020-07-07 南京信息工程大学 Encryption flow identification method based on ensemble learning

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105447525A (en) * 2015-12-15 2016-03-30 中国科学院软件研究所 Data prediction classification method and device
US10536268B2 (en) * 2017-08-31 2020-01-14 Cisco Technology, Inc. Passive decryption on encrypted traffic to generate more accurate machine learning training data
CN108833360B (en) * 2018-05-23 2019-11-08 四川大学 A kind of malice encryption method for recognizing flux based on machine learning
CN110138745B (en) * 2019-04-23 2021-08-24 极客信安(北京)科技有限公司 Abnormal host detection method, device, equipment and medium based on data stream sequence
CN110113349A (en) * 2019-05-15 2019-08-09 北京工业大学 A kind of malice encryption traffic characteristics analysis method
CN111277587A (en) * 2020-01-19 2020-06-12 武汉思普崚技术有限公司 Malicious encrypted traffic detection method and system based on behavior analysis

Also Published As

Publication number Publication date
CN112822167A (en) 2021-05-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 311215 Room 216, Floor 2, Building B, No. 858, Jianshe Second Road, Xiaoshan Economic and Technological Development Zone, Xiaoshan District, Hangzhou City, Zhejiang Province

Applicant after: Hangzhou Zhongdian Anke Modern Technology Co.,Ltd.

Address before: 310051 building 3, 351 Changhe Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province

Applicant before: Hangzhou rischen Anke Technology Co.,Ltd.

GR01 Patent grant