CN113141375A - Network security monitoring method and device, storage medium and server - Google Patents

Publication number
CN113141375A
CN113141375A CN202110498132.1A CN202110498132A CN113141375A CN 113141375 A CN113141375 A CN 113141375A CN 202110498132 A CN202110498132 A CN 202110498132A CN 113141375 A CN113141375 A CN 113141375A
Authority
CN
China
Prior art keywords
flow
adopting
features
encrypted
log
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110498132.1A
Other languages
Chinese (zh)
Inventor
马保银
杨全才
刘征
谢君鹏
孙蒙
冯继强
王刚
李一波
白凌
雷宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kashgar Power Supply Co Of State Grid Xinjiang Electric Power Co ltd
Original Assignee
Kashgar Power Supply Co Of State Grid Xinjiang Electric Power Co ltd
Application filed by Kashgar Power Supply Co Of State Grid Xinjiang Electric Power Co ltd
Priority to CN202110498132.1A
Publication of CN113141375A
Legal status: Withdrawn

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/02 Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • H04L63/0227 Filtering policies
    • H04L63/0263 Rule management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/20 Network architectures or network communication protocols for network security for managing network security; network security policies in general

Abstract

The invention discloses a network security monitoring method, a device, a storage medium and a server, belonging to the technical field of internet security. The network security monitoring method comprises the following steps: filtering the encrypted traffic in the public network with a filter to collect the traffic; extracting feature categories and features from the collected traffic; and training on sample data according to the feature categories and features with a preset algorithm to generate a model classification accuracy result. The method needs no interceptor, reduces cost and computing power requirements, does not decrypt the traffic, does not affect network performance, and preserves privacy during traffic communication.

Description

Network security monitoring method and device, storage medium and server
Technical Field
The invention belongs to the technical field of internet security, and particularly relates to a network security monitoring method, a network security monitoring device, a storage medium and a server.
Background
Computer networks are an important means by which people learn about society and obtain information through modern information technology. Network security management is the fundamental guarantee that people can use the internet safely, cleanly and healthily.
To ensure communication security and privacy and to counter eavesdropping and man-in-the-middle attacks, HTTPS is becoming widespread and more and more network traffic is encrypted. However, attackers can use the same mechanism to hide their information and whereabouts: by wrapping malware communication in a TLS/SSL layer, they disguise it as normal traffic and evade detection while attacking and infecting targets.
In recent years, the detection of encrypted malicious traffic has been a focus of attention in the field of network security. The inventors found that, in the prior art, industrial gateway devices mainly detect attacks by decrypting traffic, but decryption consumes a large amount of resources and is costly, and the decryption process is strictly limited by laws and regulations on privacy protection.
Disclosure of Invention
In order to at least solve the technical problems, the invention provides a network security monitoring method, a network security monitoring device, a storage medium and a server.
According to a first aspect of the present invention, there is provided a network security monitoring method, including:
filtering the encrypted flow in the public network by adopting a filter, and acquiring the flow;
extracting feature types and features of the collected flow;
and training the sample data according to the feature type and the features by adopting a preset algorithm to generate a model classification accuracy result.
Further, in the above-mentioned case,
the filtering of the encrypted flow in the public network by adopting a filter for flow acquisition comprises the following steps:
and capturing network data packets according to a preset filtering rule by using wireshark as the filter, and generating a pcap (packet capture) file as the collected traffic.
Further, in the above-mentioned case,
the filtering of the encrypted flow in the public network by adopting a filter for flow acquisition comprises the following steps:
and extracting information logs from the captured HTTPS traffic by deep packet analysis, wherein the information logs comprise a connection communication log, an SSL protocol log and a certificate log.
Further, in the above-mentioned case,
the extracting of the feature category and the feature of the collected flow comprises the following steps:
and obtaining the features of the collected traffic by analyzing the header information of the HTTPS data packets, capturing the network data packets by using wireshark, and generating a pcap file to obtain the traffic feature categories.
Further, in the above-mentioned case,
the extracting features of the collected flow comprises the following steps:
and creating a connection 4-tuple through data from the connection log, the SSL protocol log and the certificate log, and extracting features.
Further, in the above-mentioned case,
the training of the sample data according to the feature category and the features by adopting a preset algorithm to generate a model classification accuracy result comprises: training the sample data by adopting a suitable machine learning algorithm as a classifier to generate a corresponding classification model, and calculating the accuracy on the sample data based on the classification model to obtain the classification accuracy result;
the sample data includes encrypted malicious traffic and encrypted benign traffic.
Further, in the above-mentioned case,
the preset algorithm comprises one of the following: L1-regularized logistic regression, support vector machine, random forest, and extreme gradient boosting.
According to a second aspect of the present invention, a network security monitoring apparatus comprises:
the acquisition module is used for filtering the encrypted flow in the public network by adopting a filter to acquire the flow;
the characteristic extraction module is used for extracting characteristic categories and characteristics of the acquired flow;
and the effect analysis module is used for training the sample data according to the feature type and the features by adopting a preset algorithm to generate a model classification accuracy result.
According to a third aspect of the present invention, a network security monitoring server comprises a memory, a processor, and a computer program stored on the memory and executable on the processor,
the processor, when executing the program, performs the steps of the method of any of the first aspect.
According to a fourth aspect of the invention, a computer readable storage medium stores a program which, when executed, is capable of implementing a method as defined in any one of the above.
The invention has the beneficial effects that: filtering encrypted flow in a public network by adopting a filter, collecting the flow, and extracting feature types and features; and training the sample data according to the feature type and the features by adopting a preset algorithm to generate a model classification accuracy result. The method does not need to use an interceptor, reduces the cost and the computing power, does not need to decrypt the flow on the premise of not influencing the network performance, and ensures the privacy in the flow communication process.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which,
fig. 1 is a flowchart of a network security monitoring method provided in the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative only and should not be construed as limiting the invention.
In order to more clearly illustrate the invention, the invention is further described below with reference to preferred embodiments and the accompanying drawings. Similar parts in the figures are denoted by the same reference numerals. It is to be understood by persons skilled in the art that the following detailed description is illustrative and not restrictive, and is not to be taken as limiting the scope of the invention.
In a first aspect of the present invention, a network security monitoring method is provided, as shown in fig. 1, including:
step 201: filtering the encrypted flow in the public network by adopting a filter, and acquiring the flow;
In this disclosure, a tool such as wireshark is used as the filter: network data packets are captured according to a preset filtering rule and a pcap file is generated as the filtered traffic.
Furthermore, with a tool such as wireshark as the filter, encrypted benign traffic and encrypted malicious traffic in the public network are obtained separately according to preset filtering rules, and the captured encrypted malicious traffic is saved as pcap files, so that the detection task is converted into a binary classification problem in machine learning.
In another embodiment of the invention, wireshark is adopted to collect the flow of the HTTPS packet.
Deep packet analysis can be used to extract sufficiently informative logs from the captured HTTPS traffic, including a connection communication log, an SSL protocol log and a certificate log.
From these 3 logs the following information can be obtained: connection records, SSL records and certificate records.
Each row of the connection log is a connection record that aggregates a set of packets and describes the connection between two endpoints. A connection record contains information such as IP addresses, ports, protocol, connection state, number of packets and label.
An SSL record describes the SSL/TLS handshake and the establishment of the encrypted connection. It contains the SSL/TLS version, the cipher used, the server name, the certificate path, the subject, the certificate issuer, etc.
Each row of the certificate log is a certificate record that describes certificate information, such as the certificate serial number, common name, validity period, subject, signature algorithm and key length in bits.
Deep analysis of the traffic packets generates the log data, and each row in any log has a unique key for linking it to rows in the other logs.
The unique key in a connection log record can be matched with the unique key in the SSL protocol log to associate the 2 records.
Through a column of comma-separated id key values in the SSL protocol log, the certificate record corresponding to each id can be found in the certificate log.
After traffic analysis, the certificate path is stored in the certificate path column of the SSL protocol log; this column holds the id key values of all certificates, and each comma-separated id value corresponds to one certificate record in the certificate log.
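The key-based association of the three logs can be sketched as follows. This is only an illustration under stated assumptions: the field names `uid`, `cert_chain_ids` and `id`, and the dictionary record layout, are hypothetical and are not specified in this disclosure.

```python
def join_logs(conn_rows, ssl_rows, cert_rows):
    """Associate connection, SSL and certificate records through their keys.

    Assumed (hypothetical) layout: each row is a dict; connection and SSL rows
    share the key "uid"; an SSL row carries a comma-separated string of
    certificate ids in "cert_chain_ids"; a certificate row has the key "id".
    """
    ssl_by_uid = {row["uid"]: row for row in ssl_rows}
    cert_by_id = {row["id"]: row for row in cert_rows}
    joined = []
    for conn in conn_rows:
        ssl = ssl_by_uid.get(conn["uid"])
        if ssl is None:
            continue  # no SSL record shares this connection's key
        ids = [i for i in ssl.get("cert_chain_ids", "").split(",") if i]
        certs = [cert_by_id[i] for i in ids if i in cert_by_id]
        joined.append({"conn": conn, "ssl": ssl, "certs": certs})
    return joined
```

Connections without a matching SSL record are skipped here; whether they should instead be kept for the connection features is a design choice the text leaves open.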
Step 202: extracting feature types and features of the collected flow;
In this disclosure, the features of the collected traffic can be obtained by analyzing the header information of the HTTPS packets. The TLS handshake messages that carry this information are transmitted over the network in clear text, so a tool such as wireshark can be used to capture network packets and generate a pcap file, from which the traffic feature categories are obtained.
The extracted traffic feature categories may be classified into data element statistics, TLS features, and context data features.
The statistical characteristics of the data elements comprise the size of the data packet, the arrival time sequence and the byte distribution.
The TLS features include the cipher suites and TLS extensions offered by the client, the client public key length, the cipher suite selected by the server, and certificate information. The certificate information includes whether the certificate is a non-CA self-signed certificate, the number of entries in the X.509 SAN extension, the validity period, and the like.
The contextual data features can be subdivided into DNS data flow features and HTTP data flow features. The DNS features cover the domain name length in the DNS response, the ratio of digit to non-digit characters in the domain name, the TTL value, the number of IP addresses returned in the DNS response, and the ranking of the domain name in the Alexa list. The HTTP features cover several fields of the inbound and outbound HTTP messages and the HTTP response code; these fields include Set-Cookie, Location, Expires, Content-Type, Server, etc.
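As an illustration of the DNS features listed above, the following sketch computes the domain name length, the digit to non-digit character ratio, the TTL and the number of returned IP addresses. The function name, the argument layout and the use of -1 when the ratio is undefined are assumptions made for this example; the Alexa ranking is omitted because it needs an external list.

```python
def dns_features(domain, ttl, answers):
    """Compute simple DNS features for one response.

    domain: queried domain name (str); ttl: TTL value of the response;
    answers: list of IP address strings returned by the response.
    Dots are counted as non-digit characters (an assumption).
    """
    digits = sum(ch.isdigit() for ch in domain)
    non_digits = len(domain) - digits
    return {
        "name_length": len(domain),
        # -1.0 marks a ratio that cannot be computed (assumed convention)
        "digit_ratio": digits / non_digits if non_digits else -1.0,
        "ttl": ttl,
        "ip_count": len(answers),
    }
```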
Joy is used to extract data features from a live network flow or a pcap file, including information such as clientHello, serverHello, certificate and clientKeyExchange, and represents the data features in JSON. Joy also includes an analysis tool (sleuth) that can be applied to these data files; sleuth is used to extract the specific feature information required.
In the invention, besides paying attention to the traditional flow characteristics such as the size of the data packet and some parameters related to time, Joy analyzes the initial data packet of the encrypted connection and fully utilizes the unencrypted field to extract the data characteristic elements from the encrypted packet.
For example, Joy's configuration commands in an ubuntu environment are as follows:
sudo apt-get install build-essential libssl-dev libpcap-dev libcurl4-openssl-dev
git clone https://github.com/cisco/joy.git
cd joy
./config
make
In the method, Joy extracts data features of the tls/dns/http types from the pcap files in the data directory, and the extracted JSON result files are stored in the features directory. For example:
./joy output=features bidir=1 tls=1 dns=1 http=1 data/*
The specific feature information required is then extracted with sleuth:
./sleuth bin/features/* --select "tls{cs,c_extensions,c_key_length,s_cs,s_extensions,s_cert[{validity_not_before,validity_not_after}]}"
In another embodiment of the invention, a connection 4-tuple is created by data from a connection log, an SSL protocol log, and a certificate log, and features are extracted for machine learning model training.
Further, the connection log and the SSL protocol log are joined on their id fields; then, from the certificate path in the resulting conn_ssl.log, the first key is taken and joined with the id in the certificate log. The association result is grouped and aggregated by identical connection 4-tuples (source IP, destination IP, destination port and protocol), and features are then extracted for each connection 4-tuple from the aggregation result.
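The grouping step can be sketched as follows, assuming the connection 4-tuple is (source IP, destination IP, destination port, protocol) and that each associated record is a dictionary with the hypothetical keys `src_ip`, `dst_ip`, `dst_port` and `proto`.

```python
from collections import defaultdict


def group_by_4tuple(records):
    """Group associated records by their connection 4-tuple.

    The key names below are illustrative assumptions, not names taken
    from this disclosure.
    """
    groups = defaultdict(list)
    for rec in records:
        key = (rec["src_ip"], rec["dst_ip"], rec["dst_port"], rec["proto"])
        groups[key].append(rec)
    return dict(groups)
```

Each value of the returned dictionary is the list of records over which the per-tuple features are then aggregated.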
For each connected 4-tuple, 37 features are extracted. The features are created based on thorough analysis of malware data. For these features, we divided them into 3 groups: connection feature, SSL feature, certificate feature.
Wherein the connection characteristics are based on characteristics of the connection record describing common behavior of the communication flow independent of credentials and encryption.
The SSL feature is a feature based on SSL records, describing SSL handshakes and information of encrypted communications.
The certificate features are based on the certificate records and describe the information that the web server presents in its certificate during the SSL handshake. Each feature is a floating point value, which is -1 if the feature cannot be computed due to lack of information.
Further, there are 12 connection features in total, including the following.
The number of aggregated SSL and connection records, i.e. the total number of SSL and connection records contained in each connection 4-tuple.
The duration mean, i.e. the mean of the duration parameter over the connections of each connection 4-tuple.
The duration standard deviation, i.e. the standard deviation of the duration parameter over the connections of each connection 4-tuple.
The ratio of durations outside the standard deviation range, i.e. the percentage of all duration values of each connection 4-tuple that fall outside the range whose upper limit is the mean plus the standard deviation and whose lower limit is the mean minus the standard deviation.
The total size of sent packets, i.e. the number of payload bytes sent over all connections of each 4-tuple.
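The three duration features can be sketched as follows. The use of the population standard deviation is an assumption, since the text does not say which variant is meant; the value -1 for an empty input follows the convention stated for features that cannot be computed.

```python
from statistics import mean, pstdev


def duration_features(durations):
    """Mean, standard deviation and out-of-range ratio of connection durations.

    durations: list of duration values for one connection 4-tuple.
    """
    if not durations:
        return {"mean": -1.0, "std": -1.0, "out_of_range_ratio": -1.0}
    mu = mean(durations)
    sigma = pstdev(durations)  # population standard deviation (assumption)
    upper, lower = mu + sigma, mu - sigma
    outside = sum(1 for d in durations if d > upper or d < lower)
    return {
        "mean": mu,
        "std": sigma,
        "out_of_range_ratio": outside / len(durations),
    }
```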
In another embodiment of the present invention, there are 10 SSL features in total, including the following.
The ratio of SSL connections in the connection records, i.e. the ratio between the numbers of non-SSL connections and SSL connections in the connection 4-tuple.
The ratio of TLS to SSL, i.e. the TLS version distribution in the connection 4-tuple.
The SNI ratio, i.e. the proportion of connections in the connection 4-tuple whose server_name is not empty.
SNI is IP, i.e. the proportion of connections in the connection 4-tuple whose server_name is an IP address.
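The two SNI features can be sketched as follows. Using the total number of SSL connections as the denominator of both ratios is an assumption, as the text does not fix it; the helper name `is_ip` is introduced only for this example.

```python
import ipaddress


def is_ip(value):
    """Return True if value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def sni_features(server_names):
    """SNI ratio and SNI-is-IP ratio for one connection 4-tuple.

    server_names: one entry per SSL connection; None or "" when absent.
    """
    if not server_names:
        return {"sni_ratio": -1.0, "sni_is_ip_ratio": -1.0}
    present = [s for s in server_names if s]
    n = len(server_names)
    return {
        "sni_ratio": len(present) / n,
        "sni_is_ip_ratio": sum(is_ip(s) for s in present) / n,
    }
```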
In another embodiment of the present invention, there are 15 certificate features in total, including the following.
The public key mean, i.e. the mean of the exponent of all certificates in the connection 4-tuple.
The mean certificate validity period, i.e. the mean number of days of validity of all certificates in the connection 4-tuple.
The standard deviation of the certificate validity period, i.e. the standard deviation of the number of days of validity of all certificates in the connection 4-tuple.
The validity of the certificates during the capture, i.e. the proportion of certificates in the connection 4-tuple that had not expired at capture time.
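The validity-related certificate features can be sketched as follows, assuming each certificate is given as a (not_before, not_after) pair of `datetime` values and that "not expired" means the capture time is no later than not_after. The population standard deviation is again an assumption.

```python
from datetime import datetime
from statistics import mean, pstdev


def certificate_features(certs, capture_time):
    """Validity-period features for the certificates of one 4-tuple.

    certs: list of (not_before, not_after) datetime pairs (assumed layout).
    capture_time: datetime at which the traffic was captured.
    """
    if not certs:
        return {"validity_mean": -1.0, "validity_std": -1.0, "valid_ratio": -1.0}
    days = [(na - nb).total_seconds() / 86400 for nb, na in certs]
    not_expired = sum(1 for _, na in certs if capture_time <= na)
    return {
        "validity_mean": mean(days),
        "validity_std": pstdev(days),
        "valid_ratio": not_expired / len(certs),
    }
```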
Step 203: and training the sample data according to the feature type and the features by adopting a preset algorithm to generate a model classification accuracy result.
In the method, a suitable machine learning algorithm is adopted as the preset algorithm and used as a classifier to train on the sample data, which comprise encrypted malicious traffic and encrypted benign traffic; a corresponding classification model is generated, and the accuracy on the sample data is calculated based on the classification model to obtain the classification accuracy result. The preset algorithm includes a support vector machine (SVM) or a random forest.
In the invention, a support vector machine algorithm is adopted to train the sample data according to the feature type and the features, and the method comprises the following steps:
the feature values of the labeled training data are taken as input, and model training is performed with the SVM classifier of LibSVM.
Further, the feature values of the labeled training data are calculated as follows: let the collected traffic file be F; taking any k consecutive bytes in the file as one element, the entropy of the set $S'$ formed by all runs of k consecutive bytes in the file is calculated, and the relative entropy corresponding to this set is $h_k$:
[Equation image BDA0003055274680000121: definition of $h_k$; the formula is not recoverable from the extracted text]
where $m_{ik}$ is the frequency of occurrence of the i-th element of the set $f_k$. This calculation yields $h_0, h_1, h_2, h_3$.
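Because the exact formula for the relative entropy is contained in an equation image that is not reproduced here, the following sketch substitutes the standard Shannon entropy over all runs of k consecutive bytes. It is a stand-in for illustration, not the formula of this disclosure; the function name and the -1 return value for inputs that are too short are assumptions.

```python
import math
from collections import Counter


def kgram_entropy(data, k):
    """Shannon entropy (in bits) of the k-byte elements of a byte string.

    Every run of k consecutive bytes (sliding window, step 1) is one element.
    Returns -1.0 when no element can be formed (assumed convention).
    """
    if k < 1 or len(data) < k:
        return -1.0
    counts = Counter(data[i:i + k] for i in range(len(data) - k + 1))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Encrypted payloads tend to give values close to the maximum for the element size, which is why entropy is a common feature for this kind of classification.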
For each data file to be processed, a Monte Carlo estimate of π is calculated: the bit stream is divided into groups of 48 bits, the first 24 bits of each group are taken as montex and the last 24 bits as montey, and montex and montey are used to determine whether the point represented by the 48-bit group falls inside a circular area. The Monte Carlo value of π is estimated from the number of points that fall inside the circular area, and the difference between this estimate and the true value of π is taken as the error value $P_{error}$ of the Monte Carlo simulation. These values are used as the feature values of the labeled training data.
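The Monte Carlo step can be sketched as follows. The details not stated in the text are assumptions: the circle is taken as the quarter circle of radius 2^24 - 1 centred at the origin, bytes are read big-endian, trailing bytes that do not fill a 48-bit group are ignored, and the error is the absolute difference.

```python
import math


def monte_carlo_pi_error(data):
    """Estimate pi from 48-bit groups of a byte string and return |estimate - pi|.

    Each group of 6 bytes gives a point: the first 24 bits are x (montex),
    the last 24 bits are y (montey). Returns -1.0 if no full group exists.
    """
    radius = (1 << 24) - 1
    inside = 0
    total = 0
    for i in range(0, len(data) - 5, 6):
        x = int.from_bytes(data[i:i + 3], "big")
        y = int.from_bytes(data[i + 3:i + 6], "big")
        total += 1
        if x * x + y * y <= radius * radius:
            inside += 1
    if total == 0:
        return -1.0
    # A quarter circle covers pi/4 of the square, hence the factor 4.
    return abs(4.0 * inside / total - math.pi)
```

For uniformly random bytes, as expected from encrypted traffic, the error approaches zero as the file grows; structured plaintext usually gives a larger error.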
The traffic is analyzed and judged with the classification model generated in the classifier training stage, and the classification result is determined. The decision evaluation includes the model classification accuracy result $P_r$, where $P_r = T_P/(T_P + F_P)$, $T_P$ is the number of encrypted samples that are correctly labeled, and $F_P$ is the number of non-encrypted samples that are mislabeled as encrypted.
Furthermore, the decision evaluation can also include the recall $R_e$ and the comprehensive evaluation $F_m$, where $R_e = T_P/(T_P + F_N)$ and $F_m = 2 P_r R_e/(P_r + R_e)$. The classification accuracy result and the recall reflect the identification effect of the method, and the comprehensive evaluation combines accuracy and recall into a more complete assessment: the higher the comprehensive evaluation result, the better the encrypted traffic classification performed by the method.
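The three evaluation values can be computed directly from the counts, as in this short sketch. Returning 0.0 when a denominator is zero is an assumption made for the example.

```python
def decision_metrics(tp, fp, fn):
    """Return (P_r, R_e, F_m) from true positive, false positive and false negative counts."""
    pr = tp / (tp + fp) if tp + fp else 0.0
    re = tp / (tp + fn) if tp + fn else 0.0
    fm = 2 * pr * re / (pr + re) if pr + re else 0.0
    return pr, re, fm
```

For instance, with 8 true positives, 2 false positives and 2 false negatives, all three values are 0.8.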
In another embodiment of the present invention, training sample data according to the feature type and the features by using a random forest algorithm to generate a model classification accuracy result, comprising:
the method for constructing the random forest by using Bagging specifically comprises the following steps:
step a1: constructing samples by random sampling with replacement from the sample data; for example, starting from the data $(X_1, Y_1), \ldots, (X_n, Y_n)$, a bootstrap sample is constructed by drawing n times at random;
step a2: constructing a decision tree for each bootstrap sample;
step a3: repeating step a1 and step a2 to obtain a plurality of decision trees;
step a4: letting each decision tree vote on the input vector X, counting all votes, taking the class with the most votes as the classification label of the vector X, and taking the proportion of labels that differ from the correct classification label as the misclassification rate of the random forest;
step a5: calculating the true positives TP and the false positives FP on the sample data, and calculating the model classification accuracy result from the obtained TP and FP.
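Steps a1 to a4 can be sketched as follows. To keep the example short, a one-split decision stump on a single scalar feature stands in for a full decision tree, so this illustrates Bagging with majority voting rather than a complete random forest; all function names are introduced only for this example.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Step a1: draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]


def train_stump(sample):
    """Step a2 (simplified): fit a one-split tree on (x, y) pairs with scalar x."""
    best = None
    for threshold in sorted({x for x, _ in sample}):
        left = [y for x, y in sample if x <= threshold]
        right = [y for x, y in sample if x > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label


def train_forest(data, n_trees, seed=0):
    """Step a3: repeat steps a1 and a2 to obtain n_trees trees."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(data, rng)) for _ in range(n_trees)]


def predict(forest, x):
    """Step a4: every tree votes; the class with the most votes wins."""
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]


def misclassification_rate(forest, data):
    """Step a4: share of samples whose voted label differs from the true label."""
    return sum(predict(forest, x) != y for x, y in data) / len(data)
```

A full implementation would grow deeper trees over all 37 features and usually also sample a random subset of features at each split.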
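Further, let $n_{ij}$ denote the number of samples of actual type i that the classification model predicts as type j. The number of samples of actual type i that are correctly predicted by the classification model is $TP_i = n_{ii}$, and this result is taken as the true positives TP;
the number of samples of actual type i that are misjudged as another type by the classification model is $FN_i = \sum_{j \neq i} n_{ij}$, and this result is taken as the false negatives FN; the number of samples of actual type other than i that are misjudged as type i by the classification model is $FP_i = \sum_{j \neq i} n_{ji}$, and this result is taken as the false positives FP; the calculated model classification accuracy result OA is:
[Equation image BDA0003055274680000141: formula for OA; not recoverable from the extracted text]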
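The per-class counts can be read off a confusion matrix as sketched below, assuming `matrix[a][b]` holds the number of samples of actual type a predicted as type b. Since the formula for OA is in an equation image that is not reproduced here, `overall_accuracy` uses the common definition (correctly classified samples divided by all samples) as an assumption.

```python
def per_class_counts(matrix, i):
    """Return (TP_i, FN_i, FP_i) for class i of a square confusion matrix."""
    n = len(matrix)
    tp = matrix[i][i]
    fn = sum(matrix[i][j] for j in range(n) if j != i)  # actual i, predicted other
    fp = sum(matrix[j][i] for j in range(n) if j != i)  # actual other, predicted i
    return tp, fn, fp


def overall_accuracy(matrix):
    """Share of correctly classified samples (assumed definition of OA)."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        return -1.0
    return sum(matrix[i][i] for i in range(len(matrix))) / total
```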
In another embodiment of the invention, the data set is selected as follows. For the negative samples, a recent batch of 100,000 malware samples is run in a sandbox and the traffic they generate is captured. For the positive samples, one part is normal traffic from a daily office network; in addition, a crawler visits the 10,000 most-visited websites in the Alexa ranking, and the generated traffic is collected as the other part of the data set.
It should be noted that, in the present invention, the preset algorithm may also be L1-regularized logistic regression (l1-logistic regression) or extreme gradient boosting (XGBoost).
In a second aspect of the present invention, there is provided a network security monitoring apparatus, comprising:
the acquisition module is used for filtering the encrypted flow in the public network by adopting a filter to acquire the flow;
In this disclosure, the acquisition module is configured to use a tool such as wireshark as the filter, capture network data packets according to a preset filtering rule, and generate a pcap file as the filtered traffic.
Further, the acquisition module is configured to use a tool such as wireshark as the filter, obtain encrypted benign traffic and encrypted malicious traffic in the public network separately according to preset filtering rules, and save the captured encrypted malicious traffic as pcap files, so that the detection task is converted into a binary classification problem in machine learning.
In another embodiment of the invention, wireshark is adopted to collect the flow of the HTTPS packet.
Deep packet analysis can be used to extract sufficiently informative logs from the captured HTTPS traffic, including a connection communication log, an SSL protocol log and a certificate log.
From these 3 logs the following information can be obtained: connection records, SSL records and certificate records.
Each row of the connection log is a connection record that aggregates a set of packets and describes the connection between two endpoints. A connection record contains information such as IP addresses, ports, protocol, connection state, number of packets and label.
An SSL record describes the SSL/TLS handshake and the establishment of the encrypted connection. It contains the SSL/TLS version, the cipher used, the server name, the certificate path, the subject, the certificate issuer, etc.
Each row of the certificate log is a certificate record that describes certificate information, such as the certificate serial number, common name, validity period, subject, signature algorithm and key length in bits.
Deep analysis of the traffic packets generates the log data, and each row in any log has a unique key for linking it to rows in the other logs.
The unique key in a connection log record can be matched with the unique key in the SSL protocol log to associate the 2 records.
Through a column of comma-separated id key values in the SSL protocol log, the certificate record corresponding to each id can be found in the certificate log.
After traffic analysis, the certificate path is stored in the certificate path column of the SSL protocol log; this column holds the id key values of all certificates, and each comma-separated id value corresponds to one certificate record in the certificate log.
The characteristic extraction module is used for extracting characteristic categories and characteristics of the acquired flow;
In this disclosure, the feature extraction module is configured to obtain the features of the collected traffic by analyzing the header information of the HTTPS packets. The TLS handshake messages that carry this information are transmitted over the network in clear text, so a tool such as wireshark can be used to capture network packets and generate a pcap file, from which the traffic feature categories are obtained.
The feature extraction module is further used for extracting traffic feature categories which can be divided into data element statistical features, TLS features and context data features.
The statistical characteristics of the data elements comprise the size of the data packet, the arrival time sequence and the byte distribution.
The TLS features include the cipher suites and TLS extensions offered by the client, the client public key length, the cipher suite selected by the server, and certificate information. The certificate information includes whether the certificate is a non-CA self-signed certificate, the number of entries in the X.509 SAN extension, the validity period, and the like.
The contextual data features can be subdivided into DNS data flow features and HTTP data flow features. The DNS features cover the domain name length in the DNS response, the ratio of digit to non-digit characters in the domain name, the TTL value, the number of IP addresses returned in the DNS response, and the ranking of the domain name in the Alexa list. The HTTP features cover several fields of the inbound and outbound HTTP messages and the HTTP response code; these fields include Set-Cookie, Location, Expires, Content-Type, Server, etc.
Joy is used to extract data features from a live network flow or a pcap file, including information such as clientHello, serverHello, certificate and clientKeyExchange, and represents the data features in JSON. Joy also includes an analysis tool (sleuth) that can be applied to these data files; sleuth is used to extract the specific feature information required.
In the invention, besides paying attention to the traditional flow characteristics such as the size of the data packet and some parameters related to time, Joy analyzes the initial data packet of the encrypted connection and fully utilizes the unencrypted field to extract the data characteristic elements from the encrypted packet.
For example, the commands for setting up Joy in an Ubuntu environment are as follows:
sudo apt-get install build-essential libssl-dev libpcap-dev libcurl4-openssl-dev
git clone https://github.com/cisco/joy.git
cd joy
./config
make
In the method, Joy is used to extract the tls/dns/http data features from the pcap files in the data directory, and the extracted JSON result file is stored in the features directory. For example:
./joy output=features bidir=1 tls=1 dns=1 http=1 data/*
Sleuth is then used to extract the specific feature information required:
./sleuth bin/features/* --select "tls{cs,c_extensions,c_key_length,s_cs,s_extensions,s_cert[{validity_not_before,validity_not_after}]}"
In another embodiment of the invention, a connection 4-tuple is created by data from a connection log, an SSL protocol log, and a certificate log, and features are extracted for machine learning model training.
Further, the connection log and the SSL protocol log are joined on their id fields. Then, following the certificate path in conn_ssl.log, the first key is taken and joined with the id in the certificate log. In the joined result, records that share the same connection 4-tuple (source IP, destination IP, destination port and protocol) are grouped and aggregated, and features are then extracted for each connection 4-tuple from the aggregated result.
For each connection 4-tuple, 37 features are extracted. The features were created on the basis of a thorough analysis of malware data. The features are divided into 3 groups: connection features, SSL features and certificate features.
The connection features are based on the connection records and describe the common behavior of the communication flow, independent of certificates and encryption.
The SSL features are based on the SSL records and describe the SSL handshake and information about the encrypted communication.
The certificate features are based on the certificate records and describe the information in the certificate that the web server presents during the SSL handshake. Each feature is a floating-point value, which is set to -1 if the feature cannot be computed due to lack of information.
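The join and grouping described here can be sketched in Python as follows. This is only an illustrative sketch: the record layout (dictionaries with the keys `id`, `src_ip`, `dst_ip`, `dst_port`, `proto` and `cert_path`) and the function name `group_by_4tuple` are assumptions, as is the choice to keep connections that have no matching SSL record.

```python
from collections import defaultdict

def group_by_4tuple(conn_rows, ssl_rows, cert_rows):
    """Join connection, SSL and certificate records, then group by connection 4-tuple.

    All field names are hypothetical; real log fields may differ.
    """
    ssl_by_id = {row["id"]: row for row in ssl_rows}
    cert_by_id = {row["id"]: row for row in cert_rows}
    groups = defaultdict(list)
    for conn in conn_rows:
        ssl = ssl_by_id.get(conn["id"])          # join connection log with SSL log on id
        cert = None
        if ssl and ssl.get("cert_path"):
            # take the first key of the certificate path and look it up in the certificate log
            cert = cert_by_id.get(ssl["cert_path"][0])
        key = (conn["src_ip"], conn["dst_ip"], conn["dst_port"], conn["proto"])
        groups[key].append({"conn": conn, "ssl": ssl, "cert": cert})
    return dict(groups)
```

Each value of the returned dictionary is the list of joined records for one connection 4-tuple, from which per-tuple features can then be computed.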
Further, there are 12 connection features in total, including the following.
Number of SSL aggregations and connection records: the total number of SSL aggregations and connection records contained in each connection 4-tuple.
Duration mean: the mean of the duration parameter over the connections of each connection 4-tuple.
Duration standard deviation: the standard deviation of the duration parameter over the connections of each connection 4-tuple.
Ratio of durations outside the standard deviation range: the percentage of all duration values of each connection 4-tuple that fall outside the range. The range has two limits: the upper limit is the mean plus the standard deviation, and the lower limit is the mean minus the standard deviation.
Total size of sent packets: the number of payload bytes sent over all connections of each 4-tuple.
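The duration-based connection features can be illustrated with a short Python sketch. The function name and returned keys are hypothetical, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption, since the text does not say which is meant.

```python
from statistics import mean, pstdev

def duration_features(durations):
    """Mean, standard deviation and out-of-range ratio of connection durations."""
    if not durations:
        # -1 marks a feature that cannot be computed for lack of information
        return {"mean": -1.0, "std": -1.0, "out_of_range_ratio": -1.0}
    mu = mean(durations)
    sigma = pstdev(durations)                   # population standard deviation (assumption)
    upper, lower = mu + sigma, mu - sigma
    outside = sum(1 for d in durations if d > upper or d < lower)
    return {"mean": mu, "std": sigma, "out_of_range_ratio": outside / len(durations)}
```

For the durations 1, 2, 3 and 10, the mean is 4 and only the value 10 lies outside the range, so the out-of-range ratio is 0.25.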
In another embodiment of the present invention, there are 10 SSL features in total, including the following.
Ratio of SSL connections in the connection records: the ratio between the number of non-SSL connections and the number of SSL connections in the connection 4-tuple.
Ratio of TLS to SSL: the TLS version distribution in the connection 4-tuple.
SNI ratio: the proportion of records in the connection 4-tuple whose server_name is not empty.
SNI is IP: the proportion of records in the connection 4-tuple whose server_name is an IP address.
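A minimal Python sketch of the two SNI-related features follows. The helper names are hypothetical, and dividing by the total number of SSL records in the 4-tuple is an assumption about how the ratios are normalized.

```python
import ipaddress

def is_ip(value: str) -> bool:
    """Return True if the string parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False

def sni_features(server_names):
    """server_names holds one entry per SSL record; None or '' means the SNI is absent."""
    total = len(server_names)
    if total == 0:
        return {"sni_ratio": -1.0, "sni_is_ip_ratio": -1.0}
    present = [name for name in server_names if name]
    return {
        "sni_ratio": len(present) / total,
        "sni_is_ip_ratio": sum(is_ip(name) for name in present) / total,
    }
```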
In another embodiment of the present invention, there are 15 certificate features in total, including the following.
Public key mean: the mean of the exponent of all certificates in the connection 4-tuple.
Mean certificate validity period: the mean number of days of validity of all certificates in the connection 4-tuple.
Standard deviation of certificate validity period: the standard deviation of the number of days of validity of all certificates in the connection 4-tuple.
Certificate validity during the capture: the proportion of all certificates in the connection 4-tuple that have not expired.
And the effect analysis module is used for training the acquired flow according to the feature type and the features by adopting a preset algorithm to generate a model classification accuracy result.
In the disclosure, the effect analysis module is configured to use a suitable machine learning algorithm as a classifier, train on the collected traffic as sample data, generate a corresponding classification model, classify the encrypted traffic based on the classification model, and screen out the encrypted malicious traffic. The classification algorithms include the support vector machine (SVM) and the random forest.
In the invention, the effect analysis module adopts a support vector machine algorithm to train the sample data according to the feature type and the features, and the method comprises the following steps:
taking the feature values of the labeled training data as input, and performing model training with the SVM classifier of LibSVM.
Further, the feature values of the labeled training data are calculated as follows. Let the collected flow file be F. Any K consecutive bytes in the file are taken as one element, the entropy of the set S' formed by all K-consecutive-byte elements in the file is calculated, and the relative entropy corresponding to the set is $h_k$:
[Formula image BDA0003055274680000211]
where $m_{ik}$ is the frequency with which the i-th element of the set $f_k$ occurs. This calculation yields $h_0$, $h_1$, $h_2$, $h_3$.
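The exact relative-entropy formula is contained in a figure and is not available in the text. The following Python sketch therefore rests on an assumption: it takes the relative entropy to be the Shannon entropy of the K-byte element frequencies, normalized by its maximum of 8K bits. The function name `relative_entropy` and the sliding-window way of forming elements are also assumptions.

```python
import math
from collections import Counter

def relative_entropy(data: bytes, k: int) -> float:
    """Normalized Shannon entropy of all k-byte windows in data (assumed definition)."""
    if k < 1 or len(data) < k:
        return -1.0                                  # cannot be computed
    # count every run of k consecutive bytes (sliding window, an assumption)
    counts = Counter(data[i:i + k] for i in range(len(data) - k + 1))
    total = sum(counts.values())
    entropy = -sum((m / total) * math.log2(m / total) for m in counts.values())
    return entropy / (8 * k)                         # 8k bits is the maximum entropy
```

Under this assumed definition, uniformly distributed bytes (typical of well-encrypted data) give a value close to 1 for k = 1, while highly repetitive data gives a value close to 0.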
For each data file to be processed, a Monte Carlo estimate of π is calculated. The bit stream is divided into groups of 48 bits; in each group the first 24 bits are taken as montex and the last 24 bits as montey. Montex and montey are used to determine whether the point given by the 48-bit group falls within a circular area, and the Monte Carlo value of π is estimated from the number of points falling within the circular area. The difference between the Monte Carlo estimate of π and the true value of π is taken as the error value $P_{error}$ of estimating π by the Monte Carlo simulation method. These values serve as the feature values of the labeled training data.
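The Monte Carlo error value can be sketched in Python as follows. The function name is hypothetical, and several details are assumptions: the 24-bit values are read as big-endian unsigned integers, the circle is the quarter circle of radius 2^24 - 1 centered at the origin, and the error is taken as an absolute difference.

```python
import math

def monte_carlo_pi_error(data: bytes) -> float:
    """Estimate pi from 48-bit groups of data and return |estimate - pi|."""
    max_coord = 2 ** 24 - 1
    radius_sq = max_coord ** 2
    inside = total = 0
    for i in range(0, len(data) - 5, 6):             # 6 bytes = one 48-bit group
        x = int.from_bytes(data[i:i + 3], "big")     # first 24 bits (montex)
        y = int.from_bytes(data[i + 3:i + 6], "big") # last 24 bits (montey)
        total += 1
        if x * x + y * y <= radius_sq:               # point lies inside the quarter circle
            inside += 1
    if total == 0:
        return -1.0                                  # not enough data
    return abs(4 * inside / total - math.pi)
```

For random-looking (for example, encrypted) data the points are spread evenly over the square, so the estimate approaches π and the error approaches 0.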
The flow is then analyzed and classified with the classification model generated in the classifier training stage, and the classification results are evaluated. The evaluation includes the model classification accuracy result $P_r$, where $P_r = T_P/(T_P + F_P)$, $T_P$ is the number of encrypted samples that are correctly marked, and $F_P$ is the number of non-encrypted samples that are wrongly marked as encrypted.
Furthermore, the evaluation may also include the recall $R_e$ and the comprehensive evaluation $F_m$, where $R_e = T_P/(T_P + F_N)$ and $F_m = 2 P_r R_e/(P_r + R_e)$; here $F_N$ is the number of encrypted samples that are wrongly marked as non-encrypted. The classification accuracy result and the recall reflect the identification performance of the method, and the comprehensive evaluation combines the two into a more complete measure: the higher the comprehensive evaluation, the better the encrypted-traffic classification performed by the method.
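The three evaluation values can be computed directly from the counts, as in the following Python sketch. The function name `evaluate` is hypothetical, and returning 0.0 when a denominator is zero is an assumption.

```python
def evaluate(tp: int, fp: int, fn: int) -> dict:
    """Compute P_r, R_e and F_m from true positives, false positives and false negatives."""
    p_r = tp / (tp + fp) if tp + fp else 0.0            # P_r = TP / (TP + FP)
    r_e = tp / (tp + fn) if tp + fn else 0.0            # R_e = TP / (TP + FN)
    f_m = 2 * p_r * r_e / (p_r + r_e) if p_r + r_e else 0.0
    return {"P_r": p_r, "R_e": r_e, "F_m": f_m}
```

For example, with 8 correctly marked encrypted samples, 2 non-encrypted samples marked as encrypted and 2 encrypted samples missed, all three values are 0.8.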
In another embodiment of the present invention, the effect analysis module trains the sample data according to the feature type and the features by using a random forest algorithm to generate a model classification accuracy result, including:
the effect analysis module is used for executing step a1: drawing samples from the sample data at random with replacement a number of times to construct a sample set, for example drawing n times at random from the data $(X_1, Y_1), \ldots, (X_n, Y_n)$ to construct a bootstrap sample;
the effect analysis module is further configured to perform step a2: constructing a decision tree for each bootstrap sample;
the effect analysis module is further configured to perform step a3: repeating step a1 and step a2 to obtain a plurality of decision trees;
the effect analysis module is further configured to perform step a4: letting each decision tree vote on the input vector X, counting all votes, taking the class with the most votes as the classification label of the vector X, and taking the proportion of labels that differ from the correct classification labels as the misclassification rate of the random forest.
The effect analysis module is further configured to perform step a5: calculating the true positives TP and the false positives FP of the sample data respectively, and calculating the model classification accuracy result from the obtained true positives TP and false positives FP.
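Steps a1 to a4 can be illustrated with the following simplified Python sketch. It is not the described implementation: each "decision tree" is reduced to a one-level tree (a threshold on a single numeric feature), and all function names are hypothetical.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Step a1: draw len(data) items at random with replacement."""
    return [rng.choice(data) for _ in data]

def train_stump(sample):
    """Step a2 (simplified): fit a one-level decision tree on pairs (x, label)."""
    best = None
    for threshold, _ in sample:
        left = [y for x, y in sample if x <= threshold]   # never empty
        right = [y for x, y in sample if x > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label

def train_forest(data, n_trees, seed=0):
    """Step a3: repeat steps a1 and a2 to obtain several trees."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(data, rng)) for _ in range(n_trees)]

def predict(forest, x):
    """Step a4: every tree votes; the class with the most votes is the label."""
    return Counter(tree(x) for tree in forest).most_common(1)[0][0]

def misclassification_rate(forest, data):
    """Proportion of predicted labels that differ from the correct labels."""
    return sum(predict(forest, x) != y for x, y in data) / len(data)
```

In practice a library implementation with full decision trees and random feature selection would normally be used; the sketch only shows how bootstrap sampling and majority voting fit together.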
Further, the number of samples of actual type i that the classification model predicts correctly, $TP_i = n_{ii}$, is taken as the true positives TP;
the number of samples of actual type i that the classification model does not predict as type i is taken as the false negatives FN; the number of samples whose actual type is not i but which the classification model misjudges as type i, $FP_i = \sum_{j \neq i} n_{ji}$, is taken as the false positives FP; the calculated model classification accuracy result OA is:
[Formula image BDA0003055274680000231]
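The per-type counts can be read off a confusion matrix, as in the following Python sketch, where `n[i][j]` is assumed to be the number of samples of actual type i that are predicted as type j. The formula for OA is contained in a figure and is not available in the text; the sketch assumes the usual overall accuracy, i.e. the sum of the diagonal divided by the total number of samples. Function names are hypothetical.

```python
def per_class_counts(n):
    """Return (TP, FN, FP) lists, one entry per type, from confusion matrix n."""
    k = len(n)
    tp = [n[i][i] for i in range(k)]                                # TP_i = n_ii
    fn = [sum(n[i][j] for j in range(k) if j != i) for i in range(k)]  # actual i, predicted other
    fp = [sum(n[j][i] for j in range(k) if j != i) for i in range(k)]  # FP_i = sum over j != i of n_ji
    return tp, fn, fp

def overall_accuracy(n):
    """Assumed definition of OA: correctly classified samples over all samples."""
    total = sum(sum(row) for row in n)
    if total == 0:
        return -1.0
    return sum(n[i][i] for i in range(len(n))) / total
```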
In another embodiment of the present invention, the preset algorithm may be one of an L1-regularized logistic regression algorithm (l1-logistic regression) and extreme gradient boosting (XGBoost).
In another embodiment of the invention, the data set is selected as follows. For the negative samples, a recent batch of 100,000 (10w) malware samples is run in a sandbox and the traffic generated by the malware is captured. For the positive samples, one part consists of normal traffic in a daily office network, and the other part is collected by using a crawler to visit the 10,000 most-visited websites in the Alexa ranking and capturing the traffic generated.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
It should be understood that the above detailed description of the technical solution of the present invention with the help of preferred embodiments is illustrative and not restrictive. On the basis of reading the description of the invention, a person skilled in the art can modify the technical solutions described in the embodiments, or make equivalent substitutions for some technical features; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A network security monitoring method is characterized by comprising the following steps:
filtering the encrypted flow in the public network by adopting a filter, and acquiring the flow;
extracting feature types and features of the collected flow;
and training the sample data according to the feature type and the features by adopting a preset algorithm to generate a model classification accuracy result.
2. The method of claim 1,
the filtering of the encrypted flow in the public network by adopting a filter for flow acquisition comprises the following steps:
and capturing network data packets according to a preset filtering rule by using Wireshark as the filter, and generating a pcap file as the collected flow.
3. The method of claim 1,
the filtering of the encrypted flow in the public network by adopting a filter for flow acquisition comprises the following steps:
and extracting information logs in the HTTPS flow captured by the packet by adopting a flow packet deep analysis mode, wherein the information logs comprise a connection communication log, an SSL protocol log and a certificate log.
4. The method of claim 1,
the extracting of the feature category and the feature of the collected flow comprises the following steps:
and acquiring the features of the collected flow by analyzing the header information of the HTTPS data packets, capturing the network data packets by using Wireshark, and generating a pcap file to obtain the flow feature category.
5. The method of claim 3,
the extracting features of the collected flow comprises the following steps:
and creating a connection 4-tuple through data from the connection log, the SSL protocol log and the certificate log, and extracting features.
6. The method of claim 1,
the training of the sample data according to the feature category and the features by adopting a preset algorithm to generate a model classification accuracy result comprises: training the sample data by adopting a suitable machine learning algorithm as a classifier to generate a corresponding classification model, and calculating the accuracy on the sample data based on the classification model to obtain the classification accuracy result;
the sample data includes encrypted malicious traffic and encrypted benign traffic.
7. The method of claim 1,
the preset algorithm comprises: an L1-regularized logistic regression algorithm, a support vector machine, a random forest, and extreme gradient boosting.
8. A network security monitoring apparatus, comprising:
the acquisition module is used for filtering the encrypted flow in the public network by adopting a filter to acquire the flow;
the characteristic extraction module is used for extracting characteristic categories and characteristics of the acquired flow;
and the effect analysis module is used for training the sample data according to the feature type and the features by adopting a preset algorithm to generate a model classification accuracy result.
9. A network security monitoring server comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein,
the processor, when executing the program, performs the steps of the method of any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a program which, when executed, is capable of implementing the method according to any one of claims 1-7.
CN202110498132.1A 2021-05-08 2021-05-08 Network security monitoring method and device, storage medium and server Withdrawn CN113141375A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113676348A (en) * 2021-08-04 2021-11-19 南京赋乐科技有限公司 Network channel cracking method, device, server and storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140310396A1 (en) * 2013-04-15 2014-10-16 International Business Machines Corporation Identification and classification of web traffic inside encrypted network tunnels
US20160224803A1 (en) * 2015-01-29 2016-08-04 Affectomatics Ltd. Privacy-guided disclosure of crowd-based scores computed based on measurements of affective response
CN107256393A (en) * 2017-06-05 2017-10-17 四川大学 The feature extraction and state recognition of one-dimensional physiological signal based on deep learning
CN108254678A (en) * 2018-01-19 2018-07-06 成都航空职业技术学院 A kind of analog circuit fault sorting technique based on sine and cosine algorithm
US20180367506A1 (en) * 2015-08-05 2018-12-20 Intralinks, Inc. Systems and methods of secure data exchange
CN109391599A (en) * 2017-08-10 2019-02-26 蓝盾信息安全技术股份有限公司 A kind of detection system of the Botnet communication signal based on HTTPS traffic characteristics analysis
US20190273510A1 (en) * 2018-03-01 2019-09-05 Crowdstrike, Inc. Classification of source data by neural network processing
CN110391958A (en) * 2019-08-15 2019-10-29 北京中安智达科技有限公司 A kind of pair of network encryption flow carries out feature extraction automatically and knows method for distinguishing
CN111277578A (en) * 2020-01-14 2020-06-12 西安电子科技大学 Encrypted flow analysis feature extraction method, system, storage medium and security device
CN111277587A (en) * 2020-01-19 2020-06-12 武汉思普崚技术有限公司 Malicious encrypted traffic detection method and system based on behavior analysis

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140310396A1 (en) * 2013-04-15 2014-10-16 International Business Machines Corporation Identification and classification of web traffic inside encrypted network tunnels
US20160224803A1 (en) * 2015-01-29 2016-08-04 Affectomatics Ltd. Privacy-guided disclosure of crowd-based scores computed based on measurements of affective response
US20180367506A1 (en) * 2015-08-05 2018-12-20 Intralinks, Inc. Systems and methods of secure data exchange
CN107256393A (en) * 2017-06-05 2017-10-17 四川大学 The feature extraction and state recognition of one-dimensional physiological signal based on deep learning
CN109391599A (en) * 2017-08-10 2019-02-26 蓝盾信息安全技术股份有限公司 A kind of detection system of the Botnet communication signal based on HTTPS traffic characteristics analysis
CN108254678A (en) * 2018-01-19 2018-07-06 成都航空职业技术学院 A kind of analog circuit fault sorting technique based on sine and cosine algorithm
US20190273510A1 (en) * 2018-03-01 2019-09-05 Crowdstrike, Inc. Classification of source data by neural network processing
CN110391958A (en) * 2019-08-15 2019-10-29 北京中安智达科技有限公司 A kind of pair of network encryption flow carries out feature extraction automatically and knows method for distinguishing
CN111277578A (en) * 2020-01-14 2020-06-12 西安电子科技大学 Encrypted flow analysis feature extraction method, system, storage medium and security device
CN111277587A (en) * 2020-01-19 2020-06-12 武汉思普崚技术有限公司 Malicious encrypted traffic detection method and system based on behavior analysis

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
刘兆禄; 赵英; 刘淑梅: "基于Spark的网络流量分类方法研究" *
张航 等: "全局灵敏度分析的支持向量机方法", 《航空工程进展》 *
程光 等: "基于支持向量机的加密流量识别方法", 《东南大学学报(自然科学版)》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113676348A (en) * 2021-08-04 2021-11-19 南京赋乐科技有限公司 Network channel cracking method, device, server and storage medium
CN113676348B (en) * 2021-08-04 2023-12-29 南京赋乐科技有限公司 Network channel cracking method, device, server and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210720