CN114070602A - HTTP tunnel detection method, device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN114070602A
Authority
CN
China
Prior art keywords
http
data
traffic data
metadata
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111332949.8A
Other languages
Chinese (zh)
Inventor
苏香艳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Topsec Technology Co Ltd
Beijing Topsec Network Security Technology Co Ltd
Beijing Topsec Software Co Ltd
Original Assignee
Beijing Topsec Technology Co Ltd
Beijing Topsec Network Security Technology Co Ltd
Beijing Topsec Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Topsec Technology Co Ltd, Beijing Topsec Network Security Technology Co Ltd, Beijing Topsec Software Co Ltd
Priority to CN202111332949.8A
Publication of CN114070602A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/145 Countermeasures against malicious traffic the attack involving the propagation of malware through the network, e.g. viruses, trojans or worms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Signal Processing (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The disclosure relates to the technical field of security and provides an HTTP tunnel detection method and device, electronic equipment, and a storage medium. The method comprises: acquiring first HTTP traffic data, inputting the first HTTP traffic data into an HTTP tunnel detection model, and acquiring a detection result, wherein the detection result indicates either that the first HTTP traffic data was collected via an HTTP tunnel or that it was not. The HTTP tunnel detection model is a random forest model obtained by training on training samples. The training samples comprise a plurality of sets of data, each set comprising a plurality of screening data and a plurality of creation data; each screening data is a metadata item whose feature difference between the metadata of the target white sample and the metadata of the target black sample is greater than or equal to a difference threshold, and each creation data is at least one characteristic parameter of a metadata item. With this method, the HTTP tunnel detection model is more stable and the accuracy of HTTP tunnel detection is improved.

Description

HTTP tunnel detection method, device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of security technologies, and in particular, to a method and an apparatus for detecting an HTTP tunnel, an electronic device, and a storage medium.
Background
With the development of internet technology, maintaining network security has become a key concern of network management. Attackers often use trojans to carry out illicit network operations, and tunneling in particular makes trojan communication highly covert, so that it escapes traditional detection and leaves websites open to malware attacks, information theft, and the like. An HTTP tunnel is a technique that encapsulates various network protocols inside HTTP or HTTPS communication. Because it is built on a common port, an HTTP tunnel is easy to implement and hard to detect, which makes HTTP tunnel detection a difficult problem in network detection.
In the prior art, HTTP tunnel data is detected by a decision tree classification algorithm, which mainly analyzes a data packet of captured network traffic, extracts packet header information to perform session stream reassembly, obtains feature information of a session stream, trains an HTTP tunnel detection model by using the decision tree classification algorithm based on the feature information, and detects the HTTP tunnel data according to the trained model.
However, with the prior art, the HTTP tunnel detection model has poor stability and the accuracy of the detection result is low.
Disclosure of Invention
In view of the foregoing, it is necessary to provide an HTTP tunnel detection method, apparatus, electronic device, and storage medium.
The embodiment of the disclosure provides a method for detecting an HTTP tunnel, which comprises the following steps:
acquiring first HTTP traffic data;
inputting the first HTTP traffic data into an HTTP tunnel detection model, and obtaining a detection result, wherein the detection result indicates either that the first HTTP traffic data was collected via an HTTP tunnel or that it was not;
the HTTP tunnel detection model is a random forest model obtained by training on training samples, and the training samples comprise a plurality of sets of data, each set comprising a plurality of screening data and a plurality of creation data; each screening data is a metadata item whose feature difference between the metadata of the target white sample and the metadata of the target black sample is greater than or equal to a difference threshold, and each creation data is at least one characteristic parameter of a metadata item.
In one embodiment, before inputting the first HTTP traffic data into an HTTP tunnel detection model, the method further includes:
comparing the domain name of the first HTTP traffic data with the domain names stored in a white list, the white list holding domain names of traffic data not collected via an HTTP tunnel;
the inputting the first HTTP traffic data into an HTTP tunnel detection model includes:
if the domain name of the first HTTP traffic data does not exist in the white list, inputting the first HTTP traffic data into an HTTP tunnel detection model.
In one embodiment, before inputting the first HTTP traffic data into an HTTP tunnel detection model, the method further includes:
analyzing the first HTTP traffic data to obtain session content;
the inputting the first HTTP traffic data into an HTTP tunnel detection model includes:
if the session content does not conform to the HTTP protocol specification, inputting the first HTTP traffic data into an HTTP tunnel detection model.
In one embodiment, before inputting the first HTTP traffic data into an HTTP tunnel detection model, the method further includes:
acquiring an original black sample, wherein the original black sample comprises a plurality of second HTTP traffic data acquired by the HTTP tunnel;
reassembling, among the plurality of second HTTP traffic data, the HTTP traffic data having the same quintuple information into one session stream, to obtain a plurality of black sample session streams;
extracting metadata of the target black sample from each black sample session stream;
and/or,
acquiring an original white sample, wherein the original white sample comprises a plurality of third HTTP traffic data which are not acquired by the HTTP tunnel;
reassembling, among the plurality of third HTTP traffic data, the HTTP traffic data having the same quintuple information into one session stream, to obtain a plurality of white sample session streams;
extracting metadata of the target white sample from each white sample session stream.
In one embodiment, after extracting the metadata of the target black sample from each of the session streams of black samples, and extracting the metadata of the target white sample from each of the session streams of white samples, the method further includes:
comparing the metadata of the target black sample with the metadata of the target white sample, and determining each metadata item whose feature difference is greater than or equal to the difference threshold as screening data.
In one embodiment, after extracting the metadata of the target black sample from each of the session streams of black samples, and extracting the metadata of the target white sample from each of the session streams of white samples, the method further includes:
extracting at least one characteristic parameter of a metadata item as one creation data.
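The text does not say which characteristic parameters are derived from a metadata item. As a minimal sketch, assuming the item is a numeric list (such as the packet size list) and that the parameters are simple summary statistics, creation data could be computed as follows; the function name `creation_features` and the chosen statistics are illustrative assumptions, not taken from the patent.

```python
import statistics
from typing import Dict, Sequence


def creation_features(name: str, values: Sequence[float]) -> Dict[str, float]:
    """Derive summary statistics from one list-valued metadata item.

    The choice of min/max/mean/std is an assumption made for illustration.
    """
    if not values:
        # An empty list yields all-zero features so the vector keeps its shape.
        return {f"{name}_{k}": 0.0 for k in ("min", "max", "mean", "std")}
    return {
        f"{name}_min": float(min(values)),
        f"{name}_max": float(max(values)),
        f"{name}_mean": float(statistics.fmean(values)),
        f"{name}_std": float(statistics.pstdev(values)),  # population std dev
    }
```

Each returned entry would then count as one creation data in the training sample.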
In one embodiment, feature preprocessing is performed on target data to obtain processed target data; wherein the target data comprises the screening data and/or the creation data;
the feature preprocessing comprises at least one of the following:
feature normalization;
one-hot encoding;
missing value handling.
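The three preprocessing steps listed above are named but not specified. The sketch below shows one plausible form of each, assuming mean imputation for missing values, min-max scaling for normalization, and a fixed category list for one-hot (single-hot) encoding; all function names and these specific choices are assumptions for illustration.

```python
from typing import List, Optional, Sequence


def fill_missing(values: Sequence[Optional[float]]) -> List[float]:
    """Replace None entries with the mean of the observed values (0.0 if none)."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed) if observed else 0.0
    return [mean if v is None else float(v) for v in values]


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Scale values linearly into [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(value: str, categories: Sequence[str]) -> List[int]:
    """Encode a categorical value as a 0/1 vector over a fixed category list.

    A value outside the list encodes as all zeros.
    """
    return [1 if value == c else 0 for c in categories]
```

For instance, a request method could be encoded with `one_hot("POST", ["GET", "POST", "PUT"])`, and a packet-count column could be passed through `fill_missing` and then `min_max_normalize`.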
The embodiment of the present disclosure provides an HTTP tunnel detection apparatus, which includes:
the first HTTP traffic data acquisition module is used for acquiring first HTTP traffic data;
a detection result obtaining module, configured to input the first HTTP traffic data into an HTTP tunnel detection model, and obtain a detection result, where the detection result is used to indicate that the first HTTP traffic data is acquired by the HTTP tunnel, or indicate that the first HTTP traffic data is not acquired by the HTTP tunnel;
the HTTP tunnel detection model is a random forest model obtained by training on training samples, and the training samples comprise a plurality of sets of data, each set comprising a plurality of screening data and a plurality of creation data; each screening data is a metadata item whose feature difference between the metadata of the target white sample and the metadata of the target black sample is greater than or equal to a difference threshold, and each creation data is at least one characteristic parameter of a metadata item.
The embodiment of the present disclosure provides an electronic device, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the HTTP tunnel detection method provided in any embodiment of the present disclosure when executing the computer program.
The embodiments of the present disclosure provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of a HTTP tunnel detection method provided in any embodiment of the present disclosure.
The HTTP tunnel detection method provided by the embodiment of the disclosure comprises acquiring first HTTP traffic data, inputting the first HTTP traffic data into an HTTP tunnel detection model, and acquiring a detection result, wherein the detection result indicates either that the first HTTP traffic data was collected via an HTTP tunnel or that it was not. The HTTP tunnel detection model is a random forest model obtained by training on training samples. The training samples comprise a plurality of sets of data, each set comprising a plurality of screening data and a plurality of creation data; each screening data is a metadata item whose feature difference between the metadata of the target white sample and the metadata of the target black sample is greater than or equal to a difference threshold, and each creation data is at least one characteristic parameter of a metadata item. Constructing training samples that include both screening data and creation data enriches the training samples and makes full use of the difference between the target black samples and the target white samples, so that the HTTP tunnel detection model trained on them is more stable and the accuracy of HTTP tunnel detection is improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
In order to illustrate the embodiments of the present disclosure or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that those skilled in the art can obtain other drawings from these drawings without inventive effort.
Fig. 1 is a schematic flowchart of an HTTP tunnel detection method according to an embodiment of the present disclosure;
Fig. 2 is a schematic flowchart of another HTTP tunnel detection method according to an embodiment of the present disclosure;
Fig. 3 is a schematic flowchart of another HTTP tunnel detection method according to an embodiment of the present disclosure;
Fig. 4 is a schematic flowchart of another HTTP tunnel detection method according to an embodiment of the present disclosure;
Fig. 5 is a schematic flowchart of another HTTP tunnel detection method according to an embodiment of the present disclosure;
Fig. 6 is a schematic flowchart of another HTTP tunnel detection method according to an embodiment of the present disclosure;
Fig. 7 is a schematic structural diagram of an HTTP tunnel detection apparatus according to an embodiment of the present disclosure;
Fig. 8 is an internal structural diagram of an electronic device provided in an embodiment of the present disclosure.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, aspects of the present disclosure will be further described below. It should be noted that the embodiments and features of the embodiments of the present disclosure may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced in other ways than those described herein; it is to be understood that the embodiments disclosed in the specification are only a few embodiments of the present disclosure, and not all embodiments.
The disclosure provides an HTTP tunnel detection method, which comprises acquiring first HTTP traffic data, inputting the first HTTP traffic data into an HTTP tunnel detection model, and acquiring a detection result, wherein the detection result indicates either that the first HTTP traffic data was collected via an HTTP tunnel or that it was not. The HTTP tunnel detection model is a random forest model obtained by training on training samples. The training samples comprise a plurality of sets of data, each set comprising a plurality of screening data and a plurality of creation data; each screening data is a metadata item whose feature difference between the metadata of the target white sample and the metadata of the target black sample is greater than or equal to a difference threshold, and each creation data is at least one characteristic parameter of a metadata item. Constructing training samples that include both screening data and creation data enriches the training samples and makes full use of the difference between the target black samples and the target white samples, so that the HTTP tunnel detection model trained on them is more stable and the accuracy of HTTP tunnel detection is improved.
The HTTP tunnel detection method provided by the disclosure can be applied to HTTP tunnel detection devices, the devices can be electronic devices such as various personal computers, notebook computers, smart phones, tablet computers and portable wearable devices, and optionally, the devices can also be functional modules or functional entities which can realize data processing methods in the electronic devices.
In an embodiment, Fig. 1 is a schematic flowchart of an HTTP tunnel detection method provided in an embodiment of the present disclosure. As shown in Fig. 1, the method specifically includes the following steps:
S10: acquiring first HTTP traffic data.
The first HTTP traffic data refers to the traffic data to be detected. It may be traffic data collected via an HTTP tunnel, or traffic data generated by normal internet access and not collected via an HTTP tunnel.
Specifically, first HTTP traffic data generated when a user surfs the internet is obtained.
S14: inputting the first HTTP traffic data into an HTTP tunnel detection model to obtain a detection result.
The detection result is used for indicating that the first HTTP traffic data is collected by the HTTP tunnel, or indicating that the first HTTP traffic data is not collected by the HTTP tunnel.
The HTTP tunnel detection model is a random forest model obtained by training on training samples. The training samples comprise a plurality of sets of data, each set comprising a plurality of screening data and a plurality of creation data; each screening data is a metadata item whose feature difference between the metadata of the target white sample and the metadata of the target black sample is greater than or equal to a difference threshold, and each creation data is at least one characteristic parameter of a metadata item.
The random forest model obtains its output by voting over the training set using the random forest algorithm. The output is a probability value that the first HTTP traffic data is HTTP tunnel traffic, and this probability value is compared with a preset threshold; when the probability value is greater than the preset threshold, training of the current model is considered finished. The preset threshold may be fifty percent, but is not limited thereto, and those skilled in the art may set it according to the actual situation.
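The voting step can be illustrated with a small sketch. It assumes each tree casts a 0/1 vote, that the probability value is the share of trees voting "tunnel", and that the preset threshold is applied to that share as a cutoff; the text leaves the exact role of the threshold open, so this reading is an assumption. The names `Tree`, `forest_probability`, and `is_tunnel` are hypothetical.

```python
from typing import Callable, Sequence

# A tree is modeled as any callable mapping a feature vector to a vote:
# 1 for "HTTP tunnel", 0 for "not an HTTP tunnel" (illustrative stand-in).
Tree = Callable[[Sequence[float]], int]


def forest_probability(trees: Sequence[Tree], features: Sequence[float]) -> float:
    """Return the fraction of trees that vote 'HTTP tunnel'."""
    votes = [tree(features) for tree in trees]
    return sum(votes) / len(votes)


def is_tunnel(trees: Sequence[Tree], features: Sequence[float],
              threshold: float = 0.5) -> bool:
    """Compare the vote share with the preset threshold (fifty percent here)."""
    return forest_probability(trees, features) > threshold
```

With three trees of which two vote "tunnel", the vote share is 2/3, which exceeds the fifty percent threshold.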
The white sample refers to traffic data generated by normal internet access and not collected via an HTTP tunnel. Such traffic may, for example, be captured at the gateway egress with a traffic capture card, keeping the traffic whose port is 80 and storing it in a pcap file, but is not limited thereto, and the disclosure is not particularly limited. The target white sample refers to the sample data after reassembly.
The black sample refers to traffic data collected via an HTTP tunnel. To obtain it, an HTTP tunnel may be set up with a common HTTP tunnel tool, and the data transmitted through the tunnel is captured and stored in a pcap file. The HTTP tunnel tool may be reGeorg, neo_regeorg, or HTTP_Tunnel, but is not limited thereto, and the present disclosure is not particularly limited. The target black sample refers to the sample data after reassembly.
The metadata is extracted from the reassembled sample data and includes items such as the source IP, destination IP, source port, destination port, protocol type, packet size list, packet arrival time list, clientheadername list, serverheadername list, packet transmission direction list, load size list, number of packets in the session, effective number of packets in the session, request method, protocol version, and response state. Note that during metadata extraction a request-response threshold N is set for each session of the sample data, so that only the metadata of the first N request-response pairs in the session is extracted. The threshold may be, for example, 10: when a session contains more than 10 request-response pairs, only the metadata of the first 10 pairs is extracted. The threshold is not limited thereto, the present disclosure is not particularly limited, and those skilled in the art can set it according to the actual situation.
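The first-N-pairs rule can be sketched as follows. The `Exchange` record is a hypothetical, heavily simplified stand-in for one request-response pair, and only a few of the metadata items named above are produced; the field names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Exchange:
    """One HTTP request-response pair (hypothetical simplified record)."""
    method: str
    status: int
    request_size: int
    response_size: int


def extract_metadata(exchanges: List[Exchange], max_pairs: int = 10) -> Dict[str, object]:
    """Extract metadata from at most the first `max_pairs` request-response pairs."""
    kept = exchanges[:max_pairs]  # pairs beyond the threshold are ignored
    return {
        "request_methods": [e.method for e in kept],
        "response_states": [e.status for e in kept],
        # Load sizes in transmission order: request, then its response.
        "load_sizes": [s for e in kept for s in (e.request_size, e.response_size)],
        "pair_count": len(kept),
    }
```

Capping the number of pairs keeps the per-session feature vector bounded no matter how long a session runs.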
The difference threshold is a threshold parameter set to determine that the metadata of the target white sample and the metadata of the target black sample have a larger difference, and the size of the threshold parameter can be specifically set by a person skilled in the art according to actual situations.
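The text does not define how the feature difference between white and black metadata is measured. As one possible sketch, assuming numeric metadata items and a relative difference of class means as the measure, screening could look like this; `feature_difference` and `screen_items` are hypothetical names, and the measure itself is an assumption.

```python
from statistics import fmean
from typing import Dict, List, Sequence


def feature_difference(white: Sequence[float], black: Sequence[float]) -> float:
    """Relative difference of the class means, in [0, 1] for non-negative data.

    This measure is an illustrative assumption; the source does not specify one.
    """
    mean_white, mean_black = fmean(white), fmean(black)
    scale = max(abs(mean_white), abs(mean_black))
    return 0.0 if scale == 0 else abs(mean_white - mean_black) / scale


def screen_items(white: Dict[str, Sequence[float]],
                 black: Dict[str, Sequence[float]],
                 threshold: float) -> List[str]:
    """Keep metadata items whose difference is >= the difference threshold."""
    return [
        name for name in white
        if name in black
        and feature_difference(white[name], black[name]) >= threshold
    ]
```

An item such as the destination port, which is 80 in both classes, would score 0 and be dropped, while an item whose typical value differs strongly between tunnel and normal sessions would be kept as screening data.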
Specifically, the collected first HTTP traffic data is input to a trained HTTP tunnel detection model, the HTTP tunnel detection model outputs a detection result, and it is determined from the detection result that the first HTTP traffic data is collected by an HTTP tunnel or is not collected by the HTTP tunnel.
In this way, in this embodiment, the first HTTP traffic data is acquired and input into the HTTP tunnel detection model, and a detection result is acquired, where the detection result indicates either that the first HTTP traffic data was collected via an HTTP tunnel or that it was not. The HTTP tunnel detection model is a random forest model obtained by training on training samples. The training samples comprise a plurality of sets of data, each set comprising a plurality of screening data and a plurality of creation data; each screening data is a metadata item whose feature difference between the metadata of the target white sample and the metadata of the target black sample is greater than or equal to a difference threshold, and each creation data is at least one characteristic parameter of a metadata item. Constructing training samples that include both screening data and creation data enriches the training samples and makes full use of the difference between the target black samples and the target white samples, so that the HTTP tunnel detection model trained on them is more stable and the accuracy of HTTP tunnel detection is improved.
Fig. 2 is a schematic flow diagram of another HTTP tunnel detection method provided in an embodiment of the present disclosure, and as shown in fig. 2, before inputting the first HTTP traffic data into the HTTP tunnel detection model, the method further includes:
S11a: comparing the domain name of the first HTTP traffic data with the domain names stored in the white list, which are domain names of traffic data not collected via an HTTP tunnel.
S14a: if the domain name of the first HTTP traffic data does not exist in the white list, inputting the first HTTP traffic data into an HTTP tunnel detection model.
The white list is a domain name white list and is used for determining whether the acquired user traffic data is traffic not collected via an HTTP tunnel.
Specifically, a domain name of first HTTP traffic data is acquired, the domain name of the first HTTP traffic data is compared with a domain name of traffic data stored in a white list and not acquired based on an HTTP tunnel, and when it is determined that there is no domain name of traffic data in the white list that matches the domain name of the first HTTP traffic data, the first HTTP traffic data is input to an HTTP tunnel detection model, so that it is determined according to the HTTP tunnel detection model that the first HTTP traffic data is traffic data acquired by the HTTP tunnel, or is not traffic data acquired by the HTTP tunnel.
Illustratively, suppose a white list of 100 Alexa domain names is built in beforehand. After the domain name of the first HTTP traffic data is obtained from its session metadata, it is compared with the domain names in the white list; if it is not in the white list, the first HTTP traffic data is input to the HTTP tunnel detection model for detection.
It should be noted that the white list may also be customized according to the customer requirement, such as some IPs, but not limited thereto, and the disclosure is not limited thereto.
Therefore, in the embodiment, before the acquired first HTTP traffic data is detected, the first HTTP traffic data is determined in a white list manner, so that traffic data which is not collected by the HTTP tunnel is filtered, and the first HTTP traffic data which cannot be filtered and is uncertain is input into the HTTP tunnel detection model for detection, thereby saving detection time.
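The white list pre-filter can be sketched as a set lookup. The sketch assumes exact matching on a normalized host name taken from the session metadata; subdomain matching and IP entries are not covered, and the function names are hypothetical.

```python
from typing import Set


def normalize_domain(host: str) -> str:
    """Lower-case a Host value and drop any port and trailing dot.

    Assumes a hostname or IPv4 value; IPv6 literals are not handled.
    """
    return host.strip().lower().split(":")[0].rstrip(".")


def needs_model_check(host: str, whitelist: Set[str]) -> bool:
    """Return True when the domain is absent from the white list,
    i.e. the traffic should go on to the detection model."""
    return normalize_domain(host) not in whitelist
```

Traffic whose domain is found in the white list is skipped, so only the remaining sessions incur the cost of running the model.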
Fig. 3 is a schematic flowchart of another HTTP tunnel detection method provided in an embodiment of the present disclosure, and as shown in fig. 3, before inputting the first HTTP traffic data into the HTTP tunnel detection model, the method further includes:
S11b: parsing the first HTTP traffic data to obtain session content.
S14b: if the session content does not conform to the HTTP protocol specification, inputting the first HTTP traffic data into an HTTP tunnel detection model.
Session protocol detection against the HTTP protocol specification means determining whether the session data conforms to the HTTP protocol; if it does, the session may be determined not to be traffic data collected via an HTTP tunnel. For example, whether the session data is traffic data collected via an HTTP tunnel may be determined by inspecting the first 4 bytes of the session data.
Specifically, session content of first HTTP traffic data is acquired, whether the session content of the first HTTP traffic data meets an HTTP protocol specification is judged, and when it is determined that the session content of the first HTTP traffic data does not meet the HTTP protocol specification, the first HTTP traffic data is input to an HTTP tunnel detection model, so that it is determined according to the HTTP tunnel detection model that the first HTTP traffic data is traffic data acquired by an HTTP tunnel, or is not the traffic data acquired by the HTTP tunnel.
In this way, in the embodiment, before the acquired first HTTP traffic data is detected, by judging whether the session content of the first HTTP traffic data conforms to the HTTP protocol, traffic data that is not collected in the HTTP tunnel is filtered, so that the first HTTP traffic data that cannot be filtered and is uncertain is input into the HTTP tunnel detection model for detection, thereby saving detection time.
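The text mentions inspecting the first 4 bytes of the session data but does not say what they are compared against. A minimal sketch, assuming the check is whether those bytes match the start of a standard HTTP request method or of an HTTP response status line, is shown below; the prefix set and function names are assumptions.

```python
# First four bytes of common HTTP request lines and of a response status line.
# This prefix set is an illustrative assumption, not taken from the source.
HTTP_PREFIXES = frozenset({
    b"GET ", b"POST", b"PUT ", b"HEAD", b"DELE",
    b"OPTI", b"PATC", b"TRAC", b"CONN", b"HTTP",
})


def looks_like_http(payload: bytes) -> bool:
    """Return True when the first 4 bytes match a known HTTP prefix."""
    return payload[:4] in HTTP_PREFIXES


def should_run_model(payload: bytes) -> bool:
    """Per the step above, only non-conforming sessions go to the model."""
    return not looks_like_http(payload)
```

A 4-byte prefix test is cheap enough to run on every session, which fits its role as a filter ahead of the model.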
Fig. 4 is a schematic flowchart of another HTTP tunnel detection method provided in an embodiment of the present disclosure, and as shown in fig. 4, before inputting the first HTTP traffic data into the HTTP tunnel detection model, the method further includes:
S121a: acquiring an original black sample, wherein the original black sample comprises a plurality of second HTTP traffic data acquired by the HTTP tunnel.
The original black sample is traffic data obtained over a period of time by building HTTP tunnels with common HTTP tunnel tools such as reGeorg, neo_regeorg, HTTP_Tunnel, and ABPTTS, capturing the data transmitted through each tunnel, and storing it in a pcap file.
It should be noted that, in the present embodiment, by using a plurality of different HTTP tunneling tools, different original black samples can be collected, so that the types of the original black samples are enhanced, so that the HTTP tunneling detection model can detect traffic data collected by the different HTTP tunneling tools, and the present embodiment has universality.
Specifically, a plurality of second HTTP traffic data of HTTP tunnels established using a plurality of different HTTP tunnel tools are collected and stored in a pcap file.
S122a: reassembling, among the plurality of second HTTP traffic data, the HTTP traffic data having the same quintuple information into one session stream, to obtain a plurality of black sample session streams.
The quintuple information comprises a source IP address, a source port, a transmission protocol, a destination port and a destination IP.
Specifically, the pacp packet storing the plurality of second HTTP traffic data is analyzed, quintuple information of the plurality of second HTTP traffic data is acquired, and each HTTP traffic data having the same quintuple information is reassembled into one session stream to obtain a plurality of black sample session streams.
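Illustratively, the recombination by quintuple information can be sketched as below. The `Packet` record and its field names are hypothetical stand-ins for whatever a pcap parser would produce; reading the pcap file itself is outside this sketch.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical parsed-packet record; field names are illustrative."""
    src_ip: str
    src_port: int
    protocol: str
    dst_ip: str
    dst_port: int
    timestamp: float
    payload: bytes


def five_tuple(pkt: Packet) -> tuple:
    """Key used to decide which packets belong to the same session stream."""
    return (pkt.src_ip, pkt.src_port, pkt.protocol, pkt.dst_ip, pkt.dst_port)


def reassemble_sessions(packets):
    """Group packets sharing a five-tuple into time-ordered session streams."""
    sessions = defaultdict(list)
    for pkt in packets:
        sessions[five_tuple(pkt)].append(pkt)
    for stream in sessions.values():
        stream.sort(key=lambda p: p.timestamp)
    return dict(sessions)
```

With an exact-match key like this, the two directions of one connection fall into separate streams. Whether the embodiment instead normalizes the key so that both directions share one session is not stated in the text, so that choice is left open here.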
S123a: Extract metadata of the target black sample from each black sample session stream.
Specifically, the metadata is extracted from each of the recombined black sample session streams, where the extracted metadata includes a source IP, a destination IP, a source port, a destination port, a protocol type, a packet size list, a packet arrival time list, a clientheadername list, a serverheadername list, a packet transmission direction list, a load size list, the number of packets in the session, an effective number of packets in the session, a request method, a protocol version, and a response state, but is not limited thereto.
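Illustratively, extracting a subset of the listed metadata items from one recombined session stream might look like the following sketch. The `Pkt` record, the dictionary keys, and the reading of an "effective" packet as one that carries payload are assumptions for illustration, not definitions from the disclosure.

```python
from typing import NamedTuple


class Pkt(NamedTuple):
    """Hypothetical packet record; names are assumptions for illustration."""
    timestamp: float
    size: int          # total packet size in bytes
    payload_size: int  # application payload size in bytes
    from_client: bool  # True if sent by the session initiator


def extract_metadata(stream):
    """Build a subset of the per-session metadata items listed above."""
    return {
        "packet_size_list": [p.size for p in stream],
        "packet_arrival_time_list": [p.timestamp for p in stream],
        "packet_direction_list": [1 if p.from_client else 0 for p in stream],
        "payload_size_list": [p.payload_size for p in stream],
        "packet_count": len(stream),
        # Assumption: an "effective" packet is one that carries payload.
        "effective_packet_count": sum(1 for p in stream if p.payload_size > 0),
    }
```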
Optionally, on the basis of the foregoing embodiment, as shown in fig. 4, before inputting the first HTTP traffic data into the HTTP tunnel detection model, the method further includes:
S121b: Acquire an original white sample, where the original white sample includes a plurality of third HTTP traffic data not collected from an HTTP tunnel.
For example, a traffic capture card is used at the gateway egress to capture a plurality of third HTTP traffic data on port 80, which constitutes the original white sample, and the plurality of third HTTP traffic data is stored in a pcap file, but the disclosure is not limited thereto. The target white sample refers to the sample data after recombination.
S122b: Recombine the HTTP traffic data having the same quintuple information among the plurality of third HTTP traffic data into one session stream, to obtain a plurality of white sample session streams. For the specific implementation, refer to step S122a of the above embodiment, which is not repeated here.
S123b: Extract metadata of the target white sample from each white sample session stream. For the specific implementation, refer to step S123a of the above embodiment, which is not repeated here.
In this way, this embodiment recombines the black samples and white samples obtained over a period of time into black sample session streams and white sample session streams and extracts the metadata of the target black sample and of the target white sample, so that training samples can then be constructed from this metadata to obtain the HTTP tunnel detection model.
On the basis of the above embodiments, in some embodiments of the present disclosure, once the metadata of the target black sample and the metadata of the target white sample have been obtained separately, one possible implementation is as follows.
Fig. 5 is a flowchart illustrating a further HTTP tunnel detection method according to an embodiment of the present disclosure, where as shown in fig. 5, after extracting metadata of a target black sample from each black sample session stream and extracting metadata of a target white sample from each white sample session stream, the method further includes:
S13a: Compare the metadata of the target black sample with the metadata of the target white sample, and determine a metadata item whose feature difference is greater than or equal to a difference threshold as one screening data.
The feature difference refers to the difference in features between the metadata of the target black sample and the metadata of the target white sample. The difference threshold is a value set for selecting the screening data from the plurality of metadata items; its size is not particularly limited in the present disclosure and can be set by a person skilled in the art according to the actual situation.
Specifically, the metadata of the target black sample is compared with the metadata of the target white sample, and when the feature difference of a metadata item between the target black sample and the target white sample is greater than or equal to the difference threshold, that metadata item is determined to be screening data.
In this way, this embodiment compares the metadata of the target black sample with the metadata of the target white sample, selects the metadata items that differ substantially as screening data, and builds the training samples from this screening data, so that training the HTTP tunnel detection model on these training samples takes less time and the detection accuracy of the model improves.
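Illustratively, the screening step can be sketched as below. The disclosure does not define how the feature difference is computed, so the measure used here (the absolute gap between the black and white means, scaled by the pooled standard deviation) is an assumption, as are the function names and the input layout.

```python
from statistics import mean, pstdev


def feature_difference(black_values, white_values):
    """Assumed measure: |mean gap| scaled by the pooled standard deviation."""
    pooled = pstdev(list(black_values) + list(white_values))
    gap = abs(mean(black_values) - mean(white_values))
    return gap / pooled if pooled > 0 else 0.0


def select_screening_items(black_meta, white_meta, threshold):
    """Keep metadata items whose feature difference is >= threshold.

    black_meta / white_meta map item name -> list of per-session values.
    """
    return [
        name
        for name in black_meta
        if name in white_meta
        and feature_difference(black_meta[name], white_meta[name]) >= threshold
    ]
```

For instance, a metadata item whose values are nearly identical in both sample sets scores close to 0 and is dropped, while an item such as a payload mean that is far larger in tunnel sessions is kept.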
Fig. 6 is a flowchart illustrating a further HTTP tunnel detection method according to an embodiment of the present disclosure, where as shown in fig. 6, after extracting metadata of a target black sample from each black sample session stream and extracting metadata of a target white sample from each white sample session stream, the method further includes:
S13b: Extract at least one characteristic parameter of a metadata item as one creation data.
The characteristic parameter is a parameter calculated from the metadata of the target black sample and the metadata of the target white sample. It may be, for example, an upload/download bit feature value, a payload mean, a payload maximum, a mean inter-arrival time of the stream, or the proportion of dots in a character string, but the disclosure is not limited thereto.
Specifically, one or more characteristic parameters of a metadata item are obtained from the metadata of the target black sample and the metadata of the target white sample, and the one or more characteristic parameters are used as creation data.
For example, when the metadata items are clientheadername and servername, training a random forest directly on characters does not work well. The character strings of clientheadername and servername are therefore converted into numerical reputation scores by a recurrent neural network with gated recurrent units (GRU), and the maximum, minimum, mean and so on of the reputation scores are then calculated from the resulting list of scores. In addition, for the character strings of clientheadername and servername, the proportion of special characters and the proportion of dots in each string are determined, so as to quantify the characters, but the disclosure is not specifically limited thereto.
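Illustratively, some of the characteristic parameters named above can be computed as in the following sketch. The GRU-based reputation score is deliberately left out. The function names, the dictionary keys, and the definition of a "special" character are assumptions for illustration.

```python
from statistics import mean


def numeric_features(payload_sizes, arrival_times):
    """Summary statistics over one session's payload sizes and timestamps."""
    gaps = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
    return {
        "payload_mean": mean(payload_sizes) if payload_sizes else 0.0,
        "payload_max": max(payload_sizes, default=0),
        "interval_mean": mean(gaps) if gaps else 0.0,
    }


def string_features(header_name):
    """Proportion of dots and of special characters in a header-name string.

    Assumption: "special" means anything other than letters, digits, '-' and '.'.
    """
    n = len(header_name)
    if n == 0:
        return {"dot_ratio": 0.0, "special_ratio": 0.0}
    dots = header_name.count(".")
    special = sum(1 for c in header_name if not (c.isalnum() or c in "-."))
    return {"dot_ratio": dots / n, "special_ratio": special / n}
```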
On the basis of the above embodiments, in some embodiments of the present disclosure, feature preprocessing is performed on target data to obtain processed target data.
The target data includes the screening data and/or the creation data. The feature preprocessing includes at least one of the following: feature normalization; one-hot encoding; and missing value processing.
Illustratively, feature normalization refers to converting a value into a decimal between 0 and 1, or converting a dimensional expression into a dimensionless one, so that the data can be processed more conveniently and quickly.
One-hot encoding refers to encoding the training samples in the form of 0s and 1s.
Missing value processing refers to filling missing values in the metadata with 0. For example, when the payload of a sample is missing, the missing value is filled with 0; if only a small number of values are missing, the affected samples may be deleted instead, but the disclosure is not limited thereto.
In this way, by preprocessing the screening data and/or the creation data in the target data, this embodiment reduces the training time of the HTTP tunnel detection model when it is trained on training samples that include the preprocessed screening data and/or creation data.
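Illustratively, the three preprocessing operations can be sketched as follows. Min-max scaling is assumed as the normalization method, and `None` is assumed to mark a missing value; the disclosure fixes neither choice.

```python
def fill_missing(values, fill=0.0):
    """Replace missing entries (None) with a fill value, 0 by default."""
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Scale values linearly into the closed interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Encode categorical values as 0/1 vectors over the sorted categories."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]
```

Missing values would be filled before normalization, since `min` and `max` cannot compare `None` with numbers.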
Fig. 7 shows an HTTP tunnel detection apparatus provided in an embodiment of the present disclosure, which includes a first HTTP traffic data acquisition module 11 and a detection result acquisition module 12.
The first HTTP traffic data acquisition module 11 is configured to acquire first HTTP traffic data.
The detection result acquisition module 12 is configured to input the first HTTP traffic data into the HTTP tunnel detection model and obtain a detection result, where the detection result indicates either that the first HTTP traffic data was collected from an HTTP tunnel or that it was not collected from an HTTP tunnel.
The HTTP tunnel detection model is a random forest model obtained by training on training samples. The training samples include multiple sets of data, and each set of data includes a plurality of screening data and a plurality of creation data. Each screening data is a metadata item whose feature difference between the metadata of the target white sample and the metadata of the target black sample is greater than or equal to a difference threshold, and each creation data is at least one characteristic parameter of the metadata item.
In the above embodiment, the detection result acquisition module 12 further includes a filtering module configured to compare the domain name of the first HTTP traffic data with the domain names stored in a white list, the white list holding domain names of traffic data not collected from an HTTP tunnel, and, if the domain name of the first HTTP traffic data is not in the white list, to input the first HTTP traffic data into the HTTP tunnel detection model.
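Illustratively, the white-list comparison performed by the filtering module can be sketched as below. Taking the domain name from the `Host` request header is an assumption, since the disclosure does not say where the domain name is read from; the function names are hypothetical, and IPv6 literal hosts are not handled.

```python
def host_of(request: bytes) -> str:
    """Return the lower-cased Host header value (without port), or ''."""
    for line in request.split(b"\r\n")[1:]:
        if not line:
            break  # blank line marks the end of the headers
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"host":
            host = value.strip().decode("ascii", errors="replace").lower()
            return host.split(":", 1)[0]
    return ""


def should_run_model(request: bytes, whitelist: set) -> bool:
    """Send traffic to the model only if its domain is not whitelisted."""
    return host_of(request) not in whitelist
```

A request without a `Host` header yields an empty domain name, which is never in the white list, so such traffic is passed on to the model.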
In the above embodiment, the filtering module is further configured to parse the first HTTP traffic data to obtain session content and, if the session content does not conform to the HTTP protocol specification, to input the first HTTP traffic data into the HTTP tunnel detection model.
In the above embodiment, the apparatus further includes a data acquisition module configured to: acquire an original black sample, where the original black sample includes a plurality of second HTTP traffic data collected from an HTTP tunnel; recombine the HTTP traffic data having the same quintuple information among the plurality of second HTTP traffic data into one session stream, to obtain a plurality of black sample session streams; and extract metadata of the target black sample from each black sample session stream; and/or acquire an original white sample, where the original white sample includes a plurality of third HTTP traffic data not collected from an HTTP tunnel; recombine the HTTP traffic data having the same quintuple information among the plurality of third HTTP traffic data into one session stream, to obtain a plurality of white sample session streams; and extract metadata of the target white sample from each white sample session stream.
In the above embodiment, the data acquisition module is further configured to compare the metadata of the target black sample with the metadata of the target white sample, and determine a metadata item whose feature difference is greater than or equal to a difference threshold as one screening data.
In the above embodiment, the data acquisition module is further configured to extract at least one characteristic parameter of a metadata item as one creation data.
In the above embodiment, the data acquisition module is further configured to perform feature preprocessing on the target data to obtain processed target data, where the target data includes the screening data and/or the creation data, and the feature preprocessing includes at least one of the following: feature normalization; one-hot encoding; and missing value processing.
In this way, in this embodiment the first HTTP traffic data acquisition module 11 acquires the first HTTP traffic data, and the detection result acquisition module 12 inputs the first HTTP traffic data into the HTTP tunnel detection model and obtains a detection result, where the detection result indicates either that the first HTTP traffic data was collected from an HTTP tunnel or that it was not collected from an HTTP tunnel. The HTTP tunnel detection model is a random forest model obtained by training on training samples. The training samples include multiple sets of data, and each set of data includes a plurality of screening data and a plurality of creation data. Each screening data is a metadata item whose feature difference between the metadata of the target white sample and the metadata of the target black sample is greater than or equal to a difference threshold, and each creation data is at least one characteristic parameter of the metadata item. Because the training samples are constructed to include both the screening data and the creation data, the training samples are enriched and the difference between the target black sample and the target white sample is fully exploited, so the HTTP tunnel detection model trained on them is more stable and the accuracy of HTTP tunnel detection improves.
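Illustratively, training and applying a forest-style classifier on such feature vectors can be sketched as below. This is a toy stand-in rather than the model of the disclosure: it bags one-split decision stumps trained on bootstrap samples with random feature subsets, whereas a full random forest grows deeper trees and would normally come from an established library. All function names, the label convention (1 for black, 0 for white) and the hyperparameters are assumptions.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def majority(labels, default=0):
    """Most common label, or a default when the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(X, y, features):
    """Find the (feature, threshold) split with the lowest weighted Gini."""
    fallback = majority(y)
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left, fallback), majority(right, fallback))
    return best[1:]


def fit_forest(X, y, n_trees=25, seed=0):
    """Train stumps on bootstrap samples with random feature subsets."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    k = max(1, int(d ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        feats = rng.sample(range(d), k)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote: 1 = HTTP tunnel (black), 0 = normal (white)."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return majority(votes)
```

Each row of `X` would hold the preprocessed screening data and creation data of one session, and `y` the corresponding black or white label.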
The apparatus of this embodiment may be used to implement the technical solution of any one of the method embodiments shown in fig. 1 to fig. 6, and the implementation principle and the technical effect are similar, which are not described herein again.
An embodiment of the present disclosure provides an electronic device, as shown in fig. 8, including a memory and a processor, the memory storing a computer program. When the processor executes the computer program, the HTTP tunnel detection method provided in the embodiments of the present disclosure is implemented; for example, the technical solution of any one of the method embodiments shown in fig. 1 to fig. 6 is implemented. The implementation principle and the technical effect are similar and are not repeated here.
The present disclosure also provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the HTTP tunnel detection method provided in the embodiments of the present disclosure, for example the technical solution of any one of the method embodiments shown in fig. 1 to fig. 6. The implementation principle and the technical effect are similar and are not repeated here.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present disclosure, which enable those skilled in the art to understand or practice the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. An HTTP tunnel detection method, comprising:
acquiring first HTTP traffic data;
inputting the first HTTP traffic data into an HTTP tunnel detection model, and obtaining a detection result, wherein the detection result is used for indicating that the first HTTP traffic data is collected by the HTTP tunnel, or indicating that the first HTTP traffic data is not collected by the HTTP tunnel;
wherein the HTTP tunnel detection model is a random forest model obtained by training on training samples, the training samples comprising multiple sets of data, each set of data comprising a plurality of screening data and a plurality of creation data, each screening data being a metadata item whose feature difference between the metadata of a target white sample and the metadata of a target black sample is greater than or equal to a difference threshold, and each creation data being at least one characteristic parameter of the metadata item.
2. The method of claim 1, wherein
before the inputting the first HTTP traffic data into the HTTP tunnel detection model, the method further comprises:
comparing the domain name of the first HTTP traffic data with domain names stored in a white list, the white list storing domain names of traffic data not collected by the HTTP tunnel;
and the inputting the first HTTP traffic data into the HTTP tunnel detection model comprises:
if the domain name of the first HTTP traffic data does not exist in the white list, inputting the first HTTP traffic data into the HTTP tunnel detection model.
3. The method of claim 1, wherein
before the inputting the first HTTP traffic data into the HTTP tunnel detection model, the method further comprises:
parsing the first HTTP traffic data to obtain session content;
and the inputting the first HTTP traffic data into the HTTP tunnel detection model comprises:
if the session content does not conform to the HTTP protocol specification, inputting the first HTTP traffic data into the HTTP tunnel detection model.
4. The method of claim 1, wherein before the inputting the first HTTP traffic data into the HTTP tunnel detection model, the method further comprises:
acquiring an original black sample, wherein the original black sample comprises a plurality of second HTTP traffic data collected by the HTTP tunnel;
recombining the HTTP traffic data having the same quintuple information among the plurality of second HTTP traffic data into one session stream, to obtain a plurality of black sample session streams;
extracting metadata of the target black sample from each black sample session stream;
and/or,
acquiring an original white sample, wherein the original white sample comprises a plurality of third HTTP traffic data not collected by the HTTP tunnel;
recombining the HTTP traffic data having the same quintuple information among the plurality of third HTTP traffic data into one session stream, to obtain a plurality of white sample session streams;
extracting metadata of the target white sample from each white sample session stream.
5. The method of claim 4, wherein after extracting the metadata of the target black sample from each black sample session stream and extracting the metadata of the target white sample from each white sample session stream, the method further comprises:
comparing the metadata of the target black sample with the metadata of the target white sample, and determining a metadata item whose feature difference is greater than or equal to a difference threshold as one screening data.
6. The method of claim 4, wherein after extracting the metadata of the target black sample from each black sample session stream and extracting the metadata of the target white sample from each white sample session stream, the method further comprises:
extracting at least one characteristic parameter of a metadata item as one creation data.
7. The method according to any one of claims 5-6, further comprising:
performing feature preprocessing on target data to obtain processed target data, wherein the target data comprises the screening data and/or the creation data;
the feature preprocessing comprises at least one of the following:
feature normalization;
one-hot encoding;
missing value processing.
8. An HTTP tunnel detection apparatus, comprising:
a first HTTP traffic data acquisition module, configured to acquire first HTTP traffic data;
a detection result acquisition module, configured to input the first HTTP traffic data into an HTTP tunnel detection model and obtain a detection result, wherein the detection result is used to indicate that the first HTTP traffic data is collected by the HTTP tunnel, or to indicate that the first HTTP traffic data is not collected by the HTTP tunnel;
wherein the HTTP tunnel detection model is a random forest model obtained by training on training samples, the training samples comprising multiple sets of data, each set of data comprising a plurality of screening data and a plurality of creation data, each screening data being a metadata item whose feature difference between the metadata of a target white sample and the metadata of a target black sample is greater than or equal to a difference threshold, and each creation data being at least one characteristic parameter of the metadata item.
9. An electronic device comprising a memory and a processor, the memory storing a computer program, wherein the processor when executing the computer program implements the steps of the HTTP tunnel detection method of any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the HTTP tunnel detection method according to any one of claims 1 to 7.
CN202111332949.8A 2021-11-11 2021-11-11 HTTP tunnel detection method, device, electronic equipment and storage medium Pending CN114070602A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111332949.8A CN114070602A (en) 2021-11-11 2021-11-11 HTTP tunnel detection method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111332949.8A CN114070602A (en) 2021-11-11 2021-11-11 HTTP tunnel detection method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114070602A true CN114070602A (en) 2022-02-18

Family

ID=80275097

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111332949.8A Pending CN114070602A (en) 2021-11-11 2021-11-11 HTTP tunnel detection method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114070602A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220218