CN115378850A - Sketch-based encryption flow online analysis method and system - Google Patents


Info

Publication number
CN115378850A
Authority
CN
China
Prior art keywords
stream
flow
sketch
information
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211053892.2A
Other languages
Chinese (zh)
Other versions
CN115378850B (en)
Inventor
彭立志
吕梦达
郝逸航
李辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Jinan
Original Assignee
University of Jinan
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Jinan
Priority to CN202211053892.2A
Publication of CN115378850A
Application granted
Publication of CN115378850B
Status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0876 Network utilisation, e.g. volume of load or congestion level
    • H04L43/12 Network monitoring probes
    • H04L43/16 Threshold monitoring
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Abstract

The invention discloses a Sketch-based online analysis method and system for encrypted traffic, belonging to the technical field of network security. The method collects raw network traffic data; extracts the five-tuple information and statistical feature information of the raw traffic data, the statistical feature information including payload size and arrival time, and filters out packets whose payload size is smaller than a first threshold; groups packets into flows according to the five-tuple information and calculates a flow ID; and, according to the statistical feature information of each flow, performs coarse-grained identification of the flow using a Sketch data structure, grades the importance of the flow, and retains the information of flows whose importance exceeds a second threshold. The method enables reliable online identification and analysis of encrypted traffic with low memory usage and high processing speed, addressing the high computing-power requirements, large memory footprint, low processing speed and difficulty of online identification found in the prior art.

Description

Sketch-based encryption flow online analysis method and system
Technical Field
The present application relates to the technical field of network security, and in particular to a Sketch-based online analysis method and system for encrypted traffic.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
Traffic analysis and identification are foundations for improving network quality of service and maintaining cyberspace security. However, as the share of encrypted communication traffic in the network keeps growing, traditional content-inspection methods are no longer effective, and fast, accurate analysis and identification of encrypted traffic is becoming increasingly difficult. Studies have shown that encrypted traffic has accounted for more than 80% of network traffic since 2020. To keep cyberspace clean and healthy, encrypted traffic analysis and identification has become a research hotspot in both academia and industry in recent years.
Because of techniques such as dynamic ports and port masquerading, traditional port-based identification methods have become ineffective. Existing traffic analysis and identification methods can be roughly divided into two types: content-based identification methods and non-content-based identification methods.
Content-based identification methods (e.g., deep packet inspection) must inspect the content of every packet and match it against a pre-built fingerprint library to complete identification. This approach is inefficient and, because the payload of encrypted traffic cannot be read, unsuitable for identifying encrypted traffic.
Non-content-based identification methods (e.g., those using flow statistics) achieve good accuracy in encrypted traffic identification, but training machine learning models requires large numbers of data samples, and training deep learning models requires substantial computing power. In addition, the models occupy a large amount of memory, which makes them difficult to use for online analysis and identification.
In summary, current research on traffic analysis and identification still has the following shortcomings: as encryption becomes widespread, traditional methods are gradually losing effectiveness, while emerging methods suffer from high computing-power usage, large memory footprint, low processing speed and difficulty of online identification.
Disclosure of Invention
To overcome these shortcomings of the prior art, the present application provides a Sketch-based online analysis method and system for encrypted traffic. First, network traffic is collected and the statistical features of packets are extracted; then a Sketch is used to measure flows periodically, and the importance of each flow is graded according to its duration and frequency; finally, the statistical information of the first K packets of high-importance flows is retained, while low-importance flows are evicted. This enables more reliable online identification and analysis of encrypted traffic with low memory usage and high processing speed.
In a first aspect, the present application provides a Sketch-based online analysis method for encrypted traffic;
a Sketch-based online analysis method for encrypted traffic comprises the following steps:
collecting raw network traffic data;
extracting the five-tuple information and statistical feature information of the raw network traffic data, the statistical feature information including payload size and arrival time, and filtering out packets whose payload size is smaller than a first threshold; grouping packets into flows according to the five-tuple information, and calculating a flow ID;
and, according to the statistical feature information of each flow, performing coarse-grained identification of the flow using a Sketch data structure, grading the importance of the flow, and retaining the information of flows whose importance exceeds a second threshold.
In a second aspect, the present application provides a Sketch-based online analysis system for encrypted traffic;
a Sketch-based online analysis system for encrypted traffic comprises:
a traffic collection module, configured to collect raw network traffic data;
a traffic cleaning module, configured to extract the five-tuple information and statistical feature information of the raw network traffic data, the statistical feature information including payload size and arrival time, to filter out packets whose payload size is smaller than a first threshold, to group packets into flows according to the five-tuple information, and to calculate a flow ID;
and a traffic analysis module, configured to perform, according to the statistical feature information of each flow, coarse-grained identification of the flow using the Sketch data structure, to grade the importance of the flow, and to retain the information of flows whose importance exceeds a second threshold.
In a third aspect, the present application provides an electronic device;
an electronic device comprises a memory, a processor, and computer instructions stored in the memory and executable on the processor; when executed by the processor, the computer instructions perform the steps of the above Sketch-based online analysis method for encrypted traffic.
In a fourth aspect, the present application provides a computer-readable storage medium;
a computer-readable storage medium stores computer instructions that, when executed by a processor, perform the steps of the above Sketch-based online analysis method for encrypted traffic.
Compared with the prior art, the beneficial effects of the present application are as follows:
the present application provides a Sketch-based online analysis method and system for encrypted traffic. First, it enables online capture and analysis of network traffic. Second, it stores network traffic in compressed form according to its statistical features, avoiding storage of large volumes of raw traffic data and greatly reducing memory usage. Third, because only the statistical features of flows are used, the application is equally applicable to encrypted traffic. Fourth, the Sketch data structure meets the requirements of online measurement and also allows network flows to be measured periodically. Finally, the importance of each flow is graded according to its frequency and persistence, and only the information of the first K packets of important flows is stored, which reduces memory usage and increases processing speed.
Drawings
The accompanying drawings, which form part of this application, are provided to give a further understanding of the application; the exemplary embodiments and their description serve to explain the application and do not limit it.
Fig. 1 is a schematic flowchart of the Sketch-based online analysis method for encrypted traffic according to an embodiment of the present application;
Fig. 2 is a schematic diagram of the framework of the Sketch-based online analysis system for encrypted traffic according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of the Sketch data structure according to an embodiment of the present application;
Fig. 4 is a schematic flowchart of the update operation of the Sketch data structure according to an embodiment of the present application;
Fig. 5 is a schematic flowchart of the operation of the clock scanning module in the Sketch data structure according to an embodiment of the present application.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure herein. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present application. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and it should be understood that the terms "comprises" and "comprising", and any variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiments of the invention and the features of the embodiments may be combined with each other provided they do not conflict.
Example one
In the prior art, traditional methods are gradually losing effectiveness as encryption becomes widespread, and emerging methods suffer from high computing-power usage, large memory footprint, low processing speed and difficulty of online identification. A Sketch is an efficient data structure with stable error bounds: using a hashing strategy, it retains statistics of the raw data as completely and accurately as possible, with far lower space overhead than the raw data and very low time overhead. By building the method on a Sketch, low-overhead, efficient and reliable online identification of encrypted traffic is achieved. Therefore, the present application provides a Sketch-based online analysis method for encrypted traffic.
A Sketch-based online analysis method for encrypted traffic comprises the following steps:
collecting raw network traffic data;
extracting the five-tuple information and statistical feature information of the raw network traffic data, the statistical feature information including payload size and arrival time, and filtering out packets whose payload size is smaller than a first threshold; grouping packets into flows according to the five-tuple information, and calculating a flow ID;
and, according to the statistical feature information of each flow, performing coarse-grained identification of the flow using a Sketch data structure, grading the importance of the flow, and retaining the information of flows whose importance exceeds a second threshold.
Further, after the flow ID is calculated, the method further comprises:
determining, according to the flow ID, whether the flow is a new flow;
if the flow is a new flow, saving the flow ID, the five-tuple information and the arrival time.
Further, performing coarse-grained identification of the flow using a Sketch data structure according to the statistical feature information of the flow, and grading the importance of the flow, comprises:
measuring flows periodically;
and grading the importance of each flow according to its flow frequency and flow persistence.
Further, the flow frequency is the degree to which the number of occurrences of a flow within a period falls within a given interval, taken over all periods in which the flow appears;
the flow persistence is the degree to which a flow keeps appearing across successive periods.
Further, the Sketch data structure comprises a bit array, a two-dimensional bucket array, a clock scanning module, an importance measure function and hyper-parameters. The two-dimensional bucket array stores the payload sizes and arrival times of packets in each flow, together with the flow's ID, size, frequency and persistence; the clock scanning module measures flows periodically; the importance measure function derives the importance of a flow from its flow frequency and flow persistence; and the hyper-parameters maintain the periodic measurement of flows.
Further, whether to use a capture filter and/or remote capture is decided according to the traffic collection requirements.
Further, the payload size is the size of the transport-layer payload of a packet; the arrival time is the kernel time of the network device at the moment the device records the packet onto its storage medium; and the five-tuple information comprises the source IP address, source port, destination IP address, destination port and protocol type of the packet.
Next, the Sketch-based online analysis method for encrypted traffic disclosed in this embodiment is described in detail with reference to Figs. 1 to 5.
This embodiment provides a Sketch-based online analysis method for encrypted traffic.
A Sketch-based online analysis method for encrypted traffic comprises the following steps:
S1, collecting raw network traffic data; the specific steps are as follows:
S101, configure a capture filter according to the traffic collection requirements so that only the desired traffic is captured; traffic can be filtered by criteria such as type, transmission direction, protocol and data. The syntax of a capture filter expression is {<protocol> <direction> <host> <value> <logical operation> <other expression>}; the role and optional values of each field are shown in Table 1.
Table 1: Fields of the capture filter expression
[Table 1 is available only as an image in the original publication.]
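Because Table 1 survives only as an image, its exact field values are not recoverable here. As a rough illustration, the Python sketch below composes an expression in the field order given in S101, assuming the fields map onto standard pcap-filter (BPF) qualifiers, with the <host> slot treated as the type qualifier (host, net, port, portrange). The helper `build_capture_filter` and its keyword sets are hypothetical, and qualifier combinations are not validated.

```python
from typing import Optional

# Hypothetical keyword sets; all are standard pcap-filter primitives.
PROTOCOLS = {"ether", "ip", "ip6", "arp", "tcp", "udp", "icmp"}
DIRECTIONS = {"src", "dst", "src or dst", "src and dst"}
TYPES = {"host", "net", "port", "portrange"}
LOGICAL_OPS = {"and", "or", "and not", "or not"}


def build_capture_filter(protocol: Optional[str], direction: Optional[str],
                         kind: str, value: str,
                         logical_op: Optional[str] = None,
                         other: Optional[str] = None) -> str:
    """Compose <protocol> <direction> <type> <value> [<op> (<other>)]."""
    if protocol is not None and protocol not in PROTOCOLS:
        raise ValueError(f"unsupported protocol: {protocol}")
    if direction is not None and direction not in DIRECTIONS:
        raise ValueError(f"unsupported direction: {direction}")
    if kind not in TYPES:
        raise ValueError(f"unsupported type: {kind}")
    # Optional qualifiers that are None are simply omitted.
    expr = " ".join(p for p in (protocol, direction, kind, value) if p)
    if logical_op is not None:
        if logical_op not in LOGICAL_OPS or not other:
            raise ValueError("a logical operator needs a valid operator and a second expression")
        # Parenthesize the second expression so operator precedence is explicit.
        expr = f"{expr} {logical_op} ({other})"
    return expr
```

For example, `build_capture_filter("tcp", "dst", "port", "443")` yields `tcp dst port 443`, a string that could be passed to whatever capture library is in use.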
S102, use remote capture to capture the traffic of remote network devices, separating the traffic collection side from the analysis side to reduce the system load. Specifically, a network connection between the capturing device and the captured device is established and maintained, and the rpcapd.exe application (the WinPcap remote capture server program) is run on the captured device so that captured traffic data is transmitted to the capturing device; once the connection is established, the IP address, user name and password of the captured device are configured on the capturing device to enable remote capture.
S103, list the host's network adapters and select the one to capture on according to the identifier of the target adapter, completing the binding to the target adapter. Each adapter is uniquely determined by its identifier, and different adapters have different identifiers, in the format {FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF}.
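A minimal sketch of adapter selection in S103, assuming adapters are listed under WinPcap-style names of the form `\Device\NPF_{GUID}` and that the identifier follows the standard 8-4-4-4-12 hexadecimal GUID layout. The function `select_adapter` is hypothetical; in practice the list of names would come from the capture library's device enumeration.

```python
import re
from typing import Iterable

# Adapter GUID: 8-4-4-4-12 hexadecimal digits inside braces.
GUID_RE = re.compile(r"\{[0-9A-Fa-f]{8}-(?:[0-9A-Fa-f]{4}-){3}[0-9A-Fa-f]{12}\}")


def select_adapter(device_names: Iterable[str], target_guid: str) -> str:
    """Return the device name whose embedded GUID matches target_guid."""
    if not GUID_RE.fullmatch(target_guid):
        raise ValueError(f"malformed adapter identifier: {target_guid}")
    wanted = target_guid.upper()          # GUID comparison is case-insensitive
    for name in device_names:
        match = GUID_RE.search(name)
        if match and match.group(0).upper() == wanted:
            return name
    raise LookupError(f"no adapter with identifier {target_guid}")
```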
S104, start collecting traffic. Raw network traffic data first arrives in the kernel buffer of WinPcap's NPF (Netgroup Packet Filter) driver; when the buffer is full, the data is passed to a user-space buffer via a callback function for asynchronous processing.
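The hand-off in S104 can be pictured as a producer/consumer queue. The sketch below is a hypothetical Python analogue, not the WinPcap API: `on_buffer_full` stands in for the capture callback that receives a filled buffer, and a worker thread processes the packets asynchronously so that the capture side never waits on analysis.

```python
import queue
import threading
from typing import Callable, List, Optional, Tuple

Packet = Tuple[float, bytes]  # (arrival timestamp, raw frame bytes)


class AsyncPacketPipeline:
    """Hypothetical hand-off from a capture callback to a worker thread."""

    def __init__(self, process: Callable[[Packet], None]) -> None:
        self._queue: "queue.Queue[Optional[List[Packet]]]" = queue.Queue()
        self._process = process
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def on_buffer_full(self, batch: List[Packet]) -> None:
        # Capture side: copy the batch, enqueue it and return immediately.
        self._queue.put(list(batch))

    def close(self) -> None:
        self._queue.put(None)     # sentinel: no more batches
        self._worker.join()

    def _run(self) -> None:
        # Analysis side: drain batches in arrival order.
        while True:
            batch = self._queue.get()
            if batch is None:
                break
            for packet in batch:
                self._process(packet)
```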
S2, extract the five-tuple information and statistical feature information of the raw network traffic data, the statistical feature information including payload size and arrival time; filter out packets whose payload size is smaller than a first threshold; group packets into flows according to the five-tuple information and calculate a flow ID. The payload size is the size of the transport-layer payload of a packet; the arrival time is the kernel time of the network device at the moment the device records the packet onto its storage medium; the five-tuple information comprises the source IP address, source port, destination IP address, destination port and protocol type of the packet; the first threshold is set according to task requirements and experience. The specific steps are as follows:
S201, obtain the arrival time and total length of the packet. Parse the data-link-layer header according to the Ethernet frame format, then parse the network-layer header to obtain the source IP address, destination IP address and transport-layer protocol type, together with the IP header length and the IP total length (header plus payload), from which the payload size is calculated. If the network adapter has GRO (Generic Receive Offload) enabled, multiple TCP segments sent by the peer are aggregated in the adapter before being passed to the NPF kernel buffer; in that case the total-length field of the IP datagram is 0, and the total length of the captured packet is used instead in the payload calculation.
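A minimal sketch of the payload-size calculation in S201 and S202, assuming untagged Ethernet II frames carrying IPv4 with TCP or UDP. For the GRO case, this sketch reads the description as substituting the captured packet length (minus the 14-byte Ethernet header) for the zero IP total-length field; that interpretation, and the function name `payload_size`, are assumptions.

```python
import struct
from typing import Optional

ETH_HEADER_LEN = 14
ETHERTYPE_IPV4 = 0x0800
PROTO_TCP, PROTO_UDP = 6, 17


def payload_size(frame: bytes, wire_len: int) -> Optional[int]:
    """Transport-layer payload size of an Ethernet II / IPv4 frame, else None.

    wire_len is the packet length reported by the capture header.
    """
    if len(frame) < ETH_HEADER_LEN + 20:
        return None
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IPV4:
        return None                      # VLAN tags, IPv6, etc. are out of scope
    ip = ETH_HEADER_LEN
    ihl = (frame[ip] & 0x0F) * 4         # IP header length in bytes
    (total_len,) = struct.unpack_from("!H", frame, ip + 2)
    proto = frame[ip + 9]
    if total_len == 0:
        # Assumed GRO fallback: derive the IP length from the captured length.
        total_len = wire_len - ETH_HEADER_LEN
    l4 = ip + ihl
    if proto == PROTO_TCP:
        if len(frame) < l4 + 13:
            return None
        l4_header_len = (frame[l4 + 12] >> 4) * 4   # TCP data offset
    elif proto == PROTO_UDP:
        l4_header_len = 8                           # fixed UDP header
    else:
        return None
    return max(total_len - ihl - l4_header_len, 0)
```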
S202, parse the transport-layer header of the packet to obtain the source port, destination port and transport-layer header length, and filter out packets whose payload is smaller than the first threshold. Packets with identical five-tuple information form one flow; the five-tuple information of the packet is hashed with the xxHash algorithm to produce a 64-bit unsigned integer that serves as the flow ID. The first threshold is set according to task requirements and experience.
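The flow-ID step of S202 can be sketched as follows. The description specifies xxHash; because xxHash is not in the Python standard library, this sketch substitutes BLAKE2b truncated to 8 bytes to obtain a 64-bit unsigned ID. With the third-party `xxhash` package installed, `xxhash.xxh64(key).intdigest()` could be used instead. The names `FiveTuple`, `flow_id` and `keep_packet` are hypothetical.

```python
import hashlib
import ipaddress
import struct
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: int   # IP protocol number, e.g. 6 for TCP


def flow_id(t: FiveTuple) -> int:
    """64-bit unsigned flow ID (BLAKE2b stand-in for xxHash64)."""
    key = (ipaddress.ip_address(t.src_ip).packed + struct.pack("!H", t.src_port)
           + ipaddress.ip_address(t.dst_ip).packed
           + struct.pack("!HB", t.dst_port, t.protocol))
    return int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "big")


def keep_packet(payload_len: int, first_threshold: int) -> bool:
    """S202 filter: drop packets whose payload is below the first threshold."""
    return payload_len >= first_threshold
```

Because the ID is computed over the ordered five-tuple, the two directions of a connection receive different IDs, consistent with the rule that only packets with identical five-tuples form one flow.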
S203, map the flow ID to a bit array of size N1. If the corresponding bit is 0, the flow has not appeared before: its five-tuple information and arrival time are written to a structured log file, and the bit is set to 1. If the corresponding bit is 1, nothing is recorded.
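A sketch of the new-flow check in S203, assuming the flow ID is mapped to the bit array with the same modulo rule as H1 in S302 and that the structured log is a list of JSON lines. `NewFlowDetector` is a hypothetical name. Because distinct IDs can map to the same bit, a colliding flow may be treated as already seen and not logged; for example, with N1 = 1024, IDs 5 and 1029 share a bit.

```python
import json
from typing import List


class NewFlowDetector:
    """Hypothetical S203 helper: an N1-bit array marking flows already logged."""

    def __init__(self, n1: int) -> None:
        self.n1 = n1
        self.bits = bytearray((n1 + 7) // 8)   # all bits start at 0
        self.log: List[str] = []               # stand-in for the structured log file

    def observe(self, fid: int, five_tuple: tuple, arrival_time: float) -> bool:
        """Log the flow and return True if its bit was 0 (first sighting)."""
        pos = fid % self.n1                    # assumed: same modulo mapping as H1
        byte, mask = pos // 8, 1 << (pos % 8)
        if self.bits[byte] & mask:
            return False                       # bit already 1: record nothing
        self.bits[byte] |= mask
        self.log.append(json.dumps({"flow_id": fid,
                                    "five_tuple": list(five_tuple),
                                    "arrival_time": arrival_time}))
        return True
```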
S3, according to the statistical feature information of each flow, perform coarse-grained identification of the flow using the Sketch data structure, grade the importance of the flow, and retain the information of flows whose importance exceeds a second threshold. The retained flow information includes the flow ID, five-tuple information, flow importance and statistical feature information, the latter including but not limited to payload size and arrival time; the second threshold is set according to task requirements and experience.
Specifically, the Sketch data structure comprises a bit array of size N1, a two-dimensional bucket array with N2 rows and d columns, a clock scanning module, an importance measure function, and a set of hyper-parameters that maintain the periodic measurement of flows (period length, lower limit of the in-period frequency, and upper limit of the in-period frequency).
Each bucket of the two-dimensional array contains two arrays of size K and six counters. The two arrays store the payload sizes and arrival times of the first K packets of the flow; the six counters store, respectively, the flow ID, the flow size, the frequency, the persistence, the number of occurrences in the current period, and a flag indicating whether the flow has appeared in the current period. The frequency and persistence of a flow are initialized to 100; all other counters and the arrays are initialized to 0.
The bit array is used to determine whether an arriving flow is a full flow (one whose bucket arrays already hold K packets); its bits are also initialized to 0.
The clock scanning module maintains the periodic measurement of flows: it scans every bucket of the two-dimensional array once per period and decreases the persistence of flows that did not appear in the current period.
The importance measure function is a function of flow frequency and persistence: importance = α × flow frequency + β × flow persistence, where α and β can be adjusted within the interval [-1, 1] according to the needs of the measurement task.
The Sketch data structure is shown in Fig. 3, the Sketch update flow is shown in Fig. 4, and the role and meaning of each field are given in Table 2.
Table 2: Elements of a bucket in the Sketch two-dimensional array
Element | Role and meaning
Flow ID | Uniquely identifies a network flow
Flow size | Number of packets in the flow
Frequency | Degree to which the flow's frequency of occurrence matches the preset target interval
Persistence | Degree to which the flow keeps appearing
In-period frequency | Number of occurrences of the flow in the current period
Flag bit | Whether the flow has appeared in the current period
Payload size array | Payload sizes of the first K packets of the flow
Arrival time array | Arrival times of the first K packets of the flow
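The bucket layout of Table 2 and the importance measure function can be sketched as a Python dataclass. This is an illustration under stated assumptions: a flow ID of 0 marks an empty bucket, and the names `Bucket`, `importance` and `make_buckets` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

INITIAL_SCORE = 100  # initial frequency and persistence, as given in S3


@dataclass
class Bucket:
    flow_id: int = 0                 # 0 marks an empty bucket (assumption)
    size: int = 0                    # number of packets seen in the flow
    frequency: int = INITIAL_SCORE
    persistence: int = INITIAL_SCORE
    period_count: int = 0            # occurrences in the current period
    flag: int = 0                    # 1 if the flow appeared in the current period
    payload_sizes: List[int] = field(default_factory=list)    # first K packets
    arrival_times: List[float] = field(default_factory=list)  # first K packets

    @property
    def empty(self) -> bool:
        return self.flow_id == 0


def importance(b: Bucket, alpha: float, beta: float) -> float:
    """importance = alpha * frequency + beta * persistence, alpha, beta in [-1, 1]."""
    if not (-1.0 <= alpha <= 1.0 and -1.0 <= beta <= 1.0):
        raise ValueError("alpha and beta must lie in [-1, 1]")
    return alpha * b.frequency + beta * b.persistence


def make_buckets(n2: int, d: int) -> List[List[Bucket]]:
    """N2 rows x d columns of independent empty buckets."""
    return [[Bucket() for _ in range(d)] for _ in range(n2)]
```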
The specific steps are as follows:
S301, the clock scanning module scans the buckets of the two-dimensional array one by one at a constant rate, so that all buckets are scanned exactly once per period. When the pointer reaches a bucket that holds a flow whose flag bit is 0, the flow's persistence is decreased; if the bucket holds a flow whose flag bit is 1, the flow's persistence is increased and the flag bit is then reset to 0, meaning that the current period has ended for this bucket and a new period begins. The clock scanning operation is shown in Fig. 5; steps S302 to S306 are performed while the periodic measurement is running.
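A hypothetical sketch of the clock scanning module in S301. Assumptions: the total number of buckets is a multiple of the number of ticks per period, so each tick advances the pointer over the same number of buckets and a full sweep takes exactly one period; the amount by which persistence decays or grows (`step`) is not given in the description and is chosen arbitrarily.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bucket:
    flow_id: int = 0        # 0 marks an empty bucket (assumption)
    persistence: int = 100
    flag: int = 0           # 1 if the flow appeared in the current period


class ClockScanner:
    """Visit every bucket exactly once per period at a constant rate."""

    def __init__(self, buckets: List[List[Bucket]], ticks_per_period: int,
                 step: int = 1) -> None:
        self.cells = [b for row in buckets for b in row]   # row-major order
        if len(self.cells) % ticks_per_period:
            raise ValueError("bucket count must be a multiple of ticks_per_period")
        self.per_tick = len(self.cells) // ticks_per_period
        self.step = step        # hypothetical decay/growth amount
        self.pointer = 0

    def tick(self) -> None:
        for _ in range(self.per_tick):
            b = self.cells[self.pointer]
            if b.flow_id:
                if b.flag == 0:
                    b.persistence -= self.step   # absent this period: decay
                else:
                    b.persistence += self.step   # present: grow, then new period
                    b.flag = 0
            self.pointer = (self.pointer + 1) % len(self.cells)
```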
S302, when a packet of a flow arrives, the flow ID is first mapped to a position in the bit array by the hash function H1(ID) = ID mod N1. If the corresponding bit is 1, the flow is already full and no operation is performed; otherwise, step S303 is executed.
S303, the flow ID is mapped to a row of the two-dimensional bucket array by the hash function H2(ID) = ID mod N2. If the flow already exists in that row, step S304 is executed to update the flow's information in its bucket. If the flow does not exist in the row and the row has an empty bucket, the flow is inserted into the empty bucket. If the flow does not exist in the row and the row has no empty bucket, the flow with the lowest importance in the row is evicted, provided its importance is below the initial value, and the new flow is inserted in its place; otherwise, the new flow is discarded, and the procedure proceeds to step S305.
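Steps S302 and S303 can be sketched as a single lookup function, assuming both hash functions are the plain modulo maps given above, an ID of 0 marks an empty bucket, and "initial value" means the importance of a freshly inserted flow (frequency and persistence both 100). The function name `locate` is hypothetical; it returns the bucket coordinates, or None when the flow is already full or is discarded.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

INITIAL_SCORE = 100


@dataclass
class Bucket:
    flow_id: int = 0            # 0 marks an empty bucket (assumption)
    frequency: int = INITIAL_SCORE
    persistence: int = INITIAL_SCORE


def locate(fid: int, full_bits: bytearray, n1: int, rows: List[List[Bucket]],
           alpha: float, beta: float) -> Optional[Tuple[int, int]]:
    """Return (row, column) of the bucket holding fid, or None."""
    pos = fid % n1                                   # H1(ID) = ID mod N1
    if full_bits[pos // 8] & (1 << (pos % 8)):
        return None                                  # flow already full: no-op
    r = fid % len(rows)                              # H2(ID) = ID mod N2
    row = rows[r]
    for c, b in enumerate(row):                      # flow already tracked
        if b.flow_id == fid:
            return r, c
    for c, b in enumerate(row):                      # free bucket available
        if b.flow_id == 0:
            row[c] = Bucket(flow_id=fid)
            return r, c

    def score(b: Bucket) -> float:
        return alpha * b.frequency + beta * b.persistence

    c = min(range(len(row)), key=lambda i: score(row[i]))
    if score(row[c]) < alpha * INITIAL_SCORE + beta * INITIAL_SCORE:
        row[c] = Bucket(flow_id=fid)                 # evict least important flow
        return r, c
    return None                                      # discard the new flow
```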
S304, when the flow is present in its bucket, first record the arrival time and payload size of the packet in the bucket's arrays; next, increment the bucket's flow size counter by 1; finally, if the flag bit is 0, check the in-period frequency: if it lies between the lower and upper limits, increment the flow frequency counter, and if it is below the lower limit, decrement the flow frequency counter; then set the in-period frequency and the flag bit to 1. If the flag bit is 1, increment the in-period frequency by 1, and if it then exceeds the upper limit, decrement the flow frequency counter. This completes the insertion.
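A sketch of the per-packet update in S304. Two points are interpretations rather than statements from the description: the in-period counter and the flag are set to 1 whenever the flag was 0 (not only in the below-lower-limit case), and the frequency counter is decremented on every packet beyond the upper limit. Read literally, a flow's very first packet also checks an in-period count of 0, so with a positive lower limit its frequency starts by dropping one step; an implementation might choose to skip that case. The name `update` is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Bucket:
    flow_id: int = 0
    size: int = 0
    frequency: int = 100
    period_count: int = 0
    flag: int = 0
    payload_sizes: List[int] = field(default_factory=list)
    arrival_times: List[float] = field(default_factory=list)


def update(b: Bucket, payload: int, arrival: float, k: int,
           lower: int, upper: int) -> None:
    """Apply S304 for one packet of the flow stored in bucket b."""
    if len(b.payload_sizes) < k:          # keep only the first K packets
        b.payload_sizes.append(payload)
        b.arrival_times.append(arrival)
    b.size += 1
    if b.flag == 0:
        # First packet in a new period: judge the previous period's count.
        if lower <= b.period_count <= upper:
            b.frequency += 1
        elif b.period_count < lower:
            b.frequency -= 1
        b.period_count = 1
        b.flag = 1
    else:
        b.period_count += 1
        if b.period_count > upper:
            b.frequency -= 1              # literal reading: once per excess packet
```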
S305, if the flow is full after the insertion, set the corresponding bit of the bit array to 1 and execute step S306;
S306, calculate the importance of the full flow, and if it is greater than the second threshold, record the flow information in the structured log file. The importance is calculated with the importance measure function: importance = α × flow frequency + β × flow persistence. The flow frequency is the degree to which, across all periods in which the flow appears, the flow's in-period frequency falls within the given interval (lower limit ≤ in-period frequency < upper limit); the higher the flow frequency, the more closely the flow's occurrences across its periods match the preset target. The flow persistence reflects how long the flow lasts; a higher persistence means the flow has appeared in more periods, i.e., it is more persistent.
Example two
This embodiment discloses a Sketch-based online analysis system for encrypted traffic, comprising:
a traffic collection module, configured to collect raw network traffic data;
a traffic cleaning module, configured to extract the five-tuple information and statistical feature information of the raw network traffic data, the statistical feature information including payload size and arrival time; to filter out packets whose payload size is smaller than a first threshold; and to group packets into flows according to the five-tuple information and calculate a flow ID;
and a traffic analysis module, configured to perform, according to the statistical feature information of each flow, coarse-grained identification of the flow using the Sketch data structure, to grade the importance of the flow, and to retain the information of flows whose importance exceeds a second threshold.
It should be noted that the traffic collection module, the traffic cleaning module and the traffic analysis module correspond to the steps in the first embodiment; the examples and application scenarios implemented by these modules are the same as those of the corresponding steps, but are not limited to the content disclosed in the first embodiment. It should also be noted that the above modules, as part of a system, may be executed in a computer system, for example as a set of computer-executable instructions.
Example three
The third embodiment of the present invention provides an electronic device comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor; when the computer instructions are executed by the processor, the steps of the above Sketch-based online analysis method for encrypted traffic are performed.
Example four
The fourth embodiment of the present invention provides a computer-readable storage medium for storing computer instructions; when the computer instructions are executed by a processor, the steps of the above Sketch-based online analysis method for encrypted traffic are performed.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The descriptions of the foregoing embodiments each have their own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of the other embodiments.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A Sketch-based encrypted traffic online analysis method, characterized by comprising the following steps:
collecting raw network traffic data;
extracting five-tuple information and statistical feature information from the raw network traffic data, wherein the statistical feature information comprises payload size and arrival time; filtering out, according to the payload size, data packets whose payload size is smaller than a first threshold; and grouping packets into streams according to the five-tuple information and calculating a stream ID;
and performing, according to the statistical feature information of the streams, coarse-grained identification of the streams by using a Sketch data structure, grading the importance of each stream, and retaining the information of streams whose importance exceeds a second threshold.
2. The Sketch-based encrypted traffic online analysis method according to claim 1, wherein after the stream ID is calculated, the method further comprises: determining, according to the stream ID, whether the stream is a new stream;
and if the stream is a new stream, saving the stream ID, the five-tuple information and the arrival time.
3. The Sketch-based encrypted traffic online analysis method according to claim 1, wherein performing coarse-grained identification of the streams by using the Sketch data structure according to the statistical feature information of the streams, and grading the importance of each stream, comprises:
periodically measuring the streams;
and grading the importance of each stream according to its stream frequency and stream persistence.
4. The Sketch-based encrypted traffic online analysis method according to claim 3, wherein the stream frequency is the degree to which the frequency of a stream, over all the periods in which it occurs, conforms to a given interval;
and the stream persistence is the duration of a stream.
5. The Sketch-based encrypted traffic online analysis method according to claim 1, wherein the Sketch data structure comprises a bit array, a two-dimensional bucket array, a clock scanning module, an importance metric function and hyper-parameters; the bit array is used to judge whether a stream is full; the two-dimensional bucket array is used to store the payload size and arrival time of the data packets in a stream, as well as the ID, payload size, frequency and persistence of the stream; the clock scanning module is used to measure the streams periodically; the importance metric function is used to obtain the importance of a stream from its stream frequency and stream persistence; and the hyper-parameters are used to maintain the periodic measurement of the streams.
6. The Sketch-based encrypted traffic online analysis method according to claim 1, characterized in that whether to use a capture filter and/or remote capture is configured according to the requirements of the traffic to be collected.
7. The Sketch-based encrypted traffic online analysis method according to claim 1, wherein the payload size is the size of the transport-layer payload of a packet; the arrival time is the kernel time of the network device at the moment the device records the packet on its storage medium; and the five-tuple information comprises the source IP address, source port, destination IP address, destination port and protocol type of the packet.
8. A Sketch-based encrypted traffic online analysis system, characterized by comprising:
a traffic acquisition module, configured to acquire raw network traffic data;
a traffic cleaning module, configured to extract five-tuple information and statistical feature information from the raw network traffic data, wherein the statistical feature information comprises payload size and arrival time; filter out data packets whose payload size is smaller than a first threshold; and group packets into streams according to the five-tuple information and calculate a stream ID;
and a traffic analysis module, configured to perform coarse-grained identification of the streams by using the Sketch data structure according to their statistical feature information, grade the importance of each stream, and retain the information of streams whose importance exceeds a second threshold.
9. An electronic device, comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the computer instructions, when executed by the processor, perform the steps of the method according to any one of claims 1-7.
10. A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the steps of any one of claims 1-7.
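Claims 3 to 5 name the components of the Sketch (bit array, two-dimensional bucket array, clock scanning, importance metric function, hyper-parameters) but not their exact algorithms. The Python sketch below shows one possible arrangement under explicit assumptions, not the patented design: the bit array is read as a per-cell occupancy flag (one interpretation of "whether a stream is full"); stream frequency is taken as the fraction of a stream's active periods whose packet count falls within a hyper-parameter interval [lo, hi]; persistence is counted in measurement periods from first to last appearance; and the importance metric is assumed to be their product. The class name, method names and the blake2b column hash are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Cell:
    stream_id: int
    total_payload: int = 0      # accumulated payload bytes of the stream
    period_count: int = 0       # packets seen in the current period
    periods_seen: int = 0       # closed periods in which the stream appeared
    periods_in_range: int = 0   # of those, periods whose count fell in [lo, hi]
    first_period: int = 0
    last_period: int = 0
    recent: list = field(default_factory=list)  # recent (payload size, arrival time)


class ImportanceSketch:
    def __init__(self, rows, cols, period, lo, hi, max_recent=8):
        # Hypothetical hyper-parameters: period length (s), frequency interval [lo, hi],
        # and how many packet records each cell keeps.
        self.rows, self.cols = rows, cols
        self.period, self.lo, self.hi = period, lo, hi
        self.max_recent = max_recent
        self.occupied = [[False] * cols for _ in range(rows)]  # bit array
        self.buckets = [[None] * cols for _ in range(rows)]    # 2-D bucket array
        self.current_period = 0

    def _col(self, row, sid):
        # Independent hash per row (assumption: blake2b over "row:stream ID").
        digest = hashlib.blake2b(f"{row}:{sid}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.cols

    def tick(self, now):
        # Clock scan: close every measurement period that ended at or before `now`.
        while now >= (self.current_period + 1) * self.period:
            self._scan()

    def _scan(self):
        # Fold each stream's count for the finished period into frequency/persistence.
        for row in self.buckets:
            for cell in row:
                if cell is not None and cell.period_count > 0:
                    cell.periods_seen += 1
                    if self.lo <= cell.period_count <= self.hi:
                        cell.periods_in_range += 1
                    cell.period_count = 0
        self.current_period += 1

    def insert(self, sid, payload_size, arrival_time):
        self.tick(arrival_time)
        for r in range(self.rows):
            c = self._col(r, sid)
            if not self.occupied[r][c]:
                self.occupied[r][c] = True
                self.buckets[r][c] = Cell(sid, first_period=self.current_period)
            cell = self.buckets[r][c]
            if cell.stream_id != sid:
                continue  # cell held by another stream: try the next row
            cell.total_payload += payload_size
            cell.period_count += 1
            cell.last_period = self.current_period
            cell.recent = (cell.recent + [(payload_size, arrival_time)])[-self.max_recent:]
            return True
        return False  # all candidate cells are taken; the packet is not recorded

    @staticmethod
    def importance(cell):
        # Hypothetical metric: frequency conformance multiplied by persistence in periods.
        freq = cell.periods_in_range / cell.periods_seen if cell.periods_seen else 0.0
        return freq * (cell.last_period - cell.first_period + 1)

    def retain(self, second_threshold):
        # Keep only streams whose importance exceeds the second threshold.
        return {cell.stream_id: cell
                for row in self.buckets for cell in row
                if cell is not None and self.importance(cell) > second_threshold}
```

With a one-second period and [lo, hi] = [2, 5], a stream that sends three packets in each of three consecutive periods scores 1.0 × 3 = 3.0, while a stream that sends a single packet scores 0.0, so only the first survives a second threshold of 1.0 (after `tick` has closed the last period). Cells are never evicted in this sketch, so a saturated sketch simply stops recording new streams; a practical design would likely need an eviction or replacement policy.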

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211053892.2A CN115378850B (en) 2022-08-31 2022-08-31 Encryption traffic online analysis method and system based on Sketch

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211053892.2A CN115378850B (en) 2022-08-31 2022-08-31 Encryption traffic online analysis method and system based on Sketch

Publications (2)

Publication Number Publication Date
CN115378850A 2022-11-22
CN115378850B 2023-10-31

Family

ID=84070579

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211053892.2A Active CN115378850B (en) 2022-08-31 2022-08-31 Encryption traffic online analysis method and system based on Sketch

Country Status (1)

Country Link
CN (1) CN115378850B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050286423A1 (en) * 2004-06-28 2005-12-29 Poletto Massimiliano A Flow logging for connection-based anomaly detection
CN110049061A (en) * 2019-04-29 2019-07-23 Nanjing University of Posts and Telecommunications Lightweight DDoS attack detection device and detection method for high-speed networks
CN113079176A (en) * 2021-04-14 2021-07-06 Xi'an Jiaotong University High-speed network traffic anomaly detection system suitable for massive data
CN113542195A (en) * 2020-04-16 2021-10-22 Beijing Guancheng Technology Co., Ltd. Method, system and equipment for detecting malicious encrypted traffic
CN113965492A (en) * 2020-07-03 2022-01-21 Huawei Technologies Co., Ltd. Data flow statistical method and device
CN114037009A (en) * 2021-11-05 2022-02-11 State Grid Jiangsu Electric Power Co., Ltd. Changzhou Power Supply Branch IP address profiling method based on spatio-temporal statistics
CN114205253A (en) * 2021-12-15 2022-03-18 Changsha University of Science and Technology Active large-flow accurate detection framework and method based on small-flow filtering

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
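Zhao Xiaohuan (赵小欢): "A Survey of Internet Flow Sampling Techniques" (互联网流采样技术综述), Journal of Chinese Computer Systems (小型微型计算机系统), no. 08, pages 41-46 *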

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant