CN115378850A - Sketch-based encryption flow online analysis method and system - Google Patents
Sketch-based encrypted traffic online analysis method and system
- Publication number
- CN115378850A CN202211053892.2A
- Authority
- CN
- China
- Prior art keywords
- stream
- flow
- sketch
- information
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
- H04L43/0876—Network utilisation, e.g. volume of load or congestion level
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/12—Network monitoring probes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/16—Threshold monitoring
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/50—Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate
Abstract
The invention discloses a Sketch-based encrypted traffic online analysis method and system, belonging to the technical field of network security. The method comprises: collecting original network traffic data; extracting quintuple information and statistical characteristic information of the original network traffic data, wherein the statistical characteristic information comprises load size and arrival time, and filtering out data packets whose load size is smaller than a first threshold; acquiring a stream according to the quintuple information and calculating a stream ID; and, according to the statistical characteristic information of the stream, performing coarse-grained identification on the stream by using a Sketch data structure, grading the importance degree of the stream, and keeping the information of streams whose importance degree exceeds a second threshold. The method achieves reliable online identification and analysis of encrypted traffic with low memory occupation and high processing speed, and addresses the high computing power consumption, large memory occupation, low processing speed, and difficulty of online identification in the prior art.
Description
Technical Field
The application relates to the technical field of network security, in particular to a Sketch-based encrypted flow online analysis method and system.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
Traffic analysis and identification is one of the foundations for improving network service quality and maintaining cyberspace security. However, as encrypted communication accounts for a growing share of network traffic, the original methods based on content inspection are no longer effective, and fast, accurate analysis and identification of encrypted traffic has become increasingly difficult. Research has shown that the proportion of encrypted traffic in the network has exceeded 80% since 2020. To maintain a green and healthy cyberspace, encrypted traffic analysis and identification has become a research hotspot in academia and industry in recent years.
Because of techniques such as dynamic ports and port camouflage, the traditional port-based identification method has become invalid. Existing traffic analysis and identification methods can be roughly divided into two types: content-based identification methods and non-content-based identification methods.
Content-based identification methods (e.g., deep packet inspection techniques) require the content in each data packet to be inspected and matched against a fingerprint library established in advance to complete the identification. This approach is inefficient and not suitable for encrypted traffic identification.
Non-content-based identification methods (for example, methods that use flow statistical information) achieve good accuracy in encrypted traffic identification, but training a machine learning model requires a large number of data samples, and training a deep learning model requires a large amount of computing power. In addition, the models occupy a large amount of memory and are difficult to use for online analysis and identification.
In summary, current research on traffic analysis and identification still has the following deficiencies: with the popularization of encryption technology, traditional methods are gradually becoming ineffective; emerging methods suffer from high computing power consumption, large memory occupation, low processing speed, and difficulty of online identification.
Disclosure of Invention
To overcome the defects of the prior art, the application provides a Sketch-based encrypted traffic online analysis method and system. First, network traffic is collected and the statistical characteristics of data packets are extracted; then a Sketch is used to measure the streams periodically, and the importance degree of each stream is graded according to its persistence and frequency; finally, the statistical information of the first K packets of streams with higher importance degree is kept, and streams with lower importance degree are eliminated. In this way, more reliable online identification and analysis of encrypted traffic can be achieved with low memory occupation and high processing speed.
In a first aspect, the application provides a Sketch-based encrypted traffic online analysis method;
a Sketch-based encrypted traffic online analysis method comprises the following steps:
collecting original network flow data;
extracting quintuple information and statistical characteristic information of original network traffic data, wherein the statistical characteristic information comprises load size and arrival time, and filtering data packets with the load size smaller than a first threshold; acquiring a stream according to the quintuple information, and calculating a stream ID;
and according to the statistical characteristic information of the stream, performing coarse-grained identification on the stream by using a Sketch data structure, dividing the importance degree of the stream, and keeping the information of the stream of which the importance degree exceeds a second threshold value.
In a second aspect, the application provides a system for encrypted traffic online analysis based on Sketch;
a Sketch-based encrypted traffic online analysis system comprises:
the flow acquisition module is used for acquiring original network flow data;
the flow cleaning module is used for extracting quintuple information and statistical characteristic information of original network flow data, wherein the statistical characteristic information comprises load size and arrival time, and filtering data packets with the load size smaller than a first threshold value; acquiring a stream according to the quintuple information, and calculating a stream ID;
and the flow analysis module is used for performing coarse-grained identification on the flow by using the Sketch data structure according to the statistical characteristic information of the flow, dividing the importance degree of the flow and keeping the information of the flow of which the importance degree exceeds a second threshold value.
In a third aspect, the present application provides an electronic device;
an electronic device comprises a memory, a processor and computer instructions stored on the memory and executed on the processor, wherein the computer instructions, when executed by the processor, complete the steps of the above Sketch-based encrypted traffic online analysis method.
In a fourth aspect, the present application provides a computer-readable storage medium;
a computer readable storage medium for storing computer instructions, wherein the computer instructions, when executed by a processor, perform the steps of the above Sketch-based encrypted traffic online analysis method.
Compared with the prior art, the beneficial effects of this application are:
the application provides a Sketch-based encrypted traffic online analysis method and system. First, the application captures and analyzes network traffic online. Second, it stores traffic in compressed form according to its statistical characteristics, which avoids storing a large amount of original traffic data and greatly reduces memory occupation. Third, since only the statistical characteristics of the streams are used, the application is equally applicable to encrypted traffic. Fourth, by using the Sketch data structure, the requirement of online measurement can be met and network streams can also be measured periodically. Finally, the application grades the importance degree of each stream according to its frequency and persistence and stores only the information of the first K packets of important streams, which reduces memory occupation and improves processing speed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, are included to provide a further understanding of the application, and the description of the exemplary embodiments and illustrations of the application are intended to explain the application and are not intended to limit the application.
Fig. 1 is a schematic flow chart of an encryption traffic online analysis method based on Sketch according to an embodiment of the present application;
fig. 2 is a system framework schematic diagram of a system for encrypted traffic online analysis based on Sketch according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a Sketch data structure provided in an embodiment of the present application;
fig. 4 is a schematic flowchart of a Sketch data structure update operation provided in an embodiment of the present application;
fig. 5 is a schematic operation flow diagram of a clock scanning module in the Sketch data structure according to an embodiment of the present application.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure herein. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present application. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and it should be understood that the terms "comprises" and "comprising", and any variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiments and features of the embodiments of the invention may be combined with each other without conflict.
Example one
In the prior art, with the popularization of encryption technology, traditional methods are gradually becoming ineffective, and emerging methods suffer from high computing power consumption, large memory occupation, low processing speed, and difficulty of online identification. A Sketch is an efficient data structure with stable error bounds: it uses a hashing strategy to store statistics of the raw data as completely and accurately as possible, with far less space overhead than the raw data and very low time overhead. By designing a method around a Sketch, online identification of encrypted traffic with low overhead, high efficiency, and reliability is achieved. Therefore, the application provides a Sketch-based encrypted traffic online analysis method.
A Sketch-based encrypted traffic online analysis method comprises the following steps:
collecting original network flow data;
extracting quintuple information and statistical characteristic information of original network traffic data, wherein the statistical characteristic information comprises load size and arrival time, and filtering data packets with the load size smaller than a first threshold; acquiring a stream according to the quintuple information, and calculating a stream ID;
and according to the statistical characteristic information of the stream, performing coarse-grained identification on the stream by using a Sketch data structure, dividing the importance degree of the stream, and keeping the information of the stream of which the importance degree exceeds a second threshold value.
Further, after calculating the stream ID, the method further includes:
judging whether the stream is a new stream or not according to the stream ID;
if the flow is a new flow, the flow ID, the quintuple information and the arrival time are saved.
Further, according to the statistical characteristic information of the stream, performing coarse-grained identification on the stream by using a Sketch data structure, wherein the importance degree of dividing the stream comprises the following steps:
periodically measuring the flow;
and dividing the stream into important degrees according to the stream frequency and the stream persistence of the stream.
Furthermore, the stream frequency is the degree to which, over all periods in which the stream appears, the number of occurrences of the stream per period falls within a given interval;
the stream persistence is the degree to which the stream lasts over time.
Furthermore, the Sketch data structure comprises a bit array, a two-dimensional array bucket, a clock scanning module, an importance measurement function and a hyper-parameter; the two-dimensional array bucket is used for storing the load size and the arrival time of a data packet in the stream and the ID, the load size, the frequency and the persistence of the stream, the clock scanning module is used for periodically measuring the stream, the importance measuring function is used for obtaining the importance degree of the stream according to the stream frequency and the stream persistence, and the hyper-parameter is used for maintaining the periodic measurement of the stream.
Further, whether to use a capture filter and/or remote capture is set according to the demand of the collected flow.
Further, the load size is the size of the payload of the transport layer of the data packet, the arrival time is the kernel time of the network device when the data packet is recorded on the storage medium of the network device by the network device, and the quintuple information includes the source IP address, the source port, the destination IP address, the destination port and the protocol type information of the data packet.
Next, an encrypted traffic online analysis method based on Sketch disclosed in this embodiment is described in detail with reference to fig. 1 to 5.
The embodiment provides a Sketch-based encrypted traffic online analysis method.
A Sketch-based encrypted traffic online analysis method comprises the following steps:
S1, collecting original network traffic data; the specific steps are as follows:
S101, setting a capture filter according to the requirements of the traffic to be collected so that the traffic is captured accurately, and filtering the traffic by type, transmission direction, protocol, data and the like; the syntax of the capture filter expression is: { <protocol> <direction> <host> <value> <logical operation> <other expression> }, and the roles and optional values of the fields are shown in Table 1.
Table 1: filter expression field introduction
S102, capturing the traffic of a remote network device by remote capture, which separates the traffic collection end from the analysis end and reduces the system load. Specifically, a network connection between the capturing device and the captured device is established and maintained, and the rpcapd.exe application (the WinPcap remote capture server program) is run on the captured device so that the captured traffic data is transmitted to the capturing device; after the connection is established, the IP address, user name, and corresponding password of the captured device are set on the capturing device to enable remote capture.
S103, listing and selecting a captured host network card according to the identifier of the target network card, and finishing the binding of the target network card; the network card is uniquely determined by identifiers, different network card identifiers are different, and the identifier format is as follows: { FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFF }.
S104, starting traffic collection. Original network traffic data first arrives in the kernel buffer of the NPF (Netgroup Packet Filter) driver of WinPcap; when the buffer is full, the data is passed to a user-mode buffer through a callback function for asynchronous processing.
S2, extracting quintuple information and statistical characteristic information of the original network traffic data, wherein the statistical characteristic information comprises load size and arrival time; filtering out data packets whose load size is smaller than a first threshold; acquiring a stream according to the quintuple information, and calculating a stream ID. The load size is the size of the transport-layer payload of the data packet; the arrival time is the kernel time of the network device at the moment the network device records the data packet on its storage medium; the quintuple information comprises the source IP address, source port, destination IP address, destination port, and protocol type of the data packet; the first threshold is set according to task requirements and experience. The specific steps are as follows:
S201, obtaining the arrival time of the data packet and the total length of the data packet; parsing the data link layer header of the data packet according to the Ethernet frame format; parsing the network layer information of the data packet to obtain the source IP address, destination IP address, and transport layer protocol type, as well as the IP datagram header length and the IP datagram total length (including the payload), which are used to calculate the load size. If the network card has GRO (Generic Receive Offload) enabled, multiple fragmented TCP packets sent by the peer are aggregated in the network card before being passed to the NPF kernel buffer; in this case the total length field of the IP datagram is 0, and the total length of the captured data packet is used instead to calculate the load size.
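The load-size calculation in S201 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes an untagged Ethernet II frame carrying IPv4 with TCP or UDP, and the function name `transport_payload_size` is hypothetical.

```python
import struct

ETH_HEADER_LEN = 14  # assumption: untagged Ethernet II frame


def transport_payload_size(frame: bytes) -> int:
    """Return the transport-layer payload size of an Ethernet/IPv4 frame."""
    # IPv4: low nibble of byte 0 is the header length in 32-bit words.
    ip_header_len = (frame[ETH_HEADER_LEN] & 0x0F) * 4
    # IPv4: bytes 2-3 hold the total length (header plus payload).
    (ip_total_len,) = struct.unpack_from("!H", frame, ETH_HEADER_LEN + 2)
    protocol = frame[ETH_HEADER_LEN + 9]

    if ip_total_len == 0:
        # Offloaded/aggregated packet: fall back to the captured frame length.
        ip_total_len = len(frame) - ETH_HEADER_LEN

    l4_offset = ETH_HEADER_LEN + ip_header_len
    if protocol == 6:  # TCP: high nibble of byte 12 is the data offset in words
        l4_header_len = (frame[l4_offset + 12] >> 4) * 4
    elif protocol == 17:  # UDP: fixed 8-byte header
        l4_header_len = 8
    else:
        raise ValueError(f"unsupported transport protocol {protocol}")

    return ip_total_len - ip_header_len - l4_header_len
```

A packet whose returned size is below the first threshold would then be dropped before any stream bookkeeping takes place.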
S202, parsing the transport layer information of the data packet to obtain the source port, destination port, and transport layer header length; filtering out data packets whose load size is smaller than the first threshold. Data packets with the same quintuple information belong to one stream; the quintuple information of the data packet is hashed with the xxHash hash algorithm to generate a 64-bit unsigned integer as the stream ID. The first threshold is set according to task requirements and experience.
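Deriving a 64-bit stream ID from the quintuple can be sketched as below. The text names xxHash; to keep the example limited to the Python standard library, this sketch substitutes `hashlib.blake2b` with an 8-byte digest, which is an assumption and not the hash used in the application. The function name `stream_id` and the byte layout of the key are also illustrative.

```python
import hashlib
import socket
import struct


def stream_id(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
              protocol: int) -> int:
    """Hash an IPv4 quintuple into a 64-bit unsigned integer."""
    key = (
        socket.inet_aton(src_ip)
        + struct.pack("!H", src_port)
        + socket.inet_aton(dst_ip)
        + struct.pack("!H", dst_port)
        + struct.pack("!B", protocol)
    )
    # Stand-in for xxHash64: any well-mixed 64-bit hash serves the same role.
    digest = hashlib.blake2b(key, digest_size=8).digest()
    return int.from_bytes(digest, "big")
```

Because the key is built in source-then-destination order, the two directions of one connection yield different IDs, which matches the statement that packets with the same quintuple form one stream.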
S203, mapping the stream ID to a bit array of size N1. If the corresponding bit is 0, the stream has not appeared before: the quintuple information and arrival time are saved in the structured log file, and the corresponding bit is set to 1. If the corresponding bit is 1, nothing is recorded.
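The new-stream check in S203 can be sketched as follows. The class `BitArray`, the function `record_if_new`, and the use of a Python list as the "structured log file" are assumptions made for illustration.

```python
class BitArray:
    """Fixed-size bit array backed by a bytearray, all bits initially 0."""

    def __init__(self, n_bits: int) -> None:
        self.n_bits = n_bits
        self._bytes = bytearray((n_bits + 7) // 8)

    def test(self, index: int) -> bool:
        return bool(self._bytes[index >> 3] & (1 << (index & 7)))

    def set(self, index: int) -> None:
        self._bytes[index >> 3] |= 1 << (index & 7)


def record_if_new(seen: BitArray, sid: int, quintuple: tuple,
                  arrival_time: float, log: list) -> bool:
    """Log the stream the first time its bit is seen; return True if logged."""
    position = sid % seen.n_bits
    if seen.test(position):
        return False  # bit already 1: nothing is recorded
    log.append({"stream_id": sid, "quintuple": quintuple,
                "arrival_time": arrival_time})
    seen.set(position)
    return True
```

Two distinct stream IDs that map to the same bit are indistinguishable here, so a larger N1 lowers the chance that a genuinely new stream is skipped.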
S3, according to the statistical characteristic information of the stream, performing coarse-grained identification on the stream by using a Sketch data structure, dividing the importance degree of the stream, and keeping the information of the stream of which the importance degree exceeds a second threshold value; the information of the flow includes a flow ID, quintuple information, flow importance and statistical characteristic information, where the statistical characteristic information includes, but is not limited to, load size and arrival time; the second threshold value is set according to task requirements and experience;
Specifically, the Sketch data structure includes a bit array of size N1, a two-dimensional array of buckets with N2 rows and d columns, a clock scanning module, an importance measurement function, and a series of hyper-parameters (period size, lower limit of the in-period count, upper limit of the in-period count) for maintaining the periodic measurement of streams. Each bucket of the two-dimensional array comprises two arrays of size K and 6 counters: the arrays store the load size and arrival time of the first K packets of the stream, and the counters store, respectively, the stream ID, the stream size, the stream frequency, the stream persistence, the number of occurrences in the current period, and the flag bit indicating whether the stream has appeared in the current period. The initial values of the stream frequency and the stream persistence are 100, and the initial values of the remaining counters and of the arrays are 0. The bit array is used to judge whether an arriving stream is a full stream, and its initial value is also 0. The clock scanning module maintains the periodic measurement of streams: it scans all buckets of the two-dimensional array within one period and reduces the persistence of streams that did not appear in the current period. The importance measurement function is a function of stream frequency and stream persistence: importance degree = α × stream frequency + β × stream persistence, where α and β can be adjusted within the interval [-1, 1] according to the needs of the measurement task. The Sketch data structure is shown in fig. 3, the Sketch update operation flow is shown in fig. 4, and the roles and meanings of the bucket elements are shown in Table 2.
Table 2: elements of a bucket in the Sketch two-dimensional array
Element name | Role and meaning |
Stream ID | Uniquely identifies a network data stream |
Stream size | Records the number of data packets in the stream |
Stream frequency | Indicates the degree to which the occurrence count of the stream conforms to a predetermined target interval |
Stream persistence | Indicates the degree of persistence of the occurrence of the stream |
In-period count | Records the number of occurrences of the stream in the current period |
Flag bit | Marks whether the stream has appeared in the current period |
Load size array | Records the load size of the first K packets of the stream |
Arrival time array | Records the arrival time of the first K packets of the stream |
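The bucket layout described in Table 2 can be sketched as a small data class. The field names, the value `K = 8`, and the helper `make_table` are illustrative assumptions; only the initial values (100 for frequency and persistence, 0 elsewhere) come from the description.

```python
from dataclasses import dataclass, field
from typing import List

K = 8                # assumption: number of leading packets kept per stream
INITIAL_SCORE = 100  # initial stream frequency and persistence per the text


@dataclass
class Bucket:
    stream_id: int = 0      # 0 is treated as "empty" in this sketch
    size: int = 0           # number of packets seen for the stream
    frequency: int = INITIAL_SCORE
    persistence: int = INITIAL_SCORE
    period_count: int = 0   # occurrences in the current period
    flag: int = 0           # 1 if the stream appeared in the current period
    load_sizes: List[int] = field(default_factory=lambda: [0] * K)
    arrival_times: List[float] = field(default_factory=lambda: [0.0] * K)


def make_table(n_rows: int, n_cols: int) -> List[List[Bucket]]:
    """Build the N2 x d two-dimensional array of independent buckets."""
    return [[Bucket() for _ in range(n_cols)] for _ in range(n_rows)]
```

Memory use is fixed at construction time: N2 × d buckets, each holding six counters and two K-element arrays, regardless of how many streams arrive.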
The method comprises the following specific steps:
S301, using the clock scanning module, scanning all buckets of the two-dimensional array one by one at a constant speed, such that every bucket is scanned exactly once per period. When the pointer points to a bucket: if the bucket holds a stream and its flag bit is 0, the stream persistence is decreased; if the bucket holds a stream and its flag bit is 1, the stream persistence is increased and the flag bit is then reset to 0, indicating that the current period has ended for this bucket and a new period begins. The clock scanning operation is shown in fig. 5. Steps S302-S306 are performed while the periodic measurement is running.
S302, when a stream arrives, the stream ID is first mapped to a position of the bit array by the hash function H1(ID) = ID % N1; if the corresponding bit is 1, the stream is a full stream and no operation is performed; otherwise, step S303 is executed.
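The constant-speed clock scan of S301 can be sketched as follows. Buckets are modelled as plain dictionaries in a flat list, and the step size of 1 for increasing or decreasing persistence is an assumption, since the text does not state the amount. The class name `ClockScanner` is hypothetical.

```python
class ClockScanner:
    """Sweeps the buckets so that each one is visited once per period."""

    def __init__(self, buckets: list, period: float, start: float = 0.0) -> None:
        self.buckets = buckets
        self.period = period
        self.start = start
        self.visited = 0  # total number of bucket visits since start

    def advance(self, now: float) -> None:
        """Visit every bucket the pointer should have passed by time `now`."""
        target = int((now - self.start) / self.period * len(self.buckets))
        while self.visited < target:
            bucket = self.buckets[self.visited % len(self.buckets)]
            self.visited += 1
            if bucket["stream_id"] is None:
                continue  # empty bucket: nothing to age
            if bucket["flag"] == 0:
                bucket["persistence"] -= 1  # absent during this period
            else:
                bucket["persistence"] += 1  # present during this period
                bucket["flag"] = 0          # a new period starts for this bucket
```

Calling `advance` with the timestamp of each arriving packet keeps the pointer in step with real time without a separate timer thread.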
S303, mapping the stream ID to a row of the two-dimensional array of buckets by the hash function H2(ID) = ID % N2. If the stream already exists in the corresponding row, step S304 is executed to update the information of the stream in its bucket; if the stream does not exist in the row and the row has an empty bucket, the stream is inserted into the empty bucket; if the stream does not exist in the row and the row has no empty bucket, the stream in the row with the lowest importance degree is evicted, provided that its importance degree is smaller than the initial value, and the new stream is inserted in its place; otherwise, the stream is discarded and step S305 is executed.
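The row lookup, insertion, and eviction rule of S303 can be sketched as below. Buckets are dictionaries, empty slots are `None`, and the "initial value" is interpreted as the importance of a freshly inserted stream (α × 100 + β × 100); that interpretation, the default weights, and the function names are assumptions.

```python
INITIAL_SCORE = 100


def importance(bucket: dict, alpha: float, beta: float) -> float:
    return alpha * bucket["frequency"] + beta * bucket["persistence"]


def new_bucket(sid: int) -> dict:
    return {"stream_id": sid, "size": 0, "frequency": INITIAL_SCORE,
            "persistence": INITIAL_SCORE, "period_count": 0, "flag": 0}


def locate_or_insert(table: list, sid: int, alpha: float = 0.5,
                     beta: float = 0.5):
    """Return (outcome, column) for stream `sid` in row sid % N2."""
    row = table[sid % len(table)]
    for col, bucket in enumerate(row):
        if bucket is not None and bucket["stream_id"] == sid:
            return "found", col          # caller updates the bucket (S304)
    for col, bucket in enumerate(row):
        if bucket is None:
            row[col] = new_bucket(sid)
            return "inserted", col
    # Row is full: consider evicting the least important resident stream.
    victim = min(range(len(row)), key=lambda c: importance(row[c], alpha, beta))
    initial = alpha * INITIAL_SCORE + beta * INITIAL_SCORE
    if importance(row[victim], alpha, beta) < initial:
        row[victim] = new_bucket(sid)
        return "replaced", victim
    return "discarded", None
```

Requiring the victim to score below the initial value protects streams that have not yet decayed, so a burst of short-lived streams cannot push established ones out of a full row.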
S304, when the stream exists in the corresponding bucket, the arrival time and load size of the data packet are first recorded into the arrays in the bucket; second, the stream size counter in the bucket is increased by 1; finally, the flag bit is checked. If the flag bit is 0, the in-period count is examined: if it lies in the interval between the lower limit and the upper limit, the stream frequency counter is increased; if it is less than the lower limit, the stream frequency counter is decreased; the in-period count and the flag bit are then both set to 1. If the flag bit is 1, the in-period count is increased by 1, and if the in-period count is then greater than the upper limit, the stream frequency counter is decreased. The insertion is then complete.
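One reading of the per-packet update in S304 can be sketched as follows. The step size of 1 for the stream frequency counter, the rule that a stream is "full" once K packets have been recorded, and the function name `update_bucket` are assumptions; the interval is taken as lower limit ≤ count < upper limit.

```python
def update_bucket(bucket: dict, load_size: int, arrival_time: float,
                  lower: int, upper: int, k: int) -> bool:
    """Apply one packet to an existing bucket; return True once it is full."""
    # Keep per-packet features only for the first K packets.
    if bucket["size"] < k:
        bucket["load_sizes"].append(load_size)
        bucket["arrival_times"].append(arrival_time)
    bucket["size"] += 1

    if bucket["flag"] == 0:
        # First packet of a new period: judge the count of the previous one.
        if lower <= bucket["period_count"] < upper:
            bucket["frequency"] += 1
        elif bucket["period_count"] < lower:
            bucket["frequency"] -= 1
        bucket["period_count"] = 1
        bucket["flag"] = 1
    else:
        bucket["period_count"] += 1
        if bucket["period_count"] > upper:
            bucket["frequency"] -= 1  # too many packets within one period

    return bucket["size"] >= k
```

Under this reading a brand-new stream starts with an in-period count of 0 and therefore loses one frequency point on its first packet; whether that is intended is not stated in the text.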
S305, if the stream is full after the insertion, the corresponding bit of the bit array is set to 1, and step S306 is executed;
S306, calculating the importance degree of the full stream, and recording the stream information into the structured log file if the importance degree is greater than the second threshold. The importance degree is calculated with the importance measurement function: importance degree = α × stream frequency + β × stream persistence. The stream frequency is the degree to which, over all periods in which the stream appears, its in-period occurrence count falls within the given interval (lower limit ≤ in-period count < upper limit); a higher stream frequency means that the per-period occurrence counts of the stream conform more closely to the preset target. The stream persistence is the degree to which the stream lasts over time; a greater persistence indicates that the stream has appeared in more periods, i.e., is more persistent.
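The importance measurement and the second-threshold test of S306 amount to a weighted sum followed by a comparison. A minimal sketch, with hypothetical function names and example values chosen only for illustration:

```python
def importance_degree(frequency: float, persistence: float,
                      alpha: float, beta: float) -> float:
    """importance degree = alpha * stream frequency + beta * stream persistence."""
    if not (-1 <= alpha <= 1 and -1 <= beta <= 1):
        raise ValueError("alpha and beta must lie in [-1, 1]")
    return alpha * frequency + beta * persistence


def should_log(frequency: float, persistence: float, alpha: float,
               beta: float, second_threshold: float) -> bool:
    """A full stream is logged only if its importance exceeds the threshold."""
    return importance_degree(frequency, persistence, alpha, beta) > second_threshold
```

For example, with α = 1 and β = 0.5, a stream with frequency 104 and persistence 98 scores 104 + 49 = 153, so it would be logged under a second threshold of 150. A negative weight can be used to prefer, say, short-lived streams by penalising persistence.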
Example two
This embodiment discloses a Sketch-based encrypted traffic online analysis system, comprising:
a flow acquisition module, used for acquiring original network flow data;
a flow cleaning module, used for extracting quintuple information and statistical characteristic information of the original network flow data, wherein the statistical characteristic information comprises load size and arrival time; filtering out data packets whose load size is smaller than a first threshold value; and acquiring a stream according to the quintuple information and calculating a stream ID;
and a flow analysis module, used for performing coarse-grained identification on the stream by using the Sketch data structure according to the statistical characteristic information of the stream, grading the importance degree of the stream, and retaining the information of streams whose importance degree exceeds a second threshold value.
It should be noted that the flow acquisition module, the flow cleaning module and the flow analysis module correspond to the steps of the first embodiment; the examples and application scenarios realized by these modules are the same as those of the corresponding steps, but are not limited to what is disclosed in the first embodiment. It should also be noted that the modules described above, as part of a system, may be implemented in a computer system, for example as a set of computer-executable instructions.
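As a rough illustration of what the flow cleaning module does, the following Python sketch filters packets by load size and derives a stream ID from the quintuple. It is a sketch under stated assumptions: the `Packet` type and the `flow_id` and `clean` functions are hypothetical names, and hashing the quintuple with SHA-1 is only one possible way to calculate a stream ID, since this section does not specify the calculation.

```python
import hashlib
from typing import Iterable, Iterator, NamedTuple, Tuple


class Packet(NamedTuple):
    """Quintuple plus the two statistical features named in the text."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str
    load_size: int       # transport-layer payload size in bytes
    arrival_time: float  # capture timestamp in seconds


def flow_id(pkt: Packet) -> str:
    """Derive a stream ID from the quintuple (hash choice is an assumption)."""
    key = f"{pkt.src_ip}:{pkt.src_port}-{pkt.dst_ip}:{pkt.dst_port}-{pkt.protocol}"
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:16]


def clean(packets: Iterable[Packet], first_threshold: int) -> Iterator[Tuple[str, int, float]]:
    """Drop packets whose load size is below the first threshold; tag the rest."""
    for pkt in packets:
        if pkt.load_size < first_threshold:
            continue  # e.g. bare ACKs that carry no payload
        yield flow_id(pkt), pkt.load_size, pkt.arrival_time
```

The ID here is direction-sensitive: the two directions of one connection map to different stream IDs. Whether the method treats a connection as one stream or two is not stated in this section.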
Example three
The third embodiment of the present invention provides an electronic device, which includes a memory, a processor, and computer instructions stored in the memory and executable on the processor; when the computer instructions are executed by the processor, the steps of the above Sketch-based encrypted traffic online analysis method are performed.
Example four
The fourth embodiment of the present invention provides a computer-readable storage medium for storing computer instructions which, when executed by a processor, perform the steps of the above Sketch-based encrypted traffic online analysis method.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In the foregoing embodiments, the descriptions of the embodiments have different emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Claims (10)
1. A Sketch-based encrypted traffic online analysis method, characterized by comprising the following steps:
collecting original network flow data;
extracting quintuple information and statistical characteristic information of the original network flow data, wherein the statistical characteristic information comprises load size and arrival time; filtering out data packets whose load size is smaller than a first threshold value; acquiring a stream according to the quintuple information, and calculating a stream ID;
and according to the statistical characteristic information of the stream, performing coarse-grained identification on the stream by using a Sketch data structure, grading the importance degree of the stream, and retaining the information of streams whose importance degree exceeds a second threshold value.
2. The Sketch-based encrypted traffic online analysis method as claimed in claim 1, wherein after calculating the stream ID, the method further comprises: judging whether the stream is a new stream or not according to the stream ID;
if the flow is a new flow, the flow ID, the quintuple information and the arrival time are saved.
3. The Sketch-based encrypted traffic online analysis method as claimed in claim 1, wherein the performing coarse-grained identification on the streams by using a Sketch data structure according to statistical characteristic information of the streams, and dividing the importance degree of the streams comprises:
periodically measuring the flow;
and grading the importance degree of the stream according to the stream frequency and the stream persistence of the stream.
4. The Sketch-based encrypted traffic online analysis method as claimed in claim 3, wherein the stream frequency is the degree to which the per-period frequency of a stream conforms to a given interval across all periods in which the stream appears;
the stream persistence is the duration of one stream.
5. The Sketch-based encrypted traffic online analysis method as claimed in claim 1, wherein the Sketch data structure comprises a bit array, a two-dimensional array of buckets, a clock scanning module, an importance metric function and hyper-parameters; the bit array is used for judging whether the stream is full; the two-dimensional array of buckets is used for storing the load size and the arrival time of the data packets in the stream and the ID, the load size, the frequency and the persistence of the stream; the clock scanning module is used for periodically measuring the stream; the importance metric function is used for obtaining the importance degree of the stream according to the stream frequency and the stream persistence; and the hyper-parameters are used for maintaining the periodic measurement of the stream.
6. The Sketch-based encrypted traffic online analysis method as claimed in claim 1, wherein whether to use a capture filter and/or remote capture is set according to the traffic collection requirements.
7. The Sketch-based encrypted traffic online analysis method as claimed in claim 1, wherein the load size is the size of the transport-layer payload of a data packet, the arrival time is the kernel time of the network device at the moment the network device records the data packet on its storage medium, and the quintuple information includes the source IP address, source port, destination IP address, destination port and protocol type of the data packet.
8. A Sketch-based encrypted traffic online analysis system, characterized by comprising:
the flow acquisition module is used for acquiring original network flow data;
the flow cleaning module is used for extracting quintuple information and statistical characteristic information of original network flow data, wherein the statistical characteristic information comprises load size and arrival time; filtering data packets with load sizes smaller than a first threshold value; acquiring a stream according to the quintuple information, and calculating a stream ID;
and the flow analysis module is used for performing coarse-grained identification on the flow by using the Sketch data structure according to the statistical characteristic information of the flow, dividing the importance degree of the flow and keeping the information of the flow of which the importance degree exceeds a second threshold value.
9. An electronic device comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the computer instructions, when executed by the processor, perform the steps of the method of any one of claims 1-7.
10. A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the steps of the method of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211053892.2A CN115378850B (en) | 2022-08-31 | 2022-08-31 | Encryption traffic online analysis method and system based on Sketch |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115378850A (en) | 2022-11-22 |
CN115378850B (en) | 2023-10-31 |
Family
ID=84070579
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211053892.2A Active CN115378850B (en) | 2022-08-31 | 2022-08-31 | Encryption traffic online analysis method and system based on Sketch |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115378850B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050286423A1 (en) * | 2004-06-28 | 2005-12-29 | Poletto Massimiliano A | Flow logging for connection-based anomaly detection |
CN110049061A (en) * | 2019-04-29 | 2019-07-23 | 南京邮电大学 | Lightweight ddos attack detection device and detection method on high speed network |
CN113079176A (en) * | 2021-04-14 | 2021-07-06 | 西安交通大学 | High-speed network flow abnormity detection system suitable for mass data |
CN113542195A (en) * | 2020-04-16 | 2021-10-22 | 北京观成科技有限公司 | Method, system and equipment for detecting malicious encrypted traffic |
CN113965492A (en) * | 2020-07-03 | 2022-01-21 | 华为技术有限公司 | Data flow statistical method and device |
CN114037009A (en) * | 2021-11-05 | 2022-02-11 | 国网江苏省电力有限公司常州供电分公司 | IP address portrait method based on space-time statistics |
CN114205253A (en) * | 2021-12-15 | 2022-03-18 | 长沙理工大学 | Active large flow accurate detection framework and method based on small flow filtering |
Non-Patent Citations (1)
Title |
---|
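Zhao Xiaohuan (赵小欢): "Survey of Internet flow sampling techniques" (互联网流采样技术综述), Journal of Chinese Computer Systems (小型微型计算机系统), no. 08, pages 41 - 46 * |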
Also Published As
Publication number | Publication date |
---|---|
CN115378850B (en) | 2023-10-31 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||