CN112583738A - Method, equipment and storage medium for analyzing and classifying network flow - Google Patents

Publication number
CN112583738A
Authority
CN
China
Prior art keywords
analyzing
network traffic
network
steps
message
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011593229.2A
Other languages
Chinese (zh)
Inventor
肖梅
齐凯
窦伊男
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Haohan Data Technology Co ltd
Original Assignee
Haohan Data Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Haohan Data Technology Co ltd
Priority to CN202011593229.2A
Publication of CN112583738A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2441 Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/45 Network directories; Name-to-address mapping
    • H04L61/4505 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
    • H04L61/4511 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/16 Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
    • H04L69/165 Combined use of TCP and UDP protocols; selection criteria therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to the technical field of network security and data processing, and in particular to a method, an apparatus, a device and a storage medium for analyzing and classifying network traffic. A first aspect of the invention provides a method for analyzing and classifying network traffic, comprising the following steps: acquiring messages; dividing the messages by protocol, wherein the messages at least include common TCP/UDP traffic; extracting a fingerprint for each class of the common TCP/UDP traffic by a payload clustering method; acquiring the domain name with the largest Jaccard coefficient for the network connections; and determining whether the existing resource library already contains the relevant features and, if not, storing the corresponding classification rules in the library. The invention also provides an electronic device and a computer-readable storage medium. The method does not require a sample library to be prepared in advance; it automatically classifies the traffic in the network, identifies the service to which each class of traffic belongs, and makes the whole classification process automatic and accurate.

Description

Method, equipment and storage medium for analyzing and classifying network flow
Technical field:
The present invention relates to the field of network security and data processing technologies, and in particular to a method, an apparatus, a device and a storage medium for analyzing and classifying network traffic.
Background art:
Classifying network traffic is important for optimizing the allocation of network resources and for network security applications. Accurate, real-time traffic classification helps keep a network running normally, stably and reliably.
Common network traffic analysis methods include DPI (Deep Packet Inspection) and DFI (Deep/Dynamic Flow Inspection). DPI extracts fingerprint features from a single packet, whereas DFI extracts multi-packet features from one network connection; such features may include fingerprints, packet lengths, time intervals and the like. As communication technology develops, smart devices play an ever larger role in daily life, new apps and Internet-of-Things devices keep emerging, and network traffic changes rapidly, so classifying traffic manually, as in the prior art, is time-consuming and labor-intensive. Researchers in China and abroad have proposed automatic traffic classification methods based on machine learning, but these methods have drawbacks. Supervised methods first require a complete sample library, whose completeness determines the accuracy of the result; moreover, they can only classify known services, and unknown services cannot be classified automatically because no samples exist for them. Unsupervised methods can only group traffic into a pre-specified number of classes, which does not suit real network traffic, where the number of classes is unknown in advance; moreover, they cannot automatically attribute a class of traffic to a specific service, so they provide grouping but not analysis and attribution.
Therefore, there is a need in the art for a method, an apparatus, a device and a storage medium for analyzing and classifying network traffic.
The present invention is proposed in view of the above.
Summary of the invention:
In view of the above, the present invention provides a method, an apparatus, a device and a storage medium for analyzing and classifying network traffic, so as to solve at least one technical problem in the prior art.
Specifically, a first aspect of the present invention provides a method for analyzing and classifying network traffic, comprising the following steps:
acquiring messages;
dividing the messages by protocol, wherein the messages at least include common TCP/UDP traffic;
extracting a fingerprint for each class of the common TCP/UDP traffic by a payload clustering method;
acquiring the domain name with the largest Jaccard coefficient for the network connections;
and determining whether the existing resource library already contains the relevant features and, if not, storing the corresponding classification rules in the library.
With this technical solution, the method does not require a sample library to be prepared in advance; it automatically classifies the traffic in the network, identifies the service to which each class of traffic belongs, and makes the whole classification process automatic and accurate.
Further, extracting the fingerprint of each class by the payload clustering method comprises the following steps:
collecting, for each network connection, the first data packet that carries a payload, together with the basic information of that connection;
collecting the values of the first a bytes of the data packet and the length value of the data packet;
forming these values into an (a + 1)-dimensional vector;
calculating the cosine of the angle between every two vectors to analyze the similarity of the two network connections;
clustering the multidimensional vectors with the K-means clustering algorithm;
and extracting the multidimensional vector of each cluster center as the fingerprint of that class.
With this technical solution, network connections are classified using cosine similarity and the K-means clustering algorithm; the result is reliable, the operation is simple, and few computing resources are used.
Further, if the data packet is shorter than a bytes, it is padded with zeros.
With this technical solution, all data are processed in a uniform format; zero padding reduces resource usage and increases calculation speed.
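The vector construction and zero padding described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `payload_vector` is hypothetical, and the sketch assumes that the "length value of the data packet" is the length of the payload passed in.

```python
from typing import List


def payload_vector(payload: bytes, a: int = 32) -> List[int]:
    """Build an (a + 1)-dimensional vector from one payload.

    The first a entries are the values of the first a payload bytes,
    padded with zeros when the payload is shorter than a bytes.
    The last entry is the payload length (an assumption of this sketch).
    """
    head = payload[:a]
    padded = list(head) + [0] * (a - len(head))
    return padded + [len(payload)]


short = payload_vector(b"\x01\x02", a=4)      # [1, 2, 0, 0, 2]
full = payload_vector(bytes(range(40)))       # 33 entries, last one is 40
```

With the preferred value a = 32, every connection yields a 33-dimensional vector regardless of how long its first payload is.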
Further, extracting the fingerprint of each class by the payload clustering method further comprises the following step:
removing truncated common TCP/UDP flows.
With this technical solution, messages that cannot be classified are removed, which improves resource utilization and computational efficiency.
Further, acquiring the domain name with the largest Jaccard coefficient for the network connections comprises the following steps:
acquiring, from the XDR ticket, the domain names accessed within a first time period and the number of times each was accessed;
calculating the Jaccard coefficient of each domain name;
and selecting the domain name with the largest Jaccard coefficient.
With this technical solution, the method is suited to highly sparse data and obtains the appropriate domain name.
Further, calculating the Jaccard coefficient of each domain name comprises the following step:
the Jaccard coefficient is C/(n + B − C), where C is the number of times the domain name was accessed by the user in the XDR ticket of the network connections within the first time period, n is the number of multidimensional vectors, and B is the total number of times the domain name appears in the XDR ticket within the first time period.
By adopting the technical scheme, the calculation is simple and the reliability is strong.
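The formula C/(n + B − C) can be illustrated with a short sketch. The function names and the sample counts below are hypothetical, and the sketch assumes that C is counted per domain name over the connections of one cluster while B is counted over the whole XDR ticket for the same time period.

```python
from typing import Dict


def jaccard_scores(cluster_counts: Dict[str, int],
                   total_counts: Dict[str, int],
                   n: int) -> Dict[str, float]:
    """Return C / (n + B - C) for every domain seen with the cluster.

    cluster_counts[d] is C, total_counts[d] is B, n is the number of
    multidimensional vectors (connections) in the cluster.
    """
    scores = {}
    for domain, c in cluster_counts.items():
        b = total_counts[domain]
        denominator = n + b - c
        scores[domain] = c / denominator if denominator > 0 else 0.0
    return scores


def best_domain(scores: Dict[str, float]) -> str:
    """Pick the domain name with the largest Jaccard coefficient."""
    return max(scores, key=scores.get)


scores = jaccard_scores(
    cluster_counts={"a.example": 8, "b.example": 3},
    total_counts={"a.example": 9, "b.example": 50},
    n=10,
)
winner = best_domain(scores)  # "a.example": 8/11 beats 3/57
```

A domain that appears often in the ticket overall but rarely alongside the cluster (here `b.example`) gets a large denominator and therefore a low score, which is why the coefficient copes with frequently visited but unrelated domains.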
Further, the cosine of the angle between every two vectors is calculated as follows:
$$\cos\theta = \frac{\sum_{i=1}^{a+1} X_i Y_i}{\sqrt{\sum_{i=1}^{a+1} X_i^2}\,\sqrt{\sum_{i=1}^{a+1} Y_i^2}}$$
where $M = (X_1, X_2, \ldots, X_a, X_{a+1})$ is one (a + 1)-dimensional vector and $N = (Y_1, Y_2, \ldots, Y_a, Y_{a+1})$ is another (a + 1)-dimensional vector.
By adopting the technical scheme, the calculation is simple and the reliability is strong.
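As a minimal sketch of this similarity measure, the cosine of the angle between two (a + 1)-dimensional vectors M and N is their dot product divided by the product of their Euclidean norms. The function name is hypothetical, and returning 0.0 for an all-zero vector is a choice made for this sketch only.

```python
import math
from typing import Sequence


def cosine_similarity(m: Sequence[float], n: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal dimension."""
    if len(m) != len(n):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(m, n))
    norm_m = math.sqrt(sum(x * x for x in m))
    norm_n = math.sqrt(sum(y * y for y in n))
    if norm_m == 0 or norm_n == 0:
        return 0.0  # sketch-only convention for all-zero vectors
    return dot / (norm_m * norm_n)


same_direction = cosine_similarity([1, 2, 3], [2, 4, 6])  # close to 1.0
orthogonal = cosine_similarity([1, 0], [0, 1])            # 0.0
```

Values close to 1 indicate that two connections start with similar byte patterns and lengths; values close to 0 indicate dissimilar connections.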
Further, the first time period is [starttime − Δt, starttime], where starttime is the time at which the network traffic was generated and Δt is a preset time interval.
Further, storing the corresponding classification rules comprises the following steps:
looking up the corresponding service name according to the domain name and the features;
matching the extracted fingerprint with the corresponding service class;
and storing the service and fingerprint classification rule in the resource library.
By adopting the technical scheme, the automatic classification of the network flow is realized.
Further, before the network connections in the messages are divided by protocol, the method comprises the following step:
removing truncated network connections.
With this technical solution, resource waste is reduced and computational efficiency is improved.
Further, dividing the network connections in the messages by protocol comprises the following step:
dividing the network traffic into three major classes: DNS; HTTP/HTTPS/RTMP/RTSP/QUIC; and common TCP/UDP.
By adopting the technical scheme, the processing method is adopted according to different protocols, so that efficient classification is realized, and resource waste is reduced.
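The three-way division can be sketched as a small dispatch function. The sketch assumes that an upstream decoder has already labelled each connection with a protocol name; that label, the enum and the function name are hypothetical and are not taken from the patent.

```python
from enum import Enum


class TrafficClass(Enum):
    DNS = "dns"
    KNOWN_APPLICATION = "http_https_rtmp_rtsp_quic"
    COMMON_TCP_UDP = "common_tcp_udp"


# Application protocols that the text groups into the second class.
KNOWN_APPLICATION_PROTOCOLS = {"HTTP", "HTTPS", "RTMP", "RTSP", "QUIC"}


def divide_by_protocol(protocol_label: str) -> TrafficClass:
    """Map a protocol label to one of the three major traffic classes."""
    label = protocol_label.upper()
    if label == "DNS":
        return TrafficClass.DNS
    if label in KNOWN_APPLICATION_PROTOCOLS:
        return TrafficClass.KNOWN_APPLICATION
    # Everything else is treated as common TCP/UDP traffic.
    return TrafficClass.COMMON_TCP_UDP
```

Only the third class goes on to payload clustering, which keeps the more expensive analysis away from traffic whose protocol already exposes usable features.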
Further, the method for analyzing and classifying network traffic comprises the following steps:
if the traffic is DNS, the classification ends;
if the traffic is HTTP/HTTPS/RTMP/RTSP/QUIC, extracting the fingerprint at a designated position as the feature, determining whether the existing resource library already contains the relevant feature and, if not, storing the corresponding classification rule.
With this technical solution, traffic is handled by class and resources are allocated sensibly, which improves resource utilization and achieves automatic classification of HTTP/HTTPS/RTMP/RTSP/QUIC traffic.
Further, the method for analyzing and classifying network traffic comprises the following step:
determining whether the message can be identified by the existing resource library; if so, the classification ends.
With this technical solution, resource waste is reduced, classification efficiency is improved, and resources are used sensibly.
In a second aspect of the present invention, an electronic device is provided, where the electronic device includes a memory and a processor, and the memory has at least one instruction, and the at least one instruction is loaded and executed by the processor to implement the method.
In a third aspect of the present invention, a computer-readable storage medium is provided, on which at least one instruction is stored, the at least one instruction being loaded and executed by a processor to implement the above method.
In summary, the invention has the following beneficial effects:
1. the method does not require a sample library to be prepared in advance; it automatically classifies the traffic in the network, identifies the service to which each class of traffic belongs, and makes the whole classification process automatic and accurate;
2. the traffic is first divided by protocol and then classified, which improves classification efficiency;
3. checking whether the traffic is already covered by the resource library removes unnecessary analysis steps and allows direct classification, which improves classification efficiency and makes sensible use of resources.
Description of the drawings:
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an embodiment of the method for analyzing and classifying network traffic according to the present invention;
FIG. 2 is a schematic diagram of another embodiment of the method for analyzing and classifying network traffic according to the present invention;
FIG. 3 is a schematic diagram of the steps of extracting the fingerprint of each class by the payload clustering method according to the present invention.
Detailed description:
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments that a person skilled in the art can obtain from these embodiments without inventive effort fall within the scope of the present invention.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
The present invention will be described in detail below by way of examples.
Some concepts related to the present application are explained below:
1. TCP (Transmission Control Protocol): a connection-oriented, reliable, byte-stream-based transport layer communication protocol.
2. UDP (User Datagram Protocol): a simple, connectionless, unreliable, datagram-oriented transport layer protocol. In the TCP/IP model, UDP provides a simple interface above the network layer and below the application layer. UDP provides only unreliable delivery and keeps no backup copy of data once the data sent by an application has been passed to the network layer (so UDP is sometimes also called the Unreliable Datagram Protocol).
3. Network connection: a network connection is defined by the five-tuple of source IP, source port, destination IP, destination port and protocol (TCP/UDP).
4. Jaccard coefficient: also known as the Jaccard similarity coefficient; it is used to compare the similarity and difference between finite sample sets. The larger the Jaccard coefficient, the higher the similarity of the samples.
5. The cosine theorem (law of cosines): a basic theorem of Euclidean plane geometry that relates the lengths of the three sides of a triangle to the cosine of one of its angles. For any triangle, the square of any side equals the sum of the squares of the other two sides minus twice the product of those two sides and the cosine of the angle between them.
If the three sides are a, b, c and the angles opposite them are A (α), B (β), C (γ), then in ΔABC,
$$a^2 = b^2 + c^2 - 2bc\cos\alpha, \qquad b^2 = a^2 + c^2 - 2ac\cos\beta, \qquad c^2 = a^2 + b^2 - 2ab\cos\gamma$$
6. SSE (Sum of Squared Errors): an index used to measure the quality of K-means clustering.
7. XDR ticket: a session-level detailed record of the signaling process and the service transmission process, generated after the full Internet data are processed; it contains all Internet access information of the user. XDR (External Data Representation) is an IETF standard defined in 1995.
8. K-means clustering algorithm: an iterative cluster-analysis algorithm. The data are to be divided into K groups; K objects are randomly selected as the initial cluster centers, the distance between each object and each cluster center is calculated, and each object is assigned to its nearest cluster center. A cluster center together with the objects assigned to it represents one cluster.
9. DNS (Domain Name System): an Internet service. It acts as a distributed database that maps domain names and IP addresses to each other, making it more convenient to access the Internet. DNS uses port 53 over both TCP and UDP.
10. HTTP (Hypertext Transfer Protocol): a simple request-response protocol that typically runs on top of TCP. It specifies which messages a client may send to a server and which responses it receives.
11. HTTPS (Hypertext Transfer Protocol over Secure Socket Layer): a security-oriented HTTP channel; on the basis of HTTP, it secures the transmission process through transport encryption and identity authentication.
12. RTMP (Real Time Messaging Protocol): a TCP-based protocol family designed for real-time data communication; like HTTP, it belongs to the application layer of the four-layer TCP/IP model.
13. RTSP (Real Time Streaming Protocol): an application layer protocol in the TCP/IP protocol suite, submitted as an IETF RFC standard by Columbia University, Netscape and RealNetworks. The protocol defines how one-to-many applications efficiently transmit multimedia data over an IP network.
14. QUIC (Quick UDP Internet Connections): a low-latency, UDP-based Internet transport layer protocol developed by Google.
15. Message: a data unit exchanged and transmitted in the network, i.e. the block of data that a station sends at one time. A message contains the complete data to be sent; message lengths vary widely and are not limited.
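The five-tuple definition of a network connection lends itself to a simple data structure. The sketch below is an illustration under stated assumptions: the type `Connection`, the function `first_payloads` and the `(connection, payload)` packet representation are all hypothetical, and the two directions of a flow are treated as separate keys for simplicity.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Connection(NamedTuple):
    """Five-tuple that identifies one network connection."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str  # "tcp" or "udp"


def first_payloads(
    packets: Iterable[Tuple[Connection, bytes]]
) -> Dict[Connection, bytes]:
    """Keep, per connection, the first packet that carries a payload.

    packets must be in capture order; packets with an empty payload
    (for example bare TCP handshake segments) are skipped.
    """
    first: Dict[Connection, bytes] = {}
    for connection, payload in packets:
        if payload and connection not in first:
            first[connection] = payload
    return first


conn = Connection("10.0.0.1", 40000, "203.0.113.5", 8000, "tcp")
result = first_payloads([(conn, b""), (conn, b"\x16\x03"), (conn, b"later")])
# result maps conn to b"\x16\x03"
```

Because `NamedTuple` instances are hashable, the five-tuple can be used directly as a dictionary key when grouping packets.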
In order to better understand the technical scheme, the technical scheme is described in detail in the following with the accompanying drawings and the specific embodiment.
Specifically, the invention provides a method for analyzing and classifying network traffic, comprising the following steps:
S100, acquiring messages;
S400, dividing the messages by protocol, wherein the messages at least include common TCP/UDP traffic;
S500, extracting a fingerprint for each class of the common TCP/UDP traffic by a payload clustering method;
S600, acquiring the domain name with the largest Jaccard coefficient for the network connections;
S700, determining whether the existing resource library already contains the relevant features; if not, S800, storing the corresponding classification rules in the library.
With this technical solution, the method does not require a sample library to be prepared in advance; it automatically classifies the traffic in the network, identifies the service to which each class of traffic belongs, and makes the whole classification process automatic and accurate.
In a preferred embodiment of the present invention, S500, extracting the fingerprint of each class by the payload clustering method, comprises the following steps:
S520, collecting, for each network connection, the first data packet that carries a payload, together with the basic information of that connection;
S530, collecting the values of the first a bytes of the data packet and the length value of the data packet;
S540, forming these values into an (a + 1)-dimensional vector;
S550, calculating the cosine of the angle between every two vectors to analyze the similarity of the two network connections;
S560, clustering the multidimensional vectors with the K-means clustering algorithm;
S570, extracting the multidimensional vector of each cluster center as the fingerprint of that class.
In a specific implementation, a may be any natural number greater than 1, for example any integer from 10 to 50, and is preferably 32; the multidimensional vector is then a 33-dimensional vector consisting of the hexadecimal values of the first 32 bytes and the packet length value. The n 33-dimensional vectors are clustered with the K-means clustering algorithm, and the clustering quality is measured by the SSE (sum of squared errors).
By adopting the technical scheme, the network connection is classified through the cosine theorem and the K-means clustering algorithm, the result is reliable, the operation is simple, and less computing resources are occupied.
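The clustering step and the SSE measure can be sketched in plain Python as follows. This is a textbook K-means with squared Euclidean distance, given only as an illustration: the text does not state how the cosine measure and K-means are combined, so the distance function, the seeded random initialisation and the function names are assumptions of this sketch.

```python
import random
from typing import List, Sequence, Tuple


def squared_distance(p: Sequence[float], q: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(p, q))


def nearest(v: Sequence[float], centers: List[List[float]]) -> int:
    """Index of the cluster center closest to v."""
    return min(range(len(centers)), key=lambda j: squared_distance(v, centers[j]))


def kmeans(vectors: List[Sequence[float]], k: int,
           iterations: int = 50, seed: int = 0
           ) -> Tuple[List[List[float]], List[int], float]:
    """Return (cluster centers, labels, SSE) for the given vectors."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(list(vectors), k)]
    for _ in range(iterations):
        labels = [nearest(v, centers) for v in vectors]
        new_centers = []
        for j in range(k):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if members:
                # Component-wise mean of the members becomes the new center.
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    labels = [nearest(v, centers) for v in vectors]
    sse = sum(squared_distance(v, centers[label]) for v, label in zip(vectors, labels))
    return centers, labels, sse


centers, labels, sse = kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], k=2)
# the two centers are [0.0, 0.5] and [10.0, 10.5]; each serves as a fingerprint
```

Since the number of traffic classes is not known in advance, one plausible use of the SSE is to run the sketch for several values of k and compare the resulting SSE values; the text does not specify how k is chosen.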
In a preferred embodiment of the present invention, if the data packet is shorter than a bytes, it is padded with zeros.
With this technical solution, all data are processed in a uniform format; zero padding reduces resource usage and increases calculation speed.
In a preferred embodiment of the present invention, extracting the fingerprint of each class by the payload clustering method further comprises the following step:
S510, removing truncated common TCP/UDP flows.
With this technical solution, messages that cannot be classified are removed, which improves resource utilization and computational efficiency.
In a preferred embodiment of the present invention, S600, acquiring the domain name with the largest Jaccard coefficient for the network connections, comprises the following steps:
S610, acquiring, from the XDR ticket, the domain names accessed within a first time period and the number of times each was accessed;
S620, calculating the Jaccard coefficient of each domain name;
S630, selecting the domain name with the largest Jaccard coefficient.
With this technical solution, the method is suited to highly sparse data and obtains the appropriate domain name.
In a preferred embodiment of the present invention, calculating the Jaccard coefficient of each domain name comprises the following step:
the Jaccard coefficient is C/(n + B − C), where C is the number of times the domain name was accessed by the user in the XDR ticket of the network connections within the first time period, n is the number of multidimensional vectors, and B is the total number of times the domain name appears in the XDR ticket within the first time period.
By adopting the technical scheme, the calculation is simple and the reliability is strong.
In a preferred embodiment of the present invention, the cosine of the angle between every two vectors is calculated as follows:
$$\cos\theta = \frac{\sum_{i=1}^{a+1} X_i Y_i}{\sqrt{\sum_{i=1}^{a+1} X_i^2}\,\sqrt{\sum_{i=1}^{a+1} Y_i^2}}$$
where $M = (X_1, X_2, \ldots, X_a, X_{a+1})$ is one (a + 1)-dimensional vector and $N = (Y_1, Y_2, \ldots, Y_a, Y_{a+1})$ is another (a + 1)-dimensional vector.
By adopting the technical scheme, the calculation is simple and the reliability is strong.
In a preferred embodiment of the present invention, the first time period is [starttime − Δt, starttime], where starttime is the time at which the network traffic was generated and Δt is a preset time interval.
In a specific implementation, Δt may be 1 to 3 seconds, which covers most of the data needed to calculate the Jaccard coefficient.
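Selecting the domain names that fall inside the first time period can be sketched as a filter over timestamped records. The record layout `(timestamp, domain)`, the function name and the default Δt of 2 seconds are assumptions made for this illustration; the actual XDR ticket format is not described in the text.

```python
from collections import Counter
from typing import Iterable, Tuple


def domains_in_window(xdr_records: Iterable[Tuple[float, str]],
                      start_time: float,
                      delta_t: float = 2.0) -> Counter:
    """Count domain accesses within [start_time - delta_t, start_time].

    xdr_records is an iterable of (timestamp in seconds, domain name).
    """
    counts: Counter = Counter()
    for timestamp, domain in xdr_records:
        if start_time - delta_t <= timestamp <= start_time:
            counts[domain] += 1
    return counts


records = [(9.5, "a.example"), (8.5, "a.example"),
           (7.0, "b.example"), (10.5, "c.example")]
counts = domains_in_window(records, start_time=10.0)
# only the two "a.example" accesses fall inside [8.0, 10.0]
```

The window ends at the moment the traffic was generated, so only domain lookups that preceded the connection are associated with it.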
In a preferred embodiment of the present invention, S800, storing the corresponding classification rules, comprises the following steps:
S810, looking up the corresponding service name according to the domain name and the features;
S820, matching the extracted fingerprint with the corresponding service class;
S830, storing the service and fingerprint classification rule in the resource library.
In a specific implementation, if one clustering result is the multidimensional vector (1, 2, 0, 1, ……, 2, 4, 7, 6), the domain name with the highest Jaccard coefficient is calculated to be ptqy.gitv.tv, and the service name found for that domain name is iQIYI, then the classification rule (1, 2, 0, 1, ……, 2, 4, 7, 6) is stored under the iQIYI service.
By adopting the technical scheme, the automatic classification of the network flow is realized.
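The rule-storing step can be sketched with two dictionaries: one mapping domain names to service names and one acting as the resource library. Both structures, the function name, and the short fingerprint, domain and service values in the usage lines are hypothetical stand-ins chosen for this illustration.

```python
from typing import Dict, Sequence, Tuple

Fingerprint = Tuple[float, ...]


def store_rule(rule_library: Dict[Fingerprint, str],
               domain_to_service: Dict[str, str],
               fingerprint: Sequence[float],
               domain: str) -> bool:
    """Store a fingerprint-to-service rule if it is new.

    Returns True when a rule was added, False when the domain has no
    known service name or the fingerprint is already in the library.
    """
    service = domain_to_service.get(domain)
    if service is None:
        return False
    key = tuple(fingerprint)
    if key in rule_library:
        return False
    rule_library[key] = service
    return True


library: Dict[Fingerprint, str] = {}
services = {"video.example": "ExampleVideo"}
added = store_rule(library, services, [1, 2, 0, 1], "video.example")   # True
repeat = store_rule(library, services, [1, 2, 0, 1], "video.example")  # False
```

Checking the library before inserting mirrors the test in S700: a fingerprint that is already known is not stored a second time.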
In a preferred embodiment of the present invention, before S400, dividing the network connections in the messages by protocol, the method comprises the following step:
S300, removing truncated network connections.
In a specific implementation, the network traffic may be captured locally with Wireshark while the app is in use, or obtained directly from the network through traffic mirroring on a TMA device. After acquisition, some network connections may lack their initial (connection header) messages; such connections are unsuitable for analysis and feature extraction and are therefore discarded.
By adopting the technical scheme, the resource waste is reduced, and the calculation efficiency is improved.
In a preferred embodiment of the present invention, S400, dividing the network connections in the messages by protocol, comprises the following step:
dividing the network traffic into three major classes: DNS; HTTP/HTTPS/RTMP/RTSP/QUIC; and common TCP/UDP.
By adopting the technical scheme, the processing method is adopted according to different protocols, so that efficient classification is realized, and resource waste is reduced.
In a preferred embodiment of the present invention, the method for analyzing and classifying network traffic comprises the following steps:
if the traffic is DNS, the classification ends;
if the traffic is HTTP/HTTPS/RTMP/RTSP/QUIC, S900, extracting the fingerprint at a designated position as the feature; then S700, determining whether the existing resource library already contains the relevant feature; if not, S800, storing the corresponding classification rule.
With this technical solution, traffic is handled by class and resources are allocated sensibly, which improves resource utilization and achieves automatic classification of HTTP/HTTPS/RTMP/RTSP/QUIC traffic.
In a preferred embodiment of the present invention, the method for analyzing and classifying network traffic comprises the following step:
S200, determining whether the existing resource library can identify the message; if so, the classification ends.
By adopting the technical scheme, the resource waste is reduced, the classification efficiency is improved, and the resources are reasonably utilized.
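Taken together, the numbered steps suggest the control flow sketched below. The ordering is inferred from the step numbers and the preceding paragraphs, because the figures are not reproduced here; the function `plan_steps` and its parameters are hypothetical and only list which steps would run for a given message.

```python
from typing import List

HTTP_LIKE = {"HTTP", "HTTPS", "RTMP", "RTSP", "QUIC"}


def plan_steps(recognized: bool, protocol: str, feature_known: bool) -> List[str]:
    """List the steps executed for one message.

    recognized:    the existing resource library identifies the message (S200)
    protocol:      protocol label used when dividing the traffic (S400)
    feature_known: the library already holds the extracted feature (S700)
    """
    steps = ["S100", "S200"]
    if recognized:
        return steps                      # classification ends at S200
    steps += ["S300", "S400"]
    label = protocol.upper()
    if label == "DNS":
        return steps                      # DNS traffic: classification ends
    if label in HTTP_LIKE:
        steps.append("S900")              # fingerprint at a designated position
    else:
        steps += ["S500", "S600"]         # payload clustering, Jaccard domain
    steps.append("S700")
    if not feature_known:
        steps.append("S800")              # store the new classification rule
    return steps
```

For example, an unrecognized common TCP connection with a new fingerprint runs every step from S100 to S800 except S900, while a message the library already recognizes stops at S200.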
Based on the same inventive concept, the present invention provides a device comprising:
a memory and a processor, wherein the memory stores at least one instruction, and the at least one instruction is loaded and executed by the processor to implement the above method.
Based on the same inventive concept, the present invention provides a computer-readable storage medium having at least one instruction stored thereon, the at least one instruction being loaded and executed by a processor to implement the above method.
Those of ordinary skill in the art will appreciate that the various illustrative algorithmic steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or as a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways.
It should be understood that the above technical problems can be solved by combining the features of the embodiments and of the claims.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method of analyzing and classifying network traffic, comprising the steps of:
acquiring a message;
shunting the message according to a protocol, wherein the message at least comprises common TCP/UDP traffic;
extracting fingerprints of each type from the common TCP/UDP traffic according to a payload clustering method;
acquiring the domain name with the largest Jaccard coefficient in the network connections;
and judging whether the existing resource library has relevant characteristics, and if not, warehousing the corresponding classification rules.
2. The method of analyzing and classifying network traffic according to claim 1, wherein: the method for extracting the fingerprints of each type according to the payload clustering method comprises the following steps:
collecting basic information of each network connection and the first data packet with a payload of each network connection;
collecting the values of the first a bytes of the data packet and the length value of the data packet;
forming the data into an (a+1)-dimensional vector;
calculating the cosine of the included angle between every two vectors, and analyzing the similarity of the two network connections;
clustering the multidimensional vectors by using a K-means clustering algorithm;
and extracting the multidimensional vector of each cluster center as the fingerprint of each class.
3. The method of analyzing and classifying network traffic according to claim 2, wherein: the method for acquiring the domain name with the largest Jaccard coefficient in the network connection comprises the following steps:
acquiring, from XDR records, the domain names accessed in a first time period and the times thereof;
calculating the Jaccard coefficient of each domain name;
and collecting the domain name with the largest Jaccard coefficient.
4. The method of analyzing and classifying network traffic according to claim 3, wherein: the step of warehousing the corresponding classification rules comprises the following steps:
searching a corresponding service name according to the domain name and the characteristics;
extracting fingerprints to match with corresponding service classifications;
and warehousing the service and fingerprint classification rules.
5. The method of analyzing and classifying network traffic according to claim 4, wherein: before shunting the network connection in the message according to the protocol, the method comprises the following steps:
removing the network connection.
6. The method of analyzing and classifying network traffic according to claim 5, wherein: the method for shunting the network connection in the message according to the protocol comprises the following steps:
dividing network traffic into three major classes: DNS; HTTP/HTTPS/RTMP/RTSP/QUIC; and common TCP/UDP.
7. The method of analyzing and classifying network traffic according to claim 6, wherein: the method for analyzing and classifying network traffic comprises the following steps:
if the traffic is DNS, the classification is finished;
if the traffic is HTTP/HTTPS/RTMP/RTSP/QUIC, extracting the fingerprint at the designated position as the characteristic, judging whether the existing resource library has related characteristics, and if not, warehousing the corresponding classification rule.
8. The method of analyzing and classifying network traffic according to any one of claims 1-7, wherein: the method for analyzing and classifying network traffic comprises the following steps:
judging whether the message can be identified by the existing resource library; if so, the classification is finished.
9. An electronic device, comprising a memory and a processor, wherein the memory has at least one instruction loaded and executed by the processor to implement the method of any one of claims 1-8.
10. A computer-readable storage medium having stored thereon at least one instruction, the at least one instruction being loaded and executed by a processor to implement the method of any one of claims 1-8.
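The domain-name selection step of claim 3 can be pictured with a short Python sketch. The claims do not state which two sets the Jaccard coefficient compares, so this example assumes, purely for illustration, that the set of users whose connections fall in one fingerprint cluster is compared with the set of users recorded as accessing each domain name in the same time period. The names `jaccard` and `best_domain` and the sample data are hypothetical.

```python
from typing import Dict, Set, Tuple

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def best_domain(cluster_users: Set[str],
                domain_users: Dict[str, Set[str]]) -> Tuple[str, float]:
    """Return the domain whose user set has the largest Jaccard coefficient
    with the users of one fingerprint cluster (assumed comparison)."""
    if not domain_users:
        raise ValueError("no candidate domain names")
    scores = {d: jaccard(cluster_users, users) for d, users in domain_users.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical data: users seen in the cluster, and users per accessed domain.
cluster_users = {"u1", "u2", "u3"}
domain_users = {
    "video.example.com": {"u1", "u2", "u3", "u4"},
    "ads.example.net": {"u1", "u9"},
}
domain, score = best_domain(cluster_users, domain_users)
```

The domain returned here would then be used, together with the cluster fingerprint, to look up a service name before the classification rule is warehoused.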
Application number: CN202011593229.2A
Priority date: 2020-12-29
Filing date: 2020-12-29
Title: Method, equipment and storage medium for analyzing and classifying network flow
Status: Pending
Publication number: CN112583738A (en)
Publication date: 2021-03-30
Family ID: 75143902
Country: CN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination