CN110768933B - Network flow application identification method, system and equipment and storage medium - Google Patents

Network flow application identification method, system and equipment and storage medium

Info

Publication number
CN110768933B
Authority
CN
China
Prior art keywords
application identification
encrypted network
traffic
network traffic
identification result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810844159.XA
Other languages
Chinese (zh)
Other versions
CN110768933A (en)
Inventor
刘伯仲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sangfor Technologies Co Ltd
Original Assignee
Sangfor Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sangfor Technologies Co Ltd filed Critical Sangfor Technologies Co Ltd
Priority to CN201810844159.XA priority Critical patent/CN110768933B/en
Publication of CN110768933A publication Critical patent/CN110768933A/en
Application granted granted Critical
Publication of CN110768933B publication Critical patent/CN110768933B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/14Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416Event detection, e.g. attack signature detection
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/04Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The application discloses a network flow application identification method, a system and equipment and a computer readable storage medium, wherein the method comprises the following steps: acquiring encrypted network flow data; extracting the traffic characteristics of the encrypted network traffic data; and obtaining an application identification result of the encrypted network flow data according to the flow characteristics. In the application, the encrypted network flow data is not decrypted, the flow characteristics of the encrypted network flow data are directly extracted after the encrypted network flow data is obtained, and application identification is carried out according to the flow characteristics. The extraction of the flow characteristics is directly based on the encrypted network flow data and not based on the plaintext content of the network flow data, even if the equipment has a security problem, the equipment only acquires the encrypted network flow data and the flow characteristics and cannot know the plaintext content of the network flow data. Therefore, the network flow application identification method improves the safety of encrypted network flow data application identification.

Description

Network flow application identification method, system and equipment and storage medium
Technical Field
The present application relates to the field of communications technologies, and in particular, to a method, a system, and a device for identifying a network traffic application, and a computer-readable storage medium.
Background
As security issues become increasingly important, the demand for encrypted traffic transmission grows ever stronger. For application identification of encrypted traffic, the prior art places an SSL (Secure Sockets Layer) proxy between a client and a browser. The SSL proxy provides a proxy function capable of decrypting traffic data, so that the plaintext content of the traffic data is obtained and application identification is performed according to that plaintext content. This identification process requires the client to place a high degree of trust in the SSL proxy; once the SSL proxy has a security problem, the security of the network traffic data is directly affected.
Therefore, how to improve the security of the application identification of the encrypted network traffic data is a problem to be solved by those skilled in the art.
Disclosure of Invention
The application aims to provide a network flow application identification method, a system and equipment and a computer readable storage medium, which improve the safety of encrypted network flow data application identification.
In order to achieve the above object, the present application provides a network traffic application identification method, including:
acquiring encrypted network flow data;
extracting the traffic characteristics of the encrypted network traffic data;
and obtaining an application identification result of the encrypted network flow data according to the flow characteristics.
Wherein, the acquiring the encrypted network traffic data includes:
and capturing encrypted network flow data according to a preset time window, and carrying out flow cleaning operation on the encrypted network flow data.
Wherein the traffic characteristics include an SNI field and traffic behavior characteristics;
correspondingly, obtaining the application identification result of the encrypted network traffic data according to the traffic characteristics includes:
identifying the encrypted network flow data according to the SNI field, and judging whether an application identification result is obtained;
and if not, determining the application identification result according to the flow behavior characteristics.
The traffic behavior characteristics comprise any one or a combination of several of: statistical characteristics of the uplink packet size per unit time, statistical characteristics of the downlink packet size, statistical characteristics of the time interval between data packets, and SSL characteristics.
Determining the application identification result according to the flow behavior characteristics, wherein the determining of the application identification result comprises the following steps:
and determining a threshold range corresponding to the flow behavior characteristics, and obtaining an application identification result of the encrypted network flow data according to the threshold range.
Obtaining an application identification result of the encrypted network traffic data according to the threshold range, including:
inputting the flow behavior characteristics into a discrimination model so as to obtain an application identification result of the encrypted network flow data;
the discrimination model is specifically a model describing application recognition results corresponding to each threshold range of each flow behavior characteristic.
Before the flow behavior feature is input into a discriminant model, the method further includes:
obtaining a training sample; wherein the training sample comprises encrypted network traffic data of a known application identification result;
analyzing the training sample, and determining the flow behavior characteristics and the threshold corresponding to the training sample so as to obtain the discriminant model; the traffic behavior feature is specifically a traffic behavior feature capable of characterizing the training sample.
Determining the application identification result according to the flow behavior characteristics, wherein the determining of the application identification result comprises the following steps:
and inputting the traffic behavior characteristics into a trained classification model so as to obtain an application identification result of the encrypted network traffic data.
Before inputting the flow behavior characteristics into the trained classification model, the method further comprises:
training a classification model by using a training sample so as to obtain a trained classification model; wherein the training samples comprise encrypted network traffic data for known application recognition results.
The classification model comprises any one of a decision tree model, a support vector machine model, a random forest model and a logistic regression model.
In order to achieve the above object, the present application provides a network traffic application identification system, including:
the acquisition module is used for acquiring encrypted network flow data;
the extraction module is used for extracting the flow characteristics of the encrypted network flow data;
and the identification module is used for obtaining an application identification result of the encrypted network flow data according to the flow characteristics.
The acquisition module is specifically a module for capturing encrypted network traffic data according to a preset time window and performing traffic cleaning operation on the encrypted network traffic data.
Wherein the traffic characteristics include an SNI field and traffic behavior characteristics;
correspondingly, the identification module is specifically a module that identifies the encrypted network traffic data according to the SNI field, judges whether an application identification result is obtained, and, if not, determines the application identification result according to the traffic behavior characteristics.
The traffic behavior characteristics comprise any one or a combination of several of: statistical characteristics of the uplink packet size per unit time, statistical characteristics of the downlink packet size, statistical characteristics of the time interval between data packets, and SSL characteristics.
The identification module is specifically a module for determining a threshold range corresponding to the traffic behavior characteristics and obtaining an application identification result of the encrypted network traffic data according to the threshold range.
In order to achieve the above object, the present application provides a network traffic application identification device, including:
a memory for storing a computer program;
a processor for implementing the steps of the network traffic application identification method as described above when executing the computer program.
To achieve the above object, the present application provides a computer-readable storage medium having a computer program stored thereon, where the computer program is executed by a processor to implement the steps of the network traffic application identification method as described above.
According to the scheme, the network traffic application identification method provided by the application comprises the following steps: acquiring encrypted network flow data; extracting the traffic characteristics of the encrypted network traffic data; and obtaining an application identification result of the encrypted network traffic data according to the traffic characteristics.
In the prior art, an SSL proxy is disposed between the client and the browser; after obtaining the encrypted network traffic data, it decrypts the data and then identifies the application of the network traffic data. In contrast, in the present application the encrypted network traffic data is not decrypted: the traffic characteristics are extracted directly from the encrypted network traffic data once it is acquired, and application identification is performed according to those traffic characteristics. Because the extraction of the traffic characteristics is based directly on the encrypted network traffic data rather than on its plaintext content, even if the device has a security problem, it holds only the encrypted network traffic data and the traffic characteristics and cannot learn the plaintext content of the network traffic data. Therefore, the network traffic application identification method improves the security of application identification for encrypted network traffic data. The application also discloses a network traffic application identification system and device and a computer-readable storage medium, which can achieve the same technical effects.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and a person skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a network traffic application identification method disclosed in an embodiment of the present application;
Fig. 2 is a flowchart of another network traffic application identification method disclosed in an embodiment of the present application;
Fig. 3 is a flowchart of yet another network traffic application identification method disclosed in an embodiment of the present application;
Fig. 4 is a diagram of web-side data packets versus time;
Fig. 5 is a diagram of mobile-terminal data packets versus time;
Fig. 6 is a structural diagram of a network traffic application identification system disclosed in an embodiment of the present application;
Fig. 7 is a structural diagram of a network traffic application identification device disclosed in an embodiment of the present application;
Fig. 8 is a structural diagram of another network traffic application identification device disclosed in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiment of the application discloses a network flow application identification method, which improves the safety of encrypted network flow data application identification.
Referring to Fig. 1, a flowchart of a network traffic application identification method disclosed in an embodiment of the present application, the method includes:
S101: acquiring encrypted network traffic data;
In a specific implementation, the encrypted network traffic data may be captured according to a preset time window. The length of the window is not specifically limited, and a person skilled in the art may set it flexibly according to actual needs and processing speed. Because the traffic to be captured is encrypted network traffic, it can be selected by port (e.g. 443). This step does not limit the specific capture tool; for example, the traffic capture function of Bro (a passive, open-source network traffic analyzer) can be used to obtain the encrypted network traffic. After the encrypted network traffic data is obtained, a traffic cleaning operation needs to be performed on it, that is, traffic from invalid accesses is filtered out and access traffic with real communication content is retained.
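A minimal sketch of the windowing and cleaning in S101, assuming the packets have already been captured and parsed into simple records. The `Packet` record and its fields, the `window_seconds` parameter, and the rule that a flow carrying no payload bytes counts as an invalid access are illustrative assumptions and are not specified in the original.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    # Hypothetical record; field names are assumptions for illustration.
    timestamp: float    # seconds since the start of capture
    flow_id: str        # identifier of the connection the packet belongs to
    server_port: int    # server-side port of the connection
    payload_len: int    # bytes of encrypted payload carried
    upstream: bool      # True if sent from client to server


def split_into_windows(packets, window_seconds):
    """Group packets into consecutive fixed-length time windows."""
    windows = defaultdict(list)
    for p in packets:
        windows[int(p.timestamp // window_seconds)].append(p)
    return [windows[k] for k in sorted(windows)]


def clean_window(packets, port=443):
    """Keep flows on the given port that carried some payload.

    Treating zero-payload flows as invalid accesses is an assumed rule.
    """
    by_flow = defaultdict(list)
    for p in packets:
        if p.server_port == port:
            by_flow[p.flow_id].append(p)
    return {
        flow_id: pkts
        for flow_id, pkts in by_flow.items()
        if sum(p.payload_len for p in pkts) > 0
    }
```

Each cleaned window maps a flow identifier to the packets of that flow, which is a convenient input for the feature extraction in the next step.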
S102: extracting the traffic characteristics of the encrypted network traffic data;
the traffic characteristics are characteristics representing communication behaviors of the encrypted network traffic data, and in this embodiment, specific types of the traffic characteristics are not limited, and those skilled in the art can flexibly select the traffic characteristics according to actual situations.
As a preferred embodiment, the traffic characteristics may include an SNI (Server Name Indication) field and traffic behavior characteristics. The traffic behavior characteristics are likewise not specifically limited, and may preferably include statistical characteristics of the upstream packet size per unit time, statistical characteristics of the downstream packet size, statistical characteristics of the time interval between packets, SSL characteristics, and the like. The statistical characteristics may include the mean, the variance, etc., and the SSL characteristics may include the cipher suite (Ciphersuite) used for SSL protocol negotiation, and the common name (CommonName), organization name, validity period, etc. of the SSL certificate.
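The statistical characteristics named above (mean and variance of upstream sizes, downstream sizes, and inter-packet intervals) can be sketched as follows. The tuple layout of a packet and the returned key names are assumptions made for illustration; the population variance is used here, although the original does not say which variance definition is intended.

```python
from statistics import mean, pvariance


def size_and_interval_features(packets):
    """Compute mean/variance features for one flow in one unit of time.

    packets: list of (timestamp_seconds, size_bytes, is_upstream) tuples.
    """
    up = [size for _, size, is_up in packets if is_up]
    down = [size for _, size, is_up in packets if not is_up]
    times = sorted(t for t, _, _ in packets)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]

    def stats(values):
        # Mean and population variance; zeros when there are no values.
        if not values:
            return 0.0, 0.0
        return mean(values), pvariance(values)

    up_mean, up_var = stats(up)
    down_mean, down_var = stats(down)
    gap_mean, gap_var = stats(gaps)
    return {
        "up_size_mean": up_mean,
        "up_size_var": up_var,
        "down_size_mean": down_mean,
        "down_size_var": down_var,
        "interval_mean": gap_mean,
        "interval_var": gap_var,
    }
```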
S103: and obtaining an application identification result of the encrypted network flow data according to the flow characteristics.
In a specific implementation, the encrypted network traffic data is identified according to the correspondence between different traffic characteristics and application identification results. That is, this step presupposes that the correspondence between traffic characteristics and application identification results has been stored in advance; it may be stored in a database in the form of a data table or, of course, embodied in other forms, which is not specifically limited here.
When the traffic characteristics extracted in the previous step include the SNI field and the traffic behavior characteristics, the encrypted network traffic data can first be identified according to the SNI field. If the SNI field is able to strictly distinguish applications, the application identification result can be output directly without subsequently analyzing the traffic behavior characteristics, which improves the efficiency of application identification; the subsequent steps may nevertheless still be performed to obtain a more accurate identification result. If the SNI field is not able to strictly distinguish applications, the application identification result is determined according to the traffic behavior characteristics.
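The SNI-first decision with a fallback to traffic behavior characteristics might look like the sketch below. The function name, the `sni_table` dictionary, and the `behavior_identifier` callable are hypothetical; the original only states that a stored correspondence exists, not how it is represented.

```python
def identify(sni, behavior_features, sni_table, behavior_identifier):
    """Return an application label, trying the SNI field first.

    sni_table: dict mapping a domain name that strictly distinguishes an
        application to that application's label (assumed representation).
    behavior_identifier: callable applied to the behavior features when
        the SNI field is missing or not decisive.
    """
    if sni is not None:
        label = sni_table.get(sni.lower())
        if label is not None:
            # The SNI field alone distinguishes the application.
            return label
    # SNI missing or ambiguous: fall back to traffic behavior features.
    return behavior_identifier(behavior_features)
```

Keeping the fallback as a callable lets either the threshold-range approach or a trained classification model be plugged in without changing this control flow.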
In the prior art, an SSL proxy is disposed between the client and the browser; after obtaining the encrypted network traffic data, it decrypts the data and then identifies the application of the network traffic data. In contrast, in the embodiment of the application the encrypted network traffic data is not decrypted: the traffic characteristics are extracted directly from the encrypted network traffic data once it is acquired, and application identification is performed according to those traffic characteristics. Because the extraction of the traffic characteristics is based directly on the encrypted network traffic data rather than on its plaintext content, even if the device has a security problem, it holds only the encrypted network traffic data and the traffic characteristics and cannot learn the plaintext content of the network traffic data. Therefore, the network traffic application identification method provided by the embodiment of the application improves the security of application identification for encrypted network traffic data.
The embodiment of the application discloses a network traffic application identification method, and compared with the previous embodiment, the embodiment further explains and optimizes the technical scheme. Specifically, the method comprises the following steps:
Referring to Fig. 2, a flowchart of another network traffic application identification method provided in an embodiment of the present application, the method includes:
S201: acquiring encrypted network traffic data;
S202: extracting the traffic characteristics of the encrypted network traffic data; the traffic characteristics comprise an SNI field and traffic behavior characteristics;
S231: identifying the encrypted network traffic data according to the SNI field, and judging whether an application identification result is obtained; if so, outputting the application identification result; if not, entering S232;
In a specific implementation, the domain name in the SNI field is extracted through SSL parsing. If the domain name is able to strictly distinguish applications, the application identification result can be output directly without subsequently analyzing the traffic behavior characteristics, which improves the efficiency of application identification; the subsequent steps may nevertheless still be performed to obtain a more accurate identification result. If the domain name is not able to strictly distinguish applications, S232 is entered.
S232: and determining the application identification result according to the flow behavior characteristics.
In a specific implementation, the application identification result can be obtained according to the threshold range corresponding to each traffic behavior characteristic. Specifically, this step presupposes a correspondence between each threshold range of each traffic behavior characteristic and an application identification result. The correspondence may be stored in a database in the form of a data table or, of course, embodied in other forms (for example, by constructing a discriminant model), as described in detail in a following embodiment.
The traffic behavior characteristic in this step is a specific value: the threshold range containing that value is determined first, and then the specific application corresponding to that threshold range is determined, giving the application identification result.
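The two-stage lookup just described (find the threshold range containing the value, then read off the application mapped to that range) can be sketched as below. The list-of-tuples representation and the half-open interval convention are assumptions; the original mentions a data table without fixing its layout.

```python
def lookup_by_threshold_range(value, ranges):
    """Return the label whose threshold range contains the value.

    ranges: list of (low, high, label) tuples; low is inclusive and
        high is exclusive (an assumed convention).
    Returns None when no range contains the value.
    """
    for low, high, label in ranges:
        if low <= value < high:
            return label
    return None
```

For instance, with hypothetical ranges `[(0.0, 5.0, "mobile"), (5.0, float("inf"), "web")]`, a feature value of 7.2 falls in the second range and is labeled "web".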
It will be appreciated that a person skilled in the art can design the order in which the different traffic characteristics are used according to the actual situation. For example, the SNI field may first be used to distinguish the application type (shopping application, search application, social application, or the like), then the variance of the upstream/downstream packet size per unit time may be used to distinguish the terminal of the application (mobile terminal, web side, or the like), and finally the mean of the time interval between packets may be used to distinguish the specific function of the application (chat function, dynamic publishing function, or the like). Of course, the user may also select a single type of application identification result, for example, distinguishing only the terminal of the encrypted network traffic data.
As another embodiment, the classification model can also be used for determining the application recognition result according to the flow behavior characteristics. Specifically, the traffic behavior characteristics are input into the trained classification model, and the obtained classification result is the application identification result of the encrypted network traffic data. The classification model may be a common classification model, such as a decision tree model, a support vector machine model, a random forest model, a logistic regression model, or the like.
It is understood that a step of training the classification model precedes the input of the traffic behavior characteristics into the trained classification model. Specifically, a training sample is obtained, the training sample being encrypted network traffic data with a known application identification result, and the classification model is trained with the training sample to obtain the trained classification model. In more detail, each traffic behavior characteristic of the training sample is extracted, and the classification model learns from the traffic behavior characteristics of the training sample together with the known application identification result, yielding the trained classification model.
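As one illustration of the train-then-classify procedure, the sketch below fits a logistic regression model (one of the model types listed above) with plain batch gradient descent. The training procedure, the hyperparameters `epochs` and `lr`, the assumption that features are pre-scaled to roughly [0, 1], and the 0/1 label encoding are all choices made for this example and do not come from the original.

```python
import math


def predict_proba(weights, bias, x):
    """Probability that feature vector x belongs to class 1."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(samples, labels, epochs=2000, lr=0.1):
    """Fit weights and bias by batch gradient descent on the log-loss.

    samples: list of feature vectors (assumed scaled to about [0, 1]).
    labels: list of 0/1 class labels from the known identification results.
    """
    n = len(samples)
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            err = predict_proba(weights, bias, x) - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def classify(weights, bias, x):
    """Return 1 if the predicted probability is at least 0.5, else 0."""
    return 1 if predict_proba(weights, bias, x) >= 0.5 else 0
```

In practice a library implementation of any of the listed models would normally be preferred; the hand-written version only makes the training and prediction steps explicit.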
The following specifically describes the step of obtaining the application identification result by using the threshold range of the traffic behavior characteristics, specifically:
Referring to Fig. 3, a flowchart of yet another network traffic application identification method provided in an embodiment of the present application, the method includes:
S301: acquiring encrypted network traffic data;
S302: extracting the traffic characteristics of the encrypted network traffic data; the traffic characteristics comprise an SNI field and traffic behavior characteristics;
S331: identifying the encrypted network traffic data according to the SNI field, and judging whether an application identification result is obtained; if so, outputting the application identification result; if not, entering S332;
Taking as an example the task of distinguishing the mobile Taobao platform from the web Taobao platform, observation of the domain names in the captured traffic packets shows that some domain names appear only on the mobile side: apoll.m.taobao.com, nbsdk-baichuan.alicdn.com, m.taobao.com, etc.; some appear only on the web side: www.taobao.com, ecmp.tanx.com, etc.; and some appear on both: xxx. The domain name can therefore be used as a characteristic for distinguishing the mobile terminal from the web side, and if a domain name unique to the mobile terminal or to the web side is found, the application identification result can be output directly.
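Using the domain names listed in this example, the domain-based decision can be sketched as two sets and a membership check. The set and function names are illustrative, and returning `None` for an undecided case (no unique domain seen, or unique domains of both kinds seen) is an assumed convention that hands the decision to S332.

```python
# Domain names taken from the example above; the sets are not exhaustive.
MOBILE_ONLY = {"apoll.m.taobao.com", "nbsdk-baichuan.alicdn.com", "m.taobao.com"}
WEB_ONLY = {"www.taobao.com", "ecmp.tanx.com"}


def classify_by_domains(observed_domains):
    """Return 'mobile', 'web', or None when the domains are not decisive."""
    seen = {d.lower() for d in observed_domains}
    has_mobile = bool(seen & MOBILE_ONLY)
    has_web = bool(seen & WEB_ONLY)
    if has_mobile and not has_web:
        return "mobile"
    if has_web and not has_mobile:
        return "web"
    # Only shared or unknown domains were seen, or the evidence conflicts.
    return None
```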
S332: inputting the flow behavior characteristics into a discrimination model so as to obtain an application identification result of the encrypted network flow data; the discrimination model is specifically a model describing application recognition results corresponding to each threshold range of each flow behavior characteristic.
In a specific implementation, a discriminant model can be established from the correspondence between each threshold range of each traffic behavior characteristic and the application identification result. Specifically, training samples consisting of encrypted network traffic data with known application identification results are obtained and analyzed to obtain a traffic behavior characteristic that can characterize the training samples, together with the threshold corresponding to that characteristic. It will be appreciated that the same capture method should be used for all training samples, in order to avoid interference from extraneous factors.
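One simple way to derive a threshold from labeled training samples is to take the midpoint between the per-class means of a single feature. This midpoint rule is an assumption chosen for illustration; the original does not say how the threshold is determined, and the sketch handles only two classes.

```python
from statistics import mean


def fit_threshold(feature_values, labels):
    """Derive a one-feature, two-class discriminant from training samples.

    Returns (threshold, label_below, label_above), where the threshold is
    the midpoint of the two class means (an assumed rule).
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    class_means = {
        c: mean(v for v, lab in zip(feature_values, labels) if lab == c)
        for c in classes
    }
    low, high = sorted(classes, key=lambda c: class_means[c])
    threshold = (class_means[low] + class_means[high]) / 2
    return threshold, low, high


def discriminate(value, model):
    """Apply the fitted discriminant to a new feature value."""
    threshold, label_below, label_above = model
    return label_below if value < threshold else label_above
```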
Continuing the example given in the above step, as shown in Fig. 4, the data packets on the web side are more tightly packed per unit time (that is, the points in Fig. 4 are denser in the vertical direction, and some time intervals nearly join into a line), while as shown in Fig. 5, the data access behavior of the mobile terminal presents different characteristics. In the scatter diagram, the number of packets on the web side per unit time varies greatly from one unit to the next, which leads to a large difference in the distribution of packets, and the variance on the web side may be greater than that on the mobile terminal. The variance of the number of packets per unit time is therefore adopted as the traffic behavior characteristic for distinguishing the web side from the mobile terminal. When the unit time is chosen, it can be set to 0.1 second in order to reduce errors; the number of packets per unit time is counted for the web side and the mobile terminal, and the variance is calculated, which serves as a supplementary means of distinction when the domain name cannot distinguish the two clearly.
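The variance of the packet count per 0.1-second unit can be computed roughly as in the sketch below. The function name, the choice to start the first bin at the earliest timestamp, and the use of the population variance are assumptions for illustration.

```python
from statistics import pvariance


def packet_count_variance(timestamps, unit=0.1):
    """Variance of the number of packets falling in each time unit.

    timestamps: packet arrival times in seconds.
    unit: bin length in seconds (0.1 s, as suggested in the text).
    """
    if not timestamps:
        return 0.0
    start = min(timestamps)
    n_bins = int((max(timestamps) - start) / unit) + 1
    counts = [0] * n_bins
    for t in timestamps:
        # Clamp the index so rounding at the last edge stays in range.
        index = min(int((t - start) / unit), n_bins - 1)
        counts[index] += 1
    # Empty bins between packets count as zero and raise the variance.
    return pvariance(counts)
```

A bursty flow yields a few full bins and many empty ones, giving a high variance, whereas evenly spaced packets give a variance near zero.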
In the following, a network traffic application identification system provided in an embodiment of the present application is introduced; the system described below and the network traffic application identification method described above may be referred to each other.
Referring to Fig. 6, a structural diagram of a network traffic application identification system provided in an embodiment of the present application, the system includes:
an obtaining module 601, configured to obtain encrypted network traffic data;
an extracting module 602, configured to extract a traffic feature of the encrypted network traffic data;
the identifying module 603 is configured to obtain an application identification result of the encrypted network traffic data according to the traffic characteristics.
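The three modules can be pictured as a small pipeline in which each module is an injected callable. The class name and constructor parameters are hypothetical and stand in for modules 601, 602, and 603; the original does not prescribe any software structure.

```python
class TrafficIdentificationSystem:
    """Chains acquisition, extraction, and identification (illustrative)."""

    def __init__(self, acquire, extract, identify):
        self.acquire = acquire      # plays the role of module 601
        self.extract = extract      # plays the role of module 602
        self.identify = identify    # plays the role of module 603

    def run(self):
        """Acquire encrypted traffic, extract features, return a label."""
        data = self.acquire()
        features = self.extract(data)
        return self.identify(features)
```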
In the prior art, an SSL proxy is disposed between the client and the browser; after obtaining the encrypted network traffic data, it decrypts the data and then identifies the application of the network traffic data. In contrast, in the embodiment of the application the encrypted network traffic data is not decrypted: the traffic characteristics are extracted directly from the encrypted network traffic data once it is acquired, and application identification is performed according to those traffic characteristics. Because the extraction of the traffic characteristics is based directly on the encrypted network traffic data rather than on its plaintext content, even if the system has a security problem, it holds only the encrypted network traffic data and the traffic characteristics and cannot learn the plaintext content of the network traffic data. Therefore, the network traffic application identification system provided by the embodiment of the application improves the security of application identification for encrypted network traffic data.
On the basis of the foregoing embodiment, as a preferred implementation manner, the obtaining module is specifically a module that captures encrypted network traffic data according to a preset time window and performs a traffic cleaning operation on the encrypted network traffic data.
On the basis of the above embodiment, as a preferred implementation, the traffic characteristics include an SNI field and traffic behavior characteristics;
correspondingly, the identification module is specifically a module that identifies the encrypted network traffic data according to the SNI field, judges whether an application identification result is obtained, and, if not, determines the application identification result according to the traffic behavior characteristics.
On the basis of the above embodiment, as a preferred implementation manner, the traffic behavior feature includes one or a combination of any several of a statistical feature of an upstream packet size in a unit time, a statistical feature of a downstream packet size, a statistical feature of a time interval between data packets, and an SSL feature.
On the basis of the foregoing embodiment, as a preferred implementation manner, the identification module is specifically a module that determines the threshold range corresponding to the traffic behavior characteristics and obtains an application identification result of the encrypted network traffic data according to the threshold range.
On the basis of the foregoing embodiment, as a preferred implementation manner, the identification module is specifically a module that inputs the traffic behavior characteristics into a discrimination model so as to obtain an application identification result of the encrypted network traffic data; the discrimination model is specifically a model describing the application identification result corresponding to each threshold range of each traffic behavior characteristic.
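One possible shape for such a discrimination model is a table mapping each application to a threshold range per traffic behavior characteristic, as sketched below. The application names, feature names and numeric ranges in `MODEL` are invented for illustration only.

```python
from typing import Dict, Optional, Tuple

# Hypothetical model: application -> {feature name: (low, high)}, bounds inclusive.
MODEL: Dict[str, Dict[str, Tuple[float, float]]] = {
    "ExampleVideo": {"down_mean_size": (900.0, 1500.0), "mean_gap": (0.0, 0.05)},
    "ExampleChat": {"down_mean_size": (50.0, 400.0), "mean_gap": (0.2, 5.0)},
}


def classify_by_ranges(features: Dict[str, float],
                       model: Dict[str, Dict[str, Tuple[float, float]]] = MODEL
                       ) -> Optional[str]:
    """Return the first application whose every threshold range contains the feature value."""
    for app, ranges in model.items():
        if all(
            name in features and low <= features[name] <= high
            for name, (low, high) in ranges.items()
        ):
            return app
    return None  # no application matched
```

Returning the first match is a simplification; if ranges for different applications overlap, some tie-breaking rule would be needed.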
On the basis of the above embodiment, as a preferred implementation, the system further includes:
a training sample acquisition module, used for acquiring training samples; wherein the training samples comprise encrypted network traffic data with known application identification results;
a first training module, used for analyzing the training samples and determining the traffic behavior characteristics and thresholds corresponding to the training samples so as to obtain the discrimination model; the traffic behavior characteristics are specifically traffic behavior characteristics capable of characterizing the training samples.
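How the thresholds are derived from the training samples is not specified. One simple possibility, shown here purely as an assumption, is to take the observed minimum and maximum of each characteristic per application, optionally widened by a margin.

```python
from collections import defaultdict


def fit_ranges(samples, margin=0.0):
    """Derive per-application threshold ranges from labeled samples.

    samples: iterable of (app_label, {feature_name: value}).
    margin:  fraction of the observed span added on each side (assumed knob).
    """
    observed = defaultdict(lambda: defaultdict(list))
    for app, features in samples:
        for name, value in features.items():
            observed[app][name].append(value)

    model = {}
    for app, by_feature in observed.items():
        model[app] = {}
        for name, values in by_feature.items():
            low, high = min(values), max(values)
            pad = (high - low) * margin
            model[app][name] = (low - pad, high + pad)
    return model
```

The result has the form application -> {characteristic: (low, high)}, which is one way to represent a threshold-range model.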
On the basis of the foregoing embodiment, as a preferred implementation manner, the identification module is specifically a module that inputs the traffic behavior characteristics into a trained classification model so as to obtain an application identification result of the encrypted network traffic data.
On the basis of the above embodiment, as a preferred implementation, the system further includes:
a second training module, used for training the classification model by using training samples so as to obtain a trained classification model; wherein the training samples comprise encrypted network traffic data with known application identification results.
On the basis of the above embodiment, as a preferred implementation manner, the classification model includes any one of a decision tree model, a support vector machine model, a random forest model, and a logistic regression model.
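To make the classification-model variant concrete, the sketch below trains a small logistic regression model with plain gradient descent, using only the standard library. It is a binary toy example under stated assumptions: the two feature columns, the data values, the learning rate and the epoch count are all made up, and a practical system would more likely use an established machine learning library and handle more than two applications.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n = len(X)
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, x):
    """Return 1 if the predicted probability is at least 0.5, else 0."""
    w, b = model
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Hypothetical training samples: [mean downstream size in KB, mean gap in seconds].
X_TRAIN = [[1.2, 0.01], [1.4, 0.02], [1.1, 0.03], [0.2, 0.8], [0.1, 1.2], [0.3, 0.9]]
Y_TRAIN = [1, 1, 1, 0, 0, 0]  # 1 = a video-like application, 0 = a chat-like application
```

Usage would be along the lines of `model = train_logistic(X_TRAIN, Y_TRAIN)` followed by `predict(model, [1.3, 0.02])`.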
The present application further provides a network traffic application identification device. Referring to fig. 7, which is a structural diagram of a network traffic application identification device provided in an embodiment of the present application, the device includes:
a memory 100 for storing a computer program;
a processor 200 which, when executing the computer program, can implement the steps of the network traffic application identification method provided in any of the above embodiments.
Specifically, the memory 100 includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and computer-readable instructions, and the internal memory provides a running environment for the operating system and the computer-readable instructions in the non-volatile storage medium. The processor 200 provides computing and control capabilities for the network traffic application identification device and, when executing the computer program stored in the memory 100, can implement the steps of the network traffic application identification method provided in any of the above embodiments.
In the prior art, an SSL proxy is deployed between the client and the server; after obtaining the encrypted network traffic data, it decrypts the data and then identifies the application to which the network traffic data belongs. By contrast, in the embodiment of the present application, the encrypted network traffic data is not decrypted: after the encrypted network traffic data is acquired, its traffic characteristics are extracted directly, and application identification is performed according to those traffic characteristics. Because the traffic characteristics are extracted from the encrypted network traffic data rather than from its plaintext content, even if the device has a security problem, the device holds only the encrypted network traffic data and the traffic characteristics and cannot learn the plaintext content of the network traffic data. Therefore, the network traffic application identification device provided by the embodiment of the present application improves the security of application identification for encrypted network traffic data.
On the basis of the foregoing embodiment, as a preferred implementation and referring to fig. 8, the network traffic application identification device further includes:
an input interface 300 connected to the processor 200, used for obtaining externally imported computer programs, parameters and instructions and storing them in the memory 100 under the control of the processor 200. The input interface 300 may be connected to an input device for receiving parameters or instructions manually input by a user. The input device may be a touch layer covering a display screen; a button, trackball or touchpad arranged on the terminal housing; or a keyboard, touchpad, mouse, or the like.
a display unit 400 connected to the processor 200, used for displaying data transmitted by the processor 200. The display unit 400 may be a display screen on a PC, a liquid crystal display screen, or an electronic ink display screen. Specifically, in the present embodiment, the traffic behavior characteristics, the application identification result, and the like may be displayed by the display unit 400.
a network port 500 connected to the processor 200, used for establishing communication connections with external terminal devices. The communication technology used for the connection may be a wired or a wireless communication technology, such as Mobile High-Definition Link (MHL), Universal Serial Bus (USB), High-Definition Multimedia Interface (HDMI), Wireless Fidelity (WiFi), Bluetooth, Bluetooth Low Energy, IEEE 802.11s-based communication, and the like. Specifically, in the present embodiment, training samples, trained discrimination models, classification models, and the like may be imported from other terminal devices through the network port 500.
The present application also provides a computer-readable storage medium, which may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk. The storage medium stores a computer program which, when executed by a processor, implements the steps of the network traffic application identification method provided by the above-described embodiments.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar the embodiments may be referred to one another. Because the system disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant details can be found in the description of the method. It should be noted that those skilled in the art may make several improvements and modifications to the present application without departing from its principle, and such improvements and modifications also fall within the scope of the claims of the present application.
It is further noted that, in the present specification, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims (12)

1. A network traffic application identification method, characterized by comprising the following steps:
acquiring encrypted network traffic data;
extracting the traffic characteristics of the encrypted network traffic data;
obtaining an application identification result of the encrypted network traffic data according to the traffic characteristics;
wherein the traffic characteristics include an SNI field and traffic behavior characteristics;
correspondingly, obtaining the application identification result of the encrypted network traffic data according to the traffic characteristics includes:
identifying the encrypted network traffic data according to the domain name in the SNI field, and judging whether an application identification result is obtained;
if not, determining the application identification result according to the traffic behavior characteristics;
wherein determining the application identification result according to the traffic behavior characteristics comprises:
determining a threshold range corresponding to the traffic behavior characteristics and obtaining an application identification result of the encrypted network traffic data according to the threshold range, or inputting the traffic behavior characteristics into a trained classification model so as to obtain the application identification result of the encrypted network traffic data.
2. The network traffic application identification method according to claim 1, wherein acquiring the encrypted network traffic data comprises:
capturing encrypted network traffic data according to a preset time window, and performing a traffic cleaning operation on the encrypted network traffic data.
3. The network traffic application identification method according to claim 1, wherein the traffic behavior characteristics include any one, or a combination of any several, of statistical characteristics of the upstream packet size, statistical characteristics of the downstream packet size, statistical characteristics of the time interval between data packets, and SSL characteristics.
4. The network traffic application identification method according to claim 1, wherein obtaining the application identification result of the encrypted network traffic data according to the threshold range comprises:
inputting the traffic behavior characteristics into a discrimination model so as to obtain an application identification result of the encrypted network traffic data;
the discrimination model is specifically a model describing the application identification result corresponding to each threshold range of each traffic behavior characteristic.
5. The network traffic application identification method according to claim 4, wherein before inputting the traffic behavior characteristics into the discrimination model, the method further comprises:
obtaining training samples; wherein the training samples comprise encrypted network traffic data with known application identification results;
analyzing the training samples, and determining the traffic behavior characteristics and the thresholds corresponding to the training samples so as to obtain the discrimination model.
6. The network traffic application identification method according to claim 1, wherein before inputting the traffic behavior characteristics into the trained classification model, the method further comprises:
training a classification model by using training samples so as to obtain the trained classification model; wherein the training samples comprise encrypted network traffic data with known application identification results.
7. The network traffic application identification method according to claim 1 or 6, wherein the classification model comprises any one of a decision tree model, a support vector machine model, a random forest model, and a logistic regression model.
8. A network traffic application identification system, comprising:
an acquisition module, used for acquiring encrypted network traffic data;
an extraction module, used for extracting the traffic characteristics of the encrypted network traffic data;
an identification module, used for obtaining an application identification result of the encrypted network traffic data according to the traffic characteristics;
wherein the traffic characteristics include an SNI field and traffic behavior characteristics;
correspondingly, the identification module is specifically a module that identifies the encrypted network traffic data according to the domain name in the SNI field and judges whether an application identification result is obtained, and, if not, determines the application identification result according to the traffic behavior characteristics;
the identification module is specifically used for determining a threshold range corresponding to the traffic behavior characteristics and obtaining an application identification result of the encrypted network traffic data according to the threshold range; or for identifying the encrypted network traffic data according to the domain name in the SNI field, judging whether an application identification result is obtained, and, if not, inputting the traffic behavior characteristics into a trained classification model so as to obtain an application identification result of the encrypted network traffic data.
9. The network traffic application identification system according to claim 8, wherein the acquisition module is specifically a module that captures encrypted network traffic data according to a preset time window and performs a traffic cleaning operation on the encrypted network traffic data.
10. The network traffic application identification system according to claim 8, wherein the traffic behavior characteristics comprise any one, or a combination of any several, of statistical characteristics of the upstream packet size, statistical characteristics of the downstream packet size, statistical characteristics of the time interval between data packets, and SSL characteristics.
11. A network traffic application identification device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the network traffic application identification method according to any of claims 1 to 7 when executing said computer program.
12. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the network traffic application identification method according to any one of claims 1 to 7.
CN201810844159.XA 2018-07-27 2018-07-27 Network flow application identification method, system and equipment and storage medium Active CN110768933B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810844159.XA CN110768933B (en) 2018-07-27 2018-07-27 Network flow application identification method, system and equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110768933A CN110768933A (en) 2020-02-07
CN110768933B true CN110768933B (en) 2022-08-09

Family

ID=69328369

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810844159.XA Active CN110768933B (en) 2018-07-27 2018-07-27 Network flow application identification method, system and equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110768933B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10834136B2 (en) 2017-06-15 2020-11-10 Palo Alto Networks, Inc. Access point name and application identity based security enforcement in service provider networks
US10708306B2 (en) * 2017-06-15 2020-07-07 Palo Alto Networks, Inc. Mobile user identity and/or SIM-based IoT identity and application identity based security enforcement in service provider networks
CN111371700A (en) * 2020-03-11 2020-07-03 武汉思普崚技术有限公司 Traffic identification method and device applied to forward proxy environment
CN111885083A (en) * 2020-07-31 2020-11-03 北京微步在线科技有限公司 Malicious encrypted flow detection method and device
CN111953706A (en) * 2020-08-21 2020-11-17 公安部第三研究所 Method for identifying mobile application based on HTTPS flow information
CN112202739B (en) * 2020-09-17 2021-12-14 腾讯科技(深圳)有限公司 Flow monitoring method and device
CN112448868B (en) * 2020-12-02 2022-09-30 新华三人工智能科技有限公司 Network traffic data identification method, device and equipment
CN112769633B (en) * 2020-12-07 2022-08-09 深信服科技股份有限公司 Proxy traffic detection method and device, electronic equipment and readable storage medium
CN112615758B (en) * 2020-12-16 2022-04-29 北京锐安科技有限公司 Application identification method, device, equipment and storage medium
CN112839055B (en) * 2021-02-04 2022-08-23 北京六方云信息技术有限公司 Network application identification method and device for TLS encrypted traffic and electronic equipment
CN114158039B (en) * 2021-12-14 2024-04-12 哈尔滨工业大学 Traffic analysis method, system, computer and storage medium for low-power consumption Bluetooth encryption communication
CN114338126A (en) * 2021-12-24 2022-04-12 武汉思普崚技术有限公司 Network application identification method and device
CN115022216A (en) * 2022-05-27 2022-09-06 中国电信股份有限公司 Installed APP detection method and device, and network side equipment
CN117376034B (en) * 2023-12-07 2024-03-22 南京中孚信息技术有限公司 Network traffic identification system, method and medium based on user behavior association

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101669112A (en) * 2007-04-25 2010-03-10 Lg电子株式会社 Link information between various application messages is provided and uses this link information
CN103200133A (en) * 2013-03-21 2013-07-10 南京邮电大学 Flow identification method based on network flow gravitation cluster

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103873320B (en) * 2013-12-27 2017-06-13 北京天融信科技有限公司 Encryption method for recognizing flux and device
CN104901897A (en) * 2015-05-26 2015-09-09 杭州华三通信技术有限公司 Determination method and device of application type
ES2828948T3 (en) * 2015-07-02 2021-05-28 Telefonica Cibersecurity & Cloud Tech S L U Method, system and software products to securely enable network functionality over encrypted data sessions

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101669112A (en) * 2007-04-25 2010-03-10 Lg电子株式会社 Link information between various application messages is provided and uses this link information
CN103200133A (en) * 2013-03-21 2013-07-10 南京邮电大学 Flow identification method based on network flow gravitation cluster

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant