CN113452810B

CN113452810B - Traffic classification method, device, equipment and medium

Info

Publication number: CN113452810B
Application number: CN202110771448.3A
Authority: CN
Inventors: 郑开发; 史帅; 尚程; 杨满智; 王杰; 蔡琳; 梁彧; 田野; 金红; 陈晓光; 傅强
Original assignee: Eversec Beijing Technology Co Ltd
Current assignee: Eversec Beijing Technology Co Ltd
Priority date: 2021-07-08
Filing date: 2021-07-08
Publication date: 2023-05-12
Anticipated expiration: 2041-07-08
Also published as: CN113452810A

Abstract

The embodiments of the present disclosure disclose a traffic classification method, device, equipment and medium. The method comprises: obtaining an encrypted traffic classification annotation training set according to the classification result and the features of target encrypted traffic data, and training a deep learning model on that training set to obtain a traffic classification model; converting the encrypted traffic data to be classified into images and extracting feature data; and inputting the feature data into the traffic classification model to obtain a classification result for the encrypted traffic to be classified. Because the training set is obtained from the classification result and the features of the target encrypted traffic data, it contains the classification results, the features and the classification annotations and is a standardized training set. Further, converting the encrypted traffic data to be classified into images turns the traffic classification problem into an image recognition problem, which deep learning models handle well, so the classification result is more accurate.

Description

Traffic classification method, device, equipment and medium

Technical Field

The embodiment of the disclosure relates to a data processing technology, in particular to a traffic classification method, a traffic classification device, traffic classification equipment and traffic classification media.

Background

Traffic analysis has been one of the most important areas of networking since the advent of the Internet. With the wide application of data transmission encryption technologies, traffic encryption has become standard practice in current network applications. In particular, to evade detection by firewalls and intrusion detection systems, various kinds of malicious traffic encrypt their communications using technologies such as the Secure Sockets Layer (SSL) protocol, which in turn drives the development of technologies such as traffic analysis.

As computer networks develop, the proportion of encrypted traffic keeps growing and encryption technology keeps becoming more complex. This increases the likelihood of malicious applications, malicious traffic and abnormal network traffic, and attackers carry out a great deal of malicious activity using traffic encryption. Attackers use large volumes of encrypted traffic to evade security inspection, which greatly increases the difficulty of discovery, detection and management. When a large amount of encrypted traffic appears in a network, it is therefore very important to classify it quickly, generate corresponding alarms and carry out refined traffic analysis.

At present, most common traffic classification models classify traffic directly using machine learning or deep learning. As for training data, most algorithms are trained on open source data, which is not targeted at the deployment scenario, so the traffic classification results are inaccurate.

Disclosure of Invention

The embodiments of the present disclosure provide a traffic classification method, device, equipment and medium that make the traffic classification result more accurate.

In a first aspect, an embodiment of the present disclosure provides a traffic classification method, including:

obtaining an encrypted flow classification labeling training set according to the classification result and the characteristics of the target encrypted flow data, and training a deep learning model according to the encrypted flow classification labeling training set to obtain a flow classification model;

converting the encrypted flow data to be classified into images and extracting characteristic data;

and inputting the characteristic data into the flow classification model to obtain a classification result of the encrypted flow to be classified.

In a second aspect, embodiments of the present disclosure provide a flow classification device, the device comprising:

the training unit is used for obtaining an encrypted flow classification labeling training set according to the classification result and the characteristics of the target encrypted flow data, training a deep learning model according to the encrypted flow classification labeling training set and obtaining a flow classification model;

the characteristic unit is used for converting the encrypted flow data to be classified into images and then extracting characteristic data;

and the classification unit is used for inputting the characteristic data into the flow classification model to obtain the classification result of the encrypted flow to be classified.

In a third aspect, an embodiment of the present disclosure provides an electronic device, including:

one or more processors;

a memory for storing one or more programs,

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the traffic classification method as described in embodiments of the present disclosure.

In a fourth aspect, embodiments of the present disclosure provide a computer readable storage medium having stored thereon a computer program which when executed by a processor implements a traffic classification method according to embodiments of the present disclosure.

According to the embodiments of the present disclosure, the encrypted traffic classification annotation training set is obtained from the classification result and the features of the target encrypted traffic data, so it contains the classification results, the features and the classification annotations and is a standardized training set. Further, converting the encrypted traffic data to be classified into images turns the traffic classification problem into an image recognition problem, which deep learning models handle well, so the classification result is more accurate.

Drawings

The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.

Fig. 1 is a flowchart of a traffic classification method according to a first embodiment of the disclosure;

Fig. 2 is a block diagram of a traffic classification device according to a second embodiment of the disclosure;

Fig. 3 is a block diagram of an electronic device according to a third embodiment of the present disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the accompanying drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.

It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.

The term "including" and variations thereof as used herein are open-ended, i.e., "including, but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Related definitions of other terms will be given in the description below.

It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.

It should be noted that the modifiers "a" and "a plurality of" used in this disclosure are illustrative rather than limiting, and those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.

The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.

Example 1

Fig. 1 is a flowchart of a traffic classification method according to an embodiment of the present disclosure, which may be performed by a server. The present embodiment includes steps S110, S120, and S130. As shown in fig. 1, the method specifically includes the following steps:

S110, obtaining an encrypted traffic classification annotation training set according to the classification result and the features of the target encrypted traffic data, and training a deep learning model on the encrypted traffic classification annotation training set to obtain a traffic classification model.

Classifying network traffic data means constructing a classification model with some algorithm and using that model to classify and identify the collected network traffic data of various applications. The classification result may be an application classification result, an application layer protocol classification result, or a service type classification result obtained by dividing traffic according to Quality of Service (QoS) requirements. The traffic data classification result in the embodiments of the present disclosure mainly refers to an application classification result.

The traffic data includes, but is not limited to, Domain Name System (DNS) data. For example, for a DNS security scenario, the DNS features required for domain generation algorithm (DGA) analysis are extracted, the extracted features are adjusted and combined, the required DNS data fields are output, and encryption is applied to obtain DNS target encrypted traffic data.

The traffic data can be collected by deploying deep packet inspection (DPI) collection devices at the traffic ingress and egress points. The device model can be chosen by analyzing factors such as the number of users, the network traffic volume and the applications in use, and the devices can be installed at the ingress and egress points in mirror deployment mode.
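As a rough illustration of the DNS feature extraction step, the following sketch computes a few lexical features of a domain name. The disclosure does not list the features it uses; the ones chosen here (length, label count, digit ratio, character entropy) and the function name `dns_lexical_features` are assumptions made only for this example.

```python
import math
from collections import Counter


def dns_lexical_features(domain: str) -> dict:
    """Compute simple lexical features of a domain name (illustrative choice)."""
    name = domain.lower().rstrip(".")
    labels = name.split(".")
    # Assumption: use the longest label as a rough proxy for the generated part.
    core = max(labels, key=len)
    if not core:
        raise ValueError("domain must contain at least one non-empty label")
    counts = Counter(core)
    # Shannon entropy of the character distribution, in bits.
    entropy = -sum((c / len(core)) * math.log2(c / len(core)) for c in counts.values())
    return {
        "length": len(core),
        "label_count": len(labels),
        "digit_ratio": sum(ch.isdigit() for ch in core) / len(core),
        "entropy": entropy,
    }
```

Features of this kind could then be combined into the DNS data fields that are output and encrypted, though the actual field selection is left open by the text.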

The classification result of the target encrypted traffic data means that the target encrypted traffic data has been classified in advance by other methods, yielding a result that identifies the encrypted traffic by domain name, Internet Protocol (IP) address, protocol or application. Based on accumulated and synchronized knowledge bases of domain names, IPs, protocols and applications, features in multiple dimensions are extracted, including IP, protocol feature analysis and application feature analysis, and the traffic is annotated with its classification. This yields the encrypted traffic classification annotation training set, which contains the classification results, the features and the classification annotations and is a standardized training set.

Further, the target encrypted traffic data includes white sample traffic data and basic traffic data. Before the encrypted traffic classification annotation training set is obtained according to the classification result and the features of the target encrypted traffic data, the method further includes: acquiring an open source data set as the white sample traffic data; and collecting network traffic data packets as the basic traffic data, where the packets are collected with the prior authorization of the application users. The traffic data is encrypted according to a preset encryption protocol, and the target encrypted traffic data packets are marked according to a preset packet marking scheme.

Further, training the deep learning model on the encrypted traffic classification annotation training set includes: converting the annotated encrypted traffic in the training set into two-dimensional images; and training the deep learning model on the converted two-dimensional images and their corresponding annotations.

Further, training the deep learning model on the encrypted traffic classification annotation training set to obtain a traffic classification model includes: adding corresponding training data to the training set according to an analysis of the proportion of the target encrypted traffic data in the daily newly added traffic and of its recall rate; and training the deep learning model on the training set with the added training data to obtain the traffic classification model.

Meanwhile, to make the model better suited to analyzing encrypted traffic data, corresponding training data is added to the encrypted traffic classification annotation training set according to the proportion of the target encrypted traffic data in the daily newly added traffic and the recall rates measured for the various detection algorithms, which further improves the training accuracy of the deep learning model.
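A minimal sketch of how knowledge-base-driven annotation could work is given below. Everything named here is hypothetical: the `KNOWLEDGE_BASE` layout, the flow fields (`sni`, `dst_ip`, `protocol`, `dst_port`) and the labels are placeholders chosen for illustration, not details taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LabeledFlow:
    features: dict  # multi-dimensional features (IP, protocol, port)
    label: str      # classification annotation


# Hypothetical knowledge base mapping known domains and IPs to application labels.
KNOWLEDGE_BASE = {
    "domain": {"video.example.com": "video_app"},
    "ip": {"203.0.113.10": "chat_app"},
}


def label_flow(flow: dict, kb: dict = KNOWLEDGE_BASE) -> Optional[LabeledFlow]:
    """Return an annotated sample if the knowledge base recognizes the flow."""
    label = kb["domain"].get(flow.get("sni", "")) or kb["ip"].get(flow.get("dst_ip", ""))
    if label is None:
        return None  # unknown flows are left out of the training set
    features = {key: flow.get(key) for key in ("dst_ip", "protocol", "dst_port")}
    return LabeledFlow(features=features, label=label)


def build_training_set(flows: list) -> list:
    """Keep only the flows that could be annotated."""
    return [s for s in (label_flow(f) for f in flows) if s is not None]
```

Each resulting sample carries both its features and its annotation, which is the sense in which the training set is described as standardized.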
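The selection of extra training data could be sketched as follows. The thresholds `min_share` and `min_recall`, the per-class cap and both function names are assumptions for illustration; the text states only that the proportion of daily new traffic and the recall rate drive the choice.

```python
def classes_needing_data(daily_share: dict, recall: dict,
                         min_share: float = 0.05, min_recall: float = 0.9) -> list:
    """Pick classes that are common in new traffic but poorly recalled."""
    return sorted(
        cls for cls, share in daily_share.items()
        if share >= min_share and recall.get(cls, 0.0) < min_recall
    )


def augment(training_set: list, candidate_pool: dict, targets: list,
            per_class: int = 100) -> list:
    """Return a copy of the training set with up to per_class extra samples per target class."""
    extended = list(training_set)
    for cls in targets:
        extended.extend(candidate_pool.get(cls, [])[:per_class])
    return extended
```

The model would then be retrained on the extended set, so that classes which matter most in current traffic receive more examples.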

S120, converting the encrypted traffic data to be classified into image data and extracting feature data.

The encrypted traffic data to be classified is preprocessed and converted into two-dimensional image data, which deep learning methods handle well. The preprocessing converts the encrypted traffic data to be classified into image data using a conversion function and preserves the traffic features in the converted image data. The conversion function converts the encrypted traffic data to be classified into the two-dimensional image data required for deep learning.

Network traffic data has a clear hierarchical structure. At the lowest level it is a series of traffic byte strings. According to the format specified by the network protocol, a number of traffic bytes are combined into a data packet, and a number of data packets exchanged between two communicating parties are further combined into a network flow.
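The packet-to-flow level of this hierarchy can be sketched as below, assuming that a flow is identified by a direction-independent key built from the two endpoints and the protocol. The `Packet` type and its field names are hypothetical and stand in for whatever the capture device actually provides.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str
    payload: bytes


def flow_key(p: Packet) -> tuple:
    """Direction-independent key, so both directions map to one flow."""
    a, b = (p.src_ip, p.src_port), (p.dst_ip, p.dst_port)
    return (min(a, b), max(a, b), p.proto)


def group_into_flows(packets: list) -> dict:
    """Group packets by flow key, preserving arrival order within each flow."""
    flows = defaultdict(list)
    for p in packets:
        flows[flow_key(p)].append(p)
    return dict(flows)
```

Sorting the two endpoints before building the key is what keeps a request and its reply in the same flow.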

Further, converting the encrypted traffic data to be classified into an image includes: dividing the encrypted traffic data to be classified into a set of network flows, where each network flow comprises a group of data packets exchanged between two communicating parties; and, for each network flow, generating an m × n two-dimensional image from the first n characters of the data packet's character string and the one-hot encoding of an m-dimensional vector, mapping the value of each data packet bit one-to-one to the value of a pixel, and padding with 0 up to length n when the character string length of the data packet is smaller than n, where n and m are natural numbers. The two-dimensional image has the following characteristics: the speckle features of two-dimensional images corresponding to encrypted traffic data of different applications differ; the speckle features of two-dimensional images corresponding to encrypted traffic data of different protocols differ; and the speckle features of two-dimensional images corresponding to encrypted traffic data of similar applications or similar protocols are similar.
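One possible reading of this conversion is sketched below: the first n bytes of a flow's payload are kept (zero-padded if shorter), and each byte is one-hot encoded as an m-dimensional column with m = 256, giving an m × n binary image. This interpretation, the function name `flow_to_image` and the default n = 64 are assumptions; the text does not fix the exact encoding.

```python
def flow_to_image(payload: bytes, n: int = 64, m: int = 256) -> list:
    """Return an m x n image (list of rows) with exactly one 1-pixel per column."""
    if n < 1 or m < 256:
        raise ValueError("n must be >= 1 and m must cover all 256 byte values")
    # Keep the first n bytes and pad with 0 when the payload is shorter than n.
    data = payload[:n].ljust(n, b"\x00")
    image = [[0] * n for _ in range(m)]
    for col, value in enumerate(data):
        image[value][col] = 1  # one-hot: the row index is the byte value
    return image
```

Under this reading, flows of the same application or protocol tend to place their pixels in similar rows and columns, which is one way the speckle patterns described above could arise.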

S130, inputting the feature data into the traffic classification model to obtain a classification result for the encrypted traffic to be classified.

Further, inputting the feature data into the traffic classification model to obtain the classification result of the encrypted traffic to be classified includes: the traffic classification model classifies the feature data based on temporal or spatial features to obtain the classification result. Because the classification process uses temporal or spatial features rather than class attribute features, encrypted traffic from applications that have no predefined class can also be identified.

Further, the method also includes: summarizing and displaying protocols and applications according to the classification result of the encrypted traffic to be classified; and/or issuing early warnings for abnormal traffic events.

According to predefined encrypted traffic data information and abnormal traffic characteristics, abnormal traffic events can be identified, so that abnormal traffic threats can be handled in a targeted manner and strong protective measures can be taken to prevent and eliminate the harm these hidden dangers cause.
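The summary and early-warning steps might look like the following sketch. The result record fields (`protocol`, `application`), the set of suspicious labels and the count threshold are illustrative assumptions; the disclosure does not specify how abnormal events are defined.

```python
from collections import Counter


def summarize(results: list) -> Counter:
    """Count classified flows per (protocol, application) pair for display."""
    return Counter((r["protocol"], r["application"]) for r in results)


def abnormal_events(results: list, suspicious_labels: set, threshold: int = 1) -> list:
    """Return suspicious applications seen at least threshold times."""
    counts = Counter(
        r["application"] for r in results if r["application"] in suspicious_labels
    )
    return sorted(app for app, c in counts.items() if c >= threshold)
```

A caller could pass the classification results for a time window and raise an alert for every application name returned by `abnormal_events`.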

The deep learning model described above is a long short-term memory (LSTM) network model. The LSTM model is a kind of recurrent neural network (RNN). The deep learning model can automatically learn discriminative features during recognition, which avoids the problem of insufficient feature extraction.

Compared with the simple single-layer structure of the repeating module in an ordinary recurrent neural network, the repeating module of an LSTM has four layers that interact in a special way. The cell state is represented by a horizontal line running through the four-layer structure of the LSTM and is similar to a conveyor belt: it runs along the entire chain with only a few minor linear operations acting on it, so information can flow along the chain unchanged. The LSTM includes structures called gates, which selectively let information through. A gate consists of a Sigmoid activation layer and a pointwise multiplication operation, and through the gates information can be removed from or added to the cell state.

The Sigmoid neural network layer outputs a number between 0 and 1 that describes how much of each component is let through: 0 means nothing passes, and 1 means everything passes. An LSTM has three gates for protecting and controlling the cell state.

The first step of the LSTM is to decide what information to discard from the cell state. This decision is made by a Sigmoid layer called the "forget gate". It looks at $h_{t-1}$ (the previous output) and $x_t$ (the current input) and outputs a number between 0 and 1 for each number in the cell state $C_{t-1}$ (the previous state). A value of 1 means complete retention and 0 means complete deletion.

The next step is to decide what information to store in the cell state. This has two parts. First, a Sigmoid layer called the "input gate layer", with output $i_t$, determines which values are to be updated. Next, a tanh layer creates a vector of candidate values $\tilde{C}_t$ that could be added to the cell state. The two are then combined to update the previous state value $C_{t-1}$ to $C_t$. The update works as follows: the previous state value is multiplied by $f_t$, which forgets the part chosen to be forgotten, and $i_t \odot \tilde{C}_t$ is added to the result, that is, $C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$. This gives the new candidate values, scaled by how much each state value is to be updated.

Finally, the output is determined from the cell state. Specifically, a Sigmoid layer is run to decide which parts of the cell state to output; the cell state is then passed through tanh (which scales the values to between -1 and 1) and multiplied by the output of the Sigmoid gate to give the output.
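The three steps above (forget, update, output) can be put together in a single time step, sketched here in plain Python using the standard LSTM formulation. The parameter layout (a dictionary mapping the gate names 'f', 'i', 'c', 'o' to weight and bias pairs) and the function names are assumptions made for this example; it is not the implementation used in the disclosure.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def affine(W: list, b: list, v: list) -> list:
    """Compute W @ v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t: list, h_prev: list, c_prev: list, params: dict) -> tuple:
    """One LSTM time step; params maps 'f', 'i', 'c', 'o' to (W, b) pairs."""
    z = h_prev + x_t                                     # concatenation [h_{t-1}, x_t]
    f = [sigmoid(a) for a in affine(*params["f"], z)]    # forget gate
    i = [sigmoid(a) for a in affine(*params["i"], z)]    # input gate
    g = [math.tanh(a) for a in affine(*params["c"], z)]  # candidate values
    o = [sigmoid(a) for a in affine(*params["o"], z)]    # output gate
    # New cell state: keep part of the old state, add scaled candidates.
    c_t = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    # Output: squashed cell state filtered by the output gate.
    h_t = [oj * math.tanh(cj) for oj, cj in zip(o, c_t)]
    return h_t, c_t
```

With all weights and biases set to zero, every gate outputs 0.5 and the candidate is 0, so the cell state is simply halved at each step, which makes the gating behavior easy to check by hand.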

Example 2

Fig. 2 is a block diagram of a traffic classification device according to a second embodiment of the present disclosure. The device may perform the traffic classification method of the first embodiment and may be configured in a server. As shown in fig. 2, it may include a training unit 210, a feature unit 220 and a classification unit 230, where:

the training unit 210 is configured to obtain an encrypted traffic classification annotation training set according to the classification result and the features of the target encrypted traffic data, and to train the deep learning model on the encrypted traffic classification annotation training set to obtain a traffic classification model;

the feature unit 220 is configured to convert the encrypted traffic data to be classified into an image and then extract feature data; and

the classification unit 230 is configured to input the feature data into the traffic classification model to obtain a classification result for the encrypted traffic to be classified.

On the basis of the above embodiment, the target encrypted traffic data includes white sample traffic data and basic traffic data, and the device further includes an acquisition unit configured to acquire an open source data set as the white sample traffic data and to collect network traffic data packets as the basic traffic data.

On the basis of the above embodiment, the training unit 210 is further configured to add corresponding training data to the encrypted traffic classification annotation training set according to an analysis of the proportion of the target encrypted traffic data in the daily newly added traffic and of its recall rate, and to train the deep learning model on the training set with the added training data to obtain the traffic classification model.

Based on the above embodiment, the classification unit 230 is further configured to classify the feature data based on temporal or spatial features by using the traffic classification model, so as to obtain a classification result of the encrypted traffic to be classified.

On the basis of the above embodiment, the feature unit 220 is further configured to divide the encrypted traffic data to be classified into a set of network flows, where each network flow comprises a group of data packets exchanged between two communicating parties; and, for each network flow, to generate an m × n two-dimensional image from the first n characters of the data packet's character string and the one-hot encoding of an m-dimensional vector, mapping the value of each data packet bit one-to-one to the value of a pixel, and padding with 0 up to length n when the character string length of the data packet is smaller than n, where n and m are natural numbers. The two-dimensional image has the following characteristics: the speckle features of two-dimensional images corresponding to encrypted traffic data of different applications differ; the speckle features of two-dimensional images corresponding to encrypted traffic data of different protocols differ; and the speckle features of two-dimensional images corresponding to encrypted traffic data of similar applications or similar protocols are similar.

On the basis of the above embodiment, the device further comprises a display unit configured to summarize and display protocols and applications according to the classification result of the encrypted traffic to be classified, and/or to issue early warnings for abnormal traffic events.

The traffic classification device provided in this embodiment of the present disclosure belongs to the same inventive concept as the traffic classification method provided in the first embodiment. Technical details not described in detail in this embodiment can be found in the first embodiment, and this embodiment has the same beneficial effects as executing the traffic classification method.

Example 3

Referring now to fig. 3, a block diagram of an electronic device 900 suitable for use in implementing embodiments of the present disclosure is shown. The electronic equipment in the embodiment of the disclosure has a flow classification function. The electronic device shown in fig. 3 is merely an example and should not be construed to limit the functionality and scope of use of the disclosed embodiments.

As shown in fig. 3, the electronic device 900 may include a processing apparatus (e.g., a central processing unit, a graphics processor, etc.) 901, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage apparatus 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data necessary for the operation of the electronic device 900 are also stored. The processing device 901, the ROM 902, and the RAM 903 are connected to each other through a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.

In general, the following devices may be connected to the I/O interface 905: input devices 906 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 907 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 908 including, for example, magnetic tape, hard disk, etc.; and a communication device 909. The communication means 909 may allow the electronic device 900 to communicate wirelessly or by wire with other devices to exchange data. While fig. 3 shows an electronic device 900 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead.

In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a non-transitory computer readable medium, the computer program comprising program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 909, or installed from the storage device 908, or installed from the ROM 902. When the computer program is executed by the processing device 901, the above-described functions defined in the methods of the embodiments of the present disclosure are performed.

Example 4

The computer readable medium described above in the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.

In some implementations, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.

The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.

The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: obtaining an encrypted flow classification labeling training set according to the classification result and the characteristics of the target encrypted flow data, and training a deep learning model according to the encrypted flow classification labeling training set to obtain a flow classification model; converting the encrypted flow data to be classified into image data and extracting characteristic data; and inputting the characteristic data into the flow classification model to obtain a classification result of the encrypted flow to be classified.

Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, including, but not limited to, object oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules described in the embodiments of the present disclosure may be implemented in software or hardware. In some cases the name of a module does not limit the module itself; for example, the request module may also be described as "a module for transmitting a first play request according to first video information of a currently to-be-played video".

The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The foregoing description covers only the preferred embodiments of the present disclosure and the technical principles employed. Persons skilled in the art will appreciate that the scope of the disclosure is not limited to technical solutions formed by the specific combinations of features described above, but also covers other embodiments formed by any combination of those features or their equivalents without departing from the spirit of the disclosure, for example, embodiments in which the features described above are interchanged with (but not limited to) technical features having similar functions disclosed in the present disclosure.

Moreover, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims.

Claims

1. A method of traffic classification, the method comprising:

converting the encrypted traffic data to be classified into image data and extracting feature data, wherein traffic features are retained in the image data;

inputting the feature data into the traffic classification model to obtain a classification result of the encrypted traffic to be classified;

wherein the converting the encrypted traffic data to be classified into image data includes:

dividing the encrypted traffic data to be classified into a set of network flows, wherein each network flow comprises a group of data packets exchanged between two communicating parties;

for each network flow, generating an m×n two-dimensional image from the first n characters of the data packets and a one-hot encoding into m-dimensional vectors, mapping the value of each data packet bit one-to-one to the value of a pixel, and, when the character string length of the data packets is less than n, padding the string with 0 to length n, wherein n and m are natural numbers, and the two-dimensional image has the following characteristics: the speckle features of two-dimensional images corresponding to encrypted traffic data of different applications differ, the speckle features of two-dimensional images corresponding to encrypted traffic data of different protocols differ, and the speckle features of two-dimensional images corresponding to encrypted traffic data of similar applications or similar protocols are similar.
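
The flow division and image conversion recited above can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that packets arrive as dictionaries with hypothetical keys (`src`, `sport`, `dst`, `dport`, `proto`, `payload`), that a flow is keyed by its two endpoints regardless of direction, that the payloads of a flow are concatenated before truncation, that each of the first n bytes is one-hot encoded as a column of an m×n image with m = 256, and that padding uses zero bytes. The function names are illustrative only.

```python
from collections import defaultdict


def split_into_flows(packets):
    """Group packets into bidirectional flows keyed by protocol and endpoints."""
    flows = defaultdict(list)
    for pkt in packets:
        a = (pkt["src"], pkt["sport"])
        b = (pkt["dst"], pkt["dport"])
        # Sort the endpoints so that both directions share one key.
        key = (pkt["proto"],) + tuple(sorted([a, b]))
        flows[key].append(pkt["payload"])
    return dict(flows)


def bytes_to_image(data, n=16, m=256):
    """Return an m x n image (list of m rows) one-hot encoding the first n bytes."""
    # Truncate to n bytes, then pad with zero bytes up to length n (assumption).
    padded = data[:n].ljust(n, b"\x00")
    image = [[0] * n for _ in range(m)]
    for col, value in enumerate(padded):
        if value >= m:
            raise ValueError("byte value does not fit in an m-dimensional one-hot vector")
        # Column `col` is the one-hot vector of the byte at position `col`.
        image[value][col] = 1
    return image


def flow_to_image(payloads, n=16, m=256):
    """Concatenate a flow's payloads (assumption) and convert them to an image."""
    return bytes_to_image(b"".join(payloads), n, m)
```

Under these assumptions every column of the image contains exactly one non-zero pixel, so flows whose leading bytes are similar produce similar pixel patterns.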

2. The method of claim 1, wherein the target encrypted traffic data comprises white sample traffic data and base traffic data, and further comprising, prior to deriving the encrypted traffic classification annotation training set based on classification results and features of the target encrypted traffic data:

acquiring an open source data set as white sample flow data;

collecting network traffic packets as base traffic data.

3. The method according to claim 1, wherein training the deep learning model according to the encrypted traffic classification labeling training set to obtain the traffic classification model comprises:

adding corresponding training data to the encrypted traffic classification labeling training set according to an analysis of the proportion of the target encrypted traffic data in the daily newly added traffic and of its recall rate;

training the deep learning model on the encrypted traffic classification labeling training set with the added training data to obtain the traffic classification model.
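
One possible reading of the analysis step in this claim is sketched below in Python. The claim does not specify how proportion and recall are combined, so the rule used here (flag a class when its share of the daily new traffic is at least `min_share` and its recall is below `min_recall`) and both threshold values are assumptions, as are the function and parameter names.

```python
def classes_needing_data(daily_counts, true_positives, actual_positives,
                         min_share=0.05, min_recall=0.9):
    """Return (label, share, recall) for classes that may need more training data.

    daily_counts:     label -> number of flows of that class in the daily new traffic
    true_positives:   label -> flows of that class the model classified correctly
    actual_positives: label -> flows that truly belong to that class
    """
    total = sum(daily_counts.values())
    selected = []
    for label, count in daily_counts.items():
        share = count / total if total else 0.0
        actual = actual_positives.get(label, 0)
        recall = true_positives.get(label, 0) / actual if actual else 0.0
        # Hypothetical rule: a frequent class that is poorly recalled gets more data.
        if share >= min_share and recall < min_recall:
            selected.append((label, share, recall))
    # Classes with the largest share of daily traffic come first.
    selected.sort(key=lambda item: item[1], reverse=True)
    return selected
```

The returned labels would then drive the collection of additional labeled samples before the model is retrained.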

4. The method according to claim 1, wherein inputting the feature data into the traffic classification model to obtain the classification result of the encrypted traffic to be classified comprises:

classifying, by the traffic classification model, the feature data based on temporal or spatial features to obtain the classification result of the encrypted traffic to be classified.

5. The method as recited in claim 1, further comprising:

summarizing and displaying protocols and application programs according to the classification result of the encrypted traffic to be classified;

and/or issuing an early warning for abnormal traffic events.
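
As a rough illustration of the summarizing and early-warning steps, the Python sketch below counts classified flows per protocol and per application and reports watch-listed applications. The claim does not define what constitutes an abnormal traffic event, so the watchlist, the count threshold, and the representation of a classification result as a `(protocol, application)` pair are assumptions made for the example.

```python
from collections import Counter


def summarize(results):
    """Count classified flows per protocol and per application.

    results: list of (protocol, application) pairs, one per classified flow.
    """
    protocols = Counter(protocol for protocol, _ in results)
    applications = Counter(application for _, application in results)
    return protocols, applications


def abnormal_events(results, watchlist, threshold=1):
    """Return watch-listed applications seen at least `threshold` times."""
    counts = Counter(application for _, application in results)
    # Counter returns 0 for applications that never appeared.
    return sorted(app for app in watchlist if counts[app] >= threshold)
```

The two counters can feed a display layer directly, while the list returned by `abnormal_events` could trigger whatever alerting channel a deployment uses.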

6. The method of claim 1, wherein training the deep learning model according to the encrypted traffic classification labeling training set comprises:

converting the labeled encrypted traffic in the encrypted traffic classification labeling training set into two-dimensional images;

training the deep learning model based on the converted two-dimensional images and their corresponding labels.
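
The data flow of training on image and label pairs can be sketched in a few lines of Python. The sketch deliberately replaces the deep learning model with a nearest-centroid classifier so that it stays self-contained; it is a stand-in chosen for illustration, not the model of the disclosure, and the function names are hypothetical.

```python
def flatten(image):
    """Turn a two-dimensional image (list of rows) into a flat pixel vector."""
    return [pixel for row in image for pixel in row]


def train(images, labels):
    """Fit a nearest-centroid stand-in: one mean pixel vector per label."""
    sums, counts = {}, {}
    for image, label in zip(images, labels):
        vec = flatten(image)
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def classify(model, image):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    vec = flatten(image)

    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))

    return min(model, key=lambda label: distance(model[label]))
```

In a real system the `train` and `classify` functions would wrap a deep learning model (for example a convolutional network), but the inputs and outputs would have the same shape: labeled two-dimensional images in, a label per image out.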

7. A flow classification device, the device comprising:

a feature unit, configured to convert the encrypted traffic data to be classified into image data and then extract feature data, wherein traffic features are retained in the image data;

the classification unit is used for inputting the characteristic data into the flow classification model to obtain a classification result of the encrypted flow to be classified;

8. An electronic device, comprising:

one or more processors;

a memory for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the traffic classification method of any of claims 1-6.

9. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the traffic classification method according to any one of claims 1-6.