CN112448868B - Network traffic data identification method, device and equipment - Google Patents

Network traffic data identification method, device and equipment

Info

Publication number
CN112448868B
CN112448868B
Authority
CN
China
Prior art keywords
data
network traffic
traffic data
newly added
recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011400451.6A
Other languages
Chinese (zh)
Other versions
CN112448868A (en)
Inventor
吴问天
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xinhuasan Artificial Intelligence Technology Co ltd
Original Assignee
Xinhuasan Artificial Intelligence Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xinhuasan Artificial Intelligence Technology Co ltd filed Critical Xinhuasan Artificial Intelligence Technology Co ltd
Priority to CN202011400451.6A priority Critical patent/CN112448868B/en
Publication of CN112448868A publication Critical patent/CN112448868A/en
Application granted granted Critical
Publication of CN112448868B publication Critical patent/CN112448868B/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/08Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0876Network utilisation, e.g. volume of load or congestion level
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/50Testing arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/45562Creating, deleting, cloning virtual machine instances
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/4557Distribution of virtual machine instances; Migration and load balancing

Abstract

The embodiments of the present application disclose a method, an apparatus, and a device for identifying network traffic data. In the application, network traffic data is captured by a packet capturing module packaged in a configured Docker container and saved to a designated folder, and a monitoring module packaged in the Docker container monitors whether newly added network traffic data appears in the designated folder. If newly added network traffic data exists, a feature module packaged in the Docker container extracts the corresponding feature data from it, and the feature data is input into a trained recognition and classification model to obtain a recognition and classification result. Identification and classification of the newly added network traffic data is thereby achieved without converting the data into image data during recognition, which saves time. Further, the scheme provided by the application runs in a configured Docker container and is therefore convenient to deploy to other environments.

Description

Network traffic data identification method, device and equipment
Technical Field
The present application relates to the field of computers, and in particular, to a method, an apparatus, and a device for identifying network traffic data.
Background
When network traffic data is managed in an actual production environment, it often needs to be identified and classified. For example, when video network traffic data is managed, the video network traffic data generated by different video playing software needs to be identified and classified in order to determine which video playing software the captured network traffic data belongs to.
However, in the related art, identifying and classifying network traffic data requires first converting it into image data, which is time-consuming. A method for quickly identifying and classifying network traffic data is therefore needed.
Disclosure of Invention
The application discloses a method, an apparatus, and a device for identifying network traffic data, which are used to achieve rapid identification and classification of network traffic data.
According to a first aspect of the embodiments of the present application, a method for identifying network traffic data is provided. The method is applied to a network device and includes:
when newly added network traffic data in a designated folder is detected through a monitoring module packaged in a configured Docker container, where the designated folder is used to store the network traffic data captured by a packet capturing module packaged in the Docker container;
extracting corresponding feature data from the newly added network traffic data through a feature module packaged in the configured Docker container;
and inputting the feature data into a trained recognition and classification model to obtain a recognition and classification result, where the recognition and classification result is used to indicate the source of the newly added network traffic data.
Optionally, the monitoring module is Pyinotify, a Python library built on the inotify file system monitoring function of the Linux system.
Optionally, the feature module is a compiled Joy feature extraction tool;
after the Joy feature extraction tool extracts the corresponding feature data from the newly added network traffic data, the method further includes: saving the feature data into a compressed file in a specified format.
Optionally, the method further comprises:
and calling a visualization tool Grafana packaged in the Docker container to visually display the recognition and classification results.
Optionally, the recognition classification model is obtained by training in the following manner:
obtaining sample network traffic data, and setting a corresponding class label for the obtained sample network traffic data, wherein the class label is used for indicating the source of the sample network traffic data;
extracting sample feature data from the sample network traffic data;
and training the recognition classification model according to the sample characteristic data and the class label.
According to a second aspect of the embodiments of the present application, there is provided a network traffic data identification apparatus, including:
the monitoring unit is configured to, when newly added network traffic data in a specified folder is detected through a monitoring module packaged in a configured Docker container, have the feature extraction unit process the newly added network traffic data, where the specified folder is used to store the network traffic data captured by a packet capturing module packaged in the Docker container;
the feature extraction unit is configured to extract corresponding feature data from the newly added network traffic data through a feature module packaged in the configured Docker container;
and the recognition and classification unit is configured to input the feature data into a trained recognition and classification model to obtain a recognition and classification result, where the recognition and classification result is used to indicate the source of the newly added network traffic data.
Optionally, the feature module is a compiled Joy feature extraction tool;
after the feature extraction unit extracts the corresponding feature data from the newly added network traffic data through the Joy feature extraction tool, the feature extraction unit is further configured to save the feature data into a compressed file in a specified format.
Optionally, the apparatus further comprises:
and the visual display unit is used for calling a visual tool Grafana packaged in the Docker container to visually display the recognition and classification results.
Optionally, the apparatus further comprises:
the model training unit is used for obtaining sample network traffic data and setting a corresponding class label for the obtained sample network traffic data, wherein the class label is used for indicating the source of the sample network traffic data;
extracting sample feature data from the sample network traffic data;
and training the recognition classification model according to the sample characteristic data and the class label.
According to a third aspect of embodiments of the present application, there is provided an electronic apparatus including: a processor and a memory;
the memory for storing machine executable instructions;
the processor is configured to read and execute the machine executable instructions stored in the memory to implement the method described above.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
according to the technical scheme, the network flow data are captured by the aid of the packaged packet capturing module in the configured Docker container and stored in the designated folder, whether the designated folder has the newly added network flow data or not is monitored by the monitoring module packaged in the Docker container, if the newly added network flow data exist, the corresponding feature data are extracted from the newly added network flow data by the aid of the feature module packaged in the Docker container, the feature data are input into the trained identification and classification model to obtain an identification and classification result, identification and classification of the captured newly added network flow data are achieved, the network flow data do not need to be converted into image data in an identification process, and time cost is saved. Further, the scheme provided by the application runs in configured Docker containers, the configured Docker containers are created through Docker images, and one Docker image can create corresponding Docker containers in different environments, so that the Docker containers can be conveniently deployed in other environments.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present specification and together with the description, serve to explain the principles of the specification.
Fig. 1 is a flowchart of a method for implementing network traffic data identification according to an embodiment of the present application;
FIG. 2 is a schematic flow chart diagram of recognition classification model training provided by an embodiment of the present application;
fig. 3 is a schematic diagram of an apparatus for implementing network traffic data identification according to an embodiment of the present application;
fig. 4 is a schematic diagram of another apparatus for implementing network traffic data identification according to an embodiment of the present application;
fig. 5 is a schematic diagram of the hardware structure of an electronic device for implementing network traffic data identification according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "upon", "when", or "in response to determining", depending on the context.
In order to make the technical solutions provided in the embodiments of the present application better understood and make the above objects, features and advantages of the embodiments of the present application more comprehensible, the technical solutions in the embodiments of the present application are described in further detail below with reference to the accompanying drawings.
Referring to fig. 1, fig. 1 is a flowchart of a method provided in an embodiment of the present application. As an embodiment, the process shown in fig. 1 may be applied to network devices, such as Windows machines, Linux machines, and other devices.
Optionally, before the embodiment of the present application is implemented, the Docker application container engine is used to encapsulate the code, dependency packages, and components used in implementing the embodiment into a portable Docker image, so that the embodiment can be deployed to different environments through the Docker image.
Optionally, deploying the embodiment of the present application to different environments through a Docker image is implemented by creating a Docker container from the Docker image. In the embodiment of the present application, a Docker image is a read-only template file; to implement the embodiment from the Docker image, the image must first be instantiated as a Docker container. For example, the Docker image named name2 may be instantiated as a Docker container named name1 with the command "docker run -itd --name name1 name2". In the embodiment of the present application, a plurality of different Docker containers may be created in different deployment environments from one Docker image.
Optionally, for different deployment environments, the embodiment can be adapted by modifying the information in the configuration file of the Docker image and mounting the configuration information (for example, the address and port number of the database configured in the deployment environment) into the created Docker container. Taking the mounting of the configured database information in the deployment environment as an example, mounting here means connecting the information indicating the location of the database in the deployment environment to a file directory inside the Docker container, so that the Docker container can find the database in the deployment environment through that directory.
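The embodiment itself only shows the docker CLI; purely as an illustration, the same instantiation with a mounted configuration file can be sketched with the Docker SDK for Python. This sketch is an assumption, not part of the embodiment, and the image name, container name, and paths are placeholders.

    # A minimal sketch, assuming the Docker SDK for Python ("docker" package) is
    # installed. The image name "name2", container name "name1", and the config
    # file paths are hypothetical placeholders.
    import docker

    client = docker.from_env()

    # Instantiate the read-only image as a container and mount a host-side
    # configuration file (e.g. holding the ES database address and port of this
    # deployment environment) into the container.
    container = client.containers.run(
        "name2",                 # Docker image built for the embodiment (placeholder)
        name="name1",
        detach=True,             # equivalent to -d in "docker run -itd"
        tty=True,                # -t
        stdin_open=True,         # -i
        volumes={
            "/opt/traffic-id/config.yaml": {   # host path (hypothetical)
                "bind": "/app/config.yaml",    # path inside the container (hypothetical)
                "mode": "ro",
            }
        },
    )
    print(container.short_id, container.status)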
As shown in fig. 1, the process may include the following steps:
Step 101, when it is detected, through the monitoring module encapsulated in the configured Docker container, that network traffic data has been newly added to the designated folder, step 102 is executed.
In specific implementation, because the Docker image is a read-only template file and cannot be run directly on the network device, before the embodiment of the present application is implemented, the Docker image produced by the Docker application container engine needs to be instantiated on the network device to obtain a Docker container, which can then run on the network device.
During instantiation, the information in the configuration file of the Docker image is modified according to the configuration of the network device to which the embodiment is applied. For example, the database address and port number in the configuration file are changed to the address and port number of the ES (Elasticsearch) database configured on the network device, and the modified configuration information is mounted inside the created Docker container, so that the embodiment of the present application can be implemented on the network device.
Optionally, before implementing the embodiment of the present application, a monitoring module for monitoring the specified folder is determined first. For example, the monitoring module may be Pyinotify, a Python library built on the inotify file system monitoring function of the Linux system.
Pyinotify is a Python module used to monitor whether newly added network traffic data appears in the specified folder; it relies on the inotify function of the Linux kernel file system. The inotify function is a change notification mechanism of the file system, which can monitor operations on the file system such as the addition and deletion of files and data read and write operations.
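As a minimal sketch of how Pyinotify can watch the designated folder for newly written capture files: the folder path and the downstream handle_new_pcap() hook below are assumptions for illustration and are not taken from the embodiment.

    # A minimal sketch of monitoring the designated folder with Pyinotify.
    # WATCH_DIR and handle_new_pcap() are placeholders (assumptions).
    import pyinotify

    WATCH_DIR = "/data/pcap"   # designated folder (placeholder)

    def handle_new_pcap(path):
        # In the embodiment, this is where step 102 (feature extraction) would start.
        print("new capture file:", path)

    class PcapEventHandler(pyinotify.ProcessEvent):
        # IN_CLOSE_WRITE fires when a file opened for writing is closed,
        # i.e. when the packet capturing tool has finished writing a pcap file.
        def process_IN_CLOSE_WRITE(self, event):
            if event.pathname.endswith(".pcap"):
                handle_new_pcap(event.pathname)

    wm = pyinotify.WatchManager()
    wm.add_watch(WATCH_DIR, pyinotify.IN_CLOSE_WRITE)
    notifier = pyinotify.Notifier(wm, PcapEventHandler())
    notifier.loop()   # blocks and keeps monitoring until interrupted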
Optionally, before implementing the embodiment of the present application, a packet capturing module to be encapsulated in the Docker image is also determined. The packet capturing module may be implemented with various software, such as the Fiddler packet capturing tool or the Wireshark packet capturing tool.
Taking the Wireshark packet capturing tool as an example, when capturing network traffic data, the tool can be configured, through the program it carries, so that every time a certain amount of network traffic data is captured it is saved into a pcap file; for example, every 1 GB of captured network traffic data is saved into one pcap file, and the pcap file is stored in the designated folder.
As an embodiment, when the Wireshark packet capturing tool encapsulated in the Docker container stores captured network traffic data in the designated folder in pcap format, the monitoring module Pyinotify encapsulated in the Docker container detects the newly added network traffic data in the designated folder, and step 102 is executed to further process the newly added network traffic data; the specific processing is explained in detail in the description of step 102.
Further, if the monitoring module Pyinotify encapsulated in the configured Docker container detects no newly added network traffic data in the specified folder, indicating that the Wireshark packet capturing tool has not captured any network traffic data, monitoring of the specified folder continues until newly added network traffic data appears.
Step 102, extracting corresponding feature data from the newly added network traffic data through the feature module encapsulated in the configured Docker container.
As one embodiment, the feature module in step 102 may be implemented using a compiled Joy feature extraction tool. Before implementing the embodiment of the present application, the source code of the Joy feature extraction tool needs to be obtained and compiled into a binary executable file; the Joy feature extraction tool in the embodiment of the application performs feature extraction through this binary executable file.
The compiled Joy feature extraction tool can quickly extract the feature data corresponding to the newly added network traffic data, and the newly added network traffic data does not need to be converted into image data during extraction, which greatly improves the extraction speed.
Further, the Joy feature extraction tool may save the obtained feature data into a compressed file in a specified format, such as a .gz file (a compressed file produced by the gzip compression algorithm).
Optionally, if the newly added network traffic data is saved in the designated folder into a plurality of pcap files by the Wireshark packet capturing tool in step 101, the Joy feature extraction tool may extract the newly added network traffic data in each pcap file, and save a corresponding compressed file for each pcap file.
Optionally, the feature data extracted from the newly added network traffic data by the Joy feature extraction tool is the packet size information (for example, the number of bytes occupied by a data packet), the packet time information (for example, the timestamp in a data packet), and the byte distribution information of the data packets carrying the newly added network traffic data. The source of the newly added network traffic data can be identified through this feature data; for example, by identifying the feature data corresponding to newly added video network traffic data, the video playing software from which that data comes can be determined.
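A sketch of driving the compiled Joy binary on one pcap file and reading back its gzipped JSON output is given below. The binary path, the name=value command-line options, and the assumption that the output is one JSON record per line are all assumptions for illustration and should be checked against the version of Joy actually compiled for the embodiment.

    # Sketch only: paths, Joy options, and output layout are assumptions.
    import gzip
    import json
    import subprocess

    JOY_BIN = "/opt/joy/bin/joy"          # compiled Joy binary (placeholder path)

    def extract_features(pcap_path, out_path):
        # bidir=1 is assumed to merge both directions of a flow, dist=1 to enable
        # byte-distribution output; output= names the gzipped JSON result file.
        subprocess.run(
            [JOY_BIN, "bidir=1", "dist=1", f"output={out_path}", pcap_path],
            check=True,
        )

    def load_flow_records(gz_path):
        # Assumes one JSON object per line; skip lines that are not flow records.
        records = []
        with gzip.open(gz_path, "rt") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                try:
                    records.append(json.loads(line))
                except json.JSONDecodeError:
                    continue
        return records

    if __name__ == "__main__":
        extract_features("/data/pcap/capture_0001.pcap",
                         "/data/features/capture_0001.json.gz")
        flows = load_flow_records("/data/features/capture_0001.json.gz")
        print(f"parsed {len(flows)} flow records")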
Step 103, inputting the feature data into the trained recognition and classification model to obtain a recognition and classification result, where the recognition and classification result is used to indicate a source of the newly added network traffic data.
Optionally, there may be multiple ways to input the feature data into the trained recognition and classification model. For example, all the compressed files produced by the Joy feature extraction tool may be imported into the recognition and classification model in batches, or all the compressed files may be recompressed into one compressed file containing all the feature data, which is then imported into the model. The recognition and classification model is trained before the embodiment of the present application is carried out; its specific implementation is described below in the training process of the recognition and classification model and is not repeated here.
In this embodiment, the recognition and classification result obtained in step 103 is the result of identifying the source of the newly added network traffic data according to its feature data, and of classifying and counting the parts of the newly added network traffic data that have different sources. For example, if the newly added network traffic data is the network traffic data generated on the network device between 19:00 and 20:00, the recognition and classification result may be that the newly added network traffic data from 19:00 to 19:30 was generated using video software A, the newly added network traffic data from 19:30 to 20:00 was generated using video software B, and the newly added network traffic data totals 2 GB. The above example is merely for ease of understanding and does not limit the embodiments of the present application.
Further, the recognition and classification result obtained in step 103 may be stored in the ES database configured on the network device, so that the visualization tool Grafana encapsulated in the Docker container can be called to visually display the results stored in the ES database. Through visual display, the recognition and classification results can be converted into easy-to-view visual data such as pie charts and line charts.
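As a hedged sketch, a recognition and classification result could be written into the ES database with the official Elasticsearch Python client so that Grafana can read and chart it. The host address, index name, and document fields below are assumptions (the embodiment does not fix them), and the call shown follows the elasticsearch-py 8.x client API.

    # Sketch only: host, index name, and fields are placeholders (assumptions).
    from datetime import datetime, timezone
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://127.0.0.1:9200")   # ES address/port from the mounted config

    result = {
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "window_start": "19:00",
        "window_end": "19:30",
        "predicted_source": "video software A",   # output of the recognition model
        "bytes": 2 * 1024 ** 3,
    }

    # Index one classification record; Grafana would then query this index.
    es.index(index="traffic-classification", document=result)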
Thus, the flow shown in fig. 1 is completed.
As can be seen from the process shown in fig. 1, in the embodiment of the present application, the packet capturing module encapsulated in the configured Docker container captures network traffic data and stores it in the designated folder, and the monitoring module encapsulated in the Docker container monitors whether newly added network traffic data appears in the designated folder. If newly added network traffic data exists, the feature module encapsulated in the Docker container extracts the corresponding feature data from it, and the feature data is input into the trained recognition and classification model to obtain a recognition and classification result, thereby achieving identification and classification of the captured newly added network traffic data. The network traffic data does not need to be converted into image data during recognition, which saves time. Further, the scheme provided by the application runs in a configured Docker container, the configured Docker container is created from a Docker image, and one Docker image can create corresponding Docker containers in different environments, so the scheme can be conveniently deployed to other environments.
The following describes the training process for recognizing the classification model:
referring to fig. 2, fig. 2 is a schematic flowchart of a training process for recognizing a classification model according to an embodiment of the present application. As shown in fig. 2, the process may include the following steps:
step 201, obtaining sample network traffic data, and setting a corresponding class label for the obtained sample network traffic data, where the class label is used to indicate a source of the sample network traffic data.
Optionally, in the embodiment of the present application, the sample network traffic data is obtained by running on the network device only the program that generates the sample network traffic data, and then capturing the data with a packet capturing tool configured on the network device, such as the Wireshark packet capturing tool. Capturing sample network traffic data in this way makes the source of the obtained data certain, so that a corresponding label can be conveniently set for it according to its source. For example, to obtain sample network traffic data generated by video software A, the network device runs only video software A, and the class label corresponding to video software A is set for the obtained sample network traffic data.
Optionally, after class labels are set for the obtained sample network traffic data, the data may be divided into a training set and a test set according to a preset ratio; for example, all the sample network traffic data may be randomly split into a training set and a test set at a ratio of 7:3.
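A minimal sketch of the 7:3 random split using scikit-learn is shown below; the feature matrix X and label array y are placeholders assumed to have been built elsewhere from the extracted sample feature data and the labels assigned in step 201.

    # Sketch only: X and y are placeholders for the sample feature data and class labels.
    import numpy as np
    from sklearn.model_selection import train_test_split

    X = np.load("sample_features.npy")   # placeholder: sample feature vectors
    y = np.load("sample_labels.npy")     # placeholder: class labels (traffic source)

    # 70% training set, 30% test set, stratified so every source is represented.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42, stratify=y
    )
    print(X_train.shape, X_test.shape)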
Step 202, extracting sample feature data from the sample network traffic data.
As an embodiment, a compiled Joy feature extraction tool may be used to extract corresponding sample feature data from the sample network traffic data, and a specific extraction process is the same as the manner of extracting the feature data in the embodiment shown in fig. 1, and is not described here again.
Optionally, the extracted sample feature data is likewise divided into training set sample feature data and test set sample feature data, corresponding to the division of the sample network traffic data, to facilitate training of the recognition and classification model in step 203.
And step 203, training the recognition classification model according to the sample characteristic data and the class label.
In a specific implementation, the recognition and classification model may be trained using a twin support vector machine (TWSVM) in the embodiment of the present application. The training set sample feature data and the test set sample feature data are first imported into the twin support vector machine; the twin support vector machine builds a recognition and classification model from the training set sample feature data, and then iteratively trains and optimizes the model until the evaluation result of the model reaches the specified convergence condition. In this embodiment, the convergence condition may be that training stops when the training error of the recognition and classification model falls below a predetermined threshold, and the model at that point is taken as the trained recognition and classification model.
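The embodiment's twin support vector machine is not shipped by mainstream Python libraries, so the sketch below substitutes an ordinary scikit-learn SVC purely as a stand-in classifier to illustrate the train/evaluate/converge flow; the threshold value and the candidate parameter grid are assumptions, and X_train, y_train, X_test, y_test come from the 7:3 split sketched in step 201.

    # Sketch only: SVC stands in for the TWSVM; threshold and grid are assumptions.
    from sklearn.svm import SVC

    ERROR_THRESHOLD = 0.05   # assumed predetermined threshold on the training error

    model = SVC(kernel="rbf")

    # Iterate over candidate regularisation strengths until the training error
    # falls below the threshold (a simple stand-in for iterative optimisation).
    for c in (0.1, 1.0, 10.0, 100.0):
        model.set_params(C=c)
        model.fit(X_train, y_train)
        train_error = 1.0 - model.score(X_train, y_train)
        if train_error < ERROR_THRESHOLD:
            break

    test_accuracy = model.score(X_test, y_test)
    print(f"training error {train_error:.3f}, test accuracy {test_accuracy:.3f}")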
Thus, the flow shown in fig. 2 is completed.
Through the process shown in fig. 2, a trained recognition and classification model is obtained, and the trained recognition and classification model can realize recognition and classification of newly-added network traffic data. The flow shown in fig. 2 is only an example and is not intended to be limiting.
It should be noted that, when the trained recognition and classification model is used for recognizing and classifying the newly added network traffic data, the recognition and classification model can be further optimized according to the recognition and classification result, so that the recognition and classification model can more accurately recognize and classify the newly added network traffic data.
The method provided by the embodiment of the application is described above. The following describes an apparatus provided in an embodiment of the present application:
referring to fig. 3, fig. 3 is a schematic diagram of an apparatus for implementing the embodiment of the present application. The device includes:
the monitoring unit 301 is configured to, when newly added network traffic data of a specified folder is monitored by a monitoring module packaged in a configured Docker container, process the newly added network traffic data by a feature extraction unit, where the specified folder is used to store the network traffic data captured by a packet capturing module packaged in the Docker container.
Optionally, the configured monitoring module encapsulated in the Docker container may be a python library Pyinotify developed based on an inotify function of a file system monitoring function on a Linux system, and the packet capturing module encapsulated in the Docker container may be a Wireshark packet capturing tool.
A feature extraction unit 302, configured to extract, from the newly added network traffic data, corresponding feature data through a feature module encapsulated in a configured Docker container.
Optionally, the feature module in this apparatus embodiment is a compiled Joy feature extraction tool. After the Joy feature extraction tool extracts the corresponding feature data from the newly added network traffic data, the feature data is saved into a compressed file in a specified format, for example a .gz compressed file.
In this apparatus embodiment, the compiled Joy feature extraction tool refers to a Joy feature extraction tool whose source code has been obtained and compiled into a binary executable file before this apparatus embodiment is carried out.
The recognition and classification unit 303 is configured to input the feature data into a trained recognition and classification model to obtain a recognition and classification result, where the recognition and classification result is used to indicate the source of the newly added network traffic data.
Thus, the structure of the embodiment of the apparatus shown in FIG. 3 is completed.
Further, another apparatus embodiment of the present application is provided below. As shown in fig. 4, in addition to the monitoring unit 401, the feature extraction unit 402, and the recognition and classification unit 403, the apparatus embodiment shown in fig. 4 further includes:
a visual display unit 404, configured to call the visualization tool Grafana encapsulated in the Docker container to visually display the recognition and classification results. The recognition and classification results may be stored in the ES database configured on the network device, so that the visualization tool Grafana can conveniently display them.
A model training unit 405, configured to obtain sample network traffic data, and set a corresponding class label for the obtained sample network traffic data, where the class label is used to indicate a source of the sample network traffic data;
extracting sample feature data from the sample network traffic data;
and training the recognition classification model according to the sample characteristic data and the class label.
Optionally, in this apparatus embodiment, the model training unit trains the recognition and classification model using a twin support vector machine. During training, the convergence condition is that training stops when the training error of the recognition and classification model falls below a predetermined threshold, thereby obtaining the trained recognition and classification model.
Correspondingly, an embodiment of the present application further provides a hardware structure diagram, which is specifically shown in fig. 5. As shown in fig. 5, the hardware structure includes: a processor and a memory.
Wherein the memory is to store machine executable instructions;
the processor is configured to read and execute the machine executable instructions stored in the memory, so as to implement the above-mentioned method embodiment of identifying network traffic data.
For one embodiment, the memory may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions and data. For example, the memory may be volatile memory, non-volatile memory, or a similar storage medium. In particular, the memory may be a RAM (Random Access Memory), a flash memory, a storage drive (e.g., a hard disk drive), a solid state disk, any type of storage disk (e.g., an optical disk or a DVD), a similar storage medium, or a combination thereof.
So far, the description of the apparatus shown in fig. 5 is completed.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of protection of the present application.

Claims (10)

1. A network traffic data identification method, applied to a network device, the method comprising:
when newly added video network traffic data in a specified folder is detected through a monitoring module packaged in a configured Docker container, wherein the specified folder is used for storing the video network traffic data captured through a packet capturing module packaged in the Docker container;
extracting corresponding feature data from the newly added video network traffic data through a feature module packaged in the configured Docker container, wherein the feature data includes one or more of: packet size information, packet time information, and byte distribution information of the data packets carrying the newly added video network traffic data;
and inputting the feature data into a trained recognition and classification model to obtain a recognition and classification result, wherein the recognition and classification result is used for indicating the source of the newly added video network traffic data.
2. The method according to claim 1, wherein the monitoring module is Pyinotify, a Python library built on the inotify file system monitoring function of the Linux system.
3. The method of claim 1, wherein the feature module is a compiled Joy feature extraction tool;
after the Joy feature extraction tool extracts the corresponding feature data from the newly added video network traffic data, the method further includes: saving the feature data into a compressed file in a specified format.
4. The method of claim 1, further comprising:
and calling a visualization tool Grafana packaged in the Docker container to visually display the recognition and classification results.
5. The method of claim 1, wherein the recognition classification model is trained by:
obtaining sample network traffic data, and setting a corresponding class label for the obtained sample network traffic data, wherein the class label is used for indicating the source of the sample network traffic data;
extracting sample feature data from the sample network traffic data;
and training the recognition classification model according to the sample characteristic data and the class label.
6. A network traffic data identification apparatus, applied to a network device, the apparatus comprising:
a monitoring unit, configured to, when newly added video network traffic data in a specified folder is detected through a monitoring module packaged in a configured Docker container, have the feature extraction unit process the newly added video network traffic data, wherein the specified folder is used for storing the video network traffic data captured through a packet capturing module packaged in the Docker container;
a feature extraction unit, configured to extract corresponding feature data from the newly added video network traffic data through a feature module packaged in the configured Docker container, wherein the feature data includes one or more of: packet size information, packet time information, and byte distribution information of the data packets carrying the newly added video network traffic data;
and a recognition and classification unit, configured to input the feature data into a trained recognition and classification model to obtain a recognition and classification result, wherein the recognition and classification result is used for indicating the source of the newly added video network traffic data.
7. The apparatus of claim 6, wherein the feature module is a compiled Joy feature extraction tool;
after the feature extraction unit extracts the corresponding feature data from the newly added video network traffic data through the Joy feature extraction tool, the feature extraction unit is further configured to save the feature data into a compressed file in a specified format.
8. The apparatus of claim 6, further comprising:
and the visual display unit is used for calling a visual tool Grafana packaged in the Docker container to visually display the recognition and classification results.
9. The apparatus of claim 6, further comprising:
the model training unit is used for obtaining sample network traffic data and setting a corresponding class label for the obtained sample network traffic data, wherein the class label is used for indicating the source of the sample network traffic data;
extracting sample characteristic data from the sample network traffic data;
and training the recognition classification model according to the sample characteristic data and the class label.
10. An electronic device, comprising: a processor and a memory;
the memory for storing machine executable instructions;
the processor is used for reading and executing the machine executable instructions stored by the memory so as to realize the method of any one of claims 1 to 5.
CN202011400451.6A 2020-12-02 2020-12-02 Network traffic data identification method, device and equipment Active CN112448868B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011400451.6A CN112448868B (en) 2020-12-02 2020-12-02 Network traffic data identification method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011400451.6A CN112448868B (en) 2020-12-02 2020-12-02 Network traffic data identification method, device and equipment

Publications (2)

Publication Number Publication Date
CN112448868A CN112448868A (en) 2021-03-05
CN112448868B true CN112448868B (en) 2022-09-30

Family

ID=74739546

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011400451.6A Active CN112448868B (en) 2020-12-02 2020-12-02 Network traffic data identification method, device and equipment

Country Status (1)

Country Link
CN (1) CN112448868B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017024963A1 (en) * 2015-08-11 2017-02-16 阿里巴巴集团控股有限公司 Image recognition method, measure learning method and image source recognition method and device
CN107124648A (en) * 2017-04-17 2017-09-01 浙江德塔森特数据技术有限公司 The method that advertisement video is originated is recognized by intelligent terminal
CN108449627A (en) * 2018-03-16 2018-08-24 北京视觉世界科技有限公司 Video processing, the recognition methods of source video sequence, device, equipment and medium
CN110188828A (en) * 2019-05-31 2019-08-30 大连理工大学 A kind of image sources discrimination method based on virtual sample integrated study

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104378258B (en) * 2014-11-14 2019-01-15 东莞宇龙通信科技有限公司 A kind of network flow monitoring method, device and terminal
CN105072115B (en) * 2015-08-12 2018-06-08 国家电网公司 A kind of information system intrusion detection method based on Docker virtualizations
US10970605B2 (en) * 2017-01-03 2021-04-06 Samsung Electronics Co., Ltd. Electronic apparatus and method of operating the same
CN108616419B (en) * 2018-03-30 2020-07-28 武汉虹旭信息技术有限责任公司 Data packet acquisition and analysis system and method based on Docker
CN108696452B (en) * 2018-05-16 2020-06-02 腾讯科技(深圳)有限公司 Container-level network traffic acquisition and network quality identification method, device and system
CN110768933B (en) * 2018-07-27 2022-08-09 深信服科技股份有限公司 Network flow application identification method, system and equipment and storage medium
CN109309630B (en) * 2018-09-25 2021-09-21 深圳先进技术研究院 Network traffic classification method and system and electronic equipment
US10841242B2 (en) * 2019-02-21 2020-11-17 Big Switch Networks Llc Systems and methods to scale a network monitoring fabric
CN109862392B (en) * 2019-03-20 2021-04-13 济南大学 Method, system, device and medium for identifying video traffic of internet game
CN110048962A (en) * 2019-04-24 2019-07-23 广东工业大学 A kind of method of net flow assorted, system and equipment
CN110807493A (en) * 2019-11-06 2020-02-18 上海眼控科技股份有限公司 Optimization method and equipment of vehicle classification model

Also Published As

Publication number Publication date
CN112448868A (en) 2021-03-05

Similar Documents

Publication Publication Date Title
CN102866961B (en) There is the data of expansion and the memory dump of privacy of user protection
CN109213817B (en) Incremental data abstracting method, device and server
CN106649063A (en) Method and system used for monitoring time consuming data when program runs
CN107800757B (en) User behavior recording method and device
CN111683055A (en) Industrial honey pot control method and device
CN106708704B (en) Method and device for classifying crash logs
US10726357B2 (en) Cross-platform program analysis using machines learning based on universal features
US11347619B2 (en) Log record analysis based on log record templates
CN106940771A (en) Leak detection method and device based on file
CN106557308B (en) Software continuous integration method and device
CN107402753B (en) Method and device for refreshing hard disk firmware
KR100961179B1 (en) Apparatus and Method for digital forensic
CN109657803B (en) Construction of machine learning models
CN111190791A (en) Application exception reporting method and device and electronic equipment
CN112448868B (en) Network traffic data identification method, device and equipment
CN107688744B (en) Malicious file classification method and device based on image feature matching
CN117370203A (en) Automatic test method, system, electronic equipment and storage medium
CN109684207B (en) Method and device for packaging operation sequence, electronic equipment and storage medium
CN104933096A (en) Abnormal key recognition method of database, abnormal key recognition device of database and data system
CN111459774A (en) Method, device and equipment for acquiring flow of application program and storage medium
US20220342849A1 (en) System and method for managing a plurality of data storage devices
CN106227502A (en) A kind of method and device obtaining hard disk firmware version
KR102256894B1 (en) Method, Server and Computer Program for Crash Report Grouping
CN114253587A (en) Application program updating method and device, electronic equipment and readable storage medium
CN113407180A (en) Configuration page generation method, system, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant