CN112468509A

CN112468509A - Deep learning technology-based automatic flow data detection method and device

Info

Publication number: CN112468509A
Application number: CN202011446352.1A
Authority: CN
Inventors: 黄松; 周春阳; 周富成; 严小正
Original assignee: Hubei Songhao Technology Co ltd
Current assignee: Hubei Songhao Technology Co ltd
Priority date: 2020-12-09
Filing date: 2020-12-09
Publication date: 2021-03-09

Abstract

The invention discloses a method and a device for automatically detecting traffic data based on deep learning technology. The detection method comprises the following steps. Step one: the system monitors and captures data streams at the data link layer, uses a web service to support access to externally supplied traffic data, and cleans the captured raw data streams to remove redundant and unnecessary information. Step two: after the target traffic has been captured and grouped, the raw data stream is preprocessed, that is, converted from raw traffic into input data for a deep neural network. Because security-threat traffic carries out its destructive task through network interaction, the invention takes the data flow as its basic object of study: after data-link-layer traffic has been captured, automatically selecting and analyzing the statistical characteristics of data flow transmission can effectively reduce the processing load of the system and provides strong support for identifying network security threats.

Description

Deep learning technology-based automatic flow data detection method and device

Technical Field

The invention relates to the technical field of traffic detection, and in particular to a method and a device for automatically detecting traffic data based on deep learning technology.

Background

With the continuous development of information technology, internet data has gradually become an important basic resource in people's lives, and the network security that accompanies it faces increasingly serious challenges. Network traffic detection is one of the most important protection technologies: it identifies abnormal network behavior by establishing a baseline of network access behavior, is broadly applicable, and is widely used in intrusion detection, network attack detection, fraud detection, data theft detection and similar fields. However, most conventional automatic traffic detection methods concentrate on inspecting specific packet payloads, matching against a Trojan signature library, or classifying by network protocol. These techniques depend on the judgment and experience of Trojan detection experts, generalize poorly, struggle to cope with increasingly complex Trojan techniques and network environments, and offer low detection accuracy and limited practicality. The deep-learning-based method and device for detecting and processing traffic data proposed here support automated processing and intelligent threat detection of traffic data, classify traffic into security-threat traffic, uncertain traffic and safe traffic, and can provide strong support for identifying traffic threats.

Disclosure of Invention

The invention aims to provide a method and a device for automatically detecting traffic data based on deep learning technology, which solve the problems described in the Background section.

To achieve this purpose, the invention provides the following technical solution: a method for automatically detecting traffic data based on deep learning technology, the detection method being as follows:

Step one: the system monitors and captures data streams at the data link layer, uses a web service to support access to externally supplied traffic data, and cleans the captured raw data streams to remove redundant and unnecessary information.

Step two: after the target traffic has been captured and grouped, the raw data stream is preprocessed, that is, converted from raw traffic into input data for the deep neural network.

Step three: the traffic data is trained on and recognized by a deep learning method, which comprises two processes, training and recognition; security-threat traffic, uncertain traffic and safe traffic are identified, and traffic admission control is realized.

Step four: network traffic is detected using the trained model; security-threat traffic, uncertain traffic and safe traffic are identified, and traffic admission control is realized.

As a preferred embodiment of the present invention, the step one includes the following sub-steps:

(1.1) For data packets flowing through the network system, the network card at the network egress of the system is set to promiscuous mode, in which it accepts all traffic packets passing through the local interface. The packet acquisition service works as follows:

First: initialize the packet acquisition environment, setting the size of captured packets, the number of available CPUs and the size of the thread pool.

Second: establish a system memory buffer; received packets are copied into the socket buffer, and the user layer accesses them by calling the system function mmap.

Third: poll each port continuously in a multithreaded loop to receive traffic packets, and filter out redundant information that does not belong to the network using a custom packet-processing function.

Fourth: finally output a standard set of traffic packets in pcap format.
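As an illustration of the final output step, the following minimal sketch serializes already-captured frames into the classic pcap file layout (a 24-byte global header followed by a 16-byte record header per packet). The function name `build_pcap`, the `(timestamp, frame)` input shape and the default snapshot length are assumptions made for this example; the source does not specify how the pcap set is written, and the capture itself (promiscuous mode, mmap buffers, thread pool) is not shown.

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4    # classic pcap magic number (microsecond timestamps)
LINKTYPE_ETHERNET = 1      # link-layer type for Ethernet frames


def build_pcap(packets, snaplen=65535):
    """Serialize (timestamp_seconds, frame_bytes) pairs into classic pcap bytes.

    `packets` is a hypothetical in-memory list produced by the capture stage.
    """
    # 24-byte global header: magic, version 2.4, timezone offset,
    # timestamp accuracy, snapshot length, link-layer type.
    chunks = [struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0,
                          snaplen, LINKTYPE_ETHERNET)]
    for timestamp, frame in packets:
        captured = frame[:snaplen]  # truncate to the snapshot length
        total_usec = int(round(timestamp * 1_000_000))
        seconds, microseconds = divmod(total_usec, 1_000_000)
        # 16-byte record header: seconds, microseconds,
        # captured length, original length on the wire.
        chunks.append(struct.pack("<IIII", seconds, microseconds,
                                  len(captured), len(frame)))
        chunks.append(captured)
    return b"".join(chunks)
```

Writing the returned bytes to a file with a `.pcap` extension should give a capture that common packet analysis tools can open, provided the frames really are Ethernet frames.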

(1.2) For externally supplied data packets, an API callback interface is provided to the external data source; a traffic data set is obtained through the web service, and the data traffic of a single application is extracted from it, including that application's data streams at all protocol layers.

(1.3) The data traffic is compared one by one with the traffic sets already in the system:

If the data packet already exists in the system, traffic generated by the same application is merged, and the version with the latest timestamp overwrites the older version of the data stream.

If the application that produced the data stream does not exist in the system, the data stream is imported as new data, and the version number of the associated data table is recorded.

If the data table cannot be retrieved in the system, its version number is set to 0, and the data is imported into the system as new data traffic.
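The three import cases of sub-step (1.3) can be sketched as follows. This is only one possible reading: the `FlowRecord` type, the in-memory `store` dictionary keyed by application name, and the `table_versions` lookup are all hypothetical stand-ins for whatever storage the system actually uses.

```python
from dataclasses import dataclass


@dataclass
class FlowRecord:
    app: str          # application that generated the traffic
    timestamp: float  # capture time of this version of the stream
    payload: bytes    # the stored data stream
    version: int = 0  # version number of the associated data table


def import_flow(store, table_versions, app, timestamp, payload):
    """Import one application's traffic into `store` (hypothetical dict)."""
    existing = store.get(app)
    if existing is not None:
        # Case 1: traffic for this application already exists.
        # Keep whichever version carries the latest timestamp.
        if timestamp >= existing.timestamp:
            existing.timestamp = timestamp
            existing.payload = payload
        return existing
    # Case 2: new application; record the associated table version.
    # Case 3: no data table found, so the version number falls back to 0.
    version = table_versions.get(app, 0)
    record = FlowRecord(app, timestamp, payload, version)
    store[app] = record
    return record
```

Here "merging" is reduced to overwriting the stored stream with the newest version; a real system might instead concatenate or deduplicate sessions from the same application.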

(1.4) Steps (1.2) and (1.3) are repeated until all data traffic sets to be processed have been imported into the system.

As a preferred embodiment of the present invention, the second step includes the following sub-steps:

(2.1) For the input traffic data, all packets sharing the same five-tuple <source IP, source port, destination IP, destination port, transport layer protocol> are selected, and the raw traffic data is grouped accordingly.
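The grouping in sub-step (2.1) amounts to bucketing packets by their five-tuple. The sketch below assumes packets have already been parsed into a simple `Packet` record; that record type and its field names are illustrative, not taken from the source.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str   # transport layer protocol, e.g. "TCP" or "UDP"
    payload: bytes


def group_by_five_tuple(packets):
    """Group parsed packets into flows keyed by their five-tuple."""
    flows = defaultdict(list)
    for pkt in packets:
        key = (pkt.src_ip, pkt.src_port, pkt.dst_ip, pkt.dst_port, pkt.protocol)
        flows[key].append(pkt)  # arrival order is preserved within a flow
    return dict(flows)
```

Note that this key is directional: the two directions of one connection land in separate groups. Whether the method treats them as one session or two is not stated in the source.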

(2.2) The grouped traffic data is cleaned as follows:

First: compare the session contents of all grouped flows and remove duplicate flows with identical content.

Second: remove empty packets that carry no application-layer data, such as ACK (acknowledgement) packets.

Third: select traffic packets by random sampling, randomly generate a new set of MAC addresses and IP addresses, and use them to replace the MAC addresses at the data link layer and the corresponding IP addresses at the IP layer.
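A minimal sketch of the three cleaning operations is given below. It assumes each flow is a dictionary with a `payloads` list of application-layer byte strings; that layout, the helper names, and the choice to re-address every surviving flow (rather than a random sample of packets, as the text describes) are simplifications made for illustration.

```python
import hashlib
import random


def random_mac(rng):
    # Set the locally administered bit and clear the multicast bit.
    first = (rng.randrange(256) | 0x02) & 0xFE
    octets = [first] + [rng.randrange(256) for _ in range(5)]
    return ":".join(f"{o:02x}" for o in octets)


def random_ip(rng):
    return ".".join(str(rng.randrange(1, 255)) for _ in range(4))


def clean_flows(flows, seed=None):
    """Return cleaned copies of `flows`; the input is not modified."""
    rng = random.Random(seed)
    seen = set()
    cleaned = []
    for flow in flows:
        # Drop packets with no application-layer data (e.g. bare ACKs).
        payloads = [p for p in flow["payloads"] if p]
        if not payloads:
            continue
        # Fingerprint the session content to detect duplicate flows.
        digest = hashlib.sha256()
        for p in payloads:
            digest.update(len(p).to_bytes(4, "big"))
            digest.update(p)
        fingerprint = digest.digest()
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        # Replace link-layer and IP-layer addresses with random ones.
        cleaned.append({
            "src_mac": random_mac(rng), "dst_mac": random_mac(rng),
            "src_ip": random_ip(rng), "dst_ip": random_ip(rng),
            "payloads": payloads,
        })
    return cleaned
```

Randomizing addresses keeps the model from memorizing host identities present in the training capture, which is presumably the intent of the third cleaning step.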

(2.3) The cleaned traffic packets are converted into image input for the neural network. If the image size is N × N, a data-flow segment of n bytes is extracted every k packet lengths, and N × N − n obfuscation characters are appended at the tail to increase the randomness of the traffic samples.
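One plausible reading of this conversion is sketched below: the leading bytes of a flow are laid out row by row in a square grid, and any shortfall is filled with random bytes at the tail. The function name, the use of random bytes as the "obfuscation characters", and the decision to take only the leading bytes (ignoring the every-k-packets sampling) are assumptions for this example.

```python
import random


def flow_to_image(data, side, seed=None):
    """Map raw flow bytes to a side x side grid of pixel values (0-255)."""
    rng = random.Random(seed)
    total = side * side
    segment = data[:total]  # keep at most side*side bytes
    # Pad the tail with random bytes so every sample has the same length.
    padding = bytes(rng.randrange(256) for _ in range(total - len(segment)))
    buffer = segment + padding
    # One byte per pixel, row-major order.
    return [list(buffer[r * side:(r + 1) * side]) for r in range(side)]
```

Each byte becomes one grayscale pixel, so a 28 × 28 image would consume 784 bytes of traffic per sample.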

(2.4) The uniform-length files are converted into traffic images in binary form: a grayscale image has two dimensions (width and height) and a color image has three (width, height and channel). Finally, the images are converted into the IDX file format, which contains the pixel information and statistical information of the traffic data, and used as input to the neural network.
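For reference, the IDX container (the format used by the MNIST data set) is simple enough to write by hand: a 4-byte magic number encoding the element type and number of dimensions, one big-endian 32-bit size per dimension, then the raw data. The sketch below packs a list of equally sized grayscale images; the function name and the nested-list image representation are assumptions, and it stores pixel data only, not the statistical information the text also mentions.

```python
import struct


def images_to_idx(images):
    """Pack equally sized grayscale images (lists of rows) into IDX bytes."""
    count = len(images)
    rows = len(images[0])
    cols = len(images[0][0])
    # Magic number: two zero bytes, 0x08 = unsigned byte data, 3 dimensions.
    header = struct.pack(">BBBB", 0, 0, 0x08, 3)
    # Dimension sizes as big-endian 32-bit integers.
    header += struct.pack(">III", count, rows, cols)
    body = bytearray()
    for image in images:
        if len(image) != rows or any(len(row) != cols for row in image):
            raise ValueError("all images must share the same shape")
        for row in image:
            body.extend(row)  # one byte per pixel, row-major order
    return header + bytes(body)
```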

(2.5) Steps (2.1) to (2.4) are repeated until all data traffic sets to be processed have been handled.

As a preferred embodiment of the present invention, the third step includes the following sub-steps:

(3.1) The traffic images in IDX format are used as input to the two-dimensional CNN model.

(3.2) The parameters of the deep network model are initialized. The network structure is a stack of three CNN layers, each followed by a Dropout layer to prevent overfitting; a Flatten layer is then added to reduce the dimensionality of the two-dimensional image traffic data before output.

(3.3) An attention mechanism is added after the convolution module of the deep neural network. It takes the weights learned by the CNN and applies them directly to global information along the spatial or channel dimension to form the input features; that is, the attention filter applies attention weights to a group of traffic bytes within a window of width h and computes new features from them.
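The windowed weighting described in (3.3) can be illustrated with a softmax-weighted sliding window, which is one simple form of attention. This is a sketch only: the scoring rule (an element-wise product with a weight vector of length h) and the names `window_attention` and `score_weights` are assumptions, since the source does not give the attention formula.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def window_attention(features, score_weights, h):
    """Slide a window of width h over `features` and return, for each
    position, the attention-weighted sum of the values in the window."""
    if len(score_weights) != h:
        raise ValueError("score_weights must have length h")
    outputs = []
    for start in range(len(features) - h + 1):
        window = features[start:start + h]
        # Score each element, then normalize scores into attention weights.
        scores = [w * x for w, x in zip(score_weights, window)]
        alphas = softmax(scores)
        outputs.append(sum(a * x for a, x in zip(alphas, window)))
    return outputs
```

With all-zero score weights the attention weights become uniform and each output is the plain mean of its window; larger weights push each output toward the dominant values in the window.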

(3.4) Finally, a softmax layer is added to the deep neural network. The output of the convolutional layer is divided into several smaller, equally sized, non-overlapping two-dimensional matrices, which are then pooled by mean or maximum value to obtain the output of the subsampling layer.
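The subsampling described here is ordinary non-overlapping pooling. A minimal pure-Python sketch, assuming a square pooling window whose size divides the feature-map dimensions exactly (the function name and the `mode` argument are illustrative):

```python
def pool2d(matrix, size, mode="max"):
    """Split `matrix` into non-overlapping size x size blocks and reduce
    each block to its maximum or mean value."""
    rows, cols = len(matrix), len(matrix[0])
    if rows % size or cols % size:
        raise ValueError("matrix dimensions must be multiples of size")
    pooled = []
    for r in range(0, rows, size):
        pooled_row = []
        for c in range(0, cols, size):
            block = [matrix[r + i][c + j]
                     for i in range(size) for j in range(size)]
            if mode == "max":
                pooled_row.append(max(block))
            else:
                pooled_row.append(sum(block) / len(block))
        pooled.append(pooled_row)
    return pooled
```

For a 4 × 4 feature map and a 2 × 2 window, the result is a 2 × 2 map, a fourfold reduction in the number of values passed to later layers.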

(3.5) During training, N classes are first selected at random from the base-class traffic data set, and a base support set and a base query set are sampled from the data samples of these classes. According to the training objective, the deep neural network model is trained with the support set as training samples so as to minimize the model's recognition loss on the traffic samples in the query set.
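The sampling in (3.5) resembles the episodic setup commonly used in few-shot learning. The sketch below shows only the sampling of one episode, not the model update; the `dataset` layout (a dictionary from class label to a list of samples) and the per-class support and query sizes are assumptions for illustration.

```python
import random


def sample_episode(dataset, n_way, k_support, k_query, seed=None):
    """Sample one training episode.

    Picks `n_way` classes at random, then draws disjoint support and
    query samples from each chosen class.
    """
    rng = random.Random(seed)
    classes = rng.sample(sorted(dataset), n_way)
    support, query = [], []
    for label in classes:
        # Draw without replacement so support and query never overlap.
        picked = rng.sample(dataset[label], k_support + k_query)
        support += [(x, label) for x in picked[:k_support]]
        query += [(x, label) for x in picked[k_support:]]
    return support, query
```

A training loop would call this repeatedly, fit the model on each support set, and use the loss on the matching query set as the quantity to minimize.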

As a preferred embodiment of the present invention, the step four includes the following sub-steps:

(4.1) The trained deep learning model is deployed on the target server.

(4.2) The system traffic acquisition module is started and continuously collects the network traffic of each branch; because the volume of traffic data in the scenario is very large, the system's processing load is kept within the forwarding threshold.

(4.3) The trained deep neural network model is loaded in the system traffic analysis module, and the traffic of the application scenario is divided into black security-threat traffic, gray uncertain traffic and white safe traffic. The white portion is screened out, which excludes most normal traffic, and Trojans carried by the black threat portion are identified directly by detection techniques. For the gray portion, possible threats are inferred as far as possible for further comprehensive comparison.
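The black, gray and white split can be pictured as thresholding a model score. The sketch below assumes the model yields a single threat score between 0 and 1; the two threshold values and the action names are hypothetical, as the source does not say how the three categories are derived from the network output.

```python
# Hypothetical admission actions for each traffic category.
ACTIONS = {"black": "block", "gray": "hold_for_review", "white": "admit"}


def triage(threat_score, white_below=0.2, black_above=0.8):
    """Map a model threat score in [0, 1] to black, gray or white."""
    if not 0.0 <= threat_score <= 1.0:
        raise ValueError("threat_score must lie in [0, 1]")
    if threat_score >= black_above:
        return "black"   # security-threat traffic
    if threat_score < white_below:
        return "white"   # safe traffic, excluded from further analysis
    return "gray"        # uncertain traffic, kept for further comparison


def admission_decision(threat_score):
    """Translate the traffic category into an admission-control action."""
    return ACTIONS[triage(threat_score)]
```

In practice the thresholds would be tuned on validation data to balance missed threats against the volume of gray traffic sent for further review.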

The invention also relates to a traffic detection device based on deep learning technology, comprising a web service interface. The web service interface comprises a processing component and memory resources, represented by a memory, for storing instructions executable by the processing component; the processing component is configured to execute the instructions to perform the above method.

The processing component comprises a traffic cleaning component, a traffic conversion component and a traffic detection component.

The application program stored in the memory includes one or more modules, each corresponding to a set of instructions.

The web service interface may further comprise a power supply component configured to perform power management of the web service interface, a wired or wireless network listening interface configured to connect the device to a network, and an output interface. The web service interface can operate based on an operating system stored in the memory.

Compared with the prior art, the invention has the following beneficial effects:

1. Because security-threat traffic carries out its destructive task through network interaction, the invention uses traffic detection technology and takes the data flow as its basic object of study. After data-link-layer traffic has been captured, automatically selecting and analyzing the statistical characteristics of data flow transmission can effectively reduce the processing load of the system and provides strong support for identifying network security threats.

2. The deep learning model does not depend on any particular Trojan signature library and does not require statistical features to be extracted from the training samples or the traffic under test. It is compatible with encrypted protocols and weak-feature protocols and, while maintaining high accuracy, obtains data features without relying on expert experience, thereby achieving automated processing.

Drawings

Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:

FIG. 1 is a schematic diagram of a flow detection device based on deep learning technology according to the present invention;

FIG. 2 is a flow detection architecture diagram based on deep learning according to the present invention;

FIG. 3 is a flow data processing diagram according to the present invention;

FIG. 4 is a flow chart of detecting traffic data according to the present invention.

In the figures: 1900. web service interface; 1922. processing component; 1926. power supply component; 1932. memory; 1950. wired or wireless network listening interface; 1958. output interface.

Detailed Description

To make the technical means, inventive features, objectives and effects of the invention easy to understand, the invention is further described below with reference to specific embodiments.

Referring to FIGS. 1-4, the present invention provides the following technical solution: a method for automatically detecting traffic data based on deep learning technology, the detection method being as follows:

Step one: the system monitors and captures data streams at the data link layer, uses a web service to support access to externally supplied traffic data, and cleans the captured raw data streams to remove redundant and unnecessary information. To reduce its processing burden effectively, the system also supports cross-system traffic data access: an API callback interface is provided for externally accessed traffic, and the web service obtains traffic data from different sources. Cleaning the captured raw data streams also maintains the consistency of the traffic data.

Step two: after the target traffic has been captured and grouped, the raw data stream is preprocessed, that is, converted from raw traffic into input data for the deep neural network. The system performs in-depth analysis of the data traffic. To avoid confusion in traffic grouping caused by differing maximum transmission unit (MTU) sizes or by the fragmentation function of the transport protocol, the system supports reassembly and ordering of data fragments, and it converts the data stream into an image by a data-encoding modeling method for use as neural network input.

Step three: the traffic data is trained on and recognized by a deep learning method, which comprises two processes, training and recognition; security-threat traffic, uncertain traffic and safe traffic are identified, and traffic admission control is realized. The system trains a deep neural network model; model training automatically learns from and accumulates large volumes of traffic data, and the ability to learn traffic data quickly is achieved by extracting abstract features from the different types of traffic.

Step four: network traffic is detected using the trained model; security-threat traffic, uncertain traffic and safe traffic are identified, and traffic admission control is realized. Using the recognition function of the system's traffic detection engine, the system is deployed at the gateway egress and, according to the trained deep learning model, automatically classifies network traffic into black security-threat traffic, gray uncertain traffic and white safe traffic. Various detection techniques are used to identify Trojans directly or to screen out the white-listed portion of the traffic, and the results are continuously fed back to the trained model to strengthen the system's traffic detection capability.
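The fragment reassembly and ordering mentioned in step two can be sketched as sorting fragments by offset and checking that they form a contiguous whole. The tuple layout `(offset, data, more_fragments)` and the use of plain byte offsets are simplifications chosen for this example (IPv4, for instance, expresses fragment offsets in 8-byte units); the source does not describe the reassembly procedure itself.

```python
def reassemble(fragments):
    """Rebuild a payload from (offset, data, more_fragments) tuples.

    Fragments may arrive in any order. Returns the reassembled bytes,
    or None if a fragment is missing, overlapping, or the final
    fragment has not arrived yet.
    """
    ordered = sorted(fragments, key=lambda frag: frag[0])
    if not ordered or ordered[-1][2]:
        return None  # nothing received, or last fragment still pending
    payload = bytearray()
    expected_offset = 0
    for offset, data, _more in ordered:
        if offset != expected_offset:
            return None  # gap or overlap in the sequence
        payload += data
        expected_offset += len(data)
    return bytes(payload)
```

Reassembling before grouping means a large application message split across several packets is treated as one unit when the flow is later converted to an image.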

Further, steps one to four comprise sub-steps (1.1) to (1.4), (2.1) to (2.5), (3.1) to (3.5) and (4.1) to (4.3), respectively, as set out in the Disclosure of Invention above.

Referring to FIG. 1, a traffic detection device based on deep learning technology includes a web service interface 1900. The web service interface 1900 includes a processing component 1922 and memory resources, represented by a memory 1932, for storing instructions executable by the processing component 1922; the processing component 1922 is configured to execute the instructions to perform the method. The processing component 1922 includes a traffic cleaning component, a traffic conversion component and a traffic detection component. The application program stored in the memory 1932 includes one or more modules, each corresponding to a set of instructions. The web service interface 1900 may further include a power supply component 1926, a wired or wireless network listening interface 1950, and an output interface 1958. The power supply component 1926 is configured to perform power management of the web service interface 1900, and the wired or wireless network listening interface 1950 is configured to connect the device 1900 to a network. The web service interface 1900 may operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ or the like.

In summary, because security-threat traffic carries out its destructive task through network interaction, the invention uses traffic detection technology and takes the data flow as its basic object of study. After data-link-layer traffic has been captured, automatically selecting and analyzing the statistical characteristics of data flow transmission can effectively reduce the processing burden of the system and provides strong support for identifying network security threats.

While there have been shown and described what are at present considered the fundamental principles and essential features of the invention and its advantages, it will be apparent to those skilled in the art that the invention is not limited to the details of the foregoing exemplary embodiments, but is capable of other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Furthermore, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only a single independent technical solution. The specification is written this way for clarity only; those skilled in the art should take the specification as a whole, and the technical solutions in the embodiments may be combined as appropriate to form other embodiments that those skilled in the art can understand.

Claims

1. A flow data automatic detection method based on deep learning technology is characterized in that: the detection method comprises the following steps:

2. The method for automatically detecting the flow data based on the deep learning technology as claimed in claim 1, wherein: the first step comprises the following substeps:

3. The method for automatically detecting the flow data based on the deep learning technology as claimed in claim 1, wherein: the second step comprises the following substeps:

(2.2) adopting the following flow cleaning method for the grouped flow data:

4. The method for automatically detecting the flow data based on the deep learning technology as claimed in claim 1, wherein: the third step comprises the following substeps:

5. The method for automatically detecting the flow data based on the deep learning technology as claimed in claim 1, wherein: the fourth step comprises the following substeps:

and (4.1) deploying the trained deep learning model on the target server.

6. A device for automatically detecting traffic data based on deep learning technology, characterized in that: it comprises a web service interface 1900, which includes a processing component 1922 and memory resources, represented by a memory 1932, for storing instructions executable by the processing component 1922, the processing component 1922 being configured to execute the instructions to perform the method described above.

7. The traffic detection device based on deep learning technology as claimed in claim 6, wherein: the processing component 1922 comprises a traffic cleaning component, a traffic conversion component and a traffic detection component.

8. The traffic detection device based on deep learning technology as claimed in claim 6, wherein: the application program stored in the memory 1932 includes one or more modules, each corresponding to a set of instructions.

9. The traffic detection device based on deep learning technology as claimed in claim 6, wherein: the web service interface 1900 may further include a power supply component 1926, a wired or wireless network listening interface 1950, and an output interface 1958; the power supply component 1926 is configured to perform power management of the web service interface 1900, the wired or wireless network listening interface 1950 is configured to connect the device 1900 to a network, and the web service interface 1900 may operate based on an operating system stored in the memory 1932.