CN112468509A - Deep learning technology-based automatic flow data detection method and device - Google Patents

Info

Publication number
CN112468509A
Authority
CN
China
Prior art keywords
data
flow
traffic
network
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011446352.1A
Other languages
Chinese (zh)
Inventor
黄松
周春阳
周富成
严小正
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hubei Songhao Technology Co ltd
Original Assignee
Hubei Songhao Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hubei Songhao Technology Co ltd
Priority to CN202011446352.1A
Publication of CN112468509A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24568 Data stream processing; Continuous queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a method and a device for automatically detecting traffic data based on deep learning. The detection method comprises the following steps. Step one: the system monitors and captures data streams at the data link layer, uses a web service to support access to external traffic data, and cleans the captured raw data streams to remove redundant and unnecessary information. Step two: after the packets of the target data traffic have been captured, the raw data stream is preprocessed, that is, converted from raw traffic into the input data of a deep neural network. Because security-threat traffic carries out its destructive tasks through network interaction, the invention takes the data flow as its basic object of study; after the data-link-layer traffic is captured, automatically selecting and analyzing the statistical characteristics of data flow transmission effectively reduces the processing load of the system and provides strong support for identifying network security threats.

Description

Deep learning technology-based automatic flow data detection method and device
Technical Field
The invention relates to the technical field of traffic detection, in particular to a method and a device for automatically detecting traffic data based on deep learning.
Background
With the continuous development of information technology, Internet data has gradually become an important basic resource in people's lives, and the network security that accompanies it faces increasingly serious challenges. Network traffic detection is one of the most important protection technologies: it identifies abnormal network behavior by establishing a baseline of network access behavior, is highly general, and is widely applied in intrusion detection, network attack detection, and the detection of fraud and data theft. However, most conventional automatic traffic detection methods focus on inspecting specific packet payloads, matching against a Trojan signature library, or classifying by network protocol. These techniques rely on the experience of Trojan detection experts, generalize poorly, struggle to cope with increasingly complex Trojan techniques and network environments, and offer low detection accuracy and limited practicality. The method and device for detecting and processing traffic data based on deep learning support automated processing and intelligent threat detection of traffic data, classify traffic as security-threat traffic, uncertain traffic or safe traffic, and can provide strong support for identifying traffic threats.
Disclosure of Invention
The invention aims to provide a method and a device for automatically detecting traffic data based on deep learning, which solve the problems described in the background.
To achieve this purpose, the invention provides the following technical solution: a method for automatically detecting traffic data based on deep learning, the detection method being as follows:
Step one: the system monitors and captures data streams at the data link layer, uses a web service to support access to external traffic data, and cleans the captured raw data streams to remove redundant and unnecessary information;
Step two: after the packets of the target data traffic have been captured, the raw data stream is preprocessed, that is, converted from raw traffic into the input data of a deep neural network;
Step three: the traffic data is processed by a deep learning method in two phases, training and recognition; security-threat traffic, uncertain traffic and safe traffic are identified, and traffic admission control is realized;
Step four: network traffic is detected with the trained model, security-threat traffic, uncertain traffic and safe traffic are identified, and traffic admission control is realized.
As a preferred embodiment of the present invention, step one includes the following sub-steps:
(1.1) For packets flowing through the network system, the network card at the network egress of the system is set to promiscuous mode, in which it accepts all traffic packets passing through the local machine. The packet acquisition service works as follows:
First: initialize the packet acquisition environment, and set the size of captured packets, the number of available CPUs and the size of the thread pool;
Second: establish a system memory buffer and copy received packets to the socket buffer; the user layer of the system accesses the packets by calling the system function mmap;
Third: use multiple threads to poll each port in a loop and receive traffic packets, and filter out redundant information that does not belong to the network with a custom packet processing function;
Fourth: finally output a standard set of traffic packets in pcap format.
(1.2) For externally provided packets, an API callback interface is provided for the external data source, a traffic data set is acquired through the web service, and the data traffic of a single application is extracted from the traffic set, where the data traffic comprises the data streams of all protocol layers of the application.
(1.3) The data traffic is compared one by one with the traffic sets already in the system:
if the data packet already exists in the system, the traffic generated by the same application is merged, and the version with the latest timestamp overwrites the old version of the data stream;
if the application of the data stream does not exist in the system, the data stream is imported into the system as new data, and the version number of the associated data table is recorded;
if the data table cannot be retrieved in the system, its version number is set to 0, and the data is imported into the system as new data traffic.
(1.4) Steps (1.2) and (1.3) are repeated until all traffic sets to be processed have been successfully imported into the system.
As a preferred embodiment of the present invention, step two includes the following sub-steps:
(2.1) For the input traffic data, all packets sharing the same five-tuple <source IP, source port, destination IP, destination port, transport layer protocol> are selected, and the raw traffic data is grouped accordingly.
(2.2) The grouped traffic data is cleaned as follows:
First: compare the session contents of all grouped flows and remove duplicate flows with identical content;
Second: remove empty packets that carry no application-layer data, such as ACK (acknowledgement) packets;
Third: select traffic packets by random sampling, randomly generate a new set of MAC addresses and IP addresses, and replace the MAC addresses at the data link layer and the IP addresses at the IP layer accordingly.
(2.3) The cleaned traffic packets are converted into image input for the neural network: if the image size is N x N, a data stream segment with a length of N bytes is extracted every k packet lengths, and N-N obfuscation strings are appended at the end to enhance the randomness of the traffic samples.
(2.4) The files of uniform length are converted into traffic data pictures in binary form, where a grayscale picture has two dimensions (width and height) and a color picture has three (width, height and channel); finally the pictures are converted into the IDX file format, which contains the pixel information and statistical information of the traffic data and serves as the input of the neural network.
(2.5) Steps (2.1) to (2.4) are repeated until all traffic sets to be processed have been processed.
As a preferred embodiment of the present invention, step three includes the following sub-steps:
(3.1) The traffic data images in IDX format are used as the input of the two-dimensional CNN model.
(3.2) The parameters of the deep network model are initialized. The network structure is a stack of three CNN layers, with a Dropout layer added after each CNN layer to prevent overfitting, followed by a Flatten layer that reduces the dimensionality of the two-dimensional image traffic data and outputs it.
(3.3) An attention mechanism is added after the convolution module of the deep neural network. It takes the weights obtained by training the CNN and directly weights global information over the spatial or channel dimension to produce input features; that is, the attention filter applies attention weights to a group of traffic bytes within a window of width h and computes new features from them.
(3.4) Finally, a softmax layer is added to the deep neural network. The output of the convolutional layer is divided into several non-overlapping, smaller two-dimensional matrices of equal size, which are then pooled by mean or maximum value to obtain the output of the subsampling layer.
(3.5) During training, N classes are first randomly selected from the base-class traffic data set, and a basic support set and a basic query set are sampled from the data samples of these classes; the deep neural network model is then trained with the support set as training samples according to the training task objective, so as to minimize the model's recognition loss on the traffic samples in the query set.
As a preferred embodiment of the present invention, step four includes the following sub-steps:
(4.1) The trained deep learning model is deployed on the target server.
(4.2) The system traffic acquisition module is started and continuously collects the network traffic of each branch; because the traffic volume in the scenario is huge, the processing load of the system is kept within the forwarding threshold.
(4.3) The trained deep neural network model is loaded in the system traffic analysis module, and the traffic of the application scenario is divided into black security-threat traffic, grey uncertain traffic and white safe traffic. The white traffic is screened out so that most normal traffic is excluded, and Trojans carried by the black threat traffic are identified directly by detection techniques. For the grey traffic, possible threats are inferred as far as possible for further comprehensive comparison.
The invention also relates to a traffic detection device based on deep learning, comprising a web service interface that includes a processing component and memory resources, represented by a memory, for storing instructions executable by the processing component; the processing component is configured to execute the instructions to perform the above method.
The processing component comprises a traffic cleaning component, a traffic conversion component and a traffic detection component.
The application program stored in the memory includes one or more modules, each corresponding to a set of instructions.
The web service interface may further comprise a power supply component configured to perform power management of the web service interface, a wired or wireless network listening interface configured to connect the device to a network, and an output interface; the web service interface can operate based on an operating system stored in the memory.
Compared with the prior art, the invention has the following beneficial effects:
1. Because security-threat traffic carries out its destructive tasks through network interaction, the invention adopts traffic detection with the data flow as its basic object of study; after the data-link-layer traffic is captured, automatically selecting and analyzing the statistical characteristics of data flow transmission effectively reduces the processing load of the system and provides strong support for identifying network security threats.
2. The deep learning model does not depend on any particular Trojan signature library and does not need statistical features to be extracted from the training samples or the traffic under test. It is compatible with encrypted protocols and weak-feature protocols, and it obtains data features without relying on expert experience while maintaining high accuracy, thereby achieving automated processing.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a schematic diagram of the traffic detection device based on deep learning according to the present invention;
FIG. 2 is an architecture diagram of traffic detection based on deep learning according to the present invention;
FIG. 3 is a diagram of traffic data processing according to the present invention;
FIG. 4 is a flow chart of traffic data detection according to the present invention.
In the drawings: 1900, web service interface; 1922, processing component; 1926, power supply component; 1932, memory; 1950, wired or wireless network listening interface; 1958, output interface.
Detailed Description
To make the technical means, inventive features, objectives and effects of the invention easy to understand, the invention is further described below with reference to specific embodiments.
Referring to FIGS. 1-4, the present invention provides a technical solution: a method for automatically detecting traffic data based on deep learning, the detection method being as follows:
Step one: the system monitors and captures data streams at the data link layer, uses a web service to support access to external traffic data, and cleans the captured raw data streams to remove redundant and unnecessary information. While capturing data-link-layer streams, and in order to effectively reduce its processing burden, the system also supports cross-system access to traffic data: an API callback interface is provided for externally accessed traffic, and the web service obtains traffic data from different sources. The system cleans the captured raw data streams, removes redundant and unnecessary information, and keeps the traffic data consistent;
Step two: after the packets of the target data traffic have been captured, the raw data stream is preprocessed, that is, converted from raw traffic into the input data of a deep neural network. The system performs in-depth analysis of the data traffic; to avoid confusion in traffic grouping caused by differing maximum transmission unit (MTU) sizes or by the fragmentation function of the transport protocol, it supports reassembly and reordering of data fragments, and it converts the data stream into an image by a data encoding and modeling method for use as the neural network input;
Step three: the traffic data is processed by a deep learning method in two phases, training and recognition; security-threat traffic, uncertain traffic and safe traffic are identified, and traffic admission control is realized. The system trains a deep neural network model; model training learns automatically from accumulated high-volume traffic data, and by extracting the abstract features of different types of traffic it achieves the ability to learn traffic data quickly;
Step four: network traffic is detected with the trained model, security-threat traffic, uncertain traffic and safe traffic are identified, and traffic admission control is realized. Using the recognition function of its traffic detection engine, the system is deployed at the gateway egress and, according to the trained deep learning model, automatically classifies network traffic as black security-threat traffic, grey uncertain traffic or white safe traffic; it identifies Trojans directly or screens out whitelisted traffic using a variety of detection techniques, and continuously feeds the results back to the trained model to strengthen the system's traffic detection capability.
Further, step one comprises the following sub-steps:
(1.1) For packets flowing through the network system, the network card at the network egress of the system is set to promiscuous mode, in which it accepts all traffic packets passing through the local machine. The packet acquisition service works as follows:
First: initialize the packet acquisition environment, and set the size of captured packets, the number of available CPUs and the size of the thread pool;
Second: establish a system memory buffer and copy received packets to the socket buffer; the user layer of the system accesses the packets by calling the system function mmap;
Third: use multiple threads to poll each port in a loop and receive traffic packets, and filter out redundant information that does not belong to the network with a custom packet processing function;
Fourth: finally output a standard set of traffic packets in pcap format.
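The final output stage of sub-step (1.1) can be illustrated with a minimal sketch that serializes already-captured frames into the classic pcap file layout (a 24-byte global header followed by one 16-byte record header per packet). The function name `pcap_bytes`, the `(seconds, microseconds, frame)` input shape and the Ethernet link type are assumptions made for illustration; the patent does not specify how the pcap output is produced.

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4      # classic pcap with microsecond timestamps
LINKTYPE_ETHERNET = 1        # assumption: frames are Ethernet frames


def pcap_bytes(packets, snaplen=65535):
    """Serialize (seconds, microseconds, frame_bytes) tuples as classic pcap.

    Hypothetical helper: the capture itself (promiscuous mode, mmap access)
    is outside the scope of this sketch.
    """
    # Global header: magic, version 2.4, timezone offset, sigfigs, snaplen, link type.
    out = bytearray(
        struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, snaplen, LINKTYPE_ETHERNET)
    )
    for sec, usec, frame in packets:
        captured = frame[:snaplen]  # truncate to the configured capture size
        # Record header: timestamp, captured length, original length.
        out += struct.pack("<IIII", sec, usec, len(captured), len(frame))
        out += captured
    return bytes(out)
```

Writing the returned bytes to a file with a `.pcap` extension should yield a capture that standard tools can open, provided the frames really are Ethernet frames.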
(1.2) For externally provided packets, an API callback interface is provided for the external data source, a traffic data set is acquired through the web service, and the data traffic of a single application is extracted from the traffic set, where the data traffic comprises the data streams of all protocol layers of the application.
(1.3) The data traffic is compared one by one with the traffic sets already in the system:
if the data packet already exists in the system, the traffic generated by the same application is merged, and the version with the latest timestamp overwrites the old version of the data stream;
if the application of the data stream does not exist in the system, the data stream is imported into the system as new data, and the version number of the associated data table is recorded;
if the data table cannot be retrieved in the system, its version number is set to 0, and the data is imported into the system as new data traffic.
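The import rules of sub-step (1.3) can be sketched roughly as below. This is only an illustration under stated assumptions: the in-memory dictionary `store`, the function name `import_flow`, and the choice to increment the version number when a newer stream overwrites an older one are all hypothetical, since the patent only states that new tables start at version 0 and that the latest timestamp wins.

```python
def import_flow(store, app, timestamp, payload):
    """Import one application's data stream into `store` (a plain dict).

    store maps an application name to a record with keys
    "version", "timestamp" and "payload". Returns what happened.
    """
    current = store.get(app)
    if current is None:
        # Application (and its data table) not in the system yet:
        # import as new data traffic with version number 0.
        store[app] = {"version": 0, "timestamp": timestamp, "payload": payload}
        return "imported"
    if timestamp > current["timestamp"]:
        # Same application already present: the latest timestamp overwrites
        # the old version. Bumping the version is an assumption of this sketch.
        store[app] = {
            "version": current["version"] + 1,
            "timestamp": timestamp,
            "payload": payload,
        }
        return "replaced"
    # The incoming stream is older than what the system already holds.
    return "kept"
```

Calling `import_flow` once per extracted application stream mirrors the repeat-until-done loop that sub-step (1.4) describes.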
(1.4) Steps (1.2) and (1.3) are repeated until all traffic sets to be processed have been successfully imported into the system.
Further, step two includes the following sub-steps:
(2.1) For the input traffic data, all packets sharing the same five-tuple <source IP, source port, destination IP, destination port, transport layer protocol> are selected, and the raw traffic data is grouped accordingly.
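The five-tuple grouping of sub-step (2.1) might look like the following sketch. The `Packet` record and its field names are hypothetical stand-ins for whatever a real packet parser would return.

```python
from collections import defaultdict, namedtuple

# Hypothetical parsed-packet record; a real system would fill it from pcap data.
Packet = namedtuple("Packet", "src_ip src_port dst_ip dst_port proto payload")


def group_by_five_tuple(packets):
    """Group packets that share <src IP, src port, dst IP, dst port, protocol>."""
    flows = defaultdict(list)
    for p in packets:
        key = (p.src_ip, p.src_port, p.dst_ip, p.dst_port, p.proto)
        flows[key].append(p)  # packets keep their arrival order within a flow
    return dict(flows)
```

Note that this key is directional: replies travelling from destination to source land in a separate group unless the key is normalized, which the source text does not discuss.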
(2.2) The grouped traffic data is cleaned as follows:
First: compare the session contents of all grouped flows and remove duplicate flows with identical content;
Second: remove empty packets that carry no application-layer data, such as ACK (acknowledgement) packets;
Third: select traffic packets by random sampling, randomly generate a new set of MAC addresses and IP addresses, and replace the MAC addresses at the data link layer and the IP addresses at the IP layer accordingly.
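The three cleaning rules of sub-step (2.2) can be sketched as follows. Several simplifications are assumptions of this sketch rather than the patent's method: packets are plain dictionaries with hypothetical keys, every remaining packet has its addresses replaced (the source text samples packets at random), and each original address maps to the same random substitute within a flow.

```python
import random


def random_mac(rng):
    return ":".join(f"{rng.randrange(256):02x}" for _ in range(6))


def random_ip(rng):
    return ".".join(str(rng.randrange(1, 255)) for _ in range(4))


def clean_flows(flows, seed=None):
    """flows: dict mapping a flow key to a list of packet dicts with keys
    'src_mac', 'dst_mac', 'src_ip', 'dst_ip' and 'payload' (bytes)."""
    rng = random.Random(seed)
    seen_contents = set()
    cleaned = {}
    for key, packets in flows.items():
        # Rule 2: drop packets without application-layer data (e.g. bare ACKs).
        kept = [p for p in packets if p["payload"]]
        content = b"".join(p["payload"] for p in kept)
        # Rule 1: skip flows whose session content duplicates an earlier flow.
        if not kept or content in seen_contents:
            continue
        seen_contents.add(content)
        # Rule 3: replace MAC and IP addresses with randomly generated ones.
        macs, ips = {}, {}
        anonymized = []
        for p in kept:
            q = dict(p)
            for field in ("src_mac", "dst_mac"):
                q[field] = macs.setdefault(p[field], random_mac(rng))
            for field in ("src_ip", "dst_ip"):
                q[field] = ips.setdefault(p[field], random_ip(rng))
            anonymized.append(q)
        cleaned[key] = anonymized
    return cleaned
```

Replacing addresses keeps a classifier from memorizing specific hosts, which is presumably why the cleaning step randomizes them.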
(2.3) The cleaned traffic packets are converted into image input for the neural network: if the image size is N x N, a data stream segment with a length of N bytes is extracted every k packet lengths, and N-N obfuscation strings are appended at the end to enhance the randomness of the traffic samples.
(2.4) The files of uniform length are converted into traffic data pictures in binary form, where a grayscale picture has two dimensions (width and height) and a color picture has three (width, height and channel); finally the pictures are converted into the IDX file format, which contains the pixel information and statistical information of the traffic data and serves as the input of the neural network.
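A minimal sketch of sub-steps (2.3) and (2.4) follows, under explicit assumptions: each flow segment is truncated or padded to exactly `side * side` bytes, the padding consists of random bytes standing in for the obfuscation strings, each byte becomes one grayscale pixel, and the images are packed in the IDX layout used by the MNIST data set (two zero bytes, type code 0x08 for unsigned byte, number of dimensions, then big-endian dimension sizes). The sampling interval k and the statistical information mentioned in the text are not modelled; the function names are hypothetical.

```python
import random
import struct


def flow_to_image(data, side, rng=None):
    """Turn raw flow bytes into a side x side grayscale image (list of rows)."""
    rng = rng or random.Random()
    size = side * side
    buf = bytearray(data[:size])  # truncate long segments
    # Pad short segments with random bytes (assumed form of the obfuscation padding).
    buf += bytes(rng.randrange(256) for _ in range(size - len(buf)))
    return [list(buf[r * side:(r + 1) * side]) for r in range(side)]


def images_to_idx(images):
    """Pack equally sized grayscale images into IDX3 unsigned-byte format."""
    rows, cols = len(images[0]), len(images[0][0])
    header = struct.pack(">BBBBIII", 0, 0, 0x08, 3, len(images), rows, cols)
    body = bytes(px for img in images for row in img for px in row)
    return header + body
```

For example, `images_to_idx([flow_to_image(payload, 28)])` would produce a single 28 x 28 sample in a layout that common MNIST loaders can read.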
(2.5) Steps (2.1) to (2.4) are repeated until all traffic sets to be processed have been processed.
Further, step three includes the following sub-steps:
(3.1) The traffic data images in IDX format are used as the input of the two-dimensional CNN model.
(3.2) The parameters of the deep network model are initialized. The network structure is a stack of three CNN layers, with a Dropout layer added after each CNN layer to prevent overfitting, followed by a Flatten layer that reduces the dimensionality of the two-dimensional image traffic data and outputs it.
(3.3) An attention mechanism is added after the convolution module of the deep neural network. It takes the weights obtained by training the CNN and directly weights global information over the spatial or channel dimension to produce input features; that is, the attention filter applies attention weights to a group of traffic bytes within a window of width h and computes new features from them.
(3.4) Finally, a softmax layer is added to the deep neural network. The output of the convolutional layer is divided into several non-overlapping, smaller two-dimensional matrices of equal size, which are then pooled by mean or maximum value to obtain the output of the subsampling layer.
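The two operations named in sub-step (3.4), non-overlapping pooling and softmax, can be written out in plain Python to make the arithmetic concrete. This is an illustrative sketch, not the patent's network: a real model would use a deep learning framework, and the function names here are hypothetical.

```python
import math


def pool2d(matrix, size, mode="max"):
    """Split a 2-D matrix into non-overlapping size x size blocks and
    reduce each block by its maximum or its mean."""
    rows, cols = len(matrix), len(matrix[0])
    reduce_block = max if mode == "max" else (lambda xs: sum(xs) / len(xs))
    return [
        [
            reduce_block(
                [matrix[r + i][c + j] for i in range(size) for j in range(size)]
            )
            for c in range(0, cols - size + 1, size)
        ]
        for r in range(0, rows - size + 1, size)
    ]


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Pooling a 4 x 4 feature map with `size=2` yields a 2 x 2 map, which is the subsampling effect the text describes; softmax would then be applied to the final class scores.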
(3.5) During training, N classes are first randomly selected from the base-class traffic data set, and a basic support set and a basic query set are sampled from the data samples of these classes; the deep neural network model is then trained with the support set as training samples according to the training task objective, so as to minimize the model's recognition loss on the traffic samples in the query set.
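The support/query sampling in sub-step (3.5) resembles the episodic sampling used in few-shot learning. A minimal sketch is given below; the function name, the parameter names `n_way`, `k_support` and `k_query`, and the dictionary layout of the data set are assumptions for illustration, and the model update itself is omitted.

```python
import random


def sample_episode(dataset, n_way, k_support, k_query, rng=None):
    """Sample one training episode.

    dataset maps a class label to a list of samples. Returns two lists of
    (sample, label) pairs: the support set and the query set, which are
    disjoint within each class.
    """
    rng = rng or random.Random()
    classes = rng.sample(sorted(dataset), n_way)  # pick N classes at random
    support, query = [], []
    for label in classes:
        items = rng.sample(dataset[label], k_support + k_query)
        support += [(x, label) for x in items[:k_support]]
        query += [(x, label) for x in items[k_support:]]
    return support, query
```

In a training loop, the model would be fitted on `support` and its loss measured on `query`, repeating over many sampled episodes.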
Further, step four includes the following sub-steps:
(4.1) The trained deep learning model is deployed on the target server.
(4.2) The system traffic acquisition module is started and continuously collects the network traffic of each branch; because the traffic volume in the scenario is huge, the processing load of the system is kept within the forwarding threshold.
(4.3) The trained deep neural network model is loaded in the system traffic analysis module, and the traffic of the application scenario is divided into black security-threat traffic, grey uncertain traffic and white safe traffic. The white traffic is screened out so that most normal traffic is excluded, and Trojans carried by the black threat traffic are identified directly by detection techniques. For the grey traffic, possible threats are inferred as far as possible for further comprehensive comparison.
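The black/grey/white split of sub-step (4.3) could be realized, for instance, by thresholding the model's predicted threat probability. The sketch below is one possible reading: the thresholds `low` and `high` and the function name `triage` are hypothetical, and the patent does not state how the three categories are derived from the model output.

```python
def triage(threat_probability, low=0.2, high=0.8):
    """Map a predicted threat probability to a traffic category.

    'black' = security-threat traffic, 'white' = safe traffic,
    'grey'  = uncertain traffic that needs further comparison.
    The threshold values are illustrative assumptions.
    """
    if not 0.0 <= threat_probability <= 1.0:
        raise ValueError("threat_probability must be within [0, 1]")
    if threat_probability >= high:
        return "black"
    if threat_probability <= low:
        return "white"
    return "grey"
```

With this rule, white flows can be dropped from further analysis, black flows can be passed on for direct Trojan identification, and grey flows can be queued for the comprehensive comparison the text mentions.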
Referring to FIG. 1, a traffic detection device based on deep learning comprises a web service interface 1900. The web service interface 1900 includes a processing component 1922 and memory resources, represented by a memory 1932, for storing instructions executable by the processing component 1922; the processing component 1922 is configured to execute the instructions to perform the above method. The processing component 1922 comprises a traffic cleaning component, a traffic conversion component and a traffic detection component. The application program stored in the memory 1932 includes one or more modules, each corresponding to a set of instructions. The web service interface 1900 may further include a power supply component 1926, a wired or wireless network listening interface 1950, and an output interface 1958. The power supply component 1926 is configured to perform power management of the web service interface 1900, and the wired or wireless network listening interface 1950 is configured to connect the device 1900 to a network. The web service interface 1900 can operate based on an operating system stored in the memory 1932, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, or the like.
In summary, because security-threat traffic carries out its destructive tasks through network interaction, the invention adopts traffic detection with the data flow as its basic object of study; after the data-link-layer traffic is captured, automatically selecting and analyzing the statistical characteristics of data flow transmission effectively reduces the processing load of the system and provides strong support for identifying network security threats.
While there have been shown and described what are at present considered the fundamental principles and essential features of the invention and its advantages, it will be apparent to those skilled in the art that the invention is not limited to the details of the foregoing exemplary embodiments, but is capable of other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although this description is organized by embodiments, not every embodiment contains only a single independent technical solution. The description is written this way for clarity only; those skilled in the art should treat the description as a whole, and the technical solutions of the embodiments may be combined as appropriate to form other embodiments that those skilled in the art can understand.

Claims (9)

1. An automatic flow data detection method based on deep learning technology, characterized in that the detection method comprises the following steps:
step one: the system monitors and captures data streams at the data link layer, uses a web service to support access to external flow data, and cleans the captured raw data streams to remove redundant and unnecessary information;
step two: after the target data traffic is captured in groups, the raw data stream is preprocessed, that is, converted from raw traffic into input data for the deep neural network;
step three: the flow data is trained on and identified by a deep learning method, which comprises a training process and an identification process; security threat flow, uncertain flow and safe flow are identified, and flow admission control is realized;
step four: the network flow is detected using the trained model, security threat flow, uncertain flow and safe flow are identified, and flow admission control is realized.
2. The automatic flow data detection method based on deep learning technology as claimed in claim 1, characterized in that step one comprises the following substeps:
(1.1) for data packets flowing through the network system, the network card at the network egress of the system is set to promiscuous mode, in which it accepts all traffic packets passing through the local interface; the data packet acquisition service works as follows:
first: initialize the packet acquisition environment, and set the size of the acquired data packets, the number of available CPUs and the size of the multithreading pool;
second: establish a system memory buffer, copy the received data packets to the socket buffer, and let the user layer of the system access the data packets by calling the system function mmap;
third: use multiple threads to poll each port in a loop and receive traffic packets, and filter out redundant information that does not belong to the network using a custom packet processing function;
fourth: finally output a set of standard traffic packets in pcap format.
(1.2) for externally provided data packets, an API callback interface is provided to the external data source, a traffic data set is acquired through the web service, and the data traffic of a single application is extracted from the traffic set, the data traffic comprising the data streams of all protocol layers of the application.
(1.3) the data traffic is compared one by one with the traffic sets already in the system:
if the data packet already exists in the system, the traffic generated by the same application is merged, and the version with the latest timestamp overwrites the old version of the data stream;
if the application of the data stream does not exist in the system, the data stream is imported into the system as new data, and the associated data table version number is recorded;
if the data table is not found in the system, the version number of the data table is set to 0, and the data is imported into the system as new data traffic.
(1.4) the operations of steps (1.2) and (1.3) are repeated until all data traffic sets to be processed have been successfully imported into the system.
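The import rules of substep (1.3) can be illustrated with a minimal Python sketch. It is not the claimed implementation: the `FlowRecord` type, the `import_flow` function, the in-memory dictionary used as the store, and the choice to increment the version number on overwrite are all assumptions introduced for illustration. The claim itself only states that the latest timestamp overwrites the old version and that a missing data table starts at version 0.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class FlowRecord:
    """One application's traffic as imported through the web service (hypothetical type)."""
    app: str          # application that generated the traffic
    timestamp: float  # capture time of this version of the data stream
    payload: bytes    # raw bytes covering all protocol layers
    version: int = 0  # associated data-table version number


def import_flow(store: Dict[str, FlowRecord], incoming: FlowRecord) -> FlowRecord:
    """Apply the compare-and-merge rules of substep (1.3) to one incoming flow."""
    existing = store.get(incoming.app)
    if existing is None:
        # Application (and its data table) not found: import as new, version 0.
        store[incoming.app] = replace(incoming, version=0)
    elif incoming.timestamp > existing.timestamp:
        # Same application, newer capture: the latest timestamp overwrites the old one.
        # Incrementing the version here is an assumption, not stated in the claim.
        store[incoming.app] = replace(incoming, version=existing.version + 1)
    # Otherwise the stored flow is already the latest version and is kept.
    return store[incoming.app]
```

Calling `import_flow` once per extracted application flow, as in substep (1.4), leaves exactly one record per application in the store.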
3. The automatic flow data detection method based on deep learning technology as claimed in claim 1, characterized in that step two comprises the following substeps:
(2.1) for the input flow data, all data packets sharing the same five-tuple <source IP, source port, destination IP, destination port, transport layer protocol> are selected, and the raw flow data is grouped accordingly.
(2.2) the grouped flow data is cleaned as follows:
first: compare the session contents of all grouped flows and remove duplicate flows with identical contents;
second: remove empty traffic packets that carry no application layer data, such as ACK (acknowledgement) packets;
third: select traffic packets by random sampling, randomly generate a group of new MAC addresses and IP addresses, and use them to replace the MAC addresses at the data link layer and the IP addresses at the IP layer, respectively.
(2.3) the cleaned traffic packets are converted into image input for the neural network: if the size of the image is N x N, a data flow segment with a length of N bytes is extracted every k data packet lengths, and N-N obfuscation character strings are appended at the tail end to enhance the randomness of the flow sample.
(2.4) the files of uniform length are converted into flow data pictures in binary form, where a grayscale picture has two dimensions (width and height) and a color picture has three dimensions (width, height and channel); the pictures are finally converted into IDX format files, which contain the pixel information and statistical information of the flow data and serve as the input of the neural network.
(2.5) the operations of steps (2.1) to (2.4) are repeated until all data traffic sets to be processed have been processed.
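As an illustration of substeps (2.1) to (2.4), the following Python sketch groups packets by five-tuple, drops empty and duplicate flows, and lays the remaining bytes out as an N x N grayscale matrix. It is a simplified sketch under stated assumptions: the `Packet` type and the function names are hypothetical, the address replacement of (2.2) and the IDX serialization of (2.4) are omitted, and because the segment and padding lengths in (2.3) are not fully legible in the text, the sketch simply truncates each flow to N*N bytes and pads the tail with random bytes.

```python
import random
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Packet(NamedTuple):
    """Hypothetical parsed packet; payload holds application-layer bytes only."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str
    payload: bytes  # empty for bare ACKs and similar packets


FlowKey = Tuple[str, int, str, int, str]


def group_by_five_tuple(packets: List[Packet]) -> Dict[FlowKey, List[Packet]]:
    """Substep (2.1): group packets that share the same five-tuple."""
    flows: Dict[FlowKey, List[Packet]] = defaultdict(list)
    for p in packets:
        flows[(p.src_ip, p.src_port, p.dst_ip, p.dst_port, p.proto)].append(p)
    return dict(flows)


def clean_flows(flows: Dict[FlowKey, List[Packet]]) -> Dict[FlowKey, bytes]:
    """Substep (2.2), first and second rules: drop empty packets and duplicate flows."""
    seen = set()
    cleaned: Dict[FlowKey, bytes] = {}
    for key, pkts in flows.items():
        # Empty payloads contribute nothing, so they are effectively removed here.
        body = b"".join(p.payload for p in pkts)
        if not body or body in seen:
            continue  # flow has no application data, or repeats an earlier flow
        seen.add(body)
        cleaned[key] = body
    return cleaned


def to_image(body: bytes, n: int, rng: random.Random) -> List[List[int]]:
    """Substeps (2.3)-(2.4): fixed-length bytes laid out as an n x n grayscale matrix."""
    size = n * n
    data = body[:size]
    # Assumed padding scheme: fill the tail with random bytes up to n*n.
    data += bytes(rng.randrange(256) for _ in range(size - len(data)))
    return [list(data[r * n:(r + 1) * n]) for r in range(n)]
```

Each byte becomes one pixel value in the range 0 to 255, so a flow shorter than N*N bytes keeps its real content at the top of the image and carries the random padding at the bottom.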
4. The automatic flow data detection method based on deep learning technology as claimed in claim 1, characterized in that step three comprises the following substeps:
(3.1) the flow data picture in IDX format is taken as the input of the two-dimensional CNN network model.
(3.2) the parameters of the deep network model are initialized, wherein the network structure adopts a stack of three CNN layers, a Dropout layer is added after each CNN layer to prevent overfitting of the model, and a Flatten layer is then added to reduce the dimensionality of the two-dimensional image flow data and output it.
(3.3) an attention mechanism is added after the convolution module of the deep neural network; the attention mechanism takes the weights obtained by training in the CNN and directly weights the global information over space or channels to form the input features, that is, the attention filter applies attention weights to a group of flow bytes with window width h and computes new features.
(3.4) finally, a softmax layer is added to the deep neural network; the output of the convolutional layer is divided into a plurality of non-overlapping two-dimensional matrices of the same size and smaller dimensions, which are then pooled by mean or maximum value to obtain the output of the subsampling layer.
(3.5) in the training process, N classes are first randomly selected from the base-class flow data set, a basic support set and a basic query set are sampled from the data samples of these classes, and the deep neural network model is trained with the support set as training samples according to the training task target, so as to minimize the recognition loss of the model on the flow samples in the query set.
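The episode construction described in substep (3.5) can be sketched in Python as follows. This covers only the sampling of the support and query sets, not the CNN or its loss; the function name `sample_episode` and the parameters `n_way`, `k_support` and `k_query` are hypothetical, and the per-class sample counts are assumptions since the claim does not specify them.

```python
import random
from typing import Dict, List, Tuple

Sample = Tuple[bytes, str]  # (flow image bytes, class label)


def sample_episode(
    dataset: Dict[str, List[bytes]],
    n_way: int,
    k_support: int,
    k_query: int,
    rng: random.Random,
) -> Tuple[List[Sample], List[Sample]]:
    """Pick n_way classes at random, then split samples of each into support and query."""
    need = k_support + k_query
    # Only classes with enough samples can supply both sets without overlap.
    eligible = sorted(c for c, items in dataset.items() if len(items) >= need)
    if len(eligible) < n_way:
        raise ValueError("not enough classes with sufficient samples")
    classes = rng.sample(eligible, n_way)

    support: List[Sample] = []
    query: List[Sample] = []
    for label in classes:
        # Sampling without replacement keeps support and query disjoint.
        picked = rng.sample(dataset[label], need)
        support += [(x, label) for x in picked[:k_support]]
        query += [(x, label) for x in picked[k_support:]]
    return support, query
```

In a training loop, the model would be fitted on `support` and its loss measured on `query` for each episode, which is the quantity substep (3.5) seeks to minimize.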
5. The automatic flow data detection method based on deep learning technology as claimed in claim 1, characterized in that step four comprises the following substeps:
(4.1) the trained deep learning model is deployed on the target server.
(4.2) the system flow acquisition module is started and continuously collects the network flow conditions of each branch; because the flow data of the scene is huge, the processing load of the system is kept within the forwarding threshold.
(4.3) the trained deep neural network model is loaded in the system flow analysis module, and the traffic of the application scene is divided into black security threat flow, grey uncertain flow and white safe flow; the white flow is screened out so that most normal traffic is excluded, and Trojans carried by the black threat flow are identified directly by the detection technique; for the grey flow, possible threats are inferred as far as possible for further comprehensive comparison.
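The black, grey and white division of substep (4.3) can be illustrated with a short Python sketch. It assumes, purely for illustration, that the model yields a single threat probability per flow and that two cut-off values separate the three classes; the claim does not state how the classes are derived from the model output, and the names `triage`, `split_traffic`, `black_at` and `white_at` as well as the threshold values are hypothetical.

```python
from typing import Dict, List


def triage(p_threat: float, black_at: float = 0.9, white_at: float = 0.1) -> str:
    """Map a threat probability to 'black', 'grey' or 'white' (thresholds are assumed)."""
    if p_threat >= black_at:
        return "black"   # security threat flow: handled directly
    if p_threat <= white_at:
        return "white"   # safe flow: screened out as normal traffic
    return "grey"        # uncertain flow: kept for further comparison


def split_traffic(scores: Dict[str, float]) -> Dict[str, List[str]]:
    """Bucket flow identifiers by triage class for admission control."""
    buckets: Dict[str, List[str]] = {"black": [], "grey": [], "white": []}
    for flow_id, p in scores.items():
        buckets[triage(p)].append(flow_id)
    return buckets
```

With this layout, the white bucket is dropped from further analysis, the black bucket is blocked or inspected for Trojans, and only the grey bucket needs the more expensive follow-up comparison.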
6. An automatic flow data detection device based on deep learning technology, characterized in that: the device comprises a web service interface 1900, which includes a processing component 1922 and memory resources, represented by a memory 1932, for storing instructions executable by the processing component 1922, the processing component 1922 being configured to execute the instructions to perform the method described above.
7. The automatic flow data detection device based on deep learning technology as claimed in claim 6, characterized in that: the processing component 1922 comprises a flow cleaning component, a flow conversion component and a flow detection component.
8. The automatic flow data detection device based on deep learning technology as claimed in claim 6, characterized in that: the application programs stored in the memory 1932 include one or more modules, each of which corresponds to a set of instructions.
9. The automatic flow data detection device based on deep learning technology as claimed in claim 6, characterized in that: the web service interface 1900 further includes a power component 1926, a wired or wireless network listen interface 1950 and an output interface 1958, the power component 1926 being configured to perform power management of the web service interface 1900, the network listen interface 1950 being configured to connect the device 1900 to a network, and the web service interface 1900 being operable based on an operating system stored in the memory 1932.
CN202011446352.1A 2020-12-09 2020-12-09 Deep learning technology-based automatic flow data detection method and device Pending CN112468509A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011446352.1A CN112468509A (en) 2020-12-09 2020-12-09 Deep learning technology-based automatic flow data detection method and device

Publications (1)

Publication Number Publication Date
CN112468509A true CN112468509A (en) 2021-03-09

Family

ID=74801424

Country Status (1)

Country Link
CN (1) CN112468509A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115473850A (en) * 2022-09-14 2022-12-13 电信科学技术第十研究所有限公司 Real-time data filtering method and system based on AI and storage medium
CN115473850B (en) * 2022-09-14 2024-01-05 电信科学技术第十研究所有限公司 AI-based real-time data filtering method, system and storage medium
CN118101348A (en) * 2024-04-26 2024-05-28 南京理工大学 Bad website flow slicing behavior-oriented detection and processing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210309