CN112468509A - Deep learning technology-based automatic flow data detection method and device - Google Patents
Deep learning technology-based automatic flow data detection method and device
- Publication number: CN112468509A
- Application number: CN202011446352.1A
- Authority: CN (China)
- Prior art keywords: data, flow, traffic, network, layer
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24568—Data stream processing; Continuous queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a method and a device for automatically detecting traffic data based on deep learning. The detection method comprises the following steps. Step one: the system monitors and captures data streams at the data link layer, uses a web service to support access to external traffic data, and cleans the captured raw data streams to remove redundant and unnecessary information. Step two: after the target data traffic has been captured as packets, the raw data stream is preprocessed, that is, converted from raw traffic into input data for a deep neural network. Because security-threat traffic carries out its destructive tasks through network interaction, the invention applies traffic detection with the data stream as its basic object of study. Once data link layer traffic has been captured, automatically selecting and analyzing the statistical characteristics of data stream transmission effectively reduces the processing load of the system and provides strong support for identifying network security threats.
Description
Technical Field
The invention relates to the technical field of traffic detection, and in particular to a method and a device for automatically detecting traffic data based on deep learning.
Background
With the continuous development of information technology, internet data has gradually become an important basic resource in people's lives, and the network security that accompanies it faces increasingly serious challenges. Network traffic detection is one of the most important protection technologies: it identifies abnormal network behavior by establishing a baseline of network access behavior, is broadly applicable, and is widely used in intrusion detection and in detecting network attacks, fraud and data theft. However, most conventional automatic traffic detection methods focus on inspecting specific packet payloads, matching against a Trojan signature library, or classifying by network protocol. These techniques rely on the judgment of Trojan detection experts and generalize poorly, so they struggle with increasingly complex Trojan techniques and network environments, achieve low detection accuracy and lack practicality. The method and device for detecting and processing traffic data based on deep learning support automated processing and intelligent threat detection of traffic data, classify traffic as security-threat traffic, uncertain traffic or safe traffic, and can provide strong support for identifying traffic threats.
Disclosure of Invention
The invention aims to provide a method and a device for automatically detecting traffic data based on deep learning, which solve the problems described in the background.
To achieve this purpose, the invention provides the following technical solution: an automatic traffic data detection method based on deep learning, comprising the following steps:
Step one: the system monitors and captures data streams at the data link layer, uses a web service to support access to external traffic data, and cleans the captured raw data streams to remove redundant and unnecessary information;
Step two: after the target data traffic has been captured as packets, the raw data stream is preprocessed, that is, converted from raw traffic into input data for a deep neural network;
Step three: the traffic data is processed by a deep learning method in two phases, training and identification; security-threat traffic, uncertain traffic and safe traffic are identified, and traffic admission control is realized;
Step four: the trained model is used to detect network traffic, identify security-threat traffic, uncertain traffic and safe traffic, and realize traffic admission control.
As a preferred embodiment of the present invention, step one includes the following sub-steps:
(1.1) For data packets flowing through the network system, the network card at the network egress of the system is set to promiscuous mode, in which it accepts all traffic packets passing through the local host. The packet acquisition service works as follows:
First: initialize the packet acquisition environment, and set the size of captured packets, the number of available CPUs and the size of the thread pool;
Second: establish a system memory buffer and copy received packets into the socket buffer, which the user-space layer accesses by calling the system function mmap;
Third: poll each port continuously in a multithreaded loop to receive traffic packets, and filter out redundant information that does not belong to the network with a custom packet-processing function;
Fourth: finally, output a standard set of traffic packets in pcap format.
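The third and fourth items of sub-step (1.1) can be illustrated with a minimal sketch that filters frames and writes the survivors in the classic pcap file layout. It is not the patented capture service: the promiscuous-mode capture and the mmap ring buffer are left out, and `belongs_to_network`, `write_pcap`, `make_frame`, the snapshot length and the 10.0.0.0/8 network are all hypothetical choices made for the example.

```python
import io
import ipaddress
import struct

# Classic pcap global header: magic, version 2.4, timezone, accuracy,
# snapshot length (assumed 65535 here), link type 1 (Ethernet).
PCAP_GLOBAL_HEADER = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)


def belongs_to_network(frame: bytes, network: ipaddress.IPv4Network) -> bool:
    """Hypothetical custom filter: keep IPv4 frames whose source or
    destination address lies inside the monitored network."""
    if len(frame) < 34 or frame[12:14] != b"\x08\x00":  # EtherType IPv4
        return False
    src = ipaddress.IPv4Address(frame[26:30])
    dst = ipaddress.IPv4Address(frame[30:34])
    return src in network or dst in network


def write_pcap(fh, frames, network) -> int:
    """Write (timestamp, frame) pairs that pass the filter to the binary
    file object fh and return how many frames were kept."""
    kept = 0
    fh.write(PCAP_GLOBAL_HEADER)
    for ts, frame in frames:
        if not belongs_to_network(frame, network):
            continue
        sec = int(ts)
        usec = int((ts - sec) * 1_000_000)
        # Per-packet record header: seconds, microseconds, stored and original length.
        fh.write(struct.pack("<IIII", sec, usec, len(frame), len(frame)))
        fh.write(frame)
        kept += 1
    return kept


def make_frame(src: str, dst: str) -> bytes:
    """Build a toy frame: Ethernet header plus the first 20 bytes of an
    IPv4 header with only the address fields filled in."""
    eth = bytes(12) + b"\x08\x00"
    ip_header = bytes(12) + ipaddress.IPv4Address(src).packed + ipaddress.IPv4Address(dst).packed
    return eth + ip_header


local = ipaddress.IPv4Network("10.0.0.0/8")
buf = io.BytesIO()
kept = write_pcap(
    buf,
    [(1.5, make_frame("10.0.0.5", "8.8.8.8")), (2.0, make_frame("172.16.0.1", "8.8.8.8"))],
    local,
)
```

Writing to a file object rather than a path keeps the sketch easy to exercise in memory; a real deployment would pass a file opened in binary mode.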
(1.2) For externally provided data packets, an API callback interface is provided to the external data source, a traffic data set is acquired through the web service, and the data traffic of a single application, comprising the data streams of all protocol layers of that application, is extracted from the set.
(1.3) The data traffic is compared one by one with the traffic sets already in the system:
if the data packet already exists in the system, the traffic generated by the same application is merged, and the version with the latest timestamp overwrites the older version of the data stream;
if the application of the data stream does not exist in the system, the data stream is imported into the system as new data, and the version number of the associated data table is recorded;
if the data table cannot be retrieved in the system, its version number is set to 0 and the data is imported into the system as new data traffic.
(1.4) Steps (1.2) and (1.3) are repeated until all data traffic sets to be processed have been imported into the system.
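The comparison and import logic of sub-steps (1.2) to (1.4) can be sketched as follows. This is an illustration under stated assumptions, not the patented data model: the `AppTraffic` record, the in-memory `store` dictionary and the rule that a merge increments the version number are all hypothetical, and the two "not found" cases of the text are collapsed into a single import with version 0.

```python
from dataclasses import dataclass, field


@dataclass
class AppTraffic:
    app: str                                     # application that produced the traffic
    timestamp: float                             # capture time of this version
    streams: dict = field(default_factory=dict)  # protocol layer -> raw bytes
    version: int = 0                             # version number of the data table


def import_traffic(store: dict, incoming: AppTraffic) -> AppTraffic:
    """Import one application's traffic into store (app name -> AppTraffic)."""
    existing = store.get(incoming.app)
    if existing is None:
        # Application not in the system yet: import as new data, version 0.
        incoming.version = 0
        store[incoming.app] = incoming
        return incoming
    # Same application: merge, letting the latest timestamp overwrite old streams.
    older, newer = sorted((existing, incoming), key=lambda t: t.timestamp)
    merged = AppTraffic(
        app=incoming.app,
        timestamp=newer.timestamp,
        streams={**older.streams, **newer.streams},
        version=existing.version + 1,  # assumption: every merge bumps the version
    )
    store[incoming.app] = merged
    return merged


def import_all(store: dict, batch: list) -> None:
    """Sub-step (1.4): repeat the import until the whole batch is processed."""
    for item in batch:
        import_traffic(store, item)
```

For example, importing a capture of an application at time 1 and then another at time 2 leaves one record whose streams come from the later capture wherever both contain the same protocol layer.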
As a preferred embodiment of the present invention, step two includes the following sub-steps:
(2.1) For the input traffic data, all packets sharing the same five-tuple <source IP, source port, destination IP, destination port, transport layer protocol> are selected, and the raw traffic data is grouped accordingly.
(2.2) The grouped traffic data is cleaned as follows:
First: compare the session contents of all grouped flows and remove duplicate flows with identical contents;
Second: remove empty packets that carry no application layer data, such as ACK (acknowledgement) packets;
Third: select traffic packets by random sampling, randomly generate a new set of MAC addresses and IP addresses, and use them to replace the MAC addresses at the data link layer and the IP addresses at the IP layer.
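Sub-steps (2.1) and (2.2) can be sketched on already-parsed packets. The `Packet` record and the function names are hypothetical, and the model is simplified: MAC addresses are not represented, and every flow is re-addressed instead of a random sample of packets.

```python
import random
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str
    payload: bytes  # application layer data; empty for a bare ACK


def group_by_five_tuple(packets):
    """Sub-step (2.1): group packets that share the same five-tuple."""
    flows = defaultdict(list)
    for p in packets:
        flows[(p.src_ip, p.src_port, p.dst_ip, p.dst_port, p.proto)].append(p)
    return dict(flows)


def clean_flows(flows):
    """Sub-step (2.2), first and second items: drop packets without
    application data, then drop flows whose content duplicates an earlier flow."""
    cleaned, seen = {}, set()
    for key, pkts in flows.items():
        pkts = [p for p in pkts if p.payload]
        content = b"".join(p.payload for p in pkts)
        if not pkts or content in seen:
            continue
        seen.add(content)
        cleaned[key] = pkts
    return cleaned


def random_ip(rng: random.Random) -> str:
    return ".".join(str(rng.randrange(1, 255)) for _ in range(4))


def anonymize(flows, rng: random.Random):
    """Sub-step (2.2), third item (simplified): give each flow freshly generated
    IP addresses. Key collisions between random addresses are ignored here."""
    out = {}
    for (_, sport, _, dport, proto), pkts in flows.items():
        new_src, new_dst = random_ip(rng), random_ip(rng)
        out[(new_src, sport, new_dst, dport, proto)] = [
            p._replace(src_ip=new_src, dst_ip=new_dst) for p in pkts
        ]
    return out
```

Replacing addresses keeps the model from memorizing particular hosts, which appears to be the intent of the random address substitution in the text.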
(2.3) The cleaned traffic packets are converted into image input for the neural network. If the image size is N × N, a data stream segment is extracted every k packet lengths and padded at its tail with obfuscation strings up to the N × N bytes the image requires, which also increases the randomness of the traffic samples.
(2.4) The uniform-length files are converted into traffic data pictures in binary form, where a grayscale picture has two dimensions (width and height) and a color picture has three (width, height and channel). Finally, the pictures are converted into the IDX file format, which contains the pixel information and statistical information of the traffic data and serves as the input of the neural network.
(2.5) Steps (2.1) to (2.4) are repeated until all data traffic sets to be processed have been handled.
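The conversion in sub-steps (2.3) and (2.4) can be sketched for the grayscale case. The sketch assumes that one byte maps to one pixel, that the padding consists of random bytes, and that the IDX layout is the one used by the MNIST image files (two zero bytes, a type code, the number of dimensions, then big-endian dimension sizes); the text does not specify these details, and `flow_to_image` and `to_idx` are hypothetical names.

```python
import random
import struct


def flow_to_image(data: bytes, n: int, rng: random.Random) -> list:
    """Trim or pad a flow segment to n*n bytes and reshape it into
    n rows of n grayscale pixel values (0 to 255)."""
    size = n * n
    padding = bytes(rng.randrange(256) for _ in range(max(0, size - len(data))))
    fixed = data[:size] + padding
    return [list(fixed[r * n:(r + 1) * n]) for r in range(n)]


def to_idx(images: list) -> bytes:
    """Serialize equally sized grayscale images as an IDX3 unsigned-byte file."""
    count, rows, cols = len(images), len(images[0]), len(images[0][0])
    # Header: 0x0000, type code 0x08 (unsigned byte), 3 dimensions, then the sizes.
    header = struct.pack(">HBBIII", 0, 0x08, 3, count, rows, cols)
    pixels = bytes(px for img in images for row in img for px in row)
    return header + pixels
```

A color picture would add a channel dimension, which in IDX terms means a fourth dimension size in the header.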
As a preferred embodiment of the present invention, step three includes the following sub-steps:
(3.1) The traffic data pictures in IDX format are used as the input of a two-dimensional CNN model.
(3.2) The parameters of the deep network model are initialized. The network structure is a stack of three CNN layers; a Dropout layer is added after each CNN layer to prevent the model from overfitting, and a Flatten layer is then added to reduce the dimensionality of the two-dimensional image traffic data before it is output.
(3.3) An attention mechanism is added after the convolution module of the deep neural network. The attention mechanism takes the weights learned during training of the CNN and applies them directly to the global information over space or channels to produce the input features; that is, the attention filter adds attention weights to a group of traffic bytes within a window of width h and operates on them to obtain new features.
(3.4) Finally, a softmax layer is added to the deep neural network. The output of the convolutional layer is divided into several non-overlapping two-dimensional matrices of equal, smaller size, and each is pooled by its mean or maximum value to give the output of the subsampling layer.
(3.5) In the training process, N classes are first selected at random from the base-class traffic data set, and a basic support set and a basic query set are sampled from the data samples of these classes. According to the training task objective, the deep neural network model is trained with the support set as training samples so as to minimize the model's recognition loss on the traffic samples in the query set.
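The sampling described in sub-step (3.5) resembles the episode construction used in few-shot learning, and can be sketched as below. The parameter names `n_way`, `k_support` and `k_query` follow that convention and are assumptions, as is the requirement that a sample never appears in both sets; the text does not give the set sizes.

```python
import random


def sample_episode(dataset: dict, n_way: int, k_support: int, k_query: int,
                   rng: random.Random):
    """Build one training episode from dataset (class label -> list of samples).

    Picks n_way classes at random, then draws disjoint support and query
    samples for each picked class.
    """
    classes = rng.sample(sorted(dataset), n_way)
    support, query = [], []
    for label in classes:
        picked = rng.sample(dataset[label], k_support + k_query)
        support += [(x, label) for x in picked[:k_support]]
        query += [(x, label) for x in picked[k_support:]]
    return support, query
```

In a training loop, the model would be fitted on `support` and its loss evaluated on `query` for each episode, which matches the stated objective of minimizing recognition loss on the query set.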
As a preferred embodiment of the present invention, step four includes the following sub-steps:
(4.1) The trained deep learning model is deployed on the target server.
(4.2) The traffic acquisition module of the system is started and continuously collects the network traffic of each branch. Because the volume of traffic data in the scenario is very large, the processing load of the system is kept within the forwarding threshold.
(4.3) The trained deep neural network model is loaded in the traffic analysis module of the system, and the traffic of the application scenario is divided into black security-threat traffic, gray uncertain traffic and white safe traffic. The white traffic is filtered out, which excludes most normal traffic, and the Trojans carried by the black threat traffic are identified directly by detection techniques. For the gray traffic, possible threats are inferred as far as possible for further comprehensive comparison.
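The three-way split of sub-step (4.3) can be sketched as a confidence rule on the model output. The text does not say how gray traffic is decided, so the rule here is an assumption: the model is taken to emit two logits ordered as (threat, safe), and any flow whose softmax probability falls below a hypothetical threshold of 0.9 for both classes is treated as uncertain.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def triage(logits, threshold: float = 0.9) -> str:
    """Map (threat, safe) logits to 'black', 'white' or 'gray'."""
    p_threat, p_safe = softmax(logits)
    if p_threat >= threshold:
        return "black"  # security-threat traffic
    if p_safe >= threshold:
        return "white"  # safe traffic, filtered out
    return "gray"       # uncertain traffic, kept for further comparison


def route(scored_flows):
    """Split (flow_id, logits) pairs into the three traffic categories."""
    buckets = {"black": [], "gray": [], "white": []}
    for flow_id, logits in scored_flows:
        buckets[triage(logits)].append(flow_id)
    return buckets
```

Raising the threshold sends more traffic to the gray bucket, trading analyst workload for fewer confident misclassifications.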
The invention also relates to a traffic detection device based on deep learning. The device comprises a web service interface, which comprises a processing component and memory resources, represented by a memory, for storing instructions executable by the processing component; the processing component is configured to execute the instructions to perform the above method.
The processing component comprises a traffic cleaning component, a traffic conversion component and a traffic detection component.
The application program stored in the memory comprises one or more modules, each corresponding to a set of instructions.
The web service interface may further comprise a power supply component configured to perform power management of the web service interface, a wired or wireless network listening interface configured to connect the device to a network, and an output interface. The web service interface can operate based on an operating system stored in the memory.
Compared with the prior art, the invention has the following beneficial effects:
1. Because security-threat traffic carries out its destructive tasks through network interaction, the invention applies traffic detection with the data stream as its basic object of study. Once data link layer traffic has been captured, automatically selecting and analyzing the statistical characteristics of data stream transmission effectively reduces the processing load of the system and provides strong support for identifying network security threats.
2. The deep learning model does not depend on any particular Trojan signature library and does not need statistical features to be extracted from the training samples or the traffic under test. It is compatible with encrypted protocols and weak-feature protocols and, while maintaining high accuracy, obtains data features without relying on expert experience, which enables automated processing.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a schematic diagram of a traffic detection device based on deep learning according to the present invention;
FIG. 2 is an architecture diagram of traffic detection based on deep learning according to the present invention;
FIG. 3 is a diagram of traffic data processing according to the present invention;
FIG. 4 is a flow chart of traffic data detection according to the present invention.
In the drawings: 1900. web service interface; 1922. processing component; 1926. power supply component; 1932. memory; 1950. wired or wireless network listening interface; 1958. output interface.
Detailed Description
To make the technical means, inventive features, objectives and effects of the invention easy to understand, the invention is further described below with reference to specific embodiments.
Referring to FIGS. 1 to 4, the present invention provides a technical solution: an automatic traffic data detection method based on deep learning, comprising the following steps:
Step one: the system monitors and captures data streams at the data link layer, uses a web service to support access to external traffic data, and cleans the captured raw data streams to remove redundant and unnecessary information. To reduce its own processing load, the system also supports cross-system traffic data access: API callback interfaces are provided for externally accessed traffic, and the web service obtains traffic data from different sources. Cleaning the captured raw data streams removes redundant and unnecessary information and keeps the traffic data consistent;
Step two: after the target data traffic has been captured as packets, the raw data stream is preprocessed, that is, converted from raw traffic into input data for the deep neural network. The system analyzes the data traffic in depth. To avoid packet confusion caused by differing maximum transmission unit (MTU) sizes or by the fragmentation function of the transport protocol, it supports reassembling and ordering data fragments, and it converts the data stream into an image by a data encoding and modeling method for use as neural network input;
Step three: the traffic data is processed by a deep learning method in two phases, training and identification; security-threat traffic, uncertain traffic and safe traffic are identified, and traffic admission control is realized. The system trains a deep neural network model. Model training learns from and accumulates large volumes of traffic data automatically and, by extracting the abstract features of different types of traffic, gives the system the ability to learn traffic data quickly;
Step four: the trained model is used to detect network traffic, identify security-threat traffic, uncertain traffic and safe traffic, and realize traffic admission control. Using the identification function of its traffic detection engine, the system is deployed at the gateway egress and, according to the trained deep learning model, automatically classifies network traffic as black security-threat traffic, gray uncertain traffic or white safe traffic. Various detection techniques are then used to identify Trojans directly or to filter out the whitelisted part of the traffic, and the results are fed back continuously to the trained model to strengthen the system's traffic detection capability.
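The reassembly and ordering of data fragments mentioned in step two can be sketched as follows. This is a simplified illustration, not the patented procedure: each fragment is a hypothetical `(offset, more_fragments, data)` tuple with the offset counted in bytes (real IPv4 headers count it in 8-byte units), and a datagram with gaps, overlaps or duplicate fragments is rejected rather than repaired.

```python
def reassemble(fragments):
    """Rebuild one datagram from fragments that may arrive out of order.

    fragments: iterable of (offset, more_fragments, data) tuples.
    Returns the payload bytes, or None if the datagram is incomplete.
    """
    ordered = sorted(fragments, key=lambda f: f[0])
    payload = bytearray()
    for offset, _more, data in ordered:
        if offset != len(payload):
            return None  # gap, overlap or duplicate: do not guess
        payload += data
    if not ordered or ordered[-1][1]:
        return None  # the final fragment (more_fragments False) never arrived
    return bytes(payload)
```

Reassembling before grouping means a flow's image is built from the original datagram contents rather than from however the path MTU happened to split them.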
Referring to FIG. 1, a traffic detection device based on deep learning includes a web service interface 1900. The web service interface 1900 includes a processing component 1922 and memory resources, represented by a memory 1932, for storing instructions executable by the processing component 1922; the processing component 1922 is configured to execute the instructions to perform the above method. The processing component 1922 includes a traffic cleaning component, a traffic conversion component and a traffic detection component. The application program stored in the memory 1932 includes one or more modules, each corresponding to a set of instructions. The web service interface 1900 may further include a power supply component 1926, a wired or wireless network listening interface 1950, and an output interface 1958. The power supply component 1926 is configured to perform power management of the web service interface 1900, and the wired or wireless network listening interface 1950 is configured to connect the device to a network. The web service interface 1900 can operate based on an operating system stored in the memory 1932, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™ or the like.
In summary, in view of the characteristic that security threat traffic realizes a destruction task through network interaction, the invention adopts a flow detection technology to take data flow as a basic research object, after the captured data link layer traffic, the invention can effectively reduce the processing burden of the system by automatically selecting and analyzing the statistical characteristics of data flow transmission, and provides powerful support for determining network security threat.
While there have been shown and described what are at present considered the fundamental principles and essential features of the invention and its advantages, it will be apparent to those skilled in the art that the invention is not limited to the details of the foregoing exemplary embodiments, but is capable of other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although this description is organized by embodiments, not every embodiment contains only a single independent technical solution; the description is written this way for clarity only. Those skilled in the art should treat the description as a whole, and the technical solutions in the embodiments may be combined as appropriate to form other embodiments that those skilled in the art can understand.
Claims (9)
1. An automatic flow data detection method based on deep learning technology, characterized in that the detection method comprises the following steps:
step one: the system monitors and captures data streams at the data link layer, uses a web service to support access to externally supplied flow data, and cleans the captured raw data streams to remove redundant and unnecessary information;
step two: after the target data traffic has been captured and grouped, the raw data stream is preprocessed, that is, converted from raw traffic into input data for the deep neural network;
step three: the flow data is processed by a deep learning method in two phases, training and identification, so that security threat flow, uncertain flow and safe flow are identified and flow admission control is realized;
step four: network flow is detected using the trained model, security threat flow, uncertain flow and safe flow are identified, and flow admission control is realized.
2. The automatic flow data detection method based on deep learning technology as claimed in claim 1, wherein step one comprises the following substeps:
(1.1) for data packets flowing through the network system, the network card at the network egress of the system is set to promiscuous mode, in which it accepts all traffic data packets passing through the local host; the data packet acquisition service works as follows:
first: initialize the packet acquisition environment, setting the size of captured packets, the number of available CPUs and the size of the thread pool;
second: establish a system memory buffer and copy received packets into the socket buffer, which the user layer accesses by calling the system function mmap;
third: poll each port in a loop using multiple threads to receive traffic data packets, and filter out redundant information that does not belong to the network using a custom packet processing function;
fourth: finally output a standard set of traffic data packets in pcap format.
(1.2) for externally provided data packets, an API callback interface is provided to the external data source, a traffic data set is acquired through the web service, and the data traffic of a single application is extracted from the traffic set, this data traffic comprising the data streams of all protocol layers of the application.
(1.3) the data flow is compared one by one with the flow sets already in the system,
if the data packet already exists in the system, the traffic generated by the same application is merged, and the version with the latest timestamp overwrites the old version of the data stream.
If the application that produced the data stream does not exist in the system, the data stream is imported into the system as new data, and the version number of the associated data table is recorded;
if the data table cannot be retrieved in the system, the version number of the data table is set to 0, and the data is imported into the system as new data traffic.
(1.4) the operations of steps (1.2) and (1.3) are repeated until all data traffic sets to be processed have been imported into the system.
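The merge rule of substep (1.3) can be sketched as follows, as an illustration only. The `FlowRecord` type, the in-memory dictionary used as the store, and the choice to increment the version number on each overwrite are assumptions made for this example; the claim only states that a first import uses version 0 and that the latest timestamp overwrites older data.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class FlowRecord:
    app: str          # application that generated the traffic
    timestamp: float  # capture time, e.g. seconds since the epoch
    payload: bytes    # captured data stream
    version: int = 0  # data-table version number (0 on first import)


def import_flow(store: Dict[str, FlowRecord], incoming: FlowRecord) -> FlowRecord:
    """Import one flow into `store`, keeping the newest version per application."""
    existing = store.get(incoming.app)
    if existing is None:
        # Unknown application: import as new data with version 0.
        store[incoming.app] = replace(incoming, version=0)
    elif incoming.timestamp > existing.timestamp:
        # Known application: the newer timestamp overwrites the old stream.
        # Incrementing the version here is an assumption of this sketch.
        store[incoming.app] = replace(incoming, version=existing.version + 1)
    # Older or equal timestamps leave the stored record unchanged.
    return store[incoming.app]
```

Calling `import_flow` once per extracted application flow corresponds to the loop described in substep (1.4).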
3. The automatic flow data detection method based on deep learning technology as claimed in claim 1, wherein step two comprises the following substeps:
(2.1) for the input flow data, all data packets sharing the same five-tuple <source IP, source port, destination IP, destination port, transport layer protocol> are selected, and the raw flow data is grouped accordingly.
(2.2) the grouped flow data is cleaned as follows:
first: compare the session contents of all grouped flows and remove duplicate flows with identical contents;
second: remove empty traffic packets that carry no application-layer data, such as ACK (acknowledgement) packets;
third: select traffic data packets by random sampling, randomly generate a new set of MAC and IP addresses, and use them to replace the MAC addresses at the data link layer and the IP addresses at the IP layer.
(2.3) the cleaned flow data packets are converted into image input for the neural network: if the image size is N × N, a data-flow segment of n bytes is extracted every k packet lengths, and N × N − n obfuscation characters are appended at the end to increase the randomness of the flow samples.
(2.4) the uniform-length files are converted into flow data images in binary form, where a grayscale image has two dimensions (width and height) and a color image has three (width, height and channel); finally, the images are converted into the IDX file format, which contains the pixel information and statistical information of the flow data and serves as the input of the neural network.
(2.5) the operations of steps (2.1) to (2.4) are repeated until all data traffic sets to be processed have been handled.
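The grouping and image-conversion substeps above can be sketched roughly as follows. This is a simplified illustration: the `Packet` type, the default image side of 28 pixels and the helper names are assumptions, the sampling interval k is not modeled, and writing the result to an IDX file is left out.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple


class Packet(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str
    payload: bytes  # application-layer bytes


FlowKey = Tuple[str, int, str, int, str]


def group_by_five_tuple(packets: Iterable[Packet]) -> Dict[FlowKey, List[Packet]]:
    """Group packets by five-tuple, skipping packets with an empty payload."""
    flows: Dict[FlowKey, List[Packet]] = defaultdict(list)
    for pkt in packets:
        if not pkt.payload:  # e.g. a bare ACK with no application data
            continue
        flows[pkt[:5]].append(pkt)  # the first five fields form the key
    return dict(flows)


def flow_to_image(packets: Iterable[Packet], side: int = 28,
                  rng: Optional[random.Random] = None) -> List[List[int]]:
    """Turn a flow into a side x side grayscale image (one byte per pixel).

    The payload is truncated to side*side bytes; a shorter payload is
    padded at the end with random bytes.
    """
    rng = rng or random.Random()
    size = side * side
    data = b"".join(p.payload for p in packets)[:size]
    data += bytes(rng.randrange(256) for _ in range(size - len(data)))
    return [list(data[r * side:(r + 1) * side]) for r in range(side)]
```

Each row of the returned list is one image row with pixel values in the range 0 to 255, so a color variant would only need an extra channel dimension.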
4. The automatic flow data detection method based on deep learning technology as claimed in claim 1, wherein step three comprises the following substeps:
(3.1) the flow data image in IDX format is taken as the input of a two-dimensional CNN model.
(3.2) the parameters of the deep network model are initialized; the network structure is a stack of three CNN layers, a Dropout layer is added after each CNN layer to prevent the model from overfitting, and a Flatten layer is then added to reduce the dimensionality of the two-dimensional image flow data before output.
(3.3) an attention mechanism is added after the convolution module of the deep neural network; the attention mechanism takes the weights learned in the CNN and directly weights global information over space or channels to form the input features, that is, an attention filter applies attention weights to a group of flow bytes of window width h and operates on them to obtain new features.
(3.4) finally, a softmax layer is added to the deep neural network; the output of the convolutional layer is divided into several non-overlapping two-dimensional matrices of equal size and smaller dimension, which are then pooled by mean or maximum to obtain the output of the subsampling layer.
(3.5) during training, N classes are first randomly selected from the base-class flow data set, a base support set and a base query set are sampled from the data samples of those classes, and the deep neural network model is trained with the support set as training samples according to the training task objective, so as to minimize the model's recognition loss on the flow samples in the query set.
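The episode construction described in substep (3.5) can be sketched as below. This is an illustrative sketch of N-way sampling only; the function name and the `k_shot` and `q_query` parameters (samples per class in the support and query sets) are assumptions, since the source does not state set sizes, and the model update itself is omitted.

```python
import random
from typing import Dict, Hashable, List, Optional, Sequence, Tuple

Sample = Tuple[Hashable, str]  # (flow sample, class label)


def sample_episode(dataset: Dict[str, Sequence[Hashable]], n_way: int,
                   k_shot: int, q_query: int,
                   rng: Optional[random.Random] = None
                   ) -> Tuple[List[Sample], List[Sample]]:
    """Draw one training episode: a support set and a disjoint query set."""
    rng = rng or random.Random()
    # Randomly choose N classes from the base-class data set.
    classes = rng.sample(sorted(dataset), n_way)
    support: List[Sample] = []
    query: List[Sample] = []
    for label in classes:
        # Sample without replacement so support and query do not overlap.
        picked = rng.sample(list(dataset[label]), k_shot + q_query)
        support += [(x, label) for x in picked[:k_shot]]
        query += [(x, label) for x in picked[k_shot:]]
    return support, query
```

A training loop would fit the model on `support` and measure the loss on `query` for each episode, repeating with freshly sampled classes.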
5. The automatic flow data detection method based on deep learning technology as claimed in claim 1, wherein step four comprises the following substeps:
(4.1) the trained deep learning model is deployed on the target server.
(4.2) the system flow acquisition module is started and continuously collects the network traffic of each branch; because the traffic volume in the scenario is huge, the system's processing load is kept within the forwarding threshold.
(4.3) the trained deep neural network model is loaded into the system's flow analysis module, which divides the traffic of the application scenario into black (security threat) traffic, gray (uncertain) traffic and white (safe) traffic; the white traffic is screened out first, which excludes most normal traffic, and Trojans carried in the black threat traffic are identified directly by the detection technique. For the gray traffic, possible threats are inferred as far as possible for further comprehensive comparison.
6. An automatic flow data detection device based on deep learning technology, characterized by comprising a web service interface 1900, wherein the web service interface 1900 comprises a processing component 1922 and memory resources, represented by a memory 1932, for storing instructions executable by the processing component 1922, the processing component 1922 being configured to execute the instructions to perform the method described above.
7. The flow data detection device based on deep learning technology as claimed in claim 6, wherein the processing component 1922 comprises a flow cleaning component, a flow conversion component and a flow detection component.
8. The flow data detection device based on deep learning technology as claimed in claim 6, wherein the application program stored in the memory 1932 includes one or more modules, each corresponding to a set of instructions.
9. The flow data detection device based on deep learning technology as claimed in claim 6, wherein the web service interface 1900 may further include a power component 1926, a wired or wireless network listening interface 1950 and an output interface 1958, the power component 1926 being configured to perform power management of the web service interface 1900, the network listening interface 1950 being configured to connect the device 1900 to a network, and the web service interface 1900 being operable based on an operating system stored in the memory 1932.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011446352.1A CN112468509A (en) | 2020-12-09 | 2020-12-09 | Deep learning technology-based automatic flow data detection method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112468509A (en) | 2021-03-09 |
Family
ID=74801424
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011446352.1A Pending CN112468509A (en) | 2020-12-09 | 2020-12-09 | Deep learning technology-based automatic flow data detection method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112468509A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1953453A (en) * | 2006-10-25 | 2007-04-25 | 北京交通大学 | A system and realization method for high speed capture and quick storage of IPv6 data |
CN108200030A (en) * | 2017-12-27 | 2018-06-22 | 深信服科技股份有限公司 | Detection method, system, device and the computer readable storage medium of malicious traffic stream |
CN110351244A (en) * | 2019-06-11 | 2019-10-18 | 山东大学 | A kind of network inbreak detection method and system based on multireel product neural network fusion |
CN110751261A (en) * | 2018-07-23 | 2020-02-04 | 第四范式(北京)技术有限公司 | Training method and system and prediction method and system of neural network model |
CN111064678A (en) * | 2019-11-26 | 2020-04-24 | 西安电子科技大学 | Network traffic classification method based on lightweight convolutional neural network |
WO2020119481A1 (en) * | 2018-12-11 | 2020-06-18 | 深圳先进技术研究院 | Network traffic classification method and system based on deep learning, and electronic device |
CN111404942A (en) * | 2020-03-18 | 2020-07-10 | 广东技术师范大学 | Vertical malicious crawler flow identification method based on deep learning |
CN111860188A (en) * | 2020-06-24 | 2020-10-30 | 南京师范大学 | Human body posture recognition method based on time and channel double attention |
Non-Patent Citations (3)
Title |
---|
Y. XIN ET AL.: "Machine Learning and Deep Learning Methods for Cybers", IEEE ACCESS, vol. 6 * |
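Luo Fuhua; Zhang Aixin (罗扶华;张爱新): "Research on botnet detection technology based on deep learning" (基于深度学习的僵尸网络检测技术研究), Communications Technology (通信技术), no. 01 * |
Lian Hongfei; Zhang Hao; Guo Wenzhong (连鸿飞;张浩;郭文忠): "Anomalous traffic detection using data augmentation and a hybrid neural network" (一种数据增强与混合神经网络的异常流量检测), Journal of Chinese Computer Systems (小型微型计算机系统), no. 04 * |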
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115473850A (en) * | 2022-09-14 | 2022-12-13 | 电信科学技术第十研究所有限公司 | Real-time data filtering method and system based on AI and storage medium |
CN115473850B (en) * | 2022-09-14 | 2024-01-05 | 电信科学技术第十研究所有限公司 | AI-based real-time data filtering method, system and storage medium |
CN118101348A (en) * | 2024-04-26 | 2024-05-28 | 南京理工大学 | Bad website flow slicing behavior-oriented detection and processing method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20210309 |