CN115606162A - Abnormal flow detection method and system, and computer storage medium - Google Patents

Abnormal flow detection method and system, and computer storage medium

Info

Publication number
CN115606162A
CN115606162A CN202080100505.9A CN202080100505A CN115606162A CN 115606162 A CN115606162 A CN 115606162A CN 202080100505 A CN202080100505 A CN 202080100505A CN 115606162 A CN115606162 A CN 115606162A
Authority
CN
China
Prior art keywords
data
detection
model
flow
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202080100505.9A
Other languages
Chinese (zh)
Inventor
程肯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Shenzhen Huantai Technology Co Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Shenzhen Huantai Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd and Shenzhen Huantai Technology Co Ltd

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/142 Network analysis or design using statistical or mathematical methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/16 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/18 Protocol analysers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/50 Testing arrangements

Abstract

The embodiments of the present application disclose an abnormal flow detection method and system, and a computer storage medium. The abnormal flow detection method comprises: acquiring flow data to be detected, and analyzing the flow data to be detected to obtain target structured data; performing feature extraction processing on the target structured data to obtain target feature data; determining a first detection result corresponding to the target feature data based on a self-coding model and a first detection model, determining a second detection result corresponding to the target feature data based on a second detection model, and determining a third detection result corresponding to the target feature data based on a preset rule base, wherein the self-coding model and the first detection model are models generated based on an unsupervised algorithm, and the second detection model is a model generated based on a supervised algorithm; and generating a target detection result of the flow data to be detected according to the first detection result, the second detection result and the third detection result.

Description

Abnormal flow detection method and system, and computer storage medium
Technical Field
The embodiment of the application relates to the technical field of communication, in particular to an abnormal flow detection method and system and a computer storage medium.
Background
Abnormal traffic is unexpected traffic carried by limited bandwidth resources. Because abnormal traffic reflects, to a certain extent, anomalies that exist in the network, detecting it quickly and accurately is extremely important for network protection.
At present, common abnormal traffic detection methods can be classified into black-and-white-list-based detection, rule-based detection, machine-learning-model-based detection, deep-learning-model-based detection, and the like. However, black-and-white-list-based detection and rule-based detection both require the existing lists and rules to be updated continuously, so they are not well suited to network traffic data with strong real-time requirements. Machine-learning-model-based detection and deep-learning-model-based detection place relatively high demands on the selection of model parameters and on the quality of the training data. As a result, when abnormal traffic detection is performed on network traffic data that is large in volume and has strong real-time requirements, the accuracy of detection cannot be guaranteed and the detection effect is unstable.
Therefore, an efficient and accurate method for detecting abnormal traffic is needed.
Disclosure of Invention
The embodiment of the application provides an abnormal flow detection method and system and a computer storage medium, which can improve the accuracy of abnormal flow detection and effectively improve the detection quality of abnormal flow.
The technical scheme of the embodiment of the application is realized as follows:
In a first aspect, an embodiment of the present application provides an abnormal traffic detection method, where the method includes:
acquiring flow data to be detected, and analyzing the flow data to be detected to obtain target structured data;
carrying out feature extraction processing on the target structured data to obtain target feature data;
determining a first detection result corresponding to the target feature data based on a self-coding model and a first detection model, determining a second detection result corresponding to the target feature data based on a second detection model, and determining a third detection result corresponding to the target feature data based on a preset rule base; wherein the self-coding model is a model generated based on an unsupervised algorithm; the first detection model is a model generated based on an unsupervised algorithm; the second detection model is a model generated based on a supervised algorithm;
and generating a target detection result of the flow data to be detected according to the first detection result, the second detection result and the third detection result.
In a second aspect, an embodiment of the present application provides an abnormal traffic detection method, where the method includes:
acquiring first flow data and extracting feature data corresponding to the first flow data;
encoding the feature data through a self-coding model to obtain encoded data; wherein the self-coding model is a model generated based on an unsupervised algorithm;
training with the encoded data to obtain a first detection model; wherein the first detection model is a model generated based on an unsupervised algorithm;
obtaining a test result corresponding to the first traffic data according to the first detection model and a preset marking strategy;
training and obtaining a second detection model based on the feature data and the test result; wherein the second detection model is a model generated based on a supervised algorithm.
In a third aspect, an embodiment of the present application provides an abnormal traffic detection system, where the abnormal traffic detection system includes: a first acquisition part, an analysis part, a first extraction part, a determining part and a generation part, wherein
the first acquisition part is configured to acquire flow data to be detected;
the analysis part is configured to analyze the flow data to be detected to obtain target structured data;
the first extraction part is configured to perform feature extraction processing on the target structured data to obtain target feature data;
the determining part is configured to determine a first detection result corresponding to the target feature data based on a self-coding model and a first detection model, determine a second detection result corresponding to the target feature data based on a second detection model, and determine a third detection result corresponding to the target feature data based on a preset rule base; wherein the self-coding model is a model generated based on an unsupervised algorithm; the first detection model is a model generated based on an unsupervised algorithm; the second detection model is a model generated based on a supervised algorithm;
the generation part is configured to generate a target detection result of the to-be-detected flow data according to the first detection result, the second detection result and the third detection result.
In a fourth aspect, an embodiment of the present application provides an abnormal traffic detection system, where the abnormal traffic detection system includes: a second acquisition part, a second extraction part, a coding part and a training part, wherein
the second acquisition part is configured to acquire first flow data;
the second extraction part is configured to extract feature data corresponding to the first flow data;
the coding part is configured to encode the feature data through a self-coding model to obtain encoded data; wherein the self-coding model is a model generated based on an unsupervised algorithm;
the second acquisition part is further configured to obtain a first detection model by training with the encoded data, wherein the first detection model is a model generated based on an unsupervised algorithm; and to obtain a test result corresponding to the first flow data according to the first detection model and a preset marking strategy;
the training part is configured to train and obtain a second detection model based on the feature data and the test result; wherein the second detection model is a model generated based on a supervised algorithm.
In a fifth aspect, an embodiment of the present application provides an abnormal traffic detection system, where the abnormal traffic detection system includes a processor and a memory storing instructions executable by the processor, and when the instructions are executed by the processor, the abnormal traffic detection method according to the first aspect is implemented.
In a sixth aspect, the present application provides an abnormal traffic detection system, where the abnormal traffic detection system includes a processor, and a memory storing instructions executable by the processor, and when the instructions are executed by the processor, the abnormal traffic detection method according to the second aspect is implemented.
In a seventh aspect, an embodiment of the present application provides a computer-readable storage medium that stores a program and is applied to an abnormal flow detection system; when executed by a processor, the program implements the abnormal flow detection method according to the first aspect and the second aspect.
The embodiments of the present application provide an abnormal flow detection method and system, and a computer storage medium. The abnormal flow detection system acquires flow data to be detected and analyzes it to obtain target structured data; performs feature extraction processing on the target structured data to obtain target feature data; determines a first detection result corresponding to the target feature data based on the self-coding model and the first detection model, determines a second detection result corresponding to the target feature data based on the second detection model, and determines a third detection result corresponding to the target feature data based on a preset rule base, where the self-coding model and the first detection model are models generated based on an unsupervised algorithm and the second detection model is a model generated based on a supervised algorithm; and generates a target detection result of the flow data to be detected according to the first detection result, the second detection result and the third detection result. That is, in the embodiments of the present application, the abnormal flow detection system trains the self-coding model and the first detection model with an unsupervised algorithm and the second detection model with a supervised algorithm, and then performs abnormal flow detection on the flow data to be detected based on the self-coding model, the first detection model, the second detection model and the preset rule base. The resulting detection result is a risk judgment on the flow data to be detected that combines the unsupervised algorithm, the supervised algorithm and the preset rule base, which can improve the accuracy of abnormal flow detection and effectively improve its detection quality.
Drawings
Fig. 1 is a schematic view of a first implementation flow of an abnormal traffic detection method;
Fig. 2 is a schematic diagram of a system configuration of an abnormal flow detection system;
Fig. 3 is a schematic diagram of a second implementation flow of the abnormal traffic detection method;
Fig. 4 is a schematic view of a third implementation flow of the abnormal traffic detection method;
Fig. 5 is a schematic diagram of a self-encoder;
Fig. 6 is a schematic diagram of the composition of the marking system;
Fig. 7 is a schematic illustration of a marking process;
Fig. 8 is a schematic flow chart of the implementation of the abnormal traffic detection method;
Fig. 9 is a first schematic diagram of the structure of the abnormal flow detection system;
Fig. 10 is a schematic diagram of the structure of the abnormal flow detection system;
Fig. 11 is a schematic diagram of the structure of the abnormal flow detection system.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the specific embodiments described herein merely illustrate the present application and do not limit it. It should be noted that, for convenience of description, only the parts related to the present application are shown in the drawings.
With the rapid development of the internet and the continuous expansion of network scale, the internet has become an indispensable part of human production and life. At the same time, people inevitably encounter network anomalies while enjoying the convenience of the network. Most network anomalies that are common today manifest as anomalies in network traffic, and abnormal traffic, such as that caused by network scanning, DDoS attacks and network worms, reflects the real-time condition of a network fairly comprehensively. Discovering abnormal traffic changes in the network in time is therefore of great significance for locating anomalies in a network data center and for taking subsequent remedial measures.
As an effective means of network protection, abnormal traffic detection can detect unknown network attack behaviors and provides important support for network situation awareness, and it has received increasing attention from researchers in recent years. By protocol, abnormal traffic identification can be divided into full traffic monitoring, encrypted traffic monitoring, industrial control protocol traffic detection, Transmission Control Protocol/Internet Protocol (TCP/IP) traffic monitoring, Domain Name System (DNS) traffic monitoring, Hypertext Transfer Protocol (HTTP) traffic monitoring, and the like; by detection method, it can be divided into black-and-white-list-based detection, rule-based detection, machine-learning-model-based detection, deep-learning-model-based detection, and the like.
At present, there are two common methods for detecting abnormal traffic: one filters abnormal access source IPs based on an Internet Protocol (IP) blacklist, and the other matches abnormal external traffic using rules. Both schemes have limitations. The scheme based on IP blacklist filtering can only identify abnormal traffic whose source IP is already in the IP blacklist, and cannot perceive changes in the source IP. The scheme based on rule matching requires samples to be analyzed one by one, and attackers usually try to bypass the existing rules, which invalidates the original rules. The rule base therefore needs to be updated dynamically, which consumes a great deal of manpower, and the scheme detects unknown threats poorly.
The deep learning technology has outstanding advantages in abnormal flow detection. The deep learning model can take original data as input and can better depict rich information of the data from learned features, thereby improving classification performance. However, for network traffic data with large data volume and strong real-time performance, when abnormal traffic is detected by deep learning, the detection effect of the model is greatly affected by inappropriate parameter selection of the model or poor quality of the selected data. For example, if the number of layers of the selected neural network model is large, the situation of slow convergence may occur in the training process, and if the number of layers of the selected neural network model is small, the network parameters may not be accurately adjusted in the training process, so that a detection model with high accuracy is not easy to obtain.
In summary, the conventional detection methods for abnormal traffic cannot achieve a good detection effect.
In order to solve the above defects, the present application provides an abnormal flow detection method, in which an abnormal flow detection system uses an unsupervised algorithm and a supervised algorithm to respectively train and generate a self-coding model, a first detection model and a second detection model, so that abnormal flow detection can be performed on the to-be-detected flow data based on the self-coding model, the first detection model, the second detection model and a preset rule base, and an obtained detection result is a risk judgment on the to-be-detected flow data, which is realized by combining the unsupervised algorithm, the supervised algorithm and the preset rule base, so that accuracy of abnormal flow detection can be improved, and detection quality of abnormal flow is effectively improved.
Specifically, the abnormal flow detection system can collect and analyze flow data and extract features from it, encode the resulting high-dimensional discrete feature data using a self-coding model, and perform preliminary abnormal flow detection using a first detection model, namely an unsupervised isolation forest algorithm. Combined with a labeling system comprising a preset information base, a regular rule base and an examination pattern base, it then generates more accurate labels, which are used to train a supervised decision tree algorithm to obtain a second detection model. Risk judgment can then be performed on flow data using the self-coding model, the first detection model, the second detection model and the preset rule base, by voting among the unsupervised algorithm, the supervised algorithm and the preset rule base.
An embodiment of the present application provides an abnormal traffic detection method. Fig. 1 is a schematic view of a first implementation flow of the abnormal traffic detection method. As shown in Fig. 1, in the embodiment of the present application, the method by which the abnormal traffic detection system detects abnormal traffic may include the following steps:
Step 101, obtaining flow data to be detected, analyzing the flow data to be detected, and obtaining target structured data.
In the embodiment of the application, when abnormal flow detection is performed, the abnormal flow detection system can acquire the flow data to be detected first, and then can analyze the flow data to be detected, so as to acquire the target structured data corresponding to the flow data to be detected.
It can be understood that, in the embodiment of the present application, the abnormal traffic detection system may perform protocol analysis on the flow data to be detected so as to restore the original network behavior information. Specifically, the abnormal flow detection system can parse the information of both communicating parties out of the flow data to be detected according to the protocol specification. This information can include the source IP, destination IP, source port, destination port, request content, response information and the like, while some useless information can be discarded. Finally, data in a structured format, that is, the target structured data, is generated to facilitate subsequent feature extraction processing.
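The protocol analysis described above can be illustrated with a minimal sketch. It assumes the captured unit is a raw IPv4 packet carrying TCP; the function name `parse_ipv4_tcp` and the keys of the returned record are hypothetical and are not taken from the application, which does not specify a parser or a record layout.

```python
import socket
import struct


def parse_ipv4_tcp(packet: bytes) -> dict:
    """Parse a raw IPv4/TCP packet into a flat structured record (illustrative)."""
    if len(packet) < 20:
        raise ValueError("too short for an IPv4 header")
    # Fixed 20-byte IPv4 header: version/IHL, TOS, total length, id, flags/fragment,
    # TTL, protocol, checksum, source address, destination address.
    ver_ihl, _tos, total_len, _ident, _frag, _ttl, proto, _csum, src, dst = struct.unpack(
        "!BBHHHBBH4s4s", packet[:20]
    )
    if ver_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    if proto != 6:
        raise ValueError("not a TCP segment")
    ip_header_len = (ver_ihl & 0x0F) * 4
    segment = packet[ip_header_len:total_len]
    if len(segment) < 20:
        raise ValueError("too short for a TCP header")
    # First 14 bytes of the TCP header: ports, sequence, acknowledgement,
    # then the 16-bit field holding the data offset and the flag bits.
    src_port, dst_port, _seq, _ack, offset_flags = struct.unpack("!HHIIH", segment[:14])
    tcp_header_len = (offset_flags >> 12) * 4
    return {
        "src_ip": socket.inet_ntoa(src),
        "dst_ip": socket.inet_ntoa(dst),
        "src_port": src_port,
        "dst_port": dst_port,
        "syn": bool(offset_flags & 0x002),
        # Header fields such as TTL and checksum are dropped as "useless information".
        "payload": segment[tcp_header_len:],
    }
```

A production parser would also need to handle IPv6, fragmentation, stream reassembly and application-layer protocols; the sketch only shows how the fields listed above can be recovered from one packet.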
Further, in the embodiment of the application, the abnormal flow detection system can capture the network flow data, so as to obtain the flow data to be detected.
It can be understood that, in the embodiment of the present application, when the abnormal traffic detection system captures the network traffic data, the network traffic data transmitted through gateways and switches may be captured by using devices such as an optical fiber splitter.
It should be noted that, in the embodiment of the present application, the abnormal traffic detection system may be any terminal having communication and storage functions, for example: a tablet Computer, a mobile phone, an electronic reader, a remote controller, a Personal Computer (PC), a notebook Computer, a vehicle-mounted device, a network tv, a wearable device, a Personal Digital Assistant (PDA), a Portable Media Player (PMP), a navigation device, and other terminals.
It should be noted that, in the embodiment of the present application, the flow data to be detected may be network flow data captured by the abnormal flow detection system in real time, specifically, the abnormal flow detection system may directly collect the flow data to be detected from the network card, and may also directly receive the flow data to be detected sent by other systems.
Step 102, performing feature extraction processing on the target structured data to obtain target feature data.
In the embodiment of the application, after analyzing and obtaining the target structured data of the flow data to be detected, the abnormal flow detection system can perform feature extraction processing on the target structured data, so as to obtain the target feature data of the flow data to be detected.
It should be noted that, in the embodiment of the present application, after the abnormal traffic detection system obtains the target structured data through analysis processing, feature extraction may be performed on each piece of data in the target structured data, so as to obtain feature information of each piece of data, and finally, the target feature data may be generated.
Specifically, in the present application, the target feature data may include several types of feature information, such as basic features of the flow corresponding to the flow data to be detected, content features of the flow, statistical features based on a time window, and statistical features of the host that the flow accesses within the time window.
For example, in the embodiment of the present application, each piece of the target feature data may include the following four types of feature information: the first type is the basic features of the flow, including the access duration, the protocol type, the number of bytes sent, and the like; the second type is the content features of the flow, obtained by converting the request content in the flow into text vector features; the third type is the statistical features based on a time window, including the access frequency within the time window, the total number of bytes sent within the time window, the number of connections in which Synchronize Sequence Numbers (SYN) errors occur, and the like; the fourth type is the statistical features of the host accessed within the time window, including the access frequency, the total number of bytes received, the total number of connections in which SYN errors occur, and the like.
That is to say, the abnormal traffic detection system performs feature extraction on each piece of data in the target structured data, obtains the above four types of feature information of the piece of data, and then generates a feature vector of uniform length of the piece of data based on the above four types of feature information.
It can be seen that, in the present application, each of the target feature data is a feature vector with the same length. Namely, after the abnormal flow detection system performs feature extraction on the flow data to be detected, feature vectors with uniform length corresponding to each piece of data in the flow data to be detected can be obtained, so that target feature data are formed.
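As a rough illustration of how the four types of feature information could be combined into a feature vector of uniform length, consider the sketch below. Everything concrete in it is an assumption made for illustration: the `Record` fields, the protocol list, the 8-dimensional hashed text vector and the 60-second window are hypothetical, and the bytes sent to a host are used as a stand-in for the bytes that host receives.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, List

TEXT_DIMS = 8                        # hypothetical size of the hashed text vector
PROTOCOLS = ("tcp", "udp", "icmp")   # hypothetical protocol vocabulary


@dataclass
class Record:
    timestamp: float
    src_ip: str
    dst_ip: str
    protocol: str
    duration: float
    bytes_sent: int
    request: str
    syn_error: bool


def text_vector(text: str, dims: int = TEXT_DIMS) -> List[float]:
    """Hash each whitespace-separated token into one of `dims` counting buckets."""
    vec = [0.0] * dims
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dims] += 1.0
    return vec


def window_stats(history: List[Record], now: float, window: float,
                 match: Callable[[Record], bool]) -> List[float]:
    """Access count, total bytes and SYN-error count over the trailing window."""
    recent = [r for r in history if now - window <= r.timestamp <= now and match(r)]
    return [float(len(recent)),
            float(sum(r.bytes_sent for r in recent)),
            float(sum(r.syn_error for r in recent))]


def extract_features(rec: Record, history: List[Record], window: float = 60.0) -> List[float]:
    # Type 1: basic features (duration, bytes sent, one-hot protocol type).
    basic = [rec.duration, float(rec.bytes_sent)]
    basic += [1.0 if rec.protocol == p else 0.0 for p in PROTOCOLS]
    # Type 2: content features (request content as a text vector).
    content = text_vector(rec.request)
    # Type 3: time-window statistics for the same source.
    by_source = window_stats(history, rec.timestamp, window, lambda r: r.src_ip == rec.src_ip)
    # Type 4: time-window statistics for the accessed host.
    by_host = window_stats(history, rec.timestamp, window, lambda r: r.dst_ip == rec.dst_ip)
    return basic + content + by_source + by_host
```

Because every component has a fixed size, each record yields a vector of the same length (19 values in this sketch) regardless of how long the request content is.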
Step 103, determining a first detection result corresponding to the target feature data based on the self-coding model and the first detection model, determining a second detection result corresponding to the target feature data based on the second detection model, and determining a third detection result corresponding to the target feature data based on a preset rule base; wherein the self-coding model and the first detection model are models generated based on an unsupervised algorithm, and the second detection model is a model generated based on a supervised algorithm.
In an embodiment of the application, after the abnormal traffic detection system performs feature extraction processing on target structured data to obtain target feature data, a first detection result corresponding to the target feature data may be determined based on a self-coding model and a first detection model, a second detection result corresponding to the target feature data may be determined based on a second detection model, and a third detection result corresponding to the target feature data may also be determined based on a preset rule base.
It should be noted that, in the embodiment of the present application, the self-coding model and the first detection model are models generated based on an unsupervised algorithm; the second detection model is a model generated based on a supervised algorithm.
Further, in the embodiment of the application, when obtaining the first detection result, the abnormal flow detection system may first input the target feature data into the self-coding model to obtain encoded data, and then input the encoded data into the first detection model to determine the first detection result. The self-coding model is a model generated based on an unsupervised algorithm.
It can be understood that, in the embodiment of the present application, after the abnormal flow detection system performs feature extraction on the flow data to be detected, the obtained target feature data is high-dimensional sparse, and in order to be more suitable for the first detection model generated by the unsupervised learning algorithm, the abnormal flow detection system needs to pre-process the high-dimensional sparse target feature data. Specifically, the abnormal flow detection system may perform encoding processing on the target feature data through a self-encoding model obtained through pre-training, so that low-dimensional continuous feature data, that is, encoded data, may be obtained.
Further, the abnormal flow rate detection system may input the encoded data into the first detection model after encoding the target feature data by the self-encoding model, thereby outputting the first detection result.
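To make the role of the first detection model more concrete, the following is a minimal pure-Python sketch of an isolation forest, the unsupervised algorithm this description names for the first detection model. It is a simplified illustration and not the application's implementation: it assumes the encoded data is already a list of low-dimensional numeric tuples, it builds every tree on the full data set instead of a random subsample, and the names `fit_forest` and `anomaly_score` are hypothetical.

```python
import math
import random


def c_factor(n: int) -> float:
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random dimension at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([p for p in points if p[dim] < split], depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[dim] >= split], depth + 1, max_depth, rng),
    }


def path_length(tree, x, depth=0):
    """Depth at which x lands in a leaf, plus a correction for unsplit points."""
    if "size" in tree:
        return depth + c_factor(tree["size"])
    branch = "left" if x[tree["dim"]] < tree["split"] else "right"
    return path_length(tree[branch], x, depth + 1)


def fit_forest(points, n_trees=50, seed=0):
    """Build n_trees isolation trees; assumes at least two training points."""
    rng = random.Random(seed)
    max_depth = math.ceil(math.log2(max(len(points), 2)))
    return [build_tree(points, 0, max_depth, rng) for _ in range(n_trees)]


def anomaly_score(forest, x, n_train):
    """Score in (0, 1]; points that are isolated after few splits score closer to 1."""
    mean_path = sum(path_length(tree, x) for tree in forest) / len(forest)
    return 2.0 ** (-mean_path / c_factor(n_train))
```

In such a scheme, a threshold on the score would turn it into a normal or abnormal label; where that threshold lies, and what happens to samples near it, is a design choice the sketch leaves open.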
It can be understood that, in the present application, there may be some flow data to be detected that the first detection model cannot detect, that is, cannot mark. In that case, the encoded data corresponding to this flow data can instead be marked by means of a preset marking strategy to obtain the corresponding first detection result.
Therefore, the abnormal flow detection system can use the self-coding model to encode the target feature data corresponding to the flow data to be detected, and then input the encoded data into the first detection model for detection, that is, for automatic marking. For data that the first detection model cannot mark, the abnormal flow detection system can apply the preset marking strategy, marking the data through the preset information base, the regular rule base and the examination pattern base in sequence. In this way, the first detection result corresponding to the flow data to be detected is finally obtained.
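The sequential marking described above can be sketched as a simple fallback chain. The sketch assumes that the first detection model signals "cannot mark" by returning `None`, which is an interpretation rather than something the description states. The score thresholds, the contents of the three bases and all function names are hypothetical stand-ins.

```python
from typing import Callable, List, Optional, Tuple

Marker = Callable[[dict], Optional[str]]  # returns "normal", "abnormal" or None


def mark_sample(sample: dict, first_model: Marker,
                fallbacks: List[Tuple[str, Marker]]) -> Tuple[Optional[str], str]:
    """Return (label, source): try the model first, then each base in order."""
    label = first_model(sample)
    if label is not None:
        return label, "first_detection_model"
    for name, marker in fallbacks:
        label = marker(sample)
        if label is not None:
            return label, name
    return None, "unmarked"


# Hypothetical stand-ins for the model and the three bases.
def first_model(sample: dict) -> Optional[str]:
    score = sample["anomaly_score"]
    if score >= 0.7:
        return "abnormal"
    if score <= 0.4:
        return "normal"
    return None  # ambiguous band: the model declines to mark


KNOWN_BAD_IPS = {"203.0.113.7"}  # address from a documentation-only range


def information_base(sample: dict) -> Optional[str]:
    return "abnormal" if sample["src_ip"] in KNOWN_BAD_IPS else None


def regular_rule_base(sample: dict) -> Optional[str]:
    return "abnormal" if "../" in sample["request"] else None


def examination_pattern_base(sample: dict) -> Optional[str]:
    return sample.get("reviewed_label")  # e.g. a label supplied by a reviewer


FALLBACKS = [
    ("preset_information_base", information_base),
    ("regular_rule_base", regular_rule_base),
    ("examination_pattern_base", examination_pattern_base),
]
```

Returning the source alongside the label makes it possible to audit later which stage produced each mark.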
Further, in the embodiment of the present application, when obtaining the second detection result, the abnormal flow detection system may input the target feature data into the second detection model, which outputs the second detection result. That is, the second detection model is applied directly to the target feature data to obtain the second detection result.
Further, in the embodiment of the present application, when obtaining the third detection result, the abnormal traffic detection system may match the target feature data against the preset rule base to obtain the third detection result.
Specifically, in the embodiment of the present application, the abnormal flow detection system may match the target feature data against the preset rule base to obtain a matching result, and then determine the matching result as the third detection result of the flow data to be detected.
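A minimal sketch of rule-base matching is given below. The description does not specify what the preset rule base contains or how rules are expressed, so the three rules, the record keys `request` and `syn_errors_in_window`, and the threshold of 100 are all hypothetical examples. For readability the sketch matches rules against a dictionary of named fields instead of a numeric feature vector.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[dict], bool]


def content_rule(name: str, pattern: str) -> Rule:
    """Rule that fires when the request content matches a regular expression."""
    compiled = re.compile(pattern, re.IGNORECASE)
    return Rule(name, lambda rec: bool(compiled.search(rec.get("request", ""))))


# Hypothetical preset rule base.
RULE_BASE = [
    content_rule("sql_injection", r"union\s+select"),
    content_rule("path_traversal", r"\.\./"),
    Rule("syn_error_burst", lambda rec: rec.get("syn_errors_in_window", 0) > 100),
]


def third_detection_result(record: dict, rules: List[Rule] = RULE_BASE) -> Tuple[str, List[str]]:
    """Return ("abnormal", matched rule names) if any rule fires, else ("normal", [])."""
    hits = [rule.name for rule in rules if rule.matches(record)]
    return ("abnormal" if hits else "normal"), hits
```

Reporting the names of the matched rules alongside the label keeps the result explainable, which is one practical advantage of a rule base over a learned model.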
Therefore, in the application, when the abnormal flow detection system performs abnormal flow detection on the flow data to be detected, it not only uses the first detection model generated based on the unsupervised algorithm and the second detection model generated based on the supervised algorithm, but also performs matching processing with the preset rule base, so that the first detection result, the second detection result and the third detection result corresponding to the flow to be detected are respectively obtained.
In the embodiment of the present application, each of the first detection result, the second detection result, and the third detection result is either normal flow or abnormal flow.
Step 104, generating a target detection result of the flow data to be detected according to the first detection result, the second detection result and the third detection result.
In an embodiment of the present application, after the abnormal flow rate detecting system respectively obtains a first detection result, a second detection result, and a third detection result corresponding to a flow rate to be detected based on the self-coding model, the first detection model, the second detection model, and the preset rule base, the abnormal flow rate detecting system may further generate a target detection result of the flow rate data to be detected according to the first detection result, the second detection result, and the third detection result.
It should be noted that, in the embodiment of the present application, the abnormal traffic detection system may perform voting processing based on the first detection result, the second detection result, and the third detection result, that is, follow the principle that the minority obeys the majority, and thereby obtain the target detection result.
Further, in the present application, if at least two of the first detection result, the second detection result, and the third detection result are abnormal flow, it may be determined that the target detection result is abnormal flow; if at least two of the first detection result, the second detection result and the third detection result are normal flow, it can be determined that the target detection result is normal flow.
For example, in the present application, if the first detection result, the second detection result, and the third detection result are all abnormal flow, it may be determined that the target detection result is abnormal flow; and if the first detection result and the second detection result are both normal flow, and the third detection result is abnormal flow, it may be determined that the target detection result is normal flow.
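The voting rule above can be sketched in a few lines. The string labels "abnormal" and "normal" and the function name are illustrative choices, not part of the present application.

```python
def majority_vote(first: str, second: str, third: str) -> str:
    """Return the label reported by at least two of the three results."""
    abnormal_votes = [first, second, third].count("abnormal")
    return "abnormal" if abnormal_votes >= 2 else "normal"
```

With three binary results a tie cannot occur, so the rule always yields a definite target detection result.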
Further, in the embodiment of the present application, when the abnormal flow rate detection system determines the target detection result, different weight values may be assigned to the first detection result, the second detection result, and the third detection result, and then the first detection result, the second detection result, and the third detection result are subjected to weighting operation by using the weight values, so as to finally obtain the target detection result of the flow rate data to be detected.
That is to say, in the present application, the abnormal flow rate detection system may evaluate and set the credibility of different detection results generated by different models in advance, so that when the first detection result, the second detection result, and the third detection result obtained by different models are inconsistent, the final detection result may be determined more accurately.
For example, in the present application, when the abnormal flow detection system generates the target detection result of the flow data to be detected according to the first detection result, the second detection result, and the third detection result, a preset weight set may be obtained first. The preset weight set includes different weight values corresponding to different detection results; for example, the preset weight set specifically includes: a first weight of 0.3 for the first detection result, a second weight of 0.4 for the second detection result, and a third weight of 0.3 for the third detection result. The abnormal flow detection system may then perform a weighting operation using the preset weight set, the first detection result, the second detection result, and the third detection result to obtain the target detection result. For example, when the first detection result is abnormal flow, the second detection result is normal flow, and the third detection result is abnormal flow, the weighting operation with the first weight 0.3, the second weight 0.4, and the third weight 0.3 gives a probability of 40% (0.4) that the flow data to be detected is normal flow and a probability of 60% (0.3 + 0.3) that it is abnormal flow, so that the target detection result may be determined to be abnormal flow.
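The weighting operation in this example can be sketched as below. The function name, the string labels, and the handling of an exact 50/50 split (reported here as abnormal) are assumptions made for illustration; the text does not specify how ties are resolved.

```python
from typing import Sequence, Tuple

def weighted_vote(results: Sequence[str],
                  weights: Sequence[float]) -> Tuple[str, float]:
    """Return the target label and the weighted share of abnormal votes."""
    if len(results) != len(weights):
        raise ValueError("one weight per detection result is required")
    total = sum(weights)
    abnormal = sum(w for r, w in zip(results, weights) if r == "abnormal")
    p_abnormal = abnormal / total
    # Assumption: an exact tie is treated as abnormal.
    label = "abnormal" if p_abnormal >= 0.5 else "normal"
    return label, p_abnormal

# The example from the text: weights 0.3, 0.4 and 0.3.
label, p_abnormal = weighted_vote(("abnormal", "normal", "abnormal"),
                                  (0.3, 0.4, 0.3))
```

Here the abnormal share is 0.3 + 0.3 = 0.6 against 0.4 for normal, so the label is abnormal, matching the example in the text.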
It can be understood that, in the embodiment of the present application, when the abnormal flow rate detection system generates the target detection result of the to-be-detected flow rate data according to the first detection result, the second detection result, and the third detection result, after the preset weight set is obtained, the weighting operation may not be performed, but the detection result corresponding to the maximum weight may be directly used as the target detection result. For example, the preset weight set specifically includes: the first weight of the first detection result is 0.3, the second weight of the second detection result is 0.4, and the third weight of the third detection result is 0.3; when the first detection result is abnormal flow, the second detection result is normal flow, and the third detection result is abnormal flow, the abnormal flow detection system can directly use the maximum weight, namely the second detection result corresponding to the second weight, as the target detection result of the flow data to be detected, namely the target detection result is normal flow.
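This alternative, in which the result carrying the largest weight is taken directly, might be sketched as follows. The function name is hypothetical, and when several weights share the maximum the first of them is used, which is an assumption not stated in the text.

```python
from typing import Sequence

def max_weight_result(results: Sequence[str],
                      weights: Sequence[float]) -> str:
    """Return the detection result whose weight is largest."""
    if len(results) != len(weights) or not weights:
        raise ValueError("one weight per detection result is required")
    # max() returns the first index among equal maxima.
    index = max(range(len(weights)), key=lambda i: weights[i])
    return results[index]
```

For the weights 0.3, 0.4 and 0.3 and the results abnormal, normal and abnormal, the second result is selected, so the outcome is normal even though two of the three results are abnormal; this differs from the outcome of the weighting operation on the same inputs.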
That is to say, in the present application, in the process of detecting the flow data to be detected, the abnormal flow detection system may adopt the idea of ensemble learning: it combines the detection results output by the self-encoder together with the first detection model, and by the second detection model, with the detection result of the existing preset rule base matching algorithm to determine the target detection result of the flow to be detected, so as to obtain generalization performance better than that of a single model.
Through the abnormal flow detection method provided in steps 101 to 104, on the one hand, a judgment mode based on detection algorithms has better flexibility, and an anomaly detection algorithm based on unsupervised learning is better suited to the flow monitoring scenario and has better adaptability. On the other hand, if only an unsupervised learning detection algorithm were adopted, detection accuracy and coverage might be low. The abnormal flow detection system therefore mainly adopts two machine learning algorithms, namely unsupervised learning (the isolation forest algorithm) and supervised learning (the decision tree algorithm). Because both the isolation forest and the decision tree algorithm train very quickly, the whole abnormal flow detection system can be trained and updated at a daily (or even hourly) level. In actual use, the most recently trained detection model can therefore be used to detect real-time flow data, so that user behavior on the current day can be inferred and judged. Meanwhile, the regular rule base and the preset information base are updated in time, so that the real-time performance of the whole abnormal flow detection system can be ensured.
The application provides an abnormal flow detection method, wherein an abnormal flow detection system acquires flow data to be detected and analyzes the flow data to be detected to acquire target structured data; performs feature extraction processing on the target structured data to obtain target feature data; determines a first detection result corresponding to the target feature data based on the self-coding model and the first detection model, determines a second detection result corresponding to the target feature data based on the second detection model, and determines a third detection result corresponding to the target feature data based on a preset rule base, where the self-coding model and the first detection model are models generated based on an unsupervised algorithm and the second detection model is a model generated based on a supervised algorithm; and generates a target detection result of the flow data to be detected according to the first detection result, the second detection result and the third detection result. That is to say, in the embodiment of the present application, the abnormal flow detection system uses an unsupervised algorithm and a supervised algorithm to train and generate the self-coding model, the first detection model and the second detection model respectively, so that abnormal flow detection can be performed on the flow data to be detected based on the self-coding model, the first detection model, the second detection model and the preset rule base. The detection result obtained is a risk judgment on the flow data to be detected that combines the unsupervised algorithm, the supervised algorithm and the preset rule base, so that the accuracy of abnormal flow detection can be improved and the detection quality of abnormal flow can be effectively improved.
Based on the above embodiments, in a further embodiment of the present application, fig. 2 is a schematic diagram of a system structure of an abnormal flow rate detection system, as shown in fig. 2, in the present application, an abnormal flow rate detection system 10 may include an acquisition module 11, an analysis module 12, an extraction module 13, an encoding module 14, and a detection module 15.
For example, in the present application, the acquisition module 11 may be configured to capture traffic data and store the acquired network traffic mirror image. Specifically, the acquisition module 11 can directly obtain the traffic data from the network card, and can also directly receive the traffic data sent by other systems.
For example, in the present application, the parsing module 12 may perform protocol parsing on the captured network traffic data to obtain traffic data that can be processed by subsequent functional modules. It mainly parses, according to the protocol specification, the information of the two communicating parties from the network traffic data, including the source IP, destination IP, source port, destination port, request content and response information, discards some useless information, and generates a structured data format.
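As a minimal sketch of this kind of protocol parsing, the following extracts the source and destination IP, the ports, and the payload from a raw IPv4 packet carrying TCP. It assumes the packet starts at the IPv4 header, handles no other protocols, and uses a hypothetical `FlowRecord` type; it is not the parsing module of the present application.

```python
import socket
import struct
from dataclasses import dataclass

@dataclass
class FlowRecord:
    """Structured form of one parsed packet (hypothetical field names)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    payload: bytes

def parse_ipv4_tcp(packet: bytes) -> FlowRecord:
    """Parse an IPv4/TCP packet that begins at the IPv4 header."""
    if len(packet) < 40:
        raise ValueError("packet too short for IPv4 and TCP headers")
    version = packet[0] >> 4
    ip_header_len = (packet[0] & 0x0F) * 4   # IHL is counted in 32-bit words
    protocol = packet[9]
    if version != 4 or protocol != 6:        # 6 is the TCP protocol number
        raise ValueError("only IPv4 packets carrying TCP are handled")
    src_ip = socket.inet_ntoa(packet[12:16])
    dst_ip = socket.inet_ntoa(packet[16:20])
    src_port, dst_port = struct.unpack(
        "!HH", packet[ip_header_len:ip_header_len + 4])
    # The upper 4 bits of TCP byte 12 give the header length in 32-bit words.
    tcp_header_len = (packet[ip_header_len + 12] >> 4) * 4
    payload = packet[ip_header_len + tcp_header_len:]
    return FlowRecord(src_ip, dst_ip, src_port, dst_port, payload)
```

Header fields that are not needed downstream (checksums, window size and so on) are simply not copied into the record, which corresponds to discarding useless information.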
It is understood that in the embodiment of the present application, the acquisition module 11 and the analysis module 12 are both used for preprocessing of the flow data.
For example, in the present application, the extraction module 13 may obtain feature data corresponding to each data sample based on the structured data obtained through parsing. The features generated mainly include four major categories: the first type is the basic characteristics of the flow, including the duration of the access, the protocol type, the number of bytes sent, and the like; the second type is the content characteristic of the flow, and the request content in the flow is converted into the text vector characteristic; the third type is based on the statistical characteristics of the time window, including the access frequency in the time window, the total number of bytes sent in the time window, the number of connections with SYN errors, and the like; the fourth type is the statistical characteristics of the accessed host in a time window, including the accessed frequency, the received total byte number, the total connection number of SYN errors and the like, and the four types of characteristics are converted into characteristic vectors with uniform length to obtain characteristic data.
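To illustrate the third category, the statistical features within a time window for one source host might be computed as sketched below. The `Connection` fields, the 60-second default window, and the choice of a half-open window are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Iterable, List

@dataclass
class Connection:
    """One parsed connection (hypothetical field names)."""
    timestamp: float   # seconds
    src_ip: str
    dst_ip: str
    bytes_sent: int
    syn_error: bool

def window_features(connections: Iterable[Connection], src_ip: str,
                    now: float, window_seconds: float = 60.0) -> List[float]:
    """Return [access count, total bytes sent, SYN-error connections]
    for src_ip within the window (now - window_seconds, now]."""
    recent = [c for c in connections
              if c.src_ip == src_ip
              and now - window_seconds < c.timestamp <= now]
    return [
        float(len(recent)),
        float(sum(c.bytes_sent for c in recent)),
        float(sum(1 for c in recent if c.syn_error)),
    ]
```

The fourth category can be computed in the same way by filtering on `dst_ip` instead of `src_ip`; the resulting values are then concatenated with the basic and content features into a vector of uniform length.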
For example, in the present application, the encoding module 14 may be configured to convert the high-dimensional sparse feature data extracted by the extraction module 13 into low-dimensional continuous feature data, i.e., encoded data.
It is understood that in the embodiments of the present application, both the extraction module 13 and the encoding module 14 are used for the acquisition of the characteristics of the flow data.
For example, in the present application, the detection module 15 may include a first detection model 151 obtained based on unsupervised algorithm training, a second detection model 152 obtained based on supervised algorithm training, a labeling system 153 for implementing a preset labeling strategy, and a preset rule base 154. Specifically, when detecting the flow data, the detection module 15 may simultaneously use the first detection model 151, the second detection model 152, and the preset rule base 154 to generate different detection results, respectively.
Based on the system structure shown in fig. 2, fig. 3 is a schematic diagram of an implementation flow of the abnormal flow detection method, and as shown in fig. 3, the method for detecting abnormal flow by the abnormal flow detection system may include the following steps:
Step 201, collecting data flow.
In the embodiment of the present application, the acquisition module 11 may acquire traffic data first, where the acquisition module 11 may adopt devices such as an optical fiber splitter to capture traffic data transmitted in a gateway and a switch.
Step 202, analyzing the data flow.
In the embodiment of the present application, the parsing module 12 may parse the traffic data, specifically, may perform protocol parsing on the traffic data, so as to restore the original network behavior information thereof, and meanwhile, may discard some useless information, and may finally generate the structured data format.
Step 203, feature extraction.
In an embodiment of the present application, the extraction module 13 may perform feature extraction on the parsed structured data to obtain feature data of the data traffic. The feature data may include several types of feature information, such as the basic features of the traffic corresponding to the traffic data, the content features of the traffic, statistical features based on a time window, and statistical features of the accessed host within the time window.
Step 204, carrying out abnormal flow detection by using the self-coding model and the first detection model.
In the embodiment of the present application, because the first detection model is trained based on an unsupervised learning algorithm, it performs poorly on the high-dimensional sparse feature information obtained after feature extraction. The encoding module 14 therefore first encodes the feature information using the self-encoding model to obtain encoded data, which is low-dimensional continuous feature data, and the detection module 15 then detects the encoded data using the first detection model.
Further, in an embodiment of the present application, the first detection model may be obtained by training using an isolation forest algorithm.
It should be noted that, in the embodiment of the present application, for traffic data that cannot be distinguished by the first detection model, the detection module 15 may implement a marking process by using a marking system according to a preset marking policy.
It is understood that, in the embodiment of the present application, after the detection module 15 performs the flow anomaly detection on the flow data based on the self-coding model, the first detection model and the marking system, the corresponding detection result 1 may be obtained.
Step 205, carrying out abnormal flow detection by using a second detection model.
In the embodiment of the present application, the detection module 15 may directly input the feature data obtained by the extraction module 13 into the second detection model, so as to output the detection result 2 corresponding to the flow data.
Further, in an embodiment of the present application, the second detection model may be obtained by training using a decision tree algorithm.
Step 206, detecting abnormal flow by using a preset rule base.
In the embodiment of the present application, the detection module 15 may perform matching processing on the feature data by using a preset rule base, so as to obtain the detection result 3 corresponding to the flow data.
Step 207, generating a detection result.
In an embodiment of the present application, the detection module 15 may obtain a detection result of the flow data according to the detection result 1, the detection result 2, and the detection result 3.
It is understood that, in the present application, each of the detection result 1, the detection result 2, the detection result 3, and the detection result is either abnormal flow or normal flow.
Further, in the embodiment of the present application, when the detection module 15 generates the detection result, if at least two of the detection result 1, the detection result 2, and the detection result 3 are abnormal flow rates, the detection result is an abnormal flow rate; if at least two of the detection result 1, the detection result 2, and the detection result 3 are normal flows, the detection result is a normal flow.
Further, in the embodiment of the present application, when the detection module 15 generates the detection result, a preset weight set may also be obtained first, where the preset weight set is used to evaluate and set the credibility of different detection results generated by different models, and then the final detection result is further determined by using the weight values corresponding to the detection result 1, the detection result 2, and the detection result 3 included in the preset weight set.
The application provides an abnormal flow detection method, wherein an abnormal flow detection system acquires flow data to be detected and analyzes the flow data to be detected to acquire target structured data; performs feature extraction processing on the target structured data to obtain target feature data; determines a first detection result corresponding to the target feature data based on the self-coding model and the first detection model, determines a second detection result corresponding to the target feature data based on the second detection model, and determines a third detection result corresponding to the target feature data based on a preset rule base, where the self-coding model and the first detection model are models generated based on an unsupervised algorithm and the second detection model is a model generated based on a supervised algorithm; and generates a target detection result of the flow data to be detected according to the first detection result, the second detection result and the third detection result. That is to say, in the embodiment of the present application, the abnormal flow detection system uses the unsupervised algorithm and the supervised algorithm to train and generate the self-coding model, the first detection model and the second detection model respectively, so that abnormal flow detection can be performed on the flow data to be detected based on the self-coding model, the first detection model, the second detection model and the preset rule base. The detection result obtained is a risk judgment on the flow data to be detected that combines the unsupervised algorithm, the supervised algorithm and the preset rule base, so that the accuracy of abnormal flow detection can be improved and the detection quality of abnormal flow can be effectively improved.
Another embodiment of the present application provides an abnormal traffic detection method, fig. 4 is a schematic view illustrating an implementation flow of the abnormal traffic detection method, and as shown in fig. 4, the method for detecting abnormal traffic by the abnormal traffic detection system may include the following steps:
Step 301, acquiring first flow data, and extracting feature data corresponding to the first flow data.
In the embodiment of the present application, the abnormal flow rate detection system may obtain first flow rate data, and then may perform feature extraction on the first flow rate data to obtain feature data corresponding to the first flow rate data.
It is understood that, in the embodiment of the present application, the abnormal traffic detection system may perform training of the model before detecting the traffic to be detected by using the self-coding model, the first detection model, and the second detection model.
It should be noted that, in the embodiment of the present application, the first traffic data may be used as training data for model training. Specifically, the first traffic data may be composed of a plurality of data samples.
Further, in the embodiment of the present application, after the abnormal flow rate detecting system acquires the first flow rate data, the characteristic information corresponding to the first flow rate data may be extracted first. Specifically, the abnormal flow detection system may perform feature extraction on the first flow data to obtain feature data corresponding to the first flow data.
It should be noted that, in the embodiment of the present application, after the abnormal flow rate detection system acquires the first flow rate data and before extracting the feature data corresponding to the first flow rate data, the abnormal flow rate detection system needs to analyze and process the first flow rate data, so that useless information in the first flow rate data can be filtered out, and the problem of inaccurate model caused by non-normative data can be reduced as much as possible.
Specifically, in the embodiment of the present application, the abnormal traffic detection system may analyze the first traffic data to obtain first structured data with a uniform structure.
It should be noted that, in the embodiment of the present application, the abnormal traffic detection system may perform protocol parsing on the first traffic data, so as to restore the original network behavior information. Specifically, the abnormal traffic detection system may analyze information of both communication parties from the first traffic data according to the protocol specification, specifically, the information may include specific information such as a source IP, a destination IP, a source port, a destination port, request content, and response information, and may discard some useless information, and finally, a structured data format may be generated, that is, first structured data corresponding to the first traffic data may be generated.
It should be noted that, in the embodiment of the present application, after the abnormal traffic detection system obtains the first structured data through analysis processing, feature extraction may be performed on each piece of data in the first structured data to obtain feature information of each piece of data, and finally, feature data may be generated.
For example, in the embodiments of the present application, each of the feature data may include the following four types of feature information: the first type is the basic characteristics of the flow, including the duration of the access, the protocol type, the number of bytes sent, and the like; the second type is the content characteristic of the flow, and the request content in the flow is converted into the text vector characteristic; the third type is based on the statistical characteristics of the time window, including the access frequency in the time window, the total number of bytes sent in the time window, the number of connections with SYN errors, and the like; the fourth category is the statistical characteristics of the hosts it accesses within a time window, including the frequency of accesses, the total number of bytes received, the total number of connections on which "SYN" errors occur, etc.
That is to say, the abnormal flow detection system performs feature extraction on each piece of first structured data, obtains the above four types of feature information of the piece of data, and then generates a feature vector of uniform length of the piece of data based on the above four types of feature information.
It can be seen that, in the present application, each of the feature data is a feature vector with the same length.
Step 302, coding the characteristic data through a self-coding model to obtain coded data; the self-coding model is a model generated based on an unsupervised algorithm.
In the embodiment of the application, after the abnormal traffic detection system acquires the first traffic data and extracts the feature data corresponding to the first traffic data, the feature data may be encoded by using a self-encoding model, so as to obtain encoded data.
It should be noted that, in the embodiment of the present application, the self-coding model may be a model generated by the abnormal traffic detection system based on an unsupervised algorithm. In practice, sufficient prior knowledge is often lacking, so that manually labeling categories is difficult or too costly. Naturally, it is desirable for a computer to perform these tasks on behalf of a human, or at least to provide some assistance. Solving pattern recognition problems from training samples whose classes are unknown (unlabeled) is referred to as unsupervised learning.
Specifically, common unsupervised learning algorithms mainly include principal component analysis (PCA), isometric mapping (Isomap), locally linear embedding, Laplacian eigenmaps, Hessian locally linear embedding, local tangent space alignment, and the like.
It can be understood that, in the embodiment of the present application, after the abnormal flow detection system performs feature extraction on the first flow data, the obtained feature data is high-dimensional sparse, and the unsupervised learning algorithm has a non-ideal effect on classifying the high-dimensional sparse discrete data, so that when performing model training by using the feature data of the first flow data, the abnormal flow detection system needs to pre-process the high-dimensional sparse feature data in order to be more suitable for the detection model generated by the unsupervised learning algorithm. Specifically, the abnormal flow detection system may encode the feature data through a self-encoding model obtained through pre-training, so that low-dimensional continuous feature data, that is, encoded data, may be obtained.
In this application, the self-coding model is an auto-encoder (AE). An auto-encoder is a type of artificial neural network (ANN) used in semi-supervised and unsupervised learning; it takes the input information itself as the learning target in order to perform representation learning on that input. By learning paradigm, auto-encoders can be divided into undercomplete auto-encoders, regularized auto-encoders, and variational auto-encoders (VAE), where the former two are discriminative models and the last is a generative model. By construction type, an auto-encoder can be a neural network with a feedforward structure or a recurrent structure.
The auto-encoder serves as a representation learning algorithm in the general sense and is applied to dimensionality reduction and anomaly detection. Auto-encoders that include convolutional layers can be applied to computer vision problems, including image denoising, neural style transfer, and the like.
That is to say, the auto-encoder can perform efficient feature extraction and feature representation on high-dimensional data in an unsupervised manner, and is very widely used. Fig. 5 is a schematic structural diagram of an auto-encoder. As shown in fig. 5, the auto-encoder may include two parts, an encoder and a decoder. An input sample X is mapped to a feature space Z by the encoder, which is the encoding process; the abstract feature Z is then mapped back to the original space by the decoder to obtain a reconstructed sample X′, which is the decoding process. The optimization objective is to train the encoder and the decoder simultaneously by minimizing the reconstruction error, thereby learning the abstract feature representation Z of the sample input X.
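In symbols, the encoding, decoding and training objective described above may be written as follows. The notation is introduced here for illustration only: f_θ denotes the encoder, g_φ the decoder, X' the reconstructed sample, and N the number of training samples. The squared error is one common choice of reconstruction error and is an assumption, since the text does not fix a particular error measure.

```latex
Z = f_\theta(X), \qquad X' = g_\phi(Z), \qquad
\min_{\theta,\,\phi}\; \frac{1}{N} \sum_{i=1}^{N}
  \bigl\lVert X_i - g_\phi\bigl(f_\theta(X_i)\bigr) \bigr\rVert_2^2
```

After training, only the encoder f_θ is needed to produce the low-dimensional encoded data Z from the feature data.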
Step 303, performing training with the encoded data to obtain a first detection model; the first detection model is a model generated based on an unsupervised algorithm.
Step 304, obtaining a test result corresponding to the first traffic data according to the first detection model and a preset marking strategy.
In the embodiment of the application, after the abnormal flow detection system performs coding processing on the feature data through the self-coding model and obtains coded data, the abnormal flow detection system may perform model training by using the coded data to obtain the first detection model. And then, continuously obtaining a test result corresponding to the first flow data through the first detection model and a preset marking strategy.
It should be noted that, in the embodiment of the present application, the first detection model may be a model generated by the abnormal flow detection system based on an unsupervised algorithm. For example, the first detection model may be obtained by the abnormal traffic detection system through training based on an isolation forest algorithm.
Further, in the embodiment of the present application, the abnormal flow detection system may first perform detection processing on the encoded data through the first detection model, so as to mark a part of the first flow data based on the feature data. For the other part of the first flow data, which cannot be marked using the first detection model, the abnormal flow detection system may perform detection processing according to the preset marking policy. Detection of all the first flow data is thereby completed, and the detection result, that is, the test result of the first flow data, is obtained.
It can be understood that, compared with other detection fields, a major problem in the abnormal traffic detection field is that the volume of input traffic data is too large to be labeled manually, so that most traffic data is unlabeled and cannot be identified directly using a supervised algorithm with high accuracy. Thus, after obtaining the encoded data based on the feature data of the first flow data, the abnormal flow detection system may classify the first flow data using an unsupervised algorithm. Specifically, the abnormal traffic detection system can obtain the first detection model based on the isolation forest algorithm, so that most of the first traffic data is marked automatically.
Step 305, training and obtaining a second detection model based on the feature data and the test result; the second detection model is a model generated based on a supervised algorithm.
In an embodiment of the present application, after obtaining a test result corresponding to the first flow data according to the first detection model and a preset labeling strategy, the abnormal flow detection system may perform model training by using feature data and the test result corresponding to the first flow data, so as to obtain a second detection model.
It should be noted that, in the embodiment of the present application, the second detection model may be a model generated by the abnormal flow detection system through training based on a supervised algorithm.
In particular, a supervised algorithm, i.e., supervised learning, is a machine learning task that infers a function from labeled training data consisting of a set of training instances. In supervised learning, each instance is a pair consisting of an input object (usually a vector) and a desired output value (also called a supervisory signal). A supervised learning algorithm analyzes the training data and produces an inferred function that can be used to map new instances. An optimal solution allows the algorithm to correctly determine the class labels of instances it has not seen.
Supervised learning refers to using samples with known characteristics as a training set to establish a mathematical model (such as a discrimination model in pattern recognition or a weight model in an artificial neural network method), and then using the established model to predict unknown samples. Supervised learning is the most common method of machine learning. Supervised learning algorithms may include support vector machines, linear regression, logistic regression, naive Bayes, linear discriminant analysis, decision trees, the K-nearest neighbor algorithm, and the like.
Further, in the embodiment of the present application, when the abnormal flow detection system obtains the second detection model based on the feature data and the test result, the feature data and the test result may be input into the decision tree algorithm, and the second detection model is obtained through training.
It should be noted that, in the embodiment of the present application, a decision tree algorithm in supervised learning may be preferred when the abnormal flow detection system trains the second detection model based on the feature data and the test result. Specifically, the abnormal flow detection system may input the feature data and the test result into the decision tree algorithm and finally obtain the second detection model through training.
The decision tree is a supervised machine learning algorithm. It is a tree-shaped decision diagram with probability results, and is an intuitive graphical method that applies statistical probability analysis. A decision tree in machine learning is a predictive model that represents a mapping between object attributes and object values: each node in the tree represents a decision condition on an object attribute, and its branches represent the objects that meet that condition. The leaf nodes of the tree represent the predicted results to which the object belongs. Decision tree algorithms are often used to solve classification and regression problems.
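To make the node, branch, and leaf structure described above concrete, the following is a small hand-built sketch of prediction with a decision tree. It only illustrates how a trained tree maps attributes to a result; it does not show training. The feature names `requests_per_minute` and `distinct_ports`, the thresholds, and the class names are hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from typing import Mapping, Union

@dataclass
class Leaf:
    """Leaf node: the predicted result for every sample that reaches it."""
    label: str

@dataclass
class Node:
    """Internal node: a decision condition on one attribute, with two branches."""
    feature: str
    threshold: float
    below: Union["Node", Leaf]        # branch for samples with value < threshold
    at_or_above: Union["Node", Leaf]  # branch for samples with value >= threshold

def predict(tree: Union[Node, Leaf], sample: Mapping[str, float]) -> str:
    """Walk from the root to a leaf, following the branch whose condition the sample meets."""
    node = tree
    while isinstance(node, Node):
        if sample[node.feature] < node.threshold:
            node = node.below
        else:
            node = node.at_or_above
    return node.label

# Hypothetical tree: a high request rate combined with many distinct ports is abnormal.
tree = Node(
    feature="requests_per_minute",
    threshold=100.0,
    below=Leaf("normal"),
    at_or_above=Node(
        feature="distinct_ports",
        threshold=10.0,
        below=Leaf("normal"),
        at_or_above=Leaf("abnormal"),
    ),
)
```

Because every prediction is a short chain of explicit comparisons, the path that led to a result can be read off directly, which is the interpretability advantage usually attributed to decision trees.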
It can be understood that, in the embodiment of the present application, the abnormal flow rate detection system obtains the test result corresponding to the first flow rate data through the first detection model and the preset marking strategy, and completes the marking processing on the first flow rate data, that is, through the above processing flow of the encoded data, the abnormal flow rate detection system obtains the more accurate labels of the first flow rate data, so that the second detection model can be trained based on the supervised algorithm by using the labels.
Therefore, in the application, the abnormal flow detection system may preferably adopt a decision tree algorithm to train the second detection model. The high-dimensional discrete features obtained by feature extraction from the first flow data, namely the feature data, are used as the initial features, and the marking results obtained through the first detection model and the preset marking strategy, namely the test results, are used as the labels. The mutually corresponding feature data and test results are fed into the decision tree as training data for supervised learning, and the second detection model is finally obtained through training. Compared with other machine learning algorithms, the decision tree algorithm offers strong interpretability, a relatively simple algorithm, and fast judgment.
It should be noted that, in the embodiment of the present application, while the abnormal flow detection system performs supervised learning using the mutually corresponding feature data and test results as training data, if the detection result determined by the second detection model based on the feature data is inconsistent with the corresponding test result, the abnormal flow detection system may feed the detection result back to the feature extraction process, that is, step 301. Feature extraction is then performed on the first flow data again to obtain new feature data, and the first detection model and the second detection model are trained again in sequence according to steps 302 to 305 using the new feature data.
That is to say, in the present application, when the abnormal flow detection system performs model training, it may perform feedback processing according to the result output by training, and adjust the weighted values corresponding to different features in the feature extraction process, so as to continuously obtain new feature data based on the first flow data, and continuously perform training of the first detection model and the second detection model. Therefore, after multiple times of iterative training and testing, a fully trained model can be obtained.
Further, in an embodiment of the present application, the method for obtaining a test result corresponding to the first flow data by the abnormal flow detection system according to the first detection model and a preset marking policy may include the following steps:
Step 304a, detecting the encoded data by using a first detection model to obtain an initial result; wherein the initial result includes: normal traffic, abnormal traffic, and unpredictable.
Step 304b, according to a preset marking strategy, marking the encoded data of which the initial result is unpredictable to obtain a marking result; wherein the marking result comprises normal flow and abnormal flow.
Step 304c, generating a test result based on the initial result and the marking result.
In the embodiment of the application, after the abnormal traffic detection system performs coding processing on the feature data through the self-coding model and obtains coded data, the first detection model may be used to perform detection processing on the coded data to obtain an initial result.
In the embodiment of the present application, when the abnormal flow detection system uses the first detection model to detect the encoded data, the encoded data may first be input to the first detection model (isolated forest), so that the prediction score $X$ of each flow sample in the first flow data is output. The abnormal flow detection system may then normalize the prediction score $X$ of each flow sample according to the following formula to obtain the prediction score $X_{norm}$ normalized to the range 0 to 1:

$$X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}} \tag{1}$$

where $X$ is the score predicted by the isolated forest algorithm, $X_{min}$ is the lowest score predicted for the training samples, $X_{max}$ is the highest score predicted for the training samples, and $X_{norm}$ is the predicted score after normalization.
It is understood that, in the embodiments of the present application, if the normalized prediction score $X_{norm}$ is close to 1, the flow sample can be considered a normal flow sample; if the normalized prediction score $X_{norm}$ is close to 0, the flow sample can be considered an abnormal flow sample.
It should be noted that, in the embodiment of the present application, after the detection processing is performed on the encoded data by using the first detection model, the abnormal flow detection system may obtain an initial result, which is one of normal traffic, abnormal traffic, and unpredictable.
Further, in the embodiment of the present application, the abnormal flow detection system regards a flow sample whose score is close to 1 or 0 as a flow data sample with a high degree of discrimination, which can be automatically marked as normal flow or abnormal flow, respectively, through the first detection model. However, the abnormal flow detection system regards a flow sample whose score is close to 0.5 as a flow data sample that is difficult to distinguish; such unpredictable samples need to be further marked according to the preset marking strategy.
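The normalization and the three-way initial result can be sketched as follows. The description only says that scores "close to" 1, 0, or 0.5 are treated differently, so the cut-off values `low=0.2` and `high=0.8`, as well as the function names, are assumptions chosen for illustration.

```python
def normalize(scores, x_min, x_max):
    """Min-max normalize raw prediction scores using the training-set minimum and maximum."""
    span = x_max - x_min
    if span == 0:
        raise ValueError("x_max and x_min must differ")
    return [(x - x_min) / span for x in scores]

def initial_result(x_norm, low=0.2, high=0.8):
    """Map a normalized score to an initial result (thresholds are hypothetical)."""
    if x_norm >= high:
        return "normal"         # score close to 1
    if x_norm <= low:
        return "abnormal"       # score close to 0
    return "unpredictable"      # ambiguous score, handed to the marking strategy

# Usage with made-up raw scores whose training range is [2.0, 8.0].
normalized = normalize([2.0, 5.0, 8.0], x_min=2.0, x_max=8.0)
labels = [initial_result(x) for x in normalized]
```

Narrowing the band between `low` and `high` sends fewer samples to the marking strategy at the cost of trusting less clear-cut scores; the right trade-off would have to be chosen empirically.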
For example, in the present application, when the abnormal flow rate detection system performs the marking processing on the encoded data whose initial result is unpredictable according to the preset marking policy, the encoded data whose initial result is unpredictable may be input into the marking system, and a corresponding marking result is finally output.
It is understood that in the embodiments of the present application, the marking result may be an abnormal flow rate or a normal flow rate.
Specifically, fig. 6 is a schematic diagram of a composition structure of the marking system. As shown in fig. 6, the marking system may be composed of a preset intelligence library, a regular rule library, and a review pattern library. Correspondingly, the abnormal flow detection system may first match the encoded data whose initial result is unpredictable against the preset intelligence library and the regular rule library respectively. If the encoded data whose initial result is unpredictable matches both the preset intelligence library and the regular rule library, the abnormal flow detection system may determine that the marking result is abnormal flow; if that encoded data fails to match the preset intelligence library or fails to match the regular rule library, the abnormal flow detection system may go on to determine the marking result according to a preset review policy. Specifically, the abnormal flow detection system may perform the marking processing using the review pattern library, and finally obtain the marking result.
It should be noted that, in the embodiment of the present application, the abnormal flow detection system may update the preset intelligence library and the regular rule library in real time. Specifically, the abnormal flow detection system may use data that cannot be matched with the preset intelligence library or the regular rule library to update both libraries in real time.
For example, fig. 7 is a schematic diagram of the marking process. As shown in fig. 7, for input data that the first detection model cannot predict, that is, data that is difficult to distinguish, the abnormal flow detection system may first match the data against the preset intelligence library, which may be a threat intelligence library that the abnormal flow detection system acquires from a third party. If the preset intelligence library matches the data, for example, the IP of the data matches an entry in the IP blacklist of the preset intelligence library, the data may go on to be matched against the regular rule library; if the preset intelligence library does not match the data, the data may be marked according to the preset review policy, specifically, using a review pattern in the review pattern library. The preset review policy may indicate that marking is performed using a historical review method, or that the data is reviewed manually to accomplish the marking.
Further, in the present application, when the abnormal flow detection system matches the data against the regular rule library, if the regular rule library matches the data, the data may be directly determined to be a black sample, that is, marked as abnormal flow; if the regular rule library does not match the data, the data may be marked according to the preset review policy, specifically, using a review pattern in the review pattern library.
It should be noted that, in the present application, because the regular rule library involves many rules and its matching requirements are strict, data that matches the regular rule library may be directly determined to be a black sample, that is, abnormal flow. Data that does not match the regular rule library may be regarded by the abnormal flow detection system as a possibly abnormal sample, so it needs to be marked according to the preset review policy: if the data does not pass the review, it is determined to be a black sample, that is, abnormal flow; if the data passes the review, it is determined to be a white sample, that is, normal flow. For example, for data that cannot be matched with the preset intelligence library and the regular rule library, a network security expert may perform the review to mark the data.
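The marking flow of fig. 7 can be sketched as a short function: match the intelligence data first, then the rules, and fall back to review whenever either match fails. The sample fields `ip` and `payload`, the example blacklist entry, the example regular expression, and the `passes_review` callback (a stand-in for the review step, whether automated or performed by an expert) are all hypothetical names introduced for this illustration.

```python
import re
from typing import Callable, Iterable, Mapping, Pattern

def mark_unpredictable(
    sample: Mapping[str, str],
    ip_blacklist: Iterable[str],
    regex_rules: Iterable[Pattern[str]],
    passes_review: Callable[[Mapping[str, str]], bool],
) -> str:
    """Mark a sample that the first detection model could not predict."""
    # Step 1: intelligence match (here, an IP blacklist lookup).
    matches_intelligence = sample["ip"] in set(ip_blacklist)
    # Step 2: rule match, attempted only when the intelligence match succeeded.
    if matches_intelligence and any(r.search(sample["payload"]) for r in regex_rules):
        return "abnormal"  # both matched: black sample
    # Otherwise fall back to review: passing means white, failing means black.
    return "normal" if passes_review(sample) else "abnormal"

# Hypothetical data for illustration only.
blacklist = {"203.0.113.7"}
rules = [re.compile(r"(?i)union\s+select")]

def short_payload_review(sample: Mapping[str, str]) -> bool:
    """Toy stand-in for a review pattern: short payloads pass review."""
    return len(sample["payload"]) < 100

both_match = mark_unpredictable(
    {"ip": "203.0.113.7", "payload": "id=1 UNION SELECT password"},
    blacklist, rules, short_payload_review)
no_match = mark_unpredictable(
    {"ip": "198.51.100.2", "payload": "GET /index.html"},
    blacklist, rules, short_payload_review)
review_fails = mark_unpredictable(
    {"ip": "203.0.113.7", "payload": "x" * 200},
    blacklist, rules, short_payload_review)
```

Passing the review step in as a callback keeps the sketch neutral about whether the review is automated or manual.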
Further, in the embodiment of the present application, when the abnormal traffic detection system generates the test result based on the initial result and the marking result, if the initial result is the normal traffic, it may be determined that the test result is the normal traffic; if the initial result is abnormal flow, determining that the test result is abnormal flow; if the initial result is unpredictable, after the marking is carried out according to a preset marking strategy, if the marking result is normal flow, the testing result can be determined to be normal flow; if the marking result is abnormal traffic, the test result can be determined to be abnormal traffic.
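Combining the initial results with the marking results, as in step 304c, can be sketched as below. The function name, the `(sample, initial_result)` pair layout, and the `mark` callback are assumptions for the example; the sketch also counts how many samples needed marking, since that count is what determines the marking cost.

```python
from typing import Callable, Hashable, Iterable, List, Tuple

def build_test_results(
    initial_results: Iterable[Tuple[Hashable, str]],
    mark: Callable[[Hashable], str],
) -> Tuple[List[str], int]:
    """Return the final label per sample and the number of samples sent to marking."""
    test_results = []
    marked = 0
    for sample, initial in initial_results:
        if initial == "unpredictable":
            label = mark(sample)   # marking result replaces the unpredictable result
            marked += 1
        elif initial in ("normal", "abnormal"):
            label = initial        # confident initial result is kept unchanged
        else:
            raise ValueError(f"unknown initial result: {initial!r}")
        test_results.append(label)
    return test_results, marked

# Usage with three made-up samples; the marker here always answers "abnormal".
results, marked_count = build_test_results(
    [("a", "normal"), ("b", "abnormal"), ("c", "unpredictable")],
    mark=lambda sample: "abnormal",
)
```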
In the application, experiments prove that the first detection model can automatically mark a large amount of flow data, and the abnormal flow detection system only needs to mark 3% of the flow data according to a preset marking strategy.
In an embodiment of the present application, further based on fig. 4, before the abnormal flow rate detecting system acquires the first flow rate data and extracts the feature data corresponding to the first flow rate data, that is, before step 301, the method for the abnormal flow rate detecting system to detect the abnormal flow rate may further include the following steps:
Step 306, capturing network traffic data, and storing the network traffic data in a preset storage space in a mirrored manner.
In the embodiment of the application, the abnormal traffic detection system can capture the network traffic data in real time, and then store the captured network traffic data in a fixed space in a mirror manner, namely, in a preset storage space.
It can be understood that, in the embodiment of the present application, when the abnormal flow detection system captures the network traffic data, it may use devices such as an optical fiber splitter to capture the network traffic data transmitted through gateways and switches, and store a mirror copy of the captured network traffic in the preset storage space.
It should be noted that, in the embodiment of the present application, the abnormal traffic detection system continuously captures and mirror-stores the network traffic data, so that the network traffic data stored in mirror image can be used to train and test the model.
Further, in the embodiment of the present application, when the abnormal traffic detection system acquires the first traffic data, the abnormal traffic detection system may read the first traffic data from the network traffic data stored in the preset storage space.
That is, in the present application, the first traffic data may be historical network traffic data that is previously stored by the abnormal traffic detection system.
In an embodiment of the present application, further, fig. 8 is a schematic view of an implementation flow of an abnormal traffic detection method, as shown in fig. 8, before the abnormal traffic detection system performs encoding processing on the feature data through the self-encoding model to obtain encoded data, that is, before step 302, the method for performing abnormal traffic detection by the abnormal traffic detection system may further include the following steps:
Step 307, acquiring second flow data from the network traffic data stored in the preset storage space.
Step 308, training with the second flow data based on an unsupervised algorithm to obtain a self-coding model.
In the embodiment of the present application, before acquiring encoded data corresponding to feature data, the abnormal traffic detection system needs to train and obtain a self-encoding model.
It should be noted that, in the embodiment of the present application, the abnormal flow detection system may first obtain second flow data from the network traffic data stored in the preset storage space, and may then train with the second flow data based on an unsupervised algorithm to obtain the self-coding model.
It can be understood that, in the present application, a self-coding model (self-encoder, commonly called an autoencoder) is an unsupervised machine learning algorithm widely used for anomaly detection. A self-coding model that the abnormal flow detection system trains on normal flow samples can reconstruct and restore normal flow samples, but cannot restore well data points that deviate from the normal distribution. Based on this property, in the process of generating the self-coding model, the abnormal flow detection system may use a small amount of normal flow data as training samples, so that the resulting self-coding model yields a relatively uniform distribution on normal flow samples but not on abnormal flow samples.
That is, in the present application, the second flow data used to train the self-coding model is normal flow data. Accordingly, when acquiring the second flow data, the abnormal flow detection system may read the flow data that has been marked as normal flow from the network traffic data stored in the preset storage space, and then determine this flow data as the second flow data. That is, the second flow data includes a plurality of normal flow samples and does not include abnormal flow.
Further, in the embodiment of the present application, after the abnormal traffic detection system is trained by using the second traffic data to obtain the self-encoder, only the encoding portion of the self-encoder may be reserved, that is, the self-encoding model may be the encoding portion of the self-encoder obtained by training, so that the input high-dimensional discrete feature data may be converted into low-dimensional continuous feature data by the self-encoding model, that is, the feature data is encoded by the self-encoding model to obtain encoded data.
It is understood that, in the embodiment of the present application, after the training of the self-coding model, the first detection model and the second detection model is completed, the abnormal traffic detection system may perform the detection of the abnormal traffic by using the self-coding model, the first detection model and the second detection model.
Specifically, in the present application, for the flow data to be detected, the abnormal flow detection system may use the self-coding model, the first detection model, the second detection model, and the preset rule base to perform detection processing on the flow data to be detected, so as to obtain a target detection result of the flow data to be detected.
It should be noted that, in the embodiment of the present application, the flow data to be detected may be network flow data captured by the abnormal flow detection system in real time, specifically, the abnormal flow detection system may directly collect the flow data to be detected from the network card, and may also directly receive the flow data to be detected sent by other systems.
That is to say, in the present application, after multiple iterative training and testing, the abnormal flow detection system can obtain the trained self-encoder, the first detection model, and the second detection model. In the process of detecting the flow data to be detected, the abnormal flow detection system can adopt the idea of ensemble learning, combine the detection results output by the self-encoder, the first detection model and the second detection model and the existing preset rule base matching algorithm to obtain the detection result, and determine the target detection result of the flow to be detected, so that the generalization performance better than that of a single model can be obtained.
Therefore, the core of the abnormal flow detection method provided by the application lies in detecting abnormal flow by combining machine learning algorithms with the existing rule engine, thereby realizing a semi-supervised learning scheme. When the abnormal flow detection system uses the preset marking strategy to mark data that cannot be identified by the first detection model, a small part of the less clear-cut flow data may be judged and manually marked by a network security expert. However, for large volumes of flow data, even if only 0.1% of the flow data needs to be marked manually, the workload is not small. To address this problem, on the one hand, the abnormal flow detection system may further improve the unsupervised isolated forest algorithm, that is, optimize the first detection model, so as to achieve higher accuracy and coverage; on the other hand, the abnormal flow detection system may analyze existing abnormal flow samples and use a generative adversarial network (GAN) to generate a large number of abnormal flow samples with different characteristics, and then train a supervised model with higher accuracy on the generated data and labels.
Through the abnormal flow detection method provided in steps 301 to 308, first, the algorithm-based judgment mode has better flexibility, and an abnormal flow detection algorithm based on unsupervised learning is better suited to the flow monitoring scenario and more adaptable. Second, if only an unsupervised detection algorithm were adopted, detection accuracy and coverage might be low. To solve this problem, the abnormal flow detection system uses only the high-confidence results generated by the unsupervised detection algorithm as labels, and introduces a preset marking strategy to accurately mark the ambiguous, unpredictable data samples. Combining the detection algorithm with the preset marking strategy in this way improves the accuracy and coverage of the detection algorithm; at the same time, the unsupervised algorithm screens out most of the flow data samples, which then need no further marking, so the marking cost is greatly reduced, and the process can also assist in discovering unknown threats and in enriching and updating the regular rule library and the preset intelligence library. Third, in a network traffic service scenario, most behavior features are discrete features, which are not well suited to anomaly detection with unsupervised learning such as the isolated forest algorithm. The abnormal flow detection system therefore encodes the feature data using the trained self-encoder to obtain continuous feature data and sends it to the first detection model, which greatly improves the expressive ability of the detection algorithm. Fourth, the abnormal flow detection system mainly adopts two machine learning algorithms, namely unsupervised learning (the isolated forest algorithm) and supervised learning (the decision tree algorithm). Because the isolated forest and decision tree algorithms both train very quickly, the whole abnormal flow detection system can be trained and updated at a daily (or hourly) level. In actual use, the most recently trained detection model can thus be used to detect real-time flow data and to infer and judge user behavior on the same day, while the regular rule library and the preset intelligence library are updated in time, so that the real-time performance of the whole abnormal flow detection system can be ensured.
The application provides an abnormal flow detection method, wherein an abnormal flow detection system acquires first flow data and extracts characteristic data corresponding to the first flow data; coding the characteristic data through a self-coding model to obtain coded data; the self-coding model is generated based on an unsupervised algorithm; training through the coded data to obtain a first detection model; the first detection model is a model generated based on an unsupervised algorithm; obtaining a test result corresponding to the first flow data according to the first detection model and a preset marking strategy; training based on the feature data and the test result to obtain a second detection model; the second detection model is a model generated based on a supervised algorithm. That is to say, in the embodiment of the present application, the abnormal flow detection system uses the unsupervised algorithm and the supervised algorithm to respectively train and generate the self-coding model, the first detection model and the second detection model, so that the abnormal flow detection can be performed on the to-be-detected flow data based on the self-coding model, the first detection model, the second detection model and the preset rule base, and the obtained detection result is the risk judgment of the to-be-detected flow data, which is realized by combining the unsupervised algorithm, the supervised algorithm and the preset rule base, so that the accuracy of the abnormal flow detection can be improved, and the detection quality of the abnormal flow can be effectively improved.
Based on the foregoing embodiment, in another embodiment of the present application, fig. 9 is a schematic structural diagram of a first composition of an abnormal flow rate detection system, and as shown in fig. 9, the abnormal flow rate detection system 10 according to the embodiment of the present application may include a first obtaining portion 16, an analyzing portion 17, a first extracting portion 18, a determining portion 19, and a generating portion 110.
The first acquisition part 16 is configured to acquire flow data to be detected;
the analysis part 17 is configured to analyze the flow data to be detected to obtain target structured data;
the first extraction part 18 is configured to perform feature extraction processing on the target structured data to obtain target feature data;
the determining part 19 is configured to determine a first detection result corresponding to the target feature data based on a self-coding model and a first detection model, determine a second detection result corresponding to the target feature data based on a second detection model, and determine a third detection result corresponding to the target feature data based on a preset rule base; wherein the self-coding model is a model generated based on an unsupervised algorithm; the first detection model is a model generated based on an unsupervised algorithm; the second detection model is a model generated based on a supervised algorithm;
the generating part 110 is configured to generate a target detection result of the to-be-detected flow data according to the first detection result, the second detection result, and the third detection result.
Further, in the embodiment of the present application, the determining section 19 is specifically configured to input the target feature data to a self-encoding model, and output encoded data; and inputting the encoded data to the first detection model, and outputting the first detection result.
Further, in the embodiment of the present application, the determining part 19 is further specifically configured to input the target feature data into the second detection model, and output the second detection result.
Further, in an embodiment of the present application, the determining part 19 is further specifically configured to perform matching processing on the target feature data by using the preset rule base, so as to obtain the third detection result.
Further, in an embodiment of the present application, the generating part 110 is specifically configured to determine that the target detection result is normal flow if at least two of the first detection result, the second detection result, and the third detection result are normal flow; and to determine that the target detection result is abnormal flow if at least two of the first detection result, the second detection result, and the third detection result are abnormal flow.
Further, in an embodiment of the present application, the generating part 110 is further specifically configured to obtain a preset weight set; and performing weighting operation by using the preset weight set, the first detection result, the second detection result and the third detection result to obtain the target detection result.
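The two ways of combining the three detection results can be sketched as follows. The majority rule follows the "at least two" wording of this description directly. The weighted variant is only one possible reading of the "weighting operation": encoding abnormal as 1 and normal as 0, normalizing by the sum of the weights, and comparing against a threshold of 0.5 are assumptions, as are the function names and the example weights.

```python
from typing import Sequence

def majority_vote(results: Sequence[str]) -> str:
    """Return the label shared by at least two of the three detection results."""
    if len(results) != 3:
        raise ValueError("expected exactly three detection results")
    abnormal_votes = sum(r == "abnormal" for r in results)
    return "abnormal" if abnormal_votes >= 2 else "normal"

def weighted_vote(results: Sequence[str], weights: Sequence[float],
                  threshold: float = 0.5) -> str:
    """Weighted combination: the share of weight voting abnormal is compared to a threshold."""
    if len(results) != len(weights):
        raise ValueError("results and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    abnormal_share = sum(w for r, w in zip(results, weights) if r == "abnormal") / total
    return "abnormal" if abnormal_share >= threshold else "normal"

# Usage: first, second, and third detection results for one sample.
votes = ["normal", "normal", "abnormal"]
by_majority = majority_vote(votes)
by_weight = weighted_vote(votes, weights=[0.2, 0.2, 0.6])  # hypothetical weight set
```

With equal weights the weighted variant reduces to the majority rule; unequal weights let one detector, for example the rule base, override the other two.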
Fig. 10 is a schematic diagram of a configuration structure of an abnormal flow rate detecting system, and as shown in fig. 10, the abnormal flow rate detecting system 10 according to the embodiment of the present application may further include a second obtaining portion 111, a second extracting portion 112, an encoding portion 113, and a training portion 114.
The second obtaining part 111 is configured to acquire first flow data;
the second extraction part 112 is configured to extract feature data corresponding to the first flow data;
the encoding part 113 is configured to perform encoding processing on the feature data through a self-encoding model to obtain encoded data; wherein the self-coding model is a model generated based on an unsupervised algorithm;
the second obtaining part 111 is further configured to obtain a first detection model through the encoded data training; wherein the first detection model is a model generated based on an unsupervised algorithm; obtaining a test result corresponding to the first traffic data according to the first detection model and a preset marking strategy;
the training part 114 is configured to train and obtain a second detection model based on the feature data and the test result; wherein the second detection model is a model generated based on a supervised algorithm.
Further, in an embodiment of the present application, the second extracting portion 112 is specifically configured to parse the first flow data to obtain structured data; and performing feature extraction processing on the structured data to obtain the feature data.
Further, in an embodiment of the present application, the second obtaining part 111 is specifically configured to perform detection processing on the encoded data by using the first detection model to obtain an initial result, wherein the initial result comprises: normal traffic, abnormal traffic, and unpredictable; to mark, according to the preset marking strategy, the encoded data whose initial result is unpredictable to obtain a marking result, wherein the marking result comprises normal flow and abnormal flow; and to generate the test result based on the initial result and the marking result.
Further, in an embodiment of the present application, the second obtaining part 111 is further specifically configured to determine that the marking result is abnormal flow if the encoded data whose initial result is unpredictable matches both a preset intelligence library and a regular rule library; and to determine the marking result according to a preset examination strategy if the encoded data whose initial result is unpredictable does not match the preset intelligence library or the regular rule library.
Further, in an embodiment of the present application, the second obtaining part 111 is further specifically configured to determine that the test result is normal flow if the initial result is normal flow; to determine that the test result is abnormal flow if the initial result is abnormal flow; to determine that the test result is normal flow if the marking result is normal flow; and to determine that the test result is abnormal flow if the marking result is abnormal flow.
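A minimal sketch of such a marking strategy is given below. It assumes that the intelligence library is a set of known-bad source addresses, that the regular rule library is a list of compiled regular expressions, and that each sample keeps a reference to its original `src_ip` and `payload`. These representations, and the `review` callback standing in for the preset examination strategy, are hypothetical.

```python
import re


def mark_unpredictable(sample, intel_ips, rule_patterns, review):
    """Mark one sample whose initial result was 'unpredictable'.

    sample: dict with hypothetical keys "src_ip" and "payload".
    intel_ips: set of addresses from a (hypothetical) intelligence library.
    rule_patterns: compiled regular expressions from a (hypothetical) rule library.
    review: callable implementing the preset examination strategy.
    """
    hits_intel = sample["src_ip"] in intel_ips
    hits_rule = any(p.search(sample["payload"]) for p in rule_patterns)
    if hits_intel and hits_rule:
        return "abnormal"
    # Not matched by both libraries: defer to the examination strategy.
    return review(sample)


# Example libraries, using a documentation-only address range.
INTEL = {"203.0.113.7"}
RULES = [re.compile(r"union\s+select", re.IGNORECASE)]
```

The `review` callback could, for example, queue the sample for manual inspection and return the reviewer's label.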
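The way the test result is assembled from the initial result and the marking result can be sketched as a simple merge. The list and dictionary representations and the name `build_test_results` are assumptions of this sketch.

```python
def build_test_results(initial_results, marking_results):
    """Merge initial results with marking results into final test results.

    initial_results: list of "normal", "abnormal", or "unpredictable".
    marking_results: dict mapping the index of each unpredictable sample
        to its marking result ("normal" or "abnormal").
    """
    test_results = []
    for index, initial in enumerate(initial_results):
        if initial in ("normal", "abnormal"):
            # A decided initial result is carried over unchanged.
            test_results.append(initial)
        elif initial == "unpredictable":
            # An unpredictable sample takes its marking result.
            test_results.append(marking_results[index])
        else:
            raise ValueError(f"unexpected initial result: {initial!r}")
    return test_results
```

After the merge every sample carries a binary label, which is what allows the test results to serve as training labels for a supervised model.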
Further, in the embodiment of the present application, the training part 114 is specifically configured to train with the encoded data, based on an isolation forest algorithm, to obtain the first detection model.
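For readers unfamiliar with the algorithm, the following is a compact, standard-library-only sketch of an isolation forest trained on encoded vectors. It is not the applicant's implementation; the function names, the defaults (100 trees, subsample size 64), and the fixed random seed are assumptions. Points that are isolated after few random splits receive scores closer to 1.

```python
import math
import random


def _c(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def _build(points, depth, max_depth, rng):
    """Recursively split points on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            _build(left, depth + 1, max_depth, rng),
            _build(right, depth + 1, max_depth, rng))


def _path_length(tree, x, depth=0):
    """Depth at which x lands in a leaf, adjusted for the leaf's remaining size."""
    if tree[0] == "leaf":
        return depth + _c(tree[1])
    _, dim, split, left, right = tree
    return _path_length(left if x[dim] < split else right, x, depth + 1)


def train_isolation_forest(data, n_trees=100, sample_size=64, seed=0):
    """Train on a list of equal-length numeric vectors (at least two)."""
    if len(data) < 2:
        raise ValueError("need at least two training vectors")
    rng = random.Random(seed)
    size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(size))
    trees = [_build(rng.sample(data, size), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, size


def anomaly_score(model, x):
    """Score in (0, 1]; higher means easier to isolate, hence more anomalous."""
    trees, size = model
    mean_path = sum(_path_length(t, x) for t in trees) / len(trees)
    return 2.0 ** (-mean_path / _c(size))
```

Mapping the score to the three initial results (normal, abnormal, unpredictable) would require two thresholds, which the application does not specify in this passage.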
Further, in an embodiment of the present application, the training part 114 is further specifically configured to input the feature data and the test result into a decision tree algorithm, and train to obtain the second detection model.
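The application names a decision tree algorithm without fixing a variant. The sketch below uses a small CART-style tree with Gini impurity as one possible choice. The function names, the depth limit, and the representation of a node as a `(feature, threshold, left, right)` tuple are assumptions.

```python
from collections import Counter


def _gini(labels):
    """Gini impurity of a non-empty list of labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def _best_split(rows, labels):
    """Find the (feature, threshold) pair with the lowest weighted Gini impurity."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        # Skipping the smallest value keeps both sides of every split non-empty.
        for t in sorted({r[f] for r in rows})[1:]:
            left = [y for r, y in zip(rows, labels) if r[f] < t]
            right = [y for r, y in zip(rows, labels) if r[f] >= t]
            score = (len(left) * _gini(left) + len(right) * _gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def train_tree(rows, labels, max_depth=3):
    """Train on feature vectors (rows) and test results (labels, as strings)."""
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1:
        return majority
    split = _best_split(rows, labels)
    if split is None:  # all rows identical, nothing to split on
        return majority
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] < t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] >= t]
    return (f, t,
            train_tree([r for r, _ in left], [y for _, y in left], max_depth - 1),
            train_tree([r for r, _ in right], [y for _, y in right], max_depth - 1))


def predict(tree, row):
    """Walk from the root to a leaf; leaves are label strings."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] < t else right
    return tree
```

Note that the tree is trained on the un-encoded feature data, with the test results produced by the unsupervised stage acting as labels.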
Further, in an embodiment of the present application, the second obtaining part 111 is further configured to, before acquiring the first traffic data and extracting the feature data corresponding to the first traffic data, capture network traffic data and mirror-store the network traffic data to a preset storage space;
correspondingly, the second obtaining part 111 is further specifically configured to obtain the first traffic data from the network traffic data stored in the preset storage space.
Further, in an embodiment of the present application, the second obtaining part 111 is further configured to, before the feature data is encoded through the self-coding model to obtain the encoded data, obtain second traffic data from the network traffic data stored in the preset storage space; and to train with the second traffic data, based on an unsupervised algorithm, to obtain the self-coding model.
Fig. 11 is a third schematic diagram of the composition structure of the abnormal traffic detection system. As shown in Fig. 11, the abnormal traffic detection system 10 according to the embodiment of the present application may further include a processor 115 and a memory 116 storing instructions executable by the processor 115. Further, the abnormal traffic detection system 10 may include a communication interface 117 and a bus 118 for connecting the processor 115, the memory 116, and the communication interface 117.
In an embodiment of the present application, the processor 115 may be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a Central Processing Unit (CPU), a controller, a microcontroller, and a microprocessor. It is understood that the electronic device for implementing the above processor function may be other electronic devices, and the embodiments of the present application are not limited in particular. The abnormal flow detection system 10 may also include a memory 116, and the memory 116 may be coupled to the processor 115, wherein the memory 116 is configured to store executable program code comprising computer operating instructions, and the memory 116 may comprise a high speed RAM memory and may also comprise a non-volatile memory, such as at least two disk memories.
In the embodiment of the present application, the bus 118 is used for connecting the communication interface 117, the processor 115, and the memory 116, and for mutual communication among these devices.
In an embodiment of the present application, the memory 116 is used for storing instructions and data.
Further, in an embodiment of the present application, the processor 115 is configured to obtain flow data to be detected, and analyze the flow data to be detected to obtain target structured data; perform feature extraction processing on the target structured data to obtain target feature data; determine a first detection result corresponding to the target feature data based on a self-coding model and a first detection model, determine a second detection result corresponding to the target feature data based on a second detection model, and determine a third detection result corresponding to the target feature data based on a preset rule base; wherein the self-coding model is a model generated based on an unsupervised algorithm; the first detection model is a model generated based on an unsupervised algorithm; the second detection model is a model generated based on a supervised algorithm; and generate a target detection result of the flow data to be detected according to the first detection result, the second detection result, and the third detection result.
Further, in an embodiment of the present application, the processor 115 is further configured to acquire first traffic data and extract feature data corresponding to the first traffic data; coding the characteristic data through a self-coding model to obtain coded data; wherein the self-coding model is a model generated based on an unsupervised algorithm; training through the coded data to obtain a first detection model; wherein the first detection model is a model generated based on an unsupervised algorithm; obtaining a test result corresponding to the first flow data according to the first detection model and a preset marking strategy; training to obtain a second detection model based on the feature data and the test result; wherein the second detection model is a model generated based on a supervised algorithm.
In practical applications, the memory 116 may be a volatile memory, such as a Random-Access Memory (RAM); or a non-volatile memory, such as a Read-Only Memory (ROM), a flash memory, a Hard Disk Drive (HDD), or a Solid-State Drive (SSD); or a combination of the above types of memory. The memory 116 provides instructions and data to the processor 115.
In addition, each functional module in this embodiment may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware or a form of a software functional module.
Based on this understanding, the technical solution of the present embodiment, in essence or in the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method of the present embodiment. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and the like.
The abnormal flow detection system provided by the embodiment of the application uses an unsupervised algorithm and a supervised algorithm to respectively train and generate a self-coding model, a first detection model and a second detection model, so that abnormal flow detection can be performed on flow data to be detected based on the self-coding model, the first detection model, the second detection model and a preset rule base, and the obtained detection result is the risk judgment of the flow data to be detected realized by combining the unsupervised algorithm, the supervised algorithm and the preset rule base, so that the accuracy of abnormal flow detection can be improved, and the detection quality of abnormal flow is effectively improved.
An embodiment of the present application provides a computer-readable storage medium, on which a program is stored, and when the program is executed by a processor, the program implements the abnormal traffic detection method as described above.
Specifically, the program instructions corresponding to the abnormal traffic detection method in the present embodiment may be stored on a storage medium such as an optical disc, a hard disk, or a USB flash drive. When the program instructions corresponding to the abnormal traffic detection method in the storage medium are read or executed by an electronic device, the following steps are included:
acquiring flow data to be detected, and analyzing the flow data to be detected to obtain target structured data;
carrying out feature extraction processing on the target structured data to obtain target feature data;
determining a first detection result corresponding to the target characteristic data based on a self-coding model and a first detection model, determining a second detection result corresponding to the target characteristic data based on a second detection model, and determining a third detection result corresponding to the target characteristic data based on a preset rule base; wherein the self-coding model is a model generated based on an unsupervised algorithm; the first detection model is a model generated based on an unsupervised algorithm; the second detection model is a model generated based on a supervised algorithm;
and generating a target detection result of the flow data to be detected according to the first detection result, the second detection result and the third detection result.
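The detection steps listed above can be read as a pipeline of interchangeable stages. The sketch below wires them together with plain callables; the class name `AbnormalTrafficDetector`, its field names, and the string labels are hypothetical, and each stage is left for the implementer to supply.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Label = str  # assumed to be "normal" or "abnormal"
Vector = Sequence[float]


@dataclass
class AbnormalTrafficDetector:
    parse: Callable[[bytes], dict]           # raw flow data -> target structured data
    extract: Callable[[dict], Vector]        # structured data -> target feature data
    encode: Callable[[Vector], Vector]       # self-coding model
    first_model: Callable[[Vector], Label]   # unsupervised model on encoded data
    second_model: Callable[[Vector], Label]  # supervised model on feature data
    rule_base: Callable[[Vector], Label]     # preset rule base matching
    fuse: Callable[[Label, Label, Label], Label]  # voting or weighting

    def detect(self, raw: bytes) -> Label:
        structured = self.parse(raw)
        features = self.extract(structured)
        # Only the first detection model consumes the encoded representation.
        first = self.first_model(self.encode(features))
        second = self.second_model(features)
        third = self.rule_base(features)
        return self.fuse(first, second, third)
```

Keeping the fusion step as a parameter lets the same pipeline be run with either the voting rule or the weighted rule described in the embodiments.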
When the program instructions corresponding to an abnormal flow detection method in the storage medium are read or executed by an electronic device, the method also comprises the following steps:
acquiring first flow data and extracting feature data corresponding to the first flow data;
coding the characteristic data through a self-coding model to obtain coded data; wherein the self-coding model is a model generated based on an unsupervised algorithm;
training through the coded data to obtain a first detection model; wherein the first detection model is a model generated based on an unsupervised algorithm;
obtaining a test result corresponding to the first flow data according to the first detection model and a preset marking strategy;
training to obtain a second detection model based on the feature data and the test result; wherein the second detection model is a model generated based on a supervised algorithm.
It will be apparent to those skilled in the art that embodiments of the present application may be provided as a method, terminal, or computer program product. Accordingly, the present application may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of implementations of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart block or blocks.
The above description is only a preferred embodiment of the present application, and is not intended to limit the scope of the present application.
Industrial applicability
The embodiments of the present application provide an abnormal flow detection method and system, and a computer storage medium. The abnormal flow detection system acquires flow data to be detected and analyzes the flow data to be detected to obtain target structured data; performs feature extraction processing on the target structured data to obtain target feature data; determines a first detection result corresponding to the target feature data based on the self-coding model and the first detection model, determines a second detection result corresponding to the target feature data based on the second detection model, and determines a third detection result corresponding to the target feature data based on a preset rule base, wherein the self-coding model and the first detection model are models generated based on an unsupervised algorithm, and the second detection model is a model generated based on a supervised algorithm; and generates a target detection result of the flow data to be detected according to the first detection result, the second detection result, and the third detection result. That is to say, in the embodiments of the present application, the abnormal flow detection system uses an unsupervised algorithm and a supervised algorithm to train and generate the self-coding model, the first detection model, and the second detection model, so that abnormal flow detection can be performed on the flow data to be detected based on the self-coding model, the first detection model, the second detection model, and the preset rule base. The resulting detection result is a risk judgment on the flow data to be detected that combines the unsupervised algorithm, the supervised algorithm, and the preset rule base, which can improve the accuracy of abnormal flow detection and effectively improve the detection quality of abnormal flow.

Claims (20)

  1. A method of abnormal traffic detection, the method comprising:
    acquiring flow data to be detected, and analyzing the flow data to be detected to obtain target structured data;
    carrying out feature extraction processing on the target structured data to obtain target feature data;
    determining a first detection result corresponding to the target characteristic data based on a self-coding model and a first detection model, determining a second detection result corresponding to the target characteristic data based on a second detection model, and determining a third detection result corresponding to the target characteristic data based on a preset rule base; wherein the self-coding model is a model generated based on an unsupervised algorithm; the first detection model is a model generated based on an unsupervised algorithm; the second detection model is a model generated based on a supervised algorithm;
    and generating a target detection result of the flow data to be detected according to the first detection result, the second detection result and the third detection result.
  2. The method of claim 1, wherein the determining a first detection result corresponding to the target feature data based on the self-coding model and the first detection model comprises:
    inputting the target characteristic data into a self-coding model and outputting coded data;
    and inputting the encoded data to the first detection model, and outputting the first detection result.
  3. The method of claim 1, wherein the determining a second detection result corresponding to the target feature data based on a second detection model comprises:
    and inputting the target characteristic data into the second detection model, and outputting the second detection result.
  4. The method according to claim 1, wherein the determining a third detection result corresponding to the target feature data based on a preset rule base includes:
    and matching the target characteristic data by using the preset rule base to obtain the third detection result.
  5. The method of claim 1, wherein generating the target detection result of the flow data to be detected according to the first detection result, the second detection result, and the third detection result comprises:
    if at least two of the first detection result, the second detection result and the third detection result are normal flow, determining that the target detection result is normal flow;
    and if at least two of the first detection result, the second detection result and the third detection result are abnormal flow, determining that the target detection result is the abnormal flow.
  6. The method of claim 1, wherein generating the target detection result of the flow data to be detected according to the first detection result, the second detection result, and the third detection result comprises:
    acquiring a preset weight set;
    and performing weighting operation by using the preset weight set, the first detection result, the second detection result and the third detection result to obtain the target detection result.
  7. A method of abnormal traffic detection, the method comprising:
    acquiring first flow data and extracting feature data corresponding to the first flow data;
    coding the characteristic data through a self-coding model to obtain coded data; wherein the self-coding model is a model generated based on an unsupervised algorithm;
    training through the coded data to obtain a first detection model; wherein the first detection model is a model generated based on an unsupervised algorithm;
    obtaining a test result corresponding to the first traffic data according to the first detection model and a preset marking strategy;
    training and obtaining a second detection model based on the feature data and the test result; wherein the second detection model is a model generated based on a supervised algorithm.
  8. The method of claim 7, wherein the extracting feature data corresponding to the first flow data comprises:
    analyzing the first flow data to obtain structured data;
    and carrying out feature extraction processing on the structured data to obtain the feature data.
  9. The method according to claim 7, wherein the obtaining of the test result corresponding to the first traffic data according to the first detection model and a preset marking strategy comprises:
    detecting the coded data by using the first detection model to obtain an initial result; wherein the initial result comprises: normal traffic, abnormal traffic, and unpredictable;
    according to the preset marking strategy, marking the encoded data of which the initial result is unpredictable to obtain a marking result; wherein the marking result comprises normal flow and abnormal flow;
    generating the test result based on the initial result and the marking result.
  10. The method according to claim 9, wherein the marking the encoded data, the initial result of which is unpredictable, according to the preset marking policy to obtain a marking result comprises:
    if the coded data whose initial result is unpredictable matches both a preset intelligence library and a regular rule library, determining that the marking result is abnormal flow;
    and if the coded data whose initial result is unpredictable does not match the preset intelligence library or the regular rule library, determining the marking result according to a preset examination strategy.
  11. The method of claim 9, wherein the generating the test result based on the initial result and the marking result comprises:
    if the initial result is normal flow, determining that the test result is normal flow;
    if the initial result is abnormal flow, determining that the test result is abnormal flow;
    if the marking result is the normal flow, determining that the test result is the normal flow;
    and if the marking result is abnormal flow, determining that the test result is the abnormal flow.
  12. The method of claim 7, wherein the training through the coded data to obtain a first detection model comprises:
    and training by utilizing the coded data to obtain the first detection model based on an isolation forest algorithm.
  13. The method of claim 7, wherein the training and obtaining a second detection model based on the feature data and the test result comprises:
    and inputting the feature data and the test result into a decision tree algorithm, and training to obtain the second detection model.
  14. The method of claim 7, wherein prior to the obtaining the first flow data and extracting the feature data corresponding to the first flow data, the method further comprises:
    capturing network traffic data, and storing the network traffic data to a preset storage space in a mirror image manner;
    accordingly, the obtaining the first traffic data comprises:
    and acquiring the first flow data from the network flow data stored in the preset storage space.
  15. The method of claim 14, wherein before the encoding the feature data by the self-encoding model to obtain encoded data, the method further comprises:
    acquiring second flow data from the network flow data stored in the preset storage space;
    training with the second flow data to obtain the self-coding model based on an unsupervised algorithm.
  16. An abnormal flow detection system, the abnormal flow detection system comprising: a first acquisition part, an analysis part, a first extraction part, a determining part, and a generating part, wherein
    the first acquisition part is configured to acquire flow data to be detected;
    the analysis part is configured to analyze the flow data to be detected to obtain target structured data;
    the first extraction part is configured to perform feature extraction processing on the target structured data to obtain target feature data;
    the determining part is configured to determine a first detection result corresponding to the target feature data based on a self-coding model and a first detection model, determine a second detection result corresponding to the target feature data based on a second detection model, and determine a third detection result corresponding to the target feature data based on a preset rule base; wherein the self-coding model is a model generated based on an unsupervised algorithm; the first detection model is a model generated based on an unsupervised algorithm; the second detection model is a model generated based on a supervised algorithm;
    the generating part is configured to generate a target detection result of the to-be-detected flow data according to the first detection result, the second detection result and the third detection result.
  17. An abnormal flow detection system, the abnormal flow detection system comprising: a second acquisition part, a second extraction part, a coding part, and a training part, wherein
    the second acquisition part is configured to acquire first flow data;
    the second extraction part is configured to extract feature data corresponding to the first flow data;
    the coding part is configured to perform coding processing on the characteristic data through a self-coding model to obtain coded data; wherein the self-coding model is a model generated based on an unsupervised algorithm;
    the second acquisition part is further configured to acquire a first detection model through the encoded data training; wherein the first detection model is a model generated based on an unsupervised algorithm; obtaining a test result corresponding to the first flow data according to the first detection model and a preset marking strategy;
    the training part is configured to train and obtain a second detection model based on the feature data and the test result; wherein the second detection model is a model generated based on a supervised algorithm.
  18. An abnormal traffic detection system comprising a processor, a memory having stored thereon instructions executable by the processor to perform the method of any of claims 1-6 when the instructions are executed by the processor.
  19. An abnormal traffic detection system comprising a processor, a memory having stored thereon instructions executable by the processor to perform the method of any of claims 7-15 when the instructions are executed by the processor.
  20. A computer-readable storage medium having stored thereon a program for use in an abnormal flow detection system, the program, when executed by a processor, implementing the method of any one of claims 1-6 and 7-15.
CN202080100505.9A 2020-06-24 2020-06-24 Abnormal flow detection method and system, and computer storage medium Pending CN115606162A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/098177 WO2021258348A1 (en) 2020-06-24 2020-06-24 Abnormal flow detection method and system and computer storage medium

Publications (1)

Publication Number Publication Date
CN115606162A true CN115606162A (en) 2023-01-13

Family

ID=79282432

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080100505.9A Pending CN115606162A (en) 2020-06-24 2020-06-24 Abnormal flow detection method and system, and computer storage medium

Country Status (2)

Country Link
CN (1) CN115606162A (en)
WO (1) WO2021258348A1 (en)

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113728605A (en) * 2020-01-31 2021-11-30 松下电器(美国)知识产权公司 Abnormality detection method and abnormality detection device
CN114500334B (en) * 2021-12-31 2024-04-09 钉钉(中国)信息技术有限公司 Diagnosis method and device for server application architecture
CN114547970B (en) * 2022-01-25 2024-02-20 中国长江三峡集团有限公司 Intelligent diagnosis method for abnormality of top cover drainage system of hydropower plant
CN114629699B (en) * 2022-03-07 2022-12-09 北京邮电大学 Migratory network flow behavior anomaly detection method and device based on deep reinforcement learning
CN114726581B (en) * 2022-03-09 2023-06-20 同济大学 Abnormality detection method and device, electronic equipment and storage medium
CN114679308B (en) * 2022-03-21 2023-04-07 山东大学 Unknown flow identification method and system based on double-path self-coding
CN114584391B (en) * 2022-03-22 2024-02-09 恒安嘉新(北京)科技股份公司 Method, device, equipment and storage medium for generating abnormal flow processing strategy
CN114745170B (en) * 2022-04-07 2023-08-18 鹏城实验室 Internet of things abnormality real-time detection method, device, terminal and readable storage medium
CN114615088A (en) * 2022-04-25 2022-06-10 国网冀北电力有限公司信息通信分公司 Terminal service flow abnormity detection model establishing method and abnormity detection method
CN115016433A (en) * 2022-06-01 2022-09-06 哈尔滨工业大学(威海) Vehicle-mounted CAN bus flow abnormity detection method and system
CN114866338A (en) * 2022-06-10 2022-08-05 阿里云计算有限公司 Network security detection method and device and electronic equipment
CN115277098B (en) * 2022-06-27 2023-07-18 深圳铸泰科技有限公司 Network flow abnormality detection device and method based on intelligent learning
CN115174178B (en) * 2022-06-28 2023-07-04 南京邮电大学 Semi-supervised network traffic anomaly detection method based on generation of countermeasure network
CN115174190B (en) * 2022-06-29 2024-01-26 武汉极意网络科技有限公司 Information security management and control system and method based on network traffic
CN115118514A (en) * 2022-07-11 2022-09-27 深信服科技股份有限公司 Data detection method, device, equipment and medium
CN115250199B (en) * 2022-07-15 2023-04-07 北京六方云信息技术有限公司 Data stream detection method and device, terminal equipment and storage medium
CN115278680B (en) * 2022-07-29 2023-04-07 国网区块链科技(北京)有限公司 Mobile application attack detection method, device, equipment and storage medium
CN115296919B (en) * 2022-08-15 2023-04-25 江西师范大学 Method and system for calculating special traffic packet by edge gateway
CN115080965B (en) * 2022-08-16 2022-11-15 杭州比智科技有限公司 Unsupervised anomaly detection method and unsupervised anomaly detection system based on historical performance
CN115529162A (en) * 2022-08-26 2022-12-27 中国科学院信息工程研究所 Method and system for protecting abnormal behaviors of industrial control flow
CN115694947B (en) * 2022-10-26 2024-04-16 四川大学 Network encryption traffic threat sample generation mechanism method based on countermeasure generation DQN
CN116132337B (en) * 2023-04-04 2023-06-13 深圳行云创新科技有限公司 Interface flow anomaly detection method based on service grid technology
CN116319386A (en) * 2023-05-17 2023-06-23 北京国信蓝盾科技有限公司 Availability and fault prediction method and device, electronic equipment and medium
CN116993663A (en) * 2023-06-12 2023-11-03 阿里巴巴(中国)有限公司 Image processing method and training method of image processing model

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108563548B (en) * 2018-03-19 2020-10-16 创新先进技术有限公司 Abnormality detection method and apparatus
CN108985330B (en) * 2018-06-13 2021-03-26 华中科技大学 Self-coding network and training method thereof, and abnormal power utilization detection method and system
CN111178523B (en) * 2019-08-02 2023-06-06 腾讯科技(深圳)有限公司 Behavior detection method and device, electronic equipment and storage medium

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116582301A (en) * 2023-04-17 2023-08-11 华中科技大学 Industrial control network abnormal flow detection method and system based on Laplacian pyramid
CN116582301B (en) * 2023-04-17 2024-02-02 华中科技大学 Industrial control network abnormal flow detection method, system and computer readable storage medium based on Laplacian pyramid
CN117061254A (en) * 2023-10-12 2023-11-14 之江实验室 Abnormal flow detection method, device and computer equipment
CN117061254B (en) * 2023-10-12 2024-01-23 之江实验室 Abnormal flow detection method, device and computer equipment
CN117151768A (en) * 2023-10-30 2023-12-01 国网浙江省电力有限公司营销服务中心 Construction method and system of risk control rule base of generated marketing event

Also Published As

Publication number Publication date
WO2021258348A1 (en) 2021-12-30

Similar Documents

Publication Publication Date Title
CN115606162A (en) Abnormal flow detection method and system, and computer storage medium
EP3355547B1 (en) Method and system for learning representations of network flow traffic
CN110851321B (en) Service alarm method, equipment and storage medium
CN113242207B (en) Iterative clustering network flow anomaly detection method
WO2018156976A2 (en) Processing pipeline for monitoring information systems
CN109063745A (en) Network device type recognition method and system based on decision tree
CN112738039A (en) Malicious encrypted flow detection method, system and equipment based on flow behavior
CN113535825A (en) Cloud computing intelligence-based data information risk control processing method and system
CN112839014B (en) Method, system, equipment and medium for establishing abnormal visitor identification model
CN117041019B (en) Log analysis method, device and storage medium of content delivery network CDN
CN112884121A (en) Traffic identification method based on deep convolutional generative adversarial network
CN115277189A (en) Unsupervised intrusion flow detection and identification method based on generative adversarial network
CN117421684B (en) Abnormal data monitoring and analyzing method based on data mining and neural network
CN110650124A (en) Network flow anomaly detection method based on multilayer echo state network
CN113852605B (en) Protocol format automatic inference method and system based on relation reasoning
CN112115443B (en) Terminal user authentication method and system
CN114615088A (en) Terminal service flow anomaly detection model establishing method and anomaly detection method
CN114021637A (en) Decentralized application encrypted flow classification method and device based on metric space
CN114169433A (en) Industrial fault prediction method based on federated learning + image learning + CNN
Li et al. Solving the data imbalance problem in network intrusion detection: A MP-CVAE based method
Liu et al. A feature compression technique for anomaly detection using convolutional neural networks
CN115442309B (en) Packet granularity network traffic classification method based on graph neural network
CN116582301B (en) Industrial control network abnormal flow detection method, system and computer readable storage medium based on Laplacian pyramid
Liao et al. GE-IDS: an intrusion detection system based on grayscale and entropy
CN114679308B (en) Unknown flow identification method and system based on dual-path autoencoder

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination