WO2022037130A1 - Method, device, electronic device and storage medium for detecting abnormal network traffic - Google Patents

Method, device, electronic device and storage medium for detecting abnormal network traffic


Publication number
WO2022037130A1
Authority
WO
WIPO (PCT)
Prior art keywords
initial
classifier
abnormal
adaboost
training
Prior art date
Application number
PCT/CN2021/092227
Other languages
English (en)
French (fr)
Inventor
林月晴
范渊
刘博�
Original Assignee
杭州安恒信息技术股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 杭州安恒信息技术股份有限公司
Priority to US18/022,170 (published as US20230300159A1)
Publication of WO2022037130A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/14Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425Traffic logging, e.g. anomaly detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147Distances to closest patterns, e.g. nearest neighbour classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14Network analysis or design
    • H04L41/142Network analysis or design using statistical or mathematical methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14Network analysis or design
    • H04L41/145Network analysis or design involving simulating, designing, planning or modelling of a network
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/16Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/06Generation of reports
    • H04L43/062Generation of reports related to network traffic
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00Traffic control in data switching networks
    • H04L47/10Flow control; Congestion control
    • H04L47/24Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2441Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00Traffic control in data switching networks
    • H04L47/10Flow control; Congestion control
    • H04L47/29Flow control; Congestion control using a combination of thresholds
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/16Threshold monitoring
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/50Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Definitions

  • the present application relates to the technical field of data processing, and in particular, to a method, device, electronic device and storage medium for detecting abnormal network traffic.
  • Abnormal network traffic refers to the situation in which network traffic behavior deviates from normal behavior. Abnormal network traffic not only affects the normal use of the network and business systems, but also threatens the information security of network users, causing many harms to network users. By classifying and monitoring different traffic in the network, many abnormal behaviors in the computer network can be discovered in time, and at the same time, they can be controlled in a targeted manner, thereby effectively ensuring the normal operation of the computer network. Therefore, detecting abnormal traffic is an important aspect of network operation and maintenance. How to effectively diagnose abnormal network traffic and ensure network availability and smoothness plays a crucial role in the sustainable and normal development of the network.
  • At present, abnormal traffic is mainly detected in the following three ways: (1) Fixed threshold detection: this relies mainly on quantitative analysis and is relatively simple to operate, but it requires network managers to set the threshold according to the actual situation, which demands rich theoretical knowledge and management experience. (2) Statistical detection: judgments are made through statistical analysis of the data, but this can only detect that network traffic is abnormal without identifying the attributes of the anomaly, and it is only suitable for non-real-time detection. (3) SNMP-based detection: this relies mainly on software to detect abnormal network traffic, but it cannot handle complex network traffic.
  • Embodiments of the present application provide a method, device, electronic device, and storage medium for detecting abnormal network traffic, so as to at least solve the problem of poor detection of abnormal network traffic in the related art.
  • an embodiment of the present application provides a method for detecting abnormal network traffic, including:
  • the Adaboost classifier is obtained;
  • the collected traffic data is classified using the Adaboost classifier.
  • acquiring the abnormal feature vectors of the multiple pieces of traffic data includes:
  • an abnormal feature vector of the multi-segment traffic data is generated.
  • obtaining a plurality of initial classifiers includes:
  • the initial classification model is trained according to the distance and KNN algorithm to obtain the plurality of initial classifiers.
  • performing data normalization processing on the abnormal feature vectors in the training set includes:
  • Anomaly feature vectors in the training set are mapped between 0 and 1 using maximum normalization.
  • the method further includes:
  • the initial classifier is trained according to the distance and the KNN algorithm.
  • the obtained Adaboost classifier includes:
  • the training group is input into a plurality of initial classifiers, and the error of each initial classifier is calculated;
  • the initial classifier after training is input into the Adaboost classification model to obtain an Adaboost classifier.
  • determining the weight of each initial classifier according to the error of each initial classifier includes:
  • the weight of each initial classifier is determined according to the error of each initial classifier.
  • an embodiment of the present application further provides a device for detecting abnormal network traffic, including:
  • the first acquisition module is used to acquire multi-segment traffic data under different monitoring states
  • a second acquisition module configured to acquire abnormal feature vectors in the multi-segment traffic data
  • the first training module is used to train an initial classification model according to the abnormal feature vector and based on the KNN algorithm to obtain a plurality of initial classifiers;
  • a second training module configured to train an initial Adaboost classification model according to the abnormal feature vectors and the multiple initial classifiers, and based on the Adaboost algorithm, to obtain an Adaboost classifier;
  • a classification module configured to use the Adaboost classifier to classify the collected traffic data.
  • an embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor; when the processor executes the computer program, the method for detecting abnormal network traffic as described in the first aspect above is implemented.
  • an embodiment of the present application provides a storage medium on which a computer program is stored, and when the program is executed by a processor, implements the method for detecting abnormal network traffic as described in the first aspect above.
  • the method, device, electronic device, and storage medium for detecting abnormal network traffic acquire multi-segment traffic data under different monitoring states; acquire abnormal feature vectors in the multi-segment traffic data; train an initial classification model according to the abnormal feature vectors and based on the KNN algorithm to obtain multiple initial classifiers; train an initial Adaboost classification model according to the abnormal feature vectors and the multiple initial classifiers, and based on the Adaboost algorithm, to obtain an Adaboost classifier; and use the Adaboost classifier to classify the collected traffic data. This solves the problem of poor detection of abnormal network traffic in the related art and improves the detection accuracy of abnormal network traffic.
  • FIG. 1 is a block diagram of a hardware structure of a terminal according to a method for detecting abnormal network traffic according to an embodiment of the present application.
  • FIG. 2 is a flowchart of a method for detecting abnormal network traffic according to an embodiment of the present application.
  • FIG. 3 is a flowchart of a method for detecting abnormal network traffic according to an optional embodiment of the present application.
  • FIG. 4 is a structural block diagram of an apparatus for detecting abnormal network traffic according to an embodiment of the present application.
  • Words such as "connected", "coupled", and the like referred to in this application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect.
  • the “plurality” referred to in this application means greater than or equal to two.
  • “And/or” describes the association relationship between associated objects, indicating that there can be three kinds of relationships. For example, “A and/or B” can mean that A exists alone, A and B exist at the same time, and B exists alone.
  • the terms “first”, “second”, “third”, etc. involved in this application are only to distinguish similar objects, and do not represent a specific order for the objects.
  • FIG. 1 is a block diagram of a hardware structure of a terminal of a method for detecting abnormal network traffic according to an embodiment of the present application.
  • the terminal 10 may include one or more (only one is shown in FIG. 1 ) processors 102 (the processors 102 may include but are not limited to processing devices such as a microprocessor MCU or a programmable logic device FPGA) and a memory 104 for storing data, optionally, the above-mentioned terminal may further include a transmission device 106 and an input/output device 108 for communication functions.
  • FIG. 1 is only for illustration, which does not limit the structure of the above-mentioned terminal.
  • the terminal 10 may also include more or fewer components than those shown in FIG. 1 , or have a different configuration than that shown in FIG. 1 .
  • the memory 104 can be used to store computer programs, for example, software programs and modules of application software, such as the computer programs corresponding to the method for detecting abnormal network traffic in the embodiments of the present application. The processor 102 runs the computer programs stored in the memory 104 to perform various functional applications and data processing, that is, to realize the above-mentioned method.
  • Memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, memory 104 may further include memory located remotely from processor 102, which may be connected to terminal 10 through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
  • Transmission device 106 is used to receive or transmit data via a network.
  • the specific example of the above-mentioned network may include a wireless network provided by the communication provider of the terminal 10 .
  • the transmission device 106 includes a network adapter (Network Interface Controller, NIC for short), which can be connected to other network devices through a base station so as to communicate with the Internet.
  • the transmission device 106 may be a radio frequency (Radio Frequency, RF for short) module, which is used to communicate with the Internet in a wireless manner.
  • FIG. 2 is a flowchart of a method for detecting abnormal network traffic according to an embodiment of the present application. As shown in FIG. 2 , the process includes the following steps:
  • Step S201 Acquire multi-segment traffic data in different monitoring states.
  • the multi-segment traffic data may be acquired in real time, or may be acquired in a database corresponding to the multi-segment traffic data.
  • Step S202 Acquire abnormal feature vectors in the multi-segment traffic data.
  • abnormal traffic data in the multi-segment traffic data may be determined first and marked; characteristic data of the marked traffic abnormal data may be determined; and abnormal characteristic vectors of the multi-segment traffic data may be generated according to the characteristic data. In this way, a way to obtain the abnormal feature vector is provided.
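As a rough illustration of step S202, the sketch below keeps only the segments marked as abnormal and turns each one into a feature vector. The feature choice (mean, peak and standard deviation of a per-second byte count) and the names `feature_vector` and `build_abnormal_vectors` are assumptions made for illustration; the application does not specify which characteristic data is used.

```python
from statistics import mean, pstdev

def feature_vector(segment):
    """Summarise one traffic segment (a list of byte counts) as a vector.

    The three features here are hypothetical examples of "characteristic data".
    """
    return [mean(segment), max(segment), pstdev(segment)]

def build_abnormal_vectors(segments, is_abnormal):
    """Return feature vectors for the segments that were marked abnormal."""
    return [
        feature_vector(seg)
        for seg, flagged in zip(segments, is_abnormal)
        if flagged
    ]

segments = [[1, 2, 3], [10, 10, 10]]
marks = [False, True]  # labels produced by the marking step
vectors = build_abnormal_vectors(segments, marks)
```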
  • Step S203 Train an initial classification model according to the abnormal feature vector and the KNN algorithm to obtain a plurality of initial classifiers.
  • the KNN algorithm, also known as the proximity algorithm or the K-Nearest Neighbor (KNN) classification algorithm, is one of the simplest methods in data mining classification.
  • K nearest neighbors means that each sample can be represented by the K samples closest to it.
  • the nearest neighbor algorithm is a method of classifying each record in the data set.
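The nearest-neighbor idea described above can be sketched in a few lines. This is a minimal illustration assuming Euclidean distance and majority voting; the function name `knn_predict`, the default `k=3` and the sample labels are hypothetical and not taken from the application.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training vectors."""
    # Pair every training label with its Euclidean distance to the query.
    by_distance = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

train_x = [[0, 0], [0, 1], [5, 5], [6, 5]]
train_y = ["normal", "normal", "abnormal", "abnormal"]
label = knn_predict(train_x, train_y, [5, 6], k=3)
```

With these toy points, two of the three nearest neighbors of `[5, 6]` carry the label `"abnormal"`, so that label wins the vote.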
  • step S203 can be implemented as follows: taking a first preset threshold number of abnormal feature vectors as a training set; performing data normalization on the abnormal feature vectors in the training set; determining the distances between the abnormal feature vectors in the normalized training set; and training the initial classification model according to the distances and the KNN algorithm to obtain multiple initial classifiers.
  • performing data normalization processing on the abnormal feature vectors in the training set includes: using the maximum value normalization to map the abnormal feature vectors in the training set between 0 and 1. In this way, it can be ensured that features of different dimensions are in the same space, so as to improve the detection effect of abnormal network traffic.
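A minimal sketch of the normalization step follows. It assumes that "maximum value normalization" means per-feature min-max scaling, which is one common way to map values into the range 0 to 1; the application does not give the formula, and the name `min_max_normalize` is hypothetical.

```python
def min_max_normalize(vectors):
    """Scale every feature (column) of `vectors` into the range [0, 1]."""
    columns = list(zip(*vectors))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            # A constant column has no spread, so map it to 0.0.
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ]
        for row in vectors
    ]

scaled = min_max_normalize([[1, 10], [3, 10], [5, 10]])
```

Scaling each feature separately keeps a large-valued feature (for example a byte count) from dominating the distance computation that KNN relies on.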
  • a second preset threshold number of abnormal feature vectors can also be used as a test set; the test set is input into the multiple initial classifiers to obtain the classification result corresponding to each initial classifier; the accuracy of the classification result corresponding to each initial classifier is determined; it is then judged whether that accuracy is greater than a third preset threshold; if not, the initial classifiers whose accuracy is not greater than the third preset threshold are obtained and trained again according to the distances and the KNN algorithm. In this way, the initial classifiers are tested, which helps improve their accuracy and the detection effect.
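The accuracy check on the test set can be sketched as below. Classifiers are modelled as plain callables and the threshold value is arbitrary; the names `accuracy` and `classifiers_to_retrain` are assumptions for illustration only.

```python
def accuracy(classifier, test_x, test_y):
    """Fraction of test vectors that `classifier` labels correctly."""
    hits = sum(1 for x, y in zip(test_x, test_y) if classifier(x) == y)
    return hits / len(test_y)

def classifiers_to_retrain(classifiers, test_x, test_y, threshold):
    """Indices of classifiers whose accuracy is NOT greater than `threshold`."""
    return [
        i
        for i, clf in enumerate(classifiers)
        if accuracy(clf, test_x, test_y) <= threshold
    ]

always_abnormal = lambda x: 1               # toy classifier: flags everything
by_sign = lambda x: 1 if x[0] > 0 else 0    # toy classifier: sign of feature 0
test_x = [[1], [-1], [2], [-2]]
test_y = [1, 0, 1, 0]
weak = classifiers_to_retrain([always_abnormal, by_sign], test_x, test_y, 0.8)
```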
  • Step S204 Train an initial Adaboost classification model based on the abnormal feature vector and a plurality of initial classifiers, and an Adaboost algorithm to obtain an Adaboost classifier.
  • the Adaboost algorithm is an iterative algorithm whose core idea is to train different classifiers (weak classifiers) for the same training set, and then combine these weak classifiers to form a stronger final classification classifier (strong classifier).
  • step S204 may be implemented by the following steps: a fourth preset threshold number of abnormal feature vectors is used as a training group; the training group is input into the multiple initial classifiers, and the error of each initial classifier is calculated; the weight of each initial classifier is determined according to its error; each initial classifier is trained according to its weight until convergence; and the trained initial classifiers are input into the Adaboost classification model to obtain the Adaboost classifier. In this way, the training of the Adaboost classifier is achieved.
  • determining the weight of each initial classifier according to the error of each initial classifier includes: initializing each initial classifier; assigning a preset value to each initialized initial classifier; and determining the weight of each initial classifier according to its error. In this way, every initial classifier starts from the same assigned value, which improves the detection effect of the initial classifiers.
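How an error can be turned into a classifier weight is sketched below using the textbook AdaBoost rule, alpha = 0.5 * ln((1 - err) / err), with uniformly initialised sample weights. The application does not state its formula or its preset value, so both are assumptions, as are the function name `adaboost_weights` and the -1/+1 label encoding.

```python
import math

def adaboost_weights(classifiers, xs, ys):
    """Return one weight (alpha) per classifier; labels and outputs are -1 or +1."""
    n = len(xs)
    sample_w = [1.0 / n] * n  # uniform start, an assumed "preset value"
    alphas = []
    for clf in classifiers:
        preds = [clf(x) for x in xs]
        # Weighted error: total weight of the misclassified samples.
        err = sum(w for w, p, y in zip(sample_w, preds, ys) if p != y)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        alphas.append(alpha)
        # Raise the weight of misclassified samples, lower the rest, renormalise.
        sample_w = [w * math.exp(-alpha * y * p)
                    for w, p, y in zip(sample_w, preds, ys)]
        total = sum(sample_w)
        sample_w = [w / total for w in sample_w]
    return alphas

xs = [[0], [1], [2], [3]]
ys = [-1, -1, 1, 1]
threshold_clf = lambda x: 1 if x[0] > 1 else -1  # separates the toy data
constant_clf = lambda x: 1                        # no better than chance here
alphas = adaboost_weights([threshold_clf, constant_clf], xs, ys)
```

Under this rule a classifier with a small error receives a large weight, while one whose weighted error is 0.5 receives a weight of about zero.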
  • Step S205 Use the Adaboost classifier to classify the collected traffic data.
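Classification with the combined classifier can be pictured as a weighted vote. The sketch below assumes each weak classifier returns -1 (normal) or +1 (abnormal) and that the weights were produced during training; the name `strong_classify` and the example weights are hypothetical.

```python
def strong_classify(weak_classifiers, weights, x):
    """Weighted vote of the weak classifiers; returns +1 (abnormal) or -1."""
    score = sum(w * clf(x) for clf, w in zip(weak_classifiers, weights))
    return 1 if score > 0 else -1

# Three toy weak classifiers that disagree on the same input.
votes_normal_a = lambda x: -1
votes_normal_b = lambda x: -1
votes_abnormal = lambda x: 1
result = strong_classify(
    [votes_normal_a, votes_normal_b, votes_abnormal],
    [0.2, 0.3, 0.9],
    [0.4, 0.7],
)
```

Here the single heavily weighted classifier outvotes the two lightly weighted ones, which is how a weighted combination differs from a plain majority.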
  • multiple segments of traffic data are acquired under different network traffic monitoring states, multiple sets of traffic anomaly feature vectors are extracted from them and used to train multiple initial classifiers based on the KNN algorithm, and the multiple initial classifiers are then trained in combination to obtain an Adaboost strong classifier. The Adaboost strong classifier is used to classify the collected abnormal network traffic data, which improves the detection effect and solves the problem of poor detection of abnormal network traffic in the related art.
  • the KNN-based Adaboost network traffic anomaly detection method of the present application can judge and classify anomalies based on a large amount of traffic monitoring data, can detect complex network traffic, and is highly adaptive, which helps improve the accuracy of traffic data anomaly detection.
  • FIG. 3 is a flowchart of a method for detecting abnormal network traffic according to an optional embodiment of the present application. As shown in Figure 3, the optional process includes:
  • Step S301 Acquire multiple sets of abnormal traffic data under different monitoring states of network traffic and perform data preprocessing to obtain multiple sets of abnormal traffic feature vectors.
  • Step S301 can be performed according to the following steps:
  • Step S302 construct an initial classifier based on the KNN algorithm.
  • Step S302 can be performed according to the following steps:
  • Data set division: 80% of the abnormal traffic feature vector groups are randomly selected as the training set, and the remaining 20% are used as the test set;
  • Initial classification model test: input the test set into the initial classification model, and judge the accuracy of the initial classification model by comparing the output results with the labeling results. If the accuracy is not less than the set value, the initial classification model is used as the initial classifier; if it is less than the set value, repeat step 3 to optimize the initial classification model.
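The 80/20 division described above can be sketched with the standard library. The fixed seed and the function name `split_train_test` are assumptions added so the example is repeatable.

```python
import random

def split_train_test(vectors, labels, train_ratio=0.8, seed=0):
    """Randomly split vectors and labels into training and test portions."""
    indices = list(range(len(vectors)))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    cut = int(len(indices) * train_ratio)
    train_idx, test_idx = indices[:cut], indices[cut:]
    train_x = [vectors[i] for i in train_idx]
    train_y = [labels[i] for i in train_idx]
    test_x = [vectors[i] for i in test_idx]
    test_y = [labels[i] for i in test_idx]
    return train_x, train_y, test_x, test_y

vectors = [[i] for i in range(10)]
labels = [i % 2 for i in range(10)]
train_x, train_y, test_x, test_y = split_train_test(vectors, labels)
```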
  • Step S303 Repeat step S302 for several times to obtain multiple initial classifiers based on KNN.
  • Step S304 Adaboost strong classifier is obtained by combining training of multiple initial classifiers based on KNN.
  • Step S304 can be performed according to the following steps:
  • Step S305 Use the Adaboost strong classifier to classify the collected characteristic vectors of abnormal network traffic data, and then judge the abnormal traffic.
  • This embodiment also provides an apparatus for detecting abnormality of network traffic, which is used to implement the above-mentioned embodiments and optional implementation manners, which have been described and will not be repeated.
  • the terms “module,” “unit,” “subunit,” etc. may be a combination of software and/or hardware that implements a predetermined function.
  • although the apparatus described in the following embodiments is preferably implemented in software, implementations in hardware, or in a combination of software and hardware, are also possible and contemplated.
  • FIG. 4 is a structural block diagram of an apparatus for detecting abnormal network traffic according to an embodiment of the present application. As shown in FIG. 4 , the apparatus includes:
  • the first acquisition module 41 is used to acquire multi-segment traffic data under different monitoring states
  • the second acquisition module 42 coupled to the first acquisition module 41, is used to acquire abnormal feature vectors in the multi-segment traffic data
  • the first training module 43 coupled to the second acquisition module 42, is used for training the initial classification model according to the abnormal feature vector and based on the KNN algorithm to obtain a plurality of initial classifiers;
  • the second training module 44 coupled to the first training module 43, trains the initial Adaboost classification model according to the abnormal feature vector and a plurality of initial classifiers, and based on the Adaboost algorithm, to obtain the Adaboost classifier;
  • the classification module 45 coupled to the second training module 44, is used for classifying the collected traffic data by using the Adaboost classifier.
  • the first acquisition module 41 is used to acquire multi-segment traffic data in different monitoring states;
  • the second acquisition module 42 is coupled to the first acquisition module 41 and is used to acquire abnormal feature vectors in the multi-segment traffic data
  • the first training module 43 is coupled to the second acquisition module 42 for training the initial classification model according to the abnormal feature vector and based on the KNN algorithm to obtain a plurality of initial classifiers;
  • the second training module 44 is coupled to the first training module 43 and is used for training the initial Adaboost classification model according to the abnormal feature vector and a plurality of initial classifiers, and based on the Adaboost algorithm, to obtain the Adaboost classifier;
  • the classification module 45 is coupled to the second training module 44 and is used for classifying the collected traffic data by using the Adaboost classifier. This solves the problem of poor detection of abnormal network traffic in the related art and improves the detection accuracy of abnormal network traffic.
  • the second acquisition module 42 includes: a first determining unit, configured to determine abnormal traffic data in the multi-segment traffic data and mark it; a second determining unit, configured to determine the characteristic data of the marked abnormal traffic data; and a generating unit, configured to generate abnormal feature vectors of the multi-segment traffic data according to the characteristic data.
  • the first training module 43 includes: a first designating unit, used for taking a first preset threshold number of abnormal feature vectors as a training set; a processing unit, used for performing data normalization on the abnormal feature vectors in the training set; a third determining unit, used to determine the distances between the abnormal feature vectors in the normalized training set; and a first training unit, used to train the initial classification model according to the distances and the KNN algorithm to obtain multiple initial classifiers.
  • the processing unit includes a mapping sub-unit for mapping anomalous feature vectors in the training set to between 0 and 1 using maximum normalization.
  • the apparatus further includes: a designating module, used to take a second preset threshold number of abnormal feature vectors as a test set; an input module, used to input the test set into the plurality of initial classifiers and obtain the classification result corresponding to each initial classifier; a determination module, used to determine the accuracy of the classification result corresponding to each initial classifier; a judgment module, used to judge whether the accuracy of the classification result corresponding to each initial classifier is greater than a third preset threshold; a third acquisition module, used to acquire, if not, the initial classifiers whose accuracy is not greater than the third preset threshold; and a third training module, used to train those initial classifiers according to the distance and the KNN algorithm.
  • the second training module 44 includes: a second designating unit, used to take a fourth preset threshold number of abnormal feature vectors as a training group; a calculation unit, used to input the training group into the plurality of initial classifiers and calculate the error of each initial classifier; a fourth determining unit, used to determine the weight of each initial classifier according to the error of each initial classifier; a second training unit, used to train each initial classifier according to its weight until convergence; and an input unit, used to input the trained initial classifiers into the Adaboost classification model to obtain the Adaboost classifier.
  • the fourth determining unit includes: an initialization subunit, used to initialize each initial classifier; an assigning subunit, used to assign a preset value to each initialized initial classifier; and a determining subunit, used to determine the weight of each initial classifier according to the error of each initial classifier.
  • each of the above modules may be functional modules or program modules, and may be implemented by software or hardware.
  • the above-mentioned modules may be located in the same processor; or the above-mentioned modules may also be located in different processors in any combination.
  • This embodiment also provides an electronic device, including a memory and a processor, where a computer program is stored in the memory, and the processor is configured to run the computer program to execute the steps in any of the above method embodiments.
  • the above-mentioned electronic device may further include a transmission device and an input-output device, wherein the transmission device is connected to the above-mentioned processor, and the input-output device is connected to the above-mentioned processor.
  • the above-mentioned processor may be configured to execute the following steps through a computer program:
  • S201 Acquire multi-segment traffic data in different monitoring states.
  • S202 Acquire abnormal feature vectors in the multi-segment traffic data.
  • S203 Train an initial classification model based on the abnormal feature vector and the KNN algorithm to obtain multiple initial classifiers.
  • S204 Train an initial Adaboost classification model based on the abnormal feature vector and a plurality of initial classifiers, and an Adaboost algorithm to obtain an Adaboost classifier.
  • S205 Use the Adaboost classifier to classify the collected traffic data.
  • the embodiment of the present application may provide a storage medium for implementation.
  • a computer program is stored on the storage medium; when the computer program is executed by the processor, any method for detecting abnormal network traffic in the foregoing embodiments is implemented.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Probability & Statistics with Applications (AREA)
  • Pure & Applied Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Algebra (AREA)
  • Evolutionary Biology (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

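A network traffic anomaly detection method and apparatus, an electronic apparatus, and a storage medium. The network traffic anomaly detection method includes: acquiring multi-segment traffic data in different monitoring states; acquiring abnormal feature vectors in the multi-segment traffic data; training an initial classification model based on the abnormal feature vectors and the KNN algorithm to obtain multiple initial classifiers; training an initial Adaboost classification model based on the abnormal feature vectors, the multiple initial classifiers, and the Adaboost algorithm to obtain an Adaboost classifier; and using the Adaboost classifier to classify the collected traffic data.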

Description

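Network traffic anomaly detection method and apparatus, and electronic apparatus and storage medium
Related Application
This application claims priority to Chinese patent application No. 202010847761.6, filed on August 21, 2020 and entitled "Network traffic anomaly detection method and apparatus, and electronic apparatus and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of data processing technology, and in particular to a network traffic anomaly detection method and apparatus, an electronic apparatus, and a storage medium.
Background
A network traffic anomaly is a situation in which network traffic behavior deviates from normal behavior. Such anomalies not only disrupt the normal use of networks and business systems but also threaten the information security of network users and cause them considerable harm. Classifying and monitoring the different kinds of traffic in a network makes it possible to discover abnormal behavior in a computer network promptly and to apply targeted controls, thereby keeping the network running normally. Detecting abnormal traffic is therefore an important part of network operation and maintenance, and the ability to diagnose abnormal traffic effectively and keep the network available and unobstructed is critical to its sustained, healthy development.
At present, abnormal traffic is mainly detected in the following three ways: (1) Fixed-threshold detection: relies mainly on quantitative analysis and is simple to operate overall, but requires network administrators to set the thresholds according to actual conditions, which demands extensive theoretical knowledge and management experience. (2) Statistical detection: makes judgments through statistical analysis of data, but can only notice that traffic is abnormal without identifying the nature of the anomaly, and is suitable only for non-real-time detection. (3) SNMP-based detection: relies mainly on software to detect traffic anomalies, but cannot detect complex network traffic.
In the related art, to improve the detection of network traffic anomalies, initially labeled data is first used as training samples to train an initial classification model by supervised learning; the initial classification model is then used to classify unlabeled network traffic data, yielding initially classified data; a semi-supervised learning model is next used to relabel and correct the initially classified data; finally, the new classified data is used to retrain the classification model and update the initial classification model. The classification model is updated repeatedly in this way to improve detection. Research has found, however, that this approach requires continual labeling and correction, which makes the detection process cumbersome and still leaves detection performance poor.
No effective solution has yet been proposed for the poor detection performance of network traffic anomaly detection in the related art.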
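Summary
Embodiments of this application provide a network traffic anomaly detection method and apparatus, an electronic apparatus, and a storage medium, to solve at least the problem of poor detection performance in the related art.
In a first aspect, an embodiment of this application provides a network traffic anomaly detection method, including:
acquiring multi-segment traffic data in different monitoring states;
acquiring abnormal feature vectors in the multi-segment traffic data;
training an initial classification model based on the abnormal feature vectors and the KNN algorithm to obtain multiple initial classifiers;
training an initial Adaboost classification model based on the abnormal feature vectors, the multiple initial classifiers, and the Adaboost algorithm to obtain an Adaboost classifier; and
using the Adaboost classifier to classify the collected traffic data.
In some embodiments, acquiring the abnormal feature vectors of the multi-segment traffic data includes:
determining abnormal traffic data in the multi-segment traffic data and marking it;
determining feature data of the marked abnormal traffic data; and
generating the abnormal feature vectors of the multi-segment traffic data from the feature data.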
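In some embodiments, training the initial classification model based on the abnormal feature vectors and the KNN algorithm to obtain the multiple initial classifiers includes:
using a first preset threshold number of the abnormal feature vectors as a training set;
normalizing the abnormal feature vectors in the training set;
determining the distances between the normalized abnormal feature vectors in the training set; and
training the initial classification model based on the distances and the KNN algorithm to obtain the multiple initial classifiers.
In some embodiments, normalizing the abnormal feature vectors in the training set includes:
mapping the abnormal feature vectors in the training set to values between 0 and 1 using min-max normalization.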
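In some embodiments, after training the initial classification model based on the distances and the KNN algorithm to obtain the multiple initial classifiers, the method further includes:
using a second preset threshold number of the abnormal feature vectors as a test set;
inputting the test set into the multiple initial classifiers to obtain a classification result for each initial classifier;
determining the accuracy of the classification result of each initial classifier;
judging whether the accuracy of the classification result of every initial classifier is greater than a third preset threshold;
if not, obtaining the initial classifiers whose accuracy is not greater than the third preset threshold; and
training those initial classifiers based on the distances and the KNN algorithm.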
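In some embodiments, training the initial Adaboost classification model based on the abnormal feature vectors, the multiple initial classifiers, and the Adaboost algorithm to obtain the Adaboost classifier includes:
using a fourth preset threshold number of abnormal feature vectors as a training group;
inputting the training group into the multiple initial classifiers and calculating the error of each initial classifier;
determining the weight of each initial classifier according to its error;
training each initial classifier according to its corresponding weight until convergence; and
inputting the trained initial classifiers into the Adaboost classification model to obtain the Adaboost classifier.
In some embodiments, determining the weight of each initial classifier according to its error includes:
initializing each initial classifier;
assigning a preset value to each initialized initial classifier; and
determining the weight of each initial classifier according to its error.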
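In a second aspect, an embodiment of this application further provides a network traffic anomaly detection apparatus, including:
a first acquisition module, configured to acquire multi-segment traffic data in different monitoring states;
a second acquisition module, configured to acquire abnormal feature vectors in the multi-segment traffic data;
a first training module, configured to train an initial classification model based on the abnormal feature vectors and the KNN algorithm to obtain multiple initial classifiers;
a second training module, configured to train an initial Adaboost classification model based on the abnormal feature vectors, the multiple initial classifiers, and the Adaboost algorithm to obtain an Adaboost classifier; and
a classification module, configured to use the Adaboost classifier to classify the collected traffic data.
In a third aspect, an embodiment of this application provides an electronic apparatus, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the network traffic anomaly detection method of the first aspect when executing the computer program.
In a fourth aspect, an embodiment of this application provides a storage medium storing a computer program that, when executed by a processor, implements the network traffic anomaly detection method of the first aspect.
Compared with the related art, the network traffic anomaly detection method and apparatus, electronic apparatus, and storage medium provided by embodiments of this application acquire multi-segment traffic data in different monitoring states; acquire abnormal feature vectors in the multi-segment traffic data; train an initial classification model based on the abnormal feature vectors and the KNN algorithm to obtain multiple initial classifiers; train an initial Adaboost classification model based on the abnormal feature vectors, the multiple initial classifiers, and the Adaboost algorithm to obtain an Adaboost classifier; and use the Adaboost classifier to classify the collected traffic data. This solves the problem of poor detection performance in the related art and improves the accuracy of network traffic anomaly detection.
The details of one or more embodiments of this application are set forth in the accompanying drawings and the description below, so that other features, objects, and advantages of this application are easier to understand.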
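Brief Description of the Drawings
The drawings described here provide a further understanding of this application and form part of it; the illustrative embodiments of this application and their descriptions are used to explain this application and do not unduly limit it. In the drawings:
Fig. 1 is a block diagram of the hardware structure of a terminal for the network traffic anomaly detection method according to an embodiment of this application.
Fig. 2 is a flowchart of the network traffic anomaly detection method according to an embodiment of this application.
Fig. 3 is a flowchart of the network traffic anomaly detection method according to an optional embodiment of this application.
Fig. 4 is a structural block diagram of the network traffic anomaly detection apparatus according to an embodiment of this application.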
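Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, this application is described and illustrated below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain this application and are not intended to limit it. All other embodiments obtained by a person of ordinary skill in the art without creative effort, based on the embodiments provided in this application, fall within the scope of protection of this application. It should also be understood that, although the effort involved in such development may be complex and lengthy, for a person of ordinary skill in the art related to the content disclosed in this application, changes in design, manufacture, or production made on the basis of the disclosed technical content are merely conventional technical means and should not be taken to mean that the disclosure is insufficient.
Reference in this application to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of this application. Appearances of the phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive with other embodiments. A person of ordinary skill in the art understands, explicitly and implicitly, that the embodiments described in this application may be combined with other embodiments where no conflict arises.
Unless otherwise defined, technical or scientific terms used in this application have the ordinary meaning understood by a person of ordinary skill in the technical field to which this application belongs. Words such as "a", "an", "one", and "the" do not denote a limitation of quantity and may refer to the singular or the plural. The terms "include", "comprise", "have", and any variations thereof are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or device that comprises a series of steps or modules (units) is not limited to the listed steps or units, but may also include steps or units that are not listed, or other steps or units inherent to such a process, method, product, or device. Words such as "connected" and "coupled" are not restricted to physical or mechanical connections and may include electrical connections, whether direct or indirect. "Multiple" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean A alone, both A and B, or B alone. The terms "first", "second", "third", and the like merely distinguish similar objects and do not imply a particular ordering.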
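The method embodiments provided here may be executed in a terminal, a computer, or a similar computing apparatus. Taking execution on a terminal as an example, Fig. 1 is a block diagram of the hardware structure of a terminal for the network traffic anomaly detection method according to an embodiment of this application. As shown in Fig. 1, the terminal 10 may include one or more processors 102 (only one is shown in Fig. 1; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 for storing data. Optionally, the terminal may also include a transmission device 106 for communication functions and an input/output device 108. A person of ordinary skill in the art will understand that the structure shown in Fig. 1 is only illustrative and does not limit the structure of the terminal. For example, the terminal 10 may include more or fewer components than shown in Fig. 1, or have a configuration different from that shown in Fig. 1.
The memory 104 may be used to store computer programs, for example, software programs and modules of application software, such as the computer program corresponding to the network traffic anomaly detection method in the embodiments of this application. By running the computer program stored in the memory 104, the processor 102 executes various functional applications and data processing, that is, implements the method described above. The memory 104 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, and such remote memory may be connected to the terminal 10 through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The transmission device 106 is used to receive or send data via a network. A specific example of the network may include a wireless network provided by the communication provider of the terminal 10. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC), which can connect to other network devices through a base station so as to communicate with the Internet. In one example, the transmission device 106 may be a radio frequency (RF) module, which communicates with the Internet wirelessly.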
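This embodiment provides a network traffic anomaly detection method. Fig. 2 is a flowchart of the method according to an embodiment of this application. As shown in Fig. 2, the process includes the following steps:
Step S201: Acquire multi-segment traffic data in different monitoring states.
In this step, the multi-segment traffic data may be acquired in real time or retrieved from a database that stores it.
Step S202: Acquire abnormal feature vectors in the multi-segment traffic data.
In some embodiments, abnormal traffic data in the multi-segment traffic data may first be determined and marked; feature data of the marked abnormal traffic data is then determined; and the abnormal feature vectors of the multi-segment traffic data are generated from the feature data. This provides one way of acquiring the abnormal feature vectors.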
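The marking and feature-vector generation described for step S202 can be pictured roughly as follows. This is only a minimal sketch: it assumes each traffic segment has already been summarized as a dictionary of numeric statistics, and the feature names (`packets_per_s`, `bytes_per_s`, `mean_packet_size`) and the label strings are hypothetical, since the application does not list the features it extracts.

```python
from typing import Dict, List, Tuple

# Hypothetical feature names; the application does not list the features it extracts.
FEATURE_NAMES = ["packets_per_s", "bytes_per_s", "mean_packet_size"]


def to_feature_vectors(
    segments: List[Dict[str, float]],
    labels: List[str],
) -> List[Tuple[List[float], str]]:
    """Pair each marked traffic segment with a fixed-order feature vector."""
    if len(segments) != len(labels):
        raise ValueError("every segment needs exactly one label")
    vectors = []
    for segment, label in zip(segments, labels):
        # Missing statistics default to 0.0 so every vector has the same length.
        vector = [float(segment.get(name, 0.0)) for name in FEATURE_NAMES]
        vectors.append((vector, label))
    return vectors


example = to_feature_vectors(
    [{"packets_per_s": 1200, "bytes_per_s": 9.5e5, "mean_packet_size": 790},
     {"packets_per_s": 40, "bytes_per_s": 3.0e4}],
    ["flood", "scan"],
)
```

Keeping the feature order fixed matters because the later distance computation compares vectors position by position.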
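Step S203: Train an initial classification model based on the abnormal feature vectors and the KNN algorithm to obtain multiple initial classifiers.
In this step, the KNN algorithm, also called the nearest-neighbor algorithm or the K-nearest-neighbor (KNN, K-Nearest Neighbor) classification algorithm, is one of the simplest methods in data-mining classification. "K nearest neighbors" means the K closest neighbors: each sample can be represented by the K samples nearest to it. The nearest-neighbor algorithm classifies every record in a data set in this way.
In some embodiments, step S203 may be implemented as follows: use a first preset threshold number of abnormal feature vectors as a training set; normalize the abnormal feature vectors in the training set; determine the distances between the normalized abnormal feature vectors in the training set; and train the initial classification model based on the distances and the KNN algorithm to obtain the multiple initial classifiers.
In this embodiment, normalizing the abnormal feature vectors in the training set includes mapping them to values between 0 and 1 using min-max normalization. This places features of different dimensions in the same space, which helps improve the detection of network traffic anomalies.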
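Min-max normalization rescales each feature column with x' = (x - min) / (max - min). The sketch below shows one plain-Python way to do this; the handling of a constant column (mapped to 0.0) is an assumption, as the application does not say what happens when max equals min.

```python
from typing import List


def min_max_normalize(vectors: List[List[float]]) -> List[List[float]]:
    """Rescale every feature column to [0, 1] with x' = (x - min) / (max - min)."""
    if not vectors:
        return []
    columns = list(zip(*vectors))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for vector in vectors:
        row = []
        for value, low, high in zip(vector, lows, highs):
            span = high - low
            # A constant column carries no information; map it to 0.0 (an assumption).
            row.append((value - low) / span if span else 0.0)
        normalized.append(row)
    return normalized


scaled = min_max_normalize([[10.0, 200.0], [20.0, 400.0], [30.0, 200.0]])
```

In practice the minima and maxima found on the training set would normally be reused to scale test and live traffic vectors, so that all data shares one scale.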
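Based on the above embodiment, after the initial classification model is trained based on the distances and the KNN algorithm to obtain the multiple initial classifiers, a second preset threshold number of abnormal feature vectors may further be used as a test set; the test set is input into the multiple initial classifiers to obtain a classification result for each initial classifier; the accuracy of the classification result of each initial classifier is determined; it is judged whether the accuracy of every initial classifier is greater than a third preset threshold; if not, the initial classifiers whose accuracy is not greater than the third preset threshold are obtained; and those initial classifiers are trained based on the distances and the KNN algorithm. This makes it possible to check the initial classifiers, which helps improve their detection performance and accuracy.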
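The accuracy check on the test set can be sketched as below. The classifiers are modeled as plain callables and the two toy classifiers, their names, and the threshold value 0.75 are hypothetical stand-ins chosen only to show which classifiers would be sent back for retraining.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Classifier = Callable[[Sequence[float]], str]


def accuracy(classifier: Classifier, test_set: List[Tuple[Sequence[float], str]]) -> float:
    """Fraction of test vectors whose predicted label matches the marked label."""
    if not test_set:
        raise ValueError("test set is empty")
    hits = sum(1 for vector, label in test_set if classifier(vector) == label)
    return hits / len(test_set)


def needs_retraining(classifiers: Dict[str, Classifier], test_set, threshold: float) -> List[str]:
    """Names of classifiers whose accuracy is not greater than the threshold."""
    return [name for name, clf in classifiers.items() if accuracy(clf, test_set) <= threshold]


test_set = [([0.1], "normal"), ([0.9], "abnormal"), ([0.8], "abnormal"), ([0.2], "normal")]
candidates = {
    "cut_at_0.5": lambda v: "abnormal" if v[0] > 0.5 else "normal",
    "always_normal": lambda v: "normal",
}
to_retrain = needs_retraining(candidates, test_set, threshold=0.75)
```

Here only `always_normal` falls at or below the threshold, so it alone would be retrained.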
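Step S204: Train an initial Adaboost classification model based on the abnormal feature vectors, the multiple initial classifiers, and the Adaboost algorithm to obtain an Adaboost classifier.
In this step, the Adaboost algorithm is an iterative algorithm whose core idea is to train different classifiers (weak classifiers) on the same training set and then combine these weak classifiers into a stronger final classifier (strong classifier).
In some embodiments, step S204 may be implemented as follows: use a fourth preset threshold number of abnormal feature vectors as a training group; input the training group into the multiple initial classifiers and calculate the error of each initial classifier; determine the weight of each initial classifier according to its error; train each initial classifier according to its corresponding weight until convergence; and input the trained initial classifiers into the Adaboost classification model to obtain the Adaboost classifier. This accomplishes the training of the Adaboost classifier.
In this embodiment, determining the weight of each initial classifier according to its error includes: initializing each initial classifier; assigning a preset value to each initialized initial classifier; and determining the weight of each initial classifier according to its error. This ensures that every initial classifier starts from the same initial value, which helps improve the detection performance of the initial classifiers.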
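The application does not give the formula that turns a classifier's error into its weight. In textbook discrete AdaBoost the weight is alpha = 0.5 * ln((1 - error) / error), and the sketch below uses that formula as an assumption; the equal starting value 1 / count is likewise an assumed choice for the "preset value".

```python
import math
from typing import List


def classifier_weight(error: float, eps: float = 1e-10) -> float:
    """Discrete AdaBoost weight: alpha = 0.5 * ln((1 - error) / error)."""
    # Clamp so that a perfect (or perfectly wrong) classifier gives a finite weight.
    error = min(max(error, eps), 1.0 - eps)
    return 0.5 * math.log((1.0 - error) / error)


def initial_weights(count: int) -> List[float]:
    """Give every initial classifier the same preset starting value, 1 / count."""
    return [1.0 / count] * count


alphas = [classifier_weight(e) for e in (0.1, 0.5, 0.4)]
```

A classifier with error 0.5 is no better than chance and receives weight zero, while lower errors receive progressively larger weights.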
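Step S205: Use the Adaboost classifier to classify the collected traffic data.
Based on steps S201 to S205, multi-segment traffic data is acquired in different network traffic monitoring states, multiple groups of abnormal traffic feature vectors are extracted, multiple initial classifiers are trained on those groups based on the KNN algorithm, the initial classifiers are combined and trained into an Adaboost strong classifier, and the strong classifier is used to classify collected abnormal network traffic data. This improves detection and solves the problem of poor detection performance in the related art.
Moreover, the KNN-based Adaboost network traffic anomaly detection method of this application can judge and classify anomalies from large volumes of traffic monitoring data, can detect complex network traffic, is highly adaptive, and helps improve the accuracy of traffic anomaly detection.
The embodiments of this application are described and illustrated below through an optional embodiment.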
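Fig. 3 is a flowchart of the network traffic anomaly detection method according to an optional embodiment of this application. As shown in Fig. 3, the optional process includes:
Step S301: Acquire multiple groups of abnormal traffic data in different network traffic monitoring states and preprocess the data to obtain multiple groups of abnormal traffic feature vectors.
Step S301 may proceed as follows:
(1) acquire multiple segments of abnormal traffic data in different network traffic monitoring states and mark them;
(2) extract feature data from the abnormal traffic data using a traffic feature extraction tool;
(3) generate feature vectors from the feature data.
Step S302: Build an initial classifier based on the KNN algorithm.
Step S302 may proceed as follows:
(1) Data set partitioning: randomly take 80% of the abnormal traffic feature vector groups as the training set and the remaining 20% as the test set;
(2) Data normalization: normalize the abnormal traffic feature vectors through a linear transformation that maps all data to values between 0 and 1, so that features of different dimensions lie in the same space;
(3) Initial classification model training: compute the distances between the training data using the weighted Euclidean distance formula and build the initial classification model based on the KNN algorithm, in which a sample is marked with the same class as its k nearest neighboring training samples;
(4) Initial classification model testing: input the test set into the initial classification model and determine its accuracy by comparing the outputs with the marked results; if the accuracy is not less than a set value, take the initial classification model as an initial classifier; if it is less than the set value, repeat step (3) to optimize the initial classification model.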
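Steps (1), (3), and (4) of step S302 can be sketched as follows, assuming the vectors are already normalized. The per-feature weights `w` of the weighted Euclidean distance, the value k = 3, the fixed seed, and the toy data are all assumptions, because the application does not specify them; ties among neighbors are resolved by a simple majority count.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]


def split(samples: List[Sample], train_ratio: float, seed: int) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle with a fixed seed, then cut into training and test portions."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def weighted_euclidean(a: Sequence[float], b: Sequence[float], w: Sequence[float]) -> float:
    """sqrt(sum_i w_i * (a_i - b_i)^2); the per-feature weights w are an assumption."""
    return math.sqrt(sum(wi * (ai - bi) ** 2 for ai, bi, wi in zip(a, b, w)))


def knn_predict(train: List[Sample], query: Sequence[float], k: int, w: Sequence[float]) -> str:
    """Majority label among the k training vectors closest to the query."""
    nearest = sorted(train, key=lambda s: weighted_euclidean(s[0], query, w))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Already-normalized toy data: two well-separated clusters.
data = [([0.1, 0.2], "scan"), ([0.2, 0.1], "scan"), ([0.15, 0.15], "scan"),
        ([0.9, 0.8], "flood"), ([0.8, 0.9], "flood"), ([0.85, 0.85], "flood"),
        ([0.05, 0.1], "scan"), ([0.95, 0.9], "flood"), ([0.1, 0.05], "scan"),
        ([0.9, 0.95], "flood")]
train_set, test_set = split(data, train_ratio=0.8, seed=0)
correct = sum(knn_predict(train_set, v, k=3, w=[1.0, 1.0]) == y for v, y in test_set)
test_accuracy = correct / len(test_set)
```

KNN has no training phase in the usual sense, so "optimizing" the model in step (4) would plausibly mean changing k, the distance weights, or the training split; that reading is an interpretation rather than something the application states.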
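Step S303: Repeat step S302 multiple times to obtain multiple KNN-based initial classifiers.
Step S304: Combine and train the multiple KNN-based initial classifiers to obtain an Adaboost strong classifier.
Step S304 may proceed as follows:
(1) partition the multiple groups of abnormal traffic feature vectors: randomly take 70% of the groups as the training group and the remaining 30% as the test group;
(2) input the training group and the multiple initial classifiers into the Adaboost classifier;
(3) initialize the weight of each initial classifier, giving every KNN classifier the same weight;
(4) input the training group into each initial classifier and calculate each classifier's error;
(5) calculate the weight of each initial classifier;
(6) update the weight of each initial classifier and check whether the current iteration count is less than the set iteration count; if it is, repeat steps (4) to (6) until the current iteration count is no longer less than the set iteration count; if it is not, combine the multiple initial classifiers into the Adaboost strong classifier.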
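The loop in steps (3) to (6) resembles textbook discrete AdaBoost run over a fixed pool of already-trained classifiers. The sketch below implements that textbook variant, not necessarily the exact procedure of the application: it keeps weights on the training vectors (which the application does not mention), assumes binary labels +1 and -1, and uses three simple threshold rules as hypothetical stand-ins for the KNN initial classifiers.

```python
import math
from typing import Callable, List, Sequence, Tuple

Classifier = Callable[[Sequence[float]], int]  # returns +1 or -1


def adaboost_combine(
    pool: List[Classifier],
    samples: List[Tuple[Sequence[float], int]],
    rounds: int,
) -> List[Tuple[float, Classifier]]:
    """Discrete AdaBoost over a fixed pool of already-trained classifiers."""
    n = len(samples)
    sample_w = [1.0 / n] * n  # every training vector starts equally important
    ensemble: List[Tuple[float, Classifier]] = []
    for _ in range(rounds):
        # Weighted error of each pooled classifier under the current sample weights.
        errors = [
            sum(w for w, (x, y) in zip(sample_w, samples) if clf(x) != y)
            for clf in pool
        ]
        best = min(range(len(pool)), key=errors.__getitem__)
        err = min(max(errors[best], 1e-10), 1.0 - 1e-10)
        if err >= 0.5:  # nothing better than chance is left
            break
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, pool[best]))
        # Raise the weight of misclassified vectors, lower the rest, renormalize.
        sample_w = [
            w * math.exp(-alpha * y * pool[best](x))
            for w, (x, y) in zip(sample_w, samples)
        ]
        total = sum(sample_w)
        sample_w = [w / total for w in sample_w]
    return ensemble


def strong_classify(ensemble: List[Tuple[float, Classifier]], x: Sequence[float]) -> int:
    """Sign of the alpha-weighted vote of the selected classifiers."""
    score = sum(alpha * clf(x) for alpha, clf in ensemble)
    return 1 if score >= 0 else -1


# Toy data: +1 for x in 1..3 and 7..9, -1 for x in 4..6; no single rule is perfect.
samples = [([float(x)], -1 if 4 <= x <= 6 else 1) for x in range(1, 10)]
pool = [
    lambda v: 1 if v[0] < 3.5 else -1,
    lambda v: 1 if v[0] > 6.5 else -1,
    lambda v: 1,
]
ensemble = adaboost_combine(pool, samples, rounds=3)
predictions = [strong_classify(ensemble, x) for x, _ in samples]
```

Each rule in the pool misclassifies three of the nine vectors on its own, yet the weighted vote after three rounds classifies all nine correctly, which is the effect the strong classifier is meant to deliver.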
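Step S305: Use the Adaboost strong classifier to classify the feature vectors of collected abnormal network traffic data, and then judge whether the traffic is abnormal.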
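When there are more than two traffic classes, one common way to apply a weighted ensemble is to let each classifier vote for a label with its weight and return the label with the largest total. The sketch below assumes that voting scheme; the three rules, their weights, and the label names are hypothetical, and the application does not state how the final judgment is computed.

```python
from collections import defaultdict
from typing import Callable, List, Sequence, Tuple

Classifier = Callable[[Sequence[float]], str]


def weighted_vote(ensemble: List[Tuple[float, Classifier]], vector: Sequence[float]) -> str:
    """Return the label with the largest total classifier weight."""
    totals = defaultdict(float)
    for weight, classifier in ensemble:
        totals[classifier(vector)] += weight
    return max(totals, key=totals.get)


# Hypothetical weighted classifiers standing in for the trained ensemble.
ensemble = [
    (0.6, lambda v: "flood" if v[0] > 0.7 else "normal"),
    (0.5, lambda v: "scan" if v[1] > 0.5 else "normal"),
    (0.4, lambda v: "scan" if v[1] > 0.3 else "normal"),
]
labels = [weighted_vote(ensemble, v) for v in ([0.9, 0.6], [0.9, 0.4], [0.9, 0.2])]
is_abnormal = [label != "normal" for label in labels]
```

The three example vectors show how the outcome changes as the second feature drops: two agreeing lighter classifiers can outvote one heavier classifier.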
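This embodiment further provides a network traffic anomaly detection apparatus, which is used to implement the above embodiments and optional implementations; what has already been described is not repeated. As used below, the terms "module", "unit", "subunit", and the like may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 4 is a structural block diagram of the network traffic anomaly detection apparatus according to an embodiment of this application. As shown in Fig. 4, the apparatus includes:
a first acquisition module 41, configured to acquire multi-segment traffic data in different monitoring states;
a second acquisition module 42, coupled to the first acquisition module 41 and configured to acquire abnormal feature vectors in the multi-segment traffic data;
a first training module 43, coupled to the second acquisition module 42 and configured to train an initial classification model based on the abnormal feature vectors and the KNN algorithm to obtain multiple initial classifiers;
a second training module 44, coupled to the first training module 43 and configured to train an initial Adaboost classification model based on the abnormal feature vectors, the multiple initial classifiers, and the Adaboost algorithm to obtain an Adaboost classifier;
a classification module 45, coupled to the second training module 44 and configured to use the Adaboost classifier to classify the collected traffic data.
In this embodiment, the first acquisition module 41 acquires multi-segment traffic data in different monitoring states; the second acquisition module 42, coupled to the first acquisition module 41, acquires abnormal feature vectors in the multi-segment traffic data; the first training module 43, coupled to the second acquisition module 42, trains an initial classification model based on the abnormal feature vectors and the KNN algorithm to obtain multiple initial classifiers; the second training module 44, coupled to the first training module 43, trains an initial Adaboost classification model based on the abnormal feature vectors, the multiple initial classifiers, and the Adaboost algorithm to obtain an Adaboost classifier; and the classification module 45, coupled to the second training module 44, uses the Adaboost classifier to classify the collected traffic data. This solves the problem of poor detection performance in the related art and improves the accuracy of network traffic anomaly detection.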
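In some embodiments, the second acquisition module 42 includes: a first determination unit, configured to determine abnormal traffic data in the multi-segment traffic data and mark it; a second determination unit, configured to determine feature data of the marked abnormal traffic data; and a generation unit, configured to generate the abnormal feature vectors of the multi-segment traffic data from the feature data.
In some embodiments, the first training module 43 includes: a first designation unit, configured to use a first preset threshold number of abnormal feature vectors as a training set; a processing unit, configured to normalize the abnormal feature vectors in the training set; a third determination unit, configured to determine the distances between the normalized abnormal feature vectors in the training set; and a first training unit, configured to train the initial classification model based on the distances and the KNN algorithm to obtain the multiple initial classifiers.
In some embodiments, the processing unit includes a mapping subunit, configured to map the abnormal feature vectors in the training set to values between 0 and 1 using min-max normalization.
In some embodiments, the apparatus further includes: a designation module, configured to use a second preset threshold number of abnormal feature vectors as a test set; an input module, configured to input the test set into the multiple initial classifiers to obtain a classification result for each initial classifier; a determination module, configured to determine the accuracy of the classification result of each initial classifier; a judgment module, configured to judge whether the accuracy of every initial classifier is greater than a third preset threshold; a third acquisition module, configured to obtain, if not, the initial classifiers whose accuracy is not greater than the third preset threshold; and a third training module, configured to train those initial classifiers based on the distances and the KNN algorithm.
In some embodiments, the second training module 44 includes: a second designation unit, configured to use a fourth preset threshold number of abnormal feature vectors as a training group; a calculation unit, configured to input the training group into the multiple initial classifiers and calculate the error of each initial classifier; a fourth determination unit, configured to determine the weight of each initial classifier according to its error; a second training unit, configured to train each initial classifier according to its corresponding weight until convergence; and an input unit, configured to input the trained initial classifiers into the Adaboost classification model to obtain the Adaboost classifier.
In some embodiments, the fourth determination unit includes: an initialization subunit, configured to initialize each initial classifier; an assignment subunit, configured to assign a preset value to each initialized initial classifier; and a determination subunit, configured to determine the weight of each initial classifier according to its error.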
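It should be noted that each of the above modules may be a functional module or a program module, and may be implemented in software or in hardware. For modules implemented in hardware, the modules may be located in the same processor, or distributed across different processors in any combination.
This embodiment further provides an electronic apparatus including a memory and a processor, where the memory stores a computer program and the processor is configured to run the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, each of which is connected to the processor.
Optionally, in this embodiment, the processor may be configured to perform the following steps through the computer program:
S201: Acquire multi-segment traffic data in different monitoring states.
S202: Acquire abnormal feature vectors in the multi-segment traffic data.
S203: Train an initial classification model based on the abnormal feature vectors and the KNN algorithm to obtain multiple initial classifiers.
S204: Train an initial Adaboost classification model based on the abnormal feature vectors, the multiple initial classifiers, and the Adaboost algorithm to obtain an Adaboost classifier.
S205: Use the Adaboost classifier to classify the collected traffic data.
It should be noted that, for specific examples of this embodiment, reference may be made to the examples described in the above embodiments and optional implementations, which are not repeated here.
In addition, in connection with the network traffic anomaly detection method of the above embodiments, an embodiment of this application may provide a storage medium for its implementation. The storage medium stores a computer program; when executed by a processor, the computer program implements any of the network traffic anomaly detection methods of the above embodiments.
A person skilled in the art should understand that the technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination has been described; however, any combination of these technical features that involves no contradiction should be regarded as within the scope of this specification.
The above embodiments express only several implementations of this application, and although their description is relatively specific and detailed, it should not be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art may make several variations and improvements without departing from the concept of this application, all of which fall within its scope of protection. The scope of protection of this patent shall therefore be determined by the appended claims.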

Claims (10)

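  1. A network traffic anomaly detection method, characterized in that the method comprises:
    acquiring multi-segment traffic data in different monitoring states;
    acquiring abnormal feature vectors in the multi-segment traffic data;
    training an initial classification model based on the abnormal feature vectors and the KNN algorithm to obtain multiple initial classifiers;
    training an initial Adaboost classification model based on the abnormal feature vectors, the multiple initial classifiers, and the Adaboost algorithm to obtain an Adaboost classifier; and
    using the Adaboost classifier to classify the collected traffic data.
  2. The network traffic anomaly detection method according to claim 1, wherein acquiring the abnormal feature vectors of the multi-segment traffic data comprises:
    determining abnormal traffic data in the multi-segment traffic data and marking it;
    determining feature data of the marked abnormal traffic data; and
    generating the abnormal feature vectors of the multi-segment traffic data from the feature data.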
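  3. The network traffic anomaly detection method according to claim 1, wherein training the initial classification model based on the abnormal feature vectors and the KNN algorithm to obtain the multiple initial classifiers comprises:
    using a first preset threshold number of the abnormal feature vectors as a training set;
    normalizing the abnormal feature vectors in the training set;
    determining the distances between the normalized abnormal feature vectors in the training set; and
    training the initial classification model based on the distances and the KNN algorithm to obtain the multiple initial classifiers.
  4. The network traffic anomaly detection method according to claim 3, wherein normalizing the abnormal feature vectors in the training set comprises:
    mapping the abnormal feature vectors in the training set to values between 0 and 1 using min-max normalization.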
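  5. The network traffic anomaly detection method according to claim 3, wherein, after training the initial classification model based on the distances and the KNN algorithm to obtain the multiple initial classifiers, the method further comprises:
    using a second preset threshold number of the abnormal feature vectors as a test set;
    inputting the test set into the multiple initial classifiers to obtain a classification result for each initial classifier;
    determining the accuracy of the classification result of each initial classifier;
    judging whether the accuracy of the classification result of every initial classifier is greater than a third preset threshold;
    if not, obtaining the initial classifiers whose accuracy is not greater than the third preset threshold; and
    training those initial classifiers based on the distances and the KNN algorithm.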
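  6. The network traffic anomaly detection method according to claim 1, wherein training the initial Adaboost classification model based on the abnormal feature vectors, the multiple initial classifiers, and the Adaboost algorithm to obtain the Adaboost classifier comprises:
    using a fourth preset threshold number of abnormal feature vectors as a training group;
    inputting the training group into the multiple initial classifiers and calculating the error of each initial classifier;
    determining the weight of each initial classifier according to its error;
    training each initial classifier according to its corresponding weight until convergence; and
    inputting the trained initial classifiers into the Adaboost classification model to obtain the Adaboost classifier.
  7. The network traffic anomaly detection method according to claim 6, wherein determining the weight of each initial classifier according to its error comprises:
    initializing each initial classifier;
    assigning a preset value to each initialized initial classifier; and
    determining the weight of each initial classifier according to its error.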
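  8. A network traffic anomaly detection apparatus, characterized in that the apparatus comprises:
    a first acquisition module, configured to acquire multi-segment traffic data in different monitoring states;
    a second acquisition module, configured to acquire abnormal feature vectors in the multi-segment traffic data;
    a first training module, configured to train an initial classification model based on the abnormal feature vectors and the KNN algorithm to obtain multiple initial classifiers;
    a second training module, configured to train an initial Adaboost classification model based on the abnormal feature vectors, the multiple initial classifiers, and the Adaboost algorithm to obtain an Adaboost classifier; and
    a classification module, configured to use the Adaboost classifier to classify the collected traffic data.
  9. An electronic apparatus comprising a memory and a processor, characterized in that the memory stores a computer program and the processor is configured to run the computer program to perform the network traffic anomaly detection method according to any one of claims 1 to 7.
  10. A storage medium, characterized in that the storage medium stores a computer program, wherein the computer program is configured to perform, when run, the network traffic anomaly detection method according to any one of claims 1 to 7.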
PCT/CN2021/092227 2020-08-21 2021-05-07 网络流量异常的检测方法、装置、电子装置和存储介质 WO2022037130A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/022,170 US20230300159A1 (en) 2020-08-21 2021-05-07 Network traffic anomaly detection method and apparatus, and electronic apparatus and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010847761.6A CN112153000B (zh) 2020-08-21 2020-08-21 网络流量异常的检测方法、装置、电子装置和存储介质
CN202010847761.6 2020-08-21

Publications (1)

Publication Number Publication Date
WO2022037130A1 true WO2022037130A1 (zh) 2022-02-24

Family

ID=73888858

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/092227 WO2022037130A1 (zh) 2020-08-21 2021-05-07 网络流量异常的检测方法、装置、电子装置和存储介质

Country Status (3)

Country Link
US (1) US20230300159A1 (zh)
CN (1) CN112153000B (zh)
WO (1) WO2022037130A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114513374A (zh) * 2022-04-21 2022-05-17 浙江御安信息技术有限公司 一种基于人工智能的网络安全威胁识别方法及系统
CN114513473A (zh) * 2022-03-24 2022-05-17 新华三人工智能科技有限公司 一种流量类别检测方法、装置及设备
CN114615088A (zh) * 2022-04-25 2022-06-10 国网冀北电力有限公司信息通信分公司 一种终端业务流量异常检测模型建立方法及异常检测方法
CN114726749A (zh) * 2022-03-02 2022-07-08 阿里巴巴(中国)有限公司 数据异常检测模型获取方法、装置、设备、介质及产品

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112153000B (zh) * 2020-08-21 2023-04-18 杭州安恒信息技术股份有限公司 网络流量异常的检测方法、装置、电子装置和存储介质
CN113347181A (zh) * 2021-06-01 2021-09-03 上海明略人工智能(集团)有限公司 异常广告流量检测方法、系统和计算机设备和存储介质
CN113315790B (zh) * 2021-07-29 2021-11-02 湖南华菱电子商务有限公司 入侵流量检测方法、电子设备及存储介质
CN113705619B (zh) * 2021-08-03 2023-09-12 广州大学 一种恶意流量检测方法、系统、计算机及介质
CN113886118A (zh) * 2021-09-16 2022-01-04 杭州安恒信息技术股份有限公司 异常资源处理方法、装置、系统、电子装置和存储介质
CN117097578B (zh) * 2023-10-20 2024-01-05 杭州烛微智能科技有限责任公司 一种网络流量的安全监控方法、系统、介质及电子设备

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101582813A (zh) * 2009-06-26 2009-11-18 西安电子科技大学 基于分布式迁移网络学习的入侵检测系统及其方法
CN102324007A (zh) * 2011-09-22 2012-01-18 重庆大学 基于数据挖掘的异常检测方法
CN103716204A (zh) * 2013-12-20 2014-04-09 中国科学院信息工程研究所 一种基于维纳过程的异常入侵检测集成学习方法及装置
CN108093406A (zh) * 2017-11-29 2018-05-29 重庆邮电大学 一种基于集成学习的无线传感网入侵检测方法
CN112153000A (zh) * 2020-08-21 2020-12-29 杭州安恒信息技术股份有限公司 网络流量异常的检测方法、装置、电子装置和存储介质

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
IL191744A0 (en) * 2008-05-27 2009-02-11 Yuval Elovici Unknown malcode detection using classifiers with optimal training sets
JP5301310B2 (ja) * 2009-02-17 2013-09-25 株式会社日立製作所 異常検知方法及び異常検知システム
CN105787743A (zh) * 2016-02-26 2016-07-20 中国银联股份有限公司 基于样本聚类的欺诈交易检测方法
CN106529729A (zh) * 2016-11-18 2017-03-22 同济大学 基于BP_Adaboost模型的信用卡用户违约的预测方法及系统
CN106529814B (zh) * 2016-11-21 2020-01-07 武汉大学 基于Adaboost聚类和马尔科夫链的分布式光伏超短期预测方法
CN109145108A (zh) * 2017-06-16 2019-01-04 贵州小爱机器人科技有限公司 文本层叠分类器训练方法、分类方法、装置及计算机设备
CN109840413B (zh) * 2017-11-28 2020-12-22 中国移动通信集团浙江有限公司 一种钓鱼网站检测方法及装置
CN108154178A (zh) * 2017-12-25 2018-06-12 北京工业大学 基于改进的svm-knn算法的半监督托攻击检测方法
CN109167753A (zh) * 2018-07-23 2019-01-08 中国科学院计算机网络信息中心 一种网络入侵流量的检测方法及装置
CN109489977B (zh) * 2018-12-28 2021-03-05 西安工程大学 基于KNN-AdaBoost的轴承故障诊断方法
CN109842614B (zh) * 2018-12-29 2021-03-16 杭州电子科技大学 基于数据挖掘的网络入侵检测方法
CN109726767A (zh) * 2019-01-13 2019-05-07 胡燕祝 一种基于AdaBoost算法的感知器网络数据分类方法
CN109919055B (zh) * 2019-02-26 2020-11-24 中国地质大学(武汉) 一种基于AdaBoost-KNN的动态人脸情感识别方法
CN110225055B (zh) * 2019-06-22 2020-10-09 福州大学 一种基于knn半监督学习模型的网络流量异常检测方法与系统
CN110490582A (zh) * 2019-07-18 2019-11-22 同济大学 一种信用卡交易异常检测方法及装置

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101582813A (zh) * 2009-06-26 2009-11-18 西安电子科技大学 基于分布式迁移网络学习的入侵检测系统及其方法
CN102324007A (zh) * 2011-09-22 2012-01-18 重庆大学 基于数据挖掘的异常检测方法
CN103716204A (zh) * 2013-12-20 2014-04-09 中国科学院信息工程研究所 一种基于维纳过程的异常入侵检测集成学习方法及装置
CN108093406A (zh) * 2017-11-29 2018-05-29 重庆邮电大学 一种基于集成学习的无线传感网入侵检测方法
CN112153000A (zh) * 2020-08-21 2020-12-29 杭州安恒信息技术股份有限公司 网络流量异常的检测方法、装置、电子装置和存储介质

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114726749A (zh) * 2022-03-02 2022-07-08 阿里巴巴(中国)有限公司 数据异常检测模型获取方法、装置、设备、介质及产品
CN114726749B (zh) * 2022-03-02 2023-10-31 阿里巴巴(中国)有限公司 数据异常检测模型获取方法、装置、设备及介质
CN114513473A (zh) * 2022-03-24 2022-05-17 新华三人工智能科技有限公司 一种流量类别检测方法、装置及设备
CN114513374A (zh) * 2022-04-21 2022-05-17 浙江御安信息技术有限公司 一种基于人工智能的网络安全威胁识别方法及系统
CN114615088A (zh) * 2022-04-25 2022-06-10 国网冀北电力有限公司信息通信分公司 一种终端业务流量异常检测模型建立方法及异常检测方法

Also Published As

Publication number Publication date
CN112153000B (zh) 2023-04-18
US20230300159A1 (en) 2023-09-21
CN112153000A (zh) 2020-12-29

Similar Documents

Publication Publication Date Title
WO2022037130A1 (zh) 网络流量异常的检测方法、装置、电子装置和存储介质
US10587632B1 (en) Neural network-based malware detection
WO2021208721A1 (zh) 联邦学习防御方法、装置、电子设备及存储介质
CN110572382B (zh) 基于smote算法和集成学习的恶意流量检测方法
WO2021189730A1 (zh) 检测异常高密子图的方法、装置、设备及存储介质
WO2021000958A1 (zh) 用于实现模型训练的方法及装置、计算机存储介质
US11403559B2 (en) System and method for using a user-action log to learn to classify encrypted traffic
WO2018160136A1 (en) Method and apparatus for determining an identity of an unknown internet-of-things (iot) device in a communication network
Yang et al. An efficient one-class SVM for anomaly detection in the internet of things
CN113542241B (zh) 一种基于CNN-BiGRU混合模型的入侵检测方法及装置
WO2020001311A1 (zh) 一种检测干扰的方法、装置、设备及存储介质
CN113992349B (zh) 恶意流量识别方法、装置、设备和存储介质
CN107180056A (zh) 视频中片段的匹配方法和装置
CN110825545A (zh) 一种云服务平台异常检测方法与系统
Dong et al. Secure distributed on-device learning networks with byzantine adversaries
Gong et al. Multi-task based deep learning approach for open-set wireless signal identification in ISM band
CN113452802A (zh) 设备型号的识别方法、装置及系统
Chen et al. A network traffic classification model based on metric learning
Li et al. Semi-supervised network traffic classification using deep generative models
CN112115957A (zh) 数据流识别方法及装置、计算机存储介质
CN112367215B (zh) 基于机器学习的网络流量协议识别方法和装置
Ma et al. A Multi-Perspective Feature Approach to Few-Shot Classification of IoT Traffic
CN116599720A (zh) 一种基于GraphSAGE的恶意DoH流量检测方法、系统
CN114978593B (zh) 基于图匹配的不同网络环境的加密流量分类方法及系统
CN113765891B (zh) 一种设备指纹识别方法以及装置

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21857231

Country of ref document: EP

Kind code of ref document: A1