CN107395640B - Intrusion detection system and method based on division and characteristic change - Google Patents


Info

Publication number
CN107395640B
Authority
CN
China
Prior art keywords
sample
data packet
training set
intrusion detection
maj
Prior art date
2017-08-30
Legal status
Expired - Fee Related
Application number
CN201710760156.3A
Other languages
Chinese (zh)
Other versions
CN107395640A (en)
Inventor
郭华平 (Guo Huaping)
周俊 (Zhou Jun)
杨乐 (Yang Le)
邬长安 (Wu Chang'an)
祁传达 (Qi Chuanda)
Current Assignee
Xinyang Normal University
Original Assignee
Xinyang Normal University
Priority date
2017-08-30
Filing date
2017-08-30
Publication date
2020-05-12
2017-08-30 Application filed by Xinyang Normal University
2017-08-30 Priority to CN201710760156.3A
2017-11-24 Publication of CN107395640A
2020-05-12 Application granted; publication of CN107395640B
Status: Expired - Fee Related

Classifications

    • H ELECTRICITY
      • H04 ELECTRIC COMMUNICATION TECHNIQUE
        • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
          • H04L63/00 Network architectures or network communication protocols for network security
            • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
              • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
                • H04L63/1416 Event detection, e.g. attack signature detection
                • H04L63/1425 Traffic logging, e.g. anomaly detection
    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F18/00 Pattern recognition
            • G06F18/20 Analysing
              • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
                • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
              • G06F18/23 Clustering techniques
                • G06F18/232 Non-hierarchical techniques
                  • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
                    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses an intrusion detection system and method based on division and characteristic change. In the training stage, the normal data packet training set is divided into several clusters with the K-means clustering method, and each cluster is combined with the network intrusion data packet training set to form a new training set. From each training set D_i a feature transformation matrix Q_i is learned, and a k-nearest-neighbor model M_i is learned in the space defined by Q_i. In the prediction stage, the learned K-means model selects, for the data packet to be predicted, the k-nearest-neighbor model of the corresponding space, and that model predicts whether the packet is an intrusion packet. The invention can effectively determine whether a data packet is an intrusion packet while maintaining high accuracy on both normal and intrusion samples, which gives it broad engineering application value.

Description

Intrusion detection system and method based on division and characteristic change
Technical Field
The invention belongs to the technical field of data analysis, and relates to an intrusion detection system and method based on division and characteristic change.
Background
The emergence and widespread use of networks have made people's life and work more convenient, but they have also brought many security problems: viruses, vulnerabilities and attacks of all kinds cause huge losses to society. How to protect information from attack and leakage while maintaining its integrity, availability and confidentiality is a focus of current research.
Faced with the current network security situation, measures such as access control, data encryption, identity authentication, firewalls and intrusion detection are mainly used to secure networks and information systems. Intrusion detection collects information from the operating system, system programs, application programs, network data packets and similar sources, and discovers behavior in the monitored system or network that violates the security policy or endangers system security; it is therefore an effective means of securing systems and networks.
Machine learning uses computers to simulate human learning activities: it studies how computers can acquire existing knowledge, discover new knowledge, and continuously improve their learning performance. It comprises a large number of data preprocessing and classification methods and is related to statistics, artificial intelligence, information theory and other disciplines. The basic process is to construct a learner from existing experience and then use it to classify or predict unknown samples.
Network intrusion samples form a minority class: they account for only about 0.001 of all packets, and most data packets are normal communication packets. With a conventional classification method, simply labeling every packet as normal already yields a very high prediction accuracy. Such accuracy is of no practical use, however, because the real goal is to correctly identify the intrusion samples of the minority class. In addition, the traditional k-NN algorithm usually measures the difference between samples with the plain Euclidean distance, which treats every feature as equally important; because the features of a sample actually carry different weights, the resulting distance is inaccurate and degrades the detection of intrusion samples. It is therefore necessary to design a classification method specifically for intrusion detection.
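The accuracy problem described above can be checked with simple arithmetic. The sketch below assumes a hypothetical traffic sample of 1,000 packets containing a single intrusion (the proportion of about 0.001 mentioned above) and a trivial classifier that labels every packet as normal; the helper names are illustrative only.

```python
# Hypothetical traffic sample: 999 normal packets and 1 intrusion (proportion 0.001).
labels = ["normal"] * 999 + ["intrusion"]

# A trivial "classifier" that labels every packet as normal.
predictions = ["normal"] * len(labels)


def accuracy(y_true, y_pred):
    """Fraction of packets whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive="intrusion"):
    """Fraction of true intrusion packets that the classifier actually flags."""
    hits = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    total = sum(t == positive for t in y_true)
    return hits / total


print(accuracy(labels, predictions))  # 0.999
print(recall(labels, predictions))    # 0.0
```

Accuracy stays at 99.9% while not a single intrusion is detected, which is exactly the gap the paragraph above points to.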
Disclosure of Invention
The invention aims to provide an intrusion detection system and method based on division and characteristic change that can effectively determine whether a sample is an intrusion sample while maintaining high accuracy on both normal and intrusion samples.
The specific technical scheme is as follows:
An intrusion detection system based on division and characteristic change comprises a data acquisition module, a learning module and a prediction module:
the data acquisition module inputs network data packets as the basic training data for learning the intrusion detection model based on division and feature transformation;
the learning module divides the normal data packets into several clusters with the K-means method and combines the data packets of each cluster with the intrusion packets into a new training set, which is used to train a feature transformation matrix and a corresponding k-nearest-neighbor intrusion detection model;
the prediction module, given a packet sample x to be predicted, uses the K-means model to select the corresponding feature transformation space, projects x into it, and predicts the class of x with the corresponding k-nearest-neighbor intrusion detection model.
An intrusion detection method based on partitioning and feature variation comprises the following steps:
Step 1: establish an attribute table for the samples; acquire training samples and process them according to the attribute table;
Step 2: divide the majority-class training set D_maj with the K-means clustering algorithm to obtain clusters D_maj,1, D_maj,2, ..., D_maj,K;
Step 3: combine each of the clusters D_maj,1, D_maj,2, ..., D_maj,K with the minority-class training set D_min to obtain new training sets D_1, D_2, ..., D_K;
Step 4: learn a transformation matrix Q_k from each new training set D_k, as follows. In D_k, for a sample point x_i, the probability that another sample point x_j affects its classification result is:
[Formula image in the original publication; not reproduced.]
Taking maximization of the leave-one-out (LOO) accuracy as the objective, the probability that x_i is correctly classified by all samples other than itself is:
[Formula image in the original publication; not reproduced.]
where Ω_i is the set of subscripts of the samples that belong to the same class as x_i.
The optimization objective is then:
[Formula image in the original publication; not reproduced.]
and its gradient is:
[Formula image in the original publication; not reproduced.]
Solving the above with gradient descent yields the corresponding feature transformation matrix Q_k.
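The step 4 formulas appear only as images in the original publication. The verbal description (a probability that x_j influences the classification of x_i, a leave-one-out accuracy objective over the same-class index set Ω_i, and gradient-based optimization) matches Neighbourhood Components Analysis (NCA, Goldberger et al., 2004). Assuming that standard formulation, with the mapping Q_k^T x used later in this document and the hypothetical shorthand x_ij = x_i - x_j, the missing formulas would plausibly read as follows. This is a reconstruction under that assumption, not the patent's verbatim text; the original may differ in details such as using a log-probability objective.

$$p_{ij} = \frac{\exp\left(-\lVert Q_k^{T}x_i - Q_k^{T}x_j\rVert^{2}\right)}{\sum_{l \neq i}\exp\left(-\lVert Q_k^{T}x_i - Q_k^{T}x_l\rVert^{2}\right)}, \qquad p_{ii} = 0$$

$$p_i = \sum_{j \in \Omega_i} p_{ij}$$

$$f(Q_k) = \sum_{i=1}^{|D_k|} p_i \quad \text{(to be maximized)}$$

$$\frac{\partial f}{\partial Q_k} = 2\sum_{i=1}^{|D_k|}\Big(p_i \sum_{l \neq i} p_{il}\, x_{il}x_{il}^{T} - \sum_{j \in \Omega_i} p_{ij}\, x_{ij}x_{ij}^{T}\Big) Q_k, \qquad x_{ij} = x_i - x_j$$

Because f is maximized, "gradient descent" here amounts to descending on -f (equivalently, ascending on f). The factor Q_k appears on the right because the mapping is written as Q_k^T x: since ‖Q^T x‖² = x^T Q Q^T x, its derivative with respect to Q is 2 x x^T Q.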
Step 5: process the sample x to be classified with the attribute table of step 1, transform the training set corresponding to x with the transformation matrix Q obtained in step 4, and classify x with the k-NN algorithm in the transformed feature space, thereby determining whether x is a normal sample or an intrusion sample.
Preferably, the attribute table divides the value types of the data packet variables into numeric values and discrete values.
Preferably, the normal data packets are divided into K clusters.
Preferably, samples are mapped into the optimized space before being predicted with the k-NN algorithm.
Further, x is mapped into the corresponding feature transformation space, and its class is predicted from its k nearest neighbors with the function:
$$y(x) = \arg\max_{v} \sum_{x_i \in D_z} I(v = y_i)$$
where v is a class label; I(true) = 1 and I(false) = 0; Q_i is the learned spatial transformation matrix, and Q_i^T x denotes mapping x into the feature space defined by Q_i; y_i is the true class label of x_i; and D_z is the set of k nearest neighbors x_i of x searched in the feature space Q_i with the Euclidean distance, defined as:
$$d(x_i, x_j) = \sqrt{\sum_{d=1}^{n} (x_{id} - x_{jd})^{2}}$$
where x_id and x_jd are the values of samples x_i and x_j in dimension d, and n is the feature dimension.
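A minimal pure-Python sketch of the prediction function and the Euclidean distance above. It assumes Q is stored as a list of n rows with m columns, so that Q^T x is an m-dimensional vector; the training points, labels and both matrices are hypothetical hand-picked values (not learned ones), chosen to show how a transformation changes which neighbours get to vote.

```python
import math
from collections import Counter


def euclidean(a, b):
    """d(x_i, x_j) = sqrt(sum over d of (x_id - x_jd)^2)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def transform(Q, x):
    """Return Q^T x for an n-by-m matrix Q stored as n rows of length m."""
    return [sum(Q[d][c] * x[d] for d in range(len(x))) for c in range(len(Q[0]))]


def knn_predict(Q, train_X, train_y, x, k=3):
    """argmax_v sum_{x_i in D_z} I(v == y_i), with D_z the k nearest neighbours in Q-space."""
    z = transform(Q, x)
    scored = [(euclidean(transform(Q, xi), z), yi) for xi, yi in zip(train_X, train_y)]
    D_z = sorted(scored, key=lambda pair: pair[0])[:k]
    votes = Counter(yi for _, yi in D_z)  # per-class count of I(v == y_i)
    return votes.most_common(1)[0][0]


# Hypothetical data in which feature 2 misleads plain Euclidean k-NN.
train_X = [[0.0, 0.0], [0.1, 5.0], [0.2, 5.0], [1.0, 0.0], [1.1, 0.1], [0.9, 0.2]]
train_y = ["normal", "normal", "normal", "intrusion", "intrusion", "intrusion"]
x = [0.95, 5.0]

identity = [[1.0, 0.0], [0.0, 1.0]]   # plain Euclidean k-NN
first_only = [[1.0], [0.0]]           # hypothetical Q that keeps only feature 1

print(knn_predict(identity, train_X, train_y, x))    # normal
print(knn_predict(first_only, train_X, train_y, x))  # intrusion
```

With the identity matrix, the two normal points that share x's second feature dominate the vote; once the transformation discards that feature, the three intrusion points become the nearest neighbours.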
Further, the sample attribute table includes: connection duration; protocol type; network service type of the target host; normal or error status of the connection; number of bytes from the source host to the target host; number of bytes from the target host to the source host; whether the connection is from/to the same host and port; number of wrong fragments; number of urgent packets; number of accesses to sensitive areas; number of failed login attempts; number of successful logins; number of root-user accesses; number of file creation operations; number of shell invocations; number of accesses to control files; whether the login belongs to the "hot" list; whether the login is a guest login; number of connections to the same target host as the current connection; number of connections to the same service as the current connection; number of connections, among the previous 100 connections, to the same target host as the current connection; percentage of connections with SYN errors among the previous 100 connections to the same target host as the current connection; and percentage of connections with REJ errors among the previous 100 connections to the same service as the current connection.
Compared with the prior art, the invention has the beneficial effects that:
the invention can effectively determine whether a data packet sample is an intrusion sample while maintaining high accuracy on both normal and intrusion samples.
Drawings
FIG. 1 is a flow chart illustrating an intrusion detection method based on partitioning and feature variation according to the present invention;
FIG. 2 is a schematic diagram of clustering the majority (normal) class with the K-means clustering algorithm;
FIG. 3 is a schematic diagram of the classification of the test sample x in the optimized space by using the k-NN algorithm.
Detailed Description
The technical solution of the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
As shown in fig. 1, an intrusion detection method based on partitioning and feature transformation includes the following steps:
Step 1: the data acquisition module captures data packets from the network as the basic training set D of the model and divides them into a normal data packet training set D_maj and an intrusion data packet training set D_min; the data are preprocessed (for example, by checking the rationality and integrity of the selected data, supplementing and correcting missing and abnormal data, and normalizing), and an attribute table of the samples is established, in which the value type of each attribute is either numeric or discrete;
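The preprocessing in step 1 can be sketched as follows. This is an illustration under stated assumptions: the attribute names are hypothetical stand-ins for a few entries of the attribute table, min-max scaling stands in for the unspecified normalization, and filling a missing numeric value with the training mean is just one possible way to "supplement" missing data. The patent does not specify any of these choices.

```python
# Hypothetical attribute table: (attribute name, value type).
ATTRIBUTE_TABLE = [
    ("duration", "numeric"),
    ("protocol_type", "discrete"),
    ("src_bytes", "numeric"),
    ("dst_bytes", "numeric"),
]


def fit_preprocessor(rows):
    """Learn per-attribute statistics from training rows (dicts; None marks missing)."""
    stats = {}
    for name, kind in ATTRIBUTE_TABLE:
        values = [r[name] for r in rows if r.get(name) is not None]
        if kind == "numeric":
            stats[name] = (min(values), max(values), sum(values) / len(values))
        else:
            stats[name] = sorted(set(values))  # category order for one-hot encoding
    return stats


def transform_row(row, stats):
    """Min-max scale numeric attributes and one-hot encode discrete ones."""
    vector = []
    for name, kind in ATTRIBUTE_TABLE:
        value = row.get(name)
        if kind == "numeric":
            lo, hi, mean = stats[name]
            if value is None:  # supplement a missing value with the training mean
                value = mean
            span = hi - lo
            vector.append((value - lo) / span if span else 0.0)
        else:
            # Unseen or missing categories become an all-zero block.
            vector.extend(1.0 if value == cat else 0.0 for cat in stats[name])
    return vector


train = [
    {"duration": 0, "protocol_type": "tcp", "src_bytes": 100, "dst_bytes": 0},
    {"duration": 10, "protocol_type": "udp", "src_bytes": 300, "dst_bytes": 50},
]
stats = fit_preprocessor(train)
print(transform_row({"duration": 5, "protocol_type": "udp", "src_bytes": None, "dst_bytes": 50}, stats))
# [0.5, 0.0, 1.0, 0.5, 1.0]
```

Statistics are fitted on the training set only and then reused unchanged for every packet to be classified, so training and prediction vectors share the same layout.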
Step 2: divide the normal data packet training set D_maj with the K-means clustering algorithm to obtain clusters D_maj,1, D_maj,2, ..., D_maj,K;
Step 3: combine each of the clusters D_maj,1, D_maj,2, ..., D_maj,K with the intrusion data packet training set D_min to obtain new training sets D_1, D_2, ..., D_K;
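Steps 2 and 3 can be sketched with a plain Lloyd's K-means followed by a join with D_min. This is a minimal stdlib-only sketch, assuming feature vectors are lists of floats; the random initialization, the iteration cap, the seed and the "normal"/"intrusion" label strings are hypothetical choices, not taken from the patent.

```python
import random


def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, K, iterations=100, seed=0):
    """Plain Lloyd's K-means; returns (centroids, cluster index of each point)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, K)]
    assignment = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [
            min(range(K), key=lambda c: squared_distance(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(K):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment


def build_training_sets(D_maj, D_min, K, seed=0):
    """Steps 2-3: split D_maj into K clusters, then join each cluster with all of D_min."""
    centroids, assignment = kmeans(D_maj, K, seed=seed)
    training_sets = []
    for c in range(K):
        cluster = [x for x, a in zip(D_maj, assignment) if a == c]
        X = cluster + list(D_min)
        y = ["normal"] * len(cluster) + ["intrusion"] * len(D_min)
        training_sets.append((X, y))
    return centroids, training_sets


# Hypothetical data: two well-separated groups of normal packets, one intrusion.
D_maj = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
D_min = [[5.0, 5.0]]
centroids, sets = build_training_sets(D_maj, D_min, K=2)
for X, y in sets:
    print(len(X), y.count("intrusion"))  # 4 1 for each training set
```

Every D_k receives the complete intrusion set, so each cluster-specific model sees all minority samples but only a fraction of the majority class, which reduces the class imbalance per model.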
Step 4: learn a transformation matrix Q_k from each new training set D_k, as follows. In D_k, for a sample point x_i, the probability that another sample point x_j affects its classification result is:
[Formula image in the original publication; not reproduced.]
Taking maximization of the leave-one-out (LOO) accuracy as the objective, the probability that x_i is correctly classified by all samples other than itself is:
[Formula image in the original publication; not reproduced.]
where Ω_i is the set of subscripts of the samples that belong to the same class as x_i.
The optimization objective is then:
[Formula image in the original publication; not reproduced.]
and its gradient is:
[Formula image in the original publication; not reproduced.]
Solving the above with gradient descent yields the corresponding feature transformation matrix Q_k.
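Assuming the standard NCA form for step 4 (the patent's own formulas are images that are not reproduced in this text, so the exact objective is an assumption), Q_k could be learned by plain gradient ascent on f(Q_k), i.e. gradient descent on -f. The sketch is pure Python; the identity initialization, learning rate and step count are hypothetical settings, not values from the patent.

```python
import math


def project(Q, x):
    """Q^T x for an n-by-m matrix Q stored as n rows of length m."""
    return [sum(Q[d][c] * x[d] for d in range(len(x))) for c in range(len(Q[0]))]


def nca_objective_and_gradient(Q, X, y):
    """Return f(Q) = sum_i p_i and df/dQ under the assumed NCA formulation."""
    n, m, N = len(Q), len(Q[0]), len(X)
    Z = [project(Q, x) for x in X]
    C = [[0.0] * n for _ in range(n)]  # bracketed n-by-n sum in the gradient
    f = 0.0
    for i in range(N):
        d = [sum((a - b) ** 2 for a, b in zip(Z[i], Z[l])) if l != i else math.inf
             for l in range(N)]
        d_min = min(d)
        w = [math.exp(-(dl - d_min)) for dl in d]  # shifted softmax; exp(-inf) = 0, so p_ii = 0
        total = sum(w)
        p = [wl / total for wl in w]
        p_i = sum(p[j] for j in range(N) if j != i and y[j] == y[i])
        f += p_i
        for l in range(N):
            if l == i:
                continue
            # p_i * p_il for every l, minus p_il again when l is in Omega_i.
            coef = p_i * p[l] - (p[l] if y[l] == y[i] else 0.0)
            diff = [a - b for a, b in zip(X[i], X[l])]
            for r in range(n):
                for s in range(n):
                    C[r][s] += coef * diff[r] * diff[s]
    grad = [[2.0 * sum(C[r][t] * Q[t][c] for t in range(n)) for c in range(m)]
            for r in range(n)]
    return f, grad


def learn_Q(X, y, m=None, steps=100, lr=0.05):
    """Gradient ascent on f(Q) (equivalently, descent on -f), starting from identity."""
    n = len(X[0])
    m = m or n
    Q = [[1.0 if r == c else 0.0 for c in range(m)] for r in range(n)]
    for _ in range(steps):
        _, grad = nca_objective_and_gradient(Q, X, y)
        Q = [[Q[r][c] + lr * grad[r][c] for c in range(m)] for r in range(n)]
    return Q


# Hypothetical training set D_k: feature 1 separates the classes, feature 2 does not.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = ["normal", "normal", "intrusion", "intrusion"]
f_before, _ = nca_objective_and_gradient([[1.0, 0.0], [0.0, 1.0]], X, y)
Q_k = learn_Q(X, y)
f_after, _ = nca_objective_and_gradient(Q_k, X, y)
print(f_before < f_after)  # True
```

On this toy set the learned matrix stretches the informative first feature and shrinks the second, which raises the expected number of leave-one-out correct classifications.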
Step 5: process the sample x to be classified with the sample attribute table of step 1, transform the training set corresponding to x with the transformation matrix Q obtained in step 4, and classify x with the k-NN algorithm in the transformed feature space, thereby determining whether x is a normal sample or an intrusion sample.
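Step 5 and the prediction module can be sketched as routing plus k-NN voting. The layout models[c] = (Q_c, X_c, y_c), the identity matrices, the centroids, the data and k = 3 are all hypothetical placeholders standing in for the outputs of steps 2 to 4.

```python
import math
from collections import Counter


def project(Q, x):
    """Q^T x for an n-by-m matrix Q stored as n rows of length m."""
    return [sum(Q[d][c] * x[d] for d in range(len(x))) for c in range(len(Q[0]))]


def nearest_cluster(centroids, x):
    """Index of the K-means centroid closest to x (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


def predict(models, centroids, x, k=3):
    """Route x to cluster c, map into Q_c's space, and take a k-NN majority vote over D_c."""
    Q, X, y = models[nearest_cluster(centroids, x)]
    z = project(Q, x)
    scored = sorted(((math.dist(project(Q, xi), z), yi) for xi, yi in zip(X, y)),
                    key=lambda pair: pair[0])
    return Counter(yi for _, yi in scored[:k]).most_common(1)[0][0]


# Hypothetical trained state: two clusters, identity matrices standing in for learned Q_k.
I2 = [[1.0, 0.0], [0.0, 1.0]]
D_min = [[5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
models = [
    (I2, [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]] + D_min, ["normal"] * 3 + ["intrusion"] * 3),
    (I2, [[10.0, 10.0], [10.0, 11.0], [11.0, 10.0]] + D_min, ["normal"] * 3 + ["intrusion"] * 3),
]
centroids = [[1 / 3, 1 / 3], [31 / 3, 31 / 3]]

print(predict(models, centroids, [10.2, 10.1]))  # normal
print(predict(models, centroids, [5.1, 5.0]))    # intrusion
```

Routing by nearest centroid is how a fitted K-means model assigns a new point, so each prediction searches only one cluster's training set rather than the whole of D.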
The above is only a preferred embodiment of the present invention, and the scope of protection of the invention is not limited thereto; any simple modification or equivalent substitution of the technical solution that a person skilled in the art can obviously obtain within the technical scope disclosed by the invention falls within the scope of protection of the invention.

Claims (6)

1. An intrusion detection method based on division and characteristic change, characterized by comprising a data acquisition module, a learning module and a prediction module, wherein:
the data acquisition module inputs network data packets as the basic training data for learning an intrusion detection model based on division and feature transformation;
the learning module divides the normal data packets into several clusters with the K-means method and combines the data packets of each cluster with the intrusion packets into a new training set, which is used to train a feature transformation matrix and a corresponding k-nearest-neighbor intrusion detection model;
the prediction module, given a packet sample x to be predicted, uses the K-means method to select the corresponding feature transformation matrix, projects x with it, and predicts the class of x with the corresponding k-nearest-neighbor intrusion detection model; and the method comprises the following steps:
step 1, the data acquisition module captures data packets from the network as the basic training set D of the model and divides them into a normal data packet training set D_maj and an intrusion data packet training set D_min; the data are preprocessed by checking the rationality and integrity of the selected data, supplementing and correcting abnormal data, and normalizing; and an attribute table of the samples is established, in which the value type of each attribute is either numeric or discrete;
step 2, the normal data packet training set D_maj is divided into K clusters with the K-means clustering algorithm, giving clusters D_maj,1, D_maj,2, …, D_maj,K;
step 3, each of the clusters D_maj,1, D_maj,2, …, D_maj,K is combined with the intrusion data packet training set D_min to obtain K new training sets D_1, D_2, …, D_K;
step 4, a transformation matrix Q_k is learned from each new training set D_k, as follows: in D_k, for a sample point x_i, the probability that another sample point x_j affects its classification result is:
[Formula image in the original publication; not reproduced.]
where i, j = 1, 2, …, |D_k|, i ≠ j, and |D_k| is the size of training set D_k;
taking maximization of the leave-one-out (LOO) accuracy as the objective, the probability that sample point x_i is correctly classified by all samples other than itself is:
[Formula image in the original publication; not reproduced.]
where Ω_i denotes the set of subscripts of the other samples in training set D_k that belong to the same class as sample point x_i;
the optimization objective is then:
[Formula image in the original publication; not reproduced.]
and its gradient is:
[Formula image in the original publication; not reproduced.]
the objective function f(Q_k) is solved by gradient descent using the above gradient, giving the corresponding feature transformation matrix Q_k;
step 5, the sample x to be classified is processed with the sample attribute table of step 1; the K-means clustering model of step 2 is used to select the corresponding transformation matrix Q_k learned in step 4, with which the training set corresponding to x is transformed; and x is classified with its k nearest neighbors in the transformed feature space, i.e., it is determined whether x is a normal sample or an intrusion sample.
2. The intrusion detection method according to claim 1, wherein the sample attribute table classifies types of packet variables into numeric values and discrete values.
3. The intrusion detection method according to claim 1, wherein the normal data packet training set D_maj is divided into K clusters.
4. The intrusion detection method based on division and characteristic change according to claim 1, wherein samples are mapped into the optimized space before being predicted with k nearest neighbors.
5. The method according to claim 1, wherein x is mapped with the corresponding feature transformation matrix and the class of the sample is predicted from its k nearest neighbors using the function:
$$y(x) = \arg\max_{v} \sum_{Q_k^{T}x_i \in D_z} I(v = y_i)$$
where v is a class label; I(true) = 1 and I(false) = 0; Q_k is the learned corresponding feature transformation matrix; Q_k^T x_i is sample x_i after feature transformation; y_i is the true class label of sample x_i; I(·) is the indicator function; and D_z is the set of k nearest neighbors Q_k^T x_i of Q_k^T x, searched with the Euclidean distance:
$$d(x_i, x_j) = \sqrt{\sum_{d=1}^{n} (x_{id} - x_{jd})^{2}}$$
where x_id is the value of sample x_i in dimension d,
x_jd is the value of sample x_j in dimension d,
and n is the feature dimension.
6. The intrusion detection method based on division and characteristic change according to claim 2, wherein the attribute table of the samples includes: connection duration; protocol type; network service type of the target host; normal or error status of the connection; number of bytes from the source host to the target host; number of bytes from the target host to the source host; whether the connection is from/to the same host and port; number of wrong fragments; number of urgent packets; number of accesses to sensitive areas; number of failed login attempts; number of successful logins; number of root-user accesses; number of file creation operations; number of shell invocations; number of accesses to control files; whether the login belongs to the "hot" list; whether the login is a guest login; number of connections to the same target host as the current connection; number of connections to the same service as the current connection; number of connections, among the previous 100 connections, to the same target host as the current connection; percentage of connections with SYN errors among the previous 100 connections to the same target host as the current connection; and percentage of connections with REJ errors among the previous 100 connections to the same service as the current connection.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710760156.3A CN107395640B (en) 2017-08-30 2017-08-30 Intrusion detection system and method based on division and characteristic change

Publications (2)

Publication Number Publication Date
CN107395640A (en) 2017-11-24
CN107395640B (en) 2020-05-12

Family

ID=60347125

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710760156.3A Expired - Fee Related CN107395640B (en) 2017-08-30 2017-08-30 Intrusion detection system and method based on division and characteristic change

Country Status (1)

Country Link
CN (1) CN107395640B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110474866B (en) * 2019-06-14 2022-06-21 Nanjing University of Posts and Telecommunications Method for detecting disguised intrusion information

Citations (6)

Publication number Priority date Publication date Assignee Title
CN102571486A (en) * 2011-12-14 2012-07-11 Shanghai Jiao Tong University Traffic identification method based on bag of word (BOW) model and statistic features
CN102855638A (en) * 2012-08-13 2013-01-02 Soochow University Detection method for abnormal behavior of vehicle based on spectrum clustering
CN103618744A (en) * 2013-12-10 2014-03-05 East China University of Science and Technology Intrusion detection method based on fast k-nearest neighbor (KNN) algorithm
CN103870751A (en) * 2012-12-18 2014-06-18 China Mobile Group Shandong Co., Ltd. Method and system for intrusion detection
CN104462184A (en) * 2014-10-13 2015-03-25 Beijing Institute of System Engineering Large-scale data abnormity recognition method based on bidirectional sampling combination
CN106790175A (en) * 2016-12-29 2017-05-31 Beijing NSFOCUS Information Security Technology Co., Ltd. Worm event detection method and device

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
TW201216106A (en) * 2010-10-13 2012-04-16 Univ Nat Taiwan Science Tech Intrusion detecting system and method to establish classifying rules thereof

Non-Patent Citations (2)

Title
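A triangle area based nearest neighbors approach to intrusion detection; Chih-Fong Tsai, Chia-Ying Lin; Elsevier; 2010-01-31; full text *
A new feature transformation method for imbalanced classification problems (一种新的面向非平衡分类问题的特征变换方法); Guo Huaping, Zhang Ting, Fan Ming; Journal of Chinese Computer Systems (小型微型计算机系统); 2015-05-31 (No. 5); full text *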

Also Published As

Publication number Publication date
CN107395640A (en) 2017-11-24

Similar Documents

Publication Publication Date Title
Bendiab et al. IoT malware network traffic classification using visual representation and deep learning
Shahid et al. IoT devices recognition through network traffic analysis
Ortiz et al. DeviceMien: network device behavior modeling for identifying unknown IoT devices
Zhao et al. Feature-based transfer learning for network security
Upadhyay et al. Intrusion detection in SCADA based power grids: Recursive feature elimination model with majority vote ensemble algorithm
WO2022037130A1 (en) Network traffic anomaly detection method and apparatus, and electronic apparatus and storage medium
Piskozub et al. Malalert: Detecting malware in large-scale network traffic using statistical features
Possebon et al. Improved network traffic classification using ensemble learning
Dua Attribute selection and ensemble classifier based novel approach to intrusion detection system
AU2019399664A1 (en) A network device classification apparatus and process
Bodström et al. State of the art literature review on network anomaly detection with deep learning
Abirami et al. Building an ensemble learning based algorithm for improving intrusion detection system
Portela et al. Evaluation of the performance of supervised and unsupervised Machine learning techniques for intrusion detection
CN112104602A (en) Network intrusion detection method based on CNN transfer learning
Brandao et al. Log Files Analysis for Network Intrusion Detection
Shao et al. Deep learning hierarchical representation from heterogeneous flow-level communication data
Nalavade et al. Evaluation of k-means clustering for effective intrusion detection and prevention in massive network traffic data
Wang et al. A high-performance intrusion detection method based on combining supervised and unsupervised learning
Fan et al. Autoiot: Automatically updated iot device identification with semi-supervised learning
CN107395640B (en) Intrusion detection system and method based on division and characteristic change
Shukla et al. UInDeSI4.0: An efficient Unsupervised Intrusion Detection System for network traffic flow in Industry 4.0 ecosystem
De-La-Hoz-Franco et al. Implementation of an intrusion detection system based on self organizing map
CN111917781A (en) Intelligent internal malicious behavior network attack identification method and electronic equipment
Singh et al. Mitigation of Cyber Attacks in SDN-Based IoT Systems Using Machine Learning Techniques
Das Design and development of an efficient network intrusion detection system using ensemble machine learning techniques for Wifi environments

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200512

Termination date: 20200830