CN112437053A - Intrusion detection method and device - Google Patents

Intrusion detection method and device

Info

Publication number
CN112437053A
Authority
CN
China
Prior art keywords
data set
feature
sets
matrix
characteristic data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion.)
Granted
Application number
CN202011248506.6A
Other languages
Chinese (zh)
Other versions
CN112437053B (en)
Inventor
周献飞
徐楷
焦建林
董宁
韩盟
徐浩
陈奕倩
Current Assignee
State Grid Corp of China SGCC
State Grid Beijing Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
State Grid Beijing Electric Power Co Ltd
Priority date
Filing date
Publication date
Application filed by State Grid Corp of China SGCC and State Grid Beijing Electric Power Co Ltd
Priority to CN202011248506.6A
Publication of CN112437053A
Application granted
Publication of CN112437053B
Legal status: Active
Anticipated expiration

Classifications

    • H04L 63/1416: Network architectures or network communication protocols for network security; detecting or protecting against malicious traffic by monitoring network traffic; event detection, e.g. attack signature detection
    • G06F 18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/2411: Pattern recognition; classification techniques based on the proximity to a decision surface, e.g. support vector machines
    • G06F 18/24155: Pattern recognition; classification techniques based on parametric or probabilistic models; Bayesian classification
    • G06F 18/24323: Pattern recognition; classification techniques relating to the number of classes; tree-organised classifiers

Abstract

The invention discloses an intrusion detection method and device. The method comprises: acquiring a first feature data set; performing dimensionality reduction on the first feature data set to obtain a second feature data set, wherein the dimensionality of the second feature data set is smaller than that of the first feature data set; and training an intrusion detection model with the second feature data set to obtain a trained intrusion detection model, which is used to perform intrusion detection on data to be detected. The invention solves the technical problem of low data detection accuracy in the related art.

Description

Intrusion detection method and device
Technical Field
The invention relates to the field of data processing, in particular to an intrusion detection method and device.
Background
With the development of the internet, ever more data is connected and exchanged, and the accompanying malicious intrusions increasingly threaten computers and devices of all kinds, so intrusion detection on data is required. When an existing intrusion detection system encounters a large amount of high-dimensional data, it typically runs into the curse of dimensionality, which lowers the accuracy of data detection. In addition, existing intrusion detection systems cannot identify unknown attacks during detection and therefore fail to report them, further lowering accuracy. The data detection accuracy of existing intrusion detection systems is therefore low.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiment of the invention provides an intrusion detection method and device, which at least solve the technical problem of low accuracy of data detection in the related technology.
According to an aspect of an embodiment of the present invention, there is provided an intrusion detection method, including: acquiring a first characteristic data set; performing dimensionality reduction processing on the first characteristic data set to obtain a second characteristic data set, wherein the dimensionality of the second characteristic data set is smaller than that of the first characteristic data set; and training the intrusion detection model by utilizing the second characteristic data set to obtain a trained intrusion detection model, wherein the trained intrusion detection model is used for carrying out intrusion detection on data to be detected.
Optionally, the performing dimension reduction processing on the first feature data set to obtain a second feature data set includes: dividing the first characteristic data set by using a cross verification method to generate a plurality of groups of data sets, wherein any two groups of data sets have a mutual exclusion relationship; and carrying out feature screening on the multiple groups of data sets by utilizing a random forest model to obtain multiple groups of target feature sets, wherein each group of target feature sets comprises: a plurality of target features; and performing dimension reduction processing on the multiple groups of target feature sets to obtain a second feature data set.
Optionally, the performing feature screening on the multiple sets of data sets by using a random forest model to obtain multiple sets of target feature sets includes: predicting the multiple groups of data sets by using a random forest model to obtain a score value of each original characteristic contained in the multiple groups of original characteristic sets, wherein the score value is used for representing the importance degree of each original characteristic; obtaining a score mean value of each original feature based on the score value of each original feature contained in the multiple groups of original feature sets; and determining a plurality of groups of target feature sets based on the score mean of each original feature.
Optionally, determining the plurality of sets of target feature sets based on the score mean of each raw feature comprises: sorting the original features in ascending order according to the score mean of each original feature; and taking the first preset number of features from the front of the sorted list to obtain the plurality of target features.
Optionally, the performing dimension reduction processing on the multiple sets of target feature sets to obtain a second feature data set includes: constructing a first matrix based on the multiple groups of target feature sets; acquiring a covariance matrix of the first matrix; determining a second matrix based on the covariance matrix; and acquiring the product of the first matrix and the second matrix to obtain a second characteristic data set.
Optionally, determining the second matrix based on the covariance matrix comprises: obtaining an eigenvalue and an eigenvector of a covariance matrix; sorting the eigenvectors according to the magnitude of the eigenvalues to generate a third matrix; and acquiring a second preset number of row matrixes at the forefront in the third matrix to generate a second matrix.
Optionally, before obtaining the covariance matrix of the first matrix, the method further includes: carrying out zero equalization processing on the first matrix to obtain a fourth matrix; and acquiring a covariance matrix of the fourth matrix.
Optionally, before obtaining the product of the first matrix and the second matrix to obtain the second feature data set, the method further includes: performing centralization processing on the first matrix to obtain a fifth matrix; and acquiring the product of the fifth matrix and the second matrix to obtain a second characteristic data set.
Optionally, before feature screening is performed on multiple sets of data sets by using a random forest model to obtain multiple sets of target feature sets, the method further includes: dividing a plurality of groups of data sets randomly for a plurality of times to obtain a plurality of groups of training sets and test sets; training the random forest model by using a plurality of groups of training sets; testing the trained random forest model by using the test set to obtain the total score of the trained random forest model; determining whether training of the random forest model is completed based on the total score.
Optionally, the method for training the intrusion detection model by using the second feature data set includes: carrying out misuse detection on the second characteristic data set to obtain a third characteristic data set, wherein the characteristic data contained in the third characteristic data set is used for representing non-attack data or normal data; and performing iterative training on the plurality of base classifiers by using an ensemble learning algorithm based on the third feature data set to obtain a trained intrusion detection model.
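The patent does not name the ensemble learning algorithm used for the iterative training of the base classifiers. As one hedged illustration only, iterative re-weighting of weak base classifiers in this style can be sketched as a minimal AdaBoost over decision stumps; all function names here are illustrative, not from the patent:

```python
import numpy as np

def train_adaboost(X, y, n_rounds=10):
    """Iteratively train weighted decision stumps (base classifiers).

    y must be in {-1, +1}. Returns a list of (feature, threshold,
    polarity, alpha) tuples forming the ensemble.
    """
    n, d = X.shape
    w = np.full(n, 1.0 / n)            # sample weights, updated each round
    ensemble = []
    for _ in range(n_rounds):
        best = None                     # (err, feat, thr, pol, pred)
        for feat in range(d):
            for thr in np.unique(X[:, feat]):
                for pol in (1, -1):
                    pred = np.where(pol * (X[:, feat] - thr) > 0, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, feat, thr, pol, pred)
        err, feat, thr, pol, pred = best
        err = min(max(err, 1e-10), 1 - 1e-10)   # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)   # weight of this classifier
        w *= np.exp(-alpha * y * pred)          # re-weight the samples
        w /= w.sum()
        ensemble.append((feat, thr, pol, alpha))
    return ensemble

def predict_adaboost(ensemble, X):
    """Weighted vote of all base classifiers."""
    score = np.zeros(len(X))
    for feat, thr, pol, alpha in ensemble:
        score += alpha * np.where(pol * (X[:, feat] - thr) > 0, 1, -1)
    return np.where(score >= 0, 1, -1)
```

Any boosting or bagging scheme with per-round re-weighting would fit the "iterative training on a plurality of base classifiers" described above equally well.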
Optionally, performing misuse detection on the second feature data set to obtain a third feature data set includes: evaluating a plurality of preset models of different types with the second feature data set, and determining the detection rate of each of the different types of preset models; determining the preset model corresponding to the maximum detection rate as the target model; performing misuse detection on the second feature data set with the target model to obtain a detection result of the second feature data set; and obtaining the third feature data set based on the detection result of the second feature data set.
Optionally, the plurality of different types of preset models includes: decision tree models, support vector machine models and naive Bayes models.
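A minimal sketch of the model-selection step just described, keeping whichever candidate achieves the highest detection rate on a validation split, could look as follows. `select_target_model` and `ConstantModel` are hypothetical names; in practice the decision-tree, support-vector-machine, and naive-Bayes candidates named above would be passed in as any objects exposing a `predict` method:

```python
import numpy as np

class ConstantModel:
    """Stand-in candidate with the same predict() surface as a decision
    tree, SVM, or naive Bayes classifier (illustrative only)."""
    def __init__(self, label):
        self.label = label
    def predict(self, X):
        return np.full(len(X), self.label)

def select_target_model(models, X_val, y_val):
    """Keep the candidate whose detection rate (share of attack samples,
    label 1, that are flagged as attacks) is highest on the validation split."""
    best_name, best_rate = None, -1.0
    for name, model in models.items():
        pred = model.predict(X_val)
        attacks = y_val == 1
        rate = float((pred[attacks] == 1).mean()) if attacks.any() else 0.0
        if rate > best_rate:
            best_name, best_rate = name, rate
    return best_name, best_rate
```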
Optionally, after acquiring the first feature data set, the method further comprises: and formatting the first characteristic data set to obtain a processed first characteristic data set, wherein the types of variables contained in the processed first characteristic data set are the same.
According to another aspect of the embodiments of the present invention, there is also provided an intrusion detection apparatus, including: an obtaining module, configured to obtain a first feature data set; the processing module is used for performing dimensionality reduction processing on the first characteristic data set to obtain a second characteristic data set, wherein the dimensionality of the second characteristic data set is smaller than that of the first characteristic data set; and the training module is used for training the intrusion detection model by utilizing the second characteristic data set to obtain a trained intrusion detection model, wherein the trained intrusion detection model is used for carrying out intrusion detection on the data to be detected.
According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium, which includes a stored program, wherein when the program runs, the apparatus on which the computer-readable storage medium is located is controlled to execute the intrusion detection method.
According to another aspect of the embodiments of the present invention, there is also provided a processor, configured to execute a program, where the program executes the intrusion detection method described above.
In the embodiment of the invention, a first feature data set is obtained first; dimensionality reduction is then performed on the first feature data set to obtain a second feature data set whose dimensionality is smaller than that of the first; finally, the intrusion detection model is trained with the second feature data set to obtain a trained intrusion detection model, which is used to perform intrusion detection on data to be detected. Performing dimensionality reduction on the first feature data set avoids the curse of dimensionality and thus improves the accuracy of data detection. In addition, training the intrusion detection model in real time on the acquired feature data sets lets the model detect unknown data attacks in time rather than failing to report them, further improving detection accuracy, thereby solving the technical problem of low data detection accuracy in the related art.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow chart of an intrusion detection method according to an embodiment of the invention;
FIG. 2 is a flow chart of another intrusion detection method according to an embodiment of the invention;
FIG. 3 is a schematic diagram of an intrusion detection device according to an embodiment of the invention;
fig. 4 is a schematic diagram of another intrusion detection device according to an embodiment of the invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In accordance with an embodiment of the present invention, there is provided a method embodiment of intrusion detection, it being noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different than here.
Fig. 1 is a flowchart of an intrusion detection method according to an embodiment of the present invention, as shown in fig. 1, the method includes the following steps:
step S102, a first characteristic data set is obtained.
The first feature data set in the above step is a data set for network intrusion detection, and may be at least one of the following: a dataset based on network traffic, a grid-based dataset, a dataset based on internet traffic, a dataset based on virtual private networks, a dataset based on android applications, a dataset based on internet of things (IoT) traffic, or a dataset based on internet-connected devices. For example, the dataset based on network traffic may be the DARPA 1998 dataset, the KDD Cup 1999 dataset, the NSL-KDD dataset, or the UNSW-NB15 dataset.
And step S104, performing dimension reduction processing on the first characteristic data set to obtain a second characteristic data set.
Wherein the dimensionality of the second feature data set is smaller than that of the first feature data set.
In an alternative embodiment, reducing the dimensionality of the feature data avoids the curse of dimensionality while retaining the maximum amount of information in the feature data set; lowering the dimensionality reduces the amount of computation exponentially and thereby reduces the complexity of processing the feature data.
And step S106, training the intrusion detection model by using the second characteristic data set to obtain the trained intrusion detection model.
The trained intrusion detection model is used for carrying out intrusion detection on data to be detected.
In an optional embodiment, the intrusion detection model is trained in real time by using the feature data set after the dimension reduction, so that the intrusion detection model can detect unknown data attacks in time, the intrusion data can be more accurately predicted, and the effect of reducing the false alarm rate of the intrusion detection model is achieved.
Through the above embodiment of the invention, a first feature data set is obtained; its dimensionality is reduced to obtain a second feature data set of smaller dimensionality; and the intrusion detection model is trained with the second feature data set to obtain a trained model used for intrusion detection of data to be detected. Reducing the dimensionality of the first feature data set avoids the curse of dimensionality, improving the accuracy of data detection. Moreover, training the intrusion detection model in real time on the acquired feature data sets allows unknown data attacks to be detected in time instead of going unreported, further improving accuracy and solving the technical problem of low data detection accuracy in the related art.
Optionally, the performing dimension reduction processing on the first feature data set to obtain a second feature data set includes: dividing the first characteristic data set by using a cross verification method to generate a plurality of groups of data sets, wherein any two groups of data sets have a mutual exclusion relationship; and carrying out feature screening on the multiple groups of data sets by utilizing a random forest model to obtain multiple groups of target feature sets, wherein each group of target feature sets comprises: a plurality of target features; and performing dimension reduction processing on the multiple groups of target feature sets to obtain a second feature data set.
The cross-validation method in the above steps, also called loop estimation, is a practical method to cut the data samples into smaller subsets.
The random forest in the above steps is a classifier comprising a plurality of decision trees, and the output class of the classifier is determined by the mode of the class output by the individual trees.
In an alternative embodiment, the cross-validation method is used to divide the processed data set into mutually exclusive training subsets and to generate multiple training-set/test-set pairs, avoiding the error introduced by a single test. The divided data sets are tested with a random forest: for each training-set/test-set pair the contribution rates of the features are obtained and then averaged. From the features with smaller average contribution rates, a few features with a certain correlation are selected, and principal component analysis (PCA) is then used to re-form them into a smaller set of new, mutually uncorrelated feature data that replaces the original feature data set, so that the new features reflect the information represented by the original features to the maximum extent while the information across indexes does not overlap. The newly obtained features replace the original multi-dimensional features to give a new data set, which not only has lower feature dimensionality but also ensures that each dimension carries more information.
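The pipeline just described (mutually exclusive folds, averaged per-fold feature contributions, then PCA over the low-contribution features) can be sketched as follows. This is a minimal NumPy illustration, not the patent's implementation: `importance_fn` stands in for the random-forest scorer, and all names and parameters are assumptions:

```python
import numpy as np

def k_mutually_exclusive_folds(X, k, seed=0):
    """Split row indices into k mutually exclusive subsets
    (the cross-validation partition described above)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    return np.array_split(idx, k)

def reduce_features(X, importance_fn, k=5, m=4, u=2):
    """Average per-fold feature contributions, take the m least important
    features, replace them with their first u principal components, and
    keep the remaining features unchanged."""
    folds = k_mutually_exclusive_folds(X, k)
    scores = np.mean([importance_fn(X[f]) for f in folds], axis=0)
    low = np.argsort(scores)[:m]                  # m least important features
    keep = np.setdiff1d(np.arange(X.shape[1]), low)
    sub = X[:, low]
    sub = sub - sub.mean(axis=0)                  # centre before PCA
    cov = np.cov(sub, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)              # eigenvalues ascending
    p = vecs[:, ::-1][:, :u]                      # top-u principal directions
    return np.hstack([X[:, keep], sub @ p])
```

In practice `importance_fn` would be a random-forest importance scorer; any function mapping a data subset to one score per column fits the sketch.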
Optionally, the performing feature screening on the multiple sets of data sets by using a random forest model to obtain multiple sets of target feature sets includes: predicting the multiple groups of data sets by using a random forest model to obtain a score value of each original characteristic contained in the multiple groups of original characteristic sets, wherein the score value is used for representing the importance degree of each original characteristic; obtaining a score mean value of each original feature based on the score value of each original feature contained in the multiple groups of original feature sets; and determining a plurality of groups of target feature sets based on the score mean of each original feature.
In an alternative embodiment, a random forest model may be used to predict on the multiple data sets, so as to obtain how much each feature contributes to each tree in the random forest; an average value is then taken, and finally the contributions of the features are compared. The contribution may be measured using the Gini index or the out-of-bag (OOB) error rate as the evaluation index. By comparing the contributions of the features, a feature with a larger contribution value may be used as a feature in a target feature set, and a feature with a smaller contribution value may be removed.
For example, the score mean of each original feature can be obtained using the Gini index. Let VIM denote the variable importance measure, and suppose there are c features x_1, x_2, x_3, ..., x_c. The Gini index score of each feature x_j is

$$VIM_j^{(Gini)} = \frac{1}{n}\sum_{i=1}^{n} VIM_{ij}^{(Gini)},$$

i.e. the average change in node splitting purity contributed by feature j across all decision trees.

The Gini index of a node m is calculated as:

$$GI_m = \sum_{k=1}^{K} p_{mk}\,(1 - p_{mk}) = 1 - \sum_{k=1}^{K} p_{mk}^2,$$

where K denotes the number of classes and p_{mk} the sample proportion of class k at node m. The importance of feature x_j at node m, i.e. the change in the Gini index before and after node m branches, is:

$$VIM_{jm}^{(Gini)} = GI_m - GI_l - GI_r,$$

where GI_l and GI_r denote the Gini indexes of the two new nodes after branching. If the nodes at which feature x_j appears in decision tree i form the set M, then the importance of x_j in the i-th tree is:

$$VIM_{ij}^{(Gini)} = \sum_{m \in M} VIM_{jm}^{(Gini)}.$$

Averaging over the n trees in total then gives VIM_j^{(Gini)} as defined above.

Finally, all the calculated importance scores are normalized, as shown in Table 1:

$$VIM_j = \frac{VIM_j^{(Gini)}}{\sum_{j'=1}^{c} VIM_{j'}^{(Gini)}},$$

where the denominator is the sum of the importance gains of all features and the numerator is the Gini importance of feature j.
TABLE 1
Feature | Contribution-rate mean | Feature | Contribution-rate mean | Feature | Contribution-rate mean
dur | 0.06789 | dloss | 0.0095 | trans_depth | 0.00208
proto | 0.0168 | sinpkt | 0.01326 | response_body_len | 0.00425
service | 0.02767 | dinpkt | 0.02406 | ct_srv_src | 0.02814
state | 0.01578 | sjit | 0.00723 | ct_state_ttl | 0.05433
spkts | 0.00879 | djit | 0.00784 | ct_dst_ltm | 0.01183
dpkts | 0.04487 | swin | 0.01366 | ct_src_dport_ltm | 0.01409
sbytes | 0.07697 | stcpb | 0.00479 | ct_dst_sport_ltm | 0.04049
dbytes | 0.01584 | dtcpb | 0.00481 | ct_dst_src_ltm | 0.09199
rate | 0.01298 | dwin | 0.00075 | is_ftp_login | 0.00014
sttl | 0.01925 | tcprtt | 0.04222 | ct_ftp_cmd | 0.0001
dttl | 0.0884 | synack | 0.02442 | ct_flw_http_mthd | 0.00231
sload | 0.01486 | ackdat | 0.01299 | ct_src_ltm | 0.00852
dload | 0.01313 | smean | 0.02917 | ct_srv_dst | 0.05406
sloss | 0.02778 | dmean | 0.04264 | is_sm_ips_ports | 0.00416
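The quantities entering the formulas above (the Gini index of a node, the per-node importance change, and the final normalization that yields the contribution-rate means of Table 1) can be sketched with NumPy; the helper names are illustrative, not from the patent:

```python
import numpy as np

def gini_index(labels):
    """GI = 1 - sum_k p_k^2 for the label distribution at one node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def node_importance(parent, left, right):
    """Change in the Gini index before and after one split:
    VIM = GI_parent - GI_left - GI_right, as in the formulas above."""
    return gini_index(parent) - gini_index(left) - gini_index(right)

def normalise(scores):
    """Normalise raw per-feature Gini gains so they sum to 1, matching
    the contribution-rate means listed in Table 1."""
    scores = np.asarray(scores, dtype=float)
    return scores / scores.sum()
```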
Optionally, determining the plurality of sets of target feature sets based on the score mean of each raw feature comprises: sorting the original features in ascending order according to the score mean of each original feature; and taking the first preset number of features from the front of the sorted list to obtain the plurality of target features.
The first preset number in the above steps may be set by a user, and the plurality of target features are features that need to be subjected to dimension reduction processing.
In an alternative embodiment, the feature importance scores VIM_j derived from each of the small data sets described above may be averaged, and the m features with smaller feature importance scores are selected for dimensionality reduction.
Optionally, the performing dimension reduction processing on the multiple sets of target feature sets to obtain a second feature data set includes: constructing a first matrix based on the multiple groups of target feature sets; acquiring a covariance matrix of the first matrix; determining a second matrix based on the covariance matrix; and acquiring the product of the first matrix and the second matrix to obtain a second characteristic data set.
In an alternative embodiment, the first matrix may be a matrix X and the second matrix a matrix P. The dimensionality reduction proceeds by reducing a samples of m-dimensional data: the raw data are arranged by columns into an m-row, a-column matrix X; each row of X is zero-averaged, i.e. the mean of that row is subtracted; the covariance matrix is obtained, together with its eigenvalues and corresponding eigenvectors r; the eigenvectors r are arranged into a matrix from top to bottom as rows, ordered by the size of the corresponding eigenvalues, and the first u rows form the matrix P. Multiplying the matrix formed by the u eigenvectors with the centralized data matrix then gives the data reduced to u dimensions. The error after compression can be expressed as

$$error = \frac{\sum_{i=u+1}^{m} \lambda_i}{\sum_{i=1}^{m} \lambda_i},$$

where the λ_i are the eigenvalues sorted in descending order and u is the number of features after dimensionality reduction. A threshold, such as 0.01, is then determined, and reducing to u dimensions is considered acceptable when error is below that threshold. Replacing the m selected features with the new u-dimensional features finally yields a new data set whose feature count equals the original count minus m plus u, namely the second feature data set, which is used for intrusion detection.
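A minimal NumPy sketch of these PCA steps (zero-averaging, covariance, eigendecomposition, and choosing the smallest u whose compression error stays below the threshold) might read, under the eigenvalue-ratio form of the error measure given above:

```python
import numpy as np

def pca_reduce(X, threshold=0.01):
    """Reduce the columns of X to the smallest number of principal
    components u such that the compression error
    (tail eigenvalue mass / total eigenvalue mass) is below `threshold`."""
    Xc = X - X.mean(axis=0)                  # zero-average each column
    cov = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    vals, vecs = vals[::-1], vecs[:, ::-1]   # largest eigenvalues first
    total = vals.sum()
    for u in range(1, len(vals) + 1):
        error = vals[u:].sum() / total       # variance lost by truncating at u
        if error < threshold:
            break
    return Xc @ vecs[:, :u], error, u        # centred data times top-u vectors
```

Here the projection `Xc @ vecs[:, :u]` is the product of the centralized data matrix and the matrix formed from the top-u eigenvectors described in the text.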
Optionally, determining the second matrix based on the covariance matrix comprises: obtaining an eigenvalue and an eigenvector of a covariance matrix; sorting the eigenvectors according to the magnitude of the eigenvalues to generate a third matrix; and acquiring a second preset number of row matrixes at the forefront in the third matrix to generate a second matrix.
In an alternative embodiment, the eigenvectors of the covariance matrix may be denoted r, the third matrix may be the matrix q, and the second matrix the matrix p. The eigenvalues of the covariance matrix and the corresponding eigenvectors r are obtained; the eigenvectors r are arranged into a matrix q from top to bottom according to the magnitudes of the corresponding eigenvalues, and the first k rows of q are taken to form the matrix p.
Optionally, before obtaining the covariance matrix of the first matrix, the method further includes: carrying out zero equalization processing on the first matrix to obtain a fourth matrix; and acquiring a covariance matrix of the fourth matrix.
The zero-averaging processing in the above step subtracts the mean of each variable; it is in effect a translation, after which the centre of all the data lies at (0, 0). Zero-averaging cancels the errors caused by differing dimensions, intrinsic variation, or large differences in magnitude.
In an alternative embodiment, zero-averaging may be performed on each row of data in the first matrix, that is, the mean value of each row is subtracted from the data of each row to obtain a fourth matrix, and a covariance matrix of the fourth matrix may be obtained.
Optionally, before obtaining the product of the first matrix and the second matrix to obtain the second feature data set, the method further includes: performing centralization processing on the first matrix to obtain a fifth matrix; and acquiring the product of the fifth matrix and the second matrix to obtain a second characteristic data set.
The centralization treatment in the steps has the same effect as zero averaging treatment, and errors caused by different dimensions, self variation or larger numerical value difference can be eliminated.
In an alternative embodiment, each data in the first matrix may be zero-averaged, that is, the average value of all data is subtracted from each data to obtain a fifth matrix, and a product of the fifth matrix and the second matrix may be obtained to obtain the second feature data set.
Optionally, before feature screening is performed on multiple sets of data sets by using a random forest model to obtain multiple sets of target feature sets, the method further includes: dividing a plurality of groups of data sets randomly for a plurality of times to obtain a plurality of groups of training sets and test sets; training the random forest model by using a plurality of groups of training sets; testing the trained random forest model by using the test set to obtain the total score of the trained random forest model; determining whether training of the random forest model is completed based on the total score.
In an alternative embodiment, k-fold cross-validation may be used to split the multiple sets of data into smaller data sets. The k-fold cross-validation method randomly divides the sample data into k parts, selects k-1 parts as the training set each time, and uses the remaining 1 part as the test set; after the first division is completed, k-1 parts can be randomly selected again to train on, so that k training data sets and k test data sets are obtained.
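The k-fold division described above can be sketched as follows (a generic index-based split written for illustration, not the patent's exact procedure):

```python
import random

def k_fold_split(n_samples, k, seed=0):
    """Randomly partition sample indices into k disjoint folds; each round
    uses k-1 folds as the training set and the remaining fold as the test set."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    rounds = []
    for i in range(k):
        train = [j for fi, fold in enumerate(folds) if fi != i for j in fold]
        rounds.append((train, folds[i]))
    return rounds

# 10 samples split into k = 5 rounds of (train, test) index lists.
rounds = k_fold_split(10, 5)
```

Each of the k rounds yields a mutually exclusive train/test pair, matching the "any two groups of data sets have a mutual exclusion relationship" requirement of claim 2.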
The process of training the random forest model with the multiple sets of training sets can be as follows: select n samples from the sample set by sampling with replacement, and generate a decision tree from the sampled set; at each node of the decision tree, randomly select d features, divide the sample set with each of the d features, and keep the optimal division feature; repeat this process m times to generate m decision trees, where m is the number of decision trees in the random forest. The random forest obtained by training then predicts the test samples, and the predicted result is determined by voting.
Wherein, on {A_2, A_3, A_4, …, A_k}, a random forest model M_1 is constructed and tested on data set A_1; the predicted values are compared with the true values, and a score a_1 is calculated under a chosen evaluation criterion.
On {A_1, A_3, A_4, …, A_k}, model M_1 is constructed and verified on data set A_2; the predicted values are compared with the true values, and a score a_2 is calculated under the same evaluation criterion.
…
On {A_1, A_2, A_3, …, A_(k-1)}, a model is constructed and verified on data set A_k; the predicted values are compared with the true values, and a score a_k is calculated under the same evaluation criterion.
a = (a_1 + a_2 + … + a_k) / k is taken as the composite score of model M_1.
Here A_1, A_2, A_3, …, A_k respectively denote the k data sets obtained by the k-fold cross-validation method, and M_1 denotes the trained random forest model. In each of the obtained scores a_1, a_2, …, a_k, each feature has a different importance.
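The per-fold scoring and averaging above can be sketched with hypothetical per-fold feature importances and scores (the numbers below are illustrative placeholders, not experimental results from the patent):

```python
import numpy as np

# Hypothetical importance scores for 4 features over k = 3 folds
# (in practice these would come from a trained random forest per fold).
fold_importances = np.array([
    [0.10, 0.40, 0.30, 0.20],
    [0.15, 0.35, 0.25, 0.25],
    [0.05, 0.45, 0.35, 0.15],
])

# Score mean of each original feature across the k folds.
mean_importance = fold_importances.mean(axis=0)

# Composite model score: average of the k per-fold scores a_1..a_k.
fold_scores = np.array([0.91, 0.88, 0.93])
composite = fold_scores.mean()
```

The feature score means are what the later ascending sort of claim 4 operates on; the composite score decides whether random forest training is complete.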
Optionally, the method for training the intrusion detection model by using the second feature data set includes: carrying out misuse detection on the second characteristic data set to obtain a third characteristic data set, wherein the characteristic data contained in the third characteristic data set is used for representing non-attack data or normal data; and performing iterative training on the plurality of base classifiers by using an ensemble learning algorithm based on the third feature data set to obtain a trained intrusion detection model.
Misuse detection in the above steps is a method for detecting computer attacks; known attacks can simply be added to the model, so the detection has a low false-alarm rate and high efficiency.
In an alternative embodiment, misuse detection may be performed on the second feature data set; after the misuse detection there will generally be some non-attack data not in the second feature data set, and these data can be extracted and combined with the second feature data set to generate a new feature data set, i.e., the third feature data set.
In an alternative embodiment, the process of iteratively training the plurality of base classifiers with the ensemble learning algorithm may be as follows. First, the weight distribution of the training data is initialized: w_1i = 1/N, i = 1, 2, …, N. Then, for m = 1, 2, …, M, the training data set is learned with weight distribution D_m to obtain a base classifier G_m(x), and the classification error rate of G_m(x) on the training set is calculated:

e_m = Σ_(i=1..N) w_mi · I(G_m(x_i) ≠ y_i)

The coefficient of G_m(x) is calculated as a_m = (1/2) · log((1 − e_m)/e_m), and the weight distribution of the training data set is then updated (z_m is a normalization factor that makes D_(m+1) a probability distribution):

w_(m+1,i) = (w_mi / z_m) · exp(−a_m · y_i · G_m(x_i)),  z_m = Σ_(i=1..N) w_mi · exp(−a_m · y_i · G_m(x_i))

A linear combination of the base classifiers is constructed to obtain the final classifier:

G(x) = sign(f(x)) = sign(Σ_(m=1..M) a_m · G_m(x))

Finally, the result is predicted with the final classifier, which is the trained intrusion detection model.
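The AdaBoost iteration described above can be sketched from scratch as follows. One-feature threshold "stumps" stand in for the base classifiers (an assumption for illustration; the patent does not prescribe a specific base classifier):

```python
import numpy as np

def adaboost_train(X, y, M=10):
    """AdaBoost sketch with threshold stumps as base classifiers.
    Labels y must be in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                    # initial weights w_1i = 1/N
    stumps, alphas = [], []
    for _ in range(M):
        best = None
        # pick the stump minimizing the weighted error e_m
        for j in range(X.shape[1]):
            for thr in np.unique(X[:, j]):
                for sign in (1, -1):
                    pred = sign * np.where(X[:, j] <= thr, 1, -1)
                    err = float(np.sum(w[pred != y]))
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign, pred)
        err, j, thr, sign, pred = best
        err = max(err, 1e-10)                  # avoid log(0) on a perfect stump
        alpha = 0.5 * np.log((1 - err) / err)  # a_m = 1/2 log((1-e_m)/e_m)
        w = w * np.exp(-alpha * y * pred)      # reweight: errors gain weight
        w /= w.sum()                           # z_m normalization
        stumps.append((j, thr, sign))
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    # G(x) = sign(sum_m a_m G_m(x))
    f = np.zeros(len(X))
    for (j, thr, sign), a in zip(stumps, alphas):
        f += a * sign * np.where(X[:, j] <= thr, 1, -1)
    return np.sign(f)
```

Misclassified samples gain weight at each round via the exp(−a_m y_i G_m(x_i)) factor, which is exactly the "higher weight to the samples with classification errors" behavior described for the anomaly-detection stage.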
Optionally, performing misuse detection on the second characteristic data set to obtain a third characteristic data set includes: making predictions with a plurality of different types of preset models using the second characteristic data set, and determining the detection rates of the plurality of different types of preset models; determining the preset model corresponding to the maximum detection rate as the target model; performing misuse detection on the second characteristic data set with the target model to obtain a detection result of the second characteristic data set; and obtaining the third characteristic data set based on the detection result of the second characteristic data set.
Optionally, the plurality of different types of preset models includes: decision tree models, support vector machine models and naive Bayes models.
When a naive Bayes model is employed, assume the classification model samples are:

(x_1^(1), x_2^(1), …, x_n^(1), y_1), (x_1^(2), x_2^(2), …, x_n^(2), y_2), …, (x_1^(m), x_2^(m), …, x_n^(m), y_m)

That is, there are m samples, each sample has n features, and the feature output has K categories, defined as C_1, C_2, …, C_K. The prior distribution of naive Bayes, P(Y = C_k) (k = 1, 2, …, K), is obtained from sample learning; then the conditional probability distribution P(X = x | Y = C_k) = P(X_1 = x_1, X_2 = x_2, …, X_n = x_n | Y = C_k) is learned; then the joint distribution P(X, Y) of X and Y is obtained using the Bayes formula:

P(X, Y = C_k) = P(Y = C_k) · P(X = x | Y = C_k) = P(Y = C_k) · P(X_1 = x_1 | Y = C_k) · P(X_2 = x_2 | Y = C_k) … P(X_n = x_n | Y = C_k)

By maximum likelihood, P(Y = C_k) is found as the frequency of category C_k in the training set, and the category corresponding to the maximum joint probability is the prediction of naive Bayes.
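A minimal sketch of this prior/conditional estimation and maximum-probability prediction follows. Gaussian per-feature likelihoods are an assumption for continuous data; the patent does not fix the form of P(X_j = x_j | Y = C_k):

```python
import numpy as np

class GaussianNB:
    """Naive Bayes sketch: class priors from training frequencies,
    per-feature Gaussian likelihoods, prediction by maximum joint probability."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.prior, self.mu, self.var = {}, {}, {}
        for c in self.classes:
            Xc = X[y == c]
            self.prior[c] = len(Xc) / len(X)      # P(Y = C_k) by frequency
            self.mu[c] = Xc.mean(axis=0)
            self.var[c] = Xc.var(axis=0) + 1e-9   # variance smoothing
        return self

    def predict(self, X):
        preds = []
        for x in X:
            scores = {}
            for c in self.classes:
                # log P(Y=C_k) + sum_j log P(X_j = x_j | Y = C_k)
                loglik = -0.5 * np.sum(np.log(2 * np.pi * self.var[c])
                                       + (x - self.mu[c]) ** 2 / self.var[c])
                scores[c] = np.log(self.prior[c]) + loglik
            preds.append(max(scores, key=scores.get))
        return np.array(preds)
```

Working in log space avoids underflow when multiplying many small conditional probabilities, but the argmax is the same as maximizing the joint probability above.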
When a support vector machine model is used, the classification function is

f(x) = sign( Σ_(i=1..l) a_i · y_i · K(x_i, x) + b )

where l denotes the number of training samples, x denotes the vector of the instance to be classified, x_i and y_i denote the attribute vector and class label of the i-th sample, K(x_i, x) denotes the kernel function, and a_i and b denote model parameters. The a_i are obtained by quadratic programming, from which w and b follow, giving the classification model g(x) = w·x + b; x belongs to different categories when g(x) > 0 and when g(x) < 0, and the separating plane with the largest margin between the two categories of objects is selected.
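The classification function can be evaluated directly once the support vectors and multipliers are known. The values below are hypothetical stand-ins: in practice the a_i and b come from solving the quadratic programme, and the RBF kernel is one common choice among many:

```python
import numpy as np

def rbf_kernel(xi, x, gamma=0.5):
    # Gaussian (RBF) kernel K(x_i, x)
    return np.exp(-gamma * np.sum((np.asarray(xi) - np.asarray(x)) ** 2))

def svm_decision(x, support_X, support_y, a, b, gamma=0.5):
    """g(x) = sum_i a_i * y_i * K(x_i, x) + b; the sign of g gives the class."""
    g = sum(a_i * y_i * rbf_kernel(x_i, x, gamma)
            for x_i, y_i, a_i in zip(support_X, support_y, a))
    return g + b

# Hypothetical support vectors and multipliers (illustration only).
support_X = np.array([[0.0, 0.0], [4.0, 4.0]])
support_y = np.array([1, -1])
a = np.array([1.0, 1.0])
b = 0.0
```

Points near the positive support vector yield g(x) > 0 and points near the negative one yield g(x) < 0, matching the sign rule in the classification function.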
When the decision tree model is adopted, attributes are selected according to the Gini index, the information gain, or the information gain ratio, and branches are then built from top to bottom according to these attributes until all samples at a node belong to the same class or the number of samples at a node falls below a given value. Overfitting is prevented by pre-pruning, post-pruning, or a combination of the two, yielding the final model.
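As a reference for the attribute-selection criterion, the Gini index of a label set and of a candidate binary split can be computed as follows (a generic CART-style sketch, not code from the patent):

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 over the class proportions p_k."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def gini_split(left, right):
    """Weighted Gini index of a binary split; the attribute/threshold
    giving the smallest value is preferred when growing the tree."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)
```

A pure node has impurity 0, so splitting stops once every sample at a node belongs to the same class, as described above.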
In an optional embodiment, the three models obtained are used to predict the second characteristic data set respectively, and the model with the highest test detection rate is selected as the target model, reducing the missed-report rate of the detection process; the non-attack and normal data from this process are then extracted as a new data set, i.e., the third characteristic data set.
Optionally, after acquiring the first feature data set, the method further comprises: and formatting the first characteristic data set to obtain a processed first characteristic data set, wherein the types of variables contained in the processed first characteristic data set are the same.
In an alternative embodiment, the formatting may be numerical processing: since the first feature data set contains both numeric variables and character variables, the first feature data set needs to be numericalized in a unified manner to facilitate its subsequent processing.
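A minimal sketch of this unified numerical formatting maps each character-type variable to integer codes. The field names below are hypothetical (intrusion records of the KDD style mix protocol strings with numeric counters):

```python
def numericalize(records, char_fields):
    """Replace character-type values with integer codes so that every
    variable in the processed feature set has the same (numeric) type."""
    codebooks = {f: {} for f in char_fields}
    out = []
    for rec in records:
        rec = dict(rec)  # copy; leave the input untouched
        for f in char_fields:
            book = codebooks[f]
            rec[f] = book.setdefault(rec[f], len(book))  # first-seen coding
        out.append(rec)
    return out, codebooks

# Hypothetical records: one character field, one numeric field.
records = [{"protocol": "tcp", "bytes": 181},
           {"protocol": "udp", "bytes": 105},
           {"protocol": "tcp", "bytes": 239}]
processed, books = numericalize(records, ["protocol"])
```

After this step every variable is numeric, so the matrix construction and PCA of the preceding steps can operate on the whole feature set uniformly.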
The left side of Table 2 shows the accuracy of several machine learning algorithms when misuse detection is performed after feature processing; by comparison, the decision tree can be selected for misuse detection. The right side of Table 2 shows the accuracy without feature processing. Combining the experimental results, performing misuse detection with the decision tree after feature processing improves the accuracy of the test set when detection is subsequently performed with the ensemble algorithm.
TABLE 2
[Table 2 image not reproduced: detection accuracies of the algorithms with feature processing (left) and without feature processing (right)]
A preferred embodiment of the present invention will be described in detail with reference to fig. 2 to 3. As shown in fig. 2, the method may include the following steps:
step S201, dividing a data set to obtain a plurality of mutually exclusive training test sets;
optionally, the data set may be first unified and digitized for subsequent use.
Optionally, a cross-validation method may be adopted to divide the processed data set into mutually exclusive training subsets, and generate multiple sets of training sets and test sets, so that errors caused by one-time testing can be avoided.
As shown in fig. 3, the data set is divided into training test set 1, training test set 2, …, and training test set N.
Step S202, testing by using a random forest based on the divided data sets, and obtaining the contribution rate of a group of features for each group of training set testing sets;
as shown in fig. 3, a random forest 1 is used to test a training test set 1, so as to obtain a contribution rate sequence from a feature 1 to a feature x in the training test set 1; testing the training test set 2 by using the random forest 2 to obtain the contribution rate sequence from the feature 1 to the feature x in the training test set 2; and testing the training test set N by using the random forest N to obtain the contribution rate sequence from the characteristic 1 to the characteristic X in the training test set N.
step S203, averaging the feature contribution rates, and selecting, as target features, several features that have a certain correlation and a smaller average contribution rate;
as shown in fig. 3, the feature contribution rates of each training test set are averaged, and several correlated features with a smaller average contribution rate are selected; the features in the first m rows of each training test set's feature ordering may be selected as the target features.
step S204, recombining the original indexes with PCA into a smaller set of mutually uncorrelated new composite indexes that replace them, the newly obtained lower-dimensional features replacing the original multi-dimensional features to obtain a new data set;
as shown in fig. 3, feature 1 through feature Y in each training test set may be combined using PCA to obtain a new data set.
The new data set reduces the feature dimensionality and ensures that each dimensionality feature contains more information.
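The PCA recombination of step S204 can be sketched as an eigen-decomposition of the covariance matrix (the toy matrix below is illustrative, not data from the patent):

```python
import numpy as np

def pca_reduce(X, n_components):
    """PCA sketch: center the data, take the covariance matrix's top
    eigenvectors (largest eigenvalues), and project onto them."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]               # the "second matrix"
    return Xc @ components                       # the reduced feature set

# Toy 6-sample, 2-feature data reduced to 1 principal component.
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9],
              [1.9, 2.2], [3.1, 3.0], [2.3, 2.7]])
X_reduced = pca_reduce(X, 1)
```

Sorting the eigenvectors by eigenvalue and keeping the top rows corresponds to the third/second matrix construction in claims 5 and 6; the projection keeps the directions carrying the most variance.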
Step S205, based on the new data set, carrying out misuse detection;
when selecting the algorithm, several algorithms such as a decision tree, a support vector machine, and naive Bayes can each be tested, and the algorithm with the highest test accuracy is finally selected, ensuring a low missed-report rate for the first-layer detection.
As shown in fig. 3, a decision tree, a support vector machine, and naive bayes may be used to test new data sets, respectively, add the new data set with the highest accuracy to the intrusion rule base, and perform misuse detection based on the data set.
Step S206, after the misuse test is carried out, extracting non-attack data and normal data in the characteristic rule base to be used as a new data set;
step S207, based on the data set after the misuse test, carrying out anomaly detection to obtain a strong classifier;
optionally, the anomaly detection trains a plurality of base classifiers based on the data set obtained after misuse detection, iteratively trains them with the Adaptive Boosting (AdaBoost) algorithm from ensemble learning, giving higher weight to misclassified samples in each round, and finally combines them into a strong classifier for anomaly detection.
And S208, carrying out intrusion detection by using the strong classifier, and extracting non-attack data and normal data in the characteristic rule base to be used as a final data set.
As shown in fig. 3, after the anomaly detection, there are some attack data that are not in the intrusion rule base, and a new data set can be formed by extracting and combining the attack data with the normal data.
Example 2
According to the embodiment of the present invention, an intrusion detection apparatus is further provided, which can execute the intrusion detection method in the above embodiment, and the specific implementation manner and the preferred application scenario are the same as those in the above embodiment, and are not described herein again.
Fig. 4 is a schematic diagram of an intrusion detection apparatus according to an embodiment of the present invention, as shown in fig. 4, the apparatus including:
an obtaining module 42 for obtaining a first feature data set;
a processing module 44, configured to perform dimension reduction processing on the first feature data set to obtain a second feature data set, where a dimension of the second feature data set is smaller than that of the first feature data set;
and a training module 46, configured to train an intrusion detection model by using the second feature data set, so as to obtain a trained intrusion detection model, where the trained intrusion detection model is used to perform intrusion detection on data to be detected.
Optionally, the processing module comprises: the dividing unit is used for dividing the first characteristic data set by using a cross verification method to generate a plurality of groups of data sets, wherein any two groups of data sets have a mutual exclusion relationship; the screening unit is used for carrying out feature screening on the multiple groups of data sets by utilizing the random forest model to obtain multiple groups of target feature sets, wherein each group of target feature sets comprises: a plurality of target features; and the processing unit is used for performing dimension reduction processing on the multiple groups of target feature sets to obtain a second feature data set.
Optionally, the screening unit comprises: the prediction subunit is used for predicting the multiple groups of data sets by using the random forest model to obtain a score value of each original characteristic contained in the multiple groups of original characteristic sets, wherein the score value is used for representing the importance degree of each original characteristic; the first obtaining subunit is used for obtaining a score mean value of each original feature based on the score value of each original feature contained in the multiple groups of original feature sets; and the first determining subunit is used for determining a plurality of groups of target feature sets based on the score mean value of each original feature.
Optionally, the determining subunit further performs ascending sorting on the plurality of original features according to the score mean of each original feature, and obtains a first preset number of the top original features in the sorted plurality of original features to obtain a plurality of target features.
Optionally, the processing unit comprises: the construction subunit is used for constructing a first matrix based on the multiple groups of target feature sets; a second obtaining subunit, configured to obtain a covariance matrix of the first matrix; a second determining subunit, configured to determine a second matrix based on the covariance matrix; the second obtaining subunit is further configured to obtain a product of the first matrix and the second matrix, and obtain a second feature data set.
Optionally, the second determining subunit is further configured to obtain eigenvalues and eigenvectors of the covariance matrix, sort the eigenvectors according to sizes of the eigenvalues, generate a third matrix, obtain a second preset number of row matrices in the third matrix, and generate the second matrix.
Optionally, the processing unit comprises: the first processing subunit is used for carrying out zero equalization processing on the first matrix to obtain a fourth matrix; and the third acquisition subunit is used for acquiring the covariance matrix of the fourth matrix.
Optionally, the processing unit comprises: the second processing subunit is used for performing centralized processing on the first matrix to obtain a fifth matrix; and the fourth acquiring subunit is used for acquiring the product of the fifth matrix and the second matrix to obtain a second characteristic data set.
Optionally, the processing module comprises: the dividing unit is also used for randomly dividing the plurality of groups of data sets for a plurality of times to obtain a plurality of groups of training sets and test sets; the first training unit is used for training the random forest model by utilizing a plurality of groups of training sets; the testing unit is used for testing the trained random forest model by using the testing set to obtain the total score of the trained random forest model; and the determining unit is used for determining whether the training of the random forest model is finished or not based on the total score.
Optionally, the training module comprises: the detection unit is used for carrying out misuse detection on the second characteristic data set to obtain a third characteristic data set, wherein the characteristic data contained in the third characteristic data set is used for representing non-attack data or normal data; and the second training unit is used for carrying out iterative training on the plurality of base classifiers by utilizing an integrated learning algorithm based on the third characteristic data set to obtain a trained intrusion detection model.
Optionally, the detection unit comprises: the prediction subunit is used for predicting the preset models of the different types by utilizing the second characteristic data set and determining the detection rates of the preset models of the different types; the third determining subunit is used for determining the preset model corresponding to the maximum detection rate as the target model; the detection subunit is used for carrying out misuse detection on the second characteristic data set by using the target model to obtain a detection result of the second characteristic data set; and the fifth acquiring subunit is used for acquiring a third characteristic data set based on the detection result of the second characteristic data set.
Optionally, the plurality of different types of preset models in the detection unit include: decision tree models, support vector machine models and naive Bayes models.
Optionally, the processing module is further configured to format the first feature data set to obtain a processed first feature data set, where types of variables included in the processed first feature data set are the same.
Example 3
According to an embodiment of the present invention, there is further provided a computer-readable storage medium, where the computer-readable storage medium includes a stored program, and when the program runs, the apparatus where the computer-readable storage medium is located is controlled to execute the intrusion detection method in embodiment 1.
Example 4
According to an embodiment of the present invention, there is further provided a processor, where the processor is configured to execute a program, where the program executes the intrusion detection method in embodiment 1.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.

Claims (16)

1. An intrusion detection method, comprising:
acquiring a first characteristic data set;
performing dimensionality reduction processing on the first characteristic data set to obtain a second characteristic data set, wherein the dimensionality of the second characteristic data set is smaller than that of the first characteristic data set;
and training an intrusion detection model by utilizing the second characteristic data set to obtain a trained intrusion detection model, wherein the trained intrusion detection model is used for carrying out intrusion detection on data to be detected.
2. The method of claim 1, wherein performing dimension reduction on the first feature data set to obtain a second feature data set comprises:
dividing the first characteristic data set by using a cross verification method to generate a plurality of groups of data sets, wherein any two groups of data sets have a mutual exclusion relationship;
and performing feature screening on the multiple groups of data sets by using a random forest model to obtain multiple groups of target feature sets, wherein each group of target feature sets comprises: a plurality of target features;
and performing dimension reduction processing on the multiple groups of target feature sets to obtain the second feature data set.
3. The method of claim 2, wherein performing feature screening on the plurality of sets of data sets using the random forest model to obtain a plurality of sets of target feature sets comprises:
predicting the multiple groups of data sets by using the random forest model to obtain a score value of each original feature contained in the multiple groups of original feature sets, wherein the score value is used for representing the importance degree of each original feature;
obtaining a score mean value of each original feature based on the score value of each original feature contained in the multiple groups of original feature sets;
and determining the multiple groups of target feature sets based on the score mean of each original feature.
4. The method of claim 3, wherein determining the plurality of sets of target feature sets based on the mean score of each of the raw features comprises:
according to the score average value of each original feature, sequencing the original features in an ascending order;
and acquiring the first preset number of original features at the forefront in the sequenced plurality of original features to obtain the plurality of target features.
5. The method of claim 2, wherein performing dimension reduction on the plurality of sets of target feature sets to obtain the second feature data set comprises:
constructing a first matrix based on the plurality of sets of target feature sets;
acquiring a covariance matrix of the first matrix;
determining a second matrix based on the covariance matrix;
and acquiring the product of the first matrix and the second matrix to obtain the second characteristic data set.
6. The method of claim 5, wherein determining the second matrix based on the covariance matrix comprises:
obtaining an eigenvalue and an eigenvector of the covariance matrix;
sorting the eigenvectors according to the magnitude of the eigenvalues to generate a third matrix;
and acquiring a second preset number of row matrixes at the forefront in the third matrix to generate the second matrix.
7. The method of claim 5, wherein prior to obtaining the covariance matrix of the first matrix, the method further comprises:
carrying out zero equalization processing on the first matrix to obtain a fourth matrix;
obtaining the covariance matrix of the fourth matrix.
8. The method of claim 5, wherein prior to obtaining the product of the first matrix and the second matrix to obtain the second feature data set, the method further comprises:
performing centralization processing on the first matrix to obtain a fifth matrix;
and acquiring the product of the fifth matrix and the second matrix to obtain the second characteristic data set.
9. The method of claim 2, wherein before performing feature screening on the plurality of sets of data sets using the random forest model to obtain a plurality of sets of target feature sets, the method further comprises:
dividing the multiple groups of data sets for multiple times randomly to obtain multiple groups of training sets and test sets;
training the random forest model by using the multiple groups of training sets;
testing the trained random forest model by using the test set to obtain the total score of the trained random forest model;
determining whether training of the random forest model is complete based on the total score.
10. The method of claim 1, wherein training an intrusion detection model using the second feature data set to obtain the trained intrusion detection model comprises:
carrying out misuse detection on the second characteristic data set to obtain a third characteristic data set, wherein the characteristic data contained in the third characteristic data set is used for representing non-attack data or normal data;
and performing iterative training on a plurality of base classifiers by using an ensemble learning algorithm based on the third feature data set to obtain the trained intrusion detection model.
11. The method of claim 10, wherein performing misuse detection on the second feature data set to obtain a third feature data set comprises:
predicting a plurality of different types of preset models by utilizing the second characteristic data set, and determining the detection rate of the plurality of different types of preset models;
determining a preset model corresponding to the maximum detection rate as a target model;
carrying out misuse detection on the second characteristic data set by using the target model to obtain a detection result of the second characteristic data set;
and obtaining the third characteristic data set based on the detection result of the second characteristic data set.
12. The method of claim 11, wherein the plurality of different types of preset models comprises: decision tree models, support vector machine models and naive Bayes models.
13. The method of claim 1, wherein after acquiring the first feature data set, the method further comprises:
and formatting the first characteristic data set to obtain a processed first characteristic data set, wherein the types of variables contained in the processed first characteristic data set are the same.
14. An intrusion detection device, comprising:
an obtaining module, configured to obtain a first feature data set;
the processing module is used for performing dimensionality reduction processing on the first characteristic data set to obtain a second characteristic data set, wherein the dimensionality of the second characteristic data set is smaller than that of the first characteristic data set;
and the training module is used for utilizing the second characteristic data set to train the intrusion detection model to obtain a trained intrusion detection model, wherein the trained intrusion detection model is used for carrying out intrusion detection on data to be detected.
15. A computer-readable storage medium, comprising a stored program, wherein the program, when executed, controls an apparatus in which the computer-readable storage medium is located to perform the intrusion detection method according to any one of claims 1 to 13.
16. A processor configured to execute a program, wherein the program executes to perform the intrusion detection method according to any one of claims 1 to 13.
CN202011248506.6A 2020-11-10 2020-11-10 Intrusion detection method and device Active CN112437053B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011248506.6A CN112437053B (en) 2020-11-10 2020-11-10 Intrusion detection method and device


Publications (2)

Publication Number Publication Date
CN112437053A true CN112437053A (en) 2021-03-02
CN112437053B CN112437053B (en) 2023-06-30


Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101399672A (en) * 2008-10-17 2009-04-01 章毅 Intrusion detection method for fusion of multiple neutral networks
CN106878995A (en) * 2017-04-27 2017-06-20 重庆邮电大学 A kind of wireless sensor network Exception Type discrimination method based on perception data
CN106951778A (en) * 2017-03-13 2017-07-14 步步高电子商务有限责任公司 A kind of intrusion detection method towards complicated flow data event analysis
CN107395590A (en) * 2017-07-19 2017-11-24 福州大学 A kind of intrusion detection method classified based on PCA and random forest
US20180176243A1 (en) * 2016-12-16 2018-06-21 Patternex, Inc. Method and system for learning representations for log data in cybersecurity
CN108712404A (en) * 2018-05-04 2018-10-26 重庆邮电大学 A kind of Internet of Things intrusion detection method based on machine learning
CN109818798A (en) * 2019-02-19 2019-05-28 上海海事大学 A kind of wireless sensor network intruding detection system and method merging KPCA and ELM
CN110809009A (en) * 2019-12-12 2020-02-18 江苏亨通工控安全研究院有限公司 Two-stage intrusion detection system applied to industrial control network
CN110825068A (en) * 2019-09-29 2020-02-21 惠州蓄能发电有限公司 Industrial control system anomaly detection method based on PCA-CNN

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ZHANG BAOHUA et al.: "Application of PCA-LSTM in Network Intrusion Detection", Value Engineering *
LIN WEINING et al.: "Research on an Intrusion Detection Algorithm Based on PCA and Random Forest Classification", Netinfo Security *
CHEN ZHUO et al.: "Network Intrusion Detection Model Based on Random Forest and XGBoost", Journal of Signal Processing *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113645182A (en) * 2021-06-21 2021-11-12 上海电力大学 Random forest detection method for denial of service attack based on secondary feature screening
CN113542276A (en) * 2021-07-16 2021-10-22 江苏商贸职业学院 Method and system for detecting intrusion target of hybrid network
CN113542276B (en) * 2021-07-16 2023-01-24 江苏商贸职业学院 Method and system for detecting intrusion target of hybrid network
CN113836527A (en) * 2021-11-23 2021-12-24 北京微步在线科技有限公司 Intrusion event detection model construction method and device and intrusion event detection method

Also Published As

Publication number Publication date
CN112437053B (en) 2023-06-30

Similar Documents

Publication Publication Date Title
CN112437053B (en) Intrusion detection method and device
CN108960833B (en) Abnormal transaction identification method, equipment and storage medium based on heterogeneous financial characteristics
CN108875776B (en) Model training method and device, service recommendation method and device, and electronic device
CN106899440B (en) Network intrusion detection method and system for cloud computing
CN111027069B (en) Malicious software family detection method, storage medium and computing device
CN111914253B (en) Method, system, equipment and readable storage medium for intrusion detection
CN111612041A (en) Abnormal user identification method and device, storage medium and electronic equipment
CN110991474A (en) Machine learning modeling platform
CN109840413B (en) Phishing website detection method and device
WO2020114108A1 (en) Clustering result interpretation method and device
CN111695597B (en) Credit fraud group identification method and system based on improved isolated forest algorithm
CN112348080A RBF improvement method, device and equipment based on industrial control anomaly detection
CN113568368B (en) Self-adaptive determination method for industrial control data characteristic reordering algorithm
CN107609589A Feature learning method for complex behavior sequence data
Sasank et al. Credit card fraud detection using various classification and sampling techniques: a comparative study
Gao Stability analysis of rock slope based on an abstraction ant colony clustering algorithm
CN110929525A (en) Network loan risk behavior analysis and detection method, device, equipment and storage medium
CN110472659B (en) Data processing method, device, computer readable storage medium and computer equipment
Mhawi et al. Proposed Hybrid CorrelationFeatureSelectionForestPanalizedAttribute Approach to advance IDSs
CN116304518A (en) Heterogeneous graph convolution neural network model construction method and system for information recommendation
CN114513374B (en) Network security threat identification method and system based on artificial intelligence
CN111783088B (en) Malicious code family clustering method and device and computer equipment
CN114331731A Blockchain anomaly detection method based on PCA and RF, and related device
CN111428741B (en) Network community discovery method and device, electronic equipment and readable storage medium
CN110033031B (en) Group detection method, device, computing equipment and machine-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant