CN112437053B - Intrusion detection method and device - Google Patents


Info

Publication number
CN112437053B
Authority
CN
China
Prior art keywords: data set, feature, characteristic data, sets, matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011248506.6A
Other languages
Chinese (zh)
Other versions
CN112437053A (en)
Inventor
周献飞
徐楷
焦建林
董宁
韩盟
徐浩
陈奕倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
State Grid Beijing Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
State Grid Beijing Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, State Grid Beijing Electric Power Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN202011248506.6A priority Critical patent/CN112437053B/en
Publication of CN112437053A publication Critical patent/CN112437053A/en
Application granted granted Critical
Publication of CN112437053B publication Critical patent/CN112437053B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00: Network architectures or network communication protocols for network security
    • H04L 63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L 63/1408: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L 63/1416: Event detection, e.g. attack signature detection
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411: Classification techniques based on the proximity to a decision surface, e.g. support vector machines
    • G06F 18/2415: Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F 18/24155: Bayesian classification
    • G06F 18/243: Classification techniques relating to the number of classes
    • G06F 18/24323: Tree-organised classifiers


Abstract

The invention discloses an intrusion detection method and device. The method comprises: acquiring a first feature data set; performing dimension-reduction processing on the first feature data set to obtain a second feature data set, the dimension of the second feature data set being smaller than that of the first; and training an intrusion detection model with the second feature data set to obtain a trained intrusion detection model, which is used for intrusion detection on data to be detected. The invention addresses the technical problem of low data-detection accuracy in the related art.

Description

Intrusion detection method and device
Technical Field
The present invention relates to the field of data processing, and in particular, to an intrusion detection method and apparatus.
Background
With the development of the internet, data connections and traffic keep increasing, and so do malicious intrusions into computers and other devices and the threats they bring, which makes intrusion detection of the data necessary. When faced with large amounts of high-dimensional data, conventional intrusion detection systems commonly suffer from the curse of dimensionality, which lowers the accuracy of data detection. In addition, existing intrusion detection systems cannot identify unknown attacks during detection and therefore fail to report them, further lowering accuracy. As a result, the data-detection accuracy of existing intrusion detection systems is low.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
Embodiments of the invention provide an intrusion detection method and device that at least solve the technical problem of low data-detection accuracy in the related art.
According to one aspect of an embodiment of the invention, an intrusion detection method is provided, comprising: acquiring a first feature data set; performing dimension-reduction processing on the first feature data set to obtain a second feature data set, the dimension of the second feature data set being smaller than that of the first; and training an intrusion detection model with the second feature data set to obtain a trained intrusion detection model, which is used for intrusion detection on data to be detected. The dimension-reduction processing comprises: dividing the first feature data set with a cross-validation method to generate multiple data sets, any two of which are mutually exclusive; performing feature screening on the multiple data sets with a random forest model to obtain multiple target feature sets, each comprising a plurality of target features; and performing dimension reduction on the multiple target feature sets to obtain the second feature data set. Training the intrusion detection model with the second feature data set comprises: performing misuse detection on the second feature data set with a plurality of preset models of different types to obtain a third feature data set, the feature data in the third feature data set representing non-attack or normal data; and, based on the third feature data set, iteratively training a plurality of base classifiers with an ensemble learning algorithm to obtain the trained intrusion detection model.
Optionally, performing the dimension-reduction processing on the first feature data set to obtain the second feature data set comprises: dividing the first feature data set with a cross-validation method to generate multiple data sets, any two of which are mutually exclusive; performing feature screening on the multiple data sets with a random forest model to obtain multiple target feature sets, each comprising a plurality of target features; and performing dimension-reduction processing on the multiple target feature sets to obtain the second feature data set.
Optionally, performing feature screening on the multiple data sets with the random forest model to obtain the multiple target feature sets comprises: predicting on the multiple data sets with the random forest model to obtain a score value for each original feature in the multiple original feature sets, the score value representing the importance of that feature; obtaining a score mean for each original feature from its score values across the multiple original feature sets; and determining the multiple target feature sets based on the score mean of each original feature.
Optionally, determining the multiple target feature sets based on the score mean of each original feature comprises: sorting the original features in ascending order of score mean; and taking the first preset number of features at the front of the sorted list to obtain the plurality of target features.
Optionally, performing dimension-reduction processing on the multiple target feature sets to obtain the second feature data set comprises: constructing a first matrix from the multiple target feature sets; obtaining the covariance matrix of the first matrix; determining a second matrix based on the covariance matrix; and taking the product of the first matrix and the second matrix to obtain the second feature data set.
Optionally, determining the second matrix based on the covariance matrix comprises: obtaining the eigenvalues and eigenvectors of the covariance matrix; sorting the eigenvectors by eigenvalue to generate a third matrix; and taking a second preset number of leading rows of the third matrix to generate the second matrix.
Optionally, before obtaining the covariance matrix of the first matrix, the method further comprises: zero-averaging the first matrix to obtain a fourth matrix; and obtaining the covariance matrix of the fourth matrix.
Optionally, before taking the product of the first matrix and the second matrix, the method further comprises: centering the first matrix to obtain a fifth matrix; and taking the product of the fifth matrix and the second matrix to obtain the second feature data set.
Optionally, before performing feature screening on the multiple data sets with the random forest model to obtain the multiple target feature sets, the method further comprises: randomly dividing the multiple data sets several times to obtain multiple training sets and test sets; training the random forest model with the training sets; testing the trained random forest model with the test sets to obtain its total score; and determining, based on the total score, whether training of the random forest model is complete.
Optionally, training the intrusion detection model with the second feature data set to obtain the trained intrusion detection model comprises: performing misuse detection on the second feature data set to obtain a third feature data set, the feature data in the third feature data set representing non-attack or normal data; and, based on the third feature data set, iteratively training a plurality of base classifiers with an ensemble learning algorithm to obtain the trained intrusion detection model.
Optionally, performing misuse detection on the second feature data set to obtain the third feature data set comprises: predicting with a plurality of preset models of different types on the second feature data set and determining the detection rate of each; determining the preset model with the maximum detection rate as the target model; performing misuse detection on the second feature data set with the target model to obtain a detection result for the second feature data set; and obtaining the third feature data set based on that detection result.
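As a hedged sketch (not the patent's own code), the model-selection step above might look as follows in scikit-learn, treating the detection rate as recall on the attack class; the synthetic data and all variable names are illustrative assumptions:

```python
# Hypothetical sketch: pick the misuse-detection model with the maximum
# detection rate among several preset model types, as described above.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import recall_score

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

candidates = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "svm": SVC(),
    "naive_bayes": GaussianNB(),
}

# Detection rate for an intrusion detector is the recall on the attack class.
rates = {}
for name, model in candidates.items():
    model.fit(X_tr, y_tr)
    rates[name] = recall_score(y_te, model.predict(X_te))

target_name = max(rates, key=rates.get)   # model with the maximum detection rate
target_model = candidates[target_name]
```

The chosen `target_model` would then perform the misuse detection on the full second feature data set.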
Optionally, the plurality of preset models of different types include: decision tree models, support vector machine models, and naive Bayes models.
Optionally, after acquiring the first feature data set, the method further comprises: formatting the first feature data set to obtain a processed first feature data set in which all variables have the same type.
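An illustrative sketch of this formatting step (the column names are made up for illustration; the patent does not prescribe an encoding): categorical columns are one-hot encoded so that every variable in the processed feature set shares one numeric type.

```python
# Encode categorical columns so all variables have the same (numeric) type.
import pandas as pd

raw = pd.DataFrame({
    "proto": ["tcp", "udp", "tcp"],      # categorical
    "service": ["http", "dns", "ftp"],   # categorical
    "sbytes": [496, 1762, 1068],         # already numeric
})

processed = pd.get_dummies(raw, columns=["proto", "service"], dtype=float)
processed["sbytes"] = processed["sbytes"].astype(float)

# Every column in the processed first feature data set now has the same type.
assert processed.dtypes.nunique() == 1
```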
According to another aspect of an embodiment of the invention, an intrusion detection apparatus is also provided, comprising: an acquisition module for acquiring a first feature data set; a processing module for performing dimension-reduction processing on the first feature data set to obtain a second feature data set whose dimension is smaller than that of the first; and a training module for training an intrusion detection model with the second feature data set to obtain a trained intrusion detection model, which is used for intrusion detection on data to be detected. The training module comprises: a detection unit for performing misuse detection on the second feature data set with a plurality of preset models of different types to obtain a third feature data set, the feature data in the third feature data set representing non-attack or normal data; and a second training unit for iteratively training a plurality of base classifiers with an ensemble learning algorithm, based on the third feature data set, to obtain the trained intrusion detection model.
According to another aspect of an embodiment of the invention, a computer-readable storage medium is also provided, comprising a stored program which, when run, controls the device on which the storage medium resides to execute the intrusion detection method described above.
According to another aspect of an embodiment of the invention, a processor is also provided, configured to run a program which, when running, executes the intrusion detection method described above.
In embodiments of the invention, the first feature data set is acquired first; dimension-reduction processing is then performed on it to obtain the second feature data set, whose dimension is smaller than that of the first; finally, the second feature data set is used to train the intrusion detection model, and the trained model performs intrusion detection on the data to be detected. Performing dimension reduction on the first feature data set avoids the curse of dimensionality and thereby improves the accuracy of data detection.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiments of the invention and together with the description serve to explain the invention and do not constitute a limitation on the invention. In the drawings:
FIG. 1 is a flow chart of an intrusion detection method according to an embodiment of the present invention;
FIG. 2 is a flow chart of another intrusion detection method according to an embodiment of the invention;
FIG. 3 is a schematic diagram of an intrusion detection device according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of another intrusion detection device according to an embodiment of the present invention.
Detailed Description
To help those skilled in the art better understand the present invention, the technical solutions in the embodiments of the invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without inventive effort shall fall within the scope of the invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to an embodiment of the present invention, a method embodiment of intrusion detection is provided. It should be noted that the steps shown in the flowcharts of the figures may be performed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps may be performed in an order different from that shown or described herein.
Fig. 1 is a flowchart of an intrusion detection method according to an embodiment of the present invention, as shown in fig. 1, the method including the steps of:
step S102, a first feature data set is acquired.
The first feature data set in the above step is a data set for network intrusion detection, and may be at least one of the following: a dataset based on network traffic, a grid-based dataset, an internet-traffic-based dataset, a virtual-private-network-based dataset, an Android-application-based dataset, an internet-of-things (IoT)-traffic-based dataset, or a dataset based on internet-connected devices. The network-traffic-based dataset may be, among others, the DARPA 1998 dataset, the KDD Cup 1999 dataset, the NSL-KDD dataset, or the UNSW-NB15 dataset.
And step S104, performing dimension reduction processing on the first characteristic data set to obtain a second characteristic data set.
Wherein the second feature data set has a smaller dimension than the first feature data set.
In an alternative embodiment, reducing the dimension of the feature data retains the maximum amount of information in the feature data set while avoiding the curse of dimensionality; lowering the dimensionality also reduces the amount of computation on the feature data exponentially and thus lowers its computational complexity.
And step S106, training the intrusion detection model by using the second characteristic data set to obtain a trained intrusion detection model.
The trained intrusion detection model is used for intrusion detection of data to be detected.
In an alternative embodiment, the dimension-reduced feature data set is used to train the intrusion detection model in real time, so that the model can detect unknown data attacks in a timely manner, predict intrusion data more accurately, and thereby reduce the false alarm rate of the intrusion detection model.
According to this embodiment of the invention, the first feature data set is acquired first; dimension-reduction processing is then performed on it to obtain the second feature data set, whose dimension is smaller than that of the first; finally, the second feature data set is used to train the intrusion detection model, and the trained model performs intrusion detection on the data to be detected. Performing dimension reduction on the first feature data set avoids the curse of dimensionality and thereby improves the accuracy of data detection.
Optionally, performing the dimension-reduction processing on the first feature data set to obtain the second feature data set comprises: dividing the first feature data set with a cross-validation method to generate multiple data sets, any two of which are mutually exclusive; performing feature screening on the multiple data sets with a random forest model to obtain multiple target feature sets, each comprising a plurality of target features; and performing dimension-reduction processing on the multiple target feature sets to obtain the second feature data set.
The cross-validation method in the above step, also called rotation estimation, is a practical way of cutting a data sample into smaller subsets.
The random forest in the above step is a classifier comprising multiple decision trees; its output class is the mode of the classes output by the individual trees.
In an alternative embodiment, the cross-validation method divides the processed data set into mutually exclusive training subsets, generating multiple training-set/test-set pairs, which avoids the errors a single test could introduce. The divided data sets are then evaluated with a random forest: for each training/test pair a contribution rate is obtained for each feature, the contribution rates are averaged, and several correlated features with small average contribution rates are selected. PCA (Principal Component Analysis) is then used to construct a smaller set of mutually uncorrelated new features that replaces those original features, so that the new features reflect the information in the original features to the greatest extent while the information between indices does not overlap. Replacing the selected original features with the newly obtained features yields a new data set that has fewer feature dimensions while ensuring each remaining dimension carries more information.
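The screening-plus-PCA pipeline described above can be sketched as follows. This is an assumed scikit-learn implementation, with synthetic data and arbitrary choices of m = 8 low-importance features and 3 principal components; none of these names or numbers come from the patent.

```python
# Sketch: mutually exclusive folds -> averaged RF importances -> PCA on the
# low-importance features, keeping the high-importance features unchanged.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

importances = []
for train_idx, _ in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    rf = RandomForestClassifier(n_estimators=50, random_state=0)
    rf.fit(X[train_idx], y[train_idx])
    importances.append(rf.feature_importances_)
mean_importance = np.mean(importances, axis=0)

# Replace the m least-important features with a few principal components.
m = 8
low = np.argsort(mean_importance)[:m]
high = np.setdiff1d(np.arange(X.shape[1]), low)
reduced = PCA(n_components=3).fit_transform(X[:, low])
X_new = np.hstack([X[:, high], reduced])   # 20 - 8 + 3 = 15 features
```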
Optionally, performing feature screening on the multiple data sets with the random forest model to obtain the multiple target feature sets comprises: predicting on the multiple data sets with the random forest model to obtain a score value for each original feature in the multiple original feature sets, the score value representing the importance of that feature; obtaining a score mean for each original feature from its score values across the multiple original feature sets; and determining the multiple target feature sets based on the score mean of each original feature.
In an alternative embodiment, a random forest model may be used to predict on the multiple data sets, obtaining the contribution each feature makes on each tree in the forest; the contributions are then averaged and compared across features. The Gini index or the out-of-bag (OOB) error rate is typically used as the evaluation index. By comparing contributions, features with larger contribution values can be kept as features of the target feature set, and features with smaller contribution values can be removed.
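A minimal illustration of these two evaluation indices, assuming scikit-learn (the data are synthetic): `feature_importances_` is the Gini-based mean decrease in impurity, while `oob_score_` gives the out-of-bag accuracy of the fitted forest.

```python
# Gini-based feature contributions and the out-of-bag (OOB) estimate.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=400, n_features=12, random_state=1)
rf = RandomForestClassifier(n_estimators=100, oob_score=True,
                            bootstrap=True, random_state=1)
rf.fit(X, y)

gini_scores = rf.feature_importances_      # one contribution value per feature
oob_accuracy = rf.oob_score_               # out-of-bag accuracy estimate

# Features with larger contribution values would stay in the target set.
keep = gini_scores.argsort()[::-1][:6]
```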
Illustratively, the Gini index may be used to derive the score mean of each original feature. The importance score of a variable is denoted by VIM (Variable Importance Measure). Suppose there are c features x_1, x_2, x_3, ..., x_c; for each feature x_j we compute its Gini importance score VIM_j^{(Gini)}, i.e. the average change in node-splitting impurity caused by feature j across all decision trees.
The Gini index of node m is calculated as:

GI_m = 1 - \sum_{k=1}^{K} p_k^2

where K denotes the number of classes and p_k the sample weight of class k at node m. The importance of feature x_j at node m, i.e. the change in Gini index before and after branching at node m, is:

VIM_{jm}^{(Gini)} = GI_m - GI_l - GI_r

where GI_l and GI_r are the Gini indices of the two new nodes after branching. If M is the set of nodes of decision tree i at which feature x_j appears, then the importance of x_j in the i-th tree is:

VIM_{ij}^{(Gini)} = \sum_{m \in M} VIM_{jm}^{(Gini)}

Assuming that there are n trees, then:

VIM_j^{(Gini)} = \sum_{i=1}^{n} VIM_{ij}^{(Gini)}

Finally, all the obtained importance scores are normalized, as shown in Table 1:

VIM_j = \frac{VIM_j^{(Gini)}}{\sum_{j'=1}^{c} VIM_{j'}^{(Gini)}}

The denominator is the sum of all feature gains and the numerator is the Gini index of feature j.
TABLE 1

| Feature | Mean contribution rate | Feature | Mean contribution rate | Feature | Mean contribution rate |
|---------|------------------------|---------|------------------------|---------|------------------------|
| dur     | 0.06789 | dloss  | 0.0095  | trans_depth       | 0.00208 |
| proto   | 0.0168  | sinpkt | 0.01326 | response_body_len | 0.00425 |
| service | 0.02767 | dinpkt | 0.02406 | ct_srv_src        | 0.02814 |
| state   | 0.01578 | sjit   | 0.00723 | ct_state_ttl      | 0.05433 |
| spkts   | 0.00879 | djit   | 0.00784 | ct_dst_ltm        | 0.01183 |
| dpkts   | 0.04487 | swin   | 0.01366 | ct_src_dport_ltm  | 0.01409 |
| sbytes  | 0.07697 | stcpb  | 0.00479 | ct_dst_sport_ltm  | 0.04049 |
| dbytes  | 0.01584 | dtcpb  | 0.00481 | ct_dst_src_ltm    | 0.09199 |
| rate    | 0.01298 | dwin   | 0.00075 | is_ftp_login      | 0.00014 |
| sttl    | 0.01925 | tcprtt | 0.04222 | ct_ftp_cmd        | 0.0001  |
| dttl    | 0.0884  | synack | 0.02442 | ct_flw_http_mthd  | 0.00231 |
| sload   | 0.01486 | ackdat | 0.01299 | ct_src_ltm        | 0.00852 |
| dload   | 0.01313 | smean  | 0.02917 | ct_srv_dst        | 0.05406 |
| sloss   | 0.02778 | dmean  | 0.04264 | is_sm_ips_ports   | 0.00416 |
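The Gini formulas above can be checked numerically with a small helper; `gini` is a hypothetical function written for this sketch, and the class proportions are invented values.

```python
# Worked numeric check of the Gini-index formulas (illustrative only).
def gini(class_weights):
    """GI = 1 - sum_k p_k**2 for class proportions p_k of a node."""
    return 1.0 - sum(p * p for p in class_weights)

gi_parent = gini([0.5, 0.5])   # maximally impure two-class node
gi_left = gini([0.9, 0.1])     # nearly pure child
gi_right = gini([0.1, 0.9])    # nearly pure child

# VIM_jm = GI_m - GI_l - GI_r: change in impurity from splitting node m,
# following the formula as stated in the text above.
vim_jm = gi_parent - gi_left - gi_right
```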
Optionally, determining the multiple target feature sets based on the score mean of each original feature comprises: sorting the original features in ascending order of score mean; and taking the first preset number of features at the front of the sorted list to obtain the plurality of target features.
The first preset number in the above step may be set by the user, and the plurality of target features are the features on which dimension-reduction processing is to be performed.
In an alternative embodiment, the feature importance score VIM_j may be computed on each of the small data sets described above and averaged, and the m features with the smallest importance scores selected for dimension reduction.
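The ascending sort and selection of the first preset number of features might be implemented as follows; the score values and the preset number are invented for illustration.

```python
# Sort features by score mean in ascending order and take the foremost
# (lowest-scoring) preset number of features for dimension reduction.
import numpy as np

score_mean = np.array([0.068, 0.017, 0.028, 0.0001, 0.054, 0.004])
first_preset_number = 3

order = np.argsort(score_mean)             # ascending order of score mean
target_idx = order[:first_preset_number]   # foremost = smallest scores
```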
Optionally, performing dimension-reduction processing on the multiple target feature sets to obtain the second feature data set comprises: constructing a first matrix from the multiple target feature sets; obtaining the covariance matrix of the first matrix; determining a second matrix based on the covariance matrix; and taking the product of the first matrix and the second matrix to obtain the second feature data set.
In an alternative embodiment, the first matrix may be a matrix x and the second matrix a matrix p. The dimension-reduction process for m-dimensional data is: form the raw data into an m-row, a-column matrix x by columns; zero-average each row of x, i.e. subtract the mean of that row; compute the covariance matrix; compute its eigenvalues and corresponding eigenvectors r; arrange the eigenvectors r into a matrix by rows, from top to bottom in order of decreasing eigenvalue; take the first k rows to form the matrix p; and multiply the matrix formed by the k eigenvectors with the centered data matrix to obtain the data reduced to u dimensions. The compression error can be written as:

error = \frac{\sum_{i=1}^{a} \lVert x^{(i)} - \hat{x}^{(i)} \rVert^2}{\sum_{i=1}^{a} \lVert x^{(i)} \rVert^2}

where u is the number of retained features and \hat{x}^{(i)} is the reconstruction of sample x^{(i)} from those u features. A threshold value x, such as 0.01, is then chosen; if error < x, reducing the dimension to u is considered acceptable. Replacing the original m-dimensional features with the new u-dimensional features finally yields a new data set of Y = (x - m + u) features, i.e. the second feature data set, which is used for intrusion detection.
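One way to pick the reduced dimension u against the error threshold, sketched with scikit-learn's PCA (an assumption; the patent does not prescribe a library), is to keep the smallest u whose retained-variance ratio leaves the compression error below the threshold:

```python
# Choose the smallest u such that the compression error stays below x = 0.01.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 6)) @ rng.normal(size=(6, 6))  # correlated features

pca = PCA().fit(data)
x = 0.01
retained = np.cumsum(pca.explained_variance_ratio_)
u = int(np.searchsorted(retained, 1 - x) + 1)   # first u with error < x

reduced = PCA(n_components=u).fit_transform(data)
```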
Optionally, determining the second matrix based on the covariance matrix comprises: obtaining the eigenvalues and eigenvectors of the covariance matrix; sorting the eigenvectors by eigenvalue to generate a third matrix; and taking a second preset number of leading rows of the third matrix to generate the second matrix.
In an alternative embodiment, the eigenvectors of the covariance matrix may be denoted r, the third matrix may be the matrix q, and the second matrix may be the matrix p. The eigenvalues of the covariance matrix and the corresponding eigenvectors r are obtained; the eigenvectors r are arranged into a matrix q by rows, from top to bottom in order of decreasing eigenvalue; and the first k rows of q are taken to form the matrix p.
Optionally, before acquiring the covariance matrix of the first matrix, the method further comprises: zero-equalizing the first matrix to obtain a fourth matrix; and obtaining a covariance matrix of the fourth matrix.
The zero-averaging in the above steps means that each variable subtracts its mean value. This is in effect a shifting process: after shifting, the center of all the data lies at the origin. Zero-averaging eliminates errors caused by different dimensions, inherent variation, or large differences in magnitude.
In an alternative embodiment, zero-averaging may be performed on each line of data in the first matrix, that is, the average value of the line is subtracted from each line of data to obtain a fourth matrix, and a covariance matrix of the fourth matrix may be obtained.
Optionally, before obtaining the product of the first matrix and the second matrix, the method further comprises: centering the first matrix to obtain a fifth matrix; and obtaining the product of the fifth matrix and the second matrix to obtain a second characteristic data set.
The centering in the above steps has the same effect as the zero-averaging: it eliminates errors caused by different dimensions, inherent variation, or large differences in magnitude.
In an alternative embodiment, zero-averaging may be performed on each data in the first matrix, that is, the average value of all the data is subtracted from each data to obtain a fifth matrix, and the product of the fifth matrix and the second matrix may be obtained to obtain the second feature data set.
Optionally, before feature screening is performed on the multiple sets of data sets by using the random forest model to obtain multiple sets of target feature sets, the method further includes: randomly dividing a plurality of groups of data sets for a plurality of times to obtain a plurality of groups of training sets and test sets; training the random forest model by utilizing a plurality of groups of training sets; testing the trained random forest model by using the test set to obtain the total score of the trained random forest model; it is determined whether training of the random forest model is complete based on the total score.
In an alternative embodiment, the multiple sets of data may be split into multiple small data sets using k-fold cross validation. The k-fold cross validation method randomly divides the sample data into k parts; each time, k−1 parts are selected as the training set and the remaining 1 part serves as the test set. After the first division is completed, k−1 parts can be randomly selected again for training, so that k training data sets and k test data sets are obtained.
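For illustration only, the k-fold division above may be sketched as follows; this index-based helper and its seed are assumptions, not part of the original disclosure:

```python
import numpy as np

def k_fold_split(n_samples, k, seed=0):
    """Randomly divide sample indices into k mutually exclusive parts and
    yield (train_indices, test_indices) pairs, as in the k-fold cross
    validation described above.  Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        # the remaining k-1 parts form the training set
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test
```

Each of the k pairs uses a different part as the test set, and every pair covers the full sample set with no overlap between its training and test indices.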
The process of training the random forest model with the multiple sets of training sets may be as follows: select n samples from the sample set by sampling with replacement as a training set, and generate a decision tree from the sampled set; at each node of the decision tree, randomly select d features without repetition and use them to divide the sample set, finding the optimal dividing feature. The sampling, feature selection and optimal-division steps are repeated m times, where m is the number of decision trees in the random forest. The test samples are then predicted with the trained random forest, and the predicted result is determined by voting.
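For illustration only, the random-forest training just described maps onto scikit-learn's RandomForestClassifier roughly as follows; the library choice, synthetic data set and parameter values are assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic two-class data stands in for a training set.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

forest = RandomForestClassifier(
    n_estimators=50,   # m: number of decision trees in the forest
    max_features=3,    # d: features drawn without repetition at each node
    bootstrap=True,    # each tree sees a sample drawn with replacement
    random_state=0,
)
forest.fit(X, y)
pred = forest.predict(X)   # prediction by majority vote over the 50 trees
```

Each tree is grown on its own bootstrap sample and considers only `max_features` candidate features per split, matching the d-feature division step in the text.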
On {A_2, A_3, A_4, …, A_k}, a random forest model M_1 is built and verified on the data set A_1; the predicted values are compared with the true values, and a score a_1 is calculated under a chosen evaluation standard.

On {A_1, A_3, A_4, …, A_k}, a model is built and verified on the data set A_2; the predicted values are compared with the true values, and a score a_2 is calculated under the same evaluation standard.

On {A_1, A_2, A_3, …, A_{k−1}}, a model is built and verified on the data set A_k; the predicted values are compared with the true values, and a score a_k is calculated under the same evaluation standard.

The mean (a_1 + a_2 + … + a_k) / k is taken as the composite score of the model M_1.

Here A_1, A_2, A_3, …, A_k respectively denote the k data sets obtained by the k-fold cross validation method, and M_1 denotes the trained random forest model. In each of the obtained scores a_1, a_2, …, a_k, each feature has a different importance.
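For illustration only, the per-fold scoring scheme above may be sketched as follows; accuracy stands in for the evaluation standard, and the data set and parameters are assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def composite_score(X, y, k=5, seed=0):
    """Build a random forest on k-1 folds, score it on the held-out
    fold A_i (accuracy as the illustrative evaluation standard), and
    return (a_1 + ... + a_k) / k as the composite score."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = RandomForestClassifier(n_estimators=30, random_state=seed)
        model.fit(X[train], y[train])
        scores.append(model.score(X[folds[i]], y[folds[i]]))  # a_i
    return sum(scores) / k

X, y = make_classification(n_samples=150, n_features=8, random_state=1)
score = composite_score(X, y, k=5)
```

The returned value plays the role of the composite score used to judge whether training of the random forest model is complete.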
Optionally, training the intrusion detection model using the second feature data set, and obtaining the trained intrusion detection model includes: performing misuse detection on the second characteristic data set to obtain a third characteristic data set, wherein characteristic data contained in the third characteristic data set are used for representing non-attack data or normal data; and performing iterative training on the plurality of base classifiers by using an ensemble learning algorithm based on the third feature data set to obtain a trained intrusion detection model.
Misuse detection in the above steps is a method for detecting computer attacks. In misuse detection, known attacks can simply be added to the model; the false alarm rate of the detection is low and the detection efficiency is high.
In an alternative embodiment, misuse detection may be performed on the second feature data set. After the misuse detection there is generally some non-attack data left over; this data may be extracted from the second feature data set to generate a new feature data set, namely the third feature data set.
In an alternative embodiment, the process of performing iterative training on the plurality of base classifiers using the ensemble learning algorithm may be as follows. First, the weight distribution of the training data is initialized:

D_1 = (w_{1,1}, w_{1,2}, …, w_{1,N}), w_{1,i} = 1/N, i = 1, 2, …, N.

Then, for m = 1, 2, …, M, a basic classifier G_m(x) is learned from the training data set with the weight distribution D_m, and the classification error rate of G_m(x) on the training set is calculated:

e_m = Σ_{i=1}^{N} w_{m,i} I(G_m(x_i) ≠ y_i).

The coefficient of G_m(x) is calculated as

a_m = (1/2) log((1 − e_m) / e_m),

and the weight distribution is updated (Z_m is a normalization factor that makes D_{m+1} a probability distribution):

w_{m+1,i} = (w_{m,i} / Z_m) exp(−a_m y_i G_m(x_i)),
Z_m = Σ_{i=1}^{N} w_{m,i} exp(−a_m y_i G_m(x_i)).

Finally, a linear combination of the basis classifiers is constructed to obtain the final classifier:

f(x) = Σ_{m=1}^{M} a_m G_m(x),
G(x) = sign(f(x)).
Finally, the results may be predicted using a final classifier, wherein the final classifier is a trained intrusion detection model.
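For illustration only, the iterative training above can be sketched with scikit-learn's AdaBoostClassifier, which implements the same weight update and weighted linear combination; the library choice, synthetic data set and parameter values are assumptions, not part of the original disclosure:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Synthetic two-class data stands in for the third feature data set.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Each boosting round fits a base classifier G_m on the reweighted data,
# computes its coefficient a_m, and raises the weights of misclassified
# samples, matching the update scheme written out above.
clf = AdaBoostClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)
final_pred = clf.predict(X)  # sign of the weighted combination of the G_m
```

The fitted classifier plays the role of the trained intrusion detection model used to predict results.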
Optionally, performing misuse detection on the second feature data set, and obtaining the third feature data set includes: predicting a plurality of different types of preset models by using the second characteristic data set, and determining the detection rate of the plurality of different types of preset models; determining a preset model corresponding to the maximum detection rate as a target model; performing misuse detection on the second characteristic data set by using the target model to obtain a detection result of the second characteristic data set; and obtaining a third characteristic data set based on the detection result of the second characteristic data set.
Optionally, the plurality of different types of preset models include: decision tree models, support vector machine models, and naive bayes models.
When a naive Bayes model is employed, the classification model samples are assumed to be

(x^(1), y^(1)), (x^(2), y^(2)), …, (x^(m), y^(m)),

i.e. m samples, each sample having n features, with the feature outputs having K categories, defined as C_1, C_2, …, C_K. The naive Bayes prior distribution P(Y = C_k) (k = 1, 2, …, K) is obtained from sample learning, then the conditional probability distribution P(X = x | Y = C_k) = P(X_1 = x_1, X_2 = x_2, …, X_n = x_n | Y = C_k) is learned. The joint distribution P(X, Y) of X and Y is then obtained using the Bayes formula:

P(X, Y = C_k) = P(Y = C_k) P(X = x | Y = C_k)
             = P(Y = C_k) P(X_1 = x_1, X_2 = x_2, …, X_n = x_n | Y = C_k)
             = P(Y = C_k) P(X_1 = x_1 | Y = C_k) P(X_2 = x_2 | Y = C_k) … P(X_n = x_n | Y = C_k).

P(Y = C_k) is estimated by maximum likelihood as the frequency with which category C_k occurs in the training set, and the category corresponding to the maximum conditional probability is found; this is the naive Bayes prediction.
When the support vector machine model is adopted, the classification function

f(x) = sign( Σ_{i=1}^{l} a_i y_i K(x_i, x) + b )

is adopted, where l represents the number of training samples, x represents the vector of the instance to be classified, x_i and y_i represent the attribute vector and class identification of the i-th sample, K(x_i, x) represents a kernel function, and a_i and b represent model parameters. The a_i are obtained through quadratic programming, and w and b are further obtained to give the classification model g(x) = w·x + b. When g(x) > 0 and g(x) < 0, x belongs to the respective different categories, and the plane with the largest distance from the two categories of objects is selected.
When a decision tree model is used, attributes are selected according to the Gini index, the information gain or the information gain ratio, and branches are built downward according to the attributes until all samples on a node belong to the same class, or the number of samples in a node falls below a given value. Overfitting is prevented by pre-pruning, post-pruning or a combination of the two to obtain the final model.
In an alternative embodiment, the three obtained models are used to predict the second feature data set, and the model with the highest test detection rate is selected as the target model, reducing the rate of missed reports in the detection process. The non-attack and normal data from this process are then extracted as a new data set, namely the third feature data set.
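For illustration only, the selection of the target model among the three preset models may be sketched as follows; held-out accuracy stands in for the detection rate, and the data set and library choice are assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for the second feature data set.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

candidates = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "svm": SVC(),
    "naive_bayes": GaussianNB(),
}
# Fit each preset model and record its detection rate on the held-out split.
rates = {name: model.fit(X_tr, y_tr).score(X_te, y_te)
         for name, model in candidates.items()}
target_name = max(rates, key=rates.get)  # model with the maximum rate
target_model = candidates[target_name]
```

The selected `target_model` would then perform the misuse detection on the second feature data set.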
Optionally, after acquiring the first feature data set, the method further comprises: and formatting the first characteristic data set to obtain a processed first characteristic data set, wherein the types of variables contained in the processed first characteristic data set are the same.
In an alternative embodiment, the formatting process may be a digitizing process. Since the first feature data set contains both numeric and character-type variables, the first feature data set needs to be uniformly digitized to facilitate its subsequent processing.
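For illustration only, the digitizing step may be sketched as follows; the column names are hypothetical, loosely styled after common intrusion-detection features, and pandas is an assumed library choice:

```python
import pandas as pd

# Hypothetical slice of a first feature data set mixing numeric and
# character-type variables.
raw = pd.DataFrame({
    "duration": [0, 2, 1],
    "protocol_type": ["tcp", "udp", "tcp"],  # character-type variable
    "flag": ["SF", "S0", "SF"],              # character-type variable
})

digitized = raw.copy()
for col in digitized.columns:
    if digitized[col].dtype == object:
        # map each distinct string to an integer code, in order of appearance
        digitized[col] = pd.factorize(digitized[col])[0]
```

After this pass, every variable in the data set shares a numeric type, as the formatting step requires.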
The left side of Table 2 shows the accuracy of several machine learning algorithms used for misuse detection after feature processing; by comparison, decision trees can be selected for misuse detection. The right side of Table 2 shows the accuracy without feature processing. Using decision trees for misuse detection after feature processing, together with the accuracy of the test set after a further round of detection using the ensemble algorithm, the accuracy of the intrusion detection method in this embodiment can be obtained by combining the experimental results.
TABLE 2
[Table 2: accuracy of the algorithms with feature processing (left) and without feature processing (right); rendered as an image in the original publication.]
A preferred embodiment of the present invention will be described in detail with reference to fig. 2 to 3. As shown in fig. 2, the method may include the following steps:
Step S201, dividing a data set to obtain a plurality of mutually exclusive training test sets;
Alternatively, the data set may first be uniformly digitized for subsequent use.
Alternatively, the processed data set can be divided into mutually exclusive subsets by cross validation, generating multiple groups of training sets and test sets, so that errors caused by a single test can be avoided.
As shown in fig. 3, the data set is divided into a training test set 1, a training test set 2, and a training test set N.
Step S202, testing is carried out on the divided data sets using random forests, and a set of feature contribution rates is obtained for each training test set;
as shown in fig. 3, a random forest 1 is used to test training test set 1, obtaining the contribution-rate ranking of feature 1 to feature x in training test set 1; a random forest 2 is used to test training test set 2, obtaining the contribution-rate ranking of feature 1 to feature x in training test set 2; and a random forest N is used to test training test set N, obtaining the contribution-rate ranking of feature 1 to feature x in training test set N.
Step S203, averaging the feature contribution rates, and selecting several correlated features with smaller average contribution rates as the target features;
As shown in fig. 3, the feature contribution rates of each training test set are averaged, and several correlated features with smaller average contribution rates are selected; the features in the first m rows of each training test set's feature ranking may be selected as the target features.
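For illustration only, steps S201 to S203 can be sketched with scikit-learn, where `feature_importances_` plays the role of the contribution rate; the data set, fold count and the value of m are assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold

# Synthetic data stands in for the first feature data set.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

importances = []
for train, _ in KFold(n_splits=4, shuffle=True, random_state=0).split(X):
    rf = RandomForestClassifier(n_estimators=30, random_state=0)
    rf.fit(X[train], y[train])
    importances.append(rf.feature_importances_)  # per-split contribution rates

# Average each feature's contribution rate over the splits, then take the
# m features at the front of the ascending ranking as targets for PCA.
mean_importance = np.mean(importances, axis=0)
m = 3
target_features = np.argsort(mean_importance)[:m]
```

The remaining (higher-contribution) features would be kept as-is, while the `target_features` are recombined by PCA in step S204.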
Step S204, PCA is used to recombine the selected features into a smaller set of mutually uncorrelated new comprehensive indexes to replace the original indexes, and the newly obtained multidimensional features replace the original multidimensional features to obtain a new data set;
as shown in fig. 3, feature 1 through feature Y in each training test set may be combined using PCA to obtain a new data set.
The new data set reduces feature dimensions and ensures that each dimension feature contains more information.
Step S205, performing misuse detection based on the new data set;
when selecting the algorithm, several algorithms, such as decision trees, support vector machines and naive Bayes, can each be tested, and the algorithm with the highest test accuracy is finally selected. This ensures that the first-layer detection has a lower missed-report rate.
As shown in fig. 3, decision trees, support vector machines and naive Bayes may each be used to test the new data sets; the result with the highest accuracy is added to the intrusion rule base, and misuse detection is performed based on the data set.
Step S206, after misuse test, extracting non-attack data and normal data in the characteristic rule base to be used as a new data set;
step S207, performing anomaly detection based on the data set after misuse test to obtain a strong classifier;
optionally, the anomaly detection trains a plurality of base classifiers based on the data set obtained after misuse detection, iterates the base classifiers using the AdaBoost (Adaptive Boosting) algorithm among the ensemble learning algorithms, gives a higher weight in each round to misclassified samples, and finally combines them into a strong classifier to perform anomaly detection.
And step S208, performing intrusion detection by using a strong classifier, and extracting non-attack data and normal data in the characteristic rule base to be used as a final data set.
As shown in fig. 3, after anomaly detection, some attack data which is not in the intrusion rule base can be extracted to combine with normal data to form a new data set.
Example 2
According to the embodiment of the present invention, there is further provided an intrusion detection device, which can execute the intrusion detection method in the above embodiment, and the specific implementation manner and the preferred application scenario are the same as those of the above embodiment, and are not described herein.
Fig. 4 is a schematic diagram of an intrusion detection device according to an embodiment of the present invention, as shown in fig. 4, the device includes:
an acquisition module 42 for acquiring a first feature data set;
a processing module 44, configured to perform a dimension reduction process on the first feature data set to obtain a second feature data set, where a dimension of the second feature data set is smaller than that of the first feature data set;
the training module 46 is configured to train the intrusion detection model by using the second feature data set to obtain a trained intrusion detection model, where the trained intrusion detection model is used for intrusion detection on data to be detected.
Optionally, the processing module includes: the dividing unit is used for dividing the first characteristic data set by using a cross verification method to generate a plurality of groups of data sets, wherein any two groups of data sets have a mutual exclusion relation; the screening unit is used for carrying out feature screening on a plurality of groups of data sets by utilizing a random forest model to obtain a plurality of groups of target feature sets, wherein each group of target feature sets comprises: a plurality of target features; and the processing unit is used for performing dimension reduction processing on the multiple groups of target feature sets to obtain a second feature data set.
Optionally, the screening unit comprises: the prediction subunit is used for predicting the plurality of groups of data sets by utilizing the random forest model to obtain a grading value of each original feature contained in the plurality of groups of original feature sets, wherein the grading value is used for representing the importance degree of each original feature; the first acquisition subunit is used for obtaining the score mean value of each original feature based on the score value of each original feature contained in the plurality of groups of original feature sets; a first determining subunit, configured to determine a plurality of target feature sets based on the score average of each original feature.
Optionally, the determining subunit further sorts the plurality of original features in ascending order according to the score average of each original feature, and obtains a first preset number of original features from the front of the sorted features, so as to obtain the plurality of target features.
Optionally, the processing unit comprises: a construction subunit configured to construct a first matrix based on the plurality of sets of target feature sets; a second obtaining subunit, configured to obtain a covariance matrix of the first matrix; a second determination subunit configured to determine a second matrix based on the covariance matrix; the second obtaining subunit is further configured to obtain a product of the first matrix and the second matrix, to obtain a second feature data set.
Optionally, the second determining subunit is further configured to obtain the eigenvalues and eigenvectors of the covariance matrix, sort the eigenvectors according to the magnitudes of the eigenvalues to generate a third matrix, and take a second preset number of rows from the front of the third matrix to generate the second matrix.
Optionally, the processing unit comprises: the first processing subunit is used for carrying out zero-mean processing on the first matrix to obtain a fourth matrix; and the third acquisition subunit is used for acquiring the covariance matrix of the fourth matrix.
Optionally, the processing unit comprises: the second processing subunit is used for carrying out centering processing on the first matrix to obtain a fifth matrix; and the fourth acquisition subunit is used for acquiring the product of the fifth matrix and the second matrix to obtain a second characteristic data set.
Optionally, the processing module includes: the dividing unit is also used for randomly dividing the multiple groups of data sets for multiple times to obtain multiple groups of training sets and test sets; the first training unit is used for training the random forest model by utilizing a plurality of groups of training sets; the test unit is used for testing the trained random forest model by using the test set to obtain the total score of the trained random forest model; and the determining unit is used for determining whether training of the random forest model is completed or not based on the total score.
Optionally, the training module includes: the detection unit is used for carrying out misuse detection on the second characteristic data set to obtain a third characteristic data set, wherein characteristic data contained in the third characteristic data set are used for representing non-attack data or normal data; and the second training unit is used for carrying out iterative training on the plurality of base classifiers by utilizing an integrated learning algorithm based on the third characteristic data set to obtain a trained intrusion detection model.
Optionally, the detection unit includes: the prediction subunit is used for predicting a plurality of different types of preset models by using the second characteristic data set and determining the detection rate of the plurality of different types of preset models; the third determining subunit is used for determining a preset model corresponding to the maximum detection rate as a target model; the detection subunit is used for carrying out misuse detection on the second characteristic data set by utilizing the target model to obtain a detection result of the second characteristic data set; and a fifth obtaining subunit, configured to obtain a third feature data set based on the detection result of the second feature data set.
Optionally, the plurality of different types of preset models in the detection unit include: decision tree models, support vector machine models, and naive bayes models.
Optionally, the processing module is further configured to perform formatting processing on the first feature data set to obtain a processed first feature data set, where the types of variables included in the processed first feature data set are the same.
Example 3
According to an embodiment of the present invention, there is also provided a computer-readable storage medium, including a stored program, where the device in which the computer-readable storage medium is controlled to execute the intrusion detection method in embodiment 1 described above when the program runs.
Example 4
According to an embodiment of the present invention, there is also provided a processor for running a program, where the program executes the intrusion detection method in embodiment 1.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
In the foregoing embodiments of the present invention, the descriptions of the embodiments are emphasized, and for a portion of this disclosure that is not described in detail in this embodiment, reference is made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technology content may be implemented in other manners. The above-described embodiments of the apparatus are merely exemplary, and the division of the units, for example, may be a logic function division, and may be implemented in another manner, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some interfaces, units or modules, or may be in electrical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied essentially or in part or all of the technical solution or in part in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a removable hard disk, a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The foregoing is merely a preferred embodiment of the present invention and it should be noted that modifications and adaptations to those skilled in the art may be made without departing from the principles of the present invention, which are intended to be comprehended within the scope of the present invention.

Claims (14)

1. An intrusion detection method, comprising:
acquiring a first characteristic data set;
performing dimension reduction processing on the first characteristic data set to obtain a second characteristic data set, wherein the dimension of the second characteristic data set is smaller than that of the first characteristic data set;
training an intrusion detection model by using the second characteristic data set to obtain a trained intrusion detection model, wherein the trained intrusion detection model is used for intrusion detection of data to be detected;
the step of performing dimension reduction processing on the first characteristic data set to obtain a second characteristic data set comprises the following steps: dividing the first characteristic data set by using a cross verification method to generate a plurality of groups of data sets, wherein any two groups of data sets have a mutual exclusion relation; and performing feature screening on the multiple groups of data sets by using a random forest model to obtain multiple groups of target feature sets, wherein each group of target feature sets comprises: a plurality of target features; performing dimension reduction on the multiple groups of target feature sets to obtain a second feature data set;
The training the intrusion detection model by using the second feature data set, and obtaining the trained intrusion detection model includes: performing misuse detection on the second characteristic data set by using a plurality of different types of preset models to obtain a third characteristic data set, wherein characteristic data contained in the third characteristic data set are used for representing non-attack data or normal data; and performing iterative training on the plurality of base classifiers by using an ensemble learning algorithm based on the third feature data set to obtain the trained intrusion detection model.
2. The method of claim 1, wherein feature screening the plurality of sets of data using the random forest model to obtain a plurality of sets of target features comprises:
predicting the multiple groups of data sets by utilizing the random forest model to obtain a grading value of each original feature contained in the multiple groups of original feature sets, wherein the grading value is used for representing the importance degree of each original feature;
obtaining a scoring mean value of each original feature based on scoring values of each original feature contained in the plurality of groups of original feature sets;
and determining the multiple target feature sets based on the scoring mean of each original feature.
3. The method of claim 2, wherein determining the plurality of sets of target features based on the scored mean for each original feature comprises:
according to the grading average value of each original feature, ascending order is carried out on the plurality of original features;
and acquiring a first preset number of original features at the forefront in the sorted plurality of original features, and obtaining the plurality of target features.
4. The method of claim 1, wherein performing a dimension reduction process on the plurality of sets of target feature sets to obtain the second feature data set comprises:
constructing a first matrix based on the multiple sets of target feature sets;
acquiring a covariance matrix of the first matrix;
determining a second matrix based on the covariance matrix;
and obtaining the product of the first matrix and the second matrix to obtain the second characteristic data set.
5. The method of claim 4, wherein determining a second matrix based on the covariance matrix comprises:
acquiring eigenvalues and eigenvectors of the covariance matrix;
sorting the feature vectors according to the magnitude of the feature values to generate a third matrix;
and acquiring a second preset number of forefront row matrixes in the third matrix, and generating the second matrix.
6. The method of claim 4, wherein prior to obtaining the covariance matrix of the first matrix, the method further comprises:
zero-equalizing the first matrix to obtain a fourth matrix;
and acquiring the covariance matrix of the fourth matrix.
7. The method of claim 4, wherein prior to obtaining the product of the first matrix and the second matrix to obtain the second feature data set, the method further comprises:
centering the first matrix to obtain a fifth matrix;
and obtaining the product of the fifth matrix and the second matrix to obtain the second characteristic data set.
8. The method of claim 1, wherein prior to feature screening the plurality of sets of data using the random forest model to obtain a plurality of sets of target features, the method further comprises:
randomly dividing the multiple groups of data sets for multiple times to obtain multiple groups of training sets and test sets;
training the random forest model by utilizing the multiple groups of training sets;
testing the trained random forest model by using the test set to obtain the total score of the trained random forest model;
Determining whether training of the random forest model is complete based on the total score.
9. The method of claim 1, wherein misuse detection of the second feature data set to obtain a third feature data set comprises:
predicting a plurality of different types of preset models by using the second characteristic data set, and determining the detection rate of the plurality of different types of preset models;
determining a preset model corresponding to the maximum detection rate as a target model;
performing misuse detection on the second characteristic data set by using the target model to obtain a detection result of the second characteristic data set;
and obtaining the third characteristic data set based on the detection result of the second characteristic data set.
10. The method of claim 9, wherein the plurality of different types of preset models comprises: decision tree models, support vector machine models, and naive bayes models.
11. The method of claim 1, wherein after acquiring the first feature data set, the method further comprises:
and formatting the first characteristic data set to obtain a processed first characteristic data set, wherein the types of variables contained in the processed first characteristic data set are the same.
12. An intrusion detection device, comprising:
an acquisition module configured to acquire a first feature data set;
a processing module configured to perform dimension reduction processing on the first feature data set to obtain a second feature data set, wherein the dimension of the second feature data set is smaller than that of the first feature data set; and
a training module configured to train an intrusion detection model using the second feature data set to obtain a trained intrusion detection model, wherein the trained intrusion detection model is used for intrusion detection of data to be detected;
wherein the processing module performing dimension reduction processing on the first feature data set to obtain the second feature data set comprises: dividing the first feature data set using a cross-validation method to generate a plurality of sets of data, wherein any two sets of data are mutually exclusive; performing feature screening on the plurality of sets of data using a random forest model to obtain a plurality of sets of target features, wherein each set of target features comprises a plurality of target features; and performing dimension reduction on the plurality of sets of target features to obtain the second feature data set; and
wherein the training module comprises: a detection unit configured to perform misuse detection on the second feature data set using a plurality of preset models of different types to obtain a third feature data set, wherein feature data contained in the third feature data set represent non-attack data, or normal data; and a second training unit configured to iteratively train a plurality of base classifiers using an ensemble learning algorithm based on the third feature data set, to obtain the trained intrusion detection model.
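The second training unit's "iterative training of a plurality of base classifiers using an ensemble learning algorithm" can be illustrated with boosting, where base classifiers are fitted one after another on reweighted data. The choice of AdaBoost here is an assumption for illustration; the claim does not name a specific ensemble algorithm.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Stand-in for the third feature data set and its labels.
X3, y3 = make_classification(n_samples=300, n_features=10, random_state=2)

# AdaBoost fits up to n_estimators base classifiers iteratively,
# reweighting the training samples after each round.
detector = AdaBoostClassifier(n_estimators=50, random_state=2)
detector.fit(X3, y3)

trained_bases = len(detector.estimators_)   # number of base classifiers fitted
```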
13. A computer-readable storage medium comprising a stored program, wherein, when run, the program controls a device in which the computer-readable storage medium is located to perform the intrusion detection method according to any one of claims 1 to 11.
14. A processor configured to run a program, wherein the program, when run, performs the intrusion detection method according to any one of claims 1 to 11.
CN202011248506.6A 2020-11-10 2020-11-10 Intrusion detection method and device Active CN112437053B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011248506.6A CN112437053B (en) 2020-11-10 2020-11-10 Intrusion detection method and device


Publications (2)

Publication Number Publication Date
CN112437053A CN112437053A (en) 2021-03-02
CN112437053B true CN112437053B (en) 2023-06-30

Family

ID=74699400

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011248506.6A Active CN112437053B (en) 2020-11-10 2020-11-10 Intrusion detection method and device

Country Status (1)

Country Link
CN (1) CN112437053B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113645182B (en) * 2021-06-21 2023-07-14 上海电力大学 Denial of service attack random forest detection method based on secondary feature screening
CN113542276B (en) * 2021-07-16 2023-01-24 江苏商贸职业学院 Method and system for detecting intrusion target of hybrid network
CN113836527B (en) * 2021-11-23 2022-02-18 北京微步在线科技有限公司 Intrusion event detection model construction method and device and intrusion event detection method

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101399672A (en) * 2008-10-17 2009-04-01 章毅 Intrusion detection method based on fusion of multiple neural networks
CN106878995A (en) * 2017-04-27 2017-06-20 重庆邮电大学 Wireless sensor network anomaly type discrimination method based on sensed data
CN106951778A (en) * 2017-03-13 2017-07-14 步步高电子商务有限责任公司 Intrusion detection method oriented to complex streaming data event analysis
CN107395590A (en) * 2017-07-19 2017-11-24 福州大学 Intrusion detection method based on PCA and random forest classification
CN108712404A (en) * 2018-05-04 2018-10-26 重庆邮电大学 Internet of Things intrusion detection method based on machine learning
CN109818798A (en) * 2019-02-19 2019-05-28 上海海事大学 Wireless sensor network intrusion detection system and method fusing KPCA and ELM
CN110809009A (en) * 2019-12-12 2020-02-18 江苏亨通工控安全研究院有限公司 Two-stage intrusion detection system applied to industrial control networks
CN110825068A (en) * 2019-09-29 2020-02-21 惠州蓄能发电有限公司 Industrial control system anomaly detection method based on PCA-CNN

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10367841B2 (en) * 2016-12-16 2019-07-30 Patternex, Inc. Method and system for learning representations for log data in cybersecurity


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Application of PCA-LSTM in network intrusion detection; Zhang Baohua et al.; Value Engineering; 2020-05-28 (No. 15); full text *
Research on an intrusion detection algorithm based on PCA and random forest classification; Lin Weining et al.; Netinfo Security; 2017-11-10 (No. 11); full text *
Network intrusion detection model based on random forest and XGBoost; Chen Zhuo et al.; Journal of Signal Processing; 2020-07-31 (No. 07); pages 2-5 *

Also Published As

Publication number Publication date
CN112437053A (en) 2021-03-02

Similar Documents

Publication Publication Date Title
CN112437053B (en) Intrusion detection method and device
CN113657545B (en) User service data processing method, device, equipment and storage medium
CN111027069B (en) Malicious software family detection method, storage medium and computing device
CN110704840A (en) Convolutional neural network CNN-based malicious software detection method
US7444279B2 (en) Question answering system and question answering processing method
CN112464638B (en) Text clustering method based on improved spectral clustering algorithm
US8738534B2 (en) Method for providing with a score an object, and decision-support system
CN106060008B Network intrusion anomaly detection method
CN109657011B (en) Data mining system for screening terrorist attack event crime groups
CN111695597B (en) Credit fraud group identification method and system based on improved isolated forest algorithm
CN109190698B (en) Classification and identification system and method for network digital virtual assets
CN115577357A (en) Android malicious software detection method based on stacking integration technology
CN112183652A (en) Edge end bias detection method under federated machine learning environment
CN110929525A (en) Network loan risk behavior analysis and detection method, device, equipment and storage medium
CN115600194A (en) Intrusion detection method, storage medium and device based on XGboost and LGBM
Utami et al. Hoax information detection system using apriori algorithm and random forest algorithm in twitter
Sitorus et al. Sensing trending topics in twitter for greater Jakarta area
CN113807073B (en) Text content anomaly detection method, device and storage medium
CN117035983A (en) Method and device for determining credit risk level, storage medium and electronic equipment
CN117408699A (en) Telecom fraud recognition method based on bank card data
CN115936773A (en) Internet financial black product identification method and system
Wålinder Evaluation of logistic regression and random forest classification based on prediction accuracy and metadata analysis
CN112529319A (en) Grading method and device based on multi-dimensional features, computer equipment and storage medium
CN111581640A (en) Malicious software detection method, device and equipment and storage medium
Holm Machine learning and spending patterns: A study on the possibility of identifying riskily spending behaviour

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant