CN115766094A - Data processing method, network intrusion detection device, medium and controller - Google Patents


Info

Publication number
CN115766094A
Authority
CN
China
Prior art keywords
data
dtest
training
preset
dtrain
Prior art date
Legal status
Pending
Application number
CN202211291677.6A
Other languages
Chinese (zh)
Inventor
刘创
崔毅
孙林
陆唯佳
刘鹏
罗勇
沈文枫
Current Assignee
United Automotive Electronic Systems Co Ltd
Original Assignee
United Automotive Electronic Systems Co Ltd
Priority date
Filing date
Publication date
Application filed by United Automotive Electronic Systems Co Ltd filed Critical United Automotive Electronic Systems Co Ltd
Priority to CN202211291677.6A
Publication of CN115766094A
Legal status: Pending

Landscapes

  • Complex Calculations (AREA)

Abstract

The invention belongs to the technical field of artificial intelligence and specifically relates to a data processing method, a network intrusion detection device, a medium and a controller. The disclosed method and products combine the capabilities of supervised and unsupervised learning: abnormal samples already present in the training data can be regarded as generated samples and used in the adversarial training process, which improves the recall rate for abnormal samples during discrimination. Alternatively, an energy-based generative adversarial network EBGAN (Energy-Based Generative Adversarial Network) is trained on a normal data set, its discriminator D (Discriminator) is used as an energy function, an energy value below a preset energy threshold is assigned to a first data density region and an energy value above the preset energy threshold is assigned to the other regions, and abnormal data are then detected according to this partition when the data to be discriminated are processed. The invention is particularly suitable for intrusion detection with imbalanced data, and indicators such as detection precision, recall rate and f1 score are also improved.

Description

Data processing method, network intrusion detection device, medium and controller
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to a data processing method, a network intrusion detection device, a medium and a controller.
Background
Intrusion detection is an important part of ensuring network information security; it is commonly used to actively monitor and defend against malicious attacks and provides decision support for related management processes. Whether an intrusion detection system performs well or poorly significantly affects indicators such as the anomaly detection rate and false alarm rate of the network security system.
The intrusion detection problem can be treated as a classification problem in artificial intelligence, and the available methods generally fall into two categories: supervised learning and unsupervised learning.
Supervised learning methods include the support vector machine (SVM) method and the K-nearest neighbor (KNN) method. In intrusion detection, intrusion data is difficult to obtain, so normal and intrusion samples are imbalanced, which is unfavorable for applying these methods.
Among unsupervised learning methods, the local outlier factor (LOF) method and the isolation forest (IF) method, which are based on probability-density outlier detection, perform better when samples are imbalanced, but they can still misjudge normal samples that have never occurred before and deviate from the central region; moreover, the high complexity and dimensionality of network traffic data limit the feature expression capability of these methods.
Methods based on the generative adversarial network (GAN) can perform unsupervised learning of complex, high-dimensional data distributions and can alleviate the sample imbalance problem to a certain extent; an example is the anomaly-detection generative adversarial network AnoGAN. However, these methods have poor real-time performance and often struggle to meet the real-time requirements of intrusion detection.
Disclosure of Invention
The invention discloses a data processing method comprising a first preprocessing and training step and a second self-encoding and reconstruction step. In the first preprocessing and training step, a preset number of first-class random vectors are randomly drawn from a first sample space Pz and a first training set Dtrain is generated, where the data z in the first sample space Pz correspond to a known distribution or belong to network traffic data conforming to a preset security policy; the energy-based generative adversarial network EBGAN is trained with the data G(z) of the first training set Dtrain, and the distribution of the first training set Dtrain in the data space is obtained; based on the learned data characteristics, the information to be detected can be classified and identified.
Further, the second self-encoding and reconstruction step acquires a second test data set Dtest, which comprises the abnormal data to be detected; an anomaly score D(x) of the second test data set Dtest is reconstructed by the self-encoder of the generative adversarial network EBGAN, the anomaly score D(x) being a preset statistic of the first training set Dtrain or the second test data set Dtest; if the anomaly score D(x) falls within a preset range, it is determined that abnormal data exists in the second test data set Dtest.
Further, the first training set Dtrain and the second test data set Dtest data may be stored for subsequent processing; the preset statistic may be the mean square error (MSE), and if the result of the mean square error processing is larger than a preset threshold Φ, it is determined that abnormal data exists in the second test data set Dtest.
Specifically, during the training of the generative adversarial network EBGAN, the network obtains the characteristics of its first sample space Pz by approaching the extremum of a preset loss function L; with the discriminator D of the EBGAN used as an energy function, an energy value E below a preset energy threshold is assigned to a first data density region XX, and an energy value E above the preset energy threshold is assigned to a second data density region ZZ outside the first data density region XX.
Further, the threshold Φ may be the maximum anomaly score CMax or C%; the maximum anomaly score CMax is the maximum value of the anomaly score D(x) over the samples in the first training set Dtrain, and C% is a preset percentage of the maximum anomaly score CMax.
Specifically, the first training set Dtrain and the second test data set Dtest can be stored in vector form and written as:
Dtrain = {(x1, y1), (x2, y2), …, (xi, yi), …, (xm, ym)}, where m is a positive integer and yi = 0;
Dtest = {(x1, y1), (x2, y2), …, (xj, yj), …, (xn, yn)}, where n is a positive integer; an element with yj = 0 is normal data and an element with yj = 1 is abnormal data.
Model training can also be carried out on the first training set Dtrain, and model evaluation on the second test data set Dtest; the anomaly score D(x) can be given by:
D(x) = MSE(x, Dec(Enc(x))), where Dec() denotes the decoding process, Enc() denotes the encoding process, and MSE() denotes the mean square error processing.
The discriminator loss function LD(x, z) and the generator loss function LG(z) may also be constructed such that:
LD(x, z) = D(x) + max(0, m - D(G(z))),
LG(z) = D(G(z)),
where m is greater than zero, x is the data sample (110), and G(z) is the sample generated by the generator.
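For illustration only, a minimal sketch of these two loss terms is given below, assuming a PyTorch setting in which d_real = D(x) and d_fake = D(G(z)) are per-sample energies produced by the autoencoder discriminator; the default margin value m is an illustrative assumption, not a value specified in the patent.

```python
import torch

def discriminator_loss(d_real: torch.Tensor, d_fake: torch.Tensor, m: float = 10.0) -> torch.Tensor:
    # LD(x, z) = D(x) + max(0, m - D(G(z))), computed per sample
    return d_real + torch.clamp(m - d_fake, min=0.0)

def generator_loss(d_fake: torch.Tensor) -> torch.Tensor:
    # LG(z) = D(G(z)): the generator is pushed toward low-energy (normal-looking) samples
    return d_fake
```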
Furthermore, the data processing method may also comprise a third normalization and stabilization step, in which spectral normalization is applied to the self-encoder of the generative adversarial network EBGAN to improve the stability of the discrimination process.
The embodiment of the invention also discloses an intrusion detection device comprising a first preprocessing and training unit and a second self-encoding and reconstruction unit. The first preprocessing and training unit randomly extracts a preset number of first-class random vectors from the first sample space Pz and generates the first training set Dtrain, where the data z in the first sample space Pz correspond to a known distribution or belong to network traffic data conforming to a preset security policy; the energy-based generative adversarial network EBGAN is trained with the data G(z) of the first training set Dtrain, and the distribution of the first training set Dtrain in its data space is obtained.
Further, the second self-encoding and reconstruction unit acquires the second test data set Dtest, which comprises the abnormal data to be detected; an anomaly score D(x) of the second test data set Dtest is reconstructed by the self-encoder of the generative adversarial network EBGAN, the anomaly score D(x) being a preset statistic of the first training set Dtrain or the second test data set Dtest; if the anomaly score D(x) falls within a preset range, it is determined that abnormal data exists in the second test data set Dtest.
Specifically, the first training set Dtrain and the second test data set Dtest data may be stored for subsequent processing; the preset statistic may be the result of the mean square error processing, and if that result is greater than a preset threshold Φ, it is determined that abnormal data exists in the second test data set Dtest.
In the training process of the generative adversarial network EBGAN, the characteristics of the first sample space Pz are obtained by approaching the extremum of a preset loss function L; with the discriminator D of the EBGAN used as an energy function, an energy value E below a preset energy threshold is assigned to the first data density region XX, and an energy value E above the preset energy threshold is assigned to the second data density region ZZ outside the first data density region XX.
Specifically, the threshold Φ may be the maximum anomaly score CMax or C%; the maximum anomaly score CMax is the maximum value of the anomaly scores D(x) of all samples in the first training set Dtrain, and C% is a preset percentage of the maximum anomaly score CMax.
Further, the first training set Dtrain and the second test data set Dtest may be stored in vector form and written as:
Dtrain = {(x1, y1), (x2, y2), …, (xi, yi), …, (xm, ym)}, where m is a positive integer and yi = 0;
Dtest = {(x1, y1), (x2, y2), …, (xj, yj), …, (xn, yn)}, where n is a positive integer; an element with yj = 0 is normal data and an element with yj = 1 is abnormal data.
Model training may be performed on the first training set Dtrain, and model evaluation on the second test data set Dtest; the anomaly score D(x) can be given by:
D(x) = MSE(x, Dec(Enc(x))), where Dec() denotes the decoding process, Enc() denotes the encoding process, and MSE() denotes the mean square error processing.
Further, the discriminator loss function LD(x, z) and the generator loss function LG(z) may be constructed such that:
LD(x, z) = D(x) + max(0, m - D(G(z))),
LG(z) = D(G(z)),
where m is greater than zero, x is the data sample (110), and G(z) is the sample generated by the generator.
Similarly, the intrusion detection device may further include a third normalization and stabilization unit that applies spectral normalization to the self-encoder of the generative adversarial network EBGAN.
In addition, the embodiment of the invention also discloses a computer storage medium comprising a storage medium body for storing a computer program; when executed by a microprocessor, the computer program can implement any of the data processing methods described above. Further disclosed is a controller comprising any of the intrusion detection devices above and/or any of the computer storage media (903) described above.
The method and product disclosed by the embodiments of the invention combine the capabilities of supervised and unsupervised learning: abnormal samples already present in the training data can be regarded as generated samples and used in the adversarial training process, which improves the recall rate for abnormal samples during discrimination. Alternatively, an energy-based generative adversarial network EBGAN (Energy-Based Generative Adversarial Network) is trained on a normal data set, its discriminator D (Discriminator) is used as an energy function, an energy value below a preset energy threshold is assigned to a first data density region and an energy value above the preset energy threshold is assigned to the other regions, and abnormal data are then detected according to this partition when the data to be discriminated are processed. The invention is particularly suitable for intrusion detection with imbalanced data, and indicators such as detection precision, recall rate and f1 score are also improved.
It should be noted that terms such as "first" and "second" are used herein only to describe the components of the technical solution; they neither limit the solution nor indicate or suggest the importance of the corresponding component. A component qualified by "first", "second" and the like means that, in the corresponding embodiment, at least one such component is included.
Drawings
To more clearly illustrate the technical solutions of the present invention and facilitate a further understanding of its technical effects, technical features and objects, the present invention is described in detail with reference to the accompanying drawings, which form an essential part of the specification and, together with the embodiments, serve to explain the technical solutions without limiting the invention.
The same reference numerals in the drawings denote the same elements, and in particular:
FIG. 1 is a schematic structural diagram of the EBGAN self-encoding model in the embodiments of the method and product of the present invention.
FIG. 2 is a diagram illustrating the SN normalization effect comparison in the embodiments of the method and the product of the present invention.
FIG. 3 shows the t-SNE visual clustering result of the reconstructed samples according to the embodiment of the product of the present invention.
FIG. 4 is a comparison of the accuracy, recall, and f1 score of the method and product embodiments of the present invention and other related algorithms.
FIG. 5 is a comparison of accuracy, recall and f1 score before and after spectral normalization (SN) in the embodiments of the method and product of the present invention.
FIG. 6 is a schematic diagram of a composition structure of an embodiment of the method of the present invention.
Fig. 7 is a schematic structural diagram of an intrusion detection device according to an embodiment of the present invention.
Fig. 8 is a first schematic structural diagram of an embodiment of the present invention.
Fig. 9 is a schematic structural diagram of a second embodiment of the present invention.
Fig. 10 is a third schematic structural diagram of an embodiment of the present invention.
Wherein:
100-a first preprocessing and training step;
110-data sample x;
120-data z;
121-data G (z);
200-a second self-encoding and reconstruction step;
210-an encoding process;
220-a decoding process;
230-mean square error processing;
240-energy value;
300-a third normalization and stabilization step;
401-comparing the processing precision, the recall rate and the f1 score of the embodiment of the invention;
402-effect of spectral normalization SN processing;
500-f1 score comparison;
501-curves using the SN method;
502-curves without SN method;
600-a network intrusion detection device;
700-t-distributed stochastic neighbor embedding (t-SNE) visualization graph;
900-smart vehicle or web application system;
901-controller;
903-computer storage medium.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. Of course, the following specific examples are provided only for explaining the technical solutions of the present invention, and are not intended to limit the present invention. In addition, the portions shown in the embodiments or the drawings are only illustrations of the relevant portions of the present invention, and are not all of the present invention.
The data processing method shown in fig. 1 and fig. 6 includes a first preprocessing and training step 100 and a second self-encoding and reconstruction step 200. The first preprocessing and training step 100 randomly extracts a preset number of first-class random vectors from the first sample space Pz and generates the first training set Dtrain, where the data z in the first sample space Pz correspond to a known distribution or belong to network traffic data conforming to a preset security policy; the energy-based generative adversarial network EBGAN is trained with the data G(z) of the first training set Dtrain, and the distribution of the first training set Dtrain in the data space is obtained; based on this distribution, newly acquired data can be classified and identified.
Specifically, the second self-encoding and reconstruction step 200 acquires the second test data set Dtest, which includes the abnormal data to be detected; an anomaly score D(x) of the second test data set Dtest is reconstructed by the self-encoder of the generative adversarial network EBGAN, the anomaly score D(x) being a preset statistic of the first training set Dtrain or the second test data set Dtest; if the anomaly score D(x) falls within a preset range, it is determined that abnormal data exists in the second test data set Dtest.
Further, as shown in fig. 1, the first training set Dtrain and the second test data set Dtest data are stored for subsequent processing; the preset statistic is the mean square error (MSE), and if the result of the mean square error processing is larger than the preset threshold Φ, it is determined that abnormal data exists in the second test data set Dtest.
During the training of the generative adversarial network EBGAN, the network obtains the characteristics of the first sample space Pz by approaching the extremum of the preset loss function L; with the discriminator D of the EBGAN used as an energy function, an energy value E below a preset energy threshold is assigned to the first data density region XX, and an energy value E above the preset energy threshold is assigned to the second data density region ZZ outside the first data density region XX.
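For illustration only, a minimal training sketch is given below. It assumes a PyTorch implementation in which the discriminator is an autoencoder whose per-sample reconstruction error serves as the energy D(x); the feature width, layer sizes, margin m and learning rates are illustrative assumptions and are not taken from the patent.

```python
import torch
import torch.nn as nn

FEATURES, LATENT, MARGIN = 64, 16, 10.0   # assumed dimensions and margin

class AEDiscriminator(nn.Module):
    """Autoencoder discriminator; its per-sample reconstruction MSE is the energy D(x)."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(FEATURES, 32), nn.ReLU(), nn.Linear(32, LATENT))
        self.dec = nn.Sequential(nn.Linear(LATENT, 32), nn.ReLU(), nn.Linear(32, FEATURES))

    def forward(self, x):
        recon = self.dec(self.enc(x))
        return ((x - recon) ** 2).mean(dim=1)   # per-sample energy D(x)

generator = nn.Sequential(nn.Linear(LATENT, 32), nn.ReLU(), nn.Linear(32, FEATURES))
discriminator = AEDiscriminator()
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)

def train_step(x_real: torch.Tensor):
    """One alternating update on a batch of normal traffic features x_real."""
    z = torch.randn(x_real.size(0), LATENT)   # z drawn from the sample space Pz
    # Discriminator update: LD(x, z) = D(x) + max(0, m - D(G(z)))
    loss_d = (discriminator(x_real)
              + torch.clamp(MARGIN - discriminator(generator(z).detach()), min=0.0)).mean()
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Generator update: LG(z) = D(G(z))
    loss_g = discriminator(generator(z)).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```

The detach() call in the discriminator update keeps the generator fixed while the discriminator is updated, so the two loss terms above are optimized alternately rather than jointly; this is a common GAN training convention rather than something mandated by the patent.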
Specifically, the threshold Φ may be chosen as the maximum anomaly score CMax or as C%; the maximum anomaly score CMax is the maximum value of the anomaly score D(x) over the samples in the first training set Dtrain, and C% is a preset percentage of CMax.
Further, the first training set Dtrain and the second test data set Dtest may be stored in vector form and written as:
Dtrain = {(x1, y1), (x2, y2), …, (xi, yi), …, (xm, ym)}, where m is a positive integer and yi = 0;
Dtest = {(x1, y1), (x2, y2), …, (xj, yj), …, (xn, yn)}, where n is a positive integer; an element with yj = 0 is normal data and an element with yj = 1 is abnormal data.
Specifically, model training is carried out on the first training set Dtrain, and model evaluation on the second test data set Dtest; the anomaly score D(x) can be given by:
D(x) = MSE(x, Dec(Enc(x))), where Dec() denotes the decoding process 220, Enc() denotes the encoding process 210, and MSE() denotes the mean square error processing 230.
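A minimal sketch of this score and of the threshold test against Φ follows; it assumes the per-sample score D(x) is produced by an autoencoder such as the discriminator sketched above, and the percentage value C used here is an illustrative choice.

```python
import torch

@torch.no_grad()
def anomaly_scores(autoencoder, x: torch.Tensor) -> torch.Tensor:
    # `autoencoder` is any callable returning the per-sample reconstruction MSE, i.e. D(x)
    return autoencoder(x)

def choose_threshold(train_scores: torch.Tensor, c_percent: float = 100.0) -> float:
    # Phi = CMax (c_percent = 100) or a preset percentage C% of CMax
    return train_scores.max().item() * (c_percent / 100.0)

def flag_anomalies(test_scores: torch.Tensor, phi: float) -> torch.Tensor:
    # A Dtest element is judged abnormal when its anomaly score D(x) exceeds Phi
    return test_scores > phi
```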
the discriminator loss function LD (x, z) and generator loss function LG (z) may also be constructed such that:
LD(x,z)= D(x)+Max(0, m-D(G(z))),
LG(z)= D(G(z));
where m is greater than zero, x is the data sample (110), and G (z) is the sample generated by the generator.
In addition, as shown in fig. 6, the method may further include a third normalization and stabilization step 300; the third normalization and stabilization step 300 applies spectral normalization to the self-encoder of the generative adversarial network EBGAN.
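A minimal sketch of this step is given below, assuming a PyTorch autoencoder with illustrative layer widths; torch.nn.utils.spectral_norm wraps each linear layer so that its spectral norm is constrained during training, which helps keep the EBGAN training stable.

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

# Encoder and decoder of the self-encoder, each layer wrapped with spectral normalization
encoder = nn.Sequential(
    spectral_norm(nn.Linear(64, 32)), nn.ReLU(),
    spectral_norm(nn.Linear(32, 16)),
)
decoder = nn.Sequential(
    spectral_norm(nn.Linear(16, 32)), nn.ReLU(),
    spectral_norm(nn.Linear(32, 64)),
)
```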
Further, the intrusion detection device 600 shown in fig. 7 and fig. 1 includes a first preprocessing and training unit 610 and a second self-encoding and reconstruction unit 620. The first preprocessing and training unit 610 randomly extracts a preset number of first-class random vectors from the first sample space Pz and generates the first training set Dtrain, where the data z in the first sample space Pz correspond to a known distribution or belong to network traffic data conforming to a preset security policy; the energy-based generative adversarial network EBGAN is trained with the data G(z) of the first training set Dtrain, and the distribution of the first training set Dtrain in its data space is acquired.
Further, the second self-encoding and reconstruction unit acquires the second test data set Dtest, which includes the abnormal data to be detected; an anomaly score D(x) of the second test data set Dtest is reconstructed by the self-encoder of the generative adversarial network EBGAN, the anomaly score D(x) being a preset statistic of the first training set Dtrain or the second test data set Dtest; if the anomaly score D(x) falls within a preset range, it is determined that abnormal data exists in the second test data set Dtest.
Further, the first training set Dtrain and the second test data set Dtest data may be stored for subsequent processing; the preset statistic is the result of the mean square error processing 230, and if that result is greater than the preset threshold Φ, it is determined that abnormal data exists in the second test data set Dtest.
In the training process of the generative adversarial network EBGAN, the network can obtain the characteristics of the first sample space Pz by approaching the extremum of the preset loss function L; with the discriminator D of the EBGAN used as an energy function, an energy value E below the preset energy threshold is assigned to the first data density region XX, and an energy value E above the preset energy threshold is assigned to the second data density region ZZ outside the first data density region XX.
Specifically, the threshold Φ may be the maximum anomaly score CMax or C%; the maximum anomaly score CMax is the maximum value of the anomaly score D(x) over the samples in the first training set Dtrain, and C% is a preset percentage of the maximum anomaly score CMax.
Further, the first training set Dtrain and the second test data set Dtest may be stored in vector form, denoted as:
Dtrain = {(x1, y1), (x2, y2), …, (xi, yi), …, (xm, ym)}, where m is a positive integer and yi = 0;
Dtest = {(x1, y1), (x2, y2), …, (xj, yj), …, (xn, yn)}, where n is a positive integer; an element with yj = 0 is normal data and an element with yj = 1 is abnormal data.
Specifically, model training may be performed on the first training set Dtrain, and model evaluation on the second test data set Dtest; the anomaly score D(x) can be given by:
D(x) = MSE(x, Dec(Enc(x))), where Dec() denotes the decoding process 220, Enc() denotes the encoding process 210, and MSE() denotes the mean square error processing 230.
The discriminator loss function LD(x, z) and the generator loss function LG(z) may also be constructed such that:
LD(x, z) = D(x) + max(0, m - D(G(z))),
LG(z) = D(G(z)),
where m is greater than zero, x is the data sample (110), and G(z) is the sample generated by the generator.
Further, the intrusion detection device 600 may also include a third normalization and stabilization unit 630, which applies spectral normalization to the self-encoder of the generative adversarial network EBGAN.
As shown in fig. 8, 9 and 10, a computer storage medium 903 embodying the same inventive concept includes a storage medium body for storing a computer program; when executed by a microprocessor, the computer program can implement any of the data processing methods described above. The controller 901 may include any of the intrusion detection devices 600 described above and/or any of the computer storage media 903.
In order to illustrate the technical effect of the present invention, an experiment performed on a network intrusion data set according to an embodiment of the present invention is described below. The method is compared with an existing intrusion detection method based on the generative adversarial network (GAN); the comparison results 401 for indicators such as precision, recall rate and f1 score are shown in fig. 4.
Further, the technical effect 402 of adding the spectral normalization (SN) processing is shown in fig. 5: although it does not significantly change the detection result of the whole model, the SN method contributes greatly to the stable training of the whole model.
In the experiment, one round of testing can be performed after each round of training, giving the f1 score comparison graph 500 shown in fig. 2, where reference numeral 501 denotes the curve obtained with the SN method and reference numeral 502 the curve obtained without it; the influence of the SN technique on stable model training can be observed from this graph.
Specifically, the f1 score of the model's test process can be recorded after each round of training and the corresponding curve drawn. Compared with other GAN-based intrusion detection techniques, the method and product embodiments of the invention improve the f1 score of anomaly detection and thus the performance of intrusion detection when processing intrusion detection data.
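A minimal sketch of this per-round evaluation is given below, assuming scikit-learn is available; y_true holds the yj labels of Dtest (0 = normal, 1 = abnormal) and y_pred the judgments (1 where D(x) > Φ). Both arrays are placeholders, not data from the patent's experiment.

```python
from sklearn.metrics import precision_score, recall_score, f1_score

def evaluate(y_true, y_pred):
    # Returns (precision, recall, f1) for one test round on Dtest
    return (precision_score(y_true, y_pred),
            recall_score(y_true, y_pred),
            f1_score(y_true, y_pred))
```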
To further evaluate the effectiveness of the method and product of the invention, the reconstructed versions of all samples to be tested in the testing stage can be stored, clustered with the K-means (KMeans) method and visualized with t-SNE, which yields the results shown in fig. 3.
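A minimal sketch of this reconstruction analysis follows, assuming scikit-learn and matplotlib; the cluster count, the t-SNE perplexity and the `reconstructed` array (Dec(Enc(x)) for every test sample) are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

def cluster_and_visualize(reconstructed: np.ndarray, n_clusters: int = 2) -> np.ndarray:
    # Cluster the reconstructed test samples, then project them to 2-D for inspection
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(reconstructed)
    embedded = TSNE(n_components=2, perplexity=30).fit_transform(reconstructed)
    plt.scatter(embedded[:, 0], embedded[:, 1], c=labels, s=5)
    plt.title("t-SNE of reconstructed test samples")
    plt.show()
    return labels
```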
As can be seen from fig. 3, whether the samples to be tested are normal or abnormal, their clustering results show a fairly clear partitioned aggregation, indicating that the model can reconstruct and distinguish the samples to be tested well.
On the other hand, if abnormal samples already exist in the first training set Dtrain, they can also be regarded as abnormal samples produced by the generator and added to the adversarial training; doing so improves the discriminator's recall rate on abnormal samples. In other words, the discriminator in the disclosed method and product has both supervised and unsupervised learning capability in the binary classification scenario.
It should be noted that the above examples are given only to illustrate the technical solutions of the present invention clearly; those skilled in the art will understand that the embodiments are not limited to the above content, and obvious changes or substitutions can be made without departing from the scope of the technical solutions of the present invention. Other embodiments that do not depart from the inventive concept also fall within the scope of the invention.

Claims (14)

1. A data processing method, characterized in that it comprises a first preprocessing and training step (100) and a second self-encoding and reconstruction step (200); the first preprocessing and training step (100) randomly extracts a preset number of first-class random vectors from a first sample space Pz and generates a first training set Dtrain, the data z (120) in the first sample space Pz corresponding to a known distribution or belonging to network traffic data conforming to a preset security policy; an energy-based generative adversarial network EBGAN is trained with the data G(z) (121) of the first training set Dtrain, and the distribution of the first training set Dtrain in its data space is obtained;
the second self-encoding and reconstruction step (200) obtains a second test data set Dtest, the second test data set Dtest comprising abnormal data to be detected; an anomaly score D(x) of the second test data set Dtest is reconstructed using the self-encoder of the generative adversarial network EBGAN, the anomaly score D(x) being a preset statistic of the first training set Dtrain or the second test data set Dtest; if the anomaly score D(x) falls within a preset range, it is determined that abnormal data exists in the second test data set Dtest.
2. The data processing method of claim 1, wherein: the first training set Dtrain and the second test data set Dtest data are stored for subsequent processing; the preset statistic is the mean square error (MSE) (230); and if the result of the mean square error processing (230) is greater than a preset threshold Φ, it is determined that abnormal data exists in the second test data set Dtest.
3. The data processing method of claim 1 or 2, wherein: in the training process of the generative adversarial network EBGAN, the extremum of a preset loss function L is approached so that the generative adversarial network EBGAN obtains the characteristics of the first sample space Pz; with the discriminator D of the generative adversarial network EBGAN used as an energy function, an energy value E (240) smaller than a preset energy threshold is assigned to a first data density region XX, and an energy value E (240) larger than the preset energy threshold is assigned to a second data density region ZZ outside the first data density region XX.
4. The data processing method of claim 3, wherein: the threshold Φ is the maximum anomaly score CMax or C%; the maximum anomaly score CMax is the maximum value of the anomaly score D(x) over the samples in the first training set Dtrain; and C% is a preset percentage of the maximum anomaly score CMax.
5. The data processing method of claim 1 or 2, wherein: the first training set Dtrain and the second test data set Dtest are stored in vector form and recorded as:
Dtrain = {(x1, y1), (x2, y2), …, (xi, yi), …, (xm, ym)}, where m is a positive integer and yi = 0;
Dtest = {(x1, y1), (x2, y2), …, (xj, yj), …, (xn, yn)}, where n is a positive integer; an element with yj = 0 is normal data and an element with yj = 1 is abnormal data;
model training is performed on the first training set Dtrain, and model evaluation is performed on the second test data set Dtest; the anomaly score D(x) is given by:
D(x) = MSE(x, Dec(Enc(x))), where Dec() denotes the decoding process (220), Enc() denotes the encoding process (210), and MSE() denotes the mean square error processing (230);
or the discriminator loss function LD(x, z) and the generator loss function LG(z) are constructed such that:
LD(x, z) = D(x) + max(0, m - D(G(z))),
LG(z) = D(G(z)),
where m is greater than zero, x is the data sample (110), and G(z) is the sample produced by the generator.
6. The data processing method of claim 1 or 2, further comprising a third normalization and stabilization step (300); the third normalization and stabilization step (300) applies spectral normalization to the self-encoder of the generative adversarial network EBGAN.
7. An intrusion detection device (600) comprising a first preprocessing and training unit (610) and a second self-encoding and reconstruction unit (620); the first preprocessing and training unit (610) randomly extracts a preset number of first-class random vectors from a first sample space Pz and generates a first training set Dtrain, the data z (120) in the first sample space Pz corresponding to a known distribution or belonging to network traffic data conforming to a preset security policy; an energy-based generative adversarial network EBGAN is trained with the data G(z) (121) of the first training set Dtrain, and the distribution of the first training set Dtrain in its data space is obtained;
the second self-encoding and reconstruction unit (620) acquires a second test data set Dtest, which comprises abnormal data to be detected; an anomaly score D(x) of the second test data set Dtest is reconstructed using the self-encoder of the generative adversarial network EBGAN, the anomaly score D(x) being a preset statistic of the first training set Dtrain or the second test data set Dtest; if the anomaly score D(x) falls within a preset range, it is determined that abnormal data exists in the second test data set Dtest.
8. The intrusion detection device (600) according to claim 7, wherein: the first training set Dtrain and the second test data set Dtest data are stored for subsequent processing; the preset statistic is the result of the mean square error processing (230); and if the result of the mean square error processing (230) is greater than a preset threshold Φ, it is determined that abnormal data exists in the second test data set Dtest.
9. The intrusion detection device (600) according to claim 7 or 8, wherein: in the training process of the generative adversarial network EBGAN, the extremum of a preset loss function L is approached so that the generative adversarial network EBGAN obtains the characteristics of the first sample space Pz; with the discriminator D of the generative adversarial network EBGAN used as an energy function, an energy value E (240) smaller than a preset energy threshold is assigned to a first data density region XX, and an energy value E (240) larger than the preset energy threshold is assigned to a second data density region ZZ outside the first data density region XX.
10. The intrusion detection device (600) according to claim 9, wherein: the threshold Φ is the maximum anomaly score CMax or C%; the maximum anomaly score CMax is the maximum value of the anomaly score D(x) over the samples in the first training set Dtrain; and C% is a preset percentage of the maximum anomaly score CMax.
11. The intrusion detection device (600) according to claim 7 or 8, wherein: the first training set Dtrain and the second test data set Dtest are stored in vector form and recorded as:
Dtrain = {(x1, y1), (x2, y2), …, (xi, yi), …, (xm, ym)}, where m is a positive integer and yi = 0;
Dtest = {(x1, y1), (x2, y2), …, (xj, yj), …, (xn, yn)}, where n is a positive integer; an element with yj = 0 is normal data and an element with yj = 1 is abnormal data;
model training is performed on the first training set Dtrain, and model evaluation is performed on the second test data set Dtest; the anomaly score D(x) is given by:
D(x) = MSE(x, Dec(Enc(x))), where Dec() denotes the decoding process (220), Enc() denotes the encoding process (210), and MSE() denotes the mean square error processing (230);
or the discriminator loss function LD(x, z) and the generator loss function LG(z) are constructed such that:
LD(x, z) = D(x) + max(0, m - D(G(z))),
LG(z) = D(G(z)),
where m is greater than zero, x is the data sample (110), and G(z) is the sample produced by the generator.
12. The intrusion detection device (600) according to claim 7 or 8, further comprising a third normalization and stabilization unit (630), the third normalization and stabilization unit (630) applying spectral normalization to the self-encoder of the generative adversarial network EBGAN.
13. A computer storage medium (903) comprising a storage medium body for storing a computer program; the computer program, when executed by a microprocessor, implements the data processing method of any of claims 1 to 6.
14. A controller (901) comprising the intrusion detection device (600) according to any one of claims 7 to 12 and/or the computer storage medium (903) of claim 13.
CN202211291677.6A 2022-10-21 2022-10-21 Data processing method, network intrusion detection device, medium and controller Pending CN115766094A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211291677.6A CN115766094A (en) 2022-10-21 2022-10-21 Data processing method, network intrusion detection device, medium and controller

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211291677.6A CN115766094A (en) 2022-10-21 2022-10-21 Data processing method, network intrusion detection device, medium and controller

Publications (1)

Publication Number Publication Date
CN115766094A (en) 2023-03-07

Family

ID=85352450

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211291677.6A Pending CN115766094A (en) 2022-10-21 2022-10-21 Data processing method, network intrusion detection device, medium and controller

Country Status (1)

Country Link
CN (1) CN115766094A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116633639A (en) * 2023-05-30 2023-08-22 北京交通大学 Network intrusion detection method based on unsupervised and supervised fusion reinforcement learning
CN116633639B (en) * 2023-05-30 2024-04-12 北京交通大学 Network intrusion detection method based on unsupervised and supervised fusion reinforcement learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination