CN116545783B - Sparse logistic regression-based network intrusion detection method and device - Google Patents


Info

Publication number
CN116545783B
Authority
CN
China
Prior art keywords
intrusion detection
probability model
detection probability
data
logistic regression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310825797.8A
Other languages
Chinese (zh)
Other versions
CN116545783A (en)
Inventor
刘瑞景
罗远哲
李雪茹
薛瑞亭
陆立军
王军亮
李玉琼
王明玉
刘志明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Wanlihong Information Technology Co ltd
Beijing China Super Industry Information Security Technology Ltd By Share Ltd
Original Assignee
Shandong Wanlihong Information Technology Co ltd
Beijing China Super Industry Information Security Technology Ltd By Share Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Wanlihong Information Technology Co ltd and Beijing China Super Industry Information Security Technology Ltd By Share Ltd
Priority to CN202310825797.8A
Publication of CN116545783A
Application granted
Publication of CN116545783B
Legal status: Active (current)
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/02 Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • H04L63/0227 Filtering policies
    • H04L63/0236 Filtering by address, protocol, port number or service, e.g. IP-address or URL
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • H04L63/30 Network architectures or network communication protocols for network security for supporting lawful interception, monitoring or retaining of communications or communication related information
    • H04L63/306 Network architectures or network communication protocols for network security for supporting lawful interception, monitoring or retaining of communications or communication related information intercepting packet switched data communications, e.g. Web, Internet or IMS communications
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/40 Network security protocols
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Algebra (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Technology Law (AREA)
  • Burglar Alarm Systems (AREA)
  • Alarm Systems (AREA)

Abstract

The application discloses a network intrusion detection method and device based on sparse logistic regression, belonging to the technical field of computer network security. The method comprises the following steps: collecting a data set for intrusion detection in a network to be detected, the data set comprising network traffic data or system logs; normalizing the continuous data in the data set for feature scaling, and encoding the discrete-valued data in the data set; extracting features from the original data in the data set and dividing the resulting data set into a training set and a testing set, the extracted features comprising the source address, destination address, port number, and transport protocol of each network connection; establishing an intrusion detection probability model using sparse logistic regression and training the model; and inputting data collected in real time into the trained intrusion detection probability model to perform network intrusion detection and promptly intercept intrusion behavior. The application improves the efficiency and accuracy of network intrusion detection and effectively intercepts intrusion behavior.

Description

Sparse logistic regression-based network intrusion detection method and device
Technical Field
The application relates to a network intrusion detection method and device based on sparse logistic regression, and belongs to the technical field of computer network security.
Background
Intrusion detection is the detection of intrusion behavior. It collects and analyzes information from several key points in a network or system to check whether there is evidence of behavior or attacks that violate security policies. As an active security technique, intrusion detection provides real-time protection against internal attacks, external attacks, and misoperation, and intercepts and responds to intrusions before the network system is compromised. Intrusion detection can also prevent further attacks through after-the-fact analysis, and it offers advantages such as a high performance-to-cost ratio, a concentrated field of view, ease of tailoring by the user, and no need for an additional hardware platform.
Intrusion detection is a security technique whose primary purposes include identifying intruders and intrusion behavior, detecting and monitoring successful security breaches, and providing important information for timely countermeasures. Current intrusion detection techniques fall into two categories: anomaly intrusion detection and misuse intrusion detection. Anomaly intrusion detection mainly includes approaches based on neural networks, pattern prediction, and data mining. Misuse intrusion detection mainly includes approaches based on expert systems, conditional probabilities, and state transitions. Each method has its own advantages and disadvantages and suits different scenarios and systems.
Anomaly intrusion detection can detect unknown attacks in the system, but it has a higher false alarm rate. Misuse intrusion detection can only find known attacks and cannot handle unknown attacks in the system, so its missed detection rate is high. Conventional intrusion detection techniques therefore suffer from high false alarm rates and low detection efficiency.
Disclosure of Invention
In order to solve the problems, the application provides a network intrusion detection method and device based on sparse logistic regression, which can improve the efficiency and accuracy of network intrusion detection and effectively intercept intrusion behaviors.
The technical scheme adopted for solving the technical problems is as follows:
In a first aspect, the network intrusion detection method based on sparse logistic regression provided by the embodiments of the application includes the following steps:
collecting a data set for intrusion detection in a network to be detected, the data set comprising network traffic data or system logs;
normalizing the continuous data in the data set for feature scaling, and encoding the discrete-valued data in the data set;
extracting features from the original data in the data set, and dividing the resulting data set into a training set and a testing set, wherein the extracted features comprise the source address, destination address, port number, and transport protocol of each network connection;
establishing an intrusion detection probability model using sparse logistic regression, and training the intrusion detection probability model;
inputting data collected in real time into the trained intrusion detection probability model to perform network intrusion detection, and promptly intercepting any detected intrusion behavior.
As one possible implementation of this embodiment, the data set is the NSL-KDD benchmark data set, which contains the features to be examined and target labels giving the name of the intrusion or attack class.
As a possible implementation manner of this embodiment, the process of normalizing the continuous data in the dataset is:
performing a linear transformation on the original data in the data set using min-max normalization:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x$ is the original data, $x'$ is the data transformed from the original data, $x' \in [0, 1]$, $x_{\max}$ is the maximum value in the original data, and $x_{\min}$ is the minimum value in the original data.
As a possible implementation manner of this embodiment, the process of encoding discrete value data in the dataset is:
encoding discrete-valued data with one-hot encoding, which uses a $K$-bit state register to encode $K$ states; each state has its own register bit, and only one bit is active at any time.
As a possible implementation manner of this embodiment, the establishing an intrusion detection probability model using sparse logistic regression and performing intrusion detection probability model training includes:
building an intrusion detection probability model using sparse logistic regression:
$$p = \frac{1}{1 + e^{-w^{\mathrm{T}} x}}$$
where $p$ is the intrusion detection probability, $w$ is the vector of intrusion detection probability model parameters, and $x$ is the feature vector extracted from the raw data;
performing intrusion detection probability model training;
solving for the intrusion detection probability model parameters by minimizing the logistic regression loss function, with $L_1$ regularization applied to the model parameters $w$;
carrying out intrusion detection probability model parameter solving by using a sparse logistic regression algorithm to obtain optimal intrusion detection probability model parameters;
substituting the optimal intrusion detection probability model parameters into the intrusion detection probability model to obtain a trained intrusion detection probability model.
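As a possible implementation manner of this embodiment, the logistic regression loss function to be minimized is:
$$J(w) = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]$$
where $J(w)$ is the loss value for the intrusion detection probability model parameters $w$, $x_i$ is the feature vector extracted from the raw data, $y_i$ is the label of the original data, and $p_i = 1/(1 + e^{-w^{\mathrm{T}} x_i})$ is the output of the intrusion detection probability model;
$L_1$ regularization is applied to the intrusion detection probability model parameters as follows:
$$R(w) = \lambda \lVert w \rVert_1 = \lambda \sum_{k} \lvert w_k \rvert$$
where $R(w)$ is the regularization term, $\lambda$ is the regularization parameter, and $\lVert w \rVert_1$ is the 1-norm of $w$. The model parameters are thus obtained by minimizing $J(w) + \lambda \lVert w \rVert_1$.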
As a possible implementation manner of this embodiment, the process of performing intrusion detection probability model training is:
assume there are $N$ samples, where $x_i \in \mathbb{R}^{d}$ is the feature vector of the $i$-th sample, $d$ is the number of input variables, $X = [x_1, x_2, \ldots, x_N]^{\mathrm{T}} \in \mathbb{R}^{N \times d}$ is the input matrix, and $Y = [y_1, y_2, \ldots, y_N]^{\mathrm{T}}$ is the output vector;
inputting the sample data into the intrusion detection probability model to predict intrusion behavior;
$Y$ is a binary response vector: $y_i = 0$ when the corresponding data is normal and $y_i = 1$ when it is abnormal. When the intrusion detection probability $p$ is below the decision threshold, the data is classified as normal; otherwise, it is classified as abnormal (intrusion) data.
As a possible implementation manner of this embodiment, the intrusion detection probability model parameters are solved by minimizing the logistic regression loss function together with the $L_1$ regularization term, which includes the following steps:
Step 1: initialize the intrusion detection probability model parameters $w_0$, the regularization parameter $\lambda$, and the counter $j = 0$;
Step 2: compute the search point $s_j$ as follows:
$$s_j = w_j + \alpha_j (w_j - w_{j-1})$$
where $\alpha_j$ is a tuning parameter, $w_j$ is the current optimal value of the intrusion detection probability model parameters, and $w_{j-1}$ is the value computed in the previous iteration;
Step 3: compute the gradient descent point $v_j$ with an adaptive step size:
$$v_j = s_j - \eta_j \nabla J(s_j)$$
where $\eta_j$ is the step size obtained by an adaptive backtracking line search and $\nabla J(s_j)$ is the gradient of the loss function at the search point $s_j$;
Step 4: compute the optimal value of the intrusion detection probability model parameters as follows:
$$w_{j+1} = \operatorname{prox}_{\eta_j \lambda \lVert \cdot \rVert_1}(v_j) = \operatorname{sgn}(v_j) \odot \max\left(\lvert v_j \rvert - \eta_j \lambda,\ 0\right)$$
where $v_j$ is the gradient descent point, $\lambda$ is the regularization parameter, $\operatorname{prox}$ denotes the proximal operator, $\operatorname{sgn}(\cdot)$ is the sign function, and the absolute value, maximum, and product $\odot$ are taken element-wise;
Step 5: update the optimal value of the model parameters $w_{j+1}$ and the step size $\eta_j$. When the difference between $w_{j+1}$ and $w_j$ is smaller than a threshold, return the updated optimal value $w_{j+1}$; otherwise, set $j \leftarrow j + 1$ and return to Step 2 to continue computing the optimal value of the intrusion detection probability model parameters.
As a possible implementation manner of this embodiment, before performing network intrusion detection, the network intrusion detection method further includes the following steps:
evaluating the performance of the trained intrusion detection probability model:
$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}$$
where $\mathrm{Acc}$ is the detection accuracy of the trained intrusion detection probability model, TP is the number of true positives, TN is the number of true negatives, FP is the number of false positives, and FN is the number of false negatives.
In a second aspect, a network intrusion detection device based on sparse logistic regression provided in an embodiment of the present application includes:
the data acquisition module is used for acquiring a data set for intrusion detection in the network to be detected; the data set comprises network traffic data or a system log;
the data preprocessing module is used for carrying out normalization processing on continuous data in the data set to carry out characteristic scaling and carrying out encoding processing on discrete value data in the data set;
the feature extraction module is used for extracting features from the original data in the data set and dividing the resulting data set into a training set and a testing set, wherein the extracted features comprise the source address, destination address, port number, and transport protocol of each network connection;
the model building module is used for building an intrusion detection probability model by using sparse logistic regression and carrying out intrusion detection probability model training;
the intrusion detection module is used for inputting the data set acquired in real time into the trained intrusion detection probability model, carrying out network intrusion detection and intercepting the detected intrusion behavior in time.
In a third aspect, an embodiment of the present application provides a computer device, including a processor, a memory, and a bus, where the memory stores machine-readable instructions executable by the processor. When the computer device is running, the processor communicates with the memory through the bus and executes the machine-readable instructions to perform the steps of the sparse logistic regression-based network intrusion detection method according to any of the above.
In a fourth aspect, an embodiment of the present application provides a storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of any of the sparse logistic regression-based network intrusion detection methods described above.
The technical scheme of the embodiment of the application has the following beneficial effects:
The network intrusion detection method based on sparse logistic regression in the technical solution of the embodiments comprises the following steps: collecting a data set for intrusion detection in a network to be detected, the data set comprising network traffic data or system logs; normalizing the continuous data in the data set for feature scaling, and encoding the discrete-valued data in the data set; extracting features from the original data and dividing the resulting data set into a training set and a testing set, the extracted features comprising the source address, destination address, port number, and transport protocol of each network connection; establishing an intrusion detection probability model using sparse logistic regression and training it; and inputting data collected in real time into the trained model to perform network intrusion detection and promptly intercept detected intrusion behavior. By preprocessing the data set and extracting features that identify intrusions, the application improves the efficiency and accuracy of network intrusion detection, effectively intercepts network intrusion behavior, and addresses the high false alarm rate and low detection efficiency of traditional intrusion detection techniques.
Drawings
FIG. 1 is a flowchart illustrating a sparse logistic regression-based network intrusion detection method, according to an example embodiment;
FIG. 2 is a flowchart illustrating a method for solving for optimal values of intrusion detection probability model parameters using a sparse logistic regression algorithm, according to an exemplary embodiment;
FIG. 3 is a schematic diagram illustrating a sparse logistic regression-based network intrusion detection system, according to an example embodiment.
Detailed Description
The application is further illustrated by the following examples in conjunction with the accompanying drawings:
In order to clearly illustrate the technical features of this solution, the application is described in detail below with reference to specific embodiments and the accompanying drawings. The following disclosure provides many different embodiments, or examples, for implementing different structures of the application. To simplify the disclosure, the components and arrangements of specific examples are described below. Furthermore, the application may repeat reference numerals and/or letters in the various examples. This repetition is for simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. It should be noted that the components illustrated in the figures are not necessarily drawn to scale. Descriptions of well-known components, processing techniques, and processes are omitted so as not to unnecessarily obscure the application.
As shown in fig. 1, the network intrusion detection method based on sparse logistic regression provided by the embodiment of the application includes the following steps:
collecting a data set for intrusion detection in a network to be detected; the data set comprises network traffic data or system logs or other intrusion-related information;
carrying out normalization processing on continuous data in the data set to carry out feature scaling, and carrying out encoding processing on discrete value data in the data set;
extracting features from the original data in the data set and dividing the resulting data set into a training set and a testing set, wherein the extracted features include information about each network connection, such as the source address, destination address, port number, and transport protocol, as well as other statistics related to intrusion detection;
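The feature-extraction and splitting step can be sketched in Python as follows. This is a minimal illustration under assumptions, not the patented implementation: the record keys (`src`, `dst`, `port`, `proto`, `stats`), the helper names, and the 80/20 split are hypothetical choices.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical connection record; the keys are illustrative assumptions.
Record = Dict[str, object]


def extract_features(record: Record) -> List[object]:
    """Collect source address, destination address, port number and
    transport protocol, followed by any extra per-connection statistics."""
    base = [record["src"], record["dst"], record["port"], record["proto"]]
    return base + list(record.get("stats", []))


def train_test_split(rows: Sequence, labels: Sequence[int],
                     test_ratio: float = 0.2, seed: int = 0
                     ) -> Tuple[list, list, list, list]:
    """Shuffle indices reproducibly, then split rows and labels into
    (train_rows, train_labels, test_rows, test_labels)."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = int(round(len(idx) * test_ratio))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([rows[i] for i in train_idx], [labels[i] for i in train_idx],
            [rows[i] for i in test_idx], [labels[i] for i in test_idx])


records = [{"src": f"10.0.0.{i}", "dst": "10.0.1.1", "port": 80,
            "proto": "tcp", "stats": [i]} for i in range(10)]
X = [extract_features(r) for r in records]
y = [i % 2 for i in range(10)]
X_train, y_train, X_test, y_test = train_test_split(X, y)
print(len(X_train), len(X_test))  # 8 2
```

The address and protocol fields are still raw values at this point; in a full pipeline they would pass through the normalization and encoding steps before training.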
establishing an intrusion detection probability model by using sparse logistic regression, and training the intrusion detection probability model;
inputting the data set acquired in real time into the trained intrusion detection probability model, carrying out network intrusion detection, and intercepting the detected intrusion behavior in time.
As one possible implementation of this embodiment, the data set is the NSL-KDD benchmark data set, which contains the features to be examined and target labels giving the name of the intrusion or attack class.
As a possible implementation manner of this embodiment, the process of normalizing the continuous data in the dataset is:
performing a linear transformation on the original data in the data set using min-max normalization:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x$ is the original data, $x'$ is the data transformed from the original data, $x' \in [0, 1]$, $x_{\max}$ is the maximum value in the original data, and $x_{\min}$ is the minimum value in the original data.
The normalization method applies a linear transformation to the original data so that the transformed data fall in the interval $[0, 1]$. Assume the data set contains $N$ items, each with $d$-dimensional features, where $x_{ij}$ is the value of the $j$-th feature of the $i$-th item. Then
$$x'_{ij} = \frac{x_{ij} - \min_{1 \le k \le N} x_{kj}}{\max_{1 \le k \le N} x_{kj} - \min_{1 \le k \le N} x_{kj}}$$
where $\min_{k} x_{kj}$ is the minimum value of the $j$-th feature over all items before normalization, $\max_{k} x_{kj}$ is the maximum value of the $j$-th feature before normalization, and $x'_{ij}$ is the normalized value of the $j$-th feature of the $i$-th item.
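A minimal pure-Python sketch of the column-wise min-max normalization described above. Mapping a constant column (maximum equal to minimum) to 0.0 is an assumption, since the text does not cover that case; reusing the training-set minima and maxima for test and real-time data is common practice rather than something the text specifies.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def fit_min_max(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-feature (column) minimum and maximum over all items."""
    n_cols = len(rows[0])
    mins = [min(r[j] for r in rows) for j in range(n_cols)]
    maxs = [max(r[j] for r in rows) for j in range(n_cols)]
    return mins, maxs


def apply_min_max(rows: Sequence[Sequence[float]],
                  mins: List[float], maxs: List[float]) -> Matrix:
    """x'_ij = (x_ij - min_j) / (max_j - min_j); constant columns map to 0.0."""
    return [[(r[j] - mins[j]) / (maxs[j] - mins[j]) if maxs[j] > mins[j] else 0.0
             for j in range(len(mins))]
            for r in rows]


train = [[0.0, 10.0, 7.0], [5.0, 20.0, 7.0], [10.0, 30.0, 7.0]]
mins, maxs = fit_min_max(train)
print(apply_min_max(train, mins, maxs))
# [[0.0, 0.0, 0.0], [0.5, 0.5, 0.0], [1.0, 1.0, 0.0]]
```

Splitting `fit` from `apply` lets the same minima and maxima scale later data; values outside the training range then fall outside $[0, 1]$, which this sketch does not clip.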
As a possible implementation manner of this embodiment, the process of encoding discrete value data in the dataset is:
encoding discrete-valued data with one-hot encoding, which uses a $K$-bit state register to encode $K$ states; each state has its own register bit, and only one bit is active at any time.
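One-hot encoding of a discrete field can be sketched as below. The example category values (`tcp`, `udp`, `icmp` for a protocol field) are illustrative; sorting the categories to fix bit positions is an arbitrary but deterministic choice, and unseen categories simply raise `KeyError` in this sketch.

```python
from typing import Dict, List, Sequence


def fit_one_hot(values: Sequence[str]) -> Dict[str, int]:
    """Give every distinct category its own register bit (sorted for determinism)."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


def one_hot(value: str, index: Dict[str, int]) -> List[int]:
    """K-bit vector with exactly one active bit; unknown values raise KeyError."""
    bits = [0] * len(index)
    bits[index[value]] = 1
    return bits


index = fit_one_hot(["tcp", "udp", "icmp", "tcp"])
print(index)                  # {'icmp': 0, 'tcp': 1, 'udp': 2}
print(one_hot("udp", index))  # [0, 0, 1]
```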
As a possible implementation manner of this embodiment, the establishing an intrusion detection probability model using sparse logistic regression and performing intrusion detection probability model training includes:
building an intrusion detection probability model using sparse logistic regression:
$$p = \frac{1}{1 + e^{-w^{\mathrm{T}} x}}$$
where $p$ is the intrusion detection probability, $w$ is the vector of intrusion detection probability model parameters, $w^{\mathrm{T}}$ is the transpose of $w$, and $x$ is the feature vector extracted from the raw data;
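The probability model is the logistic (sigmoid) function of the linear score $w^{\mathrm{T}} x$. A short sketch follows; the branch on the sign of $z$ only avoids overflow in `math.exp`, and folding a bias term into $w$ by appending a constant 1 to $x$ is an assumption, since the formula shows no separate intercept.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + e^{-z}), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def intrusion_probability(w: Sequence[float], x: Sequence[float]) -> float:
    """p = sigmoid(w^T x) for one feature vector x."""
    return sigmoid(sum(wk * xk for wk, xk in zip(w, x)))


print(intrusion_probability([0.0, 0.0], [3.0, -1.0]))       # 0.5
print(intrusion_probability([2.0, 1.0], [1.0, 0.0]) > 0.5)  # True
```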
performing intrusion detection probability model training;
solving for the intrusion detection probability model parameters by minimizing the logistic regression loss function, with $L_1$ regularization applied to the model parameters $w$;
carrying out intrusion detection probability model parameter solving by using a sparse logistic regression algorithm to obtain optimal intrusion detection probability model parameters;
substituting the optimal intrusion detection probability model parameters into the intrusion detection probability model to obtain a trained intrusion detection probability model.
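As a possible implementation manner of this embodiment, the logistic regression loss function to be minimized is:
$$J(w) = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]$$
where $J(w)$ is the loss value for the intrusion detection probability model parameters $w$, $x_i$ is the feature vector extracted from the raw data, $y_i$ is the label of the original data, and $p_i = 1/(1 + e^{-w^{\mathrm{T}} x_i})$ is the output of the intrusion detection probability model;
$L_1$ regularization is applied to the intrusion detection probability model parameters as follows:
$$R(w) = \lambda \lVert w \rVert_1 = \lambda \sum_{k} \lvert w_k \rvert$$
where $R(w)$ is the regularization term, $\lambda$ is the regularization parameter, and $\lVert w \rVert_1$ is the 1-norm of $w$. The model parameters are thus obtained by minimizing $J(w) + \lambda \lVert w \rVert_1$.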
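The sketch below evaluates the regularized objective (logistic loss plus $\lambda$ times the 1-norm of $w$). It assumes labels $y_i \in \{0, 1\}$ and a mean (rather than summed) loss over the samples. It uses the identity $-[y \log p + (1 - y)\log(1 - p)] = \log(1 + e^{z}) - yz$ with $z = w^{\mathrm{T}} x$, which is the same loss but never takes the logarithm of a probability that has rounded to 0.

```python
import math
from typing import Sequence


def softplus(z: float) -> float:
    """log(1 + e^z) without overflow."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def logistic_loss(w: Sequence[float], X, y) -> float:
    """Mean cross-entropy J(w) over N samples, labels in {0, 1}."""
    total = 0.0
    for xi, yi in zip(X, y):
        z = sum(wk * xk for wk, xk in zip(w, xi))
        total += softplus(z) - yi * z
    return total / len(X)


def l1_penalty(w: Sequence[float], lam: float) -> float:
    """R(w) = lambda * ||w||_1."""
    return lam * sum(abs(wk) for wk in w)


def objective(w: Sequence[float], X, y, lam: float) -> float:
    """Regularized training objective J(w) + lambda * ||w||_1."""
    return logistic_loss(w, X, y) + l1_penalty(w, lam)


X = [[1.0, 2.0], [1.0, -1.0]]
y = [1, 0]
print(round(objective([0.0, 0.0], X, y, lam=0.1), 6))  # 0.693147 (= log 2)
```

At $w = 0$ every $p_i = 0.5$, so the loss is $\log 2$ regardless of the data, which makes a convenient sanity check.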
Features are extracted from the raw data. Assume there are $N$ pieces of data, where $y_i$ is the output result and $x_i$ is the input feature vector; the pairs $\{(x_i, y_i)\}_{i=1}^{N}$ are used to train and test the intrusion detection model.
As a possible implementation manner of this embodiment, the process of performing intrusion detection probability model training is:
assume there are $N$ samples, where $x_i \in \mathbb{R}^{d}$ is the feature vector of the $i$-th sample, $d$ is the number of input variables, $X = [x_1, x_2, \ldots, x_N]^{\mathrm{T}} \in \mathbb{R}^{N \times d}$ is the input matrix, and $Y = [y_1, y_2, \ldots, y_N]^{\mathrm{T}}$ is the output vector;
inputting the sample data into the intrusion detection probability model to predict intrusion behavior;
$Y$ is a binary response vector: $y_i = 0$ when the corresponding data is normal and $y_i = 1$ when it is abnormal. When the intrusion detection probability $p$ is below the decision threshold, the data is classified as normal; otherwise, it is classified as abnormal (intrusion) data.
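The decision rule above can be sketched as a threshold on $p$. The text leaves the threshold value unspecified; the 0.5 default below is only the conventional choice for logistic regression, and the label convention (1 = intrusion, 0 = normal) is an assumption.

```python
from typing import List, Sequence


def classify(probabilities: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Return 1 (intrusion/abnormal) when p >= threshold, else 0 (normal)."""
    return [1 if p >= threshold else 0 for p in probabilities]


print(classify([0.1, 0.5, 0.93]))                 # [0, 1, 1]
print(classify([0.1, 0.5, 0.93], threshold=0.9))  # [0, 0, 1]
```

Raising the threshold lowers the false alarm rate at the cost of more missed intrusions, which is the trade-off discussed in the Background.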
As a possible implementation manner of this embodiment, the intrusion detection probability model parameters are solved by minimizing the logistic regression loss function together with the $L_1$ regularization term; that is, a sparse logistic regression algorithm is used to solve for the optimal values of the model parameters. As shown in FIG. 2, the specific process includes the following steps:
Step 1: initialize the intrusion detection probability model parameters $w_0$, the regularization parameter $\lambda$, and the counter $j = 0$;
Step 2: compute the search point $s_j$ as follows:
$$s_j = w_j + \alpha_j (w_j - w_{j-1})$$
where $\alpha_j$ is a tuning parameter, $w_j$ is the current optimal value of the intrusion detection probability model parameters, and $w_{j-1}$ is the value computed in the previous iteration;
Step 3: compute the gradient descent point $v_j$ with an adaptive step size:
$$v_j = s_j - \eta_j \nabla J(s_j)$$
where $\eta_j$ is the step size obtained by an adaptive backtracking line search and $\nabla J(s_j)$ is the gradient of the loss function at the search point $s_j$;
Step 4: compute the optimal value of the intrusion detection probability model parameters as follows:
$$w_{j+1} = \operatorname{prox}_{\eta_j \lambda \lVert \cdot \rVert_1}(v_j) = \operatorname{sgn}(v_j) \odot \max\left(\lvert v_j \rvert - \eta_j \lambda,\ 0\right)$$
where $v_j$ is the gradient descent point, $\lambda$ is the regularization parameter, $\operatorname{prox}$ denotes the proximal operator, $\operatorname{sgn}(\cdot)$ is the sign function, and the absolute value, maximum, and product $\odot$ are taken element-wise;
Step 5: update the optimal value of the model parameters $w_{j+1}$ and the step size $\eta_j$. When the difference between $w_{j+1}$ and $w_j$ is smaller than a threshold, return the updated optimal value $w_{j+1}$; otherwise, set $j \leftarrow j + 1$ and return to Step 2 to continue computing the optimal value of the intrusion detection probability model parameters.
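Steps 1 to 5 follow the structure of an accelerated proximal gradient method (momentum search point, backtracking line search, soft-thresholding), the scheme known as FISTA. The pure-Python sketch below implements that standard scheme under stated assumptions: the momentum weight $\alpha_j = (t_j - 1)/t_{j+1}$ with $t_{j+1} = (1 + \sqrt{1 + 4t_j^2})/2$, the backtracking growth factor, the tolerance, the zero initialization, and labels in $\{0, 1\}$ are illustrative choices, not values taken from the text. The step size is $\eta_j = 1/L$, where $L$ is increased until the usual sufficient-decrease condition holds.

```python
import math
from typing import List, Sequence

Vec = List[float]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def _sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def _loss(w: Sequence[float], X, y) -> float:
    """Mean logistic loss J(w), labels in {0, 1}, in stable softplus form."""
    total = 0.0
    for xi, yi in zip(X, y):
        z = _dot(w, xi)
        total += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - yi * z
    return total / len(X)


def _grad(w: Sequence[float], X, y) -> Vec:
    """Gradient of J: (1/N) * sum_i (p_i - y_i) * x_i."""
    g = [0.0] * len(w)
    for xi, yi in zip(X, y):
        r = _sigmoid(_dot(w, xi)) - yi
        for k, xk in enumerate(xi):
            g[k] += r * xk
    return [gk / len(X) for gk in g]


def soft_threshold(v: Sequence[float], tau: float) -> Vec:
    """Proximal operator of tau * ||.||_1: sgn(v_k) * max(|v_k| - tau, 0)."""
    return [math.copysign(max(abs(vk) - tau, 0.0), vk) for vk in v]


def sparse_logistic_fit(X, y, lam: float, L: float = 1.0, growth: float = 2.0,
                        tol: float = 1e-6, max_iter: int = 1000) -> Vec:
    """Accelerated proximal gradient for min_w J(w) + lam * ||w||_1."""
    w = [0.0] * len(X[0])        # Step 1: w_0 = 0
    w_prev, t = w[:], 1.0
    for _ in range(max_iter):    # loop index plays the role of counter j
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        alpha = (t - 1.0) / t_next
        # Step 2: search point s_j = w_j + alpha_j * (w_j - w_{j-1})
        s = [wk + alpha * (wk - pk) for wk, pk in zip(w, w_prev)]
        f_s, g_s = _loss(s, X, y), _grad(s, X, y)
        while True:
            # Step 3: gradient-descent point v_j with step size eta_j = 1 / L
            v = [sk - gk / L for sk, gk in zip(s, g_s)]
            # Step 4: proximal step (soft-thresholding at eta_j * lam)
            w_new = soft_threshold(v, lam / L)
            d = [a - b for a, b in zip(w_new, s)]
            # Backtracking: accept eta_j once the quadratic upper bound holds
            if _loss(w_new, X, y) <= f_s + _dot(g_s, d) + 0.5 * L * _dot(d, d):
                break
            L *= growth
        # Step 5: stop when successive iterates differ by less than tol
        change = math.sqrt(sum((a - b) ** 2 for a, b in zip(w_new, w)))
        w_prev, w, t = w, w_new, t_next
        if change < tol:
            break
    return w


X = [[1.0, 1.0], [1.0, 2.0], [1.0, -1.0], [1.0, -2.0]]  # column 0 is a bias input
y = [1, 1, 0, 0]
print(sparse_logistic_fit(X, y, lam=0.01))  # slope weight w[1] is positive
print(sparse_logistic_fit(X, y, lam=10.0))  # [0.0, 0.0]: a large penalty zeroes all weights
```

Because the soft-thresholding step sets any coefficient whose magnitude falls below $\eta_j \lambda$ exactly to zero, irrelevant features drop out of the model, which is what makes the logistic regression sparse.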
As a possible implementation manner of this embodiment, before performing network intrusion detection, the network intrusion detection method further includes the following steps:
evaluating the performance of the trained intrusion detection probability model:
$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}$$
where $\mathrm{Acc}$ is the detection accuracy of the trained intrusion detection probability model, TP is the number of true positives, TN is the number of true negatives, FP is the number of false positives, and FN is the number of false negatives.
The performance of the trained model is evaluated on the test data set. Common classification metrics include accuracy, precision, recall, and F1 score; the metric computed in this application is accuracy:
$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}$$
where TP is the number of true positives (predicted positive and actually positive), TN is the number of true negatives (predicted negative and actually negative), FP is the number of false positives (predicted positive but actually negative), and FN is the number of false negatives (predicted negative but actually positive).
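The accuracy formula, together with the precision, recall, and F1 score mentioned above, can be computed from true and predicted labels as in the sketch below. Treating 1 as the positive (intrusion) class and returning 0.0 for an undefined ratio are assumptions.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int],
                     y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) with 1 = intrusion (positive), 0 = normal."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def _ratio(num: float, den: float) -> float:
    """Safe division: 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
    }


counts = confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(counts)                                       # (2, 1, 1, 1)
print(classification_metrics(*counts)["accuracy"])  # 0.6
```

Accuracy alone can look high on imbalanced traffic where normal records dominate, so precision and recall are useful companions for judging false alarms and missed intrusions.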
In this embodiment, preprocessing the data set and extracting features that identify intrusions improves the efficiency and accuracy of network intrusion detection, effectively intercepts network intrusion behavior, and addresses the high false alarm rate and low detection efficiency of traditional intrusion detection techniques.
As shown in fig. 3, a network intrusion detection device based on sparse logistic regression according to an embodiment of the present application includes:
the data acquisition module is used for acquiring a data set for intrusion detection in the network to be detected; the data set comprises network traffic data or a system log;
the data preprocessing module is used for carrying out normalization processing on continuous data in the data set to carry out characteristic scaling and carrying out encoding processing on discrete value data in the data set;
the feature extraction module is used for extracting features from the original data in the data set and dividing the data set after feature extraction into a training set and a test set, wherein the extracted features comprise the source address, destination address, port number and transport protocol of a network connection;
the model building module is used for building an intrusion detection probability model by using sparse logistic regression and carrying out intrusion detection probability model training;
the intrusion detection module is used for inputting the data set acquired in real time into the trained intrusion detection probability model, carrying out network intrusion detection and intercepting the detected intrusion behavior in time.
As one possible implementation of this embodiment, the data set is the NSL-KDD benchmark data set, which contains the features to be analyzed and a target label naming the intrusion or attack class.
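For the binary normal/intrusion setting used later in this description, a class name from an NSL-KDD-style label column can be collapsed to 0 or 1, and the labelled records split into training and test sets. The sketch below assumes that normal records carry the class name `normal` and uses a hypothetical 80/20 random split with a fixed seed; the record layout and function names are illustrative only.

```python
import random


def to_binary_label(class_name):
    """Map a class name to 0 (normal) or 1 (any intrusion/attack class)."""
    return 0 if class_name.strip().lower() == "normal" else 1


def train_test_split(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of the records and split it into (training set, test set)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed gives a reproducible split
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # Hypothetical records: (protocol, service, class name)
    records = [
        ("tcp", "http", "normal"),
        ("tcp", "private", "neptune"),
        ("udp", "domain_u", "normal"),
        ("icmp", "ecr_i", "smurf"),
        ("tcp", "ftp_data", "normal"),
    ]
    labels = [to_binary_label(r[-1]) for r in records]
    train, test = train_test_split(records, test_fraction=0.2)
    print(labels, len(train), len(test))  # [0, 1, 0, 1, 0] 4 1
```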
As a possible implementation manner of this embodiment, the process of performing, by the data preprocessing module, normalization processing on continuous data in the dataset is:
performing a linear transformation on the original data in the data set using min-max normalization:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x$ is the original data, $x'$ is the data transformed from the original data, $x' \in [0, 1]$, $x_{\max}$ is the maximum value in the original data, and $x_{\min}$ is the minimum value in the original data.
Min-max normalization linearly transforms the original data so that the transformed values fall in the interval [0, 1]. Assume the data set contains $N$ items, each with $p$-dimensional features, where $x_{ij}$ is the value of the $j$-th feature of the $i$-th item. Then
$$x'_{ij} = \frac{x_{ij} - \min_{1 \le k \le N} x_{kj}}{\max_{1 \le k \le N} x_{kj} - \min_{1 \le k \le N} x_{kj}}$$
where $\min_{k} x_{kj}$ and $\max_{k} x_{kj}$ are the minimum and maximum values of the $j$-th feature over all items before normalization, and $x'_{ij}$ is the normalized value of the $j$-th feature of the $i$-th item.
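A plain-Python sketch of the column-wise min-max scaling above follows. Fitting the minima and maxima on the training set and reusing them for the test set, and mapping a constant feature to 0.0 to avoid division by zero, are assumptions of this sketch rather than details from the description; the function names are hypothetical.

```python
def min_max_fit(rows):
    """Return per-feature minima and maxima computed column-wise over the rows."""
    n_features = len(rows[0])
    mins = [min(r[j] for r in rows) for j in range(n_features)]
    maxs = [max(r[j] for r in rows) for j in range(n_features)]
    return mins, maxs


def min_max_transform(rows, mins, maxs):
    """x'_ij = (x_ij - min_j) / (max_j - min_j); constant features are mapped to 0.0."""
    out = []
    for r in rows:
        out.append([
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(r, mins, maxs)
        ])
    return out


if __name__ == "__main__":
    train = [[0, 100], [5, 300], [10, 200]]
    mins, maxs = min_max_fit(train)
    print(min_max_transform(train, mins, maxs))  # [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]
```

When the training-set parameters are reused on unseen data, transformed values can fall slightly outside [0, 1]; whether to clip them is a design choice the description does not address.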
As a possible implementation manner of this embodiment, the process of performing, by the data preprocessing module, encoding processing on the discrete value data in the data set is:
discrete-valued data is encoded using one-hot encoding, in which a $K$-bit state register is used to encode $K$ states; each state has its own register bit, and only one bit is active at any time.
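The one-hot scheme can be sketched as follows (an illustration, not the applicant's code). Sorting the categories to fix the bit order, and raising a `KeyError` for a value outside the known categories, are assumptions of this sketch; the protocol names in the example are illustrative values and the function name is hypothetical.

```python
def one_hot_encode(values, categories=None):
    """Encode each value as a K-bit list with exactly one bit set (K = number of categories)."""
    if categories is None:
        categories = sorted(set(values))  # deterministic bit order
    index = {c: i for i, c in enumerate(categories)}
    encoded = []
    for v in values:
        bits = [0] * len(categories)
        bits[index[v]] = 1  # raises KeyError for a category not in `categories`
        encoded.append(bits)
    return encoded, categories


if __name__ == "__main__":
    vectors, cats = one_hot_encode(["tcp", "udp", "icmp", "tcp"])
    print(cats)     # ['icmp', 'tcp', 'udp']
    print(vectors)  # [[0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
```

Passing the category list learned on the training set as `categories` keeps the bit layout identical between training and test data.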
As a possible implementation manner of this embodiment, the model building module is specifically configured to:
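using sparse logistic regression to build the intrusion detection probability model:
$$p(\mathbf{x}) = \frac{1}{1 + \exp(-\mathbf{w}^{\top}\mathbf{x})}$$
where $p(\mathbf{x})$ is the intrusion detection probability, $\mathbf{w}$ is the vector of intrusion detection probability model parameters, and $\mathbf{x}$ is the feature vector extracted from the raw data;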
performing intrusion detection probability model training;
solving the intrusion detection probability model parameters by minimizing the logistic regression loss function, and applying $\ell_1$ regularization to the intrusion detection probability model parameters;
carrying out intrusion detection probability model parameter solving by using a sparse logistic regression algorithm to obtain optimal intrusion detection probability model parameters;
substituting the optimal intrusion detection probability model parameters into the intrusion detection probability model to obtain a trained intrusion detection probability model.
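A minimal sketch of the probability model $p(\mathbf{x}) = 1/(1 + \exp(-\mathbf{w}^{\top}\mathbf{x}))$ follows; it is illustrative only. The description does not mention a separate bias term, so none is used here (one could be emulated by appending a constant feature equal to 1, which is an assumption). The function names and the example weights are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # avoids overflow of exp(-z) for large negative z
    return ez / (1.0 + ez)


def intrusion_probability(w, x):
    """p(x) = sigmoid(w^T x); a zero weight means that feature has no influence."""
    return sigmoid(sum(wk * xk for wk, xk in zip(w, x)))


if __name__ == "__main__":
    w = [1.5, 0.0, -0.5]  # hypothetical sparse parameter vector
    x = [1.0, 9.0, 1.0]   # the second feature is ignored because w[1] == 0
    print(round(intrusion_probability(w, x), 4))  # sigmoid(1.0) -> 0.7311
```

The zero entry in `w` shows why sparsity matters here: features whose weights are driven to zero by the $\ell_1$ penalty drop out of the model entirely.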
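As a possible implementation manner of this embodiment, the logistic regression loss function to be minimized is:
$$L(\mathbf{w}) = -\sum_{i=1}^{N}\left[y_i \ln p(\mathbf{x}_i) + (1 - y_i)\ln\bigl(1 - p(\mathbf{x}_i)\bigr)\right]$$
where $L(\mathbf{w})$ is the loss value for the intrusion detection probability model parameters $\mathbf{w}$, $\mathbf{x}_i$ is the feature vector extracted from the $i$-th raw data record, $y_i$ is the label of that record, and $p(\mathbf{x}_i)$ is the output of the intrusion detection probability model;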
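$\ell_1$ regularization is applied to the intrusion detection probability model parameters as follows:
$$R(\mathbf{w}) = \lambda \lVert \mathbf{w} \rVert_1 = \lambda \sum_{k=1}^{p} \lvert w_k \rvert$$
where $R(\mathbf{w})$ is the regularization term, $\lambda$ is the regularization parameter, and $\lVert \mathbf{w} \rVert_1$ is the 1-norm of $\mathbf{w}$. The model parameters are therefore obtained by minimizing $L(\mathbf{w}) + \lambda \lVert \mathbf{w} \rVert_1$.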
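Features are extracted from the raw data. Assuming there are $N$ records $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$, where $y_i$ is the output (label) and $\mathbf{x}_i \in \mathbb{R}^{p}$ is the input feature vector, these records are used for training and testing the intrusion detection model.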
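The logistic loss and the $\ell_1$-penalized objective $L(\mathbf{w}) + \lambda\lVert\mathbf{w}\rVert_1$ can be evaluated as sketched below, assuming labels $y_i \in \{0, 1\}$ (1 meaning intrusion) and no bias term. Each cross-entropy term is rewritten as $\log(1 + e^{z_i}) - y_i z_i$ with $z_i = \mathbf{w}^{\top}\mathbf{x}_i$; this is an algebraic identity used here for numerical stability, not something stated in the description. Function names are hypothetical.

```python
import math


def log1pexp(z):
    """Numerically stable log(1 + e^z)."""
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def logistic_loss(w, X, y):
    """L(w) = -sum_i [y_i ln p_i + (1 - y_i) ln(1 - p_i)] with p_i = sigmoid(w^T x_i).

    Each term equals log(1 + e^{z_i}) - y_i * z_i where z_i = w^T x_i.
    """
    total = 0.0
    for xi, yi in zip(X, y):
        z = sum(wk * xk for wk, xk in zip(w, xi))
        total += log1pexp(z) - yi * z
    return total


def l1_regularized_objective(w, X, y, lam):
    """L(w) + lam * ||w||_1, the quantity minimized by sparse logistic regression."""
    return logistic_loss(w, X, y) + lam * sum(abs(wk) for wk in w)


if __name__ == "__main__":
    X = [[1.0, 2.0], [3.0, 4.0]]
    y = [0, 1]
    print(logistic_loss([0.0, 0.0], X, y))  # 2 * ln 2, about 1.3863
    print(l1_regularized_objective([1.0, -2.0], X, y, lam=0.5))
```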
As a possible implementation manner of this embodiment, the process of performing intrusion detection probability model training is:
assuming there are $N$ samples, $\mathbf{x}_i \in \mathbb{R}^{p}$ is the feature vector of the $i$-th sample, $p$ is the number of input variables, $X = (\mathbf{x}_1, \dots, \mathbf{x}_N)^{\top}$ is the $N \times p$ input matrix, and $Y = (y_1, \dots, y_N)^{\top}$ is the $N \times 1$ output matrix;
inputting sample data into an intrusion detection probability model to predict intrusion behavior;
$Y$ is a binary response vector: $y_i = 0$ when the corresponding data is normal, and $y_i = 1$ when the corresponding data is abnormal. When the intrusion detection probability $p(\mathbf{x})$ is below the decision threshold (e.g., 0.5), the data is classified as normal data; otherwise it is classified as abnormal (intrusion) data.
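The decision rule can be sketched as below. The default threshold of 0.5 and the convention that a probability exactly at the threshold is labelled abnormal are assumptions made for illustration, as is the function name.

```python
import math


def predict_labels(w, X, threshold=0.5):
    """Label each row of X: 1 (intrusion/abnormal) if p(x) >= threshold, else 0 (normal)."""
    labels = []
    for x in X:
        z = sum(wk * xk for wk, xk in zip(w, x))
        # numerically stable sigmoid; only the chosen branch is evaluated
        p = 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))
        labels.append(1 if p >= threshold else 0)
    return labels


if __name__ == "__main__":
    w = [1.0, -1.0]
    X = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
    print(predict_labels(w, X))  # [1, 0, 1]  (p is about 0.88, 0.12, 0.5)
```

Raising `threshold` trades recall for fewer false alarms, which is relevant to the false-alarm concern raised in this description.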
As a possible implementation manner of the present embodiment, the intrusion detection probability model parameters are solved by minimizing the logistic regression loss function with $\ell_1$ regularization; that is, a sparse logistic regression algorithm is used to find the optimal values of the intrusion detection probability model parameters. The specific process, shown in fig. 2, comprises the following steps:
Step 1, initialize the intrusion detection probability model parameters $\mathbf{w}^{0}$, the regularization parameter $\lambda$, and the counter $j = 0$;
Step 2, calculate the search point according to the following formula:
$$\mathbf{s}^{j} = \mathbf{w}^{j} + \alpha_j\left(\mathbf{w}^{j} - \mathbf{w}^{j-1}\right)$$
where $\alpha_j$ is a tuning parameter, and $\mathbf{w}^{j}$ and $\mathbf{w}^{j-1}$ are the values of the intrusion detection probability model parameters obtained in the current and previous iterations;
Step 3, calculate the gradient descent point with an adaptive step size:
$$\mathbf{u}^{j} = \mathbf{s}^{j} - t_j \nabla L(\mathbf{s}^{j})$$
where $t_j$ is the step size obtained by adaptive backtracking line search, and $\nabla L(\mathbf{s}^{j})$ is the gradient of the loss function at the search point $\mathbf{s}^{j}$;
Step 4, calculate the optimal value of the intrusion detection probability model parameters according to the following formula:
$$\mathbf{w}^{j+1} = \operatorname{sign}(\mathbf{u}^{j}) \odot \max\left(\lvert \mathbf{u}^{j} \rvert - t_j \lambda,\ 0\right)$$
where $\mathbf{u}^{j}$ is the gradient descent point, $\lambda$ is the regularization parameter, $\operatorname{sign}(\cdot)$ is the sign function, and the operations are applied element-wise;
Step 5, update the optimal value of the intrusion detection probability model parameters $\mathbf{w}^{j+1}$ and the step size $t_j$. When the difference between $\mathbf{w}^{j+1}$ and $\mathbf{w}^{j}$ is smaller than a threshold, return the updated optimal value $\mathbf{w}^{j+1}$; otherwise, increment the counter ($j \leftarrow j + 1$) and return to step 2 to continue calculating the optimal value of the intrusion detection probability model parameters.
As a possible implementation manner of this embodiment, the network intrusion detection device further includes:
the performance evaluation module is used for evaluating the performance of the trained intrusion detection probability model:
$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}$$
where $\mathrm{Acc}$ is the detection accuracy of the trained intrusion detection probability model, TP is the number of true positives, TN is the number of true negatives, FP is the number of false positives, and FN is the number of false negatives.
The application mines the original data and extracts feature attributes from the preprocessed data through regularized canonical correlation analysis; in the training stage, a sparse logistic regression model is trained using grid search combined with cross-validation to obtain a more efficient and accurate classifier; in the classification stage, the data is classified to obtain the performance evaluation result of the intrusion detection probability model. The application thereby improves the efficiency and accuracy of network intrusion detection, effectively intercepts network intrusion behavior, and addresses the high false alarm rate and low detection efficiency of conventional intrusion detection techniques.
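The training-stage procedure described above (grid search combined with cross-validation) can be sketched as follows, purely for illustration. The candidate $\lambda$ grid, $k = 3$ folds, accuracy as the selection score, and the simple fixed-step proximal-gradient (ISTA) solver used as a compact stand-in for the accelerated algorithm of steps 1 to 5 are all assumptions; the function names are hypothetical.

```python
import math
import random


def _sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_l1_logistic(X, y, lam, iters=300):
    """Compact ISTA fit of L(w) + lam * ||w||_1, used here only as a stand-in solver."""
    w = [0.0] * len(X[0])
    # Fixed step 1/L with L = 0.25 * ||X||_F^2, an upper bound on the gradient's Lipschitz constant
    step = 1.0 / max(0.25 * sum(v * v for row in X for v in row), 1e-12)
    for _ in range(iters):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            r = _sigmoid(sum(a * b for a, b in zip(w, xi))) - yi
            for k, xk in enumerate(xi):
                grad[k] += r * xk
        u = [wk - step * gk for wk, gk in zip(w, grad)]
        w = [math.copysign(max(abs(v) - step * lam, 0.0), v) for v in u]
    return w


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def grid_search_cv(X, y, lambdas, k=3, seed=0):
    """Return (best lambda, its cross-validated accuracy) over the candidate grid."""
    folds = k_fold_indices(len(X), k, seed)
    best_lam, best_acc = None, -1.0
    for lam in lambdas:
        correct = 0
        for fold in folds:
            held_out = set(fold)
            train = [i for i in range(len(X)) if i not in held_out]
            w = fit_l1_logistic([X[i] for i in train], [y[i] for i in train], lam)
            for i in fold:
                z = sum(a * b for a, b in zip(w, X[i]))
                correct += int((z >= 0) == (y[i] == 1))  # z >= 0 is equivalent to p(x) >= 0.5
        acc = correct / len(X)
        if acc > best_acc:  # ties keep the earlier (smaller) lambda in the grid
            best_lam, best_acc = lam, acc
    return best_lam, best_acc


if __name__ == "__main__":
    X = [[1.0, 0.1], [2.0, -0.2], [1.5, 0.2], [-1.0, 0.1], [-2.0, -0.1], [-1.5, -0.2]]
    y = [1, 1, 1, 0, 0, 0]
    print(grid_search_cv(X, y, lambdas=[0.01, 0.1, 1.0, 100.0]))
```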
The embodiment of the application provides a computer device comprising a processor, a memory and a bus, wherein the memory stores machine-readable instructions executable by the processor; when the device runs, the processor and the memory communicate through the bus, and the processor executes the machine-readable instructions to perform the steps of the sparse logistic regression-based network intrusion detection method.
In particular, the above memory and processor can be general-purpose memory and processor, which are not limited herein, and the above network intrusion detection method based on sparse logistic regression can be performed when the processor runs a computer program stored in the memory.
It will be appreciated by those skilled in the art that the structure of the computer device described above does not limit the computer device, which may include more or fewer components than shown, combine or split certain components, or use a different arrangement of components.
In some embodiments, the computer device may further include a touch screen operable to display a graphical user interface (e.g., a launch interface of an application) and to receive user operations on the graphical user interface (e.g., an operation that launches the application). The touch screen may include a display panel and a touch panel. The display panel may be an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode) display, or the like. The touch panel may collect touch or non-touch operations performed by the user on or near it and generate preset operation instructions, for example, operations performed on or near the touch panel with a finger, a stylus, or any other suitable object or accessory. The touch panel may include two parts: a touch detection device and a touch controller. The touch detection device detects the position and gesture of the user's touch, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device, converts it into information the processor can handle, sends it to the processor, and can receive and execute commands sent by the processor. The touch panel may be implemented as a resistive, capacitive, infrared, or surface acoustic wave panel, or with any technology developed in the future. Further, the touch panel may overlay the display panel, and the user may operate on or near the touch panel according to the graphical user interface shown on the display panel; when the touch panel detects an operation on or near it, the operation is passed to the processor to determine the user input, and the processor then provides a corresponding visual output on the display panel in response.
In addition, the touch panel and the display panel may be implemented as two independent components or may be integrated.
Corresponding to the above sparse logistic regression-based network intrusion detection method, the embodiment of the present application further provides a storage medium storing a computer program which, when executed by a processor, performs the steps of any of the sparse logistic regression-based network intrusion detection methods described above.
The network intrusion detection device provided by the embodiment of the application may be specific hardware on a piece of equipment, or software or firmware installed on the equipment. The device provided by the embodiment of the present application has the same implementation principle and technical effects as the foregoing method embodiment; for brevity, where the device embodiment does not mention something, reference may be made to the corresponding content in the method embodiment. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific operation of the system, apparatus and units described above may refer to the corresponding process in the method embodiment, which is not repeated here.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, for example, the division of modules is merely a logical function division, and there may be additional divisions in actual implementation, and for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted, or not performed. In addition, the coupling or direct coupling or communication connection shown or discussed with respect to each other may be through some communication interface, indirect coupling or communication connection of devices or modules, electrical, mechanical, or other form.
The modules illustrated as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in the embodiment provided by the application may be integrated in one processing module, or each module may exist alone physically, or two or more modules may be integrated in one module.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical aspects of the present application and not for limiting the same, and although the present application has been described in detail with reference to the above embodiments, it should be understood by those of ordinary skill in the art that: modifications and equivalents may be made to the specific embodiments of the application without departing from the spirit and scope of the application, which is intended to be covered by the claims.

Claims (8)

1. A network intrusion detection method based on sparse logistic regression is characterized by comprising the following steps:
collecting a data set for intrusion detection in a network to be detected; the data set comprises network traffic data or a system log;
carrying out normalization processing on continuous data in the data set to carry out feature scaling, and carrying out encoding processing on discrete value data in the data set;
extracting features from the original data in the data set, and dividing the data set after feature extraction into a training set and a test set, wherein the extracted features comprise the source address, destination address, port number and transport protocol of a network connection;
establishing an intrusion detection probability model by using sparse logistic regression, and training the intrusion detection probability model;
inputting the data set acquired in real time into a trained intrusion detection probability model, performing network intrusion detection, and intercepting the detected intrusion behavior in time;
the process of encoding the discrete value data in the dataset is as follows:
encoding the discrete-valued data using one-hot encoding, in which a $K$-bit state register is used to encode $K$ states; each state has its own register bit and only one bit is active at any time;
the method for establishing an intrusion detection probability model by using sparse logistic regression and performing intrusion detection probability model training comprises the following steps:
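using sparse logistic regression to build the intrusion detection probability model:
$$p(\mathbf{x}) = \frac{1}{1 + \exp(-\mathbf{w}^{\top}\mathbf{x})}$$
wherein $p(\mathbf{x})$ is the intrusion detection probability, $\mathbf{w}$ is the vector of intrusion detection probability model parameters, and $\mathbf{x}$ is the feature vector extracted from the raw data;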
performing intrusion detection probability model training;
solving the intrusion detection probability model parameters by minimizing the loss function of logistic regression, and applying $\ell_1$ regularization to the intrusion detection probability model parameters;
carrying out intrusion detection probability model parameter solving by using a sparse logistic regression algorithm to obtain optimal intrusion detection probability model parameters;
substituting the optimal intrusion detection probability model parameters into the intrusion detection probability model to obtain a trained intrusion detection probability model;
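the loss function of the logistic regression to be minimized is:
$$L(\mathbf{w}) = -\sum_{i=1}^{N}\left[y_i \ln p(\mathbf{x}_i) + (1 - y_i)\ln\bigl(1 - p(\mathbf{x}_i)\bigr)\right]$$
wherein $L(\mathbf{w})$ is the loss value for the intrusion detection probability model parameters $\mathbf{w}$, $\mathbf{x}_i$ is the feature vector extracted from the $i$-th raw data record, $y_i$ is the label of that record, and $p(\mathbf{x}_i)$ is the output of the intrusion detection probability model;
$\ell_1$ regularization is applied to the intrusion detection probability model parameters as follows:
$$R(\mathbf{w}) = \lambda \lVert \mathbf{w} \rVert_1$$
wherein $R(\mathbf{w})$ is the regularization term, $\lambda$ is the regularization parameter, and $\lVert \mathbf{w} \rVert_1$ is the 1-norm of $\mathbf{w}$.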
2. The sparse logistic regression-based network intrusion detection method according to claim 1, wherein the data set is the NSL-KDD benchmark data set, which contains the features to be analyzed and a target label naming the intrusion or attack class.
3. The sparse logistic regression-based network intrusion detection method of claim 1, wherein the process of performing intrusion detection probability model training is:
assuming there are $N$ samples, $\mathbf{x}_i \in \mathbb{R}^{p}$ is the feature vector of the $i$-th sample, $p$ is the number of input variables, $X = (\mathbf{x}_1, \dots, \mathbf{x}_N)^{\top}$ is the $N \times p$ input matrix, and $Y = (y_1, \dots, y_N)^{\top}$ is the $N \times 1$ output matrix;
inputting sample data into an intrusion detection probability model to predict intrusion behavior;
$Y$ is a binary response vector: $y_i = 0$ when the corresponding data is normal and $y_i = 1$ when the corresponding data is abnormal; when the intrusion detection probability $p(\mathbf{x})$ is below the decision threshold (e.g., 0.5), the data is normal data; when $p(\mathbf{x})$ reaches or exceeds the decision threshold, the data is abnormal intrusion data.
4. The sparse logistic regression-based network intrusion detection method according to claim 3, wherein solving the intrusion detection probability model parameters by minimizing the loss function of logistic regression with $\ell_1$ regularization comprises:
step 1, initializing the intrusion detection probability model parameters $\mathbf{w}^{0}$, the regularization parameter $\lambda$, and the counter $j = 0$;
step 2, calculating the search point according to the following formula:
$$\mathbf{s}^{j} = \mathbf{w}^{j} + \alpha_j\left(\mathbf{w}^{j} - \mathbf{w}^{j-1}\right)$$
wherein $\alpha_j$ is a tuning parameter, and $\mathbf{w}^{j}$ and $\mathbf{w}^{j-1}$ are the values of the intrusion detection probability model parameters obtained in the current and previous iterations;
step 3, calculating the gradient descent point with an adaptive step size:
$$\mathbf{u}^{j} = \mathbf{s}^{j} - t_j \nabla L(\mathbf{s}^{j})$$
wherein $t_j$ is the step size obtained by adaptive backtracking line search, and $\nabla L(\mathbf{s}^{j})$ is the gradient of the loss function at the search point $\mathbf{s}^{j}$;
step 4, calculating the optimal value of the intrusion detection probability model parameters according to the following formula:
$$\mathbf{w}^{j+1} = \operatorname{sign}(\mathbf{u}^{j}) \odot \max\left(\lvert \mathbf{u}^{j} \rvert - t_j \lambda,\ 0\right)$$
wherein $\mathbf{u}^{j}$ is the gradient descent point, $\lambda$ is the regularization parameter, $\operatorname{sign}(\cdot)$ is the sign function, and the operations are applied element-wise;
step 5, updating the optimal value of the intrusion detection probability model parameters $\mathbf{w}^{j+1}$ and the step size $t_j$; when the difference between $\mathbf{w}^{j+1}$ and $\mathbf{w}^{j}$ is smaller than a threshold, returning the updated optimal value $\mathbf{w}^{j+1}$; otherwise, incrementing the counter ($j \leftarrow j + 1$) and returning to step 2 to continue calculating the optimal value of the intrusion detection probability model parameters.
5. The sparse logistic regression-based network intrusion detection method of any one of claims 1-4, further comprising the steps of, prior to network intrusion detection:
performing performance evaluation on the trained intrusion detection probability model:
$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}$$
wherein $\mathrm{Acc}$ is the detection accuracy of the trained intrusion detection probability model, TP is the number of true positives, TN is the number of true negatives, FP is the number of false positives, and FN is the number of false negatives.
6. A sparse logistic regression-based network intrusion detection device, comprising:
the data acquisition module is used for acquiring a data set for intrusion detection in the network to be detected; the data set comprises network traffic data or a system log;
the data preprocessing module is used for normalizing continuous data in the data set for feature scaling and for encoding discrete-valued data in the data set;
the feature extraction module is used for extracting features from the original data in the data set and dividing the data set after feature extraction into a training set and a test set, wherein the extracted features comprise the source address, destination address, port number and transport protocol of a network connection;
the model building module is used for building an intrusion detection probability model by using sparse logistic regression and carrying out intrusion detection probability model training;
the intrusion detection module is used for inputting the data set acquired in real time into the trained intrusion detection probability model, carrying out network intrusion detection and intercepting the detected intrusion behavior in time;
the process of encoding the discrete value data in the dataset is as follows:
encoding the discrete-valued data using one-hot encoding, in which a $K$-bit state register is used to encode $K$ states; each state has its own register bit and only one bit is active at any time;
the method for establishing an intrusion detection probability model by using sparse logistic regression and performing intrusion detection probability model training comprises the following steps:
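using sparse logistic regression to build the intrusion detection probability model:
$$p(\mathbf{x}) = \frac{1}{1 + \exp(-\mathbf{w}^{\top}\mathbf{x})}$$
wherein $p(\mathbf{x})$ is the intrusion detection probability, $\mathbf{w}$ is the vector of intrusion detection probability model parameters, and $\mathbf{x}$ is the feature vector extracted from the raw data;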
performing intrusion detection probability model training;
solving the intrusion detection probability model parameters by minimizing the loss function of logistic regression, and applying $\ell_1$ regularization to the intrusion detection probability model parameters;
carrying out intrusion detection probability model parameter solving by using a sparse logistic regression algorithm to obtain optimal intrusion detection probability model parameters;
substituting the optimal intrusion detection probability model parameters into the intrusion detection probability model to obtain a trained intrusion detection probability model;
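the loss function of the logistic regression to be minimized is:
$$L(\mathbf{w}) = -\sum_{i=1}^{N}\left[y_i \ln p(\mathbf{x}_i) + (1 - y_i)\ln\bigl(1 - p(\mathbf{x}_i)\bigr)\right]$$
wherein $L(\mathbf{w})$ is the loss value for the intrusion detection probability model parameters $\mathbf{w}$, $\mathbf{x}_i$ is the feature vector extracted from the $i$-th raw data record, $y_i$ is the label of that record, and $p(\mathbf{x}_i)$ is the output of the intrusion detection probability model;
$\ell_1$ regularization is applied to the intrusion detection probability model parameters as follows:
$$R(\mathbf{w}) = \lambda \lVert \mathbf{w} \rVert_1$$
wherein $R(\mathbf{w})$ is the regularization term, $\lambda$ is the regularization parameter, and $\lVert \mathbf{w} \rVert_1$ is the 1-norm of $\mathbf{w}$.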
7. A computer device comprising a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory in communication via the bus when the computer device is in operation, the processor executing the machine-readable instructions to perform the steps of the sparse logistic regression-based network intrusion detection method according to any one of claims 1 to 5.
8. A storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the sparse logistic regression based network intrusion detection method according to any one of claims 1 to 5.
CN202310825797.8A 2023-07-07 2023-07-07 Sparse logistic regression-based network intrusion detection method and device Active CN116545783B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310825797.8A CN116545783B (en) 2023-07-07 2023-07-07 Sparse logistic regression-based network intrusion detection method and device

Publications (2)

Publication Number Publication Date
CN116545783A CN116545783A (en) 2023-08-04
CN116545783B true CN116545783B (en) 2023-10-03

Family

ID=87451037

Country Status (1)

Country Link
CN (1) CN116545783B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114553545A (en) * 2022-02-24 2022-05-27 中国人民解放军海军航空大学航空基础学院 Intrusion flow detection and identification method and system
CN115021997A (en) * 2022-05-26 2022-09-06 广州中南网络技术有限公司 Network intrusion detection system based on machine learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160065594A1 (en) * 2014-08-29 2016-03-03 Verizon Patent And Licensing Inc. Intrusion detection platform

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
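Network Intrusion Detection through Discriminative Feature Selection by Using Sparse Logistic Regression; Reehan Ali Shah et al.; Future Internet; Sections 2-4 *
Network Intrusion Classification Based on Sparse Model Combination (基于稀疏模型组合的网络入侵分类); Reehan Ali Shah; CNKI Doctoral Dissertations Full-text Database; pp. 42-50, 85-86 *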

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant