CN113434859B - Intrusion detection method, device, equipment and storage medium - Google Patents

Intrusion detection method, device, equipment and storage medium

Info

Publication number
CN113434859B
Authority
CN
China
Prior art keywords
feature
model
intrusion detection
features
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110740041.4A
Other languages
Chinese (zh)
Other versions
CN113434859A (en
Inventor
李泽远
王健宗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202110740041.4A
Publication of CN113434859A
Application granted
Publication of CN113434859B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G06F18/2115 Selection of the most significant subset of features by evaluating different subsets according to an optimisation criterion, e.g. class separability, forward selection or backward elimination
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroïds
    • G06F18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Computer Hardware Design (AREA)
  • Mathematical Physics (AREA)
  • Algebra (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Databases & Information Systems (AREA)
  • Virology (AREA)
  • General Health & Medical Sciences (AREA)
  • Operations Research (AREA)
  • Image Analysis (AREA)
  • Burglar Alarm Systems (AREA)

Abstract

The invention relates to the field of artificial intelligence and discloses an intrusion detection method, device, equipment and storage medium. The method comprises: acquiring a sample data set from a database and determining all features in the sample data set; performing primary feature selection on all features, based on a feature selection algorithm, to obtain a candidate feature set; calculating the weight distribution of the features of the candidate feature set in a combined model and performing secondary feature selection to obtain an optimal feature subset; training an initial intrusion detection model on the optimal feature subset; obtaining an intrusion detection model based on a federated learning framework; and acquiring data to be detected, inputting it into the intrusion detection model to obtain a detection classification result, and issuing an early warning if the detection classification result is an intrusion behavior. By performing two-step feature selection on the global data, the method improves the training efficiency of federated learning and the accuracy of identifying intrusion behaviors. The invention also relates to blockchain technology, and the database may be stored in a blockchain.

Description

Intrusion detection method, device, equipment and storage medium
Technical Field
The present invention relates to the field of artificial intelligence, and in particular, to an intrusion detection method, apparatus, device, and storage medium.
Background
The Internet brings convenience to people's daily life and work, but it also gives rise to malicious behavior: deceptive web pages that sell counterfeit goods and phishing pages that steal users' money or personal information; attackers who flood a computer system with large volumes of malicious traffic until it is paralyzed; Trojan horse viruses injected to take control of a host and carry out malicious actions; and so on. How to identify intrusion behavior in a computer network by means of intrusion detection technology is an important research problem in the field of Internet security.
By introducing federated learning, intrusion detection technology can combine multiple security mechanisms to expand the volume of data while keeping the data secure, and it is highly effective at detecting compromised Internet of Things devices. However, because federated learning expands the data volume, a large number of network traffic data samples contain irrelevant, incomplete and redundant features, which lowers the classification accuracy of attack detection and lengthens training time.
Disclosure of Invention
The main object of the invention is to solve the technical problem that existing federated learning achieves low accuracy in intrusion detection.
A first aspect of the present invention provides an intrusion detection method, including: acquiring a sample data set from a preset database, and determining all features in the sample data set; performing, based on a preset feature selection algorithm, data quality evaluation on all features in the sample data set, and performing primary feature selection from all the features to obtain a candidate feature set of the sample data set; calculating the weight distribution of the features of the candidate feature set in a preset combined model, and performing secondary feature selection according to the weight distribution to obtain an optimal feature subset; performing model training with the optimal feature subset as a training sample to obtain an initial intrusion detection model; uploading the model parameters and weights of the initial intrusion detection model to a central server for integration, so that the central server performs a joint training task to generate a global model; taking the model parameters and weights of the global model returned by the central server as the model parameters and weights of the initial intrusion detection model, and evaluating the training sample with a preset loss function to obtain a loss function value; if the loss function value is smaller than a preset threshold, obtaining an intrusion detection model from the model parameters and weights of the initial intrusion detection model; acquiring data to be detected, and inputting the data to be detected into the intrusion detection model to obtain a detection classification result, wherein the detection classification result comprises normal behavior or intrusion behavior; and if the detection classification result is an intrusion behavior, generating an early warning signal and sending the early warning signal to a terminal of a staff member so as to give an intrusion detection early warning.
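The integration step at the central server and the loss threshold check can be illustrated with a minimal sketch. The text does not state how the central server integrates the uploaded parameters, so the sample-size-weighted average used here (in the style of federated averaging) is an assumption, and the names `federated_average` and `accept_global_model` are hypothetical.

```python
from typing import Dict, List

# A model is represented as named parameter vectors (an assumption for this sketch).
Params = Dict[str, List[float]]


def federated_average(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Integrate client parameters by a sample-size-weighted average.

    This aggregation rule is assumed; the source only says the central
    server "integrates" the uploaded parameters and weights.
    """
    total = sum(client_sizes)
    merged: Params = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        merged[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return merged


def accept_global_model(loss_value: float, threshold: float) -> bool:
    """Keep the returned global parameters only if the loss is below the preset threshold."""
    return loss_value < threshold
```

In this sketch a client with more local samples has proportionally more influence on the global model; an unweighted mean would be an equally plausible reading of the text.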
Optionally, in a first implementation manner of the first aspect of the present invention, the feature selection algorithm includes an information gain algorithm and a chi-square test algorithm, and before the performing, based on the preset feature selection algorithm, data quality evaluation on all features in the sample data set and performing primary feature selection from all the features to obtain a candidate feature set of the sample data set, the method further includes: determining the number of features in the sample data set; determining whether the number of features is greater than a preset number; if so, using the information gain algorithm as the feature selection algorithm; if not, using the chi-square test algorithm as the feature selection algorithm.
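The choice between the two algorithms reduces to a single comparison. A minimal sketch follows; the function name and the string labels are hypothetical and chosen only for illustration.

```python
def choose_selection_algorithm(feature_count: int, preset_count: int) -> str:
    """Return the algorithm label for primary feature selection.

    Information gain is used when the number of features exceeds the
    preset number; otherwise the chi-square test is used.
    """
    if feature_count > preset_count:
        return "information_gain"
    return "chi_square"
```

Note that a feature count exactly equal to the preset number falls on the chi-square side, since the condition in the text is "larger than".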
Optionally, in a second implementation manner of the first aspect of the present invention, when the feature selection algorithm is an information gain algorithm, the performing, based on a preset feature selection algorithm, data quality evaluation on all features in the sample data set and performing primary feature selection from all the features to obtain a candidate feature set of the sample data set includes: calculating the overall entropy of the sample data set according to the information gain algorithm, and calculating the conditional entropy of each feature in the sample data set; calculating the information gain value of each feature in the sample data set from the overall entropy and the conditional entropy of that feature; and taking the features whose information gain value is greater than a preset gain value as candidate features to obtain the candidate feature set.
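The entropy, conditional entropy and gain threshold described above can be sketched as follows. The sketch assumes discrete (categorical) feature values and base-2 logarithms, neither of which is specified in the text; the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (base 2) of a label sequence."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Overall entropy minus the conditional entropy given one discrete feature."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    total = len(labels)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_by_gain(features: Dict[str, Sequence[Hashable]],
                   labels: Sequence[Hashable],
                   preset_gain: float) -> List[str]:
    """Keep the features whose information gain exceeds the preset gain value."""
    return [name for name, values in features.items()
            if information_gain(values, labels) > preset_gain]
```

Continuous features such as packet sizes would need to be binned before this calculation applies; the binning scheme is left open here.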
Optionally, in a third implementation manner of the first aspect of the present invention, when the feature selection algorithm is a chi-square test algorithm, the performing, based on a preset feature selection algorithm, data quality evaluation on all features in the sample data set and performing primary feature selection from all the features to obtain a candidate feature set of the sample data set includes: determining a classification variable in the sample data set; calculating the chi-square value of each feature in the sample data set according to the chi-square test algorithm and the classification variable; and taking the features whose chi-square value is greater than a preset chi-square value as candidate features to obtain the candidate feature set.
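A minimal sketch of the chi-square value for one feature against the classification variable is given below. It uses the standard Pearson statistic over the contingency table of feature value and class label, and it assumes discrete feature values; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def chi_square(feature_values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Pearson chi-square statistic between a discrete feature and the class label."""
    total = len(labels)
    observed = Counter(zip(feature_values, labels))
    value_counts = Counter(feature_values)
    label_counts = Counter(labels)
    statistic = 0.0
    for value, value_count in value_counts.items():
        for label, label_count in label_counts.items():
            # Expected cell count if the feature and the label were independent.
            expected = value_count * label_count / total
            statistic += (observed[(value, label)] - expected) ** 2 / expected
    return statistic


def select_by_chi_square(features: Dict[str, Sequence[Hashable]],
                         labels: Sequence[Hashable],
                         preset_chi_square: float) -> List[str]:
    """Keep the features whose chi-square value exceeds the preset chi-square value."""
    return [name for name, values in features.items()
            if chi_square(values, labels) > preset_chi_square]
```

A larger statistic indicates a stronger dependence between the feature and the class label, which is why features above the preset value are kept.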
Optionally, in a fourth implementation manner of the first aspect of the present invention, the calculating the weight distribution of the features of the candidate feature set in a preset combined model and performing secondary feature selection according to the weight distribution to obtain the optimal feature subset includes: randomly dividing the candidate feature set into n equal parts to obtain n feature subsets, where n is a natural number not less than 1; dividing each feature subset equally according to the number of preset combined models, inputting the parts into the combined models respectively, and calculating the weight distribution of the features in the feature subset; sorting the features in the candidate feature set according to the weight distribution of the features in the n feature subsets, and deleting the features with the smallest weight to obtain the optimal feature subset of the current round; determining whether the current number of model training rounds is greater than or equal to n; if not, re-sorting the features and returning to the step of sorting the features in the candidate feature set according to the weight distribution of the features in the n feature subsets; if so, outputting the optimal feature subset of the current round as the optimal feature subset.
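The round-based elimination can be sketched as a loop. This is one possible reading of the steps above, not a definitive implementation: it assumes that exactly one feature (the one with the smallest weight) is removed per round, that the split is redrawn each round, and that the combined models are abstracted behind a hypothetical callable `score_subset` returning one weight per feature.

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for the combined models: maps a feature subset to weights.
Scorer = Callable[[Sequence[str]], Dict[str, float]]


def split_evenly(items: Sequence[str], parts: int, rng: random.Random) -> List[List[str]]:
    """Shuffle the items and deal them into `parts` subsets of near-equal size."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    return [shuffled[i::parts] for i in range(parts)]


def secondary_selection(candidates: Sequence[str], n: int,
                        score_subset: Scorer, seed: int = 0) -> List[str]:
    """Run n rounds; each round scores the n subsets and drops the weakest feature."""
    rng = random.Random(seed)
    remaining = list(candidates)
    for _ in range(n):
        if len(remaining) <= 1:
            break
        weights: Dict[str, float] = {}
        for subset in split_evenly(remaining, n, rng):
            if subset:  # subsets can be empty when few features remain
                weights.update(score_subset(subset))
        weakest = min(remaining, key=lambda name: weights[name])
        remaining.remove(weakest)
    return remaining
```

With this reading, n controls both the number of subsets and the number of features removed, so the optimal feature subset has at most n fewer features than the candidate feature set.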
Optionally, in a fifth implementation manner of the first aspect of the present invention, the combination model includes an xgboost model, an svm model and a random forest model, and the calculating the weight distribution of the features in the feature subset includes: dividing the feature subset into three parts uniformly according to the number of the combined models to obtain three corresponding input feature sets; respectively inputting the three input feature sets into an xgboost model, an svm model and a random forest model, and carrying out feature scoring on the features in the input feature sets through the xgboost model, the svm model and the random forest model to obtain feature weight values of the three input feature sets; and calculating the weight distribution of the features in the feature subset according to the three feature weight values.
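How the three sets of feature weight values are merged into one weight distribution is not spelled out, so the following is a sketch under stated assumptions: each model's raw scores are normalized to sum to 1 so that the models are comparable, and a feature scored by several models receives the mean of its normalized scores. The per-model scores are passed in as plain dictionaries; in practice they might come from tree-based feature importances or linear SVM coefficient magnitudes, which is also an assumption.

```python
from typing import Dict, List


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale one model's absolute scores so that they sum to 1."""
    total = sum(abs(s) for s in scores.values())
    if total == 0:
        return {name: 0.0 for name in scores}
    return {name: abs(s) / total for name, s in scores.items()}


def combine_weights(per_model_scores: List[Dict[str, float]]) -> Dict[str, float]:
    """Average the normalized scores of each feature over the models that scored it."""
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for scores in per_model_scores:
        for name, weight in normalize(scores).items():
            sums[name] = sums.get(name, 0.0) + weight
            counts[name] = counts.get(name, 0) + 1
    return {name: sums[name] / counts[name] for name in sums}
```

For example, `combine_weights([{"a": 3.0, "b": 1.0}, {"a": 1.0, "b": 1.0}])` gives feature `a` the mean of 0.75 and 0.5. Normalizing per model keeps a model with large raw scores from dominating the others.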
Optionally, in a sixth implementation manner of the first aspect of the present invention, the performing model training with the optimal feature subset as a training sample to obtain an initial intrusion detection model includes: inputting the training sample into a preset neural network model, and outputting an output value of intrusion classification prediction; calculating an initial loss value of the output value according to the output value and a preset initial loss function; and updating the model parameters of the neural network model according to the initial loss value and the output value, and continuously executing the step of inputting the model training sample into the neural network model until the initial loss value converges, so as to obtain an initial intrusion detection model.
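The train-until-convergence loop can be sketched with the smallest possible network. The text does not specify the architecture, loss function or convergence test, so the single sigmoid unit, the cross-entropy loss, the learning rate and the tolerance below are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_until_converged(samples: Sequence[Sequence[float]], labels: Sequence[int],
                          lr: float = 0.5, tol: float = 1e-6,
                          max_epochs: int = 10000) -> Tuple[List[float], float, float]:
    """Gradient descent on cross-entropy until the loss changes by less than tol."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    previous = float("inf")
    count = len(labels)
    loss = previous
    for _ in range(max_epochs):
        # Forward pass: output value of the intrusion classification prediction.
        outputs = [sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)
                   for row in samples]
        loss = -sum(y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12)
                    for p, y in zip(outputs, labels)) / count
        if abs(previous - loss) < tol:
            break  # the loss value has converged
        previous = loss
        # Update the parameters from the prediction errors.
        for j in range(len(weights)):
            grad = sum((p - y) * row[j]
                       for p, y, row in zip(outputs, labels, samples)) / count
            weights[j] -= lr * grad
        bias -= lr * sum(p - y for p, y in zip(outputs, labels)) / count
    return weights, bias, loss
```

Labels are encoded as 0 for normal behavior and 1 for intrusion behavior in this sketch. The returned weights and bias play the role of the model parameters that would then be uploaded to the central server.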
A second aspect of the present invention provides an intrusion detection device comprising: a feature acquisition module, configured to acquire a sample data set from a preset database and determine all features in the sample data set; a first feature selection module, configured to perform, based on a preset feature selection algorithm, data quality evaluation on all features in the sample data set, and to perform primary feature selection from all the features to obtain a candidate feature set of the sample data set; a second feature selection module, configured to calculate the weight distribution of the features of the candidate feature set in a preset combined model, and to perform secondary feature selection according to the weight distribution to obtain an optimal feature subset; an initial model training module, configured to perform model training with the optimal feature subset as a training sample to obtain an initial intrusion detection model; a parameter uploading module, configured to upload the model parameters and weights of the initial intrusion detection model to a central server for integration, so that the central server performs a joint training task to generate a global model; a loss function calculation module, configured to take the model parameters and weights of the global model returned by the central server as the model parameters and weights of the initial intrusion detection model, and to evaluate the training sample with a preset loss function to obtain a loss function value; a model generation module, configured to obtain an intrusion detection model from the model parameters and weights of the initial intrusion detection model when the loss function value is smaller than a preset threshold; a detection module, configured to acquire data to be detected and input the data to be detected into the intrusion detection model to obtain a detection classification result, wherein the detection classification result comprises normal behavior or intrusion behavior; and an early warning module, configured to generate an early warning signal when the detection classification result is an intrusion behavior, and to send the early warning signal to a terminal of a staff member so as to give an intrusion detection early warning.
Optionally, in a first implementation manner of the second aspect of the present invention, the intrusion detection device further includes an algorithm selection module, where the algorithm selection module is specifically configured to: determining a feature quantity of all features in the sample dataset; judging whether the feature quantity is larger than a preset quantity or not; if yes, the feature selection algorithm uses an information gain algorithm to perform feature selection; if not, the feature selection algorithm uses a chi-square test algorithm to perform feature selection.
Optionally, in a second implementation manner of the second aspect of the present invention, when the feature selection algorithm is an information gain algorithm, the first feature selection module is specifically configured to: calculate the overall entropy of the sample data set according to the information gain algorithm, and calculate the conditional entropy of each feature in the sample data set; calculate the information gain value of each feature in the sample data set from the overall entropy and the conditional entropy of that feature; and take the features whose information gain value is greater than a preset gain value as candidate features to obtain the candidate feature set.
Optionally, in a third implementation manner of the second aspect of the present invention, when the feature selection algorithm is a chi-square test algorithm, the first feature selection module is specifically configured to: determine a classification variable in the sample data set; calculate the chi-square value of each feature in the sample data set according to the chi-square test algorithm and the classification variable; and take the features whose chi-square value is greater than a preset chi-square value as candidate features to obtain the candidate feature set.
Optionally, in a fourth implementation manner of the second aspect of the present invention, the second feature selection module specifically includes: a dividing unit, configured to randomly divide the candidate feature set into n equal parts to obtain n feature subsets, where n is a natural number not less than 1; a weight calculation unit, configured to divide each feature subset equally according to the number of preset combined models, input the parts into the combined models respectively, and calculate the weight distribution of the features in the feature subset; a sorting unit, configured to sort the features in the candidate feature set according to the weight distribution of the features in the n feature subsets, and to delete the features with the smallest weight to obtain the optimal feature subset of the current round; a judging unit, configured to determine whether the current number of model training rounds is greater than or equal to n; a returning unit, configured to re-sort the features when the current number of model training rounds is smaller than n, and to return to the step of sorting the features in the candidate feature set according to the weight distribution of the features in the n feature subsets; and a feature output unit, configured to output the optimal feature subset of the current round as the optimal feature subset when the current number of model training rounds is greater than or equal to n.
Optionally, in a fifth implementation manner of the second aspect of the present invention, the combined model includes an xgboost model, an svm model and a random forest model, and the weight calculating unit is specifically configured to: dividing the feature subset into three parts uniformly according to the number of the combined models to obtain three corresponding input feature sets; respectively inputting the three input feature sets into an xgboost model, an svm model and a random forest model, and carrying out feature scoring on the features in the input feature sets through the xgboost model, the svm model and the random forest model to obtain feature weight values of the three input feature sets; and calculating the weight distribution of the features in the feature subset according to the three feature weight values.
Optionally, in a sixth implementation manner of the second aspect of the present invention, the initial model training module is specifically configured to: inputting the training sample into a preset neural network model, and outputting an output value of intrusion classification prediction; calculating an initial loss value of the output value according to the output value and a preset initial loss function; and updating the model parameters of the neural network model according to the initial loss value and the output value, and continuously executing the step of inputting the model training sample into the neural network model until the initial loss value converges, so as to obtain an initial intrusion detection model.
A third aspect of the present invention provides an intrusion detection device comprising: a memory and at least one processor, the memory having instructions stored therein, the memory and the at least one processor being interconnected by a line; the at least one processor invokes the instructions in the memory to cause the intrusion detection device to perform the steps of the intrusion detection method described above.
A fourth aspect of the invention provides a computer readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the steps of the intrusion detection method described above.
In the technical solution of the invention, a sample data set is acquired from a database and all features in the sample data set are determined; primary feature selection is performed on all features, based on a feature selection algorithm, to obtain a candidate feature set; the weight distribution of the features of the candidate feature set in a combined model is calculated, and secondary feature selection is performed to obtain an optimal feature subset; an initial intrusion detection model is trained on the optimal feature subset; based on a federated learning framework, the model parameters of the trained initial intrusion detection model are sent to a central server for parameter integration to generate a global model, iteration is carried out with the model parameters of the global model, and the intrusion detection model is finally obtained; data to be detected is acquired and input into the intrusion detection model to obtain a detection classification result, and an early warning is issued if the detection classification result is an intrusion behavior. In this method, the global data is pre-screened, the optimal feature subset is selected after training of the combined model, and federated learning is then carried out; simplifying the input data and removing redundancy improves the training efficiency of federated learning and the accuracy of identifying intrusion behaviors.
Drawings
FIG. 1 is a schematic diagram of a first embodiment of an intrusion detection method according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating a second embodiment of an intrusion detection method according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a third embodiment of an intrusion detection method according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating a fourth embodiment of an intrusion detection method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a fifth embodiment of an intrusion detection method according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an intrusion detection device according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of an intrusion detection device according to another embodiment of the present invention;
FIG. 8 is a schematic diagram of an embodiment of an intrusion detection device according to an embodiment of the present invention.
Detailed Description
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
For ease of understanding, a specific flow of an embodiment of the present invention is described below. Referring to FIG. 1, a first embodiment of the intrusion detection method in an embodiment of the present invention includes:
101. Acquiring a sample data set from a preset database, and determining all features in the sample data set;
It will be appreciated that the executing entity of the present invention may be an intrusion detection device, or may be a terminal or a server, which is not limited herein. The embodiment of the present invention is described with a server as the executing entity.
It is emphasized that the database may be stored in a blockchain node in order to ensure privacy and security of the data.
In this embodiment, while each security mechanism is operating, sensors collect data from network data packets, log files, system call traces and the like in the system; the data is manually labeled and then stored in the database. When an intrusion detection model needs to be trained, the data is retrieved from the database as sample data.
102. Based on a preset feature selection algorithm, performing data quality evaluation on all features in the sample data set, and performing primary feature selection from all the features to obtain a candidate feature set of the sample data set;
In this embodiment, the feature selection algorithm mainly includes an information gain algorithm and a chi-square test algorithm. In general, which algorithm to use is determined by the order of magnitude of the data in the sample data set: when the number of features in the data set reaches tens of millions, the information gain algorithm more readily yields a subset of higher purity; when individual features occur infrequently but are nonetheless key features, the chi-square test algorithm is used.
In this embodiment, the feature selection algorithm performs preliminary feature selection on all features in the sample data set: a preset number of features is selected from all existing features in the sample data set so as to optimize a specific index of the system, which reduces the dimensionality of the data set and improves the efficiency and performance of subsequent model training.
103. Calculating weight distribution of the features in the candidate feature set in a preset combined model, and performing secondary feature selection according to the weight distribution to obtain an optimal feature subset;
In this embodiment, feature selection is performed a second time on the candidate feature set obtained from the first feature selection. The candidate feature set is divided equally into feature subsets, and each feature subset is divided again according to the number of models in the combined model. In this embodiment the combined model contains three models, namely xgboost, svm and random forest, each of which evaluates the importance of the features input to it. A combined model is used because feature selection through a single model easily leads to poor classification; for example, the same input sample may be classified well by xgboost but poorly by svm. After the combined model has evaluated a given feature, as many feature importance evaluation values are obtained as there are models in the combined model. From these values a comprehensive feature importance evaluation value of the feature is obtained, and the weight distribution of the features is then obtained from the comprehensive feature importance evaluation values of all features in the candidate feature set.
In this embodiment, a feature with a lower weight contributes less. The feature with the smallest weight is deleted from the candidate feature set, and the remaining features are input into the combined model again for weight distribution calculation. When the number of cycles reaches a preset number, the remaining features are output as the optimal feature subset.
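The elimination loop described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `combined_weights` and `select_optimal_subset` are hypothetical, each scorer is a stand-in for one model of the combined model (xgboost, svm, random forest), and taking the plain average of the per-model scores as the comprehensive value is an assumption, since the text does not state how the scores are combined.

```python
from typing import Callable, Dict, List, Sequence

# A scorer stands in for one model of the combined model: it maps the
# current feature names to one importance score per feature.
Scorer = Callable[[Sequence[str]], Dict[str, float]]


def combined_weights(features: Sequence[str], scorers: Sequence[Scorer]) -> Dict[str, float]:
    """Average the per-model importance scores (assumed combination rule)."""
    totals = {name: 0.0 for name in features}
    for scorer in scorers:
        scores = scorer(features)
        for name in features:
            totals[name] += scores[name]
    return {name: totals[name] / len(scorers) for name in features}


def select_optimal_subset(candidates: Sequence[str],
                          scorers: Sequence[Scorer],
                          rounds: int) -> List[str]:
    """Drop the lowest-weight feature once per round, for a preset number of rounds."""
    remaining = list(candidates)
    for _ in range(rounds):
        if len(remaining) <= 1:
            break  # never delete the last remaining feature
        weights = combined_weights(remaining, scorers)
        weakest = min(remaining, key=lambda name: weights[name])
        remaining.remove(weakest)
    return remaining
```

In practice each scorer would retrain its model on the remaining features before scoring, which is why the weights are recomputed in every round rather than once.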
104. Taking the optimal feature subset as a training sample to perform model training to obtain an initial intrusion detection model;
In practical application, the initial intrusion detection model can be trained with any of various classification algorithms, for example intrusion detection classification algorithms based on a Support Vector Machine (SVM), K-Means, KNN or decision trees. The training process of the initial intrusion detection model is existing technology and is not limited herein.
In this embodiment, the initial model parameters may be downloaded from a central server, so as to ensure that each security mechanism performs model training based on the same conditions.
105. Uploading model parameters and weights of the initial intrusion detection model to a central server for integration so that the central server performs a joint training task to generate a global model;
In this embodiment, the sample data of a single security mechanism may suffer from small data volume, poor data quality and similar problems, because the data of each security mechanism circulates only internally and data islands therefore appear. Federated learning is essentially a distributed machine learning technique, or machine learning framework, whose aim is to achieve joint modeling while ensuring data privacy, security and legal compliance, thereby improving the effect of the AI model.
106. Taking the model parameters and the weights of the global model returned by the central server as the model parameters and the weights of the initial intrusion detection model, and calculating training samples by using a preset loss function to obtain a loss function value;
In this embodiment, after completing the joint training task, the central server obtains the trained global model and sends the model parameters and weights of the global model to each security mechanism. Because the local training performed earlier by each security mechanism may have suffered from poor data quality or a small data volume, the global model obtained from the joint training task may not yet be good enough. Each security mechanism therefore updates its local initial intrusion detection model with the model parameters and weights of the global model, performs model training again, and sends the trained model parameters to the central server again. This iteration is repeated until the model parameters satisfy the stop condition, which indicates that the prediction accuracy of the global model is high enough; the intrusion detection model is then obtained from the model parameters and weights of the global model at the time the loop stops.
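The text says only that the central server "integrates" the uploaded parameters. One common way to do this is a sample-weighted average of the client parameters, in the style of federated averaging; the sketch below assumes that rule. The function name `aggregate` and the dictionary layout of the parameters are hypothetical.

```python
from typing import Dict, List

# Hypothetical parameter layout: layer name -> flat list of floats.
Params = Dict[str, List[float]]


def aggregate(client_params: List[Params], sample_counts: List[int]) -> Params:
    """Sample-weighted average of client parameters (assumed integration rule)."""
    if len(client_params) != len(sample_counts) or not client_params:
        raise ValueError("need one sample count per client")
    total = sum(sample_counts)
    merged: Params = {}
    for key in client_params[0]:
        length = len(client_params[0][key])
        merged[key] = [
            sum(params[key][i] * count
                for params, count in zip(client_params, sample_counts)) / total
            for i in range(length)
        ]
    return merged
```

Each security mechanism would then overwrite its local parameters with the returned global parameters and train again, repeating until the stop condition is met.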
107. If the loss function value is smaller than a preset threshold value, obtaining an intrusion detection model according to the model parameters and the weights;
In this embodiment, if the calculated loss function value is greater than or equal to the preset threshold, the step of model training is performed by taking the optimal feature subset as a training sample to obtain an initial intrusion detection model, and iterating is repeated until the obtained model parameters and weights meet the convergence condition.
In this embodiment, the labeled sample data, after feature selection, is input into the neural network model for training, and the error between the predicted value and the true value is back-propagated so as to adjust the weight parameters in the neural network model. Through continuous training, the local weight parameters in the concentrator are continuously optimized and adjusted; when the loss function is minimized, training can be stopped and the optimization of the local model weight parameters is complete. The selected loss function is the mean square error.
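As a small illustration of steps 106 and 107, the mean square error and the threshold comparison might be written as below. The function names are hypothetical, and encoding labels as 0.0 (normal) and 1.0 (intrusion) is an assumption made for the example.

```python
from typing import Sequence


def mean_squared_error(predictions: Sequence[float], labels: Sequence[float]) -> float:
    """Mean square error between predicted values and true values."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    return sum((p - y) ** 2 for p, y in zip(predictions, labels)) / len(labels)


def accept_global_model(predictions: Sequence[float],
                        labels: Sequence[float],
                        threshold: float) -> bool:
    """True when the loss is strictly below the preset threshold (step 107);
    False means the local model should be trained again."""
    return mean_squared_error(predictions, labels) < threshold
```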
108. Obtaining data to be detected, inputting the data to be detected into an intrusion detection model, and detecting a classification result, wherein the intrusion classification result comprises normal behaviors or intrusion behaviors;
In this embodiment, intrusion detection may be performed on data in the security mechanism in real time. The data to be detected may be the data of current network connections, where a network connection is defined as a sequence of TCP packets from start to end within a certain period of time, during which data is transferred from a source IP address to a destination IP address under a predefined protocol (TCP, UDP). Detecting the data of network connections ensures the real-time security of the system.
109. If the detection classification result is intrusion behavior, generating an early warning signal, and sending the early warning signal to a terminal of a worker so as to perform intrusion detection early warning.
In this embodiment, the system user is notified of the detection classification result dynamically. If the detection result is an intrusion, the system generates an early warning signal. The early warning signal may be a reminder to the user, who may then choose to retain or delete the data to be detected; alternatively, the alarm information may remind the user that the data is possibly intrusion data and intercept it.
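The two warning behaviors described above (remind the user and let the user decide, or remind and intercept) can be sketched as a small dispatch function. The `EarlyWarning` type, the `handle_detection` function and the string labels "normal" and "intrusion" are all hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EarlyWarning:
    record_id: str
    message: str
    intercepted: bool


def handle_detection(record_id: str, result: str,
                     intercept: bool = False) -> Optional[EarlyWarning]:
    """Return an early warning for an intrusion result, or None for normal behavior."""
    if result == "normal":
        return None
    if result != "intrusion":
        raise ValueError(f"unknown classification result: {result!r}")
    action = "intercepted" if intercept else "awaiting user decision (retain or delete)"
    return EarlyWarning(record_id, f"possible intrusion data: {action}", intercept)
```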
In this embodiment, a sample data set is obtained from a database and all features in it are determined; feature selection is performed once on all features based on a feature selection algorithm to obtain a candidate feature set; the weight distribution of the features in the candidate feature set in the combined model is calculated, and secondary feature selection is performed to obtain an optimal feature subset; an initial intrusion detection model is trained on the optimal feature subset; based on a federated learning framework, the model parameters of the trained initial intrusion detection model are sent to a central server for parameter integration to generate a global model, and iteration is carried out according to the model parameters of the global model to finally obtain the intrusion detection model; data to be detected is acquired and input into the intrusion detection model to obtain a detection classification result, and if the detection classification result is intrusion behavior, an early warning is issued. In this method the global data is pre-screened, the optimal feature subset is then screened out through training of the combined model, and federated learning is performed afterwards, so that the input data is simplified and redundancy is removed, improving the training efficiency of federated learning and the accuracy of identifying intrusion behaviors.
Referring to fig. 2, a second embodiment of an intrusion detection method according to an embodiment of the present invention includes:
201. acquiring a sample data set from a preset database, and determining all the characteristics in the sample data set;
202. according to the information gain algorithm, calculating the overall entropy of the sample data set, and calculating the conditional entropy of each feature in the sample data set;
In this embodiment, when the sample data size is large, the information gain algorithm is adopted to perform the first feature selection; the information gain value of each feature is the difference between the overall entropy and the conditional entropy of that feature. For the data set $S(F_1, F_2, \ldots, F_Z)$, an example containing $Z$ features, $Z$ represents the total number of features and $C(c_1, \ldots, c_m)$ is a set of $m$ class labels. Taking feature $F_z$ as an example, with $z = 1, 2, \ldots, Z$, the overall entropy $H(C)$ of the sample feature set is calculated as follows:
$$H(C) = -\sum_{i=1}^{m} p(c_i) \log_2 p(c_i)$$
The conditional entropy $H(C \mid F_z)$ of the feature is calculated as follows:
$$H(C \mid F_z) = -\sum_{j=1}^{k} p(f_j) \sum_{i=1}^{m} p(c_i \mid f_j) \log_2 p(c_i \mid f_j)$$
wherein $f_j$ is a value of feature $F_z$, $p(f_j)$ is the proportion of samples in which $F_z$ takes that value, $p(c_i)$ represents the prior probability of class label $c_i$, that is, the proportion of class label $c_i$ in the data set, $p(c_i \mid f_j)$ represents the conditional probability of class label $c_i$ after $F_z$ is fixed at $f_j$, $k$ represents the number of values of feature $F_z$, and $m$ represents the total number of class labels.
203. Calculating information gain values of all the features in the sample data set according to the overall entropy and the conditional entropy of all the features;
In this embodiment, the conditional entropy is subtracted from the overall entropy to obtain the information gain value of the corresponding feature. The calculation formula is as follows:
$$IG(F_z) = H(C) - H(C \mid F_z)$$
wherein $IG(F_z)$ represents the information gain value of feature $F_z$.
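The information gain calculation can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, features are assumed to be discrete, and a base-2 logarithm is assumed for the entropy.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Overall entropy H(C) of a list of class labels (base-2 logarithm assumed)."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())


def conditional_entropy(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Conditional entropy H(C|F): label entropy within each feature value,
    weighted by how often that value occurs."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    total = len(labels)
    return sum(len(group) / total * entropy(group) for group in groups.values())


def information_gain(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Information gain of one feature: H(C) - H(C|F)."""
    return entropy(labels) - conditional_entropy(values, labels)
```

A feature whose values separate the classes perfectly attains a gain equal to the overall entropy, while a feature that is independent of the class has a gain of zero; step 204 then keeps the features whose gain exceeds the preset gain value.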
204. Taking the features whose information gain value is larger than the preset gain value as candidate features to obtain a candidate feature set;
205. calculating weight distribution of the features in the candidate feature set in a preset combined model, and performing secondary feature selection according to the weight distribution to obtain an optimal feature subset;
206. taking the optimal feature subset as a training sample to perform model training to obtain an initial intrusion detection model;
207. uploading model parameters and weights of the initial intrusion detection model to a central server for integration so that the central server performs a joint training task to generate a global model;
208. taking the model parameters and the weights of the global model returned by the central server as the model parameters and the weights of the initial intrusion detection model, and calculating training samples by using a preset loss function to obtain a loss function value;
209. if the loss function value is smaller than the preset threshold value, obtaining an intrusion detection model according to model parameters and weights of the initial intrusion detection model;
210. obtaining data to be detected, inputting the data to be detected into an intrusion detection model, and detecting a classification result, wherein the intrusion classification result comprises normal behaviors or intrusion behaviors;
211. If the detection classification result is intrusion behavior, generating an early warning signal, and sending the early warning signal to a terminal of a worker so as to perform intrusion detection early warning.
This embodiment describes in detail, on the basis of the above embodiment, the process of performing the first feature selection by the information gain algorithm: the overall entropy of the sample data set and the conditional entropy of each feature in the sample data set are calculated; the information gain value of each feature in the sample data set is calculated from the overall entropy and the conditional entropy of that feature; and the features whose information gain value is larger than the preset gain value are taken as candidate features to obtain a candidate feature set. In this embodiment, performing feature selection through the information gain algorithm provides a preliminary selection of features that simplifies the input data and removes redundancy, improving the training efficiency of federated learning and the accuracy of identifying intrusion behaviors.
Referring to fig. 3, a third embodiment of an intrusion detection method according to an embodiment of the present invention includes:
301. acquiring a sample data set from a preset database, and determining all the characteristics in the sample data set;
302. determining a classification variable in the sample dataset;
In this embodiment, when the data magnitude in the sample data set is small, a chi-square test algorithm may be used for feature selection. The principle of the chi-square test is to judge whether a hypothesis is correct by observing the deviation between actual values and theoretical values. In practice, it is first assumed that two variables are independent (the "null hypothesis"), and the degree of deviation between the actual value (observed value) and the theoretical value (the value that would be expected if the two variables were truly independent) is then observed. If the deviation is small enough, it is regarded as natural sampling error, caused by imprecise measurement or by chance, and the two variables are considered independent, so the null hypothesis is accepted; if the deviation is so large that it is unlikely to be caused by chance or imprecise measurement, the two variables are considered to be related.
In this embodiment, whether a piece of data in the sample data set is intrusion data is used as the classification variable, and whether a feature is related to the classification variable is determined by calculating the chi-square value between the feature and the classification variable. The higher the chi-square value, the more strongly the feature is related to the classification variable and the more important the feature is, so the feature can be used as a candidate feature.
303. According to a chi-square checking algorithm and classification variables, respectively calculating chi-square values of all the characteristics in the sample data set;
In this embodiment, the calculation formula of the chi-square value is as follows:
$$\chi^2(X, Y) = \frac{N (AD - BC)^2}{(A + B)(C + D)(A + C)(B + D)}$$
wherein N represents the number of sample data, A represents the number of times that the classification variable X and the feature Y both exist, B represents the number of times that the classification variable X exists but the feature Y does not, C represents the number of times that the feature Y exists but the classification variable X does not, and D represents the number of times that neither exists.
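For a binary feature and a binary classification variable, the four counts A, B, C and D form a 2x2 contingency table, and the chi-square value can be computed with the standard closed form for such a table. The sketch below assumes that form and assumes N = A + B + C + D; the function name `chi_square` is hypothetical.

```python
def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square value of a 2x2 contingency table.

    a: X and Y both present      b: X present, Y absent
    c: Y present, X absent       d: neither present
    """
    n = a + b + c + d  # assumed: the four cells cover every sample
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        # A row or column is empty, so the table carries no evidence of dependence.
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator
```

Step 304 then keeps the features whose chi-square value exceeds the preset chi-square value.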
304. Taking the characteristics with the chi-square value larger than the preset chi-square value as candidate characteristics to obtain a candidate characteristic set;
305. calculating weight distribution of the features in the candidate feature set in a preset combined model, and performing secondary feature selection according to the weight distribution to obtain an optimal feature subset;
306. taking the optimal feature subset as a training sample to perform model training to obtain an initial intrusion detection model;
307. Uploading model parameters and weights of the initial intrusion detection model to a central server for integration so that the central server performs a joint training task to generate a global model;
308. taking the model parameters and the weights of the global model returned by the central server as the model parameters and the weights of the initial intrusion detection model, and calculating training samples by using a preset loss function to obtain a loss function value;
310. if the loss function value is smaller than the preset threshold value, obtaining an intrusion detection model according to model parameters and weights of the initial intrusion detection model;
311. obtaining data to be detected, inputting the data to be detected into an intrusion detection model, and detecting a classification result, wherein the intrusion classification result comprises normal behaviors or intrusion behaviors;
312. if the detection classification result is intrusion behavior, generating an early warning signal, and sending the early warning signal to a terminal of a worker so as to perform intrusion detection early warning.
This embodiment describes in detail, on the basis of the previous embodiment, the process of feature selection using the chi-square test algorithm: the classification variable in the sample data set is determined; the chi-square value of each feature in the sample data set is calculated according to the chi-square test algorithm and the classification variable; and the features whose chi-square value is larger than the preset chi-square value are taken as candidate features to obtain a candidate feature set. In this embodiment, performing feature selection through the chi-square test algorithm provides a preliminary selection of features that simplifies the input data and removes redundancy, improving the training efficiency of federated learning and the accuracy of identifying intrusion behaviors.
Referring to fig. 4, a fourth embodiment of an intrusion detection method according to an embodiment of the present invention includes:
401. acquiring a sample data set from a preset database, and determining all the characteristics in the sample data set;
402. based on a preset feature selection algorithm, performing data quality evaluation on all features in a sample data set, and performing feature selection once from all the features to obtain a candidate feature set of the sample data set;
403. dividing the candidate feature set into n parts randomly to obtain n feature subsets;
404. the feature subset is evenly divided into three parts according to the number of the combined models, and three corresponding input feature sets are obtained;
405. respectively inputting the three input feature sets into an xgboost model, an svm model and a random forest model in the combined model, and carrying out feature scoring on the features in the input feature sets through the xgboost model, the svm model and the random forest model to obtain feature weight values of the three input feature sets;
In this embodiment, feature importance evaluation is performed on the input features synchronously by xgboost, svm and random forest. This embodiment mainly introduces the process of feature importance evaluation by random forest. For each decision tree in the random forest, the corresponding out-of-bag (OOB) data is used to calculate its out-of-bag data error, denoted errOOB1. Noise interference is then randomly added to feature X of all samples of the out-of-bag data (so that the value of each sample at feature X is randomly changed), and the out-of-bag data error is calculated again, denoted errOOB2. Assuming there are Ntree trees in the random forest, the importance of feature X is
$$\mathrm{importance}(X) = \frac{\sum (\mathrm{errOOB2} - \mathrm{errOOB1})}{\mathrm{Ntree}}$$
This expression can be used as a measure of the importance of the corresponding feature because, if randomly adding noise to a feature greatly reduces the out-of-bag accuracy, the feature has a great influence on the classification result of the samples, that is, its importance is high.
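The out-of-bag importance measure can be illustrated as follows. Both function names are hypothetical. The text speaks of adding noise to feature X; the sketch assumes the common variant in which the values of X are randomly permuted across the out-of-bag samples, and it takes the per-tree errors as given rather than training a forest.

```python
import random
from typing import List, Sequence


def permute_feature(rows: List[list], index: int, rng: random.Random) -> List[list]:
    """Return a copy of the OOB rows with column `index` randomly permuted
    (assumed form of the noise interference)."""
    column = [row[index] for row in rows]
    rng.shuffle(column)
    return [row[:index] + [value] + row[index + 1:]
            for row, value in zip(rows, column)]


def oob_importance(err_oob1: Sequence[float], err_oob2: Sequence[float]) -> float:
    """Importance of a feature: mean over trees of (errOOB2 - errOOB1).

    err_oob1[t] is the OOB error of tree t on the original OOB data,
    err_oob2[t] its error after the feature has been perturbed.
    """
    if len(err_oob1) != len(err_oob2) or not err_oob1:
        raise ValueError("need one error pair per tree")
    return sum(e2 - e1 for e1, e2 in zip(err_oob1, err_oob2)) / len(err_oob1)
```

A value near zero indicates that perturbing the feature barely changes the out-of-bag error, so the feature contributes little to classification.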
406. Calculating the weight distribution of the features in the feature subset according to the three feature weight values;
407. according to the weight distribution of the features in the n feature subsets, sorting the features in the candidate feature sets, and deleting the features with the smallest weights to obtain the optimal feature subset of the round;
408. judging whether the number of the current model training turns is greater than or equal to n;
409. if not, re-sequencing the features, and returning to the step of sequencing the features in the candidate feature set according to the weight distribution of the features in the n feature subsets;
410. if yes, the round of optimal feature subset is output as the optimal feature subset;
411. taking the optimal feature subset as a training sample to perform model training to obtain an initial intrusion detection model;
412. uploading model parameters and weights of the initial intrusion detection model to a central server for integration so that the central server performs a joint training task to generate a global model;
413. Taking the model parameters and the weights of the global model returned by the central server as the model parameters and the weights of the initial intrusion detection model, and calculating training samples by using a preset loss function to obtain a loss function value;
414. if the loss function value is smaller than the preset threshold value, obtaining an intrusion detection model according to model parameters and weights of the initial intrusion detection model;
415. obtaining data to be detected, inputting the data to be detected into an intrusion detection model, and detecting a classification result, wherein the intrusion classification result comprises normal behaviors or intrusion behaviors;
416. if the detection classification result is intrusion behavior, generating an early warning signal, and sending the early warning signal to a terminal of a worker so as to perform intrusion detection early warning.
The embodiment is based on the previous embodiment, and details the process of calculating the weight distribution of the features in the candidate feature set in a preset combined model, performing secondary feature selection according to the weight distribution to obtain an optimal feature subset, and dividing the candidate feature set into n parts randomly to obtain n feature subsets, wherein n is a natural number not less than 1; equally dividing the feature subsets according to the number of the preset combination models, respectively inputting the feature subsets into the combination models, and calculating weight distribution of features in the feature subsets; and sorting the features in the candidate feature sets according to the weight distribution of the features in the n feature subsets, deleting the features with the smallest weights to obtain the optimal feature subset of the round, and outputting the optimal feature subset of the round as the optimal feature subset after n rounds of circulation. According to the method, the advantages of the models are integrated in a mode of calculating the weight distribution of the features by combining the models, and the accuracy of feature selection is improved.
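Steps 403 and 404 (random division into n feature subsets, then an even three-way split of each subset for the three models) can be sketched as below. The function names and the fixed random seed are hypothetical, and letting chunk sizes differ by at most one when the counts do not divide evenly is an assumption.

```python
import random
from typing import List, Sequence


def split_evenly(items: Sequence[str], parts: int) -> List[List[str]]:
    """Split items into `parts` consecutive chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(list(items[start:start + size]))
        start += size
    return chunks


def partition_candidates(candidates: Sequence[str], n: int,
                         models: int = 3, seed: int = 0) -> List[List[List[str]]]:
    """Randomly divide the candidate set into n feature subsets, then divide each
    subset into one input feature set per model of the combined model."""
    shuffled = list(candidates)
    random.Random(seed).shuffle(shuffled)
    return [split_evenly(subset, models) for subset in split_evenly(shuffled, n)]
```

The result is indexed as `result[subset][model]`, so the three inner lists of each subset are the inputs for the xgboost, svm and random forest models respectively.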
Referring to fig. 5, a fifth embodiment of an intrusion detection method according to an embodiment of the present invention includes:
501. acquiring a sample data set from a preset database, and determining all the characteristics in the sample data set;
502. based on a preset feature selection algorithm, performing data quality evaluation on all features in a sample data set, and performing feature selection once from all the features to obtain a candidate feature set of the sample data set;
503. calculating weight distribution of the features in the candidate feature set in a preset combined model, and performing secondary feature selection according to the weight distribution to obtain an optimal feature subset;
504. inputting the optimal feature subset as a training sample into a preset neural network model, and outputting an output value of intrusion classification prediction;
In practical application, the initial intrusion detection model can be trained with any of various classification algorithms, for example an intrusion detection classification algorithm based on a Support Vector Machine (SVM), or alternatively K-Means, KNN or decision trees. Through continuous training, the local weight parameters in the security mechanism are continuously optimized and adjusted; when the loss function is minimized, training can be stopped and the optimization of the local model weight parameters is complete.
505. Calculating an initial loss value of the output value according to the output value and a preset initial loss function;
506. updating model parameters of the neural network model according to the initial loss value and the output value, and continuously executing the step of inputting the model training sample into the neural network model until the initial loss value converges to obtain an initial intrusion detection model;
507. uploading model parameters and weights of the initial intrusion detection model to a central server for integration so that the central server performs a joint training task to generate a global model;
508. taking the model parameters and the weights of the global model returned by the central server as the model parameters and the weights of the initial intrusion detection model, and calculating training samples by using a preset loss function to obtain a loss function value;
509. if the loss function value is smaller than the preset threshold value, obtaining an intrusion detection model according to model parameters and weights of the initial intrusion detection model;
510. obtaining data to be detected, inputting the data to be detected into an intrusion detection model, and detecting a classification result, wherein the intrusion classification result comprises normal behaviors or intrusion behaviors;
511. if the detection classification result is intrusion behavior, generating an early warning signal, and sending the early warning signal to a terminal of a worker so as to perform intrusion detection early warning.
The embodiment describes the model training process in detail on the basis of the previous embodiment, and the model training sample is input into a preset neural network model to output an intrusion classification prediction output value; calculating an initial loss value of the output value according to the output value and a preset initial loss function; and updating model parameters of the neural network model according to the initial loss value and the output value, and continuously executing the step of inputting the model training sample into the neural network model until the initial loss value converges to obtain an initial intrusion detection model. The initial intrusion detection model is generated firstly, the model parameters of the initial intrusion detection model are sent to the central server for integration, the training speed of the model is improved, and the detection accuracy of the generated intrusion detection model is higher.
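The loop of steps 504 to 506 (predict, compute the loss, update the parameters, repeat until the loss converges) can be illustrated with a deliberately small model. The sketch below replaces the preset neural network with a single linear neuron trained by gradient descent on the mean square error; that simplification, the function name `train_until_converged`, and the default learning rate and tolerance are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def train_until_converged(samples: Sequence[Sequence[float]],
                          labels: Sequence[float],
                          lr: float = 0.1,
                          tol: float = 1e-6,
                          max_epochs: int = 10000) -> Tuple[List[float], float, float]:
    """Train a single linear neuron until the loss changes by less than `tol`."""
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    previous = float("inf")
    loss = previous
    for _ in range(max_epochs):
        # Step 504: forward pass to get the output values.
        outputs = [sum(w * x for w, x in zip(weights, row)) + bias for row in samples]
        errors = [o - y for o, y in zip(outputs, labels)]
        # Step 505: loss value of the outputs (mean square error).
        loss = sum(e * e for e in errors) / len(errors)
        if abs(previous - loss) < tol:
            break  # the loss has converged
        previous = loss
        # Step 506: update the parameters along the negative gradient.
        for j in range(n_features):
            grad = 2 * sum(e * row[j] for e, row in zip(errors, samples)) / len(errors)
            weights[j] -= lr * grad
        bias -= lr * 2 * sum(errors) / len(errors)
    return weights, bias, loss
```

The returned weights and bias play the role of the model parameters that are uploaded to the central server in step 507.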
The intrusion detection method in the embodiment of the present invention is described above, and the intrusion detection device in the embodiment of the present invention is described below, referring to fig. 6, where an embodiment of the intrusion detection device in the embodiment of the present invention includes:
the feature acquisition module 601 is configured to acquire a sample data set from a preset database, and determine all features in the sample data set;
the first feature selection module 602 is configured to perform data quality evaluation on all features in the sample data set based on a preset feature selection algorithm, and perform feature selection once from all features to obtain a candidate feature set of the sample data set;
The second feature selection module 603 is configured to calculate a weight distribution of features in the candidate feature set in a preset combination model, and perform secondary feature selection according to the weight distribution, so as to obtain an optimal feature subset;
an initial model training module 604, configured to perform model training of an initial intrusion detection model by using the optimal feature subset as a model input sample;
the parameter uploading module 605 is configured to upload model parameters and weights of the initial intrusion detection model to a central server for integration, so that the central server performs a joint training task to generate a global model;
the loss function calculation module 606 is configured to take the model parameters and weights of the global model returned by the central server as the model parameters and weights of the initial intrusion detection model, and calculate the training sample by using a preset loss function to obtain a loss function value;
the model generation module 607 is configured to obtain an intrusion detection model according to model parameters and weights when the loss function value is less than a preset threshold;
the detection module 608 is configured to obtain data to be detected, and input the data to be detected into the intrusion detection model, and then detect a classification result, where the intrusion classification result includes a normal behavior or an intrusion behavior;
And the early warning module 609 is configured to generate an early warning signal when the detection classification result is an intrusion behavior, and send the early warning signal to a terminal of a staff to perform intrusion detection early warning.
It is emphasized that the database may be stored in a blockchain node in order to ensure privacy and security of the data.
In the embodiment of the invention, the intrusion detection device runs the intrusion detection method described above. The intrusion detection device acquires a sample data set from a database and determines all the features in it; performs feature selection once on all features based on a feature selection algorithm to obtain a candidate feature set; calculates the weight distribution of the features in the candidate feature set in the combined model and performs secondary feature selection to obtain an optimal feature subset; trains an initial intrusion detection model on the optimal feature subset; based on a federated learning framework, sends the model parameters of the trained initial intrusion detection model to a central server for parameter integration to generate a global model, and iterates according to the model parameters of the global model to finally obtain the intrusion detection model; and acquires data to be detected, inputs it into the intrusion detection model to obtain a detection classification result, and issues an early warning if the detection classification result is intrusion behavior. In this method the global data is pre-screened, the optimal feature subset is then screened out through training of the combined model, and federated learning is performed afterwards, so that the input data is simplified and redundancy is removed, improving the training efficiency of federated learning and the accuracy of identifying intrusion behaviors.
Referring to fig. 7, a second embodiment of an intrusion detection device according to an embodiment of the present invention includes:
the feature acquisition module 601 is configured to acquire a sample data set from a preset database, and determine all features in the sample data set;
the first feature selection module 602 is configured to perform data quality evaluation on all features in the sample data set based on a preset feature selection algorithm, and perform feature selection once from all features to obtain a candidate feature set of the sample data set;
the second feature selection module 603 is configured to calculate a weight distribution of features in the candidate feature set in a preset combination model, and perform secondary feature selection according to the weight distribution, so as to obtain an optimal feature subset;
the initial model training module 604 is configured to perform model training with the optimal feature subset as a training sample to obtain an initial intrusion detection model;
the parameter uploading module 605 is configured to upload model parameters and weights of the initial intrusion detection model to a central server for integration, so that the central server performs a joint training task to generate a global model;
the loss function calculation module 606 is configured to take the model parameters and weights of the global model returned by the central server as the model parameters and weights of the initial intrusion detection model, and calculate the training sample by using a preset loss function to obtain a loss function value;
the model generation module 607 is configured to obtain the intrusion detection model according to the model parameters and weights of the initial intrusion detection model when the loss function value is less than a preset threshold;
the detection module 608 is configured to acquire data to be detected and input the data to be detected into the intrusion detection model to obtain a detection classification result, where the detection classification result is either normal behavior or intrusion behavior;
and the early warning module 609 is configured to generate an early warning signal when the detection classification result is intrusion behavior, and to send the early warning signal to a staff member's terminal as an intrusion detection early warning.
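The interaction between the parameter uploading module 605, the loss function calculation module 606, and the model generation module 607 can be sketched as follows. The text does not state how the central server integrates the uploaded parameters, so the sample-count-weighted average used here is an assumption, as are the names `federated_average` and `accept_global_model`.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def federated_average(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Server side: average each parameter vector, weighted by local sample count.

    Assumption: the integration rule is a weighted average; the source only
    says the server integrates the parameters into a global model.
    """
    total = sum(client_sizes)
    merged: Params = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        merged[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return merged


def accept_global_model(loss_value: float, threshold: float) -> bool:
    """Client side: stop iterating once the loss is below the preset threshold."""
    return loss_value < threshold
```

Each participant would load the returned global parameters into its local model, recompute the loss on its training samples, and keep iterating until `accept_global_model` returns true.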
The intrusion detection device further includes an algorithm selection module 611, which is specifically configured to: determine the number of features in the sample data set; judge whether the number of features is greater than a preset number; if so, use the information gain algorithm as the feature selection algorithm; if not, use the chi-square test algorithm as the feature selection algorithm.
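The decision made by the algorithm selection module 611 reduces to a single comparison. A minimal sketch, in which the function name and the returned string labels are hypothetical:

```python
def choose_feature_selector(feature_count: int, preset_count: int) -> str:
    """Pick the first-stage feature selection algorithm from the feature count."""
    # More features than the preset number: use information gain.
    # Otherwise (including the equal case): use the chi-square test.
    return "information_gain" if feature_count > preset_count else "chi_square"
```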
Optionally, if the feature selection algorithm is the information gain algorithm, the first feature selection module 602 is specifically configured to: calculate the overall entropy of the sample data set according to the information gain algorithm, and calculate the conditional entropy of each feature in the sample data set; calculate the information gain value of each feature in the sample data set from the overall entropy and the conditional entropy of that feature; and take the features whose information gain value is greater than a preset gain value as candidate features to obtain the candidate feature set.
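The information gain calculation above can be sketched in plain Python. The sketch assumes discrete (or already discretized) feature values and takes the gain of a feature as the overall entropy of the labels minus the conditional entropy of the labels given that feature. The function names and the column-dictionary input layout are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (base 2) of a label sequence."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Overall entropy minus the conditional entropy given one feature."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    conditional = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_by_information_gain(
    columns: Dict[str, Sequence[Hashable]],
    labels: Sequence[Hashable],
    preset_gain: float,
) -> List[str]:
    """Keep the features whose gain exceeds the preset gain value."""
    return [
        name for name, values in columns.items()
        if information_gain(values, labels) > preset_gain
    ]
```

A feature that splits the samples exactly along the class labels has a gain equal to the overall entropy, while a feature that is independent of the labels has a gain of zero.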
Optionally, if the feature selection algorithm is the chi-square test algorithm, the first feature selection module 602 is specifically configured to: determine the classification variable in the sample data set; calculate the chi-square value of each feature in the sample data set according to the chi-square test algorithm and the classification variable; and take the features whose chi-square value is greater than a preset chi-square value as candidate features to obtain the candidate feature set.
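The chi-square screening above can be sketched in the same way. This version assumes discrete feature values and computes the Pearson chi-square statistic from the contingency table of feature value against class label. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def chi_square(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Pearson chi-square statistic between one feature and the class variable."""
    n = len(labels)
    observed = Counter(zip(values, labels))  # missing cells count as 0
    value_totals = Counter(values)
    label_totals = Counter(labels)
    stat = 0.0
    for value, value_total in value_totals.items():
        for label, label_total in label_totals.items():
            expected = value_total * label_total / n
            stat += (observed[(value, label)] - expected) ** 2 / expected
    return stat


def select_by_chi_square(
    columns: Dict[str, Sequence[Hashable]],
    labels: Sequence[Hashable],
    preset_chi_square: float,
) -> List[str]:
    """Keep the features whose chi-square value exceeds the preset value."""
    return [
        name for name, values in columns.items()
        if chi_square(values, labels) > preset_chi_square
    ]
```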
The second feature selection module 603 specifically includes: an equipartition unit 6031, configured to randomly divide the candidate feature set into n equal parts to obtain n feature subsets, where n is a natural number not less than 1; a weight calculation unit 6032, configured to divide each feature subset equally according to the number of preset combination models, input the resulting parts into the combination models respectively, and calculate the weight distribution of the features in the feature subset; a ranking unit 6033, configured to rank the features in the candidate feature set according to the weight distribution of the features in the n feature subsets, and delete the feature with the smallest weight to obtain the optimal feature subset of the current round; a judging unit 6034, configured to judge whether the current number of model training rounds is greater than or equal to n; a returning unit 6035, configured to re-rank the features when the current number of model training rounds is less than n, and return to the step of ranking the features in the candidate feature set according to the weight distribution of the features in the n feature subsets; and a feature output unit 6036, configured to output the optimal feature subset of the current round as the optimal feature subset when the current number of model training rounds is greater than or equal to n.
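One possible reading of the loop formed by units 6031 to 6036 is sketched below: in each round the remaining features are randomly split into n parts, each part is scored, the features are ranked by weight, and the lowest-weighted feature is removed, for n rounds in total. The scoring step is abstracted as a caller-supplied `weight_fn`, which is a hypothetical stand-in for the combination models and must return a weight for every feature it is given. All names here are illustrative.

```python
import random
from typing import Callable, Dict, List, Sequence

WeightFn = Callable[[Sequence[str]], Dict[str, float]]


def split_evenly(items: Sequence[str], parts: int, rng: random.Random) -> List[List[str]]:
    """Shuffle the items and deal them into `parts` nearly equal subsets."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    return [shuffled[i::parts] for i in range(parts)]


def second_stage_selection(
    candidates: Sequence[str], n: int, weight_fn: WeightFn, seed: int = 0
) -> List[str]:
    """Run n elimination rounds, dropping the lowest-weighted feature each round."""
    rng = random.Random(seed)
    remaining = list(candidates)
    for _ in range(n):
        if len(remaining) <= 1:
            break  # nothing left to eliminate
        weights: Dict[str, float] = {}
        for subset in split_evenly(remaining, n, rng):
            if subset:  # subsets can be empty when n exceeds the feature count
                weights.update(weight_fn(subset))
        ranked = sorted(remaining, key=lambda f: weights[f], reverse=True)
        remaining = ranked[:-1]  # delete the feature with the smallest weight
    return remaining
```

With fixed weights of 0.9, 0.5, 0.1, and 0.3 for four features and n = 2, the two rounds remove the features weighted 0.1 and 0.3 in turn.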
Optionally, the combined model includes an XGBoost model, an SVM model, and a random forest model, and the weight calculation unit 6032 is specifically configured to: divide the feature subset evenly into three parts, according to the number of combined models, to obtain three corresponding input feature sets; input the three input feature sets into the XGBoost model, the SVM model, and the random forest model respectively, and score the features in the input feature sets with the XGBoost model, the SVM model, and the random forest model to obtain the feature weight values of the three input feature sets; and calculate the weight distribution of the features in the feature subset from the three feature weight values.
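The weight calculation of unit 6032 can be sketched with the three models abstracted as scoring callables. This avoids assuming any particular library interface: in practice each scorer might wrap, for example, the feature importances of a tree model or the coefficient magnitudes of a linear SVM. The per-model normalization used to make the three sets of scores comparable is an assumption of this sketch, as the text does not say how the three feature weight values are combined.

```python
from typing import Callable, Dict, List, Sequence

Scorer = Callable[[Sequence[str]], Dict[str, float]]


def combined_weights(subset: Sequence[str], scorers: List[Scorer]) -> Dict[str, float]:
    """Split a feature subset across the scorers and merge the normalized scores."""
    k = len(scorers)
    parts = [list(subset[i::k]) for i in range(k)]  # one part per model
    weights: Dict[str, float] = {}
    for part, scorer in zip(parts, scorers):
        if not part:
            continue
        raw = scorer(part)  # hypothetical: maps each feature name to a raw score
        total = sum(abs(v) for v in raw.values()) or 1.0  # guard against all-zero scores
        for name in part:
            # Assumed combination rule: normalize within each model's part.
            weights[name] = abs(raw[name]) / total
    return weights
```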
Optionally, the initial model training module 604 is specifically configured to: input the training sample into a preset neural network model and output an intrusion classification prediction value; calculate an initial loss value from the output value and a preset initial loss function; and update the model parameters of the neural network model according to the initial loss value and the output value, repeating the step of inputting the training sample into the neural network model until the initial loss value converges, to obtain the initial intrusion detection model.
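The train-until-convergence loop above can be illustrated with a deliberately small stand-in. The sketch replaces the preset neural network with a single sigmoid unit (logistic regression) and uses binary cross-entropy as the loss; both choices, the learning rate, and the convergence tolerance are assumptions made for illustration only.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_initial_model(
    samples: Sequence[Sequence[float]],
    labels: Sequence[int],
    lr: float = 0.5,        # assumed learning rate
    tol: float = 1e-6,      # assumed convergence tolerance on the loss change
    max_epochs: int = 10000,
) -> Tuple[List[float], float, float]:
    """Gradient descent on a single sigmoid unit until the loss stops changing."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    previous = float("inf")
    loss = previous
    for _ in range(max_epochs):
        # Forward pass: predicted intrusion probability for every sample.
        outputs = [
            sigmoid(sum(w * x for w, x in zip(weights, row)) + bias) for row in samples
        ]
        # Binary cross-entropy loss, with a small epsilon to avoid log(0).
        loss = -sum(
            y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12)
            for p, y in zip(outputs, labels)
        ) / len(labels)
        if abs(previous - loss) < tol:
            break  # loss has converged
        previous = loss
        # Parameter update from the gradient of the loss.
        for j in range(len(weights)):
            grad = sum((p - y) * row[j] for p, y, row in zip(outputs, labels, samples))
            weights[j] -= lr * grad / len(labels)
        bias -= lr * sum(p - y for p, y in zip(outputs, labels)) / len(labels)
    return weights, bias, loss
```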
On the basis of the previous embodiment, this embodiment describes in detail the specific functions of each module and the unit composition of some of the modules. The newly added modules pre-screen the global data, and federated learning is performed after the optimal feature subset has been selected through training of the combined model. The input data are thereby simplified and stripped of redundancy, which improves the training efficiency of federated learning and the accuracy of identifying intrusion behavior.
The intrusion detection device in the embodiment of the present invention is described in detail above in terms of the modularized functional entity in fig. 6 and fig. 7, and the intrusion detection device in the embodiment of the present invention is described in detail below in terms of hardware processing.
Fig. 8 is a schematic structural diagram of an intrusion detection device according to an embodiment of the present invention. The intrusion detection device 800 may vary considerably depending on its configuration or performance, and may include one or more central processing units (CPUs) 810, a memory 820, and one or more storage media 830 (e.g., one or more mass storage devices) storing application programs 833 or data 832. The memory 820 and the storage medium 830 may provide transitory or persistent storage. The program stored on the storage medium 830 may include one or more modules (not shown), each of which may include a series of instruction operations for the intrusion detection device 800. Further, the processor 810 may be configured to communicate with the storage medium 830 and to execute, on the intrusion detection device 800, the series of instruction operations in the storage medium 830 so as to implement the steps of the intrusion detection method described above.
The intrusion detection device 800 may also include one or more power supplies 840, one or more wired or wireless network interfaces 850, one or more input/output interfaces 860, and/or one or more operating systems 831, such as Windows Server, Mac OS X, Unix, Linux, or FreeBSD. Those skilled in the art will appreciate that the structure shown in fig. 8 does not limit the intrusion detection device provided by the present application, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked to one another by cryptographic methods, where each data block contains a batch of network transaction information that is used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and the like.
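The idea of data blocks linked by cryptographic means can be illustrated with a toy hash chain. This is a generic sketch, not the storage scheme of the embodiment: the block layout, the use of SHA-256, and the function names are all assumptions, and consensus and distribution are omitted entirely.

```python
import hashlib
import json
from typing import Any, Dict, List


def _digest(records: List[Any], previous_hash: str) -> str:
    """SHA-256 over a canonical JSON encoding of the block body."""
    body = {"records": records, "previous_hash": previous_hash}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def make_block(records: List[Any], previous_hash: str) -> Dict[str, Any]:
    """Create a block whose hash covers its records and the previous block's hash."""
    return {
        "records": records,
        "previous_hash": previous_hash,
        "hash": _digest(records, previous_hash),
    }


def chain_is_valid(chain: List[Dict[str, Any]]) -> bool:
    """Check every block's own hash and its link to the preceding block."""
    for index, block in enumerate(chain):
        if block["hash"] != _digest(block["records"], block["previous_hash"]):
            return False  # block contents were altered after hashing
        if index > 0 and block["previous_hash"] != chain[index - 1]["hash"]:
            return False  # link to the previous block is broken
    return True
```

Because each hash covers the previous block's hash, altering any stored record invalidates that block and every block after it.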
The present application also provides a computer readable storage medium, which may be a non-volatile computer readable storage medium, or a volatile computer readable storage medium, having stored therein instructions that, when executed on a computer, cause the computer to perform the steps of the intrusion detection method.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the system or apparatus and unit described above may refer to the corresponding process in the foregoing method embodiment, which is not repeated herein.
If the integrated units are implemented as software functional units and sold or used as stand-alone products, they may be stored in a computer readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in whole or in part, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. An intrusion detection method, the intrusion detection method comprising:
acquiring a sample data set from a preset database, and determining all the characteristics in the sample data set;
based on a preset feature selection algorithm, performing data quality evaluation on all features in the sample data set, and performing feature selection once from all features to obtain a candidate feature set of the sample data set;
randomly dividing the candidate feature set into n equal parts to obtain n feature subsets, wherein n is a natural number not less than 1; dividing each feature subset equally according to the number of preset combination models, inputting the resulting parts into the combination models respectively, and calculating the weight distribution of the features in the feature subset; ranking the features in the candidate feature set according to the weight distribution of the features in the n feature subsets, and deleting the feature with the smallest weight to obtain the optimal feature subset of the current round; judging whether the current number of model training rounds is greater than or equal to n; if not, re-ranking the features and returning to the step of ranking the features in the candidate feature set according to the weight distribution of the features in the n feature subsets; if so, outputting the optimal feature subset of the current round as the optimal feature subset;
taking the optimal feature subset as a training sample to carry out model training to obtain an initial intrusion detection model;
uploading model parameters and weights of the initial intrusion detection model to a central server for integration so that the central server performs a joint training task to generate a global model;
taking the model parameters and weights of the global model returned by the central server as the model parameters and weights of the initial intrusion detection model, and calculating the training sample by using a preset loss function to obtain a loss function value;
if the loss function value is smaller than a preset threshold value, obtaining an intrusion detection model according to model parameters and weights of the initial intrusion detection model;
obtaining data to be detected, and inputting the data to be detected into the intrusion detection model to obtain a detection classification result, wherein the detection classification result comprises normal behavior or intrusion behavior;
if the detection classification result is an intrusion behavior, generating an early warning signal, and sending the early warning signal to a terminal of a worker so as to perform intrusion detection early warning.
2. The intrusion detection method according to claim 1, wherein the feature selection algorithm includes an information gain algorithm and a chi-square test algorithm, and before the evaluating data quality of all features in the sample dataset based on the preset feature selection algorithm, performing feature selection once from all features to obtain a candidate feature set of the sample dataset, further includes:
determining a feature quantity of all features in the sample dataset;
judging whether the feature quantity is larger than a preset quantity or not;
if yes, the feature selection algorithm uses an information gain algorithm to perform feature selection;
if not, the feature selection algorithm uses a chi-square test algorithm to perform feature selection.
3. The intrusion detection method according to claim 2, wherein when the feature selection algorithm is an information gain algorithm, the performing data quality evaluation on all features in the sample dataset based on a preset feature selection algorithm, performing feature selection once from all features, and obtaining a candidate feature set of the sample dataset includes:
calculating the overall entropy of the sample data set according to an information gain algorithm, and calculating the conditional entropy of each feature in the sample data set;
calculating information gain values of all the features in the sample data set according to the overall entropy and the conditional entropy of all the features;
and taking the features whose information gain value is larger than a preset gain value as candidate features to obtain the candidate feature set.
4. The intrusion detection method according to claim 2, wherein when the feature selection algorithm is a chi-square test algorithm, the performing data quality evaluation on all features in the sample data set based on the preset feature selection algorithm, performing feature selection once from all features, and obtaining a candidate feature set of the sample data set includes:
determining a classification variable in the sample dataset;
according to the chi-square test algorithm and the classification variable, respectively calculating chi-square values of all the features in the sample data set;
and taking the features whose chi-square value is larger than a preset chi-square value as candidate features to obtain the candidate feature set.
5. The intrusion detection method according to claim 1, wherein the combination model includes an XGBoost model, an SVM model, and a random forest model, and the equally dividing the feature subsets according to the number of preset combination models, respectively inputting the feature subsets into the combination models, and calculating the weight distribution of the features in the feature subsets includes:
dividing the feature subset into three parts uniformly according to the number of the combined models to obtain three corresponding input feature sets;
respectively inputting the three input feature sets into the XGBoost model, the SVM model, and the random forest model, and carrying out feature scoring on the features in the input feature sets through the XGBoost model, the SVM model, and the random forest model to obtain feature weight values of the three input feature sets;
and calculating the weight distribution of the features in the feature subset according to the three feature weight values.
6. The intrusion detection method according to claim 1, wherein model training the optimal feature subset as a training sample to obtain an initial intrusion detection model comprises:
inputting the training sample into a preset neural network model, and outputting an output value of intrusion classification prediction;
calculating an initial loss value of the output value according to the output value and a preset initial loss function;
and updating the model parameters of the neural network model according to the initial loss value and the output value, and continuously executing the step of inputting the model training sample into the neural network model until the initial loss value converges, so as to obtain an initial intrusion detection model.
7. An intrusion detection device, the intrusion detection device comprising:
the characteristic acquisition module is used for acquiring a sample data set from a preset database and determining all the characteristics in the sample data set;
the first feature selection module is used for carrying out data quality evaluation on all features in the sample data set based on a preset feature selection algorithm, and carrying out feature selection once from all the features to obtain a candidate feature set of the sample data set;
the second feature selection module is used for randomly dividing the candidate feature set into n equal parts to obtain n feature subsets, wherein n is a natural number not less than 1; dividing each feature subset equally according to the number of preset combination models, inputting the resulting parts into the combination models respectively, and calculating the weight distribution of the features in the feature subset; ranking the features in the candidate feature set according to the weight distribution of the features in the n feature subsets, and deleting the feature with the smallest weight to obtain the optimal feature subset of the current round; judging whether the current number of model training rounds is greater than or equal to n; if not, re-ranking the features and returning to the step of ranking the features in the candidate feature set according to the weight distribution of the features in the n feature subsets; if so, outputting the optimal feature subset of the current round as the optimal feature subset;
the initial model training module is used for carrying out model training by taking the optimal feature subset as a training sample to obtain an initial intrusion detection model;
the parameter uploading module is used for uploading the model parameters and the weights of the initial intrusion detection model to a central server for integration so that the central server performs a joint training task to generate a global model;
the loss function calculation module is used for taking the model parameters and the weights of the global model returned by the central server as the model parameters and the weights of the initial intrusion detection model, and calculating the training sample by using a preset loss function to obtain a loss function value;
the model generation module is used for obtaining an intrusion detection model according to model parameters and weights of the initial intrusion detection model when the loss function value is smaller than a preset threshold value;
the detection module is used for acquiring data to be detected, inputting the data to be detected into the intrusion detection model, and obtaining a detection classification result, wherein the detection classification result comprises normal behavior or intrusion behavior;
and the early warning module is used for generating an early warning signal when the detection classification result is an intrusion behavior, and sending the early warning signal to a terminal of a worker so as to perform intrusion detection early warning.
8. An intrusion detection device, the intrusion detection device comprising: a memory and at least one processor, the memory having instructions stored therein, the memory and the at least one processor being interconnected by a line;
the at least one processor invoking the instructions in the memory to cause the intrusion detection device to perform the steps of the intrusion detection method according to any one of claims 1-6.
9. A computer readable storage medium having stored thereon a computer program, which when executed by a processor performs the steps of the intrusion detection method according to any one of claims 1-6.
CN202110740041.4A 2021-06-30 2021-06-30 Intrusion detection method, device, equipment and storage medium Active CN113434859B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110740041.4A CN113434859B (en) 2021-06-30 2021-06-30 Intrusion detection method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110740041.4A CN113434859B (en) 2021-06-30 2021-06-30 Intrusion detection method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113434859A CN113434859A (en) 2021-09-24
CN113434859B true CN113434859B (en) 2023-08-15

Family

ID=77758274

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110740041.4A Active CN113434859B (en) 2021-06-30 2021-06-30 Intrusion detection method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113434859B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113992419B (en) * 2021-10-29 2023-09-01 上海交通大学 System and method for detecting and processing abnormal behaviors of user
CN114117935A (en) * 2021-12-06 2022-03-01 上海交通大学 Internet of things anomaly detection system based on joint learning and automatic encoder
CN114006775B (en) * 2021-12-31 2022-04-12 北京微步在线科技有限公司 Intrusion event detection method and device
CN115103353A (en) * 2022-06-13 2022-09-23 厦门大学 Intelligent terminal intrusion detection method
CN115277555B (en) * 2022-06-13 2024-01-16 香港理工大学深圳研究院 Heterogeneous environment network traffic classification method, heterogeneous environment network traffic classification device, terminal and storage medium
CN115714687B (en) * 2022-11-23 2024-06-04 武汉轻工大学 Intrusion flow detection method, device, equipment and storage medium
CN116886448B (en) * 2023-09-07 2023-12-01 卓望数码技术(深圳)有限公司 DDoS attack alarm studying and judging method and device based on semi-supervised learning
CN117998364B (en) * 2024-04-03 2024-05-28 中国民航大学 XGBoost WSN intrusion detection system based on mixed feature selection

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109347872A (en) * 2018-11-29 2019-02-15 电子科技大学 A kind of network inbreak detection method based on fuzziness and integrated study
CN110881037A (en) * 2019-11-19 2020-03-13 北京工业大学 Network intrusion detection method and training method and device of model thereof, and server
CN111181939A (en) * 2019-12-20 2020-05-19 广东工业大学 Network intrusion detection method and device based on ensemble learning
CN111914253A (en) * 2020-08-10 2020-11-10 中国海洋大学 Method, system, equipment and readable storage medium for intrusion detection

Also Published As

Publication number Publication date
CN113434859A (en) 2021-09-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant