CN111310176B

CN111310176B - Intrusion detection method and device based on feature selection

Info

Publication number: CN111310176B
Application number: CN202010062791.6A
Authority: CN
Inventors: 闫利华
Original assignee: Suzhou Inspur Intelligent Technology Co Ltd
Current assignee: Suzhou Inspur Intelligent Technology Co Ltd
Priority date: 2020-01-19
Filing date: 2020-01-19
Publication date: 2022-05-27
Anticipated expiration: 2040-01-19
Also published as: CN111310176A

Abstract

The invention provides an intrusion detection method and device based on feature selection. The method comprises: acquiring and preprocessing network data to obtain feature samples, randomly dividing the feature samples into a plurality of groups, and applying the FCBF feature selection algorithm to each group to remove redundant features; merging the reduced groups into a set A and dividing the feature samples of set A into a set B and a set C, where set B contains at least one feature sample; taking feature samples from set C one at a time, forming a set D from each such feature sample together with set B, and calculating the correlation of set B with the class and of set D with the class; and, in response to the correlation between set B and the class being smaller than that between set D and the class, taking set D as the new set B and returning to the previous step until set C is empty, thereby obtaining a feature set for intrusion detection. The method and device can extract a feature set that is strongly correlated with the class and stable, and can therefore identify network intrusions more effectively.

Description

Intrusion detection method and device based on feature selection

Technical Field

The present invention relates to the field of computers, and more particularly, to a method and apparatus for intrusion detection based on feature selection.

Background

With the rapid development of big data and cloud computing, network intrusion techniques have become increasingly covert and stealthy, and the demand for network security grows day by day. However, as data volumes increase, mainstream intrusion detection models recognize intrusions less and less efficiently. Network data is not only large in scale but also high in dimensionality, so it contains a large amount of redundant information, and irrelevant information can greatly degrade intrusion detection. Feature selection is an effective means of addressing this problem: a good feature selection algorithm can eliminate redundant features and noise from the data to be classified, improving both the speed and the accuracy of intrusion detection. A feature selection algorithm with robust performance is therefore very important for intrusion detection.

FCBF (Fast Correlation-Based Filter) is a fast feature selection method based on evaluating feature correlation. It consists of two main steps: removing irrelevant features, and then removing redundant features with a sequential forward search. FCBF sets a threshold, and features whose correlation with the class is below the threshold are considered irrelevant; removing them can greatly reduce the dimensionality of the data. A sequential forward search is then applied to the remaining relevant features to eliminate redundant ones, yielding the final feature subset.

The main problems of FCBF are as follows:

When FCBF performs feature selection, all samples are processed at once to obtain the final feature subset. Feature selection is thus performed only once, and the resulting subset is valid, under a given evaluation criterion, only for the current samples. Such feature subsets are prone to overfitting: once the sample set changes, the subset may no longer suit the new samples, and classification accuracy drops if the original subset is still used.

When FCBF eliminates redundant features, the redundancy condition is as follows: for two features F_i and F_j, F_j is judged redundant if the correlation between F_i and class C is not less than the correlation between F_j and class C, and the correlation between F_i and F_j is not less than the correlation between F_j and class C. However, the correlation between F_i and F_j has no direct relationship with the correlation between F_j and class C, so relevant features may be deleted as redundant, reducing classification accuracy.

The problem addressed by the invention is therefore how to reduce overfitting of the feature subset without slowing down feature selection, so as to improve the accuracy of intrusion detection.

Disclosure of Invention

In view of this, an object of the embodiments of the present invention is to provide an intrusion detection method and device based on feature selection that optimize the FCBF feature selection algorithm so as to select a more stable and effective feature subset without reducing computation speed.

Based on the above object, an aspect of the embodiments of the present invention provides an intrusion detection method based on feature selection, including the following steps:

acquiring network data, preprocessing the data to obtain feature samples, randomly dividing the feature samples into a plurality of groups, and applying the FCBF feature selection algorithm to each group to remove redundant features;

merging all groups of samples from which redundant features have been removed into a set A, and dividing all feature samples in set A into a set B and a set C, where set B contains at least one feature sample;

taking feature samples from set C one at a time, forming a set D from the feature sample and set B, and calculating the correlation of set B with the class and the correlation of set D with the class; and

in response to the correlation between set B and the class being smaller than the correlation between set D and the class, taking set D as the new set B and returning to the previous step until set C is empty, thereby obtaining a feature set for intrusion detection.

In some embodiments, the method further comprises:

in response to the correlation between set B and the class being not less than the correlation between set D and the class, keeping set B unchanged and returning to the previous step until set C is empty.

In some embodiments, applying the FCBF feature selection algorithm to each group to remove redundant features comprises:

calculating, by the FCBF feature selection algorithm, the symmetric uncertainty SU_ic between each feature sample F_i in each group and class C, setting a symmetric uncertainty threshold, and deleting from each group the feature samples whose symmetric uncertainty with respect to class C is smaller than the threshold.

In some embodiments, applying the FCBF feature selection algorithm to each group to remove redundant features further comprises:

for each feature sample F_i in each group remaining after the deletion, selecting each feature sample F_j that follows F_i in the group and calculating the symmetric uncertainty SU_ij between them; and

in response to SU_ij ≥ SU_jc and SU_ic ≥ SU_jc, deleting feature sample F_j from the group, where SU_jc denotes the symmetric uncertainty between feature sample F_j and class C, and SU_ic denotes the symmetric uncertainty between feature sample F_i and class C.

In some embodiments, the threshold is represented as

Where N represents the number of feature samples in each group.

In some embodiments, the correlation between each set and the class is expressed as

where N is the number of features in the set, Avg(SU_ic) is the average symmetric uncertainty between the feature samples in the set and class C, and Avg(SU_ij) is the average symmetric uncertainty between pairs of feature samples in the set.

In some embodiments, merging all groups of samples with redundant features removed into a set A and dividing all feature samples in set A into a set B and a set C, where set B contains at least one feature sample, comprises:

making set B contain a single feature sample.

In some embodiments, the method further comprises:

training and testing on the feature samples extracted in set B to obtain a feature subset for identifying network intrusions.

Another aspect of the embodiments of the present invention provides an intrusion detection apparatus based on feature selection, including:

at least one processor; and

a memory storing program code executable by the processor, the program code implementing the method of any of the above when executed by the processor.

A further aspect of embodiments of the present invention provides a computer readable storage medium having stored thereon a computer program which, when executed, may implement the method of any one of the above.

The invention has the following beneficial technical effects: the intrusion detection method and device based on feature selection can select a stable and effective feature subset without reducing computation speed; using this subset for classification improves classification accuracy, so that network attack behavior can be captured effectively and the security of the network environment ensured.

Drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other embodiments from them without creative effort.

FIG. 1 is a flow chart of a method of intrusion detection based on feature selection according to the present invention;

FIG. 2 is a flow diagram of a method for intrusion detection based on feature selection according to one embodiment of the present invention;

FIG. 3 is a schematic diagram of a hardware structure of an intrusion detection device based on feature selection according to the present invention.

Detailed Description

Embodiments of the present invention are described below. However, it is to be understood that the disclosed embodiments are merely examples and that other embodiments may take various and alternative forms. The figures are not necessarily to scale; certain features may be exaggerated or minimized to show details of particular components. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a representative basis for teaching one skilled in the art to variously employ the present invention. As one of ordinary skill in the art will appreciate, various features illustrated and described with reference to any one of the figures may be combined with features illustrated in one or more other figures to produce embodiments that are not explicitly illustrated or described. The combination of features shown provides a representative embodiment for a typical application. However, various combinations and modifications of the features consistent with the teachings of the present invention may be desired for certain specific applications or implementations.

In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.

In view of the above object, one aspect of the embodiments of the present invention provides an intrusion detection method based on feature selection, which, as shown in FIG. 1, includes the following steps:

step S101: acquiring network data, preprocessing the data to obtain feature samples, randomly dividing the feature samples into a plurality of groups, and applying the FCBF feature selection algorithm to each group to remove redundant features;

step S102: merging all groups of samples from which redundant features have been removed into a set A, and dividing all feature samples in set A into a set B and a set C, where set B contains at least one feature sample;

step S103: taking feature samples from set C one at a time, forming a set D from the feature sample and set B, and calculating the correlation of set B with the class and the correlation of set D with the class;

step S104: in response to the correlation between set B and the class being smaller than the correlation between set D and the class, taking set D as the new set B and returning to the previous step until set C is empty, thereby obtaining a feature set for intrusion detection.

In some embodiments, the method further comprises: in response to the correlation between set B and the class being not less than the correlation between set D and the class, keeping set B unchanged and returning to the previous step (i.e., step S103) until set C is empty.

In some embodiments, the method further comprises: training and testing on the feature samples extracted in set B to obtain a feature subset for identifying network intrusions.

In some embodiments, applying the FCBF feature selection algorithm to each group to remove redundant features comprises: calculating, by the FCBF feature selection algorithm, the symmetric uncertainty SU_ic between each feature sample F_i in each group and class C, setting a symmetric uncertainty threshold, and deleting from each group the feature samples whose symmetric uncertainty with respect to class C is smaller than the threshold.

In some embodiments, applying the FCBF feature selection algorithm to each group to remove redundant features further comprises: for each feature sample F_i in each group remaining after the deletion, selecting each feature sample F_j that follows F_i in the group and calculating the symmetric uncertainty SU_ij between them; and in response to SU_ij ≥ SU_jc and SU_ic ≥ SU_jc, deleting feature sample F_j from the group, where SU_jc denotes the symmetric uncertainty between F_j and class C, and SU_ic denotes the symmetric uncertainty between F_i and class C.

In some embodiments, the threshold is represented as

Where N represents the number of feature samples in each group.

In some embodiments, the correlation between each set and the class is expressed as

where N is the number of features in the set, Avg(SU_ic) is the average symmetric uncertainty between all feature samples in the set and class C, and Avg(SU_ij) is the average symmetric uncertainty between pairs of feature samples in the set.

In some embodiments, merging all groups of samples with redundant features removed into a set A and dividing all feature samples in set A into a set B and a set C, where set B contains at least one feature sample, comprises: making set B contain a single feature sample.

In some embodiments, the FCBF algorithm, a representative filter-type feature selection method, uses symmetric uncertainty (SU) as its evaluation criterion to measure the degree of association between a feature and the class or between two features. For variables X and Y, the symmetric uncertainty between them is:

$$SU(X, Y) = 2\left[\frac{IG(X \mid Y)}{H(X) + H(Y)}\right]$$

where IG(X | Y) is the information gain, i.e., the mutual information between the two variables, and H(X) and H(Y) are the information entropies of the variables.
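The SU criterion can be computed directly from value counts once features are discretized, as the preprocessing step later in this description suggests. The following is a minimal Python sketch, assuming each feature and the class label are given as equal-length sequences of discrete values; the function names are illustrative and not part of the patent.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(xs: Sequence[Hashable]) -> float:
    """Shannon entropy H(X), in bits, of a discrete sequence."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def conditional_entropy(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """H(X | Y): entropy of X within each value of Y, weighted by P(Y = y)."""
    n = len(xs)
    groups: dict = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def symmetric_uncertainty(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """SU(X, Y) = 2 * IG(X | Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 0.0  # both variables constant: no information to share
    ig = hx - conditional_entropy(xs, ys)  # information gain = mutual information
    return 2.0 * ig / (hx + hy)


# A feature that mirrors the class has SU = 1; an independent one has SU = 0.
print(symmetric_uncertainty(["a", "a", "b", "b"], [0, 0, 1, 1]))  # 1.0
```

The normalization by H(X) + H(Y) is what makes SU comparable across features with different numbers of distinct values, which plain information gain is not.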

In addition, the set-to-class correlation (SC) is defined as:

where N is the number of features in the set, Avg(SU_ic) is the average correlation between all features in the set and the class, and Avg(SU_ij) is the average correlation between pairs of features in the set.
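The formula for SC is not reproduced in this text. The quantities it names (N, Avg(SU_ic), Avg(SU_ij)) match the merit function of correlation-based feature selection (CFS), so the sketch below assumes that form, SC = N·Avg(SU_ic) / sqrt(N + N(N-1)·Avg(SU_ij)); treat it as an assumption, not the patent's confirmed definition. SU values are passed in rather than recomputed, and `set_class_correlation` is a hypothetical name.

```python
import math
from itertools import combinations
from typing import Callable, Sequence


def set_class_correlation(
    features: Sequence[str],
    su_class: dict[str, float],
    su_pair: Callable[[str, str], float],
) -> float:
    """Set-to-class correlation SC of a feature set.

    ASSUMPTION: CFS-style merit
        SC = N * Avg(SU_ic) / sqrt(N + N * (N - 1) * Avg(SU_ij)).
    """
    n = len(features)
    if n == 0:
        return 0.0
    avg_ic = sum(su_class[f] for f in features) / n
    pairs = list(combinations(features, 2))
    # A single feature has no pairs; treat the inter-feature average as 0.
    avg_ij = sum(su_pair(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    return n * avg_ic / math.sqrt(n + n * (n - 1) * avg_ij)


su_c = {"f1": 0.6, "f2": 0.4}  # toy SU_ic values
print(round(set_class_correlation(["f1"], su_c, lambda a, b: 0.2), 4))        # 0.6
print(round(set_class_correlation(["f1", "f2"], su_c, lambda a, b: 0.2), 4))  # 0.6455
```

Under this assumed form, adding a feature that is strongly correlated with one already in the set (for example SU_ij = 0.9) lowers SC from 0.6 to about 0.51, which is what allows a forward search to reject redundant features.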

The intrusion detection algorithm of the present invention mainly involves two aspects: selecting feature groups by grouping the samples, and selecting the final feature set by forward search.

The process is detailed as follows:

(1) Extracting feature groups: the sample set is randomly divided into a plurality of groups, for example 10 groups S_1, S_2, ..., S_10. Irrelevant features are first removed from each sample set S_i: for each feature, the degree of association SU_ic between the feature and the class is calculated, the features are sorted in descending order of this value, and a threshold is set as

Features ranked after the threshold position are regarded as irrelevant and deleted, yielding ten feature sets with irrelevant features removed: S_1-1, S_1-2, ..., S_1-10.
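The grouping and relevance-filtering part of step (1) might look as follows. This sketch assumes the SU_ic values for a group have already been computed, applies the deletion rule of the embodiments (drop features whose SU with the class is below the threshold), and takes the threshold as a caller-supplied number because its formula is not given in usable form here. `random_groups` and `drop_irrelevant` are hypothetical names, and the SU values in the example are toy numbers.

```python
import random
from typing import Optional


def random_groups(n_samples: int, n_groups: int = 10, seed: Optional[int] = None) -> list[list[int]]:
    """Randomly partition sample indices 0..n_samples-1 into n_groups disjoint groups."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Deal the shuffled indices round-robin so group sizes differ by at most one.
    return [idx[g::n_groups] for g in range(n_groups)]


def drop_irrelevant(su_class: dict[str, float], threshold: float) -> list[str]:
    """Rank features by SU_ic (descending) and keep those not below the threshold."""
    ranked = sorted(su_class, key=lambda f: su_class[f], reverse=True)
    return [f for f in ranked if su_class[f] >= threshold]


groups = random_groups(23, n_groups=10, seed=0)
print([len(g) for g in groups])  # [3, 3, 3, 2, 2, 2, 2, 2, 2, 2]
print(drop_irrelevant({"f1": 0.05, "f2": 0.42, "f3": 0.31}, threshold=0.1))  # ['f2', 'f3']
```

If the threshold is instead a rank position k, as the sorting step suggests, the filter reduces to keeping `ranked[:k]`.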

Redundant features are then removed from each of the ten feature sets S_1-i to obtain the feature group extracted from each of the ten sample sets. Specifically, for each feature F_i in feature set S_1-i, each feature F_j ranked after F_i is selected and SU_ij is calculated. If SU_ij ≥ SU_jc and SU_ic ≥ SU_jc, then F_j is more strongly correlated with F_i than with the class, and F_i can stand in for it; F_j is therefore a redundant feature and is deleted. Performing this on all ten feature sets yields ten valid feature subsets S_S-1, S_S-2, ..., S_S-10, one for each sample set.
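The redundancy test of step (1) can be sketched as below, assuming features arrive already sorted by SU_ic in descending order and that pairwise SU values are available through a callable. The function name and the example values are illustrative only.

```python
from typing import Callable, Sequence


def remove_redundant(
    ranked: Sequence[str],
    su_class: dict[str, float],
    su_pair: Callable[[str, str], float],
) -> list[str]:
    """Redundancy pass over features already sorted by SU_ic in descending order.

    For each surviving F_i, every later F_j with SU_ij >= SU_jc and
    SU_ic >= SU_jc is treated as redundant (F_i stands in for it) and dropped.
    """
    kept = list(ranked)
    i = 0
    while i < len(kept):
        fi = kept[i]
        kept = kept[: i + 1] + [
            fj
            for fj in kept[i + 1 :]
            if not (su_pair(fi, fj) >= su_class[fj] and su_class[fi] >= su_class[fj])
        ]
        i += 1
    return kept


# Toy SU values: f2 is covered by f1 (SU_12 = 0.7 >= SU_2c = 0.5), f3 is not.
su_c = {"f1": 0.6, "f2": 0.5, "f3": 0.3}
pairs = {frozenset(("f1", "f2")): 0.7, frozenset(("f1", "f3")): 0.1, frozenset(("f2", "f3")): 0.4}
print(remove_redundant(["f1", "f2", "f3"], su_c, lambda a, b: pairs[frozenset((a, b))]))
# ['f1', 'f3']
```

Note that f3 is never tested against f2: once f2 is dropped it no longer serves as a reference feature, which is the usual FCBF behavior.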

(2) Merging the feature groups and selecting the final feature subset: the ten feature groups S_S-1, S_S-2, ..., S_S-10 are merged into an initial set S_Init. The first feature F_1 of S_Init is taken, the final feature subset is initialized as S_Finally = {F_1}, and S_Init = S_Init - {F_1}. Then, for each feature F_i in S_Init, set S_Init = S_Init - {F_i} and S_temp = S_Finally + {F_i}, and calculate the correlations of S_Finally and S_temp with the class. If

$$SC(S_{Finally}) < SC(S_{temp}),$$

then F_i is considered worth retaining, and S_Finally = S_temp. After all features have been processed, the final feature subset S_Finally is obtained.
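Step (2) is a greedy forward search, sketched here under two assumptions: features that appear in more than one group are kept once, at their first occurrence, when S_Init is formed; and the set-to-class correlation is supplied as a callable `sc`, so any implementation of the SC defined above can be plugged in. `forward_select` is a hypothetical name, and the scoring function in the example is a toy stand-in, not SC.

```python
from typing import Callable, Iterable, Sequence


def forward_select(
    groups: Iterable[Sequence[str]],
    sc: Callable[[list[str]], float],
) -> list[str]:
    """Merge per-group feature subsets and run the forward search of step (2)."""
    # Merge the group subsets into S_Init.
    # ASSUMPTION: a feature chosen by several groups is kept once, at first occurrence.
    s_init = list(dict.fromkeys(f for g in groups for f in g))
    if not s_init:
        return []
    s_final = [s_init[0]]              # S_Finally = {F_1}
    for f in s_init[1:]:               # take each remaining F_i in turn
        s_temp = s_final + [f]         # S_temp = S_Finally + {F_i}
        if sc(s_final) < sc(s_temp):   # keep F_i only if it raises the set score
            s_final = s_temp
    return s_final


weights = {"f1": 1.0, "f2": -0.5, "f3": 0.2}  # toy scores, not a real SC
print(forward_select([["f1", "f2"], ["f2", "f3"]], lambda fs: sum(weights[f] for f in fs)))
# ['f1', 'f3']
```

Because each candidate is judged against the set built so far, the order of S_Init affects the result; the sketch follows the text in seeding the search with the first merged feature.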

Building on FCBF, the algorithm selects features by grouping the samples and, in addition, evaluates the correlation between the candidate feature set and the class when deciding which features to keep. This both increases the stability of the selected features and removes redundancy between feature sets. In the embodiment where the sample set is randomly divided into 10 groups, the feature-group step runs the original algorithm 10 times, but each run processes only 1/10 of the samples, so the time of this step can be considered essentially the same as that of the original algorithm. The algorithm adds a step that merges the feature groups; its time complexity is linear, O(N), in the number N of features entering it, and because the first step has already removed most features, this N is much smaller than the original number of features, so the additional time is negligible. Overall, the feature subset selected by the algorithm is stable and effective without increasing time complexity, and overfitting to a particular sample set can be effectively avoided.

The complete intrusion detection process based on the new algorithm is shown in FIG. 2. It mainly comprises acquiring data, preprocessing the data (using preprocessing algorithms such as, but not limited to, normalization and discretization), selecting features with the algorithm described above, and performing classification verification on the selected features (for example, with a naive Bayes classifier) to identify attack behavior and obtain the feature subset. Specifically, acquiring data means monitoring and collecting data in the network, including both normal data and data carrying intrusion attacks. Classification verification means that, to ensure that the feature selection algorithm selects features stably, the extracted sample set is trained and tested using ten-fold cross-validation, with an SVM (support vector machine) classifier used for classification. Once verification passes, the resulting feature subset can be used to identify intrusions; applying it in the intrusion detection process allows intrusion attacks in the network to be identified effectively, ensuring network security.
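The timing argument for the grouped step can be written as a rough cost estimate. Assume (this is not stated in the text) that computing one SU value costs time proportional to the number of samples it scans, c per sample, that one run over m samples evaluates K SU values, and that each group evaluates roughly the same K. Then for G groups of m/G samples each:

```latex
T_{\text{grouped}} \approx \sum_{g=1}^{G} K \, c \, \frac{m}{G} = K \, c \, m \approx T_{\text{FCBF}}
```

With G = 10 this is ten runs on one tenth of the data each. The estimate weakens if the groups retain very different numbers of features, since K then varies from group to group.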
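The ten-fold cross-validation used for classification verification can be sketched with the standard library alone. The sketch below only generates the train/test index splits; the classifier (an SVM, or naive Bayes in the other example) is left out, and in practice scikit-learn's `KFold` and `SVC` classes cover the same ground. `k_fold_splits` is a hypothetical helper.

```python
import random
from typing import Iterator, Optional


def k_fold_splits(
    n_samples: int, k: int = 10, seed: Optional[int] = None
) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_indices, test_indices) for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # k disjoint folds covering every sample
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g, fold in enumerate(folds) if g != f for i in fold)
        yield train, test


splits = list(k_fold_splits(25, k=10, seed=1))
print(len(splits), sorted(len(test) for _, test in splits))
# 10 [2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
```

For each split, a classifier would be trained on the `train` rows restricted to the selected feature columns and scored on the `test` rows; averaging the ten scores gives the verification accuracy.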

Where technically feasible, the technical features listed above for different embodiments may be combined with each other or changed, added, omitted, etc. to form further embodiments within the scope of the invention.

As the foregoing embodiments show, the intrusion detection method based on feature selection provided by the embodiments of the present invention can select a stable and effective feature subset without reducing computation speed; classification using this subset improves classification accuracy, effectively captures network attack behavior, and ensures the security of the network environment.

In view of the above object, in another aspect of the embodiments of the present invention, an intrusion detection device based on feature selection is provided, including:

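at least one processor; and

a memory storing program code executable by the processor, the program code implementing the method of any of the above when executed by the processor.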

Yet another aspect of embodiments of the present invention provides a computer readable storage medium having stored thereon a computer program which, when executed, may implement the method of any one of the above.

Fig. 3 is a schematic hardware structure diagram of an intrusion detection device based on feature selection according to an embodiment of the present invention.

Taking the computer device shown in FIG. 3 as an example, the computer device includes a processor 301 and a memory 302, and may further include an input device 303 and an output device 304.

The processor 301, the memory 302, the input device 303, and the output device 304 may be connected by a bus or by other means; FIG. 3 illustrates a bus connection as an example.

The memory 302 is a non-volatile computer-readable storage medium and can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the intrusion detection method based on feature selection in the embodiments of the present application. The processor 301 executes the various functional applications and data processing of the server by running the non-volatile software programs, instructions, and modules stored in the memory 302, that is, it implements the intrusion detection method based on feature selection of the above method embodiments.

The memory 302 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created by the intrusion detection method based on feature selection, and the like. Further, the memory 302 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 302 optionally includes memory located remotely from the processor 301, which may be connected to a local module via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input device 303 may receive numeric or character input and generate key signal inputs related to user settings and function control of the computer device that performs the intrusion detection method based on feature selection. The output device 304 may include a display device such as a display screen.

Program instructions/modules corresponding to the one or more feature selection based intrusion detection methods are stored in the memory 302 and, when executed by the processor 301, perform the feature selection based intrusion detection method in any of the above-described method embodiments.

Any embodiment of the computer device for performing the intrusion detection method based on feature selection may achieve the same or similar effects as any corresponding embodiment of the method described above.

Finally, it should be noted that, as will be understood by those skilled in the art, all or part of the processes in the methods of the above embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium, and when executed, may include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a Random Access Memory (RAM).

In addition, the apparatuses, devices and the like disclosed in the embodiments of the present invention may be various electronic terminal devices, such as a mobile phone, a Personal Digital Assistant (PDA), a tablet computer (PAD), a smart television and the like, or may be a large terminal device, such as a server and the like, and therefore the scope of protection disclosed in the embodiments of the present invention should not be limited to a specific type of apparatus, device. The client disclosed in the embodiment of the present invention may be applied to any one of the above electronic terminal devices in the form of electronic hardware, computer software, or a combination of both.

Furthermore, the method disclosed according to an embodiment of the present invention may also be implemented as a computer program executed by a CPU, which may be stored in a computer-readable storage medium. The computer program, when executed by the CPU, performs the above-described functions defined in the method disclosed in the embodiments of the present invention.

Further, the above method steps and system elements may also be implemented using a controller and a computer readable storage medium for storing a computer program for causing the controller to implement the functions of the above steps or elements.

Further, it should be appreciated that the computer-readable storage media (e.g., memory) described herein can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. By way of example, and not limitation, nonvolatile memory can include Read Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM), which can act as external cache memory. By way of example and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The storage devices of the disclosed aspects are intended to comprise, without being limited to, these and other suitable types of memory.

Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.

The steps of a method or algorithm described in connection with the disclosure herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.

In one or more exemplary designs, the functions may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a general-purpose or special-purpose computer, or a general-purpose or special-purpose processor. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.

The numbers of the embodiments disclosed in the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.

The above-described embodiments are possible examples of implementations and are presented merely for a clear understanding of the principles of the invention. Those of ordinary skill in the art will understand that the discussion of any embodiment above is meant to be exemplary only, and is not intended to imply that the scope of the disclosure, including the claims, of embodiments of the invention is limited to these examples. Within the concept of the embodiments of the invention, technical features of the above embodiment or of different embodiments may also be combined, and there are many other variations of the different aspects of the embodiments of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.

Claims

1. An intrusion detection method based on feature selection, characterized by comprising the following steps:

acquiring network data, preprocessing the data to obtain feature samples of the data, randomly dividing the feature samples into a plurality of groups, and performing the following calculation on each group through an FCBF feature selection algorithm to remove redundant features:

computing, by the FCBF feature selection algorithm, the symmetric uncertainty $SU_{ic}$ between each feature sample $F_i$ in each group and the class C, setting a symmetric uncertainty threshold, and removing from each group the feature samples whose symmetric uncertainty with respect to class C is less than the threshold, wherein

$$SU(X,Y) = 2\left[\frac{IG(X|Y)}{H(X)+H(Y)}\right]$$

$IG(X|Y)$ denotes the mutual information between the two variables, $H(X)$ and $H(Y)$ denote the information entropies of the variables, and the threshold is expressed as

wherein N represents the number of feature samples in each group;

for each feature sample $F_i$ remaining in each group after the deletion operation, selecting each other feature sample $F_j$ in the group and calculating the symmetric uncertainty $SU_{ij}$ between the two;

in response to $SU_{ij} \ge SU_{jc}$ and $SU_{ic} \ge SU_{jc}$, deleting the feature sample $F_j$ from the group, where $SU_{jc}$ denotes the symmetric uncertainty between the feature sample $F_j$ and class C, and $SU_{ic}$ denotes the symmetric uncertainty between the feature sample $F_i$ and class C;

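merging all the groups of samples with redundant features removed into a set A, and dividing all the feature samples in the set A into a set B and a set C, wherein the set B contains at least one feature sample;

sequentially taking a feature sample from the set C, forming a set D from the feature sample and the set B, and calculating the correlation between the set B and the class and between the set D and the class, respectively, wherein the correlation between each set and the class is expressed as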

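where N is the number of features in the set, $Avg(SU_{ic})$ is the average of the symmetric uncertainties between all feature samples in the set and class C, and $Avg(SU_{ij})$ is the average of the symmetric uncertainties between pairs of feature samples in the set; and

in response to the correlation between the set B and the class being smaller than the correlation between the set D and the class, taking the set D as the new set B and returning to the previous step until the set C is empty, to obtain a feature set for intrusion detection.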

2. The method of claim 1, further comprising:

3. The method of claim 1, wherein merging all the groups of samples with redundant features removed into a set A, and dividing all the feature samples in the set A into a set B and a set C, wherein the set B contains at least one feature sample, comprises:

such that the set B contains one feature sample.

4. The method of claim 2, further comprising:

5. An intrusion detection device based on feature selection, comprising:

at least one processor; and

a memory storing program code executable by the processor, the program code implementing the method of any one of claims 1-4 when executed by the processor.

6. A computer-readable storage medium, having stored thereon a computer program which, when executed, implements the method of any of claims 1-4.
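The steps of claim 1 can be illustrated with a minimal Python sketch. It assumes the preprocessing step has already produced discrete (categorical or discretized) feature values, and it computes symmetric uncertainty with the standard FCBF definition $SU = 2\,IG/(H(X)+H(Y))$, using $IG(X|Y) = H(X) + H(Y) - H(X,Y)$. Several details are assumptions rather than the claimed method: the relevance threshold is passed in by the caller because the claimed threshold expression is not reproduced here; the set-to-class correlation is assumed to take the CFS-style merit form $N \cdot Avg(SU_{ic}) / \sqrt{N + N(N-1)\,Avg(SU_{ij})}$, built only from the terms N, $Avg(SU_{ic})$ and $Avg(SU_{ij})$ that the claim defines; and the initial set B (the single most class-relevant feature, consistent with claim 3) and the order in which set C is scanned are hypothetical choices. All function names are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import combinations


def entropy(xs):
    """Shannon entropy H(X), in bits, of a sequence of discrete values."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def symmetric_uncertainty(xs, ys):
    """SU(X, Y) = 2 * IG(X|Y) / (H(X) + H(Y)), with IG(X|Y) = H(X) + H(Y) - H(X, Y)."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:  # both variables constant: treat them as uncorrelated
        return 0.0
    ig = hx + hy - entropy(list(zip(xs, ys)))
    return 2.0 * ig / (hx + hy)


def fcbf_group(features, labels, group, threshold):
    """Relevance filtering and redundancy removal for one group of feature names."""
    su_c = {f: symmetric_uncertainty(features[f], labels) for f in group}
    # Drop features whose SU with class C is below the threshold; rank the rest by SU_ic.
    kept = sorted((f for f in group if su_c[f] >= threshold),
                  key=lambda f: su_c[f], reverse=True)
    i = 0
    while i < len(kept):
        fi = kept[i]
        # Delete F_j when SU_ij >= SU_jc and SU_ic >= SU_jc.
        kept = kept[:i + 1] + [
            fj for fj in kept[i + 1:]
            if not (symmetric_uncertainty(features[fi], features[fj]) >= su_c[fj]
                    and su_c[fi] >= su_c[fj])
        ]
        i += 1
    return kept


def merit(subset, features, labels):
    """ASSUMED CFS-style merit: N*Avg(SU_ic) / sqrt(N + N*(N-1)*Avg(SU_ij))."""
    n = len(subset)
    avg_ic = sum(symmetric_uncertainty(features[f], labels) for f in subset) / n
    pairs = list(combinations(subset, 2))
    avg_ij = (sum(symmetric_uncertainty(features[a], features[b]) for a, b in pairs)
              / len(pairs)) if pairs else 0.0
    return n * avg_ic / math.sqrt(n + n * (n - 1) * avg_ij)


def select_features(features, labels, n_groups, threshold, seed=0):
    """Random grouping, per-group FCBF, then forward selection from set C into set B."""
    names = list(features)
    random.Random(seed).shuffle(names)
    groups = [names[k::n_groups] for k in range(n_groups)]
    # Set A: union of all groups after redundant features are removed.
    set_a = [f for g in groups for f in fcbf_group(features, labels, g, threshold)]
    if not set_a:
        return []
    # Hypothetical split: B starts with the most class-relevant feature, C holds the rest.
    set_a.sort(key=lambda f: symmetric_uncertainty(features[f], labels), reverse=True)
    set_b, set_c = set_a[:1], set_a[1:]
    for f in set_c:  # take features from C one at a time until C is exhausted
        set_d = set_b + [f]
        if merit(set_b, features, labels) < merit(set_d, features, labels):
            set_b = set_d  # D becomes the new B
    return set_b
```

Splitting the features into random groups keeps each FCBF pass small, since the redundancy check is quadratic in the group size; the cost is that redundancy between features in different groups is only caught later, by the merit comparison during forward selection, which penalizes a candidate whose average SU with the features already in B is high.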