CN111310176B - Intrusion detection method and device based on feature selection - Google Patents

Publication number
CN111310176B
Authority
CN
China
Prior art keywords
feature
samples
correlation
sample
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010062791.6A
Other languages
Chinese (zh)
Other versions
CN111310176A (en)
Inventor
闫利华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202010062791.6A
Publication of CN111310176A
Application granted
Publication of CN111310176B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

The invention provides an intrusion detection method and device based on feature selection. The method comprises the following steps: acquiring and preprocessing network data to obtain feature samples of the network data, randomly dividing the feature samples into a plurality of groups, and applying the FCBF feature selection algorithm to each group to remove redundant features; merging all the groups, after removal, into a set A, and dividing the feature samples of the set A into a set B and a set C, wherein the set B comprises at least one feature sample; sequentially taking one feature sample from the set C, forming a set D from that feature sample and the set B, and calculating the correlation between the set B and the category and the correlation between the set D and the category; and, in response to the correlation between the set B and the category being smaller than the correlation between the set D and the category, taking the set D as the new set B and returning to the previous step, until the set C is empty, to obtain a feature set for intrusion detection. The method and the device can extract a feature set that is strongly correlated with the category and stable, and can therefore identify network intrusions more effectively.

Description

Intrusion detection method and device based on feature selection
Technical Field
The present invention relates to the field of computers, and more particularly, to a method and apparatus for intrusion detection based on feature selection.
Background
With the rapid development of big data and cloud computing, network intrusion techniques have become increasingly concealed and silent, and the demand for network security grows day by day. However, as data volumes increase, mainstream network intrusion detection models identify intrusions less and less efficiently. The data is not only large in scale but also high in dimensionality, so it contains a large amount of redundant and irrelevant information, which can greatly degrade intrusion detection. Feature selection is an effective way to address this problem: a good feature selection algorithm can eliminate redundant features and noisy data from the classification data and improve both the speed and the accuracy of intrusion detection. A feature selection algorithm with robust performance is therefore very important for intrusion detection.
FCBF (Fast Correlation-Based Filter) is a fast feature selection method based on feature correlation evaluation. It consists of two main steps: removing irrelevant features, and removing redundant features with a sequential forward search. The FCBF algorithm sets a threshold; features whose correlation with the class is smaller than the threshold are treated as irrelevant, and removing them can greatly reduce the dimensionality of the data. A sequential forward search is then applied to the remaining relevant features to eliminate redundant features, which yields the final feature subset.
The main problems with the FCBF algorithm are as follows.
When the FCBF feature selection algorithm is used, all samples are processed at once to obtain the final feature subset. Feature selection is performed only once, and the resulting feature subset is valid for all current samples under a given evaluation criterion. However, a feature subset obtained this way is prone to overfitting: once the sample set changes, the subset may no longer suit the new sample set, and using the original subset reduces classification accuracy.
When the FCBF algorithm eliminates redundant features, the redundancy condition is as follows: for two features $F_i$ and $F_j$, the correlation between $F_i$ and class C is not less than the correlation between $F_j$ and class C, and the correlation between $F_i$ and $F_j$ is not less than the correlation between $F_j$ and class C. However, the correlation between $F_i$ and $F_j$ has no direct relationship with the correlation between $F_j$ and class C. As a result, relevant features may be deleted as redundant, which reduces classification accuracy.
How to reduce overfitting of the feature subset without slowing down feature selection, and thereby improve the accuracy of intrusion detection, is the problem the invention sets out to solve.
Disclosure of Invention
In view of this, an object of the embodiments of the present invention is to provide a method and an apparatus for intrusion detection based on feature selection, which perform optimization based on an FCBF feature selection algorithm, and select a more stable and effective feature subset without reducing a computation speed.
Based on the above object, an aspect of the embodiments of the present invention provides an intrusion detection method based on feature selection, including the following steps:
acquiring network data, preprocessing the data to obtain feature samples of the data, randomly dividing the feature samples into a plurality of groups, and applying the FCBF feature selection algorithm to each group to remove redundant features;
merging all the groups from which redundant features have been removed into a set A, and dividing all the feature samples in the set A into a set B and a set C, wherein the set B comprises at least one feature sample;
sequentially taking one feature sample from the set C, forming a set D from that feature sample and the set B, and calculating the correlation between the set B and the category and the correlation between the set D and the category, respectively;
and in response to the correlation between the set B and the category being smaller than the correlation between the set D and the category, taking the set D as the new set B and returning to the previous step, until the set C is empty, to obtain a feature set for intrusion detection.
In some embodiments, the method further comprises:
in response to the correlation between the set B and the category being not less than the correlation between the set D and the category, leaving the set B unchanged and returning to the previous step until the set C is empty.
In some embodiments, removing redundant features from each group by the FCBF feature selection algorithm comprises:
calculating, by the FCBF feature selection algorithm, the symmetric uncertainty $SU_{ic}$ between each feature sample $F_i$ in each group and the class C, setting a symmetric uncertainty threshold, and deleting from each group the feature samples whose symmetric uncertainty with the class C is smaller than the threshold.
In some embodiments, removing redundant features from each group by the FCBF feature selection algorithm further comprises:
for each feature sample $F_i$ in each group of feature samples after the deletion operation, selecting each feature sample $F_j$ that follows $F_i$ in the group, and calculating the symmetric uncertainty $SU_{ij}$ between the two;
in response to $SU_{ij} \geq SU_{jc}$ and $SU_{ic} \geq SU_{jc}$, deleting the feature sample $F_j$ from the group, wherein $SU_{jc}$ represents the symmetric uncertainty between the feature sample $F_j$ and the class C, and $SU_{ic}$ represents the symmetric uncertainty between the feature sample $F_i$ and the class C.
In some embodiments, the threshold is represented as
[Formula not reproduced in the extracted text; original image BDA0002375037180000031]
where N represents the number of feature samples in each group.
In some embodiments, the correlation between each of the sets and categories is expressed as
[Formula not reproduced in the extracted text; original image BDA0002375037180000032]
where N is the number of features, $Avg(SU_{ic})$ is the average of the symmetric uncertainties between all feature samples in the set and the class C, and $Avg(SU_{ij})$ is the average of the symmetric uncertainties between pairs of feature samples in the set.
In some embodiments, merging all the groups from which redundant features have been removed into a set A, and dividing all the feature samples in the set A into a set B and a set C, wherein the set B comprises at least one feature sample, comprises:
configuring the set B to contain one feature sample.
In some embodiments, the method further comprises:
training and testing on the extracted feature samples in the set B to obtain a feature subset for identifying network intrusions.
Another aspect of the embodiments of the present invention provides an intrusion detection apparatus based on feature selection, including:
at least one processor; and
a memory storing program code executable by the processor, the program code implementing the method of any of the above when executed by the processor.
A further aspect of embodiments of the present invention provides a computer readable storage medium having stored thereon a computer program which, when executed, may implement the method of any one of the above.
The invention has the following beneficial technical effects: the intrusion detection method and device based on feature selection can select a stable and effective feature subset without reducing the operation speed. Using this subset for classification and identification improves classification accuracy, captures network attack behavior effectively, and safeguards the network environment.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other embodiments from these drawings without creative effort.
FIG. 1 is a flow chart of a method of intrusion detection based on feature selection according to the present invention;
FIG. 2 is a flow diagram of a method for intrusion detection based on feature selection according to one embodiment of the present invention;
FIG. 3 is a schematic diagram of a hardware structure of an intrusion detection device based on feature selection according to the present invention.
Detailed Description
Embodiments of the present invention are described below. However, it is to be understood that the disclosed embodiments are merely examples and that other embodiments may take various and alternative forms. The figures are not necessarily to scale; certain features may be exaggerated or minimized to show details of particular components. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a representative basis for teaching one skilled in the art to variously employ the present invention. As one of ordinary skill in the art will appreciate, various features illustrated and described with reference to any one of the figures may be combined with features illustrated in one or more other figures to produce embodiments that are not explicitly illustrated or described. The combination of features shown provides a representative embodiment for a typical application. However, various combinations and modifications of the features consistent with the teachings of the present invention may be desired for certain specific applications or implementations.
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
In view of the above object, an aspect of the embodiments of the present invention provides a method for intrusion detection based on feature selection, as shown in fig. 1, including the following steps:
step S101: acquiring network data, preprocessing the data to obtain feature samples of the data, randomly dividing the feature samples into a plurality of groups, and applying the FCBF feature selection algorithm to each group to remove redundant features;
step S102: merging all the groups from which redundant features have been removed into a set A, and dividing all the feature samples in the set A into a set B and a set C, wherein the set B comprises at least one feature sample;
step S103: sequentially taking one feature sample from the set C, forming a set D from that feature sample and the set B, and calculating the correlation between the set B and the category and the correlation between the set D and the category, respectively;
step S104: in response to the correlation between the set B and the category being smaller than the correlation between the set D and the category, taking the set D as the new set B and returning to the previous step, until the set C is empty, to obtain a feature set for intrusion detection.
In some embodiments, the method further comprises: in response to the correlation between the set B and the category being not less than the correlation between the set D and the category, leaving the set B unchanged and returning to the previous step (i.e., step S103) until the set C is empty.
In some embodiments, the method further comprises: training and testing on the extracted feature samples in the set B to obtain a feature subset for identifying network intrusions.
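The random grouping in step S101 can be sketched as follows. This is only an illustration under stated assumptions: the function name `split_into_groups`, the `seed` parameter, and the choice of near-equal group sizes are hypothetical and are not specified in the source.

```python
import random


def split_into_groups(samples, n_groups=10, seed=0):
    """Randomly divide samples into n_groups groups of near-equal size.

    The seed makes the shuffle reproducible; it is an illustrative choice.
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    # Deal the shuffled samples round-robin so group sizes differ by at most 1.
    return [shuffled[i::n_groups] for i in range(n_groups)]


groups = split_into_groups(range(25), n_groups=10)
print([len(g) for g in groups])
```

Each group would then be passed independently to the FCBF step, so that a feature has to prove useful in several random subsets of the data rather than in one pass over all samples.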
In some embodiments, removing redundant features from each group by the FCBF feature selection algorithm comprises: calculating, by the FCBF feature selection algorithm, the symmetric uncertainty $SU_{ic}$ between each feature sample $F_i$ in each group and the class C, setting a symmetric uncertainty threshold, and deleting from each group the feature samples whose symmetric uncertainty with the class C is smaller than the threshold.
In some embodiments, removing redundant features from each group by the FCBF feature selection algorithm further comprises: for each feature sample $F_i$ in each group of feature samples after the deletion operation, selecting each feature sample $F_j$ that follows $F_i$ in the group, and calculating the symmetric uncertainty $SU_{ij}$ between the two; in response to $SU_{ij} \geq SU_{jc}$ and $SU_{ic} \geq SU_{jc}$, deleting the feature sample $F_j$ from the group, wherein $SU_{jc}$ represents the symmetric uncertainty between the feature sample $F_j$ and the class C, and $SU_{ic}$ represents the symmetric uncertainty between the feature sample $F_i$ and the class C.
In some embodiments, the threshold is represented as
[Formula not reproduced in the extracted text; original image BDA0002375037180000061]
where N represents the number of feature samples in each group.
In some embodiments, the correlation between each of the sets and categories is represented as
[Formula not reproduced in the extracted text; original image BDA0002375037180000062]
where N is the number of features, $Avg(SU_{ic})$ is the average of the symmetric uncertainties between all feature samples in the set and the class C, and $Avg(SU_{ij})$ is the average of the symmetric uncertainties between pairs of feature samples in the set.
In some embodiments, merging all the groups from which redundant features have been removed into a set A, and dividing all the feature samples in the set A into a set B and a set C, wherein the set B comprises at least one feature sample, comprises: configuring the set B to contain one feature sample.
In some embodiments, the FCBF algorithm is a representative filter-type feature selection method. Its evaluation criterion is the symmetric uncertainty (SU), which measures the degree of association between a feature and a class or between two features. For variables X and Y, the symmetric uncertainty between them is:
$$SU(X, Y) = \frac{2 \, IG(X \mid Y)}{H(X) + H(Y)}$$
where $IG(X \mid Y)$ is the information gain, i.e. the amount of mutual information between the two variables, and $H(X)$ and $H(Y)$ are the information entropies of the variables.
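A minimal sketch of the symmetric uncertainty computation for discrete variables is given below. It assumes the standard definitions, with the information gain computed as H(X) + H(Y) - H(X, Y) and entropies measured in bits; the function names `entropy` and `symmetric_uncertainty` are illustrative, and returning 0 when both variables are constant is a convention chosen here, not something the source specifies.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy H(X), in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X|Y) / (H(X) + H(Y)), where IG = H(X) + H(Y) - H(X, Y)."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant (illustrative convention)
    joint = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    return 2.0 * (hx + hy - joint) / (hx + hy)


print(symmetric_uncertainty([0, 0, 1, 1], [0, 0, 1, 1]))  # identical variables
print(symmetric_uncertainty([0, 0, 1, 1], [0, 1, 0, 1]))  # independent variables
```

SU is normalized to the range [0, 1]: identical variables give 1 and independent variables give 0, which makes values comparable across features with different numbers of distinct values.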
In addition, the set-to-class correlation (SC) is defined as:
[Formula not reproduced in the extracted text; original image BDA0002375037180000072]
where N is the number of features in the set, $Avg(SU_{ic})$ is the average of the correlations between all features and the class, and $Avg(SU_{ij})$ is the average of the correlations between features.
The intrusion detection algorithm of the present invention has two main parts: selecting feature groups by grouping the samples, and selecting the final feature set by forward search.
The process is detailed as follows:
(1) Extracting feature groups: the sample set is randomly divided into several groups, for example 10 groups $S_1, S_2, \ldots, S_{10}$. For each sample set $S_i$, irrelevant features are removed first: for each feature, the degree of association $SU_{ic}$ between the feature and the category is calculated, the features are sorted in descending order of this degree of association, and a threshold is set:
[Formula not reproduced in the extracted text; original image BDA0002375037180000073]
The features after that position are irrelevant features and are deleted, which gives ten feature sets with irrelevant features removed: $S_{1-1}, S_{1-2}, \ldots, S_{1-10}$.
The redundant-feature removal is then executed on each of the ten feature sets $S_{1-i}$ to obtain the feature groups extracted from the ten sample sets. The specific process is as follows: for each feature $F_i$ in the feature set $S_{1-i}$, select each feature $F_j$ that follows $F_i$ and calculate $SU_{ij}$. If $SU_{ij} \geq SU_{jc}$ and $SU_{ic} \geq SU_{jc}$, this indicates that feature $F_j$ is strongly correlated with feature $F_i$ and can be replaced by $F_i$, so $F_j$ is a redundant feature and should be deleted. Performing this on all ten feature sets yields ten valid feature subsets, one per sample set: $S_{S-1}, S_{S-2}, \ldots, S_{S-10}$.
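The per-group procedure above can be sketched as follows. This is an illustration under assumptions rather than the patented implementation: the threshold formula is not reproduced in the source, so a plain numeric cut-off `threshold` is used instead, and the names `fcbf_group`, `features`, and `labels` are hypothetical. Features are assumed to be already discretized.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * (H(X) + H(Y) - H(X, Y)) / (H(X) + H(Y))."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    return 2.0 * (hx + hy - entropy(list(zip(x, y)))) / (hx + hy)


def fcbf_group(features, labels, threshold):
    """Return the names of the features kept for one sample group.

    features:  dict mapping feature name -> list of discrete values
    labels:    list of class labels, same length as each feature column
    threshold: minimum SU with the class (assumed numeric cut-off)
    """
    su_class = {name: symmetric_uncertainty(col, labels)
                for name, col in features.items()}
    # Relevance step: drop features below the threshold, rank the rest
    # in descending order of SU with the class.
    ranked = sorted((name for name in su_class if su_class[name] >= threshold),
                    key=lambda name: su_class[name], reverse=True)
    kept, removed = [], set()
    for i, fi in enumerate(ranked):
        if fi in removed:
            continue
        kept.append(fi)
        # Redundancy step: F_j is redundant to F_i when SU_ij >= SU_jc
        # (SU_ic >= SU_jc already holds because of the descending order).
        for fj in ranked[i + 1:]:
            if fj in removed:
                continue
            if symmetric_uncertainty(features[fi], features[fj]) >= su_class[fj]:
                removed.add(fj)
    return kept


labels = [0, 0, 1, 1, 0, 0, 1, 1]
features = {
    "f1": [0, 0, 1, 1, 0, 0, 1, 1],  # perfectly predictive
    "f2": [1, 1, 0, 0, 1, 1, 0, 0],  # inverse of f1, hence redundant
    "f3": [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the class
}
print(fcbf_group(features, labels, threshold=0.1))
```

In this toy input one of the two mutually redundant features survives and the independent one is dropped in the relevance step.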
(2) Merging the feature groups and selecting the final feature subset: the ten feature groups $S_{S-1}, S_{S-2}, \ldots, S_{S-10}$ are merged to obtain an initial set $S_{Init}$. The first feature $F_1$ of the initial set $S_{Init}$ is taken, the final feature subset is set to $S_{Finally} = \{F_1\}$, and $S_{Init} = S_{Init} - \{F_1\}$. For each feature $F_i$ in $S_{Init}$: $S_{Init} = S_{Init} - \{F_i\}$ and $S_{temp} = S_{Finally} + \{F_i\}$; the correlations of the sets $S_{Finally}$ and $S_{temp}$ with the category are calculated, and if
$$SC(S_{temp}) > SC(S_{Finally})$$
then $F_i$ is considered worth retaining, and $S_{Finally} = S_{temp}$. After all features have been processed, the final feature subset $S_{Finally}$ is obtained.
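The merge-and-forward-search step can be sketched as below. Because the SC formula image is not reproduced in the source text, the sketch assumes a CFS-style merit, SC = N * Avg(SU_ic) / sqrt(N + N * (N - 1) * Avg(SU_ij)), which uses the same three quantities the text names; this is an assumption, not the confirmed formula. The names `set_class_correlation`, `forward_merge`, `su_class`, and `su_pair` are hypothetical, and the SU values are taken as precomputed inputs.

```python
from math import sqrt


def set_class_correlation(subset, su_class, su_pair):
    """Assumed CFS-style set-to-class correlation (not confirmed by the source)."""
    n = len(subset)
    avg_fc = sum(su_class[f] for f in subset) / n
    pairs = [frozenset((a, b)) for i, a in enumerate(subset) for b in subset[i + 1:]]
    avg_ff = sum(su_pair[p] for p in pairs) / len(pairs) if pairs else 0.0
    return n * avg_fc / sqrt(n + n * (n - 1) * avg_ff)


def forward_merge(groups, su_class, su_pair):
    """Merge per-group feature lists and greedily grow the final subset.

    groups:   non-empty list of feature-name lists (one per sample group)
    su_class: dict feature -> SU with the class
    su_pair:  dict frozenset({a, b}) -> SU between features a and b
    """
    # S_Init: union of all groups, first-seen order, duplicates dropped.
    s_init = list(dict.fromkeys(f for group in groups for f in group))
    s_final = [s_init[0]]
    for f in s_init[1:]:
        s_temp = s_final + [f]
        # Keep F_i only if it raises the set-to-class correlation.
        if (set_class_correlation(s_temp, su_class, su_pair)
                > set_class_correlation(s_final, su_class, su_pair)):
            s_final = s_temp
    return s_final


su_class = {"a": 0.8, "b": 0.7, "c": 0.6}
su_pair = {
    frozenset(("a", "b")): 0.9,  # b is nearly a copy of a
    frozenset(("a", "c")): 0.1,
    frozenset(("b", "c")): 0.1,
}
print(forward_merge([["a", "b"], ["b", "c"]], su_class, su_pair))
```

With these toy values, feature b is rejected because its high overlap with a lowers the assumed merit, while c is accepted because it adds class information with little overlap.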
Building on the FCBF algorithm, this algorithm selects features by grouping the samples, and on top of the selected features it adds a check of the correlation between the feature set and the category. This both increases the stability of the selected features and removes redundancy within the feature set. In the embodiment that randomly divides the sample set into 10 groups, the feature-group computation is run 10 times instead of once, but each run processes only 1/10 of the samples, so the time for this stage can be regarded as the same as that of the original algorithm. The algorithm also adds a merging stage for the feature groups. The time complexity of this stage is linear, O(N), and because the first stage has already removed many features, the actual cost is far lower than O(N), so the added time can be ignored. Overall, the feature subset selected by the algorithm is stable and effective without an increase in time complexity, and overfitting to the sample set can be effectively avoided.
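The cost argument for the grouping stage can be made explicit under one simplifying assumption that the source does not state: that the time $T$ of one FCBF run grows linearly with the number of samples $M$, i.e. $T(M) = cM$ for some constant $c$. With $k$ groups of $M/k$ samples each:

$$k \cdot T\!\left(\frac{M}{k}\right) = k \cdot c \cdot \frac{M}{k} = cM = T(M)$$

For $k = 10$ this matches the statement that ten runs on one tenth of the samples cost about as much as one run on the full sample set.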
The complete intrusion detection process based on the new algorithm is shown in FIG. 2. It mainly comprises acquiring data, preprocessing the data (including but not limited to normalization and discretization), selecting features with the algorithm described above, and performing classification verification on the selected features (for example, with a naive Bayes classifier) to identify attack behavior and obtain the feature subset. Specifically, acquiring data means monitoring and collecting data in the network, including both normal data and data carrying intrusion attacks. Classification verification means that, to ensure the stability of the features chosen by the feature selection algorithm, the extracted sample set is trained and tested with ten-fold cross-validation and classified with an SVM (support vector machine) classifier. After verification passes, the feature subset usable for intrusion detection is obtained and applied in the intrusion detection process, so that intrusion attacks in the network are effectively identified and network security is ensured.
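The preprocessing and ten-fold splitting mentioned above can be sketched in plain Python as follows. The source names normalization, discretization, and ten-fold cross-validation but not the specific variants, so min-max normalization, equal-width binning, and an unshuffled interleaved fold split are assumptions made here for illustration; the function names are hypothetical, and the classifier itself is omitted.

```python
def min_max_normalize(column):
    """Scale a numeric column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant column carries no information
    return [(v - lo) / (hi - lo) for v in column]


def equal_width_discretize(column, n_bins=5):
    """Map values in [0, 1] to integer bins 0 .. n_bins - 1."""
    return [min(int(v * n_bins), n_bins - 1) for v in column]


def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for k-fold cross-validation."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


normalized = min_max_normalize([10, 20, 30])
print(normalized)
print(equal_width_discretize(normalized, n_bins=4))
print(sum(1 for _ in k_fold_indices(20, k=10)))
```

Discretizing before feature selection matters because the symmetric uncertainty used by FCBF is defined on discrete value counts.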
Where technically feasible, the technical features listed above for different embodiments may be combined with each other or changed, added, omitted, etc. to form further embodiments within the scope of the invention.
It can be seen from the foregoing embodiments that the intrusion detection method based on feature selection provided in the embodiments of the present invention can select a stable and effective feature subset without reducing the operation speed. Using this feature subset for classification and identification improves classification accuracy, captures network attack behavior effectively, and safeguards the network environment.
In view of the above object, in another aspect of the embodiments of the present invention, an intrusion detection device based on feature selection is provided, including:
at least one processor; and
a memory storing program code executable by the processor, the program code implementing the method of any of the above when executed by the processor.
Yet another aspect of embodiments of the present invention provides a computer readable storage medium having stored thereon a computer program which, when executed, may implement the method of any one of the above.
Fig. 3 is a schematic hardware structure diagram of an intrusion detection device based on feature selection according to an embodiment of the present invention.
Taking the computer device shown in fig. 3 as an example, the computer device includes a processor 301 and a memory 302, and may further include: an input device 303 and an output device 304.
The processor 301, the memory 302, the input device 303 and the output device 304 may be connected by a bus or other means, and fig. 3 illustrates the connection by a bus as an example.
The memory 302 is a non-volatile computer-readable storage medium, and can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as program instructions/modules corresponding to the intrusion detection method based on feature selection in the embodiment of the present application. The processor 301 executes various functional applications of the server and data processing by running nonvolatile software programs, instructions and modules stored in the memory 302, that is, implements the intrusion detection method based on feature selection of the above-described method embodiment.
The memory 302 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function; the data storage area may store data created by the intrusion detection method based on feature selection, and the like. Further, the memory 302 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, memory 302 optionally includes memory located remotely from processor 301, which may be connected to a local module via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 303 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the computer device that runs the intrusion detection method based on feature selection. The output device 304 may include a display device such as a display screen.
Program instructions/modules corresponding to the one or more feature selection based intrusion detection methods are stored in the memory 302 and, when executed by the processor 301, perform the feature selection based intrusion detection method in any of the above-described method embodiments.
Any embodiment of the computer device for performing the intrusion detection method based on feature selection may achieve the same or similar effects as any corresponding embodiment of the method described above.
Finally, it should be noted that, as will be understood by those skilled in the art, all or part of the processes in the methods of the above embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium, and when executed, may include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a Random Access Memory (RAM).
In addition, the apparatuses, devices and the like disclosed in the embodiments of the present invention may be various electronic terminal devices, such as a mobile phone, a Personal Digital Assistant (PDA), a tablet computer (PAD), a smart television and the like, or may be a large terminal device, such as a server and the like, and therefore the scope of protection disclosed in the embodiments of the present invention should not be limited to a specific type of apparatus, device. The client disclosed in the embodiment of the present invention may be applied to any one of the above electronic terminal devices in the form of electronic hardware, computer software, or a combination of both.
Furthermore, the method disclosed according to an embodiment of the present invention may also be implemented as a computer program executed by a CPU, which may be stored in a computer-readable storage medium. The computer program, when executed by the CPU, performs the above-described functions defined in the method disclosed in the embodiments of the present invention.
Further, the above method steps and system elements may also be implemented using a controller and a computer readable storage medium for storing a computer program for causing the controller to implement the functions of the above steps or elements.
Further, it should be appreciated that the computer-readable storage media (e.g., memory) described herein can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. By way of example, and not limitation, nonvolatile memory can include Read Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM), which can act as external cache memory. By way of example and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The storage devices of the disclosed aspects are intended to comprise, without being limited to, these and other suitable types of memory.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.
The steps of a method or algorithm described in connection with the disclosure herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
In one or more exemplary designs, the functions may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code. Computer-readable media include both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a general-purpose or special-purpose computer, or a general-purpose or special-purpose processor. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
The serial numbers of the embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.
The above-described embodiments are possible examples of implementations and are presented merely for a clear understanding of the principles of the invention. Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure, including the claims, of embodiments of the invention is limited to these examples. Within the idea of an embodiment of the invention, technical features in the above embodiment or in different embodiments may also be combined, and there are many other variations of the different aspects of an embodiment of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.

Claims (6)

1. An intrusion detection method based on feature selection, characterized by comprising the following steps:
acquiring network data, preprocessing the data to obtain feature samples of the data, randomly dividing the feature samples into a plurality of groups, and performing the following calculation on each group through an FCBF feature selection algorithm to remove redundant features:
computing, by the FCBF feature selection algorithm, the symmetry uncertainty $SU_{ic}$ between each feature sample $F_i$ in each group and class C, setting a symmetry uncertainty threshold, and removing from each group the feature samples whose symmetry uncertainty with respect to class C is less than the threshold, wherein
$$SU(X, Y) = \frac{2\, IG(X \mid Y)}{H(X) + H(Y)}$$
$IG(X \mid Y)$ refers to the mutual information between the two variables, $H(X)$ and $H(Y)$ refer to the information entropies of the variables, and the threshold is expressed as
[Formula image for the threshold not reproduced]
wherein N represents the number of feature samples in each group;
for each feature sample $F_i$ in each group of feature samples after the deletion operation is performed, selecting each other feature sample $F_j$ in the group and calculating the symmetry uncertainty $SU_{ij}$ of the two;
in response to $SU_{ij} \ge SU_{jc}$ and $SU_{ic} \ge SU_{jc}$, deleting the feature sample $F_j$ from the group, wherein $SU_{jc}$ represents the symmetry uncertainty between the feature sample $F_j$ and class C, and $SU_{ic}$ represents the symmetry uncertainty between the feature sample $F_i$ and class C;
combining all the group samples with the redundant features removed into a set A, and dividing all the feature samples in the set A into a set B and a set C, wherein the set B comprises at least one feature sample;
sequentially taking a feature sample from the set C, forming a set D from that feature sample and the set B, and calculating the correlation between the set B and the category and the correlation between the set D and the category, wherein the correlation between each set and the category is expressed as
[Formula image for the correlation between a set and the category not reproduced]
wherein N is the number of features in the set, $Avg(SU_{ic})$ is the average of the symmetry uncertainties between all feature samples in the set and class C, and $Avg(SU_{ij})$ is the average of the symmetry uncertainties between the feature samples in the set;
and in response to the correlation between the set B and the category being smaller than the correlation between the set D and the category, taking the set D as the new set B and returning to the previous step until the set C is empty, to obtain a feature set for intrusion detection.
2. The method of claim 1, further comprising:
in response to the correlation between the set B and the category being not less than the correlation between the set D and the category, leaving the set B unchanged and returning to the previous step until the set C is empty.
3. The method of claim 1, wherein combining all the group samples with the redundant features removed into a set A, and dividing all the feature samples in the set A into a set B and a set C, wherein the set B comprises at least one feature sample, comprises:
dividing the feature samples such that the set B contains one feature sample.
4. The method of claim 2, further comprising:
training and testing with the feature samples extracted into the set B to obtain a feature subset for identifying network intrusions.
5. An intrusion detection device based on feature selection, comprising:
at least one processor; and
a memory storing program code executable by the processor, the program code implementing the method of any one of claims 1-4 when executed by the processor.
6. A computer-readable storage medium, having stored thereon a computer program which, when executed, implements the method of any of claims 1-4.
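
As a non-limiting illustration, the selection steps recited in claims 1 and 2 may be sketched in Python as follows. The sketch assumes discrete (already preprocessed) feature values and takes the symmetry uncertainty threshold as a caller-supplied parameter. It visits the feature samples in descending order of their symmetry uncertainty with the class, as in the conventional FCBF algorithm, and scores a set with a CFS-style merit, N * Avg(SU_ic) / sqrt(N + N * (N - 1) * Avg(SU_ij)). The merit expression, the visiting order, and all function and variable names are assumptions made for illustration and are not taken from the claims. The random division into groups is omitted, so the toy data is treated as a single group.

```python
from collections import Counter
from itertools import combinations
from math import log2, sqrt


def entropy(values):
    """Information entropy H(X), in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X | Y) / (H(X) + H(Y))."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # assumption: two constant variables count as uncorrelated
    mutual_info = hx + hy - entropy(list(zip(x, y)))  # IG(X | Y)
    return 2.0 * mutual_info / (hx + hy)


def fcbf_filter(features, labels, threshold):
    """Per-group step: drop weakly relevant features, then redundant ones.

    `features` maps a (hypothetical) feature name to its column of values.
    """
    su_c = {f: symmetric_uncertainty(col, labels) for f, col in features.items()}
    # Keep features whose SU with the class reaches the threshold,
    # strongest first (assumed ordering, as in conventional FCBF).
    kept = sorted((f for f in features if su_c[f] >= threshold),
                  key=su_c.get, reverse=True)
    i = 0
    while i < len(kept):
        fi = kept[i]
        survivors = []
        for fj in kept[i + 1:]:
            su_ij = symmetric_uncertainty(features[fi], features[fj])
            # Delete F_j when SU_ij >= SU_jc and SU_ic >= SU_jc.
            if not (su_ij >= su_c[fj] and su_c[fi] >= su_c[fj]):
                survivors.append(fj)
        kept = kept[: i + 1] + survivors
        i += 1
    return kept


def merit(subset, features, labels):
    """CFS-style set-to-class correlation (assumed form, see lead-in)."""
    n = len(subset)
    avg_ic = sum(symmetric_uncertainty(features[f], labels) for f in subset) / n
    pairs = list(combinations(subset, 2))
    avg_ij = (sum(symmetric_uncertainty(features[a], features[b])
                  for a, b in pairs) / len(pairs)) if pairs else 0.0
    return n * avg_ic / sqrt(n + n * (n - 1) * avg_ij)


def greedy_select(candidates, features, labels):
    """Sets B, C and D: B starts with one feature and grows only when D scores higher."""
    set_b, set_c = [candidates[0]], list(candidates[1:])
    for f in set_c:
        set_d = set_b + [f]
        if merit(set_b, features, labels) < merit(set_d, features, labels):
            set_b = set_d  # otherwise B is left unchanged (claim 2)
    return set_b


# Toy data: four classes, two complementary features, one duplicate, one noise column.
labels = [0, 0, 1, 1, 2, 2, 3, 3]
features = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],
    "a_copy": [0, 0, 0, 0, 1, 1, 1, 1],  # redundant duplicate of "a"
    "b": [0, 0, 1, 1, 0, 0, 1, 1],
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the class
}
kept = fcbf_filter(features, labels, threshold=0.1)
selected = greedy_select(kept, features, labels)
print(kept, selected)
```

On this toy data the filter step discards "noise" (its symmetry uncertainty with the class is 0) and "a_copy" (redundant with "a"), and the greedy step keeps both "a" and "b" because adding "b" raises the merit of the set.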
CN202010062791.6A 2020-01-19 2020-01-19 Intrusion detection method and device based on feature selection Active CN111310176B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010062791.6A CN111310176B (en) 2020-01-19 2020-01-19 Intrusion detection method and device based on feature selection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010062791.6A CN111310176B (en) 2020-01-19 2020-01-19 Intrusion detection method and device based on feature selection

Publications (2)

Publication Number Publication Date
CN111310176A CN111310176A (en) 2020-06-19
CN111310176B true CN111310176B (en) 2022-05-27

Family

ID=71156502

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010062791.6A Active CN111310176B (en) 2020-01-19 2020-01-19 Intrusion detection method and device based on feature selection

Country Status (1)

Country Link
CN (1) CN111310176B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113141357B (en) * 2021-04-19 2022-02-18 湖南大学 Feature selection method and system for optimizing network intrusion detection performance

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108768946A (en) * 2018-04-27 2018-11-06 中山大学 A kind of Internet Intrusion Detection Model based on random forests algorithm
CN109818961A (en) * 2019-01-30 2019-05-28 广东工业大学 A kind of network inbreak detection method, device and equipment
CN110149330A (en) * 2019-05-22 2019-08-20 潘晓君 PSO feature selecting weight intrusion detection method and system based on information gain

Also Published As

Publication number Publication date
CN111310176A (en) 2020-06-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant