CN117176442A - Illegal network access detection method and system based on DNA spatial information weight - Google Patents
- Publication number: CN117176442A
- Application number: CN202311194014.7A
- Authority: CN (China)
- Prior art keywords: dna, information weight, translation, features, sequence set
- Legal status: Pending (as listed by Google; the status is an assumption and not a legal conclusion)
Abstract
The invention discloses an illegal network access detection method and system based on DNA spatial information weight, relating to the technical field of network security. The method comprises the following steps: receiving network traffic data, sorting the feature types of the network traffic data, and integrating the network traffic data to generate a network traffic data set; generating a DNA translation rule dictionary according to the feature types of the network traffic data; performing translation encoding on the network traffic data set using the DNA translation rule dictionary to obtain a DNA sequence set; extracting deep features of the DNA sequence set to obtain an information weight matrix, taking into account the types and positions of the DNA fragments in the DNA sequence set during extraction; and identifying and classifying the information weight matrix using a random forest algorithm to obtain a network intrusion detection result.
Description
Technical Field
The invention relates to the technical field of network security, in particular to an illegal network access detection method and system based on DNA spatial information weight.
Background
With the growing prevalence of network terminals and server nodes, networks are increasingly exposed to malicious attacks that disrupt normal operation and threaten the security of data transmission and storage. In Internet of Vehicles and Internet of Things scenarios in particular, data leakage, network hijacking and communication delays caused by any network intrusion may have disastrous consequences. An Intrusion Detection System (IDS) therefore plays an important role in the current network environment: by monitoring network traffic data in real time, it can discover intrusion behavior and raise alarms promptly, so its detection performance is critical to the safe operation of the network.
Network intrusion detection is generally treated as a classification problem in which network behavior is identified from network traffic data, and classification algorithms based on machine learning or deep learning were among the first applied in this field. However, traffic features are complex and typically mix symbolic values, continuous real numbers and discrete real numbers, so relying only on a simple data normalization method may lose part of the feature information. Deep learning, by contrast, can integrate an encoding network into the training algorithm and optimize the encoding of traffic data over multiple iterations, mining deeper data features. Deep learning algorithms can also strengthen feature extraction for minority attack types through a data augmentation network, alleviating class imbalance among samples. For these reasons, deep-learning-based methods have achieved better detection results among network intrusion detection algorithms.
In recent years, more and more researchers have tried machine learning algorithms that incorporate the idea of encoding. However, these approaches share several common problems:
1. Traditional feature encoding methods handle discrete and continuous features separately, which leads to inconsistent evaluation standards across features, and differences in feature scale remain even after standardization. This reduces intrusion detection accuracy.
2. The computational cost is high, so real-time requirements cannot be met.
Disclosure of Invention
To overcome the above shortcomings of the background art, the present invention provides a method and a system for detecting illegal network access based on DNA spatial information weight.
The object of the invention is achieved by the following technical solution: an illegal network access detection method based on DNA spatial information weight, comprising the following steps:
receiving network traffic data, sorting the feature types of the network traffic data, and integrating the network traffic data to generate a network traffic data set;
generating a DNA translation rule dictionary according to the feature types of the network traffic data;
performing translation encoding on the network traffic data set using the DNA translation rule dictionary to obtain a DNA sequence set;
extracting deep features of the DNA sequence set to obtain an information weight matrix, wherein the types and positions of the DNA fragments in the DNA sequence set are taken into account during extraction;
and identifying and classifying the information weight matrix using a random forest algorithm to obtain a network intrusion detection result.
Preferably, the feature types of the network traffic data set include numeric features and character-type features.
Preferably, the DNA translation rule dictionary comprises a character-type dictionary and a numeric dictionary; the character-type dictionary includes a flag feature dictionary, a protocol feature dictionary and a service feature dictionary; the numeric dictionary includes a numeric feature dictionary and a long-numeric feature dictionary.
Preferably, the translation process of the DNA translation rule dictionary is as follows:
the protocol feature is translated using 3 mutually distinct single-base codes; the service feature is translated using 71 mutually distinct 4-base fragments; the flag feature is translated using 11 mutually distinct 2-base fragments; and the numeric feature is translated using 11 mutually distinct 2-base fragments.
Long numeric features are translated according to the interval into which their value falls, using 8 mutually distinct 2-base fragments.
Preferably, the encoding rule for translation encoding of the network traffic data set using the DNA translation rule dictionary is as follows:
for protocol features, service features, flag features and long numeric features, DNA translation is completed by direct lookup in the DNA translation rule dictionary;
for the other numeric features, each value is first split into its individual characters, and DNA translation is then completed by looking up each character in turn in the DNA translation rule dictionary.
Preferably, the process of extracting the information weight matrix from the DNA sequence set includes: constructing a base position frequency matrix, calculating information weights, and reconstructing the information weight matrix.
Preferably, the calculation model of the base position frequency matrix PFM is:
wherein k represents a base type, and $p_{k,j}$ represents the frequency of occurrence of base k in the j-th column in the context of the DNA sequence set M;
the calculation model of $p_{k,j}$ is as follows:
where $b_{i,j}$ is the base in row i and column j of the DNA sequence set, and I is a base presence determination function defined as follows:
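Reading $p_{k,j}$ as the relative frequency of base k in column j over the n sequences of M (an assumption consistent with the definitions above, not a reproduction of the original formulas), the PFM, $p_{k,j}$ and the indicator I can be written as follows; the sequence length 169 is taken from the detailed description.

```latex
\mathrm{PFM} = \left[ p_{k,j} \right], \qquad k \in \{A, G, C, T\},\; j \in [1, 169]

p_{k,j} = \frac{1}{n} \sum_{i=1}^{n} I\left(b_{i,j}, k\right)

I\left(b_{i,j}, k\right) =
\begin{cases}
1, & b_{i,j} = k \\
0, & b_{i,j} \neq k
\end{cases}
```

Under this reading every column of the PFM sums to 1, because exactly one of the four indicators is 1 for each base $b_{i,j}$.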
Preferably, the information weight calculation model is as follows:
where $f_k$ is the probability distribution of base k over the whole sequence set, and $w_{k,j}$ is the information weight of base k in column j;
the information weight matrix is obtained by combining the above formulas as follows:
To achieve the above object, the present invention further discloses an illegal network access detection system based on DNA spatial information weight, comprising:
a feature classification module, configured to receive network traffic data, sort the feature types of the network traffic data, and integrate the network traffic data to generate a network traffic data set;
a translation module, configured to generate a DNA translation rule dictionary according to the feature types of the network traffic data;
an encoding module, configured to perform translation encoding on the network traffic data set using the DNA translation rule dictionary to obtain a DNA sequence set;
an information weight extraction module, configured to extract deep features of the DNA sequence set to obtain an information weight matrix, wherein the types and positions of the DNA fragments in the DNA sequence set are taken into account during extraction;
and an identification and classification module, configured to identify and classify the information weight matrix using a random forest algorithm to obtain a network intrusion detection result.
In another aspect, to achieve the above object, the present invention discloses a device, comprising:
one or more processors;
a memory for storing one or more programs;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the illegal network access detection method based on DNA spatial information weight described above.
The beneficial effects of the invention are as follows:
First, a DNA encoding strategy is designed: the network traffic data are reconstructed as DNA sequences, with bases at specific spatial positions mapping the original features, which yields a standardized representation of the data. Next, an information weight matrix is constructed to extract deep features of the network traffic data from the DNA sequence set, ensuring intrusion detection precision. Finally, the information weight matrix is classified with a random forest algorithm to identify network intrusion behavior. Experiments show that the method achieves high detection efficiency and improves the recognition accuracy for attack types with few samples while maintaining a high overall detection rate and a low false alarm rate.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, those skilled in the art can obtain other drawings from these drawings without inventive effort:
FIG. 1 is a schematic flow chart of the method of the present invention;
FIG. 2 is a schematic of the overall workflow of the present invention;
FIG. 3 is a graph comparing the detection performance of an embodiment of the invention with that of the most advanced class-based intrusion detection methods;
FIG. 4 is a graph comparing the detection performance of the invention with that of the most advanced data-based intrusion detection methods;
FIG. 5 is a schematic diagram of the system structure of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of protection of the invention.
As shown in fig. 1, an illegal network access detection method based on DNA spatial information weight includes the following steps:
receiving network traffic data, sorting the feature types of the network traffic data, and integrating the network traffic data to generate a network traffic data set;
in this embodiment, the features of the network traffic data set are classified into character-type features and numeric features, and the 1st, 5th and 6th features of each original sample are further designated as long numeric features;
the network traffic data set TR consists of n samples $Tr_i$, as defined in formula (1):
$$TR = \{Tr_1, Tr_2, \ldots, Tr_i, \ldots, Tr_n\}, \quad i \in [1, n] \tag{1}$$
where each sample $Tr_i$ consists of 41 features X, including 3 character-type features (protocol type P, service type S and flag type F) and 38 numeric features N, as defined in formula (2):
$$Tr_i = \{X_1, X_2, \ldots, X_j, \ldots, X_{41}\}, \quad \text{where } j \in [1, 41],\ X_2 \in P,\ X_3 \in S,\ X_4 \in F,\ X_{\mathrm{else}} \in N \tag{2}$$
generating a DNA translation rule dictionary according to the characteristic type of the network traffic data;
the DNA translation rule dictionary includes character-type feature dictionaries (a flag feature dictionary, a protocol feature dictionary and a service feature dictionary) and numeric dictionaries (a numeric feature dictionary and a long-numeric feature dictionary);
the translation process of the DNA translation rule dictionary is as follows:
to control the length and dimension of the translated network traffic data, the translation strategy should be as compact as possible: the protocol feature is translated using 3 mutually distinct single-base codes; the service feature using 71 mutually distinct 4-base fragments; the flag feature using 11 mutually distinct 2-base fragments; and the numeric feature using 11 mutually distinct 2-base fragments.
Long numeric features are translated according to the interval into which their value falls, using 8 mutually distinct 2-base fragments.
Performing translation encoding on the network traffic data set using the DNA translation rule dictionary to obtain a DNA sequence set;
in this embodiment, the encoding rule for translation encoding of the network traffic data set using the DNA translation rule dictionary is as follows: for protocol features, service features, flag features and long numeric features, DNA translation is completed by direct lookup in the DNA translation rule dictionary;
for the other numeric features, each value is first split into its individual characters, and DNA translation is then completed by looking up each character in turn in the DNA translation rule dictionary.
To DNA-encode Tr, DNA-SE establishes coding rule dictionaries for four feature types: numeric, protocol, service and flag.
A. Numeric feature dictionary
In the numeric feature dictionary, the set N to be encoded consists of eleven characters, '0' to '9' and '.', which can be represented losslessly by random, non-repeating combinations of two bases. The logic $\mathrm{Random}(x_1, x_2, x_3, x_4, n)$ defines this procedure: n elements are randomly selected from $x_1, x_2, x_3, x_4$ and arranged without repetition.
Formula (3) defines the generation logic of the numeric feature dictionary:
$$\mathrm{EncodeDigit}(X_j) = \mathrm{Random}(A, G, C, T, 2), \quad \text{where } X_j \in N,\ \mathrm{EncodeDigit}(X_j) \neq \mathrm{EncodeDigit}(X_{\mathrm{else}(j)}) \tag{3}$$
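The Random(A, G, C, T, n) rule together with its distinctness constraint can be sketched in Python as below. This is a minimal illustration, not the authors' implementation: the function name `build_dictionary` and the `seed` parameter are hypothetical. The sketch reads "non-repeating" as "no two symbols share a code" and lets bases repeat inside a code (an assumption: four distinct bases admit only 4! = 24 arrangements, too few for the 71 service values of dictionary C).

```python
import itertools
import random

BASES = "AGCT"


def build_dictionary(symbols, length, seed=None):
    """Map each symbol to a random DNA code of `length` bases.

    Codes are drawn without replacement from all 4**length base strings,
    so EncodeX(X_j) != EncodeX(X_else(j)) holds for every pair of symbols.
    Bases may repeat inside a single code (assumption).
    """
    candidates = ["".join(p) for p in itertools.product(BASES, repeat=length)]
    if len(symbols) > len(candidates):
        raise ValueError(f"{len(symbols)} symbols do not fit into {length}-base codes")
    rng = random.Random(seed)
    codes = rng.sample(candidates, len(symbols))
    return dict(zip(symbols, codes))


# Numeric feature dictionary: '0'-'9' and '.' -> distinct 2-base codes.
digit_dict = build_dictionary(list("0123456789."), length=2, seed=7)
print(digit_dict)
```

Fixing the seed makes the dictionary reproducible, which matters because training and detection must share the same code table.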
B. Protocol feature dictionary
The protocol dictionary encodes the three protocol types in the set P (TCP, UDP and ICMP); a single base suffices for translation, with the rule shown in formula (4):
$$\mathrm{EncodeProtocol}(X_j) = \mathrm{Random}(A, G, C, T, 1), \quad \text{where } X_j \in P,\ \mathrm{EncodeProtocol}(X_j) \neq \mathrm{EncodeProtocol}(X_{\mathrm{else}(j)}) \tag{4}$$
C. Service feature dictionary
Since the service type S contains 71 elements, four bases are required for complete translation ($4^3 = 64 < 71 \le 4^4 = 256$); the rule is shown in formula (5).
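The code lengths chosen for each dictionary follow from a counting argument: a code of L bases can distinguish at most $4^L$ values. The hypothetical helper `min_code_length` below computes the smallest sufficient L and reproduces the lengths used in the text (1 base for 3 protocols, 2 for the 11 numeric characters and 11 flags, 4 for the 71 services, 2 for the 8 long-numeric intervals).

```python
def min_code_length(num_values, alphabet_size=4):
    """Smallest L (at least 1) with alphabet_size**L >= num_values."""
    length = 1
    while alphabet_size ** length < num_values:
        length += 1
    return length


dictionary_sizes = {
    "protocol": 3,
    "numeric character": 11,
    "flag": 11,
    "service": 71,
    "long-numeric interval": 8,
}
for name, size in dictionary_sizes.items():
    print(f"{name}: {size} values -> {min_code_length(size)} bases")
```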
D. Flag feature dictionary
The flag feature set F contains 11 elements, which are translated using non-repeating combinations of two bases, as shown in formula (6):
$$\mathrm{EncodeFlag}(X_j) = \mathrm{Random}(A, G, C, T, 2), \quad \text{where } X_j \in F,\ \mathrm{EncodeFlag}(X_j) \neq \mathrm{EncodeFlag}(X_{\mathrm{else}(j)}) \tag{6}$$
E. Long numeric feature dictionary
In the NSL-KDD data set, the 1st, 5th and 6th features are numeric, but their value range is large, $[0, 1.38 \times 10^9]$. If $\mathrm{EncodeDigit}$ were applied to them character by character, different samples Tr would yield DNA sequences of different lengths, which is unfavorable for subsequent processing. A new translation rule is therefore formulated for these three features according to the data length, as shown in formula (7).
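Formula (7) itself is not reproduced in the text, so the interval boundaries below are purely hypothetical: this sketch buckets a value by its number of integer digits (1 to 7, and 8 or more) and gives each of the 8 buckets a distinct 2-base code. It only illustrates why interval-based coding keeps every long feature at a fixed 2 bases regardless of magnitude; `LONG_CODES`, `long_interval` and `encode_long` are illustrative names.

```python
import itertools

# Hypothetical code table: the first 8 of the 16 two-base strings, one per interval.
LONG_CODES = ["".join(p) for p in itertools.product("AGCT", repeat=2)][:8]


def long_interval(value):
    """Hypothetical interval index 0-7 derived from the number of integer digits."""
    digits = len(str(int(value)))
    return min(digits, 8) - 1


def encode_long(value):
    """Fixed-length (2-base) code for a long numeric feature such as X_1, X_5 or X_6."""
    return LONG_CODES[long_interval(value)]


for v in (0, 491, 5450, 1_380_000_000):
    print(v, encode_long(v))
```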
The complete coding rule dictionary, obtained from formulas (3) to (7), is shown in Table 1.
Table 1 Translation rule dictionary
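The encoding rule described above (direct lookup for protocol, service, flag and long numeric features; character-by-character lookup for the remaining numeric features) can be sketched as follows. All code tables here are hypothetical stand-ins for Table 1, which is not reproduced in the text, and only two of the 71 services and two of the 11 flags are included. Feature positions follow formula (2): $X_2$, $X_3$ and $X_4$ are protocol, service and flag, and $X_1$, $X_5$, $X_6$ are long numeric.

```python
# Hypothetical stand-ins for Table 1 (illustration only).
DIGIT = dict(zip("0123456789.", ["AG", "AC", "AT", "GA", "GC", "GT",
                                 "CA", "CG", "CT", "TA", "TG"]))
PROTOCOL = {"tcp": "A", "udp": "G", "icmp": "C"}
SERVICE = {"http": "AAGC", "private": "AGCT"}   # 2 of the 71 services
FLAG = {"SF": "AG", "S0": "AC"}                 # 2 of the 11 flags
LONG = ["AA", "AG", "AC", "AT", "GA", "GG", "GC", "GT"]  # one code per interval
LONG_POSITIONS = {1, 5, 6}


def encode_long(raw):
    # Hypothetical interval: number of integer digits, capped at 8.
    return LONG[min(len(str(int(float(raw)))), 8) - 1]


def encode_sample(features):
    """Translate the 41 raw feature strings X_1..X_41 of one sample into a DNA string."""
    parts = []
    for j, raw in enumerate(features, start=1):
        if j == 2:
            parts.append(PROTOCOL[raw])
        elif j == 3:
            parts.append(SERVICE[raw])
        elif j == 4:
            parts.append(FLAG[raw])
        elif j in LONG_POSITIONS:
            parts.append(encode_long(raw))
        else:
            # Split the number into characters and look each one up.
            parts.append("".join(DIGIT[ch] for ch in raw))
    return "".join(parts)


sample = ["0", "tcp", "http", "SF", "181", "5450"] + ["0"] * 35
dna = encode_sample(sample)
print(len(dna), dna[:13])
```

Dispatching on the 1-based feature index keeps the encoder a single pass over each sample; a real table would need all 71 services and 11 flags to avoid a `KeyError`.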
Extracting deep features of the DNA sequence set to obtain an information weight matrix, wherein the types and positions of DNA fragments in the DNA sequence set are considered when the deep features of the DNA sequence set are extracted;
the process of extracting the information weight matrix from the DNA sequence set comprises: constructing a base position frequency matrix, calculating information weights, and reconstructing the information weight matrix;
after DNA encoding, each $Tr_i$ yields a DNA sequence $m_i$ of 169 bases b, as shown in formula (8):
$$m_i = \mathrm{Encode}(Tr_i) = b_{i,1}, b_{i,2}, \ldots, b_{i,j}, \ldots, b_{i,169}, \quad \text{where } j \in [1, 169] \tag{8}$$
for the whole data set TR, DNA-SE yields a DNA sequence set M consisting of n sequences m, as expressed in formula (9):
next, a base position frequency matrix (PFM) is constructed to represent how often each base occurs at the same position across all sequences m:
in formula (10), k represents a base type, and $p_{k,j}$, the frequency of occurrence of base k in the j-th column in the context of the DNA sequence set M, is defined by formula (11):
in formula (11), I is a base presence determination function, defined as shown in formula (12):
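Under the reading that $p_{k,j}$ is the relative frequency of base k in column j over the n sequences of M (an assumption, since formulas (10) to (12) are not reproduced here), the PFM can be computed as in the sketch below. The function name and the toy sequence set `M` are hypothetical; in the method each sequence would have 169 bases.

```python
BASES = "AGCT"


def position_frequency_matrix(sequences):
    """Return p[k][j], the fraction of sequences whose base in column j is k.

    Implements p_{k,j} = (1/n) * sum_i I(b_{i,j}, k), where the indicator
    I equals 1 when b_{i,j} == k and 0 otherwise.
    """
    n = len(sequences)
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all DNA sequences must have the same length")
    counts = {k: [0] * length for k in BASES}
    for seq in sequences:
        for j, base in enumerate(seq):
            counts[base][j] += 1  # indicator I(b_ij, k) is 1 only for k == base
    return {k: [c / n for c in counts[k]] for k in BASES}


M = ["AGC", "AGT", "ACT", "GGT"]
pfm = position_frequency_matrix(M)
for k in BASES:
    print(k, pfm[k])
```

Counting first and dividing once keeps each $p_{k,j}$ an exact ratio, so every column sums to 1.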
The amount of information represented by the presence of base k in the j-th column can be calculated from formula (11). However, to account for the influence of the overall probability $f_k$ of base k across the entire sequence set on the information at a specific position, the information weight $w_{k,j}$ is redefined on the basis of the information quantity model [36], as shown in formula (13).
where $f_k$ is the probability distribution of base k over the whole sequence set and $w_{k,j}$ is the information weight of base k in column j, which characterizes the information carried by the occurrence of base k at position j.
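Formula (13) is not reproduced in the text. A common way to combine a column frequency $p_{k,j}$ with a background probability $f_k$ is the log-odds score of position weight matrices, $w_{k,j} = \log_2(p_{k,j} / f_k)$; the sketch below assumes that form, which may differ from the patent's actual definition. The `pseudocount` parameter is a hypothetical smoothing term that avoids taking the logarithm of zero.

```python
import math

BASES = "AGCT"


def information_weights(sequences, pseudocount=0.5):
    """Assumed log-odds weights w[k][j] = log2(p_kj / f_k) with pseudocount smoothing."""
    n = len(sequences)
    length = len(sequences[0])
    total = n * length
    # f_k: smoothed probability of base k over the whole sequence set
    f = {k: (sum(s.count(k) for s in sequences) + pseudocount) / (total + 4 * pseudocount)
         for k in BASES}
    weights = {k: [0.0] * length for k in BASES}
    for j in range(length):
        column = [s[j] for s in sequences]
        for k in BASES:
            # p_kj: smoothed frequency of base k in column j
            p = (column.count(k) + pseudocount) / (n + 4 * pseudocount)
            weights[k][j] = math.log2(p / f[k])
    return weights


w = information_weights(["AC", "AG", "AT", "AA"])
print(round(w["A"][0], 3), round(w["C"][0], 3))
```

Under this assumed form, a base that is over-represented in a column relative to its background probability receives a positive weight, and an under-represented one a negative weight.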
An information feature matrix (IFM) can be constructed by combining formulas (8), (9) and (13), as shown in formula (14).
Formula (14) shows that every element of M maps uniquely to an information weight in the IFM, so the IFM is a unique representation of the statistical characteristics of M. By querying the IFM, each feature of $Tr_i$ can be translated into the information weights corresponding to its DNA code; the features of $Tr_i$ are therefore represented by position-specific base information weights $s_i$, each formed by summing the information weights at the corresponding positions in the IFM.
The coding rules determine the positions in $m_i$ occupied by each feature $X_{i,j}$ of $Tr_i$; from this mapping, the $s_i$ corresponding to $Tr_i$ is expressed as:
The position-specific base information weight matrix is constructed in the same way for all samples.
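Constructing $s_i$ amounts to summing, for each feature, the information weights of the bases that feature occupies in $m_i$. In the sketch below, the function name `feature_weights`, the position spans and the weight table are hypothetical inputs; in the method the spans follow from the coding rules and the weights from the IFM.

```python
def feature_weights(sequence, weights, spans):
    """Return one summed information weight per feature of a single sample.

    sequence: DNA string m_i
    weights:  dict base -> list of per-position weights (one IFM row per base)
    spans:    (start, end) half-open, 0-based position ranges, one per feature
    """
    return [sum(weights[sequence[j]][j] for j in range(start, end))
            for start, end in spans]


toy_weights = {
    "A": [1.0, 0.0, 0.0],
    "G": [0.0, 2.0, 0.0],
    "C": [0.0, 0.0, 0.5],
    "T": [0.0, 0.0, 0.0],
}
s_i = feature_weights("AGC", toy_weights, spans=[(0, 1), (1, 3)])
print(s_i)
```

Stacking the $s_i$ of all samples row by row gives the position-specific base information weight matrix that the random forest then classifies.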
Identifying and classifying the information weight matrix using a random forest algorithm to obtain a network intrusion detection result.
The effects of this embodiment are as follows:
Table 2 Test results of the embodiment
As can be seen from Table 2, the embodiment detects Normal traffic with an accuracy of 86.92% and attacks with an overall accuracy of 97.70%, including 98.73% for DoS and 100.00% for Probe. For R2L and U2R, the two attack types with fewer training samples and higher detection difficulty, the detection accuracies reach 93.14% and 94.95%, respectively. This indicates that DNA-SIF generalizes well and is only slightly affected by small sample sizes.
As can be seen from FIG. 3 and FIG. 4, compared with the most advanced current intrusion detection methods, the invention shows clear improvements in all four metrics (Accuracy, Precision, Recall and F1-score) and overcomes problems such as low detection precision caused by uneven sample distribution.
Table 3 Training time and detection time of intrusion detection methods
As can be seen from Table 3, the embodiment of the invention requires less training time and detection time than the most advanced current intrusion detection methods.
In a second aspect, as shown in FIG. 5, to achieve the above objective, an embodiment of the present invention discloses an illegal network access detection system based on DNA spatial information weight, comprising:
a feature classification module, configured to receive network traffic data, sort the feature types of the network traffic data, and integrate the network traffic data to generate a network traffic data set;
a translation module, configured to generate a DNA translation rule dictionary according to the feature types of the network traffic data;
an encoding module, configured to perform translation encoding on the network traffic data set using the DNA translation rule dictionary to obtain a DNA sequence set;
an information weight extraction module, configured to extract deep features of the DNA sequence set to obtain an information weight matrix, wherein the types and positions of the DNA fragments in the DNA sequence set are taken into account during extraction;
and an identification and classification module, configured to identify and classify the information weight matrix using a random forest algorithm to obtain a network intrusion detection result.
Based on the same inventive concept, the present invention also provides a computer device comprising one or more processors and a memory for storing one or more computer programs; the programs include program instructions, and the processor is configured to execute the program instructions stored in the memory. The processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The processor is the computing and control core of the terminal and is adapted to load and execute one or more instructions in a computer storage medium to implement the method described above.
It should be further noted that, based on the same inventive concept, the present invention also provides a computer storage medium storing a computer program which, when executed by a processor, performs the above method. The storage medium may be any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus or device.
In the description of the present specification, the descriptions of the terms "one embodiment," "example," "specific example," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing has shown and described the basic principles, principal features and advantages of the present disclosure. Those skilled in the art will understand that the present disclosure is not limited to the embodiments described above; the embodiments and the description merely illustrate the principles of the disclosure, and various changes and modifications may be made without departing from the spirit and scope of the disclosure, which is defined by the appended claims.
Claims (10)
1. An illegal network access detection method based on DNA spatial information weight, characterized by comprising the following steps:
receiving network traffic data, sorting the feature types of the network traffic data, and integrating the network traffic data to generate a network traffic data set;
generating a DNA translation rule dictionary according to the feature types of the network traffic data;
performing translation encoding on the network traffic data set using the DNA translation rule dictionary to obtain a DNA sequence set;
extracting deep features of the DNA sequence set to obtain an information weight matrix, wherein the types and positions of the DNA fragments in the DNA sequence set are taken into account during extraction;
and identifying and classifying the information weight matrix using a random forest algorithm to obtain a network intrusion detection result.
2. The illegal network access detection method based on DNA spatial information weight according to claim 1, wherein the feature types of the network traffic data set comprise numeric features and character-type features.
3. The illegal network access detection method based on DNA spatial information weight according to claim 1, wherein the DNA translation rule dictionary comprises a character-type feature dictionary and a numeric dictionary; the character-type feature dictionary comprises a flag feature dictionary, a protocol feature dictionary and a service feature dictionary; and the numeric dictionary comprises a numeric feature dictionary and a long-numeric feature dictionary.
4. The illegal network access detection method based on DNA spatial information weight according to claim 3, wherein the translation process of the DNA translation rule dictionary is as follows:
the protocol feature is translated using 3 mutually distinct single-base codes; the service feature is translated using 71 mutually distinct 4-base fragments; the flag feature is translated using 11 mutually distinct 2-base fragments; and the numeric feature is translated using 11 mutually distinct 2-base fragments;
long numeric features are translated according to the interval into which their value falls, using 8 mutually distinct 2-base fragments.
5. The illegal network access detection method based on DNA spatial information weight according to claim 1, wherein the encoding rule for translation encoding of the network traffic data set using the DNA translation rule dictionary is as follows:
for protocol features, service features, flag features and long numeric features, DNA translation is completed by direct lookup in the DNA translation rule dictionary;
for the other numeric features, each value is first split into its individual characters, and DNA translation is then completed by looking up each character in turn in the DNA translation rule dictionary.
6. The method for detecting illegal network access based on DNA spatial information weight according to claim 1, wherein the process of extracting features from the DNA sequence set to obtain the information weight matrix comprises the following steps: constructing a base position frequency matrix, calculating information weights, and reconstructing the information weight matrix.
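7. The illegal network access detection method based on DNA spatial information weight according to claim 6, wherein the calculation model of the base position frequency matrix PFM is:
$$\mathrm{PFM} = \left[\, p_{k,j} \,\right], \qquad k \in \{A, G, C, T\}$$
wherein $k \in \{A, G, C, T\}$ denotes a base type, and $p_{k,j}$ denotes the frequency of occurrence of base $k$ in the $j$-th column of the DNA sequence set $M$;
the calculation model of $p_{k,j}$ is as follows:
$$p_{k,j} = \frac{1}{N} \sum_{i=1}^{N} I\left(b_{i,j} = k\right)$$
wherein $b_{i,j}$ is the base in row $i$ and column $j$ of the DNA sequence set, $N$ is the number of sequences in $M$, and $I$ is a base presence determination function defined as follows:
$$I\left(b_{i,j} = k\right) = \begin{cases} 1, & b_{i,j} = k \\ 0, & b_{i,j} \neq k \end{cases}$$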
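A position frequency matrix of this kind can be computed directly. The sketch below assumes that $p_{k,j}$ is the relative frequency $\frac{1}{N}\sum_i I(b_{i,j}=k)$ over the $N$ sequences of $M$ (the standard position frequency matrix definition) and that all sequences have already been aligned to equal length; the toy sequence set and function names are hypothetical.

```python
from typing import Dict, List, Sequence

BASES = "AGCT"  # k in {A, G, C, T}


def indicator(base: str, k: str) -> int:
    """Base presence determination function: 1 if b_{i,j} == k, else 0."""
    return 1 if base == k else 0


def position_frequency_matrix(seqs: Sequence[str]) -> Dict[str, List[float]]:
    """Return p_{k,j} = (1/N) * sum_i I(b_{i,j} = k) for every base k and column j."""
    if not seqs:
        raise ValueError("empty sequence set")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("this sketch assumes equal-length (aligned) sequences")
    n = len(seqs)
    return {
        k: [sum(indicator(s[j], k) for s in seqs) / n for j in range(length)]
        for k in BASES
    }


M = ["AGC", "AGT", "ACT", "GGT"]  # toy sequence set, one sequence per row
pfm = position_frequency_matrix(M)
for k in BASES:
    print(k, pfm[k])
# A [0.75, 0.0, 0.0]
# G [0.25, 0.75, 0.0]
# C [0.0, 0.25, 0.25]
# T [0.0, 0.0, 0.75]
```

Each column sums to 1, because exactly one base occupies every (row, column) cell.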
8. The illegal network access detection method based on DNA spatial information weight according to claim 6, wherein the information weight calculation model is as follows:
wherein $f_k$ is the probability distribution of base $k$ over the whole sequence set, and $w_{k,j}$ is the information weight of base $k$ in column $j$;
the information weight matrix is obtained by combining the above formulas as follows:
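Claim 8 relates $w_{k,j}$ to the column frequency $p_{k,j}$ and the whole-set background $f_k$. One common choice for such a weight is the log-odds score $w_{k,j} = \log_2\left(p_{k,j} / f_k\right)$ of a position weight matrix, and the sketch below uses that formula purely as an assumption; the optional pseudocount, the equal-length requirement, and the function name are likewise hypothetical.

```python
import math
from typing import Dict, List, Sequence

BASES = "AGCT"


def information_weight_matrix(seqs: Sequence[str],
                              pseudocount: float = 0.0) -> Dict[str, List[float]]:
    """Log-odds weights w_{k,j} = log2(p_{k,j} / f_k) (assumed formula).

    p_{k,j}: frequency of base k in column j; f_k: frequency of base k over
    the whole (equal-length) sequence set. A positive pseudocount avoids log(0).
    """
    n, length = len(seqs), len(seqs[0])
    total = n * length
    f = {k: (sum(s.count(k) for s in seqs) + pseudocount) / (total + 4 * pseudocount)
         for k in BASES}
    weights: Dict[str, List[float]] = {}
    for k in BASES:
        row: List[float] = []
        for j in range(length):
            count = sum(1 for s in seqs if s[j] == k)
            p = (count + pseudocount) / (n + 4 * pseudocount)
            # p == 0 only when base k never occurs in column j (no pseudocount).
            row.append(math.log2(p / f[k]) if p > 0 else float("-inf"))
        weights[k] = row
    return weights


M = ["AGC", "AGT", "ACT", "GGT"]
W = information_weight_matrix(M)
print(round(W["A"][0], 3))  # 1.585  (log2(0.75 / 0.25) = log2(3))
print(W["A"][1])            # -inf
W_smoothed = information_weight_matrix(M, pseudocount=0.5)
```

Positive weights mark bases that are over-represented at a position relative to the whole set. Without a pseudocount, absent bases receive $-\infty$, which is why a small pseudocount is usually added before the matrix is used for classification.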
9. An illegal network access detection system based on DNA spatial information weight, comprising:
a feature classification module, configured to receive network traffic data, classify the network traffic data by feature type, and integrate it into a network traffic data set;
a translation module, configured to generate a DNA translation rule dictionary according to the feature types of the network traffic data;
an encoding module, configured to perform translation encoding on the network traffic data set using the DNA translation rule dictionary to obtain a DNA sequence set;
an information weight extraction module, configured to extract deep features of the DNA sequence set to obtain an information weight matrix, wherein the types and positions of the DNA fragments in the DNA sequence set are taken into account during the extraction; and
an identification and classification module, configured to identify and classify the information weight matrix using a random forest algorithm to obtain a network intrusion detection result.
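The claims do not spell out how the information weight matrix becomes per-record input for the random forest. One hypothetical mapping, sketched below, scores every position of an encoded record by the weight of the base actually present there ($x_j = w_{b_j, j}$), so that both base type and position enter the feature vector. The toy weight values and function names are illustrative only, and the scikit-learn lines are shown as comments so the sketch runs without that library.

```python
from typing import Dict, List, Sequence


def sequence_features(seq: str, weights: Dict[str, List[float]]) -> List[float]:
    """Score each position by the weight of the base actually present there.

    Hypothetical mapping: x_j = w_{b_j, j}, so both the base type and its
    position contribute to the feature vector.
    """
    return [weights[base][j] for j, base in enumerate(seq)]


def feature_matrix(seqs: Sequence[str],
                   weights: Dict[str, List[float]]) -> List[List[float]]:
    """One feature row per encoded traffic record."""
    return [sequence_features(s, weights) for s in seqs]


# Toy 4 x 2 information weight matrix (illustrative numbers only).
W = {"A": [1.0, -1.0], "G": [-1.0, 0.5], "C": [0.0, 0.0], "T": [0.2, 0.3]}
X = feature_matrix(["AG", "TC"], W)
print(X)  # [[1.0, 0.5], [0.2, 0.0]]

# The rows of X could then be passed to a random forest, for example:
# from sklearn.ensemble import RandomForestClassifier
# clf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
# predictions = clf.predict(X_test)
```

If the weight matrix contains $-\infty$ entries, they should be removed (for instance by computing the weights with a pseudocount) before training, since random forest implementations such as scikit-learn's generally reject non-finite inputs.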
10. An apparatus, comprising:
one or more processors;
a memory for storing one or more programs;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the illegal network access detection method based on DNA spatial information weight as claimed in any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311194014.7A CN117176442A (en) | 2023-09-15 | 2023-09-15 | Illegal network access detection method and system based on DNA spatial information weight |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117176442A (en) | 2023-12-05 |
Family ID: 88929589
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311194014.7A Pending CN117176442A (en) | 2023-09-15 | 2023-09-15 | Illegal network access detection method and system based on DNA spatial information weight |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117176442A (en) |
- 2023-09-15: CN202311194014.7A (CN117176442A), CN, status: active, pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108737406B (en) | Method and system for detecting abnormal flow data | |
CN108897732B (en) | Statement type identification method and device, storage medium and electronic device | |
CN111818198B (en) | Domain name detection method, domain name detection device, equipment and medium | |
CN110175851B (en) | Cheating behavior detection method and device | |
CN106874253A (en) | Recognize the method and device of sensitive information | |
CN113381962B (en) | Data processing method, device and storage medium | |
CN112468659B (en) | Quality evaluation method, device, equipment and storage medium applied to telephone customer service | |
CN113381963B (en) | Domain name detection method, device and storage medium | |
CN115174250B (en) | Network asset security assessment method and device, electronic equipment and storage medium | |
CN113315789A (en) | Web attack detection method and system based on multi-level combined network | |
CN114095212B (en) | Method and device for countertraining DGA domain name detection model | |
CN115314236A (en) | System and method for detecting phishing domains in a Domain Name System (DNS) record set | |
CN113904834B (en) | XSS attack detection method based on machine learning | |
CN115146068A (en) | Method, device and equipment for extracting relation triples and storage medium | |
CN107562720B (en) | Alarm data matching method for electric power information network security linkage defense | |
CN113949525A (en) | Method and device for detecting abnormal access behavior, storage medium and electronic equipment | |
CN116962009A (en) | Network attack detection method and device | |
CN117150294A (en) | Outlier detection method, outlier detection device, electronic equipment and storage medium | |
CN117176442A (en) | Illegal network access detection method and system based on DNA spatial information weight | |
CN113688240A (en) | Threat element extraction method, device, equipment and storage medium | |
CN114528908A (en) | Network request data classification model training method, classification method and storage medium | |
CN114510720A (en) | Android malicious software classification method based on feature fusion and NLP technology | |
CN112950222A (en) | Resource processing abnormity detection method and device, electronic equipment and storage medium | |
CN116305171B (en) | Component vulnerability analysis method, device, equipment and storage medium | |
CN115718696B (en) | Source code cryptography misuse detection method and device, electronic equipment and storage medium |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |