CN112437084A - Attack feature extraction method - Google Patents

Info

Publication number
CN112437084A
Authority
CN
China
Prior art keywords
attack
character
matrix
characters
field
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011319212.8A
Other languages
Chinese (zh)
Other versions
CN112437084B (en)
Inventor
王高翃
贾宝林
朱连凯
连栋
王英
陆炜
张家鹏
陈政熙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Institute of Process Automation Instrumentation
Original Assignee
Shanghai Institute of Process Automation Instrumentation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Institute of Process Automation Instrumentation
Priority to CN202011319212.8A
Publication of CN112437084A
Application granted
Publication of CN112437084B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/22 Parsing or analysis of headers

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Signal Processing (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention discloses a method for extracting attack features, comprising the following steps: obtaining attack fields through protocol analysis and converting them into a digital matrix; preliminarily separating the field patterns by special characters; statistically analysing the separated data and updating the digital matrix; counting the field combinations that repeat in the updated digital matrix; and establishing an attack feature classification model that extracts the key attack-related features from known and unknown attack fields and predicts the attack types of known and unknown attack information. Based on analysis of the network communication protocol and an understanding of attack characteristics, the invention establishes a general attack feature extraction method and classifies attack types according to the relevant features. By analysing different attack samples, the information carrying attack characteristics is extracted from the attack fields by statistical methods, and on this basis the attack types are modelled for classification, so that efficient and objective automatic extraction of attack features is achieved.

Description

Attack feature extraction method
Technical Field
The invention relates to a method for extracting attack features, in particular to a method for extracting network attack features, and belongs to the field of data extraction.
Background
As networks grow in scale, the number of network attacks grows as well. Ensuring the normal and stable operation of a network system has become the main subject of network security, and attack detection based on attack features has become the most common detection mode. Attack features are a summarized description of attack behavior; generally, they are the unique characteristics in the traffic data generated by an attack, through which an attack behavior can be identified and confirmed directly without greatly affecting day-to-day production and operation. For an unknown attack behavior, its features need to be analysed and extracted so as to provide early warning of, and defense against, the attack.
Existing automatic attack feature extraction techniques are divided into network-based and host-based techniques. Network-based techniques take the attack information on the network and extract the attack features from it by an algorithm; host-based techniques change the system environment to a certain extent, obtain the relevant attack information from the attacked host, and analyse that information to obtain features. The two approaches each have advantages and disadvantages in accuracy, feature extraction speed, and feature usability.
The process of extracting attack features is very complicated. Extraction by penetration-testing experts is slow and highly subjective, and the effectiveness of the extracted features cannot be determined. Therefore, an efficient and objective automatic attack feature extraction technology is needed.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: the existing attack feature extraction mode is slow in speed and high in subjectivity, and the effectiveness of the extracted features cannot be determined.
In order to solve the above problems, the technical solution of the present invention is to provide a method for extracting attack features, which is characterized by comprising the following steps:
step one: acquiring an attack field through protocol analysis, converting data of the attack field expressed by binary stream into a digital matrix by taking bytes as a unit, and processing fields with different lengths by using masks;
step two: determining characteristic characters serving as separators, and performing preliminary separation on the field modes through special characters;
step three: carrying out statistical analysis on the separated data, carrying out statistics on occurrence frequency information of character strings among all corresponding separators, extracting common key fields, setting corresponding threshold values, representing some character strings which occur frequently by uniform marking sequence numbers, and updating a sequence number-character/character string correspondence table and a number sequence number matrix to obtain preliminarily extracted key attack fields;
step four: counting field combinations which repeatedly appear in the updated numerical matrix, selecting the length n of a character and character string combination, counting the occurrence frequency of the character and character string combination with the length n, setting a corresponding threshold value, extracting the combination of the character and character string with more occurrence frequency, and combining adjacent combinations to obtain the final extracted feature information;
step five: establishing an attack feature classification model, extracting key features related to attacks in known and unknown attack fields through training and application of a Recurrent Neural Network (RNN) model on the basis, and predicting the attack types of known and unknown attack information.
Preferably, in the first step, for a case that 256 characters do not completely appear in the attack field, all the appearing characters are sorted correspondingly and a sequence number-character correspondence table is recorded, and finally, the sequence numbers of the corresponding characters are stored in a digital matrix to obtain a digital signal matrix, each line of the digital matrix represents one attack field, the original attack field can be obtained by searching the sequence number-character correspondence table, and the sequence numbers in the obtained digital sequence number matrix correspond to the characters in the original attack field one by one.
Preferably, after the number matrix is obtained, a threshold value may be set based on statistical information of the frequency of occurrence of characters, some characters with less occurrence are represented by uniform labeled sequence numbers, and the sequence number-character correspondence table and the number sequence number matrix are updated, where the sequence number in the updated number sequence number matrix may correspond to one or more characters in the original attack field.
Preferably, after the digital matrix is updated, for the case that the lengths of the attack fields are different, the fields with different lengths are recorded and processed in the form of a mask matrix.
Preferably, the characteristic characters include paired delimiters, juxtaposed delimiters, and assigned numbers.
Preferably, the sequence number in the updated numeric sequence number matrix in step three may correspond to a single character, multiple characters, or a specific common character string, where the common character string is a key attack field extracted preliminarily.
Preferably, the fourth step is specifically to count the serial number pairs of the beginning and the end of the character and character string combination with the length of n, count the character and character string combination with the fixed beginning and the end on the basis of the occurrence frequency of the character and character string combination, record the occurrence position of the combination, merge the adjacent combinations with the occurrence frequency above a preset threshold value through comparison of the position information according to the obtained position information of the character string combination, and combine the combined character and character string combination into the finally extracted attack features.
Preferably, the establishing of the attack feature classification model includes performing label work on each character string combination in the field and the attack type of each piece of attack information, and training and applying through a Recurrent Neural Network (RNN) model on the basis.
Compared with the prior art, the invention has the beneficial effects that:
the invention establishes a universal attack characteristic extraction method based on the analysis of the network communication protocol and the understanding of the attack characteristics, and classifies the attack types according to the relevant characteristics. In the method, through the analysis of different attack samples, information with attack characteristics in an attack field is extracted by using a statistical method, and on the basis, the attack types are classified and modeled, so that the high-efficiency and objective automatic extraction of the attack characteristics is realized.
Drawings
FIG. 1 is a flow chart of a method for attack feature extraction according to the present invention;
FIG. 2 is a diagram illustrating a conversion of an attack field into a number matrix according to an embodiment of the present invention;
FIG. 3 is a diagram of a mask matrix according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating an embodiment of a data field pattern after initial segmentation;
FIG. 5 is a schematic diagram of a merged common character string according to an embodiment of the present invention.
Detailed Description
In order to make the invention more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings.
As shown in fig. 1, in view of the problems encountered in the attack behavior feature extraction process in the background art, the present invention provides an attack feature extraction method, which is based on the analysis of a network communication protocol and the understanding of attack features, establishes a general attack feature extraction method, and classifies attack types according to related features. In the method, through the analysis of different attack samples, information with attack characteristics in an attack field is extracted by using a statistical method, and on the basis, the classification modeling is carried out on the attack types. Specifically, the method comprises the following steps:
Step one: Obtain the attack field through protocol analysis and convert it into a digital matrix. The attack field, expressed as a binary stream, is converted into the digital matrix byte by byte, and fields of different lengths are handled with masks.
A single byte can represent 256 different characters in total, but typically not all 256 appear in the attack fields. In that case, all characters that do appear are sorted and a sequence number-character correspondence table is recorded; the sequence numbers of the corresponding characters are then stored in a digital matrix to obtain the digital sequence-number matrix. Each row of the digital matrix represents one attack field, and the original attack field can be recovered by looking up the sequence number-character correspondence table. At this stage the sequence numbers in the matrix correspond one-to-one with the characters in the original attack field. After the digital matrix is obtained, a threshold can be set based on the character occurrence-frequency statistics, characters that occur rarely can be represented by a single unified sequence number, and the sequence number-character correspondence table and the digital sequence-number matrix are updated accordingly. A sequence number in the updated matrix may therefore correspond to one or more characters in the original attack field. After the digital matrix is updated, because the attack fields differ in length, the fields of different lengths are recorded and processed in the form of a mask matrix.
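The byte-to-sequence-number conversion and the merging of rare characters described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `build_index_matrix`, the `min_count` threshold, and the choice of 0 as the shared label for rare bytes are hypothetical and are not specified in the patent.

```python
from collections import Counter

def build_index_matrix(fields, min_count=2, rare_id=0):
    """Map each byte to a sequence number; rare bytes share one label.

    `min_count` and `rare_id` are illustrative assumptions, not patent values.
    """
    # Count how often each byte value occurs across all attack fields.
    counts = Counter(b for field in fields for b in field)
    # Only bytes that actually occur (and are frequent enough) get their own
    # sequence number; numbering starts at 1 because 0 is the shared rare label.
    frequent = sorted(b for b, c in counts.items() if c >= min_count)
    table = {b: i + 1 for i, b in enumerate(frequent)}
    # One row per attack field, one sequence number per byte.
    matrix = [[table.get(b, rare_id) for b in field] for field in fields]
    return table, matrix

fields = [b"a=1&b=2", b"a=3&c=2"]
table, matrix = build_index_matrix(fields)
print(table)
print(matrix)
```

Because several rare bytes collapse onto the same label, the updated matrix is no longer invertible for those positions, which matches the statement that a sequence number may correspond to more than one character.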
For example, the attack information comprises 3000 records, such as the following:
mail[#post_render][]=passthru&mail[#type]=markup&mail[#markup]=/usr/b in/who&form_id=user_register_form
form_id=user_register_form&mail[#post_render][]=exec&mail[#type]=mark up&mail[#markup]=/bin/hostname
After conversion, the digital matrix is as shown in FIG. 2.
The corresponding mask lengths are 103 and 100, and the mask matrix is shown in FIG. 3.
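One common way to handle rows of different lengths is to pad them to a common width and keep a parallel mask. The sketch below assumes a mask of 1 for real positions and 0 for padding; the actual layout of the mask matrix in FIG. 3 is not reproduced in the text, so this convention, the function name `pad_with_mask`, and the padding value are assumptions.

```python
def pad_with_mask(rows, pad_value=0):
    """Pad rows to equal width and return a 0/1 mask of valid positions."""
    width = max(len(r) for r in rows)
    padded = [r + [pad_value] * (width - len(r)) for r in rows]
    # 1 marks a position holding real data, 0 marks padding (assumed convention).
    mask = [[1] * len(r) + [0] * (width - len(r)) for r in rows]
    return padded, mask

rows = [[4, 3, 1], [4, 3]]
padded, mask = pad_with_mask(rows)
lengths = [sum(m) for m in mask]  # the "mask length" of each field
print(padded, mask, lengths)
```

Under this convention the mask length of a field is simply the number of ones in its mask row.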
Step two: Preliminarily split the field patterns at special characters. The characteristic characters that serve as separators are determined first; they include paired symbols such as [ ] and " ", parallel separators such as & and /, and the assignment symbol =.
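A minimal sketch of this preliminary split, assuming the separator set is exactly the characters named above and that the separators themselves are kept as tokens (the patent does not say whether they are kept or discarded). The name `split_field` is hypothetical.

```python
import re

# Paired symbols [ ] and ", parallel separators & and /, assignment symbol =.
DELIMS = re.compile(r'([\[\]"&/=])')

def split_field(field):
    """Split a field at separator characters, keeping the separators as tokens."""
    # The capturing group makes re.split return the separators too;
    # empty strings between adjacent separators are dropped.
    return [tok for tok in DELIMS.split(field) if tok]

tokens = split_field("mail[#type]=markup&form_id=x")
print(tokens)
```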
Step three: Perform statistical analysis on the separated data. The occurrence frequency of each character string between corresponding separators is counted, a corresponding threshold is set, character strings that occur frequently are represented by unified marked sequence numbers, and the sequence number-character (character string) correspondence table and the digital sequence-number matrix are updated. A sequence number in the updated matrix may correspond to a single character, multiple characters, or a particular common character string. These common character strings are the preliminarily extracted key attack fields.
After separation, each piece of information is split according to the occurrence-frequency statistics of the character strings, as shown in FIG. 4.
Character strings that appear infrequently between separators are replaced with ". about"; the frequently appearing character strings include, in order of appearance, "[#post_render]", "[]", "[#type]", and so on.
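The frequency thresholding of step three can be sketched as below. The function name `label_common_strings`, the threshold `min_count`, and the `"<RARE>"` placeholder are illustrative assumptions; the patent only states that a threshold is set and that frequent strings receive unified sequence numbers.

```python
from collections import Counter

DELIMS = set('[]"&/=')

def label_common_strings(token_rows, min_count=2, rare_label="<RARE>"):
    """Give frequent between-separator strings a sequence number; mask rare ones."""
    counts = Counter(t for row in token_rows for t in row if t not in DELIMS)
    common = sorted(t for t, c in counts.items() if c >= min_count)
    string_table = {t: i for i, t in enumerate(common)}
    relabeled = [
        # Separators pass through unchanged; other tokens become a sequence
        # number if frequent, or the shared placeholder if rare.
        [t if t in DELIMS else string_table.get(t, rare_label) for t in row]
        for row in token_rows
    ]
    return string_table, relabeled

token_rows = [
    ["mail", "[", "#type", "]", "=", "markup"],
    ["mail", "[", "#type", "]", "=", "exec"],
]
string_table, relabeled = label_common_strings(token_rows)
print(string_table)
print(relabeled[0])
```

The keys of `string_table` play the role of the preliminarily extracted key attack fields.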
Step four: Count the field combinations that occur repeatedly in the updated digital matrix. A length n is selected for combinations of characters and character strings, the occurrence frequency of each combination of length n is counted, a corresponding threshold is set, the combinations that occur most frequently are extracted, and adjacent combinations are merged to obtain the final extracted feature information.
Specifically, the sequence-number pairs formed by the first and last elements of each combination of length n are counted. Based on their occurrence frequency, the combinations with a fixed beginning and end are then counted, and the positions at which each combination occurs are recorded. Using the recorded position information, adjacent combinations whose occurrence frequency is above the preset threshold are merged. The merged combinations of characters and character strings are the finally extracted attack features.
A portion of the common character strings obtained after merging the combinations according to occurrence frequency is shown in FIG. 5.
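The counting and merging in step four can be illustrated with the simplified sketch below. It counts every combination of length n directly rather than first filtering by begin/end sequence-number pairs, so it is a simplification of the described procedure, not a reproduction of it. The function names and the thresholds are assumptions.

```python
from collections import defaultdict

def frequent_ngrams(rows, n=2, min_count=2):
    """Return {combination: [(row, start), ...]} for combinations seen often enough."""
    positions = defaultdict(list)
    for r, row in enumerate(rows):
        for i in range(len(row) - n + 1):
            positions[tuple(row[i:i + n])].append((r, i))
    return {g: p for g, p in positions.items() if len(p) >= min_count}

def merge_adjacent(rows, frequent, n=2):
    """Merge overlapping or touching frequent combinations into longer features."""
    features = set()
    for r, row in enumerate(rows):
        starts = sorted(i for pos in frequent.values() for (rr, i) in pos if rr == r)
        span = None  # current merged span as [start, end), end exclusive
        for i in starts:
            if span is not None and i <= span[1]:
                span[1] = max(span[1], i + n)  # extend the current span
            else:
                if span is not None:
                    features.add(tuple(row[span[0]:span[1]]))
                span = [i, i + n]
        if span is not None:
            features.add(tuple(row[span[0]:span[1]]))
    return features

rows = [["a", "b", "c", "x"], ["a", "b", "c", "y"], ["z", "a", "b", "c"]]
frequent = frequent_ngrams(rows, n=2, min_count=2)
features = merge_adjacent(rows, frequent, n=2)
print(features)
```

Here the frequent pairs ("a", "b") and ("b", "c") overlap in every row, so they merge into the single longer feature ("a", "b", "c").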
Step five: establishing an attack characteristic classification model, including performing label work on each character string combination in the field and performing label work on the attack type of each attack information, and on the basis, extracting key characteristics related to the attack in the known and unknown attack fields through training and application of models such as a Recurrent Neural Network (RNN) and the like, and predicting the attack types of the known and unknown attack information.

Claims (8)

1. A method for extracting attack features is characterized by comprising the following steps:
step one: acquiring an attack field through protocol analysis, converting data of the attack field expressed by binary stream into a digital matrix by taking bytes as a unit, and processing fields with different lengths by using masks;
step two: determining characteristic characters serving as separators, and performing preliminary separation on the field modes through special characters;
step three: carrying out statistical analysis on the separated data, carrying out statistics on occurrence frequency information of character strings among all corresponding separators, extracting common key fields, setting corresponding threshold values, representing some character strings which occur frequently by uniform marking sequence numbers, and updating a sequence number-character/character string correspondence table and a number sequence number matrix to obtain preliminarily extracted key attack fields;
step four: counting field combinations which repeatedly appear in the updated numerical matrix, selecting the length n of a character and character string combination, counting the occurrence frequency of the character and character string combination with the length n, setting a corresponding threshold value, extracting the combination of the character and character string with more occurrence frequency, and combining adjacent combinations to obtain the final extracted feature information;
step five: establishing an attack feature classification model, extracting key features related to attacks in known and unknown attack fields through training and application of a Recurrent Neural Network (RNN) model on the basis, and predicting the attack types of known and unknown attack information.
2. A method of attack feature extraction as claimed in claim 1, wherein: in the first step, for the condition that 256 characters do not completely appear in the attack field, all the characters appearing are sorted correspondingly, a sequence number-character corresponding table is recorded, finally, the sequence numbers of the corresponding characters are stored in a digital matrix to obtain a digital signal matrix, each line of the digital matrix represents one attack field, the original attack field can be obtained by searching the sequence number-character corresponding table, and the sequence numbers in the obtained digital sequence number matrix correspond to the characters in the original attack field one by one.
3. A method of attack feature extraction as claimed in claim 2, wherein: after the number matrix is obtained, a threshold value can be set based on the frequency statistical information of the occurrence of characters, some characters with less occurrence are represented by uniform marked sequence numbers, the sequence number-character correspondence table and the number sequence number matrix are updated, and the sequence number in the updated number sequence number matrix can correspond to one or more characters in the original attack field.
4. A method of attack feature extraction as claimed in claim 3, wherein: after the digital matrix is updated, for the condition that the lengths of all attack fields are different, the fields with different lengths are recorded and processed in the form of a mask matrix.
5. A method of attack feature extraction as claimed in claim 1, wherein: the characteristic characters comprise paired separators, separators in a parallel relation and assignment numbers.
6. A method of attack feature extraction as claimed in claim 1, wherein: the sequence number in the updated numerical sequence number matrix in step three may correspond to a single character, multiple characters, or a specific common character string, where the common character string is a key attack field extracted preliminarily.
7. A method of attack feature extraction as claimed in claim 1, wherein: the fourth step is specifically that the serial number pairs of the beginning and the end of the character and character string combination with the length of n are counted, the character and character string combination with the fixed beginning and the end are counted on the basis of the occurrence frequency of the character and character string combination, the occurrence position of the combination is recorded, the adjacent combinations with the occurrence frequency higher than the preset threshold value are combined through the comparison of the position information according to the obtained position information of the character string combination, and the combined character and character string combination is the attack feature which is finally extracted.
8. A method of attack feature extraction as claimed in claim 1, wherein: the establishment of the attack characteristic classification model comprises the steps of carrying out label work on each character string combination in the field and the attack type of each piece of attack information, and training and applying through a Recurrent Neural Network (RNN) model on the basis.
CN202011319212.8A 2020-11-23 2020-11-23 Attack feature extraction method Active CN112437084B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011319212.8A CN112437084B (en) 2020-11-23 2020-11-23 Attack feature extraction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011319212.8A CN112437084B (en) 2020-11-23 2020-11-23 Attack feature extraction method

Publications (2)

Publication Number Publication Date
CN112437084A (en) 2021-03-02
CN112437084B (en) 2023-02-28

Family

ID=74693557

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011319212.8A Active CN112437084B (en) 2020-11-23 2020-11-23 Attack feature extraction method

Country Status (1)

Country Link
CN (1) CN112437084B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160226894A1 (en) * 2015-02-04 2016-08-04 Electronics And Telecommunications Research Institute System and method for detecting intrusion intelligently based on automatic detection of new attack type and update of attack type model
US20180212986A1 (en) * 2015-08-17 2018-07-26 NSFOCUS Information Technology Co., Ltd. Network attack detection method and device
CN106302436A (en) * 2016-08-11 2017-01-04 广州华多网络科技有限公司 The method that independently finds, device and the equipment of a kind of attack message characteristics
CN110445776A (en) * 2019-07-30 2019-11-12 国网河北省电力有限公司电力科学研究院 A kind of unknown attack Feature Selection Model construction method based on machine learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
M. THOMSON等: "Unknown Key Share Attacks on uses of TLS with the Session Description Protocol (SDP) draft-ietf-mmusic-sdp-uks-05", 《IETF 》 *
秦拯等: "基于序列比对的攻击特征自动提取方法", 《湖南大学学报(自然科学版)》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113542286A (en) * 2021-07-20 2021-10-22 龙海 Intelligent detection system for computer network security intrusion
CN113542286B (en) * 2021-07-20 2023-09-12 北京辰极智程信息技术股份有限公司 Intelligent computer network safety intrusion detection system
CN113612786A (en) * 2021-08-09 2021-11-05 上海交通大学宁波人工智能研究院 Intrusion detection system and method for vehicle bus

Also Published As

Publication number Publication date
CN112437084B (en) 2023-02-28

Similar Documents

Publication Publication Date Title
CN110597734B (en) Fuzzy test case generation method suitable for industrial control private protocol
CN109218223B (en) Robust network traffic classification method and system based on active learning
CN112437084B (en) Attack feature extraction method
CN112422531A (en) CNN and XGboost-based network traffic abnormal behavior detection method
CN109903053B (en) Anti-fraud method for behavior recognition based on sensor data
CN112667750A (en) Method and device for determining and identifying message category
CN113037567B (en) Simulation method of network attack behavior simulation system for power grid enterprise
CN113452672B (en) Method for analyzing abnormal flow of terminal of Internet of things of electric power based on reverse protocol analysis
CN112613309A (en) Log classification analysis method, device and equipment and readable storage medium
CN112261645A (en) Mobile application fingerprint automatic extraction method and system based on grouping and domain division
CN109275045B (en) DFI-based mobile terminal encrypted video advertisement traffic identification method
CN108540473A (en) A kind of data analysing method and data analysis set-up
CN109660656A (en) A kind of intelligent terminal method for identifying application program
CN115277180A (en) Block chain log anomaly detection and tracing system
CN115277113A (en) Power grid network intrusion event detection and identification method based on ensemble learning
KR102014234B1 (en) Method and Apparatus for automatic analysis for Wireless protocol
CN114124565B (en) Network intrusion detection method based on graph embedding
CN113746707B (en) Encrypted traffic classification method based on classifier and network structure
CN115334179A (en) Unknown protocol reverse analysis method based on named entity recognition
CN114666273A (en) Application layer unknown network protocol oriented traffic classification method
CN114979017A (en) Deep learning protocol identification method and system based on original flow of industrial control system
CN110336817B (en) Unknown protocol frame positioning method based on TextRank
CN112559832A (en) Method for classifying secondary encrypted traffic transmitted in encrypted channel
CN115883398B (en) Reverse analysis method and device for private network protocol format and state
CN114037004A (en) IP network attack group classification method based on behavior sequence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant