CN110445776A - A machine learning-based method for constructing an unknown attack feature extraction model - Google Patents

Info

Publication number
CN110445776A
Authority
CN
China
Prior art keywords
machine learning
parameter
attack
data
unknown attack
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910692688.7A
Other languages
Chinese (zh)
Inventor
左晓军
董立勉
陈泽
侯波涛
赵建斌
刘欣
常杰
董娜
郗波
王春璞
刘惠颖
张君艳
刘伟娜
王颖
郭禹伶
冯海燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Electric Power Research Institute of State Grid Hebei Electric Power Co Ltd
State Grid Hebei Energy Technology Service Co Ltd
Original Assignee
Electric Power Research Institute of State Grid Hebei Electric Power Co Ltd
State Grid Hebei Energy Technology Service Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Electric Power Research Institute of State Grid Hebei Electric Power Co Ltd and State Grid Hebei Energy Technology Service Co Ltd
Priority to CN201910692688.7A
Publication of CN110445776A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Mathematical Optimization (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Algebra (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a machine learning-based method for constructing an unknown attack feature extraction model. The method computes features algorithmically and offers high robustness and interpretability. In testing, the unknown attack features obtained by this method achieved a higher detection success rate for attacks of that kind than manual detection methods and common automatic attack feature extraction techniques, and could detect most attacks. In addition, the feature extraction speed, the ease of use of the features, and the practicability of the method itself all reach a relatively high level, which effectively improves the feasibility of the method.

Description

A machine learning-based method for constructing an unknown attack feature extraction model
Technical field
The present invention relates to the technical field of machine learning, and in particular to a machine learning-based method for constructing an unknown attack feature extraction model.
Background art
As networks keep growing in scale, the number of network attacks grows with them. Keeping network systems running normally and stably has become a central problem in network security, and attack detection based on attack features has become the most common detection approach. An attack feature is a summary description of an attack. In general, it should be a characteristic unique to the traffic data that the attack generates, so that the attack can be found and identified directly from the feature without significantly affecting day-to-day operations. For an unknown attack, its features must be analyzed and extracted so that attacks of the same kind can later be flagged and defended against. Extracting attack features is a complicated process. Having penetration testing experts extract them is slow and highly subjective, and the validity of the extracted features cannot be determined. An efficient technique for extracting attack features automatically is therefore needed.
Existing automatic attack feature extraction techniques fall into network-based and host-based techniques. Network-based techniques apply algorithms to attack information captured on the network to extract the attack features it contains. Host-based techniques make certain changes to the system environment, obtain the relevant attack information from the attacked host, and analyze it to derive features. The two classes of methods each have advantages and disadvantages, to varying degrees, in accuracy, feature extraction speed, ease of use of the features, and the method itself.
Summary of the invention
The purpose of the present invention is to provide a machine learning-based method for constructing an unknown attack feature extraction model, so as to solve the problems raised in the background art above.
To achieve the above object, the invention provides the following technical solution: a machine learning-based method for constructing an unknown attack feature extraction model, comprising the following steps:
Step 1: collect the data related to the unknown attack to be detected, together with related safe data for comparison;
Step 2: pre-extract features from the data to form feature data;
Step 3: convert any character data in the feature data to numeric form and output it as a numeric matrix;
Step 4: initialize the size and contents of the parameter matrix used in machine learning;
Step 5: input the attack data into the model and multiply it by the parameter matrix to obtain predicted values;
Step 6: compute the deviation between the predicted values and the actual values to obtain the error value;
Step 7: determine whether the error value satisfies the condition; if not, update the parameters in the parameter matrix according to the error value and return to step 5; if so, proceed to the next step;
Step 8: complete training, output the matrix parameters, and output the unknown attack features according to the absolute values of the parameters.
Preferably, the relevant features in the attack data are pre-extracted and their data types converted to facilitate training of the machine learning model.
Preferably, the machine learning model is used to train the feature parameters on the converted attack data.
Preferably, after training is complete, the weights of the different features are obtained from the feature parameters, and the unknown attack features are then extracted.
Compared with the prior art, the beneficial effects of the present invention are as follows. The machine learning-based method for constructing an unknown attack feature extraction model computes features algorithmically and offers high robustness and interpretability. (Robustness here means that software does not hang or crash in the face of input errors, disk failures, network overload, or deliberate attacks.) In testing, the unknown attack features obtained by this method achieved a higher detection success rate for attacks of that kind than manual detection methods and common automatic attack feature extraction techniques, and could detect most attacks. In addition, the feature extraction speed, the ease of use of the features, and the practicability of the method itself all reach a relatively high level, which effectively improves the feasibility of the method.
Description of the drawings
Fig. 1 is a flowchart of unknown attack feature extraction according to the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in those embodiments. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Referring to Fig. 1, the present invention provides a technical solution: a machine learning-based method for constructing an unknown attack feature extraction model, comprising the following steps:
Step 1: collect the data related to the unknown attack to be detected, together with related safe data for comparison;
Step 2: pre-extract features from the data to form feature data;
Step 3: convert any character data in the feature data to numeric form and output it as a numeric matrix;
Step 4: initialize the size and contents of the parameter matrix used in machine learning;
Step 5: input the attack data into the model and multiply it by the parameter matrix to obtain predicted values;
Step 6: compute the deviation between the predicted values and the actual values to obtain the error value;
Step 7: determine whether the error value satisfies the condition; if not, update the parameters in the parameter matrix according to the error value and return to step 5; if so, proceed to the next step;
Step 8: complete training, output the matrix parameters, and output the unknown attack features according to the absolute values of the parameters.
Attack data collection: for an attack with unknown features, the relevant attack data are collected first. This includes attack data that can be captured on the network, such as attack traffic packets, and attack data on the targeted host, such as files left behind by the attack and relevant log files. Relevant safe data should also be collected so that it can serve as a contrast during the later experimental training.
Data feature pre-extraction: the relevant features present in the collected data are extracted, for example the length of messages in traffic packets, the time of messages, relevant sensitive characters in messages, the time of logs on the target host, and relevant sensitive information in logs. The more kinds of features extracted here, the higher the accuracy of the attack features finally extracted.
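The pre-extraction step can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `Message` record, its field names, the `pre_extract` function, and the list of sensitive strings do not come from the patent, which only names the kinds of features (message length, message time, sensitive characters).

```python
from dataclasses import dataclass

# Hypothetical sensitive strings; the patent does not list any.
SENSITIVE_TOKENS = ("passwd", "cmd.exe", "union select")


@dataclass
class Message:
    """One captured message; the field names are assumptions."""
    timestamp: float  # capture time in seconds since the epoch
    payload: str      # message body decoded as text


def pre_extract(msg: Message) -> dict:
    """Return the raw features of one message as a name -> value mapping."""
    text = msg.payload.lower()
    hits = [tok for tok in SENSITIVE_TOKENS if tok in text]
    return {
        "length": len(msg.payload),
        "hour": int(msg.timestamp // 3600) % 24,  # hour of day (UTC)
        "sensitive_count": len(hits),
        "first_sensitive": hits[0] if hits else "",  # still a string here
    }


features = pre_extract(Message(timestamp=7200.0, payload="GET /etc/passwd"))
print(features)
```

The `first_sensitive` value is deliberately left as text, since converting characters to numbers is a separate step in the method.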
Data conversion: the extracted feature data may contain characters. These characters are numbered: distinct characters are labeled with successive numbers, so that different characters are replaced by different numbers, which facilitates the subsequent training of the machine learning model.
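A minimal sketch of this numbering, assuming that values are numbered from 1 in order of first appearance (the patent says only that distinct characters receive successive numbers). The function name `encode_column` and the sample values are hypothetical.

```python
def encode_column(values):
    """Map each distinct string to 1, 2, 3, ... in order of first appearance."""
    codes = {}
    encoded = []
    for value in values:
        if value not in codes:
            codes[value] = len(codes) + 1  # next unused number
        encoded.append(codes[value])
    return encoded, codes


protocols = ["tcp", "udp", "tcp", "icmp"]  # made-up character feature
encoded, codes = encode_column(protocols)
print(encoded)  # [1, 2, 1, 3]
print(codes)    # {'tcp': 1, 'udp': 2, 'icmp': 3}
```

Keeping the `codes` mapping makes it possible to translate a numeric column back to the original characters when the trained parameters are interpreted later.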
Parameter matrix initialization: suppose the number of parameters extracted during data feature extraction is m (m being a count), and n attack records have been extracted in total (n being a count). The parameter matrix is then initialized with size m × 1, and each parameter in the matrix is initially set to 1.
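As a small illustration, the initialization can be written in plain Python with the m × 1 matrix represented as a list of m single-element rows. The representation and the name `init_parameters` are assumptions; only the shape and the initial value 1 come from the text.

```python
def init_parameters(m, value=1.0):
    """Return an m x 1 parameter matrix as a list of m single-element rows."""
    return [[value] for _ in range(m)]


params = init_parameters(4)
print(params)  # [[1.0], [1.0], [1.0], [1.0]]
```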
Prediction: each attack record is a 1 × m matrix. Multiplying it by the parameter matrix yields a single number. If the number is greater than 0.5, the record is considered an attack; if it is less than or equal to 0.5, the record is considered safe behavior. The n attack records thus yield n prediction results.
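The prediction rule can be sketched as follows. The function names `score` and `classify`, the parameter values, and the two records are made up for illustration; the 0.5 threshold and the 1/0 outcome follow the text.

```python
def score(record, params):
    """Multiply a 1 x m record by the m x 1 parameter matrix."""
    return sum(x * row[0] for x, row in zip(record, params))


def classify(record, params, threshold=0.5):
    """Return 1 (attack) if the product exceeds the threshold, else 0 (safe)."""
    return 1 if score(record, params) > threshold else 0


params = [[0.5], [-0.25], [0.0]]                # illustrative values only
records = [[2.0, 1.0, 3.0], [1.0, 0.0, 5.0]]    # two made-up records, m = 3
predictions = [classify(r, params) for r in records]
print(predictions)  # [1, 0]
```

The second record scores exactly 0.5 and is therefore classified as safe, matching the "less than or equal to 0.5" rule.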
Error calculation: after the predictions are computed, the n prediction results are compared with their actual results (an attack is recorded as 1, safe behavior as 0), and the difference α between them is computed as the error.
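The patent does not say how the difference α is aggregated over the n records. The sketch below shows one possible reading, assuming α is the fraction of records whose prediction differs from the label; the function names and the sample values are hypothetical.

```python
def differences(predictions, labels):
    """Per-record difference between prediction and label (1 = attack, 0 = safe)."""
    return [p - y for p, y in zip(predictions, labels)]


def error_rate(predictions, labels):
    """Assumed aggregate error: the share of records predicted incorrectly."""
    wrong = sum(1 for d in differences(predictions, labels) if d != 0)
    return wrong / len(labels)


labels = [1, 0, 1, 1]        # made-up ground truth
predictions = [1, 1, 0, 1]   # made-up model output
print(differences(predictions, labels))  # [0, 1, -1, 0]
print(error_rate(predictions, labels))   # 0.5
```

A difference of 1 marks a safe record flagged as an attack and -1 marks a missed attack, so the signed list carries more information than the aggregate rate.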
Error check: if the error α is greater than the preset value, the parameters are updated according to the error: $b_i \rightarrow b_i - k\alpha_i$, where $b_i$ denotes the i-th parameter and $k$ the learning rate. The process then returns to step 5 and computes the predictions again. If the error α is less than the preset value, or the number of parameter updates exceeds a threshold, training ends.
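The loop formed by steps 5 to 7 can be sketched as follows. This is a sketch under stated assumptions, not the patent's implementation: the patent does not define the per-parameter term α_i, so the code assumes that the error is the mean squared difference between the raw products and the labels and that α_i is half the gradient of that error with respect to parameter i. The names `train`, `tolerance`, and `max_updates` and the toy data are hypothetical, and the m × 1 matrix is stored as a flat list for brevity.

```python
def train(records, labels, k=0.1, tolerance=0.01, max_updates=1000):
    """Repeat predict / measure error / update until the error is small enough."""
    n, m = len(records), len(records[0])
    b = [1.0] * m  # every parameter starts at 1
    for updates in range(max_updates + 1):
        # Step 5: multiply each 1 x m record by the parameters.
        scores = [sum(x * bi for x, bi in zip(rec, b)) for rec in records]
        # Step 6: assumed error, the mean squared residual.
        residuals = [s - y for s, y in zip(scores, labels)]
        error = sum(r * r for r in residuals) / n
        # Step 7: stop on a small error or when the update budget is spent.
        if error < tolerance or updates == max_updates:
            break
        # Assumed alpha_i: half the gradient of the error for parameter i.
        alpha = [sum(r * rec[i] for r, rec in zip(residuals, records)) / n
                 for i in range(m)]
        b = [bi - k * ai for bi, ai in zip(b, alpha)]  # b_i -> b_i - k * alpha_i
    return b, error, updates


# Toy data: the first feature decides the label, the second is unrelated.
records = [[1, 0], [1, 1], [0, 0], [0, 1]]
labels = [1, 1, 0, 0]
b, error, updates = train(records, labels)
print(b, error, updates)
```

On this toy data the parameter of the first feature stays close to 1 while the second shrinks toward 0, which is the behavior the later parameter analysis relies on.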
Finally, parameter analysis: the trained parameters are taken out and analyzed. A parameter with a large absolute value indicates that its corresponding feature plays a larger role, and a parameter with a small absolute value indicates that its feature plays a smaller role. The weights of the relevant features in the unknown attack follow from this, and the unknown attack features are finally extracted. The above analysis gives the specific method of the present invention.
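Ranking features by the absolute value of their parameters can be sketched as below. The feature names and parameter values are made-up illustrative numbers, not results from the patent, and `rank_features` is a hypothetical helper.

```python
def rank_features(names, params):
    """Sort (name, parameter) pairs by absolute parameter value, largest first."""
    return sorted(zip(names, params), key=lambda pair: abs(pair[1]), reverse=True)


names = ["length", "hour", "sensitive_count"]  # hypothetical feature names
params = [0.12, -0.03, 0.87]                   # illustrative values only
ranking = rank_features(names, params)
top_features = [name for name, _ in ranking[:2]]
print(ranking)       # [('sensitive_count', 0.87), ('length', 0.12), ('hour', -0.03)]
print(top_features)  # ['sensitive_count', 'length']
```

Comparing absolute values in this way presupposes that the features are on comparable scales; the patent does not describe a scaling step, so that would have to be handled separately.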
Specifically, the relevant features in the attack data are pre-extracted and their data types converted to facilitate training of the machine learning model.
Specifically, the machine learning model is used to train the feature parameters on the converted attack data.
Specifically, after training is complete, the weights of the different features are obtained from the feature parameters, and the unknown attack features are then extracted.
Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principles and spirit of the present invention. The scope of the present invention is defined by the appended claims.

Claims (4)

1. A machine learning-based method for constructing an unknown attack feature extraction model, characterized in that the method comprises the following steps:
Step 1: collect the data related to the unknown attack to be detected, together with related safe data for comparison;
Step 2: pre-extract features from the data to form feature data;
Step 3: convert any character data in the feature data to numeric form and output it as a numeric matrix;
Step 4: initialize the size and contents of the parameter matrix used in machine learning;
Step 5: input the attack data into the model and multiply it by the parameter matrix to obtain predicted values;
Step 6: compute the deviation between the predicted values and the actual values to obtain the error value;
Step 7: determine whether the error value satisfies the condition; if not, update the parameters in the parameter matrix according to the error value and return to step 5; if so, proceed to the next step;
Step 8: complete training, output the matrix parameters, and output the unknown attack features according to the absolute values of the parameters.
2. The machine learning-based method for constructing an unknown attack feature extraction model according to claim 1, characterized in that the relevant features in the attack data are pre-extracted and their data types converted to facilitate training of the machine learning model.
3. The machine learning-based method for constructing an unknown attack feature extraction model according to claim 1, characterized in that the machine learning model is used to train the feature parameters on the converted attack data.
4. The machine learning-based method for constructing an unknown attack feature extraction model according to claim 1, characterized in that, after training is complete, the weights of the different features are obtained from the feature parameters, and the unknown attack features are then extracted.
CN201910692688.7A 2019-07-30 2019-07-30 A machine learning-based method for constructing an unknown attack feature extraction model Pending CN110445776A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910692688.7A CN110445776A (en) 2019-07-30 2019-07-30 A machine learning-based method for constructing an unknown attack feature extraction model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910692688.7A CN110445776A (en) 2019-07-30 2019-07-30 A machine learning-based method for constructing an unknown attack feature extraction model

Publications (1)

Publication Number Publication Date
CN110445776A true CN110445776A (en) 2019-11-12

Family

ID=68432218

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910692688.7A Pending CN110445776A (en) 2019-07-30 2019-07-30 A machine learning-based method for constructing an unknown attack feature extraction model

Country Status (1)

Country Link
CN (1) CN110445776A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108429753A (en) * 2018-03-16 2018-08-21 重庆邮电大学 A kind of matched industrial network DDoS intrusion detection methods of swift nature
CN109858254A (en) * 2019-01-15 2019-06-07 西安电子科技大学 Platform of internet of things attack detection system and method based on log analysis

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
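杨昆朋 (Yang Kunpeng): "基于深度学习的入侵检测" [Intrusion Detection Based on Deep Learning], 《CNKI 中国优秀硕士学位论文全文数据库 信息科技辑》 [CNKI China Master's Theses Full-text Database, Information Science and Technology series] *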

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111797401A (en) * 2020-07-08 2020-10-20 深信服科技股份有限公司 Attack detection parameter acquisition method, device, equipment and readable storage medium
CN111797401B (en) * 2020-07-08 2023-12-29 深信服科技股份有限公司 Attack detection parameter acquisition method, device, equipment and readable storage medium
CN112437084A (en) * 2020-11-23 2021-03-02 上海工业自动化仪表研究院有限公司 Attack feature extraction method

Similar Documents

Publication Publication Date Title
CN110233849B (en) Method and system for analyzing network security situation
CN109446635B (en) Electric power industrial control attack classification method and system based on machine learning
Murtaza et al. A host-based anomaly detection approach by representing system calls as states of kernel modules
CN109308411B (en) Method and system for hierarchically detecting software behavior defects based on artificial intelligence decision tree
CN105471882A (en) Behavior characteristics-based network attack detection method and device
CN107992746A (en) Malicious act method for digging and device
CN104753946A (en) Security analysis framework based on network traffic metadata
CN112492059A (en) DGA domain name detection model training method, DGA domain name detection device and storage medium
CN110147387A (en) A kind of root cause analysis method, apparatus, equipment and storage medium
CN107016298B (en) Webpage tampering monitoring method and device
US10572318B2 (en) Log analysis apparatus, log analysis system, log analysis method and computer program
CN107111610A (en) Mapper component for neural language performance identifying system
CN110046647A (en) A kind of identifying code machine Activity recognition method and device
CN110445776A (en) A machine learning-based method for constructing an unknown attack feature extraction model
CN107003992A (en) Perception associative memory for neural language performance identifying system
CN109660518A (en) Communication data detection method, device and the machine readable storage medium of network
CN110839088A (en) Detection method, system, device and storage medium for dug by virtual currency
CN116074092B (en) Attack scene reconstruction system based on heterogram attention network
CN110602105A (en) Large-scale parallelization network intrusion detection method based on k-means
CN112787984A (en) Vehicle-mounted network anomaly detection method and system based on correlation analysis
CN114408694B (en) Elevator fault prediction system and prediction method thereof
CN109547496B (en) Host malicious behavior detection method based on deep learning
CN116248362A (en) User abnormal network access behavior identification method based on double-layer hidden Markov chain
CN115373834B (en) Intrusion detection method based on process call chain
CN105915536A (en) Attack behavior real-time tracking and analysis method for cyber range

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20191112