CN110022313A - Polymorphic worm feature extraction and polymorphic worm discrimination method based on machine learning - Google Patents

Info

Publication number
CN110022313A
Authority
CN
China
Prior art keywords
worm
polymorphic worm
polymorphic
machine learning
feature
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910226995.6A
Other languages
Chinese (zh)
Other versions
CN110022313B (en)
Inventor
王方伟
王长广
杨少杰
赵冬梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hebei Normal University
Original Assignee
Hebei Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-03-25
Filing date
2019-03-25
Publication date
2019-07-16
Application filed by Hebei Normal University
Priority to CN201910226995.6A
Publication of CN110022313A
Application granted
Publication of CN110022313B
Legal status: Active

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441: Countermeasures against malicious traffic
    • H04L63/145: Countermeasures against malicious traffic, the attack involving the propagation of malware through the network, e.g. viruses, trojans or worms

Abstract

The invention discloses a machine-learning-based polymorphic worm feature extraction method and polymorphic worm discrimination method. The feature extraction method comprises loading polymorphic worms and dividing them into a test set and a training set, and then building, training, and verifying a mathematical model of malicious polymorphic worm behavior features. The discrimination method monitors the running state of the program in real time and records the results of polymorphic worm feature extraction. The invention extracts polymorphic worm features more quickly and accurately, can be applied to real-time monitoring of network traffic, is highly extensible and easy to control, and provides a user-friendly visualization window.

Description

Polymorphic worm feature extraction and polymorphic worm discrimination method based on machine learning
Technical field
The present invention relates to a polymorphic worm feature extraction method and a polymorphic worm discrimination method, and in particular to a machine-learning-based polymorphic worm feature extraction and polymorphic worm discrimination method. It belongs to the technical field of network security.
Background art
With the widespread adoption of the Internet across many fields, worms have become one of the chief threats to cyberspace security and host security, and quickly detecting polymorphic worms while improving the accuracy of feature extraction has become a major problem in Internet security. As computer networking technology develops, polymorphic worms mutate quickly, spread rapidly, are highly destructive, replicate readily, and are hard to detect, causing immeasurable losses to production and daily life. As the ways in which polymorphic worms are generated and propagated continue to advance, a worm can spread across the globe in a short time and deal a heavy blow to the network environment; it can also breed on personal hosts and spread through intranets. Traditional sequence alignment methods can no longer extract features quickly enough to protect normal network operation. Improving the accuracy and efficiency of polymorphic worm feature extraction has therefore become an urgent problem.
Summary of the invention
The technical problem to be solved by the present invention is to provide a machine-learning-based polymorphic worm feature extraction method and polymorphic worm discrimination method.
To solve the above technical problem, the present invention adopts the following technical solutions:
Technical solution one:
A polymorphic worm feature extraction method based on machine learning, comprising the following steps:
Step 1: load the polymorphic worms and divide them into a test set and a training set: construct a regular expression for segmentation, load the polymorphic worm data sets one by one, and divide the malicious worm data set into a test set and a training set according to a preset ratio; the test set and the training set are further grouped to form file sets, and each malicious worm corresponds to one file in the file set;
Step 2: train the malicious polymorphic worm behavior model: use unsupervised machine learning to extract one or more sub-behavior features of the malicious worms in the training set; the resulting malicious polymorphic worm behavior model is the weighted average of the sub-behavior features according to their weights.
The machine-learning-based polymorphic worm feature extraction method further comprises Step 3: refine the mathematical model of malicious polymorphic worm behavior features: use supervised learning to extract the sub-behavior features of the malicious worms again from the malicious worms in the test set; the sub-behavior features extracted in both Step 2 and Step 3 whose corresponding weights differ by less than a preset value are retained, and the remaining behavior features are discarded.
The regular expression in Step 1 is "GET.*http/1.1.*\r\n.*?".
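Step 1 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented implementation: the pattern restores the `\r\n` escapes that appear as ` r n` in the text, case-insensitive matching is assumed so that `HTTP/1.1` matches, and the names `segment_requests`, `split_dataset`, `train_ratio`, and `seed` are hypothetical.

```python
from __future__ import annotations

import random
import re

# Pattern from Step 1 with the escapes restored (an assumption).
# Without re.DOTALL, "." does not cross "\n", so each match is one request line.
REQUEST_RE = re.compile(r"GET.*http/1.1.*\r\n.*?", re.IGNORECASE)


def segment_requests(capture: str) -> list[str]:
    """Return every GET request line found in a captured payload."""
    return [m.group(0).rstrip("\r\n") for m in REQUEST_RE.finditer(capture)]


def split_dataset(samples, train_ratio: float = 0.8, seed: int = 0):
    """Shuffle the samples and split them into (training set, test set)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

The preset ratio is left as a parameter because the text does not state its value; 0.8 is only a placeholder default.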
The calculation method of each behavioural characteristic weight in malice polymorphic worm behavior model in the step 2 is equal are as follows:
The calculation method of the forward direction word frequency are as follows:
The calculation method of the Feature Conversion are as follows:
The calculation method of the inverse document frequency are as follows:
Technical solution two:
A polymorphic worm discrimination method applying the polymorphic worm feature extraction method of technical solution one, comprising the following steps:
Monitor the running state of the program in real time, record a log of polymorphic worm feature extraction, and display it in a visualization window. The log includes: whether the monitored program is in an abnormal state, the time of each program step, the grouping state of the data set, the raw data of each worm code segment, the analysis of the features preliminarily extracted from each code segment, and the polymorphic worm features identified after program testing.
The technical effects achieved by the above technical solutions are:
1. With the improved hash calculation, memory usage does not grow with data volume and dimensionality, and each forward term-frequency value can be mapped one-to-one to its position, which improves the accuracy of the forward term-frequency calculation.
2. The improved forward term-frequency calculation makes features of different dimensions more comparable numerically, which greatly improves the accuracy of feature extraction.
3. The improved inverse document frequency calculation accelerates convergence of the algorithm, so that genuine polymorphic behaviors are more clearly distinguished from other feature behaviors by their weights.
4. Using the weight calculation when computing the final behavior features avoids the excessively high weights that the inverse document frequency calculation assigns to rare features, which further optimizes the feature extraction process.
5. The present invention extracts polymorphic worm features more quickly and accurately.
6. The present invention can be applied to real-time monitoring of network traffic to detect polymorphic worms.
7. The present invention is highly extensible and easy to control, and its visualization window is user-friendly.
Brief description of the drawings
Fig. 1 is a flow chart of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawing and specific embodiments.
Embodiment 1
A polymorphic worm feature extraction method based on machine learning, comprising the following steps:
Step 1: load the polymorphic worms and divide them into a test set and a training set: construct a regular expression for segmentation, load the polymorphic worm data sets one by one, and divide the malicious worm data set into a test set and a training set according to a preset ratio; the test set and the training set are further grouped to form file sets, and each malicious worm corresponds to one file in the file set;
Step 2: train the malicious polymorphic worm behavior model: use unsupervised machine learning to extract one or more sub-behavior features of the malicious worms in the training set; the resulting malicious polymorphic worm behavior model is the weighted average of the sub-behavior features according to their weights.
The machine-learning-based polymorphic worm feature extraction method further comprises Step 3: refine the mathematical model of malicious polymorphic worm behavior features: use supervised learning to extract the sub-behavior features of the malicious worms again from the malicious worms in the test set; the sub-behavior features extracted in both Step 2 and Step 3 whose corresponding weights differ by less than 0.67 are retained, and the remaining behavior features are discarded.
The regular expression in Step 1 is "GET.*http/1.1.*\r\n.*?".
The weight of each sub-behavior feature in the malicious polymorphic worm behavior model in Step 2 is the product of its forward term frequency and its inverse document frequency.
High-dimensional data are handled with improved hashing: the data set is traversed to build a mapping, so that each computed forward term-frequency value can be matched to the specific position of the corresponding worm code segment. The specific formula is: (formula not reproduced in this text)
The forward term frequency is calculated with a double normalization algorithm combined with the improved hash algorithm, which makes high-dimensional features more comparable numerically. The specific formula is: (formula not reproduced in this text)
The importance of a sub-feature within the whole file set is calculated with the inverse document frequency algorithm. The specific formula is: (formula not reproduced in this text)
The final weight is then calculated. The specific formula is: (formula not reproduced in this text)
Finally, the polymorphic worm behavior features are determined by the weighted mean method.
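Because the formulas are not reproduced in the text, the following Python sketch substitutes textbook definitions to show how the pieces fit together; it should not be read as the patent's improved algorithms. The assumptions are: the hashing trick with a CRC32 hash, double-normalization term frequency with K = 0.5, smoothed inverse document frequency, character 4-grams as tokens, and a plain mean of per-file weights as the final averaging step. All names (`NUM_BUCKETS`, `bucket`, `tokens`, `term_frequency`, `inverse_document_frequency`, `feature_weights`) are hypothetical.

```python
from __future__ import annotations

import math
import zlib
from collections import Counter

NUM_BUCKETS = 1 << 16  # hypothetical size of the hashed feature space


def bucket(token: str) -> int:
    """Hashing trick: map a token to a fixed-size index, stable across runs."""
    return zlib.crc32(token.encode("utf-8")) % NUM_BUCKETS


def tokens(segment: str, n: int = 4) -> list[str]:
    """Hypothetical tokenizer: overlapping character n-grams of a code segment."""
    return [segment[i:i + n] for i in range(len(segment) - n + 1)]


def term_frequency(segment: str, k: float = 0.5) -> dict[int, float]:
    """Double-normalization TF: k + (1 - k) * count / max_count."""
    counts = Counter(bucket(t) for t in tokens(segment))
    if not counts:
        return {}
    top = max(counts.values())
    return {b: k + (1 - k) * c / top for b, c in counts.items()}


def inverse_document_frequency(segments: list[str]) -> dict[int, float]:
    """Smoothed IDF: log((N + 1) / (df + 1)) + 1, which bounds rare-feature weights."""
    n_docs = len(segments)
    df = Counter(b for s in segments for b in {bucket(t) for t in tokens(s)})
    return {b: math.log((n_docs + 1) / (d + 1)) + 1 for b, d in df.items()}


def feature_weights(segments: list[str]) -> dict[int, float]:
    """Weight = TF * IDF per file, then averaged over all files."""
    idf = inverse_document_frequency(segments)
    totals: Counter = Counter()
    for s in segments:
        for b, tf in term_frequency(s).items():
            totals[b] += tf * idf[b]
    return {b: w / len(segments) for b, w in totals.items()}
```

Hashing into a fixed number of buckets keeps memory independent of vocabulary size, at the cost of occasional collisions; a larger `NUM_BUCKETS` reduces them.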
The model uses sbt (Simple Build Tool) to build the local environment and is implemented in the Scala programming language. Building the local environment requires interaction with a remote server to download the environment dependencies needed by the experimental algorithm program; introducing additional dependency packages makes the program more feasible to implement, so that the environment runs normally and efficiently. A local service repository is built with Spark as the framework.
Step 3: refine the mathematical model of malicious polymorphic worm behavior features: based on the mathematical model in Step 2, use unsupervised machine learning to extract the weight of each sub-behavior feature of the malicious worms in the training set.
Polymorphic worm feature extraction is completed by unsupervised machine learning. In a noise-free state, the advantage of unsupervised machine learning is that the algorithm can automatically extract a large number of suspicious polymorphic worm sub-features without any known polymorphic worm sub-features being specified.
Step 4: correct the mathematical model of malicious polymorphic worm behavior features: use supervised learning with the malicious worms in the test set to verify the sub-behavior features of the malicious worms extracted in Step 3. If an extracted sub-feature contains a known worm feature segment, it is a genuine malicious polymorphic worm feature; otherwise it is not. The weights of polymorphic worm sub-features differ from those of non-polymorphic-worm sub-features by about 1.37.
With supervised learning, the program labels features automatically, tests the labeled features, and finally confirms whether each tested feature is a genuine polymorphic worm feature. Under strong noise, this method identifies polymorphic worm data more accurately; after repeated algorithmic checks it automatically extracts more comprehensive and accurate polymorphic worm features. The method can be applied to daily traffic monitoring.
Combining unsupervised and supervised learning extracts polymorphic worm features more comprehensively. Unsupervised learning has the drawback that its feature extraction is not targeted. Supervised learning, whose "learning ability" is weaker than that of unsupervised learning, has the drawback that its feature extraction is incomplete. Combining the two approaches and verifying with the test set compensates for both weaknesses and extracts polymorphic worm features more comprehensively and accurately.
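The refinement and verification steps can be illustrated with a short Python sketch. It assumes that each pass yields a mapping from feature string to weight, that the first-pass weight is the one kept, and that "contains a known worm feature segment" means a substring match; the function names `refine` and `verify` and the sample signature are hypothetical.

```python
from __future__ import annotations


def refine(unsupervised: dict[str, float],
           supervised: dict[str, float],
           max_diff: float = 0.67) -> dict[str, float]:
    """Keep features found by both passes whose weights differ by less than max_diff."""
    shared = unsupervised.keys() & supervised.keys()
    return {
        f: unsupervised[f]  # keeping the first-pass weight is an assumption
        for f in shared
        if abs(unsupervised[f] - supervised[f]) < max_diff
    }


def verify(features, known_signatures) -> dict[str, bool]:
    """Mark a feature as genuine if it contains any known worm signature segment."""
    return {f: any(sig in f for sig in known_signatures) for f in features}
```

For example, `verify(["GET /default.ida?NNNN", "GET /index.html"], ["default.ida"])` marks only the first feature as genuine.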
Embodiment 2
A polymorphic worm discrimination method applying the polymorphic worm feature extraction method of technical solution one, comprising the following steps:
Monitor the running state of the program in real time, record a log of polymorphic worm feature extraction, and display it in a visualization window. The log includes: whether the monitored program is in an abnormal state, the time of each program step, the grouping state of the data set, the raw data of each worm code segment, the analysis of the features preliminarily extracted from each code segment, and the polymorphic worm features identified after program testing.
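One way to picture the log described above is as a structured record with one field per listed item. The Python sketch below is a hypothetical layout only; the class name `ExtractionLogEntry`, the field names, and the JSON serialization are assumptions, and the visualization window itself is not modeled.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass
class ExtractionLogEntry:
    abnormal: bool                    # whether the monitored program is in an abnormal state
    step_times: dict[str, float]      # time of each program step, keyed by step name
    grouping_state: str               # grouping state of the data set
    raw_segment: str                  # raw data of one worm code segment
    preliminary_features: list[str]   # features preliminarily extracted from the segment
    confirmed_features: list[str]     # features identified after program testing


def to_log_line(entry: ExtractionLogEntry) -> str:
    """Serialize one entry as a single JSON line for display or storage."""
    return json.dumps(asdict(entry), sort_keys=True)
```

Writing one JSON object per line keeps the log easy to tail in real time and to load into whatever window or dashboard displays it.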
The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principle of the present invention, and these improvements and refinements shall also be regarded as falling within the scope of protection of the present invention.

Claims (5)

1. A polymorphic worm feature extraction method based on machine learning, characterized by comprising the following steps:
Step 1: load the polymorphic worms and divide them into a test set and a training set: construct a regular expression for segmentation, load the polymorphic worm data sets one by one, and divide the malicious worm data set into a test set and a training set according to a preset ratio; the test set and the training set are further grouped to form file sets, and each malicious worm corresponds to one file in the file set;
Step 2: train the malicious polymorphic worm behavior model: use unsupervised machine learning to extract one or more sub-behavior features of the malicious worms in the training set; the resulting malicious polymorphic worm behavior model is the weighted average of the sub-behavior features according to their weights.
2. The polymorphic worm feature extraction method based on machine learning according to claim 1, characterized by further comprising Step 3: refine the mathematical model of malicious polymorphic worm behavior features: use supervised learning to extract the sub-behavior features of the malicious worms again from the malicious worms in the test set; the sub-behavior features extracted in both Step 2 and Step 3 whose corresponding weights differ by less than a preset value are retained, and the remaining sub-behavior features are discarded.
3. The polymorphic worm feature extraction method based on machine learning according to claim 1, characterized in that the regular expression in Step 1 is "GET.*http/1.1.*\r\n.*?".
4. The polymorphic worm feature extraction method based on machine learning according to claim 1, characterized in that the weight of each sub-feature in the malicious polymorphic worm behavior model in Step 2 is calculated as: (formula not reproduced in this text)
The forward term frequency is calculated as: (formula not reproduced in this text)
The feature conversion is calculated as: (formula not reproduced in this text)
The inverse document frequency is calculated as: (formula not reproduced in this text)
5. A polymorphic worm discrimination method using the polymorphic worm feature extraction method based on machine learning according to any one of claims 1-4, comprising the following steps:
Monitor the running state of the program in real time, record a log of polymorphic worm feature extraction, and display it in a visualization window. The log includes: whether the monitored program is in an abnormal state, the time of each program step, the grouping state of the data set, the raw data of each worm code segment, the analysis of the features preliminarily extracted from each code segment, and the polymorphic worm features identified after program testing.
CN201910226995.6A 2019-03-25 2019-03-25 Polymorphic worm feature extraction and polymorphic worm identification method based on machine learning Active CN110022313B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910226995.6A CN110022313B (en) 2019-03-25 2019-03-25 Polymorphic worm feature extraction and polymorphic worm identification method based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910226995.6A CN110022313B (en) 2019-03-25 2019-03-25 Polymorphic worm feature extraction and polymorphic worm identification method based on machine learning

Publications (2)

Publication Number Publication Date
CN110022313A true CN110022313A (en) 2019-07-16
CN110022313B CN110022313B (en) 2021-09-17

Family

ID=67189935

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910226995.6A Active CN110022313B (en) 2019-03-25 2019-03-25 Polymorphic worm feature extraction and polymorphic worm identification method based on machine learning

Country Status (1)

Country Link
CN (1) CN110022313B (en)

Also Published As

Publication number Publication date
CN110022313B (en) 2021-09-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant