CN109948339A - Malicious script detection method based on machine learning - Google Patents

A malicious script detection method based on machine learning

Info

Publication number
CN109948339A
Authority
CN
China
Prior art keywords
webshell
sample data
data
machine learning
script
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910210330.6A
Other languages
Chinese (zh)
Inventor
孙波
李应博
张伟
司成祥
张建松
李胜男
毛蔚轩
盖伟麟
房婧
王亿芳
胡晓旭
王梦禹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Computer Network and Information Security Management Center
Original Assignee
National Computer Network and Information Security Management Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Computer Network and Information Security Management Center
Priority to CN201910210330.6A
Publication of CN109948339A
Current legal status: Pending

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present invention provides a malicious script detection method based on machine learning. The method comprises the steps of: S1. building a simulated network environment and collecting sample data of Webshell scripts; S2. preprocessing the collected sample data, and analyzing and extracting traffic features from the sample data; S3. building a network intrusion detection model based on the traffic features; S4. deploying the network intrusion detection model on the server side and feeding it the server's traffic data to detect network intrusion behavior on the server side; S5. displaying the detection results in real time in the system interface and recording them in a detection log.

Description

A malicious script detection method based on machine learning
Technical field
The present invention relates to the field of information security technology, and in particular to a malicious script detection method.
Background
In the Internet+ era, server security faces threats from network intrusions. A Webshell is a typical malicious script used by attackers; its purpose is to escalate and maintain persistent access to an already compromised web application. A Webshell cannot itself attack or exploit a remote vulnerability, so it is always the second step of an attack. An attacker can exploit common vulnerabilities such as SQL injection, remote file inclusion (RFI), or FTP, or even use cross-site scripting (XSS) as part of the attack, in order to upload the malicious script. Its common functions include, but are not limited to, shell command execution, code execution, database enumeration, and file management. How to detect Webshell intrusions effectively has therefore become a problem in the field.
Summary of the invention
The main object of the present invention is to provide a malicious script detection method based on machine learning, intended to solve the problem of how to detect network intrusion events on the server side automatically and efficiently.
To achieve the above object, the present invention provides a malicious script detection method based on machine learning, whose main steps include:
S1. Build a simulated network environment and collect sample data of Webshell scripts;
S2. Preprocess the collected sample data, and analyze and extract traffic features from the sample data;
S3. Build a network intrusion detection model based on the traffic features;
S4. Deploy the network intrusion detection model on the server side and feed it the server's traffic data to detect network intrusion behavior on the server side;
S5. Display the detection results in real time in the system interface and record them in a detection log.
Preferably, step S1 further includes: when building the simulated network environment, writing automation scripts on multiple servers according to the types of webshell and the attack behaviors, and collecting Webshell traffic with a network sniffing tool.
Preferably, the traffic features in step S2 are one or more of: keywords, the number of levels in the web page path, the number of cookie key-value pairs, the similarity of the returned page structure, POST/GET entropy, cookie key-value pair entropy, and other such features.
Preferably, the network intrusion detection model in step S3 is trained with one of the machine learning algorithms such as AdaBoost, SVM, random forest, or logistic regression.
By analyzing Webshell traffic and extracting features from it to build a network intrusion detection model, the machine-learning-based malicious script detection method proposed by the present invention can effectively detect network intrusion traffic from Webshells, thereby improving the user's network security.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the present invention.
Detailed description of the embodiments
The machine-learning-based malicious script detection method provided by the present invention comprises the following main steps:
S1. Build a simulated network environment and collect sample data of Webshell scripts;
Webshell samples are scarce in real environments: among tens of thousands of HTTP flows, it is hard to find even one generated by a webshell. Obtaining high-quality samples in large quantity is therefore a challenge for machine learning. To solve this problem, we built a simulated webshell intrusion environment and wrote automation scripts according to the types of webshell and the attack behaviors. Running the scripts generates a large volume of webshell traffic, which is collected with network sniffing tools (such as Wireshark or Tcpdump).
S2. Preprocess the collected sample data, and analyze and extract traffic features from the sample data;
Feature analysis begins by combining the expert knowledge collected for feature engineering with a statistical analysis of actual historical data.
1. Keyword-based features
Behavioral analysis of the webshell itself shows that it performs operations involving system calls, system configuration, databases, and files. Because of this behavior, most of its data traffic carries parameters with some obvious characteristics. In addition, the traffic is decoded before keyword matching is performed.
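The decode-then-match idea can be sketched as follows. This is only an illustration under stated assumptions: the source does not list its keywords or say which decoding it applies, so the keyword tuple `SUSPICIOUS_KEYWORDS` and the choice of repeated URL decoding are hypothetical.

```python
from urllib.parse import unquote_plus

# Hypothetical keyword list; the source does not enumerate its keywords.
SUSPICIOUS_KEYWORDS = (
    "eval(", "system(", "exec(", "base64_decode",
    "passthru(", "cmd.exe", "/etc/passwd",
)

def decode_payload(raw: str, max_rounds: int = 3) -> str:
    """URL-decode repeatedly so that double-encoded payloads are unwrapped."""
    text = raw
    for _ in range(max_rounds):
        decoded = unquote_plus(text)
        if decoded == text:  # nothing left to decode
            break
        text = decoded
    return text

def keyword_hits(raw: str) -> int:
    """Count how many distinct suspicious keywords occur in the decoded payload."""
    text = decode_payload(raw).lower()
    return sum(1 for kw in SUSPICIOUS_KEYWORDS if kw in text)
```

The hit count can be used directly as a numeric feature, or each keyword can become its own binary feature if finer granularity is wanted.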
2. Number of GET/POST parameters in the traffic
Observation shows that webshell GET/POST requests generally carry few parameters, so the parameter count can serve as a feature.
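A minimal sketch of the parameter-count feature, assuming the POST body is form-encoded (`application/x-www-form-urlencoded`); the function name `count_params` is illustrative and not taken from the source.

```python
from urllib.parse import parse_qsl, urlsplit

def count_params(url: str, body: str = "") -> int:
    """Return the number of GET parameters in the URL plus POST parameters in the body."""
    query = urlsplit(url).query
    # keep_blank_values=True so that "a=&b=" still counts as two parameters
    get_params = parse_qsl(query, keep_blank_values=True)
    post_params = parse_qsl(body, keep_blank_values=True)
    return len(get_params) + len(post_params)
```

Other body encodings (JSON, multipart) would need their own parsers before counting.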
3. Information entropy of GET/POST data in the traffic
Ordinary requests submit data to the server, and webshell requests are no exception. If the submitted data has been encrypted or encoded, however, its entropy increases. For a normal web business system, if the entropy of the data submitted to a particular URI is clearly higher than for other pages, the source file behind that URI is more suspicious. A webshell that encrypts its communication generally submits data with higher entropy, so it can be detected.
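The entropy referred to here can be computed as the Shannon entropy of the submitted data, H = -Σ p(c) log2 p(c) over the characters c. The sketch below assumes character-level entropy in bits; the source does not state which unit or granularity it uses.

```python
import math
from collections import Counter

def shannon_entropy(data: str) -> float:
    """Shannon entropy of the characters in data, in bits per character."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

To follow the per-URI comparison described above, one would compute this value for each request body and compare the average for a given URI against the averages of the other pages on the same site.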
4. Cookie-based feature extraction
HTTP is a stateless protocol, so in normal HTTP access the server does not automatically maintain the client's context information; a session is used to hold that context. The session is stored on the server side. To reduce the server's storage cost, when an HTTP request arrives the server returns a cookie that records the sessionID and is stored locally by the browser; the cookie is then carried in the next request. The content of a cookie mainly comprises a name, value, expiry time, path, and domain. The path and domain together define the scope of the cookie. Observation and analysis show that some cookies generated by a webShell are empty; others do have a key-value structure, but the number of pairs is very small and the names carry no real meaning. This feature is therefore extracted to distinguish webShell traffic from normal website access.
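As a rough sketch, the "empty cookie" and "few key-value pairs" observations can be turned into two numeric features by parsing the request's `Cookie` header. The feature names `cookie_empty` and `cookie_pair_count` are assumptions made for illustration.

```python
def cookie_features(cookie_header: str) -> dict:
    """Extract simple cookie features from the value of a Cookie request header."""
    pairs = []
    for part in cookie_header.split(";"):
        part = part.strip()
        if not part:
            continue
        # Split on the first "=" only; values may themselves contain "="
        name, _, value = part.partition("=")
        pairs.append((name.strip(), value.strip()))
    return {
        "cookie_empty": int(not pairs),   # 1 if no cookie was sent
        "cookie_pair_count": len(pairs),  # number of key-value pairs
    }
```

Whether a cookie name is "meaningful" is harder to quantify and is left out of this sketch.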
5. Similarity of the returned page structure
Many pages returned by Webshells are structurally similar, so page-structure similarity can be extracted as a feature for comparison. The design idea is to compare the returned page against the structures of pages generated by already-collected webshells and use the resulting structural similarity as a feature.
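The source does not say how structural similarity is measured. One possible approach, shown here purely as an assumption, reduces each page to its sequence of opening tags and compares sequences with `difflib.SequenceMatcher` from the standard library.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

class _TagCollector(HTMLParser):
    """Collects the names of opening tags in document order."""
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

def tag_sequence(html: str) -> list:
    collector = _TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags

def structure_similarity(page: str, known_pages: list) -> float:
    """Highest tag-sequence similarity (0..1) between page and any known webshell page."""
    tags = tag_sequence(page)
    best = 0.0
    for known in known_pages:
        ratio = SequenceMatcher(None, tags, tag_sequence(known)).ratio()
        best = max(best, ratio)
    return best
```

Text content and attribute values are ignored on purpose, so two pages with the same layout but different output score as identical.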
6. Number of levels in the web page path
The path of a Webshell page tends to be deep: the page is hidden deep in the directory tree so that normal visitors are unlikely to find it.
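A small sketch of the path-depth feature. It assumes that the file name counts as one level and that the query string is ignored; the source does not specify either convention.

```python
from urllib.parse import urlsplit

def path_depth(url: str) -> int:
    """Number of non-empty segments in the URL path (file name included)."""
    path = urlsplit(url).path
    return len([segment for segment in path.split("/") if segment])
```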
7. Access time period
Compared with regular traffic, Webshell access times differ: attackers usually choose times when normal traffic is sparse. Access time is therefore used as a feature dimension. This broad time feature can be expanded into several finer features: the time period within the day, the day of the week, the week of the year, the quarter of the year, and whether the day is a working day or a weekend.
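The expansion of a request timestamp into finer time features might look like the sketch below. It assumes ISO week numbering, treats the season or quarter of the year as a calendar quarter, and defines weekends as Saturday and Sunday; public holidays are not handled.

```python
from datetime import datetime

def time_features(ts: datetime) -> dict:
    """Expand a request timestamp into the finer time features listed above."""
    _, iso_week, iso_weekday = ts.isocalendar()
    return {
        "hour": ts.hour,                     # time period within the day
        "weekday": iso_weekday,              # 1 = Monday ... 7 = Sunday
        "week_of_year": iso_week,            # ISO week number
        "quarter": (ts.month - 1) // 3 + 1,  # 1..4
        "is_weekend": int(iso_weekday >= 6), # Saturday or Sunday
    }
```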
8. Presence of a referer
In the traffic, if a page was not reached by a jump from a previous page, the referer parameter is empty. This feature is therefore selected as an auxiliary indicator.
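The referer check reduces to a single binary feature. The sketch assumes the request headers are available as a plain dictionary and matches the header name case-insensitively.

```python
def has_referer(headers: dict) -> int:
    """Return 1 if the request carries a non-empty Referer header, else 0."""
    for name, value in headers.items():
        if name.lower() == "referer" and value.strip():
            return 1
    return 0
```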
S3. Build a network intrusion detection model based on the traffic features;
S4. Deploy the network intrusion detection model on the server side and feed it the server's traffic data to detect network intrusion behavior on the server side.
In actual implementation, an algorithm such as AdaBoost, SVM, random forest, or logistic regression can be chosen for model training as needed; random forest is generally the default choice.
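To make the training step concrete, here is a dependency-free sketch that uses logistic regression, one of the listed options, rather than the default random forest, which would normally come from a machine learning library. The feature layout, the toy data, and the hyperparameters are all invented for illustration and are not results from the source.

```python
import math

def sigmoid(z: float) -> float:
    # Clamp z so that math.exp cannot overflow on extreme inputs
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x) -> int:
    """Return 1 (webshell) if the predicted probability is at least 0.5."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)

# Toy feature vectors (hypothetical): [keyword_hits, payload_entropy / 8, has_referer]
X = [[2, 0.70, 0], [3, 0.65, 0], [1, 0.75, 0],
     [0, 0.40, 1], [0, 0.45, 1], [0, 0.35, 1]]
y = [1, 1, 1, 0, 0, 0]  # 1 = webshell traffic, 0 = normal traffic
w, b = train_logistic(X, y)
```

In a deployment as described in S4, the fitted `w` and `b` would be loaded on the server side and `predict` applied to the feature vector of each incoming request.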
S5. Display the detection results in real time in the system interface and record them in the detection log.
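The logging half of this step could be sketched as one JSON record per detection. The record fields (`src_ip`, `uri`, `score`, `verdict`) and the 0.5 threshold are assumptions; the source does not define a log format.

```python
import json
import logging
from datetime import datetime, timezone

def format_detection(src_ip: str, uri: str, score: float, threshold: float = 0.5) -> str:
    """Serialize one detection result as a JSON line."""
    record = {
        "time": datetime.now(timezone.utc).isoformat(),
        "src_ip": src_ip,
        "uri": uri,
        "score": round(score, 4),
        "verdict": "webshell" if score >= threshold else "normal",
    }
    return json.dumps(record, sort_keys=True)

def log_detection(logger: logging.Logger, src_ip: str, uri: str, score: float) -> None:
    """Write the detection result to the detection log."""
    logger.info(format_detection(src_ip, uri, score))
```

The same JSON line can be pushed to the system interface for real-time display, so the log and the display stay consistent.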
The embodiments of the present invention have been described above with reference to the drawing, but the invention is not limited to the specific embodiments described, which are merely illustrative rather than restrictive. Inspired by the present invention, those of ordinary skill in the art can devise many further forms without departing from the purpose of the invention or the scope protected by the claims, all of which fall within the protection of the present invention.

Claims (4)

1. A malicious script detection method based on machine learning, the method comprising the steps of:
S1. building a simulated network environment and collecting sample data of Webshell scripts;
S2. preprocessing the collected sample data, and analyzing and extracting traffic features from the sample data;
S3. building a network intrusion detection model based on the traffic features;
S4. deploying the network intrusion detection model on the server side and feeding it the server's traffic data to detect network intrusion behavior on the server side;
S5. displaying the detection results in real time in the system interface and recording them in a detection log.
2. The method of claim 1, characterized in that step S1 further comprises: when building the simulated network environment, writing automation scripts on multiple servers according to the types of webshell and the attack behaviors, and collecting Webshell traffic with a network sniffing tool.
3. The method of claim 1, characterized in that the traffic features in step S2 are one or more of: keywords, the number of levels in the web page path, the number of cookie key-value pairs, the similarity of the returned page structure, POST/GET entropy, cookie key-value pair entropy, and other such features.
4. The method of claim 1, characterized in that the network intrusion detection model in step S3 is trained with one of the machine learning algorithms such as AdaBoost, SVM, random forest, or logistic regression.
CN201910210330.6A 2019-03-20 2019-03-20 A kind of malicious script detection method based on machine learning Pending CN109948339A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910210330.6A CN109948339A (en) 2019-03-20 2019-03-20 A kind of malicious script detection method based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910210330.6A CN109948339A (en) 2019-03-20 2019-03-20 A kind of malicious script detection method based on machine learning

Publications (1)

Publication Number Publication Date
CN109948339A true CN109948339A (en) 2019-06-28

Family

ID=67010379

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910210330.6A Pending CN109948339A (en) 2019-03-20 2019-03-20 A kind of malicious script detection method based on machine learning

Country Status (1)

Country Link
CN (1) CN109948339A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110417810A (en) * 2019-08-20 2019-11-05 西安电子科技大学 Malicious encrypted flow detection method based on enhanced model of logistic regression
CN110417810B (en) * 2019-08-20 2021-06-25 西安电子科技大学 Malicious encrypted flow detection method based on enhanced model of logistic regression
CN112491882A (en) * 2020-11-27 2021-03-12 泰康保险集团股份有限公司 Webshell detection method, webshell detection device, webshell detection medium and electronic equipment
CN113239352A (en) * 2021-04-06 2021-08-10 中国科学院信息工程研究所 Webshell detection method and system
CN113239352B (en) * 2021-04-06 2022-05-17 中国科学院信息工程研究所 Webshell detection method and system

Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190628