CN109948339A - A malicious script detection method based on machine learning
- Publication number
- CN109948339A (application number CN201910210330.6A)
- Authority
- CN
- China
- Prior art keywords
- webshell
- sample data
- data
- machine learning
- script
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The present invention provides a malicious script detection method based on machine learning. The method comprises the following steps: S1. build a simulated network environment and collect sample data of Webshell scripts; S2. preprocess the collected sample data, then analyze it and extract traffic features; S3. build a network intrusion detection model based on the traffic features; S4. deploy the network intrusion detection model on the server side and feed server traffic data into it to detect network intrusion behavior on the server side; S5. display the detection results in real time in the system interface and record them in a detection log.
Description
Technical field
The present invention relates to the field of information security technology, and in particular to a malicious script detection method.
Background art
In the "Internet Plus" era, server security faces threats from network intrusion. A Webshell is a typical malicious script used by attackers; its purpose is to escalate privileges on, and maintain persistent access to, a web application that has already been compromised. A Webshell cannot by itself attack or exploit a remote vulnerability, so it is always the second step of an attack. To upload the malicious script, an attacker can exploit common vulnerabilities such as SQL injection, remote file inclusion (RFI), or FTP, or even use cross-site scripting (XSS) as part of the attack. Typical Webshell functions include, but are not limited to, shell command execution, code execution, database enumeration, and file management. How to detect Webshell intrusion effectively has therefore become a problem in this field.
Summary of the invention
The main object of the present invention is to propose a malicious script detection method based on machine learning, aimed at solving the problem of how to detect network intrusion events on the server side automatically and efficiently.
To achieve the above object, the present invention provides a malicious script detection method based on machine learning, whose main steps are:
S1. Build a simulated network environment and collect sample data of Webshell scripts;
S2. Preprocess the collected sample data, then analyze it and extract traffic features;
S3. Build a network intrusion detection model based on the traffic features;
S4. Deploy the network intrusion detection model on the server side and feed server traffic data into it to detect network intrusion behavior on the server side;
S5. Display the detection results in real time in the system interface and record them in a detection log.
Preferably, step S1 further includes: when building the simulated network environment, writing automation scripts on multiple servers according to the Webshell types and attack behaviors, and collecting Webshell traffic with a network sniffing tool.
Preferably, the traffic features in step S2 are one or more of: keywords, the number of levels in the web page path, the number of cookie key-value pairs, the structural similarity of the returned web page, POST/GET entropy, cookie key-value pair entropy, and similar features.
Preferably, the network intrusion detection model in step S3 is trained with one of the machine learning algorithms AdaBoost, SVM, random forest, logistic regression, and the like.
The malicious script detection method based on machine learning proposed by the present invention builds a network intrusion detection model by analyzing Webshell traffic data and extracting features from it, so that network intrusion traffic originating from Webshells can be detected effectively, improving the user's network security.
Brief description of the drawings
Fig. 1 is a flowchart of the method of the present invention.
Detailed description of the embodiments
The main steps of the malicious script detection method based on machine learning provided by the present invention are as follows:
S1. Build a simulated network environment and collect sample data of Webshell scripts;
Webshell samples are scarce in real environments: among tens of thousands of HTTP flows it is hard to find even one generated by a webshell. Obtaining samples of high quality and in sufficient quantity is therefore a challenge for machine learning.
To solve this sample shortage, we built a simulated environment for webshell intrusion. Automation scripts are written according to the webshell types and attack behaviors; when run, they generate a large volume of webshell traffic, which is collected with network sniffing tools (such as Wireshark or Tcpdump).
S2. Preprocess the collected sample data, then analyze it and extract traffic features;
Feature analysis combines expert knowledge in feature engineering with statistical analysis of the collected data and of actual historical data.
1. Keyword-based features
Behavioral analysis of a webshell shows that it performs operations on system calls, system configuration, databases, and files. Because of this behavior, the parameters carried in its traffic mostly show some distinctive features. In addition, the traffic is decoded before keyword matching is performed.
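The decode-then-match idea above could be sketched as follows. This is a minimal illustration, not the patent's implementation: the keyword list, the choice of URL decoding plus Base64 decoding, and the function names `decode_payload` and `keyword_hits` are all assumptions made for the example.

```python
import base64
import binascii
import re
from urllib.parse import unquote_plus

# Hypothetical keyword list; the patent does not enumerate its keywords.
SUSPICIOUS_KEYWORDS = ("eval(", "system(", "exec(", "base64_decode", "cmd.exe", "/etc/passwd")

# Runs of 16+ Base64 alphabet characters, optionally padded.
_B64_TOKEN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def decode_payload(raw: str) -> str:
    """URL-decode the payload and append the decoded form of Base64-looking tokens."""
    text = unquote_plus(raw)
    decoded_parts = []
    for token in _B64_TOKEN.findall(text):
        try:
            decoded = base64.b64decode(token, validate=True)
        except (binascii.Error, ValueError):
            continue  # not valid Base64 after all
        decoded_parts.append(decoded.decode("utf-8", errors="ignore"))
    return " ".join([text] + decoded_parts)


def keyword_hits(raw: str) -> int:
    """Count how many distinct suspicious keywords appear after decoding."""
    text = decode_payload(raw).lower()
    return sum(1 for keyword in SUSPICIOUS_KEYWORDS if keyword in text)
```

Matching on the decoded text matters because an encoded payload would otherwise contain none of the keywords in plain form.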
2. Number of GET/POST parameters in the traffic
Observation shows that webshell GET/POST requests generally carry relatively few parameters, so the parameter count can serve as a feature.
3. Information entropy of GET/POST data in the traffic
Ordinary requests submit data to the server, and webshell requests are no exception. However, if the submitted data has been encrypted or encoded, its entropy increases. For a normal web business system, if the entropy of the data submitted to a particular URI is clearly higher than for other pages, the source file behind that URI is more suspicious. A webshell that communicates over an encrypted channel generally submits data with higher-than-normal entropy, so it can be detected.
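The entropy referred to above is taken here to be Shannon entropy over the characters of the submitted data, H = -Σ p(c) log2 p(c); the patent does not spell out the formula, so that reading is an assumption. A minimal sketch with a hypothetical function name:

```python
import math
from collections import Counter


def shannon_entropy(data: str) -> float:
    """Shannon entropy in bits per character; 0.0 for empty input."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

In practice the value for one request would be compared against what is typical for other pages on the same site, as the text describes, rather than against a fixed threshold.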
4. Cookie-based feature extraction
HTTP is a stateless protocol, so in normal HTTP access the server does not automatically maintain the client's context information; a session is used to store it instead. The session is stored on the server side. To reduce the server's storage cost, when an HTTP request arrives the server returns a cookie that records the session ID; the cookie is stored locally in the browser and is carried in the next request. A cookie mainly contains a name, a value, an expiration time, a path, and a domain; the path and domain together define the cookie's scope. Observation and analysis show that some cookies generated by webshells are empty, while others, although they have a key-value structure, contain very few pairs and use names with no practical meaning. This feature is therefore extracted to distinguish webshell access from normal website access.
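The cookie observations above could be turned into numeric features roughly as follows. This sketch assumes the raw `Cookie` request header is available as a string; the feature names and both function names are hypothetical, and the entropy is computed over the concatenated names and values as one possible reading of "cookie key-value pair entropy".

```python
import math
from collections import Counter


def parse_cookie_header(header: str) -> dict:
    """Split a Cookie request header into name/value pairs (attributes are not handled)."""
    pairs = {}
    for part in header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep and name:
            pairs[name] = value
    return pairs


def cookie_features(header: str) -> dict:
    """Return emptiness flag, pair count, and character entropy for a Cookie header."""
    pairs = parse_cookie_header(header or "")
    joined = "".join(name + value for name, value in pairs.items())
    total = len(joined)
    entropy = 0.0
    if total:
        counts = Counter(joined)
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return {
        "cookie_empty": int(not pairs),
        "cookie_pair_count": len(pairs),
        "cookie_entropy": entropy,
    }
```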
5. Structural similarity of the returned web page
Many of the pages returned by webshells are structurally similar, so page-structure similarity can be extracted as a feature and compared. The design idea is to compare the returned page with the structures of pages generated by webshells that have already been collected, and to use the resulting structural similarity as a feature.
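The patent does not say how structural similarity is measured. One possible sketch, under the assumption that a page's structure is approximated by its sequence of opening tags and that similarity is the best `difflib.SequenceMatcher` ratio against a set of known webshell pages, is:

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class _TagCollector(HTMLParser):
    """Collects the names of opening tags in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_sequence(html: str) -> list:
    """Return the sequence of opening tag names, ignoring text and attributes."""
    collector = _TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


def structure_similarity(page: str, known_pages: list) -> float:
    """Best tag-sequence similarity (0.0 to 1.0) against known webshell pages."""
    seq = tag_sequence(page)
    best = 0.0
    for known in known_pages:
        matcher = SequenceMatcher(None, seq, tag_sequence(known), autojunk=False)
        best = max(best, matcher.ratio())
    return best
```

Because only tag names are compared, two pages with the same layout but different text or attribute values score as identical, which matches the intent of comparing structure rather than content.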
6. Number of levels in the web page path
The web page path of a webshell tends to be deep: the page is hidden deep in the directory tree so that ordinary visitors are unlikely to find it.
7. Access time period
Compared with normal business traffic, webshell access times are different: attackers usually choose periods when normal traffic is sparse. Time is therefore used as one feature dimension. This broad time category can be expanded into several sub-features: the period of the day, the day of the week, the week of the year, the quarter (season) of the year, and whether the day is a working day or a weekend.
8. Presence or absence of a referer
In the traffic, if a page was not reached by a jump from a previous page, its Referer parameter is empty, so this feature is selected as an auxiliary indicator.
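The simpler features above (parameter count, path depth, and referer presence) can be read straight off a parsed request. A minimal sketch, assuming the request is available as a URL, a form-encoded body, and a header dictionary; the function name and feature names are hypothetical:

```python
from typing import Optional
from urllib.parse import parse_qsl, urlsplit


def simple_request_features(url: str, body: str = "", headers: Optional[dict] = None) -> dict:
    """Parameter count, path depth, and referer presence for one HTTP request."""
    # Header names are case-insensitive, so normalise them first.
    normalised = {key.lower(): value for key, value in (headers or {}).items()}
    parts = urlsplit(url)
    get_params = parse_qsl(parts.query, keep_blank_values=True)
    post_params = parse_qsl(body, keep_blank_values=True)  # assumes a form-encoded body
    depth = len([segment for segment in parts.path.split("/") if segment])
    return {
        "param_count": len(get_params) + len(post_params),
        "path_depth": depth,
        "has_referer": int(bool(normalised.get("referer", "").strip())),
    }
```

For example, a request to `/uploads/img/2019/a.php?x=1` with body `cmd=id` and no Referer header yields a parameter count of 2, a path depth of 4, and `has_referer` of 0.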
S3. Build a network intrusion detection model based on the traffic features;
S4. Deploy the network intrusion detection model on the server side and feed server traffic data into it to detect network intrusion behavior on the server side.
In an actual implementation, AdaBoost, SVM, random forest, logistic regression, or a similar algorithm can be chosen as needed for model training; random forest is generally the default training algorithm.
S5. Display the detection results in real time in the system interface and record them in a detection log.
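To make the training step concrete, the sketch below trains logistic regression, one of the algorithms listed above, with plain stochastic gradient descent. It deliberately uses only the standard library so that it stays self-contained; it is not the random forest default named in the text, and in practice a library implementation would normally be used. The feature layout in the usage note and every function name are assumptions.

```python
import math
import random


def _sigmoid(z: float) -> float:
    # Clamp the score to avoid overflow in math.exp for extreme values.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(rows, labels, epochs=500, lr=0.05, seed=0):
    """Fit weights and bias by stochastic gradient descent on the log loss."""
    rng = random.Random(seed)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    order = list(range(len(rows)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = bias + sum(w * x for w, x in zip(weights, rows[i]))
            error = _sigmoid(score) - labels[i]  # gradient of the log loss w.r.t. the score
            weights = [w - lr * error * x for w, x in zip(weights, rows[i])]
            bias -= lr * error
    return weights, bias


def predict_proba(model, row) -> float:
    """Probability that a feature row belongs to the webshell class (label 1)."""
    weights, bias = model
    return _sigmoid(bias + sum(w * x for w, x in zip(weights, row)))
```

A usage sketch on made-up toy rows of the form `[entropy, param_count, has_referer]`: calling `train_logistic_regression([[5.5, 1, 0], [5.8, 1, 0], [5.2, 2, 0], [3.1, 4, 1], [2.8, 5, 1], [3.4, 3, 1]], [1, 1, 1, 0, 0, 0])` returns a model that can be passed to `predict_proba`. These numbers are illustrative only and are not results from the patent.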
The embodiments of the present invention have been described above with reference to the drawing, but the invention is not limited to the specific embodiments described, which are illustrative rather than restrictive. Inspired by the present invention, those of ordinary skill in the art can devise many further forms without departing from the purpose of the invention or the scope protected by the claims, and all of these fall within the protection of the present invention.
Claims (4)
1. A malicious script detection method based on machine learning, the method comprising the steps of:
S1. building a simulated network environment and collecting sample data of Webshell scripts;
S2. preprocessing the collected sample data, then analyzing it and extracting traffic features;
S3. building a network intrusion detection model based on the traffic features;
S4. deploying the network intrusion detection model on the server side and feeding server traffic data into it to detect network intrusion behavior on the server side;
S5. displaying the detection results in real time in the system interface and recording them in a detection log.
2. The method according to claim 1, characterized in that step S1 further includes: when building the simulated network environment, writing automation scripts on multiple servers according to the webshell types and attack behaviors, and collecting Webshell traffic with a network sniffing tool.
3. The method according to claim 1, characterized in that the traffic features in step S2 are one or more of: keywords, the number of levels in the web page path, the number of cookie key-value pairs, the structural similarity of the returned web page, POST/GET entropy, cookie key-value pair entropy, and similar features.
4. The method according to claim 1, characterized in that the network intrusion detection model in step S3 is trained with one of the machine learning algorithms AdaBoost, SVM, random forest, logistic regression, and the like.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910210330.6A CN109948339A (en) | 2019-03-20 | 2019-03-20 | A malicious script detection method based on machine learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910210330.6A CN109948339A (en) | 2019-03-20 | 2019-03-20 | A malicious script detection method based on machine learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109948339A (en) | 2019-06-28 |
Family
ID=67010379
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910210330.6A Pending CN109948339A (en) | A malicious script detection method based on machine learning | 2019-03-20 | 2019-03-20 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109948339A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110417810A (en) * | 2019-08-20 | 2019-11-05 | Xidian University | Malicious encrypted flow detection method based on enhanced model of logistic regression |
CN110417810B (en) * | 2019-08-20 | 2021-06-25 | Xidian University | Malicious encrypted flow detection method based on enhanced model of logistic regression |
CN112491882A (en) * | 2020-11-27 | 2021-03-12 | Taikang Insurance Group Co., Ltd. | Webshell detection method, webshell detection device, webshell detection medium and electronic equipment |
CN113239352A (en) * | 2021-04-06 | 2021-08-10 | Institute of Information Engineering, Chinese Academy of Sciences | Webshell detection method and system |
CN113239352B (en) * | 2021-04-06 | 2022-05-17 | Institute of Information Engineering, Chinese Academy of Sciences | Webshell detection method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20190628 |