CN107659570B - Webshell detection method and system based on machine learning and dynamic and static analysis
- Publication number: CN107659570B
- Application number: CN201710903110.2A
- Authority: CN (China)
- Prior art keywords
- dynamic
- file
- machine learning
- webshell
- static
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1425—Traffic logging, e.g. anomaly detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/02—Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- Signal Processing (AREA)
- Computer Networks & Wireless Communication (AREA)
- Computer Security & Cryptography (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Medical Informatics (AREA)
- Evolutionary Computation (AREA)
- Mathematical Physics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computer Hardware Design (AREA)
- Artificial Intelligence (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
An embodiment of the invention provides a Webshell detection method and system based on machine learning and combined dynamic and static analysis, and relates to the technical field of Webshell detection. The method comprises obtaining a sample file, extracting static features and dynamic features of the sample file, obtaining a classification model from the static features, the dynamic features and a machine learning algorithm, and analyzing a file to be detected with the classification model to obtain a detection result. Because the method combines dynamic and static analysis, the extracted features are more comprehensive. A machine learning algorithm that combines several classification algorithms learns from a large number of Webshell samples and normal webpage samples to form the classification model, which makes the model more stable and its classification more accurate. With the classification model, Webshells and their variants can be detected effectively, new Webshells can be predicted, and text obfuscation can be handled well, which overcomes the shortcomings of the feature code matching approach used in the prior art.
Description
Technical Field
The invention relates to the technical field of Webshell detection, in particular to a Webshell detection method and system based on machine learning and dynamic and static analysis.
Background
With the rapid development of internet applications and the rapid growth of internet data, server security problems have become increasingly serious. Backdoor programs that target Web applications, such as Webshells, pose a serious threat to user information and even to the entire application system, so detecting server vulnerabilities and backdoors promptly is crucial to keeping the server secure.
Because Webshells are mostly written in scripting languages, they are easy to modify and disguise. Their characteristics are not limited to feature codes but also include file operation functions, malicious execution functions, the size of file comments, the length of single-line strings, the degree of obfuscation, and so on. When a Webshell is slightly varied or its feature codes are deliberately obfuscated, traditional methods fail to report it; that is, obfuscation easily bypasses firewalls and antivirus software. Conventional Webshell detection based on feature matching therefore struggles to detect and identify Webshell variants quickly.
Therefore, how to overcome the narrowness and lag of the traditional Webshell detection approach based on feature code matching, and how to quickly detect Webshells and their variants that rely on text obfuscation, has long been a focus of attention for those skilled in the art.
Disclosure of Invention
The invention aims to provide a Webshell detection method based on machine learning and dynamic and static analysis, so as to overcome the narrowness and lag of the traditional Webshell detection approach based on feature code matching, improve the accuracy of Webshell detection, and quickly detect Webshells and their variants.
The invention also aims to provide a Webshell detection system based on machine learning and dynamic and static analysis, so as to overcome the narrowness and lag of the traditional Webshell detection approach based on feature code matching, improve the accuracy of Webshell detection, and quickly detect Webshells and their variants.
In order to achieve the above purpose, the embodiment of the present invention adopts the following technical solutions:
In a first aspect, an embodiment of the present invention provides a Webshell detection method based on machine learning and dynamic and static analysis, where the method includes: obtaining a sample file; extracting static features and dynamic features of the sample file; and obtaining a classification model according to the static features, the dynamic features and a machine learning algorithm, where the classification model analyzes the file to be detected to obtain a detection result.
Further, the step of extracting the static features and the dynamic features of the sample file comprises: performing static analysis on the sample file to obtain the static features, where the static features comprise document features, basic function features and file behavior features of the sample file; and performing dynamic analysis on the sample file to obtain the dynamic features, where the dynamic features comprise file inclusion operation features, sensitive function running features and sensitive string features.
Further, the step of obtaining a classification model according to the static features, the dynamic features and a machine learning algorithm comprises: learning the static features and the dynamic features with the machine learning algorithm to obtain the classification model.
Further, the machine learning algorithm is an ensemble learning approach that combines a plurality of classification algorithms.
Further, the Webshell detection method based on machine learning and dynamic and static analysis further includes: and when the file to be detected is determined to be Webshell after detection, performing machine learning again according to the file to be detected and the sample file to update the classification model.
In a second aspect, an embodiment of the present invention further provides a Webshell detection system based on machine learning and dynamic and static analysis, where the Webshell detection system based on machine learning and dynamic and static analysis includes a sample acquisition module, a feature extraction module, and a model establishment module. The sample acquisition module is used for acquiring a sample file; the characteristic extraction module is used for extracting static characteristics and dynamic characteristics of the sample file; the model establishing module is used for obtaining a classification model according to the static characteristics, the dynamic characteristics and a machine learning algorithm, and the classification model analyzes the file to be detected and obtains a detection result.
Further, the feature extraction module comprises a static analysis module and a dynamic analysis module. The static analysis module is used for performing static analysis on the sample file to obtain the static features, where the static features comprise document features, basic function features and file behavior features of the sample file; the dynamic analysis module is used for performing dynamic analysis on the sample file to obtain the dynamic features, where the dynamic features comprise file inclusion operation features, sensitive function running features and sensitive string features.
Further, the model establishing module is configured to learn the static features and the dynamic features by using the machine learning algorithm to obtain the classification model.
Further, the machine learning algorithm adopted by the model building module is an ensemble learning approach that combines a plurality of classification algorithms.
Further, the Webshell detection system based on machine learning and dynamic and static analysis further comprises a model updating module, wherein the model updating module is used for carrying out machine learning again according to the file to be detected and the sample file to update the classification model when the file to be detected is determined to be Webshell after being detected.
Compared with the prior art, the invention has the following beneficial effects. In the Webshell detection method and system based on machine learning and dynamic and static analysis provided by the embodiments of the invention, a sample file is obtained, its static features and dynamic features are extracted, a classification model is obtained according to the static features, the dynamic features and a machine learning algorithm, and the classification model analyzes the file to be detected and obtains the detection result. The embodiments combine dynamic and static analysis, so the extracted features are more comprehensive, and they use a machine learning algorithm that combines several classification algorithms to learn from a large number of Webshell samples and normal webpage samples, so the resulting classification model is more stable and classifies more accurately. The machine learning algorithm can handle complex classification over many features, so the features used for detection are not limited to a single feature code. With the classification model, the user can effectively detect Webshells and their variants, predict new Webshells, cope better with text obfuscation, and make up for the shortcomings of the traditional feature code matching approach.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 shows a block schematic diagram of a server provided by an embodiment of the present invention.
Fig. 2 shows a functional block diagram of a Webshell detection system based on machine learning and dynamic and static analysis according to a first embodiment of the present invention.
Fig. 3 shows a functional block diagram of the feature extraction module of fig. 2.
Fig. 4 shows a schematic flow chart of Webshell detection performed by the Webshell detection system based on machine learning and dynamic and static analysis.
Fig. 5 is a schematic flow chart of a Webshell detection method based on machine learning and dynamic and static analysis according to a second embodiment of the present invention.
Fig. 6 shows a detailed flowchart of step S202 in fig. 5.
Reference numerals: 100-server; 400-Webshell detection system based on machine learning and dynamic and static analysis; 110-memory; 120-memory controller; 130-processor; 410-sample acquisition module; 420-feature extraction module; 430-model building module; 440-model update module; 421-static analysis module; 422-dynamic analysis module.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present invention, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
The Webshell detection method and system based on machine learning and dynamic and static analysis provided by the embodiment of the invention can be applied to the server 100 shown in fig. 1. In this embodiment, the server 100 may be, but is not limited to, a web server, a database server, a cloud server, and the like. As shown in fig. 1, server 100 may include a memory 110, a memory controller 120, and a processor 130.
The memory 110, the memory controller 120, and the processor 130 are electrically connected directly or indirectly to each other to realize data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines. The Webshell detection system 400 based on machine learning and dynamic and static analysis includes at least one software function module which can be stored in the memory 110 in the form of software or firmware (firmware) or fixed in an Operating System (OS) of the server 100. The processor 130 is configured to execute executable modules stored in the memory 110, for example, software functional modules and computer programs included in the Webshell detection system 400 based on machine learning and dynamic and static analysis.
The memory 110 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The memory 110 may be used to store software programs and modules, which the processor 130 executes upon receiving execution instructions.
The processor 130 may be an integrated circuit chip with signal processing capability. The processor 130 may be a general-purpose processor, such as a Central Processing Unit (CPU) or a Network Processor (NP); it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor, or the processor 130 may be any conventional processor.
It will be appreciated that the configuration shown in fig. 1 is merely illustrative and that the server 100 may include more or fewer components than shown in fig. 1 or may have a different configuration than shown in fig. 1. The components shown in fig. 1 may be implemented in hardware, software, or a combination thereof.
First embodiment
Referring to fig. 2, a functional block diagram of a Webshell detection system 400 based on machine learning and dynamic and static analysis according to a first embodiment of the present invention is shown. The Webshell detection system 400 based on machine learning and dynamic and static analysis comprises a sample acquisition module 410, a feature extraction module 420 and a model building module 430.
The sample acquisition module 410 is used for acquiring sample files. In this embodiment, the sample files include a large number of Webshell samples and normal website samples. The Webshell samples include Trojans written in various languages, such as ASP Trojans, PHP Trojans and JSP Trojans, and can also be divided by type into one-sentence Trojans, image-embedded code, full-featured upload Trojans ("big horses") and the like. The normal website samples are CMSs written in PHP, the source code of the website to be detected, and so on, which is not limited here. Preferably, the large number of acquired sample files is stored in a database, and users can freely add Webshell samples and normal webpage code that they have collected themselves. Depending on the detection environment, providing the website's original file code as a positive sample can improve the accuracy of the model and reduce the false alarm rate.
The feature extraction module 420 is configured to extract static features and dynamic features of the sample file.
In this embodiment, the feature extraction module 420 is configured to perform dynamic and static analysis on a large number of sample files. As shown in fig. 3, the feature extraction module 420 specifically includes a static analysis module 421 and a dynamic analysis module 422.
The static analysis module 421 is configured to perform static analysis on the sample file to obtain the static features, where the static features include document features, basic function features and file behavior features of the sample file. In this embodiment, the static analysis module 421 mainly analyzes the characters in the sample file and counts the values of the sample file along multiple feature dimensions. In particular, the document features may include, but are not limited to: the number of words, the number of distinct words, the number of lines, the average number of words per line, the number of blank characters and spaces, the maximum word length, the number of comments and the like. The basic function features may include, but are not limited to: character operation functions, sensitive function calls, the number of system function calls, the number of script blocks, the maximum length of function parameters, encryption and decryption function calls and the like. The file behavior features may include, but are not limited to: file operations, FTP operations, database operations, and the like.
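The counting described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the comment patterns, the word pattern and the short list of sensitive function names are assumptions chosen for PHP-like scripts, and only a subset of the listed features is computed.

```python
import re

# Assumed patterns for PHP-like scripts; the patent does not specify them.
COMMENT_RE = re.compile(r"//[^\n]*|#[^\n]*|/\*.*?\*/", re.DOTALL)
WORD_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Hypothetical shortlist of sensitive functions, for illustration only.
SENSITIVE_CALL_RE = re.compile(
    r"\b(?:eval|assert|system|exec|shell_exec|passthru|base64_decode)\s*\("
)


def static_features(source: str) -> dict:
    """Count a subset of the document and basic function features of one script."""
    words = WORD_RE.findall(source)
    lines = source.splitlines()
    return {
        "word_count": len(words),
        "distinct_word_count": len(set(words)),
        "line_count": len(lines),
        "avg_words_per_line": len(words) / len(lines) if lines else 0.0,
        "whitespace_count": sum(ch.isspace() for ch in source),
        "max_word_length": max((len(w) for w in words), default=0),
        "comment_count": len(COMMENT_RE.findall(source)),
        "sensitive_call_count": len(SENSITIVE_CALL_RE.findall(source)),
    }


sample = "<?php\n// backdoor\neval($_POST['cmd']);\n"
features = static_features(sample)
```

A production extractor would more likely tokenize each language properly instead of relying on regular expressions, since the comment pattern here also matches `//` or `#` inside string literals.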
The dynamic analysis module 422 is configured to perform dynamic analysis on the sample file to obtain the dynamic features, where the dynamic features include file inclusion operation features, sensitive function running features and sensitive string features. In this embodiment, the dynamic analysis module 422 establishes a compiling environment or a hook extension for each programming language, monitors execution, and combines a taint tracking mechanism for external input variables with a blacklist and whitelist mechanism to detect Webshells dynamically in real time and summarize the dynamic features of the sample file. In this embodiment, the features aimed at Webshell obfuscation include: the entropy value of the text, the number of invalid characters in the text, and the sensitive strings and sensitive functions produced at run time during dynamic analysis.
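The text-level obfuscation measures can be sketched as follows. This assumes that the text measure named above is the Shannon entropy of the character distribution and that "invalid characters" means non-printable characters; the patent gives neither a formula nor a definition, so both are assumptions.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def non_printable_count(text: str) -> int:
    """Count characters that are neither printable nor common whitespace."""
    return sum(1 for ch in text if not ch.isprintable() and ch not in "\n\r\t")
```

Encoded or encrypted payloads tend to use their alphabet more uniformly than hand-written code, so a high entropy value is a plausible obfuscation signal, although it is not conclusive on its own.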
The model establishing module 430 is configured to obtain a classification model according to the static features, the dynamic features and a machine learning algorithm, where the classification model analyzes a file to be detected and obtains a detection result.
In this embodiment, a user uploads a file to be detected to a system, and the classification model can complete Webshell detection of the file to be detected, obtain a classification result, and generate a detection report for the user to check.
In this embodiment, the model building module 430 is configured to learn the static features and the dynamic features with the machine learning algorithm to obtain the classification model. Specifically, the model building module 430 first normalizes the static features and the dynamic features to obtain a set of feature vectors, then learns from that set with a machine learning algorithm to compute the classification model. Preferably, in this embodiment, the machine learning algorithm is an ensemble learning approach that combines multiple classification algorithms, which may specifically include a random forest algorithm, a decision tree algorithm, a logistic regression algorithm, and so on. Combining several classification algorithms in an ensemble improves the stability and robustness of the model, and thus the detection accuracy of the classification model.
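The two stages described above, normalization followed by a combination of several classifiers, can be sketched in plain Python. Min-max scaling and hard majority voting are assumptions, since the patent names neither; the three threshold rules are hypothetical stand-ins for trained base classifiers.

```python
from collections import Counter
from typing import Callable, List, Sequence

Vector = List[float]
Classifier = Callable[[Vector], int]  # returns 1 for Webshell, 0 for normal


def min_max_normalize(rows: Sequence[Sequence[float]]) -> List[Vector]:
    """Scale every feature column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def majority_vote(classifiers: Sequence[Classifier], vector: Vector) -> int:
    """Hard-voting ensemble: return the label chosen by most base classifiers."""
    votes = Counter(clf(vector) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Hypothetical raw features per file: [max word length, text entropy].
raw = [[10, 0.0], [200, 4.0], [50, 2.0]]
normalized = min_max_normalize(raw)

# Stand-ins for trained models; real base learners would be fitted on samples.
base_classifiers = [
    lambda v: int(v[0] > 0.5),
    lambda v: int(v[1] > 0.4),
    lambda v: int(v[0] + v[1] > 1.5),
]
predictions = [majority_vote(base_classifiers, v) for v in normalized]
```

An odd number of binary base classifiers avoids tied votes. With a library such as scikit-learn, the same idea is available as `VotingClassifier`, which could wrap the random forest, decision tree and logistic regression models mentioned above.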
It should be noted that, in this embodiment, after the classification model is built, held-out data that was not used for learning may be used to measure the false detection rate, the false alarm rate, and similar metrics of the classification model. The classification model is then adjusted according to the test results, for example by adjusting the proportion, number and type of positive and negative samples in the sample files, so as to improve its accuracy and optimize the model.
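Evaluating on held-out data can be sketched as below. The patent does not define its metrics, so this sketch assumes two common ones: the miss rate (Webshells classified as normal) and the false alarm rate (normal files classified as Webshells), with label 1 meaning Webshell.

```python
from typing import Dict, Sequence


def detection_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return miss rate and false alarm rate; label 1 = Webshell, 0 = normal."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # Webshell caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # Webshell missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal file flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal file passed
    return {
        "miss_rate": fn / (fn + tp) if fn + tp else 0.0,
        "false_alarm_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


held_out_labels = [1, 1, 1, 1, 0, 0, 0, 0]
held_out_predictions = [1, 1, 1, 0, 0, 0, 0, 1]
rates = detection_rates(held_out_labels, held_out_predictions)
```

If the false alarm rate is too high on a given site, adding more of that site's own normal code to the training samples is the kind of adjustment the paragraph above describes.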
Further, the Webshell detection system 400 based on machine learning and dynamic and static analysis further includes a model updating module 440, where the model updating module 440 is configured to perform machine learning again according to the file to be detected and the sample file to update the classification model when the file to be detected conforms to the Webshell feature.
In this embodiment, a user may perform Webshell detection on an unknown file (that is, a file to be detected) by using the Webshell detection system 400 based on machine learning and dynamic and static analysis, and when it is detected that the file to be detected is a malicious file Webshell, add the file to be detected to a malicious sample database, and perform machine learning again together with the previous sample file to optimize and update the classification model. A specific process of using the system to perform Webshell detection by a user may refer to fig. 4, and specifically includes:
Step S101, acquiring the file to be detected.
Specifically, the user connects to the system and uploads the file to be detected, and the system obtains the file to be detected through the sample obtaining module 410.
Step S102, extracting the static characteristics and the dynamic characteristics of the file to be detected.
In this embodiment, the system automatically performs dynamic and static feature extraction on the file to be detected through the feature extraction module 420.
Step S103, analyzing the file to be detected by adopting the classification model and obtaining a detection result.
Specifically, after the feature extraction of the file to be detected is completed, the established classification model is used for detecting to determine whether the file to be detected is a malicious file Webshell or not, so as to obtain a detection result, and then a detection report is formed by combining the dynamic and static features extracted by the feature extraction module 420 so as to be convenient for a user to check. For example, the content presented by the detection report may include: the probability percentage that the file to be detected is a malicious file Webshell, the extracted features (such as malicious functions, file operation behaviors, appearing blacklist characters) and the like.
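The detection report described in step S103 can be sketched as a small data structure. The field names, the 0.5 decision threshold and the feature categories are hypothetical; the patent only states that the report shows a probability percentage and the extracted features.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DetectionReport:
    file_name: str
    webshell_probability_pct: float
    is_webshell: bool
    flagged_features: Dict[str, List[str]] = field(default_factory=dict)


def build_report(file_name: str, probability: float,
                 extracted: Dict[str, List[str]],
                 threshold: float = 0.5) -> DetectionReport:
    """Combine the model's probability with the extracted features for display."""
    return DetectionReport(
        file_name=file_name,
        webshell_probability_pct=round(probability * 100, 1),
        is_webshell=probability >= threshold,
        # Keep only feature categories that actually produced a finding.
        flagged_features={k: v for k, v in extracted.items() if v},
    )


report = build_report(
    "upload.php",
    0.93,
    {
        "malicious_functions": ["eval"],
        "file_operations": [],
        "blacklist_strings": ["cmd"],
    },
)
```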
Second embodiment
Fig. 5 is a schematic flow chart of a Webshell detection method based on machine learning and dynamic and static analysis according to a second embodiment of the present invention. It should be noted that the method is not limited by fig. 5 or by the specific sequence described below. Its basic principle and technical effect are the same as those of the first embodiment; for brevity, where this embodiment does not mention a detail, reference may be made to the corresponding content of the first embodiment. It should be understood that, in other embodiments, the order of some steps of the method may be interchanged according to actual needs, and some steps may be omitted or deleted. The specific flow shown in fig. 5 is described in detail below.
Step S201, acquiring a sample file.
It is understood that the step S201 may be performed by the sample acquiring module 410 described above.
Step S202, extracting static characteristics and dynamic characteristics of the sample file.
It is understood that this step S202 may be performed by the feature extraction module 420 described above.
As shown in fig. 6, in this embodiment, the step S202 specifically includes the following sub-steps:
Substep S2021, performing static analysis on the sample file to obtain the static characteristics, wherein the static characteristics include document characteristics, basic function characteristics and file behavior characteristics of the sample file.
It is understood that this step S2021 may be performed by the static analysis module 421 described above.
Substep S2022, performing dynamic analysis on the sample file to obtain the dynamic characteristics, wherein the dynamic characteristics comprise file inclusion operation characteristics, sensitive function running characteristics and sensitive character string characteristics.
It is understood that this step S2022 may be performed by the dynamic analysis module 422 described above.
In this embodiment, the order of the sub-steps S2021 and S2022 is not limited, and may be executed simultaneously.
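Since the two substeps are independent, they can be run side by side. The sketch below uses a thread pool to do so; both analysis functions are hypothetical placeholders, and a real dynamic analysis would execute the script in an instrumented sandbox instead of inspecting its text.

```python
from concurrent.futures import ThreadPoolExecutor


def static_analysis(source: str) -> dict:
    """Placeholder for substep S2021 (one hypothetical static feature)."""
    return {"line_count": len(source.splitlines())}


def dynamic_analysis(source: str) -> dict:
    """Placeholder for substep S2022; stands in for sandboxed execution."""
    return {"sensitive_string_seen": "cmd" in source}


def extract_features(source: str) -> dict:
    """Run both substeps concurrently and merge their feature dictionaries."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        static_future = pool.submit(static_analysis, source)
        dynamic_future = pool.submit(dynamic_analysis, source)
        return {**static_future.result(), **dynamic_future.result()}


merged = extract_features("<?php\neval($_POST['cmd']);\n")
```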
Step S203, a classification model is obtained according to the static characteristics, the dynamic characteristics and a machine learning algorithm, and the classification model analyzes the file to be detected and obtains a detection result.
It is understood that this step S203 can be performed by the model building module 430 described above.
Step S204, when the file to be detected is determined to be Webshell after detection, performing machine learning again according to the file to be detected and the sample file to update the classification model.
It is understood that this step S204 may be performed by the model update module 440 described above.
In summary, in the Webshell detection method and system based on machine learning and dynamic and static analysis provided by the embodiments of the present invention, a sample file is obtained, static analysis and dynamic analysis are performed on it to extract its static features and dynamic features respectively, a machine learning algorithm learns from the static features and the dynamic features to obtain a classification model, and the classification model analyzes a file to be detected and obtains a detection result. Further, when the file to be detected is determined to be a Webshell, it is added to the sample database, and machine learning is performed again together with the previous sample files to update the classification model. The embodiments combine dynamic and static analysis, so the extracted features are more comprehensive, and they use a machine learning algorithm that combines several classification algorithms to learn from a large number of Webshell samples and normal webpage samples, so the resulting classification model is more stable and classifies more accurately. The machine learning algorithm can handle complex classification over many features, so the features used for detection are not limited to a single feature code. With the classification model, the user can effectively detect Webshells and their variants, predict new Webshells, cope better with text obfuscation, and make up for the shortcomings of the traditional feature code matching approach.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention; those skilled in the art may make various modifications and changes to it. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention. It should be noted that like reference numbers and letters refer to like items in the figures; thus, once an item is defined in one figure, it need not be further defined or explained in subsequent figures.
Claims (8)
1. A Webshell detection method based on machine learning and dynamic and static analysis, characterized by comprising the following steps:
obtaining a sample file;
analyzing the characters in the sample file to obtain static features of the sample file, wherein the static features comprise document features, basic function features and file behavior features of the sample file; the document features comprise the number of words, the number of distinct words, the number of lines, the average number of words per line, the number of empty characters and spaces, the maximum word length and the number of comments; the basic function features comprise character operation functions, sensitive function calls, the number of system function calls, the number of script blocks, the maximum function parameter length and encryption and decryption function calls; and the file behavior features comprise file operations, FTP operations and database operations;
establishing a compilation environment or a hook extension for each of the different programming languages, performing real-time dynamic detection of Webshells by monitoring in combination with a mark-tracking mechanism for external input variables and a blacklist and whitelist mechanism, and summarizing the dynamic features of the sample file, wherein the dynamic features comprise file inclusion operation features, sensitive function operation features and sensitive string features;
and obtaining a classification model according to the static features, the dynamic features and a machine learning algorithm, the classification model analyzing a file to be detected to obtain a detection result.
2. The Webshell detection method based on machine learning and dynamic and static analysis of claim 1, wherein the step of obtaining a classification model based on the static features, the dynamic features and a machine learning algorithm comprises:
learning from the static features and the dynamic features using the machine learning algorithm to obtain the classification model.
3. The Webshell detection method based on machine learning and dynamic and static analysis of claim 1, wherein the machine learning algorithm is an ensemble learning approach that combines multiple classification algorithms.
4. The Webshell detection method based on machine learning and dynamic and static analysis of claim 1, wherein the method further comprises:
when the file to be detected is determined to be a Webshell after detection, performing machine learning again on the file to be detected together with the sample file to update the classification model.
5. A Webshell detection system based on machine learning and dynamic and static analysis, characterized by comprising:
a sample acquisition module, configured to obtain a sample file;
a feature extraction module, configured to extract static features and dynamic features of the sample file, the feature extraction module comprising:
a static analysis module, configured to analyze the characters in the sample file to obtain the static features of the sample file, wherein the static features comprise document features, basic function features and file behavior features of the sample file; the document features comprise the number of words, the number of distinct words, the number of lines, the average number of words per line, the number of empty characters and spaces, the maximum word length and the number of comments; the basic function features comprise character operation functions, sensitive function calls, the number of system function calls, the number of script blocks, the maximum function parameter length and encryption and decryption function calls; and the file behavior features comprise file operations, FTP operations and database operations; and
a dynamic analysis module, configured to establish a compilation environment or a hook extension for each of the different programming languages, perform real-time dynamic detection of Webshells by monitoring in combination with a mark-tracking mechanism for external input variables and a blacklist and whitelist mechanism, and summarize the dynamic features of the sample file, wherein the dynamic features comprise file inclusion operation features, sensitive function operation features and sensitive string features;
and a model establishing module, configured to obtain a classification model according to the static features, the dynamic features and a machine learning algorithm, the classification model analyzing a file to be detected to obtain a detection result.
6. The Webshell detection system based on machine learning and dynamic and static analysis of claim 5, wherein the model building module is configured to learn the static features and the dynamic features using the machine learning algorithm to obtain the classification model.
7. The Webshell detection system based on machine learning and dynamic and static analysis of claim 5, wherein the machine learning algorithm used by the model establishing module is an ensemble learning approach that combines multiple classification algorithms.
8. The Webshell detection system based on machine learning and dynamic and static analysis of claim 5, wherein the system further comprises:
a model updating module, configured to perform machine learning again on the file to be detected together with the sample file to update the classification model when the file to be detected is determined to be a Webshell after detection.
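Claims 3, 4, 7 and 8 describe a learning scheme that combines multiple classification algorithms and retrains the model whenever a file is detected as a Webshell. The following Python sketch shows one way such a scheme could look, under stated assumptions: the three toy learners (nearest centroid, 1-nearest-neighbour, decision stump), majority voting, and all names such as `VotingDetector` and `detect_and_update` are hypothetical choices for illustration. The patent does not name the specific classification algorithms or the combination rule.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# (feature vector, label), where label 1 = Webshell and 0 = normal page.
Sample = Tuple[Sequence[float], int]
Classifier = Callable[[Sequence[float]], int]


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def train_centroid(samples: List[Sample]) -> Classifier:
    """Nearest-centroid learner: predict the class with the closest mean."""
    centroids = {}
    for label in {y for _, y in samples}:
        rows = [x for x, y in samples if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return lambda x: min(centroids, key=lambda c: _dist(x, centroids[c]))


def train_1nn(samples: List[Sample]) -> Classifier:
    """1-nearest-neighbour learner: copy the label of the closest sample."""
    return lambda x: min(samples, key=lambda s: _dist(x, s[0]))[1]


def train_stump(samples: List[Sample]) -> Classifier:
    """Decision stump: best single-feature threshold by training accuracy."""
    best = (-1, 0, 0.0, 1)  # (correct, feature index, threshold, label if >=)
    for j in range(len(samples[0][0])):
        for x, _ in samples:
            for above in (0, 1):
                correct = sum(
                    (above if xi[j] >= x[j] else 1 - above) == yi
                    for xi, yi in samples
                )
                if correct > best[0]:
                    best = (correct, j, x[j], above)
    _, j, t, above = best
    return lambda x: above if x[j] >= t else 1 - above


class VotingDetector:
    """Majority vote over several learners, retrained on detected Webshells."""

    def __init__(self, samples: List[Sample]):
        self.samples = list(samples)
        self._fit()

    def _fit(self) -> None:
        trainers = (train_centroid, train_1nn, train_stump)
        self.models = [train(self.samples) for train in trainers]

    def predict(self, x: Sequence[float]) -> int:
        votes = Counter(model(x) for model in self.models)
        return votes.most_common(1)[0][0]

    def detect_and_update(self, x: Sequence[float]) -> int:
        """Classify x; if it is a Webshell, add it to the samples and retrain."""
        label = self.predict(x)
        if label == 1:
            self.samples.append((x, 1))
            self._fit()
        return label


if __name__ == "__main__":
    # Made-up two-feature vectors, e.g. (sensitive call count, max parameter length).
    training = [([5, 200], 1), ([4, 180], 1), ([6, 220], 1),
                ([0, 20], 0), ([1, 30], 0), ([0, 25], 0)]
    detector = VotingDetector(training)
    print(detector.detect_and_update([5, 190]), len(detector.samples))
```

An odd number of binary learners keeps the vote free of ties. Note that retraining on the model's own positive predictions can reinforce a false positive, so a practical system would likely confirm a detection before adding it to the sample database.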
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710903110.2A CN107659570B (en) | 2017-09-29 | 2017-09-29 | Webshell detection method and system based on machine learning and dynamic and static analysis |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107659570A (en) | 2018-02-02 |
CN107659570B (en) | 2020-09-15 |
Family
ID=61116698
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710903110.2A Active CN107659570B (en) | 2017-09-29 | 2017-09-29 | Webshell detection method and system based on machine learning and dynamic and static analysis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107659570B (en) |
Families Citing this family (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108334781B (en) * | 2018-03-07 | 2020-04-14 | 腾讯科技(深圳)有限公司 | Virus detection method, device, computer readable storage medium and computer equipment |
CN110198291B (en) * | 2018-03-15 | 2022-02-18 | 腾讯科技(深圳)有限公司 | Webpage backdoor detection method, device, terminal and storage medium |
CN108446561A (en) * | 2018-03-21 | 2018-08-24 | 河北师范大学 | Malicious code behavior feature extraction method |
CN108804921A (en) * | 2018-05-29 | 2018-11-13 | 中国科学院信息工程研究所 | De-obfuscation method and device for PowerShell code |
CN110619211A (en) * | 2018-06-20 | 2019-12-27 | 深信服科技股份有限公司 | Malicious software identification method, system and related device based on dynamic characteristics |
CN108985061B (en) * | 2018-07-05 | 2021-10-01 | 北京大学 | Webshell detection method based on model fusion |
CN109598124A (en) * | 2018-12-11 | 2019-04-09 | 厦门服云信息科技有限公司 | Webshell detection method and device |
CN109600382B (en) * | 2018-12-19 | 2021-07-13 | 北京知道创宇信息技术股份有限公司 | Webshell detection method and device and HMM model training method and device |
CN109933977A (en) * | 2019-03-12 | 2019-06-25 | 北京神州绿盟信息安全科技股份有限公司 | Method and device for detecting webshell data |
CN110086788A (en) * | 2019-04-17 | 2019-08-02 | 杭州安恒信息技术股份有限公司 | Deep learning WebShell protection method based on cloud WAF |
CN110210225A (en) * | 2019-05-27 | 2019-09-06 | 四川大学 | Intelligent Docker container malicious file detection method and device |
CN110750789B (en) * | 2019-10-18 | 2021-07-20 | 杭州奇盾信息技术有限公司 | De-obfuscation method, de-obfuscation device, computer apparatus, and storage medium |
CN111163095B (en) * | 2019-12-31 | 2022-08-30 | 奇安信科技集团股份有限公司 | Network attack analysis method, network attack analysis device, computing device, and medium |
CN113111346A (en) * | 2020-01-13 | 2021-07-13 | 深信服科技股份有限公司 | Multi-engine WebShell script file detection method and system |
CN113110986A (en) * | 2020-01-13 | 2021-07-13 | 深信服科技股份有限公司 | WebShell script file detection method and system |
CN111385295B (en) * | 2020-03-04 | 2022-11-22 | 深信服科技股份有限公司 | WebShell detection method, device, equipment and storage medium |
CN111931187A (en) * | 2020-08-13 | 2020-11-13 | 深信服科技股份有限公司 | Component vulnerability detection method, device, equipment and readable storage medium |
CN112597498A (en) * | 2020-12-29 | 2021-04-02 | 天津睿邦安通技术有限公司 | Webshell detection method, system and device and readable storage medium |
CN112883373A (en) * | 2020-12-30 | 2021-06-01 | 国药集团基因科技有限公司 | PHP type WebShell detection method and detection system thereof |
CN112926054B (en) * | 2021-02-22 | 2023-10-03 | 亚信科技(成都)有限公司 | Malicious file detection method, device, equipment and storage medium |
CN112948834A (en) * | 2021-03-25 | 2021-06-11 | 国药(武汉)医学实验室有限公司 | Deep ensemble learning model construction method for malicious WebShell detection |
CN113239352B (en) * | 2021-04-06 | 2022-05-17 | 中国科学院信息工程研究所 | Webshell detection method and system |
CN116991978B (en) * | 2023-09-26 | 2024-01-02 | 杭州今元标矩科技有限公司 | CMS (content management system) fragment feature extraction method, system, electronic equipment and storage medium |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102663296B (en) * | 2012-03-31 | 2015-01-07 | 杭州安恒信息技术有限公司 | Intelligent detection method for Java script malicious code facing to the webpage |
CN102779249B (en) * | 2012-06-28 | 2015-07-29 | 北京奇虎科技有限公司 | Malware detection methods and scanning engine |
CN103532949B (en) * | 2013-10-14 | 2017-06-09 | 刘胜利 | Self adaptation wooden horse communication behavior detection method based on dynamical feedback |
CN107169351A (en) * | 2017-05-11 | 2017-09-15 | 北京理工大学 | Android unknown malware detection method combining dynamic behavior features |
- 2017-09-29: CN application CN201710903110.2A, patent CN107659570B, status: Active
Also Published As
Publication number | Publication date |
---|---|
CN107659570A (en) | 2018-02-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | Address after: 310000 No. 188 Lianhui Street, Xixing Street, Binjiang District, Hangzhou City, Zhejiang Province; Applicant after: Hangzhou Anheng Information Technology Co.,Ltd.; Address before: Zhejiang Zhongcai Building No. 68 Binjiang District road Hangzhou City, Zhejiang Province, the 310051 and 15 layer; Applicant before: DBAPPSECURITY Co.,Ltd. |
GR01 | Patent grant | ||