CN107659570A - Webshell detection method and system based on machine learning and static and dynamic analysis
- Publication number
- CN107659570A (application CN201710903110.2A)
- Authority
- CN
- China
- Prior art keywords
- static
- file
- machine learning
- webshell
- dynamic analysis
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1425—Traffic logging, e.g. anomaly detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/02—Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Signal Processing (AREA)
- Data Mining & Analysis (AREA)
- Computer Security & Cryptography (AREA)
- Computer Networks & Wireless Communication (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Computation (AREA)
- Computer Hardware Design (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
An embodiment of the present invention provides a Webshell detection method and system based on machine learning and static and dynamic analysis, and relates to the technical field of Webshell detection. The method obtains sample files, extracts static features and dynamic features from the sample files, and obtains a classification model from the static features, the dynamic features and a machine learning algorithm; the classification model then analyzes a file under test and produces a detection result. The invention combines dynamic and static analysis so that feature extraction is more comprehensive, and uses a machine learning algorithm that combines several classification algorithms to learn from a large number of Webshell samples and normal web page samples, which makes the classification model more stable and its classification more accurate. With this classification model, Webshells and their variants can be detected effectively, new Webshells can be predicted, text obfuscation can be handled better, and the shortcomings of conventional signature-matching detection are remedied.
Description
Technical field
The present invention relates to the technical field of Webshell detection, and in particular to a Webshell detection method and system based on machine learning and static and dynamic analysis.
Background
With the rapid development of Internet applications and the rapid growth of Internet data, server security problems are becoming increasingly severe. Webshells, a class of backdoor programs based on Web applications, pose a serious threat to user information and even to entire application systems. Detecting vulnerabilities and backdoors on servers in time is therefore essential to keeping servers secure.
Because Webshells are mostly written in scripting languages, they are easy to modify and transform. Their features are not limited to signatures; they also include file operation functions, malicious execution functions, file comment size, the length of individual strings in a file, degree of obfuscation, and so on. When a Webshell is slightly altered or its signature is deliberately obfuscated, conventional methods miss it; that is, obfuscation makes it easy to bypass firewalls and antivirus software. Current Webshell detection methods based on feature matching therefore struggle to detect and identify Webshell variants quickly.
How to overcome the narrowness and lag of conventional signature-matching Webshell detection, cope with text obfuscation of Webshells, and detect Webshells and their variants quickly has therefore long been a focus of those skilled in the art.
Summary of the invention
An object of the present invention is to provide a Webshell detection method based on machine learning and static and dynamic analysis that overcomes the narrowness and lag of conventional signature-matching Webshell detection, improves the accuracy of Webshell detection, and detects Webshells and their variants quickly.
A further object of the present invention is to provide a Webshell detection system based on machine learning and static and dynamic analysis that overcomes the narrowness and lag of conventional signature-matching Webshell detection, improves the accuracy of Webshell detection, and detects Webshells and their variants quickly.
To achieve these objects, the embodiments of the present invention adopt the following technical solutions:
In a first aspect, an embodiment of the present invention provides a Webshell detection method based on machine learning and static and dynamic analysis. The method includes: obtaining sample files; extracting static features and dynamic features of the sample files; and obtaining a classification model from the static features, the dynamic features and a machine learning algorithm, the classification model analyzing a file under test and producing a detection result.
Further, the step of extracting the static features and dynamic features of the sample files includes: performing static analysis on the sample files to obtain the static features, where the static features include document features, basic function features and file behavior features of the sample files; and performing dynamic analysis on the sample files to obtain the dynamic features, where the dynamic features include file-inclusion operation features, sensitive-function execution features and sensitive string features.
Further, the step of obtaining the classification model from the static features, the dynamic features and the machine learning algorithm includes: learning from the static features and the dynamic features with the machine learning algorithm to obtain the classification model.
Further, the machine learning algorithm is an ensemble learning approach that combines several classification algorithms.
Further, the Webshell detection method based on machine learning and static and dynamic analysis also includes: when the file under test is confirmed to be a Webshell after detection, performing machine learning again on the file under test together with the sample files to update the classification model.
In a second aspect, an embodiment of the present invention also provides a Webshell detection system based on machine learning and static and dynamic analysis. The system includes a sample acquisition module, a feature extraction module and a model building module. The sample acquisition module is configured to obtain sample files; the feature extraction module is configured to extract static features and dynamic features of the sample files; the model building module is configured to obtain a classification model from the static features, the dynamic features and a machine learning algorithm, the classification model analyzing a file under test and producing a detection result.
Further, the feature extraction module includes a static analysis module and a dynamic analysis module. The static analysis module is configured to perform static analysis on the sample files to obtain the static features, where the static features include document features, basic function features and file behavior features of the sample files. The dynamic analysis module is configured to perform dynamic analysis on the sample files to obtain the dynamic features, where the dynamic features include file-inclusion operation features, sensitive-function execution features and sensitive string features.
Further, the model building module is configured to learn from the static features and the dynamic features with the machine learning algorithm to obtain the classification model.
Further, the machine learning algorithm used by the model building module is an ensemble learning approach that combines several classification algorithms.
Further, the Webshell detection system based on machine learning and static and dynamic analysis also includes a model update module. The model update module is configured to, when the file under test is confirmed to be a Webshell after detection, perform machine learning again on the file under test together with the sample files to update the classification model.
Compared with the prior art, the present invention has the following advantages. The Webshell detection method and system based on machine learning and static and dynamic analysis provided by the embodiments of the present invention obtain sample files, extract static features and dynamic features of the sample files, and obtain a classification model from the static features, the dynamic features and a machine learning algorithm; the classification model analyzes a file under test and produces a detection result. The embodiments combine dynamic and static analysis so that feature extraction is more comprehensive, and use a machine learning algorithm that combines several classification algorithms to learn from a large number of Webshell samples and normal web page samples, which makes the classification model more stable and its classification more accurate. A machine learning algorithm can handle complex classification over many features, so the features used for detection are no longer limited to a single signature. With this classification model, a user can effectively detect Webshells and their variants, predict new Webshells, handle text obfuscation better, and remedy the shortcomings of conventional signature-matching detection.
To make the above objects, features and advantages of the present invention clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present invention and should therefore not be regarded as limiting its scope; those of ordinary skill in the art can derive other related drawings from them without creative effort.
Fig. 1 is a block diagram of the server provided by an embodiment of the present invention.
Fig. 2 is a functional block diagram of the Webshell detection system based on machine learning and static and dynamic analysis provided by the first embodiment of the present invention.
Fig. 3 is a functional block diagram of the feature extraction module in Fig. 2.
Fig. 4 is a schematic flowchart of Webshell detection performed by the Webshell detection system based on machine learning and static and dynamic analysis.
Fig. 5 is a schematic flowchart of the Webshell detection method based on machine learning and static and dynamic analysis provided by the second embodiment of the present invention.
Fig. 6 is a detailed schematic flowchart of step S202 in Fig. 5.
Reference numerals: 100 - server; 400 - Webshell detection system based on machine learning and static and dynamic analysis; 110 - memory; 120 - memory controller; 130 - processor; 410 - sample acquisition module; 420 - feature extraction module; 430 - model building module; 440 - model update module; 421 - static analysis module; 422 - dynamic analysis module.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. The components of the embodiments, as generally described and illustrated in the drawings, can be arranged and designed in a variety of configurations. The following detailed description of the embodiments provided in the drawings is therefore not intended to limit the scope of the claimed invention, but merely represents selected embodiments. All other embodiments obtained by those skilled in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; once an item has been defined in one drawing, it need not be defined or explained again in subsequent drawings. In the description of the present invention, the terms "first", "second" and so on are used only to distinguish descriptions and are not to be understood as indicating or implying relative importance.
The Webshell detection method and system based on machine learning and static and dynamic analysis provided by the embodiments of the present invention can be applied to the server 100 shown in Fig. 1. In this embodiment, the server 100 may be, but is not limited to, a web server, a database server, a cloud server, and so on. As shown in Fig. 1, the server 100 may include a memory 110, a memory controller 120 and a processor 130.
The memory 110, the memory controller 120 and the processor 130 are electrically connected to one another, directly or indirectly, to transmit or exchange data. For example, these elements may be electrically connected through one or more communication buses or signal lines. The Webshell detection system 400 based on machine learning and static and dynamic analysis includes at least one software function module that can be stored in the memory 110 in the form of software or firmware, or built into the operating system (OS) of the server 100. The processor 130 executes the executable modules stored in the memory 110, for example the software function modules and computer programs included in the Webshell detection system 400.
The memory 110 may be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and so on. The memory 110 can store software programs and modules, and the processor 130 executes a program after receiving an execution instruction.
The processor 130 may be an integrated circuit chip with signal processing capability. It may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP) and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor, or the processor 130 may be any conventional processor.
It should be understood that the structure shown in Fig. 1 is only illustrative. The server 100 may include more or fewer components than shown in Fig. 1, or have a configuration different from that shown in Fig. 1. Each component shown in Fig. 1 may be implemented in hardware, software or a combination of the two.
First embodiment
Fig. 2 is a functional block diagram of the Webshell detection system 400 based on machine learning and static and dynamic analysis provided by the first embodiment of the present invention. The Webshell detection system 400 includes a sample acquisition module 410, a feature extraction module 420 and a model building module 430.
The sample acquisition module 410 is configured to obtain sample files. In this embodiment, the sample files include a large number of Webshell samples and normal website samples. The Webshell samples include trojans written in various languages, such as ASP trojans, PHP trojans and JSP trojans; by kind they can also be divided into one-line trojans, image trojans, full-featured upload trojans ("big horses"), and so on. The normal website samples may be various CMSs written in PHP, or the source code of the website to be checked; no limitation is imposed here. Preferably, the large number of sample files obtained is stored in a database, and users are free to add Webshell samples and normal web page code that they have collected themselves. Depending on the detection environment, providing the website's original file code as positive samples usually improves the model's accuracy and reduces the false-positive rate.
The feature extraction module 420 is configured to extract the static features and dynamic features of the sample files.
In this embodiment, the feature extraction module 420 performs static and dynamic analysis on a large number of sample files. As shown in Fig. 3, the feature extraction module 420 specifically includes a static analysis module 421 and a dynamic analysis module 422.
The static analysis module 421 is configured to perform static analysis on the sample files to obtain the static features, where the static features include document features, basic function features and file behavior features of the sample files. In this embodiment, the static analysis module 421 mainly analyzes the characters in a sample file and computes the file's values along multiple feature dimensions. Specifically, the document features may include, but are not limited to: word count, distinct word count, line count, average words per line, number of null characters and spaces, maximum word length, number of comments, and so on. The basic function features may include, but are not limited to: character manipulation functions, sensitive function calls, number of system function calls, number of script blocks, maximum function parameter length, encryption and decryption function calls, and so on. The file behavior features may include, but are not limited to: file operations, FTP operations, database operations, and so on.
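By way of a non-limiting sketch, a few of the document features and basic function features listed above could be computed as follows. The function names, the regular expressions and the two function lists (`SENSITIVE_FUNCTIONS`, `CODEC_FUNCTIONS`) are illustrative assumptions; the patent does not enumerate which functions count as sensitive, and the comment pattern is deliberately naive.

```python
import re

# Hypothetical list of "sensitive" PHP functions; not taken from the patent.
SENSITIVE_FUNCTIONS = ("eval", "assert", "system", "exec", "passthru", "shell_exec")
# Hypothetical list of encoding/decoding functions often seen in obfuscated code.
CODEC_FUNCTIONS = ("base64_decode", "gzinflate", "str_rot13")


def count_calls(source, names):
    """Count call sites of the form `name(` for every name, ignoring case."""
    total = 0
    for name in names:
        # The lookbehind stops "exec" from matching inside "shell_exec".
        pattern = r"(?<![A-Za-z0-9_])" + re.escape(name) + r"\s*\("
        total += len(re.findall(pattern, source, flags=re.IGNORECASE))
    return total


def static_features(source):
    """Return a dictionary of simple static features for one script file."""
    lines = source.splitlines() or [""]
    words = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source)
    # Naive comment matcher: //..., #... and /* ... */ (it also fires inside strings).
    comments = re.findall(r"//[^\n]*|#[^\n]*|/\*.*?\*/", source, flags=re.DOTALL)
    return {
        "word_count": len(words),
        "distinct_word_count": len(set(words)),
        "line_count": len(lines),
        "avg_words_per_line": len(words) / len(lines),
        "max_word_length": max((len(w) for w in words), default=0),
        "space_count": source.count(" "),
        "comment_count": len(comments),
        "sensitive_call_count": count_calls(source, SENSITIVE_FUNCTIONS),
        "codec_call_count": count_calls(source, CODEC_FUNCTIONS),
    }
```

Each returned value corresponds to one feature dimension; a real implementation would add the remaining dimensions (script blocks, parameter lengths, file, FTP and database operations) in the same way.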
The dynamic analysis module 422 is configured to perform dynamic analysis on the sample files to obtain the dynamic features, where the dynamic features include file-inclusion operation features, sensitive-function execution features and sensitive string features. In this embodiment, the dynamic analysis module 422 mainly sets up a compilation environment or hook extension for each programming language, monitors execution, and combines tag (taint) tracking of external input variables with a blacklist/whitelist mechanism to detect Webshells dynamically in real time and summarize the dynamic features of the sample file. In this embodiment, the features used to counter Webshell obfuscation include: the text entropy, the number of invalid characters in the text, and the sensitive strings and sensitive functions produced when the file runs under dynamic analysis.
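As an illustration of the text entropy feature, the sketch below computes the Shannon entropy of a file's character distribution. Choosing character-level Shannon entropy is an assumption; the patent names the entropy value as a feature without fixing how it is calculated.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    # Sum of -p * log2(p) over every distinct character.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Encoded or packed payloads tend to use a wider, flatter range of characters than hand-written script code, so an unusually high value can serve as one signal of obfuscation among the other features.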
The model building module 430 is configured to obtain a classification model from the static features, the dynamic features and a machine learning algorithm; the classification model analyzes a file under test and produces a detection result.
In this embodiment, a user uploads a file under test to the system; the classification model then performs Webshell detection on it, produces a classification result, and generates a detection report for the user to view.
In this embodiment, the model building module 430 learns from the static features and the dynamic features with the machine learning algorithm to obtain the classification model. Specifically, the model building module 430 first normalizes the static features and the dynamic features to obtain a set of feature vectors, then learns from the feature vector set with the machine learning algorithm and computes the classification model. Preferably, in this embodiment, the machine learning algorithm is an ensemble learning approach that combines several classification algorithms, which may specifically include a random forest algorithm, a decision tree algorithm, a logistic regression algorithm, and so on. Ensemble learning that combines several classification algorithms improves the stability and robustness of the model and thus the detection accuracy of the classification model.
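The two stages described above, normalization followed by an ensemble of classifiers, can be sketched as follows. This is a minimal illustration under stated assumptions: min-max scaling and equal-weight majority voting are chosen here for simplicity, since the patent does not specify the normalization or how the algorithms are combined, and the three threshold rules merely stand in for trained random forest, decision tree and logistic models.

```python
from collections import Counter


def min_max_normalize(rows):
    """Scale every feature column to [0, 1]; constant columns become 0.0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def majority_vote(classifiers, vector):
    """Return the label chosen by most classifiers (1 = Webshell, 0 = normal)."""
    votes = Counter(clf(vector) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Stand-in "classifiers": fixed threshold rules, NOT trained models.
classifiers = [
    lambda v: 1 if v[0] > 0.5 else 0,
    lambda v: 1 if v[1] > 0.5 else 0,
    lambda v: 1 if v[0] + v[1] > 1.0 else 0,
]
```

Using an odd number of voters avoids ties in a two-class problem, and a file is flagged only when most of the individual models agree, which is one way an ensemble can be more stable than any single classifier.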
It should be noted that, in this embodiment, after the classification model has been built, a portion of data that was not used for learning can also be used to test the classification model and measure its error rate, false-positive rate, false-negative rate and so on. The classification model is then adjusted according to the test data, for example by adjusting the ratio, number and types of positive and negative samples in the sample files, which improves the accuracy of the classification model and optimizes it.
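The test on held-out data can be illustrated with the following sketch, which derives the three rates from true and predicted labels. Reading the error detection rate as the overall misclassification rate is an assumption, and the function name is hypothetical.

```python
def detection_rates(y_true, y_pred):
    """Compute test metrics; labels are 1 for Webshell and 0 for a normal file."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal file flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # Webshell missed
    return {
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "false_negative_rate": fn / (fn + tp) if fn + tp else 0.0,
        "error_rate": (fp + fn) / len(pairs) if pairs else 0.0,
    }
```

A high false-negative rate would suggest adding more Webshell samples or sample types, while a high false-positive rate would suggest adding more normal samples from the target environment.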
Further, the Webshell detection system 400 based on machine learning and static and dynamic analysis also includes a model update module 440. The model update module 440 is configured to, when the file under test matches Webshell features, perform machine learning again on the file under test together with the sample files to update the classification model.
In this embodiment, a user can use the Webshell detection system 400 based on machine learning and static and dynamic analysis to perform Webshell detection on an unknown file (that is, a file under test). When the file under test is detected to be a malicious Webshell file, it is added to the malicious sample database and machine learning is performed again together with the earlier sample files, which optimizes and updates the classification model. The specific flow by which a user performs Webshell detection with the system is shown in Fig. 4 and includes:
Step S101: obtain the file under test.
Specifically, the user connects to the system and uploads the file under test, and the system receives the file through the sample acquisition module 410.
Step S102: extract the static features and dynamic features of the file under test.
In this embodiment, the system automatically extracts the static and dynamic features of the file under test through the feature extraction module 420.
Step S103: analyze the file under test with the classification model and obtain a detection result.
Specifically, after feature extraction of the file under test is complete, the file is checked by the established classification model to determine whether it is a malicious Webshell file, and a detection result is obtained. The result is then combined with the dynamic and static features extracted by the feature extraction module 420 to form a detection report for the user to view. For example, the detection report may show: the percentage likelihood that the file under test is a malicious Webshell file, and the extracted features (such as malicious functions, file operation behavior and blacklisted characters that appear).
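The detection report of step S103 could take a shape like the one sketched below. The field names, the 0.5 decision threshold and the rule of listing only non-zero features are all illustrative assumptions; the patent only states that the report may show a likelihood percentage and the extracted features.

```python
def build_report(filename, webshell_probability, features, threshold=0.5):
    """Assemble a detection report from a model score and extracted features."""
    # Keep only the features that actually fired (non-zero, non-empty values).
    flagged = {name: value for name, value in features.items() if value}
    return {
        "file": filename,
        "webshell_probability_percent": round(webshell_probability * 100, 1),
        "is_webshell": webshell_probability >= threshold,
        "flagged_features": flagged,
    }
```

For example, `build_report("a.php", 0.934, {"sensitive_call_count": 2, "ftp_operations": 0})` marks the file as a Webshell and lists only `sensitive_call_count` among the flagged features.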
Second embodiment
Fig. 5 is a schematic flowchart of the Webshell detection method based on machine learning and static and dynamic analysis provided by the second embodiment of the present invention. It should be noted that the method described in this embodiment is not limited by Fig. 5 or by the specific order described below. Its basic principle and technical effects are the same as those of the first embodiment; for brevity, where this embodiment is silent, refer to the corresponding content of the first embodiment. It should be understood that, in other embodiments, the order of some steps of the method may be exchanged as actually needed, and some steps may be omitted or deleted. The specific flow shown in Fig. 5 is described in detail below.
Step S201: obtain sample files.
It should be understood that step S201 may be performed by the sample acquisition module 410 described above.
Step S202: extract the static features and dynamic features of the sample files.
It should be understood that step S202 may be performed by the feature extraction module 420 described above.
As shown in Fig. 6, in this embodiment, step S202 specifically includes the following sub-steps:
Sub-step S2021: perform static analysis on the sample files to obtain the static features, where the static features include document features, basic function features and file behavior features of the sample files.
It should be understood that sub-step S2021 may be performed by the static analysis module 421 described above.
Sub-step S2022: perform dynamic analysis on the sample files to obtain the dynamic features, where the dynamic features include file-inclusion operation features, sensitive-function execution features and sensitive string features.
It should be understood that sub-step S2022 may be performed by the dynamic analysis module 422 described above.
It should be noted that, in this embodiment, the order of sub-steps S2021 and S2022 is not limited; they may also be performed simultaneously.
Step S203: obtain a classification model from the static features, the dynamic features and a machine learning algorithm; the classification model analyzes a file under test and produces a detection result.
It should be understood that step S203 may be performed by the model building module 430 described above.
Step S204: when the file under test is confirmed to be a Webshell after detection, perform machine learning again on the file under test together with the sample files to update the classification model.
It should be understood that step S204 may be performed by the model update module 440 described above.
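The update in step S204 can be sketched as follows, assuming the sample set is held as a list of (feature vector, label) pairs and that `train` is any routine that builds a model from such a list. Both the data layout and the function name are hypothetical.

```python
def update_on_detection(samples, new_vector, is_webshell, train):
    """Add a confirmed Webshell to the sample set and retrain the model.

    samples: list of (feature_vector, label) pairs, where label 1 = Webshell.
    train:   any callable that builds a model from the sample list.
    Returns the retrained model, or None when no update is needed.
    """
    if not is_webshell:
        return None
    samples.append((new_vector, 1))
    return train(samples)
```

Retraining on the enlarged sample set is what lets newly observed variants influence later detections; files judged normal leave the model unchanged in this sketch.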
In summary, the Webshell detection method and system based on machine learning and static and dynamic analysis provided by the embodiments of the present invention obtain sample files, perform static analysis and dynamic analysis on the sample files to extract their static features and dynamic features respectively, and learn from the static features and the dynamic features with a machine learning algorithm to obtain a classification model; the classification model analyzes a file under test and produces a detection result. Further, when the file under test is confirmed to be a Webshell after detection, it is added to the sample database and machine learning is performed again together with the earlier sample files to update the classification model. The embodiments combine static and dynamic analysis so that feature extraction is more comprehensive, and use a machine learning algorithm that combines several classification algorithms to learn from a large number of Webshell samples and normal web page samples, which makes the classification model more stable and its classification more accurate. A machine learning algorithm can handle complex classification over many features, so the features used for detection are no longer limited to a single signature. With this classification model, a user can effectively detect Webshells and their variants, predict new Webshells, handle text obfuscation better, and remedy the shortcomings of conventional signature-matching detection.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the invention; for those skilled in the art, the invention may have various modifications and variations. Any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present invention shall fall within its scope of protection. It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.
Claims (10)
1. A Webshell detection method based on machine learning and static and dynamic analysis, characterized in that the method comprises:
obtaining a sample file;
extracting static features and dynamic features of the sample file;
obtaining a classification model according to the static features, the dynamic features, and a machine learning algorithm, wherein the classification model analyzes a file to be detected and obtains a detection result.
2. The Webshell detection method based on machine learning and static and dynamic analysis according to claim 1, characterized in that the step of extracting the static features and dynamic features of the sample file comprises:
performing static analysis on the sample file to obtain the static features, wherein the static features include document features, basic function features, and file behavior features of the sample file;
performing dynamic analysis on the sample file to obtain the dynamic features, wherein the dynamic features include file inclusion operation features, sensitive function execution features, and sensitive string features.
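As a rough illustration of the two feature groups named in claim 2, the following Python sketch computes a few static features from source text and a few dynamic features from a recorded execution trace. Everything concrete here is an assumption made for the example: the function-name lists, the choice of length, longest line, and entropy as document features, and the `(event, argument)` trace format, which stands in for whatever a dynamic-analysis environment would actually record.

```python
import math
import re
from collections import Counter

# Illustrative name lists (assumptions, not taken from the claims).
BASIC_FUNCS = ("eval", "assert", "base64_decode", "system", "exec")
FILE_FUNCS = ("fopen", "fwrite", "file_put_contents", "unlink")
SENSITIVE_STRINGS = ("/etc/passwd", "cmd.exe", "whoami")


def shannon_entropy(text):
    """Shannon entropy of the text in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum(c / total * math.log2(c / total) for c in Counter(text).values())


def count_calls(source, names):
    """Count textual call sites of the form `name(` for each listed name."""
    return sum(
        len(re.findall(r"\b" + re.escape(name) + r"\s*\(", source)) for name in names
    )


def static_features(source):
    """Document, basic function, and file behavior features from source text."""
    return {
        "length": len(source),
        "longest_line": max((len(line) for line in source.splitlines()), default=0),
        "entropy": shannon_entropy(source),
        "basic_func_calls": count_calls(source, BASIC_FUNCS),
        "file_func_calls": count_calls(source, FILE_FUNCS),
    }


def dynamic_features(trace):
    """`trace` is a hypothetical list of (event, argument) pairs from one run."""
    events = Counter(event for event, _ in trace)
    args = " ".join(arg for _, arg in trace)
    return {
        "include_ops": events["include"],
        "sensitive_func_runs": sum(events[name] for name in BASIC_FUNCS),
        "sensitive_strings": sum(args.count(s) for s in SENSITIVE_STRINGS),
    }


source = "<?php eval(base64_decode($_POST['x'])); fwrite($f, $d); ?>"
trace = [
    ("include", "config.php"),
    ("eval", "system('whoami');"),
    ("system", "whoami"),
]
static_vec = static_features(source)
dynamic_vec = dynamic_features(trace)
```

The dynamic features can expose behavior that obfuscated source hides from the static pass: in the example trace, the `system` call only becomes visible once the decoded payload is executed.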
3. The Webshell detection method based on machine learning and static and dynamic analysis according to claim 1, characterized in that the step of obtaining a classification model according to the static features, the dynamic features, and the machine learning algorithm comprises:
learning from the static features and the dynamic features using the machine learning algorithm to obtain the classification model.
4. The Webshell detection method based on machine learning and static and dynamic analysis according to claim 1, characterized in that the machine learning algorithm is an ensemble learning approach that combines multiple classification algorithms.
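One common way to combine several classification algorithms is majority voting, sketched below in Python. The claim does not say which classifiers are combined or how their outputs are merged, so the three toy members (`ThresholdClassifier`, `NearestCentroid`, `NearestNeighbour`), the `VotingEnsemble` wrapper, and the sample vectors are all assumptions chosen to keep the example self-contained.

```python
from collections import Counter
from statistics import mean


def sqdist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


class ThresholdClassifier:
    """Flags a vector when its feature sum exceeds the midpoint of class means."""

    def fit(self, X, y):
        pos = mean([sum(x) for x, lbl in zip(X, y) if lbl == 1])
        neg = mean([sum(x) for x, lbl in zip(X, y) if lbl == 0])
        self.cut = (pos + neg) / 2
        return self

    def predict(self, x):
        return int(sum(x) > self.cut)


class NearestCentroid:
    """Assigns the label whose mean feature vector is closest."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, lbl in zip(X, y) if lbl == label]
            self.centroids[label] = [mean(col) for col in zip(*rows)]
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda lbl: sqdist(x, self.centroids[lbl]))


class NearestNeighbour:
    """Assigns the label of the single closest training vector."""

    def fit(self, X, y):
        self.data = list(zip(X, y))
        return self

    def predict(self, x):
        return min(self.data, key=lambda row: sqdist(x, row[0]))[1]


class VotingEnsemble:
    """Majority vote over member classifiers (use an odd number to avoid ties)."""

    def __init__(self, members):
        self.members = members

    def fit(self, X, y):
        for member in self.members:
            member.fit(X, y)
        return self

    def predict(self, x):
        votes = Counter(member.predict(x) for member in self.members)
        return votes.most_common(1)[0][0]


# Toy feature vectors: label 1 = Webshell, 0 = normal page.
X = [[0, 0], [0, 1], [3, 2], [2, 3]]
y = [0, 0, 1, 1]
ensemble = VotingEnsemble(
    [ThresholdClassifier(), NearestCentroid(), NearestNeighbour()]
).fit(X, y)
suspicious = ensemble.predict([2, 2])
benign = ensemble.predict([0, 0])
```

Voting is only one possible combination rule; weighted voting or stacking would fit the same claim language equally well.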
5. The Webshell detection method based on machine learning and static and dynamic analysis according to claim 1, characterized in that the method further comprises:
when the file to be detected is confirmed as a Webshell after detection, performing machine learning again according to the file to be detected and the sample file to update the classification model.
6. A Webshell detection system based on machine learning and static and dynamic analysis, characterized in that the system comprises:
a sample acquisition module, configured to obtain a sample file;
a feature extraction module, configured to extract static features and dynamic features of the sample file;
a model building module, configured to obtain a classification model according to the static features, the dynamic features, and a machine learning algorithm, wherein the classification model analyzes a file to be detected and obtains a detection result.
7. The Webshell detection system based on machine learning and static and dynamic analysis according to claim 6, characterized in that the feature extraction module comprises:
a static analysis module, configured to perform static analysis on the sample file to obtain the static features, wherein the static features include document features, basic function features, and file behavior features of the sample file;
a dynamic analysis module, configured to perform dynamic analysis on the sample file to obtain the dynamic features, wherein the dynamic features include file inclusion operation features, sensitive function execution features, and sensitive string features.
8. The Webshell detection system based on machine learning and static and dynamic analysis according to claim 6, characterized in that the model building module is configured to learn from the static features and the dynamic features using the machine learning algorithm to obtain the classification model.
9. The Webshell detection system based on machine learning and static and dynamic analysis according to claim 6, characterized in that the machine learning algorithm used by the model building module is an ensemble learning approach that combines multiple classification algorithms.
10. The Webshell detection system based on machine learning and static and dynamic analysis according to claim 6, characterized in that the system further comprises:
a model update module, configured to, when the file to be detected is confirmed as a Webshell after detection, perform machine learning again according to the file to be detected and the sample file to update the classification model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710903110.2A CN107659570B (en) | 2017-09-29 | 2017-09-29 | Webshell detection method and system based on machine learning and dynamic and static analysis |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107659570A (en) | 2018-02-02 |
CN107659570B (en) | 2020-09-15 |
Family
ID=61116698
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710903110.2A Active CN107659570B (en) | 2017-09-29 | 2017-09-29 | Webshell detection method and system based on machine learning and dynamic and static analysis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107659570B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN107659570B (en) | 2020-09-15 |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | Address after: 310000 No. 188 Lianhui Street, Xixing Street, Binjiang District, Hangzhou City, Zhejiang Province. Applicant after: Hangzhou Anheng Information Technology Co.,Ltd. Address before: Zhejiang Zhongcai Building No. 68 Binjiang District road Hangzhou City, Zhejiang Province, the 310051 and 15 layer. Applicant before: DBAPPSECURITY Co.,Ltd. |
GR01 | Patent grant | |