CN107659570A - Webshell detection methods and system based on machine learning and static and dynamic analysis - Google Patents

Webshell detection method and system based on machine learning and static and dynamic analysis

Info

Publication number
CN107659570A
Authority
CN
China
Prior art keywords
static
file
machine learning
webshell
dynamic analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710903110.2A
Other languages
Chinese (zh)
Other versions
CN107659570B (en
Inventor
唐佳莉
范渊
莫金友
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
DBAPPSecurity Co Ltd
Original Assignee
DBAPPSecurity Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by DBAPPSecurity Co Ltd filed Critical DBAPPSecurity Co Ltd
Priority to CN201710903110.2A priority Critical patent/CN107659570B/en
Publication of CN107659570A publication Critical patent/CN107659570A/en
Application granted granted Critical
Publication of CN107659570B publication Critical patent/CN107659570B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/14Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425Traffic logging, e.g. anomaly detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/02Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Hardware Design (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present invention provide a Webshell detection method and system based on machine learning and combined static and dynamic analysis, and relate to the technical field of Webshell detection. The method obtains sample files, extracts static features and dynamic features of the sample files, and obtains a classification model from the static features, the dynamic features and a machine learning algorithm; the classification model analyzes a file to be detected and produces a detection result. The invention combines dynamic and static analysis so that the extracted features are more comprehensive, and uses a machine learning algorithm that combines multiple classification algorithms to learn from a large number of Webshell samples and normal web page samples, which makes the resulting classification model more stable and its classification more accurate. The classification model can effectively detect Webshells and their variants, predict new Webshells, cope better with text obfuscation, and make up for the shortcomings of traditional signature-matching detection.

Description

Webshell detection method and system based on machine learning and static and dynamic analysis
Technical field
The present invention relates to the technical field of Webshell detection, and in particular to a Webshell detection method and system based on machine learning and combined static and dynamic analysis.
Background art
With the rapid development of Internet applications and the rapid growth of Internet data, server security problems are becoming increasingly severe. Webshells, a class of backdoor programs based on Web applications, do great harm to user information and even to the entire application system. It is therefore essential to detect server vulnerabilities and backdoors in time in order to keep servers secure.
Webshells are mostly written in scripting languages and are easy to modify and transform. Their features are not limited to signatures; they also include file operation functions, malicious execution functions, file comment size, the length of individual strings in a file, the degree of obfuscation, and so on. When a Webshell is slightly varied or its signature is deliberately obfuscated, conventional methods fail to report it; that is, obfuscation easily bypasses detection by firewalls and antivirus software. Current Webshell detection methods based on feature matching therefore struggle to detect and identify Webshell variants quickly.
How to overcome the narrowness and lag of traditional signature-matching Webshell detection, cope with Webshell text obfuscation, and detect Webshells and their variants quickly has therefore long been a focus of attention for those skilled in the art.
Summary of the invention
An object of the present invention is to provide a Webshell detection method based on machine learning and static and dynamic analysis, so as to overcome the narrowness and lag of traditional signature-matching Webshell detection, improve the accuracy of Webshell detection, and detect Webshells and their variants quickly.
Another object of the present invention is to provide a Webshell detection system based on machine learning and static and dynamic analysis, so as to overcome the narrowness and lag of traditional signature-matching Webshell detection, improve the accuracy of Webshell detection, and detect Webshells and their variants quickly.
To achieve the above objects, the embodiments of the present invention adopt the following technical solutions:
In a first aspect, an embodiment of the present invention provides a Webshell detection method based on machine learning and static and dynamic analysis. The method includes: obtaining sample files; extracting static features and dynamic features of the sample files; and obtaining a classification model according to the static features, the dynamic features and a machine learning algorithm, the classification model analyzing a file to be detected and producing a detection result.
Further, the step of extracting the static features and dynamic features of the sample files includes: performing static analysis on the sample files to obtain the static features, where the static features include document features, basic function features and file behavior features of the sample files; and performing dynamic analysis on the sample files to obtain the dynamic features, where the dynamic features include file inclusion operation features, sensitive function execution features and sensitive string features.
Further, the step of obtaining a classification model according to the static features, the dynamic features and the machine learning algorithm includes: learning the static features and the dynamic features with the machine learning algorithm to obtain the classification model.
Further, the machine learning algorithm is an ensemble learning approach that combines multiple classification algorithms.
Further, the method also includes: when the file to be detected is confirmed to be a Webshell after detection, performing machine learning again on the file to be detected together with the sample files to update the classification model.
In a second aspect, an embodiment of the present invention also provides a Webshell detection system based on machine learning and static and dynamic analysis. The system includes a sample acquisition module, a feature extraction module and a model building module. The sample acquisition module is used to obtain sample files; the feature extraction module is used to extract static features and dynamic features of the sample files; the model building module is used to obtain a classification model according to the static features, the dynamic features and a machine learning algorithm, the classification model analyzing a file to be detected and producing a detection result.
Further, the feature extraction module includes a static analysis module and a dynamic analysis module. The static analysis module is used to perform static analysis on the sample files to obtain the static features, where the static features include document features, basic function features and file behavior features of the sample files. The dynamic analysis module is used to perform dynamic analysis on the sample files to obtain the dynamic features, where the dynamic features include file inclusion operation features, sensitive function execution features and sensitive string features.
Further, the model building module is used to learn the static features and the dynamic features with the machine learning algorithm to obtain the classification model.
Further, the machine learning algorithm used by the model building module is an ensemble learning approach that combines multiple classification algorithms.
Further, the system also includes a model update module, which is used, when the file to be detected is confirmed to be a Webshell after detection, to perform machine learning again on the file to be detected together with the sample files to update the classification model.
Compared with the prior art, the present invention has the following beneficial effects. The Webshell detection method and system provided by the embodiments of the present invention obtain sample files, extract static features and dynamic features of the sample files, and obtain a classification model according to the static features, the dynamic features and a machine learning algorithm; the classification model analyzes a file to be detected and produces a detection result. The embodiments combine dynamic and static analysis so that the extracted features are more comprehensive, and use a machine learning algorithm that combines multiple classification algorithms to learn from a large number of Webshell samples and normal web page samples, which makes the classification model more stable and its classification more accurate. A machine learning algorithm can handle complex classification over many features, so the features involved in detection are not limited to a single signature. With the classification model, users can effectively detect Webshells and their variants, predict new Webshells, cope better with text obfuscation, and make up for the shortcomings of traditional signature-matching detection.
To make the above objects, features and advantages of the present invention clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only certain embodiments of the present invention and should therefore not be regarded as limiting its scope; those of ordinary skill in the art can derive other related drawings from them without creative effort.
Fig. 1 is a block diagram of the server provided by an embodiment of the present invention.
Fig. 2 is a functional block diagram of the Webshell detection system based on machine learning and static and dynamic analysis provided by the first embodiment of the present invention.
Fig. 3 is a functional block diagram of the feature extraction module in Fig. 2.
Fig. 4 is a schematic flowchart of Webshell detection performed by the Webshell detection system based on machine learning and static and dynamic analysis.
Fig. 5 is a schematic flowchart of the Webshell detection method based on machine learning and static and dynamic analysis provided by the second embodiment of the present invention.
Fig. 6 is a detailed schematic flowchart of step S202 in Fig. 5.
Reference numerals: 100: server; 400: Webshell detection system based on machine learning and static and dynamic analysis; 110: memory; 120: memory controller; 130: processor; 410: sample acquisition module; 420: feature extraction module; 430: model building module; 440: model update module; 421: static analysis module; 422: dynamic analysis module.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. The components of the embodiments generally described and illustrated in the drawings herein can be arranged and designed in a variety of different configurations. The following detailed description of the embodiments provided in the drawings is therefore not intended to limit the scope of the claimed invention, but merely represents selected embodiments of the invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; once an item is defined in one drawing, it need not be defined or explained again in subsequent drawings. In addition, in the description of the present invention, the terms "first", "second" and the like are used only to distinguish descriptions and should not be understood as indicating or implying relative importance.
The Webshell detection method and system based on machine learning and static and dynamic analysis provided by the embodiments of the present invention can be applied to the server 100 shown in Fig. 1. In this embodiment, the server 100 may be, but is not limited to, a web server, a database server, a cloud server, and so on. As shown in Fig. 1, the server 100 may include a memory 110, a memory controller 120 and a processor 130.
The memory 110, the memory controller 120 and the processor 130 are electrically connected to one another, directly or indirectly, to transmit or exchange data. For example, these elements may be electrically connected through one or more communication buses or signal lines. The Webshell detection system 400 based on machine learning and static and dynamic analysis includes at least one software function module that can be stored in the memory 110 in the form of software or firmware, or built into the operating system (OS) of the server 100. The processor 130 is used to execute the executable modules stored in the memory 110, for example the software function modules and computer programs included in the Webshell detection system 400.
The memory 110 may be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and so on. The memory 110 can be used to store software programs and modules, and the processor 130 executes the programs after receiving an execution instruction.
The processor 130 may be an integrated circuit chip with signal processing capability. The processor 130 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and so on; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor, or the processor 130 may be any conventional processor.
It can be understood that the structure shown in Fig. 1 is only illustrative; the server 100 may include more or fewer components than shown in Fig. 1, or have a configuration different from that shown in Fig. 1. Each component shown in Fig. 1 can be implemented in hardware, software or a combination of the two.
First embodiment
Fig. 2 is a functional block diagram of the Webshell detection system 400 based on machine learning and static and dynamic analysis provided by the first embodiment of the present invention. The Webshell detection system 400 includes a sample acquisition module 410, a feature extraction module 420 and a model building module 430.
The sample acquisition module 410 is used to obtain sample files. In this embodiment, the sample files include a large number of Webshell samples and normal website samples. The Webshell samples include trojans written in various languages, such as ASP, PHP and JSP trojans; by kind they can also be divided into one-line trojans, image trojans, large trojans with upload functionality, and so on. The normal website samples are various CMSs written in PHP, the source code of the website to be checked, and so on; this is not limited here. Preferably, the large number of sample files obtained is stored in a database, and users can freely add Webshell samples and normal web page code they have collected themselves. Depending on the detection environment, providing the website's original file code as positive samples usually improves the accuracy of the model and reduces the false positive rate.
The feature extraction module 420 is used to extract the static features and dynamic features of the sample files.
In this embodiment, the feature extraction module 420 performs static and dynamic analysis on a large number of sample files. As shown in Fig. 3, the feature extraction module 420 specifically includes a static analysis module 421 and a dynamic analysis module 422.
The static analysis module 421 is used to perform static analysis on the sample files to obtain the static features, where the static features include document features, basic function features and file behavior features of the sample files. In this embodiment, the static analysis module 421 mainly analyzes the characters in a sample file and counts the file's values along multiple feature dimensions. Specifically, the document features may include, but are not limited to: number of words, number of distinct words, number of lines, average number of words per line, number of null characters and spaces, maximum word length, number of comments, and so on. The basic function features may include, but are not limited to: character manipulation functions, sensitive function calls, number of system function calls, number of script blocks, maximum function parameter length, encryption and decryption function calls, and so on. The file behavior features may include, but are not limited to: file operations, FTP operations, database operations, and so on.
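As a minimal sketch of how some of the document features listed above might be counted, the following Python function works on the text of one script file. The function name, the dictionary keys, the word pattern and the comment syntax (PHP-style `//`, `#` and `/* */`) are all assumptions made for illustration; the patent does not specify them.

```python
import re

def static_document_features(source: str) -> dict:
    """Count a few document features of one script file (illustrative only)."""
    lines = source.splitlines()
    # Assumption: a "word" is an identifier-like token.
    words = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source)
    # Assumption: PHP-style line comments (// and #) and block comments (/* */).
    comments = re.findall(r"//[^\n]*|#[^\n]*|/\*.*?\*/", source, flags=re.S)
    return {
        "word_count": len(words),
        "distinct_word_count": len(set(words)),
        "line_count": len(lines),
        "avg_words_per_line": len(words) / len(lines) if lines else 0.0,
        "space_count": source.count(" "),
        "null_char_count": source.count("\x00"),
        "max_word_length": max((len(w) for w in words), default=0),
        "comment_count": len(comments),
    }

features = static_document_features("<?php\n// run\neval($_POST['cmd']);\n?>")
```

A real extractor would use a proper tokenizer for each script language, since the simple patterns here would, for example, treat the `//` in a URL as a comment.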
The dynamic analysis module 422 is used to perform dynamic analysis on the sample files to obtain the dynamic features, where the dynamic features include file inclusion operation features, sensitive function execution features and sensitive string features. In this embodiment, the dynamic analysis module 422 mainly sets up a compilation environment or hook extension for each programming language, monitors execution, and combines mark tracking of external input variables with a blacklist and whitelist mechanism to perform real-time dynamic Webshell detection and summarize the dynamic features of the sample files. In this embodiment, the features used to cope with Webshell obfuscation include: the computed text entropy, the number of invalid characters in the text, and the sensitive strings and sensitive functions produced when the file is run under dynamic analysis.
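Text entropy, one of the obfuscation-related features, is commonly computed as the Shannon entropy of the character distribution: the negative sum of p * log2(p) over the relative frequency p of each character. The sketch below assumes that definition; the patent does not state which entropy formula is used, and the function name is hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    # Sum p * log2(p) over each distinct character, then negate.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Encoded or encrypted payloads tend to use characters more uniformly than hand-written code, so an unusually high value is one hint that a file has been obfuscated.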
The model building module 430 is used to obtain a classification model according to the static features, the dynamic features and a machine learning algorithm; the classification model analyzes a file to be detected and produces a detection result.
In this embodiment, the user uploads the file to be detected to the system; the classification model then performs Webshell detection on the file, produces a classification result, and generates a detection report for the user to view.
In this embodiment, the model building module 430 is used to learn the static features and the dynamic features with the machine learning algorithm to obtain the classification model. Specifically, the model building module 430 first normalizes the static features and the dynamic features to obtain a set of feature vectors, then learns from the feature vectors with the machine learning algorithm to compute the classification model. Preferably, in this embodiment, the machine learning algorithm is an ensemble learning approach that combines multiple classification algorithms, which may specifically include a random forest algorithm, a decision tree algorithm, a logistic regression algorithm, and so on. Ensemble learning that combines multiple classification algorithms improves the stability and robustness of the model and thereby the detection accuracy of the classification model.
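The two stages described above, normalization and combining several classifiers, can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: min-max scaling and simple majority voting are assumed because the patent names neither the normalization method nor the combination rule, and the three lambda rules are hypothetical stand-ins for trained classifiers.

```python
def min_max_normalize(rows):
    """Scale every feature column into [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def majority_vote(classifiers, vector):
    """Return 1 (Webshell) if more than half of the classifiers vote 1, else 0."""
    votes = sum(clf(vector) for clf in classifiers)
    return 1 if votes * 2 > len(classifiers) else 0

# Hypothetical stand-ins for trained classifiers: each maps a vector to 0 or 1.
classifiers = [
    lambda v: 1 if v[0] > 0.5 else 0,
    lambda v: 1 if v[1] > 0.5 else 0,
    lambda v: 1 if sum(v) > 1.0 else 0,
]

vectors = min_max_normalize([[2, 0], [4, 10], [10, 20]])
labels = [majority_vote(classifiers, v) for v in vectors]
```

Voting means a single classifier that is fooled by an unusual sample can be outvoted by the others, which is one way to read the stability claim made for the combined model.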
It should be noted that in this embodiment, after the classification model has been built, a portion of data that was not used for learning can be used to test the model and measure its misdetection rate, false positive rate, false negative rate, and so on. The classification model is then adjusted according to the test data, for example by adjusting the ratio, number and types of positive and negative samples in the sample files, so as to improve the accuracy of the classification model and optimize it.
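The false positive and false negative rates on held-out data can be computed as in the sketch below. The label convention (1 for Webshell, 0 for normal) and the function name are assumptions made for this example.

```python
def detection_rates(y_true, y_pred):
    """Return (false_positive_rate, false_negative_rate); 1 = Webshell, 0 = normal."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    negatives = sum(1 for t in y_true if t == 0)
    positives = sum(1 for t in y_true if t == 1)
    # Guard against empty classes so the rates are always defined.
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr

fpr, fnr = detection_rates([1, 1, 0, 0, 0, 1], [1, 0, 0, 1, 0, 1])
```

A high false negative rate suggests adding more Webshell samples or variants to the training set, while a high false positive rate suggests adding more normal code from the target site.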
Further, the Webshell detection system 400 based on machine learning and static and dynamic analysis also includes a model update module 440, which is used, when the file to be detected matches Webshell features, to perform machine learning again on the file to be detected together with the sample files to update the classification model.
In this embodiment, the user can use the Webshell detection system 400 to perform Webshell detection on an unknown file (that is, the file to be detected). When the file to be detected is found to be a malicious Webshell file, it is added to the malicious sample database, and machine learning is performed again together with the earlier sample files to optimize and update the classification model. The specific flow by which a user performs Webshell detection with the system is shown in Fig. 4 and includes:
Step S101: obtain the file to be detected.
Specifically, the user connects to the system and uploads the file to be detected, and the system obtains the file through the sample acquisition module 410.
Step S102: extract the static features and dynamic features of the file to be detected.
In this embodiment, the system automatically extracts static and dynamic features from the file to be detected through the feature extraction module 420.
Step S103: analyze the file to be detected with the classification model and obtain a detection result.
Specifically, after feature extraction for the file to be detected is complete, the established classification model is used to determine whether the file is a malicious Webshell file and to obtain a detection result. The result is then combined with the dynamic and static features extracted by the feature extraction module 420 to form a detection report for the user to view. For example, the detection report may show the percentage likelihood that the file to be detected is a malicious Webshell file, the extracted features (such as malicious functions, file operation behaviors and blacklisted characters that appear), and so on.
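The detection report described in step S103 could be assembled roughly as follows. This is a sketch only: the field names, the 0.5 decision threshold and the example file name are hypothetical, and the probability is assumed to come from whatever classification model was trained earlier.

```python
def build_report(file_name, webshell_probability, matched_features, threshold=0.5):
    """Assemble a detection report; `threshold` is an assumed decision cut-off."""
    return {
        "file": file_name,
        "is_webshell": webshell_probability >= threshold,
        # Likelihood shown to the user as a percentage.
        "probability_percent": round(webshell_probability * 100, 1),
        # Extracted features that contributed to the verdict, in a stable order.
        "matched_features": sorted(matched_features),
    }

report = build_report("upload.php", 0.93, ["eval", "base64_decode"])
```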
Second embodiment
Fig. 5 is a schematic flowchart of the Webshell detection method based on machine learning and static and dynamic analysis provided by the second embodiment of the present invention. It should be noted that the method described in this embodiment is not limited to Fig. 5 or to the specific order described below. Its basic principle and technical effects are the same as those of the first embodiment; for brevity, matters not mentioned in this embodiment can be found in the corresponding content of the first embodiment. It should be understood that in other embodiments the order of some steps of the method may be exchanged according to actual needs, and some steps may be omitted or deleted. The specific flow shown in Fig. 5 is described in detail below.
Step S201: obtain sample files.
It can be understood that step S201 may be performed by the sample acquisition module 410 described above.
Step S202: extract the static features and dynamic features of the sample files.
It can be understood that step S202 may be performed by the feature extraction module 420 described above.
As shown in Fig. 6, in this embodiment step S202 specifically includes the following sub-steps:
Sub-step S2021: perform static analysis on the sample files to obtain the static features, where the static features include document features, basic function features and file behavior features of the sample files.
It can be understood that sub-step S2021 may be performed by the static analysis module 421 described above.
Sub-step S2022: perform dynamic analysis on the sample files to obtain the dynamic features, where the dynamic features include file inclusion operation features, sensitive function execution features and sensitive string features.
It can be understood that sub-step S2022 may be performed by the dynamic analysis module 422 described above.
It should be noted that in this embodiment the order of sub-steps S2021 and S2022 is not limited; they may also be performed at the same time.
Step S203: obtain a classification model according to the static features, the dynamic features and a machine learning algorithm; the classification model analyzes a file to be detected and produces a detection result.
It can be understood that step S203 may be performed by the model building module 430 described above.
Step S204: when the file to be detected is confirmed to be a Webshell after detection, perform machine learning again on the file to be detected together with the sample files to update the classification model.
It can be understood that step S204 may be performed by the model update module 440 described above.
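The update in step S204 amounts to adding the confirmed Webshell to the sample set and retraining on everything. A minimal sketch follows, assuming a caller-supplied `train` function that builds a model from feature vectors and labels; both `update_model` and `train` are hypothetical names, and the lambda used in the usage line is only a placeholder for a real training routine.

```python
def update_model(samples, labels, new_sample, train):
    """Add a confirmed Webshell (label 1) to the sample set and retrain."""
    # Build new lists so the caller's existing sample set is not mutated.
    updated_samples = samples + [new_sample]
    updated_labels = labels + [1]
    return train(updated_samples, updated_labels), updated_samples, updated_labels

# Placeholder trainer: a real one would fit the classification model.
model, samples, labels = update_model(
    [[0.1], [0.9]], [0, 1], [0.8], train=lambda s, l: {"trained_on": len(s)}
)
```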
In summary, the Webshell detection method and system based on machine learning and static and dynamic analysis provided by the embodiments of the present invention obtain sample files, perform static analysis and dynamic analysis on the sample files to extract their static features and dynamic features respectively, and learn from the static features and the dynamic features with a machine learning algorithm to obtain a classification model; the classification model analyzes a file to be detected and produces a detection result. Further, when the file to be detected is confirmed to be a Webshell after detection, it is added to the sample database and machine learning is performed again together with the earlier sample files to update the classification model. The embodiments combine dynamic and static analysis so that the extracted features are more comprehensive, and use a machine learning algorithm that combines multiple classification algorithms to learn from a large number of Webshell samples and normal web page samples, which makes the classification model more stable and its classification more accurate. A machine learning algorithm can handle complex classification over many features, so the features involved in detection are not limited to a single signature. With the classification model, users can effectively detect Webshells and their variants, predict new Webshells, cope better with text obfuscation, and make up for the shortcomings of traditional signature-matching detection.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The above are only preferred embodiments of the present invention and are not intended to limit it; for those skilled in the art, the present invention may have various modifications and changes. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection. It should be noted that similar reference numerals and letters denote similar items in the accompanying drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.

Claims (10)

1. A Webshell detection method based on machine learning and static and dynamic analysis, characterized in that the method comprises:
obtaining a sample file;
extracting static features and dynamic features of the sample file;
obtaining a classification model according to the static features, the dynamic features, and a machine learning algorithm, wherein the classification model analyzes a file to be detected and obtains a detection result.
2. The Webshell detection method based on machine learning and static and dynamic analysis according to claim 1, characterized in that the step of extracting the static features and dynamic features of the sample file comprises:
performing static analysis on the sample file to obtain the static features, wherein the static features comprise file features, basic function features, and file behavior features of the sample file;
performing dynamic analysis on the sample file to obtain the dynamic features, wherein the dynamic features comprise file inclusion operation features, sensitive function execution features, and sensitive string features.
3. The Webshell detection method based on machine learning and static and dynamic analysis according to claim 1, characterized in that the step of obtaining the classification model according to the static features, the dynamic features, and the machine learning algorithm comprises:
learning from the static features and the dynamic features using the machine learning algorithm to obtain the classification model.
4. The Webshell detection method based on machine learning and static and dynamic analysis according to claim 1, characterized in that the machine learning algorithm is an ensemble learning approach that combines multiple classification algorithms.
5. The Webshell detection method based on machine learning and static and dynamic analysis according to claim 1, characterized in that the method further comprises:
when the file to be detected is confirmed to be a Webshell after detection, performing machine learning again according to the file to be detected and the sample file to update the classification model.
6. A Webshell detection system based on machine learning and static and dynamic analysis, characterized in that the system comprises:
a sample acquisition module, configured to obtain a sample file;
a feature extraction module, configured to extract static features and dynamic features of the sample file;
a model building module, configured to obtain a classification model according to the static features, the dynamic features, and a machine learning algorithm, wherein the classification model analyzes a file to be detected and obtains a detection result.
7. The Webshell detection system based on machine learning and static and dynamic analysis according to claim 6, characterized in that the feature extraction module comprises:
a static analysis module, configured to perform static analysis on the sample file to obtain the static features, wherein the static features comprise file features, basic function features, and file behavior features of the sample file;
a dynamic analysis module, configured to perform dynamic analysis on the sample file to obtain the dynamic features, wherein the dynamic features comprise file inclusion operation features, sensitive function execution features, and sensitive string features.
8. The Webshell detection system based on machine learning and static and dynamic analysis according to claim 6, characterized in that the model building module is configured to learn from the static features and the dynamic features using the machine learning algorithm to obtain the classification model.
9. The Webshell detection system based on machine learning and static and dynamic analysis according to claim 6, characterized in that the machine learning algorithm used by the model building module is an ensemble learning approach that combines multiple classification algorithms.
10. The Webshell detection system based on machine learning and static and dynamic analysis according to claim 6, characterized in that the system further comprises:
a model update module, configured to, when the file to be detected is confirmed to be a Webshell after detection, perform machine learning again according to the file to be detected and the sample file to update the classification model.
CN201710903110.2A 2017-09-29 2017-09-29 Webshell detection method and system based on machine learning and dynamic and static analysis Active CN107659570B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710903110.2A CN107659570B (en) 2017-09-29 2017-09-29 Webshell detection method and system based on machine learning and dynamic and static analysis

Publications (2)

Publication Number Publication Date
CN107659570A true CN107659570A (en) 2018-02-02
CN107659570B CN107659570B (en) 2020-09-15

Family

ID=61116698

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710903110.2A Active CN107659570B (en) 2017-09-29 2017-09-29 Webshell detection method and system based on machine learning and dynamic and static analysis

Country Status (1)

Country Link
CN (1) CN107659570B (en)

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108334781A (en) * 2018-03-07 2018-07-27 腾讯科技(深圳)有限公司 Virus detection method, device, computer readable storage medium and computer equipment
CN108446561A (en) * 2018-03-21 2018-08-24 河北师范大学 Malicious code behavior feature extraction method
CN108804921A (en) * 2018-05-29 2018-11-13 中国科学院信息工程研究所 PowerShell code de-obfuscation method and device
CN108985061A (en) * 2018-07-05 2018-12-11 北京大学 Webshell detection method based on model fusion
CN109600382A (en) * 2018-12-19 2019-04-09 北京知道创宇信息技术有限公司 Webshell detection method and device, HMM model training method and device
CN109598124A (en) * 2018-12-11 2019-04-09 厦门服云信息科技有限公司 Webshell detection method and device
CN109933977A (en) * 2019-03-12 2019-06-25 北京神州绿盟信息安全科技股份有限公司 Method and device for detecting webshell data
CN110086788A (en) * 2019-04-17 2019-08-02 杭州安恒信息技术股份有限公司 Deep learning WebShell protection method based on cloud WAF
CN110198291A (en) * 2018-03-15 2019-09-03 腾讯科技(深圳)有限公司 Web page backdoor detection method, device, terminal and storage medium
CN110210225A (en) * 2019-05-27 2019-09-06 四川大学 Intelligent Docker container malicious file detection method and device
WO2019242441A1 (en) * 2018-06-20 2019-12-26 深信服科技股份有限公司 Dynamic feature-based malware recognition method and system and related apparatus
CN110750789A (en) * 2019-10-18 2020-02-04 杭州奇盾信息技术有限公司 De-obfuscation method, de-obfuscation device, computer apparatus, and storage medium
CN111163095A (en) * 2019-12-31 2020-05-15 奇安信科技集团股份有限公司 Network attack analysis method, network attack analysis device, computing device, and medium
CN111385295A (en) * 2020-03-04 2020-07-07 深信服科技股份有限公司 WebShell detection method, device, equipment and storage medium
CN111931187A (en) * 2020-08-13 2020-11-13 深信服科技股份有限公司 Component vulnerability detection method, device, equipment and readable storage medium
CN112597498A (en) * 2020-12-29 2021-04-02 天津睿邦安通技术有限公司 Webshell detection method, system and device and readable storage medium
CN112883373A (en) * 2020-12-30 2021-06-01 国药集团基因科技有限公司 PHP type WebShell detection method and detection system thereof
CN112926054A (en) * 2021-02-22 2021-06-08 亚信科技(成都)有限公司 Malicious file detection method, device, equipment and storage medium
CN112948834A (en) * 2021-03-25 2021-06-11 国药(武汉)医学实验室有限公司 Deep ensemble learning model construction method for malicious WebShell detection
CN113111346A (en) * 2020-01-13 2021-07-13 深信服科技股份有限公司 Multi-engine WebShell script file detection method and system
CN113110986A (en) * 2020-01-13 2021-07-13 深信服科技股份有限公司 WebShell script file detection method and system
CN113239352A (en) * 2021-04-06 2021-08-10 中国科学院信息工程研究所 Webshell detection method and system
CN116991978A (en) * 2023-09-26 2023-11-03 杭州今元标矩科技有限公司 CMS (content management system) fragment feature extraction method, system, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663296A (en) * 2012-03-31 2012-09-12 杭州安恒信息技术有限公司 Web-page-oriented intelligent detection method for JavaScript malicious code
CN102779249A (en) * 2012-06-28 2012-11-14 奇智软件(北京)有限公司 Malicious program detection method and scan engine
CN103532949A (en) * 2013-10-14 2014-01-22 刘胜利 Self-adaptive trojan communication behavior detection method on basis of dynamic feedback
CN107169351A (en) * 2017-05-11 2017-09-15 北京理工大学 Android unknown malware detection method combining dynamic behavior features

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
MING-YANG SU, KEK-TUNG FUNG, YU-HAO HUANG, MING-ZHI KANG: "Detection of Android Malware: Combined with Static Analysis and Dynamic Analysis", 《INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING & SIMULATION》 *
SULEIMAN Y. YERIMA, SAKIR SEZER: "Android Malware Detection Using Parallel Machine Learning Classifiers", 《8TH INTERNATIONAL CONFERENCE ON NEXT GENERATION MOBILE APPLICATIONS, SERVICES AND TECHNOLOGIES》 *
张华: "《精通ASP疑难解析与技巧300例》", 31 July 2007, 中国铁道工业出版社 *

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108334781A (en) * 2018-03-07 2018-07-27 腾讯科技(深圳)有限公司 Virus detection method, device, computer readable storage medium and computer equipment
CN108334781B (en) * 2018-03-07 2020-04-14 腾讯科技(深圳)有限公司 Virus detection method, device, computer readable storage medium and computer equipment
CN110198291B (en) * 2018-03-15 2022-02-18 腾讯科技(深圳)有限公司 Webpage backdoor detection method, device, terminal and storage medium
CN110198291A (en) * 2018-03-15 2019-09-03 腾讯科技(深圳)有限公司 Web page backdoor detection method, device, terminal and storage medium
CN108446561A (en) * 2018-03-21 2018-08-24 河北师范大学 Malicious code behavior feature extraction method
CN108804921A (en) * 2018-05-29 2018-11-13 中国科学院信息工程研究所 PowerShell code de-obfuscation method and device
WO2019242441A1 (en) * 2018-06-20 2019-12-26 深信服科技股份有限公司 Dynamic feature-based malware recognition method and system and related apparatus
CN110619211A (en) * 2018-06-20 2019-12-27 深信服科技股份有限公司 Malicious software identification method, system and related device based on dynamic characteristics
CN108985061A (en) * 2018-07-05 2018-12-11 北京大学 Webshell detection method based on model fusion
CN109598124A (en) * 2018-12-11 2019-04-09 厦门服云信息科技有限公司 Webshell detection method and device
CN109600382A (en) * 2018-12-19 2019-04-09 北京知道创宇信息技术有限公司 Webshell detection method and device, HMM model training method and device
CN109600382B (en) * 2018-12-19 2021-07-13 北京知道创宇信息技术股份有限公司 Webshell detection method and device and HMM model training method and device
CN109933977A (en) * 2019-03-12 2019-06-25 北京神州绿盟信息安全科技股份有限公司 Method and device for detecting webshell data
CN110086788A (en) * 2019-04-17 2019-08-02 杭州安恒信息技术股份有限公司 Deep learning WebShell protection method based on cloud WAF
CN110210225A (en) * 2019-05-27 2019-09-06 四川大学 Intelligent Docker container malicious file detection method and device
CN110750789A (en) * 2019-10-18 2020-02-04 杭州奇盾信息技术有限公司 De-obfuscation method, de-obfuscation device, computer apparatus, and storage medium
CN111163095A (en) * 2019-12-31 2020-05-15 奇安信科技集团股份有限公司 Network attack analysis method, network attack analysis device, computing device, and medium
CN111163095B (en) * 2019-12-31 2022-08-30 奇安信科技集团股份有限公司 Network attack analysis method, network attack analysis device, computing device, and medium
CN113110986A (en) * 2020-01-13 2021-07-13 深信服科技股份有限公司 WebShell script file detection method and system
CN113111346A (en) * 2020-01-13 2021-07-13 深信服科技股份有限公司 Multi-engine WebShell script file detection method and system
CN111385295B (en) * 2020-03-04 2022-11-22 深信服科技股份有限公司 WebShell detection method, device, equipment and storage medium
CN111385295A (en) * 2020-03-04 2020-07-07 深信服科技股份有限公司 WebShell detection method, device, equipment and storage medium
CN111931187A (en) * 2020-08-13 2020-11-13 深信服科技股份有限公司 Component vulnerability detection method, device, equipment and readable storage medium
CN112597498A (en) * 2020-12-29 2021-04-02 天津睿邦安通技术有限公司 Webshell detection method, system and device and readable storage medium
CN112883373A (en) * 2020-12-30 2021-06-01 国药集团基因科技有限公司 PHP type WebShell detection method and detection system thereof
CN112926054A (en) * 2021-02-22 2021-06-08 亚信科技(成都)有限公司 Malicious file detection method, device, equipment and storage medium
CN112926054B (en) * 2021-02-22 2023-10-03 亚信科技(成都)有限公司 Malicious file detection method, device, equipment and storage medium
CN112948834A (en) * 2021-03-25 2021-06-11 国药(武汉)医学实验室有限公司 Deep ensemble learning model construction method for malicious WebShell detection
CN113239352A (en) * 2021-04-06 2021-08-10 中国科学院信息工程研究所 Webshell detection method and system
CN116991978A (en) * 2023-09-26 2023-11-03 杭州今元标矩科技有限公司 CMS (content management system) fragment feature extraction method, system, electronic equipment and storage medium
CN116991978B (en) * 2023-09-26 2024-01-02 杭州今元标矩科技有限公司 CMS (content management system) fragment feature extraction method, system, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN107659570B (en) 2020-09-15

Similar Documents

Publication Publication Date Title
CN107659570A (en) Webshell detection methods and system based on machine learning and static and dynamic analysis
CN107204960B (en) Webpage identification method and device and server
US20160261618A1 (en) System and method for selectively evolving phishing detection rules
CN111639337B (en) Unknown malicious code detection method and system for massive Windows software
EP3454230B1 (en) Access classification device, access classification method, and access classification program
CN104156490A (en) Method and device for detecting suspicious phishing webpage based on character recognition
CN112800427B (en) Webshell detection method and device, electronic equipment and storage medium
CN110135157A (en) Malware homology analysis method, system, electronic equipment and storage medium
KR101858620B1 (en) Device and method for analyzing javascript using machine learning
CN110427755A (en) Method and device for identifying script files
CN107341399A (en) Method and device for assessing code file security
CN107341399A (en) Assess the method and device of code file security
Liu et al. An efficient multistage phishing website detection model based on the CASE feature framework: Aiming at the real web environment
CN109344614B (en) Android malicious application online detection method
CN109784059B (en) Trojan file tracing method, system and equipment
CN112817877B (en) Abnormal script detection method and device, computer equipment and storage medium
CN111382432A (en) Malicious software detection and classification model generation method and device
CN116932381A (en) Automatic evaluation method for security risk of applet and related equipment
CN116108880A (en) Training method of random forest model, malicious website detection method and device
Congyi et al. Method for detecting Android malware based on ensemble learning
US20220237289A1 (en) Automated malware classification with human-readable explanations
Zhang et al. Research on SQL injection vulnerabilities and its detection methods
CN109684844A (en) Webshell detection method and device
Gao et al. Quorum chain-based malware detection in android smart devices
CN114595482A (en) Software source code privacy detection method and system based on static detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 310000 No. 188 Lianhui Street, Xixing Street, Binjiang District, Hangzhou City, Zhejiang Province

Applicant after: Hangzhou Anheng Information Technology Co.,Ltd.

Address before: Zhejiang Zhongcai Building No. 68 Binjiang District road Hangzhou City, Zhejiang Province, the 310051 and 15 layer

Applicant before: DBAPPSECURITY Co.,Ltd.

GR01 Patent grant