CN112688966A

CN112688966A - Webshell detection method, device, medium and equipment

Info

Publication number: CN112688966A
Application number: CN202110263959.4A
Authority: CN
Inventors: 徐国爱; 徐国胜; 齐向东; 纪胜龙; 王少杰; 王浩宇; 柏杨
Original assignee: Beijing University of Posts and Telecommunications; Qianxin Technology Group Co Ltd
Current assignee: Beijing University of Posts and Telecommunications; Qianxin Technology Group Co Ltd
Priority date: 2021-03-11
Filing date: 2021-03-11
Publication date: 2021-04-20

Abstract

The exemplary embodiment of the invention provides a webshell detection method, device, medium and equipment. The webshell detection method comprises the following steps: identifying dynamic characteristics of a file to be detected, including generating an abstract syntax tree of the file to be detected after lexical analysis and syntactic analysis are carried out on the file; recording nodes with dynamic characteristics in the abstract syntax tree, and detecting through Trojan analysis whether a taint variable exists in each node with dynamic characteristics; when a node with the dynamic characteristic has a taint variable, determining that the file to be detected is at risk; and when no node with the dynamic characteristic has a taint variable, determining that the file to be detected is safe. The method takes the dynamic characteristics of the file to be detected as core characteristics and combines dynamic and static analysis techniques, improving the detection accuracy and response efficiency for webshells.

Description

Webshell detection method, device, medium and equipment

Technical Field

The exemplary embodiment of the invention relates to the technical field of internet, in particular to a webshell detection method, a webshell detection device, a webshell detection medium and webshell detection equipment.

Background

According to the public Internet network security situation and threat monitoring and handling report for the first half of 2020 published by the National Internet Emergency Center (CNCERT), CNCERT observed that about 1.8 million IP addresses implanted backdoors into about 3.9 million websites in China, and about 7.4 million websites were tampered with. Compared with 2019, the number of websites in China implanted with backdoors increased by more than 2.59 times. According to the "2019 network security situation awareness report" issued by Sangfor, web scanning and website backdoors (webshells) have become the most common web attack methods, accounting for 52% of attacks. Statistics also show that the one-sentence webshell ("pony") is flexible to write, powerful, embeddable in many ways, well concealed and hard to discover, which makes it attackers' first choice among webshell types to upload.

At present, a great deal of research is carried out in the field of webshell detection at home and abroad, some feasible detection methods are provided, and corresponding software security analysis methods are formed, but the methods have respective defects in the aspects of accuracy and response speed, and no method and device capable of solving the problems exist at present.

Disclosure of Invention

In view of this, an object of an exemplary embodiment of the present invention is to provide a method, an apparatus, a medium, and a device for detecting a webshell, so as to solve the problems of low accuracy and insufficient response speed of the current webshell detection.

Based on the above purpose, an exemplary embodiment of the present invention provides a webshell detection method, including:

identifying dynamic characteristics of a file to be detected, including: generating an abstract syntax tree of the file to be detected after lexical analysis and syntactic analysis are carried out on the file; wherein the dynamic characteristics are defined as: a change to a certain section of code in the file to be detected causes or may cause a change to the function of the code;

recording nodes with dynamic characteristics in the abstract syntax tree, and detecting through Trojan analysis whether a taint variable exists in each node with dynamic characteristics;

when a node with the dynamic characteristic has a taint variable, determining that the file to be detected is at risk; and when no node with the dynamic characteristic has a taint variable, determining that the file to be detected is safe.

With reference to the foregoing description, in another possible implementation manner of the embodiment of the present invention, the recording nodes with dynamic characteristics in the abstract syntax tree, and detecting whether a taint variable exists in each node with dynamic characteristics through a Trojan analysis includes:

traversing each node on the abstract syntax tree, searching a function call or method call node which accords with the definition of the dynamic characteristic of the file to be detected, and recording at least one specific information of a corresponding line number, a function name and a parameter name;

and performing static taint analysis according to the obtained specific information, and determining whether the node to which the specific information belongs uses a taint variable.

In another possible implementation manner of the embodiment of the present invention, in combination with the above description, the method further includes:

generating a control flow graph of the file to be detected according to the abstract syntax tree;

and analyzing the path between the data input source of the control flow graph and the node with the dynamic characteristic, and determining that the node with the dynamic characteristic is at risk when taint transfer exists.

judging whether the file to be detected is in an encrypted or obfuscated state;

and in response to the file to be detected being in an encrypted or obfuscated state, restoring the functions in the file to be detected through a simulation program, and detecting, in combination with the Trojan analysis, whether the node with the dynamic characteristic has a taint variable, so as to perform static analysis.

With reference to the foregoing description, in another possible implementation manner of the embodiment of the present invention, before the identifying the dynamic characteristic of the file to be detected, the method further includes:

and after preprocessing the given file, determining the file to be scanned and analyzed.

generating an extensible webshell rule base;

when a webshell design feature update occurs, expanding the updated feature in the extensible webshell rule base.

establishing a white list and white sample library mechanism, and filtering dynamic characteristics through the white list and white sample mechanism;

when different filter functions exist, the scanning mechanism is customized to add the different filter functions to the security function of the taint tracking.

In a second aspect, the present invention further provides a webshell detection apparatus, including:

the identification module is used for identifying the dynamic characteristics of the file to be detected and comprises the following components: after syntactic analysis and lexical analysis are carried out on a file to be detected, an abstract syntax tree of the file to be detected is generated; wherein the dynamic characteristics are defined as: changes to a certain section of code in the file to be detected can cause or possibly cause changes to the function of the code;

the analysis module is used for recording the nodes with dynamic characteristics in the abstract syntax tree and detecting through Trojan analysis whether the nodes with dynamic characteristics have taint variables;

the judging module is used for determining, in response to a node with the dynamic characteristic having a taint variable, that the file to be detected is at risk; and for determining, when no node with the dynamic characteristic has a taint variable, that the file to be detected is safe.

In a third aspect, the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the webshell detection method when executing the computer program.

In a fourth aspect, the present invention also provides a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the webshell detection method.

As can be seen from the above, the webshell detection method, device, medium and equipment provided by the exemplary embodiment of the present invention improve webshell detection capability by taking the dynamic characteristics of the file to be detected as the core characteristics and combining dynamic and static analysis techniques, which is of great significance in helping security personnel locate backdoors quickly and accurately and in improving emergency response speed.

Drawings

In order to more clearly illustrate the exemplary embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only exemplary embodiments of the present invention, and for those skilled in the art, other drawings may be obtained based on these drawings without inventive effort.

FIG. 1 is a schematic flow chart of a webshell detection method according to an exemplary embodiment of the present invention;

FIG. 2 is a schematic diagram of a tool configuration corresponding to a method in accordance with an exemplary embodiment of the present invention;

FIG. 3 is a schematic diagram of a specific detection process according to an exemplary embodiment of the present invention;

FIG. 4 is a schematic illustration of a specific identification of an exemplary embodiment of the present invention;

FIG. 5 is a diagram of an abstract syntax tree in accordance with an exemplary embodiment of the present invention;

FIG. 6 is a schematic illustration of a taint propagation process in accordance with an exemplary embodiment of the present invention;

FIG. 7 is a schematic view of a security detection in accordance with an exemplary embodiment of the present invention;

FIG. 8 is a schematic diagram of real-time monitoring of an exemplary embodiment of the present invention;

FIG. 9 is a schematic diagram of a basic structure of a webshell detection apparatus according to an exemplary embodiment of the present invention;

FIG. 10 is a schematic diagram of an apparatus according to an exemplary embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.

It should be noted that technical terms or scientific terms used in the exemplary embodiments of the present invention should have a general meaning as understood by those having ordinary skill in the art to which the present invention pertains, unless otherwise defined. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items.

Currently, there is a great deal of research on webshell detection technology, and the mainstream webshell detection tools include D-Shield, Safedog, SHELLPUB, findwebshell, CloudWalker, etc. The following is a brief introduction to these tools:

1. D-Shield

D-Shield is a webshell detection tool that runs on Windows. It detects and removes webshells with a static code analysis engine based on traditional feature matching. D-Shield does not distinguish between file extensions and can analyze relatively well-hidden webshell backdoor behaviors.

Compared with other detection tools, D-Shield has higher accuracy and has some ability to restore obfuscated webshell samples. For webshells hidden in normal files, D-Shield can find the abnormal part. In addition, D-Shield is good at identifying variable functions and some obscure command execution functions. If a risk is found, D-Shield raises an alarm, shows the administrator specific information such as the risky code and parameters, and gives a corresponding risk level (0-5).

D-Shield supports a command line mode and a graphical mode, but can only run under Windows. In addition, its false alarm rate is high: it produces a certain number of false alarms on normal (white) sample files.

2. Safedog

Safedog is protection software for web servers that includes a webshell detection and removal function. By combining a local webshell detection engine with a cloud-based artificial intelligence webshell detection engine, it can detect and remove various webshell files and other risky files with low resource consumption.

Safedog runs on Windows and Linux platforms and uses different deployment modes for different middleware such as Apache, IIS and Nginx.

3. SHELLPUB

SHELLPUB is a webshell scanner that runs on Windows and Linux platforms. It uses a dual-engine detection approach combining traditional features with cloud AI. The traditional detection engine uses sensitive function identification and is paired with a model trained in the cloud with a decision tree machine learning algorithm, which provides heuristic detection of unknown samples.

4. findwebshell

findwebshell is a webshell inspection tool developed in Python that detects webshell backdoors through configurable scripts. It uses traditional regular expression matching, which works well on variants of known samples but poorly on unknown malicious webshell samples.

5. CloudWalker

CloudWalker is a webshell detection and removal tool released by Chaitin. The tool is written in Go and combines traditional rule matching with a model trained with the SVM machine learning algorithm. It divides detection results into 5 levels and reports the risk of a sample file by its level.

A large amount of research is carried out in the field of webshell detection at home and abroad, some feasible detection methods are provided, and corresponding software security analysis methods are formed and mainly comprise the following 4 methods:

Flow-based detection method

Flow-based detection analyzes the network traffic generated when an attacker communicates with a webshell. It exploits the differences from normal page traffic in parameter keywords, statistical indicators of access behavior, information entropy, page relevance and the like, extracts features, and classifies them with a suitable algorithm.

Log-based detection method

Log-based detection extracts features from the text features, statistical features, response page features and the like of web logs and models the normal service access log, so that webshell access requests are detected by means such as unsupervised clustering. In the feature extraction stage, behavior features at an abstract level are obtained, so that common webshell detection evasion methods, such as string encoding and construction, code obfuscation and file inclusion, can be effectively identified and known evasion techniques resisted. A webshell detection classification model is then built with a deep neural network, completing log-based webshell detection.

Behavior-based detection method

Behavior-based detection exploits the difference between the behaviors a webshell performs on the system while it is active and those of normal page files. Behavior analysis of common webshells shows that, while running, webshells often perform file reads and writes, network listening, database connections and the like. The runtime behaviors of the webshell are collected with techniques such as hooking and RASP, and detection is performed with a suitable rule or algorithm.

File-based detection method

File-based detection is a technique that statically detects webshells by exploiting differences between webshells and normal page files in hash values, attributes, text keywords, statistical indicators and the like. One method performs statistical analysis on all collectable webshell samples and computes file hashes to form a webshell fingerprint database. A detection method based on a locality-sensitive hashing algorithm can, to a certain extent, avoid bypasses caused by local modification and obfuscation. To counter common webshell detection evasion techniques, features such as information entropy, longest word, index of coincidence and compression ratio can be extracted to judge effectively whether a sample has been obfuscated to evade detection. Another method performs webshell detection based on semantic analysis: it extracts a taint subtree using the abstract syntax tree and a manually defined risk feature library, calculates the risk degree of the file, and makes a qualitative judgment against a manually set threshold.
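As an illustration of the statistical features named above, the following minimal Python sketch computes information entropy, longest word, index of coincidence and compression ratio for a source string. The function names and the choice of zlib for compression are assumptions made for illustration; the cited prior-art methods do not specify them here, and no classification threshold is implied.

```python
import math
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte; encoded or encrypted text tends to score higher."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def longest_word(text: str) -> int:
    """Length of the longest whitespace-delimited run of characters."""
    return max((len(word) for word in text.split()), default=0)


def index_of_coincidence(text: str) -> float:
    """Probability that two letters drawn without replacement are identical."""
    letters = [c for c in text.lower() if c.isalpha()]
    n = len(letters)
    if n < 2:
        return 0.0
    return sum(k * (k - 1) for k in Counter(letters).values()) / (n * (n - 1))


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (zlib is an illustrative choice)."""
    return len(zlib.compress(data)) / len(data) if data else 0.0


def extract_features(source: str) -> dict:
    """Collect the four statistical features for one file's contents."""
    raw = source.encode("utf-8")
    return {
        "entropy": shannon_entropy(raw),
        "longest_word": longest_word(source),
        "index_of_coincidence": index_of_coincidence(source),
        "compression_ratio": compression_ratio(raw),
    }
```

A detector built on such features would still need labeled samples or a manually chosen threshold to turn the numbers into a verdict.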

Although many domestic and foreign webshell detection methods can effectively detect webshells, some problems exist. The summary is as follows:

(1) Flow-based detection

In practice, current flow-based detection performs worse than feature-based detection, and its accuracy and reliability are insufficient. It can only detect the act of uploading or accessing a webshell and cannot detect webshells that already exist in a website but are not in use. It also suffers from a heavy workload and a high false alarm rate, and requires further in-depth research.

(2) Log-based detection

A large volume of log records burdens server performance, and because the amount of logs is huge, the detection process is time-consuming and slow. In addition, a webshell backdoor can imitate normal database operations, has no obvious static attributes, and, with few accesses, does not form obvious access features, so it is difficult to find through log analysis.

(3) Behavior-based detection

Behavior-based webshell detection usually requires deploying a probe in the production environment and instrumenting and hooking the underlying web middleware and PHP (Hypertext Preprocessor). This not only affects service performance; once the probe fails, the production environment may behave abnormally and normal service operation may be disrupted, causing major incidents.

(4) File-based detection

In traditional file-based detection, feature selection is insufficient: a single static or dynamic feature cannot fully represent complex and changeable webshell files, so a high detection rate is difficult to achieve. Moreover, because PHP is a dynamic, weakly typed language, its parameter passing, type conversion and function calling are very flexible; this is convenient for developers but also gives attackers many new ways to construct one-sentence webshells. As a result, one-sentence webshells are highly flexible and difficult to detect, and existing methods detect PHP one-sentence webshells poorly and with low accuracy.

The file-based webshell detection methods above each have their own advantages and disadvantages; research into the characteristics of webshells themselves is rare, and most methods remain at the level of statistical features and script content features. As webshell attack and defense continue to evolve, attackers often hide webshell text features through obfuscation and encryption to avoid detection. Because existing methods detect PHP one-sentence webshells poorly, the invention provides a new method based on file feature detection: taking a PHP file as an example, it takes the PHP dynamic characteristics as the core characteristics and combines dynamic and static analysis techniques, overcoming the defects of existing methods and further improving webshell detection capability, which is of great significance in helping security personnel locate backdoors quickly and accurately and in improving emergency response speed.

FIG. 1 is a schematic diagram of the basic flow of a webshell detection method according to an embodiment of the present invention. The exemplary embodiment takes as an example a one-sentence webshell detection method based on PHP dynamic characteristics, applied to the PHP programming language, and a corresponding detection tool is built according to the method. The method specifically includes the following steps:

In step 110, identifying dynamic characteristics of the file to be detected includes: generating an abstract syntax tree of the file to be detected after lexical analysis and syntactic analysis are carried out on the file; wherein the dynamic characteristics are defined as: a change to a certain section of code in the file to be detected causes or may cause a change to the function of the code;

The nature of a one-sentence webshell means that every one-sentence webshell must use the dynamic characteristics; identifying one-sentence webshells by their dynamic characteristics is therefore highly targeted.

By collecting a large number of one-sentence webshell samples and performing statistics and analysis, it is found that a one-sentence webshell file has at least one of the following four broad classes of dynamic characteristics: code execution, function call, class creation and method call, and file inclusion. In the generated abstract syntax tree, these dynamic characteristics are represented by nodes of the tree.

For a given section of code, when a change to the value of one of its variables causes or may cause a functional change in that section of code, the code has a dynamic characteristic. The functional change is directly reflected in the running result of the code and can be derived from the statistics and analysis results; on this basis, a set of detection methods for PHP dynamic characteristics is established.

On the basis of identifying the PHP dynamic characteristics, whether the code is a one-sentence webshell is further determined by analyzing the data flow of the call parameters and judging whether the call parameters are marked as tainted.

Because attackers often hide webshell text features through obfuscation and encryption to avoid detection, and traditional methods usually identify webshells by text features combined with machine learning or neural networks, the present method detects webshells by the dynamic characteristics of the PHP script file rather than by webshell text features.

The PHP dynamic characteristics are identified through the abstract syntax tree (AST). File features used in traditional approaches are insufficient to characterize a one-sentence webshell, so the method identifies webshells through PHP dynamic characteristics. However, since not every program that uses a dynamic characteristic is a webshell, static taint-tracking data flow analysis is used to further improve detection accuracy.

In step 120, recording nodes with dynamic characteristics in the abstract syntax tree, and detecting whether a taint variable exists in each node with dynamic characteristics through Trojan analysis;

By traversing each node of the abstract syntax tree, the function call or method call nodes that conform to the definition of PHP dynamic characteristics are found, and specific information such as the line number, function (method) name and parameter names is recorded.

Trojan analysis is a static analysis method. In static code analysis, lexical analysis is the process of converting the character sequence of the program source code into a sequence of meaningful tokens (keywords). Lexical analysis is the basis of the subsequent work in the present method; at this stage the source code is read line by line, from left to right, and the keyword sequence is generated.

And sending the result generated by the lexical analyzer to a syntax analyzer to complete the generation of the abstract syntax tree.

And performing static taint analysis according to the obtained parameter name, and analyzing whether the parameter is directly or indirectly controllable by a user.

In step 130, when a taint variable exists in a node with the dynamic characteristic, determining that the file to be detected is at risk; and when no node with the dynamic characteristic has a taint variable, determining that the file to be detected is safe.

The static taint analysis process can be divided into three parts: the taint source, taint propagation, and the taint sink (the dangerous convergence point).

When the taint is determined to exist, the node can be determined to have the risk.

Taking a PHP file as an example, the method of the invention takes the PHP dynamic characteristics as the core characteristics and combines dynamic and static analysis techniques, overcoming the defects of existing methods and further improving webshell detection capability; it is of great significance in helping security personnel locate backdoors quickly and accurately and in improving emergency response speed.

Specifically, in an implementation manner of the exemplary embodiment of the present invention, the method is implemented by an analysis engine for PHP one-sentence webshells. The analysis engine mainly includes a code interpretation engine, a dynamic characteristic detection engine, a data flow analysis engine, a user interface module, a webshell rule base, and a risk reporting module; the constructed model is shown in FIG. 2.

With reference to FIG. 2, the code interpretation engine performs lexical analysis and syntax analysis on the PHP program specified by the user and generates an abstract syntax tree (AST), providing the basis for subsequent analysis.

The dynamic characteristic detection engine is used for traversing the abstract syntax tree generated in the last step, judging whether each node of the abstract syntax tree has the PHP dynamic characteristic or not, and recording the information of the node.

Based on the analysis of the previous step, the data flow analysis engine uses the PHP one-sentence Trojan taint analysis technique to detect whether a function or method with PHP dynamic characteristics uses a taint variable.

Based on the analysis of the previous step, the dynamic detection engine judges whether a function name is encrypted or obfuscated, and restores the function name dynamically through simulated execution.

The webshell rule base uses the rules built into the tool to judge whether a function or method that uses a taint parameter is at risk of being a webshell. The user can also add custom rules to this module to improve the detection capability and quality of the detection tool.
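A rule base with built-in rules and user-defined extensions could be sketched as follows in Python. The `Rule` and `RuleBase` classes, the category labels, and the specific built-in entries (`eval`, `assert`, `call_user_func`, `include`) are hypothetical examples chosen for illustration; the source does not list the rules shipped with the tool.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Rule:
    name: str      # PHP function or construct name, e.g. "eval"
    category: str  # one of the four dynamic-characteristic classes
    note: str = "" # free-form description shown in the report


@dataclass
class RuleBase:
    rules: Dict[str, Rule] = field(default_factory=dict)

    def add(self, rule: Rule) -> None:
        """Register or overwrite a rule; PHP function names are case-insensitive."""
        self.rules[rule.name.lower()] = rule

    def match(self, function_name: str) -> Optional[Rule]:
        """Return the rule for a function name, or None if it is not listed."""
        return self.rules.get(function_name.lower())


def default_rule_base() -> RuleBase:
    """Illustrative built-in rules only; a real rule base would be far larger."""
    base = RuleBase()
    for rule in (
        Rule("eval", "code_execution"),
        Rule("assert", "code_execution"),
        Rule("call_user_func", "function_call"),
        Rule("include", "file_inclusion"),
    ):
        base.add(rule)
    return base
```

Keeping rules as plain data means a newly observed webshell feature can be added with a single `add` call, without changing the analysis engines.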

The user interface module is mainly responsible for interacting with the user, including configuring the scan file path, customizing scanning rules, customizing the whitelist, configuring logs, and selecting the output format of results.

The risk report module is used for generating a final report which is convenient for a user to read and analyze according to the result of the analysis engine.

Specifically, referring to fig. 3, a main process flow of the engine analysis tool includes:

After receiving the path of the file (or folder) to be detected specified by the user, together with configuration items such as the scanning whitelist, the tool enters the scanning state.

The code analysis engine first loads the standard PHP files in the file or folder specified by the user and excludes other files to improve scanning speed. In addition, the scanning tool further improves scanning quality and efficiency by applying scanning strategies tailored to common content management systems (CMS): it judges from feature files whether the specified target is a CMS and, if so, loads the corresponding scanning rules.
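The preprocessing described above can be sketched in Python as follows. The function names, the whitelist keyed by file name, and the mapping from a CMS name to a feature file are all assumptions for illustration; the source does not say how the whitelist or the CMS feature files are represented.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Optional


def collect_php_files(
    root,
    whitelist: Iterable[str] = (),
    extensions: Iterable[str] = (".php",),
) -> List[Path]:
    """Return the PHP files under root, skipping other files and whitelisted names."""
    root = Path(root)
    skip = set(whitelist)
    wanted = {ext.lower() for ext in extensions}
    if root.is_file():
        candidates = [root]
    else:
        candidates = sorted(p for p in root.rglob("*") if p.is_file())
    return [
        p for p in candidates
        if p.suffix.lower() in wanted and p.name not in skip
    ]


def detect_cms(root, signatures: Dict[str, str]) -> Optional[str]:
    """Return the first CMS whose feature file exists under root, else None.

    signatures maps a CMS name to a relative path of a file that identifies it
    (hypothetical data supplied by the caller).
    """
    root = Path(root)
    for cms_name, feature_file in signatures.items():
        if (root / feature_file).is_file():
            return cms_name
    return None
```

A caller would first run `detect_cms` to pick CMS-specific rules and a whitelist, then pass the whitelist to `collect_php_files` to obtain the files that go on to lexical analysis.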

After a series of preprocessing, the PHP file to be scanned is delivered to a code analysis engine for analysis:

Lexical analysis is performed first. In static code analysis, lexical analysis is the process of converting the character sequence of the program source code into a sequence of meaningful tokens (keywords). Lexical analysis is the basis of the subsequent work of the tool; in this phase the parsing engine reads the source code line by line, from left to right, and generates the keyword sequence. The specific flow is shown in FIG. 4.
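To make the lexical analysis step concrete, the following Python sketch tokenizes a very small subset of PHP line by line and from left to right. It is a toy lexer written for illustration, not the tool's actual analyzer: the token classes and regular expressions are assumptions, and it does not handle comments, escapes, heredocs or multi-line strings.

```python
import re
from typing import List, Tuple

# Token classes for a tiny PHP subset (illustrative, not a full PHP lexer).
TOKEN_SPEC = [
    ("OPEN_TAG", r"<\?php"),
    ("CLOSE_TAG", r"\?>"),
    ("VARIABLE", r"\$[A-Za-z_][A-Za-z0-9_]*"),
    ("STRING", r"'[^']*'|" + r'"[^"]*"'),
    ("NAME", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUMBER", r"\d+"),
    ("PUNCT", r"[()\[\]{};,=.]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> List[Tuple[int, str, str]]:
    """Return (line number, token kind, token text) triples in source order."""
    tokens = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        pos = 0
        while pos < len(line):
            match = TOKEN_RE.match(line, pos)
            if match is None:
                raise ValueError(f"unexpected character {line[pos]!r} on line {line_no}")
            if match.lastgroup != "SKIP":  # whitespace carries no meaning here
                tokens.append((line_no, match.lastgroup, match.group()))
            pos = match.end()
    return tokens
```

For the input `<?php eval($_POST['x']); ?>` this yields an opening tag, the name `eval`, punctuation, the variable `$_POST`, the string `'x'`, more punctuation, and a closing tag, each tagged with line number 1; a syntax analyzer would consume that sequence to build the tree.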

The result generated by the lexical analyzer is sent to the syntax analyzer to generate the abstract syntax tree. Taking the most common one-sentence webshell as an example, the generated abstract syntax tree is shown in FIG. 5:

Based on the abstract syntax tree generated by the code analysis engine, each node of the tree is traversed to find the function call or method call nodes that conform to the definition of PHP dynamic characteristics, and specific information such as their line numbers, function (method) names and parameter names is recorded.
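The traversal can be sketched in Python over a simplified, hand-built tree. The `Node` and `Finding` classes, the node kinds, and the `DYNAMIC_NAMES` set are hypothetical stand-ins: a real implementation would walk the tree produced by a PHP parser and take the list of dynamic functions from the rule base.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    kind: str                      # e.g. "file", "call", "method_call"
    line: int
    name: str = ""                 # callee name; "$f" denotes a variable function
    params: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


@dataclass
class Finding:
    line: int
    name: str
    params: List[str]


# Illustrative examples of callees treated as dynamic characteristics.
DYNAMIC_NAMES = {"eval", "assert", "call_user_func", "include", "require"}


def is_dynamic(node: Node) -> bool:
    """A call is dynamic if its callee is a variable or a listed function."""
    if node.kind not in ("call", "method_call"):
        return False
    return node.name.startswith("$") or node.name.lower() in DYNAMIC_NAMES


def find_dynamic_nodes(root: Node) -> List[Finding]:
    """Depth-first, left-to-right walk that records every dynamic call node."""
    findings = []
    stack = [root]
    while stack:
        node = stack.pop()
        if is_dynamic(node):
            findings.append(Finding(node.line, node.name, list(node.params)))
        stack.extend(reversed(node.children))  # keep source order
    return findings
```

Each `Finding` carries exactly the information the text says is recorded (line number, function or method name, parameter names), which is what the subsequent taint analysis takes as its sinks.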

Static taint analysis is performed on the obtained parameter names to analyze whether each parameter is directly or indirectly controllable by the user. The static taint analysis process can be divided into three parts: the taint source, taint propagation, and the taint sink. The taint propagation process is shown in FIG. 6:

First, a control flow graph (CFG) of the program is generated from the abstract syntax tree (AST) produced by the code analysis engine, to facilitate subsequent data flow analysis.

Second, assignments to the dangerous parameters are searched for in the current file, including files pulled in through PHP constructs such as include and require. The search also checks whether a user-input taint source exists, such as the superglobal arrays $_GET and $_POST.

Third, the taint sink position is determined from the sensitive functions or methods obtained by the code dynamic characteristic detection engine.

In one implementation of the exemplary embodiments of this invention, the method includes:

A control flow graph of the program is used to analyze whether a valid path exists between the data input source and the sensitive function, and taint analysis is performed on each valid path. Because PHP's rich set of built-in functions affects how taint spreads, different taint propagation strategies are set for different operations. For an assignment, the left operand inherits the taint attribute of the right operand: if any operand on the right is tainted, the left operand is marked as tainted data. Some functions, such as md5 and sha1, can be treated as safe functions for all sensitive points, and the taint attribute disappears after data is processed by them. For functions such as base64_encode, the calling order is stored in a stack, and if the data is later decoded by the matching function such as base64_decode, the original taint attribute is restored.
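The three propagation strategies named above (assignment, safe functions, and encode/decode pairs tracked on a stack) can be sketched as below. The Value class and the helpers assign and call are hypothetical names, and treating an encoded value as untainted until it is decoded is one reading of the text rather than a confirmed detail of the tool.

```python
from dataclasses import dataclass, field

SANITIZERS = {"md5", "sha1"}                      # safe functions named in the text
DECODER_FOR = {"base64_encode": "base64_decode"}  # encoder -> matching decoder

@dataclass
class Value:
    tainted: bool = False
    # Stack of (encoder name, taint flag before encoding); innermost call is last.
    encodings: list = field(default_factory=list)

def assign(*right_operands):
    """The left operand is tainted if any right operand is tainted."""
    if len(right_operands) == 1:
        src = right_operands[0]
        return Value(src.tainted, list(src.encodings))
    return Value(any(v.tainted for v in right_operands))

def call(func, arg):
    """Apply one propagation policy for a single-argument function call."""
    if func in SANITIZERS:
        return Value(False)                             # taint disappears
    if func in DECODER_FOR:
        # Encoding suspends the taint and remembers it on the stack.
        return Value(False, arg.encodings + [(func, arg.tainted)])
    if arg.encodings and DECODER_FOR[arg.encodings[-1][0]] == func:
        _, was_tainted = arg.encodings[-1]
        return Value(was_tainted, arg.encodings[:-1])   # decoding restores it
    return Value(arg.tainted, list(arg.encodings))      # default: pass through

user_input = Value(tainted=True)                  # e.g. a value read from $_POST
hashed = call("md5", user_input)
round_trip = call("base64_decode", call("base64_encode", assign(user_input)))
```

Here hashed is clean, while round_trip is tainted again because the decode call pops the matching encode entry from the stack. A bare base64_decode applied to raw user input simply passes the taint through.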

If a variable sanitizing function or an encoding function exists on the path, the taint variable cannot reach the sensitive function through the current valid path, and taint propagation is interrupted. If the above process finds that a taint propagation path exists from the source to the sink, the file is determined to be at risk.
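The source-to-sink decision can be sketched as a path search that refuses to pass through sanitizing or encoding steps. The graph layout, the node labels and the function name tainted_paths are assumptions for illustration; here each node stands for one operation applied to the tracked value.

```python
def tainted_paths(graph, source, sink, blocking):
    """Return every simple path from source to sink that avoids blocking nodes."""
    paths, stack = [], [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == sink:
            paths.append(path)
            continue
        for nxt in graph.get(node, []):
            # Skip nodes already on the path (loops) and sanitizing/encoding nodes.
            if nxt not in path and nxt not in blocking:
                stack.append((nxt, path + [nxt]))
    return paths

# Hypothetical flow: $_POST reaches eval either through md5() or through a plain assignment.
graph = {
    "$_POST": ["md5", "assign_a"],
    "md5": ["eval"],
    "assign_a": ["eval"],
}
risky = tainted_paths(graph, "$_POST", "eval", blocking={"md5"})
safe = tainted_paths({"$_POST": ["md5"], "md5": ["eval"]}, "$_POST", "eval", blocking={"md5"})
```

A non-empty result means at least one taint propagation path survives, so the file would be reported as at risk; an empty result means every path was interrupted.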

Based on the above result, if an encrypted or obfuscated webshell is present, the dynamic detection engine starts the sandbox, simulates program execution, and hooks the INIT_DYNAMIC_CALL instruction to obtain the function name before the call is executed. Combined with the static detection described above, this allows an accurate judgment to be made for such webshells.

After the security analysis is finished, the risks found are stored on a local disk and submitted to a user for review.

In one implementation of the exemplary embodiments of this invention, the method further comprises:

In the invention, a control flow graph (CFG) of the program is generated from the abstract syntax tree (AST) produced by the code analysis engine. The CFG is an abstract data structure used in compilers that represents all paths that may be traversed during program execution, which makes the subsequent data flow analysis more convenient.

The taint sink position is determined from the sensitive function or method identified by the code dynamic characteristic detection engine, in combination with the taint propagation process shown in FIG. 6.
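As a minimal sketch of deriving a control flow graph from a syntax tree, the following builds an edge map for straight-line statements and if/else branches only. The statement format, the "entry" node and the function name build_cfg are assumptions for illustration; loops, returns and function calls are deliberately left out.

```python
def build_cfg(stmts, edges=None, preds=("entry",)):
    """Add edges for a statement list; return (edges, exit nodes of the list)."""
    if edges is None:
        edges = {}
    preds = list(preds)
    for stmt in stmts:
        label = stmt["label"]
        for p in preds:                       # every predecessor flows into this statement
            edges.setdefault(p, []).append(label)
        if stmt["type"] == "if":
            _, then_exits = build_cfg(stmt["then"], edges, [label])
            _, else_exits = build_cfg(stmt.get("else", []), edges, [label])
            # Both branches rejoin at the next statement; drop duplicate exits.
            preds = list(dict.fromkeys(then_exits + else_exits))
        else:
            preds = [label]
    return edges, preds

program = [
    {"type": "stmt", "label": "a = $_GET['x']"},
    {"type": "if", "label": "if (a)",
     "then": [{"type": "stmt", "label": "b = md5(a)"}],
     "else": [{"type": "stmt", "label": "b = a"}]},
    {"type": "stmt", "label": "eval(b)"},
]
edges, exits = build_cfg(program)
```

The resulting graph has two paths into eval(b), one through md5 and one through the plain assignment, which is exactly the kind of branching that path-by-path taint analysis has to consider.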

In one implementation of the exemplary embodiment of the present invention, because library files in some web frameworks contain a large number of dynamic characteristics, a whitelist and white-sample library mechanism is established to reduce false alarms, and dynamic characteristics are filtered through this mechanism.

This part is an optimization for content management systems (CMS). Analysis of the code structure of a large number of PHP content management systems shows that library files of existing PHP CMS frameworks contain many dynamic characteristics, and that these files are normal code written by developers taking advantage of PHP's flexibility. Constructing an MD5 whitelist from samples of the different CMS framework libraries both reduces false alarms and speeds up scanning.

Meanwhile, different CMS frameworks provide different filtering functions. To handle this, the scheme provided by the invention adopts a customizable scanning mechanism that adds the filtering functions of each framework to the safe functions used by static taint tracking, so as to ensure the accuracy of the taint analysis results.
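The MD5 whitelist can be sketched with Python's standard hashlib module. The helper names md5_of, build_whitelist and needs_scan, and the sample file contents, are assumptions for illustration; how the tool stores its white-sample library is not specified in the text.

```python
import hashlib

def md5_of(data: bytes) -> str:
    """Hex MD5 digest of a file's raw bytes."""
    return hashlib.md5(data).hexdigest()

def build_whitelist(known_good_files):
    """Collect digests of trusted CMS library files (contents passed in as bytes)."""
    return {md5_of(content) for content in known_good_files}

def needs_scan(content: bytes, whitelist) -> bool:
    """A file whose digest is whitelisted is skipped; anything else is analyzed."""
    return md5_of(content) not in whitelist

# Hypothetical framework file that legitimately uses a dynamic call.
framework_file = b"<?php call_user_func($callback, $args); ?>"
whitelist = build_whitelist([framework_file])

skip_framework = not needs_scan(framework_file, whitelist)
scan_modified = needs_scan(framework_file + b"eval($_POST['x']);", whitelist)
```

Because the digest covers the whole file, any modification to a whitelisted library file changes its MD5 and sends it back through the full analysis.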

The beneficial effects of the invention include:

Static detection capability for PHP dynamic characteristics:

PHP dynamic characteristics are the core of the method for detecting webshell Trojans. A detection method for PHP dynamic characteristics was formulated by collecting, counting and analyzing a large number of webshell Trojan samples. As noted above, every one-sentence webshell must use dynamic characteristics, so identifying one-sentence webshells by these characteristics is highly targeted. Once the PHP dynamic characteristics have been identified, the data flow of the call parameters is analyzed to judge whether they are taint-marked, which further determines whether the code is a webshell Trojan.

Dynamic detection capability for obfuscated webshells:

Static analysis converts the program code into an intermediate form such as an abstract syntax tree and analyzes it without actually running the program, whereas dynamic analysis checks the safety of the program while it is actually running. Analysis of the opcode instructions of obfuscated webshells shows that they share a common characteristic: they make dynamic calls, that is, an INIT_DYNAMIC_CALL instruction is present at the opcode level. Therefore, this instruction is hooked in the Zend engine so that the function name is obtained before execution, after which normal execution continues. Combined with the static detection described above, an accurate judgment can then be made for such webshells.
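The idea of observing a function name just before a dynamic call is dispatched can be illustrated with a toy interpreter. This is only an analogy written in Python: it does not use the Zend engine or its opcode handlers, and the Sandbox class, its stubbed function table and the method names are all assumptions for illustration.

```python
class Sandbox:
    """Toy interpreter: only the stub functions below can be called, nothing real runs."""

    def __init__(self):
        self.observed = []                       # names seen by the hook, in call order
        self.functions = {
            "strrev": lambda s: s[::-1],         # stand-in for PHP's strrev
            "assert": lambda s: None,            # sensitive function, stubbed out
        }

    def hook_dynamic_call(self, name):
        # Runs before dispatch, mirroring a hook placed on the dynamic call instruction.
        self.observed.append(name)

    def dynamic_call(self, name, arg):
        self.hook_dynamic_call(name)
        return self.functions[name](arg)         # then continue normal execution

# Simulates:  $f = strrev("tressa");  $f($_POST['x']);
sandbox = Sandbox()
hidden_name = sandbox.functions["strrev"]("tressa")
sandbox.dynamic_call(hidden_name, "payload from user input")
```

The source text never contains the string "assert", so static matching on the name fails, but the hook records it at run time because the program has to rebuild the real name before it can make the call.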

Webshell detection feature expansion capability:

The calling modes of PHP webshells have been classified according to the characteristics of existing PHP webshells, and the classification basically covers existing design ideas. When a new one-sentence webshell design characteristic appears, a user can conveniently add it through the webshell rule base of the tool, further improving the accuracy of the detection tool.

Multiple operating system support:

Compared with other webshell detection tools, the system supports most current mainstream operating systems, including the Windows series, Unix, Linux and FreeBSD.

The invention also allows enterprises and individuals who carry out regular security checks to use the product to detect one-sentence webshell risks in existing business code, and to find and repair those risks. The code improvement flow is shown in fig. 7: the specified source code is scanned for risks with the tool of the invention, manually reviewed, repaired, and then re-scanned.

As shown in fig. 8, the invention further supports monitoring website source code in real time, ensuring the security of the website during operation by detecting whether a file uploaded by a user is a webshell. The application process is as follows: a monitoring path is configured and the uploaded file is submitted for detection; if the security risk detection passes, the file is determined to be safe, and if it does not pass, an alarm is raised and the vulnerability is fixed so that a Trojan is not implanted.

In an implementation manner of the exemplary embodiment of the present invention, the method further includes: generating an extensible webshell rule base; when a webshell design feature update occurs, expanding the updated feature in the extensible webshell rule base.

The user can conveniently expand the webshell characteristics through the webshell rule base of the tool, so that the accuracy of the detection tool is further enhanced.
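An extensible rule base can be sketched as a registry to which new features are added at run time. For brevity this sketch matches regular expressions against source text instead of syntax tree nodes, which is a simplification and not the approach described above; the RuleBase class and the rule names are hypothetical.

```python
import re

class RuleBase:
    """Named detection rules that can be extended without changing the scanner."""

    def __init__(self):
        self.rules = {}

    def add_rule(self, name, pattern):
        self.rules[name] = re.compile(pattern)

    def match(self, source):
        """Return the sorted names of all rules that fire on the source text."""
        return sorted(name for name, rx in self.rules.items() if rx.search(source))

rules = RuleBase()
rules.add_rule("eval_call", r"\beval\s*\(")
rules.add_rule("variable_function", r"\$\w+\s*\(")

sample = "<?php $f = create_function('', $_GET['c']); $f(); ?>"
before = rules.match(sample)

# A newly observed webshell feature is added to the rule base.
rules.add_rule("create_function", r"\bcreate_function\s*\(")
after = rules.match(sample)
```

Adding the rule changes the result for the same sample without touching the matching code, which is the property the extensible rule base is meant to provide.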

To counter common webshell obfuscation techniques, PHP sandbox technology is introduced to run the object under test in simulation. Because a program must undo its obfuscation at run time, the name of a sensitive function can be obtained before the function is executed. Combined with the static detection provided by the invention, an accurate judgment can then be made for the webshell.

Once the PHP dynamic characteristics have been identified, the data flow of the call parameters is analyzed to judge whether they are taint-marked, which further determines whether the code is a webshell Trojan.

To counter existing techniques by which one-sentence webshells evade detection, the invention provides a new approach for webshells whose function names are obfuscated or encrypted: program execution is simulated dynamically in a sandbox, the risky functions are recovered with hook technology, and, together with the results of static analysis, this gives better detection capability for one-sentence webshells.

When a large amount of dynamic characteristics exist in the content management system, the invention introduces a white list and white sample library mechanism, which can not only accelerate the scanning speed, but also reduce false alarm.

It is to be appreciated that the method can be performed by any apparatus, device, platform, cluster of devices having computing and processing capabilities.

Based on the same inventive concept, fig. 9 is a schematic structural diagram of a webshell detection apparatus provided in an embodiment of the present invention. The apparatus may be implemented by software and/or hardware, is generally integrated in an intelligent terminal, and can carry out the webshell detection method. As shown in the figure, the present embodiment provides a webshell detection apparatus corresponding to any of the foregoing webshell detection method embodiments, and the apparatus mainly includes an identification module 910, an analysis module 920, and a determination module 930.

The identifying module 910 is configured to identify a dynamic characteristic of a file to be detected, and includes: after syntactic analysis and lexical analysis are carried out on a file to be detected, an abstract syntax tree of the file to be detected is generated; wherein the dynamic characteristics refer to: there is a dynamic call instruction;

the analysis module 920 is configured to record nodes with dynamic characteristics in the abstract syntax tree, and detect whether a taint variable exists in each node with dynamic characteristics through taint analysis;

the determining module 930 is configured to determine that the file to be detected has a risk when the node with the dynamic characteristic has a taint variable; and when the node with the dynamic characteristic has no taint variable, determining the safety of the file to be detected.

In an implementation of the exemplary embodiment of the invention, the analysis module is further configured to:

In an implementation of the exemplary embodiment of the invention, the apparatus further comprises:

the flow diagram module is used for generating a control flow diagram of the file to be detected according to the abstract syntax tree;

and the path analysis module is used for analyzing paths between data input sources of the control flow graph and the nodes with the dynamic characteristics, and determining that the nodes with the dynamic characteristics are at risk when taint transfer exists.

the encryption judgment module is used for judging whether the file to be detected is in a data encryption or confusion state;

and the preprocessing module is used for preprocessing the given file, determining the file to be scanned and analyzed.

the database building module is used for generating an extensible webshell rule base;

and the expansion module is used for expanding the updated characteristics in the expandable webshell rule base when the webshell design characteristics are updated.

In an implementation manner of the exemplary embodiment of the present invention, the apparatus further includes a white list module, configured to:

when the number of the dynamic features exceeds a preset value, a white list and a white sample library mechanism are established, and the dynamic features are filtered through the white list and the white sample mechanism;

For convenience of description, the above devices are described with their functions divided into various modules. When the exemplary embodiment of the present invention is implemented, the functions of the modules may of course be implemented in the same software and/or hardware. The webshell detection device provided in the above embodiment may execute the webshell detection method provided in any embodiment of the present invention, and has the corresponding functional modules and beneficial effects for executing the method.

Based on the same inventive concept, corresponding to any of the above-mentioned embodiments, one or more embodiments of the present specification further provide an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the webshell detection method according to any of the above embodiments is implemented.

The technology carrier involved in the embodiments of the present specification may include, for example, Near Field Communication (NFC), WIFI, 3G/4G/5G, POS machine card swiping technology, two-dimensional code scanning technology, barcode scanning technology, bluetooth, infrared, Short Message Service (SMS), Multimedia Message (MMS), and the like.

It should be noted that the method of the exemplary embodiment of the present invention may be executed by a single device, such as a computer or a server. The method of the embodiment can also be applied to a distributed scene and completed by the mutual cooperation of a plurality of devices. In such a distributed scenario, one of the devices may perform only one or more steps of the method according to the exemplary embodiment of the present invention, and the devices interact with each other to complete the webshell detection method.

The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. Of course, the functionality of the various modules may be implemented in the same one or more software and/or hardware in implementing the exemplary embodiments of this invention.

Fig. 10 is a schematic diagram illustrating a more specific hardware structure of an electronic device according to this embodiment, where the electronic device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively coupled to each other within the device via bus 1050.

The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present disclosure.

The memory 1020 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 can store an operating system and other application programs, and when the technical solution provided by the embodiment of the present disclosure is implemented by software or firmware, the relevant program codes are stored in the memory 1020 and called by the processor 1010 to execute the webshell detection method according to the embodiment of the present disclosure.

The input/output interface 1030 is used for connecting an input/output module to input and output information. The input/output module may be configured as a component in a device (not shown) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.

The communication interface 1040 is used for connecting a communication module (not shown in the drawings) to implement communication interaction between the present apparatus and other apparatuses. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, Bluetooth and the like).

Bus 1050 includes a path that transfers information between various components of the device, such as processor 1010, memory 1020, input/output interface 1030, and communication interface 1040.

It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.

The electronic device in the foregoing embodiment is used to implement the corresponding webshell detection method in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.

Based on the same inventive concept, and corresponding to the method of any exemplary embodiment of the present invention, exemplary embodiments of the present invention also provide a non-transitory computer readable storage medium. Such media include permanent and non-permanent, removable and non-removable media that can implement information storage by any method or technology. The information may be computer readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to: phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technologies, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium. Any of these may be used to store information that can be accessed by a computing device for performing the webshell detection method described in exemplary embodiments of the invention.

Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure, including the claims, is limited to those examples. Within the idea of the invention, technical features in the above embodiments or in different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the exemplary embodiments of the invention as described above, which are not provided in detail for the sake of brevity.

In addition, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures for simplicity of illustration and discussion, and so as not to obscure the exemplary embodiments of the invention. Furthermore, devices may be shown in block diagram form in order to avoid obscuring exemplary embodiments of the present invention, and this also takes into account the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the exemplary embodiments of the present invention are to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the invention, it should be apparent to one skilled in the art that example embodiments of the invention can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.

While the present invention has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the discussed embodiments.

The exemplary embodiments of the invention are intended to embrace all such alternatives, modifications and variances that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the exemplary embodiments of the invention are intended to be within the scope of the invention.

Claims

1. A webshell detection method is characterized by comprising the following steps:

2. The webshell detection method of claim 1, wherein the recording of the nodes with dynamic characteristics in the abstract syntax tree and the detecting of whether the nodes with dynamic characteristics have a taint variable through taint analysis comprise:

3. The webshell detection method of claim 1, further comprising:

4. The webshell detection method of claim 1, further comprising:

5. The webshell detection method of claim 1, wherein prior to identifying the dynamic characteristics of the file to be detected, the method further comprises:

6. The webshell detection method of claim 1, further comprising:

generating an extensible webshell rule base;

7. The webshell detection method of claim 1, further comprising:

8. A webshell detection device, comprising:

the identification module is used for identifying the dynamic characteristics of the file to be detected and comprises the following components: after syntactic analysis and lexical analysis are carried out on a file to be detected, an abstract syntax tree of the file to be detected is generated; wherein the dynamic characteristics refer to: changes to a certain section of code in the file to be detected can cause or possibly cause changes to the function of the code;

9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the webshell detection method of any of claims 1 to 7 when executing the program.

10. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the webshell detection method of any of claims 1 to 7.