WO2020000743A1

WO2020000743A1 - Webshell detection method and related device

Info

Publication number: WO2020000743A1
Application number: PCT/CN2018/108472
Authority: WO
Inventors: 刘立业
Original assignee: 平安科技（深圳）有限公司
Priority date: 2018-06-27
Filing date: 2018-09-28
Publication date: 2020-01-02
Also published as: CN108985057A; CN108985057B

Abstract

A malicious script backdoor (webshell) detection method and a related device, wherein the method comprises: upon receiving a security detection instruction, a server detects a file to be detected under a target directory and/or a target path according to the indication of the security detection instruction, and determines a feature hash value of the file to be detected (101); the server compares the feature hash value of the file to be detected with a sample feature hash value of a webshell sample in a pre-established hash fingerprint database (102); if a sample feature hash value matching the feature hash value exists, the server determines that the file to be detected is a webshell (103). The method helps improve the efficiency of webshell detection.

Description

Webshell detection method and related equipment

This application claims priority to Chinese patent application No. 201810685031.3, entitled "A Webshell Detection Method and Related Equipment", filed with the Chinese Patent Office on June 27, 2018, the entire contents of which are incorporated herein by reference.

Technical field

The present application relates to the field of computer technology, and in particular, to a webshell detection method and related equipment.

Background technique

A malicious script backdoor, or webshell, is a backdoor program through which a hacker controls a server. If a hacker finds that a web application has a file upload vulnerability, the hacker can upload a webshell for subsequent attacks and use it to secretly control the web server remotely: uploading, viewing, modifying, and deleting files on the website server, reading and modifying data in the website database, and even running system commands directly on the web server.

At present, webshell detection mainly relies on methods based on malicious code, string encoding, and dangerous functions. The code or functions of the file to be tested need to be compared with known malicious code and dangerous functions, which involves a large amount of calculation, takes a long time, and is inefficient.

Summary of the invention

The embodiments of the present application provide a webshell detection method and related equipment. When performing webshell detection, it is not necessary to compare the string, code, various functions, etc. of the file to be tested in real time, but only to compare the hash value, which is beneficial to improve webshell detection efficiency.

In a first aspect, an embodiment of the present application provides a webshell detection method. The method includes:

When a security detection instruction is received, detecting a file under test in a target directory and / or a target path according to the instruction of the security detection instruction, and determining a characteristic hash value of the file under test;

Comparing the characteristic hash value of the file to be tested with a sample characteristic hash value of a webshell sample in a pre-established hash fingerprint database;

If there is a sample feature hash value that matches the feature hash value, it is determined that the file to be tested is a webshell.

In a second aspect, an embodiment of the present application provides a webshell detection device. The webshell detection device includes a module for executing the method in the first aspect.

In a third aspect, an embodiment of the present application provides a server. The server includes a processor, a network interface, and a memory, which are connected to each other. The network interface is controlled by the processor to receive and send messages, and the memory is configured to store a computer program that supports the server in executing the foregoing method. The computer program includes program instructions, and the processor is configured to call the program instructions to execute the method of the first aspect.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium. The computer-readable storage medium stores a computer program, where the computer program includes program instructions, and the program instructions, when executed by a processor, cause the processor to execute the method of the first aspect.

In the embodiment of the present application, the server may compare the characteristic hash value of the file under test with the sample characteristic hash value of the malicious script backdoor (webshell) sample in the pre-established hash fingerprint database. If there is a sample characteristic hash value that matches the characteristic hash value, it is determined that the file to be tested is a webshell. The embodiments of the present application thus help improve the efficiency of webshell detection.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic flowchart of a webshell detection method according to an embodiment of the present application;

FIG. 2 is a schematic flowchart of another webshell detection method according to an embodiment of the present application;

FIG. 3 is a schematic flowchart of another webshell detection method according to an embodiment of the present application;

FIG. 4 is a schematic block diagram of a webshell detection device according to an embodiment of the present application;

FIG. 5 is a schematic block diagram of a server according to an embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, but not all of the embodiments. Based on the embodiments in this application, all other embodiments obtained by a person of ordinary skill in the art without creative efforts shall fall within the protection scope of this application.

Referring to FIG. 1, FIG. 1 is a schematic flowchart of a webshell detection method according to an embodiment of the present application. As shown in the figure, the webshell detection method may include:

101. When receiving a security detection instruction, the server detects a file under test in a target directory and / or a target path according to the instruction of the security detection instruction, and determines a characteristic hash value of the file under test.

In one embodiment, when operation and maintenance personnel find an intrusion event or find that a business system contains a webshell, they may input a security detection instruction to the server. The instruction is used to instruct the server to scan and detect files to be tested in a specified directory or a specified path; for example, the security detection instruction may instruct the server to scan and detect the files to be tested in directory F. Further, after receiving the security detection instruction input by the user, the server can detect any one of the files to be tested in the specified directory or the specified path according to the user's instruction, and determine the characteristic hash value of the file to be tested through a text similarity algorithm.

In one embodiment, before the server detects the files to be tested in the target directory and/or the target path according to the security detection instruction, the server may further parse the security detection instruction to determine the target directory and/or the target path, obtain the extensions of all files in the target directory and/or the target path, and determine the files with preset extensions in the target directory and/or the target path as the files to be tested. In this way, it is not necessary to scan all files in the specified directory or the specified path, which reduces the amount of calculation and further improves the efficiency of webshell detection. The preset extensions may be extensions other than those of the safe file types designated by the operation and maintenance personnel; for example, they do not include extensions such as .doc, .pdf, and .rar.

Exemplarily, suppose the operation and maintenance personnel want to perform webshell detection on the files in directory F whose extensions are preset extensions. In this case, they can enter a security detection instruction on the server instructing it to perform webshell detection on the files in directory F (that is, the target directory) whose extensions are preset extensions (that is, the files to be tested). Further, after receiving the security detection instruction, the server may determine directory F as the target directory, identify the extensions of all the files in directory F, determine the files with the preset extensions in directory F as the files to be tested, and perform webshell detection on those files.
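The extension-based selection of files to be tested can be illustrated with a minimal Python sketch. The extension set `PRESET_EXTENSIONS`, the function name `select_files_to_test`, and the choice to walk subdirectories recursively are assumptions made for illustration; the application does not specify them.

```python
from pathlib import Path

# Hypothetical preset extensions: script types worth scanning. This set is an
# assumption for illustration, not a list taken from the application.
PRESET_EXTENSIONS = {".php", ".jsp", ".asp", ".aspx"}


def select_files_to_test(target_dir, preset_extensions=PRESET_EXTENSIONS):
    """Return the files under target_dir whose extension is a preset extension.

    Subdirectories are searched recursively (an assumption), and the
    extension comparison is case-insensitive.
    """
    root = Path(target_dir)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in preset_extensions
    )
```

Files such as `report.pdf` or `archive.rar` are skipped without being read, which is where the reduction in calculation comes from.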

102. The server compares a characteristic hash value of the file to be tested with a sample characteristic hash value of a webshell sample in a pre-established hash fingerprint database.

103. If there is a sample characteristic hash value that matches the characteristic hash value, the server determines that the file to be tested is a webshell.

In one embodiment, the server may obtain webshell samples of different scripting languages in advance, determine the sample feature hash value of each webshell sample through a text similarity algorithm, and then store the webshell samples of different scripting languages and the sample feature hash value of each webshell sample in the hash fingerprint database, thereby establishing the hash fingerprint database. In this case, after the server determines the characteristic hash value of the file to be tested, it may compare that value with the sample characteristic hash value of any one or more webshell samples in the fingerprint database. If it is determined that the feature hash value matches any of the sample feature hash values (for example, the Hamming distance between the feature hash value and any sample feature hash value is equal to the Hamming distance threshold, or the matching value between the feature hash value and any sample feature hash value is equal to the matching threshold), it is determined that the file under test is a webshell. With such a detection method, it is not necessary to compare the character strings, code characteristics, functions, etc. of the file to be tested in real time; only hash values are compared, which helps improve the efficiency of webshell detection.

In the embodiment of the present application, when receiving the security detection instruction, the server may detect the file under test in the target directory and/or the target path according to the security detection instruction, and determine a characteristic hash value of the file under test. The feature hash value of the file to be tested is compared with the sample feature hash value of the webshell sample in the pre-established hash fingerprint database. If there is a sample feature hash value that matches the characteristic hash value, it is determined that the file to be tested is a webshell. The embodiments of the present application thus help improve the efficiency of webshell detection.

Referring to FIG. 2, FIG. 2 is a schematic flowchart of another webshell detection method according to an embodiment of the present application. As shown in the figure, the webshell detection method may include:

201. The server obtains webshell samples of different scripting languages.

202. The server uses N types of text similarity algorithms to determine N sample feature hash values of each webshell sample under the N types of text similarity algorithms, where N is a positive integer.

203. The server establishes a hash fingerprint database according to the webshell samples of different scripting languages and the N sample feature hash values of each webshell sample under the N text similarity algorithms.

In one embodiment, the server can obtain webshell samples of different scripting languages and use different text similarity algorithms to calculate the hash values of each webshell sample. That is, for the same webshell sample, N algorithms correspond to N hash values. Existing detection mostly targets script backdoors based on ASP, ASP.NET, PHP, and similar languages, with insufficient support for other scripting languages such as JSP; obtaining webshell samples of different scripting languages overcomes this problem and helps support detection of files in multiple scripting languages. The text similarity algorithms may include the simhash algorithm, the ssdeep algorithm, and the like.

Further, the server may store all the obtained webshell samples and the N sample feature hash values corresponding to the respective webshell samples in a hash fingerprint database. For example, when the text similarity algorithms include the two algorithms simhash and ssdeep, the hash fingerprint database can reflect the correspondence shown in Table 1-1:

Table 1-1

webshell samples	hash under simhash	hash under ssdeep
a	1000000001	101000000001
b	1000000010	101100000000

Alternatively, the server may establish a separate sub-hash fingerprint database for each text similarity algorithm, that is, the hash fingerprint database includes one or more sub-hash fingerprint databases. For example, when the text similarity algorithms include the two algorithms simhash and ssdeep, the hash fingerprint database can include a sub-hash fingerprint database for the simhash algorithm and one for the ssdeep algorithm, and the relationships reflected by these sub-hash fingerprint databases can be as shown in Tables 1-2 and 1-3. With a sub-hash fingerprint database for each text similarity algorithm, when a target algorithm (such as the simhash algorithm or the ssdeep algorithm) is used for text similarity comparison, the sample feature hash values under the target algorithm can be obtained directly from the corresponding sub-hash fingerprint database, which further improves webshell detection efficiency.

Table 1-2

webshell samples	hash under simhash
a	1000000001
b	1000000010

Table 1-3

webshell samples	hash under ssdeep
a	101000000001
b	101100000000
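The two database layouts described above (one combined database, or one sub-hash fingerprint database per algorithm) can be sketched with plain Python dictionaries. The variable and function names are hypothetical, and the sample names and bit strings are short placeholders in the style of the tables, not real simhash or ssdeep outputs.

```python
# Combined layout: webshell sample -> {algorithm name -> sample feature hash}.
# All names and bit strings here are illustrative placeholders.
fingerprint_db = {
    "a": {"simhash": "1000000001", "ssdeep": "101000000001"},
    "b": {"simhash": "1000000010", "ssdeep": "101100000000"},
}


def split_by_algorithm(db):
    """Build one sub-hash fingerprint database per text similarity algorithm."""
    sub_dbs = {}
    for sample, hashes in db.items():
        for algorithm, value in hashes.items():
            sub_dbs.setdefault(algorithm, {})[sample] = value
    return sub_dbs


sub_dbs = split_by_algorithm(fingerprint_db)
# sub_dbs["simhash"] now holds only the simhash values, so a scan that uses
# simhash as its target algorithm never touches the ssdeep entries.
```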

204. When the server receives the security detection instruction, it detects the current operating environment and determines a target algorithm that matches the current operating environment from a preset list of text similarity algorithms. The target algorithm is the first text similarity algorithm or the second text similarity algorithm. The preset text similarity algorithm list includes the N types of text similarity algorithms.

The N text similarity algorithms included in the text similarity algorithm list are consistent with the N text similarity algorithms used when the hash fingerprint database is established in step 202.

In one embodiment, the N types of text similarity algorithms may include at least one of a first text similarity algorithm and a second text similarity algorithm. The first text similarity algorithm may be, for example, the simhash algorithm, and the second text similarity algorithm may be, for example, the ssdeep algorithm. Both are similarity detection algorithms, and either may be used to generate a feature hash value for a suspect sample (i.e., a file to be tested), which is then compared with the sample feature hash value of any webshell sample in the above-mentioned hash fingerprint database. Comparing the two, the simhash algorithm is easy to use, has a short detection time, and places no requirements on the deployment configuration of the production environment; it is suitable for scenarios such as remote detection and rapid detection in security emergency response. The ssdeep algorithm has higher detection accuracy than the simhash algorithm, but it places certain requirements on the deployment environment and relies on additional libraries (such as the function modules of the algorithm's dependency library); that is, the ssdeep algorithm is generally suitable for webshell detection in local scenarios.

In one embodiment, the text similarity algorithm list includes the first text similarity algorithm and the second text similarity algorithm. The first text similarity algorithm places no requirements on the deployment configuration of the production environment, whereas the second text similarity algorithm places certain requirements on the deployment environment and relies on additional libraries at runtime. In this case, after receiving the security detection instruction, the server can detect whether the current operating environment is configured with the function modules of the dependency library required by the second text similarity algorithm. If so, the second text similarity algorithm is determined as the target algorithm, and the files to be tested in the target directory and/or target path indicated by the security detection instruction are detected based on the second text similarity algorithm to determine the characteristic hash value of each file to be tested under the second text similarity algorithm. If not, the first text similarity algorithm is determined as the target algorithm, and the files to be tested in the target directory and/or target path indicated by the security detection instruction are detected based on the first text similarity algorithm to determine the characteristic hash value of each file to be tested under the first text similarity algorithm. By combining multiple algorithms for webshell detection, this solution can, on the one hand, perform webshell detection in both remote and local scenarios, which improves generality; on the other hand, when executed in a local scenario, an algorithm with higher detection accuracy (such as the ssdeep algorithm) can be preferred, which makes webshell detection results more accurate.
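For readers unfamiliar with simhash, the following Python sketch shows one common formulation of the algorithm. It is not taken from the application: the tokenization by word characters, the use of SHA-256 as the per-token hash, and the 64-bit width are all assumptions, and the function names are hypothetical.

```python
import hashlib
import re


def simhash(text, bits=64):
    """Compute a simhash fingerprint of text as a non-negative integer.

    Each token votes +1 or -1 on every bit position according to its own
    hash; the final fingerprint keeps the bits with a positive total.
    `bits` must be a multiple of 8 and at most 256 (the SHA-256 width).
    """
    weights = [0] * bits
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            weights[i] += 1 if (token_hash >> i) & 1 else -1
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions in which two integer fingerprints differ."""
    return bin(a ^ b).count("1")
```

Because similar texts share most tokens, their fingerprints tend to differ in only a few bits, which is why a small Hamming distance is treated as a sign of similarity.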
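The environment check described above might look like the following Python sketch. It assumes that the dependency of the second algorithm is an importable Python module (the default name `ssdeep` is an assumption about the deployment), and the function name `choose_target_algorithm` is hypothetical.

```python
import importlib.util


def choose_target_algorithm(required_module="ssdeep"):
    """Pick the target algorithm according to the current runtime environment.

    Returns "ssdeep" (the second, more accurate algorithm) when its
    dependency module can be found, and otherwise falls back to "simhash"
    (the first algorithm, which needs no extra libraries).
    """
    if importlib.util.find_spec(required_module) is not None:
        return "ssdeep"
    return "simhash"
```

Calling `choose_target_algorithm()` on a host without the extra library yields `"simhash"`, so a remote or emergency scan still runs, while a fully provisioned local host gets the more accurate algorithm.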

205. If the target algorithm is the first text similarity algorithm, the server detects the files to be tested in the target directory and/or the target path indicated by the security detection instruction based on the first text similarity algorithm, and determines the feature hash value of each file to be tested under the first text similarity algorithm.

206. The server obtains a sample feature hash value corresponding to any one or more webshell samples under the first text similarity algorithm from a pre-established hash fingerprint database.

207. The server obtains each Hamming distance between the feature hash value of the file under test under the first text similarity algorithm and the sample feature hash value corresponding to any one or more webshell samples under the first text similarity algorithm.

208. If any Hamming distance of each Hamming distance is equal to the first Hamming threshold, the server determines that a sample feature hash value matching the feature hash value exists, and determines that the file to be tested is a webshell.

In one embodiment, when the server determines that the target algorithm is the first text similarity algorithm, it can obtain from the hash fingerprint database the sample feature hash value of any one or more webshell samples under the first text similarity algorithm (hereinafter referred to as the first sample hash value), and compare each first sample hash value with the feature hash value of the file under test under the first text similarity algorithm (hereinafter referred to as the first hash value) to obtain each Hamming distance between the first hash value of the file under test and the first sample hash values of the one or more webshell samples. Further, the server may compare each Hamming distance with a preset first Hamming threshold and a preset second Hamming threshold, where the second Hamming threshold is greater than the first Hamming threshold. If any of the Hamming distances is equal to the first Hamming threshold (for example, 0), it can be determined that there is a sample feature hash value matching the feature hash value, and it can be further determined that the file to be tested is a webshell; or, if any Hamming distance is greater than the first Hamming threshold and not greater than the second Hamming threshold (for example, greater than 0 and not greater than 3), the file under test may be determined to be a variant webshell; or, if the Hamming distance is greater than the second Hamming threshold (for example, greater than 3), it can be determined that the file to be tested is a non-webshell.
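The three-way decision on Hamming distances can be sketched in Python as follows, using the example thresholds 0 and 3 from the text. Hashes are represented as equal-length bit strings, and the order of precedence when several samples are compared (webshell first, then variant webshell, then non-webshell) is an assumption; the function names are hypothetical.

```python
FIRST_HAMMING_THRESHOLD = 0   # example value from the text
SECOND_HAMMING_THRESHOLD = 3  # example value from the text


def hamming_distance(a, b):
    """Count the positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("hash values must have the same length")
    return sum(x != y for x, y in zip(a, b))


def classify_by_hamming(file_hash, sample_hashes):
    """Classify a file from its distances to the sample feature hashes."""
    distances = [hamming_distance(file_hash, s) for s in sample_hashes]
    if any(d == FIRST_HAMMING_THRESHOLD for d in distances):
        return "webshell"
    if any(FIRST_HAMMING_THRESHOLD < d <= SECOND_HAMMING_THRESHOLD
           for d in distances):
        return "variant webshell"
    # Every distance exceeds the second threshold (or there are no samples).
    return "non-webshell"
```

For instance, against the single sample hash `1001001`, the file hash `1001000` differs in one bit and is classified as a variant webshell, while `0110110` differs in all seven bits and is classified as a non-webshell.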

When the server determines that the file to be tested is a webshell or a variant webshell, it may also output an alarm message prompting that a webshell or a variant webshell has been detected.

In one embodiment, during the process of detecting the file to be tested, the server can also record the characteristic hash value of the current file to be tested and its path information (that is, the target path and/or target directory in step 205). When the file to be tested is detected to be a webshell, the server can also obtain, from the hash fingerprint database, the path information of the target webshell sample (the sample that was compared with the file under test and on the basis of which the file under test was determined to be a webshell) and the sample feature hash value of that target webshell sample. After determining whether the file to be tested is a webshell or a variant webshell, the server generates a scan log from the recorded information (i.e., the characteristic hash value of the current file to be tested, the path information of the current file to be tested, the path information of the target webshell sample, and the sample feature hash value of the target webshell sample) and outputs it to the user for easy viewing.

In one embodiment, when the server determines that the file to be tested is a variant webshell, the variant webshell and its characteristic hash value may be stored in association in the previously established hash fingerprint database, thereby updating the hash fingerprint database.
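The scan log and the database update described above can be sketched in Python as follows. The JSON format, the field names, and both function names are hypothetical choices for illustration; the application only lists which pieces of information are recorded.

```python
import json


def build_scan_log_entry(file_path, file_hash, verdict,
                         sample_path=None, sample_hash=None):
    """Serialize one scan result as a JSON line (field names are assumed)."""
    return json.dumps({
        "file_path": file_path,        # path of the file under test
        "file_hash": file_hash,        # its characteristic hash value
        "verdict": verdict,            # e.g. "webshell" or "variant webshell"
        "sample_path": sample_path,    # path of the matched webshell sample
        "sample_hash": sample_hash,    # that sample's feature hash value
    }, ensure_ascii=False)


def update_fingerprint_db(db, file_path, file_hash, verdict):
    """Add a newly found variant webshell to the fingerprint database."""
    if verdict == "variant webshell":
        db[file_path] = file_hash
    return db
```

Storing each variant means that an identical copy of it found later matches exactly instead of only falling within the variant range.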

In the embodiment of the present application, the server may detect the current running environment and determine a target algorithm matching the current running environment from a preset list of text similarity algorithms. If the target algorithm is the first text similarity algorithm, the server detects the files to be tested in the target directory and/or the target path indicated by the security detection instruction based on the first text similarity algorithm, which helps improve the efficiency of webshell detection.

Referring to FIG. 3, FIG. 3 is a schematic flowchart of another webshell detection method according to an embodiment of the present application. As shown in the figure, the webshell detection method may include:

301. The server obtains webshell samples of different scripting languages.

302. The server uses N types of text similarity algorithms to determine N sample feature hash values of each webshell sample under the N types of text similarity algorithms, where N is a positive integer.

303. The server establishes a hash fingerprint database according to the webshell samples of different scripting languages and the N sample feature hash values of each webshell sample under the N text similarity algorithms.

304. When receiving the security detection instruction, the server detects the current operating environment and determines a target algorithm that matches the current operating environment from a preset list of text similarity algorithms. The target algorithm is the first text similarity algorithm or the second text similarity algorithm.

For specific implementations of steps 301 to 304, refer to related descriptions of steps 201 to 204 in the foregoing embodiment, and details are not described herein again.

305. If the target algorithm is the second text similarity algorithm, the server detects the files to be tested in the target directory and/or the target path indicated by the security detection instruction based on the second text similarity algorithm, and determines the feature hash value of each file to be tested under the second text similarity algorithm.

306. The server obtains a sample feature hash value corresponding to any one or more webshell samples under a second text similarity algorithm from a pre-established hash fingerprint database.

307. The server obtains each weighted editing distance between the feature hash value of the file under test under the second text similarity algorithm and the corresponding sample feature hash value of any one or more webshell samples under the second text similarity algorithm.

308. The server determines, according to each weighted editing distance, each matching value between the feature hash value under the second text similarity algorithm and the sample feature hash value corresponding to any one or more webshell samples under the second text similarity algorithm.

309. If any of the matching values is equal to the first matching threshold, the server determines that a sample feature hash value matching the feature hash value exists, and determines that the file to be tested is a webshell.

In one embodiment, when the server determines that the target algorithm is the second text similarity algorithm, it can obtain from the hash fingerprint database the sample feature hash value of any one or more webshell samples under the second text similarity algorithm (hereinafter referred to as the second sample hash value), and compare each second sample hash value for similarity with the feature hash value of the file under test under the second text similarity algorithm (hereinafter referred to as the second hash value). This yields the weighted edit distances between the second hash value of the file to be tested and the second sample hash values of the one or more webshell samples, from which a matching value between the second hash value and each second sample hash value is determined. In an embodiment, after obtaining these weighted edit distances, the server may divide each weighted edit distance by the sum of the lengths of the second hash value and the second sample hash value, and then map the result to an integer value from 0 to 100 to obtain the matching values between the second hash value of the file to be tested and the second sample hash values of the one or more webshell samples.

Further, in this case, if any one of the matching values is equal to the first matching threshold (for example, 100), the server may determine that there is a sample feature hash value matching the feature hash value, and then determine that the file to be tested is a webshell.

Or, if any of the matching values is greater than the second matching threshold (for example, 50) and less than the first matching threshold (for example, 100), it is determined that the file to be tested is a variant webshell; that is, a hacker may have obfuscated the code of the webshell. The first matching threshold is greater than the second matching threshold.
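The mapping from a weighted edit distance to a matching value can be sketched in Python as follows. The edit costs (insertion 1, deletion 1, substitution 2) and the inversion `100 - ...`, which makes identical hashes score 100, are assumptions chosen to be consistent with the thresholds in the text; they are not the exact formula of the application or of the ssdeep library, and the function names are hypothetical.

```python
def weighted_edit_distance(s1, s2, insert_cost=1, delete_cost=1,
                           substitute_cost=2):
    """Edit distance between two strings with per-operation costs."""
    prev = [j * insert_cost for j in range(len(s2) + 1)]
    for i, c1 in enumerate(s1, start=1):
        curr = [i * delete_cost]
        for j, c2 in enumerate(s2, start=1):
            curr.append(min(
                prev[j] + delete_cost,          # drop c1
                curr[j - 1] + insert_cost,      # add c2
                prev[j - 1] + (0 if c1 == c2 else substitute_cost),
            ))
        prev = curr
    return prev[-1]


def matching_value(hash1, hash2):
    """Map the weighted edit distance to an integer score from 0 to 100."""
    total_len = len(hash1) + len(hash2)
    if total_len == 0:
        return 100  # two empty hashes are trivially identical
    distance = weighted_edit_distance(hash1, hash2)
    # Divide by the summed lengths, scale to 0..100, and invert so that
    # a distance of 0 gives the maximum score.
    return max(0, 100 - (100 * distance) // total_len)
```

With these costs the distance never exceeds the summed lengths, so two completely different strings score 0 and two identical strings score 100.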

Or, if any of the matching values is not greater than the second matching threshold (for example, 50) and is not smaller than 0, it can be determined that the file to be tested is a non-webshell.
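The decision on matching values mirrors the Hamming-distance case and can be sketched in Python with the example thresholds 100 and 50 from the text. The function name and the order of precedence across several samples (webshell first, then variant webshell) are assumptions.

```python
FIRST_MATCHING_THRESHOLD = 100  # example value from the text
SECOND_MATCHING_THRESHOLD = 50  # example value from the text


def classify_by_matching_value(matching_values):
    """Classify a file from its matching values (0..100) against samples."""
    if any(v == FIRST_MATCHING_THRESHOLD for v in matching_values):
        return "webshell"
    if any(SECOND_MATCHING_THRESHOLD < v < FIRST_MATCHING_THRESHOLD
           for v in matching_values):
        return "variant webshell"
    # Every value lies between 0 and the second threshold inclusive.
    return "non-webshell"
```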

In an example, when the server executes step 303 to establish the hash fingerprint database according to the webshell samples of different scripting languages and the N sample feature hash values of each webshell sample under the N text similarity algorithms, it can further take into account the conditions for determining the file under test to be a webshell (such as the matching value being equal to the first matching threshold), a variant webshell (such as the matching value being greater than the second matching threshold (such as 50) and less than the first matching threshold), and a non-webshell (such as the matching value being not greater than the second matching threshold (such as 50) and not less than 0).

Exemplarily, assume that the first text similarity algorithm and the second text similarity algorithm used to establish the hash fingerprint database are the simhash algorithm and the ssdeep algorithm, respectively. When the hash fingerprint database is established, it can be subdivided into two sub-hash fingerprint databases: one under the simhash algorithm (hereinafter referred to as the simhash fingerprint database) and one under the ssdeep algorithm (hereinafter referred to as the ssdeep fingerprint database). The simhash fingerprint database includes each webshell sample and the sample feature hash value of each webshell sample under the simhash algorithm (hereinafter referred to as the simhash sample feature hash value); the ssdeep fingerprint database includes each webshell sample and the sample feature hash value of each webshell sample under the ssdeep algorithm (hereinafter referred to as the ssdeep sample feature hash value). In this case, the server can store in the simhash fingerprint database, in association with each simhash sample feature hash value, three types of hash values: those whose Hamming distance from the simhash sample feature hash value is equal to the first Hamming threshold, those whose Hamming distance is greater than the first Hamming threshold and not greater than the second Hamming threshold, and those whose Hamming distance is greater than the second Hamming threshold (such as greater than 3). Tag information can be added to each type of hash value during storage.

For example, hash values whose Hamming distance from a simhash sample feature hash value is equal to the first Hamming threshold can be tagged as webshell; hash values whose Hamming distance is greater than the first Hamming threshold and not greater than the second Hamming threshold can be tagged as variant webshell; and hash values whose Hamming distance is greater than the second Hamming threshold can be tagged as non-webshell.

For example, if the sample feature hash value of webshell sample a under the simhash algorithm is 1001001, then a string (that is, a hash value) with a Hamming distance of 0 from 1001001 and 1001001 can be associated with 1001001 and stored in the simhash fingerprint database. Is the string to which the webshell belongs; stores a string with a Hamming distance of 1001001 greater than 3 (that is, a hash value) in association with 1001001 to the simhash fingerprint database, and marks this type of string as a string that does not belong to the webshell; it will be linked to 1001001 A string of Hamming distance greater than 0 and not less than 3 (ie, a hash value) is associated with 1001001 and stored in the simhash fingerprint database, and this type of string is marked as the string to which the variant webshell belongs.

In this case, after the server determines the characteristic hash value of the file to be tested, it can search the pre-established hash fingerprint database for a character string that is the same as the characteristic hash value of the file to be tested, and then determine the tag information of that character string. If the tag information indicates that the character string belongs to a webshell, the file to be tested is determined to be a webshell; if the tag information indicates that the character string does not belong to a webshell, the file to be tested is determined to be a non-webshell; if the tag information indicates that the character string belongs to a variant webshell, the file to be tested is determined to be a variant webshell. In this way, the amount of calculation needed to compare the characteristic hash value of the file to be tested with the sample characteristic hash values in the pre-established hash fingerprint database can be reduced, and the webshell detection efficiency can be further improved.
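The lookup described above replaces a distance calculation with a single dictionary access. The following minimal sketch assumes the labeled strings are held in an in-memory dictionary; the table contents, the function name, and the choice to treat an unknown string as a non-webshell are all illustrative assumptions rather than details from this application.

```python
from typing import Dict

# Hypothetical excerpt of a labeled simhash fingerprint table for sample 1001001.
LABELED_TABLE: Dict[str, str] = {
    "1001001": "webshell",          # Hamming distance 0
    "1001000": "variant webshell",  # Hamming distance 1
    "0110110": "non-webshell",      # Hamming distance 7
}


def classify_by_lookup(feature_hash: str, table: Dict[str, str]) -> str:
    """Return the tag stored for feature_hash.

    Assumption: a string that is absent from the table is treated as a
    non-webshell, since only labeled strings are stored.
    """
    return table.get(feature_hash, "non-webshell")


verdict = classify_by_lookup("1001000", LABELED_TABLE)
```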

In the embodiment of the present application, the server may detect the current running environment and determine a target algorithm matching the current running environment from a preset list of text similarity algorithms. If the target algorithm is the second text similarity algorithm, the server detects, based on the second text similarity algorithm, the files to be tested in the target directory and / or the target path indicated by the security detection instruction. On the one hand, this helps improve the detection efficiency of the webshell; on the other hand, using the text similarity algorithm that matches the current running environment to detect the file to be tested helps improve the accuracy of the webshell detection.
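One way to picture the selection of a target algorithm is a table keyed by the detected operating system. The sketch below is an assumption-laden illustration: the mapping from system name to algorithm, the default, and the function name are hypothetical, and this passage does not state which environment is paired with which algorithm.

```python
import platform
from typing import Dict, Optional

# Hypothetical preset list: which algorithm suits which running environment.
PRESET_ALGORITHMS: Dict[str, str] = {
    "Linux": "ssdeep",
    "Windows": "simhash",
}
DEFAULT_ALGORITHM = "simhash"  # assumed fallback for unlisted environments


def select_target_algorithm(system_name: Optional[str] = None) -> str:
    """Pick the algorithm matching the running environment.

    If system_name is omitted, the current operating system name reported
    by platform.system() is used.
    """
    if system_name is None:
        system_name = platform.system()
    return PRESET_ALGORITHMS.get(system_name, DEFAULT_ALGORITHM)


target_algorithm = select_target_algorithm()
```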

An embodiment of the present application further provides a webshell detection device, and the device includes modules for executing the method described in FIG. 1, FIG. 2, or FIG. 3. Specifically, FIG. 4 is a schematic block diagram of a webshell detection device according to an embodiment of the present application. The webshell detection device of this embodiment includes:

A detection module 40, configured to, when a security detection instruction is received, detect a file to be tested in a target directory and / or a target path according to the instruction of the security detection instruction, and determine a characteristic hash value of the file to be tested;

A comparison module 41, configured to compare the characteristic hash value of the file to be tested with a sample characteristic hash value of a webshell sample in a pre-established hash fingerprint database;

A determining module 42 is configured to determine that the file to be tested is a webshell if a sample characteristic hash value matching the characteristic hash value exists.
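The division of work among the three modules can be sketched as three small functions. This is a structural illustration only: the function names are hypothetical, files are passed in as in-memory byte strings, and SHA-256 stands in for the feature hash so that the example stays self-contained, although it is not a text similarity algorithm and matches only identical content.

```python
import hashlib
from typing import Callable, Dict, Iterable


def sha256_hex(content: bytes) -> str:
    """Stand-in feature hash (assumption); a real system would use simhash or ssdeep."""
    return hashlib.sha256(content).hexdigest()


def detect(files: Dict[str, bytes], hash_fn: Callable[[bytes], str]) -> Dict[str, str]:
    """Detection module: compute a feature hash for each file to be tested."""
    return {name: hash_fn(content) for name, content in files.items()}


def compare(feature_hash: str, sample_hashes: Iterable[str]) -> bool:
    """Comparison module: report whether any sample feature hash matches."""
    return any(feature_hash == sample for sample in sample_hashes)


def determine(matched: bool) -> str:
    """Determining module: turn the comparison result into a verdict."""
    return "webshell" if matched else "non-webshell"


# Hypothetical fingerprint database holding one sample hash.
sample_db = {sha256_hex(b"placeholder webshell sample body")}
files = {
    "upload.php": b"placeholder webshell sample body",
    "index.php": b"placeholder benign page body",
}
verdicts = {
    name: determine(compare(feature_hash, sample_db))
    for name, feature_hash in detect(files, sha256_hex).items()
}
```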

In one embodiment, the apparatus further includes: an obtaining module 43 and an establishing module 44, wherein:

The obtaining module 43 is configured to obtain webshell samples of different scripting languages;

The determining module 42 is further configured to use N types of text similarity algorithms to determine N sample feature hash values of each webshell sample under the N types of text similarity algorithms, where N is a positive integer;

The establishing module 44 is configured to establish a hash fingerprint database according to the webshell samples of the different scripting languages and the N sample feature hash values of each webshell sample under the N types of text similarity algorithms.
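As a rough sketch of how the establishing module could organize its data, the database below maps each sample to one hash per registered algorithm. Everything here is an assumption for illustration: the names are hypothetical, the sample texts are placeholders, and simhash64 is a simplified 64-bit simhash over word tokens with unit weights rather than the implementation used in this application. Only one algorithm is registered (N = 1); an ssdeep binding would be registered the same way.

```python
import hashlib
import re
from typing import Callable, Dict


def simhash64(text: str) -> int:
    """Simplified 64-bit simhash: each word token votes on every bit position."""
    counts = [0] * 64
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[:8], "big")
        for bit in range(64):
            counts[bit] += 1 if (token_hash >> bit) & 1 else -1
    return sum(1 << bit for bit in range(64) if counts[bit] > 0)


def build_fingerprint_db(
    samples: Dict[str, str],
    algorithms: Dict[str, Callable[[str], int]],
) -> Dict[str, Dict[str, int]]:
    """Compute, for every sample, one feature hash per registered algorithm."""
    return {
        name: {algo: fn(text) for algo, fn in algorithms.items()}
        for name, text in samples.items()
    }


# Placeholder samples in different scripting languages (hypothetical content).
samples = {
    "sample_a.php": "placeholder php sample text",
    "sample_b.jsp": "placeholder jsp sample text",
}
algorithms = {"simhash": simhash64}
fingerprint_db = build_fingerprint_db(samples, algorithms)
```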

In one embodiment, the detection module 40 is specifically configured to: detect a current running environment, and determine a target algorithm matching the current running environment from a preset text similarity algorithm list, where the preset text similarity algorithm list includes the N types of text similarity algorithms; and, based on the target algorithm, detect the file to be tested in the target directory and / or the target path indicated by the security detection instruction, and determine a characteristic hash value of the file to be tested under the target algorithm.

In one embodiment, the comparison module 41 is specifically configured to: obtain, from the pre-established hash fingerprint database, the sample feature hash value corresponding to each of any one or more webshell samples under the target algorithm; and compare the feature hash value of the file under test under the target algorithm with the sample feature hash value corresponding to each of the any one or more webshell samples under the target algorithm.

In an embodiment, the target algorithm is a first text similarity algorithm, and the comparison module 41 is further specifically configured to obtain each Hamming distance between the feature hash value of the file under test under the first text similarity algorithm and the sample feature hash value corresponding to each of the any one or more webshell samples under the first text similarity algorithm; the determining module 42 is specifically configured to determine, if any one of the Hamming distances is equal to the first Hamming threshold, that there is a sample feature hash value that matches the feature hash value.

In one embodiment, the target algorithm is a second text similarity algorithm, and the comparison module 41 is specifically configured to: obtain each weighted edit distance between the feature hash value of the file under test under the second text similarity algorithm and the sample feature hash value corresponding to each of the any one or more webshell samples under the second text similarity algorithm; and determine, according to each weighted edit distance, each matching value between the feature hash value under the second text similarity algorithm and the sample feature hash value corresponding to each of the any one or more webshell samples under the second text similarity algorithm;

The determining module 42 is further specifically configured to determine that, if any of the matching values is equal to the first matching threshold, a sample feature hash value matching the feature hash value exists.

In one embodiment, the determining module 42 is further configured to determine that the file to be tested is a variant webshell if any one of the Hamming distances is greater than the first Hamming threshold and not greater than the second Hamming threshold, where the second Hamming threshold is greater than the first Hamming threshold.
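The two Hamming-distance rules (a match at the first threshold, a variant between the first and second thresholds) can be combined into one small classifier. This is a sketch under assumptions: hash values are represented as Python integers, the thresholds default to 0 and 3 as in the earlier example, and the function names and label strings are hypothetical.

```python
from typing import Iterable


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two integer hash values."""
    return bin(a ^ b).count("1")


def classify_by_hamming(
    feature_hash: int,
    sample_hashes: Iterable[int],
    first_threshold: int = 0,
    second_threshold: int = 3,
) -> str:
    """Apply the first and second Hamming thresholds across all samples."""
    distances = [hamming_distance(feature_hash, s) for s in sample_hashes]
    if any(d == first_threshold for d in distances):
        return "webshell"
    if any(first_threshold < d <= second_threshold for d in distances):
        return "variant webshell"
    return "non-webshell"


samples = [0b1001001]
result = classify_by_hamming(0b1001011, samples)
```

Checking for an exact match before checking for a variant means that a file identical to one sample is reported as a webshell even if it is also close to another sample.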

In one embodiment, the determining module 42 is further configured to determine that the file to be tested is a variant webshell if any one of the matching values is greater than a second matching threshold and less than the first matching threshold, where the first matching threshold is greater than the second matching threshold.
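The step from weighted edit distance to matching value, and then to a verdict, might look like the following. The specifics are assumptions made for illustration: the operation weights (insert 1, delete 1, substitute 2), the scaling of the matching value to the range 0 to 100, the threshold values 100 and 60, and all names are hypothetical and are not taken from ssdeep or from this application.

```python
from typing import Iterable

FIRST_MATCH_THRESHOLD = 100   # assumed: a perfect score means a match
SECOND_MATCH_THRESHOLD = 60   # assumed lower bound for a variant


def weighted_edit_distance(a: str, b: str, ins: int = 1, dele: int = 1, sub: int = 2) -> int:
    """Edit distance with per-operation weights, computed row by row."""
    prev = [j * ins for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        cur = [i * dele]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else sub
            cur.append(min(prev[j] + dele, cur[j - 1] + ins, prev[j - 1] + cost))
        prev = cur
    return prev[len(b)]


def matching_value(a: str, b: str) -> int:
    """Scale the weighted edit distance to a 0-100 score (100 means identical)."""
    total = len(a) + len(b)
    if total == 0:
        return 100
    return max(0, 100 - (100 * weighted_edit_distance(a, b)) // total)


def classify_by_matching(values: Iterable[int]) -> str:
    """Apply the first and second matching thresholds to the matching values."""
    values = list(values)
    if any(v == FIRST_MATCH_THRESHOLD for v in values):
        return "webshell"
    if any(SECOND_MATCH_THRESHOLD < v < FIRST_MATCH_THRESHOLD for v in values):
        return "variant webshell"
    return "non-webshell"


score = matching_value("abc", "abd")
```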

In an embodiment, the determining module 42 is further configured to identify the security detection instruction to determine the target directory and / or the target path to be tested as indicated by the security detection instruction; and the obtaining module 43 is further configured to obtain the extensions of all the files in the target directory and / or the target path, and determine the files with preset extensions in the target directory and / or the target path as the files to be tested.
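Selecting the files to be tested by extension can be sketched with a directory walk. The set of preset extensions and the function name below are assumptions for illustration; this passage does not list which extensions are preset.

```python
import os
from typing import Iterable, List

# Hypothetical preset extensions for common server-side scripting languages.
PRESET_EXTENSIONS = frozenset({".php", ".jsp", ".asp", ".aspx"})


def files_to_test(target_dir: str, extensions: Iterable[str] = PRESET_EXTENSIONS) -> List[str]:
    """Return paths under target_dir whose extension is in the preset list."""
    wanted = {ext.lower() for ext in extensions}
    selected = []
    for root, _dirs, names in os.walk(target_dir):
        for name in names:
            # Compare extensions case-insensitively so that "c.JSP" is also selected.
            if os.path.splitext(name)[1].lower() in wanted:
                selected.append(os.path.join(root, name))
    return sorted(selected)
```

Calling files_to_test on the target directory named in the security detection instruction would return the candidate list that is then hashed and compared.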

It should be noted that the functions of the functional modules of the webshell detection device described in the embodiments of the present application can be specifically implemented according to the method in the method embodiment described in FIG. 1, FIG. 2, or FIG. 3. For the specific implementation process, refer to the relevant descriptions of the method embodiments in FIG. 1, FIG. 2, or FIG. 3, which are not repeated here.

Please refer to FIG. 5, which is a schematic block diagram of a server provided by an embodiment of the present application. As shown in FIG. 5, the server includes a processor 501, a memory 502, and a network interface 503. The processor 501, the memory 502, and the network interface 503 may be connected through a bus or in other manners; in FIG. 5, connection through a bus is taken as an example. The network interface 503 is controlled by the processor 501 to send and receive messages, and the memory 502 is used to store a computer program. The computer program includes program instructions, and the processor 501 is configured to execute the program instructions stored in the memory 502. The processor 501 is configured to call the program instructions to execute: upon receiving a security detection instruction, detecting a file to be tested in a target directory and / or a target path according to an instruction of the security detection instruction, and determining a characteristic hash value of the file to be tested; comparing the characteristic hash value of the file to be tested with a sample characteristic hash value of a webshell sample in a pre-established hash fingerprint database; and if there is a sample characteristic hash value that matches the characteristic hash value, determining that the file to be tested is a webshell.

It should be understood that, in the embodiment of the present application, the processor 501 may be a Central Processing Unit (CPU), and the processor 501 may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.

The memory 502 may include a read-only memory and a random access memory, and provide instructions and data to the processor 501. A part of the memory 502 may further include a non-volatile random access memory. For example, the memory 502 may also store information of a device type.

In specific implementation, the processor 501, the memory 502, and the network interface 503 described in the embodiments of the present application may perform the implementation manners described in the method embodiments described in FIG. 1, FIG. 2 or FIG. 3 provided by the embodiments of the present application, The implementation manner of the webshell detection device described in the embodiment of the present application may also be performed, and details are not described herein again.

In another embodiment of the present application, a computer-readable storage medium is provided. The computer-readable storage medium stores a computer program, where the computer program includes program instructions, and the program instructions, when executed by a processor, implement the following: when a security detection instruction is received, detecting the file to be tested in the target directory and / or the target path according to the instruction of the security detection instruction, and determining a characteristic hash value of the file to be tested; comparing the characteristic hash value of the file to be tested with a sample characteristic hash value of a webshell sample in a pre-established hash fingerprint database; and if a sample characteristic hash value that matches the characteristic hash value exists, determining that the file to be tested is a webshell.

The computer-readable storage medium may be an internal storage unit of the server according to any of the foregoing embodiments, such as a hard disk or a memory of the server. The computer-readable storage medium may also be an external storage device of the server, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the server. Further, the computer-readable storage medium may include both an internal storage unit of the server and an external storage device. The computer-readable storage medium is used to store the computer program and other programs and data required by the server. The computer-readable storage medium may also be used to temporarily store data that has been or will be output.

Those of ordinary skill in the art can understand that all or part of the processes in the methods of the foregoing embodiments can be implemented by a computer program instructing related hardware. The program can be stored in a computer-readable storage medium, and when executed, the program may include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM).

The above disclosure is only a part of the embodiments of this application and certainly cannot be used to limit the scope of rights of this application. Those skilled in the art can understand and implement all or part of the processes of the above embodiments, and equivalent changes made according to the claims of this application still fall within the scope of the invention.

Claims

A method for detecting a malicious script backdoor webshell, which includes:

When a security detection instruction is received, detecting a file under test in a target directory and / or a target path according to the instruction of the security detection instruction, and determining a characteristic hash value of the file under test;

Comparing the characteristic hash value of the file to be tested with a sample characteristic hash value of a webshell sample in a pre-established hash fingerprint database;

If there is a sample feature hash value that matches the feature hash value, it is determined that the file to be tested is a webshell.
The method according to claim 1, wherein the detecting a file under test in a target directory and / or a target path according to the instruction of the security detection instruction, and determining a characteristic hash value of the file under test, includes:

Detecting the current running environment, and determining a target algorithm matching the current running environment from a preset list of text similarity algorithms, where the preset list of text similarity algorithms includes the N types of text similarity algorithms;

Detecting the file under test in the target directory and / or the target path indicated by the security detection instruction based on the target algorithm to determine a characteristic hash value of the file under test in the target algorithm;

Wherein, comparing the characteristic hash value of the file to be tested with a sample characteristic hash value of a webshell sample in a pre-established hash fingerprint database includes:

Obtaining, from a pre-established hash fingerprint database, sample characteristic hash values corresponding to any one or more webshell samples under the target algorithm;

Compare the characteristic hash value of the file under test under the target algorithm with the sample characteristic hash value corresponding to any one or more webshell samples under the target algorithm.
The method according to claim 2, wherein the target algorithm is a first text similarity algorithm, and the comparing the characteristic hash value of the file under test under the target algorithm with the sample characteristic hash value corresponding to any one or more webshell samples under the target algorithm includes:

Obtaining each Hamming distance between the feature hash value of the file under test under the first text similarity algorithm and the sample feature hash value corresponding to each of the any one or more webshell samples under the first text similarity algorithm;

If any one of the Hamming distances is equal to the first Hamming threshold, it is determined that there is a sample feature hash value that matches the feature hash value.
The method according to claim 3, further comprising:

If any one of the Hamming distances is greater than the first Hamming threshold and not greater than a second Hamming threshold, determining that the file under test is a variant webshell, wherein the second Hamming threshold is greater than the first Hamming threshold.
The method according to claim 2, wherein the target algorithm is a second text similarity algorithm, and the comparing the characteristic hash value of the file under test under the target algorithm with the sample characteristic hash value corresponding to any one or more webshell samples under the target algorithm includes:

Obtaining each weighted edit distance between the feature hash value of the file under test under the second text similarity algorithm and the sample feature hash value corresponding to each of the any one or more webshell samples under the second text similarity algorithm;

Determining, according to each weighted edit distance, each matching value between the feature hash value under the second text similarity algorithm and the sample feature hash value corresponding to each of the any one or more webshell samples under the second text similarity algorithm;

If any one of the respective matching values is equal to the first matching threshold, it is determined that there is a sample feature hash value that matches the feature hash value.
The method according to claim 5, further comprising:

If any one of the matching values is greater than a second matching threshold and less than the first matching threshold, determining that the file under test is a variant webshell, and the first matching threshold is greater than the second matching threshold.
The method according to any one of claims 1-6, wherein before the comparing the characteristic hash value of the file to be tested with a sample characteristic hash value of a webshell sample in a pre-established hash fingerprint database, the method further includes:

Get webshell samples of different scripting languages;

N types of text similarity algorithms are used to determine the N sample feature hash values of each webshell sample under the N types of text similarity algorithms, where N is a positive integer;

A hash fingerprint database is established according to the webshell samples of the different scripting languages and the N sample feature hash values of the webshell samples under the N text similarity algorithms.
The method according to any one of claims 1 to 7, wherein before the detecting a file under test in a target directory and / or a target path according to an instruction of the security detection instruction, the method further includes:

Identifying the security detection instruction to determine a target directory and / or a target path to be tested by the security detection instruction;

The extensions of all the files in the target directory and / or the target path are obtained, and files with the preset extensions in the target directory and / or the target path are determined as the files to be tested.
A webshell detection device, comprising:

A detection module, configured to, when a security detection instruction is received, detect a file under test in a target directory and / or a target path according to an instruction of the security detection instruction, and determine a characteristic hash value of the file under test;

A comparison module, configured to compare the characteristic hash value of the file under test with a sample characteristic hash value of a webshell sample in a pre-established hash fingerprint database;

A determining module, configured to determine that the file to be tested is a webshell if a sample characteristic hash value that matches the characteristic hash value exists.
The device according to claim 9, wherein the detection module is specifically configured to: detect a current running environment, and determine a target algorithm that matches the current running environment from a preset text similarity algorithm list, where the preset text similarity algorithm list includes the N types of text similarity algorithms; and, based on the target algorithm, detect the file under test in the target directory and / or the target path indicated by the security detection instruction to determine a characteristic hash value of the file under test under the target algorithm; wherein the comparison module is specifically configured to: obtain, from a pre-established hash fingerprint database, the sample feature hash value corresponding to each of any one or more webshell samples under the target algorithm; and compare the feature hash value of the file under test under the target algorithm with the sample feature hash value corresponding to each of the any one or more webshell samples under the target algorithm.
The device according to claim 10, wherein the target algorithm is a first text similarity algorithm, and the comparison module is further configured to obtain each Hamming distance between the feature hash value of the file under test under the first text similarity algorithm and the sample feature hash value corresponding to each of the any one or more webshell samples under the first text similarity algorithm; and the determining module is specifically configured to determine, if any one of the Hamming distances is equal to the first Hamming threshold, that there is a sample feature hash value that matches the feature hash value.
The device according to claim 11, wherein the determining module is further configured to determine that the file under test is a variant webshell if any one of the Hamming distances is greater than the first Hamming threshold and not greater than a second Hamming threshold, where the second Hamming threshold is greater than the first Hamming threshold.
The device according to claim 10, wherein the target algorithm is a second text similarity algorithm, and the comparison module is specifically configured to: obtain each weighted edit distance between the feature hash value of the file under test under the second text similarity algorithm and the sample feature hash value corresponding to each of the any one or more webshell samples under the second text similarity algorithm; and determine, according to each weighted edit distance, each matching value between the feature hash value under the second text similarity algorithm and the sample feature hash value corresponding to each of the any one or more webshell samples under the second text similarity algorithm; wherein the determining module is further specifically configured to determine, if any one of the matching values is equal to the first matching threshold, that a sample feature hash value matching the feature hash value exists.
The device according to claim 13, wherein the determining module is further configured to determine that the file under test is a variant webshell if any one of the matching values is greater than a second matching threshold and less than the first matching threshold, where the first matching threshold is greater than the second matching threshold.
The device according to any one of claims 9-14, further comprising: an obtaining module and an establishing module, wherein:

The obtaining module is used to obtain webshell samples of different scripting languages;

The determining module is further configured to use N text similarity algorithms to determine N sample feature hash values of each webshell sample under the N text similarity algorithms, where N is a positive integer;

The establishing module is configured to establish a hash fingerprint database according to webshell samples of the different scripting languages and N sample feature hash values of the webshell samples under the N types of text similarity algorithms.
The device according to any one of claims 9 to 15, wherein the determining module is further configured to identify the security detection instruction to determine a target directory and / or a target path to be tested as indicated by the security detection instruction; and the obtaining module is further configured to obtain the extensions of all files in the target directory and / or the target path, and determine the files with preset extensions in the target directory and / or the target path as the files under test.
A server, characterized in that it includes a processor and a memory, and the processor and the memory are connected to each other, wherein the memory is used to store a computer program, the computer program includes program instructions, and the processor is configured to call the program instructions to execute: when a security detection instruction is received, detecting a file to be tested in a target directory and / or a target path according to an instruction of the security detection instruction, and determining a characteristic hash value of the file to be tested; comparing the characteristic hash value of the file to be tested with a sample characteristic hash value of a webshell sample in a pre-established hash fingerprint database; and if a sample characteristic hash value matching the characteristic hash value exists, determining that the file to be tested is a webshell.
The server according to claim 17, wherein the processor is further configured to:

Detecting the current running environment, and determining a target algorithm matching the current running environment from a preset list of text similarity algorithms, where the preset list of text similarity algorithms includes the N types of text similarity algorithms;

Detecting the file under test in the target directory and / or the target path indicated by the security detection instruction based on the target algorithm to determine a characteristic hash value of the file under test in the target algorithm;

Obtaining, in a pre-established hash fingerprint database, sample feature hash values corresponding to any one or more webshell samples under the target algorithm;

Compare the characteristic hash value of the file under test under the target algorithm with the sample characteristic hash value corresponding to any one or more webshell samples under the target algorithm.
The server according to claim 18, wherein the target algorithm is a first text similarity algorithm, and the processor is further configured to:

Obtaining each Hamming distance between the feature hash value of the file under test under the first text similarity algorithm and the sample feature hash value corresponding to each of the any one or more webshell samples under the first text similarity algorithm;

If any one of the Hamming distances is equal to the first Hamming threshold, it is determined that there is a sample feature hash value that matches the feature hash value.
A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, the computer program includes program instructions, and when the program instructions are executed by a processor, the processor is caused to execute the method according to any one of claims 1-8.