WO2020000743A1 - Webshell detection method and related device - Google Patents

Webshell detection method and related device

Info

Publication number
WO2020000743A1
Authority
WO
WIPO (PCT)
Prior art keywords
hash value
webshell
target
algorithm
text similarity
Prior art date
Application number
PCT/CN2018/108472
Other languages
French (fr)
Chinese (zh)
Inventor
刘立业
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020000743A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50: Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55: Detecting local intrusion or implementing counter-measures
    • G06F21/56: Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562: Static detection

Definitions

  • The present application relates to the field of computer technology, and in particular to a webshell detection method and related device.
  • A webshell is a malicious script backdoor that an attacker uses to control a server. If an attacker finds that a web application has a file upload vulnerability, the attacker can upload a webshell for subsequent attacks and use it to covertly control the web server remotely: uploading, viewing, modifying, and deleting files on the website server, reading and modifying the website database, and even running system commands directly on the web server.
  • Webshell detection mainly uses methods based on malicious code, string encoding, and dangerous functions.
  • With these methods, the code or functions of the file to be tested must be compared with known malicious code and dangerous functions.
  • The amount of computation is large, the process is time-consuming, and the efficiency is low.
  • The embodiments of the present application provide a webshell detection method and related device.
  • It is not necessary to compare the strings, code, and functions of the file to be tested in real time; only hash values are compared, which helps improve webshell detection efficiency.
  • An embodiment of the present application provides a webshell detection method.
  • The method includes:
  • An embodiment of the present application provides a webshell detection device.
  • The webshell detection device includes a module for executing the method in the first aspect.
  • An embodiment of the present application provides a server.
  • The server includes a processor, a network interface, and a memory.
  • The processor, the network interface, and the memory are connected to each other.
  • The network interface is controlled by the processor and is configured to receive and send messages. The memory is configured to store a computer program that supports the server in executing the foregoing method; the computer program includes program instructions, and the processor is configured to call the program instructions to execute the method of the first aspect.
  • An embodiment of the present application provides a computer-readable storage medium.
  • The computer-readable storage medium stores a computer program that includes program instructions which, when executed by a processor, cause the processor to execute the method of the first aspect.
  • The server may compare the feature hash value of the file under test with the sample feature hash values of the webshell (malicious script backdoor) samples in the pre-established hash fingerprint database. If there is a sample feature hash value that matches the feature hash value, the file to be tested is determined to be a webshell. The embodiments of the present application thereby help improve webshell detection efficiency.
  • FIG. 1 is a schematic flowchart of a webshell detection method according to an embodiment of the present application
  • FIG. 2 is a schematic flowchart of another webshell detection method according to an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of another webshell detection method according to an embodiment of the present application.
  • FIG. 4 is a schematic block diagram of a webshell detection device according to an embodiment of the present application.
  • FIG. 5 is a schematic block diagram of a server according to an embodiment of the present application.
  • FIG. 1 is a schematic flowchart of a webshell detection method according to an embodiment of the present application.
  • The webshell detection method may include:
  • When receiving a security detection instruction, the server detects a file under test in the target directory and/or target path indicated by the security detection instruction, and determines a feature hash value of the file under test.
  • A security detection instruction may be input to the server to instruct it to scan and detect the files to be tested in a specified directory or a specified path.
  • For example, the security detection instruction may instruct the server to scan and detect the files to be tested in directory F.
  • The server can detect any one of the files to be tested in the specified directory or path according to the user's instruction, and determine the feature hash value of that file through a text similarity algorithm.
  • The server may further parse the security detection instruction to determine the target directory and/or target path, obtain the extensions of all files in the target directory and/or target path, and treat the files with preset extensions as the files to be tested.
  • The preset extensions may be any extensions other than those of the safe file types designated by the operation and maintenance personnel; for example, they exclude extensions such as .doc, .pdf, and .rar.
  • For example, suppose the operation and maintenance personnel want to perform webshell detection on the files in directory F whose extensions are preset extensions.
  • They can enter a security detection instruction on the server instructing it to perform webshell detection on the files in directory F (the target directory) whose extensions are preset extensions (the files to be tested).
  • The server may then determine directory F as the target directory, identify the extensions of all files in directory F, determine the files with preset extensions as the files to be tested, and perform webshell detection on them.
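  • The file selection step described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `files_to_test` and the constant `SAFE_EXTENSIONS` are hypothetical, and the recursive directory walk is one possible reading of "all files in the target directory".

```python
from pathlib import Path

# Hypothetical exclusion list: extensions that operators have designated as safe.
SAFE_EXTENSIONS = {".doc", ".pdf", ".rar"}


def files_to_test(target_dir, safe_extensions=SAFE_EXTENSIONS):
    """Return the files under target_dir whose extension is not designated safe."""
    root = Path(target_dir)
    return sorted(
        path
        for path in root.rglob("*")  # assumption: subdirectories are scanned too
        if path.is_file() and path.suffix.lower() not in safe_extensions
    )
```

  • Calling `files_to_test("F")` would return every file under directory F except those ending in .doc, .pdf, or .rar.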
  • The server compares the feature hash value of the file to be tested with the sample feature hash values of the webshell samples in a pre-established hash fingerprint database.
  • If a sample feature hash value matching the feature hash value exists, the server determines that the file to be tested is a webshell.
  • The server may obtain webshell samples in different scripting languages in advance, determine the sample feature hash value of each webshell sample through a text similarity algorithm, and then store the webshell samples and their sample feature hash values in the hash fingerprint database; that is, the hash fingerprint database is established. In this case, after the server determines the feature hash value of the file to be tested, it may compare that value with the sample feature hash values of any one or more webshell samples in the hash fingerprint database.
  • If the feature hash value matches any of the sample feature hash values (for example, the Hamming distance between the feature hash value and that sample feature hash value is equal to the Hamming distance threshold, or the matching value between them is equal to the matching threshold), the server may determine that the file to be tested is a webshell.
  • In this embodiment, when receiving the security detection instruction, the server may detect the file under test in the target directory and/or target path indicated by the security detection instruction, determine the feature hash value of the file under test, and compare it with the sample feature hash values of the webshell samples in the pre-established hash fingerprint database. If there is a sample feature hash value that matches the feature hash value, the file to be tested is determined to be a webshell.
  • This helps improve the detection efficiency of the webshell.
  • FIG. 2 is a schematic flowchart of another webshell detection method according to an embodiment of the present application.
  • The webshell detection method may include:
  • The server obtains webshell samples in different scripting languages.
  • The server uses N types of text similarity algorithms to determine N sample feature hash values of each webshell sample under the N types of text similarity algorithms, where N is a positive integer.
  • The server establishes a hash fingerprint database from the webshell samples in different scripting languages and the N sample feature hash values of each webshell sample under the N text similarity algorithms.
  • The server can obtain webshell samples in different scripting languages and use different text similarity algorithms to calculate the hash values of each sample; that is, for the same webshell sample, N algorithms yield N hash values.
  • Obtaining webshell samples in different scripting languages addresses the problem that most webshell detection targets script backdoors written in ASP, ASP.NET, PHP, and similar languages, with insufficient support for other scripting languages such as JSP; it thus helps support detection of files in multiple scripting languages.
  • The text similarity algorithms may include the simhash algorithm, the ssdeep algorithm, and the like.
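  • For orientation, a minimal simhash sketch is given below. It is not taken from the application: the word-level tokenization, the use of SHA-256 as the per-token hash, and the 64-bit fingerprint width are all assumptions chosen for illustration.

```python
import hashlib
import re


def simhash(text, bits=64):
    """Compute a simhash fingerprint from the word tokens of text."""
    weights = [0] * bits
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[: bits // 8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(bits):
            weights[i] += 1 if (token_hash >> i) & 1 else -1
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

  • Two files with mostly identical tokens tend to produce fingerprints that differ in only a few bits, which is why the Hamming distance is used for comparison.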
  • The server may store all the obtained webshell samples and the N sample feature hash values corresponding to each webshell sample in a hash fingerprint database.
  • The hash fingerprint database can reflect the correspondence shown in Table 1-1:
  • The server may also establish a separate sub-hash fingerprint database for each text similarity algorithm; that is, the hash fingerprint database includes one or more sub-hash fingerprint databases.
  • For example, the hash fingerprint database may contain one sub-hash fingerprint database for the simhash algorithm and one for the ssdeep algorithm, and the correspondences they reflect can be as shown in Tables 1-2 and 1-3.
  • Establishing a sub-hash fingerprint database for each text similarity algorithm makes it easy, when a target algorithm (such as the simhash algorithm or the ssdeep algorithm) is used for text similarity comparison, to obtain the sample feature hash values under that algorithm directly from the corresponding sub-hash fingerprint database, which further improves webshell detection efficiency.
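  • One way to picture the sub-hash fingerprint databases is as a nested mapping keyed first by algorithm and then by sample. The sketch below assumes this in-memory layout; the function name `build_fingerprint_db` and the dictionary shapes are hypothetical and do not reproduce Tables 1-1 to 1-3.

```python
def build_fingerprint_db(samples, algorithms):
    """Build one sub-database per algorithm.

    samples:    {sample_name: source_text}
    algorithms: {algorithm_name: hash_function}
    returns:    {algorithm_name: {sample_name: sample_feature_hash}}
    """
    db = {}
    for algorithm_name, hash_function in algorithms.items():
        db[algorithm_name] = {
            sample_name: hash_function(text)
            for sample_name, text in samples.items()
        }
    return db
```

  • With this layout, a detector that has chosen a target algorithm only needs `db[target_algorithm]` and never touches the hashes produced by the other algorithms.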
  • When the server receives the security detection instruction, it detects the current operating environment and determines, from a preset list of text similarity algorithms, a target algorithm that matches the current operating environment.
  • The target algorithm is the first text similarity algorithm or the second text similarity algorithm.
  • The preset text similarity algorithm list includes N types of text similarity algorithms.
  • The N text similarity algorithms included in the text similarity algorithm list are consistent with the N text similarity algorithms used when the hash fingerprint database is established in step 202.
  • The N types of text similarity algorithms may include at least one of a first text similarity algorithm and a second text similarity algorithm.
  • The first text similarity algorithm may be, for example, the simhash algorithm, and the second text similarity algorithm may be, for example, the ssdeep algorithm. Both are similarity detection algorithms, and both can be used to generate a feature hash value for a suspect sample (that is, a file to be tested) and then compare it with the sample feature hash value of any webshell sample in the above-mentioned hash fingerprint database. Of the two, the simhash algorithm is easy to use, has a short detection time, and places no requirements on the deployment configuration of the production environment.
  • The ssdeep algorithm has higher detection accuracy than the simhash algorithm, but it places certain requirements on the deployment environment and relies on additional libraries (such as function modules and the algorithm library they depend on); that is, the ssdeep algorithm is generally suitable for webshell detection in local scenarios.
  • The text similarity algorithm list includes the first text similarity algorithm and the second text similarity algorithm. The first text similarity algorithm places no requirements on the deployment configuration of the production environment, while the second text similarity algorithm places certain requirements on the deployment environment and depends on additional libraries at runtime.
  • The server can detect whether the current operating environment is configured with the function modules and algorithm library that the second text similarity algorithm depends on. If so, the second text similarity algorithm is determined as the target algorithm, and the files to be tested in the target directory and/or target path indicated by the security detection instruction are detected based on the second text similarity algorithm to determine the feature hash value of the file under test under the second text similarity algorithm.
  • If not, the first text similarity algorithm is determined as the target algorithm, and the files to be tested in the target directory and/or target path indicated by the security detection instruction are detected based on the first text similarity algorithm to determine the feature hash value of the file under test under the first text similarity algorithm.
  • On the one hand, this solution can perform webshell detection in both remote and local scenarios, which improves generality; on the other hand, when running locally, an algorithm with higher detection accuracy (such as the ssdeep algorithm) can be preferred, making webshell detection results more accurate.
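  • The environment check described above might look like the following sketch. It assumes that the extra dependency of the second algorithm is a Python module importable under the name `ssdeep`; that module name, the function name `select_target_algorithm`, and the `is_available` hook are illustrative assumptions rather than details from the application.

```python
import importlib.util


def select_target_algorithm(preferred="ssdeep", fallback="simhash", is_available=None):
    """Pick the preferred algorithm if its library is installed, else the fallback."""
    if is_available is None:
        # Assumption: the dependency is a top-level Python module named like the algorithm.
        def is_available(module_name):
            return importlib.util.find_spec(module_name) is not None

    return preferred if is_available(preferred) else fallback
```

  • The `is_available` parameter exists only so the choice can be exercised without installing anything, for example `select_target_algorithm(is_available=lambda name: False)` returns `"simhash"`.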
  • If the target algorithm is the first text similarity algorithm, the server detects the files to be tested in the target directory and/or target path indicated by the security detection instruction based on the first text similarity algorithm, and determines the feature hash value of the file to be tested under the first text similarity algorithm.
  • The server obtains, from a pre-established hash fingerprint database, the sample feature hash values corresponding to any one or more webshell samples under the first text similarity algorithm.
  • The server obtains the Hamming distances between the feature hash value of the file under test under the first text similarity algorithm and the sample feature hash values corresponding to the one or more webshell samples under the first text similarity algorithm.
  • If any of the Hamming distances is equal to the first Hamming threshold, the server determines that a sample feature hash value matching the feature hash value exists, and determines that the file to be tested is a webshell.
  • Specifically, when the server determines that the target algorithm is the first text similarity algorithm, it can obtain from the hash fingerprint database the sample feature hash values of any one or more webshell samples under the first text similarity algorithm (hereinafter referred to as the first sample hash values), and compare them with the feature hash value of the file under test under the first text similarity algorithm (hereinafter referred to as the first hash value) to obtain the Hamming distance between the first hash value and each first sample hash value.
  • The server may compare each Hamming distance with a preset first Hamming threshold and a preset second Hamming threshold, where the second Hamming threshold is greater than the first Hamming threshold. If any of the Hamming distances is equal to the first Hamming threshold (for example, 0), it can be determined that there is a sample feature hash value matching the feature hash value, and thus that the file to be tested is a webshell. If any Hamming distance is greater than the first Hamming threshold and not greater than the second Hamming threshold (for example, greater than 0 and not greater than 3), the file under test may be determined to be a variant webshell. If the Hamming distance is greater than the second Hamming threshold (for example, greater than 3), it can be determined that the file to be tested is a non-webshell.
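  • The three-way decision on Hamming distances can be sketched as below. The thresholds 0 and 3 are the example values from the text; using the smallest distance over all samples as the deciding value is an assumption about how "any Hamming distance" is meant to be applied, and the function name is hypothetical.

```python
def classify_by_hamming(file_hash, sample_hashes, first_threshold=0, second_threshold=3):
    """Classify a file from the Hamming distances to known webshell fingerprints."""
    distances = [bin(file_hash ^ sample).count("1") for sample in sample_hashes]
    if not distances:
        return "non-webshell"
    closest = min(distances)  # assumption: the nearest sample decides the verdict
    if closest <= first_threshold:  # with a threshold of 0 this means an exact match
        return "webshell"
    if closest <= second_threshold:
        return "variant webshell"
    return "non-webshell"
```

  • For example, a fingerprint that differs from a stored sample in a single bit would be reported as a variant webshell rather than an exact match.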
  • When the server determines that the file to be tested is a webshell or a variant webshell, it may also output an alarm message indicating that a webshell or a variant webshell has been detected.
  • The server can also record the feature hash value of the current file to be tested and its path information (that is, the target directory and/or target path in step 205).
  • From the recorded information (the feature hash value and path information of the current file to be tested, and the path information and sample feature hash value of the target webshell sample), the server generates a scan log and outputs it to the user for easy viewing.
  • If the server determines that the file to be tested is a variant webshell, the variant webshell and its feature hash value may be stored in association in the previously established hash fingerprint database, thereby updating the hash fingerprint database.
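  • A rough sketch of the logging and database update follows. The record fields, the list-based scan log, and the dictionary-based fingerprint database are all hypothetical choices made for illustration; the application does not specify a storage format.

```python
def record_detection(fingerprint_db, scan_log, file_path, file_hash, verdict,
                     sample_path=None, sample_hash=None):
    """Append a scan-log entry and add newly found variants to the fingerprint database."""
    entry = {
        "file_path": file_path,
        "file_hash": file_hash,
        "verdict": verdict,
        "sample_path": sample_path,  # path of the matched webshell sample, if any
        "sample_hash": sample_hash,  # its sample feature hash value, if any
    }
    scan_log.append(entry)
    if verdict == "variant webshell":
        # Store the variant so later scans can match it directly.
        fingerprint_db[file_path] = file_hash
    return entry
```

  • Files judged to be non-webshells are logged but never added to the database in this sketch.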
  • In this embodiment, the server may detect the current running environment and determine, from a preset list of text similarity algorithms, a target algorithm matching the current running environment. If the target algorithm is the first text similarity algorithm, the server detects the files to be tested in the target directory and/or target path indicated by the security detection instruction based on the first text similarity algorithm, which helps improve the detection efficiency of the webshell.
  • FIG. 3 is a schematic flowchart of another webshell detection method according to an embodiment of the present application.
  • The webshell detection method may include:
  • The server obtains webshell samples in different scripting languages.
  • The server uses N types of text similarity algorithms to determine N sample feature hash values of each webshell sample under the N types of text similarity algorithms, where N is a positive integer.
  • The server establishes a hash fingerprint database from the webshell samples in different scripting languages and the N sample feature hash values of each webshell sample under the N text similarity algorithms.
  • When receiving the security detection instruction, the server detects the current operating environment and determines, from a preset list of text similarity algorithms, a target algorithm that matches the current operating environment.
  • The target algorithm is the first text similarity algorithm or the second text similarity algorithm.
  • For steps 301 to 304, refer to the related descriptions of steps 201 to 204 in the foregoing embodiment; details are not repeated here.
  • If the target algorithm is the second text similarity algorithm, the server detects the files to be tested in the target directory and/or target path indicated by the security detection instruction based on the second text similarity algorithm, and determines the feature hash value of the file to be tested under the second text similarity algorithm.
  • The server obtains, from a pre-established hash fingerprint database, the sample feature hash values corresponding to any one or more webshell samples under the second text similarity algorithm.
  • The server obtains the weighted edit distances between the feature hash value of the file under test under the second text similarity algorithm and the sample feature hash values corresponding to the one or more webshell samples under the second text similarity algorithm.
  • The server determines, according to each weighted edit distance, the matching value between the feature hash value under the second text similarity algorithm and each sample feature hash value corresponding to the one or more webshell samples under the second text similarity algorithm.
  • If any of the matching values is equal to the first matching threshold, the server determines that a sample feature hash value matching the feature hash value exists, and determines that the file to be tested is a webshell.
  • Specifically, when the server determines that the target algorithm is the second text similarity algorithm, it can obtain from the hash fingerprint database the sample feature hash values of any one or more webshell samples under the second text similarity algorithm (hereinafter referred to as the second sample hash values), and compare them for similarity with the feature hash value of the file under test under the second text similarity algorithm (hereinafter referred to as the second hash value). This yields the weighted edit distance between the second hash value and each second sample hash value, from which the matching value between the second hash value and each second sample hash value is determined.
  • After obtaining the weighted edit distances between the second hash value of the file to be tested and the second sample hash values corresponding to any one or more webshell samples, the server may divide each weighted edit distance by the sum of the lengths of the second hash value and the second sample hash value, and then map the result to an integer value from 0 to 100 to obtain the matching value between the second hash value of the file to be tested and each second sample hash value.
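  • The mapping from a weighted edit distance to a 0 to 100 matching value can be illustrated as follows. The edit costs (1 for insertion and deletion, 2 for substitution) and the inversion so that 100 means identical are assumptions chosen to be consistent with the thresholds mentioned in the text; this is not the scoring code of any particular ssdeep implementation.

```python
def weighted_edit_distance(a, b, insert_cost=1, delete_cost=1, substitute_cost=2):
    """Edit distance between strings a and b with configurable operation costs."""
    previous = [j * insert_cost for j in range(len(b) + 1)]
    for i, char_a in enumerate(a, start=1):
        current = [i * delete_cost]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + delete_cost,
                current[j - 1] + insert_cost,
                previous[j - 1] + (0 if char_a == char_b else substitute_cost),
            ))
        previous = current
    return previous[-1]


def matching_value(hash_a, hash_b):
    """Map the length-normalized distance to an integer score from 0 to 100."""
    total_length = len(hash_a) + len(hash_b)
    if total_length == 0:
        return 100
    distance = weighted_edit_distance(hash_a, hash_b)
    return 100 - round(100 * distance / total_length)
```

  • With these costs the distance never exceeds the sum of the two lengths, so the score stays within 0 to 100: identical strings score 100 and strings with no characters in common score 0.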
  • If any of the matching values is equal to the first matching threshold (for example, 100), the server may determine that there is a sample feature hash value matching the feature hash value, and then determine that the file to be tested is a webshell.
  • If any of the matching values is greater than the second matching threshold (for example, 50) and less than the first matching threshold (for example, 100), the file to be tested may be determined to be a variant webshell.
  • If the matching value is not greater than the second matching threshold (for example, 50) and is not smaller than 0, it can be determined that the file to be tested is a non-webshell.
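  • The decision on matching values mirrors the Hamming-distance case with the comparison direction reversed. In the sketch below the thresholds 100 and 50 are the example values from the text, while deciding on the highest matching value across all samples is an assumption, and the function name is hypothetical.

```python
def classify_by_match(matching_values, first_threshold=100, second_threshold=50):
    """Classify a file from its matching values against known webshell samples."""
    best = max(matching_values, default=0)  # assumption: the best match decides
    if best >= first_threshold:
        return "webshell"
    if best > second_threshold:
        return "variant webshell"
    return "non-webshell"
```

  • A file whose best matching value is exactly 50 is treated as a non-webshell, matching the "not greater than the second matching threshold" condition.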
  • When the server executes step 303 to establish the hash fingerprint database from the webshell samples in different scripting languages and the N sample feature hash values of the webshell samples under the N text similarity algorithms, it can further take into account the conditions used to classify the file under test: the webshell condition (for example, the matching value is equal to the first matching threshold), the variant webshell condition (for example, the matching value is greater than the second matching threshold, such as 50, and less than the first matching threshold), and the non-webshell condition (for example, the matching value is not greater than the second matching threshold, such as 50, and not less than 0).
  • For example, suppose the first text similarity algorithm and the second text similarity algorithm used to establish the hash fingerprint database are the simhash algorithm and the ssdeep algorithm, respectively.
  • The hash fingerprint database can then be subdivided into two sub-hash fingerprint databases: one under the simhash algorithm (hereinafter referred to as the simhash fingerprint database) and one under the ssdeep algorithm (hereinafter referred to as the ssdeep fingerprint database).
  • The ssdeep fingerprint database includes each webshell sample and the sample feature hash value of each webshell sample under the ssdeep algorithm (hereinafter referred to as the ssdeep sample feature hash value).
  • For each sample feature hash value under the simhash algorithm (hereinafter referred to as the simhash sample feature hash value), the server can group hash values into three types: those whose Hamming distance from it is equal to the first Hamming threshold, those whose Hamming distance is greater than the first Hamming threshold and not greater than the second Hamming threshold, and those whose Hamming distance is greater than the second Hamming threshold. Each type is stored in the simhash fingerprint database in association with that simhash sample feature hash value, and tag information can be added to each type during storage.
  • For example, hash values whose Hamming distance from a simhash sample feature hash value is equal to the first Hamming threshold are tagged as webshell; hash values whose Hamming distance is greater than the first Hamming threshold and not greater than the second Hamming threshold are tagged as variant webshell; and hash values whose Hamming distance is greater than the second Hamming threshold are tagged as non-webshell.
  • For example, taking 1001001 as a simhash sample feature hash value, a string (that is, a hash value) with a Hamming distance of 0 from 1001001, namely 1001001 itself, can be stored in the simhash fingerprint database in association with 1001001 and marked as a string belonging to a webshell.
  • Strings with a Hamming distance from 1001001 greater than 3 are stored in the simhash fingerprint database in association with 1001001 and marked as strings that do not belong to a webshell.
  • Strings with a Hamming distance from 1001001 greater than 0 and not greater than 3 are stored in the simhash fingerprint database in association with 1001001 and marked as strings belonging to a variant webshell.
  • After the server determines the feature hash value of the file to be tested, it can search the pre-established hash fingerprint database for a string identical to that feature hash value and then read the tag information of that string. If the tag information indicates that the string belongs to a webshell, the file to be tested is determined to be a webshell; if the tag information indicates that the string does not belong to a webshell, the file to be tested is determined to be a non-webshell; if the tag information indicates that the string belongs to a variant webshell, the file to be tested is determined to be a variant webshell. In this way, the amount of computation needed to compare the feature hash value of the file to be tested with the sample feature hash values in the pre-established hash fingerprint database can be reduced, and webshell detection efficiency can be further improved.
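  • The tagged lookup can be illustrated with the 7-bit example value 1001001. The sketch below enumerates every 7-bit string and tags it by its Hamming distance from the sample; the function name and dictionary layout are hypothetical, and full enumeration is only practical for toy widths like this one (a 64-bit fingerprint would have 2^64 candidates).

```python
def build_tag_table(sample_hash, bits=7, first_threshold=0, second_threshold=3):
    """Precompute a tag for every bit string of the given width."""
    table = {}
    for candidate in range(1 << bits):
        distance = bin(candidate ^ sample_hash).count("1")
        if distance <= first_threshold:
            tag = "webshell"
        elif distance <= second_threshold:
            tag = "variant webshell"
        else:
            tag = "non-webshell"
        table[format(candidate, f"0{bits}b")] = tag
    return table
```

  • Detection then reduces to a single dictionary lookup such as `build_tag_table(0b1001001)["1001000"]`, with no distance computed at scan time.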
  • In this embodiment, the server may detect the current running environment and determine, from a preset list of text similarity algorithms, a target algorithm matching the current running environment. If the target algorithm is the second text similarity algorithm, the server detects the files under test in the target directory and/or target path indicated by the security detection instruction based on the second text similarity algorithm.
  • Using the text similarity algorithm that matches the current running environment to detect the file to be tested helps improve the accuracy of webshell detection.
  • An embodiment of the present application further provides a webshell detection device, which includes modules for executing the method described in FIG. 1, FIG. 2, or FIG. 3.
  • FIG. 4 is a schematic block diagram of a webshell detection device according to an embodiment of the present application.
  • The webshell detection device of this embodiment includes:
  • a detection module 40, configured to, when a security detection instruction is received, detect a file to be tested in a target directory and/or a target path as indicated by the security detection instruction, and determine a feature hash value of the file to be tested;
  • a comparison module 41, configured to compare the feature hash value of the file to be tested with a sample feature hash value of a webshell sample in a pre-established hash fingerprint database;
  • a determining module 42, configured to determine that the file to be tested is a webshell if a sample feature hash value matching the feature hash value exists.
  • The device further includes an obtaining module 43 and an establishing module 44, wherein:
  • the determining module 42 is further configured to use N types of text similarity algorithms to determine N sample feature hash values of each webshell sample under the N types of text similarity algorithms, where N is a positive integer;
  • the establishing module 44 is configured to establish a hash fingerprint database according to the webshell samples of the different scripting languages and the N sample feature hash values of the webshell samples under the N text similarity algorithms.
  • The detection module 40 is specifically configured to detect the current running environment and determine, from a preset text similarity algorithm list, a target algorithm matching the current running environment, where the preset text similarity algorithm list includes the N types of text similarity algorithms; and, based on the target algorithm, to detect the files to be tested in the target directory and/or target path indicated by the security detection instruction and determine the feature hash value of the file to be tested under the target algorithm.
  • the comparison module 41 is specifically configured to obtain any one or more webshell samples from a pre-established hash fingerprint database, and the respective sample feature hash values corresponding to the target algorithm under the target algorithm, if present, are consistent with the feature hash values. Matching sample feature hash values; comparing the feature hash values of the file under test under the target algorithm with the sample feature hash values corresponding to any one or more webshell samples under the target algorithm comparing.
  • the target algorithm is a first text similarity algorithm, and the comparison module 41 is further specifically configured to obtain each Hamming distance between the characteristic hash value of the file under test under the first text similarity algorithm and the sample feature hash value corresponding to each of the one or more webshell samples under the first text similarity algorithm; the determining module 42 is specifically configured to determine, if any one of the Hamming distances is equal to a first Hamming threshold, that there is a sample feature hash value that matches the feature hash value.
  • the target algorithm is a second text similarity algorithm, and the comparison module 41 is specifically configured to obtain each weighted edit distance between the feature hash value of the file under test under the second text similarity algorithm and the sample feature hash value corresponding to each of the one or more webshell samples under the second text similarity algorithm; and to determine, according to the weighted edit distances, each matching value between the feature hash value under the second text similarity algorithm and the sample feature hash value corresponding to each of the one or more webshell samples under the second text similarity algorithm;
  • the determining module 42 is further specifically configured to determine, if any of the matching values is equal to a first matching threshold, that a sample feature hash value matching the feature hash value exists.
  • if the determining module 42 determines that any one of the Hamming distances is greater than the first Hamming threshold and not greater than a second Hamming threshold, it determines that the file to be tested is a variant webshell, where the second Hamming threshold is greater than the first Hamming threshold;
  • if the determining module 42 determines that any one of the matching values is greater than a second matching threshold and less than the first matching threshold, it determines that the file to be tested is a variant webshell, where the first matching threshold is greater than the second matching threshold.
  • the determining module 42 is further configured to parse the security detection instruction to determine the target directory and/or target path that the security detection instruction requires to be tested; and the obtaining module 43 is further configured to obtain the extensions of all files in the target directory and/or the target path, and to determine the files in the target directory and/or the target path whose extension is a preset extension as the files to be tested.
  • FIG. 5 is a schematic block diagram of a server provided by an embodiment of the present application.
  • the server includes a processor 501, a memory 502, and a network interface 503.
  • the processor 501, the memory 502, and the network interface 503 may be connected through a bus or in other manners.
  • connection through a bus is taken as an example.
  • the network interface 503 is controlled by the processor to send and receive messages, and the memory 502 is used to store a computer program.
  • the computer program includes program instructions, and the processor 501 is configured to execute the program instructions stored in the memory 502.
  • the processor 501 is configured to call the program instructions to execute: upon receiving a security detection instruction, detecting a file to be tested in a target directory and/or under a target path according to the security detection instruction, and determining a characteristic hash value of the file to be tested; comparing the characteristic hash value of the file to be tested with a sample characteristic hash value of a webshell sample in a pre-established hash fingerprint database; and, if there is a sample characteristic hash value that matches the characteristic hash value, determining that the file to be tested is a webshell.
  • the processor 501 may be a Central Processing Unit (CPU), or it may be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the memory 502 may include a read-only memory and a random access memory, and provide instructions and data to the processor 501. A part of the memory 502 may further include a non-volatile random access memory. For example, the memory 502 may also store information of a device type.
  • the processor 501, the memory 502, and the network interface 503 described in the embodiments of the present application may perform the implementations described in the method embodiments of FIG. 1, FIG. 2, or FIG. 3, and may also perform the implementation of the webshell detection device described in the embodiments of the present application; details are not described herein again.
  • a computer-readable storage medium stores a computer program, where the computer program includes program instructions that, when executed by a processor, implement: upon receiving a security detection instruction, detecting the files to be tested in the target directory and/or under the target path according to the security detection instruction, and determining a characteristic hash value of the files to be tested; comparing the characteristic hash value of the file to be tested with a sample characteristic hash value of a webshell sample in a pre-established hash fingerprint database; and, if a sample characteristic hash value that matches the characteristic hash value exists, determining that the file to be tested is a webshell.
  • the computer-readable storage medium may be an internal storage unit of the server according to any of the foregoing embodiments, such as a hard disk or a memory of the server.
  • the computer-readable storage medium may also be an external storage device of the server, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the server.
  • the computer-readable storage medium may further include both an internal storage unit of the server and an external storage device.
  • the computer-readable storage medium is used to store the computer program and other programs and data required by the server.
  • the computer-readable storage medium may also be used to temporarily store data that has been or will be output.
  • the program can be stored in a computer-readable storage medium.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM).

Abstract

A detection method for malicious script backdoors (webshells) and a related device, wherein the method comprises: upon receiving a security detection instruction, a server detects a file to be detected under a target directory and/or a target path according to the indication of the security detection instruction, and determines a feature hash value of the file to be detected (101); the server compares the feature hash value of the file to be detected with a sample feature hash value of a webshell sample in a pre-established hash fingerprint database (102); if a sample feature hash value matching the feature hash value exists, the server determines that the file to be detected is a webshell (103). The method helps improve webshell detection efficiency.

Description

Webshell detection method and related equipment
This application claims priority to Chinese patent application No. 201810685031.3, filed with the Chinese Patent Office on June 27, 2018 and entitled "A Webshell Detection Method and Related Equipment", the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of computer technology, and in particular, to a webshell detection method and related equipment.
Background
A malicious script backdoor, or webshell, is a backdoor program through which a hacker controls a server. If a hacker finds that a web application has a file upload vulnerability, the hacker can upload a webshell for subsequent attacks and use it to covertly control the web server remotely: uploading, viewing, modifying, and deleting files on the website server, reading and modifying data in the website database, and even running system commands directly on the website server.
At present, webshell detection mainly relies on methods based on malicious code, string encoding, dangerous functions, and the like. The code or functions of the file to be tested must be compared with known malicious code and dangerous functions, which involves a large amount of computation, takes a long time, and results in low detection efficiency.
Summary of the invention
The embodiments of the present application provide a webshell detection method and related equipment. During webshell detection, the strings, code, functions, and so on of the file to be tested need not be compared in real time; only hash values are compared, which helps improve webshell detection efficiency.
In a first aspect, an embodiment of the present application provides a webshell detection method. The method includes:
upon receiving a security detection instruction, detecting a file to be tested in a target directory and/or under a target path according to the security detection instruction, and determining a characteristic hash value of the file to be tested;
comparing the characteristic hash value of the file to be tested with a sample characteristic hash value of a webshell sample in a pre-established hash fingerprint database;
if there is a sample characteristic hash value that matches the characteristic hash value, determining that the file to be tested is a webshell.
In a second aspect, an embodiment of the present application provides a webshell detection device. The webshell detection device includes modules for executing the method of the first aspect.
In a third aspect, an embodiment of the present application provides a server. The server includes a processor, a network interface, and a memory, which are connected to each other. The network interface is controlled by the processor to send and receive messages, and the memory is configured to store a computer program that supports the server in executing the foregoing method. The computer program includes program instructions, and the processor is configured to call the program instructions to execute the method of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium. The computer-readable storage medium stores a computer program, where the computer program includes program instructions that, when executed by a processor, cause the processor to execute the method of the first aspect.
In the embodiments of the present application, the server may compare the characteristic hash value of the file to be tested with the sample characteristic hash values of malicious script backdoor (webshell) samples in the pre-established hash fingerprint database. If there is a sample characteristic hash value that matches the characteristic hash value, the file to be tested is determined to be a webshell. The embodiments of the present application help improve webshell detection efficiency.
Brief description of the drawings
FIG. 1 is a schematic flowchart of a webshell detection method according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of another webshell detection method according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of yet another webshell detection method according to an embodiment of the present application;
FIG. 4 is a schematic block diagram of a webshell detection device according to an embodiment of the present application;
FIG. 5 is a schematic block diagram of a server according to an embodiment of the present application.
Detailed description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this application without creative effort fall within the protection scope of this application.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of a webshell detection method according to an embodiment of the present application. As shown in the figure, the webshell detection method may include:
101. Upon receiving a security detection instruction, the server detects a file to be tested in a target directory and/or under a target path according to the security detection instruction, and determines a characteristic hash value of the file to be tested.
In one embodiment, when operation and maintenance personnel discover an intrusion event or find that a webshell exists in a business system, they can input a security detection instruction to the server, instructing the server to scan the files to be tested in a specified directory or under a specified path. For example, the security detection instruction may instruct the server to scan the files to be tested in directory F. After receiving the security detection instruction input by the user, the server can, as instructed, detect any file to be tested in the specified directory or under the specified path and determine the characteristic hash value of that file using a text similarity algorithm.
In one embodiment, before detecting the files to be tested in the target directory and/or under the target path according to the security detection instruction, the server may also parse the security detection instruction to determine the target directory and/or target path to be tested, obtain the extensions of all files in the target directory and/or target path, and determine the files whose extension is a preset extension as the files to be tested. In this way, the server does not need to scan every file in the specified directory or path, which reduces the amount of computation and further improves webshell detection efficiency. The preset extensions may be extensions that exclude those of the safe file types designated by the operation and maintenance personnel, for example excluding .doc, .pdf, and .rar.
For example, suppose the operation and maintenance personnel want to perform webshell detection on the files in directory F whose extension is a preset extension. In this case, they can input a security detection instruction to the server, instructing it to perform webshell detection on the files (the files to be tested) in directory F (the target directory) whose extension is a preset extension. After receiving the security detection instruction, the server may determine directory F as the target directory, identify the extensions of all files in directory F, determine the files in directory F whose extension is a preset extension as the files to be tested, and perform webshell detection on those files.
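The extension filter described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `select_files_to_test` and the constant `SAFE_EXTENSIONS` are hypothetical, and the sketch assumes that "preset extension" means any extension outside an operator-defined safe list.

```python
from pathlib import Path

# Hypothetical safe list: extensions the operator treats as safe.
# The text names .doc, .pdf and .rar as examples.
SAFE_EXTENSIONS = {".doc", ".pdf", ".rar"}


def select_files_to_test(target_dir, safe_extensions=SAFE_EXTENSIONS):
    """Return the files under target_dir whose extension is not on the safe list."""
    root = Path(target_dir)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() not in safe_extensions
    )
```

Only the files returned by such a filter would then be hashed, which is where the reduction in computation comes from.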
102. The server compares the characteristic hash value of the file to be tested with a sample characteristic hash value of a webshell sample in a pre-established hash fingerprint database.
103. If there is a sample characteristic hash value that matches the characteristic hash value, the server determines that the file to be tested is a webshell.
In one embodiment, the server may obtain webshell samples of different scripting languages in advance, determine the sample feature hash value of each webshell sample using a text similarity algorithm, and store the webshell samples together with their sample feature hash values in a hash fingerprint database, thereby establishing the hash fingerprint database. In this case, after determining the characteristic hash value of the file to be tested, the server can compare it with the sample feature hash values of any one or more webshell samples in the fingerprint database. If the comparison shows that the characteristic hash value matches any sample feature hash value (for example, the Hamming distance between the two equals the Hamming distance threshold, or the matching value between the two equals the matching threshold), the file to be tested is determined to be a webshell. With this detection method, the strings, code features, functions, and so on of the file to be tested need not be compared in real time; only hash values are compared, which helps improve webshell detection efficiency.
In the embodiments of the present application, upon receiving a security detection instruction, the server may detect the files to be tested in the target directory and/or under the target path according to the security detection instruction, determine the characteristic hash value of the file to be tested, and compare it with the sample characteristic hash values of webshell samples in the pre-established hash fingerprint database. If there is a sample characteristic hash value that matches the characteristic hash value, the file to be tested is determined to be a webshell. The embodiments of the present application help improve webshell detection efficiency.
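Steps 101 to 103 can be summarized in a short sketch. The hashing and matching functions are passed in as parameters because the text leaves the concrete algorithm open; the names `scan`, `compute_hash`, and `is_match` are hypothetical.

```python
def scan(paths, compute_hash, sample_hashes, is_match):
    """Return the paths whose feature hash matches any sample hash.

    compute_hash: maps a file path to its feature hash (step 101).
    sample_hashes: precomputed hashes from the fingerprint database.
    is_match: decides whether two hashes match (steps 102 and 103).
    """
    flagged = []
    for path in paths:
        file_hash = compute_hash(path)
        if any(is_match(file_hash, sample) for sample in sample_hashes):
            flagged.append(path)
    return flagged
```

The point of the design is visible in the loop: at scan time only hash values are compared, while the expensive analysis of the webshell samples happens once, when the database is built.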
Referring to FIG. 2, FIG. 2 is a schematic flowchart of another webshell detection method according to an embodiment of the present application. As shown in the figure, the webshell detection method may include:
201. The server obtains webshell samples of different scripting languages.
202. The server uses N text similarity algorithms to determine N sample feature hash values of each webshell sample under the N text similarity algorithms, where N is a positive integer.
203. The server establishes a hash fingerprint database according to the webshell samples of the different scripting languages and the N sample feature hash values of the webshell samples under the N text similarity algorithms.
In one embodiment, the server can obtain webshell samples of different scripting languages and use different text similarity algorithms to calculate the hash values of the webshell samples; that is, for the same webshell sample, N algorithms yield N hash values. Obtaining webshell samples of different scripting languages overcomes the problem that most webshell detection checks only script backdoors written in languages such as ASP, ASP.NET, and PHP, with insufficient support for other scripting languages such as JSP; this facilitates detection of files in multiple scripting languages. The text similarity algorithms may include the simhash algorithm, the ssdeep algorithm, and the like.
Further, the server may store all the obtained webshell samples, together with the N sample hash values corresponding to each webshell sample, in the hash fingerprint database. For example, when the text similarity algorithms include simhash and ssdeep, the hash fingerprint database can reflect the correspondence shown in Table 1-1:
Table 1-1
webshell sample    hash under simhash    hash under ssdeep
a                  1000000001            101000000001
b                  1000000010            101100000000
Alternatively, the server may establish a separate sub-hash fingerprint database for each text similarity algorithm, so that the hash fingerprint database includes one or more such sub-hash fingerprint databases. For example, when the text similarity algorithms include simhash and ssdeep, sub-hash fingerprint databases can be established for the simhash algorithm and the ssdeep algorithm respectively, and the relationships reflected by these sub-hash fingerprint databases can be as shown in Tables 1-2 and 1-3. Establishing a sub-hash fingerprint database per algorithm means that, once the target algorithm for the text similarity comparison (such as simhash or ssdeep) is determined, the sample feature hash values under that algorithm can be fetched directly from the corresponding sub-hash fingerprint database, which further improves webshell detection efficiency.
Table 1-2
webshell sample    hash under simhash
a                  1000000001
b                  1000000010
Table 1-3
webshell sample    hash under ssdeep
a                  101000000001
b                  101100000000
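The two database layouts can be sketched with plain dictionaries. The hash strings are placeholder values modelled on Tables 1-1 to 1-3, and the names `fingerprint_db` and `split_by_algorithm` are hypothetical; a real deployment would presumably use persistent storage.

```python
# Combined layout (as in Table 1-1): sample -> {algorithm -> hash}.
fingerprint_db = {
    "a": {"simhash": "1000000001", "ssdeep": "101000000001"},
    "b": {"simhash": "1000000010", "ssdeep": "101100000000"},
}


def split_by_algorithm(db):
    """Build per-algorithm sub-databases (as in Tables 1-2 and 1-3).

    Returns a mapping: algorithm -> {sample -> hash}.
    """
    sub_dbs = {}
    for sample, hashes in db.items():
        for algorithm, value in hashes.items():
            sub_dbs.setdefault(algorithm, {})[sample] = value
    return sub_dbs


sub_dbs = split_by_algorithm(fingerprint_db)
```

With the split layout, looking up `sub_dbs["simhash"]` returns every sample hash for the target algorithm in one step, without touching the hashes of the other algorithms.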
204. Upon receiving a security detection instruction, the server detects the current running environment and determines, from a preset text similarity algorithm list, a target algorithm that matches the current running environment, where the target algorithm is the first text similarity algorithm or the second text similarity algorithm. The preset text similarity algorithm list includes the N text similarity algorithms.
The N text similarity algorithms included in the text similarity algorithm list are the same as the N text similarity algorithms used in step 202 when establishing the hash fingerprint database.
In one embodiment, the N text similarity algorithms may include at least one of a first text similarity algorithm and a second text similarity algorithm. The first text similarity algorithm may be, for example, the simhash algorithm, and the second text similarity algorithm may be, for example, the ssdeep algorithm. Both are similarity detection algorithms, and both can be used to generate a characteristic hash value for a suspect sample (i.e., the file to be tested) and compare it with the sample feature hash value of any webshell sample in the hash fingerprint database. Of the two, the simhash algorithm is easy to use, has a short detection time, and places no requirements on the deployment configuration of the production environment, making it suitable for scenarios such as remote detection and rapid detection during security emergency response. The ssdeep algorithm offers higher detection accuracy than the simhash algorithm, but it places certain requirements on the deployment environment and depends on additional libraries (for example, the algorithm library that its functional modules depend on); the ssdeep algorithm is therefore generally suited to webshell detection in local scenarios.
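To make the first algorithm concrete, the following is a minimal sketch of one common formulation of simhash. The text does not specify tokenization, the per-token hash, or the fingerprint width, so word tokens, MD5, and 64 bits are all assumptions made here for illustration.

```python
import hashlib
import re


def simhash(text, bits=64):
    """Compute an unweighted simhash fingerprint over word tokens.

    Each token votes +1 or -1 on every bit position according to its own
    hash; the fingerprint keeps the bits whose vote total is positive.
    """
    votes = [0] * bits
    for token in re.findall(r"\w+", text):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            votes[i] += 1 if (token_hash >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if votes[i] > 0)
```

Because similar token sets produce similar vote totals, two near-identical scripts tend to yield fingerprints that differ in only a few bits, which is what makes a Hamming-distance comparison meaningful.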
In one embodiment, the text similarity algorithm list includes the first text similarity algorithm and the second text similarity algorithm described above. The first text similarity algorithm places no requirements on the deployment configuration of the production environment, whereas the second places certain requirements on the deployment environment and depends on additional libraries at run time. In this case, after receiving the security detection instruction, the server can check whether the current running environment is configured with the algorithm library that the functional modules of the second text similarity algorithm depend on. If it is, the server determines the second text similarity algorithm as the target algorithm, detects the files to be tested in the target directory and/or target path indicated by the security detection instruction based on the second text similarity algorithm, and determines the characteristic hash value of the file to be tested under the second text similarity algorithm. If it is not, the server determines the first text similarity algorithm as the target algorithm, detects the files to be tested in the target directory and/or target path indicated by the security detection instruction based on the first text similarity algorithm, and determines the characteristic hash value of the file to be tested under the first text similarity algorithm. Combining multiple algorithms for webshell detection has two benefits: on the one hand, the solution can perform webshell detection in both remote and local scenarios, which improves its generality; on the other hand, when running locally, the algorithm with higher detection accuracy (such as the ssdeep algorithm) can be preferred, making the webshell detection result more accurate.
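The environment check can be sketched as a test for whether the dependency is importable. This assumes the second algorithm's library is exposed as a Python module named `ssdeep`, which is an assumption of this sketch and not something the text states; `choose_target_algorithm` is a hypothetical name.

```python
import importlib.util


def choose_target_algorithm(required_module="ssdeep"):
    """Prefer ssdeep when its library is available, otherwise fall back to simhash."""
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec(required_module) is not None:
        return "ssdeep"
    return "simhash"
```

The fallback order mirrors the text: the more accurate algorithm is used whenever the environment supports it, and the dependency-free one is used everywhere else.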
205. If the target algorithm is the first text similarity algorithm, the server detects the files to be tested in the target directory and/or under the target path indicated by the security detection instruction based on the first text similarity algorithm, and determines the characteristic hash value of the file to be tested under the first text similarity algorithm.
206. The server obtains, from the pre-established hash fingerprint database, the sample feature hash value corresponding to each of any one or more webshell samples under the first text similarity algorithm.
207. The server obtains each Hamming distance between the characteristic hash value of the file to be tested under the first text similarity algorithm and the sample feature hash value corresponding to each of the one or more webshell samples under the first text similarity algorithm.
208. If any one of the Hamming distances is equal to the first Hamming threshold, the server determines that there is a sample feature hash value that matches the characteristic hash value, and determines that the file to be tested is a webshell.
In one embodiment, when the server determines that the target algorithm is the first text similarity algorithm, it can obtain from the hash fingerprint database the sample feature hash values of one or more webshell samples under the first text similarity algorithm (hereinafter, the first sample hash values). It then compares each first sample hash value with the feature hash value of the file under test under the first text similarity algorithm (hereinafter, the first hash value) by Hamming distance, obtaining the Hamming distance between the first hash value and each first sample hash value. Further, the server may compare each Hamming distance with a preset first Hamming threshold and a preset second Hamming threshold, where the second Hamming threshold is greater than the first. If any of the Hamming distances equals the first Hamming threshold (for example, 0), the server can determine that a sample feature hash value matching the feature hash value exists, and therefore that the file under test is a webshell. Alternatively, if that Hamming distance is greater than the first Hamming threshold and not greater than the second Hamming threshold (for example, greater than 0 and not greater than 3), the file under test can be determined to be a variant webshell. Otherwise, if that Hamming distance is greater than the second Hamming threshold (for example, greater than 3), the file under test can be determined to be a non-webshell.
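By way of illustration only, the three-way decision on Hamming distance can be sketched as follows. The function names are hypothetical, the thresholds 0 and 3 are the example values given above, and classifying by the closest sample is an assumption about how "any one Hamming distance" is resolved when several samples are compared.

```python
from typing import Iterable

FIRST_HAMMING_THRESHOLD = 0   # example value from the description
SECOND_HAMMING_THRESHOLD = 3  # example value from the description


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two integer fingerprints differ."""
    return bin(a ^ b).count("1")


def classify_by_hamming(file_hash: int, sample_hashes: Iterable[int]) -> str:
    """Classify a file by its closest sample fingerprint (assumed policy)."""
    closest = min(
        (hamming_distance(file_hash, s) for s in sample_hashes),
        default=None,
    )
    if closest is None:
        # No samples to compare against: nothing can match.
        return "non-webshell"
    if closest == FIRST_HAMMING_THRESHOLD:
        return "webshell"
    if closest <= SECOND_HAMMING_THRESHOLD:
        return "variant webshell"
    return "non-webshell"
```

Fingerprints are held as integers here so that the distance reduces to one XOR and a bit count.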
When the server determines that the file under test is a webshell or a variant webshell, it may also output alarm information indicating that a webshell or variant webshell has been detected.
In one embodiment, while checking the file under test, the server may also record the feature hash value of the file currently being checked and its path information (that is, the target path and target directory in step 205). When the file under test is detected to be a webshell, the server may also obtain from the hash fingerprint database the path information and the sample feature hash value of the target webshell sample, that is, the sample against which the file was compared and found to be a webshell. After determining that the file under test is a webshell or a variant webshell, the server may generate a scan log from the recorded information (the feature hash value and path information of the file under test, and the path information and sample feature hash value of the target webshell sample) and output it to the user for review.
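As a minimal sketch, a scan log entry holding the four recorded items could be serialized as one JSON line. The class name, field names, and the JSON format are assumptions made for illustration; the description does not specify a log format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ScanLogEntry:
    """One record per flagged file (field names are hypothetical)."""
    file_path: str    # path information of the file under test
    file_hash: str    # feature hash value of the file under test
    sample_path: str  # path information of the matched webshell sample
    sample_hash: str  # sample feature hash value of the matched sample
    verdict: str      # "webshell" or "variant webshell"


def format_scan_log(entry: ScanLogEntry) -> str:
    """Serialize an entry as a single JSON line for the user to review."""
    return json.dumps(asdict(entry), sort_keys=True)
```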
In one embodiment, when the server determines that the file under test is a variant webshell, it may also store the variant webshell in association with its feature hash value in the previously established hash fingerprint database, thereby updating the database.
In this embodiment of the present application, the server can detect the current running environment and determine, from a preset list of text similarity algorithms, a target algorithm that matches the current running environment. If the target algorithm is the first text similarity algorithm, the server checks the files under test in the target directory and/or target path indicated by the security detection instruction based on the first text similarity algorithm, which helps improve webshell detection efficiency.
Referring to FIG. 3, FIG. 3 is a schematic flowchart of another webshell detection method provided by an embodiment of the present application. As shown in the figure, the webshell detection method may include:
301. The server obtains webshell samples in different scripting languages.
302. The server uses N text similarity algorithms to determine N sample feature hash values for each webshell sample, one under each of the N text similarity algorithms, where N is a positive integer.
303. The server builds a hash fingerprint database from the webshell samples in different scripting languages and their N sample feature hash values under the N text similarity algorithms.
304. On receiving a security detection instruction, the server detects the current running environment and determines, from a preset list of text similarity algorithms, a target algorithm that matches the current running environment. The target algorithm is the first text similarity algorithm or the second text similarity algorithm.
For the specific implementation of steps 301 to 304, refer to the description of steps 201 to 204 in the foregoing embodiment; details are not repeated here.
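Steps 301 to 303 can be sketched as follows, under stated assumptions. `simhash64` is a simplified token-based simhash written for this illustration (64 bits, SHA-256 as the per-token hash), not the implementation used in the application, and `build_fingerprint_library` is a hypothetical helper. Only one algorithm is registered in the usage shown (N = 1); a further algorithm would be added to the same mapping.

```python
import hashlib
import re
from typing import Callable, Dict


def simhash64(text: str) -> int:
    """Simplified 64-bit simhash over lowercase word tokens (illustrative)."""
    weights = [0] * 64
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for bit in range(64):
            # Each token votes +1 or -1 on every bit position.
            weights[bit] += 1 if (h >> bit) & 1 else -1
    fingerprint = 0
    for bit, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << bit
    return fingerprint


def build_fingerprint_library(
    samples: Dict[str, str],
    algorithms: Dict[str, Callable[[str], int]],
) -> Dict[str, Dict[str, int]]:
    """Map each sample name to its N hash values, keyed by algorithm name."""
    return {
        name: {alg: fn(source) for alg, fn in algorithms.items()}
        for name, source in samples.items()
    }
```

For example, `build_fingerprint_library({"a.php": source_text}, {"simhash": simhash64})` yields one fingerprint per sample per registered algorithm.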
305. If the target algorithm is the second text similarity algorithm, the server checks the files under test in the target directory and/or target path indicated by the security detection instruction based on the second text similarity algorithm, and determines the feature hash value of the file under test under the second text similarity algorithm.
306. The server obtains, from the pre-established hash fingerprint database, the sample feature hash values of one or more webshell samples under the second text similarity algorithm.
307. The server computes the weighted edit distance between the feature hash value of the file under test under the second text similarity algorithm and each sample feature hash value of the one or more webshell samples under the second text similarity algorithm.
308. From each weighted edit distance, the server determines the matching value between the feature hash value under the second text similarity algorithm and the corresponding sample feature hash value of the one or more webshell samples under the second text similarity algorithm.
309. If any of the matching values equals the first matching threshold, the server determines that a sample feature hash value matching the feature hash value exists, and determines that the file under test is a webshell.
In one embodiment, when the server detects that the target algorithm is the second text similarity algorithm, it can obtain from the hash fingerprint database the sample feature hash values of one or more webshell samples under the second text similarity algorithm (hereinafter, the second sample hash values). It then compares each second sample hash value with the feature hash value of the file under test under the second text similarity algorithm (hereinafter, the second hash value) for similarity, obtaining the weighted edit distance between the second hash value and each second sample hash value, and determines from each weighted edit distance the matching value between the second hash value and the corresponding second sample hash value. In one embodiment, after obtaining the weighted edit distances, the server may divide each weighted edit distance by the sum of the lengths of the second hash value and the corresponding second sample hash value, and then map the quotient to an integer from 0 to 100, which gives the matching value between the second hash value and that second sample hash value.
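The computation just described can be sketched as below. Two points are assumptions, not taken from the description: the edit weights (insertion and deletion cost 1, substitution cost 2) and the mapping `100 - round(100 * d / (len(a) + len(b)))`, chosen so that identical hash strings score 100 and the score stays within 0 to 100. The function names are hypothetical.

```python
def weighted_edit_distance(a: str, b: str,
                           ins_cost: int = 1,
                           del_cost: int = 1,
                           sub_cost: int = 2) -> int:
    """Edit distance with per-operation weights (weights are assumed)."""
    prev = [j * ins_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * del_cost]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + del_cost,                          # delete ca
                curr[j - 1] + ins_cost,                      # insert cb
                prev[j - 1] + (0 if ca == cb else sub_cost)  # keep or substitute
            ))
        prev = curr
    return prev[-1]


def match_score(a: str, b: str) -> int:
    """Map the length-normalized distance onto an integer from 0 to 100."""
    total = len(a) + len(b)
    if total == 0:
        return 100  # two empty strings are identical
    distance = weighted_edit_distance(a, b)
    return max(0, 100 - round(100 * distance / total))
```

With these weights the distance never exceeds the sum of the two lengths, so the quotient lies between 0 and 1 before it is scaled.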
Further, in this case, if any of the matching values equals the first matching threshold (such as 100), the server can determine that a sample feature hash value matching the feature hash value exists, and therefore that the file under test is a webshell.
Alternatively, if that matching value is greater than the second matching threshold (such as 50) and less than the first matching threshold (such as 100), the file under test is determined to be a variant webshell, meaning that a hacker may have obfuscated the code of the webshell. The first matching threshold is greater than the second matching threshold.
Otherwise, if that matching value is not greater than the second matching threshold (such as 50) and not less than 0, the file under test can be determined to be a non-webshell.
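For illustration, the three conditions on the matching value can be written as a single function. The thresholds 100 and 50 are the example values above; the function name and the choice to classify by the highest score among the compared samples are assumptions.

```python
from typing import Iterable

FIRST_MATCH_THRESHOLD = 100  # example value from the description
SECOND_MATCH_THRESHOLD = 50  # example value from the description


def classify_by_match_value(scores: Iterable[int]) -> str:
    """Classify a file by its best matching value (assumed policy)."""
    best = max(scores, default=0)
    if not 0 <= best <= FIRST_MATCH_THRESHOLD:
        raise ValueError("matching values must lie between 0 and 100")
    if best == FIRST_MATCH_THRESHOLD:
        return "webshell"
    if best > SECOND_MATCH_THRESHOLD:
        return "variant webshell"
    return "non-webshell"
```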
In one example, when the server performs step 303 and builds the hash fingerprint database from the webshell samples in different scripting languages and their N sample feature hash values under the N text similarity algorithms, it may also take into account the condition for determining that a file under test is a webshell (for example, the matching value equals the first matching threshold), the condition for a variant webshell (for example, the matching value is greater than the second matching threshold (such as 50) and less than the first matching threshold), and the condition for a non-webshell (for example, the matching value is not greater than the second matching threshold (such as 50) and not less than 0).
For example, suppose the hash fingerprint database is built with a first text similarity algorithm and a second text similarity algorithm, namely the simhash algorithm and the ssdeep algorithm respectively. When the database is built, it can be subdivided into two sub-databases: one under the simhash algorithm (hereinafter, the simhash fingerprint database) and one under the ssdeep algorithm (hereinafter, the ssdeep fingerprint database). The simhash fingerprint database includes each webshell sample and its sample feature hash value under the simhash algorithm (hereinafter, the simhash sample feature hash value); the ssdeep fingerprint database includes each webshell sample and its sample feature hash value under the ssdeep algorithm (hereinafter, the ssdeep sample feature hash value). In this case, for each simhash sample feature hash value, the server can store in the simhash fingerprint database, in association with that sample hash value, three classes of hash values: those whose Hamming distance to it equals the first Hamming threshold, those whose Hamming distance to it is greater than the first Hamming threshold and not greater than the second Hamming threshold, and those whose Hamming distance to it is greater than the second Hamming threshold (such as greater than 3). Tag information can be added to each class when it is stored. For example, hash values whose Hamming distance to a simhash sample feature hash value equals the first Hamming threshold can be tagged as webshell; hash values whose Hamming distance is greater than the first Hamming threshold and not greater than the second Hamming threshold can be tagged as variant webshell; and hash values whose Hamming distance is greater than the second Hamming threshold can be tagged as non-webshell.
For example, suppose the sample feature hash value of webshell sample a under the simhash algorithm is 1001001. Strings (that is, hash values) whose Hamming distance to 1001001 is 0 can be stored in the simhash fingerprint database in association with 1001001 and tagged as belonging to a webshell. Strings whose Hamming distance to 1001001 is greater than 3 can be stored in association with 1001001 and tagged as belonging to a non-webshell. Strings whose Hamming distance to 1001001 is greater than 0 and not greater than 3 can be stored in association with 1001001 and tagged as belonging to a variant webshell.
In this case, once the server has determined the feature hash value of the file under test, it can look up the string identical to that feature hash value in the pre-established hash fingerprint database and read the tag information attached to that string. If the tag information indicates that the string belongs to a webshell, the file under test is determined to be a webshell; if it indicates that the string belongs to a non-webshell, the file under test is determined to be a non-webshell; if it indicates that the string belongs to a variant webshell, the file under test is determined to be a variant webshell. This replaces the distance computation against every sample feature hash value with a single lookup, reducing the amount of computation at detection time and further improving webshell detection efficiency.
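A minimal sketch of such a pre-tagged lookup table, using the 7-bit example hash 1001001, is given below. The function names are hypothetical. As a simplification that departs from the description, hash values farther than the variant radius are not stored; the lookup returns "non-webshell" for any hash absent from the table.

```python
from itertools import combinations
from typing import Dict


def build_label_table(sample_hash: int, bits: int,
                      variant_radius: int = 3) -> Dict[int, str]:
    """Enumerate every hash within the variant radius and tag it."""
    table = {sample_hash: "webshell"}  # Hamming distance 0
    for r in range(1, variant_radius + 1):
        for positions in combinations(range(bits), r):
            flipped = sample_hash
            for p in positions:
                flipped ^= 1 << p  # flip one chosen bit
            table[flipped] = "variant webshell"
    return table


def lookup(table: Dict[int, str], file_hash: int) -> str:
    """Single dictionary lookup instead of per-sample distance checks."""
    return table.get(file_hash, "non-webshell")
```

The table size grows with the fingerprint width: for a 7-bit hash and radius 3 it holds 64 entries per sample, while for a 64-bit fingerprint the same radius gives 43,745 entries per sample, so the approach trades storage for lookup speed.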
In this embodiment of the present application, the server can detect the current running environment and determine, from a preset list of text similarity algorithms, a target algorithm that matches the current running environment. If the target algorithm is the second text similarity algorithm, the server checks the files under test in the target directory and/or target path indicated by the security detection instruction based on the second text similarity algorithm. On the one hand, this helps improve webshell detection efficiency; on the other hand, checking the file under test with a text similarity algorithm that matches the current running environment helps improve the accuracy of webshell detection.
An embodiment of the present application further provides a webshell detection device, which includes modules for performing the method described in FIG. 1, FIG. 2, or FIG. 3. Specifically, FIG. 4 is a schematic block diagram of a webshell detection device provided by an embodiment of the present application. The webshell detection device of this embodiment includes:
a detection module 40, configured to, when a security detection instruction is received, check the files under test in the target directory and/or target path as indicated by the security detection instruction, and determine the feature hash value of the file under test;
a comparison module 41, configured to compare the feature hash value of the file under test with the sample feature hash values of webshell samples in a pre-established hash fingerprint database;
a determination module 42, configured to determine that the file under test is a webshell if a sample feature hash value matching the feature hash value exists.
In one embodiment, the device further includes an acquisition module 43 and an establishment module 44, wherein:
the acquisition module 43 is configured to obtain webshell samples in different scripting languages;
the determination module 42 is further configured to use N text similarity algorithms to determine N sample feature hash values for each webshell sample under the N text similarity algorithms, where N is a positive integer;
the establishment module 44 is configured to build a hash fingerprint database from the webshell samples in the different scripting languages and the N sample feature hash values of the webshell samples under the N text similarity algorithms.
In one embodiment, the detection module 40 is specifically configured to detect the current running environment and determine, from a preset list of text similarity algorithms, a target algorithm that matches the current running environment, where the preset list of text similarity algorithms includes the N text similarity algorithms; and to check, based on the target algorithm, the files under test in the target directory and/or target path indicated by the security detection instruction, and determine the feature hash value of the file under test under the target algorithm.
In one embodiment, the comparison module 41 is specifically configured to obtain, from the pre-established hash fingerprint database, the sample feature hash values of one or more webshell samples under the target algorithm; and to compare the feature hash value of the file under test under the target algorithm with the sample feature hash values of the one or more webshell samples under the target algorithm.
In one embodiment, the target algorithm is the first text similarity algorithm. The comparison module 41 is further specifically configured to obtain the Hamming distance between the feature hash value of the file under test under the first text similarity algorithm and each sample feature hash value of the one or more webshell samples under the first text similarity algorithm. The determination module 42 is specifically configured to determine that a sample feature hash value matching the feature hash value exists if any of the Hamming distances equals the first Hamming threshold.
In one embodiment, the target algorithm is the second text similarity algorithm. The comparison module 41 is specifically configured to obtain the weighted edit distance between the feature hash value of the file under test under the second text similarity algorithm and each sample feature hash value of the one or more webshell samples under the second text similarity algorithm; and to determine, from each weighted edit distance, the matching value between the feature hash value under the second text similarity algorithm and the corresponding sample feature hash value of the one or more webshell samples under the second text similarity algorithm.
The determination module 42 is further specifically configured to determine that a sample feature hash value matching the feature hash value exists if any of the matching values equals the first matching threshold.
In one embodiment, if the determination module 42 determines that any of the Hamming distances is greater than the first Hamming threshold and not greater than the second Hamming threshold, it determines that the file under test is a variant webshell, where the second Hamming threshold is greater than the first Hamming threshold.
In one embodiment, if the determination module 42 determines that any of the matching values is greater than the second matching threshold and less than the first matching threshold, it determines that the file under test is a variant webshell, where the first matching threshold is greater than the second matching threshold.
In one embodiment, the determination module 42 is further configured to parse the security detection instruction to determine the target directory and/or target path that the security detection instruction requires to be tested; the acquisition module 43 is further configured to obtain the extensions of all files in the target directory and/or the target path, and to determine the files in the target directory and/or target path whose extension is a preset extension as the files under test.
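The extension-based selection of files under test can be sketched as follows. The function name and the extension set are hypothetical, since the description does not list the preset extensions, and the case-insensitive comparison is an assumption.

```python
import os
from typing import Iterable, List

# Hypothetical preset; the description does not enumerate the extensions.
PRESET_EXTENSIONS = frozenset({".php", ".jsp", ".asp", ".aspx"})


def collect_files_to_test(target_dir: str,
                          extensions: Iterable[str] = PRESET_EXTENSIONS) -> List[str]:
    """Walk the target directory and keep files with a preset extension."""
    wanted = {ext.lower() for ext in extensions}
    matches = []
    for root, _dirs, files in os.walk(target_dir):
        for name in files:
            # Compare extensions case-insensitively (an assumption).
            if os.path.splitext(name)[1].lower() in wanted:
                matches.append(os.path.join(root, name))
    return sorted(matches)
```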
It should be noted that the functions of the functional modules of the webshell detection device described in the embodiments of the present application can be implemented according to the methods in the method embodiments of FIG. 1, FIG. 2, or FIG. 3. For the specific implementation process, refer to the description of those method embodiments; details are not repeated here.
Referring to FIG. 5, FIG. 5 is a schematic block diagram of a server provided by an embodiment of the present application. As shown in FIG. 5, the server includes a processor 501, a memory 502, and a network interface 503. The processor 501, the memory 502, and the network interface 503 may be connected through a bus or in other ways; FIG. 5 takes connection through a bus as an example. The network interface 503 is controlled by the processor to send and receive messages, the memory 502 is used to store a computer program that includes program instructions, and the processor 501 is used to execute the program instructions stored in the memory 502. The processor 501 is configured to call the program instructions to perform the following: when a security detection instruction is received, checking the files under test in the target directory and/or target path as indicated by the security detection instruction, and determining the feature hash value of the file under test; comparing the feature hash value of the file under test with the sample feature hash values of webshell samples in a pre-established hash fingerprint database; and, if a sample feature hash value matching the feature hash value exists, determining that the file under test is a webshell.
It should be understood that, in the embodiments of the present application, the processor 501 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor.
The memory 502 may include read-only memory and random access memory, and provides instructions and data to the processor 501. A part of the memory 502 may also include non-volatile random access memory. For example, the memory 502 may also store device type information.
In a specific implementation, the processor 501, the memory 502, and the network interface 503 described in the embodiments of the present application can perform the implementations described in the method embodiments of FIG. 1, FIG. 2, or FIG. 3, and can also perform the implementation of the webshell detection device described in the embodiments of the present application; details are not repeated here.
Another embodiment of the present application provides a computer-readable storage medium. The computer-readable storage medium stores a computer program that includes program instructions which, when executed by a processor, implement the following: when a security detection instruction is received, checking the files under test in the target directory and/or target path as indicated by the security detection instruction, and determining the feature hash value of the file under test; comparing the feature hash value of the file under test with the sample feature hash values of webshell samples in a pre-established hash fingerprint database; and, if a sample feature hash value matching the feature hash value exists, determining that the file under test is a webshell.
The computer-readable storage medium may be an internal storage unit of the server described in any of the foregoing embodiments, such as a hard disk or memory of the server. It may also be an external storage device of the server, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the server. Further, the computer-readable storage medium may include both an internal storage unit of the server and an external storage device. The computer-readable storage medium is used to store the computer program and other programs and data required by the server. It may also be used to temporarily store data that has been output or is about to be output.
A person of ordinary skill in the art will understand that all or part of the processes in the methods of the foregoing embodiments can be carried out by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The above discloses only some embodiments of the present application and of course cannot be used to limit the scope of rights of the present application. A person of ordinary skill in the art can understand all or part of the processes for implementing the foregoing embodiments, and equivalent changes made according to the claims of the present application still fall within the scope covered by the invention.

Claims (20)

  1. A method for detecting a malicious script backdoor (webshell), characterized in that it comprises:
    when a security detection instruction is received, detecting a file under test in a target directory and/or under a target path as indicated by the security detection instruction, and determining a characteristic hash value of the file under test;
    comparing the characteristic hash value of the file under test with sample characteristic hash values of webshell samples in a pre-established hash fingerprint database;
    if a sample characteristic hash value matching the characteristic hash value exists, determining that the file under test is a webshell.
  2. The method according to claim 1, characterized in that the detecting of the file under test in the target directory and/or under the target path as indicated by the security detection instruction, and the determining of the characteristic hash value of the file under test, comprise:
    detecting the current running environment, and determining, from a preset list of text similarity algorithms, a target algorithm that matches the current running environment, the preset list of text similarity algorithms including the N text similarity algorithms;
    detecting, based on the target algorithm, the file under test in the target directory and/or under the target path indicated by the security detection instruction, and determining the characteristic hash value of the file under test under the target algorithm;
    wherein the comparing of the characteristic hash value of the file under test with the sample characteristic hash values of the webshell samples in the pre-established hash fingerprint database comprises:
    obtaining, from the pre-established hash fingerprint database, the sample characteristic hash values respectively corresponding to any one or more webshell samples under the target algorithm;
    comparing the characteristic hash value of the file under test under the target algorithm with the sample characteristic hash values respectively corresponding to the any one or more webshell samples under the target algorithm.
  3. The method according to claim 2, characterized in that the target algorithm is a first text similarity algorithm, and the comparing of the characteristic hash value of the file under test under the target algorithm with the sample characteristic hash values respectively corresponding to the any one or more webshell samples under the target algorithm comprises:
    obtaining the Hamming distances between the characteristic hash value of the file under test under the first text similarity algorithm and the sample characteristic hash values respectively corresponding to the any one or more webshell samples under the first text similarity algorithm;
    if any one of the Hamming distances is equal to a first Hamming threshold, determining that a sample characteristic hash value matching the characteristic hash value exists.
  4. The method according to claim 3, characterized in that the method further comprises:
    if the any one Hamming distance is greater than the first Hamming threshold and not greater than a second Hamming threshold, determining that the file under test is a variant webshell, the second Hamming threshold being greater than the first Hamming threshold.
  5. The method according to claim 2, characterized in that the target algorithm is a second text similarity algorithm, and the comparing of the characteristic hash value of the file under test under the target algorithm with the sample characteristic hash values respectively corresponding to the any one or more webshell samples under the target algorithm comprises:
    obtaining the weighted edit distances between the characteristic hash value of the file under test under the second text similarity algorithm and the sample characteristic hash values respectively corresponding to the any one or more webshell samples under the second text similarity algorithm;
    determining, according to the weighted edit distances, the matching values between the characteristic hash value under the second text similarity algorithm and the sample characteristic hash values respectively corresponding to the any one or more webshell samples under the second text similarity algorithm;
    if any one of the matching values is equal to a first matching threshold, determining that a sample characteristic hash value matching the characteristic hash value exists.
  6. The method according to claim 5, characterized in that the method further comprises:
    if the any one matching value is greater than a second matching threshold and less than the first matching threshold, determining that the file under test is a variant webshell, the first matching threshold being greater than the second matching threshold.
  7. The method according to any one of claims 1-6, characterized in that, before the comparing of the characteristic hash value of the file under test with the sample characteristic hash values of the webshell samples in the pre-established hash fingerprint database, the method further comprises:
    obtaining webshell samples in different scripting languages;
    using N text similarity algorithms to determine, for each webshell sample, N sample characteristic hash values under the N text similarity algorithms, where N is a positive integer;
    establishing the hash fingerprint database according to the webshell samples in the different scripting languages and the N sample characteristic hash values of the webshell samples under the N text similarity algorithms.
  8. The method according to any one of claims 1-7, characterized in that, before the detecting of the file under test in the target directory and/or under the target path as indicated by the security detection instruction, the method further comprises:
    parsing the security detection instruction to determine the target directory and/or target path that the security detection instruction requires to be tested;
    obtaining the extensions of all files in the target directory and/or the target path, and determining the files in the target directory and/or the target path whose extension is a preset extension as the files under test.
  9. A webshell detection device, characterized in that it comprises:
    a detection module, configured to, when a security detection instruction is received, detect a file under test in a target directory and/or under a target path as indicated by the security detection instruction, and determine a characteristic hash value of the file under test;
    a comparison module, configured to compare the characteristic hash value of the file under test with sample characteristic hash values of webshell samples in a pre-established hash fingerprint database;
    a determining module, configured to determine that the file under test is a webshell if a sample characteristic hash value matching the characteristic hash value exists.
  10. The device according to claim 9, characterized in that the detection module is specifically configured to detect the current running environment and determine, from a preset list of text similarity algorithms, a target algorithm that matches the current running environment, the preset list of text similarity algorithms including the N text similarity algorithms; and to detect, based on the target algorithm, the file under test in the target directory and/or under the target path indicated by the security detection instruction, and determine the characteristic hash value of the file under test under the target algorithm; wherein the comparison module is specifically configured to obtain, from the pre-established hash fingerprint database, the sample characteristic hash values respectively corresponding to any one or more webshell samples under the target algorithm, and to compare the characteristic hash value of the file under test under the target algorithm with the sample characteristic hash values respectively corresponding to the any one or more webshell samples under the target algorithm.
  11. The device according to claim 10, characterized in that the target algorithm is a first text similarity algorithm; the comparison module is further specifically configured to obtain the Hamming distances between the characteristic hash value of the file under test under the first text similarity algorithm and the sample characteristic hash values respectively corresponding to the any one or more webshell samples under the first text similarity algorithm; and the determining module is specifically configured to determine that a sample characteristic hash value matching the characteristic hash value exists if any one of the Hamming distances is equal to a first Hamming threshold.
  12. The device according to claim 11, characterized in that the determining module is further configured to determine that the file under test is a variant webshell if it is determined that the any one Hamming distance is greater than the first Hamming threshold and not greater than a second Hamming threshold, the second Hamming threshold being greater than the first Hamming threshold.
  13. The device according to claim 10, characterized in that the target algorithm is a second text similarity algorithm; the comparison module is specifically configured to obtain the weighted edit distances between the characteristic hash value of the file under test under the second text similarity algorithm and the sample characteristic hash values respectively corresponding to the any one or more webshell samples under the second text similarity algorithm, and to determine, according to the weighted edit distances, the matching values between the characteristic hash value under the second text similarity algorithm and the sample characteristic hash values respectively corresponding to the any one or more webshell samples under the second text similarity algorithm; wherein the determining module is further specifically configured to determine that a sample characteristic hash value matching the characteristic hash value exists if any one of the matching values is equal to a first matching threshold.
  14. The device according to claim 13, characterized in that the determining module is further configured to determine that the file under test is a variant webshell if it is determined that the any one matching value is greater than a second matching threshold and less than the first matching threshold, the first matching threshold being greater than the second matching threshold.
  15. The device according to claims 9-14, characterized in that the device further comprises an obtaining module and an establishing module, wherein:
    the obtaining module is configured to obtain webshell samples in different scripting languages;
    the determining module is further configured to use N text similarity algorithms to determine, for each webshell sample, N sample characteristic hash values under the N text similarity algorithms, where N is a positive integer;
    the establishing module is configured to establish the hash fingerprint database according to the webshell samples in the different scripting languages and the N sample characteristic hash values of the webshell samples under the N text similarity algorithms.
  16. The device according to any one of claims 9-15, characterized in that the determining module is further configured to parse the security detection instruction to determine the target directory and/or target path that the security detection instruction requires to be tested; and the obtaining module is further configured to obtain the extensions of all files in the target directory and/or the target path, and to determine the files in the target directory and/or the target path whose extension is a preset extension as the files under test.
  17. A server, characterized in that it comprises a processor and a memory connected to each other, wherein the memory is used to store a computer program, the computer program includes program instructions, and the processor is configured to invoke the program instructions to perform: when a security detection instruction is received, detecting a file under test in a target directory and/or under a target path as indicated by the security detection instruction, and determining a characteristic hash value of the file under test; comparing the characteristic hash value of the file under test with sample characteristic hash values of webshell samples in a pre-established hash fingerprint database; and, if a sample characteristic hash value matching the characteristic hash value exists, determining that the file under test is a webshell.
  18. The server according to claim 17, characterized in that the processor is further configured to:
    detect the current running environment, and determine, from a preset list of text similarity algorithms, a target algorithm that matches the current running environment, the preset list of text similarity algorithms including the N text similarity algorithms;
    detect, based on the target algorithm, the file under test in the target directory and/or under the target path indicated by the security detection instruction, and determine the characteristic hash value of the file under test under the target algorithm;
    obtain, from the pre-established hash fingerprint database, the sample characteristic hash values respectively corresponding to any one or more webshell samples under the target algorithm;
    compare the characteristic hash value of the file under test under the target algorithm with the sample characteristic hash values respectively corresponding to the any one or more webshell samples under the target algorithm.
  19. The server according to claim 18, characterized in that the target algorithm is a first text similarity algorithm, and the processor is further configured to:
    obtain the Hamming distances between the characteristic hash value of the file under test under the first text similarity algorithm and the sample characteristic hash values respectively corresponding to the any one or more webshell samples under the first text similarity algorithm;
    if any one of the Hamming distances is equal to a first Hamming threshold, determine that a sample characteristic hash value matching the characteristic hash value exists.
  20. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, the computer program includes program instructions, and the program instructions, when executed by a processor, cause the processor to perform the method according to any one of claims 1-8.
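
The matching flow recited in claims 1, 3, 4 and 8 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the claims do not name the first text similarity algorithm, so a SimHash-style 64-bit fingerprint is assumed here, and the function names, the extension set, and the threshold values 0 and 8 are all hypothetical.

```python
import hashlib
import os
import re

HASH_BITS = 64  # assumed fingerprint width; the claims do not specify one


def simhash(text, bits=HASH_BITS):
    """SimHash-style fingerprint (assumed stand-in for the first text similarity algorithm)."""
    weights = [0] * bits
    for token in re.findall(r"\w+", text):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        value = int.from_bytes(digest[:8], "big")
        for i in range(bits):
            weights[i] += 1 if (value >> i) & 1 else -1
    # Bit i of the fingerprint is set when the tokens voted positive for that bit.
    return sum(1 << i for i in range(bits) if weights[i] > 0)


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def classify(file_hash, sample_hashes, first_threshold=0, second_threshold=8):
    """Apply the threshold rules of claims 3 and 4 (threshold values are hypothetical)."""
    distances = [hamming_distance(file_hash, s) for s in sample_hashes]
    if any(d == first_threshold for d in distances):
        return "webshell"
    if any(first_threshold < d <= second_threshold for d in distances):
        return "variant"
    return "no_match"


def candidate_files(root, extensions=(".php", ".jsp", ".asp")):
    """Select files under root whose extension is in a preset list (claim 8)."""
    selected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() in extensions:
                selected.append(os.path.join(dirpath, name))
    return sorted(selected)


def scan(root, sample_hashes):
    """Fingerprint every candidate file and classify it against the sample hashes."""
    verdicts = {}
    for path in candidate_files(root):
        with open(path, encoding="utf-8", errors="ignore") as handle:
            verdicts[path] = classify(simhash(handle.read()), sample_hashes)
    return verdicts
```

Here `sample_hashes` plays the role of the hash fingerprint database restricted to one algorithm; a fuller sketch of claim 7 would store N fingerprints per sample, one for each text similarity algorithm.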
PCT/CN2018/108472 2018-06-27 2018-09-28 Webshell detection method and related device WO2020000743A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810685031.3A CN108985057B (en) 2018-06-27 2018-06-27 Webshell detection method and related equipment
CN201810685031.3 2018-06-27

Publications (1)

Publication Number Publication Date
WO2020000743A1 true WO2020000743A1 (en) 2020-01-02

Family

ID=64539212

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/108472 WO2020000743A1 (en) 2018-06-27 2018-09-28 Webshell detection method and related device

Country Status (2)

Country Link
CN (1) CN108985057B (en)
WO (1) WO2020000743A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111695117A (en) * 2020-06-12 2020-09-22 国网浙江省电力有限公司信息通信分公司 Webshell script detection method and device
CN112800427A (en) * 2021-04-08 2021-05-14 北京邮电大学 Webshell detection method and device, electronic equipment and storage medium
CN112926054A (en) * 2021-02-22 2021-06-08 亚信科技(成都)有限公司 Malicious file detection method, device, equipment and storage medium
CN113240247A (en) * 2021-04-21 2021-08-10 深圳铭锋达精密技术有限公司 Quality measurement method and device, terminal equipment and storage medium
CN113805894A (en) * 2021-09-17 2021-12-17 杭州云深科技有限公司 Abnormal APK (android Package) identification method, electronic equipment and readable storage medium

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110034921B (en) * 2019-04-18 2022-04-15 成都信息工程大学 Webshell detection method based on weighted fuzzy hash
CN110086811B (en) * 2019-04-29 2022-03-22 深信服科技股份有限公司 Malicious script detection method and related device
CN110162973B (en) * 2019-05-24 2021-04-09 新华三信息安全技术有限公司 Webshell file detection method and device
CN113746784B (en) * 2020-05-29 2023-04-07 深信服科技股份有限公司 Data detection method, system and related equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103561012A (en) * 2013-10-28 2014-02-05 中国科学院信息工程研究所 WEB backdoor detection method and system based on relevance tree
CN104811447A (en) * 2015-04-21 2015-07-29 深信服网络科技(深圳)有限公司 Security detection method and system based on attack association
CN105933268A (en) * 2015-11-27 2016-09-07 中国银联股份有限公司 Webshell detection method and apparatus based on total access log analysis
CN107103237A (en) * 2016-02-23 2017-08-29 阿里巴巴集团控股有限公司 A kind of detection method and device of malicious file
CN108156131A (en) * 2017-10-27 2018-06-12 上海观安信息技术股份有限公司 Webshell detection methods, electronic equipment and computer storage media

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7921300B2 (en) * 2003-10-10 2011-04-05 Via Technologies, Inc. Apparatus and method for secure hash algorithm
CN102880628B (en) * 2012-06-15 2015-02-25 福建星网锐捷网络有限公司 Hash data storage method and device
CN105812196A (en) * 2014-12-30 2016-07-27 中国移动通信集团公司 WebShell detection method and electronic device
CN106301974A (en) * 2015-05-14 2017-01-04 阿里巴巴集团控股有限公司 A kind of website back door detection method and device

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111695117A (en) * 2020-06-12 2020-09-22 国网浙江省电力有限公司信息通信分公司 Webshell script detection method and device
CN111695117B (en) * 2020-06-12 2023-10-03 国网浙江省电力有限公司信息通信分公司 Webshell script detection method and device
CN112926054A (en) * 2021-02-22 2021-06-08 亚信科技(成都)有限公司 Malicious file detection method, device, equipment and storage medium
CN112926054B (en) * 2021-02-22 2023-10-03 亚信科技(成都)有限公司 Malicious file detection method, device, equipment and storage medium
CN112800427A (en) * 2021-04-08 2021-05-14 北京邮电大学 Webshell detection method and device, electronic equipment and storage medium
CN112800427B (en) * 2021-04-08 2021-09-28 北京邮电大学 Webshell detection method and device, electronic equipment and storage medium
CN113240247A (en) * 2021-04-21 2021-08-10 深圳铭锋达精密技术有限公司 Quality measurement method and device, terminal equipment and storage medium
CN113805894A (en) * 2021-09-17 2021-12-17 杭州云深科技有限公司 Abnormal APK (android Package) identification method, electronic equipment and readable storage medium
CN113805894B (en) * 2021-09-17 2023-08-18 杭州云深科技有限公司 Abnormal APK identification method, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
CN108985057A (en) 2018-12-11
CN108985057B (en) 2022-07-22

Similar Documents

Publication Publication Date Title
WO2020000743A1 (en) Webshell detection method and related device
US9665713B2 (en) System and method for automated machine-learning, zero-day malware detection
CN109145600B (en) System and method for detecting malicious files using static analysis elements
US9336389B1 (en) Rapid malware inspection of mobile applications
US9141796B2 (en) System and method for detecting malware in file based on genetic map of file
CN107888554B (en) Method and device for detecting server attack
US20180278635A1 (en) Apparatus, method, and computer program for detecting malware in software defined network
CN109992969B (en) Malicious file detection method and device and detection platform
CN110647750B (en) File integrity measurement method and device, terminal and security management center
US20190180032A1 (en) Classification apparatus, classification method, and classification program
JP2017142744A (en) Information processing apparatus, virus detection method, and program
CN113114680B (en) Detection method and detection device for file uploading vulnerability
CN111368289B (en) Malicious software detection method and device
US20230418943A1 (en) Method and device for image-based malware detection, and artificial intelligence-based endpoint detection and response system using same
CN113472803A (en) Vulnerability attack state detection method and device, computer equipment and storage medium
US20190370476A1 (en) Determination apparatus, determination method, and determination program
CN113452710B (en) Unauthorized vulnerability detection method, device, equipment and computer program product
KR101628602B1 (en) Similarity judge method and appratus for judging similarity of program
US20220284109A1 (en) Backdoor inspection apparatus, backdoor inspection method, and non-transitory computer readable medium
CN111368128A (en) Target picture identification method and device and computer readable storage medium
US20200334353A1 (en) Method and system for detecting and classifying malware based on families
CN115310087A (en) Website backdoor detection method and system based on abstract syntax tree
CN108021951A (en) A kind of method of document detection, server and computer-readable recording medium
US20210174199A1 (en) Classifying domain names based on character embedding and deep learning
CN114254069A (en) Domain name similarity detection method and device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18923773

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 08.04.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18923773

Country of ref document: EP

Kind code of ref document: A1