CN113810375B - Webshell detection method, device and equipment and readable storage medium


Info

Publication number: CN113810375B
Authority: CN (China)
Application number: CN202110930565.XA
Other languages: Chinese (zh)
Other versions: CN113810375A
Inventor: 刘卓龙
Current assignee: Wangsu Science and Technology Co Ltd
Original assignee: Wangsu Science and Technology Co Ltd
Legal status: Active

Events:
Application CN202110930565.XA filed by Wangsu Science and Technology Co Ltd
Publication of CN113810375A
Application granted
Publication of CN113810375B


Classifications

    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • G06F21/563 Static detection by source code analysis
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • G06F2221/033 Test or assess software


Abstract

The application discloses a webshell detection method, apparatus, device and readable storage medium. Printable character strings are extracted from a web page file to obtain a character string set; whether target code exists in the character string set is determined, where the target code is any printable character string in the set other than a normal character string; and if target code exists, the web page file is determined to be a webshell file. With this scheme, webshell detection does not need to be performed on the whole web page file; it suffices to judge whether target code exists in the file's character string set. Because a web page file contains relatively few printable character strings, the resource consumption of webshell detection is reduced and its speed is increased.

Description

Webshell detection method, device and equipment and readable storage medium
Technical Field
The present application relates to the field of information security technologies, and in particular, to a webshell detection method, apparatus, device, and readable storage medium.
Background
A webshell is a command execution environment in the form of a web page file, and is a malicious web page backdoor file. After intruding into a website server, a hacker mixes the backdoor file in with the normal web page files in the server's Web directory and uses it to perform malicious operations on the server, such as deleting or modifying data.
To protect website servers against such intrusions, webshell detection needs to be performed on web page files, which include text files, picture files and the like. Common webshell detection methods include regular-expression matching, machine learning algorithms and the like.
However, because picture files are large, these detection methods consume a lot of resources and run slowly when applied to them, which affects other services of the website server.
Disclosure of Invention
The embodiments of the present application provide a webshell detection method, apparatus, device and readable storage medium, which perform webshell detection on a web page file by judging whether target code exists in the printable character strings extracted from it, thereby reducing the resource consumption of webshell detection and increasing its speed.
In a first aspect, an embodiment of the present application provides a webshell detection method, including:
extracting printable character strings from a webpage file to obtain a character string set;
determining whether target code exists in the character string set, where the target code is any printable character string in the set other than a normal character string;
and if the target code exists in the character string set, determining that the webpage file is a webshell file.
In a second aspect, an embodiment of the present application provides a webshell detection apparatus, including:
an extraction module, configured to extract printable character strings from a web page file to obtain a character string set;
a determining module, configured to determine whether target code exists in the character string set, where the target code is any printable character string in the set other than a normal character string;
and a processing module, configured to determine that the web page file is a webshell file if target code exists in the character string set.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a memory, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, causes the electronic device to carry out the method according to the first aspect or any possible implementation of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium, in which computer instructions are stored, and when executed by a processor, the computer instructions are configured to implement the method according to the first aspect or various possible implementation manners of the first aspect.
In a fifth aspect, embodiments of the present application provide a computer program product comprising a computer program, which when executed by a processor, implements the method according to the first aspect or the various possible implementations of the first aspect.
According to the webshell detection method, apparatus, device and readable storage medium provided by the embodiments of the application, the electronic device extracts the printable character strings from the web page file to obtain a character string set, determines whether target code exists in the set, and if so, determines that the web page file is a webshell file, where the target code is a printable character string in the set other than a normal character string. With this scheme, webshell detection is not performed on the whole web page file; only whether target code exists in the file's character string set is judged. Because a web page file contains relatively few printable character strings, the resource consumption of webshell detection is reduced and detection speed is improved.
Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present application, and a person skilled in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a webshell detection method provided in an embodiment of the present application;
Fig. 2 is a schematic diagram of a character string set of a picture file in JPEG format;
Fig. 3 is another schematic diagram of a character string set of a picture file in JPEG format;
Fig. 4 is a schematic diagram of a character string set of a picture file in GIF format;
Fig. 5 is a schematic process diagram of a webshell detection method provided in an embodiment of the present application;
Fig. 6 is a flowchart corresponding to Fig. 5;
Fig. 7 is a schematic diagram of a webshell detection apparatus according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
A webshell, also called a web script Trojan or web backdoor, is a command execution environment in the form of web page files such as Active Server Pages (ASP), Hypertext Preprocessor (PHP) and JavaServer Pages (JSP) files. A hacker uploads a web page file containing a webshell to a server and then uses the webshell to obtain access rights to the server, after which the server can be used to execute arbitrary system commands, create, delete, modify and view files on the system, implant malware, or further attack the intranet. Therefore, to maintain the security of a website (Web) server, webshell detection needs to be performed on web page files.
Traditional webshell detection methods include regular-expression matching, machine learning algorithms and the like, and can be applied to web page files such as picture files and text files. These conventional detection processes require computations such as encoding and vector conversion on the web page file.
However, picture files are usually much larger than text files. For example, if a text file is 4 KB and a picture file is 4 MB, the picture file is about 1000 times the size of the text file. The resources consumed by encoding, vector conversion and similar operations on the 4 KB text file are far smaller than those consumed on the 4 MB picture file.
Compared with webshell detection on text files, applying the traditional methods to picture files therefore consumes large amounts of computing, network and storage resources, and detection is slow. As a result, the conventional methods are impractical for webshell detection on websites with many pictures, such as gallery websites.
Based on this, embodiments of the present application provide a method, an apparatus, a device, and a readable storage medium for webshell detection, where a web page file is subjected to webshell detection by determining whether a target code exists in a printable string extracted from the web page file, so as to achieve the purposes of reducing resource consumption of the webshell detection and increasing the webshell detection speed.
The webshell detection method is applied to scenarios in which webshell detection is performed on web page files such as picture files and compressed package files. The method is executed by an electronic device, which may be a server or a terminal device. The server may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a Content Delivery Network (CDN), and big data and artificial intelligence platforms.
The terminal device may be hardware or software. When it is hardware, it is, for example, a mobile phone, a tablet computer, an e-book reader, a laptop or a desktop computer. When it is software, it may be installed in the hardware devices listed above, for example as multiple software modules or a single software module, which is not limited in the embodiments of the present application.
Fig. 1 is a flowchart of a webshell detection method provided in an embodiment of the present application. This embodiment is executed by the electronic device and includes the following steps:
101. Extract printable character strings from a web page file to obtain a character string set.
In this embodiment, the web page file is, for example, a picture file or a compressed package file. The electronic device may perform webshell detection on web page files periodically. Alternatively, the electronic device is a website server hosting many pictures, and when a terminal device requests a picture file from the website server, the electronic device performs webshell detection on the requested picture file.
Typically, web page files are stored on the electronic device as binary sequences. The electronic device extracts printable character strings from the web page file using a tool such as a Linux command, thereby obtaining the character string set.
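The extraction step can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: it assumes that a printable character string is a run of at least 4 printable ASCII bytes (0x20-0x7E), which approximates the default behaviour of the Linux strings utility (GNU strings also counts tab as printable, which is ignored here). The function name extract_printable_strings is hypothetical.

```python
import re


def extract_printable_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return the character string set of a binary file, one entry per run.

    Assumption: "printable" means ASCII 0x20 (space) through 0x7E (tilde),
    and runs shorter than min_len are discarded, similar in spirit to the
    default behaviour of the `strings` command.
    """
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


if __name__ == "__main__":
    sample = b"\x00\x01abcd\x02xyz\x03hello world\xff"
    # "xyz" is only 3 characters long, so it is dropped.
    print(extract_printable_strings(sample))  # ['abcd', 'hello world']
```

In practice the bytes would be read from the stored web page file, for example with open(path, "rb").read().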
102. Determine whether target code exists in the character string set, where the target code is a printable character string in the set other than a normal character string. If target code exists in the set, perform step 103; otherwise, perform step 104.
Generally, the content of an executable file in the operating system of an electronic device is difficult for humans to read and understand unless it is restored to code through reverse engineering or similar means; apart from a small number of recognizable character strings made up of characters, letters, digits, symbols and combinations thereof, most of its content appears as garbled data. These recognizable character strings can be extracted and printed, and are called printable character strings. Strings composed of characters, letters, digits, symbols and their combinations that carry no particular meaning and are not code are hereinafter called normal character strings. Taking CentOS 7.1 as the operating system, for example, printable strings such as abc4 and b34r can be extracted from an executable file. For an executable file, whether it is safe can be determined by judging the similarity between its printable character strings and preset character strings.
In most normal web page files, the printable strings are short and of relatively fixed length. Taking picture files without an embedded webshell as an example, the normal character strings of a picture file in Joint Photographic Experts Group (JPEG) format are 4-7 characters long, and those of a picture file in Graphics Interchange Format (GIF) are usually about 8 characters long.
Once a webshell is embedded in a picture file, or a webshell file is disguised as a picture file, the character string set of the picture file may contain, in addition to a few normal strings, strings that represent webshell code. Such strings are referred to as target code, and may be, for example, a one-line webshell or a code fragment. Examples of one-line webshells include the PHP one-liner <?php @eval($_POST['pp']); ?>, the JSP one-liner <%Runtime.getRuntime().exec(request.getParameter("cmd"));%> and the ASP one-liner <%eval request("chopper")%>.
Fig. 2 is a schematic diagram of a character string set of a picture file in JPEG format, showing only some of the printable strings in the set. Each row in Fig. 2 represents a printable string, such as C<Ww or h#&Y. These printable strings carry no special meaning and are therefore normal strings; each of them (each row) is 4-7 characters long.
Fig. 3 is another schematic diagram of a character string set of a picture file in JPEG format, again showing only some of its printable strings. Each row in Fig. 3 represents a printable string, and these are also normal strings. Note that, besides strings formed by combinations of digits, letters and symbols, normal strings may include strings used for identification. The thick black rectangle in the figure encloses five strings: three printable strings containing Adobe, which serve for identification, a time string, and a string of digits.
The webshell is not embedded in the picture file shown in fig. 2 and 3.
Fig. 4 is a schematic diagram of a character string set of a picture file in GIF format in which a webshell is embedded, showing some of the printable strings in the set. Each row in Fig. 4 represents a printable string; except for the last row, the printable strings are about 8 characters long. The printable string in the last row is far longer than 8 characters because the picture file is a webshell file. In other words, the printable string in the last row is the target code indicating that the picture file is a webshell file, and the printable strings in the other rows are normal strings.
Therefore, whether a web page file is a webshell file can be judged by whether target code exists in its character string set. For example, the electronic device determines whether target code exists in the character string set based on a whitelist, a blacklist, or the like. As another example, the electronic device uses a webshell detection tool, which detects whether printable character strings are target code; the tool may be a regular-expression matching engine, a machine learning model, or the like, which is not limited in the embodiments of the present application.
103. Determine that the web page file is a webshell file.
104. Determine that the web page file is not a webshell file.
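Steps 101 to 104 can be illustrated with a small Python sketch in which a regular-expression engine plays the role of the webshell detection tool. The three patterns are hypothetical signatures modelled on the one-line webshells quoted above; they are assumptions for illustration only and would miss obfuscated variants, so a real deployment would rely on a maintained rule set or a trained model. All names (TARGET_CODE_PATTERNS, extract_string_set, has_target_code, is_webshell_file) are hypothetical.

```python
import re

# Hypothetical signatures modelled on the one-line webshells quoted in the text:
# PHP eval($_POST...), JSP Runtime.getRuntime().exec(...), ASP eval request(...).
TARGET_CODE_PATTERNS = [
    re.compile(r"eval\s*\(\s*\$_(POST|GET|REQUEST)", re.IGNORECASE),
    re.compile(r"Runtime\.getRuntime\(\)\.exec\s*\("),
    re.compile(r"eval\s+request\s*\(", re.IGNORECASE),
]

# Assumption: printable = ASCII 0x20-0x7E, minimum run length 4.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def extract_string_set(data: bytes) -> list[str]:
    """Step 101: extract printable character strings from the web page file."""
    return [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(data)]


def has_target_code(string_set: list[str]) -> bool:
    """Step 102: does any printable string match a webshell signature?"""
    return any(p.search(s) for s in string_set for p in TARGET_CODE_PATTERNS)


def is_webshell_file(data: bytes) -> bool:
    """Steps 103/104: webshell file if and only if target code was found."""
    return has_target_code(extract_string_set(data))


if __name__ == "__main__":
    fake_gif = b"GIF89a\x01\x00\x01\x00\x80\x00\x00<?php @eval($_POST['pp']); ?>"
    print(is_webshell_file(fake_gif))       # True
    print(is_webshell_file(fake_gif[:13]))  # False (header bytes only)
```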
According to the webshell detection method provided by the embodiment of the application, the electronic device extracts the printable character strings from the web page file to obtain a character string set, determines whether target code exists in the set, and if so, determines that the web page file is a webshell file, where the target code is a printable character string in the set other than a normal character string. With this scheme, webshell detection is not performed on the whole web page file; only whether target code exists in the file's character string set is judged. Because a web page file contains relatively few printable character strings, the resource consumption of webshell detection is reduced and detection speed is improved.
Optionally, in the above embodiment, when determining whether target code exists in the character string set, the electronic device first determines whether the set contains any printable string longer than a preset length. If it does not, the electronic device determines that the web page file is not a webshell file. If it does, the electronic device determines whether target code exists in the character string set based on a first subset, which consists of the printable strings whose length exceeds the preset length.
Illustratively, the length of normal character strings in web page files of a given format is relatively fixed: printable strings in a JPEG picture file are 4-7 characters long, and those in a GIF picture file are about 8 characters long. The electronic device therefore sets a preset length in advance for web page files of each format, for example 7 characters for JPEG picture files and 8 characters for GIF picture files.
After extracting the printable character strings of the web page file, the electronic device outputs them one per line, separated by line feeds. That is, the character string set has multiple rows, each row represents one printable string, and the number of rows equals the number of printable strings. The electronic device then determines the length of each printable string in the set and marks any row whose string is longer than the preset length as a suspicious string. If the set contains no suspicious strings, the web page file is determined to be a normal file rather than a webshell file. If the set contains printable strings longer than the preset length, those strings form the first subset.
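The length check can be sketched as follows. The preset lengths are the example values given in the text (7 characters for JPEG, 8 for GIF); the dictionary PRESET_LENGTHS, its format keys and the function names are hypothetical. A string is treated as suspicious only when its length is strictly greater than the preset length, as described above.

```python
# Hypothetical per-format preset lengths, taken from the example values in the
# text; real thresholds would be tuned per format.
PRESET_LENGTHS = {"jpeg": 7, "gif": 8}


def first_subset(string_set: list[str], file_format: str) -> list[str]:
    """Return the suspicious strings: those longer than the preset length."""
    preset = PRESET_LENGTHS[file_format]
    return [s for s in string_set if len(s) > preset]


def is_normal_by_length(string_set: list[str], file_format: str) -> bool:
    """An empty first subset means the file is judged not to be a webshell."""
    return not first_subset(string_set, file_format)


if __name__ == "__main__":
    # Hypothetical rows in the spirit of Fig. 4: short strings plus one long row.
    gif_rows = ["abcd1234", "x#9Lq@2", "<?php @eval($_POST['pp']); ?>"]
    print(first_subset(gif_rows, "gif"))  # ["<?php @eval($_POST['pp']); ?>"]
```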
After obtaining the first subset, the electronic device determines from it whether the web page file is a webshell file, for example by using the first subset together with a whitelist, a blacklist, or the like, or by using the first subset together with a webshell detection tool.
By adopting the scheme, the electronic equipment determines whether the target code exists in the character string set according to the length of the printable character string in the character string set, so that the resource consumption is low and the speed is high.
Fig. 5 is a schematic process diagram of a webshell detection method according to an embodiment of the present application. In this embodiment, the electronic device does not use a webshell detection tool; instead, it analyzes the length of the printable strings in the character string set and whether they match a whitelist and a blacklist, determining step by step whether target code exists in the set and thus whether the web page file is a webshell file.
Fig. 6 is a flowchart corresponding to fig. 5. The embodiment comprises the following steps:
601. Extract printable character strings from the web page file to obtain a character string set.
602. Determine whether the character string set contains any printable string longer than the preset length. If so, perform step 603; if not, perform step 606.
603. Determine whether the first subset contains any printable string that cannot be matched against the whitelist. If every printable string in the first subset matches the whitelist, perform step 606; otherwise, perform step 604.
Each printable string in the first subset is longer than the preset length. "Every printable string in the first subset matches the whitelist" means that, for every printable string in the first subset, the whitelist contains a keyword matching that string. "The first subset contains a printable string that cannot be matched against the whitelist" means that there is at least one printable string in the first subset for which the whitelist contains no matching keyword.
The electronic device establishes a whitelist in advance; its keywords are, for example, "Adobe", "Microsoft" and "<?xpacket". The electronic device may establish one general whitelist, or separate whitelists for web page files of different formats.
When determining whether the first subset contains printable strings that cannot be matched against the whitelist, the electronic device may use a full matching mode, an inclusion matching mode, a beginning matching mode, or the like, as required.
In the full matching mode, a printable string in the first subset must be identical to a keyword; that is, for each printable string in the first subset, the electronic device determines whether the whitelist contains an identical keyword. If every printable string in the first subset is a keyword in the whitelist, the electronic device determines that no target code exists in the character string set, and hence that the web page file is not a webshell file. If at least one printable string in the first subset is not a keyword in the whitelist, the electronic device further determines, based on a second subset, whether target code exists in the character string set. The second subset consists of the printable strings that are longer than the preset length and cannot be matched against the whitelist.
In the inclusion matching mode, a printable string in the first subset must contain at least one keyword; that is, the electronic device determines whether each printable string in the first subset contains a keyword from the whitelist. If every printable string in the first subset contains a whitelist keyword, it is determined that no target code exists in the character string set and that the web page file is not a webshell file; if at least one printable string in the first subset contains no whitelist keyword, the electronic device further determines, based on the second subset, whether target code exists in the character string set and hence whether the web page file is a webshell file. The second subset consists of the printable strings that are longer than the preset length and fail to match the whitelist.
In the beginning matching mode, the first few characters of a printable string in the first subset must be a keyword in the whitelist; that is, the electronic device determines whether each printable string in the first subset begins with a whitelist keyword. If every printable string in the first subset begins with a whitelist keyword, it is determined that no target code exists in the character string set and that the web page file is not a webshell file; if at least one printable string in the first subset contains no whitelist keyword, or contains one that is not at its beginning, the electronic device further determines, based on the second subset, whether target code exists in the character string set and hence whether the web page file is a webshell file.
The full matching mode is the most accurate, followed by the inclusion matching mode and the beginning matching mode. The electronic device can select an appropriate matching mode as required, which makes the scheme flexible and fast.
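The three whitelist matching modes, and the derivation of the second subset from the first, can be sketched in Python as below. The keyword list reuses the examples given above; the MatchMode enum, the helper names and the sample strings are hypothetical. Matching is case-sensitive here, which is an assumption, since the text does not specify it.

```python
from enum import Enum


class MatchMode(Enum):
    FULL = "full"            # the string must equal a keyword
    INCLUSION = "inclusion"  # the string must contain a keyword
    BEGINNING = "beginning"  # the string must start with a keyword


def matches(s: str, keywords: list[str], mode: MatchMode) -> bool:
    """True if the printable string s matches any keyword under the given mode."""
    if mode is MatchMode.FULL:
        return s in keywords
    if mode is MatchMode.INCLUSION:
        return any(k in s for k in keywords)
    return any(s.startswith(k) for k in keywords)


# Example whitelist keywords from the text; a real list would be larger and
# possibly maintained per file format.
WHITELIST = ["Adobe", "Microsoft", "<?xpacket"]


def second_subset(first: list[str], mode: MatchMode) -> list[str]:
    """Strings of the first subset that cannot be matched against the whitelist."""
    return [s for s in first if not matches(s, WHITELIST, mode)]


if __name__ == "__main__":
    first = ["<?xpacket end='w'?>", "Photoshop by Adobe", "<?php @eval($_POST['pp']); ?>"]
    for mode in MatchMode:
        print(mode.value, len(second_subset(first, mode)))
    # full 3, inclusion 1, beginning 2
```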
604. Determine whether any printable string in the second subset matches the blacklist. If no printable string in the second subset matches the blacklist, execute step 606; if at least one printable string in the second subset matches the blacklist, execute step 605.
The electronic device establishes a blacklist in advance, whose keywords include, for example, "php", "asp" and "jsp". The electronic device may establish one general blacklist, or separate blacklists for web page files of different formats.
When determining whether any printable string in the second subset matches the blacklist, the electronic device may use the full matching mode, the inclusion matching mode, the beginning matching mode, or the like, as required. The process is the same as for whitelist matching and is not repeated here.
605. Determine that the web page file is a webshell file.
606. Determine that the web page file is not a webshell file.
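Steps 604 to 606 can be sketched as below, assuming inclusion matching and case-insensitive comparison. The keywords "php", "asp" and "jsp" come from the text; the extension-to-blacklist mapping and all function names are hypothetical.

```python
import os

# General blacklist; "php", "asp" and "jsp" are the example keywords given in the text.
GENERAL_BLACKLIST = ["php", "asp", "jsp"]

# Hypothetical per-format blacklists, keyed by lower-case file extension.
FORMAT_BLACKLISTS = {
    ".php": ["php"],
    ".asp": ["asp"],
    ".aspx": ["asp"],
    ".jsp": ["jsp"],
}


def blacklist_for(filename: str) -> list[str]:
    """Use a format-specific blacklist when one exists, otherwise the general blacklist."""
    ext = os.path.splitext(filename)[1].lower()
    return FORMAT_BLACKLISTS.get(ext, GENERAL_BLACKLIST)


def blacklist_hits(second_subset: list[str], blacklist: list[str]) -> list[str]:
    """Strings in the second subset that contain a blacklist keyword (case-insensitive inclusion)."""
    return [s for s in second_subset if any(kw in s.lower() for kw in blacklist)]


def verdict(filename: str, second_subset: list[str]) -> bool:
    """Step 604: any blacklist hit leads to step 605 (webshell), otherwise step 606 (not a webshell)."""
    return bool(blacklist_hits(second_subset, blacklist_for(filename)))
```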
With this scheme, the electronic device determines whether the web page file is a webshell file by analyzing the length of the printable strings in the character string set and whether they match the keywords in the whitelist and in the blacklist.
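The complete length, whitelist and blacklist check can be sketched end to end as follows. The preset length of 100 characters, the use of inclusion matching for both lists and the function name are assumptions for illustration; the text leaves the preset length and matching mode configurable. Returning `False` when no string exceeds the preset length is likewise an assumed reading of the earlier steps.

```python
def is_webshell(strings: list[str], whitelist: list[str], blacklist: list[str],
                preset_length: int = 100) -> bool:
    """End-to-end sketch: length filter, then whitelist, then blacklist, then verdict."""
    # First subset: printable strings whose length exceeds the preset length (suspicious strings).
    first_subset = [s for s in strings if len(s) > preset_length]
    if not first_subset:
        return False  # nothing suspicious, so no target code

    # Second subset: suspicious strings that fail to match the whitelist.
    second_subset = [s for s in first_subset if not any(kw in s for kw in whitelist)]
    if not second_subset:
        return False  # every suspicious string is explained by the whitelist

    # Step 604: a single blacklist hit means target code exists (step 605);
    # otherwise the file is judged not to be a webshell (step 606).
    return any(kw in s for s in second_subset for kw in blacklist)
```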
In the above embodiment, the electronic device performs webshell detection on the web page file directly, by analyzing the length of the printable strings and whether they match the whitelist and the blacklist. However, the embodiments of the present application are not limited thereto: in other feasible implementations, after obtaining the character string set, the electronic device may also use a webshell detection tool to determine whether the web page file is a webshell file.
For example, after obtaining the character string set, the electronic device directly inputs the printable strings in the character string set into the webshell detection tool, which outputs a target code detection result indicating whether target code exists in the character string set. This is referred to below as mode one.
In this scheme, the printable strings do not need to be filtered by length, whitelist, blacklist, or the like; all printable strings in the character string set are input into the webshell detection tool, so the scheme suits scenarios with high security requirements. Moreover, only the printable strings in the character string set, rather than the whole web page file, need to be input into the webshell detection tool, which reduces resource consumption and improves detection efficiency.
For another example, after obtaining the character string set, the electronic device determines whether it contains printable strings whose length exceeds the preset length. If so, the electronic device generates a first subset from those printable strings and inputs the printable strings in the first subset into the webshell detection tool, which outputs a target code detection result indicating whether target code exists in the character string set. This is referred to below as mode two.
In this scheme, the printable strings in the character string set are filtered by length, and only those exceeding the preset length are kept. The accuracy of the target code detection result is therefore lower than in mode one, but fewer printable strings are input into the webshell detection tool, so mode two is faster and consumes fewer resources.
For another example, after obtaining the character string set, the electronic device determines whether it contains printable strings whose length exceeds the preset length and, if so, generates a first subset from them. The electronic device then determines whether the first subset contains printable strings that cannot be matched with the whitelist. If so, it generates a second subset from those strings and inputs the printable strings in the second subset into the webshell detection tool, which outputs a target code detection result indicating whether target code exists in the character string set. Each printable string in the second subset exceeds the preset length and cannot be matched with the whitelist. This is referred to below as mode three.
Compared with mode two, mode three filters out more printable strings, so the accuracy of the target code detection result is further reduced, but the speed is higher and the resource consumption lower.
For another example, after generating the second subset from the printable strings that cannot be matched with the whitelist, the electronic device further determines whether the second subset contains printable strings that match the blacklist. If so, it generates a third subset from those strings and inputs the printable strings in the third subset into the webshell detection tool, which outputs a target code detection result indicating whether target code exists in the character string set. Accordingly, each printable string in the third subset exceeds the preset length, does not match the whitelist, and matches the blacklist. This is referred to below as mode four.
Compared with mode three, mode four filters out still more printable strings, so the accuracy of the target code detection result is further reduced, but the speed is higher and the resource consumption lower; mode four suits scenarios with the lowest security requirements.
For another example, after obtaining the character string set, the electronic device directly matches each printable string in it against the blacklist. If the character string set contains printable strings that match the blacklist, the electronic device generates a fourth subset from them and inputs the printable strings in the fourth subset into the webshell detection tool, which outputs a target code detection result indicating whether target code exists in the character string set. This is referred to below as mode five.
Compared with mode one, mode five filters out more printable strings, so the accuracy of the target code detection result is lower, but the speed is higher and the resource consumption lower.
The webshell detection tool may be, for example, a regular-expression matching engine or a machine learning model; the embodiments of the present application are not limited in this respect.
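As an illustration of a regular-expression engine used as the webshell detection tool, the sketch below flags the input when any string hits any of a few patterns. The patterns are hypothetical examples of common PHP webshell idioms (such as `eval` applied directly to request data), not rules taken from this application; a practical rule set would be far larger.

```python
import re

# Illustrative patterns only (an assumption, not from this application): common PHP webshell idioms.
WEBSHELL_PATTERNS = [
    # eval()/assert() called directly on request parameters
    re.compile(r"\beval\s*\(\s*\$_(?:POST|GET|REQUEST|COOKIE)", re.IGNORECASE),
    re.compile(r"\bassert\s*\(\s*\$_(?:POST|GET|REQUEST)", re.IGNORECASE),
    # command execution functions fed from request data
    re.compile(r"\b(?:system|shell_exec|passthru)\s*\(\s*\$_", re.IGNORECASE),
    # eval() of base64-decoded (optionally decompressed) content
    re.compile(r"\beval\s*\(\s*(?:gz(?:inflate|uncompress)\s*\(\s*)?base64_decode\s*\(", re.IGNORECASE),
]


def regex_tool(strings: list[str]) -> bool:
    """Return True (target code found) if any input string matches any pattern."""
    return any(p.search(s) for p in WEBSHELL_PATTERNS for s in strings)
```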
It should be noted that, when detecting a web page file, the webshell detection tool may also perform processing such as encoding and decoding.
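The decoding step might look like the following sketch, which base64-decodes long base64-looking runs inside each input string and appends any fully printable result, so that a pattern engine can also inspect the decoded text. The 16-character run threshold, the restriction to base64 and the function name are assumptions for illustration.

```python
import base64
import binascii
import re

# Hypothetical threshold: runs of 16 or more base64 alphabet characters, plus optional padding.
B64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def expand_with_decoded(strings: list[str]) -> list[str]:
    """Return the inputs plus any printable text recovered by base64-decoding embedded runs."""
    out = list(strings)
    for s in strings:
        for run in B64_RUN.findall(s):
            try:
                raw = base64.b64decode(run, validate=True)
            except binascii.Error:
                continue  # not valid base64 (e.g. wrong length or padding); skip it
            text = raw.decode("utf-8", errors="ignore")
            # Keep only decodings that are non-empty and entirely printable.
            if text and text.isprintable():
                out.append(text)
    return out
```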
When performing webshell detection on a web page file with the webshell detection tool, the electronic device can flexibly choose to input the printable strings of the character string set, the first subset, the second subset, the third subset, or the fourth subset into the tool, which gives high flexibility and speed.
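The input selection for modes one to five can be summarized as follows. This is a hypothetical sketch: the detection tool is modeled as any callable that takes a list of strings and returns `True` when it finds target code, both lists use inclusion matching, and the preset length of 100 is an assumed value.

```python
from typing import Callable

# Hypothetical tool interface: takes the selected strings, returns True if target code is found.
Detector = Callable[[list[str]], bool]


def _long(strings: list[str], n: int) -> list[str]:
    """Strings whose length exceeds the preset length n."""
    return [s for s in strings if len(s) > n]


def _not_white(strings: list[str], wl: list[str]) -> list[str]:
    """Strings that fail to match the whitelist (inclusion matching assumed)."""
    return [s for s in strings if not any(k in s for k in wl)]


def _black(strings: list[str], bl: list[str]) -> list[str]:
    """Strings that match the blacklist (inclusion matching assumed)."""
    return [s for s in strings if any(k in s for k in bl)]


def select_input(mode: int, strings: list[str], wl: list[str], bl: list[str],
                 n: int = 100) -> list[str]:
    """Return the strings that modes one to five would hand to the webshell detection tool."""
    if mode == 1:
        return list(strings)                                  # whole character string set
    if mode == 2:
        return _long(strings, n)                              # first subset
    if mode == 3:
        return _not_white(_long(strings, n), wl)              # second subset
    if mode == 4:
        return _black(_not_white(_long(strings, n), wl), bl)  # third subset
    if mode == 5:
        return _black(strings, bl)                            # fourth subset
    raise ValueError("mode must be between 1 and 5")


def detect(mode: int, strings: list[str], wl: list[str], bl: list[str],
           tool: Detector, n: int = 100) -> bool:
    """Run the detection tool on the selected input; an empty selection is treated as clean."""
    selected = select_input(mode, strings, wl, bl, n)
    return bool(selected) and tool(selected)
```

Going from mode one to mode four, each mode hands the tool a narrower input, which mirrors the trade-off described above between accuracy on one side and speed and resource consumption on the other.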
The following are apparatus embodiments of the present application, which may be used to perform the method embodiments of the present application. For details not disclosed in the apparatus embodiments, refer to the method embodiments of the present application.
Fig. 7 is a schematic diagram of a webshell detection device according to an embodiment of the present application. The webshell detection device 700 includes an extraction module 71, a determining module 72 and a processing module 73.
an extraction module 71, configured to extract printable strings from a web page file to obtain a character string set;
a determining module 72, configured to determine whether target code exists in the character string set, where the target code is any printable string in the character string set other than the normal character strings;
and a processing module 73, configured to determine that the web page file is a webshell file if the target code exists in the character string set.
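A hypothetical Python sketch of device 700 follows, with one method per module. The extraction module here keeps runs of at least four printable ASCII characters (0x20 to 0x7E), similar in spirit to the Unix `strings` utility; that rule, the preset length and the use of inclusion matching are assumptions, since the text does not fix them.

```python
import re

# Assumed extraction rule: runs of 4 or more printable ASCII bytes (0x20-0x7E).
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


class WebshellDetectionDevice:
    """Hypothetical sketch of device 700: extraction (71), determining (72), processing (73)."""

    def __init__(self, whitelist: list[str], blacklist: list[str], preset_length: int = 100):
        self.whitelist = whitelist
        self.blacklist = blacklist
        self.preset_length = preset_length

    def extract(self, data: bytes) -> list[str]:
        """Extraction module 71: printable strings found in the raw web page file."""
        return [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(data)]

    def determine(self, strings: list[str]) -> bool:
        """Determining module 72: True if target code exists (length, whitelist, blacklist)."""
        second = [s for s in strings
                  if len(s) > self.preset_length
                  and not any(k in s for k in self.whitelist)]
        return any(k in s for s in second for k in self.blacklist)

    def process(self, data: bytes) -> bool:
        """Processing module 73: the file is a webshell file if target code exists."""
        return self.determine(self.extract(data))
```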
In a possible implementation, the determining module 72 is configured to determine whether the character string set contains printable strings whose length exceeds a preset length, and if so, to determine, based on a first subset, whether target code exists in the character string set, where each printable string in the first subset exceeds the preset length.
In a possible implementation, when determining, based on the first subset, whether target code exists in the character string set, the determining module 72 is configured to: determine whether the first subset contains printable strings that cannot be matched with the whitelist; if so, determine whether a second subset contains printable strings that match the blacklist, where each printable string in the second subset exceeds the preset length and cannot be matched with the whitelist; and if at least one printable string in the second subset matches the blacklist, determine that target code exists in the character string set.
In a possible implementation, when determining, based on the first subset, whether target code exists in the character string set, the determining module 72 is configured to input the printable strings in the first subset into a webshell detection tool, which outputs a target code detection result indicating whether target code exists in the character string set.
In a possible implementation, when determining, based on the first subset, whether target code exists in the character string set, the determining module 72 is configured to determine whether the first subset contains printable strings that cannot be matched with the whitelist, and if so, to input the printable strings in the second subset into a webshell detection tool, which outputs a target code detection result indicating whether target code exists in the character string set; each printable string in the second subset exceeds the preset length and cannot be matched with the whitelist.
In a possible implementation, when determining, based on the first subset, whether target code exists in the character string set, the determining module 72 is configured to: determine whether the first subset contains printable strings that cannot be matched with the whitelist; if so, determine whether a second subset, each of whose printable strings exceeds the preset length and cannot be matched with the whitelist, contains printable strings that match the blacklist; and input the printable strings in a third subset into a webshell detection tool, which outputs a target code detection result indicating whether target code exists in the character string set, where each printable string in the third subset exceeds the preset length, does not match the whitelist, and matches the blacklist.
In a possible implementation, to determine whether the first subset contains printable strings that cannot be matched with the whitelist, the determining module 72 determines whether each printable string in the first subset is a keyword in the whitelist; or determines whether each printable string in the first subset contains a keyword in the whitelist; or determines whether each printable string in the first subset contains a keyword in the whitelist located at the beginning of the string.
In a possible implementation, the determining module 72 is configured to input the printable strings in the character string set into a webshell detection tool, which outputs a target code detection result indicating whether target code exists in the character string set.
In a possible implementation, the determining module 72 is configured to select, from the character string set, the printable strings that match a blacklist to obtain a fourth subset, and to input the printable strings in the fourth subset into a webshell detection tool, which outputs a target code detection result indicating whether target code exists in the character string set.
The webshell detection device provided by the embodiments of the present application can perform the actions of the electronic device in the above embodiments; the implementation principles and technical effects are similar and are not repeated here.
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 8, the electronic device 800 includes:
a processor 801 and a memory 802;
the memory 802 stores computer instructions;
the processor 801 executes the computer instructions stored by the memory 802, causing the processor 801 to perform the webshell detection method described above.
For the specific implementation process of the processor 801, refer to the above method embodiments; the implementation principles and technical effects are similar and are not repeated here.
Optionally, the electronic device 800 further includes a communication component 803. The processor 801, the memory 802 and the communication component 803 may be connected through a bus 804.
An embodiment of the present application further provides a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the webshell detection method performed by the electronic device.
An embodiment of the present application further provides a computer program product containing a computer program which, when executed by a processor, implements the webshell detection method performed by the electronic device.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (10)

1. A webshell detection method is characterized by comprising the following steps:
extracting printable character strings from a webpage file to obtain a character string set;
determining whether a target code exists in the character string set, wherein the target code is a printable character string except a normal character string in the character string set; wherein the determining whether the target code exists in the character string set comprises: determining whether printable character strings with the length exceeding a preset length exist in the character string set;
if printable character strings with the length exceeding the preset length exist in the character string set, marking the printable character strings with the length exceeding the preset length as suspicious character strings, obtaining a first subset according to the suspicious character strings, and determining whether target codes exist in the character string set or not according to the first subset;
determining whether a target code is present in the set of strings based on the first subset, comprising:
determining whether there are printable strings in the first subset that cannot be matched to a whitelist;
if the printable character strings which cannot be matched with the white list exist in the first subset, determining whether printable character strings which are matched with the black list exist in a second subset, wherein the length of each printable character string contained in the second subset exceeds a preset length and cannot be matched with the white list;
determining whether a target code exists in the character string set according to a second subset, and if at least one printable character string matched with the blacklist exists in the second subset, determining that the target code exists in the character string set;
and if the target code exists in the character string set, determining that the webpage file is a webshell file.
2. The method of claim 1, wherein determining whether the target code is present in the set of strings according to the first subset comprises:
inputting the printable character strings in the first subset into a webshell detection tool to output a code detection result, wherein the code detection result is used for indicating whether the target code exists in the character string set or not.
3. The method of claim 1, wherein determining whether target code is present in the set of strings according to the second subset comprises:
inputting the printable character strings in the second subset into a webshell detection tool to output a code detection result, wherein the code detection result is used for indicating whether a target code exists in the character string set, and the length of each printable character string in the second subset exceeds a preset length and cannot be matched with a white list.
4. The method of claim 1, wherein determining whether target code is present in the set of strings based on the second subset further comprises:
if at least one printable character string matched with the blacklist exists in the second subset, inputting the printable character string in a third subset into a webshell detection tool to output a code detection result, wherein the code detection result is used for indicating whether a target code exists in the character string set or not, and the length of each printable character string contained in the third subset exceeds a preset length, is not matched with the whitelist and is matched with the blacklist.
5. The method of claim 1 or 4, wherein the determining whether there are printable strings in the first subset that cannot be matched to a whitelist comprises:
determining whether each printable string in the first subset is a keyword in the whitelist; or,
determining whether each printable string in the first subset includes a keyword in the whitelist; or,
determining whether each printable character string in the first subset includes a keyword in the white list, and whether a character corresponding to the keyword in the printable character string is located at the beginning of the printable character string.
6. The method of claim 1, wherein the determining whether target code is present in the set of strings comprises:
inputting printable character strings in the character string set into a webshell detection tool to output a code detection result, wherein the code detection result is used for indicating whether a target code exists in the character string set.
7. The method of claim 1, wherein the determining whether target code is present in the set of strings comprises:
determining printable character strings matched with a blacklist from the character string set to obtain a fourth subset;
inputting the printable character strings in the fourth subset into a webshell detection tool to output a code detection result, wherein the code detection result is used for indicating whether a target code exists in the character string set.
8. A webshell detection device, comprising:
the extraction module is used for extracting the printable character strings from the webpage files to obtain a character string set;
a determining module, configured to determine whether a target code exists in the character string set, where the target code is a printable character string in the character string set except a normal character string; wherein the determining whether the target code exists in the character string set comprises: determining whether printable character strings with the length exceeding a preset length exist in the character string set; if printable character strings with the length exceeding the preset length exist in the character string set, marking the printable character strings with the length exceeding the preset length as suspicious character strings, obtaining a first subset according to the suspicious character strings, and determining whether target codes exist in the character string set or not according to the first subset; determining whether a target code is present in the set of strings based on the first subset, comprising: determining whether there are printable strings in the first subset that cannot be matched to a whitelist; if the printable character strings which cannot be matched with the white list exist in the first subset, determining whether printable character strings which are matched with the black list exist in a second subset, wherein the length of each printable character string contained in the second subset exceeds a preset length and cannot be matched with the white list; determining whether a target code exists in the character string set or not according to a second subset, and if at least one printable character string matched with the blacklist exists in the second subset, determining that the target code exists in the character string set;
and the processing module is used for determining that the webpage file is a webshell file if the target code exists in the character string set.
9. An electronic device comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, causes the electronic device to carry out the method of any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.

Priority Applications (1)

Application number: CN202110930565.XA (CN113810375B)
Priority date: 2021-08-13
Filing date: 2021-08-13
Title: Webshell detection method, device and equipment and readable storage medium

Publications (2)

CN113810375A, published 2021-12-17
CN113810375B, published 2023-01-20

Family

ID=78942908

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant