CN108985059B - Webpage backdoor detection method, device, equipment and storage medium - Google Patents

Webpage backdoor detection method, device, equipment and storage medium

Info

Publication number
CN108985059B
CN108985059B CN201810714112.1A CN201810714112A CN108985059B CN 108985059 B CN108985059 B CN 108985059B CN 201810714112 A CN201810714112 A CN 201810714112A CN 108985059 B CN108985059 B CN 108985059B
Authority
CN
China
Prior art keywords
html
file
matching
backdoor
framework
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810714112.1A
Other languages
Chinese (zh)
Other versions
CN108985059A (en)
Inventor
张鑫
王凯平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Hongxiang Technical Service Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd filed Critical Beijing Qihoo Technology Co Ltd
Priority to CN201810714112.1A priority Critical patent/CN108985059B/en
Publication of CN108985059A publication Critical patent/CN108985059A/en
Application granted granted Critical
Publication of CN108985059B publication Critical patent/CN108985059B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures
    • G06F21/56Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562Static detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The embodiment of the specification provides a webpage backdoor detection method, a webpage backdoor detection device, a webpage backdoor detection equipment and a webpage backdoor detection storage medium. The method comprises the following steps: obtaining an html file by monitoring network traffic between a target host and a browser; extracting the html tag from the html file to obtain an html framework; matching the html frameworks by using a pre-established html framework model, wherein the html framework model is obtained by training a plurality of html frameworks serving as samples; and judging whether the html file is a webpage backdoor file or not according to the matching result. The variation of the webpage backdoor is usually realized by changing rendering effect and the like, and the changes usually do not relate to the change of the html framework, so that the variation of the webpage backdoor can be effectively resisted by the detection method provided by the embodiment of the specification.

Description

Webpage backdoor detection method, device, equipment and storage medium
Technical Field
The embodiment of the specification relates to the technical field of network security, in particular to a method, a device, equipment and a storage medium for detecting a webpage backdoor.
Background
Webpage backdoors are a common tool that hackers use to attack target hosts. A webshell, for example, is a command execution environment that exists in the form of web page files such as asp (Active Server Pages), php (Hypertext Preprocessor), jsp (Java Server Pages), cgi (Common Gateway Interface), and the like, and may also be referred to as a webpage backdoor.
The traditional webpage backdoor detection method mainly describes the characteristics of known webpage backdoor files precisely in the form of regular expressions, and uses those regular expressions to detect webpage backdoors. This detection method relies on manual experience and is not flexible. To evade identification, webpage backdoors appear in many variants, and the traditional detection method has difficulty coping with these variants.
Disclosure of Invention
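The embodiments of the specification provide a webpage backdoor detection method, device, equipment and storage medium. Compared with a detection method based on regular expressions, the implementation is simple and flexible, and webpage backdoor variants can be effectively resisted.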
In a first aspect, an embodiment of the present specification provides a method for detecting a backdoor of a web page, where the method includes:
acquiring an html (hypertext markup language) file by monitoring network traffic between a target host and a browser;
extracting html tags from the html files to obtain html tag sequences arranged in the html files, wherein the html tag sequences form html frameworks of the html files;
matching the html framework by using a pre-established html framework model, wherein the html framework model is obtained by training a plurality of html frameworks serving as samples;
and judging whether the html file is a webpage backdoor file or not according to the matching result.
Optionally, the html framework model is obtained by extracting a html framework of a known web page backdoor file and training with the html framework of the known web page backdoor file as a sample.
Optionally, the matching result is a value of a matching degree, and the determining whether the html file is a web backdoor file according to the matching result includes:
comparing the value of the matching degree with a set threshold value;
and if the value of the matching degree exceeds a set threshold value, judging that the html file is a webpage backdoor file.
Optionally, if the value of the matching degree does not exceed the set threshold, the method further includes:
extracting attribute information of a label in the html file, wherein the attribute information of the label comprises an attribute and an attribute value of the label; respectively matching the attribute information of the tags by using a pre-established attribute information model; judging whether the html file is a webpage backdoor file or not according to a matching result;
or,
carrying out segmentation processing on the html file to obtain a plurality of html fragments; matching each html fragment by utilizing a pre-established fragment hash model; and judging whether the html file is a webpage backdoor file or not according to the matching result.
Based on any of the above method embodiments, optionally, before the matching of the html skeleton by using the pre-established html skeleton model, the method further includes: replacing the specific character string in the html label with a standard character string according to a preset rule;
and when the html framework model is established, replacing the specific character string in the html tag in the html framework as the sample with the standard character string according to the preset rule.
In a second aspect, an embodiment of the present specification provides a web page backdoor detection apparatus, including:
the html file acquisition module is used for acquiring html files by monitoring network traffic between the target host and the browser;
the html framework extraction module is used for extracting html tags from the html files to obtain html tag sequences arranged in the html files, and the html tag sequences form html frameworks of the html files;
the model matching module is used for matching the html framework by utilizing a pre-established html framework model, and the html framework model is obtained by training a plurality of html frameworks serving as samples;
and the webpage backdoor detection module is used for judging whether the html file is a webpage backdoor file according to the matching result.
Optionally, the html framework model is obtained by extracting a html framework of a known web page backdoor file and training with the html framework of the known web page backdoor file as a sample.
Optionally, the web page backdoor detection module is configured to:
comparing the value of the matching degree with a set threshold value;
and if the value of the matching degree exceeds a set threshold value, judging that the html file is a webpage backdoor file.
Optionally, the apparatus further includes a second detection module configured to, if the value of the matching degree does not exceed the set threshold:
extracting attribute information of each label in the html file, wherein the attribute information of the label comprises the attribute and the attribute value of the label; respectively matching the attribute information of each label by using a pre-established attribute information model; judging whether the html file is a webpage backdoor file or not according to a matching result;
or,
carrying out segmentation processing on the html file to obtain a plurality of html fragments; matching each html fragment by utilizing a pre-established fragment hash model; and judging whether the html file is a webpage backdoor file or not according to the matching result.
In a third aspect, an embodiment of the present specification provides a web page backdoor detection apparatus, including a memory and a processor, where the memory stores a computer program, and the processor implements the following steps when executing the computer program:
obtaining an html file by monitoring network traffic between a target host and a browser;
extracting html tags from the html files to obtain html tag sequences arranged in the html files, wherein the html tag sequences form html frameworks of the html files;
matching the html framework by using a pre-established html framework model, wherein the html framework model is obtained by training a plurality of html frameworks serving as samples;
and judging whether the html file is a webpage backdoor file or not according to the matching result.
In a fourth aspect, the present specification provides a computer readable storage medium, on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of:
obtaining an html file by monitoring network traffic between a target host and a browser;
extracting html tags from the html files to obtain html tag sequences arranged in the html files, wherein the html tag sequences form html frameworks of the html files;
matching the html framework by using a pre-established html framework model, wherein the html framework model is obtained by training a plurality of html frameworks serving as samples;
and judging whether the html file is a webpage backdoor file or not according to the matching result.
The embodiment of the specification has the following beneficial effects:
in the process of implementing the invention, the inventor discovered through creative work that the html framework of a webpage backdoor file differs from a normal html framework; that is, the way html tags are ordered in a webpage backdoor file differs from the normal ordering and follows discernible patterns. Therefore, an html framework model can be obtained by training on html frameworks, and the html framework of the html file to be recognized can then be matched against the model to judge whether the html file is a webpage backdoor file, thereby achieving webpage backdoor recognition. In addition, variants of a webpage backdoor are usually produced by changing the rendering effect and the like, and such changes usually do not alter the html framework, so the detection method provided by the embodiment of the specification can effectively resist webpage backdoor variants.
Drawings
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present specification;
Fig. 2 is a flow chart of a method of a first aspect of an embodiment of the present specification;
Fig. 3 is a schematic diagram of an apparatus according to a second aspect of an embodiment of the present specification.
Detailed Description
To aid understanding, the technical solutions of the embodiments of the present specification are described in detail below with reference to the drawings and specific embodiments. It should be understood that the embodiments and their specific features are detailed illustrations of the technical solutions and do not limit them, and that the embodiments and their technical features may be combined with one another where no conflict arises.
The technical scheme provided by the embodiment of the present specification is applied to the traffic layer of a network, and can be implemented at a gateway, on a device having a gateway function, or on any device capable of monitoring network traffic. Taking implementation at a gateway as an example, as shown in Fig. 1, a gateway 101 monitors network traffic between a target host 102 and a browser 103 and obtains an html file from the network traffic, that is, html code starting with the start tag <html> and ending with the end tag </html>; extracts the html framework of the html file, wherein the html framework is the sequence of html tags in the order in which they appear; and matches the extracted html framework against a pre-established html framework model, performing webpage backdoor identification according to the matching result.
The target host may be a server providing various services, a personal computer capable of implementing specific functions, or other network devices capable of providing network services. The target host can receive request data sent by the browser and used for initiating a request service to the target host, perform corresponding data processing according to the request data to obtain response data, and feed back the response data to the browser. The browser may run on a variety of electronic devices that are display enabled and support interactive functionality, including but not limited to smart phones, tablets, personal computers, desktop computers, and the like.
The method provided by the embodiment of the specification can be used for detecting the webshell but is not limited to detecting the webshell.
The following describes in detail the technical solutions provided in the embodiments of the present specification in terms of a method, an apparatus, a device, and a storage medium, respectively, in conjunction with a specific application scenario.
In a first aspect, an embodiment of the present disclosure provides a method for detecting a backdoor of a web page, please refer to fig. 2, including:
Step 202, obtaining the html file by monitoring the network traffic between the target host and the browser.
Specifically, the network traffic may be monitored by network sniffing or by network port mirroring. In network sniffing, the network card of the target host is set to promiscuous mode and the network traffic of the target host is captured by calling a packet capture tool. In network port mirroring, the monitored port of the target host is mirrored to another port and the data is copied in real time, so as to obtain the network traffic of the target host. Of course, the specific implementation manner of monitoring the network traffic is not limited to the above two manners, and this embodiment does not limit this.
For the monitored network traffic, the html file can be obtained by, but not limited to, identifying the start tag <html> and the end tag </html>.
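As an illustration of identifying the start tag and the end tag, the following is a minimal Python sketch. It assumes that the monitored traffic has already been reassembled and decoded into a text payload, which the specification does not detail; the function name extract_html_documents is hypothetical.

```python
import re

# Non-greedy match from an opening <html ...> tag to the nearest </html>,
# case-insensitive and spanning line breaks.
HTML_DOC_RE = re.compile(r"<html\b[^>]*>.*?</html\s*>", re.IGNORECASE | re.DOTALL)


def extract_html_documents(payload: str) -> list[str]:
    """Return every <html>...</html> span found in a decoded payload."""
    return HTML_DOC_RE.findall(payload)


payload = (
    "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n"
    "<html><body><p>hi</p></body></html>\r\n"
)
docs = extract_html_documents(payload)
```

A production monitor would also need to handle chunked or compressed responses before this step; the sketch covers only the tag-delimited extraction.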
Step 204, extracting html tags from the html file to obtain the html tag sequence as arranged in the html file, wherein the html tag sequence forms the html framework of the html file.
The implementation manner of the step can be that the tags are identified (namely the tags are extracted) in the html file, and the content among the tags is deleted, so that the html framework is obtained; or the tag is identified from the html file, and the html tag sequence is constructed according to the sequence in the html file by using the identified tag, so as to obtain the html framework.
Tags are identified by recognizing angle brackets. It should be noted that the angle brackets identified here are html tag delimiters, which appear literally in html documents as "<" and ">", rather than angle brackets in tag attribute values or content. If angle brackets need to appear in tag attribute values or content, they are typically encoded as character entities in html files, e.g., "<" is encoded as "&lt;".
Step 206, matching the html framework by using a pre-established html framework model.
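The second implementation manner above (identifying the tags and building the tag sequence in document order) can be sketched in Python as follows. This sketch uses the standard library html.parser module rather than scanning for angle brackets directly, and it assumes that attributes are dropped from the framework; both choices, and the names SkeletonExtractor and extract_skeleton, are illustrative assumptions rather than the method fixed by the specification.

```python
from html.parser import HTMLParser


class SkeletonExtractor(HTMLParser):
    """Collects tags in document order and ignores the text between them."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.tags.append(f"</{tag}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>.
        self.tags.append(f"<{tag}/>")


def extract_skeleton(html_text):
    """Return the html tag sequence (the html framework) of a document."""
    parser = SkeletonExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.tags


skeleton = extract_skeleton(
    '<html><body><p class="a">x &lt; y</p><br/></body></html>'
)
```

Because the entity "&lt;" is treated as text, it does not produce a tag, which is consistent with the note above about encoded angle brackets.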
The html framework model is obtained by training a plurality of html frameworks serving as samples.
The html skeleton used as a sample is defined in the above description, and is not described in detail here.
Step 208, judging whether the html file is a webpage backdoor file according to the matching result.
The html framework model is obtained by training the html framework, and then model matching is carried out on the html framework of the html file to be recognized, so that whether the html file is a webpage backdoor file or not is judged, and the webpage backdoor recognition is realized. In addition, the variation of the backdoor of the web page is usually realized by changing a rendering effect and the like, and the changes do not usually relate to the change of the html framework, so that the variation of the backdoor of the web page can be effectively resisted by the detection method provided by the embodiment of the specification.
The method provided by the embodiment of the specification does not limit the sample, the modeling method and the matching algorithm used for establishing the html skeleton model.
For the samples used for modeling, preferably, a large number of known web page backdoor files can be obtained, html frameworks of the web page backdoor files are extracted, and model training is performed by taking the html frameworks of the known web page backdoor files as samples. The html framework model trained in the way well describes html framework features (namely ordering rules of html tags) of the webpage backdoor file. Correspondingly, when the html framework model obtained in the mode is used for model matching, the matching result can be a value representing the matching degree, generally, the larger the value is, the higher the matching degree is, which means that the probability that the html file is a webpage backdoor file is higher. Accordingly, the specific implementation manner of step 208 may be: comparing the value of the matching degree with a set threshold value; and if the value of the matching degree exceeds a set threshold value, judging the html file as a webpage backdoor file.
The threshold may be an empirical value, or may be calculated by simulation or the like.
It should be noted that the size of the value and the meaning of the value are determined by the model matching algorithm. When some model matching algorithms are adopted, the lower the value is, the higher the matching degree is.
Correspondingly, if a higher value indicates a higher matching degree, then exceeding the threshold means being greater than or equal to the threshold; if a lower value indicates a higher matching degree, then exceeding the threshold means being less than or equal to the threshold.
Besides performing model training by taking the html skeleton of the known webpage backdoor file as a sample, model training can also be performed by taking the html skeleton of the known normal html file as a sample. The html framework model trained in this way describes the html framework features (namely the ordering rules of html tags) of a normal html file. Correspondingly, when the html framework model obtained in this way is used for model matching, the matching result can be a value representing the matching degree. Generally, the larger the value, the higher the matching degree, which means a higher probability that the html file is a normal html file; conversely, the smaller the value, the lower the matching degree, which means a higher probability that the html file is a webpage backdoor file. Accordingly, the specific implementation manner of step 208 may be: comparing the value of the matching degree with a set threshold value; and if the value of the matching degree exceeds the set threshold value, judging the html file to be a webpage backdoor file. In this implementation manner, if a higher value indicates a higher matching degree, then exceeding the threshold means being smaller than the threshold; if a lower value indicates a higher matching degree, then exceeding the threshold means being greater than or equal to the threshold.
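The four threshold cases described above (model trained on backdoor samples or on normal samples, and a score that grows or shrinks with the matching degree) can be summarized in a short Python sketch. The function name is_backdoor and its parameters are hypothetical; the comparisons simply restate the text.

```python
def is_backdoor(score, threshold, *, trained_on, higher_is_closer=True):
    """Apply the 'exceeds the threshold' rule for each of the four cases.

    trained_on: "backdoor" or "normal", the kind of samples behind the model.
    higher_is_closer: True if a larger score means a higher matching degree.
    """
    if trained_on == "backdoor":
        # A close match to backdoor samples indicates a backdoor.
        return score >= threshold if higher_is_closer else score <= threshold
    if trained_on == "normal":
        # A poor match to normal samples indicates a backdoor.
        return score < threshold if higher_is_closer else score >= threshold
    raise ValueError("trained_on must be 'backdoor' or 'normal'")


flagged = is_backdoor(0.9, 0.8, trained_on="backdoor")
```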
No matter which type of html skeleton is used as a sample and which modeling method is used, the model matching can be realized by using the existing matching algorithm, such as hash matching, and the specific implementation manner is not described herein.
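Since the specification leaves the modeling method and matching algorithm open, the following Python sketch shows only one possible reading of hash matching. It is an assumption, not the patented model: the model is taken to be the set of hashes of every run of n consecutive tags in the sample frameworks, and the matching degree is the fraction of such hashes in the candidate framework that also occur in the model. All function names are hypothetical.

```python
import hashlib


def shingle_hashes(skeleton, n=3):
    """Hash every run of n consecutive tags (the whole sequence if shorter)."""
    count = max(len(skeleton) - n + 1, 1)
    grams = [" ".join(skeleton[i:i + n]) for i in range(count)]
    return {hashlib.sha1(g.encode("utf-8")).hexdigest() for g in grams}


def train_skeleton_model(sample_skeletons, n=3):
    """Union of the shingle hashes of all sample frameworks."""
    model = set()
    for skeleton in sample_skeletons:
        model |= shingle_hashes(skeleton, n)
    return model


def matching_degree(skeleton, model, n=3):
    """Fraction of the candidate's shingle hashes that occur in the model."""
    hashes = shingle_hashes(skeleton, n)
    return len(hashes & model) / len(hashes)


sample = ["<html>", "<body>", "<form>", "<input>", "</form>", "</body>", "</html>"]
other = ["<html>", "<head>", "<title>", "</title>", "</head>", "</html>"]
model = train_skeleton_model([sample])
same_score = matching_degree(sample, model)
other_score = matching_degree(other, model)
```

With this choice a larger value means a higher matching degree, so the value can be compared with the threshold in the manner described for step 208.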
If the value of the matching degree does not exceed the set threshold, the html file has not been detected as a webpage backdoor file by this method, and the detection may end.
It should be noted that any existing web page backdoor detection method cannot perfectly detect all web page backdoor files, and there is a possibility of false negative. In order to reduce the risk of missing reports, when the web page backdoor file is not detected by using the method provided by the above description embodiment, other detection methods may be used to continue web page backdoor detection on the html file.
The webpage backdoor detection can be sequentially carried out by adopting any one or more existing detection modes.
Preferably, the following two detection methods provided in the embodiments of the present specification may be further adopted to sequentially perform the web page backdoor detection, or any one of the detection methods may be adopted.
The webpage backdoor detection method based on the html attribute information comprises the following steps: extracting attribute information of a label in the html file, wherein the attribute information of the label comprises an attribute and an attribute value of the label; matching the attribute information of the label by using a pre-established attribute information model; and judging whether the html file is a webpage backdoor file or not according to the matching result.
The webpage backdoor detection method based on the segment hash comprises the following steps: carrying out segmentation processing on the html file to obtain a plurality of html fragments; matching the html fragments respectively by utilizing a pre-established fragment hash model; and judging whether the html file is a webpage backdoor file or not according to the matching result.
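The sequential use of several detection methods to reduce missed reports can be sketched in Python as a simple fallback chain. The detector callables here are placeholders standing in for the html framework, attribute information and segment hash detections; their names and the lambda bodies are hypothetical and carry no real detection logic.

```python
def detect_with_fallback(html_text, detectors):
    """Run detectors in order; return the name of the first one that flags
    the file as a webpage backdoor, or None if none of them does."""
    for name, detector in detectors:
        if detector(html_text):
            return name
    return None


# Placeholder detectors for illustration only.
detectors = [
    ("framework", lambda text: False),
    ("attribute", lambda text: 'name="cmd"' in text),
    ("segment_hash", lambda text: False),
]
result = detect_with_fallback('<input name="cmd">', detectors)
```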
In the webpage backdoor detection method based on html attribute information, the attributes of html tags appear as name/value pairs in the format name="value". Therefore, the attribute information can be extracted by identifying the equal sign. For example, word segmentation is performed first; then the equal sign is identified, the character string on the left side of the equal sign is extracted as the attribute name, the quotation marks on the right side of the equal sign are identified, and the content between the quotation marks is extracted as the attribute value.
It should be noted that an html tag may have multiple attributes, that is, the attribute information of an html tag may include multiple pairs of attribute names and attribute values.
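A minimal Python sketch of attribute extraction follows. Instead of splitting on the equal sign as described above, it relies on the standard library html.parser module, which already yields (name, value) pairs per tag; this substitution and the names AttributeExtractor and extract_attribute_info are assumptions made for illustration.

```python
from html.parser import HTMLParser


class AttributeExtractor(HTMLParser):
    """Records, per tag, the list of (attribute name, attribute value) pairs."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        if attrs:
            # Attributes without a value (e.g. "disabled") get an empty string.
            pairs = [(name, value or "") for name, value in attrs]
            self.records.append((tag, pairs))


def extract_attribute_info(html_text):
    parser = AttributeExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.records


info = extract_attribute_info(
    '<form method="post" action="?"><input type="text" name="cmd"></form>'
)
```

Each record keeps all attribute pairs of one tag together, matching the note that one tag may carry multiple attributes.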
The method provided by the embodiment of the specification does not limit the sample, the modeling method and the matching algorithm used for establishing the html attribute information model.
For the samples used for modeling, preferably, a large number of known web page backdoor files can be obtained, the attribute information of html tags of the web page backdoor files is extracted, and model training is performed by taking the attribute information of the html tags of the known web page backdoor files as samples. The html attribute information model trained in the mode well describes the attribute characteristics of the html tags of the webpage backdoor files.
More specifically, one sample may be attribute information of one label, and a plurality of attributes of the label are collectively regarded as one sample. In addition, one sample can also be a combination of attribute information of the tags extracted from the webpage backdoor file, all combination forms are traversed, and the attribute information of each combination form forms a sample.
When the html attribute information model is built with the attribute information of a single tag as one sample, model matching is likewise performed tag by tag: the attribute information extracted from each html tag is matched against the pre-established html attribute information model, and each html tag yields its own matching result. The matching result can be a value representing the matching degree; generally, the larger the value, the higher the matching degree, which means a higher probability that the html file is a webpage backdoor file. Correspondingly, the highest matching-degree value is compared with a set threshold value; if it exceeds the set threshold value, the html file is judged to be a webpage backdoor file.
For example, extracting the attribute information of N tags from the html file to be detected, matching the attribute information of the N tags one by one to obtain N values of matching degrees, taking the maximum value, comparing the maximum value with a set threshold, and if the maximum value is greater than the set threshold, the html file is a web backdoor file.
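The "match N tags, take the maximum, compare with the threshold" procedure can be sketched in Python as follows. The per-tag scoring used here (the fraction of a tag's attribute pairs found in a set of known pairs) is a stand-in assumption, since the specification does not fix the attribute information model; tag_score and file_is_backdoor are hypothetical names.

```python
def tag_score(attr_pairs, model):
    """Hypothetical per-tag matching degree: share of pairs present in the model."""
    if not attr_pairs:
        return 0.0
    return sum(1 for pair in attr_pairs if pair in model) / len(attr_pairs)


def file_is_backdoor(tags_attr_pairs, model, threshold):
    """Score each of the N tags, take the maximum, compare with the threshold."""
    scores = [tag_score(pairs, model) for pairs in tags_attr_pairs]
    return bool(scores) and max(scores) > threshold


# Model built from (name, value) pairs assumed to come from known backdoor files.
model = {("name", "cmd"), ("type", "password")}
tags = [
    [("class", "nav")],
    [("type", "text"), ("name", "cmd")],
]
flagged = file_is_backdoor(tags, model, threshold=0.4)
```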
The threshold may be an empirical value, or may be calculated by simulation or the like.
It should be noted that the size of the value and the meaning of the value are determined by the model matching algorithm. When some model matching algorithms are adopted, the lower the value is, the higher the matching degree is.
Correspondingly, if a higher value indicates a higher matching degree, then exceeding the threshold means being greater than or equal to the threshold; if a lower value indicates a higher matching degree, then exceeding the threshold means being less than or equal to the threshold.
In addition to performing model training by taking the attribute information of the html tag of the known web page backdoor file as a sample, the model training can also be performed by taking the attribute information of the html tag of the known normal html file as a sample. The html attribute information model obtained by training in this way describes the html tag attribute characteristics of a normal html file. Correspondingly, when the html attribute information model obtained in this way is used for model matching, the matching result can be a value representing the matching degree. Generally, the larger the value, the higher the matching degree, which means a higher probability that the html file is a normal html file; conversely, the smaller the value, the lower the matching degree, which means a higher probability that the html file is a webpage backdoor file. Correspondingly, the value of the highest matching degree is compared with a set threshold value; and if the value of the matching degree exceeds the set threshold value, the html file is judged to be a webpage backdoor file. In this implementation manner, if a higher value indicates a higher matching degree, then exceeding the threshold means being smaller than the threshold; if a lower value indicates a higher matching degree, then exceeding the threshold means being greater than or equal to the threshold.
No matter which type of html tag attribute information is used as a sample and which modeling method is used, the matching of the model can be realized by using the existing matching algorithm, such as hash matching, and the specific implementation manner is not described herein.
In the web page backdoor detection method based on the segment hash, segmentation can be based on various criteria. To improve the detection accuracy, it is preferable to use the greater-than sign (">") as the segment delimiter and to segment whenever a greater-than sign is detected. An html fragment is the content between two consecutive greater-than signs.
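Segmentation on the greater-than sign can be sketched in Python as follows. Whether the closing ">" belongs to the fragment is not stated in the specification; this sketch assumes it is kept at the end of each fragment, and the function name split_into_fragments is hypothetical.

```python
def split_into_fragments(html_text):
    """Cut the text after every '>' so each fragment ends at a greater-than sign."""
    fragments = []
    start = 0
    for index, char in enumerate(html_text):
        if char == ">":
            fragments.append(html_text[start:index + 1])
            start = index + 1
    # Keep trailing text that is not followed by '>' unless it is only whitespace.
    tail = html_text[start:]
    if tail.strip():
        fragments.append(tail)
    return fragments


fragments = split_into_fragments("<p>hello</p><br>")
```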
The method provided by the embodiment of the specification does not limit the sample, the modeling method and the matching algorithm used for establishing the segment hash model.
For the samples used for modeling, preferably, a large number of known web page backdoor files can be obtained and segmented, and the html fragments of the known web page backdoor files are used as samples for model training. The segment hash model obtained by training in this way describes the characteristics of webpage backdoor files. Correspondingly, during model matching, each html fragment is matched against the model to obtain a segment hash value, and whether the html fragment matches is judged according to the segment hash value; for example, if the segment hash value is greater than a set matching threshold value, the html fragment matches. Generally, the more html fragments match, the higher the probability that the html file is a webpage backdoor file. Correspondingly, the hit rate, which is the ratio of the number of matched html fragments to the total number of html fragments, is compared with a set threshold value; and if the hit rate exceeds the set threshold value, the html file is judged to be a webpage backdoor file. Alternatively, the number of matched html fragments is compared with a set threshold value, and if the number of matched html fragments exceeds the set threshold value, the html file is judged to be a webpage backdoor file.
The term "exceeding a predetermined threshold" means being greater than or equal to the predetermined threshold.
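The hit-rate decision can be sketched in Python as follows. As a simplification, a fragment is counted as matched when its exact hash is present in the model, whereas the text above compares a segment hash value against a matching threshold; this substitution, the use of SHA-256, and all function names are assumptions for illustration.

```python
import hashlib


def fragment_hash(fragment):
    """Hash of a fragment with surrounding whitespace removed."""
    return hashlib.sha256(fragment.strip().encode("utf-8")).hexdigest()


def build_fragment_model(sample_fragments):
    """Set of hashes of fragments taken from known sample files."""
    return {fragment_hash(f) for f in sample_fragments}


def hit_rate(fragments, model):
    """Ratio of matched fragments to the total number of fragments."""
    if not fragments:
        return 0.0
    hits = sum(1 for f in fragments if fragment_hash(f) in model)
    return hits / len(fragments)


def is_backdoor_by_hit_rate(fragments, model, threshold):
    # "Exceeding" is read as greater than or equal to, as stated above.
    return hit_rate(fragments, model) >= threshold


model = build_fragment_model(["<form>", "<input name=cmd>"])
fragments = ["<form>", "<input name=cmd>", "<b>", "x</b>"]
rate = hit_rate(fragments, model)
```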
The threshold may be an empirical value, may be fixed, and may also be dynamically adjusted according to the total number of html fragments of the html file to be detected.
The embodiments in this specification do not limit the specific implementation manner of the threshold adjustment. For example, the adjustment may be performed according to a functional relationship between the total number of html fragments and the threshold value, or may be performed according to a correspondence table between the total number of html fragments and the threshold value.
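Both adjustment options mentioned above can be sketched in Python. Every number below (the bounds, the threshold values, the slope and the floor) is a made-up placeholder, since the specification gives no concrete values; the function names are hypothetical.

```python
import bisect

# Correspondence table (placeholder values): fewer than 20 fragments -> 0.8,
# 20 to 99 -> 0.6, 100 to 499 -> 0.5, 500 or more -> 0.4.
BOUNDS = [20, 100, 500]
THRESHOLDS = [0.8, 0.6, 0.5, 0.4]


def threshold_from_table(total_fragments):
    """Look up the hit-rate threshold for a given total number of fragments."""
    return THRESHOLDS[bisect.bisect_right(BOUNDS, total_fragments)]


# Functional relationship (placeholder): the threshold falls linearly with the
# total number of fragments and never drops below a floor.
BASE, SLOPE, FLOOR = 0.8, 0.001, 0.4


def threshold_from_function(total_fragments):
    return max(FLOOR, BASE - SLOPE * total_fragments)


small_file_threshold = threshold_from_table(10)
large_file_threshold = threshold_from_table(1000)
```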
Besides training the model with html fragments of known webpage backdoor files as samples, the model can also be trained with html fragments of known normal html files as samples. A fragment hash model trained in this way describes the characteristics of normal html files well. Correspondingly, when a fragment hash model obtained in this way is used for model matching, the more html fragments are matched, the higher the probability that the html file is a normal html file; conversely, the fewer html fragments are matched, the higher the probability that the html file is a webpage backdoor file. Correspondingly, the hit rate is compared with a set threshold, and if the hit rate is less than the set threshold, the html file is judged to be a webpage backdoor file.
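When the model is trained on normal html files, the comparison is inverted. A minimal sketch, with a hypothetical function name and an assumed handling of empty files:

```python
def is_backdoor_by_normal_model(matched: int, total: int, threshold: float) -> bool:
    """Flag the file when too few fragments match the model of normal html files."""
    if total == 0:
        # Assumption: a file with no fragments is not flagged.
        return False
    # Strictly less than the threshold, as stated in the text.
    return matched / total < threshold
```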
Regardless of which type of html fragment is used as a sample and which modeling method is used, matching can be implemented with an existing matching algorithm, such as hash matching; the specific implementation is not described here.
To better resist webpage backdoor variants, in any of the above method embodiments, before step 206, specific character strings in the html tags may be replaced with standard character strings according to a predetermined rule; correspondingly, when the html framework model is established, specific character strings in the html tags of the html frameworks used as samples are replaced with standard character strings according to the same predetermined rule.
This process may be referred to as preprocessing, and it may be performed before or after the html tags are extracted.
Specific character strings such as numbers can appear in the attribute values of tags and in the content between tags. Their concrete content differs between html files (for example, the numbers take different values), but they are not features that distinguish normal html files from webpage backdoor files. The specific character strings are therefore classified, and each class is replaced with a corresponding standard character string. This does not affect recognition of the html file; rather, it improves matching accuracy and efficiency.
The predetermined rule describes, for each class of specific character string, how that class is recognized and which standard character string replaces it. For example, in an html file, the string "123456" and the string "654321" are both numbers, and both are replaced with the standard string "digital".
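The replacement rule can be illustrated with a small regular-expression sketch. Only the number class and its standard string "digital" come from the text; the rule-table layout and the function name are assumptions, and further classes would be added as extra (pattern, standard string) pairs.

```python
import re

# Each rule pairs a recognition pattern with its standard string.
# Only the number rule is taken from the text; the structure is illustrative.
RULES = [
    (re.compile(r"\d+"), "digital"),
]


def normalize(text: str) -> str:
    """Replace every specific character string with its class's standard string."""
    for pattern, standard in RULES:
        text = pattern.sub(standard, text)
    return text
```

After normalization, two tags that differ only in a numeric value become identical, so they match the same model entry.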
If no webpage backdoor file is detected based on the html framework model, webpage backdoor detection continues based on the html attribute information model and/or the fragment hash model. These two detection modes also require preprocessing; preferably, the preprocessing operation is performed during detection based on the html framework model, so as to improve subsequent processing efficiency. For example, when webpage backdoor detection is performed based on the html attribute information model, in addition to the standard character string replacement described above, set generic tags may be filtered out, that is, only the attribute information of non-generic html tags is extracted, where a generic tag is a tag that both normal html files and webpage backdoor files have. The generic tags may be maintained in a generic tag list. Correspondingly, when the html attribute information model is established, only the attribute information of non-generic html tags is extracted.
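Filtering generic tags before extracting attribute information might look like the following sketch, built on Python's standard html.parser module. The contents of the generic tag list and the class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical generic tag list: tags assumed to appear in normal and backdoor files alike.
GENERIC_TAGS = frozenset({"html", "head", "body", "title", "meta"})


class AttributeCollector(HTMLParser):
    """Collect (tag, attribute, value) triples for non-generic tags only."""

    def __init__(self, generic_tags=GENERIC_TAGS):
        super().__init__()
        self.generic_tags = generic_tags
        self.attributes = []

    def handle_starttag(self, tag, attrs):
        if tag in self.generic_tags:
            return  # skip tags on the generic tag list
        for name, value in attrs:
            self.attributes.append((tag, name, value))


def extract_attribute_info(html, generic_tags=GENERIC_TAGS):
    collector = AttributeCollector(generic_tags)
    collector.feed(html)
    collector.close()
    return collector.attributes
```

The same filter would be applied when building the attribute information model, so that training and detection see the same kind of input.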
In a second aspect, based on the same inventive concept, an embodiment of the present specification provides a webpage backdoor detection apparatus which, referring to fig. 3, includes:
an html file acquisition module 301, configured to acquire a hypertext markup language (html) file of a target host by monitoring network traffic between the target host and a browser;
an html framework extraction module 302, configured to extract html tags from the html file to obtain the html tag sequence as arranged in the html file, where the html tag sequence forms the html framework of the html file;
a model matching module 303, configured to match the html framework by using a pre-established html framework model, where the html framework model is obtained by training with a plurality of html frameworks as samples;
and a webpage backdoor detection module 304, configured to judge whether the html file is a webpage backdoor file according to the matching result.
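The chain of modules 302 to 304 can be sketched roughly as follows. This is a minimal Python illustration, not the patented implementation: traffic capture (module 301) is omitted, the framework is assumed to be the ordered list of opening and closing tag names, the "model" is simply a list of sample frameworks, and the matching degree is approximated with difflib.SequenceMatcher. All names and the 0.9 threshold are assumptions.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class FrameworkExtractor(HTMLParser):
    """Record tag names in document order, ignoring attributes and text."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_endtag(self, tag):
        self.tags.append("/" + tag)


def extract_framework(html):
    extractor = FrameworkExtractor()
    extractor.feed(html)
    extractor.close()
    return extractor.tags


def matching_degree(framework, sample_frameworks):
    """Best similarity between this framework and any sample framework."""
    return max(
        (SequenceMatcher(None, framework, s, autojunk=False).ratio()
         for s in sample_frameworks),
        default=0.0,
    )


def is_backdoor(html, sample_frameworks, threshold=0.9):
    """Judge by comparing the matching degree with the set threshold (inclusive)."""
    return matching_degree(extract_framework(html), sample_frameworks) >= threshold
```

Because attributes and text content are discarded, a variant that only changes rendering details yields the same framework as the original sample.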
The html framework model is obtained by training on html frameworks, and model matching is then performed on the html framework of the html file to be recognized, so as to judge whether the html file is a webpage backdoor file and thereby recognize webpage backdoors. In addition, webpage backdoor variants are usually produced by changing the rendering effect and the like, and such changes usually do not alter the html framework, so the detection method provided by the embodiments of this specification can effectively resist webpage backdoor variants.
Optionally, the html framework model is obtained by extracting the html frameworks of known webpage backdoor files and training with these html frameworks as samples.
Optionally, the matching result is a value of a matching degree, and the webpage backdoor detection module is configured to: compare the value of the matching degree with a set threshold; and if the value of the matching degree exceeds the set threshold, judge that the html file is a webpage backdoor file.
Optionally, the apparatus further includes a second detection module configured to perform the following if the value of the matching degree does not exceed the set threshold:
extracting attribute information of each tag in the html file, where the attribute information of a tag includes the attributes and attribute values of the tag; matching the attribute information of each tag respectively by using a pre-established attribute information model; and judging whether the html file is a webpage backdoor file according to the matching result;
or,
segmenting the html file to obtain a plurality of html fragments; matching each html fragment by using a pre-established fragment hash model; and judging whether the html file is a webpage backdoor file according to the matching result.
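The two-stage flow, framework matching first and a second detection only when the matching degree does not exceed the threshold, can be sketched as a small cascade. The callables passed in are placeholders for the framework, attribute information, and fragment hash detectors; their signatures are assumptions.

```python
from typing import Callable, Iterable


def cascade_detect(
    html: str,
    framework_degree: Callable[[str], float],
    framework_threshold: float,
    second_detectors: Iterable[Callable[[str], bool]],
) -> bool:
    """Run framework matching first; fall back to the second-stage detectors."""
    if framework_degree(html) >= framework_threshold:
        return True  # first stage already judged the file a webpage backdoor
    # Second stage: attribute information model and/or fragment hash model.
    return any(detector(html) for detector in second_detectors)
```

The second-stage detectors run only for files the first stage did not flag, which keeps the more expensive checks off the common path.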
Based on any of the above apparatus embodiments, optionally, the apparatus further includes a preprocessing module configured to: replace specific character strings in the html tags with standard character strings according to a predetermined rule; and, when the html framework model is established, replace specific character strings in the html tags of the html frameworks used as samples with standard character strings according to the predetermined rule.
In a third aspect, based on the same inventive concept, an embodiment of the present specification provides a web page backdoor detection apparatus, including a memory and a processor, where the memory stores a computer program, and the processor implements the steps of any of the above method embodiments when executing the computer program.
The web page backdoor detection device provided in the embodiments of the present specification may be, but is not limited to, a gateway, a device having a gateway function, or other devices that can monitor network traffic.
In a fourth aspect, based on the same inventive concept, the present specification further provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of any of the above method embodiments.
The description has been presented with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the description. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present specification have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all changes and modifications that fall within the scope of the specification.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present specification without departing from the spirit and scope of the specification. Thus, if such modifications and variations of the present specification fall within the scope of the claims of the present specification and their equivalents, the specification is intended to include such modifications and variations.

Claims (12)

1. A webpage backdoor detection method is characterized by comprising the following steps:
acquiring a hypertext markup language html file by monitoring network traffic between a target host and a browser;
extracting html tags from the html file to obtain the html tag sequence as arranged in the html file, wherein the html tag sequence forms the html framework of the html file;
matching the html framework by using a pre-established html framework model, wherein the html framework model is obtained by training a plurality of html frameworks serving as samples;
and judging whether the html file is a webpage backdoor file or not according to the matching result.
2. The method according to claim 1, wherein the html framework model is obtained by extracting html framework of a known web page backdoor file and training by taking html framework of the known web page backdoor file as a sample.
3. The method according to claim 1, wherein the matching result is a value of a matching degree, and the judging whether the html file is a webpage backdoor file according to the matching result comprises:
comparing the value of the matching degree with a set threshold value;
and if the value of the matching degree exceeds a set threshold value, judging that the html file is a webpage backdoor file.
4. The method of claim 3, wherein if the value of the matching degree does not exceed a set threshold, the method further comprises:
extracting attribute information of the tags in the html file, wherein the attribute information of a tag comprises an attribute and an attribute value of the tag; respectively matching the attribute information of the tags by using a pre-established attribute information model; judging whether the html file is a webpage backdoor file or not according to a matching result;
or,
carrying out segmentation processing on the html file to obtain a plurality of html fragments; matching each html fragment by utilizing a pre-established fragment hash model; and judging whether the html file is a webpage backdoor file or not according to the matching result.
5. The method according to any one of claims 1 to 4, wherein before matching the html framework by using the pre-established html framework model, the method further comprises: replacing specific character strings in the html tags with standard character strings according to a predetermined rule;
and when the html framework model is established, replacing specific character strings in the html tags of the html frameworks used as samples with standard character strings according to the predetermined rule.
6. A web page backdoor detection apparatus, comprising:
an html file acquisition module, configured to acquire an html file by monitoring network traffic between a target host and a browser;
an html framework extraction module, configured to extract html tags from the html file to obtain the html tag sequence as arranged in the html file, wherein the html tag sequence forms the html framework of the html file;
a model matching module, configured to match the html framework by using a pre-established html framework model, wherein the html framework model is obtained by training with a plurality of html frameworks as samples;
and a webpage backdoor detection module, configured to judge whether the html file is a webpage backdoor file according to the matching result.
7. The apparatus of claim 6, wherein the html framework model is obtained by extracting the html framework of a known webpage backdoor file and training with the html framework of the known webpage backdoor file as a sample.
8. The apparatus of claim 7, wherein the matching result is a value of a matching degree, and the webpage backdoor detection module is configured to:
comparing the value of the matching degree with a set threshold value;
and if the value of the matching degree exceeds a set threshold value, judging that the html file is a webpage backdoor file.
9. The apparatus of claim 8, further comprising a second detection module configured to perform the following if the value of the matching degree does not exceed the set threshold:
extracting attribute information of each tag in the html file, wherein the attribute information of a tag comprises the attribute and the attribute value of the tag; respectively matching the attribute information of each tag by using a pre-established attribute information model; judging whether the html file is a webpage backdoor file or not according to a matching result;
or,
carrying out segmentation processing on the html file to obtain a plurality of html fragments; matching each html fragment by utilizing a pre-established fragment hash model; and judging whether the html file is a webpage backdoor file or not according to the matching result.
10. The apparatus of any one of claims 6 to 9, further comprising a pre-processing module for:
replacing specific character strings in the html tags with standard character strings according to a predetermined rule;
and when the html framework model is established, replacing specific character strings in the html tags of the html frameworks used as samples with standard character strings according to the predetermined rule.
11. A web page backdoor detection apparatus comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 5 when executing the computer program.
12. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 5.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810714112.1A CN108985059B (en) 2018-06-29 2018-06-29 Webpage backdoor detection method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810714112.1A CN108985059B (en) 2018-06-29 2018-06-29 Webpage backdoor detection method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN108985059A CN108985059A (en) 2018-12-11
CN108985059B true CN108985059B (en) 2021-09-24

Family

ID=64539883

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810714112.1A Active CN108985059B (en) 2018-06-29 2018-06-29 Webpage backdoor detection method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN108985059B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109684844B (en) * 2018-12-27 2020-11-20 北京神州绿盟信息安全科技股份有限公司 Webshell detection method and device, computing equipment and computer-readable storage medium
CN114422148B (en) * 2022-03-25 2024-04-09 北京长亭未来科技有限公司 Framework depiction and detection method, device and equipment of Webshell

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101471818A (en) * 2007-12-24 2009-07-01 北京启明星辰信息技术股份有限公司 Detection method and system for malevolence injection script web page
CN103902476A (en) * 2013-12-27 2014-07-02 哈尔滨安天科技股份有限公司 Webpage backdoor detection method and system based on non-credit-granting
CN104301314A (en) * 2014-10-31 2015-01-21 电子科技大学 Intrusion detection method and device based on browser tag attributes
CN107451476A (en) * 2017-07-21 2017-12-08 上海携程商务有限公司 Webpage back door detection method, system, equipment and storage medium based on cloud platform

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102780682B (en) * 2011-05-12 2015-02-18 同济大学 Website behavior model modeling method based on HTML (Hyper Text Markup Language)
CN103559235B (en) * 2013-10-24 2016-08-17 中国科学院信息工程研究所 A kind of online social networks malicious web pages detection recognition methods
CN103970845B (en) * 2014-04-28 2017-03-22 南京邮电大学 Webpage filtering method based on program slicing technology
CN106790007A (en) * 2016-12-13 2017-05-31 武汉虹旭信息技术有限责任公司 Web attack defending systems and its method based on XSS and CSRF
CN106778357B (en) * 2016-12-23 2020-02-07 北京神州绿盟信息安全科技股份有限公司 Webpage tampering detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220722

Address after: 300450 No. 9-3-401, No. 39, Gaoxin 6th Road, Binhai Science Park, Binhai New Area, Tianjin

Patentee after: 3600 Technology Group Co.,Ltd.

Address before: 100088 room 112, block D, 28 new street, new street, Xicheng District, Beijing (Desheng Park)

Patentee before: BEIJING QIHOO TECHNOLOGY Co.,Ltd.

TR01 Transfer of patent right

Effective date of registration: 20230711

Address after: 1765, floor 17, floor 15, building 3, No. 10 Jiuxianqiao Road, Chaoyang District, Beijing 100015

Patentee after: Beijing Hongxiang Technical Service Co.,Ltd.

Address before: 300450 No. 9-3-401, No. 39, Gaoxin 6th Road, Binhai Science Park, Binhai New Area, Tianjin

Patentee before: 3600 Technology Group Co.,Ltd.