CN108920950A - A kind of webpage back door detection method, device, equipment and storage medium - Google Patents

Publication number
CN108920950A
Authority
CN
China
Prior art keywords
html
file
fragments
backdoor
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810713529.6A
Other languages
Chinese (zh)
Other versions
CN108920950B (en
Inventor
张鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd filed Critical Beijing Qihoo Technology Co Ltd
Priority to CN201810713529.6A priority Critical patent/CN108920950B/en
Publication of CN108920950A publication Critical patent/CN108920950A/en
Application granted granted Critical
Publication of CN108920950B publication Critical patent/CN108920950B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures
    • G06F21/556Detecting local intrusion or implementing counter-measures involving covert channels, i.e. data leakage between processes
    • G06F21/57Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577Assessing vulnerabilities and evaluating computer system security

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The embodiments of this specification provide a web page backdoor detection method, apparatus, device, and storage medium. The method includes: obtaining a HyperText Markup Language (html) file by monitoring network traffic between a target host and a browser; segmenting the html file to obtain multiple html fragments; matching each html fragment using a pre-established fragment hash model; and judging, according to the matching result, whether the html file is a web page backdoor file. Variants of a web page backdoor are usually produced by changing the rendering effect and similar means, and such changes usually do not affect the number of matched html fragments; the detection method provided by the embodiments of this specification can therefore effectively resist web page backdoor variants.

Description

Webpage backdoor detection method, device, equipment and storage medium
Technical Field
The embodiment of the specification relates to the technical field of network security, in particular to a method, a device, equipment and a storage medium for detecting a webpage backdoor.
Background
Web page backdoors are a common tool used by hackers to attack target hosts. Taking Webshell as an example, it is a command execution environment that exists in the form of web page files such as asp (Active Server Pages), php (Hypertext Preprocessor), jsp (Java Server Pages), and cgi (Common Gateway Interface), and may also be referred to as a web page backdoor.
The traditional web page backdoor detection method mainly describes the features of known web page backdoor files precisely in the form of regular expressions and uses those regular expressions to detect web page backdoors. This detection method relies on manual experience and is inflexible. To avoid being identified, web page backdoors appear in many variants, and the traditional detection method has difficulty coping with them.
Disclosure of Invention
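The embodiments of this specification provide a web page backdoor detection method, apparatus, device, and storage medium. Compared with a detection method based on regular expressions, the implementation is simple and flexible, and web page backdoor variants can be effectively resisted.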
In a first aspect, an embodiment of the present specification provides a method for detecting a backdoor of a web page, where the method includes:
acquiring an html (hypertext markup language) file by monitoring network traffic between a target host and a browser;
carrying out segmentation processing on the html file to obtain a plurality of html fragments;
matching each html fragment by utilizing a pre-established fragment hash model;
and judging whether the html file is a webpage backdoor file or not according to the number of the matched html fragments.
Optionally, the segment hash model is obtained by performing segmentation processing on a known web page backdoor file and training with an html segment of the known web page backdoor file as a sample.
Optionally, the determining, according to the number of the matched html fragments, whether the html file is a web backdoor file includes:
comparing a hit rate with a set threshold value, wherein the hit rate is the ratio of the number of the matched html fragments to the total number of the html fragments;
and if the hit rate exceeds a set threshold value, judging that the html file is a webpage backdoor file.
Optionally, the method further includes, before determining whether the html file is a web backdoor file according to the number of the matched html fragments, that:
and adjusting the threshold value according to the total number of the html fragments, wherein the higher the total number of the html fragments is, the lower the threshold value is.
Optionally, if the hit rate does not exceed the set threshold, the method further includes:
extracting attribute information of each label in the html file, wherein the attribute information of the label comprises the attribute and the attribute value of the label; respectively matching the attribute information of each label by using a pre-established attribute information model; judging whether the html file is a webpage backdoor file or not according to a matching result;
or,
extracting html tags from the html files to obtain html tag sequences arranged in the html files, wherein the html tag sequences form html frameworks of the html files; matching the html framework by using a pre-established html framework model, wherein the html framework model is obtained by training a plurality of html frameworks serving as samples; and judging whether the html file is a webpage backdoor file or not according to the matching result.
Based on any of the above method embodiments, optionally, before the matching is performed on each html segment by using the pre-established segment hash model, the method further includes: replacing the specific character string in the html file with a standard character string according to a preset rule;
and when the segment hash model is established, replacing the specific character string in the html segment serving as the sample with a standard character string according to the preset rule.
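The replacement step can be illustrated with a minimal Python sketch. The specification does not list the preset rules, so the two rules below (IPv4-looking strings and digit runs) and the names NORMALIZATION_RULES and normalize are assumptions made purely for illustration. The point it shows is that the same function is applied both to the training samples and to the file under test.

```python
import re

# Hypothetical preset rules (not given in the specification): each pair maps a
# "specific character string" pattern to a standard replacement string.
NORMALIZATION_RULES = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "IP"),  # IPv4-looking strings
    (re.compile(r"\d+"), "0"),                           # any remaining digit run
]

def normalize(html):
    """Apply every preset rule in order and return the normalized text."""
    for pattern, standard in NORMALIZATION_RULES:
        html = pattern.sub(standard, html)
    return html
```

Because variable parts such as addresses or counters collapse to the same standard string, two fragments that differ only in those parts produce the same normalized text.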
Based on any of the above method embodiments, optionally, the segmenting the html file to obtain a plurality of html fragments includes:
and carrying out segmentation processing on the html file, and deleting the set general html fragments to obtain a plurality of html fragments.
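A minimal sketch of deleting general html fragments, in Python. The contents of GENERIC_FRAGMENTS and the name drop_generic are hypothetical; the specification only says that a set of general fragments is configured. The sketch also assumes that fragments arrive as strings without their closing greater-than sign.

```python
# Hypothetical set of "general" fragments that appear in almost every page and
# therefore carry no evidence either way. The real set is a configuration choice.
GENERIC_FRAGMENTS = {"<html", "<head", "</head", "<body", "</body", "</html", "<br"}

def drop_generic(fragments, generic=GENERIC_FRAGMENTS):
    """Return the fragments that are not in the configured generic set."""
    return [f for f in fragments if f.strip().lower() not in generic]
```

Removing such fragments keeps them from inflating or diluting the share of matched fragments.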
Based on any of the above method embodiments, optionally, the segmenting the html file to obtain a plurality of html fragments includes:
and carrying out segmentation processing on the html file by taking the greater-than sign (>) as the segmentation identifier to obtain a plurality of html fragments.
In a second aspect, an embodiment of the present specification provides a web page backdoor detection apparatus, including:
the html file acquisition module is used for acquiring html files by monitoring network traffic between the target host and the browser;
the html segmentation module is used for carrying out segmentation processing on the html file to obtain a plurality of html fragments;
the model matching module is used for respectively matching each html fragment by utilizing a pre-established fragment hash model;
and the webpage backdoor detection module is used for judging whether the html file is the webpage backdoor file or not according to the number of the matched html fragments.
Optionally, the segment hash model is obtained by performing segmentation processing on a known web page backdoor file and training with an html segment of the known web page backdoor file as a sample.
Optionally, the web page backdoor detection module is configured to:
comparing a hit rate with a set threshold value, wherein the hit rate is the ratio of the number of the matched html fragments to the total number of the html fragments;
and if the hit rate exceeds a set threshold value, judging that the html file is a webpage backdoor file.
Optionally, the apparatus further includes a threshold adjusting module, configured to adjust the threshold according to the total number of html fragments, where the higher the total number of html fragments is, the lower the threshold is.
Optionally, the apparatus further includes a second detection module, configured to perform the following if the hit rate does not exceed the set threshold:
extracting attribute information of each label in the html file, wherein the attribute information of the label comprises the attribute and the attribute value of the label; respectively matching the attribute information of each label by using a pre-established attribute information model; judging whether the html file is a webpage backdoor file or not according to a matching result;
or,
extracting html tags from the html files to obtain html tag sequences arranged in the html files, wherein the html tag sequences form html frameworks of the html files; matching the html framework by using a pre-established html framework model, wherein the html framework model is obtained by training a plurality of html frameworks serving as samples; and judging whether the html file is a webpage backdoor file or not according to the matching result.
Based on any of the above device embodiments, optionally, the apparatus is further configured to: replace the specific character string in the html file with a standard character string according to a preset rule;
and when the segment hash model is established, replacing the specific character string in the html segment serving as the sample with a standard character string according to the preset rule.
Based on any of the above device embodiments, optionally, the html segmentation module is configured to:
and carrying out segmentation processing on the html file, and deleting the set general html fragments to obtain a plurality of html fragments.
Based on any of the above device embodiments, optionally, the html segmentation module is configured to:
and carrying out segmentation processing on the html file by taking the greater-than sign (>) as the segmentation identifier to obtain a plurality of html fragments.
In a third aspect, an embodiment of the present specification provides a web page backdoor detection apparatus, including a memory and a processor, where the memory stores a computer program, and the processor implements the following steps when executing the computer program:
acquiring a hypertext markup language html file by monitoring network traffic between a target host and a browser;
carrying out segmentation processing on the html file to obtain a plurality of html fragments;
matching each html fragment by utilizing a pre-established fragment hash model;
and judging whether the html file is a webpage backdoor file or not according to the number of the matched html fragments.
In a fourth aspect, the present specification provides a computer readable storage medium, on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of:
acquiring a hypertext markup language html file by monitoring network traffic between a target host and a browser;
carrying out segmentation processing on the html file to obtain a plurality of html fragments;
matching each html fragment by utilizing a pre-established fragment hash model;
and judging whether the html file is a webpage backdoor file or not according to the number of the matched html fragments.
The embodiment of the specification has the following beneficial effects:
In the process of implementing the invention, the inventor found through creative work that the html fragments of a web page backdoor file differ from normal html fragments and follow recognizable patterns. Therefore, a fragment hash model can be obtained by training on html fragments, and the html fragments of the html file to be identified can then be matched against the model to judge whether the html file is a web page backdoor file, thereby identifying web page backdoors. In addition, variants of a web page backdoor are usually produced by changing the rendering effect and similar means, and such changes usually do not affect the number of matched html fragments, so the detection method provided by the embodiments of this specification can effectively resist web page backdoor variants.
Drawings
FIG. 1 is a schematic view of an application scenario provided in an embodiment of the present specification;
FIG. 2 is a flow chart of the method of the first aspect of an embodiment of the present specification;
FIG. 3 is a schematic diagram of the apparatus of the second aspect of an embodiment of the present specification.
Detailed Description
In order to better understand the technical solutions, the technical solutions of the embodiments of the present specification are described in detail below with reference to the drawings and specific embodiments, and it should be understood that the specific features of the embodiments and embodiments of the present specification are detailed descriptions of the technical solutions of the embodiments of the present specification, and are not limitations of the technical solutions of the present specification, and the technical features of the embodiments and embodiments of the present specification may be combined with each other without conflict.
The technical solution provided by the embodiments of this specification is applied at the traffic layer of a network and can be implemented at a gateway, on a device having a gateway function, or on any device capable of monitoring network traffic. Taking implementation at a gateway as an example, as shown in FIG. 1, a gateway 101 monitors network traffic between a target host 102 and a browser 103 and obtains an html file from the network traffic, that is, html code starting with the start tag <html> and ending with the end tag </html>; segments the html file to obtain a plurality of html fragments; matches each html fragment using a pre-established fragment hash model; and judges whether the html file is a web page backdoor file according to the number of matched html fragments.
The target host may be a server providing various services, a personal computer capable of implementing specific functions, or other network devices capable of providing network services. The target host can receive request data sent by the browser and used for initiating a request service to the target host, perform corresponding data processing according to the request data to obtain response data, and feed back the response data to the browser. The browser may run on a variety of electronic devices that are display enabled and support interactive functionality, including but not limited to smart phones, tablets, personal computers, desktop computers, and the like.
The method provided by the embodiment of the specification can be used for detecting the webshell but is not limited to detecting the webshell.
The following describes in detail the technical solutions provided in the embodiments of the present specification in terms of a method, an apparatus, a device, and a storage medium, respectively, in conjunction with a specific application scenario.
In a first aspect, an embodiment of the present disclosure provides a method for detecting a backdoor of a web page, please refer to fig. 2, including:
Step 202, obtaining the html file by monitoring the network traffic between the target host and the browser.
In particular, the monitoring of the network traffic may be performed by, but not limited to, port mirroring.
The html file can be obtained by, but not limited to, recognizing the start tag <html> and the end tag </html>.
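As a rough illustration of recognizing the start and end tags, the following Python sketch pulls html documents out of an already reassembled response payload. Traffic capture and TCP reassembly are outside its scope. The name extract_html_documents is hypothetical, and allowing attributes on the start tag (for example a lang attribute) is an assumption that goes slightly beyond the literal tag named in the text.

```python
import re

# Non-greedy match from a start tag "<html ...>" to the first "</html>".
# Case-insensitive, and "." also matches newlines.
HTML_DOC = re.compile(rb"<html\b.*?</html\s*>", re.IGNORECASE | re.DOTALL)

def extract_html_documents(payload):
    """Return every html document found in a bytes payload."""
    return [m.group(0) for m in HTML_DOC.finditer(payload)]
```

Working on bytes avoids guessing the character encoding before the document has been isolated.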
Step 204, carrying out segmentation processing on the html file to obtain a plurality of html fragments.
Segmentation can be based on various criteria. To improve detection accuracy, it is preferable to use the greater-than sign (>) as the segmentation identifier and to split each time a greater-than sign is detected. An html fragment is then the content between two consecutive greater-than signs.
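A minimal Python sketch of this segmentation follows. The name segment is hypothetical, and dropping empty or whitespace-only pieces is an assumption; the text does not say how they are handled.

```python
def segment(html):
    """Split on '>' and return the non-empty pieces between consecutive '>' signs."""
    return [part.strip() for part in html.split(">") if part.strip()]
```

With this definition, text content stays attached to the tag that follows it, so "hi</body" is a single fragment.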
Step 206, matching each html fragment respectively by utilizing a pre-established fragment hash model.
The segment hash model is obtained by training a plurality of known html segments as samples.
Step 208, judging whether the html file is a web page backdoor file according to the matching result.
A fragment hash model is obtained by training on html fragments, and the html fragments of the html file to be identified are then matched against the model to judge whether the html file is a web page backdoor file, thereby identifying web page backdoors. In addition, variants of a web page backdoor are usually produced by changing the rendering effect and similar means, and such changes usually do not affect the number of matched html fragments, so the detection method provided by the embodiments of this specification can effectively resist web page backdoor variants.
The method provided by the embodiment of the specification does not limit the sample, the modeling method and the matching algorithm used for establishing the segment hash model.
For the samples used for modeling, preferably, a large number of known web page backdoor files are obtained and segmented, and the html fragments of these known web page backdoor files are used as samples for model training. A fragment hash model trained in this way describes the characteristics of web page backdoor files well. Correspondingly, during model matching, each html fragment is matched against the model to obtain a fragment hash value, and whether the html fragment is matched is judged according to that value; for example, the html fragment is matched if its fragment hash value is greater than a set matching threshold. Generally, the more html fragments are matched, the higher the probability that the html file is a web page backdoor file. Accordingly, step 208 may be implemented as follows: comparing a hit rate with a set threshold, where the hit rate is the ratio of the number of matched html fragments to the total number of html fragments; and if the hit rate exceeds the set threshold, judging that the html file is a web page backdoor file. Step 208 may also be implemented as follows: comparing the number of matched html fragments with a set threshold, and if the number exceeds the set threshold, judging the html file to be a web page backdoor file.
Here, "exceeding a set threshold" means being greater than or equal to the set threshold.
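The hit-rate decision can be sketched in Python as follows. The specification leaves the modeling method and matching algorithm open, so this sketch substitutes the simplest possible stand-in: the "model" is a set of SHA-256 digests of known backdoor fragments, and a fragment counts as matched when its digest is in the set. The names fragment_hash, build_model, hit_rate, and is_backdoor, and the default threshold of 0.5, are assumptions for illustration.

```python
import hashlib

def fragment_hash(fragment):
    """Hex digest of one fragment (exact-match stand-in for the model's hash)."""
    return hashlib.sha256(fragment.encode("utf-8")).hexdigest()

def build_model(backdoor_fragments):
    """'Train' the stand-in model: the set of hashes of known backdoor fragments."""
    return {fragment_hash(f) for f in backdoor_fragments}

def hit_rate(fragments, model):
    """Ratio of matched fragments to the total number of fragments."""
    if not fragments:
        return 0.0
    hits = sum(1 for f in fragments if fragment_hash(f) in model)
    return hits / len(fragments)

def is_backdoor(fragments, model, threshold=0.5):
    """'Exceeds' is read as greater than or equal to, as defined in the text."""
    return hit_rate(fragments, model) >= threshold
```

An exact-match set cannot score partial similarity. A locality-sensitive or fuzzy hash would be needed to reproduce the per-fragment matching threshold that the text mentions.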
The threshold may be an empirical value, may be fixed, and may also be dynamically adjusted according to the total number of html fragments of the html file to be detected.
The embodiments in this specification do not limit the specific implementation manner of the threshold adjustment. For example, the adjustment may be performed according to a functional relationship between the total number of html fragments and the threshold value, or may be performed according to a correspondence table between the total number of html fragments and the threshold value.
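The correspondence-table variant might look like the following Python sketch. The bounds and threshold values are invented for illustration, and threshold_for is a hypothetical name. The only property taken from the specification is that the threshold does not rise as the total number of html fragments grows.

```python
import bisect

# Hypothetical correspondence table: files with at most 10 fragments use 0.8,
# 11 to 50 use 0.6, 51 to 200 use 0.4, and anything larger uses 0.3.
BOUNDS = [10, 50, 200]
THRESHOLDS = [0.8, 0.6, 0.4, 0.3]

def threshold_for(total_fragments):
    """Look up the threshold for a file with the given number of fragments."""
    return THRESHOLDS[bisect.bisect_left(BOUNDS, total_fragments)]
```

A functional relationship, for example a threshold that decays with the logarithm of the fragment count, would be a drop-in replacement for the lookup.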
Besides training the model with html fragments of known web page backdoor files as samples, the model can also be trained with html fragments of known normal html files as samples. A fragment hash model trained in this way describes the characteristics of normal html files well. Correspondingly, when a fragment hash model obtained in this way is used for matching, the more html fragments are matched, the higher the probability that the html file is a normal html file; conversely, the fewer html fragments are matched, the higher the probability that the html file is a web page backdoor file. Accordingly, step 208 may be implemented as follows: comparing the hit rate with a set threshold; and if the hit rate is less than the set threshold, judging the html file to be a web page backdoor file.
No matter which type of html segment is used as a sample and which modeling method is used, the matching can be realized by using the existing matching algorithm, such as hash matching, and the specific implementation manner is not described herein.
If the hit rate does not exceed the set threshold, it means that it is not detected that the html file is a web page backdoor file in the above manner, and this detection is finished.
It should be noted that any existing web page backdoor detection method cannot perfectly detect all web page backdoor files, and there is a possibility of false negative. In order to reduce the risk of missing reports, when the web page backdoor file is not detected by using the method provided by the above description embodiment, other detection methods may be used to continue web page backdoor detection on the html file.
The webpage backdoor detection can be sequentially carried out by adopting any one or more existing detection modes.
Preferably, the following two detection methods provided in the embodiments of the present specification may be further adopted to sequentially perform the web page backdoor detection, or any one of the detection methods may be adopted.
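The sequential use of several detection methods can be sketched as a simple fallback chain in Python. The name detect and the detector labels used in the example are hypothetical; each detector is any callable that takes the html text and returns True when it judges the file to be a web page backdoor file.

```python
def detect(html, detectors):
    """Run (name, detector) pairs in order; return the first name that flags the file."""
    for name, is_backdoor in detectors:
        if is_backdoor(html):
            return name
    return None  # no detector flagged the file
```

Placing the fragment hash check first and the framework and attribute checks after it mirrors the order described here: the later methods run only when the earlier ones did not flag the file.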
The webpage backdoor detection method based on the html framework comprises the following steps: extracting html tags from the html files to obtain html tag sequences arranged in the html files, wherein the html tag sequences form html frameworks of the html files; matching the html framework by using a pre-established html framework model, wherein the html framework model is obtained by training a plurality of html frameworks serving as samples; and judging whether the html file is a webpage backdoor file or not according to the matching result.
The webpage backdoor detection method based on the html attribute information comprises the following steps: extracting attribute information of a label in the html file, wherein the attribute information of the label comprises an attribute and an attribute value of the label; matching the attribute information of the label by using a pre-established attribute information model; and judging whether the html file is a webpage backdoor file or not according to the matching result.
In the web page backdoor detection method based on html attribute information, the attributes of html tags appear as name/value pairs in the format name="value". The attribute information can therefore be extracted by identifying the equal sign. For example, word segmentation is performed first; an equal sign is then identified, and the character string on its left is extracted as the attribute name; the quotation marks on its right are then identified, and the content between them is extracted as the attribute value.
It should be noted that an html tag may have multiple attributes, that is, the attribute information of an html tag may include multiple pairs of attribute names and attribute values.
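A minimal Python sketch of this extraction, using regular expressions in place of the word segmentation step. The names TAG, ATTR, and extract_attributes are hypothetical. Only quoted attribute values are handled, and a value that itself contains an unencoded ">" would break the tag pattern, so this is a simplification rather than a full html parser.

```python
import re

TAG = re.compile(r"<([a-zA-Z][\w-]*)\b([^<>]*)>")                 # start tag: name + attribute text
ATTR = re.compile(r"""([\w:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")  # name="value" or name='value'

def extract_attributes(html):
    """Return [(tag_name, [(attr_name, attr_value), ...]), ...] for tags that have attributes."""
    result = []
    for tag in TAG.finditer(html):
        pairs = []
        for m in ATTR.finditer(tag.group(2)):
            value = m.group(2) if m.group(2) is not None else m.group(3)
            pairs.append((m.group(1), value))
        if pairs:
            result.append((tag.group(1).lower(), pairs))
    return result
```

Each tag yields one list of pairs, which matches the note that the attribute information of one tag may contain several attribute names and values.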
The method provided by the embodiment of the specification does not limit the sample, the modeling method and the matching algorithm used for establishing the html attribute information model.
For the samples used for modeling, preferably, a large number of known web page backdoor files can be obtained, the attribute information of html tags of the web page backdoor files is extracted, and model training is performed by taking the attribute information of the html tags of the known web page backdoor files as samples. The html attribute information model trained in the mode well describes the attribute characteristics of the html tags of the webpage backdoor files.
More specifically, one sample may be attribute information of one label, and a plurality of attributes of the label are collectively regarded as one sample. In addition, one sample can also be a combination of attribute information of the tags extracted from the webpage backdoor file, all combination forms are traversed, and the attribute information of each combination form forms a sample.
When the html attribute information model is constructed with the attribute information of a single tag as a sample, model matching is correspondingly performed per html tag: the extracted attribute information of each html tag is matched using the pre-established html attribute information model, and each html tag yields one matching result. The matching result may be a value representing the matching degree; generally, the larger the value, the higher the matching degree, which means a higher probability that the html file is a web page backdoor file. Correspondingly, the value of the highest matching degree is compared with a set threshold, and if it exceeds the set threshold, the html file is judged to be a web page backdoor file.
For example, extracting the attribute information of N tags from the html file to be detected, matching the attribute information of the N tags one by one to obtain N values of matching degrees, taking the maximum value, comparing the maximum value with a set threshold, and if the maximum value is greater than the set threshold, the html file is a web backdoor file.
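The N-tag example can be sketched in Python as follows. The specification does not fix the matching algorithm, so Jaccard similarity between sets of (name, value) pairs is used here purely as a stand-in matching degree. The names jaccard, best_match, and is_backdoor_by_attributes are hypothetical.

```python
def jaccard(a, b):
    """Stand-in matching degree: overlap of two collections of (name, value) pairs."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def best_match(tag_attrs, samples):
    """Matching degree of one tag: its best score against any sample."""
    return max((jaccard(tag_attrs, s) for s in samples), default=0.0)

def is_backdoor_by_attributes(tags, samples, threshold):
    """Take the maximum over the N tags and compare it with the threshold."""
    best = max((best_match(t, samples) for t in tags), default=0.0)
    # Strictly greater, as in the example above; use >= if "exceeds" is read as inclusive.
    return best > threshold
```

Taking the maximum means that a single strongly matching tag is enough to flag the file, however many ordinary tags surround it.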
The threshold may be an empirical value, or may be calculated by simulation or the like.
It should be noted that the size of the value and the meaning of the value are determined by the model matching algorithm. When some model matching algorithms are adopted, the lower the value is, the higher the matching degree is.
Correspondingly, if the higher the value of the matching degree, the higher the matching degree is, then exceeding the threshold means being greater than or equal to the threshold; if the higher the value of the matching degree, the lower the matching degree, then exceeding the threshold means being less than or equal to the threshold.
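The two readings of "exceeding the threshold" can be captured in one small Python helper. The name exceeds and the flag higher_means_closer are hypothetical; the flag records which way the matching algorithm's values run.

```python
def exceeds(value, threshold, higher_means_closer=True):
    """True when the match is at least as strong as the threshold demands."""
    if higher_means_closer:
        return value >= threshold  # similarity-like scores
    return value <= threshold      # distance-like scores, where lower means a closer match
```

Keeping the direction in one place avoids scattering inverted comparisons through the detection code when a similarity score is swapped for a distance.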
In addition to training the model with the attribute information of html tags of known web page backdoor files as samples, the model can also be trained with the attribute information of html tags of known normal html files as samples. An html attribute information model trained in this way describes the html tag attribute characteristics of normal html files well. Correspondingly, when an html attribute information model obtained in this way is used for matching, the matching result may be a value representing the matching degree; generally, the larger the value, the higher the matching degree, which means a higher probability that the html file is a normal html file, and conversely, the smaller the value, the lower the matching degree, which means a higher probability that the html file is a web page backdoor file. Correspondingly, the value of the highest matching degree is compared with a set threshold, and if it exceeds the set threshold, the html file is judged to be a web page backdoor file. In this implementation, if a higher value means a higher matching degree, then exceeding the threshold means being smaller than the threshold; if a higher value means a lower matching degree, then exceeding the threshold means being greater than or equal to the threshold.
No matter which type of html tag attribute information is used as a sample and which modeling method is used, the matching of the model can be realized by using the existing matching algorithm, such as hash matching, and the specific implementation manner is not described herein.
In the web page backdoor detection method based on the html framework, the html framework can be extracted either by identifying the tags in the html file and deleting the content between them, or by identifying the tags in the html file and arranging the identified tags into an html tag sequence in the order in which they appear in the html file.
Tags are identified by recognizing angle brackets. Note that the angle brackets identified here are html tag delimiters, which appear literally in the html file as "<" and ">", not angle brackets inside tag attribute values or content. If angle brackets need to appear in tag attribute values or content, they are typically encoded as character entities in the html file; for example, "<" is encoded as "&lt;".
The method provided by the embodiment of the specification does not limit the sample, the modeling method and the matching algorithm used for establishing the html framework model.
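A minimal Python sketch of the second extraction route, building the tag sequence in document order. The names TAG_NAME and html_skeleton are hypothetical, and keeping end tags in the sequence with a leading "/" is an assumption about how the sequence is represented.

```python
import re

# Matches a start or end tag and captures the optional "/" and the tag name.
TAG_NAME = re.compile(r"<\s*(/?)\s*([a-zA-Z][\w-]*)[^<>]*>")

def html_skeleton(html):
    """Return the tag names in document order, with end tags prefixed by '/'."""
    return [("/" + name if slash else name).lower()
            for slash, name in TAG_NAME.findall(html)]
```

Because literal angle brackets in content are encoded as entities such as "&lt;", they never start a match, so text like "a &lt; b" leaves the sequence unchanged.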
For the samples used for modeling, preferably, a large number of known web page backdoor files are obtained, their html frameworks are extracted, and the model is trained with these html frameworks as samples. An html framework model trained in this way describes the html framework features (namely, the ordering rules of the html tags) of web page backdoor files. Correspondingly, when this html framework model is used for matching, the matching result can be a value representing the matching degree. Generally, the larger the value, the higher the matching degree and the higher the probability that the html file is a web page backdoor file. The value of the matching degree is then compared with a set threshold, and if it exceeds the set threshold, the html file is judged to be a web page backdoor file.
The threshold may be an empirical value, or may be calculated by simulation or the like.
It should be noted that the size of the value and the meaning of the value are determined by the model matching algorithm. When some model matching algorithms are adopted, the lower the value is, the higher the matching degree is.
Correspondingly, if a larger value indicates a higher matching degree, exceeding the threshold means being greater than or equal to the threshold; if a larger value indicates a lower matching degree, exceeding the threshold means being less than or equal to the threshold.
Besides training the model with the html skeletons of known web page backdoor files as samples, the model can also be trained with the html skeletons of known normal html files as samples. An html framework model trained in this way describes the html framework features (namely, the ordering rules of the html tags) of normal html files. Correspondingly, when this html framework model is used for matching, the matching result can be a value representing the matching degree. Generally, the larger the value, the higher the matching degree and the higher the probability that the html file is a normal html file; conversely, the smaller the value, the lower the matching degree and the higher the probability that the html file is a web page backdoor file. The value of the matching degree is then compared with a set threshold, and if it exceeds the set threshold, the html file is judged to be a web page backdoor file. In this implementation, the meaning of "exceeding the threshold" depends on how the value is scaled: if a larger value indicates a higher matching degree, exceeding the threshold means being smaller than the threshold; if a larger value indicates a lower matching degree, exceeding the threshold means being greater than or equal to the threshold.
Regardless of which type of html skeleton is used as a sample and which modeling method is used, model matching can be implemented with an existing matching algorithm, such as hash matching; the specific implementation is not described here.
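The four meanings of "exceeding the threshold" described above (model trained on backdoor samples or on normal samples, and larger value meaning higher or lower matching degree) can be collected in one small decision function. This is a sketch only; the function name `is_backdoor` and its parameter names are hypothetical and do not come from the specification.

```python
def is_backdoor(score, threshold, trained_on_backdoors=True, higher_is_closer=True):
    """Apply the threshold rule described in the text.

    trained_on_backdoors: True if the model was trained on backdoor samples,
        False if it was trained on normal html samples.
    higher_is_closer: True if a larger score means a higher matching degree.
    """
    if trained_on_backdoors:
        # A close match to the backdoor model indicates a backdoor.
        return score >= threshold if higher_is_closer else score <= threshold
    # A poor match to the normal-file model indicates a backdoor.
    return score < threshold if higher_is_closer else score >= threshold
```

Keeping the two flags explicit makes it harder to apply a threshold in the wrong direction when the sample type or the matching algorithm changes.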
To better resist web page backdoor variants, in any of the above method embodiments, before step 206, specific character strings in the html tags may be replaced with standard character strings according to a predetermined rule; accordingly, when the segment hash model is established, the specific character strings in the html segments used as samples are replaced with standard character strings according to the predetermined rule.
The above process may be referred to as a preprocessing process, which may be performed before the segmentation process or may be performed after the segmentation process.
Specific character strings such as numbers can appear in tag attribute values and in the content between tags. Their concrete content differs between html files (for example, the numbers take different values), but they are not features that distinguish normal html files from web page backdoor files. These specific character strings are therefore classified, and each class is replaced by its corresponding standard character string. This does not affect the recognition of html files and can improve matching accuracy and efficiency.
The predetermined rule describes the recognition rule for each class of specific character string and the standard character string corresponding to each class. For example, in an html file, the strings "123456" and "654321" are both numbers and are both replaced with the standard string "digital".
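The replacement step can be sketched as a list of (recognition rule, standard string) pairs. The regular expression below is an assumption made for illustration: it treats only standalone runs of digits as numbers, so the digit in a tag name such as h1 is left alone. The names `RULES` and `normalize` are hypothetical.

```python
import re

# Each rule pairs a recognition pattern with its standard replacement string.
# Only the "number" class from the example in the text is shown here.
RULES = [
    (re.compile(r"\b\d+\b"), "digital"),
]


def normalize(text, rules=RULES):
    """Replace every specific character string in text with its standard string."""
    for pattern, standard in rules:
        text = pattern.sub(standard, text)
    return text
```

With this rule, `<input value="123456">` and `<input value="654321">` normalize to the same string, so they hash identically in a later matching step.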
When web page backdoor detection is performed based on the segment hash model, in addition to the standard character string replacement described above, preset generic html segments can be filtered out. A generic html segment is an html segment that appears in both normal html files and web page backdoor files. The generic html segments may be maintained in a generic html segment list. Accordingly, only non-generic html segments may be used when the segment hash model is established.
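Filtering against a generic segment list can be sketched as a simple set lookup. The contents of `GENERIC_FRAGMENTS` below are invented examples, not a list taken from the specification, and the case-insensitive comparison is likewise an assumption.

```python
# Hypothetical generic segment list: segments assumed to occur in both
# normal html files and web page backdoor files.
GENERIC_FRAGMENTS = {"<html>", "<head>", "</head>", "<body>", "</body>", "</html>"}


def drop_generic(fragments, generic=GENERIC_FRAGMENTS):
    """Return only the fragments that are not in the generic segment list."""
    return [f for f in fragments if f.strip().lower() not in generic]
```

The same filter would be applied both to the sample segments used to build the model and to the segments of the file under inspection, so that the two sides stay comparable.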
If no web page backdoor file is detected based on the segment hash model, detection continues based on the html framework model and/or the html attribute information model. Because these two detection modes also require preprocessing, the preprocessing operation is preferably carried out during detection based on the segment hash model, so as to improve the efficiency of subsequent processing.
In a second aspect, based on the same inventive concept, an embodiment of the present specification provides a web page backdoor detection apparatus which, referring to Fig. 3, includes:
an html file acquisition module 301, configured to acquire an html file by monitoring network traffic between the target host and the browser;
an html segmentation module 302, configured to perform segmentation processing on the html file to obtain a plurality of html fragments;
a model matching module 303, configured to match each html fragment by using a pre-established segment hash model;
and a web page backdoor detection module 304, configured to determine whether the html file is a web page backdoor file according to the number of matched html fragments.
Optionally, the segment hash model is obtained by performing segmentation processing on a known web page backdoor file and training with an html segment of the known web page backdoor file as a sample.
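One simple reading of a segment hash model is a set of hashes of the segments taken from known web page backdoor files; matching a segment then means checking whether its hash is in the set. The sketch below follows that reading, which is an assumption: the specification does not fix the hash function or the model structure. SHA-256 from Python's `hashlib` is used only for illustration, and all function names are hypothetical.

```python
import hashlib


def fragment_hash(fragment):
    """Hash one html fragment (SHA-256 chosen here only for illustration)."""
    return hashlib.sha256(fragment.encode("utf-8")).hexdigest()


def build_fragment_hash_model(sample_fragment_lists):
    """Build the model from the fragment lists of known backdoor files."""
    model = set()
    for fragments in sample_fragment_lists:
        model.update(fragment_hash(f) for f in fragments)
    return model


def count_matches(fragments, model):
    """Count how many fragments of an inspected file hit the model."""
    return sum(1 for f in fragments if fragment_hash(f) in model)
```

The count returned by `count_matches` is the "number of matched html fragments" on which the detection module bases its judgment.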
Optionally, the web page backdoor detection module is configured to:
comparing a hit rate with a set threshold value, wherein the hit rate is the ratio of the number of the matched html fragments to the total number of the html fragments;
and if the hit rate exceeds a set threshold value, judging that the html file is a webpage backdoor file.
Optionally, the apparatus further includes a threshold adjusting module, configured to adjust the threshold according to the total number of html fragments, where the larger the total number of html fragments, the lower the threshold.
Optionally, the apparatus further includes a second detecting module, configured to perform the following if the hit rate does not exceed the set threshold:
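The hit-rate test and the threshold adjustment can be sketched together. The specification only requires that the threshold decreases as the total number of html fragments grows; the particular formula and the constants `base`, `floor`, and `scale` below are invented for illustration and would need to be chosen empirically.

```python
def adjusted_threshold(total, base=0.8, floor=0.3, scale=100):
    """Return a threshold that never increases as the fragment total grows.

    base, floor and scale are hypothetical tuning constants.
    """
    return max(floor, base / (1 + total / scale))


def is_backdoor_by_hit_rate(matched, total, base=0.8, floor=0.3, scale=100):
    """Compare hit rate (matched / total) with the adjusted threshold."""
    if total == 0:
        return False  # no fragments, nothing to judge
    hit_rate = matched / total
    return hit_rate >= adjusted_threshold(total, base, floor, scale)
```

With these example constants, a file with 100 fragments is flagged once 40 of them match, while a file with 10 fragments needs 8 matches.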
extracting attribute information of each label in the html file, wherein the attribute information of the label comprises the attribute and the attribute value of the label; respectively matching the attribute information of each label by using a pre-established attribute information model; judging whether the html file is a webpage backdoor file or not according to a matching result;
or,
extracting html tags from the html files to obtain html tag sequences arranged in the html files, wherein the html tag sequences form html frameworks of the html files; matching the html framework by using a pre-established html framework model, wherein the html framework model is obtained by training a plurality of html frameworks serving as samples; and judging whether the html file is a webpage backdoor file or not according to the matching result.
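For the first alternative above, extracting the attribute information of each tag (attribute and attribute value) can be sketched with Python's standard `html.parser` module. The use of this parser and the names `AttributeCollector` and `extract_attribute_info` are assumptions for illustration; the matching against an attribute information model is not shown.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Collects (tag, attribute, value) triples in document order."""

    def __init__(self):
        super().__init__()
        self.attributes = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        for name, value in attrs:
            self.attributes.append((tag, name, value))


def extract_attribute_info(html_text):
    """Return the attribute information of every tag in html_text."""
    collector = AttributeCollector()
    collector.feed(html_text)
    collector.close()
    return collector.attributes
```

Each triple could then be normalized and matched individually, in line with "respectively matching the attribute information of each label".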
Based on any of the above device embodiments, optionally, the apparatus is further configured to replace specific character strings in the html file with standard character strings according to a preset rule;
and when the segment hash model is established, replacing the specific character string in the html segment serving as the sample with a standard character string according to the preset rule.
Based on any of the above device embodiments, optionally, the html segmentation module is configured to:
and carrying out segmentation processing on the html file, and deleting the set general html fragments to obtain a plurality of html fragments.
Based on any of the above device embodiments, optionally, the html segmentation module is configured to:
and carrying out segmentation processing on the html file by taking the greater than number as a segmentation identifier to obtain a plurality of html fragments.
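Segmentation with the greater-than sign as the delimiter can be sketched as follows. Whether each fragment keeps its closing ">" and whether surrounding whitespace is trimmed are not stated in the specification; both are assumptions here, and the function name `split_on_gt` is hypothetical.

```python
def split_on_gt(html_text):
    """Split html_text into fragments, ending each fragment at a '>' character."""
    fragments = []
    start = 0
    for i, ch in enumerate(html_text):
        if ch == ">":
            piece = html_text[start:i + 1].strip()
            if piece:
                fragments.append(piece)
            start = i + 1
    # Keep any trailing text that is not terminated by '>'.
    tail = html_text[start:].strip()
    if tail:
        fragments.append(tail)
    return fragments
```

Because text content is attached to the tag that follows it, a fragment such as `hi</p>` carries both the content and the closing tag.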
In a third aspect, based on the same inventive concept, an embodiment of the present specification provides a web page backdoor detection apparatus, including a memory and a processor, where the memory stores a computer program, and the processor implements the steps of any of the above method embodiments when executing the computer program.
The web page backdoor detection device provided in the embodiments of the present specification may be, but is not limited to, a gateway, a device having a gateway function, or other devices that can monitor network traffic.
In a fourth aspect, based on the same inventive concept, the present specification further provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any of the above-mentioned embodiments.
The description has been presented with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the description. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present specification have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all changes and modifications that fall within the scope of the specification.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present specification without departing from the spirit and scope of the specification. Thus, if such modifications and variations of the present specification fall within the scope of the claims of the present specification and their equivalents, the specification is intended to include such modifications and variations.

Claims (10)

1. A webpage backdoor detection method is characterized by comprising the following steps:
acquiring a hypertext markup language html file by monitoring network traffic between a target host and a browser;
carrying out segmentation processing on the html file to obtain a plurality of html fragments;
matching each html fragment by utilizing a pre-established fragment hash model;
and judging whether the html file is a webpage backdoor file or not according to the number of the matched html fragments.
2. The method according to claim 1, wherein the segment hash model is obtained by performing segmentation processing on a known web page backdoor file and training by taking html (hypertext markup language) segments of the known web page backdoor file as samples.
3. The method according to claim 2, wherein the judging whether the html file is a web backdoor file according to the number of the matched html fragments comprises:
comparing a hit rate with a set threshold value, wherein the hit rate is the ratio of the number of the matched html fragments to the total number of the html fragments;
and if the hit rate exceeds a set threshold value, judging that the html file is a webpage backdoor file.
4. The method according to claim 3, wherein before the judging whether the html file is a web page backdoor file according to the number of the matched html fragments, the method further comprises:
and adjusting the threshold value according to the total number of the html fragments, wherein the higher the total number of the html fragments is, the lower the threshold value is.
5. The method of claim 3, wherein if the value of the matching degree does not exceed a set threshold, the method further comprises:
extracting attribute information of each label in the html file, wherein the attribute information of the label comprises the attribute and the attribute value of the label; respectively matching the attribute information of each label by using a pre-established attribute information model; judging whether the html file is a webpage backdoor file or not according to a matching result;
or,
extracting html tags from the html files to obtain html tag sequences arranged in the html files, wherein the html tag sequences form html frameworks of the html files; matching the html framework by using a pre-established html framework model, wherein the html framework model is obtained by training a plurality of html frameworks serving as samples; and judging whether the html file is a webpage backdoor file or not according to the matching result.
6. The method according to any one of claims 1 to 5, wherein before the matching is performed on each html fragment by using the pre-established fragment hash model, the method further comprises: replacing the specific character string in the html file with a standard character string according to a preset rule;
and when the segment hash model is established, replacing the specific character string in the html segment serving as the sample with a standard character string according to the preset rule.
7. The method according to any one of claims 1 to 5, wherein the segmenting the html file to obtain a plurality of html fragments comprises:
and carrying out segmentation processing on the html file, and deleting the set general html fragments to obtain a plurality of html fragments.
8. A web page backdoor detection apparatus, comprising:
the Html file acquisition module is used for acquiring Html files by monitoring network traffic between the target host and the browser;
the Html segmentation module is used for carrying out segmentation processing on the Html file to obtain a plurality of Html fragments;
the model matching module is used for respectively matching each html fragment by utilizing a pre-established fragment hash model;
and the webpage backdoor detection module is used for judging whether the html file is the webpage backdoor file or not according to the number of the matched html fragments.
9. A web page backdoor detection apparatus comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN201810713529.6A 2018-06-29 2018-06-29 Webpage backdoor detection method, device, equipment and storage medium Active CN108920950B (en)

Publications (2)

Publication Number Publication Date
CN108920950A true CN108920950A (en) 2018-11-30
CN108920950B CN108920950B (en) 2022-03-08

Family

ID=64425162

Country Status (1)

Country Link
CN (1) CN108920950B (en)

Also Published As

Publication number Publication date
CN108920950B (en) 2022-03-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant