CN106911636B - Method and device for detecting whether backdoor program exists in website - Google Patents

Publication number
CN106911636B
Authority
CN
China
Prior art keywords
uniform resource
url
resource locator
urls
subset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510976063.5A
Other languages
Chinese (zh)
Other versions
CN106911636A (en)
Inventor
董方
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
3600 Technology Group Co ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd, Qizhi Software Beijing Co Ltd filed Critical Beijing Qihoo Technology Co Ltd
Priority to CN201510976063.5A priority Critical patent/CN106911636B/en
Publication of CN106911636A publication Critical patent/CN106911636A/en
Application granted granted Critical
Publication of CN106911636B publication Critical patent/CN106911636B/en

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1433: Vulnerability analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The application discloses a method and a device for detecting whether a backdoor program exists in a website. The method comprises the following steps: acquiring the Uniform Resource Locators (URLs) of a website to be detected that are accessed within a first statistical duration, to obtain a first set containing those URLs; acquiring the URLs of the website to be detected that are accessed within a second statistical duration after the first statistical duration, to obtain a second set containing those URLs; determining the URLs that are included in the second set and not included in the first set as suspicious URLs; judging whether the web page code obtained by requesting the web page at a suspicious URL contains a predetermined backdoor fingerprint; and if so, determining that a backdoor program exists in the website to be detected. The embodiments of the application make it possible to detect backdoor programs in a website, thereby improving the security level of the website.

Description

Method and device for detecting whether backdoor program exists in website
Technical Field
The present application relates to the field of internet technologies, and in particular, to a method and an apparatus for detecting whether a website has a backdoor program.
Background
With the development of internet technology, information resources have grown explosively, and security problems have grown with them. Information resources in an internet environment may face various threats. In terms of origin, such threats may come from purposeful active attacks by hackers or by malicious programs or code such as viruses; they may also come from "congenital" security holes in the carrier on which the information resource depends (e.g., application software, a client program, or a web page/website), which unauthorized parties may exploit, possibly illegally, thereby threatening the information resource. The threat posed by "backdoor programs" is a common instance of the latter case.
For example, in the development stage of software, in order to facilitate operations such as modifying, debugging, and upgrading the software, a programmer may create or reserve an interface in the software, so that defects in the program can be fixed or functions improved through that interface. However, if the interface becomes known to others, or is not deleted before the software is released, malicious persons such as hackers may use it to bypass security controls, gain access to the relevant program or system, and perform illegal operations such as collecting information. Such interfaces, which may compromise the security of information resources, are generally referred to as backdoor programs, and they may have serious consequences once exploited. Therefore, it is necessary to detect, by an appropriate method, whether a backdoor program exists in the carrier of an information resource, and then to delete it or take similar action, thereby reducing security risks.
In some scenarios in the prior art, backdoor programs can be discovered and handled in a timely manner. For example, a backdoor program hidden in a client program can be detected by existing antivirus software and removed promptly after detection. However, for backdoor programs hidden in a website, there is currently no effective detection method that improves the security level of the website.
Disclosure of Invention
Embodiments of the present application provide a method and apparatus for detecting whether a website has a backdoor program, which overcome the above problems or at least partially solve the above problems.
The embodiment of the application adopts the following technical scheme:
A method for detecting whether a backdoor program exists in a website comprises the following steps:
acquiring a Uniform Resource Locator (URL) accessed by a website to be detected within a first statistical duration to obtain a first set containing the URL;
acquiring a Uniform Resource Locator (URL) accessed by the website to be detected within a second statistical duration after or before the first statistical duration to obtain a second set containing the URL;
determining Uniform Resource Locator (URL) contained in the second set and not contained in the first set or uniform resource locators contained in the first set and not contained in the second set as suspicious Uniform Resource Locator (URL);
judging whether the webpage code corresponding to the suspicious uniform resource locator URL contains a predetermined backdoor fingerprint obtained by training a plurality of sample backdoor programs in a backdoor sample library;
and if so, determining that a backdoor program exists in the website to be detected.
Preferably, after obtaining the uniform resource locator URL accessed by the website to be detected within the first statistical duration and obtaining the first set including the uniform resource locator URL, the method further includes:
de-duplicating the Uniform Resource Locators (URLs) included in the first set; and/or
filtering out the Uniform Resource Locators (URLs) contained in the first set that correspond to static resources;
after obtaining the uniform resource locator URL accessed by the website to be detected within a second statistical duration after or before the first statistical duration and obtaining a second set containing the uniform resource locator URL, the method further comprises:
de-duplicating the Uniform Resource Locators (URLs) included in the second set; and/or
filtering out the Uniform Resource Locators (URLs) contained in the second set that correspond to static resources.
Preferably, determining the uniform resource locator URL included in the second set and not included in the first set as a suspicious URL of the website to be detected specifically includes:
determining a uniform resource locator, URL, included in the second set and not included in the first set;
judging whether the determined uniform resource locator URL carries parameters or not;
if so, determining the uniform resource locator URL as a suspicious uniform resource locator URL.
Preferably, after obtaining the uniform resource locator URL accessed by the website to be detected within the first statistical duration and obtaining the first set including the uniform resource locator URL, the method further includes:
dividing the first set into a first subset comprising Uniform Resource Locator (URL) with parameters and a second subset comprising URL without parameters;
after acquiring the uniform resource locator URL accessed by the website to be detected within the second statistical duration and obtaining the second set containing the uniform resource locator URL, the method further comprises the following steps:
dividing the second set into a third subset comprising Uniform Resource Locator (URL) with parameters and a fourth subset comprising URL without parameters;
then, determining the uniform resource locator URL included in the second set and not included in the first set as a suspicious uniform resource locator URL, specifically including:
determining Uniform Resource Locators (URLs) contained in the first subset and not contained in the third subset as suspicious Uniform Resource Locators (URLs) with parameters;
determining uniform resource locator URLs included in the second subset and not included in the fourth subset as suspect uniform resource locator URLs without parameters.
Preferably, determining the uniform resource locator URL included in the second set and not included in the first set as a suspicious uniform resource locator URL specifically includes:
determining a Uniform Resource Locator (URL) with parameters that is included in the second set and not included in the first set;
judging whether the determined uniform resource locator URL contains a backdoor URL feature in a preset backdoor sample library or not;
if so, determining the uniform resource locator URL as a suspicious uniform resource locator URL.
An apparatus for detecting whether a website has a backdoor program, comprising:
the first acquisition unit is used for acquiring the Uniform Resource Locators (URLs) of the to-be-detected websites accessed within a first statistical duration to obtain a first set containing the URLs;
the second acquisition unit is used for acquiring the Uniform Resource Locators (URLs) of the website to be detected, which are accessed within a second statistical duration after or before the first statistical duration, so as to obtain a second set containing the URLs;
a determining unit, configured to determine a uniform resource locator URL included in the second set and not included in the first set or a uniform resource locator included in the first set and not included in the second set as a suspicious uniform resource locator URL;
a judging unit, configured to judge whether the web page code obtained by requesting the suspicious uniform resource locator URL includes a predetermined backdoor fingerprint, where the backdoor fingerprint is obtained by training on a plurality of sample backdoor programs in a backdoor sample library; and if so, to determine that a backdoor program exists in the website to be detected.
Preferably, the apparatus further comprises:
a first preprocessing unit, configured to deduplicate Uniform Resource Locators (URLs) included in the first set; and/or eliminating Uniform Resource Locators (URLs) belonging to static resources and contained in the first set;
a second preprocessing unit, configured to deduplicate Uniform Resource Locators (URLs) included in the second set; and/or eliminating Uniform Resource Locators (URLs) belonging to static resources and contained in the second set.
Preferably, the determining unit specifically includes:
a first determining subunit for determining uniform resource locators, URLs, included in the second set and not included in the first set;
a first judging subunit, configured to judge whether the determined uniform resource locator URL carries a parameter; if so, determining the uniform resource locator URL as a suspicious uniform resource locator URL.
Preferably, the apparatus further comprises:
a first partitioning unit for partitioning the first set into a first subset comprising Uniform Resource Locators (URLs) with parameters and a second subset comprising Uniform Resource Locators (URLs) without parameters;
a second partitioning unit for partitioning the second set into a third subset comprising uniform resource locators with parameters and a fourth subset comprising uniform resource locators without parameters;
then, the determining unit is specifically configured to:
determining Uniform Resource Locators (URLs) contained in the first subset and not contained in the third subset as suspicious Uniform Resource Locators (URLs) with parameters;
determining uniform resource locator URLs included in the second subset and not included in the fourth subset as suspect uniform resource locator URLs without parameters.
Preferably, the determining unit specifically includes:
a second determining subunit, configured to determine a uniform resource locator, URL, with parameters that is included in the second set and not included in the first set;
the second judging subunit is used for judging whether the determined uniform resource locator URL contains the characteristics of a back door URL in a preset back door sample library or not; if so, determining the uniform resource locator URL as a suspicious uniform resource locator URL.
The embodiment of the application adopts at least one technical scheme which can achieve the following beneficial effects:
The Uniform Resource Locators (URLs) of the website to be detected that are accessed within a first statistical duration and within a second statistical duration are obtained, yielding a first set and a second set of URLs respectively. The URLs that are contained in the second set and not in the first set are then determined as suspicious URLs, the web page at each suspicious URL is requested to obtain its web page code, and finally whether the website to be detected has a backdoor program is judged by checking whether that web page code contains a preset backdoor fingerprint. Compared with the prior art, this analysis of website access traffic allows suspicious web page code in the website to be found in time and checked for backdoor programs, so that backdoor programs in the website are effectively discovered and targeted measures can be taken against them, which helps improve the security level of the website.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
Fig. 1 is a flowchart illustrating a method for detecting whether a backdoor program exists in a website according to an embodiment of the present application;
Fig. 2 is a flowchart illustrating an embodiment of determining a suspicious uniform resource locator URL;
Fig. 3 is a block diagram of an apparatus for detecting whether a backdoor program exists in a website according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Fig. 1 is a flowchart of a method for detecting whether a backdoor program exists in a website according to an embodiment of the present application, including:
S101: acquiring the uniform resource locator URLs of the website to be detected that are accessed within the first statistical duration, to obtain a first set Q1 containing those URLs.
S102: acquiring the uniform resource locator URLs of the website to be detected that are accessed within a second statistical duration after the first statistical duration, to obtain a second set Q2 containing those URLs.
In steps S101 and S102, the website to be detected may be any website that a user can access through a browser. The computer may obtain the Uniform Resource Locators (URLs) of the website to be detected that have been accessed by examining the website's log data. The log data may include information such as the host, time, IP address, URL, and web page parameters, and may be labeled by time, so that the log data within a given statistical duration can be obtained.
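As a minimal sketch of steps S101 and S102, the following Python code collects the URLs accessed within a statistical duration from log text. The log format (timestamp, client IP, host, URL separated by whitespace), the sample records, and the function name `urls_in_window` are all assumptions made for illustration; the application does not specify a log format.

```python
from datetime import datetime, timedelta

# Hypothetical log format (an assumption): "<ISO timestamp> <client IP> <host> <URL>"
SAMPLE_LOG = """\
2015-12-15T09:00:00 10.0.0.1 www.example.com /index.php
2015-12-15T18:30:00 10.0.0.2 www.example.com /news.php?id=7
2015-12-16T02:10:00 10.0.0.9 www.example.com /upload/x.php?cmd=ls
2015-12-16T11:45:00 10.0.0.1 www.example.com /index.php
"""


def urls_in_window(log_text, start, duration):
    """Return the set of URLs whose access time lies in [start, start + duration)."""
    end = start + duration
    urls = set()
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # skip malformed records
        timestamp, _ip, _host, url = parts
        accessed_at = datetime.fromisoformat(timestamp)
        if start <= accessed_at < end:
            urls.add(url)
    return urls


one_day = timedelta(days=1)
q1 = urls_in_window(SAMPLE_LOG, datetime(2015, 12, 15), one_day)  # first set Q1
q2 = urls_in_window(SAMPLE_LOG, datetime(2015, 12, 16), one_day)  # second set Q2
```

Because the result is a Python set, repeated records of the same URL collapse into one entry.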
In this embodiment of the application, the log data of the website to be detected may be obtained periodically, once per statistical period (the first statistical duration or the second statistical duration), so as to find URLs of the website to be detected that may belong to a backdoor file. That is, the first statistical duration and the second statistical duration are equal. For example, if the first statistical duration and the second statistical duration are both one day, step S101 obtains the URLs of the website to be detected that were accessed on the previous day, and step S102 obtains the URLs that were accessed on the following day. Of course, in other embodiments of the present application, the first and second statistical durations may be unequal, and each may be any other duration. It should be noted that the second statistical duration may also precede the first statistical duration.
In this embodiment, the first set Q1 is the set of URLs of the website to be detected that were accessed within the first statistical duration, and the second set Q2 is the set of URLs of the website to be detected that were accessed within the second statistical duration. Generally, the web pages of the website to be detected form a directory structure. For example, suppose the URL of the home page of a website to be detected is www.sina.com.cn. Taking the home page URL as the first level of the directory, the second-level URLs under it may include www.sports.sina.com.cn, www.book.sina.com.cn, www.game.sina.com.cn, and so on; the third-level URLs under the second-level URL "www.sports.sina.com.cn" may include www.sports.sina.com.cn/g/laliga/; the fourth-level URLs under the third-level URL "www.sports.sina.com.cn/g/laliga/" may include www.sports.sina.com.cn/g/laliga/2015-12-16/doc-ifxmpnuk1614789.shtml; and so on. In this embodiment, taking a statistical duration of one day as an example, the traffic (or access frequency) of each URL on the website to be detected is substantially the same from day to day; if the traffic (or access frequency) of a URL is found to change on a certain day, that URL may be determined to be a suspicious URL.
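The frequency-based observation above can be sketched as follows. This is only an illustration: the relative-change threshold `tolerance`, the function name, and the sample hit lists are assumptions, since the application does not state how large a change counts as suspicious.

```python
from collections import Counter


def urls_with_changed_frequency(day1_hits, day2_hits, tolerance=0.5):
    """Return URLs whose access count changed by more than `tolerance`
    (relative to the first day's count) between two days.

    `tolerance` is a hypothetical threshold chosen for this sketch.
    """
    before_counts = Counter(day1_hits)
    after_counts = Counter(day2_hits)
    changed = set()
    for url in before_counts.keys() | after_counts.keys():
        before, after = before_counts[url], after_counts[url]
        baseline = max(before, 1)  # avoid division by zero for new URLs
        if abs(after - before) / baseline > tolerance:
            changed.add(url)
    return changed


day1 = ["/a"] * 10 + ["/b"] * 10
day2 = ["/a"] * 11 + ["/b"] * 30 + ["/c"]
changed = urls_with_changed_frequency(day1, day2)
```

In this sample, "/a" changes only slightly and is ignored, while "/b" (tripled traffic) and "/c" (newly accessed) are reported.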
In the embodiment of the present application, when obtaining the first set Q1 and the second set Q2, note that the log data of the website to be detected records a URL each time it is accessed, so the same URL may be recorded multiple times within a preset time (e.g., one day). Therefore, to make the URL data in the first set and the second set more concise and thereby improve the processing efficiency of the computer, the embodiment of the present application may adopt the following scheme:
after the above step S101, the method further includes the steps of: the uniform resource locators URLs contained in the first set Q1 are deduplicated. After the step S102, the method further includes the steps of: the uniform resource locators URLs contained in the second set Q2 are deduplicated. Through the steps, duplicate URL data in the first set and the second set can be removed, so that the finally obtained URL data is more simplified.
In addition, to simplify the obtained URL data further, after step S101 the method further includes the following step: eliminating the Uniform Resource Locators (URLs) contained in the first set Q1 that correspond to static resources. After step S102, the method further includes the following step: eliminating the Uniform Resource Locators (URLs) contained in the second set Q2 that correspond to static resources. The static resources include but are not limited to: CSS (Cascading Style Sheets), JS (JavaScript), HTML, pictures, and the like. "Correspond" here refers to the resource that the URL points to; that is, the resources of some URLs may be static resources. By filtering out the URLs of static resources from the first set Q1 and the second set Q2, the URL data in the resulting sets becomes more concise, and the processing efficiency of the computer can be further improved.
It should be noted that, in the embodiment of the present application, one of the foregoing step of the deduplication processing and the foregoing step of the static resource filtering may be selected separately, or the foregoing step of the deduplication processing and the foregoing step of the static resource filtering may be combined.
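The two preprocessing steps, applied separately or together, might look like the following sketch. The extension list in `STATIC_EXTENSIONS` and the function name `preprocess` are assumptions for illustration; the application names CSS, JS, HTML, and pictures but gives no concrete list of file extensions.

```python
from urllib.parse import urlsplit

# Hypothetical list of static-resource extensions (an assumption).
STATIC_EXTENSIONS = (".css", ".js", ".html", ".htm",
                     ".png", ".jpg", ".jpeg", ".gif", ".ico")


def preprocess(urls, dedupe=True, drop_static=True):
    """De-duplicate URLs and/or drop URLs that point to static resources.

    Either step can be switched off, mirroring the "and/or" in the text.
    The order of first appearance is preserved.
    """
    seen = set()
    result = []
    for url in urls:
        if dedupe:
            if url in seen:
                continue
            seen.add(url)
        # Compare only the path, so "/site.css?v=3" is still treated as static.
        if drop_static and urlsplit(url).path.lower().endswith(STATIC_EXTENSIONS):
            continue
        result.append(url)
    return result


raw = ["/index.php", "/index.php", "/static/site.css?v=3",
       "/img/logo.PNG", "/news.php?id=7"]
cleaned = preprocess(raw)
```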
Generally, URLs can be classified into URLs with parameters and URLs without parameters. An example of a URL with parameters is http://www.xxx.com/cgi-bin/phf?Qname=root%; an example of a URL without parameters is www.sports.sina.com.cn/g/laliga/. Here, "?Qname=root%" is the parameter of the URL. Generally, a URL with parameters carries more information than a URL without parameters. In view of this, in some embodiments of the present application, after step S101 the method further includes the following step: dividing the first set Q1 into a first subset Q11 comprising uniform resource locator URLs with parameters and a second subset Q12 comprising uniform resource locator URLs without parameters. Accordingly, after step S102 the method further includes the following step: dividing the second set Q2 into a third subset Q21 comprising uniform resource locator URLs with parameters and a fourth subset Q22 comprising uniform resource locator URLs without parameters. Dividing the obtained URLs according to whether they carry parameters can improve the accuracy of the final backdoor determination.
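A minimal sketch of this division is shown below. The function name `split_by_parameters` is hypothetical, and treating a URL as "with parameters" only when its query string is non-empty is an assumption of this sketch.

```python
from urllib.parse import urlsplit


def split_by_parameters(urls):
    """Divide URLs into (with_parameters, without_parameters) subsets."""
    with_params, without_params = set(), set()
    for url in urls:
        # A non-empty query component means the URL carries parameters.
        if urlsplit(url).query:
            with_params.add(url)
        else:
            without_params.add(url)
    return with_params, without_params


q1 = {"http://www.xxx.com/cgi-bin/phf?Qname=root%",
      "www.sports.sina.com.cn/g/laliga/"}
q11, q12 = split_by_parameters(q1)  # first subset Q11, second subset Q12
```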
S103: determining uniform resource locator URLs included in the second set and not included in the first set as suspect uniform resource locator URLs.
Taking the case where the first statistical duration and the second statistical duration are both one day as an example, if a URL accessed on the following day does not appear in the log data of the preceding day, that is, the URL is newly accessed, this indicates to a certain extent that the URL is suspicious, and it is necessary to determine further whether it is the URL of a backdoor program. In a specific implementation, the log data (visited URL data) of the website to be detected for the previous day and for the following day are obtained through S101 and S102 respectively; the difference set between the second set Q2 and the first set Q1 then yields the one or more URLs contained in the second set Q2 but not contained in the first set Q1, and these URLs can be determined as suspicious URLs. Likewise, in the other case, the one or more URLs contained in the first set Q1 but not in the second set Q2 can be obtained, and these may likewise be determined as suspicious URLs. For simplicity, the following description covers the steps for determining suspicious URLs only for the former case.
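The difference-set computation of step S103 can be sketched with Python set operations. The function name and the `both_directions` flag are illustrative assumptions; the flag covers the second case described above, in which URLs present only in the first set are also treated as suspicious.

```python
def suspicious_urls(first_set, second_set, both_directions=False):
    """Return URLs contained in the second set but not in the first set.

    With both_directions=True, URLs contained in the first set but not
    in the second set are included as well.
    """
    first, second = set(first_set), set(second_set)
    suspicious = second - first
    if both_directions:
        suspicious |= first - second
    return suspicious


q1 = {"/index.php", "/news.php?id=7"}
q2 = {"/index.php", "/upload/x.php?cmd=ls"}
new_only = suspicious_urls(q1, q2)
either_way = suspicious_urls(q1, q2, both_directions=True)
```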
As described above, in some embodiments of the present application, if the URL is divided according to whether the URL has a parameter, the step S103 may specifically include the following steps:
Determining uniform resource locator URLs included in the first subset Q11 and not included in the third subset Q21 as suspect uniform resource locator URLs with parameters, such as: http://www.xxx.com/cgi-bin/phf?Qname=root%.
Determining uniform resource locator URLs included in the second subset Q12 and not included in the fourth subset Q22 as suspect uniform resource locator URLs without parameters, such as: www.sports.sina.com.cn/g/laliga/.
It should be noted that, in the embodiment of the present application, because the URL with the parameter carries more information, only the suspicious URL with the parameter may be retained, and the subsequent judgment of the backdoor program may be performed, which may further alleviate the processing pressure of the computer. Of course, in other embodiments, only the suspect URL without the parameters may be retained, or both the suspect URL with the parameters and the suspect URL without the parameters may be retained.
Referring to fig. 2, in some embodiments of the present application, the method may not include the step of dividing the URLs in the first set and the second set according to whether the URLs have parameters, and the step S103 may alternatively include the steps of:
S1031: determining the Uniform Resource Locators (URLs) included in the second set Q2 and not included in the first set Q1.
S1032: and judging whether the determined uniform resource locator URL carries parameters.
Generally, a URL with parameters is a URL whose address carries a "?" toward its tail, so whether a URL carries parameters can be determined by checking whether its address contains "?".
S1033: and if the determined uniform resource locator URL carries parameters, determining the uniform resource locator URL as a suspicious uniform resource locator URL. Through the above process, only the URL with parameters may be retained through the specific flow of step S103 without dividing the URL.
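Steps S1031 to S1033 can be sketched as follows, assuming, as the text suggests, that the presence of "?" in the address is the test for carrying parameters. The function name and sample URLs are hypothetical.

```python
def suspicious_urls_with_parameters(q1, q2):
    """S1031-S1033: keep only the newly accessed URLs that carry parameters."""
    new_urls = set(q2) - set(q1)                # S1031: in Q2 but not in Q1
    return {url for url in new_urls if "?" in url}  # S1032/S1033: "?" check


q1 = {"/index.php", "/news.php?id=7"}
q2 = {"/index.php", "/news.php?id=7", "/upload/x.php?cmd=ls", "/about.php"}
suspects = suspicious_urls_with_parameters(q1, q2)
```

Here "/about.php" is new but carries no parameters, so only "/upload/x.php?cmd=ls" is retained.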
S104: judging whether the web page code obtained by requesting the web page at the suspicious uniform resource locator URL contains a predetermined backdoor fingerprint; and if so, determining that a backdoor program exists in the website to be detected.
Before implementing step S104, a number of pre-obtained backdoor samples (i.e., backdoor programs) may be used to extract the backdoor fingerprints they contain, where a backdoor fingerprint is a code segment unique to the program code of a certain backdoor program, such as "&shell=S". For a backdoor sample, one or more code segments (backdoor fingerprints) unique to its program code may be determined at the same time. In this embodiment, the obtained backdoor fingerprints are finally collected into a backdoor fingerprint library.
After the suspicious URL accessed on the following day is determined in step S103, step S104 may use that URL to request the web page dynamically, capture the returned page HTML to obtain the corresponding web page code, and then use the backdoor fingerprint library, by means of offset positioning, to check whether the web page code corresponding to the suspicious URL contains a backdoor fingerprint from the library, thereby determining whether the suspicious URL is a backdoor of the website to be detected. Preferably, to improve the accuracy of the backdoor determination, if in step S104 the web page code corresponding to the suspicious URL is found to contain at least three backdoor fingerprint segments that are not contiguous with one another, the suspicious URL may be determined to belong to a backdoor of the website to be detected, without needing to inspect the specific source code of the website.
For example, assume that a suspicious URL for a website to be detected is http://www.xxx.com/cgi-bin/phf?Qname=root%, and that a code segment in the web page code obtained by dynamically requesting the web page is, for example:
pUdphdr->SrcPort=htons(SRCPORT);
pUdphdr->DestPort=htons(DESTPORT);
pUdphdr->Checksum=0;
char*pData=&buf[sizeof(IP_HEADER)+sizeof(UDP_HEADER)];
memcpy(pData,szMsg,nMsgLen);
UdpCheckSum(pIphdr,pUdphdr,pData,nMsgLen);
SOCKADDR_IN addr={0};//
Suppose that examining the above code finds that it includes three backdoor fingerprint segments:
pUdphdr->DestPort=htons(DESTPORT);
char*pData=&buf[sizeof(IP_HEADER)+sizeof(UDP_HEADER)];
UdpCheckSum(pIphdr,pUdphdr,pData,nMsgLen);
It can then be determined that a backdoor program exists in the website to be detected.
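The check in this example can be sketched as below. The page code and the three fingerprints are taken from the example above; everything else is an assumption of the sketch, including the function names, the use of the first occurrence of each fingerprint as its offset, and the reading of "discontinuous" as "separated by at least one character of other code".

```python
PAGE_CODE = """pUdphdr->SrcPort=htons(SRCPORT);
pUdphdr->DestPort=htons(DESTPORT);
pUdphdr->Checksum=0;
char*pData=&buf[sizeof(IP_HEADER)+sizeof(UDP_HEADER)];
memcpy(pData,szMsg,nMsgLen);
UdpCheckSum(pIphdr,pUdphdr,pData,nMsgLen);
SOCKADDR_IN addr={0};"""

FINGERPRINTS = [
    "pUdphdr->DestPort=htons(DESTPORT);",
    "char*pData=&buf[sizeof(IP_HEADER)+sizeof(UDP_HEADER)];",
    "UdpCheckSum(pIphdr,pUdphdr,pData,nMsgLen);",
]


def matched_spans(page_code, fingerprints):
    """Locate each fingerprint by offset; return sorted (start, end) spans."""
    spans = []
    for fingerprint in fingerprints:
        offset = page_code.find(fingerprint)  # first occurrence only
        if offset != -1:
            spans.append((offset, offset + len(fingerprint)))
    return sorted(spans)


def contains_backdoor(page_code, fingerprints, min_hits=3):
    """True if at least `min_hits` matched fingerprints are mutually non-contiguous."""
    spans = matched_spans(page_code, fingerprints)
    if not spans or len(spans) < min_hits:
        return False
    separated = 1
    last_end = spans[0][1]
    for start, end in spans[1:]:
        if start > last_end:  # a gap exists before this span
            separated += 1
        last_end = max(last_end, end)
    return separated >= min_hits


is_backdoor = contains_backdoor(PAGE_CODE, FINGERPRINTS)
```

Requiring several separated matches, rather than one, is the accuracy measure the text describes: a single short fingerprint could appear in benign code by coincidence.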
To sum up, in the method provided by the embodiment of the present application, the Uniform Resource Locators (URLs) of the website to be detected that are accessed within a first statistical duration and within a second statistical duration are obtained, yielding a first set and a second set of URLs respectively. The URLs that are included in the second set and not included in the first set are then determined as suspicious URLs, the web pages at the suspicious URLs are requested to obtain their web page code, and finally whether a backdoor program exists in the website to be detected is determined by checking whether the web page code includes a preset backdoor fingerprint. Through this process, backdoor programs in the website can be detected, so that the security level of the website is improved.
Fig. 3 is a block diagram of an apparatus for detecting whether a website has a backdoor program according to an embodiment of the present application. The functions implemented by the units included in the apparatus are the same as the functions implemented by the steps included in the method, so the specific technical details related to the apparatus may refer to the contents in the embodiments of the method, and are not described herein again. The device includes:
the first obtaining unit 101 is configured to obtain a uniform resource locator URL of a website to be detected, which is accessed within a first statistical duration, to obtain a first set including the uniform resource locator URL;
a second obtaining unit 102, configured to obtain a uniform resource locator URL that is accessed by the website to be detected within a second statistical duration after or before the first statistical duration, to obtain a second set including the uniform resource locator URL;
a determining unit 103, configured to determine a uniform resource locator URL included in the second set and not included in the first set or a uniform resource locator included in the first set and not included in the second set as a suspicious uniform resource locator URL;
a judging unit 104, configured to judge whether the web page code obtained by requesting the suspicious uniform resource locator URL includes a predetermined backdoor fingerprint, where the backdoor fingerprint is obtained by training on a plurality of sample backdoor programs in a backdoor sample library; and if so, to judge that the website to be detected has a backdoor program.
In the device provided by the embodiments of the present application, the Uniform Resource Locators (URLs) of the website to be detected that are accessed within a first statistical duration and within a second statistical duration are obtained, yielding a first set and a second set of URLs. The URLs that are contained in the second set but not in the first set are determined to be suspicious URLs. The web pages at the suspicious URLs are then requested to obtain their web page code, and whether the website to be detected has a backdoor program is judged by checking whether that code contains a preset backdoor fingerprint. This process detects backdoor programs in the website and thereby improves the security of the website.
In an embodiment of the present application, the apparatus further includes:
a first preprocessing unit, configured to deduplicate the Uniform Resource Locators (URLs) included in the first set; and/or to filter out the Uniform Resource Locators (URLs) in the first set that correspond to static resources;
a second preprocessing unit, configured to deduplicate the Uniform Resource Locators (URLs) included in the second set; and/or to filter out the Uniform Resource Locators (URLs) in the second set that correspond to static resources. The first and second preprocessing units make the resulting URL data sets more compact, thereby improving the efficiency of computer processing.
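The preprocessing performed by these two units might look like the sketch below. The source does not say how a static resource is recognized; this sketch assumes it can be recognized by the file extension of the URL path, and the `STATIC_EXTENSIONS` list and function names are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit

# Assumed list of extensions that indicate static resources.
STATIC_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".ico", ".svg"}


def is_static(url):
    """True if the URL path ends in an extension treated as static."""
    path = urlsplit(url).path
    extension = posixpath.splitext(path)[1].lower()
    return extension in STATIC_EXTENSIONS


def preprocess(urls):
    """Deduplicate URLs (keeping first occurrences) and drop static resources."""
    seen = set()
    result = []
    for url in urls:
        if url in seen or is_static(url):
            continue
        seen.add(url)
        result.append(url)
    return result


if __name__ == "__main__":
    access_log = [
        "http://www.xxx.com/a.php?id=1",
        "http://www.xxx.com/logo.png?v=2",
        "http://www.xxx.com/a.php?id=1",
        "http://www.xxx.com/login",
    ]
    print(preprocess(access_log))
```

The same function would be applied to the first set and to the second set before they are compared.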
In an embodiment of the present application, the determining unit specifically includes:
a first determining subunit for determining uniform resource locators, URLs, included in the second set and not included in the first set;
a first judging subunit, configured to judge whether the determined uniform resource locator URL carries a parameter; and if so, to determine the uniform resource locator URL as a suspicious uniform resource locator URL. The first determining subunit and the first judging subunit make the finally determined suspicious URLs more accurate, which in turn improves the processing efficiency of the computer.
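A sketch of the two subunits is given below. It assumes that "carries a parameter" means the URL has a non-empty query string; the function names are hypothetical.

```python
from urllib.parse import urlsplit


def carries_parameters(url):
    """Assumed test: the URL has a non-empty query string."""
    return bool(urlsplit(url).query)


def suspicious_with_parameters(first_set, second_set):
    """URLs that are new in the second set and carry parameters."""
    new_urls = set(second_set) - set(first_set)           # first determining subunit
    return {u for u in new_urls if carries_parameters(u)}  # first judging subunit


if __name__ == "__main__":
    first = {"http://www.xxx.com/index"}
    second = {
        "http://www.xxx.com/index",
        "http://www.xxx.com/about",
        "http://www.xxx.com/cgi-bin/phf?Qname=root%",
    }
    print(suspicious_with_parameters(first, second))
```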
In an embodiment of the present application, the apparatus further includes:
a first partitioning unit for partitioning the first set into a first subset comprising Uniform Resource Locators (URLs) with parameters and a second subset comprising Uniform Resource Locators (URLs) without parameters;
a second partitioning unit for partitioning the second set into a third subset comprising Uniform Resource Locators (URLs) with parameters and a fourth subset comprising Uniform Resource Locators (URLs) without parameters;
then, the determining unit is specifically configured to:
determining Uniform Resource Locators (URLs) contained in the first subset and not contained in the third subset as suspicious Uniform Resource Locators (URLs) with parameters;
determining uniform resource locator URLs included in the second subset and not included in the fourth subset as suspect uniform resource locator URLs without parameters. The accuracy of the suspicious URL judging process can be further improved by classifying the URLs according to whether the URLs have parameters or not.
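The partitioning and the per-class comparison can be sketched as follows, again assuming that a URL "with parameters" is one with a non-empty query string. The names `partition` and `suspicious_by_class` are hypothetical.

```python
from urllib.parse import urlsplit


def partition(urls):
    """Split URLs into (with parameters, without parameters)."""
    with_params, without_params = set(), set()
    for url in urls:
        if urlsplit(url).query:
            with_params.add(url)
        else:
            without_params.add(url)
    return with_params, without_params


def suspicious_by_class(set_a, set_b):
    """URLs of set_a that are absent from set_b, compared class by class."""
    a_with, a_without = partition(set_a)   # e.g. first and second subsets
    b_with, b_without = partition(set_b)   # e.g. third and fourth subsets
    return a_with - b_with, a_without - b_without


if __name__ == "__main__":
    first = {"http://www.xxx.com/a.php?x=1", "http://www.xxx.com/b", "http://www.xxx.com/c"}
    second = {"http://www.xxx.com/c"}
    with_params, without_params = suspicious_by_class(first, second)
    print("with parameters:", with_params)
    print("without parameters:", without_params)
```

Calling `suspicious_by_class(second, first)` instead compares in the opposite direction, that is, it reports URLs that appear only in the second set.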
In an embodiment of the present application, the determining unit specifically includes:
a second determining subunit, configured to determine a uniform resource locator, URL, with parameters that is included in the second set and not included in the first set;
a second judging subunit, configured to judge whether the determined uniform resource locator URL contains a feature of a backdoor URL in a preset backdoor sample library; and if so, to determine the uniform resource locator URL as a suspicious uniform resource locator URL. The second determining subunit and the second judging subunit make the finally determined suspicious URLs more accurate.
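The feature check of the second judging subunit might be sketched as below. The source does not describe what a backdoor URL feature looks like, so the regular expressions in `BACKDOOR_URL_FEATURES` are purely illustrative assumptions, as are the function names.

```python
import re

# Hypothetical features standing in for entries of a backdoor sample library.
BACKDOOR_URL_FEATURES = [
    re.compile(r"/cgi-bin/phf\b", re.IGNORECASE),
    re.compile(r"[?&](cmd|exec|shell)=", re.IGNORECASE),
]


def has_backdoor_url_feature(url, features=BACKDOOR_URL_FEATURES):
    """True if any feature pattern occurs in the URL."""
    return any(feature.search(url) for feature in features)


def filter_by_features(candidate_urls, features=BACKDOOR_URL_FEATURES):
    """Keep only the candidate URLs that contain a backdoor URL feature."""
    return [u for u in candidate_urls if has_backdoor_url_feature(u, features)]


if __name__ == "__main__":
    candidates = [
        "http://www.xxx.com/cgi-bin/phf?Qname=root%",
        "http://www.xxx.com/search?q=shoes",
        "http://www.xxx.com/x.php?cmd=ls",
    ]
    print(filter_by_features(candidates))
```

Here `candidates` stands for the parameterized URLs that the second determining subunit found in the second set but not in the first set.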
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer-readable medium does not include a transitory computer-readable medium such as a modulated data signal or a carrier wave.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (8)

1. A method for detecting whether a website has a backdoor program or not is characterized by comprising the following steps:
acquiring a Uniform Resource Locator (URL) accessed by a website to be detected within a first statistical duration to obtain a first set containing the URL; dividing the first set into a first subset comprising Uniform Resource Locator (URL) with parameters and a second subset comprising URL without parameters;
acquiring a Uniform Resource Locator (URL) accessed by the website to be detected within a second statistical duration after or before the first statistical duration to obtain a second set containing the URL; dividing the second set into a third subset comprising Uniform Resource Locator (URL) with parameters and a fourth subset comprising URL without parameters;
determining Uniform Resource Locators (URLs) contained in the second set and not contained in the first set or uniform resource locators contained in the first set and not contained in the second set as suspicious URL; wherein the uniform resource locator URL contained in the first subset and not contained in the third subset is determined as a suspicious uniform resource locator URL with parameters; determining uniform resource locator URLs contained in the second subset and not contained in the fourth subset as suspect uniform resource locator URLs without parameters; or, determining Uniform Resource Locators (URLs) which are contained in the third subset and are not contained in the first subset as suspicious Uniform Resource Locators (URLs) with parameters; determining uniform resource locator URLs contained in the fourth subset and not contained in the second subset as suspect uniform resource locator URLs without parameters;
judging whether a webpage code corresponding to the suspicious uniform resource locator URL with the parameter and/or the suspicious uniform resource locator URL without the parameter contains a predetermined backdoor fingerprint, wherein the backdoor fingerprint is obtained by training a plurality of sample backdoor programs in a backdoor sample library;
and if so, judging that the website to be detected has a backdoor program.
2. The method of claim 1, wherein after obtaining the uniform resource locator URL accessed by the website to be detected within the first statistical duration to obtain the first set including the uniform resource locator URL, the method further comprises:
de-duplicating the Uniform Resource Locators (URLs) included in the first set; and/or,
filtering Uniform Resource Locators (URLs) with corresponding static resources contained in the first set;
after obtaining the uniform resource locator URL accessed by the website to be detected within a second statistical duration after or before the first statistical duration and obtaining a second set containing the uniform resource locator URL, the method further comprises:
de-duplicating the Uniform Resource Locators (URLs) included in the second set; and/or,
and filtering the Uniform Resource Locators (URLs) with corresponding static resources contained in the second set.
3. The method according to claim 1, wherein determining the URL contained in the second set and not contained in the first set as the suspicious URL of the website to be detected specifically comprises:
determining a uniform resource locator, URL, included in the second set and not included in the first set;
judging whether the determined uniform resource locator URL carries parameters or not;
if so, determining the uniform resource locator URL as a suspicious uniform resource locator URL.
4. The method of claim 1, wherein determining uniform resource locator URLs included in the second set and not included in the first set as suspect uniform resource locator URLs comprises:
determining a Uniform Resource Locator (URL) with parameters that is included in the second set and not included in the first set;
judging whether the determined uniform resource locator URL contains a backdoor URL feature in a preset backdoor sample library or not;
if so, determining the uniform resource locator URL as a suspicious uniform resource locator URL.
5. An apparatus for detecting whether a website has a backdoor program, comprising:
the first acquisition unit is used for acquiring the Uniform Resource Locators (URLs) of the to-be-detected websites accessed within a first statistical duration to obtain a first set containing the URLs;
a first partitioning unit for partitioning the first set into a first subset comprising Uniform Resource Locators (URLs) with parameters and a second subset comprising Uniform Resource Locators (URLs) without parameters;
the second acquisition unit is used for acquiring the Uniform Resource Locators (URLs) of the website to be detected, which are accessed within a second statistical duration after or before the first statistical duration, so as to obtain a second set containing the URLs;
a second partitioning unit for partitioning the second set into a third subset comprising uniform resource locators with parameters and a fourth subset comprising uniform resource locators without parameters;
a determining unit, configured to determine a uniform resource locator URL included in the second set and not included in the first set or a uniform resource locator included in the first set and not included in the second set as a suspicious uniform resource locator URL; the determining unit is specifically configured to: determining Uniform Resource Locators (URLs) contained in the first subset and not contained in the third subset as suspicious Uniform Resource Locators (URLs) with parameters; determining uniform resource locator URLs contained in the second subset and not contained in the fourth subset as suspect uniform resource locator URLs without parameters; or, determining Uniform Resource Locators (URLs) which are contained in the third subset and are not contained in the first subset as suspicious Uniform Resource Locators (URLs) with parameters; determining uniform resource locator URLs contained in the fourth subset and not contained in the second subset as suspect uniform resource locator URLs without parameters;
a judging unit, configured to judge whether the web page code corresponding to the suspicious uniform resource locator URL with parameters and/or the suspicious uniform resource locator URL without parameters contains a predetermined backdoor fingerprint, where the backdoor fingerprint is obtained by training on a plurality of sample backdoor programs in a backdoor sample library; and if so, to judge that the website to be detected has a backdoor program.
6. The apparatus of claim 5, wherein the apparatus further comprises:
a first preprocessing unit, configured to deduplicate Uniform Resource Locators (URLs) included in the first set; and/or filtering Uniform Resource Locators (URLs) with corresponding static resources contained in the first set;
a second preprocessing unit, configured to deduplicate Uniform Resource Locators (URLs) included in the second set; and/or filtering Uniform Resource Locators (URLs) with corresponding static resources contained in the second set.
7. The apparatus according to claim 5, wherein the determining unit specifically includes:
a first determining subunit for determining uniform resource locators, URLs, included in the second set and not included in the first set;
a first judging subunit, configured to judge whether the determined uniform resource locator URL carries a parameter; if so, determining the uniform resource locator URL as a suspicious uniform resource locator URL.
8. The apparatus according to claim 5, wherein the determining unit specifically includes:
a second determining subunit, configured to determine a uniform resource locator, URL, with parameters that is included in the second set and not included in the first set;
the second judging subunit is used for judging whether the determined uniform resource locator URL contains the characteristics of a back door URL in a preset back door sample library or not; if so, determining the uniform resource locator URL as a suspicious uniform resource locator URL.
CN201510976063.5A 2015-12-22 2015-12-22 Method and device for detecting whether backdoor program exists in website Active CN106911636B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510976063.5A CN106911636B (en) 2015-12-22 2015-12-22 Method and device for detecting whether backdoor program exists in website

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510976063.5A CN106911636B (en) 2015-12-22 2015-12-22 Method and device for detecting whether backdoor program exists in website

Publications (2)

Publication Number Publication Date
CN106911636A CN106911636A (en) 2017-06-30
CN106911636B true CN106911636B (en) 2020-09-04

Family

ID=59200875

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510976063.5A Active CN106911636B (en) 2015-12-22 2015-12-22 Method and device for detecting whether backdoor program exists in website

Country Status (1)

Country Link
CN (1) CN106911636B (en)

Also Published As

Publication number Publication date
CN106911636A (en) 2017-06-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220819

Address after: No. 9-3-401, No. 39, Gaoxin 6th Road, Binhai Science and Technology Park, High-tech Zone, Binhai New District, Tianjin 300000

Patentee after: 3600 Technology Group Co.,Ltd.

Address before: 100088 room 112, block D, 28 new street, new street, Xicheng District, Beijing (Desheng Park)

Patentee before: BEIJING QIHOO TECHNOLOGY Co.,Ltd.

Patentee before: Qizhi software (Beijing) Co.,Ltd.