WO2017059778A1 - Method, device and system for detecting shell website - Google Patents

Method, device and system for detecting shell website

Publication number
WO2017059778A1
WO2017059778A1 PCT/CN2016/100734 CN2016100734W WO2017059778A1 WO 2017059778 A1 WO2017059778 A1 WO 2017059778A1 CN 2016100734 W CN2016100734 W CN 2016100734W WO 2017059778 A1 WO2017059778 A1 WO 2017059778A1
Authority
WO
WIPO (PCT)
Prior art keywords
website
websites
empty shell
empty
shell
Prior art date
Application number
PCT/CN2016/100734
Other languages
French (fr)
Chinese (zh)
Inventor
戚宏伟
Original Assignee
阿里巴巴集团控股有限公司
Priority date
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司
Publication of WO2017059778A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00: Arrangements for monitoring or testing data switching networks
    • H04L43/08: Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/50: Testing arrangements

Definitions

  • The present invention relates to the field of computers, and in particular to a method, device and system for detecting a shell website.
  • An empty shell website is a website for which, in the filing system of the Ministry of Industry and Information Technology, the historical filing record of the website sponsor contains the main body (sponsor) information and the website information but no access information. That is, the website only has a filing number: the IP address of the space the website actually uses has changed, and the website sponsor did not transfer the filing information to the new access service provider.
  • For example, the sponsor of the website “website” submitted a filing application to the Ministry of Industry and Information Technology through the Alibaba Cloud filing system and obtained the filing number issued by the Ministry for the website.
  • The website data of the website “website” is then stored in the virtual space provided by Alibaba Cloud.
  • Later, the sponsor of the website “website” privately changes the IP address to that of another access service provider, that is, the sponsor privately changes the access provider.
  • The sponsor of the website “website” does not transfer the filing information to the new access service provider.
  • In the filing record, the access provider of the website “website” is therefore still Alibaba Cloud.
  • However, the website “website” does not use any Alibaba Cloud products.
  • The website “website” is thus a shell website for the access service provider Alibaba Cloud.
  • In existing techniques for detecting empty shell websites, identification is usually manual: customer service personnel of the access provider find empty shell websites based on their experience, and the empty shell websites are then cleaned up.
  • The embodiments of the invention provide a method, a device and a system for detecting a shell website, so as to at least solve the technical problem that the prior art detects and cleans up shell websites by manual identification, which easily misses shell websites and results in low accuracy of the detection result.
  • According to one aspect, a method for detecting an empty shell website includes: extracting a plurality of websites to be detected; invoking one or more empty shell website detection conditions from a website detection condition set, and using the one or more empty shell website detection conditions to determine whether any one website is an empty shell website; and outputting the websites whose detection result is an empty shell website.
  • According to another aspect, an apparatus for detecting an empty shell website comprises: an extracting unit, configured to extract a plurality of websites to be detected; an invoking unit, configured to invoke one or more empty shell website detection conditions from the website detection condition set and to use the one or more empty shell website detection conditions to determine whether any one website is an empty shell website; and an output unit, configured to output the websites whose detection result is an empty shell website.
  • According to another aspect, a system for detecting a shell website includes: a record server for storing information of a plurality of websites; and a detection server that establishes a communication relationship with the record server and is configured to extract a plurality of websites to be detected from the record server, invoke one or more empty shell website detection conditions from the website detection condition set, and use the one or more empty shell website detection conditions to determine whether any one website is an empty shell website. The detection server is also used to output the websites whose detection result is an empty shell website.
  • In the embodiments of the invention, a plurality of websites to be detected are extracted; one or more empty shell website detection conditions are invoked from the website detection condition set and used to determine whether any one website is an empty shell website; and the websites whose detection result is an empty shell website are output.
  • FIG. 1 is a block diagram showing the hardware structure of a computer terminal for a method for detecting a shell website according to an embodiment of the present invention.
  • FIG. 2 is a flow chart of a method of detecting an empty shell website in accordance with an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of an apparatus for detecting a shell website according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of an optional apparatus for detecting a shell website, in accordance with an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of an optional apparatus for detecting a shell website, in accordance with an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of an optional apparatus for detecting a shell website, in accordance with an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of an optional apparatus for detecting a shell website, in accordance with an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of an optional apparatus for detecting a shell website, in accordance with an embodiment of the present invention.
  • FIG. 9 is a schematic diagram of an optional apparatus for detecting a shell website, in accordance with an embodiment of the present invention.
  • FIG. 10 is a schematic diagram of an optional apparatus for detecting a shell website according to an embodiment of the present invention.
  • FIG. 11 is a schematic diagram of an optional apparatus for detecting a shell website according to an embodiment of the present invention.
  • FIG. 12 is a schematic diagram of a system for detecting a shell website according to an embodiment of the present invention.
  • FIG. 13 is a structural block diagram of a computer terminal according to an embodiment of the present application.
  • ODPS (Open Data Processing Service): an open data processing service developed independently by Alibaba Cloud. It provides distributed processing of TB/PB-scale data for workloads without strict real-time requirements, and is used in data analysis, data mining, business intelligence and other fields.
  • OTS (Open Table Service): an open structured data service. It is a NoSQL service that provides massive storage and real-time query capabilities for structured and semi-structured data, with strong consistency, high concurrency, low latency, and support for flexible data models.
  • HBASE (Hadoop Database): a high-reliability, high-performance, column-oriented, scalable distributed storage system. HBASE technology can be used to build large-scale structured storage clusters on inexpensive PC servers.
  • According to an embodiment of the present invention, an embodiment of a method of detecting a shell website is provided. It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order from the one described herein.
  • FIG. 1 is a block diagram showing the hardware structure of a computer terminal for a method for detecting a shell website according to an embodiment of the present invention.
  • Computer terminal 10 may include one or more (only one shown) processors 102 (processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission device 106 for communication functions.
  • Computer terminal 10 may also include more or fewer components than those shown in FIG. 1, or have a different configuration from that shown in FIG. 1.
  • The memory 104 can be used to store software programs and modules of application software, such as the program instructions/modules corresponding to the method for detecting the empty shell website in the embodiment of the present invention. The processor 102 runs the software programs and modules stored in the memory 104, thereby performing various functional applications and data processing, that is, implementing the above method for detecting an empty shell website.
  • Memory 104 may include high speed random access memory, and may also include non-volatile memory such as one or more magnetic storage devices, flash memory, or other non-volatile solid state memory.
  • Memory 104 may further include memory remotely located relative to processor 102, which may be connected to computer terminal 10 via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • Transmission device 106 is used to receive or transmit data via a network.
  • Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 10.
  • In one example, the transmission device 106 includes a Network Interface Controller (NIC) that can be connected to other network devices through a base station so as to communicate with the Internet.
  • In another example, the transmission device 106 can be a Radio Frequency (RF) module for communicating with the Internet wirelessly.
  • FIG. 2 is a flow chart of a method of detecting an empty shell website according to a first embodiment of the present invention.
  • As shown in FIG. 2, the method may include the following implementation steps:
  • Step A: extract a plurality of websites to be detected.
  • The plurality of websites to be detected may be stored in a filing system of the website access provider. A website sponsor may file the website information with the filing system of the website access provider when the website is created, so the filing system may store the website information of a plurality of websites.
  • The website data of the websites filed in the above filing system is saved in the virtual space of the access provider server provided by the website access provider.
  • The empty shell website detection solution can be implemented with a dedicated detection server. That is, the website access provider can use the detection server to obtain the website information of the above-mentioned large number of websites from its filing system, and then detect the empty shell websites.
  • For example, take the case in which Alibaba Cloud detects empty shell websites in the Alibaba Cloud filing system. A website sponsor selects Alibaba Cloud as the website access provider when creating a website.
  • The website sponsor can then submit the filing to Alibaba Cloud's filing system.
  • The filing information includes information about each website (such as the website's access provider information).
  • A website successfully filed in the Alibaba Cloud filing system saves its website data on the access provider's server, i.e. the Alibaba Cloud access provider server. In some cases, however, a website sponsor privately changes the IP address of the website access provider. For example, the sponsor "U1" of the website "W1" privately changes the IP address of the website access provider.
  • The formation of an empty shell website is not limited to the website sponsor changing the IP address of the website access provider. If the website itself is discontinued, an empty shell website is also formed.
  • Step B: invoke one or more empty shell website detection conditions from the website detection condition set, and use the one or more empty shell website detection conditions to determine whether any one website is an empty shell website.
  • A website detection condition set may be preset in the detection server, and the set may include one or more empty shell website detection conditions. The detection server may invoke the one or more detection conditions to check the plurality of websites to be detected and find the empty shell websites among them.
  • Still taking the case in which Alibaba Cloud detects empty shell websites in the Alibaba Cloud filing system as an example, Alibaba Cloud's detection server can pre-store one or more empty shell website detection conditions. It should be noted that the conditions in the website detection condition set can be added, removed or changed according to actual circumstances. After Alibaba Cloud's detection server extracts multiple websites from Alibaba Cloud's filing system, it can invoke one or more of the above empty shell website detection conditions to screen those websites and filter out the empty shell websites.
  • Step C: output the websites whose detection result is an empty shell website.
  • After detection, the detection server can distinguish legitimate websites from empty shell websites, and can output the empty shell websites that match the detection conditions so that they can be processed by staff.
  • In the solution of the first embodiment, the detection server may first extract the information of multiple websites from the filing system, then check the multiple websites against one or more empty shell website detection conditions in the preset website detection condition set, and finally output the websites whose detection result is a shell website.
  • It is easy to notice that the access provider only needs to send a detection instruction to the detection server; the detection server then automatically invokes the preset empty shell website detection conditions to judge a large number of websites. The solution provided by the embodiment of the present invention therefore does not require a large amount of manpower to identify empty shell websites.
  • The detection server automatically identifies empty shell websites among multiple websites, and the number of websites that can be checked is not limited. Identification is therefore automatic and done in batches, which avoids the shortcomings of manual identification in the prior art.
  • In addition, because the detection server judges websites by invoking one or more detection conditions, the accuracy of empty shell website detection is greatly increased, so empty shell websites can be detected accurately and quickly among a large number of websites. The solution of the first embodiment provided by the present application thus solves the technical problem that the prior art detects and cleans up empty shell websites by manual identification, which easily misses shell websites and leads to low accuracy of the detection result.
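
Steps A to C can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the site records, the field names `recent_access` and `whitelisted`, and both sample conditions are hypothetical.

```python
from typing import Callable, Iterable, List

# A detection condition returns True when it judges the site to be a shell website.
Condition = Callable[[dict], bool]


def detect_shell_websites(sites: Iterable[dict], conditions: List[Condition]) -> List[dict]:
    """Apply the invoked conditions to the extracted sites and return the shells."""
    shells = []
    for site in sites:  # Step A: sites already extracted from the filing system
        # Step B: a site counts as a shell only if every invoked condition flags it.
        if conditions and all(condition(site) for condition in conditions):
            shells.append(site)
    return shells  # Step C: the caller outputs these for staff to process


# Hypothetical conditions and records, for illustration only.
def no_recent_access(site: dict) -> bool:
    return not site["recent_access"]


def not_in_whitelist(site: dict) -> bool:
    return not site["whitelisted"]


sites = [
    {"domain": "w1.example", "recent_access": False, "whitelisted": False},
    {"domain": "w2.example", "recent_access": True, "whitelisted": False},
]
shell_domains = [s["domain"] for s in detect_shell_websites(sites, [no_recent_access, not_in_whitelist])]
print(shell_domains)
```

Modelling each condition as a plain function keeps the condition set easy to extend or shrink, which matches the note that conditions can be added, removed or changed.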
  • In an optional solution, step B (invoking one or more empty shell website detection conditions from the website detection condition set, and using the one or more empty shell website detection conditions to determine whether any one website is an empty shell website) can include:
  • Step S141: the empty shell website detection conditions in the website detection condition set of step B above may be invoked according to a predetermined calling rule, wherein the predetermined calling rule includes any one or more of the following: calling order, calling quantity, and calling type.
  • When performing empty shell website detection, the detection server may invoke conditions from the website detection condition set according to the calling rule.
  • A preset number may be specified in the calling rule, in which case the detection server invokes that preset number of empty shell website detection conditions from the website detection condition set. A preset type of empty shell website detection condition may also be specified in the calling rule, in which case the detection server invokes the conditions in the set that match the preset type. A preset order may also be specified in the calling rule, that is, the execution order of multiple empty shell website detection conditions when shell website detection is performed.
  • Still taking the case in which Alibaba Cloud detects empty shell websites in the Alibaba Cloud filing system as an example, Alibaba Cloud's detection server can invoke a number of empty shell website detection conditions from a pre-stored website detection condition set according to a preset calling rule.
  • For example, if the website detection condition set includes five empty shell website detection conditions, the Alibaba Cloud detection server can, according to the predetermined calling rule, invoke any number of any of those five conditions and perform the empty shell website detection.
  • Multiple empty shell website detection conditions can be executed according to the calling order in the calling rule.
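
The calling rule of step S141 (order, quantity, type) can be illustrated with the following sketch. The `DetectionCondition` and `CallingRule` classes, the type labels, and the tie-breaking behaviour for conditions not named in the order are all assumptions made for the example; the source does not specify how the three rules combine.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class DetectionCondition:
    name: str  # e.g. "A"
    kind: str  # hypothetical type label, e.g. "whitelist"


@dataclass
class CallingRule:
    order: Optional[List[str]] = None  # calling order, as condition names
    quantity: Optional[int] = None     # calling quantity
    kinds: Optional[Set[str]] = None   # calling type(s)


def select_conditions(condition_set: List[DetectionCondition], rule: CallingRule) -> List[DetectionCondition]:
    """Filter by type, then sort by the requested order, then cap the quantity."""
    selected = list(condition_set)
    if rule.kinds is not None:
        selected = [c for c in selected if c.kind in rule.kinds]
    if rule.order is not None:
        rank = {name: i for i, name in enumerate(rule.order)}
        # Conditions not named in the rule keep their original relative order
        # after the named ones (list.sort is stable).
        selected.sort(key=lambda c: rank.get(c.name, len(rank)))
    if rule.quantity is not None:
        selected = selected[: rule.quantity]
    return selected


condition_set = [
    DetectionCondition("A", "whitelist"),
    DetectionCondition("B", "filing_status"),
    DetectionCondition("C", "access_log"),
    DetectionCondition("D", "registration"),
    DetectionCondition("E", "access_log"),
]
chosen = [c.name for c in select_conditions(condition_set, CallingRule(order=["C", "A"], quantity=3))]
print(chosen)
```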
  • In an optional solution, in step B, after any one of the plurality of empty shell website detection conditions determines that the first website of the plurality of websites is a shell website, this embodiment further includes the following step:
  • Step S142: use the other empty shell website detection conditions in the plurality of empty shell website detection conditions to determine whether the first website is a shell website. When all the shell website detection conditions determine that the first website is a shell website, the first website is determined to be an empty shell website. Otherwise, if any of the empty shell website detection conditions determines that the first website is not a shell website, the first website is determined to be a legitimate website.
  • In step S142 of the present application, after one of the conditions in the website detection condition set determines that the first website is an empty shell website, the other conditions in the set may be invoked to judge the first website further.
  • The first website is finally determined to be an empty shell website and output only if every empty shell website detection condition in the set determines that it is an empty shell website. Otherwise the first website is a legitimate website; that is, if any one condition in the website detection condition set determines that the first website is not a shell website, the first website is finally confirmed as a legitimate website.
  • Still taking the case in which Alibaba Cloud's server detects empty shell websites in the Alibaba Cloud filing system as an example, suppose Alibaba Cloud's server makes a shell judgment on the website "W1".
  • The website detection condition set in the Alibaba Cloud detection server can contain five empty shell website detection conditions, such as condition A, condition B, condition C, condition D and condition E.
  • When the Alibaba Cloud detection server performs empty shell website detection, it can apply condition A to the website "W1" according to the predetermined calling rule.
  • If condition A determines that the website "W1" is a shell website, the detection server continues to invoke the remaining conditions to judge the website "W1". If conditions B to E also determine that the website "W1" is an empty shell website, the detection server determines that the website "W1" is a shell website and outputs it. If conditions A to D determine that the website "W1" is a shell website but condition E determines that it is not, the detection server determines that the website "W1" is a legitimate website.
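
The "W1" walk-through above can be expressed as a short-circuiting loop. In this hedged sketch each condition simply reads a precomputed flag from a dictionary; the flag layout and the `classify` helper are hypothetical and stand in for the real checks.

```python
from typing import Callable, List, Optional, Tuple

Condition = Callable[[dict], bool]  # True means "this condition flags a shell website"


def classify(site_flags: dict, conditions: List[Tuple[str, Condition]]) -> Tuple[str, Optional[str]]:
    """Return ("shell", None) only if every condition flags the site.

    Otherwise return ("legitimate", name_of_first_dissenting_condition).
    """
    if not conditions:
        raise ValueError("at least one detection condition is required")
    for name, flags_as_shell in conditions:
        if not flags_as_shell(site_flags):
            return "legitimate", name  # one dissenting condition is enough
    return "shell", None


def flag_reader(name: str) -> Condition:
    # Hypothetical stand-in: reads the precomputed outcome of condition `name`.
    return lambda flags: flags[name]


conditions = [(name, flag_reader(name)) for name in "ABCDE"]

w1_all_flag = {"A": True, "B": True, "C": True, "D": True, "E": True}
w1_e_dissents = {"A": True, "B": True, "C": True, "D": True, "E": False}

verdict_all = classify(w1_all_flag, conditions)
verdict_e = classify(w1_e_dissents, conditions)
print(verdict_all, verdict_e)
```

Stopping at the first dissenting condition gives the same verdict as evaluating all five, while skipping work for websites that are clearly legitimate.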
  • The foregoing empty shell website detection conditions may include any one or more of the following types: whether any one website is in the white list; whether it is in filing or in filing change (that is, its filing or a change to its filing is in progress); whether there is an access record within a predetermined time; and whether it is registered and whether the domain name resolution result has report information.
  • For example, the empty shell website detection condition set may include the following four conditions. First condition: whether any website is in the white list. Second condition: whether any website is in filing or in filing change. Third condition: whether any website has an access record within a predetermined time. Fourth condition: whether any website is registered and whether the resolution result has report information.
  • The detection server may invoke the above four conditions according to a predetermined rule, for example sequentially from the first condition to the fourth condition. It should be noted that each of the above four conditions can individually judge whether any website (the first website) is a shell website, but the detection server determines that the first website is a shell website only if all four conditions determine that it is. That is, if any one of the four conditions determines that the first website is not a shell website, the detection server determines that the first website is a legitimate website.
  • When the detection server invokes the empty shell website detection conditions, the above four conditions may be invoked according to a preset rule. The detection server may determine whether the first website is an empty shell website according to whether the first website is in the white list, whether it is in filing or in filing change, whether there is an access record within a predetermined time, and whether it is registered and whether the resolution result has report information.
  • In an optional solution, when the empty shell website detection condition is to detect whether any one website is in the white list, the step of using the empty shell website detection condition to determine whether any one website is an empty shell website includes:
  • Step S1411: read the website information of any one website.
  • The website information of the website may be the domain name of the website. The detection server may read the domain name of the first website among the plurality of websites to be detected.
  • Step S1412: determine whether the website information of the website matches the website information saved in the white list.
  • Step S1413: if the matching succeeds, determine that the website is a legitimate website.
  • The domain names of a plurality of legitimate websites may be pre-stored in the white list. The detection server first reads the domain name of the first website and then matches it against the domain names of the legitimate websites in the white list.
  • If the matching succeeds, the first website is a whitelisted website, and the detection server determines that the first website is a legitimate website.
  • If the matching fails, the detection server determines that the first website is at risk of being an empty shell website, and invokes other judgment conditions to continue judging the first website.
  • The first website is finally determined to be a shell website only when all the judgment conditions determine that the first website is a shell website.
  • Still taking the case in which Alibaba Cloud detects empty shell websites in the Alibaba Cloud filing system as an example, a white list is pre-stored in the Alibaba Cloud detection server.
  • The white list stores websites that have a cooperative relationship with Alibaba Cloud.
  • Alibaba Cloud treats whitelisted websites as legitimate websites by default, to prevent a large customer's own mishandling from causing its websites to meet the empty shell website conditions and be cleaned up by mistake.
  • When Alibaba Cloud's detection server makes a shell judgment on the website "W1", it can read the domain name of the website "W1" and match it against the domain names of the legitimate websites in the white list.
  • If the matching succeeds, Alibaba Cloud's detection server determines that the website "W1" is a whitelisted website. If the matching fails, the detection server determines that the website "W1" is at risk of being an empty shell website, and continues to invoke other judgment conditions to judge the website "W1". Only when every judgment condition determines that the website "W1" is an empty shell website does the detection server determine that the website "W1" is an empty shell website.
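
The whitelist condition (steps S1411 to S1413) amounts to a set lookup on the domain name. The sketch below is illustrative only; the domain normalisation and the `"at_risk"` label for a failed match are assumptions, since the source only says that other conditions are invoked next.

```python
from typing import Iterable


def normalize_domain(domain: str) -> str:
    # Assumption: domains are compared case-insensitively, ignoring a trailing dot.
    return domain.strip().rstrip(".").lower()


def whitelist_verdict(domain: str, whitelist: Iterable[str]) -> str:
    """Return "legitimate" on a whitelist match, otherwise "at_risk".

    "at_risk" is not a final verdict: the other detection conditions must
    still agree before the site is treated as an empty shell website.
    """
    known = {normalize_domain(d) for d in whitelist}
    return "legitimate" if normalize_domain(domain) in known else "at_risk"


whitelist = ["partner.example", "bigcustomer.example"]  # hypothetical entries
print(whitelist_verdict("BigCustomer.example.", whitelist))
print(whitelist_verdict("w1.example", whitelist))
```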
  • In an optional solution, when the empty shell website detection condition is to detect whether any one website has an access record within a predetermined time, the step of using the empty shell website detection condition to determine whether any one website is an empty shell website can include:
  • Step S1414: acquire the access log of the domain names of the plurality of websites recorded in the server.
  • Step S1415: query the access log according to the domain name of any one website to check whether an access record exists within a predetermined time.
  • The detection server may obtain from the website server the access logs corresponding to the domain names of the plurality of websites. The access records of each domain name can be queried from the access log, and the detection server can check whether each domain name has an access record within the predetermined time.
  • Step S1416: if an access record exists within the predetermined time, determine that the website is a legitimate website.
  • In step S1416, if the first website of the plurality of websites has an access record within the predetermined time, the detection server determines that the first website is a legitimate website.
  • If no access record exists within the predetermined time, the detection server determines that the first website is at risk of being an empty shell website, and then invokes other judgment conditions to continue judging the first website.
  • The first website is finally determined to be an empty shell website only when all the judgment conditions determine that the first website is a shell website.
  • For example, when judging the website "W1", Alibaba Cloud's detection server can query the access log of the website "W1" from the website server according to the domain name of the website.
  • From the access log, the detection server can query whether the domain name of the website "W1" has an access record within the predetermined time. If the domain name of the website "W1" has an access record within 60 days, the detection server determines that the website "W1" is a legitimate website.
  • If no such access record exists, the detection server determines that the website "W1" is at risk of being an empty shell website, and continues to invoke other judgment conditions to judge the website "W1". Only when every judgment condition determines that the website "W1" is a shell website does the detection server determine that the website "W1" is a shell website.
  • It should be noted that, when verifying whether the website "W1" has an access record, the detection server can use ODPS to clean the data and then store the cleaned data in OTS to provide quick access for external devices.
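
The access-record condition (steps S1414 to S1416) can be sketched as a time-window filter over log entries. The in-memory list of `(domain, timestamp)` pairs is a hypothetical stand-in for the cleaned access log; the 60-day default mirrors the example in the text, and ODPS/OTS are not modelled here.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple


def has_recent_access(
    domain: str,
    access_log: Iterable[Tuple[str, datetime]],
    now: datetime,
    window_days: int = 60,
) -> bool:
    """True if the log holds an access to `domain` within the last `window_days` days."""
    cutoff = now - timedelta(days=window_days)
    return any(d == domain and cutoff <= ts <= now for d, ts in access_log)


def access_verdict(domain: str, access_log, now: datetime, window_days: int = 60) -> str:
    # A recent access makes the site legitimate; otherwise it is only "at risk"
    # and the remaining conditions still have to be checked.
    return "legitimate" if has_recent_access(domain, access_log, now, window_days) else "at_risk"


now = datetime(2016, 10, 1)
access_log = [
    ("w1.example", datetime(2016, 9, 1)),  # 30 days before `now`
    ("w2.example", datetime(2016, 5, 1)),  # well outside the 60-day window
]
print(access_verdict("w1.example", access_log, now))
print(access_verdict("w2.example", access_log, now))
```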
  • In an optional solution, when the empty shell website detection condition is to detect whether any one website is in filing or in filing change (that is, its filing or a change to its filing is in progress), the step of using the empty shell website detection condition to determine whether any one website is an empty shell website includes:
  • Step S1417: read the website information of any one website, wherein the website information includes the filing status of the domain name of the website.
  • Step S1418: determine whether the filing status of the domain name of the website is in filing or in filing change.
  • The detection server may read the filing status of the domain name of the first website from the access service provider's filing system according to the domain name of the first website. The filing system may store the filing status of the domain names of a plurality of websites, and the detection server can determine whether the filing status of the domain name of the first website is in filing or in filing change.
  • Step S1419: if the filing status of the domain name of the website is in filing or in filing change, determine that the website is a legitimate website.
  • In step S1419, when the filing status of the domain name of the first website is in filing or in filing change, the detection server determines that the first website is a legitimate website.
  • Otherwise, the detection server determines that the first website is at risk of being an empty shell website, and then invokes other judgment conditions to continue judging the first website.
  • The first website is finally determined to be a shell website only when all the judgment conditions determine that the first website is a shell website.
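
The filing-status condition (steps S1417 to S1419) reduces to a check against two status values. The enum members below, including the extra `FILED` state, are hypothetical names chosen for this sketch; the source does not enumerate the statuses held in the filing system.

```python
from enum import Enum


class FilingStatus(Enum):
    # Hypothetical status names, for illustration only.
    IN_FILING = "in_filing"                # a filing is in progress
    IN_FILING_CHANGE = "in_filing_change"  # a change to the filing is in progress
    FILED = "filed"                        # any other settled state


IN_PROGRESS = {FilingStatus.IN_FILING, FilingStatus.IN_FILING_CHANGE}


def filing_status_verdict(status: FilingStatus) -> str:
    """Sites whose filing is in progress are legitimate; others need further checks."""
    return "legitimate" if status in IN_PROGRESS else "at_risk"


# Hypothetical filing-system lookup table keyed by domain name.
filing_system = {
    "w1.example": FilingStatus.FILED,
    "w2.example": FilingStatus.IN_FILING_CHANGE,
}
filing_verdicts = {domain: filing_status_verdict(status) for domain, status in filing_system.items()}
print(filing_verdicts)
```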
  • the empty shell website detecting condition is whether the any one website is registered and whether the parsing result has the report information
  • the empty shell website detecting condition is used to determine any
  • the steps for whether a website is an empty shell include:
  • step S1420 the website information of any website is read.
  • Step S1421 Query whether there is information matching the website information of any one website in the registration information table.
  • step S1422 if the matching is successful, the type of the any one of the websites is determined according to whether there is any report information according to the result of any website parsing.
  • the website information of any one of the websites may be the domain name of the website; after reading the domain name of the first website from the filing system, the detection server may match it against the multiple domain names in the registration information table.
  • the domain names in the registration information table may be the domain names of registered websites. If the matching is successful, the detection server determines, according to the resolution result of the first website, whether the first website is a legitimate website or an empty shell website.
  • if the matching fails, the empty shell website detection condition invoked by the detection server directly determines that the first website is at risk of being an empty shell website. If the matching succeeds, the detection server continues to determine whether the first website is a legitimate website or an empty shell website based on the resolution result of the first website.
  • the step of determining the type of the website according to whether the resolution result of the website has report information may include:
  • step S14221 if the IP address of any website is the same as an IP address already recorded by the access provider server, it is determined that the website is a normal website.
  • step S14222 if the IP address of any website is different from the IP addresses already recorded by the access provider server, it is determined that the website is an empty shell website.
  • if the IP address obtained by resolving the domain name of the first website is an IP address of the access provider server, the empty shell website detection condition invoked by the detection server determines that the first website is a normal website. If the IP address obtained by resolving the domain name of the first website is not an IP address of the access provider server, the empty shell website detection condition invoked by the detection server determines that the first website is an empty shell website.
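  • steps S1420 to S1422 and S14221 to S14222 can be sketched as follows. This is a minimal sketch under stated assumptions: the function name, the `resolve` callable, the set of provider IP addresses, and the return labels are hypothetical, and treating an unresolvable domain like a mismatching IP address is a choice made for the example, not something the application specifies.

```python
def check_registration_and_resolution(domain, registered_domains, resolve, provider_ips):
    """Registration and resolution check for one domain.

    registered_domains: set of domain names in the registration information table
    resolve: callable mapping a domain name to an IP string, or None if unresolved
    provider_ips: set of IP addresses recorded by the access provider server
    """
    if domain not in registered_domains:
        # No match in the registration table: at risk of being an empty shell website.
        return "at_risk"
    ip = resolve(domain)
    if ip is not None and ip in provider_ips:
        # S14221: resolved IP is recorded by the access provider.
        return "normal"
    # S14222: resolved IP differs (an unresolved domain is treated the same here).
    return "empty_shell"


# Stub data for illustration; a real resolver could wrap a DNS lookup instead.
registered = {"w1.example", "w2.example"}
dns_table = {"w1.example": "10.0.0.5", "w2.example": "203.0.113.9"}
provider_ips = {"10.0.0.5"}
```

  • passing the resolver in as a callable keeps the check testable without network access.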
  • the method provided in this embodiment may further include:
  • Step S18 the website information of the plurality of websites is sequentially written into the data queue by starting at least n data distribution threads.
  • Step S19 the website information of the plurality of websites is sequentially read from the data queue by starting at least m detection threads, wherein m and n are automatically adjusted according to a preset total detection time, m is greater than or equal to n, and m and n are natural numbers.
  • the cleaning server may include a data distribution function module and a verification function module.
  • the data distribution function module may start n data distribution threads by default, and the n data distribution threads may simultaneously write the website information of the plurality of websites into the data queue.
  • the verification function module then starts m detection threads by default, and the m detection threads read the website information of the plurality of websites from the data queue and perform detection to determine the empty shell websites.
  • this scheme can use a BlockingQueue (a bounded blocking queue backed by an array) as the data queue.
  • Alibaba Cloud's detection server can start two data distribution threads by default each time.
  • the two data distribution threads can simultaneously write website information to the BlockingQueue (backed by an array).
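  • the distribution and detection threads of steps S18 and S19 can be sketched as a producer and consumer arrangement. The application names a Java BlockingQueue; the sketch below substitutes Python's bounded `queue.Queue`, which also blocks writers when full, purely for illustration. The function name, the default thread counts, the queue capacity, and the sentinel used to stop the detection threads are assumptions, and the automatic adjustment of m and n from a preset total detection time is not modeled.

```python
import queue
import threading


def run_pipeline(sites, detect, n_producers=2, m_detectors=4, capacity=8):
    """Write sites into a bounded queue with n threads and detect them with m threads."""
    data_queue = queue.Queue(maxsize=capacity)  # bounded: put() blocks while full
    results = {}
    lock = threading.Lock()
    sentinel = None  # marks the end of the data for one detection thread

    def distribute(chunk):
        for site in chunk:
            data_queue.put(site)

    def detect_worker():
        while True:
            site = data_queue.get()
            if site is sentinel:
                break
            verdict = detect(site)
            with lock:
                results[site] = verdict

    # Each distribution thread handles an interleaved slice of the site list.
    producers = [
        threading.Thread(target=distribute, args=(sites[i::n_producers],))
        for i in range(n_producers)
    ]
    detectors = [threading.Thread(target=detect_worker) for _ in range(m_detectors)]
    for thread in producers + detectors:
        thread.start()
    for thread in producers:
        thread.join()
    for _ in detectors:
        data_queue.put(sentinel)  # one stop marker per detection thread
    for thread in detectors:
        thread.join()
    return results
```

  • for example, `run_pipeline(domains, is_empty_shell)` returns a dictionary mapping each domain to the result of the hypothetical `is_empty_shell` check.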
  • the website information of each website further includes a terminal address of the sponsor of each website, and after step C outputs the websites whose detection result is an empty shell website, the method provided in this embodiment may further include:
  • step S20 the alarm information is sent to the terminal address of the sponsor of the website determined to be an empty shell website, wherein the alarm information includes at least the domain name of the empty shell website.
  • the detecting server may send alarm information to the sponsor of the first website after determining that the first website is an empty shell website, and the alarm information is used to remind the sponsor of the empty shell website to adjust the empty shell website.
  • the alarm information may prompt the sponsor of the empty shell website to handle the filing and access procedures.
  • after determining that the website "W1" is an empty shell website, Alibaba Cloud's detection server can send audio to the mobile phone of the sponsor "U1" of the website "W1"; the audio can be a suggested adjustment scheme for the empty shell website, and mail can be sent to a sponsor who cannot be reached by audio. After successfully sending the alarm information, Alibaba Cloud's detection server can generate a transmission report and send it to the message center to ensure that the customer has received the alarm information. Customers who cannot be notified are transferred by the detection server to manual customer service.
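  • the notification order described above (audio first, then mail, then manual customer service) can be sketched as follows. The function name, the sponsor dictionary keys, the message wording, and the `call_phone` / `send_mail` callables are hypothetical placeholders; the application does not define these interfaces.

```python
def notify_sponsor(sponsor, domain, call_phone, send_mail):
    """Try audio, then mail; return the channel that worked, or "manual".

    call_phone and send_mail are injected callables returning True on success.
    """
    message = (
        f"Domain {domain} was detected as an empty shell website; "
        "please handle the filing and access procedures."
    )
    if sponsor.get("phone") and call_phone(sponsor["phone"], message):
        return "audio"
    if sponsor.get("email") and send_mail(sponsor["email"], message):
        return "mail"
    # Neither channel reached the sponsor: hand over to manual customer service.
    return "manual"
```

  • the returned channel name could serve as the basis of the transmission report sent to the message center.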
  • the method provided in this embodiment may include:
  • step S21 after the preset duration is reached, step A to step C are repeatedly executed to obtain the websites that are determined to be empty shell websites again.
  • the empty shell website detection scheme of the above steps A to C can be performed again after the preset duration to determine the empty shell websites again.
  • it should be noted that, in the above step S20, the cleaning server has already notified the sponsor of the empty shell website (for example, the first website); if the sponsor of the first website does not adjust the website in time, the first website will be identified as an empty shell website again.
  • step S22 the website that is determined to be an empty shell website again is recorded as a website to be cleaned up.
  • step S23 the domain name of the website to be cleaned is sent to the target server.
  • the target server may be a server of the communication authority.
  • the detecting server may send the domain name of the first website to the communication authority, and the communication authority will cancel the access of the first website.
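  • steps S21 to S23 can be sketched as a second detection pass over the websites flagged earlier. The function and parameter names are assumptions for illustration; waiting for the preset duration is left to the caller, and `send_to_target` stands in for whatever interface the target server exposes.

```python
def second_pass(flagged_domains, is_empty_shell, send_to_target):
    """Re-run detection on previously flagged domains and report the ones still flagged."""
    # S21/S22: domains detected as empty shell websites again become cleanup candidates.
    to_clean = [domain for domain in flagged_domains if is_empty_shell(domain)]
    # S23: send each domain to be cleaned up to the target server.
    for domain in to_clean:
        send_to_target(domain)
    return to_clean
```

  • a sponsor who adjusted the website after the alarm drops out of the list because the second detection no longer flags the domain.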
  • Step S30 data extraction is performed to extract a plurality of websites to be verified.
  • Alibaba Cloud's cleaning server can extract the information of the websites to be verified from Alibaba Cloud's filing system. It should be noted that the cleaning server can automatically extract the website information of the websites to be verified during off-peak business hours to reduce the impact on the normal business of Alibaba Cloud. It should also be noted that, when extracting, the cleaning server can exclude data newly filed within the last 90 days, so as to avoid newly filed customers being cleaned up by mistake, which improves security.
  • the information of the plurality of websites extracted by the cleaning server from the filing system may include the domain name of each website, the website address of each website, the filing status of each website, and the like. Preferably, in the empty shell website determining step S31, the present embodiment determines the empty shell websites by the domain name of each website.
  • step S31 it is determined which of the plurality of websites to be verified are empty shell websites.
  • an optional scheme for determining the empty shell websites may include the following steps:
  • Step S311 the domain name of each website is obtained to start verification, and whether the website is an empty shell website is checked based on its domain name.
  • Alibaba Cloud's cleaning server may extract the domain names of the multiple websites from the above website information and determine the empty shell websites according to the domain names.
  • Step S312 determining whether the domain name of any website to be verified is in the domain name whitelist.
  • the domain name whitelist may include the domain names of large customers that need to be maintained in cooperation with Alibaba Cloud. If the domain name is in the whitelist, step S318 is performed. If the domain name is not in the whitelist, step S313 is performed. It should be noted that not cleaning up whitelisted domain names prevents failures caused by such customers being cleaned up by mistake.
  • step S313 it is determined whether there is an access record for the domain name of any website to be verified.
  • the cleaning server can obtain the domain name access logs of all the computer rooms of the website servers and merge them by top-level domain name. If an access record exists for the domain name of the website within 60 days, step S318 is performed; if no access record exists for the domain name of the website within 60 days, step S314 is performed. It should be noted that, in this step, a website whose domain name has an access record within 60 days is not cleaned up. It should also be noted that judging the access records involves big data cleaning: the existing ODPS can be used for data cleaning, and the cleaned data is stored in OTS to provide highly concurrent, fast access. ODPS can be replaced with other big data processing technologies, and OTS can be replaced with HBase.
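  • the access record check of step S313 can be sketched on a small in-memory log. This is an illustrative sketch only: the log format of (host, timestamp) pairs and the function names are assumptions, and merging host names by keeping the last two labels is a deliberate simplification that is wrong for multi-label suffixes such as ".com.cn". It does not reflect how ODPS or OTS are used.

```python
from datetime import datetime, timedelta


def registrable_domain(host):
    """Naive merge of a host name into its domain: keep the last two labels."""
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def has_recent_access(domain, access_log, now, window_days=60):
    """Return True if any host under the domain was accessed within the window.

    access_log: iterable of (host, timestamp) pairs merged from all machine rooms
    """
    cutoff = now - timedelta(days=window_days)
    target = registrable_domain(domain)
    return any(
        registrable_domain(host) == target and timestamp >= cutoff
        for host, timestamp in access_log
    )


now = datetime(2016, 9, 1)
access_log = [
    ("www.w1.example", datetime(2016, 8, 1)),  # 31 days before `now`
    ("img.w2.example", datetime(2016, 1, 1)),  # well outside the window
]
```

  • a domain for which `has_recent_access` returns True would go to step S318; otherwise the check continues with step S314.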
  • step S314 it is determined whether the status of the domain name of any website to be verified is in filing or in filing change.
  • the cleaning server may further determine whether the domain name of any website to be verified is in filing or in filing change. If it is, step S318 is performed; if the status of the domain name is neither in filing nor in filing change, step S315 is performed.
  • step S315 it is determined whether the domain name of any website to be verified is registered.
  • the cleaning server may further determine whether the domain name of any website to be verified is registered. If the domain name is not registered, step S317 is performed. If the domain name is already registered, step S316 is performed.
  • Step S316 resolving the domain name, and determining whether the resolved IP address belongs to Alibaba Cloud.
  • the cleaning server can resolve the domain name (resolution of the domain name itself or of its www host name).
  • the cleaning server determines whether the resolved IP address of the domain name belongs to Alibaba Cloud. If it belongs to Alibaba Cloud, step S318 is performed. If the resolved IP address does not belong to Alibaba Cloud, step S317 is performed.
  • step S317 the website to be verified is determined to be an empty shell website.
  • step S318 the website to be verified is determined to be a normal website.
  • step S311 to step S318 can be executed cyclically for a predetermined time, for example 5 days. This loop protection measure reduces the false positive rate of empty shell website detection, ensures the accuracy of the existing data, and maximizes safety.
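  • the decision chain of steps S312 to S318 can be sketched as a sequence of early returns. The `Site` fields are hypothetical flags assumed to be precomputed by the individual checks. The `confirmed_empty_shell` helper reflects one possible reading of the loop protection, namely that a website is confirmed only if every pass in the loop window classifies it as an empty shell website; the application does not spell this rule out.

```python
from dataclasses import dataclass


@dataclass
class Site:
    domain: str
    whitelisted: bool = False            # S312 result
    recent_access: bool = False          # S313 result (access record within 60 days)
    filing_in_progress: bool = False     # S314 result (in filing or in filing change)
    registered: bool = False             # S315 result
    resolves_to_provider: bool = False   # S316 result


def classify(site):
    """Walk S312..S316 in order and return "normal" (S318) or "empty_shell" (S317)."""
    if site.whitelisted:
        return "normal"
    if site.recent_access:
        return "normal"
    if site.filing_in_progress:
        return "normal"
    if not site.registered:
        return "empty_shell"
    if site.resolves_to_provider:
        return "normal"
    return "empty_shell"


def confirmed_empty_shell(observations):
    """Assumed loop protection: every pass must classify the site as an empty shell."""
    verdicts = [classify(site) for site in observations]
    return bool(verdicts) and all(v == "empty_shell" for v in verdicts)
```

  • ordering the cheap lookups (whitelist, access record, filing status) before domain resolution keeps network lookups for the websites that actually need them.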
  • Step S32 notifying the customer of the website information determined to be the empty shell website.
  • the cleaning server may notify, in batches, the customers of the domain names determined to be empty shell websites, that is, the sponsors of the empty shell websites. It should be noted that the cleaning server obtains the customer's contact information, such as mobile phone number and mailbox, and notifies the customer to make adjustments through that contact information. The cleaning server can automatically call the customer's mobile phone number and then play back the website that the customer needs to adjust and the specific adjustment plan. For customers who cannot be reached by phone, the cleaning server can send the adjustment plan to the customer's mailbox. After notifying the customer, the cleaning server reports back to the message center to ensure that the customer has received the adjustment notice, so that the customer is not cleaned up without knowing; customers who cannot be reached can be handled manually.
  • step S33 the client of the empty shell website rectifies the empty shell website.
  • the customer may rectify the empty shell website according to the above rectification plan.
  • step S34 the cleaning server reviews and cleans up the websites determined to be empty shell websites.
  • the cleaning server re-checks the websites determined to be empty shell websites according to step S311 to step S318, and does not clean up websites that the customer has adjusted. For websites that have not been adjusted or whose adjustment is unqualified (that is, websites identified as empty shell websites again), the cleaning server generates a message and submits it to the communication authority, and the access of the empty shell website is cancelled.
  • the technical solution of the present invention, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and including a number of instructions for causing a terminal device (which may be a cell phone, a computer, a server, or a network device, etc.) to perform the methods of the various embodiments of the present invention.
  • the apparatus includes: an extracting unit 40, a calling unit 42, and an output unit 44.
  • the extracting unit 40 is configured to extract a plurality of websites to be detected.
  • the calling unit 42 is configured to invoke one or more empty shell website detection conditions from the website detection condition set, and use the one or more empty shell website detection conditions to determine whether any one website is an empty shell website; the output unit 44 is configured to output the websites whose detection result is an empty shell website.
  • the extracting unit 40, the calling unit 42, and the output unit 44 correspond to steps A to C in the first embodiment, and the three units are the same as the examples and application scenarios implemented by the corresponding steps, but are not limited to the content disclosed in the above embodiment 1. It should be noted that the above units can run as part of the device in the computer terminal 10 provided in the first embodiment.
  • the detecting server may first extract the information of multiple websites from the filing system, then use one or more preset empty shell website detection conditions in the website detection condition set to detect the multiple websites, and finally output the websites whose detection result is an empty shell website. It is easy to notice that, while the detection server determines the empty shell websites, the access provider only needs to send a detection instruction to the detection server, and the detection server can automatically call the preset empty shell website detection conditions to judge a large number of websites, so the solution provided by the embodiment of the present invention does not require a large amount of manpower to identify empty shell websites.
  • at the same time, the detection server automatically identifies empty shell websites among multiple websites, and the number of websites identified is not limited. This not only lets the detection server identify empty shell websites automatically and in batches, avoiding the long time taken by manual identification in the prior art; because the detection server detects websites by calling one or more detection conditions, the accuracy of empty shell website detection is also greatly increased, so empty shell websites can be detected accurately and quickly among a large number of websites. Therefore, the solution of the foregoing embodiment 2 provided by the present application solves the technical problem that the prior art detects and cleans up empty shell websites by manual identification, which easily misses websites and leads to low accuracy of the empty shell website detection result.
  • the calling unit 42 may include: a calling module 421.
  • the calling module 421 is configured to invoke the empty shell website detection conditions in the website detection condition set according to a predetermined calling rule, wherein the predetermined calling rule includes any one or more of the following rules: the calling order, the calling quantity, and the calling type.
  • the calling module 421 corresponds to step S141 in the first embodiment, and the module is the same as the example and the application scenario implemented by the corresponding step, but is not limited to the content disclosed in the first embodiment. It should be noted that the above module can run as part of the device in the computer terminal 10 provided in the first embodiment.
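  • a predetermined calling rule made of a calling order, a calling quantity, and a calling type can be sketched as a selection over a named condition set. The function name, its parameters, and the condition names are assumptions for illustration, and the order in which the three rules are applied (order, then type filter, then quantity) is a design choice of this sketch rather than something stated in the application.

```python
def select_conditions(condition_set, order=None, count=None, types=None):
    """Pick detection conditions from a named set according to a calling rule.

    condition_set: dict mapping a condition name to a callable
    order: optional sequence of names giving the calling order
    count: optional maximum number of conditions to call
    types: optional collection of condition names that may be called
    """
    names = list(order) if order is not None else list(condition_set)
    names = [name for name in names if name in condition_set]
    if types is not None:
        names = [name for name in names if name in types]
    if count is not None:
        names = names[:count]
    return [(name, condition_set[name]) for name in names]


# Placeholder conditions; real ones would inspect the website information.
conditions = {
    "whitelist": lambda site: False,
    "access_record": lambda site: False,
    "filing_status": lambda site: False,
    "registration": lambda site: True,
}
```

  • for example, `select_conditions(conditions, order=["registration", "whitelist"], count=1)` returns only the registration condition.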
  • the calling unit 42 may further include: a determining module 422.
  • the determining module 422 is configured to, after any one of the plurality of empty shell website detecting conditions determines that the first website is an empty shell website, use the other empty shell website detecting conditions to determine whether the first website is an empty shell website. If all the empty shell website detecting conditions determine that the first website is an empty shell website, the first website is determined to be an empty shell website; otherwise, if any of the empty shell website detecting conditions determines that the first website is not an empty shell website, the first website is determined to be a legitimate website.
  • the determining module 422 corresponds to step S142 in the first embodiment, and the module is the same as the example and the application scenario implemented by the corresponding step, but is not limited to the content disclosed in the first embodiment. It should be noted that the above module can run as part of the device in the computer terminal 10 provided in the first embodiment.
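  • the rule that a website is an empty shell website only when every called condition says so can be sketched as follows. The function name and the convention that each condition returns True when the website looks like an empty shell website are assumptions for illustration; rejecting an empty condition list is a guard added for the example.

```python
def judge(site, conditions):
    """Combine detection conditions: all must flag the site for it to be an empty shell.

    conditions: sequence of callables returning True if the site looks like an empty shell
    """
    if not conditions:
        raise ValueError("at least one detection condition is required")
    for condition in conditions:
        if not condition(site):
            # Any single condition clearing the site makes it legitimate.
            return "legitimate"
    return "empty_shell"
```

  • the loop stops at the first condition that clears the website, so later and possibly more expensive conditions are not evaluated for legitimate websites.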
  • the empty shell website detection condition includes any one or more of the following types: whether any one website is in the white list, whether it is in filing or in filing change, whether there is an access record within a predetermined time, whether it is registered, and whether the resolution result has report information.
  • the calling unit 42 may further include: an obtaining module 423, a querying module 424, and a second determining module 425.
  • the obtaining module 423 is configured to obtain the access logs of the domain names of multiple websites recorded in the server.
  • the querying module 424 is configured to query, in the access logs and according to the domain name of any website, whether an access record exists within a predetermined time.
  • the second determining module 425 is configured to determine that the website is a legitimate website if an access record exists within the predetermined time.
  • the obtaining module 423, the querying module 424, and the second determining module 425 correspond to steps S1414 to S1416 in the first embodiment, and the three modules are the same as the examples and application scenarios implemented by the corresponding steps, but are not limited to the content disclosed in the above embodiment 1. It should be noted that the above modules can run as part of the device in the computer terminal 10 provided in the first embodiment.
  • the calling unit 42 may further include: a second reading module 426, a second determining module 427, and a third determining module 428.
  • the second reading module 426 is configured to read the website information of any website, where the website information includes the filing status of the domain name of the website; the second determining module 427 is configured to determine whether the filing status of the domain name of the website is in filing or in filing change; the third determining module 428 is configured to determine that the website is a legitimate website if the filing status of its domain name is in filing or in filing change.
  • the second reading module 426, the second determining module 427, and the third determining module 428 correspond to steps S1417 to S1419 in the first embodiment, and the three modules are the same as the examples and application scenarios implemented by the corresponding steps, but are not limited to the content disclosed in the first embodiment. It should be noted that the above modules can run as part of the device in the computer terminal 10 provided in the first embodiment.
  • the calling unit 42 may further include: a third reading module 429, a second query module 430, and a fourth determining module 431.
  • the third reading module 429 is configured to read the website information of any website; the second query module 430 is configured to query, in the registration information table, whether there is information matching the website information of the website; the fourth determining module 431 is configured to determine, if the matching is successful, the type of the website according to whether the resolution result of the website has report information.
  • the third reading module 429, the second query module 430, and the fourth determining module 431 correspond to steps S1420 to S1422 in the first embodiment, and the three modules are the same as the examples and application scenarios implemented by the corresponding steps, but are not limited to the content disclosed in the first embodiment. It should be noted that the above modules can run as part of the device in the computer terminal 10 provided in the first embodiment.
  • the fourth determining module 431 further includes a sub-determining module 4311.
  • the sub-determining module 4311 is configured to determine the type of any website according to the IP address resolved from the domain name of the website; wherein, if the IP address of the website is the same as an IP address already recorded by the access provider server, it is determined that the website is a normal website; if the IP address of the website is different from the IP addresses already recorded by the access provider server, it is determined that the website is an empty shell website.
  • the sub-determining module 4311 corresponds to step S14221 to step S14222 in the first embodiment, and the module is the same as the example and the application scenario implemented by the corresponding steps, but is not limited to the content disclosed in the first embodiment. It should be noted that the above module can run as part of the device in the computer terminal 10 provided in the first embodiment.
  • the apparatus provided in this embodiment may further include: a data distribution unit 46, and a detecting unit 48.
  • the data distribution unit 46 is configured to sequentially write the website information of the plurality of websites into the data queue by starting at least n data distribution threads, and the detecting unit 48 is configured to sequentially read the website information of the plurality of websites from the data queue by starting at least m detection threads; wherein m and n are automatically adjusted according to a preset total detection time, m is greater than or equal to n, and m and n are natural numbers.
  • the data distribution unit 46 and the detecting unit 48 correspond to step S18 to step S19 in the first embodiment, and the two units are the same as the examples and application scenarios implemented by the corresponding steps, but are not limited to the content disclosed in the first embodiment.
  • a system for detecting a shell website for implementing the method for detecting a shell website is further provided.
  • the system includes: a record server 1200, and a detection server 1210.
  • the record server 1200 is configured to store information of a plurality of websites; the detection server 1210 establishes a communication relationship with the record server, and is configured to, after extracting a plurality of websites to be detected from the record server, call one or more empty shell website detection conditions from the website detection condition set, use the one or more empty shell website detection conditions to determine whether any one website is an empty shell website, and output the websites whose detection result is an empty shell website.
  • the detecting server 1210 is further configured to invoke the empty shell website detecting conditions in the website detecting condition set according to a predetermined calling rule, wherein the predetermined calling rule includes any one or more of the following rules: the calling order, the calling quantity, and the calling type.
  • the detecting server 1210 is further configured to, after any one of the plurality of empty shell website detecting conditions determines that the first website of the plurality of websites is an empty shell website, use the other empty shell website detecting conditions to determine whether the first website is an empty shell website. If all the empty shell website detecting conditions determine that the first website is an empty shell website, the first website is determined to be an empty shell website; otherwise, if any of the empty shell website detecting conditions determines that the first website is not an empty shell website, the first website is determined to be a legitimate website.
  • the empty shell website detection condition includes any one or more of the following types: whether any one website is in the white list, whether it is in filing or in filing change, whether there is an access record within a predetermined time, whether it is registered, and whether the resolution result has report information.
  • the detecting server may first extract the information of multiple websites from the filing system, then use one or more preset empty shell website detection conditions in the website detection condition set to detect the multiple websites, and finally output the websites whose detection result is an empty shell website. It is easy to notice that, while the detection server determines the empty shell websites, the access provider only needs to send a detection instruction to the detection server, and the detection server can judge a large number of websites by automatically calling the preset empty shell website detection conditions.
  • therefore, the solution provided by the embodiment of the present invention does not require a large amount of manpower to identify empty shell websites. At the same time, the detection server automatically identifies empty shell websites among multiple websites, and the number of websites identified is not limited. This not only lets the detection server identify empty shell websites automatically and in batches, avoiding the defect that manual identification in the prior art takes a long time; because the detection server detects websites by calling one or more detection conditions, the accuracy of empty shell website detection is also greatly increased, so empty shell websites can be detected accurately and quickly among a large number of websites. Therefore, the solution of the foregoing embodiment 3 provided by the present application solves the technical problem that the prior art detects and cleans up empty shell websites by manual identification, which easily misses websites and leads to low accuracy of the empty shell website detection result.
  • when the empty shell website detecting condition is to detect whether any one website is in the white list, the detecting server 1210 is configured to read the website information of the website and determine whether the website information matches the website information saved in the white list; if the matching is successful, the detection server 1210 determines that the website is a legitimate website.
  • when the empty shell website detecting condition is to detect whether any one website has an access record within a predetermined time, the detecting server 1210 is configured to obtain the access logs of the domain names of multiple websites recorded in the server, and query, in the access logs and according to the domain name of the website, whether an access record exists within the predetermined time; if an access record exists within the predetermined time, the detection server 1210 determines that the website is a legitimate website.
  • when the empty shell website detecting condition is to detect whether any one website is in filing or in filing change, the detecting server 1210 is configured to read the website information of the website, wherein the website information includes the filing status of the domain name of the website, and determine whether the filing status of the domain name is in filing or in filing change; if the filing status of the domain name is in filing or in filing change, the detection server 1210 determines that the website is a legitimate website.
  • when the empty shell website detecting condition is to detect whether any one website is registered and whether its resolution result has report information, the detection server 1210 is configured to read the website information of the website and query, in the registration information table, whether there is information matching the website information of the website; if the matching is successful, the detection server 1210 determines the type of the website according to whether the resolution result of the website has report information.
  • the detecting server 1210 is further configured to determine that any website is a legitimate website if the IP address of the website is the same as an IP address already recorded by the access provider server, and to determine that the website is an empty shell website if the IP address of the website is different from the IP addresses already recorded by the access provider server.
  • the detecting server 1210 is further configured to sequentially write the website information of multiple websites into the data queue by starting at least n data distribution threads, and to sequentially read the website information of the multiple websites from the data queue by starting at least m detection threads; wherein m and n are automatically adjusted according to a preset total detection time, m is greater than or equal to n, and m and n are natural numbers.
  • the website information of each website further includes a terminal address of the sponsor of each website, wherein, after outputting the websites whose detection result is an empty shell website, the detection server 1210 is further configured to send alarm information to the terminal address of the sponsor of the website determined to be an empty shell website, wherein the alarm information includes at least the domain name of the empty shell website.
  • the detecting server 1210 is further configured to repeatedly perform step A to step C after the preset duration is reached to obtain the websites that are determined to be empty shell websites again, record the websites that are determined to be empty shell websites again as websites to be cleaned up, and send the domain names of the websites to be cleaned up to the target server.
  • Embodiments of the present invention may provide a computer terminal, which may be any one of computer terminal groups.
  • the foregoing computer terminal may also be replaced with a terminal device such as a mobile terminal.
  • the computer terminal may be located in at least one network device of the plurality of network devices of the computer network.
  • The computer terminal may execute program code for the following steps of the vulnerability detection method of the application: extracting multiple websites to be detected; invoking one or more empty shell website detection conditions from the website detection condition set, and using the one or more empty shell website detection conditions to determine whether any one website is an empty shell website; and outputting the websites whose detection result is an empty shell website.
  • FIG. 13 is a structural block diagram of a computer terminal according to an embodiment of the present invention.
  • The computer terminal A may include one or more processors 510 (only one is shown in the figure), a memory 530, and a transmission device 550.
  • The memory can be used to store software programs and modules, such as the program instructions/modules corresponding to the security vulnerability detection method and device in the embodiment of the present invention. The processor executes various functional applications and data processing by running the software programs and modules stored in the memory, that is, it implements the above detection method for system vulnerability attacks.
  • The memory may include high-speed random access memory, and may also include non-volatile memory such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory.
  • the memory can further include memory remotely located relative to the processor, which can be connected to terminal A via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • The processor may invoke, through the transmission device, the information and application programs stored in the memory to perform the following steps: Step A: extract multiple websites to be detected; Step B: invoke one or more empty shell website detection conditions from the website detection condition set, and use the one or more empty shell website detection conditions to determine whether any one website is an empty shell website; Step C: output the websites whose detection result is an empty shell website.
  • The foregoing processor may further execute program code for the following step: invoke the empty shell website detection conditions in the website detection condition set according to a predetermined calling rule, where the predetermined calling rule includes any one or more of the following: calling order, number of calls, and type of call.
  • The foregoing processor may further execute program code for the following steps: if any one of the multiple empty shell website detection conditions determines that a first website among the multiple websites is an empty shell website, use the other empty shell website detection conditions to determine whether the first website is an empty shell website; if all of the empty shell website detection conditions determine that the first website is an empty shell website, determine that the first website is an empty shell website; otherwise, if any of the empty shell website detection conditions determines that the first website is not an empty shell website, determine that the first website is a legitimate website.
  • The empty shell website detection conditions include any one or more of the following types: whether the website is in the whitelist, whether its filing or filing change is in progress, whether it has an access record within a predetermined time, whether it is registered, and whether its parsing result has report information.
  • The foregoing processor may further execute program code for the following steps: when the empty shell website detection condition is to detect whether a website is in the whitelist, the step of using the condition to determine whether the website is an empty shell website includes: reading the website information of the website; determining whether that website information matches the website information saved in the whitelist; and, if the match succeeds, determining that the website is a legitimate website.
  • The processor may further execute program code for the following steps: when the empty shell website detection condition is to detect whether a website has an access record within a predetermined time, the step of using the condition to determine whether the website is an empty shell website includes: obtaining the access logs of the domain names that the multiple websites have recorded in the server; querying the access logs, by the domain name of the website, for an access record within the predetermined time; and, if an access record exists within the predetermined time, determining that the website is a legitimate website.
  • The foregoing processor may further execute program code for the following steps: when the empty shell website detection condition is to detect whether the filing or filing change of a website is in progress, the step of using the condition to determine whether the website is an empty shell website includes: reading the website information of the website, where the website information includes the filing status of the website's domain name; determining whether that filing status is filing in progress or filing change in progress; and, if so, determining that the website is a legitimate website.
  • The processor may further execute program code for the following steps: when the empty shell website detection condition is to detect whether a website is registered and whether its parsing result has report information, the step of using the condition to determine whether the website is an empty shell website includes: reading the website information of the website; querying the registration information table for information matching that website information; and, if the match succeeds, determining the type of the website according to whether its parsing result has report information.
  • The foregoing processor may further execute program code for the following steps: the step of determining the type of a website according to whether its parsing result has report information includes: if the IP address of the website is the same as the IP address recorded by the access provider server, determining that the website is a legitimate website; if the IP address of the website is different from the IP address recorded by the access provider server, determining that the website is an empty shell website.
  • The foregoing processor may further execute program code for the following steps: after the multiple websites to be detected are extracted, the website information of the multiple websites is written into a data queue in sequence by starting at least n data distribution threads, and is read from the data queue in sequence by starting at least m detection threads, where m and n are automatically adjusted according to a preset total detection time, m is greater than or equal to n, and m and n are natural numbers.
  • The foregoing processor may further execute program code for the following steps: the website information of each website further includes a terminal address of the sponsor of that website; after the websites whose detection result is an empty shell website are output, alarm information is sent to the terminal address of the sponsor of each website determined to be an empty shell website, where the alarm information includes at least the domain name of the empty shell website.
  • The foregoing processor may further execute program code for the following steps: after the alarm information is sent to the terminal address of the sponsor of each website determined to be an empty shell website, step A to step C are repeated to obtain the websites that are again determined to be empty shell websites; those websites are recorded as websites to be cleaned; and the domain names of the websites to be cleaned are sent to the target server.
  • A method for detecting an empty shell website is provided: multiple websites to be detected are extracted; one or more empty shell website detection conditions are invoked from the website detection condition set and used to determine whether any one website is an empty shell website; and the websites whose detection result is an empty shell website are output. This solves the technical problem that the prior-art approach of manually identifying and cleaning up empty shell websites is prone to missed detections, which results in low accuracy of the detection results.
  • The structure shown in FIG. 13 is merely illustrative. The computer terminal may also be a terminal device such as a smartphone (for example, an Android phone or an iOS phone), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), or a PAD.
  • FIG. 13 does not limit the structure of the above electronic device.
  • computer terminal 10 may also include more or fewer components (such as a network interface, display device, etc.) than shown in FIG. 13, or have a different configuration than that shown in FIG.
  • Embodiments of the present invention also provide a storage medium.
  • the foregoing storage medium may be used to save the program code executed by the method for detecting the empty shell website provided in the first embodiment.
  • the foregoing storage medium may be located in any one of the computer terminal groups in the computer network, or in any one of the mobile terminal groups.
  • The storage medium is configured to store program code for performing the following steps: extracting multiple websites to be detected; invoking one or more empty shell website detection conditions from the website detection condition set, and using the one or more empty shell website detection conditions to determine whether any one website is an empty shell website; and outputting the websites whose detection result is an empty shell website.
  • the disclosed technical contents may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • The division into units is only a division by logical function; in an actual implementation, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • The mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or take another form.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium.
  • The technical solution of the present invention in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium. The software product includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention.
  • The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
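The thread arrangement described above (at least n data distribution threads writing website information into a data queue, and at least m detection threads reading from it, with m greater than or equal to n) can be illustrated with a minimal Python sketch. This is not the patent's implementation: the function name `run_detection`, the `is_shell` callback, and the sentinel-based shutdown are assumptions made for illustration, and the automatic adjustment of m and n from a preset total detection time is not modeled.

```python
import queue
import threading
from typing import Callable, List, Sequence

_STOP = object()  # sentinel that tells a detection thread to exit


def run_detection(sites: Sequence[str],
                  is_shell: Callable[[str], bool],
                  n: int = 2,
                  m: int = 4) -> List[str]:
    """Distribute sites with n threads and check them with m threads."""
    if not (m >= n >= 1):
        raise ValueError("expected m >= n >= 1")

    data_queue = queue.Queue()   # shared queue between the two thread groups
    flagged: List[str] = []      # sites judged to be empty shell websites
    lock = threading.Lock()      # protects `flagged`

    def distribute(chunk: Sequence[str]) -> None:
        # Data distribution thread: write website information into the queue.
        for site in chunk:
            data_queue.put(site)

    def detect() -> None:
        # Detection thread: read website information until told to stop.
        while True:
            item = data_queue.get()
            if item is _STOP:
                return
            if is_shell(item):
                with lock:
                    flagged.append(item)

    # Each distribution thread gets every n-th site (an arbitrary split).
    distributors = [threading.Thread(target=distribute, args=(sites[i::n],))
                    for i in range(n)]
    detectors = [threading.Thread(target=detect) for _ in range(m)]

    for thread in distributors + detectors:
        thread.start()
    for thread in distributors:
        thread.join()
    # All sites are queued; send one stop marker per detection thread.
    for _ in detectors:
        data_queue.put(_STOP)
    for thread in detectors:
        thread.join()
    return flagged
```

Because detection threads finish in no fixed order, the returned list should be treated as unordered. Having at least as many detection threads as distribution threads reflects the expectation that checking a site is slower than enqueuing it.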

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Information Transfer Between Computers (AREA)
  • Computer And Data Communications (AREA)

Abstract

Disclosed are a method, device and system for detecting a shell website. The method comprises: extracting a plurality of websites to be detected; invoking one or more shell website detection conditions from a website detection condition set, and using the one or more shell website detection conditions to determine whether any one website is a shell website; and outputting the websites whose detection result is a shell website. The present invention solves the technical problem in the prior art that detecting and clearing shell websites by manual identification is prone to missed detections, which results in low accuracy of the shell website detection results.

Description

Method, device and system for detecting an empty shell website
This application claims priority to Chinese Patent Application No. 201510646922.4, filed on October 8, 2015 and entitled "Method, Device and System for Detecting an Empty Shell Website", the entire contents of which are incorporated herein by reference.
Technical field
The present invention relates to the field of computers, and in particular to a method, device and system for detecting an empty shell website.
Background
An empty shell website is a website for which, in the industry and information technology filing system, the sponsor's historical filing information contains entity information and website information but no access information (that is, the website has only a filing number; the IP address of the hosting space it actually uses has changed, but the sponsor has not transferred the filing information to the new access service provider).
For example, suppose Alibaba Cloud is the access service provider for a website. The sponsor of the website "website" submits a filing application to the Ministry of Industry and Information Technology (MIIT) through the Alibaba Cloud filing system and obtains the filing number issued by MIIT. Normally, the website data of "website" would be stored in the virtual space provided by Alibaba Cloud. However, the sponsor of "website" privately changes the IP address of the access service provider, that is, privately switches to a different access service provider, and does not transfer the filing information to the new access service provider. In the filing information recorded in the Alibaba Cloud filing system and in the MIIT system, the access provider of "website" is still Alibaba Cloud, although "website" does not actually use any Alibaba Cloud product. For the access service provider Alibaba Cloud, the website "website" is therefore an empty shell website.
In existing techniques for detecting empty shell websites, the websites are usually identified manually: the access provider's customer service staff rely on experience to find empty shell websites and then clean them up.
It should be noted that manual detection of empty shell websites has the following problems:
(1) When the number of websites filed through the access provider's filing system exceeds a certain number, manual detection is prone to missed detections, resulting in low accuracy of the empty shell website detection results.
(2) Manual detection of empty shell websites takes a long time and is inefficient, and cannot meet demand.
No effective solution has yet been proposed for the technical problem that the prior-art approach of manually identifying and cleaning up empty shell websites is prone to missed detections, resulting in low accuracy of the detection results.
Summary of the invention
Embodiments of the present invention provide a method, device and system for detecting an empty shell website, to at least solve the technical problem that the prior-art approach of manually identifying and cleaning up empty shell websites is prone to missed detections, resulting in low accuracy of the detection results.
According to one aspect of the embodiments of the present invention, a method for detecting an empty shell website is provided, including: extracting multiple websites to be detected; invoking one or more empty shell website detection conditions from a website detection condition set, and using the one or more empty shell website detection conditions to determine whether any one website is an empty shell website; and outputting the websites whose detection result is an empty shell website.
According to another aspect of the embodiments of the present invention, a device for detecting an empty shell website is further provided, including: an extraction unit, configured to extract multiple websites to be detected; an invoking unit, configured to invoke one or more empty shell website detection conditions from a website detection condition set, and to use the one or more empty shell website detection conditions to determine whether any one website is an empty shell website; and an output unit, configured to output the websites whose detection result is an empty shell website.
According to another aspect of the embodiments of the present invention, a system for detecting an empty shell website is further provided, including: a filing server, configured to store information about multiple websites; and a detection server that establishes a communication relationship with the filing server and is configured to extract multiple websites to be detected from the filing server, invoke one or more empty shell website detection conditions from a website detection condition set, and use the one or more empty shell website detection conditions to determine whether any one website is an empty shell website. The detection server is further configured to output the websites whose detection result is an empty shell website.
In the embodiments of the present invention, multiple websites to be detected are extracted; one or more empty shell website detection conditions are invoked from a website detection condition set and used to determine whether any one website is an empty shell website; and the websites whose detection result is an empty shell website are output. This solves the technical problem that the prior-art approach of manually identifying and cleaning up empty shell websites is prone to missed detections, resulting in low accuracy of the detection results.
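The three steps summarized above (extract the websites, apply detection conditions drawn from a condition set, output the flagged websites) can be sketched in Python as follows. This is a hedged illustration rather than the patented implementation: the `Website` record, its field names and status strings, and the three sample conditions are assumptions, and the combination rule used here (a website is reported only when every invoked condition flags it) is one reading of the multi-condition logic in this document.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class Website:
    """Hypothetical record for one filed website (field names are assumptions)."""
    domain: str
    filing_status: str   # e.g. "filed", "filing_in_progress", "filing_change"
    recent_visits: int   # access records found within the predetermined time


# A condition returns True if the site still looks like an empty shell website
# under that condition, and False if the condition clears it as legitimate.
Condition = Callable[[Website], bool]


def not_whitelisted(whitelist: Set[str]) -> Condition:
    # Sites in the whitelist are treated as legitimate.
    return lambda site: site.domain not in whitelist


def filing_not_pending(site: Website) -> bool:
    # Sites whose filing or filing change is in progress are treated as legitimate.
    return site.filing_status not in {"filing_in_progress", "filing_change"}


def no_recent_access(site: Website) -> bool:
    # Sites with an access record in the predetermined time are treated as legitimate.
    return site.recent_visits == 0


def detect_shell_websites(sites: Iterable[Website],
                          conditions: List[Condition]) -> List[Website]:
    if not conditions:
        raise ValueError("at least one detection condition is required")
    # Step B: apply every invoked condition; Step C: return the flagged sites.
    return [site for site in sites if all(cond(site) for cond in conditions)]


# Step A: the websites to be detected (made-up sample data).
sites = [
    Website("shop.example", "filed", 0),
    Website("blog.example", "filed", 12),
    Website("news.example", "filing_change", 0),
    Website("partner.example", "filed", 0),
]
conditions = [not_whitelisted({"partner.example"}), filing_not_pending, no_recent_access]
shell_sites = detect_shell_websites(sites, conditions)
print([site.domain for site in shell_sites])  # ['shop.example']
```

Representing each condition as a plain callable keeps the condition set open-ended: the calling order, number, and type of conditions can be varied by changing the list passed to `detect_shell_websites`.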
Brief description of the drawings
The drawings described herein are provided for further understanding of the present invention and form a part of this application. The exemplary embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it. In the drawings:
FIG. 1 is a hardware structure block diagram for a method for detecting an empty shell website according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for detecting an empty shell website according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a device for detecting an empty shell website according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an optional device for detecting an empty shell website according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an optional device for detecting an empty shell website according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an optional device for detecting an empty shell website according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of an optional device for detecting an empty shell website according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of an optional device for detecting an empty shell website according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of an optional device for detecting an empty shell website according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of an optional device for detecting an empty shell website according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of an optional device for detecting an empty shell website according to an embodiment of the present invention;
FIG. 12 is a schematic diagram of a system for detecting an empty shell website according to an embodiment of the present invention; and
FIG. 13 is a structural block diagram of a computer terminal according to an embodiment of this application.
Detailed description
To help those skilled in the art better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are evidently only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the scope of protection of the present invention.
It should be noted that the terms "first", "second" and the like in the specification, the claims, and the above drawings are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data used in this way may be interchanged where appropriate, so that the embodiments of the present invention described herein can be implemented in an order other than the one illustrated or described herein. In addition, the terms "include" and "have" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
The terms used in this application are explained as follows:
Empty shell website: a website for which, in the industry and information technology filing system, the sponsor's historical filing information contains entity information and website information but no access information (that is, the website has only a filing number; the IP address of the hosting space it actually uses has changed, but the sponsor has not transferred the filing information to the new access service provider).
ODPS (Open Data Processing Service): an open data processing service independently developed by Alibaba Cloud. It provides distributed processing for TB/PB-scale data with low real-time requirements and is used in fields such as data analysis, data mining, and business intelligence.
OTS (Open Table Service): an open structured data service. It is a NoSQL service for structured and semi-structured data that provides massive storage and real-time query capabilities, and features strong consistency, high concurrency, low latency, and support for flexible data models.
HBASE (Hadoop Database): a highly reliable, high-performance, column-oriented, scalable distributed storage system. HBASE can be used to build large-scale structured storage clusters on inexpensive PC servers.
Embodiment 1
According to an embodiment of the present invention, an embodiment of a method for detecting an empty shell website is also provided. It should be noted that the steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from the one described here.
The method embodiment provided in Embodiment 1 of this application may be executed in a mobile terminal, a computer terminal, or a similar computing device. Taking execution on a computer terminal as an example, FIG. 1 is a hardware structure block diagram of a computer terminal for a method for detecting an empty shell website according to an embodiment of the present invention. As shown in FIG. 1, the computer terminal 10 may include one or more processors 102 (only one is shown in the figure; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission module 106 for communication functions. Those of ordinary skill in the art will understand that the structure shown in FIG. 1 is merely illustrative and does not limit the structure of the above electronic device. For example, the computer terminal 10 may include more or fewer components than shown in FIG. 1, or have a configuration different from that shown in FIG. 1.
The memory 104 can be used to store software programs and modules of application software, such as the program instructions/modules corresponding to the method for detecting an empty shell website in the embodiment of the present invention. The processor 102 executes various functional applications and data processing by running the software programs and modules stored in the memory 104, that is, it implements the above vulnerability detection method of the application. The memory 104 may include high-speed random access memory, and may also include non-volatile memory such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, and such remote memory may be connected to the computer terminal 10 through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or send data via a network. Specific examples of the network may include a wireless network provided by the communication provider of the computer terminal 10. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices through a base station so as to communicate with the Internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is used to communicate with the Internet wirelessly.
In the above operating environment, this application provides the method for detecting an empty shell website shown in FIG. 2. FIG. 2 is a flowchart of a method for processing website data according to Embodiment 1 of the present invention.
As shown in FIG. 2, the method for processing website data may include the following implementation steps:
步骤A,提取待检测的多个网站。Step A, extracting a plurality of websites to be detected.
本申请上述步骤A中,上述待检测的多个网站可以存储在网站接入商的备案系统中,网站主办者在创办网站时,可以向上述网站接入商的备案系统进行网站信息备份,因此在上述网站接入商的备案系统中,可以存储有多个网站的网站信息,在正常情况下,上述备案系统中备案网站的网站数据是保存在网站接入商提供的接入商服务器的虚拟空间中。In the above step A of the present application, the plurality of websites to be detected may be stored in a filing system of the website access provider, and the website sponsor may back up the website information to the filing system of the website access provider when the website is created, In the above-mentioned website access provider's filing system, the website information of a plurality of websites may be stored. Under normal circumstances, the website data of the filing website in the above filing system is saved in the virtual network of the access provider server provided by the website access provider. In space.
It should be noted that, in the present application, the empty shell website detection scheme may be implemented by a dedicated detection server. That is, the website access provider may use the detection server to obtain the website information of the large number of websites from the access provider's filing system and then detect the empty shell websites among them.
Take Alibaba Cloud detecting empty shell websites in the Alibaba Cloud filing system as an example. When creating a website, a website sponsor selects Alibaba Cloud as the website access provider and files with Alibaba Cloud's filing system. The filing information includes information about each website (for example, the website's access provider information). Under normal circumstances, a website successfully filed in the Alibaba Cloud filing system stores its website data on the access provider server, that is, Alibaba Cloud's access provider server. In some cases, however, a website sponsor privately changes the IP address of the website access provider. For example, the founder "U1" of the website "W1" privately changes the access provider IP address to an IP address of "some other cloud". The result is that, in Alibaba Cloud's filing system, the access provider of website "W1" is Alibaba Cloud, while the actual access provider of "W1" is "some other cloud". Website "W1" is then an empty shell website from Alibaba Cloud's point of view. To detect such empty shell websites, Alibaba Cloud may use the detection server to first obtain the websites to be detected from Alibaba Cloud's filing system and then detect the empty shell websites among them.
It should be noted that an empty shell website does not arise only when the website sponsor changes the access provider IP address; a website that has itself ceased operation also becomes an empty shell website.
Step B: invoke one or more empty shell website detection conditions from a website detection condition set, and use the one or more empty shell website detection conditions to determine whether any one of the websites is an empty shell website.
In step B of the present application, a website detection condition set may be preset in the detection server, and the set may contain one or more empty shell website detection conditions. The detection server may invoke the one or more detection conditions to check the websites to be detected and thereby detect the empty shell websites among them.
Continuing with the example of Alibaba Cloud detecting empty shell websites in the Alibaba Cloud filing system, Alibaba Cloud's detection server may pre-store one or more empty shell website detection conditions. It should be noted that the conditions in the website detection condition set may be added, removed, or modified as circumstances require. After extracting the websites from Alibaba Cloud's filing system, the detection server may invoke the one or more empty shell website detection conditions to screen the websites to be detected and filter out the empty shell websites.
Step C: output the websites whose detection result is empty shell website.
In step C of the present application, after judging the websites extracted from the access provider's filing system against the empty shell website detection conditions, the detection server can distinguish legitimate websites from empty shell websites. The detection server may output the empty shell websites that satisfy the detection conditions, and staff can then process those empty shell websites.
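Steps A to C can be read as a three-stage pipeline: extract, judge, output. The following Python sketch is illustrative only. The record layout `FiledSite`, the function `run_detection`, and the sample judge `provider_mismatch` are assumptions made for this example and do not appear in the application.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class FiledSite:
    """Hypothetical record read from the access provider's filing system."""
    domain: str
    filed_provider: str    # provider named in the filing
    actual_provider: str   # provider that really hosts the site


def run_detection(filing_system: Iterable[FiledSite],
                  is_empty_shell: Callable[[FiledSite], bool]) -> List[str]:
    sites = list(filing_system)                           # Step A: extract
    verdicts = [(s, is_empty_shell(s)) for s in sites]    # Step B: judge
    return [s.domain for s, shell in verdicts if shell]   # Step C: output


def provider_mismatch(site: FiledSite) -> bool:
    # Illustrative judge only: flags a site whose filed provider differs
    # from the provider actually hosting it (the "W1" situation).
    return site.filed_provider != site.actual_provider


SAMPLE = [
    FiledSite("w1.example", "provider-a", "provider-b"),
    FiledSite("w2.example", "provider-a", "provider-a"),
]
```

Calling `run_detection(SAMPLE, provider_mismatch)` returns only the domain whose hosting no longer matches its filing. In practice the judge would be built from the detection condition set rather than a single comparison.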
In the scheme disclosed in Embodiment 1 of the present application, if a website access provider wishes to clean up the empty shell websites in the filing server, the detection server may first extract the information of the websites in the filing system, then invoke one or more empty shell website detection conditions from the preset website detection condition set to check those websites, and finally output the websites whose detection result is empty shell website. It is readily seen that, while the detection server is determining empty shell websites, the access provider only needs to send a detection instruction to the detection server; the detection server then automatically invokes the preset empty shell website detection conditions to judge a large number of websites. The scheme provided by the embodiments of the present invention therefore does not require a large amount of manpower to identify empty shell websites. Moreover, because the detection server identifies empty shell websites among many websites automatically, the number of websites identified is not limited. The detection server can thus identify empty shell websites automatically and in batches, which avoids the long cycle of manual identification in the prior art. In addition, because the detection server checks each website by invoking one or more detection conditions, the accuracy of empty shell website detection is greatly improved, so empty shell websites can be detected accurately and quickly among a large number of websites. The scheme of Embodiment 1 thereby solves the technical problem that the prior art, which detects and cleans up empty shell websites by manual inspection, is prone to missed detections and yields detection results of low accuracy.
In an optional embodiment provided by the present application, step B (invoking one or more empty shell website detection conditions from the website detection condition set and using them to determine whether any one of the websites is an empty shell website) may include:
Step S141: invoke the empty shell website detection conditions in the website detection condition set of step B according to a predetermined invocation rule, where the predetermined invocation rule includes any one or more of the following: invocation order, invocation quantity, and invocation type.
In step S141 of the present application, the detection server may invoke the website detection condition set according to the invocation rule. Specifically, the invocation rule may specify a preset quantity, in which case the detection server invokes that number of empty shell website detection conditions from the set. The rule may also specify a preset condition type, in which case the detection server invokes the conditions in the set that match that type. The rule may further specify a preset order, that is, the order in which multiple empty shell website detection conditions are executed during detection.
Continuing with the example of Alibaba Cloud detecting empty shell websites in the Alibaba Cloud filing system, Alibaba Cloud's detection server may invoke multiple empty shell website detection conditions from the pre-stored website detection condition set according to the preset invocation rule. For example, if the set contains five empty shell website detection conditions, the detection server may, according to the predetermined invocation rule, invoke any number and any type of those five conditions, and during detection may execute the invoked conditions in the order given by the invocation rule.
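One way to picture the invocation rule of step S141 is as a filter over the condition set. The sketch below is a minimal illustration under stated assumptions: the classes `Condition` and `InvocationRule`, the `kind` labels, and the choice to apply type first, then order, then quantity are all hypothetical, since the application does not specify how the three rules combine.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass(frozen=True)
class Condition:
    name: str
    kind: str                      # hypothetical type label, e.g. "filing"
    check: Callable[[str], bool]   # True means "judged to be an empty shell"


@dataclass(frozen=True)
class InvocationRule:
    order: Optional[Sequence[str]] = None   # condition names in execution order
    count: Optional[int] = None             # how many conditions to invoke
    kinds: Optional[Sequence[str]] = None   # condition types to invoke


def select_conditions(pool: Sequence[Condition],
                      rule: InvocationRule) -> List[Condition]:
    selected = list(pool)
    if rule.kinds is not None:      # invocation type
        selected = [c for c in selected if c.kind in rule.kinds]
    if rule.order is not None:      # invocation order
        rank = {name: i for i, name in enumerate(rule.order)}
        # Conditions not named in the order go last; the sort is stable,
        # so they keep their original relative order.
        selected.sort(key=lambda c: rank.get(c.name, len(rank)))
    if rule.count is not None:      # invocation quantity
        selected = selected[:rule.count]
    return selected


# Five placeholder conditions, mirroring the five-condition example.
POOL = [
    Condition("A", "filing", lambda d: True),
    Condition("B", "traffic", lambda d: True),
    Condition("C", "filing", lambda d: True),
    Condition("D", "dns", lambda d: True),
    Condition("E", "filing", lambda d: True),
]
```

With `InvocationRule(order=("C", "A"), count=2, kinds=("filing",))`, the selection keeps the filing-type conditions, places C before A, and stops after two.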
In an optional embodiment provided by the present application, in step B, after any one of the multiple empty shell website detection conditions determines that a first website among the websites is an empty shell website, this embodiment further includes the following step:
Step S142: use the other empty shell website detection conditions among the multiple conditions to determine whether the first website is an empty shell website. If all of the conditions determine that the first website is an empty shell website, the first website is determined to be an empty shell website; otherwise, if any one condition determines that the first website is not an empty shell website, the first website is determined to be a legitimate website.
In step S142 of the present application, after any one condition invoked from the website detection condition set determines that the first website is an empty shell website, the other conditions in the set may be invoked to continue judging the first website. It should be noted that the first website is finally determined to be an empty shell website and output only if every condition in the set determines it to be one; otherwise the first website is a legitimate website. In other words, if even one condition in the set determines that the first website is not an empty shell website, the first website is finally confirmed as a legitimate website.
Continuing with the example of Alibaba Cloud detecting empty shell websites in the Alibaba Cloud filing system, Alibaba Cloud's detection server performs an empty shell judgment on website "W1". The website detection condition set in the detection server may contain five empty shell website detection conditions (for example, condition A, condition B, condition C, condition D, and condition E). When performing detection, the detection server may, according to the predetermined invocation rule, first invoke condition A to judge "W1". If condition A determines that "W1" is an empty shell website, the detection server goes on to invoke the remaining conditions. If conditions B through E all determine that "W1" is an empty shell website, the detection server determines that "W1" is an empty shell website and outputs it. If conditions A through D determine that "W1" is an empty shell website but condition E determines that it is not, the detection server determines that "W1" is a legitimate website.
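The rule of step S142 amounts to requiring a unanimous verdict. The Python sketch below illustrates it; the function name `judge_site`, the verdict strings, and the decision to stop at the first dissenting condition are assumptions of this example. Stopping early gives the same final verdict as evaluating every condition, because a single dissent already makes the website legitimate.

```python
from typing import Callable, Optional, Sequence, Tuple

# Each entry pairs a condition name with a check that returns True when
# the condition judges the site to be an empty shell.
NamedCondition = Tuple[str, Callable[[str], bool]]


def judge_site(domain: str,
               conditions: Sequence[NamedCondition]) -> Tuple[str, Optional[str]]:
    """Return (verdict, name of the condition that cleared the site)."""
    if not conditions:
        raise ValueError("at least one detection condition is required")
    for name, check in conditions:
        if not check(domain):
            # One dissenting condition is enough to clear the site.
            return "legitimate", name
    # Every condition agreed, so the site is output as an empty shell.
    return "empty_shell", None


# Conditions A to E all flag the site.
ALL_AGREE = [(n, lambda d: True) for n in "ABCDE"]
# Conditions A to D flag the site, but condition E does not.
E_DISSENTS = ALL_AGREE[:4] + [("E", lambda d: False)]
```

Here `judge_site("w1.example", ALL_AGREE)` yields an empty shell verdict, while `judge_site("w1.example", E_DISSENTS)` yields a legitimate verdict and reports condition E as the dissenter.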
In an optional embodiment provided by the present application, the empty shell website detection conditions may include any one or more of the following types: whether the website is on a whitelist; whether the website's filing is in progress or a filing change is in progress; whether an access record exists within a predetermined time; and whether the website is registered and whether report information exists for its resolution result.
In the above embodiment, the empty shell website detection condition set may contain the following four conditions. First condition: whether the website is on the whitelist. Second condition: whether the website's filing is in progress or a filing change is in progress. Third condition: whether the website has an access record. Fourth condition: whether the website is registered and whether report information exists for its resolution result. The detection server may invoke the four conditions according to a predetermined rule; when the rule is sequential invocation, the detection server may invoke the first through the fourth condition in turn. It should be noted that each of the four conditions can on its own judge a website (the first website) to be an empty shell website, but the detection server determines that the first website is an empty shell website only if all four conditions so determine. That is, if any one of the four conditions determines that the first website is not an empty shell website, the detection server determines the first website to be a legitimate website.
It should be noted that, when invoking the empty shell website detection conditions, the detection server may invoke the four conditions according to a preset rule. The detection server may judge whether the first website is an empty shell website according to whether the first website is on the whitelist, whether its filing is in progress or a filing change is in progress, whether an access record exists within the predetermined time, and whether it is registered and whether report information exists for its resolution result.
In an optional embodiment provided by the present application, when the empty shell website detection condition is to check whether the website is on the whitelist, the step of using the condition to determine whether the website is an empty shell website includes:
Step S1411: read the website information of any one of the websites.
In step S1411, the website information may be the website's domain name; the detection server may read the domain name of the first website among the websites to be detected.
Step S1412: determine whether the website information of the website matches website information stored in the whitelist.
Step S1413: if the match succeeds, determine that the website is a legitimate website.
In steps S1412 to S1413, the whitelist may pre-store the domain names of multiple legitimate websites. The detection server first reads the domain name of the first website and then matches it against the domain names of the legitimate websites in the whitelist. If the match succeeds, the first website is a whitelisted website, and the detection server determines it to be a legitimate website.
It should be noted that, if the domain name of the first website fails to match any domain name stored in the whitelist, the detection server determines that the first website is at risk of being an empty shell website and invokes the other judgment conditions to continue judging it. The first website is finally determined to be an empty shell website only when every judgment condition determines it to be one.
Continuing with the example of Alibaba Cloud detecting empty shell websites in the Alibaba Cloud filing system, a whitelist is pre-stored in Alibaba Cloud's detection server. The whitelist holds websites that have a cooperative relationship with Alibaba Cloud, and Alibaba Cloud treats whitelisted websites as legitimate by default. This prevents a major customer's website from being cleaned up by mistake when the customer's own misoperation causes the site to satisfy the empty shell conditions. When judging website "W1", the detection server may read the domain name of "W1" and match it against the domain names of the legitimate websites in the whitelist. If the match succeeds, the detection server determines that "W1" is a whitelisted website. If the match fails, the detection server determines that "W1" is at risk of being an empty shell website and continues to invoke the other judgment conditions; only when every judgment condition determines that "W1" is an empty shell website does the detection server determine "W1" to be an empty shell website.
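The whitelist condition of steps S1411 to S1413 reduces to a set membership test on the domain name. The following sketch assumes exact matching after light normalization (case folding and removal of a trailing dot); the application does not say how domain names are compared, and the names `normalize` and `at_risk_by_whitelist` are hypothetical.

```python
from typing import Iterable


def normalize(domain: str) -> str:
    # Assumed normalization: DNS names are case-insensitive and may be
    # written with a trailing dot.
    return domain.strip().rstrip(".").lower()


def at_risk_by_whitelist(domain: str, whitelist: Iterable[str]) -> bool:
    """Return True when the domain is NOT whitelisted.

    True means only "at risk of being an empty shell"; the other
    conditions must still agree before the site is flagged.
    """
    allowed = {normalize(d) for d in whitelist}
    return normalize(domain) not in allowed


WHITELIST = ["Partner.example.", "bigcustomer.example"]
```

A whitelisted domain such as `partner.example` returns `False` (legitimate), while `w1.example` returns `True` and moves on to the remaining conditions.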
In an optional embodiment provided by the present application, when the empty shell website detection condition is to check whether the website has an access record within a predetermined time, the step of using the condition to determine whether the website is an empty shell website may include:
Step S1414: obtain the access logs of the domain names of the websites as recorded on the server.
Step S1415: query the access log by the domain name of the website to determine whether an access record exists within the predetermined time.
In step S1415, the detection server may obtain, from the website servers of the websites, the access logs corresponding to the websites' domain names. The access records of each domain name in each period can be queried from these logs, so the detection server can check whether each domain name has an access record within the predetermined time.
Step S1416: if an access record exists within the predetermined time, determine that the website is a legitimate website.
In step S1416, if the first website among the websites has an access record within the predetermined time, the detection server determines the first website to be a legitimate website.
It should be noted that, if the domain name of the first website has no access record within the predetermined time, the detection server determines that the first website is at risk of being an empty shell website and then invokes the other judgment conditions to continue judging it. The first website is finally determined to be an empty shell website only when every judgment condition determines it to be one.
Continuing with the example of Alibaba Cloud detecting empty shell websites in the Alibaba Cloud filing system, when judging website "W1", Alibaba Cloud's detection server may query the machine-room access log of "W1" from the website server by its domain name and check whether the domain name of "W1" has an access record within the predetermined time. If the domain name of "W1" has an access record within 60 days, the detection server determines "W1" to be a legitimate website. If it has no access record within 60 days, the detection server determines that "W1" is at risk of being an empty shell website and continues to invoke the other judgment conditions; only when every judgment condition determines that "W1" is an empty shell website does the detection server determine "W1" to be an empty shell website. It should be noted that, when checking whether "W1" has access records, the detection server may use ODPS to clean the data and then store the cleaned data in OTS so that external devices can access it quickly.
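The access-record condition of steps S1414 to S1416 can be sketched as a time-window query over log entries. In the example below, the log is modeled as an in-memory list of `(domain, timestamp)` pairs, which is an assumption; the ODPS cleaning and OTS storage mentioned above are not modeled. The 60-day default follows the example in the text, and the name `at_risk_by_access_log` is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple

LogEntry = Tuple[str, datetime]   # (domain, time of access)


def at_risk_by_access_log(domain: str,
                          access_log: Iterable[LogEntry],
                          now: datetime,
                          window: timedelta = timedelta(days=60)) -> bool:
    """Return True when the domain has no access record inside the window."""
    cutoff = now - window
    has_recent_access = any(
        logged_domain == domain and cutoff <= ts <= now
        for logged_domain, ts in access_log
    )
    return not has_recent_access


NOW = datetime(2016, 1, 1)
LOG = [
    ("w1.example", datetime(2015, 9, 1)),     # older than 60 days
    ("w2.example", datetime(2015, 12, 20)),   # inside the window
]
```

With this sample log, `w1.example` is at risk because its only access predates the window, while `w2.example` is treated as legitimate. Passing `now` explicitly keeps the check deterministic and easy to test.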
In an optional embodiment provided by the present application, when the empty shell website detection condition is to check whether the website's filing is in progress or a filing change is in progress, the step of using the condition to determine whether the website is an empty shell website includes:
Step S1417: read the website information of any one of the websites, where the website information includes the filing status of the website's domain name.
Step S1418: determine whether the filing status of the website's domain name is "filing in progress" or "filing change in progress".
In step S1418, the detection server may read the filing status of the first website's domain name from the access service provider's filing system by that domain name; the filing system may store the filing status of the domain names of many websites. The detection server can then determine whether the filing status of the first website's domain name is "filing in progress" or "filing change in progress".
Step S1419: if the filing status of the website's domain name is "filing in progress" or "filing change in progress", determine that the website is a legitimate website.
In step S1419, if the filing status of the first website's domain name is "filing in progress" or "filing change in progress", the detection server determines that the first website is a legitimate website.
It should be noted that, if the filing status of the first website's domain name is neither "filing in progress" nor "filing change in progress", the detection server determines that the first website is at risk of being an empty shell website and then invokes the other judgment conditions to continue judging it. The first website is finally determined to be an empty shell website only when every judgment condition determines it to be one.
Continuing with the example of Alibaba Cloud detecting empty shell websites in the Alibaba Cloud filing system, when judging website "W1", Alibaba Cloud's detection server may read the domain name of "W1" and then query the filing system to see whether the filing status of that domain name is "filing in progress" or "filing change in progress". If it is, the detection server determines that "W1" is a legitimate website. If the status is neither of the two, the detection server determines that "W1" is at risk of being an empty shell website and continues to invoke the other judgment conditions; only when every judgment condition determines that "W1" is an empty shell website does the detection server determine "W1" to be an empty shell website.
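The filing-status condition of steps S1417 to S1419 is a lookup followed by a membership test. The sketch below is illustrative: the `FilingStatus` values and the dictionary standing in for the filing system are assumptions, as the application does not enumerate the possible statuses.

```python
from enum import Enum
from typing import Mapping


class FilingStatus(Enum):
    # Hypothetical status values; only the two "in progress" states are
    # named in the text.
    FILED = "filed"
    FILING_IN_PROGRESS = "filing_in_progress"
    CHANGE_IN_PROGRESS = "filing_change_in_progress"


IN_PROGRESS = frozenset({
    FilingStatus.FILING_IN_PROGRESS,
    FilingStatus.CHANGE_IN_PROGRESS,
})


def at_risk_by_filing_status(domain: str,
                             filing_system: Mapping[str, FilingStatus]) -> bool:
    """Return True when the filing is neither in progress nor being changed."""
    status = filing_system[domain]   # raises KeyError for an unknown domain
    return status not in IN_PROGRESS


FILINGS = {
    "w1.example": FilingStatus.FILED,
    "w2.example": FilingStatus.CHANGE_IN_PROGRESS,
}
```

A site whose filing is still being processed or changed is cleared by this condition, which avoids flagging a website while its sponsor is in the middle of updating the record.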
In an optional embodiment provided by the present application, when the empty shell website detection condition is whether the website is registered and whether report information exists for its resolution result, the step of using the condition to determine whether the website is an empty shell website includes:
Step S1420: read the website information of any one of the websites.
Step S1421: query a registration information table for information matching the website information of the website.
Step S1422: if the match succeeds, determine the type of the website according to whether report information exists for the result of resolving the website.
In steps S1420 to S1422, the website information may be the website's domain name. After reading the domain name of the first website from the filing system, the detection server may match it against the domain names in the registration information table, which may be the domain names of websites whose domain status is registered. If the match succeeds, the detection server determines from the resolution result of the first website whether it is a legitimate website or an empty shell website.
It should be noted here that, if the match fails, the empty shell website detection condition invoked by the detection server directly determines that the first website is at risk of becoming an empty shell website. If the match succeeds, the detection server goes on to determine from the resolution result of the first website whether it is a legitimate website or an empty shell website.
In an optional embodiment provided by the present application, if the match succeeds, step S1422 (determining the type of the website according to whether report information exists for the result of resolving the website) may include:
Step S14221: if the IP address of the website is the same as an IP address already recorded by the access provider server, determine that the website is a normal website.
Step S14222: if the IP address of the website differs from every IP address already recorded by the access provider server, determine that the website is an empty shell website.
In steps S14221 to S14222, if the IP address obtained by resolving the domain name of the website (for example, the first website) is one of the IP addresses recorded in the access provider server, the empty shell website detection condition invoked by the detection server determines that the first website is a normal website; if the resolved IP address is not an IP address of the access provider server, the condition determines that the first website is an empty shell website.
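Steps S1420 to S14222 combine a registration lookup with a comparison of the resolved IP address against the addresses recorded by the access provider. The sketch below is a minimal illustration with several assumptions: the resolver is passed in as a callable so the example does not depend on live DNS, the provider's recorded addresses are given as strings that may be single addresses or CIDR ranges, and the function name and verdict strings are hypothetical. The addresses come from documentation-reserved ranges.

```python
import ipaddress
from typing import Callable, Iterable, Set


def check_registration_and_resolution(domain: str,
                                      registered: Set[str],
                                      resolve: Callable[[str], str],
                                      provider_addresses: Iterable[str]) -> str:
    """Return "at_risk", "normal", or "empty_shell" for one domain."""
    # S1421: a domain missing from the registration table is at risk.
    if domain not in registered:
        return "at_risk"
    # S1422: resolve the domain and compare against the recorded addresses.
    resolved = ipaddress.ip_address(resolve(domain))
    networks = [ipaddress.ip_network(a) for a in provider_addresses]
    if any(resolved in net for net in networks):
        return "normal"        # S14221: address belongs to the provider
    return "empty_shell"       # S14222: address belongs to someone else


REGISTERED = {"w1.example", "w2.example"}
PROVIDER_ADDRESSES = ["203.0.113.0/24", "192.0.2.15"]
FAKE_DNS = {"w1.example": "198.51.100.7", "w2.example": "203.0.113.10"}
```

Here `w1.example` resolves outside the provider's recorded addresses and is judged an empty shell, `w2.example` resolves inside them and is normal, and an unregistered domain is reported as at risk without being resolved. A real deployment could supply a resolver such as `socket.gethostbyname` and would need to decide how to treat resolution failures, which the text does not address.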
In an optional embodiment provided by the present application, after step A, extracting the plurality of websites to be detected, the method provided in this embodiment may further include:
Step S18: write the website information of the plurality of websites into a data queue in turn by starting at least n data distribution threads.
Step S19: read the website information of the plurality of websites from the data queue in turn by starting at least m detection threads, where m and n are adjusted automatically according to a preset total detection time, m is greater than or equal to n, and m and n are natural numbers.
In steps S18 to S19, the cleaning server may include a data distribution function module and a verification function module. By default, the data distribution function module starts n data distribution threads at a time, and these n threads can concurrently place the website information of the plurality of websites into the data queue. The verification function module then starts m detection threads by default; these threads read the website information from the data queue and perform detection to identify empty shell websites. It should be noted that this solution may use a BlockingQueue (a bounded blocking queue backed by an array) as the data queue.
Still taking the detection of empty shell websites in the Alibaba Cloud filing system as an example, the Alibaba Cloud detection server may start two data distribution threads by default each time. These two threads can concurrently and continuously place data, that is, the information of the plurality of websites, into the BlockingQueue (a bounded blocking queue backed by an array). The detection server then starts five verification threads by default, each of which can take data from the BlockingQueue and verify it to identify empty shell websites.
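The producer and consumer arrangement of steps S18 and S19 can be sketched as follows. This is an illustration under stated assumptions: Python's `queue.Queue` with a `maxsize` stands in for the bounded BlockingQueue, the function `run_pipeline` and its parameters are hypothetical names, the `check` callable is a placeholder for the detection conditions, and the automatic adjustment of m and n is not modeled.

```python
import queue
import threading


def run_pipeline(sites, check, n_producers=2, m_consumers=5, capacity=100):
    """Hypothetical sketch: n distribution threads feed m detection threads."""
    data_queue = queue.Queue(maxsize=capacity)  # bounded, so put() blocks when full
    shells = []
    lock = threading.Lock()

    def distribute(chunk):
        # Step S18: write website information into the data queue.
        for site in chunk:
            data_queue.put(site)

    def detect():
        # Step S19: read website information from the queue and check it.
        while True:
            site = data_queue.get()
            if site is None:  # sentinel: no more work for this thread
                break
            if check(site):
                with lock:
                    shells.append(site)

    producers = [
        threading.Thread(target=distribute, args=(sites[i::n_producers],))
        for i in range(n_producers)
    ]
    consumers = [threading.Thread(target=detect) for _ in range(m_consumers)]
    for t in producers + consumers:
        t.start()
    for t in producers:
        t.join()
    for _ in consumers:
        data_queue.put(None)  # one sentinel per detection thread
    for t in consumers:
        t.join()
    return sorted(shells)


# Placeholder check: treat domains starting with "shell" as empty shell websites.
result = run_pipeline(
    ["a.example", "shell-1.example", "b.example", "shell-2.example"],
    check=lambda domain: domain.startswith("shell"),
)
print(result)
```

The defaults of two distribution threads and five detection threads mirror the Alibaba Cloud example in the text; the sentinel values are one common way to shut the detection threads down cleanly.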
In an optional embodiment provided by the present application, the website information of each website further includes the terminal address of the sponsor of the website, and after step C, outputting the websites whose detection result is an empty shell website, the method provided in this embodiment may further include:
Step S20: send alarm information to the terminal address of the sponsor of each website determined to be an empty shell website, where the alarm information includes at least the domain name of the empty shell website.
In step S20, after determining that the first website is an empty shell website, the detection server may send alarm information to the sponsor of the first website. The alarm information reminds the sponsor of the empty shell website to adjust the website; for example, it may prompt the sponsor to apply for a filing access transfer.
Still taking the detection of empty shell websites in the Alibaba Cloud filing system as an example, after the Alibaba Cloud detection server determines that website "W1" is an empty shell website, it may send an audio message to the mobile phone of "U1", the sponsor of website "W1". The audio may describe the suggested adjustment plan for the empty shell website. Users who cannot be reached by audio can be sent an email instead. After successfully sending the alarm information, the detection server may generate a delivery report and send it to the message center to confirm that the customer has received the alarm information. Customers who cannot be notified are handed over to manual customer service.
In an optional embodiment provided by the present application, after step S20, sending alarm information to the terminal address of the sponsor of each website determined to be an empty shell website, the method provided in this embodiment may further include:
Step S21: after a preset duration has elapsed, repeat steps A to C to obtain the websites that are again determined to be empty shell websites.
In step S21, after sending the alarm information to the sponsor of the empty shell website, the detection server may, once the preset duration has elapsed, run the empty shell website detection of steps A to C again and identify empty shell websites once more. It should be noted that, because the cleaning server already notified the sponsor of the empty shell website (for example, the first website) in step S20, the first website will be determined to be an empty shell website again if its sponsor has not adjusted the website in time.
Step S22: record the websites that are again determined to be empty shell websites as websites to be cleaned up.
Step S23: send the domain names of the websites to be cleaned up to a target server.
In step S23, the target server may be a server of the communications administration. When the first website is determined to be a website to be cleaned up, the detection server may send the domain name of the first website to the communications administration, which then cancels access for the first website.
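Steps S21 to S23 amount to a second pass over the websites that were already notified. The sketch below assumes a hypothetical function `recheck_and_report`, a placeholder `is_empty_shell` predicate standing in for steps A to C, and a `send_to_target` callable standing in for delivery to the target server. Waiting for the preset duration is not modeled.

```python
def recheck_and_report(notified_domains, is_empty_shell, send_to_target):
    """Hypothetical sketch of steps S21 to S23 for already-notified websites."""
    # S21/S22: websites that are again judged to be empty shells
    # are recorded as websites to be cleaned up.
    to_clean = [d for d in notified_domains if is_empty_shell(d)]
    # S23: send each such domain name to the target server.
    for domain in to_clean:
        send_to_target(domain)
    return to_clean


# Illustrative data: "w2.example" was fixed by its sponsor after the alarm.
still_shell = {"w1.example"}
sent = []
cleaned = recheck_and_report(
    ["w1.example", "w2.example"],
    is_empty_shell=lambda d: d in still_shell,
    send_to_target=sent.append,
)
print(cleaned, sent)
```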
The solution of the present application is described in detail below with a preferred embodiment:
As shown in FIG. 3, Alibaba Cloud may clean up empty shell websites in the Alibaba Cloud filing system through the following steps:
Step S30: perform data extraction to extract the plurality of websites to be verified.
Specifically, the Alibaba Cloud cleaning server may extract the information of the websites to be verified from the Alibaba Cloud filing system. It should be noted that the cleaning server may automatically extract this website information during off-peak business hours to reduce the impact on Alibaba Cloud's normal business. It should also be noted that, during extraction, the cleaning server may exclude data newly filed within the last 90 days so that newly filed customers are not cleaned up by mistake, which improves safety. The information of the plurality of websites extracted from the filing system may include each website's domain name, URL, filing status, and other information. Preferably, in the subsequent empty shell website determination step S31, this embodiment uses the domain name of each website to make the determination.
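The 90-day exclusion in step S30 can be sketched as a date filter. The record layout (a dictionary with `domain` and `filed_on` keys) and the function name `extract_candidates` are assumptions for illustration, as is the choice to keep a record filed exactly 90 days ago.

```python
from datetime import date, timedelta


def extract_candidates(records, today, grace_days=90):
    """Hypothetical sketch: drop records filed within the last `grace_days` days."""
    cutoff = today - timedelta(days=grace_days)
    # Keep only filings on or before the cutoff; newer filings are skipped
    # so that newly filed customers are not cleaned up by mistake.
    return [r for r in records if r["filed_on"] <= cutoff]


# Illustrative records; the field names are assumptions.
records = [
    {"domain": "old.example", "filed_on": date(2015, 1, 10)},
    {"domain": "new.example", "filed_on": date(2015, 9, 20)},
]
for record in extract_candidates(records, today=date(2015, 10, 1)):
    print(record["domain"])
```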
Step S31: determine which of the plurality of websites to be verified are empty shell websites.
The determination may follow the optional scheme below, which may include the following steps:
Step S311: obtain the domain name of each website and start verification, checking on the basis of the domain name whether the website is an empty shell website.
Specifically, the Alibaba Cloud cleaning server may extract the domain names of the plurality of websites from the website information and make the empty shell website determination according to the domain names.
Step S312: determine whether the domain name of the website to be verified is in the domain name whitelist.
Specifically, the domain name whitelist may contain the domain names of key customers that have a cooperative relationship with Alibaba Cloud and need to be maintained. If the domain name is in the whitelist, step S318 is performed; if it is not, step S313 is performed. It should be noted that whitelisted domain names are not cleaned up, which prevents failures caused by a website being cleaned up because of a customer's misoperation.
Step S313: determine whether an access record exists for the domain name of the website to be verified.
Specifically, the cleaning server may obtain the domain name access logs of all machine rooms of the website servers and merge the entries by top-level domain name. If the domain name of the website has an access record within the last 60 days, step S318 is performed; if it does not, step S314 is performed. It should be noted that this step does not clean up websites whose domain names have access records within 60 days. It should also be noted that checking access records involves big data cleansing. The existing ODPS may be used to cleanse the data, and the cleansed data may be stored in OTS to provide highly concurrent, fast access. ODPS may be replaced by another big data processing technology, and OTS may be replaced by HBASE.
Step S314: determine whether the status of the domain name of the website to be verified is "filing in progress" or "filing change in progress".
Specifically, the cleaning server may further determine whether the domain name of the website to be verified is in the process of filing or of a filing change. If it is, step S318 is performed; if the status is neither of the two, step S315 is performed.
Step S315: determine whether the domain name of the website to be verified is registered.
Specifically, the cleaning server may further determine whether the domain name of the website to be verified is registered. If the domain name is not registered, step S317 is performed; if it is registered, step S316 is performed.
Step S316: resolve the domain name and determine whether the resolved IP address belongs to Alibaba Cloud.
Specifically, the cleaning server may resolve the domain name (either directly or with the www prefix added) and determine whether the resolved IP address belongs to Alibaba Cloud. If it does, step S318 is performed; if it does not, step S317 is performed.
Step S317: determine that the website to be verified is an empty shell website.
Step S318: determine that the website to be verified is a normal website.
It should be noted that the execution order of steps S311 to S318 above is one preferred embodiment of this solution; the order may also be changed when making the empty shell determination. It should also be noted that steps S311 to S318 may be executed repeatedly over a predetermined period, for example 5 days. This loop protection measure reduces the false positive rate for empty shell websites and maximizes safety while maintaining the accuracy of existing data.
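The branch structure of steps S312 to S318 can be sketched as a chain of early returns. The `Site` fields below are hypothetical flags that stand in for the whitelist lookup, the access log query, the filing status check, the registration check, and the IP comparison; they are precomputed here only to keep the sketch self-contained.

```python
from dataclasses import dataclass


@dataclass
class Site:
    # Hypothetical, precomputed check results for one website.
    domain: str
    whitelisted: bool
    accessed_within_60_days: bool
    filing_or_change_in_progress: bool
    registered: bool
    resolves_to_provider_ip: bool


def classify(site):
    """Follow the preferred order of steps S312 to S318."""
    if site.whitelisted:                   # S312 -> S318
        return "normal"
    if site.accessed_within_60_days:       # S313 -> S318
        return "normal"
    if site.filing_or_change_in_progress:  # S314 -> S318
        return "normal"
    if not site.registered:                # S315 -> S317
        return "empty_shell"
    if site.resolves_to_provider_ip:       # S316 -> S318
        return "normal"
    return "empty_shell"                   # S316 -> S317


site = Site("w1.example", False, False, False, True, False)
print(classify(site))
```

Because each early return maps to one step, reordering the checks, which the text allows, only means reordering the `if` statements.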
Step S32: notify customers of the websites determined to be empty shell websites.
Specifically, the cleaning server may notify customers, that is, the sponsors of the empty shell websites, in batches of the domain names determined to be empty shell websites. The cleaning server obtains the customer's contact details, such as mobile phone number and email address, and uses them to notify the customer to make adjustments. The cleaning server may automatically call the customer's mobile phone number and play a message stating which websites need adjustment and the specific adjustment plan. For customers who cannot be reached by phone, the cleaning server may send the adjustment plan to the customer's email address. After notifying the customer, the cleaning server calls back the message center to confirm that the customer has received the adjustment notice, so that no website is cleaned up without the customer's knowledge. Customers who cannot be notified are transferred to manual processing.
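The notification fallback in step S32 (phone call first, then email, then manual processing) can be sketched as below. The function name `notify_sponsor`, the contact dictionary layout, and the `call_phone` and `send_email` callables are all assumptions; each callable is expected to return True when delivery succeeds. The callback to the message center is not modeled.

```python
def notify_sponsor(contact, message, call_phone, send_email):
    """Hypothetical sketch: return the channel that reached the sponsor."""
    phone = contact.get("phone")
    if phone and call_phone(phone, message):
        return "phone"
    email = contact.get("email")
    if email and send_email(email, message):
        return "email"
    # Neither channel worked: hand over to manual processing.
    return "manual"


# Stub channels for illustration: the call fails, the email succeeds.
channel = notify_sponsor(
    {"phone": "000-0000", "email": "u1@example.com"},
    "Website w1.example needs adjustment.",
    call_phone=lambda number, text: False,
    send_email=lambda address, text: True,
)
print(channel)
```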
Step S33: the customers of the empty shell websites rectify the empty shell websites.
Specifically, after receiving the rectification plan sent by the cleaning server, the customer may rectify the empty shell website accordingly.
Step S34: the cleaning server re-checks and cleans up the websites determined to be empty shell websites.
Specifically, five calendar days after notifying the customer, the cleaning server verifies the websites previously determined to be empty shell websites once more according to steps S311 to S318. Websites that the customer has adjusted satisfactorily are not cleaned up. For websites that were not adjusted or whose adjustment is unsatisfactory (that is, websites again determined to be empty shell websites), the cleaning server generates a message and submits it to the communications administration so that access for the empty shell websites is cancelled.
It should be noted that, for simplicity of description, the foregoing method embodiments are each expressed as a series of combined actions. Those skilled in the art should understand, however, that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily required by the present invention.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the essence of the technical solution of the present invention, or the part that contributes over the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods of the embodiments of the present invention.
Embodiment 2
According to an embodiment of the present invention, an apparatus for detecting an empty shell website, used to implement the above method for detecting an empty shell website, is further provided. As shown in FIG. 4, the apparatus includes an extracting unit 40, an invoking unit 42, and an output unit 44.
The extracting unit 40 is configured to extract a plurality of websites to be detected. The invoking unit 42 is configured to invoke one or more empty shell website detection conditions from a website detection condition set and use them to determine whether any one of the websites is an empty shell website. The output unit 44 is configured to output the websites whose detection result is an empty shell website.
It should be noted that the extracting unit 40, the invoking unit 42, and the output unit 44 correspond to steps A to C in Embodiment 1. The three units implement the same examples and application scenarios as the corresponding steps, but are not limited to the content disclosed in Embodiment 1. It should also be noted that the above modules may, as part of the apparatus, run in the computer terminal 10 provided in Embodiment 1.
As can be seen from the above, in the solution disclosed in Embodiment 2 of the present application, if a website access provider wishes to clean up empty shell websites in the filing server, the detection server may first extract the information of a plurality of websites from the filing system, then invoke one or more empty shell website detection conditions from a preset website detection condition set to detect those websites, and finally output the websites whose detection result is an empty shell website. Because the access provider only needs to send a detection instruction to the detection server, which then automatically invokes the preset detection conditions to judge a large number of websites, the solution provided by the embodiment of the present invention does not require large amounts of manpower to identify empty shell websites. In addition, because the detection server automatically identifies empty shell websites among the plurality of websites, the number of websites that can be identified is not limited. The detection server can therefore identify empty shell websites automatically and in batches, which avoids the long cycle of manual identification in the prior art. Moreover, because the detection server checks each website by invoking one or more detection conditions, the accuracy of empty shell website detection is greatly improved, so empty shell websites can be detected accurately and quickly among a large number of websites. The solution of Embodiment 2 thus solves the technical problem that the prior-art approach of manually distinguishing, detecting, and cleaning up empty shell websites is prone to missed detections, resulting in low accuracy of the detection results.
Optionally, as shown in FIG. 5, the invoking unit 42 may include an invoking module 421.
The invoking module 421 is configured to invoke the empty shell website detection conditions in the website detection condition set according to a predetermined invocation rule, where the predetermined invocation rule includes any one or more of the following: invocation order, invocation quantity, and invocation type.
It should be noted that the invoking module 421 corresponds to step S141 in Embodiment 1. The module implements the same examples and application scenarios as the corresponding step, but is not limited to the content disclosed in Embodiment 1. It should also be noted that the above module may, as part of the apparatus, run in the computer terminal 10 provided in Embodiment 1.
Optionally, as shown in FIG. 6, the invoking unit 42 may further include a determining module 422.
The determining module 422 is configured to use the other empty shell website detection conditions among the plurality of empty shell website detection conditions to determine whether the first website is an empty shell website. When all of the empty shell website detection conditions determine that the first website is an empty shell website, the first website is determined to be an empty shell website; otherwise, when any one of the empty shell website detection conditions determines that the first website is not an empty shell website, the first website is determined to be a legitimate website.
It should be noted that the determining module 422 corresponds to step S142 in Embodiment 1. The module implements the same examples and application scenarios as the corresponding step, but is not limited to the content disclosed in Embodiment 1. It should also be noted that the above module may, as part of the apparatus, run in the computer terminal 10 provided in Embodiment 1.
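The rule applied by the determining module 422 (a website is an empty shell only if every invoked condition says so) can be sketched with `all`. The function name `judge` and the shape of a condition (a callable returning True when it judges the website to be an empty shell) are assumptions for illustration.

```python
def judge(site, conditions):
    """Hypothetical sketch: unanimous conditions are needed for an empty shell verdict."""
    if not conditions:
        # With no conditions there is nothing to base a verdict on.
        raise ValueError("at least one detection condition is required")
    # all() stops at the first condition that says "not an empty shell".
    if all(condition(site) for condition in conditions):
        return "empty_shell"
    return "legitimate"


# Illustrative conditions over a plain dictionary; the keys are assumptions.
conditions = [
    lambda s: not s["whitelisted"],
    lambda s: not s["recently_accessed"],
]
print(judge({"whitelisted": False, "recently_accessed": False}, conditions))
print(judge({"whitelisted": False, "recently_accessed": True}, conditions))
```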
Optionally, the empty shell website detection conditions include any one or more of the following types: whether the website is in the whitelist, whether it is in the process of filing or of a filing change, whether an access record exists within a predetermined time, whether it is registered, and whether reporting information exists in the resolution result.
Optionally, as shown in FIG. 7, the invoking unit 42 may further include an obtaining module 423, a query module 424, and a second determining module 425.
The obtaining module 423 is configured to obtain the access logs, recorded in the server, of the domain names of the plurality of websites. The query module 424 is configured to query the access logs, according to the domain name of the website, for whether an access record was recorded within a predetermined time. The second determining module 425 is configured to determine that the website is a legitimate website if an access record was recorded within the predetermined time.
It should be noted that the obtaining module 423, the query module 424, and the second determining module 425 correspond to steps S1414 to S1416 in Embodiment 1. The three modules implement the same examples and application scenarios as the corresponding steps, but are not limited to the content disclosed in Embodiment 1. It should also be noted that the above modules may, as part of the apparatus, run in the computer terminal 10 provided in Embodiment 1.
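The lookup performed by the query module 424 can be sketched as a scan over access log entries. The log layout (a list of `(domain, timestamp)` pairs), the function name `has_recent_access`, and the 60-day default window (borrowed from the preferred embodiment) are assumptions; a production system would query an indexed store rather than scan a list.

```python
from datetime import datetime, timedelta


def has_recent_access(domain, access_log, now, window_days=60):
    """Hypothetical sketch: True if `domain` was accessed within the window."""
    cutoff = now - timedelta(days=window_days)
    return any(
        logged_domain == domain and timestamp >= cutoff
        for logged_domain, timestamp in access_log
    )


# Illustrative log entries.
log = [
    ("w1.example", datetime(2015, 9, 25, 8, 30)),
    ("w2.example", datetime(2015, 3, 1, 12, 0)),
]
now = datetime(2015, 10, 1)
print(has_recent_access("w1.example", log, now))
print(has_recent_access("w2.example", log, now))
```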
Optionally, as shown in FIG. 8, the invoking unit 42 may further include a second reading module 426, a second judging module 427, and a third determining module 428.
The second reading module 426 is configured to read the website information of the website, where the website information includes the filing status of the domain name of the website. The second judging module 427 is configured to judge whether the filing status of the domain name of the website is "filing in progress" or "filing change in progress". The third determining module 428 is configured to determine that the website is a legitimate website when the filing status of its domain name is "filing in progress" or "filing change in progress".
It should be noted that the second reading module 426, the second judging module 427, and the third determining module 428 correspond to steps S1417 to S1419 in Embodiment 1. The three modules implement the same examples and application scenarios as the corresponding steps, but are not limited to the content disclosed in Embodiment 1. It should also be noted that the above modules may, as part of the apparatus, run in the computer terminal 10 provided in Embodiment 1.
Optionally, as shown in FIG. 9, the invoking unit 42 may further include a third reading module 429, a second query module 430, and a fourth determining module 431.
The third reading module 429 is configured to read the website information of the website. The second query module 430 is configured to query a registration information table for information matching the website information of the website. The fourth determining module 431 is configured to, when the matching succeeds, determine the type of the website according to whether reporting information exists in the resolution result of the website.
It should be noted that the third reading module 429, the second query module 430, and the fourth determining module 431 correspond to steps S1420 to S1422 in Embodiment 1. The three modules implement the same examples and application scenarios as the corresponding steps, but are not limited to the content disclosed in Embodiment 1. It should also be noted that the above modules may, as part of the apparatus, run in the computer terminal 10 provided in Embodiment 1.
Optionally, as shown in FIG. 10, the fourth determining module 431 may further include a sub-determining module 4311.
The sub-determining module 4311 is configured to determine the type of the website according to the IP address obtained by resolving the domain name of the website. When the IP address of the website is the same as an IP address already recorded by the access provider server, the website is determined to be a normal website; when the IP address of the website differs from every IP address already recorded by the access provider server, the website is determined to be an empty shell website.
It should be noted that the sub-determining module 4311 corresponds to steps S14221 to S14222 in Embodiment 1. The module implements the same examples and application scenarios as the corresponding steps, but is not limited to the content disclosed in Embodiment 1. It should also be noted that the above module may, as part of the apparatus, run in the computer terminal 10 provided in Embodiment 1.
Optionally, as shown in FIG. 11, the apparatus provided in this embodiment may further include a data distribution unit 46 and a detecting unit 48.
The data distribution unit 46 is configured to write the website information of the plurality of websites into a data queue in turn by starting at least n data distribution threads. The detecting unit 48 is configured to read the website information of the plurality of websites from the data queue in turn by starting at least m detection threads, where m and n are adjusted automatically according to a preset total detection time, m is greater than or equal to n, and m and n are natural numbers.
It should be noted that the data distribution unit 46 and the detecting unit 48 correspond to steps S18 to S19 in Embodiment 1. The two units implement the same examples and application scenarios as the corresponding steps, but are not limited to the content disclosed in Embodiment 1. It should also be noted that the above modules may, as part of the apparatus, run in the computer terminal 10 provided in Embodiment 1.
实施例3Example 3
根据本发明实施例,还提供了一种用于实施上述检测空壳网站的方法的检测空壳网站的系统,如图12所示,该系统包括:备案服务器1200,检测服务器1210。According to an embodiment of the present invention, a system for detecting a shell website for implementing the method for detecting a shell website is further provided. As shown in FIG. 12, the system includes: a record server 1200, and a detection server 1210.
The filing server 1200 is configured to store information on a plurality of websites. The detection server 1210 establishes a communication relationship with the filing server and is configured to, after extracting the plurality of websites to be detected from the filing server, invoke one or more empty shell website detection conditions from a website detection condition set, use the one or more empty shell website detection conditions to determine whether any one of the websites is an empty shell website, and output the websites whose detection result is an empty shell website.
In an optional embodiment provided by the present application, the detection server 1210 is further configured to invoke the empty shell website detection conditions in the website detection condition set according to a predetermined invocation rule, where the predetermined invocation rule includes any one or more of the following: invocation order, invocation quantity, and invocation type.
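A minimal sketch of how such an invocation rule might be applied is given below. The `Condition` record, its `kind` label, and the `select_conditions` function are hypothetical names introduced only for illustration; the application does not specify how the rule is encoded.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Condition:
    name: str
    kind: str                      # hypothetical type label, e.g. "local" or "network"
    check: Callable[[dict], bool]  # True when the site looks like an empty shell


def select_conditions(condition_set, order=None, count=None, kinds=None):
    """Pick conditions by invocation type, then order, then quantity."""
    selected = list(condition_set)
    if kinds is not None:
        # Invocation type: keep only conditions of the requested kinds.
        selected = [c for c in selected if c.kind in kinds]
    if order is not None:
        # Invocation order: named conditions first, the rest keep their order.
        rank = {name: i for i, name in enumerate(order)}
        selected.sort(key=lambda c: rank.get(c.name, len(rank)))
    if count is not None:
        # Invocation quantity: keep at most `count` conditions.
        selected = selected[:count]
    return selected
```

The sketch filters by type before ordering and truncating, so the quantity limit applies to the conditions that will actually run.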
In an optional embodiment provided by the present application, the detection server 1210 is further configured to, after any one of the plurality of empty shell website detection conditions determines that a first website among the plurality of websites is an empty shell website, use the other empty shell website detection conditions to determine whether the first website is an empty shell website. If all of the empty shell website detection conditions determine that the first website is an empty shell website, the first website is determined to be an empty shell website. Otherwise, if any one of the empty shell website detection conditions determines that the first website is not an empty shell website, the first website is determined to be a legitimate website.
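The combination logic in this embodiment amounts to a logical AND over the invoked conditions. The following sketch assumes each condition is a callable that returns True when it judges the site to be an empty shell; the function name `classify` and the string labels are illustrative only.

```python
def classify(site, conditions):
    """Return "empty_shell" only if every condition flags the site.

    Each entry in `conditions` is assumed to be a callable that takes the
    site record and returns True when it judges the site an empty shell.
    """
    if not conditions:
        raise ValueError("at least one detection condition is required")
    for condition in conditions:
        if not condition(site):
            # A single dissenting condition is enough to treat the site as legitimate.
            return "legitimate"
    return "empty_shell"
```

Stopping at the first dissenting condition avoids running the remaining checks for sites that are already known to be legitimate.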
In an optional embodiment provided by the present application, the empty shell website detection conditions include any one or more of the following types: whether the website is in a whitelist; whether its filing is in progress or a filing change is in progress; whether an access record exists within a predetermined time; and whether the website is registered and whether report information exists for the resolution result.
It can thus be seen that, in the solution disclosed in Embodiment 3 of the present application, if a website access provider wishes to clean up the empty shell websites in the filing server, the detection server may first extract the information of the plurality of websites from the filing system, then test those websites by invoking one or more empty shell website detection conditions from the preset website detection condition set, and finally output the websites whose detection result is an empty shell website. It is easy to see that, while the detection server determines the empty shell websites, the access provider only needs to send a detection instruction to the detection server, and the detection server then automatically invokes the preset empty shell website detection conditions to judge a large number of websites. The solution provided by the embodiments of the present invention therefore does not require a large amount of manpower to identify empty shell websites. Because the detection server identifies empty shell websites among the plurality of websites automatically, the number of websites that can be identified is not limited. This allows the detection server to identify empty shell websites automatically and in batches, avoiding the long cycle of manual identification in the prior art. Moreover, because the detection server tests each website by invoking one or more detection conditions, the accuracy of empty shell website detection is greatly improved, so empty shell websites can be detected accurately and quickly among a large number of websites. The solution of Embodiment 3 thus solves the technical problem that the prior-art approach of detecting and cleaning up empty shell websites by manual inspection is prone to missed detections, resulting in low accuracy of the detection results.
In an optional embodiment provided by the present application, when the empty shell website detection condition is whether a website is in the whitelist, the detection server 1210 is configured to: read the website information of the website; determine whether the website information matches the website information stored in the whitelist; and, if the match succeeds, determine that the website is a legitimate website.
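A minimal sketch of the whitelist condition follows. It assumes, purely for illustration, that the website information is a dictionary with a `domain` field and that matching is done on the normalized domain name; the application does not state which fields are compared.

```python
def whitelist_verdict(site_info, whitelist):
    """Return "legitimate" on a whitelist match, otherwise "undetermined".

    `whitelist` is assumed to be a set of lower-case domain names.
    """
    domain = site_info["domain"].strip().lower()
    if domain in whitelist:
        return "legitimate"
    # No match: other detection conditions still have to decide.
    return "undetermined"
```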
In an optional embodiment provided by the present application, when the empty shell website detection condition is whether a website has an access record within a predetermined time, the detection server 1210 is configured to: obtain the access logs recorded in the server for the domain names of the plurality of websites; query the access logs, by the domain name of the website, for an access record within the predetermined time; and, if an access record exists within the predetermined time, determine that the website is a legitimate website.
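The access-record condition can be sketched as a time-window query. The log layout (a list of `(domain, timestamp)` pairs), the function name, and the 30-day default window are assumptions for this sketch; the application leaves the log format and the length of the predetermined time open.

```python
from datetime import datetime, timedelta


def has_recent_access(domain, access_log, now, window=timedelta(days=30)):
    """Return True if `domain` has an access record within `window` before `now`.

    `access_log` is assumed to be an iterable of (domain, datetime) pairs.
    """
    cutoff = now - window
    return any(
        entry_domain == domain and cutoff <= timestamp <= now
        for entry_domain, timestamp in access_log
    )
```

A site for which this returns True would be treated as legitimate; a site with no recent record remains a candidate for the other conditions.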
In an optional embodiment provided by the present application, when the empty shell website detection condition is whether the filing of a website is in progress or a filing change is in progress, the detection server 1210 is configured to: read the website information of the website, where the website information includes the filing status of the domain name of the website; determine whether the filing status of the domain name is "filing in progress" or "filing change in progress"; and, if so, determine that the website is a legitimate website.
In an optional embodiment provided by the present application, when the empty shell website detection condition is whether a website is registered and whether report information exists for the resolution result, the detection server 1210 is configured to: read the website information of the website; query a registration information table for information matching the website information; and, if the match succeeds, determine the type of the website according to whether report information exists for the result of resolving the website.
In an optional embodiment provided by the present application, the detection server 1210 is further configured to determine that a website is a legitimate website if the IP address of the website is the same as an IP address already recorded by the access provider server, and to determine that the website is an empty shell website if the IP address of the website differs from every IP address already recorded by the access provider server.
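The registration and IP comparison described in the two preceding paragraphs might look as follows. To keep the sketch free of network access, the resolved addresses are passed in as a precomputed mapping; that mapping, the function name, and the "undetermined" label for unregistered domains are assumptions, not details taken from the application.

```python
def registration_verdict(site_info, registered_domains, resolved_ips, provider_ips):
    """Classify a site from its registration and its resolved IP addresses.

    registered_domains: set of domains found in the registration information table.
    resolved_ips: dict mapping a domain to the list of IPs it resolves to.
    provider_ips: set of IPs already recorded by the access provider server.
    """
    domain = site_info["domain"]
    if domain not in registered_domains:
        # No match in the registration table: this condition cannot decide.
        return "undetermined"
    ips = resolved_ips.get(domain, [])
    if any(ip in provider_ips for ip in ips):
        return "legitimate"
    # Every resolved IP differs from the provider's recorded IPs.
    return "empty_shell"
```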
In an optional embodiment provided by the present application, the detection server 1210 is further configured to start at least n data distribution threads that sequentially write the website information of the plurality of websites into a data queue, and to start at least m detection threads that sequentially read the website information of the plurality of websites from the data queue. Here, m and n are adjusted automatically according to a preset total detection time, m is greater than or equal to n, and m and n are natural numbers.
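The application does not say how m and n are derived from the preset total detection time. One simple possibility, offered here only as an assumption, is to divide the estimated total work by the time budget. The per-site cost parameters and the function name `thread_counts` are hypothetical.

```python
import math


def thread_counts(num_sites, total_seconds, write_seconds_per_site, check_seconds_per_site):
    """Estimate (n, m) so that both stages fit into `total_seconds`.

    n: number of data distribution threads, m: number of detection threads.
    Assumes work divides evenly across threads, which is an idealization.
    """
    n = max(1, math.ceil(num_sites * write_seconds_per_site / total_seconds))
    m = max(n, math.ceil(num_sites * check_seconds_per_site / total_seconds))
    return n, m
```

For instance, 100,000 sites in a one-hour budget at 0.5 s per check need about 50,000 / 3,600 ≈ 13.9 thread-hours of checking, so the sketch rounds up to 14 detection threads, while a single distribution thread suffices at 1 ms per write.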
In an optional embodiment provided by the present application, the website information of each website further includes the terminal address of the sponsor of that website. After outputting the websites whose detection result is an empty shell website, the detection server 1210 is further configured to send alarm information to the terminal address of the sponsor of each website determined to be an empty shell website, where the alarm information includes at least the domain name of the empty shell website.
In an optional embodiment provided by the present application, after sending the alarm information to the terminal address of the sponsor of a website determined to be an empty shell website, the detection server 1210 is further configured to: repeat steps A to C after a preset duration has elapsed, to obtain the websites that are again determined to be empty shell websites; record those websites as websites to be cleaned up; and send the domain names of the websites to be cleaned up to a target server.
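The alarm and re-check flow can be sketched as below. The callbacks `detect`, `send_alarm`, and `wait` are hypothetical stand-ins for re-running steps A to C, the notification channel, and the preset duration; sending the resulting domain names to the target server is left to the caller.

```python
def notify_and_recheck(first_pass, detect, send_alarm, wait):
    """Alarm the sponsors, wait, re-run detection, and return sites to clean up.

    first_pass: dict mapping each flagged domain to its sponsor's terminal address.
    detect: callable that re-runs steps A to C and returns the flagged domains.
    send_alarm: callable(address, message) delivering the alarm information.
    wait: callable that blocks until the preset duration has elapsed.
    """
    for domain, address in first_pass.items():
        # The alarm information includes at least the empty shell site's domain name.
        send_alarm(address, f"Domain {domain} was detected as an empty shell website")
    wait()
    second_pass = set(detect())
    # Only sites flagged in both passes are recorded as websites to be cleaned up.
    return sorted(domain for domain in first_pass if domain in second_pass)
```

Requiring a second positive result gives the sponsor time to transfer the filing information before the site is queued for clean-up.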
Embodiment 4
An embodiment of the present invention may provide a computer terminal, which may be any computer terminal device in a group of computer terminals. Optionally, in this embodiment, the computer terminal may also be replaced with a terminal device such as a mobile terminal.
Optionally, in this embodiment, the computer terminal may be located in at least one of a plurality of network devices of a computer network.
In this embodiment, the computer terminal may execute program code for the following steps of the method for detecting an empty shell website: extracting a plurality of websites to be detected; invoking one or more empty shell website detection conditions from a website detection condition set, and using the one or more empty shell website detection conditions to determine whether any one of the websites is an empty shell website; and outputting the websites whose detection result is an empty shell website.
Optionally, FIG. 13 is a structural block diagram of a computer terminal according to an embodiment of the present invention. As shown in FIG. 13, the computer terminal A may include one or more processors 510 (only one is shown in the figure), a memory 530, and a transmission device 550.
The memory may be used to store software programs and modules, such as the program instructions/modules corresponding to the method and apparatus for detecting an empty shell website in the embodiments of the present invention. By running the software programs and modules stored in the memory, the processor executes various functional applications and data processing, that is, implements the above method for detecting an empty shell website. The memory may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory located remotely from the processor, and such remote memory may be connected to the terminal A through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor may invoke, through the transmission device, the information and application programs stored in the memory to perform the following steps. Step A: extract a plurality of websites to be detected. Step B: invoke one or more empty shell website detection conditions from a website detection condition set, and use the one or more empty shell website detection conditions to determine whether any one of the websites is an empty shell website. Step C: output the websites whose detection result is an empty shell website.
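Steps A to C can be read as a small pipeline. The sketch below is illustrative only: the record layout (dictionaries with a `domain` key), the convention that a condition returns True for an empty shell, and the function name `detect_empty_shells` are all assumptions.

```python
def detect_empty_shells(filing_records, conditions):
    """Run steps A to C over `filing_records` and return the flagged domains.

    filing_records: iterable of dicts, each with at least a "domain" key.
    conditions: list of callables returning True when a site looks like an empty shell.
    """
    # Step A: extract the websites to be detected.
    sites = list(filing_records)

    # Step B: apply the invoked conditions to every website.
    flagged = [
        site for site in sites
        if conditions and all(check(site) for check in conditions)
    ]

    # Step C: output the websites whose detection result is "empty shell".
    return [site["domain"] for site in flagged]
```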
Optionally, the processor may further execute program code for the following step: invoking the empty shell website detection conditions in the website detection condition set according to a predetermined invocation rule, where the predetermined invocation rule includes any one or more of the following: invocation order, invocation quantity, and invocation type.
Optionally, the processor may further execute program code for the following step: after any one of the plurality of empty shell website detection conditions determines that a first website among the plurality of websites is an empty shell website, using the other empty shell website detection conditions to determine whether the first website is an empty shell website; if all of the empty shell website detection conditions determine that the first website is an empty shell website, determining that the first website is an empty shell website; otherwise, if any one of the empty shell website detection conditions determines that the first website is not an empty shell website, determining that the first website is a legitimate website.
Optionally, in the program code executed by the processor, the empty shell website detection conditions include any one or more of the following types: whether the website is in a whitelist; whether its filing is in progress or a filing change is in progress; whether an access record exists within a predetermined time; and whether the website is registered and whether report information exists for the resolution result.
Optionally, the processor may further execute program code for the following steps: when the empty shell website detection condition is whether a website is in the whitelist, the step of using the empty shell website detection condition to determine whether the website is an empty shell website includes: reading the website information of the website; determining whether the website information matches the website information stored in the whitelist; and, if the match succeeds, determining that the website is a legitimate website.
Optionally, the processor may further execute program code for the following steps: when the empty shell website detection condition is whether a website has an access record within a predetermined time, the step of using the empty shell website detection condition to determine whether the website is an empty shell website includes: obtaining the access logs recorded in the server for the domain names of the plurality of websites; querying the access logs, by the domain name of the website, for an access record within the predetermined time; and, if an access record exists within the predetermined time, determining that the website is a legitimate website.
Optionally, the processor may further execute program code for the following steps: when the empty shell website detection condition is whether the filing of a website is in progress or a filing change is in progress, the step of using the empty shell website detection condition to determine whether the website is an empty shell website includes: reading the website information of the website, where the website information includes the filing status of the domain name of the website; determining whether the filing status of the domain name is "filing in progress" or "filing change in progress"; and, if so, determining that the website is a legitimate website.
Optionally, the processor may further execute program code for the following steps: when the empty shell website detection condition is whether a website is registered and whether report information exists for the resolution result, the step of using the empty shell website detection condition to determine whether the website is an empty shell website includes: reading the website information of the website; querying a registration information table for information matching the website information; and, if the match succeeds, determining the type of the website according to whether report information exists for the result of resolving the website.
Optionally, the processor may further execute program code for the following steps: determining the type of the website according to whether report information exists for the result of resolving the website includes: if the IP address of the website is the same as an IP address already recorded by the access provider server, determining that the website is a legitimate website; and, if the IP address of the website differs from every IP address already recorded by the access provider server, determining that the website is an empty shell website.
Optionally, the processor may further execute program code for the following steps: after extracting the plurality of websites to be detected, the method further includes: starting at least n data distribution threads that sequentially write the website information of the plurality of websites into a data queue; and starting at least m detection threads that sequentially read the website information of the plurality of websites from the data queue, where m and n are adjusted automatically according to a preset total detection time, m is greater than or equal to n, and m and n are natural numbers.
Optionally, the processor may further execute program code for the following steps: the website information of each website further includes the terminal address of the sponsor of that website, and, after outputting the websites whose detection result is an empty shell website, the method further includes: sending alarm information to the terminal address of the sponsor of each website determined to be an empty shell website, where the alarm information includes at least the domain name of the empty shell website.
Optionally, the processor may further execute program code for the following steps: after sending the alarm information to the terminal address of the sponsor of a website determined to be an empty shell website, the method further includes:
repeating steps A to C after a preset duration has elapsed, to obtain the websites that are again determined to be empty shell websites; recording those websites as websites to be cleaned up; and sending the domain names of the websites to be cleaned up to a target server.
The embodiments of the present invention provide a method for detecting an empty shell website: a plurality of websites to be detected are extracted; one or more empty shell website detection conditions are invoked from a website detection condition set and used to determine whether any one of the websites is an empty shell website; and the websites whose detection result is an empty shell website are output. This solves the technical problem that the prior-art approach of detecting and cleaning up empty shell websites by manual inspection is prone to missed detections, resulting in low accuracy of the detection results.
A person of ordinary skill in the art will understand that the structure shown in FIG. 13 is only illustrative. The computer terminal may also be a terminal device such as a smartphone (for example, an Android phone or an iOS phone), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), or a PAD. FIG. 13 does not limit the structure of the above electronic apparatus. For example, the computer terminal 10 may include more or fewer components than shown in FIG. 13 (such as a network interface or a display device), or may have a configuration different from that shown in FIG. 13.
A person of ordinary skill in the art will understand that all or some of the steps of the methods in the above embodiments may be completed by a program instructing hardware related to the terminal device. The program may be stored in a computer-readable storage medium, and the storage medium may include a flash drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.
Embodiment 5
An embodiment of the present invention further provides a storage medium. Optionally, in this embodiment, the storage medium may be used to store the program code executed by the method for detecting an empty shell website provided in Embodiment 1.
Optionally, in this embodiment, the storage medium may be located in any computer terminal in a group of computer terminals in a computer network, or in any mobile terminal in a group of mobile terminals.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: extracting a plurality of websites to be detected; invoking one or more empty shell website detection conditions from a website detection condition set, and using the one or more empty shell website detection conditions to determine whether any one of the websites is an empty shell website; and outputting the websites whose detection result is an empty shell website.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis. For parts that are not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
In the several embodiments provided by the present application, it should be understood that the disclosed technical content may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a standalone product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, and other media capable of storing program code.
The above is only a preferred implementation of the present invention. It should be noted that a person of ordinary skill in the art may make a number of improvements and refinements without departing from the principles of the present invention, and these improvements and refinements shall also be regarded as falling within the scope of protection of the present invention.

Claims (25)

  1. A method for detecting an empty shell website, characterized by comprising:
    step A, extracting a plurality of websites to be detected;
    step B, invoking one or more empty shell website detection conditions from a website detection condition set, and using the one or more empty shell website detection conditions to determine whether any one of the websites is an empty shell website;
    step C, outputting the websites whose detection result is the empty shell website.
  2. 根据权利要求1所述的方法,其特征在于,按照预定的调用规则调用所述网站检测条件集合中的空壳网站检测条件,其中,所述预定的调用规则包括如下任意一个或多个规则:调用顺序、调用数量和调用类型。The method according to claim 1, wherein the empty shell website detection condition in the website detection condition set is invoked according to a predetermined calling rule, wherein the predetermined calling rule comprises any one or more of the following rules: Call order, number of calls, and type of call.
  3. 根据权利要求1所述的方法,其特征在于,在所述多个空壳网站检测条件中的任意一个空壳网站检测条件判定所述多个网站中的第一网站是所述空壳网站之后,使用所述多个空壳网站检测条件中的其他空壳网站条件来判定所述第一网站是否为所述空壳网站,在所有空壳网站检测条件都判定所述第一网站是所述空壳网站的情况下,确定所述第一网站为所述空壳网站,否则,在所述任意一个空壳网站检测条件判定所述第一网站不是所述空壳网站的情况下,确定所述第一网站为合法网站。The method according to claim 1, wherein any one of the plurality of empty shell website detection conditions determines that the first website of the plurality of websites is the empty shell website Determining whether the first website is the empty website website by using the other empty shell website conditions in the plurality of empty shell website detection conditions, and determining that the first website is the In the case of the empty shell website, determining that the first website is the empty shell website, otherwise, in the case that the any one of the empty shell websites detects that the first website is not the empty website, The first website is a legal website.
  4. The method according to any one of claims 1 to 3, characterized in that the empty shell website detection conditions comprise any one or more of the following types: whether the website is on a whitelist, whether its filing is in progress or its filing change is in progress, whether an access record exists within a predetermined time, and whether it is registered and whether its resolution result has report information.
  5. The method according to claim 4, characterized in that, when the empty shell website detection condition is detecting whether the website is on the whitelist, the step of using the empty shell website detection condition to determine whether any one of the websites is an empty shell website comprises:
    reading the website information of the website;
    determining whether the website information of the website matches website information stored in the whitelist;
    where the match succeeds, determining that the website is a legitimate website.
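The whitelist condition of claim 5 might look like the sketch below. The claim only says that "website information" is matched; matching on a normalized domain name, the `"domain"` key and the function names are assumptions made for illustration.

```python
from typing import Iterable


def normalize(domain: str) -> str:
    """Lower-case a domain and drop surrounding spaces and a trailing dot."""
    return domain.strip().lower().rstrip(".")


def is_whitelisted(site: dict, whitelist: Iterable[str]) -> bool:
    """Return True if the site's domain matches a whitelist entry (legitimate)."""
    allowed = {normalize(d) for d in whitelist}
    return normalize(site["domain"]) in allowed
```

A site for which `is_whitelisted` returns `True` would be treated as legitimate without consulting the remaining conditions.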
  6. The method according to claim 4, characterized in that, when the empty shell website detection condition is detecting whether the website has an access record within a predetermined time, the step of using the empty shell website detection condition to determine whether any one of the websites is an empty shell website comprises:
    obtaining the access logs of the domain names of the plurality of websites as recorded in a server;
    querying the access logs, according to the domain name of the website, for whether an access record was recorded within the predetermined time;
    if the access record was recorded within the predetermined time, determining that the website is a legitimate website.
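The access-record condition of claim 6 can be illustrated as a lookup over a time window. The log format (pairs of domain and timestamp) and the 30-day default window are assumptions; the claims do not fix either.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple


def has_recent_access(
    domain: str,
    access_log: Iterable[Tuple[str, datetime]],
    now: datetime,
    window: timedelta = timedelta(days=30),  # assumed "predetermined time"
) -> bool:
    """Return True if the log holds an access to `domain` within the window."""
    cutoff = now - window
    return any(d == domain and cutoff <= ts <= now for d, ts in access_log)
```

A `True` result would mark the site as legitimate under this condition; a `False` result leaves it as an empty shell candidate for the other conditions to confirm.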
  7. The method according to claim 4, characterized in that, when the empty shell website detection condition is detecting whether the website's filing is in progress or its filing change is in progress, the step of using the empty shell website detection condition to determine whether any one of the websites is an empty shell website comprises:
    reading the website information of the website, wherein the website information comprises the filing status of the domain name of the website;
    determining whether the filing status of the domain name of the website is filing in progress or filing change in progress;
    where the filing status of the domain name of the website is filing in progress or filing change in progress, determining that the website is a legitimate website.
  8. The method according to claim 4, characterized in that, when the empty shell website detection condition is whether the website is registered and whether its resolution result has report information, the step of using the empty shell website detection condition to determine whether any one of the websites is an empty shell website comprises:
    reading the website information of the website;
    querying a registration information table for information matching the website information of the website;
    where the match succeeds, determining the type of the website according to whether the result of resolving the website has report information.
  9. The method according to claim 8, characterized in that determining the type of the website according to whether the result of resolving the website has report information comprises:
    where the IP address of the website is the same as an IP address already recorded by the access provider server, determining that the website is a legitimate website;
    where the IP address of the website differs from every IP address already recorded by the access provider server, determining that the website is an empty shell website.
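Claims 8 and 9 together describe a registration lookup followed by a comparison of the resolved IP address with the addresses recorded by the access provider. The sketch below assumes the resolver is passed in as a callable (for instance a wrapper around the standard library's `socket.gethostbyname_ex`), and it returns `None` for the cases the claims leave open, namely an unregistered domain or a failed resolution. All names are illustrative.

```python
from typing import Callable, Iterable, Optional


def classify_by_resolution(
    domain: str,
    registered_domains: Iterable[str],
    provider_ips: Iterable[str],
    resolve: Callable[[str], Iterable[str]],
) -> Optional[str]:
    """Return "legitimate", "shell", or None when the claims give no verdict."""
    if domain not in set(registered_domains):
        return None  # not found in the registration information table
    resolved = list(resolve(domain))
    if not resolved:
        return None  # assumption: an unresolvable domain is left undecided
    recorded = set(provider_ips)
    if any(ip in recorded for ip in resolved):
        return "legitimate"  # resolved IP matches one the provider recorded
    return "shell"  # resolved IP differs from every recorded address
```

Injecting the resolver keeps the comparison logic testable without network access.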
  10. The method according to claim 1, characterized in that, after extracting the plurality of websites to be detected, the method further comprises:
    writing the website information of the plurality of websites into a data queue in sequence by starting at least n data distribution threads;
    reading the website information of the plurality of websites from the data queue in sequence by starting at least m detection threads;
    wherein m and n are adjusted automatically according to a preset total detection time, m is greater than or equal to n, and m and n are natural numbers.
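The distribution and detection threads of claim 10 follow a producer/consumer pattern over a shared queue. The sketch below uses Python's `queue.Queue` and `threading.Thread`; it takes n and m as fixed arguments and does not attempt the automatic adjustment against a total detection time, which the claim mentions but does not detail. The function and parameter names are assumptions.

```python
import queue
import threading

_DONE = object()  # sentinel telling a detection thread to stop


def run_detection(sites, detect, n_distributors=1, m_detectors=2):
    """Feed `sites` through a queue to detection threads; return (site, verdict) pairs."""
    if n_distributors < 1 or m_detectors < n_distributors:
        raise ValueError("need m >= n >= 1")
    q = queue.Queue()
    results = []
    lock = threading.Lock()

    def distribute(chunk):
        for site in chunk:
            q.put(site)

    def detect_worker():
        while True:
            site = q.get()
            if site is _DONE:
                break
            verdict = detect(site)
            with lock:
                results.append((site, verdict))

    # Split the site list across the n distribution threads.
    chunks = [sites[i::n_distributors] for i in range(n_distributors)]
    distributors = [threading.Thread(target=distribute, args=(c,)) for c in chunks]
    detectors = [threading.Thread(target=detect_worker) for _ in range(m_detectors)]
    for t in distributors + detectors:
        t.start()
    for t in distributors:
        t.join()
    for _ in detectors:
        q.put(_DONE)  # one sentinel per detection thread, after all real items
    for t in detectors:
        t.join()
    return results
```

Result order depends on thread scheduling, so callers that need a stable order should sort the returned pairs.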
  11. The method according to claim 1, characterized in that the website information of each website further comprises the terminal address of the sponsor of that website, wherein, after outputting the websites whose detection result is an empty shell website, the method further comprises:
    sending alarm information to the terminal address of the sponsor of each website determined to be an empty shell website, wherein the alarm information comprises at least the domain name of the empty shell website.
  12. The method according to claim 1, characterized in that, after sending the alarm information to the terminal address of the sponsor of the website determined to be an empty shell website, the method further comprises:
    after a preset duration has elapsed, repeating Step A to Step C to obtain the websites that are again determined to be empty shell websites;
    recording the websites that are again determined to be empty shell websites as websites to be cleaned up;
    sending the domain names of the websites to be cleaned up to a target server.
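The re-check of claim 12 can be read as intersecting the first detection result with a second one taken after the preset duration. In the sketch below, `run_detection` stands for a repeat of Steps A to C and `send_to_target` for the transfer to the target server; both are hypothetical callables supplied by the caller, and the `sleep` parameter exists only so the wait can be replaced in tests.

```python
import time
from typing import Callable, Iterable, List


def recheck_and_report(
    first_pass: Iterable[str],
    run_detection: Callable[[], Iterable[str]],
    send_to_target: Callable[[str], None],
    delay_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Wait, re-run detection, and send domains flagged both times for cleanup."""
    sleep(delay_seconds)  # the preset duration after the alarm was sent
    second_pass = set(run_detection())
    # Only sites flagged in both passes are recorded as "to be cleaned up".
    to_clean = sorted(d for d in set(first_pass) if d in second_pass)
    for domain in to_clean:
        send_to_target(domain)
    return to_clean
```

Sites that were fixed by their sponsors during the waiting period drop out of the second pass and are therefore not sent for cleanup.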
  13. A device for detecting an empty shell website, characterized by comprising:
    an extraction unit, configured to extract a plurality of websites to be detected;
    an invocation unit, configured to invoke one or more empty shell website detection conditions from a website detection condition set, and to use the one or more empty shell website detection conditions to determine whether any one of the websites is an empty shell website;
    an output unit, configured to output the websites whose detection result is an empty shell website.
  14. The device according to claim 13, characterized in that the invocation unit comprises:
    an invocation module, configured to invoke the empty shell website detection conditions in the website detection condition set according to a predetermined invocation rule, wherein the predetermined invocation rule comprises any one or more of the following rules: invocation order, invocation quantity, and invocation type.
  15. The device according to claim 13, characterized in that the invocation unit further comprises:
    a determination module, configured to, after any one of the plurality of empty shell website detection conditions determines that a first website among the plurality of websites is an empty shell website, use the other empty shell website detection conditions among the plurality of empty shell website detection conditions to determine whether the first website is an empty shell website; where all of the empty shell website detection conditions determine that the first website is an empty shell website, determine that the first website is an empty shell website; otherwise, where any one of the empty shell website detection conditions determines that the first website is not an empty shell website, determine that the first website is a legitimate website.
  16. The device according to any one of claims 13 to 15, characterized in that
    the empty shell website detection conditions comprise any one or more of the following types: whether the website is on a whitelist, whether its filing is in progress or its filing change is in progress, whether an access record exists within a predetermined time, and whether it is registered and whether its resolution result has report information.
  17. The device according to claim 16, characterized in that the invocation unit further comprises:
    an obtaining module, configured to obtain the access logs of the domain names of the plurality of websites as recorded in a server;
    a query module, configured to query the access logs, according to the domain name of the website, for whether an access record was recorded within the predetermined time;
    a second determination module, configured to determine that the website is a legitimate website if the access record was recorded within the predetermined time.
  18. The device according to claim 16, characterized in that the invocation unit comprises:
    a second reading module, configured to read the website information of the website, wherein the website information comprises the filing status of the domain name of the website;
    a second judgment module, configured to determine whether the filing status of the domain name of the website is filing in progress or filing change in progress;
    a third determination module, configured to determine that the website is a legitimate website where the filing status of the domain name of the website is filing in progress or filing change in progress.
  19. The device according to claim 16, characterized in that the invocation unit comprises:
    a third reading module, configured to read the website information of the website;
    a second query module, configured to query a registration information table for information matching the website information of the website;
    a fourth determination module, configured to, where the match succeeds, determine the type of the website according to whether the result of resolving the website has report information.
  20. The device according to claim 19, characterized in that the fourth determination module comprises:
    a sub-determination module, configured to determine the type of the website according to the IP address to which the domain name of the website resolves; wherein, where the IP address of the website is the same as an IP address already recorded by the access provider server, the website is determined to be a normal website; and where the IP address of the website differs from every IP address already recorded by the access provider server, the website is determined to be an empty shell website.
  21. The device according to claim 13, characterized in that the device further comprises:
    a data distribution unit, configured to write the website information of the plurality of websites into a data queue in sequence by starting at least n data distribution threads;
    a detection unit, configured to read the website information of the plurality of websites from the data queue in sequence by starting at least m detection threads; wherein m and n are adjusted automatically according to a preset total detection time, m is greater than or equal to n, and m and n are natural numbers.
  22. A system for detecting an empty shell website, characterized in that the system comprises:
    a filing server, configured to store information on a plurality of websites;
    a detection server, in communication with the filing server, configured to, after extracting a plurality of websites to be detected from the filing server, invoke one or more empty shell website detection conditions from a website detection condition set, use the one or more empty shell website detection conditions to determine whether any one of the websites is an empty shell website, and output the websites whose detection result is an empty shell website.
  23. The system according to claim 22, characterized in that the detection server is further configured to invoke the empty shell website detection conditions in the website detection condition set according to a predetermined invocation rule, wherein the predetermined invocation rule comprises any one or more of the following rules: invocation order, invocation quantity, and invocation type.
  24. The system according to claim 22, characterized in that the detection server is further configured to, after any one of the plurality of empty shell website detection conditions determines that a first website among the plurality of websites is an empty shell website, use the other empty shell website detection conditions among the plurality of empty shell website detection conditions to determine whether the first website is an empty shell website; where all of the empty shell website detection conditions determine that the first website is an empty shell website, determine that the first website is an empty shell website; otherwise, where any one of the empty shell website detection conditions determines that the first website is not an empty shell website, determine that the first website is a legitimate website.
  25. The system according to any one of claims 22 to 24, characterized in that the empty shell website detection conditions comprise any one or more of the following types: whether the website is on a whitelist, whether its filing is in progress or its filing change is in progress, whether an access record exists within a predetermined time, and whether it is registered and whether its resolution result has report information.
PCT/CN2016/100734 2015-10-08 2016-09-29 Method, device and system for detecting shell website WO2017059778A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510646922.4 2015-10-08
CN201510646922.4A CN106571971B (en) 2015-10-08 2015-10-08 Method, device and system for detecting vacant website

Publications (1)

Publication Number Publication Date
WO2017059778A1 true WO2017059778A1 (en) 2017-04-13

Family

ID=58487394

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/100734 WO2017059778A1 (en) 2015-10-08 2016-09-29 Method, device and system for detecting shell website

Country Status (2)

Country Link
CN (1) CN106571971B (en)
WO (1) WO2017059778A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111726330A (en) * 2019-06-28 2020-09-29 上海妃鱼网络科技有限公司 IP-based secure login control method and server
CN112966204A (en) * 2021-03-18 2021-06-15 北京金山云网络技术有限公司 Website filing information submitting method and device
CN114070599A (en) * 2021-11-11 2022-02-18 北京顶象技术有限公司 Method and device for identifying unsafe equipment of user side
CN117439821A (en) * 2023-12-20 2024-01-23 成都无糖信息技术有限公司 Website judgment method and system based on data fusion and multi-factor decision method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102739653A (en) * 2012-06-06 2012-10-17 奇智软件(北京)有限公司 Detection method and device aiming at webpage address
CN102882716A (en) * 2012-09-25 2013-01-16 杭州安恒信息技术有限公司 Ministry of industry and information technology recording detecting method and system
US20130042222A1 (en) * 2011-08-08 2013-02-14 Computer Associates Think, Inc. Automating functionality test cases
CN103744941A (en) * 2013-12-31 2014-04-23 北京百度网讯科技有限公司 Method and device for determining website evaluation result based on website attribute information
CN104954188A (en) * 2015-06-30 2015-09-30 北京奇虎科技有限公司 Cloud based web log security analysis method, device and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102647408A (en) * 2012-02-27 2012-08-22 珠海市君天电子科技有限公司 Method for judging phishing website based on content analysis

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111726330A (en) * 2019-06-28 2020-09-29 上海妃鱼网络科技有限公司 IP-based secure login control method and server
CN112966204A (en) * 2021-03-18 2021-06-15 北京金山云网络技术有限公司 Website filing information submitting method and device
CN112966204B (en) * 2021-03-18 2023-11-03 北京金山云网络技术有限公司 Website record information submitting method and device
CN114070599A (en) * 2021-11-11 2022-02-18 北京顶象技术有限公司 Method and device for identifying unsafe equipment of user side
CN117439821A (en) * 2023-12-20 2024-01-23 成都无糖信息技术有限公司 Website judgment method and system based on data fusion and multi-factor decision method

Also Published As

Publication number Publication date
CN106571971B (en) 2020-12-29
CN106571971A (en) 2017-04-19

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application Ref document number: 16853083; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase Ref country code: DE
122 Ep: pct application non-entry in european phase Ref document number: 16853083; Country of ref document: EP; Kind code of ref document: A1