CN109302299B - Website broken link detection method and device - Google Patents

Publication number
CN109302299B
Authority
CN
China
Prior art keywords
website
link
information
abnormal information
feeding back
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710612685.9A
Other languages
Chinese (zh)
Other versions
CN109302299A (en)
Inventor
潘峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gridsum Technology Co Ltd
Priority to CN201710612685.9A
Publication of CN109302299A
Application granted
Publication of CN109302299B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/50 Network service management, e.g. ensuring proper service fulfilment according to agreements
    • H04L41/5003 Managing SLA; Interaction between SLA and QoS
    • H04L41/5009 Determining service level performance parameters or violations of service level contracts, e.g. violations of agreed response time or mean time between failures [MTBF]
    • H04L41/5012 Determining service level performance parameters or violations of service level contracts, e.g. violations of agreed response time or mean time between failures [MTBF] determining service availability, e.g. which services are available at a certain point in time
    • H04L41/5016 Determining service level performance parameters or violations of service level contracts, e.g. violations of agreed response time or mean time between failures [MTBF] determining service availability, e.g. which services are available at a certain point in time based on statistics of service availability, e.g. in percentage or over a given time
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H04L43/0811 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking connectivity

Abstract

The invention discloses a website broken link detection method and device, relates to the technical field of networks, and aims to solve the problem in the prior art that the accuracy of the detection result is low when website broken link detection is carried out. The method of the invention comprises the following steps: acquiring the website links that feed back abnormal information in a website; judging whether the number of the website links feeding back the abnormal information exceeds the broken link threshold of the website; if not, determining that the website links feeding back the abnormal information are website broken links; if so, detecting the website links feeding back the abnormal information through a proxy server, and if the detected feedback information is the same as the abnormal information, determining that the website links feeding back the abnormal information are website broken links. The method is suitable for detecting website broken links.

Description

Website broken link detection method and device
Technical Field
The invention relates to the technical field of networks, in particular to a website broken link detection method and device.
Background
With the gradual popularization of networks, networks have become an important part of people's lives. When browsing the web pages of a website, errors such as a page failing to display may occur; web page links that cannot be browsed in this way are generally called website broken links. For a website, the number of broken links is an important index for measuring the quality of the website. Therefore, broken link detection is usually performed on a website in order to monitor its quality. Generally, when broken link detection is performed, a crawler is used to simulate the access behavior of a user, and whether a website link is a broken link is determined according to the feedback information of the link.
However, the crawler may be disabled by the website, in which case the crawler no longer functions normally and the accuracy of the broken link detection result is low. To rule out the situation where the crawler is disabled and to ensure that the crawler functions normally, the prior art generally selects one link of the website as a reference link, which serves as a reference object when the other links of the website are crawled. In actual operation, however, the reference link is not fixed: it often changes when the website is revised or upgraded, so that the reference link itself fails. The situation where the crawler is disabled then cannot be ruled out, the detection result is affected, and the accuracy of website broken link detection is low.
Disclosure of Invention
In view of the above problems, the present invention provides a website broken link detection method and device, whose main purpose is to improve the accuracy of website broken link detection.
In order to solve the above technical problem, in a first aspect, the present invention provides a website broken link detection method, including:
acquiring a website link for feeding back abnormal information in a website;
judging whether the number of the website links feeding back the abnormal information exceeds the link breakage threshold value of the website or not;
if not, determining that the website link feeding back the abnormal information is a website broken link;
if so, detecting the website link feeding back the abnormal information through a proxy server, and if the detected feedback information is the same as the abnormal information, determining that the website link feeding back the abnormal information is a website broken link.
Optionally, if the feedback information obtained by detecting the website link fed back with the abnormal information by the proxy server is the same as the abnormal information, determining that the website link fed back with the abnormal information is website broken link includes:
acquiring address information through a proxy server;
crawling the website link of the feedback abnormal information by using a crawler according to the address information, and acquiring the feedback information of the website link of the feedback abnormal information;
judging whether the feedback information of the website link feeding back the abnormal information is the same as the abnormal information or not;
and if they are the same, determining that the website link feeding back the abnormal information is a website broken link.
Optionally, if it is determined that the number of the website links feeding back the abnormal information does not exceed the link-breaking threshold of the website, determining that the website links feeding back the abnormal information are website link-breaking includes:
if the number of the website links feeding back the abnormal information does not exceed the link breaking threshold value of the website, acquiring address information through a proxy server;
crawling the website link of the feedback abnormal information by using a crawler according to the address information, and acquiring the feedback information of the website link of the feedback abnormal information;
and when the feedback information of the website link feeding back the abnormal information is the same as the abnormal information, determining that the website link is actually the website broken link.
Optionally, the method further includes:
when the feedback information detected by the proxy server for the website link feeding back the abnormal information is different from the abnormal information, determining that the website link is not website broken link;
and outputting reminding information, wherein the reminding information is used for indicating that the original server address used by the crawler is forbidden by the website.
Optionally, the acquiring the website link for feeding back the abnormal information in the website includes:
crawling all links of the website layer by using a crawler;
determining website links for feeding back abnormal information in the website links according to information fed back by the website links crawled by the crawler layer by layer;
and calculating the number of the website links feeding back the abnormal information.
Optionally, the abnormal information is a webpage abnormal state code.
In a second aspect, the present invention further provides a website broken link detection apparatus, including:
the acquisition unit is used for acquiring the website link for feeding back the abnormal information in the website;
the judging unit is used for judging whether the number of the website links of the feedback abnormal information acquired by the acquiring unit exceeds the link breaking threshold of the website or not;
the determining unit is used for determining the website link feeding back the abnormal information as the website link breakage if the judging unit judges that the number of the website links feeding back the abnormal information does not exceed the link breakage threshold of the website;
the detection unit is used for detecting the website links feeding back the abnormal information through the proxy server if the judgment unit judges that the number of the website links feeding back the abnormal information exceeds the link-breaking threshold value of the website;
the determining unit is further configured to determine that the website link feeding back the abnormal information is a website broken link when the feedback information detected by the detecting unit is the same as the abnormal information.
Optionally, the determining unit includes:
the first acquisition module is used for acquiring the address information through the proxy server;
the first crawling module is used for crawling the website link feeding back the abnormal information by using a crawler according to the address information acquired by the first acquisition module;
the first acquisition module is further used for acquiring feedback information of the website link feeding back the abnormal information;
the judging module is used for judging whether the feedback information acquired by the first acquisition module for the website link feeding back the abnormal information is the same as the abnormal information;
the first determining module is used for determining that the website link feeding back the abnormal information is a website broken link if the judging module judges that the feedback information of the website link feeding back the abnormal information is the same as the abnormal information.
Optionally, the determining unit includes:
the second acquisition module is used for acquiring address information through the proxy server after it is determined that the number of website links feeding back the abnormal information does not exceed the link-breaking threshold of the website;
the second crawling module is used for crawling the website link feeding back the abnormal information by using a crawler according to the address information acquired by the second acquisition module;
the second acquisition module is further used for acquiring feedback information of the website link feeding back the abnormal information;
and the second determining module is used for determining that the website link is really a website broken link when the feedback information acquired by the second acquisition module for the website link feeding back the abnormal information is the same as the abnormal information.
Optionally, the apparatus further comprises:
the determining unit is further configured to determine that the website link is not a website broken link when feedback information for detecting the website link feeding back the abnormal information by the proxy server is different from the abnormal information;
and the output unit is used for outputting reminding information, and the reminding information is used for indicating that the original server address used by the crawler is forbidden by the website.
Optionally, the obtaining unit includes:
the crawling module is used for crawling all links of the website layer by using a crawler;
the determining module is used for determining the website link which feeds back abnormal information in the website links according to information fed back by the website links crawled layer by the crawler;
and the calculation module is used for calculating the number of the website links which feed back the abnormal information and are determined by the determination module.
Optionally, the abnormal information is a webpage abnormal state code.
In order to achieve the above object, according to a third aspect of the present invention, there is provided a storage medium including a stored program, wherein when the program runs, a device on which the storage medium is located is controlled to execute the above website broken link detection method.
In order to achieve the above object, according to a fourth aspect of the present invention, there is provided a processor for running a program, wherein the program, when running, executes the above website broken link detection method.
By means of the above technical solution, the website broken link detection method and device provided by the invention address the problem in the prior art that, when website broken links are determined by using a reference link, the detection result is easily affected once the reference link fails, so that the accuracy of website broken link detection is low. The invention compares the number of website links feeding back abnormal information with the broken link threshold of the website, and when the number does not exceed the threshold, the crawler is determined to function normally and the links feeding back abnormal information are determined to be website broken links. In addition, when the number of links feeding back abnormal information exceeds the broken link threshold, the proxy server detects those links, so that whether the crawler has been disabled can be identified; when the feedback information detected through the proxy server is the same as the abnormal information, the link is determined to be a website broken link. The influence of a disabled crawler on broken link detection is thereby eliminated, the dependence on a reference link that may fail is avoided, and the accuracy of website broken link detection is improved.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flowchart of a website broken link detection method according to an embodiment of the present invention;
FIG. 2 is a flowchart of another website broken link detection method according to an embodiment of the present invention;
FIG. 3 is a block diagram of a website broken link detection apparatus according to an embodiment of the present invention;
FIG. 4 is a block diagram of another website broken link detection apparatus according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
In order to improve the accuracy of website link breakage detection, an embodiment of the present invention provides a website link breakage detection method, as shown in fig. 1, the method includes:
101. Acquiring the website links that feed back abnormal information in the website.
Generally, when a web page link in a website is accessed, the website responds to the access request and sends feedback information to the client. When the web page link is not a broken link, the web page can be accessed normally and normal feedback information is returned. When the web page link is a broken link, the web page cannot be accessed normally, so the feedback information returned in response is abnormal information; the abnormal information may take different forms, such as abnormal web page data or an abnormal web page status code. In addition, when broken link detection is performed, a crawler can be used to crawl the web page links of the website so as to simulate the access behavior of a user. It should be noted that the embodiment of the present invention mainly uses a crawler to simulate the access behavior of the user, but other ways of simulating the access behavior may also be chosen as needed, which is not limited herein.
Therefore, according to the method in this step, a crawler can be used to crawl the website, and the website links feeding back abnormal information are obtained according to the feedback information received after crawling.
102. Judging whether the number of the website links feeding back the abnormal information exceeds the broken link threshold of the website.
Generally, a website contains a large number of web page links, and not all of them are normal; some abnormal web page links, such as broken links, almost always exist. Both the administrator of the website and its visitors tolerate broken links to a certain degree, as long as their number does not exceed a threshold. This gives rise to the concept of a broken link threshold, which can be regarded as a "red line" for the number of broken links of the website: when it is exceeded, the current website has too many broken links, the user experience is seriously affected, and corrective handling is required. The broken link threshold of the website can therefore be used as a reference in the detection or analysis of website broken links.
The method provided by the embodiment of the invention uses a crawler for broken link detection. When the crawler is disabled by the website, the crawler cannot crawl the links of the target website and the feedback information it obtains is abnormal information, even though the links of the website can in fact be accessed normally when no crawler is used; this affects the accuracy of the detection result. To verify whether the website has disabled the crawler, according to the method in this step, after the website links feeding back abnormal information are obtained in step 101, the number of those links is counted and compared with the broken link threshold of the website to determine whether the number exceeds the threshold.
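The comparison in step 102 can be sketched in a few lines of Python. This is only an illustration: the function name `next_step` is hypothetical, and reading "exceeds" as strictly greater than the threshold is an assumption, since the text does not say how a count equal to the threshold is treated.

```python
def next_step(abnormal_links, broken_link_threshold):
    """Return the step of FIG. 1 to run after the comparison in step 102.

    Assumption: "exceeds" means strictly greater than the threshold.
    """
    count = len(abnormal_links)
    if count > broken_link_threshold:
        # Too many abnormal responses: the crawler itself may be disabled,
        # so the links must be re-checked through a proxy server (step 104).
        return 104
    # The count is within tolerance: accept the crawler's results (step 103).
    return 103
```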
After step 102 is executed, if it is determined that the number of website links feeding back the abnormal information does not exceed the broken link threshold of the website, step 103 is executed:
103. Determining that the website links feeding back the abnormal information are website broken links.
When the number of website links feeding back the abnormal information does not exceed the broken link threshold of the website, the crawler is currently functioning normally and has not been disabled. Since the crawler can crawl the web page links of the website normally, the links that fed back abnormal information during crawling are confirmed to be links that cannot be accessed normally. They can therefore be determined to be website broken links, which ensures the accuracy of the detection.
After step 102 is executed, if it is determined that the number of website links feeding back the abnormal information exceeds the broken link threshold of the website, step 104 is executed:
104. Detecting the website links feeding back the abnormal information through a proxy server, and determining that a website link feeding back the abnormal information is a website broken link when the detected feedback information is the same as the abnormal information.
If the judgment in step 102 shows that the number of website links feeding back abnormal information exceeds the broken link threshold of the website, the crawler may currently be disabled, so that every web page it crawls feeds back abnormal information and the number of such links exceeds the threshold. In this case, the broken link detection performed by the crawler is likely to be inaccurate, and the website links feeding back abnormal information need to be confirmed in another way. In this step, the proxy server is used to detect those links once more, and whether each link is a website broken link is determined according to the detection result: when the feedback information obtained through the proxy server is the same as the earlier abnormal information, the link is confirmed to be a link that cannot be accessed normally and is determined to be a website broken link.
For example, suppose that the website links of website A are detected by a crawler, the number of links feeding back abnormal information is 33, and the broken link threshold is 5. Since 33 is greater than the threshold of 5, it can be determined that the crawler is likely to have been disabled by website A, and the detection result for the 33 links is likely to be inaccurate. The 33 links feeding back abnormal information are therefore detected through the proxy server. If website link a is detected through the proxy server and the feedback information is the same as the abnormal information fed back earlier, website link a really cannot be accessed, and it can be determined to be a website broken link.
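The flow of FIG. 1 applied to the website A example can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the function `detect_broken_links`, the `a.example` URLs, and the stub `fake_proxy` are hypothetical, the feedback is modeled as an HTTP status code, and the proxy result (only one link still failing) is invented purely to show the branch being taken.

```python
from typing import Callable, Dict, List


def detect_broken_links(
    first_pass: Dict[str, int],
    threshold: int,
    recheck_via_proxy: Callable[[str], int],
) -> List[str]:
    """Sketch of FIG. 1. `first_pass` maps each link that fed back abnormal
    information to that feedback (modeled here as a status code)."""
    if len(first_pass) <= threshold:
        # Step 103: the crawler is assumed healthy; accept every abnormal link.
        return sorted(first_pass)
    # Step 104: re-check through the proxy and keep only the links whose
    # feedback matches the abnormal feedback seen on the first pass.
    return sorted(
        link for link, code in first_pass.items()
        if recheck_via_proxy(link) == code
    )


# Website A from the example: 33 abnormal links, broken link threshold 5.
first_pass = {f"https://a.example/page{i}": 404 for i in range(33)}


def fake_proxy(link: str) -> int:
    # Hypothetical proxy result: only page0 ("link a") still returns 404.
    return 404 if link.endswith("/page0") else 200


broken = detect_broken_links(first_pass, threshold=5, recheck_via_proxy=fake_proxy)
print(broken)
```

Passing the proxy check in as a callable keeps the decision logic separate from any particular HTTP client or proxy setup.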
The website broken link detection method provided by the embodiment of the invention addresses the problem in the prior art that, when website broken links are determined through a reference link, the detection result is easily affected once the reference link fails, so that the accuracy of detection is low. The method compares the number of website links feeding back abnormal information with the broken link threshold of the website. When the number does not exceed the threshold, the crawler is determined to function normally, and the broken link detection is therefore determined to function normally. When the number exceeds the threshold, the proxy server is used to detect the links again to determine whether the crawler functions normally, and thereby whether the broken link detection functions normally.
Further, as a refinement and an extension of the embodiment shown in fig. 1, the embodiment of the present invention further provides another website broken link detection method, as shown in fig. 2.
201. Acquiring the website links that feed back abnormal information in the website.
The abnormal information in the embodiment of the present invention may be a web page abnormal status code, that is, one of the web page status codes that indicate an abnormal web page state. When a client sends an access request to a web page, the web page feeds back a response to the request, which includes a web page status code indicating whether the web page can be accessed normally.
The web page status code, also called the HTTP status code, is a three-digit numeric code that indicates the HTTP response status of the web server. It is defined by the RFC 2616 specification and extended by specifications such as RFC 2518, RFC 2817, RFC 2295, RFC 2774, and RFC 4918. The web page abnormal status code in the embodiment of the present invention is the subset of web page status codes that indicate a web page abnormality, for example "404" and "503".
Therefore, according to the method in this step, the access behavior of a user is first simulated by the crawler, and all web page links in the website to be detected are crawled layer by layer. Because each web page link feeds back a web page status code after being crawled, crawling yields all the status codes fed back by the links of the website, together with the link corresponding to each status code.
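Layer-by-layer crawling can be read as a breadth-first traversal of the link graph of the website. The sketch below illustrates that reading under assumptions: `crawl_by_layer`, the `fetch` callable, and the in-memory `SITE` table are hypothetical stand-ins for a real crawler and real HTTP requests.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple


def crawl_by_layer(
    start: str,
    fetch: Callable[[str], Tuple[int, List[str]]],
) -> Dict[str, int]:
    """Breadth-first crawl; returns the status code fed back by each link.

    `fetch(link)` is assumed to return (status_code, outgoing_links).
    """
    status_by_link: Dict[str, int] = {}
    queue = deque([start])
    seen = {start}
    while queue:
        link = queue.popleft()
        status, outgoing = fetch(link)
        status_by_link[link] = status
        for nxt in outgoing:
            if nxt not in seen:  # visit each link only once
                seen.add(nxt)
                queue.append(nxt)
    return status_by_link


# Hypothetical in-memory site standing in for real HTTP requests.
SITE = {
    "/": (200, ["/a", "/b"]),
    "/a": (200, ["/c"]),
    "/b": (404, []),
    "/c": (503, ["/"]),
}

codes = crawl_by_layer("/", lambda link: SITE[link])
```

The result maps every reachable link to the status code it fed back, which is the input needed for the screening described next.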
Then, the web page abnormal status codes are screened out from the acquired web page status codes. It should be noted that web page status codes are three-digit codes assigned according to fixed rules, so each status code has a fixed meaning. In this step, the status codes indicating a web page abnormality can therefore be screened out according to their meaning; for example, "404" indicates that the access request failed, so it is screened out as a web page abnormal status code. The web page links that fed back the screened-out codes are then determined, which gives the web page links of the website that feed back a web page abnormal status code.
Finally, the website links feeding back a web page abnormal status code are counted, which gives the number of website links feeding back abnormal information.
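The screening and counting described above can be sketched as a simple filter. The text names only "404" and "503" as examples of abnormal status codes; treating every 4xx and 5xx code as abnormal is an assumption of this sketch, and the name `abnormal_links` and the sample data are hypothetical.

```python
from typing import Dict


def abnormal_links(status_by_link: Dict[str, int]) -> Dict[str, int]:
    """Keep only the links whose status code signals an abnormality.

    Assumption: client errors (4xx) and server errors (5xx) count as abnormal.
    """
    return {
        link: code
        for link, code in status_by_link.items()
        if code >= 400
    }


crawled = {"/": 200, "/a": 301, "/b": 404, "/c": 503}
abnormal = abnormal_links(crawled)
abnormal_count = len(abnormal)  # the number compared with the threshold
```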
In this step, whether the feedback information is abnormal information is determined by the web page abnormal status code, which simplifies the analysis and determination of the feedback information and shortens the analysis time. The time consumed by website broken link detection in the embodiment of the invention is thereby reduced and the detection efficiency is improved.
202. Judging whether the number of the website links feeding back the abnormal information exceeds the broken link threshold of the website.
The broken link threshold of the website in this step is the same as that described in step 102 of the foregoing embodiment and is not described again here. According to the method in this step, after the number of website links feeding back a web page abnormal status code is calculated in step 201, the number is compared with the broken link threshold, and the judgment is made according to the comparison result.
After step 202 is executed, if it is determined that the number of website links feeding back the abnormal information does not exceed the broken link threshold of the website, step 203 is executed:
203. Determining that the website links feeding back the abnormal information are website broken links.
Generally, when a crawler is used to crawl website links, the crawler may be disabled by the website. In that case, the crawler receives a web page abnormal status code for any link of the website it crawls, rather than the status code that a normal visit to the link would return. The web page status codes obtained by the crawler simulating a user are then inaccurate, because the crawler has in effect stopped working.
When the crawler is disabled by the website, the number of website links feeding back a web page abnormal status code therefore generally exceeds the preset broken link threshold of the website. Accordingly, when it is determined in this step that the number of such links does not exceed the broken link threshold, the crawler is currently not disabled and functions normally. The website links that fed back a web page abnormal status code to the crawler are thus confirmed to be abnormal links and can be determined to be website broken links.
In addition, to further improve the accuracy of website broken link detection, after it is judged that the number of website links feeding back the webpage abnormal status code does not exceed the link-breaking threshold of the website, those website links can be further checked through a proxy server.
Specifically, address information is first acquired through the proxy server. A crawler crawls the webpage links of a website from a particular address; for example, a fixed IP address is used to crawl all links of a target website. The main purpose of the method according to the embodiment of the present invention is to solve the problem that the detection result is inaccurate because the crawler has been disabled, and a website normally disables a crawler by blocking the crawler's address, that is, whatever operation is requested from the address where the crawler is located, the website does not respond normally to requests from that address. Thus, when it is determined in this step that the link-breaking threshold has not been exceeded, the proxy server is used to provide the crawler with an additional address, thereby ruling out the possibility that the address used by the crawler has been disabled.
Then, the crawler crawls the website links feeding back the webpage abnormal status code again using the address information. Each re-crawled link feeds back a webpage status code, which is recorded as the status code fed back by that link after being crawled again.
Finally, the webpage status code obtained by the re-crawl is compared with the webpage abnormal status code obtained by the earlier crawl. When the two status codes are the same, the webpage link feeds back the webpage abnormal status code regardless of whether the crawler's original address or the address provided by the proxy server is used, which shows that the crawler functions normally, and the website link is determined to be a website broken link.
For example, suppose the link-breaking threshold of website B is 10 and the currently acquired website links feeding back an abnormal status code are link 1, link 2, link 3, link 4, link 5, link 6, and link 7, that is, 7 links in total. Since 7 is less than the link-breaking threshold of 10, according to the method described in this step it may be determined that the current crawler is not disabled and that the 7 links are website broken links of website B. Further, once it is determined that the number of links feeding back the abnormal status code (7) is less than the link-breaking threshold (10), the proxy server may be used to provide a new address for the crawler, and the 7 links are crawled again. Suppose that, without the new address provided by the proxy server, crawling link 1 returned the webpage abnormal status code "404", indicating that the webpage request failed, and that the status code received when link 1 is crawled again is also "404". This shows that link 1 returns the abnormal status code "404" whether it is crawled from the crawler's original address or from the new address provided by the proxy server, so it can be determined that the crawler is not disabled and that link 1 is indeed a website broken link.
Therefore, according to the method in this step, comparison with the link-breaking threshold establishes that the crawler functions normally, so that the website links feeding back the webpage abnormal status code are indeed website broken links; the influence of a disabled crawler on the detection result is eliminated, and the accuracy of website broken link detection is improved. Further, when the number of webpage links feeding back the webpage abnormal status code does not exceed the link-breaking threshold, the proxy server is used to check those links again, and a link is determined to be a website broken link when the status code fed back on re-crawling with the address provided by the proxy server is the same as the previously acquired webpage abnormal status code. This further excludes the case in which the current crawler is disabled and further improves the accuracy of website broken link detection.
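The comparison between the first crawl and the re-crawl through the proxy address can be sketched as below. It is a minimal illustration under stated assumptions: `recheck_links` is a hypothetical helper, and the re-crawl is modelled as a plain callable so that the example runs without network access.

```python
from typing import Callable, Dict, List, Tuple


def recheck_links(
    first_pass: Dict[str, int],
    recrawl: Callable[[str], int],
) -> Tuple[List[str], List[str]]:
    """Re-crawl each abnormal link and compare the two status codes.

    first_pass maps each link to the abnormal code seen from the original address.
    recrawl(link) is assumed to return the code seen from the proxy-provided address.
    Returns (confirmed_broken, differing).
    """
    confirmed: List[str] = []
    differing: List[str] = []
    for link, old_code in first_pass.items():
        if recrawl(link) == old_code:
            confirmed.append(link)   # same abnormal code twice: broken link
        else:
            differing.append(link)   # codes differ: not confirmed as broken
    return confirmed, differing


# Website B: seven links returned 404 from the original address,
# and (in this simulated re-crawl) 404 again from the new address.
first_pass_b = {f"link {i}": 404 for i in range(1, 8)}
confirmed_b, differing_b = recheck_links(first_pass_b, lambda link: 404)
```

In this simulated run every link is confirmed, which matches the conclusion drawn for link 1 in the example above.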
In addition, after step 202 is executed, if it is determined that the number of website links feeding back the abnormal information exceeds the link-breaking threshold of the website, step 204 is executed, specifically:
204. Detecting the website link feeding back the abnormal information through a proxy server, and determining that the website link feeding back the abnormal information is a website broken link when the detected feedback information is the same as the abnormal information.
After step 202 is executed, when it is determined that the number of website links feeding back the webpage abnormal status code exceeds the link-breaking threshold of the website, the current crawler may have been disabled by the website. To ensure the accuracy of website broken link detection, a proxy server therefore needs to be used to check the links. A specific detection process may be as follows. First, address information is acquired through the proxy server and is used as the new address from which the crawler re-crawls the website links. Next, the website links feeding back the webpage abnormal status code are re-crawled from the new address, and the webpage status code fed back by each link is acquired. The webpage status code fed back after re-crawling is then compared with the webpage abnormal status code fed back during the previous crawl to judge whether the two are the same. Finally, when the two status codes are the same, the website link has fed back the webpage abnormal status code in both crawls, and the current website link can be determined to be a website broken link.
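One possible way to re-crawl a link from a different address is to route the request through an HTTP proxy. The sketch below uses only Python's standard `urllib`; treating the "address information" acquired through the proxy server as a proxy URL is an assumption, and the helper names and the example proxy address are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Optional


def build_opener_for(proxy_url: Optional[str] = None) -> urllib.request.OpenerDirector:
    """Build an opener that sends requests directly or through the given proxy."""
    if proxy_url is None:
        return urllib.request.build_opener()
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch_status(url: str, proxy_url: Optional[str] = None, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url, optionally fetched through a proxy.

    HTTP error responses (4xx/5xx) are returned as codes rather than raised.
    Connection-level failures (urllib.error.URLError) are left to the caller.
    """
    opener = build_opener_for(proxy_url)
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def same_feedback(first_code: int, second_code: int) -> bool:
    """A link is confirmed as broken only if both crawls return the same abnormal code."""
    return first_code == second_code
```

A caller might compare `fetch_status(url)` with `fetch_status(url, proxy_url="http://127.0.0.1:8080")`, where the proxy address is a placeholder for whatever address the proxy server supplies.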
For example, suppose the link-breaking threshold of website C is 5 and the currently acquired website links feeding back an abnormal status code are link a, link b, link c, link d, link e, link f, and link g, that is, 7 links in total. Since 7 is greater than 5, according to the method described in this step the crawler may have been disabled, so the detection result may be inaccurate and an additional check is required; for example, a proxy server may be used to ensure the accuracy of website broken link detection. Suppose that, without the new address provided by the proxy server, crawling link b returned the webpage abnormal status code "503", indicating that the server currently cannot process the request, and that the status code received when link b is crawled again is also "503". This shows that link b returns the abnormal status code "503" whether it is crawled from the crawler's original address or from the new address provided by the proxy server, so link b can be determined to be a website broken link.
Therefore, according to the method in this step, when the link-breaking threshold is exceeded and it cannot be determined whether the crawler has been disabled, the proxy server is used to eliminate the interference of a website-disabled crawler with the detection result, and the accuracy of website broken link detection is improved.
In addition, after step 202 is executed, when the feedback information detected through the proxy server for the website link feeding back the abnormal information is different from the abnormal information, step 205 is executed, specifically:
205. When the feedback information detected through the proxy server for the website link feeding back the abnormal information is different from the abnormal information, determining that the website link is not a website broken link.
According to the method of the embodiment of the invention, when the proxy server is used to check a website link feeding back a webpage abnormal status code, the status codes fed back by the two crawls are not always the same as in step 204. A different case also exists: the link feeds back a webpage abnormal status code when the crawler crawls it from the original address, but feeds back a normal webpage status code once the new address provided by the proxy server is used, so the feedback results of the two crawls differ.
In other words, the status code detected through the proxy server for a website link may differ from the abnormal status code that the link fed back earlier. This is because, as described in step 203, the website has blocked the address where the crawler is located, that is, the website does not respond to any operation requested from that address. After the crawler's original address has been blocked, the link can still be crawled normally from another address. The website link that fed back the webpage abnormal status code is therefore actually normal, only the original address where the crawler is located has been disabled, and the link is determined not to be a website broken link.
206. Outputting the reminding information.
After it is determined in step 205 that the website link is not a website broken link, reminding information needs to be output. The reminding information is mainly used to notify the relevant personnel that the client address currently used by the crawler has been blocked by the website and that crawling from that address no longer works, so that they can take corresponding action, for example replacing the address information of the existing client.
Therefore, according to the method described in steps 205 and 206, when the abnormal status code fed back by a website link differs from the status code fed back when the proxy server is used, it can be determined that the website link is not a website broken link. This confirms that the original address where the crawler is located has been blocked and that the crawler is in effect disabled when crawling from the original address, so that a disabled crawler is discovered in time. Outputting the reminding information helps the relevant personnel adjust the crawler promptly, which avoids misjudgments caused by a disabled crawler when website broken link detection is performed with the crawler.
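Steps 205 and 206 can be illustrated with a short sketch. The helper names, the message wording, and the example address are hypothetical; the source only states that a reminder is output once a link is found not to be broken.

```python
from typing import Dict, List, Optional


def links_not_broken(first_codes: Dict[str, int], proxy_codes: Dict[str, int]) -> List[str]:
    """Step 205: links whose proxy re-crawl code differs from the first abnormal code."""
    return [
        link
        for link, code in first_codes.items()
        if link in proxy_codes and proxy_codes[link] != code
    ]


def build_reminder(original_address: str, recovered: List[str]) -> Optional[str]:
    """Step 206: build the reminder text, or None when no link responded differently."""
    if not recovered:
        return None
    return (
        f"Crawler address {original_address} appears to be blocked by the website: "
        f"{len(recovered)} link(s) responded differently through the proxy. "
        "Consider replacing the client address."
    )


first = {"link a": 403, "link b": 403}
via_proxy = {"link a": 200, "link b": 403}
recovered = links_not_broken(first, via_proxy)
reminder = build_reminder("203.0.113.7", recovered)  # documentation-range IP, placeholder
```

How the reminder is delivered (log entry, e-mail, dashboard alert) is left open by the text and is not modelled here.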
Further, as an implementation of the method shown in fig. 1, an embodiment of the present invention further provides a website broken link detection apparatus, which is used for implementing the method shown in fig. 1. The embodiment of the apparatus corresponds to the embodiment of the method, and for convenience of reading, details in the embodiment of the apparatus are not repeated one by one, but it should be clear that the apparatus in the embodiment can correspondingly implement all the contents in the embodiment of the method. As shown in fig. 3, the apparatus includes: acquisition unit 31, judgment unit 32, determination unit 33, detection unit 34, wherein
The acquisition unit 31 may be configured to acquire the website links feeding back abnormal information in the website.
The judgment unit 32 may be configured to judge whether the number of website links feeding back the abnormal information acquired by the acquisition unit 31 exceeds the link-breaking threshold of the website.
The determination unit 33 may be configured to determine that the website link feeding back the abnormal information is a website broken link if the judgment unit 32 judges that the number of website links feeding back the abnormal information does not exceed the link-breaking threshold of the website.
The detection unit 34 may be configured to detect, through the proxy server, the website link feeding back the abnormal information if the judgment unit 32 judges that the number of website links feeding back the abnormal information exceeds the link-breaking threshold of the website.
The determination unit 33 may be further configured to determine that the website link feeding back the abnormal information is a website broken link when the feedback information detected by the detection unit 34 is the same as the abnormal information.
Further, as an implementation of the method shown in fig. 2, an embodiment of the present invention further provides another website broken link detection apparatus, which is used for implementing the method shown in fig. 2. The embodiment of the apparatus corresponds to the embodiment of the method, and for convenience of reading, details in the embodiment of the apparatus are not repeated one by one, but it should be clear that the apparatus in the embodiment can correspondingly implement all the contents in the embodiment of the method. As shown in fig. 4, the apparatus includes: an acquisition unit 41, a judgment unit 42, a determination unit 43, a detection unit 44, wherein
The acquisition unit 41 may be configured to acquire the website links feeding back abnormal information in the website.
The judgment unit 42 may be configured to judge whether the number of website links feeding back the abnormal information acquired by the acquisition unit 41 exceeds the link-breaking threshold of the website.
The determination unit 43 may be configured to determine that the website link feeding back the abnormal information is a website broken link if the judgment unit 42 judges that the number of website links feeding back the abnormal information does not exceed the link-breaking threshold of the website.
The detection unit 44 may be configured to detect, through the proxy server, the website link feeding back the abnormal information if the judgment unit 42 judges that the number of website links feeding back the abnormal information exceeds the link-breaking threshold of the website.
The determination unit 43 may be further configured to determine that the website link feeding back the abnormal information is a website broken link when the feedback information detected by the detection unit 44 is the same as the abnormal information.
Further, the determination unit 43 includes:
The first obtaining module 431 may be configured to obtain the address information through the proxy server.
The first crawling module 432 may be configured to crawl, by using a crawler, the website link feeding back the abnormal information from the address information obtained by the first obtaining module 431.
The first obtaining module 431 may be further configured to obtain the feedback information of the website link feeding back the abnormal information.
The judging module 433 may be configured to judge whether the feedback information of the website link feeding back the abnormal information, obtained by the first obtaining module 431, is the same as the abnormal information.
The first determining module 434 may be configured to determine that the website link feeding back the abnormal information is a website broken link if the judging module 433 judges that the feedback information of the website link feeding back the abnormal information is the same as the abnormal information.
Further, the determination unit 43 includes:
The second obtaining module 435 may be configured to obtain the address information through the proxy server after it is determined that the number of website links feeding back the abnormal information does not exceed the link-breaking threshold of the website.
The second crawling module 436 may be configured to crawl, by using a crawler, the website link feeding back the abnormal information from the address information obtained by the second obtaining module 435.
The second obtaining module 435 may be further configured to obtain the feedback information of the website link feeding back the abnormal information.
The second determining module 437 may be configured to determine that the website link is indeed a website broken link when the feedback information of the website link feeding back the abnormal information, obtained by the second obtaining module 435, is the same as the abnormal information.
Further, the apparatus further comprises:
The determination unit 43 may be further configured to determine that the website link is not a website broken link when the feedback information detected through the proxy server for the website link feeding back the abnormal information is different from the abnormal information.
The output unit 45 may be configured to output reminding information after the determination unit 43 determines that the website link is not a website broken link, where the reminding information is used to indicate that the original server address used by the crawler has been disabled by the website.
Further, the acquisition unit 41 includes:
The crawling module 411 may be configured to crawl all links of the website layer by layer using a crawler.
The determining module 412 may be configured to determine, according to the information fed back by the website links crawled layer by layer by the crawling module 411, the website links that feed back abnormal information.
The calculating module 413 may be configured to calculate the number of website links feeding back the abnormal information determined by the determining module 412.
Further, the abnormal information is a webpage abnormal state code.
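The layer-by-layer crawl, the selection of abnormal links, and the count can be sketched as a breadth-first traversal. This is an illustrative sketch: `get_page` is a hypothetical callable standing in for a real HTTP fetch plus link extraction, the in-memory `site` is invented test data, and treating codes of 400 and above as abnormal is an assumption.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

# get_page(url) is assumed to return (status_code, links_found_on_that_page).
PageFetcher = Callable[[str], Tuple[int, List[str]]]


def crawl_layer_by_layer(start: str, get_page: PageFetcher) -> Dict[str, int]:
    """Breadth-first crawl: visit every link one layer at a time, recording status codes."""
    seen = {start}
    queue = deque([start])
    statuses: Dict[str, int] = {}
    while queue:
        url = queue.popleft()
        code, links = get_page(url)
        statuses[url] = code
        if code >= 400:
            continue  # an abnormal page yields no further links to follow
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return statuses


# Hypothetical in-memory website used instead of real network requests.
site = {
    "/": (200, ["/a", "/b"]),
    "/a": (200, ["/c", "/"]),
    "/b": (404, []),
    "/c": (503, []),
}
statuses = crawl_layer_by_layer("/", lambda url: site[url])
abnormal = {url: code for url, code in statuses.items() if code >= 400}
abnormal_count = len(abnormal)
```

The resulting `abnormal_count` is the number that would then be compared with the link-breaking threshold.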
The embodiment of the invention provides another website broken link detection method and device. In the prior art, website broken links are determined through a reference link, the detection result is easily affected, and the accuracy of website broken link detection is therefore low. To address this problem, the method and the device compare the link-breaking threshold of the website with the number of website links feeding back abnormal information. When the number does not exceed the threshold, the crawler is determined to function normally, and the website broken link detection is therefore reliable; when the number exceeds the threshold, the proxy server is used to determine whether the crawler functions normally, and thus whether the website broken link detection is reliable. In addition, using the abnormal status code among the webpage status codes as the abnormal information simplifies the analysis of the feedback information and shortens the analysis time, which reduces the time consumed by website broken link detection and improves detection efficiency. Meanwhile, through comparison with the link-breaking threshold, when the number of website links feeding back the abnormal status code is smaller than the threshold, the crawler can be determined to function normally, so that the website links feeding back the webpage abnormal status code are indeed website broken links; the influence of a disabled crawler on the detection result is eliminated, and the accuracy of website broken link detection is improved.
Further, when the number of webpage links feeding back the webpage abnormal status code does not exceed the link-breaking threshold, the proxy server is used to check those links again, and a link is determined to be a website broken link when the status code fed back on re-crawling with the address provided by the proxy server is the same as the previously acquired webpage abnormal status code; this further excludes the case in which the current crawler is disabled and further improves the accuracy of website broken link detection. In addition, when the number of website links feeding back the abnormal status code is greater than the link-breaking threshold, detection is performed through the proxy server, so that when the threshold is exceeded and it cannot be determined whether the crawler has been disabled, the interference of a disabled crawler with the detection result is eliminated and the accuracy of website broken link detection is improved. Finally, when detection through the proxy server shows that the abnormal status code fed back by a website link differs from the status code fed back when the proxy server is used, the website link can be determined not to be a website broken link, and reminding information is output at the same time. A disabled crawler is thus discovered in time and the relevant personnel can adjust the crawler promptly, which avoids misjudgments caused by a disabled crawler during website broken link detection and further improves the accuracy of website broken link detection.
The website broken link detection device comprises a processor and a memory, wherein the acquisition unit, the judgment unit, the determination unit, the detection unit, and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to realize the corresponding functions.
The processor comprises a kernel, and the kernel calls the corresponding program unit from the memory. One or more kernels may be provided, and the accuracy of website broken link detection is improved by adjusting kernel parameters.
The memory may include volatile memory in computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip.
An embodiment of the present invention provides a storage medium, on which a program is stored, and the program, when executed by a processor, implements the website broken link detection method.
The embodiment of the invention provides a processor, which is used for running a program, wherein the website broken link detection method is executed when the program runs.
The embodiment of the invention provides a device comprising a processor, a memory, and a program that is stored in the memory and can run on the processor, wherein the processor, when executing the program, implements the following steps: acquiring the website links feeding back abnormal information in a website; judging whether the number of the website links feeding back the abnormal information exceeds the link-breaking threshold of the website; if not, determining that the website links feeding back the abnormal information are website broken links; if so, detecting the website links feeding back the abnormal information through a proxy server, and, if the detected feedback information is the same as the abnormal information, determining that the website links feeding back the abnormal information are website broken links.
Further, if the feedback information obtained by detecting the website link feeding back the abnormal information by the proxy server is the same as the abnormal information, determining that the website link feeding back the abnormal information is website link failure includes:
acquiring address information through a proxy server;
crawling the website link of the feedback abnormal information by using a crawler according to the address information, and acquiring the feedback information of the website link of the feedback abnormal information;
judging whether the feedback information of the website link feeding back the abnormal information is the same as the abnormal information or not;
and if they are the same, determining that the website link feeding back the abnormal information is a website broken link.
Further, if it is determined that the number of the website links feeding back the abnormal information does not exceed the link-breaking threshold of the website, determining that the website links feeding back the abnormal information are website links-breaking includes:
if the number of the website links feeding back the abnormal information does not exceed the link breaking threshold value of the website, acquiring address information through a proxy server;
crawling the website link of the feedback abnormal information by using a crawler according to the address information, and acquiring the feedback information of the website link of the feedback abnormal information;
and when the feedback information of the website link feeding back the abnormal information is the same as the abnormal information, determining that the website link is actually the website broken link.
Further, after it is determined that the number of website links feeding back the abnormal information exceeds the link-breaking threshold of the website, the method further includes:
when the feedback information detected by the proxy server for the website link feeding back the abnormal information is different from the abnormal information, determining that the website link is not website broken link;
and outputting reminding information, wherein the reminding information is used for indicating that the original server address used by the crawler is forbidden by the website.
Further, the acquiring the website link for feeding back the abnormal information in the website includes:
crawling all links of the website layer by layer using a crawler;
determining website links for feeding back abnormal information in the website links according to information fed back by the website links crawled by the crawler layer by layer;
and calculating the number of the website links feeding back the abnormal information.
Further, the abnormal information is a webpage abnormal state code.
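The steps listed above can be combined into one decision flow, sketched below. This is a minimal sketch under stated assumptions rather than the claimed implementation: `DetectionResult` and `detect_broken_links` are hypothetical names, the proxy re-crawl is modelled as a callable, and the optional proxy re-check for the below-threshold case is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DetectionResult:
    broken: List[str] = field(default_factory=list)
    not_broken: List[str] = field(default_factory=list)
    crawler_banned: bool = False  # True when a reminder should be output


def detect_broken_links(
    abnormal: Dict[str, int],
    threshold: int,
    recrawl_via_proxy: Callable[[str], int],
) -> DetectionResult:
    """abnormal maps each link to the abnormal status code seen on the first crawl."""
    result = DetectionResult()
    if len(abnormal) <= threshold:
        # Threshold not exceeded: the abnormal links are taken to be broken links.
        result.broken = list(abnormal)
        return result
    # Threshold exceeded: the crawler may be disabled, so re-check through the proxy.
    for link, first_code in abnormal.items():
        if recrawl_via_proxy(link) == first_code:
            result.broken.append(link)
        else:
            result.not_broken.append(link)
    result.crawler_banned = bool(result.not_broken)
    return result


# Seven links feeding back 503 with a threshold of 5 (simulated proxy re-crawl).
abnormal_c = {f"link {ch}": 503 for ch in "abcdefg"}
still_failing = detect_broken_links(abnormal_c, 5, lambda link: 503)
address_blocked = detect_broken_links(abnormal_c, 5, lambda link: 200)
```

In the first call every link is confirmed as broken; in the second, every link answers normally through the proxy, so none is reported as broken and `crawler_banned` is set.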
The device in the embodiment of the invention can be a server, a PC, a PAD, a mobile phone and the like.
An embodiment of the present invention further provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps: acquiring the website links feeding back abnormal information in a website; judging whether the number of the website links feeding back the abnormal information exceeds the link-breaking threshold of the website; if not, determining that the website links feeding back the abnormal information are website broken links; if so, detecting the website links feeding back the abnormal information through a proxy server, and, if the detected feedback information is the same as the abnormal information, determining that the website links feeding back the abnormal information are website broken links.
Further, if the feedback information obtained by detecting the website link feeding back the abnormal information by the proxy server is the same as the abnormal information, determining that the website link feeding back the abnormal information is website link failure includes:
acquiring address information through a proxy server;
crawling the website link of the feedback abnormal information by using a crawler according to the address information, and acquiring the feedback information of the website link of the feedback abnormal information;
judging whether the feedback information of the website link feeding back the abnormal information is the same as the abnormal information or not;
and if they are the same, determining that the website link feeding back the abnormal information is a website broken link.
Further, if it is determined that the number of the website links feeding back the abnormal information does not exceed the link-breaking threshold of the website, determining that the website links feeding back the abnormal information are website links-breaking includes:
if the number of the website links feeding back the abnormal information does not exceed the link breaking threshold value of the website, acquiring address information through a proxy server;
crawling the website link of the feedback abnormal information by using a crawler according to the address information, and acquiring the feedback information of the website link of the feedback abnormal information;
and when the feedback information of the website link feeding back the abnormal information is the same as the abnormal information, determining that the website link is actually the website broken link.
Further, after it is determined that the number of website links feeding back the abnormal information exceeds the link-breaking threshold of the website, the method further includes:
when the feedback information detected by the proxy server for the website link feeding back the abnormal information is different from the abnormal information, determining that the website link is not website broken link;
and outputting reminding information, wherein the reminding information is used for indicating that the original server address used by the crawler is forbidden by the website.
Further, the acquiring the website link for feeding back the abnormal information in the website includes:
crawling all links of the website layer by layer using a crawler;
determining website links for feeding back abnormal information in the website links according to information fed back by the website links crawled by the crawler layer by layer;
and calculating the number of the website links feeding back the abnormal information.
Further, the abnormal information is a webpage abnormal state code.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer-readable medium does not include a transitory computer-readable medium such as a modulated data signal or a carrier wave.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (8)

1. A website broken link detection method, characterized by comprising the following steps:
acquiring a website link feeding back abnormal information in a website, wherein the abnormal information is acquired from feedback information received after a crawler crawls the website;
judging whether the number of the website links feeding back the abnormal information exceeds the broken link threshold of the website;
if not, determining that the website link feeding back the abnormal information is a website broken link;
if so, detecting the website link feeding back the abnormal information through a proxy server, and if the feedback information detected through the proxy server is the same as the abnormal information, determining that the website link feeding back the abnormal information is a website broken link;
the acquiring of the website link for feeding back the abnormal information in the website includes:
crawling all links of the website layer by using a crawler;
determining website links for feeding back abnormal information in the website links according to information fed back by the website links crawled by the crawler layer by layer;
and calculating the number of the website links feeding back the abnormal information.
2. The method of claim 1, wherein if the feedback information detected by the proxy server for the website link feeding back the abnormal information is the same as the abnormal information, determining that the website link feeding back the abnormal information is a website broken link includes:
acquiring address information through a proxy server;
crawling the website link of the feedback abnormal information by using a crawler according to the address information, and acquiring the feedback information of the website link of the feedback abnormal information;
judging whether the feedback information of the website link feeding back the abnormal information is the same as the abnormal information or not;
and if the feedback information is the same as the abnormal information, determining that the website link feeding back the abnormal information is a website broken link.
3. The method of claim 2, wherein if it is determined that the number of website links feeding back the abnormal information does not exceed the broken link threshold of the website, determining that the website link feeding back the abnormal information is a website broken link includes:
if the number of the website links feeding back the abnormal information does not exceed the link breaking threshold value of the website, acquiring address information through a proxy server;
crawling the website link of the feedback abnormal information by using a crawler according to the address information, and acquiring the feedback information of the website link of the feedback abnormal information;
and when the feedback information of the website link feeding back the abnormal information is the same as the abnormal information, determining that the website link is actually the website broken link.
4. The method of claim 2, further comprising:
when the feedback information detected by the proxy server for the website link feeding back the abnormal information is different from the abnormal information, determining that the website link is not website broken link;
and outputting reminding information, wherein the reminding information is used for indicating that the original server address used by the crawler is forbidden by the website.
5. The method according to any one of claims 1-4, wherein the abnormal information is an abnormal webpage status code.
6. A website broken link detection apparatus, the apparatus comprising:
an acquisition unit, configured to acquire a website link feeding back abnormal information in a website, wherein the abnormal information is acquired from feedback information received after a crawler crawls the website;
a judging unit, configured to judge whether the number of the website links feeding back the abnormal information acquired by the acquisition unit exceeds the broken link threshold of the website;
a determining unit, configured to determine that the website link feeding back the abnormal information is a website broken link if the judging unit judges that the number of the website links feeding back the abnormal information does not exceed the broken link threshold of the website;
a detection unit, configured to detect the website links feeding back the abnormal information through the proxy server if the judging unit judges that the number of the website links feeding back the abnormal information exceeds the broken link threshold of the website;
the determining unit is further configured to determine that the website link feeding back the abnormal information is a website broken link when the feedback information detected by the detection unit is the same as the abnormal information;
the acquisition unit includes:
the crawling module is used for crawling all links of the website layer by using a crawler;
the determining module is used for determining the website link which feeds back abnormal information in the website links according to information fed back by the website links crawled layer by the crawler;
and the calculation module is used for calculating the number of the website links which feed back the abnormal information and are determined by the determination module.
7. The apparatus of claim 6, wherein the determining unit comprises:
the first acquisition module is used for acquiring the address information through the proxy server;
the first crawling module is used for crawling the website link feeding back the abnormal information by using a crawler according to the address information acquired by the first acquiring module;
the first acquisition module is further configured to acquire feedback information of the website link feeding back the abnormal information;
the judging module is used for judging whether the feedback information of the website link for feeding back the abnormal information acquired by the first acquiring module is the same as the abnormal information;
the first determining module is configured to determine that the website link feeding back the abnormal information is a website broken link if the judging module judges that the feedback information of the website link feeding back the abnormal information is the same as the abnormal information.
8. A storage medium, characterized in that the storage medium comprises a stored program, wherein when the program runs, a device in which the storage medium is located is controlled to execute the website broken link detection method according to any one of claims 1 to 5.
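The flow recited in the claims above can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the fetchers are injected as plain callables (`fetch` for the crawler's own address, `proxy_fetch` for a request routed through a proxy), abnormal information is taken to be a 4xx/5xx status code, and all function names are hypothetical.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical fetcher types: `Fetch` returns (status_code, outgoing_links);
# `ProxyFetch` returns only the status code observed through the proxy.
Fetch = Callable[[str], Tuple[int, Iterable[str]]]
ProxyFetch = Callable[[str], int]


def is_abnormal(status: int) -> bool:
    # Assumption: 4xx/5xx status codes are the "abnormal information".
    return 400 <= status <= 599


def crawl_layer_by_layer(root: str, fetch: Fetch) -> Dict[str, int]:
    """Breadth-first crawl; maps each failing link to its abnormal status code."""
    seen = {root}
    queue = deque([root])
    abnormal: Dict[str, int] = {}
    while queue:
        url = queue.popleft()
        status, links = fetch(url)
        if is_abnormal(status):
            abnormal[url] = status
            continue  # links reported by a failing page are not followed
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return abnormal


def detect_broken_links(
    root: str, fetch: Fetch, proxy_fetch: ProxyFetch, threshold: int
) -> Tuple[List[str], bool]:
    """Return (broken_links, ban_suspected)."""
    abnormal = crawl_layer_by_layer(root, fetch)
    # Count does not exceed the threshold: accept the links as broken.
    if len(abnormal) <= threshold:
        return sorted(abnormal), False
    # Count exceeds the threshold: re-check every link through the proxy and
    # keep only links whose proxy feedback matches the original abnormal code.
    broken = [url for url, status in abnormal.items() if proxy_fetch(url) == status]
    # A mismatch suggests the crawler's original address was banned by the site.
    ban_suspected = len(broken) < len(abnormal)
    return sorted(broken), ban_suspected
```

Injecting the fetchers keeps the decision logic testable without network access; in practice `fetch` and `proxy_fetch` would wrap an HTTP client configured without and with a proxy address, respectively.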
CN201710612685.9A 2017-07-25 2017-07-25 Website broken link detection method and device Active CN109302299B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710612685.9A CN109302299B (en) 2017-07-25 2017-07-25 Website broken link detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710612685.9A CN109302299B (en) 2017-07-25 2017-07-25 Website broken link detection method and device

Publications (2)

Publication Number Publication Date
CN109302299A CN109302299A (en) 2019-02-01
CN109302299B true CN109302299B (en) 2021-12-28

Family

ID=65167402

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710612685.9A Active CN109302299B (en) 2017-07-25 2017-07-25 Website broken link detection method and device

Country Status (1)

Country Link
CN (1) CN109302299B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102739663A (en) * 2012-06-18 2012-10-17 奇智软件(北京)有限公司 Detection method and scanning engine of web pages
CA2762544A1 (en) * 2011-12-20 2013-06-20 Ibm Canada Limited - Ibm Canada Limitee Identifying requests that invalidate user sessions
CN104182462A (en) * 2014-07-21 2014-12-03 安徽华贞信息科技有限公司 Web crawler service system for housing library network
CN104537005A (en) * 2014-12-15 2015-04-22 北京国双科技有限公司 Data processing method and device for webpage crawling
CN106547793A (en) * 2015-09-22 2017-03-29 北京国双科技有限公司 The method and apparatus for obtaining proxy server address
CN106874487A (en) * 2017-02-21 2017-06-20 国信优易数据有限公司 A kind of distributed reptile management system and its method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8230062B2 (en) * 2010-06-21 2012-07-24 Salesforce.Com, Inc. Referred internet traffic analysis system and method

Also Published As

Publication number Publication date
CN109302299A (en) 2019-02-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing

Applicant after: Beijing Guoshuang Technology Co.,Ltd.

Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu District, Haidian District, Beijing

Applicant before: Beijing Guoshuang Technology Co.,Ltd.

GR01 Patent grant