CN109302299A - Website broken link detection method and device

Publication number
CN109302299A
Authority
CN
China
Legal status
Granted
Application number
CN201710612685.9A
Other languages
Chinese (zh)
Other versions
CN109302299B (en
Inventor
潘峰
Current Assignee
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Application filed by Beijing Gridsum Technology Co Ltd
Priority to CN201710612685.9A
Publication of CN109302299A
Application granted
Publication of CN109302299B
Status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/50 Network service management, e.g. ensuring proper service fulfilment according to agreements
    • H04L41/5003 Managing SLA; Interaction between SLA and QoS
    • H04L41/5009 Determining service level performance parameters or violations of service level contracts, e.g. violations of agreed response time or mean time between failures [MTBF]
    • H04L41/5012 Determining service level performance parameters or violations of service level contracts, e.g. violations of agreed response time or mean time between failures [MTBF], determining service availability, e.g. which services are available at a certain point in time
    • H04L41/5016 Determining service level performance parameters or violations of service level contracts, e.g. violations of agreed response time or mean time between failures [MTBF], determining service availability, e.g. which services are available at a certain point in time, based on statistics of service availability, e.g. in percentage or over a given time
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters, by checking availability
    • H04L43/0811 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters, by checking availability, by checking connectivity

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Transfer Between Computers (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a website broken link detection method and device, relating to the field of network technology, and aims to solve the problem that detection results have low accuracy when website broken link detection is performed according to the prior art. The method comprises: obtaining the website links in a website that return exception information; judging whether the number of website links returning exception information exceeds the broken link threshold of the website; if not, determining that the website links returning exception information are broken links; if so, detecting those website links through a proxy server and, when the feedback information of that detection is identical to the exception information, determining that the website links returning exception information are broken links. The invention is suitable for website broken link detection.

Description

A website broken link detection method and device
Technical field
The present invention relates to the field of network technology, and in particular to a website broken link detection method and device.
Background
With the growing popularity of the Internet, the network has become an important part of people's lives. When browsing the pages of a website, a page may fail to display correctly; links to such pages that cannot be browsed are generally called broken links. For a website, the number of broken links is an important indicator of its quality. Broken link detection is therefore usually performed on a website in order to monitor its quality. In general, when broken link detection is performed on a website, a crawler is used to simulate user access behavior, and whether a website link is a broken link is determined from the feedback information of that link.
However, when a crawler is used, the website may ban the crawler so that the crawler no longer functions properly, which lowers the accuracy of broken link detection performed with it. To rule out a banned crawler and ensure that the crawler works, the prior art generally first selects one link of the website as a benchmark link, which serves as a reference when the other links of the website are crawled. For example, when the crawler crawls another page link of the website and the feedback information is exception information, while crawling the benchmark link does not return exception information, the page link currently returning exception information is determined to be a broken link. In practice, however, the benchmark link is not fixed; it often changes when the website is redesigned or upgraded, which invalidates the benchmark link. A banned crawler can then no longer be ruled out, the detection result is affected, and the accuracy of broken link detection is low.
Summary of the invention
In view of the above problems, the present invention provides a website broken link detection method and device whose main purpose is to improve the accuracy of website broken link detection.
To solve the above technical problem, in a first aspect the present invention provides a website broken link detection method, the method comprising:
obtaining the website links in a website that return exception information;
judging whether the number of website links returning exception information exceeds the broken link threshold of the website;
if not, determining that the website links returning exception information are broken links;
if so, detecting the website links returning exception information through a proxy server, and, when the feedback information of that detection is identical to the exception information, determining that the website links returning exception information are broken links.
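The four steps of the first aspect can be sketched as a single function. This is only an illustrative sketch, not the patented implementation: the name detect_broken_links, the dictionary that maps each link to its feedback, and the recheck_via_proxy callable are all assumptions introduced here, and the feedback is assumed to be an HTTP status code.

```python
from typing import Callable, Dict, List


def detect_broken_links(
    abnormal: Dict[str, int],
    threshold: int,
    recheck_via_proxy: Callable[[str], int],
) -> List[str]:
    """Return the links judged to be broken.

    abnormal maps each link that returned exception information to that
    information (assumed here to be an HTTP status code).
    recheck_via_proxy is a hypothetical callable that crawls one link again
    through a proxy server and returns the feedback it receives.
    """
    if len(abnormal) <= threshold:
        # Threshold not exceeded: the crawler is taken to be working,
        # so every link that returned exception information is broken.
        return sorted(abnormal)
    # Threshold exceeded: the crawler may be banned, so keep only the links
    # whose proxy feedback is identical to the original exception information.
    return sorted(
        url for url, feedback in abnormal.items()
        if recheck_via_proxy(url) == feedback
    )
```

Passing the proxy check in as a callable keeps the decision logic independent of how the proxy request is actually made.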
Optionally, determining that a website link returning exception information is a broken link when the feedback information obtained by detecting it through the proxy server is identical to the exception information comprises:
obtaining address information through the proxy server;
crawling the website link returning exception information with the crawler using the address information, and obtaining the feedback information of that website link;
judging whether the feedback information of the website link returning exception information is identical to the exception information;
if identical, determining that the website link returning exception information is a broken link.
Optionally, determining that the website links returning exception information are broken links if their number is judged not to exceed the broken link threshold of the website comprises:
after the number of website links returning exception information is judged not to exceed the broken link threshold of the website, obtaining address information through the proxy server;
crawling the website link returning exception information with the crawler using the address information, and obtaining the feedback information of that website link;
when the feedback information of the website link returning exception information is identical to the exception information, determining that the website link is indeed a broken link.
Optionally, the method further comprises:
when the feedback information obtained by detecting a website link returning exception information through the proxy server differs from the exception information, determining that the website link is not a broken link;
outputting a reminder message, the reminder message indicating that the original server address used by the crawler has been banned by the website.
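The optional reminder step might look like the sketch below. The function name, the return shape, and the wording of the reminder are hypothetical; the source only states that a reminder is output when the proxy feedback differs from the original exception information.

```python
from typing import Callable, Dict, List, Optional, Tuple


def recheck_with_reminder(
    abnormal: Dict[str, int],
    recheck_via_proxy: Callable[[str], int],
) -> Tuple[List[str], List[str], Optional[str]]:
    """Split abnormal links into broken and not broken after a proxy recheck.

    recheck_via_proxy is a hypothetical callable returning the feedback
    obtained for one link through the proxy server.
    """
    broken: List[str] = []
    not_broken: List[str] = []
    for url, feedback in abnormal.items():
        if recheck_via_proxy(url) == feedback:
            broken.append(url)       # same feedback through the proxy
        else:
            not_broken.append(url)   # feedback differs: not a broken link
    reminder = None
    if not_broken:
        # Illustrative wording only; the source does not fix the message text.
        reminder = (
            "The original server address used by the crawler appears to be "
            f"banned by the website ({len(not_broken)} link(s) differ via proxy)."
        )
    return broken, not_broken, reminder
```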
Optionally, obtaining the website links in the website that return exception information comprises:
crawling all links of the website in turn with the crawler;
determining, from the information returned by the website links crawled in turn, which of the website links return exception information;
counting the website links that return exception information.
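These three sub-steps can be illustrated with a short helper. It is a sketch under assumptions: crawl and is_abnormal are hypothetical callables supplied by the caller, because the source does not prescribe how a link is fetched or how exception information is recognised.

```python
from typing import Callable, Dict, Iterable, Tuple, TypeVar

Feedback = TypeVar("Feedback")


def collect_abnormal(
    links: Iterable[str],
    crawl: Callable[[str], Feedback],
    is_abnormal: Callable[[Feedback], bool],
) -> Tuple[Dict[str, Feedback], int]:
    """Crawl each link in turn, keep those with abnormal feedback, count them."""
    abnormal: Dict[str, Feedback] = {}
    for link in links:
        feedback = crawl(link)            # crawl the link
        if is_abnormal(feedback):         # decide from the feedback
            abnormal[link] = feedback
    return abnormal, len(abnormal)        # the links and their number
```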
Optionally, the exception information is an abnormal webpage status code.
In a second aspect, the present invention further provides a website broken link detection device, the device comprising:
an acquiring unit, configured to obtain the website links in a website that return exception information;
a judging unit, configured to judge whether the number of website links returning exception information obtained by the acquiring unit exceeds the broken link threshold of the website;
a determination unit, configured to determine that the website links returning exception information are broken links if the judging unit judges that their number does not exceed the broken link threshold of the website;
a detection unit, configured to detect the website links returning exception information through a proxy server if the judging unit judges that their number exceeds the broken link threshold of the website;
the determination unit being further configured to determine that a website link returning exception information is a broken link when the feedback information detected by the detection unit is identical to the exception information.
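One way to picture how these units cooperate is a small class in which each unit becomes a field or method. The class and member names are assumptions made for illustration and do not come from the source.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class BrokenLinkDetector:
    acquire: Callable[[], Dict[str, int]]  # acquiring unit: link -> exception info
    probe: Callable[[str], int]            # detection unit: feedback via proxy
    threshold: int                         # broken link threshold of the website

    def exceeds_threshold(self, abnormal: Dict[str, int]) -> bool:
        """Judging unit: compare the number of abnormal links with the threshold."""
        return len(abnormal) > self.threshold

    def run(self) -> List[str]:
        """Determination unit: decide which abnormal links are broken links."""
        abnormal = self.acquire()
        if not self.exceeds_threshold(abnormal):
            return sorted(abnormal)
        return sorted(u for u, fb in abnormal.items() if self.probe(u) == fb)
```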
Optionally, the determination unit comprises:
a first obtaining module, configured to obtain address information through the proxy server;
a first crawling module, configured to crawl the website link returning exception information with the crawler using the address information obtained by the first obtaining module;
the first obtaining module being further configured to obtain the feedback information of the website link returning exception information;
a judgment module, configured to judge whether the feedback information of the website link returning exception information obtained by the first obtaining module is identical to the exception information;
a first determining module, configured to determine that the website link returning exception information is a broken link if the judgment module judges that its feedback information is identical to the exception information.
Optionally, the determination unit comprises:
a second obtaining module, configured to obtain address information through the proxy server after the number of website links returning exception information is judged not to exceed the broken link threshold of the website;
a second crawling module, configured to crawl the website link returning exception information with the crawler using the address information obtained by the second obtaining module;
the second obtaining module being further configured to obtain the feedback information of the website link returning exception information;
a second determining module, configured to determine that the website link is indeed a broken link when the feedback information of the website link returning exception information obtained by the second obtaining module is identical to the exception information.
Optionally, the device further comprises:
the determination unit being further configured to determine that the website link is not a broken link when the feedback information obtained by detecting the website link returning exception information through the proxy server differs from the exception information;
an output unit, configured to output a reminder message, the reminder message indicating that the original server address used by the crawler has been banned by the website.
Optionally, the acquiring unit comprises:
a crawling module, configured to crawl all links of the website in turn with the crawler;
a determining module, configured to determine, from the information returned by the website links that the crawling module crawls in turn, which of the website links return exception information;
a computing module, configured to count the website links returning exception information determined by the determining module.
Optionally, the exception information is an abnormal webpage status code.
To achieve the above object, according to a third aspect of the present invention, a storage medium is provided. The storage medium includes a stored program, wherein, when the program runs, the device in which the storage medium is located is controlled to execute the website broken link detection method described above.
To achieve the above object, according to a fourth aspect of the present invention, a processor is provided. The processor is configured to run a program, wherein the program, when running, executes the website broken link detection method described above.
With the above technical solution, the website broken link detection method and device provided by the present invention address the problem that, when the prior art determines broken links using a benchmark link, the benchmark link may become invalid and distort the detection result, lowering the accuracy of broken link detection. By comparing the number of website links that have returned exception information with the broken link threshold, the present invention can determine whether the crawler is currently banned. When the number does not exceed the threshold, the crawler is determined to function normally, and a link that returns exception information is determined to be a broken link; this rules out the influence of a banned crawler on broken link detection and improves its accuracy. In addition, when the number of links returning exception information exceeds the broken link threshold, the website links returning exception information are detected through a proxy server, which makes it possible to identify whether the crawler is banned; when the feedback information detected through the proxy server is identical to the exception information, the link currently returning exception information is determined to be a broken link. This further eliminates the influence of a banned crawler, avoids the prior-art problem that an invalid benchmark link affects broken link detection, and further improves the accuracy of website broken link detection.
The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the specification, and so that the above and other objects, features and advantages of the invention are more apparent, specific embodiments of the invention are set out below.
Brief description of the drawings
By reading the following detailed description of the preferred embodiments, various other advantages and benefits will become clear to those of ordinary skill in the art. The drawings serve only to illustrate the preferred embodiments and are not to be considered a limitation of the invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 shows a flowchart of a website broken link detection method provided by an embodiment of the present invention;
Fig. 2 shows a flowchart of another website broken link detection method provided by an embodiment of the present invention;
Fig. 3 shows a block diagram of a website broken link detection device provided by an embodiment of the present invention;
Fig. 4 shows a block diagram of another website broken link detection device provided by an embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present invention are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the invention, it should be understood that the invention may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the invention can be understood thoroughly and its scope conveyed fully to those skilled in the art.
To improve the accuracy of website broken link detection, an embodiment of the present invention provides a website broken link detection method. As shown in Fig. 1, the method comprises:
101. Obtain the website links in a website that return exception information.
In general, when a page link in a website is accessed, the link responds to the access request and sends feedback information to the client. When the page link is not a broken link, the page can be accessed normally and returns normal feedback information. When the page link is a broken link, the page cannot be accessed normally, so the feedback information of the response is exception information; the exception information may take any of several forms, such as abnormal page data or an abnormal webpage status code. In addition, when detecting broken links, a crawler can be used to crawl the page links of the website so as to simulate user access to those links. Note that in the embodiments of the present invention user access is simulated mainly with a crawler, but other ways of simulating user access may be chosen as needed; no limitation is imposed here.
Therefore, according to the method of this step, a crawler can be used to crawl the website, and the website links returning exception information are obtained from the feedback information received after crawling.
102. Judge whether the number of website links returning exception information exceeds the broken link threshold of the website.
In general, a website contains a large number of page links, and not all of them are normal; there are always some abnormal page links, such as broken links. Both the administrators of a website and its visitors tolerate broken links to some extent as long as their number stays below a threshold. This gives rise to the concept of a broken link threshold, which can be regarded as a "red line" for the number of broken links: once it is exceeded, the website has so many broken links that the user experience is seriously affected and corrective action is needed. The broken link threshold of a website can therefore serve as a reference when its broken links are detected or analyzed.
In the method of this embodiment, broken link detection is performed with a crawler. When the crawler is banned by the website, it cannot crawl the links of the target website, and all of the feedback information it obtains is exception information, which impairs the accuracy of the detection result even though the links of the website can in fact be accessed normally without the crawler. To verify whether the website has banned the crawler, according to the method of this step, after step 101 obtains the website links returning exception information, the number of those links is counted and compared with the broken link threshold of the website to judge whether the threshold is exceeded.
After step 102 is executed, if the number of website links returning exception information is judged not to exceed the broken link threshold of the website, step 103 is executed:
103. Determine that the website links returning exception information are broken links.
When the number of website links returning exception information does not exceed the broken link threshold of the website, the crawler is functioning normally and has not been banned. Since the crawler can crawl the page links of the website normally, the website links that returned exception information during crawling are confirmed to be links that cannot be accessed normally, so they can be determined to be broken links, which ensures the accuracy of broken link detection.
After step 102 is executed, if the number of website links returning exception information is judged to exceed the broken link threshold of the website, step 104 is executed:
104. Detect the website links returning exception information through a proxy server, and, when the feedback information of that detection is identical to the exception information, determine that the website links returning exception information are broken links.
After the judgment in step 102, if the number of website links returning exception information exceeds the broken link threshold of the website, the crawler may have been banned, so that every page it crawls returns exception information and the number of links returning exception information exceeds the threshold. In this case broken link detection by the crawler is probably inaccurate, and the website links returning exception information need to be confirmed in another way. In the method of this step, a proxy server is used to detect each website link returning exception information once more, and whether the link is a broken link is determined from the result. When the feedback information obtained through the proxy server is identical to the earlier exception information, the detected link is confirmed to be one that cannot be accessed normally, so it can be determined to be a broken link.
For example, suppose broken link detection is performed on website A with a crawler, the number of links returning exception information is 33, and the broken link threshold is 5. According to the method of this step, because 33 is greater than the threshold of 5, the crawler has probably been banned by website A, and the detection of these 33 links is probably inaccurate. The 33 website links returning exception information are therefore detected through the proxy server. If website link a among them, when detected through the proxy server, returns the same information as the exception information returned before, then link a is indeed inaccessible and can be determined to be a broken link.
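The numbers in this example can be traced in a few lines. The status code 404 for link a is an assumed value chosen for illustration; the source only says that the proxy feedback equals the earlier exception information.

```python
def is_broken_after_recheck(original_feedback: int, proxy_feedback: int) -> bool:
    """A link is broken if the proxy returns the same exception information."""
    return original_feedback == proxy_feedback


abnormal_count, threshold = 33, 5

# 33 > 5, so the crawler may be banned and a proxy recheck is needed.
needs_proxy_recheck = abnormal_count > threshold

# Link a (assumed): 404 from the crawler, 404 again through the proxy.
link_a_broken = needs_proxy_recheck and is_broken_after_recheck(404, 404)

print(needs_proxy_recheck, link_a_broken)  # prints: True True
```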
The website broken link detection method provided by this embodiment addresses the problem that, when the prior art determines broken links using a benchmark link, the benchmark link may become invalid and distort the detection result, lowering the accuracy of broken link detection. The present invention compares the broken link threshold of the website with the number of website links returning exception information. When the number does not exceed the threshold, the crawler is determined to function normally, and therefore broken link detection is functioning normally. When the number exceeds the threshold, detection through a proxy server further determines whether the crawler functions normally, and therefore whether broken link detection is functioning normally. Compared with the prior art, the present invention uses the broken link threshold of the website and a proxy server to verify whether the crawler functions normally, which rules out a banned crawler, avoids inaccurate detection results when the crawler is banned by the website, and solves the prior-art problem that an unstable benchmark link leads to poor accuracy.
Further, as a refinement and extension of the embodiment shown in Fig. 1, an embodiment of the present invention provides another website broken link detection method, as shown in Fig. 2.
201. Obtain the website links in a website that return exception information.
The exception information in this embodiment may be an abnormal webpage status code, that is, a webpage status code that indicates an abnormal page state. When a user terminal sends an access request to a page, the page responds to the request, and the response includes a webpage status code indicating whether the page can be accessed normally.
A webpage status code, also known as an HTTP status code, is a 3-digit numeric code that indicates the state of a web server's HTTP response. It is defined by RFC 2616 and extended by RFC 2518, RFC 2817, RFC 2295, RFC 2774, RFC 4918 and other specifications. The abnormal webpage status codes in this embodiment are those status codes that indicate a page exception, for example "404" and "503".
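As a rough illustration, abnormal status codes could be recognised as follows. Treating every 4xx and 5xx code as abnormal is an assumption made for this sketch; the source names only "404" and "503" as examples and does not define the full set.

```python
from http import HTTPStatus


def is_abnormal_status(code: int) -> bool:
    """Assumed rule: client errors (4xx) and server errors (5xx) are abnormal."""
    return 400 <= code <= 599


def describe(code: int) -> str:
    """Return the standard reason phrase for a known status code."""
    try:
        return HTTPStatus(code).phrase
    except ValueError:
        # Code not registered in Python's http.HTTPStatus enumeration.
        return "Unknown"


for code in (200, 404, 503):
    print(code, describe(code), is_abnormal_status(code))
```

A site that uses redirects or custom codes might need a narrower or wider rule than the range check shown here.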
According to the method of this step, first, the crawler simulates user access and crawls all page links of the website to be detected layer by layer. Since each page link returns a corresponding webpage status code after being crawled, crawling yields all status codes returned by the links of the website together with the link corresponding to each status code.
Then the abnormal webpage status codes are filtered out of the status codes obtained. Note that because a webpage status code is a three-digit number assigned according to fixed rules, each status code has a fixed meaning. In the method of this step, the status codes that represent a page exception can therefore be selected according to their defined meanings; for example, "404" indicates that the access request failed, so it can be selected as an abnormal webpage status code. The page links that returned these codes are then identified, which gives the website links in the website that return abnormal webpage status codes.
Finally, the website links returning abnormal webpage status codes are counted, which gives the number of website links returning exception information.
Determining whether feedback information is exception information from the abnormal webpage status code, as in this step, simplifies the analysis and judgment of the feedback information and shortens the analysis time, which reduces the time consumed by the broken link detection of this embodiment and improves detection efficiency.
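The crawl, filter and count sequence of step 201 can be sketched with a breadth-first traversal, which visits the site layer by layer. The fetch callable is hypothetical: it is assumed to return a page's status code together with the links found on that page, and the 4xx/5xx rule for abnormal codes is likewise an assumption.

```python
from collections import deque
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical fetcher: url -> (status code, links found on that page).
Fetch = Callable[[str], Tuple[int, Iterable[str]]]


def crawl_statuses(start: str, fetch: Fetch) -> Dict[str, int]:
    """Crawl layer by layer from start, recording each link's status code."""
    statuses: Dict[str, int] = {}
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        code, links = fetch(url)
        statuses[url] = code
        for link in links:
            if link not in seen:      # visit every link only once
                seen.add(link)
                queue.append(link)
    return statuses


def abnormal_links(statuses: Dict[str, int]) -> Dict[str, int]:
    """Keep the links whose status code is abnormal (assumed: 4xx or 5xx)."""
    return {url: code for url, code in statuses.items() if 400 <= code <= 599}
```

The number of website links returning exception information is then simply the size of the dictionary returned by abnormal_links.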
202. Judge whether the number of website links that feed back exception information exceeds the broken-link threshold of the website.
The broken-link threshold of the website in this step is the same as that described in step 102 of the foregoing embodiment and is not described again here. Accordingly, in this step, after the number of website links that feed back abnormal web page status codes has been calculated in step 201, that number can be compared with the broken-link threshold, and the judgment is made according to the comparison result.
In addition, if it is judged after step 202 is executed that the number of website links that feed back exception information does not exceed the broken-link threshold of the website, step 203 is executed, specifically:
203. Determine that the website links that feed back exception information are website broken links.
In general, when a crawler is used to crawl website links, the crawler may be disabled by the website. In that case, the feedback received when crawling any link of the website is an abnormal web page status code, rather than the normal web page status code that the crawler's access to the link would otherwise return. Because the crawler's function has in effect failed, the web page status codes obtained by using the crawler to simulate a user are no longer accurate.
Therefore, when the crawler is disabled by the website, the number of website links that feed back abnormal web page status codes generally exceeds the website's preset broken-link threshold. Accordingly, in this step, when the number of website links that feed back abnormal web page status codes is judged not to exceed the broken-link threshold of the website, the current crawler has not been disabled and is functioning normally. This shows that the links reported by the crawler are indeed abnormal, so the website links that feed back abnormal web page status codes can be determined to be website broken links.
In addition, to further improve the accuracy of website broken-link detection, after it is judged that the number of website links that feed back abnormal web page status codes does not exceed the broken-link threshold of the website, those website links can be checked further through a proxy server.
Specifically, address information can first be obtained through the proxy server. A crawler always crawls the web page links of a website from some address; for example, it uses one fixed IP address to crawl all links of the target website. The main purpose of the method described in the embodiments of the present invention, however, is to solve the problem of inaccurate detection results caused by the crawler being disabled, and a website usually disables a crawler by blocking the crawler's address: whatever operation is requested from that address, the website does not respond to it normally. Therefore, in this step, when the number is determined not to exceed the broken-link threshold, a proxy server is used to provide the crawler with an additional address, so as to rule out the case in which the address used by the crawler has been blocked.
Then, using that address information, the crawler crawls the website links that fed back abnormal web page status codes again. Each crawled link returns a web page status code; obtaining it gives the status code that a link which previously fed back an abnormal status code returns when crawled again.
Finally, the status code obtained from the second crawl is compared with the abnormal web page status code obtained from the earlier crawl. When the two status codes are the same, the link feeds back an abnormal web page status code whether it is crawled from the crawler's original address or from the address provided by the proxy server. This shows that the crawler is functioning normally, and the website link is determined to be indeed a website broken link.
For example, suppose the broken-link threshold of website B is 10 and the website links currently obtained that feed back abnormal status codes are link 1, link 2, link 3, link 4, link 5, link 6 and link 7, seven in total. Because this number is less than the broken-link threshold of 10, the method of this step can determine that the current crawler has not been disabled and that these 7 links are website broken links of website B. Further, once the number of links that feed back abnormal status codes (7) is determined to be less than the broken-link threshold (10), a proxy server can be used to provide the crawler with a new address, and the 7 links can be crawled again. If crawling link 1 without the new address provided by the proxy server returned the abnormal web page status code "404", indicating that the web page request failed, and the status code received when link 1 is crawled again is also "404", then the result for link 1 is the abnormal status code "404" whether it is crawled from the crawler's original address or from the new address provided by the proxy server. It can thus be determined that the crawler has not been disabled and that link 1 is indeed a website broken link.
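The website B example can be traced with a small sketch. This is an illustration under stated assumptions rather than the patented implementation: `confirmed_by_recrawl` and `recrawl_via_proxy` are hypothetical names, and the second crawl is replaced by a stand-in function that always returns 404.

```python
from typing import Callable, Dict


def confirmed_by_recrawl(first_codes: Dict[str, int],
                         recrawl: Callable[[str], int]) -> Dict[str, bool]:
    """For each link, report whether the second crawl returned the same code."""
    return {link: recrawl(link) == code for link, code in first_codes.items()}


# Website B: broken-link threshold 10, seven links fed back "404" at first.
threshold = 10
first_codes = {f"link {i}": 404 for i in range(1, 8)}


def recrawl_via_proxy(link: str) -> int:
    # Hypothetical stand-in for crawling from the proxy-provided address.
    return 404


below_threshold = len(first_codes) <= threshold
confirmed = confirmed_by_recrawl(first_codes, recrawl_via_proxy)
print(below_threshold, confirmed["link 1"])  # True True
```

Passing the second crawl in as a callable keeps the comparison logic independent of how the proxy address is actually obtained, which the description leaves open.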
Accordingly, with the method of this step, comparison with the broken-link threshold establishes that the crawler is functioning normally, which ensures that the website links that feed back abnormal web page status codes are indeed website broken links. This removes the influence of a disabled crawler on the detection result and improves the accuracy of website broken-link detection. Furthermore, when the number of web page links that feed back abnormal status codes does not exceed the broken-link threshold, the links are checked further through a proxy server, and a link is determined to be a website broken link only when the status code returned for a crawl from the proxy-provided address is the same as the abnormal status code obtained before. This further rules out the case in which the current crawler has been disabled and further improves the accuracy of website broken-link detection.
In addition, if it is judged after step 202 is executed that the number of website links that feed back exception information exceeds the broken-link threshold of the website, step 204 is executed, specifically:
204. Check the website links that feed back exception information through a proxy server, and when the feedback information of the check is identical to the exception information, determine that the website links that feed back exception information are website broken links.
After step 202 is executed, if the number of website links that feed back abnormal web page status codes is judged to exceed the broken-link threshold of the website, the current crawler may have been disabled by the website. To ensure the accuracy of website broken-link detection, the links therefore need to be checked through a proxy server. The checking process may be as follows. First, address information is obtained through the proxy server and used as the new address from which the crawler crawls the website links again. Next, the website links that fed back abnormal web page status codes are crawled again from the new address, and the web page status code each link returns for this crawl is obtained. The status code returned after the second crawl is then compared with the abnormal web page status code returned by the earlier crawl to judge whether the two are the same. Finally, when the two status codes are the same, the website link fed back an abnormal web page status code in both crawls, so the website link can be determined to be indeed a website broken link.
For example, suppose the broken-link threshold of website C is 5 and the website links currently obtained that feed back abnormal status codes are link a, link b, link c, link d, link e, link f and link g, seven in total. Because this number is greater than the broken-link threshold of 5, the method of this step treats the crawler as possibly disabled and the detection result as possibly inaccurate, so an additional check is needed; for example, a proxy server can be used to ensure the accuracy of website broken-link detection. If crawling link b without the new address provided by the proxy server returned the abnormal web page status code "503", indicating that the server is currently unable to handle the request, and the status code received when link b is crawled again is also "503", then the result for link b is the abnormal status code "503" whether it is crawled from the crawler's original address or from the new address provided by the proxy server. It can thus be determined that link b is a website broken link.
Accordingly, with the method of this step, when exceeding the broken-link threshold makes it impossible to tell whether the crawler has been disabled, the proxy server rules out interference from the website disabling the crawler, which improves the accuracy of website broken-link detection.
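The second crawl through a proxy address described in step 204 could be sketched with the Python standard library as follows. This is a rough sketch under assumptions: the description does not say how the proxy address is obtained or which HTTP client is used, so `make_opener`, `fetch_status` and any proxy URL passed to them are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Optional


def make_opener(proxy_url: Optional[str] = None) -> urllib.request.OpenerDirector:
    """Build an opener; with proxy_url set, requests go through that proxy."""
    if proxy_url is None:
        return urllib.request.build_opener()
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch_status(url: str, opener, timeout: float = 10.0) -> int:
    """Return the HTTP status code that `url` feeds back to this opener."""
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx; the code is still the feedback.
        return err.code
    # Connection-level failures (urllib.error.URLError) are left to the caller.
```

A caller would fetch the same link once with `make_opener()` and once with `make_opener(proxy_url)`, then compare the two status codes as the step describes.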
In addition, after step 202 is executed, when the feedback information obtained by checking the website links that feed back exception information through the proxy server differs from the exception information, step 205 is executed, specifically:
205. When the feedback information obtained by checking a website link that feeds back exception information through the proxy server differs from the exception information, determine that the website link is not a website broken link.
In the method of the embodiments of the present invention, when the website links that feed back abnormal web page status codes are checked through a proxy server, the status codes returned by the two crawls may be identical, as determined in step 204, but they may also differ. That is, the crawler received an abnormal web page status code when it crawled the link from its original address, but receives a normal web page status code after switching to the new address provided by the proxy server, so the two crawls return different results.
This step therefore covers the case in which the status code returned when the proxy server checks a website link that fed back an abnormal status code differs from that abnormal status code. As described in step 203, this happens because the website has blocked the address where the crawler is located: whatever operation that address requests from the website, the website does not respond. It follows that the link can be crawled normally from another address after the crawler's original address has been blocked. The website link that fed back the abnormal web page status code is therefore actually normal, and only the crawler's original address has been disabled, so the link can be determined not to be a website broken link.
206. Output a reminder message.
After step 205 has determined that the website link is not a website broken link, a reminder message needs to be output. The reminder message is mainly used to notify the relevant personnel that the client address currently used by the crawler has been blocked by the website and that the crawler can no longer crawl from that address, so that they can take appropriate action, for example replacing the address information of the current client.
Accordingly, with the method of steps 205 and 206, when the status code fed back by a website link differs from the status code returned through the proxy server, the website link can be determined not to be a website broken link. This confirms that the crawler's original address has been blocked and that the crawler is in effect disabled when crawling from that address. The disabled crawler is thus discovered promptly, and the reminder message lets the relevant personnel adjust the crawler in time, so that website broken-link detection by the crawler does not produce misjudgments because the crawler has been disabled.
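Steps 202 to 206 taken together can be summarized in a short sketch. It is only an illustration under assumptions: `DetectionResult`, `detect`, the wording of the reminder message and the stand-in proxy results for website C (other than link b, which the example gives as "503") are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DetectionResult:
    broken: List[str] = field(default_factory=list)
    not_broken: List[str] = field(default_factory=list)
    reminders: List[str] = field(default_factory=list)


def detect(abnormal: Dict[str, int], threshold: int,
           recrawl_via_proxy: Callable[[str], int]) -> DetectionResult:
    """abnormal maps each link to the abnormal code from the first crawl."""
    result = DetectionResult()
    # Steps 202/203: the count does not exceed the threshold.
    if len(abnormal) <= threshold:
        result.broken = list(abnormal)
        return result
    # Steps 204/205: check every link again through the proxy address.
    for link, code in abnormal.items():
        if recrawl_via_proxy(link) == code:
            result.broken.append(link)
        else:
            result.not_broken.append(link)
    # Step 206: differing feedback suggests the original address is blocked.
    if result.not_broken:
        result.reminders.append(
            "Crawler's original address appears to be blocked by the website.")
    return result


# Website C: threshold 5, links a to g fed back "503" on the first crawl.
abnormal = {f"link {c}": 503 for c in "abcdefg"}


def recrawl_via_proxy(link: str) -> int:
    # Assumed proxy results: only link b still returns 503.
    return 503 if link == "link b" else 200


result = detect(abnormal, threshold=5, recrawl_via_proxy=recrawl_via_proxy)
print(result.broken, len(result.not_broken), len(result.reminders))
```

The optional proxy check below the threshold described under step 203 is omitted here to keep the branch structure visible.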
Further, as an implementation of the method shown in Fig. 1 above, an embodiment of the present invention also provides a website broken-link detection device for implementing the method shown in Fig. 1. This device embodiment corresponds to the foregoing method embodiment. For ease of reading, the details of the method embodiment are not repeated one by one here, but it should be understood that the device in this embodiment can implement everything in the foregoing method embodiment. As shown in Fig. 3, the device includes an acquiring unit 31, a judging unit 32, a determination unit 33 and a detection unit 34, wherein:
The acquiring unit 31 can be used to obtain the website links in the website that feed back exception information.
The judging unit 32 can be used to judge whether the number of website links that feed back exception information, as obtained by the acquiring unit 31, exceeds the broken-link threshold of the website.
The determination unit 33 can be used to determine that the website links that feed back exception information are website broken links if the judging unit 32 judges that their number does not exceed the broken-link threshold of the website.
The detection unit 34 can be used to check the website links that feed back exception information through a proxy server if the judging unit 32 judges that their number exceeds the broken-link threshold of the website.
The determination unit 33 can also be used to determine that the website links that feed back exception information are website broken links when the feedback information of the check by the detection unit 34 is identical to the exception information.
Further, as an implementation of the method shown in Fig. 2 above, an embodiment of the present invention also provides another website broken-link detection device for implementing the method shown in Fig. 2. This device embodiment corresponds to the foregoing method embodiment. For ease of reading, the details of the method embodiment are not repeated one by one here, but it should be understood that the device in this embodiment can implement everything in the foregoing method embodiment. As shown in Fig. 4, the device includes an acquiring unit 41, a judging unit 42, a determination unit 43 and a detection unit 44, wherein:
The acquiring unit 41 can be used to obtain the website links in the website that feed back exception information.
The judging unit 42 can be used to judge whether the number of website links that feed back exception information, as obtained by the acquiring unit 41, exceeds the broken-link threshold of the website.
The determination unit 43 can be used to determine that the website links that feed back exception information are website broken links if the judging unit 42 judges that their number does not exceed the broken-link threshold of the website.
The detection unit 44 can be used to check the website links that feed back exception information through a proxy server if the judging unit 42 judges that their number exceeds the broken-link threshold of the website.
The determination unit 43 can also be used to determine that the website links that feed back exception information are website broken links when the feedback information of the check by the detection unit 44 is identical to the exception information.
Further, the determination unit 43 includes:
A first acquisition module 431, which can be used to obtain address information through a proxy server.
A first crawl module 432, which can be used to crawl the website links that feed back exception information with the crawler, using the address information obtained by the first acquisition module 431.
The first acquisition module 431 can also be used to obtain the feedback information of the website links that feed back exception information.
A judgment module 433, which can be used to judge whether the feedback information of the website links that feed back exception information, as obtained by the first acquisition module 431, is identical to the exception information.
A first determining module 434, which can be used to determine that the website links that feed back exception information are website broken links if the judgment module 433 judges that their feedback information is identical to the exception information.
Further, the determination unit 43 includes:
A second acquisition module 435, which can be used to obtain address information through a proxy server after it is judged that the number of website links that feed back exception information does not exceed the broken-link threshold of the website.
A second crawl module 436, which can be used to crawl the website links that feed back exception information with the crawler, using the address information obtained by the second acquisition module 435.
The second acquisition module 435 can also be used to obtain the feedback information of the website links that feed back exception information.
A second determining module 437, which can be used to determine that a website link is indeed a website broken link when the feedback information of that link, as obtained by the second acquisition module 435, is identical to the exception information.
Further, in the device:
The determination unit 43 can also be used to determine that a website link is not a website broken link when the feedback information obtained by checking that link through the proxy server differs from the exception information.
The device further includes an output unit 45, which can be used to output a reminder message after the determination unit 43 determines that the website link is not a website broken link, the reminder message indicating that the original server address used by the crawler has been disabled by the website.
Further, the acquiring unit 41 includes:
A crawl module 411, which can be used to crawl all links of the website layer by layer using the crawler.
A determining module 412, which can be used to determine, from the information fed back by the website links that the crawl module 411 crawled layer by layer, the website links that feed back exception information.
A computing module 413, which can be used to calculate the number of website links that feed back exception information, as determined by the determining module 412.
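The three modules of the acquiring unit can be illustrated with a breadth-first crawl over a toy site. This is a sketch under assumptions: the description does not define "layer by layer" further, so reading it as breadth-first traversal, the `max_depth` limit, the `fetch` callable and the toy `site` table are all hypothetical.

```python
from collections import deque
from typing import Callable, Dict, Iterable, Tuple

# fetch(url) -> (status code, links found on that page); supplied by the caller.
Fetch = Callable[[str], Tuple[int, Iterable[str]]]


def crawl_layer_by_layer(start: str, fetch: Fetch,
                         max_depth: int = 3) -> Dict[str, int]:
    """Visit links one layer at a time and record each link's status code."""
    statuses: Dict[str, int] = {}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        code, children = fetch(url)
        statuses[url] = code
        if depth >= max_depth:
            continue  # do not expand links beyond the assumed depth limit
        for child in children:
            if child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))
    return statuses


# Toy site standing in for real pages: url -> (status code, outgoing links).
site = {
    "/": (200, ["/a", "/b"]),
    "/a": (200, ["/c"]),
    "/b": (404, []),
    "/c": (503, []),
}
statuses = crawl_layer_by_layer("/", lambda url: site[url])       # module 411
abnormal = {u: c for u, c in statuses.items() if c >= 400}        # module 412
print(len(abnormal))                                              # module 413: 2
```

The `seen` set prevents a page that is linked from several places from being crawled and counted more than once.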
Further, the exception information is an abnormal web page status code.
The embodiments of the present invention provide another website broken-link detection method and device. In the prior art, website broken links are determined by means of a reference link, and a failure of the reference link can distort the detection result and lower the accuracy of website broken-link detection. The present invention instead compares the number of website links that feed back exception information with the broken-link threshold of the website. When the number does not exceed the threshold, the crawler is determined to be functioning normally, and the website broken-link detection is therefore determined to be functioning normally. When the number exceeds the threshold, a check through a proxy server further determines whether the crawler is functioning normally, and hence whether the website broken-link detection is functioning normally. Compared with the prior art, the present invention thus uses the broken-link threshold and a proxy server to verify whether the crawler is functioning normally, which rules out the case in which the crawler has been disabled, avoids inaccurate detection results when the crawler is disabled by the website, and solves the prior-art problem of poor detection accuracy caused by an unstable reference link. In addition, using the abnormal status codes among the web page status codes as the exception information simplifies the analysis and determination of the feedback information and shortens its analysis time, which reduces the time consumed by website broken-link detection and improves detection efficiency. Meanwhile, through comparison with the broken-link threshold, when the number of website links that feed back abnormal status codes does not exceed the threshold, the crawler can be determined to be functioning normally, which ensures that the website links that feed back abnormal web page status codes are indeed website broken links, removes the influence of a disabled crawler on the detection result, and improves the accuracy of website broken-link detection. Furthermore, when the number of web page links that feed back abnormal status codes does not exceed the broken-link threshold, the links are checked further through a proxy server, and a link is determined to be a website broken link only when the status code returned for a crawl from the proxy-provided address is the same as the abnormal status code obtained before; this further rules out the case in which the current crawler has been disabled and further improves accuracy. In addition, when the number of website links that feed back abnormal status codes exceeds the broken-link threshold, so that it cannot be determined whether the crawler has been disabled, checking through a proxy server rules out interference from the website disabling the crawler and improves the accuracy of website broken-link detection. Finally, when the check through the proxy server shows that the status code fed back by a website link differs from the status code returned through the proxy server, the website link can be determined not to be a website broken link and a reminder message is output. The disabled crawler is thus discovered promptly and the relevant personnel can adjust the crawler in time, so that website broken-link detection by the crawler does not produce misjudgments because the crawler has been disabled, which further improves the accuracy of website broken-link detection.
The website broken-link detection device includes a processor and a memory. The acquiring unit, judging unit, determination unit, detection unit and so on described above are stored in the memory as program units, and the processor executes these program units stored in the memory to implement the corresponding functions.
The processor contains a kernel, and the kernel retrieves the corresponding program unit from the memory. One or more kernels may be provided, and the accuracy of website broken-link detection is improved by adjusting kernel parameters.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip.
An embodiment of the present invention provides a storage medium on which a program is stored, the program implementing the website broken-link detection method when executed by a processor.
An embodiment of the present invention provides a processor for running a program, wherein the program executes the website broken-link detection method when run.
An embodiment of the present invention provides a device that includes a processor, a memory, and a program stored in the memory and runnable on the processor. When executing the program, the processor performs the following steps: obtaining the website links in the website that feed back exception information; judging whether the number of website links that feed back exception information exceeds the broken-link threshold of the website; if not, determining that the website links that feed back exception information are website broken links; if so, checking the website links that feed back exception information through a proxy server, and when the feedback information of the check is identical to the exception information, determining that the website links that feed back exception information are website broken links.
Further, determining that the website links that feed back exception information are website broken links when the feedback information obtained by checking those links through the proxy server is identical to the exception information includes:
Obtaining address information through the proxy server;
Using the address information, crawling the website links that feed back exception information with the crawler, and obtaining the feedback information of those website links;
Judging whether the feedback information of the website links that feed back exception information is identical to the exception information;
If identical, determining that the website links that feed back exception information are website broken links.
Further, determining that the website links that feed back exception information are website broken links if their number is judged not to exceed the broken-link threshold of the website includes:
After judging that the number of website links that feed back exception information does not exceed the broken-link threshold of the website, obtaining address information through a proxy server;
Using the address information, crawling the website links that feed back exception information with the crawler, and obtaining the feedback information of those website links;
When the feedback information of a website link that feeds back exception information is identical to the exception information, determining that the website link is indeed a website broken link.
Further, after it is judged that the number of website links that feed back exception information exceeds the broken-link threshold of the website, the method further includes:
When the feedback information obtained by checking a website link that feeds back exception information through the proxy server differs from the exception information, determining that the website link is not a website broken link;
Outputting a reminder message, the reminder message indicating that the original server address used by the crawler has been disabled by the website.
Further, obtaining the website links in the website that feed back exception information includes:
Crawling all links of the website layer by layer using the crawler;
Determining, from the information fed back by the website links that the crawler crawled layer by layer, the website links that feed back exception information;
Calculating the number of website links that feed back exception information.
Further, the exception information is an abnormal web page status code.
The device in the embodiments of the present invention may be a server, a PC, a tablet (PAD), a mobile phone or the like.
An embodiment of the present invention also provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps: obtaining the website links in the website that feed back exception information; judging whether the number of website links that feed back exception information exceeds the broken-link threshold of the website; if not, determining that the website links that feed back exception information are website broken links; if so, checking the website links that feed back exception information through a proxy server, and when the feedback information of the check is identical to the exception information, determining that the website links that feed back exception information are website broken links.
Further, determining that the website links that feed back exception information are website broken links when the feedback information obtained by checking those links through the proxy server is identical to the exception information includes:
Obtaining address information through the proxy server;
Using the address information, crawling the website links that feed back exception information with the crawler, and obtaining the feedback information of those website links;
Judging whether the feedback information of the website links that feed back exception information is identical to the exception information;
If identical, determining that the website links that feed back exception information are website broken links.
Further, determining that the website links that feed back exception information are website broken links if their number is judged not to exceed the broken-link threshold of the website includes:
After judging that the number of website links that feed back exception information does not exceed the broken-link threshold of the website, obtaining address information through a proxy server;
Using the address information, crawling the website links that feed back exception information with the crawler, and obtaining the feedback information of those website links;
When the feedback information of a website link that feeds back exception information is identical to the exception information, determining that the website link is indeed a website broken link.
Further, after it is judged that the number of website links that feed back exception information exceeds the broken-link threshold of the website, the method further includes:
When the feedback information obtained by checking a website link that feeds back exception information through the proxy server differs from the exception information, determining that the website link is not a website broken link;
Outputting a reminder message, the reminder message indicating that the original server address used by the crawler has been disabled by the website.
Further, obtaining the website links in the website that return exception information includes:
Crawling all links of the website one by one with a crawler;
Determining, from the information returned by each website link crawled by the crawler, which of the website links returned exception information;
Counting the website links that returned the exception information.
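A minimal sketch of the crawl-and-count steps above is given below. It assumes, for illustration only, that the exception information is an HTTP status code of 400 or higher; the function name `find_abnormal_links` and the injected `fetch` callable are likewise hypothetical.

```python
from typing import Callable, Dict, Iterable, Tuple


def find_abnormal_links(
    links: Iterable[str],
    fetch: Callable[[str], int],
) -> Tuple[Dict[str, int], int]:
    """Crawl links in order and collect those that return an abnormal code.

    fetch returns the status code for one link. Treating codes >= 400 as
    abnormal is an assumption made for this sketch.
    """
    abnormal: Dict[str, int] = {}
    for url in links:
        code = fetch(url)
        if code >= 400:
            abnormal[url] = code
    # The count is what gets compared against the broken-link threshold.
    return abnormal, len(abnormal)
```

The returned count can then be compared with the broken-link threshold of the website to decide whether a proxy re-check is needed.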
Further, the exception information is a web page abnormality code.
Those skilled in the art should understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) that contain computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include volatile memory in computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, commodity, or device that includes the element.
The above are merely embodiments of the present application and are not intended to limit it. For those skilled in the art, the present application may have various modifications and changes. Any modification, equivalent replacement, improvement, etc., made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.

Claims (10)

1. A website broken link detection method, characterized in that the method comprises:
Obtaining the website links in a website that return exception information;
Judging whether the number of website links that returned the exception information exceeds the broken-link threshold of the website;
If not, determining that the website links that returned the exception information are broken links;
If so, detecting the website links that returned the exception information through a proxy server, and when the feedback information from the detection is identical to the exception information, determining that the website link that returned the exception information is a broken link.
2. The method according to claim 1, characterized in that determining that the website link that returned the exception information is a broken link, when the feedback information obtained by detecting that website link through the proxy server is identical to the exception information, comprises:
Obtaining address information through the proxy server;
Using the address information, crawling the website link that returned the exception information with a crawler, and obtaining the feedback information of that website link;
Judging whether the feedback information of the website link that returned the exception information is identical to the exception information;
If it is identical, determining that the website link that returned the exception information is a broken link.
3. The method according to claim 2, characterized in that determining that the website links that returned the exception information are broken links, if their number is judged not to exceed the broken-link threshold of the website, comprises:
After judging that the number of website links that returned the exception information does not exceed the broken-link threshold of the website, obtaining address information through the proxy server;
Using the address information, crawling the website links that returned the exception information with a crawler, and obtaining the feedback information of those website links;
When the feedback information of a website link that returned the exception information is identical to the exception information, determining that the website link is indeed a broken link.
4. The method according to claim 2, characterized in that the method further comprises:
When the feedback information obtained by detecting a website link that returned the exception information through the proxy server differs from the exception information, determining that the website link is not a broken link;
Outputting a prompt message, the prompt message indicating that the original server address used by the crawler has been banned by the website.
5. The method according to claim 1, characterized in that obtaining the website links in the website that return exception information comprises:
Crawling all links of the website one by one with a crawler;
Determining, from the information returned by each website link crawled by the crawler, which of the website links returned exception information;
Counting the website links that returned the exception information.
6. The method according to any one of claims 1-5, characterized in that the exception information is a web page abnormality code.
7. A website broken link detection device, characterized in that the device comprises:
An acquiring unit, configured to obtain the website links in a website that return exception information;
A judging unit, configured to judge whether the number of website links that returned the exception information, as obtained by the acquiring unit, exceeds the broken-link threshold of the website;
A determining unit, configured to determine that the website links that returned the exception information are broken links if the judging unit judges that their number does not exceed the broken-link threshold of the website;
A detecting unit, configured to detect the website links that returned the exception information through a proxy server if the judging unit judges that their number exceeds the broken-link threshold of the website;
The determining unit is further configured to determine that a website link that returned the exception information is a broken link when the feedback information detected by the detecting unit is identical to the exception information.
8. The device according to claim 7, characterized in that the determining unit comprises:
A first obtaining module, configured to obtain address information through the proxy server;
A first crawling module, configured to crawl the website links that returned the exception information with a crawler, using the address information obtained by the first obtaining module;
The first obtaining module is further configured to obtain the feedback information of the website links that returned the exception information;
A judging module, configured to judge whether the feedback information of a website link that returned the exception information, as obtained by the first obtaining module, is identical to the exception information;
A first determining module, configured to determine that the website link that returned the exception information is a broken link if the judging module judges that the feedback information of that website link is identical to the exception information.
9. A storage medium, characterized in that the storage medium includes a stored program, wherein, when the program runs, the device on which the storage medium resides is controlled to execute the website broken link detection method according to any one of claims 1 to 6.
10. A processor, characterized in that the processor is configured to run a program, wherein the program, when running, executes the website broken link detection method according to any one of claims 1 to 6.
CN201710612685.9A 2017-07-25 2017-07-25 Website broken link detection method and device Active CN109302299B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710612685.9A CN109302299B (en) 2017-07-25 2017-07-25 Website broken link detection method and device

Publications (2)

Publication Number Publication Date
CN109302299A (en) 2019-02-01
CN109302299B (en) 2021-12-28

Family

ID=65167402

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710612685.9A Active CN109302299B (en) 2017-07-25 2017-07-25 Website broken link detection method and device

Country Status (1)

Country Link
CN (1) CN109302299B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102739663A (en) * 2012-06-18 2012-10-17 奇智软件(北京)有限公司 Detection method and scanning engine of web pages
CA2762544A1 (en) * 2011-12-20 2013-06-20 Ibm Canada Limited - Ibm Canada Limitee Identifying requests that invalidate user sessions
US20130179217A1 (en) * 2010-06-21 2013-07-11 Salesforce.Com, Inc. Referred internet traffic analysis system and method
CN104182462A (en) * 2014-07-21 2014-12-03 安徽华贞信息科技有限公司 Web crawler service system for housing library network
CN104537005A (en) * 2014-12-15 2015-04-22 北京国双科技有限公司 Data processing method and device for webpage crawling
CN106547793A (en) * 2015-09-22 2017-03-29 北京国双科技有限公司 The method and apparatus for obtaining proxy server address
CN106874487A (en) * 2017-02-21 2017-06-20 国信优易数据有限公司 A kind of distributed reptile management system and its method

Also Published As

Publication number Publication date
CN109302299B (en) 2021-12-28

Similar Documents

Publication Publication Date Title
US10459780B2 (en) Automatic application repair by network device agent
CN105243252B (en) A kind of method and device of account risk assessment
US20130276124A1 (en) Systems, methods, apparatuses and computer program products for providing mobile device protection
CN103984673A (en) Automatic detection of fraudulent ratings/comments related to an application store
CN107958456A (en) Dispensing detection method, device and electronic equipment
CN107943949A (en) A kind of method and server of definite web crawlers
CN108599995A (en) Network line failure judgment method and server
US20170053224A1 (en) System and method for providing multi-site visualization and scoring of performance against service agreement
CN106961410B (en) Abnormal access detection method and device
CN110968760A (en) Webpage data crawling method and device, and webpage login method and device
US20170053225A1 (en) System and method for providing visualization of performance against service agreement
CN109298987A (en) A kind of method and device detecting web crawlers operating status
CN109726068A (en) A kind of data detection method and device
CN106572056A (en) Risk monitoring method and device
CN108228431A (en) A kind of method and system of configurationization reptile quality-monitoring
CN105490835A (en) Information monitoring method and device
CN110941787A (en) Page redirection method and device
CN109302299A (en) A kind of website chain rupture detection method and device
CN106411860B (en) A kind of method and device of Internet protocol IP detection
CN107896232A (en) A kind of IP address appraisal procedure and device
CN110278105A (en) The method for detecting whole service operation quality based on zabbix and web testing
US10536534B2 (en) System and method for providing visual feedback in site-related service activity roadmap
CN106612261A (en) Website data obtaining method, devices and system
CN109597743A (en) Page circle choosing method, click volume statistical method and relevant device
US20170052957A1 (en) System and method for providing high-level graphical feedback related to overall site performance and health

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing

Applicant after: Beijing Guoshuang Technology Co.,Ltd.

Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu District, Haidian District, Beijing

Applicant before: Beijing Guoshuang Technology Co.,Ltd.

GR01 Patent grant