CN109302299A - Website broken link detection method and device - Google Patents
- Publication number
- CN109302299A (application CN201710612685.9A)
- Authority
- CN
- China
- Prior art keywords
- feedback
- web site
- website
- site url
- exception information
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/50—Network service management, e.g. ensuring proper service fulfilment according to agreements
- H04L41/5003—Managing SLA; Interaction between SLA and QoS
- H04L41/5009—Determining service level performance parameters or violations of service level contracts, e.g. violations of agreed response time or mean time between failures [MTBF]
- H04L41/5012—Determining service level performance parameters or violations of service level contracts, e.g. violations of agreed response time or mean time between failures [MTBF] determining service availability, e.g. which services are available at a certain point in time
- H04L41/5016—Determining service level performance parameters or violations of service level contracts, e.g. violations of agreed response time or mean time between failures [MTBF] determining service availability, e.g. which services are available at a certain point in time based on statistics of service availability, e.g. in percentage or over a given time
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
- H04L43/0805—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
- H04L43/0811—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking connectivity
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Environmental & Geological Engineering (AREA)
- Physics & Mathematics (AREA)
- Probability & Statistics with Applications (AREA)
- Information Transfer Between Computers (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention discloses a website broken link detection method and device in the field of network technology, intended to solve the low accuracy of detection results in prior-art broken link detection. The method includes: obtaining the website links that return exception information; judging whether the number of website links returning exception information exceeds the broken link threshold of the website; if not, determining that the website links returning exception information are broken links; if so, checking the website links returning exception information through a proxy server, and when the feedback from that check is identical to the exception information, determining that the website links returning exception information are broken links. The invention is suitable for detecting broken links on websites.
Description
Technical field
The present invention relates to the field of network technology, and in particular to a website broken link detection method and device.
Background art
With the growing popularity of the Internet, the network has become an important part of people's lives. When browsing the pages of a website, errors such as a page failing to display may occur; such links that cannot be browsed are generally called broken links. For a website, the number of broken links is an important indicator of its quality, so broken link detection is usually performed to monitor website quality. Typically, broken link detection uses a crawler to simulate user access and determines, from the feedback returned by each website link, whether that link is a broken link.
However, a crawler may be blocked by the website and then no longer function properly, so broken link detection performed with the crawler has low accuracy. To rule out crawler blocking and make sure the crawler works, the prior art, when using a crawler to detect broken links, generally first selects one link of the website as a reference link that serves as a control while the other links of the website are crawled. For example, when the crawler crawls other page links of the website, if the feedback is exception information but crawling the reference link does not return exception information, the page link currently returning exception information is determined to be a broken link. In practice, however, the reference link is not fixed: it often changes when the website is revised or upgraded, which invalidates the reference link. Crawler blocking can then no longer be ruled out and distorts the detection results, lowering the accuracy of broken link detection.
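The prior-art reference link check described above could be sketched as follows. The names `check_with_reference`, `Verdict`, and `fetch_status` are hypothetical, and modeling feedback as a status code is an assumption. The sketch shows why the approach is fragile: once the reference page itself disappears, the check can no longer distinguish a blocked crawler from a genuinely broken link.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    OK = "ok"
    BROKEN = "broken"
    INCONCLUSIVE = "inconclusive"  # reference link also failed: crawler blocked, or reference changed


def is_abnormal(status: int) -> bool:
    return 400 <= status <= 599  # assumption: 4xx/5xx are treated as abnormal


def check_with_reference(url: str, reference_url: str, fetch_status: Callable[[str], int]) -> Verdict:
    """Trust an abnormal result only if the reference link still responds normally."""
    if not is_abnormal(fetch_status(url)):
        return Verdict.OK
    if is_abnormal(fetch_status(reference_url)):
        # Cannot tell a blocked crawler apart from a reference page removed in a site revision.
        return Verdict.INCONCLUSIVE
    return Verdict.BROKEN


site = {"/home": 200, "/gone": 404}
print(check_with_reference("/gone", "/home", site.__getitem__))  # Verdict.BROKEN
site["/home"] = 404  # hypothetical: the reference page was removed during a redesign
print(check_with_reference("/gone", "/home", site.__getitem__))  # Verdict.INCONCLUSIVE
```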
Summary of the invention
In view of the above problems, the present invention provides a website broken link detection method and device whose main purpose is to improve the accuracy of broken link detection.
To solve the above technical problem, in a first aspect, the present invention provides a website broken link detection method, the method including:
Obtaining the website links that return exception information;
Judging whether the number of website links returning exception information exceeds the broken link threshold of the website;
If not, determining that the website links returning exception information are broken links;
If so, checking the website links returning exception information through a proxy server, and when the feedback from the check is identical to the exception information, determining that the website links returning exception information are broken links.
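The four steps above can be sketched as a single decision function. This is an illustration under stated assumptions, not the claimed implementation: exception information is modeled as a status code, `proxy_fetch` is a hypothetical callable that re-requests a link through a proxy server, and a count equal to the threshold is treated as "not exceeded", following the wording of the steps.

```python
from typing import Callable, Dict, List


def detect_broken_links(
    abnormal: Dict[str, int],           # link -> exception information (status code) seen by the crawler
    threshold: int,                     # broken link threshold of the website
    proxy_fetch: Callable[[str], int],  # hypothetical: re-fetches a link through a proxy server
) -> List[str]:
    """Threshold test first; re-check through a proxy only when the count exceeds the threshold."""
    if len(abnormal) <= threshold:
        # Not exceeded: the crawler is assumed to be working, so its findings are accepted.
        return list(abnormal)
    # Exceeded: the crawler may be blocked, so confirm each link via the proxy.
    return [url for url, info in abnormal.items() if proxy_fetch(url) == info]


# Hypothetical data: ten pages blocked for the crawler (403) plus one genuinely missing page.
crawler_view = {f"/p{i}": 403 for i in range(10)}
crawler_view["/gone"] = 404
proxy_view = {url: 200 for url in crawler_view}
proxy_view["/gone"] = 404
print(detect_broken_links(crawler_view, threshold=5, proxy_fetch=proxy_view.__getitem__))  # ['/gone']
```

Passing the proxy fetch in as a callable keeps the decision logic testable without network access.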
Optionally, determining that the website links returning exception information are broken links when the feedback obtained by checking them through the proxy server is identical to the exception information includes:
Obtaining address information through the proxy server;
Using the address information, crawling the website links returning exception information with the crawler, and obtaining the feedback of those website links;
Judging whether the feedback of the website links returning exception information is identical to the exception information;
If so, determining that the website links returning exception information are broken links.
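The proxy re-check in the steps above could be sketched with Python's standard `urllib.request` module. The proxy address (for example `http://203.0.113.10:8080`, a documentation-range placeholder) stands in for the "address information" obtained through the proxy server; the function names are hypothetical, and comparing only the HTTP status code is an assumption about what "identical feedback" means. Connection failures (`urllib.error.URLError`) are deliberately not caught.

```python
import urllib.error
import urllib.request


def make_proxy_opener(proxy_address: str) -> urllib.request.OpenerDirector:
    """Build an opener that sends HTTP and HTTPS requests through the given proxy address."""
    handler = urllib.request.ProxyHandler({"http": proxy_address, "https": proxy_address})
    return urllib.request.build_opener(handler)


def fetch_status(opener: urllib.request.OpenerDirector, url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url; 4xx/5xx responses are reported rather than raised."""
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def confirm_via_proxy(url: str, original_status: int, proxy_address: str) -> bool:
    """Re-crawl url through the proxy and compare with the crawler's earlier feedback."""
    opener = make_proxy_opener(proxy_address)
    return fetch_status(opener, url) == original_status
```

Because the request leaves from the proxy's address rather than the crawler's, a block placed on the crawler's address should not affect this second check.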
Optionally, determining that the website links returning exception information are broken links, when their number is judged to be below the broken link threshold of the website, includes:
After judging that the number of website links returning exception information is below the broken link threshold of the website, obtaining address information through the proxy server;
Using the address information, crawling the website links returning exception information with the crawler, and obtaining the feedback of those website links;
When the feedback of the website links returning exception information is identical to the exception information, determining that the website links are indeed broken links.
Optionally, the method further includes:
When the feedback obtained by checking the website links returning exception information through the proxy server differs from the exception information, determining that the website links are not broken links;
Outputting a prompt message indicating that the original server address used by the crawler has been blocked by the website.
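A minimal sketch of this optional branch follows, assuming feedback is a status code. The names `recheck_and_report` and `crawler_address` and the wording of the prompt message are hypothetical. Any link whose proxy feedback differs is cleared, and its presence triggers the prompt message.

```python
from typing import Callable, Dict, List, Optional, Tuple


def recheck_and_report(
    abnormal: Dict[str, int],           # link -> exception information seen by the crawler
    proxy_fetch: Callable[[str], int],  # hypothetical: re-fetch through a proxy server
    crawler_address: str,               # hypothetical: the crawler's original address
) -> Tuple[List[str], Optional[str]]:
    """Split crawler-flagged links into confirmed broken links and an optional prompt message."""
    broken: List[str] = []
    cleared: List[str] = []
    for url, info in abnormal.items():
        # Same feedback via the proxy: broken link. Different feedback: not a broken link.
        (broken if proxy_fetch(url) == info else cleared).append(url)
    prompt = None
    if cleared:
        prompt = (
            f"crawler address {crawler_address} appears to be blocked by the website "
            f"({len(cleared)} link(s) answered differently via proxy)"
        )
    return broken, prompt


broken, prompt = recheck_and_report({"/a": 403, "/b": 404}, {"/a": 200, "/b": 404}.__getitem__, "198.51.100.7")
print(broken)  # ['/b']
print(prompt)
```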
Optionally, obtaining the website links that return exception information includes:
Crawling all links of the website one by one with the crawler;
Determining, from the feedback of each website link crawled by the crawler, which website links return exception information;
Counting the website links returning exception information.
Optionally, the exception information is an abnormal web page status code.
In a second aspect, the present invention further provides a website broken link detection device, the device including:
An acquiring unit, configured to obtain the website links that return exception information;
A judging unit, configured to judge whether the number of website links returning exception information obtained by the acquiring unit exceeds the broken link threshold of the website;
A determining unit, configured to determine that the website links returning exception information are broken links if the judging unit judges that their number is below the broken link threshold of the website;
A detection unit, configured to check the website links returning exception information through a proxy server if the judging unit judges that their number exceeds the broken link threshold of the website;
The determining unit is further configured to determine that the website links returning exception information are broken links when the feedback detected by the detection unit is identical to the exception information.
Optionally, the determining unit includes:
A first obtaining module, configured to obtain address information through the proxy server;
A first crawling module, configured to crawl the website links returning exception information with the crawler, using the address information obtained by the first obtaining module;
The first obtaining module is further configured to obtain the feedback of the website links returning exception information;
A judging module, configured to judge whether the feedback of the website links returning exception information, as obtained by the first obtaining module, is identical to the exception information;
A first determining module, configured to determine that the website links returning exception information are broken links if the judging module judges that their feedback is identical to the exception information.
Optionally, the determining unit includes:
A second obtaining module, configured to obtain address information through the proxy server after the number of website links returning exception information is judged to be below the broken link threshold of the website;
A second crawling module, configured to crawl the website links returning exception information with the crawler, using the address information obtained by the second obtaining module;
The second obtaining module is further configured to obtain the feedback of the website links returning exception information;
A second determining module, configured to determine that the website links are indeed broken links when the feedback of the website links returning exception information, as obtained by the second obtaining module, is identical to the exception information.
Optionally, in the device:
The determining unit is further configured to determine that the website links are not broken links when the feedback obtained by checking the website links returning exception information through the proxy server differs from the exception information;
The device further includes an output unit, configured to output a prompt message indicating that the original server address used by the crawler has been blocked by the website.
Optionally, the acquiring unit includes:
A crawling module, configured to crawl all links of the website one by one with the crawler;
A determining module, configured to determine, from the feedback of each website link crawled one by one by the crawling module, which website links return exception information;
A computing module, configured to count the website links returning exception information determined by the determining module.
Optionally, the exception information is an abnormal web page status code.
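One hypothetical way to map the device's units onto code is a small class whose methods mirror the acquiring, judging, detection, determining, and output units. All names are illustrative, the 4xx/5xx rule is an assumption, and the `recheck_below_threshold` flag models the optional second determining module that re-checks even below the threshold.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class BrokenLinkDetector:
    crawl: Callable[[], Dict[str, int]]    # crawling module (hypothetical): link -> status code
    proxy_fetch: Callable[[str], int]      # detection unit (hypothetical): re-fetch via proxy
    threshold: int                         # broken link threshold of the website
    recheck_below_threshold: bool = False  # optional second determining module
    messages: List[str] = field(default_factory=list)  # output unit

    def acquire(self) -> Dict[str, int]:
        # Determining and computing modules: keep links whose feedback is an abnormal code.
        return {url: code for url, code in self.crawl().items() if 400 <= code <= 599}

    def exceeds_threshold(self, abnormal: Dict[str, int]) -> bool:
        # Judging unit.
        return len(abnormal) > self.threshold

    def confirm(self, abnormal: Dict[str, int]) -> List[str]:
        # Detection and determining units: identical feedback via the proxy means a broken link.
        broken = [url for url, code in abnormal.items() if self.proxy_fetch(url) == code]
        if len(broken) < len(abnormal):
            self.messages.append("original crawler address appears to be blocked by the website")
        return broken

    def run(self) -> List[str]:
        abnormal = self.acquire()
        if self.exceeds_threshold(abnormal) or self.recheck_below_threshold:
            return self.confirm(abnormal)
        return list(abnormal)


detector = BrokenLinkDetector(crawl=lambda: {"/": 200, "/x": 404}, proxy_fetch=lambda url: 404, threshold=5)
print(detector.run())  # ['/x']
```

Injecting `crawl` and `proxy_fetch` as callables keeps each unit replaceable, loosely matching the claim's separation into units and modules.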
To achieve the above object, according to a third aspect of the present invention, a storage medium is provided. The storage medium includes a stored program, wherein, when the program runs, the device on which the storage medium resides is controlled to execute the website broken link detection method described above.
To achieve the above object, according to a fourth aspect of the present invention, a processor is provided. The processor is configured to run a program, wherein the program, when running, executes the website broken link detection method described above.
Through the above technical solutions, the website broken link detection method and device provided by the present invention address the following prior-art problem: when broken links are determined using a reference link, the reference link may become invalid, which makes the detection results unreliable and lowers the accuracy of broken link detection. By comparing the number of website links that have returned exception information with the broken link threshold, the present invention can determine whether the current crawler has been blocked. When the number is below the threshold, the crawler is judged to be working normally, and a link that returned exception information is determined to be a broken link; this rules out the influence of crawler blocking on broken link detection and improves detection accuracy. In addition, when the number of links returning exception information is determined to exceed the threshold, those links are checked through a proxy server, which reveals whether the crawler has been blocked. When the feedback obtained through the proxy server is identical to the exception information, the link currently returning exception information is determined to be a broken link. This further eliminates the influence of crawler blocking on broken link detection, avoids the prior-art problem caused by reference link invalidation, and further improves the accuracy of broken link detection.
The above description is only an overview of the technical solution of the present invention. Specific embodiments are given below so that the technical means of the present invention can be understood more clearly and implemented according to the contents of the specification, and so that the above and other objects, features and advantages of the present invention become more apparent.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be regarded as limiting the present invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 shows a flowchart of a website broken link detection method provided by an embodiment of the present invention;
Fig. 2 shows a flowchart of another website broken link detection method provided by an embodiment of the present invention;
Fig. 3 shows a block diagram of a website broken link detection device provided by an embodiment of the present invention;
Fig. 4 shows a block diagram of another website broken link detection device provided by an embodiment of the present invention.
Detailed description of embodiments
Exemplary embodiments of the present invention are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present invention, it should be understood that the present invention may be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present invention can be understood thoroughly and its scope fully conveyed to those skilled in the art.
To improve the accuracy of broken link detection, an embodiment of the present invention provides a website broken link detection method. As shown in Fig. 1, the method includes:
101. Obtain the website links that return exception information.
Generally, when a page link in a website is accessed, the website responds to the access request and sends feedback to the client. When the page link is not broken, the page can be accessed normally and returns normal feedback. When the page link is broken, however, it cannot be accessed normally, and the feedback the page returns is exception information, which may take any of several forms, such as abnormal page data or an abnormal page status code. In addition, when detecting broken links, a crawler may be used to crawl the page links of the website so as to simulate a user's access to them. It should be noted that the embodiments of the present invention mainly simulate user access with a crawler, but other ways of simulating user access may be chosen as needed; no limitation is imposed here.
Accordingly, in this step, a crawler may be used to crawl the website, and the website links that return exception information are identified from the feedback received after crawling.
102. Judge whether the number of website links returning exception information exceeds the broken link threshold of the website.
A website generally contains a large number of page links, and not all of them are normal; some abnormal page links, such as broken links, almost always exist. Both website administrators and visitors tolerate broken links to some extent, as long as their number stays below a threshold. This gives rise to the concept of a broken link threshold, which can be regarded as a "red line" for the number of broken links: once it is exceeded, the site has too many broken links, the user experience is seriously affected, and corrective action is needed. The broken link threshold of a website can therefore serve as a reference when detecting or analyzing its broken links.
In the method of the embodiments of the present invention, broken link detection is performed with a crawler. When the crawler is blocked by the website, it cannot crawl the links of the target website and all the feedback it obtains is exception information, which impairs the accuracy of the detection results, even though the links can in fact be accessed normally without the crawler. To verify whether the website has blocked the crawler, in this step, after the website links returning exception information are obtained in step 101, the number of such links is counted and compared with the broken link threshold of the website to judge whether the threshold is exceeded.
After step 102, if the number of website links returning exception information is judged to be below the broken link threshold of the website, step 103 is executed, specifically:
103. Determine that the website links returning exception information are broken links.
When the number of website links returning exception information is below the broken link threshold of the website, the current crawler is working normally and has not been blocked. Since the crawler can crawl the page links of the website normally, the website links that returned exception information during crawling are confirmed to be links that genuinely cannot be accessed normally; they can therefore be determined to be broken links, which ensures the accuracy of broken link detection.
After step 102, if the number of website links returning exception information is judged to exceed the broken link threshold of the website, step 104 is executed, specifically:
104. Check the website links returning exception information through a proxy server, and when the feedback from the check is identical to the exception information, determine that the website links returning exception information are broken links.
If step 102 finds that the number of website links returning exception information exceeds the broken link threshold of the website, the current crawler may have been blocked, so that every page it crawls returns exception information and the number of links returning exception information exceeds the threshold. In this case, broken link detection through the current crawler is likely to be inaccurate, and the website links returning exception information need to be confirmed in another way. In this step, a proxy server is used to check the website links returning exception information once more, and whether each of them is a broken link is determined from the result. When the feedback obtained through the proxy server is identical to the earlier exception information, the checked website link is confirmed to be one that genuinely cannot be accessed normally, so the website link currently returning exception information can be determined to be a broken link.
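The description leaves open exactly what "identical" feedback means. The sketch below assumes feedback is a status code plus a SHA-256 digest of the response body, and makes the body comparison optional, since error pages may embed request IDs or timestamps that differ between requests. `Feedback`, `make_feedback`, and `same_exception` are hypothetical names.

```python
import hashlib
from typing import NamedTuple


class Feedback(NamedTuple):
    status: int
    body_digest: str  # SHA-256 hex digest of the response body


def make_feedback(status: int, body: bytes) -> Feedback:
    return Feedback(status, hashlib.sha256(body).hexdigest())


def same_exception(first: Feedback, recheck: Feedback, compare_body: bool = False) -> bool:
    """Decide whether the proxy re-check reproduced the crawler's exception information."""
    if first.status != recheck.status:
        return False
    # Optionally also require identical page data (for example, the same error page).
    return not compare_body or first.body_digest == recheck.body_digest


crawler = make_feedback(404, b"<h1>Not Found</h1>")
proxy = make_feedback(404, b"<h1>Not Found</h1>")
print(same_exception(crawler, proxy, compare_body=True))  # True
```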
For example, suppose that when broken link detection is performed on website A with a crawler, 33 links return exception information and the broken link threshold is 5. According to this step, since 33 exceeds 5, the crawler has probably been blocked by website A, so the detection of these 33 website links is likely to be inaccurate. These 33 website links are therefore checked through a proxy server. If website link a among them, after being checked through the proxy server, returns feedback identical to the exception information it returned before, link a is indeed a link that cannot be accessed, and website link a can be determined to be a broken link.
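The worked example can be replayed in a few lines. Only the numbers 33 and 5 and the fact that link a reproduces its error come from the text; the URLs and the assumption that the other 32 links return 403 to the crawler and 200 through the proxy are hypothetical.

```python
# Hypothetical data reproducing the example: 33 links returned exception information, threshold 5.
THRESHOLD = 5
crawler_feedback = {f"https://a.example/page{i}": 403 for i in range(32)}
crawler_feedback["https://a.example/a"] = 404  # website link "a"

# Assumption: through the proxy, only link "a" still returns the same code.
proxy_feedback = {url: 200 for url in crawler_feedback}
proxy_feedback["https://a.example/a"] = 404

count = len(crawler_feedback)
needs_proxy = count > THRESHOLD  # 33 > 5, so the crawler may be blocked
if needs_proxy:
    broken = [u for u, code in crawler_feedback.items() if proxy_feedback[u] == code]
else:
    broken = list(crawler_feedback)
print(count, needs_proxy, broken)  # 33 True ['https://a.example/a']
```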
In the prior art, broken links are determined with a reference link that may become invalid, making the detection results unreliable and lowering the accuracy of broken link detection. The website broken link detection method provided by this embodiment addresses that problem by comparing the broken link threshold of the website with the number of website links returning exception information. On the one hand, when the number does not exceed the threshold, the crawler is judged to be working normally, and hence the broken link detection function is normal. On the other hand, when the number exceeds the threshold, checking through a proxy server further determines whether the crawler is working normally, and hence whether the broken link detection function is normal. Compared with the prior art, the present invention therefore uses the website's broken link threshold and a proxy server to verify whether the crawler is working normally. This rules out crawler blocking, avoids inaccurate detection results when the crawler is blocked by the website, and solves the prior-art problem of poor accuracy caused by unstable reference links.
Further, as a refinement and extension of the embodiment shown in Fig. 1, an embodiment of the present invention also provides another website broken link detection method, as shown in Fig. 2.
201. Obtain the website links that return exception information.
The exception information in this embodiment may be an abnormal web page status code, that is, a status code indicating an abnormal page state. When a user terminal sends an access request to a web page, the page responds to the request, and the response includes a status code indicating whether the page can be accessed normally.
A web page status code, also known as an HTTP status code, is a 3-digit code indicating the status of a web server's HTTP response. It is defined by RFC 2616 and extended by specifications such as RFC 2518, RFC 2817, RFC 2295, RFC 2774 and RFC 4918. The abnormal web page status codes described in this embodiment are the subset of status codes that indicate an abnormal page, for example "404" and "503".
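Python's standard `http.HTTPStatus` enum can be used to look up the standard reason phrases. The description names only 404 and 503 as examples, so treating every 4xx and 5xx code as abnormal, and 3xx redirects as normal, is an assumption of this sketch. A real crawler would typically follow redirects before classifying a link.

```python
from http import HTTPStatus


def is_abnormal_status(code: int) -> bool:
    """Assumption: every 4xx (client error) and 5xx (server error) code counts as abnormal."""
    return 400 <= code <= 599


for code in (200, 301, 404, 503):
    phrase = HTTPStatus(code).phrase
    print(code, phrase, "abnormal" if is_abnormal_status(code) else "normal")
# 200 OK normal
# 301 Moved Permanently normal
# 404 Not Found abnormal
# 503 Service Unavailable abnormal
```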
Accordingly, in this step, the crawler first simulates user access and crawls all page links of the website under test layer by layer. Since each page link returns a status code when crawled, all status codes returned by the website's links are obtained after crawling, together with the corresponding links.
Next, the abnormal status codes are filtered out of the collected status codes. Note that each status code is a 3-digit number assigned according to fixed rules, so each code has a fixed meaning. In this step, the codes representing an abnormal page can therefore be selected according to their defined meanings; for example, "404" (Not Found) means the request failed because the requested resource could not be found, so it can be selected as an abnormal web page status code. The page links that returned the selected codes are then identified, giving the website links in the website that return abnormal web page status codes.
Finally, the website links that returned abnormal web page status codes are counted to obtain the number of website links returning exception information.
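The crawl, filter, and count steps above can be sketched as a breadth-first ("layer by layer") traversal. The `fetch` callable, which returns a status code and the links found on a page, is a hypothetical crawler primitive, and the 4xx/5xx filter and `max_pages` cap are assumptions; a real crawler would also parse HTML and respect robots rules.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

# Hypothetical crawler primitive: returns (status code, links found on the page).
Fetch = Callable[[str], Tuple[int, List[str]]]


def crawl_statuses(start_url: str, fetch: Fetch, max_pages: int = 1000) -> Dict[str, int]:
    """Breadth-first crawl recording the status code returned by each link."""
    statuses: Dict[str, int] = {}
    queue = deque([start_url])
    seen = {start_url}
    while queue and len(statuses) < max_pages:
        url = queue.popleft()
        code, children = fetch(url)
        statuses[url] = code
        for child in children:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return statuses


def abnormal_links(statuses: Dict[str, int]) -> Dict[str, int]:
    # Filter step: keep links whose code is 4xx/5xx (assumed definition of "abnormal").
    return {url: code for url, code in statuses.items() if 400 <= code <= 599}


site = {
    "/": (200, ["/docs", "/blog"]),
    "/docs": (200, ["/docs/old"]),
    "/blog": (503, []),
    "/docs/old": (404, []),
}
bad = abnormal_links(crawl_statuses("/", site.__getitem__))
print(len(bad), bad)  # 2 {'/blog': 503, '/docs/old': 404}
```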
Determining from abnormal status codes whether feedback is exception information, as described in this step, simplifies the analysis of feedback, shortens analysis time, reduces the time consumed by broken link detection in this embodiment, and improves detection efficiency.
202. Judge whether the number of website links returning exception information exceeds the broken link threshold of the website.
The broken link threshold in this step is the same as described in step 102 of the previous embodiment and is not repeated here. In this step, after the number of website links returning abnormal web page status codes is calculated in step 201, that number is compared with the broken link threshold, and a judgment is made from the result.
After step 202, if the number of website links returning exception information is judged to be below the broken link threshold of the website, step 203 is executed, specifically:
203. Determine that the website links returning exception information are broken links.
Generally, when a crawler crawls website links, it may be blocked by the website. Every link of the website the crawler visits then returns an abnormal status code instead of the normal status code returned when the link is accessed normally. In this case the crawler is effectively no longer functional, so the status codes it obtains while simulating a user are inaccurate.
So the web site url quantity for feeding back webpage abnormality code is general when crawler function is disabled by the website
It is all the preset chain rupture threshold value in website to be more than.Therefore the method according to this step, when webpage exception shape is fed back in judgement
When the quantity of the web site url of state code is less than the chain rupture threshold value of the website, it is disabled to illustrate that current crawler does not have, function is just
Often.Thus prove that the web site url that crawler crawls is strictly there is abnormal link, and then can determine that the feedback webpage is different
The web site url of normal status code is website chain rupture.
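The decision made in steps 202 and 203 reduces to a single comparison. The sketch below is a hypothetical illustration with invented names; it treats a count equal to the threshold as "not exceeding" it, following the wording of the claims, which is an interpretation rather than something the embodiment states explicitly.

```python
from enum import Enum


class Verdict(Enum):
    BROKEN = "broken"              # step 203: the links are treated as broken
    CHECK_VIA_PROXY = "proxy"      # step 204: crawler may be blocked, re-check first


def judge(abnormal_count: int, threshold: int) -> Verdict:
    """Compare the number of links that returned abnormal codes with the
    site's broken-link threshold (step 202)."""
    # Few failures: the crawler evidently still works, so the failures are real.
    if abnormal_count <= threshold:
        return Verdict.BROKEN
    # Many failures at once: the site may have blocked the crawler's address.
    return Verdict.CHECK_VIA_PROXY


print(judge(7, 10))  # Verdict.BROKEN
print(judge(7, 5))   # Verdict.CHECK_VIA_PROXY
```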
In addition, to further improve the accuracy of broken-link detection, after the number of website links returning abnormal webpage status codes is judged to be less than the broken-link threshold, those links may also be checked again through a proxy server.
Specifically, an address is first obtained through a proxy server. A crawler normally crawls all the web page links of a website from a single address, for example a fixed IP address. The main purpose of the method in this embodiment, however, is to solve the problem of inaccurate results caused by the crawler being blocked, and a website usually blocks a crawler by blocking the crawler's address: whatever requests come from that address, the website does not respond to them normally. Therefore, in this step, when the count is below the broken-link threshold, a proxy server supplies an additional address to the crawler so as to rule out the case where the crawler's own address has been blocked.
Next, using that address, the crawler re-crawls the website links that returned abnormal webpage status codes. Each re-crawled link returns a webpage status code, which is recorded; that is, the status code returned by each abnormal link on the second crawl is obtained.
Finally, the status code obtained on the re-crawl is compared with the abnormal status code obtained on the first crawl. If the two are identical, the link returns an abnormal webpage status code whether it is crawled from the crawler's original address or from the address supplied by the proxy server. This shows that the crawler is working normally and that the website link really is a broken link.
For example, when the chain rupture threshold value of website B is 10, and the web site url of the feedback abnormality code obtained at present is respectively
Link 1, link 2, link 3, link 4, link 5, link 6, link 7 when, totally 7 when, due to quantity be less than chain rupture threshold value 10,
The then method according to step can determine that current crawler is not disabled, and determine that this 7 links are the websites of website B
Chain rupture;But further, when determining that number of links 7 of feedback abnormality code are less than chain rupture threshold value 10, can make
A new address is provided for crawler with proxy server, and crawls this 7 links again.It is provided when unused proxy server
When new address, the webpage abnormality code " 404 " for being fed back to web-page requests and failing that link 1 obtains is crawled using crawler, and is weighed
The status code for newly crawling the feedback received when link 1 is that " 404 " are as before, then illustrates either to use crawler raw address
It is crawled, still having replaced the result that the new address that proxy server provides crawls link 1 is all to request not respond
Webpage abnormality code " 404 ", it is possible thereby to determine crawler it is not disabled, and the link 1 also really be website chain rupture.
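The proxy re-check described above, applied to example B, might look like the following Python sketch. It assumes a hypothetical `fetch(url, proxy)` callable standing in for a real HTTP client (with `proxy=None` meaning the crawler's own address); the proxy address `proxy.example:8080` is a placeholder, and the simulated responses simply mirror the example, in which link 1 returns "404" from both addresses.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical fetch signature: (url, proxy) -> HTTP status code.
# proxy=None means the request goes out from the crawler's own address.
Fetch = Callable[[str, Optional[str]], int]


def confirm_via_proxy(abnormal: Dict[str, int], fetch: Fetch, proxy: str) -> List[str]:
    """Re-crawl each abnormal link through the proxy and keep only the links
    whose status code matches the abnormal code from the first crawl."""
    confirmed = []
    for url, first_status in abnormal.items():
        if fetch(url, proxy) == first_status:
            confirmed.append(url)
    return confirmed


# Example B: threshold 10; links 1..7 returned "404" on the first crawl.
THRESHOLD_B = 10
first_crawl = {f"link {i}": 404 for i in range(1, 8)}


def simulated_fetch(url: str, proxy: Optional[str]) -> int:
    # Simulation only: the pages are genuinely missing, so the proxy
    # address receives the same 404 as the crawler's own address did.
    return 404


if len(first_crawl) <= THRESHOLD_B:  # 7 does not exceed 10
    broken = confirm_via_proxy(first_crawl, simulated_fetch, proxy="proxy.example:8080")
    print(broken)  # all seven links are confirmed as broken
```

Comparing the exact status code, rather than just "abnormal or not", is what the text describes; a link that fails with a different code on the re-crawl is therefore not confirmed by this sketch.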
With the method of this step, comparing the count against the broken-link threshold makes it possible to confirm that the crawler is working normally, which ensures that the website links returning abnormal webpage status codes really are broken links. This removes the effect of a blocked crawler on the detection result and improves the accuracy of broken-link detection. Furthermore, when the number of web page links returning abnormal status codes is below the threshold, those links are checked again through a proxy server, and a broken link is determined only when the status code returned on the re-crawl from the proxy address is identical to the abnormal status code obtained earlier. This further rules out the possibility that the current crawler has been blocked and further improves the accuracy of broken-link detection.
If, after step 202 is executed, the number of website links that return abnormal information is judged to exceed the broken-link threshold of the website, step 204 is executed, specifically:
204. Check the website links that return abnormal information through a proxy server; when the information returned in this check is identical to the abnormal information, determine that those website links are broken links of the website.
After step 202, when the number of website links returning abnormal webpage status codes is judged to exceed the broken-link threshold, the current crawler may have been blocked by the website. To ensure accurate broken-link detection, the links therefore need to be checked through a proxy server. The check may proceed as follows. First, an address is obtained from the proxy server and used as the crawler's new address for re-crawling the links. Next, the website links that returned abnormal webpage status codes are crawled again from this new address, and the status code each link returns is recorded. The status code returned on the re-crawl is then compared with the abnormal status code returned on the earlier crawl to determine whether the two are identical. Finally, if they are identical, the link returned an abnormal status code on both crawls, and it can be determined that the link really is a broken link of the website.
For example, when the chain rupture threshold value of website C is 5, and the web site url of the feedback abnormality code obtained at present is respectively
Link a, link b, link c, link d, link e, link f, link g, totally 7 when, due to quantity be greater than chain rupture threshold value 5, then
The function of the method according to step, crawler may be disabled, therefore testing result may be inaccuracy, in turn
It needs to be detected using additional mode, agency service can be used for example to be detected the inspection to ensure website chain rupture
The accuracy of survey.When be not used proxy server provide new address when, using crawler crawl link b obtain be fed back to service
Device can not currently handle the webpage abnormality code " 503 " of request, and crawl the status code of the feedback received when linking b again
It is as before for " 503 ", then illustrate either to be crawled using crawler raw address, proxy server offer has still been provided
New address be all the webpage abnormality code " 503 " for requesting not respond to the result that is crawled of link b, it is possible thereby to determine
Link b is website chain rupture.
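Steps 204 and 205 can be sketched together: every link that returned an abnormal code is re-crawled through the proxy and sorted by whether the code repeats. This is a hypothetical Python illustration using the numbers of example C; the example only states the result for link b, so the first-crawl codes for the other links and their normal responses through the proxy are invented purely to show both branches. `fetch(url, proxy)` and `proxy.example:8080` are placeholders.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical fetch signature: (url, proxy) -> HTTP status code.
Fetch = Callable[[str, Optional[str]], int]


def recheck_through_proxy(abnormal: Dict[str, int], fetch: Fetch,
                          proxy: str) -> Tuple[List[str], List[str]]:
    """Re-crawl every abnormal link through the proxy (step 204).
    Same code as before -> broken link; different code -> not broken (step 205)."""
    broken: List[str] = []
    not_broken: List[str] = []
    for url, first_status in abnormal.items():
        if fetch(url, proxy) == first_status:
            broken.append(url)
        else:
            not_broken.append(url)
    return broken, not_broken


# Example C: threshold 5; links a..g returned abnormal codes on the first crawl.
THRESHOLD_C = 5
# Invented first-crawl codes: the example only gives "503" for link b.
first_crawl = {f"link {c}": 503 for c in "abcdefg"}
# Invented proxy responses: link b still fails, the rest load normally.
proxy_view = {url: (503 if url == "link b" else 200) for url in first_crawl}


def simulated_fetch(url: str, proxy: Optional[str]) -> int:
    return proxy_view[url]


if len(first_crawl) > THRESHOLD_C:  # 7 exceeds 5: the crawler may be blocked
    broken, not_broken = recheck_through_proxy(first_crawl, simulated_fetch,
                                               "proxy.example:8080")
    print(broken)      # ['link b']
    print(not_broken)  # the other six links
```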
With the method of this step, when the count exceeds the broken-link threshold and it therefore cannot be determined whether the crawler has been blocked, the proxy server removes the interference that a website blocking the crawler would otherwise cause, which improves the accuracy of broken-link detection.
In addition, after step 202 is executed, if the information returned when the website links that return abnormal information are checked through the proxy server differs from the abnormal information, step 205 is executed, specifically:
205. When the information returned by checking, through the proxy server, the website links that return abnormal information differs from the abnormal information, determine that those website links are not broken links.
In the method of this embodiment, when the website links returning abnormal webpage status codes are checked through a proxy server, the status codes of the two crawls may be identical, as in step 204, but they may also differ: a link crawled from the crawler's original address returns an abnormal webpage status code, while the same link crawled from the new address supplied by the proxy server returns a normal status code, so the two crawls give different results.
The method of this step covers that case, in which the status code returned when a link is checked through the proxy server differs from the abnormal status code. As explained for step 203, this happens because the website has blocked the crawler's address: whatever requests that address sends, the website does not respond. Once the crawler's original address is blocked and a different address is used, the link can be crawled normally. This shows that the website link that returned an abnormal status code is actually working and only the crawler's original address has been blocked, so the link can be determined not to be a broken link.
206. Output a reminder message.
After step 205 has determined that a website link is not a broken link, a reminder message is output. The reminder mainly serves to alert the relevant personnel that the client address currently used by the crawler has been blocked by the website and that the crawler can no longer crawl from that address, so that they can take appropriate action, for example replacing the address of the current client.
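The reminder of step 206 could be produced along these lines. The message wording, the function name `ban_reminder` and the documentation address `203.0.113.7` are assumptions made for illustration; the embodiment only requires that the message indicate that the crawler's original address has been blocked.

```python
from typing import List, Optional


def ban_reminder(not_broken: List[str], crawler_address: str) -> Optional[str]:
    """Step 206 (sketch): if any link that failed from the crawler's own address
    loaded normally through the proxy (step 205), tell an operator that the
    crawler's address appears to be blocked."""
    if not not_broken:
        return None  # nothing suggests a block, so no reminder is issued
    return (f"Crawler address {crawler_address} appears to be banned by the site: "
            f"{len(not_broken)} link(s) failed directly but loaded via proxy. "
            "Consider switching the client address.")


msg = ban_reminder(["link a", "link c"], "203.0.113.7")
print(msg)
```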
Thus, with the method of steps 205 and 206, when the status code returned by a website link that returned an abnormal status code differs from the status code returned through the proxy server, the link can be determined not to be a broken link. This confirms that the crawler's original address has been blocked and that crawling from that address was in fact prevented, which ensures that a blocked crawler is discovered promptly. The reminder message lets the relevant personnel adjust the crawler in time, so that broken-link detection with the crawler does not produce false judgments because the crawler has been blocked.
Further, as an implementation of the method shown in Fig. 1, an embodiment of the present invention also provides a website broken-link detection apparatus for implementing that method. This apparatus embodiment corresponds to the method embodiment above; for readability, details of the method embodiment are not repeated here one by one, but it should be understood that the apparatus of this embodiment can correspondingly implement everything in the method embodiment. As shown in Fig. 3, the apparatus includes an acquiring unit 31, a judging unit 32, a determination unit 33 and a detection unit 34, where:
The acquiring unit 31 may be used to obtain the website links in a website that return abnormal information.
The judging unit 32 may be used to judge whether the number of website links returning abnormal information obtained by the acquiring unit 31 exceeds the broken-link threshold of the website.
The determination unit 33 may be used to determine that the website links returning abnormal information are broken links of the website if the judging unit 32 judges that their number does not exceed the broken-link threshold of the website.
The detection unit 34 may be used to check, through a proxy server, the website links returning abnormal information if the judging unit 32 judges that their number exceeds the broken-link threshold of the website.
The determination unit 33 may also be used to determine that the website links returning abnormal information are broken links of the website when the information returned in the check by the detection unit 34 is identical to the abnormal information.
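One possible way to map the units of Fig. 3 onto code is shown below. This is a hypothetical Python sketch, not the apparatus itself: the class and method names are invented, `fetch(url, proxy)` stands in for a real HTTP client, and 4xx/5xx codes are assumed to be the abnormal ones.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical fetch signature: (url, proxy) -> HTTP status code.
Fetch = Callable[[str, Optional[str]], int]


@dataclass
class BrokenLinkDetector:
    """Hypothetical mapping of the units in Fig. 3 onto methods."""
    fetch: Fetch
    threshold: int
    proxy: str

    def acquire(self, urls: List[str]) -> Dict[str, int]:
        # Acquiring unit 31: links whose crawl returned an abnormal code.
        codes = {u: self.fetch(u, None) for u in urls}
        return {u: s for u, s in codes.items() if 400 <= s <= 599}

    def exceeds_threshold(self, abnormal: Dict[str, int]) -> bool:
        # Judging unit 32.
        return len(abnormal) > self.threshold

    def detect(self, abnormal: Dict[str, int]) -> List[str]:
        # Detection unit 34 plus determination unit 33: keep links whose
        # proxy re-crawl returns the same abnormal code as before.
        return [u for u, s in abnormal.items() if self.fetch(u, self.proxy) == s]

    def run(self, urls: List[str]) -> List[str]:
        abnormal = self.acquire(urls)
        if not self.exceeds_threshold(abnormal):
            return list(abnormal)  # determination unit 33: within threshold
        return self.detect(abnormal)


# Usage with a simulated site (hypothetical paths and codes).
site = {"/": 200, "/old": 404, "/gone": 410}
detector = BrokenLinkDetector(fetch=lambda url, proxy: site[url],
                              threshold=5, proxy="proxy.example:8080")
print(detector.run(list(site)))  # ['/old', '/gone']
```

Injecting `fetch` keeps the decision logic testable without network access; a real deployment would substitute an HTTP client that can route requests through the proxy.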
Further, as an implementation of the method shown in Fig. 2, an embodiment of the present invention also provides another website broken-link detection apparatus for implementing that method. This apparatus embodiment corresponds to the method embodiment above; for readability, details of the method embodiment are not repeated here one by one, but it should be understood that the apparatus of this embodiment can correspondingly implement everything in the method embodiment. As shown in Fig. 4, the apparatus includes an acquiring unit 41, a judging unit 42, a determination unit 43 and a detection unit 44, where:
The acquiring unit 41 may be used to obtain the website links in a website that return abnormal information.
The judging unit 42 may be used to judge whether the number of website links returning abnormal information obtained by the acquiring unit 41 exceeds the broken-link threshold of the website.
The determination unit 43 may be used to determine that the website links returning abnormal information are broken links of the website if the judging unit 42 judges that their number does not exceed the broken-link threshold of the website.
The detection unit 44 may be used to check, through a proxy server, the website links returning abnormal information if the judging unit 42 judges that their number exceeds the broken-link threshold of the website.
The determination unit 43 may also be used to determine that the website links returning abnormal information are broken links of the website when the information returned in the check by the detection unit 44 is identical to the abnormal information.
Further, the determination unit 43 includes:
A first obtaining module 431, which may be used to obtain an address through a proxy server.
A first crawling module 432, which may be used to crawl, with the crawler and using the address obtained by the first obtaining module 431, the website links that return abnormal information.
The first obtaining module 431 may also be used to obtain the information returned by the website links that return abnormal information.
A judgment module 433, which may be used to judge whether the information returned by those website links, as obtained by the first obtaining module 431, is identical to the abnormal information.
A first determining module 434, which may be used to determine that the website links returning abnormal information are broken links of the website if the judgment module 433 judges that the information they return is identical to the abnormal information.
Further, the determination unit 43 includes:
A second obtaining module 435, which may be used to obtain an address through a proxy server after the number of website links returning abnormal information has been judged to be less than the broken-link threshold of the website.
A second crawling module 436, which may be used to crawl, with the crawler and using the address obtained by the second obtaining module 435, the website links that return abnormal information.
The second obtaining module 435 may also be used to obtain the information returned by the website links that return abnormal information.
A second determining module 437, which may be used to determine that the website links really are broken links of the website when the information returned by those links, as obtained by the second obtaining module 435, is identical to the abnormal information.
Further, the apparatus also includes the following:
The determination unit 43 may also be used to determine that a website link is not a broken link when the information returned by checking, through the proxy server, the website links that return abnormal information differs from the abnormal information.
An output unit 45, which may be used to output a reminder message after the determination unit 43 determines that the website link is not a broken link, the reminder message indicating that the original server address used by the crawler has been blocked by the website.
Further, the acquiring unit 41 includes:
A crawling module 411, which may be used to crawl all links of the website one by one with a crawler.
A determining module 412, which may be used to determine, from the information returned by the website links crawled one by one by the crawling module 411, which of those website links return abnormal information.
A computing module 413, which may be used to calculate the number of website links returning abnormal information determined by the determining module 412.
Further, the abnormal information is an abnormal webpage status code.
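The three modules of the acquiring unit 41 amount to a crawl followed by a filter and a count. The Python sketch below assumes an in-memory site graph instead of real HTTP requests and crawls it breadth-first; the embodiment does not specify a crawl order, and treating pages missing from the graph as 404 is likewise an assumption for the simulation. All names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical in-memory site: url -> (status code, links found on that page).
Site = Dict[str, Tuple[int, List[str]]]


def crawl_all(site: Site, start: str) -> Dict[str, int]:
    """Crawling module 411 (sketch): visit every reachable link breadth-first
    and record the status code each one returns."""
    seen: Dict[str, int] = {}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in seen:
            continue
        status, outlinks = site.get(url, (404, []))  # unknown pages -> 404
        seen[url] = status
        queue.extend(link for link in outlinks if link not in seen)
    return seen


def abnormal_summary(codes: Dict[str, int]) -> Tuple[List[str], int]:
    # Determining module 412 (filter, assuming 4xx/5xx) and computing module 413 (count).
    abnormal = [u for u, s in codes.items() if 400 <= s <= 599]
    return abnormal, len(abnormal)


site: Site = {
    "/": (200, ["/about", "/blog"]),
    "/about": (200, ["/team"]),
    "/blog": (200, ["/blog/1", "/"]),
    "/blog/1": (500, []),
}
codes = crawl_all(site, "/")
print(abnormal_summary(codes))  # (['/team', '/blog/1'], 2)
```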
The embodiments of the present invention thus provide another website broken-link detection method and apparatus. In the prior art, broken links are determined with reference to a benchmark link, so a failure of the benchmark link easily distorts the detection result and lowers the accuracy of broken-link detection. The present invention instead compares the number of website links returning abnormal information with the broken-link threshold of the website. On the one hand, when the number does not exceed the threshold, the crawler is taken to be working normally, and so is broken-link detection. On the other hand, when the number exceeds the threshold, a check through a proxy server further determines whether the crawler, and hence broken-link detection, is working normally. Compared with the prior art, using the broken-link threshold and a proxy server to verify that the crawler works normally rules out the case where the crawler has been blocked, avoids inaccurate results when the website blocks the crawler, and thereby solves the prior-art problem of poor accuracy caused by an unstable benchmark link. In addition, using abnormal website status codes as the abnormal information simplifies the analysis and determination of returned information and shortens its analysis time, which reduces the time cost of broken-link detection and improves detection efficiency. Meanwhile, by comparing with the broken-link threshold, when the number of website links returning abnormal status codes is below the threshold, the crawler can be taken to be working normally; this ensures that those links really are broken links, removes the effect of a blocked crawler on the result, and improves detection accuracy. Further, when the number of web page links returning abnormal status codes is below the threshold, those links are checked again through a proxy server, and a broken link is determined only when the status code returned on the re-crawl from the proxy address is identical to the abnormal status code obtained earlier, which further rules out a blocked crawler and improves accuracy. In addition, when the number of website links returning abnormal status codes is greater than the threshold, a check through a proxy server ensures that, even when exceeding the threshold makes it impossible to tell whether the crawler has been blocked, the interference caused by a website blocking the crawler is removed, which improves accuracy. Finally, when the check through the proxy server shows that the status code returned by a link differs from the status code returned through the proxy server, the link is determined not to be a broken link and a reminder message is output. This ensures that a blocked crawler is discovered promptly so that the relevant personnel can adjust it in time, prevents false judgments caused by a blocked crawler, and further improves the accuracy of broken-link detection.
The apparatus includes a processor and a memory. The acquiring unit, judging unit, determination unit, detection unit and so on are stored in the memory as program units, and the processor executes these program units stored in the memory to implement the corresponding functions.
The processor contains a kernel, which retrieves the corresponding program units from the memory. One or more kernels may be provided, and the accuracy of broken-link detection is improved by adjusting kernel parameters.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM) and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM), and includes at least one memory chip.
An embodiment of the present invention provides a storage medium on which a program is stored; when the program is executed by a processor, the website broken-link detection method is implemented.
An embodiment of the present invention provides a processor for running a program, where the website broken-link detection method is performed when the program runs.
An embodiment of the present invention provides a device including a processor, a memory, and a program stored in the memory and executable on the processor. When executing the program, the processor performs the following steps: obtaining the website links in a website that return abnormal information; judging whether the number of website links returning abnormal information exceeds the broken-link threshold of the website; if not, determining that the website links returning abnormal information are broken links of the website; if so, checking the website links returning abnormal information through a proxy server and, when the information returned in the check is identical to the abnormal information, determining that those website links are broken links of the website.
Further, checking the website links returning abnormal information through a proxy server and, when the returned information is identical to the abnormal information, determining that those links are broken links includes:
obtaining an address through a proxy server;
using the address, crawling the website links returning abnormal information with a crawler, and obtaining the information returned by those links;
judging whether the information returned by those links is identical to the abnormal information;
if so, determining that the website links returning abnormal information are broken links of the website.
Further, if the number of website links returning abnormal information is judged to be less than the broken-link threshold of the website, determining that those links are broken links includes:
after the number of website links returning abnormal information is judged to be less than the broken-link threshold of the website, obtaining an address through a proxy server;
using the address, crawling the website links returning abnormal information with a crawler, and obtaining the information returned by those links;
when the information returned by those links is identical to the abnormal information, determining that the website links really are broken links of the website.
Further, after the number of website links returning abnormal information is judged to exceed the broken-link threshold of the website, the method also includes:
when the information returned by checking, through the proxy server, the website links returning abnormal information differs from the abnormal information, determining that the website link is not a broken link;
outputting a reminder message, the reminder message indicating that the original server address used by the crawler has been blocked by the website.
Further, obtaining the website links in a website that return abnormal information includes:
crawling all links of the website one by one with a crawler;
determining, from the information returned by the website links crawled one by one by the crawler, which of those links return abnormal information;
calculating the number of website links returning abnormal information.
Further, the abnormal information is an abnormal webpage status code.
The device in the embodiments of the present invention may be a server, a PC, a PAD, a mobile phone, etc.
An embodiment of the present invention also provides a computer program product which, when executed on a data processing device, is adapted to execute program code initialized with the following method steps: obtaining the website links in a website that return abnormal information; judging whether the number of website links returning abnormal information exceeds the broken-link threshold of the website; if not, determining that the website links returning abnormal information are broken links of the website; if so, checking the website links returning abnormal information through a proxy server and, when the information returned in the check is identical to the abnormal information, determining that those website links are broken links of the website.
Further, checking the website links returning abnormal information through a proxy server and, when the returned information is identical to the abnormal information, determining that those links are broken links includes:
obtaining an address through a proxy server;
using the address, crawling the website links returning abnormal information with a crawler, and obtaining the information returned by those links;
judging whether the information returned by those links is identical to the abnormal information;
if so, determining that the website links returning abnormal information are broken links of the website.
Further, if the number of website links returning abnormal information is judged to be less than the broken-link threshold of the website, determining that those links are broken links includes:
after the number of website links returning abnormal information is judged to be less than the broken-link threshold of the website, obtaining an address through a proxy server;
using the address, crawling the website links returning abnormal information with a crawler, and obtaining the information returned by those links;
when the information returned by those links is identical to the abnormal information, determining that the website links really are broken links of the website.
Further, after the number of website links returning abnormal information is judged to exceed the broken-link threshold of the website, the method also includes:
when the information returned by checking, through the proxy server, the website links returning abnormal information differs from the abnormal information, determining that the website link is not a broken link;
outputting a reminder message, the reminder message indicating that the original server address used by the crawler has been blocked by the website.
Further, obtaining the website links in a website that return abnormal information includes:
crawling all links of the website one by one with a crawler;
determining, from the information returned by the website links crawled one by one by the crawler, which of those links return abnormal information;
calculating the number of website links returning abnormal information.
Further, the abnormal information is an abnormal webpage status code.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory.
The memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, in which information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or further includes elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, commodity, or device that includes the element.
The above are merely embodiments of the present application and are not intended to limit it. Various modifications and variations of the present application are possible for those skilled in the art. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.
Claims (10)
1. A website broken-link detection method, characterized in that the method comprises:
obtaining the website links in a website that return exception information;
judging whether the number of website links returning exception information exceeds a broken-link threshold of the website;
if not, determining that the website links returning exception information are broken links;
if so, checking the website links returning exception information through a proxy server, and, when the feedback information from the check is identical to the exception information, determining that the website links returning exception information are broken links.
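The decision logic of claim 1 can be summarized in a short sketch. It assumes the failing links arrive as a `{url: status_code}` mapping and that the proxy check is supplied as a callable returning a status code (or None); `detect_broken_links` is a hypothetical name. "Not exceeded" is read as `<=`, following the wording of claim 1.

```python
from typing import Callable, Dict, List, Optional


def detect_broken_links(failures: Dict[str, int], threshold: int,
                        fetch_via_proxy: Callable[[str], Optional[int]]) -> List[str]:
    """failures: {url: exception code} collected by the crawl."""
    if len(failures) <= threshold:
        # Few failures: attribute them to the links themselves.
        return list(failures)
    # Many failures at once: the crawler's own address may be blocked,
    # so confirm each link through the proxy before declaring it broken.
    return [url for url, code in failures.items() if fetch_via_proxy(url) == code]
```

The design rationale implied by the claims is that a sudden burst of errors is more likely caused by the website refusing the crawler than by many links breaking simultaneously, so only that case pays the cost of a second, proxied request.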
2. The method according to claim 1, characterized in that determining that a website link returning exception information is a broken link when the feedback information obtained by checking the website link through the proxy server is identical to the exception information comprises:
obtaining address information through the proxy server;
crawling, using the address information, the website link returning exception information with a crawler, and obtaining the feedback information of the website link returning exception information;
judging whether the feedback information of the website link returning exception information is identical to the exception information;
if so, determining that the website link returning exception information is a broken link.
3. The method according to claim 2, characterized in that, if it is judged that the number of website links returning exception information is less than the broken-link threshold of the website, determining that the website links returning exception information are broken links comprises:
after it is judged that the number of website links returning exception information is less than the broken-link threshold of the website, obtaining address information through the proxy server;
crawling, using the address information, the website link returning exception information with a crawler, and obtaining the feedback information of the website link returning exception information;
when the feedback information of the website link returning exception information is identical to the exception information, determining that the website link is indeed a broken link.
4. The method according to claim 2, characterized in that the method further comprises:
when the feedback information obtained by checking, through the proxy server, the website link returning exception information differs from the exception information, determining that the website link is not a broken link;
outputting a prompt message, the prompt message indicating that the original server address used by the crawler has been blocked by the website.
5. The method according to claim 1, characterized in that obtaining the website links in the website that return exception information comprises:
crawling all links of the website one by one using a crawler;
determining, from the information returned by each website link crawled by the crawler, which website links return exception information;
counting the number of website links returning exception information.
6. The method according to any one of claims 1 to 5, characterized in that the exception information is a webpage error status code.
7. A website broken-link detection device, characterized in that the device comprises:
an acquiring unit, configured to obtain the website links in a website that return exception information;
a judging unit, configured to judge whether the number of website links returning exception information obtained by the acquiring unit exceeds a broken-link threshold of the website;
a determination unit, configured to determine that the website links returning exception information are broken links if the judging unit judges that their number is less than the broken-link threshold of the website;
a detection unit, configured to check the website links returning exception information through a proxy server if the judging unit judges that their number exceeds the broken-link threshold of the website;
the determination unit being further configured to determine that the website links returning exception information are broken links when the feedback information detected by the detection unit is identical to the exception information.
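The unit structure of the device in claim 7 maps naturally onto small collaborating classes. The sketch below is illustrative only: class and method names are hypothetical, the crawl and proxy fetch are injected as callables, and the threshold test uses "does not exceed" for the direct-determination branch.

```python
from typing import Callable, Dict, List, Optional

Failures = Dict[str, int]  # {website link: exception status code}


class AcquiringUnit:
    """Obtains the links that returned exception information."""
    def __init__(self, crawl: Callable[[], Failures]):
        self._crawl = crawl  # hypothetical crawl routine

    def acquire(self) -> Failures:
        return self._crawl()


class JudgingUnit:
    """Compares the number of failing links with the broken-link threshold."""
    def __init__(self, threshold: int):
        self.threshold = threshold

    def exceeds(self, failures: Failures) -> bool:
        return len(failures) > self.threshold


class DetectionUnit:
    """Re-checks each failing link through a proxy server."""
    def __init__(self, fetch_via_proxy: Callable[[str], Optional[int]]):
        self._fetch = fetch_via_proxy

    def detect(self, failures: Failures) -> Dict[str, Optional[int]]:
        return {url: self._fetch(url) for url in failures}


class DeterminationUnit:
    """Decides which links are broken."""
    def all_broken(self, failures: Failures) -> List[str]:
        return list(failures)

    def confirmed(self, failures: Failures, detected: Dict[str, Optional[int]]) -> List[str]:
        return [url for url, code in failures.items() if detected.get(url) == code]


class BrokenLinkDetector:
    """Wires the four units together in the order described for the device."""
    def __init__(self, acquiring: AcquiringUnit, judging: JudgingUnit,
                 determination: DeterminationUnit, detection: DetectionUnit):
        self.acquiring = acquiring
        self.judging = judging
        self.determination = determination
        self.detection = detection

    def run(self) -> List[str]:
        failures = self.acquiring.acquire()
        if not self.judging.exceeds(failures):
            return self.determination.all_broken(failures)
        detected = self.detection.detect(failures)
        return self.determination.confirmed(failures, detected)
```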
8. The device according to claim 7, characterized in that the determination unit comprises:
a first obtaining module, configured to obtain address information through a proxy server;
a first crawling module, configured to crawl, using the address information obtained by the first obtaining module, the website links returning exception information with a crawler;
the first obtaining module being further configured to obtain the feedback information of the website links returning exception information;
a judgment module, configured to judge whether the feedback information of the website links returning exception information obtained by the first obtaining module is identical to the exception information;
a first determining module, configured to determine that a website link returning exception information is a broken link if the judgment module judges that its feedback information is identical to the exception information.
9. A storage medium, characterized in that the storage medium comprises a stored program, wherein, when the program runs, a device on which the storage medium resides is controlled to perform the website broken-link detection method according to any one of claims 1 to 6.
10. A processor, characterized in that the processor is configured to run a program, wherein the program, when running, performs the website broken-link detection method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710612685.9A CN109302299B (en) | 2017-07-25 | 2017-07-25 | Website broken link detection method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109302299A (en) | 2019-02-01 |
CN109302299B (en) | 2021-12-28 |
Family ID: 65167402
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201710612685.9A | Active (CN109302299B) | 2017-07-25 | 2017-07-25 | Website broken link detection method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109302299B (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130179217A1 (en) * | 2010-06-21 | 2013-07-11 | Salesforce.Com, Inc. | Referred internet traffic analysis system and method |
CA2762544A1 (en) * | 2011-12-20 | 2013-06-20 | Ibm Canada Limited - Ibm Canada Limitee | Identifying requests that invalidate user sessions |
CN102739663A (en) * | 2012-06-18 | 2012-10-17 | 奇智软件(北京)有限公司 | Detection method and scanning engine of web pages |
CN104182462A (en) * | 2014-07-21 | 2014-12-03 | 安徽华贞信息科技有限公司 | Web crawler service system for housing library network |
CN104537005A (en) * | 2014-12-15 | 2015-04-22 | 北京国双科技有限公司 | Data processing method and device for webpage crawling |
CN106547793A (en) * | 2015-09-22 | 2017-03-29 | 北京国双科技有限公司 | The method and apparatus for obtaining proxy server address |
CN106874487A (en) * | 2017-02-21 | 2017-06-20 | 国信优易数据有限公司 | Distributed crawler management system and method |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing; Applicant after: Beijing Guoshuang Technology Co.,Ltd.; Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu District, Haidian District, Beijing; Applicant before: Beijing Guoshuang Technology Co.,Ltd. |
GR01 | Patent grant | |