CN113590987A - Link detection method and device - Google Patents

Info

Publication number
CN113590987A
Authority
CN
China
Prior art keywords
hyperlink
list
link
response
invalid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111148717.7A
Other languages
Chinese (zh)
Inventor
尹彬强
孙成新
王金明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Feihu Information Technology Tianjin Co Ltd
Original Assignee
Feihu Information Technology Tianjin Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Feihu Information Technology Tianjin Co Ltd
Priority to CN202111148717.7A
Publication of CN113590987A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links

Abstract

The invention provides a link detection method and device. The method comprises: acquiring a preset website list, wherein the website list comprises at least one website to be detected; extracting hyperlinks in the website to be detected and constructing a corresponding hyperlink list according to the hyperlinks; accessing each hyperlink in the hyperlink list and obtaining the response result of each hyperlink; and determining, according to the response result of each hyperlink, a final dead link list containing the invalid links in the hyperlink list. Because the invalid links are determined automatically from the response results, inspectors no longer need to click the links one by one, so the time for detecting dead links is shortened and the accuracy of detecting dead links is improved.

Description

Link detection method and device
Technical Field
The present invention relates to the field of link analysis technologies, and in particular, to a link detection method and apparatus.
Background
At present, when information is displayed through a dynamic page, data is mainly acquired from a server by using a link in the dynamic page, and the acquired data is rendered into the dynamic page to complete information display. In order to ensure that the dynamic page can normally display information, a failed or wrong link (also called a dead link) in the dynamic page needs to be detected and processed.
The existing method for detecting dead links is manual: inspectors click the links in a dynamic page one by one. However, because the number of dynamic pages is large and each dynamic page contains a large number of links, inspectors must spend a great deal of time detecting dead links and are prone to oversights while doing so. As a result, detection takes a long time and detection accuracy is low.
Disclosure of Invention
In view of this, embodiments of the present invention provide a link detection method and apparatus, so as to solve the problems of long detection time and low detection accuracy in the existing dead link detection method.
In order to achieve the above purpose, the embodiments of the present invention provide the following technical solutions:
the first aspect of the embodiments of the present invention discloses a link detection method, including:
acquiring a preset website list, wherein the website list comprises at least one website to be detected;
extracting hyperlinks in the website to be detected and constructing a corresponding hyperlink list according to the hyperlinks;
accessing each hyperlink in the hyperlink list and obtaining a response result of each hyperlink;
and determining a final dead link list containing invalid links in the hyperlink list according to the response result of each hyperlink.
Preferably, the determining a final dead link list including invalid links in the hyperlink list according to the response result of each hyperlink includes:
for each hyperlink, if the response result of the hyperlink conforms to a preset dead link judgment condition, determining the hyperlink to be an invalid link, wherein the dead link judgment condition is set based on the response state indicating that the hyperlink is the invalid link, response content and response access time;
determining a final dead link list using all the determined invalid links.
Preferably, the determining a final dead link list by using all the determined invalid links includes:
determining an initial detection dead link list by using all the determined invalid links;
accessing each invalid link in the initial detection dead link list and acquiring a corresponding response result;
and eliminating, from the initial detection dead link list, the invalid links whose response results do not conform to the dead link judgment condition, and returning to the step of accessing each invalid link in the initial detection dead link list and acquiring the corresponding response result until the number of executions equals a preset number, whereupon the initial detection dead link list from which those invalid links have been eliminated is determined to be the final dead link list.
Preferably, for each hyperlink, if a response result of the hyperlink meets a preset dead link judgment condition, determining the hyperlink to be an invalid link includes:
analyzing the response result of each hyperlink to obtain the corresponding response state, response content and response access time;
and for each hyperlink, if the response state of the hyperlink does not meet the preset response state, or if the time of response access of the hyperlink is greater than the preset time, or if the response content of the hyperlink is the preset response content, determining that the hyperlink is an invalid link.
Preferably, the extracting hyperlinks in the website to be detected and constructing a corresponding hyperlink list according to the hyperlinks includes:
acquiring a corresponding page code according to the website to be detected;
and extracting hyperlinks from the page codes corresponding to the to-be-detected websites and constructing a hyperlink list according to the hyperlinks.
Preferably, after determining a final dead link list including invalid links in the hyperlink list according to a response result of each of the hyperlinks, the method further includes:
and sending the final dead link list to a target object.
Preferably, after determining a final dead link list including invalid links in the hyperlink list according to a response result of each of the hyperlinks, the method further includes:
and sending the invalid links which do not belong to the preset blacklist in the final dead link list to a target object.
A second aspect of the embodiments of the present invention discloses a link detection apparatus, including:
an acquisition unit, configured to acquire a preset website list, wherein the website list comprises at least one website to be detected;
an extraction unit, configured to extract the hyperlinks in the website to be detected and construct a corresponding hyperlink list according to the hyperlinks;
a processing unit, configured to access each hyperlink in the hyperlink list and acquire a response result of each hyperlink;
and a determining unit, configured to determine a final dead link list containing invalid links in the hyperlink list according to the response result of each hyperlink.
Preferably, the determining unit is specifically configured to: for each hyperlink, if the response result of the hyperlink conforms to a preset dead link judgment condition, determining the hyperlink to be an invalid link, wherein the dead link judgment condition is set based on the response state indicating that the hyperlink is the invalid link, response content and response access time; determining a final dead link list using all the determined invalid links.
Preferably, the determination unit includes:
a determining module, configured to determine an initial detection dead link list by using all the determined invalid links;
an access module, configured to access each invalid link in the initial detection dead link list and acquire a corresponding response result;
and a processing module, configured to eliminate, from the initial detection dead link list, the invalid links whose response results do not conform to the dead link judgment condition, to trigger the access module again until the number of executions equals a preset number, and to determine the initial detection dead link list from which those invalid links have been eliminated to be the final dead link list.
In the link detection method and device provided by the embodiments of the present invention, the method is: acquiring a preset website list, wherein the website list comprises at least one website to be detected; extracting hyperlinks in the website to be detected and constructing a corresponding hyperlink list according to the hyperlinks; accessing each hyperlink in the hyperlink list and obtaining the response result of each hyperlink; and determining, according to the response result of each hyperlink, a final dead link list containing the invalid links in the hyperlink list. In this scheme, the invalid links are determined from the response results without inspectors clicking the links one by one, so the time for detecting dead links is shortened and the accuracy of detecting dead links is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a flowchart of a link detection method according to an embodiment of the present invention;
Fig. 2 is a flowchart of determining a final dead link list according to an embodiment of the present invention;
Fig. 3 is a block diagram of a link detection apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In this application, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
As can be seen from the background art, at present, a detection person clicks links in a dynamic page one by one to detect a dead link, but the number of dynamic pages is large, and the dynamic page also contains a large number of links, so that the detection person needs to spend a large amount of time to detect the dead link, the detection time is long, careless mistakes are easily generated in the process of detecting the dead link by the detection person, and the detection accuracy is low.
Therefore, the embodiments of the present invention provide a link detection method and device, which acquire a pre-constructed website list containing the websites to be detected, extract the hyperlinks in the websites to be detected, and construct a hyperlink list. Each hyperlink in the hyperlink list is then accessed to obtain a corresponding response result, and a final dead link list, which comprises the invalid links in the hyperlink list, is determined according to the obtained response results. Invalid links are thus determined without inspectors clicking the links one by one, so the time for detecting dead links is shortened and the accuracy of detecting dead links is improved.
It should be noted that an invalid link in the embodiments of the present invention is a failed or wrong link in a page; that is, "invalid link" and "dead link" are used interchangeably.
It is understood that the reasons a link becomes a dead link include, but are not limited to, website server configuration errors and folder name changes. Dead links are divided into two types: protocol dead links and content dead links.
Referring to fig. 1, a flowchart of a link detection method provided in an embodiment of the present invention is shown, where the link detection method includes:
Step S101: acquire a preset website list.
It should be noted that a website list is preset, and the website list includes at least one website to be detected.
In the process of specifically implementing step S101, a preset website list is obtained, and a website to be detected in the website list is obtained.
In some embodiments, the website to be detected in the website list may be updated according to a preset update period and an update rule.
It should be noted that the website to be detected corresponds to one page, each page has a plurality of hyperlinks, and the hyperlinks may be Uniform Resource Locators (URLs).
Step S102: extract hyperlinks in the website to be detected and construct a corresponding hyperlink list according to the hyperlinks.
In the process of specifically implementing step S102, for each website to be detected, the corresponding page code is obtained according to the website to be detected, a designated analysis tool (e.g., JSOUP) is used to analyze that page code, the hyperlinks in the website to be detected are extracted, and a hyperlink list is constructed from those hyperlinks, where the hyperlink list includes all hyperlinks in the website to be detected.
It should be noted that all hyperlinks of each website to be detected can be constructed to obtain a hyperlink list, that is, one hyperlink list contains all hyperlinks of one website to be detected; similarly, a hyperlink list can be constructed by using all hyperlinks of all websites to be detected, namely the hyperlink list contains all hyperlinks of all websites to be detected; the manner in which the list of hyperlinks is constructed is not limited herein.
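The two ways of constructing the hyperlink list can be sketched in Python as follows. This is only an illustration: the function name build_hyperlink_lists and the shape of its input (a mapping from each website to be detected to the hyperlinks already extracted from it) are assumptions, not part of the embodiment.

```python
def build_hyperlink_lists(site_links):
    """Build both kinds of hyperlink list from already-extracted hyperlinks.

    site_links (hypothetical input) maps each website to be detected to the
    hyperlinks extracted from its page.
    """
    # Option 1: one hyperlink list per website to be detected.
    per_site = {site: list(links) for site, links in site_links.items()}
    # Option 2: a single hyperlink list covering all websites to be detected.
    merged = [link for links in per_site.values() for link in links]
    return per_site, merged


per_site, merged = build_hyperlink_lists({
    "http://example.com/a": ["http://example.com/1", "http://example.com/2"],
    "http://example.com/b": ["http://example.com/3"],
})
```

Keeping the per-website lists makes it easier to report which website an invalid link came from, while the merged list is convenient when all hyperlinks are accessed in one batch.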
The designated analysis tool extracts the hyperlinks in the website to be detected as follows: access to the website to be detected is requested to obtain the corresponding page code, the page code is analyzed with the designated analysis tool, and the hyperlinks in the website to be detected are obtained from the target tags of the page code.
For example: access to the website to be detected is requested to obtain the corresponding page code. Assuming the obtained page code is HyperText Markup Language (HTML), the page code is analyzed using JSOUP, and the hyperlinks in the website to be detected are obtained from the href attribute of the a tag (<a href="...">) and from the src attribute of the img tag.
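As a rough illustration of this extraction step, the following Python sketch uses the standard-library html.parser module in place of JSOUP (a substitution made only for this example). The class name LinkExtractor, the function extract_hyperlinks, and the sample page are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects href values of <a> tags and src values of <img> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Target tags and the attribute that carries the hyperlink.
        wanted = {"a": "href", "img": "src"}.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative links against the address of the page.
                self.links.append(urljoin(self.base_url, value))


def extract_hyperlinks(page_code, base_url):
    """Return the hyperlinks found in the page code, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(page_code)
    parser.close()
    return parser.links


sample_page = '<html><body><a href="/news">News</a><img src="logo.png"></body></html>'
hyperlink_list = extract_hyperlinks(sample_page, "http://example.com/index.html")
```

Resolving relative addresses with urljoin is a design choice of this sketch; the embodiment does not state how relative hyperlinks are handled.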
Step S103: access each hyperlink in the hyperlink list and acquire a response result of each hyperlink.
In the process of implementing step S103, for each hyperlink list, each hyperlink in the hyperlink list is accessed one by one, and a response result of each hyperlink is obtained.
It should be noted that, because a large number of hyperlinks (for example, thousands of hyperlinks) exist in the page corresponding to each website to be detected, that is, each hyperlink list includes a large number of hyperlinks, when accessing each hyperlink in the hyperlink list, each hyperlink in the hyperlink list can be accessed in a multithreading manner, so as to further reduce the time for detecting a dead link.
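One possible way to access the hyperlinks in a multithreaded manner is sketched below in Python with a thread pool. The function names fetch and access_all, the worker count, and the tuple layout of the response result are assumptions made for this example only.

```python
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url, timeout=30):
    """Access one hyperlink; return (url, status, body, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status, body = resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error state such as 404.
        status, body = err.code, b""
    except Exception:
        # Malformed address, connection failure, or timeout: no state available.
        status, body = None, b""
    return url, status, body, time.monotonic() - start


def access_all(hyperlink_list, fetch_fn=fetch, workers=16):
    """Access every hyperlink concurrently; results keep the list order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_fn, hyperlink_list))
```

Note that urllib.request follows redirects automatically, so with this particular client a 301 or 302 state is normally resolved to the state of the final address; a client that reports redirects directly would be needed to observe them.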
Step S104: determine, according to the response result of each hyperlink, a final dead link list containing the invalid links in the hyperlink list.
In the process of implementing step S104 specifically, for each hyperlink, if a response result of the hyperlink conforms to a preset dead link judgment condition, the hyperlink is determined to be an invalid link, and the dead link judgment condition is set based on a response state indicating that the hyperlink is an invalid link, response content, and response access time. Using all invalid links determined, a final dead link list is determined.
In some embodiments, the specific implementation of determining a hyperlink as an invalid link is: analyzing the response result of each hyperlink to obtain the corresponding response state, response content and response access time; and for each hyperlink, if the response state of the hyperlink does not meet the preset response state, or if the response access time of the hyperlink is greater than the preset time, or if the response content of the hyperlink is the preset response content, determining that the hyperlink is an invalid link.
In summary, a hyperlink meets the dead link judgment condition if any one of the following holds: its response state does not meet the preset response state, its response access time is longer than the preset time, or its response content is the preset response content.
For example: if the response state of a hyperlink is none of the 200 state, the 301 state, and the 302 state, the hyperlink can be determined to be an invalid link. Here the 200 state, the 301 state, and the 302 state are the preset response states mentioned above.
It should be noted that the 200 state indicates that the request sent when accessing the hyperlink succeeded, the 301 state indicates that the web page corresponding to the hyperlink has been permanently moved to another URL, and the 302 state indicates that the accessed hyperlink is temporarily redirected to a new address.
It should be further noted that the 200 state, the 301 state and the 302 state used in determining whether the link is invalid according to the response state of the hyperlink are only used for illustration, and in practical applications, the specific preset response state used may be determined according to practical situations.
Another example: if the response access time of a hyperlink is more than 30 seconds, the hyperlink is an invalid link.
Another example: the response content of a hyperlink may contain one or more of a response page, a play page, an album page, and the like. If any one of the following three cases occurs in the response content, the hyperlink is an invalid link: the response page is a 404 page, the video in the play page cannot be played, or there is no video in the album page. That is, these three cases are the preset response content mentioned above.
It should be noted that a hyperlink whose response state does not satisfy the preset response state is an invalid link of the protocol dead link type; a hyperlink whose response access time is greater than the preset time is an invalid link of the protocol dead link type; and a hyperlink whose response content is the preset response content is an invalid link of the content dead link type.
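The dead link judgment condition and the two dead link types can be illustrated with the following Python sketch. The preset states (200, 301, 302) and the 30-second limit are the example values given above; the ResponseResult type, the function dead_link_type, and the use of a text marker to recognize the preset response content are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Optional

PRESET_STATES = {200, 301, 302}      # example preset response states
PRESET_MAX_SECONDS = 30.0            # example preset time
PRESET_DEAD_MARKERS = ("404 page",)  # hypothetical marker for preset content


@dataclass
class ResponseResult:
    url: str
    state: Optional[int]   # None when no response was received
    content: str
    access_seconds: float


def dead_link_type(result):
    """Return 'protocol', 'content', or None when the hyperlink is valid."""
    # Response state does not meet the preset response state.
    if result.state not in PRESET_STATES:
        return "protocol"
    # Response access time is greater than the preset time.
    if result.access_seconds > PRESET_MAX_SECONDS:
        return "protocol"
    # Response content is the preset response content.
    if any(marker in result.content for marker in PRESET_DEAD_MARKERS):
        return "content"
    return None
```

A hyperlink is treated as an invalid link as soon as any one of the three checks matches, which mirrors the "or" relationship between the three conditions.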
In the above manner, invalid links are determined from each hyperlink list, and a final dead link list is determined by using all the determined invalid links, where the final dead link list includes the invalid links (i.e., dead links) in each hyperlink list.
Specifically, after determining to obtain the invalid links in each hyperlink list, all invalid links are summarized through the dead link port, and then the final dead link list can be determined.
In some embodiments, after determining that the invalid link is obtained, the invalid link may be analyzed to obtain a dead link reason corresponding to the invalid link and corresponding dead link information, where the dead link information may indicate which hyperlink in the website to be detected the invalid link is.
It should be noted that, because a network problem may affect the accuracy of determining the invalid links, in the process of determining the final dead link list, multiple repeated detections may be performed on each invalid link, and the final dead link list is determined by using each invalid link obtained after the multiple repeated detections, where a specific implementation manner of the multiple repeated detections is detailed in the following content in fig. 2 of the embodiment of the present invention.
Preferably, after the final dead link list is obtained, it is sent to the target object. For example, a mailbox address of an operator is configured in advance, and after the final dead link list is obtained, an alarm mail carrying at least the final dead link list is sent to that mailbox address to remind the operator to process the invalid links (i.e., dead links) in the final dead link list promptly.
In practical applications, an operator may consider that some detected invalid links are not dead links, and therefore a preset blacklist is configured in advance, and the preset blacklist includes hyperlinks considered by the operator not to be dead links. Preferably, the invalid links in the final dead link list that do not belong to the preset blacklist are sent to the target object, that is, the invalid links in the final dead link list that belong to the preset blacklist are not sent to the target object.
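The blacklist filtering and the alarm mail can be sketched in Python as follows. The function names links_to_report and build_alarm_mail, the subject line, and the example addresses are hypothetical; the sketch only builds the mail and leaves the actual sending (for example through an SMTP server) out of scope.

```python
from email.message import EmailMessage


def links_to_report(final_dead_links, blacklist):
    """Keep only the invalid links that are not in the preset blacklist."""
    blocked = set(blacklist)
    return [link for link in final_dead_links if link not in blocked]


def build_alarm_mail(dead_links, operator_address):
    """Build (but do not send) an alarm mail carrying the dead link list."""
    msg = EmailMessage()
    msg["To"] = operator_address
    msg["Subject"] = f"Dead link alarm: {len(dead_links)} invalid link(s)"
    # One invalid link per line in the mail body.
    msg.set_content("\n".join(dead_links))
    return msg


report = links_to_report(
    ["http://example.com/x", "http://example.com/y"],
    blacklist=["http://example.com/y"],
)
mail = build_alarm_mail(report, "operator@example.com")
```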
It should be noted that, the steps S101 to S104 may be executed periodically (the period time may be customized), and the invalid links in the websites to be detected are detected periodically.
In the embodiment of the present invention, a pre-constructed website list containing the websites to be detected is obtained, the hyperlinks in the websites to be detected are extracted, and a hyperlink list is constructed. Each hyperlink in the hyperlink list is accessed to obtain a corresponding response result, and a final dead link list, which comprises the invalid links in the hyperlink list, is determined according to the obtained response results. The invalid links are determined without inspectors clicking the links one by one, so the time for detecting dead links is shortened and the accuracy of detecting dead links is improved.
The process of determining the final dead link list in step S104 of fig. 1 is shown in fig. 2, which is a flowchart of determining the final dead link list provided in an embodiment of the present invention, and includes the following steps:
Step S201: determine an initial detection dead link list by using all the determined invalid links.
In the process of specifically implementing step S201, an initial detection dead link list is constructed by using all the invalid links determined from each hyperlink list.
Step S202: access each invalid link in the initial detection dead link list and acquire a corresponding response result.
In the process of specifically implementing step S202, each invalid link in the initial detection dead link list is accessed, and the response result corresponding to each invalid link is obtained.
For example: each invalid link in the initial detection dead link list is accessed in a multithreaded manner to obtain the response result corresponding to each invalid link.
Step S203: eliminate, from the initial detection dead link list, the invalid links whose response results do not conform to the dead link judgment condition.
It should be noted that, owing to network problems, dead links may be misjudged during detection; that is, some invalid links in the initial detection dead link list may not actually be dead links. As described for step S104 in fig. 1, an invalid link is a hyperlink that meets the dead link judgment condition, and a hyperlink that does not meet the condition is not an invalid link. For the specific way of judging whether a hyperlink is an invalid link, refer to the description of step S104 in fig. 1, which is not repeated here.
Therefore, in the process of specifically implementing step S203, the invalid links whose response results do not meet the dead link judgment condition are removed from the initial detection dead link list; that is, the hyperlinks erroneously determined to be invalid links (i.e., hyperlinks that do not satisfy the dead link judgment condition) are removed from the initial detection dead link list.
Step S204: judge whether the number of times step S202 has been executed is greater than or equal to the preset number; if it is less than the preset number, return to step S202; if it is greater than or equal to the preset number, execute step S205.
In the process of specifically implementing step S204, it is judged whether the number of times step S202 has been executed is greater than or equal to the preset number. If it is less than the preset number, each invalid link in the initial detection dead link list is accessed again and a corresponding response result is obtained (that is, step S202 is executed again), and the invalid links whose response results do not meet the dead link judgment condition are again removed from the initial detection dead link list (that is, step S203 is executed again). If it is greater than or equal to the preset number, the initial detection dead link list obtained after the last removal of invalid links whose response results do not meet the dead link judgment condition is used as the final dead link list, that is, step S205 is executed.
For example: steps S202 to S204 are repeated 10 times, and the initial detection dead link list obtained after the last removal of invalid links whose response results do not meet the dead link judgment condition is taken as the final dead link list.
Step S205: determine the initial detection dead link list, from which the invalid links whose response results do not conform to the dead link judgment condition have been eliminated, to be the final dead link list.
In the embodiment of the present invention, an initial detection dead link list is determined by using all the determined invalid links, and invalid link detection is repeated on that list a preset number of times to obtain the final dead link list. This eliminates dead link misjudgments caused by network problems and further improves the accuracy of dead link detection.
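The repeated detection of steps S201 to S205 can be sketched in Python as follows. The function name confirm_dead_links and the callback is_dead, which stands for accessing a link again and applying the dead link judgment condition, are assumptions of this sketch; the default of 10 repetitions is the example value given above.

```python
def confirm_dead_links(initial_dead_links, is_dead, preset_times=10):
    """Re-detect the initial detection dead link list preset_times times.

    is_dead(link) is a hypothetical callback that accesses the link again and
    returns True if its response result still meets the judgment condition.
    """
    remaining = list(initial_dead_links)        # S201: initial detection list
    for _ in range(preset_times):               # S204: count executions of S202
        # S202 + S203: access each link again and keep only those that still
        # meet the dead link judgment condition.
        remaining = [link for link in remaining if is_dead(link)]
    return remaining                            # S205: final dead link list


# Usage with a simulated network glitch: "b" looks dead twice, then recovers.
attempts = {}


def flaky_check(link):
    attempts[link] = attempts.get(link, 0) + 1
    return link == "a" or attempts[link] < 3


final_dead_links = confirm_dead_links(["a", "b"], flaky_check, preset_times=5)
```

A link that recovers in any round is dropped and is not accessed again in later rounds, so only links that fail every round remain in the final dead link list.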
Corresponding to the above-mentioned link detection method provided by the embodiment of the present invention, referring to fig. 3, the embodiment of the present invention further provides a structural block diagram of a link detection apparatus, where the link detection apparatus includes: an acquisition unit 301, an extraction unit 302, a processing unit 303, and a determination unit 304;
The acquiring unit 301 is configured to acquire a preset website list, where the website list includes at least one website to be detected.
The extracting unit 302 is configured to extract hyperlinks in the website to be detected and construct a corresponding hyperlink list according to the hyperlinks.
In a specific implementation, the extracting unit 302 is specifically configured to: acquiring a corresponding page code according to the website to be detected; and extracting the hyperlink from the page code corresponding to the website to be detected and constructing a hyperlink list according to the hyperlink.
The processing unit 303 is configured to access each hyperlink in the hyperlink list and obtain a response result of each hyperlink.
A determining unit 304, configured to determine a final dead link list including invalid links in the hyperlink list according to the response result of each hyperlink.
In a specific implementation, the determining unit 304 is specifically configured to: for each hyperlink, if the response result of the hyperlink conforms to a preset dead link judgment condition, determine the hyperlink to be an invalid link, where the dead link judgment condition is set based on the response state, response content, and response access time that indicate that a hyperlink is an invalid link; and determine a final dead link list by using all the determined invalid links.
In a specific implementation, the determining unit 304 is specifically configured to: analyzing the response result of each hyperlink to obtain the corresponding response state, response content and response access time; and aiming at each hyperlink, if the response state of the hyperlink does not meet the preset response state, or if the time of response access of the hyperlink is greater than the preset time, or if the response content of the hyperlink is the preset response content, determining the hyperlink to be an invalid link.
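The three-part judgment condition can be sketched as follows. This is a minimal illustration under stated assumptions: the response state is modeled as an HTTP status code, and the preset values (`EXPECTED_STATUSES`, `MAX_ELAPSED_SECONDS`, `DEAD_CONTENT_MARKERS`) are placeholders chosen for the example, since the patent does not fix them. Matching the preset response content by case-insensitive substring is also an assumption.

```python
from typing import NamedTuple


class ResponseResult(NamedTuple):
    status: int             # response state, modeled here as an HTTP status code
    content: str            # response content (body text)
    elapsed_seconds: float  # response access time


# Assumed preset values; the patent leaves these to the implementer.
EXPECTED_STATUSES = {200}
MAX_ELAPSED_SECONDS = 5.0
DEAD_CONTENT_MARKERS = ("page not found",)


def is_invalid_link(result):
    """Apply the dead link judgment condition to one parsed response result."""
    if result.status not in EXPECTED_STATUSES:
        return True  # response state does not match the preset state
    if result.elapsed_seconds > MAX_ELAPSED_SECONDS:
        return True  # response access time exceeds the preset time
    body = result.content.lower()
    # Response content matches preset "dead page" content.
    return any(marker in body for marker in DEAD_CONTENT_MARKERS)
```

The content check covers pages that return a normal status but display an error page, which a status check alone would miss.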
In this embodiment of the invention, a pre-constructed website list containing the websites to be detected is acquired, the hyperlinks in those websites are extracted, and a hyperlink list is constructed. Each hyperlink in the hyperlink list is accessed to obtain its response result, and a final dead link list, which contains the invalid links in the hyperlink list, is determined from the response results obtained. Invalid links are identified without inspectors clicking the links one by one, which shortens the time needed for dead link detection and improves its accuracy.
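How the four units cooperate can be sketched in Python. All function and parameter names here are hypothetical and do not come from the patent. Page fetching, link extraction, link access, and the judgment condition are passed in as callables so that the sketch does not depend on any particular HTTP library.

```python
def detect_dead_links(website_list, fetch_page, extract_links, probe, is_invalid):
    """Run one detection pass over every website in the list.

    fetch_page(website) returns the page code, extract_links(page_code, website)
    returns the hyperlink list, probe(link) returns a response result, and
    is_invalid(result) applies the dead link judgment condition.
    """
    dead_links = []
    for website in website_list:                        # list from the acquisition unit
        page_code = fetch_page(website)
        hyperlinks = extract_links(page_code, website)  # extraction unit
        for link in hyperlinks:
            result = probe(link)                        # processing unit
            # Determining unit: keep each invalid link once.
            if is_invalid(result) and link not in dead_links:
                dead_links.append(link)
    return dead_links
```

Injecting the callables also makes the flow easy to exercise with stand-in functions before any real network access is wired in.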
Preferably, in conjunction with the content shown in FIG. 3, the determining unit includes a determining module, an access module, and a processing module. Each module operates as follows:
The determining module is configured to determine an initial detection dead link list using all the determined invalid links.
The access module is configured to access each invalid link in the initial detection dead link list and obtain the corresponding response result.
The processing module is configured to remove from the initial detection dead link list any invalid link whose response result does not meet the dead link judgment condition, and to invoke the access module again until the number of executions equals a preset number; the initial detection dead link list remaining after this removal is determined to be the final dead link list.
In this embodiment of the invention, an initial detection dead link list is built from all the determined invalid links. Invalid link detection is then repeated on the initial detection dead link list a preset number of times to obtain the final dead link list. This eliminates dead link misjudgments caused by network problems and further improves the accuracy of dead link detection.
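The repeated re-check performed by the access and processing modules can be sketched as a short loop. The names `confirm_dead_links`, `probe`, and `is_invalid` are hypothetical: `probe(url)` stands for accessing a link and returning its response result, and `is_invalid(result)` stands for the dead link judgment condition. Neither signature is specified in the patent.

```python
def confirm_dead_links(initial_dead_links, probe, is_invalid, preset_times):
    """Re-check the initial detection dead link list a preset number of times.

    A link stays in the list only if every re-check still meets the
    dead link judgment condition.
    """
    remaining = list(initial_dead_links)
    for _ in range(preset_times):
        # Access each remaining link again and drop those that now respond normally.
        remaining = [url for url in remaining if is_invalid(probe(url))]
    return remaining
```

A link that failed once because of a transient network problem is removed as soon as any re-check succeeds, so only links that fail every pass reach the final dead link list.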
Preferably, in conjunction with the content shown in FIG. 3, the link detection apparatus further includes:
a feedback unit, configured to send the final dead link list to a target object, or to send to the target object those invalid links in the final dead link list that do not belong to a preset blacklist.
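The two feedback options can be sketched as a single selection function. The name `links_to_report` is hypothetical, and treating blacklist membership as exact URL equality is an assumption, since the patent does not define the matching rule. The actual delivery to the target object (for example by mail or message) is outside this sketch.

```python
def links_to_report(final_dead_links, blacklist=None):
    """Select which invalid links are sent to the target object."""
    if not blacklist:
        # Option 1: send the whole final dead link list.
        return list(final_dead_links)
    blocked = set(blacklist)
    # Option 2: send only invalid links that are not on the preset blacklist.
    return [url for url in final_dead_links if url not in blocked]
```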
In summary, embodiments of the present invention provide a link detection method and apparatus. A website list containing the websites to be detected is acquired, the hyperlinks in those websites are extracted, and a hyperlink list is constructed. Each hyperlink in the hyperlink list is accessed to obtain its response result, and a final dead link list, which contains the invalid links in the hyperlink list, is determined from the response results obtained. Invalid links are identified without inspectors clicking the links one by one, which shortens the time needed for dead link detection and improves its accuracy.
The embodiments in the present specification are described in a progressive manner; for parts that are the same or similar among the embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiments are substantially similar to the method embodiments and are therefore described relatively briefly; for related points, refer to the corresponding descriptions of the method embodiments. The system embodiments described above are only illustrative: the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method of link detection, the method comprising:
acquiring a preset website list, wherein the website list comprises at least one website to be detected;
extracting hyperlinks in the website to be detected and constructing a corresponding hyperlink list according to the hyperlinks;
accessing each hyperlink in the hyperlink list and obtaining a response result of each hyperlink;
and determining a final dead link list containing invalid links in the hyperlink list according to the response result of each hyperlink.
2. The method of claim 1, wherein determining a final dead link list containing invalid links in the hyperlink list according to the response result of each hyperlink comprises:
for each hyperlink, if the response result of the hyperlink meets a preset dead link judgment condition, determining the hyperlink to be an invalid link, wherein the dead link judgment condition is set based on the response state, response content, and response access time that indicate a hyperlink is an invalid link;
determining a final dead link list using all the determined invalid links.
3. The method of claim 2, wherein determining a final dead link list using all of the determined invalid links comprises:
determining an initial detection dead link list using all the determined invalid links;
accessing each invalid link in the initial detection dead link list and obtaining a corresponding response result;
and removing from the initial detection dead link list the invalid links whose response results do not meet the dead link judgment condition, returning to the step of accessing each invalid link in the initial detection dead link list and obtaining a corresponding response result until the number of executions equals a preset number, and determining the initial detection dead link list remaining after the removal to be the final dead link list.
4. The method of claim 2, wherein for each hyperlink, if a response result of the hyperlink meets a preset dead link judgment condition, determining the hyperlink to be an invalid link comprises:
parsing the response result of each hyperlink to obtain the corresponding response state, response content, and response access time;
and for each hyperlink, if the response state of the hyperlink does not match the preset response state, or if the response access time of the hyperlink is greater than the preset time, or if the response content of the hyperlink is the preset response content, determining that the hyperlink is an invalid link.
5. The method according to claim 1, wherein the extracting hyperlinks from the website to be detected and constructing a corresponding hyperlink list based on the extracted hyperlinks comprises:
acquiring a corresponding page code according to the website to be detected;
and extracting hyperlinks from the page codes corresponding to the to-be-detected websites and constructing a hyperlink list according to the hyperlinks.
6. The method of claim 1, wherein after determining a final dead link list containing invalid links in the list of hyperlinks according to the response result of each of the hyperlinks, further comprising:
and sending the final dead link list to a target object.
7. The method of claim 1, wherein after determining a final dead link list containing invalid links in the list of hyperlinks according to the response result of each of the hyperlinks, further comprising:
and sending the invalid links which do not belong to the preset blacklist in the final dead link list to a target object.
8. A link detection apparatus, the apparatus comprising:
an acquisition unit, configured to acquire a preset website list, wherein the website list comprises at least one website to be detected;
an extraction unit, configured to extract hyperlinks in the website to be detected and construct a corresponding hyperlink list according to the hyperlinks;
a processing unit, configured to access each hyperlink in the hyperlink list and obtain a response result of each hyperlink;
and a determining unit, configured to determine a final dead link list containing invalid links in the hyperlink list according to the response result of each hyperlink.
9. The apparatus according to claim 8, wherein the determining unit is specifically configured to: for each hyperlink, if the response result of the hyperlink meets a preset dead link judgment condition, determine the hyperlink to be an invalid link, wherein the dead link judgment condition is set based on the response state, response content, and response access time that indicate a hyperlink is an invalid link; and determine a final dead link list using all the determined invalid links.
10. The apparatus of claim 9, wherein the determining unit comprises:
a determining module, configured to determine an initial detection dead link list using all the determined invalid links;
an access module, configured to access each invalid link in the initial detection dead link list and obtain a corresponding response result;
and a processing module, configured to remove from the initial detection dead link list the invalid links whose response results do not meet the dead link judgment condition, to invoke the access module again until the number of executions equals a preset number, and to determine the initial detection dead link list remaining after the removal to be the final dead link list.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111148717.7A CN113590987A (en) 2021-09-29 2021-09-29 Link detection method and device

Publications (1)

Publication Number Publication Date
CN113590987A true CN113590987A (en) 2021-11-02

Family

ID=78242722

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111148717.7A Pending CN113590987A (en) 2021-09-29 2021-09-29 Link detection method and device

Country Status (1)

Country Link
CN (1) CN113590987A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102752154A (en) * 2012-07-29 2012-10-24 西北工业大学 Detecting method of dead link of Web site
CN104598458A (en) * 2013-10-30 2015-05-06 腾讯科技(深圳)有限公司 Page detection method and device
CN106326485A (en) * 2016-09-05 2017-01-11 郑州悉知信息科技股份有限公司 Method for detecting web link and device thereof
CN107885820A (en) * 2017-11-07 2018-04-06 北京小度互娱科技有限公司 Breadth traversal orientation grasping means based on crawler system
CN108062362A (en) * 2017-12-01 2018-05-22 北京小度互娱科技有限公司 Dead chain detection method and device
CN112416707A (en) * 2020-11-16 2021-02-26 北京五八信息技术有限公司 Link detection method and device
CN112417240A (en) * 2020-02-21 2021-02-26 上海哔哩哔哩科技有限公司 Website link detection method and device and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination