CN106326485A - Method for detecting web link and device thereof - Google Patents


Info

Publication number
CN106326485A
Authority
CN
China
Prior art keywords
address
analyzed
website
object linking
page
Legal status
Pending
Application number
CN201610802243.6A
Other languages
Chinese (zh)
Inventor
李齐明
常晓阳
乔景亮
王路
张丽辉
朱雨莹
张扬蕾
Current Assignee
ZHENGZHOU XIZHI INFORMATION TECHNOLOGY Co Ltd
Original Assignee
ZHENGZHOU XIZHI INFORMATION TECHNOLOGY Co Ltd
Application filed by ZHENGZHOU XIZHI INFORMATION TECHNOLOGY Co Ltd
Priority to CN201610802243.6A
Publication of CN106326485A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention provides a method and a device for detecting web links. The method includes: obtaining the target link address that currently needs to be detected in a website to be analyzed; sending a page request to the website based on the target link address; when the response data returned for the page request indicates that the target link address is valid, obtaining the page data returned by the website; extracting, based on the page data, the link addresses contained in the target page to which the target link address points; and storing the extracted link addresses as link addresses to be detected for that website. The method can check the validity of all links in a website more comprehensively.

Description

Method and apparatus for detecting website links
Technical field
The present invention relates to the field of data collection technology, and in particular to a method and apparatus for detecting website links.
Background art
At present, each page of a website may contain multiple links. Normally a link leads to a page of the website; if a link cannot reach any page, it is an invalid link. Invalid links seriously degrade the user experience and can cause a loss of website traffic.
To reduce or avoid the impact of invalid links on the user experience, the links in every web page of a website need to be checked. However, a website usually contains a large number of pages, each page contains multiple links, and the linked pages in turn contain many more links. Existing link detection methods can only check the links on a single page; they cannot check all directories and pages of a website comprehensively, and therefore cannot fully and accurately determine the status of every link in the site.
Summary of the invention
In view of this, embodiments of the present invention provide a method and apparatus for detecting website links, so that the validity of all links in a website can be checked more comprehensively, which helps reduce the number of invalid links in the website.
To achieve the above object, embodiments of the present invention provide the following technical solutions:
A method for detecting website links, comprising:
obtaining the target link address that currently needs to be detected in a website to be analyzed;
sending a page request to the website to be analyzed based on the target link address;
when the response data returned for the page request indicates that the target link address is a valid link address, obtaining the page data returned by the website to be analyzed;
extracting, based on the page data, the link addresses contained in the target page to which the target link address points;
storing the extracted link addresses as link addresses to be detected for the website to be analyzed.
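One round of the steps above can be pictured with a short Python sketch. It is an illustration under stated assumptions, not the patented implementation: `detect_link`, the injected `fetch` and `extract` callables, and the single-entry preset of invalid status codes are hypothetical choices introduced here.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical helper signatures (not defined in the patent):
#   fetch(url) -> (http_status_code, page_source_or_None)
#   extract(page_source, page_url) -> iterable of link addresses on that page
Fetch = Callable[[str], Tuple[int, Optional[str]]]
Extract = Callable[[str, str], Iterable[str]]

# Assumed preset of status codes that mark a link as invalid.
INVALID_STATUS = frozenset({404})


def detect_link(target: str, fetch: Fetch, extract: Extract,
                pending: List[str]) -> bool:
    """One round of the method for a single target link address.

    Returns True if the target is valid. Links found on a valid target page
    are appended to `pending`, the store of link addresses to be detected.
    """
    status, page = fetch(target)               # send the page request
    if status in INVALID_STATUS:
        return False                           # response marks the link invalid
    if page:                                   # valid: use the returned page data
        pending.extend(extract(page, target))  # extract links and store them
    return True
```

Deduplication and the loop over `pending` are deliberately left out; they belong to the repeated execution of the steps described later.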
Optionally, obtaining the target link address that currently needs to be detected in the website to be analyzed comprises:
obtaining, from the stored link addresses to be detected, the target link address that currently needs to be detected in the website to be analyzed.
Optionally, obtaining the target link address that currently needs to be detected in the website to be analyzed comprises:
obtaining the link address of an entry page of the website to be analyzed;
using the link address of the entry page as the target link address that currently needs to be detected.
Optionally, the method further comprises:
when the response data returned for the page request indicates that the target link address is an invalid link address, marking the target link address as an invalid link.
Optionally, sending a page request to the website to be analyzed based on the target link address comprises:
sending an HTTP request to the website to be analyzed based on the target link address;
after sending the page request to the website to be analyzed based on the target link address, the method further comprises:
obtaining the HTTP status code returned by the website to be analyzed;
when the HTTP status code does not belong to a preset set of status codes that indicate an invalid link, determining that the target link address is a valid link address.
Optionally, the method further comprises:
when all link addresses to be analyzed have been detected, displaying the detection result for each link address.
Optionally, the method further comprises:
when detection of the target link address ends, returning to the operation of obtaining, from the stored link addresses to be detected, the target link address that currently needs to be detected in the website to be analyzed, until all link addresses to be analyzed have been detected.
In another aspect, embodiments of the present application further provide an apparatus for detecting website links, comprising:
a link acquiring unit, configured to obtain the target link address that currently needs to be detected in a website to be analyzed;
a request sending unit, configured to send a page request to the website to be analyzed based on the target link address;
a page analysis unit, configured to obtain the page data returned by the website to be analyzed when the response data returned for the page request indicates that the target link address is a valid link address;
a link extraction unit, configured to extract, based on the page data, the link addresses contained in the target page to which the target link address points;
a link storage unit, configured to store the extracted link addresses as link addresses to be detected for the website to be analyzed.
Preferably, the link acquiring unit is specifically configured to obtain, from the stored link addresses to be detected, the target link address that currently needs to be detected in the website to be analyzed;
or to obtain the link address of an entry page of the website to be analyzed and use it as the target link address that currently needs to be detected.
Preferably, the apparatus further comprises:
an invalid link marking unit, configured to mark the target link address as an invalid link when the response data returned for the page request indicates that the target link address is an invalid link address.
Based on the above technical solutions, once the target link address that currently needs to be detected in the website to be analyzed has been obtained, it can serve as a starting point for detection: a page request is sent to the website based on this address, and if the response data returned for the request shows that the address is valid, the page data returned by the website is obtained as well. From this page data it can be determined which link addresses the target page contains, and these addresses are stored as addresses that need detection, so that each of them is later detected in turn as a target link address. Analysing the site layer by layer in this way collects the link addresses contained in every page and checks the validity of each one, so that all link addresses on all pages of the website are covered. This achieves comprehensive detection of the links in the website and helps find its invalid link addresses more accurately.
Description of the drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only embodiments of the present application; a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of an embodiment of the method for detecting website links according to the present application;
Fig. 2 is a schematic flowchart of another embodiment of the method for detecting website links according to the present application;
Fig. 3 is a schematic structural diagram of an embodiment of the apparatus for detecting website links according to the present application.
Detailed description of the invention
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Referring to Fig. 1, which shows a schematic flowchart of an embodiment of the method for detecting website links according to the present application. The method of this embodiment can be applied to a server or another computing device capable of data analysis, which is connected to the website to be analyzed over a network. The method may include:
101. Obtain the target link address that currently needs to be detected in the website to be analyzed.
Here, the website to be analyzed is the website on which link detection is to be performed. In this embodiment, the link addresses contained in each page of this website are detected to determine whether each link address is an invalid link address.
It can be understood that, in the embodiments of the present application, a link address may be the address of a page; it may also refer to a link relationship pointing from one web page to a target, where the target may be another web page, a different position within the same web page, or a resource such as a picture, an e-mail address or a file. For ease of distinction, the link address that currently needs to be detected is referred to as the target link address.
102. Send a page request to the website to be analyzed based on the target link address.
The page request is used to request the page data pointed to by the target link address, for example to request access to the page to which the target link address points.
The page request can be sent to the website to be analyzed in various ways. For example, an access request can be generated from the target link address; the access request may be a Hypertext Transfer Protocol (HTTP) request. The server then sends the access request to the website to be analyzed, and the website responds to it: it determines whether the target link address carried by the request leads to a page and returns response data for the request.
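As a rough illustration of step 102, the following sketch sends an HTTP GET with Python's standard `urllib.request` and returns the status code together with the page data. The function name `send_page_request`, the timeout and the User-Agent string are assumptions; the patent does not prescribe a client library.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


def send_page_request(url: str, timeout: float = 10.0) -> Tuple[Optional[int], Optional[bytes]]:
    """Send an HTTP GET for the target link address.

    Returns (status_code, page_data). For 4xx/5xx responses urllib raises
    HTTPError, whose .code carries the status; no page data is kept then.
    Returns (None, None) when no HTTP response arrived at all
    (DNS failure, refused connection, timeout).
    """
    request = urllib.request.Request(url, headers={"User-Agent": "link-check-sketch/0.1"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:   # must precede OSError: it is a subclass
        return err.code, None
    except OSError:                         # URLError, socket timeouts, resets
        return None, None
```

Note that `urlopen` follows redirects by default, so a 301 or 302 is reported as the status of the final page; a checker that wants to flag redirects separately would need a custom redirect handler.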
103. When the response data returned for the page request indicates that the target link address is a valid link address, obtain the page data returned by the website to be analyzed.
It can be understood that, after the server sends the page request, the website to be analyzed can check whether the target link address is associated with a page. If the target link address is not associated with any picture, web page or other resource, it is an invalid link address. Because the response data the website returns for a valid link address differs from the response data it returns for an invalid one, the response data can be used to determine whether the target link address is valid.
Optionally, when the page request sent to the website to be analyzed is an HTTP request, the response data returned by the website may be an HTTP status code, which reflects whether the target link address is valid. Specifically, a set of status codes indicating an invalid link can be preset: if the HTTP status code returned by the website does not belong to this preset set, the target link address is determined to be valid; conversely, if it does belong to the preset set, the target link address is determined to be invalid. For example, a status code of 404 indicates that the target link address carried by the HTTP request is invalid, and a status code of 200 indicates that it is valid.
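The preset-set test described above reduces to a membership check. A minimal sketch, in which the contents of `INVALID_LINK_CODES` are an illustrative assumption (the description only names 404) and treating a missing response as invalid is also an assumption:

```python
from typing import Collection, Optional

# Assumed preset of status codes that "indicate an invalid link".
# Only 404 is named in the description; the others are illustrative.
INVALID_LINK_CODES = frozenset({400, 403, 404, 410, 500, 502, 503, 504})


def is_valid_link(status: Optional[int],
                  invalid_codes: Collection[int] = INVALID_LINK_CODES) -> bool:
    """Valid when a status code came back and it is not in the preset set."""
    if status is None:        # no HTTP response at all (assumed invalid)
        return False
    return status not in invalid_codes
```

Keeping the preset as a parameter lets an operator decide, for instance, whether a 403 on a members-only page should count as a broken link.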
When the target link address is valid, the website to be analyzed returns the page data of the page to which it points, so that the server or computing device can display the page content based on that page data.
104. Based on the page data, extract the link addresses contained in the target page to which the target link address points.
For ease of distinction, the embodiments of the present application refer to the page to which the target link address points as the target page.
By analysing the page data, the link addresses contained in the target page can be extracted. For example, the source code of the target page can be obtained from the page data, and the link addresses contained in the target page are then extracted from the source code.
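Extracting link addresses from the page source can be sketched with Python's standard `html.parser.HTMLParser`. The set of tag/attribute pairs, the skipping of pure `#fragment` links and of non-HTTP schemes, and the names `LinkExtractor` and `extract_links` are assumptions made for this illustration; `<base href>` is not handled.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect link addresses from the source code of one target page."""

    # Assumed set of (tag, attribute) pairs that carry link addresses.
    LINK_ATTRS = {("a", "href"), ("link", "href"), ("img", "src"),
                  ("script", "src"), ("iframe", "src")}

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value or (tag, name) not in self.LINK_ATTRS:
                continue
            value = value.strip()
            if value.startswith("#"):
                continue                      # same-page position: skipped here
            # Resolve relative addresses and drop "#fragment" parts.
            absolute, _ = urldefrag(urljoin(self.base_url, value))
            if absolute.startswith(("http://", "https://")) and absolute not in self.links:
                self.links.append(absolute)


def extract_links(html: str, base_url: str) -> List[str]:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Resolving every address against the target page's own address matters: the same relative `href` points to different pages depending on where it appears.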
105. Store the extracted link addresses as link addresses to be detected for the website to be analyzed.
After the link addresses contained in the target page are extracted, they can be stored as link addresses to be detected, so that each of them is later detected according to the flow of this embodiment: the target page of each such address and the link addresses contained in that page are determined in turn. In this way the link addresses on every level of pages are analysed and detected layer by layer, so that all link addresses in the website are collected and their validity is comprehensively checked.
The extracted link addresses may be stored in a designated storage area of the server or computing device, or in a designated database; this is not limited here.
It should be noted that the solution in the embodiments of the present application is a process executed repeatedly: steps 101 to 105 are executed round after round. The embodiment of Fig. 1 describes only the detection of one link address; different rounds of detection target different link addresses.
Optionally, when link detection is performed for the first time (in other words, in the first round of the steps in Fig. 1), the link address of an entry page of the website to be analyzed can be used as the target link address. The entry page can be preset; for example, its link address may be the address of the website's home page, or the link address of any page in the website. Of course, the link address of the entry page can also be entered by the user in real time; for example, before step 101, an input interface for the entry page address can be displayed, in which the user enters the address of the entry page.
In every round after the first, the target link address that has not yet been detected and needs to be detected in this round is determined from the stored link addresses to be detected.
Each time a target link address is determined, steps 102 to 105 are executed. When step 105 has been completed, the validity check of the current target link address is finished; the method then returns to the operation of obtaining, from the stored link addresses to be detected, the target link address that currently needs to be detected, until validity checks have been performed on all stored link addresses to be detected.
It can be understood that the address of the entry page of the website to be analyzed is normally a valid link address, but the link addresses found by analysing layer by layer from the entry page may include invalid ones. If the response data returned for a page request shows that the target link address is invalid, the target link address can be marked as an invalid link, so that it can later be deleted, or the valid and invalid links in the website can be statistically analysed or otherwise processed.
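The repeated rounds, starting from the entry page and marking invalid links, amount to a breadth-first worklist. A minimal sketch under stated assumptions: `crawl` and the injected `check` callable are hypothetical, the `max_links` cap is an added safety limit, and expanding only pages on the entry page's host is an assumption (the description speaks of links "in the website" but does not say how external links are treated).

```python
from collections import deque
from typing import Callable, Dict, Iterable, Tuple
from urllib.parse import urlsplit

# Hypothetical stand-in for steps 102-105 of one round:
#   check(url) -> (is_valid, link_addresses_found_on_that_page)
Check = Callable[[str], Tuple[bool, Iterable[str]]]


def crawl(entry_url: str, check: Check, max_links: int = 10_000) -> Dict[str, bool]:
    """Detect links layer by layer from the entry page (breadth-first).

    Returns {link_address: is_valid}; False entries are links marked invalid.
    """
    site = urlsplit(entry_url).netloc
    results: Dict[str, bool] = {}
    pending = deque([entry_url])   # stored link addresses to be detected
    queued = {entry_url}           # never store the same address twice
    while pending and len(results) < max_links:
        target = pending.popleft()          # current target link address
        valid, found = check(target)
        results[target] = valid             # mark valid or invalid
        if not valid or urlsplit(target).netloc != site:
            continue                        # external pages are checked, not expanded
        for link in found:
            if link not in queued:
                queued.add(link)
                pending.append(link)
    return results
```

Without the host restriction, following links from external pages would eventually try to crawl the wider web rather than one site.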
Referring to Fig. 2, which shows a schematic flowchart of another embodiment of the method for detecting website links according to the present application. The method of this embodiment can be applied to a server and may include:
201. Obtain the link address of the entry page of the website to be analyzed.
For example, a user input interface can be displayed, through which the user enters the link address of the entry page of the website to be analyzed.
The link address of the entry page may also be called the address or page address of the entry page. When link detection is performed for the first time, the link address of the entry page is the target link address that currently needs to be detected.
202. Send a first HTTP request carrying the link address of the entry page to the website to be analyzed.
Detecting the link address of the entry page is the first detection in the loop. For ease of understanding, and to distinguish it from later detections of link addresses other than the entry page, the HTTP request carrying the link address of the entry page is called the first HTTP request, and HTTP requests sent subsequently are called second HTTP requests.
Correspondingly, the HTTP status code returned for the first HTTP request is called the first HTTP status code, and the HTTP status code returned for a second HTTP request is called the second HTTP status code.
203. When the first HTTP status code returned by the website to be analyzed for the first HTTP request indicates that the link address of the entry page is invalid, mark the link address of the entry page as an invalid link address and return to step 201.
Normally the link address of the entry page should be valid. However, if the user makes an input error or happens to enter an invalid link address, the address can be marked and the flow returns to step 201 to obtain the link address of another entry page of the website, for example by prompting the user to re-enter an entry page link address.
It can be understood that, after the link address of the entry page is marked as invalid, it can also be stored in the database or another storage area so that invalid link addresses can later be processed together.
Optionally, before step 203, the first HTTP status code returned by the website to be analyzed for the first HTTP request can be received.
204. When the first HTTP status code indicates that the link address of the entry page is valid, obtain the page data of the entry page returned by the website to be analyzed.
It can be understood that, after the first HTTP request is sent, if the website to be analyzed determines that the link address is valid, it returns the corresponding page data so that the server or computing device can display the entry page, thereby accessing it.
205. Extract the link addresses contained in the entry page from its page data.
The entry page may contain one or more link addresses, all of which can be stored in the database as link addresses to be detected later.
206. Store the extracted link addresses in a designated database as link addresses to be detected.
The link addresses extracted in step 205 and in the subsequent step 212 are all stored in the database for later validity checks.
Optionally, because different pages may contain the same link address, each extracted link address can be compared with the link addresses already in the database before it is stored, and it is stored only if it is not already present.
207. Check whether the database contains any link address that has not yet been detected. If not, end the detection of link addresses in the website to be analyzed; if so, perform step 208.
After the link address of the entry page is obtained, the entry page can be treated as the first layer of pages to be detected. All link addresses contained in the entry page are analysed and used as link addresses to be detected, and each of them is then used in turn as the target link address to be detected. Provided the target link address is valid, the target page to which it points is determined, and the link addresses contained in that target page are in turn used as link addresses to be detected. The pages of the website are thus analysed layer by layer until validity checks have been performed on all link addresses.
In other words, through the link relationships between pages, given just one entry page, the method can address pages automatically across the resulting web of links until every page has been reached.
208. Obtain the target link address that currently needs to be detected from the link addresses stored in the database that have not yet been detected.
It can be understood that as the page data of each page in the website is analysed, the number of link addresses in the database gradually increases, until every link address in the website has been found and stored in the database.
Optionally, to distinguish detected link addresses from undetected ones in the database, a link address can be marked after it has been used as the target link address. For example, if the database currently stores 10 link addresses, the addresses that have not yet been checked can be stored in one document table, and as soon as an address has been checked, it is moved to another document table.
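The database side of steps 206 to 208 can be sketched with Python's built-in `sqlite3` module. This swaps the two document tables described above for a single table with a `checked` flag, and uses `INSERT OR IGNORE` on a primary key as the "compare before storing" check; the table layout and function names are assumptions for illustration.

```python
import sqlite3
from typing import Iterable, Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create the assumed table of link addresses to be detected."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS links ("
        " url TEXT PRIMARY KEY,"                  # one row per distinct address
        " checked INTEGER NOT NULL DEFAULT 0,"    # 0 = not yet detected
        " valid INTEGER)"                         # NULL until detected, then 1 or 0
    )
    return conn


def store_links(conn: sqlite3.Connection, urls: Iterable[str]) -> None:
    """Step 206: addresses already in the table are silently skipped."""
    conn.executemany("INSERT OR IGNORE INTO links (url) VALUES (?)", ((u,) for u in urls))
    conn.commit()


def next_unchecked(conn: sqlite3.Connection) -> Optional[str]:
    """Steps 207/208: the oldest undetected address, or None when all are done."""
    row = conn.execute(
        "SELECT url FROM links WHERE checked = 0 ORDER BY rowid LIMIT 1").fetchone()
    return row[0] if row else None


def mark_checked(conn: sqlite3.Connection, url: str, valid: bool) -> None:
    """Record the detection result (steps 210/211)."""
    conn.execute("UPDATE links SET checked = 1, valid = ? WHERE url = ?", (int(valid), url))
    conn.commit()
```

Ordering by `rowid` keeps insertion order, which preserves the layer-by-layer order in which addresses were discovered.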
209. Send a second HTTP request carrying the target link address to the website to be analyzed.
210. When the second HTTP status code returned by the website to be analyzed for the second HTTP request indicates that the target link address is invalid, mark the target link address as an invalid link address and return to step 207.
After a target link address has been detected, its detection result can be recorded to mark it as either a valid or an invalid link address.
Optionally, before step 210, the second HTTP status code returned by the website to be analyzed for the second HTTP request can be received.
211. When the second HTTP status code indicates that the target link address is valid, obtain the page data returned by the website to be analyzed.
212. Extract, from this page data, the link addresses contained in the target page to which the target link address points, and perform step 206.
For how the link addresses contained in the target page are determined, refer to the description of the preceding embodiment; details are not repeated here.
Optionally, in any of the above embodiments of the present application, when all link addresses to be analyzed have been detected, the detection result for each link address can also be displayed.
Further, the detection results can be presented along multiple dimensions.
For example, after the link addresses contained in the target page of a target link address are extracted, an association between the target link address and the extracted link addresses can be established, indicating that the target page contains these link addresses. When the results are presented, the associations between different link addresses can be shown together with whether each link address is valid.
As another example, the embodiments of the present application can also record the access latency of the entry page and of the target page pointed to by each target link address, and display the access latency of the page pointed to by each link address when presenting the results.
Of course, in practical applications, the detection results can also be presented along a combination of the above dimensions, such as access latency, associations between link addresses and link validity.
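A combined presentation of validity, latency and page-to-link associations might look like the sketch below. The data layout (`results` mapping each address to validity and latency, `children` mapping each page to the links it contains), the text format, and the names `timed_check` and `render_report` are all assumptions; latency is measured around a hypothetical `check` call with `time.perf_counter`.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical check(url) -> (is_valid, link_addresses_on_that_page).
Check = Callable[[str], Tuple[bool, List[str]]]


def timed_check(check: Check, url: str) -> Tuple[bool, List[str], float]:
    """Run one check and measure its access latency in milliseconds."""
    start = time.perf_counter()
    valid, links = check(url)
    latency_ms = (time.perf_counter() - start) * 1000.0
    return valid, links, latency_ms


def render_report(results: Dict[str, Tuple[bool, float]],
                  children: Dict[str, List[str]]) -> str:
    """List each detected page with validity and latency, then the links it contains."""
    labels = {True: "valid", False: "INVALID", None: "not detected"}
    lines = []
    for url, (valid, latency_ms) in results.items():
        lines.append(f"{url}  {labels[valid]}  {latency_ms:.0f} ms")
        for child in children.get(url, []):   # association: page -> links on it
            child_valid: Optional[bool] = results[child][0] if child in results else None
            lines.append(f"    -> {child}  {labels[child_valid]}")
    return "\n".join(lines)
```

For example, a page at `http://s/` that links to an invalid `http://s/a` would show the invalid link indented under the page where it was found, which is usually what a site maintainer needs in order to fix it.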
The apparatus for detecting website links provided by the present application is described below.
Referring to Fig. 3, which shows a schematic structural diagram of an embodiment of the apparatus for detecting website links according to the present application. The apparatus of this embodiment may include:
a link acquiring unit 301, configured to obtain the target link address that currently needs to be detected in a website to be analyzed;
a request sending unit 302, configured to send a page request to the website to be analyzed based on the target link address;
a page analysis unit 303, configured to obtain the page data returned by the website to be analyzed when the response data returned for the page request indicates that the target link address is a valid link address;
a link extraction unit 304, configured to extract, based on the page data, the link addresses contained in the target page to which the target link address points;
a link storage unit 305, configured to store the extracted link addresses as link addresses to be detected for the website to be analyzed.
In this embodiment, after the target link address that currently needs to be detected in the website to be analyzed is obtained, it serves as a starting point for detection: a page request is sent to the website based on it, and if the response data returned for the request shows that the address is valid, the page data returned by the website is obtained as well. The page data reveals which link addresses the target page contains; these are stored as addresses that need detection and are later detected one by one as target link addresses. Analysing layer by layer in this way collects the link addresses contained in every page of the website and checks the validity of each, so that all link addresses on all pages are covered, the links in the website are detected comprehensively, and invalid link addresses can be found more accurately.
Optionally, in one implementation, the link acquiring unit is specifically configured to obtain, from the stored link addresses to be detected, the target link address currently to be detected in the website to be analyzed;
In another implementation, the link acquiring unit is specifically configured to obtain the link address of the portal page of the website to be analyzed, and to take the link address of the portal page as the target link address currently to be detected.
Optionally, the device of this embodiment may further include:
An invalid link marking unit, configured to mark the target link address as an invalid link when the response data returned for the page request indicates that the target link address is an invalid link address.
Optionally, the request sending unit is specifically configured to send an HTTP request to the website to be analyzed based on the target link address;
The device may further include:
A status code acquiring unit, configured to obtain the HTTP status code returned by the website to be analyzed after the page request has been sent to the website to be analyzed based on the target link address;
A validity analysis unit, configured to determine that the target link address is a valid link address when the HTTP status code does not belong to the preset status codes indicating an invalid link.
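A minimal Python sketch of the HTTP request, status code acquisition, and validity analysis described above follows, using only the standard library. The preset set of "invalid" status codes, the function names, the timeout, and the User-Agent string are all hypothetical; the patent leaves the preset codes unspecified. Note that `urllib.request.urlopen` follows redirects by default, so the status code seen here is that of the final response, and that 4xx/5xx responses arrive as an `HTTPError` which still carries the code.

```python
import urllib.error
import urllib.request

# Hypothetical preset of "status codes indicating an invalid link"; the patent does not list them.
DEFAULT_INVALID_CODES = frozenset({404, 410, 500, 502, 503, 504})


def is_valid_status(status_code, invalid_codes=DEFAULT_INVALID_CODES):
    """Validity analysis: a link is valid unless its status code is in the preset invalid set."""
    return status_code not in invalid_codes


def fetch_status_and_body(url, timeout=10.0):
    """Send an HTTP GET for url and return (status_code, body_text).

    status_code is None when no HTTP response is received at all (DNS failure,
    refused connection, timeout); how to classify that case is left to the caller.
    """
    request = urllib.request.Request(url, headers={"User-Agent": "link-check-sketch/0.1"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            raw = response.read()
            charset = response.headers.get_content_charset() or "utf-8"
            try:
                return response.status, raw.decode(charset, errors="replace")
            except LookupError:  # unknown charset name in the Content-Type header
                return response.status, raw.decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError, which still carries the status code.
        return err.code, ""
    except OSError:
        # URLError and socket timeouts are both OSError subclasses (HTTPError is caught above).
        return None, ""
```

Keeping `is_valid_status` separate from the network call makes the preset rule easy to change or test on its own; whether a missing response (`None`) should count as an invalid link is a policy decision the caller has to make.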
Optionally, the device further includes:
A result presentation unit, configured to display the detection result corresponding to each link address once all link addresses to be analyzed have been detected.
Optionally, the device further includes:
A loop trigger unit, configured to, when detection of the target link address ends, return to performing the operation of obtaining, from the stored link addresses to be detected, the target link address currently to be detected in the website to be analyzed, until all link addresses to be analyzed have been detected.
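The loop formed by these units (start from the portal page, check the current target link address, mark it valid or invalid, store the links of valid pages, and repeat until nothing remains to be detected) can be sketched in Python as a breadth-first traversal over a queue. This is an illustrative sketch, not the patented implementation: `crawl_site`, the injected `fetch` and `extract` callables, the preset invalid codes, the `max_links` safety cap, and the rule that links pointing outside the site are checked but not crawled further are all assumptions.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical preset of status codes treated as invalid links (not specified by the patent).
DEFAULT_INVALID_CODES = frozenset({404, 410, 500, 502, 503, 504})


def crawl_site(portal_url, fetch, extract, invalid_codes=DEFAULT_INVALID_CODES, max_links=10_000):
    """Check every reachable link address of a site, breadth first, starting at the portal page.

    fetch(url)          -> (status_code or None, body_text); None means no HTTP response at all
    extract(body, base) -> iterable of absolute link addresses found in the page
    Returns a dict mapping each checked link address to "valid" or "invalid".
    """
    site_host = urlparse(portal_url).netloc
    pending = deque([portal_url])  # stored link addresses still to be detected
    queued = {portal_url}          # every address ever stored, so none is checked twice
    results = {}
    # Loop until the stored addresses are exhausted (or the safety cap is reached).
    while pending and len(results) < max_links:
        target = pending.popleft()  # target link address currently to be detected
        status, body = fetch(target)
        if status is None or status in invalid_codes:
            results[target] = "invalid"  # mark as an invalid link; its page is not expanded
            continue
        results[target] = "valid"
        if urlparse(target).netloc != site_host:
            continue  # assumption: links leaving the site are checked but not crawled
        for link in extract(body, target):
            if link not in queued:
                queued.add(link)
                pending.append(link)
    return results


if __name__ == "__main__":
    # Hypothetical in-memory "website": page bodies are just space-separated URLs.
    site = {
        "http://example.com/": (200, "http://example.com/a http://example.com/missing"),
        "http://example.com/a": (200, "http://example.com/"),
        "http://example.com/missing": (404, ""),
    }
    report = crawl_site(
        "http://example.com/",
        fetch=lambda url: site.get(url, (None, "")),
        extract=lambda body, base: body.split(),
    )
    # Result presentation: one line per checked link address.
    for url, state in sorted(report.items()):
        print(f"{state:8} {url}")
```

Because `fetch` and `extract` are passed in, the same loop can be driven by a real HTTP client and HTML parser or by stubs in tests. The `queued` set plays the role of the stored link addresses and prevents pages that link to each other from being checked repeatedly.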
The embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another. Because the device disclosed in the embodiments corresponds to the disclosed method, its description is relatively brief; for the relevant details, refer to the description of the method.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate this interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of their functions. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. Skilled persons may implement the described functions in different ways for each specific application, but such implementations should not be regarded as going beyond the scope of the present invention.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for detecting website links, characterized by comprising:
Obtaining a target link address currently to be detected in a website to be analyzed;
Sending a page request to the website to be analyzed based on the target link address;
When response data returned for the page request indicates that the target link address is a valid link address, obtaining page data returned by the website to be analyzed;
Extracting, based on the page data, the link addresses contained in the target page pointed to by the target link address;
Storing the extracted link addresses as link addresses to be detected corresponding to the website to be analyzed.
2. The method according to claim 1, characterized in that obtaining the target link address currently to be detected in the website to be analyzed comprises:
Obtaining, from the stored link addresses to be detected, the target link address currently to be detected in the website to be analyzed.
3. The method according to claim 1, characterized in that obtaining the target link address currently to be detected in the website to be analyzed comprises:
Obtaining the link address of the portal page of the website to be analyzed;
Taking the link address of the portal page as the target link address currently to be detected.
4. The method according to claim 2, characterized by further comprising:
When the response data returned for the page request indicates that the target link address is an invalid link address, marking the target link address as an invalid link.
5. The method according to claim 1, 2 or 4, characterized in that sending the page request to the website to be analyzed based on the target link address comprises:
Sending an HTTP request to the website to be analyzed based on the target link address;
And in that, after sending the page request to the website to be analyzed based on the target link address, the method further comprises:
Obtaining the HTTP status code returned by the website to be analyzed;
When the HTTP status code does not belong to the preset status codes indicating an invalid link, determining that the target link address is a valid link address.
6. The method according to claim 4, characterized by further comprising:
When all link addresses to be analyzed have been detected, displaying the detection result corresponding to each link address.
7. The method according to claim 2 or 4, characterized by further comprising:
When detection of the target link address ends, returning to perform the operation of obtaining, from the stored link addresses to be detected, the target link address currently to be detected in the website to be analyzed, until all link addresses to be analyzed have been detected.
8. A device for detecting website links, characterized by comprising:
A link acquiring unit, configured to obtain a target link address currently to be detected in a website to be analyzed;
A request sending unit, configured to send a page request to the website to be analyzed based on the target link address;
A page analysis unit, configured to obtain page data returned by the website to be analyzed when response data returned for the page request indicates that the target link address is a valid link address;
A link extraction unit, configured to extract, based on the page data, the link addresses contained in the target page pointed to by the target link address;
A link storage unit, configured to store the extracted link addresses as link addresses to be detected corresponding to the website to be analyzed.
9. The device according to claim 8, characterized in that the link acquiring unit is specifically configured to: obtain, from the stored link addresses to be detected, the target link address currently to be detected in the website to be analyzed;
Or obtain the link address of the portal page of the website to be analyzed, and take the link address of the portal page as the target link address currently to be detected.
10. The device according to claim 9, characterized by further comprising:
An invalid link marking unit, configured to mark the target link address as an invalid link when the response data returned for the page request indicates that the target link address is an invalid link address.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201610802243.6A | 2016-09-05 | 2016-09-05 | Method for detecting web link and device thereof

Publications (1)

Publication Number | Publication Date
CN106326485A | 2017-01-11

Family ID: 57788061

Country Status (1)

Country | Link
CN (1) | CN106326485A


Citations (6)

* Cited by examiner

Publication Number | Priority Date | Publication Date | Assignee | Title
US20090172154A1 * | 2007-12-31 | 2009-07-02 | International Business Machines Corporation | Method for autonomic detection and repair of broken links in web environments
US20140095964A1 * | 2012-10-01 | 2014-04-03 | Cellco Partnership D/B/A Verizon Wireless | Message links
CN104036053A * | 2014-07-07 | 2014-09-10 | Guangzhou Kingsoft Network Technology Co., Ltd. (广州金山网络科技有限公司) | Invalid link address processing method and device
CN104317938A * | 2014-10-31 | 2015-01-28 | Beijing Gridsum Technology Co., Ltd. (北京国双科技有限公司) | Webpage validation method and device
CN105183919A * | 2015-10-13 | 2015-12-23 | Zhengzhou Xizhi Information Technology Co., Ltd. (郑州悉知信息科技股份有限公司) | Deployment method and device for internal links of website
CN105306462A * | 2015-10-13 | 2016-02-03 | Zhengzhou Xizhi Information Technology Co., Ltd. (郑州悉知信息科技股份有限公司) | Web page link detecting method and device




Legal Events

Code | Title | Description
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
RJ01 | Rejection of invention patent application after publication | Application publication date: 2017-01-11