CN106326485A - Method for detecting web link and device thereof - Google Patents
Method for detecting web link and device thereof
- Publication number
- CN106326485A (application CN201610802243.6A)
- Authority
- CN
- China
- Prior art keywords
- address
- analyzed
- website
- object linking
- page
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
- G06F16/9566—URL specific, e.g. using aliases, detecting broken or misspelled links
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention provides a method and a device for detecting web links. The method includes the following steps: obtaining a target link address that currently needs to be detected in a web site to be analyzed; sending a page request to the web site to be analyzed based on the target link address; when the response data returned for the page request indicates that the target link address is valid, obtaining the page data returned by the web site to be analyzed; extracting, based on the page data, the link addresses contained in the target page to which the target link address points; and storing the extracted link addresses as link addresses to be detected for the web site to be analyzed. The method makes it possible to detect the validity of all the links in the web site more comprehensively.
Description
Technical field
The present invention relates to the field of data collection technology, and in particular to a method and a device for detecting website links.
Background
At present, each page of a website may contain multiple links. Normally, a link should lead to a website page; if a link fails to lead to a website page, it is an invalid link. Invalid links can seriously degrade the user experience and thereby cause a loss of website traffic.
To reduce or avoid the impact of invalid links on the user experience, the links in each web page of a website need to be checked for invalid links. However, a website generally includes a large number of pages, each page includes multiple links, and the pages that are linked to may in turn include a large number of links. Existing link detection approaches can only check the links of a single page and cannot check all directories and pages of a website, so the link status of all links in the website cannot be detected comprehensively.
Summary of the invention
In view of this, embodiments of the present invention provide a method and a device for detecting website links, so that the validity of all links in a website can be detected more comprehensively, which helps reduce the number of invalid links in the website.
To achieve the above object, embodiments of the present invention provide the following technical solutions:
A method for detecting website links, including:
obtaining a target link address that currently needs to be detected in a website to be analyzed;
sending a page request to the website to be analyzed based on the target link address;
when the response data returned for the page request indicates that the target link address is a valid link address, obtaining the page data returned by the website to be analyzed;
extracting, based on the page data, the link addresses contained in the target page to which the target link address points; and
storing the extracted link addresses as link addresses to be detected corresponding to the website to be analyzed.
Optionally, obtaining the target link address that currently needs to be detected in the website to be analyzed includes:
obtaining, from the stored link addresses to be detected, the target link address that currently needs to be detected in the website to be analyzed.
Optionally, obtaining the target link address that currently needs to be detected in the website to be analyzed includes:
obtaining the link address of a portal page in the website to be analyzed; and
using the link address of the portal page as the target link address that currently needs to be detected.
Optionally, the method further includes:
when the response data returned for the page request indicates that the target link address is an invalid link address, marking the target link address as an invalid link.
Optionally, sending the page request to the website to be analyzed based on the target link address includes:
sending an HTTP request to the website to be analyzed based on the target link address;
and, after the page request is sent to the website to be analyzed based on the target link address, the method further includes:
obtaining the HTTP status code returned by the website to be analyzed; and
when the HTTP status code does not belong to a preset set of status codes that characterize an invalid link, determining that the target link address is a valid link address.
Optionally, the method further includes:
when all link addresses to be detected have been detected, displaying the detection result corresponding to each link address.
Optionally, the method further includes:
when the detection of the target link address is finished, returning to the operation of obtaining, from the stored link addresses to be detected, the target link address that currently needs to be detected in the website to be analyzed, until all link addresses to be detected have been detected.
In another aspect, embodiments of the present application further provide a device for detecting website links, including:
a link obtaining unit, configured to obtain a target link address that currently needs to be detected in a website to be analyzed;
a request sending unit, configured to send a page request to the website to be analyzed based on the target link address;
a page analysis unit, configured to obtain the page data returned by the website to be analyzed when the response data returned for the page request indicates that the target link address is a valid link address;
a link crawling unit, configured to extract, based on the page data, the link addresses contained in the target page to which the target link address points; and
a link storage unit, configured to store the extracted link addresses as link addresses to be detected corresponding to the website to be analyzed.
Preferably, the link obtaining unit is specifically configured to: obtain, from the stored link addresses to be detected, the target link address that currently needs to be detected in the website to be analyzed;
or, obtain the link address of a portal page in the website to be analyzed, and use the link address of the portal page as the target link address that currently needs to be detected.
Preferably, the device further includes:
an invalid link marking unit, configured to mark the target link address as an invalid link when the response data returned for the page request indicates that the target link address is an invalid link address.
Based on the above technical solutions, after the target link address that currently needs to be detected in the website to be analyzed is obtained, the target link address can serve as a detection starting point: a page request is sent to the website to be analyzed based on the target link address, and if the response data returned for the page request indicates that the target link address is a valid link address, the page data returned by the website to be analyzed is then obtained. Based on this page data, the link addresses contained in the target page to which the target link address points can be determined and stored as link addresses that need to be detected, so that each of them can later be taken in turn as the target link address to be detected. Thus, by analyzing the site layer by layer, the link addresses contained in each page of the website can be obtained and the validity of each link address can be checked one by one. This effectively covers the link addresses in all pages of the website, achieves comprehensive detection of the link addresses in the website, and helps detect invalid link addresses in the website more accurately.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only embodiments of the present application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of one embodiment of a method for detecting website links according to the present application;
Fig. 2 is a schematic flowchart of another embodiment of a method for detecting website links according to the present application;
Fig. 3 is a schematic structural diagram of one embodiment of a device for detecting website links according to the present application.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Referring to Fig. 1, which shows a schematic flowchart of one embodiment of a method for detecting website links according to the present application. The method of this embodiment can be applied to a server or another computing device capable of data analysis, and the server or computing device can be connected to the website to be analyzed through a network. The method of this embodiment may include:
101. Obtain the target link address that currently needs to be detected in the website to be analyzed.
The website to be analyzed is the website on which link detection needs to be performed in the embodiments of the present application. In this embodiment, the link addresses contained in each page of the website need to be detected to determine whether each link address is an invalid link address.
It can be understood that, in the embodiments of the present application, a link address may be the address of a page; it may also refer to a link relationship pointing from one web page to a page, where the page pointed to may be another web page, a different position within the same web page, or a resource such as a picture, an e-mail address, or a file. For ease of distinction, the link address on which link detection currently needs to be performed is called the target link address.
102. Based on the target link address, send a page request to the website to be analyzed.
The page request is used to request the page data pointed to by the target link address, for example, to request access to the page pointed to by the target link address.
The page request can be sent to the website to be analyzed in various ways. For example, an access request can be generated from the target link address; this access request may be a Hypertext Transfer Protocol (HTTP) request. The server then sends the access request to the website to be analyzed; the website to be analyzed responds to the access request, determines whether the target link address carried in the access request can be connected to a page, and returns response data for the access request.
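As an illustration of generating an access request from a target link address, the following minimal Python sketch builds an HTTP GET request with the standard library without sending it. The function name, the example URL, and the User-Agent value are assumptions made for illustration and do not come from the source.

```python
import urllib.request


def build_page_request(target_url: str) -> urllib.request.Request:
    """Build (but do not send) an HTTP GET request for the target link address."""
    # The User-Agent string is a hypothetical placeholder.
    return urllib.request.Request(
        target_url,
        method="GET",
        headers={"User-Agent": "link-checker-sketch/0.1"},
    )


# Hypothetical target link address used only for this example.
request = build_page_request("http://example.com/index.html")
```

Sending the request would be done with `urllib.request.urlopen(request)`; note that `urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, so a real checker would read the status code from that exception as well.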
103. When the response data returned for the page request indicates that the target link address is a valid link address, obtain the page data returned by the website to be analyzed.
It can be understood that, after the server sends the page request to the website to be analyzed, the website to be analyzed checks whether the target link address is associated with a page. If the target link address is not associated with any page, such as a picture or a web page, the target link address is an invalid link address. The response data that a website returns for a valid link address differs from the response data it returns for an invalid link address, so whether the target link address is a valid link address can be determined from the response data.
Optionally, when the page request sent to the website to be analyzed is an HTTP request, the response data returned by the website to be analyzed may be an HTTP status code, and the HTTP status code can reflect whether the target link address is a valid link address. Specifically, status codes that characterize an invalid link can be preset. If the HTTP status code returned by the website to be analyzed does not belong to the preset status codes that characterize an invalid link, the target link address can be determined to be a valid link address; conversely, if the HTTP status code returned by the website to be analyzed belongs to the preset status codes that characterize an invalid link, the target link address can be determined to be an invalid link address. For example, an HTTP status code of 404 indicates that the target link address carried in the HTTP request is an invalid link address, and an HTTP status code of 200 indicates that the target link address carried in the HTTP request is a valid link address.
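The status-code rule described above can be sketched as a membership test against a preset collection. This is only a sketch: the source names 404 as an invalid-link code and 200 as a valid one, so the preset below contains just 404, and the names `INVALID_STATUS_CODES` and `is_valid_link` are hypothetical.

```python
# Preset status codes that characterize an invalid link.
# Assumption: only 404 is listed, because it is the only one the text names.
INVALID_STATUS_CODES = frozenset({404})


def is_valid_link(status_code: int, invalid_codes=INVALID_STATUS_CODES) -> bool:
    """A link is treated as valid when its status code is NOT in the preset."""
    return status_code not in invalid_codes


ok = is_valid_link(200)
missing = is_valid_link(404)
server_error = is_valid_link(500)
```

Because the rule is "not in the preset", any code that is not listed (for example 500) counts as valid here; a deployment that wants such responses flagged would add them to the preset.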
When the target link address is a valid link address, the website to be analyzed returns the page data of the page pointed to by the target link address, so that the server or computing device can display the corresponding page content based on this page data.
104. Based on the page data, extract the link addresses contained in the target page to which the target link address points.
For ease of distinction, the embodiments of the present application call the page pointed to by the target link address the target page.
By analyzing the page data, the link addresses contained in the target page pointed to by the target link address can be extracted. For example, the source code of the target page can be obtained from the page data, and the link addresses contained in the target page can then be extracted from the source code.
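One possible way to extract link addresses from page source code is sketched below with Python's standard `html.parser` and `urllib.parse.urljoin`. The source does not specify a parsing method, so the choice of the `href` and `src` attributes, the class name `LinkExtractor`, and the example markup are all assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect href/src attribute values and resolve them against the page address."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Assumption: href covers page links, src covers pictures and files.
            if name in ("href", "src") and value:
                self.links.append(urljoin(self.base_url, value))


def extract_links(page_source: str, base_url: str) -> list:
    parser = LinkExtractor(base_url)
    parser.feed(page_source)
    parser.close()
    return parser.links


# Hypothetical page source and address used only for this example.
links = extract_links(
    '<a href="/about.html">About</a><img src="logo.png">',
    "http://example.com/dir/index.html",
)
```

Resolving relative references against the address of the target page matters because the stored link addresses are later requested on their own, without the page they came from.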
105. Store the extracted link addresses as link addresses to be detected corresponding to the website to be analyzed.
After the link addresses contained in the target page are extracted, the embodiments of the present application store them as link addresses to be detected, so that each link address to be detected can be checked according to the flow of the embodiments of the present application, and the target page pointed to by each link address to be detected, together with the link addresses contained in that target page, can be determined. In this way the link addresses contained in the pages at every level are analyzed layer by layer and detected, so that the link addresses in the website are obtained comprehensively and their validity is detected comprehensively.
The extracted link addresses can be stored in a designated storage area of the server or computing device, or in a designated database; no limitation is imposed here.
It should be noted that the scheme of the embodiments of the present application is a process that is executed repeatedly, that is, steps 101 to 105 are performed round after round; the embodiment of Fig. 1 is described using the detection of only one link address as an example. Different rounds of detection may be directed at different link addresses.
Optionally, when link detection is performed for the first time (in other words, in the first round of performing the steps of Fig. 1), the link address of a portal page in the website to be analyzed can be obtained as the target link address. The portal page can be preset; for example, the link address of the portal page may be the address of the website's home page, or the link address of any page in the website. Of course, the link address of the portal page can also be entered by a user in real time; for example, before step 101, an address input interface for the portal page can be presented, and the user can enter the address of the portal page in this input interface.
Apart from the first link detection, each subsequent round needs to determine, from the stored link addresses to be detected, a target link address that has not yet been detected and is to be detected in this round.
Of course, each time a target link address is determined, steps 102 to 105 need to be performed. When step 105 has been completed, the validity detection of the current target link address can be regarded as finished; in that case, the flow returns to the operation of obtaining, from the stored link addresses to be detected, the target link address that currently needs to be detected in the website to be analyzed, until validity detection has been performed on all of the stored link addresses to be detected.
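The repeated execution of steps 101 to 105 can be sketched as a work-queue loop. This is a minimal sketch under stated assumptions, not the patented implementation: `fetch` and `extract_links` are hypothetical callables supplied by the caller, 404 stands in for the preset invalid status codes, and the in-memory `FAKE_SITE` replaces a real website so the example runs without a network.

```python
from collections import deque


def check_site(portal_url, fetch, extract_links):
    """fetch(url) -> (status_code, page_data); extract_links(page_data, url) -> list of URLs."""
    pending = deque([portal_url])  # stored link addresses to be detected
    seen = {portal_url}            # avoids storing the same address twice
    results = {}
    while pending:
        target = pending.popleft()                      # step 101
        status, page_data = fetch(target)               # step 102
        if status == 404:                               # assumed invalid-link code
            results[target] = "invalid"
            continue
        results[target] = "valid"                       # step 103
        for link in extract_links(page_data, target):   # step 104
            if link not in seen:                        # step 105
                seen.add(link)
                pending.append(link)
    return results


# Hypothetical stand-in for a website: URL -> (status code, links on that page).
FAKE_SITE = {
    "/": (200, ["/a", "/b"]),
    "/a": (200, ["/", "/missing"]),
    "/b": (200, []),
}

results = check_site(
    "/",
    fetch=lambda url: FAKE_SITE.get(url, (404, [])),
    extract_links=lambda page_data, url: page_data,
)
```

The loop ends when the queue of stored link addresses is empty, which corresponds to the condition that all stored link addresses to be detected have undergone validity detection.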
It can be understood that the address of the portal page of the website to be analyzed is generally a valid link address, but the link addresses found by layer-by-layer analysis starting from the portal page may include invalid link addresses. If the response data returned for a page request indicates that the target link address is an invalid link address, the target link address can be marked as an invalid link, so that the invalid link can later be deleted, or statistical analysis and similar processing can be performed on the valid or invalid links in the website.
Referring to Fig. 2, which shows a schematic flowchart of another embodiment of a method for detecting website links according to the present application. The method of this embodiment can be applied to a server, and the method may include:
201. Obtain the link address of a portal page in the website to be analyzed.
For example, a user input interface is presented, and the link address of the portal page in the website to be analyzed, as entered by the user, is obtained through this interface.
The link address of the portal page may also be called the address of the portal page or the page address of the portal page. When link detection is performed for the first time, the link address of the portal page is the target link address that currently needs to be detected.
202. Send a first HTTP request to the website to be analyzed, where the first HTTP request carries the link address of the portal page.
The detection of the link address of the portal page is the first detection in the loop. For ease of understanding, and to distinguish it from the subsequent detection of link addresses of non-portal pages, the HTTP request carrying the link address of the portal page is called the first HTTP request, and each HTTP request sent subsequently is called a second HTTP request.
Correspondingly, the HTTP status code returned for the first HTTP request is called the first HTTP status code, and the HTTP status code returned for a second HTTP request is called the second HTTP status code.
203. When the first HTTP status code returned by the website to be analyzed for the first HTTP request indicates that the link address of the portal page is an invalid link address, mark the link address of the portal page as an invalid link address and return to step 201.
Generally, the link address of the portal page should be a valid link address. However, if the user made an input error, or happened to enter an invalid link address, the link address can be marked and the flow returns to step 201 to obtain the link address of another portal page in the website, for example by prompting the user to enter the link address of a portal page again.
It can be understood that, after the link address of the portal page is marked as an invalid link address, the invalid link address can also be stored in a database or another storage area, so that invalid link addresses can later be processed in a unified manner.
Optionally, before step 203, the first HTTP status code returned by the website to be analyzed for the first HTTP request can be received.
204. When the first HTTP status code indicates that the link address of the portal page is a valid link address, obtain the page data of the portal page returned by the website to be analyzed.
It can be understood that, after the first HTTP request is sent to the website to be analyzed, the website to be analyzed returns the corresponding page data when it determines that the link address is a valid link address, so that the server or computing device can present the portal page and thus access it.
205. Extract, from the page data of the portal page, the link addresses contained in the portal page.
The portal page may contain one or more link addresses, and all of them can be stored in the database as link addresses that need to be detected later.
206. Store the extracted link addresses in a designated database as link addresses to be detected.
The link addresses extracted in step 205 and in the subsequent step 212 can all be stored in the database for later validity detection.
Optionally, considering that different pages may contain the same link address, before an extracted link address is stored in the database, it can be compared with the link addresses already in the database; the link address is stored in the database only if it does not already exist there.
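The duplicate check before storage can be sketched with an in-memory SQLite database. Note the swapped-in technique: instead of comparing each address explicitly, this sketch relies on a primary-key constraint together with `INSERT OR IGNORE`, which has the same effect of storing an address only when it is absent. The table name `links` and the function names are hypothetical.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # The primary key makes each link address unique in the table.
    conn.execute("CREATE TABLE links (url TEXT PRIMARY KEY)")
    return conn


def store_links(conn: sqlite3.Connection, urls) -> None:
    """Store only the link addresses that are not already in the database."""
    conn.executemany(
        "INSERT OR IGNORE INTO links (url) VALUES (?)",
        [(url,) for url in urls],
    )
    conn.commit()


conn = open_store()
store_links(conn, ["/a", "/b"])
store_links(conn, ["/b", "/c"])  # "/b" is already stored and is skipped
stored = [row[0] for row in conn.execute("SELECT url FROM links ORDER BY url")]
```

Without this check, two pages that link to each other would keep adding each other's address, and the set of link addresses to be detected would never be exhausted.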
207. Check whether the database contains any link address that has not yet been detected; if not, end the detection of link addresses in the website to be analyzed; if so, perform step 208.
After the link address of the portal page is obtained, the portal page can be taken as the first layer of pages to be detected. All link addresses contained in the portal page are then analyzed and taken as link addresses to be detected, and each of them is taken in turn as the target link address that currently needs to be detected. On the premise that the target link address is a valid link address, the target page pointed to by the target link address is determined, and the link addresses contained in that target page are in turn taken as link addresses to be detected. In this way the pages of the website are analyzed layer by layer until validity detection has been performed on all link addresses.
That is to say, through the link relationships between pages, a single given portal page is enough to achieve a web-like automated addressing effect that continues until every page has been reached.
208. Obtain, from the link addresses that are stored in the database and have not yet been detected, the target link address that currently needs to be detected.
It can be understood that, as the page data of each page in the website to be analyzed is analyzed, the number of link addresses in the database gradually increases, until every link address in the website to be analyzed has been stored in the database.
Optionally, to distinguish the link addresses in the database that have been detected from those that have not, a link address can be marked after it has been taken as the target link address. For example, suppose the database currently stores 10 link addresses: the link addresses on which validity detection has not yet been performed can be stored in one table, and once validity detection has been performed on a link address, that link address is moved to another table for storage.
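The two-table arrangement described above can be sketched with SQLite as follows. The table names `pending_links` and `checked_links`, the `is_valid` column, and both function names are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pending_links (url TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE checked_links (url TEXT PRIMARY KEY, is_valid INTEGER NOT NULL)"
)


def next_target(conn):
    """Return a link address that has not been detected yet, or None if none remain."""
    row = conn.execute(
        "SELECT url FROM pending_links ORDER BY url LIMIT 1"
    ).fetchone()
    return row[0] if row else None


def mark_checked(conn, url, is_valid):
    """Move a link address from the pending table to the checked table."""
    with conn:  # one transaction: committed on success, rolled back on error
        conn.execute("DELETE FROM pending_links WHERE url = ?", (url,))
        conn.execute(
            "INSERT OR REPLACE INTO checked_links (url, is_valid) VALUES (?, ?)",
            (url, int(is_valid)),
        )


conn.executemany("INSERT INTO pending_links (url) VALUES (?)", [("/a",), ("/b",)])
target = next_target(conn)
mark_checked(conn, target, True)
pending = [row[0] for row in conn.execute("SELECT url FROM pending_links")]
checked = conn.execute("SELECT url, is_valid FROM checked_links").fetchall()
```

When `next_target` returns `None`, no undetected link address remains, which is the end condition checked in step 207.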
209. Send a second HTTP request to the website to be analyzed, where the second HTTP request carries the target link address.
210. When the second HTTP status code returned by the website to be analyzed for the second HTTP request indicates that the target link address is an invalid link address, mark the target link address as an invalid link address and return to step 207.
After a target link address has been detected, its detection result can be recorded, so as to mark the target link address as either a valid link address or an invalid link address.
Optionally, before step 210, the second HTTP status code returned by the website to be analyzed for the second HTTP request can be received.
211. When the second HTTP status code indicates that the target link address is a valid link address, obtain the page data returned by the website to be analyzed.
212. Extract, from the page data, the link addresses contained in the target page to which the target link address points, and perform step 206.
For the manner of determining the link addresses contained in the target page, refer to the related description in the preceding embodiment; details are not repeated here.
Optionally, in any of the above embodiments of the present application, when all link addresses to be detected have been detected, the detection result corresponding to each link address can also be displayed.
Further, the detection results can be presented along multiple dimensions.
For example, after the link addresses contained in the target page pointed to by a target link address are extracted, an association can be established between the target link address and the extracted link addresses, to indicate that the target page pointed to by the target link address contains these extracted link addresses. Correspondingly, when the detection results are presented, the associations between different link addresses can be displayed, together with an indication of whether each of these link addresses is a valid link address.
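A presentation of the associations together with validity can be sketched as below. The data structures (`contains` mapping a link address to the link addresses found on its target page, and `validity` mapping a link address to a boolean) and the textual output format are assumptions; the source does not prescribe how results are shown.

```python
def build_report(contains, validity):
    """Return one line per association: 'parent -> child: state'."""
    labels = {True: "valid", False: "invalid"}
    lines = []
    for parent in sorted(contains):
        for child in contains[parent]:
            # Addresses without a recorded result are shown as unchecked.
            state = labels.get(validity.get(child), "unchecked")
            lines.append(f"{parent} -> {child}: {state}")
    return lines


# Hypothetical detection results used only for this example.
contains = {"/": ["/a", "/missing"], "/a": ["/"]}
validity = {"/": True, "/a": True, "/missing": False}

report = build_report(contains, validity)
```

Keeping the parent address in each line shows which page needs editing when one of its links is invalid.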
As another example, in the embodiments of the present application, the access delay of the portal page and of the target page pointed to by each target link address can also be recorded, and when the results are presented, the access delay corresponding to the page pointed to by each link address is displayed.
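Recording the access delay can be sketched by timing the request with a monotonic clock. The wrapper name `timed_fetch` and the stand-in `fetch` callable are hypothetical; a real checker would pass in the function that actually sends the page request.

```python
import time


def timed_fetch(fetch, url):
    """Call fetch(url) and return (response, elapsed_seconds)."""
    start = time.perf_counter()
    response = fetch(url)
    elapsed = time.perf_counter() - start
    return response, elapsed


# Hypothetical fetch function that returns a fixed response without any network access.
response, delay = timed_fetch(lambda url: (200, "<html></html>"), "/")
```

`time.perf_counter` is used rather than wall-clock time because it is not affected by system clock adjustments during the measurement.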
Of course, in practical applications, the detection results can also be presented by combining the above dimensions, such as the access delay, the associations between link addresses, and the validity of the link addresses.
A device for detecting website links provided by the present application is introduced below.
Referring to Fig. 3, which shows a schematic structural diagram of one embodiment of a device for detecting website links according to the present application. The device of this embodiment may include:
a link obtaining unit 301, configured to obtain a target link address that currently needs to be detected in a website to be analyzed;
a request sending unit 302, configured to send a page request to the website to be analyzed based on the target link address;
a page analysis unit 303, configured to obtain the page data returned by the website to be analyzed when the response data returned for the page request indicates that the target link address is a valid link address;
a link crawling unit 304, configured to extract, based on the page data, the link addresses contained in the target page to which the target link address points; and
a link storage unit 305, configured to store the extracted link addresses as link addresses to be detected corresponding to the website to be analyzed.
In this embodiment, after the target link address that currently needs to be detected in the website to be analyzed is obtained, the target link address can serve as a detection starting point: a page request is sent to the website to be analyzed based on the target link address, and if the response data returned for the page request indicates that the target link address is a valid link address, the page data returned by the website to be analyzed is then obtained. Based on this page data, the link addresses contained in the target page to which the target link address points can be determined and stored as link addresses that need to be detected, so that each of them can later be taken in turn as the target link address to be detected. Thus, by analyzing the site layer by layer, the link addresses contained in each page of the website can be obtained and the validity of each link address can be checked one by one. This effectively covers the link addresses in all pages of the website, achieves comprehensive detection of the link addresses in the website, and helps detect invalid link addresses in the website more accurately.
Optionally, in one aspect, the link obtaining unit is specifically configured to obtain, from the stored link addresses to be detected, the target link address that currently needs to be detected in the website to be analyzed;
in another aspect, the link obtaining unit is specifically configured to obtain the link address of a portal page in the website to be analyzed, and use the link address of the portal page as the target link address that currently needs to be detected.
Optionally, the device of this embodiment may further include:
an invalid link marking unit, configured to mark the target link address as an invalid link when the response data returned for the page request indicates that the target link address is an invalid link address.
Optionally, the request sending unit is specifically configured to send an HTTP request to the website to be analyzed based on the target link address;
The device may further include:
A status code acquiring unit, configured to obtain the HTTP status code returned by the website to be analyzed after the page request is sent to the website to be analyzed based on the target link address;
A validity analysis unit, configured to determine that the target link address is a valid link address when the HTTP status code does not belong to the preset status codes that characterize an invalid link.
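The status code check can be sketched as below. The concrete set `ASSUMED_INVALID_STATUS_CODES` is an assumption chosen for illustration, since this passage only refers to "preset" status codes; the helper names are likewise hypothetical. `fetch_status_code` uses the standard `urllib.request.urlopen`, which follows redirects and raises `HTTPError` for 4xx and 5xx responses.

```python
import urllib.error
import urllib.request

# Assumed preset of status codes that characterize an invalid link.
ASSUMED_INVALID_STATUS_CODES = frozenset({404, 410, 500, 502, 503})


def fetch_status_code(target_link_address, timeout=10.0):
    """Send an HTTP request and return the status code that comes back.

    Connection-level failures (urllib.error.URLError) are not handled
    here and propagate to the caller.
    """
    try:
        with urllib.request.urlopen(target_link_address, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses are raised as exceptions but still carry a code.
        return error.code


def is_valid_link(status_code, invalid_codes=ASSUMED_INVALID_STATUS_CODES):
    """A link is valid when its status code is NOT in the preset invalid set."""
    return status_code not in invalid_codes
```

Testing membership in the invalid set, rather than requiring a specific "success" code, mirrors the wording above: any status code outside the preset is treated as valid.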
Optionally, the device further includes:
A result display unit, configured to display the detection result corresponding to each link address when all link addresses to be analyzed have been detected.
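One possible way to present the detection results is a plain-text report. The function name `format_detection_results`, the mapping from link address to a boolean, and the report layout are assumptions for this sketch; the text only requires that the result for each link address be shown.

```python
def format_detection_results(results):
    """Build a plain-text report from {link_address: is_valid}.

    Invalid links are listed first so that they are easy to spot.
    """
    invalid = sorted(address for address, ok in results.items() if not ok)
    valid = sorted(address for address, ok in results.items() if ok)
    lines = [f"Checked {len(results)} link addresses, {len(invalid)} invalid"]
    lines.extend(f"INVALID  {address}" for address in invalid)
    lines.extend(f"valid    {address}" for address in valid)
    return "\n".join(lines)
```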
Optionally, the device further includes:
A loop trigger unit, configured to, when the detection of the target link address ends, return to the operation of obtaining, from the stored link addresses to be detected, the target link address that currently needs to be detected in the website to be analyzed, until all link addresses to be analyzed have been detected.
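The overall loop can be sketched as a breadth-first traversal. This is an illustration under stated assumptions, not the embodiment's implementation: `detect_site_links`, the injected `fetch` and `extract_links` callables, and the preset `ASSUMED_INVALID_STATUS_CODES` are hypothetical, and the rule that pages on other hosts are checked but not expanded is an added safeguard that the text does not state.

```python
from collections import deque
from urllib.parse import urlsplit

# Assumed preset of status codes that characterize an invalid link.
ASSUMED_INVALID_STATUS_CODES = frozenset({404, 410, 500, 502, 503})


def detect_site_links(entry_page_address, fetch, extract_links,
                      invalid_codes=ASSUMED_INVALID_STATUS_CODES):
    """Detect every link reachable from the entry page.

    fetch(address) must return (status_code, page_data).
    extract_links(page_data, address) must return absolute link addresses.
    Returns {link_address: is_valid}.
    """
    site_host = urlsplit(entry_page_address).netloc
    pending = deque([entry_page_address])  # link addresses to be detected
    seen = {entry_page_address}
    results = {}
    while pending:  # loop until every stored address has been detected
        target = pending.popleft()
        status_code, page_data = fetch(target)
        if status_code in invalid_codes:
            results[target] = False  # mark as invalid link
            continue
        results[target] = True
        if urlsplit(target).netloc != site_host:
            continue  # assumption: external pages are checked, not expanded
        for link in extract_links(page_data, target):
            if link not in seen:
                seen.add(link)
                pending.append(link)
    return results
```

Passing `fetch` and `extract_links` as parameters keeps the loop independent of the network, so it can be exercised against an in-memory stand-in for a site.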
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar, the embodiments may be referred to one another. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for the relevant details, refer to the description of the method.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or in software depends on the particular application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered to go beyond the scope of the present invention.
The steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A method for detecting website links, characterized by comprising:
Obtaining a target link address that currently needs to be detected in a website to be analyzed;
Sending a page request to the website to be analyzed based on the target link address;
When the response data returned for the page request indicates that the target link address is a valid link address, obtaining the page data returned by the website to be analyzed;
Extracting, based on the page data, the link addresses contained in the target page pointed to by the target link address;
Storing the extracted link addresses as link addresses to be detected corresponding to the website to be analyzed.
2. The method according to claim 1, characterized in that obtaining the target link address that currently needs to be detected in the website to be analyzed comprises:
Obtaining, from the stored link addresses to be detected, the target link address that currently needs to be detected in the website to be analyzed.
3. The method according to claim 1, characterized in that obtaining the target link address that currently needs to be detected in the website to be analyzed comprises:
Obtaining the link address of the entry page of the website to be analyzed;
Using the link address of the entry page as the target link address that currently needs to be detected.
4. The method according to claim 2, characterized by further comprising:
When the response data returned for the page request indicates that the target link address is an invalid link address, marking the target link address as an invalid link.
5. The method according to claim 1, 2 or 4, characterized in that sending a page request to the website to be analyzed based on the target link address comprises:
Sending an HTTP request to the website to be analyzed based on the target link address;
After sending the page request to the website to be analyzed based on the target link address, the method further comprises:
Obtaining the HTTP status code returned by the website to be analyzed;
When the HTTP status code does not belong to the preset status codes that characterize an invalid link, determining that the target link address is a valid link address.
6. The method according to claim 4, characterized by further comprising:
When all link addresses to be analyzed have been detected, displaying the detection result corresponding to each link address.
7. The method according to claim 2 or 4, characterized by further comprising:
When the detection of the target link address ends, returning to the operation of obtaining, from the stored link addresses to be detected, the target link address that currently needs to be detected in the website to be analyzed, until all link addresses to be analyzed have been detected.
8. A device for detecting website links, characterized by comprising:
A link acquiring unit, configured to obtain a target link address that currently needs to be detected in a website to be analyzed;
A request sending unit, configured to send a page request to the website to be analyzed based on the target link address;
A page analysis unit, configured to obtain the page data returned by the website to be analyzed when the response data returned for the page request indicates that the target link address is a valid link address;
A link extraction unit, configured to extract, based on the page data, the link addresses contained in the target page pointed to by the target link address;
A link storage unit, configured to store the extracted link addresses as link addresses to be detected corresponding to the website to be analyzed.
9. The device according to claim 8, characterized in that the link acquiring unit is specifically configured to: obtain, from the stored link addresses to be detected, the target link address that currently needs to be detected in the website to be analyzed;
Or, obtain the link address of the entry page of the website to be analyzed, and use the link address of the entry page as the target link address that currently needs to be detected.
10. The device according to claim 9, characterized by further comprising:
An invalid link marking unit, configured to mark the target link address as an invalid link when the response data returned for the page request indicates that the target link address is an invalid link address.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610802243.6A CN106326485A (en) | 2016-09-05 | 2016-09-05 | Method for detecting web link and device thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106326485A (en) | 2017-01-11 |
Family
ID=57788061
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610802243.6A Pending CN106326485A (en) | 2016-09-05 | 2016-09-05 | Method for detecting web link and device thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106326485A (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108304402A (en) * | 2017-01-12 | 2018-07-20 | 广州市动景计算机科技有限公司 | Exterior chain availability monitor method and monitoring device |
CN109408760A (en) * | 2018-09-30 | 2019-03-01 | 东软集团股份有限公司 | The method and apparatus for obtaining the information of necrosis link |
CN110708270A (en) * | 2018-07-10 | 2020-01-17 | 阿里巴巴集团控股有限公司 | Abnormal link detection method and device |
CN110740074A (en) * | 2019-08-22 | 2020-01-31 | 阿里巴巴集团控股有限公司 | Network address detection method and device and electronic equipment |
CN111914531A (en) * | 2020-06-20 | 2020-11-10 | 北京海金格医药科技股份有限公司 | Hyperlink state determination method and device, electronic equipment and readable storage medium |
CN112417240A (en) * | 2020-02-21 | 2021-02-26 | 上海哔哩哔哩科技有限公司 | Website link detection method and device and computer equipment |
CN112416707A (en) * | 2020-11-16 | 2021-02-26 | 北京五八信息技术有限公司 | Link detection method and device |
CN112699280A (en) * | 2020-12-31 | 2021-04-23 | 北京天融信网络安全技术有限公司 | Website monitoring method, website map establishing method and device and electronic equipment |
CN113590987A (en) * | 2021-09-29 | 2021-11-02 | 飞狐信息技术(天津)有限公司 | Link detection method and device |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090172154A1 (en) * | 2007-12-31 | 2009-07-02 | International Business Machines Corporation | Method for autonomic detection and repair of broken links in web environments |
US20140095964A1 (en) * | 2012-10-01 | 2014-04-03 | Cellco Partnership D/B/A Verizon Wireless | Message links |
CN104036053A (en) * | 2014-07-07 | 2014-09-10 | 广州金山网络科技有限公司 | Invalid link address processing method and device |
CN104317938A (en) * | 2014-10-31 | 2015-01-28 | 北京国双科技有限公司 | Webpage validation method and device |
CN105183919A (en) * | 2015-10-13 | 2015-12-23 | 郑州悉知信息科技股份有限公司 | Deployment method and device for internal links of website |
CN105306462A (en) * | 2015-10-13 | 2016-02-03 | 郑州悉知信息科技股份有限公司 | Web page link detecting method and device |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108304402A (en) * | 2017-01-12 | 2018-07-20 | 广州市动景计算机科技有限公司 | Exterior chain availability monitor method and monitoring device |
CN110708270A (en) * | 2018-07-10 | 2020-01-17 | 阿里巴巴集团控股有限公司 | Abnormal link detection method and device |
CN110708270B (en) * | 2018-07-10 | 2022-06-03 | 阿里巴巴集团控股有限公司 | Abnormal link detection method and device |
CN109408760A (en) * | 2018-09-30 | 2019-03-01 | 东软集团股份有限公司 | The method and apparatus for obtaining the information of necrosis link |
CN110740074A (en) * | 2019-08-22 | 2020-01-31 | 阿里巴巴集团控股有限公司 | Network address detection method and device and electronic equipment |
CN110740074B (en) * | 2019-08-22 | 2023-04-18 | 创新先进技术有限公司 | Network address detection method and device and electronic equipment |
CN112417240A (en) * | 2020-02-21 | 2021-02-26 | 上海哔哩哔哩科技有限公司 | Website link detection method and device and computer equipment |
CN111914531A (en) * | 2020-06-20 | 2020-11-10 | 北京海金格医药科技股份有限公司 | Hyperlink state determination method and device, electronic equipment and readable storage medium |
CN112416707A (en) * | 2020-11-16 | 2021-02-26 | 北京五八信息技术有限公司 | Link detection method and device |
CN112416707B (en) * | 2020-11-16 | 2022-02-11 | 北京五八信息技术有限公司 | Link detection method and device |
CN112699280A (en) * | 2020-12-31 | 2021-04-23 | 北京天融信网络安全技术有限公司 | Website monitoring method, website map establishing method and device and electronic equipment |
CN113590987A (en) * | 2021-09-29 | 2021-11-02 | 飞狐信息技术(天津)有限公司 | Link detection method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106326485A (en) | Method for detecting web link and device thereof | |
CN103942497B (en) | Forensics type website vulnerability scanning method and system | |
CN104317938B (en) | Web page interlinkage validation verification method and device | |
CN104980309B (en) | website security detection method and device | |
US9075914B2 (en) | Analytics driven development | |
CN104346462B (en) | Preserve the method, apparatus and browser client of web page element | |
CN104881603B (en) | Webpage redirects leak detection method and device | |
CN106844522A (en) | A kind of network data crawling method and device | |
CN105373478B (en) | Automated testing method and system | |
CN102870118B (en) | Access method, device and system to user behavior | |
CN103678109A (en) | Dump document analysis method, device and system | |
CN106598991A (en) | Web crawler system capable of realizing website interaction and automatic form extraction by conversational mode | |
CN108399124A (en) | Application testing method, device, computer equipment and storage medium | |
CN107340954A (en) | A kind of information extracting method and device | |
CN107729729A (en) | It is a kind of based on random forest slip identifying code automatically by method of testing | |
CN105117340B (en) | URL detection methods and device for iOS browser application quality evaluations | |
CN108874802A (en) | Page detection method and device | |
CN108694325A (en) | The condition discriminating apparatus of the discriminating conduct and specified type website of specified type website | |
CN104317884A (en) | Method and device for acquiring types of source pages of website | |
CN111476446A (en) | Service state monitoring processing method, device, equipment and storage medium | |
CN103929498B (en) | The method and apparatus for handling client request | |
CN107085684A (en) | The detection method and device of performance of program | |
JP7074188B2 (en) | Security coping ability measurement system, method and program | |
Patil et al. | Survey on different phases of digital forensics investigation models | |
CN106126538A (en) | The transformation processing method of the page and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170111 |