CN108062362A - Dead chain detection method and device - Google Patents

Dead chain detection method and device

Publication number
CN108062362A
Authority
CN
China
Prior art keywords
chained address
webpage
content
dead chain
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711247919.0A
Other languages
Chinese (zh)
Inventor
常炎隆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Small Mutual Entertainment Technology Co Ltd
Original Assignee
Beijing Small Mutual Entertainment Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Small Mutual Entertainment Technology Co Ltd
Priority to CN201711247919.0A
Publication of CN108062362A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention proposes a dead chain (that is, dead link) detection method and device. The method includes: obtaining a chained address to be detected; obtaining the resource corresponding to the chained address; parsing and rendering the resource to obtain the content and/or state of the webpage corresponding to the chained address; and determining, according to the content and/or state of the webpage, whether the chained address is a dead chain. By simulating a browser to obtain the content and/or state of the webpage corresponding to each chained address and judging from them whether the chained address is a dead chain, the method achieves high accuracy, short detection time, and high efficiency, and can meet the requirements of subsequent data analysis based on the chained addresses.

Description

Dead chain detection method and device
Technical field
The present invention relates to the field of Internet technology, and more particularly to a dead chain detection method and device.
Background
At present, link data captured by web spiders, crawlers, and similar tools is often used for various kinds of data analysis, such as user requirements analysis and network traffic detection. However, the link data captured by such tools contains a large number of dead chains, which makes data analysis inefficient and can easily cause data analysis programs to crash. Current dead chain detection relies mainly on manual screening, which is inefficient, costly, inaccurate, and slow, and therefore cannot meet the needs of subsequent data analysis.
Summary of the invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art.
To this end, a first object of the present invention is to propose a dead chain detection method that solves the problems of low efficiency, poor accuracy, and long detection time of dead chain detection in the prior art.
A second object of the present invention is to propose a dead chain detection device.
A third object of the present invention is to propose another dead chain detection device.
A fourth object of the present invention is to propose a non-transitory computer-readable storage medium.
A fifth object of the present invention is to propose a computer program product.
To achieve the above objects, an embodiment of a first aspect of the present invention proposes a dead chain detection method, including:
Obtaining a chained address to be detected;
Obtaining the resource corresponding to the chained address;
Parsing and rendering the resource to obtain the content and/or state of the webpage corresponding to the chained address;
Determining, according to the content and/or state of the webpage, whether the chained address is a dead chain.
Further, before obtaining the resource corresponding to the chained address, the method further includes:
Obtaining the number of chained addresses;
When the number of chained addresses is at least two, determining the priority of each chained address;
Sorting the chained addresses according to their priorities;
Correspondingly, obtaining the resource corresponding to the chained address includes:
Obtaining, in sorted order, the resource corresponding to each chained address in turn, starting with the highest-ranked.
Further, determining whether the chained address is a dead chain according to the content and/or state of the webpage includes:
Determining whether the content of the webpage contains content that meets a dead chain condition; and/or
Determining whether the state of the webpage meets the dead chain condition;
When the content of the webpage contains content that meets the dead chain condition, and/or the state of the webpage meets the dead chain condition, determining that the chained address corresponding to the webpage is a dead chain.
Further, the content of the webpage includes: text displayed in the webpage, and/or text recognized from pictures or videos in the webpage.
Further, determining the priority of the chained address includes:
Determining the priority of the chained address according to the domain name and/or server name in the chained address.
The dead chain detection method of the embodiment of the present invention obtains a chained address to be detected; obtains the resource corresponding to the chained address; parses and renders the resource to obtain the content and/or state of the webpage corresponding to the chained address; and determines, according to the content and/or state of the webpage, whether the chained address is a dead chain. By simulating a browser to obtain the content and/or state of the webpage corresponding to each chained address and judging from them whether the chained address is a dead chain, the method achieves high accuracy, short detection time, and high efficiency, and can meet the requirements of subsequent data analysis based on the link data.
To achieve the above objects, an embodiment of a second aspect of the present invention proposes a dead chain detection device, including:
An acquisition module, configured to obtain a chained address to be detected;
The acquisition module being further configured to obtain the resource corresponding to the chained address;
A parsing and rendering module, configured to parse and render the resource to obtain the content and/or state of the webpage corresponding to the chained address;
A judgment module, configured to determine, according to the content and/or state of the webpage, whether the chained address is a dead chain.
Further, the device also includes a determining module and a sorting module;
The acquisition module is further configured to obtain the number of chained addresses;
The determining module is configured to determine the priority of each chained address when the number of chained addresses is at least two;
The sorting module is configured to sort the chained addresses according to their priorities;
Correspondingly, the acquisition module is specifically configured to obtain, in sorted order, the resource corresponding to each chained address in turn, starting with the highest-ranked.
Further, the judgment module is specifically configured to:
Determine whether the content of the webpage contains content that meets a dead chain condition; and/or
Determine whether the state of the webpage meets the dead chain condition;
When the content of the webpage contains content that meets the dead chain condition, and/or the state of the webpage meets the dead chain condition, determine that the chained address corresponding to the webpage is a dead chain.
Further, the content of the webpage includes: text displayed in the webpage, and/or text recognized from pictures or videos in the webpage.
Further, the determining module is specifically configured to:
Determine the priority of the chained address according to the domain name and/or server name in the chained address.
The dead chain detection device of the embodiment of the present invention obtains a chained address to be detected; obtains the resource corresponding to the chained address; parses and renders the resource to obtain the content and/or state of the webpage corresponding to the chained address; and determines, according to the content and/or state of the webpage, whether the chained address is a dead chain. By simulating a browser to obtain the content and/or state of the webpage corresponding to each chained address and judging from them whether the chained address is a dead chain, the device achieves high accuracy, short detection time, and high efficiency, and can meet the requirements of subsequent data analysis based on the chained addresses.
To achieve the above objects, an embodiment of a third aspect of the present invention proposes another dead chain detection device, including: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the dead chain detection method described above when executing the program.
To achieve the above objects, an embodiment of a fourth aspect of the present invention proposes a non-transitory computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the dead chain detection method described above.
To achieve the above objects, an embodiment of a fifth aspect of the present invention proposes a computer program product, wherein when instructions in the computer program product are executed by a processor, a dead chain detection method is performed, the method including:
Obtaining a chained address to be detected;
Obtaining the resource corresponding to the chained address;
Parsing and rendering the resource to obtain the content and/or state of the webpage corresponding to the chained address;
Determining, according to the content and/or state of the webpage, whether the chained address is a dead chain.
Additional aspects and advantages of the present invention will be set forth in part in the following description, and in part will become apparent from that description or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a flowchart of a dead chain detection method provided by an embodiment of the present invention;
Fig. 2 is a flowchart of another dead chain detection method provided by an embodiment of the present invention;
Fig. 3 is a structural diagram of a dead chain detection device provided by an embodiment of the present invention;
Fig. 4 is a structural diagram of another dead chain detection device provided by an embodiment of the present invention;
Fig. 5 is a structural diagram of yet another dead chain detection device provided by an embodiment of the present invention.
Detailed description
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present invention and should not be construed as limiting it.
The dead chain detection method and device of the embodiments of the present invention are described below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a dead chain detection method provided by an embodiment of the present invention. As shown in Fig. 1, the dead chain detection method includes the following steps:
S101: Obtain a chained address to be detected.
The dead chain detection method provided by the present invention is executed by a dead chain detection device, which may be a hardware device, such as a server, or software installed on a hardware device. The chained address to be detected may be, for example, a uniform resource locator (Uniform Resource Locator, URL).
In this embodiment, the chained address to be detected may be a chained address in the link data captured by a web spider, crawler, or similar tool. The dead chain detection device may interact with such a tool to obtain the link data it has captured. Alternatively, the dead chain detection device may interact with a data analysis tool to obtain the link data, where the data analysis tool is a tool that performs various kinds of demand analysis based on the link data captured by the web spider, crawler, or similar tool.
S102: Obtain the resource corresponding to the chained address.
In this embodiment, the dead chain detection device may send a request to the corresponding server according to the chained address and obtain the resource corresponding to the chained address.
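As an illustration of step S102, the following is a minimal Python sketch of requesting the resource behind a chained address. The function name `fetch_resource`, the injectable `opener` parameter, and the choice to return a `(status, body)` pair are assumptions made for this example; the patent does not prescribe any particular API.

```python
import urllib.error
import urllib.request


def fetch_resource(url, opener=urllib.request.urlopen, timeout=10.0):
    """Request the resource behind `url` and return (status, body_text).

    `status` is the HTTP status code, or None when no HTTP response
    was received at all. `opener` is injectable (hypothetical design
    choice) so the function can be exercised without network access.
    """
    try:
        with opener(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status, body
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 403.
        return err.code, ""
    except (urllib.error.URLError, TimeoutError, ValueError):
        # DNS failure, refused connection, timeout, or malformed URL.
        return None, ""
```

Returning the status code alongside the body keeps both pieces of information available to the later judgment step, which may look at the state, the content, or both.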
S103: Parse and render the resource to obtain the content and/or state of the webpage corresponding to the chained address.
In this embodiment, the dead chain detection device may simulate a browser, either by following the browser's parsing and rendering method or by integrating the browser's parsing module, to parse and render the resource corresponding to the chained address and obtain the display interface of the corresponding webpage. It then analyzes the displayed webpage to obtain the content and/or state of the webpage. The content of the webpage may include text displayed in the webpage, and/or text recognized from pictures or videos in the webpage.
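The text-extraction part of step S103 can be sketched as follows. This is a simplified stand-in, not the browser simulation the embodiment describes: it only parses static HTML with Python's standard `html.parser` and collects text outside `<script>` and `<style>` elements, whereas a real implementation would need a browser engine to execute scripts and render the page. The names `VisibleTextExtractor` and `page_text` are hypothetical.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text a reader would see, skipping <script> and <style>."""

    _HIDDEN = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._hidden_depth = 0  # > 0 while inside a hidden element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self._HIDDEN:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in self._HIDDEN and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._hidden_depth == 0:
            self.chunks.append(text)


def page_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Text recognized from pictures or videos, which the embodiment also counts as webpage content, is outside the scope of this sketch.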
S104: Determine, according to the content and/or state of the webpage, whether the chained address is a dead chain.
In this embodiment, the dead chain detection device may perform step S104 as follows: determine whether the content of the webpage contains content that meets a dead chain condition; and/or determine whether the state of the webpage meets the dead chain condition; and, when the content of the webpage contains content that meets the dead chain condition, and/or the state of the webpage meets the dead chain condition, determine that the chained address corresponding to the webpage is a dead chain.
The dead chain condition may include at least one piece of text and at least one state. Examples of such text are "video loading failure", "loading is unsuccessful", "webpage is not opened", and "cannot show this webpage". Examples of such states are "404 Not Found" and "403 Forbidden".
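A dead chain condition of this kind can be sketched in Python as a set of phrases and a set of status codes. The phrase list and the names `DEAD_PHRASES`, `DEAD_STATUSES`, and `is_dead_chain` are illustrative assumptions; the phrases are English renderings of the examples given above, and a real deployment would configure its own list.

```python
# Illustrative dead chain condition: phrases and HTTP states (assumed values).
DEAD_PHRASES = (
    "video loading failure",
    "loading is unsuccessful",
    "webpage is not opened",
    "cannot show this webpage",
)
DEAD_STATUSES = {403, 404}


def is_dead_chain(page_text, status):
    """Return True if the page content and/or state meets the condition."""
    if status in DEAD_STATUSES:
        return True
    text = page_text.lower()  # case-insensitive phrase matching
    return any(phrase in text for phrase in DEAD_PHRASES)
```

Checking the content as well as the state is what lets this step flag a page that returns a success status but displays a failure message.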
Further, after step S104, the method may also include: when the chained address is determined to be a dead chain, deleting the chained address, so that subsequent data analysis does not analyze the resource corresponding to that chained address and waste resources.
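The deletion step can be sketched as a simple partition of the chained addresses. The function name `split_dead_chains` and the `is_dead` predicate parameter are hypothetical; the predicate stands for whatever dead chain judgment is in use.

```python
def split_dead_chains(addresses, is_dead):
    """Partition addresses into (kept, deleted) using the is_dead predicate."""
    kept, deleted = [], []
    for address in addresses:
        (deleted if is_dead(address) else kept).append(address)
    return kept, deleted
```

Only the `kept` list would be handed on to subsequent data analysis; returning the deleted addresses as well is a convenience for logging, not something the embodiment requires.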
In this embodiment, judging whether a chained address is a dead chain from the content and/or state of the webpage makes it possible to detect some problems that cannot be found by inspecting only the resource corresponding to the chained address, such as a video that fails to load. Dead chains are therefore detected more accurately, which further improves the accuracy of dead chain detection.
The dead chain detection method of the embodiment of the present invention obtains a chained address to be detected; obtains the resource corresponding to the chained address; parses and renders the resource to obtain the content and/or state of the webpage corresponding to the chained address; and determines, according to the content and/or state of the webpage, whether the chained address is a dead chain. By simulating a browser to obtain the content and/or state of the webpage corresponding to each chained address and judging from them whether the chained address is a dead chain, the method achieves high accuracy, short detection time, and high efficiency, and can meet the requirements of subsequent data analysis based on the chained addresses.
Fig. 2 is a flowchart of another dead chain detection method provided by an embodiment of the present invention. As shown in Fig. 2, on the basis of the embodiment shown in Fig. 1, the method may further include the following steps before step S102:
S105: Obtain the number of chained addresses.
S106: When the number of chained addresses is at least two, determine the priority of each chained address.
In this embodiment, when there are many chained addresses, the dead chain detection method described above may fail to detect in time whether some important chained addresses are dead chains, which reduces the efficiency of subsequent data analysis based on the chained addresses. To ensure that important chained addresses are detected in time, the dead chain detection device may first determine the priority of each chained address, sort the chained addresses by priority, and perform dead chain detection in sorted order. Important chained addresses are thereby detected in time, which improves the efficiency of subsequent data analysis based on the chained addresses.
The dead chain detection device may query a preset priority list according to the domain name and/or server name in the chained address to determine the priority of the chained address. The priority list may store the priority corresponding to a domain name, the priority corresponding to a server name, or the priority corresponding to a combination of domain name and server name.
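A priority list lookup of this kind might look like the following Python sketch. The table contents, the default priority of 0, and the fallback from a full host name to its parent domains are all assumptions made for illustration; the embodiment only states that priorities are looked up by domain name and/or server name.

```python
from urllib.parse import urlsplit

# Hypothetical preset priority list: host or domain name -> priority.
PRIORITY_LIST = {
    "video.example.com": 10,
    "example.com": 5,
}
DEFAULT_PRIORITY = 0


def priority_of(address):
    """Look up the priority of a chained address by its host name.

    Tries the full host first, then each parent domain in turn
    (an assumed matching rule, not taken from the patent).
    """
    host = (urlsplit(address).hostname or "").lower()
    labels = host.split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in PRIORITY_LIST:
            return PRIORITY_LIST[candidate]
    return DEFAULT_PRIORITY
```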
It should also be noted that, in this embodiment, the dead chain detection device may alternatively determine the priorities of the chained addresses, and then sort and detect by priority, when the number of chained addresses exceeds a preset quantity threshold. The preset quantity threshold can be set according to actual needs.
S107: Sort the chained addresses according to their priorities.
Correspondingly, step S102 may specifically be: obtaining, in sorted order, the resource corresponding to each chained address in turn, starting with the highest-ranked. That is, the dead chain detection device may perform dead chain detection on each chained address in turn according to the sorted order. The sorted order may run from the highest priority to the lowest.
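The ordering in steps S105 to S107 can be sketched in Python as below. The function name `order_for_detection` and its parameters are hypothetical. With the default `threshold=1`, sorting happens when there are at least two chained addresses; a larger value models the preset quantity threshold variant.

```python
def order_for_detection(addresses, priority_of, threshold=1):
    """Return addresses in the order they should be checked.

    Sorting by priority (highest first) is applied only when the number
    of addresses exceeds `threshold`; otherwise the input order is kept.
    """
    if len(addresses) <= threshold:
        return list(addresses)
    # sorted() is stable, so equal-priority addresses keep their input order.
    return sorted(addresses, key=priority_of, reverse=True)
```

Detection then simply iterates over the returned list, so the highest-priority chained addresses are fetched and judged first.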
The dead chain detection method of the embodiment of the present invention obtains a chained address to be detected; obtains the number of chained addresses; when the number of chained addresses is at least two, determines the priority of each chained address; sorts the chained addresses according to their priorities; obtains, in sorted order, the resource corresponding to each chained address in turn; parses and renders the resource to obtain the content and/or state of the webpage corresponding to the chained address; and determines, according to the content and/or state of the webpage, whether the chained address is a dead chain. By simulating a browser to obtain the content and/or state of the webpage corresponding to each chained address and judging from them whether the chained address is a dead chain, the method achieves high accuracy, short detection time, and high efficiency, and can meet the requirements of subsequent data analysis based on the chained addresses.
Fig. 3 is a structural diagram of a dead chain detection device provided by an embodiment of the present invention. As shown in Fig. 3, the device includes an acquisition module 31, a parsing and rendering module 32, and a judgment module 33;
The acquisition module 31 is configured to obtain a chained address to be detected;
The acquisition module 31 is further configured to obtain the resource corresponding to the chained address;
The parsing and rendering module 32 is configured to parse and render the resource to obtain the content and/or state of the webpage corresponding to the chained address;
The judgment module 33 is configured to determine, according to the content and/or state of the webpage, whether the chained address is a dead chain.
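The module structure of Fig. 3 can be illustrated, under stated assumptions, as a small Python class that wires three callables together. The class name `DeadChainDetector`, the field names, and the callable signatures are hypothetical; the comments map each field to the module it stands for.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class DeadChainDetector:
    # Stands in for acquisition module 31: address -> (status, resource).
    acquire: Callable[[str], Tuple[Optional[int], str]]
    # Stands in for parsing and rendering module 32: resource -> page content.
    parse_render: Callable[[str], str]
    # Stands in for judgment module 33: (content, status) -> is dead chain.
    judge: Callable[[str, Optional[int]], bool]

    def detect(self, address: str) -> bool:
        """Run acquisition, parsing/rendering, and judgment in sequence."""
        status, resource = self.acquire(address)
        content = self.parse_render(resource)
        return self.judge(content, status)
```

Keeping the modules as separate callables mirrors the device description and lets each one be replaced independently, for example swapping a plain HTML parser for a browser engine.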
The dead chain detection device provided by the present invention may be a hardware device, such as a server, or software installed on a hardware device. The chained address to be detected may be, for example, a uniform resource locator (Uniform Resource Locator, URL).
In this embodiment, the chained address to be detected may be a chained address in the link data captured by a web spider, crawler, or similar tool. The dead chain detection device may interact with such a tool to obtain the link data it has captured. Alternatively, the dead chain detection device may interact with a data analysis tool to obtain the link data, where the data analysis tool is a tool that performs various kinds of demand analysis based on the link data captured by the web spider, crawler, or similar tool.
In this embodiment, the dead chain detection device may simulate a browser, either by following the browser's parsing and rendering method or by integrating the browser's parsing module, to parse and render the resource corresponding to the chained address and obtain the display interface of the corresponding webpage. It then analyzes the displayed webpage to obtain the content and/or state of the webpage. The content of the webpage may include text displayed in the webpage, and/or text recognized from pictures or videos in the webpage.
In this embodiment, the judgment module may specifically be configured to: determine whether the content of the webpage contains content that meets a dead chain condition; and/or determine whether the state of the webpage meets the dead chain condition; and, when the content of the webpage contains content that meets the dead chain condition, and/or the state of the webpage meets the dead chain condition, determine that the chained address corresponding to the webpage is a dead chain.
Further, the dead chain detection device may also be configured to delete the chained address when the chained address is determined to be a dead chain, so that subsequent data analysis does not analyze the resource corresponding to that chained address and waste resources.
In this embodiment, judging whether a chained address is a dead chain from the content and/or state of the webpage makes it possible to detect some problems that cannot be found by inspecting only the resource corresponding to the chained address, such as a video that fails to load. Dead chains are therefore detected more accurately, which further improves the accuracy of dead chain detection.
The dead chain detection device of the embodiment of the present invention obtains a chained address to be detected; obtains the resource corresponding to the chained address; parses and renders the resource to obtain the content and/or state of the webpage corresponding to the chained address; and determines, according to the content and/or state of the webpage, whether the chained address is a dead chain. By simulating a browser to obtain the content and/or state of the webpage corresponding to each chained address and judging from them whether the chained address is a dead chain, the device achieves high accuracy, short detection time, and high efficiency, and can meet the requirements of subsequent data analysis based on the chained addresses.
Further, referring to Fig. 4, on the basis of the embodiment shown in Fig. 3, the device may also include a determining module 34 and a sorting module 35;
The acquisition module 31 is further configured to obtain the number of chained addresses;
The determining module 34 is configured to determine the priority of each chained address when the number of chained addresses is at least two;
The sorting module 35 is configured to sort the chained addresses according to their priorities;
Correspondingly, the acquisition module 31 is specifically configured to obtain, in sorted order, the resource corresponding to each chained address in turn, starting with the highest-ranked.
In this embodiment, when there are many chained addresses, the dead chain detection device described above may fail to detect in time whether some important chained addresses are dead chains, which reduces the efficiency of subsequent data analysis based on the chained addresses. To ensure that important chained addresses are detected in time, the dead chain detection device may first determine the priority of each chained address, sort the chained addresses by priority, and perform dead chain detection in sorted order. Important chained addresses are thereby detected in time, which improves the efficiency of subsequent data analysis based on the chained addresses.
The dead chain detection device may query a preset priority list according to the domain name and/or server name in the chained address to determine the priority of the chained address. The priority list may store the priority corresponding to a domain name, the priority corresponding to a server name, or the priority corresponding to a combination of domain name and server name.
It should also be noted that, in this embodiment, the dead chain detection device may alternatively determine the priorities of the chained addresses, and then sort and detect by priority, when the number of chained addresses exceeds a preset quantity threshold. The preset quantity threshold can be set according to actual needs.
Further, the determining module is specifically configured to determine the priority of the chained address according to the domain name and/or server name in the chained address.
The dead chain detection device of the embodiment of the present invention obtains a chained address to be detected; obtains the number of chained addresses; when the number of chained addresses is at least two, determines the priority of each chained address; sorts the chained addresses according to their priorities; obtains, in sorted order, the resource corresponding to each chained address in turn; parses and renders the resource to obtain the content and/or state of the webpage corresponding to the chained address; and determines, according to the content and/or state of the webpage, whether the chained address is a dead chain. By simulating a browser to obtain the content and/or state of the webpage corresponding to each chained address and judging from them whether the chained address is a dead chain, the device achieves high accuracy, short detection time, and high efficiency, and can meet the requirements of subsequent data analysis based on the chained addresses.
Fig. 5 is a structural diagram of yet another dead chain detection device provided by an embodiment of the present invention. The dead chain detection device includes:
A memory 1001, a processor 1002, and a computer program stored on the memory 1001 and executable on the processor 1002.
The processor 1002 implements the dead chain detection method provided in the above embodiments when executing the program.
Further, the dead chain detection device also includes:
A communication interface 1003, used for communication between the memory 1001 and the processor 1002.
The memory 1001 is used to store the computer program executable on the processor 1002.
The memory 1001 may include high-speed RAM and may also include non-volatile memory, for example at least one magnetic disk storage device.
The processor 1002 is configured to implement the dead chain detection method described in the above embodiments when executing the program.
If the memory 1001, the processor 1002, and the communication interface 1003 are implemented independently, they may be connected to one another by a bus and communicate with one another through it. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the bus is represented in Fig. 5 by a single thick line, which does not mean that there is only one bus or only one type of bus.
Optionally, in a specific implementation, if the memory 1001, the processor 1002, and the communication interface 1003 are integrated on a single chip, they may communicate with one another through internal interfaces.
The processor 1002 may be a central processing unit (Central Processing Unit, CPU), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention.
The present invention also provides a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the dead chain detection method described above.
In this specification, a description that refers to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. Such terms, as used in this specification, do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not conflict with one another, those skilled in the art may combine the different embodiments or examples described in this specification, and the features of those different embodiments or examples.
In addition, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance, or as implicitly indicating the number of the technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature. In the description of the present invention, "multiple" means at least two, for example two or three, unless specifically defined otherwise.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing the steps of a custom logic function or process. The scope of the preferred embodiments of the present invention also includes other implementations in which functions may be performed out of the order shown or discussed, including substantially simultaneously or in the reverse order, depending on the functions involved, as will be understood by those skilled in the art to which the embodiments of the present invention belong.
The logic and/or steps represented in a flowchart or otherwise described herein, which may for example be regarded as an ordered list of executable instructions for implementing logic functions, may be embodied in any computer-readable medium for use by, or in combination with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transport a program for use by, or in combination with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) with one or more wires, a portable computer disk case (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and portable compact disc read-only memory (CDROM). The computer-readable medium may even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or if necessary otherwise processing it in a suitable manner, and then stored in a computer memory.
It should be understood that the parts of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented by any one or a combination of the following technologies, which are well known in the art: a discrete logic circuit with logic gates for implementing logic functions on data signals, an application-specific integrated circuit with suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or part of the steps of the above embodiment methods can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination thereof.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software function module. If the integrated module is implemented as a software function module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like. Although embodiments of the present invention have been shown and described above, it should be understood that the above embodiments are exemplary and are not to be construed as limiting the present invention; those of ordinary skill in the art may change, modify, replace, and vary the above embodiments within the scope of the present invention.

Claims (13)

1. A dead link detection method, characterized by comprising:
Obtaining a link address to be detected;
Obtaining the resource corresponding to the link address;
Parsing and rendering the resource to obtain the content and/or state of the webpage corresponding to the link address;
Determining, according to the content and/or state of the webpage, whether the link address is a dead link.
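The four steps of claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `fetch` callable, the `PageResult` type, the `TextExtractor` class, and the example dead link test are all hypothetical, the page "state" is assumed to be an HTTP status code, and Python's `html.parser` stands in for a real rendering engine (a full renderer would also execute scripts and apply styles).

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Callable, List, Tuple


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the bodies of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


@dataclass
class PageResult:
    content: str  # text shown on the page
    state: int    # assumption: the page state is the HTTP status code


def parse_and_render(status: int, html: str) -> PageResult:
    """Step 3: turn the fetched resource into page content and state."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()  # flush any buffered text
    return PageResult(content=" ".join(extractor.parts), state=status)


def is_dead_link(url: str, fetch: Callable[[str], Tuple[int, str]]) -> bool:
    """Steps 2 to 4 for one link address obtained in step 1."""
    status, html = fetch(url)              # step 2: obtain the resource
    page = parse_and_render(status, html)  # step 3: parse and render
    # Step 4: illustrative dead link test on state and/or content.
    return page.state >= 400 or "page not found" in page.content.lower()
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP client. Checking the rendered text as well as the status is what lets the method catch pages that report success but display an error message.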
2. The method according to claim 1, characterized in that, before obtaining the resource corresponding to the link address, the method further comprises:
Obtaining the number of link addresses;
When the number of link addresses is at least two, determining the priority of each link address;
Sorting the link addresses according to their priorities;
Correspondingly, obtaining the resource corresponding to the link address comprises:
Obtaining, in sorted order, the resources corresponding to the link addresses, starting with the link address that ranks first.
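The ordering step of claim 2 can be sketched as below. The function name `order_links` and the convention that a larger number means a higher priority are assumptions made for illustration; the claim does not fix either.

```python
from typing import Callable, Iterable, List


def order_links(links: Iterable[str],
                priority_of: Callable[[str], int]) -> List[str]:
    """Return the link addresses in the order they should be fetched."""
    links = list(links)
    # With fewer than two link addresses there is nothing to sort.
    if len(links) < 2:
        return links
    # Assumption: a larger value means a higher priority. sorted() is
    # stable, so links with equal priority keep their original order.
    return sorted(links, key=priority_of, reverse=True)
```

The resources are then obtained by iterating over the returned list from the front.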
3. The method according to claim 1, characterized in that determining, according to the content and/or state of the webpage, whether the link address is a dead link comprises:
Determining whether the content of the webpage contains content that meets a dead link condition; and/or
Determining whether the state of the webpage meets the dead link condition;
When the content of the webpage contains content that meets the dead link condition and/or the state of the webpage meets the dead link condition, determining that the link address corresponding to the webpage is a dead link.
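The "and/or" judgment of claim 3 can be sketched as a small predicate. The claim does not say what the dead link condition contains, so the phrases and state names in `DeadLinkCondition` are hypothetical placeholders, as are the type and function names.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class DeadLinkCondition:
    # Hypothetical examples only; a real deployment would configure these.
    phrases: Tuple[str, ...] = ("page not found", "has been deleted")
    states: FrozenSet[str] = frozenset({"load_failed", "timeout", "blank"})


def judge(content: Optional[str],
          state: Optional[str],
          cond: DeadLinkCondition = DeadLinkCondition()) -> bool:
    """Return True if the content and/or the state meets the condition."""
    # Either input may be missing (None), matching the claim's "and/or".
    content_hit = content is not None and any(
        phrase in content.lower() for phrase in cond.phrases
    )
    state_hit = state is not None and state in cond.states
    return content_hit or state_hit
```

Either signal alone is enough to mark the link address as dead, which matches the "and/or" wording of the claim.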
4. The method according to any one of claims 1-3, characterized in that the content of the webpage comprises: text content displayed in the webpage and/or text content recognized from pictures or videos in the webpage.
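One way to assemble the webpage content described in claim 4 is sketched below. The recognizer is passed in as a callable because the claim does not name a recognition technique; `collect_page_text` and `recognize` are hypothetical names, and in practice `recognize` might wrap an OCR engine.

```python
from typing import Callable, Iterable


def collect_page_text(displayed_text: str,
                      media: Iterable[str],
                      recognize: Callable[[str], str]) -> str:
    """Join the displayed text with text recognized from pictures or videos."""
    parts = [displayed_text]
    for item in media:
        text = recognize(item)  # assumed to return "" when nothing is found
        if text:
            parts.append(text)
    return "\n".join(part for part in parts if part)
```

The combined text can then be checked against the dead link condition, so an error message that appears only inside an image is still detected.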
5. The method according to claim 2, characterized in that determining the priority of the link address comprises:
Determining the priority of the link address according to the domain name and/or server name in the link address.
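A priority lookup keyed on the domain name, as in claim 5, might look like the sketch below. The table `DOMAIN_PRIORITY`, its values, and the rule of falling back from a host name to its parent domains are assumptions for illustration; `urlsplit` comes from the Python standard library.

```python
from typing import Mapping
from urllib.parse import urlsplit

# Hypothetical priority table; larger numbers mean higher priority.
DOMAIN_PRIORITY = {"example.com": 10, "example.org": 5}


def link_priority(url: str,
                  table: Mapping[str, int] = DOMAIN_PRIORITY,
                  default: int = 0) -> int:
    """Look up a priority from the host name in the link address."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # Try the full host name first, then each parent domain in turn,
    # so "news.example.com" falls back to "example.com".
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in table:
            return table[candidate]
    return default
```

A function like this can serve as the key when the link addresses are sorted by priority.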
6. A dead link detection device, characterized by comprising:
An acquisition module, configured to obtain a link address to be detected;
The acquisition module being further configured to obtain the resource corresponding to the link address;
A parsing and rendering module, configured to parse and render the resource to obtain the content and/or state of the webpage corresponding to the link address;
A judgment module, configured to determine, according to the content and/or state of the webpage, whether the link address is a dead link.
7. The device according to claim 6, characterized by further comprising: a determining module and a sorting module;
The acquisition module being further configured to obtain the number of link addresses;
The determining module, configured to determine the priority of each link address when the number of link addresses is at least two;
The sorting module, configured to sort the link addresses according to their priorities;
Correspondingly, the acquisition module is specifically configured to obtain, in sorted order, the resources corresponding to the link addresses, starting with the link address that ranks first.
8. The device according to claim 6, characterized in that the judgment module is specifically configured to:
Determine whether the content of the webpage contains content that meets a dead link condition; and/or
Determine whether the state of the webpage meets the dead link condition;
When the content of the webpage contains content that meets the dead link condition and/or the state of the webpage meets the dead link condition, determine that the link address corresponding to the webpage is a dead link.
9. The device according to any one of claims 6-8, characterized in that the content of the webpage comprises: text content displayed in the webpage and/or text content recognized from pictures or videos in the webpage.
10. The device according to claim 7, characterized in that the determining module is specifically configured to:
Determine the priority of the link address according to the domain name and/or server name in the link address.
11. A dead link detection device, characterized by comprising:
A memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the dead link detection method according to any one of claims 1-5.
12. A non-transitory computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the dead link detection method according to any one of claims 1-5.
13. A computer program product, wherein, when instructions in the computer program product are executed by a processor, a dead link detection method is performed, the method comprising:
Obtaining a link address to be detected;
Obtaining the resource corresponding to the link address;
Parsing and rendering the resource to obtain the content and/or state of the webpage corresponding to the link address;
Determining, according to the content and/or state of the webpage, whether the link address is a dead link.
CN201711247919.0A 2017-12-01 2017-12-01 Dead chain detection method and device Pending CN108062362A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711247919.0A CN108062362A (en) 2017-12-01 2017-12-01 Dead chain detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711247919.0A CN108062362A (en) 2017-12-01 2017-12-01 Dead chain detection method and device

Publications (1)

Publication Number Publication Date
CN108062362A true CN108062362A (en) 2018-05-22

Family

ID=62136033

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711247919.0A Pending CN108062362A (en) 2017-12-01 2017-12-01 Dead chain detection method and device

Country Status (1)

Country Link
CN (1) CN108062362A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663062A (en) * 2012-03-30 2012-09-12 奇智软件(北京)有限公司 Method and device for processing invalid links in search result
US9298839B2 (en) * 2012-05-30 2016-03-29 International Business Machines Corporation Resolving a dead shortened uniform resource locator
CN102752154A (en) * 2012-07-29 2012-10-24 西北工业大学 Detecting method of dead link of Web site
CN104317938A (en) * 2014-10-31 2015-01-28 北京国双科技有限公司 Webpage validation method and device
CN104869033A (en) * 2015-04-23 2015-08-26 百度在线网络技术(北京)有限公司 Method and apparatus for determining dead links

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112269666A (en) * 2020-11-10 2021-01-26 北京百度网讯科技有限公司 Applet dead link detection method and device, computing device and medium
CN112269666B (en) * 2020-11-10 2023-07-25 北京百度网讯科技有限公司 Applet dead-link detection method and device, computing device and medium
CN113590987A (en) * 2021-09-29 2021-11-02 飞狐信息技术(天津)有限公司 Link detection method and device

Similar Documents

Publication Publication Date Title
CN108595583A (en) Dynamic chart class page data crawling method, device, terminal and storage medium
CN104881318B (en) A kind of interface call method, device and terminal
CN107908959A (en) Site information detection method, device, electronic equipment and storage medium
CN107678937A (en) Page compatibility detection method, device, server and medium
CN104036003B (en) search result integration method and device
CN104021185B (en) The method and apparatus is identified by the information attribute of data in webpage
CN106354519A (en) Method and device for generating label for user portrait
CN109325161A (en) Public sentiment data grasping means, device, equipment and storage medium
CN106469350A (en) The generation method of inspection task, system server
CN110069827B (en) Layout and wiring method and device for FPGA (field programmable Gate array) online logic analyzer
CN106033450B (en) Advertisement blocking method and device and browser
CN104572923A (en) Method and device for advertisement blocking in dual-core browser
CN107247722A (en) File scanning method and device and intelligent terminal
CN108062362A (en) Dead chain detection method and device
CN108334508A (en) The extracting method and device of webpage information
CN106649221A (en) Method and device for detecting duplicated texts
CN108985289A (en) Messy code detection method and device
CN110069739A (en) The page preloads method and device
CN110619103A (en) Webpage image-text detection method and device and storage medium
CN107766036B (en) Module construction method and device and terminal equipment
CN107977234A (en) Software function bootstrap technique and device
CN103929339B (en) A kind of web data acquisition method and system
CN109992511B (en) Device and method for obtaining code test coverage rate
CN109582883A (en) The determination method and apparatus of column page
CN105278929A (en) Application program audit data processing method, device and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180522
