CN105975395A - Website state reconnaissance method and device - Google Patents

Website state reconnaissance method and device

Info

Publication number
CN105975395A
Authority
CN
China
Prior art keywords
target web
web
collection target
information
web page
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610370314.XA
Other languages
Chinese (zh)
Inventor
张军
贾西贝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Huaao Data Technology Co Ltd
Original Assignee
Shenzhen Huaao Data Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2016-05-30
Filing date
2016-05-30
Publication date
2016-09-28
Application filed by Shenzhen Huaao Data Technology Co Ltd
Priority to CN201610370314.XA
Publication of CN105975395A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3604 Software analysis for verifying properties of programs
    • G06F11/3612 Software analysis for verifying properties of programs by runtime analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Abstract

The invention relates to the technical field of network information, and in particular to a website state reconnaissance method and device. The website state reconnaissance method includes the following steps: an access request is sent periodically to an acquisition target webpage according to a preset reconnaissance period, and response information returned by the server of the acquisition target webpage is received; the response information is processed; and when the response information indicates that the acquisition target webpage is not accessible, first alarm information is sent, the first alarm information being used to indicate that the acquisition target webpage is not accessible. With this method, the accessibility of the acquisition target webpage is scouted periodically and alarm information is sent when the acquisition target webpage cannot be accessed, which improves the availability of information acquisition programs and the capacity to manage large-scale acquisition.

Description

Website state reconnaissance method and device
Technical field
The present invention relates to the technical field of network information, and in particular to a website state reconnaissance method and device.
Background
A web crawler is a program or script that automatically captures information from the Internet according to certain rules. A web crawler collects webpages from the Internet and gathers information from the collected webpages. More specifically, it obtains web content data from web servers.
When current crawler programs perform large-scale webpage information acquisition with pre-configured parameters, they frequently return abnormal output instead of the expected useful information, and the acquisition is forced to stop. Handling this kind of abnormal output takes a great deal of staff time for troubleshooting, and the crawler program has to be modified before webpage information acquisition can continue.
Summary of the invention
To address unexpected interruptions during webpage information acquisition, the present invention provides a website state reconnaissance method and device that improve the availability of information acquisition programs to a certain extent.
In a first aspect, the website state reconnaissance method proposed by the present invention comprises the following steps: periodically sending an access request to an acquisition target webpage according to a preset reconnaissance period, and receiving response information returned by the server of the acquisition target webpage; processing the response information; and, when the response information indicates that the acquisition target webpage is not accessible, sending first alarm information, the first alarm information being used to indicate that the acquisition target webpage is not accessible. The acquisition target webpage is one of the webpages in a task webpage list; the webpage information corresponding to each acquisition target webpage in the task webpage list is acquired periodically according to a preset acquisition period.
Further, in the website state reconnaissance method proposed by the present invention, processing the response information also includes: when the response information indicates that the acquisition target webpage is accessible, accessing the acquisition target webpage and obtaining webpage structure information of the acquisition target webpage; and, when a change in the webpage structure of the acquisition target webpage is detected according to the webpage structure information, sending second alarm information, the second alarm information being used to indicate that the webpage structure of the acquisition target webpage has changed.
Further, in the website state reconnaissance method proposed by the present invention, detecting, according to the webpage structure information, that the webpage structure of the acquisition target webpage has changed includes any one or more of the following: detecting that the framework information of the acquisition target webpage has changed; detecting that the content information of the acquisition target webpage has changed; detecting that the rendering information of the acquisition target webpage has changed; detecting that the format information of the acquisition target webpage has changed.
Further, in the website state reconnaissance method proposed by the present invention, the preset reconnaissance period is shorter than the preset acquisition period.
Compared with the prior art, the website state reconnaissance method proposed by the present invention periodically scouts the accessibility of the acquisition target webpage and sends alarm information when the webpage is not accessible, so that the accessibility of the webpage is known before the next information acquisition on the target webpage. This reduces cases in which the crawler program is unusable because the webpage is not accessible, reduces invalid acquisition operations, and improves the availability of the information acquisition program and the overall management capability of administrators.
In a second aspect, the website state reconnaissance device proposed by the present invention includes: a request access module, used to periodically send an access request to an acquisition target webpage according to a preset reconnaissance period and to receive response information returned by the server of the acquisition target webpage; a response information processing module, used to process the response information; and a first alarm module, used to send first alarm information when the response information indicates that the acquisition target webpage is not accessible, the first alarm information being used to indicate that the acquisition target webpage is not accessible. The acquisition target webpage is one of the webpages in a task webpage list; the webpage information corresponding to each acquisition target webpage in the task webpage list is acquired periodically according to a preset acquisition period.
Further, the website state reconnaissance device proposed by the present invention also includes: a webpage structure information obtaining module, used to access the acquisition target webpage and obtain webpage structure information of the acquisition target webpage when the response information indicates that the acquisition target webpage is accessible; and a second alarm module, used to send second alarm information when a change in the webpage structure of the acquisition target webpage is detected according to the webpage structure information, the second alarm information being used to indicate that the webpage structure of the acquisition target webpage has changed.
Further, in the website state reconnaissance device proposed by the present invention, detecting, according to the webpage structure information, that the webpage structure of the acquisition target webpage has changed includes any one or more of the following: detecting that the framework information of the acquisition target webpage has changed; detecting that the content information of the acquisition target webpage has changed; detecting that the rendering information of the acquisition target webpage has changed; detecting that the format information of the acquisition target webpage has changed.
Further, in the website state reconnaissance device proposed by the present invention, the preset reconnaissance period is shorter than the preset acquisition period.
Compared with the prior art, the website state reconnaissance device proposed by the present invention periodically scouts the accessibility of the acquisition target webpage and sends alarm information when the webpage is not accessible, so that the accessibility of the webpage is known before the next information acquisition on the target webpage. This reduces cases in which the crawler program is unusable because the webpage is not accessible, reduces invalid acquisition operations, and improves the availability of the information acquisition program and the overall management capability of administrators.
Brief description of the drawings
In order to explain the specific embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed in the description of the specific embodiments or of the prior art are briefly introduced below.
The drawings herein are incorporated into and form part of this specification; they show embodiments consistent with the present invention and, together with the description, serve to explain the principles of the present invention.
Fig. 1 is a schematic flowchart of the website state reconnaissance method of Embodiment 1 of the present invention;
Fig. 2 is a schematic flowchart of the website state reconnaissance method of Embodiment 2 of the present invention;
Fig. 3 is a schematic flowchart of the website state reconnaissance method of Embodiment 3 of the present invention;
Fig. 4 is a schematic structural diagram of the website state reconnaissance device of Embodiment 4 of the present invention;
Fig. 5 is a schematic structural diagram of the website state reconnaissance device of Embodiment 5 of the present invention.
Detailed description of the embodiments
To make the purpose, technical solutions and advantages of the embodiments clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings of the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention rather than all of them.
All other embodiments obtained by those of ordinary skill in the art on the basis of the following embodiments without creative effort fall within the scope of protection of the present invention.
Embodiment 1
As shown in Fig. 1, the website state reconnaissance method of this embodiment includes:
Step S10: periodically sending an access request to an acquisition target webpage according to a preset reconnaissance period, and receiving response information returned by the server of the acquisition target webpage;
Step S20: processing the response information;
Step S30: judging whether the response information indicates that the acquisition target webpage is accessible, and performing step S40 when the acquisition target webpage cannot be accessed;
Step S40: sending first alarm information, the first alarm information being used to indicate that the acquisition target webpage is not accessible. The acquisition target webpage is one of the webpages in a task webpage list.
Preferably, the webpage information corresponding to each acquisition target webpage in the task webpage list is acquired periodically according to a preset acquisition period.
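By way of non-limiting illustration, steps S10 to S40 might be sketched in Python as follows. The function names, the use of an HTTP status code as the response information, and the rule that only 2xx and 3xx statuses count as accessible are assumptions made for this sketch and are not specified by this embodiment.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Iterable, List, Optional


def http_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """S10: send an access request; return the HTTP status, or None if no response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # DNS failure, refused connection, timeout, ...


def is_accessible(status: Optional[int]) -> bool:
    """S20/S30. Assumption: only 2xx and 3xx statuses count as accessible."""
    return status is not None and 200 <= status < 400


def reconnoitre_once(
    task_list: Iterable[str],
    fetch_status: Callable[[str], Optional[int]] = http_status,
    alarm: Callable[[str], None] = print,
) -> List[str]:
    """Probe every page in the task webpage list once; return the inaccessible ones."""
    inaccessible = []
    for url in task_list:
        status = fetch_status(url)
        if not is_accessible(status):
            # S40: first alarm information
            alarm(f"first alarm: {url} is not accessible (status={status})")
            inaccessible.append(url)
    return inaccessible


def reconnoitre_periodically(task_list, period_s, rounds, sleep=time.sleep, **kwargs):
    """Repeat the probe once per reconnaissance period, for a fixed number of rounds."""
    for _ in range(rounds):
        reconnoitre_once(task_list, **kwargs)
        sleep(period_s)
```

The fetch and alarm functions are passed in as parameters so that the probe can be exercised without network access and so that the alarm channel (log, mail, message queue) remains a deployment choice.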
Because Internet websites and webpage structures are inherently unstable, an acquisition program that worked last month often cannot be used this month. Especially when data must be acquired urgently, this puts great pressure on maintenance staff; if many tasks run concurrently, the situation becomes very passive and awkward.
Conventional crawler programs currently perform routine information acquisition only according to pre-configured parameters and pay essentially no attention to the availability of the website or to webpage redesigns. This passive way of working cannot meet the needs of managing large-scale acquisition tasks.
The website state reconnaissance method of this embodiment periodically scouts the accessibility of the acquisition target webpage and sends alarm information when the webpage is not accessible, so that the accessibility of the webpage is known before the next information acquisition on the target webpage. This reduces cases in which the crawler program is unusable because the webpage is not accessible, reduces invalid acquisition operations, and improves the availability of the information acquisition program and the overall management capability of administrators.
Embodiment 2
On the basis of Embodiment 1, this embodiment further describes how the website state reconnaissance method scouts the webpage structure.
As shown in Fig. 2, the website state reconnaissance method of this embodiment also includes:
Step S50: when the response information indicates that the acquisition target webpage is accessible, accessing the acquisition target webpage and obtaining webpage structure information of the acquisition target webpage;
Step S60: when a change in the webpage structure of the acquisition target webpage is detected according to the webpage structure information, sending second alarm information, the second alarm information being used to indicate that the webpage structure of the acquisition target webpage has changed.
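As a rough sketch of steps S50 and S60, the webpage structure information could be reduced to a fingerprint that is stored and compared on each reconnaissance pass. Using the sequence of opening tags as the structure information, and the names `structure_fingerprint` and `check_structure`, are assumptions of this sketch; the embodiment does not prescribe a particular representation.

```python
import hashlib
from html.parser import HTMLParser
from typing import Callable, Dict


class _TagSkeleton(HTMLParser):
    """Collects the sequence of opening tags, ignoring attributes and text."""

    def __init__(self) -> None:
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def structure_fingerprint(html: str) -> str:
    """S50: derive (assumed) structure information from the page source."""
    parser = _TagSkeleton()
    parser.feed(html)
    parser.close()
    return hashlib.sha256("/".join(parser.tags).encode("utf-8")).hexdigest()


def check_structure(
    url: str,
    html: str,
    baseline: Dict[str, str],
    alarm: Callable[[str], None] = print,
) -> bool:
    """S60: compare with the stored fingerprint; alarm and return True on change."""
    current = structure_fingerprint(html)
    previous = baseline.get(url)
    baseline[url] = current  # the latest fingerprint becomes the new baseline
    if previous is not None and previous != current:
        alarm(f"second alarm: webpage structure of {url} has changed")
        return True
    return False
```

The first pass for a URL only records a baseline and raises no alarm; changes to text alone leave this particular fingerprint unchanged.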
A crawler program usually parses information according to an acquisition module customized for the webpage. Therefore, if the webpage structure changes so that it no longer matches the preset acquisition module, the crawler program will return abnormal data.
In the website state reconnaissance method of this embodiment, when the acquisition target webpage is accessible, the acquisition target webpage is accessed, its webpage structure information is obtained, and whether its webpage structure has changed is detected according to the webpage structure information. When the webpage structure has changed, alarm information is sent, so that the change is known before the next information acquisition on the target webpage. This reduces cases in which the crawler program is unusable because the webpage is not accessible, reduces invalid acquisition operations, and improves the availability of the information acquisition program and the overall management capability of administrators.
Embodiment 3
On the basis of Embodiment 2, this embodiment further describes how the website state reconnaissance method scouts changes in the webpage structure.
As shown in Fig. 3, in the website state reconnaissance method of this embodiment, detecting, according to the webpage structure information, that the webpage structure of the acquisition target webpage has changed may include one or more of the following:
Step S61: detecting that the framework information of the acquisition target webpage has changed;
Step S62: detecting that the content information of the acquisition target webpage has changed;
Step S63: detecting that the rendering information of the acquisition target webpage has changed;
Step S64: detecting that the format information of the acquisition target webpage has changed.
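One possible way to organise the checks of steps S61 to S64 is to keep one value per kind of information in a snapshot and to report which of them differ between two snapshots. The dictionary keys below are hypothetical labels for the four kinds of information named in the steps, and how each value is computed is deliberately left open because the embodiment does not define it.

```python
from typing import Dict, List

# Hypothetical keys, one per kind of information checked in steps S61 to S64.
ASPECTS = ("frame", "content", "rendering", "format")


def changed_aspects(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Return the aspects whose recorded value differs between two snapshots."""
    return [aspect for aspect in ASPECTS if old.get(aspect) != new.get(aspect)]
```

An empty result means that none of the four checks fired; any non-empty result would trigger the second alarm information.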
A webpage is a composite document that carries content displayed according to a certain layout. Webpages conventionally follow the requirements of the HTML specification.
HyperText Markup Language (hereinafter HTML) is a language used to describe webpages. HTML is not a programming language but a markup language. A markup language is a set of markup tags, and HTML uses markup tags to describe webpages. For example: the text between <html> and </html> describes the webpage; the text between <body> and </body> is the visible page content; the text between <h1> and </h1> is displayed as a heading; the text between <p> and </p> is displayed as a paragraph.
In HTML, content is usually displayed with h tags, p tags and table tags, display effects are controlled with CSS, and dynamic effects (such as asynchronously loading JSON information) are implemented with JS.
Changes to web content data generally fall into the following three kinds:
a. changes to the core display content, i.e. the visible content that the user cares about;
b. changes to elements that control the display effect, such as CSS, or to their definitions;
c. changes to other non-key elements, such as page meta elements or JS code.
It should be noted that changes to other non-key elements may also include all displayed content that the user does not care about.
Based on business-scenario considerations, a specific implementation usually focuses only on the first kind, i.e. changes to the visible content that the user cares about, which is the core display content.
The detailed process is as follows:
First step: obtain the content of the specified page, discard all HTML elements that the user does not care about, such as CSS and JS, extract the core display content with its basic formatting, and save it.
It should be noted that the core display content is the webpage acquisition information specified in advance by the user, such as an enterprise name or an enterprise registration code; there may be one or more such items.
Second step: apply the same extraction logic as in the first step, then compare the two results to determine whether the specified webpage has changed.
Once the first step has been performed, the subsequent reconnaissance process consists of periodically repeating the second step.
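The two steps above can be sketched with the standard-library HTML parser. Which tags count as "basic formatting" (the `KEEP` set) and which are discarded together with their contents (the `SKIP` set) are assumptions chosen for this sketch, as are the function names.

```python
from html.parser import HTMLParser

# Assumed "basic format" tags that are kept (without attributes).
KEEP = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "table", "tr", "th", "td", "ul", "ol", "li"}
# Elements whose contents are discarded entirely.
SKIP = {"script", "style", "title"}


class _CoreContent(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self._skip_depth += 1
        elif tag in KEEP and not self._skip_depth:
            self.parts.append(f"<{tag}>")  # attributes (class, style, ...) are dropped

    def handle_endtag(self, tag):
        if tag in SKIP:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in KEEP and not self._skip_depth:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        text = " ".join(data.split())  # normalise whitespace
        if text and not self._skip_depth:
            self.parts.append(text)


def core_content(html: str) -> str:
    """First step: extract the core display content with basic formatting."""
    parser = _CoreContent()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def page_changed(saved: str, html: str) -> bool:
    """Second step: re-extract with the same logic and compare with the saved result."""
    return core_content(html) != saved
```

In use, the result of `core_content` from the first pass would be saved per page, and `page_changed` would be called with that saved value on every later reconnaissance pass.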
Preferably, since a target webpage may sometimes be large, some optimization measures can be introduced to reduce the amount of computation needed for the comparison in the second step.
The core display content can be further divided into service metadata and business content. For example, an enterprise information page contains the text "Enterprise name" and "ABC Technology Co., Ltd.", where "Enterprise name" is service metadata and "ABC Technology Co., Ltd." is business content.
Because changes to the specific wording of business content usually do not block the acquisition program, the business content can be left out when the core display content with its basic formatting is extracted. This speeds up the comparison and also reduces the amount of data stored.
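A minimal sketch of this optimization, assuming the page has already been parsed into (label, value) pairs: only the labels, i.e. the service metadata, enter the comparison, and they are reduced to a fixed-size hash so that little data needs to be stored. The function name and the pair representation are hypothetical.

```python
import hashlib
from typing import Iterable, Tuple


def metadata_signature(fields: Iterable[Tuple[str, str]]) -> str:
    """Hash only the labels (service metadata); the values (business content) are ignored."""
    labels = [label.strip() for label, _value in fields]
    # "\x1f" (unit separator) keeps adjacent labels from running together.
    return hashlib.sha256("\x1f".join(labels).encode("utf-8")).hexdigest()
```

With this signature, a page whose enterprise name changes from one company to another compares as unchanged, while a page whose "Enterprise name" label is renamed or removed compares as changed.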
The website state reconnaissance method of this embodiment scouts changes in the webpage structure by detecting the framework information, content information, rendering information or format information of the acquisition target webpage.
Preferably, in the website state reconnaissance methods of Embodiments 1 to 3, the preset reconnaissance period is shorter than the preset acquisition period.
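The relationship between the two periods could be enforced when a task is configured, for example as below. The class name, the field names and the choice of seconds as the unit are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSchedule:
    """Per-page schedule; both periods are in seconds (assumed unit)."""

    url: str
    recon_period_s: float
    acquisition_period_s: float

    def __post_init__(self) -> None:
        if self.recon_period_s <= 0 or self.acquisition_period_s <= 0:
            raise ValueError("periods must be positive")
        if self.recon_period_s >= self.acquisition_period_s:
            raise ValueError("reconnaissance period must be shorter than acquisition period")

    def probes_per_acquisition(self) -> int:
        """Number of whole reconnaissance periods that fit into one acquisition period."""
        return int(self.acquisition_period_s // self.recon_period_s)
```

For instance, an hourly reconnaissance period combined with a daily acquisition period gives 24 probes between two acquisitions, so an inaccessible page is reported well before the next acquisition is due.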
It should be noted that Embodiments 1 to 3 above can be combined and implemented together to provide the combined functions.
Embodiment 4
As shown in Fig. 4, the website state reconnaissance device of this embodiment includes:
a request access module 10, used to periodically send an access request to an acquisition target webpage according to a preset reconnaissance period, and to receive response information returned by the server of the acquisition target webpage;
a response information processing module 20, used to process the response information;
a first alarm module 30, used to send first alarm information when the response information indicates that the acquisition target webpage is not accessible, the first alarm information being used to indicate that the acquisition target webpage is not accessible. The acquisition target webpage is one of the webpages in a task webpage list; the webpage information corresponding to each acquisition target webpage in the task webpage list is acquired periodically according to a preset acquisition period.
The website state reconnaissance device of this embodiment periodically scouts the accessibility of the acquisition target webpage and sends alarm information when the webpage is not accessible, so that the accessibility of the webpage is known before the next information acquisition on the target webpage. This reduces cases in which the crawler program is unusable because the webpage is not accessible, reduces invalid acquisition operations, and improves the availability of the information acquisition program and the overall management capability of administrators.
For the specific implementation and technical effects of the website state reconnaissance device of this embodiment, see Embodiment 1; they are not repeated here.
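Purely as an illustration of how modules 10, 20 and 30 might cooperate in software, each module can be modelled as a pluggable callable. The class name `ReconDevice`, the method `scout` and the use of an optional status code as the response information are assumptions of this sketch and not part of the described device.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ReconDevice:
    request_access: Callable[[str], Optional[int]]     # module 10: returns response information
    process_response: Callable[[Optional[int]], bool]  # module 20: True if the page is accessible
    first_alarm: Callable[[str], None]                 # module 30: sends first alarm information

    def scout(self, url: str) -> bool:
        """Run one reconnaissance pass for a single acquisition target webpage."""
        response = self.request_access(url)
        accessible = self.process_response(response)
        if not accessible:
            self.first_alarm(url)
        return accessible
```

Keeping the three modules separate in this way lets the transport, the accessibility rule and the alarm channel be replaced independently.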
Embodiment 5
On the basis of Embodiment 4, this embodiment further describes how the website state reconnaissance device scouts the webpage structure.
As shown in Fig. 5, the website state reconnaissance device of this embodiment also includes:
a webpage structure information obtaining module 40, used to access the acquisition target webpage and obtain webpage structure information of the acquisition target webpage when the response information indicates that the acquisition target webpage is accessible;
a second alarm module 50, used to send second alarm information when a change in the webpage structure of the acquisition target webpage is detected according to the webpage structure information, the second alarm information being used to indicate that the webpage structure of the acquisition target webpage has changed.
Preferably, in the website state reconnaissance device of this embodiment, detecting, according to the webpage structure information, that the webpage structure of the acquisition target webpage has changed includes any one or more of the following: detecting that the framework information of the acquisition target webpage has changed; detecting that the content information of the acquisition target webpage has changed; detecting that the rendering information of the acquisition target webpage has changed; detecting that the format information of the acquisition target webpage has changed.
Preferably, in the website state reconnaissance devices of Embodiments 4 and 5, the preset reconnaissance period is shorter than the preset acquisition period.
For the specific implementation and technical effects of the website state reconnaissance device of this embodiment, see Embodiments 2 and 3; they are not repeated here.
It should be noted that Embodiments 4 and 5 above can be combined and implemented together to provide the combined functions.
Various embodiments of the present invention have been described above. The description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and changes will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used herein were chosen to best explain the principles of the embodiments, their practical application or their technical improvement over technologies in the market, or to enable other persons of ordinary skill in the art to understand the embodiments disclosed herein.
After considering the specification and practising the disclosure herein, those skilled in the art will readily conceive of other embodiments of the present disclosure. This application is intended to cover any variations, uses or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed herein.
If the described functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a portable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk or an optical disc.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be replaced by equivalents; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention, and they should all be covered by the claims and the specification of the present invention.

Claims (8)

1. A website state reconnaissance method, characterized in that it comprises the following steps:
periodically sending an access request to an acquisition target webpage according to a preset reconnaissance period, and receiving response information returned by the server of the acquisition target webpage;
processing the response information;
and, when the response information indicates that the acquisition target webpage is not accessible, sending first alarm information, the first alarm information being used to indicate that the acquisition target webpage is not accessible;
wherein the acquisition target webpage is one of the webpages in a task webpage list, and the webpage information corresponding to each acquisition target webpage in the task webpage list is acquired periodically according to a preset acquisition period.
2. The website state reconnaissance method according to claim 1, characterized in that processing the response information also includes:
when the response information indicates that the acquisition target webpage is accessible, accessing the acquisition target webpage and obtaining webpage structure information of the acquisition target webpage;
and, when a change in the webpage structure of the acquisition target webpage is detected according to the webpage structure information, sending second alarm information, the second alarm information being used to indicate that the webpage structure of the acquisition target webpage has changed.
3. The website state reconnaissance method according to claim 2, characterized in that detecting, according to the webpage structure information, that the webpage structure of the acquisition target webpage has changed includes one or more of the following:
detecting that the framework information of the acquisition target webpage has changed;
detecting that the content information of the acquisition target webpage has changed;
detecting that the rendering information of the acquisition target webpage has changed;
detecting that the format information of the acquisition target webpage has changed.
4. The website state reconnaissance method according to claim 1, characterized in that the preset reconnaissance period is shorter than the preset acquisition period.
5. a website status ferreting device, it is characterised in that including:
Request access modules, described request access modules is used for according to the scouting cycle set in advance, periodically Ground to gather target web send access request, and receive described collection target web server return should Answer information;
Response message processing module, described response message processing module is used for processing described response message;
First alarm module, described first alarm module is for indicating described collection target at described response message During webpage inaccessible, sending the first warning information, described first warning information is used for indicating described collection mesh Mark webpage inaccessible;
Described collection target web is one in task web page listings;Each in described task web page listings Gather info web corresponding to target web respectively according to collection period set in advance, periodically adopted Collection.
6. The website status reconnaissance device according to claim 5, characterized in that it further comprises:
a web page structure information acquisition module, configured to access the collection target web page when the response information indicates that the collection target web page is accessible, and to obtain the web page structure information of the collection target web page;
a second alarm module, configured to send second alarm information when it is detected, according to the web page structure information, that the web page structure of the collection target web page has changed, the second alarm information being used to indicate that the web page structure of the collection target web page has changed.
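The two modules of this claim can be sketched as a fingerprint of the page markup plus a comparison with the previously stored fingerprint. This is an illustration only: the patent does not specify how structure information is obtained, and hashing the sequence of opening tags is a simplification chosen here. The names `structure_fingerprint` and `check_structure` are hypothetical.

```python
import hashlib
from html.parser import HTMLParser
from typing import Callable, Dict, List


class _TagCollector(HTMLParser):
    """Collects the names of opening tags in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def structure_fingerprint(html: str) -> str:
    """Hash the sequence of opening tags; text content is deliberately ignored."""
    collector = _TagCollector()
    collector.feed(html)
    collector.close()
    return hashlib.sha256("/".join(collector.tags).encode("utf-8")).hexdigest()


def check_structure(
    url: str,
    html: str,
    known: Dict[str, str],
    alarm: Callable[[str], None] = print,
) -> bool:
    """Compare with the stored fingerprint, update it, and alarm on a change."""
    fingerprint = structure_fingerprint(html)
    previous = known.get(url)
    known[url] = fingerprint
    changed = previous is not None and previous != fingerprint
    if changed:
        alarm(f"second alarm: web page structure of {url} has changed")
    return changed
```

Because only tag names are hashed, a change in the visible text alone does not trigger the alarm, while an added or removed element does. The first observation of a page only records its fingerprint.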
7. The website status reconnaissance device according to claim 6, characterized in that detecting, according to the web page structure information, that the web page structure of the collection target web page has changed comprises one or more of the following:
detecting that the frame information of the collection target web page has changed;
detecting that the content information of the collection target web page has changed;
detecting that the spatial cue information of the collection target web page has changed;
detecting that the format information of the collection target web page has changed.
8. The website status reconnaissance device according to claim 5, characterized in that the scouting cycle set in advance is shorter than the collection period set in advance.
CN201610370314.XA 2016-05-30 2016-05-30 Website state reconnaissance method and device Pending CN105975395A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610370314.XA CN105975395A (en) 2016-05-30 2016-05-30 Website state reconnaissance method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610370314.XA CN105975395A (en) 2016-05-30 2016-05-30 Website state reconnaissance method and device

Publications (1)

Publication Number Publication Date
CN105975395A true CN105975395A (en) 2016-09-28

Family

ID=57010528

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610370314.XA Pending CN105975395A (en) 2016-05-30 2016-05-30 Website state reconnaissance method and device

Country Status (1)

Country Link
CN (1) CN105975395A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106777281A (en) * 2016-12-29 2017-05-31 深圳市华傲数据技术有限公司 Data processing method and device for improving web crawler stability and availability
CN109298987A (en) * 2017-07-25 2019-02-01 北京国双科技有限公司 Method and device for detecting web crawler operating status
CN112100083A (en) * 2020-11-13 2020-12-18 北京智慧星光信息技术有限公司 Crawler template change monitoring method and system, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101035128A (en) * 2007-04-18 2007-09-12 大连理工大学 Three-folded webpage text content recognition and filtering method based on the Chinese punctuation
US20100174774A1 (en) * 2009-01-08 2010-07-08 Inernational Business Machines Corporation Method for server-side logging of client browser state through markup language
CN102624713A (en) * 2012-02-29 2012-08-01 深信服网络科技(深圳)有限公司 Website tampering identification method and website tampering identification device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
北京网景盛世技术开发公司: "《酷维(CoolWei)网站诊断与监测平台》", 31 January 2007 *

Similar Documents

Publication Publication Date Title
TWI753887B (en) Front-end user behavior statistics method and device
CN105138599B (en) Method for automatically monitoring the click count of each link on a whole website page
CN107895009B (en) Distributed internet data acquisition method and system
JP6294307B2 (en) Method and system for monitoring and tracking browsing activity on portable devices
CN106294101B (en) The page gets test method and device ready
CN106897347B (en) Webpage display method, operation event recording method and device
CN110020339B (en) Webpage data acquisition method and device based on non-buried point
CN106295382B (en) Information risk prevention and control method and device
KR20110107363A (en) Method for server-side logging of client browser state through markup language
CN107085549B (en) Method and device for generating fault information
CN102663091B (en) WEB application navigation management method and system thereof
CN104765689A (en) Method and device for conducting real-time supervision to interface performance data
CN103440175A (en) Method and device for handling exception of intelligent card
CN105975395A (en) Website state reconnaissance method and device
CN104301175A (en) WEB service system simulation monitoring method based on browser
CN102870118A (en) Access method, device and system to user behavior
CN114064144B (en) Cross-application data acquisition communication plug-in and communication method
CN105144117A (en) Automated correlation and analysis of callstack and context data
CN103929339B (en) A kind of web data acquisition method and system
CN101763432A (en) Method for constructing lightweight webpage dynamic view
CN112989162B (en) Buried point reporting method, device, equipment and storage medium
CN103488675A (en) Automatic precise extraction device for multi-webpage news comment contents
CN103853717A (en) Web crawler
TWI570579B (en) An information retrieving method utilizing webpage visual features and webpage language features and a system using thereof
US9104573B1 (en) Providing relevant diagnostic information using ontology rules

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160928