CN105975395A - Website state reconnaissance method and device - Google Patents
- Publication number
- CN105975395A
- Authority
- CN
- China
- Prior art keywords
- target web
- web
- collection target
- information
- web page
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3604—Software analysis for verifying properties of programs
- G06F11/3612—Software analysis for verifying properties of programs by runtime analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/3006—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Abstract
The invention relates to the technical field of network information, and in particular to a website state reconnaissance method and device. The method includes the following steps: periodically sending, according to a preset reconnaissance period, an access request to a collection target web page, and receiving the response information returned by the server of the collection target web page; processing the response information; and, when the response information indicates that the collection target web page is not accessible, sending first alarm information, which indicates that the collection target web page is not accessible. By periodically reconnoitering whether the collection target web page is accessible and sending alarm information when it cannot be accessed, the method improves the availability of information acquisition programs and the capacity to manage large-scale acquisition.
Description
Technical field
The present invention relates to the field of network information technology, and in particular to a website status reconnaissance method and device.
Background
A web crawler is a program or script that automatically captures information from the Internet according to certain rules.
A web crawler is responsible for collecting web pages from the Internet and extracting information from the collected pages; more specifically, it acquires web content data from web servers.
When current crawlers perform large-scale web information acquisition using preconfigured parameters, they frequently produce abnormal output that lacks the expected useful information, and the acquisition is forced to stop. Each time a crawler produces such abnormal output, considerable manual effort is needed to troubleshoot the fault and modify the crawler before web information acquisition can resume.
Summary of the invention
To address unexpected interruptions in web information acquisition, the present invention provides a website status reconnaissance method and device that improve, to a certain extent, the availability of information acquisition programs.
In a first aspect, the website status reconnaissance method proposed by the present invention comprises the following steps: periodically sending an access request to a collection target web page according to a preset reconnaissance period, and receiving the response information returned by the server of the collection target web page; processing the response information; and, when the response information indicates that the collection target web page is inaccessible, sending first alarm information, which indicates that the collection target web page is inaccessible. The collection target web page is one entry in a task web page list, and the web page information corresponding to each collection target web page in the task web page list is collected periodically according to its own preset collection period.
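The task web page list, with its own collection period for each page alongside a reconnaissance period, can be pictured with the following minimal Python sketch. The `Task` fields, giving each task its own reconnaissance period, the use of seconds as the time unit and the helper names are illustrative assumptions, not part of the disclosure.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """One entry of the task web page list (hypothetical structure, times in seconds)."""
    url: str
    collection_period: float       # interval between regular collections of this page
    reconnaissance_period: float   # interval between status probes of this page
    last_collected: float = 0.0
    last_probed: float = 0.0


def due_for_reconnaissance(tasks, now):
    """URLs whose reconnaissance period has elapsed at time `now`."""
    return [t.url for t in tasks if now - t.last_probed >= t.reconnaissance_period]


def due_for_collection(tasks, now):
    """URLs whose own collection period has elapsed at time `now`."""
    return [t.url for t in tasks if now - t.last_collected >= t.collection_period]


tasks = [
    Task("http://example.com/a", collection_period=86400, reconnaissance_period=3600),
    Task("http://example.com/b", collection_period=43200, reconnaissance_period=7200),
]
print(due_for_reconnaissance(tasks, now=3600))  # ['http://example.com/a']
print(due_for_collection(tasks, now=3600))      # []
```

Keeping the two "due" checks separate reflects that probing and collecting run on independent schedules over the same list.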
Further, in the website status reconnaissance method proposed by the present invention, processing the response information also includes: when the response information indicates that the collection target web page is accessible, accessing the collection target web page and obtaining its web page structure information; and, when a change in the web page structure of the collection target web page is detected from the web page structure information, sending second alarm information, which indicates that the web page structure of the collection target web page has changed.
Further, in the website status reconnaissance method proposed by the present invention, detecting a change in the web page structure of the collection target web page from the web page structure information includes any one or more of the following: detecting that the frame information of the collection target web page has changed; detecting that its content information has changed; detecting that its rendering information has changed; and detecting that its format information has changed.
Further, in the website status reconnaissance method proposed by the present invention, the preset reconnaissance period is shorter than the preset collection period.
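Because probes recur every reconnaissance period, any window one collection period long, [t, t + collection period), contains at least floor(collection period / reconnaissance period) probes, which is at least one whenever the reconnaissance period is the shorter of the two. That is what lets an alarm arrive before the next collection. A minimal check, assuming both periods use the same unit; the function names are hypothetical:

```python
import math


def validate_periods(reconnaissance_period, collection_period):
    """Reject configurations in which probing is not more frequent than collecting."""
    if reconnaissance_period <= 0:
        raise ValueError("reconnaissance period must be positive")
    if reconnaissance_period >= collection_period:
        raise ValueError("reconnaissance period must be shorter than the collection period")


def min_probes_per_collection_window(reconnaissance_period, collection_period):
    """Fewest probes that can fall in any window [t, t + collection_period)."""
    validate_periods(reconnaissance_period, collection_period)
    return math.floor(collection_period / reconnaissance_period)


print(min_probes_per_collection_window(3600, 86400))  # 24
print(min_probes_per_collection_window(5000, 86400))  # 17
```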
Compared with the prior art, the website status reconnaissance method proposed by the present invention periodically reconnoiters whether the collection target web page is accessible and sends alarm information when it is not. The accessibility of the web page is therefore known before the next information collection from that page, which reduces cases in which the crawler becomes unusable because the web page cannot be accessed, reduces invalid collection operations, and improves both the availability of the information acquisition program and the overall management capability of the management staff.
In a second aspect, the website status reconnaissance device proposed by the present invention includes: a request access module, configured to periodically send an access request to a collection target web page according to a preset reconnaissance period and to receive the response information returned by the server of the collection target web page; a response information processing module, configured to process the response information; and a first alarm module, configured to send first alarm information when the response information indicates that the collection target web page is inaccessible, the first alarm information indicating that the collection target web page is inaccessible. The collection target web page is one entry in a task web page list, and the web page information corresponding to each collection target web page in the task web page list is collected periodically according to its own preset collection period.
Further, the website status reconnaissance device proposed by the present invention also includes:
a web page structure information obtaining module, configured to access the collection target web page when the response information indicates that it is accessible, and to obtain the web page structure information of the collection target web page; and a second alarm module, configured to send second alarm information when a change in the web page structure of the collection target web page is detected from the web page structure information, the second alarm information indicating that the web page structure of the collection target web page has changed.
Further, in the website status reconnaissance device proposed by the present invention, detecting a change in the web page structure of the collection target web page from the web page structure information includes any one or more of the following: detecting that the frame information of the collection target web page has changed; detecting that its content information has changed; detecting that its rendering information has changed; and detecting that its format information has changed.
Further, in the website status reconnaissance device proposed by the present invention, the preset reconnaissance period is shorter than the preset collection period.
Compared with the prior art, the website status reconnaissance device proposed by the present invention periodically reconnoiters whether the collection target web page is accessible and sends alarm information when it is not. The accessibility of the web page is therefore known before the next information collection from that page, which reduces cases in which the crawler becomes unusable because the web page cannot be accessed, reduces invalid collection operations, and improves both the availability of the information acquisition program and the overall management capability of the management staff.
Brief description of the drawings
To describe the specific embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in the detailed description or in the description of the prior art are briefly introduced below.
The drawings are incorporated into and constitute a part of this specification; they show embodiments consistent with the present invention and, together with the description, serve to explain its principles.
Fig. 1 is a schematic flowchart of the website status reconnaissance method of Embodiment 1 of the present invention;
Fig. 2 is a schematic flowchart of the website status reconnaissance method of Embodiment 2 of the present invention;
Fig. 3 is a schematic flowchart of the website status reconnaissance method of Embodiment 3 of the present invention;
Fig. 4 is a schematic diagram of the composition of the website status reconnaissance device of Embodiment 4 of the present invention;
Fig. 5 is a schematic diagram of the composition of the website status reconnaissance device of Embodiment 5 of the present invention.
Detailed description of the invention
To make the objectives, technical solutions and advantages of the embodiments clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention.
All other embodiments obtained by persons of ordinary skill in the art based on the following embodiments without creative effort fall within the protection scope of the present invention.
Embodiment 1
As shown in Fig. 1, the website status reconnaissance method of this embodiment includes:
Step S10: periodically send an access request to the collection target web page according to the preset reconnaissance period, and receive the response information returned by the server of the collection target web page;
Step S20: process the response information;
Step S30: determine whether the response information indicates that the collection target web page is accessible and, if the collection target web page cannot be accessed, execute step S40;
Step S40: send first alarm information, which indicates that the collection target web page is inaccessible. The collection target web page is one entry in a task web page list.
Preferably, the web page information corresponding to each collection target web page in the task web page list is collected periodically according to its own preset collection period.
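Steps S10 to S40 can be sketched with Python's standard `urllib.request`. Treating a 2xx/3xx status as accessible and anything else (HTTP error statuses, DNS or connection failures, timeouts) as inaccessible is an assumption, since the embodiment does not specify the criterion; `send_alarm` is a hypothetical hook that defaults to `print`.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout=10.0):
    """S10/S20: send one access request and reduce the response to (accessible, detail)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:  # the server answered with an error status
        return False, f"HTTP {exc.code}"
    except OSError as exc:  # URLError (DNS failure, refused connection) and timeouts
        return False, f"unreachable: {exc}"
    # urlopen follows redirects, so a final 2xx is the usual success case.
    return 200 <= status < 400, f"HTTP {status}"


def reconnoiter(url, period, rounds, send_alarm=print):
    """Repeat S10-S40 every `period` seconds for a fixed number of rounds."""
    for i in range(rounds):
        accessible, detail = probe(url)  # S10 + S20
        if not accessible:               # S30: the page cannot be accessed
            send_alarm(f"[first alarm] {url} is not accessible ({detail})")  # S40
        if i < rounds - 1:
            time.sleep(period)
```

A fixed `rounds` count keeps the sketch testable; a deployment would more likely drive `probe` from a scheduler over the whole task web page list.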
Because Internet sites and web page structures are inherently unstable, a collection program that worked last month often stops working this month. Especially when data must be collected urgently, this puts great pressure on maintenance staff; if there are many concurrent tasks, the situation becomes very passive and difficult to handle.
Conventional crawlers simply perform routine information collection according to preconfigured parameters and pay little attention to the availability of the website or to revisions of the web page. This passive way of working cannot meet the management needs of large-scale acquisition tasks.
By periodically reconnoitering whether the collection target web page is accessible and sending alarm information when it is not, the website status reconnaissance method of this embodiment learns the accessibility of the web page before the next information collection from the target web page. This reduces cases in which the crawler becomes unusable because the web page is inaccessible, reduces invalid collection operations, and improves the availability of the information acquisition program and the overall operating capability of the management staff.
Embodiment 2
Building on Embodiment 1, the website status reconnaissance method of this embodiment further describes reconnaissance of the web page structure.
As shown in Fig. 2, the website status reconnaissance method of this embodiment also includes:
Step S50: when the response information indicates that the collection target web page is accessible, access the collection target web page and obtain its web page structure information;
Step S60: when a change in the web page structure of the collection target web page is detected from the web page structure information, send second alarm information, which indicates that the web page structure of the collection target web page has changed.
A crawler usually analyzes information using a collection template customized for the particular web page. Therefore, if the web page structure changes so that it no longer matches the preset collection template, the crawler will return abnormal data.
When the collection target web page is accessible, the website status reconnaissance method of this embodiment accesses it, obtains its web page structure information, uses that information to detect whether the web page structure has changed, and sends alarm information when it has. The structure change is thus known in advance, before the next information collection from the target web page, which reduces cases in which the crawler becomes unusable, reduces invalid collection operations, and improves the availability of the information acquisition program and the overall management capability of the management staff.
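One way to realize steps S50 and S60 is to reduce the page to a structural fingerprint, here a SHA-256 digest of its sequence of start tags, and compare it with the fingerprint recorded on an earlier round. The fingerprint choice and the names `structure_fingerprint` and `check_structure` are illustrative assumptions; Embodiment 3 lists the aspects the patent itself mentions.

```python
import hashlib
from html.parser import HTMLParser


class _TagCollector(HTMLParser):
    """Records the start tags of a document in order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def structure_fingerprint(html):
    """S50: reduce a page to a digest of its tag sequence (text is ignored)."""
    collector = _TagCollector()
    collector.feed(html)
    collector.close()
    return hashlib.sha256(" ".join(collector.tags).encode("utf-8")).hexdigest()


def check_structure(url, html, baseline, send_alarm=print):
    """S60: compare with the stored fingerprint and send the second alarm on mismatch.

    Returns the current fingerprint so the caller can decide whether to store it.
    """
    current = structure_fingerprint(html)
    if baseline is not None and current != baseline:
        send_alarm(f"[second alarm] web page structure of {url} has changed")
    return current
```

Whether to overwrite the baseline after an alarm is a design choice: keeping the old one repeats the alarm until the crawler is fixed, while replacing it reports each change once.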
Embodiment 3
Building on Embodiment 2, the website status reconnaissance method of this embodiment further describes how changes in the web page structure are reconnoitered.
As shown in Fig. 3, in the website status reconnaissance method of this embodiment, detecting a change in the web page structure of the collection target web page from the web page structure information may include one or more of the following:
Step S61: detecting that the frame information of the collection target web page has changed;
Step S62: detecting that the content information of the collection target web page has changed;
Step S63: detecting that the rendering information of the collection target web page has changed;
Step S64: detecting that the format information of the collection target web page has changed.
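The patent does not define the four kinds of information. The sketch below adopts one hypothetical reading, chosen only for illustration: frame information as the sequence of non-formatting tags, content information as visible text, rendering information as inline `style` attributes, `<style>` bodies and linked stylesheets, and format information as inline formatting tags. Each aspect is extracted with Python's standard `html.parser` and compared separately, so the result says which of S61 to S64 fired.

```python
from html.parser import HTMLParser

# Hypothetical split of tags: inline formatting tags form the "format" aspect.
FORMAT_TAGS = {"b", "i", "u", "em", "strong", "font", "span", "br"}


class _AspectParser(HTMLParser):
    """Collects the four aspects of a page under one illustrative interpretation."""

    def __init__(self):
        super().__init__()
        self.aspects = {"frame": [], "content": [], "rendering": [], "format": []}
        self._inside = None  # "script" or "style" while inside such an element

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in FORMAT_TAGS:
            self.aspects["format"].append(tag)
        else:
            self.aspects["frame"].append(tag)
        if attributes.get("style"):
            self.aspects["rendering"].append(attributes["style"])
        if tag == "link" and attributes.get("rel") == "stylesheet":
            self.aspects["rendering"].append(attributes.get("href") or "")
        if tag in ("script", "style"):
            self._inside = tag

    def handle_endtag(self, tag):
        if tag == self._inside:
            self._inside = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._inside == "style":
            self.aspects["rendering"].append(text)  # stylesheet body
        elif self._inside is None:
            self.aspects["content"].append(text)    # visible text; script bodies are ignored


def changed_aspects(old_html, new_html):
    """Return the aspects (S61 frame, S62 content, S63 rendering, S64 format) that differ."""
    results = []
    for html in (old_html, new_html):
        parser = _AspectParser()
        parser.feed(html)
        parser.close()
        results.append(parser.aspects)
    old, new = results
    return [name for name in ("frame", "content", "rendering", "format") if old[name] != new[name]]
```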
A web page is a composite document that carries content to be displayed according to a certain layout. Common web pages follow the requirements of the HTML specification.
HTML (HyperText Markup Language) is a language for describing web pages. HTML is not a programming language but a markup language, that is, a set of markup tags, and it uses these tags to describe web pages. For example, the text between <html> and </html> describes the web page; the text between <body> and </body> is the visible page content; the text between <h1> and </h1> is displayed as a heading; and the text between <p> and </p> is displayed as a paragraph.
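Python's standard `html.parser` makes the tag pairs above visible: it reports each start tag, end tag and text run, so each piece of text can be attributed to its enclosing tag. The sample page and the `TagTextDumper` class are made up for illustration; void elements such as `<br>` are skipped because they have no closing tag.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TagTextDumper(HTMLParser):
    """Collects (enclosing tag, text) pairs for every non-blank text run."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open elements
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching start tag.
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if data.strip() and self.stack:
            self.pairs.append((self.stack[-1], data.strip()))


page = "<html><body><h1>Heading text</h1><p>Paragraph text</p></body></html>"
dumper = TagTextDumper()
dumper.feed(page)
dumper.close()
print(dumper.pairs)  # [('h1', 'Heading text'), ('p', 'Paragraph text')]
```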
In HTML, content is usually displayed with tags such as h, p and table; CSS is used to control display effects; and JS (JavaScript) is used to implement dynamic effects (for example, asynchronously loading JSON data).
Changes in web content data generally fall into the following three categories:
a. changes in the core display content, that is, the visible part of the content that the user cares about;
b. changes in CSS or other elements or definitions that control display effects;
c. changes in other non-key elements, such as page meta elements or JS code.
It should be noted that changes in other non-key elements may also include changes in any displayed content the user does not care about.
Based on the business scenario, a concrete implementation usually focuses only on the first category, that is, on changes in the visible content the user cares about, namely the core display content.
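A minimal sketch of sorting element-level changes into categories a, b and c, and alarming only on category a as the business scenario above suggests. The tag sets, and treating `style`/`class` attribute edits as display control, are assumptions for illustration; how the changed elements are found is left open.

```python
# Hypothetical tag sets used to sort element-level changes into categories a, b and c.
CORE_DISPLAY_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "table", "tr", "th", "td", "li"}
DISPLAY_CONTROL_TAGS = {"style", "link"}


def variation_category(tag, attribute=None):
    """Classify a change to `tag` (optionally to one of its attributes)."""
    if attribute in ("style", "class") or tag in DISPLAY_CONTROL_TAGS:
        return "b"  # CSS or another element/definition controlling display effects
    if tag in CORE_DISPLAY_TAGS:
        return "a"  # visible core display content the user cares about
    return "c"      # meta elements, JS code and other non-key elements


def needs_alarm(changes):
    """Alarm only when some (tag, attribute) change falls into category a."""
    return any(variation_category(tag, attribute) == "a" for tag, attribute in changes)


print(needs_alarm([("meta", None), ("td", None)]))        # True
print(needs_alarm([("script", None), ("link", "href")]))  # False
```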
The detailed process is as follows:
Step one: obtain the content of the specified page, discard all HTML elements the user does not care about, such as CSS and JS, extract the core display content together with its basic formatting, and save it.
It should be noted that the core display content is web retrieval information specified in advance by the user, such as an enterprise name or an enterprise registration code; there may be one or more such items.
Step two: apply the same extraction logic as in step one, then compare the two results to determine whether the specified web page has changed.
Once step one has been performed, the subsequent reconnaissance process consists of periodically repeating step two.
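Steps one and two can be sketched with Python's standard `html.parser` and `difflib`. Step one keeps the visible text, labelled with its enclosing block element as a stand-in for "basic formatting", and drops `<script>`, `<style>` and `<noscript>`; step two re-runs the same extraction and diffs it against the saved copy. The element sets and function names are assumptions, not the patent's implementation.

```python
import difflib
from html.parser import HTMLParser

HIDDEN_TAGS = {"script", "style", "noscript"}                 # never shown to the user
BLOCK_TAGS = {"h1", "h2", "h3", "h4", "p", "li", "th", "td"}  # kept as "basic format"


class CoreContentExtractor(HTMLParser):
    """Step one: keep visible text, labelled with its enclosing block element."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._hidden_depth = 0
        self._block = "text"

    def handle_starttag(self, tag, attrs):
        if tag in HIDDEN_TAGS:
            self._hidden_depth += 1
        elif tag in BLOCK_TAGS:
            self._block = tag

    def handle_endtag(self, tag):
        if tag in HIDDEN_TAGS and self._hidden_depth:
            self._hidden_depth -= 1
        elif tag == self._block:
            self._block = "text"

    def handle_data(self, data):
        text = " ".join(data.split())
        if text and not self._hidden_depth:
            self.lines.append(f"{self._block}: {text}")


def extract_core_content(html):
    parser = CoreContentExtractor()
    parser.feed(html)
    parser.close()
    return parser.lines


def reconnoiter_content(saved_lines, html):
    """Step two: same extraction, then list only added ('+ ') and removed ('- ') lines."""
    current = extract_core_content(html)
    diff = [d for d in difflib.ndiff(saved_lines, current) if d.startswith(("- ", "+ "))]
    return current, diff
```

Because script and style bodies never reach the saved lines, a JS-only edit produces an empty diff, matching the focus on category a changes.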
Preferably, because a target web page may sometimes be large, some optimization measures can be introduced to reduce the amount of computation for the comparison in step two.
The core display content can be further divided into service metadata and business content. For example, if an enterprise information page contains the words "Enterprise name" and "ABC technology Co., Ltd", then "Enterprise name" is service metadata and "ABC technology Co., Ltd" is business content.
Because changes in the specific wording of the business content generally do not obstruct the collection program, the business content need not be extracted when extracting the core display content with basic formatting. This speeds up the comparison and also reduces the amount of data stored.
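The optimization can be taken one step further by storing only a digest of the service metadata. In the sketch below, label-like elements (`th`, `dt`, headings, `label`) are assumed to carry service metadata and everything else business content, a hypothetical rule that fits the enterprise-information example. Hashing with SHA-256 is likewise an illustrative choice that turns each comparison into a single string comparison and shrinks what has to be stored.

```python
import hashlib

# Hypothetical rule: label-like elements hold service metadata, the rest is business content.
METADATA_TAGS = {"h1", "h2", "h3", "th", "dt", "label"}


def metadata_digest(pairs):
    """Digest of the service-metadata part of (tag, text) pairs extracted from a page."""
    kept = [f"{tag}:{text}" for tag, text in pairs if tag in METADATA_TAGS]
    return hashlib.sha256("\n".join(kept).encode("utf-8")).hexdigest()


before = [("th", "Enterprise name"), ("td", "ABC technology Co., Ltd")]
value_changed = [("th", "Enterprise name"), ("td", "XYZ technology Co., Ltd")]
label_changed = [("th", "Company name"), ("td", "ABC technology Co., Ltd")]
print(metadata_digest(before) == metadata_digest(value_changed))  # True: business content ignored
print(metadata_digest(before) == metadata_digest(label_changed))  # False: metadata changed
```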
The website status reconnaissance method of this embodiment reconnoiters changes in the web page structure by detecting changes in the frame information, content information, rendering information or format information of the collection target web page.
Preferably, in the website status reconnaissance methods of Embodiments 1 to 3, the preset reconnaissance period is shorter than the preset collection period.
It should be noted that Embodiments 1 to 3 above can be implemented in combination to realize the combined functions.
Embodiment 4
As shown in Fig. 4, the website status reconnaissance device of this embodiment includes:
Request access module 10, configured to periodically send an access request to the collection target web page according to the preset reconnaissance period, and to receive the response information returned by the server of the collection target web page;
Response information processing module 20, configured to process the response information;
First alarm module 30, configured to send first alarm information when the response information indicates that the collection target web page is inaccessible, the first alarm information indicating that the collection target web page is inaccessible. The collection target web page is one entry in a task web page list, and the web page information corresponding to each collection target web page in the task web page list is collected periodically according to its own preset collection period.
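Modules 10, 20 and 30 can be mirrored as small Python classes. The page fetcher is injected as a plain function returning `(status, body)`, so the sketch runs without network access; the class names, treating 2xx/3xx as accessible and reporting connection failures as status 0 are all assumptions.

```python
from typing import Callable, List, Tuple

Response = Tuple[int, str]  # (HTTP status, body); status 0 = connection failure (assumption)


class RequestAccessModule:
    """Module 10: sends the access request through an injected fetch function."""

    def __init__(self, fetch: Callable[[str], Response]):
        self.fetch = fetch

    def request(self, url: str) -> Response:
        return self.fetch(url)


class ResponseProcessingModule:
    """Module 20: decides whether the response means the page is accessible."""

    def is_accessible(self, response: Response) -> bool:
        status, _body = response
        return 200 <= status < 400


class FirstAlarmModule:
    """Module 30: records the first alarm (a real system might notify an operator)."""

    def __init__(self):
        self.alarms: List[str] = []

    def alarm(self, url: str) -> None:
        self.alarms.append(f"{url} is not accessible")


class ReconnaissanceDevice:
    """Wires modules 10, 20 and 30 together for one reconnaissance round."""

    def __init__(self, fetch: Callable[[str], Response]):
        self.request_access = RequestAccessModule(fetch)
        self.response_processing = ResponseProcessingModule()
        self.first_alarm = FirstAlarmModule()

    def run_round(self, task_list: List[str]) -> None:
        for url in task_list:
            response = self.request_access.request(url)
            if not self.response_processing.is_accessible(response):
                self.first_alarm.alarm(url)


fake_pages = {"http://a.example/": (200, "<html></html>"), "http://b.example/": (503, "")}
device = ReconnaissanceDevice(fake_pages.__getitem__)
device.run_round(list(fake_pages))
print(device.first_alarm.alarms)  # ['http://b.example/ is not accessible']
```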
By periodically reconnoitering whether the collection target web page is accessible and sending alarm information when it is not, the website status reconnaissance device of this embodiment learns the accessibility of the web page before the next information collection from the target web page. This reduces cases in which the crawler becomes unusable because the web page is inaccessible, reduces invalid collection operations, and improves the availability of the information acquisition program and the overall operating capability of the management staff.
For the specific implementation and technical effects of the website status reconnaissance device of this embodiment, refer to Embodiment 1; they are not repeated here.
Embodiment 5
Building on Embodiment 4, the website status reconnaissance device of this embodiment further describes reconnaissance of the web page structure.
As shown in Fig. 5, the website status reconnaissance device of this embodiment also includes:
Web page structure information obtaining module 40, configured to access the collection target web page when the response information indicates that it is accessible, and to obtain the web page structure information of the collection target web page;
Second alarm module 50, configured to send second alarm information when a change in the web page structure of the collection target web page is detected from the web page structure information, the second alarm information indicating that the web page structure of the collection target web page has changed.
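Modules 40 and 50 can be sketched as two small Python classes. Here the structure information is a SHA-256 digest of the page's start-tag names found with a crude regular expression, and module 50 keeps the first digest seen for each URL as the baseline; the class names and both choices are illustrative assumptions.

```python
import hashlib
import re
from typing import Callable, Dict, List

START_TAG = re.compile(r"<\s*([a-zA-Z][a-zA-Z0-9]*)")  # crude: not a full HTML tokenizer


class StructureInfoModule:
    """Module 40: fetches an accessible page and derives its structure information."""

    def __init__(self, fetch_html: Callable[[str], str]):
        self.fetch_html = fetch_html

    def structure_info(self, url: str) -> str:
        tags = [t.lower() for t in START_TAG.findall(self.fetch_html(url))]
        return hashlib.sha256(">".join(tags).encode("utf-8")).hexdigest()


class SecondAlarmModule:
    """Module 50: compares structure information with the first value seen per URL."""

    def __init__(self):
        self.baselines: Dict[str, str] = {}
        self.alarms: List[str] = []

    def check(self, url: str, info: str) -> None:
        baseline = self.baselines.setdefault(url, info)
        if info != baseline:
            self.alarms.append(f"web page structure of {url} has changed")
```

Because the baseline is never overwritten here, the second alarm repeats on every round until an operator resets it, for example after updating the collection template; overwriting it instead would report each change only once.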
Preferably, in the website status reconnaissance device of this embodiment, detecting a change in the web page structure of the collection target web page from the web page structure information includes any one or more of the following: detecting that the frame information of the collection target web page has changed; detecting that its content information has changed; detecting that its rendering information has changed; and detecting that its format information has changed.
Preferably, in the website status reconnaissance devices of Embodiments 4 and 5, the preset reconnaissance period is shorter than the preset collection period.
For the specific implementation and technical effects of the website status reconnaissance device of this embodiment, refer to Embodiments 2 and 3; they are not repeated here.
It should be noted that Embodiments 4 and 5 above can be implemented in combination to realize the combined functions.
Various embodiments of the present invention have been described above. The description is exemplary rather than exhaustive and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those skilled in the art without departing from the scope and spirit of the described embodiments.
The terms used herein were chosen to best explain the principles of the embodiments, their practical application or their improvement over technologies in the market, or to enable other persons of ordinary skill in the art to understand the embodiments disclosed herein.
After considering the specification and practicing the disclosure herein, those skilled in the art will readily conceive of other embodiments of the present disclosure. This application is intended to cover any variations, uses or adaptations of the disclosure that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed herein.
If the functions are implemented as software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the essence of the technical solution of the present invention, or the part that contributes to the prior art, or part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk or an optical disc.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, persons of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some or all of their technical features, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention; all of them shall fall within the scope of the claims and the specification of the present invention.
Claims (8)
1. A website status reconnaissance method, characterized by comprising the following steps:
periodically sending an access request to a collection target web page according to a preset reconnaissance period, and receiving the response information returned by the server of the collection target web page;
processing the response information;
and, when the response information indicates that the collection target web page is inaccessible, sending first alarm information, the first alarm information being used to indicate that the collection target web page is inaccessible;
wherein the collection target web page is one entry in a task web page list, and the web page information corresponding to each collection target web page in the task web page list is collected periodically according to a respective preset collection period.
2. The website status reconnaissance method according to claim 1, characterized in that processing the response information further includes:
when the response information indicates that the collection target web page is accessible, accessing the collection target web page and obtaining the web page structure information of the collection target web page;
and, when a change in the web page structure of the collection target web page is detected according to the web page structure information, sending second alarm information, the second alarm information being used to indicate that the web page structure of the collection target web page has changed.
3. The website status reconnaissance method according to claim 2, characterized in that detecting, according to the web page structure information, that the web page structure of the collection target web page has changed includes one or more of the following:
detecting that the frame information of the collection target web page has changed;
detecting that the content information of the collection target web page has changed;
detecting that the rendering information of the collection target web page has changed;
detecting that the format information of the collection target web page has changed.
4. The website status reconnaissance method according to claim 1, characterized in that the preset reconnaissance period is shorter than the preset collection period.
5. A website status reconnaissance device, characterized by comprising:
a request access module, configured to periodically send an access request to a collection target web page according to a preset reconnaissance period, and to receive the response information returned by the server of the collection target web page;
a response information processing module, configured to process the response information;
a first alarm module, configured to send first alarm information when the response information indicates that the collection target web page is inaccessible, the first alarm information being used to indicate that the collection target web page is inaccessible;
wherein the collection target web page is one entry in a task web page list, and the web page information corresponding to each collection target web page in the task web page list is collected periodically according to a respective preset collection period.
6. The website status reconnaissance device according to claim 5, characterized by further comprising:
a web page structure information obtaining module, configured to access the collection target web page when the response information indicates that the collection target web page is accessible, and to obtain the web page structure information of the collection target web page;
a second alarm module, configured to send second alarm information when a change in the web page structure of the collection target web page is detected according to the web page structure information, the second alarm information being used to indicate that the web page structure of the collection target web page has changed.
Website status ferreting device the most according to claim 6, it is characterised in that described according to institute
State structure of web page information, detect that the structure of web page of described collection target web changes, including following
One or more:
detecting that the frame information of the acquisition target webpage has changed;
detecting that the content information of the acquisition target webpage has changed;
detecting that the rendering information of the acquisition target webpage has changed;
detecting that the format information of the acquisition target webpage has changed.
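The four kinds of change listed above could be told apart by hashing four separate facets of the page. The mapping used here is an assumption made for illustration, not one the claims define: frame information as the sequence of tag names; content information as the visible text outside `<script>` and `<style>`; rendering information as inline `style` attributes, `<style>` bodies and stylesheet links; and format information as the doctype and `<meta>` charset/http-equiv declarations. `facets` and `changed_facets` are hypothetical names.

```python
import hashlib
from html.parser import HTMLParser
from typing import Dict, List, Optional


class _Facets(HTMLParser):
    """Splits a page into four facets (the mapping is an illustrative assumption)."""

    def __init__(self) -> None:
        super().__init__()
        self.frame: List[str] = []       # tag names in document order
        self.content: List[str] = []     # visible text, whitespace-normalized
        self.rendering: List[str] = []   # inline styles, <style> bodies, stylesheet links
        self.fmt: List[str] = []         # doctype and <meta> charset/http-equiv
        self._raw: Optional[str] = None  # "script" or "style" while inside one

    def handle_decl(self, decl):
        self.fmt.append(decl.lower())    # e.g. "doctype html"

    def handle_starttag(self, tag, attrs):
        self.frame.append(tag)
        a = dict(attrs)
        if a.get("style"):
            self.rendering.append(a["style"])
        if tag == "link" and (a.get("rel") or "").lower() == "stylesheet":
            self.rendering.append(a.get("href") or "")
        if tag == "meta" and ("charset" in a or "http-equiv" in a):
            self.fmt.append(f"{a.get('charset')}|{a.get('http-equiv')}|{a.get('content')}")
        if tag in ("script", "style"):
            self._raw = tag

    def handle_endtag(self, tag):
        if tag == self._raw:
            self._raw = None

    def handle_data(self, data):
        if self._raw == "style":
            self.rendering.append(data.strip())
        elif self._raw is None and data.strip():
            self.content.append(" ".join(data.split()))


def facets(html: str) -> Dict[str, str]:
    p = _Facets()
    p.feed(html)
    p.close()
    parts = {"frame": p.frame, "content": p.content,
             "rendering": p.rendering, "format": p.fmt}
    return {k: hashlib.sha256("\x1f".join(v).encode("utf-8")).hexdigest()
            for k, v in parts.items()}


def changed_facets(old_html: str, new_html: str) -> List[str]:
    """Names of the facets that differ; an empty list means no change detected."""
    old, new = facets(old_html), facets(new_html)
    return [k for k in ("frame", "content", "rendering", "format") if old[k] != new[k]]
```

Since the text is whitespace-normalized, reflowing a paragraph does not count as a content change. A practical monitor would probably also need to mask volatile content such as timestamps or rotating advertisements, which this sketch does not attempt.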
The website state reconnaissance device according to claim 5, wherein the preset reconnaissance period is shorter than the preset collection period.
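This claim only requires the reconnaissance period R to be shorter than the collection period C. A minimal sketch, assuming fixed periods in whole seconds, both schedules starting at t = 0, and hypothetical function names, makes one consequence concrete: every interval between consecutive collection runs then contains at least ⌊C/R⌋ ≥ 1 probes, so an unreachable page is likely to be flagged before the next collection run. This is one plausible reading of the benefit, not a rationale stated in the claims.

```python
from typing import List, Optional, Tuple


def build_schedule(recon_period: int, collection_period: int,
                   horizon: int) -> List[Tuple[int, str]]:
    """Merged timeline of probe and collection events, in seconds from start."""
    if not 0 < recon_period < collection_period:
        raise ValueError("the reconnaissance period must be positive and "
                         "shorter than the collection period")
    events = [(t, "probe") for t in range(0, horizon, recon_period)]
    events += [(t, "collect") for t in range(0, horizon, collection_period)]
    # At equal timestamps the probe sorts first, so a collection run
    # always sees a state check made at the same instant.
    events.sort(key=lambda e: (e[0], e[1] != "probe"))
    return events


def probes_per_collection_interval(events: List[Tuple[int, str]]) -> List[int]:
    """Probes counted in each interval (c_k, c_{k+1}] between consecutive collections."""
    counts: List[int] = []
    current: Optional[int] = None
    for _, kind in events:
        if kind == "collect":
            if current is not None:
                counts.append(current)
            current = 0
        elif current is not None:
            current += 1
    return counts
```

With R = 60 s and C = 300 s, `probes_per_collection_interval(build_schedule(60, 300, 900))` returns `[5, 5]`: five probes precede each subsequent collection run.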
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610370314.XA CN105975395A (en) | 2016-05-30 | 2016-05-30 | Website state reconnaissance method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105975395A | 2016-09-28 |
Family ID: 57010528
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610370314.XA (pending; CN105975395A) | Website state reconnaissance method and device | 2016-05-30 | 2016-05-30 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105975395A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106777281A (en) * | 2016-12-29 | 2017-05-31 | 深圳市华傲数据技术有限公司 | Data processing method and device for improving the stability and availability of web crawlers |
CN109298987A (en) * | 2017-07-25 | 2019-02-01 | 北京国双科技有限公司 | Method and device for detecting the operating status of a web crawler |
CN112100083A (en) * | 2020-11-13 | 2020-12-18 | 北京智慧星光信息技术有限公司 | Crawler template change monitoring method and system, electronic equipment and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101035128A (en) * | 2007-04-18 | 2007-09-12 | 大连理工大学 | Three-fold webpage text content recognition and filtering method based on Chinese punctuation |
US20100174774A1 (en) * | 2009-01-08 | 2010-07-08 | International Business Machines Corporation | Method for server-side logging of client browser state through markup language |
CN102624713A (en) * | 2012-02-29 | 2012-08-01 | 深信服网络科技(深圳)有限公司 | Website tampering identification method and website tampering identification device |
2016-05-30: CN application CN201610370314.XA (CN105975395A) filed; status: active, pending
Non-Patent Citations (1)
Title |
---|
北京网景盛世技术开发公司: "CoolWei Website Diagnosis and Monitoring Platform" (《酷维(CoolWei)网站诊断与监测平台》), 31 January 2007 * |
Similar Documents
Publication | Title |
---|---|
TWI753887B (en) | Front-end user behavior statistics method and device |
CN105138599B (en) | Method for automatically monitoring the click count of every link on an entire website page |
CN107895009B (en) | Distributed internet data acquisition method and system |
JP6294307B2 (en) | Method and system for monitoring and tracking browsing activity on portable devices |
CN106294101B (en) | Page event-tracking (buried point) test method and device |
CN106897347B (en) | Webpage display method, operation event recording method and device |
CN110020339B (en) | Non-buried-point webpage data acquisition method and device |
CN106295382B (en) | Information risk prevention and control method and device |
KR20110107363A (en) | Method for server-side logging of client browser state through markup language |
CN107085549B (en) | Method and device for generating fault information |
CN102663091B (en) | Web application navigation management method and system |
CN104765689A (en) | Method and device for real-time monitoring of interface performance data |
CN103440175A (en) | Method and device for handling smart card exceptions |
CN105975395A (en) | Website state reconnaissance method and device |
CN104301175A (en) | Browser-based simulation monitoring method for web service systems |
CN102870118A (en) | Method, device and system for accessing user behavior |
CN114064144B (en) | Cross-application data acquisition communication plug-in and communication method |
CN105144117A (en) | Automated correlation and analysis of callstack and context data |
CN103929339B (en) | Web data acquisition method and system |
CN101763432A (en) | Method for constructing lightweight webpage dynamic view |
CN112989162B (en) | Buried point reporting method, device, equipment and storage medium |
CN103488675A (en) | Automatic precise extraction device for multi-webpage news comment contents |
CN103853717A (en) | Web crawler |
TWI570579B (en) | Information retrieval method using webpage visual features and webpage language features, and system using the same |
US9104573B1 (en) | Providing relevant diagnostic information using ontology rules |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20160928 |