CN108228431A - Method and system for configurable crawler quality monitoring - Google Patents

Method and system for configurable crawler quality monitoring

Info

Publication number
CN108228431A
Authority
CN
China
Prior art keywords
threshold value
monitoring
website
alarm threshold
authority record
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810007604.7A
Other languages
Chinese (zh)
Inventor
张波
李界鹏
王能
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhongguancun Kejin Technology Co Ltd
Original Assignee
Beijing Zhongguancun Kejin Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongguancun Kejin Technology Co Ltd
Priority to CN201810007604.7A
Publication of CN108228431A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING; COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 - Error detection; Error correction; Monitoring
    • G06F11/30 - Monitoring
    • G06F11/32 - Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324 - Display of status information
    • G06F11/327 - Alarm or error message display
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING; COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 - Error detection; Error correction; Monitoring
    • G06F11/30 - Monitoring
    • G06F11/3003 - Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302 - Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING; COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 - Error detection; Error correction; Monitoring
    • G06F11/30 - Monitoring
    • G06F11/3089 - Monitoring arrangements determined by the means or processing involved in sensing the monitored data, e.g. interfaces, connectors, sensors, probes, agents
    • G06F11/3093 - Configuration details thereof, e.g. installation, enabling, spatial arrangement of the probes
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 - Arrangements for maintenance or administration or management of packet switching networks
    • H04L41/06 - Arrangements for maintenance or administration or management of packet switching networks involving management of faults or events or alarms
    • H04L41/0604 - Alarm or event filtering, e.g. for reduction of information
    • H04L41/0622 - Alarm or event filtering, e.g. for reduction of information based on time
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 - Arrangements for monitoring or testing packet switching networks
    • H04L43/16 - Arrangements for monitoring or testing packet switching networks using threshold monitoring
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 - Network-specific arrangements or communication protocols supporting networked applications
    • H04L67/30 - Network-specific arrangements or communication protocols supporting networked applications involving profiles

Abstract

This application discloses a method for configurable crawler quality monitoring, comprising: obtaining the authorization record parameters produced when a crawler program crawls each website, and saving the authorization record parameters to a database; reading a configuration file to obtain the website ID to be monitored, the monitoring time period, and the alarm threshold; reading from the database the authorization record parameters of the website corresponding to the website ID within the monitoring time period; judging whether the authorization record parameters exceed the alarm threshold; and, if so, sending an alarm signal. By obtaining the authorization record parameters for each website, the method can monitor the crawler program's authorization process and crawler quality from multiple angles. By reading the website ID, monitoring time period, and alarm threshold from a configuration file, it allows monitoring to be customized to user requirements. The application also provides a system, a server, and a computer-readable storage medium for configurable crawler quality monitoring, which have the same beneficial effects.

Description

Method and system for configurable crawler quality monitoring
Technical Field
This application relates to the field of web crawlers, and in particular to a method, system, server, and computer-readable storage medium for configurable crawler quality monitoring.
Background
With the rapid development of Internet technology, the era of big data has arrived, and data acquisition has become a vital link. Crawler programs, as an important source of acquired data, play an irreplaceable role.
In the prior art, the quality of the data crawled by a crawler program is generally referred to as crawler quality. It is judged mainly by the quantity of data crawled within a given time and by the correctness of the crawled data. Typically, after a target website accessed by the crawler program undergoes maintenance or a redesign, crawler quality declines to some degree.
Existing crawler quality monitoring schemes count the authorization results for each website, monitor the crawler program's output only in broad terms, and periodically generate a report. Such schemes therefore monitor a single kind of object, produce generic reports, and cannot be customized to user requirements.
Therefore, how to monitor crawler quality in a way that is customized to user requirements is a technical problem that those skilled in the art currently need to solve.
Summary of the Invention
The purpose of this application is to provide a method, system, server, and computer-readable storage medium for configurable crawler quality monitoring that can monitor crawler quality in a way customized to user requirements.
To solve the above technical problem, this application provides a method for configurable crawler quality monitoring, the method comprising:
obtaining the authorization record parameters produced when a crawler program crawls each website, and saving the authorization record parameters to a database;
reading a configuration file to obtain the website ID to be monitored, the monitoring time period, and the alarm threshold;
reading from the database the authorization record parameters of the website corresponding to the website ID within the monitoring time period;
judging whether the authorization record parameters exceed the alarm threshold;
if so, sending an alarm signal.
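The five steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `auth_record` table, its columns, the configuration keys (`site_id`, `window`, `max_avg_auth_ms`), and the choice of average authorization time as the monitored value are all hypothetical, and an in-memory SQLite database stands in for whichever database is actually used.

```python
import sqlite3

# Hypothetical schema: one row per authorization attempt made by the crawler.
# auth_ms is the authorization time in milliseconds; created_at is a Unix timestamp.
SCHEMA = """
CREATE TABLE auth_record (
    serial_no  TEXT,
    site_id    TEXT,
    status     TEXT,
    auth_ms    REAL,
    created_at INTEGER
)
"""

def save_records(conn, records):
    """Step 1: persist the authorization record parameters."""
    conn.executemany("INSERT INTO auth_record VALUES (?, ?, ?, ?, ?)", records)
    conn.commit()

def check_site(conn, config):
    """Steps 2-5: take the site, window and threshold from the config,
    query the matching records, compare, and return an alarm message or None."""
    start, end = config["window"]
    row = conn.execute(
        "SELECT AVG(auth_ms) FROM auth_record "
        "WHERE site_id = ? AND created_at BETWEEN ? AND ?",
        (config["site_id"], start, end),
    ).fetchone()
    avg_ms = row[0]  # None when no record falls inside the window
    if avg_ms is not None and avg_ms > config["max_avg_auth_ms"]:
        return f"ALARM site={config['site_id']} avg_auth_ms={avg_ms:.1f}"
    return None

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
save_records(conn, [
    ("r1", "site-a", "success", 900.0, 100),
    ("r2", "site-a", "success", 1500.0, 160),
    ("r3", "site-b", "success", 200.0, 150),
])
config = {"site_id": "site-a", "window": (0, 200), "max_avg_auth_ms": 1000.0}
alarm = check_site(conn, config)
```

Keeping the site, window, and threshold in a plain dictionary mirrors the idea that all three come from configuration rather than from code.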
Optionally, before obtaining the authorization record parameters produced when the crawler program crawls each website, the method further comprises:
reading from the configuration file the field names to be verified and the verification modes;
when the crawler program crawls data, verifying, using the verification mode, the data field in the data that corresponds to each field name;
marking any data field that fails verification as abnormal data.
Optionally, the authorization record parameters include at least one of: serial number, crawler type, HTTP URL, status code, authorization time, and abnormal data quantity.
Optionally, judging whether the authorization record parameters exceed the alarm threshold comprises:
calculating the change rate of the proportion of status codes whose state value is "completed successfully", and judging whether the change rate exceeds a change-rate alarm threshold;
if not, calculating the average authorization time, and judging whether the average exceeds an authorization-time alarm threshold;
if the average does not exceed the authorization-time alarm threshold, computing the average response time of the HTTP URL, and judging whether the average response time exceeds a response-time alarm threshold;
if the average response time does not exceed the response-time alarm threshold, judging whether the abnormal data quantity exceeds an abnormal-data-quantity alarm threshold;
if the abnormal data quantity exceeds the abnormal-data-quantity alarm threshold, sending an alarm command.
Optionally, reading the configuration file comprises:
judging whether a configuration file input by the user has been received;
if so, reading the configuration file input by the user;
if not, reading a default configuration file.
Optionally, the method further comprises:
the database periodically deleting the authorization record parameters.
Optionally, the database includes at least one of a MySQL database, an HBase database, a MongoDB database, and a Redis database.
This application also provides a system for configurable crawler quality monitoring, the system comprising:
an obtaining and saving module, configured to obtain the authorization record parameters produced when a crawler program crawls each website, and to save the authorization record parameters to a database;
a first reading module, configured to read a configuration file to obtain the website ID to be monitored, the monitoring time period, and the alarm threshold;
a second reading module, configured to read from the database the authorization record parameters of the website corresponding to the website ID within the monitoring time period;
a judging module, configured to judge whether the authorization record parameters exceed the alarm threshold;
an alarm module, configured to send an alarm signal when the authorization record parameters exceed the alarm threshold.
This application also provides a configurable crawler quality monitoring server, the server comprising:
a memory, configured to store a computer program;
a processor, configured to implement the steps of the method for configurable crawler quality monitoring described in any of the above when executing the computer program.
This application also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, it implements the steps of the method for configurable crawler quality monitoring described in any of the above.
The method for configurable crawler quality monitoring provided by this application comprises: obtaining the authorization record parameters produced when a crawler program crawls each website, and saving the authorization record parameters to a database; reading a configuration file to obtain the website ID to be monitored, the monitoring time period, and the alarm threshold; reading from the database the authorization record parameters of the website corresponding to the website ID within the monitoring time period; judging whether the authorization record parameters exceed the alarm threshold; and, if so, sending an alarm signal.
By obtaining the authorization record parameters for each website and saving them to a database, the technical solution provided by this application can monitor the crawler program's authorization process and crawler quality from multiple angles, which helps the user find the reason for a decline in crawler quality. By reading the website ID, monitoring time period, and alarm threshold from a configuration file, it lets the user choose, through the settings in the configuration file, which websites to monitor, over what time, and against what standard, so that monitoring is customized to user requirements. This application also provides a system, a server, and a computer-readable storage medium for configurable crawler quality monitoring, which have the same beneficial effects and are not described again here.
Brief Description of the Drawings
To explain the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely embodiments of this application; those of ordinary skill in the art can obtain other drawings from the drawings provided without creative effort.
Fig. 1 is a flowchart of a method for configurable crawler quality monitoring provided by an embodiment of this application;
Fig. 2 is a flowchart of one practical implementation of S104 in the method for configurable crawler quality monitoring shown in Fig. 1;
Fig. 3 is a structural diagram of a system for configurable crawler quality monitoring provided by an embodiment of this application;
Fig. 4 is a structural diagram of another system for configurable crawler quality monitoring provided by an embodiment of this application;
Fig. 5 is a structural diagram of a configurable crawler quality monitoring server provided by an embodiment of this application.
Detailed Description
The core of this application is to provide a method, system, server, and computer-readable storage medium for configurable crawler quality monitoring that can monitor crawler quality in a way customized to user requirements.
To make the purposes, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art from the embodiments in this application without creative effort fall within the scope of protection of this application.
Please refer to Fig. 1, which is a flowchart of a method for configurable crawler quality monitoring provided by an embodiment of this application.
The method specifically comprises the following steps:
S101: Obtain the authorization record parameters produced when a crawler program crawls each website, and save the authorization record parameters to a database;
Existing crawler quality monitoring schemes monitor a single kind of object, produce generic reports, and cannot be customized to user requirements. To address this, this application provides a method for configurable crawler quality monitoring that can monitor crawler quality according to user requirements;
Optionally, before obtaining the authorization record parameters produced when the crawler program crawls each website, the method may further comprise:
reading from the configuration file the field names to be verified and the verification modes;
when the crawler program crawls data, verifying, using the verification mode, the data field in the data that corresponds to each field name;
marking any data field that fails verification as abnormal data;
The verification program may first read the field names to be verified and the verification modes from the configuration system. For each object in the crawled data, it then uses reflection to obtain the object type and all field names and field values, and verifies the data field corresponding to each field name using the verification mode. The verification program itself supports a series of rules, for example that a value is a valid ID card number, a valid mobile phone number, or non-empty, or that, of two related fields, one must be non-empty when the other is non-empty. If a verification result is a failure, the data field that failed verification is marked as abnormal data;
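The reflection-based verification described above might look like the sketch below. Everything named here is an assumption made for illustration: the `RULES` table, the rule names, the simplified regular expressions (an 18-character ID number and an 11-digit mobile number starting with 1), and the `CrawledUser` class do not come from the patent.

```python
import re
from dataclasses import dataclass, fields

# Hypothetical rule set; the patterns are simplified illustrations only.
RULES = {
    "non_empty": lambda v: v is not None and str(v).strip() != "",
    "id_number": lambda v: re.fullmatch(r"\d{17}[\dXx]", str(v or "")) is not None,
    "mobile": lambda v: re.fullmatch(r"1\d{10}", str(v or "")) is not None,
}

@dataclass
class CrawledUser:  # hypothetical object produced by the crawler
    name: str
    id_number: str
    mobile: str

def find_abnormal_fields(obj, checks):
    """Read obj's field names and values by reflection and return the
    names of the configured fields whose values fail their rule."""
    values = {f.name: getattr(obj, f.name) for f in fields(obj)}
    return [
        name for name, rule in checks.items()
        if name in values and not RULES[rule](values[name])
    ]

# "checks" plays the role of the field names and verification modes
# that would be read from the configuration file.
checks = {"name": "non_empty", "id_number": "id_number", "mobile": "mobile"}
user = CrawledUser(name="Zhang", id_number="1234", mobile="13800000000")
abnormal = find_abnormal_fields(user, checks)
```

Because the field list is discovered at run time, adding a new field to verify only requires a new entry in the configuration, not a change to the verification code.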
Optionally, the authorization record parameters mentioned here may include at least one of: serial number, crawler type, HTTP URL, status code, authorization time, and abnormal data quantity;
By obtaining the authorization record parameters for each website and saving them to a database, the crawler program's authorization process and crawler quality can be monitored from multiple angles, which helps the user find the reason for a decline in crawler quality.
S102: Read a configuration file to obtain the website ID to be monitored, the monitoring time period, and the alarm threshold;
The configuration file mentioned here may be a configuration file input by the user, or a preset default configuration file;
On this basis, reading the configuration file may comprise:
judging whether a configuration file input by the user has been received;
if so, reading the configuration file input by the user;
if not, reading the default configuration file.
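The fallback from a user-supplied configuration file to a default one can be sketched as below. The JSON format, the key names, and the default values are assumptions for illustration; the patent does not specify a file format or any concrete values.

```python
import copy
import json
import os

# Hypothetical defaults; none of these values come from the patent.
DEFAULT_CONFIG = {
    "site_ids": ["site-a"],
    "window_minutes": 60,
    "thresholds": {"max_avg_auth_ms": 1000.0},
}

def load_config(user_path=None):
    """Return the user's configuration if a readable file was supplied,
    otherwise a copy of the default configuration."""
    if user_path and os.path.isfile(user_path):
        with open(user_path, encoding="utf-8") as fh:
            return json.load(fh)
    # Copy so that callers cannot modify the shared defaults by accident.
    return copy.deepcopy(DEFAULT_CONFIG)
```

Calling `load_config()` with no argument, or with a path that does not exist, yields the defaults; calling it with the path of a valid JSON file yields that file's contents.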
The alarm threshold mentioned here correspondingly includes at least one of: a change-rate alarm threshold, an authorization-time alarm threshold, a response-time alarm threshold, and an abnormal-data-quantity alarm threshold;
Optionally, the database periodically deletes the stored authorization record parameters, and the deletion period may also be obtained by reading the configuration file;
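Periodic deletion with a configurable retention period could be sketched as follows. The table layout, the `created_at` timestamp column, and the idea of expressing the period as a number of seconds are assumptions; an in-memory SQLite database is used only to keep the example self-contained, and a scheduler (not shown) would call the function at each interval.

```python
import sqlite3

def purge_old_records(conn, now, retention_seconds):
    """Delete records older than the retention period and return how many
    rows were removed. retention_seconds would come from the config file."""
    cutoff = now - retention_seconds
    cur = conn.execute(
        "DELETE FROM auth_record WHERE created_at < ?", (cutoff,)
    )
    conn.commit()
    return cur.rowcount

# Hypothetical two-column table, reduced to what the purge needs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE auth_record (serial_no TEXT, created_at INTEGER)")
conn.executemany(
    "INSERT INTO auth_record VALUES (?, ?)", [("old", 100), ("new", 900)]
)
removed = purge_old_records(conn, now=1000, retention_seconds=500)
```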
Optionally, the database may include at least one of a MySQL database, an HBase database, a MongoDB database, and a Redis database;
By reading the website ID, monitoring time period, and alarm threshold from the configuration file, the user can choose, through the settings in the configuration file, which websites to monitor, over what time, and against what standard, so that monitoring is customized to user requirements.
S103: Read from the database the authorization record parameters of the website corresponding to the website ID within the monitoring time period;
S104: Judge whether the authorization record parameters exceed the alarm threshold;
If so, proceed to step S105.
S105: Send an alarm signal.
When the authorization record parameters exceed the alarm threshold, the monitored crawler quality does not meet the user's requirements. An alarm signal is sent at this point so that the user can notice the problem in time and adjust the crawler program accordingly;
Optionally, the alarm signal may be sent in the alarm mode obtained by reading the configuration file, such as SMS, mail, or a combination of the two. The alarm recipient may likewise be the mail address or SMS number of the recipient obtained by reading the configuration file.
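Selecting the alarm channel from configuration can be sketched as a small dispatch table. The two sender functions are hypothetical stand-ins that only format a string; a real deployment would replace them with calls to an SMS gateway or a mail server. The configuration keys (`modes`, `sms_number`, `mail_address`) are likewise assumptions.

```python
def send_sms(number, message):
    """Hypothetical stand-in for a call to an SMS gateway."""
    return f"sms:{number}:{message}"

def send_mail(address, message):
    """Hypothetical stand-in for sending mail through a mail server."""
    return f"mail:{address}:{message}"

# Maps each alarm mode to the config key holding the recipient and to
# the function that delivers the message.
CHANNELS = {
    "sms": ("sms_number", send_sms),
    "mail": ("mail_address", send_mail),
}

def dispatch_alarm(alarm_config, message):
    """Send the message over every channel listed in the configuration."""
    results = []
    for mode in alarm_config["modes"]:
        recipient_key, sender = CHANNELS[mode]
        results.append(sender(alarm_config[recipient_key], message))
    return results
```

Listing both `"sms"` and `"mail"` under `modes` gives the combined mode mentioned in the text.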
Based on the above technical solution, the method for configurable crawler quality monitoring provided by this application obtains the authorization record parameters produced when a crawler program crawls each website and saves them to a database, so that the crawler program's authorization process and crawler quality can be monitored from multiple angles, which helps the user find the reason for a decline in crawler quality. By reading the website ID, monitoring time period, and alarm threshold from a configuration file, the user can choose, through the settings in the configuration file, which websites to monitor, over what time, and against what standard, so that monitoring is customized to user requirements.
Based on the above embodiment, please refer to Fig. 2, which is a flowchart of one practical implementation of S104 in the method for configurable crawler quality monitoring shown in Fig. 1.
This embodiment concerns S104 of the previous embodiment and describes a specific implementation of it. The flowchart shown in Fig. 2 specifically comprises the following steps:
S201: Calculate the change rate of the proportion of status codes whose state value is "completed successfully", and judge whether the change rate exceeds the change-rate alarm threshold;
If so, proceed to step S205; if not, proceed to step S202.
S202: Calculate the average authorization time, and judge whether the average exceeds the authorization-time alarm threshold;
If so, proceed to step S205; if not, proceed to step S203.
S203: Compute the average response time of the HTTP URL, and judge whether the average response time exceeds the response-time alarm threshold;
If so, proceed to step S205; if not, proceed to step S204.
S204: Judge whether the abnormal data quantity exceeds the abnormal-data-quantity alarm threshold;
If so, proceed to step S205.
S205: Send an alarm command.
When any of the checks in S201 to S204 exceeds its alarm threshold, for example when the abnormal data quantity exceeds the abnormal-data-quantity alarm threshold, an alarm command is sent so that the configurable crawler quality monitoring program sends an alarm signal;
It should be noted that this application does not specifically limit the order of steps S201 to S204; the user can set the order of these steps according to their own needs.
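The cascade S201 to S205 can be sketched as an ordered list of checks in which the first value to exceed its threshold triggers the alarm command. Several points are assumptions: the text does not define "change rate", so it is taken here as the absolute difference between the success proportion in a previous window and in the current window; the record keys and threshold names are hypothetical.

```python
def success_ratio(records):
    """Proportion of records whose status is 'success' (0.0 if empty)."""
    if not records:
        return 0.0
    return sum(r["status"] == "success" for r in records) / len(records)

def mean(values):
    """Arithmetic mean of an iterable (0.0 if it is empty)."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0

def evaluate(previous, current, thresholds):
    """Run checks S201-S204 in order. Return (check_name, value) for the
    first value above its threshold (S205), or None if all checks pass."""
    # S201 (assumption): change rate = absolute change in success proportion
    change = abs(success_ratio(current) - success_ratio(previous))
    checks = [
        ("success_ratio_change", change),                               # S201
        ("avg_auth_ms", mean(r["auth_ms"] for r in current)),           # S202
        ("avg_response_ms", mean(r["response_ms"] for r in current)),   # S203
        ("abnormal_count", sum(r["abnormal"] for r in current)),        # S204
    ]
    for name, value in checks:
        if value > thresholds[name]:
            return name, value  # S205: alarm command
    return None
```

Since the order of S201 to S204 is not fixed by the text, reordering the `checks` list is all that is needed to change the sequence.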
Based on the above technical solution, this embodiment determines whether crawler quality meets the user's requirements by judging whether the values of the authorization record parameters exceed the alarm thresholds. This achieves multi-dimensional monitoring of crawler quality and helps the user find the reason for a decline in crawler quality.
Please refer to Fig. 3, which is a structural diagram of a system for configurable crawler quality monitoring provided by an embodiment of this application.
The system may include:
an obtaining and saving module 100, configured to obtain the authorization record parameters produced when a crawler program crawls each website, and to save the authorization record parameters to a database;
a first reading module 200, configured to read a configuration file to obtain the website ID to be monitored, the monitoring time period, and the alarm threshold;
a second reading module 300, configured to read from the database the authorization record parameters of the website corresponding to the website ID within the monitoring time period;
a judging module 400, configured to judge whether the authorization record parameters exceed the alarm threshold;
an alarm module 500, configured to send an alarm signal when the authorization record parameters exceed the alarm threshold.
Please refer to Fig. 4, which is a structural diagram of another system for configurable crawler quality monitoring provided by an embodiment of this application.
The system may further include:
a third reading module, configured to read from the configuration file the field names to be verified and the verification modes;
a verification module, configured to verify, when the crawler program crawls data, the data field in the data that corresponds to each field name using the verification mode;
a marking module, configured to mark any data field that fails verification as abnormal data.
The judging module 400 may include:
a first judging submodule, configured to calculate the change rate of the proportion of status codes whose state value is "completed successfully", and to judge whether the change rate exceeds the change-rate alarm threshold;
a second judging submodule, configured to calculate the average authorization time when the change rate does not exceed the change-rate alarm threshold, and to judge whether the average exceeds the authorization-time alarm threshold;
a third judging submodule, configured to compute the average response time of the HTTP URL when the average does not exceed the authorization-time alarm threshold, and to judge whether the average response time exceeds the response-time alarm threshold;
a fourth judging submodule, configured to judge, when the average response time does not exceed the response-time alarm threshold, whether the abnormal data quantity exceeds the abnormal-data-quantity alarm threshold;
a sending submodule, configured to send an alarm command to the alarm module when the abnormal data quantity exceeds the abnormal-data-quantity alarm threshold.
The first reading module 200 may include:
a fifth judging submodule, configured to judge whether a configuration file input by the user has been received;
a reading submodule, configured to read the configuration file input by the user when one has been received, and to read the default configuration file when none has been received.
The components of the above system can be applied in the following practical flow:
The third reading module reads from the configuration file the field names to be verified and the verification modes. When the crawler program crawls data, the verification module verifies the data field corresponding to each field name using the verification mode. The marking module marks any data field that fails verification as abnormal data;
The obtaining and saving module obtains the authorization record parameters produced when the crawler program crawls each website and saves them to the database. The fifth judging submodule judges whether a configuration file input by the user has been received. If one has been received, the reading submodule reads it; if not, the reading submodule reads the default configuration file. Either way, the website ID to be monitored, the monitoring time period, and the alarm threshold are obtained. The second reading module reads from the database the authorization record parameters of the website corresponding to the website ID within the monitoring time period;
The first judging submodule calculates the change rate of the proportion of status codes whose state value is "completed successfully" and judges whether the change rate exceeds the change-rate alarm threshold. When the change rate does not exceed the change-rate alarm threshold, the second judging submodule calculates the average authorization time and judges whether the average exceeds the authorization-time alarm threshold. When the average does not exceed the authorization-time alarm threshold, the third judging submodule computes the average response time of the HTTP URL and judges whether it exceeds the response-time alarm threshold. When the average response time does not exceed the response-time alarm threshold, the fourth judging submodule judges whether the abnormal data quantity exceeds the abnormal-data-quantity alarm threshold. When the abnormal data quantity exceeds the abnormal-data-quantity alarm threshold, the sending submodule sends an alarm command to the alarm module. On receiving the alarm command, the alarm module sends an alarm signal.
Please refer to Fig. 5, which is a structural diagram of a configurable crawler quality monitoring server provided by an embodiment of this application.
The server may vary considerably with configuration or performance. It may include one or more central processing units (CPUs) 622 (for example, one or more processors), a memory 632, and one or more storage media 630 (for example, one or more mass storage devices) storing application programs 642 or data 644. The memory 632 and the storage medium 630 may provide transient or persistent storage. The program stored in the storage medium 630 may include one or more modules (not marked in the figure), and each module may include a series of instruction operations on the device. Further, the central processing unit 622 may be configured to communicate with the storage medium 630 and to execute, on the configurable crawler quality monitoring server 600, the series of instruction operations stored in the storage medium 630.
The configurable crawler quality monitoring server 600 may also include one or more power supplies 626, one or more wired or wireless network interfaces 650, one or more input/output interfaces 658, and/or one or more operating systems 641, such as Windows Server™, Mac OS X™, Unix™, Linux™, or FreeBSD™.
The steps of the method for configurable crawler quality monitoring described with reference to Fig. 1 and Fig. 2 are implemented by the configurable crawler quality monitoring server based on the structure shown in Fig. 5.
It is apparent to those skilled in the art that for convenience and simplicity of description, the system of foregoing description, The specific work process of system and module can refer to the corresponding process in preceding method embodiment, and details are not described herein.
In several embodiments provided herein, it should be understood that disclosed system, server and method, it can To realize by another way.For example, system embodiment described above is only schematical, for example, module is drawn Point, only a kind of division of logic function can have other dividing mode, such as multiple module or components can in actual implementation To combine or be desirably integrated into another system or some features can be ignored or does not perform.Another point, it is shown or beg for The mutual coupling, direct-coupling or communication connection of opinion can be the INDIRECT COUPLING by some interfaces, system or module Or communication connection, can be electrical, machinery or other forms.
The module illustrated as separating component may or may not be physically separate, be shown as module Component may or may not be physical module, you can be located at a place or can also be distributed to multiple networks In module.Some or all of module therein can be selected according to the actual needs to realize the purpose of this embodiment scheme.
In addition, each function module in each embodiment of the application can be integrated in a processing module, it can also That modules are individually physically present, can also two or more modules be integrated in a module.Above-mentioned integrated mould The form that hardware had both may be used in block is realized, can also be realized in the form of software function module.
If integrated module realized in the form of software function module and be independent product sale or in use, can To be stored in a computer read/write memory medium.Based on such understanding, the technical solution of the application substantially or Saying all or part of the part contribute to the prior art or the technical solution can be embodied in the form of software product Out, which is stored in a storage medium, is used including some instructions so that a Computer Service Device (can be personal computer, funcall system or network server etc.) performs each embodiment method of the application All or part of step.And aforementioned storage medium includes:USB flash disk, mobile hard disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disc or CD etc. are various can store program The medium of code.
The method, system, server, and computer-readable storage medium for configurable crawler quality monitoring provided by this application have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of this application; the description of the above embodiments is intended only to help in understanding the method of this application and its core idea. It should be noted that those of ordinary skill in the art may make several improvements and modifications to this application without departing from its principles, and these improvements and modifications also fall within the scope of protection of the claims of this application.
It should also be noted that, in this specification, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include", and any other variants are intended to cover non-exclusive inclusion, so that a process, method, article, or server that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to that process, method, article, or server. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or server that includes the element.

Claims (10)

  1. A method for configurable crawler quality monitoring, characterized by comprising:
    obtaining authorization record parameters generated when a crawler program crawls each website, and saving the authorization record parameters to a database;
    reading a configuration file to obtain the website ID to be monitored, the monitoring time period, and the alarm threshold;
    reading, from the database, the authorization record parameters of the website corresponding to the website ID within the monitoring time period;
    judging whether the authorization record parameters exceed the alarm threshold;
    if so, issuing an alarm signal.
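
The steps of claim 1 can be sketched as follows. This is only an illustration under stated assumptions: the claims do not fix a database, a configuration format, or which parameter is compared, so the sketch uses SQLite, a JSON configuration, and the average authorization time as the monitored value. The table `auth_records`, its columns, and the configuration keys are hypothetical names.

```python
import json
import sqlite3

def save_record(conn, site_id, ts, auth_seconds):
    """Persist one authorization record produced by the crawler (hypothetical schema)."""
    conn.execute(
        "INSERT INTO auth_records (site_id, ts, auth_seconds) VALUES (?, ?, ?)",
        (site_id, ts, auth_seconds),
    )

def check_site(conn, config):
    """Return True (alarm) if the average authorization time in the window exceeds the threshold."""
    row = conn.execute(
        "SELECT AVG(auth_seconds) FROM auth_records "
        "WHERE site_id = ? AND ts BETWEEN ? AND ?",
        (config["site_id"], config["start_ts"], config["end_ts"]),
    ).fetchone()
    avg = row[0]
    if avg is None:  # no records for this site in the monitoring window
        return False
    return avg > config["alarm_threshold"]

def demo():
    """Store three records, read a JSON config, and report the monitoring result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE auth_records (site_id TEXT, ts INTEGER, auth_seconds REAL)")
    for ts, secs in [(100, 2.0), (200, 6.0), (900, 50.0)]:
        save_record(conn, "site-1", ts, secs)
    # Assumed config layout: website ID, monitoring window, and alarm threshold.
    config = json.loads(
        '{"site_id": "site-1", "start_ts": 0, "end_ts": 300, "alarm_threshold": 3.5}'
    )
    return "ALARM" if check_site(conn, config) else "OK"
```

Because the website ID, window, and threshold all come from the configuration, changing what is monitored requires editing the configuration only, not the code.
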
  2. The method according to claim 1, characterized in that, before obtaining the authorization record parameters generated when the crawler program crawls each website, the method further comprises:
    reading, from the configuration file, the field names to be verified and the verification mode;
    when the crawler program crawls data, verifying, using the verification mode, the data field in the data that corresponds to the field name;
    marking data fields that fail verification as abnormal data.
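
A minimal sketch of the field verification in claim 2 follows. The claim does not enumerate verification modes, so the two modes here (`non_empty`, `digits`) and the rule layout (field name mapped to mode name) are assumptions chosen for illustration.

```python
import re

# Hypothetical verification modes, looked up by the name given in the configuration.
VALIDATORS = {
    "non_empty": lambda v: v is not None and str(v).strip() != "",
    "digits": lambda v: isinstance(v, str) and re.fullmatch(r"\d+", v) is not None,
}

def find_abnormal_fields(record, rules):
    """Return the names of fields in `record` that fail their configured check."""
    abnormal = []
    for field_name, mode in rules.items():
        check = VALIDATORS[mode]
        # A field missing from the crawled record is treated as failing verification.
        if not check(record.get(field_name)):
            abnormal.append(field_name)
    return abnormal
```

The length of the returned list could serve as the abnormal data quantity that the later claims compare against a threshold.
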
  3. The method according to claim 2, characterized in that the authorization record parameters include at least one of: serial number, crawler type, HTTP URL, status code, authorization time consumption, and abnormal data quantity.
  4. The method according to any one of claims 1-3, characterized in that judging whether the authorization record parameters exceed the alarm threshold comprises:
    calculating the rate of change of the proportion of status codes whose status value is "successfully completed", and judging whether the rate of change exceeds a change-rate alarm threshold;
    if not, calculating the average authorization time consumption, and judging whether the average exceeds an authorization-time alarm threshold;
    if the average is less than the authorization-time alarm threshold, computing the average response time of the HTTP URL, and judging whether the average response time exceeds a response-time alarm threshold;
    if the average response time is less than the response-time alarm threshold, judging whether the abnormal data quantity exceeds an abnormal-data-quantity alarm threshold;
    if the abnormal data quantity exceeds the abnormal-data-quantity alarm threshold, sending an alarm instruction.
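
The ordered checks of claim 4 can be sketched as a chain that stops at the first threshold exceeded. Two assumptions are made: an alarm is raised as soon as any check exceeds its threshold (the claim states this explicitly only for the last check), and the rate of change is taken as the relative change of the success proportion between a previous and a current window. The dictionary keys and the `"success"` label are hypothetical.

```python
from statistics import mean

def success_ratio(status_codes):
    """Share of records whose status value is "success" (hypothetical label)."""
    if not status_codes:
        return 0.0
    return sum(1 for s in status_codes if s == "success") / len(status_codes)

def first_exceeded(prev, cur, thresholds):
    """Run the checks in the claimed order; return the first one exceeded, or None."""
    prev_ratio = success_ratio(prev["status"])
    cur_ratio = success_ratio(cur["status"])
    # Assumed definition: relative change of the success ratio between two windows.
    change = abs(cur_ratio - prev_ratio) / prev_ratio if prev_ratio else 0.0
    # Each later value is computed only if every earlier check passed.
    checks = [
        ("change_rate", lambda: change),
        ("auth_time", lambda: mean(cur["auth_seconds"])),
        ("response_time", lambda: mean(cur["response_seconds"])),
        ("abnormal_count", lambda: cur["abnormal_count"]),
    ]
    for name, compute in checks:
        if compute() > thresholds[name]:
            return name
    return None
```

Returning the name of the failed check, rather than a bare boolean, lets the alarm message say which metric triggered it.
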
  5. The method according to claim 1, characterized in that reading the configuration file comprises:
    judging whether a configuration file input by a user has been received;
    if so, reading the configuration file input by the user;
    if not, reading a default configuration file.
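
The fallback in claim 5 can be sketched as below, assuming a JSON configuration. For brevity the default configuration is held in memory as `DEFAULT_CONFIG` rather than read from a separate default file as the claim describes; that constant, its keys, and the function name are hypothetical.

```python
import json
import os

# Hypothetical default values standing in for the default configuration file.
DEFAULT_CONFIG = {"site_id": "default-site", "alarm_threshold": 10}

def load_config(user_path=None):
    """Read the user's JSON config if one was supplied and exists; otherwise use the default."""
    if user_path and os.path.isfile(user_path):
        with open(user_path, encoding="utf-8") as fh:
            return json.load(fh)
    # Return a copy so callers cannot mutate the shared default.
    return dict(DEFAULT_CONFIG)
```
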
  6. The method according to claim 5, characterized by further comprising:
    periodically deleting, by the database, the authorization record parameters.
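
One way to realize the periodic deletion of claim 6 is a retention-based purge that a scheduler calls at a fixed interval. The sketch assumes SQLite and a hypothetical `auth_records` table with an integer timestamp column `ts`; the claim specifies neither the retention rule nor the scheduling mechanism.

```python
import sqlite3

def purge_old_records(conn, now_ts, retention_seconds):
    """Delete authorization records older than the retention window; return rows removed."""
    cur = conn.execute(
        "DELETE FROM auth_records WHERE ts < ?",
        (now_ts - retention_seconds,),
    )
    conn.commit()
    return cur.rowcount
```
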
  7. The method according to claim 6, characterized in that the database includes at least one of a MySQL database, an HBase database, a MongoDB database, and a Redis database.
  8. A system for configurable crawler quality monitoring, characterized by comprising:
    an obtaining and saving module, configured to obtain authorization record parameters generated when a crawler program crawls each website, and to save the authorization record parameters to a database;
    a first reading module, configured to read a configuration file to obtain the website ID to be monitored, the monitoring time period, and the alarm threshold;
    a second reading module, configured to read, from the database, the authorization record parameters of the website corresponding to the website ID within the monitoring time period;
    a judging module, configured to judge whether the authorization record parameters exceed the alarm threshold;
    an alarm module, configured to issue an alarm signal when the authorization record parameters exceed the alarm threshold.
  9. A server for configurable crawler quality monitoring, characterized by comprising:
    a memory, configured to store a computer program;
    a processor, configured to implement, when executing the computer program, the steps of the method for configurable crawler quality monitoring according to any one of claims 1 to 7.
  10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method for configurable crawler quality monitoring according to any one of claims 1 to 7 are implemented.
CN201810007604.7A 2018-01-04 2018-01-04 A kind of method and system of configurationization reptile quality-monitoring Pending CN108228431A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810007604.7A CN108228431A (en) 2018-01-04 2018-01-04 A kind of method and system of configurationization reptile quality-monitoring

Publications (1)

Publication Number Publication Date
CN108228431A true CN108228431A (en) 2018-06-29

Family

ID=62642931

Country Status (1)

Country Link
CN (1) CN108228431A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination