CN110691005A - Website monitoring system and method - Google Patents

Info

Publication number
CN110691005A
Authority
CN
China
Prior art keywords
website
url
link
webpage
terminal system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201910869776.XA
Other languages
Chinese (zh)
Inventor
方宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Maritime Institute
Original Assignee
Jiangsu Maritime Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Maritime Institute
Priority to CN201910869776.XA
Publication of CN110691005A

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00: Arrangements for monitoring or testing data switching networks
    • H04L43/50: Testing arrangements
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06: Management of faults, events, alarms or notifications
    • H04L41/0631: Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a website monitoring system and method that combine link filtering by web page link path depth with a depth-limited recursive traversal, so that the range of links inspected within a website is controlled in both the horizontal and vertical directions. Web pages are loaded and parsed by a headless browser, which makes website monitoring flexible, efficient and accurate, and limiting the recursion depth parameter avoids endless loops caused by circular links within the website. The website monitoring system comprises a front-end subsystem, a database and a back-end subsystem, wherein the front-end subsystem and the back-end subsystem are each connected to the database through a mapping relation. The front-end subsystem is used for setting system parameters and entering operating instructions for detection tasks, writing the instructions into the database so that the back-end subsystem can retrieve and execute them and return feedback, and displaying the historical detection records and information stored in the database to the user. The back-end subsystem takes as input the home page address of the website to be checked and the check-related parameters, and outputs the check information and alarm information, which are stored in the relational database.

Description

Website monitoring system and method
Technical Field
The invention relates to the technical field of webpage monitoring, in particular to a website monitoring system and method.
Background
When large websites are built, operated and debugged, errors in their web pages are hard to find because the sites are structurally large and rich in content; the websites of institutions and governments in particular have deep paths and sub-sites. Manual proofreading usually requires considerable manpower, and proofreading mistakes cannot be avoided.
In the daily inspection and maintenance of a target website, the need to reduce the manual inspection workload is especially pressing. Current web page checks fall into two main categories. In the first, an http request is sent to the target web page link and the returned protocol status code is used to judge whether the link has a problem; the content of the page the link points to is not read. In the second, a headless browser actually loads and parses the page the link points to in order to find problems within the page.
An inspection tool is expected to apply a set of custom rules configured for the characteristics of a specific website, automatically traverse the pages and links within the site, record and display the inspection information, find and report problem pages, and inspect multiple websites periodically and concurrently using multiple threads. Existing inspection tools, however, have the defects that their algorithms can fall into infinite loops and that they mainly traverse in a single direction.
Disclosure of Invention
The invention aims to provide a website monitoring system and method that combine link filtering by web page link path depth with a depth-limited recursive traversal, control the range of links inspected within a website in both the horizontal and vertical directions, and load and parse web pages with a headless browser, so that website monitoring is flexible, efficient and accurate, and the problem of endlessly checking circular links within the website is solved by limiting the recursion depth parameter.
To achieve this purpose, the invention provides the following technical solution. In a first aspect, the invention provides a website monitoring system comprising a front-end subsystem, a database and a back-end subsystem, wherein the front-end subsystem and the back-end subsystem are each connected to the database through a mapping relation;
the front-end subsystem is used for setting system parameters and entering operating instructions for detection tasks, writing the instructions into the database so that the back-end subsystem can retrieve and execute them and return feedback, and displaying the historical detection records and information stored in the database to the user;
the back-end subsystem takes as input the home page address of the website to be checked and the check-related parameters, and outputs the check information and alarm information, which are stored in the relational database.
In the system, the back-end subsystem further comprises a task scheduling module, a task detection module and an alarm module;
the task scheduling module automatically starts the task detection module according to the address of the website to be checked and the scheduled time, and invokes the task detection module to perform automatic detection;
the input end of the task scheduling module is provided with a headless browser, which is used for simulating the behavior of a user actively browsing a web page, loading all information of the web page, and loading, parsing and executing scripts;
the task detection module accesses the target website under detection.
In the system, the front-end subsystem is used for setting task parameters, browsing detection status information and alarm information, and starting/stopping detection tasks.
In a second aspect, the present invention provides a method for monitoring a website, including the following steps:
S1: initialization: preset the web page URLs that are not to be checked as a blacklist and the web page URLs that must be checked as a whitelist, initialize the checked link set, and read the URL of the website home page, the check-related parameters, and all URLs under the listed directories to be checked;
S2: request the website over an http link and check whether the website is reachable;
if the request succeeds, proceed to the next step;
if the request fails, resend the request at the specified time interval up to the specified number of times; once the specified number of times is exceeded, store the exception information, send alarm information, and exit the check loop;
S3: check web page paths according to link depth: a headless browser loads and parses the web page under the specified path, the links of the page are extracted and traversed, and the reachability of the links is checked one by one; the link test results are recorded, and each tested URL is added to the checked link set and stored in the database;
S4: load and parse, one by one, the web pages specified by the URLs in the whitelist; a URL that is already in the checked link set is not checked again;
S5: store the website inspection results in the database; the back-end subsystem calls the alarm module, which judges according to the configured alarm rules whether an alarm is needed and, if so, sends alarm information.
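The retry behaviour of step S2 might be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patented implementation: the function name `check_site_reachable`, its parameters and the injected `probe` callable (standing in for the actual http request) are all hypothetical.

```python
import time
from typing import Callable


def check_site_reachable(
    probe: Callable[[], bool],
    max_attempts: int = 3,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True as soon as one probe succeeds, False once all attempts fail.

    `probe` is a hypothetical stand-in for the http request to the home page.
    """
    for attempt in range(1, max_attempts + 1):
        if probe():
            return True
        if attempt < max_attempts:
            sleep(interval_s)  # wait the specified interval before retrying
    return False
```

When the function returns False, the caller would store the exception information, send the alarm and leave the check loop, as step S2 describes. Injecting `sleep` keeps the sketch testable without real waiting.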
In the above method, in step S3, the links extracted by traversal are filtered so that only URLs that are in-site, whose document type is "text/html", whose protocol is http, and whose URL path depth is less than or equal to the specified value are retained;
for each retained URL, if the value of the recursive call depth variable rdepth is smaller than the specified parameter value, rdepth is incremented by 1, the URL crawling method is called recursively to check the next web page, and rdepth is decremented by 1 after the URL crawling method returns.
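The four filter conditions above (in-site, "text/html", http protocol, bounded path depth) could be expressed roughly as below. This is a hedged sketch: the function name `passes_link_filter` and its parameters are hypothetical, and counting only directory segments as path depth (ignoring a trailing file name) is an assumption, since the text does not define the count precisely.

```python
from urllib.parse import urlparse


def passes_link_filter(url, site_host, content_type, max_path_depth):
    """Keep only in-site http pages of type text/html within the depth limit."""
    parts = urlparse(url)
    if parts.scheme != "http":
        return False  # only the http protocol is followed
    if parts.hostname != site_host:
        return False  # off-site link
    # The content type may carry a charset suffix, e.g. "text/html; charset=utf-8".
    if content_type.split(";")[0].strip().lower() != "text/html":
        return False
    # Assumed depth rule: count directory segments, drop a trailing file name.
    directory = parts.path.rsplit("/", 1)[0]
    depth = len([segment for segment in directory.split("/") if segment])
    return depth <= max_path_depth
```

For instance, with `max_path_depth=2` a page under `/a/b/` would be kept while one under `/a/b/c/` would be dropped.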
Further, the path depth is determined by a formula: the initial path depth pdepth is set to n, and as the recursion depth rdepth increases, pdepth is adjusted to pdepth = log₂(rdepth) + n, with the value rounded to an integer.
Further, in step S3, the links in the web page are extracted from top to bottom and the extracted links are traversed; if a link has not been checked and is not in the URL blacklist, the reachability of each http link is tested one by one, the link test result is recorded, and the tested URL is added to the checked link set.
Furthermore, the input URL is loaded and parsed by the headless browser; if an exception occurs while loading or parsing the web page, the exception information is recorded and the URL crawling method returns.
In a third aspect, an embodiment of the present invention further provides a computer storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the method in the second aspect.
The invention has the technical effects and advantages that:
A depth-limited recursive traversal combined with filtering by link path depth is used to traverse the web page links within the website, so that the important web pages of the website can be checked flexibly, efficiently and accurately.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings that are required to be used in the description of the embodiments will be briefly described below.
FIG. 1 is a block diagram of a system provided by the present invention;
FIG. 2 is a flow chart of the system operation provided by the present invention;
FIG. 3 is a schematic view of the monitoring results provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
A website such as that of a government or a large enterprise has an obvious hierarchy, and the link paths of its web pages are generally hierarchical as well. The home page of the website has link depth 0 and is the most visited and most important page; the deeper the link path of a web page, the lower the department or functional block it belongs to.
The application scenario described in the embodiment of the present invention is for more clearly illustrating the technical solution of the embodiment of the present invention, and does not form a limitation on the technical solution provided in the embodiment of the present invention, and it can be known by a person skilled in the art that with the occurrence of a new application scenario, the technical solution provided in the embodiment of the present invention is also applicable to similar technical problems.
For the above application scenario, this embodiment provides a website monitoring system which, as shown in FIG. 1, comprises a front-end subsystem, a database and a back-end subsystem, wherein the front-end subsystem and the back-end subsystem are each connected to the database through a mapping relation;
the front-end subsystem is used for setting system parameters and entering operating instructions for detection tasks, writing the instructions into the database so that the back-end subsystem can retrieve and execute them and return feedback, and displaying the historical detection records and information stored in the database to the user;
the back-end subsystem takes as input the home page address of the website to be checked and the check-related parameters, and outputs the check information and alarm information, which are stored in the relational database.
The back-end subsystem further comprises a task scheduling module, a task detection module and an alarm module;
the task scheduling module automatically starts the task detection module according to the address of the website to be checked and the scheduled time, and invokes the task detection module to perform automatic detection;
the input end of the task scheduling module is provided with a headless browser, which is used for simulating the behavior of a user actively browsing a web page, loading all information of the web page, and loading, parsing and executing scripts;
the task detection module accesses the target website under detection.
It should be noted that a headless browser is a real browser without a user interface. It can automatically simulate a person accessing a web page through a browser, automatically loading and parsing all the information in a page (HTML, CSS, JavaScript, images and so on) and executing its scripts, so that every part of the page can be checked comprehensively. Parameters can also control whether JavaScript is executed, whether images are loaded, and so on, in order to balance checking performance against accuracy.
The headless browser used for loading and parsing web pages is HtmlUnit, which is faster and consumes fewer resources.
Alternatively, the WebDriver-driven headless browsers PhantomJS and Chrome can find problems in web pages more accurately; they are not described further here and may be selected according to project requirements.
The front-end subsystem is used for setting task parameters, browsing detection status information and alarm information, and starting/stopping detection tasks.
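The hand-off described above, in which the front-end subsystem writes instructions into the database and the back-end subsystem retrieves and executes them, might look roughly like the sketch below. It uses Python's built-in `sqlite3` module with an in-memory database purely for illustration; the table `task_command`, its columns and both function names are hypothetical, and the text does not specify the schema or the mapping layer.

```python
import sqlite3

# In-memory database standing in for the shared relational database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_command ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " site_url TEXT NOT NULL,"
    " action TEXT NOT NULL,"  # hypothetical values: 'start' or 'stop'
    " handled INTEGER NOT NULL DEFAULT 0)"
)


def submit_command(conn, site_url, action):
    """Front-end side: write an instruction into the database."""
    conn.execute(
        "INSERT INTO task_command (site_url, action) VALUES (?, ?)",
        (site_url, action),
    )
    conn.commit()


def fetch_pending_commands(conn):
    """Back-end side: read unhandled instructions and mark them handled."""
    rows = conn.execute(
        "SELECT id, site_url, action FROM task_command"
        " WHERE handled = 0 ORDER BY id"
    ).fetchall()
    conn.executemany(
        "UPDATE task_command SET handled = 1 WHERE id = ?",
        [(row[0],) for row in rows],
    )
    conn.commit()
    return [(site_url, action) for _, site_url, action in rows]
```

Decoupling the two subsystems through a table like this means the front end never calls the back end directly; the scheduling module would poll for pending rows on its own timetable.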
Based on the same inventive concept, the present embodiment further provides a method for monitoring a website, as shown in fig. 2, including the following steps:
S1: initialization: preset the web page URLs that are not to be checked as a blacklist and the web page URLs that must be checked as a whitelist, initialize the checked link set, and read the URL of the website home page, the check-related parameters, and all URLs under the listed directories to be checked;
S2: request the website over an http link and check whether the website is reachable;
if the request succeeds, proceed to the next step;
if the request fails, resend the request at the specified time interval up to the specified number of times; once the specified number of times is exceeded, store the exception information, send alarm information, and exit the check loop;
S3: check web page paths according to link depth: a headless browser loads and parses the web page under the specified path, the links of the page are extracted and traversed, and the reachability of the links is checked one by one; the link test results are recorded, and each tested URL is added to the checked link set and stored in the database;
S4: load and parse, one by one, the web pages specified by the URLs in the whitelist; a URL that is already in the checked link set is not checked again;
S5: store the website inspection results in the database; the back-end subsystem calls the alarm module, which judges according to the configured alarm rules whether an alarm is needed and, if so, sends alarm information.
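The state set up in step S1 and the whitelist pass of step S4 can be illustrated with a small sketch. The class `InspectionTask`, the function `check_whitelist` and the injected `check_page` callable (standing in for the headless-browser load and parse) are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class InspectionTask:
    """State initialised in step S1 (field names are illustrative)."""
    home_url: str
    blacklist: set = field(default_factory=set)    # URLs never checked
    whitelist: list = field(default_factory=list)  # URLs that must be checked
    checked: set = field(default_factory=set)      # checked link set


def check_whitelist(task, check_page):
    """Step S4: check each whitelisted URL once, skipping checked ones."""
    newly_checked = []
    for url in task.whitelist:
        if url in task.checked:
            continue  # already covered by the traversal in step S3
        check_page(url)
        task.checked.add(url)
        newly_checked.append(url)
    return newly_checked
```

Sharing one `checked` set between the traversal and the whitelist pass is what prevents a page from being loaded twice.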
In step S3 of the above implementation, the links extracted by traversal are filtered so that only URLs that are in-site, whose document type is "text/html", whose protocol is http, and whose URL path depth is less than or equal to the specified value are retained;
for each retained URL, if the value of the recursive call depth variable rdepth is smaller than the specified parameter value, rdepth is incremented by 1, the URL crawling method is called recursively to check the next web page, and rdepth is decremented by 1 after the URL crawling method returns.
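The rdepth bookkeeping described above can be sketched as a small recursive crawler. This is an illustrative Python sketch, not the actual implementation: the class `Crawler`, the injected `load_links` callable (standing in for headless-browser loading plus link filtering) and the toy site graph are assumptions. A failed load is recorded and the method simply returns.

```python
class Crawler:
    def __init__(self, load_links, max_rdepth):
        self.load_links = load_links  # url -> filtered links found on the page
        self.max_rdepth = max_rdepth  # the specified recursion depth limit
        self.rdepth = 0               # current recursive call depth
        self.checked = set()          # checked link set
        self.errors = []              # (url, message) for pages that failed

    def crawl_url(self, url):
        self.checked.add(url)
        try:
            links = self.load_links(url)
        except Exception as exc:  # load or parse failure: record and return
            self.errors.append((url, str(exc)))
            return
        for link in links:
            if link in self.checked:
                continue  # never re-check a page, which also breaks cycles
            if self.rdepth < self.max_rdepth:
                self.rdepth += 1      # going one level deeper
                self.crawl_url(link)
                self.rdepth -= 1      # restore the depth after returning


# Toy site with a cycle ("/" <-> "/a/"); "/a/x/y/" is deliberately missing.
site = {"/": ["/a/", "/b/"], "/a/": ["/", "/a/x/"], "/a/x/": ["/a/x/y/"], "/b/": ["/a/"]}
crawler = Crawler(lambda url: site[url], max_rdepth=2)
crawler.crawl_url("/")
```

With `max_rdepth=2` the crawl stops before `/a/x/y/`, and the cycle between `/` and `/a/` cannot cause endless recursion because both the checked set and the depth limit bound the traversal.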
Specifically, the path depth is determined by a formula: the initial path depth pdepth is set to n, and as the recursion depth rdepth increases, pdepth is adjusted to pdepth = log₂(rdepth) + n, with the value rounded to an integer.
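A worked version of the formula may help. The sketch below is one possible reading: the text only says the value is "rounded", so rounding to the nearest integer is an assumption, as is returning the initial value n while rdepth is below 1 (where the logarithm is undefined). The function name `path_depth_limit` is hypothetical.

```python
import math


def path_depth_limit(rdepth: int, n: int) -> int:
    """pdepth = round(log2(rdepth)) + n, falling back to n when rdepth < 1."""
    if rdepth < 1:
        return n  # initial path depth, before any recursion
    return round(math.log2(rdepth)) + n
```

Under this reading, with n = 2 the limit stays at 2 for rdepth 0 and 1, becomes 3 at rdepth 2, and 4 at rdepth 3 and 4, so the allowed path depth grows only logarithmically with the recursion depth.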
In the above embodiment, the links in the web page are extracted from top to bottom and from left to right, and the extracted links are traversed; if a link has not been checked and is not in the URL blacklist, the reachability of each http link is tested one by one (anchor links, mail addresses, https links and the like are not tested), the link test result (http status code, content type, connection time, whether a timeout occurred, and so on) is recorded, and the tested URL is added to the checked link set.
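The link test loop and the recorded fields might be sketched as below. The record type `LinkResult`, the function `verify_links` and the injected `probe` callable (standing in for the real http request) are hypothetical; only the recorded fields and the skip rules come from the description.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LinkResult:
    url: str
    status_code: Optional[int]   # None when no response was received
    content_type: Optional[str]
    connect_ms: float            # connection time in milliseconds
    timed_out: bool


def verify_links(links, checked, blacklist, probe: Callable[[str], LinkResult]):
    """Test each unchecked, non-blacklisted http link once, in page order."""
    results = []
    for url in links:
        if not url.startswith("http://"):
            continue  # anchors, mail addresses and https links are not tested
        if url in checked or url in blacklist:
            continue
        results.append(probe(url))
        checked.add(url)  # remember the URL so it is not tested again
    return results
```

Passing the probe in as a parameter keeps the traversal logic independent of whichever http client or headless browser performs the request.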
The web page content corresponding to the input URL is loaded and parsed by the headless browser; if an exception occurs while loading or parsing the web page, the exception information is recorded and the URL crawling method returns.
The system is implemented mainly in Java and automatically monitors government and school portal websites according to the method above.
As shown in FIG. 3, link path depth is exemplified by http://www.nanjing.gov.cn with a path depth of 2: /xxgkn/ has depth 1 and /xxgkn/jgld/ has depth 2.
The approach is based on the observation that the shallower the link path depth, the more important the web page it points to generally is.
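The path depths in the example above can be computed from a URL as in the following sketch. The function name `url_path_depth` is hypothetical, and treating a trailing file name as not adding depth is an assumption; the home page depth of 0 follows the statement earlier in this description.

```python
from urllib.parse import urlparse


def url_path_depth(url: str) -> int:
    """Count the directory segments in the URL path; the home page is depth 0."""
    path = urlparse(url).path
    directory = path.rsplit("/", 1)[0]  # drop a trailing file name, if any
    return sum(1 for segment in directory.split("/") if segment)
```

Applied to the example site, the home page gives 0, `/xxgkn/` gives 1 and `/xxgkn/jgld/` gives 2.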
Further, an embodiment of the present invention also provides a computer storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the method in the above embodiment.
The present application is described above with reference to block diagrams and/or flowchart illustrations of methods, apparatus (systems) and/or computer program products according to embodiments of the application. It will be understood that one block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, and/or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer and/or other programmable data processing apparatus, create means for implementing the functions/acts specified in the block diagrams and/or flowchart block or blocks.
Accordingly, the subject application may also be embodied in hardware and/or in software (including firmware, resident software, micro-code, etc.).
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (8)

1. A website monitoring system comprising a front-end subsystem, a database and a back-end subsystem, characterized in that: the front-end subsystem and the back-end subsystem are each connected to the database through a mapping relation;
the front-end subsystem is used for setting system parameters and entering operating instructions for detection tasks, writing the instructions into the database so that the back-end subsystem can retrieve and execute them and return feedback, and displaying the historical detection records and information stored in the database to the user;
the back-end subsystem takes as input the home page address of the website to be checked and the check-related parameters, and outputs the check information and alarm information, which are stored in the relational database.
2. The website monitoring system according to claim 1, characterized in that: the back-end subsystem further comprises a task scheduling module, a task detection module and an alarm module;
the task scheduling module automatically starts the task detection module according to the address of the website to be checked and the scheduled time, and invokes the task detection module to perform automatic detection;
the input end of the task scheduling module is provided with a headless browser, which is used for simulating the behavior of a user actively browsing a web page, loading all information of the web page, and loading, parsing and executing scripts;
the task detection module accesses the target website under detection.
3. The website monitoring system according to claim 1, characterized in that: the front-end subsystem is used for setting task parameters, browsing detection status information and alarm information, and starting/stopping detection tasks.
4. A method for the website monitoring system according to any one of claims 1-3, characterized by comprising the following steps:
S1: initialization: preset the web page URLs that are not to be checked as a blacklist and the web page URLs that must be checked as a whitelist, initialize the checked link set, and read the URL of the website home page, the check-related parameters, and all URLs under the listed directories to be checked;
S2: request the website over an http link and check whether the website is reachable;
if the request succeeds, proceed to the next step;
if the request fails, resend the request at the specified time interval up to the specified number of times; once the specified number of times is exceeded, store the exception information, send alarm information, and exit the check loop;
S3: check web page paths according to link depth: a headless browser loads and parses the web page under the specified path, the links of the page are extracted and traversed, and the reachability of the links is checked one by one; the link test results are recorded, and each tested URL is added to the checked link set and stored in the database;
S4: load and parse, one by one, the web pages specified by the URLs in the whitelist; a URL that is already in the checked link set is not checked again;
S5: store the website inspection results in the database; the back-end subsystem calls the alarm module, which judges according to the configured alarm rules whether an alarm is needed and, if so, sends alarm information.
5. The method for the website monitoring system according to claim 3, characterized in that: in step S3, the links extracted by traversal are filtered so that only URLs that are in-site, whose document type is "text/html", whose protocol is http, and whose URL path depth is less than or equal to the specified value are retained;
for each retained URL, if the value of the recursive call depth variable rdepth is smaller than the specified parameter value, rdepth is incremented by 1, the URL crawling method is called recursively to check the next web page, and rdepth is decremented by 1 after the URL crawling method returns.
6. The method for the website monitoring system according to claim 5, characterized in that: the path depth is determined by a formula: the initial path depth pdepth is set to n, and as the recursion depth rdepth increases, pdepth is adjusted to pdepth = log₂(rdepth) + n, with the value rounded to an integer.
7. The method for the website monitoring system according to claim 3, characterized in that: in step S3, with the website home page as the starting point, the links in the web page are extracted from top to bottom and from left to right, and the extracted links are traversed; if a link has not been checked and is not in the URL blacklist, the reachability of each http link is tested one by one, the link test result is recorded, and the tested URL is added to the checked link set.
8. The method for the website monitoring system according to claim 3, characterized in that: the web page content corresponding to the input URL is loaded and parsed by a headless browser; if an exception occurs while loading or parsing the web page, the exception information is recorded and the URL crawling method returns.
CN201910869776.XA 2019-09-16 2019-09-16 Website monitoring system and method Withdrawn CN110691005A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910869776.XA CN110691005A (en) 2019-09-16 2019-09-16 Website monitoring system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910869776.XA CN110691005A (en) 2019-09-16 2019-09-16 Website monitoring system and method

Publications (1)

Publication Number Publication Date
CN110691005A (en) 2020-01-14

Family

ID=69109180

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910869776.XA Withdrawn CN110691005A (en) 2019-09-16 2019-09-16 Website monitoring system and method

Country Status (1)

Country Link
CN (1) CN110691005A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112699280A (en) * 2020-12-31 2021-04-23 北京天融信网络安全技术有限公司 Website monitoring method, website map establishing method and device and electronic equipment
CN114595253A (en) * 2022-02-22 2022-06-07 深圳海域信息技术有限公司 Brand monitoring method, brand monitoring device, electronic equipment and medium

Similar Documents

Publication Publication Date Title
US7962547B2 (en) Method for server-side logging of client browser state through markup language
CN111061526A (en) Automatic testing method and device, computer equipment and storage medium
CN107918575B (en) Page state monitoring method and device
US20140129620A1 (en) Indicating coverage of web application testing
CN105335280A (en) Program performance test method and device
CN110147327B (en) Multi-granularity-based web automatic test management method
CN104301175A (en) WEB service system simulation monitoring method based on browser
CN110691005A (en) Website monitoring system and method
CN112486789A (en) Log analysis system, method and device
CN110740081A (en) Data visualization method for page performance of standard multiple companies
CN103559228A (en) Loading method and device for label pages in browsers
CN104407979B (en) script detection method and device
CN110708270B (en) Abnormal link detection method and device
CN116016270A (en) Switch test management method and device, electronic equipment and storage medium
CN116048959A (en) Website testing method, device, equipment and storage medium
CN115525528A (en) Page quality detection method and device, electronic equipment and storage medium
Bartoli et al. Recording and replaying navigations on AJAX web sites
CN112347326B (en) Crawler detection method and device based on browser end
CN113704760B (en) Page detection method and related device
CN111125590A (en) Method and device for drawing thermodynamic diagram
Ferrucci et al. A crawljax based approach to exploit traditional accessibility evaluation tools for AJAX applications
WO2024045954A1 (en) Method and apparatus for obtaining secondary page, and computer device
CN113986603B (en) Method and device for determining page loading abnormity reason and storage medium
CN109257317B (en) Method and device for detecting phishing website of mobile internet
CN116980320A (en) Website operation test method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200114