CN106960058A - Webpage structure change detection method and system - Google Patents

Webpage structure change detection method and system

Info

Publication number
CN106960058A
Authority
CN
China
Prior art keywords
web page
dom tree
tree
node
webpage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710216863.6A
Other languages
Chinese (zh)
Other versions
CN106960058B (en)
Inventor
范晓忻
朱志伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kim Union Bank (beijing) Information Technology Co Ltd
Original Assignee
Kim Union Bank (beijing) Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kim Union Bank (beijing) Information Technology Co Ltd
Priority to CN201710216863.6A
Publication of CN106960058A
Application granted
Publication of CN106960058B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F16/986 Document structures and storage, e.g. HTML extensions

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The disclosure relates to a method and system for detecting changes in web page structure. The method includes: obtaining and parsing the HTML code of a web page to obtain web page data; extracting the web page structure of each tag from the web page data and building a DOM tree; matching the tree structure of that DOM tree against a pre-stored DOM tree structure of the web page data; and determining the change in web page structure from the matching result. The disclosure enables fast checking for web page structure changes and precise location of where a change occurred.

Description

Webpage structure change detection method and system
Technical field
The present disclosure relates to the field of web page detection, and in particular to a method and system for detecting changes in web page structure.
Background
Web page structure is the layout of web page content; creating a web page structure means planning that layout. It is an important step in page optimization: it directly affects the user experience and relevance of the page and, to some extent, the overall structure of the website and the number of its pages that get indexed. Structurally, a web page consists mainly of three elements: the navigation bar, the columns (sections), and the body content. Creating the structure and planning the content layout both revolve around these three elements.
In practice, web page structure is the arrangement of these three basic page components: navigation bar, columns, and body content. Depending on where the content emphasis lies, web pages fall into three types: navigation-oriented, content-oriented, and combined navigation and content.
As for column structure, an enterprise website should generally have no more than 8 top-level columns, and no more than three levels of columns. The column structure is the foundation of a website's structure and of its navigation system, and it should be reasonable and clearly layered. Research on website column structure is a foundation of marketing-oriented website construction. As for layout, in traditional HTML-based website design the page structure is usually positioned with tables or frames, with table positioning being the mainstream; in XHTML-based website design, the typical positioning method uses layers.
However, the structure of a web page is usually adjusted to fit its content, and different content calls for a different structure. When the content of a page changes, its structure may be adjusted as well. This creates difficulty for applications that need to scrape web page content: if an application keeps scraping with the old structure after the structure has changed, it may obtain wrong content data. A scheme that detects web page structure changes quickly and accurately is therefore urgently needed, to solve the prior-art problem that such changes cannot be accurately identified.
Summary of the invention
To overcome the problems in the related art, embodiments of the present disclosure provide a method and system for detecting changes in web page structure.
According to a first aspect of the embodiments of the present disclosure, a web page structure change detection method is provided, including:
obtaining and parsing the HTML code of a web page to obtain web page data;
extracting, from the web page data, the web page structure of each tag and building a DOM tree;
matching the tree structure of the DOM tree against a pre-stored DOM tree structure of the web page data;
determining the location of the web page structure change according to the matching result.
Obtaining and parsing the HTML code of the web page includes:
logging in to the web server and requesting the HTML code of the web page;
storing the HTML code of the web page in a preset format.
Extracting the web page structure of each tag from the web page data and building the DOM tree includes:
obtaining each HTML tag in the web page data;
obtaining the corresponding web page structure from the HTML tags;
cleaning the web page structure by removing the attribute values and text content nodes of each HTML tag, to obtain structure data;
building the DOM tree from the structure data.
Matching the tree structure of the DOM tree against the pre-stored DOM tree structure of the web page data includes:
matching the tree structures of DOM trees obtained at different points in time.
Matching the tree structure of the DOM tree against the pre-stored DOM tree structure of the web page data also includes:
matching according to a tree structure matching algorithm, which specifically includes:
comparing the MD5 values of the two DOM trees;
when the MD5 values of the two DOM trees are found to be inconsistent, recursively comparing MD5 values node by node, from parent nodes to child nodes and from child nodes down to leaf nodes;
storing the nodes whose MD5 values are inconsistent in a structure change set.
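The recursive MD5 comparison described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation. The nested-tuple tree representation `(tag, [children])`, the function names `subtree_md5` and `diff_trees`, the path notation, and the rule of recording the deepest node at which the tag or the number of children differs are all assumptions made for this example.

```python
import hashlib


def subtree_md5(node):
    """Return the MD5 hex digest of a (tag, children) subtree."""
    tag, children = node
    digest = hashlib.md5(tag.encode("utf-8"))
    for child in children:
        # Fold each child's digest in, in document order.
        digest.update(b"(" + subtree_md5(child).encode("ascii") + b")")
    return digest.hexdigest()


def diff_trees(old, new, path=None, changes=None):
    """Collect paths of the deepest nodes whose MD5 values disagree."""
    if changes is None:
        changes = []
    if path is None:
        path = "/" + old[0]
    if subtree_md5(old) == subtree_md5(new):
        return changes  # equal digests: treat the subtrees as identical
    old_tag, old_children = old
    new_tag, new_children = new
    if old_tag != new_tag or len(old_children) != len(new_children):
        changes.append(path)  # the change is located at this node
        return changes
    # Same tag and same child count: the difference is further down.
    for index, (o, n) in enumerate(zip(old_children, new_children)):
        diff_trees(o, n, f"{path}/{o[0]}[{index}]", changes)
    return changes
```

The returned list plays the role of the structure change set. The sketch recomputes digests at every level for brevity; a production version would likely cache each node's digest once.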
Determining the change in web page structure according to the matching result includes:
traversing the DOM tree structures and comparing corresponding nodes; when corresponding nodes differ, determining that the web page structure has changed at that node.
According to another aspect of the embodiments of the present disclosure, a web page structure change detection system is provided, including:
an HTML parsing module, configured to obtain and parse the HTML code of a web page to obtain web page data;
a DOM tree building module, configured to extract the web page structure of each tag from the web page data and build a DOM tree;
a matching module, configured to match the tree structure of the DOM tree against a pre-stored DOM tree structure of the web page data;
a detection module, configured to determine the change in web page structure according to the matching result.
The HTML parsing module includes:
a request unit, configured to log in to the web server and request the HTML code of the web page;
a storage unit, configured to store the HTML code of the web page in a preset format.
The DOM tree building module includes:
a tag obtaining unit, configured to obtain each HTML tag in the web page data;
a web page structure obtaining unit, configured to obtain the corresponding web page structure from the HTML tags;
a cleaning unit, configured to clean the web page structure by removing the attribute values and text content nodes of each HTML tag, to obtain structure data;
a DOM tree building unit, configured to build the DOM tree from the structure data.
The matching module includes:
a matching unit, configured to compare the MD5 values of the whole DOM trees;
a verification unit, configured to, when the MD5 values of the DOM trees are found to be inconsistent, recursively verify MD5 values node by node, from parent nodes to child nodes and from child nodes down to leaf nodes;
a collection unit, configured to store the nodes whose MD5 values are inconsistent in a structure change set.
The technical solutions provided by the embodiments of the present disclosure can have the following beneficial effects:
The disclosure obtains web page data by obtaining and parsing the HTML code of a web page; extracts the web page structure of each tag from the web page data and builds a DOM tree; matches the tree structure of the DOM tree against a pre-stored DOM tree structure of the web page data; and determines the change in web page structure from the matching result. The scheme detects web page structure changes quickly: it builds the DOM tree from the tag content of the page and confirms structure changes by comparing DOM tree structures. It gives applications that depend on web page structure detection a good detection tool, offering fast checking for structure changes and precise location of the change.
It should be understood that the general description above and the detailed description below are exemplary and explanatory only and do not limit the disclosure.
Brief description of the drawings
The accompanying drawings are incorporated into and form part of this specification. They show embodiments consistent with the disclosure and, together with the specification, serve to explain its principles.
Fig. 1 is a flowchart of a web page structure change detection method according to an exemplary embodiment.
Fig. 2 is a schematic diagram of a web page DOM tree structure according to an exemplary embodiment.
Fig. 3 is a schematic diagram of another web page DOM tree structure according to an exemplary embodiment.
Fig. 4 is a flowchart of the tree structure matching algorithm according to an exemplary embodiment.
Fig. 5 is a structural diagram of a web page structure change detection system according to an exemplary embodiment.
Detailed description
Exemplary embodiments are described in detail here, with examples illustrated in the accompanying drawings. In the following description, when the drawings are referred to, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the disclosure as detailed in the appended claims.
Fig. 1 is a flowchart of a web page structure change detection method according to an exemplary embodiment, which includes:
Step 11: obtain and parse the HTML code of a web page to obtain web page data.
In this embodiment, the web page data of the page to be detected must be obtained first. Because most existing web pages are written in HTML, the first step is to obtain the HTML code of the page.
HTML (HyperText Markup Language) is an application of the Standard Generalized Markup Language (SGML). It is also a specification, a standard, that uses tags to mark the parts of the web page to be displayed. A web page file is itself a text file; markup added to the text tells the browser how to display the content (for example, how text is processed, how the layout is arranged, and how images are displayed).
A web page corresponds to one or more HTML files. HTML files use the extension .htm (an abbreviation imposed by the DOS file name limit) or .html. Any text editor that can produce plain-text source files can be used to create an HTML document; only the file extension needs to be changed. A standard HTML file has a basic overall structure. Tags generally occur in pairs (with some exceptions, such as <br/>): the start and end tags of the HTML document itself, and the tags of its two main parts, the head and the body. Three pairs of tags thus define the overall structure of the page.
From the HTML code of a web page, the page's content and its structural framework can be obtained. This data needs to be parsed and sorted by category: the content data and the structure data of the page are separated and saved in a preset format. The saved data is the web page data. The web page data requires further processing before specific content can be parsed from it.
Typically, the system logs in to the web page to be detected, requests its HTML code, and stores the HTML code after classifying it.
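A minimal sketch of the request-and-store step might look like the following. The source does not specify the "preset format", so JSON is used here purely as an assumption; the function names `fetch_html`, `store_snapshot`, and `load_snapshot`, the record fields, and the file naming scheme are likewise hypothetical. Logging in to the server is not modeled.

```python
import json
import time
import urllib.request
from pathlib import Path


def fetch_html(url, timeout=10.0):
    """Request the HTML code of a page and decode it to text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def store_snapshot(url, html, directory, fetched_at=None):
    """Save one snapshot as JSON (the assumed preset format)."""
    if fetched_at is None:
        fetched_at = time.time()
    record = {"url": url, "fetched_at": fetched_at, "html": html}
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"snapshot_{int(fetched_at)}.json"
    path.write_text(json.dumps(record, ensure_ascii=False), encoding="utf-8")
    return path


def load_snapshot(path):
    """Read a stored snapshot back into a dictionary."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Keeping the fetch time in each record makes it possible to pair snapshots of the same page taken at different times, which the later matching step relies on.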
Step 12: extract the web page structure of each tag from the web page data and build a DOM tree.
In one embodiment, the web page data is mapped to a specific web page structure according to the content of its tags, and a different DOM tree is built for each different web page structure.
The HTML tag is the most basic unit, and the most important component, of the HTML language. HTML tags generally have the following characteristics:
a tag is a keyword enclosed in angle brackets, such as <html>;
tags usually occur in pairs, such as <div> and </div>;
the first tag of a pair is the start tag and the second is the end tag;
start and end tags are also called opening tags and closing tags;
some tags stand alone, such as <img src="Baidupedia.GIF"/>;
for paired tags, the content sits between the two tags; for standalone tags, values are assigned in the tag's attributes, for example <h1>Title</h1> and <input type="text" value="button"/>;
the content of a web page must be inside the <html> tag; information such as the title, character encoding, language, compatibility, keywords, and description goes in the <head> tag, and the content to be displayed is nested in the <body> tag.
These HTML tags define the concrete structural nodes of the web page; from them, the structure and content of the page can be determined accurately.
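The tag kinds listed above (start tags, end tags, and standalone tags) can be observed with Python's standard `html.parser` module. The sketch below is only an illustration of obtaining each HTML tag from a page; the class name `TagLister`, the function `list_tags`, and the event labels are hypothetical and not part of the source.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Collects a (kind, tag) pair for every tag in an HTML string."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # Attribute values are ignored: only the tag name matters here.
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_startendtag(self, tag, attrs):
        # Called for self-closing syntax such as <br/> or <img ... />.
        self.events.append(("standalone", tag))


def list_tags(html):
    """Return the sequence of tag events found in the HTML text."""
    parser = TagLister()
    parser.feed(html)
    parser.close()
    return parser.events
```

Note that the parser lowercases tag names and that text between tags never reaches the event list, which already matches the goal of keeping structure and discarding content.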
The DOM (Document Object Model), here the HTML DOM, is a document object model designed for HTML/XHTML. It treats every element in a web page as an object, so that the elements can be read or edited by a programming language. The DOM is a collection of nodes, or pieces of information, organized hierarchically. This hierarchy lets developers navigate the tree to find specific information. Analyzing the structure usually requires loading the whole document and constructing the hierarchy before any other work can be done. Because it is based on a hierarchy of information, the DOM is described as tree-based or object-based.
The DOM tree is the hierarchical structure of an HTML page. It consists of elements, attributes, and text, each of which is a node, much like the organization chart of a company. The input web page is preprocessed: the web page structure of each tag is extracted and stored as a DOM tree (that is, a multiway tree). The following HTML code is a simple table; after preprocessing, its DOM tree can be extracted, as shown in Fig. 2.
Each tag in the HTML code is a DOM node, and each node can contain other nodes, just as a trunk can grow further branches. Starting from the root <html> tag, any tag in the current page can be found in the DOM tree.
Specifically, the DOM tree reflects the HTML as a tree structure, which makes subsequent comparison easy. Each HTML tag in the web page data is obtained; the corresponding web page structure is obtained from the HTML tags; and the DOM tree is built from that web page structure.
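One way to build such a structure-only tree is sketched below with Python's standard `html.parser`. It is an illustrative sketch under stated assumptions, not the method of the source: the `Node` class, the `#document` root, the `VOID_TAGS` set (the HTML void elements, which never take an end tag), and the helper `to_tuple` are all introduced for this example. Attributes are dropped and text is ignored, so only the tag hierarchy remains.

```python
from html.parser import HTMLParser

# HTML void elements: they have no end tag and cannot contain children.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Node:
    """One tag in the structure-only tree."""

    def __init__(self, tag):
        self.tag = tag
        self.children = []


class StructureBuilder(HTMLParser):
    """Builds a tree of tag names, discarding attributes and text."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)  # attrs are deliberately not stored
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_startendtag(self, tag, attrs):
        self.stack[-1].children.append(Node(tag))

    def handle_endtag(self, tag):
        # Close the nearest matching open tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break


def build_structure_tree(html):
    builder = StructureBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def to_tuple(node):
    """Convert a Node tree to nested (tag, [children]) tuples."""
    return (node.tag, [to_tuple(child) for child in node.children])
```

Because `handle_data` is not overridden, text nodes are silently skipped, which corresponds to the cleaning step that removes attribute values and text content.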
Step 13: match the tree structure of the DOM tree against the pre-stored DOM tree structure of the web page data.
In one embodiment, after the tree structure of a DOM tree has been built, it must be compared with the tree structure of a DOM tree obtained in another time period or at another point in time before it can be confirmed whether a change has occurred. The tree structure from the other time period or time point can be obtained in advance or set by the user.
DOM tree structures obtained in different periods may be the same or may differ, which has to be determined by a specific matching algorithm. Once the DOM tree structure of a web page can be extracted, the DOM tree structure of the same page can be extracted at different times. The DOM tree structures of the same page at different times are matched to find the positions of the nodes that changed.
For example, take the web page corresponding to the code above as the earlier page and the web page corresponding to the code below as the newer page. The code of the newer page follows; compared with the page above, it has one line fewer:
The DOM tree structure of the page above can be extracted in the web page preprocessing stage, as shown in Fig. 3. The two trees can then be matched and compared by the DOM tree matching algorithm, as shown in Fig. 4:
First, read the DOM tree structures, namely the pre-stored DOM tree structure and the DOM tree structure of the web page to be detected, denoted A and B.
Clean the content out of the DOM tree structures, that is, remove the specific content and keep only the tree structure: for each DOM tree, remove the attribute values and text content nodes of every tag, keeping a DOM tree of HTML tag structure only.
Obtain the MD5 value of each node of the two DOM tree structures and compare them one by one to confirm whether they are consistent. If they are consistent, the flow ends and the web page structure has not changed.
If they are inconsistent, determine whether the recursion over MD5 values has completed. If it has, this was the comparison of the last node and the flow ends; otherwise, the flow continues.
Look up the nodes at the same position in the two DOM trees, one by one and in order.
Determine whether the node is a leaf node; if it is, determine whether the numbers of sibling nodes are consistent.
When the node is not a leaf node, continue by comparing the MD5 values of all nodes below it. When the values are inconsistent, go on to determine whether the numbers of sibling nodes are consistent. When they are consistent, return to the step that checks whether the recursion has completed.
When the node is a leaf node, or when the MD5 values of the nodes below it are inconsistent, go on to determine whether the numbers of sibling nodes are consistent. If they are consistent, return to the step that checks whether the recursion has completed. Otherwise, the flow continues.
Compute the difference between the two DOM tree structures A and B.
Store the position in the DOM tree structure currently being detected together with the DOM change difference.
In this way, the difference between the DOM tree structure to be detected and the pre-stored DOM tree structure is obtained, and with it the location of the web page structure change.
MD5 is Message-Digest Algorithm 5, used to check that information has been transmitted completely and consistently. A typical application of MD5 is to produce a message digest for a piece of information so that tampering can be detected. The MD5 value of a file acts as a "digital fingerprint" of that file: different files have, in practice, different MD5 values, and if anyone makes any change to a file, its MD5 value changes accordingly. In this embodiment, comparing the MD5 values of each node confirms whether each node has changed.
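As a small illustration of the "digital fingerprint" idea, Python's standard `hashlib` module can compute MD5 digests of two structure descriptions and compare them. The helper names `md5_hex` and `has_changed`, and the idea of serializing a structure as a plain string, are assumptions made for this example.

```python
import hashlib


def md5_hex(text):
    """Return the MD5 digest of a text string as 32 hex characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def has_changed(old_text, new_text):
    """Report whether two texts have different MD5 fingerprints."""
    return md5_hex(old_text) != md5_hex(new_text)
```

Comparing two fixed-length digests is cheap, which is why a single comparison at the root can rule out any change before the tree is walked. MD5 is adequate for spotting accidental or routine changes of this kind, although it is no longer considered safe against deliberately crafted collisions.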
Step 14: determine the change in web page structure according to the matching result.
The matching result is obtained from the output of the matching algorithm: the DOM tree structures are traversed and corresponding nodes compared, and when corresponding nodes differ, it is determined that the web page structure has changed at that node.
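The traversal in Step 14 can be sketched as a parallel walk over two trees that reports where corresponding nodes differ. The sketch below assumes trees given as nested `(tag, [children])` tuples and compares children strictly by position; the function name `locate_changes`, the index-based path notation, and the change labels are hypothetical.

```python
from itertools import zip_longest


def locate_changes(old, new, path="root"):
    """Return (path, kind, detail) for each position where trees differ."""
    if old is None:
        return [(path, "added", new[0])]
    if new is None:
        return [(path, "removed", old[0])]
    if old[0] != new[0]:
        return [(path, "replaced", f"{old[0]}->{new[0]}")]
    changes = []
    # Pair children by position; a missing partner is filled with None.
    for index, (o, n) in enumerate(zip_longest(old[1], new[1])):
        changes.extend(locate_changes(o, n, f"{path}/{index}"))
    return changes
```

Because children are paired by position, a node inserted in the middle of a sibling list shows up as a series of replacements followed by an addition. That is usually enough to flag that a scraper's assumptions about the page no longer hold, but a smarter alignment would be needed for a minimal diff.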
The core of this embodiment is the web page structure change detection process: building a DOM tree from the HTML code and matching with the tree structure matching algorithm to confirm whether a change has occurred.
The disclosure obtains web page data by obtaining and parsing the HTML code of a web page; extracts the web page structure of each tag from the web page data and builds a DOM tree; matches the tree structure of the DOM tree against a pre-stored DOM tree structure of the web page data; and determines the change in web page structure from the matching result. The scheme detects web page structure changes quickly: it builds the DOM tree from the tag content of the page and confirms structure changes by comparing DOM tree structures. It gives applications that depend on web page structure detection a good detection tool, offering fast checking for structure changes and precise location of the change.
Further, Fig. 5 is a structural diagram of a web page structure change detection system according to an exemplary embodiment, in which:
the HTML parsing module 21 is configured to obtain and parse the HTML code of a web page to obtain web page data;
the DOM tree building module 22 is configured to extract the web page structure of each tag from the web page data and build a DOM tree;
the matching module 23 is configured to match the tree structure of the DOM tree against a pre-stored DOM tree structure of the web page data;
the detection module 24 is configured to determine the change in web page structure according to the matching result.
Further, the HTML parsing module 21 includes:
a request unit, configured to log in to the web server and request the HTML code of the web page;
a storage unit, configured to store the HTML code of the web page in a preset format.
Further, the DOM tree building module 22 includes:
a tag obtaining unit, configured to obtain each HTML tag in the web page data;
a web page structure obtaining unit, configured to obtain the corresponding web page structure from the HTML tags;
a cleaning unit, configured to clean the web page structure by removing the attribute values and text content nodes of each HTML tag, to obtain structure data;
a DOM tree building unit, configured to build the DOM tree from the structure data.
Further, the matching module 23 includes:
a matching unit, configured to compare the MD5 values of the whole DOM trees;
a verification unit, configured to, when the MD5 values of the DOM trees are found to be inconsistent, recursively verify MD5 values node by node, from parent nodes to child nodes and from child nodes down to leaf nodes;
a collection unit, configured to store the nodes whose MD5 values are inconsistent in a structure change set, which is used to record the content of the changed nodes.
Specifically, this embodiment obtains web page data by obtaining and parsing the HTML code of a web page; extracts the web page structure of each tag from the web page data and builds a DOM tree; matches the tree structure of the DOM tree against a pre-stored DOM tree structure of the web page data; and determines the change in web page structure from the matching result. The scheme detects web page structure changes quickly: it builds the DOM tree from the tag content of the page and confirms structure changes by comparing DOM tree structures. It gives applications that depend on web page structure detection a good detection tool, offering fast checking for structure changes and precise location of the change.
Those skilled in the art will readily conceive of other embodiments of the disclosure after considering the specification and practicing what is disclosed here. This application is intended to cover any variations, uses, or adaptations of the disclosure that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed here. The specification and embodiments are to be regarded as exemplary only; the true scope and spirit of the disclosure are indicated by the following claims.
It should be understood that the disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the disclosure is limited only by the appended claims.

Claims (10)

1. A web page structure change detection method, characterized by including:
obtaining and parsing the HTML code of a web page to obtain web page data;
extracting, from the web page data, the web page structure of each tag and building a DOM tree;
matching the tree structure of the DOM tree against a pre-stored DOM tree structure of the web page data;
determining the location of the web page structure change according to the matching result.
2. The method according to claim 1, characterized in that obtaining and parsing the HTML code of the web page includes:
logging in to the web server and requesting the HTML code of the web page;
storing the HTML code of the web page in a preset format.
3. The method according to claim 1, characterized in that extracting the web page structure of each tag from the web page data and building the DOM tree includes:
obtaining each HTML tag in the web page data;
obtaining the corresponding web page structure from the HTML tags;
cleaning the web page structure by removing the attribute values and text content nodes of each HTML tag, to obtain structure data;
building the DOM tree from the structure data.
4. The method according to claim 1, characterized in that matching the tree structure of the DOM tree against the pre-stored DOM tree structure of the web page data includes:
matching the tree structures of DOM trees obtained at different points in time.
5. The method according to claim 1, characterized in that matching the tree structure of the DOM tree against the pre-stored DOM tree structure of the web page data includes:
matching according to a tree structure matching algorithm, which specifically includes:
comparing the MD5 values of the two DOM trees;
when the MD5 values of the two DOM trees are found to be inconsistent, recursively comparing MD5 values node by node, from parent nodes to child nodes and from child nodes down to leaf nodes;
storing the nodes whose MD5 values are inconsistent in a structure change set.
6. The method according to claim 1, characterized in that determining the change in web page structure according to the matching result includes:
traversing the DOM tree structures and comparing corresponding nodes; when corresponding nodes differ, determining that the web page structure has changed at that node.
7. A web page structure change detection system, characterized by including:
an HTML parsing module, configured to obtain and parse the HTML code of a web page to obtain web page data;
a DOM tree building module, configured to extract the web page structure of each tag from the web page data and build a DOM tree;
a matching module, configured to match the tree structure of the DOM tree against a pre-stored DOM tree structure of the web page data;
a detection module, configured to determine the change in web page structure according to the matching result.
8. The system according to claim 7, characterized in that the HTML parsing module includes:
a request unit, configured to log in to the web server and request the HTML code of the web page;
a storage unit, configured to store the HTML code of the web page in a preset format.
9. The system according to claim 7, characterized in that the DOM tree building module includes:
a tag obtaining unit, configured to obtain each HTML tag in the web page data;
a web page structure obtaining unit, configured to obtain the corresponding web page structure from the HTML tags;
a cleaning unit, configured to clean the web page structure by removing the attribute values and text content nodes of each HTML tag, to obtain structure data;
a DOM tree building unit, configured to build the DOM tree from the structure data.
10. The system according to claim 7, characterized in that the matching module includes:
a matching unit, configured to compare the MD5 values of the whole DOM trees;
a verification unit, configured to, when the MD5 values of the DOM trees are found to be inconsistent, recursively verify MD5 values node by node, from parent nodes to child nodes and from child nodes down to leaf nodes;
a collection unit, configured to store the nodes whose MD5 values are inconsistent in a structure change set.
CN201710216863.6A 2017-04-05 2017-04-05 Webpage structure change detection method and system Expired - Fee Related CN106960058B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710216863.6A CN106960058B (en) 2017-04-05 2017-04-05 Webpage structure change detection method and system

Publications (2)

Publication Number Publication Date
CN106960058A (en) 2017-07-18
CN106960058B (en) 2021-01-12

Family

ID=59483978

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710216863.6A Expired - Fee Related CN106960058B (en) 2017-04-05 2017-04-05 Webpage structure change detection method and system

Country Status (1)

Country Link
CN (1) CN106960058B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102129528A (en) * 2010-01-19 2011-07-20 北京启明星辰信息技术股份有限公司 WEB page tampering identification method and system
CN102682098A (en) * 2012-04-27 2012-09-19 北京神州绿盟信息安全科技股份有限公司 Method and device for detecting web page content changes
CN103345532A (en) * 2013-07-26 2013-10-09 人民搜索网络股份公司 Method and device for extracting webpage information
CN203251315U (en) * 2012-12-20 2013-10-23 上海明想电子科技有限公司 Webpage variation monitoring system
CN103544213A (en) * 2013-09-16 2014-01-29 青岛英网资讯股份有限公司 Network content upgrading detection assessment method and system
CN103605925A (en) * 2013-11-29 2014-02-26 北京奇虎科技有限公司 Webpage tampering detecting method and device
CN103605926A (en) * 2013-11-29 2014-02-26 北京奇虎科技有限公司 Webpage tampering detecting method and device
CN103838801A (en) * 2012-11-27 2014-06-04 大连灵动科技发展有限公司 Webpage theme information extraction method
CN103885960A (en) * 2012-12-20 2014-06-25 上海明想电子科技有限公司 Method for monitoring webpage change
CN105630843A (en) * 2014-11-17 2016-06-01 广州市动景计算机科技有限公司 Webpage change monitoring method and device

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107612908A (en) * 2017-09-15 2018-01-19 杭州安恒信息技术有限公司 Webpage tampering monitoring method and device
CN107612908B (en) * 2017-09-15 2020-06-05 杭州安恒信息技术股份有限公司 Webpage tampering monitoring method and device
CN109542776A (en) * 2018-11-07 2019-03-29 北京潘达互娱科技有限公司 Page comparison method, device and equipment
CN109597972A (en) * 2018-12-10 2019-04-09 杭州全维技术股份有限公司 Webpage dynamic change and tampering detection method based on the web page frame
CN110046295A (en) * 2019-03-12 2019-07-23 重庆金融资产交易所有限责任公司 Webpage structure change detection method, apparatus and computer readable storage medium
CN111158973B (en) * 2019-12-05 2021-06-18 北京大学 Web application dynamic evolution monitoring method
CN112887381A (en) * 2021-01-15 2021-06-01 中国地质大学(武汉) Method and device for detecting and converging new content facing specific network entrance
CN114528005A (en) * 2021-11-29 2022-05-24 深圳市千源互联网科技服务有限公司 Grab tag updating method, device, equipment and storage medium
CN114528005B (en) * 2021-11-29 2023-06-23 深圳市千源互联网科技服务有限公司 Grabbing label updating method, grabbing label updating device, grabbing label updating equipment and storage medium
CN114969478A (en) * 2022-05-30 2022-08-30 上海弘玑信息技术有限公司 Webpage structure detection method, equipment and readable storage medium

Also Published As

Publication number Publication date
CN106960058B (en) 2021-01-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee (granted publication date: 2021-01-12)