CN106960058A - Webpage structure change detection method and system - Google Patents
Webpage structure change detection method and system
- Publication number
- CN106960058A (application CN201710216863.6A)
- Authority
- CN
- China
- Prior art keywords
- web page
- dom tree
- tree
- node
- webpage
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
- G06F16/986—Document structures and storage, e.g. HTML extensions
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The disclosure relates to a webpage structure change detection method and system. The method includes: obtaining the HTML code of a webpage and parsing it to obtain webpage data; extracting, from the webpage data, the webpage structure of each tag and building a DOM tree; matching the tree structure of the DOM tree against a pre-stored DOM tree structure of the webpage data; and determining the change in webpage structure from the matching result. The disclosure enables rapid detection of webpage structure changes and precise location of where the change occurred.
Description
Technical field
The present disclosure relates to the technical field of webpage detection, and in particular to a webpage structure change detection method and system.
Background
Webpage structure is the layout of a page's content; creating a webpage structure is in effect planning that layout. It is one of the important steps in page optimization: it directly affects the user experience and relevance of the page, and to some extent also affects the overall structure of the website and the number of its pages that are indexed. From the standpoint of page structure, a webpage consists mainly of three elements: the navigation bar, the columns (site sections), and the body content. Creating a webpage structure and planning the layout of page content both revolve around these three elements.
In practice, webpage structure is the organization and layout of these three basic components. Depending on where the page places its emphasis, webpages can be divided into three kinds: navigation pages, content pages, and combined navigation-and-content pages.
As for column structure, an enterprise website should generally have no more than 8 first-level columns, and the column hierarchy is best kept within three levels. The column settings are the foundation of a website's structure and of its navigation system, and should be reasonable and clearly layered. Studying the column structure of a website is the basis for building a marketing-oriented website. As for layout, in traditional HTML-based website design, page positioning generally uses tables or frames, with table positioning now the mainstream; in XHTML-based website design, the typical positioning method uses layers.
However, the structure of a webpage is usually adjusted according to its content, and different content calls for different structure. When the content of a page changes, its structure may be adjusted as well. This creates difficulty for applications that need to crawl webpage content: if content is still crawled using the old structure after the structure has changed, incorrect content data may be obtained. A scheme that can detect webpage structure changes quickly and accurately is therefore urgently needed, to solve the problem in the prior art that such changes cannot be identified accurately.
Summary of the invention
To overcome the problems in the related art, embodiments of the present disclosure provide a webpage structure change detection method and system.
According to a first aspect of the embodiments of the present disclosure, a webpage structure change detection method is provided, including:
obtaining the HTML code of a webpage and parsing it to obtain webpage data;
extracting, according to the webpage data, the webpage structure of each tag therein, and building a DOM tree;
matching the tree structure of the DOM tree against a pre-stored DOM tree structure of the webpage data;
determining the location of the webpage structure change according to the matching result.
Obtaining and parsing the HTML code of the webpage includes:
logging in to the web server and requesting the HTML code corresponding to the webpage;
storing the HTML code corresponding to the webpage in a preset format.
Extracting the webpage structure of each tag according to the webpage data and building the DOM tree includes:
obtaining each HTML tag in the webpage data;
obtaining the corresponding webpage structure according to the HTML tags;
according to the webpage structure, cleaning out the attribute values and text content nodes of each HTML tag to obtain structure data;
building the DOM tree according to the structure data.
Matching the tree structure of the DOM tree against the pre-stored DOM tree structure of the webpage data includes:
matching the tree structures of DOM trees obtained at different points in time.
Matching the tree structure of the DOM tree against the pre-stored DOM tree structure of the webpage data also includes:
matching according to a tree structure matching algorithm, which specifically includes:
comparing the MD5 values of the two DOM trees;
when the MD5 values of the two DOM trees are confirmed to be inconsistent, recursively comparing and verifying the MD5 values one by one, from the parent nodes of the two DOM trees to the child nodes and from the child nodes to the leaf nodes;
storing the nodes whose MD5 values are inconsistent in a structure change set.
Determining the change in webpage structure according to the matching result includes:
traversing the DOM tree structures and comparing corresponding nodes; when a node differs, determining that the webpage structure has changed at that node.
According to another aspect of the embodiments of the present disclosure, a webpage structure change detection system is provided, including:
an HTML parsing module, for obtaining the HTML code of a webpage and parsing it to obtain webpage data;
a DOM tree building module, for extracting, according to the webpage data, the webpage structure of each tag therein and building a DOM tree;
a matching module, for matching the tree structure of the DOM tree against a pre-stored DOM tree structure of the webpage data;
a detection module, for determining the change in webpage structure according to the matching result.
The HTML parsing module includes:
a request unit, for logging in to the web server and requesting the HTML code corresponding to the webpage;
a storage unit, for storing the HTML code corresponding to the webpage in a preset format.
The DOM tree building module includes:
a tag acquisition unit, for obtaining each HTML tag in the webpage data;
a webpage structure acquisition unit, for obtaining the corresponding webpage structure according to the HTML tags;
a cleaning unit, for cleaning out, according to the webpage structure, the attribute values and text content nodes of each HTML tag to obtain structure data;
a DOM tree building unit, for building the DOM tree according to the structure data.
The matching module includes:
a matching unit, for comparing the MD5 values of the whole DOM trees;
a verification unit, for recursively verifying the MD5 values one by one, from the parent nodes of the DOM trees to the child nodes and from the child nodes to the leaf nodes, when the MD5 values of the DOM trees are confirmed to be inconsistent;
an aggregation unit, for storing the nodes whose MD5 values are inconsistent in a structure change set.
The technical solutions provided by the embodiments of the present disclosure can have the following beneficial effects:
The disclosure obtains the HTML code of a webpage and parses it to obtain webpage data; extracts, according to the webpage data, the webpage structure of each tag and builds a DOM tree; matches the tree structure of the DOM tree against the pre-stored DOM tree structure of the webpage data; and determines the change in webpage structure according to the matching result. The scheme provided by the disclosure can detect webpage structure changes quickly: it builds a DOM tree from the tags of the webpage and confirms structure changes by comparing DOM tree structures. It thus offers a useful detection tool for applications related to webpage structure detection, providing both rapid detection of a structure change and precise location of where the change occurred.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory and do not limit the disclosure.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, show embodiments consistent with the disclosure and, together with the specification, serve to explain the principles of the disclosure.
Fig. 1 is a flow chart of a webpage structure change detection method according to an exemplary embodiment.
Fig. 2 is a schematic diagram of a webpage DOM tree structure according to an exemplary embodiment.
Fig. 3 is a schematic diagram of another webpage DOM tree structure according to an exemplary embodiment.
Fig. 4 is a flow chart of the tree structure matching algorithm according to an exemplary embodiment.
Fig. 5 is a schematic structural diagram of a webpage structure change detection system according to an exemplary embodiment.
Detailed description
Exemplary embodiments are described in detail here, with examples shown in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the disclosure as detailed in the appended claims.
Fig. 1 is a flow chart of a webpage structure change detection method according to an exemplary embodiment, including:
Step 11: obtain the HTML code of the webpage and parse it to obtain webpage data.
In this embodiment, the webpage data of the page to be detected must be obtained first. Because existing webpages are mostly written in HTML, the first step is to obtain the page's HTML code.
HTML (HyperText Markup Language) is an application of the Standard Generalized Markup Language (SGML). It is also a specification, a standard, which uses markup tags to mark the parts of the webpage to be displayed. A webpage file is itself a text file; by adding markup to the text, the author tells the browser how to display the content (for example, how text is processed, how the page is arranged, and how pictures are shown).
A webpage corresponds to one or more HTML files. An HTML document uses the extension .htm (an abbreviation that dates from the filename limits of the DOS operating system) or .html. Any text editor that can produce plain-text source files can be used to create an HTML document; only the file suffix needs to be changed. A standard HTML file has a basic overall structure, and its tags generally occur in pairs (except for some tags, such as <br/>): the start and end tags of the HTML document, and the two main parts of the document, the head and the body. Three paired tags establish the overall structure of the page.
From the HTML code of a webpage one can obtain the substantive content of the page, its structural framework, and so on. These data need to be parsed and sorted by category: the content data and structure data in the webpage are classified and saved in a preset format. The saved data is the webpage data. The webpage data must be processed further before its specific content can be parsed out.
Typically, it is necessary to log in to the webpage to be detected, request its HTML code, and then classify the HTML code and store it accordingly.
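Step 11 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `fetch_html` and `make_snapshot`, and the choice of a JSON record as the "preset format", are assumptions made here, since the text does not specify the format or any login procedure.

```python
import json
import time
import urllib.request
from typing import Optional


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Request a page and return its HTML as text (hypothetical helper).

    Login/authentication is omitted; the patent does not describe it.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def make_snapshot(url: str, html: str, fetched_at: Optional[float] = None) -> str:
    """Serialize one fetch into an assumed 'preset format': a JSON record."""
    record = {
        "url": url,
        # Timestamp lets snapshots from different points in time be compared.
        "fetched_at": time.time() if fetched_at is None else fetched_at,
        "html": html,
    }
    return json.dumps(record, ensure_ascii=False)
```

Storing a timestamp with each snapshot is a design choice made for this sketch; it supports the later step of matching DOM trees obtained at different points in time.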
Step 12: according to the webpage data, extract the webpage structure of each tag and build a DOM tree.
In one embodiment, the webpage data is mapped to a specific webpage structure according to the content of its tags, and a different DOM tree is built for each different webpage structure.
The HTML tag is the most basic unit of the HTML language and its most important component. HTML tags generally have the following characteristics:
a tag is a keyword surrounded by angle brackets, such as <html>;
tags usually occur in pairs, such as <div> and </div>;
the first tag of a pair is the start tag and the second is the end tag;
start and end tags are also called opening and closing tags;
some tags appear alone, such as <img src="Baidupedia.GIF"/>;
for tags that occur in pairs, the content lies between the two tags; for tags that appear alone, values are assigned in the tag attributes, as in <h1>Title</h1> and <input type="text" value="button"/>;
the content of a webpage must be inside the <html> tag; information such as the title, character encoding, language, compatibility, keywords, and description goes in the <head> tag, and the content to be displayed must be nested in the <body> tag.
These HTML tags define the concrete structural nodes of a webpage; from them, the structure and content of the webpage can be determined accurately.
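The tag characteristics listed above can be observed with Python's standard `html.parser` module. The sketch below is illustrative only; the class name `TagLister` and the event labels are assumptions made here, not terms from the patent.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Records each tag as a start tag, an end tag, or a standalone tag."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_startendtag(self, tag, attrs):
        # Called for XHTML-style tags written as <tag ... />.
        # A bare <br> (no slash) would be reported as a start tag instead.
        self.events.append(("standalone", tag))


def list_tags(html):
    parser = TagLister()
    parser.feed(html)
    parser.close()
    return parser.events
```

For example, `list_tags('<div><h1>Title</h1><input type="text" value="button"/></div>')` reports the paired `div` and `h1` tags as start/end events and the `input` tag as standalone.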
The HTML DOM (Document Object Model) is a document object model designed specifically for HTML/XHTML. It treats every element in a webpage as an object, so that the elements of the page can be read or edited by a programming language. The DOM is a set of nodes, or pieces of information, organized in a hierarchy. This hierarchy lets developers navigate the tree to find specific information. Analyzing the structure usually requires loading the whole document and constructing the hierarchy before any work can be done. Because it is based on a hierarchy of information, the DOM is regarded as tree-based or object-based.
The DOM tree is the hierarchical structure of an HTML page. It consists of elements, attributes, and text, each of which is a node (Node), much like the organization chart of a company. The input webpage is preprocessed, the webpage structure of each tag is extracted, and the result is stored as a DOM tree structure (that is, a multiway tree). The following HTML code is a simple table; after preprocessing, its DOM tree can be extracted, as shown in Fig. 2.
Each tag in the HTML code is a DOM node, and each node can contain other nodes, much as branches grow from a trunk. Starting from the root tag <html>, any tag in the current page can be found through the DOM tree.
Specifically, the DOM tree reflects the HTML in a tree structure, which makes subsequent comparison easier. Each HTML tag in the webpage data is obtained; the corresponding webpage structure is obtained according to the HTML tags; and the DOM tree is built according to the webpage structure.
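The tree-building and cleaning steps can be sketched with the standard library. This is one possible reading of the description, not the patent's own code: `Node`, `StructureBuilder`, `build_tree`, and the synthetic `#document` root are names introduced here, and the sample table in the usage note is hypothetical because the original code listing is not reproduced in this text.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Node:
    """One structural node: a tag name plus child nodes, nothing else."""

    def __init__(self, tag):
        self.tag = tag
        self.children = []

    def to_tuple(self):
        return (self.tag, [child.to_tuple() for child in self.children])


class StructureBuilder(HTMLParser):
    """Builds a tag-only tree; attribute values and text are discarded."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")  # synthetic root (an assumption)
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)  # attrs is deliberately ignored (cleaning step)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        for index in range(len(self.stack) - 1, 0, -1):
            if self.stack[index].tag == tag:
                del self.stack[index:]
                break

    # handle_data is not overridden, so text content nodes are dropped.


def build_tree(html):
    builder = StructureBuilder()
    builder.feed(html)
    builder.close()
    return builder.root
```

With a hypothetical one-row table such as `<html><body><table><tr><td class='a'>x</td><td>y</td></tr></table></body></html>`, `build_tree(...).to_tuple()` yields a nested tuple containing only tag names, so two pages that differ only in text or attribute values produce the same tree.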
Step 13: match the tree structure of the DOM tree against the pre-stored DOM tree structure of the webpage data.
In one embodiment, once the tree structure of a DOM tree has been built, it must be compared with the tree structure of a DOM tree obtained in another time period or at another point in time before it can be confirmed whether a change has occurred. The DOM tree structure from the other time period or time point can be obtained in advance or can be set manually.
DOM tree structures obtained in different periods may be identical or may differ; this has to be determined by a specific matching algorithm. Having extracted the DOM tree structure of a webpage, the DOM tree structures of the same webpage at different times can also be extracted. The DOM tree structures of the same webpage at different times are matched to find the positions of the nodes that changed.
For example, take the webpage corresponding to the code above as the earlier webpage and the webpage corresponding to the code below as the newer webpage. The newer webpage code, shown below, has one row fewer than the earlier webpage:
The DOM tree structure of this webpage can be extracted in the webpage preprocessing stage, as shown in Fig. 3. The two structures can then be matched and compared by the DOM tree matching algorithm, as shown in Fig. 4:
First, the DOM tree structures are read, namely the pre-stored DOM tree structure and the DOM tree structure of the webpage to be detected; they can be denoted A and B.
The structure content in the DOM tree structures is cleaned, that is, specific content is removed and only the tree structure is kept. According to the DOM tree, the attribute values and text content nodes of each tag are removed, leaving a DOM tree of HTML tag structure only.
The MD5 values of the nodes of the two DOM tree structures are obtained and compared one by one to confirm whether they are consistent. If they are consistent, the flow ends and the webpage structure has not changed.
If they are inconsistent, it is further determined whether this MD5 comparison has completed the recursive process. If so, this was the comparison of the last node and the flow ends; otherwise the flow continues.
The nodes at the same position in the two DOM trees are looked up one by one, in order.
It is judged whether the node is a leaf node; if it is, it is judged whether the number of sibling nodes is consistent.
When the node is not a leaf node, the MD5 values of all nodes beneath it are compared. When the node values are inconsistent, it is further judged whether the number of sibling nodes is consistent. When it is consistent, the flow returns to the step of judging whether the recursion is complete.
When the node is not a leaf node, or the MD5 values of the nodes beneath it are inconsistent, it is further judged whether the number of nodes at the same level is consistent. If it is consistent, the flow returns to the step of judging whether the recursion is complete. Otherwise, the flow continues.
The difference between the two DOM tree structures A and B is calculated.
The position in the DOM tree structure currently being detected and the DOM change difference are stored.
In this way, the difference between the DOM tree structure to be detected and the pre-stored DOM tree structure can be determined, which gives the specific location of the webpage structure change.
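One way to read the matching flow above is as a recursive comparison of subtree digests. The sketch below is an interpretation under stated assumptions, not the algorithm of Fig. 4 itself: trees are written as nested `(tag, [children])` tuples, the subtree digest format is invented here, and `digest` and `find_changes` are hypothetical names.

```python
import hashlib


def digest(node):
    """MD5 over a subtree: the tag name plus the digests of all children."""
    tag, children = node
    payload = tag + "(" + ",".join(digest(child) for child in children) + ")"
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def find_changes(old, new, path=None):
    """Return the paths of nodes where the two structures diverge.

    Digests are recomputed at every level for clarity; a real
    implementation would likely cache them per node.
    """
    path = path or "/" + old[0]
    if digest(old) == digest(new):
        return []  # identical subtrees: nothing below needs checking
    old_children, new_children = old[1], new[1]
    if old[0] != new[0] or len(old_children) != len(new_children):
        # Tag or number of children differs: record this node as changed.
        return [path]
    changes = []
    for index, (a, b) in enumerate(zip(old_children, new_children)):
        changes.extend(find_changes(a, b, f"{path}/{a[0]}[{index}]"))
    return changes


# Hypothetical two-row table versus the same table with one row removed.
ROW = ("tr", [("td", []), ("td", [])])
OLD_TABLE = ("table", [ROW, ROW])
NEW_TABLE = ("table", [ROW])
```

Here `find_changes(OLD_TABLE, NEW_TABLE)` reports the `table` node, because its number of children changed. Comparing whole-subtree digests first means unchanged branches are skipped without visiting their descendants, which is the source of the speed-up the text describes.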
MD5 is Message-Digest Algorithm 5, used to check that information is transmitted completely and consistently. A typical application of MD5 is to produce a message digest of a piece of information (the message) so that tampering can be detected. The MD5 value of a file is like a "digital fingerprint" of that file. Different files have, in practice, different MD5 values; if anyone makes any change to a file, its MD5 value, the corresponding "digital fingerprint", changes. In this embodiment, the MD5 values of the nodes are compared to confirm whether each node has changed.
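The "digital fingerprint" behaviour can be checked directly with Python's `hashlib`. The serialized structure strings below are made up for illustration; the patent does not say how a node is serialized before hashing.

```python
import hashlib


def fingerprint(structure: str) -> str:
    """Return the MD5 hex digest (32 hex characters) of a structure string."""
    return hashlib.md5(structure.encode("utf-8")).hexdigest()


# Hypothetical serializations of a table row before and after a change.
before = fingerprint("tr(td,td)")
after = fingerprint("tr(td)")
unchanged = fingerprint("tr(td,td)")
```

Equal inputs always give equal digests, and the two different strings above give different digests. MD5 is adequate for spotting accidental differences like these, but it is not collision-resistant against deliberately crafted input, so a sketch that must resist tampering might prefer a stronger hash.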
Step 14: determine the change in webpage structure according to the matching result.
The matching result is obtained from the outcome of the matching algorithm. Specifically, the DOM tree structures are traversed and corresponding nodes are compared; when a node differs, it is determined that the webpage structure has changed at that node.
The core of this embodiment is the process of detecting webpage structure changes: building a DOM tree from the HTML code and matching it with the tree structure matching algorithm to confirm whether a change has occurred.
The disclosure obtains the HTML code of a webpage and parses it to obtain webpage data; extracts, according to the webpage data, the webpage structure of each tag and builds a DOM tree; matches the tree structure of the DOM tree against the pre-stored DOM tree structure of the webpage data; and determines the change in webpage structure according to the matching result. The scheme provided by the disclosure can detect webpage structure changes quickly: it builds a DOM tree from the tags of the webpage and confirms structure changes by comparing DOM tree structures. It thus offers a useful detection tool for applications related to webpage structure detection, providing both rapid detection of a structure change and precise location of where the change occurred.
Further, Fig. 5 is a schematic structural diagram of a webpage structure change detection system according to an exemplary embodiment, in which:
the HTML parsing module 21 obtains the HTML code of a webpage and parses it to obtain webpage data;
the DOM tree building module 22 extracts, according to the webpage data, the webpage structure of each tag and builds a DOM tree;
the matching module 23 matches the tree structure of the DOM tree against the pre-stored DOM tree structure of the webpage data;
the detection module 24 determines the change in webpage structure according to the matching result.
Further, the HTML parsing module 21 includes:
a request unit, for logging in to the web server and requesting the HTML code corresponding to the webpage;
a storage unit, for storing the HTML code corresponding to the webpage in a preset format.
Further, the DOM tree building module 22 includes:
a tag acquisition unit, for obtaining each HTML tag in the webpage data;
a webpage structure acquisition unit, for obtaining the corresponding webpage structure according to the HTML tags;
a cleaning unit, for cleaning out, according to the webpage structure, the attribute values and text content nodes of each HTML tag to obtain structure data;
a DOM tree building unit, for building the DOM tree according to the structure data.
Further, the matching module 23 includes:
a matching unit, for comparing the MD5 values of the whole DOM trees;
a verification unit, for recursively verifying the MD5 values one by one, from the parent nodes of the DOM trees to the child nodes and from the child nodes to the leaf nodes, when the MD5 values of the DOM trees are confirmed to be inconsistent;
an aggregation unit, for storing the nodes whose MD5 values are inconsistent in a structure change set, which is used to record the content of the changed nodes.
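The division into modules 21 to 24 could be wired together roughly as follows. This is a compact sketch under several assumptions: the class names mirror the module names in the text but their methods (`store`, `build`, `match`, `report`) are invented here, fetching and login are left out, and nodes are plain `[tag, children]` lists.

```python
import hashlib
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class _TagTreeBuilder(HTMLParser):
    """Parser that keeps tag names only; nodes are [tag, children] lists."""

    def __init__(self):
        super().__init__()
        self.root = ["#document", []]  # synthetic root (an assumption)
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = [tag, []]
        self.stack[-1][1].append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        for index in range(len(self.stack) - 1, 0, -1):
            if self.stack[index][0] == tag:
                del self.stack[index:]
                break


class HtmlParsingModule:
    """Module 21 (sketch): keeps HTML snapshots under caller-chosen labels."""

    def __init__(self):
        self.snapshots = {}

    def store(self, label, html):
        self.snapshots[label] = html
        return html


class DomTreeModule:
    """Module 22 (sketch): builds the cleaned, tag-only tree."""

    def build(self, html):
        builder = _TagTreeBuilder()
        builder.feed(html)
        builder.close()
        return builder.root


class MatchingModule:
    """Module 23 (sketch): MD5 comparison from the root down to the leaves."""

    def digest(self, node):
        inner = ",".join(self.digest(child) for child in node[1])
        return hashlib.md5(f"{node[0]}({inner})".encode("utf-8")).hexdigest()

    def match(self, old, new, path=""):
        here = f"{path}/{old[0]}"
        if self.digest(old) == self.digest(new):
            return []
        if old[0] != new[0] or len(old[1]) != len(new[1]):
            return [here]  # goes into the structure change set
        changes = []
        for a, b in zip(old[1], new[1]):
            changes.extend(self.match(a, b, here))
        return changes


class DetectionModule:
    """Module 24 (sketch): turns the change set into a result."""

    def report(self, change_set):
        return {"changed": bool(change_set), "locations": change_set}


def detect(old_html, new_html):
    parsing, trees = HtmlParsingModule(), DomTreeModule()
    old_tree = trees.build(parsing.store("old", old_html))
    new_tree = trees.build(parsing.store("new", new_html))
    return DetectionModule().report(MatchingModule().match(old_tree, new_tree))
```

Calling `detect` on two versions of a page that differ only in text or attribute values reports no change, while removing a table row reports the `table` node as the location of the change.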
Specifically, this embodiment obtains the HTML code of a webpage and parses it to obtain webpage data; extracts, according to the webpage data, the webpage structure of each tag and builds a DOM tree; matches the tree structure of the DOM tree against the pre-stored DOM tree structure of the webpage data; and determines the change in webpage structure according to the matching result. The scheme provided by the disclosure can detect webpage structure changes quickly: it builds a DOM tree from the tags of the webpage and confirms structure changes by comparing DOM tree structures. It thus offers a useful detection tool for applications related to webpage structure detection, providing both rapid detection of a structure change and precise location of where the change occurred.
Other embodiments of the disclosure will readily occur to those skilled in the art after considering the specification and practicing the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the disclosure indicated by the following claims.
It should be understood that the disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes can be made without departing from its scope. The scope of the disclosure is limited only by the appended claims.
Claims (10)
1. A webpage structure change detection method, characterized by including:
obtaining the HTML code of a webpage and parsing it to obtain webpage data;
extracting, according to the webpage data, the webpage structure of each tag therein, and building a DOM tree;
matching the tree structure of the DOM tree against a pre-stored DOM tree structure of the webpage data;
determining the location of the webpage structure change according to the matching result.
2. The method according to claim 1, characterized in that obtaining the HTML code of the webpage and parsing it includes:
logging in to the web server and requesting the HTML code corresponding to the webpage;
storing the HTML code corresponding to the webpage in a preset format.
3. The method according to claim 1, characterized in that extracting the webpage structure of each tag according to the webpage data and building the DOM tree includes:
obtaining each HTML tag in the webpage data;
obtaining the corresponding webpage structure according to the HTML tags;
according to the webpage structure, cleaning out the attribute values and text content nodes of each HTML tag to obtain structure data;
building the DOM tree according to the structure data.
4. The method according to claim 1, characterized in that matching the tree structure of the DOM tree against the pre-stored DOM tree structure of the webpage data includes:
matching the tree structures of DOM trees obtained at different points in time.
5. The method according to claim 1, characterized in that matching the tree structure of the DOM tree against the pre-stored DOM tree structure of the webpage data includes:
matching according to a tree structure matching algorithm, which specifically includes:
comparing the MD5 values of the two DOM trees;
when the MD5 values of the two DOM trees are confirmed to be inconsistent, recursively comparing and verifying the MD5 values one by one, from the parent nodes of the two DOM trees to the child nodes and from the child nodes to the leaf nodes;
storing the nodes whose MD5 values are inconsistent in a structure change set.
6. The method according to claim 1, characterized in that determining the change in webpage structure according to the matching result includes:
traversing the DOM tree structures and comparing corresponding nodes; when a node differs, determining that the webpage structure has changed at that node.
7. A webpage structure change detection system, characterized by including:
an HTML parsing module, for obtaining the HTML code of a webpage and parsing it to obtain webpage data;
a DOM tree building module, for extracting, according to the webpage data, the webpage structure of each tag therein and building a DOM tree;
a matching module, for matching the tree structure of the DOM tree against a pre-stored DOM tree structure of the webpage data;
a detection module, for determining the change in webpage structure according to the matching result.
8. The system according to claim 7, characterized in that the HTML parsing module includes:
a request unit, for logging in to the web server and requesting the HTML code corresponding to the webpage;
a storage unit, for storing the HTML code corresponding to the webpage in a preset format.
9. The system according to claim 7, characterized in that the DOM tree building module includes:
a tag acquisition unit, for obtaining each HTML tag in the webpage data;
a webpage structure acquisition unit, for obtaining the corresponding webpage structure according to the HTML tags;
a cleaning unit, for cleaning out, according to the webpage structure, the attribute values and text content nodes of each HTML tag to obtain structure data;
a DOM tree building unit, for building the DOM tree according to the structure data.
10. The system according to claim 7, characterized in that the matching module includes:
a matching unit, for comparing the MD5 values of the whole DOM trees;
a verification unit, for recursively verifying the MD5 values one by one, from the parent nodes of the DOM trees to the child nodes and from the child nodes to the leaf nodes, when the MD5 values of the DOM trees are confirmed to be inconsistent;
an aggregation unit, for storing the nodes whose MD5 values are inconsistent in a structure change set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710216863.6A CN106960058B (en) | 2017-04-05 | 2017-04-05 | Webpage structure change detection method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710216863.6A CN106960058B (en) | 2017-04-05 | 2017-04-05 | Webpage structure change detection method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106960058A (en) | 2017-07-18 |
CN106960058B (en) | 2021-01-12 |
Family
ID=59483978
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710216863.6A Expired - Fee Related CN106960058B (en) | 2017-04-05 | 2017-04-05 | Webpage structure change detection method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106960058B (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107612908A (en) * | 2017-09-15 | 2018-01-19 | 杭州安恒信息技术有限公司 | webpage tamper monitoring method and device |
CN109542776A (en) * | 2018-11-07 | 2019-03-29 | 北京潘达互娱科技有限公司 | Page comparison method, device and equipment |
CN109597972A (en) * | 2018-12-10 | 2019-04-09 | 杭州全维技术股份有限公司 | A kind of webpage dynamic change and altering detecting method based on web page frame |
CN110046295A (en) * | 2019-03-12 | 2019-07-23 | 重庆金融资产交易所有限责任公司 | Structure of web page alteration detection method, apparatus and computer readable storage medium |
CN112887381A (en) * | 2021-01-15 | 2021-06-01 | 中国地质大学(武汉) | Method and device for detecting and converging new content facing specific network entrance |
CN111158973B (en) * | 2019-12-05 | 2021-06-18 | 北京大学 | Web application dynamic evolution monitoring method |
CN114528005A (en) * | 2021-11-29 | 2022-05-24 | 深圳市千源互联网科技服务有限公司 | Grab tag updating method, device, equipment and storage medium |
CN114969478A (en) * | 2022-05-30 | 2022-08-30 | 上海弘玑信息技术有限公司 | Webpage structure detection method, equipment and readable storage medium |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102129528A (en) * | 2010-01-19 | 2011-07-20 | 北京启明星辰信息技术股份有限公司 | WEB page tampering identification method and system |
CN102682098A (en) * | 2012-04-27 | 2012-09-19 | 北京神州绿盟信息安全科技股份有限公司 | Method and device for detecting web page content changes |
CN103345532A (en) * | 2013-07-26 | 2013-10-09 | 人民搜索网络股份公司 | Method and device for extracting webpage information |
CN203251315U (en) * | 2012-12-20 | 2013-10-23 | 上海明想电子科技有限公司 | Webpage variation monitoring system |
CN103544213A (en) * | 2013-09-16 | 2014-01-29 | 青岛英网资讯股份有限公司 | Network content upgrading detection assessment method and system |
CN103605925A (en) * | 2013-11-29 | 2014-02-26 | 北京奇虎科技有限公司 | Webpage tampering detecting method and device |
CN103605926A (en) * | 2013-11-29 | 2014-02-26 | 北京奇虎科技有限公司 | Webpage tampering detecting method and device |
CN103838801A (en) * | 2012-11-27 | 2014-06-04 | 大连灵动科技发展有限公司 | Webpage theme information extraction method |
CN103885960A (en) * | 2012-12-20 | 2014-06-25 | 上海明想电子科技有限公司 | Method for monitoring webpage change |
CN105630843A (en) * | 2014-11-17 | 2016-06-01 | 广州市动景计算机科技有限公司 | Webpage change monitoring method and device |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107612908A (en) * | 2017-09-15 | 2018-01-19 | 杭州安恒信息技术有限公司 | webpage tamper monitoring method and device |
CN107612908B (en) * | 2017-09-15 | 2020-06-05 | 杭州安恒信息技术股份有限公司 | Webpage tampering monitoring method and device |
CN109542776A (en) * | 2018-11-07 | 2019-03-29 | 北京潘达互娱科技有限公司 | Page comparison method, device and equipment |
CN109597972A (en) * | 2018-12-10 | 2019-04-09 | 杭州全维技术股份有限公司 | A kind of webpage dynamic change and altering detecting method based on web page frame |
CN110046295A (en) * | 2019-03-12 | 2019-07-23 | 重庆金融资产交易所有限责任公司 | Structure of web page alteration detection method, apparatus and computer readable storage medium |
CN111158973B (en) * | 2019-12-05 | 2021-06-18 | 北京大学 | Web application dynamic evolution monitoring method |
CN112887381A (en) * | 2021-01-15 | 2021-06-01 | 中国地质大学(武汉) | Method and device for detecting and converging new content facing specific network entrance |
CN114528005A (en) * | 2021-11-29 | 2022-05-24 | 深圳市千源互联网科技服务有限公司 | Grab tag updating method, device, equipment and storage medium |
CN114528005B (en) * | 2021-11-29 | 2023-06-23 | 深圳市千源互联网科技服务有限公司 | Grabbing label updating method, grabbing label updating device, grabbing label updating equipment and storage medium |
CN114969478A (en) * | 2022-05-30 | 2022-08-30 | 上海弘玑信息技术有限公司 | Webpage structure detection method, equipment and readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN106960058B (en) | 2021-01-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106960058A (en) | A kind of structure of web page alteration detection method and system | |
US8381095B1 (en) | Automated document revision markup and change control | |
US20070033520A1 (en) | System and method for web page localization | |
CN109857956B (en) | News webpage key information automatic extraction method based on label and block characteristics | |
JP2010086517A (en) | Computer-implemented method for extracting data from web page | |
JPH08241332A (en) | Device and method for retrieving all-sentence registered word | |
CN111680634A (en) | Document file processing method and device, computer equipment and storage medium | |
Cardoso et al. | An efficient language-independent method to extract content from news webpages | |
CN113254751B (en) | Method, equipment and storage medium for accurately extracting complex webpage structured information | |
CN109344355A (en) | Automatic returning detection and Block- matching adaptive approach and device for Web evolution | |
CN105740355B (en) | Webpage context extraction method and device based on aggregation text density | |
CN107145591B (en) | Title-based webpage effective metadata content extraction method | |
JPH11110384A (en) | Method and device for retrieving and displaying structured document | |
KR100284580B1 (en) | Web document automatic generating device and method | |
Sirsat et al. | Pattern matching for extraction of core contents from news web pages | |
CN111158973B (en) | Web application dynamic evolution monitoring method | |
CN116090416B (en) | Standard writing method, system, equipment and medium based on standard knowledge graph | |
CN113343140B (en) | Method for automatically extracting webpage text content based on neo4j graphic database | |
CN114115831A (en) | Data processing method, device, equipment and storage medium | |
CN113392354A (en) | Webpage text analysis method, system, medium and electronic equipment | |
CN114637505A (en) | Page content extraction method and device | |
CN112328246A (en) | Page component generation method and device, computer equipment and storage medium | |
CN108897749A (en) | Method for abstracting web page information and system based on syntax tree and text block density | |
CN103870546B (en) | The analysis method of on-line off-line environment page contrast and equipment after transcoding | |
CN114492419B (en) | Text labeling method, system and device based on newly added key words in labeling |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20210112 |