CN102682098B - Method and device for detecting web page content changes - Google Patents
Method and device for detecting web page content changes
- Publication number
- CN102682098B
- Authority
- CN
- China
- Prior art keywords
- subtree
- dom
- tree
- dom tree
- hash
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Document Processing Apparatus (AREA)
Abstract
Description
Claims (4)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210129996.7A (CN102682098B, zh) | 2012-04-27 | 2012-04-27 | Method and device for detecting web page content changes |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102682098A (zh) | 2012-09-19 |
CN102682098B (zh) | 2014-05-14 |
Family
ID=46814023
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210129996.7A (CN102682098B, zh; Active) | Method and device for detecting web page content changes | 2012-04-27 | 2012-04-27 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102682098B (zh) |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
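CN103577526B (zh) * | 2013-08-01 | 2017-06-06 | 星云融创(北京)科技有限公司 | Method, system, and browser for verifying whether a page has been modified |
CN105302803B (zh) * | 2014-05-28 | 2019-03-19 | 中国科学院沈阳自动化研究所 | Product BOM difference analysis and synchronous update method |
CN107204960B (zh) * | 2016-03-16 | 2020-11-24 | 阿里巴巴集团控股有限公司 | Web page identification method and device, and server |
CN108073828B (zh) * | 2016-11-16 | 2022-02-18 | 阿里巴巴集团控股有限公司 | Web page tamper-proofing method, device, and system |
CN106599242B (zh) * | 2016-12-20 | 2019-03-26 | 福建六壬网安股份有限公司 | Web page change monitoring method and system based on similarity calculation |
CN106960058B (zh) * | 2017-04-05 | 2021-01-12 | 金电联行(北京)信息技术有限公司 | Web page structure change detection method and system |
CN109255088A (zh) * | 2017-07-07 | 2019-01-22 | 普天信息技术有限公司 | Web page data monitoring method and equipment |
CN108021692B (zh) * | 2017-12-18 | 2022-03-11 | 北京天融信网络安全技术有限公司 | Method, server, and computer-readable storage medium for monitoring web pages |
CN109542776A (zh) * | 2018-11-07 | 2019-03-29 | 北京潘达互娱科技有限公司 | Page comparison method, apparatus, and device |
CN109815744A (zh) * | 2018-12-18 | 2019-05-28 | 中国科学院计算机网络信息中心 | Web page tampering detection method, device, and storage medium |
CN110046295A (zh) * | 2019-03-12 | 2019-07-23 | 重庆金融资产交易所有限责任公司 | Web page structure change detection method, device, and computer-readable storage medium |
CN111143744B (zh) * | 2019-12-26 | 2023-10-13 | 杭州安恒信息技术股份有限公司 | Web asset detection method, apparatus, device, and readable storage medium |
CN112887381B (zh) * | 2021-01-15 | 2022-07-19 | 中国地质大学(武汉) | Method and device for detecting and aggregating new content for a specific network portal |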
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101471818A (zh) * | 2007-12-24 | 2009-07-01 | 北京启明星辰信息技术股份有限公司 | Method and system for detecting web pages with maliciously injected scripts |
CN101587488A (zh) * | 2009-05-25 | 2009-11-25 | 深圳市腾讯计算机系统有限公司 | Method and device for detecting page redirection in a search engine |
JP2010086517A (ja) * | 2008-09-29 | 2010-04-15 | Mitsubishi Electric Research Laboratories Inc | Computer-implemented method for extracting data from web pages |
CN102316081A (zh) * | 2010-06-30 | 2012-01-11 | 北京启明星辰信息技术股份有限公司 | Method and device for identifying similar web pages |
WO2012022044A1 (en) * | 2010-08-20 | 2012-02-23 | Hewlett-Packard Development Company, L. P. | Systems and methods for filtering web page contents |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8667015B2 (en) * | 2009-11-25 | 2014-03-04 | Hewlett-Packard Development Company, L.P. | Data extraction method, computer program product and system |
Also Published As
Publication number | Publication date |
---|---|
CN102682098A (zh) | 2012-09-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102682098B (zh) | | Method and device for detecting web page content changes |
US7941420B2 (en) | | Method for organizing structurally similar web pages from a web site |
US9384175B2 (en) | | Determination of differences between electronic documents |
US11205041B2 (en) | | Web element rediscovery system and method |
US9639631B2 (en) | | Converting XML to JSON with configurable output |
US20090063538A1 (en) | | Method for normalizing dynamic urls of web pages through hierarchical organization of urls from a web site |
JP2010501096A (ja) | | Joint optimization of wrapper generation and template detection |
US8321382B2 (en) | | Validating aggregate documents |
KR20120124581A (ko) | | Improved similar document detection method, apparatus, and computer-readable recording medium |
US8762829B2 (en) | | Robust wrappers for web extraction |
JP2009543255A (ja) | | Mapping hierarchical and sequential document trees to identify parallel data |
JP2006004417A (ja) | | Method and apparatus for recognizing specific types of information files |
Gowda et al. | | Clustering web pages based on structure and style similarity (application paper) |
CN107862039B (zh) | | Web page data acquisition method and system, and data matching and pushing method |
Ferrara et al. | | Automatic wrapper adaptation by tree edit distance matching |
CN114817811B (zh) | | Website parsing method and device |
Döhmen et al. | | Multi-hypothesis CSV parsing |
CN111782798B (zh) | | Summary generation method, device, and equipment, and project management method |
CN104765882A (zh) | | Internet website statistics method based on web page feature strings |
CN105740370B (zh) | | Online Web news content extraction system |
US8954438B1 (en) | | Structured metadata extraction |
CN110413307B (zh) | | Code function association method and device, and electronic device |
US20090204889A1 (en) | | Adaptive sampling of web pages for extraction |
Nethra et al. | | Web content extraction using hybrid approach |
CN103870590A (zh) | | Method and device for identifying web pages with error-reporting features |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
ASS | Succession or assignment of patent right | Owner name: NSFOCUS TECHNOLOGY CO., LTD. Effective date: 20140514 |
C41 | Transfer of patent application or patent right or utility model | ||
TR01 | Transfer of patent right | Effective date of registration: 20140514. Address after: 100089 3rd floor, Yitai building, 4 Beiwa Road, Haidian District, Beijing. Patentee after: NSFOCUS INFORMATION TECHNOLOGY Co.,Ltd.; NSFOCUS TECHNOLOGIES Inc. Address before: 100089 3rd floor, Yitai building, 4 Beiwa Road, Haidian District, Beijing. Patentee before: NSFOCUS INFORMATION TECHNOLOGY Co.,Ltd. |
CP01 | Change in the name or title of a patent holder | Address after: 100089 3rd floor, Yitai building, 4 Beiwa Road, Haidian District, Beijing. Patentee after: NSFOCUS Technologies Group Co.,Ltd.; NSFOCUS TECHNOLOGIES Inc. Address before: 100089 3rd floor, Yitai building, 4 Beiwa Road, Haidian District, Beijing. Patentee before: NSFOCUS INFORMATION TECHNOLOGY Co.,Ltd.; NSFOCUS TECHNOLOGIES Inc. |
CP01 | Change in the name or title of a patent holder | ||