CN102103636B - Deep web-oriented incremental information acquisition method - Google Patents
Deep web-oriented incremental information acquisition method
- Publication number
- CN102103636B
- Authority
- CN
- China
- Prior art keywords
- data
- value
- frequency
- url
- novelty
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Claims (4)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201110020898 CN102103636B (zh) | 2011-01-18 | 2011-01-18 | Deep web-oriented incremental information acquisition method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201110020898 CN102103636B (zh) | 2011-01-18 | 2011-01-18 | Deep web-oriented incremental information acquisition method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102103636A (zh) | 2011-06-22 |
CN102103636B (zh) | 2013-08-07 |
Family
ID=44156406
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201110020898 (Expired - Fee Related) CN102103636B (zh) | 2011-01-18 | 2011-01-18 | Deep web-oriented incremental information acquisition method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102103636B (zh) |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
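CN104036046B (zh) * | 2014-07-02 | 2017-05-03 | 重庆大学 | Deep Web query interface schema matching method based on attribute co-occurrence patterns |
CN104391917A (zh) * | 2014-11-19 | 2015-03-04 | 四川长虹电器股份有限公司 | Method for incrementally crawling web page content |
US10223380B2 (en) * | 2016-03-23 | 2019-03-05 | Here Global B.V. | Map updates from a connected vehicle fleet |
CN105912456B (zh) * | 2016-05-10 | 2019-01-22 | 福建师范大学 | Method for simulated generation of large data sets based on user interest migration |
CN111831908A (zh) * | 2020-06-24 | 2020-10-27 | 平安科技(深圳)有限公司 | Method, apparatus, device, and storage medium for constructing a medical-domain knowledge graph |
CN113021818A (zh) * | 2021-03-25 | 2021-06-25 | 弘丰塑胶制品(深圳)有限公司 | Control system for an injection mold with an automatic demolding function |
CN113190585A (zh) * | 2021-04-12 | 2021-07-30 | 郑州轻工业大学 | Big data collection and analysis system for clothing design |
CN113327653A (zh) * | 2021-04-27 | 2021-08-31 | 江苏轩辕特种材料科技有限公司 | Mixing and pre-processing system for a novel alloy material |
CN113112584B (zh) * | 2021-05-12 | 2022-09-23 | 中南大学湘雅医院 | Powered intelligent joint muscle-building orthopedic brace, control system, terminal, and medium |
CN113239091A (zh) * | 2021-05-14 | 2021-08-10 | 杭州志卓科技股份有限公司 | Intelligent evaluation system for users of an artificial-intelligence B2B website |
CN113409549A (zh) * | 2021-06-11 | 2021-09-17 | 中铁西南科学研究院有限公司 | Landslide monitoring and early-warning system for alpine canyon areas |
CN114324334A (zh) * | 2021-12-30 | 2022-04-12 | 中国热带农业科学院热带作物品种资源研究所 | System for evaluating the nutritional quality of mango germplasm resources |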
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101051313A (zh) * | 2007-05-09 | 2007-10-10 | 崔志明 | Data source discovery method for deep web data source integration |
CN101582074A (zh) * | 2009-01-21 | 2009-11-18 | 东北大学 | Method for extracting data from Deep Web response pages |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7249135B2 (en) * | 2004-05-14 | 2007-07-24 | Microsoft Corporation | Method and system for schema matching of web databases |
2011
- 2011-01-18: CN application CN 201110020898, patent CN102103636B (zh); not active, Expired - Fee Related
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101051313A (zh) * | 2007-05-09 | 2007-10-10 | 崔志明 | Data source discovery method for deep web data source integration |
CN101582074A (zh) * | 2009-01-21 | 2009-11-18 | 东北大学 | Method for extracting data from Deep Web response pages |
Also Published As
Publication number | Publication date |
---|---|
CN102103636A (zh) | 2011-06-22 |
Similar Documents
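Publication | Publication Date | Title |
---|---|---|
CN102103636B (zh) | | Deep web-oriented incremental information acquisition method |
CN111708740A (zh) | | Cloud platform-based computation and analysis system for massive search query logs |
JP6216467B2 (ja) | | Visual-semantic composite network and method for forming the network |
CN103310026A (zh) | | Lightweight general-purpose topical web crawling method based on a search engine |
CN103309960A (zh) | | Method and apparatus for extracting multidimensional information on online public opinion events |
CN105718590A (zh) | | Multi-tenant SaaS public opinion monitoring system and method |
Saad et al. | | Archiving the web using page changes patterns: a case study |
CN103714140A (zh) | | Search method and apparatus based on a topical web crawler |
CN103412903A (zh) | | Internet of Things real-time search method and system based on prediction of objects of interest |
Wang et al. | | A novel blockchain oracle implementation scheme based on application specific knowledge engines |
CN105824880A (zh) | | Web page crawling method and apparatus |
Ladekar et al. | | Web log based analysis of user's browsing behavior |
CN107103063A (zh) | | Big data-based retrieval and query system for scientific and technical information resources |
CN103198136A (zh) | | Personal computer file query method based on temporal association |
CN109977285B (zh) | | Adaptive incremental data collection method for the Deep Web |
Zha et al. | | An Efficient Improved Strategy for the PageRank Algorithm |
CN104598614B (zh) | | Multi-scale modal diffusion update method for data based on geographic semantics |
CN102156733A (zh) | | Search engine and search method based on service-oriented architecture |
US20100145944A1 (en) | | Mining broad hidden query aspects from user search sessions |
WO2022105780A1 (zh) | | Recommendation method, apparatus, electronic device, and storage medium |
Aggarwal | | Collaborative crawling: Mining user experiences for topical resource discovery |
Huang et al. | | Research and application of public opinion retrieval based on user behavior modeling |
Yan et al. | | Research on PageRank and hyperlink-induced topic search in web structure mining |
CN103838841B (zh) | | Composite service selection method with normalized user requirements |
Khonsha et al. | | New hybrid web personalization framework |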
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |
EE01 | Entry into force of recordation of patent licensing contract | Application publication date: 2011-06-22; Assignee: Science and Technology Co., Ltd. is swum in Jiangsu at once; Assignor: Nanjing University of Information Science and Technology; Contract record no.: 2015320000189; Denomination of invention: Deep web-oriented incremental information acquisition method; Granted publication date: 2013-08-07; License type: Exclusive License; Record date: 2015-04-14 |
LICC | Enforcement, change and cancellation of record of contracts on the licence for exploitation of a patent or utility model | |
C41 | Transfer of patent application or patent right or utility model | |
TR01 | Transfer of patent right | Effective date of registration: 2016-12-26; Address after: No. 18 Xiangrong Road, Industrial Park, Taixing City, Jiangsu Province 225400; Patentee after: JIANGSU QIANJING INFORMATION TECHNOLOGY CO., LTD.; Address before: No. 219 Ningliu Road, Nanjing, Jiangsu 210044; Patentee before: Nanjing University of Information Science and Technology |
TR01 | Transfer of patent right | Effective date of registration: 2018-01-10; Address after: No. 219 Ningliu Road, Nanjing, Jiangsu 210044; Patentee after: Nanjing University of Information Science and Technology; Address before: No. 18 Xiangrong Road, Industrial Park, Taixing City, Jiangsu Province 225400; Patentee before: JIANGSU QIANJING INFORMATION TECHNOLOGY CO., LTD. |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2013-08-07; Termination date: 2018-01-18 |