CN102103636B - Deep web-oriented incremental information acquisition method - Google Patents
- Publication number: CN102103636B (application CN 201110020898)
- Authority: CN (China)
- Prior art keywords: data, value, url, frequency, timeliness
- Prior art date
- Legal status: Expired - Fee Related
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Claims (4)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201110020898 CN102103636B (zh) | 2011-01-18 | 2011-01-18 | Deep web-oriented incremental information acquisition method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201110020898 CN102103636B (zh) | 2011-01-18 | 2011-01-18 | Deep web-oriented incremental information acquisition method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102103636A (zh) | 2011-06-22 |
CN102103636B (zh) | 2013-08-07 |
Family
ID=44156406
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 201110020898 (Expired - Fee Related), CN102103636B (zh) | Deep web-oriented incremental information acquisition method | 2011-01-18 | 2011-01-18 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102103636B (zh) |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
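CN104036046B (zh) * | 2014-07-02 | 2017-05-03 | 重庆大学 | Deep Web query interface schema matching method based on attribute co-occurrence patterns |
CN104391917A (zh) * | 2014-11-19 | 2015-03-04 | 四川长虹电器股份有限公司 | Method for incrementally crawling web page content |
US10223380B2 (en) * | 2016-03-23 | 2019-03-05 | Here Global B.V. | Map updates from a connected vehicle fleet |
CN105912456B (zh) * | 2016-05-10 | 2019-01-22 | 福建师范大学 | Method for simulated generation of large data sets based on user interest migration |
CN111831908A (zh) * | 2020-06-24 | 2020-10-27 | 平安科技(深圳)有限公司 | Medical domain knowledge graph construction method, apparatus, device, and storage medium |
CN113021818A (zh) * | 2021-03-25 | 2021-06-25 | 弘丰塑胶制品(深圳)有限公司 | Control system for an injection mold with an automatic stripping function |
CN113190585A (zh) * | 2021-04-12 | 2021-07-30 | 郑州轻工业大学 | Big data collection and analysis system for clothing design |
CN113327653A (zh) * | 2021-04-27 | 2021-08-31 | 江苏轩辕特种材料科技有限公司 | Mixing preprocessing system for a novel alloy material |
CN113112584B (zh) * | 2021-05-12 | 2022-09-23 | 中南大学湘雅医院 | Powered intelligent joint muscle-building orthopedic brace, control system, terminal, and medium |
CN113239091A (zh) * | 2021-05-14 | 2021-08-10 | 杭州志卓科技股份有限公司 | Intelligent evaluation system for users of an artificial intelligence B2B website |
CN113409549A (zh) * | 2021-06-11 | 2021-09-17 | 中铁西南科学研究院有限公司 | Landslide monitoring and early warning system for alpine canyon areas |
CN114324334A (zh) * | 2021-12-30 | 2022-04-12 | 中国热带农业科学院热带作物品种资源研究所 | Evaluation system for the nutritional quality of mango germplasm resources |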
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101051313A (zh) * | 2007-05-09 | 2007-10-10 | 崔志明 | Data source discovery method for deep web data source integration |
CN101582074A (zh) * | 2009-01-21 | 2009-11-18 | 东北大学 | Deep Web response page data extraction method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7249135B2 (en) * | 2004-05-14 | 2007-07-24 | Microsoft Corporation | Method and system for schema matching of web databases |
Also Published As
Publication number | Publication date |
---|---|
CN102103636A (zh) | 2011-06-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
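CN102103636B (zh) | | Deep web-oriented incremental information acquisition method |
CN103310026B (zh) | | Lightweight general-purpose web page topic crawler method based on a search engine |
CN102760151B (zh) | | Implementation method for an open-source software acquisition and search system |
CN111708740A (zh) | | Cloud platform-based computation and analysis system for massive search query logs |
CN101770520A (zh) | | User interest modeling method based on user browsing behavior |
CN103309960A (zh) | | Method and apparatus for multi-dimensional information extraction of online public opinion events |
CN102567407B (zh) | | Method and system for incremental collection of forum replies |
CN104376406A (zh) | | Big data-based enterprise innovation resource management and analysis system and method |
CN103559252A (zh) | | Method for recommending to tourists the attractions they are likely to visit |
CN102402539A (zh) | | Design technique for an object-level personalized vertical search engine |
CN102262661A (zh) | | Web page access prediction method based on a k-order mixed Markov model |
CN103488760A (zh) | | Method for providing geographic information tile services and apparatus implementing the method |
CN102662954A (zh) | | Implementation method for a topic crawler system based on learning from URL string information |
CN103714140A (zh) | | Search method and apparatus based on a topic web crawler |
CN103150663A (zh) | | Method and apparatus for delivering network placement data |
CN104899229A (zh) | | Behavior clustering system based on swarm intelligence |
CN105760443A (zh) | | Item recommendation system, item recommendation apparatus, and item recommendation method |
JP2016540332A (ja) | | Visual-semantic composite network and method for forming the network |
CN104298785A (zh) | | Crowd search resource search method |
CN105824880A (zh) | | Web page crawling method and apparatus |
CN102306182A (zh) | | Method for mining user interests based on a concept semantic background graph |
CN107103063A (zh) | | Big data-based retrieval and query system for scientific and technological information resources |
CN109977285B (zh) | | Adaptive incremental data collection method for the Deep Web |
CN116680469A (zh) | | Sequential recommendation algorithm based on a dynamic graph neural network |
Zha et al. | | An Efficient Improved Strategy for the PageRank Algorithm |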
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |
EE01 | Entry into force of recordation of patent licensing contract | Application publication date: 2011-06-22; Assignee: Science and Technology Co., Ltd. is swum in Jiangsu at once; Assignor: Nanjing University of Information Science and Technology; Contract record no.: 2015320000189; Denomination of invention: Deep web-oriented incremental information acquisition method; Granted publication date: 2013-08-07; License type: Exclusive License; Record date: 2015-04-14 |
LICC | Enforcement, change and cancellation of record of contracts on the licence for exploitation of a patent or utility model | |
C41 | Transfer of patent application or patent right or utility model | |
TR01 | Transfer of patent right | Effective date of registration: 2016-12-26; Address after: 225400 Jiangsu Province, Taixing City Industrial Park Xiangrong Road No. 18; Patentee after: JIANGSU QIANJING INFORMATION TECHNOLOGY CO., LTD.; Address before: 210044 No. 219 Ningliu Road, Nanjing, Jiangsu; Patentee before: Nanjing University of Information Science and Technology |
TR01 | Transfer of patent right | |
TR01 | Transfer of patent right | Effective date of registration: 2018-01-10; Address after: 210044 No. 219 Ningliu Road, Nanjing, Jiangsu; Patentee after: Nanjing University of Information Science and Technology; Address before: 225400 Jiangsu Province, Taixing City Industrial Park Xiangrong Road No. 18; Patentee before: JIANGSU QIANJING INFORMATION TECHNOLOGY CO., LTD. |
CF01 | Termination of patent right due to non-payment of annual fee | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2013-08-07; Termination date: 2018-01-18 |