CN102117275B - Method and device for collecting web page data from targeted Internet sites - Google Patents
Method and device for collecting web page data from targeted Internet sites
- Publication number
- CN102117275B (application CN2009102175052A)
- Authority
- CN
- China
- Prior art keywords
- url
- formation
- collected
- weights
- priority
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2009102175052A CN102117275B (zh) | 2009-12-31 | 2009-12-31 | Method and device for collecting web page data from targeted Internet sites |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2009102175052A CN102117275B (zh) | 2009-12-31 | 2009-12-31 | Method and device for collecting web page data from targeted Internet sites |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102117275A (zh) | 2011-07-06 |
CN102117275B (zh) | 2012-11-07 |
Family
ID=44216049
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2009102175052A (Expired - Fee Related) CN102117275B (zh) | 2009-12-31 | 2009-12-31 | Method and device for collecting web page data from targeted Internet sites |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102117275B (zh) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103617225B (zh) * | 2013-11-25 | 2019-03-08 | 北京奇虎科技有限公司 | Associated web page search method and system |
CN104715016B (zh) * | 2015-02-04 | 2018-02-16 | 北京中搜搜悦网络技术有限公司 | Souyue (搜悦) collection method |
CN104679838A (zh) * | 2015-02-09 | 2015-06-03 | 北京中搜网络技术股份有限公司 | Method for efficient information collection |
CN107025235A (zh) * | 2016-02-01 | 2017-08-08 | 北京国双科技有限公司 | Method and device for crawling web pages |
CN106845092B (zh) * | 2017-01-03 | 2021-06-04 | 青岛海信医疗设备股份有限公司 | System interfacing method and device |
CN110233776A (zh) * | 2019-05-31 | 2019-09-13 | 湃方科技(北京)有限责任公司 | Method and device for monitoring the condition of rotating machinery |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101004740A (zh) * | 2006-01-18 | 2007-07-25 | 腾讯科技(深圳)有限公司 | Method and system for reading network resource site information, and search engine |
CN101051313A (zh) * | 2007-05-09 | 2007-10-10 | 崔志明 | Data source discovery method for deep web data source integration |
CN101178713A (zh) * | 2006-11-29 | 2008-05-14 | 腾讯科技(深圳)有限公司 | Method and system for collecting web pages |
CN101261643A (zh) * | 2008-05-04 | 2008-09-10 | 腾讯科技(深圳)有限公司 | Method and device for website page information statistics |
US7599920B1 (en) * | 2006-10-12 | 2009-10-06 | Google Inc. | System and method for enabling website owners to manage crawl rate in a website indexing system |
CN101561814A (zh) * | 2009-05-08 | 2009-10-21 | 华中科技大学 | Topic crawler system based on social annotation |
- 2009-12-31: CN application CN2009102175052A (patent CN102117275B), status: Expired - Fee Related
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101004740A (zh) * | 2006-01-18 | 2007-07-25 | 腾讯科技(深圳)有限公司 | Method and system for reading network resource site information, and search engine |
US7599920B1 (en) * | 2006-10-12 | 2009-10-06 | Google Inc. | System and method for enabling website owners to manage crawl rate in a website indexing system |
CN101178713A (zh) * | 2006-11-29 | 2008-05-14 | 腾讯科技(深圳)有限公司 | Method and system for collecting web pages |
CN101051313A (zh) * | 2007-05-09 | 2007-10-10 | 崔志明 | Data source discovery method for deep web data source integration |
CN101261643A (zh) * | 2008-05-04 | 2008-09-10 | 腾讯科技(深圳)有限公司 | Method and device for website page information statistics |
CN101561814A (zh) * | 2009-05-08 | 2009-10-21 | 华中科技大学 | Topic crawler system based on social annotation |
Also Published As
Publication number | Publication date |
---|---|
CN102117275A (zh) | 2011-07-06 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US9094478B2 (en) | Prereading method and system for web browser | |
CN102117275B (zh) | Method and device for collecting web page data from targeted Internet sites | |
Baraglia et al. | Dynamic personalization of web sites without user intervention | |
CN101488135B (zh) | Method for designing and acquiring deferred personalized web pages | |
US9443197B1 (en) | Predicting user navigation events | |
JP5588981B2 (ja) | Providing posts to discussion threads in response to search queries | |
CN103997507B (zh) | Information push method and device | |
US8775550B2 (en) | Caching HTTP request and response streams | |
CN102426610B (zh) | Microblog search ranking method and microblog search engine | |
US9589056B2 (en) | User information needs based data selection | |
CN102761627A (zh) | Cloud URL recommendation method, system, and related devices based on terminal access statistics | |
US20110087647A1 (en) | System and method for providing web search results to a particular computer user based on the popularity of the search results with other computer users | |
CN103607496B (zh) | Method and device for inferring mobile phone users' interests and hobbies, and mobile phone terminal | |
US9785661B2 (en) | Trend response management | |
US20090276729A1 (en) | Adaptive user feedback window | |
EP2395441A1 (en) | Systems and methods for online search recirculation and query categorization | |
CN105721538A (zh) | Data access method and device | |
Yan et al. | Big data driven wireless communications: A human-in-the-loop pushing technique for 5G systems | |
CN101188521B (zh) | Method and website server for mining user behavior data | |
CN103559258A (zh) | Web page ranking method based on cloud computing | |
Antunes et al. | Scalable semantic aware context storage | |
Chauhan et al. | Web page ranking using machine learning approach | |
Khodaei et al. | Temporal-textual retrieval: Time and keyword search in web documents | |
Bai et al. | Collaborative personalized top-k processing | |
CN110990706B (zh) | Corpus recommendation method and device | |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 2022-09-15. Address after: 3007, Hengqin international financial center building, No. 58, Huajin street, Hengqin new area, Zhuhai, Guangdong 519031. Patentees after: New founder holdings development Co.,Ltd.; Peking University; BEIJING FOUNDER ELECTRONICS CHIEF INFORMATION TECHNOLOGY Co.,Ltd.; BEIJING FOUNDER ELECTRONICS Co.,Ltd. Address before: 100871, Beijing, Haidian District Cheng Fu Road 298, founder building, 9 floor. Patentees before: PEKING UNIVERSITY FOUNDER GROUP Co.,Ltd.; Peking University; BEIJING FOUNDER ELECTRONICS CHIEF INFORMATION TECHNOLOGY Co.,Ltd.; BEIJING FOUNDER ELECTRONICS Co.,Ltd. |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2012-11-07 |