CN104008190A - Crawler system and method thereof - Google Patents
Crawler system and method thereof
- Publication number: CN104008190A
- Application number: CN201410259561.3A
- Authority: CN (China)
- Prior art keywords: channel, information, crawler system, web page, date issued
- Prior art date
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Information Transfer Between Computers (AREA)
- Computer And Data Communications (AREA)
Abstract
Description
Claims (4)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410259561.3A CN104008190B (zh) | 2014-06-12 | 2014-06-12 | Crawler system and method thereof |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410259561.3A CN104008190B (zh) | 2014-06-12 | 2014-06-12 | Crawler system and method thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104008190A (zh) | 2014-08-27 |
CN104008190B (zh) | 2017-04-19 |
Family
ID=51368847
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410259561.3A Active CN104008190B (zh) | Crawler system and method thereof | 2014-06-12 | 2014-06-12 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104008190B (zh) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
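CN106933841A (zh) * | 2015-12-29 | 2017-07-07 | Beijing Gridsum Technology Co., Ltd. | Method and device for crawling forum directory page content |
CN106933827A (zh) * | 2015-12-29 | 2017-07-07 | Beijing Gridsum Technology Co., Ltd. | Method and device for parsing forum directory page content |
CN114817820A (zh) * | 2022-06-30 | 2022-07-29 | Shenzhen CSMAR Data Technology Co., Ltd. | Early-warning method for website data upgrades and related device |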
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080281793A1 (en) * | 2007-01-11 | 2008-11-13 | Anup Kumar Mathur | Method and System of Information Engine with Make-Share-Search of consumer and professional Information and Content for Multi-media and Mobile Global Internet |
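CN102222310A (zh) * | 2011-07-18 | 2011-10-19 | Shenzhen Securities Information Co., Ltd. | Securities information publishing method and platform |
CN102402627A (zh) * | 2011-12-31 | 2012-04-04 | Phoenix Online (Beijing) Information Technology Co., Ltd. | System and method for real-time intelligent article capture |
CN102521379A (zh) * | 2011-12-19 | 2012-06-27 | Shanghai Jiao Tong University | Internet information collection method and device based on active push technology |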
Non-Patent Citations (1)
Title |
---|
Chinaz (站长之家) user: "How search engine crawlers work, revealed", 《HTTP://WWW.CHINAZ.COM/WEB/2013/0325/297115.SHTML》 * |
Also Published As
Publication number | Publication date |
---|---|
CN104008190B (zh) | 2017-04-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9412115B2 (en) | Configuring tags to monitor other webpage tags in a tag management system | |
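CN105608117B (zh) | Information recommendation method and device | |
CN102164186B (zh) | Method and system for implementing a cloud search service | |
CN101916285B (zh) | Method and device for parsing Internet web page content | |
US9298839B2 (en) | Resolving a dead shortened uniform resource locator | |
CN103699580A (zh) | Database synchronization method and device | |
CN104090889A (zh) | Data processing method and system | |
CN104915398A (zh) | Method and device for web page event-tracking instrumentation | |
US20160328475A1 (en) | Method and system for scheduling web crawlers according to keyword search | |
CN102591992A (zh) | Web page classification and recognition system and method based on vertical search and focused crawler technology | |
CN101441629A (zh) | Method for automatically collecting unstructured web page information | |
US9836775B2 (en) | System and method for synchronized web scraping | |
CN102867053A (zh) | Method, device, and system for collecting web pages containing valid information from website information | |
CN103207882A (zh) | Shop visit data processing method and system | |
CN103810283A (zh) | Microblog data collection method based on user association relationships | |
CN103186666A (zh) | Method, device, and equipment for searching based on favorites | |
CN104615627A (zh) | Method and system for extracting event public-opinion information based on a microblog platform | |
CN103605848A (zh) | Path analysis method and device | |
CN103778238A (zh) | Method for automatically constructing a classification tree from Wikipedia semi-structured data | |
CN104933168A (zh) | Method for automatically collecting web page content | |
CN104008190A (zh) | Crawler system and method thereof | |
Basyuk | Popularization of website and without anchor promotion | |
CN103246709A (zh) | Method for capturing web page data | |
CN108399224A (zh) | Method for pushing online shopping information | |
CN104156458A (zh) | Information extraction method and device |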
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C41 | Transfer of patent application or patent right or utility model | ||
TA01 | Transfer of patent application right | Effective date of registration: 20160622. Address after: Room 802, North Building, No. 1 Xichun Road, Yuhuatai District, Nanjing, Jiangsu, 210012. Applicant after: JIANGSU WAFA INFORMATION TECHNOLOGY Co.,Ltd. Address before: 210000, Nanjing, Xiaguan District, Heyan Road, No. two, 63, 1, 3. Applicant before: NANJING BOSHI INFORMATION TECHNOLOGY CO.,LTD. |
GR01 | Patent grant | ||
CP03 | Change of name, title or address | Address after: Room 1402, Building 1, Yunmi City, No. 19 Ningshuang Road, Yuhuatai District, Nanjing City, Jiangsu Province, 210012. Patentee after: Minxing Information Technology Co.,Ltd. Address before: Room 802, North Building, No. 1 Xichun Road, Yuhuatai District, Nanjing City, Jiangsu Province, 210012. Patentee before: JIANGSU WAFA INFORMATION TECHNOLOGY Co.,Ltd. |