CN104008190B - Crawler system and method - Google Patents
Crawler system and method
- Publication number
- CN104008190B (application number CN201410259561.3A)
- Authority
- CN
- China
- Prior art keywords
- channel
- information
- crawler system
- web page
- dynamic web
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Information Transfer Between Computers (AREA)
- Computer And Data Communications (AREA)
Abstract
Description
Claims (1)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410259561.3A CN104008190B (zh) | 2014-06-12 | 2014-06-12 | Crawler system and method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410259561.3A CN104008190B (zh) | 2014-06-12 | 2014-06-12 | Crawler system and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104008190A (zh) | 2014-08-27 |
CN104008190B (zh) | 2017-04-19 |
Family
ID=51368847
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410259561.3A (Active) CN104008190B (zh) | Crawler system and method | 2014-06-12 | 2014-06-12 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104008190B (zh) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
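CN106933841A (zh) * | 2015-12-29 | 2017-07-07 | 北京国双科技有限公司 | Method and device for crawling forum directory page content |
CN106933827A (zh) * | 2015-12-29 | 2017-07-07 | 北京国双科技有限公司 | Method and device for parsing forum directory page content |
CN114817820B (zh) * | 2022-06-30 | 2022-10-14 | 深圳希施玛数据科技有限公司 | Early-warning method for website data upgrades and related device |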
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
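CN102222310A (zh) * | 2011-07-18 | 2011-10-19 | 深圳证券信息有限公司 | Securities information publishing method and platform |
CN102402627A (zh) * | 2011-12-31 | 2012-04-04 | 凤凰在线(北京)信息技术有限公司 | System and method for real-time intelligent article capture |
CN102521379A (zh) * | 2011-12-19 | 2012-06-27 | 上海交通大学 | Internet information collection method and device based on active push technology |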
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080281793A1 (en) * | 2007-01-11 | 2008-11-13 | Anup Kumar Mathur | Method and System of Information Engine with Make-Share-Search of consumer and professional Information and Content for Multi-media and Mobile Global Internet |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
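CN102222310A (zh) * | 2011-07-18 | 2011-10-19 | 深圳证券信息有限公司 | Securities information publishing method and platform |
CN102521379A (zh) * | 2011-12-19 | 2012-06-27 | 上海交通大学 | Internet information collection method and device based on active push technology |
CN102402627A (zh) * | 2011-12-31 | 2012-04-04 | 凤凰在线(北京)信息技术有限公司 | System and method for real-time intelligent article capture |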
Non-Patent Citations (1)
Title |
---|
How search engine crawlers work: the big reveal; 站长之家 user; 《http://www.chinaz.com/web/2013/0325/297115.shtml》; 2013-03-25; pp. 1-2 * |
Also Published As
Publication number | Publication date |
---|---|
CN104008190A (zh) | 2014-08-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
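CN104125209B (zh) | | Malicious URL warning method and router |
CN102054028B (zh) | | Method for implementing page rendering in a web crawler system |
CN104462156B (zh) | | User-behavior-based feature extraction and personalized recommendation method and system |
CN103218431B (zh) | | System capable of identifying automated collection of web page information |
CN102591992A (zh) | | Web page classification and recognition system and method based on vertical search and focused crawler technology |
CN103077250B (zh) | | Web page content capture method and device |
CN105760379B (zh) | | Method and device for detecting webshell pages based on intra-domain page association relationships |
CN101520798A (zh) | | Web page classification technique based on vertical search and focused crawling |
CN102436564A (zh) | | Method and device for identifying tampered web pages |
CN102355488A (zh) | | Crawler seed acquisition method and device, and crawler crawling method and device |
US20130325919A1 | | Resolving a dead shortened uniform resource locator |
CN105357192B (zh) | | Web page push method, device and system |
CN104615627A (zh) | | Method and system for extracting event public-opinion information based on a microblog platform |
CN104008190B (zh) | | Crawler system and method |
Bhargav et al. | | Pattern discovery and users classification through web usage mining |
WO2017167391A1 | | Method and system for preserving privacy in an HTTP communication between a client and a server |
CN103152387B (zh) | | Device and method for obtaining HTTP user behavior trajectories |
CN104199893A (zh) | | System and method for rapidly publishing omnimedia content |
CN103761257A (zh) | | Mobile-browser-based web page processing method and system |
CN103605742B (zh) | | Method and device for identifying entity directory pages of network resources |
CN103312692A (zh) | | Link address security detection method and device |
CN108280102A (zh) | | Internet access behavior recording method, device and user terminal |
CN104835052A (zh) | | Method and system for improving online advertisement delivery accuracy |
CN103354546A (zh) | | Message filtering method and device |
EP3789890A1 (en) | | Fully qualified domain name (FQDN) determination |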
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C41 | Transfer of patent application or patent right or utility model | |
TA01 | Transfer of patent application right | Effective date of registration: 2016-06-22. Address after: Room 802, North Building, No. 1 Xichun Road, Yuhuatai District, Nanjing, Jiangsu, 210012. Applicant after: JIANGSU WAFA INFORMATION TECHNOLOGY Co.,Ltd. Address before: 210000, Nanjing, Xiaguan District, Heyan Road, No. two, 63, 1, 3. Applicant before: NANJING BOSHI INFORMATION TECHNOLOGY CO.,LTD. |
GR01 | Patent grant | |
CP03 | Change of name, title or address | Address after: Room 1402, Building 1, Yunmi City, No. 19 Ningshuang Road, Yuhuatai District, Nanjing City, Jiangsu Province, 210012. Patentee after: Minxing Information Technology Co.,Ltd. Address before: Room 802, North Building, No. 1 Xichun Road, Yuhuatai District, Nanjing City, Jiangsu Province, 210012. Patentee before: JIANGSU WAFA INFORMATION TECHNOLOGY Co.,Ltd. |