CN103853717A - Web crawler - Google Patents
Web crawler
- Publication number
- CN103853717A (application CN201210495699.4A)
- Authority
- CN
- China
- Prior art keywords
- ajax
- data
- node
- webpage
- web crawlers
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Between Computers (AREA)
Abstract
Description
Claims (4)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210495699.4A CN103853717B (zh) | 2012-11-28 | 2012-11-28 | Web crawler system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210495699.4A CN103853717B (zh) | 2012-11-28 | 2012-11-28 | Web crawler system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103853717A (zh) | 2014-06-11 |
CN103853717B (zh) | 2018-10-12 |
Family
ID=50861385
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210495699.4A Active CN103853717B (zh) | Web crawler system | 2012-11-28 | 2012-11-28 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103853717B (zh) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104111836A (zh) * | 2014-07-14 | 2014-10-22 | 浪潮软件集团有限公司 | Method for handling asynchronously loaded data in web collection |
CN106020897A (zh) * | 2016-05-30 | 2016-10-12 | 深圳市华傲数据技术有限公司 | Dynamic management method, apparatus, and system for web crawlers |
CN106649567A (zh) * | 2016-11-15 | 2017-05-10 | 杭州安恒信息技术有限公司 | Web crawler system based on a browser kernel |
CN107729385A (zh) * | 2017-09-19 | 2018-02-23 | 杭州安恒信息技术有限公司 | Method for collecting the complete data content of dynamic web pages |
CN109951457A (zh) * | 2019-03-04 | 2019-06-28 | 广州博士信息技术研究院有限公司 | Anti-crawler system and method based on HTML5 features |
CN110069683A (zh) * | 2017-09-18 | 2019-07-30 | 北京国双科技有限公司 | Browser-based method and apparatus for crawling data |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020103823A1 (en) * | 2001-02-01 | 2002-08-01 | International Business Machines Corporation | Method and system for extending the performance of a web crawler |
US7536389B1 (en) * | 2005-02-22 | 2009-05-19 | Yahoo! Inc. | Techniques for crawling dynamic web content |
CN101515300A (zh) * | 2009-04-02 | 2009-08-26 | 阿里巴巴集团控股有限公司 | Method and system for crawling Ajax web page content |
CN102609518A (zh) * | 2012-02-09 | 2012-07-25 | 清华大学 | Method and system for acquiring multi-state Ajax web page content |
Non-Patent Citations (3)
Title |
---|
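夏天: "A survey of research on data collection from Ajax sites", 《情报分析与研究》 * |
胡亚楠: "Techniques for and implementation of social network data acquisition", 《中国优秀硕士学位论文全文数据库信息科技辑》 * |
钱程 et al.: "Design and implementation of a web crawler supporting the Ajax framework", 《计算机与数字工程》 * |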
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104111836A (zh) * | 2014-07-14 | 2014-10-22 | 浪潮软件集团有限公司 | Method for handling asynchronously loaded data in web collection |
CN106020897A (zh) * | 2016-05-30 | 2016-10-12 | 深圳市华傲数据技术有限公司 | Dynamic management method, apparatus, and system for web crawlers |
CN106649567A (zh) * | 2016-11-15 | 2017-05-10 | 杭州安恒信息技术有限公司 | Web crawler system based on a browser kernel |
CN110069683A (zh) * | 2017-09-18 | 2019-07-30 | 北京国双科技有限公司 | Browser-based method and apparatus for crawling data |
CN110069683B (zh) * | 2017-09-18 | 2021-08-13 | 北京国双科技有限公司 | Browser-based method and apparatus for crawling data |
CN107729385A (zh) * | 2017-09-19 | 2018-02-23 | 杭州安恒信息技术有限公司 | Method for collecting the complete data content of dynamic web pages |
CN109951457A (zh) * | 2019-03-04 | 2019-06-28 | 广州博士信息技术研究院有限公司 | Anti-crawler system and method based on HTML5 features |
Also Published As
Publication number | Publication date |
---|---|
CN103853717B (zh) | 2018-10-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Yu et al. | Understanding mashup development | |
US9330179B2 (en) | Configuring web crawler to extract web page information | |
US10534512B2 (en) | System and method for identifying web elements present on a web-page | |
US8762556B2 (en) | Displaying content on a mobile device | |
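CN102597993B (zh) | Managing application state information using uniform resource identifiers | |
CN103853717A (zh) | Web crawler | |
KR101569984B1 (ko) | Method for configuring data extracted by web scraping | |
CN102349066A (zh) | New tab page and bookmark toolbar in a browser | |
CN101876897A (zh) | System and method for processing widgets in a web browser | |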
US8893097B2 (en) | Tool configured to build cross-browser toolbar | |
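CN107239546A (zh) | Method for tracking and alerting on partial web page content | |
CN109710250B (zh) | Visualization engine system and method for building user interfaces | |
CN103177115A (zh) | Method and apparatus for extracting web page links | |
CN103092936A (zh) | Method for collecting real-time information from dynamic Internet of Things pages | |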
US9122484B2 (en) | Method and apparatus for mashing up web applications | |
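CN108595697A (zh) | Web page integration method, apparatus, and system | |
CN112612943A (zh) | Data crawling method with automated testing, based on an asynchronous processing framework | |
CN114398138B (zh) | Interface generation method, apparatus, computer device, and storage medium | |
CN113849718A (zh) | Apparatus, method, and storage medium for automatically collecting tobacco science and technology intelligence from the Internet | |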
Shao et al. | Webevo: taming web application evolution via detecting semantic structure changes | |
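CN112068833B (zh) | Browser system | |
JP2010015292A (ja) | Method for adding highlighting, display control program, and server | |
CN110309465B (zh) | Method and apparatus for designing a headless simulated browser component | |
KR101231329B1 (ko) | System for extracting web data in a mobile environment | |
JP5476867B2 (ja) | Mashup program, mashup apparatus, and mashup method |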
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C41 | Transfer of patent application or patent right or utility model | ||
TA01 | Transfer of patent application right |
Effective date of registration: 2016-09-14; Address after: East Building 11, 100195 Beijing city Haidian District xingshikou Road No. 65 west Shan creative garden district 1-4 four layer of 1-4 layer; Applicant after: Beijing Jingdong Shangke Information Technology Co., Ltd.; Address before: 201203 Shanghai city Pudong New Area Zu Road No. 295 Room 102; Applicant before: Niuhai Information Technology (Shanghai) Co., Ltd. |
GR01 | Patent grant | ||
TR01 | Transfer of patent right |
Effective date of registration: 2020-11-12; Address after: No.8-6, Putou South Road, Haicang District, Xiamen City, Fujian Province; Patentee after: Xiamen xinjianfu e-commerce Co., Ltd; Address before: East Building 11, 100195 Beijing city Haidian District xingshikou Road No. 65 west Shan creative garden district 1-4 four layer of 1-4 layer; Patentee before: BEIJING JINGDONG SHANGKE INFORMATION TECHNOLOGY Co.,Ltd. |
TR01 | Transfer of patent right |
Effective date of registration: 2021-05-12; Address after: 361000 No.8, Putou South Road, Haicang District, Xiamen City, Fujian Province; Patentee after: Xiamen Jianfu Chain Management Co.,Ltd.; Address before: No.8-6, Putou South Road, Haicang District, Xiamen City, Fujian Province 361022; Patentee before: Xiamen xinjianfu e-commerce Co., Ltd |
TR01 | Transfer of patent right |