CN103853717B - Web crawler system - Google Patents
Web crawler system
- Publication number
- CN103853717B (application number CN201210495699.4A)
- Authority
- CN
- China
- Prior art keywords
- ajax
- data
- crawl
- webpage
- nodes
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Between Computers (AREA)
Abstract
Description
Claims (3)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210495699.4A CN103853717B (zh) | 2012-11-28 | 2012-11-28 | Web crawler system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210495699.4A CN103853717B (zh) | 2012-11-28 | 2012-11-28 | Web crawler system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103853717A (zh) | 2014-06-11 |
CN103853717B (zh) | 2018-10-12 |
Family
ID=50861385
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210495699.4A Active CN103853717B (zh) | Web crawler system | 2012-11-28 | 2012-11-28 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103853717B (zh) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
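CN104111836A (zh) * | 2014-07-14 | 2014-10-22 | Inspur Software Group Co., Ltd. | Method for handling asynchronously loaded data in web collection |
CN106020897A (zh) * | 2016-05-30 | 2016-10-12 | Shenzhen Huaao Data Technology Co., Ltd. | Dynamic management method, device, and system for web crawlers |
CN106649567A (zh) * | 2016-11-15 | 2017-05-10 | Hangzhou Anheng Information Technology Co., Ltd. | Web crawler system based on a browser kernel |
CN110069683B (zh) * | 2017-09-18 | 2021-08-13 | Beijing Gridsum Technology Co., Ltd. | Method and device for crawling data based on a browser |
CN107729385A (zh) * | 2017-09-19 | 2018-02-23 | Hangzhou Anheng Information Technology Co., Ltd. | Method for collecting the complete data content of dynamic web pages |
CN109951457A (zh) * | 2019-03-04 | 2019-06-28 | Guangzhou Boshi Information Technology Research Institute Co., Ltd. | Anti-crawler system and method based on HTML5 features |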
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7536389B1 (en) * | 2005-02-22 | 2009-05-19 | Yahoo! Inc. | Techniques for crawling dynamic web content |
CN101515300A (zh) * | 2009-04-02 | 2009-08-26 | Alibaba Group Holding Ltd. | Method and system for crawling Ajax web page content |
CN102609518A (zh) * | 2012-02-09 | 2012-07-25 | Tsinghua University | Method and system for acquiring multi-state Ajax web page content |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6988100B2 (en) * | 2001-02-01 | 2006-01-17 | International Business Machines Corporation | Method and system for extending the performance of a web crawler |
2012
- 2012-11-28: CN application CN201210495699.4A (patent CN103853717B), status: Active
Non-Patent Citations (1)
Title |
---|
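Social Network Data Acquisition Technology and Implementation; Hu Yanan; China Master's Theses Full-text Database, Information Science and Technology; 2012-05-15; thesis body, pp. 10-17 * |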
Also Published As
Publication number | Publication date |
---|---|
CN103853717A (zh) | 2014-06-11 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN103853717B (zh) | Web crawler system | |
Mesbah et al. | Migrating multi-page web applications to single-page Ajax interfaces | |
CN102597993B (zh) | Managing application state information using uniform resource identifiers | |
CN102349066B (zh) | New tab page and bookmark toolbar in a browser | |
US20160259773A1 (en) | System and method for identifying web elements present on a web-page | |
US20110197124A1 (en) | Automatic Creation And Management Of Dynamic Content | |
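CN104217036B (zh) | Method and device for extracting web page content | |
US20070198727A1 (en) | Method, apparatus and system for extracting field-specific structured data from the web using sample | |
KR101569984B1 (ko) | Method for configuring the data to be extracted by web scraping | |
CN104375858B (zh) | Method and device for executing JavaScript scripts on multiple browser platforms | |
CN106209863B (zh) | Website security monitoring method based on full-site scanning | |
CN102262635A (zh) | Web crawler system and method | |
JP4935399B2 (ja) | Security operation management system, method, and program | |
CN114398138B (zh) | Interface generation method, apparatus, computer device, and storage medium | |
CN111381809B (zh) | Method and device for finding a focus page | |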
US6772395B1 (en) | Self-modifying data flow execution architecture | |
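Shao et al. | WebEvo: taming web application evolution via detecting semantic structure changes | |
CN113849718A (zh) | Device, method, and storage medium for automatically collecting tobacco science and technology intelligence from the Internet | |
Alashqar | Automatic generation of UML diagrams from scenario-based user requirements | |
CN103399746B (zh) | Information management system that facilitates secondary development, and development method | |
JP2011070541A (ja) | Online marketing support method and online marketing support device | |
KR101231329B1 (ko) | System for web data extraction in a mobile environment | |
CN110309465A (zh) | Method and device for designing a headless simulated browser component | |
CN113836450B (zh) | Method for generating a data interface that obtains XPath through visual operations | |
Scaffidi et al. | Using scenario-based requirements to direct research on web macro tools | |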
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C41 | Transfer of patent application or patent right or utility model | |
TA01 | Transfer of patent application right | Effective date of registration: 2016-09-14; Address after: East Building 11, 100195 Beijing city Haidian District xingshikou Road No. 65 west Shan creative garden district 1-4 four layer of 1-4 layer; Applicant after: Beijing Jingdong Shangke Information Technology Co., Ltd.; Address before: 201203 Shanghai city Pudong New Area Zu Road No. 295 Room 102; Applicant before: Niuhai Information Technology (Shanghai) Co., Ltd. |
GR01 | Patent grant | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 2020-11-12; Address after: No.8-6, Putou South Road, Haicang District, Xiamen City, Fujian Province; Patentee after: Xiamen Xinjianfu E-commerce Co., Ltd.; Address before: East Building 11, 100195 Beijing city Haidian District xingshikou Road No. 65 west Shan creative garden district 1-4 four layer of 1-4 layer; Patentee before: Beijing Jingdong Shangke Information Technology Co., Ltd. |
TR01 | Transfer of patent right | |
TR01 | Transfer of patent right | Effective date of registration: 2021-05-12; Address after: 361000 No.8, Putou South Road, Haicang District, Xiamen City, Fujian Province; Patentee after: Xiamen Jianfu Chain Management Co., Ltd.; Address before: No.8-6, Putou South Road, Haicang District, Xiamen City, Fujian Province 361022; Patentee before: Xiamen Xinjianfu E-commerce Co., Ltd. |
TR01 | Transfer of patent right | |