CN101515300B - Method and system of retrieving Ajax web page content - Google Patents
Method and system of retrieving Ajax web page content
- Publication number
- CN101515300B (application CN2009101336305A)
- Authority
- CN
- China
- Prior art keywords
- javascript
- ajax
- function
- code
- web page
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/14—Tree-structured documents
- G06F40/143—Markup, e.g. Standard Generalized Markup Language [SGML] or Document Type Definition [DTD]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
- G06F16/972—Access to data in other repository systems, e.g. legacy data or dynamic Web page generation
Abstract
Description
Claims (10)
Priority Applications (8)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2009101336305A CN101515300B (zh) | 2009-04-02 | 2009-04-02 | Method and system of retrieving Ajax web page content |
HK10101951.3A HK1136053A1 (en) | 2009-04-02 | 2010-02-24 | Method for retrieving ajax web page content and system thereof |
JP2012503668A JP5695027B2 (ja) | 2009-04-02 | 2010-03-31 | Method and system of retrieving Ajax web page content |
US12/863,320 US8413044B2 (en) | 2009-04-02 | 2010-03-31 | Method and system of retrieving Ajax web page content |
EP10759351A EP2414929A4 (en) | 2009-04-02 | 2010-03-31 | METHOD AND SYSTEM FOR EXTRACTING AJAX INTERNET PAGE CONTENT |
PCT/US2010/029444 WO2010114913A1 (en) | 2009-04-02 | 2010-03-31 | Method and system of retrieving ajax web page content |
US13/756,886 US9767082B2 (en) | 2009-04-02 | 2013-02-01 | Method and system of retrieving ajax web page content |
JP2015021591A JP5990605B2 (ja) | 2009-04-02 | 2015-02-05 | Method and system of retrieving Ajax web page content |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2009101336305A CN101515300B (zh) | 2009-04-02 | 2009-04-02 | Method and system of retrieving Ajax web page content |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101515300A (zh) | 2009-08-26 |
CN101515300B (zh) | 2011-07-20 |
Family
ID=41039753
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2009101336305A Active CN101515300B (zh) | Method and system of retrieving Ajax web page content | 2009-04-02 | 2009-04-02 |
Country Status (6)
Country | Link |
---|---|
US (2) | US8413044B2 (zh) |
EP (1) | EP2414929A4 (zh) |
JP (2) | JP5695027B2 (zh) |
CN (1) | CN101515300B (zh) |
HK (1) | HK1136053A1 (zh) |
WO (1) | WO2010114913A1 (zh) |
Families Citing this family (51)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8539337B2 (en) * | 2008-02-21 | 2013-09-17 | International Business Machines Corporation | Internet services and methods thereof |
US8898623B2 (en) * | 2008-12-30 | 2014-11-25 | The Regents Of The University Of California | Application design and data flow analysis |
CN101515300B (zh) | 2009-04-02 | 2011-07-20 | 阿里巴巴集团控股有限公司 | Method and system of retrieving Ajax web page content |
CN102005130B (zh) * | 2009-09-03 | 2014-01-22 | 上海宝信软件股份有限公司 | WebGIS-based real-time status display system for intelligent transportation equipment |
US9052908B2 (en) * | 2010-01-22 | 2015-06-09 | The Regents Of The University Of California | Web application development framework |
CN102236546B (zh) * | 2010-04-30 | 2014-03-12 | 英业达股份有限公司 | System and method for searching interactive elements to execute corresponding scripts |
CN102262635A (zh) * | 2010-05-25 | 2011-11-30 | 北京启明星辰信息技术股份有限公司 | Web crawler system and method |
EP2413265B1 (en) * | 2010-07-29 | 2017-10-18 | Tata Consultancy Services Ltd. | A system and method for classification of moving object during video surveillance |
CN102479231A (zh) * | 2010-11-24 | 2012-05-30 | 财团法人资讯工业策进会 | Web page crawling method and apparatus |
CN102073728A (zh) * | 2011-01-13 | 2011-05-25 | 百度在线网络技术(北京)有限公司 | Method, apparatus, and device for determining web page access requests |
US8646050B2 (en) * | 2011-01-18 | 2014-02-04 | Apple Inc. | System and method for supporting JIT in a secure system with randomly allocated memory ranges |
US9805135B2 (en) * | 2011-03-30 | 2017-10-31 | Cbs Interactive Inc. | Systems and methods for updating rich internet applications |
US8527862B2 (en) | 2011-06-24 | 2013-09-03 | Usablenet Inc. | Methods for making ajax web applications bookmarkable and crawlable and devices thereof |
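CN102880618A (zh) | 2011-07-15 | 2013-01-16 | 国际商业机器公司 | Method and system for web document search |
CN102902581B (zh) | 2011-07-29 | 2016-05-11 | 国际商业机器公司 | Hardware accelerator and method, central processing unit, and computing device |
CN103020087A (zh) * | 2011-09-26 | 2013-04-03 | 百度在线网络技术(北京)有限公司 | Method and apparatus for generating click logs, and method and apparatus for adjusting search results |
CN102609481A (zh) * | 2012-01-20 | 2012-07-25 | 苏州简拔林网络科技有限公司 | Method for real-time updating and summarizing of comment information |
CN102609518B (zh) * | 2012-02-09 | 2015-02-18 | 清华大学 | Method and system for acquiring multi-state Ajax web page content |
CN102662737B (zh) * | 2012-03-14 | 2014-06-11 | 优视科技有限公司 | Method and apparatus for invoking extension programs |
CN103365919B (zh) * | 2012-04-09 | 2018-07-31 | 北京京东尚科信息技术有限公司 | Web page parsing container and method |
CN103577427A (zh) * | 2012-07-25 | 2014-02-12 | 中国移动通信集团公司 | Browser-kernel-based web page crawling method and apparatus, and browser including the apparatus |
CN103678321B (zh) * | 2012-09-03 | 2017-11-24 | 阿里巴巴集团控股有限公司 | Method and device for determining page elements, and method and apparatus for determining user behavior paths |
CN102929599B (zh) * | 2012-09-26 | 2015-12-02 | 广州市动景计算机科技有限公司 | Method and apparatus for modifying a mobile terminal browser interface, and mobile terminal |
CN103853717B (zh) * | 2012-11-28 | 2018-10-12 | 北京京东尚科信息技术有限公司 | Web crawler system |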
WO2014120128A1 (en) * | 2013-01-29 | 2014-08-07 | Hewlett-Packard Development Company, L.P. | Analyzing structure of web application |
CN103268361B (zh) * | 2013-06-07 | 2019-05-31 | 百度在线网络技术(北京)有限公司 | Method, apparatus, and system for extracting hidden URLs in web pages |
CN104111836A (zh) * | 2014-07-14 | 2014-10-22 | 浪潮软件集团有限公司 | Method for handling asynchronously loaded data in web collection |
US9772829B2 (en) * | 2014-09-09 | 2017-09-26 | Liveperson, Inc. | Dynamic code management |
US11120461B1 (en) | 2014-11-06 | 2021-09-14 | Capital One Services, Llc | Passive user-generated coupon submission |
US11068921B1 (en) | 2014-11-06 | 2021-07-20 | Capital One Services, Llc | Automated testing of multiple on-line coupons |
US10649740B2 (en) * | 2015-01-15 | 2020-05-12 | International Business Machines Corporation | Predicting and using utility of script execution in functional web crawling and other crawling |
EP3258700A4 (en) * | 2015-02-13 | 2017-12-20 | Panasonic Intellectual Property Management Co., Ltd. | Content reproduction system, video recording apparatus, terminal apparatus, and content reproduction method |
US11057446B2 (en) | 2015-05-14 | 2021-07-06 | Bright Data Ltd. | System and method for streaming content from multiple servers |
CN106294397B (zh) * | 2015-05-20 | 2019-10-25 | 无锡天脉聚源传媒科技有限公司 | Method and apparatus for acquiring tasks |
CN104965901A (zh) * | 2015-06-30 | 2015-10-07 | 北京奇虎科技有限公司 | Method and apparatus for crawling target page content |
CN105183453B (zh) * | 2015-08-07 | 2019-04-02 | 安一恒通(北京)科技有限公司 | Web-page-based information acquisition method and apparatus |
CN105243088B (zh) * | 2015-09-09 | 2019-10-01 | 深圳Tcl数字技术有限公司 | Method and apparatus for acquiring web page content in an Android system |
US10082937B2 (en) | 2015-09-11 | 2018-09-25 | International Business Machines Corporation | Intelligent rendering of webpages |
CN105183886A (zh) * | 2015-09-25 | 2015-12-23 | 中国民生银行股份有限公司 | Web page content extraction method and apparatus |
WO2017062678A1 (en) * | 2015-10-07 | 2017-04-13 | Impossible Ventures, LLC | Automated extraction of data from web pages |
CN105740419A (zh) * | 2016-01-29 | 2016-07-06 | 广州酷狗计算机科技有限公司 | Method and apparatus for acquiring dynamically loaded content in web pages |
US10223353B1 (en) * | 2016-09-20 | 2019-03-05 | Amazon Technologies | Dynamic semantic analysis on free-text reviews to identify safety concerns |
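CN106959995A (zh) * | 2016-12-21 | 2017-07-18 | 四川长虹电器股份有限公司 | Compatible bidirectional automated web page content collection method |
CN108306918B (zh) * | 2017-01-13 | 2021-08-31 | 南京邮电大学盐城大数据研究院有限公司 | Method for automatically acquiring website access information based on dynamic program analysis |
CN106991188A (zh) * | 2017-04-11 | 2017-07-28 | 焦点科技股份有限公司 | Efficient method and system for automatically filtering and crawling dynamic Internet data |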
US11205188B1 (en) | 2017-06-07 | 2021-12-21 | Capital One Services, Llc | Automatically presenting e-commerce offers based on browse history |
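CN109150984B (zh) * | 2018-07-27 | 2021-11-02 | 平安科技(深圳)有限公司 | Method and apparatus for acquiring data resources |
CN108984801A (zh) * | 2018-08-22 | 2018-12-11 | 百卓网络科技有限公司 | Search engine optimization method that identifies asynchronously loaded content based on HTML tags |
JP7018202B2 (ja) * | 2018-11-27 | 2022-02-10 | 株式会社クリエイト | Posted information search system |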
US11483371B2 (en) | 2020-11-23 | 2022-10-25 | International Business Machines Corporation | User-derived webpage activity control |
CN113076460A (zh) * | 2021-05-07 | 2021-07-06 | 北京华云安信息技术有限公司 | Page data crawling method, apparatus, device, and computer-readable storage medium |
Family Cites Families (35)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7178106B2 (en) * | 1999-04-21 | 2007-02-13 | Sonic Solutions, A California Corporation | Presentation of media content from multiple media sources |
US7346920B2 (en) * | 2000-07-07 | 2008-03-18 | Sonic Solutions, A California Corporation | System, method and article of manufacture for a common cross platform framework for development of DVD-Video content integrated with ROM content |
AU2002258769A1 (en) | 2001-04-09 | 2002-10-21 | America Online Incorporated | Server-based browser system |
WO2003009202A1 (en) * | 2001-07-19 | 2003-01-30 | Live Capsule, Inc. | Method for transmitting a transferable information packet |
US7752326B2 (en) * | 2001-08-20 | 2010-07-06 | Masterobjects, Inc. | System and method for utilizing asynchronous client server communication objects |
US20060190561A1 (en) * | 2002-06-19 | 2006-08-24 | Watchfire Corporation | Method and system for obtaining script related information for website crawling |
US8032860B2 (en) * | 2003-02-26 | 2011-10-04 | Oracle International Corporation | Methods for type-independent source code editing |
US7454410B2 (en) | 2003-05-09 | 2008-11-18 | International Business Machines Corporation | Method and apparatus for web crawler data collection |
US7685296B2 (en) | 2003-09-25 | 2010-03-23 | Microsoft Corporation | Systems and methods for client-based web crawling |
US7584194B2 (en) * | 2004-11-22 | 2009-09-01 | Truveo, Inc. | Method and apparatus for an application crawler |
EP1831796A4 (en) | 2004-11-22 | 2010-01-27 | Truveo Inc | METHOD AND DEVICE FOR AN APPLICATION CRAWLER |
EP1662405A1 (en) * | 2004-11-30 | 2006-05-31 | Alcatel | Method of displaying data on a client computer |
US7536389B1 (en) | 2005-02-22 | 2009-05-19 | Yahoo ! Inc. | Techniques for crawling dynamic web content |
US7636883B2 (en) * | 2005-05-18 | 2009-12-22 | International Business Machines Corporation | User form based automated and guided data collection |
US7506248B2 (en) * | 2005-10-14 | 2009-03-17 | Ebay Inc. | Asynchronously loading dynamically generated content across multiple internet domains |
US7725574B2 (en) * | 2006-01-23 | 2010-05-25 | International Business Machines Corporation | Web browser-based programming language error determination and reporting |
US20080040653A1 (en) * | 2006-08-14 | 2008-02-14 | Christopher Levine | System and methods for managing presentation and behavioral use of web display content |
JP2008071048A (ja) * | 2006-09-13 | 2008-03-27 | Nippon Telegr & Teleph Corp <Ntt> | Dynamic content presentation system and program therefor |
JP2008107987A (ja) * | 2006-10-24 | 2008-05-08 | Logly Kk | Information providing device and information providing method |
JP2008186160A (ja) | 2007-01-29 | 2008-08-14 | Fuji Xerox Co Ltd | Document display device and program |
US7774788B2 (en) | 2007-03-07 | 2010-08-10 | Ianywhere Solutions, Inc. | Selectively updating web pages on a mobile client |
US8065667B2 (en) * | 2007-03-20 | 2011-11-22 | Yahoo! Inc. | Injecting content into third party documents for document processing |
US9594731B2 (en) * | 2007-06-29 | 2017-03-14 | Microsoft Technology Licensing, Llc | WYSIWYG, browser-based XML editor |
US9563718B2 (en) | 2007-06-29 | 2017-02-07 | Intuit Inc. | Using interactive scripts to facilitate web-based aggregation |
US8131591B2 (en) | 2007-09-12 | 2012-03-06 | Microsoft Corporation | Updating contents of asynchronously refreshable webpages |
US7672938B2 (en) | 2007-10-05 | 2010-03-02 | Microsoft Corporation | Creating search enabled web pages |
US8250585B2 (en) * | 2007-11-05 | 2012-08-21 | International Business Machines Corporation | Extensible framework for managing UI state in a composite AJAX application |
US8572065B2 (en) * | 2007-11-09 | 2013-10-29 | Microsoft Corporation | Link discovery from web scripts |
US8527860B1 (en) * | 2007-12-04 | 2013-09-03 | Appcelerator, Inc. | System and method for exposing the dynamic web server-side |
US7958232B1 (en) * | 2007-12-05 | 2011-06-07 | Appcelerator, Inc. | Dashboard for on-the-fly AJAX monitoring |
US8347405B2 (en) * | 2007-12-27 | 2013-01-01 | International Business Machines Corporation | Asynchronous java script and XML (AJAX) form-based authentication using java 2 platform enterprise edition (J2EE) |
EP2238777B1 (en) * | 2008-01-16 | 2023-10-25 | BlackBerry Limited | Secured presentation layer virtualization for wireless handheld communication device |
CN101546309B (zh) * | 2008-03-26 | 2012-07-04 | 国际商业机器公司 | Method and device for building an index of resource content in a computer network |
BRPI0924401B1 (pt) * | 2009-03-18 | 2020-05-19 | Google Inc | Methods, systems, and non-transitory storage media for web translation with display substitution |
CN101515300B (zh) | 2009-04-02 | 2011-07-20 | 阿里巴巴集团控股有限公司 | Method and system of retrieving Ajax web page content |
-
2009
- 2009-04-02 CN CN2009101336305A patent/CN101515300B/zh active Active
-
2010
- 2010-02-24 HK HK10101951.3A patent/HK1136053A1/xx unknown
- 2010-03-31 JP JP2012503668A patent/JP5695027B2/ja active Active
- 2010-03-31 EP EP10759351A patent/EP2414929A4/en not_active Ceased
- 2010-03-31 US US12/863,320 patent/US8413044B2/en active Active
- 2010-03-31 WO PCT/US2010/029444 patent/WO2010114913A1/en active Application Filing
-
2013
- 2013-02-01 US US13/756,886 patent/US9767082B2/en active Active
-
2015
- 2015-02-05 JP JP2015021591A patent/JP5990605B2/ja active Active
Also Published As
Publication number | Publication date |
---|---|
JP2015135680A (ja) | 2015-07-27 |
WO2010114913A1 (en) | 2010-10-07 |
US9767082B2 (en) | 2017-09-19 |
EP2414929A4 (en) | 2012-11-28 |
HK1136053A1 (en) | 2010-06-18 |
JP2012523047A (ja) | 2012-09-27 |
CN101515300A (zh) | 2009-08-26 |
JP5990605B2 (ja) | 2016-09-14 |
US8413044B2 (en) | 2013-04-02 |
US20120011431A1 (en) | 2012-01-12 |
EP2414929A1 (en) | 2012-02-08 |
JP5695027B2 (ja) | 2015-04-01 |
US20130145253A1 (en) | 2013-06-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101515300B (zh) | Method and system of retrieving Ajax web page content | |
US9703883B2 (en) | Social bookmarking of resources exposed in web pages | |
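CN105094888B (zh) | Method and apparatus for loading application plug-ins | |
CN102508710B (zh) | Method and system for switching between the IE6 kernel and newer IE kernels | |
CN100514323C (zh) | System and method for automatically extracting subtitle information | |
CN103268361B (zh) | Method, apparatus, and system for extracting hidden URLs in web pages | |
CN106293675B (zh) | Method and apparatus for loading system static resources | |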
US9183004B2 (en) | System and method for representing user interaction with a web service | |
US20050138033A1 (en) | Methods, applications and systems for deriving content from network resources | |
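CN104063401B (zh) | Method and apparatus for merging web page style addresses | |
CN112417243A (zh) | Search results for native applications | |
CN103577427A (zh) | Browser-kernel-based web page crawling method and apparatus, and browser including the apparatus | |
KR101287371B1 (ko) | Web content collection method and collection apparatus, and recording medium therefor | |
JP2004220251A (ja) | Information extraction rule creation system, information extraction rule creation method, and information extraction rule creation program | |
CN111680247B (zh) | Method, apparatus, device, and storage medium for locally invoking web page strings | |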
EP2711838A1 (en) | Documentation parser | |
US20120324326A1 (en) | Method and apparatus for outputting a multimedia file of a web page | |
CN101140578B (zh) | System and method for multithreaded analysis of web page data | |
CN110516185B (zh) | Method and apparatus for processing dynamic websites | |
Bröring et al. | NOVA: a knowledge base for the Node-RED IoT ecosystem | |
Sudhamathy | Mining web logs: an automated approach | |
TW201044197A (en) | A method and system for capturing contents of Ajax web pages | |
Wu et al. | Web crawler for event-driven crawling of AJAX-based web applications | |
Jeyalatha et al. | Design and Implementation of a Tool for Web Data Extraction and Storage using Java and Uniform Interface | |
Han et al. | Problems, solutions and new opportunities: using pagelet-based templates in development of flexible and extensible web applications |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
REG | Reference to a national code |
Ref country code: HK Ref legal event code: DE Ref document number: 1136053 Country of ref document: HK |
|
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
REG | Reference to a national code |
Ref country code: HK Ref legal event code: GR Ref document number: 1136053 Country of ref document: HK |
|
TR01 | Transfer of patent right |
Effective date of registration: 20191217 Address after: P.O. Box 31119, grand exhibition hall, hibiscus street, 802 West Bay Road, Grand Cayman, British Cayman Islands Patentee after: Innovative advanced technology Co., Ltd Address before: Cayman Islands Grand Cayman capital building, a four storey No. 847 mailbox Patentee before: Alibaba Group Holding Co., Ltd. |