CN101136026A - Web page content capturing method based on XMLHTTP component technology - Google Patents
Web page content capturing method based on XMLHTTP component technology
- Publication number
- CN101136026A
- Authority
- CN
- China
- Prior art keywords
- web page
- xmlhttp
- information
- component technology
- source code
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
In the current information age, the volume of information is huge and is growing geometrically. Enterprises mainly collect information manually, which greatly increases the cost in manpower, resources, and money. The invention uses a computer to simulate the manual operation in the background, so that information can be collected efficiently and at low cost.
Description
The present invention relates to a method for collecting data from remote web pages in batches by means of the XMLHTTP component technology in XML.
At present, information in China is generally obtained by collecting items manually one by one, which is time-consuming, laborious, and inefficient. Manual collection is also constrained by human resources, funding, and time, which hinders enterprises from obtaining specific information in a timely and effective manner.
The distinctive feature of the present invention is that it obtains the source code of a remote web page over the Internet through the XMLHTTP component technology in XML, and obtains the list of data URLs contained in that source code according to a specific interception rule. It then uses the XMLHTTP component technology to retrieve the source code of the page at each data URL, intercepts specific information from that source code according to a specific interception rule that has been set, and saves the intercepted data. With this collection method, an enterprise's information-search personnel can obtain a large amount of information in a short time, the enterprise's website content can be enriched, the collected information can be aggregated, and information processing can be automated.
The volume of information on the Internet today is enormous and is growing geometrically. Effectively obtaining useful information from the Internet to serve the enterprise has become an urgent need. At present, however, enterprises in many cases acquire information through their own personnel, who edit and copy items by hand one at a time, which is very inefficient. If an enterprise wants to obtain a larger amount of information, it has to deploy a great deal of manpower and invest substantial funds, which is hard for an enterprise to bear. And if the enterprise buys information from outside, that information lacks specificity and therefore cannot meet the enterprise's particular requirements.
With reference to the accompanying drawing, the invention works as follows: first, determine the address of the web page to be collected; obtain the source code of the remote web page using the XMLHTTP component technology, and set a specific interception rule to obtain the list of data URLs in that source code; then use the XMLHTTP component technology to obtain the source code of the web page at each data URL, set a specific interception rule, and obtain the specific data according to that rule.
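The patent does not say what form an interception rule takes. One plausible reading, shown here purely as an illustrative sketch, is a pair of start and end marker strings, with everything between them being the intercepted text. The function name `intercept`, the marker-pair representation, and the sample markup are assumptions and do not come from the patent.

```python
def intercept(source: str, start: str, end: str) -> list[str]:
    """Return every substring of `source` that lies between `start` and `end`.

    Hypothetical representation of an "interception rule": the patent only
    says that a rule is set, not how it is expressed.
    """
    results = []
    pos = 0
    while True:
        begin = source.find(start, pos)
        if begin == -1:
            break                      # no further start marker
        begin += len(start)
        stop = source.find(end, begin)
        if stop == -1:
            break                      # start marker without a matching end
        results.append(source[begin:stop])
        pos = stop + len(end)
    return results


# Made-up page source: intercept the link targets of a listing page.
sample = ('<ul><li><a href="/news/1.html">A</a></li>'
          '<li><a href="/news/2.html">B</a></li></ul>')
links = intercept(sample, '<a href="', '"')
print(links)
```

The same function could be applied a second time to each detail page, with different markers, to cut out the specific information to be saved.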
A web page content collection method based on the XMLHTTP component technology, characterized as follows: 1) first determine the address of the web page to be collected; 2) use the XMLHTTP component technology to obtain the source code of the remote web page; 3) set a specific interception rule and obtain the list of data URLs from the obtained source code; 4) for each data URL, use the XMLHTTP component technology to obtain the source code of the data page; 5) set a specific interception rule and obtain the specific data from that source code according to the rule.
Given its technical features, this collection method can be implemented in any programming language. By using it, an enterprise can obtain a large amount of information in a short time. This information can be used to enrich the enterprise's website content and to provide intelligence in support of business decisions; it also allows the enterprise to analyze the market through the information and identify potential business opportunities.
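The five steps can be sketched as a two-stage fetch-and-extract loop. This is a minimal illustration under stated assumptions, not the patented implementation: Python's standard `urllib` stands in for the XMLHTTP component, interception rules are expressed as regular expressions with one capturing group, and the names `fetch_source`, `collect`, and the `example.test` pages are all hypothetical.

```python
import re
from urllib.parse import urljoin
from urllib.request import urlopen


def fetch_source(url: str) -> str:
    """Steps 2 and 4: download a page's source (urllib stands in for XMLHTTP)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def collect(start_url, link_rule, content_rule, fetch=fetch_source):
    """Run steps 1-5 and return {data URL: intercepted text}.

    Both rules are regular expressions with exactly one capturing group
    (an assumed rule format; the patent does not specify one).
    """
    listing = fetch(start_url)                                  # step 2
    links = [urljoin(start_url, href)                           # step 3
             for href in re.findall(link_rule, listing)]
    records = {}
    for link in links:
        page = fetch(link)                                      # step 4
        match = re.search(content_rule, page, re.DOTALL)        # step 5
        if match:
            records[link] = match.group(1).strip()
    return records


# Demonstration with in-memory pages, so no network access is needed.
pages = {
    "http://example.test/list.html": '<a href="a.html">A</a> <a href="b.html">B</a>',
    "http://example.test/a.html": "<h1>Alpha</h1>",
    "http://example.test/b.html": "<p>no title</p>",
}
result = collect("http://example.test/list.html",
                 r'href="([^"]+)"', r"<h1>(.*?)</h1>",
                 fetch=pages.__getitem__)
print(result)
```

Passing the fetch function as a parameter keeps the extraction logic independent of the transport, so the same loop could in principle be driven by an actual XMLHTTP component. Pages where the content rule finds nothing are skipped rather than stored empty.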
Claims (1)
1. A web page content collection method based on the XMLHTTP component technology, characterized in that: the source code of a remote web page is obtained through the XMLHTTP component technology and examined, and a specific interception rule is set to obtain the list of data URLs of the remote web page; the source code of the web page at each data URL is obtained through the XMLHTTP component technology, a specific web page content interception rule is set, and according to that rule the specific information is obtained and saved.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNA2007101069606A CN101136026A (en) | 2007-05-15 | 2007-05-15 | Web page content capturing method based on XMLHTTP component technology |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNA2007101069606A CN101136026A (en) | 2007-05-15 | 2007-05-15 | Web page content capturing method based on XMLHTTP component technology |
Publications (1)
Publication Number | Publication Date |
---|---|
CN101136026A (en) | 2008-03-05 |
Family
ID=39160126
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNA2007101069606A (Pending; CN101136026A (en)) | Web page content capturing method based on XMLHTTP component technology | 2007-05-15 | 2007-05-15 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101136026A (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101286895B (en) * | 2008-05-22 | 2010-08-18 | 上海交通大学 | Dynamic configurable data monitoring system and method for distributed network |
CN101741872B (en) * | 2008-11-07 | 2013-10-02 | 华为软件技术有限公司 | Method and device for acquiring information of target resources |
WO2013087012A1 (en) * | 2011-12-13 | 2013-06-20 | 北大方正集团有限公司 | Method and system for collecting network data |
US9525605B2 (en) | 2011-12-13 | 2016-12-20 | Peking University Founder Group Co., Ltd. | Method of and system for collecting network data |
WO2016086784A1 (en) * | 2014-12-02 | 2016-06-09 | 阿里巴巴集团控股有限公司 | Method, apparatus and system for collecting webpage data |
CN105721519A (en) * | 2014-12-02 | 2016-06-29 | 阿里巴巴集团控股有限公司 | Webpage data acquisition method, device and system |
CN105721519B (en) * | 2014-12-02 | 2019-02-05 | 阿里巴巴集团控股有限公司 | A kind of webpage data acquiring method, apparatus and system |
CN106547749A (en) * | 2015-09-16 | 2017-03-29 | 北京国双科技有限公司 | The method and apparatus of collecting webpage data |
CN106547749B (en) * | 2015-09-16 | 2021-02-12 | 北京国双科技有限公司 | Webpage data acquisition method and device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C57 | Notification of unclear or unknown address | |
| | DD01 | Delivery of document by public notice | Addressee: Chen Shijie. Document name: Notification of Publication of the Application for Invention |
| | DD01 | Delivery of document by public notice | Addressee: Beijing Jusheng Science Technology Co., Ltd. Document name: Notification of before Expiration of Request of Examination as to Substance |
| | DD01 | Delivery of document by public notice | Addressee: Beijing Jusheng Science Technology Co., Ltd. Document name: Notification that Application Deemed to be Withdrawn |
| | C02 | Deemed withdrawal of patent application after publication (patent law 2001) | |
| | WD01 | Invention patent application deemed withdrawn after publication | Open date: 2008-03-05 |