CN101136026A - Web page content capturing method based on XMLHTTP component technology - Google Patents

Web page content capturing method based on XMLHTTP component technology

Info

Publication number
CN101136026A
Authority
CN
China
Prior art keywords
web page
xmlhttp
information
component technology
source code
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CNA2007101069606A
Other languages
Chinese (zh)
Inventor
陈世杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING JUSHENG SCIENCE TECHNOLOGY Co Ltd
Original Assignee
BEIJING JUSHENG SCIENCE TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING JUSHENG SCIENCE TECHNOLOGY Co Ltd
Priority to CNA2007101069606A
Publication of CN101136026A
Legal status: Pending

Abstract

In the current information age, the amount of information is huge and grows geometrically. Enterprises mainly collect information manually, which greatly increases the cost in manpower, resources, and money. The invention uses a computer to simulate the manual operation in the background, so that information is collected efficiently and at low cost.

Description

A web page content acquisition method based on the XMLHTTP component technology
The present invention relates to a method for collecting data from remote web pages in batches by means of the XMLHTTP component technology in XML.
At present, information in China is generally obtained by collecting it manually, item by item, which is time-consuming, laborious, and inefficient. Moreover, manual collection is constrained by human resources, funds, and time, which prevents enterprises from obtaining specific information in a timely and effective manner.
The distinguishing feature of the present invention is that it uses the XMLHTTP component technology in XML to obtain the source code of a remote web page over the Internet, and extracts from that source code, according to a specific interception rule, a list of the network addresses of the data items. It then uses the XMLHTTP component technology to obtain the source code of the page at each of those addresses, intercepts the specific information in that source code according to a specific interception rule that has been set, and saves the intercepted data. With this acquisition method, an enterprise's information searchers can obtain a large amount of information in a short time, the enterprise's website content can be enriched, the collected information can be aggregated, and information processing can be automated.
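The description does not define what an "interception rule" looks like. As a minimal sketch, assuming a rule is simply a pair of start and end marker strings, the extraction of an address list from page source could look like the following. The function name `intercept_all` and the sample markup are hypothetical and chosen for illustration only.

```python
from typing import List


def intercept_all(source: str, start: str, end: str) -> List[str]:
    """Return every substring of `source` that lies between `start` and `end`.

    Assumption: an "interception rule" is a (start, end) marker pair.
    The patent text does not specify the rule format.
    """
    results = []
    pos = 0
    while True:
        begin = source.find(start, pos)
        if begin == -1:
            break  # no further start marker
        begin += len(start)
        stop = source.find(end, begin)
        if stop == -1:
            break  # start marker without a matching end marker
        results.append(source[begin:stop])
        pos = stop + len(end)
    return results


# Hypothetical source code of a listing page.
listing_source = (
    '<ul><li><a href="/news/1.html">First</a></li>'
    '<li><a href="/news/2.html">Second</a></li></ul>'
)

# Rule: everything between '<a href="' and the closing quote is an address.
addresses = intercept_all(listing_source, '<a href="', '"')
print(addresses)
```

Marker pairs are easy to configure per site, but they break when the page layout changes; a regular expression or an HTML parser would be a more robust choice in practice.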
The amount of information on the Internet today is enormous and grows at a geometric rate. Effectively obtaining useful information from the Internet to serve the enterprise has become an urgent need. At present, however, enterprises in many cases obtain information through their own personnel, who edit and copy it manually, item by item, which is very inefficient. If an enterprise wants to obtain more information, it has to deploy a large amount of manpower and invest substantial funds, which is difficult for it to bear. If, instead, the enterprise buys information from outside, that information is not targeted and therefore cannot satisfy the enterprise's specific requirements.
With reference to the accompanying drawings, the working principle of the present invention is as follows: first, determine the address of the web page to be collected; obtain the source code of the remote web page by means of the XMLHTTP component technology, and set a specific interception rule to obtain the list of network addresses of the data in that source code; then use the XMLHTTP component technology to obtain the source code of the web page corresponding to each network address, set a specific interception rule, and obtain the specific data according to the rule that has been set.
A web page content acquisition method based on the XMLHTTP component technology, characterized by the following steps:
1) determine the address of the web page to be collected;
2) use the XMLHTTP component technology to obtain the source code of the remote web page;
3) set a specific interception rule and obtain, from the source code thus obtained, the list of network addresses of the data;
4) for each network address of the data, use the XMLHTTP component technology to obtain the source code of the data;
5) set a specific interception rule and obtain, from the source code of the data, the specific data according to the rule that has been set.
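The five steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the applicant's implementation: `urllib.request` stands in for the XMLHTTP component, the interception rules are assumed to be regular expressions with one capture group, and the names `fetch_source`, `collect`, `demo_pages`, and the example.com addresses are all hypothetical. The fetch function is injectable so that the demo runs against in-memory pages instead of the network.

```python
import re
import urllib.request
from typing import Callable, Dict, List
from urllib.parse import urljoin


def fetch_source(url: str) -> str:
    """Steps 2 and 4: download page source (urllib stands in for XMLHTTP)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def collect(
    start_url: str,                      # step 1: the page to be collected
    link_rule: str,                      # step 3: rule yielding data addresses
    field_rules: Dict[str, str],         # step 5: one rule per wanted field
    fetch: Callable[[str], str] = fetch_source,
) -> List[Dict[str, str]]:
    listing = fetch(start_url)                                    # step 2
    addresses = [urljoin(start_url, href)
                 for href in re.findall(link_rule, listing)]      # step 3
    records = []
    for address in addresses:
        page = fetch(address)                                     # step 4
        record = {"url": address}
        for name, rule in field_rules.items():                    # step 5
            match = re.search(rule, page, re.S)
            record[name] = match.group(1).strip() if match else ""
        records.append(record)
    return records


# Hypothetical in-memory pages so the demo needs no network access.
demo_pages = {
    "http://example.com/list": '<a href="/a.html">A</a><a href="/b.html">B</a>',
    "http://example.com/a.html": "<h1>Title A</h1><p>Body A</p>",
    "http://example.com/b.html": "<h1>Title B</h1><p>Body B</p>",
}

records = collect(
    "http://example.com/list",
    link_rule=r'href="([^"]+)"',
    field_rules={"title": r"<h1>(.*?)</h1>", "body": r"<p>(.*?)</p>"},
    fetch=demo_pages.__getitem__,
)
for record in records:
    print(record)
```

Calling `collect` without the `fetch` argument would use `fetch_source` and issue real HTTP requests; a production collector would also need error handling, rate limiting, and respect for each site's terms of use.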
Given its technical features, this acquisition method can be implemented in any programming language. By using this acquisition technique, an enterprise can obtain a large amount of information in a short time. The information can be used to enrich the enterprise's website content and to provide intelligence in support of business decisions, and analysing it can help the enterprise identify potential business opportunities in the market.

Claims (1)

1. A web page content acquisition method based on the XMLHTTP component technology, wherein: the source code of a remote web page is obtained by means of the XMLHTTP component technology and examined, and a specific interception rule is set to obtain the list of network addresses of the remote web page's data; the source code of the web page corresponding to each of those network addresses is obtained by means of the XMLHTTP component technology, a specific interception rule for the web page content is set, and, according to that rule, the specific information is obtained and saved.
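The claim ends with the intercepted information being saved but does not say in what form. A minimal sketch, assuming each intercepted item is a dictionary of named fields and that CSV is an acceptable storage format, is shown below. The function name `save_records` and the sample record are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO


def save_records(
    records: Iterable[Dict[str, str]],
    fieldnames: List[str],
    out: TextIO,
) -> int:
    """Write intercepted records as CSV rows and return how many were saved.

    Assumption: CSV output; the patent does not name a storage format.
    """
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    count = 0
    for record in records:
        # Missing fields are stored as empty strings; extra keys are dropped.
        writer.writerow({name: record.get(name, "") for name in fieldnames})
        count += 1
    return count


# Hypothetical record, written to an in-memory buffer for demonstration.
buffer = io.StringIO()
saved = save_records(
    [{"url": "http://example.com/a.html", "title": "Title A"}],
    ["url", "title"],
    buffer,
)
print(saved, "record(s) saved")
```

To write to disk instead, pass a file opened with `open(path, "w", newline="", encoding="utf-8")`, as the `csv` module documentation recommends.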
CNA2007101069606A 2007-05-15 2007-05-15 Web page content capturing method based on XMLHTTP component technology Pending CN101136026A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNA2007101069606A CN101136026A (en) 2007-05-15 2007-05-15 Web page content capturing method based on XMLHTTP component technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNA2007101069606A CN101136026A (en) 2007-05-15 2007-05-15 Web page content capturing method based on XMLHTTP component technology

Publications (1)

Publication Number Publication Date
CN101136026A true CN101136026A (en) 2008-03-05

Family

ID=39160126

Family Applications (1)

Application Number Title Priority Date Filing Date
CNA2007101069606A Pending CN101136026A (en) 2007-05-15 2007-05-15 Web page content capturing method based on XMLHTTP component technology

Country Status (1)

Country Link
CN (1) CN101136026A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101286895B (en) * 2008-05-22 2010-08-18 上海交通大学 Dynamic configurable data monitoring system and method for distributed network
CN101741872B (en) * 2008-11-07 2013-10-02 华为软件技术有限公司 Method and device for acquiring information of target resources
WO2013087012A1 (en) * 2011-12-13 2013-06-20 北大方正集团有限公司 Method and system for collecting network data
US9525605B2 (en) 2011-12-13 2016-12-20 Peking University Founder Group Co., Ltd. Method of and system for collecting network data
WO2016086784A1 (en) * 2014-12-02 2016-06-09 阿里巴巴集团控股有限公司 Method, apparatus and system for collecting webpage data
CN105721519A (en) * 2014-12-02 2016-06-29 阿里巴巴集团控股有限公司 Webpage data acquisition method, device and system
CN105721519B (en) * 2014-12-02 2019-02-05 阿里巴巴集团控股有限公司 A kind of webpage data acquiring method, apparatus and system
CN106547749A (en) * 2015-09-16 2017-03-29 北京国双科技有限公司 The method and apparatus of collecting webpage data
CN106547749B (en) * 2015-09-16 2021-02-12 北京国双科技有限公司 Webpage data acquisition method and device

Similar Documents

Publication Publication Date Title
Dvořák et al. Renewable energy investment and job creation; a cross-sectoral assessment for the Czech Republic with reference to EU benchmarks
CN104063411B (en) Based on the corporate information collection method of baud five power models
Biemer et al. Our environmental handprint: The good we do
Craig et al. Exploring utility organization electricity generation, residential electricity consumption, and energy efficiency: A climatic approach
CN101136026A (en) Web page content capturing method based on XMLHTTP component technology
White et al. Strategic environmental assessment in the electricity sector: an application to electricity supply planning, Saskatchewan, Canada
Mitchell et al. Is carbon financing trashing integrated waste management? Experience from Indonesia
CN112241428A (en) Digital decision-making method and system
Allington et al. Selected ‘Starter Kit’energy system modelling data for Lesotho (# CCG)
Ó GALLACHÓIR et al. Comparing primary energy attributed to renewable energy with primary energy equivalent to determine carbon abatement in a national context
Gradziuk The impact of the polish renewable energy sector on employment
Cannone et al. Selected ‘Starter Kit’energy system modelling data for Morocco (# CCG)
Nissing et al. A material flow analysis of wood and paper in Cape Town: is there potential to redirect flows in formal and informal sectors to foster use as a renewable resource?
Allington et al. Selected ‘Starter Kit’energy system modelling data for Indonesia (# CCG)
Cannone et al. Selected ‘Starter Kit’energy system modelling data for Malawi (# CCG)
Zhao et al. Application of Energy-Carbon Flow Charts in High-Tech Industrial Park
Pape-Salmon et al. Low-Impact Renewable Energy Policy in Canada: Strengths, Gaps and a Path Forward
Huenteler Appraisal Program Information Document (PID)-Rwanda Energy Supplemental DPO-P173882
Cannone et al. Selected ‘Starter Kit’energy system modelling data for Sudan (# CCG)
Cannone et al. Selected ‘Starter Kit’energy system modelling data for Ecuador (# CCG)
Cannone et al. Selected ‘Starter Kit’energy system modelling data for Bolivia (# CCG)
Cannone et al. Selected ‘Starter Kit’energy system modelling data for Taiwan (# CCG)
Allington et al. Selected ‘Starter Kit’energy system modelling data for Mauritania (# CCG)
Allington et al. Selected ‘Starter Kit’energy system modelling data for Cameroon (# CCG)
Jalasjoki Opportunity Enablers, tools etc

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C57 Notification of unclear or unknown address
DD01 Delivery of document by public notice

Addressee: Chen Shijie

Document name: Notification of Publication of the Application for Invention

DD01 Delivery of document by public notice

Addressee: Beijing Jusheng Science Technology Co., Ltd.

Document name: Notification before Expiration of Request of Examination as to Substance

DD01 Delivery of document by public notice

Addressee: Beijing Jusheng Science Technology Co., Ltd.

Document name: Notification that Application Deemed to be Withdrawn

C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Open date: 20080305