CN102609416B

CN102609416B - Webpage information storage control and method

Info

Publication number: CN102609416B
Application number: CN201110023799.2A
Authority: CN
Inventors: 翁世芳; 陆欣; 刘耀华; 吴云艳; 林希
Original assignee: Shenzhen Yuzhan Precision Technology Co ltd; Hon Hai Precision Industry Co Ltd
Current assignee: Shenzhen Yuzhan Precision Technology Co ltd; Hon Hai Precision Industry Co Ltd
Filing date: 2011-01-21
Publication date: 2016-12-14
Anticipated expiration: 2031-01-21

Abstract

A webpage information storage method includes: obtaining the HTML document of a specified webpage at predetermined intervals; parsing the HTML document of the webpage and extracting the data in it; comparing whether the data of the currently obtained and parsed HTML document of the specified webpage is consistent with the data of the saved HTML document; and, when the two are inconsistent, replacing the data in the saved HTML document with the data in the currently obtained HTML document. The present invention also provides a control. With this method and this control, content such as the webpages, pictures and videos of a specified website can be updated in time.

Description

Webpage information storage control and method

Technical field

The present invention relates to a webpage information storage control and method, and in particular to a control and method by which one website dynamically obtains the latest information of a specified webpage and saves it in time.

Background art

At present, an automated program run by a website, such as the Baidu spider (Baidu's web crawler), is sometimes used to visit content such as the webpages, pictures and videos of other sites on the Internet and to build an index database, so that users can search for that content from the website's own pages. However, such an automated program cannot be directed to capture the webpages, pictures and videos of a specified website, and when the webpages, pictures or videos of other websites are updated, it does not necessarily update the content of its index database in time.

Summary of the invention

In view of this, it is necessary to provide a webpage information storage control and method that can update content such as the webpages, pictures and videos of a specified website in time.

A webpage information storage control includes an input control, an acquisition control, a parsing control, a judging control and an update control. The input control provides an operation interface through which a user inputs a specified web page address. The acquisition control uses the specified web page address provided through the input control to periodically obtain the current HTML document of the specified webpage. The parsing control extracts the data of the current HTML document of the specified webpage obtained by the acquisition control. The judging control compares whether the data in the currently obtained and parsed HTML document of the specified webpage is consistent with the data in the saved HTML document of the specified webpage. When the data in the currently obtained HTML document and the data in the saved HTML document are inconsistent, the update control updates the data of the corresponding previously saved HTML document of the specified webpage according to the data that the parsing control extracted from the current HTML document.

A webpage information storage method includes: obtaining the HTML document of a specified webpage at predetermined intervals; parsing the HTML document of the specified webpage and extracting the data in it; comparing whether the data of the currently obtained and parsed HTML document of the specified webpage is consistent with the data of the saved HTML document; and, when the two are inconsistent, replacing the data in the saved HTML document with the data in the currently obtained HTML document.

The acquisition control obtains the HTML document of the specified webpage; the parsing control parses that HTML document and extracts the data in it; the judging control compares whether the parsed current HTML document is consistent with the saved HTML document; and, when they are inconsistent, the update control updates the data in the saved HTML document. Content such as the webpages, pictures and videos of the specified website can thereby be updated in time.

Brief description of the drawings

Fig. 1 is the block diagram of webpage information storage control in an embodiment of the present invention.

Fig. 2 is the flow chart of webpage information store method in an embodiment of the present invention.

Description of main reference numerals

Webpage information storage control	100
Input control	10
Acquisition control	20
Parsing control	30
Judging control	40
Update control	50

Detailed description of the invention

Refer to Fig. 1, which is a block diagram of a webpage information storage control 100. The webpage information storage control 100 is a piece of source program code placed in the program code of a website's webpage, for example in the program code of the homepage of a portal website. The webpage information storage control 100 includes an input control 10, an acquisition control 20, a parsing control 30, a judging control 40 and an update control 50.

The input control 10 provides an input interface through which the user inputs the web page address to be specified, and saves the web page address entered by the user in the URL (Uniform/Universal Resource Locator, i.e. web page address) setting of this website.

The acquisition control 20 uses the specified web page address set in the URL (Uniform/Universal Resource Locator, i.e. web page address) of this website to obtain the HTML (HyperText Markup Language) document of the specified webpage at intervals of a predetermined time (for example, 2 days). Specifically, the acquisition control 20 uses the WebBrowser class in .NET to simulate a browser visit to the webpage, and then uses the JavaScript expression document.getElementsByTagName("HTML")[0].outerHTML to obtain the HTML document of the specified webpage. The predetermined time may be a system default, or may be set by the user through the input interface provided by the input control 10.
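The periodic acquisition described above can be sketched as follows. This is only an illustrative sketch in Python, not the .NET WebBrowser approach named in the description: it downloads the raw HTML with the standard library and does not execute JavaScript, so the result may differ from the rendered document's outerHTML. The function names `fetch_html` and `acquire_periodically`, and the `rounds` parameter that bounds the loop, are assumptions introduced for the example.

```python
import time
import urllib.request


def fetch_html(url, timeout=10.0):
    """Download the raw HTML of `url` (assumption: plain HTTP GET, no script execution)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def acquire_periodically(url, interval_seconds, rounds,
                         fetch=fetch_html, sleep=time.sleep):
    """Obtain the document `rounds` times, waiting `interval_seconds` between fetches.

    `fetch` and `sleep` are injectable so the loop can be exercised without
    network access or real waiting.
    """
    documents = []
    for i in range(rounds):
        documents.append(fetch(url))
        if i < rounds - 1:  # no wait is needed after the final fetch
            sleep(interval_seconds)
    return documents
```

With the example interval of 2 days, `interval_seconds` would be 172800. A long-running deployment would more likely hand the schedule to an external scheduler than sleep inside a loop.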

The parsing control 30 uses the Document object to parse the currently obtained HTML document of the specified webpage (hereinafter the "current HTML document") and the previously saved HTML document of the specified webpage (hereinafter the "saved HTML document"), and uses getElementById to obtain the data in the current HTML document and the data in the saved HTML document respectively. Every webpage contains controls, such as forms and ordinary buttons; the data of the HTML document of the specified webpage that the parsing control 30 parses is the data in the controls of the specified webpage.
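As a rough illustration of this parsing step, the sketch below collects the data of page controls keyed by element id. It uses Python's standard `html.parser` rather than the browser Document object and getElementById named in the description. The choice of which tags count as "controls" (`CONTROL_TAGS`), and the rule that an input contributes its value attribute while a button or textarea contributes its text, are assumptions made for the example.

```python
from html.parser import HTMLParser

# Assumption: these tags stand in for the "controls" of a webpage.
CONTROL_TAGS = {"input", "button", "textarea"}


class ControlExtractor(HTMLParser):
    """Collects {element id: data} for the controls found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.controls = {}
        self._open_id = None  # id of the control whose text is currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        control_id = attrs.get("id")
        if control_id is None or tag not in CONTROL_TAGS:
            return
        if tag == "input":
            # An input carries its data in the value attribute.
            self.controls[control_id] = attrs.get("value") or ""
        else:
            # A button or textarea carries its data as text content.
            self.controls[control_id] = ""
            self._open_id = control_id

    def handle_endtag(self, tag):
        if tag in CONTROL_TAGS:
            self._open_id = None

    def handle_data(self, data):
        if self._open_id is not None:
            self.controls[self._open_id] += data.strip()


def extract_controls(html):
    """Return the control data of `html` as a dict keyed by element id."""
    parser = ControlExtractor()
    parser.feed(html)
    parser.close()
    return parser.controls
```

Keying the result by id mirrors the getElementById lookup: the same id can then be looked up in both the current and the saved document.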

When the acquisition control 20 obtains a new HTML document of the specified webpage, the judging control 40 compares whether the data in the related controls of the current HTML document is consistent with the data in the related controls of the saved HTML document.

When the data in the related controls of the current HTML document is inconsistent with the data in the related controls of the saved HTML document, the update control 50 replaces the data in the related controls of the originally saved HTML document with the data in the related controls of the current HTML document, and saves the replacement data.
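The compare-and-replace behaviour of the judging and update controls can be sketched as below, assuming the control data of each document has already been extracted into a dictionary keyed by control id. The function name `update_saved` and the dictionary representation are assumptions for illustration. Controls that exist only in the saved data are left untouched, since the description does not say how removed controls are handled.

```python
def update_saved(saved, current):
    """Replace inconsistent control data in `saved` with the data from `current`.

    Both arguments map a control id to its data. `saved` is modified in place,
    and the ids whose data was replaced (or newly added) are returned in order.
    """
    changed = [cid for cid, value in current.items() if saved.get(cid) != value]
    for cid in changed:
        saved[cid] = current[cid]
    return changed
```

An empty return value corresponds to the "consistent" case, in which nothing needs to be saved.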

The judging control 40 is also used to judge whether the obtained HTML document of the specified webpage has been obtained for the first time. When the current HTML document has been obtained for the first time, the update control 50 saves the HTML document. When the current HTML document has not been obtained for the first time, the parsing control 30 parses the HTML document of the specified webpage.

Refer to Fig. 2, which is a flow chart of the webpage information storage method in an embodiment of the present invention.

In step S201, the acquisition control 20 uses the specified web page address entered in the input control 10 to periodically obtain the HTML document of the specified webpage.

In step S202, the judging control 40 judges whether the current HTML document has been obtained for the first time. When the current HTML document has been obtained for the first time, step S206 is performed; when it has not, step S203 is performed.

In step S203, the parsing control 30 uses the Document object to parse the current HTML document and the saved HTML document, thereby obtaining the data in the related controls of the current HTML document and the data in the related controls of the saved HTML document respectively.

In step S204, when the acquisition control 20 obtains a new HTML document of the specified webpage, the judging control 40 compares whether the data in the related controls of the current HTML document is consistent with the data in the related controls of the saved HTML document. When the two are inconsistent, step S205 is performed.

In step S205, the update control 50 replaces the data in the related controls of the saved HTML document with the data in the related controls of the current HTML document, and saves the replacement data.

In step S206, the update control 50 saves this HTML document.
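The branching of steps S202 to S206 can be summarised in a short sketch that takes each newly obtained document (the output of step S201) as input. The class name `PageStore`, the method `process`, and the regex-based `toy_extract` stand-in for the parsing step are hypothetical names introduced here. As a simplification, S205 replaces the whole saved document instead of only the data in the related controls.

```python
import re


def toy_extract(html):
    """Hypothetical stand-in for the parsing step: the value attributes, in order."""
    return re.findall(r'value="([^"]*)"', html)


class PageStore:
    """Holds the saved HTML document of one specified webpage."""

    def __init__(self, extract=toy_extract):
        self.extract = extract
        self.saved_html = None  # nothing is saved before the first acquisition

    def process(self, current_html):
        """Run S202 to S206 on a newly obtained document; return the last step taken."""
        if self.saved_html is None:            # S202: obtained for the first time?
            self.saved_html = current_html     # S206: save the document as-is
            return "S206"
        current = self.extract(current_html)   # S203: parse the current document
        saved = self.extract(self.saved_html)  # S203: parse the saved document
        if current == saved:                   # S204: compare the control data
            return "S204"
        self.saved_html = current_html         # S205 (simplified): replace the saved copy
        return "S205"
```

Calling `process` once per acquisition interval reproduces the flow of Fig. 2: the first call always ends in S206, and later calls end in S205 only when the extracted data differs.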

Those skilled in the art should appreciate that the above embodiments are intended only to illustrate the present invention and not to limit it. Appropriate changes and variations made to the above embodiments, as long as they are within the essential scope of the present invention, fall within the scope of protection claimed by the present invention.

Claims

1. A webpage information storage control, characterized in that: the control includes an input control, an acquisition control, a parsing control, a judging control and an update control; the input control is used to provide an operation interface through which a user inputs a specified web page address; the acquisition control is used to periodically obtain the current HTML document of the specified webpage through the specified web page address provided by the input control; the parsing control is used to extract the data of the current HTML document of the specified webpage obtained by the acquisition control; the judging control is used to compare whether the data in the currently obtained and parsed HTML document of the specified webpage is consistent with the data in the saved HTML document of the specified webpage; and, when the data in the currently obtained HTML document and the data in the saved HTML document of the specified webpage are inconsistent, the update control is used to update the data of the corresponding previously saved HTML document of the specified webpage according to the data of the current HTML document of the specified webpage extracted by the parsing control.

2. The webpage information storage control as claimed in claim 1, characterized in that: the judging control is also used to judge whether the HTML document of the webpage has been obtained for the first time; when the HTML document of the webpage has been obtained for the first time, the update control directly saves the HTML document; and when the HTML document of the webpage has not been obtained for the first time, the parsing control parses the data in the HTML document of the specified webpage.

3. The webpage information storage control as claimed in claim 1, characterized in that: the parsing control uses the Document object to extract the related data in the specified webpage.

4. The webpage information storage control as claimed in claim 1, characterized in that: the control is a piece of program code, and the program code is located in the program of the webpage.

5. A webpage information storage method, characterized in that the method includes:

obtaining the HTML document of a specified webpage at predetermined intervals;

parsing the HTML document of the webpage and extracting the data in the HTML document of the webpage;

comparing whether the data of the currently obtained and parsed HTML document of the specified webpage is consistent with the data of the saved HTML document; and

when the data of the currently obtained and parsed HTML document of the specified webpage and the data of the saved HTML document are inconsistent, replacing the data in the saved HTML document with the data in the currently obtained HTML document.

6. The webpage information storage method as claimed in claim 5, characterized in that the method also includes:

judging whether the HTML document of the specified webpage has been obtained for the first time;

when the HTML document of the specified webpage has been obtained for the first time, saving the obtained HTML document of the specified webpage; and

when the HTML document of the specified webpage has not been obtained for the first time, parsing the data in the currently obtained HTML document and in the saved HTML document of the specified webpage.

7. The webpage information storage method as claimed in claim 5, characterized in that: the data in the HTML document of the webpage is extracted by using the Document object.