The present application is a divisional application of the Chinese invention patent application filed on March 31, 2012, with application number 201210092944.7, entitled "A web page storage method, device and system".
Summary of the invention
The object of the present invention is to provide a web page storage method, device and system capable of saving the content of a collected web page relatively completely.
To achieve the above object, the present invention provides the following solutions:
A web page storage method, comprising:
After receiving a collection operation instruction issued by a user for a web page being browsed, capturing content description information of the web page by using script code that has been written into the web page for capturing web page content;
Parsing the content description information, and capturing the content of the web page according to the parsing result;
Saving the captured web page content.
Wherein, the method further comprises:
When it is detected that the web page browsed by the user has finished loading, writing the script code for capturing web page content into the web page browsed by the user;
Or,
When the collection operation instruction issued by the user for the browsed web page is received, writing the script code for capturing web page content into the web page browsed by the user.
Wherein, writing the script code for capturing web page content into the web page browsed by the user comprises:
Adding an embedded frame to the web page browsed by the user;
Writing the script code into the embedded frame.
Wherein, capturing the content description information of the web page comprises:
Capturing Document Object Model (DOM) information of the web page.
Wherein, saving the captured web page content comprises:
Saving the captured web page content in a structured form according to the DOM information of the web page.
Wherein, capturing the content of the web page according to the parsing result comprises:
Filtering out, according to a preset rule, content in the web page that is not worth collecting, and capturing the content of the web page according to the filtering result.
Wherein, capturing the content of the web page according to the parsing result comprises:
When the web page content includes pictures, determining whether the number of pictures in the web page exceeds a preset threshold, and if so, downloading the picture content of the web page asynchronously.
Wherein,
After the content description information of the web page is captured, the method further comprises: sending the content description information to a server-side device;
The server-side device parses the content description information, captures the content of the web page according to the parsing result, and saves the captured web page content.
A web page storage device, comprising:
A description information capturing unit, configured to, after a collection operation instruction issued by a user for a web page being browsed is received, capture content description information of the web page by using script code that has been written into the web page for capturing web page content;
A web page content capturing unit, configured to parse the content description information and capture the content of the web page according to the parsing result;
A web page content storage unit, configured to save the captured web page content.
Wherein, the device further comprises:
A code injection unit, configured to write the script code for capturing web page content into the web page browsed by the user when it is detected that the web page has finished loading; or to write the script code for capturing web page content into the web page browsed by the user when the collection operation instruction issued by the user for the browsed web page is received.
Wherein, the code injection unit comprises:
A frame adding subunit, configured to add an embedded frame to the web page browsed by the user;
A code writing subunit, configured to write the script code into the embedded frame.
Wherein, the description information capturing unit is specifically configured to:
After the user's collection operation instruction is received, capture the Document Object Model (DOM) information of the web page by using the script code written in advance.
Wherein, the web page content storage unit is specifically configured to:
Save the captured web page content in a structured form according to the DOM information of the web page.
Wherein, the web page content capturing unit is specifically configured to:
Filter out, according to a preset rule, content in the web page that is not worth collecting, and capture the content of the web page according to the filtering result.
Wherein, the web page content capturing unit is specifically configured to:
When the web page content includes pictures, determine whether the number of pictures in the web page exceeds a preset threshold, and if so, download the picture content of the web page asynchronously.
A web page storage system, comprising a client device and a server-side device;
The client device comprises:
A description information capturing unit, configured to, after a collection operation instruction issued by a user for a web page being browsed is received, capture content description information of the web page by using script code that has been written into the web page for capturing web page content;
A description information sending unit, configured to send the content description information of the web page to the server-side device;
The server-side device comprises:
A description information receiving unit, configured to receive the content description information of the web page sent by the client device;
A web page content capturing unit, configured to parse the content description information of the web page and capture the content of the web page according to the parsing result;
A web page content storage unit, configured to save the captured web page content.
Wherein, the client device further comprises:
A code injection unit, configured to write the script code for capturing web page content into the web page browsed by the user when it is detected that the web page has finished loading; or to write the script code for capturing web page content into the web page browsed by the user when the collection operation instruction issued by the user for the browsed web page is received.
Wherein, the code injection unit comprises:
A frame adding subunit, configured to add an embedded frame to the web page browsed by the user;
A code writing subunit, configured to write the script code into the embedded frame.
Wherein, the description information capturing unit is specifically configured to:
After the user's collection operation instruction is received, capture the Document Object Model (DOM) information of the web page by using the script code written in advance.
Wherein, the web page content storage unit is specifically configured to:
Save the captured web page content in a structured form according to the DOM information of the web page.
Wherein, the web page content capturing unit is specifically configured to:
Filter out, according to a preset rule, content in the web page that is not worth collecting, and capture the content of the web page according to the filtering result.
Wherein, the web page content capturing unit is specifically configured to:
When the web page content includes pictures, determine whether the number of pictures in the web page exceeds a preset threshold, and if so, download the picture content of the web page asynchronously.
In the technical solution provided by the embodiments of the present invention, the description information of a web page is captured by script code written into the web page in advance. On the one hand, this ensures that the captured web page content is comprehensive. On the other hand, because the description information of the web page carries the style information of the web page, the web page content can be typeset according to that style information when it is saved, which makes the collected result more orderly and easier for the user to read.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings of the embodiments. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention fall within the scope of protection of the present invention.
First, a web page storage method provided by an embodiment of the present invention is described. The method may include the following steps:
After receiving a collection operation instruction issued by a user for a web page being browsed, capturing content description information of the web page by using script code that has been written into the web page for capturing web page content;
Parsing the content description information, and capturing the content of the web page according to the parsing result;
Saving the captured web page content.
In one embodiment of the present invention, all of the above steps may be implemented in a client device, for example in the browser itself, in a browser plug-in, or in dedicated web page storage software.
In another embodiment of the present invention, the steps of writing the script code and capturing the content description information of the web page are implemented in the client device. After capturing the content description information, the client sends it to a server-side device, and the server completes the subsequent steps.
First, as shown in Figure 1, the web page storage method comprises the following steps:
S101: after receiving a collection operation instruction issued by a user for a web page being browsed, capturing content description information of the web page by using script code that has been written into the web page for capturing web page content;
In the embodiments of the present invention, the server does not capture the page content directly. The reason is that the server cannot capture some web pages directly: for example, some pages are displayed only after login, and if the client has not logged in, the server cannot capture them either. Therefore, in the embodiments of the present invention, the operation of capturing the page content is performed by the client, for example by software such as a browser.
According to the scheme of the embodiments of the present invention, script code may be written into the web page browsed by the user once the page is detected to have finished loading. This code may display a button (for example labelled "I like") at a specified position on the page (for example on the right side), and clicking the "I like" button triggers the collection operation. Alternatively, in another implementation, a button (for example labelled "I like") may be displayed by default at a specified position on the page (for example on the right side). If the user wants to collect the page currently being browsed, the user clicks this button, whereupon the script code is written into the page; the same click also serves as the user's trigger of the collection operation.
The script code written into the web page has the function of capturing page content. Since many web pages are currently developed with JS (JavaScript) technology, the embodiments of the present invention write JS script code into the web page. This both solves the problem of capturing page content that is available only after the user logs in and ensures the security of the captured information.
In an improved embodiment of the present invention, an embedded frame may first be added to the web page browsed by the user, and the script code is then written into the embedded frame.
The embedded frame may be an iframe, which isolates the script code from the browser interfaces. The reason for this is as follows: in practice, if an individual user can obtain the script code, that user can operate the browser interfaces, which raises security problems. For example, the user could use the script code to initiate cross-domain requests in the browser, or could operate the browser interfaces to modify the browser configuration file or to use other interface functions of the browser. To prevent the script code from being maliciously exploited, the embodiments of the present invention write the script code into an embedded frame, which isolates the script code from the browser interfaces and thereby improves security.
After the script code is written into the web page and the page has finished loading, a button or a user interaction panel may be drawn on one side of the page so that the user can click the button to trigger the collection operation. Of course, in the present invention the way in which the user issues the collection operation instruction is not limited to clicking a button. In addition, the user may use the interaction panel to set the button skin, configure sharing, and perform similar operations, which are not described in detail here.
Of course, in practical applications the scheme of the embodiments of the present invention may be implemented as a browser plug-in. Where the browser plug-in supports it, the script may also be injected directly into the web page browsed by the user, without adding the embedded frame described above.
After it is received that the user has initiated a collection operation instruction, by clicking the collection button or in another way, the content description information of the web page is captured by using the script code written in advance.
In the present invention, the content description information that mainly needs to be captured includes the DOM (Document Object Model) information of the web page. The DOM tree of the web page contains the layout structure information of the page. With this information, the web page content can later be typeset according to the original style of the web page when it is saved, and saved in a structured form.
Those skilled in the art will understand that, in the process of capturing the content description information of the web page, information such as the page hyperlink and the title of the web page may also be captured in addition to the DOM information. The embodiments of the present invention impose no limitation on this.
S102: parsing the content description information, and capturing the content of the web page according to the parsing result;
By parsing the DOM tree of the web page, content such as the text and pictures contained in the page can be extracted. The picture content obtained by parsing is the source location of each picture file; the actual picture file must then be downloaded from that source location to local storage.
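The parsing described above can be illustrated with a minimal sketch. It assumes that the content description information arrives as serialized HTML and uses Python's standard `html.parser` module; the class name `ContentExtractor` and the function `extract_content` are hypothetical names introduced only for this illustration, not part of the described scheme.

```python
from html.parser import HTMLParser


class ContentExtractor(HTMLParser):
    """Collects visible text and picture source locations from serialized HTML."""

    SKIPPED = {"script", "style"}  # element content that is not page text

    def __init__(self):
        super().__init__()
        self.texts = []
        self.image_sources = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "img":
            # Only the source location is known at this point;
            # the picture file itself is downloaded in a later step.
            src = dict(attrs).get("src")
            if src:
                self.image_sources.append(src)

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.texts.append(text)


def extract_content(html):
    """Return (texts, image_sources) extracted from the given HTML string."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return parser.texts, parser.image_sources
```

The sketch returns picture source locations only, which matches the statement that the actual picture files still have to be downloaded from those locations.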
When downloading picture files, it may first be determined whether the number of pictures in the web page exceeds a preset threshold (for example 10 or 20). If not, each picture file is downloaded directly. When the web page contains many pictures, however, capturing the picture files is time-consuming. To improve system performance, the picture files may be downloaded asynchronously in batches using multiple threads, and all picture files are archived together once processing is complete, which effectively reduces the time required to capture the pictures.
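A minimal sketch of this threshold decision follows, using a thread pool from Python's standard library as one possible way to realize the asynchronous batch download. The threshold value, the worker count, and the caller-supplied `fetch` function are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

PICTURE_COUNT_THRESHOLD = 10  # assumed value; the text gives 10 or 20 as examples


def download_pictures(sources, fetch, threshold=PICTURE_COUNT_THRESHOLD, workers=4):
    """Return {source: data}; `fetch` is a caller-supplied function source -> bytes."""
    if len(sources) <= threshold:
        # Few pictures: download each file directly, one after another.
        return {src: fetch(src) for src in sources}
    # Many pictures: download concurrently on worker threads and gather
    # all results together once every download has finished.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        data = list(pool.map(fetch, sources))
    return dict(zip(sources, data))
```

Passing `fetch` as a parameter keeps the sketch independent of any particular HTTP client; in practice it would wrap the actual download call.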
In practical applications, some websites adopt anti-hotlinking technology, so that picture files cannot be downloaded directly. For this case, in the embodiments of the present invention, when the request to download a picture file is initiated, the source domain name of the website hosting the picture resource may be added to the Referer field of the HTTP header. When the server of the website hosting the picture resource parses this request, it considers the request to have been initiated by itself and therefore returns the picture content.
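As an illustration of setting the Referer field, the following sketch builds such a request with Python's standard `urllib`. The function name `build_picture_request` is hypothetical, and deriving the Referer from the scheme and host of the picture URL is an assumption about how the "source domain name" would be obtained.

```python
from urllib.parse import urlsplit
from urllib.request import Request


def build_picture_request(picture_url):
    """Build a request whose Referer is the origin of the picture's own site."""
    parts = urlsplit(picture_url)
    referer = f"{parts.scheme}://{parts.netloc}/"
    return Request(picture_url, headers={"Referer": referer})
```

The returned object could then be passed to `urllib.request.urlopen` to perform the download.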
In the process of capturing picture content, the sizes of the pictures in the web page may also be obtained first, and small pictures are not downloaded. This way of capturing pictures selects only the pictures whose size exceeds a preset size threshold. The reason is that a web page may contain many pictures, including a large number of advertising pictures and other content not worth collecting, whereas pictures that form the main content of the page usually have a larger size. Filtering by picture size therefore effectively reduces the capture of useless picture content, which both saves system resources and improves the readability of the collected result.
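The size-based selection might look like the following sketch. The tuple layout `(source, width, height)`, the threshold values, and the choice to require both width and height to exceed the threshold are assumptions; the text only states that a preset size threshold is used.

```python
def filter_by_size(pictures, min_width=200, min_height=200):
    """Keep sources of pictures larger than the thresholds.

    `pictures` is an iterable of (source, width, height) tuples.
    """
    return [
        src
        for src, width, height in pictures
        if width > min_width and height > min_height
    ]
```

For example, a list containing an 88x31 banner, an 800x600 photo, and a 16x16 icon would be reduced to the photo alone.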
It can be understood that, besides filtering picture content by picture size, other preset rules may be used here, such as URL keywords or file name keywords, to filter out information in the web page that is not worth collecting, thereby saving system resources and improving the readability of the collected result. The embodiments of the present invention impose no limitation on this.
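A keyword rule of this kind can be sketched as follows. The keyword lists are purely hypothetical examples, and the function name `is_worth_collecting` is introduced only for this illustration.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical keyword lists; a real deployment would configure its own rules.
BLOCKED_URL_KEYWORDS = ("/ads/", "tracker")
BLOCKED_FILENAME_KEYWORDS = ("banner", "logo", "spacer")


def is_worth_collecting(url):
    """Return False when the URL or its file name matches a blocked keyword."""
    lowered = url.lower()
    if any(keyword in lowered for keyword in BLOCKED_URL_KEYWORDS):
        return False
    filename = posixpath.basename(urlsplit(lowered).path)
    return not any(keyword in filename for keyword in BLOCKED_FILENAME_KEYWORDS)
```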
S103: saving the captured web page content.
In this step, the web page content captured in S102 is saved. In particular, according to the DOM tree information of the web page, the captured web page content can be typeset according to the original style and layout of the web page and saved in a structured form.
Further, a web page summary may be generated from the saved content information and shown to the user in the favorites list for convenient browsing. In a specific implementation, the summary title may be generated from the web page title information, the text portion of the summary from the page text, the thumbnail in the summary from the page picture information, and so on. Once the summary information is saved, the user can view the summaries of collected web pages directly in the list of collected web pages during subsequent browsing.
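One possible shape for such a summary is sketched below. The field names, the excerpt length, and the choice of the first picture as the thumbnail source are assumptions; producing an actual reduced-size thumbnail image would need an imaging library and is not shown.

```python
def build_summary(title, texts, image_sources, max_chars=120):
    """Build a summary record from the title, page text and picture sources."""
    body = " ".join(texts)
    if len(body) <= max_chars:
        excerpt = body
    else:
        # Truncate long text and mark the cut with an ellipsis.
        excerpt = body[:max_chars].rstrip() + "..."
    return {
        "title": title,
        "excerpt": excerpt,
        # Assumption: the first picture on the page serves as the thumbnail.
        "thumbnail_source": image_sources[0] if image_sources else None,
    }
```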
In addition, with the scheme of the present invention, the user may share collected web pages with other websites: by calling the interfaces of other websites, the typeset web page content information and summary information can be sent to the target website, thereby sharing the user's information and improving the user experience.
In the web page storage method provided above, the description information of a web page is captured by script code written into the web page in advance. On the one hand, this ensures that the captured web page content is comprehensive. On the other hand, because the description information of the web page carries the style information of the web page, the web page content can be typeset according to that style information when it is saved, which makes the collected result more orderly and easier for the user to read.
In the above embodiment, all web page storage steps are implemented in the client device. In another embodiment of the present invention, the client device and the server-side device may cooperate to complete the web page storage operation. As shown in Figure 2, the method comprises the following steps:
S201: after receiving a collection operation instruction issued by a user for a web page being browsed, the client device captures content description information of the web page by using script code that has been written into the web page for capturing web page content;
S202: the client device sends the content description information to the server-side device;
S203: the server-side device parses the content description information and captures the content of the web page according to the parsing result;
S204: the server-side device saves the captured web page content.
Compared with the previous embodiment: S201 is identical to S101; S203-S204 differ from S102-S103 in that the executing entity changes from the client device to the server-side device; and step S202 is added, in which the client device sends the content description information to the server-side device.
The server side is considerably stronger than a front-end JS script in aspects such as parsing capability, download controllability and re-typesetting, so this approach can effectively improve the quality of the captured web page content. Moreover, the server side has more abundant storage space, which also makes it easier to share information between users.
In addition, as described above, because the server side cannot capture some web pages directly, the step of capturing the description information of the web page is still performed by the client, which ensures the capture success rate.
It can be understood that the client device may apply data compression when sending the content description information to the server-side device, thereby further improving transmission efficiency.
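The compression step can be sketched as follows, assuming the content description information is represented as a JSON-serializable dictionary and using zlib as one example of a data compression technique. The function names are hypothetical, and the text does not specify a particular compression algorithm or wire format.

```python
import json
import zlib


def pack_description(description):
    """Client side: serialize the description information and compress it."""
    raw = json.dumps(description, ensure_ascii=False).encode("utf-8")
    return zlib.compress(raw)


def unpack_description(payload):
    """Server side: decompress the payload and restore the description."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))
```

Serialized DOM information tends to be repetitive markup, so it usually compresses well, although the actual gain depends on the page.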
Corresponding to the method embodiment above, an embodiment of the present invention also provides a web page storage device. As shown in Figure 3, the device may comprise:
A description information capturing unit 301, configured to, after a collection operation instruction issued by a user for a web page being browsed is received, capture content description information of the web page by using script code that has been written into the web page for capturing web page content;
A web page content capturing unit 302, configured to parse the content description information and capture the content of the web page according to the parsing result;
A web page content storage unit 303, configured to save the captured web page content.
In a specific implementation, the device may further comprise:
A code injection unit, configured to write the script code for capturing web page content into the web page browsed by the user when it is detected that the web page has finished loading; or to write the script code for capturing web page content into the web page browsed by the user when the collection operation instruction issued by the user for the browsed web page is received.
In one embodiment of the present invention, the code injection unit may comprise:
A frame adding subunit, configured to add an embedded frame to the web page browsed by the user;
A code writing subunit, configured to write the script code into the embedded frame.
The description information capturing unit 301 may be specifically configured to:
After the user's collection operation instruction is received, capture the Document Object Model (DOM) information of the web page by using the script code written in advance.
The web page content storage unit 303 may be specifically configured to:
Save the captured web page content in a structured form according to the DOM information of the web page.
In one embodiment of the present invention, the web page content capturing unit 302 may be specifically configured to:
Filter out, according to a preset rule, content in the web page that is not worth collecting, and capture the content of the web page according to the filtering result.
In another embodiment of the present invention, the web page content capturing unit 302 may also be specifically configured to:
When the web page content includes pictures, determine whether the number of pictures in the web page exceeds a preset threshold, and if so, download the picture content of the web page asynchronously.
The web page storage device provided above may be a functional module located in the client; this module may be the browser itself, a browser plug-in, dedicated web page storage software, or the like.
As a counterpart to the above scheme, in which all collection operations are implemented in the client, an embodiment of the present invention also provides a web page storage system. As shown in Figure 4, the system comprises a client device 401 and a server-side device 402;
The client device 401 comprises:
A description information capturing unit 4011, configured to, after the user's collection operation instruction is received, capture the content description information of the web page by using the script code written in advance;
A description information sending unit 4012, configured to send the content description information of the web page to the server-side device;
The server-side device 402 comprises:
A description information receiving unit 4021, configured to receive the content description information of the web page sent by the client device;
A web page content capturing unit 4022, configured to parse the content description information of the web page and capture the content of the web page according to the parsing result;
A web page content storage unit 4023, configured to save the captured web page content.
The server side is considerably stronger than a front-end JS script in aspects such as parsing capability, download controllability and re-typesetting, so the web page storage system provided by the embodiment of the present invention can effectively improve the quality of the captured web page content. Moreover, the server side has more abundant storage space, which also makes it easier to share information between users.
In addition, as described above, because the server side cannot capture some web pages directly, the step of capturing the description information of the web page is still performed by the client, which ensures the capture success rate.
In a specific implementation, the client device 401 may further comprise:
A code injection unit, configured to write the script code for capturing web page content into the web page browsed by the user when it is detected that the web page has finished loading; or to write the script code for capturing web page content into the web page browsed by the user when the collection operation instruction issued by the user for the browsed web page is received.
In one embodiment of the present invention, the code injection unit may comprise:
A frame adding subunit, configured to add an embedded frame to the web page browsed by the user;
A code writing subunit, configured to write the script code into the embedded frame.
In one embodiment of the present invention, the description information capturing unit 4011 may be specifically configured to:
After the user's collection operation instruction is received, capture the Document Object Model (DOM) information of the web page by using the script code written in advance.
In one embodiment of the present invention, the web page content storage unit 4023 may be specifically configured to:
Save the captured web page content in a structured form according to the DOM information of the web page.
In one embodiment of the present invention, the web page content capturing unit 4022 may be specifically configured to:
Filter out, according to a preset rule, content in the web page that is not worth collecting, and capture the content of the web page according to the filtering result.
In one embodiment of the present invention, the web page content capturing unit 4022 may also be specifically configured to:
When the web page content includes pictures, determine whether the number of pictures in the web page exceeds a preset threshold, and if so, download the picture content of the web page asynchronously.
From the above description of the embodiments, those skilled in the art can clearly understand that the present invention can be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the part of the technical solution of the present invention that in essence contributes to the prior art can be embodied in the form of a software product. The computer software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention or in certain parts of the embodiments.
The embodiments in this specification are described in a progressive manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the device and system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the description of the method embodiments. The device and system embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the scheme of the embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
The web page storage method, device and system provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementations of the present invention, and the description of the above embodiments is only intended to help in understanding the method of the present invention and its core idea. Meanwhile, those of ordinary skill in the art may, in accordance with the idea of the present invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.
The embodiment of the invention discloses A1 web page storage method, comprising:
Receive after the collection operational order that user carries out browsed webpage, utilize write described webpage for capturing the scripted code of web page contents, capture the content description information of described webpage;
Described content description information is resolved, according to analysis result, capture the content of described webpage;
Captured web page contents is preserved.
A2, according to the method described in A1, also comprise:
When webpage that described user browses being detected and loaded, in the webpage of browsing to user, write for capturing the scripted code of web page contents;
Or,
When receiving the collection operational order that user carries out browsed webpage, in the webpage of browsing to user, write for capturing the scripted code of web page contents.
A3, according to the method described in A2, in the described webpage of browsing to user, write for capturing the scripted code of web page contents, comprising:
In the webpage of browsing user, add embedded framework;
In described embedded framework, write described scripted code.
A4, according to the method described in A1, the content description information of the described webpage of described crawl, comprising:
Capture the DOM Document Object Model information of described webpage.
A5, according to the method described in A1, described captured web page contents is preserved, comprising:
According to the DOM Document Object Model information of described webpage, captured web page contents is preserved with structuring pattern.
A6, according to the method described in A1, the described content that captures described webpage according to analysis result comprises:
According to default rule, the content without collection meaning comprising in web page contents is filtered, according to filter result, capture the content of described webpage.
A7, according to the method described in A1, the described content that captures described webpage according to analysis result comprises:
In the situation that web page contents comprises picture, judge whether the picture number in webpage is greater than default threshold value, if so, adopt asynchronous system to download the image content of described webpage.
A8. The method according to any one of A1 to A7, wherein:
after the content description information of the web page is captured, the method further comprises: sending the content description information to a server-side device; and
the server-side device parses the content description information, captures the content of the web page according to the parsing result, and saves the captured web page content.
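The division of work in A8 can be sketched with an in-process client and server. The JSON payload layout, the function build_payload, and the class CaptureServer are hypothetical; the application does not specify a transport or message format, so none is modelled here.

```python
import json
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the non-empty text fragments of the received markup."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        if data.strip():
            self.parts.append(data.strip())


def build_payload(url, dom_markup):
    """Client side: package the captured description information for sending."""
    return json.dumps({"url": url, "dom": dom_markup})


class CaptureServer:
    """Server side: parses the description information and saves the content."""

    def __init__(self):
        self.saved = {}  # in-memory stand-in for real storage

    def handle(self, payload):
        message = json.loads(payload)
        extractor = TextExtractor()
        extractor.feed(message["dom"])
        extractor.close()
        self.saved[message["url"]] = extractor.parts
        return len(extractor.parts)


server = CaptureServer()
payload = build_payload("http://example.com/a", "<p>Hello</p><p>World</p>")
count = server.handle(payload)
```

With this split the client only captures and transmits the description information, while parsing and storage run on the server.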
B9. A web page storage apparatus, comprising:
a description information capture unit, configured to, after receiving a collection operation instruction issued by a user for a browsed web page, capture content description information of the web page by using script code that has been written into the web page for capturing web page content;
a web page content capture unit, configured to parse the content description information and capture the content of the web page according to the parsing result; and
a web page content storage unit, configured to save the captured web page content.
B10. The apparatus according to B9, further comprising:
a code injection unit, configured to write the script code for capturing web page content into the web page browsed by the user when it is detected that the web page has finished loading; or to write the script code for capturing web page content into the web page browsed by the user when the collection operation instruction issued by the user for the browsed web page is received.
B11. The apparatus according to B10, wherein the code injection unit comprises:
a frame adding subunit, configured to add an inline frame (iframe) to the web page browsed by the user; and
a code writing subunit, configured to write the script code into the inline frame.
B12. The apparatus according to B9, wherein the description information capture unit is specifically configured to:
after receiving the user's collection operation instruction, capture Document Object Model (DOM) information of the web page by using the script code written in advance.
B13. The apparatus according to B9, wherein the web page content storage unit is specifically configured to:
save the captured web page content in a structured form according to the Document Object Model (DOM) information of the web page.
B14. The apparatus according to B9, wherein the web page content capture unit is specifically configured to:
filter out, according to a preset rule, content in the web page that is not worth collecting, and capture the content of the web page according to the filtering result.
B15. The apparatus according to B9, wherein the web page content capture unit is specifically configured to:
when the web page content includes pictures, determine whether the number of pictures in the web page is greater than a preset threshold, and if so, download the picture content of the web page asynchronously.
C16. A web page storage system, comprising a client device and a server-side device, wherein:
the client device comprises:
a description information capture unit, configured to, after receiving a collection operation instruction issued by a user for a browsed web page, capture content description information of the web page by using script code that has been written into the web page for capturing web page content; and
a description information sending unit, configured to send the content description information to the server-side device; and
the server-side device comprises:
a description information receiving unit, configured to receive the content description information sent by the client device;
a web page content capture unit, configured to parse the content description information and capture the content of the web page according to the parsing result; and
a web page content storage unit, configured to save the captured web page content.
C17. The system according to C16, wherein the client device further comprises:
a code injection unit, configured to write the script code for capturing web page content into the web page browsed by the user when it is detected that the web page has finished loading; or to write the script code for capturing web page content into the web page browsed by the user when the collection operation instruction issued by the user for the browsed web page is received.
C18. The system according to C17, wherein the code injection unit comprises:
a frame adding subunit, configured to add an inline frame (iframe) to the web page browsed by the user; and
a code writing subunit, configured to write the script code into the inline frame.
C19. The system according to C16, wherein the description information capture unit is specifically configured to:
after receiving the user's collection operation instruction, capture Document Object Model (DOM) information of the web page by using the script code written in advance.
C20. The system according to C16, wherein the web page content storage unit is specifically configured to:
save the captured web page content in a structured form according to the Document Object Model (DOM) information of the web page.
C21. The system according to C16, wherein the web page content capture unit is specifically configured to:
filter out, according to a preset rule, content in the web page that is not worth collecting, and capture the content of the web page according to the filtering result.
C22. The system according to C16, wherein the web page content capture unit is specifically configured to:
when the web page content includes pictures, determine whether the number of pictures in the web page is greater than a preset threshold, and if so, download the picture content of the web page asynchronously.