CN103136259B - A kind of method and apparatus based on content block identification processing web page contents - Google Patents

A kind of method and apparatus based on content block identification processing web page contents

Info

Publication number
CN103136259B
Authority
CN
China
Prior art keywords
content blocks
identification information
content
block identification
rule
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201110390828.9A
Other languages
Chinese (zh)
Other versions
CN103136259A (en
Inventor
钱海祥
辛昕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201110390828.9A priority Critical patent/CN103136259B/en
Priority to PCT/CN2012/075044 priority patent/WO2013078829A1/en
Publication of CN103136259A publication Critical patent/CN103136259A/en
Application granted granted Critical
Publication of CN103136259B publication Critical patent/CN103136259B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/957Browsing optimisation, e.g. caching or content distillation
    • G06F16/9577Optimising the visualization of content, e.g. distillation of HTML documents

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and apparatus for processing web page content based on content block identifiers. First, an original web page to be processed is obtained. Next, block identification information is extracted from the markup language file of the original web page, where the block identification information identifies each content block in the markup language file. A matching query is then run against a processing rule base, using the block identification information, to obtain the content block processing rule that corresponds to it. Finally, the content block identified by the block identification information is processed according to that rule to obtain the target web page. Compared with the prior art, the invention processes page content quickly, which improves the efficiency and quality of page conversion and thus the user experience. Because the markup language file of a page only needs to carry block identification information, not the processing rules themselves, the burden of web page maintenance on the website is also reduced.

Description

A kind of method and apparatus based on content block identification processing web page contents
Technical field
The present invention relates to the field of Internet technology, and in particular to a technique for processing web page content based on content block identifiers.
Background art
When the prior art processes web page content, for example when converting a web page designed for desktop display into one suitable for display on a mobile terminal, it generally parses the Internet web page, extracts the subject content, and generates a new web page from the extracted content. This converts an original web page suited to a desktop computer into a target web page suited to a mobile device. However, this kind of conversion is inefficient and costly in processing time, which slows the response to page access requests from mobile terminal users and degrades the user experience.
Therefore, how to process page content effectively and quickly has become one of the problems that urgently needs solving.
Summary of the invention
It is an object of the invention to provide a method and apparatus for processing web page content based on content block identifiers.
According to one aspect of the invention, a computer-implemented method for processing web page content based on content block identifiers is provided. The method comprises the following steps:
A. obtaining an original web page to be processed;
B. extracting block identification information from the markup language file of the original web page, where the block identification information identifies each content block in the markup language file;
C. running a matching query against a processing rule base according to the block identification information, to obtain the content block processing rule corresponding to the block identification information;
D. processing the content block identified by the block identification information according to the content block processing rule, to obtain a target web page.
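Steps A to D can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the in-memory rule base, and the choice of a markName attribute as the carrier of block identification information are all hypothetical, and step D is reduced to producing an ordered plan of (identifier, rule) pairs rather than rewriting the markup.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical rule base: block identifier -> processing rule name.
RULE_BASE: Dict[str, str] = {"title": "show", "embedded object": "delete"}

# Assumption: identifiers are stored in a markName attribute on the block's tag.
MARK_RE = re.compile(r'markName\s*=\s*"\s*([^"]*?)\s*"')


def extract_block_ids(markup: str) -> List[str]:
    """Step B: extract block identifiers from the markup by string matching."""
    return MARK_RE.findall(markup)


def process_page(
    url: str,
    fetch: Callable[[str], str],
    rule_base: Dict[str, str],
) -> Tuple[str, List[Tuple[str, str]]]:
    """Run steps A to C and return the markup plus the processing plan for step D."""
    markup = fetch(url)                      # step A: obtain the original page
    block_ids = extract_block_ids(markup)    # step B: extract identifiers
    plan = [(block_id, rule_base[block_id])  # step C: matching query
            for block_id in block_ids
            if block_id in rule_base]
    return markup, plan                      # step D would apply each rule in the plan
```

Passing the fetcher in as a parameter keeps the sketch free of network access; a real deployment would substitute an HTTP client.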
According to another aspect of the invention, equipment for processing web page content based on content block identifiers is also provided. The equipment includes:
an original web page acquisition device, for obtaining an original web page to be processed;
an identification information extraction device, for extracting block identification information from the markup language file of the original web page, where the block identification information identifies each content block in the markup language file;
a processing rule acquisition device, for running a matching query against a processing rule base according to the block identification information, to obtain the content block processing rule corresponding to the block identification information;
a target web page acquisition device, for processing the content block identified by the block identification information according to the content block processing rule, to obtain a target web page.
Compared with the prior art, the invention takes the block identification information that corresponds to each content block in the markup language file of the obtained original web page, such as an HTML or XHTML file, and runs a matching query against the processing rule base to obtain the content block processing rule corresponding to that block identification information. Each content block is then folded, deleted, formatted, or otherwise processed accordingly, so that page content is processed quickly. This improves the efficiency and quality of page conversion and thus the user experience. Because the markup language file of a page only needs to carry block identification information, not the processing rules themselves, the burden of web page maintenance on the website is also reduced.
Brief description of the drawings
Other features, objects and advantages of the invention will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 shows a schematic diagram of equipment for processing web page content based on content block identifiers according to one aspect of the invention;
Fig. 2 shows a schematic diagram of equipment for processing web page content based on content block identifiers according to a preferred embodiment of the invention;
Fig. 3 shows a flowchart of a method for processing web page content based on content block identifiers according to another aspect of the invention;
Fig. 4 shows a flowchart of a method for processing web page content based on content block identifiers according to a preferred embodiment of the invention.
The same or similar reference numerals in the drawings denote the same or similar components.
Detailed description of the embodiments
The invention is described in further detail below with reference to the accompanying drawings.
Fig. 1 shows a schematic diagram of equipment for processing web page content based on content block identifiers according to one aspect of the invention. Processing equipment 1 includes original web page acquisition device 11, identification information extraction device 12, processing rule acquisition device 13 and target web page acquisition device 14.
Here, processing equipment 1 may be a network device, including but not limited to a computer, a network host, a single network server, a set of multiple network servers, or a cloud formed by multiple servers. The cloud here consists of a large number of computers or network servers based on cloud computing, where cloud computing is a form of distributed computing in which a group of loosely coupled computers forms one virtual supercomputer. Processing equipment 1 may also be a mobile terminal, meaning a computing device that can be used while on the move, including but not limited to a mobile phone, a notebook computer, a POS terminal, or an in-vehicle computer, whose screen is generally much smaller than a desktop display.
The process by which processing equipment 1 handles web page content is described in detail below with reference to Fig. 1:
Specifically, original web page acquisition device 11 obtains an original web page to be processed.
Here, the ways of obtaining the original web page to be processed include but are not limited to the following:
1) according to a page access request from a mobile terminal, obtaining the corresponding original web page from the website server pointed to by the Uniform Resource Locator (URL) in the page access request;
In one example, the user first interacts with the browser or client software of the mobile terminal through one of its input devices, including but not limited to a keyboard, mouse, remote control, touch pad or handwriting device. Taking the keyboard as an example, when the user types in the address bar of the mobile terminal's browser, the mobile terminal captures the keystroke sequence in real time, for example a URL entered by the user, and records it as the page access request corresponding to that input, where the page access request includes the URL. The page access request is then sent to processing equipment 1 over an agreed communication channel. Next, original web page acquisition device 11 receives the page access request in real time, extracts the page URL from it, and sends a request for the page to the network server hosting the page that the URL points to. For example, the request can be encapsulated as a request message, such as an HTTP request, and sent to the network server over a corresponding protocol such as HTTP or HTTPS. Original web page acquisition device 11 then receives the page that the network server returns in response and takes it as the original web page to be processed.
2) obtaining the original web page to be processed from a third-party device.
In another example, processing equipment 1 is a network device. Using an application programming interface (API) provided by the third-party device, original web page acquisition device 11 sends the third-party device a request for the original web page to be processed, either periodically or when triggered by a predetermined condition or event, and receives the original web page that the third-party device returns in response. Alternatively, the third-party device actively pushes the original web page to processing equipment 1, and original web page acquisition device 11 receives it.
Those skilled in the art will understand that the above ways of obtaining the original web page to be processed are only examples; other existing or future ways of obtaining it, if applicable to the invention, also fall within its scope of protection and are incorporated herein by reference.
Then, identification information extraction device 12 extracts block identification information, for example by string matching, from the markup language file of the original web page obtained by original web page acquisition device 11, where the block identification information identifies each content block in the markup language file.
Here, the markup language file includes but is not limited to:
1) an HTML (HyperText Markup Language) file, HTML being a standard markup language for describing web documents;
2) an XML (Extensible Markup Language) file, XML being a general-purpose markup language commonly used to store data;
3) an XHTML (Extensible HyperText Markup Language) file, XHTML being an XML-based markup language with strict syntax;
4) a WML (Wireless Markup Language) file, WML being a descriptive markup language for creating pages that can be displayed in a WAP browser.
Those skilled in the art will understand that the above markup language files are only examples; other existing or future markup language files, if applicable to the invention, also fall within its scope of protection and are incorporated herein by reference.
Here, the block identification information includes but is not limited to an identifier name, an identifier ID, and so on. An identifier name can be chosen according to the type of content block it identifies, such as title, navigation, text, picture, or embedded object (for example a Java applet, ActiveX control, or Flash).
Here, a content block means a content area formed by one or more tags in the markup language file and corresponding to specific content shown in the web page, for example a title block, a body text block, a navigation block, a picture block, or an embedded object (for example a Java applet, ActiveX control, or Flash) block.
Here, the ways of storing block identification information in the markup language file include but are not limited to:
1) a comment in the markup language file. For example, identification information can be stored in an HTML comment in a JSON-style form, such as <!--tc block_begin:{type:"context"}-->. JSON is a lightweight data interchange format that generally represents data as name/value pairs, with the name and value separated by ":";
2) a custom tag in the markup language file. For example, in an HTML file the custom tag may be <tc></tc>, and the identification information can be stored in that custom tag;
3) a tag attribute in the markup language file. For example, in an XHTML file the identification information can be stored in an attribute of the content block's tag, such as <div markName="title">, where the value of the markName attribute is the identification information of the content block corresponding to that div tag.
Those skilled in the art will understand that the above storage modes are only examples; other existing or future storage modes, if applicable to the invention, also fall within its scope of protection and are incorporated herein by reference.
In one example, the markup language file of the original web page obtained by identification information extraction device 12 is an XHTML file, for example:
Here, the XHTML file is predefined to store content block identification information in a tag attribute named markName. Accordingly, identification information extraction device 12 parses the XHTML file and performs string matching on the keyword "markName" to obtain the markName attribute of the div tag and its value "title", which is the identifier name of the content block corresponding to the div tag, and the markName attribute of the img tag and its value "picture", which is the identifier name of the content block corresponding to the img tag.
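The extraction just described can be illustrated with Python's standard html.parser module. The sample markup below is a hypothetical stand-in, since the XHTML listing itself is not reproduced in this text, and the class and function names are likewise assumptions.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class MarkNameExtractor(HTMLParser):
    """Collect (tag, identifier) pairs from tags carrying a markName attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: List[Tuple[str, str]] = []

    def handle_starttag(
        self, tag: str, attrs: List[Tuple[str, Optional[str]]]
    ) -> None:
        for name, value in attrs:
            # HTMLParser lowercases attribute names, so markName arrives as "markname".
            if name == "markname" and value is not None:
                self.blocks.append((tag, value.strip()))


# Hypothetical sample; not the listing from the original document.
SAMPLE = """<div markName="title"><h1>Headline</h1></div>
<img markName="picture" src="photo.jpg" />"""


def extract_marknames(markup: str) -> List[Tuple[str, str]]:
    parser = MarkNameExtractor()
    parser.feed(markup)
    parser.close()
    return parser.blocks
```

Self-closing tags such as the img element are also reported through handle_starttag, because the default handle_startendtag delegates to it.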
Those skilled in the art will understand that the above ways of extracting block identification information are only examples; other existing or future ways of extracting it, if applicable to the invention, also fall within its scope of protection and are incorporated herein by reference.
Then, processing rule acquisition device 13 runs a matching query against the processing rule base, using the block identification information obtained by identification information extraction device 12, to obtain the content block processing rule corresponding to the block identification information.
Specifically, processing rule acquisition device 13 runs the matching query, according to the block identification information, in a processing rule base held locally or on a third-party device, to obtain the content block processing rule corresponding to the block identification information.
Here, the processing rules include but are not limited to:
1) formatting the content in the content block, where the formatting includes but is not limited to:
i. changing text attributes in the content block, such as the font, size, colour and background colour of the content;
ii. reducing the pictures contained in the content block by a predetermined ratio;
2) displaying the content block;
3) deleting the content block;
4) folding the content block, where folding means that the content of the block is collapsed and hidden by default but can be expanded and displayed through a specific trigger;
5) adjusting the display position of the content block.
Those skilled in the art will understand that the above processing rules are only examples; other existing or future processing rules, if applicable to the invention, also fall within its scope of protection and are incorporated herein by reference.
Here, the processing rule base contains each piece of block identification information and its corresponding processing rule, and can be implemented as, among other things, a relational database, a key-value storage system, or a file system.
In one example, the block identification information is "title". Processing rule acquisition device 13 uses this block identification information to run a matching query in the local processing rule base, through an application programming interface (API) provided by processing equipment 1, and obtains the content block processing rule "show" corresponding to the "title" block identification information, meaning that the content block identified by it is to be displayed.
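A local rule base backed by a relational database could look like the sketch below, which uses an in-memory SQLite database. The table name, column names and seed rows are assumptions made for illustration; only the rule names "show" and "zoomin" come from the examples in the text.

```python
import sqlite3
from typing import Optional


def open_rule_base() -> sqlite3.Connection:
    """Create an in-memory rule base with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE rules (block_id TEXT PRIMARY KEY, rule TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO rules (block_id, rule) VALUES (?, ?)",
        [("title", "show"), ("picture", "zoomin")],
    )
    return conn


def lookup_rule(conn: sqlite3.Connection, block_id: str) -> Optional[str]:
    """Matching query: return the rule for block_id, or None if there is no match."""
    row = conn.execute(
        "SELECT rule FROM rules WHERE block_id = ?", (block_id,)
    ).fetchone()
    return row[0] if row else None
```

Returning None on a miss leaves the caller free to fall back to another way of determining the rule.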
In another example, the block identification information is "picture". Processing rule acquisition device 13 sends a processing rule acquisition request containing this block identification information to a third-party device; for example, the request can be encapsulated as a request message, such as an HTTP request, and sent to the third-party device over a corresponding protocol such as HTTP or HTTPS. The third-party device, listening in real time, receives and parses the request, then runs a matching query in its processing rule base using the extracted block identification information and obtains the corresponding content block processing rule "zoomin", meaning that the pictures in the content block identified by the block identification information are to be reduced by a predetermined ratio.
Those skilled in the art will understand that the above ways of obtaining processing rules are only examples; other existing or future ways of obtaining them, if applicable to the invention, also fall within its scope of protection and are incorporated herein by reference.
Preferably, processing rule acquisition device 13 runs the matching query against the processing rule base according to both the block identification information and identification information of the website to which the original web page belongs, to obtain a content block processing rule customised for the pages of that website. Here, the identification information of the website includes but is not limited to the website domain name, the website IP address, and the website name.
Specifically, processing rule acquisition device 13 determines the identification information of the website to which the page belongs, such as its domain name or IP address, for example from the URL of the original web page obtained by original web page acquisition device 11. It then runs a matching query against the processing rule base according to the block identification information obtained by identification information extraction device 12 and the website identification information. If the query returns a processing rule predetermined for the pages of that website, that rule is used as the content block processing rule for the page.
In one example, the block identification information is "embedded object" and the URL of the original web page is "www.abc.com/Sport/101.htm". From the URL, processing rule acquisition device 13 extracts the domain name of the website hosting the page, "www.abc.com". A matching query on the block identification information alone returns the processing rule "delete", that is, delete the content block it identifies. A matching query on the block identification information together with the website domain name, however, returns the rule predetermined for that website for "embedded object" blocks, "show", that is, display the content block. Processing rule acquisition device 13 therefore ignores the "delete" rule and uses the rule predetermined for the website as the content block processing rule.
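The precedence of a site-specific rule over the generic one can be sketched as a two-level lookup. The dictionaries and function names are hypothetical; the domain, identifier and rule names mirror the example above.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlsplit

# Generic rules, keyed by block identifier only.
GENERIC_RULES: Dict[str, str] = {"embedded object": "delete"}
# Site-specific rules, keyed by (website domain, block identifier).
SITE_RULES: Dict[Tuple[str, str], str] = {
    ("www.abc.com", "embedded object"): "show",
}


def site_of(url: str) -> str:
    """Extract the host name; URLs given without a scheme are also accepted."""
    if "//" not in url:
        # urlsplit only recognises a host when it follows a scheme or "//".
        url = "//" + url
    return urlsplit(url).hostname or ""


def resolve_rule(url: str, block_id: str) -> Optional[str]:
    """Prefer the rule predetermined for the site, else fall back to the generic rule."""
    site = site_of(url)
    return SITE_RULES.get((site, block_id), GENERIC_RULES.get(block_id))
```

For the page in the example, the site-specific entry wins and the generic "delete" rule is ignored; pages on any other site still get "delete".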
Those skilled in the art will understand that the above ways of obtaining processing rules are only examples; other existing or future ways of obtaining them, if applicable to the invention, also fall within its scope of protection and are incorporated herein by reference.
Then, target web page acquisition device 14 processes the content block identified by the block identification information according to the content block processing rule obtained by processing rule acquisition device 13, to obtain the target web page.
Here, the corresponding processing of a content block includes but is not limited to formatting the content in the block, and displaying, deleting, folding or repositioning the block.
In one example, identification information extraction device 12 parses the HTML file of a web page and obtains two pieces of block identification information, "text" and "picture". Processing rule acquisition device 13 finds that the content block processing rule corresponding to "text" is to fold the content block it identifies, and that the rule corresponding to "picture" is to reduce the pictures in the content block it identifies by a predetermined ratio. Target web page acquisition device 14 then locates in the HTML file the content block identified by each piece of identification information. Following the corresponding rules, it collapses and hides the content of the block identified by "text" and sets a predetermined trigger so that the body text can later be expanded and displayed, and it reduces the pictures in the block identified by "picture" by the predetermined ratio and displays them. The processed page is then taken as the target web page.
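A minimal sketch of applying such rules is given below. It assumes the content blocks have already been cut out of the page as (identifier, HTML) pairs. The use of the HTML details element as the fold trigger, the 0.5 default ratio, the rule name "fold", and the treatment of unknown rules as "show" are all illustrative assumptions rather than anything specified in the text.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Block:
    block_id: str  # block identification information, e.g. "text"
    html: str      # markup of the content block


def fold(html: str) -> str:
    """Hide content by default; the reader expands it by activating the summary."""
    return f"<details><summary>Show</summary>{html}</details>"


def shrink_images(html: str, ratio: float) -> str:
    """Scale explicit width/height attributes by the given ratio."""
    def scale(match: "re.Match[str]") -> str:
        return f'{match.group(1)}="{int(int(match.group(2)) * ratio)}"'
    return re.sub(r'\b(width|height)="(\d+)"', scale, html)


def apply_rules(blocks: List[Block], rules: Dict[str, str], ratio: float = 0.5) -> str:
    """Assemble the target page from the processed blocks."""
    parts: List[str] = []
    for block in blocks:
        rule = rules.get(block.block_id, "show")  # assumption: default is to display
        if rule == "delete":
            continue
        if rule == "fold":
            parts.append(fold(block.html))
        elif rule == "zoomin":
            parts.append(shrink_images(block.html, ratio))
        else:
            parts.append(block.html)
    return "".join(parts)
```

Only pictures with explicit numeric width and height attributes are scaled here; images sized by stylesheet would need a different approach.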
Those skilled in the art will understand that the above ways of obtaining the target web page are only examples; other existing or future ways of obtaining it, if applicable to the invention, also fall within its scope of protection and are incorporated herein by reference.
Preferably, original web page acquisition device 11, identification information extraction device 12, processing rule acquisition device 13 and target web page acquisition device 14 work continuously. Specifically, original web page acquisition device 11 continuously obtains original web pages to be processed; identification information extraction device 12 continuously extracts block identification information from the markup language file of each original web page, where the block identification information identifies each content block in the markup language file; processing rule acquisition device 13 continuously runs matching queries against the processing rule base according to the block identification information, to obtain the corresponding content block processing rules; and target web page acquisition device 14 continuously processes the content blocks identified by the block identification information according to those rules, to obtain target web pages. Those skilled in the art will understand that "continuously" means that each device keeps performing its task (obtaining original web pages, extracting block identification information, obtaining processing rules, obtaining target web pages) until a predetermined stop condition is met, for example when original web page acquisition device 11 has obtained no original web page to be processed for a long time.
Preferably (with reference to Fig. 1), when no content block processing rule is obtained from the processing rule base, processing rule acquisition device 13 can determine the content block processing rule from content-related information of the content block identified by the block identification information.
Here, the content-related information of a content block includes but is not limited to:
1) the position of the content of the content block within the original web page;
2) the number of text characters contained in the content of the content block;
3) the tag information contained in the content block.
Those skilled in the art will understand that the above content-related information is only an example; other existing or future content-related information, if applicable to the invention, also falls within its scope of protection and is incorporated herein by reference.
Here, the ways of determining the content block processing rule include but are not limited to the following:
1) processing rule acquisition device 13 determines the processing rule from the position of the content block in the original web page. For example, if the content block identified by the block identification information is located at the centre of the original web page, which indicates that the block is of high importance in the page, the content block processing rule can be determined to be displaying the block;
2) processing rule acquisition device 13 determines the processing rule from the number of text characters in the content block. For example, if the number of characters in the content block identified by the block identification information exceeds a predetermined threshold, the processing rule can be determined to be folding the text content of the block;
3) processing rule acquisition device 13 determines the processing rule from the tag objects contained in the content block. For example, if the content block identified by the block identification information in the markup language file of the original web page contains an <object> tag, and that <object> tag contains an object whose use on mobile devices is restricted in advance, such as ActiveX, the processing rule is determined to be deleting the content block.
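The three fallback heuristics can be combined as in the sketch below. The text does not say how to resolve conflicts between them or what threshold to use, so the precedence order (restricted object first, then length, then position), the 500-character threshold, and the rule name "fold" are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

CHAR_THRESHOLD = 500  # hypothetical predetermined character-count threshold


@dataclass
class BlockInfo:
    is_central: bool             # block sits at the centre of the original page
    char_count: int              # number of text characters in the block
    has_restricted_object: bool  # block embeds an object restricted on mobile devices


def infer_rule(info: BlockInfo, threshold: int = CHAR_THRESHOLD) -> Optional[str]:
    """Determine a rule from content-related information when the rule base has none."""
    if info.has_restricted_object:
        return "delete"  # heuristic 3: restricted embedded object
    if info.char_count > threshold:
        return "fold"    # heuristic 2: long text is collapsed
    if info.is_central:
        return "show"    # heuristic 1: central blocks are important
    return None          # no heuristic applies
```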
In one example, the following code fragment is present in the HTML file of the original web page:
The block identification information present in it is "embedded object". Processing rule acquisition device 13 cannot find a matching content block processing rule in the processing rule base for this block identification information, so it parses the <object> tag, finds that the tag has a clsid attribute, and concludes that the block contains an ActiveX embedded object. It therefore determines that the processing rule corresponding to the block identification information is to delete the content block it identifies.
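A hedged sketch of this check follows. The text names the attribute as clsid; in HTML as commonly written, ActiveX controls are embedded with a classid attribute whose value begins with "clsid:", so the sketch accepts either form. The class and function names are hypothetical, and the original code fragment is not reproduced here.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class ActiveXDetector(HTMLParser):
    """Flag <object> tags that look like ActiveX embeds."""

    def __init__(self) -> None:
        super().__init__()
        self.found = False

    def handle_starttag(
        self, tag: str, attrs: List[Tuple[str, Optional[str]]]
    ) -> None:
        if tag != "object":
            return
        for name, value in attrs:
            is_classid = name == "classid" and (value or "").lower().startswith("clsid:")
            if name == "clsid" or is_classid:
                self.found = True


def contains_activex(markup: str) -> bool:
    detector = ActiveXDetector()
    detector.feed(markup)
    detector.close()
    return detector.found
```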
Those skilled in the art will understand that the above ways of determining processing rules are only examples; other existing or future ways of determining them, if applicable to the invention, also fall within its scope of protection and are incorporated herein by reference.
Fig. 2 shows a schematic diagram of equipment for processing web page content based on content block identifiers according to a preferred embodiment of the invention. Here, processing equipment 1 also includes updating device 15'. Updating device 15' establishes or updates the processing rule base according to the newly determined content block processing rule.
Here, devices 11', 12', 13' and 14' shown in Fig. 2 are the same as devices 11, 12, 13 and 14 described above with reference to Fig. 1; for brevity, their description is incorporated here by reference and not repeated.
Specifically, when processing rule acquisition device 13' obtains no corresponding content block processing rule from the processing rule base for a piece of identification information, it determines a new content block processing rule for that identification information. Updating device 15' then writes the identification information and its newly determined processing rule into the processing rule base, thereby updating it. If the processing rule base is found not to exist yet, it is initialised first and the information is then written into it.
In one example, when the new processing rule that processing rule acquisition device 13' obtains for the identifier name "embedded object" is deletion, updating device 15' inserts a data record containing that identifier name and its corresponding processing rule into the processing rule base.
Those skilled in the art will be understood that above-mentioned foundation or the mode of renewal processing rule base are only for example, and other are existing Or the mode of the foundation that is likely to occur from now on or renewal processing rule base be such as applicable to the present invention, should also be included in the present invention Within protection domain, and it is incorporated herein by reference.
(Referring to Fig. 1) In another preferred embodiment, the processing equipment 1 further includes a providing device (not shown). Here, the original web page acquisition device 11 obtains the original web page according to a page access request input by the user through a mobile terminal; the providing device provides the target web page to the user.
This preferred embodiment is described in detail below with reference to Fig. 1. The identification information extraction device 12 extracts block identification information from the markup language file of the original web page, where the block identification information identifies each content block in the markup language file. The processing rule acquisition device 13 then performs a matching query in the processing rule base according to the block identification information to obtain the corresponding content block processing rule. The target web page acquisition device 14 then processes the content block identified by the block identification information according to that content block processing rule, to obtain the target web page. The detailed process is the same as that performed by the identification information extraction device 12, the processing rule acquisition device 13 and the target web page acquisition device 14 in the embodiment described above with reference to Fig. 1; for brevity, it is incorporated herein by reference and not repeated.
In one example, when the user types into the address bar of the browser software on the mobile terminal, the mobile terminal captures in real time the web page URL entered by the user and records it as the page access request corresponding to that input operation, the page access request containing the URL. The mobile terminal then sends the page access request to the processing equipment 1 by an agreed communication method. The original web page acquisition device 11 receives the page access request in real time, extracts the page URL from it, and sends a request for the web page to the network server hosting the page to which the URL points. It then receives the web page returned by the network server in response to the request and takes that web page as the original web page to be processed.
The providing device provides the target web page obtained by the target web page acquisition device 14 to the user through the mobile terminal, using any means by which the mobile terminal can present human-readable information, such as screen display or loudspeaker playback. Taking screen display as an example, the providing device delivers the target web page obtained by the target web page acquisition device 14 to the mobile terminal using a page technology such as JSP, ASP or PHP, in a certain format and order, for example as a link or as a displayed page, for the user to browse.
Those skilled in the art will understand that the above ways of obtaining the original web page and/or providing the target web page are merely examples; other existing or future ways of obtaining the original web page and/or providing the target web page, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Preferably (referring to Fig. 1), the processing equipment 1 further includes a parameter acquisition device (not shown) and a preferred rule acquisition device (not shown). The parameter acquisition device obtains display parameter information of the mobile terminal; the preferred rule acquisition device optimizes the content block processing rule according to the display parameter information to obtain a preferred content block processing rule; the target web page acquisition device 14 processes the content block according to the preferred content block processing rule to obtain the target web page.
Specifically, the parameter acquisition device obtains the display parameter information of the mobile terminal by calling, in an agreed manner, an application programming interface (API) provided by the mobile terminal that is to display the target web page. Here, the display parameter information includes but is not limited to:
1) the picture formats supported by the mobile terminal, such as JPEG, PNG and GIF;
2) the screen resolution of the mobile terminal, such as its physical pixel dimensions and color depth;
3) whether the mobile terminal supports plug-ins, such as the Flash plug-in.
The preferred rule acquisition device then optimizes, according to the display parameter information of the mobile terminal obtained by the parameter acquisition device, the content block processing rule that the processing rule acquisition device 13 obtained for each piece of identification information, to obtain the preferred content block processing rule. The target web page acquisition device 14 then processes the content block according to the preferred content block processing rule to obtain the target web page.
In one example, the block identification information that the identification information extraction device 12 obtains from the markup language file is "Flash", the content block it identifies contains a Flash animation, and the processing rule that the processing rule acquisition device 13 obtains from the processing rule base is to delete the Flash animation identified by that identification information. However, the display parameter information obtained by the parameter acquisition device shows that the mobile terminal supports running the Flash plug-in. The preferred rule acquisition device therefore optimizes the original processing rule corresponding to the identification information into retaining the Flash animation in the content block, and this becomes the preferred content block processing rule. When the target web page acquisition device 14 processes the content block, it retains the Flash animation and so obtains a target web page that contains it.
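The optimization step in this example can be sketched as a small function that overrides a rule when the terminal's display parameters allow it. The function name optimize_rule, the dictionary key supports_flash and the rule strings "delete" and "show" are assumptions made for illustration; the source only states that a delete rule is turned into retaining the Flash animation when the terminal supports the plug-in.

```python
def optimize_rule(block_id, rule, display_params):
    """Return the preferred rule for a block given terminal display parameters.

    display_params is a hypothetical dict such as {"supports_flash": True}.
    """
    # If the generic rule would delete Flash content but the terminal can
    # run the Flash plug-in, keep the content instead.
    if block_id == "Flash" and rule == "delete" and display_params.get("supports_flash"):
        return "show"
    # Otherwise the original rule is already the preferred rule.
    return rule
```

Keeping the optimization separate from the rule base lookup means the stored rules stay generic, while terminal-specific adjustments are applied per request.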
Those skilled in the art will understand that the above ways of obtaining the display parameter information, obtaining the preferred content block processing rule and/or obtaining the target web page are merely examples; other existing or future ways of doing so, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Fig. 3 shows a flow chart of a method for processing web page content based on content block identification according to one aspect of the present invention.
Here, the processing equipment 1 may be a network device, including but not limited to a computer, a network host, a single network server, a set of multiple network servers, or a cloud formed of multiple servers. The cloud is composed of a large number of computers or network servers based on cloud computing, where cloud computing is a form of distributed computing in which a group of loosely coupled computers forms one virtual supercomputer. The processing equipment 1 may also be a mobile terminal, meaning a computer device that can be used while moving, including but not limited to a mobile phone, a notebook computer, a POS terminal or a vehicle-mounted computer, whose screen is usually much smaller than the display of a desktop computer.
The process by which the processing equipment 1 processes web page content is described in detail below with reference to Fig. 3:
Specifically, in step S1, the processing equipment 1 obtains the original web page to be processed.
Here, the ways of obtaining the original web page to be processed include but are not limited to the following:
1) according to a page access request from a mobile terminal, obtaining the corresponding original web page from the website server to which the uniform resource locator (URL) in the page access request points;
In one example, the user first interacts with the browser software or client software of the mobile terminal through an interaction device of the mobile terminal, including but not limited to a keyboard, mouse, remote control, touch pad or handwriting device. Taking the keyboard as an example, when the user types into the address bar of the browser software on the mobile terminal, the mobile terminal captures in real time the key sequence entered by the user, such as a uniform resource locator (URL), and records it as the page access request corresponding to that input operation, the page access request containing the URL. The mobile terminal then sends the page access request to the processing equipment 1 by an agreed communication method. Then, in step S1, the processing equipment 1 receives the page access request in real time, extracts the page URL from it, and sends a request for the web page to the network server hosting the page to which the URL points; for example, the request may be encapsulated as a request message, such as an HTTP request message, and sent to the network server over a corresponding communication protocol such as HTTP or HTTPS. The processing equipment 1 then receives the web page returned by the network server in response to the request and takes that web page as the original web page to be processed.
2) obtaining the original web page to be processed from a third-party device.
In another example, the processing equipment 1 is a network device. In step S1, the processing equipment 1 uses an application programming interface (API) provided by the third-party device to send to it, either when triggered by a predetermined condition or event or at regular intervals, a request message for an original web page to be processed, and receives the original web page that the third-party device returns in response to the request message. Alternatively, the third-party device actively pushes the original web page to be processed to the processing equipment 1, and in step S1 the processing equipment 1 receives it.
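For way 1) above, the step of turning a page access request into an HTTP request for the original web page might look like the following sketch. The shape of the access request (a dict with a "url" key), the function name build_origin_request and the User-Agent string are assumptions for illustration; the source only says that the request contains the URL and that an HTTP or HTTPS request message is sent to the network server.

```python
from urllib.parse import urlsplit
from urllib.request import Request


def build_origin_request(access_request):
    """Build the HTTP request that fetches the original web page.

    access_request is a hypothetical dict such as {"url": "www.abc.com/Sport/101.htm"}.
    """
    url = access_request["url"]
    # Address-bar input often omits the scheme; assume plain HTTP in that case.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("unsupported page URL: " + url)
    return Request(url, headers={"User-Agent": "block-processing-sketch"}, method="GET")
```

The returned object could then be passed to urllib.request.urlopen with a timeout to retrieve the page; the sketch stops before any network access so that it can be checked offline.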
Those skilled in the art will understand that the above ways of obtaining the original web page to be processed are merely examples; other existing or future ways of obtaining the original web page to be processed, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Then, in step S2, the processing equipment 1 extracts block identification information from the markup language file of the original web page obtained in step S1, for example by string matching, where the block identification information identifies each content block in the markup language file.
Here, the markup language file includes but is not limited to:
1) an HTML (HyperText Markup Language) file, HTML being an application of the Standard Generalized Markup Language used to describe web documents;
2) an XML (Extensible Markup Language) file, XML being a simplified form of the Standard Generalized Markup Language used for storing data;
3) an XHTML (Extensible HyperText Markup Language) file, XHTML being an XML-based markup language with strict syntax;
4) a WML (Wireless Markup Language) file, WML being a descriptive markup language for creating pages that can be displayed in a WAP browser.
Those skilled in the art will understand that the above markup language files are merely examples; other existing or future markup language files, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Here, the block identification information includes but is not limited to an identification name, an identification ID, and the like. The identification name may be chosen according to the type of content block it identifies, such as title, navigation, body text, picture, or embedded object (for example a Java applet, ActiveX or Flash).
Here, a content block means a content area in the markup language file that is made up of one or more tags and corresponds to particular content shown in the web page, for example a title content block, a body text content block, a navigation content block, a picture content block, or an embedded object (for example a Java applet, ActiveX or Flash) block.
Here, the ways of storing the block identification information in the markup language file include but are not limited to:
1) a comment in the markup language file; for example, the identification information can be stored in JSON format in an HTML comment, such as <!--tc block_begin:{type:"context"}-->, where JSON is a lightweight data-interchange format that generally represents data as name/value pairs, with the name and the value separated by ":";
2) a custom tag in the markup language file; for example, in an HTML file the custom tag may be <tc></tc>, and the identification information can be stored in that custom tag;
3) a tag attribute in the markup language file; for example, in an XHTML file the identification information can be stored in an attribute of the content block's tag, such as <div markName="title">, where the value of the markName attribute is the identification information identifying the content block corresponding to that div tag.
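As an illustration of storage way 1), the following sketch pulls the type value out of comments written in the form shown above. The regular expression is tailored to that single example (the key type is unquoted there, so a strict JSON parser is not used); the names BLOCK_COMMENT and block_types_from_comments are hypothetical.

```python
import re

# Matches comments shaped like <!--tc block_begin:{type:"context"}-->
# and captures the quoted type value.
BLOCK_COMMENT = re.compile(
    r'<!--\s*tc\s+block_begin:\s*\{\s*type\s*:\s*"([^"]*)"\s*\}\s*-->'
)


def block_types_from_comments(markup):
    """Return the block type names found in identification comments, in order."""
    return BLOCK_COMMENT.findall(markup)
```

A production implementation would presumably also need a matching end marker to delimit each content block; the source does not show one, so none is assumed here.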
Those skilled in the art will understand that the above storage ways are merely examples; other existing or future storage ways, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
In one example, the markup language file of the original web page that the processing equipment 1 handles in step S2 is an XHTML file, such as:
Here, this XHTML file is predefined to store content block identification information in a tag attribute named markName. Accordingly, in step S2, the processing equipment 1 parses the XHTML file and performs string matching on the keyword "markName" to obtain the markName attribute of the div tag and its value "title", which is the identification name of the content block corresponding to the div tag, and the markName attribute of the img tag and its value "picture", which is the identification name of the content block corresponding to the img tag.
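The attribute-based extraction in step S2 can be sketched with Python's standard html.parser module. The original XHTML fragment is not reproduced in this text, so the markup used in the usage note below is a hypothetical stand-in that merely has a div tag and an img tag carrying markName attributes; the class name MarkNameCollector and the function extract_block_ids are likewise illustrative.

```python
from html.parser import HTMLParser


class MarkNameCollector(HTMLParser):
    """Collects (tag, identification name) pairs from markName attributes."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # HTMLParser lowercases attribute names, so markName arrives
            # as "markname".
            if name == "markname":
                self.found.append((tag, value))


def extract_block_ids(markup):
    """Return the block identification names found in the markup, in order."""
    collector = MarkNameCollector()
    collector.feed(markup)
    collector.close()
    return collector.found
```

Called on '<div markName="title"><h1>News</h1></div><img markName="picture" src="a.png"/>', extract_block_ids returns the pairs ("div", "title") and ("img", "picture"). A parser is used here instead of raw string matching so that attribute quoting and self-closing tags are handled uniformly.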
Those skilled in the art will understand that the above ways of extracting the block identification information are merely examples; other existing or future ways of extracting the block identification information, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Then, in step S3, the processing equipment 1 performs a matching query in the processing rule base according to the block identification information obtained in step S2, to obtain the content block processing rule corresponding to that block identification information.
Specifically, in step S3, the processing equipment 1 performs the matching query according to the block identification information in a processing rule base held locally or on a third-party device, to obtain the content block processing rule corresponding to that block identification information.
Here, the processing rule includes but is not limited to:
1) formatting the content in the content block, where the formatting includes but is not limited to:
i) changing text attributes in the content block, such as the font, size, color or background color of the content;
ii) shrinking pictures contained in the content block by a predetermined ratio, and so on;
2) displaying the content block;
3) deleting the content block;
4) folding the content block, where folding means that the content of the content block is collapsed and hidden by default but can be expanded for display by a specific trigger;
5) adjusting the display position of the content block.
Those skilled in the art will understand that the above processing rules are merely examples; other existing or future processing rules, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Here, the processing rule base contains each piece of block identification information and its corresponding processing rule, and may be implemented as, without limitation, a relational database, a key-value storage system, a file system, and the like.
In one example, the block identification information is "title". In step S3, the processing equipment 1 performs a matching query in the local processing rule base according to the block identification information, through an application programming interface (API) provided by the processing equipment 1, and obtains "show" as the content block processing rule corresponding to the "title" block identification information, that is, the content block identified by the block identification information is to be displayed.
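Since the rule base may be a relational database, the matching query can be sketched with an in-memory SQLite table. The table name rules, its column names and the helper functions are assumptions for illustration; the two sample rows reuse the rule names "show" and "zoomin" that appear in the examples of this description.

```python
import sqlite3


def open_rule_base():
    """Create a small in-memory rule base with two sample records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE rules (block_id TEXT PRIMARY KEY, rule TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO rules (block_id, rule) VALUES (?, ?)",
        [("title", "show"), ("picture", "zoomin")],
    )
    return conn


def lookup_rule(conn, block_id):
    """Return the rule stored for block_id, or None when no record matches."""
    row = conn.execute(
        "SELECT rule FROM rules WHERE block_id = ?", (block_id,)
    ).fetchone()
    return row[0] if row else None
```

A None result corresponds to the case, discussed later in the description, in which no rule is found and one has to be determined from the content of the block itself.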
In another example, the block identification information is "picture". In step S3, the processing equipment 1 sends a processing rule acquisition request containing the block identification information to a third-party device; for example, the request may be encapsulated as a request message, such as an HTTP request message, and sent to the third-party device over a corresponding communication protocol such as HTTP or HTTPS. The third-party device, listening in real time, receives and parses the request message, performs a matching query in its own processing rule base according to the extracted block identification information, and obtains "zoomin" as the content block processing rule corresponding to the block identification information, that is, the pictures in the content block identified by the block identification information are to be shrunk by a predetermined ratio.
Those skilled in the art will understand that the above ways of obtaining the processing rule are merely examples; other existing or future ways of obtaining the processing rule, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Preferably, in step S3, the processing equipment 1 performs the matching query in the processing rule base according to both the block identification information and the identification information of the website to which the original web page belongs, to obtain a content block processing rule customized for the web pages of that website. Here, the identification information of the website to which the original web page belongs includes but is not limited to the website domain name, the website IP address, the website name, and the like.
Specifically, in step S3, the processing equipment 1 determines the identification information of the website to which the web page belongs, such as the website domain name or website IP address, for example from the URL of the original web page obtained in step S1. The processing equipment 1 then performs a matching query in the processing rule base according to the block identification information obtained in step S2 and the identification information of the website to which the original web page belongs. If the query matches a processing rule predetermined for the web pages of that website, that predetermined rule is used as the content block processing rule for the web page.
In one example, the block identification information is "embedded object" and the URL of the original web page is "www.abc.com/Sport/101.htm". In step S3, the processing equipment 1 extracts from the URL the domain name of the website hosting the web page, "www.abc.com". A matching query in the processing rule base according to the block identification information alone yields the processing rule "delete", that is, delete the content block identified by the identification information. However, a matching query according to both the block identification information and the domain name of the website yields a rule predetermined by that website for the "embedded object" block identification information, namely "show", that is, display the content block identified by the identification information. The processing equipment 1 therefore ignores the delete rule corresponding to the block identification information and uses the rule predetermined for the website as the content block processing rule.
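The precedence of a site-specific rule over the generic rule in this example can be sketched as a two-level lookup. The two dictionaries stand in for the processing rule base and are hypothetical, as are the function names; the domain, block name and rule strings come from the example above.

```python
from urllib.parse import urlsplit

# Hypothetical rule base contents taken from the example in the text.
GENERIC_RULES = {"embedded object": "delete"}
SITE_RULES = {("www.abc.com", "embedded object"): "show"}


def site_domain(page_url):
    """Extract the website domain name from a page URL."""
    # urlsplit needs a scheme to recognise the host part.
    if "://" not in page_url:
        page_url = "http://" + page_url
    return urlsplit(page_url).hostname


def resolve_rule(block_id, page_url):
    """Prefer a rule predetermined for the site; fall back to the generic rule."""
    domain = site_domain(page_url)
    generic = GENERIC_RULES.get(block_id)
    return SITE_RULES.get((domain, block_id), generic)
```

With these tables, resolve_rule("embedded object", "www.abc.com/Sport/101.htm") yields "show", while the same block on any other site yields "delete".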
Those skilled in the art will understand that the above ways of obtaining the processing rule are merely examples; other existing or future ways of obtaining the processing rule, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Then, in step S4, the processing equipment 1 processes the content block identified by the block identification information according to the content block processing rule obtained in step S3, to obtain the target web page.
Here, the corresponding processing of the content block includes but is not limited to formatting the content in the content block, displaying it, deleting it, folding it, and reordering it.
In one example, in step S2 the processing equipment 1 parses the HTML file of a web page and obtains two pieces of block identification information, "body text" and "picture". In step S3, the processing equipment 1 obtains, for the "body text" block identification information, the content block processing rule of folding the content block it identifies, and, for the "picture" block identification information, the rule of shrinking the pictures in the content block it identifies by a predetermined ratio. Then, in step S4, the processing equipment 1 locates in the HTML file the content block identified by each piece of identification information. According to the corresponding processing rules, it collapses and hides the content of the content block identified by "body text" and sets a predetermined trigger so that the body text can later be expanded for display, and it shrinks the pictures in the content block identified by "picture" by the predetermined ratio and displays them. The processed web page is then taken as the target web page.
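Step S4 in this example can be sketched on a simplified block model, in which each content block is a dict rather than a fragment of real markup. The dict keys (id, text, width, height, collapsed), the rule name "fold", the default shrink ratio of 0.5 and the choice to display blocks that have no rule are all assumptions made for illustration.

```python
def apply_rule(block, rule, shrink_ratio=0.5):
    """Apply one processing rule to one block; return None if it is deleted."""
    if rule == "delete":
        return None
    processed = dict(block)  # leave the original block untouched
    if rule == "fold":
        # Collapsed by default; the renderer is expected to add an expand trigger.
        processed["collapsed"] = True
    elif rule == "zoomin":
        # Shrink the picture by the predetermined ratio.
        processed["width"] = int(block["width"] * shrink_ratio)
        processed["height"] = int(block["height"] * shrink_ratio)
    # "show" and unknown rules leave the block unchanged.
    return processed


def build_target_page(blocks, rules):
    """Process every block with its rule and keep the ones that survive."""
    target = []
    for block in blocks:
        processed = apply_rule(block, rules.get(block["id"], "show"))
        if processed is not None:
            target.append(processed)
    return target
```

Rendering the processed blocks back into markup is left out; in a real page the collapsed flag would have to be translated into whatever folding mechanism the target browser supports.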
Those skilled in the art will understand that the above ways of obtaining the target web page are merely examples; other existing or future ways of obtaining the target web page, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Preferably, the processing equipment 1 works continuously across steps S1, S2, S3 and S4. Specifically, in step S1, the processing equipment 1 continuously obtains original web pages to be processed; in step S2, it continuously extracts block identification information from the markup language file of each original web page, where the block identification information identifies each content block in the markup language file; in step S3, it continuously performs matching queries in the processing rule base according to the block identification information to obtain the corresponding content block processing rules; and in step S4, it continuously processes the content blocks identified by the block identification information according to those content block processing rules, to obtain target web pages. Here, those skilled in the art will understand that "continuously" means that the processing equipment 1 keeps performing, in the respective steps, the obtaining of original web pages, the extraction of block identification information, the obtaining of processing rules and the obtaining of target web pages until a predetermined stop condition is met, for example until the processing equipment 1 has obtained no original web page to be processed for a long period of time.
Preferably (referring to Fig. 3), when no content block processing rule is obtained from the processing rule base, in step S3 the processing equipment 1 may determine the content block processing rule according to content-related information of the content block identified by the block identification information.
Here, the content-related information of the content block includes but is not limited to:
1) the position of the content of the content block in the original web page;
2) the number of text characters contained in the content of the content block;
3) the tag information contained in the content block.
Those skilled in the art will understand that the above content-related information is merely an example; other existing or future content-related information, if applicable to the present invention, should also fall within the scope of protection of the present invention and is incorporated herein by reference.
Here, the ways of determining the content block processing rule include but are not limited to the following:
1) in step S3, the processing equipment 1 determines the processing rule according to the position of the content block in the original web page; for example, if the content block identified by the block identification information is located at the center of the original web page, which indicates that the content block has a high level of importance in the original web page, the content block processing rule may be determined to be displaying the content block;
2) in step S3, the processing equipment 1 determines the processing rule according to the number of text characters in the content block; for example, if the number of characters in the content block identified by the block identification information exceeds a predetermined character-count threshold, the processing rule may be determined to be folding the text content of the content block;
3) in step S3, the processing equipment 1 determines the processing rule according to the tag objects contained in the content block; for example, if the content block identified by the block identification information in the markup language file of the original web page contains an <object> tag, and that <object> tag contains an object whose use on mobile devices is restricted in advance, such as ActiveX, the processing rule is determined to be deleting the content block.
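The three ways above can be combined into one fallback function, sketched below. The source lists them as alternatives and does not say how they interact, so the order of the checks is an assumption, as are the block dict keys (position, text, has_restricted_object), the threshold of 500 characters and the rule name "fold".

```python
def infer_rule(block, char_threshold=500):
    """Guess a processing rule from content-related information of a block.

    Returns None when none of the three heuristics applies.
    """
    # Way 3: a restricted embedded object (for example ActiveX) is removed.
    if block.get("has_restricted_object"):
        return "delete"
    # Way 2: very long text is folded so that it does not dominate the page.
    if len(block.get("text", "")) > char_threshold:
        return "fold"
    # Way 1: a block in the centre of the page is treated as important.
    if block.get("position") == "center":
        return "show"
    return None
```

Checking the restricted-object case first reflects the assumption that content a mobile device cannot use should be removed regardless of where it sits on the page.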
In one example, the following code fragment is present in the HTML file of the original web page:
The block identification information present in it is "embedded object". In step S3, the processing equipment 1 fails to obtain a corresponding content block processing rule by a matching query in the processing rule base according to this block identification information. It then parses the <object> tag, finds that the tag has a clsid attribute, and determines that the content block contains an ActiveX embedded object. It therefore determines that the processing rule corresponding to this block identification information is to delete the content block that it identifies.
Those skilled in the art will understand that the above ways of determining the processing rule are merely examples; other existing or future ways of determining the processing rule, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Fig. 4 shows a flow chart of a method for processing web page content based on content block identification according to a preferred embodiment of the present invention. Here, the process further includes step S5'. In step S5', the processing equipment 1 establishes or updates the processing rule base according to the newly determined content block processing rule.
Here, the functions of the processing equipment 1 in steps S1', S2', S3' and S4' shown in Fig. 4 are the same as those of the processing equipment 1 in steps S1, S2, S3 and S4 described above with reference to Fig. 1; for brevity, they are incorporated herein by reference and not repeated.
Specifically, when in step S3' the processing equipment 1 fails to obtain a corresponding content block processing rule from the processing rule base according to the identification information, it newly determines a content block processing rule for that identification information. Then, in step S5', the processing equipment 1 writes the identification information and its newly determined processing rule into the processing rule base, thereby updating it. If the processing rule base is found not to have been established yet, it is first initialized, and the above information is then written into it.
In one example, when the new processing rule that the processing equipment 1 obtains in step S3' for the identification name "embedded object" is deletion, then in step S5' the processing equipment 1 inserts into the processing rule base a data record containing that identification name and its corresponding processing rule.
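Step S5' can be sketched with a rule base kept as a JSON file, one of the file-system style stores the description allows. The file layout (a single JSON object mapping identification names to rules) and the function name record_rule are assumptions for illustration.

```python
import json
import os


def record_rule(path, block_id, rule):
    """Write a newly determined rule into the rule base file at path.

    The rule base is initialised as empty if the file does not exist yet.
    Returns the full rule base after the update.
    """
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            rules = json.load(handle)
    else:
        rules = {}  # rule base not established yet: initialise it first
    rules[block_id] = rule
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(rules, handle, ensure_ascii=False, indent=2)
    return rules
```

Once the record for "embedded object" has been written, later requests for pages containing that block can be answered by the ordinary matching query without repeating the content-based determination.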
Those skilled in the art will understand that the above ways of establishing or updating the processing rule base are merely examples; other existing or future ways of establishing or updating the processing rule base, where applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
(With reference to Fig. 3) In another preferred embodiment, the process further includes step S6 (not shown). In step S1, processing equipment 1 obtains the original web page according to a page access request input by a user through a mobile terminal; in step S6, processing equipment 1 provides the target web page to the user.
This preferred embodiment is described in detail below with reference to Fig. 3. In step S2, processing equipment 1 extracts block identification information from the markup language file of the original web page, where the block identification information identifies each content block in the markup language file. Then, in step S3, processing equipment 1 performs a matching query in the processing rule base according to the block identification information, to obtain the content block processing rule corresponding to that block identification information. Then, in step S4, processing equipment 1 processes the content block identified by the block identification information according to the content block processing rule, to obtain the target web page. The detailed process is the same as that performed by processing equipment 1 in steps S2, S3 and S4 in the embodiment described above with reference to Fig. 3; for brevity, it is incorporated herein by reference and is not repeated.
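Steps S2 to S4 can be sketched as follows, under stated assumptions. The sketch assumes that block identification information is stored as paired comments of the form `<!--block:NAME-->` and `<!--/block:NAME-->`; this marker convention, the use of a plain dictionary as the rule base, the `<details>` wrapper used for folding, and the fallback to `"display"` when no rule matches are all hypothetical simplifications, not details given in the patent.

```python
import re

# Hypothetical marker convention: <!--block:NAME--> ... <!--/block:NAME-->
BLOCK_RE = re.compile(
    r"<!--\s*block:(?P<id>[\w-]+)\s*-->(?P<body>.*?)<!--\s*/block:(?P=id)\s*-->",
    re.DOTALL,
)


def extract_block_ids(markup: str) -> list:
    """Step S2: collect the block identifiers found in the markup."""
    return [m.group("id") for m in BLOCK_RE.finditer(markup)]


def process_page(markup: str, rule_base: dict, default_rule: str = "display") -> str:
    """Steps S3 and S4: look up a rule per block and apply it."""

    def apply_rule(match: re.Match) -> str:
        # Step S3: matching query in the rule base (simplified fallback).
        rule = rule_base.get(match.group("id"), default_rule)
        body = match.group("body")
        # Step S4: process the identified content block.
        if rule == "delete":
            return ""
        if rule == "fold":
            return "<details>" + body + "</details>"
        return body  # "display": keep the block content unchanged

    return BLOCK_RE.sub(apply_rule, markup)
```

A production implementation would more likely walk a parsed document tree than rely on a regular expression, since nested blocks of the same name are not handled here.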
In one example, first, when a user types into the address bar of the browser software on a mobile terminal, the mobile terminal obtains in real time the web page URL entered by the user and records it as a page access request corresponding to the user's input operation, where the page access request contains the URL; the mobile terminal then sends the page access request to processing equipment 1 using an agreed communication method. Then, in step S1, processing equipment 1 receives the page access request in real time, extracts the page URL from it, and sends a request for the web page to the web server hosting the page that the URL points to; it then receives the web page returned by the web server in response to that request, and takes that web page as the original web page to be processed.
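A rough sketch of step S1 as described in this example. The patent only says that the page access request is sent by an agreed communication method, so the JSON request body with a `url` field is an assumption, as are the function names; the fetch function is passed in as a parameter so that the network call can be replaced.

```python
import json
from typing import Callable
from urllib.request import urlopen


def fetch_with_urllib(url: str) -> str:
    """Request the page from the web server that the URL points to."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def obtain_original_page(
    request_body: str,
    fetch: Callable[[str], str] = fetch_with_urllib,
) -> str:
    """Step S1: extract the URL from the page access request and fetch the page."""
    request = json.loads(request_body)   # assumed request format: {"url": "..."}
    url = request["url"]
    if not url.startswith(("http://", "https://")):
        raise ValueError("unsupported URL scheme: " + url)
    # The returned markup is the original web page to be processed.
    return fetch(url)
```

Passing `fetch` explicitly keeps the sketch testable without network access; a real deployment would also need error handling for timeouts and non-HTML responses.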
In step S6, processing equipment 1 provides the target web page obtained in step S4 to the user through the mobile terminal, using any means by which the mobile terminal can present human-readable information, such as screen display or loudspeaker playback. Taking screen display as an example, in step S6 processing equipment 1 provides the target web page obtained in step S4 to the mobile terminal in a certain format via a page technology such as JSP, ASP or PHP, for example as a link or as a displayed page, for the user to browse.
Those skilled in the art will understand that the above ways of obtaining the original web page and/or providing the target web page are merely examples; other existing or future ways of obtaining the original web page and/or providing the target web page, where applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Preferably (with reference to Fig. 3), the process further includes step S7 (not shown) and step S8 (not shown). In step S7, processing equipment 1 obtains display parameter information of the mobile terminal; in step S8, processing equipment 1 optimizes the content block processing rule according to the display parameter information, to obtain a preferred content block processing rule; in step S4, processing equipment 1 processes the content block according to the preferred content block processing rule, to obtain the target web page.
Specifically, in step S7, processing equipment 1 obtains the display parameter information of the mobile terminal by calling, in an agreed manner, the application programming interface (API) provided by the mobile terminal on which the target web page is to be displayed. Here, the display parameter information includes but is not limited to:
1) the picture formats supported by the mobile terminal, such as JPEG, PNG and GIF;
2) the screen resolution of the mobile terminal, such as the physical pixel dimensions and the color depth;
3) whether the mobile terminal supports plug-ins, such as the Flash plug-in.
Then, in step S8, according to the display parameter information of the mobile terminal obtained in step S7, processing equipment 1 optimizes the content block processing rule that it obtained in step S3 for each piece of identification information, to obtain a preferred content block processing rule. Then, in step S4, processing equipment 1 processes the content block according to the preferred content block processing rule, to obtain the target web page.
In one example, the block identification information that processing equipment 1 obtains from the markup language file in step S2 is "Flash", and the content block it identifies contains a Flash animation. In step S3, the processing rule that processing equipment 1 obtains from the processing rule base is to delete the Flash animation identified by that identification information. However, the display parameter information that processing equipment 1 obtains in step S7 shows that the mobile terminal supports running the Flash plug-in. In step S8, processing equipment 1 therefore optimizes the original processing rule corresponding to that identification information into one that retains the Flash animation in the content block, and uses it as the preferred content block processing rule. Then, in step S4, processing equipment 1 retains the Flash animation when processing the content block, to obtain a target web page that contains the Flash animation.
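The Flash example can be sketched as a small rule-optimization function. The `DisplayParams` fields mirror the three kinds of display parameter information listed above; the class name, the field names, the rule name `"keep"` and the `optimize_rule` function are hypothetical and only cover this one optimization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisplayParams:
    """Display parameter information of the mobile terminal (assumed shape)."""
    image_formats: frozenset   # e.g. frozenset({"JPEG", "PNG", "GIF"})
    resolution: tuple          # (width, height) in pixels
    supports_flash: bool       # whether the Flash plug-in can run


def optimize_rule(block_id: str, rule: str, params: DisplayParams) -> str:
    """Step S8: turn a stored rule into a preferred rule for this terminal."""
    # A stored "delete" rule for Flash content is relaxed to "keep"
    # when the terminal reports that it can run the Flash plug-in.
    if block_id == "Flash" and rule == "delete" and params.supports_flash:
        return "keep"
    # All other rules are passed through unchanged in this sketch.
    return rule
```

For a terminal described by `DisplayParams(frozenset({"JPEG", "PNG"}), (480, 800), True)`, the stored rule `"delete"` for the block `"Flash"` becomes `"keep"`, which matches the outcome of the example above.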
Those skilled in the art will understand that the above ways of obtaining display parameter information, obtaining a preferred content block processing rule and/or obtaining the target web page are merely examples; other existing or future ways of doing so, where applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
It is obvious to a person skilled in the art that the invention is not limited to the details of the above exemplary embodiments, and that the invention can be realized in other specific forms without departing from its spirit or essential characteristics. Therefore, the embodiments should be regarded in all respects as exemplary and non-restrictive; the scope of the invention is defined by the appended claims rather than by the above description, and all changes falling within the meaning and range of equivalency of the claims are therefore intended to be embraced by the invention. No reference sign in a claim should be construed as limiting the claim concerned. Furthermore, the word "comprising" obviously does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in a device claim may also be implemented by a single unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.

Claims (14)

1. A computer-implemented method for processing web page content based on content block identification, wherein the method comprises the following steps:
a. obtaining an original web page to be processed;
b. extracting block identification information from the markup language file of the original web page, wherein the block identification information is used to identify each content block in the markup language file;
c. performing a matching query in a processing rule base according to the block identification information, to obtain a content block processing rule corresponding to the block identification information;
d. processing the content block identified by the block identification information according to the content block processing rule, to obtain a target web page;
wherein step c comprises:
- performing a matching query in the processing rule base according to the block identification information and identification information of the website to which the original web page belongs, to obtain the content block processing rule;
- when the content block processing rule is not obtained from the processing rule base, determining the content block processing rule according to content-related information of the content block identified by the block identification information;
wherein the content-related information comprises at least any one of the following:
- position information of the content of the content block in the original web page;
- the number of text characters contained in the content of the content block;
- tag information contained in the content block.
2. The method according to claim 1, wherein the content block processing rule comprises at least any one of the following:
- formatting the content in the content block;
- displaying the content block;
- deleting the content block;
- folding the content block.
3. The method according to claim 1, wherein the method further comprises:
- establishing or updating the processing rule base according to the newly determined content block processing rule.
4. The method according to any one of claims 1 to 3, wherein step a comprises:
- obtaining the original web page according to a page access request input by a user through a mobile terminal;
wherein the method further comprises:
- providing the target web page to the user.
5. The method according to claim 4, wherein the method further comprises:
- obtaining display parameter information of the mobile terminal;
- optimizing the content block processing rule according to the display parameter information, to obtain a preferred content block processing rule;
wherein step d comprises:
- processing the content block according to the preferred content block processing rule, to obtain the target web page.
6. The method according to claim 1, wherein the manner in which the block identification information is stored in the markup language file comprises at least any one of the following:
- a comment in the markup language file;
- a custom tag in the markup language file;
- a tag attribute in the markup language file.
7. The method according to claim 1, wherein the markup language file comprises at least any one of the following:
- an HTML file;
- an XML file;
- an XHTML file;
- a WML file.
8. A device for processing web page content based on content block identification, wherein the device comprises:
an original web page acquisition device, for obtaining an original web page to be processed;
an identification information extraction device, for extracting block identification information from the markup language file of the original web page, wherein the block identification information is used to identify each content block in the markup language file;
a processing rule device, for performing a matching query in a processing rule base according to the block identification information, to obtain a content block processing rule corresponding to the block identification information;
a target web page acquisition device, for processing the content block identified by the block identification information according to the content block processing rule, to obtain a target web page;
wherein the processing rule device is used for:
performing a matching query in the processing rule base according to the block identification information and identification information of the website to which the original web page belongs, to obtain the content block processing rule;
when the content block processing rule is not obtained from the processing rule base, determining the content block processing rule according to content-related information of the content block identified by the block identification information;
wherein the content-related information comprises at least any one of the following:
- position information of the content of the content block in the original web page;
- the number of text characters contained in the content of the content block;
- tag information contained in the content block.
9. The device according to claim 8, wherein the content block processing rule comprises at least any one of the following:
- formatting the content in the content block;
- displaying the content block;
- deleting the content block;
- folding the content block.
10. The device according to claim 8, wherein the device further comprises:
an updating device, for establishing or updating the processing rule base according to the newly determined content block processing rule.
11. The device according to any one of claims 8 to 10, wherein the original web page acquisition device is used for obtaining the original web page according to a page access request input by a user through a mobile terminal;
wherein the device further comprises:
a providing device, for providing the target web page to the user.
12. The device according to claim 11, wherein the device further comprises:
a parameter acquisition device, for obtaining display parameter information of the mobile terminal;
an optimization device, for optimizing the content block processing rule according to the display parameter information, to obtain a preferred content block processing rule;
wherein the target web page acquisition device is used for processing the content block according to the preferred content block processing rule, to obtain the target web page.
13. The device according to claim 8, wherein the manner in which the block identification information is stored in the markup language file comprises at least any one of the following:
- a comment in the markup language file;
- a custom tag in the markup language file;
- a tag attribute in the markup language file.
14. The device according to claim 8, wherein the markup language file comprises at least any one of the following:
- an HTML file;
- an XML file;
- an XHTML file;
- a WML file.
CN201110390828.9A 2011-11-30 2011-11-30 A kind of method and apparatus based on content block identification processing web page contents Active CN103136259B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201110390828.9A CN103136259B (en) 2011-11-30 2011-11-30 A kind of method and apparatus based on content block identification processing web page contents
PCT/CN2012/075044 WO2013078829A1 (en) 2011-11-30 2012-05-03 Method and device for processing webpage content on the basis of content block identification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110390828.9A CN103136259B (en) 2011-11-30 2011-11-30 A kind of method and apparatus based on content block identification processing web page contents

Publications (2)

Publication Number Publication Date
CN103136259A CN103136259A (en) 2013-06-05
CN103136259B true CN103136259B (en) 2018-03-23

Family

ID=48496093

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110390828.9A Active CN103136259B (en) 2011-11-30 2011-11-30 A kind of method and apparatus based on content block identification processing web page contents

Country Status (2)

Country Link
CN (1) CN103136259B (en)
WO (1) WO2013078829A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103473004A (en) * 2013-09-29 2013-12-25 小米科技有限责任公司 Method, device and terminal equipment for displaying message
CN103544320A (en) * 2013-11-05 2014-01-29 从兴技术有限公司 Webpage generation method and device
CN104834685A (en) * 2015-04-17 2015-08-12 百度国际科技(深圳)有限公司 Method and device for processing comment message block in comment-like webpage
CN106126485A (en) * 2016-06-14 2016-11-16 北京金山安全软件有限公司 Text format generation method, server and terminal
CN108595697B (en) * 2018-05-09 2021-02-02 未鲲(上海)科技服务有限公司 Webpage integration method, device and system
CN109710863A (en) * 2018-11-27 2019-05-03 平安科技(深圳)有限公司 Information conversion method, device, computer equipment and storage medium
CN111125605B (en) * 2019-12-31 2022-07-29 北京创鑫旅程网络技术有限公司 Page element acquisition method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101039357A (en) * 2006-03-17 2007-09-19 陈晓月 Method for browsing website using handset
CN101526953A (en) * 2009-01-19 2009-09-09 北京跳网无限科技发展有限公司 WWW transformation technology
CN101815093A (en) * 2010-03-11 2010-08-25 深圳市嘉讯软件有限公司 Method for adapting webpage to mobile terminal and mobile terminal page adaptation device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040054973A1 (en) * 2000-10-02 2004-03-18 Akio Yamamoto Method and apparatus for transforming contents on the web
CN102163233A (en) * 2011-04-18 2011-08-24 北京神州数码思特奇信息技术股份有限公司 Method and system for converting webpage markup language format

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
基于网页格局的内容分块算法;路松峰等;《计算机工程与科学》;20070930;第29卷(第9期);16-18 *
浅议WEB页面到WAP页面转换过程;王永飞;《铜陵财经专科学校学报》;20021231;42-45 *
面向移动终端的Web内容转换工具的设计与实现;胥晓欢;《中国优秀硕士学位论文全文数据库 信息科技辑》;20091115;第11-19页,第23页,第27-33页 *

Also Published As

Publication number Publication date
CN103136259A (en) 2013-06-05
WO2013078829A1 (en) 2013-06-06

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant