CN103136259B - Method and apparatus for processing web page content based on content block identification - Google Patents
- Publication number
- CN103136259B (CN201110390828.9A)
- Authority
- CN
- China
- Prior art keywords
- content blocks
- identification information
- content
- block identification
- rule
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/957—Browsing optimisation, e.g. caching or content distillation
- G06F16/9577—Optimising the visualization of content, e.g. distillation of HTML documents
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Between Computers (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
An object of the invention is to provide a method and apparatus for processing web page content based on content block identification. First, an original web page to be processed is obtained. Next, block identification information is extracted from the markup language file of the original web page, where the block identification information is used to identify each content block in the markup language file. A matching query is then performed in a processing rule base according to the block identification information, to obtain the content block processing rule corresponding to that block identification information. Finally, the content block identified by the block identification information is processed according to the content block processing rule, to obtain a target web page. Compared with the prior art, the invention processes page content quickly, which improves the efficiency and quality of page conversion and thereby the user experience. In addition, because the markup language file of a page only needs to contain block identification information and not the corresponding processing rules, the burden of web page maintenance on the website is reduced.
Description
Technical field
The present invention relates to the field of Internet technology, and in particular to a technique for processing web page content based on content block identification.
Background art
When processing web page content, for example when converting a web page displayed on a desktop computer into a web page suitable for display on a mobile terminal, the prior art generally extracts the subject content from the parsed Internet web page and generates a new web page from the extracted subject content, thereby converting an original web page suited to desktop display into a target web page suited to mobile device display. However, web page conversion with this method is inefficient and its processing time cost is high, which slows the response to page access requests from mobile terminal users and degrades the user experience.
Therefore, how to process page content effectively and quickly has become one of the problems urgently to be solved.
Summary of the invention
An object of the invention is to provide a method and apparatus for processing web page content based on content block identification.
According to one aspect of the invention, a computer-implemented method for processing web page content based on content block identification is provided, the method comprising the following steps:
a. obtaining an original web page to be processed;
b. extracting block identification information from the markup language file of the original web page, wherein the block identification information is used to identify each content block in the markup language file;
c. performing a matching query in a processing rule base according to the block identification information, to obtain the content block processing rule corresponding to the block identification information;
d. processing the content block identified by the block identification information according to the content block processing rule, to obtain a target web page.
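Steps a through d can be sketched as a short Python pipeline. This is a minimal illustration under stated assumptions, not the patented implementation: the use of a `markName` attribute to carry the block identification information, the rule names `show` and `delete`, the in-memory dictionary `RULE_BASE`, and all function names are hypothetical, and step a (fetching the page) is replaced by a literal string.

```python
import re
from typing import List, Optional

# Hypothetical processing rule base: block identification -> rule name.
RULE_BASE = {"title": "show", "ad": "delete"}

# Matches one element that carries a markName attribute, for example
# <div markName="title">...</div>. Assumes marked blocks are not nested.
BLOCK_RE = re.compile(
    r'<(?P<tag>\w+)[^>]*\bmarkName="(?P<mark>[^"]+)"[^>]*>.*?</(?P=tag)>',
    re.DOTALL,
)

def extract_block_ids(markup: str) -> List[str]:
    """Step b: extract block identification information by string matching."""
    return [m.group("mark") for m in BLOCK_RE.finditer(markup)]

def lookup_rule(block_id: str) -> Optional[str]:
    """Step c: matching query in the processing rule base."""
    return RULE_BASE.get(block_id)

def process_page(markup: str) -> str:
    """Step d: apply each block's rule to obtain the target page."""
    def apply(match):
        rule = lookup_rule(match.group("mark"))
        if rule == "delete":
            return ""              # drop the whole block
        return match.group(0)      # "show" or no rule found: keep unchanged
    return BLOCK_RE.sub(apply, markup)

# Step a is stubbed: the original page is given as a string.
original = '<div markName="title">News</div><div markName="ad">Buy now</div>'
target = process_page(original)
```

Keeping the rules in a separate table rather than in the page itself mirrors the point made in the text: the page only carries identifiers, so rules can change without touching the markup.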
According to another aspect of the invention, an apparatus for processing web page content based on content block identification is also provided, the apparatus comprising:
an original web page acquisition device, for obtaining an original web page to be processed;
an identification information extraction device, for extracting block identification information from the markup language file of the original web page, wherein the block identification information is used to identify each content block in the markup language file;
a processing rule acquisition device, for performing a matching query in a processing rule base according to the block identification information, to obtain the content block processing rule corresponding to the block identification information;
a target web page acquisition device, for processing the content block identified by the block identification information according to the content block processing rule, to obtain a target web page.
Compared with the prior art, the present invention takes the block identification information corresponding to each content block in the markup language file (such as an HTML or XHTML file) of the acquired original web page, performs a matching query in a processing rule base to obtain the content block processing rule corresponding to that block identification information, and then applies the corresponding processing, such as folding, deleting, or formatting, to each content block, so that page content is processed quickly. This improves the efficiency and quality of page conversion and thereby the user experience. In addition, because the markup language file of a page only needs to contain block identification information and not the corresponding processing rules, the burden of web page maintenance on the website is reduced.
Brief description of the drawings
Other features, objects, and advantages of the invention will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 shows a schematic diagram of an apparatus for processing web page content based on content block identification according to one aspect of the invention;
Fig. 2 shows a schematic diagram of an apparatus for processing web page content based on content block identification according to a preferred embodiment of the invention;
Fig. 3 shows a flowchart of a method for processing web page content based on content block identification according to another aspect of the invention;
Fig. 4 shows a flowchart of a method for processing web page content based on content block identification according to a preferred embodiment of the invention.
The same or similar reference numerals in the drawings denote the same or similar parts.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings.
Fig. 1 shows a schematic diagram of an apparatus for processing web page content based on content block identification according to one aspect of the invention. The processing equipment 1 includes an original web page acquisition device 11, an identification information extraction device 12, a processing rule acquisition device 13, and a target web page acquisition device 14.
Here, the processing equipment 1 may be a network device, including but not limited to a computer, a network host, a single network server, a set of multiple network servers, or a cloud formed by multiple servers. The cloud here consists of a large number of computers or network servers based on cloud computing, where cloud computing is a form of distributed computing in which a group of loosely coupled computers forms one super virtual computer. The processing equipment 1 may also be a mobile terminal, meaning a computer device that can be used while on the move, including but not limited to a mobile phone, a notebook computer, a POS terminal, or an in-vehicle computer, whose screen is generally much smaller than a desktop computer display.
The process by which the processing equipment 1 processes web page content is described in detail below with reference to Fig. 1.
Specifically, the original web page acquisition device 11 obtains the original web page to be processed.
Here, the ways of obtaining the original web page to be processed include, but are not limited to, the following:
1) according to a page access request from a mobile terminal, obtaining the corresponding original web page from the website server pointed to by the uniform resource locator (URL) in the page access request;
In one example, the user first interacts with the browser software or client software of the mobile terminal through an interactive device of the mobile terminal, including but not limited to a keyboard, mouse, remote control, touch pad, or handwriting device. Taking the keyboard as an example, when the user types in the address bar of the mobile terminal's browser software, the mobile terminal captures the keystroke sequence in real time, for example a URL entered by the user, and records it as a page access request corresponding to the user's input operation, where the page access request contains the URL. The mobile terminal then sends the page access request to the processing equipment 1 over an agreed communication mode. Next, the original web page acquisition device 11 receives the page access request in real time, extracts the page URL from it, and sends a request for the web page to the web server hosting the page that the URL points to. For example, the request can be encapsulated as a request message, such as an HTTP request message, and sent to the web server over a corresponding communication protocol such as HTTP or HTTPS. Finally, the original web page acquisition device 11 receives the web page that the web server returns in response to the request and takes it as the original web page to be processed.
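The request handling in this example can be sketched with Python's standard `urllib.request`. This is an illustration under assumptions: the page access request is modelled as a dictionary with a hypothetical `url` field, the `User-Agent` value is made up, and the `open_url` parameter exists only so that the network call can be swapped out.

```python
import urllib.request

def build_page_request(access_request: dict) -> urllib.request.Request:
    """Extract the URL from the access request and wrap it in an HTTP GET."""
    url = access_request["url"]            # hypothetical field name
    if "://" not in url:
        url = "http://" + url              # assume plain HTTP when no scheme is given
    return urllib.request.Request(url, headers={"User-Agent": "block-proxy-sketch"})

def fetch_original_page(access_request: dict, open_url=urllib.request.urlopen) -> str:
    """Send the request to the web server and return the page as text."""
    request = build_page_request(access_request)
    with open_url(request) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `fetch_original_page({"url": "www.abc.com/Sport/101.htm"})` would perform a real network request; passing a stub as `open_url` avoids that during testing.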
2) obtaining the original web page to be processed from a third-party device.
In another example, the processing equipment 1 is a network device. Using an application programming interface (API) provided by the third-party device, the original web page acquisition device 11 sends the third-party device a request message for the original web page to be processed, either periodically or when triggered by a predetermined condition or event, and receives the original web page that the third-party device returns in response to the request message. Alternatively, the third-party device actively pushes the original web page to be processed to the processing equipment 1, and the original web page acquisition device 11 receives it.
Those skilled in the art will understand that the above ways of obtaining the original web page to be processed are merely examples; other existing or future ways of obtaining the original web page to be processed, if applicable to the present invention, should also fall within its scope of protection and are incorporated herein by reference.
Next, the identification information extraction device 12 extracts block identification information, for example by string matching, from the markup language file of the original web page obtained by the original web page acquisition device 11, where the block identification information is used to identify each content block in the markup language file.
Here, the markup language file includes, but is not limited to:
1) an HTML (HyperText Markup Language) file, HTML being a standard general-purpose markup language for describing web documents;
2) an XML (Extensible Markup Language) file, XML being a standard general-purpose markup language used for storing data;
3) an XHTML (Extensible HyperText Markup Language) file, XHTML being an XML-based markup language with strict syntax;
4) a WML (Wireless Markup Language) file, WML being a descriptive markup language for creating pages that can be displayed in a WAP browser.
Those skilled in the art will understand that the above markup language files are merely examples; other existing or future markup language files, if applicable to the present invention, should also fall within its scope of protection and are incorporated herein by reference.
Here, the block identification information includes, but is not limited to, an identification name, an identification ID, and the like. An identification name can be chosen according to the type of content block it identifies, such as title, navigation, body text, picture, or embedded object (for example a Java applet, ActiveX, or Flash).
Here, a content block means a content area in the markup language file made up of one or more tags, which corresponds to specific content displayed in the web page, such as a title content block, body content block, navigation content block, picture content block, or embedded object (for example a Java applet, ActiveX, or Flash) block.
Here, the ways in which block identification information is stored in the markup language file include, but are not limited to:
1) a comment in the markup language file; for example, identification information in JSON format can be stored in a comment in an HTML file, such as <!--tc block_begin:{type:"context"}-->, where JSON is a lightweight data interchange format that typically represents data as name/value pairs, with the name and value separated by ":";
2) a custom tag in the markup language file; for example, in an HTML file the custom tag can be <tc></tc>, and the identification information can be stored in that custom tag;
3) a tag attribute in the markup language file; for example, in an XHTML file the identification information can be stored in an attribute of the content block's tag, such as <div markName="title">, where the value of the markName attribute is the identification information for the content block corresponding to that div tag.
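The three storage modes above can be read with a single pass of Python's standard `html.parser`. The sketch below makes several assumptions: the comment body follows the `tc block_begin:{type:"..."}` shape of the example, the `<tc>` custom tag carries its identification as text content (the source does not say where inside the tag it lives), and the class name `BlockIdCollector` is hypothetical.

```python
import re
from html.parser import HTMLParser

# Matches a comment body such as: tc block_begin:{type:"context"}
COMMENT_RE = re.compile(r'tc\s+block_begin:\{\s*type\s*:\s*"([^"]*)"\s*\}')

class BlockIdCollector(HTMLParser):
    """Collects (storage mode, identification) pairs in document order."""

    def __init__(self):
        super().__init__()
        self.block_ids = []
        self._in_tc = False

    def handle_comment(self, data):
        match = COMMENT_RE.search(data)
        if match:
            self.block_ids.append(("comment", match.group(1)))

    def handle_starttag(self, tag, attrs):
        if tag == "tc":
            self._in_tc = True
        # HTMLParser lowercases attribute names, so markName arrives as "markname".
        for name, value in attrs:
            if name == "markname":
                self.block_ids.append(("attribute", value))

    def handle_endtag(self, tag):
        if tag == "tc":
            self._in_tc = False

    def handle_data(self, data):
        # Assumption: the custom tag stores its identification as text.
        if self._in_tc and data.strip():
            self.block_ids.append(("custom tag", data.strip()))

page = ('<!--tc block_begin:{type:"context"}-->'
        '<tc>navigation</tc>'
        '<div markName="title">Hi</div>')
collector = BlockIdCollector()
collector.feed(page)
collector.close()
```

Note that the comment payload in the example uses an unquoted key (`type`), which strict JSON parsers reject; that is why the sketch uses a regular expression rather than `json.loads`.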
Those skilled in the art will understand that the above storage ways are merely examples; other existing or future storage ways, if applicable to the present invention, should also fall within its scope of protection and are incorporated herein by reference.
In one example, the markup language file of the original web page obtained by the identification information extraction device 12 is an XHTML file, such as:
In this XHTML file, a tag attribute named markName is predefined to store content block identification information. Accordingly, the identification information extraction device 12 parses the XHTML file and performs string matching on the keyword "markName" to obtain the markName attribute in the div tag and its value "title", which is the identification name of the content block corresponding to the div tag, as well as the markName attribute in the img tag and its value "picture", which is the identification name of the content block corresponding to the img tag.
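The keyword-based string matching described here can be illustrated with a regular expression. The XHTML listing referred to in the text is not preserved, so the fragment below is a hypothetical stand-in built only from what the paragraph states (a div marked "title" and an img marked "picture"); the function name is likewise an assumption.

```python
import re

# Hypothetical XHTML fragment; the listing in the original text was not preserved.
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
<body>
  <div markName="title">Today's headlines</div>
  <img markName="picture" src="photo.jpg" alt="photo" />
</body>
</html>"""

# Keyword match on markName inside a start tag: captures the tag name
# and the attribute value.
MARK_RE = re.compile(r'<(\w+)\b[^>]*?\bmarkName\s*=\s*"([^"]*)"')

def extract_mark_names(markup: str) -> list:
    """Return (tag, identification name) pairs in document order."""
    return MARK_RE.findall(markup)

marks = extract_mark_names(XHTML)
```

String matching is cheap but brittle: it would miss single-quoted attribute values, for instance, which a full XML parser would handle.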
Those skilled in the art will understand that the above ways of extracting block identification information are merely examples; other existing or future ways of extracting block identification information, if applicable to the present invention, should also fall within its scope of protection and are incorporated herein by reference.
Next, the processing rule acquisition device 13 performs a matching query in a processing rule base according to the block identification information obtained by the identification information extraction device 12, to obtain the content block processing rule corresponding to the block identification information.
Specifically, the processing rule acquisition device 13 performs the matching query, according to the block identification information, in a processing rule base held locally or on a third-party device, to obtain the content block processing rule corresponding to the block identification information.
Here, the processing rules include, but are not limited to:
1) formatting the content in the content block, where the formatting includes, but is not limited to:
i. changing text attributes in the content block, such as the font, size, color, and background color of the content;
ii. shrinking the pictures contained in the content block by a predetermined ratio;
2) displaying the content block;
3) deleting the content block;
4) folding the content block, where folding means that the content of the content block is collapsed and hidden by default but can be expanded and displayed through a specific trigger;
5) adjusting the display position of the content block.
Those skilled in the art will understand that the above processing rules are merely examples; other existing or future processing rules, if applicable to the present invention, should also fall within its scope of protection and are incorporated herein by reference.
Here, the processing rule base contains each piece of block identification information and its corresponding processing rule, and can take the form of, but is not limited to, a relational database, a key-value storage system, or a file system.
In one example, the block identification information is "title". According to this block identification information, the processing rule acquisition device 13 performs a matching query in the local processing rule base through an application programming interface (API) provided by the processing equipment 1, and obtains "show" as the content block processing rule corresponding to the "title" block identification information; that is, the content block identified by the block identification information is to be displayed.
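A local rule base of the relational kind mentioned above can be sketched with Python's built-in `sqlite3`. The table layout, the function names, and the sample rows are assumptions for illustration; only the rule names `show`, `zoomin`, and `delete` come from the examples in the text.

```python
import sqlite3

def open_rule_base() -> sqlite3.Connection:
    """Create an in-memory rule base populated with a few sample rules."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE rules (block_id TEXT PRIMARY KEY, rule TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO rules (block_id, rule) VALUES (?, ?)",
        [("title", "show"), ("picture", "zoomin"), ("embedded object", "delete")],
    )
    return conn

def query_rule(conn: sqlite3.Connection, block_id: str):
    """Matching query: return the rule name, or None when nothing matches."""
    row = conn.execute(
        "SELECT rule FROM rules WHERE block_id = ?", (block_id,)
    ).fetchone()
    return row[0] if row else None

conn = open_rule_base()
title_rule = query_rule(conn, "title")
```

Returning `None` for an unknown identifier leaves room for a fallback step that derives a rule from the block's content instead.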
In another example, the block identification information is "picture". According to this block identification information, the processing rule acquisition device 13 sends a processing rule acquisition request containing the block identification information to a third-party device. For example, the request can be encapsulated as a request message, such as an HTTP request message, and sent to the third-party device over a corresponding communication protocol such as HTTP or HTTPS. The third-party device, which listens in real time, receives and parses the request message, then performs a matching query in its own processing rule base according to the extracted block identification information and obtains "zoomin" as the content block processing rule corresponding to the block identification information; that is, the pictures in the content block identified by the block identification information are to be shrunk by a predetermined ratio.
Those skilled in the art will understand that the above ways of obtaining processing rules are merely examples; other existing or future ways of obtaining processing rules, if applicable to the present invention, should also fall within its scope of protection and are incorporated herein by reference.
Preferably, the processing rule acquisition device 13 performs the matching query in the processing rule base according to both the block identification information and the identification information of the website to which the original web page belongs, to obtain a content block processing rule customized for that website's pages. Here, the identification information of the website includes, but is not limited to, the website domain name, the website IP address, and the website name.
Specifically, the processing rule acquisition device 13 determines the identification information of the website to which the page belongs, such as its domain name or IP address, for example from the URL of the original web page obtained by the original web page acquisition device 11. The processing rule acquisition device 13 then performs a matching query in the processing rule base according to the block identification information obtained by the identification information extraction device 12 and the website identification information; if the query returns a processing rule predetermined for that website's pages, that predetermined rule is used as the content block processing rule for the page.
In one example, the block identification information is "embedded object" and the URL of the original web page is "www.abc.com/Sport/101.htm". From the URL, the processing rule acquisition device 13 extracts "www.abc.com" as the domain name of the website hosting the page. A matching query in the processing rule base on the block identification information alone yields the processing rule "delete", that is, delete the content block identified by the identification information. A matching query on the block identification information together with the website domain name, however, yields "show" as the rule predetermined for "embedded object" block identification information on that website, that is, display the content block identified by the identification information. The processing rule acquisition device 13 therefore ignores the delete rule corresponding to the block identification information and uses the rule predetermined for the website as the content block processing rule.
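The precedence described in this example, where a site-specific rule overrides the general one, can be sketched as a two-level lookup. The dictionaries `GLOBAL_RULES` and `SITE_RULES` and both function names are hypothetical; the domain, block identification, and rule names are taken from the example.

```python
from urllib.parse import urlsplit

# General rules keyed by block identification only.
GLOBAL_RULES = {"embedded object": "delete"}
# Site-specific rules keyed by (website domain name, block identification).
SITE_RULES = {("www.abc.com", "embedded object"): "show"}

def site_of(url: str) -> str:
    """Return the host part of a URL, tolerating a missing scheme."""
    if "://" not in url:
        url = "//" + url      # lets urlsplit treat the first segment as the host
    return urlsplit(url).hostname or ""

def lookup_rule(url: str, block_id: str):
    """Prefer the rule customized for the site; fall back to the general rule."""
    site_rule = SITE_RULES.get((site_of(url), block_id))
    if site_rule is not None:
        return site_rule
    return GLOBAL_RULES.get(block_id)

rule = lookup_rule("www.abc.com/Sport/101.htm", "embedded object")
```

The `//` prefix matters because the example URL has no scheme; without it, `urlsplit` would treat the whole string as a path and report no host.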
Those skilled in the art will understand that the above ways of obtaining processing rules are merely examples; other existing or future ways of obtaining processing rules, if applicable to the present invention, should also fall within its scope of protection and are incorporated herein by reference.
Next, the target web page acquisition device 14 processes the content block identified by the block identification information according to the content block processing rule obtained by the processing rule acquisition device 13, to obtain the target web page.
Here, the corresponding processing of a content block includes, but is not limited to, formatting the content in the content block, displaying it, deleting it, folding it, and reordering it.
In one example, the identification information extraction device 12 parses the HTML file of a web page and obtains two pieces of block identification information, "text" and "picture". The processing rule acquisition device 13 obtains, for the "text" block identification information, a content block processing rule that folds the identified content block, and, for the "picture" block identification information, a rule that shrinks the pictures in the identified content block by a predetermined reduction ratio. The target web page acquisition device 14 then locates the content block identified by each piece of identification information in the HTML file and applies the corresponding rule: it folds and hides the content of the content block identified by "text" and sets a predetermined trigger so that the body text can later be expanded and displayed, and it shrinks the pictures in the content block identified by "picture" by the predetermined ratio and displays them. The processed page is then taken as the target web page.
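The fold and shrink processing in this example can be sketched as a small dispatch table. Everything structural here is an assumption made for illustration: the `Block` record, the use of an HTML `<details>` element as the expand trigger, the 0.5 reduction ratio, and the rule name `fold`; only `zoomin`, `show`, and `delete` appear as rule names in the text.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class Block:
    block_id: str
    html: str
    width: Optional[int] = None     # only meaningful for picture blocks
    height: Optional[int] = None

def fold(block: Block) -> Block:
    """Hide the content by default; <details> expands it on user action."""
    folded = f"<details><summary>{block.block_id}</summary>{block.html}</details>"
    return replace(block, html=folded)

def shrink(block: Block, ratio: float = 0.5) -> Block:
    """Scale the picture dimensions by a predetermined ratio."""
    return replace(block, width=int(block.width * ratio),
                   height=int(block.height * ratio))

HANDLERS = {"fold": fold, "zoomin": shrink, "show": lambda block: block}

def build_target(blocks, rules):
    """Apply each block's rule; blocks ruled 'delete' are dropped."""
    target = []
    for block in blocks:
        rule = rules.get(block.block_id, "show")
        if rule == "delete":
            continue
        target.append(HANDLERS[rule](block))
    return target

blocks = [Block("text", "<p>long article</p>"),
          Block("picture", "<img src='a.jpg'>", width=640, height=480)]
result = build_target(blocks, {"text": "fold", "picture": "zoomin"})
```

A dispatch table keeps each rule's behavior in one function, so adding a new rule type means adding one entry rather than editing the loop.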
Those skilled in the art will understand that the above ways of obtaining the target web page are merely examples; other existing or future ways of obtaining the target web page, if applicable to the present invention, should also fall within its scope of protection and are incorporated herein by reference.
Preferably, the original web page acquisition device 11, the identification information extraction device 12, the processing rule acquisition device 13, and the target web page acquisition device 14 work continuously with one another. Specifically, the original web page acquisition device 11 continuously obtains original web pages to be processed; the identification information extraction device 12 continuously extracts block identification information from the markup language file of each original web page, where the block identification information is used to identify each content block in the markup language file; the processing rule acquisition device 13 continuously performs matching queries in the processing rule base according to the block identification information, to obtain the corresponding content block processing rules; and the target web page acquisition device 14 continuously processes the content blocks identified by the block identification information according to the content block processing rules, to obtain target web pages. Here, those skilled in the art will understand that "continuously" means that the devices keep obtaining original web pages, extracting block identification information, obtaining processing rules, and obtaining target web pages until a predetermined stop condition is met, for example until the original web page acquisition device 11 has obtained no original web page to be processed for a long time.
Preferably (referring to Fig. 1), when no content block processing rule is obtained from the processing rule base, the processing rule acquisition device 13 can determine the content block processing rule according to content-related information of the content block identified by the block identification information.
Here, the content-related information of a content block includes, but is not limited to:
1) position information of the content block's content in the original web page;
2) the number of text characters contained in the content block's content;
3) tag information contained in the content block.
Those skilled in the art will understand that the above content-related information is merely an example; other existing or future content-related information, if applicable to the present invention, should also fall within its scope of protection and is incorporated herein by reference.
Here, the ways of determining the content block processing rule include, but are not limited to, the following:
1) the processing rule acquisition device 13 determines the processing rule according to the position of the content block in the original web page; for example, if the content block identified by the block identification information is located at the center of the original web page, which indicates that the content block has high importance in the original web page, the content block processing rule can be determined to be displaying the content block;
2) the processing rule acquisition device 13 determines the processing rule according to the number of text characters in the content block; for example, if the number of characters in the content block identified by the block identification information exceeds a predetermined character count threshold, the processing rule can be determined to be folding the text content in the content block;
3) the processing rule acquisition device 13 determines the processing rule according to the tag objects contained in the content block; for example, if the content block identified by the block identification information in the markup language file of the original web page contains an <object> tag, and that <object> tag contains an object whose use on mobile devices is restricted in advance, such as ActiveX, the processing rule is determined to be deleting the content block.
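The three fallback heuristics above can be combined into one function. This is a sketch under assumptions: the `BlockInfo` record, the threshold of 500 characters, the set `RESTRICTED_OBJECTS`, and in particular the order in which the heuristics are tried are all hypothetical, since the text lists the heuristics without saying how they interact.

```python
from dataclasses import dataclass, field

CHAR_THRESHOLD = 500                  # hypothetical character count threshold
RESTRICTED_OBJECTS = {"activex"}      # object kinds assumed restricted on mobile devices

@dataclass
class BlockInfo:
    centered: bool = False            # block sits at the center of the page
    char_count: int = 0               # number of text characters in the block
    object_kinds: set = field(default_factory=set)   # kinds of <object> content found

def infer_rule(info: BlockInfo):
    """Derive a rule from content-related information, or None if undecided."""
    if info.object_kinds & RESTRICTED_OBJECTS:
        return "delete"               # heuristic 3: restricted embedded object
    if info.char_count > CHAR_THRESHOLD:
        return "fold"                 # heuristic 2: long text is folded
    if info.centered:
        return "show"                 # heuristic 1: central blocks are important
    return None
```

For example, `infer_rule(BlockInfo(char_count=1200))` selects folding, while a block with no distinguishing features yields `None` and is left for some other policy to decide.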
In one example, the following code fragment exists in the HTML file of the original web page:
The block identification information present in it is "embedded object". The processing rule acquisition device 13 cannot obtain a matching content block processing rule from the processing rule base for this block identification information. It parses the <object> tag, finds that the tag has a clsid attribute, and concludes that the content block contains an ActiveX embedded object. It therefore determines that the processing rule corresponding to the block identification information is to delete the content block identified by the identification information.
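Detecting an ActiveX object from the `<object>` tag can be sketched with `html.parser`. The code fragment referred to in the text is not preserved, so the fragment below is a hypothetical stand-in with a placeholder class identifier. It also assumes the common HTML convention in which ActiveX controls are referenced as `classid="clsid:..."`, which may differ in detail from the clsid attribute the text mentions.

```python
from html.parser import HTMLParser

# Hypothetical fragment; the listing in the original text was not preserved.
FRAGMENT = ('<div markName="embedded object">'
            '<object classid="clsid:00000000-0000-0000-0000-000000000000" '
            'width="300" height="200"></object></div>')

class ActiveXDetector(HTMLParser):
    """Sets `found` when an <object> tag references a CLSID."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag != "object":
            return
        classid = dict(attrs).get("classid") or ""
        if classid.lower().startswith("clsid:"):
            self.found = True

def contains_activex(fragment: str) -> bool:
    detector = ActiveXDetector()
    detector.feed(fragment)
    detector.close()
    return detector.found

# A block containing ActiveX gets the "delete" rule; otherwise no rule is inferred.
inferred = "delete" if contains_activex(FRAGMENT) else None
```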
Those skilled in the art will understand that the above ways of determining processing rules are merely examples; other existing or future ways of determining processing rules, if applicable to the present invention, should also fall within its scope of protection and are incorporated herein by reference.
Fig. 2 shows a schematic diagram of an apparatus for processing web page content based on content block identification according to a preferred embodiment of the invention. Here, the processing equipment 1 further includes an updating device 15'. The updating device 15' establishes or updates the processing rule base according to the newly determined content block processing rule.
Here, the devices 11', 12', 13', and 14' shown in Fig. 2 are identical in function to the devices 11, 12, 13, and 14 described above with reference to Fig. 1; for brevity, that description is incorporated herein by reference and is not repeated.
Specifically, when the processing rule acquisition device 13' obtains no corresponding content block processing rule from the processing rule base for a piece of identification information, it newly determines a content block processing rule for that identification information. The updating device 15' then writes the identification information and its newly determined processing rule into the processing rule base, thereby updating it. If the processing rule base is found not to have been established, it is first initialized, and the above information is then written into it.
In one example, when the new processing rule that the processing rule acquisition device 13' obtains for the identification name "embedded object" is deletion, the updating device 15' inserts a data record containing the identification name and its corresponding processing rule into the processing rule base.
Those skilled in the art will understand that the above manner of establishing or updating the processing rule base is merely an example; other existing or future manners of establishing or updating the processing rule base, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
In another preferred embodiment (with reference to Fig. 1), processing equipment 1 further includes a providing device (not shown). Here, the original web page acquisition device 11 obtains the original web page according to a page access request input by a user through a mobile terminal, and the providing device provides the target web page to the user.
This preferred embodiment is described in detail below with reference to Fig. 1. The identification information extraction device 12 extracts block identification information from the markup language file of the original web page, where the block identification information identifies each content block in the markup language file; the processing rule acquisition device 13 then performs a matching query in the processing rule base according to the block identification information, to obtain the content block processing rule corresponding to it; the target web page acquisition device 14 then processes the content block identified by the block identification information according to that rule, to obtain the target web page. These processes are the same as those performed by the identification information extraction device 12, the processing rule acquisition device 13 and the target web page acquisition device 14 in the embodiment described above with reference to Fig. 1; for brevity, they are incorporated herein by reference and are not repeated.
In one example, when the user types in the address bar of the browser software of the mobile terminal, the mobile terminal obtains in real time the web page URL entered by the user and records it as the page access request corresponding to the user's input operation, where the page access request includes the URL; the mobile terminal then sends the page access request to processing equipment 1 through an agreed communication mode. The original web page acquisition device 11 receives the page access request in real time, extracts the page URL from it, and sends a request for the web page to the network server hosting the web page that the URL points to; it then receives the web page that the network server returns in response to the request, and takes that web page as the pending original web page.
The providing device provides the target web page obtained by the target web page acquisition device 14 to the user through the mobile terminal, using any means by which the mobile terminal can present human-readable information, such as screen display or loudspeaker playback. Taking screen display as an example, the providing device delivers the target web page obtained by the target web page acquisition device 14 to the mobile terminal through a page technology such as JSP, ASP or PHP, in a certain order and form, for example as a link or as a displayed page, for the user to browse.
Those skilled in the art will understand that the above manners of obtaining the original web page and/or providing the target web page are merely examples; other existing or future manners of obtaining the original web page and/or providing the target web page, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Preferably (with reference to Fig. 1), processing equipment 1 further includes a parameter acquisition device (not shown) and a preferred rule acquisition device (not shown). The parameter acquisition device obtains the display parameter information of the mobile terminal; the preferred rule acquisition device optimizes the content block processing rule according to the display parameter information, to obtain a preferred content block processing rule; and the target web page acquisition device 14 processes the content block according to the preferred content block processing rule, to obtain the target web page.
Specifically, the parameter acquisition device obtains the display parameter information of the mobile terminal by calling, in an agreed manner, the application programming interface (API) provided by the mobile terminal on which the target web page is to be displayed. Here, the display parameter information includes but is not limited to:
1) the picture formats supported by the mobile terminal, such as JPEG, PNG and GIF;
2) the screen resolution of the mobile terminal, such as the physical size of its pixels and its color depth;
3) whether the mobile terminal supports plug-ins, such as the Flash plug-in.
Then, according to the display parameter information of the mobile terminal obtained by the parameter acquisition device, the preferred rule acquisition device optimizes the content block processing rule that the processing rule acquisition device 13 obtained for each piece of identification information, to obtain the preferred content block processing rule. The target web page acquisition device 14 then processes the content block according to the preferred content block processing rule, to obtain the target web page.
In one example, the block identification information that the identification information extraction device 12 obtains from the markup language file is "Flash", and the content block it identifies contains a Flash animation. The processing rule that the processing rule acquisition device 13 obtains from the processing rule base is to delete the Flash animation identified by the identification information, but the display parameter information obtained by the parameter acquisition device shows that the mobile terminal supports running the Flash plug-in. The preferred rule acquisition device therefore optimizes the original processing rule for that identification information into retaining the Flash animation in the content block, which becomes the preferred content block processing rule; the target web page acquisition device 14 then retains the Flash animation when processing the content block, obtaining a target web page that includes the Flash animation.
Those skilled in the art will understand that the above manners of obtaining the display parameter information, obtaining the preferred content block processing rule and/or obtaining the target web page are merely examples; other existing or future manners of doing so, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Fig. 3 shows a flow chart of a method for processing web page content based on content block identification according to one aspect of the present invention.
Here, processing equipment 1 may be a network device, including but not limited to a computer, a network host, a single network server, a set of multiple network servers, or a cloud formed by multiple servers. The cloud consists of a large number of computers or network servers based on cloud computing, where cloud computing is a form of distributed computing in which a group of loosely coupled computers forms one super virtual computer. Processing equipment 1 may also be a mobile terminal, meaning a computer device that can be used while on the move, including but not limited to a mobile phone, notebook computer, POS machine or vehicle-mounted computer, whose screen is usually much smaller than the display of a desktop computer.
The process by which processing equipment 1 processes web page content is described in detail below with reference to Fig. 3:
Specifically, in step S1, processing equipment 1 obtains a pending original web page.
Here, the manners of obtaining the pending original web page include but are not limited to the following:
1) according to a page access request from a mobile terminal, obtaining the corresponding original web page from the website server pointed to by the uniform resource locator (URL) in the page access request;
In one example, the user first interacts with the browser software or client software of the mobile terminal through an interactive device of the mobile terminal, including but not limited to a keyboard, mouse, remote control, touch pad or handwriting device. Taking the keyboard as an example, when the user types in the address bar of the browser software of the mobile terminal, the mobile terminal obtains in real time the keystroke sequence entered by the user, such as a uniform resource locator (URL), and records it as the page access request corresponding to the user's input operation, where the page access request includes the URL; the mobile terminal then sends the page access request to processing equipment 1 through an agreed communication mode. Then, in step S1, processing equipment 1 receives the page access request in real time, extracts the page URL from it, and sends a request for the web page to the network server hosting the web page that the URL points to; for example, the request may be encapsulated as a request message, such as an HTTP request message, and sent to the network server over a corresponding communication protocol such as HTTP or HTTPS. Processing equipment 1 then receives the web page that the network server returns in response to the request, and takes that web page as the pending original web page.
2) obtaining the pending original web page from a third-party device.
In another example, processing equipment 1 is a network device. In step S1, processing equipment 1 uses the application programming interface (API) provided by the third-party device to send to it, either periodically or when triggered by a predetermined condition or event, a request message for the pending original web page, and receives the pending original web page that the third-party device returns in response to the request message. Alternatively, the third-party device actively pushes the pending original web page to processing equipment 1, and in step S1 processing equipment 1 receives it.
Those skilled in the art will understand that the above manner of obtaining the pending original web page is merely an example; other existing or future manners of obtaining the pending original web page, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Then, in step S2, processing equipment 1 extracts block identification information from the markup language file of the original web page it obtained in step S1, for example by string matching, where the block identification information identifies each content block in the markup language file.
Here, the markup language file includes but is not limited to:
1) an HTML (HyperText Markup Language) file, HTML being a standard generalized markup language for describing web documents;
2) an XML (Extensible Markup Language) file, XML being a simple standard generalized markup language for storing data;
3) an XHTML (Extensible HyperText Markup Language) file, XHTML being an XML-based markup language with strict syntax;
4) a WML (Wireless Markup Language) file, WML being a descriptive markup language for creating pages that can be displayed in a WAP browser.
Those skilled in the art will understand that the above markup language files are merely examples; other existing or future markup language files, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Here, the block identification information includes but is not limited to a mark name, a mark ID, and the like. The mark name may be chosen according to the type of content block it marks, such as title, navigation, text, picture, or embedded object (for example a Java applet, ActiveX or Flash).
Here, a content block means a content area in the markup language file made up of one or more tags and corresponding to specific content displayed in the web page, for example a title content block, a body text content block, a navigation content block, a picture content block, or an embedded object (for example a Java applet, ActiveX or Flash) block.
Here, the manners of storing the block identification information in the markup language file include but are not limited to:
1) a comment in the markup language file; for example, the identification information can be stored in JSON format in a comment of an HTML file, such as <!--tc block_begin:{type:"context"}-->, where JSON is a lightweight data interchange format that typically represents data as name/value pairs, with the name and value separated by ":";
2) a custom tag in the markup language file; for example, in an HTML file the custom tag may be <tc></tc>, and the identification information can be stored in that custom tag;
3) a tag attribute in the markup language file; for example, in an XHTML file the identification information can be stored in an attribute of the content block's tag, such as <div markName="title">, where the value of the attribute markName is the identification information identifying the content block corresponding to that div tag.
Those skilled in the art will understand that the above storage manners are merely examples; other existing or future storage manners, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
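By way of non-limiting illustration, identification information stored in a comment of the form shown in storage manner 1) might be recovered by string matching as sketched below. The regular expression and the function name ids_from_comments are assumptions made for this sketch; note that the comment payload shown in the text leaves the key type unquoted, so the sketch matches it with a pattern rather than a strict JSON parser.

```python
import re

# Matches comments of the form <!--tc block_begin:{type:"title"}--> and
# captures the value of the "type" entry.
COMMENT_RE = re.compile(
    r'<!--\s*tc\s+block_begin:\s*\{\s*type\s*:\s*"([^"]*)"\s*\}\s*-->'
)


def ids_from_comments(markup: str) -> list:
    """Return the block identification values found in tc comments, in order."""
    return COMMENT_RE.findall(markup)
```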
In one example, the markup language file of the original web page that processing equipment 1 processes in step S2 is an XHTML file, such as:
Here, this XHTML file uses a predefined tag attribute named markName to store the content block identification information. Accordingly, in step S2, processing equipment 1 parses the XHTML file and performs string matching on the keyword "markName", thereby obtaining the markName attribute of the div tag and its value "title", which is the mark name of the content block corresponding to the div tag, and the markName attribute of the img tag and its value "picture", which is the mark name of the content block corresponding to the img tag.
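By way of non-limiting illustration, the string matching on "markName" described above might be sketched as follows. The XHTML fragment is hypothetical and merely consistent with the description (a div tag marked "title" and an img tag marked "picture"); the regular expression and the function name extract_block_ids are likewise assumptions for this sketch.

```python
import re

# Hypothetical XHTML fragment consistent with the example described above.
XHTML = (
    '<div markName="title"><h1>Match report</h1></div>\n'
    '<img markName="picture" src="goal.jpg" alt="goal" />'
)

# Captures the tag name and the value of its markName attribute.
ATTR_RE = re.compile(r'<(\w+)\b[^>]*?\bmarkName\s*=\s*"([^"]*)"')


def extract_block_ids(markup: str) -> list:
    """Return (tag name, mark name) pairs for every tag carrying markName."""
    return ATTR_RE.findall(markup)
```

Calling extract_block_ids(XHTML) yields one pair for the div tag and one for the img tag; tags without a markName attribute, such as the h1 tag, are skipped.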
Those skilled in the art will understand that the above manner of extracting block identification information is merely an example; other existing or future manners of extracting block identification information, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Then, in step S3, processing equipment 1 performs a matching query in the processing rule base according to the block identification information it obtained in step S2, to obtain the content block processing rule corresponding to the block identification information.
Specifically, in step S3, processing equipment 1 performs the matching query, according to the block identification information, in a processing rule base held locally or on a third-party device, to obtain the content block processing rule corresponding to the block identification information.
Here, the processing rule includes but is not limited to:
1) formatting the content in the content block, where the formatting includes but is not limited to:
i) changing the text attributes in the content block, such as the font, size, color and background color of the content;
ii) reducing the pictures contained in the content block by a predetermined ratio;
2) displaying the content block;
3) deleting the content block;
4) folding the content block, where folding means that the content of the content block is folded and hidden by default but can be expanded and displayed through a specific trigger;
5) adjusting the display position of the content block.
Those skilled in the art will understand that the above processing rules are merely examples; other existing or future processing rules, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Here, the processing rule base contains each piece of block identification information and its corresponding processing rule, and may be, without limitation, a relational database, a key-value storage system, a file system, or the like.
In one example, the block identification information is "title". In step S3, processing equipment 1 performs a matching query in the local processing rule base according to the block identification information, through an application programming interface (API) provided by processing equipment 1, and obtains the content block processing rule "show" corresponding to the "title" block identification information, that is, the content block identified by the block identification information is to be displayed.
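By way of non-limiting illustration, a local processing rule base held in a relational database might be queried as sketched below. The table name rules, its columns and the helper names are assumptions made for this sketch, and an in-memory SQLite database stands in for whatever store an implementation actually uses.

```python
import sqlite3


def open_rule_base() -> sqlite3.Connection:
    """Create a small in-memory rule base (hypothetical schema and contents)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE rules (block_id TEXT PRIMARY KEY, rule TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO rules (block_id, rule) VALUES (?, ?)",
        [("title", "show"), ("picture", "zoomin")],
    )
    return conn


def lookup_rule(conn: sqlite3.Connection, block_id: str):
    """Return the processing rule for block_id, or None when no rule matches."""
    row = conn.execute(
        "SELECT rule FROM rules WHERE block_id = ?", (block_id,)
    ).fetchone()
    return row[0] if row else None
```

Returning None on a miss leaves the caller free to fall back to another way of determining the rule.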
In another example, the block identification information is "picture". In step S3, processing equipment 1 sends a processing rule acquisition request containing the block identification information to a third-party device; for example, the request may be encapsulated as a request message, such as an HTTP request message, and sent to the third-party device over a corresponding communication protocol such as HTTP or HTTPS. The third-party device, listening in real time, receives and parses the request message and then performs a matching query in its processing rule base according to the extracted block identification information, obtaining the content block processing rule "zoomin" corresponding to the block identification information, that is, the picture in the content block identified by the block identification information is to be reduced by a predetermined ratio.
Those skilled in the art will understand that the above manner of obtaining the processing rule is merely an example; other existing or future manners of obtaining the processing rule, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Preferably, in step S3, processing equipment 1 performs the matching query in the processing rule base according to both the block identification information and the identification information of the website to which the original web page belongs, to obtain a content block processing rule customized for the web pages of that website. Here, the identification information of the website to which the original web page belongs includes but is not limited to the website domain name, the website IP address, the website name, and the like.
Specifically, in step S3, processing equipment 1 determines the identification information of the website to which the web page belongs, such as the website domain name or website IP address, for example from the URL of the pending original web page it obtained in step S1. Processing equipment 1 then performs a matching query in the processing rule base according to the block identification information it obtained in step S2 and the identification information of that website; if the query matches a processing rule predetermined for the web pages of the website, that predetermined processing rule is used as the content block processing rule of the web page.
In one example, the block identification information is "embedded object" and the URL of the original web page is "www.abc.com/Sport/101.htm". In step S3, processing equipment 1 extracts from the URL the domain name of the website hosting the web page, "www.abc.com". A matching query in the processing rule base according to the block identification information alone yields the processing rule "delete", that is, deleting the content block identified by the identification information; however, a matching query according to both the block identification information and the domain name of the website yields the processing rule "show" predetermined by that website for the "embedded object" block identification information, that is, displaying the content block identified by the identification information. Processing equipment 1 then ignores the deletion rule corresponding to the block identification information and uses the rule predetermined for the website as the content block processing rule.
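By way of non-limiting illustration, the precedence of a website-specific rule over the generic rule might be sketched as follows. The two dictionaries and the function names are assumptions made for this sketch; the rule values and the URL are those of the example above.

```python
from urllib.parse import urlsplit

# Hypothetical rule base contents: generic rules keyed by block identifier,
# site-specific rules keyed by (domain name, block identifier).
GENERIC_RULES = {"embedded object": "delete"}
SITE_RULES = {("www.abc.com", "embedded object"): "show"}


def site_domain(url: str):
    """Extract the website domain name from a page URL."""
    # urlsplit only recognises the host when a scheme or leading // is present.
    if "//" not in url:
        url = "//" + url
    return urlsplit(url).hostname


def lookup_rule(block_id: str, page_url: str):
    """Prefer the rule predetermined for the site; otherwise use the generic rule."""
    domain = site_domain(page_url)
    return SITE_RULES.get((domain, block_id), GENERIC_RULES.get(block_id))
```

For pages on any other website the generic "delete" rule still applies, since no site-specific entry matches.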
Those skilled in the art will understand that the above manner of obtaining the processing rule is merely an example; other existing or future manners of obtaining the processing rule, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Then, in step S4, processing equipment 1 processes the content block identified by the block identification information according to the content block processing rule it obtained in step S3, to obtain the target web page.
Here, the processing of the content block includes but is not limited to: formatting the content in the content block, displaying, deleting, folding, and reordering.
In one example, in step S2 processing equipment 1 parses the HTML file of a web page and obtains two pieces of block identification information, "text" and "picture". In step S3, processing equipment 1 obtains the content block processing rule corresponding to the "text" block identification information, which is to fold the content block identified by that identification information, and the content block processing rule corresponding to the "picture" block identification information, which is to reduce the picture in the identified content block by a predetermined reduction ratio. Then, in step S4, processing equipment 1 locates in the HTML file the content block identified by each piece of identification information; according to the corresponding processing rules, it folds and hides the content of the content block identified by "text" and sets a predetermined trigger so that the body text can later be expanded and displayed, and it reduces the picture in the content block identified by "picture" by the predetermined ratio and displays it. The processed web page is then taken as the target web page.
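By way of non-limiting illustration, applying the obtained rules to the content blocks might be sketched as follows. The Block model, the rule name "fold", the default of showing blocks that have no rule, and the reduction ratio of 0.5 are assumptions made for this sketch; an actual implementation would rewrite the markup itself.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Block:
    """Hypothetical in-memory model of one content block."""
    block_id: str
    content: str
    folded: bool = False
    width: Optional[int] = None    # picture width in pixels, if any
    height: Optional[int] = None   # picture height in pixels, if any


def apply_rule(block: Block, rule: str, scale: float = 0.5) -> Optional[Block]:
    """Apply one processing rule; return None when the block is deleted."""
    if rule == "delete":
        return None
    if rule == "fold":
        # Hidden by default; a trigger in the page can expand it later.
        return replace(block, folded=True)
    if rule == "zoomin" and block.width is not None and block.height is not None:
        # Reduce the picture by the predetermined ratio.
        return replace(
            block,
            width=max(1, round(block.width * scale)),
            height=max(1, round(block.height * scale)),
        )
    return block  # "show" and unknown rules leave the block unchanged


def build_target(blocks: list, rules: dict) -> list:
    """Process every block and keep those that survive as the target page."""
    processed = (apply_rule(b, rules.get(b.block_id, "show")) for b in blocks)
    return [b for b in processed if b is not None]
```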
Those skilled in the art will understand that the above manner of obtaining the target web page is merely an example; other existing or future manners of obtaining the target web page, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Preferably, processing equipment 1 works continuously in steps S1, S2, S3 and S4. Specifically, in step S1, processing equipment 1 continuously obtains pending original web pages; in step S2, it continuously extracts block identification information from the markup language file of each original web page, where the block identification information identifies each content block in the markup language file; in step S3, it continuously performs matching queries in the processing rule base according to the block identification information, to obtain the content block processing rule corresponding to the block identification information; and in step S4, it continuously processes the content blocks identified by the block identification information according to the content block processing rule, to obtain target web pages. Here, those skilled in the art will understand that "continuously" means that processing equipment 1 keeps obtaining original web pages, extracting block identification information, obtaining processing rules and obtaining target web pages in the respective steps until a predetermined stop condition is met, for example until processing equipment 1 has obtained no pending original web page for a long time.
Preferably (with reference to Fig. 3), when no content block processing rule is obtained from the processing rule base, in step S3 processing equipment 1 may determine the content block processing rule according to content-related information of the content block identified by the block identification information.
Here, the content-related information of the content block includes but is not limited to:
1) the position of the content of the content block in the original web page;
2) the number of text characters contained in the content of the content block;
3) the tag information contained in the content block.
Those skilled in the art will understand that the above content-related information is merely an example; other existing or future content-related information, if applicable to the present invention, should also fall within the scope of protection of the present invention and is incorporated herein by reference.
Here, the manners of determining the content block processing rule include but are not limited to the following:
1) in step S3, processing equipment 1 determines the processing rule according to the position of the content block in the original web page; for example, if the content block identified by the block identification information is located at the center of the original web page, which indicates that the content block is of high importance in the original web page, the content block processing rule can be determined to be displaying the content block;
2) in step S3, processing equipment 1 determines the processing rule according to the number of text characters in the content block; for example, if the number of characters in the content block identified by the block identification information exceeds a predetermined character count threshold, the processing rule can be determined to be folding the text content in the content block;
3) in step S3, processing equipment 1 determines the processing rule according to the tag objects contained in the content block; for example, if the content block identified by the block identification information in the markup language file of the original web page contains an <object> tag, and that <object> tag contains an object whose use on mobile devices is restricted by predetermined policy, such as ActiveX, the processing rule is determined to be deleting the content block.
In one example, the following code fragment exists in the HTML file of the original web page:
The block identification information present in it is "embedded object". In step S3, processing equipment 1 fails to obtain a corresponding content block processing rule by matching query in the processing rule base according to this block identification information; it then parses the <object> tag and finds that the tag has the attribute clsid, determines from this that the content block includes an ActiveX embedded object, and thereby determines that the processing rule corresponding to the block identification information is to delete the content block identified by the identification information.
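By way of non-limiting illustration, the three manners of determining a rule when the rule base has no match might be sketched as follows. The HTML fragment, its placeholder clsid value, the threshold of 500 characters and the order in which the three checks are applied are all assumptions made for this sketch; the description does not fix a priority among them.

```python
import re

CHAR_THRESHOLD = 500  # hypothetical predetermined character count threshold

# An <object> tag carrying a clsid is treated as an ActiveX embedded object.
OBJECT_RE = re.compile(r"<object\b[^>]*clsid", re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")

# Hypothetical fragment consistent with the example; the clsid is a placeholder.
FRAGMENT = (
    '<div markName="embedded object">'
    '<object classid="clsid:00000000-0000-0000-0000-000000000000"></object>'
    "</div>"
)


def determine_rule(block_markup: str, centered: bool = False):
    """Derive a processing rule from content-related information of a block."""
    # Tag information: restricted embedded objects are deleted.
    if OBJECT_RE.search(block_markup):
        return "delete"
    # Character count: long text is folded.
    text = TAG_RE.sub("", block_markup)
    if len("".join(text.split())) > CHAR_THRESHOLD:
        return "fold"
    # Position: a block at the center of the page is displayed.
    if centered:
        return "show"
    return None  # no rule could be determined
```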
Those skilled in the art will understand that the above manner of determining the processing rule is merely an example; other existing or future manners of determining the processing rule, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Fig. 4 shows a flow chart of a method for processing web page content based on content block identification in accordance with a preferred embodiment of the present invention. Here, the process further includes step S5'. In step S5', processing equipment 1 establishes or updates the processing rule base according to the newly determined content block processing rule.
Here, the functions of processing equipment 1 in steps S1', S2', S3' and S4' shown in Fig. 4 are the same as those of processing equipment 1 in steps S1, S2, S3 and S4 described above with reference to Fig. 1; for brevity, they are incorporated herein by reference and are not repeated.
Specifically, when processing equipment 1 fails in step S3' to obtain a corresponding content block processing rule from the processing rule base according to the identification information, it newly determines a content block processing rule for that identification information; then, in step S5', processing equipment 1 writes the identification information and its newly determined processing rule into the processing rule base, thereby updating it. If the processing rule base is found not to have been established, processing equipment 1 first initializes the processing rule base and then writes the above information into it.
In one example, when the new processing rule that processing equipment 1 obtains in step S3' for the mark name "embedded object" is deletion, then in step S5' processing equipment 1 inserts into the processing rule base a data record containing that mark name and its corresponding processing rule.
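By way of non-limiting illustration, establishing or updating the processing rule base in step S5' might be sketched as follows, again using an SQLite table as a stand-in. The table name rules, its columns and the function name save_rule are assumptions made for this sketch.

```python
import sqlite3


def save_rule(conn: sqlite3.Connection, block_id: str, rule: str) -> None:
    """Write a newly determined rule, creating the rule base if it is missing."""
    # Initialize the rule base first when it has not been established yet.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rules "
        "(block_id TEXT PRIMARY KEY, rule TEXT NOT NULL)"
    )
    # Insert the record, or overwrite an existing rule for the same identifier.
    conn.execute(
        "INSERT OR REPLACE INTO rules (block_id, rule) VALUES (?, ?)",
        (block_id, rule),
    )
    conn.commit()
```

Because the write replaces any earlier record for the same identifier, repeated updates never leave two competing rules for one mark name.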
Those skilled in the art will understand that the above manner of establishing or updating the processing rule base is merely an example; other existing or future manners of establishing or updating the processing rule base, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
In another preferred embodiment (with reference to Fig. 3), the process further includes step S6 (not shown). Here, in step S1, processing equipment 1 obtains the original web page according to a page access request input by a user through a mobile terminal; in step S6, processing equipment 1 provides the target web page to the user.
This further preferred embodiment is described in detail below with reference to Fig. 3. In step S2, processing equipment 1 extracts block identification information from the markup language file of the original web page, where the block identification information is used to identify each content block in the markup language file. Then, in step S3, processing equipment 1 performs a matching query in the processing rule base according to the block identification information, to obtain the content block processing rule corresponding to the block identification information. Then, in step S4, processing equipment 1 processes the content block identified by the block identification information according to the content block processing rule, to obtain the target web page. The detailed process is the same as that performed by processing equipment 1 in steps S2, S3 and S4 in the embodiment described above with reference to Fig. 3; for brevity, it is incorporated herein by reference and is not repeated.
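Steps S2 to S4 can be illustrated with a minimal sketch, assuming that the block identification information is stored as a tag attribute. The attribute name `data-block-id`, the in-memory dictionary standing in for the processing rule base, and the rule value `"delete"` are hypothetical choices for this example; comments and doctype declarations are dropped for brevity.

```python
from html.parser import HTMLParser

BLOCK_ATTR = "data-block-id"  # hypothetical attribute carrying the block ID
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class BlockProcessor(HTMLParser):
    """Extracts block IDs (S2), looks up rules (S3), applies them (S4)."""

    def __init__(self, rule_base):
        super().__init__(convert_charrefs=False)
        self.rule_base = rule_base
        self.out = []        # pieces of the target page
        self.seen_ids = []   # block IDs found in the markup
        self.skip_depth = 0  # > 0 while inside a block being deleted

    def _should_delete(self, attrs):
        block_id = dict(attrs).get(BLOCK_ATTR)
        if block_id is not None:
            self.seen_ids.append(block_id)
        return self.rule_base.get(block_id) == "delete"

    def handle_starttag(self, tag, attrs):
        delete = self._should_delete(attrs)
        if tag in VOID_TAGS:  # no end tag will follow
            if not self.skip_depth and not delete:
                self.out.append(self.get_starttag_text())
        elif self.skip_depth:
            self.skip_depth += 1
        elif delete:
            self.skip_depth = 1
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if not self._should_delete(attrs) and not self.skip_depth:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1
        else:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")


def process_page(markup, rule_base):
    """Return (target page markup, list of block IDs found)."""
    parser = BlockProcessor(rule_base)
    parser.feed(markup)
    parser.close()
    return "".join(parser.out), parser.seen_ids


original = ('<div><p data-block-id="main">Hi &amp; bye</p>'
            '<div data-block-id="ad"><img src="a.png"><span>Buy</span></div></div>')
target, ids = process_page(original, {"ad": "delete"})
print(target)
print(ids)
```

Blocks with no matching rule are passed through unchanged here; how unmatched blocks are actually handled is described elsewhere in the specification.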
In one example, when the user types into the address bar of the browser software of the mobile terminal, the mobile terminal first obtains, in real time, the web page URL input by the user and records it as the page access request corresponding to the user's input operation, where the page access request includes the URL; the mobile terminal then sends the page access request to processing equipment 1 via an agreed communication method. Then, in step S1, processing equipment 1 receives the page access request in real time, extracts the page URL from it, and sends a request for the web page to the web server hosting the page that the URL points to. It then receives the web page that the web server returns in response to that request and takes it as the original web page to be processed.
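A minimal sketch of step S1 under stated assumptions: the page access request is taken to be a JSON body with a `url` field, which is a hypothetical wire format since the patent only says the request "includes the URL" and is sent by an agreed communication method. The fetch function is injectable so the flow can be exercised without network access.

```python
import json
from urllib.parse import urlparse
from urllib.request import urlopen


def extract_url(request_body):
    """Extract the page URL from a page access request (assumed JSON)."""
    url = json.loads(request_body)["url"]
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"unsupported URL: {url!r}")
    return url


def http_fetch(url):
    """Request the page from the web server that hosts it."""
    with urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def obtain_original_page(request_body, fetch=http_fetch):
    """Step S1: turn a page access request into the original web page."""
    return fetch(extract_url(request_body))


# Usage with a stand-in fetcher instead of a real network call.
fake_fetch = lambda url: f"<html><!-- fetched {url} --></html>"
page = obtain_original_page('{"url": "http://example.com/news"}', fetch=fake_fetch)
print(page)
```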
In step S6, processing equipment 1 provides the target web page it obtained in step S4 to the user through the mobile terminal, using any technical means by which a mobile terminal presents human-readable information, such as screen display or loudspeaker playback. Taking screen display as an example, in step S6 processing equipment 1 provides the target web page obtained in step S4 to the mobile terminal in a certain format by means of a page technology such as JSP, ASP or PHP, for example as a link or as a displayed page, for the user to browse.
Those skilled in the art will understand that the above ways of obtaining the original web page and/or providing the target web page are merely examples. Other existing or future ways of obtaining the original web page and/or providing the target web page, if applicable to the present invention, should also fall within the protection scope of the present invention and are incorporated herein by reference.
Preferably (see Fig. 3), the process further includes step S7 (not shown) and step S8 (not shown). In step S7, processing equipment 1 obtains display parameter information of the mobile terminal; in step S8, processing equipment 1 optimizes the content block processing rule according to the display parameter information, to obtain a preferred content block processing rule; in step S4, processing equipment 1 processes the content block according to the preferred content block processing rule, to obtain the target web page.
Specifically, in step S7, processing equipment 1 obtains the display parameter information of the mobile terminal on which the target web page is to be displayed by calling, in an agreed manner, an application programming interface (API) provided by that mobile terminal. Here, the display parameter information includes but is not limited to:
1) the picture formats supported by the mobile terminal, such as JPEG, PNG and GIF;
2) the screen resolution of the mobile terminal, such as the physical pixel dimensions and the color depth;
3) whether the mobile terminal supports plug-ins, such as the Flash plug-in.
Then, in step S8, processing equipment 1 optimizes, according to the display parameter information of the mobile terminal obtained in step S7, the content block processing rule that it obtained in step S3 for each piece of identification information, to obtain a preferred content block processing rule. Then, in step S4, processing equipment 1 processes the content block according to the preferred content block processing rule, to obtain the target web page.
In one example, the block identification information that processing equipment 1 obtains from the markup language file in step S2 is "Flash", and the content block it identifies contains a Flash animation. In step S3, the processing rule that processing equipment 1 obtains from the processing rule base is to delete the Flash animation identified by that identification information. In step S7, however, the display parameter information obtained by processing equipment 1 shows that the mobile terminal supports running the Flash plug-in. In step S8, processing equipment 1 therefore optimizes the original processing rule corresponding to that identification information into one that retains the Flash animation in the content block, and uses it as the preferred content block processing rule. In step S4, processing equipment 1 then retains the Flash animation when processing the content block, to obtain a target web page that includes the Flash animation.
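The Flash example can be sketched as a small rule optimization function. The `DisplayParams` fields loosely follow the three kinds of display parameter information listed for step S7, but the field names, the default values, and the rule values `"delete"` and `"keep"` are assumptions for illustration, not the patent's data model.

```python
from dataclasses import dataclass, field


@dataclass
class DisplayParams:
    """Display parameter information reported by the mobile terminal."""
    image_formats: set = field(default_factory=lambda: {"JPEG", "PNG", "GIF"})
    resolution: tuple = (320, 480)   # hypothetical width x height in pixels
    supports_flash: bool = False     # whether the Flash plug-in can run


def optimize_rule(block_id, rule, params):
    """Step S8: adjust a rule from the rule base to the terminal's abilities."""
    if block_id == "Flash" and rule == "delete" and params.supports_flash:
        # The terminal can run Flash, so keep the animation instead.
        return "keep"
    return rule


# Usage: the rule base says "delete", but the terminal supports Flash.
flash_terminal = DisplayParams(supports_flash=True)
plain_terminal = DisplayParams()
print(optimize_rule("Flash", "delete", flash_terminal))
print(optimize_rule("Flash", "delete", plain_terminal))
```

Keeping the optimization separate from the rule base lookup means the stored rule stays terminal-independent and only the per-request rule changes.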
Those skilled in the art will understand that the above ways of obtaining the display parameter information and/or obtaining the preferred content block processing rule and/or obtaining the target web page are merely examples. Other existing or future ways of doing so, if applicable to the present invention, should also fall within the protection scope of the present invention and are incorporated herein by reference.
It is obvious to those skilled in the art that the invention is not limited to the details of the above exemplary embodiments, and that the invention can be realized in other specific forms without departing from its spirit or essential characteristics. Therefore, the embodiments should be regarded in all respects as exemplary and non-restrictive. The scope of the invention is defined by the appended claims rather than by the above description, and all changes falling within the meaning and range of equivalency of the claims are therefore intended to be embraced in the invention. Any reference sign in the claims should not be construed as limiting the claim concerned. Furthermore, the word "comprising" clearly does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in a device claim may also be implemented by a single unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.
Claims (14)
1. A computer-implemented method for processing web page content based on content block identification, wherein the method comprises the following steps:
a. obtaining an original web page to be processed;
b. extracting block identification information from a markup language file of the original web page, wherein the block identification information is used to identify each content block in the markup language file;
c. performing a matching query in a processing rule base according to the block identification information, to obtain a content block processing rule corresponding to the block identification information;
d. processing the content block identified by the block identification information according to the content block processing rule, to obtain a target web page;
wherein step c comprises:
- performing a matching query in the processing rule base according to the block identification information and identification information of the website to which the original web page belongs, to obtain the content block processing rule;
- when the content block processing rule is not obtained from the processing rule base, determining the content block processing rule according to content-related information of the content block identified by the block identification information;
wherein the content-related information comprises at least any one of the following:
- position information of the content of the content block in the original web page;
- the number of text characters contained in the content of the content block;
- tag information contained in the content block.
2. The method according to claim 1, wherein the content block processing rule comprises at least any one of the following:
- formatting the content in the content block;
- displaying the content block;
- deleting the content block;
- folding the content block.
3. The method according to claim 1, wherein the method further comprises:
- establishing or updating the processing rule base according to the newly determined content block processing rule.
4. The method according to any one of claims 1 to 3, wherein step a comprises:
- obtaining the original web page according to a page access request input by a user through a mobile terminal;
wherein the method further comprises:
- providing the target web page to the user.
5. The method according to claim 4, wherein the method further comprises:
- obtaining display parameter information of the mobile terminal;
- optimizing the content block processing rule according to the display parameter information, to obtain a preferred content block processing rule;
wherein step d comprises:
- processing the content block according to the preferred content block processing rule, to obtain the target web page.
6. The method according to claim 1, wherein the way in which the block identification information is stored in the markup language file comprises at least any one of the following:
- a comment in the markup language file;
- a custom tag in the markup language file;
- a tag attribute in the markup language file.
7. The method according to claim 1, wherein the markup language file comprises at least any one of the following:
- an HTML file;
- an XML file;
- an XHTML file;
- a WML file.
8. A device for processing web page content based on content block identification, wherein the device comprises:
an original web page acquisition device, for obtaining an original web page to be processed;
an identification information extraction device, for extracting block identification information from a markup language file of the original web page, wherein the block identification information is used to identify each content block in the markup language file;
a processing rule device, for performing a matching query in a processing rule base according to the block identification information, to obtain a content block processing rule corresponding to the block identification information;
a target web page acquisition device, for processing the content block identified by the block identification information according to the content block processing rule, to obtain a target web page;
wherein the processing rule device is used for:
performing a matching query in the processing rule base according to the block identification information and identification information of the website to which the original web page belongs, to obtain the content block processing rule;
when the content block processing rule is not obtained from the processing rule base, determining the content block processing rule according to content-related information of the content block identified by the block identification information;
wherein the content-related information comprises at least any one of the following:
- position information of the content of the content block in the original web page;
- the number of text characters contained in the content of the content block;
- tag information contained in the content block.
9. The device according to claim 8, wherein the content block processing rule comprises at least any one of the following:
- formatting the content in the content block;
- displaying the content block;
- deleting the content block;
- folding the content block.
10. The device according to claim 8, wherein the device further comprises:
an updating device, for establishing or updating the processing rule base according to the newly determined content block processing rule.
11. The device according to any one of claims 8 to 10, wherein the original web page acquisition device is used for obtaining the original web page according to a page access request input by a user through a mobile terminal;
wherein the device further comprises:
a providing device, for providing the target web page to the user.
12. The device according to claim 11, wherein the device further comprises:
a parameter acquisition device, for obtaining display parameter information of the mobile terminal;
an optimization device, for optimizing the content block processing rule according to the display parameter information, to obtain a preferred content block processing rule;
wherein the target web page acquisition device is used for processing the content block according to the preferred content block processing rule, to obtain the target web page.
13. The device according to claim 8, wherein the way in which the block identification information is stored in the markup language file comprises at least any one of the following:
- a comment in the markup language file;
- a custom tag in the markup language file;
- a tag attribute in the markup language file.
14. The device according to claim 8, wherein the markup language file comprises at least any one of the following:
- an HTML file;
- an XML file;
- an XHTML file;
- a WML file.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110390828.9A CN103136259B (en) | 2011-11-30 | 2011-11-30 | A kind of method and apparatus based on content block identification processing web page contents |
PCT/CN2012/075044 WO2013078829A1 (en) | 2011-11-30 | 2012-05-03 | Method and device for processing webpage content on the basis of content block identification |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110390828.9A CN103136259B (en) | 2011-11-30 | 2011-11-30 | A kind of method and apparatus based on content block identification processing web page contents |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103136259A CN103136259A (en) | 2013-06-05 |
CN103136259B true CN103136259B (en) | 2018-03-23 |
Family
ID=48496093
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201110390828.9A Active CN103136259B (en) | 2011-11-30 | 2011-11-30 | A kind of method and apparatus based on content block identification processing web page contents |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN103136259B (en) |
WO (1) | WO2013078829A1 (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103473004A (en) * | 2013-09-29 | 2013-12-25 | 小米科技有限责任公司 | Method, device and terminal equipment for displaying message |
CN103544320A (en) * | 2013-11-05 | 2014-01-29 | 从兴技术有限公司 | Webpage generation method and device |
CN104834685A (en) * | 2015-04-17 | 2015-08-12 | 百度国际科技(深圳)有限公司 | Method and device for processing comment message block in comment-like webpage |
CN106126485A (en) * | 2016-06-14 | 2016-11-16 | 北京金山安全软件有限公司 | Text format generation method, server and terminal |
CN108595697B (en) * | 2018-05-09 | 2021-02-02 | 未鲲(上海)科技服务有限公司 | Webpage integration method, device and system |
CN109710863A (en) * | 2018-11-27 | 2019-05-03 | 平安科技(深圳)有限公司 | Information conversion method, device, computer equipment and storage medium |
CN111125605B (en) * | 2019-12-31 | 2022-07-29 | 北京创鑫旅程网络技术有限公司 | Page element acquisition method and device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101039357A (en) * | 2006-03-17 | 2007-09-19 | 陈晓月 | Method for browsing website using handset |
CN101526953A (en) * | 2009-01-19 | 2009-09-09 | 北京跳网无限科技发展有限公司 | WWW transformation technology |
CN101815093A (en) * | 2010-03-11 | 2010-08-25 | 深圳市嘉讯软件有限公司 | Method for adapting webpage to mobile terminal and mobile terminal page adaptation device |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040054973A1 (en) * | 2000-10-02 | 2004-03-18 | Akio Yamamoto | Method and apparatus for transforming contents on the web |
CN102163233A (en) * | 2011-04-18 | 2011-08-24 | 北京神州数码思特奇信息技术股份有限公司 | Method and system for converting webpage markup language format |
Non-Patent Citations (3)
Title |
---|
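Content blocking algorithm based on web page layout (基于网页格局的内容分块算法); Lu Songfeng et al.; Computer Engineering & Science (《计算机工程与科学》); 2007-09-30; Vol. 29, No. 9; pp. 16-18 * |
A brief discussion of the conversion process from Web pages to WAP pages (浅议WEB页面到WAP页面转换过程); Wang Yongfei; Journal of Tongling College of Finance and Economics (《铜陵财经专科学校学报》); 2002-12-31; pp. 42-45 * |
Design and implementation of a Web content conversion tool for mobile terminals (面向移动终端的Web内容转换工具的设计与实现); Xu Xiaohuan; China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》); 2009-11-15; pp. 11-19, 23, 27-33 * |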
Also Published As
Publication number | Publication date |
---|---|
CN103136259A (en) | 2013-06-05 |
WO2013078829A1 (en) | 2013-06-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |