WO2013078829A1 - Method and device for processing webpage content on the basis of content block identification - Google Patents


Info

Publication number
WO2013078829A1
Authority
WO
WIPO (PCT)
Prior art keywords
content
block
processing rule
identification information
content block
Prior art date
Application number
PCT/CN2012/075044
Other languages
French (fr)
Chinese (zh)
Inventor
钱海祥 (Qian Haixiang)
辛昕 (Xin Xin)
Original Assignee
百度在线网络技术(北京)有限公司 (Baidu Online Network Technology (Beijing) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 百度在线网络技术(北京)有限公司 (Baidu Online Network Technology (Beijing) Co., Ltd.)
Publication of WO2013078829A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G06F16/9577 Optimising the visualization of content, e.g. distillation of HTML documents

Definitions

  • The present invention relates to the field of Internet technologies, and in particular to a technology for processing webpage content based on content block identification.
  • Background art: in the prior art, theme content is usually extracted from a parsed Internet webpage, and a new webpage is generated from the extracted theme content, so as to convert an original webpage suited to desktop computer display into a target webpage suited to mobile device display. However, this conversion method is inefficient and time-consuming, which slows the response to page access requests from mobile users and degrades the user experience.
  • A computer-implemented method for processing webpage content based on content block identification, comprising the steps of:
  • An apparatus for processing webpage content based on content block identification, comprising:
  • an original webpage obtaining means configured to obtain an original webpage to be processed;
  • an identification information extracting means configured to extract block identification information from the markup language file of the original webpage, wherein the block identification information is used to identify each content block in the markup language file;
  • a processing rule obtaining means configured to perform a matching query in a processing rule base according to the block identification information, to obtain a content block processing rule corresponding to the block identification information; and
  • a target webpage obtaining means configured to process the content block identified by the block identification information according to the content block processing rule, to obtain a target webpage.
  • According to the block identification information corresponding to each content block in the markup language file of the original webpage, such as an HTML or XHTML file, the present invention performs a matching query in the processing rule base to obtain the corresponding content block processing rules, and then applies the corresponding processing, such as folding, deleting, or formatting, to each content block. This enables rapid processing of page content and improves the efficiency and quality of page conversion, thereby improving the user experience.
  • Moreover, since the markup language file of the page only needs to contain the block identification information and not the corresponding processing rules, the burden on the website of maintaining its webpages is reduced.
  • FIG. 1 shows a schematic diagram of a device for processing webpage content based on content block identification, in accordance with an aspect of the present invention
  • FIG. 2 is a schematic diagram of an apparatus for processing webpage content based on content block identification, in accordance with a preferred embodiment of the present invention
  • FIG. 3 illustrates a flow chart of a method for processing web page content based on content block identification in accordance with another aspect of the present invention
  • FIG. 4 illustrates a flow chart of a method for processing webpage content based on content block identification, in accordance with a preferred embodiment of the present invention.
  • FIG. 1 shows a schematic diagram of the processing device 1 for processing webpage content based on content block identification, in accordance with an aspect of the present invention.
  • the processing device 1 includes an original web page obtaining device 11, an identification information extracting device 12, a processing rule obtaining device 13, and a target web page obtaining device 14.
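The cooperation of the four means can be pictured as a simple pipeline: fetch, extract, look up, apply. The Python sketch below is illustrative only; the function `process_webpage`, the `ContentBlock` type, the `"show"` fallback for unmatched blocks, and the stub implementations are hypothetical names introduced here, not part of the described device.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ContentBlock:
    block_id: str         # block identification information, e.g. "title"
    content: str          # markup or text inside the block
    hidden: bool = False  # set when a "fold" rule has been applied


def process_webpage(
    url: str,
    obtain_original_page: Callable[[str], str],               # role of means 11
    extract_blocks: Callable[[str], List[ContentBlock]],     # role of means 12
    rule_base: Dict[str, str],                                # queried by means 13
    apply_rule: Callable[[ContentBlock, str], ContentBlock],  # role of means 14
    default_rule: str = "show",  # assumption: used when no rule matches
) -> List[ContentBlock]:
    page = obtain_original_page(url)
    target_blocks = []
    for block in extract_blocks(page):
        rule = rule_base.get(block.block_id, default_rule)  # matching query
        target_blocks.append(apply_rule(block, rule))
    return target_blocks


# Stub implementations of each means, for illustration only.
def fake_fetch(url: str) -> str:
    return "Headline|Long article text"


def fake_extract(page: str) -> List[ContentBlock]:
    title, body = page.split("|")
    return [ContentBlock("title", title), ContentBlock("body", body)]


def fake_apply(block: ContentBlock, rule: str) -> ContentBlock:
    return replace(block, hidden=(rule == "fold"))


result = process_webpage("http://www.abc.com/", fake_fetch, fake_extract,
                         {"title": "show", "body": "fold"}, fake_apply)
print([(b.block_id, b.hidden) for b in result])  # [('title', False), ('body', True)]
```

Passing each means as a callable keeps the sketch agnostic about whether the device is a server, a cloud, or the mobile terminal itself.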
  • Here, the processing device 1 may be a network device, including but not limited to a computer, a network host, a single network server, a set of multiple network servers, or a cloud composed of multiple servers; the cloud consists of a large number of computers or network servers based on cloud computing, a type of distributed computing in which a group of loosely coupled computers forms a single virtual supercomputer.
  • The processing device 1 can also be a mobile terminal, i.e., a computer device that can be used on the move, including but not limited to mobile phones, notebooks, POS machines, and on-board computers, whose display is typically much smaller than that of a desktop computer.
  • The processing of webpage content by the processing device 1 is described in detail below with reference to FIG. 1:
  • the original web page obtaining means 11 acquires the original web page to be processed.
  • the manner of obtaining the original webpage to be processed includes, but is not limited to, the following situations:
  • 1) The user interacts with the browser software or client software of the mobile terminal through an interaction device, including but not limited to a keyboard, a mouse, a remote controller, a touch pad, or a handwriting device; taking the keyboard as an example, the user enters a URL in the address bar of the browser software on the mobile terminal.
  • The mobile terminal acquires in real time the key sequence input by the user, such as the uniform resource locator (URL), records a page access request corresponding to the user's input operation, the page access request containing the URL, and then sends the page access request through an agreed communication method.
  • The original webpage obtaining means 11 receives the page access request in real time, extracts the page URL from it, and sends a request for the webpage to the web server hosting it; for example, the request can be encapsulated as a request message, such as an HTTP request message, and sent to the web server over a corresponding communication protocol, such as HTTP or HTTPS. The original webpage obtaining means 11 then receives the webpage returned by the web server in response to the request and uses it as the original webpage to be processed.
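As a rough illustration of how the original webpage obtaining means 11 might turn a page access request into an HTTP request, the sketch below uses Python's standard `urllib.request`. Representing the page access request as a dict with a `"url"` key, the restriction to HTTP/HTTPS, and the function names are assumptions; the text only says that the request contains the URL.

```python
import urllib.request


def build_page_fetch_request(page_access_request: dict) -> urllib.request.Request:
    """Turn a page access request into an HTTP GET for the target web server.

    Assumption: the page access request is a dict with a "url" key.
    """
    url = page_access_request["url"]
    if not url.startswith(("http://", "https://")):
        raise ValueError("only http/https URLs are fetched")
    return urllib.request.Request(url, method="GET")


def obtain_original_page(page_access_request: dict, timeout: float = 10.0) -> str:
    """Send the request and return the response body as the original webpage."""
    request = build_page_fetch_request(page_access_request)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Decode using the charset the server declares, defaulting to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


req = build_page_fetch_request({"url": "https://www.abc.com/news"})
print(req.get_method(), req.host)  # GET www.abc.com
```

Only `build_page_fetch_request` is exercised in the usage line; `obtain_original_page` performs real network I/O.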
  • 2) When the processing device 1 is a network device, the original web page obtaining means 11 sends, triggered by a predetermined condition or event or periodically, a request message for the original web page to be processed to a third-party device through an application programming interface (API) provided by that device, and then receives the original web page to be processed that the third-party device returns in response to the request message; alternatively, the third-party device actively pushes the original web page to be processed to the processing device 1, and the original web page obtaining means 11 receives it.
  • The identification information extracting means 12 extracts block identification information, for example by string matching, from the markup language file of the original web page acquired by the original web page obtaining means 11, wherein the block identification information is used to identify each content block in the markup language file.
  • Here, the markup language file includes, but is not limited to:
  • HTML (Hypertext Markup Language) files;
  • XHTML (Extensible Hypertext Markup Language) files;
  • WML (Wireless Markup Language) files, WML being a descriptive markup language used to create pages that can be displayed in a WAP browser.
  • The block identification information includes, but is not limited to, an identification name, an identification ID, and the like; the identification name may be chosen according to the type of content block it identifies, such as title, navigation, body, picture, or embedded object (such as a Java applet, ActiveX, or Flash).
  • A content block is a content area composed of at least one tag in the markup language file and corresponds to a specific piece of content displayed in the webpage, such as a title content block, a body content block, a navigation content block, or a picture content block.
  • the storage manner of the block identification information in the markup language file includes but is not limited to:
  • the custom tag in the markup language file;
  • for example, a custom tag in an HTML file can be <tc></tc>, and the identification information can be stored in this custom tag;
  • For example, suppose the markup language file of the original web page acquired by the identification information extracting means 12 is an XHTML file that predefines the content block identification information through a tag attribute named markName, with a div tag carrying markName="title" and an img tag carrying markName="picture".
  • The identification information extracting means 12 parses the XHTML file and performs string matching on the keyword "markName" to obtain the markName attribute of the div tag and its value "title", which is the identification name of the content block corresponding to the div tag, as well as the markName attribute of the img tag and its value "picture", which is the identification name of the content block corresponding to the img tag.
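The markName extraction described above can be approximated with Python's standard `html.parser` module in place of raw string matching; this is a swapped-in technique, not the method in the text. The sample markup is hypothetical, since the original XHTML example is not reproduced here. Note that `html.parser` lower-cases attribute names, so `markName` is seen as `markname`.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class MarkNameExtractor(HTMLParser):
    """Collect (tag, identification name) pairs from markName attributes."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        # Attribute names arrive lower-cased, so "markName" becomes "markname".
        # Self-closing tags such as <img ... /> also reach this method, because
        # the default handle_startendtag calls handle_starttag.
        for name, value in attrs:
            if name == "markname" and value:
                self.blocks.append((tag, value))


def extract_block_ids(markup: str) -> List[Tuple[str, str]]:
    parser = MarkNameExtractor()
    parser.feed(markup)
    parser.close()
    return parser.blocks


sample = '<div markName="title"><h1>News</h1></div><img markName="picture" src="a.jpg" />'
print(extract_block_ids(sample))  # [('div', 'title'), ('img', 'picture')]
```

A parser is more robust than substring search when attributes are reordered or quoted differently, at the cost of a full pass over the document.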
  • the processing rule obtaining means 13 performs a matching query in the processing rule base based on the block identification information acquired by the identification information extracting means 12 to obtain a content block processing rule corresponding to the block identification information.
  • Specifically, the processing rule obtaining means 13 performs the matching query in the processing rule base of the local device or of a third-party device according to the block identification information, to obtain the content block processing rule corresponding to the block identification information.
  • The processing rule includes, but is not limited to:
  • folding the content block, i.e., the content of the block is hidden by default but can be expanded through a specific trigger;
  • The processing rule base stores each item of block identification information together with its corresponding processing rule, and may be implemented as, but is not limited to, a relational database, a key-value storage system, or a file system.
  • For example, if the block identification information is "title", the processing rule obtaining means 13 performs a matching query in the local processing rule base through the application programming interface (API) provided by the processing device 1, and obtains "show" as the content block processing rule corresponding to "title"; that is, the content block identified by this block identification information is displayed.
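A matching query against a local rule base, with an optional fallback to a third-party device, might look like the following sketch. The local-first ordering, the name `make_rule_lookup`, and the use of a plain callable to stand in for the third-party query are assumptions for illustration; the text only says the rule base may be local or on a third-party device.

```python
from typing import Callable, Dict, Optional

RuleQuery = Callable[[str], Optional[str]]


def make_rule_lookup(local_rules: Dict[str, str],
                     remote_query: Optional[RuleQuery] = None) -> RuleQuery:
    """Build a matching-query function over a local key-value rule base.

    If the local rule base has no entry and remote_query is given (standing in
    for a third-party device), the query is forwarded to it.
    """
    def lookup(block_id: str) -> Optional[str]:
        rule = local_rules.get(block_id)
        if rule is None and remote_query is not None:
            rule = remote_query(block_id)
        return rule  # None means no content block processing rule was found
    return lookup


lookup = make_rule_lookup({"title": "show"}, remote_query={"picture": "zoomin"}.get)
print(lookup("title"), lookup("picture"), lookup("navigation"))  # show zoomin None
```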
  • As another example, if the block identification information is "picture", the processing rule obtaining means 13 sends a processing rule acquisition request containing the block identification information to a third-party device; for example, the request may be encapsulated into a request message, such as an HTTP request message, and sent over a corresponding communication protocol, such as HTTP or HTTPS. The third-party device listens for and parses the request in real time, performs a matching query in its processing rule base according to the extracted block identification information, and obtains "zoomin" as the corresponding content block processing rule; that is, pictures in the content block identified by this block identification information are reduced by a predetermined amount.
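The request and response between the processing rule obtaining means 13 and the third-party device could be as simple as a query string. The sketch below shows both sides using `urllib.parse`; the parameter name `block`, the service URL, and the contents of the third-party rule base are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical rule base held by the third-party device.
THIRD_PARTY_RULES = {"picture": "zoomin", "embedded object": "delete"}


def build_rule_request_url(service_url: str, block_id: str) -> str:
    """Encode the block identification information into the request URL.

    The query parameter name "block" is an assumption for illustration.
    """
    return f"{service_url}?{urlencode({'block': block_id})}"


def handle_rule_request(request_url: str) -> str:
    """Third-party side: parse the request and run the matching query.

    Returns an empty string when no rule matches.
    """
    params = parse_qs(urlsplit(request_url).query)
    block_id = params.get("block", [""])[0]
    return THIRD_PARTY_RULES.get(block_id, "")


url = build_rule_request_url("https://rules.example.com/query", "picture")
print(url)                       # https://rules.example.com/query?block=picture
print(handle_rule_request(url))  # zoomin
```

`urlencode` and `parse_qs` handle escaping, so identification names containing spaces, such as "embedded object", survive the round trip.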
  • the processing rule obtaining means 13 performs a matching query in the processing rule base according to the block identification information and the identification information of the website to which the original webpage belongs, to obtain a content block processing rule customized for the webpage of the website.
  • the identification information of the website to which the original webpage belongs includes, but is not limited to, a website domain name, a website IP address, a website name, and the like.
  • For example, the processing rule obtaining means 13 determines the identification information of the website to which the webpage belongs, such as the website domain name or IP address, from the URL of the original webpage to be processed obtained by the original webpage obtaining means 11. It then performs a matching query in the processing rule base using both the block identification information acquired by the identification information extracting means 12 and the identification information of the website; if the match yields a processing rule predetermined for webpages of that website, the predetermined rule is used as the content block processing rule for this webpage.
  • For instance, the processing rule obtaining means 13 extracts from the URL the domain name of the website where the webpage is located, "www.abc.com". A matching query in the processing rule base on the block identification information alone yields the processing rule "delete", i.e., deleting the content block identified by that identification information; if, however, a processing rule has been predetermined for this website, the processing rule obtaining means 13 ignores the deletion rule corresponding to the block identification information and uses the rule predetermined for the website as the content block processing rule.
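A website-specific rule overriding a generic one can be sketched as a two-level lookup keyed first by (domain, block identification information). The block identifier `"navigation"` and the site rule `"fold"` below are hypothetical values chosen for illustration; only the domain www.abc.com and the generic `"delete"` rule come from the example above.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlsplit

# Generic rules keyed by block identification information.
GENERIC_RULES: Dict[str, str] = {"navigation": "delete"}
# Rules predetermined for particular websites, keyed by (domain, block id).
SITE_RULES: Dict[Tuple[str, str], str] = {("www.abc.com", "navigation"): "fold"}


def rule_for(page_url: str, block_id: str) -> Optional[str]:
    """Prefer a rule predetermined for the page's website, else the generic rule."""
    domain = urlsplit(page_url).hostname or ""  # website identification information
    site_rule = SITE_RULES.get((domain, block_id))
    if site_rule is not None:
        return site_rule                  # the website-specific rule wins
    return GENERIC_RULES.get(block_id)   # fall back to the generic rule


print(rule_for("http://www.abc.com/news/1.html", "navigation"))  # fold
print(rule_for("http://www.xyz.com/", "navigation"))             # delete
```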
  • the target webpage obtaining means 14 performs corresponding processing on the content block identified by the block identification information according to the content block processing rule acquired by the processing rule acquiring means 13 to obtain the target webpage.
  • the corresponding processing on the content block includes, but is not limited to: formatting, displaying, deleting, folding, and ordering the content in the content block.
  • For example, suppose the identification information extracting means 12 parses the HTML file of a webpage and obtains two items of block identification information, "body" and "picture".
  • The processing rule obtaining means 13 acquires the content block processing rule corresponding to "body", which is to fold the content block identified by that identification information, and the rule corresponding to "picture", which is to reduce the pictures in the identified content block by a predetermined reduction ratio.
  • The target webpage obtaining means 14 locates in the HTML file the content block identified by each item of identification information. According to the corresponding rules, it folds (hides) the content of the "body" content block while setting a predetermined trigger through which the body text can later be expanded for display, and reduces the pictures in the "picture" content block by the predetermined ratio for display. The processed webpage is used as the target webpage.
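The folding and picture-reduction steps above can be sketched on a simple in-memory model of content blocks. Everything here is an assumption for illustration: the `Block` and `Image` types, the 0.5 ratio standing in for the "predetermined reduction ratio", reuse of the rule name `"zoomin"` from the earlier example, and the choice of the HTML `<details>` element as the expand trigger.

```python
import html
from dataclasses import dataclass, field
from typing import List

REDUCTION_RATIO = 0.5  # hypothetical value for the "predetermined reduction ratio"


@dataclass
class Image:
    src: str
    width: int
    height: int


@dataclass
class Block:
    block_id: str
    text: str = ""
    images: List[Image] = field(default_factory=list)
    folded: bool = False  # folded content is hidden until the user expands it


def apply_rule(block: Block, rule: str) -> Block:
    """Apply one content block processing rule in place and return the block."""
    if rule == "fold":
        block.folded = True
    elif rule == "zoomin":  # rule name the text uses for reducing pictures
        for img in block.images:
            img.width = max(1, round(img.width * REDUCTION_RATIO))
            img.height = max(1, round(img.height * REDUCTION_RATIO))
    return block


def render(block: Block) -> str:
    """Emit HTML for the target webpage; <details> serves as the expand trigger."""
    body = html.escape(block.text) + "".join(
        f'<img src="{html.escape(i.src)}" width="{i.width}" height="{i.height}">'
        for i in block.images)
    if block.folded:
        return f"<details><summary>{html.escape(block.block_id)}</summary>{body}</details>"
    return f"<div>{body}</div>"


body = apply_rule(Block("body", text="Long article..."), "fold")
picture = apply_rule(Block("picture", images=[Image("a.jpg", 800, 600)]), "zoomin")
print(render(body))     # <details><summary>body</summary>Long article...</details>
print(render(picture))  # <div><img src="a.jpg" width="400" height="300"></div>
```

`<details>` gives a browser-native expand control without scripting; a real implementation might equally use JavaScript or a server round trip.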
  • The original web page obtaining means 11, the identification information extracting means 12, the processing rule obtaining means 13, and the target web page obtaining means 14 operate continuously.
  • Specifically, the original webpage obtaining means 11 continuously acquires original webpages to be processed; the identification information extracting means 12 continuously extracts block identification information, which identifies the content blocks in the markup language file, from the markup language files of those webpages; the processing rule obtaining means 13 continuously performs matching queries in the processing rule base according to the block identification information to obtain the corresponding content block processing rules; and the target webpage obtaining means 14 continuously processes the identified content blocks according to those rules to obtain target webpages.
  • Here, "continuously" means that each means keeps performing its operation (acquiring original webpages, extracting block identification information, acquiring processing rules, or obtaining target webpages) until a predetermined stop condition is met, for example, the original webpage obtaining means 11 has not acquired any original webpage to be processed for a long time.
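Continuous operation with an idle-time stop condition can be sketched with a blocking queue. The queue, the `idle_timeout` parameter, and the function name are assumptions; the idle timeout stands in for the predetermined stop condition described above.

```python
import queue
from typing import Callable, List


def run_continuously(pending: "queue.Queue[str]",
                     process_page: Callable[[str], str],
                     idle_timeout: float = 30.0) -> List[str]:
    """Process original webpages until none arrives for idle_timeout seconds.

    The idle timeout is an assumed stand-in for the predetermined stop
    condition ("no original webpage acquired for a long time").
    """
    targets = []
    while True:
        try:
            page = pending.get(timeout=idle_timeout)
        except queue.Empty:
            break  # stop condition met
        targets.append(process_page(page))  # extract, look up, and apply rules
    return targets


pages: "queue.Queue[str]" = queue.Queue()
for p in ("page-1", "page-2"):
    pages.put(p)
print(run_continuously(pages, str.upper, idle_timeout=0.05))  # ['PAGE-1', 'PAGE-2']
```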
  • Preferably, the processing rule obtaining means 13 may determine the content block processing rule according to content-related information of the content block identified by the block identification information.
  • the content related information of the content block includes but is not limited to:
  • 1) the location of the content block in the original webpage: for example, if the content block identified by the block identification information is located at the center of the original webpage, indicating that it is of high importance in the page, the content block processing rule may be determined to be display processing of that content block;
  • 2) the number of characters in the content block: for example, if the number of characters in the content block identified by the block identification information exceeds a predetermined threshold, the processing rule may be determined to be folding the text content of that block;
  • 3) the tag objects included in the content block: for example, if the content block identified by the block identification information in the markup language file of the original web page contains an <object> tag, and that tag contains an object predetermined to be restricted on mobile devices, such as ActiveX, the processing rule is determined to be deletion of the content block.
  • For example, if the block identification information is "embedded object" and the processing rule obtaining means 13 fails to obtain a corresponding content block processing rule from the processing rule base, it examines the <object> tag in the identified content block, finds that the tag has a clsid attribute, and thus determines that the block contains an ActiveX embedded object; it therefore determines that the processing rule corresponding to this block identification information is to delete the content block identified by the identification information.
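The content-based fallback can be sketched as a small scanner over the block's markup. The check order (ActiveX, then length, then position), the 500-character threshold, and passing the block's centrality in as a boolean (layout analysis is out of scope here) are all assumptions. The text mentions a `clsid` attribute; common ActiveX markup writes the GUID as `classid="clsid:..."`, so both forms are checked.

```python
from html.parser import HTMLParser
from typing import Optional

CHAR_THRESHOLD = 500  # hypothetical "predetermined number of characters"


class _BlockScanner(HTMLParser):
    """Counts visible characters and flags <object> tags that look like ActiveX."""

    def __init__(self) -> None:
        super().__init__()
        self.text_chars = 0
        self.has_activex = False

    def handle_starttag(self, tag, attrs):
        if tag != "object":
            return
        for name, value in attrs:
            if name == "clsid" or (name == "classid"
                                   and (value or "").lower().startswith("clsid:")):
                self.has_activex = True

    def handle_data(self, data):
        self.text_chars += len(data.strip())


def determine_rule(block_markup: str, is_centered: bool) -> Optional[str]:
    """Fallback rule when the rule base has no entry (check order is assumed)."""
    scanner = _BlockScanner()
    scanner.feed(block_markup)
    scanner.close()  # flush any buffered text
    if scanner.has_activex:
        return "delete"
    if scanner.text_chars > CHAR_THRESHOLD:
        return "fold"
    if is_centered:
        return "show"
    return None  # no rule could be determined from content-related information


print(determine_rule('<object classid="clsid:00000000-0000-0000-0000-000000000000"></object>', False))  # delete
```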
  • In the preferred embodiment shown in FIG. 2, the processing device 1 further includes an updating device 15.
  • The updating device 15 establishes or updates the processing rule base based on newly determined content block processing rules.
  • The functions of the means 11, 12, 13, and 14 shown in FIG. 2 are the same as those of the means 11, 12, 13, and 14 described above with reference to FIG. 1; for the sake of brevity, they are incorporated herein by reference and not repeated.
  • Specifically, if the processing rule obtaining means 13 does not obtain a corresponding content block processing rule from the processing rule base according to the identification information, it newly determines a content block processing rule for that identification information, and the updating device 15 writes the identification information and the newly determined rule into the processing rule base to update it; if it is detected that the processing rule base has not been established, the updating device 15 first initializes the processing rule base and then writes the information into it.
  • For example, if the processing rule obtaining means 13 newly determines that the processing rule corresponding to "embedded object" is deletion, the updating device 15 inserts into the processing rule base a data record consisting of the identification name and its corresponding processing rule.
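Since a relational database is one of the storage options listed for the processing rule base, the updating device 15 can be sketched with Python's built-in `sqlite3`. The table name `processing_rules` and its columns are hypothetical; `CREATE TABLE IF NOT EXISTS` covers the "initialize first if not established" case, and `INSERT OR REPLACE` covers both inserting and updating a record.

```python
import sqlite3
from typing import Optional


def upsert_rule(conn: sqlite3.Connection, block_id: str, rule: str) -> None:
    """Write a newly determined rule, initializing the rule base if needed."""
    # Initialize the processing rule base if it has not been established yet.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processing_rules ("
        "block_id TEXT PRIMARY KEY, rule TEXT NOT NULL)"
    )
    # Insert a new record, or replace the rule if the identifier already exists.
    conn.execute(
        "INSERT OR REPLACE INTO processing_rules (block_id, rule) VALUES (?, ?)",
        (block_id, rule),
    )
    conn.commit()


def lookup_rule(conn: sqlite3.Connection, block_id: str) -> Optional[str]:
    row = conn.execute(
        "SELECT rule FROM processing_rules WHERE block_id = ?", (block_id,)
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
upsert_rule(conn, "embedded object", "delete")
print(lookup_rule(conn, "embedded object"))  # delete
```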
  • the processing device 1 further comprises a providing device (not shown).
  • the original webpage obtaining device 11 acquires the original webpage according to a page access request input by the user through the mobile terminal; and the providing device provides the target webpage to the user.
  • The identification information extracting means 12 extracts block identification information, which identifies each content block in the markup language file, from the markup language file of the original web page; the processing rule obtaining means 13 then performs a matching query in the processing rule base according to the block identification information to obtain the corresponding content block processing rule; and the target web page obtaining means 14 processes the content block identified by the block identification information according to that rule to obtain a target webpage. These processes are the same as those performed by the identification information extracting means 12, the processing rule obtaining means 13, and the target web page obtaining means 14 in the embodiment described above with reference to FIG. 1, and for the sake of brevity are incorporated herein by reference without repetition.
  • For example, the mobile terminal acquires in real time a webpage URL input by the user, records a page access request corresponding to the user's input operation, the page access request containing the URL, and sends the page access request to the processing device 1 through an agreed communication method;
  • the original web page obtaining means 11 receives the page access request in real time, extracts the page URL from it, sends a request for the webpage to the web server hosting the webpage pointed to by the URL, receives the webpage returned by the web server in response to the request, and uses it as the original webpage to be processed.
  • The providing device provides the target webpage acquired by the target webpage obtaining means 14 to the user through the mobile terminal, using any known way in which a mobile terminal presents human-readable information, such as screen display or speaker playback.
  • Alternatively, the providing device provides the target webpage acquired by the target webpage obtaining means 14 to the mobile terminal in a certain order and format using page technologies such as JSP, ASP, or PHP, for example as a link or a displayed page, for the user to browse.
  • the processing device 1 further comprises parameter acquisition means (not shown) and preferred rule acquisition means (not shown).
  • The parameter obtaining means acquires display parameter information of the mobile terminal; the preferred rule obtaining means optimizes the content block processing rule according to the display parameter information to obtain a preferred content block processing rule; and the target webpage obtaining means 14 processes the content block according to the preferred content block processing rule to obtain the target webpage.
  • The parameter obtaining means acquires, in an agreed manner such as through an application programming interface (API) provided by the mobile terminal, display parameter information of the mobile terminal on which the target webpage is to be displayed; the display parameter information includes, but is not limited to:
  • image formats supported by the mobile terminal, such as JPEG, PNG, and GIF;
  • screen parameters of the mobile terminal, such as the resolution, the physical pixel size, and the number of color bits.
  • The preferred rule obtaining means optimizes the content block processing rule acquired by the processing rule obtaining means 13 for each item of identification information, according to the display parameter information of the mobile terminal acquired by the parameter obtaining means, to obtain a preferred content block processing rule.
  • the target webpage obtaining means 14 performs corresponding processing on the content block according to the preferred content block processing rule to obtain the target webpage.
  • For example, suppose the content block identified by certain identification information contains a Flash animation, and the processing rule obtaining means 13 obtains from the processing rule base the corresponding processing rule of deleting the Flash animation, but the display parameter information acquired by the parameter obtaining means indicates that the mobile terminal supports running the Flash plug-in.
  • The preferred rule obtaining means then optimizes the original processing rule for that identification information into retaining the Flash animation in the content block, i.e., the preferred content block processing rule; the target webpage obtaining means 14 accordingly retains the Flash animation when processing the content block and obtains a target webpage that includes the Flash animation.
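The Flash example above amounts to a per-terminal override of a generic rule. The sketch below is a minimal illustration under assumed names: `DisplayParams` models only one hypothetical field of the display parameter information, and `"show"` is used as the rule meaning "retain the animation".

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class DisplayParams:
    # Hypothetical subset of a terminal's display parameter information.
    supports_flash: bool


def optimize_rules(rules: Dict[str, str], flash_blocks: Set[str],
                   params: DisplayParams) -> Dict[str, str]:
    """Return preferred content block processing rules for one mobile terminal.

    rules maps block identification information to its processing rule;
    flash_blocks names the blocks whose content includes a Flash animation.
    """
    preferred = dict(rules)  # leave the rule base's own rules untouched
    for block_id in flash_blocks:
        if preferred.get(block_id) == "delete" and params.supports_flash:
            preferred[block_id] = "show"  # terminal can run Flash: keep the animation
    return preferred


rules = {"embedded object": "delete", "title": "show"}
print(optimize_rules(rules, {"embedded object"}, DisplayParams(supports_flash=True)))
# {'embedded object': 'show', 'title': 'show'}
```

Copying the rules keeps the shared rule base generic while the preferred rules stay specific to one terminal's capabilities.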
  • The above manners of obtaining the display parameter information, obtaining the preferred content block processing rule, and obtaining the target webpage are merely examples; other existing or future manners, if applicable to the present invention, are also included within the scope of the present invention and are incorporated herein by reference.
  • FIG. 3 illustrates a flow diagram of a method for processing web page content based on content block identification in accordance with an aspect of the present invention.
  • Here, the processing device 1 may be a network device, including but not limited to a computer, a network host, a single network server, a set of multiple network servers, or a cloud composed of multiple servers, where the cloud is based on cloud computing.
  • The processing device 1 can also be a mobile terminal, i.e., a computer device that can be used on the move, including but not limited to mobile phones, notebooks, POS machines, and on-board computers, whose display is usually much smaller than that of a desktop computer.
  • processing webpage content by processing device 1 is described in detail below with reference to FIG. 3:
  • In step S1, the processing device 1 acquires the original web page to be processed.
  • the manner of obtaining the original webpage to be processed includes, but is not limited to, the following situations:
  • 1) The corresponding original webpage is obtained from the web server pointed to by the uniform resource locator (URL) in a page access request from the mobile terminal. In one example, the user first interacts with the browser software or client software of the mobile terminal through an interaction device of the mobile terminal, including but not limited to a keyboard, a mouse, a remote controller, a touch pad, or a handwriting device; taking the keyboard as an example, the user enters a URL in the address bar of the browser software on the mobile terminal.
  • The mobile terminal acquires in real time the key sequence input by the user, such as the URL, records a page access request corresponding to the user's input operation, the page access request containing the URL, and then sends the page access request to the processing device 1 through an agreed communication method; then, in step S1, the processing device 1 receives the page access request in real time.
  • 2) When the processing device 1 is a network device, the processing device 1 sends, triggered by a predetermined condition or event or periodically, a request message for the original web page to be processed to a third-party device through an application programming interface (API) provided by that device, and the third-party device returns the original web page to be processed in response to the request message; alternatively, the third-party device actively pushes the original web page to be processed to the processing device 1. In either case, in step S1, the processing device 1 receives the original web page to be processed.
  • In step S2, the processing device 1 extracts block identification information, for example by string matching, from the markup language file of the original web page acquired in step S1, wherein the block identification information is used to identify each content block in the markup language file.
  • Here, the markup language file includes, but is not limited to:
  • HTML (Hypertext Markup Language) files;
  • XHTML (Extensible Hypertext Markup Language) files;
  • WML (Wireless Markup Language) files, WML being a descriptive markup language used to create pages that can be displayed in a WAP browser.
  • The above markup language files are only examples; other existing or future markup language files, if applicable to the present invention, are also included within the scope of the present invention and are incorporated herein by reference.
  • the block identification information includes, but is not limited to, an identification name, an identification ID, and the like; wherein the identification name may be named according to the type of the content block it identifies, such as a title, a navigation, a body, a picture, an embedded object (such as Java). Applet, ActiveX, Flash), etc.
  • A content block is a content area made up of one or more tags in a markup language file. It corresponds to specific content displayed in a web page, such as a title content block, a body content block, a navigation content block, an image content block, or an embedded object (such as a Java applet, ActiveX control, or Flash).
  • The block identification information may be stored in the markup language file in several ways, including but not limited to:
  1) In comments in the markup language file. For example, using the JSON format, the identification information may be stored in an HTML comment such as <!--tc block-begin:{type:"context"}-->. JSON is a lightweight data-interchange format that generally represents data as "name/value" pairs, with the name and value separated by ":".
  • 2) In custom tags in the markup language file. For example, a custom tag in an HTML file can be <tc></tc>, and the identification information can be stored in that custom tag.
  • For example, in step S2, the markup language file of the original web page acquired by the processing device 1 is an XHTML file, such as:
  • The XHTML file predefines the content block identification information through a tag attribute named markName. In step S2, the processing device 1 parses the XHTML file and performs string matching on the keyword "markName". This yields two results:
  1) the markName attribute of the div tag and its value "title", which is the identification name of the content block corresponding to the div tag;
  2) the markName attribute of the img tag and its value "picture", which is the identification name of the content block corresponding to the img tag.
  • In step S3, the processing device 1 performs a matching query in the processing rule base using the block identification information acquired in step S2. This obtains the content block processing rule corresponding to that block identification information.
  • For example, in step S3, the processing device 1 performs the matching query in a processing rule base located either locally or on a third-party device.
  • The processing rule includes, but is not limited to:
  • folding the content block, where folding means that the content of the block is hidden by default but can be expanded through a specific trigger;
  • The processing rule base holds each piece of block identification information together with its corresponding processing rule. It may be implemented as, but is not limited to, a relational database, a key-value storage system, or a file system.
  • For example, when the block identification information is "title", the processing device 1 performs a matching query in the local processing rule base using that block identification information. The query goes through an application programming interface (API) provided by the processing device 1.
  • The content block processing rule corresponding to the "title" block identification information is "show": the content block identified by that block identification information is displayed.
  • When the block identification information is "picture", the processing device 1 sends a processing rule acquisition request containing the block identification information to the third-party device.
  • The request can be encapsulated into a request message, such as an HTTP request message, and sent to the third-party device over a corresponding protocol such as HTTP or HTTPS. The third-party device listens for the request and receives and parses it in real time.
  • The third-party device then performs a matching query in the processing rule base using the extracted block identification information. It obtains the corresponding content block processing rule, "zoomin": the picture in the content block identified by the block identification information is reduced by a predetermined ratio.
  • The processing device 1 may also perform the matching query using both the block identification information and the identification information of the website to which the original web page belongs. This obtains a content block processing rule customized for web pages of that website.
  • the identification information of the website to which the original webpage belongs includes, but is not limited to, a website domain name, a website IP address, a website name, and the like.
  • For example, the processing device 1 determines the identification information of the website to which the web page belongs, such as its domain name or IP address, from the URL of the original web page acquired in step S1. It then performs a matching query in the processing rule base using this website identification information and the block identification information acquired in step S2. If the match yields a processing rule predefined for web pages of that website, that rule is used as the content block processing rule of the web page.
  • For example, suppose the block identification information is "embedded object" and the URL of the original web page is "www.abc.com/sport/101.htm". In step S3, the processing device 1 extracts from the URL the domain name of the website to which the web page belongs, "www.abc.com". It then performs a matching query in the processing rule base using the block identification information and this domain name, and obtains the processing rule "delete": the content block identified by the identification information is deleted.
  • In step S4, the processing device 1 processes the content block identified by the block identification information according to the content block processing rule acquired in step S3, to obtain the target web page.
  • Processing a content block includes, but is not limited to, formatting, displaying, deleting, folding, and reordering the content in the content block.
  • For example, in step S2, the processing device 1 parses the HTML file of a web page and obtains two pieces of block identification information, "body" and "picture". In step S3, the processing device 1 obtains the content block processing rule for each:
  1) "body": fold the content block identified by that information;
  2) "picture": reduce the image in the identified content block by a predetermined ratio.
  • In step S4, the processing device 1 locates the content block identified by each piece of identification information in the HTML file and applies the corresponding rule:
  1) The content of the block identified by "body" is folded and hidden. A predetermined trigger is set so that the body text can later be expanded and displayed.
  2) The image in the block identified by "picture" is reduced by the predetermined ratio and displayed.
  The processed web page is used as the target web page.
  • The processing device 1 operates continuously in steps S1, S2, S3 and S4. Specifically:
  1) In step S1, it continuously acquires original web pages to be processed.
  2) In step S2, it continuously extracts block identification information, which identifies each content block, from the markup language file of each original web page.
  3) In step S3, it continuously performs matching queries in the processing rule base using the block identification information, to obtain the corresponding content block processing rules.
  4) In step S4, it continuously processes the identified content blocks according to those rules, to obtain target web pages.
  • Here, "continuously" means that the processing device 1 keeps performing each step until a predetermined stop condition is met. The steps are acquiring original web pages, extracting block identification information, obtaining processing rules, and producing target web pages. An example stop condition is that the processing device 1 has not acquired any original web page to be processed for a long time.
  • The processing device 1 may also determine the content block processing rule from content-related information about the content block identified by the block identification information.
  • The content-related information of the content block includes, but is not limited to: 1) the location of the content block in the original web page;
  • For example, in step S3, the processing device 1 determines a processing rule from the location of the content block in the original web page. A content block located at the center of the original web page is considered highly important in that page, so its processing rule may be set to displaying the content block.
  • For example, in step S3, the processing device 1 determines a processing rule from the number of characters in the content block. If the identified content block has more characters than a predetermined threshold, the processing rule may be set to folding the text content of the content block;
  • For example, in step S3, the processing device 1 determines a processing rule from the tag objects contained in the content block. Suppose the identified content block contains the tag <object>, and that tag holds an object whose use is restricted on mobile devices, such as ActiveX. The processing rule is then set to deleting the content block.
  • For example, in step S3, the processing device 1 fails to obtain a content block processing rule from the processing rule base for the block identification information. It then parses the <object> tag and finds the attribute clsid, which shows that the tag contains an ActiveX embedded object. The processing rule is therefore set to deleting the content block identified by that information.
  • In step S5, the processing device 1 creates or updates the processing rule base according to the newly determined content block processing rule.
  • The operations of the processing device 1 in steps S1, S2, S3 and S4 are the same as those described above with reference to FIG. 3. For brevity, they are incorporated herein by reference and not repeated.
  • For example, suppose that in step S3 the processing device 1 finds no content block processing rule for the identification information and newly determines one instead. Then, in step S5, the processing device 1 writes the identification information and the newly determined rule into the processing rule base to update it. If the processing rule base has not yet been established, the device first initializes it and then writes this information into it.
  • For example, when the new processing rule obtained in step S3 for the identification name "embedded object" is deletion, in step S5 the processing device 1 inserts a data record into the processing rule base. The record contains this identification name and its processing rule.
  • The method further includes a step S6 (not shown).
  • In step S1, the processing device 1 acquires the original web page according to a page access request entered by the user through a mobile terminal; in step S6, the processing device 1 provides the target web page to the user.
  • Steps S2 to S4 run as follows:
  1) In step S2, the processing device 1 extracts block identification information, which identifies each content block, from the markup language file of the original web page.
  2) In step S3, it performs a matching query in the processing rule base using the block identification information, to obtain the corresponding content block processing rule.
  3) In step S4, it processes the identified content block according to that rule, to obtain the target web page.
  These processes are the same as in the embodiment described above with reference to FIG. For brevity, they are incorporated herein by reference.
  • For example, the mobile terminal acquires the web page URL entered by the user in real time. It records a page access request that corresponds to the user's input operation and includes the URL, then sends the request to the processing device 1 through an agreed communication method. In step S1, the processing device 1:
  1) receives the page access request in real time and extracts the page URL from it;
  2) sends a request for the web page to the web server that hosts the page the URL points to;
  3) receives the web page that the web server returns in response, and uses it as the original web page to be processed.
  • In step S6, the processing device 1 provides the target web page acquired in step S4 to the user through the mobile terminal. It may use any known way for a mobile terminal to present human-readable information, such as screen display or speaker playback.
  • For example, the processing device 1 provides the target web page acquired in step S4 to the mobile terminal in a certain order and format, using page technologies such as JSP, ASP or PHP.
  • The method further includes a step S7 (not shown) and a step S8 (not shown).
  • In step S7, the processing device 1 acquires display parameter information of the mobile terminal. In step S8, it optimizes the content block processing rule according to that information, to obtain a preferred content block processing rule.
  • In step S4, the processing device 1 processes the content block according to the preferred content block processing rule, to obtain the target web page.
  • For example, in step S7, the processing device 1 acquires the display parameter information of the mobile terminal that will display the target web page. It does so by calling, in an agreed manner, an application programming interface (API) provided by the mobile terminal;
  • The display parameter information includes, but is not limited to:
  • the image formats supported by the mobile terminal, such as JPEG, PNG, and GIF;
  • the screen parameters of the mobile terminal, such as its resolution, the physical size of a pixel, and the number of color bits;
  • In step S8, the processing device 1 optimizes the content block processing rule obtained in step S3 for each piece of identification information. The optimization uses the display parameter information of the mobile terminal acquired in step S7 and yields a preferred content block processing rule. Then, in step S4, the processing device 1 processes the content block according to the preferred rule to obtain the target web page.
  • For example:
  1) In step S2, the block identification information that the processing device 1 obtains from the markup language file is "Flash", so the identified content block contains a Flash animation.
  2) In step S3, the processing rule obtained from the processing rule base is to delete that Flash animation.
  3) In step S7, the acquired display parameter information shows that the mobile terminal supports running the Flash plug-in.
  4) In step S8, the processing device 1 therefore changes the rule for this identification information to retaining the Flash animation. This becomes the preferred content block processing rule.
  5) In step S4, the processing device 1 retains the Flash animation in the content block and obtains a target web page that includes it.
  • The above ways of obtaining the display parameter information, the preferred content block processing rule, and the target web page are only examples. Other existing or future ways of obtaining them, if applicable to the present invention, are also included within its scope and are incorporated herein by reference.
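The rule lookup in step S3, its content-based fallback, and the step S5 write-back can be sketched as follows. This is a minimal illustration under stated assumptions:
- The rule base is an in-memory dict keyed by (website, block identification name), where `None` stands for a generic rule.
- The rule names `show`, `fold`, `zoomin` and `delete` come from the examples above.
- The 500-character threshold and all helper names are hypothetical.
- The location-based heuristic is omitted because it needs page layout information.

```python
import re
from typing import Dict, Optional, Tuple
from urllib.parse import urlsplit

# Hypothetical rule base layout: (site, or None for a generic rule, block id) -> rule.
RuleBase = Dict[Tuple[Optional[str], str], str]


def site_of(url: str) -> str:
    """Website domain name taken from a page URL; a missing scheme is tolerated."""
    if not urlsplit(url).scheme:
        url = "//" + url  # lets "www.abc.com/sport/101.htm" parse with a host part
    return urlsplit(url).hostname or ""


def infer_rule(block_html: str, char_threshold: int = 500) -> str:
    """Fallback when the rule base has no entry: decide from the block's own content."""
    lowered = block_html.lower()
    # An <object> tag carrying a clsid reference indicates an ActiveX control.
    if "<object" in lowered and "clsid" in lowered:
        return "delete"
    # Count characters outside of tags; long text blocks are folded.
    text = re.sub(r"<[^>]+>", "", block_html)
    if len(text.strip()) > char_threshold:
        return "fold"
    return "show"


def lookup_rule(rules: RuleBase, block_id: str, page_url: str, block_html: str) -> str:
    # Prefer a rule customized for this website, then a generic rule.
    for key in ((site_of(page_url), block_id), (None, block_id)):
        if key in rules:
            return rules[key]
    # Step S5: record the newly determined rule so the next lookup hits the rule base.
    rule = infer_rule(block_html)
    rules[(None, block_id)] = rule
    return rule


rules: RuleBase = {
    (None, "title"): "show",
    (None, "picture"): "zoomin",
    ("www.abc.com", "embedded object"): "delete",
}
print(lookup_rule(rules, "embedded object", "www.abc.com/sport/101.htm", ""))  # delete
print(lookup_rule(rules, "ad", "www.xyz.com/a.htm", '<object classid="clsid:0000"></object>'))  # delete
print(rules[(None, "ad")])  # delete
```

The order of the key tuple encodes the preference described above: a site-specific rule wins over a generic one, and content inspection runs only when neither exists.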

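Steps S4 and S8 can be sketched on a well-formed XHTML fragment with Python's `xml.etree.ElementTree`. The sketch makes these assumptions:
- Blocks are marked with the `markName` attribute, as in the XHTML example described above.
- Folding is represented by an inline `display:none` style plus a hypothetical `data-foldable` attribute that a script could use as the expand trigger.
- `zoomin` scales numeric `width`/`height` attributes by an arbitrary 0.5 ratio.
- The display parameter key `supports_flash` is invented for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def local_name(tag: str) -> str:
    """Strip an XML namespace, e.g. '{http://www.w3.org/1999/xhtml}img' -> 'img'."""
    return tag.rsplit("}", 1)[-1]


def optimize_rule(rule: str, block_id: str, display: Dict[str, bool]) -> str:
    """Step S8 sketch: relax a rule when the terminal can render the content after all."""
    if rule == "delete" and block_id == "Flash" and display.get("supports_flash"):
        return "show"  # keep the Flash animation
    return rule


def apply_rules(xhtml: str, rules: Dict[str, str], display: Dict[str, bool],
                scale: float = 0.5) -> str:
    """Step S4 sketch: process each marked content block according to its optimized rule."""
    root = ET.fromstring(xhtml)
    parent = {child: node for node in root.iter() for child in node}
    for block in list(root.iter()):  # snapshot, because blocks may be removed
        block_id = block.get("markName")
        if block_id is None or block is root:
            continue
        rule = optimize_rule(rules.get(block_id, "show"), block_id, display)
        if rule == "delete":
            parent[block].remove(block)  # note: the element's .tail text goes with it
        elif rule == "fold":
            block.set("style", "display:none")  # overwrites any existing inline style
            block.set("data-foldable", "true")  # hypothetical hook for an expand trigger
        elif rule == "zoomin":
            for img in block.iter():
                if local_name(img.tag) == "img":
                    for dim in ("width", "height"):
                        value = img.get(dim, "")
                        if value.isdigit():
                            img.set(dim, str(int(int(value) * scale)))
        # "show" and unknown rules leave the block unchanged.
    return ET.tostring(root, encoding="unicode")


page = ('<body><div markName="title">News</div>'
        '<img markName="picture" src="a.png" width="400" height="200"/>'
        '<div markName="body">Long article text</div>'
        '<object markName="Flash" data="a.swf"/></body>')
rules = {"title": "show", "picture": "zoomin", "body": "fold", "Flash": "delete"}
print(apply_rules(page, rules, {"supports_flash": False}))
```

ElementTree needs well-formed markup, so real-world HTML would first have to be repaired or parsed with a more lenient parser.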
Abstract

The present invention is directed to a method and a device for processing webpage content on the basis of content block identification. The method comprises:
- acquiring an original webpage to be processed;
- extracting block identification information, which identifies content blocks, from a markup language file of the original webpage;
- performing a matching query in a processing rule base according to the block identification information, so as to acquire a content block processing rule corresponding to the block identification information; and
- performing corresponding processing on the content block identified by the block identification information according to the content block processing rule, so as to acquire a target webpage.

Compared with the prior art, the present invention processes webpage content quickly, improving webpage conversion efficiency and quality and thus the user experience. Moreover, the markup language file of a webpage only needs to contain the block identification information and not the corresponding processing rules, so the webpage maintenance burden on the website is reduced.

Description

Method and device for processing webpage content based on content block identification
Technical Field
The present invention relates to the field of Internet technologies, and in particular to a technology for processing webpage content based on content block identification.
Background Art
In the prior art, webpage content processing typically works as follows. For example, when a webpage intended for display on a desktop computer is converted into one suitable for display on a mobile terminal, the main content is extracted from the parsed Internet webpage. A new webpage is then generated from the extracted content, turning the original webpage suited to desktop display into a target webpage suited to mobile display. However, webpage conversion with this method is inefficient and costly in processing time. This slows the response to page access requests from mobile terminal users and degrades the user experience.
Therefore, how to process page content quickly and effectively has become one of the problems that urgently need to be solved.
Summary of the Invention
It is an object of the present invention to provide a method and device for processing webpage content based on content block identification.
According to one aspect of the present invention, a computer-implemented method for processing webpage content based on content block identification is provided, the method comprising the following steps:
a) obtaining the original webpage to be processed;
b) extracting block identification information from the markup language file of the original webpage, where the block identification information is used to identify each content block in the markup language file;
c) performing a matching query in the processing rule base according to the block identification information, to obtain a content block processing rule corresponding to the block identification information;
d) performing corresponding processing on the content block identified by the block identification information according to the content block processing rule, to obtain a target webpage.
According to another aspect of the present invention, a device for processing webpage content based on content block identification is also provided, the device comprising:
an original webpage obtaining means, configured to obtain the original webpage to be processed;
an identification information extracting means, configured to extract block identification information from the markup language file of the original webpage, where the block identification information is used to identify each content block in the markup language file;
a processing rule obtaining means, configured to perform a matching query in the processing rule base according to the block identification information, to obtain a content block processing rule corresponding to the block identification information;
a target webpage obtaining means, configured to perform corresponding processing on the content block identified by the block identification information according to the content block processing rule, to obtain a target webpage.
Compared with the prior art, the present invention takes the block identification information of each content block in the markup language file (such as an HTML or XHTML file) of the acquired original webpage. It uses that information to perform a matching query in the processing rule base and obtain the corresponding content block processing rule. It then applies processing such as folding, deleting, or formatting to each content block, so that page content is processed quickly. This improves page conversion efficiency and quality, and thus the user experience. Moreover, a page's markup language file only needs to include the block identification information and not the corresponding processing rules, which reduces the burden of webpage maintenance on the website.
Brief Description of the Drawings
Other features, objects, and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
FIG. 1 shows a schematic diagram of a device for processing webpage content based on content block identification according to one aspect of the present invention;
FIG. 2 shows a schematic diagram of a device for processing webpage content based on content block identification according to a preferred embodiment of the present invention;
FIG. 3 shows a flow chart of a method for processing webpage content based on content block identification according to another aspect of the present invention;
FIG. 4 shows a flow chart of a method for processing webpage content based on content block identification according to a preferred embodiment of the present invention.
Identical or similar reference numerals in the drawings denote identical or similar components.
Detailed Description
The present invention is described in further detail below with reference to the accompanying drawings.
FIG. 1 shows a schematic diagram of a device for processing webpage content based on content block identification according to one aspect of the present invention. The processing device 1 includes an original webpage obtaining means 11, an identification information extracting means 12, a processing rule obtaining means 13, and a target webpage obtaining means 14.
Here, the processing device 1 may be a network device, including but not limited to a computer, a network host, a single network server, a set of two or more network servers, or a cloud composed of two or more servers. Such a cloud consists of a large number of computers or network servers based on cloud computing, a form of distributed computing in which a group of loosely coupled computers forms one virtual supercomputer. The processing device 1 may also be a mobile terminal, meaning a computer device that can be used on the move. Examples include, but are not limited to, a mobile phone, notebook computer, POS terminal, or vehicle-mounted computer, whose display screen is usually much smaller than that of a desktop computer.
The process by which the processing device 1 processes webpage content is described in detail below with reference to FIG. 1:
Specifically, the original webpage obtaining means 11 obtains the original webpage to be processed.
Here, the ways of obtaining the original webpage to be processed include, but are not limited to, the following:
1) According to a page access request from a mobile terminal, the corresponding original webpage is obtained from the website server pointed to by the uniform resource locator (URL) in the page access request.
In one example, the user first interacts with the browser or client software of the mobile terminal through the terminal's input devices. These include but are not limited to a keyboard, mouse, remote control, touch pad, or handwriting device. Taking the keyboard as an example, when the user types in the browser's address bar, the mobile terminal acquires the user's key sequence in real time, such as a URL entered by the user. It records this as a page access request that corresponds to the input operation and includes the URL, and then sends the request to the processing device 1 through an agreed communication method.
Next, the original webpage obtaining means 11 receives the page access request in real time and extracts the page URL from it. It then sends a request for the webpage to the web server that hosts the page the URL points to. For example, the request can be encapsulated as a request message, such as an HTTP request message, and sent to the web server over a corresponding protocol such as HTTP or HTTPS. Finally, the original webpage obtaining means 11 receives the webpage that the web server returns in response and uses it as the original webpage to be processed.
2) Obtaining the original webpage to be processed from a third-party device.
In another example, the processing device 1 is a network device. The original webpage obtaining means 11 uses an application programming interface (API) provided by a third-party device to send that device a request message for the original webpage to be processed. The request is sent periodically or when triggered by a predetermined condition or event, and the means then receives the webpage that the third-party device returns in response. Alternatively, the third-party device actively pushes the original webpage to be processed to the processing device 1, and the original webpage obtaining means 11 receives it.
Those skilled in the art should understand that the above ways of obtaining the original webpage to be processed are only examples. Other existing or future ways, if applicable to the present invention, are also included within its scope and are incorporated herein by reference.
Next, the identification information extracting means 12 extracts block identification information, for example by string matching, from the markup language file of the original webpage obtained by the original webpage obtaining means 11. The block identification information identifies each content block in the markup language file.
Here, the markup language file includes but is not limited to:
1) an HTML (Hypertext Markup Language) file; HTML is an application of the Standard Generalized Markup Language (SGML) used to describe webpage documents;
2) an XML (Extensible Markup Language) file; XML is a simplified subset of SGML used for data storage;
3) an XHTML (Extensible Hypertext Markup Language) file; XHTML is an XML-based markup language with strict syntax;
4) a WML (Wireless Markup Language) file; WML is a descriptive markup language used to create pages that can be displayed in a WAP browser.
Those skilled in the art should understand that the above markup language files are merely examples; other existing or future markup language files, if applicable to the present invention, also fall within the scope of protection of the present invention and are incorporated herein by reference.
Here, the block identification information includes but is not limited to an identification name, an identification ID and the like. The identification name may be chosen according to the type of content block it identifies, such as title, navigation, body text, picture, or embedded object (e.g., Java applet, ActiveX, Flash).
Here, a content block is a content region of the markup language file made up of at least one tag and corresponding to specific content displayed in the webpage, for example a title content block, a body content block, a navigation content block, a picture content block, or an embedded-object (e.g., Java applet, ActiveX, Flash) block.
Here, the ways of storing the block identification information in the markup language file include but are not limited to:
1) comments in the markup language file; for example, using JSON-style notation, the identification information can be stored in an HTML comment such as <!-- tc block-begin: {type: "context"} -->. JSON is a lightweight data-interchange format that generally represents data as "name/value" pairs, with a colon (":") separating each name from its value;
2) custom tags in the markup language file; for example, in an HTML file the custom tag may be <tc></tc>, and the identification information can be stored in that custom tag;
3) tag attributes in the markup language file; for example, in an XHTML file the identification information can be stored in an attribute of the content block's tag, such as <div markName="title">, where the value of the markName attribute is the identification information for the content block corresponding to this div tag.
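The comment and custom-tag storage forms can be read with plain string matching, as the text suggests. This Python sketch assumes the comment marker is spelled "tc block-begin:" and that the <tc> tag wraps the identification text directly; both spellings are assumptions, since the exact delimiters are only partly legible in the source. Because the example payload uses unquoted keys ({type: "context"} is JavaScript object syntax rather than strict JSON), the sketch quotes bare keys before calling json.loads:

```python
import json
import re
from typing import Dict, List

# Assumed marker spellings; the source's delimiters are partly illegible.
COMMENT_MARKER = re.compile(r"<!--\s*tc block-begin:\s*(\{.*?\})\s*-->", re.S)
CUSTOM_TAG = re.compile(r"<tc>(.*?)</tc>", re.S)
BARE_KEY = re.compile(r"([{,]\s*)([A-Za-z_]\w*)\s*:")


def parse_loose_json(text: str) -> Dict[str, str]:
    """Quote bare keys ({type: "context"} -> {"type": "context"}), then parse.
    Simplistic: text inside string values that looks like ', word:' would be
    mis-quoted."""
    return json.loads(BARE_KEY.sub(r'\1"\2":', text))


def extract_from_comments(markup: str) -> List[Dict[str, str]]:
    """Identification info stored in marker comments (storage form 1)."""
    return [parse_loose_json(m) for m in COMMENT_MARKER.findall(markup)]


def extract_from_custom_tags(markup: str) -> List[str]:
    """Identification info stored inside <tc>...</tc> tags (storage form 2)."""
    return [t.strip() for t in CUSTOM_TAG.findall(markup)]
```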
Those skilled in the art should understand that the above storage methods are merely examples; other existing or future storage methods, if applicable to the present invention, also fall within the scope of protection of the present invention and are incorporated herein by reference.
In one example, the markup language file of the original webpage obtained by the identification information extracting means 12 is the following XHTML file:
<body>
<div markName="title"> <h2>News headline</h2>
<p>
China's largest, most influential and most authoritative program...
</p>
</div>
<img src="flower.jpg" markName="picture" />
</body>
This XHTML file is predefined to store content block identification information in a tag attribute named markName. Accordingly, the identification information extracting means 12 parses the XHTML file and performs string matching on the keyword "markName". It thereby obtains the markName attribute of the div tag, whose value "title" is the identification name of the content block corresponding to the div tag, and the markName attribute of the img tag, whose value "picture" is the identification name of the content block corresponding to the img tag.
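As an alternative to raw string matching, the same markName lookup can be done with Python's standard html.parser module, which copes with attribute order, quoting and self-closing tags. This is a sketch of that swapped-in technique, not the method the text prescribes; the class and function names are hypothetical:

```python
from html.parser import HTMLParser
from typing import List, Tuple


class MarkNameExtractor(HTMLParser):
    """Collects (tag, markName value) pairs from every start or self-closing tag."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases attribute names, so markName arrives as "markname".
        # Self-closing tags such as <img ... /> are routed here as well.
        for name, value in attrs:
            if name == "markname" and value is not None:
                self.blocks.append((tag, value.strip()))


def extract_mark_names(markup: str) -> List[Tuple[str, str]]:
    parser = MarkNameExtractor()
    parser.feed(markup)
    parser.close()
    return parser.blocks
```

Stripping the value tolerates stray whitespace such as markName="picture ".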
Those skilled in the art should understand that the above ways of extracting block identification information are merely examples; other existing or future ways of extracting it, if applicable to the present invention, also fall within the scope of protection of the present invention and are incorporated herein by reference.
Then, the processing rule obtaining means 13 performs a matching query in a processing rule base using the block identification information obtained by the identification information extracting means 12, to obtain the content block processing rule corresponding to that block identification information.
Specifically, the processing rule obtaining means 13 performs this matching query, based on the block identification information, in a processing rule base located either locally or on a third-party device.
Here, the processing rules include but are not limited to:
1) formatting the content in the content block, where the formatting includes but is not limited to:
i) changing text attributes in the content block, such as font, size and color, or the background color of the content;
ii) scaling down pictures contained in the content block by a predetermined ratio;
2) displaying the content block;
3) deleting the content block;
4) folding the content block, meaning that the block's content is collapsed and hidden by default but can be expanded and displayed through a specific trigger;
5) adjusting the display position of the content block.
Those skilled in the art should understand that the above processing rules are merely examples; other existing or future processing rules, if applicable to the present invention, also fall within the scope of protection of the present invention and are incorporated herein by reference.
Here, the processing rule base contains each piece of block identification information together with its corresponding processing rule; it may be implemented as, but is not limited to, a relational database, a key-value storage system, a file system, and the like.
In one example, the block identification information is "title". Based on it, the processing rule obtaining means 13 performs a matching query in the local processing rule base through an API provided by processing device 1 and obtains "show" as the content block processing rule corresponding to the "title" block identification information, i.e., the content block identified by that block identification information is to be displayed.
In another example, the block identification information is "picture". The processing rule obtaining means 13 sends the third-party device a processing rule acquisition request containing the block identification information; for example, the request can be encapsulated as a request message, such as an HTTP request message, and sent over a corresponding communication protocol such as HTTP or HTTPS. The third-party device, which listens for requests in real time, receives and parses the request, then performs a matching query in its processing rule base using the extracted block identification information and obtains "zoomin" as the corresponding content block processing rule, i.e., pictures in the content block identified by that block identification information are to be scaled down by a predetermined amount.
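The two lookup locations in these examples can be sketched as a small Python function. Here the local rule base is a plain dict and the third-party lookup is any callable (in practice it might wrap an HTTP(S) request); trying the local base first and the remote one second is an assumption of this sketch, since the text describes them as alternatives:

```python
from typing import Callable, Dict, Optional

# A remote lookup takes a block identification name and returns a rule or None.
RemoteLookup = Callable[[str], Optional[str]]


def get_processing_rule(block_id: str,
                        local_rules: Dict[str, str],
                        remote_lookup: Optional[RemoteLookup] = None) -> Optional[str]:
    """Match the block identification information against the local rule
    base first, then (if configured) against a third-party rule base."""
    rule = local_rules.get(block_id)
    if rule is None and remote_lookup is not None:
        rule = remote_lookup(block_id)
    return rule
```

For example, with local rules {"title": "show"} and a remote lookup backed by {"picture": "zoomin"}, "title" resolves locally to "show" and "picture" resolves remotely to "zoomin".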
Those skilled in the art should understand that the above ways of obtaining processing rules are merely examples; other existing or future ways of obtaining them, if applicable to the present invention, also fall within the scope of protection of the present invention and are incorporated herein by reference.
Preferably, the processing rule obtaining means 13 performs the matching query in the processing rule base based on both the block identification information and the identification information of the website to which the original webpage belongs, to obtain a content block processing rule customized for that website's webpages. Here, the identification information of the website includes but is not limited to the website's domain name, IP address, name, and the like.
Specifically, the processing rule obtaining means 13 determines the identification information of the website to which the webpage belongs, such as its domain name or IP address, for example from the URL of the original webpage obtained by the original webpage obtaining means 11. The processing rule obtaining means 13 then performs a matching query in the processing rule base using the block identification information obtained by the identification information extracting means 12 together with the website's identification information; if this yields a processing rule predetermined for that website's webpages, the predetermined rule is used as the content block processing rule for the webpage.
In one example, the block identification information is "embedded object" and the URL of the original webpage is "www.abc.com/sport/101.htm". From this URL, the processing rule obtaining means 13 extracts "www.abc.com" as the domain name of the website hosting the webpage. A matching query in the processing rule base on the block identification information alone yields the processing rule "delete", i.e., delete the content block identified by that information; however, a matching query on the block identification information together with the website's domain name yields "show" as the rule predetermined for this website's "embedded object" blocks, i.e., display the identified content block. The processing rule obtaining means 13 therefore ignores the deletion rule associated with the block identification information alone and uses the rule predetermined for the website as the content block processing rule.
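The site-specific override in this example can be sketched in Python as follows. Keying site rules by a (domain, block identification) pair and prefixing "http://" to scheme-less URLs are assumptions of the sketch; the function names are hypothetical:

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlsplit


def site_domain(url: str) -> Optional[str]:
    """Return the host name of the website the page belongs to."""
    if "://" not in url:  # the example URL is written without a scheme
        url = "http://" + url
    return urlsplit(url).hostname


def resolve_rule(block_id: str, page_url: str,
                 generic_rules: Dict[str, str],
                 site_rules: Dict[Tuple[str, str], str]) -> Optional[str]:
    """Prefer a rule customized for (domain, block id); otherwise fall back
    to the generic rule for the block id."""
    site_rule = site_rules.get((site_domain(page_url), block_id))
    if site_rule is not None:
        return site_rule
    return generic_rules.get(block_id)
```

With the generic rule "delete" and the site rule "show" for ("www.abc.com", "embedded object"), a page under www.abc.com resolves to "show", while pages from other sites keep "delete".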
Those skilled in the art should understand that the above ways of obtaining processing rules are merely examples; other existing or future ways of obtaining them, if applicable to the present invention, also fall within the scope of protection of the present invention and are incorporated herein by reference.
Then, the target webpage obtaining means 14 processes the content block identified by the block identification information according to the content block processing rule obtained by the processing rule obtaining means 13, to obtain the target webpage.
Here, processing a content block includes but is not limited to formatting, displaying, deleting, folding and reordering the content of the content block.
In one example, the identification information extracting means 12 parses the HTML file of a webpage and obtains two pieces of block identification information, "body" and "picture". The processing rule obtaining means 13 obtains, as the content block processing rule for "body", folding the content block identified by that information, and, as the rule for "picture", scaling down the pictures in the identified content block by a predetermined ratio. Using this identification information, the target webpage obtaining means 14 locates in the HTML file the content block identified by each piece of identification information. Then, according to the corresponding rules, it folds and hides the content of the "body" block, setting a predetermined trigger so that the body text can later be expanded and displayed, and it scales down the pictures in the "picture" block by the predetermined ratio and displays them. The processed webpage is used as the target webpage.
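The processing step in this example can be sketched in Python as below. The ContentBlock fields, the 0.5 zoom ratio, treating blocks without a rule as "show", and using the HTML5 <details>/<summary> element as the fold trigger are all assumptions of this sketch (older WAP browsers would need a different trigger, such as a script or a link):

```python
import html
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ContentBlock:
    mark_name: str                      # block identification name, e.g. "body"
    text: str = ""                      # text content of the block
    image_src: Optional[str] = None     # picture in the block, if any
    image_width: Optional[int] = None   # original picture width in pixels


def render_block(block: ContentBlock, rule: str, zoom_ratio: float = 0.5) -> Optional[str]:
    """Apply one content block processing rule; None means the block is deleted."""
    if rule == "delete":
        return None
    body = html.escape(block.text)
    if block.image_src is not None:
        attrs = f'src="{html.escape(block.image_src)}"'
        if block.image_width is not None:
            width = block.image_width
            if rule == "zoomin":  # scale the picture down by the predetermined ratio
                width = int(width * zoom_ratio)
            attrs += f' width="{width}"'
        body += f"<img {attrs}>"
    if rule == "fold":
        # Collapsed by default; activating the summary expands the content.
        return f"<details><summary>{html.escape(block.mark_name)}</summary>{body}</details>"
    return f"<div>{body}</div>"


def build_target_page(blocks: List[ContentBlock], rules: Dict[str, str]) -> str:
    """Process every block in document order; blocks without a rule are shown."""
    parts = [render_block(b, rules.get(b.mark_name, "show")) for b in blocks]
    return "\n".join(p for p in parts if p is not None)
```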
Those skilled in the art should understand that the above ways of obtaining the target webpage are merely examples; other existing or future ways of obtaining it, if applicable to the present invention, also fall within the scope of protection of the present invention and are incorporated herein by reference.
Preferably, the original webpage obtaining means 11, the identification information extracting means 12, the processing rule obtaining means 13 and the target webpage obtaining means 14 operate continuously. Specifically, the original webpage obtaining means 11 continuously obtains original webpages to be processed; the identification information extracting means 12 continuously extracts block identification information, which identifies each content block in the markup language file, from the markup language files of those webpages; the processing rule obtaining means 13 continuously performs matching queries in the processing rule base based on the block identification information to obtain the corresponding content block processing rules; and the target webpage obtaining means 14 continuously processes the content blocks identified by the block identification information according to those rules to obtain target webpages. Those skilled in the art should understand that "continuously" means each means keeps obtaining original webpages, extracting block identification information, obtaining processing rules and obtaining target webpages until a predetermined stop condition is met, for example when the original webpage obtaining means 11 has stopped obtaining original webpages to be processed for a relatively long time.
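Continuous operation with an idle-time stop condition can be sketched in Python with a standard queue. Modeling the four means as a single process callable and "a relatively long time" as a fixed idle_timeout are assumptions of this sketch:

```python
import queue
from typing import Callable, List


def run_pipeline(pages: "queue.Queue[str]",
                 process: Callable[[str], str],
                 idle_timeout: float = 1.0) -> List[str]:
    """Keep taking original pages and producing target pages until no new
    page has arrived for idle_timeout seconds (the stop condition)."""
    targets: List[str] = []
    while True:
        try:
            page = pages.get(timeout=idle_timeout)
        except queue.Empty:
            break  # no page obtained for idle_timeout seconds: stop
        # process stands in for extraction, rule lookup and block processing.
        targets.append(process(page))
    return targets
```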
Preferably (referring to FIG. 1), when no content block processing rule is obtained from the processing rule base, the processing rule obtaining means 13 may determine the content block processing rule from content-related information of the content block identified by the block identification information. Here, the content-related information of the content block includes but is not limited to:
1) position information of the content of the content block within the original webpage;
2) the number of text characters contained in the content of the content block;
3) tag information contained in the content block.
Those skilled in the art should understand that the above content-related information is merely an example; other existing or future content-related information, if applicable to the present invention, also falls within the scope of protection of the present invention and is incorporated herein by reference.
1) The processing rule obtaining means 13 determines the processing rule according to the position of the content block in the original webpage. For example, if the content block identified by the block identification information is located at the center of the original webpage, indicating that the block is highly important within that webpage, the content block processing rule may be determined to be displaying the content block;
2) The processing rule obtaining means 13 determines the processing rule according to the number of text characters in the content block. For example, if the number of characters in the identified content block exceeds a predetermined character-count threshold, the processing rule may be determined to be folding the text content of the content block;
3) The processing rule obtaining means 13 determines the processing rule according to the tag objects contained in the content block. For example, if the content block identified by the block identification information in the markup language file of the original webpage contains an <object> tag, and that <object> tag holds an object whose use is restricted on mobile devices, such as ActiveX, the processing rule is determined to be deleting the content block.
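The three heuristics above can be combined into one fallback function. In this Python sketch the BlockInfo fields, the 2000-character threshold, the 0.25 "near center" margin, the restricted-object list and the check order (tags first, then length, then position) are all assumptions; the text does not fix any of them:

```python
from dataclasses import dataclass, field
from typing import Optional, Set

RESTRICTED_ON_MOBILE = {"activex"}  # hypothetical list of object kinds blocked on mobile


@dataclass
class BlockInfo:
    center_x: float   # block center as a fraction of page width (0.0 to 1.0)
    center_y: float   # block center as a fraction of page height (0.0 to 1.0)
    char_count: int   # number of text characters in the block
    tags: Set[str] = field(default_factory=set)          # tag names inside the block
    object_kinds: Set[str] = field(default_factory=set)  # e.g. {"activex"}


def infer_rule(info: BlockInfo, fold_threshold: int = 2000,
               center_margin: float = 0.25) -> Optional[str]:
    """Fallback rule when the processing rule base has no match.
    Returns None when no heuristic applies."""
    # 3) tag information: restricted embedded objects are deleted.
    if "object" in info.tags and info.object_kinds & RESTRICTED_ON_MOBILE:
        return "delete"
    # 2) character count: long text is folded.
    if info.char_count > fold_threshold:
        return "fold"
    # 1) position: a block near the page center is treated as important and shown.
    if abs(info.center_x - 0.5) <= center_margin and abs(info.center_y - 0.5) <= center_margin:
        return "show"
    return None
```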
In one example, the HTML file of the original webpage contains the following code snippet:
<!-- tc block-begin: {markName: "embedded object"} -->
<OBJECT
classid="clsid:2F390484-1C7D-11D0-8908-00A0C90395F4" codebase="ActiveXDoc.cab#version=1,0,0,0">
</OBJECT>
<!-- tc block end -->
The block identification information here is "embedded object". Using it, the processing rule obtaining means 13 finds no matching content block processing rule in the processing rule base. It then parses the <object> tag, finds that the tag's classid attribute carries a "clsid:" class identifier, and concludes that the block contains an ActiveX embedded object. It therefore determines that the processing rule corresponding to this block identification information is to delete the content block it identifies.
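The ActiveX check in this example can be sketched with Python's html.parser: an <object> tag whose classid value starts with "clsid:" is treated as an ActiveX control. Treating every clsid-based object as ActiveX, and returning None when nothing restricted is found, are simplifying assumptions; the names are hypothetical:

```python
from html.parser import HTMLParser
from typing import Optional


class ActiveXDetector(HTMLParser):
    """Flags <object> tags whose classid attribute names a COM class (clsid:...),
    which is how ActiveX controls are embedded."""

    def __init__(self) -> None:
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names ("OBJECT" -> "object").
        if tag != "object":
            return
        for name, value in attrs:
            if name == "classid" and value and value.lower().startswith("clsid:"):
                self.found = True


def rule_for_unmatched_block(block_markup: str) -> Optional[str]:
    """Fallback used when the rule base has no entry for the block."""
    detector = ActiveXDetector()
    detector.feed(block_markup)
    detector.close()
    return "delete" if detector.found else None
```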
Those skilled in the art should understand that the above ways of determining processing rules are merely examples; other existing or future ways of determining them, if applicable to the present invention, also fall within the scope of protection of the present invention and are incorporated herein by reference.
FIG. 2 shows a schematic diagram of a device for processing webpage content based on content block identification according to a preferred embodiment of the present invention, in which processing device 1 further comprises an updating means 15'. The updating means 15' establishes or updates the processing rule base according to the newly determined content block processing rule.
Here, the functions of means 11', 12', 13' and 14' shown in FIG. 2 are the same as those of means 11, 12, 13 and 14 described above with reference to FIG. 1; for brevity, they are incorporated herein by reference and not repeated.
Specifically, when the processing rule obtaining means 13' cannot obtain a corresponding content block processing rule from the processing rule base using the identification information and instead newly determines a content block processing rule for it, the updating means 15' writes the identification information and its newly determined processing rule into the processing rule base to update it. If the updating means 15' detects that the processing rule base has not yet been established, it first initializes the processing rule base and then writes the information into it.
In one example, when the new processing rule obtained by the processing rule obtaining means 13' for the tag name "embedded object" is deletion, the updating means 15' inserts into the processing rule base a data record containing that tag name and its processing rule.
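Since the rule base may be a relational database, the update step can be sketched with Python's built-in sqlite3 module. The table and column names are hypothetical; CREATE TABLE IF NOT EXISTS stands in for "initialize the rule base if it is not yet established", and INSERT OR REPLACE covers both establishing and updating a record:

```python
import sqlite3
from typing import Optional


def save_rule(conn: sqlite3.Connection, mark_name: str, rule: str) -> None:
    """Write a newly determined rule; create the rule base first if needed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processing_rules ("
        "mark_name TEXT PRIMARY KEY, rule TEXT NOT NULL)"
    )
    # Inserts a new record, or replaces the rule if the name already exists.
    conn.execute(
        "INSERT OR REPLACE INTO processing_rules (mark_name, rule) VALUES (?, ?)",
        (mark_name, rule),
    )
    conn.commit()


def load_rule(conn: sqlite3.Connection, mark_name: str) -> Optional[str]:
    """Look up a rule; returns None if the name or the table does not exist."""
    try:
        row = conn.execute(
            "SELECT rule FROM processing_rules WHERE mark_name = ?", (mark_name,)
        ).fetchone()
    except sqlite3.OperationalError:  # rule base not created yet
        return None
    return row[0] if row else None
```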
Those skilled in the art should understand that the above ways of establishing or updating the processing rule base are merely examples; other existing or future ways of doing so, if applicable to the present invention, also fall within the scope of protection of the present invention and are incorporated herein by reference.
In another preferred embodiment (referring to FIG. 1), processing device 1 further comprises a providing means (not shown). The original webpage obtaining means 11 obtains the original webpage according to a page access request input by a user through a mobile terminal, and the providing means provides the target webpage to the user.
This other preferred embodiment is described in detail below with reference to FIG. 1. The identification information extracting means 12 extracts block identification information, which identifies each content block in the markup language file, from the markup language file of the original webpage; the processing rule obtaining means 13 then performs a matching query in the processing rule base based on the block identification information to obtain the corresponding content block processing rule; and the target webpage obtaining means 14 processes the content block identified by the block identification information according to that rule to obtain the target webpage. These processes are the same as those performed by the identification information extracting means 12, the processing rule obtaining means 13 and the target webpage obtaining means 14 in the embodiment described above with reference to FIG. 1; for brevity, they are incorporated herein by reference and not repeated.
In one example, when a user types into the address bar of the browser on a mobile terminal, the mobile terminal captures the webpage URL entered by the user in real time and records it as a page access request corresponding to the user's input. The page access request contains the URL, and the mobile terminal sends it to processing device 1 by an agreed communication method. The original webpage obtaining means 11 receives the page access request in real time, extracts the page URL from it, sends a request for the webpage to the web server hosting the webpage the URL points to, receives the webpage that the web server returns in response, and uses that webpage as the original webpage to be processed.
The providing means provides the target webpage obtained by the target webpage obtaining means 14 to the user through the mobile terminal, using any known technique by which a mobile terminal presents human-readable information, such as screen display or speaker playback. Taking screen display as an example, the providing means delivers the target webpage to the mobile terminal in a certain order and format using page technologies such as JSP, ASP or PHP, for example as a link or a displayed page, for the user to browse.
Those skilled in the art should understand that the above ways of obtaining the original webpage and/or providing the target webpage are merely examples; other existing or future ways of doing so, if applicable to the present invention, also fall within the scope of protection of the present invention and are incorporated herein by reference.
Preferably (referring to FIG. 1), processing device 1 further comprises a parameter obtaining means (not shown) and a preferred rule obtaining means (not shown). The parameter obtaining means obtains display parameter information of the mobile terminal; the preferred rule obtaining means optimizes the content block processing rule according to the display parameter information to obtain a preferred content block processing rule; and the target webpage obtaining means 14 processes the content block according to the preferred content block processing rule to obtain the target webpage.
Specifically, the parameter obtaining means obtains the display parameter information of the mobile terminal by calling, in an agreed manner, an API (application programming interface) provided by the mobile terminal on which the target webpage is to be displayed. Here, the display parameter information includes but is not limited to:
1) the image formats supported by the mobile terminal, such as JPEG, PNG and GIF;
2) the screen resolution of the mobile terminal, together with properties such as the physical pixel size and the color bit depth;
3) whether the mobile terminal supports plug-ins, such as the Flash plug-in.
Next, the preferred-rule obtaining means optimizes the content block processing rule that the processing rule obtaining means 13 obtained for each piece of identification information. It bases this on the display parameter information obtained by the parameter obtaining means, and the result is a preferred content block processing rule. The target web page obtaining means 14 then processes the content block according to the preferred content block processing rule to obtain the target web page.
In one example, the block identification information in the markup language file obtained by the identification information obtaining means 12 is "Flash", and the content block it identifies contains a Flash animation. The processing rule obtained by the processing rule obtaining means 13 from the processing rule base is to delete the Flash animation identified by that identification information. However, the display parameter information obtained by the parameter obtaining means may show that the mobile terminal supports running the Flash plug-in. In that case, the preferred-rule obtaining means optimizes the original processing rule for that identification information into a rule that keeps the Flash animation in the content block; this is the preferred content block processing rule. The target web page obtaining means 14 then keeps the Flash animation when processing the content block, and obtains a target web page that contains the Flash animation.
Those skilled in the art will understand that the above ways of obtaining the display parameter information, obtaining the preferred content block processing rule, and/or obtaining the target web page are merely examples. Other existing or future ways of obtaining display parameter information, obtaining preferred content block processing rules, and/or obtaining the target web page, if applicable to the present invention, also fall within the scope of protection of the present invention and are incorporated herein by reference.
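The rule optimization in the Flash example can be sketched as a small function. It is a minimal, non-authoritative sketch: the `DisplayParams` container, its field names, and `optimize_rule` are hypothetical and stand in for whatever the terminal's API actually reports. Only the Flash case from the example is modeled; every other rule passes through unchanged.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DisplayParams:
    """Hypothetical container for the display parameters listed above."""
    image_formats: set[str] = field(default_factory=set)  # e.g. {"JPEG", "PNG", "GIF"}
    resolution: tuple[int, int] = (0, 0)                   # (width, height) in pixels
    color_depth: int = 0                                   # bits per pixel
    plugins: set[str] = field(default_factory=set)         # e.g. {"Flash"}


def optimize_rule(block_id: str, rule: str, params: DisplayParams) -> str:
    """Return the preferred rule for one block, given the terminal's parameters."""
    # Flash example: the generic "delete" rule is relaxed to "show"
    # when the terminal reports that it can run the Flash plug-in.
    if block_id == "Flash" and rule == "delete" and "Flash" in params.plugins:
        return "show"
    # In this sketch, all other rules are kept as obtained from the rule base.
    return rule
```

Because the original rule is passed in rather than replaced wholesale, the rule base stays device-independent, and only the final step adapts to the terminal.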
Fig. 3 shows a flowchart of a method for processing web page content based on content block identification according to one aspect of the present invention.
Here, the processing device 1 may be a network device, including but not limited to a computer, a network host, a single network server, a set of multiple network servers, or a cloud composed of multiple servers. Such a cloud consists of a large number of computers or network servers based on cloud computing. Cloud computing is a form of distributed computing: a super virtual computer made up of a group of loosely coupled computers. The processing device 1 may also be a mobile terminal, meaning a computing device that can be used on the move. Examples include, but are not limited to, a mobile phone, a notebook computer, a POS terminal, or an in-vehicle computer, whose display is usually much smaller than a desktop computer monitor.
The process by which the processing device 1 processes web page content is described in detail below with reference to Fig. 3:
Specifically, in step S1, the processing device 1 obtains the original web page to be processed.
Here, the ways of obtaining the original web page to be processed include, but are not limited to, the following:
1) In response to a page access request from the mobile terminal, obtaining the corresponding original web page from the web server pointed to by the uniform resource locator (URL) in that request.
In one example, the user first interacts with browser software or client software on the mobile terminal through an input device, including but not limited to a keyboard, mouse, remote control, touch pad, or handwriting device. Taking the keyboard as an example, while the user types into the address bar of the mobile terminal's browser, the mobile terminal captures the key sequence in real time, for example a URL entered by the user. It records this as a page access request corresponding to the input operation, and the request includes the URL. The mobile terminal then sends the page access request to the processing device 1 by an agreed communication method. Next, in step S1, the processing device 1 receives the page access request in real time and extracts the page URL from it. It then sends a request for the web page to the web server hosting the page that the URL points to. For example, it may encapsulate this as a request message such as an HTTP request and send it to the web server over a corresponding protocol such as HTTP or HTTPS. The processing device 1 then receives the web page returned by the web server in response to the request and uses it as the original web page to be processed.
2) Obtaining the original web page to be processed from a third-party device.
In another example, the processing device 1 is a network device. In step S1, the processing device 1 uses an application programming interface (API) provided by a third-party device to send that device a request message for an original web page to be processed. It sends this request either periodically or when triggered by a predetermined condition or event, and then receives the original web page that the third-party device returns in response. Alternatively, the third-party device actively pushes the original web page to be processed to the processing device 1, which receives it in step S1.
Those skilled in the art will understand that the above ways of obtaining the original web page to be processed are merely examples. Other existing or future ways of obtaining the original web page to be processed, if applicable to the present invention, also fall within the scope of protection of the present invention and are incorporated herein by reference.
Next, in step S2, the processing device 1 extracts block identification information from the markup language file of the original web page obtained in step S1, for example by string matching. The block identification information identifies each content block in the markup language file.
Here, the markup language file includes, but is not limited to:
1) an HTML (HyperText Markup Language) file; HTML is the standard markup language, derived from SGML (Standard Generalized Markup Language), used to describe web documents;
2) an XML (Extensible Markup Language) file; XML is a simple markup language, also derived from SGML, used for storing data;
3) an XHTML (Extensible HyperText Markup Language) file; XHTML is an XML-based markup language with strict syntax;
4) a WML (Wireless Markup Language) file; WML is a descriptive markup language for creating pages that can be displayed in a WAP browser.
Those skilled in the art will understand that the above markup language files are merely examples. Other existing or future markup language files, if applicable to the present invention, also fall within the scope of protection of the present invention and are incorporated herein by reference.
Here, the block identification information includes, but is not limited to, an identifier name, an identifier ID, and the like. The identifier name may be chosen according to the type of content block it identifies, such as title, navigation, body text, image, or embedded object (e.g., Java applet, ActiveX, Flash).
Here, a content block is a content region of the markup language file that consists of one or more tags and corresponds to specific content displayed in the web page. Examples are a title block, a body text block, a navigation block, an image block, or an embedded object (e.g., Java applet, ActiveX, Flash) block.
Here, the ways in which the block identification information may be stored in the markup language file include, but are not limited to:
1) Comments in the markup language file. For example, the identification information may be stored in an HTML comment in a JSON-like notation, such as <!-- tc block_begin: {type: "context"} -->. JSON is a lightweight data-interchange format that generally represents data as "name/value" pairs, with the name and value separated by ":".
2) Custom tags in the markup language file. For example, in an HTML file the custom tag may be <tc></tc>, and the identification information may be stored in that custom tag.
3) Tag attributes in the markup language file. For example, in an XHTML file the identification information may be stored in an attribute of the content block's tag, such as <div markName="title">. The value of the markName attribute is the identification information for the content block corresponding to this div tag.
Those skilled in the art will understand that the above storage methods are merely examples. Other existing or future storage methods, if applicable to the present invention, also fall within the scope of protection of the present invention and are incorporated herein by reference.
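For the comment-based storage method, extraction can be sketched with regular expressions. This is a minimal sketch under an assumption: the exact marker text `tc block_begin` is reconstructed from the garbled examples in this document and may differ in practice. Because the example's keys are unquoted, the payload is not strict JSON, so the sketch pairs names with quoted values instead of calling `json.loads`.

```python
from __future__ import annotations

import re

# Assumed marker syntax, e.g.  <!-- tc block_begin: {type: "context"} -->
BEGIN_RE = re.compile(r"<!--\s*tc block_begin:\s*\{(.*?)\}\s*-->", re.S)
# Unquoted name, colon, double-quoted value (the JSON-like form shown above).
PAIR_RE = re.compile(r'(\w+)\s*:\s*"([^"]*)"')


def comment_block_ids(markup: str) -> list[dict[str, str]]:
    """Return one dict of name/value pairs per block-begin comment, in document order."""
    return [dict(PAIR_RE.findall(m.group(1))) for m in BEGIN_RE.finditer(markup)]
```

For example, `comment_block_ids('<!-- tc block_begin: {type: "context"} --><p>x</p>')` yields a single dictionary mapping `type` to `context`.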
In one example, the markup language file of the original web page that the processing device 1 handles in step S2 is an XHTML file such as:
<body>
<div markName="title">
<h2>News headline</h2>
</div>
<img src="…/flower.jpg" markName="picture" />
</body>
This XHTML file predefines that content block identification information is stored in a tag attribute named markName. Accordingly, in step S2 the processing device 1 parses the XHTML file and performs string matching on the keyword "markName". This yields the markName attribute of the div tag and its value "title", which is the identifier name of the content block corresponding to the div tag. It also yields the markName attribute of the img tag and its value "picture", which is the identifier name of the content block corresponding to the img tag.
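The attribute-based extraction above can be sketched with Python's standard `html.parser` module. This sketch swaps the raw string matching described in the text for a real HTML parser, which avoids false matches inside text content. The class and function names are hypothetical. Note that `html.parser` lower-cases attribute names, so `markName` arrives as `markname`.

```python
from __future__ import annotations

from html.parser import HTMLParser


class MarkNameExtractor(HTMLParser):
    """Collect (tag, markName value) pairs for every tag that carries markName."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases attribute names, so markName appears as "markname".
        for name, value in attrs:
            if name == "markname" and value is not None:
                self.blocks.append((tag, value))


def extract_block_ids(markup: str) -> list[tuple[str, str]]:
    parser = MarkNameExtractor()
    parser.feed(markup)
    parser.close()
    return parser.blocks
```

Self-closing tags such as `<img ... />` are also reported, because the parser's default `handle_startendtag` delegates to `handle_starttag`.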
Those skilled in the art will understand that the above way of extracting block identification information is merely an example. Other existing or future ways of extracting block identification information, if applicable to the present invention, also fall within the scope of protection of the present invention and are incorporated herein by reference.
Then, in step S3, the processing device 1 performs a matching query in a processing rule base, using the block identification information obtained in step S2. This yields the content block processing rule corresponding to that block identification information.
Specifically, in step S3, the processing device 1 runs this matching query against a processing rule base held either locally or on a third-party device.
Here, the processing rules include, but are not limited to:
1) formatting the content in the content block, where formatting includes, but is not limited to:
i. changing text attributes in the content block, such as font, size, and color, or the background color of the content;
ii. scaling down the images contained in the content block by a predetermined ratio;
2) displaying the content block;
3) deleting the content block;
4) folding the content block, meaning that the content of the block is collapsed and hidden by default but can be expanded and displayed through a specific trigger;
5) adjusting the display position of the content block.
Those skilled in the art will understand that the above processing rules are merely examples. Other existing or future processing rules, if applicable to the present invention, also fall within the scope of protection of the present invention and are incorporated herein by reference.
Here, the processing rule base contains each piece of block identification information together with its corresponding processing rule. It may be implemented as, but is not limited to, a relational database, a key-value store, or a file system.
In one example, the block identification information is "title". In step S3, the processing device 1 calls an application programming interface (API) provided by the processing device 1 to perform a matching query in the local processing rule base. The query returns "show" as the content block processing rule corresponding to the "title" block identification information, meaning that the content block identified by this information is to be displayed.
In another example, the block identification information is "picture". In step S3, the processing device 1 sends a processing rule acquisition request containing this block identification information to a third-party device. For example, it may encapsulate the request as an HTTP request message and send it over a corresponding protocol such as HTTP or HTTPS. The third-party device listens in real time, receives and parses the request, and performs a matching query in its processing rule base using the extracted block identification information. The query returns "zoomin" as the corresponding content block processing rule, meaning that the images in the identified content block are scaled down by a predetermined amount.
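A local key-value rule base and its matching query can be sketched as a dictionary lookup. This is a minimal sketch, not the patent's implementation. The rule strings "show", "zoomin", and "delete" come from the examples in this document. The other enum values and the names `Rule`, `RULE_BASE`, and `lookup_rule` are hypothetical.

```python
from __future__ import annotations

from enum import Enum
from typing import Optional


class Rule(str, Enum):
    """Processing rules listed above ("format", "fold", "reorder" are hypothetical names)."""
    FORMAT = "format"
    SHOW = "show"
    DELETE = "delete"
    FOLD = "fold"
    REORDER = "reorder"
    ZOOMIN = "zoomin"  # in the example, scales images in the block down


# Local key-value rule base: block identification information -> processing rule.
RULE_BASE: dict[str, Rule] = {
    "title": Rule.SHOW,
    "picture": Rule.ZOOMIN,
}


def lookup_rule(block_id: str, rule_base: dict[str, Rule] = RULE_BASE) -> Optional[Rule]:
    """Exact-match query; returns None when no rule is stored for block_id."""
    return rule_base.get(block_id)
```

Returning `None` instead of a default rule leaves room for the fallback described later, where a rule is inferred from the block's content.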
Those skilled in the art will understand that the above ways of obtaining processing rules are merely examples. Other existing or future ways of obtaining processing rules, if applicable to the present invention, also fall within the scope of protection of the present invention and are incorporated herein by reference.
Preferably, in step S3, the processing device 1 performs a matching query in the processing rule base using both the block identification information and the identification information of the website to which the original web page belongs. This yields a content block processing rule customized for that website's web pages. The website identification information includes, but is not limited to, the website's domain name, IP address, and name.
Specifically, in step S3, the processing device 1 determines the identification information of the website to which the web page belongs, such as its domain name or IP address. It may do this, for example, from the URL of the original web page obtained in step S1. The processing device 1 then performs a matching query in the processing rule base using the block identification information obtained in step S2 and the website identification information. If the query finds a processing rule predefined for that website's web pages, that rule becomes the content block processing rule for the page.
In one example, the block identification information is "embedded object" and the URL of the original web page is "www.abc.com/sport/101.htm". In step S3, the processing device 1 extracts the domain name "www.abc.com" from this URL. A matching query on the block identification information alone returns the rule "delete", i.e., delete the identified content block. A matching query on the block identification information together with the domain name, however, returns "show" as the rule predefined for that website's "embedded object" blocks, i.e., display the identified content block. The processing device 1 therefore ignores the generic deletion rule for this block identification information and uses the website-specific rule as the content block processing rule.
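The website-specific lookup with fallback can be sketched as follows. This is a minimal sketch that assumes two in-memory tables; `GENERIC_RULES`, `SITE_RULES`, `site_domain`, and `lookup_site_rule` are hypothetical names. The example URL has no scheme, so it is prefixed with `//` before `urllib.parse.urlsplit`. Without that prefix, the host would be parsed as part of the path.

```python
from __future__ import annotations

from typing import Optional
from urllib.parse import urlsplit

# Generic table: block identification information -> rule.
GENERIC_RULES: dict[str, str] = {"embedded object": "delete"}
# Hypothetical site-specific table: (domain, block identification information) -> rule.
SITE_RULES: dict[tuple[str, str], str] = {("www.abc.com", "embedded object"): "show"}


def site_domain(url: str) -> str:
    """Extract the host name; a scheme-less URL is prefixed with '//' so that
    urlsplit treats its first segment as the network location."""
    if "//" not in url:
        url = "//" + url
    return urlsplit(url).hostname or ""


def lookup_site_rule(url: str, block_id: str) -> Optional[str]:
    """Prefer a rule customized for the page's website; fall back to the generic rule."""
    site_rule = SITE_RULES.get((site_domain(url), block_id))
    return site_rule if site_rule is not None else GENERIC_RULES.get(block_id)
```

Keying the site table on the (domain, identifier) pair lets one site override individual block types without duplicating the whole generic rule set.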
Those skilled in the art will understand that the above ways of obtaining processing rules are merely examples. Other existing or future ways of obtaining processing rules, if applicable to the present invention, also fall within the scope of protection of the present invention and are incorporated herein by reference.
Then, in step S4, the processing device 1 processes the content block identified by the block identification information according to the content block processing rule obtained in step S3, to obtain the target web page.
Here, processing the content block includes, but is not limited to, formatting, displaying, deleting, folding, and reordering the content in the content block.
In one example, in step S2 the processing device 1 parses the HTML file of a web page and obtains two pieces of block identification information, "body" and "picture". In step S3, it obtains the content block processing rule for "body", which is to fold the identified content block. It also obtains the rule for "picture", which is to scale down the images in the identified content block by a predetermined ratio. Then, in step S4, the processing device 1 uses this identification information to locate, in the HTML file, the content block identified by each piece of identification information, and applies the corresponding rules. It folds and hides the content of the block identified as "body" and sets a predetermined trigger so that the body text can later be expanded and displayed. It scales down the images in the block identified as "picture" by the predetermined ratio and displays them. The processed web page is used as the target web page.
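Step S4 for this example can be sketched as a function that maps each rule onto a block. This is a minimal sketch under several assumptions. The `Block` type, `SCALE`, and `apply_rules` are hypothetical. Folding is realized with the HTML `<details>`/`<summary>` disclosure element, which is collapsed by default and expands when clicked. That is one possible "predetermined trigger", not necessarily the one the source intends.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Optional

SCALE = 0.5  # hypothetical "predetermined reduction ratio" for images


@dataclass(frozen=True)
class Block:
    block_id: str                       # block identification information, e.g. "body"
    html: str                           # markup of the content block
    image_widths: tuple[int, ...] = ()  # pixel widths of the block's images


def apply_rule(block: Block, rule: str) -> Optional[Block]:
    """Return the processed block, or None if the rule deletes it."""
    if rule == "delete":
        return None
    if rule == "fold":
        # <details> is collapsed by default and expands when <summary> is clicked.
        return replace(block, html=f"<details><summary>Expand</summary>{block.html}</details>")
    if rule == "zoomin":
        # As in the example, "zoomin" shrinks each image by the predetermined ratio.
        return replace(block, image_widths=tuple(int(w * SCALE) for w in block.image_widths))
    return block  # "show" and unrecognized rules leave the block unchanged


def apply_rules(blocks: list[Block], rules: dict[str, str]) -> list[Block]:
    """Process every block with the rule found for its identifier (default: show)."""
    processed = (apply_rule(b, rules.get(b.block_id, "show")) for b in blocks)
    return [b for b in processed if b is not None]
```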
Those skilled in the art will understand that the above way of obtaining the target web page is merely an example. Other existing or future ways of obtaining the target web page, if applicable to the present invention, also fall within the scope of protection of the present invention and are incorporated herein by reference.
Preferably, the processing device 1 operates continuously in steps S1, S2, S3, and S4:
- In step S1, the processing device 1 continuously obtains original web pages to be processed.
- In step S2, it continuously extracts block identification information, which identifies each content block in the markup language file, from the markup language files of those original web pages.
- In step S3, it continuously performs matching queries in the processing rule base, using the block identification information, to obtain the corresponding content block processing rules.
- In step S4, it continuously processes the content blocks identified by the block identification information according to those rules, to obtain target web pages.
Those skilled in the art will understand that "continuously" means that the processing device 1 keeps doing the work of each step until a predetermined stop condition is met. That work is obtaining original web pages, extracting block identification information, obtaining processing rules, and obtaining target web pages. An example stop condition is that the processing device 1 has received no original web page to process for a relatively long time.
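The continuous operation and its stop condition can be sketched as a queue-driven loop. This is a minimal sketch assuming that original pages arrive on a `queue.Queue`. The hypothetical `process` callable stands in for steps S2 to S4. The loop stops once no page has arrived within `idle_timeout` seconds, which models the "relatively long time" stop condition.

```python
from __future__ import annotations

import queue
from typing import Callable


def run_pipeline(pages: "queue.Queue[str]",
                 process: Callable[[str], str],
                 idle_timeout: float = 30.0) -> list[str]:
    """Keep taking original pages (S1) and turning them into target pages (S2-S4)
    until no new page arrives within idle_timeout seconds."""
    targets: list[str] = []
    while True:
        try:
            page = pages.get(timeout=idle_timeout)  # S1: wait for the next original page
        except queue.Empty:
            break                                   # predetermined stop condition met
        targets.append(process(page))               # S2-S4 for this page
    return targets
```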
Preferably (referring to Fig. 3), when the content block processing rule cannot be obtained from the processing rule base, the processing device 1 may determine it in step S3 from content-related information of the content block identified by the block identification information.
Here, the content-related information of the content block includes, but is not limited to:
1) the position of the content block's content in the original web page;
2) the number of text characters in the content block's content;
3) the tag information contained in the content block.
Those skilled in the art will understand that the above content-related information is merely an example. Other existing or future content-related information, if applicable to the present invention, also falls within the scope of protection of the present invention and is incorporated herein by reference.
1) In step S3, the processing device 1 determines the processing rule from the position of the content block in the original web page. For example, a content block located at the center of the original web page is highly important within that page. The content block processing rule may therefore be determined to be displaying the content block.
2) In step S3, the processing device 1 determines the processing rule from the number of text characters in the content block. For example, if the number of characters in the identified content block exceeds a predetermined threshold, the processing rule may be determined to be folding the text content of the block.
3) In step S3, the processing device 1 determines the processing rule from the tag objects contained in the content block. Suppose the content block identified in the markup language file of the original web page contains an <object> tag, and that tag contains an object whose use is restricted on the mobile device, such as ActiveX. The processing rule is then determined to be deleting the content block.
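The three fallback heuristics can be sketched together in one function. This is a minimal sketch, not the patent's method. The thresholds, the `BlockInfo` fields, and the order in which the checks run are all assumptions. The function returns `None` when no heuristic applies, because the source does not say what happens in that case.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

CHAR_THRESHOLD = 500              # hypothetical predetermined character-count threshold
CENTER_TOLERANCE = 0.2            # hypothetical: max offset from (0.5, 0.5) counted as "center"
RESTRICTED_OBJECTS = {"activex"}  # embedded-object types assumed restricted on the terminal


@dataclass
class BlockInfo:
    center: tuple[float, float]  # block center as fractions of page width and height
    char_count: int              # number of text characters in the block
    object_types: set[str] = field(default_factory=set)  # types found in <object> tags


def infer_rule(info: BlockInfo) -> Optional[str]:
    """Fallback rule from content-related information (check order is assumed)."""
    if info.object_types & RESTRICTED_OBJECTS:  # 3) restricted embedded object
        return "delete"
    if info.char_count > CHAR_THRESHOLD:        # 2) long text is folded
        return "fold"
    x, y = info.center
    if abs(x - 0.5) <= CENTER_TOLERANCE and abs(y - 0.5) <= CENTER_TOLERANCE:
        return "show"                           # 1) central, hence important
    return None
```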
In one example, the HTML file of the original web page contains the following code snippet:
<!-- tc block_begin: {markName: "embedded object"} -->
<OBJECT
classid="clsid:2F390484-1C7D-11D0-8908-00A0C90395F4" codebase="ActiveXDoc.cab#version=1,0,0,0">
</OBJECT>
<!-- tc block_end -->
The block identification information in this snippet is "embedded object". In step S3, the processing device 1 finds no matching content block processing rule in the processing rule base for this block identification information. By parsing the <object> tag, it finds that the tag has a classid attribute whose value uses the "clsid:" scheme, and concludes that the block contains an ActiveX embedded object. It therefore determines that the processing rule for this block identification information is to delete the identified content block.
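The ActiveX check in this example can be sketched with `html.parser`. This is a minimal sketch: `ActiveXDetector` and `rule_for_unmatched_block` are hypothetical names. The sketch treats any `<object>` whose `classid` value starts with `clsid:` as ActiveX, which matches the snippet above but is a simplification. The parser lower-cases tag and attribute names, so `<OBJECT classid=...>` is matched as `object`.

```python
from __future__ import annotations

from html.parser import HTMLParser
from typing import Optional


class ActiveXDetector(HTMLParser):
    """Flag <object> tags whose classid uses the "clsid:" scheme (ActiveX)."""

    def __init__(self) -> None:
        super().__init__()
        self.has_activex = False

    def handle_starttag(self, tag, attrs):
        if tag == "object":
            classid = dict(attrs).get("classid") or ""
            if classid.strip().lower().startswith("clsid:"):
                self.has_activex = True


def rule_for_unmatched_block(block_html: str) -> Optional[str]:
    """Return "delete" for blocks containing ActiveX; None if this check does not apply."""
    detector = ActiveXDetector()
    detector.feed(block_html)
    detector.close()
    return "delete" if detector.has_activex else None
```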
本领域技术人员应能理解上述确定处理规则的方式仅为举例, 其 他现有的或今后可能出现的确定处理规则的方式如可适用于本发明, 也应包含在本发明保护范围以内, 并以引用方式包含于此。  Those skilled in the art should understand that the manner of determining the processing rule is merely an example, and other existing or future possible methods for determining the processing rule, as applicable to the present invention, are also included in the scope of the present invention, and The reference is included here.
图 4示出根据本发明一个优选实施例的基于内容块标识处理网页 内容的方法流程图。 其中, 该过程还包括步骤 S5,。 在步骤 S5,中, 处 理设备 1根据所述新确定的内容块处理规则, 建立或更新所述处理规 则库。  4 illustrates a flow chart of a method for processing web page content based on content block identification in accordance with a preferred embodiment of the present invention. Wherein, the process further includes step S5. In step S5, the processing device 1 creates or updates the processing rule library according to the newly determined content block processing rule.
Here, the functions of the processing device 1 shown in Figure 4 in steps S1', S2', S3' and S4' are the same as those of the processing device 1 in steps S1, S2, S3 and S4 described above with reference to Figure 3; for brevity, they are incorporated herein by reference and not repeated.
Specifically, when in step S3' the processing device 1 does not obtain a corresponding content block processing rule from the processing rule base for the identification information, and therefore newly determines a content block processing rule for it, then in step S5' the processing device 1 writes the identification information and the corresponding newly determined processing rule into the processing rule base to update it. If the processing device 1 detects that the processing rule base has not yet been established, it first initializes the processing rule base and then writes the above information into it.
In one example, when the new processing rule obtained by the processing device 1 in step S3' for the mark name "embedded object" is deletion, then in step S5' the processing device 1 inserts into the processing rule base a data record containing this mark name and its corresponding processing rule.
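Step S5' (initialize the rule base if it does not exist, then write the new record) can be sketched with an SQLite table. This is an assumption for illustration: the patent does not specify how the processing rule base is stored, and the table name `processing_rules` and the helper functions below are hypothetical.

```python
import sqlite3
from typing import Optional


def ensure_rule_base(conn: sqlite3.Connection) -> None:
    # Initialize the processing rule base only if it has not been established yet
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processing_rules ("
        " mark_name TEXT PRIMARY KEY,"
        " rule TEXT NOT NULL)"
    )


def record_new_rule(conn: sqlite3.Connection, mark_name: str, rule: str) -> None:
    """Step S5': write a newly determined rule, replacing any existing record."""
    ensure_rule_base(conn)
    conn.execute(
        "INSERT OR REPLACE INTO processing_rules (mark_name, rule) VALUES (?, ?)",
        (mark_name, rule),
    )
    conn.commit()


def lookup_rule(conn: sqlite3.Connection, mark_name: str) -> Optional[str]:
    """Step S3-style matching query; returns None when no record matches."""
    ensure_rule_base(conn)
    row = conn.execute(
        "SELECT rule FROM processing_rules WHERE mark_name = ?", (mark_name,)
    ).fetchone()
    return row[0] if row else None
```

For example, after `record_new_rule(conn, "embedded object", "delete")`, a later page containing the same mark name is matched directly in step S3 instead of going through the tag-based fallback again.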
Those skilled in the art will understand that the above manner of establishing or updating the processing rule base is merely an example; other existing or future manners of establishing or updating the processing rule base, if applicable to the present invention, are also included within the scope of protection of the present invention and are incorporated herein by reference.
In another preferred embodiment (see Figure 3), the process further includes step S6 (not shown). In step S1, the processing device 1 acquires the original webpage according to a page access request input by a user through a mobile terminal; in step S6, the processing device 1 provides the target webpage to the user.
This other preferred embodiment is described in detail below with reference to Figure 3. In step S2, the processing device 1 extracts block identification information from the markup language file of the original webpage, the block identification information being used to identify each content block in the markup language file. Then, in step S3, the processing device 1 performs a matching query in the processing rule base according to the block identification information to obtain the content block processing rule corresponding to it. Then, in step S4, the processing device 1 processes the content block identified by the block identification information according to the content block processing rule to obtain a target webpage. These processes are the same as those performed by the processing device 1 in steps S2, S3 and S4 in the embodiment described above with reference to Figure 3; for brevity, they are incorporated herein by reference and not repeated.
In one example, when the user types into the address bar of the browser on the mobile terminal, the mobile terminal captures in real time the webpage URL entered by the user and records it as a page access request corresponding to the user's input operation, the page access request containing that URL; it then sends the page access request to the processing device 1 via an agreed communication method. Next, in step S1, the processing device 1 receives the page access request in real time, extracts the page URL from it, and sends a request for the webpage to the web server hosting the webpage that the URL points to. It then receives the webpage returned by the web server in response to that request and uses it as the original webpage to be processed.
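Step S1 in this example can be sketched as below: take the URL out of the page access request and fetch the page from its web server. The request shape `PageAccessRequest`, the helper names, and the http/https check are assumptions for illustration; the fetch function is injectable so that the URL handling can be exercised without network access.

```python
from dataclasses import dataclass
from typing import Callable
from urllib.parse import urlsplit
from urllib.request import urlopen


@dataclass
class PageAccessRequest:
    # Hypothetical shape of the request forwarded by the mobile terminal
    url: str
    terminal_id: str = ""


def default_fetch(url: str, timeout: float = 10.0) -> str:
    # Ask the web server hosting the page for it and decode the response body
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def acquire_original_page(
    req: PageAccessRequest, fetch: Callable[[str], str] = default_fetch
) -> str:
    """Step S1 sketch: extract the URL and return the original webpage."""
    parts = urlsplit(req.url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"unsupported URL in page access request: {req.url!r}")
    return fetch(req.url)
```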
In step S6, the processing device 1 provides the target webpage obtained in step S4 to the user through the mobile terminal, using any known technique by which a mobile terminal presents human-readable information, such as screen display or speaker playback. Taking screen display as an example, in step S6 the processing device 1 delivers the target webpage obtained in step S4 to the mobile terminal in a given order and format using page technologies such as JSP, ASP or PHP, for example as a link or a displayed page, for the user to browse.
Those skilled in the art will understand that the above manner of acquiring the original webpage and/or of providing the target webpage is merely an example; other existing or future manners of acquiring the original webpage and/or of providing the target webpage, if applicable to the present invention, are also included within the scope of protection of the present invention and are incorporated herein by reference.
Preferably (see Figure 3), the process further includes step S7 (not shown) and step S8 (not shown). In step S7, the processing device 1 acquires display parameter information of the mobile terminal; in step S8, the processing device 1 optimizes the content block processing rule according to the display parameter information to obtain a preferred content block processing rule; and in step S4, the processing device 1 processes the content block according to the preferred content block processing rule to obtain the target webpage.
Specifically, in step S7, the processing device 1 acquires the display parameter information of the mobile terminal on which the target webpage is to be displayed by calling, in an agreed manner, an API (application programming interface) provided by that mobile terminal. Here, the display parameter information includes but is not limited to:
1) image formats supported by the mobile terminal, such as JPEG, PNG and GIF;
2) the screen resolution of the mobile terminal, such as the physical pixel size and the color bit depth;
3) whether the mobile terminal supports plug-ins, such as the Flash plug-in.
Next, in step S8, the processing device 1 optimizes the content block processing rule obtained in step S3 for each piece of identification information, according to the display parameter information of the mobile terminal obtained in step S7, to obtain a preferred content block processing rule. Then, in step S4, the processing device 1 processes the content block according to the preferred content block processing rule to obtain the target webpage.
In one example, the block identification information obtained by the processing device 1 in step S2 from the markup language file is "Flash", and the content block it identifies contains a Flash animation. In step S3, the corresponding processing rule that the processing device 1 obtains from the processing rule base is to delete the Flash animation identified by this identification information. However, the display parameter information obtained by the processing device 1 in step S7 shows that the mobile terminal supports running the Flash plug-in, so in step S8 the processing device 1 optimizes the original processing rule for this identification information into retaining the Flash animation in the content block; this is the preferred content block processing rule. Then, in step S4, the processing device 1 retains the Flash animation when processing the content block, obtaining a target webpage that contains the Flash animation.
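Step S8 in the Flash example can be sketched as a pure function that takes the rules matched in step S3 plus the terminal's display parameters and returns preferred rules. The `DisplayParams` fields mirror the three kinds of display parameter information listed above, but the field names, the rule strings `"delete"`/`"keep"`, and the case-insensitive plug-in match are assumptions; only the Flash case from the example is implemented.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class DisplayParams:
    # Hypothetical container for the display parameter information of step S7
    image_formats: FrozenSet[str] = frozenset()   # e.g. {"JPEG", "PNG", "GIF"}
    pixel_width: int = 0                          # screen resolution (assumed field)
    color_bits: int = 0                           # color bit depth
    plugins: FrozenSet[str] = frozenset()         # e.g. {"Flash"}


def optimize_rules(rules: Dict[str, str], params: DisplayParams) -> Dict[str, str]:
    """Step S8 sketch: return preferred rules without mutating the input."""
    supported = {p.lower() for p in params.plugins}
    preferred = dict(rules)
    for block_id, rule in rules.items():
        # A "delete" rule for a Flash block becomes "keep" when the terminal runs Flash
        if block_id.lower() == "flash" and rule == "delete" and "flash" in supported:
            preferred[block_id] = "keep"
    return preferred
```

Keeping the original rules untouched means the rule base itself stays terminal-independent; only the per-request copy is adapted.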
Those skilled in the art will understand that the above manners of acquiring display parameter information, of obtaining a preferred content block processing rule and/or of obtaining the target webpage are merely examples; other existing or future manners of doing so, if applicable to the present invention, are also included within the scope of protection of the present invention and are incorporated herein by reference.
It will be apparent to those skilled in the art that the present invention is not limited to the details of the above exemplary embodiments, and that it can be implemented in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in all respects as exemplary and non-restrictive; the scope of the invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalents of the claims are therefore intended to be embraced by the invention. No reference sign in the claims should be construed as limiting the claim concerned. Furthermore, the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. Multiple units or means recited in a device claim may also be implemented by a single unit or means in software or hardware. Words such as "first" and "second" are used to denote names and do not indicate any particular order.

Claims

1. A computer-implemented method for processing webpage content based on content block identification, wherein the method comprises the following steps:
a. acquiring an original webpage to be processed;
b. extracting block identification information from a markup language file of the original webpage, wherein the block identification information is used to identify each content block in the markup language file;
c. performing, according to the block identification information, a matching query in a processing rule base to obtain a content block processing rule corresponding to the block identification information;
d. processing, according to the content block processing rule, the content block identified by the block identification information, to obtain a target webpage.
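Steps b to d of claim 1 can be sketched end to end in a few lines. Assumptions, stated plainly: blocks are delimited by comment markers modeled on the "tc block" example in the description (the exact marker syntax is hypothetical), the rule base is a plain dict, and only "delete" and a default "keep" rule are handled.

```python
import re
from typing import Dict

# Hypothetical marker syntax: <!-- tc block-begin: {markName: "..."} --> ... <!-- tc block-end -->
BLOCK_RE = re.compile(
    r'<!--\s*tc block-begin:\s*\{\s*markName:\s*"([^"]*)"\s*\}\s*-->'
    r"(.*?)"
    r"<!--\s*tc block-end\s*-->",
    re.DOTALL,
)


def process_page(markup: str, rule_base: Dict[str, str]) -> str:
    """Extract block identifiers (b), match rules (c), apply them (d)."""

    def apply(match: "re.Match[str]") -> str:
        mark_name = match.group(1).strip()
        body = match.group(2)
        rule = rule_base.get(mark_name, "keep")  # default for unmatched blocks is an assumption
        if rule == "delete":
            return ""
        return body  # "keep": drop the markers, keep the block content

    return BLOCK_RE.sub(apply, markup)
```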
2. The method according to claim 1, wherein step c comprises:
- performing a matching query in the processing rule base according to the block identification information and identification information of the website to which the original webpage belongs, to obtain the content block processing rule.
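A site-aware matching query as in claim 2 might look like the sketch below. Two details are assumptions not stated in the claim: the host name of the page URL serves as the website identification information, and a site-independent rule is used as a fallback when no site-specific rule exists.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlsplit

# Rule base key: (website identifier, or None for "any site"; block identifier)
RuleKey = Tuple[Optional[str], str]


def site_id(url: str) -> str:
    # Assumption: the lower-cased host name identifies the website
    return (urlsplit(url).hostname or "").lower()


def match_rule(rule_base: Dict[RuleKey, str], block_id: str, page_url: str) -> Optional[str]:
    site = site_id(page_url)
    # Prefer a rule specific to this website, then fall back to a generic one
    rule = rule_base.get((site, block_id))
    if rule is None:
        rule = rule_base.get((None, block_id))
    return rule
```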
3. The method according to claim 1 or 2, wherein the content block processing rule comprises at least one of the following:
- formatting the content in the content block;
- displaying the content block;
- deleting the content block;
- folding the content block.
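The four kinds of content block processing rule in claim 3 can be illustrated with one dispatch function. The concrete behaviors are assumptions chosen for illustration: "format" here only collapses whitespace, and "fold" wraps the block in an HTML5 `<details>` element, which browsers render collapsed by default.

```python
import html
import re
from enum import Enum


class Rule(Enum):
    FORMAT = "format"
    DISPLAY = "display"
    DELETE = "delete"
    FOLD = "fold"


def apply_rule(rule: Rule, block_html: str, title: str = "More") -> str:
    if rule is Rule.DELETE:
        return ""
    if rule is Rule.DISPLAY:
        return block_html
    if rule is Rule.FORMAT:
        # Illustrative formatting only: collapse runs of whitespace
        return re.sub(r"\s+", " ", block_html).strip()
    if rule is Rule.FOLD:
        # Illustrative folding: a <details> element collapsed by default
        return f"<details><summary>{html.escape(title)}</summary>{block_html}</details>"
    raise ValueError(f"unknown rule: {rule}")
```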
4. The method according to claim 1 or 2, wherein step c comprises: … content-related information of the content block identified by the block identification information, determining the content block processing rule …
5. The method according to claim 4, wherein the content-related information comprises at least one of the following:
- position information of the content of the content block in the original webpage;
- the number of text characters contained in the content of the content block;
- tag information contained in the content block.
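Determining a rule from the three kinds of content-related information in claim 5 could be sketched as a small decision function. Everything quantitative here is hypothetical: the position encoding (0.0 at the top of the page, 1.0 at the bottom), the 200-character threshold and the 0.9 position cutoff are illustrative values, not figures from the patent.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class BlockInfo:
    position: float        # relative vertical position, 0.0 = top, 1.0 = bottom (assumed encoding)
    text_chars: int        # number of text characters in the block
    tags: FrozenSet[str]   # tag names contained in the block


def determine_rule(info: BlockInfo) -> str:
    # All thresholds are hypothetical, for illustration only
    if "object" in info.tags or "embed" in info.tags:
        return "delete"     # embedded objects rarely render on mobile terminals
    if info.text_chars >= 200:
        return "display"    # text-heavy blocks are likely main content
    if info.position > 0.9:
        return "fold"       # short blocks near the bottom, e.g. footers
    return "format"
```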
6. The method according to claim 4, wherein the method further comprises:
- establishing or updating the processing rule base according to the newly determined content block processing rule.
7. The method according to claim 1 or 2, wherein step a comprises:
- acquiring the original webpage according to a page access request input by a user through a mobile terminal;
wherein the method further comprises:
- providing the target webpage to the user.
8. The method according to claim 7, wherein the method further comprises:
- acquiring display parameter information of the mobile terminal;
- optimizing the content block processing rule according to the display parameter information to obtain a preferred content block processing rule;
wherein step d comprises:
- processing the content block according to the preferred content block processing rule, to obtain the target webpage.
9. The method according to claim 1 or 2, wherein the block identification information is stored in the markup language file in at least one of the following ways:
- as a comment in the markup language file;
- as a custom tag in the markup language file;
- as a tag attribute in the markup language file.
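Extracting block identification information stored in each of the three ways listed in claim 9 can be sketched with Python's `html.parser`. The concrete syntaxes are hypothetical: a comment modeled on the "tc block" example in the description, a custom `<tc-block name="...">` tag, and a `data-block-id` attribute on an ordinary tag.

```python
import re
from html.parser import HTMLParser
from typing import List, Tuple

# Hypothetical comment syntax, modeled on the "tc block" example in the description
COMMENT_RE = re.compile(r'tc block-begin:\s*\{\s*markName:\s*"([^"]*)"\s*\}')


class BlockIdExtractor(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.found: List[Tuple[str, str]] = []  # (storage mode, block identifier)

    def handle_comment(self, data):
        # Storage mode 1: identifier carried in a comment
        m = COMMENT_RE.search(data)
        if m:
            self.found.append(("comment", m.group(1).strip()))

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tc-block" and attrs.get("name"):
            # Storage mode 2: a custom tag (hypothetical name "tc-block")
            self.found.append(("custom tag", attrs["name"]))
        elif attrs.get("data-block-id"):
            # Storage mode 3: an attribute on an ordinary tag (hypothetical name)
            self.found.append(("attribute", attrs["data-block-id"]))


def extract_block_ids(markup: str) -> List[Tuple[str, str]]:
    parser = BlockIdExtractor()
    parser.feed(markup)
    parser.close()
    return parser.found
```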
10. The method according to claim 1 or 2, wherein the markup language file comprises at least one of the following:
- an HTML file;
- an XML file;
- an XHTML file;
- a WML file.
11. A device for processing webpage content based on content block identification, wherein the device comprises:
an original webpage acquisition means, for acquiring an original webpage to be processed;
an identification information extraction means, for extracting block identification information from a markup language file of the original webpage, wherein the block identification information is used to identify each content block in the markup language file;
a processing rule acquisition means, for performing, according to the block identification information, a matching query in a processing rule base to obtain a content block processing rule corresponding to the block identification information;
a target webpage acquisition means, for processing, according to the content block processing rule, the content block identified by the block identification information, to obtain a target webpage.
12. The device according to claim 11, wherein the processing rule acquisition means is configured to perform a matching query in the processing rule base according to the block identification information and identification information of the website to which the original webpage belongs, to obtain the content block processing rule.
13. The device according to claim 11 or 12, wherein the content block processing rule comprises at least one of the following:
- formatting the content in the content block;
- displaying the content block;
- deleting the content block;
- folding the content block.
14. The device according to claim 11 or 12, wherein the processing rule acquisition means … content-related information of the content block identified by the block identification information, determine the content block processing rule …
15. The device according to claim 14, wherein the content-related information comprises at least one of the following:
- position information of the content of the content block in the original webpage;
- the number of text characters contained in the content of the content block;
- tag information contained in the content block.
16. The device according to claim 14, wherein the device further comprises:
an updating means, for establishing or updating the processing rule base according to the newly determined content block processing rule.
17. The device according to claim 11 or 12, wherein the original webpage acquisition means is configured to acquire the original webpage according to a page access request input by a user through a mobile terminal;
wherein the device further comprises:
a providing means, for providing the target webpage to the user.
18. The device according to claim 17, wherein the device further comprises:
a parameter acquisition means, for acquiring display parameter information of the mobile terminal;
an optimization means, for optimizing the content block processing rule according to the display parameter information to obtain a preferred content block processing rule;
wherein the target webpage acquisition means is configured to process the content block according to the preferred content block processing rule, to obtain the target webpage.
19. The device according to claim 11 or 12, wherein the block identification information is stored in the markup language file in at least one of the following ways:
- as a comment in the markup language file;
- as a custom tag in the markup language file;
- as a tag attribute in the markup language file.
20. The device according to claim 11 or 12, wherein the markup language file comprises at least one of the following:
- an HTML file;
- an XML file;
- an XHTML file;
- a WML file.
PCT/CN2012/075044 2011-11-30 2012-05-03 Method and device for processing webpage content on the basis of content block identification WO2013078829A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201110390828.9 2011-11-30
CN201110390828.9A CN103136259B (en) 2011-11-30 2011-11-30 A kind of method and apparatus based on content block identification processing web page contents

Publications (1)

Publication Number Publication Date
WO2013078829A1 true WO2013078829A1 (en) 2013-06-06

Family

ID=48496093

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/075044 WO2013078829A1 (en) 2011-11-30 2012-05-03 Method and device for processing webpage content on the basis of content block identification

Country Status (2)

Country Link
CN (1) CN103136259B (en)
WO (1) WO2013078829A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106126485A (en) * 2016-06-14 2016-11-16 北京金山安全软件有限公司 Text format generation method, server and terminal

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103473004A (en) * 2013-09-29 2013-12-25 小米科技有限责任公司 Method, device and terminal equipment for displaying message
CN103544320A (en) * 2013-11-05 2014-01-29 从兴技术有限公司 Webpage generation method and device
CN104834685A (en) * 2015-04-17 2015-08-12 百度国际科技(深圳)有限公司 Method and device for processing comment message block in comment-like webpage
CN108595697B (en) * 2018-05-09 2021-02-02 未鲲(上海)科技服务有限公司 Webpage integration method, device and system
CN109710863A (en) * 2018-11-27 2019-05-03 平安科技(深圳)有限公司 Information conversion method, device, computer equipment and storage medium
CN111125605B (en) * 2019-12-31 2022-07-29 北京创鑫旅程网络技术有限公司 Page element acquisition method and device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102163233A (en) * 2011-04-18 2011-08-24 北京神州数码思特奇信息技术股份有限公司 Method and system for converting webpage markup language format

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040054973A1 (en) * 2000-10-02 2004-03-18 Akio Yamamoto Method and apparatus for transforming contents on the web
CN101039357A (en) * 2006-03-17 2007-09-19 陈晓月 Method for browsing website using handset
CN101526953A (en) * 2009-01-19 2009-09-09 北京跳网无限科技发展有限公司 WWW transformation technology
CN101815093A (en) * 2010-03-11 2010-08-25 深圳市嘉讯软件有限公司 Method for adapting webpage to mobile terminal and mobile terminal page adaptation device

Also Published As

Publication number Publication date
CN103136259A (en) 2013-06-05
CN103136259B (en) 2018-03-23

Similar Documents

Publication Publication Date Title
WO2013078829A1 (en) Method and device for processing webpage content on the basis of content block identification
US10915828B2 (en) Website address identification method and apparatus
US10430514B2 (en) Method and terminal for extracting webpage content, and non-transitory storage medium
US7747782B2 (en) System and method for providing and displaying information content
WO2013078830A1 (en) Method, device, and system for processing webpage access request of mobile terminal
US20100268773A1 (en) System and Method for Displaying Information Content with Selective Horizontal Scrolling
US9015657B2 (en) Systems and methods for developing and delivering platform adaptive web and native application content
JP2019530921A (en) Method and system for server-side rendering of native content for presentation
WO2014029173A1 (en) Method, apparatus and device for sequencing search results
US20130232424A1 (en) User operation detection system and user operation detection method
US20130198613A1 (en) Methods for tranforming requests for web content and devices thereof
JP2002108870A (en) System and method for processing information
WO2014090082A1 (en) Image processing method,device and terminal
WO2016100541A1 (en) Network based static font subset management
US20170372700A1 (en) Method of entering data in an electronic device
CN103389972A (en) Method and device for obtaining text based on really simple syndication (RSS)
WO2023155712A1 (en) Page generation method and apparatus, page display method and apparatus, and electronic device and storage medium
CN110808868A (en) Test data acquisition method and device, computer equipment and storage medium
CN102760157B (en) A kind of for generating the method that release news, device and the equipment corresponding with mobile terminal
CN114297544A (en) Remote browsing method, device, equipment and storage medium
JP2015138376A (en) Information processing terminal, control method for the same, and program
US8875094B2 (en) System and method for implementing intelligent java server faces (JSF) composite component generation
US20140281916A1 (en) Supporting Font Character Kerning
JP5955186B2 (en) Information processing device
KR101975111B1 (en) Mass webpage document transforming method, and system thereof

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12853364

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12853364

Country of ref document: EP

Kind code of ref document: A1