CN104951445B - Webpage processing method and device

Info

Publication number: CN104951445B
Application number: CN201410113842.8A
Authority: CN
Inventors: 左景龙; 徐琰; 於一飞
Original assignee: Xiaomi Inc
Current assignee: Xiaomi Inc
Priority date: 2014-03-25
Filing date: 2014-03-25
Publication date: 2020-06-02
Anticipated expiration: 2034-03-25
Also published as: CN104951445A

Abstract

The present disclosure provides a web page processing method and apparatus. The method includes: while loading a web page, identifying the node types in the web page; loading the nodes of the non-media resource class according to the node types; identifying the main content of the web page from the loaded non-media resource nodes; loading the media resource nodes related to the main content; and displaying the loaded main content together with those related media resource nodes. Because only the media resource nodes relevant to the main content are loaded, the user's data traffic is reduced, access efficiency is improved, and the user's satisfaction in browsing the web page is improved.

Description

Webpage processing method and device

Technical Field

The present disclosure relates to the field of web page technologies, and in particular, to a web page processing method and apparatus.

Background

With the development of mobile terminals and internet technologies, more and more users access networks through mobile terminals such as mobile phones and tablet computers. However, when a user accesses a web page through a mobile terminal, the web page includes some large media resources, such as pictures, video, and the like, in addition to a title, a web page main body content, and related links, where some pictures and videos may be advertisement-like media resources, and are not related to the web page main body content.

However, in the prior art, when a web page is loaded, since media resources such as pictures or videos irrelevant to the main content of the web page are loaded, not only is the network traffic of a user wasted, but also the reading experience of the user is reduced.

Disclosure of Invention

The present disclosure provides a web page processing method and apparatus to solve the technical problem in the prior art that loading large media resources along with a web page consumes excessive data traffic.

In order to solve the technical problem, the present disclosure discloses the following technical solutions:

in one aspect, a method for processing a web page is provided, and the method includes:

in the process of loading a webpage, identifying the node type in the webpage;

loading nodes of non-media resource classes in the webpage according to the node types in the webpage;

identifying the main content in the webpage according to the loaded nodes of the non-media resource class;

loading a node of a media resource class related to the main body content according to the main body content;

and displaying the loaded main body content and the nodes of the media resource classes related to the main body content.

Preferably, the identifying the main content in the web page includes:

acquiring a DOM tree and a render tree generated by analyzing the webpage;

determining visible nodes in the DOM tree according to the render tree, and determining preview values of the visible nodes;

and determining the main content in the webpage according to the preview value of each visible node.

Preferably, the determining the preview value of each visible node includes:

and determining an initial preview value of each visible node according to the label of each visible node, and taking the initial preview value as the preview value of each visible node.

Preferably, the method further comprises:

when the label of the visible node is a preset label, determining an additional preview value according to the content corresponding to the visible node in the webpage document;

and adding the additional preview value and the initial preview value to obtain a preview value of the visible node.

Preferably, the method further comprises:

and adding the preview value of the visible node into the preview values of all levels of father nodes of the visible node according to a preset proportion.

Preferably, the determining the main content in the web page according to the preview value of each visible node includes:

acquiring two visible nodes with the maximum preview value, and respectively taking the two visible nodes as a highest visible node and a second highest visible node;

determining a visible node where the preview content is located from the highest visible node and the next highest visible node according to the hierarchical relationship of the highest visible node and the next highest visible node;

and extracting the main content in the webpage from the visible node where the preview content is determined to be located.

Preferably, the determining, according to the hierarchical relationship between the highest visible node and the next highest visible node, the visible node where the preview content is located from the highest visible node and the next highest visible node includes:

if the highest visible node is any level parent node of the next highest visible node, taking the highest visible node as a visible node where the preview content is located; or

if the highest visible node is not a parent node, at any level, of the next highest visible node, determining the visible node where the preview content is located according to the positional relationship, in the web page, of the content corresponding to the highest visible node and the next highest visible node.

Preferably, the determining the visible node where the preview content is located according to the position relationship of the content in the webpage corresponding to the highest visible node and the second highest visible node includes:

if the position relation is left-right and not coincident, respectively calculating the sum of preview values of nodes of which the labels are preset labels and which are included in the highest visible node and the next highest visible node, and taking the visible node with the larger sum as the visible node where the preview content is located;

and if the node is in other position relations except left-right and non-coincident, taking the common father node of the highest visible node and the next highest visible node as the visible node where the preview content is located.

Preferably, the node for loading the media resource class related to the main content according to the main content includes:

acquiring a DOM tree and a render tree generated by analyzing the webpage;

determining nodes of media resource classes related to the main body content according to the visible nodes in the DOM tree and the render tree;

and loading the nodes of the media resource classes related to the main body content.

Preferably, the webpage is a current webpage or a next page of the current webpage.

In another aspect, a web page processing apparatus is provided, the apparatus including:

the first identification unit is used for identifying the node type in the webpage loading process;

the first loading unit is used for loading the nodes of the non-media resource types in the webpage according to the node types in the webpage;

the second identification unit is used for identifying the main content in the webpage according to the loaded nodes of the non-media resource class;

the second loading unit is used for loading the nodes of the media resource classes related to the main body content according to the main body content;

and the display unit is used for displaying the loaded main body content and the nodes of the media resource classes related to the main body content.

Preferably, the second identification unit includes:

the first acquisition unit is used for acquiring a DOM tree and a render tree generated by analyzing the webpage;

the first determining unit is used for determining visible nodes in the DOM tree according to the render tree and determining preview values of the visible nodes;

and the second determining unit is used for determining the main content in the webpage according to the preview value of each visible node.

Preferably, the first determining unit includes:

a first determining subunit, configured to determine, according to the render tree, a visible node in the DOM tree;

and the second determining subunit is configured to determine an initial preview value of each visible node according to the label of each visible node, and use the initial preview value as a preview value of each visible node.

Preferably, the first determining unit further includes:

a third determining subunit, configured to determine, when the tag of the visible node is a preset tag, an additional preview value according to content corresponding to the visible node in the web document;

and the fourth determining subunit is configured to add the additional preview value to the initial preview value to obtain a preview value of the visible node.

Preferably, the first determining unit further includes:

and the adding subunit is used for adding the preview value of the visible node into the preview values of all levels of parent nodes of the visible node according to a preset proportion.

Preferably, the second determination unit includes:

a second obtaining unit, configured to obtain two visible nodes with a largest preview value, where the two visible nodes are respectively used as a highest visible node and a second highest visible node;

a fifth determining subunit, configured to determine, according to a hierarchical relationship between the highest visible node and a next-highest visible node, a visible node where the preview content is located from the highest visible node and the next-highest visible node;

and the extracting unit is used for extracting the main content in the webpage from the visible node where the preview content is located.

Preferably, the fifth determining subunit is specifically configured to, when the highest visible node is any level parent node of the next highest visible node, use the highest visible node as a visible node where the preview content is located; or, when the highest visible node is not any level parent node of the second highest visible node, determining the visible node where the preview content is located according to the position relationship of the content in the webpage corresponding to the highest visible node and the second highest visible node.

Preferably, the second loading unit includes:

a third obtaining unit, configured to obtain a DOM tree and a render tree generated by parsing the web page;

a third determining unit, configured to determine, according to the visible node and the render tree in the DOM tree, a node of a media resource class related to the main content;

and the loading subunit is used for loading the nodes of the media resource classes related to the main body content.

Preferably, the webpage identified by the identification unit is a current webpage or a next page of the current webpage.

In yet another aspect, a web page processing apparatus is provided, the apparatus including:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to:

in the process of loading a webpage, identifying the node type in the webpage;

Some benefits of the present disclosure may include:

in the process of loading the web page, the nodes of the non-media resource class are loaded first, the main content of the web page is identified, and then the nodes of the media resource class related to the main content are loaded; the main content and the related media resource nodes are then displayed. In other words, media resource nodes are filtered during loading, which saves the user's data traffic and improves access efficiency and the user's reading satisfaction.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

In order to more clearly illustrate the present disclosure or technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present disclosure, and other drawings can be obtained by those skilled in the art without inventive efforts.

FIG. 1 is a flowchart illustrating a method of web page processing according to an exemplary embodiment of the present disclosure;

FIG. 2 is a schematic diagram illustrating a DOM tree disclosed in accordance with an illustrative embodiment;

FIG. 3 is a block diagram illustrating an exemplary embodiment of a web page processing apparatus according to the present disclosure;

FIG. 4 is another schematic diagram illustrating a web page processing apparatus according to an exemplary embodiment of the present disclosure;

FIG. 5 is another schematic diagram illustrating a web page processing apparatus according to an exemplary embodiment of the present disclosure;

FIG. 6 is another schematic diagram illustrating a web page processing apparatus according to an exemplary embodiment of the present disclosure;

FIG. 7 is another structural schematic of a web page processing apparatus shown in accordance with an exemplary embodiment of the present disclosure;

FIG. 8 is a block diagram illustrating an apparatus for web page processing according to an example embodiment.

Detailed Description

The technical solutions of the present disclosure will be described clearly and completely with reference to the accompanying drawings of the present disclosure, and it is to be understood that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.

The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.

It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present disclosure. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.

Referring to fig. 1, fig. 1 is a flowchart illustrating a web page processing method according to an exemplary embodiment of the present disclosure; the method comprises the following steps:

in step 101, in the process of loading a webpage, identifying a node type in the webpage;

in this embodiment, the node types in the web page may include, in addition to the nodes of the main displayed text content, title nodes, related link nodes, and the like, which are collectively referred to as non-media resource nodes, and of course, the web page also includes some media resource nodes, such as picture type nodes, video type nodes, and the like.

In this embodiment, when receiving a web page accessed by a user, a browser loads the web page, and in the process of loading the web page, a node type in the web page is identified first, and a specific identification process thereof is well known to those skilled in the art and is not described herein again.

In step 102, loading nodes of non-media resource types in the webpage according to the node types in the webpage;

in this step, after the node type in the web page is identified, only the node of the non-media resource class in the web page is loaded, that is, the node of the media resource class in the web page is filtered.

That is to say, in this embodiment, what is loaded is the content information of all the nodes of the non-media resource class, which is generally their text content; in other words, only the text content of the web page is loaded.
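Steps 101 and 102 can be illustrated with a minimal sketch. It is not the implementation of the disclosure: the `Node` class, the `MEDIA_TAGS` set, and the function names are hypothetical, and the sketch assumes that a node's type can be read directly from its tag. The source names only picture and video nodes as media resources.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumption: media resource nodes are recognized by tag name.
MEDIA_TAGS = {"img", "video"}


@dataclass
class Node:
    tag: str
    text: str = ""
    children: list[Node] = field(default_factory=list)


def is_media(node: Node) -> bool:
    """Step 101: identify the node type (media or non-media)."""
    return node.tag in MEDIA_TAGS


def split_by_type(root: Node) -> tuple[list[Node], list[Node]]:
    """Step 102: collect non-media nodes for loading and set media nodes aside."""
    non_media, media = [], []
    stack = [root]
    while stack:
        node = stack.pop()
        (media if is_media(node) else non_media).append(node)
        # Push children in reverse so they are visited in document order.
        stack.extend(reversed(node.children))
    return non_media, media


page = Node("body", children=[
    Node("h1", "Headline"),
    Node("img"),
    Node("p", "Article text."),
    Node("video"),
])
loaded, deferred = split_by_type(page)
print([n.tag for n in loaded])    # non-media nodes, loaded immediately
print([n.tag for n in deferred])  # media nodes, held back for later filtering
```

The deferred list is what the later steps would filter, so that only media nodes related to the main content are eventually fetched.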

In step 103, identifying the main content in the webpage according to the loaded nodes of the non-media resource class;

wherein identifying subject content in the web page comprises:

1) acquiring a DOM tree and a render tree generated by analyzing the webpage;

A user first accesses a web page with the browser of a mobile terminal, and the browser identifies the web page document of that page, such as an HTML document. When the page is accessed, the browser parses the web page document to generate a Document Object Model (DOM) tree and a render tree. The DOM tree describes the page information of the web page; the render tree is used to lay out the page and, in particular, describes where the information in the DOM tree is positioned on the screen of the mobile terminal.

2) Determining visible nodes in the DOM tree according to the render tree, and determining preview values of the visible nodes;

wherein determining the preview value of each visible node comprises: and determining an initial preview value of each visible node according to the label of each visible node, and taking the initial preview value as the preview value of each visible node.

Each node obtained by analyzing the web Document is stored in the DOM tree, for example, a root node of the DOM tree is a file (Document) object, that is, an entry for operating the web Document; as another example, a class of child nodes in a DOM tree are text (text) objects, i.e., some of the textual content in a web page document.

As can be seen from the above, the DOM tree has different types of nodes, and the roles of the nodes of the different types in the web document are different, so that the content corresponding to a part of the nodes in the DOM tree can be displayed in the web page, such as the nodes representing the text object, the nodes representing the img object, and the like; the content corresponding to another part of the nodes can not be displayed in the webpage, such as the root node representing the Document object.

If the content corresponding to the node can be displayed in the webpage, taking the node as a visible node, namely the visible node is the node corresponding to the content displayed in the webpage; and if the content corresponding to the node cannot be displayed in the webpage, taking the node as an invisible node.

When the page is rendered, the render tree is responsible for recording the estimated display position of the visible node in the DOM tree in the page, so that for each node in the DOM tree, whether the node is visible or not can be determined by inquiring the render tree, and therefore each visible node in the DOM tree can be determined.

In addition, since a web page contains various kinds of information, such as titles, text, and links, the visible nodes in the DOM tree differ in the specific content they represent in the web page document. In the present disclosure, a corresponding preview value can be configured for each visible node according to the requirements of the web page re-layout. The preview value can be configured according to the type of content the node represents, such as pictures or text, or it can be generated from the specific content that each visible node represents in the web page.

For example, if pictures in the page are to be emphasized in the re-layout, the tags associated with pictures may be configured with a higher preview value. As another example, if the text content of the page is to be emphasized in the re-layout, a higher preview value may be configured for the tags related to the title and the text, or the preview value may be determined according to the amount of text.

The preview value of the visible node may identify a possibility that the content corresponding to the visible node is previewed in the webpage. Therefore, after the preview value of the corresponding visible node is determined according to the label of each visible node, the possibility that the content corresponding to the visible node is previewed in the webpage can be obtained.

Visible nodes in the DOM tree are determined according to the render tree as follows: each node of the DOM tree is obtained in turn and checked against the render tree. If the node exists in the render tree, it is a visible node; otherwise, it is an invisible node.

Of course, checking whether a node exists in the render tree is only one way of determining visible nodes. Alternatively, when traversing the nodes of the DOM tree, the attributes of each node and their values can be inspected: nodes in the DOM tree have some display-related attributes, and whether a node is visible can be determined from the values of those attributes.
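The two visibility checks described above can be sketched as follows. This is an illustration under stated assumptions, not a browser's actual data structures: the DOM is modelled as a dictionary keyed by hypothetical node ids, the render tree as a mapping from node id to an estimated position, and `display: none` is assumed to be the display-related attribute that hides a node.

```python
# Hypothetical DOM: node id -> properties. The ids and styles are made up.
dom = {
    "document": {"tag": "#document"},
    "headline": {"tag": "h1"},
    "hidden_note": {"tag": "div", "style": {"display": "none"}},
    "body_text": {"tag": "p"},
}

# Hypothetical render tree: node id -> estimated (x, y, width, height).
render_tree = {
    "headline": (0, 0, 320, 40),
    "body_text": (0, 40, 320, 400),
}


def visible_by_render_tree(dom, render_tree):
    """A node is visible if and only if it exists in the render tree."""
    return [node_id for node_id in dom if node_id in render_tree]


def visible_by_attributes(dom):
    """Alternative check based on display-related attribute values."""
    visible = []
    for node_id, node in dom.items():
        if node["tag"] == "#document":
            continue  # the Document root itself is never displayed
        if node.get("style", {}).get("display") == "none":
            continue  # assumed hiding attribute
        visible.append(node_id)
    return visible


print(visible_by_render_tree(dom, render_tree))
print(visible_by_attributes(dom))
```

On this sample data both checks select the same two nodes, which is the behaviour the text expects when the two methods are interchangeable.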

Determining the preview value of each visible node includes: determining an initial preview value of each visible node according to its label, and taking the initial preview value as the preview value of that visible node.

Since the visible nodes in the DOM tree represent different contents in the web document, each visible node also has a different tag.

Specifically, the tag may be regarded as an identifier of a node and is set when the web page document is written. For example, if the content of the title portion of the web page document is <title>xxxx-XXX-XX</title>, that content corresponds to a node in the DOM tree that starts at <title> and ends at </title>; the node is a visible node, and its tag is title.

In this embodiment, an initial preview value may be configured for each tag, and the size of the initial preview value may be flexibly set according to the requirement of re-layout.

Therefore, for each visible node, its label indicates both the content it displays in the web page and the likelihood that the node will be previewed, so the initial preview value of each visible node can be determined according to its label and taken as the preview value of that visible node.
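A per-tag lookup table is one simple way to realize this configuration. The numbers below are purely illustrative, since the source does not specify any values; the table name and the default of 0 for unlisted tags are assumptions.

```python
# Hypothetical configuration for a re-layout that emphasizes text content.
INITIAL_PREVIEW_VALUE = {
    "title": 10,
    "text": 5,
    "img": 3,
    "a": 1,
}


def initial_preview_value(tag: str) -> int:
    """Return the configured initial preview value, or 0 for unlisted tags."""
    return INITIAL_PREVIEW_VALUE.get(tag, 0)


for tag in ("title", "text", "script"):
    print(tag, initial_preview_value(tag))
```

A re-layout that emphasizes pictures would use the same lookup with a table in which `img` carries the largest value.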

Preferably, when the label of the visible node is a preset label, determining an additional preview value according to the content corresponding to the visible node in the webpage document; and adding the additional preview value and the initial preview value of the preset label to obtain the preview value of the visible node. Further, the preview value of the visible node can be added to the preview values of all levels of parent nodes of the visible node according to a preset proportion.
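The propagation to parent nodes mentioned above can be sketched as follows. The source says only that the value is added "according to a preset proportion", so the details here are assumptions: a single ratio of 0.5 is compounded once per level, and each node keeps its own value separately so that the order of propagation does not matter. A flat ratio applied equally to every ancestor would be equally consistent with the text.

```python
class Node:
    def __init__(self, name, own_value=0.0, parent=None):
        self.name = name
        self.own_value = own_value      # value from the node's own label/content
        self.preview_value = own_value  # running total including contributions
        self.parent = parent


def propagate_to_ancestors(node, ratio=0.5):
    """Add a share of the node's own value to every ancestor, decaying per level."""
    share = node.own_value
    ancestor = node.parent
    while ancestor is not None:
        share *= ratio
        ancestor.preview_value += share
        ancestor = ancestor.parent


root = Node("root")
section = Node("section", own_value=2.0, parent=root)
paragraph = Node("paragraph", own_value=8.0, parent=section)

for n in (paragraph, section):
    propagate_to_ancestors(n)

print(section.preview_value, root.preview_value)
```

With this scheme a container that holds many high-value nodes accumulates a high preview value itself, which is what later allows a parent node to be selected as the node where the preview content is located.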

Even when visible nodes have the same label, their specific content in the web page differs, so the likelihood that they are previewed during re-layout also differs. For example, pictures that accompany the body text of a web page are usually larger than pictures that accompany recommended content, they are displayed prominently on the page, and they are often the pictures the user actually wants to view. The preview values of visible nodes corresponding to pictures of different sizes can therefore differ.

As another example, in the text content of a web page, the abstract portion usually contains far less text than the body text portion; the body text is displayed prominently, and it is usually what the user mainly reads. The preview value of the visible node corresponding to the body text and that of the visible node corresponding to the abstract may therefore differ.

Therefore, in this embodiment, when the label of the visible node is the preset label, the content corresponding to the visible node in the webpage may be obtained, and then the additional preview value of the content is calculated, that is, the preview value of the visible node may be affected by the amount and size of the content.

For example, when the text content is emphasized, the label of the visible node corresponding to the body text is generally text, and the label of the visible node corresponding to the abstract is text as well. In this case, additional preview values can be configured for the length, characters, punctuation marks, and so on of the content of visible nodes labeled text, so that each such node can calculate its additional preview value from its specific content. This distinguishes how visible nodes with the same label are previewed during re-layout.

The preset label can be set according to specific requirements, and the preset label is text when the text content is highlighted; when the picture is displayed emphatically, the preset label is img.

After the additional preview value of the visible node is obtained, the additional preview value is added and summed with the initial preview value of the preset label, so that the preview value of the visible node is calculated, namely the sum of the additional preview value and the initial preview value.
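As a worked illustration of this sum, the sketch below scores a text node by content length and punctuation count. The scoring rule (one point per 20 characters plus one point per punctuation mark), the initial values, and all names are assumptions made for the example; the source states only that length, characters, and punctuation can contribute.

```python
PRESET_TAG = "text"                       # assumed preset label (text emphasized)
INITIAL = {"text": 5, "title": 10}        # hypothetical initial preview values
PUNCTUATION = set(",.;:!?，。；：！？")


def additional_preview_value(content: str) -> int:
    """Assumed rule: 1 point per 20 characters plus 1 point per punctuation mark."""
    length_score = len(content) // 20
    punctuation_score = sum(1 for ch in content if ch in PUNCTUATION)
    return length_score + punctuation_score


def preview_value(tag: str, content: str) -> int:
    """Preview value = initial value, plus the additional value for preset-label nodes."""
    value = INITIAL.get(tag, 0)
    if tag == PRESET_TAG:
        value += additional_preview_value(content)
    return value


abstract = "Short summary."
body = "A much longer paragraph, with several clauses; it reads like real article text."

print(preview_value("text", abstract))
print(preview_value("text", body))
```

Both nodes carry the same label, yet the longer body text ends up with the larger preview value, which is the distinction the additional preview value is meant to capture.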

That is to say, in this embodiment, an initial preview value corresponding to a tag may be set according to a requirement, and the initial preview value is used as a preview value of a visible node corresponding to the tag, so that the possibility of previewing the visible node of each tag may be determined according to the requirement. Furthermore, in order to make the page layout have important points and meet the requirements of users, an additional preview value of a preset label can be set, and the additional preview value can be determined according to the content in the webpage corresponding to the visible node, so that the preview value of the visible node is more accurate, and the displayed preview content and the like have prominent points.

3) And determining the main content in the webpage according to the preview value of each visible node.

In the present disclosure, the preview content, i.e., the main content, in the webpage may be determined according to the preview value of the visible node, for example, a standard of the preview value may be set, and all the visible nodes whose preview values meet the standard may have their corresponding content set as the preview content. For another example, only the content corresponding to the visible node with the highest preview value may be taken as the preview content. After the preview content is acquired, the main content to be displayed is rearranged.
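The two simple selection rules mentioned above can be sketched as follows. The preview values are hypothetical, and "meeting the standard" is assumed to mean being greater than or equal to a threshold.

```python
# Hypothetical preview values for five visible nodes.
preview_values = {"A": 12, "B": 30, "C": 48, "D": 50, "N": 41}


def nodes_meeting_standard(values, standard):
    """All visible nodes whose preview value reaches the standard."""
    return [name for name, value in values.items() if value >= standard]


def node_with_highest_value(values):
    """Only the single visible node with the highest preview value."""
    return max(values, key=values.get)


print(nodes_meeting_standard(preview_values, 40))
print(node_with_highest_value(preview_values))
```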

Wherein determining the main content in the webpage specifically comprises:

1) acquiring two visible nodes with the maximum preview value, and respectively taking the two visible nodes as a highest visible node and a second highest visible node;

after the preview values of all the visible nodes are obtained, the preview values of all the visible nodes can be counted and compared, so that two visible nodes with the largest preview values are obtained and are respectively used as the highest visible node and the second highest visible node.
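Picking the two nodes can be done in one pass with the standard library's `heapq.nlargest`. The values for D (50) and C (48) follow the example used later in this description; the other entries are hypothetical.

```python
import heapq

# Preview values per visible node; A, B and N are made-up entries.
preview_values = {"A": 12, "B": 30, "C": 48, "D": 50, "N": 41}

# nlargest returns the keys ordered from largest to smallest preview value.
highest, second_highest = heapq.nlargest(2, preview_values, key=preview_values.get)

print(highest, second_highest)
```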

2) Determining a visible node where preview content is located from the highest visible node and the next highest visible node according to the hierarchical relationship between the highest visible node and the next highest visible node;

the hierarchical relationship between the highest visible node and the next highest visible node can be determined according to the hierarchical relationship between the visible nodes, and then the visible node for executing preview, namely the visible node where the preview content is located, is determined according to the hierarchical relationship. The method specifically comprises the following steps:

A) and if the highest visible node is any level parent node of the next highest visible node, taking the highest visible node as the visible node where the preview content is located.

If detection shows that the highest visible node is a parent node, at any level, of the next highest visible node, the two nodes have a certain association. For example, the highest visible node may correspond to the title portion of a text in the web page and the next highest visible node to the body content of that text, so the highest visible node can be taken as the visible node where the preview content is located.

This embodiment is not limited to displaying only the content corresponding to the visible node where the preview content is located. Rather, when the web page is re-laid out, the content to be displayed is extracted from that visible node, and display starts from that content.

For example, if the highest visible node in fig. 2 is D and the next highest visible node is N, where D is the 2-level parent node of N, it is seen that D is the visible node where the preview content is located, and the content corresponding to D is displayed from the beginning.
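The check in case A) amounts to walking up the parent chain of the next highest node. In the sketch below, the parent map is hypothetical because FIG. 2 is not reproduced here; the only fact taken from the text is that D is the 2-level parent of N. The intermediate node X and the root A are made up.

```python
# Hypothetical parent map: child -> parent (None marks the root).
parent = {"A": None, "D": "A", "C": "A", "X": "D", "N": "X"}


def ancestor_level(candidate, node, parent):
    """Return how many levels above `node` the `candidate` sits, or 0 if it is not an ancestor."""
    level = 0
    current = parent.get(node)
    while current is not None:
        level += 1
        if current == candidate:
            return level
        current = parent.get(current)
    return 0


def preview_node_case_a(highest, second, parent):
    """Case A: the highest node is chosen when it is an ancestor of the second highest."""
    return highest if ancestor_level(highest, second, parent) > 0 else None


print(ancestor_level("D", "N", parent))       # D is the 2-level parent of N
print(preview_node_case_a("D", "N", parent))
print(preview_node_case_a("D", "C", parent))  # not an ancestor: case B applies
```

A return value of `None` signals that the positional comparison of case B) is needed instead.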

B) And if the highest visible node is not any level father node of the next highest visible node, determining the visible node where the preview content is located according to the position relation of the content in the webpage corresponding to the highest visible node and the next highest visible node.

If the highest visible node is not any level father node of the next highest visible node, it is indicated that the highest visible node and the next highest visible node do not have a direct upper-lower level relationship, and at this time, the position relationship of the content in the webpage corresponding to the highest visible node and the next highest visible node can be detected, so as to determine the visible node where the preview content is located.

The render tree contains the positions in the web page of the content corresponding to the visible nodes, but those positions are generated before layout, and the actual position of specific content may differ once layout is carried out. For example, suppose the content corresponding to the upper-layer visible node of some visible node in the render tree is a picture, and the content of the visible node is displayed only after the picture is placed. The position stored in the render tree for the visible node is then the position after the picture is displayed. If, when layout is actually carried out, the picture is not successfully acquired, it cannot be displayed, so the position of the visible node's content changes; that is, the position recorded in the render tree is not the actual position.

In order to more accurately layout the content in the webpage, the position of the content corresponding to each visible node may be calculated and recorded during layout, and the position relationship between the highest visible node and the next highest visible node is determined according to the recorded position, so as to determine the visible node where the preview content is located. The method comprises the following specific steps:

① If the position relationship is left-right and not coincident:

Respectively calculate the sum of the additional preview values of the nodes with labels as preset labels in the highest visible node and the next highest visible node, and take the visible node with the larger sum as the visible node where the preview content is located.

If the position relation of the content corresponding to the highest visible node and the second highest visible node in the webpage is left-right and not coincident, the content corresponding to the two visible nodes belongs to different parts in the webpage, and therefore the visible node where the preview content is located can be determined according to the amount of the content corresponding to the preset label.

Specifically, since one visible node may be a parent node of a plurality of visible nodes, and the labels of some of those visible nodes are preset labels, the additional preview values of the visible nodes whose labels are preset labels may be summed and used as the additional preview value of the visible node. That is, the visible node where the preview content is located is determined according to the sum of the additional preview values of the preset labels in the highest visible node and in the next highest visible node.

For example, assume that the highest visible node in FIG. 2 is D with a preview value of 50 and the next highest visible node is C with a preview value of 48. D and C do not have parent-child relationship, and the position relationship of the corresponding content in the webpage is left-right and not coincident, then the sum of the additional preview values of the preset labels in D and C can be calculated.

Taking D as an example, assuming that the tags of D, J, and M are preset tags, the sum of the additional preview values of the preset tags in D = the additional preview value of D + the additional preview value of J + the additional preview value of M.

So that the re-laid-out webpage emphasizes the important content, the visible node with the largest sum of the additional preview values of the preset tags can be used as the visible node for executing the preview.

② If the position relationship is anything other than left-right and not coincident:

Take the common parent node of the highest visible node and the next highest visible node as the visible node for executing the preview.

If the position relationship of the contents corresponding to the highest visible node and the second highest visible node in the webpage is not "left-right and not coincident", that is, it is some other position relationship, the contents corresponding to both nodes are relatively important. Therefore, the common parent node of the highest visible node and the second highest visible node can be found and used as the visible node where the preview content is located.

For example, if the highest visible node in fig. 2 is D and the second highest visible node is E, and the position relationship of the content corresponding to D and E in the web page is not "left-right and not coincident", the common parent node B of D and E is used as the visible node for performing the preview.

The common parent node refers to a parent node shared by several visible nodes. This embodiment does not limit whether it is a parent node of the same level for each of them, or which level of parent node it is.

3) And extracting the main content in the webpage from the visible node where the preview content is determined to be located.

The content in the webpage document corresponding to the visible node where the preview content is located can be used as part of the preview content. The preview content is extracted from that visible node; specifically, the preview starts from the beginning of the content in the webpage document corresponding to the visible node where the preview is executed. That is, the beginning of that content is used as the starting point of the preview content, and the main content in the webpage is then extracted from the preview content.

In this step, the DOM tree and the render tree may be acquired, the visible nodes in the DOM tree may be determined, the preview value of each visible node may be determined from its tag according to specific requirements, and the preview content in the web page is finally determined for display. The preview content may be key content extracted from the web page as required, and this key content may also be referred to as the main content.
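The selection between the highest and next highest visible nodes described in cases A) and B) can be sketched as follows. This is only an illustrative sketch under stated assumptions: the `Node` class, the `PRESET_TAGS` set, the rectangle test used for "left-right and not coincident", the choice of the nearest common parent, and the example tree (which only loosely follows the relationships the text mentions for fig. 2) are all hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

PRESET_TAGS = {"div", "p"}  # hypothetical set of preset tags


@dataclass(eq=False)
class Node:
    name: str
    tag: str
    rect: Tuple[float, float, float, float]  # (left, top, right, bottom) recorded at layout
    additional: float = 0.0                  # additional preview value of this node
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def ancestors(node: Node) -> List[Node]:
    """Parent nodes of every level, nearest first."""
    chain = []
    while node.parent is not None:
        node = node.parent
        chain.append(node)
    return chain


def is_ancestor(a: Node, b: Node) -> bool:
    """True if a is a parent node of b at any level."""
    return any(p is a for p in ancestors(b))


def side_by_side(a: Node, b: Node) -> bool:
    """Assumed reading of 'left-right and not coincident':
    horizontal extents are disjoint while vertical extents overlap."""
    h_disjoint = a.rect[2] <= b.rect[0] or b.rect[2] <= a.rect[0]
    v_overlap = a.rect[1] < b.rect[3] and b.rect[1] < a.rect[3]
    return h_disjoint and v_overlap


def additional_sum(node: Node) -> float:
    """Sum of additional preview values of preset-tag nodes in this subtree."""
    own = node.additional if node.tag in PRESET_TAGS else 0.0
    return own + sum(additional_sum(c) for c in node.children)


def common_parent(a: Node, b: Node) -> Optional[Node]:
    """Nearest shared parent; the text does not fix the level, so this is a choice."""
    seen = set(ancestors(a))
    return next((p for p in ancestors(b) if p in seen), None)


def preview_node(highest: Node, second: Node) -> Optional[Node]:
    if is_ancestor(highest, second):           # case A)
        return highest
    if side_by_side(highest, second):          # case B) ①
        return max((highest, second), key=additional_sum)
    return common_parent(highest, second)      # case B) ②


# Hypothetical tree: A > {B > {D > {J > N, M}, E}, C}
a = Node("A", "body", (0, 0, 100, 100))
b = a.add(Node("B", "div", (0, 0, 60, 100)))
c = a.add(Node("C", "div", (60, 0, 100, 100), additional=6))
d = b.add(Node("D", "div", (0, 0, 60, 50), additional=5))
e = b.add(Node("E", "div", (0, 50, 60, 100), additional=2))
j = d.add(Node("J", "p", (0, 0, 60, 25), additional=3))
m = d.add(Node("M", "p", (0, 25, 60, 50), additional=4))
n = j.add(Node("N", "span", (0, 0, 60, 10)))

print(preview_node(d, n).name, preview_node(d, c).name, preview_node(d, e).name)
```

With these made-up numbers, D is chosen over N because D is a parent of N, D is chosen over C because its preset-tag sum (5 + 3 + 4) exceeds C's, and B is chosen for D and E because they are stacked rather than side by side.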

It should be noted that, in this step, determining each visible node in the DOM tree according to the render tree specifically includes: detecting whether a node in the DOM tree is in the render tree and, if so, further detecting the attribute value of the node to determine whether it is a visible node, so that the visible nodes in the DOM tree are determined accurately and quickly.

In this step, an initial preview value corresponding to a tag may be set as required and used as the preview value of the visible node having that tag, so that the likelihood of previewing the visible node of each tag can be determined as required. Furthermore, so that the page layout emphasizes key content and meets user needs, an additional preview value may be set for preset labels; the additional preview value can be determined according to the content in the webpage corresponding to the visible node, which makes the preview value of the visible node more accurate and gives the displayed preview content a clear emphasis.

In this step, the hierarchical relationship of the visible nodes may also be considered: if, when the preview value is calculated, a certain visible node is a parent node of other visible nodes, the preview values of those other visible nodes may be added to the preview value of the visible node according to a certain proportion, so that the preview value, and hence the finally determined preview content, is more accurate.
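As a rough illustration of the three points above (visibility check, initial plus additional preview value, and proportional contribution to parent nodes), consider the following sketch. Every concrete number and name here is an assumption made for the example: the `INITIAL_BY_TAG` table, the `PRESET_TAGS` set, the rule of one point per 100 characters for the additional value, the `PARENT_RATIO` of 0.5 applied once per level, and the use of a `display` field as the attribute checked for visibility.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional

# Hypothetical values; the text only says they are set as required.
INITIAL_BY_TAG: Dict[str, float] = {"div": 5.0, "p": 10.0, "span": 1.0}
PRESET_TAGS = {"p"}
PARENT_RATIO = 0.5  # assumed proportion passed up per level


@dataclass(eq=False)
class DomNode:
    tag: str
    text: str = ""
    in_render_tree: bool = True
    display: str = "block"  # stand-in for the attribute value that is checked
    parent: Optional["DomNode"] = None
    children: List["DomNode"] = field(default_factory=list)
    preview: float = 0.0

    def add(self, child: "DomNode") -> "DomNode":
        child.parent = self
        self.children.append(child)
        return child


def is_visible(node: DomNode) -> bool:
    """A node is visible if it is in the render tree and its attribute allows it."""
    return node.in_render_tree and node.display != "none"


def walk(node: DomNode) -> Iterator[DomNode]:
    yield node
    for child in node.children:
        yield from walk(child)


def own_value(node: DomNode) -> float:
    """Initial value from the tag, plus an additional value for preset tags."""
    value = INITIAL_BY_TAG.get(node.tag, 0.0)
    if node.tag in PRESET_TAGS:
        value += len(node.text) / 100.0  # assumed content-based additional value
    return value


def assign_preview_values(root: DomNode) -> List[DomNode]:
    visible = [n for n in walk(root) if is_visible(n)]
    for n in visible:
        n.preview = own_value(n)
    # Add each node's own value to every level of parent, scaled per level.
    for n in visible:
        share, p = own_value(n), n.parent
        while p is not None:
            share *= PARENT_RATIO
            if is_visible(p):
                p.preview += share
            p = p.parent
    return visible


root = DomNode("div")
article = root.add(DomNode("div"))
p1 = article.add(DomNode("p", text="x" * 200))
p2 = article.add(DomNode("p", text="y" * 100))
hidden = root.add(DomNode("div", display="none"))

visible = assign_preview_values(root)
top_two = sorted(visible, key=lambda n: n.preview, reverse=True)[:2]
print([n.preview for n in top_two])
```

In this toy page the hidden node is excluded, and the two largest preview values belong to `article` and `root`, which would then serve as the highest and next highest visible nodes.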

In the present disclosure, when loading a web page, the browser first identifies only the main content in the web page and then loads the media resources related to the main content; media resources unrelated to the main content are not loaded (i.e., they are filtered out).

In step 104, a node of a media resource class related to the main content is loaded according to the main content.

In this step, after the DOM tree and the render tree generated by parsing the web page are acquired, the nodes of the media resource class related to the main content are determined according to the visible nodes in the DOM tree and the render tree, and those nodes are loaded.

Determining the nodes of the media resource class related to the main content according to the visible nodes in the DOM tree and the render tree comprises the following:

The visible nodes in the DOM tree represent different specific contents in the webpage document and have a hierarchical relationship, wherein an upper-layer visible node is the level-1 parent node of its lower-layer visible node, the level-1 parent node of the upper-layer visible node is the level-2 parent node of the lower-layer visible node, and, by analogy, the level-n parent node of the upper-layer visible node is the level-(n+1) parent node of the lower-layer visible node.

That is, the visible nodes of the DOM tree are hierarchically related: visible nodes in adjacent upper and lower layers are associated with each other. The terms "upper-layer visible node" and "lower-layer visible node" are relative and refer to two visible nodes that are directly connected. The render tree also contains the position in the web page of the content corresponding to each visible node.

Therefore, according to the content corresponding to each visible node, and the association relationship and the position information thereof, the node of the media resource class related to the main content can be determined, that is, the node of the media resource class within the range of the main content can be determined. The nodes of the media resource class may be picture nodes, video nodes, and the like.
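One way to picture "media nodes within the range of the main content" is the filter below. It is a sketch under assumptions, not the disclosed implementation: the `VNode` class, the `MEDIA_TAGS` set, and the rule that a media node counts as related when it is a descendant of the main-content node or its recorded rectangle lies inside the main-content rectangle are all choices made for this example.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple

MEDIA_TAGS = {"img", "video"}  # picture nodes, video nodes
Rect = Tuple[float, float, float, float]  # (left, top, right, bottom)


@dataclass(eq=False)
class VNode:
    tag: str
    rect: Rect
    src: str = ""
    parent: Optional["VNode"] = None
    children: List["VNode"] = field(default_factory=list)

    def add(self, child: "VNode") -> "VNode":
        child.parent = self
        self.children.append(child)
        return child


def walk(node: VNode) -> Iterator[VNode]:
    yield node
    for child in node.children:
        yield from walk(child)


def contains(outer: Rect, inner: Rect) -> bool:
    """True if the inner rectangle lies entirely inside the outer one."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def related_media(root: VNode, main: VNode) -> List[VNode]:
    """Media nodes tied to the main content by hierarchy or by position (assumed rule)."""
    under_main = set(walk(main))
    return [
        n for n in walk(root)
        if n.tag in MEDIA_TAGS
        and (n in under_main or contains(main.rect, n.rect))
    ]


page = VNode("body", (0, 0, 100, 200))
banner = page.add(VNode("img", (0, 0, 100, 20), src="banner.png"))
main = page.add(VNode("div", (0, 20, 70, 200)))
photo = main.add(VNode("img", (5, 30, 65, 80), src="photo.jpg"))
sidebar = page.add(VNode("div", (70, 20, 100, 200)))
ad = sidebar.add(VNode("video", (72, 30, 98, 60), src="ad.mp4"))

to_load = [n.src for n in related_media(page, main)]
print(to_load)
```

Only the picture inside the main-content block is selected; the banner and the sidebar video are left unloaded.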

In this step, after the browser identifies the main content in the webpage, the nodes of the media resource class related to the main content are determined through the DOM tree and the render tree, and then the main content and the nodes of the media resource class related to the main content are stored in the content memory of the mobile terminal, so as to facilitate the subsequent display.

In step 105, the loaded nodes of the main content and the media resource classes related to the main content are displayed.

In the present disclosure, when a web page is loaded, the nodes of the non-media resource class in the web page are loaded first, the main content in the web page is identified, and then only the nodes of the media resource class related to the main content are loaded. Because media resources unrelated to the main content are not loaded, the user's access traffic is saved, access efficiency is improved, and the user's satisfaction in browsing the web page is improved.
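The overall flow, ending with steps 104 and 105, can be sketched end to end as below, together with a rough count of what is actually fetched. The `Resource` class, its `region` label, the sizes, and the `region_with_most_text` heuristic are hypothetical stand-ins; in particular, the heuristic merely takes the place of the preview-value procedure described earlier.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

MEDIA_KINDS = {"image", "video"}


@dataclass(frozen=True)
class Resource:
    name: str
    kind: str     # "text", "image", "video"
    size_kb: int  # made-up transfer size
    region: str   # hypothetical label of the page region the node sits in


def region_with_most_text(loaded: List[Resource]) -> str:
    """Stand-in for main-content identification (not the preview-value method)."""
    totals: Dict[str, int] = {}
    for r in loaded:
        if r.kind == "text":
            totals[r.region] = totals.get(r.region, 0) + r.size_kb
    return max(totals, key=totals.get)


def process_page(
    nodes: List[Resource],
    identify_main: Callable[[List[Resource]], str],
) -> Tuple[List[Resource], List[Resource]]:
    # Identify node types, then load only the non-media nodes.
    media = [n for n in nodes if n.kind in MEDIA_KINDS]
    loaded = [n for n in nodes if n.kind not in MEDIA_KINDS]
    # Identify the main content from the loaded non-media nodes.
    main_region = identify_main(loaded)
    # Step 104: load only the media nodes related to the main content.
    loaded += [m for m in media if m.region == main_region]
    # Step 105: display the main content and its related media.
    displayed = [n for n in loaded if n.region == main_region]
    return displayed, loaded


nodes = [
    Resource("title", "text", 2, "article"),
    Resource("body", "text", 20, "article"),
    Resource("nav", "text", 3, "sidebar"),
    Resource("photo", "image", 300, "article"),
    Resource("ad_banner", "image", 500, "sidebar"),
    Resource("ad_clip", "video", 4000, "sidebar"),
]

displayed, loaded = process_page(nodes, region_with_most_text)
loaded_kb = sum(r.size_kb for r in loaded)
total_kb = sum(r.size_kb for r in nodes)
print([r.name for r in displayed], loaded_kb, total_kb)
```

With these invented sizes, 325 KB is fetched instead of 4825 KB, which is the kind of traffic saving the paragraph above refers to.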

Optionally, in another embodiment, the accessed webpage may be the current webpage or a next page of the current webpage, for example, a subsequent page of the current webpage.

That is to say, in the above embodiment, when a user accesses a web page through a browser on a mobile terminal, the browser may load the main content of the web page first, then load a media resource related to the main content, and finally display the main content and the media resource related to the main content through a screen of the mobile terminal. When the current webpage has a subsequent page and the user wants to browse the subsequent page, at this time, the browser only identifies the main content of the subsequent webpage and loads the media resource related to the main content of the subsequent webpage in the process of loading the subsequent page (or the next page), so that the flow of the user for accessing the webpage is saved, and the access efficiency and the reading experience of the user are improved.

Optionally, in another embodiment, when the user browses the web page through the browser on the mobile terminal, in the process of loading the web page by the browser, the main content and all media resources of the current web page may be identified, and the main content and all media resources of the current web page are displayed; when the browser loads the next page of the current webpage, only the main content of the next page is identified, then, the browser only loads the media resources related to the main content of the next page, and then displays the loaded main content of the next page and the media resources related to the main content of the next page, so that the flow of accessing the next page by a user is saved, and the access efficiency and the reading experience of the user are improved.
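The difference between this embodiment and the previous one is only in which pages the filter applies to. A minimal sketch, assuming a hypothetical `Media` record with a `region` label and an `is_next_page` flag supplied by the caller:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Media:
    name: str
    region: str  # hypothetical label of the page region the media node sits in


def media_to_load(media: List[Media], main_region: str, is_next_page: bool) -> List[Media]:
    """Current page: load all media. Next page: only media related to the main content."""
    if not is_next_page:
        return list(media)
    return [m for m in media if m.region == main_region]


media = [Media("photo", "article"), Media("ad_banner", "sidebar")]
current = [m.name for m in media_to_load(media, "article", is_next_page=False)]
following = [m.name for m in media_to_load(media, "article", is_next_page=True)]
print(current, following)
```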

Optionally, in another embodiment, after the browser loads the media resource related to the main content, the browser may further perform typesetting again on the loaded main content and the media resource to obtain the pre-displayed content on the webpage; the pre-display content is then displayed.

The technology of rearranging the loaded main content and the media resources is well known to those skilled in the art and will not be described herein again.

When the user needs to read the subsequent page in the reading mode, the browser automatically loads the subsequent page in the background, then identifies the main content of the subsequent page, and displays the main content to the user.

In the method, in the process of loading the current webpage or the next webpage of the current webpage, a node of a non-media resource class in the webpage is loaded first, the main content of the webpage or the next webpage of the webpage is identified, then only media resources related to the main content are loaded, and media resources unrelated to the main content are filtered; the internet traffic is saved, and the access efficiency and the reading experience of the user are improved.

Based on the implementation process of the above method, the present disclosure further provides a web page processing apparatus according to an exemplary embodiment, a schematic structural diagram of which is shown in fig. 3. The apparatus includes: a first identification unit 31, a first loading unit 32, a second identification unit 33, a second loading unit 34, and a display unit 35, wherein,

the first identification unit 31 is configured to identify a node type in a web page during loading the web page; the webpage identified by the first identification unit is the current webpage or the next page of the current webpage.

The first loading unit 32 is configured to load a node of a non-media resource class in the web page according to the node type in the web page;

the second identifying unit 33 is configured to identify the main content in the webpage according to the loaded node of the non-media resource class;

the second loading unit 34 is configured to load a node of a media resource class related to the main content according to the main content;

the display unit 35 is configured to display the loaded subject content and nodes of media resource classes related to the subject content.

According to the method and the apparatus, when a webpage is loaded, the browser first identifies the types of the nodes in the webpage, then loads the nodes of the non-media resource class, identifies the main content in the webpage, and then loads the media resources related to the main content. Media resources unrelated to the main content, such as pictures and videos that are not part of the main content of the webpage, are not loaded, so that the user's internet traffic is saved and the user's access efficiency and reading experience are improved.

Optionally, in another embodiment, on the basis of the above embodiment, the second identifying unit 33 includes a first obtaining unit 331, a first determining unit 332, and a second determining unit 333; a schematic structural diagram is shown in fig. 4, wherein,

the first obtaining unit 331, configured to obtain a DOM tree and a render tree generated by parsing the web page;

the first determining unit 332 is configured to determine visible nodes in the DOM tree according to the render tree, and determine a preview value of each visible node;

the second determining unit 333 is configured to determine the main content in the web page according to the preview value of each visible node.

Optionally, in another embodiment, on the basis of the above embodiment, the first determining unit 332 includes: a first determining sub-unit 3321 and a second determining sub-unit 3322, which are schematically shown in fig. 5, wherein,

the first determining subunit 3321 is configured to determine, according to the render tree, the visible nodes in the DOM tree;

the second determining subunit 3322 is configured to determine an initial preview value of each visible node according to the label of each visible node, and use the initial preview value as a preview value of each visible node.

Optionally, in another embodiment, on the basis of the above embodiment, the first determining unit further includes: the third determining subunit is configured to determine an additional preview value according to content corresponding to the visible node in the webpage document when the label of the visible node is a preset label; the fourth determining subunit is configured to add the additional preview value to the initial preview value to obtain a preview value of the visible node.

Optionally, in another embodiment, on the basis of the above embodiment, the first determining unit further includes: and the adding subunit is configured to add the preview value of the visible node to the preview values of all levels of parent nodes of the visible node according to a preset proportion.

Optionally, in another embodiment, on the basis of the foregoing embodiment, the second determining unit 333 includes: a second acquiring unit 3331, a fifth determining sub-unit 3332 and an extracting unit 3333, which are schematically shown in fig. 6. Wherein,

the second obtaining unit 3331, configured to obtain two visible nodes with the largest preview value as a highest visible node and a second highest visible node, respectively; the fifth determining subunit 3332, configured to determine, according to the hierarchical relationship between the highest visible node and the next highest visible node, a visible node where the preview content is located from the highest visible node and the next highest visible node; the extracting unit 3333 is configured to extract the main content in the webpage from the visible node where the determined preview content is located.

Wherein the fifth determining subunit 3332 is specifically configured to, when the highest visible node is any level parent node of the next highest visible node, regard the highest visible node as a visible node where the preview content is located; or, when the highest visible node is not any level parent node of the second highest visible node, determining the visible node where the preview content is located according to the position relationship of the content in the webpage corresponding to the highest visible node and the second highest visible node.

Optionally, in another embodiment, on the basis of the above embodiment, the second loading unit 34 includes a third obtaining unit 341, a third determining unit 342, and a loading subunit 343; a schematic structural diagram is shown in fig. 7, wherein,

the third obtaining unit 341 is configured to obtain a DOM tree and a render tree generated by parsing the web page; the third determining unit 342 is configured to determine a node of a media resource class related to the main content according to the visible node and the render tree in the DOM tree; the loading subunit 343 is configured to load a node of a media resource class related to the body content.

The implementation process of the functions and actions of each unit in the device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.

Correspondingly, the present disclosure also provides a web page processing apparatus, the apparatus includes: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to: in the process of loading a webpage, identifying the node type in the webpage; loading nodes of non-media resource classes in the webpage according to the node types in the webpage; identifying the main content in the webpage according to the loaded nodes of the non-media resource class; loading a node of a media resource class related to the main body content according to the main body content; and displaying the loaded main body content and the nodes of the media resource classes related to the main body content.

For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the disclosed solution. One of ordinary skill in the art can understand and implement it without inventive effort.

Accordingly, the present disclosure also provides a mobile terminal comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured for execution by the one or more processors to include instructions for: in the process of loading a webpage, identifying the node type in the webpage; loading nodes of non-media resource classes in the webpage according to the node types in the webpage; identifying the main content in the webpage according to the loaded nodes of the non-media resource class; loading a node of a media resource class related to the main body content according to the main body content; and displaying the loaded main body content and the nodes of the media resource classes related to the main body content.

Fig. 8 is another schematic structural diagram of an apparatus 800 for web page processing according to an exemplary embodiment of the present disclosure. For example, the apparatus 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.

Referring to fig. 8, the apparatus 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.

The processing component 802 generally controls overall operation of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.

The memory 804 is configured to store various types of data to support operations at the apparatus 800. Examples of such data include instructions for any application or method operating on device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.

Power components 806 provide power to the various components of device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 800.

The multimedia component 808 includes a screen that provides an output interface between the device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.

The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the apparatus 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.

The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.

The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the device 800. For example, the sensor assembly 814 may detect the open/closed status of the device 800 and the relative positioning of components, such as a display and keypad of the device 800. The sensor assembly 814 may also detect a change in the position of the device 800 or a component of the device 800, the presence or absence of user contact with the device 800, the orientation or acceleration/deceleration of the device 800, and a change in the temperature of the device 800. Sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 816 is configured to facilitate communications between the apparatus 800 and other devices in a wired or wireless manner. The device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.

In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the processor 820 of the device 800 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

A non-transitory computer readable storage medium having instructions therein, which when executed by a processor of a mobile terminal, enable the mobile terminal to perform a web page processing method, the method comprising: in the process of loading a webpage, identifying the node type in the webpage; loading nodes of non-media resource classes in the webpage according to the node types in the webpage; identifying the main content in the webpage according to the loaded nodes of the non-media resource class; loading a node of a media resource class related to the main body content according to the main body content; and displaying the loaded main body content and the nodes of the media resource classes related to the main body content.

Other embodiments of the present disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A method for processing a web page, comprising:

in the process of loading a webpage, identifying the node type in the webpage;

2. The method of claim 1, wherein the identifying the subject content in the web page comprises:

acquiring a DOM tree and a render tree generated by analyzing the webpage;

3. The method of claim 2, wherein determining the preview value for each visible node comprises:

4. The method of claim 3, further comprising:

5. The method of claim 4, further comprising:

6. The method of claim 2, wherein the determining the main content in the web page according to the preview value of each visible node comprises:

7. The method of claim 6, wherein the determining the visible node from the highest visible node and the next highest visible node where the preview content is located according to the hierarchical relationship between the highest visible node and the next highest visible node comprises:

8. The method according to claim 7, wherein the determining the visible node where the preview content is located according to the position relationship of the content in the webpage corresponding to the highest visible node and the second highest visible node comprises:

if the position relation is left-right and not coincident, respectively calculating the sum of preview values of nodes of which the labels are preset labels and which are included in the highest visible node and the next highest visible node, and taking the visible node with the larger sum as the visible node where the preview content is located; or

9. The method as claimed in claim 1, wherein the node loading the media resource class related to the subject content according to the subject content comprises:

acquiring a DOM tree and a render tree generated by analyzing the webpage;

10. The method of any one of claims 1 to 9, wherein the web page is a current web page or a next page of the current web page.

11. A web page processing apparatus, comprising:

12. The apparatus of claim 11, wherein the second identification unit comprises:

13. The apparatus of claim 12, wherein the first determining unit comprises:

14. The apparatus of claim 13, wherein the first determining unit further comprises:

15. The apparatus of claim 14, wherein the first determining unit further comprises:

16. The apparatus of claim 12, wherein the second determining unit comprises:

17. The apparatus according to claim 16, wherein the fifth determining subunit is configured to, when the highest visible node is any parent node of the next highest visible node, take the highest visible node as the visible node where the preview content is located; or, when the highest visible node is not any level parent node of the second highest visible node, determining the visible node where the preview content is located according to the position relationship of the content in the webpage corresponding to the highest visible node and the second highest visible node.

18. The apparatus of claim 11, wherein the second loading unit comprises:

19. The apparatus according to any one of claims 11 to 18, wherein the web page identified by the first identifying unit is a current web page or a next page of the current web page.

20. A web page processing apparatus, comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to:

in the process of loading a webpage, identifying the node type in the webpage;