CN109948095B - Method, device, terminal and storage medium for displaying webpage content - Google Patents

Publication number
CN109948095B
Authority
CN
China
Prior art keywords
webpage
content
web page
text content
node
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711202503.7A
Other languages
Chinese (zh)
Other versions
CN109948095A (en
Inventor
张枫枫
孟德全
胡晶晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201711202503.7A
Publication of CN109948095A
Application granted
Publication of CN109948095B

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a method, a device and a storage medium for displaying webpage content, and belongs to the technical field of networks. The method comprises the following steps: acquiring webpage elements of a first webpage to be displayed; determining non-text content and text content of the first webpage from webpage content of the first webpage according to the label of the webpage element; and displaying a second webpage which accords with a preset format, wherein the second webpage comprises the non-text content and the text content of the first webpage. Because both the text content and the non-text content can be determined from the webpage content of the first webpage, the second webpage can display the non-text content of the first webpage as well as its text content. This avoids the large difference between the content before and after typesetting that arises when the non-text content is filtered out, and so improves accuracy.

Description

Method, device, terminal and storage medium for displaying webpage content
Technical Field
The present invention relates to the field of network technologies, and in particular, to a method, an apparatus, a terminal, and a storage medium for displaying web page content.
Background
Many applications (APPs) provide a collection function so that a user who reads information web page content can easily find that content again later. The collection function of an application can collect not only web page content within that application but also web page content from other applications. Because each application typesets its web pages in a different style, the terminal, in order to present a uniform style, re-typesets the web page content when the user views a collected web page in the application, and displays the typeset web page content.
At present, the server of the application may typeset the web page content as follows: when a user reads the web page content of a collected web page, the terminal sends the web page address of the web page to the server; the server pulls the web page content according to the web page address, extracts the text content of the web page content, typesets the extracted text content according to a preset format, and sends the typeset text content to the terminal. The terminal receives and displays the typeset text content.
In the process of implementing the invention, the inventor finds that the prior art has at least the following problems:
In the above method, the server extracts only the text content of the web page content and typesets it. Because the non-text content is filtered out, the typeset web page content differs considerably from the web page content before typesetting; that is, accuracy is poor.
Disclosure of Invention
The invention provides a method, a device, a terminal and a storage medium for displaying webpage content, which can solve the problem of poor display accuracy. The technical scheme is as follows:
in one aspect, the present invention provides a method for displaying web page content, the method comprising:
acquiring webpage elements of a first webpage to be displayed;
determining non-text content and text content of the first webpage from webpage content of the first webpage according to the label of the webpage element;
and displaying a second webpage which accords with a preset format, wherein the second webpage comprises the non-text content and the text content of the first webpage.
In one aspect, the present invention provides a method for displaying content of a favorite web page, where the method includes:
displaying collection items of at least one collected webpage, wherein the collection items of any webpage comprise a webpage address of any webpage;
acquiring webpage elements of the first webpage according to the webpage address of the selected first webpage;
determining non-text content and text content of the first webpage from webpage content of the first webpage according to the label of the webpage element;
and displaying a second webpage which accords with a preset format, wherein the second webpage comprises the non-text content and the text content of the first webpage.
In one aspect, the present invention provides an apparatus for displaying web page content, the apparatus comprising:
the acquisition module is used for acquiring webpage elements of a first webpage to be displayed;
the determining module is used for determining the non-text content and the text content of the first webpage from the webpage content of the first webpage according to the label of the webpage element;
and the display module is used for displaying a second webpage which accords with a preset format, and the second webpage comprises the non-text content and the text content of the first webpage.
In one aspect, the present invention provides an apparatus for displaying favorite web page content, the apparatus comprising:
the display module is used for displaying collection items of at least one collected webpage, and the collection items of any webpage comprise a webpage address of any webpage;
the acquisition module is used for acquiring webpage elements of the first webpage according to the webpage address of the selected first webpage;
the determining module is used for determining the non-text content and the text content of the first webpage from the webpage content of the first webpage according to the label of the webpage element;
the display module is further used for displaying a second webpage conforming to a preset format, and the second webpage comprises the non-text content and the text content of the first webpage.
In one aspect, the present invention provides a terminal, including a processor and a memory, where at least one instruction, at least one program, a code set, or a set of instructions is stored in the memory, and the instruction, the program, the code set, or the set of instructions is loaded and executed by the processor to implement the operations performed in the method for displaying web page content.
In one aspect, the present invention provides a terminal, where the terminal includes a processor and a memory, where the memory stores at least one instruction, at least one program, a code set, or a set of instructions, and the instruction, the program, the code set, or the set of instructions are loaded and executed by the processor to implement the operations performed in the method for displaying favorite web page contents.
In one aspect, the present invention provides a computer-readable storage medium, which stores at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement the operations performed in the method for displaying web page content.
In one aspect, the present invention provides a computer-readable storage medium, which stores at least one instruction, at least one program, a set of codes, or a set of instructions, wherein the instruction, the program, the set of codes, or the set of instructions is loaded and executed by a processor to implement the operations performed in the method for displaying favorite web page contents described above.
In the method for displaying webpage content provided by the embodiment of the invention, webpage elements of a first webpage to be displayed are obtained, and the non-text content and the text content of the first webpage are determined from the webpage content of the first webpage according to the label of the webpage element. Because both the text content and the non-text content can be determined from the webpage content of the first webpage, the second webpage can display the non-text content of the first webpage as well as its text content. This avoids the large difference between the content before and after typesetting that arises when the non-text content is filtered out, and so improves accuracy.
Drawings
FIG. 1A is a schematic diagram of an implementation environment provided by embodiments of the invention;
FIG. 1B is a schematic diagram of a display sharing interface according to an embodiment of the present invention;
FIG. 1C is a schematic diagram illustrating a web page address of a favorite web page according to an embodiment of the present invention;
FIG. 1D is a schematic diagram illustrating a display of a prompt message according to an embodiment of the present invention;
FIG. 2A is a flowchart of a method for displaying web page content according to an embodiment of the present invention;
FIG. 2B is a diagram illustrating a display collection interface according to an embodiment of the present invention;
FIG. 2C is a diagram illustrating a web page content of a first web page according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method for displaying content of a favorite web page according to an embodiment of the present invention;
FIG. 4A is a schematic diagram of an apparatus for displaying web content according to an embodiment of the present invention;
FIG. 4B is a schematic structural diagram of a determining module according to an embodiment of the present invention;
FIG. 4C is a schematic structural diagram of an apparatus for displaying web content according to an embodiment of the present invention;
FIG. 4D is a schematic structural diagram of a display module according to an embodiment of the disclosure;
FIG. 5 is a schematic structural diagram of an apparatus for displaying content of a favorite web page according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
An embodiment of the present invention provides an implementation environment. Referring to FIG. 1A, the implementation environment includes a terminal 101 and a resource server 102. The terminal 101 may be any device on which an application having a collection function is installed, such as a mobile phone terminal, a PAD (Portable Android Device) terminal, or a computer terminal. The terminal 101 may access the resource server 102 through a network to obtain a service provided by the resource server 102; the service may be a web content service, such as a news service or a public service.
The terminal 101 can access the resource server 102 through any application installed on the terminal 101. When web page content provided by the resource server 102 is accessed, the terminal 101 can collect the accessed content through any application having a collection function, so that the content can be quickly retrieved from the favorites the next time it needs to be consulted. Collection means that the address and/or content data of the web page content to be collected are stored on the server corresponding to the application client. For example, when web page content is displayed on an application client, triggering the collection function on that application client stores the web page address and/or content data of the web page content on the server corresponding to the application client, which completes the collection of the web page content. Of course, the application having the collection function and the application used to access the web page content may be different applications. For example, when the terminal 101 is used to read information, web pages, public account articles and other content on a first application client, triggering a shortcut function on the first application client, such as a collection function, may do any of the following: call or trigger an interface of another application client having the collection function (a second application client), thereby establishing a communication connection for transferring information or data between the first application client and the second application client; send a preset trigger signal to the second application client; or generate a preset signal that is captured or received by the second application client.
Correspondingly, the second application client may establish a communication connection with the first application client through the interface; or the second application client may, through its own shortcut function, capture, search for or receive the preset signal triggered or generated by the first application client, and then collect and process the information, web pages, public account articles and other content displayed on the first application client. Of course, the first application client may also transmit the web page address and/or content data of the web page content to the second application client, so that the web page address and/or content data are stored on the server corresponding to the second application client, which completes the collection of the web page content. The embodiment of the invention does not limit how the collection function is implemented or the specific form the application takes. When the terminal finishes collecting the web page content, it displays a prompt message indicating that the collection succeeded.
When the second application client of the terminal 101 collects the web page content displayed in the first application client, the first application client is operated in the foreground of the terminal 101, the web page content of a certain web page is displayed on the first application client, and the sharing button is displayed in the web page. When the user wants to collect the web page content, the user can click the sharing button. When detecting that the sharing button is triggered, the terminal 101 displays at least one sharing interface; the at least one sharing interface comprises a collection interface for calling the second application client, and the collection interface can be a copy link or a sharing link. When the user wants to collect the web page, the user can click on the collection interface. When detecting that the collection interface is triggered, the terminal 101 runs a second application client in the foreground, and stores the web page address and/or the content data of the web page to a server corresponding to the second application client.
For example, referring to FIG. 1B, the user reads, through the terminal 101 (in this embodiment, a mobile phone terminal), a web page whose title begins "three minutes reading". When the user wants to collect the content, the user can trigger a sharing instruction to the terminal 101 by triggering the shortcut function. When the terminal 101 detects the sharing instruction triggered by the user, it displays at least one sharing interface; the at least one sharing interface includes a collection interface, which is used for collecting the web page. For example, the at least one sharing interface includes: generating a picture to share, sharing to friends through a social application, copying the link, and sharing to an information display platform, where copying the link is the collection interface. When the user wants to collect the web page content, the user clicks the copy link. When detecting that the collection interface is triggered, the terminal 101 acquires the web page address and/or content data of the web page and stores them on the server corresponding to the application client, which completes the collection of the web page content; see FIG. 1C. When the terminal 101 finishes collecting the web page content, the terminal 101 displays a prompt message, which may be "collection success", as shown in FIG. 1D.
When the terminal 101 stores the web page address and/or the content data of the web page content to the server corresponding to the application client, the terminal 101 generates a collection entry of the web page, where the collection entry at least includes the web page address of the web page, and the collection entry may also include information such as summary information of the web page, a web page title, and/or an application identifier of a source application of the web page. The application identifier may be an application name, etc.
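The collection entry described above can be pictured as a small record in which only the web page address is mandatory. The following is a minimal sketch, not the patented implementation; the class name `CollectionEntry`, the helper `make_entry` and all field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CollectionEntry:
    """One collected web page (hypothetical structure)."""
    url: str                              # web page address, always present
    title: Optional[str] = None           # web page title, optional
    summary: Optional[str] = None         # summary information, optional
    source_app_id: Optional[str] = None   # application identifier, e.g. an application name


def make_entry(url, title=None, summary=None, source_app_id=None):
    """Build an entry, rejecting entries that lack the web page address."""
    if not url:
        raise ValueError("a collection entry needs a web page address")
    return CollectionEntry(url, title, summary, source_app_id)
```

Keeping the application identifier in the entry is what later allows tag rules to be looked up per source application without fetching the page first.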
It should be noted that the web page content collected by any application may be processed by the application client for display, or the application client may interact with the corresponding server, and the server returns the processed web page content to the application client for display. In addition, the information such as the web contents of the collection may be stored by the server on a user-by-user basis, so that the same user can browse the information collected on any one terminal 101 by using the collection function after logging in to the application client through any one terminal 101.
The embodiment of the invention provides a method for displaying webpage content; the method may be executed by a terminal. Referring to FIG. 2A, the method includes:
201. The terminal obtains webpage elements of a first webpage to be displayed.
The first webpage is a webpage already collected by the terminal. The embodiment of the invention takes viewing the content of a collected webpage only as an example and does not limit the source application of the collected webpage. The terminal may have stored the web page address or the web page content of the collected web page in advance. If the web page address is stored, the terminal obtains the web page elements of the first web page according to the web page address, which avoids occupying too much of the terminal's storage space; if the web page content is stored, the terminal obtains the web page elements of the first web page from the web page content stored in advance.
In the embodiment of storing the web page address, the terminal acquires the web page elements of the first web page to be displayed through the following steps (1) to (3), and the web page elements of the first web page may be web page HTML (HyperText Markup Language) elements.
(1) The terminal responds to a viewing instruction and displays the webpage address of at least one collected webpage according to the viewing instruction.
The main interface of the webpage content currently displayed by the terminal comprises a viewing button; when the user wants to view the web page content of a certain web page, the user can trigger a view instruction to the terminal by triggering the view button. And when the terminal detects that the viewing button is triggered, responding to the viewing instruction, and displaying the webpage address of at least one collected webpage according to the viewing instruction.
When the terminal displays the web page address of at least one collected web page, the terminal can display the collection entry of at least one web page, and the collection entry of each web page at least comprises the web page address of the web page and can also comprise information such as summary information, web page title and/or application identification of source application of the web page. The user can select the webpage address of the webpage to be displayed in the webpage addresses of the at least one webpage according to the collection items of the displayed at least one webpage, submit the selected webpage address to the terminal, and execute the step (2).
For example, referring to FIG. 2B, the terminal displays the web page addresses of two collected web pages, which are "three minutes reading: the world-wide pass from the product manager" and "HTML _ hundred department".
(2) The terminal acquires the webpage address of the selected first webpage from the webpage addresses of the plurality of webpages.
For example, from the web page addresses of the two web pages already collected by the terminal, the user selects "three minutes reading: the world-wide pass from the product manager"; the terminal then obtains the web page address of "three minutes reading: the world-wide pass from the product manager".
(3) The terminal obtains the webpage elements of the first webpage from the webpage address of the first webpage.
The terminal loads the webpage address of the first webpage through WebView (web view) to access the source server of the first webpage, and acquires the webpage elements of the first webpage from the webpage address of the first webpage. The web page elements of the first web page may be source code of the first web page.
In the embodiment of the present invention, after the terminal acquires the webpage elements of the first webpage, the terminal may directly determine the non-text content and the text content of the first webpage from the webpage content of the first webpage through the following steps 202 to 205, and then typeset the non-text content and the text content. Alternatively, the terminal may first delete the non-content elements from the web page elements of the first web page before determining the non-text content and the text content of the first web page.
The terminal may delete the non-content elements from the web page elements of the first web page as follows: the terminal determines the non-content webpage elements among the webpage elements and deletes them, obtaining the content webpage elements. Step 202 is then performed based on the content web page elements.
The non-content webpage elements can be the webpage elements corresponding to a third type of tag, and the third type of tag comprises style sheet tags, style tags and/or script tags. Correspondingly, the terminal may determine the non-content web page elements among the web page elements as follows: the terminal determines, according to the tags of the webpage elements, the webpage elements corresponding to the third type of tag, and takes those webpage elements as the non-content webpage elements.
For example, the third type of tag includes style sheet tags, style tags, and script tags. The style sheet tag may be a CSS (Cascading Style Sheets) tag, the style tag may be a style tag, and the script tag may be a script tag. According to the tags of the webpage elements, the terminal determines the webpage elements corresponding to the CSS tags, the webpage elements corresponding to the style tags and the webpage elements corresponding to the script tags; together these form the non-content webpage elements.
In the embodiment of the invention, the terminal deletes the non-content webpage elements from the webpage elements of the first webpage, so that the later determination of the non-text content and the text content in the webpage content of the first webpage is not affected by the non-content webpage elements (such as style sheets, styles and/or scripts), which improves typesetting accuracy.
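Deleting the non-content webpage elements can be sketched with Python's standard `html.parser` module. This is an illustrative sketch under stated assumptions, not the terminal's actual code: the set `NON_CONTENT_TAGS` uses `style` and `script` as stand-ins for the third type of tag, the names `NonContentStripper` and `strip_non_content` are hypothetical, and comments and doctype declarations are simply dropped.

```python
from html.parser import HTMLParser

# Assumed stand-ins for the third type of tag (style and script elements).
NON_CONTENT_TAGS = {"style", "script"}


class NonContentStripper(HTMLParser):
    """Re-emits the markup it is fed, skipping non-content elements."""

    def __init__(self):
        # Keep character references as written so they can be echoed unchanged.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # greater than 0 while inside a non-content element

    def handle_starttag(self, tag, attrs):
        if tag in NON_CONTENT_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if tag not in NON_CONTENT_TAGS and self.skip_depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in NON_CONTENT_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(data)

    def handle_entityref(self, name):
        if self.skip_depth == 0:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if self.skip_depth == 0:
            self.out.append(f"&#{name};")


def strip_non_content(html_source: str) -> str:
    """Return the markup with style and script elements removed."""
    parser = NonContentStripper()
    parser.feed(html_source)
    parser.close()
    return "".join(parser.out)
```

For instance, `strip_non_content("<p>a</p><script>var x = 1;</script>")` keeps only the paragraph. Externally linked style sheets are not handled in this sketch.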
202. The terminal constructs a topological structure of the first webpage according to the webpage elements of the first webpage, and each element node of the topological structure corresponds to one webpage element.
The terminal determines the hierarchical relationship between the webpage elements of the first webpage, and constructs the topological structure of the first webpage according to that hierarchical relationship. The topological structure may be a DOM (Document Object Model) tree. The DOM tree is a tree structure generated by parsing the webpage elements of the first webpage through the DOM; each element node of the tree structure corresponds to one webpage element, and the webpage content of that webpage element is stored in the node tag of the element node.
In a possible implementation manner, the terminal may also refrain from deleting the non-content elements from the webpage elements of the first webpage before constructing the topological structure. Instead, when constructing the topological structure of the first webpage according to the webpage elements of the first webpage, the terminal ignores the non-content elements of the first webpage and constructs the topological structure only from the content elements of the first webpage.
In the embodiment of the invention, the terminal constructs the topological structure of the first webpage, so that the subsequent processing is carried out based on the topological structure, and the subsequent typesetting efficiency is improved.
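The construction of a tree in which each element node corresponds to one webpage element can be sketched as follows. This is a simplified stand-in for a real DOM implementation, written with the standard `html.parser` module; `ElementNode`, `TreeBuilder`, `build_tree` and the `VOID_TAGS` set are hypothetical names, and text is stored directly on the enclosing node rather than in separate text nodes.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import List, Optional

# HTML elements that never have a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


@dataclass(eq=False)
class ElementNode:
    tag: str
    attrs: dict = field(default_factory=dict)
    text: str = ""
    children: List["ElementNode"] = field(default_factory=list)
    parent: Optional["ElementNode"] = field(default=None, repr=False)


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = ElementNode("#document")
        self.current = self.root

    def _append(self, tag, attrs):
        node = ElementNode(tag, dict(attrs), parent=self.current)
        self.current.children.append(node)
        return node

    def handle_starttag(self, tag, attrs):
        node = self._append(tag, attrs)
        if tag not in VOID_TAGS:
            self.current = node  # descend into the new element

    def handle_startendtag(self, tag, attrs):
        self._append(tag, attrs)  # self-closing: never becomes the current node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data


def build_tree(html_source: str) -> ElementNode:
    builder = TreeBuilder()
    builder.feed(html_source)
    builder.close()
    return builder.root
```

Once the tree exists, later steps only need to walk `children` and compare `tag` values, which is why building it once up front can make the subsequent processing cheaper.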
203. The terminal determines a first element node in the topological structure, wherein the first element node is an element node of non-text content.
The webpage element corresponding to each element node includes a tag, and the tags of text content are different from the tags of non-text content, so the terminal may determine the first element node in the topological structure through the tags of the webpage elements. Accordingly, this step may be: the terminal determines the element nodes corresponding to a first type of tag in the topological structure, and/or determines the element nodes corresponding to a second type of tag in the topological structure, where the first type of tag includes reference tags, table tags and/or code block tags, and the second type of tag includes custom tags; the terminal takes the element nodes corresponding to the first type of tag and/or the element nodes corresponding to the second type of tag as the first element nodes.
The terminal traverses the webpage element corresponding to each element node in the topological structure, finds the webpage elements whose tag is a first-type tag, and takes the element nodes corresponding to those webpage elements as the element nodes corresponding to the first type of tag. Similarly, the terminal traverses the webpage element corresponding to each element node in the topological structure, finds the webpage elements whose tag is a second-type tag, and takes the element nodes corresponding to those webpage elements as the element nodes corresponding to the second type of tag.
For example, the first type of tags include reference tags, table tags, and code block tags. Reference tags are <blockquote>, table tags are <table>, code block tags are <code> and <pre>, etc. The terminal determines the element nodes with tags <blockquote>, <table>, <code> and <pre> in the topological structure, and the determined element nodes are used as first element nodes. The second category of tags includes custom tags, such as audio tags, video tags, picture tags, and the like.
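The traversal that picks out the first element nodes can be sketched as below. To stay self-contained, the sketch parses well-formed markup with the standard `xml.etree.ElementTree` module instead of a real DOM; the function name `find_first_element_nodes` is hypothetical, and the second-type tag names (`img`, `audio`, `video`) are assumed examples, since the custom tags depend on the source application. Not descending into a matched node is a design choice of this sketch, not something the description states.

```python
import xml.etree.ElementTree as ET

# First-type tags named in the description.
FIRST_TYPE_TAGS = {"blockquote", "table", "code", "pre"}
# Hypothetical second-type (custom) tags; real values depend on the source application.
SECOND_TYPE_TAGS = {"img", "audio", "video"}


def find_first_element_nodes(root, first_type=FIRST_TYPE_TAGS, second_type=SECOND_TYPE_TAGS):
    """Return, in document order, the nodes whose tag marks non-text content."""
    wanted = set(first_type) | set(second_type)
    found = []

    def visit(node):
        if node.tag in wanted:
            found.append(node)
            return  # treat the matched node as one unit; skip its descendants
        for child in node:
            visit(child)

    visit(root)
    return found
```

With this choice, a `<pre>` that wraps a `<code>` is reported once, as the outer `<pre>` node.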
Because different applications are encoded in different ways, different applications may correspond to different first-type tags and second-type tags. Before this step, the terminal therefore needs to determine the first-type tags and/or the second-type tags. This process may be: the terminal obtains the application identifier of the source application of the first webpage, and determines, according to the application identifier, the first-type tags and/or the second-type tags corresponding to the application identifier.
When the terminal collects the first webpage, the terminal stores a collection entry of the first webpage, wherein the collection entry comprises an application identifier of a source application of the first webpage. Therefore, the terminal directly obtains the application identifier of the source application of the first webpage from the collection entry of the first webpage.
Different applications correspond to different first-type tags. Before this step, the terminal obtains the application identifiers of a plurality of source applications and the first-type tags of each application, and stores the correspondence between the application identifier of each source application and its first-type tags. Correspondingly, the terminal may determine the first-type tags corresponding to the application identifier as follows: according to the application identifier, the terminal obtains the corresponding first-type tags from the correspondence between application identifiers and first-type tags.
Likewise, different applications correspond to different second-type tags. Before this step, the terminal obtains the application identifiers of the source applications and the second-type tags of each application, and stores the correspondence between the application identifier of each source application and its second-type tags. Correspondingly, the terminal may determine the second-type tags corresponding to the application identifier as follows: according to the application identifier, the terminal obtains the corresponding second-type tags from the correspondence between application identifiers and second-type tags.
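The stored correspondence between application identifiers and tag types amounts to a lookup table. The sketch below is illustrative only: the application identifiers (`news-app`, `blog-app`), the custom tag names and the names `TagRules`, `RULES_BY_APP` and `rules_for` are all hypothetical. Raising an error for an unknown identifier is a choice made here, because the description does not say what happens in that case.

```python
from typing import Dict, FrozenSet, NamedTuple


class TagRules(NamedTuple):
    first_type: FrozenSet[str]
    second_type: FrozenSet[str]


# Hypothetical correspondence table; identifiers and custom tags are made up.
RULES_BY_APP: Dict[str, TagRules] = {
    "news-app": TagRules(frozenset({"blockquote", "table", "code", "pre"}),
                         frozenset({"x-video"})),
    "blog-app": TagRules(frozenset({"blockquote", "pre"}),
                         frozenset({"x-audio", "x-gallery"})),
}


def rules_for(app_id: str, table: Dict[str, TagRules] = RULES_BY_APP) -> TagRules:
    """Look up the first-type and second-type tags stored for a source application."""
    try:
        return table[app_id]
    except KeyError:
        raise LookupError(f"no tag rules stored for application {app_id!r}") from None
```

In use, the application identifier would come from the collection entry of the first webpage, and the returned tag sets would drive the traversal of the topological structure.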
In the embodiment of the invention, after the terminal determines the first element node in the topological structure, the corresponding relation between the webpage address of the first webpage and the first element node is stored, so that when the subsequent terminal displays the webpage content of the first webpage again, the first element node of the first webpage is obtained from the corresponding relation between the webpage address of the first webpage and the first element node directly according to the webpage address of the first webpage, and the identification process is not required to be carried out again, thereby improving the identification efficiency and further improving the subsequent typesetting efficiency.
204. The terminal determines a second element node in the topological structure, wherein the second element node is an element node of the text content.
The webpage elements corresponding to the element nodes comprise tags, and the tags of the text content are different from the tags of the non-text content. The terminal may determine the second element node in the topological structure through the tag of the webpage element. Correspondingly, this step may be: the terminal determines the element node corresponding to a fourth type tag in the topological structure, and takes the element node corresponding to the fourth type tag as the second element node. The fourth type tags are text tags.
For example, the fourth type tag includes a text tag such as < class >. The terminal determines the element node tagged < class > in the topological structure, and takes the determined element node as the second element node.
Because different applications use different encoding modes, different applications may correspond to different fourth type tags. The terminal can obtain the fourth type tag corresponding to the application identifier of the source application of the first webpage in the same way as it obtains the first type tag and/or the second type tag.
The topological structure comprises a first element node and a second element node. In the case where the terminal has determined the first element node, the terminal may take other element nodes in the topology other than the first element node as the second element node.
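A minimal sketch of this complement-based split is shown below. The `ElementNode` class, the example tag names, and the choice to skip container nodes that carry no content of their own are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class ElementNode:
    """Hypothetical element node of the topological structure."""
    tag: str
    content: str = ""
    children: list = field(default_factory=list)


def walk(node):
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def split_nodes(root, non_text_tags):
    """Return (first element nodes, second element nodes).

    Nodes whose tag is in non_text_tags are first element nodes; every other
    node that carries content is treated as a second element node.
    """
    first, second = [], []
    for node in walk(root):
        if not node.content:
            continue  # assumption: pure containers are neither kind of node
        if node.tag in non_text_tags:
            first.append(node)
        else:
            second.append(node)
    return first, second
```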
Similarly, after the terminal determines the second element node in the topological structure, the corresponding relation between the webpage address of the first webpage and the second element node is stored, so that when the subsequent terminal displays the webpage content of the first webpage again, the second element node of the first webpage is directly obtained from the corresponding relation between the webpage address of the first webpage and the second element node according to the webpage address of the first webpage, and the identification process is not required to be carried out again, so that the identification efficiency is improved, and the subsequent typesetting efficiency is improved.
It should be noted that, steps 203 and 204 do not have a strict time sequence, and step 203 may be executed first, and then step 204 may be executed; step 204 may be executed first, and then step 203 may be executed; steps 203 and 204 may also be performed by two processes, which is not limited in this embodiment of the present invention.
205. The terminal obtains the non-text content from the node label of the first element node, and obtains the text content from the node label of the second element node.
The node labels of the element nodes comprise webpage content, the terminal acquires the non-text content from the node label of the first element node, acquires the text content from the node label of the second element node, and step 206 is executed to typeset the non-text content and the text content.
206. The terminal displays a second webpage which accords with the preset format, wherein the second webpage comprises the non-text content and the text content of the first webpage.
The terminal can directly typeset the non-text content and the text content, namely, the following first implementation manner; the terminal may also identify the text content from the non-text content and the text content, and only typeset the text content, that is, the following second implementation manner.
For the first implementation, the step may be: and the terminal combines the non-text content and the text content into webpage content of a second webpage and displays the webpage content of the second webpage which accords with the first preset format.
The first preset format comprises a first non-text content display format and a first text content display format, and the first text content display format comprises a first paragraph format and/or a first font format. The first non-textual content display format includes a first reference display format, a first table display format, and/or a first code block display format.
In the embodiment of the invention, after the terminal identifies the non-text content and the text content, the text content and the non-text content are directly typeset, so that the typesetting efficiency is improved. In addition, because the non-text content is not filtered out, the readability of the web page content of the second web page is improved. Furthermore, when the terminal typesets the webpage content of the second webpage, the original text is not edited and its meaning is not changed, so the author of the original text is fully respected.
For example, diagram A in Fig. 2C shows the web page content of the first web page before typesetting: the font is the Song typeface, the font size of the title and of the recommendation information is 14, and the font size of the text content is 12. The terminal adjusts the font and the font size of the web page content of the first web page, changing the font to a regular font and setting the font size of the recommendation information to 8, to obtain the web page content of the second web page; see diagram B in Fig. 2C.
For the second implementation, this step may be implemented by the following steps (1) to (3):
(1) The terminal combines the non-text content and the text content into web page content of a second web page.
(2) The terminal identifies text content from the web content of the second web page.
The terminal can identify the text content in the webpage content of the second webpage in any of the following four ways. In the first way, the terminal uses a first preset regular expression, which is used for identifying the text content in the webpage content. In the second way, the terminal uses a second preset regular expression, which is used for identifying the non-text content in the webpage content. In the third way, the terminal identifies the text content according to the area where the text content is located. In the fourth way, the terminal identifies the text content according to the tags of the element nodes.
In a first way, the step (2) may be: and the terminal identifies the text content from the webpage content of the second webpage through the first preset regular expression.
The terminal identifies a second specified element node from the element nodes of the second webpage through a first preset regular expression; and determining second node content corresponding to the second specified element node from the webpage content of the second webpage, and taking the second node content as text content.
The first preset regular expression comprises at least one first label, wherein the first label is a label corresponding to the text content. Correspondingly, the step of identifying the second specified element node from each element node of the second webpage by the terminal through the first preset regular expression may be: and the terminal traverses the label of each element node of the second webpage through the first preset regular expression and determines a second specified element node of which the label is matched with the first preset regular expression.
For example, the first tags are tag A, tag B, and tag C; the first preset regular expression may then be: tag A, or tag B, or tag C.
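The first way can be sketched with Python's `re` module as an alternation over text tags. The concrete tags in the pattern (`p`, `h1` to `h6`, `article`) and the representation of element nodes as `(tag, content)` pairs are assumptions for illustration; the embodiment does not name the tags.

```python
import re

# Hypothetical first preset regular expression: "tag A or tag B or tag C".
FIRST_PRESET_PATTERN = re.compile(r"p|h[1-6]|article")


def second_specified_nodes(nodes):
    """Traverse (tag, content) pairs and keep nodes whose tag matches in full."""
    return [node for node in nodes if FIRST_PRESET_PATTERN.fullmatch(node[0])]


def text_content(nodes):
    """Take the node content of every matching node as the text content."""
    return [content for _tag, content in second_specified_nodes(nodes)]
```

Using `fullmatch` rather than `match` keeps a tag such as `pre` from being accepted merely because it starts with `p`.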
In a second mode, the step (2) can be: and the terminal identifies non-text content from the webpage content of the second webpage through a second preset regular expression, and determines the content except the non-text content in the webpage content of the second webpage as the text content. The second preset regular expression is used for identifying non-text content in the webpage content. Non-textual content includes recommended links (e.g., advertisements), etc.
The second preset regular expression comprises at least one second label; the terminal identifies a third specified element node from the element nodes of the second webpage through a second preset regular expression; and determining third node content corresponding to the third specified element node from the webpage content of the second webpage, and taking the third node content as non-text content. And the second label is a label corresponding to the non-text content. Correspondingly, the step of identifying the third designated element node from the element nodes of the second webpage by the terminal through the second preset regular expression may be: and the terminal traverses the label of each element node of the second webpage through the second preset regular expression and determines a third designated element node of which the label is matched with the second preset regular expression.
In a possible implementation manner, the second preset regular expression may further include at least one keyword, where each keyword is a keyword corresponding to the non-text content. For example, the keyword may be "guess you like", "buy", or the like. Correspondingly, the step of identifying the non-text content from the web content of the second web page by the terminal through the second preset regular expression may be: the terminal divides the webpage content of the second webpage into a plurality of content blocks and determines the matching degree between each content block and a second preset regular expression; and selecting the content blocks with the matching degree exceeding a preset threshold from the plurality of content blocks according to the matching degree between each content block and the second preset regular expression, and determining the content blocks with the matching degree exceeding the preset threshold as non-text content. The terminal may use a paragraph in the web page content as a content block.
The preset threshold may be set and changed as needed, and is not specifically limited in the embodiment of the present invention. For example, the preset threshold may be 80% or 85%, etc.
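The keyword variant of the second way can be sketched as below. The embodiment does not define how the matching degree is computed, so this sketch assumes it is the share of non-empty lines in a content block that contain a keyword; paragraphs separated by blank lines are used as content blocks, and the keyword list is illustrative.

```python
import re

# Hypothetical second preset regular expression built from keywords.
SECOND_PRESET_PATTERN = re.compile(r"guess you like|buy", re.IGNORECASE)


def matching_degree(block):
    """Assumed definition: fraction of non-empty lines that contain a keyword."""
    lines = [line for line in block.splitlines() if line.strip()]
    if not lines:
        return 0.0
    hits = sum(1 for line in lines if SECOND_PRESET_PATTERN.search(line))
    return hits / len(lines)


def split_by_threshold(content, threshold=0.8):
    """Return (text blocks, non-text blocks) for paragraph-sized content blocks."""
    blocks = [b for b in content.split("\n\n") if b.strip()]
    non_text = [b for b in blocks if matching_degree(b) > threshold]
    text = [b for b in blocks if matching_degree(b) <= threshold]
    return text, non_text
```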
In the third mode, because the text content is generally located in the middle area of the web page, the terminal can identify the text content from the web page content of the second web page according to the preset text location area. Accordingly, step (2) may be: and the terminal determines a designated area in the second webpage and takes the webpage content in the designated area as the text content.
Because the web page layouts of different applications are different, correspondingly, the step of the terminal determining the designated area in the second web page may be: the terminal acquires an application identifier of a source application of a first webpage; and determining a designated area corresponding to the application identifier according to the application identifier, and determining the designated area in the second webpage.
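One way to picture the third way is sketched below. It assumes, purely for illustration, that each content block is tagged with its vertical position as a fraction of the page height and that the designated area of an application is stored as a (top, bottom) pair of such fractions; neither representation is specified in the embodiment.

```python
# Hypothetical designated areas per application identifier, as (top, bottom)
# fractions of the page height.
DESIGNATED_AREA = {"app_a": (0.2, 0.8)}
DEFAULT_AREA = (0.25, 0.75)  # assumed fallback: the middle of the page


def text_in_designated_area(blocks, app_id):
    """Keep the blocks whose position falls inside the designated area.

    blocks is a list of (position, content) pairs, position in [0, 1].
    """
    top, bottom = DESIGNATED_AREA.get(app_id, DEFAULT_AREA)
    return [content for position, content in blocks if top <= position <= bottom]
```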
In a fourth mode, the step (2) may be: the terminal determines the weight of each element node of the second webpage, determines a first designated element node according to the weight of each element node, determines first node content corresponding to the first designated element node from the webpage content of the second webpage, and takes the first node content as text content.
In the embodiment of the invention, the terminal can determine the weight of an element node by combining the tag of the element node with its node content. Accordingly, the step of the terminal determining the weight of each element node of the second web page may be implemented by the following steps (2-1) to (2-4):
(2-1) The terminal determines the tag type of each element node and the number of words included in the node content corresponding to each element node.
(2-2) The terminal determines a first weight of each element node according to the tag type of each element node.
The terminal stores the corresponding relation between each tag type and the weight; correspondingly, this step may be: the terminal acquires the first weight of each element node from the corresponding relation between the tag type and the weight according to the tag type of each element node.
(2-3) The terminal determines a second weight of each element node according to the number of words included in the node content corresponding to each element node.
The terminal stores the corresponding relation between the word number and the weight; correspondingly, this step may be: the terminal acquires the second weight of each element node from the corresponding relation between the word number and the weight according to the word number included in the node content corresponding to each element node.
In the embodiment of the invention, the terminal stores the corresponding relation between the word number and the weight, and the terminal acquires the second weight of each element node from the corresponding relation between the word number and the weight according to the word number included in the content of the node corresponding to each element, so that the accuracy of determining the second weight of each element node is improved.
The terminal can also store the corresponding relation between the word number range and the weight; correspondingly, the steps can be as follows: the terminal determines a word number range in which the word number included in the node content corresponding to each element node is located according to the word number included in the node content corresponding to each element node and the stored word number range, and acquires a second weight of each element node from a corresponding relationship between the word number range and the weight according to the word number range corresponding to each element node.
In the embodiment of the invention, the terminal stores the corresponding relation between the word number range and the weight, and determines the second weight of each element node according to the word number included in the node content corresponding to each element node and the corresponding relation between the word number range and the weight. Therefore, the terminal does not need to store the corresponding relation between each word number and the weight, and the storage space is saved.
(2-4) The terminal determines the weight of each element node according to the first weight and the second weight of each element node.
For each element node, the terminal determines a first coefficient of the first weight and a second coefficient of the second weight, determines a product of the first weight and the first coefficient of the element node to obtain a first numerical value, determines a product of the second weight and the second coefficient of the element node to obtain a second numerical value, and takes the sum of the first numerical value and the second numerical value as the weight of the element node.
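Steps (2-1) to (2-4) can be sketched as follows, using the word number range variant for the second weight. All numeric weights, range boundaries, and coefficients are hypothetical, and choosing the highest-weight node as the first designated element node is an assumption of this sketch; the embodiment only says the node is determined according to the weights.

```python
import bisect

# Hypothetical corresponding relation between tag type and first weight.
TAG_TYPE_WEIGHT = {"p": 1.0, "h1": 0.8, "div": 0.3, "aside": 0.1}

# Hypothetical word number ranges: <10, 10-49, 50-199, >=200.
WORD_RANGE_BOUNDS = [10, 50, 200]
WORD_RANGE_WEIGHT = [0.1, 0.4, 0.8, 1.0]


def first_weight(tag):
    """Step (2-2): look up the weight stored for the tag type."""
    return TAG_TYPE_WEIGHT.get(tag, 0.0)


def second_weight(word_count):
    """Step (2-3): find the word number range, then look up its weight."""
    return WORD_RANGE_WEIGHT[bisect.bisect_right(WORD_RANGE_BOUNDS, word_count)]


def node_weight(tag, content, first_coeff=0.5, second_coeff=0.5):
    """Step (2-4): weighted sum of the first and second weights."""
    word_count = len(content.split())  # step (2-1)
    return first_coeff * first_weight(tag) + second_coeff * second_weight(word_count)


def first_designated_node(nodes):
    """Assumption: pick the (tag, content) pair with the highest weight."""
    return max(nodes, key=lambda node: node_weight(node[0], node[1]))
```

Storing ranges instead of individual word numbers keeps the lookup table small, which matches the storage-saving motivation given above.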
(3) The terminal displays the text content of the second webpage which accords with the second preset format.
The second preset format may be the same as or different from the first preset format. And the second preset format comprises a second non-text content display format and a second text content display format, and the second text content display format comprises a second paragraph format and/or a second font format. The second non-textual content display format includes a second reference display format, a second table display format, and/or a second code block display format.
For example, referring to fig. 2C, the terminal filters out the non-text content in fig. 2C, and displays only the text content of the second web content.
In the embodiment of the invention, the terminal identifies the text content from the webpage content of the second webpage, so that non-text content such as advertisement content and/or recommended content is filtered out, the user is not disturbed by the non-text content, and user stickiness is improved. In addition, because the webpage content is typeset by the terminal rather than by the server, the concurrency load on the server is reduced.
After the terminal displays the second webpage, the terminal stores the webpage address of the first webpage and the webpage content of the typeset second webpage, so that when the subsequent terminal displays the second webpage again, the webpage content of the second webpage is directly obtained from the corresponding relation between the webpage address of the first webpage and the webpage content of the second webpage according to the webpage address of the first webpage, the webpage content of the second webpage is displayed, the typesetting process is not required to be carried out again, and the typesetting efficiency is improved.
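The stored corresponding relation behaves like a cache keyed by the webpage address of the first webpage. A minimal sketch, assuming an in-memory dictionary and a caller-supplied `typeset` function (both hypothetical):

```python
class TypesetCache:
    """Maps the webpage address of the first webpage to typeset content."""

    def __init__(self):
        self._content_by_address = {}

    def content_for(self, address, typeset):
        """Return stored content, running typeset(address) only on a miss."""
        if address not in self._content_by_address:
            self._content_by_address[address] = typeset(address)
        return self._content_by_address[address]
```

On the second request for the same address the `typeset` function is not called again, which is the efficiency gain described above.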
In the method for displaying webpage content provided by the embodiment of the invention, a terminal acquires webpage elements of a first webpage to be displayed; and determining the non-text content and the text content of the first webpage from the webpage content of the first webpage according to the label of the webpage element. Because the text content and the non-text content can be determined from the webpage content of the first webpage, when the second webpage is displayed, the text content of the first webpage can be displayed, and the non-text content of the first webpage can also be displayed, so that the problem of large content difference before and after typesetting caused by filtering out the non-text content is avoided, and the accuracy is improved.
The embodiment of the invention provides a method for displaying contents of collected webpages, the execution subject of the method is a terminal, referring to fig. 3, the method comprises the following steps:
301. The terminal displays the collection entries of at least one collected webpage, wherein the collection entry of any webpage comprises the webpage address of that webpage.
The main interface currently displayed by the terminal comprises a viewing button. When the user wants to view the web page content of a certain web page, the user can trigger a viewing instruction to the terminal by triggering the viewing button. When the terminal detects that the viewing button is triggered, the terminal responds to the viewing instruction by displaying the collection entries of at least one web page. The collection entry of each web page comprises at least the web page address of the web page, and may also comprise information such as summary information of the web page, a web page title, and/or an application identifier of a source application of the web page. According to the displayed collection entries, the user can select the web page address of the web page to be displayed from the web page addresses of the at least one web page, and submit the selected web page address to the terminal.
302. The terminal acquires the webpage elements of the first webpage according to the webpage address of the selected first webpage.
The process of obtaining the web page element of the first web page in this step is the same as that in step 201, and is not described herein again.
303. The terminal determines the non-text content and the text content of the first webpage from the webpage content of the first webpage according to the label of the webpage element.
This step can be realized by the above steps 202-205, which are not described herein again.
304. The terminal displays a second webpage which accords with the preset format, wherein the second webpage comprises the non-text content and the text content of the first webpage.
This step is the same as step 206 and will not be described herein.
In the method for displaying the contents of the collected webpages provided by the embodiment of the invention, the terminal displays the collected items of at least one webpage, and the collected items of any webpage comprise the webpage address of any webpage. When a user wants to read the web page content of a certain web page, the user can click on the web page address of the certain web page. The terminal acquires webpage elements of the first webpage according to the webpage address of the selected first webpage; and according to the label of the webpage element, determining the non-text content and the text content of the first webpage from the webpage content of the first webpage. Because the text content and the non-text content can be determined from the webpage content of the first webpage, when the second webpage is displayed, the text content of the first webpage can be displayed, and the non-text content of the first webpage can also be displayed, so that the problem that the content difference before and after typesetting is large due to the fact that the non-text content is filtered out is solved, and the accuracy is improved.
The embodiment of the invention provides a device for displaying webpage content, which is applied to a terminal and used for executing steps executed by the terminal in the method for displaying the webpage content. Referring to fig. 4A, the apparatus includes:
an obtaining module 401, configured to obtain a web page element of a first web page to be displayed;
a determining module 402, configured to determine, according to the tag of the web page element, non-text content and text content of the first web page from the web page content of the first web page;
a display module 403, configured to typeset the non-text content and the text content, and display a second webpage.
In one possible implementation, referring to fig. 4B, the determining module 402 includes:
the constructing unit 4021 is configured to construct a topology structure of the first web page according to the web page element, where each element node of the topology structure corresponds to one web page element;
a determining unit 4022, configured to determine a first element node in the topology, where the first element node is an element node of a non-text content;
the determining unit 4022 is further configured to determine a second element node in the topological structure, where the second element node is an element node of the text content;
the obtaining unit 4023 is configured to obtain the non-text content from the node tag of the first element node, and obtain the text content from the node tag of the second element node.
In a possible implementation manner, the determining unit 4022 is further configured to determine an element node corresponding to a first type tag in the topology structure, and/or determine an element node corresponding to a second type tag in the topology structure, where the first type tag includes a reference tag, a table tag, and/or a code block tag, and the second type tag includes a custom tag; and taking the element node corresponding to the first type label and/or the element node corresponding to the second type label as the first element node.
In a possible implementation manner, the determining unit 4022 is further configured to obtain an application identifier of a source application of the first web page; and determining the first type label and/or the second type label corresponding to the application identifier according to the application identifier.
In one possible implementation, referring to fig. 4C, the apparatus further includes:
a deletion module 404, configured to determine non-content web page elements among the web page elements, and delete the non-content web page elements from the web page elements.
In one possible implementation, referring to fig. 4D, the display module 403 includes:
a composing unit 4031, configured to compose the non-text content and the text content into web page content of the second web page;
an identifying unit 4032, configured to identify text content from the web page content of the second web page;
a display unit 4033, configured to display a second webpage that conforms to the preset format, where the second webpage includes non-text content and text content of the first webpage.
In a possible implementation manner, the identifying unit 4032 is further configured to identify the text content from the web content of the second web page through a preset regular expression, where the preset regular expression is used to identify the text content in the web content; and/or,
the identifying unit 4032 is further configured to determine a weight of each element node of the second web page, determine a first designated element node according to the weight of each element node, determine first node content corresponding to the first designated element node from the web page content of the second web page, and take the first node content as text content.
In a possible implementation manner, the identifying unit 4032 is further configured to determine a tag type of each element node and a number of words included in a node content corresponding to each element node; determining a first weight of each element node according to the label type of each element node; determining a second weight of each element node according to the number of words included in the node content corresponding to each element node; determining a weight of each element node according to the first weight and the second weight of each element node.
In a possible implementation manner, the identifying unit 4032 is further configured to identify a second specified element node from the element nodes of the second webpage through a preset regular expression; and determining second node content corresponding to the second specified element node from the webpage content of the second webpage, and taking the second node content as text content.
In a possible implementation manner, the obtaining module 401 is further configured to respond to a viewing instruction, and display the webpage address of at least one collected webpage according to the viewing instruction; acquire the webpage address of the selected first webpage from the webpage addresses of the at least one webpage; and acquire the webpage elements of the first webpage according to the webpage address of the first webpage.
In the method for displaying the webpage content provided by the embodiment of the invention, the webpage elements of a first webpage to be displayed are obtained; and determining the non-text content and the text content of the first webpage from the webpage content of the first webpage according to the label of the webpage element. Because the text content and the non-text content can be determined from the webpage content of the first webpage, when the second webpage is displayed, the text content of the first webpage can be displayed, and the non-text content of the first webpage can also be displayed, so that the problem that the content difference before and after typesetting is large due to the fact that the non-text content is filtered out is solved, and the accuracy is improved.
An embodiment of the present invention provides a device for displaying content of a favorite web page, and referring to fig. 5, the device includes:
the display module 501 is configured to display collection items of at least one collected webpage, where the collection item of any webpage includes a webpage address of any webpage;
an obtaining module 502, configured to obtain a web page element of a first web page according to a web page address of the selected first web page;
a determining module 503, configured to determine, according to the tag of the web page element, non-text content and text content of the first web page from the web page content of the first web page;
the display module 501 is further configured to display a second webpage conforming to the preset format, where the second webpage includes the non-text content and the text content of the first webpage.
In the method for displaying the content of the collected webpages provided by the embodiment of the invention, the terminal displays the collection items of at least one collected webpage, and the collection items of any webpage comprise the webpage address of any webpage. When a user wants to read the web page content of a certain web page, the user can click on the web page address of the certain web page. The terminal acquires webpage elements of the first webpage according to the webpage address of the selected first webpage; and determining the non-text content and the text content of the first webpage from the webpage content of the first webpage according to the label of the webpage element. Because the text content and the non-text content can be determined from the webpage content of the first webpage, when the second webpage is displayed, the text content of the first webpage can be displayed, and the non-text content of the first webpage can also be displayed, so that the problem of large content difference before and after typesetting caused by filtering out the non-text content is avoided, and the accuracy is improved.
It should be noted that: in the device for displaying web page content according to the above embodiment, when web page content is displayed, only the division of the above functional modules is taken as an example, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to complete all or part of the above described functions. In addition, the apparatus for displaying web page content and the method for displaying web page content provided in the foregoing embodiments belong to the same concept, and specific implementation processes thereof are described in detail in the method embodiments, and are not described herein again.
Fig. 6 shows a block diagram of a terminal 600 according to an exemplary embodiment of the present invention. The terminal 600 may be: a smart phone, a tablet computer, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (Moving Picture Experts Group Audio Layer IV), a notebook computer, or a desktop computer. The terminal 600 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, desktop terminal, etc.
In general, the terminal 600 includes: a processor 601 and a memory 602.
The processor 601 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on. The processor 601 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 601 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 601 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed on the display screen. In some embodiments, the processor 601 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 602 may include one or more computer-readable storage media, which may be non-transitory. The memory 602 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 602 is used to store at least one instruction for execution by processor 601 to implement a method of displaying web content as provided by method embodiments herein.
In some embodiments, the terminal 600 may optionally further include a peripheral interface 603 and at least one peripheral device. The processor 601, the memory 602, and the peripheral interface 603 may be connected by buses or signal lines. Each peripheral device may be connected to the peripheral interface 603 via a bus, a signal line, or a circuit board. Specifically, the peripheral devices include at least one of a radio frequency circuit 604, a touch display screen 605, a camera assembly 606, an audio circuit 607, and a power supply 609.
The peripheral interface 603 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 601 and the memory 602. In some embodiments, the processor 601, memory 602, and peripheral interface 603 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 601, the memory 602, and the peripheral interface 603 may be implemented on separate chips or circuit boards, which is not limited by the present embodiment.
The radio frequency circuit 604 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 604 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 604 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 604 comprises: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 604 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: the world wide web, metropolitan area networks, intranets, various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 604 may further include NFC (Near Field Communication) related circuits, which is not limited in this application.
The display screen 605 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 605 is a touch display screen, it can also capture touch signals on or above its surface. The touch signal may be input to the processor 601 as a control signal for processing. In this case, the display screen 605 may also provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 605, disposed on the front panel of the terminal 600; in other embodiments, there may be at least two display screens 605, disposed on different surfaces of the terminal 600 or in a folded design; in still other embodiments, the display screen 605 may be a flexible display screen disposed on a curved surface or a folded surface of the terminal 600. The display screen 605 may even be set in a non-rectangular irregular shape, that is, a special-shaped screen. The display screen 605 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).
The camera assembly 606 is used to capture images or video. Optionally, camera assembly 606 includes a front camera and a rear camera. Generally, a front camera is disposed at a front panel of a terminal, and a rear camera is disposed at a rear surface of the terminal. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 606 may also include a flash. The flash lamp can be a single-color temperature flash lamp or a double-color temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.
The audio circuit 607 may include a microphone and a speaker. The microphone is used for collecting sound waves of the user and the environment, converting the sound waves into electrical signals, and inputting the electrical signals to the processor 601 for processing or to the radio frequency circuit 604 to realize voice communication. For stereo sound collection or noise reduction, a plurality of microphones may be provided at different portions of the terminal 600. The microphone may also be an array microphone or an omnidirectional microphone. The speaker is used to convert electrical signals from the processor 601 or the radio frequency circuit 604 into sound waves. The speaker may be a conventional thin-film speaker or a piezoelectric ceramic speaker. A piezoelectric ceramic speaker can convert an electrical signal not only into sound waves audible to humans, but also into sound waves inaudible to humans for purposes such as distance measurement. In some embodiments, the audio circuit 607 may also include a headphone jack.
The power supply 609 is used to supply power to the various components in the terminal 600. The power supply 609 may be alternating current, direct current, a disposable battery, or a rechargeable battery. When the power supply 609 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. A wired rechargeable battery is charged through a wired line, and a wireless rechargeable battery is charged through a wireless coil. The rechargeable battery may also support fast charging technology.
In some embodiments, the terminal 600 also includes one or more sensors 610. The one or more sensors 610 include, but are not limited to: acceleration sensor 611, gyro sensor 612, pressure sensor 613, optical sensor 615, and proximity sensor 616.
The acceleration sensor 611 may detect the magnitude of acceleration in three coordinate axes of a coordinate system established with the terminal 600. For example, the acceleration sensor 611 may be used to detect components of the gravitational acceleration in three coordinate axes. The processor 601 may control the touch screen display 605 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 611. The acceleration sensor 611 may also be used for acquisition of motion data of a game or a user.
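The landscape/portrait decision described above can be sketched as follows. This is a minimal illustration only: the function name `orientation` and the axis convention (x along the short edge of the terminal, y along the long edge) are assumptions and are not taken from the embodiment.

```python
def orientation(gx: float, gy: float) -> str:
    """Pick a view from the gravity components measured on two axes.

    Assumed convention (not from the source): gx is the component along
    the short edge of the terminal, gy the component along the long edge.
    """
    # Gravity mostly along the long edge means the terminal is held upright.
    return "portrait" if abs(gy) >= abs(gx) else "landscape"


if __name__ == "__main__":
    print(orientation(0.1, 9.8))   # terminal held upright
    print(orientation(9.8, 0.2))   # terminal turned on its side
```

A real implementation would typically add hysteresis so that the view does not flicker when the terminal is held near the 45-degree boundary.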
The gyro sensor 612 may detect a body direction and a rotation angle of the terminal 600, and the gyro sensor 612 may acquire a 3D motion of the user on the terminal 600 in cooperation with the acceleration sensor 611. The processor 601 may implement the following functions according to the data collected by the gyro sensor 612: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
The pressure sensor 613 may be disposed on a side frame of the terminal 600 and/or on a lower layer of the touch display screen 605. When the pressure sensor 613 is disposed on the side frame of the terminal 600, it can detect the user's holding signal on the terminal 600, and the processor 601 performs left-hand or right-hand recognition or a shortcut operation according to the holding signal collected by the pressure sensor 613. When the pressure sensor 613 is disposed on the lower layer of the touch display screen 605, the processor 601 controls the operable controls on the UI according to the user's pressure operation on the touch display screen 605. The operable controls comprise at least one of a button control, a scroll bar control, an icon control, and a menu control.
The optical sensor 615 is used to collect the ambient light intensity. In one embodiment, processor 601 may control the display brightness of touch display 605 based on the ambient light intensity collected by optical sensor 615. Specifically, when the ambient light intensity is higher, the display brightness of the touch display screen 605 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 605 is turned down. In another embodiment, the processor 601 may also dynamically adjust the shooting parameters of the camera assembly 606 according to the ambient light intensity collected by the optical sensor 615.
The proximity sensor 616, also known as a distance sensor, is typically disposed on the front panel of the terminal 600. The proximity sensor 616 is used to collect the distance between the user and the front surface of the terminal 600. In one embodiment, when the proximity sensor 616 detects that the distance between the user and the front surface of the terminal 600 gradually decreases, the processor 601 controls the touch display screen 605 to switch from the bright screen state to the dark screen state; when the proximity sensor 616 detects that the distance between the user and the front surface of the terminal 600 gradually increases, the processor 601 controls the touch display screen 605 to switch from the dark screen state to the bright screen state.
Those skilled in the art will appreciate that the configuration shown in Fig. 6 does not limit the terminal 600, which may include more or fewer components than those shown, combine some components, or use a different arrangement of components.
An embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium is applied to a terminal, and at least one instruction, at least one program, a code set, or a set of instructions is stored in the computer-readable storage medium, where the instruction, the program, the code set, or the set of instructions is loaded and executed by a processor to implement the operations performed by the terminal in the method for displaying web page content according to the foregoing embodiments.
An embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium is applied to a terminal, and the computer-readable storage medium stores at least one instruction, at least one program, a code set, or a set of instructions, where the instruction, the program, the code set, or the set of instructions are loaded and executed by a processor to implement the operations performed by the terminal in the method for displaying favorite web page content according to the foregoing embodiment.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and should not be taken as limiting the scope of the present invention, which is intended to cover any modifications, equivalents, improvements, etc. within the spirit and scope of the present invention.

Claims (10)

1. A method for displaying webpage content, which is applied to an application client, and comprises the following steps:
displaying collection items of at least one collected webpage, wherein the collection items of any webpage comprise a webpage address of any webpage;
acquiring webpage elements of the first webpage according to the selected webpage address of the first webpage;
constructing a topological structure of the first webpage according to the webpage elements, wherein each element node of the topological structure corresponds to one webpage element;
determining element nodes corresponding to a first type of tags in the topological structure, and/or determining element nodes corresponding to a second type of tags in the topological structure, wherein the first type of tags comprise reference tags, table tags and/or code block tags, the second type of tags comprise custom tags, and the custom tags comprise audio tags, video tags and/or picture tags;
taking element nodes corresponding to the first type of tags and/or element nodes corresponding to the second type of tags as first element nodes, and determining other element nodes except the first element nodes in the topological structure as second element nodes, wherein the first element nodes are element nodes of non-text content of the first webpage, and the second element nodes are element nodes of text content of the first webpage;
acquiring the non-text content from the node label of the first element node, and acquiring the text content from the node label of the second element node;
displaying a second webpage which accords with a preset format, wherein the second webpage comprises the non-text content and the text content;
the method further comprises the following steps:
acquiring an application identifier of a source application of the first webpage;
and determining a first type label corresponding to the application identifier of the source application from the corresponding relationship between the application identifier and the first type label, and/or determining a second type label corresponding to the application identifier of the source application from the corresponding relationship between the application identifier and the second type label, wherein different application programs correspond to different first type labels and different second type labels.
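The node classification recited in claim 1 can be illustrated with a short sketch. Everything concrete here is an assumption made for illustration: the claim names only tag categories, so the HTML tags chosen for them (`blockquote`, `table`, `pre`, `audio`, `video`, `img`), the `TAGS_BY_APP` mapping, the application identifier `app.example`, and the choice not to descend into a first element node are all hypothetical.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser

# Hypothetical concrete tags for the categories named in the claim.
FIRST_TYPE_TAGS = {"blockquote", "table", "pre"}   # reference, table, code block
SECOND_TYPE_TAGS = {"audio", "video", "img"}       # audio, video, picture
VOID_TAGS = {"img", "br", "hr", "meta", "link", "input", "source"}

# Hypothetical correspondence: application identifier -> (first type, second type).
TAGS_BY_APP = {
    "default": (FIRST_TYPE_TAGS, SECOND_TYPE_TAGS),
    "app.example": ({"blockquote", "table"}, {"img"}),
}


@dataclass
class ElementNode:
    tag: str
    text: str = ""
    children: list = field(default_factory=list)


class TreeBuilder(HTMLParser):
    """Builds a tree in which each node corresponds to one web page element."""

    def __init__(self):
        super().__init__()
        self.root = ElementNode("root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = ElementNode(tag)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:          # void elements never get children
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open element, if there is one.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].text += data


def classify(html, app_id="default"):
    """Return (first element nodes, second element nodes) for the page."""
    first_tags, second_tags = TAGS_BY_APP.get(app_id, TAGS_BY_APP["default"])
    special = first_tags | second_tags
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    first_nodes, second_nodes = [], []

    def walk(node):
        for child in node.children:
            if child.tag in special:
                first_nodes.append(child)    # non-text content; kept whole
            else:
                second_nodes.append(child)   # text content
                walk(child)

    walk(builder.root)
    return first_nodes, second_nodes
```

With the default mapping, a page such as `<div><p>Hello</p><img src='a.png'><pre>x = 1</pre></div>` yields the `img` and `pre` nodes as first element nodes and the `div` and `p` nodes as second element nodes; with `app_id="app.example"` the `pre` node is treated as text content instead.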
2. The method of claim 1, wherein prior to determining the non-textual content and the textual content from the web page content of the first web page, the method further comprises:
determining non-content web page elements in the web page elements, and deleting the non-content web page elements in the web page elements.
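One way to picture the deletion step of claim 2 is the sketch below. The claim does not enumerate which elements count as non-content, so the `NON_CONTENT_TAGS` set is a hypothetical choice, and the helper `strip_non_content` is an illustrative name. For brevity the sketch also drops attributes from the elements it keeps, which a real implementation would preserve.

```python
from html.parser import HTMLParser

# Hypothetical set of non-content tags; the claim does not list them.
NON_CONTENT_TAGS = {"script", "style", "nav", "footer", "aside"}


class NonContentStripper(HTMLParser):
    """Re-emits the page while skipping non-content elements and their subtrees."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a non-content element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in NON_CONTENT_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0:
            self.parts.append(f"<{tag}>")      # attributes omitted for brevity

    def handle_endtag(self, tag):
        if tag in NON_CONTENT_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def strip_non_content(html: str) -> str:
    stripper = NonContentStripper()
    stripper.feed(html)
    stripper.close()
    return "".join(stripper.parts)
```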
3. The method of claim 1, wherein displaying a second webpage conforming to a preset format, the second webpage comprising the non-text content and the text content comprises:
composing the non-text content and the text content into web page content of the second web page;
identifying text content from the web page content of the second web page;
and displaying the text content of the second webpage which accords with a preset format.
4. The method of claim 3, wherein the identifying text content from the web content of the second web page comprises:
identifying the text content from the webpage content of the second webpage through a preset regular expression, wherein the preset regular expression is used for identifying the text content in the webpage content; and/or,
determining the weight of each element node of the second webpage, determining a first designated element node according to the weight of each element node, determining first node content corresponding to the first designated element node from the webpage content of the second webpage, and taking the first node content as the text content.
5. The method of claim 4, wherein determining the weight for each element node of the web page content of the second web page comprises:
determining the label type of each element node and the number of words included in the node content corresponding to each element node;
determining a first weight of each element node according to the label type of each element node;
determining a second weight of each element node according to the number of words included in the node content corresponding to each element node;
and determining the weight of each element node according to the first weight and the second weight of each element node.
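The weighting of claim 5 can be sketched as below. The claim states only that a first weight depends on the label type, a second weight depends on the word count, and the two are combined; the table `TAG_WEIGHTS`, the scaling of the word count, the use of a sum to combine the weights, and the choice of the highest-weight node as the designated node are all assumptions for illustration.

```python
# Hypothetical per-tag weights; the claim does not give values.
TAG_WEIGHTS = {"article": 3.0, "p": 2.0, "div": 1.0, "span": 0.5}


def first_weight(tag: str) -> float:
    """Weight derived from the label type of the element node."""
    return TAG_WEIGHTS.get(tag, 0.0)


def second_weight(text: str) -> float:
    """Weight derived from the number of words in the node content.

    Whitespace splitting is an assumption; for Chinese text a character
    count would be a more natural measure.
    """
    return len(text.split()) / 100.0


def node_weight(tag: str, text: str) -> float:
    """Combined weight; a plain sum is assumed here."""
    return first_weight(tag) + second_weight(text)


def pick_designated(nodes):
    """Return the (tag, text) pair with the highest combined weight."""
    return max(nodes, key=lambda node: node_weight(*node))


if __name__ == "__main__":
    candidates = [
        ("span", "menu"),
        ("p", "word " * 50),
        ("div", "short text"),
    ]
    print(pick_designated(candidates)[0])   # the long paragraph wins
```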
6. The method of claim 4, wherein the identifying the text content from the web content of the second web page through a preset regular expression comprises:
identifying a second specified element node from the element nodes of the second webpage through the preset regular expression;
and determining second node content corresponding to the second specified element node from the webpage content of the second webpage, and taking the second node content as the text content.
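The regular-expression route of claims 4 and 6 might look like the following sketch. The claims do not disclose the preset regular expression or what it is matched against, so the pattern `BODY_PATTERN`, the idea of matching it against a node's `id` and `class` attributes, and the dictionary layout of a node are hypothetical.

```python
import re

# Hypothetical preset regular expression for body-like element nodes.
BODY_PATTERN = re.compile(r"article|content|post|main", re.IGNORECASE)


def find_specified_node(nodes):
    """Return the first node whose id/class matches the preset pattern, or None."""
    for node in nodes:
        label = f"{node.get('id', '')} {node.get('class', '')}"
        if BODY_PATTERN.search(label):
            return node
    return None


def text_content(nodes):
    """Return the node content of the specified node, or an empty string."""
    node = find_specified_node(nodes)
    return node["content"] if node else ""


if __name__ == "__main__":
    page_nodes = [
        {"id": "header", "class": "top-bar", "content": "Site name"},
        {"id": "post-body", "class": "rich-text", "content": "The actual text."},
        {"id": "footer", "class": "links", "content": "About us"},
    ]
    print(text_content(page_nodes))
```

Because claim 4 joins the two routes with "and/or", a combined implementation could fall back to the weight-based selection when no node matches the pattern.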
7. The method according to any of claims 1-6, wherein said obtaining web page elements of said first web page comprises:
responding to a viewing instruction, and displaying a webpage address of at least one collected webpage according to the viewing instruction;
acquiring the webpage address of the selected first webpage from the webpage addresses of the at least one webpage;
and acquiring the webpage elements of the first webpage from the webpage address of the first webpage.
8. An apparatus for displaying web content, the apparatus comprising:
the display module is used for displaying collection items of at least one collected webpage, and the collection items of any webpage comprise a webpage address of any webpage;
the acquisition module is used for acquiring webpage elements of the first webpage according to the webpage address of the selected first webpage;
the determining module comprises a constructing unit, a determining unit and an acquiring unit, wherein the constructing unit is used for constructing a topological structure of the first webpage according to the webpage elements, and each element node of the topological structure corresponds to one webpage element;
the determining unit is configured to determine an element node corresponding to a first type of tag in the topology structure, and/or determine an element node corresponding to a second type of tag in the topology structure, where the first type of tag includes a reference tag, a table tag, and/or a code block tag, the second type of tag includes a custom tag, and the custom tag includes an audio tag, a video tag, and/or a picture tag;
the determining unit is further configured to take the element nodes corresponding to the first type of tag and/or the element nodes corresponding to the second type of tag as first element nodes, and to determine the other element nodes in the topological structure, except the first element nodes, as second element nodes, where the first element nodes are element nodes of the non-text content of the first web page, and the second element nodes are element nodes of the text content of the first web page;
the acquiring unit is used for acquiring the non-text content from the node label of the first element node and acquiring the text content from the node label of the second element node;
the display module is further used for displaying a second webpage which accords with a preset format, and the second webpage comprises the non-text content and the text content of the first webpage;
the determining unit is further configured to obtain an application identifier of a source application of the first webpage; and determining a first type label corresponding to the application identifier of the source application from the corresponding relationship between the application identifier and the first type label, and/or determining a second type label corresponding to the application identifier of the source application from the corresponding relationship between the application identifier and the second type label, wherein different application programs correspond to different first type labels and different second type labels.
9. A terminal, characterized in that the terminal comprises a processor and a memory, in which at least one instruction, at least one program, a set of codes, or a set of instructions is stored, which is loaded and executed by the processor to implement the operations performed in the method of displaying web content according to any one of claims 1 to 7.
10. A computer-readable storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to perform the operations performed in the method of displaying web content according to any one of claims 1 to 7.
CN201711202503.7A 2017-11-27 2017-11-27 Method, device, terminal and storage medium for displaying webpage content Active CN109948095B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711202503.7A CN109948095B (en) 2017-11-27 2017-11-27 Method, device, terminal and storage medium for displaying webpage content

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711202503.7A CN109948095B (en) 2017-11-27 2017-11-27 Method, device, terminal and storage medium for displaying webpage content

Publications (2)

Publication Number Publication Date
CN109948095A CN109948095A (en) 2019-06-28
CN109948095B true CN109948095B (en) 2022-09-30

Family

ID=67003973

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711202503.7A Active CN109948095B (en) 2017-11-27 2017-11-27 Method, device, terminal and storage medium for displaying webpage content

Country Status (1)

Country Link
CN (1) CN109948095B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112508284A (en) * 2020-12-10 2021-03-16 网易(杭州)网络有限公司 Display material preprocessing method, putting method, system, device and equipment
CN114020987A (en) * 2022-01-06 2022-02-08 北京微步在线科技有限公司 Sample data acquisition method, device, equipment and storage medium based on webpage

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103150389A (en) * 2013-03-21 2013-06-12 北京奇虎科技有限公司 Method and device for processing matching setting of webpage text contents
CN103345532A (en) * 2013-07-26 2013-10-09 人民搜索网络股份公司 Method and device for extracting webpage information
CN106095985A (en) * 2016-06-20 2016-11-09 网际傲游(北京)科技有限公司 A kind of dynamic collection the method for cluster web pages information
CN107329985A (en) * 2017-05-31 2017-11-07 北京安云世纪科技有限公司 A kind of collecting method of the page, device and mobile terminal

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101477564B (en) * 2009-01-21 2011-05-04 北京千家悦网络科技有限公司 Intelligent layout method for displaying wide web page on narrow-screen equipment
CN107924412B (en) * 2015-08-18 2022-04-12 三星电子株式会社 Method and system for bookmarking web pages

Also Published As

Publication number Publication date
CN109948095A (en) 2019-06-28


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant