CN112749528A - Text processing method and device, electronic equipment and computer readable storage medium - Google Patents

Publication number
CN112749528A
Authority
CN
China
Prior art keywords
text
directory
webpage
link address
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911052664.1A
Other languages
Chinese (zh)
Inventor
刘鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201911052664.1A
Publication of CN112749528A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation

Abstract

The embodiment of the application relates to the technical field of the Internet and discloses a text processing method, a text processing device, electronic equipment and a computer readable storage medium. The text processing method comprises the following steps: when a text webpage meets a predetermined condition, extracting the body text of the text webpage based on text density; determining a link address of a directory of the text in a preset white list according to the link address of the text webpage, and extracting the directory content of the text according to the link address of the directory based on the text density, wherein the preset white list comprises the link address of the directory of the text; and then displaying the directory content of the text, and displaying the body text of the text webpage in association with the directory item corresponding to that body text in the directory content.

Description

Text processing method and device, electronic equipment and computer readable storage medium
Technical Field
The embodiment of the application relates to the technical field of internet, in particular to a text processing method and device, electronic equipment and a computer readable storage medium.
Background
With the development of network technology, people tend to acquire information through the internet, including reading various novels online. Most online novels exist in the form of World Wide Web (WWW) pages, which here generally refers to web pages designed for Personal Computers (PCs); that is, a novel has to be read through its novel web pages.
In the course of research and practice on the prior art, the inventors of the present application found that, at present, when reading the text of a certain chapter through a novel web page, the novel web page does not synchronously display the catalogue text of the whole novel, and the information in the novel web page of the novel is rich and complex, so that the novel web page is not as neat and clean as the traditional text, and contains a large amount of noise content, such as scripts added for enhancing user interactivity, navigation links added for facilitating user browsing, advertisement links added for commercial consideration, and the like, which seriously affects reading of the text of the novel text and causes poor reading experience.
Disclosure of Invention
The purpose of the embodiments of the present application is to solve at least one of the above technical drawbacks, and to provide the following technical solutions:
in one aspect, a text processing method is provided, including:
extracting a text of the text webpage based on the text density when the text webpage meets a preset condition;
determining a link address of a directory of the text in a preset white list according to the link address of the text webpage, extracting the directory content of the text according to the link address of the directory based on the text density, and the preset white list comprising the link address of the directory of the text;
and displaying the directory content of the text, and displaying the body text of the text webpage and the directory item corresponding to the body text of the text webpage in the directory content in an associated manner.
In one aspect, a text processing apparatus is provided, including:
the first extraction module is used for extracting the body text of the text webpage based on the text density when the text webpage meets the preset condition;
the second extraction module is used for determining the link address of the directory of the text in a preset white list according to the link address of the text webpage, and extracting the directory content of the text according to the link address of the directory based on the text density, the preset white list comprising the link address of the directory of the text;
and the first display module is used for displaying the directory content of the text and performing associated display on the body text of the text webpage and the directory item corresponding to the body text of the text webpage in the directory content.
In one possible implementation, the apparatus further includes a second display module;
the second display module is used for displaying prompt information of a preset text reading mode when the link address of the text webpage does not belong to a preset white list, and the preset white list comprises the link address of the text webpage;
the second extraction module is specifically configured to, when detecting a trigger operation for a predetermined text reading mode, determine a link address of a directory of a text in a preset white list according to the link address of the text webpage, and extract directory content of the text according to the link address of the directory.
In one possible implementation, the text web page satisfies a predetermined condition, including at least one of:
the text density of the text webpage is not less than a preset threshold value;
the link address of the text webpage does not belong to a preset blacklist, where the link addresses in the preset blacklist are link addresses to which access is intercepted.
In one possible implementation, the apparatus further includes a third display module;
and the third display module is used for displaying the text corresponding to any directory item in a linkage manner based on the pre-established corresponding relation when the triggering operation aiming at any directory item of the directory contents is detected.
In a possible implementation manner, the apparatus further includes a relationship establishing module;
and the relation establishing module is used for establishing one-to-one correspondence between each directory item of the directory content and the link address of the text webpage where each chapter of the text is located by positioning the Document Object Model (DOM) node based on the directory content of the text.
In one possible implementation, the apparatus further includes a processing module;
and the processing module is used for loading and displaying the body text of the next text webpage of the current text webpage of the text when the sliding distance of the sliding operation of the body text of the current text webpage of the text is detected to be larger than a preset distance threshold.
In one possible implementation manner, the apparatus further includes a fourth display module, and the fourth display module is configured to perform at least one of the following:
displaying a directory virtual key, wherein the display of the directory content is controlled through a trigger operation on the directory virtual key;
displaying a display mode virtual key, which is used for presenting predetermined display modes and for controlling, through a trigger operation on the display mode virtual key, how the body text and the directory content are displayed, wherein the predetermined display modes comprise display with at least one of different fonts, different background colors and different character colors.
In one aspect, an electronic device is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the text processing method is implemented.
In one aspect, a computer-readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the text processing method described above.
According to the text processing method provided by the embodiment of the application, the directory content of the text is determined and displayed, so that when the body text of a certain chapter is read through a text webpage, the directory of the whole text can be displayed synchronously in the text webpage, and a target chapter can be selected conveniently and quickly from the directory, which greatly improves the convenience of reading; by determining and displaying the body text of the text webpage, noise content such as advertisements and navigation links in the text webpage is effectively filtered out, the problem of too many distractions in the text webpage is solved, the influence of noise content on reading the body text is effectively avoided, and the reading experience is greatly improved.
Additional aspects and advantages of embodiments of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of embodiments of the present application will become apparent and more readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a schematic flowchart of a text processing method according to an embodiment of the present application;
FIG. 2 is a diagram illustrating a prompt for displaying a predetermined text reading mode according to an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating a text of a novel web page and corresponding directory entries in directory content in a linked manner according to an embodiment of the present application;
fig. 4 is a schematic diagram illustrating loading and displaying of a text of a novel web page next to a current novel web page according to an embodiment of the present application;
FIG. 5 is a diagram illustrating a directory virtual key and a display mode virtual key according to an embodiment of the present disclosure;
FIG. 6 is a diagram illustrating an overall process of text processing according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a basic structure of a text processing apparatus according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an", "said" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
For better understanding and description of the embodiments of the present application, some technical terms used in the embodiments of the present application will be briefly described below.
Browser: a client program of the World Wide Web (Web) service, which can send various requests to a Web server, and parse, display, and play hypertext information and various multimedia data formats returned by the Web server.
Process: the basic unit of resource allocation and scheduling in an operating system; it is the activity that occurs when a program and its data are executed sequentially on a processor.
Plug-in: a program written according to the rules of an application program interface, which can provide functions that the original operating system platform or application software platform does not have. Since a plug-in needs to call the function library or data provided by the original operating system, it has to run under the operating system platform specified by the application program (possibly supporting multiple system platforms simultaneously), and cannot run separately from the specified system platform. For example, after a corresponding plug-in is installed in the browser, the browser can directly call the plug-in to process a specific type of file.
To make the objects, technical solutions and advantages of the embodiments of the present application more clear, the embodiments of the present application will be further described in detail with reference to the accompanying drawings.
The following describes in detail the technical solutions of the embodiments of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
One embodiment of the present application provides a text processing method, which is executed by a terminal device, where the terminal device may be a desktop device or a mobile terminal. As shown in fig. 1, the method includes:
Step S110, when the text webpage meets the predetermined condition, extracting the body text of the text webpage based on the text density.
Specifically, the user may access a corresponding text website through a browser of the terminal device to read the content of a certain chapter of a certain text. Generally, the content of each chapter of text is presented through a corresponding text web page, i.e., the content of one chapter is presented through one text web page.
Specifically, when a user accesses the text web page corresponding to a certain chapter through a browser in order to read that chapter, this is equivalent to sending the browser an access request for the text web page, and the browser receives the access request accordingly. After the browser receives the access request, a preset plug-in may determine whether the text webpage meets the predetermined condition. If it does, the body text of the text webpage can be extracted based on the text density, so that noise content such as advertisements and navigation links in the text webpage is effectively filtered out, the problem of too many distractions in the text webpage is solved, and the influence of noise content on reading the body text is effectively avoided.
In particular, web pages are composed of a wide variety of text, including content text, script text, anchor text, tag text, and the like. The content text on a web page mainly includes body content (referred to as the body text) and irrelevant content. The body text is the main information a user wants to obtain when browsing the web page; irrelevant content refers to words that typically identify website functions and are unrelated to the body text of the web page, such as copyright, statement, search, home page, help and the like. The words on the navigation bar and the words on related links are also considered irrelevant content. These different types of text together form a content-rich web page. To tell them apart, the text density is obtained by analyzing the proportion of each type of text within a tag text block. The text density is of great significance for extracting the body text of a web page: using it, noise text (such as script text, anchor text, tag text and the like) can be effectively eliminated, so that the body text can be accurately identified.
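The application does not give an exact formula for text density. The following is only a minimal Python sketch under stated assumptions: each top-level `<div>` is treated as one tag text block, text inside `<script>`, `<style>` and `<a>` is counted as noise, density is the share of non-noise characters in the block, and the block with the highest density weighted by content length is taken as the body text. The names `BlockDensityParser`, `block_density` and `extract_body_text` are hypothetical.

```python
from html.parser import HTMLParser

# Assumption: text inside these tags counts as noise (script text, anchor text).
NOISE_TAGS = {"script", "style", "a"}


class BlockDensityParser(HTMLParser):
    """Collects content text and noise character counts per top-level <div>."""

    def __init__(self):
        super().__init__()
        self.blocks = []      # each block: {"content": [str, ...], "noise": int}
        self.div_depth = 0    # nesting depth of <div> elements
        self.noise_depth = 0  # nesting depth of noise elements

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            if self.div_depth == 0:
                self.blocks.append({"content": [], "noise": 0})
            self.div_depth += 1
        elif tag in NOISE_TAGS:
            self.noise_depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1
        elif tag in NOISE_TAGS and self.noise_depth > 0:
            self.noise_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text or self.div_depth == 0:
            return
        block = self.blocks[-1]
        if self.noise_depth > 0:
            block["noise"] += len(text)
        else:
            block["content"].append(text)


def block_density(block):
    """Share of non-noise characters among all text characters of the block."""
    content_chars = sum(len(piece) for piece in block["content"])
    total = content_chars + block["noise"]
    return content_chars / total if total else 0.0


def extract_body_text(html):
    """Return (body_text, density) of the block that scores highest."""
    parser = BlockDensityParser()
    parser.feed(html)
    parser.close()
    if not parser.blocks:
        return "", 0.0

    def score(block):
        # Weight density by content length so short clean blocks do not win.
        return block_density(block) * sum(len(p) for p in block["content"])

    best = max(parser.blocks, key=score)
    return " ".join(best["content"]), block_density(best)
```

In this sketch a navigation bar made only of links gets a density of 0, while a chapter block made of plain text gets a density close to 1, which is the separation the paragraph above relies on.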
Step S120, determining the link address of the directory of the text in a preset white list according to the link address of the text webpage, and extracting the directory content of the text according to the link address of the directory based on the text density, where the preset white list includes the link address of the directory of the text.
Specifically, after extracting the body text of the text webpage based on the text density, the browser may further determine the link address of the directory of the text in a preset white list according to the link address of the text webpage, where the preset white list may be obtained by the browser in advance from a corresponding local server or cloud server, and includes the link address of the directory of the text. Suppose the link address of the text web page is http://www.xxxxx.com/book/7/2480870.html, and the portion after the last "/" and before ".html" (i.e. 2480870) changes dynamically from chapter to chapter. This portion can be recorded as a query parameter: for example, the query parameter of the first chapter is 2480870, that of the second chapter is 2480877, that of the third chapter is 2480888, and so on. In other words, the query parameter in the link address of the text web page of each chapter changes dynamically, while the portion before the last "/" is fixed. This fixed portion may itself be the link address of the directory of the text, or may have a unique correspondence with the link address of the directory of the text, so that the link address of the directory of the text can be determined in the preset white list according to the link address of the text web page.
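The lookup described above can be sketched as follows. This is an illustration only: it assumes the preset white list is held as a mapping from the fixed portion of a chapter link address to the directory link address, and the function names `fixed_portion` and `find_directory_url` are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def fixed_portion(chapter_url):
    """Return the link address up to and including the last '/' of the path."""
    parts = urlsplit(chapter_url)
    head, _, _chapter_file = parts.path.rpartition("/")
    return urlunsplit((parts.scheme, parts.netloc, head + "/", "", ""))


def find_directory_url(chapter_url, white_list):
    """Look up the directory link address for a chapter link address.

    white_list is assumed to map fixed portions to directory link addresses;
    None is returned when the chapter link address has no white list entry.
    """
    return white_list.get(fixed_portion(chapter_url))


# Hypothetical white list with a single entry for the example site.
WHITE_LIST = {"http://www.xxxxx.com/book/7/": "http://www.xxxxx.com/book/7/"}

first = find_directory_url("http://www.xxxxx.com/book/7/2480870.html", WHITE_LIST)
second = find_directory_url("http://www.xxxxx.com/book/7/2480877.html", WHITE_LIST)
```

Because only the query parameter differs between chapters, `first` and `second` resolve to the same directory link address.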
After the link address of the directory of the text is determined, the directory content of the text can be extracted according to the link address of the directory based on the text density. If a text includes 490 chapters, the following directory content can be extracted: chapter 1 AAAA, chapter 2 BBBB, chapter 3 CCCC, and so on up to chapter 490 YYYY; that is, the directory content includes 490 directory entries.
Step S130, displaying the directory content of the text, and displaying the body text of the text webpage and the directory item corresponding to the body text of the text webpage in the directory content in an associated manner.
Specifically, after the body text of the text webpage and the directory content of the text are extracted, the extracted directory content and the body text of the text webpage may be displayed. The directory content may be arranged vertically in a column, or displayed in another form.
In the process of displaying the directory content of the text and the body text of the text webpage, the body text of the text webpage and the directory item corresponding to it in the directory content may be displayed in association. For example, if the current text web page corresponds to chapter 2, that is, its body text is the content of chapter 2, the whole directory content and the body text of the text web page can be displayed, and at the same time "chapter 2 BBBB" in the directory content is shown in bold, in a different font style, in a different font color, or the like, so that it is distinguished from the other directory items. In this way the body text of the text web page is displayed in association with its corresponding directory item in the directory content.
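As a small illustration of the associated display, the sketch below marks the directory item whose link address matches the current text web page. The data layout (a list of title and link address pairs), the `**` marker standing in for bold display, and the function name `render_directory` are assumptions made for this example.

```python
def render_directory(entries, current_url):
    """Return one display line per directory item, marking the current one.

    entries is a list of (title, chapter_url) pairs in directory order.
    """
    lines = []
    for title, url in entries:
        marker = "**" if url == current_url else ""  # stand-in for bold
        lines.append(f"{marker}{title}{marker}")
    return lines


entries = [
    ("chapter 1 AAAA", "http://www.xxxxx.com/book/7/2480870.html"),
    ("chapter 2 BBBB", "http://www.xxxxx.com/book/7/2480877.html"),
    ("chapter 3 CCCC", "http://www.xxxxx.com/book/7/2480888.html"),
]
lines = render_directory(entries, "http://www.xxxxx.com/book/7/2480877.html")
```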
According to the text processing method provided by the embodiment of the application, the directory content of the text is determined and displayed, so that when the body text of a certain chapter is read through a text webpage, the directory of the whole text can be displayed synchronously in the text webpage, and a target chapter can be selected conveniently and quickly from the directory, which greatly improves the convenience of reading; by determining and displaying the body text of the text webpage, noise content such as advertisements and navigation links in the text webpage is effectively filtered out, the problem of too many distractions in the text webpage is solved, the influence of noise content on reading the body text is effectively avoided, and the reading experience is greatly improved.
It should be noted that the text may be a novel text, a news text, a thesis text, a patent text, and the like, and the embodiment of the present application is not limited thereto.
In one possible implementation, the text web page satisfies a predetermined condition, including at least one of: the text density of the text webpage is not less than a preset threshold value; the link address of the text webpage does not belong to a preset blacklist, where the link addresses in the preset blacklist are link addresses to which access is intercepted.
Specifically, after the browser receives the access request, a preset plug-in may determine whether the novel webpage meets the predetermined condition. The preset plug-in in the browser may do so by detecting whether the text density of the body text of the novel webpage is less than the predetermined threshold. If the text density is not less than the predetermined threshold, the novel web page is determined to satisfy the predetermined condition. If the text density is less than the predetermined threshold, the novel web page is determined not to satisfy the predetermined condition, and no response is made to the access request; that is, the specific content of the novel web page corresponding to the access request is not returned, and the related information of the novel web page (including its body text, noise information, etc.) is not presented.
Specifically, the preset plug-in in the browser may also determine whether the novel web page meets the predetermined condition by detecting whether the link address of the novel web page is a link address in a preset blacklist, where the preset blacklist stores link addresses to which access is intercepted. If the link address of the novel webpage is not in the preset blacklist, the novel webpage is determined to meet the predetermined condition. If it is in the preset blacklist, the novel webpage does not meet the predetermined condition; the novel webpage then needs to be intercepted, that is, its related information (including its body text, noise information and the like) is not returned.
In practical application, the novel webpage may be determined to meet the predetermined condition when it is detected both that the text density of its body text is not less than the predetermined threshold and that its link address does not belong to the preset blacklist.
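The combined check from the preceding paragraph can be sketched as below. The threshold value 0.5, the representation of the blacklist as a set of link addresses, and the function name `meets_predetermined_condition` are assumptions for illustration; the application does not specify them.

```python
def meets_predetermined_condition(url, text_density, blacklist, threshold=0.5):
    """Combined check: density not below the threshold AND address not blacklisted.

    threshold=0.5 is an illustrative value, not one given by the application.
    """
    return text_density >= threshold and url not in blacklist


BLACKLIST = {"http://www.blocked.example/book/1/1.html"}  # hypothetical entry

ok = meets_predetermined_condition(
    "http://www.xxxxx.com/book/7/2480870.html", 0.8, BLACKLIST
)
low_density = meets_predetermined_condition(
    "http://www.xxxxx.com/book/7/2480870.html", 0.2, BLACKLIST
)
blocked = meets_predetermined_condition(
    "http://www.blocked.example/book/1/1.html", 0.9, BLACKLIST
)
```

An implementation following the "at least one of" wording of the predetermined condition would apply either test alone instead of joining them with `and`.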
In one possible implementation, after determining body text of the text webpage based on the text density, the method further includes:
if the link address of the text webpage does not belong to the preset white list, displaying prompt information of a preset text reading mode, wherein the preset white list comprises the link address of the text webpage;
determining a link address of a directory of a text in a preset white list according to the link address of the text webpage, and extracting the directory content of the text according to the link address of the directory, wherein the method comprises the following steps:
when the trigger operation aiming at the preset text reading mode is detected, the link address of the directory of the text is determined in the preset white list according to the link address of the text webpage, and the directory content of the text is extracted according to the link address of the directory based on the text density.
Specifically, the preset white list acquired by the browser from the corresponding server further includes link addresses of novel webpages; these are the link addresses for which the prompt information of the predetermined text reading mode does not need to be displayed. After extracting the body text of the novel web page based on the text density, the browser can therefore further detect whether the link address of the novel web page is among the novel webpage link addresses in the preset white list. If it is, the prompt information of the predetermined text reading mode need not be displayed: the link address of the directory of the novel can be determined in the preset white list directly according to the link address of the novel web page, and the directory content of the novel extracted according to the link address of the directory based on the text density. If it is not, the prompt information of the predetermined text reading mode is displayed.
The prompt message of the predetermined text reading mode may be displayed at a predetermined position (for example, the rightmost side) of an address bar of the novel web page, as shown in fig. 2, a trigger button of "reading mode" is displayed at the rightmost side of the address bar for prompting the predetermined text reading mode, and when the user clicks the button of "reading mode", the predetermined text reading mode is triggered, and at this time, the browser may detect the trigger operation for the predetermined text reading mode.
In addition to displaying the prompt message of the predetermined text reading mode at the predetermined position of the address bar, the prompt message may be displayed at the predetermined position of the display window of the novel web page, for example, in the form of a floating window, and the prompt message is displayed at the upper right of the display window, as shown in fig. 2, so that the user can be reminded more prominently. In fig. 2, the user may click a close button of the floating window to close the prompt message; the user may also click the "enter reading mode" button to trigger the predetermined text reading mode, at which point the browser may detect a trigger operation for the predetermined text reading mode.
Specifically, when the browser detects a trigger operation for a predetermined text reading mode, a link address of a directory of the novel in a preset white list may be determined according to the link address of the novel web page, and directory content of the novel may be extracted according to the link address of the directory based on the text density.
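The decision between extracting the directory directly and first showing the reading mode prompt can be summarized in a short sketch. The enum `Action`, the function `next_action`, and the representation of the white list as a set of webpage link addresses are hypothetical and only illustrate the branching described above.

```python
from enum import Enum


class Action(Enum):
    EXTRACT_DIRECTORY = "extract_directory"  # go on to directory extraction
    SHOW_PROMPT = "show_prompt"              # display the reading mode prompt


def next_action(url, page_white_list, reading_mode_triggered=False):
    """Decide what to do after the body text has been extracted.

    Whitelisted pages skip the prompt; other pages proceed only after the
    user has triggered the predetermined text reading mode.
    """
    if url in page_white_list or reading_mode_triggered:
        return Action.EXTRACT_DIRECTORY
    return Action.SHOW_PROMPT


PAGE_WHITE_LIST = {"http://www.xxxxx.com/book/7/2480870.html"}  # hypothetical
```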
In a possible implementation manner, after displaying the directory content of the text, and displaying the body text of the text webpage and the directory entry corresponding to the body text of the text webpage in the directory content in an associated manner, the method further includes:
and when the triggering operation aiming at any directory item of the directory contents is detected, the text corresponding to any directory item is displayed in a linkage manner based on the pre-established corresponding relation.
Specifically, after the directory content of the novel and the body text of the novel webpage are extracted, DOM (Document Object Model) nodes can be located based on the directory content of the novel, so that the content of each chapter of the novel (namely, the body text of each novel webpage) is accurately located, and the obtained content of each chapter is synchronized to the directory column. That is, a one-to-one correspondence is established between the content of each chapter and the corresponding directory item in the directory column, where the directory column is formed from the directory content of the novel: the content of the directory column is the directory content, and each directory item of the directory column is a directory item of the directory content. The DOM is the standard programming interface recommended by the W3C (World Wide Web Consortium) for processing extensible markup language.
Specifically, because the content of each chapter corresponds to the link address of the corresponding novel web page one to one, that is, when the link address of a certain novel web page is accessed, the content (text) of the corresponding chapter is correspondingly presented, so that the one to one correspondence relationship between the content of each chapter and the corresponding directory entry in the directory column is established, which can be regarded as the one to one correspondence relationship between the link address of the novel web page where each chapter of the novel is located and each directory entry of the directory content, that is, the one to one correspondence relationship between each directory entry of the directory content and the link address of the novel web page where each chapter of the novel is located is established by positioning the document object model DOM node.
Specifically, after the one-to-one correspondence between each directory entry of the directory content and the link address of the novel web page where each chapter of the novel is located is established, the user can check the content of the corresponding chapter by clicking any directory entry in the directory content, and correspondingly, the browser can detect the trigger operation for any directory entry of the directory content. If the user clicks a certain directory item (for example, "chapter 3 CCCC") in the directory content, the browser jumps to the link address of the novel webpage corresponding to the directory item "chapter 3 CCCC" according to the detected trigger operation, and extracts and displays the content (namely, the text) of the "chapter 3 CCCC".
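The correspondence between directory entries and chapter link addresses described above can be illustrated with a minimal sketch. It assumes, hypothetically, that each directory entry on the directory page is an `<a>` element whose text is the chapter title and whose `href` points at the chapter page; the names `DirectoryParser`, `build_chapter_index`, and `on_directory_entry_clicked` are illustrative and do not come from the application.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class DirectoryParser(HTMLParser):
    """Collects (title, absolute URL) pairs from anchor elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.entries = []   # ordered list of (title, url)
        self._href = None   # href of the anchor currently open, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the directory page address.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            if title:
                self.entries.append((title, self._href))
            self._href = None


def build_chapter_index(directory_html, directory_url):
    """Maps each directory entry title to the link address of its chapter page."""
    parser = DirectoryParser(directory_url)
    parser.feed(directory_html)
    parser.close()
    return dict(parser.entries)


def on_directory_entry_clicked(index, title):
    """Returns the link address the browser would jump to for a clicked entry."""
    return index[title]
```

A dictionary keyed by title is enough for this sketch; a real directory with repeated chapter titles would need a key that stays unique, such as the entry's position in the list.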
Specifically, fig. 3 shows a schematic diagram of the linked display. In fig. 3, the left side shows the directory entries of the directory content of the novel; the directory entry "chapter 5 dragon qi", displayed in bold and underlined, indicates that the user has triggered or clicked that directory entry. The right side shows the body text of "chapter 5 dragon qi" displayed in linkage, that is, the body text obtained after the browser jumps, according to the detected trigger operation, to the link address of the novel web page corresponding to the directory entry "chapter 5 dragon qi".
In a possible implementation manner, after displaying the directory content of the text, and displaying the body text of the text webpage and the directory entry corresponding to the body text of the text webpage in the directory content in an associated manner, the method further includes: when it is detected that the sliding distance of a sliding operation on the body text of the current text webpage of the text is larger than a predetermined distance threshold, loading and displaying the body text of the next text webpage of the current text webpage of the text.
Specifically, when the body text of a novel web page is long, it cannot all be displayed in the display window at once. The user then needs to slide the body text to view the part not yet displayed, and the part already displayed slides out of the display window accordingly. The user may slide the body text by dragging the scroll bar of the novel web page downward, or by swiping upward on a touch display screen. Correspondingly, the browser of the terminal device detects the sliding operation on the body text of that novel web page. The body text of the currently displayed novel web page may be referred to as the body text of the current novel web page; that is, the browser of the terminal device can detect a sliding operation on the body text of the current novel web page.
Specifically, as the user slides the body text of the current novel web page, the distance that the body text has slid relative to a predetermined position (such as the top or the bottom) of the display window increases. Correspondingly, the browser of the terminal device detects the sliding distance of the body text of the current novel web page. When the sliding distance of the sliding operation on the body text of the current novel web page is detected to be greater than the predetermined distance threshold, the body text of the next novel web page of the current novel web page is preloaded and displayed, that is, extracted and displayed. In this way, sliding through the content of the current chapter loads and displays the content of the next chapter, achieving a slide-to-load effect: the user can simply keep sliding down to read the next chapter, which allows smooth, fast reading and avoids interrupting reading continuity.
It should be noted that the predetermined distance threshold may be a specific distance value, or may be a predetermined ratio converted from the distance value. If it is a predetermined ratio, then after the sliding distance of the sliding operation on the body text of the current novel web page is detected, the ratio of the sliding distance to the total height of the body text is calculated, and it is then detected whether this ratio is greater than the predetermined ratio.
In practical applications, when the sum of the height of the display area of the display window, the actual height of the body text of the novel web page, and a preset height value is greater than or equal to the scrolling height of the body text of the novel web page, the body text of the next novel web page of the current novel web page is preloaded and displayed. Fig. 4 shows a schematic diagram of preloading and displaying the body text of the next novel web page of the current novel web page. In fig. 4, "chapter 36 does not have xxxxxxx" is the content of the next chapter (that is, the body text of the next novel web page of the current novel web page), which is loaded and displayed when the sliding distance of the sliding operation on "chapter 35 meets xxxxxxx" (that is, the body text of the current novel web page) is detected to be greater than the predetermined distance threshold. After "no fruit in chapter 36 xxxxxxx" is loaded and displayed, the corresponding directory entry "no fruit in chapter 36" in the directory content is displayed, in linkage, in a manner different from the other chapters, for example in bold and underlined.
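The two forms of the threshold described above (an absolute distance or a ratio of the total body text height) can be sketched as follows. This is a minimal illustration only; the function name `exceeds_threshold` and its keyword parameters are hypothetical, and the units (for example pixels) are an assumption.

```python
def exceeds_threshold(slide_distance, total_text_height, *,
                      distance_threshold=None, ratio_threshold=None):
    """Returns True when the sliding distance is greater than the configured threshold.

    Exactly one of distance_threshold (absolute value) or ratio_threshold
    (fraction of the total body text height) must be given.
    """
    if (distance_threshold is None) == (ratio_threshold is None):
        raise ValueError("specify exactly one of distance_threshold or ratio_threshold")
    if distance_threshold is not None:
        # Absolute form: compare the sliding distance directly.
        return slide_distance > distance_threshold
    if total_text_height <= 0:
        # No measurable body text, so no ratio can be computed.
        return False
    # Ratio form: compare sliding distance / total text height to the ratio.
    return slide_distance / total_text_height > ratio_threshold
```

For example, with a body text 1000 units high, a ratio threshold of 0.8 is crossed once the user has slid more than 800 units.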
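The height comparison described above can be written as a one-line check. The sketch below assumes that the second term of the sum refers to the amount of body text already scrolled past (comparable to `scrollTop` in a browser) and that the preset height value acts as a look-ahead margin; both readings are assumptions, and the name `should_preload_next_chapter` is hypothetical.

```python
def should_preload_next_chapter(viewport_height, scrolled_offset,
                                preset_margin, scroll_height):
    """Returns True when the reader is within preset_margin of the end of the text.

    viewport_height: height of the display area of the display window
    scrolled_offset: assumed to be the height of body text already scrolled past
    preset_margin:   preset height value (look-ahead before the end is reached)
    scroll_height:   total scrolling height of the body text
    """
    return viewport_height + scrolled_offset + preset_margin >= scroll_height
```

With a display area of 800, a margin of 200, and a scrolling height of 2000, the check first succeeds once 1000 units have been scrolled, so the next chapter can be fetched before the reader reaches the end of the current one.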
In a possible implementation manner, after displaying the directory content of the text, and displaying the body text of the text webpage and the directory entry corresponding to the body text of the text webpage in the directory content in an associated manner, at least one of the following is further included: displaying a directory virtual key, which is used to control the display of the directory content through a trigger operation on the directory virtual key; and displaying a display mode virtual key of a predetermined display mode, which is used to control the display mode of the body text and the directory content through a trigger operation on the display mode virtual key, wherein the predetermined display mode includes displaying with at least one of a different font, a different background color, and a different text color.
Specifically, after the directory content of the novel and the body text of the novel web page are displayed, a directory virtual key of the directory content may also be displayed, as shown in fig. 5. The user can determine whether to display the directory content of the novel by triggering the directory virtual key, and correspondingly, the browser of the terminal device can detect the triggering operation aiming at the directory virtual key. If the browser displays the directory content currently, the user can cancel displaying the directory content by triggering the directory virtual key, that is, the browser can cancel displaying the directory content when detecting the triggering operation for the directory virtual key under the condition that the browser displays the directory content; if the browser does not display the directory content currently, the user can display the directory content by triggering the directory virtual key, that is, if the browser detects a triggering operation for the directory virtual key without displaying the directory content, the directory content can be displayed.
Specifically, after the directory content of the novel and the body text of the novel web page are displayed, a display mode virtual key of a predetermined display mode may also be displayed, for example a display mode virtual key of a night display mode or a display mode virtual key of a day display mode, as shown in fig. 5. If the predetermined display mode is the night display mode, the user may decide whether to switch to the night display mode by triggering the display mode virtual key of the night display mode, and accordingly, the browser of the terminal device may detect a trigger operation for the display mode virtual key. If the user turns on the night display mode, the body text and the directory content of the novel web page are displayed with at least one of the font, background color, and text color of the preset night mode, that is, in the preset night display style. If the user turns off the night display mode, the night display style of the body text and the directory content of the novel web page is cancelled, that is, their original display style is restored.
Specifically, fig. 5 gives an example of the night display mode turned on. As can be seen in fig. 5, in the night display mode the body text and the directory content of the novel web page have a black background, and the text is white, bold, and italic.
In addition to the above directory virtual key and display mode virtual key, fig. 5 also shows an "automatically open at this web page" virtual key. If it is detected that the user has turned on this virtual key, the link address of the current novel web page can be synchronized to the preset white list of the server, so that when the novel web page is visited again, the novel reading mode is entered automatically; that is, the body text of the novel web page and the directory content of the novel are extracted and displayed directly, without popping up the prompt message of the predetermined text reading mode.
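The behaviour of the two virtual keys can be pictured as a small piece of view state. The sketch below is illustrative only: `DisplayStyle`, `NIGHT_STYLE`, and `ReaderView` are hypothetical names, and the night style values simply mirror the description of fig. 5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisplayStyle:
    background: str
    text_color: str
    bold: bool = False
    italic: bool = False


# Hypothetical night style, modeled on the description of fig. 5.
NIGHT_STYLE = DisplayStyle(background="black", text_color="white",
                           bold=True, italic=True)


class ReaderView:
    """Tracks directory visibility and the active display style."""

    def __init__(self, original_style):
        self.original_style = original_style
        self.night_mode = False
        self.directory_visible = True

    @property
    def current_style(self):
        # Night mode overrides the original style; turning it off restores it.
        return NIGHT_STYLE if self.night_mode else self.original_style

    def on_directory_key(self):
        """Trigger operation on the directory virtual key: show or hide the directory."""
        self.directory_visible = not self.directory_visible

    def on_night_mode_key(self):
        """Trigger operation on the display mode virtual key: toggle night mode."""
        self.night_mode = not self.night_mode
```

Keeping the original style separately means that turning night mode off restores the page's own appearance rather than a fixed day style.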
Specifically, fig. 6 shows a schematic diagram of a process of processing a novel text, which specifically includes the following steps:
Step S601: detecting whether the novel web page meets the predetermined condition, that is, detecting whether the text density of the novel web page is not less than a predetermined threshold and whether the link address of the novel web page does not belong to a preset blacklist. If the novel web page meets the predetermined condition, step S602 is executed; otherwise, the process ends.
Step S602: extracting the body text of the novel web page based on the text density, that is, analyzing the novel web page based on the text density and extracting the body text of the novel web page when the analysis result indicates that it meets the extraction conditions.
Step S603: detecting whether the link address of the novel web page belongs to the preset white list. If it does, step S605 is executed; if it does not, step S604 is executed.
Step S604: displaying the prompt message of the predetermined text reading mode. The prompt message may be displayed at the rightmost side of the address bar of the novel web page, or as a floating window at the upper right corner of the display window of the novel web page. When a trigger operation for the predetermined text reading mode is detected, step S605 is executed.
Step S605: performing template rendering, that is, displaying the body text of the novel web page, extracting the directory content of the novel based on the text density according to the determined link address of the directory of the novel, and displaying the extracted directory content at the same time.
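The control flow of steps S601 to S605 can be summarized in a short sketch. It is a simplified illustration under stated assumptions: the function `process_novel_page`, its parameters, and the step labels it returns are hypothetical; the page's text density is passed in as a precomputed number; and the user's response to the prompt is modeled as a callback.

```python
def process_novel_page(url, page_density, *, density_threshold,
                       blacklist, whitelist, user_accepts_prompt):
    """Returns the list of steps performed for one novel web page."""
    steps = []

    # S601: the page must be dense enough and must not be blacklisted.
    if page_density < density_threshold or url in blacklist:
        return steps  # predetermined condition not met: end

    # S602: extract the body text based on text density.
    steps.append("S602: extract body text")

    # S603: whitelisted pages skip the prompt and go straight to rendering.
    if url not in whitelist:
        # S604: show the reading-mode prompt and wait for a trigger operation.
        steps.append("S604: show reading-mode prompt")
        if not user_accepts_prompt():
            return steps

    # S605: template rendering of body text together with directory content.
    steps.append("S605: render body text and directory content")
    return steps
```

Passing the prompt response as a callback keeps the sketch free of any user interface code; in a browser, the equivalent would be an event handler on the "enter reading mode" button.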
Fig. 7 is a schematic structural diagram of a text processing apparatus according to another embodiment of the present application, and as shown in fig. 7, the apparatus 700 may include a first extraction module 701, a second extraction module 702, and a first display module 703, where:
a first extraction module 701, configured to extract a body text of a text webpage based on text density when the text webpage meets a predetermined condition;
a second extraction module 702, configured to determine a link address of a directory of the text in a preset white list according to the link address of the text webpage, and extract directory content of the text according to the link address of the directory based on the text density, where the preset white list includes the link address of the directory of the text;
the first display module 703 is configured to display the directory content of the text, and perform associated display on the body text of the text webpage and a directory entry corresponding to the body text of the text webpage in the directory content.
In one possible implementation, the apparatus further includes a second display module;
the second display module is used for displaying prompt information of a preset text reading mode when the link address of the text webpage does not belong to a preset white list, and the preset white list comprises the link address of the text webpage;
the second extraction module is specifically configured to, when detecting a trigger operation for a predetermined text reading mode, determine a link address of a directory of a text in a preset white list according to the link address of the text webpage, and extract directory content of the text according to the link address of the directory.
In one possible implementation, the text web page satisfies a predetermined condition, including at least one of:
the text density of the text webpage is not less than a preset threshold value;
the link address of the text webpage does not belong to the preset blacklist, and the link address in the preset blacklist is the link address intercepted and accessed.
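The predetermined condition can be illustrated with a rough text density check. This section does not give a formula for text density, so the sketch below assumes one: the number of visible text characters divided by the total length of the HTML source. The names `text_density` and `meets_predetermined_condition` are hypothetical, and the sketch requires both conditions to hold, as in step S601, although the implementation above also allows only one of them to be used.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects visible text, ignoring the contents of script and style elements."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def text_density(html):
    """Assumed definition: visible text characters / total HTML characters."""
    if not html:
        return 0.0
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    visible = sum(len(chunk.strip()) for chunk in collector.chunks)
    return visible / len(html)


def meets_predetermined_condition(html, url, *, threshold, blacklist):
    """True when the page is dense enough and its link address is not blacklisted."""
    return text_density(html) >= threshold and url not in blacklist
```

Under this definition, a page made mostly of navigation links and scripts scores low because markup dominates its source, while a chapter page made of long paragraphs scores high.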
In one possible implementation, the apparatus further includes a third display module;
and the third display module is used for displaying the text corresponding to any directory item in a linkage manner based on the pre-established corresponding relation when the triggering operation aiming at any directory item of the directory contents is detected.
In a possible implementation manner, the apparatus further includes a relationship establishing module;
and the relation establishing module is used for establishing one-to-one correspondence between each directory item of the directory content and the link address of the text webpage where each chapter of the text is located by positioning the Document Object Model (DOM) node based on the directory content of the text.
In one possible implementation, the apparatus further includes a processing module;
and the processing module is used for loading and displaying the body text of the next text webpage of the current text webpage of the text when the sliding distance of the sliding operation of the body text of the current text webpage of the text is detected to be larger than a preset distance threshold.
In one possible implementation manner, the apparatus further includes a fourth display module, and the fourth display module is configured to perform at least one of the following:
displaying a directory virtual key, which is used to control the display of the directory content through a trigger operation on the directory virtual key;
and displaying a display mode virtual key of a predetermined display mode, which is used to control the display mode of the body text and the directory content through a trigger operation on the display mode virtual key, wherein the predetermined display mode includes displaying with at least one of a different font, a different background color, and a different text color.
According to the apparatus provided by the embodiment of the present application, the directory content of the text is determined and displayed, so that when the body text of a chapter is read through a text webpage, the directory of the whole text can be displayed synchronously in the text webpage, a target chapter can be selected conveniently and quickly according to the directory, and the convenience of reading operations is greatly improved. In addition, the body text of the text webpage is determined and displayed, so that noise content such as advertisements and navigation links in the text webpage is effectively filtered out, the problem of excessive interference in text webpages is solved, the influence of noise content on reading is effectively avoided, and the reading experience is greatly improved.
It should be noted that the present embodiment is an apparatus embodiment corresponding to the method embodiment described above, and the two can be implemented in cooperation with each other. The related technical details mentioned in the above method embodiment remain valid in this embodiment and are not repeated here. Accordingly, the related technical details mentioned in the present embodiment can also be applied to the above method embodiment.
Another embodiment of the present application provides an electronic device. As shown in fig. 8, the electronic device 800 includes a processor 801 and a memory 803, where the processor 801 is coupled to the memory 803, for example via a bus 802. Further, the electronic device 800 may also include a transceiver 804. It should be noted that, in practical applications, the number of transceivers 804 is not limited to one, and the structure of the electronic device 800 does not constitute a limitation on the embodiments of the present application.
The processor 801 is applied to the embodiment of the present application, and is used to implement the functions of the first extraction module, the second extraction module, and the first display module shown in fig. 7.
The processor 801 may be a CPU, a general purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the present disclosure. The processor 801 may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
Bus 802 may include a path that transfers information between the above components. The bus 802 may be a PCI bus or an EISA bus, etc. The bus 802 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 8, but this is not intended to represent only one bus or type of bus.
The memory 803 may be, but is not limited to, a ROM or other type of static storage device that can store static information and instructions, a RAM or other type of dynamic storage device that can store information and instructions, an EEPROM, a CD-ROM or other optical disk storage (including compact disk, laser disk, optical disk, digital versatile disk, blu-ray disk, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The memory 803 is used for storing application program code for performing the present solution and is controlled in execution by the processor 801. The processor 801 is configured to execute application program code stored in the memory 803 to implement the actions of the text processing apparatus provided by the embodiment shown in fig. 7.
The electronic device provided by the embodiment of the present application includes a memory, a processor, and a computer program that is stored in the memory and can run on the processor. When the processor executes the program, the following can be achieved: the directory content of the text is determined and displayed, so that when the body text of a chapter is read through a text webpage, the directory of the whole text can be displayed synchronously in the text webpage, a target chapter can be selected conveniently and quickly according to the directory, and the convenience of reading operations is greatly improved; and the body text of the text webpage is determined and displayed, so that noise content such as advertisements and navigation links in the text webpage is effectively filtered out, the problem of excessive interference in text webpages is solved, the influence of noise content on reading is effectively avoided, and the reading experience is greatly improved.
The embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, and the computer program, when executed by a processor, implements the method shown in the above embodiments. The directory content of the text is determined and displayed, so that when the body text of a chapter is read through a text webpage, the directory of the whole text can be displayed synchronously in the text webpage, a target chapter can be selected conveniently and quickly according to the directory, and the convenience of reading operations is greatly improved; and the body text of the text webpage is determined and displayed, so that noise content such as advertisements and navigation links in the text webpage is effectively filtered out, the problem of excessive interference in text webpages is solved, the influence of noise content on reading is effectively avoided, and the reading experience is greatly improved.
The computer-readable storage medium provided by the embodiment of the application is suitable for any embodiment of the method.
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present application. It should be noted that those skilled in the art can make several improvements and refinements without departing from the principle of the present application, and these improvements and refinements should also be regarded as falling within the protection scope of the present application.

Claims (10)

1. A method of text processing, comprising:
when the text webpage meets a preset condition, extracting the text of the text webpage based on text density;
determining a link address of a directory of a text in a preset white list according to the link address of the text webpage, and extracting the directory content of the text according to the link address of the directory based on the text density, wherein the preset white list comprises the link address of the directory of the text;
displaying the content of the text catalog, and displaying the text of the text webpage and the catalog item corresponding to the text of the text webpage in the content of the catalog in a correlated manner.
2. The method of claim 1, after determining body text of the textual web page based on text density, further comprising:
if the link address of the text webpage does not belong to a preset white list, displaying prompt information of a preset text reading mode, wherein the preset white list comprises the link address of the text webpage;
determining a link address of a directory of a text in a preset white list according to the link address of the text webpage, and extracting the directory content of the text according to the link address of the directory, including:
and when the trigger operation aiming at the preset text reading mode is detected, determining the link address of the directory of the text in a preset white list according to the link address of the text webpage, and extracting the directory content of the text according to the link address of the directory based on the text density.
3. The method of claim 1, wherein the text web page satisfies a predetermined condition comprising at least one of:
the text density of the text webpage is not less than a preset threshold value;
and the link address of the text webpage does not belong to a preset blacklist, and the link address in the preset blacklist is the link address intercepted and accessed.
4. The method according to any one of claims 1 to 3, further comprising, after displaying the catalog content of the text and displaying the body text of the text web page and the catalog item corresponding to the body text of the text web page in the catalog content in association with each other:
and when the triggering operation aiming at any directory item of the directory contents is detected, based on the pre-established corresponding relation, displaying the text corresponding to the any directory item in a linkage manner.
5. The method according to claim 4, before displaying the body text corresponding to any one of the directory entries in linkage based on the correspondence established in advance, further comprising:
and based on the directory content of the text, positioning a Document Object Model (DOM) node, and establishing a one-to-one correspondence between each directory item of the directory content and the link address of the text webpage where each chapter of the text is located.
6. The method of claim 1, further comprising, after displaying the directory content of the text and displaying the body text of the text webpage and the directory entry corresponding to the body text of the text webpage in the directory content in an associated manner, the steps of:
and when the sliding distance of the sliding operation of the body text of the current text webpage of the text is detected to be larger than a preset distance threshold, loading and displaying the body text of the next text webpage of the current text webpage of the text.
7. The method according to claim 1, wherein after displaying the directory content of the text and displaying the body text of the text webpage and the directory item corresponding to the body text of the text webpage in the directory content in an associated manner, at least one of the following is further included:
the directory virtual key is used for controlling the display of the directory content through the triggering operation of the directory virtual key;
and the display mode virtual key is used for controlling the display modes of the text and the catalogue content through the triggering operation of the display mode virtual key, and the preset display mode comprises a mode of displaying through at least one of different fonts, different background colors and different character colors.
8. A text processing apparatus, comprising:
the first extraction module is used for extracting the body text of the text webpage based on text density when the text webpage meets a preset condition;
the second extraction module is used for determining a link address of a directory of a text in a preset white list according to the link address of the text webpage, and extracting the directory content of the text according to the link address of the directory based on the text density, wherein the preset white list comprises the link address of the directory of the text;
the first display module is used for displaying the directory content of the text and displaying the body text of the text webpage and the directory item corresponding to the body text of the text webpage in the directory content in an associated manner.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the text processing method of any one of claims 1-7 when executing the program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when being executed by a processor, implements the text processing method according to any one of claims 1 to 7.
CN201911052664.1A 2019-10-31 2019-10-31 Text processing method and device, electronic equipment and computer readable storage medium Pending CN112749528A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911052664.1A CN112749528A (en) 2019-10-31 2019-10-31 Text processing method and device, electronic equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN112749528A true CN112749528A (en) 2021-05-04

Family

ID=75644618

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911052664.1A Pending CN112749528A (en) 2019-10-31 2019-10-31 Text processing method and device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112749528A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113407889A (en) * 2021-07-15 2021-09-17 北京百度网讯科技有限公司 Novel transcoding method, device, equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101977233A (en) * 2010-11-01 2011-02-16 优视科技有限公司 Method and system for leading mobile terminal to browse webpage in reading mode
CN102541874A (en) * 2010-12-16 2012-07-04 中国移动通信集团公司 Webpage text content extracting method and device
CN102663050A (en) * 2012-03-29 2012-09-12 奇智软件(北京)有限公司 Method and device for processing electronic book data
CN103577466A (en) * 2012-08-03 2014-02-12 腾讯科技(深圳)有限公司 Method and device for displaying webpage content in browser
CN103729354A (en) * 2012-10-10 2014-04-16 腾讯科技(深圳)有限公司 Webpage information processing method and device
CN103970755A (en) * 2013-01-28 2014-08-06 腾讯科技(深圳)有限公司 Novel catalog entry identification method, device and system
US20180329581A1 (en) * 2017-05-15 2018-11-15 International Business Machines Corporation Generating a catalog for a web page

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
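Zhu Yuxiong (朱宇雄): "A New Treasure for Novel Fans: The All-New UC Browser Smart Reading Experience", Computer & Network (《计算机与网络》), page 34 *
Shi Jintao (石锦涛): "Extracting Web Page Body Text Based on Text Density", Fujian Computer (福建电脑), no. 04, pages 116-117 *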

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113407889A (en) * 2021-07-15 2021-09-17 北京百度网讯科技有限公司 Novel transcoding method, device, equipment and storage medium
CN113407889B (en) * 2021-07-15 2023-10-20 北京百度网讯科技有限公司 Novel transcoding method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
US10318095B2 (en) Reader mode presentation of web content
US9977768B2 (en) System for clipping webpages by traversing a dom, and highlighting a minimum number of words
US8751953B2 (en) Progress indicators for loading content
US8806325B2 (en) Mode identification for selective document content presentation
US10387535B2 (en) System and method for selectively displaying web page elements
US8959449B2 (en) Enabling hypertext elements to work with software applications
US20150193386A1 (en) System and Method of Facilitating Font Selection and Manipulation of Fonts
US20210149842A1 (en) System and method for display of document comparisons on a remote device
CN110209966B (en) Webpage refreshing method, webpage system and electronic equipment
US9934206B2 (en) Method and apparatus for extracting web page content
CA2712925A1 (en) Editing a document using a transitory editing surface
KR20140012664A (en) Method for rearranging web page
US20120030562A1 (en) Device and method for generating customized webpages
CN109933751B (en) Image-text drawing method and device, computer-readable storage medium and computer equipment
CN104750851A (en) Webpage content lazy loading method and system
CN110781427A (en) Method, device, equipment and storage medium for calculating first screen time
CN107391534B (en) Page display method, page file return method, page display device, page file return device and computer storage medium
CN112749528A (en) Text processing method and device, electronic equipment and computer readable storage medium
US10353979B2 (en) Web-user navigating information recording method, apparatus and storage medium
CN110515618B (en) Page information input optimization method, equipment, storage medium and device
JP6142620B2 (en) Display change program, display change method, and display change device
US20150193385A1 (en) System and Method for Facilitating Font Selection
CN113176878B (en) Automatic query method, device and equipment
WO2022061857A1 (en) Method for operating a terminal when accessing a web page defined by a code in a markup language
CN113742020A (en) Display implementation method for longitudinal multi-column layout of traditional Mongolian mobile terminal

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40044652; Country of ref document: HK)

SE01 Entry into force of request for substantive examination