CN106445967B - Resource directory management method and device - Google Patents

Info

Publication number
CN106445967B
Authority
CN
China
Prior art keywords
chapter
resource
impurity
directory
title
Legal status
Active
Application number
CN201510489311.3A
Other languages
Chinese (zh)
Other versions
CN106445967A (en)
Inventor
芦世先
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201510489311.3A
Publication of CN106445967A
Application granted
Publication of CN106445967B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the invention disclose a method and a device for managing a resource directory, which enable a reader to search the resource directory accurately and improve the efficiency with which the reader obtains network information resources. The method comprises: obtaining a network information resource from a publishing platform, the network information resource including a resource directory; determining, for each chapter title of the resource directory, whether it is an impurity chapter, an impurity chapter being a chapter title that does not include directory name content of the resource directory; and filtering the impurity chapters out of the resource directory, so that the chapter titles that include directory name content remain.

Description

Resource directory management method and device
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for managing a resource directory.
Background
Network information resources are rapidly published, browsed and shared via network platforms. They include network novels, web logs, microblog posts and the like. Taking network novels as an example: a network novel is a novel that a network author publishes on a network platform, a genre that emerged with the rapid development of the Internet. Because network novels can be obtained quickly and are convenient to read, they are increasingly popular with Internet users.
To make the network content in a network information resource easy to look up, the resource usually contains a resource directory, so that a reader can quickly reach the content corresponding to a directory entry simply by querying the directory. However, current resource directories often expose readers to advertisements inserted by the publishing platform or to notices posted by network users; both seriously hinder accurate searching of the resource directory and reduce the efficiency with which readers obtain network information resources.
Disclosure of Invention
The embodiments of the invention provide a method and a device for managing a resource directory, which enable a reader to search the resource directory accurately and improve the efficiency with which the reader obtains network information resources.
In order to solve the above technical problems, embodiments of the present invention provide the following technical solutions:
In a first aspect, an embodiment of the present invention provides a method for managing a resource directory, including:
obtaining a network information resource from a publishing platform, the network information resource including a resource directory;
determining, for each chapter title of the resource directory, whether it is an impurity chapter, an impurity chapter being a chapter title that does not include directory name content of the resource directory; and
filtering the impurity chapters out of the resource directory to obtain the chapter titles that include directory name content.
In a second aspect, an embodiment of the present invention further provides a device for managing a resource directory, where the device includes:
an obtaining module, configured to obtain a network information resource from a publishing platform, the network information resource including a resource directory;
a chapter judgment module, configured to determine, for each chapter title of the resource directory, whether it is an impurity chapter, an impurity chapter being a chapter title that does not include directory name content of the resource directory; and
a filtering module, configured to filter the impurity chapters out of the resource directory to obtain the chapter titles that include directory name content.
It can be seen from the above technical solutions that the embodiments of the invention have the following advantages:
In the embodiments of the invention, a network information resource that includes a resource directory is obtained from a publishing platform, each chapter title of the resource directory is checked to determine whether it is an impurity chapter, and finally the impurity chapters are filtered out of the resource directory so that the chapter titles including directory name content remain. Because the impurity chapters, which are unrelated to the directory name content, are removed from the resource directory presented to the reader while the chapter titles that include directory name content are retained, the reader is not disturbed by impurity chapters, can search the resource directory accurately, and obtains network information resources more efficiently.
Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person skilled in the art may derive other drawings from them.
Fig. 1 is a schematic flowchart of a method for managing a resource directory according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of impurity chapters in a resource directory according to an embodiment of the present invention;
Fig. 3 is a flowchart of managing the resource directory of a network novel according to an embodiment of the present invention;
Fig. 4-a is a schematic structural diagram of a resource directory management apparatus according to an embodiment of the present invention;
Fig. 4-b is a schematic structural diagram of a chapter judgment module according to an embodiment of the present invention;
Fig. 4-c is a schematic structural diagram of another chapter judgment module according to an embodiment of the present invention;
Fig. 4-d is a schematic structural diagram of another resource directory management apparatus according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a server to which the resource directory management method provided in the embodiments of the present invention is applied.
Detailed Description
The embodiments of the invention provide a method and a device for managing a resource directory, which enable a reader to search the resource directory accurately and improve the efficiency with which the reader obtains network information resources.
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the embodiments described below are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments that can be derived by one skilled in the art from the embodiments given herein are intended to be within the scope of the invention.
The terms "comprises" and "comprising," and any variations thereof, in the description and claims of this invention and the above-described drawings are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Details are given below.
Referring to Fig. 1, an embodiment of the method for managing a resource directory provided by the present invention may be applied to the optimized management of network information resources, and may include the following steps:
101. Acquiring a network information resource from a publishing platform, the network information resource including a resource directory.
In the embodiment of the present invention, after a network information resource is published on the publishing platform, the resource directory management device provided in the embodiment may acquire it from that platform. To make the network content in the resource easy to look up, the publishing platform generally sets a resource directory in the network information resource and uses it to manage the published content. Taking a network novel as the network information resource, the resource directory may be the novel directory. A network novel is typically updated periodically, each update being a new chapter published in sequence, so the novel directory may contain many chapters. For example, the directory of a network novel named "love pet wife" includes: Chapter 1 Entering Prison; Chapter 2 Rebirth; Chapter 3 Sister.
It should be noted that the resource directory management device provided in the embodiment of the present invention may be deployed on an aggregation website. The aggregation website then acquires network information resources from the publishing platform through the device and presents them on its web pages, so that readers can choose which network information resources to browse.
102. Determining, for each chapter title of the resource directory, whether it is an impurity chapter, an impurity chapter being a chapter title that does not include directory name content of the resource directory.
In the embodiment of the present invention, after acquiring the network information resource from the publishing platform, the resource directory management device extracts the resource directory from it. Because the resource directory comes from the publishing platform, it sometimes contains advertisement information inserted by the platform, and the author of the network information resource may even use the resource directory to publish notices. The resource directory obtained by the management device may therefore include advertisements and notices that are unrelated to the network content of the resource. In the prior art, after obtaining the resource directory of a network information resource from the publishing platform, an aggregation website presents it directly on a web page, so readers see the advertisements inserted by the publishing platform or the notices published by network users; both seriously hinder accurate searching of the resource directory and reduce the efficiency with which readers obtain network information resources.
In the embodiment of the present invention, after acquiring the resource directory of the network information resource from the publishing platform, the management device does not present it directly on a web page. Instead, it analyzes the chapter titles in the resource directory, that is, it determines for each chapter title whether it is an impurity chapter, an impurity chapter being a chapter title that does not include directory name content of the resource directory. Chapter titles are the components of the resource directory. Taking a network novel as an example, the resource directory is the novel directory, which may contain many chapters; for the network novel "love pet wife", the novel directory includes "Chapter 1 Entering Prison", "Chapter 2 Rebirth" and "Chapter 3 Sister". These are three chapter titles, and each includes directory name content, namely "Entering Prison", "Rebirth" and "Sister".
Not every chapter title in the resource directory includes directory name content: some chapter titles appear in the resource directory without including any. Taking a network novel as the network information resource, its resource directory may contain impurity chapters without directory name content; for example, chapter titles unrelated to the directory names, such as "asking for monthly tickets" or "asking for leave", are impurity chapters. Fig. 2 is a schematic diagram of impurity chapters according to an embodiment of the present invention. The two titles enclosed in boxes in Fig. 2, "Asking for recommendation tickets" and "My fourth book is on the shelf, please see the text for details.", are impurity chapters, while the titles not enclosed in boxes are ordinary chapter titles whose directory name content is, for example, "Monster Rock", "Mysterious Black Egg" and "Metamorphosis".
It should be noted that, in the embodiment of the present invention, the management device may determine whether all chapter titles in the resource directory are impurity chapters at the same time, or may check the chapter titles one by one from top to bottom; this is not limited here.
In some embodiments of the present invention, step 102 of determining whether each chapter title of the resource directory is an impurity chapter may specifically include the following steps:
A1. Determining whether each chapter title of the resource directory includes a chapter number;
A2. If a chapter title includes a chapter number, determining that the chapter title is not an impurity chapter;
A3. If a chapter title does not include a chapter number, determining that the chapter title is an impurity chapter.
In step A1, a chapter title that includes directory name content generally satisfies the relationship: chapter title = chapter number + directory name content. For example, referring to Fig. 2, the first chapter title of the resource directory is "Chapter Sixty Monster Rock", which consists of the chapter number "Chapter Sixty" and the directory name content "Monster Rock". The inventor's research found that chapter titles that include directory name content also include a chapter number; for example, in 95% of network novels the chapter titles include chapter numbers, while chapter titles without a chapter number also lack directory name content and are generally impurity chapters such as "asking for monthly tickets" or "asking for leave". Whether a chapter title includes a chapter number can therefore serve as the basis for deciding whether it is an impurity chapter: a chapter title that includes a chapter number is not an impurity chapter, and one that does not is an impurity chapter. Applying steps A1 to A3 to the resource directory shown in Fig. 2, the two chapter titles "Asking for recommendation tickets" and "My fourth book is on the shelf, please see the text for details." contain no chapter number, so both are determined to be impurity chapters.
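The decision rule of steps A1 to A3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names are hypothetical, the English titles only loosely mirror the Fig. 2 example, and the chapter-number check is deliberately naive (an English "Chapter <digits>" prefix or a Chinese "第...章" prefix); step A11 below refines that check.

```python
import re

# Naive chapter-number detector (assumption): an English "Chapter <digits>"
# prefix, or a Chinese "第...章" prefix with one to four characters in between.
CHAPTER_NUMBER = re.compile(r"^\s*(?:chapter\s+\d+|第.{1,4}章)", re.IGNORECASE)


def has_chapter_number(title: str) -> bool:
    """Step A1: report whether the chapter title starts with a chapter number."""
    return CHAPTER_NUMBER.match(title) is not None


def is_impurity_by_chapter_number(title: str) -> bool:
    """Steps A2/A3: titles without a chapter number are impurity chapters."""
    return not has_chapter_number(title)


# Hypothetical directory loosely modelled on the Fig. 2 example.
directory = [
    "Chapter 60 Monster Rock",
    "Asking for recommendation tickets",
    "My fourth book is on the shelf, please see the text for details.",
    "Chapter 61 Mysterious Black Egg",
]

impurity_chapters = [t for t in directory if is_impurity_by_chapter_number(t)]
print(impurity_chapters)
```

Because the rule only looks at a title prefix, it is cheap and language-agnostic in shape, but it inherits the limitation the text implies: a genuine chapter that lacks a chapter number would be flagged.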
Further, in some embodiments of the present invention, step A1 of determining whether each chapter title of the resource directory includes a chapter number may specifically include the following step:
A11. Determining, according to the directory title features of the resource directory, whether each chapter title of the resource directory includes a chapter number.
In step A11, directory title features of the resource directory are used as the basis for determining whether a chapter title includes a chapter number. A directory title feature is a characteristic of those chapter titles in the resource directory that do include a chapter number, such as the appearance, in the chapter number, of the ordinal prefix "第" (rendered literally as "the first"), the character "章" ("chapter"), or a numeral. For example, the feature may be that "第", "章" and a numeral all appear within the first several characters of the chapter title, where the numeral may be, without limitation, an Arabic or a Chinese numeral. As another example, the feature may be that "第" is the 1st character of the chapter title and "章" is its 3rd, 4th, 5th or 6th character. Any chapter title that satisfies such a directory title feature is determined to include a chapter number.
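The positional directory title feature in step A11 ("第" as the 1st character, "章" as the 3rd to 6th character) maps naturally onto an anchored regular expression. The sketch below also requires the characters in between to be numerals, which combines it with the first feature described above; the numeral set (ASCII digits, full-width digits and common Chinese numerals), the function name and the sample titles are all assumptions for illustration.

```python
import re

# Numeral characters allowed between "第" and "章" (assumed set: ASCII digits,
# full-width digits and common Chinese numerals).
NUMERALS = "0-9０-９零〇一二三四五六七八九十百千万两"

# "第" is character 1, followed by 1-4 numerals, so "章" is character 3, 4, 5 or 6.
TITLE_FEATURE = re.compile(rf"^第[{NUMERALS}]{{1,4}}章")


def has_chapter_number(title: str) -> bool:
    """Step A11: test a chapter title against the positional title feature."""
    return TITLE_FEATURE.match(title.strip()) is not None


for title in ["第六十章 怪石", "第1章 重生", "求推荐票", "第一卷 开篇", "第一千二百三十四章 终局"]:
    print(title, has_chapter_number(title))
```

Note that a strict reading of the 3rd-to-6th-character rule rejects very long chapter numbers such as "第一千二百三十四章"; widening the `{1,4}` bound is a design choice the text leaves open.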
In some embodiments of the present invention, step 102 of determining whether each chapter title of the resource directory is an impurity chapter may specifically include the following steps:
B1. Determining whether each chapter title of the resource directory includes a preset impurity word;
B2. If a chapter title includes an impurity word, determining that the chapter title is an impurity chapter;
B3. If a chapter title does not include any impurity word, determining that the chapter title is not an impurity chapter.
In step B1, words that impurity chapters are likely to contain are configured in advance as impurity words. The preset impurity words may be a dynamically configured word set and may be extracted by continuously training a predefined classification model; for example, the names of impurity chapters are collected periodically, the words in them are counted, and frequently occurring words are extracted as impurity words. Once the impurity words are set, they are used to identify impurity chapters: a chapter title of the resource directory in which an impurity word appears is determined to be an impurity chapter. Because the impurity words form a dynamically configured set, they can be enriched continuously as more impurity chapters are collected, which improves the accuracy of determining whether a chapter title is an impurity chapter. Applying steps B1 to B3 to the resource directory shown in Fig. 2: after impurity chapters have been analyzed over time, the impurity words may be defined to include "monthly ticket", "leave", "shelf", "notice", "recommendation ticket", "!", "." and so on. Checking each chapter title in Fig. 2 against these impurity words shows that the two chapter titles "Asking for recommendation tickets" and "My fourth book is on the shelf, please see the text for details." both contain impurity words, so both are determined to be impurity chapters.
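Steps B1 to B3, together with the frequency-based extraction of impurity words, might look like the sketch below. It is illustrative only: the seed list is taken from the example words in the text, while whitespace tokenization, the `min_count` threshold, plain case-insensitive substring matching and all function names are assumptions. Chinese titles would need a word segmenter before counting.

```python
from collections import Counter
from typing import Iterable, Set

# Seed impurity words from the example list in the text ("shelf" renders its
# "shelve"); the punctuation entries are kept as the text lists them.
SEED_IMPURITY_WORDS = {"monthly ticket", "leave", "shelf", "notice",
                       "recommendation ticket", "!", "."}


def mine_impurity_words(impurity_titles: Iterable[str], min_count: int = 2) -> Set[str]:
    """Extract frequent words from previously collected impurity chapter titles.

    Assumes whitespace tokenization (English examples only) and counts each
    word at most once per title.
    """
    counts: Counter = Counter()
    for title in impurity_titles:
        counts.update(set(title.lower().split()))
    return {word for word, n in counts.items() if n >= min_count}


def is_impurity_by_words(title: str, impurity_words: Iterable[str]) -> bool:
    """Steps B1-B3: a title containing any impurity word is an impurity chapter."""
    lowered = title.lower()
    return any(word in lowered for word in impurity_words)


# Hypothetical collection of impurity titles used to enrich the seed set.
collected = ["Asking for monthly tickets", "Monthly tickets wanted", "On leave today"]
words = SEED_IMPURITY_WORDS | mine_impurity_words(collected)
```

Substring matching also produces the false positives the text warns about next: "Chapter 9 Autumn Leaves" contains "leave" and would be flagged, which is one reason to combine this check with the chapter-number rule.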
It should be noted that, in some embodiments of the present invention, when steps B1 to B3 decide whether a chapter title is an impurity chapter based on impurity words, the accuracy keeps improving as the impurity words are enriched. However, some chapter titles that contain an impurity word are not actually impurity chapters. To improve accuracy, the method of steps B1 to B3 may therefore be combined with another method of determining whether a chapter title is an impurity chapter, for example the method of steps A1 to A3. In that case, step 102 of determining whether each chapter title of the resource directory is an impurity chapter may specifically include the following steps:
C1. Determining whether each chapter title of the resource directory includes a preset impurity word, and whether each chapter title includes a chapter number;
C2. If a chapter title includes an impurity word and does not include a chapter number, determining that the chapter title is an impurity chapter;
C3. If a chapter title does not include any impurity word and includes a chapter number, determining that the chapter title is not an impurity chapter.
In steps C1 to C3, the chapter number and the impurity words are used together to judge whether a chapter title is an impurity chapter, which makes the result more accurate.
In other embodiments of the present invention, instead of combining the methods of steps B1 to B3 and A1 to A3 as above, other arrangements are possible. For example, the method of steps B1 to B3 may serve as a subsequent supplementary check on the method of steps A1 to A3: after steps A1 to A3 have determined whether each chapter title is an impurity chapter, those results are checked using the method of steps B1 to B3.
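Steps C1 to C3 only decide the two unambiguous cases, so a sketch has to represent the remaining ones explicitly. Below, the combined judgment returns `True`, `False`, or `None` for the mixed cases (impurity word plus chapter number, or neither). The detectors are simplified stand-ins for steps A11 and B1, and the names, word list and `None` convention are assumptions rather than anything the text specifies.

```python
import re
from typing import Optional

# Simplified stand-in detectors (assumptions) for steps A11 and B1.
CHAPTER_NUMBER = re.compile(
    r"^\s*(?:chapter\s+\d+|第[0-9零〇一二三四五六七八九十百千万两]{1,4}章)",
    re.IGNORECASE,
)
IMPURITY_WORDS = ("monthly ticket", "leave", "shelf", "notice", "recommendation ticket")


def has_chapter_number(title: str) -> bool:
    return CHAPTER_NUMBER.match(title) is not None


def has_impurity_word(title: str) -> bool:
    lowered = title.lower()
    return any(word in lowered for word in IMPURITY_WORDS)


def is_impurity_combined(title: str) -> Optional[bool]:
    """Steps C1-C3: True = impurity chapter, False = regular chapter, None = undecided."""
    number = has_chapter_number(title)
    word = has_impurity_word(title)
    if word and not number:
        return True   # C2: impurity word present, no chapter number
    if number and not word:
        return False  # C3: chapter number present, no impurity word
    return None       # mixed signals: C1-C3 leave the outcome open
```

How undecided titles are handled (kept, queued for review, or passed to a further check such as the aggregation comparison) is left open by the text; returning `None` keeps that decision visible to the caller.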
In some embodiments of the present invention, step 102 of determining whether each chapter title of the resource directory is an impurity chapter may specifically include the following steps:
D1. Acquiring an aggregation directory generated by an aggregation website after aggregating and splicing the network information resource;
D2. Comparing whether the chapter titles of the resource directory are the same as the chapter titles of the aggregation directory;
D3. If the resource directory and the aggregation directory contain exactly the same chapter titles, determining that no chapter title in the resource directory is an impurity chapter;
D4. If the resource directory and the aggregation directory contain different chapter titles, determining each chapter title of the resource directory that differs from the chapter titles of the aggregation directory to be an impurity chapter.
In step D1, the network information resource published by the publishing platform may also be fetched by another aggregation website, different from the one on which the resource directory management device of this embodiment runs. After obtaining the network information resource from the publishing platform, that aggregation website may aggregate and splice it to generate an aggregation directory; for example, after obtaining a network novel, it may aggregate and splice the novel into an aggregation directory. The resource directory management device therefore first acquires the aggregation directory and then, in step D2, compares the chapter titles of the resource directory it acquired from the publishing platform with those of the aggregation directory acquired from the aggregation website. If the two directories contain exactly the same chapter titles, the resource directory acquired by the management device contains no impurity chapter; if they differ, each chapter title in the resource directory that differs from the chapter titles in the aggregation directory is an impurity chapter of the resource directory.
For example, when the network information resource is a network novel, there are many novel aggregation websites, such as website A, website B and website C. The resource directory management device of this embodiment may acquire aggregation directories from websites A, B and C and compare them with the resource directory acquired from the publishing platform to determine whether it contains impurity chapters.
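A minimal sketch of steps D2 to D4, extended to several aggregation directories as in the website A/B/C example. The text does not say how the directories from different sites are combined, so the `min_support` parameter (how many aggregation directories must contain a title for it to count as a regular chapter) is an assumed policy, as are the whitespace normalization and all names. With a single aggregation directory and the default `min_support=1`, it reduces to exactly steps D3 and D4.

```python
from collections import Counter
from typing import List, Sequence


def _normalize(title: str) -> str:
    """Collapse whitespace so cosmetic spacing differences do not count as mismatches."""
    return " ".join(title.split())


def impurity_by_aggregation(resource_dir: Sequence[str],
                            aggregation_dirs: Sequence[Sequence[str]],
                            min_support: int = 1) -> List[str]:
    """Steps D2-D4: flag resource-directory titles missing from the aggregation directories."""
    support: Counter = Counter()
    for agg in aggregation_dirs:
        support.update({_normalize(t) for t in agg})  # each directory votes once per title
    return [t for t in resource_dir if support[_normalize(t)] < min_support]


# Hypothetical directories: one from the publishing platform, one from "website A".
resource = ["Chapter 60 Monster Rock", "Asking for recommendation tickets",
            "Chapter 61 Mysterious Black Egg"]
site_a = ["Chapter 60 Monster Rock", "Chapter 61  Mysterious Black Egg"]
flagged = impurity_by_aggregation(resource, [site_a])
```

As the next paragraph points out, an impurity chapter that the aggregation website also kept gets support from that directory and is not flagged, so this check works best alongside the title-based rules.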
It should be noted that, in some embodiments of the present invention, in the implementation manner of steps D1 to D4, the difference between the chapter titles of the resource directory and the chapter titles of the impurity directory is compared to determine whether the chapter title is an impurity chapter, which does not exclude some cases where the existence of inaccuracy in the aggregation directory is not the same, that is, in the embodiment of the present invention, when there are impurity chapters in both the resource directory and the aggregation directory, there are still impurity chapters in the resource directory. In order to improve the judgment accuracy of the chapter title, the judgment method provided in steps D1 to D4 may be combined with other judgment methods for judging whether the chapter title is an impurity chapter, for example, the judgment methods provided in steps D1 to D4 and the judgment methods for judging whether the chapter title is an impurity chapter in steps a1 to A3 may be combined together to be used for judging whether the chapter title is an impurity chapter, the judgment methods provided in steps D1 to D4 and the judgment methods for judging whether the chapter title is an impurity chapter in steps B1 to B3 are combined together to be used for judging whether the chapter title is an impurity chapter, and the combination method may be similar to the implementation manner described in steps C1 to C3 in the foregoing embodiment, and is not illustrated one by one.
In other embodiments of the present invention, the method of steps D1 to D4 may also be combined with those of steps A1 to A3 and B1 to B3 in other ways. For example, steps D1 to D4 may serve as a subsequent supplementary check of steps A1 to A3. After steps A1 to A3 have determined whether a chapter title is an impurity chapter, that determination is checked using the method of steps D1 to D4. Likewise, steps D1 to D4 may serve as a subsequent supplementary check of steps B1 to B3.
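Using steps D1 to D4 as a supplementary check of a chapter-number judgment can be sketched as below. This is an assumption-laden illustration only. It uses a hypothetical regular expression as a stand-in for steps A1 to A3. The check rule is also assumed: a provisional "impurity" verdict is kept only if the aggregation directory also lacks the title. The names `provisional_impurity` and `checked_impurity` are hypothetical.

```python
import re

# Hypothetical stand-in for steps A1 to A3: a title without "第<number>章/节/回"
# is provisionally judged to be an impurity chapter.
CHAPTER_NUMBER = re.compile(r"^\s*第\s*[0-9零〇一二三四五六七八九十百千万两]+\s*[章节回]")


def provisional_impurity(title: str) -> bool:
    return CHAPTER_NUMBER.match(title) is None


def checked_impurity(titles: list[str], aggregation: list[str]) -> dict[str, bool]:
    """Supplementary check in the spirit of steps D1 to D4 (assumed rule):
    a provisional 'impurity' verdict survives only if the aggregation
    directory does not list the title; listed titles are kept as chapters."""
    listed = set(aggregation)
    return {t: provisional_impurity(t) and t not in listed for t in titles}


titles = ["楔子", "第一章 日出", "月底了，求月票"]
aggregation = ["楔子", "第一章 日出"]
print(checked_impurity(titles, aggregation))
# {'楔子': False, '第一章 日出': False, '月底了，求月票': True}
```

Under this rule the check can only overturn positive verdicts. It therefore rescues genuine but unnumbered chapters such as a prologue ("楔子"). As the text above notes, it cannot catch impurity chapters that the aggregation directory also contains.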
103. Filter the impurity chapters out of the resource directory to obtain the chapter titles in the resource directory that include the directory name content.
In the embodiment of the present invention, the resource directory of the network information resource is not presented to the reader directly after it is obtained from the publishing platform. Instead, step 102 is executed to determine, for each chapter title of the resource directory, whether it is an impurity chapter. If the resource directory contains no impurity chapter, it is presented to the reader as is. If it does, step 103 is executed to filter the impurity chapters out, so that no impurity chapter remains and the reader is not disturbed by them. In the prior art, the resource directory obtained from the publishing platform is presented to the reader directly, impurity chapters included. By filtering out the impurity chapters, the embodiment of the present invention lets the reader search the resource directory accurately. It also improves the efficiency with which the reader obtains the network information resource.
It should be noted that, in the embodiment of the present invention, filtering out the impurity chapters may mean either deleting them from the resource directory or hiding them in the resource directory. In some embodiments, step 103 filters the impurity chapters out of the resource directory and obtains the chapter titles that include the directory name content. After that, the resource directory management method provided in the embodiments of the present invention may further include the following step:
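The two filtering variants named above, deleting and hiding, can be contrasted in a minimal sketch. The `Entry` data model and the function names are hypothetical. Any predicate can stand in for the impurity judgment of step 102.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Entry:
    """One resource-directory entry (hypothetical data model)."""
    title: str
    hidden: bool = False


def filter_impurity(entries: list[Entry], is_impurity: Callable[[str], bool],
                    mode: str = "delete") -> list[Entry]:
    """'delete' drops impurity chapters; 'hide' keeps them but marks them hidden."""
    if mode == "delete":
        return [e for e in entries if not is_impurity(e.title)]
    if mode == "hide":
        return [replace(e, hidden=True) if is_impurity(e.title) else e
                for e in entries]
    raise ValueError(f"unknown mode: {mode!r}")


def visible_titles(entries: list[Entry]) -> list[str]:
    """Titles the reader actually sees."""
    return [e.title for e in entries if not e.hidden]
```

Both modes show the reader the same titles. Hiding keeps the entry, so an incorrect verdict could be reverted without fetching the directory again. This is a design consideration, not something the text specifies.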
E1. Move the chapter titles located after an impurity chapter in the resource directory upward, so that they occupy the positions that the impurity chapter held before it was filtered out.
That is to say, in the embodiment of the present invention, an impurity chapter, being a chapter title, occupies a position in the resource directory. After the impurity chapter is filtered out, that position can be taken by the chapter titles that follow it. Rearranging the resource directory in this way distributes the chapter titles more evenly and avoids large blank areas. It also makes the resource directory easier for the reader to browse.
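Step E1 can be sketched as follows, assuming the directory is modelled as a mapping from display position to title. Both this layout model and the name `shift_up` are hypothetical.

```python
def shift_up(slots: dict[int, str], impurity: set[str]) -> dict[int, str]:
    """Step E1 sketch: drop impurity chapters and move every later title up,
    so the remaining titles fill consecutive positions from the first slot.

    Assumption: the directory is modelled as {display position: title}.
    """
    if not slots:
        return {}
    start = min(slots)
    kept = [slots[p] for p in sorted(slots) if slots[p] not in impurity]
    return {start + i: title for i, title in enumerate(kept)}


layout = {1: "第一章 日出", 2: "月底了，求月票", 3: "第二章 他是谁", 4: "今天请假", 5: "第三章 乘风"}
print(shift_up(layout, {"月底了，求月票", "今天请假"}))
# {1: '第一章 日出', 2: '第二章 他是谁', 3: '第三章 乘风'}
```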
As can be seen from the above embodiments, the method proceeds in three stages. First, a network information resource that includes a resource directory is obtained from the publishing platform. Next, each chapter title of the resource directory is judged as to whether it is an impurity chapter. Finally, the impurity chapters are filtered out of the resource directory to obtain the chapter titles that include the directory name content. Impurity chapters unrelated to the directory name content are thus removed from the resource directory presented to the reader, while the chapter titles that include the directory name content are retained. The reader is therefore not disturbed by impurity chapters, can search the resource directory accurately, and obtains the network information resource more efficiently.
To better understand and implement the above solutions of the embodiments of the present invention, corresponding application scenarios are described below. Consider a network information resource published by the publishing platform that is a web novel. In the prior art, after a web novel is published on the publishing platform, its resource directory may contain chapter titles such as an author's request for monthly tickets (reader votes) or an advertisement inserted by the platform. These are impurity chapters in the resource directory. The resource directory management apparatus in the embodiment of the present invention can manage the resource directory of the web novel and filter out the impurity chapters. This makes it more efficient for the reader to look through the resource directory.
First, a web novel including a resource directory is obtained from the publishing platform, and the method judges whether impurity chapters exist in its resource directory. Take judgment by chapter number as an example. According to statistics and observation, in 95% of books the chapter titles include a chapter number. The titles that do not include a chapter number are generally impurity chapters, such as requests for monthly tickets or leave notices. FIG. 3 is a schematic flow diagram of managing the resource directory of a web novel in the embodiment of the present invention. The implementation flow is as follows:
S01: Extract the resource directory of web novel A.
S02: Based on the directory title features, automatically determine whether each chapter title of the resource directory includes a chapter number, and perform step S03 or S04 according to the result.
S03: If the chapter title includes a chapter number, determine that it is not an impurity chapter.
S04: If the chapter title does not include a chapter number, determine that it is an impurity chapter, and then trigger step S05.
S05: Filter the impurity chapters out of the resource directory.
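Steps S01 to S05 can be sketched as a single pass over the extracted titles. The chapter-number pattern is an assumption. It treats "第" + an Arabic or Chinese numeral + "章", "节" or "回" at the start of a title as a chapter number. The sample titles are invented Chinese renderings that mirror Table 1; they are not data from the patent.

```python
import re

# Assumed chapter-number pattern: 第 + numeral + 章/节/回 at the start of a title.
CHAPTER_NUMBER = re.compile(r"^\s*第\s*[0-9零〇一二三四五六七八九十百千万两]+\s*[章节回]")


def has_chapter_number(title: str) -> bool:
    """S02: does the chapter title carry a chapter number?"""
    return CHAPTER_NUMBER.match(title) is not None


def manage_directory(titles: list[str]) -> tuple[list[str], list[str]]:
    """S01-S05: split the extracted directory into kept titles and impurity chapters."""
    kept, impurity = [], []
    for title in titles:                 # S01: titles already extracted
        if has_chapter_number(title):    # S03: numbered, so not an impurity chapter
            kept.append(title)
        else:                            # S04: no number, so an impurity chapter
            impurity.append(title)
    return kept, impurity                # S05: 'kept' is the filtered directory


titles = ["第一章 日出", "月底了，求月票", "第二章 他是谁", "第三章 乘风", "今天请假", "第六节 初见"]
kept, impurity = manage_directory(titles)
print(impurity)  # ['月底了，求月票', '今天请假']
```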
Steps S02 to S04 decide whether a chapter title is an impurity chapter according to whether it includes a chapter number. Table 1 below shows the head segment of the resource directory of a certain book. The chapter numbers of this resource directory are identified automatically, giving the chapter number of each chapter title. The result is shown in Table 2, where the left column gives the chapter number and the right column gives the chapter name content. In Table 2, the two chapter titles "At the end of the month, asking for monthly tickets" and "Standing up, there is a leave today" include no chapter number, while all other chapter titles do. These two chapter titles can therefore be judged to be impurity chapters.
Table 1: Resource directory of the web novel (head segment)
Chapter I Rising sun
At the end of the month, asking for monthly tickets
Chapter II Who is he
Chapter III Get on the wind
Standing up, there is a leave today
Section six At first sight
Table 2: Chapter titles after automatic splitting by chapter number
Chapter number | Chapter name content
Chapter I | Rising sun
None | At the end of the month, asking for monthly tickets
Chapter II | Who is he
Chapter III | Get on the wind
None | Standing up, there is a leave today
Section six | At first sight
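The split shown in Table 2 can be sketched with one capturing regular expression. Group 1 is the chapter number, and the rest of the title is the chapter name content. The pattern, the function name `split_title` and the Chinese sample titles are illustrative assumptions.

```python
import re
from typing import Optional

# Hypothetical pattern: group 1 is the chapter number (第 + numeral + 章/节/回),
# group 2 is whatever follows, i.e. the chapter name content.
TITLE = re.compile(r"^\s*(第\s*[0-9零〇一二三四五六七八九十百千万两]+\s*[章节回])\s*(.*)$")


def split_title(title: str) -> tuple[Optional[str], str]:
    """Return (chapter number or None, chapter name content), as in Table 2."""
    m = TITLE.match(title)
    if m is None:
        return None, title.strip()
    return m.group(1), m.group(2).strip()


for t in ["第一章 日出", "月底了，求月票", "第六节 初见"]:
    print(split_title(t))
# ('第一章', '日出')
# (None, '月底了，求月票')
# ('第六节', '初见')
```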
As Tables 1 and 2 show, the two chapter titles "At the end of the month, asking for monthly tickets" and "Standing up, there is a leave today" include no chapter number, while all the other chapter titles do. The two chapter titles can therefore be determined to be impurity chapters. Counting the chapter titles of the resource directory yields its directory title features. The titles contain the ordinal marker 第 ("No."), a number, and the unit character 章 ("chapter") or 节 ("section"). These features are the basis for judging whether a chapter title includes a chapter number, and hence whether it is an impurity chapter. The impurity chapters can then be filtered out, and only the remaining chapter titles are presented to the reader, who can look through them more easily.
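Deriving the directory title features by counting can be sketched as follows. The approach is an assumption, not the patent's stated procedure. The sketch counts which unit character follows "第<number>" across the whole directory. It keeps the units used by at least `min_share` of all titles, where `min_share` is an assumed threshold. Titles are then judged against those units only. The function names are hypothetical.

```python
import re
from collections import Counter

NUM = r"[0-9零〇一二三四五六七八九十百千万两]+"
# Candidate chapter marker: 第 + numeral + any single unit character.
CANDIDATE = re.compile(rf"^\s*第\s*{NUM}\s*(\S)")


def title_features(titles: list[str], min_share: float = 0.1) -> set[str]:
    """Count which unit character follows '第<number>' across the directory and
    keep those used by at least min_share of all titles (assumed threshold)."""
    if not titles:
        return set()
    counts: Counter[str] = Counter()
    for t in titles:
        m = CANDIDATE.match(t)
        if m:
            counts[m.group(1)] += 1
    return {unit for unit, n in counts.items() if n / len(titles) >= min_share}


def matches_features(title: str, units: set[str]) -> bool:
    """Judge one title against the directory's own title features."""
    m = CANDIDATE.match(title)
    return m is not None and m.group(1) in units


titles = ["第一章 日出", "月底了，求月票", "第二章 他是谁", "第三章 乘风", "今天请假", "第六节 初见"]
units = title_features(titles)  # {'章', '节'} (set order may vary when printed)
```

On a short directory, a unit that occurs only once can still pass a small threshold. The threshold would need tuning against real catalogues.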
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
To facilitate implementation of the above solutions of the embodiments of the present invention, related apparatus for implementing them is further provided below.
Referring to FIG. 4-a, an apparatus 400 for managing a resource directory according to an embodiment of the present invention includes an obtaining module 401, a chapter judgment module 402 and a filtering module 403, where:
an obtaining module 401, configured to obtain a network information resource from a publishing platform, where the network information resource includes: a resource directory;
a chapter judgment module 402, configured to respectively judge whether each chapter title of the resource directory is an impurity chapter, where the impurity chapter is a chapter title that does not include a directory name content in the resource directory;
a filtering module 403, configured to filter the impurity chapters from the resource directory, so as to obtain chapter titles that include the directory name content in the resource directory.
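The division of apparatus 400 into modules 401, 402 and 403 can be mirrored in a small class. The judgment strategy is passed in, so any of the judgment sub-modules described below could be plugged in. The class name, the constructor parameters and the idea of passing callables are illustrative assumptions, not the apparatus's actual interface.

```python
from typing import Callable, Iterable


class DirectoryManager:
    """Hypothetical mirror of apparatus 400 with a pluggable judgment strategy."""

    def __init__(self, obtain: Callable[[str], Iterable[str]],
                 is_impurity: Callable[[str], bool]):
        self.obtain = obtain            # module 401: fetch a resource directory
        self.is_impurity = is_impurity  # module 402: any chapter judgment strategy

    def run(self, resource_id: str) -> list[str]:
        titles = list(self.obtain(resource_id))
        # module 403: keep only titles that are not impurity chapters
        return [t for t in titles if not self.is_impurity(t)]


manager = DirectoryManager(lambda rid: ["第一章 日出", "今天请假"],
                           lambda t: not t.startswith("第"))
print(manager.run("book-a"))  # ['第一章 日出']
```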
In some embodiments of the present invention, referring to FIG. 4-b, the chapter judgment module 402 includes:
the chapter number obtaining sub-module 4021 is configured to respectively determine whether each chapter title of the resource directory includes a chapter number; if the chapter number is included in the chapter title, determining that the chapter title including the chapter number is not the impurity chapter; if the chapter number is not included in the chapter titles, determining that the chapter title not including the chapter number is the impurity chapter.
Further, in some embodiments of the present invention, the chapter number obtaining sub-module 4021 is specifically configured to respectively determine whether each chapter title of the resource directory includes a chapter number according to the directory title feature of the resource directory.
In some embodiments of the present invention, the chapter judgment module 402 is specifically configured to judge, for each chapter title of the resource directory, whether it includes a preset impurity word. If the chapter title includes an impurity word, the module determines that it is an impurity chapter. If the chapter title does not include any impurity word, the module determines that it is not an impurity chapter.
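Claim 1 below combines this impurity-word test with the chapter-number test of sub-module 4021. That combination can be sketched as follows. The impurity words here are a hypothetical fixed tuple standing in for words obtained from a trained classification model. The claims leave titles with conflicting signals to a further check; this sketch simply marks them as undecided, which is an assumption.

```python
import re
from enum import Enum

CHAPTER_NUMBER = re.compile(r"^\s*第\s*[0-9零〇一二三四五六七八九十百千万两]+\s*[章节回]")
# Hypothetical preset impurity words (a stand-in for model-derived words).
IMPURITY_WORDS = ("月票", "请假", "推荐票")


class Verdict(Enum):
    CHAPTER = "chapter"
    IMPURITY = "impurity"
    UNDECIDED = "undecided"


def judge(title: str, words: tuple[str, ...] = IMPURITY_WORDS) -> Verdict:
    numbered = CHAPTER_NUMBER.match(title) is not None
    has_word = any(w in title for w in words)
    if numbered and not has_word:
        return Verdict.CHAPTER      # number present, no impurity word
    if not numbered and has_word:
        return Verdict.IMPURITY     # no number, impurity word present
    return Verdict.UNDECIDED        # conflicting signals (assumed: checked further)
```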
In some embodiments of the present invention, referring to FIG. 4-c, the chapter judgment module 402 includes:
the aggregation directory obtaining sub-module 4022 is configured to obtain an aggregation directory generated by the aggregation website after performing aggregation and splicing on the network information resources;
the chapter comparison sub-module 4023 is configured to compare whether the chapter titles of the resource directory are the same as those of the aggregation directory. If the resource directory and the aggregation directory include identical chapter titles, it determines that no chapter title in the resource directory is an impurity chapter. If the two directories have different chapter titles, it determines that the chapter titles in the resource directory that differ from those in the aggregation directory are impurity chapters.
In some embodiments of the present invention, referring to fig. 4-d, the apparatus 400 for managing a resource directory further includes:
a shifting module 404, configured to move the chapter titles located after an impurity chapter upward, after the filtering module 403 has filtered the impurity chapters out of the resource directory and obtained the chapter titles that include the directory name content. The moved chapter titles occupy the positions that the impurity chapter held in the resource directory before it was filtered out.
As can be seen from the above description of the embodiment of the present invention, the apparatus works in three stages. It obtains a network information resource that includes a resource directory from the publishing platform. It judges each chapter title of the resource directory as to whether it is an impurity chapter. Finally, it filters the impurity chapters out of the resource directory to obtain the chapter titles that include the directory name content. As a result, impurity chapters unrelated to the directory name content are removed from the resource directory presented to the reader, and the chapter titles that include the directory name content are retained. The reader is not disturbed by impurity chapters, can search the resource directory accurately, and obtains the network information resource more efficiently.
FIG. 5 is a schematic structural diagram of a server to which the resource directory management method according to an embodiment of the present invention is applied. The server 500 may vary considerably depending on configuration or performance. It may include one or more central processing units (CPUs) 522 (e.g., one or more processors), a memory 532, and one or more storage media 530 (e.g., one or more mass storage devices) storing applications 542 or data 544. The memory 532 and the storage media 530 may be transient or persistent storage. The program stored on the storage medium 530 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Further, the central processing unit 522 may be configured to communicate with the storage medium 530 and to execute, on the server 500, the series of instruction operations in the storage medium 530.
The server 500 may also include one or more power supplies 526, one or more wired or wireless network interfaces 550, one or more input/output interfaces 558, and/or one or more operating systems 541, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, and so forth.
The method performed by the server in the above embodiments may be based on the management apparatus structure of the resource directory shown in fig. 4-a, fig. 4-b, fig. 4-c, and fig. 4-d.
It should be noted that the above-described embodiments of the apparatus are merely schematic, where the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the embodiment of the apparatus provided by the present invention, the connection relationship between the modules indicates that there is a communication connection between them, and may be specifically implemented as one or more communication buses or signal lines. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present invention may be implemented by software plus necessary general-purpose hardware. It may also be implemented by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components and the like. Generally, any function performed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structures implementing the same function may vary, such as analog circuits, digital circuits or dedicated circuits. However, in most cases, implementation as a software program is the better embodiment of the present invention. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product. The product is stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk of a computer. It includes instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
In summary, the above embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the above embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the above embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for managing a resource directory, comprising:
obtaining network information resources from a publishing platform, the network information resources including: a resource directory;
respectively judging whether each chapter title of the resource directory is an impurity chapter, wherein an impurity chapter is a chapter title that does not comprise the directory name content in the resource directory;
filtering the impurity chapters from the resource directory to obtain the chapter titles comprising the directory name content in the resource directory;
wherein the respectively determining whether each chapter title of the resource directory is an impurity chapter comprises:
respectively judging whether each chapter title of the resource directory comprises a chapter number, and respectively judging whether each chapter title of the resource directory comprises a preset impurity word, wherein the impurity words are extracted through continuous training of a predefined classification model;
if the chapter title comprises the chapter number and does not comprise the impurity word, determining that the chapter title is not an impurity chapter, wherein the chapter title of a non-impurity chapter comprises the chapter number and the directory name content;
if the chapter title does not comprise a chapter number and comprises the impurity word, determining that the chapter title is an impurity chapter;
and if a judgment result is obtained by judging whether each chapter title of the resource directory comprises the chapter number, checking that judgment result by judging whether each chapter title of the resource directory comprises the preset impurity word.
2. The method according to claim 1, wherein the respectively judging whether each chapter title of the resource directory comprises a chapter number comprises:
and respectively judging whether each chapter title of the resource directory comprises a chapter number according to the directory title characteristics of the resource directory.
3. The method according to claim 1, wherein the respectively judging whether each chapter title of the resource directory is an impurity chapter comprises:
acquiring an aggregation directory generated by an aggregation website after aggregating and splicing the network information resource;
comparing whether the chapter titles of the resource directory are the same as the chapter titles of the aggregation directory;
if the chapter titles included in the resource directory and the aggregation directory are identical, determining that no chapter title in the resource directory is the impurity chapter;
and if the resource directory and the aggregation directory have different chapter titles, determining the chapter title in the resource directory that differs from the chapter titles in the aggregation directory as the impurity chapter.
4. The method according to claim 1, wherein after filtering the impurity chapters from the resource directory and obtaining the chapter titles in the resource directory that comprise the directory name content, the method further comprises:
moving up the chapter titles located after the impurity chapter in the resource directory, so that the chapter titles located after the impurity chapter occupy the positions in the resource directory held before the impurity chapter was filtered out.
5. An apparatus for managing a resource directory, comprising:
an obtaining module, configured to obtain a network information resource from a publishing platform, where the network information resource includes: a resource directory;
a chapter judgment module, configured to respectively judge whether each chapter title of the resource directory is an impurity chapter, wherein an impurity chapter is a chapter title that does not comprise the directory name content in the resource directory;
a filtering module, configured to filter the impurity chapters from the resource directory to obtain the chapter titles comprising the directory name content in the resource directory;
wherein the chapter judgment module is configured to:
respectively judge whether each chapter title of the resource directory comprises a chapter number, and respectively judge whether each chapter title of the resource directory comprises a preset impurity word, wherein the impurity words are extracted through continuous training of a predefined classification model; if the chapter title comprises the chapter number and does not comprise the impurity word, determine that the chapter title is not an impurity chapter, wherein the chapter title of a non-impurity chapter comprises the chapter number and the directory name content; if the chapter title does not comprise a chapter number and comprises the impurity word, determine that the chapter title is an impurity chapter; and if a judgment result is obtained by judging whether each chapter title of the resource directory comprises the chapter number, check that judgment result by judging whether each chapter title of the resource directory comprises the preset impurity word.
6. The apparatus according to claim 5, wherein the chapter judgment module is specifically configured to judge whether each chapter title of the resource directory comprises a chapter number according to a directory title feature of the resource directory.
7. The apparatus according to claim 5, wherein the chapter judgment module comprises:
an aggregation directory acquisition submodule, configured to acquire an aggregation directory generated after an aggregation website aggregates and splices the network information resource;
a chapter comparison submodule, configured to compare whether the chapter titles of the resource directory are the same as the chapter titles of the aggregation directory; if the chapter titles included in the resource directory and the aggregation directory are identical, determine that no chapter title in the resource directory is the impurity chapter; and if the resource directory and the aggregation directory have different chapter titles, determine the chapter title in the resource directory that differs from the chapter titles in the aggregation directory as the impurity chapter.
8. The apparatus according to claim 5, wherein the apparatus for managing the resource directory further comprises:
a shifting module, configured to, after the filtering module filters the impurity chapters from the resource directory and obtains the chapter titles comprising the directory name content, move up the chapter titles located after the impurity chapters in the resource directory, so that they occupy the positions in the resource directory held before the impurity chapters were filtered out.
9. A storage medium, comprising: at least one instruction to cause a terminal device to perform a method of managing a resource directory as claimed in any one of claims 1 to 4.
10. A server, comprising: a memory and a processor;
the memory is used for storing a computer program;
the processor is configured to execute a computer program stored in the memory;
the computer program is for executing the method for managing a resource directory of any one of claims 1 to 4.
CN201510489311.3A 2015-08-11 2015-08-11 Resource directory management method and device Active CN106445967B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510489311.3A CN106445967B (en) 2015-08-11 2015-08-11 Resource directory management method and device

Publications (2)

Publication Number Publication Date
CN106445967A CN106445967A (en) 2017-02-22
CN106445967B true CN106445967B (en) 2020-12-29

Family

ID=58092961

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510489311.3A Active CN106445967B (en) 2015-08-11 2015-08-11 Resource directory management method and device

Country Status (1)

Country Link
CN (1) CN106445967B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110929474B (en) * 2019-10-28 2023-10-20 维沃移动通信(杭州)有限公司 Display method, electronic equipment and medium for literary composition chapters
CN113408660A (en) * 2021-07-15 2021-09-17 北京百度网讯科技有限公司 Book clustering method, device, equipment and storage medium

Citations (3)

Publication number Priority date Publication date Assignee Title
US8458227B1 (en) * 2010-06-24 2013-06-04 Amazon Technologies, Inc. URL rescue by identifying information related to an item referenced in an invalid URL
CN106033405A (en) * 2015-03-10 2016-10-19 腾讯科技(深圳)有限公司 A network book contents integrity detection method and device
CN106294292A (en) * 2016-07-20 2017-01-04 腾讯科技(深圳)有限公司 Chapters and sections catalogue screening technique and device

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
CN102346748A (en) * 2010-08-05 2012-02-08 盛乐信息技术(上海)有限公司 Automatic identification method for network literature directory type web pages
CN103544172B (en) * 2012-07-13 2019-01-29 深圳市世纪光速信息技术有限公司 A kind of chapters and sections catalogue processing method and processing device of e-book

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant