CN108268429B - Method and device for determining network literature chapters - Google Patents

Publication number
CN108268429B
Authority
CN
China
Prior art keywords
chapter
chapters
markov chain
section
acyclic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710452914.5A
Other languages
Chinese (zh)
Other versions
CN108268429A (en)
Inventor
庞培宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Application filed by Alibaba China Co Ltd
Priority to CN201710452914.5A
Publication of CN108268429A
Application granted
Publication of CN108268429B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents

Abstract

The application provides a method and a device for determining network literature chapters. Sequences of chapters are acquired from N first candidate sites and used to establish an acyclic Markov chain model; the subsequence with the largest weight is obtained according to the acyclic Markov chain model; the chapter sequence corresponding to that subsequence is determined to be the credible chapter sequence, and its last node is the latest chapter. By establishing a weighted acyclic Markov chain model from the chapter sequences of a plurality of candidate sites and obtaining the subsequence with the largest weight, the most credible chapter sequence and the latest chapter are determined, which improves the user experience.

Description

Method and device for determining network literature chapters
Technical Field
The present application relates to computer network technologies, and in particular, to a method and an apparatus for determining a network literature chapter.
Background
Network literature is literature published with the network as its carrier, and its latest chapters are released through online updates.
The quality of the network literature provided by different sites varies. In the prior art, when a user wants to read certain chapters of a work of network literature, the user inputs the name of the work, and the website displays all search results according to the search terms input by the user.
However, with the prior-art method, the quality of the search results provided to the user is uneven, it is difficult for the user to tell which search result has highly reliable chapters, and the user experience is poor.
Disclosure of Invention
The application provides a method and a device for determining a network literature chapter, which improve the quality of a search result provided for a user and improve user experience.
In a first aspect, the present application provides a method for determining network literature chapters, including:
acquiring a sequence of chapters of N first candidate sites, wherein N is an integer greater than or equal to 2;
establishing an acyclic Markov chain model according to the sequence of chapters of the N first candidate sites, wherein the nodes of the acyclic Markov chain model are determined according to the chapters, directed edges of the acyclic Markov chain model are determined according to the sequence of the chapters, and the weights of the directed edges are determined according to the occurrence frequency of the sequence of chapters corresponding to the directed edges;
acquiring a subsequence with the maximum weight according to the acyclic Markov chain model;
and determining the chapter sequence corresponding to the subsequence with the maximum weight value as a credible chapter sequence.
Optionally, the method further comprises:
and determining the chapter corresponding to the last node of the subsequence with the maximum weight as the latest chapter of the network literature.
Optionally, before the obtaining the sequence of the chapters of the N first candidate sites, the method further includes:
and determining the N first candidate sites according to the number of chapters of the M second candidate sites, wherein M is an integer larger than N.
Optionally, the determining the N first candidate sites according to the number of chapters of the M second candidate sites includes:
obtaining a score for each second candidate site according to $p_i = |s_i - u| / \delta$; and taking the second candidate sites whose score is greater than $D_{max}$ as the N first candidate sites, wherein i is an integer greater than or equal to 1 and less than or equal to M, $s_i$ is the number of chapters of the i-th second candidate site, u is the average of the number of chapters of the M second candidate sites, $\delta$ is the standard deviation of the number of chapters of the M second candidate sites, and
[Formula image BDA0001322967420000021]
wherein n is equal to M.
Optionally, the obtaining the sequence of the chapters of the N first candidate sites includes:
and acquiring the sequence of the last L chapters of each of the N first candidate sites, wherein L is an integer greater than or equal to 2.
Optionally, the establishing an acyclic Markov chain model according to the sequence of chapters of the N first candidate sites includes:
sequentially merging the chapters of each of the N first candidate sites into the established acyclic Markov chain;
wherein merging the chapters of each first candidate site into the established acyclic Markov chain comprises:
sequentially merging each chapter into the established acyclic Markov chain according to a preset rule, following the order of the chapters in the first candidate site, wherein the established acyclic Markov chain is initially empty;
wherein the merging each chapter into the established acyclic Markov chain according to the preset rule comprises:
determining whether a first chapter exists in the established acyclic Markov chain; if the first chapter does not exist, adding the first chapter to the established acyclic Markov chain and determining whether a second chapter exists in the established acyclic Markov chain; if the second chapter does not exist, adding the second chapter to the established acyclic Markov chain and establishing a directed edge from the first chapter to the second chapter; if the second chapter exists, establishing a directed edge from the first chapter to the second chapter;
if the first chapter exists, determining whether the second chapter exists in the established acyclic Markov chain; if the second chapter does not exist, adding the second chapter to the established acyclic Markov chain and establishing a directed edge from the first chapter to the second chapter; if the second chapter exists, determining whether a directed edge from the first chapter to the second chapter exists; if that directed edge does not exist, determining whether it would form a loop with the existing directed edges, and if no loop would be formed, establishing the directed edge from the first chapter to the second chapter; if that directed edge already exists, increasing its weight by one unit; wherein the first chapter is the chapter to be merged, and the second chapter is the chapter following the first chapter.
In a second aspect, the present application provides an apparatus for determining a network literature chapter, including:
an obtaining module, configured to obtain a sequence of chapters of N first candidate sites, where N is an integer greater than or equal to 2;
the processing module is used for establishing an acyclic Markov chain model according to the sequence of chapters of the N first candidate sites, wherein nodes of the acyclic Markov chain model are determined according to the chapters, directed edges of the acyclic Markov chain model are determined according to the sequence of the chapters, and the weight of the directed edges is determined according to the occurrence frequency of the sequence of the chapters corresponding to the directed edges;
the processing module is further configured to obtain a subsequence with a largest weight according to the acyclic markov chain model;
and the output module is used for determining that the chapter sequence corresponding to the subsequence with the maximum weight value is a credible chapter sequence.
Optionally, the output module is further configured to determine that a chapter corresponding to a last node of the subsequence with the largest weight is a latest chapter of the network literature.
Optionally, the processing module is further configured to determine the N first candidate sites according to the number of chapters of M second candidate sites, where M is an integer greater than N.
Optionally, the processing module is specifically configured to:
obtain a score for each second candidate site according to $p_i = |s_i - u| / \delta$; and take the second candidate sites whose score is greater than $D_{max}$ as the N first candidate sites, wherein i is an integer greater than or equal to 1 and less than or equal to M, $s_i$ is the number of chapters of the i-th second candidate site, u is the average of the number of chapters of the M second candidate sites, $\delta$ is the standard deviation of the number of chapters of the M second candidate sites, and
[Formula image BDA0001322967420000041]
wherein n is equal to M.
Optionally, the obtaining module is specifically configured to obtain the sequence of the last L chapters of each of the N first candidate sites, where L is an integer greater than or equal to 2.
Optionally, the processing module is specifically configured to sequentially merge the chapters of each of the N first candidate sites into the established acyclic Markov chain; wherein merging the chapters of each first candidate site into the established acyclic Markov chain comprises:
sequentially merging each chapter into the established acyclic Markov chain according to a preset rule, following the order of the chapters in the first candidate site, wherein the established acyclic Markov chain is initially empty;
wherein the merging each chapter into the established acyclic Markov chain according to the preset rule comprises:
determining whether a first chapter exists in the established acyclic Markov chain; if the first chapter does not exist, adding the first chapter to the established acyclic Markov chain and determining whether a second chapter exists in the established acyclic Markov chain; if the second chapter does not exist, adding the second chapter to the established acyclic Markov chain and establishing a directed edge from the first chapter to the second chapter; if the second chapter exists, establishing a directed edge from the first chapter to the second chapter;
if the first chapter exists, determining whether the second chapter exists in the established acyclic Markov chain; if the second chapter does not exist, adding the second chapter to the established acyclic Markov chain and establishing a directed edge from the first chapter to the second chapter; if the second chapter exists, determining whether a directed edge from the first chapter to the second chapter exists; if that directed edge does not exist, determining whether it would form a loop with the existing directed edges, and if no loop would be formed, establishing the directed edge from the first chapter to the second chapter; if that directed edge already exists, increasing its weight by one unit; wherein the first chapter is the chapter to be merged, and the second chapter is the chapter following the first chapter.
According to the method and the device for determining network literature chapters provided by the present application, the sequences of chapters of N first candidate sites are acquired, an acyclic Markov chain model is established according to those sequences, the subsequence with the largest weight is obtained according to the acyclic Markov chain model, and the chapter sequence corresponding to that subsequence is determined to be the credible chapter sequence. Because the acyclic Markov chain model integrates the chapter sequences of a plurality of first candidate sites, the chapter sequence corresponding to the subsequence with the largest weight can be provided to the user as the credible chapter sequence, which improves the user experience.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic flowchart of a first embodiment of the method for determining network literature chapters of the present application;
FIGS. 2-10 are schematic diagrams of establishing an acyclic Markov chain model in the present application;
FIG. 11 is a schematic flowchart of a second embodiment of the method for determining network literature chapters of the present application;
FIG. 12 is a schematic structural diagram of an embodiment of the apparatus for determining network literature chapters of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the above-described drawings (if any) are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
FIG. 1 is a schematic flowchart of a first embodiment of the method for determining network literature chapters of the present application. As shown in FIG. 1:
S101: acquiring the sequences of chapters of the N first candidate sites.
N is an integer greater than or equal to 2.
In one possible implementation, the sequence of the last L chapters of each of the N first candidate sites is acquired, where L is an integer greater than or equal to 2.
Taking N = 9 and L = 3 as an example, assume that the sequences of the last 3 chapters of the 9 first candidate sites are as shown in Table 1:
TABLE 1
First candidate site    Third-to-last chapter    Second-to-last chapter    Last chapter
X1                      A                        B                         E
X2                      B                        C                         D
X3                      A                        B                         C
X4                      C                        D                         E
X5                      C                        D                         E
X6                      J                        C                         E
X7                      J                        X                         Z
X8                      C                        D                         A
X9                      1                        2                         3
S102: establishing an acyclic Markov chain model according to the sequences of chapters of the N first candidate sites.
The nodes of the acyclic Markov chain model are determined according to the chapters, and the directed edges of the acyclic Markov chain model are determined according to the order of the chapters.
One possible implementation is as follows: the chapters of each of the N first candidate sites are merged in turn into the established acyclic Markov chain.
One possible implementation of merging the chapters of each first candidate site into the established acyclic Markov chain is as follows:
each chapter is merged in turn into the established acyclic Markov chain according to a preset rule, following the order of the chapters in the first candidate site, wherein the established acyclic Markov chain is initially empty.
One possible implementation of merging each chapter into the established acyclic Markov chain according to the preset rule is as follows:
determining whether a first chapter exists in the established acyclic Markov chain; if the first chapter does not exist, adding the first chapter to the established acyclic Markov chain and determining whether a second chapter exists in the established acyclic Markov chain; if the second chapter does not exist, adding the second chapter to the established acyclic Markov chain and establishing a directed edge from the first chapter to the second chapter; if the second chapter exists, establishing a directed edge from the first chapter to the second chapter;
if the first chapter exists, determining whether the second chapter exists in the established acyclic Markov chain; if the second chapter does not exist, adding the second chapter to the established acyclic Markov chain and establishing a directed edge from the first chapter to the second chapter; if the second chapter exists, determining whether a directed edge from the first chapter to the second chapter exists; if that directed edge does not exist, determining whether it would form a loop with the existing directed edges, and if no loop would be formed, establishing the directed edge from the first chapter to the second chapter; if that directed edge already exists, increasing its weight by one unit; wherein the first chapter is the chapter to be merged, and the second chapter is the chapter following the first chapter.
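The preset rule above can be sketched in Python as follows. This is only an illustrative sketch, not the patented implementation: the names `reaches`, `merge_pair`, and `build_chain` are hypothetical, the graph is held in a nested dictionary, and a newly established edge is assumed to start with a weight of one unit (the worked example below implies this, but the rule itself does not state it).

```python
def reaches(edges, start, target):
    """Return True if target can be reached from start along directed edges."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(edges.get(node, {}))
    return False


def merge_pair(nodes, edges, first, second, unit=1):
    """Merge one chapter (first) and the chapter that follows it (second)."""
    nodes.add(first)    # no effect if the chapter is already a node
    nodes.add(second)
    out = edges.setdefault(first, {})
    if second in out:
        out[second] += unit          # edge exists: increase weight by one unit
    elif not reaches(edges, second, first):
        out[second] = unit           # assumption: a new edge starts at one unit
    # Otherwise the new edge would close a loop, so nothing is done.


def build_chain(site_sequences, unit=1):
    """Merge every site's chapter sequence, in order, into an initially empty graph."""
    nodes, edges = set(), {}
    for sequence in site_sequences:
        for first, second in zip(sequence, sequence[1:]):
            merge_pair(nodes, edges, first, second, unit)
    return nodes, edges
```

A single reachability test covers every branch of the rule: when either chapter is new to the graph, no path from the second chapter back to the first can exist, so the edge is always established, which matches the branches that skip the loop check.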
With reference to Table 1, suppose the chapters of X1 to X9 are merged in turn into the established acyclic Markov chain.
The chapters of X1 are merged into the established acyclic Markov chain, which is initially empty; the result is shown in FIG. 2.
A of X1 is merged into the acyclic Markov chain:
according to the preset rule, A and B are added to the acyclic Markov chain, and a directed edge from A to B is established.
B of X1 is merged into the acyclic Markov chain:
according to the preset rule, E is added to the acyclic Markov chain, and a directed edge from B to E is established.
On the basis of FIG. 2, the chapters of X2 are merged into the established acyclic Markov chain; the result is shown in FIG. 3.
B of X2 is merged into the acyclic Markov chain:
according to the preset rule, C is added to the acyclic Markov chain, and a directed edge from B to C is established.
C of X2 is merged into the acyclic Markov chain:
according to the preset rule, D is added to the acyclic Markov chain, and a directed edge from C to D is established.
On the basis of FIG. 3, the chapters of X3 are merged into the established acyclic Markov chain; the result is shown in FIG. 4.
A of X3 is merged into the acyclic Markov chain:
according to the preset rule, the weight of the directed edge from A to B is increased by one unit; taking one unit as 1, the weight of the directed edge from A to B becomes 2.
B of X3 is merged into the acyclic Markov chain:
according to the preset rule, the weight of the directed edge from B to C is increased by one unit, so the weight of the directed edge from B to C becomes 2.
On the basis of FIG. 4, the chapters of X4 are merged into the established acyclic Markov chain; the result is shown in FIG. 5.
C of X4 is merged into the acyclic Markov chain:
according to the preset rule, the weight of the directed edge from C to D is increased by one unit; taking one unit as 1, the weight of the directed edge from C to D becomes 2.
D of X4 is merged into the acyclic Markov chain:
according to the preset rule, a directed edge from D to E is established.
On the basis of FIG. 5, the chapters of X5 are merged into the established acyclic Markov chain; the result is shown in FIG. 6.
C of X5 is merged into the acyclic Markov chain:
according to the preset rule, the weight of the directed edge from C to D is increased by one unit; taking one unit as 1, the weight of the directed edge from C to D becomes 3.
D of X5 is merged into the acyclic Markov chain:
according to the preset rule, the weight of the directed edge from D to E is increased by one unit; taking one unit as 1, the weight of the directed edge from D to E becomes 2.
On the basis of FIG. 6, the chapters of X6 are merged into the established acyclic Markov chain; the result is shown in FIG. 7.
J of X6 is merged into the acyclic Markov chain:
according to the preset rule, J is added to the acyclic Markov chain, and a directed edge from J to C is established.
C of X6 is merged into the acyclic Markov chain:
according to the preset rule, a directed edge from C to E is established.
On the basis of FIG. 7, the chapters of X7 are merged into the established acyclic Markov chain; the result is shown in FIG. 8.
J of X7 is merged into the acyclic Markov chain:
according to the preset rule, X is added to the acyclic Markov chain, and a directed edge from J to X is established.
X of X7 is merged into the acyclic Markov chain:
according to the preset rule, Z is added to the acyclic Markov chain, and a directed edge from X to Z is established.
On the basis of FIG. 8, the chapters of X8 are merged into the established acyclic Markov chain; the result is shown in FIG. 9.
C of X8 is merged into the acyclic Markov chain:
according to the preset rule, the weight of the directed edge from C to D is increased by one unit; taking one unit as 1, the weight of the directed edge from C to D becomes 4.
D of X8 is merged into the acyclic Markov chain:
according to the preset rule, no operation is needed, because a directed edge from D to A would form a loop with the existing directed edges from A to B, from B to C, and from C to D.
On the basis of FIG. 9, the chapters of X9 are merged into the established acyclic Markov chain; the result is shown in FIG. 10.
1 of X9 is merged into the acyclic Markov chain:
according to the preset rule, 1 and 2 are added to the acyclic Markov chain, and a directed edge from 1 to 2 is established.
2 of X9 is merged into the acyclic Markov chain:
according to the preset rule, 3 is added to the acyclic Markov chain, and a directed edge from 2 to 3 is established.
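Replaying the merges of Table 1 by hand gives the edge weights collected in the dictionary below; they are derived from the walkthrough text rather than read from FIG. 10, which is not reproduced here. As a sanity check, the following sketch confirms that this final graph contains no loop, using the standard-library `graphlib` module (Python 3.9 or later). The names `FINAL_EDGES` and `is_acyclic` are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Edge weights after merging X1..X9 of Table 1 (one unit taken as 1).
FINAL_EDGES = {
    "A": {"B": 2},
    "B": {"E": 1, "C": 2},
    "C": {"D": 4, "E": 1},
    "D": {"E": 2},
    "J": {"C": 1, "X": 1},
    "X": {"Z": 1},
    "1": {"2": 1},
    "2": {"3": 1},
}


def is_acyclic(edges):
    """Return True if the directed graph has no loop."""
    # TopologicalSorter expects a mapping from each node to its predecessors.
    predecessors = {}
    for source, targets in edges.items():
        predecessors.setdefault(source, set())
        for target in targets:
            predecessors.setdefault(target, set()).add(source)
    try:
        tuple(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return False
    return True
```

Adding the rejected edge from D to A to a copy of `FINAL_EDGES` makes `is_acyclic` return False, which is why merging D of X8 leaves the graph unchanged.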
S103: acquiring the subsequence with the largest weight according to the acyclic Markov chain model.
Dynamic programming or a greedy strategy may be used to obtain the subsequence with the largest weight.
The weight of the subsequence with the largest weight can be computed according to:
$L_{i,n+1} = L_{i,n} + \max(V_{i,n+1})$
In the example of S102, the largest weight is 10 (A to B: 2, B to C: 2, C to D: 4, D to E: 2), and the subsequence traversed to reach this value is the subsequence with the largest weight, namely A, B, C, D, E.
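One way to read S103 is as a longest-path search over the weighted acyclic graph. The sketch below uses memoized dynamic programming; it is an assumption-laden illustration, not the method as claimed. The name `max_weight_path` is hypothetical, "subsequence" is interpreted as a directed path, and `EDGES` holds the weights derived from the Table 1 walkthrough.

```python
from functools import lru_cache

# Edge weights obtained by merging the Table 1 sequences (one unit taken as 1).
EDGES = {
    "A": {"B": 2},
    "B": {"E": 1, "C": 2},
    "C": {"D": 4, "E": 1},
    "D": {"E": 2},
    "J": {"C": 1, "X": 1},
    "X": {"Z": 1},
    "1": {"2": 1},
    "2": {"3": 1},
}


def max_weight_path(edges):
    """Return (total weight, path) of the heaviest directed path in an acyclic graph."""
    nodes = set(edges) | {t for targets in edges.values() for t in targets}

    @lru_cache(maxsize=None)
    def best_from(node):
        # Heaviest path starting at node; a node with no outgoing edge scores 0.
        best = (0, (node,))
        for successor, weight in edges.get(node, {}).items():
            total, path = best_from(successor)
            if total + weight > best[0]:
                best = (total + weight, (node,) + path)
        return best

    return max((best_from(n) for n in sorted(nodes)), key=lambda r: r[0])


weight, path = max_weight_path(EDGES)
print(weight, path)  # 10 ('A', 'B', 'C', 'D', 'E')
```

The recursion terminates only because the graph has no loop, which is what the loop check during merging guarantees. Ties between equally heavy paths are broken arbitrarily in this sketch.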
S104: determining the chapter sequence corresponding to the subsequence with the largest weight as the credible chapter sequence.
The user can accurately obtain the chapters to be read according to the credible chapter sequence. For example, if the user wants to read the latest chapter, the chapter corresponding to the last node of the subsequence with the largest weight is determined to be the latest chapter of the network literature; in the example above, this is chapter E. If the user wants to read the latest 3 chapters, the example above yields the three chapters C, D, E.
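Once the credible chapter sequence is available as an ordered list, selecting the latest chapter or the latest few chapters is a slice from the end. A minimal sketch, with the hypothetical helper name `latest_chapters`:

```python
def latest_chapters(trusted_sequence, count=1):
    """Return the last `count` chapters of the credible chapter sequence, oldest first."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return list(trusted_sequence[-count:])


trusted = ["A", "B", "C", "D", "E"]
print(latest_chapters(trusted))     # ['E']
print(latest_chapters(trusted, 3))  # ['C', 'D', 'E']
```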
In this embodiment, the sequences of chapters of N first candidate sites are acquired, an acyclic Markov chain model is established according to those sequences, the subsequence with the largest weight is obtained according to the acyclic Markov chain model, and the chapter sequence corresponding to that subsequence is determined to be the credible chapter sequence. Because the acyclic Markov chain model integrates the chapter sequences of a plurality of first candidate sites, the chapter sequence corresponding to the subsequence with the largest weight can be provided to the user as the credible chapter sequence, which improves the user experience.
FIG. 11 is a schematic flowchart of a second embodiment of the method for determining network literature chapters of the present application. FIG. 11 builds on the embodiment shown in FIG. 1: when the number of candidate sites is large, for example more than 6, the sites with more errors are excluded according to the number of chapters of the candidate sites in order to reduce the computational workload. Therefore, before S101, the method may further include:
S100: determining the N first candidate sites according to the number of chapters of the M second candidate sites.
M is an integer greater than N.
One possible implementation is to remove noise sites by using the Chauvenet criterion:
Specifically, a score is obtained for each second candidate site according to $p_i = |s_i - u| / \delta$, and the second candidate sites whose score is greater than $D_{max}$ are taken as the N first candidate sites, wherein i is an integer greater than or equal to 1 and less than or equal to M, $s_i$ is the number of chapters of the i-th second candidate site, u is the average of the number of chapters of the M second candidate sites, $\delta$ is the standard deviation of the number of chapters of the M second candidate sites, and
[Formula image BDA0001322967420000111]
wherein n is equal to M.
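The score computation can be sketched as follows. Several points are assumptions rather than statements of the source: the population standard deviation is used for δ (the text does not say which form is meant), the threshold is passed in as a hypothetical parameter `d_max` because the formula image is not reproduced here, and the helper names are illustrative. The sketch only partitions the sites around the threshold; which group is kept should be taken from the claim language.

```python
from statistics import fmean, pstdev


def chapter_count_scores(chapter_counts):
    """Return p_i = |s_i - u| / delta for each site's chapter count s_i."""
    u = fmean(chapter_counts)
    delta = pstdev(chapter_counts, mu=u)  # assumption: population standard deviation
    if delta == 0:
        # All sites report the same number of chapters; no site deviates.
        return [0.0 for _ in chapter_counts]
    return [abs(s - u) / delta for s in chapter_counts]


def split_by_threshold(scores, d_max):
    """Return (indices with score > d_max, indices with score <= d_max)."""
    above = [i for i, p in enumerate(scores) if p > d_max]
    not_above = [i for i, p in enumerate(scores) if p <= d_max]
    return above, not_above


scores = chapter_count_scores([100, 100, 100, 100, 40])
print(scores)                           # [0.5, 0.5, 0.5, 0.5, 2.0]
print(split_by_threshold(scores, 1.5))  # ([4], [0, 1, 2, 3])
```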
According to the embodiment, the N first candidate sites are determined according to the number of chapters of the M second candidate sites, and the sites with more errors are eliminated, so that the calculation workload is reduced, and the efficiency of determining the latest chapters of the network literature is improved.
FIG. 12 is a schematic structural diagram of an embodiment of the apparatus for determining network literature chapters of the present application. The apparatus of this embodiment includes an obtaining module 1201, a processing module 1202, and an output module 1203. The obtaining module 1201 is configured to obtain sequences of chapters of N first candidate sites, where N is an integer greater than or equal to 2. The processing module 1202 is configured to establish an acyclic Markov chain model according to the sequences of chapters of the N first candidate sites, where the nodes of the acyclic Markov chain model are determined according to the chapters, the directed edges of the acyclic Markov chain model are determined according to the order of the chapters, and the weight of each directed edge is determined according to the number of occurrences of the chapter order corresponding to that directed edge. The processing module 1202 is further configured to obtain the subsequence with the largest weight according to the acyclic Markov chain model. The output module 1203 is configured to determine that the chapter sequence corresponding to the subsequence with the largest weight is the credible chapter sequence.
In the above embodiment, the output module 1203 is further configured to determine that the section corresponding to the last end node of the subsequence with the largest weight is the latest section of the network literature.
In the above embodiment, the processing module 1202 is specifically configured to merge the chapters of each of the N first candidate sites into the established endless markov chain in turn; wherein merging the chapters of each first candidate site into the established endless Markov chain comprises:
sequentially combining each chapter into an established endless Markov chain according to a preset rule according to the sequence of the chapters in the first candidate site, wherein the initial value of the established Markov chain is null;
wherein, the merging each section into the established endless markov chain according to the preset rule comprises:
determining whether the first chapter exists in an established acyclic Markov chain, and if the first chapter does not exist, adding the first chapter to the established acyclic Markov chain; determining whether a second chapter exists in the established acyclic Markov chain, if not, adding the second chapter to the established acyclic Markov chain, and establishing a directed edge from the first chapter to the second chapter; if the second section exists, establishing a directed edge from the first section to the second section;
if the first chapter exists, determining whether a second chapter exists in the established acyclic Markov chain, if the second chapter does not exist, adding the second chapter to the established acyclic Markov chain, and establishing a directed edge from the first chapter to the second chapter; if the second section exists, determining whether a directed edge from the first section to the second section exists, if the directed edge from the first section to the second section does not exist, determining whether a loop is formed between the directed edge from the first section to the second section and the existing directed edge, and if the loop is not formed, establishing a directed edge from the first section to the second section; and if the directed edge from the first chapter to the second chapter exists, adding a unit to the weight value of the directed edge from the first chapter to the second chapter, wherein the first chapter is a chapter to be merged, and the second chapter is a next chapter of the first chapter.
In the foregoing embodiment, the obtaining module 1201 is specifically configured to obtain a sequence of the last L chapters of the N first candidate sites, where L is an integer greater than or equal to 2.
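As a minimal sketch of taking the last L chapters of each site, assuming each site's chapter list is already available as a Python list in publication order (the function name `last_chapters` and the mapping from site name to chapter list are hypothetical):

```python
def last_chapters(site_chapters, l):
    """Return the last `l` chapters of every site, keeping their order.

    site_chapters: hypothetical mapping of site name -> list of chapter keys.
    """
    if l < 2:
        raise ValueError("L must be an integer greater than or equal to 2")
    return {site: list(chapters[-l:]) for site, chapters in site_chapters.items()}


tails = last_chapters(
    {"site_a": ["c1", "c2", "c3", "c4"], "site_b": ["c2", "c3"]},
    3,
)
```

Sites that list fewer than L chapters simply contribute everything they have.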
The apparatus of this embodiment may be correspondingly used to implement the technical solution of the method embodiment shown in fig. 1, and the implementation principle and the technical effect are similar, which are not described herein again.
In addition to the embodiment shown in fig. 12, the processing module 1202 is further configured to determine the N first candidate sites according to the number of chapters of M second candidate sites, where M is an integer greater than N.
Specifically, the processing module 1202 is configured to obtain a score for each second candidate site according to p_i = |s_i - u| / δ, and to take the second candidate sites whose scores are greater than D_max as the N first candidate sites, where i is an integer greater than or equal to 1 and less than or equal to M, s_i is the number of chapters of the i-th second candidate site, u is the average of the number of chapters of the M second candidate sites, δ is the standard deviation of the number of chapters of the M second candidate sites, and
[Figure BDA0001322967420000121: formula image]
where n is equal to M.
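The scoring step can be sketched as follows. The formula image for δ is not available here, so this sketch assumes the population standard deviation (`statistics.pstdev`); the function names `site_scores` and `select_sites` are hypothetical, and the comparison "greater than D_max" follows the text as written.

```python
from statistics import mean, pstdev


def site_scores(chapter_counts):
    """Score p_i = |s_i - u| / delta for each site's chapter count s_i."""
    u = mean(chapter_counts)
    delta = pstdev(chapter_counts)   # assumed: population standard deviation
    if delta == 0:
        # All sites report the same count; the score is undefined, use 0.0.
        return [0.0 for _ in chapter_counts]
    return [abs(s - u) / delta for s in chapter_counts]


def select_sites(chapter_counts, d_max):
    """Indices of sites whose score is greater than d_max, as the text states."""
    return [i for i, p in enumerate(site_scores(chapter_counts)) if p > d_max]


counts = [100, 100, 100, 100, 40]
scores = site_scores(counts)
selected = select_sites(counts, 1.0)
```

For these counts the mean is 88 and the population standard deviation is 24, so the four sites with 100 chapters score 0.5 and the site with 40 chapters scores 2.0.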
The apparatus of this embodiment may be correspondingly used to implement the technical solution of the method embodiment shown in fig. 11, and the implementation principle and the technical effect are similar, which are not described herein again.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (12)

1. A method for determining a network literature chapter, comprising:
acquiring a sequence of chapters of N first candidate sites, wherein N is an integer greater than or equal to 2;
establishing an acyclic Markov chain model according to the sequence of the chapters of the N first candidate sites, wherein the acyclic Markov chain model is used for integrating the sequence of the chapters of the first candidate sites, the nodes of the acyclic Markov chain model are determined according to the chapters, the directed edges of the acyclic Markov chain model are determined according to the sequence of the chapters, and the weights of the directed edges are determined according to the occurrence times of the sequence of the chapters corresponding to the directed edges;
acquiring a subsequence with the maximum weight according to the acyclic Markov chain model;
and determining the chapter sequence corresponding to the subsequence with the maximum weight value as a credible chapter sequence.
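One way to read "the subsequence with the maximum weight" is as the maximum-weight path in the weighted acyclic graph. The sketch below takes that reading as an assumption; the function name `max_weight_path` and the edge representation (a dict mapping a chapter to a dict of successor chapters and weights) are hypothetical. It orders the nodes topologically with Kahn's algorithm and then relaxes edges in that order.

```python
def max_weight_path(edges):
    """Return (path, total_weight) of the heaviest path in an acyclic graph.

    edges: mapping chapter -> {next_chapter: weight}; assumed to contain no loops.
    """
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)
    if not nodes:
        return [], 0

    # Kahn's algorithm: repeatedly take nodes with no remaining incoming edges.
    indegree = {n: 0 for n in nodes}
    for targets in edges.values():
        for t in targets:
            indegree[t] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    order = []
    while ready:
        n = ready.pop()
        order.append(n)
        for t in edges.get(n, {}):
            indegree[t] -= 1
            if indegree[t] == 0:
                ready.append(t)

    # Dynamic programming over the topological order.
    best = {n: 0 for n in nodes}      # heaviest path weight ending at n
    prev = {n: None for n in nodes}   # predecessor on that path
    for n in order:
        for t, w in edges.get(n, {}).items():
            if best[n] + w > best[t]:
                best[t] = best[n] + w
                prev[t] = n

    # Walk back from the node where the heaviest path ends.
    node = max(nodes, key=best.get)
    total = best[node]
    path = []
    while node is not None:
        path.append(node)
        node = prev[node]
    path.reverse()
    return path, total


path, weight = max_weight_path(
    {"c1": {"c2": 3, "x": 1}, "c2": {"c3": 3}, "x": {"c3": 1}}
)
latest_chapter = path[-1]   # tail node of the heaviest path
```

Under this reading, the returned path is the credible chapter sequence and its last element is the chapter at the tail node.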
2. The method of claim 1, further comprising:
and determining the chapter corresponding to the tail node of the subsequence with the maximum weight as the latest chapter of the network literature.
3. The method of claim 1, wherein obtaining the sequence of chapters for the N first candidate sites is preceded by:
and determining the N first candidate sites according to the number of chapters of the M second candidate sites, wherein M is an integer larger than N.
4. The method of claim 3, wherein determining the N first candidate sites based on the number of chapters of the M second candidate sites comprises:
obtaining the score of each second candidate site according to p_i = |s_i - u| / δ; and taking the second candidate sites whose scores are greater than D_max as the N first candidate sites, wherein i is an integer greater than or equal to 1 and less than or equal to M, s_i is the number of chapters of the i-th second candidate site, u is the average of the number of chapters of the M second candidate sites, δ is the standard deviation of the number of chapters of the M second candidate sites, and n is equal to M.
5. The method according to any one of claims 1-4, wherein the obtaining of the sequence of the chapters of the N first candidate sites comprises:
and acquiring a sequence of the last L chapters of the N first candidate sites, wherein L is an integer greater than or equal to 2.
6. The method of claim 5, wherein establishing an acyclic Markov chain model according to the sequence of the chapters of the N first candidate sites comprises:
merging the chapters of each of the N first candidate sites into the established acyclic Markov chain in turn;
wherein merging the chapters of each first candidate site into the established acyclic Markov chain comprises:
merging each chapter in turn into the established acyclic Markov chain according to a preset rule, following the order of the chapters in the first candidate site, wherein the established acyclic Markov chain is initially empty;
wherein merging each chapter into the established acyclic Markov chain according to the preset rule comprises:
determining whether the first chapter exists in the established acyclic Markov chain; if the first chapter does not exist, adding the first chapter to the established acyclic Markov chain and determining whether the second chapter exists in the established acyclic Markov chain; if the second chapter does not exist, adding the second chapter to the established acyclic Markov chain and establishing a directed edge from the first chapter to the second chapter; if the second chapter exists, establishing a directed edge from the first chapter to the second chapter;
if the first chapter exists, determining whether the second chapter exists in the established acyclic Markov chain; if the second chapter does not exist, adding the second chapter to the established acyclic Markov chain and establishing a directed edge from the first chapter to the second chapter; if the second chapter exists, determining whether a directed edge from the first chapter to the second chapter exists; if no such directed edge exists, determining whether a directed edge from the first chapter to the second chapter would form a loop with the existing directed edges, and if no loop would be formed, establishing the directed edge from the first chapter to the second chapter; and if the directed edge from the first chapter to the second chapter already exists, increasing the weight of that directed edge by one unit, wherein the first chapter is the chapter to be merged and the second chapter is the chapter following the first chapter.
7. An apparatus for determining a network literature chapter, comprising:
an obtaining module, configured to obtain a sequence of chapters of N first candidate sites, where N is an integer greater than or equal to 2;
a processing module, configured to establish an acyclic Markov chain model according to the sequence of the chapters of the N first candidate sites, wherein the acyclic Markov chain model is used for integrating the sequence of the chapters of the first candidate sites, the nodes of the acyclic Markov chain model are determined according to the chapters, the directed edges of the acyclic Markov chain model are determined according to the sequence of the chapters, and the weights of the directed edges are determined according to the occurrence times of the sequence of the chapters corresponding to the directed edges;
the processing module being further configured to obtain a subsequence with the maximum weight according to the acyclic Markov chain model;
and an output module, configured to determine the chapter sequence corresponding to the subsequence with the maximum weight as a credible chapter sequence.
8. The apparatus of claim 7, wherein the output module is further configured to determine the chapter corresponding to the tail node of the subsequence with the maximum weight as the latest chapter of the network literature.
9. The apparatus of claim 7, wherein the processing module is further configured to determine the N first candidate sites according to a number of chapters of M second candidate sites, wherein M is an integer greater than N.
10. The apparatus according to claim 9, wherein the processing module is specifically configured to obtain a score for each second candidate site according to p_i = |s_i - u| / δ; and to take the second candidate sites whose scores are greater than D_max as the N first candidate sites, wherein i is an integer greater than or equal to 1 and less than or equal to M, s_i is the number of chapters of the i-th second candidate site, u is the average of the number of chapters of the M second candidate sites, δ is the standard deviation of the number of chapters of the M second candidate sites, and n is equal to M.
11. The apparatus according to any one of claims 7 to 10, wherein the obtaining module is specifically configured to obtain a sequence of the last L chapters of the N first candidate sites, where L is an integer greater than or equal to 2.
12. The apparatus of claim 11, wherein the processing module is specifically configured to merge the chapters of each of the N first candidate sites into the established acyclic Markov chain in turn, wherein merging the chapters of each first candidate site into the established acyclic Markov chain comprises:
merging each chapter in turn into the established acyclic Markov chain according to a preset rule, following the order of the chapters in the first candidate site, wherein the established acyclic Markov chain is initially empty;
wherein merging each chapter into the established acyclic Markov chain according to the preset rule comprises:
determining whether the first chapter exists in the established acyclic Markov chain; if the first chapter does not exist, adding the first chapter to the established acyclic Markov chain and determining whether the second chapter exists in the established acyclic Markov chain; if the second chapter does not exist, adding the second chapter to the established acyclic Markov chain and establishing a directed edge from the first chapter to the second chapter; if the second chapter exists, establishing a directed edge from the first chapter to the second chapter;
if the first chapter exists, determining whether the second chapter exists in the established acyclic Markov chain; if the second chapter does not exist, adding the second chapter to the established acyclic Markov chain and establishing a directed edge from the first chapter to the second chapter; if the second chapter exists, determining whether a directed edge from the first chapter to the second chapter exists; if no such directed edge exists, determining whether a directed edge from the first chapter to the second chapter would form a loop with the existing directed edges, and if no loop would be formed, establishing the directed edge from the first chapter to the second chapter; and if the directed edge from the first chapter to the second chapter already exists, increasing the weight of that directed edge by one unit, wherein the first chapter is the chapter to be merged and the second chapter is the chapter following the first chapter.
CN201710452914.5A 2017-06-15 2017-06-15 Method and device for determining network literature chapters Active CN108268429B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710452914.5A CN108268429B (en) 2017-06-15 2017-06-15 Method and device for determining network literature chapters

Publications (2)

Publication Number Publication Date
CN108268429A (en) 2018-07-10
CN108268429B (en) 2021-08-06

Family

ID=62771764

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710452914.5A Active CN108268429B (en) 2017-06-15 2017-06-15 Method and device for determining network literature chapters

Country Status (1)

Country Link
CN (1) CN108268429B (en)

Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1604075A (en) * 2004-11-22 2005-04-06 北京北大方正技术研究院有限公司 Method for conducting words reading sequence recovery for newspaper pages
CN1604073A (en) * 2004-11-22 2005-04-06 北京北大方正技术研究院有限公司 Method for conducting title and text logic connection for newspaper pages
CN101866418A (en) * 2009-04-17 2010-10-20 株式会社理光 Method and equipment for determining file reading sequences
CN102270343A (en) * 2011-07-27 2011-12-07 宁波大学 Image segmentation method based on Ising graph model
CN102831059A (en) * 2012-08-23 2012-12-19 北京工业大学 Software behavior modeling method based on state layer
CN102937933A (en) * 2012-11-14 2013-02-20 中国矿业大学 Class test sequence determining method based on testing level
CN103377271A (en) * 2012-04-24 2013-10-30 英奇达资讯股份有限公司 Method for making knowledge map
CN103544172A (en) * 2012-07-13 2014-01-29 深圳市世纪光速信息技术有限公司 Method and device for processing chapter catalogs of E-book
CN103577566A (en) * 2013-10-25 2014-02-12 北京奇虎科技有限公司 Web reading content loading method and device
CN104268127A (en) * 2014-09-22 2015-01-07 同方知网(北京)技术有限公司 Method for analyzing reading order of electronic layout file
CN104331438A (en) * 2014-10-24 2015-02-04 北京奇虎科技有限公司 Method and device for selectively extracting content of novel webpage
CN104391889A (en) * 2014-11-11 2015-03-04 西安交通大学 Method for discovering community structure oriented to directed-weighting network
WO2015099810A1 (en) * 2013-12-29 2015-07-02 Hewlett-Packard Development Company, L.P. Learning graph
CN105096240A (en) * 2015-07-21 2015-11-25 南京师范大学 Method for hiding image sensitive object based texture synthesis
CN105095613A (en) * 2014-04-16 2015-11-25 华为技术有限公司 Method and device for prediction based on sequential data
CN105513400A (en) * 2015-12-03 2016-04-20 四川长虹电器股份有限公司 Method for dynamically planning travel route
CN105609107A (en) * 2015-12-23 2016-05-25 北京奇虎科技有限公司 Text processing method and device based on voice identification
CN106844350A (en) * 2017-02-15 2017-06-13 广州索答信息科技有限公司 A kind of computational methods of short text semantic similarity

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10346516B2 (en) * 2013-02-27 2019-07-09 International Business Machines Corporation Readable structural text-based representation of activity flows

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Document understanding for a broad class of documents;Marco Aiello 等;《http://www.rug.nl/research/portal》;20021231;第1-17页 *

Also Published As

Publication number Publication date
CN108268429A (en) 2018-07-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20200423
Address after: 310052 room 508, floor 5, building 4, No. 699, Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province
Applicant after: Alibaba (China) Co.,Ltd.
Address before: 510627 Guangdong city of Guangzhou province Whampoa Tianhe District Road No. 163 Xiping Yun Lu Yun Ping square B radio tower 13 layer self unit 01
Applicant before: GUANGZHOU SHENMA MOBILE INFORMATION TECHNOLOGY Co.,Ltd.
GR01 Patent grant