CN110096649B - Post extraction method, device, equipment and storage medium - Google Patents

Info

Publication number
CN110096649B
Authority
CN
China
Prior art keywords
post
popularity
target
text
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910401520.6A
Other languages
Chinese (zh)
Other versions
CN110096649A (en
Inventor
王非池
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Douyu Network Technology Co Ltd
Original Assignee
Wuhan Douyu Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Douyu Network Technology Co Ltd
Priority to CN201910401520.6A
Publication of CN110096649A
Application granted
Publication of CN110096649B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Abstract

The embodiment of the invention discloses a post extraction method, device, equipment and storage medium. The method comprises the following steps: acquiring information about a plurality of posts, wherein the post information comprises the body text and the reply text of each post; determining the target association degree between every two posts according to each body text and each reply text; determining an association graph corresponding to the posts according to the target association degrees, and determining the target popularity of each post according to the association graph and the target association degrees; and extracting target posts from the posts according to the target popularity. The technical scheme of the embodiment of the invention can improve the accuracy of popular-post extraction.

Description

Post extraction method, device, equipment and storage medium
Technical Field
The embodiment of the invention relates to information processing technology, in particular to a post extraction method, a device, equipment and a storage medium.
Background
With the continuous development of internet technology, online posts have become very popular. An online post is an article or opinion published by a user on a forum, and users discuss topics by publishing posts and replying to them. For example, users of a live broadcast platform often discuss live broadcast content in a post bar (a topic-based forum), so the subjects that users discuss day to day can be learned from the posts in the post bar, which allows public opinion control to be performed accurately.
In the prior art, popular posts are generally extracted based on the number of replies or clicks a post receives; for example, the posts with the most replies or clicks are taken as the popular posts. However, this extraction method is easily distorted by user behaviors such as spam posting, floor-grabbing (racing to claim a particular reply position) and lottery draws, and it ignores the content that the posts express, so the accuracy of popular-post extraction is greatly reduced.
Disclosure of Invention
The embodiment of the invention provides a post extraction method, a device, equipment and a storage medium, which are used for improving the accuracy of hot post extraction.
In a first aspect, an embodiment of the present invention provides a post extraction method, including:
acquiring information about a plurality of posts, wherein the post information comprises the body text and the reply text of each post;
determining the target association degree between every two posts according to each body text and each reply text;
determining an association graph corresponding to the posts according to the target association degrees, and determining the target popularity of each post according to the association graph and the target association degrees;
and extracting target posts from the posts according to the target popularity.
In a second aspect, an embodiment of the present invention further provides a post extraction apparatus, including:
a post information acquisition module, configured to acquire information about a plurality of posts, wherein the post information comprises the body text and the reply text of each post;
a target relevance determination module, configured to determine the target relevance between every two posts according to each body text and each reply text;
a target popularity determination module, configured to determine an association graph corresponding to the posts according to the target relevance values, and to determine the target popularity of each post according to the association graph and the target relevance values;
and a target post extraction module, configured to extract target posts from the posts according to the target popularity.
In a third aspect, an embodiment of the present invention further provides an apparatus, where the apparatus includes:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the post extraction method provided by any of the embodiments of the invention.
In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the post extraction method provided in any embodiment of the present invention.
According to the embodiments of the invention, the target relevance between any two posts is determined according to the body text and the reply text of each post, the association graph formed by all the posts is determined according to the target relevance values, and the target popularity of each post is determined based on the association graph and the target relevance values, so that popular target posts can be extracted from the posts according to their target popularity. Because target posts are extracted based on the text of the posts, interference from user behaviors such as spam posting, floor-grabbing and lottery draws can be avoided, the accuracy of popular-post extraction is greatly improved, and public opinion analysis becomes more accurate.
Drawings
FIG. 1 is a flowchart of a post extraction method according to a first embodiment of the present invention;
FIG. 2 is a flowchart of a post extraction method according to a second embodiment of the present invention;
FIG. 3 is a flowchart of a target popularity determination method according to a second embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a post extraction apparatus according to a third embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an apparatus according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of a post extraction method according to an embodiment of the present invention, which is applicable to the extraction of popular posts. The method can be executed by a post extraction device, which can be realized by software and/or hardware and is integrated in equipment with information processing function. The method specifically comprises the following steps:
S110, acquiring information about a plurality of posts, wherein the post information comprises the body text and the reply text of each post.
Here, a post may refer to an article published by a user in the post bar. The information for each post may include body text and reply text. The body text may contain the title and content of the post. The reply text may refer to each piece of text that users write in reply to the post, so that users can discuss by replying. Each post may have one or more replies.
Specifically, this embodiment may acquire the information of all posts in the post bar so as to extract the target posts based on the text of the posts.
S120, determining the target association degree between every two posts according to each body text and each reply text.
The target relevance may refer to the degree of textual similarity between two posts. The higher the target relevance between two posts, the more similar the topics that the two posts discuss.
Specifically, for each post, the body text and the reply text may be segmented with an existing word segmentation tool. The processing may include Chinese word segmentation, recognition of proper nouns, removal of meaningless stop words, and dictionary-based part-of-speech tagging, after which only nouns, verbs, adjectives and the like are retained. The resulting post keywords form a post keyword set, which may include the body-text keywords and the reply keywords.
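The keyword-set construction described above can be sketched roughly as follows. This is a simplified illustration, not the segmentation pipeline of the embodiment: it splits English text with a regular expression instead of using a Chinese word segmentation tool, it performs no proper-noun recognition or part-of-speech filtering, and the names `STOP_WORDS`, `keyword_set` and `post_keywords` are hypothetical.

```python
import re

# Hypothetical stop-word list; a real system would use a curated list.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "i", "her", "she"}

def keyword_set(text):
    """Split text into lowercase word tokens and drop stop words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return {t for t in tokens if t not in STOP_WORDS}

def post_keywords(body, replies):
    """Union of the body-text keywords and the keywords of every reply."""
    keywords = keyword_set(body)
    for reply in replies:
        keywords |= keyword_set(reply)
    return keywords
```

For Chinese posts, the regular-expression split would be replaced by a dedicated segmentation tool, and the stop-word filter would be followed by the part-of-speech filtering that the description mentions.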
Illustratively, the target relevance between any two posts can be determined according to the post keyword set corresponding to each post based on the following formula:
$$S(D_i,D_j)=\frac{|\{w_k \mid w_k\in D_i \,\&\, w_k\in D_j\}|}{\log|D_i|+\log|D_j|}$$
wherein $S(D_i,D_j)$ is the target association degree between post i and post j; $D_i$ is the post keyword set corresponding to post i; $D_j$ is the post keyword set corresponding to post j; $|\{w_k \mid w_k\in D_i \,\&\, w_k\in D_j\}|$ is the number of keywords $w_k$ that appear in both $D_i$ and $D_j$; $|D_i|$ is the total number of keywords in the post keyword set corresponding to post i; and $|D_j|$ is the total number of keywords in the post keyword set corresponding to post j. Note that taking the logarithm of the keyword totals $|D_i|$ and $|D_j|$ prevents an overly long post text from contributing a disproportionate number of keywords, which improves the accuracy of the calculated target association degree.
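As a rough sketch of this computation, the function below assumes that the association degree is the number of shared keywords divided by the sum of the logarithms of the two set sizes. That form is inferred from the symbol definitions above, and the base-10 logarithm is an assumption chosen because it reproduces the worked value of 2.2 later in the description. The function name and the handling of degenerate sets are illustrative choices.

```python
import math

def association_degree(d_i, d_j):
    """Shared-keyword count divided by log10|D_i| + log10|D_j| (assumed form)."""
    if not d_i or not d_j:
        return 0.0  # an empty keyword set shares nothing
    denom = math.log10(len(d_i)) + math.log10(len(d_j))
    if denom <= 0.0:
        return 0.0  # both sets hold a single keyword, so the denominator is zero
    return len(d_i & d_j) / denom
```

Returning 0.0 when the denominator vanishes is a design choice made here to keep the sketch total; the description does not say how such cases are treated.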
S130, determining an association graph corresponding to the posts according to the target association degrees, and determining the target popularity of each post according to the association graph and the target association degrees.
The association graph may refer to an undirected graph formed by all the posts based on the target association degrees. The target popularity may be used to reflect how popular the content of a post is.
Specifically, each post may be used as a vertex, and the vertices of any two posts whose target relevance is greater than a preset relevance are connected, which determines the association graph corresponding to the posts. Screening the large number of target relevance values against the preset relevance and connecting vertices only for the values that pass reduces computational complexity and improves computational efficiency. This embodiment may determine the associated posts of each post based on the association graph, and may determine the target popularity of a post according to the target relevance between the post and its associated posts.
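The thresholding step can be sketched as below. The data layout is an assumption made for illustration: pairwise relevance values are held in a dictionary keyed by `(i, j)` tuples, and the graph is returned as a mapping from each post to the set of posts it is connected to.

```python
def build_association_graph(post_ids, degrees, threshold):
    """Connect two posts when their relevance is greater than the threshold.

    post_ids:  iterable of post identifiers (the vertices)
    degrees:   dict mapping (i, j) pairs to the relevance between i and j
    threshold: the preset relevance used for screening
    """
    graph = {pid: set() for pid in post_ids}
    for (i, j), s in degrees.items():
        if s > threshold:
            graph[i].add(j)  # undirected edge, so record both directions
            graph[j].add(i)
    return graph
```

With this layout, the associated posts of a post are simply `graph[post]`, and a post that passes no threshold keeps an empty set.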
S140, extracting target posts from the posts according to the target popularity.
Wherein the target post may refer to a highly popular post.
Specifically, the posts may be sorted in descending order of target popularity, and a preset number of posts at the top of the sorted list may be taken as the target posts. Analyzing the extracted target posts reveals the trending topics that users are discussing, which facilitates public opinion control. Because this embodiment extracts posts based on their text, a post that generates substantial topic discussion can be extracted even if it has few clicks, so interference from user behaviors such as spam posting, floor-grabbing and lottery draws can be avoided, and the accuracy of popular-post extraction is greatly improved.
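A minimal sketch of this selection step, assuming the target popularity values are held in a dictionary keyed by post identifier (the function and parameter names are hypothetical):

```python
def extract_target_posts(popularity, top_n):
    """Return the top_n post identifiers in descending order of popularity."""
    ranked = sorted(popularity, key=popularity.get, reverse=True)
    return ranked[:top_n]
```

For example, `extract_target_posts({"A": 1.4, "B": 0.6, "C": 1.0}, 2)` yields `["A", "C"]`.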
According to the technical scheme of this embodiment, the target relevance between any two posts is determined according to the body text and the reply text of each post, the association graph formed by all the posts is determined according to the target relevance values, and the target popularity of each post is determined based on the association graph and the target relevance values, so that popular target posts can be extracted from the posts according to their target popularity. Because target posts are extracted according to the text of the posts, interference from user behaviors such as spam posting, floor-grabbing and lottery draws can be avoided, the accuracy of popular-post extraction is greatly improved, and public opinion analysis becomes more accurate.
On the basis of the above technical solution, S120 may include: performing word segmentation on each body text to determine a body-text keyword set, and performing word segmentation on each reply text to determine a reply keyword set; determining the body-text relevance between every two posts according to the body-text keyword set corresponding to each post; determining the reply relevance between every two posts according to the reply keyword set corresponding to each post; and determining the target relevance between every two posts according to the body-text relevance and the reply relevance.
The body-text keyword set may be the set of body-text keywords obtained by segmenting the body text. The reply keyword set may be the set of reply keywords obtained by segmenting the reply text. The body-text relevance may refer to the degree of similarity between the body texts of two posts. The reply relevance may refer to the degree of similarity between the reply texts of two posts.
Specifically, in this embodiment the body text and the reply text contribute differently to the relevance between posts, so they are segmented separately to determine the body-text keyword set and the reply keyword set corresponding to each post. The body-text relevance between two posts is determined according to their body-text keyword sets, and the reply relevance between two posts is determined according to their reply keyword sets. Based on the body-text relevance and the reply relevance, the target relevance between the two posts can be determined more accurately, so that it better reflects how similar the posts are: the higher the body-text relevance and the reply relevance, the higher the target relevance.
Illustratively, the degree of textual relevance between two posts may be determined according to the following formula:
$$S(T_i,T_j)=\frac{|\{w_k \mid w_k\in T_i \,\&\, w_k\in T_j\}|}{\log|T_i|+\log|T_j|}$$
wherein $S(T_i,T_j)$ is the body-text relevance between post i and post j; $T_i$ is the body-text keyword set corresponding to post i; $T_j$ is the body-text keyword set corresponding to post j; $|\{w_k \mid w_k\in T_i \,\&\, w_k\in T_j\}|$ is the number of keywords $w_k$ that appear in both $T_i$ and $T_j$; $|T_i|$ is the total number of keywords in the body-text keyword set corresponding to post i; and $|T_j|$ is the total number of keywords in the body-text keyword set corresponding to post j.
In this embodiment, determining the reply relevance between posts reduces the influence of behaviors such as clickbait titles and plagiarized posts, so that popular posts with more replies can be extracted more easily. Illustratively, the reply relevance between two posts may be determined according to the following formula:
$$S(R_i,R_j)=\frac{|\{w_k \mid w_k\in R_i \,\&\, w_k\in R_j\}|}{\log|R_i|+\log|R_j|}$$
wherein $S(R_i,R_j)$ is the reply relevance between post i and post j; $R_i$ is the reply keyword set corresponding to post i; $R_j$ is the reply keyword set corresponding to post j; $|\{w_k \mid w_k\in R_i \,\&\, w_k\in R_j\}|$ is the number of keywords $w_k$ that appear in both $R_i$ and $R_j$; $|R_i|$ is the total number of keywords in the reply keyword set corresponding to post i; and $|R_j|$ is the total number of keywords in the reply keyword set corresponding to post j.
Illustratively, the target relevance between two posts may be determined according to the following formula:
$$S(i,j)=r_T\times S(T_i,T_j)+r_R\times S(R_i,R_j)$$
wherein $S(i,j)$ is the target relevance between post i and post j; $S(T_i,T_j)$ is the body-text relevance between post i and post j; $S(R_i,R_j)$ is the reply relevance between post i and post j; $r_T$ is a preset body-text weight; and $r_R$ is a preset reply weight. The preset body-text weight $r_T$ and the preset reply weight $r_R$ can be set in advance based on the service scenario and requirements. The preset body-text weight $r_T$ may be greater than or equal to the preset reply weight $r_R$. Taking the weighted sum of the body-text relevance and the reply relevance allows the target relevance of the posts to be determined more accurately.
Illustratively, the post information corresponding to post a is:
text: zhang Sanhao and Hao, I like her, hope that her live broadcast is better and better.
And (3) replying text: i also feel wonder and support building owner.
The post information corresponding to the post B is as follows:
text: zhang three of pigeon today? None of this points see her live.
And (3) replying text: the homeowner does not play today and participates in the event.
Respectively performing word segmentation processing on the post information of the post A and the post B:
text keywords corresponding to post a: three-leaf, beautiful, favorite, hope, direct broadcast, better and better
Post A corresponding reply keyword: feeling, support and building owner
Text keywords corresponding to post B: zhang San, today, Pigeon, watch her, live broadcast
And replying keywords corresponding to the post B: the building owner, today, does not broadcast, attends to, moves
Calculate the target relevance between post A and post B as follows:
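$$S(T_A,T_B)=\frac{2}{\log 6+\log 5}\approx 1.35,\qquad S(R_A,R_B)=\frac{1}{\log 3+\log 5}\approx 0.85$$
$$S(A,B)=S(T_A,T_B)+S(R_A,R_B)\approx 2.2$$
Here the logarithms are taken to base 10, and both preset weights are 1.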
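The worked example for post A and post B can be checked with a short sketch. It assumes that each relevance is the shared-keyword count divided by the sum of the base-10 logarithms of the two set sizes, and that both preset weights equal 1; these assumptions are inferred from the symbol definitions and from the stated result of 2.2. The English keyword glosses and the function names are illustrative.

```python
import math

def keyword_similarity(a, b):
    """Shared keywords divided by log10|a| + log10|b| (assumed form)."""
    if not a or not b:
        return 0.0
    denom = math.log10(len(a)) + math.log10(len(b))
    return len(a & b) / denom if denom > 0 else 0.0

def target_relevance(body_i, body_j, reply_i, reply_j, r_t=1.0, r_r=1.0):
    """Weighted sum r_T * S(T_i, T_j) + r_R * S(R_i, R_j)."""
    return (r_t * keyword_similarity(body_i, body_j)
            + r_r * keyword_similarity(reply_i, reply_j))

body_a = {"Zhang San", "beautiful", "like", "hope", "live broadcast", "better and better"}
reply_a = {"think", "support", "original poster"}
body_b = {"Zhang San", "today", "pigeon", "see", "live broadcast"}
reply_b = {"original poster", "today", "not streaming", "attend", "event"}

score = target_relevance(body_a, body_b, reply_a, reply_b)
print(round(score, 1))  # 2.2
```

The body texts share two keywords out of sets of six and five, and the replies share one keyword out of sets of three and five, which gives roughly 1.35 + 0.85.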
Example two
Fig. 2 is a flowchart of a post extraction method according to a second embodiment of the present invention, and in this embodiment, based on the above embodiments, the "target popularity determined according to the association graph and the target association degree" is optimized. Wherein explanations of the same or corresponding terms as those of the above-described embodiments are omitted.
Referring to fig. 2, the method for extracting a post provided in this embodiment specifically includes the following steps:
S210, acquiring information about a plurality of posts, wherein the post information comprises the body text and the reply text of each post.
S220, determining the target association degree between every two posts according to each body text and each reply text.
S230, determining an association graph corresponding to the posts according to the target association degrees.
S240, determining the associated posts corresponding to each post according to the association graph, and acquiring the initial popularity of each post.
The associated post corresponding to the post may refer to another post connected with the post in the associated graph. The initial popularity of the post may refer to an initial value set in advance for the popularity of the post. For example, the initial popularity of each post may be set to 1, so that the initial popularity of each post is the same, in order to improve the accuracy of the calculation of the target popularity of the post.
Specifically, based on whether a connection line exists between any two posts in the association graph, all the associated posts corresponding to each post can be determined. For example, if vertices corresponding to post a and post B are connected in the association graph, it may be determined that post a and post B are associated posts. The initial popularity of each post is set, so that the subsequent popularity iteration process can be facilitated.
S250, iterating the popularity of each post according to the associated posts, the initial popularity and the target relevance corresponding to each post, and determining the target popularity of each post according to the iteration result.
Specifically, because a topic may be discussed across multiple posts, a topic that many posts refer to is a hot topic. Likewise, when a post contains a hot topic, the posts associated with it may also contain that topic, so the target popularity of each post can be obtained iteratively from the target relevance among posts, which further improves the accuracy of post extraction. That is, in this embodiment the popularity of each post is obtained by popularity transfer from its associated posts, and the more popular posts are associated with the current post, the higher the popularity of the current post.
Illustratively, as shown in FIG. 3, S250 may determine the target popularity for each post by the following operations of steps S251-S255:
S251, taking the initial popularity corresponding to each post as the first popularity.
Specifically, at the first popularity iteration, the initial popularity may be taken as the first popularity corresponding to each post.
S252, determining the second popularity corresponding to each post after the current iteration according to the first associated posts corresponding to each post, the second associated posts associated with each first associated post, the first popularity corresponding to each first associated post, and the target relevance.
Each post may be regarded in turn as the current post; accordingly, a first associated post may refer to a post that is connected to the current post in the association graph. A second associated post may refer to a post that is connected to a first associated post. The first popularity may refer to the popularity of a post before the current iteration. The second popularity may refer to the popularity of a post after the current iteration.
For example, for each post, the second popularity of post i may be determined according to the following formula:
$$WD(i)'=(1-\varepsilon)+\varepsilon\times\sum_{j\in In(i)}\frac{S(i,j)}{\sum_{k\in In(j)}S(j,k)}\times WD(j)$$
wherein $WD(i)'$ is the second popularity corresponding to post i; $In(i)$ is the set of first associated posts associated with post i; j is one first associated post in the set $In(i)$; $In(j)$ is the set of second associated posts associated with post j; k is one second associated post in the set $In(j)$; $S(i,j)$ is the target relevance between post i and post j; $S(j,k)$ is the target relevance between post j and post k; $WD(j)$ is the first popularity corresponding to post j; and $\varepsilon$ is a preset damping coefficient.
In particular, the second popularity for each post may need to be recalculated at each iteration. For example, in the first iteration, each post may be used as a current post one by one, and a second popularity corresponding to the current post after the current iteration is calculated by using an initial popularity of a first associated post corresponding to the current post, a target relevance between the current post and the first associated post, and a target relevance between a second associated post and the first associated post. It should be noted that, in the embodiment, by setting the preset damping coefficient ∈, it may be ensured that each post has a certain popularity, and a situation that the popularity of the post is zero due to a lack of the associated post is prevented, so that popularity transfer of the post is optimized, and accuracy of popularity determination is improved. Illustratively, the preset damping coefficient ε may be set to 0.8.
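One iteration of this update can be sketched as follows. The sketch assumes a damped, weighted-transfer form in which the second popularity of post i equals (1 - ε) plus ε times the sum, over the first associated posts j, of S(i, j) divided by the total relevance of j to its own associated posts, multiplied by the first popularity of j. That form is inferred from the symbol definitions above rather than quoted from the formula image, and the data layout and function name are hypothetical.

```python
def update_popularity(graph, degrees, popularity, damping=0.8):
    """Compute the second popularity of every post from the first popularity.

    graph:      dict mapping each post to the set of its associated posts
    degrees:    dict mapping frozenset({i, j}) to the target relevance S(i, j)
    popularity: dict mapping each post to its first popularity WD(j)
    damping:    the preset damping coefficient (0.8 in the description)
    """
    def s(a, b):
        return degrees[frozenset((a, b))]

    updated = {}
    for i, neighbours in graph.items():
        transferred = 0.0
        for j in neighbours:
            total_j = sum(s(j, k) for k in graph[j])  # relevance of j to all its associated posts
            if total_j > 0:
                transferred += s(i, j) / total_j * popularity[j]
        updated[i] = (1 - damping) + damping * transferred
    return updated
```

A post with no associated posts receives 1 - ε under this form, which matches the remark that the damping coefficient keeps the popularity of such a post from dropping to zero.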
S253, detecting whether each second popularity meets a preset iteration stop condition; if yes, proceeding to step S254; if not, proceeding to step S255.
The preset iteration stop condition can be set in advance and represents the condition under which iteration stops. For example, the preset iteration stop condition may be that the iteration result after the current iteration has converged, that is, the second popularity after the current iteration has stabilized; it may also be that a preset number of iterations has been reached.
Specifically, the second popularity of each post after the current iteration may be compared with the corresponding popularity after the previous iteration. If every difference is smaller than a preset error, the iteration result has converged and step S254 may be performed; otherwise step S255 is performed to carry out the next iteration. This embodiment may also detect whether the current iteration count equals the preset number of iterations; if so, step S254 may be performed, otherwise step S255 is performed to carry out the next iteration.
It should be noted that, if the convergence rate of the iteration result is relatively slow, the convergence rate may be increased by reducing the preset damping coefficient epsilon, so as to further increase the extraction rate.
S254, determining the second popularity corresponding to each post after the current iteration as the target popularity.
Specifically, when the result of the current iteration satisfies the preset iteration stop condition, the iteration may stop, and the second popularity corresponding to each post after the current iteration may be determined as the corresponding target popularity.
S255, updating the first popularity corresponding to each post to the second popularity obtained after the current iteration, and returning to perform the operation of S252.
Specifically, when the result of the current iteration does not satisfy the preset iteration stop condition, the next iteration may be performed by updating the first popularity corresponding to each post to the corresponding second popularity obtained after the current iteration, so that the operations of S252-S255 are performed again with the updated first popularity.
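The control flow of steps S251 to S255 can be sketched as a generic loop. The update rule is passed in as a callable `step`, so the sketch makes no claim about the exact popularity formula; the names `iterate_until_stable`, `tolerance` and `max_iterations` are hypothetical stand-ins for the preset error and the preset number of iterations.

```python
def iterate_until_stable(initial, step, tolerance=1e-4, max_iterations=100):
    """Repeat `step` until every value changes by less than `tolerance`
    or `max_iterations` is reached; return (popularity, iterations_run)."""
    first = dict(initial)          # S251: start from the initial popularity
    second = first
    for iteration in range(1, max_iterations + 1):
        second = step(first)       # S252: second popularity after this iteration
        converged = all(abs(second[p] - first[p]) < tolerance for p in first)
        if converged:              # S253 -> S254: stop condition met
            return second, iteration
        first = second             # S255: second popularity becomes the first
    return second, max_iterations  # stopped by the iteration limit
```

As a toy check, a step such as `lambda pop: {p: 0.5 * v + 1 for p, v in pop.items()}` has the fixed point 2, and the loop stops once successive values differ by less than the tolerance.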
According to the technical scheme of the embodiment, the popularity corresponding to each post is iterated according to the associated post, the initial popularity and the target relevance corresponding to each post, and the target popularity corresponding to each post is determined according to the iteration result, so that the accuracy of extracting the posts is further improved.
On the basis of the above technical solution, S250 may further include: obtaining posting user information corresponding to each post; and iterating the popularity degree corresponding to each post according to the associated post, the initial popularity degree, the target relevance degree and the posting user information corresponding to each post, and determining the target popularity degree corresponding to each post according to the iteration result.
The posting user information may refer to identity information of the posting user (i.e., the original poster), such as the posting user's rank. Illustratively, posting users may be ranked from 1 to 100, with a larger value indicating a higher rank, i.e., higher user activity.
Specifically, in this embodiment the popularity of each post may be iterated based on the posting user information. This addresses a problem of the prior art, which ignores user information and is therefore affected by low-quality filler posts published by machine users (i.e., non-real users), and it further improves the accuracy of post extraction.
Illustratively, "iterating the popularity of each post according to the associated post, the initial popularity, the target relevance and the posting user information corresponding to each post, and determining the target popularity of each post according to the iteration result" may include: taking the initial popularity corresponding to each post as a first popularity; determining a second popularity degree corresponding to each post after the current iteration according to a first associated post corresponding to each post, a second associated post associated with the first associated post, a first popularity degree corresponding to the first associated post, a target relevance degree and posting user information; if the second popularity does not meet the preset iteration stop condition, updating the first popularity corresponding to each post to be the second popularity obtained after the current iteration, and returning to execute the operation of determining the second popularity corresponding to each post after the current iteration according to the first associated post corresponding to each post, the second associated post associated with the first associated post, the first popularity corresponding to the first associated post, the target relevancy and the posting user information; and if the second popularity meets the preset iteration stop condition, determining the second popularity corresponding to each post after the current iteration as the target popularity.
Specifically, this embodiment may iterate based on the posting user information using an iteration process similar to the one described above. In each popularity iteration, posts published by higher-ranked users are more likely to trigger topic discussion; that is, posts published by higher-ranked users are often popular posts. Popularity is therefore transferred more readily among posts by higher-ranked users, and popular posts published by higher-ranked users can be extracted more easily.
For example, the second popularity for each post may be determined according to the following formula:
Figure GDA0002093125290000131
where WD(i)' is the second popularity corresponding to post i; In(i) is the first associated post set, that is, the set of posts associated with post i; j is one first associated post in the set In(i); In(j) is the second associated post set, that is, the set of posts associated with post j; k is one second associated post in the set In(j); S(i, j) is the target relevance between post i and post j; S(j, k) is the target relevance between post j and post k; L_i is the posting user rank value corresponding to post i; L_j is the posting user rank value corresponding to post j; L_k is the posting user rank value corresponding to post k; WD(j) is the first popularity corresponding to post j; ε is a preset damping coefficient.
Illustratively, it can be seen from the above formula for the second popularity corresponding to post i that, assuming the posting user rank value L_j corresponding to the first associated post j and the posting user rank value L_k corresponding to the second associated post k remain unchanged, the larger the posting user rank value L_i corresponding to post i, the higher the second popularity corresponding to post i. In other words, posts published by users with a higher posting user rank are more likely to trigger topic discussion, so hot posts can be determined more accurately on the basis of the posting user information, and the influence of low-content filler ("water") posts is avoided.
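Illustratively, the iteration described above may be sketched as follows. The formula itself is given only as a figure and is not reproduced in this text, so the update rule used here is an assumption: a TextRank-style form built from the defined symbols, WD(i)' = (1 - ε) + ε · Σ_{j ∈ In(i)} [S(i, j) · L_i / Σ_{k ∈ In(j)} S(j, k) · L_k] · WD(j), chosen only because it grows with L_i as stated above. The function name, the default damping coefficient, and the stop condition (largest change below a tolerance, or an iteration cap) are likewise hypothetical.

```python
from typing import Dict, Set, Tuple


def iterate_popularity(
    neighbors: Dict[str, Set[str]],            # In(i): posts associated with post i
    relevance: Dict[Tuple[str, str], float],   # S(i, j), stored once per pair
    rank: Dict[str, float],                    # L_i: posting user rank value
    initial: Dict[str, float],                 # initial popularity of each post
    damping: float = 0.85,                     # epsilon (assumed default)
    tol: float = 1e-6,                         # assumed stop threshold
    max_iter: int = 200,                       # assumed iteration cap
) -> Dict[str, float]:
    """Iterate popularity until the assumed stop condition is met."""
    if not neighbors:
        return {}

    def s(a: str, b: str) -> float:
        # Relevance is treated as symmetric; look the pair up in either order.
        return relevance.get((a, b), relevance.get((b, a), 0.0))

    first = dict(initial)  # first popularity = initial popularity
    for _ in range(max_iter):
        second = {}
        for i in neighbors:
            total = 0.0
            for j in neighbors[i]:
                # Normalizer over the second associated posts k of post j.
                norm = sum(s(j, k) * rank[k] for k in neighbors[j])
                if norm > 0:
                    total += s(i, j) * rank[i] / norm * first[j]
            second[i] = (1 - damping) + damping * total
        # Assumed stop condition: the largest change falls below tol.
        if max(abs(second[i] - first[i]) for i in neighbors) < tol:
            return second
        first = second  # second popularity becomes the next first popularity
    return first
```

For a hub post linked to two posts whose posting users have rank values 3 and 1, this sketch assigns the higher popularity to the post of the higher-ranked user, which matches the behavior described above.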
The following is an embodiment of the post extraction device provided in the embodiments of the present invention, which belongs to the same inventive concept as the post extraction methods of the above embodiments, and reference may be made to the above embodiments of the post extraction method for details that are not described in detail in the embodiments of the post extraction device.
Example three
Fig. 4 is a schematic structural diagram of a post extraction device according to a third embodiment of the present invention. This embodiment is applicable to extracting popular posts. The device may include: a post information acquisition module 310, a target relevance determination module 320, a target popularity determination module 330 and a target post extraction module 340.
The post information acquisition module 310 is configured to acquire a plurality of pieces of post information, where the post information includes the body text and the reply text of a post; the target relevance determination module 320 is configured to determine the target relevance between every two posts according to each body text and each reply text; the target popularity determination module 330 is configured to determine an association graph corresponding to each post according to each target relevance, and to determine the target popularity corresponding to each post according to the association graph and the target relevance; and the target post extraction module 340 is configured to extract target posts from the posts according to the target popularity.
Optionally, the target relevance determination module 320 is specifically configured to: perform word segmentation on each body text to determine a body text keyword set, and perform word segmentation on each reply text to determine a reply keyword set; determine the text relevance between every two posts according to the body text keyword set corresponding to each post; determine the reply relevance between every two posts according to the reply keyword set corresponding to each post; and determine the target relevance between every two posts according to the text relevance and the reply relevance.
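Illustratively, the cooperation of the four modules may be sketched as follows. This is a minimal sketch under stated assumptions: the class name `PostExtractionDevice`, the `Post` record and the callable signatures are hypothetical and are not taken from the embodiment; each module is modeled as a pluggable function so that only the data flow between modules 310 to 340 is shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Post:
    post_id: str
    body: str                  # body text of the post
    replies: Tuple[str, ...]   # reply texts under the post


Relevance = Dict[Tuple[str, str], float]  # (post_id, post_id) -> target relevance
Popularity = Dict[str, float]             # post_id -> target popularity


@dataclass
class PostExtractionDevice:
    # Each field stands in for one module of the device (hypothetical signatures).
    acquire: Callable[[], List[Post]]                                    # module 310
    determine_relevance: Callable[[List[Post]], Relevance]               # module 320
    determine_popularity: Callable[[List[Post], Relevance], Popularity]  # module 330
    extract: Callable[[List[Post], Popularity], List[Post]]              # module 340

    def run(self) -> List[Post]:
        posts = self.acquire()
        relevance = self.determine_relevance(posts)
        popularity = self.determine_popularity(posts, relevance)
        return self.extract(posts, popularity)


def demo_device() -> PostExtractionDevice:
    """Wire the device with trivial stand-in modules, for illustration only."""
    posts = [Post("a", "x", ("r1",)), Post("b", "y", ())]
    return PostExtractionDevice(
        acquire=lambda: posts,
        determine_relevance=lambda ps: {("a", "b"): 1.0},
        # Stand-in popularity: the number of replies, not the iterative method.
        determine_popularity=lambda ps, rel: {p.post_id: float(len(p.replies)) for p in ps},
        # Stand-in extraction: keep posts above an arbitrary threshold.
        extract=lambda ps, pop: [p for p in ps if pop[p.post_id] > 0.5],
    )
```

Modeling the modules as injected functions keeps the sketch independent of how each step is actually computed; the stand-ins in `demo_device` are placeholders and do not implement the relevance or popularity calculations of the embodiments.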
Optionally, the text relevancy between two posts is determined according to the following formula:
Figure GDA0002093125290000151
where S(T_i, T_j) is the text relevance between post i and post j; T_i is the body text keyword set corresponding to post i; T_j is the body text keyword set corresponding to post j; |{w_k | w_k ∈ T_i & w_k ∈ T_j}| is the number of keywords w_k that occur in both T_i and T_j; |T_i| is the total number of keywords in the body text keyword set corresponding to post i; and |T_j| is the total number of keywords in the body text keyword set corresponding to post j.
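Illustratively, the three quantities defined above can be computed from two keyword sets as sketched below. The formula that combines them is given only as a figure and is not reproduced in this text, so the normalization in `text_relevance` (shared keywords divided by the sum of the two set sizes) is an assumption used purely as a stand-in; both function names are hypothetical.

```python
from typing import Iterable, Tuple


def overlap_counts(t_i: Iterable[str], t_j: Iterable[str]) -> Tuple[int, int, int]:
    """Return (|{w_k in T_i and T_j}|, |T_i|, |T_j|) for two keyword collections."""
    set_i, set_j = set(t_i), set(t_j)
    return len(set_i & set_j), len(set_i), len(set_j)


def text_relevance(t_i: Iterable[str], t_j: Iterable[str]) -> float:
    """Assumed stand-in normalization: shared / (|T_i| + |T_j|)."""
    shared, size_i, size_j = overlap_counts(t_i, t_j)
    denom = size_i + size_j
    # Two empty keyword sets are treated as unrelated.
    return shared / denom if denom else 0.0
```

For example, the keyword sets {game, stream, rank} and {stream, rank, gift, host} share two keywords and have sizes three and four, so the stand-in relevance is 2/7.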
Optionally, the target popularity determination module 330 includes:
the associated post determining unit is used for determining the associated post corresponding to each post according to the association graph and acquiring the initial popularity corresponding to each post;
and the target popularity determining unit is used for iterating the popularity corresponding to each post according to the associated post corresponding to each post, the initial popularity and the target relevance, and determining the target popularity corresponding to each post according to the iteration result.
Optionally, the target popularity determination unit includes:
the posting user information acquiring subunit is used for acquiring posting user information corresponding to each post;
and the target popularity determining subunit is used for iterating the popularity corresponding to each post according to the associated post corresponding to each post, the initial popularity, the target relevance and the posting user information, and determining the target popularity corresponding to each post according to the iteration result.
Optionally, the target popularity determination subunit is specifically configured to: take the initial popularity corresponding to each post as a first popularity; determine a second popularity corresponding to each post after the current iteration according to a first associated post corresponding to each post, a second associated post associated with the first associated post, the first popularity corresponding to the first associated post, the target relevance and the posting user information; if the second popularity does not meet a preset iteration stop condition, update the first popularity corresponding to each post to the second popularity obtained after the current iteration, and return to the operation of determining the second popularity corresponding to each post after the current iteration according to the first associated post corresponding to each post, the second associated post associated with the first associated post, the first popularity corresponding to the first associated post, the target relevance and the posting user information; and if the second popularity meets the preset iteration stop condition, determine the second popularity corresponding to each post after the current iteration as the target popularity.
Optionally, determining a second popularity for each post according to the following formula:
Figure GDA0002093125290000161
where WD(i)' is the second popularity corresponding to post i; In(i) is the first associated post set, that is, the set of posts associated with post i; j is one first associated post in the set In(i); In(j) is the second associated post set, that is, the set of posts associated with post j; k is one second associated post in the set In(j); S(i, j) is the target relevance between post i and post j; S(j, k) is the target relevance between post j and post k; L_i is the posting user rank value corresponding to post i; L_j is the posting user rank value corresponding to post j; L_k is the posting user rank value corresponding to post k; WD(j) is the first popularity corresponding to post j; ε is a preset damping coefficient.
The post extraction device provided by the embodiment of the invention can execute the post extraction method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of executing the post extraction method.
Example four
Fig. 5 is a schematic structural diagram of an apparatus according to a fourth embodiment of the present invention. Referring to fig. 5, the apparatus includes:
one or more processors 410;
a memory 420 for storing one or more programs;
the one or more programs, when executed by the one or more processors 410, cause the one or more processors 410 to implement the post extraction method provided in any of the above embodiments, the method comprising:
acquiring a plurality of pieces of post information, wherein the post information comprises the body text and the reply text of a post;
determining the target relevance between every two posts according to each body text and each reply text;
determining an association graph corresponding to each post according to each target relevance, and determining the target popularity of each post according to the association graph and the target relevance;
and extracting target posts from the posts according to the target popularity.
In FIG. 5, one processor 410 is taken as an example; the processor 410 and the memory 420 in the device may be connected by a bus or in another manner, and connection by a bus is taken as an example in FIG. 5.
The memory 420 may be used as a computer-readable storage medium for storing software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the post extraction method in the embodiments of the present invention (e.g., the post information acquisition module 310, the target relevance determination module 320, the target popularity determination module 330, and the target post extraction module 340 in the post extraction device). The processor 410 executes various functional applications and data processing of the device by executing software programs, instructions and modules stored in the memory 420, that is, implements the post extraction method described above.
The memory 420 mainly includes a program storage area and a data storage area, wherein the program storage area can store an operating system and an application program required by at least one function, and the data storage area can store data created according to the use of the device, and the like. Further, the memory 420 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the memory 420 may further include memory located remotely from the processor 410, which may be connected to the device through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The device proposed by the embodiment belongs to the same inventive concept as the post extraction method proposed by the above embodiment, and the technical details not described in detail in the embodiment can be referred to the above embodiment, and the embodiment has the same beneficial effects as the post extraction method.
Example five
The fifth embodiment provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the post extraction method according to any embodiment of the present invention, the method including:
acquiring a plurality of pieces of post information, wherein the post information comprises the body text and the reply text of a post;
determining the target relevance between every two posts according to each body text and each reply text;
determining an association graph corresponding to each post according to each target relevance, and determining the target popularity of each post according to the association graph and the target relevance;
and extracting target posts from the posts according to the target popularity.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer-readable storage medium may be, for example but not limited to: an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, C++, or the like, as well as conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It will be understood by those skilled in the art that the modules or steps of the invention described above may be implemented by a general-purpose computing device. They may be centralized on a single computing device or distributed across a network of computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they may each be fabricated as separate integrated circuit modules, or several of the modules or steps may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments illustrated herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (9)

1. A post extraction method, comprising:
acquiring a plurality of pieces of post information, wherein the post information comprises the body text and the reply text of a post;
determining the target relevance between every two posts according to each body text and each reply text;
determining an association graph corresponding to each post according to each target relevance, and determining a target popularity of each post according to the association graph and the target relevance;
extracting target posts from the posts according to the target popularity;
wherein the determining the target popularity of each post according to the association graph and the target relevance comprises:
determining an associated post corresponding to each post according to the association graph, and acquiring an initial popularity of each post;
and iterating the popularity of each post according to the associated post, the initial popularity and the target relevance corresponding to each post, and determining the target popularity of each post according to an iteration result.
2. The method of claim 1, wherein determining the target relevance between every two posts according to each body text and each reply text comprises:
performing word segmentation on each body text to determine a body text keyword set, and performing word segmentation on each reply text to determine a reply keyword set;
determining the text relevance between every two posts according to the body text keyword set corresponding to each post;
determining the reply relevance between every two posts according to the reply keyword set corresponding to each post;
and determining the target relevance between every two posts according to the text relevance and the reply relevance.
3. The method of claim 2, wherein the text relevance between two posts is determined according to the following formula:
Figure FDA0003020681880000021
where S(T_i, T_j) is the text relevance between post i and post j; T_i is the body text keyword set corresponding to post i; T_j is the body text keyword set corresponding to post j; |{w_k | w_k ∈ T_i & w_k ∈ T_j}| is the number of keywords w_k that occur in both T_i and T_j; |T_i| is the total number of keywords in the body text keyword set corresponding to post i; and |T_j| is the total number of keywords in the body text keyword set corresponding to post j.
4. The method of claim 1, wherein iterating the popularity of each post according to the associated post, the initial popularity and the target relevance corresponding to each post, and determining the target popularity of each post according to the iteration result comprises:
obtaining posting user information corresponding to each post;
and iterating the popularity of each post according to the associated post, the initial popularity, the target relevance and the posting user information corresponding to each post, and determining the target popularity of each post according to an iteration result.
5. The method of claim 4, wherein iterating the popularity of each post according to the associated post, the initial popularity, the target relevance and the posting user information corresponding to each post, and determining the target popularity of each post according to the iteration result comprises:
taking the initial popularity corresponding to each post as a first popularity;
determining a second popularity corresponding to each post after the current iteration according to a first associated post corresponding to each post, a second associated post associated with the first associated post, the first popularity corresponding to the first associated post, the target relevance and the posting user information;
if the second popularity does not meet the preset iteration stop condition, updating the first popularity corresponding to each post to be the second popularity obtained after the current iteration, and returning to execute the operation of determining the second popularity corresponding to each post after the current iteration according to the first associated post corresponding to each post, the second associated post associated with the first associated post, the first popularity corresponding to the first associated post, the target relevance and the posting user information;
and if the second popularity meets the preset iteration stop condition, determining the second popularity corresponding to each post after the current iteration as the target popularity.
6. The method of claim 5, wherein the second popularity for each post is determined according to the following formula:
Figure FDA0003020681880000031
where WD(i)' is the second popularity corresponding to post i; In(i) is the first associated post set, that is, the set of posts associated with post i; j is one first associated post in the set In(i); In(j) is the second associated post set, that is, the set of posts associated with post j; k is one second associated post in the set In(j); S(i, j) is the target relevance between post i and post j; S(j, k) is the target relevance between post j and post k; L_i is the posting user rank value corresponding to post i; L_j is the posting user rank value corresponding to post j; L_k is the posting user rank value corresponding to post k; WD(j) is the first popularity corresponding to post j; ε is a preset damping coefficient.
7. A post extraction device, comprising:
the post information acquisition module is used for acquiring a plurality of pieces of post information, wherein the post information comprises the body text and the reply text of a post;
the target relevance determining module is used for determining the target relevance between every two posts according to each body text and each reply text;
the target popularity determination module is used for determining an association graph corresponding to each post according to each target relevance and determining the target popularity of each post according to the association graph and the target relevance;
the target post extraction module is used for extracting target posts from the posts according to the target popularity;
the target popularity determination module includes:
the associated post determining unit is used for determining an associated post corresponding to each post according to the association graph and acquiring the initial popularity corresponding to each post;
and the target popularity determining unit is used for iterating the popularity corresponding to each post according to the associated post corresponding to each post, the initial popularity and the target relevance, and determining the target popularity corresponding to each post according to the iteration result.
8. An electronic device, characterized in that the device comprises:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the post extraction method as recited in any of claims 1-6.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the post extraction method according to any one of claims 1 to 6.
CN201910401520.6A 2019-05-14 2019-05-14 Post extraction method, device, equipment and storage medium Active CN110096649B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910401520.6A CN110096649B (en) 2019-05-14 2019-05-14 Post extraction method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910401520.6A CN110096649B (en) 2019-05-14 2019-05-14 Post extraction method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110096649A CN110096649A (en) 2019-08-06
CN110096649B true CN110096649B (en) 2021-07-30

Family

ID=67448105

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910401520.6A Active CN110096649B (en) 2019-05-14 2019-05-14 Post extraction method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110096649B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant