CN113254503B

CN113254503B - Content mining method and device and related products

Info

Publication number: CN113254503B
Application number: CN202110636994.6A
Authority: CN
Inventors: 苑鹏程; 贺文嵩; 顾晓光
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2021-06-08
Filing date: 2021-06-08
Publication date: 2021-11-02
Anticipated expiration: 2041-06-08
Also published as: CN113254503A

Abstract

The embodiments of the present application disclose a content mining method, a content mining device and a related product, relating to the technical field of computers. The method includes: acquiring a seed content set, wherein a plurality of contents in the seed content set have a preset content attribute; determining target users from a candidate user set associated with the seed content set according to the interaction behavior data of the candidate user set on the seed content set; and determining target mining content based on the interaction behavior data of the target users on a candidate content set. The method does not need to build a complex graph structure, so content can be mined frequently and quickly even when the mined content has strict timeliness requirements. This reduces the consumption of computing resources during mining, effectively avoids introducing noise, and improves the purity of the mined content.

Description

Content mining method and device and related products

Technical Field

The present application relates to the field of computer technologies, and in particular, to a content mining method and apparatus, and a related product.

Background

With the rapid development of the internet, online content has become increasingly abundant, and users can easily obtain various types of content, such as videos, audio, pictures or documents, through computer devices. A user can act as both a consumer and a producer of content. At present, the volume of content on many internet platforms has exploded. In news scenarios, for example, up to ten million pictures and videos of news content may be produced every day; on video websites, amateur users and professional teams may produce up to millions of videos each day. Such massive content data poses great operational challenges: a platform operator needs, on the one hand, to discriminate among the content produced by content producers and, on the other hand, to provide personalized content services so that content likely to interest each user is accurately delivered to that user. Because manual content screening is costly and inefficient, a technology that performs content mining automatically is urgently needed to help platform operators cope with massive content data.

The prior art provides a graph-based data mining method in which users and multi-source data are jointly modeled and represented as features, and the feature representations are clustered to obtain classification information. Distances between data items are then analyzed: the closer two items are, the more strongly they are associated, and data strongly associated with the initially defined seed data is mined. However, in scenarios with a huge number of users, constructing a complete graph structure consumes a great deal of computing resources, and because mined content is time-sensitive, a global graph structure must be rebuilt for every mining run, which makes the resource consumption even larger. In addition, some content mining methods involve many intermediate steps that introduce considerable noise, so the mined content data is of low quality and less usable in personalized services. In view of the above, how to perform content mining with fewer computing resources while ensuring the quality of the mined content has become a technical problem to be solved in this field.

Disclosure of Invention

The embodiments of the present application provide a content mining method, a content mining device and a related product, so as to reduce resource consumption during content mining and improve the quality of the mined content.

In view of the above, a first aspect of the present application provides a content mining method, including:

acquiring a seed content set, wherein the plurality of contents included in the seed content set all have a preset content attribute;

determining target users from a candidate user set associated with the seed content set, according to the interaction behavior data of the candidate user set on the seed content set; and

determining target mining content based on the interaction behavior data of the target users on a candidate content set, wherein the target mining content comprises one or more contents in the candidate content set.

A second aspect of the present application provides a content mining apparatus, including:

a seed acquisition unit, configured to acquire a seed content set, wherein the plurality of contents included in the seed content set all have a preset content attribute;

a user determining unit, configured to determine target users from a candidate user set associated with the seed content set, according to the interaction behavior data of the candidate user set on the seed content set; and

a content determining unit, configured to determine target mining content based on the interaction behavior data of the target users on a candidate content set, wherein the target mining content comprises one or more contents in the candidate content set.

A third aspect of the application provides a computer device comprising a processor and a memory:

the memory is configured to store program code and transmit the program code to the processor;

the processor is configured to perform, according to instructions in the program code, the steps of the content mining method according to the first aspect.

A fourth aspect of the present application provides a computer-readable storage medium for storing program code for performing the content mining method of the first aspect described above.

According to the technical scheme, the embodiment of the application has the following advantages:

The content mining method, content mining device and related product provided in the embodiments of the present application implement content mining based on bipartite graph propagation. In the content mining method, a seed content set is first obtained, in which the included contents all have a preset content attribute; next, target users are determined from a candidate user set associated with the seed content set according to the interaction behavior data of the candidate user set on the seed content set; then, target mining content, comprising one or more contents in a candidate content set, is determined based on the interaction behavior data of the target users on the candidate content set. The method does not need to build a complex graph structure, so content can be mined frequently and quickly even when the mined content has strict timeliness requirements, which reduces the consumption of computing resources during mining. In addition, because the method mines content by propagation over a bipartite graph without introducing any intermediate stage beyond the bipartite graph, the introduction of noise is effectively avoided; compared with the prior art, the purity of the target mining content is improved and the quality of the mined content is ensured.

Drawings

FIG. 1 is a flowchart of a content mining method according to an embodiment of the present application;

FIG. 2 is a bipartite graph according to an embodiment of the present application;

FIG. 3 is a flowchart of determining target users according to an embodiment of the present application;

FIG. 4 is a flowchart of mining first target content according to an embodiment of the present application;

FIG. 5 is a flowchart of another content mining method according to an embodiment of the present application;

FIG. 6 is a schematic structural diagram of a content mining apparatus according to an embodiment of the present application;

FIG. 7 is a schematic structural diagram of a server according to an embodiment of the present application;

FIG. 8 is a schematic structural diagram of a terminal device according to an embodiment of the present application.

Detailed Description

In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Given the massive content produced by content producers on internet platforms, a platform operator can continuously provide high-quality services, such as personalized content recommendation, only by screening that content accurately. Taking news reading applications (application, APP) as an example, content producers generate a large amount of news content of various types every day, and a large amount of low-quality content inevitably appears, characterized for example by poor sound quality or sensational, eye-catching pictures; such content has a short production cycle. High-quality, knowledge-rich content, by contrast, requires substantial up-front investment, so its overall production cycle is longer. As a result, high-quality content gets drowned out by the sheer volume of low-quality content, a phenomenon of "bad money driving out good". To address this, a platform operator needs to provide personalized recommendations for the user groups it serves, for example recommending fresh high-quality content to users who like high-quality content. This requires first discriminating high-quality content from the massive content produced by content producers, so that the exposure or promotion of the identified high-quality content can be increased in a targeted manner.

Manual content screening requires a great deal of manpower; with this volume of content, it is both costly and very inefficient, so labeling content by human inspection is not desirable. Content mining therefore needs to be performed in a way that saves time and labor and reduces the burden of manual labeling. Graph-based data mining methods often need to rebuild a global graph structure for each mining run, which consumes significant computing resources. Another content mining method, based on a tripartite graph, introduces an intermediate stage, so the mined data contains considerable noise and the quality of the mined content is difficult to ensure.

To address the above problems, the present application provides a content mining method, device and related product that mine from content to users and then from users to content by bipartite graph propagation. Mining by bipartite graph propagation reduces the noise introduced during content mining and ensures the quality of the mined content, and it also saves the computing resources otherwise spent on graph construction. Automatic content mining reduces the burden of manually labeling content, and target mining content having the same preset content attribute as the seed content can be identified among massive content at low cost, helping platform operators screen and promote specific types of content.

The content mining method provided by the application can be applied to content mining equipment with data processing capacity, such as terminal equipment and servers. The terminal device may be, but is not limited to, a smart phone, a desktop computer, a notebook computer, a tablet computer, a smart speaker, a smart watch, and the like; the server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services. The terminal device and the server may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein.

For ease of understanding, please refer to a flow chart of a content mining method shown in fig. 1. As shown in fig. 1, a content mining method provided in an embodiment of the present application includes:

S100: Obtain a seed content set, wherein the plurality of contents included in the seed content set all have a preset content attribute.

In the method provided by the embodiments of the present application, a seed content set is prepared. The seed content set contains a plurality of contents, all of which have a preset content attribute; this preset content attribute is also the attribute that the content mining performed by the method of this embodiment aims to find. In other words, in the technical solution of the embodiments of the present application, the seed content set serves as the starting material from which as much content as possible having the preset content attribute is identified among massive content. The preset content attribute may be selected according to the actual content mining requirements.

As an example, if the purpose of content mining is to identify high-quality content among massive content, the contents in the seed content set are all high-quality content, that is, the preset content attribute is the high-quality attribute. As another example, if the purpose of content mining is to mine variety-show content from massive content, the contents in the seed content set are all variety-show content, that is, the preset content attribute is the variety-show attribute. Alternatively, the contents in the seed content set may all have several preset content attributes, for example both the high-quality attribute and the news broadcast attribute. In practical applications, preset content attributes may be divided and defined in many ways. For example, by presentation type, content may be classified as variety show, news broadcast, documentary, movie, and the like; by field, as science and education, military, traditional culture, pets, and the like; and by the fineness of the pictures and sound quality and the depth and richness of the knowledge contained, as high quality, low quality, and the like. Whether content has the vulgar attribute may be determined based on how strongly the content conveys a "low-brow, vulgar, kitschy, or ostentatious" effect, or on its relevance to the items explicitly prohibited by the network information dissemination rules of the content platform. The seed content set may be obtained in a variety of ways; two implementations are described below as examples.

In one possible implementation, the seed content set is obtained by manually labeling a small amount of content. Seed content labeled by professionals correctly reflects the preset content attribute, so a seed content set obtained in this way is free of noise and of high quality.

In another possible implementation, a small number of content producers who upload content having the preset content attribute are labeled, and the large volume of content uploaded by these producers is used to construct the seed content set. For example, if a content producer S (not shown in the figures) uploads content having the preset content attribute, content producer S is given a label corresponding to that attribute, and all content uploaded by content producer S is added to the seed content set according to the label. The advantage of obtaining the seed content set in this way is a lower labeling cost.
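The producer-labeling approach can be sketched as follows. This is a minimal illustration, not the patented implementation; it assumes uploads are available as a mapping from a producer ID to the content IDs that producer uploaded, and the names `build_seed_set`, `uploads` and `labeled_producers` are hypothetical.

```python
from typing import Dict, Iterable, Set


def build_seed_set(uploads: Dict[str, Iterable[str]], labeled_producers: Set[str]) -> Set[str]:
    """Collect every content ID uploaded by producers labeled with the preset attribute.

    Hypothetical data layout: uploads maps producer ID -> uploaded content IDs.
    """
    seed: Set[str] = set()
    for producer, content_ids in uploads.items():
        if producer in labeled_producers:
            # All content of a labeled producer joins the seed set via the label.
            seed.update(content_ids)
    return seed


uploads = {"S": ["v1", "v2", "v3"], "T": ["v4"], "U": ["v5", "v6"]}
seed = build_seed_set(uploads, labeled_producers={"S", "U"})
print(sorted(seed))  # ['v1', 'v2', 'v3', 'v5', 'v6']
```

Labeling one producer pulls in all of that producer's uploads at once, which is where the lower labeling cost comes from, at the price of possibly admitting some off-attribute uploads.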

The above manner of acquiring the seed content set is only an example, and a suitable manner of acquiring the seed content set may be selected according to actual requirements, and the acquisition manner is not limited to the above two manners.

S101: Determine target users from the candidate user set associated with the seed content set, according to the interaction behavior data of the candidate user set on the seed content set.

In the embodiments of the present application, content mining is performed based on bipartite graph propagation. A bipartite graph, also called a bigraph, is a model in graph theory. Its vertices are divided into two disjoint sets, which in the embodiments of the present application are a content vertex set and a user vertex set. FIG. 2 is a bipartite graph according to an embodiment of the present application. As shown in FIG. 2, the content vertex set may include content of various forms, such as videos, pictures and text, and the user vertex set includes a plurality of users. The relationship between content and users is one of consumption: users consume content. While consuming content, a user may perform one or more social behaviors on it, such as clicking, liking, collecting, commenting or sharing. Consumption behaviors and social behaviors are collectively defined as the user's interaction behaviors on the content, and each occurrence of an interaction behavior generates corresponding data, referred to as the user's interaction behavior data on the content. In the embodiments of the present application, this interaction behavior data is used when propagating over the bipartite graph to achieve content mining.
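As an illustration of the user-content bipartite graph, the sketch below builds two adjacency maps from raw interaction records. It assumes records arrive as hypothetical `(user_id, content_id, behavior)` tuples; the function and variable names are illustrative only. Every edge joins a user vertex to a content vertex and never two vertices of the same kind, which is exactly the bipartite property.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # hypothetical record: (user_id, content_id, behavior)


def build_bipartite(edges: List[Edge]) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Return user->contents and content->users adjacency maps."""
    user_to_content: Dict[str, Set[str]] = defaultdict(set)
    content_to_user: Dict[str, Set[str]] = defaultdict(set)
    for user, content, _behavior in edges:
        # One interaction creates an edge between the two vertex sets.
        user_to_content[user].add(content)
        content_to_user[content].add(user)
    return user_to_content, content_to_user


edges = [
    ("u1", "c1", "click"), ("u1", "c2", "share"),
    ("u2", "c1", "view"), ("u3", "c3", "like"),
]
u2c, c2u = build_bipartite(edges)
print(sorted(c2u["c1"]))  # ['u1', 'u2']
print(sorted(u2c["u1"]))  # ['c1', 'c2']
```

Propagation then means walking content -> users via `c2u` and users -> content via `u2c`; only the local neighborhood of the seed content is touched, rather than a global graph.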

As mentioned above, the content mining of the present solution is based on a bipartite graph, for which a candidate user set is prepared. The candidate user set includes a plurality of users. In one example implementation, the candidate user set may include all registered users of the platform; in another example implementation, it may include only users whose activity on the platform over the last 3 years reaches a preset level. The candidate user set includes users who have generated interaction behavior data with the seed content set.

In addition, users may be screened based on the interaction behavior data generated between them and the seed content set to obtain the candidate user set. For example, a user who has not interacted with any content in the seed content set may be excluded from the candidate user set, and only users who have interacted with one or more contents in the seed content set are selected into it. Forming the candidate user set in advance based on interaction behavior data in this way reduces the amount of computation needed to determine the target users and improves the efficiency of doing so. This selection manner is only an example, and the manner of selecting users for the candidate user set is not limited in the embodiments of the present application.
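The screening rule above (keep only users who interacted with at least one seed content) can be sketched as a single set comprehension. This assumes interactions are available as hypothetical `(user_id, content_id)` pairs; the function name is illustrative.

```python
from typing import Iterable, Set, Tuple


def candidate_users(interactions: Iterable[Tuple[str, str]], seed: Set[str]) -> Set[str]:
    """Return users who interacted with one or more contents in the seed set."""
    return {user for user, content in interactions if content in seed}


interactions = [("u1", "c1"), ("u2", "c9"), ("u3", "c2"), ("u3", "c9")]
print(sorted(candidate_users(interactions, seed={"c1", "c2"})))  # ['u1', 'u3']
```

User `u2` only touched non-seed content `c9`, so it never enters the candidate set and costs nothing in the later scoring steps.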

To extract new content having the preset content attribute from massive content, target users first need to be determined from the candidate user set in S101. Here, target users are users determined to be the audience of the seed content set. Because the contents in the seed content set all have the preset content attribute, the target users are also an audience for content with that attribute. In S101, target users are determined according to the interaction behavior data of the candidate user set on the seed content set, which comprises the interaction behavior data of each user in the candidate user set on each content in the seed content set. Based on this interaction behavior data, it can be judged whether each user in the candidate user set has close, high-intensity interaction with the seed content set, and thus whether the user is a target user.

An example implementation of S101 is described below with reference to the flowchart of determining target users shown in FIG. 3. As shown in FIG. 3, the process of determining target users includes:

S101a: Obtain the user's behavior scores for a plurality of preset interaction behaviors in the seed content set according to the interaction behavior data of the user in the candidate user set on the seed content set.

As an example, the plurality of preset interaction behaviors include: viewing content, clicking content, liking content, collecting content, commenting on content, and sharing content.

Generally, viewing content is an interaction behavior that occurs when a user is attracted by the content. Clicking content is an interaction behavior that a user performs to pause playback, enlarge the displayed picture or meet similar needs, because the user is attracted by the content, or in response to some reward mechanism associated with the content. Liking content is an interaction behavior by which the user expresses approval of the content. Collecting content is an interaction behavior by which the user affirms the value of the content. Commenting on content is an interaction behavior by which the user expresses opinions about the content. Sharing content is an interaction behavior by which the user spreads the content to others; the purpose of sharing may be to spread the content to other users who may be interested in it, or to express, through spreading it, the user's approval of the shared content.

The preset interaction behaviors can be performed through different types of input operations, and the specific input operation may depend on the display position and display mode of the content and on the module settings of the content platform. The input operations of the above preset interaction behaviors are described below as examples.

As an example, the input operation corresponding to clicking content is a triggering operation for displaying or playing the content, for example the user clicking a content title or thumbnail displayed on the screen of the terminal device. The input operation corresponding to viewing content is: after the content is displayed or played, the user scrolls down the display window or drags the playback progress bar while keeping the content displayed or playing. The input operation corresponding to liking content is selecting the heart-shaped icon displayed on the same screen page as the content. The input operation corresponding to collecting content is selecting the star-shaped icon displayed on the same screen page as the content. The input operations corresponding to commenting on content include the user selecting the comment input box, entering text, and publishing the entered text. The input operations corresponding to sharing content include triggering the share button displayed on the same screen page as the content and selecting a sharing path or sharing destination.

The preset interaction behaviors can be detected and identified through detection modes corresponding to their input operations, thereby obtaining interaction behavior data. The detection modes of the preset interaction behaviors are described below as examples.

As an example, clicking content may be detected by detecting a click on a content title or thumbnail on the screen. Viewing content may be detected by detecting drag operations on the display window and the playback duration of the content. Liking content may be detected by detecting the selected state of the heart-shaped icon on the screen. Collecting content may be detected by detecting the selected state of the star-shaped icon on the screen. Commenting on content may be detected by detecting the operations of selecting the content's comment input box, entering text, and publishing the text, in sequence. Sharing content may be detected by detecting the triggered state of the share button on the screen and the selected state of the sharing path or sharing destination.

The data for viewing content includes the viewing duration, whereas the data for clicking, liking, collecting, commenting on and sharing content indicates whether the behavior occurred. For example, if the viewing duration exceeds a preset duration, the value of the viewing interaction behavior is set to 1; otherwise it is set to 0. If the user clicks the content, the value of the clicking interaction behavior is 1; otherwise it is 0. The values of the other interaction behaviors, such as liking, collecting, commenting on and sharing content, are set in the same way as for clicking, and details are not repeated here.
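The 0/1 value assignment described above can be sketched as follows. This is an illustrative sketch under stated assumptions: a raw user-content record is a hypothetical dict with a `view_seconds` duration plus boolean flags for the other behaviors, and the behavior keys and `min_view_seconds` threshold name are hypothetical.

```python
from typing import Any, Dict, Mapping

# Hypothetical keys for the six preset interaction behaviors.
BEHAVIORS = ("view", "click", "like", "collect", "comment", "share")


def binarize(record: Mapping[str, Any], min_view_seconds: float) -> Dict[str, int]:
    """Map one user-content record to a 0/1 value per preset interaction behavior."""
    # Viewing counts as 1 only if the viewing duration exceeds the preset duration.
    values = {"view": int(float(record.get("view_seconds", 0)) > min_view_seconds)}
    # The remaining behaviors are 1 if they occurred at all, else 0.
    for behavior in BEHAVIORS[1:]:
        values[behavior] = int(bool(record.get(behavior, False)))
    return values


record = {"view_seconds": 95, "click": True, "like": False, "share": True}
print(binarize(record, min_view_seconds=30))
# {'view': 1, 'click': 1, 'like': 0, 'collect': 0, 'comment': 0, 'share': 1}
```

A missing flag is treated as "did not occur", so sparse logs still yield a full six-value vector per user-content pair.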

The behavior scores of the plurality of preset interaction behaviors of the user in the seed content set include the user's behavior scores for viewing, clicking, liking, collecting, commenting on and sharing content in the seed content set. To identify these behavior scores individually, assume that the plurality of preset interaction behaviors comprise N types, numbered as the 1st, 2nd, …, Nth interaction behaviors, where N is an integer greater than 1.

In practical applications, as an optional implementation, the behavior score of the ith interaction behavior of the user in the seed content set is obtained by accumulating the behavior data of the ith interaction behavior on each seed content in the seed content set, where $1 \le i \le N$ and $i$ is an integer. Assume that the seed content set includes P seed contents, where P is an integer greater than 1. The behavior score $S_i$ of the ith interaction behavior of a given user in the seed content set is calculated by formula (1):

$$S_i = \sum_{w=1}^{P} x_{w,i} \qquad (1)$$

In formula (1), $x_{w,i}$ denotes the value of the user's ith interaction behavior on the wth of the P seed contents, which is 1 or 0, where $1 \le w \le P$ and $w$ is an integer. The summation in formula (1) yields the behavior score $S_i$ of the user's ith interaction behavior over the whole seed content set. Letting $i$ range from 1 to N, formula (1) gives the behavior score of each interaction behavior of each user in the candidate user set in the seed content set.
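Formula (1) is a plain per-behavior sum over the seed contents. The sketch below computes it for one user, assuming the user's 0/1 values are given as a hypothetical list with one dict per seed content; a behavior missing from a dict is treated as 0. The names are illustrative.

```python
from typing import Dict, List


def behavior_scores(values_per_seed: List[Dict[str, int]], behaviors: List[str]) -> Dict[str, int]:
    """Formula (1): for each behavior, sum the user's 0/1 values over the P seed contents."""
    return {b: sum(v.get(b, 0) for v in values_per_seed) for b in behaviors}


behaviors = ["view", "click", "like", "collect", "comment", "share"]
user_values = [
    {"view": 1, "click": 1, "share": 1},  # seed content w = 1
    {"view": 1, "like": 1},               # seed content w = 2
    {"view": 0, "click": 1},              # seed content w = 3
]
print(behavior_scores(user_values, behaviors))
# {'view': 2, 'click': 2, 'like': 1, 'collect': 0, 'comment': 0, 'share': 1}
```

Each score is bounded by P, the number of seed contents, since every term is 0 or 1.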

S101b: Obtain the user's behavior credibility score in the seed content set according to the user's behavior scores for the plurality of preset interaction behaviors in the seed content set and the weight parameters corresponding to those behaviors.

The weight parameters corresponding to the preset interaction behaviors are set according to the degree of endorsement of the content that each behavior expresses. For example, in some cases, because liking requires a simpler input operation than sharing, liking content expresses weaker endorsement than sharing it, so a lower weight parameter may be set for liking content and a higher weight parameter for sharing content.

Given the user's behavior scores for the preset interaction behaviors in the seed content set and the weight parameters corresponding to those behaviors, the behavior score of each interaction behavior can be multiplied by its weight parameter and the products summed to obtain the user's behavior credibility score in the seed content set. A higher behavior credibility score indicates that the user interacts with the seed content set more closely and intensely, and also that the user endorses the seed content set more strongly. The behavior credibility score of the user in the seed content set is calculated by formula (2):

$$C = \sum_{i=1}^{N} \alpha_i S_i \qquad (2)$$

In formula (2), $S_i$ denotes the behavior score of the user's ith interaction behavior in the seed content set, calculated as in formula (1); $\alpha_i$ denotes the weight parameter of the ith of the plurality of preset interaction behaviors; and $C$ denotes the user's behavior credibility score in the seed content set.
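Formula (2) is a weighted sum of the per-behavior scores. In the sketch below the weight values are purely hypothetical, chosen only to follow the example above of weighting sharing more heavily than liking; the function name is also illustrative.

```python
from typing import Dict


def credibility_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Formula (2): multiply each behavior score by its weight and sum the products."""
    return sum(weights[b] * scores.get(b, 0) for b in weights)


# Hypothetical weights: sharing is weighted above liking, as in the example above.
weights = {"view": 1.0, "click": 0.5, "like": 1.0, "collect": 2.0, "comment": 2.0, "share": 3.0}
scores = {"view": 2, "click": 2, "like": 1, "collect": 0, "comment": 0, "share": 1}
print(credibility_score(scores, weights))  # 7.0
```

Here 1.0·2 + 0.5·2 + 1.0·1 + 3.0·1 = 7.0; tuning the weights changes how strongly deliberate actions such as sharing dominate passive ones such as viewing.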

S101c: Determine target users from the candidate user set according to the users' behavior credibility scores in the seed content set.

This step can be implemented in several ways. For example, users in the candidate user set whose behavior credibility scores in the seed content set exceed a first preset score may be determined as target users. Alternatively, the behavior credibility scores of the users in the candidate user set may be sorted in descending order, and the users with the top first-preset-number scores determined as target users. The first preset score and the first preset number may be set according to experience or actual requirements, so their specific values are not limited in the embodiments of the present application; the first preset number is smaller than the total number of users in the candidate user set. As an example, if the first preset number is 500, the latter implementation determines the 500 users with the highest behavior credibility scores as the target users.
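Both selection rules above can be sketched in one helper. The parameter names `min_score` (standing in for the first preset score) and `top_n` (standing in for the first preset number) are hypothetical, as are the example scores.

```python
from typing import Dict, List, Optional


def select_target_users(cred: Dict[str, float], min_score: Optional[float] = None,
                        top_n: Optional[int] = None) -> List[str]:
    """Select target users by a score threshold, or keep the top_n highest scorers."""
    if min_score is not None:
        # Threshold rule: keep users whose score exceeds the preset score.
        return sorted(u for u, s in cred.items() if s > min_score)
    if top_n is not None:
        # Ranking rule: sort by score in descending order and keep the first top_n.
        ranked = sorted(cred, key=lambda u: cred[u], reverse=True)
        return ranked[:top_n]
    raise ValueError("set either min_score or top_n")


cred = {"u1": 7.0, "u2": 1.5, "u3": 4.0, "u4": 9.5}
print(select_target_users(cred, min_score=3.0))  # ['u1', 'u3', 'u4']
print(select_target_users(cred, top_n=2))        # ['u4', 'u1']
```

The threshold rule yields a variable number of target users, while the ranking rule fixes the count (for example 500), which keeps the cost of the later content-scoring step predictable.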

S102: Determine target mining content based on the interaction behavior data of the target user on the candidate content set.

The target users are determined (i.e., mined) in S101 above. Then, following the idea of bipartite graph propagation, new content that meets the mining requirement (i.e., has the preset content attribute) is determined, based on those target users, from the large volume of content that has not been manually labeled; this newly determined content is referred to as target mining content in the embodiments of the present application. This unlabeled content includes the content in the candidate content set and may also include content outside it. When the target mining content is determined based on the target users, the interaction behavior data of the target users on the candidate content set is used as the basis of analysis. An optional implementation of S102 is described below with reference to Fig. 4.

Assume that the candidate content set contains K candidate contents, namely the 1st, 2nd, …, Kth candidate content, where K is an integer greater than 1. As shown in Fig. 4, the process of mining the first target content includes:

S102a: Accumulate the interaction score of the target user in the jth candidate content according to the behavior data of each ith interaction behavior performed by the target user on the jth candidate content in the candidate content set, where j is an integer and 1 ≤ j ≤ K.

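The interaction score of a target user $u$ in the jth candidate content is expressed as

$$I_{u,j} = \sum_{i=1}^{N} x_{u,j,i}$$

where $N$ is the number of preset interaction behaviors and $x_{u,j,i}$ takes the value 1 if the target user performed the ith interaction behavior on the jth candidate content, and 0 otherwise.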
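The accumulation in S102a can be sketched in Python as follows. This sketch assumes, hypothetically, that interaction data arrives as `(user, content, behavior)` event records; because each indicator is 0 or 1, repeated occurrences of the same behavior on the same content count once, and the score is simply the number of distinct preset behaviors observed.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Set, Tuple

# The five preset interaction behaviors named in this application.
PRESET_BEHAVIORS: Sequence[str] = ("view", "click", "favorite", "comment", "share")


def interaction_scores(
    events: Iterable[Tuple[str, str, str]],
    behaviors: Sequence[str] = PRESET_BEHAVIORS,
) -> Dict[Tuple[str, str], int]:
    """Return {(user, content): sum of the 0/1 behavior indicators}.

    events: hypothetical (user, content, behavior) records.
    """
    allowed = set(behaviors)
    performed: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for user, content, behavior in events:
        if behavior in allowed:  # ignore behaviors outside the preset list
            performed[(user, content)].add(behavior)
    # Each distinct behavior contributes an indicator of 1.
    return {key: len(done) for key, done in performed.items()}
```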

S102b: Obtain the credibility score of the jth candidate content according to the interaction score of the target user in the jth candidate content and the behavior credibility score of the target user in the seed content set.

Combining these two quantities yields the credibility score of the jth candidate content as derived from this target user's interaction behavior data. It is calculated as follows:

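Formula (3):

$$R_{u,j} = S_u \cdot I_{u,j} = S_u \sum_{i=1}^{N} x_{u,j,i}$$

In formula (3), $i$ denotes the ith interaction behavior; $x_{u,j,i}$ is the 0/1 indicator of whether target user $u$ performed the ith interaction behavior on the jth candidate content, so that the sum is the interaction score $I_{u,j}$ obtained in S102a; and $S_u$ denotes the behavior credibility score of target user $u$ in the seed content set, calculated as shown in formula (2).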

This yields the credibility score of the jth candidate content derived from the target user's interaction behavior data. The higher the credibility score of a candidate content, the higher the probability that it has the preset content attribute.
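The per-user scoring of S102b can be sketched in Python as below. This assumes that formula (3) combines the two quantities as a product of the user's behavior credibility score and the user's interaction score; the input dictionaries and the function name are hypothetical. Only users present in `user_credibility` (the target users) contribute.

```python
from collections import defaultdict
from typing import Dict, Tuple


def per_user_candidate_scores(
    interaction: Dict[Tuple[str, str], int],
    user_credibility: Dict[str, float],
) -> Dict[str, Dict[str, float]]:
    """Return {content: {target_user: per-user credibility score}}.

    interaction: {(user, content): interaction score from S102a}.
    user_credibility: {target_user: behavior credibility score}.
    Assumes formula (3) is credibility * interaction score.
    """
    out: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (user, content), score in interaction.items():
        if user in user_credibility:  # skip users who are not target users
            out[content][user] = user_credibility[user] * score
    return dict(out)
```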

Among the target users determined in S101, several target users may have performed preset interaction behaviors on the jth candidate content in the candidate content set. Therefore, multiple credibility scores related to the jth candidate content can be obtained from the interaction behavior data of multiple target users according to formula (3): for example, the credibility score $R_{A,j}$ obtained from the interaction behavior data of target user A, the credibility score $R_{B,j}$ obtained from the interaction behavior data of target user B, and so on.

In one optional implementation, the maximum of all the per-user credibility scores $R_{u,j}$ obtained according to formula (3) is taken as the credibility score of the jth candidate content. In another optional implementation, a weighted average of all these $R_{u,j}$ values is computed, and the result is taken as the credibility score of the jth candidate content.
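The two aggregation options can be sketched as one Python function. The text does not specify the weights for the weighted average, so `user_weights` is a hypothetical parameter that defaults to equal weighting; passing the target users' behavior credibility scores is one possible choice, not one the application prescribes.

```python
from typing import Mapping, Optional


def aggregate_candidate_score(
    per_user_scores: Mapping[str, float],
    mode: str = "max",
    user_weights: Optional[Mapping[str, float]] = None,
) -> float:
    """Combine the per-target-user scores of one candidate content.

    per_user_scores: {target_user: credibility score from formula (3)}.
    mode: "max" or "weighted_mean".
    user_weights: hypothetical per-user weights for "weighted_mean";
        users without an entry get weight 1.0.
    """
    if not per_user_scores:
        return 0.0  # no target user interacted with this candidate
    if mode == "max":
        return max(per_user_scores.values())
    if mode == "weighted_mean":
        weights = user_weights or {}
        total_w = sum(weights.get(u, 1.0) for u in per_user_scores)
        if total_w == 0:
            return 0.0  # avoid dividing by zero when all weights are 0
        weighted = sum(weights.get(u, 1.0) * s for u, s in per_user_scores.items())
        return weighted / total_w
    raise ValueError(f"unknown mode: {mode!r}")
```

Taking the maximum rewards a candidate that a single highly credible user engaged with strongly, while the weighted average favors candidates that many target users engaged with.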

Repeating the above for every j from 1 to K yields the credibility score of each candidate content in the candidate content set.

S102c: Determine the target mining content from the candidate content set according to the credibility score of each candidate content in the candidate content set.

In the embodiments of the present application, the target mining content includes at least one or more contents in the candidate content set. For ease of distinction, target mining content determined within the candidate content set is referred to as first target content, and target mining content determined outside the candidate content set is referred to as second target content. In some implementation scenarios, content may be mined only from the candidate content set; when a larger amount of mined content is required, content may additionally be mined from outside the candidate content set. The following first describes how target mining content is determined from the candidate content set.

In an optional implementation, candidate contents in the candidate content set whose credibility scores exceed a second preset score may be determined as the first target content; alternatively, the credibility scores of the candidate contents in the candidate content set may be sorted in descending order, and the candidate contents with the top second-preset-number scores may be determined as the first target content.

It should be noted that the second preset score and the second preset number may be set empirically or according to actual requirements, so their specific values are not limited in the embodiments of the present application. The second preset number is smaller than the total number of candidate contents in the candidate content set. For example, if the second preset number is 50000, the latter implementation sorts the candidates in descending order and determines the 50000 candidate contents with the highest credibility scores as the first target content. These contents are the newly determined target mining content.
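Selecting the first target content in S102c follows the same threshold-or-top-N pattern as selecting target users. The sketch below is a hypothetical helper that assumes candidate credibility scores are held in a dictionary keyed by content ID; for the top-N variant it uses the standard library's `heapq.nlargest`, which avoids fully sorting a very large candidate set when only, say, 50000 items are kept.

```python
import heapq
from typing import Dict, List, Optional


def select_first_target_content(
    content_scores: Dict[str, float],
    second_preset_score: Optional[float] = None,
    second_preset_number: Optional[int] = None,
) -> List[str]:
    """Return first target content by threshold or by top-N ranking."""
    if second_preset_score is not None:
        # Keep every candidate whose credibility score exceeds the threshold.
        return [c for c, s in content_scores.items() if s > second_preset_score]
    if second_preset_number is not None:
        # Top-N candidates, highest score first, without a full sort.
        return heapq.nlargest(second_preset_number, content_scores,
                              key=content_scores.get)
    raise ValueError("set second_preset_score or second_preset_number")
```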

The above describes the content mining method provided by the embodiments of the present application. First, a seed content set is acquired, in which every content has the preset content attribute. Second, target users are determined from the candidate user set according to the interaction behavior data, on the seed content set, of the candidate user set associated with it. Then, target mining content is determined based on the interaction behavior data of the target users on the candidate content set. The method does not need to build a complex graph structure, so content can be mined frequently and quickly even when the mined content has strict timeliness requirements, and the computing resources consumed by mining are saved. In addition, because the method mines content through bipartite graph propagation without introducing intermediate processes on top of the bipartite graph, the introduction of noise is effectively avoided; compared with the prior art, the purity of the target mining content is improved and the quality of the mined content is guaranteed. Automatic content mining also reduces the burden of manually labeling content. Content with the same preset content attribute as the seed content is mined from massive content at low cost, which helps the platform operator screen and promote specific types of content.

A round of content mining is completed through S100-S102. In practice, to mine more new content related to the preset content attribute, the output of the current round of content mining can serve as the seed content set of the next round. For example, the target mining content determined in the 1st round is used to construct a new seed content set, which starts the 2nd round of content mining for the preset content attribute; the target mining content determined in the 2nd round is used to construct a new seed content set, which starts the 3rd round; and so on.

Before the seed content set of the next round of content mining is constructed from the mining products of the current round (i.e., the target mining content), those mining products may undergo qualification verification against the preset content attribute, and a mining product is added to the new seed content set only if it passes. In the embodiments of the present application, this qualification verification step for the preset content attribute effectively improves the accuracy of the seed content set used in the next round of content mining, and thereby the quality of that round's products.

To increase the amount of mined content, since the amount of first target content may be small, the content producers of the first target content may be identified, and more target mining content may then be determined based on those producers. Another content mining method is described below with reference to an embodiment.

Fig. 5 is a flowchart of another content mining method according to an embodiment of the present application. The content mining method shown in fig. 5 includes:

S501: Obtain cold start data, where the cold start data includes the seed content set used in the 1st round of content mining for the preset content attribute, and every content in the seed content set has the preset content attribute.

S502: Determine target users from the candidate user set according to the interaction behavior data, on the seed content set, of the candidate user set associated with the seed content set.

S503: Determine first target content from the candidate content set based on the interaction behavior data of the target users on the candidate content set.

S501-S503 are implemented in substantially the same way as S100-S102 in the foregoing embodiment; for details, refer to the foregoing embodiment, which are not repeated here.

S504: a content producer of the first target content is determined.

A content producer who uploaded the first target content is highly likely to have also uploaded other content with the preset content attribute. To expand the number of mining products, in the embodiments of the present application the content producers of the first target content determined in S503 are identified, so as to obtain more content with the preset content attribute (i.e., the second target content) and increase the number of mining products in the current round.

S505: Obtain second target content uploaded by the content producer; the second target content includes one or more contents uploaded by the content producer other than the first target content.

For example, if a first target content was uploaded by a given content producer, the other one or more contents uploaded by that producer may be determined as second target content.
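Steps S504 and S505 amount to a lookup from content to producer and from producer to uploads. The Python sketch below assumes, hypothetically, that both mappings are available as in-memory dictionaries; a production system would query a content store instead.

```python
from typing import Dict, Iterable, List, Set


def second_target_content(
    first_targets: Iterable[str],
    producer_of: Dict[str, str],
    uploads_by_producer: Dict[str, List[str]],
) -> Set[str]:
    """S504-S505: collect other uploads of the first targets' producers.

    producer_of: hypothetical {content_id: producer_id} mapping.
    uploads_by_producer: hypothetical {producer_id: [content_id, ...]}.
    """
    first = set(first_targets)
    # S504: identify the content producers of the first target content.
    producers = {producer_of[c] for c in first if c in producer_of}
    # S505: gather everything those producers uploaded ...
    second: Set[str] = set()
    for p in producers:
        second.update(uploads_by_producer.get(p, []))
    # ... excluding content that is already first target content.
    return second - first
```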

S506: Construct a new seed content set according to the first target content and the second target content.

Constructing the new seed content set according to the first target content and the second target content may specifically include: determining, from the first target content and the second target content, the content having the preset content attribute as qualified mining content; and constructing the new seed content set from the qualified mining content. This qualification verification step for the preset content attribute effectively improves the accuracy of the seed content set used in the next round of mining, and thereby the quality of the next round's mining products.

To verify whether the first target content and the second target content have the preset content attribute, S506 may be implemented in the embodiments of the present application with a trained classification model. For example, a classification model is trained with pre-labeled content having the preset content attribute as positive samples; the first target content and the second target content are then input into the classification model, and whether each content is qualified (i.e., has the preset content attribute) is determined from the output. This is merely an example; in other implementations, various machine learning models, deep learning models, or generative models may be used to verify content eligibility.
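Constructing the new seed content set in S506 reduces to filtering the union of first and second target content through a qualification check. In this sketch `is_qualified` is a hypothetical callable standing in for the trained classification model described above; the sketch does not train or call any real model.

```python
from typing import Callable, Iterable, Set


def build_new_seed_set(
    first_targets: Iterable[str],
    second_targets: Iterable[str],
    is_qualified: Callable[[str], bool],
) -> Set[str]:
    """S506: keep only mined content that passes qualification verification.

    is_qualified: hypothetical predicate, e.g. a wrapper that returns True
        when a classifier judges the content to have the preset attribute.
    """
    candidates = set(first_targets) | set(second_targets)
    return {c for c in candidates if is_qualified(c)}
```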

It should be noted that in practical applications, users' interaction behavior data on content may be updated in real time, so the interaction behavior data used for content mining is not fixed. Accordingly, the content mining method provided in the embodiments of the present application can continually mine new content with the preset content attribute as the interaction behavior data is updated. For example, a batch of content is mined by this method on one day, and on the next day a new seed content set can be extracted from the previous day's products to mine content again; because the seed content and the interaction behavior data differ from day to day, content mining for the preset content attribute is performed periodically. In addition, the candidate content set may also be updated as content producers upload content.

S507: Determine whether a preset end-mining condition is met. If not, use the new seed content set as the seed content set for the next round of content mining and return to S502; if so, proceed to S508.

The purpose of this step is to determine whether the content mining operation for the preset content attribute should end. To this end, preset end-mining conditions are set. For example, the preset end-mining conditions include: the new seed content set is an empty set; or the new seed content set contains no new content compared with the seed content set used in the current round of content mining; or the number of iterations reaches a preset number. Each execution of S502-S506 is referred to as one round of content mining and increments the iteration count by 1. The number of iterations therefore refers to the number of times the step of determining target users from the candidate user set (according to the interaction behavior data of the candidate user set associated with the seed content set on the seed content set) and the subsequent steps have been executed.

That the new seed content set contains no new content compared with the seed content set used in the current round means that the target mining content determined in two consecutive rounds is the same. In other implementations, the preset end-mining condition may be considered met only when the new seed content sets obtained in more than two consecutive rounds are the same; for example, the condition is met when the new seed content sets obtained in five consecutive rounds are the same.

If, after the current round of content mining ends, it is determined that no new target mining content has been mined from the new seed content set, iteration stops and content mining ends. Likewise, if the number of iterations reaches the preset number, which indicates that enough iterations have been run and the process is close to convergence, iteration can stop and content mining ends.

If the preset end-mining condition is not met, the new seed content set formed from the current round's mining products is used to start the next round of mining; if it is met, the process proceeds to S508.

S508: Stop mining content.
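The round structure of S501-S508 can be sketched as a loop that checks the three example end-mining conditions after every round. This is a minimal illustration under stated assumptions: `mine_round` is a hypothetical callable standing in for S502-S506 (taking the current seed set and returning the new one), and `max_iterations` plays the role of the preset number of iterations.

```python
from typing import Callable, FrozenSet, Iterable, List


def run_mining(
    cold_start_seeds: Iterable[str],
    mine_round: Callable[[FrozenSet[str]], Iterable[str]],
    max_iterations: int = 10,
) -> List[FrozenSet[str]]:
    """Repeat mining rounds until a preset end-mining condition holds.

    mine_round: hypothetical stand-in for S502-S506.
    Returns the seed content set used in each round, in order.
    """
    seeds = frozenset(cold_start_seeds)           # S501: cold start data
    used: List[FrozenSet[str]] = []
    iterations = 0
    while True:
        used.append(seeds)
        new_seeds = frozenset(mine_round(seeds))  # one round: S502-S506
        iterations += 1
        # S507: check the three example end-mining conditions.
        if not new_seeds:                  # new seed set is empty
            break
        if not (new_seeds - seeds):        # nothing new versus this round
            break
        if iterations >= max_iterations:   # preset iteration count reached
            break
        seeds = new_seeds                  # next round starts from new seeds
    return used                            # S508: stop mining
```

The "consecutive rounds" variant described above could be added by counting how many rounds in a row produced no new content before breaking.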

In the embodiments of the present application, given a portion of seed content data for the cold start stage, the iterative algorithm based on bipartite graph propagation can efficiently, conveniently, and periodically acquire a large amount of content data with the preset content attribute, with high data accuracy and without the cost of manual labeling. For example, in a premium content mining project, applying the content mining method described herein to each day's newly produced content mined more than about one thousand premium contents per day. Verification showed that the accuracy of premium content among the mined content exceeded 90 percent.

In addition, after S502, a machine learning model or a deep learning model may be introduced to screen the target users and ensure their accuracy. After S504, a machine learning model or a deep learning model may be introduced to screen the identified content producers and ensure that they are suitable producers from whom to further mine second target content. This eliminates target users and content producers that are not associated with content having the preset content attribute or that do not meet accuracy requirements, preventing them from degrading the accuracy and quality of content mining. Verifying and classifying specific categories with machine learning, deep learning, or generative models is relatively mature technology and is not described further here; such models only need to be trained for the application scenarios and data requirements of the embodiments of the present application.

By applying the content mining method provided in the embodiments of the present application, a large amount of content matching the preset content attribute can be obtained efficiently, accurately, and at low cost, enabling automatic labeling of large volumes of content and expanding the labeled range. The target mining content can be accurately recommended to users with specific preferences, providing personalized service. Users can thus consume the specific content they are interested in within limited time, which improves their content consumption experience.

Take the case where the preset content attribute is the premium (high-quality) attribute as an example. Although premium content usually accounts for only a small fraction of massive data, the technical solution provided in the embodiments of the present application can accurately mine unlabeled premium content. Accurate mining of premium content helps locate and promote it among massive content, so users can read premium content more conveniently.

Some model-training scenarios require large amounts of labeled data. Generally, the more training data there is and the higher its quality, the better the performance and accuracy of the trained model. However, manual labeling is inefficient and costly. Based on the preset content attribute of the seed content set and bipartite graph propagation, the content mining method provided in the embodiments of the present application effectively mines a large amount of target mining content with the preset content attribute, labeling content automatically and expanding the amount of labeled content, while saving the cost of manual labeling and improving labeling efficiency. The mined content may be used to supplement a model's training data and thereby help train a more accurate model. The function and type of the model to be trained are not limited herein.

In the embodiments of the present application, the internet platform involved in content mining and the content mining method may be implemented based on a blockchain.

Blockchain is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, in which each data block contains information about a batch of network transactions, used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer, and an application service layer.

The blockchain underlying platform may include processing modules such as user management, basic services, smart contracts, and operation monitoring. The user management module is responsible for managing the identity information of all blockchain participants, including generating and maintaining public and private keys (account management), key management, and maintaining the correspondence between users' real identities and blockchain addresses (authority management); where authorized, it also supervises and audits the transactions of certain real identities and provides rule configuration for risk control (risk control audit). The basic service module is deployed on all blockchain node devices to verify the validity of service requests and, after consensus is reached on valid requests, record them to storage. For a new service request, the basic service module first performs interface adaptation parsing and authentication (interface adaptation), then encrypts the service information through a consensus algorithm (consensus management), transmits it completely and consistently to the shared ledger after encryption (network communication), and records and stores it. The smart contract module is responsible for contract registration and issuance, contract triggering, and contract execution: developers can define contract logic in a programming language and publish it to the blockchain (contract registration); the contract is triggered by keys or other events and executed according to the logic of its clauses, and the module also provides functions for upgrading and canceling contracts. The operation monitoring module is mainly responsible for deployment during product release, configuration modification, contract settings, cloud adaptation, and visual output of real-time status during product operation, such as alarms, network condition monitoring, and node device health monitoring.

The platform product service layer (i.e., the content platform) provides basic capabilities and implementation frameworks for typical applications; based on these basic capabilities, developers can layer on the characteristics of their business to implement business logic on the blockchain. The application service layer provides blockchain-based application services for business participants to use.

Based on the content mining method provided by the foregoing embodiment, correspondingly, the application further provides a content mining device. The following description is made with reference to the drawings and examples.

Fig. 6 is a schematic structural diagram of a content mining device according to an embodiment of the present application. The content mining apparatus 600 shown in fig. 6 includes:

a seed obtaining unit 601, configured to obtain a seed content set; the plurality of contents included in the seed content set all have preset content attributes;

a user determining unit 602, configured to determine a target user from a candidate user set according to interaction behavior data of the candidate user set associated with the seed content set to the seed content set;

a content determining unit 603, configured to determine target mined content based on the interaction behavior data of the target user on the candidate content set, where the target mined content includes one or more contents in the candidate content set.

The apparatus does not need to build a complex graph structure, so content can be mined frequently and quickly even when the mined content has strict timeliness requirements, and the computing resources consumed by mining are saved. In addition, because the apparatus mines content through bipartite graph propagation without introducing intermediate processes on top of the bipartite graph, the introduction of noise is effectively avoided; compared with the prior art, the purity of the mined content is improved and the quality of the target mining content is guaranteed.

Optionally, the target mining content includes a first target content and a second target content, and the content determining unit 603 includes:

the first content determining subunit is used for determining first target content from the candidate content set according to the interaction behavior data of the target user on the candidate content set;

a content producer determination subunit operable to determine a content producer of the first target content;

the second content determining subunit is used for acquiring second target content uploaded by the content producer, where the second target content includes one or more contents uploaded by the content producer other than the first target content.

Optionally, the content mining apparatus 600 further includes:

the seed content set constructing unit is used for constructing a new seed content set according to the first target content and the second target content;

and the iteration determining unit is used for: when a preset end-mining condition is not met, taking the new seed content set as the seed content set for the next round of content mining and returning to the step of determining the target user from the candidate user set according to the interaction behavior data of the candidate user set associated with the seed content set on the seed content set, and the subsequent steps, until the preset end-mining condition is met, at which point content mining stops.

Optionally, the seed content set constructing unit specifically includes:

the eligibility verification subunit is used for determining, from the first target content and the second target content, the content having the preset content attribute as qualified mining content;

and the seed content set constructing subunit is used for constructing a new seed content set according to the qualified mining content.

Optionally, the preset end-mining condition includes:

the new seed content set is an empty set; or

the new seed content set contains no new content compared with the seed content set used in the current round of content mining; or

the number of iterations of determining the target user from the candidate user set according to the interaction behavior data of the candidate user set associated with the seed content set on the seed content set, and of the subsequent steps, reaches a preset number.

Optionally, the user determining unit 602 includes:

the behavior score calculating subunit is used for obtaining behavior scores of various preset interactive behaviors of the user in the seed content set according to the interactive behavior data of the user in the candidate user set on the seed content set;

the behavior credibility score calculating subunit is used for obtaining the behavior credibility score of the user in the seed content set according to the behavior scores of the various preset interaction behaviors of the user in the seed content set and the weight parameters corresponding to those preset interaction behaviors;

and the target user determining subunit is used for determining the target user from the candidate user set according to the behavior credibility score of the user in the seed content set.

Optionally, the target user determining subunit is configured to determine, as target users, users in the candidate user set whose behavior credibility score in the seed content set exceeds a first preset score; or to sort the behavior credibility scores of the users in the candidate user set in descending order and determine the users with the top first-preset-number scores as the target users.

Optionally, the plurality of preset interaction behaviors includes the 1st to Nth interaction behaviors, where N is an integer greater than 1;

the candidate content set includes the 1st to Kth candidate contents, where K is an integer greater than 1; and the content determining unit 603 includes:

the interaction score calculating subunit is used for accumulating the interaction score of the target user in the jth candidate content according to the behavior data of the ith interaction behavior of the target user on the jth candidate content in the candidate content set, where j is an integer with 1 ≤ j ≤ K, and i is an integer with 1 ≤ i ≤ N;

the credibility score calculating subunit is used for obtaining the credibility score of the jth candidate content according to the interaction score of the target user in the jth candidate content and the behavior credibility score of the target user in the seed content set;

and the content determining subunit is used for determining the target mining content from the candidate content set according to the credibility score of each candidate content in the candidate content set.

Optionally, the plurality of preset interaction behaviors includes: viewing content, clicking content, favoriting content, commenting on content, and sharing content.

The embodiment of the present application further provides a computer device, and the computer device provided in the embodiment of the present application will be described below from the perspective of hardware materialization.

Fig. 7 is a schematic diagram of a server 900 according to an embodiment of the present application. The server 900 may vary considerably depending on its configuration or performance, and may include one or more central processing units (CPUs) 922 (e.g., one or more processors), a memory 932, and one or more storage media 930 (e.g., one or more mass storage devices) storing applications 942 or data 944. The memory 932 and the storage media 930 may be transient or persistent storage. The program stored on the storage medium 930 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Further, the central processing unit 922 may be configured to communicate with the storage medium 930 and execute, on the server 900, the series of instruction operations in the storage medium 930.

The server 900 may also include one or more power supplies 926, one or more wired or wireless network interfaces 950, one or more input/output interfaces 958, and/or one or more operating systems 941, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.

The steps performed by the server in the above embodiments may be based on the server structure shown in fig. 7.

The CPU 922 is configured to execute the following steps:

For the content mining method described above, the embodiment of the present application further provides a terminal device for content mining, so that the content mining method described above can be implemented and applied in practice.

Referring to fig. 8, fig. 8 is a schematic structural diagram of a terminal device according to an embodiment of the present application. For convenience of explanation, only the parts related to the embodiments of the present application are shown, and specific technical details are not described here. The terminal may be any terminal device, including a mobile phone, a tablet computer, a personal digital assistant (PDA), a point of sales (POS) terminal, a vehicle-mounted computer, and the like. A mobile phone is taken as an example of the terminal:

Fig. 8 is a block diagram of a partial structure of a mobile phone serving as the terminal-form computer device provided in an embodiment of the present application. Referring to fig. 8, the mobile phone includes: a radio frequency (RF) circuit 1010, a memory 1020, an input unit 1030, a display unit 1040, a sensor 1050, an audio circuit 1060, a wireless fidelity (WiFi) module 1070, a processor 1080, and a power source 1090. Those skilled in the art will appreciate that the handset structure shown in fig. 8 does not limit the handset, which may include more or fewer components than shown, combine some components, or use a different arrangement of components.

The following describes each component of the mobile phone in detail with reference to fig. 8:

The RF circuit 1010 may be used to receive and transmit signals while information is being sent and received or during a call; in particular, after receiving downlink information from a base station, it passes the information to the processor 1080 for processing, and it also transmits uplink data to the base station. Generally, the RF circuit 1010 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like. In addition, the RF circuit 1010 may communicate with networks and other devices via wireless communication, which may use any communication standard or protocol, including but not limited to Global System for Mobile communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), e-mail, Short Messaging Service (SMS), and so on.

The memory 1020 may be used to store software programs and modules, and the processor 1080 performs the various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 1020. The memory 1020 may mainly include a program storage area and a data storage area: the program storage area may store an operating system, an application program required by at least one function (such as a sound playback function or an image playback function), and the like; the data storage area may store data created through use of the mobile phone (such as audio data and a phonebook), and the like. Further, the memory 1020 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.

The input unit 1030 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the mobile phone. Specifically, the input unit 1030 may include a touch panel 1031 and other input devices 1032. The touch panel 1031, also referred to as a touch screen, may collect touch operations by a user on or near it (for example, operations performed on or near the touch panel 1031 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connected apparatus according to a preset program. Optionally, the touch panel 1031 may include two parts: a touch detection apparatus and a touch controller. The touch detection apparatus detects the position of the user's touch, detects the signal generated by the touch operation, and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection apparatus, converts it into touch point coordinates, sends the coordinates to the processor 1080, and can receive and execute commands sent by the processor 1080. In addition, the touch panel 1031 may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch panel 1031, the input unit 1030 may include other input devices 1032, which may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and on/off keys), a trackball, a mouse, a joystick, and the like.

The display unit 1040 may be used to display information input by the user or provided to the user, as well as various menus of the mobile phone. The display unit 1040 may include a display panel 1041, which may optionally be configured as a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like. Further, the touch panel 1031 may cover the display panel 1041; when the touch panel 1031 detects a touch operation on or near it, the operation is transmitted to the processor 1080 to determine the type of the touch event, and the processor 1080 then provides a corresponding visual output on the display panel 1041 according to that type. Although in fig. 8 the touch panel 1031 and the display panel 1041 are shown as two separate components implementing the input and output functions of the mobile phone, in some embodiments they may be integrated to implement those functions.

The mobile phone may also include at least one sensor 1050, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor, which may adjust the brightness of the display panel 1041 according to the brightness of ambient light, and a proximity sensor, which may turn off the display panel 1041 and/or the backlight when the mobile phone is moved to the ear. As one kind of motion sensor, an accelerometer can detect the magnitude of acceleration in each direction (generally along three axes) and, when the phone is stationary, the magnitude and direction of gravity; it can be used in applications that recognize the phone's orientation (such as switching between landscape and portrait modes, related games, and magnetometer attitude calibration) and in vibration-recognition functions (such as a pedometer or tap detection). Other sensors that may be configured on the mobile phone, such as a gyroscope, barometer, hygrometer, thermometer, and infrared sensor, are not described here.

The audio circuit 1060, a speaker 1061, and a microphone 1062 may provide an audio interface between the user and the mobile phone. The audio circuit 1060 may convert received audio data into an electrical signal and transmit it to the speaker 1061, which converts it into a sound signal for output; conversely, the microphone 1062 converts a collected sound signal into an electrical signal, which the audio circuit 1060 receives and converts into audio data. The audio data is then output to the processor 1080 for processing and afterwards sent, for example, to another mobile phone via the RF circuit 1010, or output to the memory 1020 for further processing.

WiFi is a short-range wireless transmission technology. Through the WiFi module 1070, the mobile phone can help the user send and receive e-mail, browse web pages, access streaming media, and so on, providing the user with wireless broadband Internet access. Although fig. 8 shows the WiFi module 1070, it is understood that the module is not an essential component of the mobile phone and may be omitted as needed without changing the essence of the invention.

The processor 1080 is the control center of the mobile phone. It connects all parts of the phone through various interfaces and lines, and performs the phone's functions and processes data by running or executing the software programs and/or modules stored in the memory 1020 and invoking data stored in the memory 1020, thereby monitoring the mobile phone as a whole. Optionally, the processor 1080 may include one or more processing units; preferably, the processor 1080 may integrate an application processor, which mainly handles the operating system, user interfaces, applications, and the like, and a modem processor, which mainly handles wireless communication. It is to be appreciated that the modem processor may alternatively not be integrated into the processor 1080.

The mobile phone further includes a power source 1090 (e.g., a battery) that powers the components; preferably, the power source may be logically connected to the processor 1080 through a power management system, which manages charging, discharging, power consumption, and the like.

Although not shown, the mobile phone may further include a camera, a bluetooth module, etc., which are not described herein.

In the embodiment of the present application, the processor 1080 included in the terminal further has the following functions:

The embodiment of the present application further provides a computer-readable storage medium for storing a program code, where the program code is configured to execute any one implementation of the content mining method described in the foregoing embodiments.

The present application further provides a computer program product including instructions, which when run on a computer, cause the computer to perform any one of the embodiments of a content mining method described in the foregoing embodiments.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented as a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application essentially, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims

1. A method of content mining, comprising:

determining target mining content based on the interaction behavior data of the target user on a candidate content set, wherein the target mining content comprises one or more contents in the candidate content set;

the determining a target user from the candidate user set according to the interaction behavior data of the candidate user set associated with the seed content set to the seed content set includes:

obtaining, according to the interaction behavior data of a user in the candidate user set on the seed content set, behavior scores of a plurality of preset interaction behaviors of the user in the seed content set; the plurality of preset interaction behaviors comprise one or more social behaviors; the social behavior comprises one of: clicking content, adding content to favorites, commenting on content, or sharing content;

obtaining behavior credibility scores of the user in the seed content set according to behavior scores of various preset interactive behaviors of the user in the seed content set and weight parameters corresponding to the various preset interactive behaviors;

determining a target user from the candidate user set according to the behavior credibility score of the user in the seed content set;

the plurality of preset interaction behaviors include: the 1st interaction behavior to the Nth interaction behavior, wherein N is an integer greater than 1; the candidate content set comprises the 1st candidate content to the Kth candidate content, and K is an integer greater than 1;

determining target mining content based on the interaction behavior data of the target user on the candidate content set, including:

accumulating, according to the behavior data of the ith interaction behavior of the target user on the jth candidate content in the candidate content set, to obtain the interaction score of the target user in the jth candidate content; wherein 1 ≤ j ≤ K and j is an integer, and 1 ≤ i ≤ N and i is an integer;

obtaining the credibility score of the jth candidate content according to the interaction score of the target user in the jth candidate content and the behavior credibility score of the target user in the seed content set;

and determining target mining content from the candidate content set according to the credibility score of each candidate content in the candidate content set.
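For illustration only, the two scoring steps of claim 1 (the user's behavior credibility score in the seed content set, and the target user's accumulated interaction score on each candidate content) could be sketched in Python as follows. The event-log format, the behavior labels, the weight values, and the use of simple occurrence counts as "behavior data" are all assumptions; the claim does not fix any of them.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical interaction-log record: (user_id, content_id, behavior).
Event = Tuple[str, str, str]

# Hypothetical labels and weight parameters for the preset interaction behaviors.
WEIGHTS: Dict[str, float] = {
    "view": 0.1, "click": 0.2, "favorite": 0.8, "comment": 1.0, "share": 1.2,
}


def user_behavior_credibility(
    events: Iterable[Event], seed: Set[str], weights: Dict[str, float] = WEIGHTS
) -> Dict[str, float]:
    """Behavior credibility score of each user in the seed content set.

    Assumption: the behavior score of a preset behavior is the number of times
    the user performed it on seed contents; the credibility score is the
    weighted sum of those behavior scores.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for user, content, behavior in events:
        if content in seed and behavior in weights:
            counts[user][behavior] += 1
    return {
        user: sum(weights[b] * n for b, n in per_user.items())
        for user, per_user in counts.items()
    }


def interaction_scores(
    events: Iterable[Event],
    target_users: Set[str],
    candidates: Set[str],
    behaviors: Set[str],
) -> Dict[Tuple[str, str], float]:
    """Accumulate each target user's behavior data over the N preset behaviors
    on each candidate content; returns {(user_id, content_id): score}."""
    scores: Dict[Tuple[str, str], float] = defaultdict(float)
    for user, content, behavior in events:
        if user in target_users and content in candidates and behavior in behaviors:
            scores[(user, content)] += 1.0  # one unit of behavior data per event
    return dict(scores)
```

Weighting social behaviors (favorites, comments, shares) more heavily than passive viewing is one way to reflect that they are stronger signals of interest; the actual weight parameters would have to be chosen for the platform in question.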

2. The method of claim 1, wherein the target mining content comprises a first target content and a second target content, and the determining target mining content based on the interaction behavior data of the target user on the candidate content set comprises:

determining first target content from the candidate content set according to the interaction behavior data of the target user on the candidate content set;

determining a content producer of the first target content;

acquiring second target content uploaded by the content producer, the second target content including one or more contents uploaded by the content producer other than the first target content.
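The producer-based expansion of claim 2 might be sketched in Python as follows, for illustration only. The lookup tables `producer_of` (content to producer) and `uploads_by` (producer to uploaded contents) are hypothetical stand-ins for whatever metadata store a platform actually uses.

```python
from typing import Dict, Iterable, List, Set


def expand_by_producer(
    first_target: Iterable[str],
    producer_of: Dict[str, str],
    uploads_by: Dict[str, List[str]],
) -> Set[str]:
    """Second target content: other uploads by producers of the first target content.

    `producer_of` and `uploads_by` are hypothetical metadata lookups.
    """
    first = set(first_target)
    # Content producers of the first target content (contents with no known
    # producer are skipped).
    producers = {producer_of[c] for c in first if c in producer_of}
    second: Set[str] = set()
    for producer in producers:
        # Exclude contents already in the first target content.
        second.update(c for c in uploads_by.get(producer, []) if c not in first)
    return second
```

Expanding through producers exploits the tendency of a producer who uploads one content with the preset attribute to upload similar contents, without requiring any user interaction on those other contents yet.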

3. The method of claim 2, further comprising:

constructing a new seed content set according to the first target content and the second target content;

and when a preset mining ending condition is not met, taking the new seed content set as the seed content set for the next round of content mining, and returning to execute the step of determining a target user from the candidate user set according to the interaction behavior data of the candidate user set associated with the seed content set on the seed content set, and the subsequent steps, until the preset mining ending condition is met, at which point content mining stops.
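The round-by-round loop of claim 3 could be sketched in Python as follows, for illustration only. `mine_round` is a hypothetical callable standing for one full round of claims 1 and 2 (seed content set in, first and second target content out). Because the preset mining ending condition is not specified in this text, the sketch assumes one: a maximum number of rounds, or a round that yields no previously unseen content.

```python
from typing import Callable, Set, Tuple

# Hypothetical: one mining round maps a seed content set to
# (first target content, second target content).
MineRound = Callable[[Set[str]], Tuple[Set[str], Set[str]]]


def iterative_mining(seed: Set[str], mine_round: MineRound, max_rounds: int = 5) -> Set[str]:
    """Repeat mining rounds, feeding each round's results back as the next seed set.

    Assumed ending condition: max_rounds reached, or a round yields no content
    that has not already been seen.
    """
    seen: Set[str] = set(seed)
    mined: Set[str] = set()
    current: Set[str] = set(seed)
    for _ in range(max_rounds):
        first, second = mine_round(current)
        new_seed = first | second  # new seed content set for the next round
        fresh = new_seed - seen
        if not fresh:  # nothing new was mined: treat as the ending condition
            break
        mined |= fresh
        seen |= fresh
        current = new_seed
    return mined
```

Tracking `seen` prevents the loop from cycling indefinitely when two contents keep leading back to each other.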

4. The method of claim 3, wherein the constructing a new set of seed content from the first target content and the second target content comprises:

determining the content with the preset content attribute from the first target content and the second target content as qualified mining content;

and constructing the new seed content set according to the qualified mining content.
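The qualification step of claim 4 reduces to a filter, sketched here in Python for illustration only. The predicate `has_preset_attribute` is hypothetical; in practice it might be a classifier, a manual audit, or a metadata check, none of which is specified in this text.

```python
from typing import Callable, Iterable, Set


def build_new_seed(
    first_target: Iterable[str],
    second_target: Iterable[str],
    has_preset_attribute: Callable[[str], bool],
) -> Set[str]:
    """New seed content set: the mined contents that carry the preset content attribute.

    `has_preset_attribute` is a hypothetical predicate supplied by the caller.
    """
    candidates = set(first_target) | set(second_target)
    return {c for c in candidates if has_preset_attribute(c)}
```

Filtering before reseeding keeps contents without the preset attribute from propagating into later rounds.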

5. The method of claim 3, wherein the preset mining ending condition comprises:

6. The method of claim 1, wherein the determining a target user from the candidate user set according to the behavior credibility scores of the users in the seed content set comprises:

determining, as target users, users in the candidate user set whose behavior credibility scores in the seed content set exceed a first preset score; or,

sorting the behavior credibility scores of the users in the candidate user set in the seed content set in descending order, and determining the users corresponding to the top preset first number of scores as target users.
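The two alternatives of claim 6 (a score threshold, or the top preset number of users after sorting in descending order) could be sketched in Python as follows, for illustration only. The function name and the tie-breaking by user id are assumptions.

```python
from typing import Dict, List, Optional


def select_target_users(
    credibility: Dict[str, float],
    threshold: Optional[float] = None,
    top_n: Optional[int] = None,
) -> List[str]:
    """Select target users by exactly one of the two rules of claim 6.

    threshold: keep users whose score strictly exceeds the first preset score.
    top_n: keep the first top_n users after sorting scores in descending order
    (ties broken by user id, an assumption made for determinism).
    """
    if (threshold is None) == (top_n is None):
        raise ValueError("specify exactly one of threshold or top_n")
    if threshold is not None:
        return sorted(u for u, s in credibility.items() if s > threshold)
    ranked = sorted(credibility.items(), key=lambda kv: (-kv[1], kv[0]))
    return [u for u, _ in ranked[:top_n]]
```

The threshold rule adapts the number of target users to the score distribution, while the top-N rule bounds it, which can make the cost of the later content-scoring step predictable.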

7. The method according to claim 1 or 6, wherein the plurality of preset interaction behaviors comprises: viewing content, clicking content, adding content to favorites, commenting on content, and sharing content.

8. A content mining apparatus, comprising:

a content determining unit, configured to determine target mining content based on interaction behavior data of the target user on a candidate content set, where the target mining content includes one or more contents in the candidate content set;

the user determination unit includes:

the behavior score calculating subunit is used for obtaining, according to the interaction behavior data of a user in the candidate user set on the seed content set, behavior scores of a plurality of preset interaction behaviors of the user in the seed content set; the plurality of preset interaction behaviors comprise one or more social behaviors; the social behavior comprises one of: clicking content, adding content to favorites, commenting on content, or sharing content;

the target user determining subunit is used for determining a target user from the candidate user set according to the behavior credibility score of the user in the seed content set;

the plurality of preset interaction behaviors include: the 1st interaction behavior to the Nth interaction behavior, wherein N is an integer greater than 1; the candidate content set comprises the 1st candidate content to the Kth candidate content, and K is an integer greater than 1; the content determining unit includes:

the interaction score calculating subunit is configured to accumulate, according to the behavior data of the ith interaction behavior of the target user on the jth candidate content in the candidate content set, to obtain the interaction score of the target user in the jth candidate content; wherein 1 ≤ j ≤ K and j is an integer, and 1 ≤ i ≤ N and i is an integer;

the credibility score calculating subunit is used for obtaining the credibility score of the jth candidate content according to the interaction score of the target user in the jth candidate content and the behavior credibility score of the target user in the seed content set;

and the content determining subunit is used for determining target mining content from the candidate content set according to the credibility score of each candidate content in the candidate content set.

9. A computer device, the device comprising a processor and a memory:

the memory is used for storing program codes and transmitting the program codes to the processor;

the processor is adapted to perform the steps of the content mining method according to any of claims 1-7, according to instructions in the program code.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium is configured to store program code for performing the content mining method of any of claims 1-7.