CN108319622B

CN108319622B - Media content recommendation method and device

Info

Publication number: CN108319622B
Application number: CN201710037620.6A
Authority: CN
Inventors: 李天浩; 何翔; 郭卫敏; 姬硕
Original assignee: Tencent Technology Beijing Co Ltd
Current assignee: Tencent Technology Beijing Co Ltd
Priority date: 2017-01-18
Filing date: 2017-01-18
Publication date: 2022-11-11
Anticipated expiration: 2037-01-18
Also published as: CN108319622A

Abstract

The application provides a media content recommendation method. After candidate media contents are obtained through collaborative filtering, the method further considers both the user similarity and the content similarity between each candidate media content and the media content browsed by the user, and filters the candidate media contents accordingly to obtain related recommended media contents, thereby mining media contents with stronger relevance and readability. The application also provides a corresponding media content recommendation device.

Description

Media content recommendation method and device

Technical Field

The present application relates to the field of internet technologies, and in particular, to a method and an apparatus for recommending media content.

Background

At present, with the rapid development of the internet, the amount of network data keeps growing. This makes it convenient for network users to acquire information, but it also causes information overload. How to quickly and effectively search for and locate required information in massive data has become a prominent problem in current internet development and a hotspot of network information retrieval research.

To address the above problems, many media content platforms recommend other related media content to a user when the user accesses or browses a media content. For example, after a user opens a news item, the news website recommends other news related to the news currently displayed by the news client as extended reading. This kind of news recommendation can recommend personalized news to the user, help the user find interesting content, and help the user find needed resources quickly and accurately, so it has broad application prospects.

Disclosure of Invention

The application provides a media content recommendation method, which comprises the following steps: acquiring user access records of each media content in a preset time period, determining and storing a first similarity between every two media contents according to the user access records, wherein the first similarity represents the similarity of access users of the two media contents; receiving a media content recommendation request sent by an application client, wherein when a first media content in the media contents is accessed by a user in the application client, the application client sends the media content recommendation request; responding to the media content recommendation request, acquiring first similarity of each second media content and the first media content from the stored first similarity, and taking each second media content with the first similarity exceeding a first preset threshold as candidate media content; calculating a second similarity between each candidate media content and the first media content, the second similarity characterizing content similarity of the two media contents; calculating a recommendability score for each candidate media content based on a first similarity and a second similarity between each candidate media content of the candidate media contents and the first media content, and regarding candidate media contents with recommendability scores exceeding a second preset threshold as related recommended media contents of the first media content; and sending the link of the related recommended media content of the first media content to the application client.

The present application also provides a media content recommendation apparatus, including: the first similarity determining unit is used for acquiring user access records of each media content in a preset time period, determining and storing a first similarity between every two media contents according to the user access records, wherein the first similarity represents the similarity of access users of the two media contents; a media content recommendation request receiving unit, configured to receive a media content recommendation request sent by an application client, where when a first media content in the media contents is accessed by a user in the application client, the application client sends the media content recommendation request; the candidate media content determining unit is used for responding to the media content recommendation request, acquiring first similarity of each second media content and the first media content from the stored first similarity, and taking each second media content with the first similarity exceeding a first preset threshold value as candidate media content; the second similarity calculation unit is used for calculating second similarity between each candidate media content and the first media content, and the second similarity represents the content similarity of the two media contents; a relevant recommended media content determining unit, configured to calculate a recommendability score of each candidate media content based on a first similarity and a second similarity between each candidate media content of the candidate media contents and the first media content, and use a candidate media content with a recommendability score exceeding a second preset threshold as a relevant recommended media content of the first media content; a sending unit, configured to send a link of a recommended media content related to the first media content to the application client.

By adopting the scheme provided by the application, the related recommendable media content with stronger relevance can be obtained.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a schematic diagram of a system architecture related to a media content recommendation method proposed in an example of the present application;

FIG. 2 is a schematic view of a user interface to which the present application relates;

FIG. 3 is a flow chart of a method for recommending media content according to an embodiment of the present application;

FIG. 4 is a schematic diagram of a comparison of a collaborative filtering recommendation algorithm and a content-based recommendation algorithm with respect to click through rate;

FIG. 5 is a schematic flow chart of calculating and obtaining a first similarity;

FIG. 6 is a diagram illustrating coverage of recommended results after parallel computation using large and small windows;

FIG. 7 is a schematic flow chart of calculating a first similarity using cosine similarities;

FIG. 8 is a schematic flow chart illustrating a process of calculating a second similarity using cosine similarity;

FIG. 9 is a graphical illustration of click through rate after employing a deduplication strategy;

FIG. 10 is a schematic diagram of a media content recommender according to an example of the present application; and

FIG. 11 is a block diagram of a computing device in which a media content recommendation apparatus according to an example of the present application is located.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.

The application provides an internet-based media content recommendation method, which can be applied to the system architecture shown in fig. 1. As shown in fig. 1, the system architecture includes: an Application (APP) client 101, a push information platform 102 and a push information provider client 105, which entities may communicate over the internet 106, wherein the push information platform 102 comprises an application server 103 and a user access record database 104.

An end user may use an application client 101 to access an application server 103 in a push information platform 102, for example to browse news or articles. When a user accesses the application server 103 by using the application client 101, the application client 101 may report the access behavior of the user to the application server 103, and the application server 103 stores the access behavior data of the user in the user access record database 104. While reporting the user access behavior, the application client 101 may send an information push request to the push information platform 102, and the push information platform 102 may push the media content matching the information push request to the application client 101. Through the push information provider client 105, the push information provider can upload the material of the media content it wants to push to the push information platform 102, which generates the corresponding media content for pushing.

When the media content is news, the system architecture shown in fig. 1 may be a system architecture for implementing news recommendation, where the push information platform 102 may be a news push platform, the push information provider may be a news publisher, the application client 101 is a news client, and the application server 103 is a news server. Fig. 2 shows a page of a news APP client, which displays a news item being browsed by a user and, below it, three recommended news items related to the browsed news. Each recommended news item includes a title and a picture, and when the title or picture of a recommended news item is clicked, the application client 101 shows the complete content of that recommended news item. When a user browses news by using a news client, the news client sends a news push request to the news push platform, the news push platform sends links to recommended news related to the browsed news to the news client, and the news client displays the links as text or pictures in a related-reading section below the browsed news. When the user clicks the text or picture, the news client displays the full content of the recommended news.

In some examples, the push information platform 102 obtains the related recommended media content by content-based recommendation: according to the keyword information of the media content being browsed by the user, it takes media content whose keyword relevance score meets a threshold condition as the related recommended media content. Current content-based recommendation has the following drawback: relying on content relevance alone makes it difficult to guarantee the quality of the recommended media content, and situations that harm the user experience occur easily, for example outdated articles being recommended alongside current-affairs articles.

Based on the above technical problem, the present application provides a media content recommendation method, which can be applied to the push information platform 102, as shown in fig. 3, and the method includes the following steps:

Step 301: acquiring user access records of each media content in a preset time period, and determining and storing a first similarity between every two media contents according to the user access records, wherein the first similarity represents the similarity of the access users of the two media contents.

In this step, the user access records of each media content are obtained first, that is, the access records of all users for each media content in the most recent predetermined time period. When an end user accesses the application server by using the application client 101, the application client 101 reports the access behavior data of the user to the application server 103, and the application server 103 stores the access behavior data of the user in the user access record database 104. In this step, the application server 103 obtains the access records of all users for each media content in the predetermined time period from the user access record database 104. The behavior record data of each user may have the format (user A, media content 1, media content 3, media content 5, …), where user A may be the user ID of the user. The user ID may be, for example, any of the accounts that user A has registered with various apps and websites, such as an instant messaging number (e.g. QQ), an e-mail address, a WeChat account, a microblog account, or a Taobao account.
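The record format above lists, per user, the media contents that user accessed, whereas the similarity calculation later needs, per media content, the set of users who accessed it. A minimal Python sketch of that inversion follows; the function name `invert_access_records` and the sample identifiers are hypothetical and do not come from the application.

```python
from collections import defaultdict

def invert_access_records(records):
    """Turn (user, item, item, ...) tuples into a mapping item -> set of users.

    Hypothetical helper: the application only describes the record format,
    not how the records are indexed.
    """
    visitors = defaultdict(set)
    for user_id, *items in records:
        for item in items:
            visitors[item].add(user_id)
    return dict(visitors)

# Sample records in the (user, media content, ...) format described above.
records = [
    ("user_a", "content_1", "content_3", "content_5"),
    ("user_b", "content_1", "content_5"),
]
visitors = invert_access_records(records)
```

Indexing by media content makes it cheap to build the access user vector of any single media content later on.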

In some examples, dirty data is filtered out after the user access records of each media content are obtained, for example by removing from the obtained user access records those that are clearly not user behavior and/or those of users with little influence. An access record that is clearly not user behavior is one in which a machine accesses a large amount of media content within the predetermined time period, beyond the access capability of a normal user. An access record of a user with little influence is one belonging to a user who accessed very little media content within the predetermined time period, for example a user who accessed only one media content in 12 hours. After dirty data has been filtered out, the first similarity between every two media contents is calculated on the basis of the first-level classification of the media contents: rather than being calculated between arbitrary pairs, the first similarity is calculated only between media contents under the same first-level classification, because media contents under the same first-level classification are more strongly correlated than media contents under different first-level classifications. For example, the first-level classifications of news media content may include sports, current affairs, entertainment, finance, and so on. Two news items whose first-level classification is sports are strongly correlated, so the first similarity between them is calculated. By contrast, a sports news item and a current-affairs news item are generally weakly correlated, so the first similarity is not calculated between news items under different first-level classifications, which improves calculation efficiency.
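The two preprocessing ideas in this paragraph, dropping dirty records and pairing media contents only within one first-level classification, can be sketched as follows. This is only an illustration under assumptions: the bounds `min_items` and `max_items`, the function names, and the sample data are hypothetical, and the application does not specify concrete limits.

```python
from collections import defaultdict
from itertools import combinations

def filter_access_records(records, min_items=2, max_items=500):
    """Keep users whose number of distinct accessed items looks like normal reading.

    min_items and max_items are assumed bounds: fewer items means the user
    carries little signal, far more suggests machine traffic.
    """
    return {
        user: items
        for user, items in records.items()
        if min_items <= len(set(items)) <= max_items
    }

def same_category_pairs(item_category):
    """List item pairs that share a first-level classification."""
    by_category = defaultdict(list)
    for item, category in item_category.items():
        by_category[category].append(item)
    pairs = []
    for items in by_category.values():
        pairs.extend(combinations(sorted(items), 2))
    return sorted(pairs)

records = {
    "user_a": ["n1", "n3", "n5"],
    "user_b": ["n2"],                           # a single item: little influence
    "crawler": [f"n{k}" for k in range(1000)],  # beyond normal reading capacity
}
categories = {"n1": "sports", "n3": "sports", "n5": "finance", "n2": "finance"}

kept = filter_access_records(records)
pairs = same_category_pairs(categories)
```

With four items there are six possible pairs, but only two fall inside a single classification, which is the efficiency gain the paragraph describes.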

Step 302: receiving a media content recommendation request sent by an application client, wherein when a first media content in the media contents is accessed by a user in the application client, the application client sends the media content recommendation request.

When a user accesses first media content using the application client 101, the application client 101 sends a message of a media content recommendation request to the push information platform 102.

Step 303: in response to the media content recommendation request, acquiring the first similarity between each second media content and the first media content from the stored first similarities, and taking each second media content whose first similarity with the first media content exceeds a first preset threshold as candidate media content.

After receiving the media content recommendation request sent by the application client 101, the push information platform 102 obtains the first similarity between each second media content and the first media content from the first similarities between every two media contents calculated and stored in step 301, and takes each second media content whose first similarity exceeds a first preset threshold as a candidate media content, thereby completing the first filtering stage for the related recommended media content. The candidate media content is thus obtained by item-based collaborative filtering: the similarity between items is calculated from the historical access records of all users, and items similar to those a user likes are recommended to that user. FIG. 4 compares the click-through rate of item-based collaborative filtering recommendations with that of existing content-based recommendations, with each method accounting for 50% of the recommendations delivered; recommendations using item-based collaborative filtering show a clear improvement in click-through rate over content-based recommendations. In other examples of the present application, candidate media content may instead be obtained by a content-based recommendation method, by a user-based collaborative filtering method, or by combining a content-based recommendation method, a user-based collaborative filtering method, and an item-based collaborative filtering method. When a large number of candidate media contents are obtained, the top 100 ranked by first similarity may be taken.
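The candidate selection in step 303 amounts to a threshold filter followed by a top-N cut on the stored first similarities. A minimal sketch, assuming the stored similarities are held as a nested dictionary keyed by media content ID (a hypothetical layout, as are the function name and sample values):

```python
def select_candidates(first_sim, first_item, threshold, top_n=100):
    """Return (item, first similarity) pairs above the threshold, best first.

    first_sim is assumed to map item -> {other item -> first similarity}.
    """
    row = first_sim.get(first_item, {})
    above = [(item, sim) for item, sim in row.items() if sim > threshold]
    above.sort(key=lambda pair: pair[1], reverse=True)
    return above[:top_n]

stored = {"n1": {"n2": 0.8, "n3": 0.1, "n4": 0.5, "n5": 0.4}}
candidates = select_candidates(stored, "n1", threshold=0.3, top_n=2)
```

Here `n3` is dropped by the threshold and `n5` by the top-N cut, leaving `n2` and `n4` as candidates.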

Step 304: a second similarity between each candidate media content and the first media content is calculated, the second similarity characterizing content similarity of the two media contents.

Each media content has keywords, which appear in its title or body, and media content can be retrieved by these keywords. The more keywords two media contents share, the greater their content similarity.

Step 305: calculating a recommendability score for each candidate media content based on a first similarity and a second similarity between each candidate media content of the candidate media contents and the first media content, and regarding the candidate media content with the recommendability score exceeding a second preset threshold as a relevant recommended media content of the first media content.

When related news is recommended, unlike traditional recommendation based only on content similarity, the first similarity and the second similarity of each candidate media content are considered together; in addition, the popularity and the time-freshness of each candidate media content may also be considered.

Step 306: sending the link of the related recommended media content of the first media content to the application client.

After the push information platform obtains the related recommended media content of the media content being browsed by the user, it sends the links of the related recommended media content to the application client. The application client displays the links as text or pictures in the related-reading section below the media content being browsed, and when the user clicks the text or picture, the application client displays the full content of the related recommended media content.

With the media content recommendation method provided by the application, after the candidate media content is obtained through collaborative filtering, both the user similarity and the content similarity between each candidate media content and the media content browsed by the user are further considered, and the candidate media content is filtered accordingly to obtain the related recommended media content, so that media contents with stronger relevance and readability are mined.

In some examples, determining the first similarity between every two media contents according to the user access records in step 301 may, as shown in fig. 5, include the following steps:

Step 501: calculating the first similarity between every two media contents according to the user access records of each media content in the most recent first preset time period, recalculating once every first period, and storing the first similarity in a first set, wherein the first period is less than the first preset time.

In this step, a first preset time is set in advance, the first similarity between every two media contents is calculated from the user access records in the most recent first preset time period, and the calculation is repeated once every first period. For example, the first preset time may be 12 hours and the first period may be 1 hour: the first similarity between two media contents is calculated from the user access records of the last 12 hours, the calculation is repeated every hour, and the calculated first similarity is stored in the first set. In this step the first similarity is calculated through a large window; because the first preset time is long, the other media contents similar to a given media content are obtained relatively comprehensively. However, the first period of the large-window calculation is also relatively long, and such a long update period can hardly meet the related-reading needs of fast-spreading breaking media content. For example, when the update period is 1 hour and a piece of breaking news appears within that hour, the large window has not yet calculated the recommended news related to it when a user browses it, so no related news can be recommended for the breaking news. To solve this problem, the present application performs a small-window calculation in parallel with the large-window calculation, as described in step 502.

Step 502: calculating the first similarity between every two media contents according to the user access records of each media content in the most recent second preset time period, recalculating once every second period, and storing the first similarity in a second set, wherein the second period is less than the second preset time, the second preset time is less than the first preset time, and the second period is less than the first period.

In this step, the first similarity between two media contents is calculated through a small window, whose second preset time is shorter than the first preset time of the large window and whose second period is shorter than the first period of the large window. For example, if the second preset time is 1 hour and the second period is 10 minutes, the small window calculates the first similarity between two media contents from the user access records of the last 1 hour, repeats the calculation every 10 minutes, and stores the calculated first similarity in the second set. Thus, while the large window is still calculating, the small-window data is kept up to date, which compensates for the calculation delay of the large window.
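The large and small windows differ only in how far back they look and how often they are refreshed. The sketch below shows just the selection of access events for each window, using the 12-hour and 1-hour lengths from the examples above; the event layout `(timestamp, user, item)`, the function name, and the sample timestamps are assumptions made for illustration, and the periodic scheduling itself is not shown.

```python
from datetime import datetime, timedelta

def records_in_window(events, now, window):
    """Select (user, item) pairs whose timestamp lies within the last `window`."""
    start = now - window
    return [(user, item) for ts, user, item in events if start < ts <= now]

now = datetime(2017, 1, 18, 12, 0)
events = [
    (datetime(2017, 1, 18, 1, 30), "user_a", "n1"),   # 10.5 hours ago
    (datetime(2017, 1, 18, 11, 40), "user_b", "n9"),  # 20 minutes ago
    (datetime(2017, 1, 17, 20, 0), "user_c", "n1"),   # 16 hours ago
]

# Large window: last 12 hours, recomputed every hour in the example above.
large_window_records = records_in_window(events, now, timedelta(hours=12))
# Small window: last 1 hour, recomputed every 10 minutes in the example above.
small_window_records = records_in_window(events, now, timedelta(hours=1))
```

The small window sees far fewer records, which is why it can be refreshed more often but yields less comprehensive similarities.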

After the first similarity between every two media contents is obtained, when the first media content is accessed by a user in the application client, the application client sends a media content recommendation request to the push information platform. The obtaining of the first similarity between each second media content and the first media content includes: the push information platform searches for and obtains a first similarity between each second media content and the first media content in the first set and the second set, specifically as described in the following steps.

Step 503: judging whether the first similarity between each second media content and the first media content is stored in the first set.

When the first similarity between each second media content and the first media content is to be determined, the first set is searched first to see whether the large window has already calculated it. If the first similarity between each second media content and the first media content exists in the first set, step 504 is executed to obtain it from the first set; otherwise, step 505 is executed to obtain it from the second set.

Step 504: searching for and obtaining the first similarity between each second media content and the first media content in the first set. The first similarity calculated by the large window is relatively comprehensive, so when the large window has already calculated the first similarity, it is preferentially obtained from the first set corresponding to the large window.

Step 505: searching for and obtaining the first similarity between each second media content and the first media content in the second set. When the large window has not yet calculated the first similarity, it is obtained from the second set corresponding to the small window.

The first similarity calculated by the large window is comprehensive but subject to calculation delay, whereas the first similarity calculated by the small window is less comprehensive but can be updated promptly. As shown in fig. 6, after large-window and small-window parallel calculation is adopted for item-based collaborative filtering, the coverage of the related recommendation results for media content is improved.
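The lookup order of steps 503 to 505 can be sketched as a simple fallback: prefer the first set (large window) and use the second set (small window) only when the large window has nothing for the first media content. The dictionary layout and names below are hypothetical.

```python
def lookup_first_similarity(first_item, first_set, second_set):
    """Return the stored similarities for first_item, preferring the large window.

    first_set and second_set are assumed to map item -> {other item -> similarity}.
    """
    if first_item in first_set:               # step 503 succeeds, so step 504
        return first_set[first_item]
    return second_set.get(first_item, {})     # step 505

first_set = {"old_news": {"n2": 0.7, "n3": 0.4}}   # large-window results
second_set = {"old_news": {"n2": 0.6},             # small-window results
              "breaking_news": {"n8": 0.5}}

from_large = lookup_first_similarity("old_news", first_set, second_set)
from_small = lookup_first_similarity("breaking_news", first_set, second_set)
```

The breaking item is served from the small window until the next large-window run includes it.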

In some examples, when the first similarity between every two media contents is calculated in steps 501 and 502, the first similarity may be calculated using cosine similarity, which, as shown in fig. 7, may include the following steps:

Step 701: obtaining an access user vector for each of the two media contents.

Based on the user access records in the predetermined time period obtained as described above, assume that 10 users accessed media contents in the predetermined time period, namely user A, user B, user C, user D, user E, user F, user G, user H, user I, and user J. For media content i and media content j, the users who accessed media content i in the predetermined time period are user A, user B, user D, user F, user G, and user I, and the users who accessed media content j in the predetermined time period are user A, user B, user C, user D, user I, and user J. The access user vector of a media content indicates which users accessed it within the predetermined time period, so the access user vector corresponding to media content i is (1,1,0,1,0,1,1,0,1,0) and the access user vector corresponding to media content j is (1,1,1,1,0,0,0,0,1,1).

Step 702: calculating the cosine similarity of the access user vectors of the two media contents.

As described above, the access user vector of media content i is (1,1,0,1,0,1,1,0,1,0) and the access user vector of media content j is (1,1,1,1,0,0,0,0,1,1). The cosine similarity between the two access user vectors is expressed by the following formula (1):

$$\cos(\mathbf{u}_i, \mathbf{u}_j) = \frac{\mathbf{u}_i \cdot \mathbf{u}_j}{\lVert \mathbf{u}_i \rVert \, \lVert \mathbf{u}_j \rVert} \tag{1}$$

where $\mathbf{u}_i$ and $\mathbf{u}_j$ denote the two vectors being compared. The cosine similarity between media content i and media content j can be calculated by formula (1). For the example vectors, the dot product is 4 and each vector has norm $\sqrt{6}$, so the cosine similarity is $4/6 \approx 0.667$.

Step 703: taking the calculated cosine similarity as the first similarity between the two media contents.
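Steps 701 to 703 can be reproduced on the ten-user example with a few lines of Python. This is a minimal sketch of standard cosine similarity over binary access user vectors; the helper names are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an item nobody accessed is treated as similar to nothing
    return dot / (norm_u * norm_v)

users = list("ABCDEFGHIJ")

def access_vector(visitors):
    """Binary vector with one position per user, 1 if that user accessed the item."""
    return [1 if user in visitors else 0 for user in users]

vec_i = access_vector({"A", "B", "D", "F", "G", "I"})  # media content i
vec_j = access_vector({"A", "B", "C", "D", "I", "J"})  # media content j
first_similarity = cosine_similarity(vec_i, vec_j)
```

Four users (A, B, D, I) accessed both media contents and each was accessed by six users, so the first similarity is 4/6, roughly 0.667.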

In some examples, when the second similarity between each candidate media content and the first media content is calculated in step 304, the second similarity may be calculated using cosine similarity, which, as shown in fig. 8, may include the following steps:

Step 801: obtaining a keyword vector for each candidate media content.

The correspondence between all media contents viewed by all users and their keywords is as follows:

media content: tag1, tag2, tag3 …, tag M

In the above correspondence, the media content represents all the media contents viewed by all the users, and each tag represents a keyword extracted from the media contents. A keyword is contained in the title or body of a media content, media content can be searched by keyword, and a keyword may be any combination of Chinese characters, English letters, and digits. tag1 represents the first keyword of all media contents viewed by all users, tag2 the second, tag3 the third, and so on; tag M represents the M-th keyword, where M is the total number of keywords of all media contents viewed by all users. The keyword vector of a media content is determined by which keywords it contains. For example, if media content i contains tag1 and tag3, the keyword vector corresponding to media content i is (1,0,1,0,0,0, …).

Step 802: a keyword vector of first media content is obtained. The manner of obtaining the keyword vector of the first media content is the same as the manner of obtaining the keyword vector of each candidate media content, and is not repeated herein.

Step 803: calculating the cosine similarity of the keyword vector of the first media content and the keyword vector of each candidate media content.

Given the keyword vector of a candidate media content and the keyword vector of the first media content, the cosine similarity of the two keyword vectors is calculated by formula (1).

Step 804: taking the calculated cosine similarity as the second similarity between each candidate media content and the first media content.
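Steps 801 to 804 follow the same pattern as the first similarity, with keywords in place of users. A minimal sketch, assuming a small hypothetical vocabulary of four tags and hypothetical helper names:

```python
import math

def keyword_vector(keywords, vocabulary):
    """Binary vector with one position per vocabulary tag."""
    return [1 if tag in keywords else 0 for tag in vocabulary]

def second_similarity(first_keywords, candidate_keywords, vocabulary):
    """Cosine similarity between the keyword vectors of two media contents."""
    u = keyword_vector(first_keywords, vocabulary)
    v = keyword_vector(candidate_keywords, vocabulary)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

vocabulary = ["tag1", "tag2", "tag3", "tag4"]   # assumed keyword list, M = 4
first = {"tag1", "tag3"}                        # keywords of the first media content
candidate = {"tag1", "tag2", "tag3"}            # keywords of one candidate

first_vector = keyword_vector(first, vocabulary)
similarity = second_similarity(first, candidate, vocabulary)
```

The two media contents share two keywords out of two and three respectively, giving 2/√6, roughly 0.816.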

In some examples, when the recommendability score of each candidate media content is calculated in step 305 based on the first similarity and the second similarity between each candidate media content and the first media content, the first similarity and the second similarity of each candidate media content may be weighted and summed to obtain its recommendability score. The weights of the first similarity and the second similarity may be preset empirically, obtained through machine learning, or adjusted according to sampling results.

In other examples, calculating the recommendability score of each candidate media content in step 305 based on the first similarity and the second similarity between each candidate media content and the first media content may include the following steps:

1) And acquiring the popularity and the time-freshness of each candidate media content. Some media content itself has a hotspot attribute, and a hotspot value is added to such media content. The timeliness is a time decay of the time of media content generation with respect to the current system time.

2) And weighting and summing the first similarity, the second similarity, the heat and/or the time-of-newness of each candidate media content to obtain the recommendability score of each candidate media content.

When the factors considered include the first similarity, the second similarity, the popularity, and the time-freshness, the recommendability score for each candidate media content is calculated using the following equation (2).

$$\mathrm{Score} = \alpha_1 W_{CF} + \alpha_2 W_{Tag} + \alpha_3 W_{Hot} + \alpha_4 W_{Time} \quad (\alpha_i > 0) \tag{2}$$

In formula (2), $W_{CF}$ is the first similarity, $W_{Tag}$ is the second similarity, $W_{Hot}$ is the heat, $W_{Time}$ is the time-freshness, and $\alpha_1$, $\alpha_2$, $\alpha_3$, $\alpha_4$ are the weight parameters of the first similarity, the second similarity, the heat and the time-freshness, respectively. After the recommendability score of each candidate media content is calculated, the candidate media contents whose recommendability scores exceed a second preset threshold are taken as the related recommended media contents of the first media content.
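Formula (2) and the second-threshold filter can be sketched as below. The weight values, the threshold, the field names and in particular the exponential half-life form of the time decay are assumptions chosen for illustration; the application only states that the time-freshness is a time decay and that the weights are positive. Sorting the result by score is also an addition of this sketch.

```python
from typing import Dict, List, Tuple


def time_freshness(age_hours: float, half_life_hours: float = 24.0) -> float:
    """Assumed decay: 1.0 for brand-new content, halved every `half_life_hours`."""
    return 0.5 ** (max(age_hours, 0.0) / half_life_hours)


def recommendability_score(
    w_cf: float,
    w_tag: float,
    w_hot: float,
    w_time: float,
    weights: Tuple[float, float, float, float] = (0.4, 0.3, 0.2, 0.1),
) -> float:
    """Weighted sum of formula (2); the default weights are hypothetical."""
    if any(a <= 0 for a in weights):
        raise ValueError("formula (2) requires every weight to be positive")
    a1, a2, a3, a4 = weights
    return a1 * w_cf + a2 * w_tag + a3 * w_hot + a4 * w_time


def select_related(
    candidates: Dict[str, Dict[str, float]], second_threshold: float
) -> List[str]:
    """Keep candidates whose score exceeds the second preset threshold."""
    scores = {
        cid: recommendability_score(
            f["w_cf"], f["w_tag"], f["w_hot"], time_freshness(f["age_hours"])
        )
        for cid, f in candidates.items()
    }
    kept = [cid for cid, score in scores.items() if score > second_threshold]
    # Highest score first (ordering is not specified in the text).
    return sorted(kept, key=lambda cid: scores[cid], reverse=True)


# Hypothetical candidates with precomputed similarities, heat and age.
candidates = {
    "a": {"w_cf": 0.9, "w_tag": 0.8, "w_hot": 0.5, "age_hours": 0.0},
    "b": {"w_cf": 0.2, "w_tag": 0.1, "w_hot": 0.0, "age_hours": 48.0},
}
related = select_related(candidates, second_threshold=0.5)
```

With these assumed inputs, candidate "a" scores about 0.80 and candidate "b" about 0.135, so only "a" passes a threshold of 0.5.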

In some examples, the media content recommendation method proposed by the present application further includes:

eliminating, from the related recommended media contents, those that have already been accessed by the user.

Because of how related recommendation works, the related recommended media contents calculated for two media contents with similar characteristics or strong relevance often overlap heavily, so that the related reading of different media contents exposes the same related recommended media contents multiple times, which harms the user experience and the exposure of long-tail articles. Eliminating such repeats requires additional storage of the exposure records of the related recommended media contents, which, given the large access volume, is a considerable challenge for both operating efficiency and storage overhead. The present application stores the exposure history in Redis, which supports efficient reading and writing, and the application server maintains an accessed media content queue for each user. When the user browses media content using the application client, the application client reports the user's browsing behavior to the application server, and the application server judges whether an accessed media content queue corresponding to the user exists. If so, the current access behavior data is appended at the tail of the queue; otherwise, an empty queue is established for the user and the current access behavior data is stored in it. To limit the storage of stale data, the accessed media content queue is kept at a fixed length, and when that length is exceeded, data is deleted from the head of the queue. When related recommended media contents are to be pushed to the user, each of them is first checked against the user's accessed media content queue, and any related recommended media content found in the queue is not pushed to the user.

Therefore, the present application can ensure that no already accessed media content appears among the related recommended media contents that the user reads, which safeguards the user's reading experience. FIG. 9 shows the impact of the deduplication strategy on the overall article click-through rate (CTR) and on the click-through rate of self-media articles (OM articles) when the media contents are articles. The gray-release value at each coordinate point on the curves is the rollout proportion of related recommended articles that use the deduplication strategy; the lower curve is the overall article CTR, and the upper curve is the OM article CTR. It has also been found in practice that the total number of exposed articles increases when the deduplication strategy is used. That is, more articles are browsed by users, and the information pushing effect of the whole system is improved.
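The per-user accessed media content queue can be sketched in memory as follows. This is only a stand-in for the Redis-backed storage described above: the class name `AccessHistory`, the method names and the tiny queue length are assumptions for illustration, and `collections.deque` with `maxlen` is used to reproduce the "append at the tail, drop from the head" behavior.

```python
from collections import deque
from typing import Deque, Dict, List


class AccessHistory:
    """In-memory stand-in for the per-user accessed media content queues."""

    def __init__(self, max_length: int = 3) -> None:
        self._max_length = max_length
        self._queues: Dict[str, Deque[str]] = {}

    def record_access(self, user_id: str, content_id: str) -> None:
        """Store one reported browsing event for the user."""
        queue = self._queues.get(user_id)
        if queue is None:
            # No queue exists for this user yet: create an empty one.
            queue = deque(maxlen=self._max_length)
            self._queues[user_id] = queue
        # Appending at the tail of a full deque drops the oldest entry at the head.
        queue.append(content_id)

    def filter_unseen(self, user_id: str, related: List[str]) -> List[str]:
        """Drop related recommended contents found in the user's queue."""
        seen = set(self._queues.get(user_id, ()))
        return [content_id for content_id in related if content_id not in seen]


history = AccessHistory(max_length=2)
for content_id in ["a", "b", "c"]:
    history.record_access("user1", content_id)

# "a" has already been evicted from the fixed-length queue, so it may be pushed again.
to_push = history.filter_unseen("user1", ["a", "b", "d"])
print(to_push)  # ['a', 'd']
```

The fixed length bounds the storage per user at the cost of occasionally re-exposing content that was accessed long ago.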

In some examples, the media content recommendation method proposed by the present application further includes: further filtering the obtained related recommended media contents in combination with the interests of the user. Specifically, the interest features in the user portrait are obtained according to the record of media contents historically accessed by the user, and the obtained related recommended media contents are matched against the interest features of the user to obtain the matching degree between each related recommended media content and the interest features of the user. The higher the matching degree of a related recommended media content, the more likely the user is to be interested in it; when related recommended media contents are recommended to the user, those with a low matching degree are filtered out.
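The interest-based filtering can be sketched as below. The application does not specify how the matching degree is computed; the Jaccard overlap between the keywords of a media content and the interest keywords of the user, the threshold `min_match` and all identifiers are assumptions made purely for illustration.

```python
from typing import Dict, List, Set


def matching_degree(content_tags: Set[str], user_interests: Set[str]) -> float:
    """Assumed measure: Jaccard overlap between content keywords and interests."""
    if not content_tags or not user_interests:
        return 0.0
    shared = content_tags & user_interests
    return len(shared) / len(content_tags | user_interests)


def filter_by_interest(
    related: Dict[str, Set[str]], user_interests: Set[str], min_match: float = 0.2
) -> List[str]:
    """Keep related recommended contents whose matching degree reaches `min_match`."""
    return [
        content_id
        for content_id, tags in related.items()
        if matching_degree(tags, user_interests) >= min_match
    ]


# Hypothetical interest features taken from the user portrait.
user_interests = {"sports", "tech"}

# Hypothetical related recommended media contents with their keywords.
related = {
    "n1": {"sports", "football"},  # one shared keyword out of three distinct ones
    "n2": {"finance"},             # no shared keyword
}
kept = filter_by_interest(related, user_interests)
print(kept)  # ['n1']
```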

Corresponding to the above media content recommendation method, some examples of the present application further provide a media content recommendation apparatus, which is applicable to the application server 103 in the push information platform 102. As shown in FIG. 10, the apparatus includes:

a first similarity determining unit 1001, configured to obtain a user access record for each media content within a predetermined time period, and determine and store a first similarity between every two media contents according to the user access record, where the first similarity represents a similarity of access users of the two media contents;

a media content recommendation request receiving unit 1002, configured to receive a media content recommendation request sent by an application client, where when a first media content in the media contents is accessed by a user in the application client, the application client sends the media content recommendation request;

a candidate media content determining unit 1003, configured to, in response to the media content recommendation request, obtain first similarities of the second media contents and the first media content from the stored first similarities, and use, as candidate media contents, the second media contents whose first similarities with the first media content exceed a first preset threshold;

a second similarity calculation unit 1004 for calculating a second similarity between each candidate media content and the first media content, the second similarity characterizing the content similarity of the two media contents;

a related recommended media content determining unit 1005, configured to calculate a recommendability score of each candidate media content based on a first similarity and a second similarity between each candidate media content of the candidate media contents and the first media content, and to take a candidate media content with a recommendability score exceeding a second preset threshold as a related recommended media content of the first media content;

a sending unit 1006, configured to send a link of a related recommended media content of the first media content to the application client.

By adopting the media content recommendation device provided by the application, on the basis of obtaining the candidate media content through collaborative filtering, the user similarity between each candidate media content and the media content browsed by the user and the content similarity are further comprehensively considered, the candidate media content is further filtered to obtain the related recommended media content, and more media contents with stronger relevance and readability based on the media content are mined.
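The cooperation of units 1003 to 1005 can be sketched end to end as follows. The function name, the two thresholds, the weights and the callable `content_similarity` are hypothetical; the stored first similarities are assumed to have been precomputed by unit 1001, and returning the result sorted by score is an addition of this sketch.

```python
from typing import Callable, Dict, List


def recommend_related(
    first_id: str,
    stored_first_similarity: Dict[str, float],
    content_similarity: Callable[[str, str], float],
    first_threshold: float,
    second_threshold: float,
    cf_weight: float = 0.6,
    tag_weight: float = 0.4,
) -> List[str]:
    """Two-stage filtering: collaborative-filtering candidates, then combined score."""
    # Unit 1003: second media contents whose first similarity exceeds the
    # first preset threshold become candidate media contents.
    candidates = {
        cid: w_cf
        for cid, w_cf in stored_first_similarity.items()
        if cid != first_id and w_cf > first_threshold
    }
    # Units 1004 and 1005: compute the second similarity and the weighted score.
    scores = {
        cid: cf_weight * w_cf + tag_weight * content_similarity(first_id, cid)
        for cid, w_cf in candidates.items()
    }
    kept = [cid for cid, score in scores.items() if score > second_threshold]
    return sorted(kept, key=lambda cid: scores[cid], reverse=True)


# Hypothetical stored first similarities between "n1" and other media contents.
stored = {"n2": 0.8, "n3": 0.75, "n4": 0.1}
# Hypothetical content similarities to "n1".
content_sim_table = {"n2": 0.9, "n3": 0.1, "n4": 1.0}

related = recommend_related(
    "n1",
    stored,
    lambda first, other: content_sim_table[other],
    first_threshold=0.5,
    second_threshold=0.6,
)
print(related)  # ['n2']
```

Here "n4" is never scored because it fails the first threshold, even though its content similarity is high, and "n3" passes the first threshold but is removed by the combined score.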

In some embodiments of the present application, the first similarity determination unit 1001 includes:

a first calculating module, configured to calculate a first similarity between every two media contents according to the user access records of the media contents in the latest first preset time period, recalculate once every first period, and store the first similarity in a first set, where the first period is shorter than the first preset time period;

a second calculating module, configured to calculate a first similarity between every two media contents according to the user access records of the media contents in the latest second preset time period, recalculate once every second period, and store the first similarity in a second set, where the second period is shorter than the second preset time period, the second preset time period is shorter than the first preset time period, and the second period is shorter than the first period;

the candidate media content determination unit 1003 includes:

a first similarity searching and obtaining module, configured to determine whether a first similarity between each second media content and the first media content is stored in the first set, and if so, search and obtain the first similarity between each second media content and the first media content in the first set; otherwise, searching and acquiring the first similarity of each second media content and the first media content in the second set.
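The lookup order of the first similarity searching and obtaining module can be sketched as follows. Representing each set as a dictionary keyed by an unordered pair of content identifiers is an assumption of this sketch, as are the function names and the example values.

```python
from typing import Dict, FrozenSet, Optional

# A similarity set maps an unordered pair of media content ids to a first similarity.
SimilaritySet = Dict[FrozenSet[str], float]


def pair_key(a: str, b: str) -> FrozenSet[str]:
    """Order-independent key, so (a, b) and (b, a) address the same entry."""
    return frozenset((a, b))


def lookup_first_similarity(
    first_id: str,
    second_id: str,
    first_set: SimilaritySet,
    second_set: SimilaritySet,
) -> Optional[float]:
    """Use the first set when it holds the pair, otherwise fall back to the second set."""
    key = pair_key(first_id, second_id)
    if key in first_set:
        return first_set[key]
    return second_set.get(key)


# First set: longer window, recalculated less often (hypothetical values).
first_set: SimilaritySet = {pair_key("n1", "n2"): 0.7}
# Second set: shorter window, recalculated more often, so it can already
# contain recently published media content such as "n3".
second_set: SimilaritySet = {pair_key("n1", "n2"): 0.4, pair_key("n1", "n3"): 0.9}

print(lookup_first_similarity("n1", "n2", first_set, second_set))  # 0.7
print(lookup_first_similarity("n1", "n3", first_set, second_set))  # 0.9
```

The fallback lets media content that is too new to appear in the first set still receive a first similarity from the more frequently refreshed second set.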

In some embodiments of the present application, the related recommended media content determining unit 1005 is configured to weight and sum the first similarity and the second similarity of each candidate media content to obtain the recommendability score of each candidate media content.

In some embodiments of the present application, the related recommended-media content determining unit 1005 is configured to:

obtain the heat and the time-freshness of each candidate media content,

and weight and sum the first similarity, the second similarity, the heat and/or the time-freshness of each candidate media content to obtain the recommendability score of each candidate media content.

In some embodiments of the present application, the media content recommendation apparatus further comprises:

a deduplication unit, configured to eliminate, from the related recommended media contents, those that have already been accessed by the user.

The modules may be implemented in the same server device or server cluster, or may be distributed in different server devices or server clusters.

The implementation principle of the functions of the above modules has been described in detail previously, and is not described in detail herein.

In one example, the modules of the media content recommendation apparatus can run on various computing devices and be loaded into the memory of the computing device.

FIG. 11 shows the structure of a computing device in which the media content recommendation apparatus is located. As shown in FIG. 11, the computing device includes one or more processors (CPUs) 1102, a communication module 1104, a memory 1106, a user interface 1110, and a communication bus 1108 for interconnecting these components.

The processor 1102 may receive and transmit data via the communication module 1104 to enable network communications and/or local communications.

The user interface 1110 includes one or more output devices 1112, including one or more speakers and/or one or more visual displays. The user interface 1110 also includes one or more input devices 1114, including, for example, a keyboard, a mouse, a voice command input unit or microphone, a touch screen display, a touch-sensitive input pad, a gesture-capture camera or other input buttons or controls, and the like.

Memory 1106 may be high-speed random access memory such as DRAM, SRAM, DDR RAM, or other random access solid state memory devices; or non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices.

The memory 1106 stores a set of instructions executable by the processor 1102, including:

an operating system 1116, including programs for handling various basic system services and for performing hardware dependent tasks;

applications 1118, including various applications for media content recommendation, which can implement the process flows in the above examples and may include, for example, some or all of the units of the media content recommendation apparatus shown in FIG. 10. At least one of the units 1001-1006 may store machine-executable instructions. The processor 1102 can perform the functions of at least one of the units 1001-1006 by executing the machine-executable instructions of that unit in the memory 1106.

It should be noted that not all steps and modules in the above flows and structures are necessary, and some steps or modules may be omitted according to actual needs. The execution order of the steps is not fixed and can be adjusted as required. The division of each module is only for convenience of describing adopted functional division, and in actual implementation, one module may be divided into multiple modules, and the functions of multiple modules may also be implemented by the same module, and these modules may be located in the same device or in different devices.

The hardware modules in the embodiments may be implemented in hardware or a hardware platform plus software. The software includes machine-readable instructions stored on a non-volatile storage medium. Thus, embodiments may also be embodied as software products.

In various examples, the hardware may be implemented by specialized hardware or hardware executing machine-readable instructions. For example, the hardware may be specially designed permanent circuits or logic devices (e.g., special purpose processors, such as FPGAs or ASICs) for performing the specified operations. The hardware may also include programmable logic devices or circuits temporarily configured by software (e.g., including a general purpose processor or other programmable processor) to perform certain operations.

In addition, each example of the present application can be realized by a data processing program executed by a data processing apparatus such as a computer. It is clear that a data processing program constitutes the present application. Further, the data processing program, which is generally stored in one storage medium, is executed by directly reading the program out of the storage medium or by installing or copying the program into a storage device (such as a hard disk and/or a memory) of the data processing device. Such a storage medium therefore also constitutes the present application, which also provides a non-volatile storage medium in which a data processing program is stored, which data processing program can be used to carry out any one of the above-mentioned method examples of the present application.

The machine-readable instructions corresponding to the modules in FIG. 11 may cause an operating system or the like running on the computer to perform some or all of the operations described herein. The non-volatile computer-readable storage medium may be a memory provided in an expansion board inserted into the computer, or the instructions may be written to a memory provided in an expansion unit connected to the computer. A CPU or the like mounted on the expansion board or the expansion unit may perform part or all of the actual operations according to the instructions.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. A method for recommending media contents, comprising:

acquiring user access records of each media content in a preset time period, determining and storing a first similarity between every two media contents according to the user access records, wherein the first similarity represents the similarity of access users of the two media contents;

receiving a media content recommendation request sent by an application client, wherein when a first media content in the media contents is accessed by a user in the application client, the application client sends the media content recommendation request;

responding to the media content recommendation request, acquiring first similarity of each second media content and the first media content from the stored first similarity, and taking each second media content with the first similarity exceeding a first preset threshold value as candidate media content;

calculating a second similarity between each candidate media content and the first media content, the second similarity characterizing content similarity of the two media contents;

calculating a recommendability score for each candidate media content based on a first similarity and a second similarity between each candidate media content of the candidate media contents and the first media content, and regarding candidate media contents with recommendability scores exceeding a second preset threshold as related recommended media contents of the first media content;

and sending a link of the related recommended media content of the first media content to the application client.

2. The media content recommendation method according to claim 1, wherein the obtaining a user access record for each media content within a predetermined time period, and determining a first similarity between every two media contents according to the user access record comprises:

calculating a first similarity between every two media contents according to a user access record of each media content in a latest first preset time period, recalculating once after each first period, and storing the first similarity in a first set, wherein the first period is less than the first preset time;

calculating a first similarity between every two media contents according to a user access record of each media content in a second latest preset time period, recalculating every second period, and storing the first similarity in a second set, wherein the second period is shorter than the second preset time, the second preset time is shorter than the first preset time, and the second period is shorter than the first period;

the obtaining of the first similarity between each second media content and the first media content includes:

judging whether first similarity between each second media content and the first media content is stored in the first set, if so, searching and acquiring the first similarity between each second media content and the first media content in the first set; otherwise, searching and acquiring the first similarity of each second media content and the first media content in the second set.

3. The media content recommendation method of claim 2, wherein the calculating a first similarity between each two media contents comprises:

obtaining an access user vector of each of the two media contents;

calculating cosine similarity of access user vectors of the two media contents;

and taking the cosine similarity obtained by calculation as the first similarity between the two media contents.

4. The media content recommendation method of claim 1, wherein said calculating a second similarity between each candidate media content and the first media content comprises:

obtaining a keyword vector of each candidate media content;

obtaining a keyword vector of the first media content;

calculating cosine similarity of the keyword vector of the first media content and the keyword vector of each candidate media content;

and taking the cosine similarity obtained by calculation as a second similarity between each candidate media content and the first media content.

5. The media content recommendation method of claim 1, wherein said calculating a recommendability score for each candidate media content comprises: and weighting and summing the first similarity and the second similarity of each candidate media content to obtain the recommendability score of each candidate media content.

6. The media content recommendation method of claim 1, wherein said calculating a recommendability score for each candidate media content comprises:

obtains the popularity and the time-freshness of each candidate media content,

7. The media content recommendation method of claim 1, wherein the method further comprises:

8. A media content recommender, comprising:

the first similarity determining unit is used for acquiring user access records of each media content in a preset time period, determining and storing a first similarity between every two media contents according to the user access records, wherein the first similarity represents the similarity of access users of the two media contents;

a media content recommendation request receiving unit, configured to receive a media content recommendation request sent by an application client, where when a first media content in the media contents is accessed by a user in the application client, the application client sends the media content recommendation request;

the candidate media content determining unit is used for responding to the media content recommendation request, acquiring first similarity of each second media content and the first media content from the stored first similarity, and taking each second media content with the first similarity exceeding a first preset threshold value as candidate media content;

the second similarity calculation unit is used for calculating second similarity between each candidate media content and the first media content, and the second similarity represents the content similarity of the two media contents;

a relevant recommended media content determining unit, configured to calculate a recommendability score of each candidate media content based on a first similarity and a second similarity between each candidate media content of the candidate media contents and the first media content, and use a candidate media content with a recommendability score exceeding a second preset threshold as a relevant recommended media content of the first media content;

a sending unit, configured to send a link of a recommended media content related to the first media content to the application client.

9. The media content recommendation device of claim 8, wherein the first similarity determination unit comprises:

a second calculating module, configured to calculate, according to a user access record of each media content in a second latest preset time period, a first similarity between every two media contents, recalculate every second cycle, and store the first similarity in a second set, where the second cycle is shorter than the second preset time, the second preset time is shorter than the first preset time, and the second cycle is shorter than the first cycle;

the candidate media content determining unit includes:

10. The media content recommender according to claim 8, wherein said related recommended media content determining unit is configured to sum the first similarity and the second similarity of each candidate media content by weighting to obtain the recommendability score of each candidate media content.

11. The media content recommender according to claim 8, wherein the related recommended media content determining unit is adapted to:

obtains the popularity and the time-freshness of each candidate media content,

12. The media content recommendation device of claim 8, wherein the device further comprises: