CN108319622B - Media content recommendation method and device - Google Patents


Info

Publication number
CN108319622B
Authority
CN
China
Prior art keywords
media content
similarity
media
candidate
contents
Prior art date
Legal status
Active
Application number
CN201710037620.6A
Other languages
Chinese (zh)
Other versions
CN108319622A (en)
Inventor
李天浩
何翔
郭卫敏
姬硕
Current Assignee
Tencent Technology Beijing Co Ltd
Original Assignee
Tencent Technology Beijing Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Beijing Co Ltd
Priority to CN201710037620.6A
Publication of CN108319622A
Application granted
Publication of CN108319622B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a media content recommendation method. After candidate media contents are obtained through collaborative filtering, the user similarity and the content similarity between each candidate media content and the media content browsed by the user are jointly considered to further filter the candidates and obtain related recommended media contents, so that media contents with stronger relevance and readability are mined. The application also provides a corresponding media content recommendation device.

Description

Media content recommendation method and device
Technical Field
The present application relates to the field of internet technologies, and in particular, to a method and an apparatus for recommending media content.
Background
At present, with the rapid development of the internet, the volume of network data keeps growing. This makes it easier for network users to obtain information but also causes information overload, so quickly and effectively finding and locating the required information within massive data has become a prominent problem in internet development and a hotspot of network information retrieval research.
To address this problem, many media content platforms recommend other related media content to a user who is accessing or browsing a piece of media content. For example, after a user opens a news item, the news website recommends other news related to the item currently displayed by the news client as extended reading. Such news recommendation can deliver personalized news to the user, help the user discover interesting content and find needed resources quickly and accurately, and has broad application prospects.
Disclosure of Invention
The application provides a media content recommendation method, comprising: acquiring user access records of each media content within a predetermined time period, and determining and storing a first similarity between every two media contents according to the user access records, the first similarity representing the similarity of the users who accessed the two media contents; receiving a media content recommendation request sent by an application client, the application client sending the request when a user accesses a first media content among the media contents in the application client; in response to the request, acquiring the first similarity between each second media content and the first media content from the stored first similarities, and taking each second media content whose first similarity exceeds a first preset threshold as a candidate media content; calculating a second similarity between each candidate media content and the first media content, the second similarity representing the content similarity of the two media contents; calculating a recommendability score for each candidate media content based on its first similarity and second similarity with the first media content, and taking the candidate media contents whose recommendability scores exceed a second preset threshold as related recommended media contents of the first media content; and sending links to the related recommended media contents of the first media content to the application client.
The present application also provides a media content recommendation apparatus, comprising: a first similarity determining unit, configured to acquire user access records of each media content within a predetermined time period, and to determine and store a first similarity between every two media contents according to the user access records, the first similarity representing the similarity of the users who accessed the two media contents; a media content recommendation request receiving unit, configured to receive a media content recommendation request sent by an application client, the application client sending the request when a user accesses a first media content among the media contents in the application client; a candidate media content determining unit, configured to, in response to the request, acquire the first similarity between each second media content and the first media content from the stored first similarities, and take each second media content whose first similarity exceeds a first preset threshold as a candidate media content; a second similarity calculation unit, configured to calculate a second similarity between each candidate media content and the first media content, the second similarity representing the content similarity of the two media contents; a related recommended media content determining unit, configured to calculate a recommendability score for each candidate media content based on its first similarity and second similarity with the first media content, and take the candidate media contents whose recommendability scores exceed a second preset threshold as related recommended media contents of the first media content; and a sending unit, configured to send links to the related recommended media contents of the first media content to the application client.
With the scheme provided by the application, related recommended media content with stronger relevance can be obtained.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic diagram of a system architecture related to a media content recommendation method proposed in an example of the present application;
FIG. 2 is a schematic view of a user interface to which the present application relates;
FIG. 3 is a flow chart of a method for recommending media content according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a comparison of a collaborative filtering recommendation algorithm and a content-based recommendation algorithm with respect to click through rate;
FIG. 5 is a schematic flow chart of calculating and obtaining a first similarity;
FIG. 6 is a diagram illustrating coverage of recommended results after parallel computation using large and small windows;
FIG. 7 is a schematic flow chart of calculating a first similarity using cosine similarities;
FIG. 8 is a schematic flow chart illustrating a process of calculating a second similarity using cosine similarity;
FIG. 9 is a graphical illustration of click through rate after employing a deduplication strategy;
FIG. 10 is a schematic diagram of a media content recommendation apparatus according to an example of the present application; and
FIG. 11 is a block diagram of a computing device in which a media content recommendation apparatus according to an example of the present application is located.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
The application provides an internet-based media content recommendation method, which can be applied to the system architecture shown in FIG. 1. As shown in FIG. 1, the system architecture includes an application (APP) client 101, a push information platform 102, and a push information provider client 105, which can communicate over the internet 106; the push information platform 102 comprises an application server 103 and a user access record database 104.
An end user may use the application client 101 to access the application server 103 in the push information platform 102, for example to browse news or articles. When the user does so, the application client 101 reports the user's access behavior to the application server 103, which stores the access behavior data in the user access record database 104. Along with reporting the access behavior, the application client 101 may send an information push request to the push information platform 102, which pushes media content matching the request to the application client 101. Through the push information provider client 105, a push information provider can upload the material of the media content it wants to push to the push information platform 102, which generates the corresponding media content for pushing.
When the media content is news, the system architecture shown in FIG. 1 may implement news recommendation: the push information platform 102 may be a news push platform, the push information provider may be a news publisher, the application client 101 is a news client, and the application server 103 is a news server. FIG. 2 shows a page of a news APP client displaying a news item the user is browsing and, below it, three recommended news items related to it. Each recommended item consists of a title and a picture; when the title or picture is clicked, the application client 101 displays the full content of the recommended item. When a user browses news with the news client, the news client sends a news push request to the news push platform, which returns links to recommended news related to the browsed news. The news client displays these links as text or pictures in the related-reading area below the browsed news, and when the user clicks the text or a picture, the news client displays the full content of the recommended news.
In some examples, the push information platform 102 obtains related recommended media content by content-based recommendation: based on the keyword information of the media content the user is browsing, it takes media content whose keyword relevance score meets a threshold condition as related recommended media content. Current content-based recommendation has the following drawbacks: relying on content relevance alone makes it difficult to guarantee the quality of the recommended media content, and when current-affairs articles are recommended, for example, outdated articles and other problems that hurt user experience easily appear.
To address the above technical problem, the present application provides a media content recommendation method that can be applied to the push information platform 102. As shown in FIG. 3, the method includes the following steps:
Step 301: acquiring user access records of each media content within a predetermined time period, and determining and storing a first similarity between every two media contents according to the user access records, the first similarity representing the similarity of the users who accessed the two media contents.
In this step, the access records of all users to each media content within the most recent predetermined time period are acquired. When an end user accesses the application server 103 using the application client 101, the application client 101 reports the user's access behavior data to the application server 103, which stores it in the user access record database 104; in this step, the application server 103 retrieves from that database the access records of all users to each media content within the predetermined time period. Each user's behavior record may take the form (user A, media content 1, media content 3, media content 5, …), where user A is the user ID. The user ID may be any of the accounts the user has registered with various apps and websites, for example a QQ instant messaging number, an e-mail address, a WeChat account, a microblog account, or a Taobao account.
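As a minimal sketch, assuming records arrive as Python tuples in the (user, media content, media content, …) layout above, the records can be inverted into a mapping from each media content to the set of users who accessed it, which is the form the later first-similarity calculation needs. The IDs and the `build_access_index` name are hypothetical.

```python
from collections import defaultdict

# Hypothetical records in the (user, media content, media content, ...) layout.
records = [
    ("userA", "content1", "content3", "content5"),
    ("userB", "content1", "content3"),
    ("userC", "content5"),
]


def build_access_index(records):
    """Invert per-user records into {media content: set of accessing user IDs}."""
    index = defaultdict(set)
    for user, *contents in records:
        for content in contents:
            index[content].add(user)
    return dict(index)


index = build_access_index(records)
print(sorted(index["content1"]))  # ['userA', 'userB']
```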
In some examples, dirty data is filtered out after the user access records of each media content are obtained, for example by removing access records that are clearly not user behavior and/or access records of low-influence users. Access records that are clearly not user behavior are those of machines that access a large amount of media content within the predetermined time period, beyond what a normal user could access. Access records of low-influence users are those of users who access very little media content within the predetermined time period, for example users who access only one media content in 12 hours. After the dirty data is filtered out, the first similarity is calculated per first-level category of media content: rather than computing it for every pair of media contents, it is computed only between media contents in the same first-level category, because media contents in the same first-level category are more strongly correlated while those in different first-level categories are weakly correlated. For news, for example, first-level categories may include sports, current affairs, entertainment, and finance. Two news items in the sports category are closely related, so their first similarity is calculated; a sports item and a current-affairs item are generally weakly related, so no first similarity is calculated between news in different first-level categories, which improves calculation efficiency.
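The filtering and per-category pairing can be sketched as follows. This is an illustration under stated assumptions, not the application's implementation: the lower bound of 2 distinct accesses mirrors the "only one media content in 12 hours" example, while the machine-traffic cap of 500 is a hypothetical value, and `filter_users` and `same_category_pairs` are hypothetical names.

```python
from itertools import combinations

MIN_ACCESSES = 2    # users with fewer distinct accesses are "low-influence" (text: 1 in 12 h)
MAX_ACCESSES = 500  # hypothetical cap above which traffic is treated as machine behavior


def filter_users(user_history, min_accesses=MIN_ACCESSES, max_accesses=MAX_ACCESSES):
    """Keep only users whose distinct access count lies in [min_accesses, max_accesses]."""
    return {
        user: contents
        for user, contents in user_history.items()
        if min_accesses <= len(set(contents)) <= max_accesses
    }


def same_category_pairs(category_of):
    """Yield media content pairs sharing a first-level category; cross-category pairs are skipped."""
    by_category = {}
    for content, category in category_of.items():
        by_category.setdefault(category, []).append(content)
    for contents in by_category.values():
        yield from combinations(sorted(contents), 2)


history = {"u1": ["n1"], "u2": ["n1", "n2"], "bot": [f"n{k}" for k in range(1000)]}
print(sorted(filter_users(history)))  # ['u2']

categories = {"n1": "sports", "n2": "sports", "n3": "finance"}
print(list(same_category_pairs(categories)))  # [('n1', 'n2')]
```

Restricting pairs to one category shrinks the pair count from roughly N²/2 over all contents to the sum of that quantity over each category.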
Step 302: receiving a media content recommendation request sent by an application client, wherein when a first media content in the media contents is accessed by a user in the application client, the application client sends the media content recommendation request.
When a user accesses first media content using the application client 101, the application client 101 sends a message of a media content recommendation request to the push information platform 102.
Step 303: in response to the media content recommendation request, acquiring the first similarity between each second media content and the first media content from the stored first similarities, and taking each second media content whose first similarity with the first media content exceeds a first preset threshold as a candidate media content.
After receiving the media content recommendation request from the application client 101, the push information platform 102 obtains the first similarity between each second media content and the first media content from the pairwise first similarities calculated and stored in step 301, and takes each second media content whose first similarity exceeds the first preset threshold as a candidate media content, completing the first filtering stage for related recommended media content. Candidate media content is thus obtained by item-based collaborative filtering: the similarity between items is calculated from the historical access records of all users, and items similar to those a user likes are recommended to that user. FIG. 4 compares the click-through rate of item-based collaborative filtering recommendation with that of the existing content-based recommendation, with traffic split 50% to each; item-based collaborative filtering shows a greater click-through rate than content-based recommendation. In other examples of the present application, candidate media content may instead be obtained by content-based recommendation, by user-based collaborative filtering, or by combining content-based recommendation, user-based collaborative filtering, and item-based collaborative filtering. When many candidate media contents are obtained, the top 100 by first similarity may be kept.
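Candidate selection reduces to a threshold plus a top-k cut. The sketch below assumes the stored first similarities are kept in a dict keyed by unordered content pairs; the threshold value 0.3 and the `select_candidates` name are hypothetical, while the limit of 100 comes from the text.

```python
FIRST_THRESHOLD = 0.3  # hypothetical value for the "first preset threshold"
MAX_CANDIDATES = 100   # the text keeps the top 100 candidates by first similarity


def select_candidates(first_sim, first_content,
                      threshold=FIRST_THRESHOLD, limit=MAX_CANDIDATES):
    """Return [(second content, first similarity)] above threshold, best first.

    first_sim maps frozenset({content_a, content_b}) -> stored first similarity.
    """
    scored = []
    for pair, sim in first_sim.items():
        if first_content in pair and sim > threshold:
            (other,) = pair - {first_content}
            scored.append((other, sim))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:limit]


first_sim = {
    frozenset({"n1", "n2"}): 0.8,
    frozenset({"n1", "n3"}): 0.2,
    frozenset({"n1", "n4"}): 0.5,
    frozenset({"n2", "n3"}): 0.9,
}
print(select_candidates(first_sim, "n1"))  # [('n2', 0.8), ('n4', 0.5)]
```

Scanning every stored pair keeps the example short; a store indexed by content ID would avoid the full scan.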
Step 304: a second similarity between each candidate media content and the first media content is calculated, the second similarity characterizing content similarity of the two media contents.
Each media content has keywords that appear in its title or body and by which the media content can be retrieved. The more keywords two media contents share, the greater their content similarity.
Step 305: calculating a recommendability score for each candidate media content based on a first similarity and a second similarity between each candidate media content of the candidate media contents and the first media content, and regarding the candidate media content with the recommendability score exceeding a second preset threshold as a relevant recommended media content of the first media content.
Unlike traditional recommendation of related news based on content similarity alone, this method jointly considers the first similarity and the second similarity of each candidate media content, and may additionally consider the popularity and freshness of each candidate media content.
Step 306: sending the link of the related recommended media content of the first media content to the application client.
After obtaining the related recommended media content for the media content the user is browsing, the push information platform sends links to the related recommended media content to the application client. The application client displays the links as text or pictures in the related-reading area below the media content being browsed, and when the user clicks the text or picture, the application client displays the full content of the related recommended media content.
With the media content recommendation method provided by the application, after candidate media content is obtained through collaborative filtering, the user similarity and the content similarity between each candidate media content and the media content browsed by the user are jointly considered to further filter the candidates into related recommended media content, so that more media content with stronger relevance and readability is mined.
In some examples, determining the first similarity between every two media contents according to the user access records in step 301 may further include the following steps, as shown in FIG. 5:
Step 501: calculating the first similarity between every two media contents according to the user access records of each media content within the most recent first preset time period, recalculating once every first period, and storing the result in a first set, where the first period is shorter than the first preset time.
In this step, a first preset time is set, the first similarity between every two media contents is calculated from the user access records within the most recent first preset time period, and the calculation is repeated once every first period. For example, with a first preset time of 12 hours and a first period of 1 hour, the first similarity is calculated from the last 12 hours of user access records, recalculated every hour, and stored in the first set. This is the large-window calculation: because the first preset time is long, it captures relatively comprehensively the other media contents similar to a given media content. However, its first period is also relatively long, and such a long update period struggles to meet the related-reading needs of fast-spreading viral media content. For example, with a 1-hour update period, if a piece of breaking news is published within that hour, the large window has not yet calculated any related news for it when users browse it, so no related news can be recommended for the breaking news. To solve this problem, the present application performs a small-window calculation in parallel with the large-window calculation, as described in step 502.
Step 502: according to the user access record of each media content in the latest second preset time period, calculating the first similarity between every two media contents, recalculating once after each second period, and storing the first similarity in a second set, wherein the second period is less than the second preset time, the second preset time is less than the first preset time, and the second period is less than the first period.
In this step, the first similarity between every two media contents is calculated through a small window: the second preset time of the small window is shorter than the first preset time of the large window, and the second period of the small window is shorter than the first period of the large window. For example, with a second preset time of 1 hour and a second period of 10 minutes, the small window calculates the first similarity from the last hour of user access records, updates it every 10 minutes, and stores the result in the second set. Thus, while the large window is still being calculated, the small-window data is kept up to date, compensating for the calculation delay of the large window.
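The two windows can be sketched with a single function applied at different window lengths and run times. This assumes access events are available as (timestamp, user, content) tuples and uses cosine similarity over the access sets, as in FIG. 7; the hourly and 10-minute schedules are only simulated by choosing the run time, and `window_similarity` and the event data are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations

HOUR = 3600


def window_similarity(events, now, window_seconds):
    """First similarity (cosine over access sets) from events in [now - window, now].

    events: iterable of (timestamp_seconds, user_id, content_id). For 0/1 access
    vectors, the dot product is the number of shared users.
    """
    users_of = defaultdict(set)
    for ts, user, content in events:
        if now - window_seconds <= ts <= now:
            users_of[content].add(user)
    sims = {}
    for a, b in combinations(sorted(users_of), 2):
        common = len(users_of[a] & users_of[b])
        if common:
            sims[frozenset((a, b))] = common / math.sqrt(len(users_of[a]) * len(users_of[b]))
    return sims


now = 48 * HOUR
events = [
    (now - 3 * HOUR, "u1", "a"), (now - 3 * HOUR, "u2", "a"),
    (now - 3 * HOUR, "u1", "b"), (now - 3 * HOUR, "u2", "b"),
    # breaking news that appeared 20 minutes ago
    (now - 20 * 60, "u1", "breaking"), (now - 15 * 60, "u2", "breaking"),
    (now - 10 * 60, "u1", "c"), (now - 10 * 60, "u2", "c"),
]
# Large window: 12 h of history, last run 50 minutes ago (hourly schedule).
first_set = window_similarity(events, now - 50 * 60, 12 * HOUR)
# Small window: 1 h of history, just refreshed (10-minute schedule).
second_set = window_similarity(events, now, 1 * HOUR)
print(first_set)   # {frozenset({'a', 'b'}): 1.0}
print(second_set)  # pair ('breaking', 'c') with similarity 1.0
```

The breaking item appears only in the small-window set, which is exactly the gap the small window is meant to cover until the next large-window run.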
After the first similarity between every two media contents is obtained, when a user accesses the first media content in the application client, the application client sends a media content recommendation request to the push information platform. Obtaining the first similarity between each second media content and the first media content then includes: the push information platform searches the first set and the second set for the first similarity between each second media content and the first media content, as described in the following steps.
Step 503: determining whether the first similarity between each second media content and the first media content is stored in the first set.
That is, the first set is searched to see whether the large window has already calculated the first similarity between each second media content and the first media content. If it has, step 504 is executed to look up the first similarity in the first set; otherwise, step 505 is executed to look it up in the second set.
Step 504: searching the first set for the first similarity between each second media content and the first media content. Because the first similarity calculated by the large window is relatively comprehensive, it is looked up preferentially in the first set, which corresponds to the large window, whenever the large window has already calculated it.
Step 505: searching the second set for the first similarity between each second media content and the first media content. If the large window has not yet calculated the first similarity, it is looked up in the second set, which corresponds to the small window.
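Steps 503 to 505 amount to a lookup with fallback. The sketch below interprets "stored in the first set" at the level of the first media content (if the large window has any entry for it, use the large window) and assumes the same frozenset-keyed sets as in a dict; the function names and data are hypothetical.

```python
def related_from(sim_set, first_content):
    """Collect {second content: first similarity} for first_content from one stored set."""
    found = {}
    for pair, sim in sim_set.items():
        if first_content in pair:
            (other,) = pair - {first_content}
            found[other] = sim
    return found


def lookup_first_similarity(first_content, first_set, second_set):
    """Prefer the large-window set (step 504); fall back to the small-window set (step 505)."""
    found = related_from(first_set, first_content)  # step 503: is it in the first set?
    if found:
        return found
    return related_from(second_set, first_content)


first_set = {frozenset({"a", "b"}): 1.0}                     # large window
second_set = {frozenset({"breaking", "c"}): 1.0,             # small window
              frozenset({"a", "c"}): 0.4}
print(lookup_first_similarity("a", first_set, second_set))         # {'b': 1.0}
print(lookup_first_similarity("breaking", first_set, second_set))  # {'c': 1.0}
```

A per-pair fallback (checking each second media content individually) is an equally plausible reading of step 503 and would merge entries from both sets.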
The first similarity calculated by the large window is comprehensive but delayed, whereas that calculated by the small window is less comprehensive but updated promptly. As shown in FIG. 6, running the large and small windows in parallel for item-based collaborative filtering improves the coverage of related recommendation results for media content.
In some examples, when the first similarity between every two media contents is calculated in steps 501 and 502, it may be calculated using cosine similarity, which may further include the following steps, as shown in FIG. 7:
Step 701: obtaining an access user vector for each of the two media contents.
From the user access records within the predetermined time period obtained above, suppose 10 users accessed media content within the period: user A, user B, user C, user D, user E, user F, user G, user H, user I, and user J. The users who accessed media content i within the period are A, B, D, F, G, and I, and the users who accessed media content j are A, B, C, D, I, and J. An access user vector records which users accessed a media content within the predetermined time period, with one position per user in the order A to J, so the access user vector of media content i is (1,1,0,1,0,1,1,0,1,0) and that of media content j is (1,1,1,1,0,0,0,0,1,1).
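The vector construction in this example can be reproduced directly. The sketch assumes the users are ordered A to J, as the vectors above imply; `access_user_vector` is a hypothetical name.

```python
USERS = list("ABCDEFGHIJ")  # users A to J, in vector order

accessed = {
    "i": {"A", "B", "D", "F", "G", "I"},
    "j": {"A", "B", "C", "D", "I", "J"},
}


def access_user_vector(content):
    """0/1 vector: position k is 1 if USERS[k] accessed the content in the period."""
    return [1 if user in accessed[content] else 0 for user in USERS]


print(access_user_vector("i"))  # [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
print(access_user_vector("j"))  # [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]
```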
Step 702: calculating the cosine similarity of the access user vectors of the two media contents.
For media content i and media content j above, with access user vectors $\vec{i} = (1,1,0,1,0,1,1,0,1,0)$ and $\vec{j} = (1,1,1,1,0,0,0,0,1,1)$, the cosine similarity of the two vectors is given by formula (1):

$$\mathrm{sim}(i,j) = \cos(\vec{i},\vec{j}) = \frac{\vec{i}\cdot\vec{j}}{\lVert\vec{i}\rVert\,\lVert\vec{j}\rVert} \tag{1}$$

The cosine similarity between media content i and media content j can be calculated by formula (1).
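Applying formula (1) to the example, assuming it is the standard cosine similarity: the two contents share users A, B, D, and I, so the dot product is 4, and each vector has six 1s, so each norm is √6. The similarity is therefore 4 / (√6 · √6) = 2/3. A minimal check (`cosine_similarity` is a hypothetical helper, not a library call):

```python
import math


def cosine_similarity(u, v):
    """Formula (1): dot(u, v) / (|u| * |v|); returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


v_i = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
v_j = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]
print(round(cosine_similarity(v_i, v_j), 4))  # 0.6667
```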
Step 703: taking the calculated cosine similarity as the first similarity between the two media contents.
In some examples, when the second similarity between each candidate media content and the first media content is calculated in step 304, it may be calculated using cosine similarity, which may further include the following steps, as shown in FIG. 8:
Step 801: obtaining the keyword vector of each candidate media content.
The keywords of all media contents viewed by all users are:
media content: tag1, tag2, tag3, …, tagM
Here, media content denotes all media contents viewed by all users, and each tag is a keyword extracted from a media content. A keyword appears in the title or body of the media content, can be used to retrieve it, and may consist of Chinese characters, English letters, digits, or any mix of them. tag1 is the first keyword across all media contents viewed by all users, tag2 the second, tag3 the third, and so on, up to tagM, the Mth keyword, where M is the total number of keywords across all media contents viewed by all users. The keyword vector of a media content is determined by which keywords it contains; for example, if media content i contains tag1 and tag3, its keyword vector is (1,0,1,0,0,0,…).
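A minimal sketch of keyword-vector construction, assuming each media content's keywords are already extracted into a list; the example contents and the `keyword_vector` name are hypothetical. Any fixed ordering of tag1 to tagM works as long as every vector uses the same one.

```python
viewed = {
    "i": ["tag1", "tag3"],
    "k": ["tag2", "tag4", "tag5"],
    "m": ["tag1", "tag6"],
}

# Fix one ordering tag1..tagM for all vectors; plain sorting is used here only because
# these example names sort correctly (it would misorder names such as "tag10").
vocabulary = sorted({tag for tags in viewed.values() for tag in tags})


def keyword_vector(content_tags, vocabulary):
    """0/1 vector with a 1 at each vocabulary position whose keyword the content contains."""
    present = set(content_tags)
    return [1 if tag in present else 0 for tag in vocabulary]


print(vocabulary)                               # ['tag1', 'tag2', 'tag3', 'tag4', 'tag5', 'tag6']
print(keyword_vector(viewed["i"], vocabulary))  # [1, 0, 1, 0, 0, 0]
```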
Step 802: Obtain the keyword vector of the first media content. It is obtained in the same way as the keyword vector of each candidate media content, so the details are not repeated here.
Step 803: Calculate the cosine similarity between the keyword vector of the first media content and the keyword vector of each candidate media content.
When the keyword vector of a candidate media content is $\vec{T}_c$ and the keyword vector of the first media content is $\vec{T}_1$, the cosine similarity of $\vec{T}_c$ and $\vec{T}_1$ is calculated by formula (1).
Step 804: Take the calculated cosine similarity as the second similarity between each candidate media content and the first media content.
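Steps 801 to 804 can be sketched as follows. Because keyword vectors contain only 0s and 1s, formula (1) reduces to |A ∩ B| / √(|A|·|B|), where A and B are the keyword sets of the two media contents. The sketch therefore stores keywords as sets instead of dense M-dimensional vectors. The vocabulary, content IDs and function names are hypothetical illustrations.

```python
import math
from typing import Dict, Iterable, List, Set

def keyword_vector(content_tags: Iterable[str], vocabulary: List[str]) -> List[int]:
    """Steps 801/802: 0/1 vector whose k-th entry is 1 if vocabulary[k] occurs in the content."""
    tags = set(content_tags)
    return [1 if tag in tags else 0 for tag in vocabulary]

def binary_cosine(a: Set[str], b: Set[str]) -> float:
    """Step 803 on keyword sets: for 0/1 vectors, formula (1) equals len(a & b) / sqrt(len(a) * len(b))."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def second_similarities(first_tags: Set[str], candidates: Dict[str, Set[str]]) -> Dict[str, float]:
    """Step 804: second similarity of every candidate media content with the first media content."""
    return {cid: binary_cosine(first_tags, tags) for cid, tags in candidates.items()}

vocabulary = ["tag1", "tag2", "tag3", "tag4", "tag5", "tag6"]  # hypothetical, M = 6
print(keyword_vector({"tag1", "tag3"}, vocabulary))  # [1, 0, 1, 0, 0, 0]

first = {"tag1", "tag3", "tag5"}
candidates = {"c1": {"tag1", "tag3"}, "c2": {"tag2", "tag4"}}
print(second_similarities(first, candidates))  # c1: 2/sqrt(6) ≈ 0.8165, c2: 0.0
```

The set form avoids materializing vectors of length M, which can be large when the vocabulary covers all keywords of all viewed media contents.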
In some examples, when the recommendability score of each candidate media content is calculated in step 305 based on its first similarity and second similarity with the first media content, the first similarity and the second similarity of each candidate media content may be weighted and summed to obtain its recommendability score. The weights of the first similarity and the second similarity may be preset empirically, obtained through machine learning, or adjusted according to sampling results.
In other examples, calculating the recommendability score of each candidate media content in step 305 based on its first similarity and second similarity with the first media content may include the following steps:
1) Acquire the popularity and the time-freshness of each candidate media content. Some media contents have a hotspot attribute, and a popularity value is assigned to such media contents. The time-freshness is a time-decay value of the generation time of the media content relative to the current system time.
2) Weight and sum the first similarity, the second similarity, and the popularity and/or the time-freshness of each candidate media content to obtain the recommendability score of each candidate media content.
When the factors considered include the first similarity, the second similarity, the popularity, and the time-freshness, the recommendability score for each candidate media content is calculated using the following equation (2).
$$\mathrm{Score}=\alpha_1 W_{CF}+\alpha_2 W_{Tag}+\alpha_3 W_{Hot}+\alpha_4 W_{Time}\quad(\alpha_i>0)\qquad(2)$$
In formula (2), $W_{CF}$ is the first similarity, $W_{Tag}$ is the second similarity, $W_{Hot}$ is the popularity, and $W_{Time}$ is the time-freshness; $\alpha_1$, $\alpha_2$, $\alpha_3$ and $\alpha_4$ are the weight parameters of the first similarity, the second similarity, the popularity and the time-freshness, respectively. After the recommendability score of each candidate media content is calculated, the candidate media contents whose recommendability scores exceed a second preset threshold are taken as the related recommended media contents of the first media content.
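Formula (2) and the second threshold can be sketched in Python as follows. The source does not specify how popularity is normalized or how the time decay is computed. The exponential half-life decay, the default weights (0.4, 0.3, 0.2, 0.1), the threshold 0.5, and all class and function names below are therefore illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

@dataclass
class Candidate:
    content_id: str
    w_cf: float        # first similarity W_CF (access-user cosine)
    w_tag: float       # second similarity W_Tag (keyword cosine)
    w_hot: float       # popularity W_Hot, assumed pre-normalized to [0, 1]
    created_at: float  # generation time of the content, Unix seconds

def time_freshness(created_at: float, now: float, half_life_s: float = 24 * 3600) -> float:
    """Hypothetical decay for W_Time: 1.0 for brand-new content, halved every half_life_s seconds."""
    age = max(0.0, now - created_at)
    return 0.5 ** (age / half_life_s)

def recommendability_score(
    c: Candidate, now: float, alphas: Tuple[float, float, float, float] = (0.4, 0.3, 0.2, 0.1)
) -> float:
    """Formula (2): a1*W_CF + a2*W_Tag + a3*W_Hot + a4*W_Time, with every a_i > 0."""
    if min(alphas) <= 0:
        raise ValueError("all weights must be positive")
    a1, a2, a3, a4 = alphas
    return a1 * c.w_cf + a2 * c.w_tag + a3 * c.w_hot + a4 * time_freshness(c.created_at, now)

def related_recommendations(cands: Sequence[Candidate], now: float, threshold: float = 0.5) -> List[str]:
    """IDs whose score exceeds the second preset threshold; sorting by score is an added convenience."""
    scored = sorted(((recommendability_score(c, now), c.content_id) for c in cands), reverse=True)
    return [cid for score, cid in scored if score > threshold]

now = 1_700_000_000.0
cands = [
    Candidate("a", w_cf=0.8, w_tag=0.6, w_hot=0.5, created_at=now - 3600),           # one hour old
    Candidate("b", w_cf=0.2, w_tag=0.1, w_hot=0.9, created_at=now - 7 * 24 * 3600),  # one week old
]
print(related_recommendations(cands, now))  # ['a']
```

Setting `a3` and `a4` to very small values approximates the two-factor weighted sum described in the previous examples. Formula (2) requires every weight to be strictly positive, so these weights cannot be set to zero.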
In some examples, the media content recommendation method proposed by the present application further includes:
eliminating, from the related recommended media contents, those that have already been accessed by the user.
Because of how related recommendation works, two media contents with similar features or strong relevance often produce highly overlapping sets of related recommended media contents. As a result, the same related recommended media contents are exposed repeatedly when the user reads different media contents, which degrades the user experience and reduces the exposure of long-tail articles. Deduplication requires additional storage of the exposure records of related recommended media contents, and the large access volume makes this a great challenge for both operating efficiency and storage overhead. Redis, which supports efficient reads and writes, is therefore used to store the exposure history. The application server maintains an accessed media content queue for each user. When the user browses media content with the application client, the application client reports the user's browsing behavior to the application server. The application server determines whether an accessed media content queue exists for the user. If it does, the current access behavior data is appended at the tail of the queue; otherwise, an empty queue is created for the user and the current access behavior data is stored in it. To avoid storing stale data, the accessed media content queue is kept at a fixed length, and data beyond this length is deleted from the head of the queue. When related recommended media contents are pushed to the user, each of them is first checked against the user's accessed media content queue. If it is present in the queue, it is not pushed to the user.
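The per-user queue logic can be sketched in Python. To keep the example self-contained, it uses an in-memory `collections.deque` with `maxlen` in place of Redis. In Redis itself, a comparable pattern would append with `RPUSH` and cap the list with `LTRIM key -N -1`. The class name, the queue length, and the use of content IDs as the stored access data are assumptions.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List

class AccessedQueueStore:
    """Hypothetical in-memory stand-in for the per-user accessed media content queues kept in Redis."""

    def __init__(self, max_len: int = 200) -> None:
        self.max_len = max_len  # assumed fixed queue length
        self._queues: Dict[str, Deque[str]] = {}

    def report_access(self, user_id: str, content_id: str) -> None:
        """Append an access at the queue tail, creating an empty queue first if the user has none.

        deque(maxlen=...) discards the oldest entry from the head once the fixed length is exceeded.
        """
        queue = self._queues.get(user_id)
        if queue is None:
            queue = deque(maxlen=self.max_len)
            self._queues[user_id] = queue
        queue.append(content_id)

    def filter_unseen(self, user_id: str, related: Iterable[str]) -> List[str]:
        """Drop related recommended contents that are still in the user's accessed queue."""
        seen = set(self._queues.get(user_id, ()))
        return [cid for cid in related if cid not in seen]

store = AccessedQueueStore(max_len=3)
for cid in ["a", "b", "c", "d"]:  # "a" is evicted from the head when "d" arrives
    store.report_access("u1", cid)
print(store.filter_unseen("u1", ["a", "b", "e"]))  # ['a', 'e']
```

Because of the fixed length, content that was accessed long ago can reappear in recommendations. This is the trade-off the fixed-length queue makes to bound storage.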
Therefore, the method and device ensure that no previously accessed media content appears when the user reads related recommended media contents, which preserves the user's reading experience. FIG. 9 shows, for the case where the media contents are articles, the impact of the deduplication strategy on the overall article click-through rate (CTR) and on the CTR of self-media articles (OM articles). The gray level of each point on the curves indicates the release proportion of related recommended articles that use the deduplication strategy. The lower curve is the overall article CTR, and the upper curve is the OM article CTR. It has also been found in practice that the total number of exposed articles increases when the deduplication strategy is used. In other words, users browse more articles, and the information pushing effect of the whole system improves.
In some examples, the media content recommendation method proposed by the present application further includes filtering the obtained related recommended media contents according to the user's interests. Specifically, the user's interest features in the user profile are obtained from the record of media contents the user has historically accessed. Each related recommended media content is then matched against these interest features to obtain its matching degree. The higher the matching degree of a related recommended media content, the more likely the user is to be interested in it. When related recommended media contents are recommended to the user, those with lower matching degrees are filtered out.
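The source does not define the user profile or the matching degree, so the Python sketch below is only one possible reading. It assumes that the profile maps each keyword to the share of historically accessed contents carrying it, and that the matching degree is the mean profile weight over a content's keywords. The cutoff `min_match` and all names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set

def build_interest_profile(history: Iterable[Set[str]]) -> Dict[str, float]:
    """Hypothetical profile: fraction of the user's accessed contents that carry each keyword."""
    history = list(history)
    counts = Counter(tag for tags in history for tag in tags)
    n = len(history) or 1
    return {tag: count / n for tag, count in counts.items()}

def matching_degree(content_tags: Set[str], profile: Dict[str, float]) -> float:
    """Assumed matching degree: mean profile weight over the content's keywords (0.0 if none)."""
    if not content_tags:
        return 0.0
    return sum(profile.get(t, 0.0) for t in content_tags) / len(content_tags)

def filter_by_interest(
    related: Dict[str, Set[str]], profile: Dict[str, float], min_match: float = 0.2
) -> List[str]:
    """Keep related recommended contents whose matching degree reaches the assumed cutoff."""
    return [cid for cid, tags in related.items() if matching_degree(tags, profile) >= min_match]

history = [{"football", "sports"}, {"sports", "nba"}, {"finance"}]
profile = build_interest_profile(history)
print(filter_by_interest({"r1": {"sports", "nba"}, "r2": {"cooking"}}, profile))  # ['r1']
```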
Corresponding to the above media content recommendation method, some examples of the present application further provide a media content recommendation apparatus, which is applicable to the application server 103 in the push information platform 102. As shown in FIG. 10, the apparatus includes:
a first similarity determining unit 1001, configured to obtain a user access record for each media content within a predetermined time period, and determine and store a first similarity between every two media contents according to the user access record, where the first similarity represents a similarity of access users of the two media contents;
a media content recommendation request receiving unit 1002, configured to receive a media content recommendation request sent by an application client, where when a first media content in the media contents is accessed by a user in the application client, the application client sends the media content recommendation request;
a candidate media content determining unit 1003, configured to, in response to the media content recommendation request, obtain first similarities of the second media contents and the first media content from the stored first similarities, and use, as candidate media contents, the second media contents whose first similarities with the first media content exceed a first preset threshold;
a second similarity calculation unit 1004 for calculating a second similarity between each candidate media content and the first media content, the second similarity characterizing the content similarity of the two media contents;
a related recommended media content determining unit 1005, configured to calculate a recommendability score of each candidate media content based on a first similarity and a second similarity between each candidate media content of the candidate media contents and the first media content, and to take a candidate media content with a recommendability score exceeding a second preset threshold as a related recommended media content of the first media content;
a sending unit 1006, configured to send a link of a related recommended media content of the first media content to the application client.
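The interplay of units 1003 to 1006 can be sketched as a single Python function using the two-factor weighted sum. The stored first similarities are modeled as a nested dictionary, and the second similarity is passed in as a callable so that any content-similarity measure can be plugged in. The thresholds, weights and names are illustrative assumptions, not values from the source.

```python
from typing import Callable, Dict, List

def recommend_related(
    first_id: str,
    first_sims: Dict[str, Dict[str, float]],
    second_sim: Callable[[str, str], float],
    t1: float = 0.3,
    t2: float = 0.5,
    w1: float = 0.6,
    w2: float = 0.4,
) -> List[str]:
    """Hypothetical flow of units 1003-1005; thresholds t1/t2 and weights w1/w2 are assumed values.

    first_sims: stored first similarities, first_sims[a][b] = user similarity of a and b (unit 1001).
    second_sim: callable returning the content similarity of two media contents (unit 1004).
    """
    # Unit 1003: candidates are contents whose stored first similarity exceeds the first threshold t1
    candidates = {cid: s for cid, s in first_sims.get(first_id, {}).items() if s > t1}
    related: List[str] = []
    for cid, s1 in candidates.items():
        # Unit 1005: weighted sum of first and second similarity, compared with the second threshold t2
        score = w1 * s1 + w2 * second_sim(first_id, cid)
        if score > t2:
            related.append(cid)
    return related  # unit 1006 would send links to these contents to the application client

stored = {"x": {"a": 0.9, "b": 0.2, "c": 0.5}}
content_sim = {("x", "a"): 0.7, ("x", "c"): 0.1}
print(recommend_related("x", stored, lambda f, c: content_sim.get((f, c), 0.0)))  # ['a']
```

Here "b" is dropped by the first threshold, and "c" passes it but scores 0.34, which falls below the second threshold.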
With the media content recommendation apparatus provided by the present application, candidate media contents are first obtained through collaborative filtering. Both the user similarity and the content similarity between each candidate media content and the media content browsed by the user are then considered together to further filter the candidates into related recommended media contents. In this way, more media contents that are strongly related to the browsed media content and worth reading alongside it are discovered.
In some embodiments of the present application, the first similarity determination unit 1001 includes:
a first calculation module, configured to calculate the first similarity between every two media contents according to the user access records of all media contents in the most recent first preset time period, recalculate it once every first period, and store it in a first set, where the first period is shorter than the first preset time period;
a second calculation module, configured to calculate the first similarity between every two media contents according to the user access record of each media content in the most recent second preset time period, recalculate it once every second period, and store it in a second set, where the second period is shorter than the second preset time period, the second preset time period is shorter than the first preset time period, and the second period is shorter than the first period;
the candidate media content determination unit 1003 includes:
a first similarity searching and obtaining module, configured to determine whether the first similarity between each second media content and the first media content is stored in the first set. If so, the module searches for and obtains that first similarity in the first set; otherwise, it searches for and obtains that first similarity in the second set.
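The two-set lookup can be sketched in Python as below. Keying each pair by an unordered `frozenset`, the dictionary layout and the function names are assumptions for illustration. The source states only that the first set is consulted first and that the second set is used when the pair is absent. One plausible motivation, which the source does not state, is that very recent content may not yet appear in the long-window first set, because that set is rebuilt less often.

```python
from typing import Dict, FrozenSet, Optional

PairKey = FrozenSet[str]

def pair(a: str, b: str) -> PairKey:
    """Unordered key: the first similarity of (a, b) equals that of (b, a)."""
    return frozenset((a, b))

def lookup_first_similarity(
    first_id: str,
    second_id: str,
    first_set: Dict[PairKey, float],   # long window, recomputed every first period
    second_set: Dict[PairKey, float],  # short window, recomputed every (shorter) second period
) -> Optional[float]:
    """Return the first similarity from the first set if present, else from the second set, else None."""
    key = pair(first_id, second_id)
    if key in first_set:
        return first_set[key]
    return second_set.get(key)

first_set = {pair("old1", "old2"): 0.42}
second_set = {pair("old1", "new1"): 0.61}
print(lookup_first_similarity("old2", "old1", first_set, second_set))  # 0.42
print(lookup_first_similarity("new1", "old1", first_set, second_set))  # 0.61
```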
In some embodiments of the present application, the related recommended media content determining unit 1005 is configured to weight and sum the first similarity and the second similarity of each candidate media content to obtain its recommendability score.
In some embodiments of the present application, the related recommended media content determining unit 1005 is configured to:
obtain the popularity and the time-freshness of each candidate media content; and
weight and sum the first similarity, the second similarity, and the popularity and/or the time-freshness of each candidate media content to obtain the recommendability score of each candidate media content.
In some embodiments of the present application, the media content recommendation apparatus further comprises:
a deduplication unit, configured to eliminate, from the related recommended media contents, those that have already been accessed by the user.
The modules may be implemented in the same server device or server cluster, or may be distributed in different server devices or server clusters.
The implementation principle of the functions of the above modules has been described in detail previously, and is not described in detail herein.
In one example, the modules of the media content recommender can be run on various computing devices and loaded into the memory of the computing device.
Fig. 11 shows a composition configuration diagram of a computing device in which the media content recommendation apparatus is located. As shown in fig. 11, the computing device includes one or more processors (CPUs) 1102, a communications module 1104, a memory 1106, a user interface 1110, and a communications bus 1108 for interconnecting these components.
The processor 1102 may receive and transmit data via the communication module 1104 to enable network communications and/or local communications.
The user interface 1110 includes one or more output devices 1112, including one or more speakers and/or one or more visual displays. The user interface 1110 also includes one or more input devices 1114, including, for example, a keyboard, a mouse, a voice command input unit or microphone, a touch screen display, a touch-sensitive input pad, a gesture-capture camera or other input buttons or controls, and the like.
Memory 1106 may be high-speed random access memory such as DRAM, SRAM, DDR RAM, or other random access solid state memory devices; or non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices.
The memory 1106 stores a set of instructions executable by the processor 1102, including:
an operating system 1116, including programs for handling various basic system services and for performing hardware dependent tasks;
the applications 1118, including various applications for media content recommendation, may implement the process flow in the above examples, such as may include some or all of the elements of the media content recommendation device shown in FIG. 10. At least one of the units 1001-1006 may store machine executable instructions. The processor 1102 is capable of performing the functions of at least one of the blocks 1001-1006 described above by executing machine-executable instructions in at least one of the blocks 1001-1006 in the memory 1106.
It should be noted that not all steps and modules in the above flows and structures are necessary, and some steps or modules may be omitted according to actual needs. The execution order of the steps is not fixed and can be adjusted as required. The division of each module is only for convenience of describing adopted functional division, and in actual implementation, one module may be divided into multiple modules, and the functions of multiple modules may also be implemented by the same module, and these modules may be located in the same device or in different devices.
The hardware modules in the embodiments may be implemented in hardware or a hardware platform plus software. The software includes machine-readable instructions stored on a non-volatile storage medium. Thus, embodiments may also be embodied as software products.
In various examples, the hardware may be implemented by specialized hardware or hardware executing machine-readable instructions. For example, the hardware may be specially designed permanent circuits or logic devices (e.g., special purpose processors, such as FPGAs or ASICs) for performing the specified operations. The hardware may also include programmable logic devices or circuits temporarily configured by software (e.g., including a general purpose processor or other programmable processor) to perform certain operations.
In addition, each example of the present application can be realized by a data processing program executed by a data processing apparatus such as a computer. It is clear that a data processing program constitutes the present application. Further, the data processing program, which is generally stored in one storage medium, is executed by directly reading the program out of the storage medium or by installing or copying the program into a storage device (such as a hard disk and/or a memory) of the data processing device. Such a storage medium therefore also constitutes the present application, which also provides a non-volatile storage medium in which a data processing program is stored, which data processing program can be used to carry out any one of the above-mentioned method examples of the present application.
The machine-readable instructions corresponding to the modules in FIG. 11 may cause an operating system or the like running on the computer to perform some or all of the operations described herein. The instructions read from the non-volatile computer-readable storage medium may be written to a memory provided in an expansion board inserted into the computer or in an expansion unit connected to the computer. A CPU or the like mounted on the expansion board or the expansion unit may then perform part or all of the actual operations according to the instructions.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (12)

1. A method for recommending media contents, comprising:
acquiring user access records of each media content in a preset time period, determining and storing a first similarity between every two media contents according to the user access records, wherein the first similarity represents the similarity of access users of the two media contents;
receiving a media content recommendation request sent by an application client, wherein when a first media content in the media contents is accessed by a user in the application client, the application client sends the media content recommendation request;
responding to the media content recommendation request, acquiring first similarity of each second media content and the first media content from the stored first similarity, and taking each second media content with the first similarity exceeding a first preset threshold value as candidate media content;
calculating a second similarity between each candidate media content and the first media content, the second similarity characterizing content similarity of the two media contents;
calculating a recommendability score for each candidate media content based on a first similarity and a second similarity between each candidate media content of the candidate media contents and the first media content, and regarding candidate media contents with recommendability scores exceeding a second preset threshold as related recommended media contents of the first media content;
and sending a link of the related recommended media content of the first media content to the application client.
2. The media content recommendation method according to claim 1, wherein the obtaining a user access record for each media content within a predetermined time period, and determining a first similarity between every two media contents according to the user access record comprises:
calculating a first similarity between every two media contents according to the user access record of each media content in the most recent first preset time period, recalculating it once every first period, and storing it in a first set, wherein the first period is shorter than the first preset time period;
calculating a first similarity between every two media contents according to the user access record of each media content in the most recent second preset time period, recalculating it once every second period, and storing it in a second set, wherein the second period is shorter than the second preset time period, the second preset time period is shorter than the first preset time period, and the second period is shorter than the first period;
the obtaining of the first similarity between each second media content and the first media content includes:
judging whether first similarity between each second media content and the first media content is stored in the first set, if so, searching and acquiring the first similarity between each second media content and the first media content in the first set; otherwise, searching and acquiring the first similarity of each second media content and the first media content in the second set.
3. The media content recommendation method of claim 2, wherein the calculating a first similarity between each two media contents comprises:
obtaining an access user vector of each of the two media contents;
calculating cosine similarity of access user vectors of the two media contents;
and taking the cosine similarity obtained by calculation as the first similarity between the two media contents.
4. The media content recommendation method of claim 1, wherein said calculating a second similarity between each candidate media content and the first media content comprises:
obtaining a keyword vector of each candidate media content;
obtaining a keyword vector of the first media content;
calculating cosine similarity of the keyword vector of the first media content and the keyword vector of each candidate media content;
and taking the cosine similarity obtained by calculation as a second similarity between each candidate media content and the first media content.
5. The media content recommendation method of claim 1, wherein said calculating a recommendability score for each candidate media content comprises: weighting and summing the first similarity and the second similarity of each candidate media content to obtain the recommendability score of each candidate media content.
6. The media content recommendation method of claim 1, wherein said calculating a recommendability score for each candidate media content comprises:
obtaining the popularity and the time-freshness of each candidate media content; and
weighting and summing the first similarity, the second similarity, and the popularity and/or the time-freshness of each candidate media content to obtain the recommendability score of each candidate media content.
7. The media content recommendation method of claim 1, wherein the method further comprises:
eliminating, from the related recommended media contents, the related recommended media contents that have been accessed by the user.
8. A media content recommender, comprising:
the first similarity determining unit is used for acquiring user access records of each media content in a preset time period, determining and storing a first similarity between every two media contents according to the user access records, wherein the first similarity represents the similarity of access users of the two media contents;
a media content recommendation request receiving unit, configured to receive a media content recommendation request sent by an application client, where when a first media content in the media contents is accessed by a user in the application client, the application client sends the media content recommendation request;
the candidate media content determining unit is used for responding to the media content recommendation request, acquiring first similarity of each second media content and the first media content from the stored first similarity, and taking each second media content with the first similarity exceeding a first preset threshold value as candidate media content;
the second similarity calculation unit is used for calculating second similarity between each candidate media content and the first media content, and the second similarity represents the content similarity of the two media contents;
a relevant recommended media content determining unit, configured to calculate a recommendability score of each candidate media content based on a first similarity and a second similarity between each candidate media content of the candidate media contents and the first media content, and use a candidate media content with a recommendability score exceeding a second preset threshold as a relevant recommended media content of the first media content;
a sending unit, configured to send a link of a recommended media content related to the first media content to the application client.
9. The media content recommendation device of claim 8, wherein the first similarity determination unit comprises:
a first calculation module, configured to calculate a first similarity between every two media contents according to the user access records of all media contents in the most recent first preset time period, recalculate it once every first period, and store it in a first set, wherein the first period is shorter than the first preset time period;
a second calculation module, configured to calculate a first similarity between every two media contents according to the user access record of each media content in the most recent second preset time period, recalculate it once every second period, and store it in a second set, wherein the second period is shorter than the second preset time period, the second preset time period is shorter than the first preset time period, and the second period is shorter than the first period;
the candidate media content determining unit includes:
a first similarity searching and obtaining module, configured to determine whether a first similarity between each second media content and the first media content is stored in the first set, and if so, search and obtain the first similarity between each second media content and the first media content in the first set; otherwise, searching and acquiring the first similarity of each second media content and the first media content in the second set.
10. The media content recommender according to claim 8, wherein said related recommended media content determining unit is configured to weight and sum the first similarity and the second similarity of each candidate media content to obtain the recommendability score of each candidate media content.
11. The media content recommender according to claim 8, wherein the related recommended media content determining unit is adapted to:
obtain the popularity and the time-freshness of each candidate media content; and
weight and sum the first similarity, the second similarity, and the popularity and/or the time-freshness of each candidate media content to obtain the recommendability score of each candidate media content.
12. The media content recommendation device of claim 8, wherein the device further comprises:
a deduplication unit, configured to eliminate, from the related recommended media contents, the related recommended media contents that have been accessed by the user.
CN201710037620.6A 2017-01-18 2017-01-18 Media content recommendation method and device Active CN108319622B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710037620.6A CN108319622B (en) 2017-01-18 2017-01-18 Media content recommendation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710037620.6A CN108319622B (en) 2017-01-18 2017-01-18 Media content recommendation method and device

Publications (2)

Publication Number Publication Date
CN108319622A (en) 2018-07-24
CN108319622B (en) 2022-11-11

Family

ID=62892040

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710037620.6A Active CN108319622B (en) 2017-01-18 2017-01-18 Media content recommendation method and device

Country Status (1)

Country Link
CN (1) CN108319622B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111046230B (en) * 2019-12-06 2023-09-15 北京奇艺世纪科技有限公司 Content recommendation method and device, electronic equipment and storable medium
CN112256970A (en) * 2020-10-28 2021-01-22 四川金熊猫新媒体有限公司 News text pushing method, device, equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103106208A (en) * 2011-11-11 2013-05-15 中国移动通信集团公司 Streaming media content recommendation method and system in mobile internet
CN105868248A (en) * 2015-12-15 2016-08-17 乐视网信息技术(北京)股份有限公司 Media recommendation method and device
CN106126669A (en) * 2016-06-28 2016-11-16 北京邮电大学 User collaborative based on label filters content recommendation method and device

Also Published As

Publication number Publication date
CN108319622A (en) 2018-07-24

Similar Documents

Publication Publication Date Title
KR101721338B1 (en) Search engine and implementation method thereof
JP6196316B2 (en) Adjusting content distribution based on user posts
US10133710B2 (en) Generating preview data for online content
US9348935B2 (en) Systems and methods for augmenting a keyword of a web page with video content
US7685200B2 (en) Ranking and suggesting candidate objects
US8332763B2 (en) Aggregating dynamic visual content
US9135354B2 (en) Method and system for topical browser history
US10719836B2 (en) Methods and systems for enhancing web content based on a web search query
US8863000B2 (en) Method and system for action suggestion using browser history
US9442903B2 (en) Generating preview data for online content
US10083248B2 (en) Method and system for topic-based browsing
US20150324448A1 (en) Information Recommendation Processing Method and Apparatus
CN109388760B (en) Recommendation label obtaining method, media content recommendation method, device and storage medium
WO2017041359A1 (en) Information pushing method, apparatus and device, and non-volatile computer storage medium
US11755676B2 (en) Systems and methods for generating real-time recommendations
US20110161413A1 (en) User interface for web comments
US11836778B2 (en) Product and content association
US20120095834A1 (en) Systems and methods for using a behavior history of a user to augment content of a webpage
CN111552884B (en) Method and apparatus for content recommendation
JP2010097461A (en) Document search apparatus, document search method, and document search program
TWI417751B (en) Information providing device, information providing method, information application program, and information recording medium
CN108319622B (en) Media content recommendation method and device
US9064014B2 (en) Information provisioning device, information provisioning method, program, and information recording medium
CN105243073A (en) Bookmark access method and device and terminal
US8365064B2 (en) Hyperlinking web content

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant