CN109543109B - Recommendation algorithm integrating time window technology and scoring prediction model - Google Patents
- Publication number
- CN109543109B; CN201811425529.2A
- Authority
- CN
- China
- Prior art keywords
- user
- item
- scoring
- score
- recommendation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0631—Item recommendations
Landscapes
- Business, Economics & Management (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- Development Economics (AREA)
- Economics (AREA)
- Marketing (AREA)
- Strategic Management (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a recommendation algorithm integrating a time window technology and a score prediction model. It belongs to the technical field of electronic commerce and aims to solve the technical problems of data sparsity and of user interest changing over time in collaborative filtering recommendation algorithms. The technical scheme is as follows: based on a collaborative filtering recommendation algorithm and a time window technology, a new score prediction model is adopted to complete the scoring matrix. To a certain extent, this model overcomes the excessive time complexity and non-unique prediction results of the traditional non-negative matrix decomposition score prediction model, improves score prediction accuracy, and reduces the influence of data sparsity on the collaborative filtering algorithm. The recommendation model is then used to predict items the user may prefer according to the user's interests, items with higher predicted scores are selected, and a comprehensive recommendation list is generated using the TopN recommendation method, improving the recommendation effect of the algorithm.
Description
Technical Field
The invention relates to the technical field of electronic commerce, in particular to a recommendation algorithm integrating a time window technology and a scoring prediction model.
Background
The rapid development of the Internet has caused the total amount of information to grow explosively, leading to information overload in which people are submerged by massive amounts of information; for example, Amazon has millions of books, and del. Recommendation systems have received significant attention and research in both academia and industry.
Recommendation methods fall mainly into two categories: content-based algorithms and collaborative filtering. A content-based recommendation algorithm uses text processing to represent the user's interests as a multi-dimensional vector, and likewise extracts features from each item to build an item feature vector. Recommendations are made by calculating the similarity between the user interest vector and the item feature vector.
Collaborative filtering includes memory-based and model-based algorithms as well as various fusion algorithms. A memory-based collaborative filtering algorithm first calculates the similarity between users and between items from the users' past behavior records. It then recommends items purchased and scored by users highly similar to the target user, or items highly similar to those the user purchased before. However, the data volume of a real electronic commerce system is huge and the user scoring matrix is sparse, so traditional memory-based recommendation algorithms achieve low accuracy.
Mainstream recommendation algorithms include collaborative filtering recommendation and context-based recommendation. Collaborative filtering is a classical recommendation algorithm and currently falls into three categories: memory-based, model-based and hybrid recommendation, each meeting different requirements and applications. Its advantages are prominent: the model is broadly applicable, simple to implement and effective. Its drawbacks are equally obvious, such as the cold start problem and the problem of user interest changing over time.
The model-based collaborative recommendation algorithm builds a user behavior model from the user's previous scores and various implicit preferences, and then predicts the user's scoring behavior from the model. The matrix decomposition recommendation algorithm establishes k-dimensional feature vectors for n users and m items, converting the n × m scoring matrix into two matrices of sizes n × k and k × m, and then scores items by computing the dot product of the user feature vector and the item feature vector. Many recommendation algorithms find neighbors by computing the global similarity of users, but users' interests may be similar only in certain respects; recommendation models based on Bayesian classification therefore classify the scores and use different similar users for each class. Among the many models in the recommendation field, matrix decomposition models perform well, so many scholars work on personalized recommendation using matrix decomposition, for example non-negative matrix decomposition. The main disadvantages of such models are high time complexity, non-unique results, and the difficulty of reaching the global minimum.
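The dot-product scoring described for matrix decomposition can be illustrated with a minimal sketch. The sizes and the random feature matrices below are hypothetical placeholders, not values from the patent; in practice the two matrices would be learned from the observed scores.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 4, 5, 2      # hypothetical sizes (n, m, k)

U = rng.random((n_users, k))       # n x k user feature matrix (placeholder values)
V = rng.random((k, n_items))       # k x m item feature matrix (placeholder values)

R_hat = U @ V                      # n x m matrix of predicted scores


def predict(u: int, i: int) -> float:
    """Predicted score of user u for item i: dot product of their feature vectors."""
    return float(U[u] @ V[:, i])
```

Each entry of `R_hat` equals the dot product of one user row and one item column, which is why a single matrix product fills every missing score at once.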
Probabilistic Latent Semantic Analysis (PLSA) is also a model-based algorithm; it extracts hidden variables to model user preferences and can achieve relatively high accuracy. Most widely used recommendation algorithms are static models that simply aggregate the user's historical data and do not consider changes in the user's interests.
Recommendation systems and personalized customization services are widely applied in electronic commerce as important means of overcoming data overload. However, how to deal with the high time complexity and non-unique results of traditional non-negative matrix decomposition, and with the change of user interest over time in collaborative filtering recommendation algorithms, is a technical problem that urgently needs to be solved.
Disclosure of Invention
The technical task of the invention is to provide a recommendation algorithm integrating a time window technology and a score prediction model, solving the problems of high time complexity and non-unique results in traditional non-negative matrix decomposition and of user interest changing over time in collaborative filtering recommendation algorithms.
The technical task of the invention is achieved as follows: a recommendation algorithm integrating a time window technology and a score prediction model is based on a collaborative filtering recommendation algorithm and a time window technology; the interest similarity of users is calculated using the score prediction model, the recommendation model is then used to predict items the user may prefer according to the user's interests, and items with higher predicted scores are selected to generate a recommendation list using the TopN recommendation method;
the specific method comprises the following steps:
S1, constructing a user-item scoring matrix from the users' scores for items, and normalizing the scores to obtain a normalized user-item scoring matrix;
S2, according to the score prediction model, calculating the feature matrix of the users and the feature matrix of the items through the recommendation model, multiplying the feature matrices to obtain the users' predicted scores for unscored items, and obtaining a TopN2 recommendation list using the TopN recommendation method;
S3, restoring the predicted scores to the original score scale according to the normalization principle to obtain a dense scoring matrix; dividing time into a plurality of time windows using the time window technique; assigning time scales to the scored items according to the time windows; calculating the overall similarity between each item with a predicted score and the item set in each time window; and taking any time scale within the most similar time window as the time scale of the predicted scoring behavior for that item;
S4, calculating the interest similarity between users using a collaborative filtering algorithm, constructing a user interest similarity matrix, calculating the target user's interest preference for resources, generating a TopN1 recommendation list from the N items of highest user interest using the TopN recommendation method, and fusing the TopN1 recommendation list with the TopN2 recommendation list to generate the TopN recommendation list.
Preferably, the scores in step S1 are normalized according to the following formula:
where u denotes a user; i denotes an item; M denotes the highest score value, determined by the range of scores the system supports; $r_{u,i}$ denotes the user's actual score for the item; and the normalized score is the value obtained by normalizing that actual score,
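A minimal sketch of normalization and its inverse, assuming the simplest form consistent with the definitions given here: each actual score is divided by the highest supported score M, and a prediction is restored by multiplying by M. This form is an assumption made for illustration; the function names and the use of `np.nan` for unscored entries are hypothetical.

```python
import numpy as np


def normalize(R: np.ndarray, M: float) -> np.ndarray:
    """Scale scores into [0, 1] by the highest supported score M (assumed form).

    Unscored entries are represented as np.nan and stay np.nan.
    """
    return R / M


def restore(R_norm: np.ndarray, M: float) -> np.ndarray:
    """Map normalized scores back to the original scale (inverse of normalize)."""
    return R_norm * M


R = np.array([[5.0, 3.0], [np.nan, 1.0]])
R_norm = normalize(R, M=5.0)
```

If the system's lowest score is not 0, a min-max scaling would be another reasonable choice; the inverse would change accordingly.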
Preferably, the score prediction model is used to predict the user's scores for unscored items, specifically as follows:
from the actual scores of user u, $\rho_{u,i}$ can be calculated, where $\rho_{u,i}$ is used in calculating the normalized scoring result and in determining the unknown variables, and hence the feature vectors of the users and the items; once the feature vectors of the users and the items are obtained, the users' scores for unscored items follow from a matrix operation.
In FIG. 2, α and β are the parameters of the model. α represents the degree of overlap of user characteristics and takes values in [0, 1]: when α is close to 0, users tend to share the same characteristics, that is, user characteristics are uniform; the larger α is, the more users tend to have differing characteristics, that is, the more characteristics each user has. β is greater than 1; the larger β is, the more information is needed to establish that a given characteristic of a user is prominent. k represents the dimension of the vectors; u represents a user; i represents an item. V represents the item feature vectors, and the item feature matrix is initialized from a beta distribution, $V_{i,k} \sim \mathrm{Beta}(\beta, \beta)$, with values in [0, 1]. $U_u$ represents the feature vector of user u and obeys a Dirichlet distribution; the vector $(U_{u,1}, \ldots, U_{u,k})$ gives the components of user u in each dimension; owing to the use of the Dirichlet distribution model, absence of real or reshaped values. $Z_{u,i}$ and $\rho_{u,i}$ are random variables defined for each user u and item i: $Z_{u,i}$ obeys a categorical distribution, and its value represents user u's score for item i; $\rho_{u,i}$ obeys a binomial distribution and represents the user's confidence level for a preferred item.
Preferably, the calculation model is used to predict items the user is likely to prefer according to the user's interest preferences; it is implemented as follows:
(1) inputting α, β and the scoring matrix R, where α and β are the parameters of the estimation model; α represents the degree of overlap of user characteristics, with 0 ≤ α ≤ 1; when α is close to 0, users tend to share the same characteristics, that is, user characteristics are relatively uniform; the larger α is, the more user characteristics differ, that is, the more characteristics each user has; β represents the amount of information required for a given characteristic of a user to be prominent, with β > 1, and the larger β is, the more information is required;
(2) normalizing the scoring matrix R to generate a matrix R';
(3) randomly initializing the free parameters: $\gamma_{u,k}$, a matrix of dimension u × k, and two further parameter matrices, each of dimension v × k; where u denotes the users, k the feature dimension, and v the items;
(4) according to the actual score records of user u, calculating $\lambda_{u,i,k}$ for the corresponding user u and item i:
where $\lambda_{u,i,k}$ represents the parameter of the categorical distribution obeyed by the variable $Z_{u,i}$, and $\lambda'_{u,i,k}$ is an intermediate quantity used to calculate $\lambda_{u,i,k}$;
where Γ(x) represents the gamma function; Γ'(x) represents the derivative of the gamma function; Y represents the highest score supported by the system and is a constant fixed at 4; $\rho_{u,i}$ represents the conditional probability distribution that user u prefers item i; x represents;
(5) according to the actual score records of user u, calculating and updating $\gamma_{u,k}$ for the corresponding user u:
where $\gamma_{u,k}$ represents the parameter of the Dirichlet distribution obeyed by user u;
(6) according to the actual score records of user u, calculating and updating the parameters of the corresponding item i
if yes, repeating the steps (4) to (6);
if not, executing the step (8);
(10) generating the TopN2 recommendation list from the score-completed matrix R' using the TopN recommendation method;
(11) dividing time windows according to the time window technique;
(12) calculating the similarity of the items:
where sim(i, j) represents the similarity of item i and item j; W(u, i) and W(u, j) represent the combined weight of the time weight and the data weight; the formula also uses the average scores of item i and item j and the average score of user u;
(13) calculating the overall similarity between each item not scored by the user and the items in each time window:
where $I_{u,j}$ represents the set of items in the j-th time window of user u, and size($I_{u,j}$) represents the size of the set $I_{u,j}$;
(14) selecting the top k' unscored items with the highest similarity and assigning them time scales; the optimal value of k' differs between data sets and must be determined experimentally;
(15) converting the dense matrix with time scales into a three-dimensional user-item-time scoring matrix;
(16) calculating the similarity between users:
where $t_i$ represents the time weight of the corresponding user's score for the corresponding item, obtained from the three-dimensional user-item-time matrix;
(17) calculating the user's preference value for unscored items:
where S(u, k) represents the k users whose interests are most similar to those of user u, N(i) represents the set of users who have scored item i, and sim(u, v) represents the interest similarity between users u and v;
(18) on the basis of score prediction, selecting the items with higher predicted scores to generate the TopN2 recommendation list, and combining the TopN2 and TopN1 recommendation lists by weighting to form a new TopN recommendation list, according to:
TopN=εTopN1+(1-ε)TopN2
where ε lies between 0 and 1, and the optimal ε differs between data sets.
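The weighted fusion in step (18) can be sketched as follows. The sketch assumes each list maps an item to a score on a comparable scale, and that an item missing from one list contributes 0 from that list; both points, and the function name `fuse_top_n`, are illustrative assumptions rather than details stated in the text.

```python
def fuse_top_n(top_n1: dict, top_n2: dict, eps: float, n: int) -> list:
    """Fuse two scored lists as eps * TopN1 + (1 - eps) * TopN2 and keep the best n.

    top_n1, top_n2: dicts mapping item -> score (assumed comparable scales).
    Items absent from a list are treated as scoring 0 in it (assumption).
    """
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must lie in [0, 1]")
    items = set(top_n1) | set(top_n2)
    fused = {
        item: eps * top_n1.get(item, 0.0) + (1.0 - eps) * top_n2.get(item, 0.0)
        for item in items
    }
    # Sort by fused score, highest first; break ties by item name for determinism.
    return sorted(fused, key=lambda item: (-fused[item], item))[:n]


ranking = fuse_top_n({"a": 0.9, "b": 0.5}, {"b": 0.8, "c": 0.7}, eps=0.5, n=2)
```

Since the optimal ε is data-set dependent, it would typically be chosen by evaluating a few values on held-out data.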
Preferably, the time windows in step (11) are divided according to the time window technique as follows:
$T_{uk}(k) = T_{u0} - \theta(k-1) - k\alpha_1$
where $\alpha_1$ represents the size of the first time window and the length of the user's interest; the larger its value, the larger the window. θ represents the amount by which the time window interval grows across $T_{u1}, T_{u2}, \ldots, T_{uk}$ in sequence; the larger its value, the faster the user's interest changes, and vice versa.
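A minimal sketch of the window division, reading the formula above as T_uk = T_u0 - θ(k - 1) - k·α1. The helper names and the rule for mapping a timestamp to a window (the first window whose lower boundary it reaches, clamped to the last window) are assumptions made for illustration.

```python
def window_boundaries(t_u0: float, alpha1: float, theta: float, n_windows: int) -> list:
    """Lower boundaries T_u1 .. T_uk, computed as T_u0 - theta*(k-1) - k*alpha1."""
    return [t_u0 - theta * (k - 1) - k * alpha1 for k in range(1, n_windows + 1)]


def window_index(t: float, boundaries: list) -> int:
    """1-based index of the window containing timestamp t.

    Window k covers [T_uk, T_u(k-1)); timestamps older than the last
    boundary are clamped into the last window (assumption).
    """
    for k, lower in enumerate(boundaries, start=1):
        if t >= lower:
            return k
    return len(boundaries)


equal = window_boundaries(t_u0=60, alpha1=10, theta=0, n_windows=6)
```

With θ = 0 the boundaries are evenly spaced by α1, which corresponds to equal division of the scoring period.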
More preferably, different time windows correspond to different weights, and the Ebbinghaus forgetting curve is adopted as the time function:
$f(u,i) = 0.318 \times (T_0 - T_{uk})^{-0.125}$
where $T_{uk}$ represents the time window in which user u accessed item i; f(u, i) takes values in [0, 1] and follows a forgetting pattern that is fast at first and then slow;
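The time function can be sketched as below, assuming the "-0.125" in the formula is an exponent, which gives the power-law decay expected of a forgetting curve. The function name and the requirement that the elapsed time be positive are illustrative assumptions.

```python
def forgetting_weight(t0: float, t_uk: float) -> float:
    """Time weight 0.318 * (T0 - T_uk) ** -0.125 (exponent reading is an assumption)."""
    elapsed = t0 - t_uk
    if elapsed <= 0:
        raise ValueError("T0 must be later than T_uk")
    return 0.318 * elapsed ** -0.125


recent = forgetting_weight(t0=10.0, t_uk=9.0)    # elapsed = 1
older = forgetting_weight(t0=10.0, t_uk=0.0)     # elapsed = 10
```

Under this reading the weight drops quickly for small elapsed times and flattens out for large ones, matching the "fast at first, then slow" description.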
the target user's degree of interest in the item is defined by combining the time function with the data weight:
$W(u,i) = f(u,i) \times \beta_1 + w(u,i) \times (1-\beta_1), \quad \beta_1 \in [0,1]$
where w(u, i) represents user u's degree of interest in item i in the most recent time period; I denotes the set of items the user accessed in the most recent time period ($T_{u1} \sim T_{u0}$); sim(i, j) represents the similarity of item i and item j;
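The combined weight is a convex combination of the time weight f(u, i) and the data weight w(u, i). A minimal sketch, with the function name chosen for illustration and both weights passed in as already-computed numbers:

```python
def combined_weight(f_ui: float, w_ui: float, beta1: float) -> float:
    """W(u, i) = f(u, i) * beta1 + w(u, i) * (1 - beta1), with beta1 in [0, 1]."""
    if not 0.0 <= beta1 <= 1.0:
        raise ValueError("beta1 must lie in [0, 1]")
    return f_ui * beta1 + w_ui * (1.0 - beta1)


w_mixed = combined_weight(f_ui=0.2, w_ui=0.8, beta1=0.5)
```

Setting β1 = 1 uses only the time weight and β1 = 0 uses only the data weight, so β1 controls how strongly recency dominates.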
preferably, the item similarity sim(i, j) in w(u, i) is calculated using the adjusted cosine similarity:
where the formula uses the average scores of item i and item j and the average score of user u.
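A minimal sketch of adjusted cosine similarity in its commonly used form, where each score is centered on the scoring user's average and only users who scored both items are counted. This standard definition is used here as an assumption; the exact variant intended in the text (which also mentions item averages) may differ. Unscored entries are represented as `np.nan`.

```python
import numpy as np


def adjusted_cosine(R: np.ndarray, i: int, j: int) -> float:
    """Adjusted cosine similarity of items i and j.

    R is a users x items matrix with np.nan for unscored entries.
    Scores are centered on each user's mean (standard definition, assumed here).
    """
    user_mean = np.nanmean(R, axis=1)
    both = ~np.isnan(R[:, i]) & ~np.isnan(R[:, j])   # users who scored both items
    if not both.any():
        return 0.0
    di = R[both, i] - user_mean[both]
    dj = R[both, j] - user_mean[both]
    denom = np.sqrt((di ** 2).sum()) * np.sqrt((dj ** 2).sum())
    return float(di @ dj / denom) if denom > 0 else 0.0


R = np.array([[5.0, 4.0, 1.0],
              [4.0, 5.0, 2.0],
              [1.0, 2.0, 5.0]])
sim_01 = adjusted_cosine(R, 0, 1)
sim_02 = adjusted_cosine(R, 0, 2)
```

Centering on the user mean removes the effect of users who score systematically high or low, which plain cosine similarity does not.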
The recommendation algorithm integrating the time window technology and the score prediction model has the following advantages:
First, the method mainly comprises a score prediction model and a collaborative filtering recommendation model, with time handled by the time window technique; this compensates for the influence of data sparsity on the collaborative filtering algorithm, and the introduction of the time window technique makes the recommendation results follow changes in user interest more closely;
Second, compared with the traditional matrix decomposition algorithm, the method adopts a new score prediction model, and score prediction accuracy is improved;
Third, the time window technique is used to divide time windows, and the influence of time on user interest is taken into account, making the recommendation results more reasonable;
Fourth, combining the score prediction model with the time window technique and the collaborative filtering algorithm addresses the cold start problem of the collaborative filtering algorithm and mitigates the influence of data sparsity on it; in particular, adding the time window technique lets the collaborative filtering algorithm adapt to changes in user interest, improving the recommendation effect;
Fifth, the new non-negative matrix decomposition algorithm and the time window technique are fused into the collaborative filtering algorithm, combining the advantages of both to address the shortcomings of traditional non-negative matrix decomposition and the problem of user interest changing over time in collaborative filtering recommendation algorithms, so that the collaborative filtering algorithm can adapt to changes in user interest and the recommendation effect is improved.
Drawings
The invention is further described below with reference to the accompanying drawings.
FIG. 1 is a flow diagram of the present invention;
FIG. 2 is a diagram illustrating a relationship between score prediction models;
FIG. 3 is a schematic diagram of time window division;
FIG. 4 is a graph of accuracy;
FIG. 5 is a graph of recall.
Detailed Description
A recommendation algorithm incorporating a time window technique and a score prediction model according to the present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The first embodiment is as follows:
The recommendation algorithm integrating a time window technology and a score prediction model is based on a collaborative filtering recommendation algorithm and a time window technology; the interest similarity of users is calculated using the score prediction model, the recommendation model is then used to predict items the user may prefer according to the user's interests, items with higher predicted scores are selected, and a TopN recommendation list is generated using the TopN recommendation method;
the specific method comprises the following steps:
S1, constructing a user-item scoring matrix from the users' scores for items, and normalizing the scores to obtain a normalized user-item scoring matrix;
S2, according to the score prediction model, calculating the feature matrix of the users and the feature matrix of the items through the recommendation model, multiplying the feature matrices to obtain the users' predicted scores for unscored items, and obtaining a TopN2 recommendation list using the TopN recommendation method;
the scores are normalized according to the following formula:
where u denotes a user; i denotes an item; M denotes the highest score value, determined by the range of scores the system supports; $r_{u,i}$ denotes the user's actual score for the item; and the normalized score is the value obtained by normalizing that actual score,
S3, restoring the predicted scores to the original score scale according to the normalization principle to obtain a dense scoring matrix; dividing time into a plurality of time windows using the time window technique; assigning time scales to the scored items according to the time windows; calculating the overall similarity between each item with a predicted score and the item set in each time window; and taking any time scale within the most similar time window as the time scale of the predicted scoring behavior for that item. The scoring matrix is completed by the score prediction model to obtain a dense scoring matrix, and on this basis time scales are assigned to the scores in the matrix. According to related research, user interest is time-sensitive: it changes over time but remains essentially unchanged over a short period. Following the time curve modeled on the human forgetting pattern proposed in related research, the time of the user's earliest scored item is set to 0 and the time of the user's most recently scored item is set to $T_{u0}$; the period from 0 to $T_{u0}$ is divided into a plurality of time segments, as shown in FIG. 3.
S4, calculating the interest similarity between users using a collaborative filtering algorithm, constructing a user interest similarity matrix, calculating the target user's interest preference for resources, generating a TopN1 recommendation list from the N items of highest user interest using the TopN recommendation method, and fusing the TopN1 recommendation list with the TopN2 recommendation list to generate the TopN recommendation list.
The score prediction model is used to predict the user's scores for unscored items, specifically as follows:
from the actual scores of user u, $\rho_{u,i}$ can be calculated, where $\rho_{u,i}$ is used in calculating the normalized scoring result and in determining the unknown variables, and hence the feature vectors of the users and the items; once the feature vectors of the users and the items are obtained, the users' scores for unscored items follow from a matrix operation.
As shown in FIG. 2, α and β are the parameters of the model. α represents the degree of overlap of user characteristics and takes values in [0, 1]: when α is close to 0, users tend to share the same characteristics, that is, user characteristics are uniform; the larger α is, the more users tend to have differing characteristics, that is, the more characteristics each user has. β is greater than 1; the larger β is, the more information is needed to establish that a given characteristic of a user is prominent. k represents the dimension of the vectors;
u represents a user; i represents an item. V represents the item feature vectors, and the item feature matrix is initialized from a beta distribution, $V_{i,k} \sim \mathrm{Beta}(\beta, \beta)$, with values in [0, 1]. $U_u$ represents the feature vector of user u and obeys a Dirichlet distribution; the vector $(U_{u,1}, \ldots, U_{u,k})$ gives the components of user u in each dimension; owing to the use of the Dirichlet distribution model, absence of real or reshaped values. $Z_{u,i}$ and $\rho_{u,i}$ are random variables defined for each user u and item i: $Z_{u,i}$ obeys a categorical distribution, and its value represents user u's score for item i; $\rho_{u,i}$ obeys a binomial distribution and represents the user's confidence level for a preferred item.
As shown in FIG. 1, the calculation model is used to predict items the user is likely to prefer according to the user's interest preferences; it is implemented as follows:
(1) inputting α, β and the scoring matrix R, where α and β are the parameters of the estimation model; α represents the degree of overlap of user characteristics, with 0 ≤ α ≤ 1; when α is close to 0, users tend to share the same characteristics, that is, user characteristics are relatively uniform; the larger α is, the more user characteristics differ, that is, the more characteristics each user has; β represents the amount of information required for a given characteristic of a user to be prominent, with β > 1, and the larger β is, the more information is required;
(2) normalizing the scoring matrix R to generate a matrix R';
(3) randomly initializing the free parameters: $\gamma_{u,k}$, a matrix of dimension u × k, and two further parameter matrices, each of dimension v × k; where u denotes the users, k the feature dimension, and v the items;
(4) according to the actual score records of user u, calculating $\lambda_{u,i,k}$ for the corresponding user u and item i:
where $\lambda_{u,i,k}$ represents the parameter of the categorical distribution obeyed by the variable $Z_{u,i}$, and $\lambda'_{u,i,k}$ is an intermediate quantity used to calculate $\lambda_{u,i,k}$;
where Γ(x) represents the gamma function; Γ'(x) represents the derivative of the gamma function; Y represents the highest score supported by the system and is a constant fixed at 4; $\rho_{u,i}$ represents the conditional probability distribution that user u prefers item i; x represents;
(5) according to the actual score records of user u, calculating and updating $\gamma_{u,k}$ for the corresponding user u:
where $\gamma_{u,k}$ represents the parameter of the Dirichlet distribution obeyed by user u;
(6) according to the actual score records of user u, calculating and updating the parameters of the corresponding item i
if yes, repeating the steps (4) to (6);
if not, executing the step (8);
(10) generating the TopN2 recommendation list from the score-completed matrix R' using the TopN recommendation method;
(11) dividing time windows according to the time window technique, calculated as follows:
$T_{uk}(k) = T_{u0} - \theta(k-1) - k\alpha_1$
where $\alpha_1$ represents the size of the first time window and the length of the user's interest; the larger its value, the larger the window. θ represents the amount by which the time window interval grows across $T_{u1}, T_{u2}, \ldots, T_{uk}$ in sequence; the larger its value, the faster the user's interest changes, and vice versa. In this embodiment θ = 0, that is, the time windows are divided equally.
Different time windows correspond to different weights, and the Ebbinghaus forgetting curve is adopted as the time function:
$f(u,i) = 0.318 \times (T_0 - T_{uk})^{-0.125}$
where $T_{uk}$ represents the time window in which user u accessed item i; f(u, i) takes values in [0, 1] and follows a forgetting pattern that is fast at first and then slow;
the target user's degree of interest in the item is defined by combining the time function with the data weight:
$W(u,i) = f(u,i) \times \beta_1 + w(u,i) \times (1-\beta_1), \quad \beta_1 \in [0,1]$
where w(u, i) represents user u's degree of interest in item i in the most recent time period; I denotes the set of items the user accessed in the most recent time period ($T_{u1} \sim T_{u0}$); sim(i, j) represents the similarity of item i and item j, calculated using the adjusted cosine similarity:
where the formula uses the average scores of item i and item j and the average score of user u.
(12) Calculating the similarity of the projects:
wherein sim (i, j) represents the similarity of item i and item j; w (u, i) and W (u, j) represent the combined weight of the time weight and the data weight;andthe average scores of item i and item j are respectively represented,represents the average score of the user u;
(13) and calculating the comprehensive similarity of each user unscored item and the items in each time window:
wherein, Iu,jRepresents the set of items in the jth time window of user u, size (I)u,j) Representation set Iu,jThe size of (d);
(14) selecting the first k ' items with the highest similarity to be given with time scales (selecting the first k ' unscored items with the highest similarity to be given with time scales, wherein the optimal value of k ' is different according to different data sets, and the optimal value needs to be obtained through experiments);
(15) Converting the dense matrix with time scales into a three-dimensional user-item-time scoring matrix;
(16) Obtaining the similarity between users, where the user interest similarity is calculated with a time-weighted Pearson correlation:
where a logistic function is used as the weight function, so that different time windows receive different weights and scores in the same time window receive the same weight; in the weight formula, $t_i$ denotes the time weight of the corresponding user's score for the corresponding item and is obtained from the three-dimensional user-item-time matrix;
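A sketch of a time-weighted Pearson similarity is given below. Neither the similarity formula nor the weight formula is reproduced in this text, so the standard logistic function and the way each centered score is scaled by its time weight are assumptions, as are the function names and the (score, time weight) data layout.

```python
import math
from typing import Dict, Tuple


def logistic_weight(recency: float) -> float:
    """Standard logistic function; larger recency gives a weight closer to 1."""
    return 1.0 / (1.0 + math.exp(-recency))


def time_weighted_pearson(ru: Dict[str, Tuple[float, float]],
                          rv: Dict[str, Tuple[float, float]]) -> float:
    """Pearson-style similarity over co-rated items.

    ru and rv map item -> (score, time_weight). Each centered score is
    multiplied by its time weight before the correlation is formed.
    """
    common = ru.keys() & rv.keys()
    if not common:
        return 0.0
    mean_u = sum(s for s, _ in ru.values()) / len(ru)
    mean_v = sum(s for s, _ in rv.values()) / len(rv)
    num = den_u = den_v = 0.0
    for i in common:
        d_u = (ru[i][0] - mean_u) * ru[i][1]
        d_v = (rv[i][0] - mean_v) * rv[i][1]
        num += d_u * d_v
        den_u += d_u * d_u
        den_v += d_v * d_v
    if den_u == 0.0 or den_v == 0.0:
        return 0.0
    return num / math.sqrt(den_u * den_v)
```

Because every score in the same time window shares one recency value, all of them receive the same weight, as the step requires.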
(17) Obtaining the preference value of the user for the unscored items:
where S(u, k) denotes the k users whose interests are most similar to those of user u, N(i) denotes the set of users who have scored item i, and sim(u, v) denotes the interest similarity between users u and v;
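The preference value can be sketched with the usual user-based collaborative filtering sum over the users in both S(u, k) and N(i). The formula itself is not reproduced in this text, so the similarity-weighted sum of neighbor scores used here is an assumption, as are the function names and dict layouts.

```python
from typing import Dict, List


def top_k_neighbors(u: str, user_sim: Dict[str, Dict[str, float]],
                    k: int) -> List[str]:
    """S(u, k): the k users most similar to u."""
    others = [(s, v) for v, s in user_sim[u].items() if v != u]
    others.sort(reverse=True)
    return [v for _, v in others[:k]]


def preference(u: str, item: str, ratings: Dict[str, Dict[str, float]],
               user_sim: Dict[str, Dict[str, float]], k: int) -> float:
    """Sum of sim(u, v) * r(v, item) over v in S(u, k) that scored the item."""
    total = 0.0
    for v in top_k_neighbors(u, user_sim, k):
        if item in ratings.get(v, {}):  # v belongs to N(item)
            total += user_sim[u][v] * ratings[v][item]
    return total
```

Ranking a user's unscored items by this value and keeping the N highest gives one way to form the TopN1 list.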
(18) On the basis of the score prediction, selecting the items with higher predicted scores for the user to generate the TopN2 recommendation list, and combining the TopN2 and TopN1 recommendation lists by weighting to form a new TopN recommendation list, according to the formula:
$$\mathrm{TopN} = \varepsilon\,\mathrm{TopN1} + (1-\varepsilon)\,\mathrm{TopN2}$$
where ε lies between 0 and 1, and its optimal value differs between data sets.
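One way to read the fusion formula is as a weighted blend of per-item scores from the two lists, sketched below. The text does not say how the lists are combined item by item, so treating a missing item as score 0, assuming the two score scales are comparable, and the function name are all assumptions.

```python
from typing import Dict, List


def fuse_top_n(top_n1: Dict[str, float], top_n2: Dict[str, float],
               epsilon: float, n: int) -> List[str]:
    """Blend two scored lists as epsilon * TopN1 + (1 - epsilon) * TopN2.

    top_n1 and top_n2 map item -> score; an item missing from one list
    contributes 0 from that list. Returns the n best items.
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    items = set(top_n1) | set(top_n2)
    fused = {
        it: epsilon * top_n1.get(it, 0.0) + (1.0 - epsilon) * top_n2.get(it, 0.0)
        for it in items
    }
    # Sort by fused score, breaking ties by item name for determinism.
    ranked = sorted(fused, key=lambda it: (-fused[it], it))
    return ranked[:n]
```

With ε = 1 the result reduces to the TopN1 ranking, and with ε = 0 to the TopN2 ranking.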
Example two: detailed description of the invention
A. The data sets Netflix, MovieLens 20M, MovieLens 10M, MovieLens 1M and Epinions are used; a user-item scoring matrix is constructed from the users' scores for the items, and the scores are normalized to obtain a normalized scoring matrix;
B. The predicted scores of the users for the items are obtained through the score prediction model, and the TopN2 recommendation list is generated; the accuracy of the predicted scores is measured with MAE and CMAE, calculated as follows:
the results are shown in the following table:
C. The scoring matrix is completed through score prediction to obtain a dense user scoring matrix. The time window technique is then applied (following related research results, the number of time windows is set to 6, divided equally): the similarity sim(i, j) between items is calculated, from which the overall similarity sim(i, $I_{u,j}$) between each predicted-score item of a user and the item set in each time window is obtained, and the k' unscored items with the highest similarity are selected and assigned a time scale.
D. Finally, the dense matrix with time information is converted into a three-dimensional user-item-time scoring matrix; on this basis the user interest similarity sim(u, v) is calculated, the interest degree of each user in each item is computed, and the TopN1 recommendation list is generated. The final TopN recommendation list is then generated and returned to the user. The quality of the recommendation list is measured by accuracy and recall, as shown in Figures 4 and 5.
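The evaluation measures named in steps B and D can be sketched with their standard definitions. The patent's own formulas are not reproduced in this text; CMAE is omitted because its definition is not given, and "accuracy" is computed here as precision over the recommended items, which is an assumption. Function names and data layouts are illustrative.

```python
from typing import Dict, List, Set, Tuple


def mean_absolute_error(predicted: Dict[Tuple[str, str], float],
                        actual: Dict[Tuple[str, str], float]) -> float:
    """MAE over the (user, item) pairs present in both mappings."""
    keys = [k for k in actual if k in predicted]
    if not keys:
        raise ValueError("no overlapping (user, item) pairs to evaluate")
    return sum(abs(predicted[k] - actual[k]) for k in keys) / len(keys)


def precision_recall(recommended: Dict[str, List[str]],
                     relevant: Dict[str, Set[str]]) -> Tuple[float, float]:
    """Precision and recall of TopN lists against held-out relevant items."""
    hits = n_rec = n_rel = 0
    for user, rel in relevant.items():
        rec = recommended.get(user, [])
        hits += len(set(rec) & rel)
        n_rec += len(rec)
        n_rel += len(rel)
    precision = hits / n_rec if n_rec else 0.0
    recall = hits / n_rel if n_rel else 0.0
    return precision, recall
```

A lower MAE indicates more accurate score prediction, while higher precision and recall indicate a better recommendation list.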
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (6)
1. A recommendation method combining a time window technique and a score prediction model, characterized in that the method is based on a collaborative filtering recommendation method and a time window technique; the score prediction model is used to calculate the interest similarity of users, the recommendation model is then used to predict the items a user is likely to prefer according to the user's interests, and the items with higher predicted scores are selected to generate a recommendation list using a TopN recommendation method;
the specific method comprises the following steps:
S1, constructing a user-item scoring matrix from the users' scores for the items, and normalizing the scores to obtain a normalized user-item scoring matrix;
S2, according to the score prediction model, computing the user feature matrix and the item feature matrix through the recommendation model, multiplying the feature matrices to obtain the users' predicted scores for unscored items, and obtaining the TopN2 recommendation list using the TopN recommendation method;
S3, restoring the predicted scores to the original score range according to the normalization rule to obtain a dense scoring matrix; dividing several different time windows using the time window technique; assigning time scales to the scored items according to the time windows; calculating the overall similarity between each predicted-score item of a user and the item set in each time window; and taking any time scale in the time window with the highest similarity as the time scale of the predicted scoring behavior for that item;
S4, calculating the interest similarity among users using a collaborative filtering algorithm, constructing a user interest similarity matrix, calculating the interest preference of the target user for resources, generating the TopN1 recommendation list from the N items of highest user interest using the TopN recommendation method, and fusing the TopN1 recommendation list with the TopN2 recommendation list to generate the TopN recommendation list;
the calculation model is used for predicting items which are possibly preferred by the user according to the interest preference of the user; the specific implementation method of the calculation model is as follows:
(1) inputting α, β and the scoring matrix R, where α and β are parameters of the estimation model; α denotes the degree of overlap of user characteristics, with 0 ≤ α ≤ 1; when α is close to 0, users tend toward the same characteristic, that is, user characteristics are relatively uniform; a larger α means that user characteristics tend to differ, that is, there are more user characteristics; β denotes the amount of information required for a given user characteristic to stand out, with β > 1, and a larger β means that more information is required for a given user characteristic to stand out;
(2) normalizing the scoring matrix R to generate a matrix R';
(3) randomly initializing the free parameters, namely $\gamma_{u,k}$ and the two item parameter matrices; $\gamma_{u,k}$ is a matrix of dimension u × k, and the two item parameter matrices each have dimension v × k; where u denotes a user, k denotes the dimension of the matrices, and v denotes an item;
(4) according to the real score records of user u, calculating $\lambda_{u,i,k}$ for the corresponding user u and item i:
where $\lambda_{u,i,k}$ denotes a parameter of the categorical distribution obeyed by the dependent variable $Z_{u,i}$, and $\lambda''_{u,i,k}$ denotes an intermediate quantity used in calculating $\lambda_{u,i,k}$;
where Γ(x) denotes the gamma function; Γ'(x) denotes the derivative of the gamma function; Y is a constant related to the highest score supported by the system and is fixed at 4; and $\rho_{u,i}$ denotes the conditional probability distribution that user u prefers item i;
(5) according to the real score records of user u, calculating and updating $\gamma_{u,k}$ for the corresponding user u:
where $\gamma_{u,k}$ denotes the relevant parameter of the Dirichlet distribution obeyed by user u;
(6) according to the real score records of user u, calculating and updating the parameters of the corresponding item i;
if yes, repeating the steps (4) to (6);
if not, executing the step (8);
(10) using the matrix R' completed by score prediction, generating the TopN2 recommendation list with the TopN recommendation method;
(11) dividing a time window according to a time window technology;
(12) calculating the similarity of the items:
where sim(i, j) denotes the similarity of item i and item j; W(u, i) and W(u, j) denote the combined weights of the time weight and the data weight; and the remaining terms denote, respectively, the average scores of item i and item j;
(13) calculating the comprehensive similarity between each item the user has not scored and the items in each time window:
where $I_{u,j}$ denotes the set of items in the j-th time window of user u, and size($I_{u,j}$) denotes the size of the set $I_{u,j}$;
(14) selecting the k' items with the highest similarity and assigning them time scales;
(15) converting the dense matrix with time scales into a three-dimensional user-item-time scoring matrix;
(16) obtaining the similarity between users:
where $t_i$ denotes the time weight of the corresponding user's score for the corresponding item and is obtained from the three-dimensional user-item-time matrix;
(17) obtaining the preference value of the user for the unscored items:
where S(u, k) denotes the k users whose interests are most similar to those of user u, N(i) denotes the set of users who have scored item i, and sim(u, v) denotes the interest similarity between users u and v;
(18) on the basis of the score prediction, selecting the items with higher predicted scores for the user to generate the TopN2 recommendation list, and combining the TopN2 and TopN1 recommendation lists by weighting to form a new TopN recommendation list, according to the formula:
$$\mathrm{TopN} = \varepsilon\,\mathrm{TopN1} + (1-\varepsilon)\,\mathrm{TopN2}$$
where ε lies between 0 and 1, and its optimal value differs between data sets.
2. The recommendation method combining the time window technique and the score prediction model according to claim 1, wherein the normalization of the scores in step S1 uses the following formula:
where u denotes a user; i denotes an item; M denotes the highest score value, determined by the range of scoring data the system supports; $r_{u,i}$ denotes the user's true score for the item; and the normalized value denotes the score obtained by normalizing the user's true score for the item.
3. The recommendation method combining the time window technique and the score prediction model according to claim 1 or 2, wherein the score prediction model is used to predict a user's scores for unscored items; the specific steps are as follows:
the real scores of user u are used in the calculation, where $\rho_{u,i}$ is computed from the normalized scoring result and is then used to determine the unknown variables and hence the feature vectors of the users and items; once the corresponding feature vectors of the users and items are obtained, the users' scores for unscored items can be obtained by matrix operations;
u denotes a user; i denotes an item; M denotes the highest score value, determined by the range of scoring data the system supports; $r_{u,i}$ denotes the user's true score for the item; the normalized value denotes the score obtained by normalizing the user's true score for the item; Y is a constant related to the highest score supported by the system and is fixed at 4; and $\rho_{u,i}$ denotes the conditional probability distribution that user u prefers item i.
4. The recommendation method combining the time window technique and the score prediction model according to claim 1, wherein the time windows in step (11) are divided according to the time window technique as follows:
$$T_{uk}(k) = T_{u0} - \theta (k-1) - k\alpha_1$$
where $\alpha_1$ denotes the size of the first time window and represents the duration of the user's interest; the larger its value, the larger the window; θ denotes the amount by which the interval of successive time windows increases, the window boundaries being $T_{u1}, T_{u2}, \ldots, T_{uk}$ in sequence; the larger the value of θ, the faster the user's interest changes, and vice versa.
5. The recommendation method combining the time window technique and the score prediction model according to claim 4, wherein different time windows correspond to different weights, and the Ebbinghaus forgetting curve is adopted as the time function:
$$f(u,i) = 0.318 \times (T_0 - T_{uk})^{-0.125}$$
where $T_{uk}$ denotes the time window in which user u accessed item i; the value of f(u, i) lies in [0, 1] and follows the forgetting pattern of fast decay at first and slower decay afterwards;
the interest degree of the target user in an item is then defined by combining the data weight:
$$W(u,i) = f(u,i) \times \beta_1 + w(u,i) \times (1 - \beta_1), \quad \beta_1 \in [0,1]$$
where w(u, i) denotes the interest degree of user u in item i during the most recent time period; I denotes the set of items the user accessed in the most recent time period ($T_{u1}$ to $T_{u0}$); and sim(i, j) denotes the similarity of item i and item j.
6. The recommendation method combining time window technique and score prediction model according to claim 5, wherein the calculation of the item similarity sim (i, j) employs a modified cosine similarity:
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811425529.2A CN109543109B (en) | 2018-11-27 | 2018-11-27 | Recommendation algorithm integrating time window technology and scoring prediction model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109543109A (en) | 2019-03-29 |
CN109543109B (en) | 2021-06-22 |
Family
ID=65851074
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201811425529.2A | Active | CN109543109B (en) | 2018-11-27 | 2018-11-27 | Recommendation algorithm integrating time window technology and scoring prediction model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109543109B (en) |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110544129A (en) * | 2019-09-05 | 2019-12-06 | 创新奇智(青岛)科技有限公司 | Personalized recommendation method for social e-commerce users |
CN112530520A (en) * | 2019-09-17 | 2021-03-19 | 中山大学 | CircRNA function prediction method based on scoring mechanism and LightGBM |
CN112860984A (en) * | 2019-11-27 | 2021-05-28 | 中移(苏州)软件技术有限公司 | Recommendation method, recommendation device and storage medium |
CN111310033B (en) * | 2020-01-23 | 2023-05-30 | 山西大学 | Recommendation method and recommendation device based on user interest drift |
CN111339435B (en) * | 2020-02-10 | 2022-09-23 | 南京邮电大学 | Matrix decomposition completion hybrid recommendation method based on potential factors |
CN111311324B (en) * | 2020-02-18 | 2022-05-20 | 电子科技大学 | User-commodity preference prediction system and method based on stable neural collaborative filtering |
CN111382361B (en) * | 2020-03-12 | 2023-05-02 | 腾讯科技(深圳)有限公司 | Information pushing method, device, storage medium and computer equipment |
CN111475744B (en) * | 2020-04-03 | 2022-06-14 | 南京理工大学紫金学院 | Personalized position recommendation method based on ensemble learning |
CN112069417A (en) * | 2020-08-24 | 2020-12-11 | 北京神舟航天软件技术有限公司 | Work breakdown structure WBS template recommendation method |
CN113011950A (en) * | 2021-03-30 | 2021-06-22 | 吉林亿联银行股份有限公司 | Product recommendation method and device |
CN113360759B (en) * | 2021-06-09 | 2023-08-25 | 南京大学 | Crowd measurement task recommendation method based on user and project dual time sequence correlation |
CN116028727B (en) * | 2023-03-30 | 2023-08-18 | 南京邮电大学 | Video recommendation method based on image data processing |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106339502A (en) * | 2016-09-18 | 2017-01-18 | 电子科技大学 | Modeling recommendation method based on user behavior data fragmentation cluster |
CN107729542A (en) * | 2017-10-31 | 2018-02-23 | 咪咕音乐有限公司 | A kind of information methods of marking and device and storage medium |
Non-Patent Citations (2)
Title |
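---|
"Improvement of a collaborative filtering recommendation algorithm based on time weighting"; Liu Qiao et al.; Computer Engineering and Design; 2016-07-16; Vol. 37, No. 7; pp. 1827-1830, 1872 * |
"Research on a personalized recommendation algorithm fusing tags and multi-source information"; Zhang Pengfei et al.; Computer Engineering and Applications; 2018-05-17; pp. 1-9 * |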
Also Published As
Publication number | Publication date |
---|---|
CN109543109A (en) | 2019-03-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109543109B (en) | Recommendation algorithm integrating time window technology and scoring prediction model | |
CN107330115B (en) | Information recommendation method and device | |
CN108648049B (en) | Sequence recommendation method based on user behavior difference modeling | |
CN107506480B (en) | Double-layer graph structure recommendation method based on comment mining and density clustering | |
CN107357793B (en) | Information recommendation method and device | |
CN109299994B (en) | Recommendation method, device, equipment and readable storage medium | |
CN111709812A (en) | E-commerce platform commodity recommendation method and system based on user dynamic classification | |
CN108509573B (en) | Book recommendation method and system based on matrix decomposition collaborative filtering algorithm | |
CN106251174A (en) | Information recommendation method and device | |
CN104063481A (en) | Film individuation recommendation method based on user real-time interest vectors | |
Eliyas et al. | Recommendation systems: Content-based filtering vs collaborative filtering | |
Jiao et al. | A novel learning rate function and its application on the SVD++ recommendation algorithm | |
Xu et al. | Personalized product recommendation method for analyzing user behavior using DeepFM | |
CN113536139B (en) | Content recommendation method and device based on interests, computer equipment and storage medium | |
Chung et al. | Categorization for grouping associative items using data mining in item-based collaborative filtering | |
CN109063120B (en) | Collaborative filtering recommendation method and device based on clustering | |
CN112396492A (en) | Conversation recommendation method based on graph attention network and bidirectional long-short term memory network | |
CN110727872A (en) | Method and device for mining ambiguous selection behavior based on implicit feedback | |
Zhou et al. | LsRec: Large-scale social recommendation with online update | |
Fareed et al. | A collaborative filtering recommendation framework utilizing social networks | |
CN117593089A (en) | Credit card recommendation method, apparatus, device, storage medium and program product | |
CN114581165A (en) | Product recommendation method, device, computer storage medium and system | |
CN113761084A (en) | POI search ranking model training method, ranking device, method and medium | |
Bharadhwaj | Layer-wise relevance propagation for explainable recommendations | |
Paul et al. | A weighted hybrid recommendation approach for user’s contentment using natural language processing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||