CN109543109B - Recommendation algorithm integrating time window technology and scoring prediction model - Google Patents


Publication number
CN109543109B
Authority
CN
China
Prior art keywords
user
item
scoring
score
recommendation
Legal status
Active
Application number
CN201811425529.2A
Other languages
Chinese (zh)
Other versions
CN109543109A (en)
Inventor
张志军
张鹏飞
潘华丽
Current Assignee
Shandong Jianzhu University
Original Assignee
Shandong Jianzhu University

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Item recommendations

Landscapes

  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a recommendation algorithm integrating a time window technique and a score prediction model. It belongs to the technical field of electronic commerce and addresses two problems of collaborative filtering recommendation algorithms: data sparsity and the change of user interest over time. The technical scheme is as follows: building on a collaborative filtering recommendation algorithm and a time window technique, a new score prediction model is used to complete the scoring matrix. To a certain extent, this model overcomes the excessive time complexity and non-unique prediction results of the traditional non-negative matrix factorization score prediction model, improves score prediction accuracy, and reduces the influence of data sparsity on the collaborative filtering algorithm. The recommendation model then predicts the items a user is likely to prefer according to the user's interests, and the items with the highest predicted scores are selected by the TopN recommendation method to generate a combined recommendation list, improving the recommendation quality of the algorithm.

Description

Recommendation algorithm integrating time window technology and scoring prediction model
Technical Field
The invention relates to the technical field of electronic commerce, in particular to a recommendation algorithm integrating a time window technology and a scoring prediction model.
Background
The rapid development of the Internet has caused the total amount of information to grow explosively, producing information overload: people are submerged in massive amounts of information. For example, Amazon has millions of books, and del. Recommendation systems have therefore received significant attention and research in both academia and industry.
Recommendation methods fall into two main categories: content-based algorithms and collaborative filtering. A content-based recommendation algorithm uses text processing to represent user interest as a multi-dimensional vector and extracts features from each item to build an item feature vector. Recommendations are made by computing the similarity between the user interest vector and the item feature vectors.
Collaborative filtering includes memory-based and model-based algorithms as well as various hybrid algorithms. A memory-based collaborative filtering algorithm first computes the similarity between users and between items from the users' past behavior records. It then recommends to a user either the items purchased and scored by highly similar users or items highly similar to the user's previous purchases. However, the data volume of a real e-commerce system is huge and the user scoring matrix is sparse, so traditional memory-based recommendation algorithms achieve low accuracy.
Mainstream recommendation algorithms include collaborative filtering recommendation and context-aware recommendation. Collaborative filtering is a classical approach and currently falls into three categories: memory-based, model-based, and hybrid, each meeting different requirements and applications. Its advantages are prominent: the model is general, simple to implement, and effective. Its shortcomings are equally obvious, such as the cold start problem and the change of user interest over time.
A model-based collaborative recommendation algorithm builds a user behavior model from the user's previous scores and various implicit preferences, and then predicts the user's scoring behavior from the model. A matrix factorization recommendation algorithm establishes k-dimensional feature vectors for n users and m items, factorizing the n × m scoring matrix into an n × k matrix and a k × m matrix; the score for an item is then the dot product of the user feature vector and the item feature vector. Many recommendation algorithms find neighbors by computing a global similarity between users, but users' interests may be similar only in certain respects; recommendation models based on Bayesian classification therefore classify the scores and use different similar users for each class. Among the many models in the recommendation field, matrix factorization models perform well, so many scholars work on personalized recommendation using matrix factorization, for example non-negative matrix factorization. The main disadvantages of such models are high time complexity, non-unique results, and the difficulty of reaching the global minimum.
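The dot-product prediction described for matrix factorization can be illustrated with a minimal sketch. The function name and the small feature matrices below are hypothetical values chosen for illustration; they do not come from the patent, and the item features are stored as an m × k list (the transpose of the k × m matrix mentioned above).

```python
def predict_scores(user_features, item_features):
    """Return the n x m matrix of dot products between user and item feature vectors."""
    return [
        [sum(u_f * v_f for u_f, v_f in zip(u, v)) for v in item_features]
        for u in user_features
    ]


# Hypothetical feature matrices: 2 users and 3 items with k = 2 features each.
U = [[0.5, 0.5], [1.0, 0.0]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

# R_hat[u][i] is the predicted score of user u for item i.
R_hat = predict_scores(U, V)
```

Each predicted score depends only on one row of the user matrix and one row of the item matrix, which is why a sparse scoring matrix can be completed once the two feature matrices are known.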
Probabilistic latent semantic analysis (PLSA) is also a model-based algorithm; it extracts hidden variables to model user preferences and can achieve relatively high accuracy. Most widely used recommendation algorithms are static models that simply aggregate the user's historical data and do not consider changes in user interest.
Recommendation systems and personalized customization services are widely used in electronic commerce as important means of overcoming data overload. However, how to deal with the high time complexity and non-unique results of traditional non-negative matrix factorization, and with the change of user interest over time in collaborative filtering recommendation algorithms, is a technical problem that urgently needs to be solved.
Disclosure of Invention
The technical task of the invention is to provide a recommendation algorithm integrating a time window technique and a score prediction model, so as to solve the high time complexity and non-unique results of traditional non-negative matrix factorization and the change of user interest over time in collaborative filtering recommendation algorithms.
The technical task of the invention is achieved as follows. A recommendation algorithm integrating a time window technique and a score prediction model is based on a collaborative filtering recommendation algorithm and a time window technique: the interest similarity of users is calculated using the score prediction model, the recommendation model then predicts the items a user is likely to prefer according to the user's interests, and the items with the highest predicted scores are selected to generate a recommendation list by the TopN recommendation method.
The specific method comprises the following steps:
S1, constructing a user-item scoring matrix from the users' scores for items, and normalizing the scores to obtain a normalized user-item scoring matrix;
S2, according to the score prediction model, calculating the user feature matrix and the item feature matrix through the recommendation model, multiplying the feature matrices to obtain the predicted scores of users for unscored items, and obtaining a TopN2 recommendation list by the TopN recommendation method;
S3, restoring the predicted scores to the original score range by inverting the normalization, to obtain a dense scoring matrix; dividing time into a plurality of time windows by the time window technique and assigning a time scale to each scored item according to its time window; calculating the overall similarity between each item with a predicted score and the item set in each time window of the user, and taking any time scale in the time window with the highest similarity as the time scale of the predicted scoring behavior for that item;
S4, calculating the interest similarity among users by a collaborative filtering algorithm and constructing a user interest similarity matrix; calculating the target user's interest preference for resources; generating a TopN1 recommendation list from the N items of highest user interest by the TopN recommendation method; and fusing the TopN1 recommendation list with the TopN2 recommendation list to generate the TopN recommendation list.
Preferably, the scores in step S1 are normalized according to the following formula:
Figure BDA0001881522420000031
where u represents a user; i represents an item; M represents the highest score value, determined by the range of scores supported by the system; r_{u,i} represents the user's actual score for the item;
Figure BDA0001881522420000032
represents the score obtained by normalizing the user's actual score for the item,
Figure BDA0001881522420000033
Preferably, the score prediction model is used to predict users' scores for unscored items, specifically as follows:
the actual score of user u can be calculated
Figure BDA0001881522420000034
where
Figure BDA0001881522420000035
ρ_{u,i} is used to compute the normalized scoring result and, from it, to determine the unknown variables and hence the feature vectors of the users and items. Once the feature vectors of the users and items are obtained, the users' scores for unscored items follow from a matrix operation.
In FIG. 2, α and β are the parameters of the model. α represents the degree of overlap of user features and takes values in [0, 1]: when α is close to 0, a user tends toward a single feature, that is, the user's features are relatively uniform; the larger α is, the more the user tends toward different features, that is, the more user features there are. β is greater than 1; the larger β is, the more information is required to establish that a given feature of a given user is prominent. k represents the dimension of the vectors; u represents a user and i represents an item. V represents the item feature vectors; the item feature matrix is initialized from a Beta distribution, V_{i,k} ~ Beta(β, β), with values in [0, 1]. U_u represents the feature vector of user u, which obeys a Dirichlet distribution, denoted
Figure BDA0001881522420000036
The vector (U_{u,1}, …, U_{u,k}) gives the components of user u in each dimension. Because a Dirichlet distribution is used,
Figure BDA0001881522420000037
absence of real or reshaped values. In the model, Z_{u,i} and ρ_{u,i} are random variables defined for each user u and item i. Z_{u,i} is a random variable obeying a categorical distribution,
Figure BDA0001881522420000038
and the value of Z_{u,i} represents user u's rating of item i. ρ_{u,i} is a random variable obeying a binomial distribution,
Figure BDA0001881522420000039
representing the user's confidence in a preferred item.
Preferably, the calculation model is used to predict the items a user is likely to prefer according to the user's interest preferences. The calculation model is implemented as follows:
(1) inputting α, β and the scoring matrix R, where α and β are the parameters of the estimation model. α represents the degree of overlap of user features, 0 ≤ α ≤ 1: when α is close to 0, a user tends toward a single feature, that is, the user's features are relatively uniform; the larger α is, the more the user tends toward different features, that is, the more user features there are. β represents the amount of information required for a given feature of a user to be considered prominent, β > 1; the larger β is, the more information is required;
(2) normalizing the scoring matrix R to generate the matrix R';
(3) randomly initializing the free parameters γ_{u,k},
Figure BDA0001881522420000041
and
Figure BDA00018815224200000418
γ_{u,k} is a matrix of dimension u × k, and
Figure BDA00018815224200000419
and
Figure BDA0001881522420000042
are each matrices of dimension v × k, where u indexes users, k is the feature dimension, and v indexes items;
(4) according to the actual score records of user u,
Figure BDA0001881522420000043
calculating λ_{u,i,k} for user u and item i:
Figure BDA0001881522420000044
where λ_{u,i,k} is the parameter of the categorical distribution obeyed by the variable Z_{u,i}, and λ'_{u,i,k} is an intermediate quantity used to calculate λ_{u,i,k};
Figure BDA0001881522420000046
Ψ is defined as the logarithmic derivative of the gamma function (the digamma function),
Figure BDA0001881522420000047
Figure BDA0001881522420000048
where Γ(x) is the gamma function and Γ'(x) is its derivative; Y is a constant related to the highest score supported by the system, fixed at 4; ρ_{u,i} represents the conditional probability distribution that user u prefers item i; x represents;
(5) according to the actual score records of user u,
Figure BDA0001881522420000049
calculating and updating γ_{u,k} for user u:
Figure BDA00018815224200000410
where γ_{u,k} is the parameter of the Dirichlet distribution obeyed by user u;
(6) according to the actual score records of user u,
Figure BDA00018815224200000411
calculating and updating, for item i,
Figure BDA00018815224200000412
Figure BDA00018815224200000413
Figure BDA00018815224200000414
(7) checking whether the parameter values of γ_{u,k},
Figure BDA00018815224200000415
and
Figure BDA00018815224200000416
have changed significantly:
if yes, repeating steps (4) to (6);
if not, executing step (8);
(8) calculating the user feature matrix U_{u,k}:
Figure BDA00018815224200000417
and calculating the item feature matrix V_{i,k}:
Figure BDA0001881522420000051
(9) calculating the predicted preference of user u for item i:
Figure BDA0001881522420000052
(10) generating the TopN2 recommendation list from the completed score prediction matrix R' by the TopN recommendation method;
(11) dividing time windows according to the time window technique;
(12) calculating the item similarity:
Figure BDA0001881522420000053
where sim(i, j) represents the similarity of item i and item j; W(u, i) and W(u, j) represent the combined weight of the time weight and the data weight;
Figure BDA0001881522420000054
and
Figure BDA0001881522420000055
represent the average scores of item i and item j, respectively, and
Figure BDA0001881522420000056
represents the average score of user u;
(13) calculating the overall similarity between each item unscored by the user and the items in each time window:
Figure BDA0001881522420000057
where I_{u,j} represents the set of items in the j-th time window of user u, and size(I_{u,j}) represents the size of the set I_{u,j};
(14) selecting the k' unscored items with the highest similarity and assigning them time scales (the optimal value of k' differs between data sets and must be determined experimentally);
(15) converting the dense matrix with time scales into a three-dimensional user-item-time scoring matrix;
(16) calculating the similarity between users:
Figure BDA0001881522420000061
where
Figure BDA0001881522420000062
t_i represents the time weight of the user's score for the item, obtained from the three-dimensional user-item-time matrix;
(17) calculating the user's preference value for unscored items:
Figure BDA0001881522420000063
where S(u, k) represents the k users whose interests are most similar to those of user u, N(i) represents the set of users who have scored item i, and sim(u, v) represents the interest similarity between users u and v;
(18) on the basis of the score prediction, selecting the items with the highest predicted scores to generate the TopN2 recommendation list, and combining the TopN1 and TopN2 recommendation lists by weighting to form a new TopN recommendation list, according to the formula:
TopN = ε × TopN1 + (1 − ε) × TopN2
where ε lies between 0 and 1, and the optimal ε differs between data sets.
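Steps (17) and (18) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the preference formula is taken to be the usual user-based collaborative filtering sum of sim(u, v) × r_{v,i} over users v in S(u, k) ∩ N(i), since the original formula is only available as an image; and the fusion is interpreted as a weighted sum of per-item scores from the two lists, with an item missing from a list contributing 0. The function names, data layout, and sample values are hypothetical.

```python
def predict_preference(target, item, sim, ratings, k):
    """Sum sim(target, v) * ratings[v][item] over v in S(target, k) ∩ N(item)."""
    # S(target, k): the k users most similar to the target user.
    neighbours = sorted(sim[target], key=lambda v: sim[target][v], reverse=True)[:k]
    # N(item): only neighbours who have actually scored the item contribute.
    return sum(
        sim[target][v] * ratings[v][item]
        for v in neighbours
        if item in ratings.get(v, {})
    )


def fuse_top_n(top_n1, top_n2, epsilon, n):
    """Rank items by epsilon * TopN1 score + (1 - epsilon) * TopN2 score."""
    items = set(top_n1) | set(top_n2)
    fused = {
        i: epsilon * top_n1.get(i, 0.0) + (1 - epsilon) * top_n2.get(i, 0.0)
        for i in items
    }
    # Highest fused score first; ties are broken by item id for determinism.
    return sorted(fused, key=lambda i: (-fused[i], i))[:n]


# Hypothetical data: similarities of user "u" to three other users and their scores.
sim = {"u": {"a": 0.9, "b": 0.5, "c": 0.1}}
ratings = {"a": {"x": 4}, "b": {"x": 2, "y": 5}, "c": {"y": 3}}
p_x = predict_preference("u", "x", sim, ratings, k=2)
p_y = predict_preference("u", "y", sim, ratings, k=2)

# Hypothetical per-item scores of the two lists, already on a comparable scale.
top_n1 = {"x": 0.8, "y": 0.4}
top_n2 = {"y": 0.9, "z": 0.6}
top_n = fuse_top_n(top_n1, top_n2, epsilon=0.5, n=2)
```

Because ε must be tuned per data set, it is kept as an explicit argument rather than a constant; the two score dictionaries would need to be scaled to a common range before fusion for the weighting to be meaningful.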
Preferably, the time windows in step (11) are divided according to the following formula:
$$T_{uk} = T_{u0} - \theta (k-1) - k \alpha_1$$
where α_1 is the size of the first time window and represents the duration of a user interest (the larger the value, the larger the window); θ is the increment applied to the interval between successive time windows, whose boundaries are T_{u1}, T_{u2}, …, T_{uk} in sequence; the larger θ is, the faster the user's interest changes, and vice versa.
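The window division formula can be illustrated with a short sketch. It assumes that the last term of the formula denotes k × α_1 and evaluates the formula literally for k = 1, 2, …; the function name and the sample values are hypothetical.

```python
def window_boundaries(t_u0, alpha1, theta, num_windows):
    """Evaluate T_uk = T_u0 - theta * (k - 1) - k * alpha1 for k = 1..num_windows."""
    return [t_u0 - theta * (k - 1) - k * alpha1 for k in range(1, num_windows + 1)]


# theta = 0 gives equally sized windows of length alpha1.
equal = window_boundaries(t_u0=100, alpha1=10, theta=0, num_windows=3)

# theta > 0 pushes later boundaries further back in time.
widening = window_boundaries(t_u0=100, alpha1=10, theta=5, num_windows=3)
```

With θ = 0 every window has length α_1, so the only parameter left to tune is the window size itself.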
More preferably, different time windows correspond to different weights, and the Ebbinghaus forgetting curve is adopted as the time function:
$$f(u,i) = 0.318 \times (T_0 - T_{uk})^{-0.125}$$
where T_{uk} represents the time window in which user u accessed item i; f(u, i) takes values in [0, 1] and exhibits a forgetting pattern that is fast at first and then slow;
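A minimal sketch of this time function follows. It assumes that the trailing −0.125 in the formula is an exponent, which is the reading consistent with a weight that decays quickly at first and then slowly; the function name and the rejection of non-positive elapsed time are illustrative choices, not taken from the patent.

```python
def forgetting_weight(t0, t_uk):
    """Time weight 0.318 * (t0 - t_uk) ** -0.125 for an item accessed in window t_uk."""
    elapsed = t0 - t_uk
    if elapsed <= 0:
        # The power law is undefined at zero elapsed time, so reject it explicitly.
        raise ValueError("t_uk must be earlier than t0")
    return 0.318 * elapsed ** -0.125


# Weights for items accessed 1, 10 and 100 time units before t0.
weights = [forgetting_weight(100, 100 - gap) for gap in (1, 10, 100)]
```

The weight drops most steeply for small elapsed times and flattens out afterwards, so recently accessed items dominate the time-weighted similarity.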
the target user's degree of interest in an item is then defined by combining the time weight with the data weight:
$$W(u,i) = f(u,i) \times \beta_1 + w(u,i) \times (1 - \beta_1), \quad \beta_1 \in [0, 1]$$
Figure BDA0001881522420000064
where w(u, i) represents the degree of interest of user u in item i during the most recent time period; I represents the set of items the user accessed during the most recent time period (T_{u1} to T_{u0}); sim(i, j) represents the similarity of item i and item j;
preferably, the item similarity sim(i, j) in w(u, i) is calculated using the adjusted cosine similarity:
Figure BDA0001881522420000071
where
Figure BDA0001881522420000072
and
Figure BDA0001881522420000073
represent the average scores of item i and item j, respectively, and
Figure BDA0001881522420000074
represents the average score of user u.
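For orientation, the textbook form of the adjusted cosine similarity can be sketched as below. This is an assumption-laden illustration: the patent's own formula is only available as an image and also refers to item average scores, so it may differ from this user-mean-centered version. The function name, the nested-dictionary layout, and the sample scores are hypothetical.

```python
from math import sqrt


def adjusted_cosine(ratings, i, j):
    """Adjusted cosine similarity of items i and j over users who scored both.

    ratings maps each user to a dict of {item: score}; every score is centered
    on that user's own mean score before the cosine is taken.
    """
    num = den_i = den_j = 0.0
    for user_scores in ratings.values():
        if i in user_scores and j in user_scores:
            mean = sum(user_scores.values()) / len(user_scores)
            d_i = user_scores[i] - mean
            d_j = user_scores[j] - mean
            num += d_i * d_j
            den_i += d_i * d_i
            den_j += d_j * d_j
    if den_i == 0 or den_j == 0:
        # No co-rating users, or no variation around the user means.
        return 0.0
    return num / (sqrt(den_i) * sqrt(den_j))


# Hypothetical scores: items "a" and "b" are liked by the same users, "c" by others.
ratings = {
    "u1": {"a": 5, "b": 4, "c": 1},
    "u2": {"a": 4, "b": 5, "c": 2},
    "u3": {"a": 1, "b": 2, "c": 5},
}
sim_ab = adjusted_cosine(ratings, "a", "b")
sim_ac = adjusted_cosine(ratings, "a", "c")
```

Centering on each user's mean removes the effect of users who score systematically high or low, which plain cosine similarity does not account for.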
The recommendation algorithm integrating the time window technique and the score prediction model has the following advantages:
First, the method consists mainly of a score prediction model and a collaborative filtering recommendation model, with a time window technique used for time processing. This mitigates the influence of data sparsity on the collaborative filtering algorithm, and the introduction of the time window technique makes the recommendation results track changes in user interest more closely;
Second, compared with traditional matrix factorization algorithms, the method adopts a new score prediction model that greatly improves score prediction;
Third, time is divided into windows by the time window technique, and the influence of time on user interest is taken into account, making the recommendation results more reasonable;
Fourth, combining the score prediction model with the time window technique and the collaborative filtering algorithm addresses the cold start problem of collaborative filtering and reduces the influence of data sparsity on it; in particular, adding the time window technique lets the collaborative filtering algorithm adapt to changes in user interest, improving the recommendation quality;
Fifth, the new non-negative matrix factorization algorithm and the time window technique are fused into the collaborative filtering algorithm, combining the advantages of both to overcome the shortcomings of traditional non-negative matrix factorization and the change of user interest over time, so that the collaborative filtering algorithm adapts to changes in user interest and the recommendation quality improves.
Drawings
The invention is further described below with reference to the accompanying drawings.
FIG. 1 is a flow diagram of the invention;
FIG. 2 is a diagram of the relationships in the score prediction model;
FIG. 3 is a schematic diagram of the time window division;
FIG. 4 is a graph of accuracy;
FIG. 5 is a graph of recall.
Detailed Description
A recommendation algorithm incorporating a time window technique and a score prediction model according to the present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The first embodiment is as follows:
The recommendation algorithm integrating the time window technique and the score prediction model is based on a collaborative filtering recommendation algorithm and a time window technique: the interest similarity of users is calculated using the score prediction model, the recommendation model then predicts the items a user is likely to prefer according to the user's interests, and the items with the highest predicted scores are selected to generate a TopN recommendation list by the TopN recommendation method.
The specific method comprises the following steps:
S1, constructing a user-item scoring matrix from the users' scores for items, and normalizing the scores to obtain a normalized user-item scoring matrix;
S2, according to the score prediction model, calculating the user feature matrix and the item feature matrix through the recommendation model, multiplying the feature matrices to obtain the predicted scores of users for unscored items, and obtaining a TopN2 recommendation list by the TopN recommendation method;
the scores are normalized according to the following formula:
Figure BDA0001881522420000081
where u represents a user; i represents an item; M represents the highest score value, determined by the range of scores supported by the system; r_{u,i} represents the user's actual score for the item;
Figure BDA0001881522420000082
represents the score obtained by normalizing the user's actual score for the item,
Figure BDA0001881522420000083
S3, restoring the predicted scores to the original score range by inverting the normalization, to obtain a dense scoring matrix; dividing time into a plurality of time windows by the time window technique and assigning a time scale to each scored item according to its time window; calculating the overall similarity between each item with a predicted score and the item set in each time window of the user, and taking any time scale in the time window with the highest similarity as the time scale of the predicted scoring behavior for that item. The scoring matrix is completed by the score prediction model to obtain a dense scoring matrix, and on this basis time scales are assigned to the scores in the matrix. According to related research, user interest is time-sensitive: it changes over time but remains essentially unchanged in the short term. Following the time curve modeled on human forgetting that is proposed in related research, the time of the user's earliest scored item is set to 0 and the time of the latest scored item is set to T_{u0}; the period from 0 to T_{u0} is divided into a plurality of time segments, as shown in FIG. 3.
S4, calculating the interest similarity among users by a collaborative filtering algorithm and constructing a user interest similarity matrix; calculating the target user's interest preference for resources; generating a TopN1 recommendation list from the N items of highest user interest by the TopN recommendation method; and fusing the TopN1 recommendation list with the TopN2 recommendation list to generate the TopN recommendation list.
The score prediction model is used to predict users' scores for unscored items, specifically as follows:
the actual score of user u can be calculated
Figure BDA0001881522420000091
where
Figure BDA0001881522420000092
ρ_{u,i} is used to compute the normalized scoring result and, from it, to determine the unknown variables and hence the feature vectors of the users and items. Once the feature vectors of the users and items are obtained, the users' scores for unscored items follow from a matrix operation.
As shown in FIG. 2, α and β are the parameters of the model. α represents the degree of overlap of user features and takes values in [0, 1]: when α is close to 0, a user tends toward a single feature, that is, the user's features are relatively uniform; the larger α is, the more the user tends toward different features, that is, the more user features there are. β is greater than 1; the larger β is, the more information is required to establish that a given feature of a given user is prominent. k represents the dimension of the vectors;
u represents a user and i represents an item. V represents the item feature vectors; the item feature matrix is initialized from a Beta distribution, V_{i,k} ~ Beta(β, β), with values in [0, 1]. U_u represents the feature vector of user u, which obeys a Dirichlet distribution, denoted
Figure BDA0001881522420000093
The vector (U_{u,1}, …, U_{u,k}) gives the components of user u in each dimension. Because a Dirichlet distribution is used,
Figure BDA0001881522420000094
absence of real or reshaped values. In the model, Z_{u,i} and ρ_{u,i} are random variables defined for each user u and item i. Z_{u,i} is a random variable obeying a categorical distribution,
Figure BDA0001881522420000095
and the value of Z_{u,i} represents user u's rating of item i. ρ_{u,i} is a random variable obeying a binomial distribution,
Figure BDA0001881522420000096
representing the user's confidence in a preferred item.
As shown in FIG. 1, the calculation model is used to predict the items a user is likely to prefer according to the user's interest preferences. The calculation model is implemented as follows:
(1) inputting α, β and the scoring matrix R, where α and β are the parameters of the estimation model. α represents the degree of overlap of user features, 0 ≤ α ≤ 1: when α is close to 0, a user tends toward a single feature, that is, the user's features are relatively uniform; the larger α is, the more the user tends toward different features, that is, the more user features there are. β represents the amount of information required for a given feature of a user to be considered prominent, β > 1; the larger β is, the more information is required;
(2) normalizing the scoring matrix R to generate the matrix R';
(3) randomly initializing the free parameters γ_{u,k},
Figure BDA0001881522420000097
and
Figure BDA0001881522420000098
γ_{u,k} is a matrix of dimension u × k, and
Figure BDA0001881522420000099
and
Figure BDA00018815224200000910
are each matrices of dimension v × k, where u indexes users, k is the feature dimension, and v indexes items;
(4) according to the actual score records of user u,
Figure BDA00018815224200000911
calculating λ_{u,i,k} for user u and item i:
Figure BDA0001881522420000101
where λ_{u,i,k} is the parameter of the categorical distribution obeyed by the variable Z_{u,i}, and λ'_{u,i,k} is an intermediate quantity used to calculate λ_{u,i,k};
Figure BDA0001881522420000102
Ψ in the above equation is defined as the logarithmic derivative of the gamma function (the digamma function),
Figure BDA0001881522420000103
Figure BDA0001881522420000104
where Γ(x) is the gamma function and Γ'(x) is its derivative; Y is a constant related to the highest score supported by the system, fixed at 4; ρ_{u,i} represents the conditional probability distribution that user u prefers item i; x represents;
(5) according to the real score records of user u
Figure BDA0001881522420000105
calculating and updating γ_{u,k} for the corresponding user u:
Figure BDA0001881522420000106
wherein γ_{u,k} denotes the parameters of the Dirichlet distribution obeyed by user u;
(6) according to the real score records of user u
Figure BDA0001881522420000107
calculating and updating the following for the corresponding item i:
Figure BDA0001881522420000108
Figure BDA0001881522420000109
Figure BDA00018815224200001010
(7) determining whether the values of the parameters γ_{u,k},
Figure BDA00018815224200001011
and
Figure BDA00018815224200001012
have changed significantly:
if yes, repeating the steps (4) to (6);
if not, executing the step (8);
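The loop formed by steps (4) to (7) can be sketched as follows. This is only an illustration: the source does not say what counts as a "significant" change, so the tolerance `tol`, the iteration cap `max_iter`, and the function names are assumptions, and `update` stands in for the parameter updates of steps (4) to (6).

```python
from typing import Callable, List, Tuple

Matrix = List[List[float]]


def max_abs_change(old: Matrix, new: Matrix) -> float:
    """Largest element-wise absolute difference between two same-shaped matrices."""
    return max(
        abs(a - b)
        for row_old, row_new in zip(old, new)
        for a, b in zip(row_old, row_new)
    )


def iterate_until_stable(
    params: List[Matrix],
    update: Callable[[List[Matrix]], List[Matrix]],
    tol: float = 1e-4,      # assumed threshold for a "significant" change
    max_iter: int = 100,    # assumed safety cap on the number of passes
) -> Tuple[List[Matrix], int]:
    """Repeat the update until no parameter moves by tol or more."""
    iterations = 0
    while iterations < max_iter:
        new_params = update(params)  # stands in for steps (4) to (6)
        iterations += 1
        change = max(max_abs_change(o, n) for o, n in zip(params, new_params))
        params = new_params
        if change < tol:             # no significant change: continue with step (8)
            break
    return params, iterations
```

A maximum absolute difference is used here for simplicity; a relative change or a change in the objective value would serve equally well as a stopping test.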
(8) calculating the user feature matrix U_{u,k}:
Figure BDA00018815224200001013
calculating the item feature matrix V_{i,k}:
Figure BDA00018815224200001014
(9) calculating the predicted preference of user u for item i:
Figure BDA00018815224200001015
(10) completing the matrix R' with the predicted scores, and generating the TopN2 recommendation list from the completed matrix by using the TopN recommendation method;
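A minimal sketch of a TopN selection over one user's row of the completed matrix is given below. The data layout (a dictionary of predicted scores per item) and the rule of skipping items the user has already scored are assumptions made for illustration; the source only states that a TopN recommendation method is used.

```python
from typing import Dict, List, Set


def top_n(predicted: Dict[str, float], already_scored: Set[str], n: int) -> List[str]:
    """Return the n unscored items with the highest predicted score."""
    candidates = [
        (item, score)
        for item, score in predicted.items()
        if item not in already_scored
    ]
    # Sort by descending score; ties are broken by item id for determinism.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in candidates[:n]]
```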
(11) dividing time windows according to the time window technology; the time windows are calculated as follows:
T_{uk}(k) = T_{u0} - θ(k - 1) - kα_1
wherein α_1 denotes the size of the first time window and reflects the duration of the user's interest: the larger the value, the larger the window; θ denotes the amount by which the interval between the successive time window boundaries T_{u1}, T_{u2}, …, T_{uk} increases: the larger the value, the faster the user's interest changes, and vice versa. In this embodiment θ = 0, that is, the time windows are divided equally.
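The boundary formula can be evaluated directly, as in the sketch below. The function names, the time unit, and the convention that window k covers the half-open interval (T_uk, T_u(k-1)] are assumptions introduced for illustration.

```python
from typing import List, Optional


def window_boundaries(t_u0: float, alpha_1: float, theta: float, num_windows: int) -> List[float]:
    """Boundaries T_u1..T_uk from T_uk(k) = T_u0 - theta*(k-1) - k*alpha_1."""
    return [t_u0 - theta * (k - 1) - k * alpha_1 for k in range(1, num_windows + 1)]


def window_index(t: float, t_u0: float, boundaries: List[float]) -> Optional[int]:
    """1-based index of the window containing t, or None if t lies outside all windows.

    Assumes window k covers (T_uk, T_u(k-1)], with T_u0 as the most recent time.
    """
    if t > t_u0:
        return None
    for k, lower in enumerate(boundaries, start=1):
        if t > lower:
            return k
    return None
```

With θ = 0, as in this embodiment, every window has the same length α_1.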
Different time windows correspond to different weights, and the Ebbinghaus forgetting curve is adopted as the time function:
f(u,i) = 0.318 × (T_0 - T_{uk})^{-0.125}
wherein T_{uk} denotes the time window in which user u accessed item i; the value of f(u, i) lies in [0, 1] and follows a forgetting pattern that is fast at first and then slow;
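The time function can be sketched as below, reading the trailing "-0.125" as an exponent. The clipping to 1.0 and the rejection of non-positive elapsed times are assumptions added so that the result stays within the stated range [0, 1]; the source does not specify the time unit.

```python
def forgetting_weight(t_0: float, t_uk: float) -> float:
    """Time weight f(u, i) = 0.318 * (T_0 - T_uk) ** -0.125."""
    elapsed = t_0 - t_uk
    if elapsed <= 0:
        raise ValueError("T_0 must be later than T_uk")
    # Clip so that very small elapsed times cannot push the weight above 1.
    return min(1.0, 0.318 * elapsed ** -0.125)
```

The weight drops quickly for small elapsed times and flattens afterwards, which matches the "fast at first, then slow" behaviour described above.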
the degree of interest of the target user in an item is defined by combining the data weight:
W(u,i) = f(u,i) × β_1 + w(u,i) × (1 - β_1), β_1 ∈ [0, 1]
Figure BDA0001881522420000111
wherein w(u, i) denotes the degree of interest of user u in item i in the most recent time period; I denotes the set of items accessed by the user in the most recent time period (T_{u1} ~ T_{u0}); sim(i, j) denotes the similarity between item i and item j, and is calculated using the modified (adjusted) cosine similarity:
Figure BDA0001881522420000112
wherein
Figure BDA0001881522420000113
and
Figure BDA0001881522420000114
denote the average scores of item i and item j, respectively, and
Figure BDA0001881522420000115
denotes the average of the scores given by user u.
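As an illustration of step (11), the sketch below implements the combined weight W(u, i) exactly as written above, together with the textbook adjusted cosine similarity, which centres each score on the scoring user's mean. The patent's own similarity formula is an image that is not reproduced in this text, so the similarity function here is an assumption and may differ from it in detail; the nested-dictionary data layout is also hypothetical.

```python
import math
from typing import Dict

Ratings = Dict[str, Dict[str, float]]  # user -> item -> score


def adjusted_cosine(ratings: Ratings, i: str, j: str) -> float:
    """Textbook adjusted cosine similarity over users who scored both items."""
    num = den_i = den_j = 0.0
    for scores in ratings.values():
        if i in scores and j in scores:
            mean_u = sum(scores.values()) / len(scores)  # user's average score
            d_i = scores[i] - mean_u
            d_j = scores[j] - mean_u
            num += d_i * d_j
            den_i += d_i * d_i
            den_j += d_j * d_j
    if den_i == 0.0 or den_j == 0.0:
        return 0.0  # no co-rating users, or no variation to compare
    return num / math.sqrt(den_i * den_j)


def combined_weight(f_ui: float, w_ui: float, beta_1: float) -> float:
    """W(u, i) = f(u, i) * beta_1 + w(u, i) * (1 - beta_1), with beta_1 in [0, 1]."""
    if not 0.0 <= beta_1 <= 1.0:
        raise ValueError("beta_1 must lie in [0, 1]")
    return f_ui * beta_1 + w_ui * (1.0 - beta_1)
```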
(12) calculating the similarity between items:
Figure BDA0001881522420000116
wherein sim(i, j) denotes the similarity between item i and item j; W(u, i) and W(u, j) denote the combined weight of the time weight and the data weight;
Figure BDA0001881522420000117
and
Figure BDA0001881522420000118
denote the average scores of item i and item j, respectively, and
Figure BDA0001881522420000119
denotes the average score given by user u;
(13) calculating the overall similarity between each item not scored by the user and the items in each time window:
Figure BDA0001881522420000121
wherein I_{u,j} denotes the set of items in the j-th time window of user u, and size(I_{u,j}) denotes the size of the set I_{u,j};
(14) selecting the top k' unscored items with the highest similarity and assigning them time scales (the optimal value of k' differs between data sets and must be determined experimentally);
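Steps (13) and (14) can be sketched as follows. Since the formula image is not reproduced here, the sketch assumes that the overall similarity is the mean pairwise similarity between the item and the members of I_{u,j} (suggested by the size(I_{u,j}) term) and that an item receives the time scale of the window it matches best; both points, and all names, are assumptions.

```python
from typing import Callable, Dict, List, Tuple

Sim = Callable[[str, str], float]


def set_similarity(item: str, window_items: List[str], sim: Sim) -> float:
    """Mean similarity between one item and the items of a time window."""
    if not window_items:
        return 0.0
    return sum(sim(item, other) for other in window_items) / len(window_items)


def best_window(item: str, windows: Dict[int, List[str]], sim: Sim) -> Tuple[int, float]:
    """Return (window index, similarity) of the window most similar to the item."""
    scored = [(set_similarity(item, items, sim), w) for w, items in windows.items()]
    # Highest similarity first; ties go to the lower window index.
    best_sim, best_w = min(scored, key=lambda pair: (-pair[0], pair[1]))
    return best_w, best_sim
```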
(15) converting the dense matrix with time scales into a user-item-time three-dimensional scoring matrix;
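One possible layout for the three-dimensional matrix is a nested list indexed as tensor[user][item][window]. The sketch below assumes zero-based integer indices and uses 0.0 for empty cells; neither convention is stated in the source.

```python
from typing import Iterable, List, Tuple

Entry = Tuple[int, int, int, float]  # (user index, item index, window index, score)


def to_user_item_time(
    entries: Iterable[Entry], num_users: int, num_items: int, num_windows: int
) -> List[List[List[float]]]:
    """Scatter time-scaled scores into a user x item x window tensor."""
    tensor = [
        [[0.0] * num_windows for _ in range(num_items)]
        for _ in range(num_users)
    ]
    for user, item, window, score in entries:
        tensor[user][item][window] = score
    return tensor
```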
(16) obtaining the similarity between users, the user interest similarity being calculated with a time-weighted Pearson correlation:
Figure BDA0001881522420000122
wherein a logistic function is used as the weight function, so that different time windows receive different weights and scores within the same time window receive the same weight; the formula is as follows:
Figure BDA0001881522420000123
t_i denotes the time weight of the corresponding user's score for the corresponding item, obtained from the user-item-time three-dimensional matrix;
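A time-weighted Pearson correlation of the kind described in step (16) might look like the sketch below. The exact formulas are images that are not reproduced in this text, so two assumptions are made: each centred score is multiplied by its time weight before correlating, and the logistic weight decreases with the window index so that more recent windows count more. The `midpoint` and `steepness` parameters are hypothetical.

```python
import math
from typing import Dict, Tuple

Scored = Dict[str, Tuple[float, float]]  # item -> (score, time weight)


def logistic_weight(window: float, midpoint: float = 3.5, steepness: float = 1.0) -> float:
    """Logistic weight that is larger for more recent (lower-index) windows."""
    return 1.0 / (1.0 + math.exp(steepness * (window - midpoint)))


def weighted_pearson(u: Scored, v: Scored) -> float:
    """Pearson-style correlation over co-scored items, with time-weighted deviations."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    mean_u = sum(score for score, _ in u.values()) / len(u)
    mean_v = sum(score for score, _ in v.values()) / len(v)
    num = den_u = den_v = 0.0
    for item in common:
        d_u = (u[item][0] - mean_u) * u[item][1]
        d_v = (v[item][0] - mean_v) * v[item][1]
        num += d_u * d_v
        den_u += d_u * d_u
        den_v += d_v * d_v
    if den_u == 0.0 or den_v == 0.0:
        return 0.0
    return num / math.sqrt(den_u * den_v)
```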
(17) obtaining the user's preference value for unscored items:
Figure BDA0001881522420000124
S(u, k) denotes the k users whose interests are most similar to those of user u, N(i) denotes the set of users who have scored item i, and sim(u, v) denotes the interest similarity between users u and v;
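Using the symbols S(u, k), N(i) and sim(u, v) defined above, a common user-based collaborative filtering form sums sim(u, v) × r_{v,i} over the users v in S(u, k) ∩ N(i). The sketch below implements that form; whether the patent's formula image is identical (for example, whether it adds a normalisation term) is not known from this text, so treat it as an assumption.

```python
from typing import Dict


def preference(
    user: str,
    item: str,
    sims: Dict[str, Dict[str, float]],     # sims[u][v] = sim(u, v)
    ratings: Dict[str, Dict[str, float]],  # ratings[v][i] = score of v for i
    k: int,
) -> float:
    """Sum of sim(u, v) * r_vi over v in S(u, k) that have scored the item."""
    # S(u, k): the k users most similar to `user`, ties broken by user id.
    neighbours = sorted(sims[user].items(), key=lambda pair: (-pair[1], pair[0]))[:k]
    return sum(
        similarity * ratings[v][item]
        for v, similarity in neighbours
        if item in ratings.get(v, {})      # v must belong to N(i)
    )
```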
(18) on the basis of the score prediction, selecting the items with higher predicted scores for the user to generate the TopN2 recommendation list, and combining the TopN2 and TopN1 recommendation lists by weighting to form the new TopN recommendation list, according to the formula:
TopN = ε × TopN1 + (1 - ε) × TopN2
wherein ε lies between 0 and 1, and the optimal value of ε differs between data sets.
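The source does not spell out how two lists are combined by weighting. One plausible reading, sketched below as an assumption, is that each list carries a per-item score, the scores are blended with ε, items missing from a list contribute 0, and the N best blended items form the final list.

```python
from typing import Dict, List


def fuse_lists(
    top_n1: Dict[str, float],  # item -> score in the TopN1 list
    top_n2: Dict[str, float],  # item -> score in the TopN2 list
    epsilon: float,
    n: int,
) -> List[str]:
    """Blend two scored lists as epsilon*TopN1 + (1 - epsilon)*TopN2 and keep the top n."""
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    fused = {
        item: epsilon * top_n1.get(item, 0.0) + (1.0 - epsilon) * top_n2.get(item, 0.0)
        for item in set(top_n1) | set(top_n2)
    }
    # Descending fused score; ties broken by item id for determinism.
    return sorted(fused, key=lambda item: (-fused[item], item))[:n]
```

For the blend to be meaningful the two lists' scores should be on a comparable scale, for example both normalised to [0, 1].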
Example two: detailed description of the invention
A. The Netflix, MovieLens 20M, MovieLens 10M, MovieLens 1M and Epinions data sets are used; a user-item scoring matrix is constructed from the users' scores for the items, and the scores are normalized to obtain a normalized scoring matrix;
B. the predicted scores of the users for the items are obtained through the score prediction model, and the TopN2 recommendation list is generated; the accuracy of the predicted scores is measured by MAE and CMAE, calculated as follows:
Figure BDA0001881522420000131
Figure BDA0001881522420000132
the results are shown in the following table:
Figure BDA0001881522420000133
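For orientation, the mean absolute error (MAE) in its standard form is the average of |predicted score - real score| over the test scores. The sketch below shows only this standard definition; CMAE is not sketched because its formula is not reproduced in this text.

```python
from typing import Iterable, Tuple


def mean_absolute_error(pairs: Iterable[Tuple[float, float]]) -> float:
    """Standard MAE over (predicted, real) score pairs."""
    errors = [abs(predicted - real) for predicted, real in pairs]
    if not errors:
        raise ValueError("at least one score pair is required")
    return sum(errors) / len(errors)
```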
C. the scoring matrix is completed through score prediction to obtain a dense user scoring matrix; the time window technology is applied (the number of time windows is set to 6 according to related research results, divided equally); by calculating the similarity sim(i, j) between items, the overall similarity sim(i, I_{u,j}) between each item with a predicted score and the item set in each time window is obtained; the top k' unscored items with the highest similarity are selected and assigned a time scale;
D. finally, the dense matrix with time information is converted into a user-item-time three-dimensional scoring matrix; on this basis the user interest similarity sim(u, v) is calculated, the user's degree of interest in each item is then calculated, and the TopN1 recommendation list is generated; the final TopN recommendation list is then generated and fed back to the user. The effect of the recommendation list is measured by precision and recall, as shown in the attached figures 4 and 5.
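The two list-quality measures named above have standard definitions for a TopN list: precision is the share of recommended items that are relevant, and recall is the share of relevant items that were recommended. The sketch below assumes that "relevant" means the items the user actually interacted with in the test set, which the source does not state explicitly.

```python
from typing import List, Set, Tuple


def precision_recall(recommended: List[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision and recall of one recommendation list against a relevant set."""
    hits = len(set(recommended) & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```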
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (6)

1. A recommendation method combining a time window technology and a score prediction model is characterized in that the method is based on a collaborative filtering recommendation method and a time window technology, the score prediction model is used for calculating interest similarity of users, then the recommendation model is used for predicting items which the users are likely to prefer according to the interests of the users, and the items with higher user prediction scores are selected to produce a recommendation list by using a TopN recommendation method;
the specific method comprises the following steps:
s1, constructing a user-item scoring matrix through the scoring of the item by the user, and carrying out normalization processing on the scoring to obtain a normalized user-item scoring matrix;
s2, according to the score prediction model, calculating and obtaining a feature matrix of the user and a feature matrix of the project through the recommendation model, multiplying the feature matrices to obtain the prediction score of the user on the unscored project, and obtaining a TopN2 recommendation list by using a TopN recommendation method;
s3, restoring the obtained prediction scores to original scores according to a normalization processing principle to obtain a dense score matrix, dividing a plurality of different time windows by adopting a time window technology, endowing time scales to the scored items according to the time windows, calculating the overall similarity between each user prediction scoring item and the item set in each time window, and taking any time scale in the time window with the highest similarity as the time scale for predicting the article scoring behavior;
s4, calculating interest similarity among users by adopting a collaborative filtering algorithm, constructing a user interest similarity matrix, calculating interest preference of a target user on resources, producing a TopN1 recommendation list by utilizing the top N items with highest user interest by using a TopN recommendation method, and fusing the TopN1 recommendation list with a TopN2 recommendation list to generate a TopN recommendation list;
the calculation model is used for predicting items which are possibly preferred by the user according to the interest preference of the user; the specific implementation method of the calculation model is as follows:
(1) inputting α, β and a scoring matrix R; wherein α and β are parameters of the estimation model; α represents the degree of overlap of user characteristics, with 0 ≤ α ≤ 1; when α is close to 0, users tend toward the same characteristic, that is, user characteristics are relatively uniform; when α is larger, user characteristics tend to differ, that is, users exhibit more characteristics; β represents the amount of information required for a given user characteristic to become prominent, with β > 1, and a larger β means that more information is required for a given user characteristic to become prominent;
(2) normalizing the scoring matrix R to generate a matrix R';
(3) randomly initializing the free parameters γ_{u,k},
Figure FDA0003025876130000011
and
Figure FDA0003025876130000012
wherein γ_{u,k} is a matrix of dimension u × k, and
Figure FDA0003025876130000013
and
Figure FDA0003025876130000014
are each matrices of dimension v × k; here u denotes users, k denotes the feature dimension of the matrices, and v denotes items;
(4) according to the real score records of user u
Figure FDA0003025876130000015
calculating λ_{u,i,k} for the corresponding user u and item i:
Figure FDA0003025876130000021
wherein λ_{u,i,k} denotes the parameter of the categorical distribution obeyed by the dependent variable Z_{u,i}, and λ'_{u,i,k} denotes an intermediate quantity used to calculate λ_{u,i,k};
Figure FDA0003025876130000022
Ψ in the above equation is defined as the logarithmic derivative of the gamma function,
Figure FDA0003025876130000023
Figure FDA0003025876130000024
wherein Γ(x) denotes the gamma function; Γ'(x) denotes the derivative of the gamma function; Y denotes the highest score supported by the system and is a constant fixed at 4; ρ_{u,i} denotes the conditional probability distribution of user u preferring item i;
(5) according to the real score records of user u
Figure FDA0003025876130000025
calculating and updating γ_{u,k} for the corresponding user u:
Figure FDA0003025876130000026
wherein γ_{u,k} denotes the parameters of the Dirichlet distribution obeyed by user u;
(6) according to the real score records of user u
Figure FDA0003025876130000027
calculating and updating the following for the corresponding item i:
Figure FDA0003025876130000028
Figure FDA0003025876130000029
Figure FDA00030258761300000210
(7) determining whether the values of the parameters γ_{u,k},
Figure FDA00030258761300000211
and
Figure FDA00030258761300000212
have changed significantly:
if yes, repeating the steps (4) to (6);
if not, executing the step (8);
(8) calculating the user feature matrix U_{u,k}:
Figure FDA00030258761300000213
calculating the item feature matrix V_{i,k}:
Figure FDA00030258761300000214
(9) calculating the predicted preference of user u for item i:
Figure FDA0003025876130000031
(10) completing the matrix R' with the predicted scores, and generating the TopN2 recommendation list from the completed matrix by using the TopN recommendation method;
(11) dividing time windows according to the time window technology;
(12) calculating the similarity between items:
Figure FDA0003025876130000032
wherein sim(i, j) denotes the similarity between item i and item j; W(u, i) and W(u, j) denote the combined weight of the time weight and the data weight;
Figure FDA0003025876130000033
and
Figure FDA0003025876130000034
denote the average scores of item i and item j, respectively;
(13) calculating the overall similarity between each item not scored by the user and the items in each time window:
Figure FDA0003025876130000035
wherein I_{u,j} denotes the set of items in the j-th time window of user u, and size(I_{u,j}) denotes the size of the set I_{u,j};
(14) selecting the top k' items with the highest similarity and assigning them time scales;
(15) converting the dense matrix with time scales into a user-item-time three-dimensional scoring matrix;
(16) obtaining the similarity between users:
Figure FDA0003025876130000036
wherein
Figure FDA0003025876130000037
t_i denotes the time weight of the corresponding user's score for the corresponding item, obtained from the user-item-time three-dimensional matrix;
(17) obtaining the user's preference value for unscored items:
Figure FDA0003025876130000041
S(u, k) denotes the k users whose interests are most similar to those of user u, N(i) denotes the set of users who have scored item i, and sim(u, v) denotes the interest similarity between users u and v;
(18) on the basis of the score prediction, selecting the items with higher predicted scores for the user to generate the TopN2 recommendation list, and combining the TopN2 and TopN1 recommendation lists by weighting to form the new TopN recommendation list, according to the formula:
TopN = ε × TopN1 + (1 - ε) × TopN2
wherein ε lies between 0 and 1, and the optimal value of ε differs between data sets.
2. The recommendation method combining the time window technique and the score prediction model according to claim 1, wherein the normalization processing of the score in step S1 is performed according to the following specific formula:
Figure FDA0003025876130000042
wherein u denotes a user; i denotes an item; M denotes the highest score value, determined by the range of scoring data supported by the system; r_{u,i} denotes the user's real score for the item;
Figure FDA0003025876130000043
denotes the score obtained by normalizing the user's real score for the item,
Figure FDA0003025876130000044
3. The recommendation method combining the time window technology and the score prediction model according to claim 1 or 2, wherein the score prediction model is used for predicting the user's scores for unscored items; the specific steps are as follows:
the real score of user u can be calculated as
Figure FDA0003025876130000045
wherein
Figure FDA0003025876130000046
ρ_{u,i} is used to calculate the normalized scoring result and thereby determine the unknown variables, so as to determine the feature vectors of the users and the items; once the corresponding user and item feature vectors are obtained, the user's scores for unscored items can be obtained through matrix operations;
u denotes a user; i denotes an item; M denotes the highest score value, determined by the range of scoring data supported by the system; r_{u,i} denotes the user's real score for the item;
Figure FDA0003025876130000047
denotes the score obtained by normalizing the user's real score for the item,
Figure FDA0003025876130000048
Y denotes the highest score supported by the system and is a constant fixed at 4; ρ_{u,i} denotes the conditional probability distribution of user u preferring item i.
4. The recommendation method combining time window technique and score prediction model according to claim 1, wherein the calculation method of dividing the time window according to the time window technique in the step (11) is as follows:
T_{uk}(k) = T_{u0} - θ(k - 1) - kα_1
wherein α_1 denotes the size of the first time window and reflects the duration of the user's interest: the larger the value, the larger the window; θ denotes the amount by which the interval between the successive time window boundaries T_{u1}, T_{u2}, ..., T_{uk} increases: the larger the value, the faster the user's interest changes, and vice versa.
5. The recommendation method combining time window technique and score prediction model as claimed in claim 4, wherein different time windows correspond to different weights, and the Ebbinghaus forgetting curve is adopted as the time function:
f(u,i) = 0.318 × (T_0 - T_{uk})^{-0.125}
wherein T_{uk} denotes the time window in which user u accessed item i; the value of f(u, i) lies in [0, 1] and follows a forgetting pattern that is fast at first and then slow;
the degree of interest of the target user in an item is defined by combining the data weight:
W(u,i) = f(u,i) × β_1 + w(u,i) × (1 - β_1), β_1 ∈ [0, 1]
Figure FDA0003025876130000051
wherein w(u, i) denotes the degree of interest of user u in item i in the most recent time period; I denotes the set of items accessed by the user in the most recent time period (T_{u1} ~ T_{u0}); sim(i, j) denotes the similarity between item i and item j.
6. The recommendation method combining time window technique and score prediction model according to claim 5, wherein the calculation of the item similarity sim (i, j) employs a modified cosine similarity:
Figure FDA0003025876130000052
wherein
Figure FDA0003025876130000053
and
Figure FDA0003025876130000054
denote the average scores of item i and item j, respectively, and
Figure FDA0003025876130000055
denotes the average of the scores given by user u.
CN201811425529.2A 2018-11-27 2018-11-27 Recommendation algorithm integrating time window technology and scoring prediction model Active CN109543109B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811425529.2A CN109543109B (en) 2018-11-27 2018-11-27 Recommendation algorithm integrating time window technology and scoring prediction model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811425529.2A CN109543109B (en) 2018-11-27 2018-11-27 Recommendation algorithm integrating time window technology and scoring prediction model

Publications (2)

Publication Number Publication Date
CN109543109A CN109543109A (en) 2019-03-29
CN109543109B true CN109543109B (en) 2021-06-22

Family

ID=65851074

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811425529.2A Active CN109543109B (en) 2018-11-27 2018-11-27 Recommendation algorithm integrating time window technology and scoring prediction model

Country Status (1)

Country Link
CN (1) CN109543109B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110544129A (en) * 2019-09-05 2019-12-06 创新奇智(青岛)科技有限公司 Personalized recommendation method for social e-commerce users
CN112530520A (en) * 2019-09-17 2021-03-19 中山大学 CircRNA function prediction method based on scoring mechanism and LightGBM
CN112860984A (en) * 2019-11-27 2021-05-28 中移(苏州)软件技术有限公司 Recommendation method, recommendation device and storage medium
CN111310033B (en) * 2020-01-23 2023-05-30 山西大学 Recommendation method and recommendation device based on user interest drift
CN111339435B (en) * 2020-02-10 2022-09-23 南京邮电大学 Matrix decomposition completion hybrid recommendation method based on potential factors
CN111311324B (en) * 2020-02-18 2022-05-20 电子科技大学 User-commodity preference prediction system and method based on stable neural collaborative filtering
CN111382361B (en) * 2020-03-12 2023-05-02 腾讯科技(深圳)有限公司 Information pushing method, device, storage medium and computer equipment
CN111475744B (en) * 2020-04-03 2022-06-14 南京理工大学紫金学院 Personalized position recommendation method based on ensemble learning
CN112069417A (en) * 2020-08-24 2020-12-11 北京神舟航天软件技术有限公司 Work breakdown structure WBS template recommendation method
CN113011950A (en) * 2021-03-30 2021-06-22 吉林亿联银行股份有限公司 Product recommendation method and device
CN113360759B (en) * 2021-06-09 2023-08-25 南京大学 Crowd measurement task recommendation method based on user and project dual time sequence correlation
CN116028727B (en) * 2023-03-30 2023-08-18 南京邮电大学 Video recommendation method based on image data processing

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106339502A (en) * 2016-09-18 2017-01-18 电子科技大学 Modeling recommendation method based on user behavior data fragmentation cluster
CN107729542A (en) * 2017-10-31 2018-02-23 咪咕音乐有限公司 A kind of information methods of marking and device and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
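"Improvement of a collaborative filtering recommendation algorithm based on time weighting"; Liu Qiao et al.; Computer Engineering and Design; 2016-07-16; Vol. 37, No. 7; pp. 1827-1830, 1872 *
"Research on a personalized recommendation algorithm fusing tags and multi-source information"; Zhang Pengfei et al.; Computer Engineering and Applications; 2018-05-17; pp. 1-9 *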

Also Published As

Publication number Publication date
CN109543109A (en) 2019-03-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant