CN109446420B - Cross-domain collaborative filtering method and system - Google Patents

Cross-domain collaborative filtering method and system

Info

Publication number
CN109446420B
CN109446420B (application CN201811209371.5A)
Authority
CN
China
Prior art keywords
user
training sample
item
classifier
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811209371.5A
Other languages
Chinese (zh)
Other versions
CN109446420A (en)
Inventor
于旭
付裕
徐凌伟
杜军威
巩敦卫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao University of Science and Technology
Original Assignee
Qingdao University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao University of Science and Technology filed Critical Qingdao University of Science and Technology
Priority to CN201811209371.5A priority Critical patent/CN109446420B/en
Publication of CN109446420A publication Critical patent/CN109446420A/en
Application granted granted Critical
Publication of CN109446420B publication Critical patent/CN109446420B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/06Buying, selling or leasing transactions
    • G06Q30/0601Electronic shopping [e-shopping]
    • G06Q30/0631Item recommendations

Landscapes

  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a cross-domain collaborative filtering method which comprises: converting user-item scoring data into a training sample set; performing Funk-SVD decomposition on the user-item scoring matrix of each auxiliary domain to obtain user latent vectors; expanding the training sample set with the user latent vectors to obtain a first expanded training sample set; adding item feature information to expand the first expanded training sample set into a second expanded training sample set; training an imbalance classifier with the second expanded training sample set; and finally predicting the missing data of the user-item scoring data based on the imbalance classifier and generating recommendations. The sparsity of the target-domain data is addressed by expanding the samples with auxiliary-domain data, an imbalance classifier is then trained on the expanded training samples, and the missing entries of the target domain are predicted with this classifier to obtain the recommendation data, which solves the sparsity and imbalance problems of the data sets of existing recommendation systems.

Description

Cross-domain collaborative filtering method and system
Technical Field
The invention belongs to the technical field of information recommendation, and particularly relates to a cross-domain collaborative filtering method and a cross-domain collaborative filtering system.
Background
The rapid growth of information on the internet calls for effective intelligent information agents that can screen all of the available information and pick out the information most valuable to the user.
In recent years recommendation systems have been widely applied in e-commerce and online social media. The main recommendation methods at present are content-based recommendation, collaborative-filtering-based recommendation, association-rule-based recommendation, utility-based recommendation, knowledge-based recommendation, hybrid recommendation, and so on. Collaborative-filtering-based recommendation is the most successful of these strategies. Its basic idea is that if users similar to a given user like a resource, that user will probably like it as well, and that if a user likes a resource, the user will probably also like other resources similar to it; in other words, through their behaviour on a website, such as rating and browsing resources, users help each other mine and filter out the content they are interested in.
However, in practical recommendation systems users are often reluctant to rate the items they do not like, which leaves most rating data sets imbalanced.
Disclosure of Invention
The present application provides a cross-domain collaborative filtering method and system, which address the technical problem of imbalanced data sets in existing recommendation systems.
In order to solve the technical problems, the application adopts the following technical scheme:
a cross-domain collaborative filtering method is provided, which comprises the following steps: converting the user item scoring data into a training sample set of a classification algorithm; performing Funk-SVD on the user item scoring matrix of each auxiliary domain to obtain a user potential vector; expanding the characteristic vector of the user in the training sample set by using the user potential vector to obtain a first expanded training sample set; adding project characteristic information to expand the characteristic vectors of the projects in the first extended training sample set to obtain a second extended training sample set; training an imbalance classifier using the second extended training sample set; predicting missing data for the user item scoring data based on the imbalance classifier and generating a recommendation.
Further, the user-item scoring data is converted into a training sample set for a classification algorithm, specifically: L_u is used to represent the row of the user in the user-item scoring matrix and L_i to represent the column of the item in the user-item scoring matrix; based on the feature vector (L_u, L_i), the training sample set of the classification algorithm is constructed from the user-item scoring data as {(L_u, L_i, R_ui) | (u, i) ∈ k}, where k is the set of scored "user-item" pairs in the scoring matrix and R_ui denotes the rating of user u for item i.
Further, performing Funk-SVD decomposition on the user-item scoring matrix of each auxiliary domain to obtain the user latent vectors specifically comprises: setting the objective function
$$\min_{p_*,q_*}\sum_{(u,i)\in k}\left(r_{ui}-q_i^{T}p_u\right)^{2}+\lambda\left(\lVert p_u\rVert^{2}+\lVert q_i\rVert^{2}\right)$$
optimizing the objective function by updating p_u ← p_u + γ(e_ui·q_i − λ·p_u) and q_i ← q_i + γ(e_ui·p_u − λ·q_i), where λ is the regularization parameter and γ is the learning rate; and obtaining, based on the optimization result, the latent vector p_u^(j) of user u on the j-th auxiliary domain, where j ranges from 1 to K and K is the number of auxiliary domains; r_ui denotes the rating of user u for item i; p_* = {p_user | user ∈ userset} denotes the set of latent vectors of all users and q_* = {q_item | item ∈ itemset} denotes the set of latent factors of all items; p_u denotes the latent factor vector of user u and q_i the latent factor vector of item i; and e_ui = r_ui − q_i^T·p_u.
further, training an imbalance classifier using the second extended training sample set specifically includes: initializing a sample weight of each sample in the second extended training sample set to
Figure GDA0003367752720000031
Wherein A is the number of samples, and a is more than or equal to 1 and less than or equal to A; repeating the following steps for T times: 1) from the t-th iteration, all weights { Dt(xa) If 1 is less than or equal to a and less than or equal to A, training and obtaining the weak classifier ht(ii) a Wherein T is from 1 to T; 2) calculate each training sample xaPenalty term of
Figure GDA0003367752720000032
Figure GDA0003367752720000033
Wherein the content of the first and second substances,
Figure GDA0003367752720000034
is the weight of the weak classifier; 3) use of
Figure GDA0003367752720000035
Updating the sample weight; wherein Z istFor the regularization factor, λ ∈ [0.5, 12 ]]Updating step length of the penalty item; computational unbalance classifier
Figure GDA0003367752720000036
Wherein, H is the classification result integrating all classifiers.
A cross-domain collaborative filtering system is provided, which comprises a training sample conversion module, a user latent vector generation module, a training sample first expansion module, a training sample second expansion module, an imbalance classifier training module and a recommendation module; the training sample conversion module is used for converting the user-item scoring data into a training sample set for a classification algorithm; the user latent vector generation module is used for performing Funk-SVD decomposition on the user-item scoring matrix of each auxiliary domain to obtain the user latent vectors; the training sample first expansion module is used for expanding the feature vectors of the users in the training sample set with the user latent vectors to obtain a first expanded training sample set; the training sample second expansion module is used for adding item feature information to expand the feature vectors of the items in the first expanded training sample set to obtain a second expanded training sample set; the imbalance classifier training module is configured to train an imbalance classifier using the second expanded training sample set; and the recommendation module is used for predicting missing data of the user-item scoring data based on the imbalance classifier and generating recommendations.
Further, the training sample conversion module is specifically configured to use L_u to represent the row of the user in the user-item scoring matrix and L_i to represent the column of the item, and, based on the feature vector (L_u, L_i), to construct the training sample set of the classification algorithm from the user-item scoring data as {(L_u, L_i, R_ui) | (u, i) ∈ k}, where k is the set of scored "user-item" pairs in the scoring matrix and R_ui denotes the rating of user u for item i.
Further, the user latent vector generation module comprises an objective function setting unit, an objective function optimization unit and a user latent vector generation unit; the objective function setting unit is used for setting the objective function
$$\min_{p_*,q_*}\sum_{(u,i)\in k}\left(r_{ui}-q_i^{T}p_u\right)^{2}+\lambda\left(\lVert p_u\rVert^{2}+\lVert q_i\rVert^{2}\right)$$
the objective function optimization unit is used for optimizing the objective function by updating p_u ← p_u + γ(e_ui·q_i − λ·p_u) and q_i ← q_i + γ(e_ui·p_u − λ·q_i), where λ is the regularization parameter and γ is the learning rate; the user latent vector generation unit is used for obtaining, based on the optimization result, the latent vector p_u^(j) of user u on the j-th auxiliary domain, where j ranges from 1 to K and K is the number of auxiliary domains; r_ui denotes the rating of user u for item i; p_* = {p_user | user ∈ userset} denotes the set of latent vectors of all users and q_* = {q_item | item ∈ itemset} denotes the set of latent factors of all items; p_u denotes the latent factor vector of user u and q_i the latent factor vector of item i; and e_ui = r_ui − q_i^T·p_u.
further, the unbalanced classifier training module comprises a sample weight initialization unit, a weak classifier training unit, a sample weight updating unit and an unbalanced classifier generating unit; the sample weight initialization unit is used for initializing the sample weight of each sample in the second extended training sample set as
Figure GDA0003367752720000043
Wherein A is the number of samples, and a is more than or equal to 1 and less than or equal to A; the weak classifier training unit is used for weighting all samples { D ] according to the t iterationt(xa) If 1 is less than or equal to a and less than or equal to A, training and obtaining the weak classifier ht(ii) a Wherein T is from 1 to T; the sample weight updating unit is used for calculating each training sample xaPenalty term of
Figure GDA0003367752720000044
Figure GDA0003367752720000045
Wherein the content of the first and second substances,
Figure GDA0003367752720000046
is the weight of the weak classifier; use of
Figure GDA0003367752720000047
Updating the sample weight; wherein Z istFor the regularization factor, λ ∈ [0.5, 12 ]]Updating step length of the penalty item; the unbalanced classifier generating unit is used for calculating the unbalanced classifier after the weak classifier training unit and the sample weight updating unit repeat the calculation for T times
Figure GDA0003367752720000051
Wherein, H is the classification result integrating all classifiers.
Compared with the prior art, the present application has the following advantages and positive effects. In the cross-domain collaborative filtering method and system, the rating data in the user-item scoring matrix are converted into training samples whose feature vectors are the positions of the ratings in the matrix; user latent vectors are then obtained by Funk-SVD decomposition from auxiliary domains that contain relatively rich information, and the training sample set is expanded with these user latent vectors to obtain the first expanded training sample set, which reduces the sparsity of the target domain; the first expanded training sample set is further expanded with the item feature information of the auxiliary domains to obtain the second expanded training sample set; finally, an imbalance classifier is trained with the expanded training sample set, i.e., the converted and expanded training set is classified, the missing data of the user-item scoring matrix of the target domain are predicted, and recommendation data for the user are generated. By adopting an imbalance classification model, the application solves the problem of imbalanced data sets in existing recommendation systems and effectively handles the biased distribution of the ratings.
Other features and advantages of the present application will become more apparent from the detailed description of the embodiments of the present application when taken in conjunction with the accompanying drawings.
Drawings
FIG. 1 is a flowchart of a cross-domain collaborative filtering method proposed in the present application;
FIG. 2 is a system architecture diagram of the cross-domain collaborative filtering system proposed in the present application.
Detailed Description
The following describes embodiments of the present application in further detail with reference to the accompanying drawings.
After the user-item scoring matrix of the target domain has been converted into a training sample set, the cross-domain collaborative filtering method of the present application expands it with auxiliary-domain data to alleviate the sparsity of the target-domain data, trains an imbalance classifier on the expanded training samples, and uses the imbalance classifier to predict the missing entries of the target domain and thus obtain the recommendation data, which addresses the sparsity and imbalance problems of the data sets of existing recommendation systems. The method specifically comprises the following steps:
step S11: and converting the user item scoring data into a training sample set of a classification algorithm.
In the embodiment of the present application, assume the target domain is T, let u and i denote a user and an item respectively, and let the relation between users and items be represented as u × i → R, where R is the rating with range {1, 2, 3, 4, 5}. L_u denotes the row of user u in the user-item scoring matrix and L_i denotes the column of item i, so that each rating in the user-item scoring data can be represented as a training sample in {(L_u, L_i, R_ui) | (u, i) ∈ k}, where k is the set of scored "user-item" pairs in the scoring matrix; that is, the user-item scoring matrix shown in Table 1 is converted into the training sample set shown in Table 2:
Table 1 (example user-item scoring matrix; the users are u1-u3, the items are i1-i4, and unscored cells are blank)
u1: 5, 4
u2: 5, 1
u3: 2, 4, 3

Table 2
[table image: the training samples (L_u, L_i, R_ui) obtained by converting the scoring matrix of Table 1]
In Table 1, u1, u2 and u3 are three users and i1, i2, i3 and i4 are four items. The position of a user's row in the user-item scoring matrix is used as L_u and the position of an item's column is used as L_i, so that, for example, the rating of user u1 for item i1 can be represented as (1, 1, 5). In this way the user-item scoring matrix of Table 1 is converted into the training sample set of Table 2, i.e., a training sample set of the user-item scoring data is generated based on the feature vector (L_u, L_i).
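To make step S11 concrete, here is a minimal Python sketch (not part of the patent text) that converts a user-item scoring matrix into (L_u, L_i, R_ui) samples; the toy matrix and the convention that 0 marks an unscored cell are assumptions of this example.

```python
import numpy as np

def matrix_to_samples(rating_matrix):
    """Convert a user-item scoring matrix into (L_u, L_i, R_ui) samples.

    Rows are users, columns are items; a value of 0 marks a missing rating
    (an assumption for this sketch). Indices are 1-based, as in the
    (1, 1, 5) example of Table 1.
    """
    samples = []
    for u, i in zip(*np.nonzero(rating_matrix)):
        samples.append((int(u) + 1, int(i) + 1, int(rating_matrix[u, i])))
    return samples

# Toy matrix in the spirit of Table 1 (the blank positions are illustrative).
R = np.array([[5, 4, 0, 0],
              [0, 5, 1, 0],
              [2, 4, 0, 3]])
print(matrix_to_samples(R))  # e.g. [(1, 1, 5), (1, 2, 4), ...]
```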
Step S12: performing Funk-SVD decomposition on the user-item scoring matrix of each auxiliary domain to obtain the user latent vectors.
In conventional collaborative filtering methods, useful information for alleviating the sparsity of the user-item scoring matrix is usually sought within the same domain, for example by inferring the relationship between users and items from social networks, trust relationships or comments; however, such information within the same domain is often not easy to obtain.
In the embodiment of the present application, Funk-SVD decomposition is applied to the user-item scoring matrix of each auxiliary domain to obtain the user latent vectors; that is, Funk-SVD decomposes the user-item scoring matrix into the form of a user latent factor matrix multiplied by an item latent factor matrix, so that the high-dimensional user-item scoring matrix is decomposed into two low-dimensional matrices, for example X (m × n) into U (m × k) × V (k × n), where m and n are the numbers of rows and columns of the user-item scoring matrix, k is the dimension of the latent factors, and k is much smaller than min(m, n). Funk-SVD fits the known entries of X as well as possible in order to predict the unknown entries of X; a k that is too small may fail to fit the data, while a k that is too large may lead to overfitting. The predicted score r̂_ui of user u for item i is written as r̂_ui = q_i^T·p_u, where p_u denotes the latent factor vector of user u and q_i denotes the latent factor vector of item i.
In the decomposition, the objective function is set to
$$\min_{p_*,q_*}\sum_{(u,i)\in k}\left(r_{ui}-q_i^{T}p_u\right)^{2}+\lambda\left(\lVert p_u\rVert^{2}+\lVert q_i\rVert^{2}\right)$$
where p_* = {p_user | user ∈ userset} denotes the set of latent vectors of all users and q_* = {q_item | item ∈ itemset} denotes the set of latent factors of all items.
The objective function is optimized by updating p_u ← p_u + γ(e_ui·q_i − λ·p_u) and q_i ← q_i + γ(e_ui·p_u − λ·q_i) until the optimal result is obtained, where e_ui = r_ui − q_i^T·p_u.
Finally, the latent vector p_u^(j) of user u on the j-th auxiliary domain is obtained from the optimization result, where j ranges from 1 to K and K is the number of auxiliary domains; λ is the regularization parameter and γ is the learning rate: a γ that is too large will prevent the algorithm from converging, while one that is too small will make convergence take a long time.
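The Funk-SVD update just described can be sketched as follows; this is an illustrative implementation, with the factor dimension k, learning rate γ, regularization λ and epoch count chosen arbitrarily rather than taken from the patent.

```python
import numpy as np

def funk_svd(samples, n_users, n_items, k=10, gamma=0.005, lam=0.02, epochs=50):
    """SGD for the regularized Funk-SVD objective; returns user/item latent factors."""
    rng = np.random.default_rng(0)
    P = rng.normal(scale=0.1, size=(n_users, k))   # p_u for every user
    Q = rng.normal(scale=0.1, size=(n_items, k))   # q_i for every item
    for _ in range(epochs):
        for u, i, r in samples:                    # 1-based (L_u, L_i, R_ui) triples
            u, i = u - 1, i - 1
            p_u, q_i = P[u].copy(), Q[i].copy()
            e_ui = r - q_i @ p_u                   # prediction error e_ui
            P[u] += gamma * (e_ui * q_i - lam * p_u)
            Q[i] += gamma * (e_ui * p_u - lam * q_i)
    return P, Q
```

Running this on the scoring matrix of each auxiliary domain and taking the row of P belonging to a user gives the latent vector p_u^(j) that is appended in step S13.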
Step S13: expanding the feature vectors of the users in the training sample set with the user latent vectors to obtain the first expanded training sample set.
Step S14: adding item feature information to expand the feature vectors of the items in the first expanded training sample set to obtain the second expanded training sample set.
The user latent vectors obtained in step S12 are added to the training samples of the target domain, that is, the latent vectors p_u^(1), …, p_u^(K) are appended to the feature vector (L_u, L_i), giving the first expanded training sample set {(L_u, L_i, p_u^(1), …, p_u^(K), R_ui)}. In addition, item feature information is added to expand the first expanded training sample set into the second expanded training sample set, which further improves the recommendation performance. Taking the movie domain as an example, attributes of the movies can be added to the feature vector: the attribute information of all movies is retrieved by movie title, and several attributes, such as director, genre, actors, country and language, are selected from it as item features and added to the feature vector, so that the second expanded training sample set can be expressed as {(L_u, L_i, p_u^(1), …, p_u^(K), f_1, …, f_q, R_ui)}, where q is the number of item features.
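Steps S13 and S14 then amount to concatenating features, as in this sketch; the helper names (expand_samples, item_features) are hypothetical and the feature order simply mirrors (L_u, L_i, p_u^(1), …, p_u^(K), f_1, …, f_q).

```python
import numpy as np

def expand_samples(samples, user_factors_per_domain, item_features):
    """Build the second expanded training sample set as (feature vector, label) pairs.

    samples:                 (L_u, L_i, R_ui) triples of the target domain
    user_factors_per_domain: list of K matrices P_j, one per auxiliary domain
    item_features:           dict mapping 1-based item index -> list of q numeric features
    """
    X, y = [], []
    for u, i, r in samples:
        latent = np.concatenate([P[u - 1] for P in user_factors_per_domain])
        features = np.concatenate(([u, i], latent, item_features[i]))
        X.append(features)
        y.append(r)          # the rating is the class label for the classifier
    return np.array(X), np.array(y)
```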
Step S15: the imbalance classifier is trained using a second extended training sample set.
In the embodiment of the present application, the converted and expanded second expanded training sample set is classified with the AdaBoost.NC algorithm. Its basic principle is to combine several classifiers into a strong classifier in a reasonable way using the idea of iteration: only one weak classifier is trained in each iteration, and the weak classifiers trained so far take part in the next iteration. That is, after the N-th iteration there are N weak classifiers in total, of which N−1 were trained previously and keep all of their parameters unchanged, while the N-th classifier is the one trained in the current iteration; the relationship among the weak classifiers is that the N-th weak classifier concentrates more on the data that the previous N−1 weak classifiers classified incorrectly, and the final class output is the combined result of the N classifiers. In the AdaBoost.NC algorithm there are two kinds of weights. The first is the sample weight of each sample in the training sample set, expressed by the vector D; after each round of learning the sample weights are readjusted, and the weights of the samples misclassified in this round are adjusted so that they are learned with more emphasis in the following rounds. The other is the weight of each weak classifier, expressed by the vector α. Because several classifiers exist, an ambiguity term amb (equation image) is set to measure the difference between the different classifiers; h_t denotes the classification result of the t-th weak classifier and is taken as 1 if the training sample x is correctly classified by the t-th weak classifier and −1 otherwise, and H is the classification result integrating all the classifiers.
Specifically, in the embodiment of the present application, the sample weight of each sample in the second expanded training sample set is initialized to D_1(x_a) = 1/A, where A is the number of samples and 1 ≤ a ≤ A. With the number of weak classifiers set to T, the following steps are repeated T times, iterating in the spirit of the AdaBoost.NC imbalance algorithm: 1) in the t-th iteration, the weak classifier h_t is trained from all sample weights {D_t(x_a) | 1 ≤ a ≤ A}, where t runs from 1 to T, increasing by 1 with each repetition; 2) for each training sample x_a the penalty term p_t = 1 − |amb| is computed, and α_t (equation image) is the weight of the weak classifier; 3) the sample weights are updated according to the update rule (equation image), where Z_t is the normalization factor and λ ∈ [0.5, 12] is the update step of the penalty term. After the T iterations are completed, the imbalance classifier H (equation image) is computed.
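Because the defining equations of amb, α_t, the weight update and H are only available as images, the sketch below fills them in with the published AdaBoost.NC formulas as an assumption (ambiguity taken as the average disagreement of members with the ensemble, α_t as a weighted log-odds, D_{t+1} ∝ D_t·p_t^λ·exp(∓α_t) normalized by Z_t) and uses a shallow decision tree as a stand-in weak learner; it illustrates the structure of step S15 rather than reproducing the patent's exact classifier.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_nc(X, y, T=20, lam=2.0):
    """AdaBoost.NC-style ensemble; returns weak classifiers, weights and class labels."""
    A = len(X)
    D = np.full(A, 1.0 / A)                  # D_1(x_a) = 1/A
    learners, alphas, votes = [], [], []
    classes = np.unique(y)
    for t in range(1, T + 1):
        h = DecisionTreeClassifier(max_depth=3).fit(X, y, sample_weight=D)
        pred = h.predict(X)
        votes.append(pred)
        V = np.array(votes)                  # predictions of the first t members
        # ensemble prediction by (unweighted) vote -- an assumption of this sketch
        ens = np.array([classes[np.argmax([(V[:, a] == c).sum() for c in classes])]
                        for a in range(A)])
        amb = np.mean(V != ens, axis=0)      # assumed ambiguity: member/ensemble disagreement
        p = 1.0 - np.abs(amb)                # penalty term p_t = 1 - |amb|
        correct = pred == y
        eps = 1e-12
        alpha = 0.5 * np.log((np.sum(D[correct] * p[correct] ** lam) + eps) /
                             (np.sum(D[~correct] * p[~correct] ** lam) + eps))
        # weight update: emphasise misclassified samples, normalise by Z_t
        D = D * (p ** lam) * np.exp(np.where(correct, -alpha, alpha))
        D /= D.sum()                         # Z_t
        learners.append(h)
        alphas.append(alpha)
    return learners, np.array(alphas), classes

def predict(learners, alphas, classes, X):
    """Weighted vote: H(x) = argmax_y sum_t alpha_t * [h_t(x) = y]."""
    scores = np.zeros((len(X), len(classes)))
    for h, a in zip(learners, alphas):
        pred = h.predict(X)
        for ci, c in enumerate(classes):
            scores[:, ci] += a * (pred == c)
    return classes[np.argmax(scores, axis=1)]
```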
Step S16: missing data for the user item scoring data is predicted based on the imbalance classifier and recommendations are generated.
As can be seen from the above, in the cross-domain collaborative filtering method provided by the present application, the rating data in the user-item scoring matrix are converted into training samples whose feature vectors are the positions of the ratings in the matrix; user latent vectors are then obtained by Funk-SVD decomposition from auxiliary domains containing relatively rich information, and the training sample set is expanded with them to obtain the first expanded training sample set, which reduces the sparsity of the target domain; the first expanded training sample set is further expanded with the item feature information of the auxiliary domains to obtain the second expanded training sample set; finally, the imbalance classifier is trained with the second expanded training sample set, i.e., the converted and expanded training set is classified, the missing data of the user-item scoring matrix of the target domain are predicted, and recommendation data for the user are generated. By adopting an imbalance classification model, the application solves the problem of imbalanced data sets in existing recommendation systems and effectively handles the biased distribution of the ratings.
Based on the cross-domain collaborative filtering method proposed above, the present application further proposes a cross-domain collaborative filtering system, as shown in FIG. 2, which includes a training sample conversion module 21, a user latent vector generation module 22, a training sample first expansion module 23, a training sample second expansion module 24, an imbalance classifier training module 25 and a recommendation module 26.
The training sample conversion module 21 is configured to convert the user-item scoring data into a training sample set for a classification algorithm; the user latent vector generation module 22 is configured to perform Funk-SVD decomposition on the user-item scoring matrix of each auxiliary domain to obtain the user latent vectors; the training sample first expansion module 23 is configured to expand the feature vectors of the users in the training sample set with the user latent vectors to obtain a first expanded training sample set; the training sample second expansion module 24 is configured to add item feature information to expand the feature vectors of the items in the first expanded training sample set to obtain a second expanded training sample set; the imbalance classifier training module 25 is configured to train an imbalance classifier using the second expanded training sample set; the recommendation module 26 is configured to predict missing data of the user-item scoring data based on the imbalance classifier and generate recommendations.
In particular, the training sample conversion module is configured to use L_u to represent the row of the user in the user-item scoring matrix and L_i to represent the column of the item in the user-item scoring matrix, and, based on the feature vector (L_u, L_i), to generate the training sample set of the classification algorithm from the user-item scoring data as {(L_u, L_i, R_ui) | (u, i) ∈ k}, where k is the set of scored "user-item" pairs in the scoring matrix and R_ui denotes the rating of user u for item i.
In the embodiment of the present application, the user latent vector generation module 22 comprises an objective function setting unit 221, an objective function optimization unit 222 and a user latent vector generation unit 223; the objective function setting unit 221 is configured to set the objective function
$$\min_{p_*,q_*}\sum_{(u,i)\in k}\left(r_{ui}-q_i^{T}p_u\right)^{2}+\lambda\left(\lVert p_u\rVert^{2}+\lVert q_i\rVert^{2}\right)$$
the objective function optimization unit 222 is configured to optimize the objective function by updating p_u ← p_u + γ(e_ui·q_i − λ·p_u) and q_i ← q_i + γ(e_ui·p_u − λ·q_i); and the user latent vector generation unit 223 is configured to obtain, based on the optimization result, the latent vector p_u^(j) of user u on the j-th auxiliary domain, where j ranges from 1 to K and K is the number of auxiliary domains.
The imbalance classifier training module 25 comprises a sample weight initialization unit 251, a weak classifier training unit 252, a sample weight updating unit 253 and an imbalance classifier generation unit 254; the sample weight initialization unit 251 is configured to initialize the sample weight of each sample in the second expanded training sample set to D_1(x_a) = 1/A, where A is the number of samples and 1 ≤ a ≤ A; the weak classifier training unit 252 is configured to train the weak classifier h_t from all sample weights {D_t(x_a) | 1 ≤ a ≤ A} in the t-th iteration, where t ranges from 1 to T; the sample weight updating unit 253 is configured to compute for each training sample x_a the penalty term p_t = 1 − |amb|, where amb is the ambiguity term (equation image) and α_t (equation image) is the weight of the weak classifier, and to update the sample weights according to the update rule (equation image), where Z_t is the normalization factor and λ ∈ [0.5, 12] is the update step of the penalty term; the imbalance classifier generation unit 254 is configured to compute the imbalance classifier H (equation image) after the weak classifier training unit and the sample weight updating unit have repeated the computation T times.
The recommendation method of the cross-domain collaborative filtering system has been described in detail in the above proposed cross-domain collaborative filtering method, and is not described herein again.
It should be noted that the above description is not intended to limit the present invention, which is not restricted to the above examples, and those skilled in the art may make changes, modifications, additions or substitutions within the spirit and scope of the present invention.

Claims (8)

1. A cross-domain collaborative filtering method is characterized by comprising the following steps:
converting the user-item scoring data into a training sample set for a classification algorithm;
performing Funk-SVD decomposition on the user-item scoring matrix of each auxiliary domain to obtain user latent vectors;
expanding the feature vectors of the users in the training sample set with the user latent vectors to obtain a first expanded training sample set;
adding item feature information to expand the feature vectors of the items in the first expanded training sample set to obtain a second expanded training sample set;
training an imbalance classifier using the second expanded training sample set;
predicting missing data of the user-item scoring data based on the imbalance classifier and generating a recommendation.
2. The cross-domain collaborative filtering method according to claim 1, wherein the user-item scoring data is converted into a training sample set for a classification algorithm, specifically:
L_u is used to represent the row of the user in the user-item scoring matrix and L_i to represent the column of the item in the user-item scoring matrix;
based on the feature vector (L_u, L_i), the training sample set of the classification algorithm is constructed from the user-item scoring data as {(L_u, L_i, R_ui) | (u, i) ∈ k}, where k is the set of scored "user-item" pairs in the scoring matrix and R_ui denotes the rating of user u for item i.
3. The cross-domain collaborative filtering method according to claim 2, wherein performing Funk-SVD decomposition on the user-item scoring matrix of each auxiliary domain to obtain the user latent vectors specifically comprises:
setting the objective function
$$\min_{p_*,q_*}\sum_{(u,i)\in k}\left(r_{ui}-q_i^{T}p_u\right)^{2}+\lambda\left(\lVert p_u\rVert^{2}+\lVert q_i\rVert^{2}\right)$$
optimizing the objective function by updating p_u ← p_u + γ(e_ui·q_i − λ·p_u) and q_i ← q_i + γ(e_ui·p_u − λ·q_i), where λ is the regularization parameter and γ is the learning rate;
obtaining, based on the optimization result, the latent vector p_u^(j) of user u on the j-th auxiliary domain, where j ranges from 1 to K and K is the number of auxiliary domains;
where r_ui denotes the rating of user u for item i; p_* = {p_user | user ∈ userset} denotes the set of latent vectors of all users and q_* = {q_item | item ∈ itemset} denotes the set of latent factors of all items; p_u denotes the latent factor vector of user u and q_i the latent factor vector of item i; and e_ui = r_ui − q_i^T·p_u.
4. The cross-domain collaborative filtering method according to claim 1, wherein training an imbalance classifier using the second expanded training sample set specifically comprises:
initializing the sample weight of each sample in the second expanded training sample set to D_1(x_a) = 1/A, where A is the number of samples and 1 ≤ a ≤ A;
repeating the following steps T times:
1) in the t-th iteration, training the weak classifier h_t from all sample weights {D_t(x_a) | 1 ≤ a ≤ A}, where t ranges from 1 to T;
2) computing for each training sample x_a the penalty term p_t = 1 − |amb|, where amb is the ambiguity term (equation image) and α_t (equation image) is the weight of the weak classifier;
3) updating the sample weights according to the update rule (equation image), where Z_t is the normalization factor and λ ∈ [0.5, 12] is the update step of the penalty term;
computing the imbalance classifier H (equation image), where H is the classification result integrating all the classifiers.
5. A cross-domain collaborative filtering system is characterized by comprising a training sample conversion module, a user latent vector generation module, a training sample first expansion module, a training sample second expansion module, an imbalance classifier training module and a recommendation module;
the training sample conversion module is used for converting the user-item scoring data into a training sample set for a classification algorithm;
the user latent vector generation module is used for performing Funk-SVD decomposition on the user-item scoring matrix of each auxiliary domain to obtain the user latent vectors;
the training sample first expansion module is used for expanding the feature vectors of the users in the training sample set with the user latent vectors to obtain a first expanded training sample set; the training sample second expansion module is used for adding item feature information to expand the feature vectors of the items in the first expanded training sample set to obtain a second expanded training sample set;
the imbalance classifier training module is configured to train an imbalance classifier using the second expanded training sample set;
the recommendation module is used for predicting missing data of the user-item scoring data based on the imbalance classifier and generating a recommendation.
6. The cross-domain collaborative filtering system according to claim 5, wherein the training sample conversion module is specifically configured to use L_u to represent the row of the user in the user-item scoring matrix and L_i to represent the column of the item, and, based on the feature vector (L_u, L_i), to construct the training sample set of the classification algorithm from the user-item scoring data as {(L_u, L_i, R_ui) | (u, i) ∈ k}, where k is the set of scored "user-item" pairs in the scoring matrix and R_ui denotes the rating of user u for item i.
7. The cross-domain collaborative filtering system according to claim 6, wherein the user latent vector generation module comprises an objective function setting unit, an objective function optimization unit and a user latent vector generation unit;
the objective function setting unit is used for setting the objective function
$$\min_{p_*,q_*}\sum_{(u,i)\in k}\left(r_{ui}-q_i^{T}p_u\right)^{2}+\lambda\left(\lVert p_u\rVert^{2}+\lVert q_i\rVert^{2}\right)$$
the objective function optimization unit is used for optimizing the objective function by updating p_u ← p_u + γ(e_ui·q_i − λ·p_u) and q_i ← q_i + γ(e_ui·p_u − λ·q_i), where λ is the regularization parameter and γ is the learning rate;
the user latent vector generation unit is used for obtaining, based on the optimization result, the latent vector p_u^(j) of user u on the j-th auxiliary domain, where j ranges from 1 to K and K is the number of auxiliary domains;
where r_ui denotes the rating of user u for item i; p_* = {p_user | user ∈ userset} denotes the set of latent vectors of all users and q_* = {q_item | item ∈ itemset} denotes the set of latent factors of all items; p_u denotes the latent factor vector of user u and q_i the latent factor vector of item i; and e_ui = r_ui − q_i^T·p_u.
8. The cross-domain collaborative filtering system according to claim 5, wherein the imbalance classifier training module comprises a sample weight initialization unit, a weak classifier training unit, a sample weight updating unit and an imbalance classifier generation unit;
the sample weight initialization unit is used for initializing the sample weight of each sample in the second expanded training sample set to D_1(x_a) = 1/A, where A is the number of samples and 1 ≤ a ≤ A;
the weak classifier training unit is used for training the weak classifier h_t from all sample weights {D_t(x_a) | 1 ≤ a ≤ A} in the t-th iteration, where t ranges from 1 to T;
the sample weight updating unit is used for computing for each training sample x_a the penalty term p_t = 1 − |amb|, where amb is the ambiguity term (equation image) and α_t (equation image) is the weight of the weak classifier, and for updating the sample weights according to the update rule (equation image), where Z_t is the normalization factor and λ ∈ [0.5, 12] is the update step of the penalty term;
the imbalance classifier generation unit is used for computing the imbalance classifier H (equation image) after the weak classifier training unit and the sample weight updating unit have repeated the computation T times, where H is the classification result integrating all the classifiers.
CN201811209371.5A 2018-10-17 2018-10-17 Cross-domain collaborative filtering method and system Active CN109446420B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811209371.5A CN109446420B (en) 2018-10-17 2018-10-17 Cross-domain collaborative filtering method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811209371.5A CN109446420B (en) 2018-10-17 2018-10-17 Cross-domain collaborative filtering method and system

Publications (2)

Publication Number Publication Date
CN109446420A CN109446420A (en) 2019-03-08
CN109446420B true CN109446420B (en) 2022-01-25

Family

ID=65546951

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811209371.5A Active CN109446420B (en) 2018-10-17 2018-10-17 Cross-domain collaborative filtering method and system

Country Status (1)

Country Link
CN (1) CN109446420B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110119465B (en) * 2019-05-17 2023-06-13 哈尔滨工业大学 Mobile phone application user preference retrieval method integrating LFM potential factors and SVD
CN110264274B (en) * 2019-06-21 2023-12-29 深圳前海微众银行股份有限公司 Guest group dividing method, model generating method, device, equipment and storage medium
CN110297848B (en) * 2019-07-09 2024-02-23 深圳前海微众银行股份有限公司 Recommendation model training method, terminal and storage medium based on federal learning
CN112214682B (en) * 2019-07-11 2023-04-07 中移(苏州)软件技术有限公司 Recommendation method, device and equipment based on field and storage medium
CN110543597B (en) * 2019-08-30 2022-06-03 北京奇艺世纪科技有限公司 Grading determination method and device and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102385586A (en) * 2010-08-27 2012-03-21 日电(中国)有限公司 Multiparty cooperative filtering method and system
CN102930341A (en) * 2012-10-15 2013-02-13 罗辛 Optimal training method of collaborative filtering recommendation model
EP2837199A1 (en) * 2012-04-12 2015-02-18 MOVIRI S.p.A. Client-side recommendations on one-way broadcast networks
CN105447145A (en) * 2015-11-25 2016-03-30 天津大学 Item-based transfer learning recommendation method and recommendation apparatus thereof

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102385586A (en) * 2010-08-27 2012-03-21 日电(中国)有限公司 Multiparty cooperative filtering method and system
EP2837199A1 (en) * 2012-04-12 2015-02-18 MOVIRI S.p.A. Client-side recommendations on one-way broadcast networks
CN102930341A (en) * 2012-10-15 2013-02-13 罗辛 Optimal training method of collaborative filtering recommendation model
CN105447145A (en) * 2015-11-25 2016-03-30 天津大学 Item-based transfer learning recommendation method and recommendation apparatus thereof

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A User-Based Cross Domain Collaborative Filtering Algorithm Based on a Linear Decomposition Model; Xu Yu et al.; IEEE; 2017-11-16; full text *
Cross-domain collaborative filtering system; Liu Qingwen; China Doctoral Dissertations Full-text Database; 2013-10-31; full text *

Also Published As

Publication number Publication date
CN109446420A (en) 2019-03-08

Similar Documents

Publication Publication Date Title
CN109446420B (en) Cross-domain collaborative filtering method and system
Wang et al. Relational deep learning: A deep latent variable model for link prediction
Dong et al. Hnhn: Hypergraph networks with hyperedge neurons
Jadhav et al. Comparative study of K-NN, naive Bayes and decision tree classification techniques
CN112232925A (en) Method for carrying out personalized recommendation on commodities by fusing knowledge maps
Ma et al. Adaptive-step graph meta-learner for few-shot graph classification
US8005784B2 (en) Supervised rank aggregation based on rankings
US20080195631A1 (en) System and method for determining web page quality using collective inference based on local and global information
Jin et al. Pattern classification with corrupted labeling via robust broad learning system
CN112529168A (en) GCN-based attribute multilayer network representation learning method
WO2022252458A1 (en) Classification model training method and apparatus, device, and medium
Kongsorot et al. Multi-label classification with extreme learning machine
Wan et al. Adaptive knowledge subgraph ensemble for robust and trustworthy knowledge graph completion
Ludl et al. Using machine learning models to explore the solution space of large nonlinear systems underlying flowsheet simulations with constraints
CN111144500A (en) Differential privacy deep learning classification method based on analytic Gaussian mechanism
WO2020147259A1 (en) User portait method and apparatus, readable storage medium, and terminal device
Dehuri et al. A condensed polynomial neural network for classification using swarm intelligence
CN114282077A (en) Session recommendation method and system based on session data
US20060276996A1 (en) Fast tracking system and method for generalized LARS/LASSO
CN116975686A (en) Method for training student model, behavior prediction method and device
Zhou et al. Online recommendation based on incremental-input self-organizing map
WO2022105780A1 (en) Recommendation method and apparatus, electronic device, and storage medium
CN117033992A (en) Classification model training method and device
CN115757897A (en) Intelligent culture resource recommendation method based on knowledge graph convolution network
CN114254738A (en) Double-layer evolvable dynamic graph convolution neural network model construction method and application

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant