CN109446420B - Cross-domain collaborative filtering method and system - Google Patents
- Publication number
- CN109446420B (application number CN201811209371.5A)
- Authority
- CN
- China
- Prior art keywords
- user
- training sample
- item
- classifier
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0631—Item recommendations
Landscapes
- Business, Economics & Management (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- Development Economics (AREA)
- Economics (AREA)
- Marketing (AREA)
- Strategic Management (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a cross-domain collaborative filtering method comprising the steps of: converting user item scoring data into a training sample set; performing Funk-SVD decomposition on the user item scoring matrix of each auxiliary domain to obtain user latent vectors; expanding the training sample set with the user latent vectors to obtain a first expanded training sample set; adding item feature information to expand the first expanded training sample set into a second expanded training sample set; training an imbalance classifier with the second expanded training sample set; and finally predicting the missing data of the user item scoring data based on the imbalance classifier and generating recommendations. Expanding with auxiliary-domain data relieves the sparsity of the target-domain data; the imbalance classifier trained on the expanded samples then predicts the missing entries of the target domain, from which the recommendation data are obtained, solving the sparsity and imbalance problems of existing recommendation-system data sets.
Description
Technical Field
The invention belongs to the technical field of information recommendation, and particularly relates to a cross-domain collaborative filtering method and a cross-domain collaborative filtering system.
Background
The rapid growth of internet information calls for effective intelligent information agents that can screen all available information and find the items most valuable to the user.
In recent years, recommendation systems have been widely applied to e-commerce sites and online social media. The main recommendation methods at present are content-based recommendation, collaborative-filtering-based recommendation, association-rule-based recommendation, utility-based recommendation, knowledge-based recommendation, hybrid recommendation, and so on. Recommendation based on collaborative filtering is the most successful of these strategies. Its basic idea is that if users similar to a given user like a resource, that user is likely to like it too; and if a user likes a resource, the user is likely to also like other, similar resources. In other words, through behaviors such as rating and browsing resources on a website, users help each other mine and filter out the content they are interested in.
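The user-based collaborative-filtering idea described above can be sketched in a few lines. The toy rating matrix, the cosine similarity measure and the weighted-average prediction below are illustrative assumptions, not part of the patented method:

```python
import numpy as np

# Toy user-item rating matrix (0 = unrated). Rows: users, columns: items.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
], dtype=float)

def cosine_sim(a, b):
    """Cosine similarity between two rating vectors, over co-rated items only."""
    mask = (a > 0) & (b > 0)
    if not mask.any():
        return 0.0
    va, vb = a[mask], b[mask]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

def predict(R, u, i):
    """Predict user u's rating of item i as a similarity-weighted average
    of the ratings that other users gave item i."""
    num = den = 0.0
    for v in range(R.shape[0]):
        if v != u and R[v, i] > 0:
            s = cosine_sim(R[u], R[v])
            num += s * R[v, i]
            den += abs(s)
    return num / den if den > 0 else 0.0

print(round(predict(R, 0, 2), 2))  # about 2.2: pulled towards similar user 1's low rating
```

Users with sparse, skewed ratings make such neighborhood predictions unreliable, which is exactly the sparsity/imbalance problem the method below targets.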
However, in practical recommendation systems, users are often reluctant to rate items they do not like, which leaves most rating data sets imbalanced.
Disclosure of Invention
The application provides a cross-domain collaborative filtering method and a cross-domain collaborative filtering system, which solve the technical problem of unbalanced data sets in the conventional recommendation system.
In order to solve the technical problems, the application adopts the following technical scheme:
a cross-domain collaborative filtering method is provided, which comprises the following steps: converting the user item scoring data into a training sample set of a classification algorithm; performing Funk-SVD on the user item scoring matrix of each auxiliary domain to obtain a user potential vector; expanding the characteristic vector of the user in the training sample set by using the user potential vector to obtain a first expanded training sample set; adding project characteristic information to expand the characteristic vectors of the projects in the first extended training sample set to obtain a second extended training sample set; training an imbalance classifier using the second extended training sample set; predicting missing data for the user item scoring data based on the imbalance classifier and generating a recommendation.
Further, converting the user item scoring data into a training sample set of a classification algorithm specifically comprises: using L_u to represent the row of the user in the user item scoring matrix and L_i to represent the column of the item; and, based on the feature vector (L_u, L_i), constructing the training sample set {(L_u, L_i, R_ui) | (u, i) ∈ κ} of the user item scoring data, where κ is the set of scored user-item pairs in the scoring matrix and R_ui is user u's rating of item i.
Further, performing Funk-SVD decomposition on the user item scoring matrix of each auxiliary domain to obtain user latent vectors specifically comprises: setting the objective function min_{p_*, q_*} Σ_{(u,i)∈κ} (r_ui − p_u·q_i)² + λ(‖p_u‖² + ‖q_i‖²); optimizing it with the updates p_u ← p_u + γ(e_ui·q_i − λ·p_u) and q_i ← q_i + γ(e_ui·p_u − λ·q_i), where e_ui = r_ui − p_u·q_i, λ is a regularization parameter and γ is the learning rate; and obtaining the latent vector p_u^(j) of user u on the j-th auxiliary domain from the optimization result, where j runs from 1 to K and K is the number of auxiliary domains. Here r_ui represents the rating of item i by user u; p_* = {p_user | user ∈ userset} represents the set of all users' latent vectors and q_* = {q_item | item ∈ itemset} the set of all items' latent factors; p_u is the latent factor vector of user u and q_i the latent factor vector of item i.
further, training an imbalance classifier using the second extended training sample set specifically includes: initializing a sample weight of each sample in the second extended training sample set toWherein A is the number of samples, and a is more than or equal to 1 and less than or equal to A; repeating the following steps for T times: 1) from the t-th iteration, all weights { Dt(xa) If 1 is less than or equal to a and less than or equal to A, training and obtaining the weak classifier ht(ii) a Wherein T is from 1 to T; 2) calculate each training sample xaPenalty term of Wherein the content of the first and second substances,is the weight of the weak classifier; 3) use ofUpdating the sample weight; wherein Z istFor the regularization factor, λ ∈ [0.5, 12 ]]Updating step length of the penalty item; computational unbalance classifierWherein, H is the classification result integrating all classifiers.
The application also provides a cross-domain collaborative filtering system, which comprises a training sample conversion module, a user potential vector generation module, a training sample first expansion module, a training sample second expansion module, an imbalance classifier training module and a recommendation module; the training sample conversion module is used for converting the user item scoring data into a training sample set of a classification algorithm; the user potential vector generation module is used for performing Funk-SVD decomposition on the user item scoring matrix of each auxiliary domain to obtain a user potential vector; the training sample first expansion module is used for expanding the feature vectors of the users in the training sample set by using the user potential vectors to obtain a first expanded training sample set; the training sample second expansion module is used for adding item feature information to expand the feature vectors of the items in the first expanded training sample set to obtain a second expanded training sample set; the imbalance classifier training module is configured to train an imbalance classifier using the second expanded training sample set; the recommendation module is used for predicting missing data of the user item scoring data based on the imbalance classifier and generating a recommendation.
Further, the training sample conversion module is specifically configured to use L_u to represent the row of the user in the user item scoring matrix and L_i to represent the column of the item, and to construct, based on the feature vector (L_u, L_i), the training sample set {(L_u, L_i, R_ui) | (u, i) ∈ κ} of the classification algorithm for the user item scoring data, where κ is the set of scored user-item pairs in the scoring matrix and R_ui is user u's rating of item i.
Further, the user potential vector generation module comprises an objective function setting unit, an objective function optimization unit and a user potential vector generation unit. The objective function setting unit is used for setting the objective function min_{p_*, q_*} Σ_{(u,i)∈κ} (r_ui − p_u·q_i)² + λ(‖p_u‖² + ‖q_i‖²). The objective function optimization unit is used for updating p_u ← p_u + γ(e_ui·q_i − λ·p_u) and q_i ← q_i + γ(e_ui·p_u − λ·q_i) to optimize the objective function, where e_ui = r_ui − p_u·q_i, λ is a regularization parameter and γ is the learning rate. The user potential vector generation unit is used for obtaining the latent vector p_u^(j) of user u on the j-th auxiliary domain based on the optimization result, where j runs from 1 to K and K is the number of auxiliary domains. Here r_ui represents the rating of item i by user u; p_* = {p_user | user ∈ userset} represents the set of all users' latent vectors and q_* = {q_item | item ∈ itemset} the set of all items' latent factors; p_u is the latent factor vector of user u and q_i the latent factor vector of item i.
Further, the imbalance classifier training module comprises a sample weight initialization unit, a weak classifier training unit, a sample weight updating unit and an imbalance classifier generation unit. The sample weight initialization unit is used for initializing the sample weight of each sample in the second expanded training sample set to D_1(x_a) = 1/A, where A is the number of samples and 1 ≤ a ≤ A. The weak classifier training unit is used for training and obtaining the weak classifier h_t on the t-th-iteration sample weights {D_t(x_a) | 1 ≤ a ≤ A}, with t from 1 to T. The sample weight updating unit is used for computing the penalty term p_t(x_a) = 1 − |amb_t(x_a)| of each training sample x_a and the weak-classifier weight α_t = (1/2)·ln((1 − ε_t)/ε_t), where ε_t is the penalty-weighted error of h_t, and for updating the sample weights by D_{t+1}(x_a) = D_t(x_a)·p_t(x_a)^λ·exp(−α_t·y_a·h_t(x_a)) / Z_t, where Z_t is the normalization factor and λ ∈ [0.5, 12] is the update step of the penalty term. After the weak classifier training unit and the sample weight updating unit have repeated these computations T times, the imbalance classifier generation unit computes the imbalance classifier H(x) = sign(Σ_{t=1}^{T} α_t·h_t(x)), where H is the classification result integrating all classifiers.
Compared with the prior art, the present application has the following advantages and positive effects. In the cross-domain collaborative filtering method and system, each rating in the user item scoring matrix is converted into a training sample whose feature vector is the rating's position in the matrix. User latent vectors are obtained by Funk-SVD decomposition from auxiliary domains containing relatively rich information, and the training sample set is expanded with them into a first expanded training sample set, reducing the sparsity of the target domain; the first expanded training sample set is further expanded with the item feature information of the auxiliary domains into a second expanded training sample set. Finally, the imbalance classifier is trained on the expanded training sample set, i.e., the converted and expanded training set is classified, the missing data of the target domain's user item scoring matrix are predicted, and recommendation data for the user are generated. By adopting an imbalance classification model, the application solves the data set imbalance problem of existing recommendation systems and effectively handles the biased distribution of ratings.
Other features and advantages of the present application will become more apparent from the detailed description of the embodiments of the present application when taken in conjunction with the accompanying drawings.
Drawings
FIG. 1 is a flowchart of a cross-domain collaborative filtering method proposed in the present application;
fig. 2 is a system architecture diagram of the cross-domain collaborative filtering system proposed in the present application.
Detailed Description
The following describes embodiments of the present application in further detail with reference to the accompanying drawings.
The cross-domain collaborative filtering method of the present application first converts the user item scoring matrix of the target domain into a training sample set and expands it with auxiliary-domain data to relieve the sparsity of the target-domain data. It then trains an imbalance classifier on the expanded training samples and uses that classifier to predict the missing entries of the target domain, from which the recommendation data are obtained, thereby addressing the sparsity and imbalance problems of existing recommendation-system data sets. The method specifically comprises the following steps:
step S11: and converting the user item scoring data into a training sample set of a classification algorithm.
In the embodiment of the application, let the target domain be T, and let u and i denote a user and an item respectively; the relationship between users and items is represented as u × i → R, where R is the rating, with range set to {1, 2, 3, 4, 5}. In the examples of this application, L_u represents the row of user u in the user item scoring matrix and L_i the column of item i, so each rating in the user item scoring data can be represented as a training sample in {(L_u, L_i, R_ui) | (u, i) ∈ κ}, where κ is the set of scored user-item pairs in the scoring matrix. That is, the user item scoring matrix shown in Table 1 is converted into the training sample set shown in Table 2:
watch 1
i1 | i2 | i3 | i4 | |
u1 | 5 | 4 | ||
u2 | 5 | 1 | ||
u3 | 2 | 4 | 3 |
Watch two
In Table 1, u1, u2 and u3 are three users and i1, i2, i3 and i4 are four items. The position of a user's row in the user item scoring matrix is used as L_u and the position of an item's column as L_i, so the rating relating u1 and i1 can be represented as (1, 1, 5). In this way the user item scoring matrix of Table 1 is converted into the training sample set of Table 2, i.e., a training sample set of user item scoring data is generated based on the feature vector (L_u, L_i).
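A minimal sketch of this conversion (step S11), assuming a small rating matrix like Table 1 with 0 standing in for missing ratings and 1-based row/column positions:

```python
import numpy as np

# Rating matrix in the style of Table 1 (0 marks a missing rating).
R = np.array([
    [5, 4, 0, 0],
    [5, 1, 0, 0],
    [2, 4, 3, 0],
])

def to_samples(R):
    """Turn each observed rating into a (L_u, L_i, R_ui) training triple,
    using 1-based row/column positions as the feature vector."""
    return [(u + 1, i + 1, int(R[u, i]))
            for u in range(R.shape[0])
            for i in range(R.shape[1])
            if R[u, i] > 0]

samples = to_samples(R)
print(samples[0])  # (1, 1, 5), matching the example in the text
```

Each triple is one row of the training sample set; unscored cells produce no samples, which is why the target-domain set is sparse before expansion.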
Step S12: and carrying out the Funk-SVD decomposition on the user item scoring matrix of each auxiliary domain to obtain a user potential vector.
In the conventional collaborative filtering method, in order to solve the problem of sparsity of a user project scoring matrix, effective information is usually found from the same domain, for example, the relationship between a user and a project is inferred by information of social networks, trust relationships or comments, but the information in the same domain is not easily obtained.
In the embodiment of the application, Funk-SVD decomposition is applied to the user item scoring matrix of each auxiliary domain to obtain user latent vectors: the matrix is decomposed into the product of user latent factors and item latent factors, so that a high-dimensional user item scoring matrix is decomposed into two low-dimensional matrices, e.g. X (m × n) into U (m × k) × V (k × n), where m and n are the numbers of rows and columns of the user item scoring matrix and k is the dimension of the latent factors, with k much smaller than min(m, n). Funk-SVD fits the known entries of X as closely as possible in order to predict its unknown entries; if k is too small the data cannot be fitted, while if k is too large overfitting results. The predicted rating of user u for item i is r̂_ui = p_u·q_i, where p_u is the latent factor vector of user u and q_i the latent factor vector of item i.
For the decomposition, the objective function is set to min_{p_*, q_*} Σ_{(u,i)∈κ} (r_ui − p_u·q_i)² + λ(‖p_u‖² + ‖q_i‖²), where p_* = {p_user | user ∈ userset} represents the set of all users' latent vectors and q_* = {q_item | item ∈ itemset} the set of all items' latent factors.
The objective function is optimized by the updates p_u ← p_u + γ(e_ui·q_i − λ·p_u) and q_i ← q_i + γ(e_ui·p_u − λ·q_i), where e_ui = r_ui − p_u·q_i is the prediction error.
Finally, the latent vector p_u^(j) of user u on the j-th auxiliary domain is obtained from the optimization result, where j runs from 1 to K and K is the number of auxiliary domains; λ is a regularization parameter and γ the learning rate. Too large a γ makes the algorithm diverge, while too small a γ makes convergence slow.
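The Funk-SVD updates above can be sketched as plain stochastic gradient descent. The toy ratings and the hyper-parameter values (γ, λ, k, epoch count) below are assumptions chosen for illustration, not values from the patent:

```python
import numpy as np

def funk_svd(ratings, n_users, n_items, k=2, gamma=0.02, lam=0.02,
             epochs=2000, seed=0):
    """Funk-SVD by SGD on observed ratings (list of 0-based (u, i, r) triples).

    Applies the step-S12 updates:
        e_ui = r_ui - p_u . q_i
        p_u += gamma * (e_ui * q_i - lam * p_u)
        q_i += gamma * (e_ui * p_u - lam * q_i)
    """
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(n_users, k))   # user latent factors p_u
    Q = rng.normal(scale=0.1, size=(n_items, k))   # item latent factors q_i
    for _ in range(epochs):
        for u, i, r in ratings:
            e = r - P[u] @ Q[i]
            pu = P[u].copy()                       # use pre-update p_u for q_i
            P[u] += gamma * (e * Q[i] - lam * P[u])
            Q[i] += gamma * (e * pu - lam * Q[i])
    return P, Q

# Observed ratings of one auxiliary domain, as (user, item, rating) triples.
ratings = [(0, 0, 5), (0, 1, 4), (1, 0, 5), (1, 1, 1),
           (2, 0, 2), (2, 1, 4), (2, 2, 3)]
P, Q = funk_svd(ratings, n_users=3, n_items=4)
print(round(float(P[0] @ Q[0]), 2))  # should be close to the observed rating 5
```

Row u of `P` is the latent vector p_u^(j) for this auxiliary domain; running the same routine per domain yields the K vectors used in step S13.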
Step S13: and expanding the characteristic vectors of the users in the training sample set by using the potential vectors of the users to obtain a first expanded training sample set. And, step S14: and adding project characteristic information to expand the characteristic vectors of the projects in the first extended training sample set to obtain a second extended training sample set.
The user latent vectors obtained in step S12 are appended to the training samples of the target domain, i.e. added to the feature vector (L_u, L_i), giving the first expanded training sample set {(L_u, L_i, p_u^(1), …, p_u^(K), R_ui) | (u, i) ∈ κ}.
In addition, item feature information is added to expand the first expanded training sample set into a second expanded training sample set, which improves recommendation performance. Taking the movie domain as an example, attributes of movies can be added to the feature vector: the attribute information of every movie is retrieved by its title, and several attributes, such as director, genre, actors, country and language, are selected as item features. The second expanded training sample set can then be expressed as {(L_u, L_i, p_u^(1), …, p_u^(K), f_1, …, f_q, R_ui) | (u, i) ∈ κ}, where q is the number of item features.
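A minimal sketch of how steps S13 and S14 assemble one expanded feature vector. The latent vectors, the encoded item features and the helper name `extend` are all hypothetical stand-ins:

```python
import numpy as np

# Hypothetical pieces: a target-domain sample position (L_u, L_i), user u's
# latent vectors from K=2 auxiliary domains, and q=3 encoded item features.
L_u, L_i = 1, 1
aux_latent = [np.array([0.8, -0.2]), np.array([0.1, 0.5])]   # p_u^(1), p_u^(2)
item_feats = np.array([1.0, 0.0, 1.0])   # e.g. binary genre/director flags

def extend(L_u, L_i, aux_latent, item_feats):
    """Concatenate matrix position, auxiliary-domain latent vectors and item
    features into one classifier input vector (steps S13 + S14)."""
    return np.concatenate([[L_u, L_i], *aux_latent, item_feats])

x = extend(L_u, L_i, aux_latent, item_feats)
print(x.shape)  # (9,) = 2 positions + 2 domains x 2 latent dims + 3 item features
```

The rating R_ui stays attached to each sample as the class label; only the feature part shown here is fed to the weak classifiers in step S15.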
Step S15: the imbalance classifier is trained using a second extended training sample set.
In the embodiment of the application, the converted and expanded second training sample set is classified with the AdaBoost.NC algorithm. Its basic principle is to combine several classifiers into one strong classifier by iteration: only one weak classifier is trained in each round, and the weak classifiers trained earlier keep their parameters and take part in the next round, so that after the N-th round there are N weak classifiers in total, of which N−1 were trained previously. The N-th weak classifier is biased towards the data that the previous N−1 classifiers misclassified, and the final class output is the combined result of all N classifiers. The AdaBoost.NC algorithm maintains two kinds of weights. The first is the sample weight of each sample in the training set, denoted by the vector D; after each round of learning the sample weights are readjusted so that the samples misclassified in this round are more likely to be learned in subsequent rounds. The other is the weight of each weak classifier, denoted by the vector α. Because several classifiers coexist, an ambiguity term amb_t is introduced to measure the disagreement among them: coding a weak classifier h_τ as +1 on a training sample x it classifies correctly and −1 otherwise, and coding the integrated ensemble result H the same way, amb_t(x) = (1/t)·Σ_{τ=1}^{t} (H(x) − h_τ(x))/2.
Specifically, in the embodiment of the application, the sample weight of each sample in the second expanded training sample set is initialized to D_1(x_a) = 1/A, where A is the number of samples and 1 ≤ a ≤ A. With the number of weak classifiers set to T, the following steps are repeated T times (t starting at 1 and increasing by 1 up to T), based on the AdaBoost.NC imbalance algorithm: 1) train and obtain the weak classifier h_t on the t-th-iteration sample weights {D_t(x_a) | 1 ≤ a ≤ A}; 2) compute the penalty term p_t(x_a) = 1 − |amb_t(x_a)| of each training sample x_a and the weak-classifier weight α_t = (1/2)·ln((1 − ε_t)/ε_t), where ε_t is the penalty-weighted error of h_t; 3) update the sample weights by D_{t+1}(x_a) = D_t(x_a)·p_t(x_a)^λ·exp(−α_t·y_a·h_t(x_a)) / Z_t, where Z_t is the normalization factor and λ ∈ [0.5, 12] is the update step of the penalty term. After the T iterations are completed, the imbalance classifier H(x) = sign(Σ_{t=1}^{T} α_t·h_t(x)) is computed.
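The iteration above can be sketched as follows. This is a simplified AdaBoost.NC-style implementation under stated assumptions: the weak learner is a decision stump, the running ensemble used for the ambiguity term is an unweighted majority vote, and the toy imbalanced data set is invented:

```python
import numpy as np

def stump_train(X, y, D):
    """Weighted decision stump: the weak classifier h_t."""
    best_err, best = np.inf, None
    for f in range(X.shape[1]):
        for thr in np.unique(X[:, f]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, f] - thr) >= 0, 1, -1)
                err = D[pred != y].sum()
                if err < best_err:
                    best_err, best = err, (f, thr, pol)
    f, thr, pol = best
    return lambda Z: np.where(pol * (Z[:, f] - thr) >= 0, 1, -1)

def adaboost_nc(X, y, T=5, lam=2.0):
    """AdaBoost.NC-style ensemble: sample weights are modulated by the
    penalty p_t = 1 - |amb_t|, which rewards ensemble diversity."""
    A = len(y)
    D = np.full(A, 1.0 / A)
    hs, alphas, corr_sum = [], [], np.zeros(A)
    for t in range(1, T + 1):
        h = stump_train(X, y, D)
        pred = h(X)
        hs.append(h)
        corr_sum += np.where(pred == y, 1, -1)       # member correctness, +/-1
        H = np.where(sum(g(X) for g in hs) >= 0, 1, -1)  # ensemble so far
        C = np.where(H == y, 1, -1)                  # ensemble correctness
        amb = (t * C - corr_sum) / (2 * t)           # ambiguity term amb_t
        p = 1.0 - np.abs(amb)                        # penalty p_t = 1 - |amb_t|
        w = D * p ** lam
        err = np.clip(w[pred != y].sum() / w.sum(), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)        # weak-classifier weight
        alphas.append(alpha)
        D = D * p ** lam * np.exp(-alpha * y * pred)
        D /= D.sum()                                 # Z_t normalization
    return lambda Z: np.where(
        sum(a * h(Z) for a, h in zip(alphas, hs)) >= 0, 1, -1)

# Imbalanced toy problem: 6 negatives, 2 positives.
X = np.arange(8, dtype=float).reshape(-1, 1)
y = np.array([-1, -1, -1, -1, -1, -1, 1, 1])
clf = adaboost_nc(X, y)
print((clf(X) == y).mean())  # training accuracy
```

The penalty strength `lam=2.0` sits inside the λ ∈ [0.5, 12] range given in the text; larger values push harder for diverse, minority-class-sensitive members.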
Step S16: missing data for the user item scoring data is predicted based on the imbalance classifier and recommendations are generated.
As can be seen from the above, in the cross-domain collaborative filtering method provided by the application, each rating in the user item scoring matrix is converted into a training sample whose feature vector is the rating's position in the matrix. User latent vectors are then obtained by Funk-SVD decomposition from auxiliary domains containing relatively rich information, and the training sample set is expanded with them into a first expanded training sample set, reducing the sparsity of the target domain; the first expanded training sample set is further expanded with item feature information into a second expanded training sample set. Finally, the imbalance classifier is trained on the second expanded training sample set, i.e., the converted and expanded training set is classified, the missing data of the target domain's user item scoring matrix are predicted, and recommendation data for the user are generated. By adopting an imbalance classification model, the application solves the data set imbalance problem of existing recommendation systems and effectively handles the biased distribution of ratings.
Based on the above proposed cross-domain collaborative filtering method, the present application further proposes a cross-domain collaborative filtering system, as shown in fig. 2, which includes a training sample conversion module 21, a user potential vector generation module 22, a training sample first extension module 23, a training sample second extension module 24, an imbalance classifier training module 25, and a recommendation module 26.
The training sample conversion module 21 is configured to convert the user item scoring data into a training sample set of a classification algorithm; the user potential vector generating module 22 is configured to perform a Funk-SVD decomposition on the user item score matrix of each auxiliary domain to obtain a user potential vector; the training sample first extension module 23 is configured to extend the feature vectors of the users in the training sample set by using the user potential vectors to obtain a first extended training sample set; the training sample second expansion module 24 is configured to add the project feature information to expand the feature vectors of the projects in the first extended training sample set to obtain a second extended training sample set; the imbalance classifier training module 25 is configured to train an imbalance classifier using the second extended training sample set; the recommendation module 26 is to predict missing data for user item scoring data based on the imbalance classifier and generate recommendations.
In particular, the training sample conversion module is configured to use L_u to represent the row of the user in the user item scoring matrix and L_i to represent the column of the item, and to generate, based on the feature vector (L_u, L_i), the training sample set {(L_u, L_i, R_ui) | (u, i) ∈ κ} of the classification algorithm, where κ is the set of scored user-item pairs in the scoring matrix and R_ui is user u's rating of item i.
In the embodiment of the application, the user potential vector generation module 22 comprises an objective function setting unit 221, an objective function optimization unit 222 and a user potential vector generation unit 223. The objective function setting unit 221 is used for setting the objective function min_{p_*, q_*} Σ_{(u,i)∈κ} (r_ui − p_u·q_i)² + λ(‖p_u‖² + ‖q_i‖²); the objective function optimization unit 222 is used for updating p_u ← p_u + γ(e_ui·q_i − λ·p_u) and q_i ← q_i + γ(e_ui·p_u − λ·q_i), with e_ui = r_ui − p_u·q_i, to optimize the objective function; the user potential vector generation unit 223 is used for obtaining the latent vector p_u^(j) of user u on the j-th auxiliary domain based on the optimization result, where j runs from 1 to K and K is the number of auxiliary domains.
The imbalance classifier training module 25 comprises a sample weight initialization unit 251, a weak classifier training unit 252, a sample weight updating unit 253 and an imbalance classifier generation unit 254. The sample weight initialization unit 251 is used for initializing the sample weight of each sample in the second expanded training sample set to D_1(x_a) = 1/A, where A is the number of samples and 1 ≤ a ≤ A. The weak classifier training unit 252 is used for training and obtaining the weak classifier h_t on the t-th-iteration sample weights {D_t(x_a) | 1 ≤ a ≤ A}, with t from 1 to T. The sample weight updating unit 253 is used for computing the penalty term p_t(x_a) = 1 − |amb_t(x_a)| of each training sample x_a and the weak-classifier weight α_t = (1/2)·ln((1 − ε_t)/ε_t), where ε_t is the penalty-weighted error of h_t, and for updating the sample weights by D_{t+1}(x_a) = D_t(x_a)·p_t(x_a)^λ·exp(−α_t·y_a·h_t(x_a)) / Z_t, where Z_t is the normalization factor and λ ∈ [0.5, 12] is the update step of the penalty term. The imbalance classifier generation unit 254 is used for computing the imbalance classifier H(x) = sign(Σ_{t=1}^{T} α_t·h_t(x)) after the weak classifier training unit and the sample weight updating unit have repeated the computation T times.
The recommendation method of the cross-domain collaborative filtering system has been described in detail in the above proposed cross-domain collaborative filtering method, and is not described herein again.
It should be noted that the above description is not intended to limit the invention, and the invention is not limited to the above examples; changes, modifications, additions or substitutions made by those skilled in the art within the spirit and scope of the invention shall also fall within its protection scope.
Claims (8)
1. A cross-domain collaborative filtering method is characterized by comprising the following steps:
converting the user item scoring data into a training sample set of a classification algorithm;
performing Funk-SVD on the user item scoring matrix of each auxiliary domain to obtain a user potential vector;
expanding the characteristic vector of the user in the training sample set by using the user potential vector to obtain a first expanded training sample set;
adding project characteristic information to expand the characteristic vectors of the projects in the first extended training sample set to obtain a second extended training sample set;
training an imbalance classifier using the second extended training sample set;
predicting missing data for the user item scoring data based on the imbalance classifier and generating a recommendation.
2. The cross-domain collaborative filtering method according to claim 1, wherein the user item scoring data is converted into a training sample set of a classification algorithm, specifically:
using L_u to represent the row of the user in the user item scoring matrix and L_i to represent the column of the item;
based on the feature vector (L_u, L_i), constructing the training sample set {(L_u, L_i, R_ui) | (u, i) ∈ κ} of the classification algorithm for the user item scoring data, where κ is the set of scored user-item pairs in the scoring matrix and R_ui is user u's rating of item i.
3. The cross-domain collaborative filtering method according to claim 2, wherein the Funk-SVD decomposition is performed on the user item scoring matrix of each auxiliary domain to obtain a user potential vector, and specifically comprises:
setting the objective function min_{p_*, q_*} Σ_{(u,i)∈κ} (r_ui − p_u·q_i)² + λ(‖p_u‖² + ‖q_i‖²); updating p_u ← p_u + γ(e_ui·q_i − λ·p_u) and q_i ← q_i + γ(e_ui·p_u − λ·q_i), with e_ui = r_ui − p_u·q_i, to optimize the objective function, where λ is a regularization parameter and γ is the learning rate;
obtaining the latent vector p_u^(j) of user u on the j-th auxiliary domain based on the optimization result, where j runs from 1 to K and K is the number of auxiliary domains;
wherein r_ui represents the rating of item i by user u; p_* = {p_user | user ∈ userset} represents the set of all users' latent vectors and q_* = {q_item | item ∈ itemset} the set of all items' latent factors; p_u represents the latent factor vector of user u and q_i the latent factor vector of item i.
4. the cross-domain collaborative filtering method according to claim 1, wherein training an imbalance classifier using the second extended training sample set specifically includes:
initializing the sample weight of each sample in the second expanded training sample set to D_1(x_a) = 1/A, where A is the number of samples and 1 ≤ a ≤ A;
repeating the following steps for T times:
1) training and obtaining the weak classifier h_t on the t-th-iteration sample weights {D_t(x_a) | 1 ≤ a ≤ A}, where t runs from 1 to T;
2) calculating the penalty term p_t(x_a) = 1 − |amb_t(x_a)| of each training sample x_a and the weak-classifier weight α_t = (1/2)·ln((1 − ε_t)/ε_t), where ε_t is the penalty-weighted error of h_t;
3) updating the sample weights by D_{t+1}(x_a) = D_t(x_a)·p_t(x_a)^λ·exp(−α_t·y_a·h_t(x_a)) / Z_t, where Z_t is the normalization factor and λ ∈ [0.5, 12] is the update step of the penalty term;
and, after the T iterations, computing the imbalance classifier H(x) = sign(Σ_{t=1}^{T} α_t·h_t(x)), wherein H is the classification result integrating all classifiers.
5. A cross-domain collaborative filtering system is characterized by comprising a training sample conversion module, a user potential vector generation module, a training sample first expansion module, a training sample second expansion module, an unbalanced classifier training module and a recommendation module;
the training sample conversion module is used for converting the user project scoring data into a training sample set of a classification algorithm;
the user potential vector generation module is used for carrying out the Funk-SVD decomposition on the user item scoring matrix of each auxiliary domain to obtain a user potential vector;
the training sample first expansion module is used for expanding the feature vectors of the users in the training sample set by using the user potential vectors to obtain a first expanded training sample set; the training sample second expansion module is used for adding project characteristic information to expand the characteristic vectors of the projects in the first expansion training sample set to obtain a second expansion training sample set;
the unbalanced classifier training module is configured to train the unbalanced classifier using the second extended training sample set;
the recommendation module is used for predicting missing data of the user item scoring data based on the imbalance classifier and generating a recommendation.
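The recommendation module's role can be sketched as scoring every unscored (user, item) pair with the trained classifier and returning the highest-scoring items. The `recommend` helper and the stand-in classifier below are hypothetical, illustrating the interface rather than the patent's implementation:

```python
# Sketch of the recommendation module: build the (L_u, L_i) feature vector
# for each unscored item of a user, score it with a classifier, and return
# the top-ranked item indices.
import numpy as np

def recommend(R, classifier, user, top_n=3):
    """Rank `user`'s unscored items by classifier score on (L_u, L_i) features."""
    unscored = np.where(R[user] == 0)[0]
    feats = [np.concatenate([R[user], R[:, i]]) for i in unscored]
    scores = classifier(np.array(feats))        # higher = more likely liked
    order = unscored[np.argsort(scores)[::-1]]
    return order[:top_n].tolist()

R = np.array([[5, 0, 3, 0],
              [4, 4, 0, 1]])
# Stand-in classifier: mean of the item-column half of the features,
# i.e. a crude popularity proxy (the patent would use the trained H here).
toy = lambda F: F[:, R.shape[1]:].mean(axis=1)
top = recommend(R, toy, user=0, top_n=2)   # -> [1, 3]
```

In the claimed system, `classifier` would be the trained unbalanced classifier H, and the positive-scored items form the generated recommendation.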
6. The cross-domain collaborative filtering system according to claim 5, wherein the training sample conversion module is specifically configured to use L_u to represent the row of user u in the user-item scoring matrix, use L_i to represent the column of item i in the user-item scoring matrix, and, based on the feature vector (L_u, L_i), construct the training sample set {(L_u, L_i, R_ui) | (u, i) ∈ k} of the classification algorithm for the user-item scoring data, where k is the set of scored "user-item" pairs in the scoring matrix, and R_ui represents the rating of user u for item i.
7. The cross-domain collaborative filtering system according to claim 6, wherein the user potential vector generation module includes an objective function setting unit, an objective function optimization unit, and a user potential vector generation unit;
the target function setting unit is used for setting a target function
The objective function optimization unit is used for updating p_u ← p_u + γ(e_ui·q_i − λ·p_u) and q_i ← q_i + γ(e_ui·p_u − λ·q_i) to optimize the objective function; wherein λ is a regularization parameter, γ is a learning rate, and e_ui = r_ui − p_u·q_i is the prediction error;
the user potential vector generating unit is used for obtaining the potential vector p_u^(j) of user u on the j-th auxiliary domain based on the optimization result; wherein j ranges from 1 to K, and K is the number of auxiliary domains;
wherein r_ui represents the rating of user u for item i; p* = {p_user | user ∈ userset} represents the set of potential vectors of all users, and q* = {q_item | item ∈ itemset} represents the set of potential factors of all items; p_u represents the latent factor vector of user u, and q_i represents the latent factor vector of item i;
8. The cross-domain collaborative filtering system according to claim 5, wherein the unbalanced classifier training module comprises a sample weight initialization unit, a weak classifier training unit, a sample weight updating unit, and an unbalanced classifier generating unit;
the sample weight initialization unit is used for initializing the sample weight of each sample in the second extended training sample set to D_1(x_a) = 1/A; wherein A is the number of samples, and 1 ≤ a ≤ A;
the weak classifier training unit is used for training and obtaining the weak classifier h_t from all the sample weights {D_t(x_a) | 1 ≤ a ≤ A} at the t-th iteration; wherein t ranges from 1 to T;
the sample weight updating unit is used for calculating, for each training sample x_a, the penalty term p_t = 1 − |amb|, wherein α_t is the weight of the weak classifier, and for updating the sample weights using the penalty term; wherein Z_t is the normalization factor, and λ ∈ [0.5, 12] is the update step length of the penalty term;
the unbalanced classifier generating unit is used for calculating the unbalanced classifier H(x) = sign(Σ_{t=1}^{T} α_t·h_t(x)) after the weak classifier training unit and the sample weight updating unit repeat the calculation T times; wherein H is the classification result integrating all the weak classifiers.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811209371.5A CN109446420B (en) | 2018-10-17 | 2018-10-17 | Cross-domain collaborative filtering method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811209371.5A CN109446420B (en) | 2018-10-17 | 2018-10-17 | Cross-domain collaborative filtering method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109446420A CN109446420A (en) | 2019-03-08 |
CN109446420B true CN109446420B (en) | 2022-01-25 |
Family
ID=65546951
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811209371.5A Active CN109446420B (en) | 2018-10-17 | 2018-10-17 | Cross-domain collaborative filtering method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109446420B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110119465B (en) * | 2019-05-17 | 2023-06-13 | 哈尔滨工业大学 | Mobile phone application user preference retrieval method integrating LFM potential factors and SVD |
CN110264274B (en) * | 2019-06-21 | 2023-12-29 | 深圳前海微众银行股份有限公司 | Guest group dividing method, model generating method, device, equipment and storage medium |
CN110297848B (en) * | 2019-07-09 | 2024-02-23 | 深圳前海微众银行股份有限公司 | Recommendation model training method, terminal and storage medium based on federal learning |
CN112214682B (en) * | 2019-07-11 | 2023-04-07 | 中移(苏州)软件技术有限公司 | Recommendation method, device and equipment based on field and storage medium |
CN110543597B (en) * | 2019-08-30 | 2022-06-03 | 北京奇艺世纪科技有限公司 | Grading determination method and device and electronic equipment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102385586A (en) * | 2010-08-27 | 2012-03-21 | 日电(中国)有限公司 | Multiparty cooperative filtering method and system |
CN102930341A (en) * | 2012-10-15 | 2013-02-13 | 罗辛 | Optimal training method of collaborative filtering recommendation model |
EP2837199A1 (en) * | 2012-04-12 | 2015-02-18 | MOVIRI S.p.A. | Client-side recommendations on one-way broadcast networks |
CN105447145A (en) * | 2015-11-25 | 2016-03-30 | 天津大学 | Item-based transfer learning recommendation method and recommendation apparatus thereof |
- 2018-10-17: CN201811209371.5A filed; granted as CN109446420B (status: active)
Non-Patent Citations (2)
Title |
---|
A User-Based Cross Domain Collaborative Filtering Algorithm Based on a Linear Decomposition Model; Xu Yu et al.; IEEE; 2017-11-16; full text * |
Cross-domain collaborative filtering systems; Liu Qingwen; China Doctoral Dissertations Full-text Database; 2013-10-31; full text * |
Also Published As
Publication number | Publication date |
---|---|
CN109446420A (en) | 2019-03-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109446420B (en) | Cross-domain collaborative filtering method and system | |
Wang et al. | Relational deep learning: A deep latent variable model for link prediction | |
Dong et al. | Hnhn: Hypergraph networks with hyperedge neurons | |
Jadhav et al. | Comparative study of K-NN, naive Bayes and decision tree classification techniques | |
CN112232925A (en) | Method for carrying out personalized recommendation on commodities by fusing knowledge maps | |
Ma et al. | Adaptive-step graph meta-learner for few-shot graph classification | |
US8005784B2 (en) | Supervised rank aggregation based on rankings | |
US20080195631A1 (en) | System and method for determining web page quality using collective inference based on local and global information | |
Jin et al. | Pattern classification with corrupted labeling via robust broad learning system | |
CN112529168A (en) | GCN-based attribute multilayer network representation learning method | |
WO2022252458A1 (en) | Classification model training method and apparatus, device, and medium | |
Kongsorot et al. | Multi-label classification with extreme learning machine | |
Wan et al. | Adaptive knowledge subgraph ensemble for robust and trustworthy knowledge graph completion | |
Ludl et al. | Using machine learning models to explore the solution space of large nonlinear systems underlying flowsheet simulations with constraints | |
CN111144500A (en) | Differential privacy deep learning classification method based on analytic Gaussian mechanism | |
WO2020147259A1 (en) | User portait method and apparatus, readable storage medium, and terminal device | |
Dehuri et al. | A condensed polynomial neural network for classification using swarm intelligence | |
CN114282077A (en) | Session recommendation method and system based on session data | |
US20060276996A1 (en) | Fast tracking system and method for generalized LARS/LASSO | |
CN116975686A (en) | Method for training student model, behavior prediction method and device | |
Zhou et al. | Online recommendation based on incremental-input self-organizing map | |
WO2022105780A1 (en) | Recommendation method and apparatus, electronic device, and storage medium | |
CN117033992A (en) | Classification model training method and device | |
CN115757897A (en) | Intelligent culture resource recommendation method based on knowledge graph convolution network | |
CN114254738A (en) | Double-layer evolvable dynamic graph convolution neural network model construction method and application |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||