CN108717654A

CN108717654A - Multi-e-commerce cross recommendation method based on cluster feature migration

Info

Publication number: CN108717654A
Application number: CN201810470713.2A
Authority: CN
Inventors: 吴骏; 方贺贺; 张怡; 杜云涛; 王崇骏
Original assignee: Nanjing University
Current assignee: Nanjing University
Priority date: 2018-05-17
Filing date: 2018-05-17
Publication date: 2018-10-30
Anticipated expiration: 2038-05-17
Also published as: CN108717654B

Abstract

The invention discloses a multi-e-commerce cross recommendation method based on cluster feature migration, comprising the following steps: 1) scoring matrix construction stage: a. collect the data of each e-commerce site; b. clean the data and remove noise; c. build the scoring matrices; d. end; 2) auxiliary domain learning stage: a. obtain the scoring matrix; b. extract the user/item feature matrices; c. cluster the user/item feature matrices; d. compute the average scores; e. construct the cluster feature matrix; f. repeat the above steps for each auxiliary e-commerce site, then end; 3) target domain learning stage: a. obtain the scoring matrix of the target e-commerce site; b. migrate the cluster features and complete the matrix decomposition; c. reconstruct the scoring matrix of the target e-commerce site; d. generate the recommendation list; e. end. Using transfer learning, the invention offers a new approach to the data sparsity, cold start, and diversity-versus-accuracy dilemma faced by e-commerce recommendation systems.

Description

Multi-e-commerce cross recommendation method based on cluster feature migration

Technical Field

The invention relates to a multi-e-commerce cross recommendation method, which addresses the low recommendation accuracy of e-commerce recommendation systems when data are extremely sparse and under cold start.

Background

As e-commerce websites keep growing in scale, information overload becomes increasingly serious, and personalized recommendation systems are a promising way to address it. The well-known e-commerce platform Amazon, for example, uses behavior records that reflect a user's purchasing interest, such as clicks, browsing, favorites, and shopping-cart additions, to recommend other products the user may be interested in. Tailoring content to each individual user's preferences can effectively improve key metrics such as user activity, dwell time, payment rate, and retention rate, creating great value for society and enterprises. However, the rapid growth in the number of users and commodities brings traditional e-commerce recommendation systems many problems, such as data sparsity, cold start, and the dilemma of balancing diversity against accuracy.

Currently, most e-commerce recommendation systems operate within a single domain. The Internet, however, is an open environment, and almost no user generates data in only one domain: a user may shop on Taobao, Amazon, and Jingdong at the same time, and listen to songs on NetEase Cloud Music, QQ Music, and Kugou Music. Single-domain recommendation cannot effectively share Internet resources, so information stays isolated and information islands easily form.

Cross-domain recommendation aims to extract knowledge from other domains with rich data, through information sharing and complementation between domains, and to use it to help recommendation in the target domain. On the one hand it can alleviate data sparsity and cold start; on the other it can balance diversity and accuracy, which has made it a research hotspot in the field of recommendation systems. Building on cross-domain recommendation, the invention applies transfer learning to e-commerce recommendation and thereby offers a new solution to the problems of e-commerce recommendation systems.

Disclosure of Invention

Purpose of the invention: in view of the challenges faced by traditional e-commerce recommendation systems, such as data sparsity, cold start, and the diversity-versus-accuracy dilemma, the invention introduces the idea of transfer learning and provides a multi-e-commerce cross recommendation method based on cluster feature migration. First, user and item feature matrices are extracted from each auxiliary e-commerce site. Next, users and items are clustered, and the average score of each user cluster on each item cluster is computed; these cluster features serve as domain knowledge to be transferred to the target e-commerce site. Finally, the domain knowledge of all auxiliary sites is migrated to the target site in weighted form to help it reconstruct its user-item scoring matrix, completing the final recommendation.

Technical scheme: to achieve the above purpose, the invention adopts the following technical scheme:

A multi-e-commerce cross recommendation method based on cluster feature migration comprises the following steps:

1) Scoring matrix construction stage:

1)-a collect the historical user behavior data of each e-commerce website;

1)-b clean and denoise the historical user behavior data;

Data cleaning in step 1)-b removes duplicate and missing data, and denoising deletes the data of users with very few behavior records;
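A minimal sketch of the step 1)-b preprocessing follows. It assumes the behavior log is a list of (user, item, behavior) tuples and treats "very few records" as fewer than a hypothetical threshold `min_records`; the function name, the data layout, and the threshold are illustrative assumptions, since the patent fixes none of them.

```python
from collections import Counter

def clean_behavior_log(records, min_records=3):
    """Hypothetical step 1)-b: drop missing and duplicate records, then drop
    users with fewer than `min_records` remaining records (threshold assumed)."""
    seen = set()
    deduped = []
    for rec in records:
        # Missing data: any field that is None or an empty string.
        if any(field is None or field == "" for field in rec):
            continue
        # Duplicate data: keep only the first occurrence of an identical record.
        if rec in seen:
            continue
        seen.add(rec)
        deduped.append(rec)
    # Noise removal: delete the records of users whose history is too short.
    counts = Counter(user for user, _, _ in deduped)
    return [rec for rec in deduped if counts[rec[0]] >= min_records]
```

Deduplication runs before the per-user count so that repeated log lines cannot push a sparse user over the threshold.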

1)-c using all behavior data that reflect users' purchasing interest, construct a user-item scoring matrix for each e-commerce website;

Constructing the user-item scoring matrix in step 1)-c means replacing user names and item names with the row and column indices of the matrix and converting the behavior data into specific numerical values; the behavior data are the clicking, browsing, favoriting, and purchasing records that reflect users' purchasing interest;
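The conversion in step 1)-c can be sketched as follows. The weights in `BEHAVIOR_WEIGHTS` and the rule of keeping the strongest behavior per user-item pair are hypothetical choices; the patent only states that each behavior is mapped to a numerical value according to the preference it conveys.

```python
import numpy as np

# Hypothetical preference weight per behavior type (not specified by the patent).
BEHAVIOR_WEIGHTS = {"browse": 1.0, "click": 2.0, "favorite": 3.0, "cart": 4.0, "purchase": 5.0}

def build_rating_matrix(records, weights=BEHAVIOR_WEIGHTS):
    """Map user/item names to row/column indices and behaviors to scores."""
    users = sorted({u for u, _, _ in records})
    items = sorted({i for _, i, _ in records})
    row = {u: r for r, u in enumerate(users)}
    col = {i: c for c, i in enumerate(items)}
    R = np.zeros((len(users), len(items)))  # 0 marks "no interaction"
    for u, i, behavior in records:
        r, c = row[u], col[i]
        # Assumption: the strongest observed behavior determines the score.
        R[r, c] = max(R[r, c], weights[behavior])
    return R, row, col
```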

1)-d end;

2) Auxiliary domain learning stage:

2)-a obtain the user-item scoring matrix R_z of the z-th auxiliary e-commerce site, z∈{1,2,…,Z};

2)-b apply the ALS algorithm to the user-item scoring matrix R_z to extract a D-dimensional user feature matrix M_z and item feature matrix N_z;

The ALS algorithm in step 2)-b comprises the following steps:

Step 2)-b-1) randomly initialize the item feature matrix N_z with values in (0,1);

Step 2)-b-2) fix the item feature matrix N_z and update each user feature vector M_i. in turn according to the following formula:

where N_ui is the matrix of feature vectors of the items rated by the i-th user, n_ui is the number of ratings given by the i-th user, I is the D×D identity matrix, λ is the step length (regularization coefficient), T is the number of iterations, i is the row index of matrix M_z, and M_i. is the feature vector of the i-th user, i.e. the i-th row of M_z;

Step 2)-b-3) fix the user feature matrix M_z and update each item feature vector N_j. in turn according to the following formula:

where M_mj is the matrix of feature vectors of the users who rated the j-th item, n_mj is the number of ratings received by the j-th item, I is the D×D identity matrix, j is the row index of matrix N_z, and N_j. is the feature vector of the j-th item, i.e. the j-th row of N_z;

Step 2)-b-4) repeat steps 2)-b-2) and 2)-b-3) T times, then end;
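A NumPy sketch of steps 2)-b-1) to 2)-b-4) follows. It assumes the common weighted-λ-regularized ALS update M_i. = (N_ui^T N_ui + λ n_ui I)^(-1) N_ui^T r_i, where r_i holds the i-th user's observed ratings (and symmetrically for items). This form fits the symbols defined above (N_ui, n_ui, the D×D identity I, λ) but is an assumption rather than a transcription of the patent's formula, as is treating the non-zero entries of R_z as the observed ratings. The function name `als` is hypothetical.

```python
import numpy as np

def als(R, D=2, lam=0.1, T=10, seed=0):
    """Factor R (users x items) into M (users x D) and N (items x D).

    Assumptions: non-zero entries of R are the observed ratings, and each row
    update is a ridge solve with lam scaled by that row's rating count.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    observed = R > 0
    eye = np.eye(D)
    # Step 2)-b-1: random item features, uniform in [0, 1).
    N = rng.uniform(0.0, 1.0, size=(n_items, D))
    M = np.zeros((n_users, D))
    for _ in range(T):  # step 2)-b-4: T alternating rounds
        # Step 2)-b-2: with N fixed, solve for each user row.
        for i in range(n_users):
            idx = np.flatnonzero(observed[i])
            if idx.size:
                N_ui = N[idx]
                A = N_ui.T @ N_ui + lam * idx.size * eye
                M[i] = np.linalg.solve(A, N_ui.T @ R[i, idx])
        # Step 2)-b-3: with M fixed, solve for each item row.
        for j in range(n_items):
            idx = np.flatnonzero(observed[:, j])
            if idx.size:
                M_mj = M[idx]
                A = M_mj.T @ M_mj + lam * idx.size * eye
                N[j] = np.linalg.solve(A, M_mj.T @ R[idx, j])
    return M, N
```

Users or items without any rating keep a zero feature vector in this sketch; how the patent handles them is not stated.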

2)-c apply the K-means algorithm to the user feature matrix M_z and the item feature matrix N_z to obtain k_z user clusters and l_z item clusters;

The K-means clustering algorithm in step 2)-c proceeds as follows:

Step 2)-c-1) randomly select K data points as the initial cluster centers, where K is predetermined;

Step 2)-c-2) assign each row of data to its nearest cluster according to the Euclidean distance:

$$\mathrm{dis}(a,b)=\sqrt{\sum_{d=1}^{D}\left(X_{a,d}-X_{b,d}\right)^{2}}$$

where dis(a, b) is the Euclidean distance between data a and data b, X_a,d is the value of data a on the d-th attribute, and X_b,d is the value of data b on the d-th attribute;

Step 2)-c-3) recompute the center of each cluster;

Step 2)-c-4) repeat steps 2)-c-2) and 2)-c-3) T times, then end;
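Steps 2)-c-1) to 2)-c-4) amount to Lloyd-style K-means run for a fixed T iterations. A minimal NumPy sketch follows; the fixed random seed and the rule that an emptied cluster keeps its previous center are assumptions the patent does not state, and the function names are hypothetical. It would be applied to M_z with K = k_z and to N_z with K = l_z.

```python
import numpy as np

def _nearest(X, centers):
    # Step 2)-c-2: Euclidean distance dis(a, b) from every row of X to every center.
    dist = np.sqrt(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2))
    return dist.argmin(axis=1)

def kmeans(X, K, T=20, seed=0):
    """Return (labels, centers) after T assign/update rounds."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 2)-c-1: K distinct rows chosen at random as the initial centers.
    centers = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(T):  # step 2)-c-4
        labels = _nearest(X, centers)
        # Step 2)-c-3: each center becomes the mean of its members
        # (an emptied cluster keeps its previous center, an assumption).
        for k in range(K):
            members = X[labels == k]
            if len(members):
                centers[k] = members.mean(axis=0)
    # Final assignment so the labels match the returned centers.
    return _nearest(X, centers), centers
```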

2)-d compute the average score p_kl of each user cluster on each item cluster;

The average score of each user cluster on each item cluster in step 2)-d is computed as follows:

$$p_{kl}=\frac{\sum_{u\in C^{U}_{k}}\sum_{v\in C^{V}_{l}} r_{u,v}}{\left|C^{U}_{k}\right|\cdot\left|C^{V}_{l}\right|}$$

where p_kl is the average score of the k-th user cluster on the l-th item cluster, r_u,v is the rating of item v by user u, |C^U_k| is the number of users in the k-th user cluster C^U_k, and |C^V_l| is the number of items in the l-th item cluster C^V_l.

2)-e construct the cluster feature matrix P_z of the auxiliary e-commerce site, whose elements are p_kl;
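Steps 2)-d and 2)-e can be sketched as below. Each block sum is divided by the number of users in the user cluster times the number of items in the item cluster, as the description of step 7 specifies, so unrated entries stored as 0 still count toward the denominator. The function name and the integer label-array inputs (for example, the K-means output of step 2)-c) are assumptions.

```python
import numpy as np

def cluster_feature_matrix(R, user_labels, item_labels, k, l):
    """Hypothetical helper: P[a, b] = sum of R over (user cluster a) x (item cluster b),
    divided by (#users in a) * (#items in b); unrated zeros count in the denominator."""
    user_labels = np.asarray(user_labels)
    item_labels = np.asarray(item_labels)
    P = np.zeros((k, l))
    for a in range(k):
        rows = user_labels == a
        for b in range(l):
            cols = item_labels == b
            size = rows.sum() * cols.sum()
            if size:  # skip empty clusters, leaving p_kl = 0
                P[a, b] = R[np.ix_(rows, cols)].sum() / size
    return P
```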

2)-f repeat the above steps for each auxiliary e-commerce site z∈{1,2,…,Z}, then end;

3) Target domain learning stage:

3)-a obtain the user-item scoring matrix R_T of the target e-commerce site;

3)-b migrate the cluster feature matrices P_z to help complete the matrix decomposition of the user-item scoring matrix R_T, obtaining the parameters U_z, V_z and α_z;

The matrix decomposition in step 3)-b proceeds as follows:

3)-b-1) define the objective function of the target-domain matrix decomposition as follows:

where U_z, V_z and α_z are the parameters to be solved in this objective function: U_z indicates which user cluster of the z-th source domain each target-domain user belongs to, V_z indicates which item cluster of the z-th source domain each target-domain item belongs to, and α_z is the parameter expressing the degree of migration from the z-th source domain; k_z is the number of user clusters and l_z the number of item clusters of the z-th auxiliary domain; W is the indicator matrix of R_T (1 where a rating is observed, 0 otherwise); 1 denotes the all-ones matrix; ∘ denotes element-wise multiplication of matrices; and the constraints U_z1 = 1 and V_z1 = 1 ensure that each user and each item belongs to exactly one cluster, i.e. each row contains exactly one 1 and the rest are 0;
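The sketch below assumes the objective takes the form min ‖[R_T − Σ_z α_z U_z P_z V_z^T] ∘ W‖_F^2 subject to U_z1 = 1 and V_z1 = 1, a form used by codebook-transfer methods and consistent with the symbols defined above; it is an assumption, not a transcription of the patent's formula. Each U_z and V_z is built as a one-hot matrix from cluster labels; the helper names are hypothetical.

```python
import numpy as np

def one_hot(labels, n_clusters):
    """Build U_z or V_z: row i has a single 1 in column labels[i], so U_z 1 = 1."""
    H = np.zeros((len(labels), n_clusters))
    H[np.arange(len(labels)), labels] = 1.0
    return H

def objective(R_T, W, alphas, Us, Ps, Vs):
    """Assumed objective: || [R_T - sum_z alpha_z U_z P_z V_z^T] o W ||_F^2."""
    approx = sum(a * (U @ P @ V.T) for a, U, P, V in zip(alphas, Us, Ps, Vs))
    residual = (R_T - approx) * W  # element-wise mask: only observed ratings count
    return float((residual ** 2).sum())
```

Because U_z and V_z are one-hot, U_z P_z V_z^T simply copies, for every (user, item) pair, the cluster-level score p_kl of the clusters that pair is assigned to.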

3)-b-2) randomly initialize V_z so that exactly one element in each row is 1 and the rest are 0;

3)-b-3) let

3)-b-4) each user u_i may belong to any of the k_z user clusters of the z-th auxiliary domain; considering the knowledge of all Z auxiliary domains together gives k_1×k_2×…×k_Z combinations. Select the combination that minimizes the following expression, i.e. examine the different combinations of user clusters across all auxiliary domains and choose the one that best predicts the target scores, thereby finding the auxiliary-domain cluster [U_z]_i of the target user:

Wherein,

3)-b-5) set the element in row i, column j_z of U_z to 1 and the rest of that row to 0;

3)-b-6) repeat 3)-b-4) and 3)-b-5) for each row i of R_T;
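Steps 3)-b-4) to 3)-b-6) can be sketched as an exhaustive search over all k_1×k_2×…×k_Z user-cluster combinations. The sketch scores each combination by the squared error on the user's observed ratings, which assumes the objective is a W-masked squared reconstruction error of R_T by Σ_z α_z U_z P_z V_z^T. The function name `assign_users` is hypothetical; the returned tuples give, per domain z, the column j_z set to 1 in row i of U_z.

```python
import itertools
import numpy as np

def assign_users(R_T, W, alphas, Ps, Vs):
    """For each target user, return the tuple (j_1, ..., j_Z) of user-cluster
    indices minimising the masked squared error of that user's row."""
    # Row j of alpha_z * P_z @ V_z.T is what user cluster j of domain z predicts for every target item.
    profiles = [a * (P @ V.T) for a, P, V in zip(alphas, Ps, Vs)]
    # All k_1 x k_2 x ... x k_Z combinations of user clusters.
    combos = list(itertools.product(*(range(P.shape[0]) for P in Ps)))
    choices = []
    for i in range(R_T.shape[0]):
        best, best_err = None, np.inf
        for combo in combos:
            pred = sum(prof[j] for prof, j in zip(profiles, combo))
            err = float((W[i] * (R_T[i] - pred) ** 2).sum())
            if err < best_err:
                best, best_err = combo, err
        choices.append(best)
    return choices
```

The search cost grows as the product of the k_z, which is why the number of clusters per auxiliary domain has to stay small.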

3)-b-7) each item v_i may belong to any of the l_z item clusters of the z-th auxiliary domain; considering the knowledge of all auxiliary domains gives l_1×l_2×…×l_Z combinations. Select the combination that minimizes the following expression, i.e. examine the different combinations of item clusters across all auxiliary domains and choose the one that best predicts the target scores, thereby finding the auxiliary-domain cluster [V_z]_i of the target-domain item:

3)-b-8) set the element in row i, column j_z of V_z to 1 and the rest of that row to 0;

3)-b-9) repeat steps 3)-b-7) and 3)-b-8) for each column i of R_T;
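The item side, steps 3)-b-7) to 3)-b-9), mirrors the user search with the roles of rows and columns swapped. The sketch again assumes a W-masked squared-error objective; the function name `assign_items` is hypothetical, and the returned tuples give, per domain z, the column j_z set to 1 in the item's row of V_z.

```python
import itertools
import numpy as np

def assign_items(R_T, W, alphas, Us, Ps):
    """For each target item, return the tuple (j_1, ..., j_Z) of item-cluster
    indices minimising the masked squared error of that item's column."""
    # Column j of alpha_z * U_z @ P_z is what item cluster j of domain z predicts for every target user.
    profiles = [a * (U @ P) for a, U, P in zip(alphas, Us, Ps)]
    # All l_1 x l_2 x ... x l_Z combinations of item clusters.
    combos = list(itertools.product(*(range(P.shape[1]) for P in Ps)))
    choices = []
    for v in range(R_T.shape[1]):
        errors = []
        for combo in combos:
            pred = sum(prof[:, j] for prof, j in zip(profiles, combo))
            errors.append(float((W[:, v] * (R_T[:, v] - pred) ** 2).sum()))
        choices.append(combos[int(np.argmin(errors))])
    return choices
```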

3)-b-10) update the vector of migration-degree parameters α_z as follows:

where W is the indicator matrix of R_T;
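The sketch below assumes that step 3)-b-10) refits α = (α_1, …, α_Z) by unconstrained least squares on the observed entries of R_T while U_z and V_z are held fixed, which minimizes a W-masked squared error for the current cluster assignments. Both the least-squares form and the absence of constraints on α are assumptions; the function name is hypothetical.

```python
import numpy as np

def update_alphas(R_T, W, Us, Ps, Vs):
    """Least-squares refit of the migration weights on the observed entries."""
    mask = W.astype(bool)
    # One design column per auxiliary domain: its prediction U_z P_z V_z^T on observed cells.
    X = np.column_stack([(U @ P @ V.T)[mask] for U, P, V in zip(Us, Ps, Vs)])
    y = R_T[mask]
    alphas, *_ = np.linalg.lstsq(X, y, rcond=None)
    return alphas
```

A domain whose transferred predictions fit the target poorly receives a small weight, which is how the weighting can damp negative transfer.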

3)-b-11) repeat steps 3)-b-4) to 3)-b-10) T times, then end;

3)-c reconstruct the user-item scoring matrix of the target e-commerce site to obtain the reconstructed matrix;

The reconstructed user-item scoring matrix of the target e-commerce site in step 3)-c is given by:

where W is the indicator matrix of R_T and 1 is the matrix whose elements are all 1.
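The reconstruction can be sketched assuming the common form W ∘ R_T + (1 − W) ∘ Σ_z α_z U_z P_z V_z^T, which matches the symbols W and 1 defined above: observed ratings are kept and only the missing entries are filled with the transferred estimate. The form and the function name are assumptions.

```python
import numpy as np

def reconstruct(R_T, W, alphas, Us, Ps, Vs):
    """Assumed reconstruction: keep observed ratings, fill the rest from the auxiliary domains."""
    approx = sum(a * (U @ P @ V.T) for a, U, P, V in zip(alphas, Us, Ps, Vs))
    return W * R_T + (np.ones_like(W) - W) * approx
```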

3)-d determine the number N of commodities to recommend according to the specific requirements, find the N highest-scoring commodities for user u_i in the reconstructed matrix, and recommend them;
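Step 3)-d can be sketched as a sort over one row of the reconstructed matrix. Excluding items the user has already rated (via the optional mask `W`) is a common practice assumed here, not something the patent states; the function name is hypothetical.

```python
import numpy as np

def top_n(R_hat, user, N, W=None):
    """Indices of the N highest-scoring items for `user` in the reconstructed matrix."""
    scores = R_hat[user].astype(float).copy()
    if W is not None:
        # Optional assumption: skip items the user has already rated.
        scores[W[user].astype(bool)] = -np.inf
    order = np.argsort(-scores, kind="stable")
    return [int(j) for j in order if np.isfinite(scores[j])][:N]
```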

3)-e end.

Compared with the prior art, the invention has the following beneficial effects:

By providing a multi-e-commerce cross recommendation method based on cluster feature migration, the invention offers a new solution to the data sparsity, cold start, and diversity-versus-accuracy problems of traditional e-commerce recommendation systems.

Drawings

Fig. 1 is a flowchart of the multi-e-commerce cross recommendation method based on cluster feature migration.

Fig. 2 is a diagram of the conversion into a user-item scoring matrix.

Fig. 3 is a flowchart of extracting the user/item feature matrices with the ALS algorithm.

Fig. 4 is a flowchart of obtaining user/item clusters with the K-means algorithm.

Fig. 5 is a flowchart of the target e-commerce site migrating the cluster features of the auxiliary e-commerce sites to assist matrix decomposition.

Detailed Description

The invention is further illustrated below with reference to the drawings and specific embodiments. These examples are given solely for illustration and do not limit the scope of the invention; after reading the invention, those skilled in the art may make various equivalent modifications, which fall within the scope defined by the appended claims.

Fig. 1 is a flowchart of the multi-e-commerce cross recommendation method based on cluster feature migration according to an embodiment of the invention. The specific steps are as follows:

Step 0 is the start state;

In the scoring matrix construction stage (steps 1-3), step 1 collects the historical user behavior data of several e-commerce sites;

Step 2 removes duplicate and missing data from the historical user behavior data and deletes the data of users with very few behavior records;

Step 3 uses all behavior data that reflect users' purchasing interest, as preprocessed in step 2, to construct the user-item scoring matrix of each e-commerce website;

In the auxiliary domain learning stage (steps 4-8), step 4 obtains the user-item scoring matrix R_z of each auxiliary e-commerce site, z∈{1,2,…,Z};

Step 5 applies the ALS algorithm in each auxiliary domain to extract from R_z a D-dimensional user feature matrix M_z and item feature matrix N_z;

Step 6 applies the K-means algorithm in each auxiliary domain to cluster the user feature matrix M_z and the item feature matrix N_z, obtaining k_z user clusters and l_z item clusters;

Step 7 computes, in each auxiliary domain, the average score p_kl of each user cluster on each item cluster: the ratings given by the users in the k-th user cluster to the items in the l-th item cluster are summed, and the sum is divided by the product of the number of users in the k-th user cluster and the number of items in the l-th item cluster;

The average score of each user cluster on each item cluster is computed as follows:

$$p_{kl}=\frac{\sum_{u\in C^{U}_{k}}\sum_{v\in C^{V}_{l}} r_{u,v}}{\left|C^{U}_{k}\right|\cdot\left|C^{V}_{l}\right|}$$

where C^U_k is the k-th user cluster and C^V_l is the l-th item cluster.

Step 8 constructs the cluster feature matrix P_z of each auxiliary e-commerce site, whose elements are the average scores p_kl obtained in step 7;

In the target domain learning stage (steps 9-12), step 9 obtains the user-item scoring matrix R_T of the target e-commerce site;

Step 10 migrates the cluster feature matrices P_z to the target domain to help the target e-commerce site complete the matrix decomposition, obtaining the parameters U_z, V_z and α_z;

Step 11 reconstructs the target-domain matrix from the parameters U_z, V_z and α_z obtained in step 10, where R_T is the user-item scoring matrix of the target e-commerce site and W is the indicator matrix of R_T;

Step 12 determines the number N of commodities to recommend according to the specific requirements, finds the N highest-scoring commodities for user u_i in the reconstructed matrix, and recommends them;

Step 13 is the end state.

Fig. 2 details step 3 of Fig. 1. In e-commerce, users interact with goods in various ways, such as browsing, clicking, adding to the shopping cart, and purchasing. These are implicit behavior data that represent user preferences well. Taking them together, the data are converted into a user-item scoring matrix according to the degree of preference for the goods that each behavior conveys.

Fig. 3 details step 5 of Fig. 1.

Step 14 is the start state;

Step 15 randomly initializes the matrix N_z with values in (0,1);

Step 16 fixes the matrix N_z and updates the matrix M_z row by row according to the following formula:

where N_ui is the matrix of feature vectors of the items rated by the i-th user, n_ui is the number of ratings given by the i-th user, I is the D×D identity matrix, λ is the step length (regularization coefficient), T is the number of iterations, i is the row index of matrix M_z, and M_i. is the feature vector of the i-th user, i.e. the i-th row of M_z;

Step 17 fixes the matrix M_z and updates the matrix N_z row by row according to the following formula:

Step 18 checks whether T iterations have been performed; if not, return to step 16, and if so, go to step 19;

Step 19 is the end state.

Fig. 4 details the K-means algorithm in step 6.

Step 20 is the start state;

Step 21 determines the number of clusters K: when clustering the user feature matrix of the z-th auxiliary domain, K = k_z; when clustering the item feature matrix of the z-th auxiliary domain, K = l_z;

Step 22 randomly selects K data points as the initial cluster centers;

Step 23 assigns each data point (row) to its nearest cluster according to the Euclidean distance:

$$\mathrm{dis}(a,b)=\sqrt{\sum_{d=1}^{D}\left(X_{a,d}-X_{b,d}\right)^{2}}$$

where X_a,d is the value of data a on the d-th attribute and X_b,d is the value of data b on the d-th attribute;

Step 24 recomputes the center of each cluster from the data points assigned to it;

Step 25 checks whether T iterations have been performed; if not, return to step 23, and if so, go to step 26;

Step 26 is the end state.

Fig. 5 details step 10 of Fig. 1.

Step 27 is the start state;

Step 28 randomly initializes the Z matrices V_z so that exactly one element in each row is 1 and the rest are 0;

Step 29 initializes the Z parameters α_z that characterize the degree of migration;

Step 30 finds which user cluster j_z of the z-th auxiliary domain user u_i belongs to: examine the different combinations of user clusters across all source domains, k_1×k_2×…×k_Z in total, and choose the combination that best predicts the target scores to find the auxiliary-domain cluster [U_z]_i of the target user, i.e. select the combination that minimizes the following expression:

where R_T is the user-item scoring matrix of the target e-commerce site;

Step 31 sets the element in row i, column j_z of U_z to 1 and the rest of that row to 0; steps 30 and 31 are repeated for each user u_i in the target domain;

Step 32 finds which item cluster j_z of the z-th auxiliary domain item v_i belongs to: examine the different combinations of item clusters across all auxiliary domains, l_1×l_2×…×l_Z in total, and choose the combination that best predicts the target scores to find the auxiliary-domain cluster [V_z]_i of the target item, i.e. select the combination that minimizes the following expression:

Step 33 sets the element in row i, column j_z of V_z to 1 and the rest of that row to 0; steps 32 and 33 are repeated for each item v_i in the target domain;

Step 34 updates the vector of migration-degree parameters as follows:

where W is the indicator matrix of R_T;

Step 35 checks whether T iterations have been performed; if not, return to step 30, and if so, go to step 36;

Step 36 is the end state.

The method adopts transfer learning: cluster features are extracted from several auxiliary domains and transferred, with different weights, to the target domain as knowledge that helps the target e-commerce site reconstruct its user-item scoring matrix and complete the final recommendation. Introducing parameters that characterize the degree of migration reduces the negative transfer caused by harmful information. Experiments on real e-commerce website data indicate that the method can effectively address the data sparsity, cold start, and diversity-versus-accuracy problems of traditional e-commerce recommendation systems and improve recommendation performance.

In conclusion, the multi-e-commerce cross recommendation method based on cluster feature migration uses transfer learning to offer a new solution to the data sparsity, cold start, and diversity-versus-accuracy dilemma of e-commerce recommendation systems.

The above describes only preferred embodiments of the invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principles of the invention, and these are also regarded as falling within the scope of protection of the invention.

Claims

1. A multi-e-commerce cross recommendation method based on cluster feature migration, characterized by comprising the following steps:

Step 1, scoring matrix construction stage: collect and preprocess the historical user behavior data of each e-commerce website and, using all behavior data that reflect users' purchasing interest, construct a user-item scoring matrix for each e-commerce website;

Step 2, auxiliary domain learning stage: obtain the user-item scoring matrix R_z of the z-th auxiliary e-commerce site, z∈{1,2,…,Z}; apply the ALS algorithm to R_z to extract a D-dimensional user feature matrix M_z and item feature matrix N_z; apply the K-means algorithm to cluster M_z and N_z respectively, obtaining k_z user clusters and l_z item clusters; compute the average score p_kl of each user cluster on each item cluster; and construct the cluster feature matrix P_z of the auxiliary e-commerce site, whose elements are p_kl;

Step 3, target domain learning stage: obtain the user-item scoring matrix R_T of the target e-commerce site; migrate the cluster feature matrices P_z to help complete the matrix decomposition of R_T, obtaining the parameters U_z, V_z and α_z; reconstruct the user-item scoring matrix of the target e-commerce site to obtain the reconstructed matrix; then determine the number N of commodities to recommend according to the specific requirements, find the N highest-scoring commodities for user u_i in the reconstructed matrix, and recommend them; the reconstructed user-item scoring matrix of the target e-commerce site is given by:

2. The multi-e-commerce cross recommendation method based on cluster feature migration according to claim 1, wherein the ALS algorithm in step 2 specifically comprises the following steps:

Step 2)-b-4) repeat steps 2)-b-2) and 2)-b-3) T times, then end.

3. The multi-e-commerce cross recommendation method based on cluster feature migration according to claim 4, wherein the K-means clustering algorithm in step 2 proceeds as follows:

where dis(a, b) is the Euclidean distance between data a and data b, X_a,d is the value of data a on the d-th attribute, and X_b,d is the value of data b on the d-th attribute;

Step 2)-c-3) recompute the center of each cluster;

Step 2)-c-4) repeat steps 2)-c-2) and 2)-c-3) T times, then end.

4. The multi-e-commerce cross recommendation method based on cluster feature migration according to claim 5, wherein the average score of each user cluster on each item cluster in step 2 is computed as follows:

5. The multi-e-commerce cross recommendation method based on cluster feature migration according to claim 6, wherein the matrix decomposition in step 3 proceeds as follows:

s.t. U_z1 = 1, V_z1 = 1

where U_z, V_z and α_z are the parameters to be solved in this objective function: U_z indicates which user cluster of the z-th source domain each target-domain user belongs to, V_z indicates which item cluster of the z-th source domain each target-domain item belongs to, and α_z is the parameter expressing the degree of migration from the z-th source domain; k_z is the number of user clusters and l_z the number of item clusters of the z-th auxiliary domain; W is the indicator matrix of R_T; 1 denotes the all-ones matrix; ∘ denotes element-wise multiplication of matrices; and the constraints U_z1 = 1 and V_z1 = 1 ensure that each user and each item belongs to exactly one cluster, i.e. each row contains exactly one 1 and the rest are 0;

3)-b-3) let

Wherein,

3)-b-5) set the element in row i, column j_z of U_z to 1 and the rest of that row to 0;

3)-b-6) repeat 3)-b-4) and 3)-b-5) for each row i of R_T;

3)-b-7) each item v_i may belong to any of the l_z item clusters of the z-th auxiliary domain; considering the knowledge of all auxiliary domains gives l_1×l_2×…×l_Z combinations. Select the combination that minimizes the following expression, i.e. examine the different combinations of item clusters across all auxiliary domains and choose the one that best predicts the target scores, thereby finding the auxiliary-domain cluster [V_z]_i of the target-domain item:

3)-b-8) set the element in row i, column j_z of V_z to 1 and the rest of that row to 0;

3)-b-9) repeat steps 3)-b-7) and 3)-b-8) for each column i of R_T;

3)-b-10) update the vector of migration-degree parameters α_z as follows:

where W is the indicator matrix of R_T;

3)-b-11) repeat steps 3)-b-4) to 3)-b-10) T times, then end.

6. The multi-e-commerce cross recommendation method based on cluster feature migration according to claim 1, wherein the preprocessing in step 1 comprises data cleaning and denoising: data cleaning removes duplicate and missing data, and denoising deletes the data of users with very few behavior records.

7. The multi-e-commerce cross recommendation method based on cluster feature migration according to claim 1, wherein constructing the user-item scoring matrix in step 1 means replacing user names and item names with the row and column indices of the matrix and converting the behavior data into specific numerical values; the behavior data are the clicking, browsing, favoriting, and purchasing records that reflect users' purchasing interest.