CN107092924A

CN107092924A - A personalized recommendation algorithm based on a clustering algorithm with an adaptively growing number of clusters

Info

Publication number: CN107092924A
Application number: CN201710177780.0A
Authority: CN
Inventors: 杨波; 袁磊
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2017-03-23
Filing date: 2017-03-23
Publication date: 2017-08-25

Abstract

The invention discloses a personalized recommendation algorithm based on a clustering algorithm whose number of clusters grows adaptively. The algorithm comprises two stages: cluster-based incremental learning and recommendation. The cluster-based incremental learning stage has three parts: 1) clustering with the MWOSK-means algorithm provided by the invention; 2) adaptive growth of the number of clusters using the MGSoC algorithm provided by the invention; 3) incremental updating. The recommendation stage builds on the result of the previous stage and performs personalized recommendation with a collaborative filtering algorithm that incorporates user weights. Compared with existing recommendation algorithms that cluster first and then apply collaborative filtering, the algorithm provided by the invention offers higher accuracy, determines the number of clusters adaptively, and is suitable for incremental learning.

Description

A personalized recommendation algorithm based on a clustering algorithm with an adaptively growing number of clusters

Technical field

The present invention relates to personalized recommendation in data mining, and in particular to cluster-based personalized recommendation.

Background art

Personalized recommendation recommends information and goods of interest to a user according to the user's interest profile and purchasing behavior. Collaborative filtering is a commonly used algorithm in personalized recommendation. Clustering before collaborative filtering helps with problems such as a large search space, insufficient accuracy, and sensitivity to sparse data.

Clustering is the process of grouping highly similar objects together. In personalized recommendation, a clustering technique can first be used to group similar objects, and the recommendation algorithm then uses the information in the resulting clusters. However, most current personalized recommendation algorithms that cluster first and then apply collaborative filtering support only offline learning, and cannot handle incremental learning scenarios in which user, item, and rating information is updated frequently.

Some cluster-then-collaborative-filtering personalized recommendation algorithms that handle incremental learning have been proposed. One shortcoming of these algorithms is that the number of clusters must be specified manually in advance at the clustering stage. The recommendation results are often sensitive to that number, so a great deal of time must be spent on experiments to determine the optimal number of clusters. Another shortcoming is that their accuracy is not high enough.

Summary of the invention

To address the shortcomings of existing cluster-then-collaborative-filtering personalized recommendation algorithms, the invention provides a personalized recommendation algorithm based on a clustering algorithm whose number of clusters grows adaptively. The algorithm comprises two stages, cluster-based incremental learning and recommendation. The cluster-based incremental learning stage uses the MWOSK-means (Modified Weighted Online Spherical K-means) algorithm and the MGSoC (Modified Growing Self-Organizing Cluster) algorithm provided by the invention. The MWOSK-means algorithm makes full use of item information to assist the computation of user weights, which improves the accuracy of personalized recommendation. The MGSoC algorithm grows the number of clusters adaptively, which to some extent solves the prior-art problem that the number of clusters must be specified manually in advance and that determining the optimal number takes a great deal of time.

The personalized recommendation algorithm provided by the invention is suitable for incremental learning scenarios in which information (such as users, items, and ratings) is updated frequently. Compared with existing personalized recommendation algorithms, it achieves higher accuracy and reduces the time needed to determine the optimal number of clusters.

The present invention includes the following:

1. A personalized recommendation algorithm based on a clustering algorithm whose number of clusters grows adaptively

The algorithm comprises two stages, cluster-based incremental learning and recommendation; see Fig. 1.

2. A clustering algorithm with an adaptively growing number of clusters, based on the MWOSK-means and MGSoC algorithms

The incremental learning stage of the personalized recommendation algorithm uses a clustering algorithm provided by the invention whose number of clusters grows adaptively, based on the MWOSK-means and MGSoC algorithms (see P1 in Fig. 1). It has three parts: clustering with the MWOSK-means algorithm (see S2 in Fig. 1 and Fig. 3), adaptive growth of the number of clusters using the MGSoC algorithm (see P1.1 in Fig. 1 and Figs. 4 and 5), and incremental updating (see S6 in Fig. 1 and Figs. 6 to 9).

3. A new method for computing item weights and user weights

The prior art does not consider the influence of item weights when computing user weights. The invention provides the MWOSK-means algorithm, whose initialization stage (see S1.5 in Fig. 2) uses a new item weight computation method provided by the invention; see formulas (1) and (2). Based on this method, the invention provides a new user weight computation method that takes item weights into account; see formula (3).

4. A new method for judging whether the number of clusters is suitable during clustering

In the part of item 2 above that realizes the adaptive growth of the number of clusters with the MGSoC algorithm (see P1.1 in Fig. 1 and Figs. 4 and 5), the invention provides, within the MGSoC algorithm, a new method for judging whether the number of clusters is suitable during clustering (S3 and S4 in Fig. 1; for S3 see Fig. 4).

5. A new method for computing the initial position of a newly added cluster center

In the same part of item 2 above, if the judgment shows that the number of clusters during clustering is not suitable, the MGSoC algorithm uses a new method provided by the invention to compute the initial position of the new cluster center (S5 in Fig. 1; see Fig. 5).

Brief description of the drawings

Fig. 1 is a flow chart of the personalized recommendation algorithm provided by the invention, based on a clustering algorithm whose number of clusters grows adaptively.

Fig. 2 is the flow chart of S1 in Fig. 1.

Fig. 3 is the flow chart of S2 in Fig. 1.

Fig. 4 is the flow chart of S3 in Fig. 1.

Fig. 5 is the flow chart of S5 in Fig. 1.

Fig. 6 is the flow chart of S6 in Fig. 1.

Fig. 7 is the flow chart of S6.1 in Fig. 6.

Fig. 8 is the flow chart of S6.2 in Fig. 6.

Fig. 9 is the flow chart of S6.3 in Fig. 6.

Symbols used in the present invention:

σ(u_i): the standard deviation of u_i

exp(·): the exponential function with base e

η: the learning rate of the cluster centers during clustering

α: the convergence threshold used to judge whether the cluster centers have converged

β: the error threshold used to judge whether the current number of clusters is suitable

1: the all-ones vector, whose dimension matches that of the operand it is multiplied with

^T: matrix transpose

n: the number of users

p: the number of items

d: the length of the interval after linear mapping

l: the lower bound of the interval after linear mapping

A: the set of users to receive recommendations

s(u): vector normalization. Example: for an n-dimensional vector u = (u₁, u₂, ..., u_n), its norm is $\|u\| = \sqrt{u_1^2 + u_2^2 + \dots + u_n^2}$, and then $s(u) = u / \|u\|$.
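As a small worked example of the normalization s(u), the following sketch scales a vector to unit Euclidean length. The function name `normalize` and the choice to return a zero vector unchanged are illustrative assumptions, not part of the patent text.

```python
import math

def normalize(u):
    """Return s(u) = u / ||u|| for a list of numbers.

    Assumption: a zero vector has no direction, so it is returned unchanged.
    """
    norm = math.sqrt(sum(x * x for x in u))
    if norm == 0.0:
        return list(u)
    return [x / norm for x in u]

# The vector (3, 4) has norm 5, so its normalized form is (0.6, 0.8).
unit = normalize([3.0, 4.0])
```

After normalization every non-zero vector has norm 1, so the dot product of two normalized vectors equals their cosine similarity.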

Detailed description

The personalized recommendation algorithm disclosed by the invention, based on a clustering algorithm whose number of clusters grows adaptively, comprises two stages: cluster-based incremental learning and recommendation.

The overall flow chart of the personalized recommendation algorithm is shown in Fig. 1.

Embodiments of the invention are described in detail below with reference to the accompanying drawings.

1. Initialization

This part corresponds to S1 in Fig. 1; the detailed flow chart is shown in Fig. 2.

S1: Initialization

S1.1: Initialize the parameter set

1) Assign the number of users n and the number of items p according to the actual data;

2) The user of the algorithm specifies the initial number of clusters K, the learning rate η, the convergence threshold α, the error threshold β, the length d of the interval after linear mapping, the lower bound l of the interval after linear mapping, and the set A of users to receive recommendations.

S1.2: Initialize the matrix M indicating whether a user has rated an item

An n-row, p-column matrix M = (m_ij) indicates whether the i-th user has rated the j-th item, where i ∈ {1,2,3,...,n} and j ∈ {1,2,3,...,p}. m_ij = 1 means that user i has rated item j; m_ij = 0 means that user i has not rated item j.
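A minimal sketch of how the indicator matrix M could be built from raw rating records. The input layout (a list of `(user, item, score)` triples with 0-based indices) and the function name `build_indicator_matrix` are assumptions made for illustration.

```python
def build_indicator_matrix(ratings, n, p):
    """Build the n x p matrix M with M[i][j] = 1 if user i rated item j.

    ratings: iterable of (user_index, item_index, score) triples,
             0-based indices (an assumed input layout).
    """
    M = [[0] * p for _ in range(n)]
    for i, j, _score in ratings:
        M[i][j] = 1  # only the existence of a rating matters here
    return M

example_ratings = [(0, 0, 5.0), (0, 2, 3.0), (1, 1, 4.0)]
M = build_indicator_matrix(example_ratings, n=2, p=3)
```

Keeping M separate from the rating values lets later steps distinguish "not rated" from a genuinely low rating.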

S1.3: Initialize the user-item rating matrix U

1) An n-row, p-column matrix U = (u_ij) represents the user-item rating matrix, where i ∈ {1,2,3,...,n} and j ∈ {1,2,3,...,p}. The element in row i, column j is the rating given by the i-th user to the j-th item. u_i = (u_i1, u_i2, ..., u_ij, ..., u_ip) denotes the ratings of the i-th user for all p items;

2) Normalize U row by row, i.e. apply s(u_i) for i ∈ {1,2,3,...,n};

S1.4: Initialize the user-cluster membership matrix Z

1) Construct an n-row, K-column matrix Z = (z_ik) representing the cluster membership of each user, where i ∈ {1,2,3,...,n} and k ∈ {1,2,3,...,K}. z_ik ∈ {0,1}; z_ik = 1 means that user i belongs to cluster k, and z_ik = 0 means that user i does not belong to cluster k;

2) Randomly assign each of the n users to one of the K clusters, so that every user i belongs to exactly one cluster, i.e. $\sum_{k=1}^{K} z_{ik} = 1$;
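The random initialization of Z in S1.4 can be sketched as follows. The function name `init_membership` and the optional `seed` parameter are illustrative assumptions; the only property taken from the text is that each user ends up in exactly one of the K clusters.

```python
import random

def init_membership(n, K, seed=None):
    """Return an n x K 0/1 matrix Z with exactly one 1 per row."""
    rng = random.Random(seed)  # seed is only for reproducible examples
    Z = [[0] * K for _ in range(n)]
    for i in range(n):
        Z[i][rng.randrange(K)] = 1  # pick one cluster uniformly at random
    return Z

Z = init_membership(n=6, K=3, seed=0)
```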

S1.5: Initialize the MWOSK-means algorithm

1) Compute the item weight vector w_item, which contains the weights of the p items. The weight of the j-th item (j ∈ {1,2,3,...,p}) is computed as follows:

m_,j denotes the j-th column of the matrix M.

2) Linearly map w_item to a suitable interval according to d, l, and the following formula:

The formula uses the largest and smallest elements of w_item, and relates the weight of item j before the linear mapping to the weight of item j after the linear mapping, j ∈ {1,2,3,...,p}. d is the length of the interval after linear mapping, and l is the lower bound of that interval.

3) Compute the user weight vector w_user, which contains the weights of the n users. The weight of the i-th user (i ∈ {1,2,3,...,n}) is computed as follows:

where σ(u_i) is the standard deviation of u_i.

4) Construct K p-dimensional vectors μ_k (k ∈ {1,2,3,...,K}) according to the following formula to represent the position of the center of each cluster:

where j ∈ {1,2,3,...,p}.
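The linear mapping in step 2) of S1.5 can be illustrated with ordinary min-max scaling. The sketch below assumes that the mapping sends the smallest weight to l and the largest weight to l + d, which is consistent with d being the interval length and l its lower bound. The function name `linear_map` and the handling of the case where all weights are equal are illustrative choices, not taken from the patent.

```python
def linear_map(weights, d, l):
    """Map weights linearly onto the interval [l, l + d] (assumed min-max form)."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        # Assumption: with no spread, every weight maps to the lower bound.
        return [l for _ in weights]
    return [(w - lo) / (hi - lo) * d + l for w in weights]

# Weights 2, 4, 6 mapped onto [0.5, 1.5]
mapped = linear_map([2.0, 4.0, 6.0], d=1.0, l=0.5)
```

Choosing l greater than zero keeps every item weight strictly positive, so no item is ignored entirely in later weighted computations.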

2. Clustering

This part corresponds to S2 in Fig. 1; the detailed flow chart is shown in Fig. 3.

S2: Clustering with the MWOSK-means algorithm

S2.1: Train the model by traversing all users in U

For the i-th user (i ∈ {1,2,3,...,n}):

1) Recompute the cluster to which user i belongs:

where k ∈ {1,2,3,...,K}, and closest_k_i denotes the new cluster to which user i is assigned by the recomputation, closest_k_i ∈ {1,2,3,...,K}.

2) Update the center of the cluster to which user i now belongs:

The formula relates the cluster center after the update to the cluster center before the update.

S2.2: Compute the degree of convergence h

The degree of convergence h is computed according to the following formula:

μ'_k denotes the k-th cluster center after S2.1 (the traversal of all users in U), and μ_k denotes the k-th cluster center before S2.1, k ∈ {1,2,3,...,K}.
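To make the shape of S2 concrete, here is a sketch of one pass in the style of a generic weighted online spherical k-means. It is not the patent's MWOSK-means: the assignment rule (largest dot product with the center), the update rule (move the center toward the user vector by η times the user weight, then renormalize), and the definition of h as the summed Euclidean displacement of the centers are all assumptions, and every function name is hypothetical.

```python
import math

def normalize(v):
    """Scale v to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def clustering_pass(U, centers, eta, user_weights):
    """One traversal of all users (S2.1) plus a convergence measure (S2.2).

    Assumed rules, not the patent's formulas:
      - a user joins the cluster whose center has the largest dot product;
      - that center moves toward the user by eta * weight, then is renormalized;
      - h is the summed Euclidean displacement of all centers.
    """
    before = [list(c) for c in centers]
    after = [list(c) for c in centers]
    assignments = []
    for u, w in zip(U, user_weights):
        k = max(range(len(after)), key=lambda c: dot(u, after[c]))
        assignments.append(k)
        after[k] = normalize([m + eta * w * x for m, x in zip(after[k], u)])
    h = sum(
        math.sqrt(sum((a - b) ** 2 for a, b in zip(new, old)))
        for new, old in zip(after, before)
    )
    return after, assignments, h

U = [normalize([1.0, 0.1]), normalize([0.1, 1.0])]
centers, assignments, h = clustering_pass(
    U, [[1.0, 0.0], [0.0, 1.0]], eta=0.1, user_weights=[1.0, 1.0]
)
```

In this sketch the centers are updated immediately after each user is processed, which is what makes the procedure "online" rather than batch.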

3. Adaptive growth of the number of clusters

This part corresponds to S3, S4, and S5 in Fig. 1; the detailed flow charts of S3 and S5 are shown in Figs. 4 and 5.

S3: MGSoC algorithm, stage one: compute the degree of error e_m

S3.1: Construct the error vector e with K elements

For each cluster k (k ∈ {1,2,3,...,K}), compute e_k with the following formula:

S3.2: Normalize e

That is, apply s(e).

S3.3: Compute the degree of error e_m

Compute the degree of error e_m with the following formula:

where exp(·) is the exponential function with base e.

S4: Judge whether the number of clusters is suitable

Compare the degree of error e_m obtained above with β. If e_m is less than β, the current number of clusters is suitable; if e_m is greater than β, the current number of clusters is not suitable.

S5: MGSoC algorithm, stage two: compute the new cluster center μ_new

S5.1: Compute the position μ_new of the new cluster center

Compute the position μ_new of the new cluster center with the following formula:

S5.2: Normalize the center vector μ_new of the new cluster

That is, apply s(μ_new).

S5.3: Clear the error vector e
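The interplay of S2 to S5 can be sketched as a control loop. The exact flow is defined by Fig. 1, which is not reproduced here, so the ordering below is an assumption: cluster until the centers converge, compute the degree of error, stop if it is below β, otherwise add one center and cluster again. The three callables are hypothetical stand-ins for the patent's formulas, and treating h below α as "converged" and e_m equal to β as "not suitable" are also assumptions.

```python
def adaptive_cluster_growth(centers, run_pass, error_degree, propose_center,
                            alpha, beta, max_rounds=1000):
    """Assumed control flow for S2-S5.

    run_pass(centers)       -> (new_centers, h)   stand-in for S2
    error_degree(centers)   -> e_m                stand-in for S3
    propose_center(centers) -> new center vector  stand-in for S5
    """
    for _ in range(max_rounds):
        centers, h = run_pass(centers)
        if h >= alpha:
            continue                      # not converged yet, cluster again
        e_m = error_degree(centers)
        if e_m < beta:
            return centers                # S4: number of clusters is suitable
        centers = centers + [propose_center(centers)]  # S5: grow by one cluster
    return centers

# Toy stand-ins: clustering converges at once, error shrinks as 1 / K.
result = adaptive_cluster_growth(
    centers=[[0.0]],
    run_pass=lambda c: (c, 0.0),
    error_degree=lambda c: 1.0 / len(c),
    propose_center=lambda c: [0.0],
    alpha=0.01,
    beta=0.3,
)
```

With these toy stand-ins the loop grows from one cluster to four, because 1/4 is the first error value below β = 0.3.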

4. Incremental learning

S6: Incremental learning

The personalized recommendation algorithm proposed in the invention supports the following four incremental learning situations: 1) an existing user i rates an item j for the first time; 2) an existing user i updates the rating of an item j; 3) a new user appears; 4) a new item appears. A newly appearing item has not yet been rated by any user, so it does not affect the model. Once a new item is rated by a user, the situation is equivalent to situation 1).

Details are as follows:

S6.1: An existing user i rates item j for the first time

Corresponds to Fig. 7.

1) Update the norm of user i:

||u'_i|| denotes the norm of user i after the update, and ||u_i|| denotes the norm of user i before the update.

2) Update the cosine similarity between user i and each cluster:

where k ∈ {1,2,3,...,K}.

3) Update the weight of item j:

The formula relates the weight of item j after the update to the weight of item j before the update.

4) Update the weight of user i:

The formula relates the weight of user i after the update to the weight of user i before the update.

5) Recompute the cluster to which user i belongs:

6) Update the cluster center according to formula (7).
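Steps 1) and 2) of S6.1 can be carried out incrementally, without recomputing over all p items. The sketch below uses the standard identities for a vector that gains one new non-zero component. It assumes that the norm and dot products are tracked for the user's raw (unnormalized) rating vector and that the cluster centers have unit length; the function name and argument layout are hypothetical, and the patent's own update formulas may differ.

```python
import math

def add_first_rating(norm, dots, centers, j, r):
    """Update cached quantities after a user rates item j for the first time.

    norm:    Euclidean norm of the user's raw rating vector before the rating
    dots:    dots[k] = dot product of that raw vector with center k
    centers: list of unit-length cluster centers (assumed)
    r:       the new rating for item j (the old entry was 0)
    """
    new_norm = math.sqrt(norm * norm + r * r)        # ||u'||^2 = ||u||^2 + r^2
    new_dots = [d_k + r * c[j] for d_k, c in zip(dots, centers)]
    cosines = [d_k / new_norm for d_k in new_dots]   # centers have norm 1
    return new_norm, new_dots, cosines

# A user with raw ratings (3, 0) rates item 1 with score 4.
new_norm, new_dots, cosines = add_first_rating(
    norm=3.0, dots=[3.0, 0.0], centers=[[1.0, 0.0], [0.0, 1.0]], j=1, r=4.0
)
```

Each update costs O(K) rather than O(K·p), which is the practical benefit of incremental learning when ratings arrive frequently.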

S6.2: An existing user i updates the rating of item j

Corresponds to Fig. 8.

1) Update the norm of user i:

2) Update the cosine similarity between user i and each cluster:

where k ∈ {1,2,3,...,K}.

3) Update the weight of user i:

4) Recompute the cluster to which user i belongs according to formulas (16) and (17);

5) Update the cluster center according to formula (7).

S6.3: A new user appears

Corresponds to Fig. 9.

1) Compute the weight of the new user according to formula (3);

2) Compute the cluster to which the new user belongs according to formulas (16) and (17);

3) Update the center of that cluster according to formula (7).

5. Recommendation

This part corresponds to S7 in Fig. 1.

S7: Collaborative filtering recommendation

Predict the rating of user a for item j:

where a ∈ A, k ∈ {1,2,3,...,K}, j ∈ {1,2,3,...,p}.
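As an illustration of cluster-based collaborative filtering with user weights, the sketch below predicts a rating as the weighted mean of the ratings given to item j by other users in the same cluster as user a. This prediction rule is an assumption chosen for simplicity; the patent's actual prediction formula is not reproduced in this text and may differ. All names are hypothetical.

```python
def predict_rating(a, j, ratings, M, cluster_of, user_weights):
    """Predict user a's rating of item j from same-cluster neighbours.

    ratings[i][j]: rating of item j by user i (meaningful only if M[i][j] == 1)
    cluster_of[i]: index of the cluster user i belongs to
    Returns None when no neighbour in the cluster has rated item j.
    """
    num = 0.0
    den = 0.0
    for i, row in enumerate(ratings):
        if i != a and cluster_of[i] == cluster_of[a] and M[i][j] == 1:
            num += user_weights[i] * row[j]
            den += user_weights[i]
    return num / den if den > 0 else None

ratings = [[5.0, 0.0], [4.0, 2.0], [3.0, 4.0], [1.0, 5.0]]
M = [[1, 0], [1, 1], [1, 1], [1, 1]]
cluster_of = [0, 0, 0, 1]
user_weights = [1.0, 1.0, 3.0, 1.0]

# Neighbours of user 0 in cluster 0 are users 1 and 2:
# (1.0 * 2.0 + 3.0 * 4.0) / (1.0 + 3.0) = 3.5
predicted = predict_rating(0, 1, ratings, M, cluster_of, user_weights)
```

Restricting the neighbourhood to one cluster is what shrinks the search space compared with collaborative filtering over all n users.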

Claims

1. A personalized recommendation algorithm based on a clustering algorithm whose number of clusters grows adaptively (see Fig. 1), characterized in that it comprises two stages: cluster-based incremental learning and recommendation.

2. A clustering algorithm with an adaptively growing number of clusters, based on the MWOSK-means and MGSoC algorithms (see P1 in Fig. 1), for the cluster-based incremental learning stage of claim 1, characterized in that clustering is performed with the MWOSK-means algorithm (see S2 in Fig. 1 and Fig. 3), and the adaptive growth of the number of clusters is realized with the MGSoC algorithm (see P1.1 in Fig. 1 and Figs. 4 and 5).

3. A method for computing item weights and user weights in the initialization stage (see S1.5 in Fig. 2) of the MWOSK-means algorithm of claim 2, characterized in that the item weights are computed as shown in formulas (1) and (2), and the user weights, which take item weights into account, are computed as shown in formula (3).

4. A method in the MGSoC algorithm of claim 2 for judging whether the number of clusters is suitable during clustering (S3 and S4 in Fig. 1; for S3 see Fig. 4), characterized in that the degree of error of each cluster is computed by formulas (9) and (10).

5. A method in the MGSoC algorithm of claim 2 for computing the initial position of a newly added cluster center (S5 in Fig. 1; see Fig. 5), characterized in that the initial position of the new cluster center is given by formula (11).