CN104268290B - A recommendation method based on user clustering

Info

Publication number: CN104268290B
Application number: CN201410565721.7A
Authority: CN
Inventors: 李鹏; 王娅丹; 金瑜; 刘璟; 刘欣
Original assignee: Wuhan University of Science and Engineering WUSE
Current assignee: Wuhan University of Science and Engineering WUSE
Priority date: 2014-10-22
Filing date: 2014-10-22
Publication date: 2017-08-08
Anticipated expiration: 2034-10-22
Also published as: CN104268290A

Abstract

The present invention proposes a recommendation method based on user clustering, so that recommendations can be made to users reasonably and effectively according to their interests. For each theme label, an interest degree is obtained from the user's total browsing frequency, individual browsing times and total browsing time, effective browsing frequency, and effective browsing time, and these interest degrees form the user's interest feature vector. Core users are screened according to the users' interest feature vectors to form a core user set, and all users are clustered with the K-means clustering algorithm. After all users have been clustered, the class interest vector of each user cluster over the theme labels is calculated. Each user's interest values are compared with the class interest vector to make recommendations. The CCVR method provided by the invention recommends better than the other recommendation methods compared and has good accuracy.

Description

A recommendation method based on user clustering

Technical field

The present invention relates to the technical field of Internet information, and in particular to a recommendation method based on user clustering.

Background art

As Internet use has spread, social networks have gradually replaced traditional channels for obtaining information, such as newspapers, magazines, and television news, and have become the way most people first receive information. Examples include Facebook and Twitter abroad and Weibo and Renren in China. People express themselves by posting messages and statuses, and spread information obtained from others by forwarding and sharing other people's messages and statuses. This raises the question of node influence: information published by a node that everyone follows can be seen by everyone, and a node that follows everyone can see the information everyone publishes. A person's energy is limited, however, and no one can search out and manually follow all the content or nodes that might interest them. Internet information service providers therefore need to study how to effectively recommend to users the content or nodes they are likely to be interested in.

The concept of strong and weak relationships proposed by Yu Hong et al. describes the forms that following takes in social networks. Sites such as Renren and QQ Zone build social networks through mutual following (strong relationships); sites such as Weibo let users build their own networks through one-way following (weak relationships). For recommending mutual-follow relationships in a strong-relationship social network, methods based on real social information such as mutual friends, contacts, and address books usually achieve good results. But precisely because strong relationships tend to be built on real social relationships, they are considerably more limited than weak relationships: a user who cannot establish a relationship with a node cannot see what that node publishes, which seems unreasonable. Some people like to publish information and become message publishers in the network, clearly publishing more than they subscribe to; others like to receive information and, as subscribers, receive more than they publish. Building this imbalance on strong relationships is unreasonable, so social networks based on weak relationships emerged, in which everyone takes what they need.

References:
Yu Hong, Yang Xian. Measurement of node influence and propagation path patterns in microblogs [J]. Journal on Communications, 2012, 33(Z1): 96-97;
Chen J, Geyer W, Dugan C, Muller M, Guy I. Make new friends, but keep the old: Recommending people on social networking sites // Proceedings of the 27th International Conference on Human Factors in Computing Systems. New York, NY, USA, 2009: 201-210;
Chen Kehan, Han Panpan, Wu Jian. Recommendation algorithm for heterogeneous social networks based on user clustering [J]. Chinese Journal of Computers, 2013, 36(2): 350-351;
Mislove Alan, Marcon Massimiliano, Gummadi Krishna P, Druschel Peter, Bhattacharjee Bobby. Measurement and analysis of online social networks // Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement. San Diego, CA, USA, 2007: 29-42;
Liu Meilian, Liu Tongcun, Li Xiaolong. Research on a recommendation algorithm based on user interest feature extraction [J]. Application Research of Computers, 2011, 28(5): 1665-1666.

Scholars have studied this kind of recommendation problem extensively. The collaborative filtering recommendation algorithm was first proposed by Goldberg et al., but that system did not fully consider user needs and has certain defects. To address this, GroupLens first proposed an automated collaborative filtering recommender system based on user ratings. Collaborative filtering is the most widely used recommendation algorithm, but because it was proposed early it has many defects. Content-based recommendation algorithms appeared later and provide recommendations by comparing items with user profiles. Recommendation algorithms based on association rules provide recommendations mainly from an association rule model and the user's current purchasing behavior.

Existing research results show that it is necessary to study and provide a reasonable way of making recommendations.

References:
LI Yu, LI Xue-feng. A hybrid collaborative filtering method for multiple-interests and multiple-content recommendation in e-commerce [J]. Expert Systems with Applications, 2005, 28(1): 67-77;
HUANG Cheng-lung, HUANG Wei-liang. Handling sequential pattern decay: developing a two-stage collaborative recommender system [J]. Electronic Commerce Research and Application, 2008, 8(3): 117-129;
LUIS M, JUAN M, JUAN F. A collaborative recommender system based on probabilistic inference from fuzzy observations [J]. Fuzzy Sets and Systems, 2008, 159(12): 1554-1576;
HUANG Zan, ZENG D, CHEN H C. A comparison of collaborative-filtering recommendation algorithms for e-commerce [J]. IEEE Intelligent Systems, 2007, 22(5): 68-78;
LIU Duen-ren, SHIH Y Y. Hybrid approaches to product recommendation based on customer lifetime value and purchase preferences [J]. Journal of Systems and Software, 2005, 77(2): 181-191;
MATEVZ K, TOMAZ P, et al. Optimisation of combined collaborative recommender systems [J]. AEU of Electronics and Communications, 2007, 61(7): 433-443.

Summary of the invention

In view of the research described above, the present invention provides a recommendation method based on user clustering.

To achieve this purpose, the technical solution adopted by the invention is a recommendation method based on user clustering, comprising the following steps:

Step1, input the user set U = {u₁, u₂, …, u_α} and the theme label set C = {s₁, s₂, …, s_β}, where α is the number of users and β is the number of theme labels in C; initialize the index i of the user currently being processed to 1, and go to Step2;

Step2, initialize the index j of the label currently being processed to 1, and go to Step3;

Step3, if user u_i has followed theme label s_j, go to Step4; otherwise set the user's interest degree in the j-th theme label to d_j = 0, and go to Step9;

Step4, from the number of times n that user u_i has browsed theme label s_j, determine the total browsing frequency f = n of user u_i for theme label s_j, and go to Step5;

Step5, determine the browsing time t_j,k of the k-th browse of theme label s_j by user u_i, for k = 1, 2, …, n, and the total browsing time T, and go to Step6;

Step6, determine the effective browsing frequency e_f of user u_i for theme label s_j, and go to Step7;

The determination is as follows: if t_min ≤ t_j,k ≤ t_max, where t_min and t_max are preset thresholds for the minimum and maximum browsing time of user u_i on a label, then the k-th browse of the j-th theme label by user u_i is effective; the number of effective browses among the n browses of the j-th theme label by user u_i is the effective browsing frequency of user u_i for the j-th theme label;

Step7, sum the browsing times of the e_f effective browses to obtain the effective browsing time e_t of user u_i for theme label s_j, and go to Step8;

Step8, calculate the interest degree d_j of user u_i in theme label s_j according to the following formula, and go to Step9;

where F1 is the sum of the user's browsing frequencies over all theme labels; ps is a preset system parameter, the interest-time coefficient; and the remaining two quantities are the user's average browsing time for the j-th theme label and the user's average effective browsing time for the j-th theme label;

Step9, let c_b denote the set of labels in the theme label set C that user u_i has not browsed, and c_a the set of labels that user u_i has browsed; calculate V_i,j according to the following formula and set j = j+1; if j is less than or equal to β go to Step3, otherwise go to Step10;

Step10, set i = i+1; if i is less than or equal to α, go to Step2; otherwise set i = 1, initialize the number of core users γ to 0, and go to Step11;

Step11, obtain the interest density value density(u_i) as the proportion of nonzero elements in the interest vector v_i(V_i,1, V_i,2, …, V_i,β) of user u_i; if density(u_i) > λ, mark u_i as a core user and go to Step12; otherwise go to Step13; here λ is a preset density threshold;

Step12, set γ = γ+1, and go to Step13;

Step13, set i = i+1; if i is less than or equal to α, go to Step11; otherwise go to Step14;

Step14, γ core users have now been obtained; begin clustering all users with the K-means algorithm, taking the γ core users as the initial cluster centers; initialize the variables newJ = 0 and oldJ = -1, and go to Step15;

Step15, calculate fabs(newJ-oldJ), where fabs denotes the absolute-value function; if fabs(newJ-oldJ) is greater than or equal to the preset threshold for the absolute value, go to Step16, otherwise go to Step19;

Step16, for each remaining user in the user set U = {u₁, u₂, …, u_α} other than the users serving as cluster centers, calculate the Euclidean distance between that user and each user serving as a cluster center, assign the user to the cluster of the nearest cluster center, and go to Step17;

Step17, calculate the mean of the interest vectors of all users in each user cluster R_h and take it as the new cluster center Z_h of user cluster R_h, and go to Step18;

Step18, set oldJ = newJ, calculate the new criterion function value according to the criterion function and assign it to newJ, and go to Step15;

Step19, γ user clusters R₁, R₂, …, R_γ have now been obtained; go to Step20;

Step20, initialize the index h of the class currently being processed to 1, and go to Step21;

Step21, calculate the class interest vector Rv_h = (RV_h1, RV_h2, ..., RV_hβ) of the class according to the following formula, and go to Step22;

where |R_h| is the number of users in user cluster R_h, which is also denoted w; the index over the users of R_h takes the values 1, 2, …, w; each indexed term is the interest degree of that user of R_h in the j-th theme label; RV_hj is the interest degree of user cluster R_h in the j-th theme label; and j takes the values 1, 2, …, β;

Step22, set h = h+1; if h is less than or equal to γ, go to Step21, otherwise go to Step23;

Step23, the class interest vectors Rv₁, Rv₂, …, Rv_γ of the γ classes have now been obtained; set h = 1, and go to Step24;

Step24, recommend theme labels to each user in user cluster R_h: let a user in R_h be user u_i of the user set U = {u₁, u₂, …, u_α}; compare the interest vector v_i(V_i,1, V_i,2, …, V_i,β) of user u_i element by element with the interest values RV_hj of the class interest vector Rv_h = (RV_h1, RV_h2, ..., RV_hβ) of user cluster R_h; if V_i,j is greater than or equal to RV_hj, recommend theme label s_j to the user; go to Step25;

Step25, set h = h+1; if h is less than or equal to γ, go to Step24, otherwise go to Step26;

Step26, automatic recommendation has been completed for every user in the user set U = {u₁, u₂, …, u_α}; end.

Moreover, in Step18, the criterion function is calculated as follows,

where w is the number of users in user cluster R_h; each term is the squared deviation between two feature vectors, namely the interest vector of a user in user cluster R_h and Z_h, the cluster center of the corresponding class.

The invention has the following characteristics:

1) Interest quantification. Data that the relevant system can collect very easily, such as the user's click frequency on labels, number of browses, and dwell time, are integrated and quantified to obtain each user's interest degree in each theme label; for an individual user, this yields that user's interest vector.

2) Recommendation mechanism. The concept of a class is introduced first, because analyzing every user individually in order to recommend is unreasonable: first, the workload is huge; second, individual differences are too pronounced. The data of a single user is highly idiosyncratic, so recommendations derived from per-user analysis would not be very effective. The solution proposed by the invention is to cluster the users first. Core users are more representative, so they are taken as cluster centers to obtain the user clusters, which brings together users with the same interests. Each cluster is thus a group of users with the same interests, that is, a community into which the users are divided. Then, from the interest vectors of the users in a class, the interest degree of the whole cluster in the theme label set is calculated, giving the class interest vector, which is compared with the users in the class to produce recommendations.

Therefore, the invention achieves automatic recommendation without spending the system resources of the online community on individual analysis and without manual participation; its accuracy is high, its effect is good, and its practical value is high.

Brief description of the drawings

Fig. 1 is the flow chart of the embodiment of the present invention.

Fig. 2 is a schematic comparison of the user interest difference index of the CCVR (Core user for Clustering interesting Vector for Recommend) method realized by the invention and the RK-Means (Random K-Means) method on different data sets;

Fig. 3 is a schematic comparison of the recommendation accuracy on different data sets of the CCVR (Core user for Clustering interesting Vector for Recommend) method realized by the invention, the RCVR (Random user for Clustering interesting Vector for Recommend Algorithm) method, the CCRR (Core user for Clustering Random for Recommend Algorithm) method, and the RCRR (Random user for Clustering Random for Recommend Algorithm) method.

Embodiment

The present invention provides a recommendation method based on user clustering (CCVR). It mainly addresses how to make recommendations, by predicting which labels a user is interested in so that recommendations are effective.

First, making recommendations requires choosing the attribute on which to base them: relationships, friends, or interests. Since the aim is to recommend content that interests the user, the invention chooses the interest attribute. How to obtain the user's interests is therefore the first problem to solve.

Second, once the user's interests have been obtained, the mechanism by which to recommend is the second problem the invention solves. After the interests are acquired, each user's interest is quantified into specific values that reflect the user's interest in each label; in other words, the content labels the user is interested in are obtained quantitatively. It remains to determine which other content labels, not yet followed, the user may also be interested in, so that they can be recommended.

The invention provides a solution to the above problems. The technical solution of the invention is further described below with reference to the drawings and an embodiment.

The invention studies recommendation methods and proposes a recommendation method based on user clustering, whose implementation comprises three parts. The detailed implementation of the embodiment is as follows:

First, interest features are extracted to form the interest matrix. The user's interest feature vector is obtained through the following definitions:

Definition 1 (browsing frequency): the number of times the user has browsed the j-th theme label is denoted n; n is the user's total browsing frequency for the j-th theme label, denoted f, i.e. f = n.

Definition 2 (browsing time): the browsing time of the user's k-th browse of the j-th theme label is denoted t_j,k, with k = 1, 2, …, n; the total browsing time over the n browses of label j is denoted T.

Definition 3 (effective browsing frequency): if t_min ≤ t_j,k ≤ t_max, where t_min and t_max are thresholds for the user's minimum and maximum browsing time on a label, then the user's k-th browse of the j-th theme label is effective; the number of effective browses among the user's n browses of the j-th theme label is the user's effective browsing frequency for the j-th theme label, denoted e_f. In specific implementation, those skilled in the art may preset the values of t_min and t_max themselves.

Definition 4 (effective browsing time): the sum of the browsing times of the e_f effective browses is called the user's effective browsing time for the j-th theme label, denoted e_t.
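Definitions 1 to 4 can be illustrated with a minimal Python sketch. The function name `browse_stats`, the use of seconds as the time unit, and the sample thresholds are assumptions made for this illustration and do not come from the patent.

```python
from typing import List, Tuple


def browse_stats(times: List[float], t_min: float, t_max: float) -> Tuple[int, float, int, float]:
    """Return (f, T, e_f, e_t) for one user's browses of one theme label.

    `times` holds the browsing time t_j,k of each of the n browses.
    """
    f = len(times)                 # Definition 1: total browsing frequency f = n
    total_time = sum(times)        # Definition 2: total browsing time T
    # Definition 3: a browse is effective when t_min <= t_j,k <= t_max
    effective = [t for t in times if t_min <= t <= t_max]
    e_f = len(effective)           # effective browsing frequency
    e_t = sum(effective)           # Definition 4: effective browsing time
    return f, total_time, e_f, e_t


# Hypothetical data: four browses, thresholds of 5 s and 600 s.
stats = browse_stats([2.0, 15.0, 40.0, 900.0], t_min=5.0, t_max=600.0)
print(stats)
```

In this example the 2-second and 900-second browses fall outside the thresholds, so only two of the four browses count as effective.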

Definition 5 (interest degree): the degree to which the user is interested in the j-th theme label is denoted d_j, where

F1 is the sum of the user's browsing frequencies over all labels; ps is a system parameter, the interest-time coefficient; and the remaining two quantities are the user's average browsing time for the j-th theme label and the user's average effective browsing time for the j-th theme label. In specific implementation, the value of the parameter ps may be preset by the user and is generally set to an empirical value.

Let c_b denote the set of labels in the theme label set C that user u_i has not browsed, and c_a the set of labels that user u_i has browsed. The interest degree of user u_i in any theme label in c_b is 0. In summary, the interest degree of user u_i in any theme label s_j of the theme label set C is:

$$V_{i,j} = \begin{cases} d_j, & s_j \in c_a \\ 0, & s_j \in c_b \end{cases} \qquad (2)$$

On this basis an α × β interest degree matrix is built, where α is the number of users and β is the number of theme labels in the theme label set C. Row i represents the i-th user, with i = 1, 2, …, α, and column j represents the j-th theme label, with j = 1, 2, …, β. The interest vector of user u_i is v_i(V_i,1, V_i,2, …, V_i,β) and the j-th theme label is denoted s_j, so any entry V_i,j of the interest matrix represents the interest level of user u_i in theme label s_j. Calculating all the values yields the user interest matrix, expressed as follows:

        s₁       s₂       …    s_j      …    s_β
u₁      V_1,1    V_1,2    …    V_1,j    …    V_1,β
u₂      V_2,1    V_2,2    …    V_2,j    …    V_2,β
…       …        …        …    …        …    …
u_i     V_i,1    V_i,2    …    V_i,j    …    V_i,β
…       …        …        …    …        …    …
u_α     V_α,1    V_α,2    …    V_α,j    …    V_α,β
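The rule that a browsed label carries its interest degree d_j while an unbrowsed label carries 0 can be sketched as follows. The interest degrees are taken as given inputs, since the sketch does not attempt to compute d_j itself; the function name `interest_matrix`, the dictionary layout, and the 0-based label indices are assumptions for illustration.

```python
from typing import Dict, List


def interest_matrix(interest: List[Dict[int, float]], beta: int) -> List[List[float]]:
    """Build the alpha x beta interest matrix.

    interest[i] maps the 0-based index j of each label browsed by user i
    (the set c_a) to that user's interest degree d_j. Labels missing from
    the mapping (the set c_b) get the value 0.
    """
    return [[row.get(j, 0.0) for j in range(beta)] for row in interest]


# Hypothetical input: two users, three theme labels.
V = interest_matrix([{0: 0.6, 2: 0.1}, {1: 0.9}], beta=3)
for v_i in V:
    print(v_i)
```

Each row of `V` is one user's interest vector v_i, which is the form the later screening and clustering steps operate on.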

Second, core users are screened out to form the core user set; then, with the core users as center points, all users are clustered using the K-means algorithm.

(1) To cluster all users, core users must first be screened out. For each user u_i, the interest density value density(u_i) is defined as the proportion of nonzero elements in the interest vector v_i. A user u_i whose density(u_i) is greater than the density threshold λ (an empirical value that those skilled in the art may set themselves, generally 10%) is defined as a core user. The embodiment thus obtains the core user set by screening:

CoreUser = { u_i | u_i ∈ U, density(u_i) > λ }    (3)

where U is the user set; the interest matrix formed by the interest vectors of the core users is a dense submatrix m′.
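The screening rule in (3) can be sketched in a few lines of Python. The names `density` and `core_users` and the sample matrix are assumptions for illustration; the default threshold of 0.10 mirrors the 10% value mentioned in the text.

```python
from typing import List


def density(v: List[float]) -> float:
    """Proportion of nonzero elements in an interest vector."""
    if not v:
        return 0.0
    return sum(1 for x in v if x != 0) / len(v)


def core_users(V: List[List[float]], lam: float = 0.10) -> List[int]:
    """Indices of users whose interest density is strictly greater than lam."""
    return [i for i, v in enumerate(V) if density(v) > lam]


# Hypothetical interest matrix: densities are 0.5, 0.0 and 1.0.
V = [
    [0.6, 0.0, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.2, 0.3, 0.4, 0.5],
]
print(core_users(V, lam=0.5))
```

Because the comparison is strict, a user whose density equals the threshold exactly is not marked as a core user.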

(2) Next, the K-means algorithm is used to cluster all users with the core users as center points. Starting from the core user set obtained in (1), the first pass takes the interest vector of each core user as a feature vector, calculates the Euclidean distance between every non-core user and every core user, finds the core user each non-core user is nearest to, and assigns the non-core user to that core user; traversing all users in this way yields a preliminary clustering. The second iteration calculates the center point of each cluster, takes these center points as the feature vectors, and traverses all users again to obtain a new clustering. The second iteration is repeated until the algorithm converges, and the clustering obtained is the final clustering.
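The clustering procedure can be sketched as a K-means loop whose initial centers are the core users' interest vectors. This is a sketch under stated assumptions, not the patented implementation: the criterion value is assumed to be the usual K-means sum of squared deviations from the cluster centers, the `max_iter` guard is added here for safety, and the names `cluster_users` and `sq_dist` are hypothetical. Squared Euclidean distance is used for the nearest-center search because it gives the same ordering as Euclidean distance.

```python
from typing import List, Tuple

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two interest vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_users(V: List[Vector], core: List[int],
                  eps: float = 1e-5, max_iter: int = 100) -> Tuple[List[int], List[Vector]]:
    """Cluster all users, seeding the centers with the core users' vectors.

    Returns (assign, centers), where assign[i] is the cluster index of user i.
    """
    centers = [list(V[i]) for i in core]   # initial centers: the core users
    new_j, old_j = 0.0, -1.0               # criterion values, as in the text
    assign = [0] * len(V)
    for _ in range(max_iter):              # max_iter is an added safety guard
        if abs(new_j - old_j) < eps:       # loop only while the change >= eps
            break
        # Assign every user to the nearest current center.
        assign = [min(range(len(centers)), key=lambda h: sq_dist(v, centers[h]))
                  for v in V]
        # Recompute each center as the mean of its members' vectors.
        for h in range(len(centers)):
            members = [V[i] for i, a in enumerate(assign) if a == h]
            if members:
                centers[h] = [sum(col) / len(members) for col in zip(*members)]
        old_j = new_j
        # Assumed criterion: sum of squared deviations from the centers.
        new_j = sum(sq_dist(v, centers[a]) for v, a in zip(V, assign))
    return assign, centers


# Hypothetical data: users 0 and 2 are the core users.
V = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
assign, centers = cluster_users(V, core=[0, 2])
print(assign)
```

A core user sits at distance zero from its own initial center, so assigning all users in the first pass gives the same result as assigning only the non-core users, provided no two core users have identical vectors.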

Third, after all users have been clustered, the class interest vector of user cluster R_h over the theme label set C can be calculated:

Rv_h=(RV_h1,RV_h2,...,RV_hβ)

where

|R_h| is the number of users in user cluster R_h, which is also denoted w; the index over the users of R_h takes the values 1, 2, …, w; each indexed term is the interest degree of that user of R_h in the j-th theme label; and RV_hj is the interest degree of user cluster R_h in the j-th theme label.

The class interest vector obtained in this way represents the interest level of the whole cluster. The interest vector of each user in each cluster R_h is compared with the class interest vector Rv_h; if an interest value in the user's interest vector is greater than or equal to the corresponding interest value RV_hj in the class interest vector of the cluster, the theme label s_j corresponding to that interest value is recommended to the user.
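The comparison rule can be sketched as follows. The class interest value RV_hj is assumed here to be the mean over the cluster's users of their interest degrees in label j, which is how the surrounding description of |R_h| and the user index reads; the function names and sample values are assumptions for illustration.

```python
from typing import List

Vector = List[float]


def class_interest_vector(members: List[Vector]) -> Vector:
    """Assumed form: per-label mean of the members' interest degrees."""
    return [sum(col) / len(members) for col in zip(*members)]


def recommend(v_i: Vector, rv_h: Vector) -> List[int]:
    """0-based indices j of the labels with V_i,j >= RV_hj."""
    return [j for j, (x, r) in enumerate(zip(v_i, rv_h)) if x >= r]


# Hypothetical cluster with two users and three theme labels.
cluster = [[0.5, 0.0, 0.25], [0.25, 0.5, 0.25]]
rv = class_interest_vector(cluster)
picks = [recommend(v_i, rv) for v_i in cluster]
print(rv, picks)
```

As the rule is stated, a label is recommended whenever the user's value reaches the class value, so a label on which both the user and the class score 0 would also qualify; an implementation may want to decide explicitly how to treat that case.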

Based on the above design, in specific implementation those skilled in the art can use computer software technology to run the clustering flow automatically.

As shown in Fig. 1, the flow of the embodiment comprises the following steps:

Step1: input the user set U = {u₁, u₂, …, u_α} and the theme label set C = {s₁, s₂, …, s_β}; initialize the index i of the user currently being processed to 1; go to Step2;

Step2: initialize the index j of the label currently being processed to 1, and go to Step3;

Step3: if user u_i has followed label s_j, go to Step4; otherwise set d_j = 0, and go to Step9;

Step4: according to Definition 1, determine the total browsing frequency f of user u_i for label s_j, and go to Step5;

Step5: according to Definition 2, determine the browsing time t_j,k of the k-th browse of label s_j by user u_i (k = 1, 2, …, n) and the total browsing time T, and go to Step6;

Step6: according to Definition 3, determine the effective browsing frequency e_f of user u_i for label s_j, and go to Step7;

That is, if t_min ≤ t_j,k ≤ t_max, where t_min and t_max are thresholds for the user's minimum and maximum browsing time on a label, then the k-th browse of the j-th theme label by user u_i is effective; the number of effective browses among the n browses of the j-th theme label by user u_i is the effective browsing frequency of user u_i for the j-th theme label;

Step7: sum the browsing times of the e_f effective browses to obtain the effective browsing time e_t of user u_i for label s_j, and go to Step8;

Step8: according to Definition 5, determine the interest degree d_j of user u_i in label s_j, and go to Step9;

Step9: calculate V_i,j according to formula (2) and set j = j+1; if j is less than or equal to β go to Step3, otherwise go to Step10;

Step10: set i = i+1; if i is less than or equal to α, go to Step2; otherwise set i = 1, initialize the number of core users γ to 0, and go to Step11;

Step11: obtain the interest density value from the interest vector v_i(V_i,1, V_i,2, …, V_i,β) of user u_i; if the interest density value density(u_i) of the current u_i is greater than λ, mark u_i as a core user and go to Step12; otherwise go to Step13;

Step12: set γ = γ+1, and go to Step13;

Step13: set i = i+1; if i is less than or equal to α, go to Step11; otherwise go to Step14;

Step14: γ core users have now been obtained; begin clustering all users with the K-means algorithm, taking the γ core users as the initial cluster centers; initialize the variables newJ = 0 and oldJ = -1, and go to Step15;

Step15: calculate fabs(newJ-oldJ), where fabs is the C mathematical function that computes the absolute value. If fabs(newJ-oldJ) >= 1e-5, go to Step16, otherwise go to Step19;

Here fabs(newJ-oldJ) >= 1e-5 means that the absolute value of newJ-oldJ is greater than or equal to 0.00001; it serves as the loop control condition, and in specific implementation those skilled in the art may preset the threshold for the absolute value themselves;

Step16: for each remaining user in the user set U = {u₁, u₂, …, u_α} other than the users serving as cluster centers, calculate the Euclidean distance between that user and each user serving as a cluster center, find the nearest cluster center, assign the user to that cluster, and go to Step17;

Step17: calculate the mean of each current user cluster R_h, that is, the mean of the interest vectors of all users in each class; this mean is the new cluster center Z_h of the class; go to Step18;

Step18: set oldJ = newJ, calculate the new criterion function value and assign it to newJ (in the criterion function, w is the number of users in user cluster R_h, and each term is the squared deviation between two feature vectors, namely the interest vector of a user in user cluster R_h and Z_h, the cluster center of the corresponding class), and go to Step15;

Step19: once the clustering of the preceding steps is complete, γ classes R₁, R₂, …, R_γ have been obtained; go to Step20;

Step20: initialize the index h of the class currently being processed to 1, and go to Step21;

Step21: calculate the class interest vector Rv_h = (RV_h1, RV_h2, ..., RV_hβ) of the class according to formula (4), and go to Step22;

Step22: set h = h+1; if h is less than or equal to γ, go to Step21, otherwise go to Step23;

Step23: the preceding steps have now produced the class interest vectors Rv₁, Rv₂, …, Rv_γ of the γ classes; set h = 1, and go to Step24;

Step24: recommend theme labels to each user in user cluster R_h: let any user in R_h be user u_i of the user set U = {u₁, u₂, …, u_α}; compare the interest vector v_i(V_i,1, V_i,2, …, V_i,β) of user u_i with the class interest vector Rv_h = (RV_h1, RV_h2, ..., RV_hβ) of class R_h, item by item for j from 1 to β; if V_i,j is greater than or equal to RV_hj, recommend theme label s_j to the user; go to Step25;

Step25: set h = h+1; if h is less than or equal to γ, go to Step24, otherwise go to Step26;

Step26: recommendation has been completed for every user in the user set U = {u₁, u₂, …, u_α}; end.

To make the technical effect of the invention easier to understand and to show that the recommendation method works well, related experiments were carried out, testing two aspects.

On the one hand, the user clustering effect of the two methods CCVR and RK-Means is compared using the index Di.

CCVR: Core user for Clustering interesting Vector for Recommend, the recommendation method proposed by the invention;

RK-Means: the Random K-Means algorithm, which takes random users as center points and then clusters with the k-means algorithm;

To compare the clustering quality of the two algorithms, the concept of user interest difference degree is introduced.

The interest distance between two clusters Rv_h1 and Rv_h2 is measured with the cosine distance:

The Di of the set of clusters R = {R₁, R₂, …, R_γ} of all users over the theme label set C is then:

The larger Di is, the more the interests of the classes differ from one another, which makes the interest features of each cluster more distinct and increases the accuracy of interest prediction.
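The pairwise interest distance can be illustrated with a short sketch. The cosine distance is assumed here to be one minus the cosine similarity of the two class interest vectors, so that larger values mean more dissimilar interests; the way Di combines the pairwise distances is not sketched. The function name and the zero-vector convention are assumptions for illustration.

```python
import math
from typing import List


def cosine_distance(a: List[float], b: List[float]) -> float:
    """Assumed form: 1 - cosine similarity of two class interest vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # convention chosen here for an all-zero vector
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical class interest vectors.
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))   # disjoint interests
print(cosine_distance([1.0, 2.0], [2.0, 4.0]))   # same direction, close to 0
```

Under this assumption, two clusters whose interests point in the same direction have a distance near 0 regardless of magnitude, while clusters with no shared labels have a distance of 1.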

The experimental results for Di are shown in Fig. 2, where the abscissa is the data set (number of users × number of theme labels) and the ordinate is Di. The figure shows intuitively that the interest features of the clusters produced by the CCVR algorithm are more distinct, which makes the class interest vectors more representative and the recommendations more accurate.

On the other hand, the approaches are compared in terms of recommendation accuracy, where accuracy is the ratio of the number of recommendation hits to the total number of recommendations.
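The accuracy measure can be written as a one-line ratio. In this sketch a "hit" is assumed to be a recommended label that appears in a reference set of labels the user is actually interested in; how that reference set is obtained is not specified here, and the function name and sample sets are hypothetical.

```python
from typing import Set


def accuracy(recommended: Set[int], relevant: Set[int]) -> float:
    """Hits divided by total recommendations; 0.0 if nothing was recommended."""
    if not recommended:
        return 0.0
    return len(recommended & relevant) / len(recommended)


# Hypothetical label indices: 2 of the 4 recommendations are hits.
print(accuracy({1, 2, 3, 4}, {2, 4, 9}))
```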

RCVR (Random user for Clustering interesting Vector for Recommend Algorithm) clusters with K-means seeded by random users and recommends by class interest vector.

CCRR (Core user for Clustering Random for Recommend Algorithm) clusters with K-means seeded by core users and recommends at random.

RCRR (Random user for Clustering Random for Recommend Algorithm) clusters with K-means seeded by random users and recommends at random.

The experimental results of the accuracy comparison are shown in Fig. 3, where the abscissa is the data set (number of users × number of theme labels) and the ordinate is the accuracy. The figure shows intuitively that the accuracy of the CCVR recommendation method proposed by the invention is better than that of the other approaches.

The specific embodiment described herein merely illustrates the spirit of the invention. Those skilled in the art to which the invention belongs may make various modifications or additions to the described embodiment, or substitute similar approaches, without departing from the spirit of the invention or exceeding the scope defined by the appended claims.

Claims

1. A recommendation method based on user clustering, characterized by comprising the following steps:

Step1, input the user set U = {u₁, u₂, …, u_α} and the theme label set C = {s₁, s₂, …, s_β}, where α is the number of users and β is the number of theme labels in C; initialize the index i of the user currently being processed to 1, and go to Step2;

Step3, if user u_i has followed theme label s_j, go to Step4; otherwise set the user's interest degree in the j-th theme label to d_j = 0, and go to Step9;

Step4, from the number of times n that user u_i has browsed theme label s_j, determine the total browsing frequency f = n of user u_i for theme label s_j, and go to Step5;

Step5, determine the browsing time t_j,k of the k-th browse of theme label s_j by user u_i, for k = 1, 2, …, n, and the total browsing time T, and go to Step6;

The determination is as follows: if t_min ≤ t_j,k ≤ t_max, where t_min and t_max are preset thresholds for the minimum and maximum browsing time of user u_i on a label, then the k-th browse of the j-th theme label by user u_i is effective; the number of effective browses among the n browses of the j-th theme label by user u_i is the effective browsing frequency of user u_i for the j-th theme label;

Step7, sum the browsing times of the e_f effective browses to obtain the effective browsing time e_t of user u_i for theme label s_j, and go to Step8;

where F1 is the sum of the user's browsing frequencies over all theme labels; ps is a preset system parameter, the interest-time coefficient; and the remaining two quantities are the user's average browsing time for the j-th theme label and the user's average effective browsing time for the j-th theme label;

Step9, let c_b denote the set of labels in the theme label set C that user u_i has not browsed, and c_a the set of labels that user u_i has browsed; calculate V_i,j according to the following formula, where V_i,j represents the interest level of user u_i in theme label s_j and the interest vector of user u_i is v_i(V_i,1, V_i,2, …, V_i,β); set j = j+1; if j is less than or equal to β go to Step3, otherwise go to Step10;

Step10, set i = i+1; if i is less than or equal to α, go to Step2; otherwise set i = 1, initialize the number of core users γ to 0, and go to Step11;

Step11, obtain the interest density value density(u_i) as the proportion of nonzero elements in the interest vector v_i(V_i,1, V_i,2, …, V_i,β) of user u_i; if density(u_i) > λ, mark u_i as a core user and go to Step12; otherwise go to Step13; here λ is a preset density threshold;

Step12, set γ = γ+1, and go to Step13;

Step14, γ core users have now been obtained; begin clustering all users with the K-means algorithm, taking the γ core users as the initial cluster centers; initialize the variables newJ = 0 and oldJ = -1, and go to Step15;

Step15, calculate fabs(newJ-oldJ), where fabs denotes the absolute-value function; if fabs(newJ-oldJ) is greater than or equal to the preset threshold for the absolute value, go to Step16, otherwise go to Step19;

Step16, for each remaining user in the user set U = {u₁, u₂, …, u_α} other than the users serving as cluster centers, calculate the Euclidean distance between that user and each user serving as a cluster center, assign the user to the cluster of the nearest cluster center, and go to Step17;

Step17, calculate the mean of the interest vectors of all users in each user cluster R_h and take it as the new cluster center Z_h of user cluster R_h, and go to Step18;

where |R_h| is the number of users in user cluster R_h, which is also denoted w; the index over the users of R_h takes the values 1, 2, …, w; each indexed term is the interest degree of that user of R_h in the j-th theme label; RV_hj is the interest degree of user cluster R_h in the j-th theme label; and j takes the values 1, 2, …, β;

Step24, recommend theme labels to each user in user cluster R_h: let a user in R_h be user u_i of the user set U = {u₁, u₂, …, u_α}; compare the interest vector v_i(V_i,1, V_i,2, …, V_i,β) of user u_i element by element with the interest values RV_hj of the class interest vector Rv_h = (RV_h1, RV_h2, ..., RV_hβ) of user cluster R_h; if V_i,j is greater than or equal to RV_hj, recommend theme label s_j to the user; go to Step25;

2. The recommendation method based on user clustering according to claim 1, characterized in that: in Step18, the criterion function is calculated as follows,

where w is the number of users in user cluster R_h; each term is the squared deviation between two feature vectors, namely the interest vector of a user in user cluster R_h and Z_h, the cluster center of the corresponding class.