CN104268290B - A recommendation method based on user clustering - Google Patents

Info

Publication number
CN104268290B
Authority
CN
China
Prior art keywords
user
interest
clustering
theme label
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201410565721.7A
Other languages
Chinese (zh)
Other versions
CN104268290A (en)
Inventor
李鹏
王娅丹
金瑜
刘璟
刘欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University of Science and Engineering WUSE
Original Assignee
Wuhan University of Science and Engineering WUSE
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University of Science and Engineering WUSE
Priority to CN201410565721.7A
Publication of CN104268290A
Application granted
Publication of CN104268290B
Legal status: Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Abstract

The present invention proposes a recommendation method based on user clustering, which aims to make reasonable and effective recommendations to users according to their interests. For each theme label, an interest degree is obtained from the user's total browsing frequency, individual browsing times, total browsing time, effective browsing frequency, and effective browsing time, and these interest degrees form the user's interest feature vector. Core users are screened according to the interest feature vectors to form a core user set, and all users are clustered with the K-means clustering algorithm. Once all users have been clustered, the class interest vector of each user cluster over the theme labels is calculated. Each user's interest values are then compared with the class interest vector to make recommendations. The CCVR method provided by the invention gives better recommendation results than the other recommendation methods it was compared with, and has good accuracy.

Description

A recommendation method based on user clustering
Technical field
The present invention relates to the technical field of Internet information, and in particular to a recommendation method based on user clustering.
Background
With the growth in the number of Internet users, social networks have gradually replaced traditional information channels such as newspapers, magazines, and television news, and have become the way most people first receive information. Examples include Facebook and Twitter abroad, and Weibo and Renren in China. People express themselves by posting messages and status updates, and spread information obtained from others by forwarding and sharing other people's messages and statuses. This raises the question of node influence: a node followed by everyone has its posts seen by everyone, and a node that follows everyone sees everything that is posted. An individual's energy is limited, however, so users cannot find and manually follow all the content or nodes that might interest them. Internet information service providers therefore need to study how to effectively recommend content or nodes that users may be interested in.
The concept of strong and weak ties discussed by Yu Hong et al. describes the forms that following takes in social networks. Services such as Renren and QQ Zone build the social network through two-way following (strong ties); services such as Weibo let users build their own network through one-way following (weak ties). For recommending mutual-following relationships in a strong-tie social network, methods based on real social information such as common friends, contacts, and address books generally achieve good results. But precisely because strong ties are usually built on real social relationships, they are much more limited than weak ties: a user who cannot establish a relationship with a node cannot see the updates it posts, which is not very reasonable. Some people like to publish; these nodes act as message publishers in the network and publish far more than they subscribe to. Others prefer to receive information; as subscribers they receive more than they publish. Building such an imbalance on strong ties is unreasonable, so social networks based on weak ties emerged, in which everyone takes what they need.
References: Yu Hong, Yang Xian. Measurement of node influence and propagation path patterns in microblogs [J]. Journal on Communications, 2012, 33(Z1): 96~97; Chen J, Geyer W, Dugan C, Muller M, Guy I. Make new friends, but keep the old: Recommending people on social networking sites // Proceedings of the 27th International Conference on Human Factors in Computing Systems. New York, NY, USA, 2009: 201~210; Chen Kehan, Han Panpan, Wu Jian. A recommendation algorithm for heterogeneous social networks based on user clustering [J]. Chinese Journal of Computers, 2013, 36(2): 350~351; Mislove Alan, Marcon Massimiliano, Gummadi Krishna P, Druschel Peter, Bhattacharjee Bobby. Measurement and analysis of online social networks // Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement. San Diego, CA, USA, 2007: 29~42; Liu Meilian, Liu Tongcun, Li Xiaolong. Research on a recommendation algorithm based on user interest feature extraction [J]. Application Research of Computers, 2011, 28(5): 1665~1666.
Scholars have studied this kind of recommendation problem extensively. The collaborative filtering recommendation algorithm was first proposed by Goldberg et al., but that system did not take user needs fully into account and had certain defects. To address this, GroupLens first proposed an automatic collaborative filtering recommendation system based on user ratings. Collaborative filtering is the most widely used recommendation algorithm, but because it was proposed early it has many defects. Content-based recommendation algorithms appeared later; they provide recommendations by comparing items with user profiles. Recommendation algorithms based on association rules mainly provide recommendations according to an association rule model and the user's current purchasing behavior.
Existing research results show that it is necessary to study and provide a reasonable recommendation method.
References: LI Yu, LI Xue-feng. A hybrid collaborative filtering method for multiple-interests and multiple-content recommendation in e-commerce [J]. Expert Systems with Applications, 2005, 28(1): 67~77; HUANG Cheng-lung, HUANG Wei-liang. Handling sequential pattern decay: developing a two-stage collaborative recommender system [J]. Electronic Commerce Research and Applications, 2008, 8(3): 117~129; LUIS M, JUAN M, JUAN F. A collaborative recommender system based on probabilistic inference from fuzzy observations [J]. Fuzzy Sets and Systems, 2008, 159(12): 1554~1576; HUANG Zan, ZENG D, CHEN H C. A comparison of collaborative-filtering recommendation algorithms for e-commerce [J]. IEEE Intelligent Systems, 2007, 22(5): 68~78; LIU Duen-ren, SHIH Y Y. Hybrid approaches to product recommendation based on customer lifetime value and purchase preferences [J]. Journal of Systems and Software, 2005, 77(2): 181~191; MATEVZ K, TOMAZ P, et al. Optimisation of combined collaborative recommender systems [J]. AEU of Electronics and Communications, 2007, 61(7): 433~443.
Summary of the invention
In view of the research described above, the present invention provides a recommendation method based on user clustering.
To achieve this, the technical solution adopted by the invention is a recommendation method based on user clustering, comprising the following steps:
Step1, input the user set U = {u_1, u_2, …, u_α} and the theme label set C = {s_1, s_2, …, s_β}, where α is the number of users and β is the number of theme labels in C; initialize the index i of the user currently being processed to 1, and go to Step2;
Step2, initialize the index j of the label currently being processed to 1, and go to Step3;
Step3, if user u_i has followed theme label s_j, go to Step4; otherwise set the user's interest degree in the j-th theme label to d_j = 0, and go to Step9;
Step4, from the number of times n that user u_i has browsed theme label s_j, determine the total browsing frequency f = n of user u_i for theme label s_j, and go to Step5;
Step5, determine the time t_{j,k} of user u_i's k-th browse of theme label s_j, for k = 1, 2, …, n, and the total browsing time T, and go to Step6;
Step6, determine the effective browsing frequency e_f of user u_i for theme label s_j, and go to Step7;
The effective browsing frequency is determined as follows: if t_min ≤ t_{j,k} ≤ t_max, where t_min and t_max are preset thresholds for the minimum and maximum time user u_i spends browsing a label, then user u_i's k-th browse of the j-th theme label is effective; the number of effective browses among user u_i's n browses of the j-th theme label is the effective browsing frequency of user u_i for that label;
Step7, sum the browsing times of the e_f effective browses to obtain the effective browsing time e_t of user u_i for theme label s_j, and go to Step8;
Step8, calculate the interest degree d_j of user u_i in theme label s_j according to the interest-degree formula (the formula itself is not reproduced in this text), and go to Step9;
The formula uses a parameter whose defining expression is likewise not reproduced; F1 is the sum of the user's browsing frequencies over all theme labels; ps is a preset system parameter, the interest-period coefficient; the remaining two terms are the user's average browsing time and average effective browsing time for the j-th theme label;
Step9, let c_b denote the set of labels in the theme label set C that user u_i has not browsed and c_a the set of labels the user has browsed, and calculate V_{i,j} according to the following formula:
V_{i,j} = \begin{cases} d_j, & s_j \in c_a \\ 0, & s_j \in c_b \end{cases}
then set j = j + 1; if j is less than or equal to β go to Step3, otherwise go to Step10;
Step10, set i = i + 1; if i is less than or equal to α, go to Step2; otherwise set i = 1, initialize the number of core users γ to 0, and go to Step11;
Step11, obtain the interest density value density(u_i) as the proportion of nonzero elements in the interest vector v_i = (V_{i,1}, V_{i,2}, …, V_{i,β}) of user u_i; if density(u_i) > λ, mark u_i as a core user and go to Step12; otherwise go to Step13; here λ is a preset density threshold;
Step12, set γ = γ + 1, and go to Step13;
Step13, set i = i + 1; if i is less than or equal to α, go to Step11; otherwise go to Step14;
Step14, γ core users have now been obtained; begin clustering all users with the K-means algorithm, taking the γ core users as the initial cluster centers; initialize the variables newJ = 0 and oldJ = -1, and go to Step15;
Step15, calculate fabs(newJ - oldJ), where fabs denotes the absolute value; if fabs(newJ - oldJ) is greater than or equal to the preset threshold, go to Step16, otherwise go to Step19;
Step16, for each remaining user in U = {u_1, u_2, …, u_α} other than the users serving as cluster centers, calculate the Euclidean distance between that user and each cluster center, assign the user to the cluster of the nearest cluster center, and go to Step17;
Step17, for each user cluster R_h, calculate the mean of the interest vectors of all its users and take it as the new cluster center Z_h of R_h; go to Step18;
Step18, set oldJ = newJ, calculate the new value of the criterion function and assign it to newJ, and go to Step15;
Step19, γ user clusters R_1, R_2, …, R_γ have now been obtained; go to Step20;
Step20, initialize the index h of the cluster currently being processed to 1, and go to Step21;
Step21, calculate the class interest vector Rv_h = (RV_{h1}, RV_{h2}, …, RV_{hβ}) of the cluster according to the following formula, and go to Step22;
RV_{hj} = \frac{1}{|R_h|} \sum_{p=1}^{w} V_{u_p,j}
where |R_h| is the number of users in user cluster R_h, also written w; u_p, with p = 1, 2, …, w, denotes any user in R_h; V_{u_p,j} is the interest degree of user u_p in the j-th theme label; RV_{hj} is the interest degree of user cluster R_h in the j-th theme label; and j = 1, 2, …, β;
Step22, set h = h + 1; if h is less than or equal to γ, go to Step21, otherwise go to Step23;
Step23, the class interest vectors Rv_1, Rv_2, …, Rv_γ of the γ clusters have now been obtained; set h = 1, and go to Step24;
Step24, recommend theme labels to each user in user cluster R_h: if a user in R_h is user u_i of the user set U = {u_1, u_2, …, u_α}, compare each component V_{i,j} of the interest vector v_i = (V_{i,1}, V_{i,2}, …, V_{i,β}) with the corresponding interest value RV_{hj} of the class interest vector Rv_h = (RV_{h1}, RV_{h2}, …, RV_{hβ}); if V_{i,j} is greater than or equal to RV_{hj}, recommend theme label s_j to the user; go to Step25;
Step25, set h = h + 1; if h is less than or equal to γ, go to Step24, otherwise go to Step26;
Step26, automatic recommendation for every user in the user set U = {u_1, u_2, …, u_α} is complete; end.
Moreover, in Step18 the criterion function is calculated as follows:
J = \sum_{h=1}^{\gamma} \sum_{p=1}^{w} \| v_{u_p} - Z_h \|^2
where w is the number of users in user cluster R_h, \| v_{u_p} - Z_h \|^2 is the squared deviation between the two feature vectors, v_{u_p} is the interest vector of user u_p (p = 1, 2, …, w) in user cluster R_h, and Z_h is the cluster center of that cluster.
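The criterion function described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `criterion` and the list-of-clusters input layout are hypothetical and do not come from the patent text.

```python
import numpy as np


def criterion(clusters, centers):
    """Sum of squared deviations between each user's interest vector
    and the center of the cluster the user belongs to.

    clusters: list of clusters, each a list of interest vectors (assumed layout)
    centers:  list of cluster centers, one per cluster
    """
    total = 0.0
    for members, center in zip(clusters, centers):
        members = np.asarray(members, dtype=float)
        if members.size == 0:
            continue  # an empty cluster contributes nothing
        diff = members - np.asarray(center, dtype=float)
        total += float(np.sum(diff ** 2))
    return total


if __name__ == "__main__":
    clusters = [[[0, 0], [2, 0]], [[1, 1]]]
    centers = [[1, 0], [1, 1]]
    print(criterion(clusters, centers))
```

A smaller value means users sit closer to their cluster centers; the loop in Step15 to Step18 stops once this value changes by less than the preset threshold between two passes.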
The invention has the following features:
1) Interest quantification. Data that the system can collect very easily, such as how often a user clicks a label, the number of browses, and the dwell time, are integrated and quantified to obtain each user's interest degree in each theme label; for any one user, this yields the user's interest vector.
2) Recommendation mechanism. The concept of a class is introduced first. Analyzing every user individually in order to make recommendations is impractical, first because the workload is huge and second because individual differences are too pronounced. A single user's data is highly idiosyncratic, so recommendations based on per-user analysis would not work well. The solution put forward by the invention is to cluster users first. Core users are more representative, so they are used as cluster centers to obtain the user clusters, bringing together users who share the same interests. Each cluster is thus a group of users with common interests, that is, a community of users. The interest vectors of the users in a class are then used to calculate the whole cluster's interest degree in the theme set, giving the class interest vector, which is compared with each user in the class to produce recommendations.
Therefore, the invention enables automatic recommendation without wasting the resources of the online community system on analyzing each individual and without manual participation; it has high accuracy, good results, and high practical value.
Brief description of the drawings
Fig. 1 is the flow chart of an embodiment of the invention.
Fig. 2 is a schematic comparison of the user interest difference index of the CCVR (Core user for Clustering interesting Vector for Recommend) method implemented by the invention and the RK-Means (Random K-Means) method on different data sets.
Fig. 3 is a schematic comparison of the recommendation accuracy, on different data sets, of the CCVR (Core user for Clustering interesting Vector for Recommend) method implemented by the invention and the RCVR (Random user for Clustering interesting Vector for Recommend Algorithm), CCRR (Core user for Clustering Random for Recommend Algorithm), and RCRR (Random user for Clustering Random for Recommend Algorithm) methods.
Detailed description of the embodiments
The invention provides a recommendation method based on user clustering (CCVR). Its main purpose is to solve the problem of how to recommend: it predicts which labels a user is interested in so that effective recommendations can be made.
First, a recommendation must be based on some attribute, such as relationships, friends, or interests. Because the goal is to recommend content that users will find interesting, the invention uses the interest attribute. How to obtain a user's interests is therefore the first problem to solve.
Second, once the user's interests are known, the mechanism used to recommend is the second problem the invention solves. After the interests are acquired, each user's interest is quantified into specific values that reflect the user's relative interest in each label; in other words, the content labels the user is interested in are obtained quantitatively. It remains to decide which of the content labels the user has not followed may also interest the user, so that they can be recommended.
The invention provides a solution to these problems. The technical solution is further described below with reference to the drawings and an embodiment.
The invention studies recommendation methods and proposes a recommendation method based on user clustering, whose implementation comprises three parts. The detailed implementation of the embodiment is as follows:
First, extract the interest features and build the interest matrix. The user's interest feature vector is obtained from the following definitions:
Definition 1 (browsing frequency): let n be the number of times the user has browsed the j-th theme label; n is the user's total browsing frequency for the j-th theme label, denoted f, i.e. f = n.
Definition 2 (browsing time): let t_{j,k} be the time of the user's k-th browse of the j-th theme label, k = 1, 2, …, n; the total time of the n browses of label j is denoted T.
Definition 3 (effective browsing frequency): if t_min ≤ t_{j,k} ≤ t_max, where t_min and t_max are the thresholds for the minimum and maximum time a user spends browsing a label, then the user's k-th browse of the j-th theme label is effective; the number of effective browses among the user's n browses of the j-th theme label is the user's effective browsing frequency for that label, denoted e_f. In a specific implementation, those skilled in the art can preset the values of t_min and t_max.
Definition 4 (effective browsing time): the sum of the browsing times of the e_f effective browses is the user's effective browsing time for the j-th theme label, denoted e_t.
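Definitions 1 to 4 can be illustrated with a minimal Python sketch for one user and one theme label. The function name `effective_browsing`, the use of seconds as the time unit, and the example thresholds are assumptions made for illustration only.

```python
def effective_browsing(durations, t_min, t_max):
    """Return (f, T, e_f, e_t) for one user and one theme label.

    durations: browsing time of each of the n browses (t_{j,1} .. t_{j,n})
    t_min, t_max: preset thresholds for an effective browse
    """
    f = len(durations)                      # Definition 1: f = n
    total_time = sum(durations)             # Definition 2: T
    effective = [t for t in durations if t_min <= t <= t_max]
    e_f = len(effective)                    # Definition 3: effective frequency
    e_t = sum(effective)                    # Definition 4: effective time
    return f, total_time, e_f, e_t


if __name__ == "__main__":
    # Four browses; the 2 s and 4000 s visits fall outside [5, 600].
    print(effective_browsing([2, 30, 45, 4000], t_min=5, t_max=600))
```

Filtering on both a lower and an upper bound discards accidental clicks as well as sessions left open, which is the stated purpose of the two thresholds.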
Definition 5 (interest degree): the degree of the user's interest in the j-th theme label is denoted d_j and is given by the interest-degree formula (the formula itself is not reproduced in this text).
The formula uses a parameter whose defining expression is likewise not reproduced; F1 is the sum of the user's browsing frequencies over all labels; ps is a system parameter, the interest-period coefficient; the remaining two terms are the user's average browsing time and average effective browsing time for the j-th theme label. In a specific implementation, the value of ps can be preset by the user and is generally set to an empirical value.
Let c_b denote the set of labels in the theme label set C that user u_i has not browsed, and c_a the set of labels the user has browsed. The interest degree of user u_i in any theme label in c_b is 0. In summary, the interest degree of user u_i in any theme label s_j in the theme label set C is:
V_{i,j} = \begin{cases} d_j, & s_j \in c_a \\ 0, & s_j \in c_b \end{cases}    (2)
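The piecewise rule above (d_j for browsed labels, 0 for the rest) can be sketched as follows. Because the interest-degree formula for d_j is not available in this text, the sketch takes the d_j values as given input; the function name `interest_vector` and the dictionary layout are hypothetical.

```python
def interest_vector(labels, degrees):
    """Build one user's interest vector over the theme label set C.

    labels:  ordered theme labels s_1 .. s_beta
    degrees: mapping from each browsed label (the set c_a) to its d_j,
             computed elsewhere; labels missing from the mapping are
             treated as not browsed (the set c_b) and get 0
    """
    return [float(degrees.get(label, 0.0)) for label in labels]


if __name__ == "__main__":
    labels = ["sports", "music", "travel"]
    print(interest_vector(labels, {"sports": 0.4, "travel": 0.1}))
```

Stacking these vectors for all users, one per row, gives the interest matrix.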
On this basis an α × β interest-degree matrix is built, where α is the number of users and β is the number of theme labels in the theme label set C. Row i corresponds to the i-th user, i = 1, 2, …, α, and column j corresponds to the j-th theme label, j = 1, 2, …, β. The interest vector of user u_i is v_i = (V_{i,1}, V_{i,2}, …, V_{i,β}), and the j-th theme label is denoted s_j, so any entry V_{i,j} of the interest matrix represents the interest level of user u_i in theme label s_j. Calculating all the values gives the user interest matrix, expressed as follows:
        s_1      s_2      …   s_j      …   s_β
u_1     V_{1,1}  V_{1,2}  …   V_{1,j}  …   V_{1,β}
u_2     V_{2,1}  V_{2,2}  …   V_{2,j}  …   V_{2,β}
…       …        …        …   …        …   …
u_i     V_{i,1}  V_{i,2}  …   V_{i,j}  …   V_{i,β}
…       …        …        …   …        …   …
u_α     V_{α,1}  V_{α,2}  …   V_{α,j}  …   V_{α,β}
Second, screen out the core users to form the core user set, then cluster all users with the K-means algorithm using the core users as center points.
(1) To cluster all users, the core users must first be screened out. For each user u_i, define the interest density value density(u_i) as the proportion of nonzero elements in the interest vector v_i. A user u_i whose density(u_i) exceeds the density threshold λ (an empirical value that those skilled in the art can set themselves, typically 10%) is defined as a core user. The embodiment thus obtains the core user set by screening:
CoreUser = {u_i | u_i ∈ U, density(u_i) > λ}    (3)
where U is the user set; the interest matrix formed by the core users' interest vectors is a dense submatrix m′.
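The core user screening can be sketched with NumPy as below. The function name `core_users` and the array layout (one row per user) are illustrative assumptions; the default threshold of 0.10 mirrors the 10% value mentioned in the text.

```python
import numpy as np


def core_users(interest_matrix, lam=0.10):
    """Return the row indices of core users and every user's density.

    density(u_i) is the share of nonzero entries in the interest vector;
    a user is a core user when density(u_i) > lam.
    """
    m = np.asarray(interest_matrix, dtype=float)
    density = np.count_nonzero(m, axis=1) / m.shape[1]
    return np.flatnonzero(density > lam), density


if __name__ == "__main__":
    matrix = [
        [0.5, 0.0, 0.0, 0.0],   # density 0.25
        [0.2, 0.1, 0.0, 0.3],   # density 0.75
        [0.0, 0.0, 0.0, 0.0],   # density 0.0
    ]
    idx, density = core_users(matrix, lam=0.3)
    print(idx, density)
```

The number of indices returned is the γ used later as the number of clusters, so the choice of λ directly controls how many clusters are formed.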
(2) Next, all users are clustered with the K-means algorithm, with the core users as center points. Starting from the core user set obtained in (1), the first pass takes each core user's interest vector as a center, calculates the Euclidean distance between every non-core user and every core user, and assigns each non-core user to the nearest core user; traversing all users in this way gives a preliminary clustering. The second iteration calculates the center point of each cluster, takes those center points as the new centers, and traverses all users again to obtain new clusters. The second iteration is repeated until the algorithm converges, and the resulting clusters are the final clusters.
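A minimal sketch of this seeded K-means loop is given below, assuming NumPy. Several details are assumptions rather than part of the source: the function name `kmeans_core_seeded`, the `max_iter` safety cap, keeping the old center when a cluster becomes empty, and assigning every user (core users included) in each pass, which leaves core users in their own clusters during the first pass because they coincide with their centers.

```python
import numpy as np


def kmeans_core_seeded(vectors, core_idx, tol=1e-5, max_iter=100):
    """K-means whose initial centers are the core users' interest vectors.

    Returns (assign, centers): the cluster index of each user and the
    final cluster centers.
    """
    x = np.asarray(vectors, dtype=float)
    centers = x[list(core_idx)].copy()
    old_j, new_j = -1.0, 0.0                  # initial values used in the text
    assign = np.zeros(len(x), dtype=int)
    for _ in range(max_iter):                 # max_iter is an added safeguard
        if abs(new_j - old_j) < tol:
            break
        # Euclidean distance from every user to every center
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        assign = dist.argmin(axis=1)
        for h in range(len(centers)):
            members = x[assign == h]
            if len(members):                  # keep old center if cluster is empty
                centers[h] = members.mean(axis=0)
        old_j = new_j
        new_j = float(((x - centers[assign]) ** 2).sum())
    return assign, centers


if __name__ == "__main__":
    users = [[0, 0], [0, 1], [10, 10], [10, 11]]
    assign, centers = kmeans_core_seeded(users, core_idx=[0, 2])
    print(assign, centers)
```

Seeding with core users fixes the number of clusters at γ and removes the run-to-run variation that comes with random initial centers.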
Third, once all users have been clustered, the class interest vector of each user cluster R_h over the theme label set C can be calculated:
Rv_h = (RV_{h1}, RV_{h2}, …, RV_{hβ})
where
RV_{hj} = \frac{1}{|R_h|} \sum_{p=1}^{w} V_{u_p,j}    (4)
|R_h| is the number of users in user cluster R_h, also written w; u_p, with p = 1, 2, …, w, denotes any user in R_h; V_{u_p,j} is the interest degree of user u_p in the j-th theme label; and RV_{hj} is the interest degree of user cluster R_h in the j-th theme label.
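The class interest vector is the per-label mean of the interest vectors in a cluster, which can be sketched as follows. The function name `class_interest_vectors` and the use of an assignment array to describe cluster membership are assumptions for illustration.

```python
import numpy as np


def class_interest_vectors(vectors, assign, n_clusters):
    """Mean interest vector of each cluster (one row per cluster).

    vectors: user interest vectors, one row per user
    assign:  cluster index of each user
    """
    x = np.asarray(vectors, dtype=float)
    assign = np.asarray(assign)
    result = np.zeros((n_clusters, x.shape[1]))
    for h in range(n_clusters):
        members = x[assign == h]
        if len(members):                     # empty clusters stay all-zero
            result[h] = members.mean(axis=0)
    return result


if __name__ == "__main__":
    users = [[0.25, 0.0], [0.75, 0.5], [1.0, 1.0]]
    print(class_interest_vectors(users, [0, 0, 1], n_clusters=2))
```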
The class interest vector obtained in this way represents the interest level of the whole cluster. For each cluster R_h, the interest vector of each user u_p in the cluster is compared with the class interest vector Rv_h; if an interest value V_{u_p,j} in the user's interest vector is greater than or equal to the corresponding interest value RV_{hj} in the cluster's class interest vector, the theme label s_j corresponding to that interest value is recommended to user u_p.
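The comparison rule can be sketched as a one-line filter. The function name `recommend` and the example labels are hypothetical; the rule itself (recommend s_j when the user's value is at least the class value) is taken from the text.

```python
def recommend(labels, user_vector, class_vector):
    """Return the theme labels whose user interest value is greater than
    or equal to the class interest value at the same position."""
    return [
        label
        for label, v, rv in zip(labels, user_vector, class_vector)
        if v >= rv
    ]


if __name__ == "__main__":
    labels = ["sports", "music", "travel"]
    print(recommend(labels, [0.6, 0.1, 0.0], [0.5, 0.3, 0.2]))
```

Read literally, the rule also selects a label when both the user's value and the class value are zero; the source does not say whether such labels are meant to be excluded.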
Based on the above design, those skilled in the art can use computer software technology to run the flow automatically in a specific implementation.
As shown in Fig. 1, the flow of the embodiment comprises the following steps:
Step1: input the user set U = {u_1, u_2, …, u_α} and the theme label set C = {s_1, s_2, …, s_β}; initialize the index i of the user currently being processed to 1; go to Step2;
Step2: initialize the index j of the label currently being processed to 1, and go to Step3;
Step3: if user u_i has followed label s_j, go to Step4; otherwise set d_j = 0, and go to Step9;
Step4: according to Definition 1, determine the total browsing frequency f of user u_i for label s_j, and go to Step5;
Step5: according to Definition 2, determine the time t_{j,k} of user u_i's k-th browse of label s_j (k = 1, 2, …, n) and the total browsing time T, and go to Step6;
Step6: according to Definition 3, determine the effective browsing frequency e_f of user u_i for label s_j, and go to Step7;
That is, if t_min ≤ t_{j,k} ≤ t_max, where t_min and t_max are the thresholds for the minimum and maximum time a user spends browsing a label, then user u_i's k-th browse of the j-th theme label is effective; the number of effective browses among user u_i's n browses of the j-th theme label is the effective browsing frequency of user u_i for that label;
Step7: sum the browsing times of the e_f effective browses to obtain the effective browsing time e_t of user u_i for label s_j, and go to Step8;
Step8: according to Definition 5, determine the interest degree d_j of user u_i in label s_j, and go to Step9;
Step9: calculate V_{i,j} according to formula (2) and set j = j + 1; if j is less than or equal to β go to Step3, otherwise go to Step10;
Step10: set i = i + 1; if i is less than or equal to α, go to Step2; otherwise set i = 1, initialize the number of core users γ to 0, and go to Step11;
Step11: obtain the interest density value from the interest vector v_i = (V_{i,1}, V_{i,2}, …, V_{i,β}) of user u_i; if the interest density value density(u_i) of the current u_i is greater than λ, mark u_i as a core user and go to Step12; otherwise go to Step13;
Step12: set γ = γ + 1, and go to Step13;
Step13: set i = i + 1; if i is less than or equal to α, go to Step11; otherwise go to Step14;
Step14: γ core users have now been obtained; begin clustering all users with the K-means algorithm, taking the γ core users as the initial cluster centers; initialize the variables newJ = 0 and oldJ = -1, and go to Step15;
Step15: calculate fabs(newJ - oldJ), where fabs is the C math function that returns the absolute value. If fabs(newJ - oldJ) >= 1e-5, go to Step16; otherwise go to Step19;
Here fabs(newJ - oldJ) >= 1e-5 means that the absolute value of newJ - oldJ is greater than or equal to 0.00001 and serves as the loop control condition; in a specific implementation, those skilled in the art can preset the threshold themselves;
Step16: for each remaining user in U = {u_1, u_2, …, u_α} other than the users serving as cluster centers, calculate the Euclidean distance between that user and each cluster center, find the nearest cluster center, assign the user to that cluster, and go to Step17;
Step17: calculate the mean of each current user cluster R_h, that is, the mean of the interest vectors of all users in each cluster; this mean is the new cluster center Z_h of the cluster; go to Step18;
Step18: set oldJ = newJ, calculate the new value of the criterion function and assign it to newJ (the criterion function is J = \sum_{h=1}^{\gamma} \sum_{p=1}^{w} \| v_{u_p} - Z_h \|^2, where w is the number of users in user cluster R_h, \| v_{u_p} - Z_h \|^2 is the squared deviation between the two feature vectors, v_{u_p} is the interest vector of user u_p in user cluster R_h, and Z_h is the cluster center of that cluster), and go to Step15;
Step19: the preceding steps have completed the clustering, giving γ clusters R_1, R_2, …, R_γ; go to Step20;
Step20: initialize the index h of the cluster currently being processed to 1, and go to Step21;
Step21: calculate the class interest vector Rv_h = (RV_{h1}, RV_{h2}, …, RV_{hβ}) of the cluster according to formula (4), and go to Step22;
Step22: set h = h + 1; if h is less than or equal to γ, go to Step21, otherwise go to Step23;
Step23: the preceding steps have produced the class interest vectors Rv_1, Rv_2, …, Rv_γ of the γ clusters; set h = 1, and go to Step24;
Step24: recommend theme labels to each user in user cluster R_h: if a user u_p in R_h is user u_i of the user set U = {u_1, u_2, …, u_α}, compare the interest vector v_i = (V_{i,1}, V_{i,2}, …, V_{i,β}) of u_i with the class interest vector Rv_h = (RV_{h1}, RV_{h2}, …, RV_{hβ}) of R_h, item by item for j from 1 to β; if V_{i,j} is greater than or equal to RV_{hj}, recommend theme label s_j to the user; go to Step25;
Step25: set h = h + 1; if h is less than or equal to γ, go to Step24, otherwise go to Step26;
Step26: recommendation for every user in the user set U = {u_1, u_2, …, u_α} is complete; end.
To make the technical effect of the invention easier to understand and to show that the recommendation method works well, related experiments were carried out from two aspects.
On the one hand, the user clustering results of the CCVR and RK-Means methods are compared using Di.
CCVR: Core user for Clustering interesting Vector for Recommend, the recommendation method proposed by the invention;
RK-Means: the Random K-Means algorithm, which takes random users as center points and then clusters with the k-means algorithm;
To compare the clustering quality of the two algorithms, the concept of user interest difference degree is introduced.
The interest distance between the class interest vectors Rv_{h1} and Rv_{h2} of two clusters is measured with the cosine distance.
The Di of the clustering R = {R_1, R_2, …, R_γ} of all users over the theme label set C is defined on that basis. (Neither formula is reproduced in this text.)
The larger Di is, the more the interests of the classes differ, which makes the interest features of each cluster more distinct and in turn increases the accuracy of interest prediction.
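The cosine-based interest distance between two class interest vectors can be sketched as below. The source formulas are not available here, so this sketch assumes the common convention that cosine distance is one minus cosine similarity, and it does not attempt to reproduce how Di aggregates the pairwise distances. The function names and the handling of all-zero vectors are likewise assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two class interest vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumed convention for an all-zero vector
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Assumed convention: distance = 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


if __name__ == "__main__":
    print(cosine_distance([1.0, 0.0], [0.0, 1.0]))   # unrelated interests
    print(cosine_distance([1.0, 2.0], [2.0, 4.0]))   # same interest profile
```

Under this convention, clusters with proportional interest profiles are at distance close to 0 and clusters with no shared interests are at distance 1, which matches the reading that a larger difference index means more distinct clusters.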
The experimental result that Di is calculated is shown in accompanying drawing 2, and abscissa is data set (number of users × theme label number Amount), ordinate is Di, can intuitively find out that the interest characteristics that CCVR algorithms are clustered is more obvious by figure, this Sample will make it that class interest vector is more representative, and for recommendation, then effect is more accurate.
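The following Python sketch shows one way such an index could be computed. Both formulas are missing from this text, so two assumptions are made and should be checked against the original patent: the cosine distance is taken as 1 minus the cosine similarity, and Di is taken as the mean cosine distance over all pairs of class interest vectors. The function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity (assumed definition of the cosine distance)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumption: a zero vector is treated as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def interest_difference(class_vectors: Sequence[Sequence[float]]) -> float:
    """Mean pairwise cosine distance between class interest vectors (assumed Di)."""
    pairs = list(combinations(class_vectors, 2))
    if not pairs:
        return 0.0
    return sum(cosine_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    print(interest_difference([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]))
```

Under these assumptions a larger value means the clusters' interest profiles point in more different directions, which matches the interpretation of Di given in the text.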
Second, several approaches are compared in terms of recommendation accuracy, where accuracy is the ratio of the number of recommendation hits to the total number of recommendations.
RCVR (Random user for Clustering interesting Vector for Recommend Algorithm) uses K-Means clustering initialized with random users and recommends by class interest vector.
CCRR (Core user for Clustering Random for Recommend Algorithm) uses K-Means clustering initialized with core users and recommends randomly.
RCRR (Random user for Clustering Random for Recommend Algorithm) uses K-Means clustering initialized with random users and recommends randomly.
The accuracy comparison results are shown in Figure 3, where the abscissa is the data set (number of users × number of theme labels) and the ordinate is the accuracy. The figure shows that the accuracy of the CCVR recommendation method proposed by the present invention is better than that of the other approaches.
The specific embodiments described herein merely illustrate the spirit of the present invention. Those skilled in the art to which the present invention belongs may make various modifications or supplements to the described embodiments, or substitute them in similar ways, without departing from the spirit of the present invention or exceeding the scope defined by the appended claims.
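The accuracy measure used in this comparison (recommendation hits divided by total recommendations) can be written as a short Python function. The text does not say how a hit is determined, so this sketch assumes a held-out set of labels each user is actually interested in; the function name and data layout are hypothetical.

```python
from typing import Dict, Sequence, Set


def recommendation_accuracy(recommended: Dict[str, Sequence[str]],
                            relevant: Dict[str, Set[str]]) -> float:
    """Ratio of hit recommendations to all recommendations made."""
    total = sum(len(labels) for labels in recommended.values())
    if total == 0:
        return 0.0  # assumption: no recommendations counts as zero accuracy
    hits = sum(
        1
        for user, labels in recommended.items()
        for label in labels
        if label in relevant.get(user, set())
    )
    return hits / total


if __name__ == "__main__":
    print(recommendation_accuracy(
        {"u1": ["s1", "s2"], "u2": ["s3"]},
        {"u1": {"s1"}, "u2": {"s3"}},
    ))
```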

Claims (2)

1. A recommendation method based on user clustering, characterized by comprising the following steps:
Step1, input the user set U = {u_1, u_2, …, u_α} and the theme label set C = {s_1, s_2, …, s_β}, where α is the number of users and β is the number of theme labels in the theme label set C; initialize the index i of the user currently being processed to 1; go to Step2;
Step2, initialize the index j of the label currently being processed to 1; go to Step3;
Step3, if user u_i has followed theme label s_j, go to Step4; otherwise set the user's interest degree in the j-th theme label to d_j = 0 and go to Step9;
Step4, according to the number of times n that user u_i has browsed theme label s_j, determine the total browsing frequency f = n of user u_i for theme label s_j; go to Step5;
Step5, determine the duration t_{j,k} of the k-th browse of theme label s_j by user u_i and the total browsing time T, where k = 1, 2, …, n; go to Step6;
Step6, determine the effective browsing frequency ef of user u_i for theme label s_j; go to Step7;
The determination is as follows: if t_min ≤ t_{j,k} ≤ t_max, where t_min and t_max are preset thresholds for the minimum and maximum browsing time of a label by user u_i, then the k-th browse of the j-th theme label by user u_i is effective; the total number of effective browses among the n browses of the j-th theme label by user u_i is the effective browsing frequency of user u_i for the j-th theme label;
Step7, sum the browsing times of the ef effective browses to obtain the effective browsing time et of user u_i for theme label s_j; go to Step8;
Step8, calculate the interest degree d_j of user u_i in theme label s_j according to the following formula; go to Step9;
Where F1 is the sum of the user's browsing frequencies over all theme labels; Ps is a preset system parameter, the interest time coefficient; and the two remaining terms denote, respectively, the user's average browsing time for the j-th theme label and the user's average effective browsing time for the j-th theme label;
Step9, let c_b denote the set of labels in the theme label set C that user u_i has not browsed and c_a denote the set of labels the user has browsed; calculate V_{i,j} according to the following formula, where V_{i,j} is the interest level of user u_i in theme label s_j and the interest vector of user u_i is v_i = (V_{i,1}, V_{i,2}, …, V_{i,β}); let j = j + 1; if j is less than or equal to β go to Step3, otherwise go to Step10;
Step10, let i = i + 1; if i is less than or equal to α, go to Step2; otherwise let i = 1, initialize the number of core users γ to 0, and go to Step11;
Step11, obtain the interest density value density(u_i) from the proportion of nonzero elements in the interest vector v_i = (V_{i,1}, V_{i,2}, …, V_{i,β}) of user u_i; if density(u_i) > λ, mark u_i as a core user and go to Step12; otherwise go to Step13; where λ is a preset density threshold;
Step12, let γ = γ + 1; go to Step13;
Step13, let i = i + 1; if i is less than or equal to α, go to Step11; otherwise go to Step14;
Step14, γ core users have now been obtained; begin clustering all users with the K-means algorithm, taking the γ core users as the initial cluster centers; initialize the variables newJ = 0 and oldJ = -1; go to Step15;
Step15, calculate fabs(newJ - oldJ), where fabs denotes the absolute-value function; if fabs(newJ - oldJ) is greater than or equal to the corresponding preset threshold, go to Step16; otherwise go to Step19;
Step16, for each remaining user in the user set U = {u_1, u_2, …, u_α} other than the users serving as cluster centers, calculate the Euclidean distance between that user and each user serving as a cluster center, and assign the user to the cluster of the nearest cluster center; go to Step17;
Step17, calculate the mean of the interest vectors of all users in each user cluster R_h as the new cluster center Z_h of user cluster R_h; go to Step18;
Step18, let oldJ = newJ, calculate the new criterion function value according to the criterion function and assign it to newJ; go to Step15;
Step19, γ user clusters R_1, R_2, …, R_γ have now been obtained; go to Step20;
Step20, initialize the index h of the category currently being processed to 1; go to Step21;
Step21, calculate the class interest vector Rv_h = (RV_{h,1}, RV_{h,2}, …, RV_{h,β}) of the current category according to the following formula; go to Step22;
Where |R_h| denotes the number of users in user cluster R_h; the user index denotes any user in user cluster R_h and takes the values 1, 2, …, w, with w the number of users in cluster R_h; the indexed term denotes the interest degree of that user of R_h in the j-th theme label; RV_{h,j} denotes the interest degree of user cluster R_h in the j-th theme label, with j = 1, 2, …, β;
Step22, let h = h + 1; if h is less than or equal to γ, go to Step21, otherwise go to Step23;
Step23, the class interest vectors of the γ categories, Rv_1, Rv_2, …, Rv_γ, have now been obtained; let h = 1 and go to Step24;
Step24, recommend theme labels to each user in user cluster R_h: let a user in user cluster R_h be the user u_i of the user set U = {u_1, u_2, …, u_α}; compare the interest vector v_i = (V_{i,1}, V_{i,2}, …, V_{i,β}) of user u_i with each interest value RV_{h,j} of the class interest vector Rv_h = (RV_{h,1}, RV_{h,2}, …, RV_{h,β}) of user cluster R_h; if V_{i,j} is greater than or equal to RV_{h,j}, recommend theme label s_j to the user; go to Step25;
Step25, let h = h + 1; if h is less than or equal to γ, go to Step24, otherwise go to Step26;
Step26, automatic recommendation for every user in the user set U = {u_1, u_2, …, u_α} is now complete; end.
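Two of the building blocks in the steps above are fully specified by the text and can be illustrated directly in Python: the effective browsing count and time of Step6 and Step7, and the core-user selection of Step11 to Step13. The interest-degree formulas of Step8 and Step9 are not reproduced in this text, so they are deliberately left out. The function names, the threshold values and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def effective_browsing(durations: Sequence[float],
                       t_min: float,
                       t_max: float) -> Tuple[int, float]:
    """Return (ef, et): count and total time of browses with t_min <= t <= t_max."""
    effective = [t for t in durations if t_min <= t <= t_max]
    return len(effective), sum(effective)


def interest_density(vector: Sequence[float]) -> float:
    """Proportion of nonzero elements in a user's interest vector (Step11)."""
    if not vector:
        return 0.0
    return sum(1 for x in vector if x != 0) / len(vector)


def core_users(interests: Dict[str, Sequence[float]], lam: float) -> List[str]:
    """Users whose interest density exceeds the preset threshold lambda."""
    return [u for u, v in interests.items() if interest_density(v) > lam]


if __name__ == "__main__":
    # Browses of 2 s and 600 s fall outside the assumed 5..300 s window.
    print(effective_browsing([2, 30, 600, 45], t_min=5, t_max=300))
    print(core_users({"u1": [0.5, 0.1, 0.0], "u2": [0.0, 0.0, 0.3]}, lam=0.5))
```

The number of users returned by `core_users` plays the role of γ, which later fixes the number of clusters.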
2. The recommendation method based on user clustering according to claim 1, characterized in that: in Step18, the criterion function is calculated as follows,
Where w denotes the number of users in user cluster R_h; the squared term denotes the square of the deviation between two feature vectors, one being the interest vector of a user in user cluster R_h and the other being Z_h, the cluster center of the corresponding category.
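The clustering loop of Step14 to Step19 together with this criterion function can be sketched in Python as follows. The criterion formula itself is not reproduced in this text; the sketch assumes it is the usual K-means sum of squared deviations between each user's interest vector and its cluster center, which is consistent with the symbol descriptions above. The function names, the `eps` threshold and the `max_iter` safety cap are hypothetical additions.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_deviation(a: Vector, b: Vector) -> float:
    """Squared Euclidean deviation between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_from_core_users(vectors: List[Vector],
                           core_idx: List[int],
                           eps: float = 1e-6,
                           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """K-means seeded with core users; stops when |newJ - oldJ| < eps."""
    if not core_idx:
        raise ValueError("at least one core user is required")
    centers = [list(vectors[i]) for i in core_idx]       # Step14: core users as centers
    assign = [0] * len(vectors)
    new_j, old_j = 0.0, -1.0                             # Step14: newJ = 0, oldJ = -1
    for _ in range(max_iter):                            # safety cap, not in the source
        if math.fabs(new_j - old_j) < eps:               # Step15
            break
        # Step16: nearest center (squared distance gives the same ordering as Euclidean).
        assign = [
            min(range(len(centers)), key=lambda h: squared_deviation(v, centers[h]))
            for v in vectors
        ]
        # Step17: each center becomes the mean of its members' interest vectors.
        for h in range(len(centers)):
            members = [v for v, a in zip(vectors, assign) if a == h]
            if members:
                centers[h] = [sum(col) / len(members) for col in zip(*members)]
        # Step18: assumed criterion J = sum of squared deviations to the assigned center.
        old_j = new_j
        new_j = sum(squared_deviation(v, centers[a]) for v, a in zip(vectors, assign))
    return assign, centers


if __name__ == "__main__":
    users = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(kmeans_from_core_users(users, core_idx=[0, 2]))
```

For simplicity the sketch reassigns every user in each pass, including the core users, which sit at distance zero from their own initial center.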

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410565721.7A CN104268290B (en) 2014-10-22 2014-10-22 A kind of recommendation method based on user clustering

Publications (2)

Publication Number Publication Date
CN104268290A CN104268290A (en) 2015-01-07
CN104268290B true CN104268290B (en) 2017-08-08

Family

ID=52159811

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410565721.7A Expired - Fee Related CN104268290B (en) 2014-10-22 2014-10-22 A kind of recommendation method based on user clustering

Country Status (1)

Country Link
CN (1) CN104268290B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106033415B (en) * 2015-03-09 2020-07-03 深圳市腾讯计算机系统有限公司 Text content recommendation method and device
CN105808698B (en) * 2016-03-03 2020-04-07 江苏大学 TOP-k position interest point recommendation method facing internet of things user query request
CN105824942A (en) * 2016-03-21 2016-08-03 上海珍岛信息技术有限公司 Item recommendation method and system based on collaborative filtering algorithm
CN106055713B (en) * 2016-07-01 2019-10-18 华南理工大学 Social network user recommended method based on user interest and social subject distillation
CN107608992A (en) * 2016-07-12 2018-01-19 上海视畅信息科技有限公司 A kind of personalized recommendation method based on time shaft
CN106484795A (en) * 2016-09-22 2017-03-08 天津大学 A kind of interest based on non-structured web page data recommends method
CN107122805A (en) * 2017-05-15 2017-09-01 腾讯科技(深圳)有限公司 A kind of user clustering method and apparatus
CN107480217A (en) * 2017-07-31 2017-12-15 陕西识代运筹信息科技股份有限公司 A kind of information processing method and device based on social data
CN107948257B (en) * 2017-11-13 2019-10-01 苏州达家迎信息技术有限公司 The method for pushing and computer readable storage medium of APP
CN107943895A (en) * 2017-11-16 2018-04-20 百度在线网络技术(北京)有限公司 Information-pushing method and device
CN108921398B (en) * 2018-06-14 2020-12-11 口口相传(北京)网络技术有限公司 Shop quality evaluation method and device
CN109087711A (en) * 2018-06-28 2018-12-25 郑州大学第附属医院 Medical big data method for digging and system
CN108876407B (en) * 2018-06-28 2022-04-19 联想(北京)有限公司 Data processing method and electronic equipment
CN109903082B (en) * 2019-01-24 2022-10-28 平安科技(深圳)有限公司 Clustering method based on user portrait, electronic device and storage medium
CN110517114A (en) * 2019-08-21 2019-11-29 广州云徙科技有限公司 A kind of information-pushing method and system based on community discovery algorithm
CN111695039A (en) * 2020-06-12 2020-09-22 江苏海洋大学 Personalized recommendation method based on multi-objective optimization

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101694659A (en) * 2009-10-20 2010-04-14 浙江大学 Individual network news recommending method based on multitheme tracing
CN103235824A (en) * 2013-05-06 2013-08-07 上海河广信息科技有限公司 Method and system for determining web page texts users interested in according to browsed web pages

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
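An improved recommendation algorithm based on collaborative filtering and partition clustering; Wu Hongchen et al.; Journal of Computer Research and Development; 2011-09-15; Vol. 48 (No. S3); full text *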

Also Published As

Publication number Publication date
CN104268290A (en) 2015-01-07

Similar Documents

Publication Publication Date Title
CN104268290B (en) A kind of recommendation method based on user clustering
Wu et al. Learning fair representations for recommendation: A graph-based perspective
Yue An extended TOPSIS for determining weights of decision makers with interval numbers
Dou et al. A survey of collaborative filtering algorithms for social recommender systems
CN111160954B (en) Recommendation method facing group object based on graph convolution network model
CN110430471A (en) It is a kind of based on the television recommendations method and system instantaneously calculated
CN105723402A (en) Systems and methods for determining influencers in a social data network
Pérez-Marcos et al. Hybrid system for video game recommendation based on implicit ratings and social networks
CN107145541B (en) Social network recommendation model construction method based on hypergraph structure
Liu et al. Personal recommendation via unequal resource allocation on bipartite networks
Wang et al. A fog-based recommender system
Tran et al. Collaborative filtering via sparse Markov random fields
Mahmood et al. Influence model and doubly extended TOPSIS with TOPSIS based matrix of interpersonal influences
Tian et al. A survey of personalized recommendation based on machine learning algorithms
Qin et al. Towards a personalized movie recommendation system: A deep learning approach
Gao et al. Deep learning with consumer preferences for recommender system
Lian et al. Personalized recommendation via an improved NBI algorithm and user influence model in a Microblog network
Lv et al. Measuring geospatial properties: Relating online content browsing behaviors to users’ points of interest
Bang et al. Collective matrix factorization using tag embedding for effective recommender system
Van Ma et al. Fuzzy Decision Making-based Recommendation Channel System using the Social Network Database
Amini et al. Proposing a new hybrid approach in movie recommender system
Tian et al. Common features based volunteer and voluntary activity recommendation algorithm
Huang et al. Collaborative filtering of web service based on mapreduce
Xie et al. Correlation-based top-k recommendation for web services
Zhou et al. Personalized recommendation algorithm based on user preference and user profile

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170808

Termination date: 20181022
