CN104268290B - A kind of recommendation method based on user clustering - Google Patents
- Publication number
- CN104268290B (application number CN201410565721.7A)
- Authority
- CN
- China
- Prior art keywords
- user
- interest
- clustering
- theme label
- label
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- G06F16/9535—Search customisation based on user profiles and personalisation
- G06F16/35—Clustering; Classification
- G06F16/355—Class or cluster creation or modification
- G06F16/951—Indexing; Web crawling techniques
Abstract
The present invention proposes a recommendation method based on user clustering, which makes reasonable and effective recommendations according to user interests. For each theme label, a user's interest degree is derived from the total browsing frequency, the browsing times and total browsing time, the effective browsing frequency and the effective browsing time, and these interest degrees form the user's interest feature vector. Using the interest feature vectors, core users are screened to form a core-user set, and all users are clustered with the K-means algorithm. After all users have been clustered, the class interest vector of each user cluster over the theme labels is computed, and each user's interest values are compared with the class interest vector to produce recommendations. The recommendation performance of the CCVR method provided by the invention is better than that of the other recommendation methods compared, with good accuracy.
Description
Technical field
The present invention relates to the field of Internet information technology, and in particular to a recommendation method based on user clustering.
Background art
As the number of Internet users has grown, social networks have gradually replaced traditional channels of information acquisition such as newspapers, magazines and TV news, and have become the way most people first receive information; examples include Facebook and Twitter abroad, and microblogs and Renren in China. Users publish the information they want to express by posting messages and status updates, and spread information obtained from others by forwarding and sharing other people's messages and statuses. This raises the problem of node influence: the posts of a node followed by everyone are seen by everyone, and a node that follows everyone sees everything that everyone posts. A person's attention is limited, however, and nobody can find and manually follow all the content or nodes they might be interested in. Internet information service providers therefore need to study how to effectively recommend to users the content or nodes they are likely to be interested in.
The concept of strong and weak ties proposed by Yu Hong et al. explains the forms of following in social networks. Platforms such as Renren and QQ Zone build social networks through two-way (mutual) following, i.e. strong ties, whereas platforms such as microblogs let users build their networks through one-way following, i.e. weak ties. For recommending mutual-follow relations in strong-tie networks, methods based on real social information such as common friends, contacts and address books usually achieve good results. However, precisely because strong ties are usually built on real social relationships, they are much more limited than weak ties: if a user cannot establish a relationship with a node, the user cannot see what that node publishes, which is unreasonable. Some people like to publish information and become the publishers of messages in the network, publishing far more than they subscribe to; others prefer to receive information and act mainly as subscribers, receiving far more than they publish. Forcing such an imbalance onto strong ties would be very unreasonable, so social networks based on weak ties emerged, in which everyone takes what they need.
References:
- Yu Hong, Yang Xian. Research on node influence measurement and propagation path model in microblogs [J]. Journal on Communications, 2012, 33(Z1): 96-97.
- Chen J, Geyer W, Dugan C, Muller M, Guy I. Make new friends, but keep the old: Recommending people on social networking sites // Proceedings of the 27th International Conference on Human Factors in Computing Systems. New York, NY, USA, 2009: 201-210.
- Chen Kehan, Han Panpan, Wu Jian. Heterogeneous social network recommendation algorithm based on user clustering [J]. Chinese Journal of Computers, 2013, 36(2): 350-351.
- Mislove A, Marcon M, Gummadi K P, Druschel P, Bhattacharjee B. Measurement and analysis of online social networks // Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement. San Diego, CA, USA, 2007: 29-42.
- Liu Meilian, Liu Tongcun, Li Xiaolong. Research on recommendation algorithm based on user interest feature extraction [J]. Application Research of Computers, 2011, 28(5): 1665-1666.
Scholars have studied this kind of recommendation problem extensively. The collaborative filtering recommendation algorithm was first proposed by Goldberg et al., but that system did not fully take user needs into account and had certain defects. To address this, GroupLens first proposed an automatic collaborative filtering recommendation system based on user ratings. Collaborative filtering is the most widely used recommendation algorithm, but because it was proposed early it has many shortcomings. Content-based recommendation algorithms appeared later; they provide recommendations by comparing items with user profiles. Recommendation algorithms based on association rules mainly provide recommendations according to an association rule model and the user's current purchasing behavior.
Existing research shows that a reasonable recommendation method is clearly needed.
References:
- LI Yu, LI Xue-feng. A hybrid collaborative filtering method for multiple-interests and multiple-content recommendation in e-commerce [J]. Expert Systems with Applications, 2005, 28(1): 67-77.
- HUANG Cheng-lung, HUANG Wei-liang. Handling sequential pattern decay: developing a two-stage collaborative recommender system [J]. Electronic Commerce Research and Applications, 2008, 8(3): 117-129.
- LUIS M, JUAN M, JUAN F. A collaborative recommender system based on probabilistic inference from fuzzy observations [J]. Fuzzy Sets and Systems, 2008, 159(12): 1554-1576.
- HUANG Zan, ZENG D, CHEN H C. A comparison of collaborative-filtering recommendation algorithms for e-commerce [J]. IEEE Intelligent Systems, 2007, 22(5): 68-78.
- LIU Duen-ren, SHIH Y Y. Hybrid approaches to product recommendation based on customer lifetime value and purchase preferences [J]. Journal of Systems and Software, 2005, 77(2): 181-191.
- MATEVZ K, TOMAZ P, et al. Optimisation of combined collaborative recommender systems [J]. AEU International Journal of Electronics and Communications, 2007, 61(7): 433-443.
Summary of the invention
Building on the research above, the present invention provides a recommendation method based on user clustering.
To achieve this purpose, the technical solution adopted by the present invention is a recommendation method based on user clustering, comprising the following steps:
Step1, input the user set $U=\{u_1,u_2,\dots,u_\alpha\}$ and the theme label set $C=\{s_1,s_2,\dots,s_\beta\}$, where α is the number of users and β is the number of theme labels in C; initialize the index i of the user currently being processed to 1, and go to Step2;
Step2, initialize the index j of the label currently being processed to 1, and go to Step3;
Step3, if user $u_i$ has followed theme label $s_j$, go to Step4; otherwise set the user's interest degree in the j-th theme label to $d_j=0$ and go to Step9;
Step4, from the number n of visits by user $u_i$ to theme label $s_j$, determine the total browsing frequency of $u_i$ for $s_j$ as $f=n$, and go to Step5;
Step5, determine the browsing time $t_{j,k}$ of the k-th visit by user $u_i$ to theme label $s_j$, where $k=1,2,\dots,n$, and the total browsing time T, and go to Step6;
Step6, determine the effective browsing frequency $e_f$ of user $u_i$ for theme label $s_j$, and go to Step7;
The determination is as follows: $t_{\min}$ and $t_{\max}$ are preset thresholds for the minimum and maximum browsing time of user $u_i$ on a label; if $t_{\min}\le t_{j,k}\le t_{\max}$, the k-th visit of $u_i$ to the j-th theme label is effective, and the number of effective visits among the n visits of $u_i$ to the j-th theme label is the effective browsing frequency of $u_i$ for that label;
Step7, sum the browsing times of the $e_f$ effective visits to obtain the effective browsing time $e_t$ of user $u_i$ for theme label $s_j$, and go to Step8;
Step8, calculate the interest degree $d_j$ of user $u_i$ in theme label $s_j$ according to the following formula, and go to Step9;
[formula not reproduced in the source text]
where a parameter is used whose definition is not reproduced in the source text; F1 is the sum of the user's browsing frequencies over all theme labels; ps is a preset system parameter, the interest-period coefficient; and the formula uses the user's average browsing time for the j-th theme label (T/n) and the user's average effective browsing time for the j-th theme label ($e_t/e_f$);
Step9, let $c_b$ denote the set of theme labels in C that user $u_i$ has not browsed and $c_a$ the set of labels the user has browsed; compute $V_{i,j}$ according to the following formula, set j=j+1, and go to Step3 if j ≤ β, otherwise go to Step10;
$$V_{i,j}=\begin{cases}d_j, & s_j\in c_a\\ 0, & s_j\in c_b\end{cases}$$
Step10, set i=i+1; if i ≤ α, go to Step2; otherwise set i=1, initialize the number of core users γ to 0, and go to Step11;
Step11, obtain the interest density value $density(u_i)$ of user $u_i$ as the proportion of nonzero elements in its interest vector $v_i=(V_{i,1},V_{i,2},\dots,V_{i,\beta})$; if $density(u_i)>\lambda$, mark $u_i$ as a core user and go to Step12; otherwise go to Step13; here λ is a preset density threshold;
Step12, set γ=γ+1 and go to Step13;
Step13, set i=i+1; if i ≤ α, go to Step11; otherwise go to Step14;
Step14, using the γ core users obtained so far, cluster all users with the K-means algorithm; this step takes the γ core users as the initial cluster centres, initializes the variables newJ=0 and oldJ=-1, and goes to Step15;
Step15, compute fabs(newJ-oldJ), where the fabs function returns the absolute value; if fabs(newJ-oldJ) is greater than or equal to the preset threshold for this absolute value, go to Step16, otherwise go to Step19;
Step16, for each remaining user in $U=\{u_1,u_2,\dots,u_\alpha\}$ other than the users serving as cluster centres, compute the Euclidean distance between that user and each cluster centre, assign the user to the cluster of the nearest centre, and go to Step17;
Step17, compute the average of the interest vectors of all users in each user cluster $R_h$ as the new cluster centre $Z_h$ of $R_h$, and go to Step18;
Step18, set oldJ=newJ, compute the new value of the criterion function and assign it to newJ, and go to Step15;
Step19, the γ user clusters $R_1,R_2,\dots,R_\gamma$ are now obtained; go to Step20;
Step20, initialize the index h of the class currently being processed to 1, and go to Step21;
Step21, compute the class interest vector $Rv_h=(RV_{h1},RV_{h2},\dots,RV_{h\beta})$ of this class according to the following formula, and go to Step22;
$$RV_{hj}=\frac{1}{|R_h|}\sum_{x=1}^{w}V^{h}_{x,j}$$
where $|R_h|$ is the number of users in user cluster $R_h$; $u^{h}_{x}$ denotes any user in $R_h$, w is the number of users in $R_h$ and $x=1,2,\dots,w$; $V^{h}_{x,j}$ is the interest degree of user $u^{h}_{x}$ in the j-th theme label; and $RV_{hj}$ is the interest degree of cluster $R_h$ in the j-th theme label, $j=1,2,\dots,\beta$;
Step22, set h=h+1; if h ≤ γ, go to Step21, otherwise go to Step23;
Step23, the class interest vectors $Rv_1,Rv_2,\dots,Rv_\gamma$ of the γ classes are now obtained; set h=1 and go to Step24;
Step24, recommend theme labels to each user in user cluster $R_h$: if a user in $R_h$ is user $u_i$ of $U=\{u_1,u_2,\dots,u_\alpha\}$, compare each value $V_{i,j}$ of its interest vector $v_i=(V_{i,1},V_{i,2},\dots,V_{i,\beta})$ with the corresponding interest value $RV_{hj}$ of the class interest vector $Rv_h=(RV_{h1},RV_{h2},\dots,RV_{h\beta})$ of $R_h$; if $V_{i,j}\ge RV_{hj}$, recommend theme label $s_j$ to the user; then go to Step25;
Step25, set h=h+1; if h ≤ γ, go to Step24, otherwise go to Step26;
Step26, automatic recommendation has been completed for every user in $U=\{u_1,u_2,\dots,u_\alpha\}$; end.
Moreover, in Step18, the criterion function is calculated as follows:
$$J=\sum_{h=1}^{\gamma}\sum_{x=1}^{w}\left\lVert v^{h}_{x}-Z_h\right\rVert^{2}$$
where w is the number of users in user cluster $R_h$, $\lVert\cdot\rVert^{2}$ denotes the squared deviation between two feature vectors, $v^{h}_{x}$ is the interest vector of the x-th user in $R_h$, and $Z_h$ is the cluster centre of the corresponding class.
The invention has the following characteristics:
1) Interest quantification. Data that the system can collect very easily, such as the user's click frequency, number of visits and dwell time on labels, are integrated and quantified to obtain each user's interest degree in each theme label; for an individual user, this yields the user's interest vector.
2) Recommendation mechanism. The concept of a class is introduced first, because analysing each user individually for recommendation is unreasonable: first, the workload is huge; second, individual differences are too pronounced. A single user's data are highly particular, so analysing each user separately would not give good recommendations. The solution of the invention is therefore to cluster users first. Core users are more representative, so they are used as cluster centres to obtain the user clusters, i.e. users with the same interests are grouped together. Each cluster is then a group of users with shared interests, i.e. a community of users. Next, from the interest vectors of the users in a class, the interest degree of the whole cluster for the theme set, i.e. the class interest vector, is computed, and each user in the class is compared with this class interest vector to derive recommendations.
Therefore, the invention achieves automatic recommendation without wasting web-community system resources on individual analysis and without manual participation; it is accurate, effective and of high practical value.
Brief description of the drawings
Fig. 1 is a flow chart of the embodiment of the present invention.
Fig. 2 compares the user interest diversity index of the CCVR (Core user for Clustering interesting Vector for Recommend) method implemented by the present invention with that of the RK-Means (Random K-Means) method on different data sets;
Fig. 3 compares the recommendation accuracy of the CCVR (Core user for Clustering interesting Vector for Recommend) method implemented by the present invention with that of the RCVR (Random user for Clustering interesting Vector for Recommend Algorithm), CCRR (Core user for Clustering Random for Recommend Algorithm) and RCRR (Random user for Clustering Random for Recommend Algorithm) methods on different data sets.
Embodiment
The present invention provides a recommendation method based on user clustering (CCVR). It mainly addresses how to make recommendations, by predicting which labels a user is interested in so that recommendations are effective.
First, to make recommendations, an attribute must be chosen as their basis: relations, friends, or interests. Since the aim is to recommend content the user is interested in, the invention chooses the interest attribute. How to obtain the user's interests, however, is the first problem to be solved.
Second, once the user's interests are obtained, what mechanism should be used to recommend is the second problem the invention solves. After the user's interests are acquired, each user's interest values are quantified into specific numbers that reflect the user's relative interest in each label; in other words, the content labels the user is interested in are obtained quantitatively. It remains to determine which other, not-yet-followed content labels the user may also be interested in, so that they can be recommended.
The present invention provides a solution to these problems. The technical solution of the invention is further described below with reference to the accompanying drawings and embodiments.
The present invention studies recommendation methods and proposes a recommendation method based on user clustering, whose implementation comprises three parts. The detailed implementation of the embodiment is as follows:
1. Extract interest features and construct the interest matrix. The user's interest feature vector is obtained through the following definitions:
Definition 1 (browsing frequency). Let n be the number of visits by the user to the j-th theme label; then n is the user's total browsing frequency for the j-th theme label, denoted f, i.e. f=n.
Definition 2 (browsing time). The browsing time of the user's k-th visit to the j-th theme label is denoted $t_{j,k}$, $k=1,2,\dots,n$, and the total browsing time of the n visits to label j is denoted T.
Definition 3 (effective browsing frequency). Let $t_{\min}$ and $t_{\max}$ be the thresholds for the user's minimum and maximum browsing time on a label. If $t_{\min}\le t_{j,k}\le t_{\max}$, the user's k-th visit to the j-th theme label is effective, and the number of effective visits among the user's n visits to the j-th theme label is the user's effective browsing frequency for that label, denoted $e_f$. In a specific implementation, those skilled in the art can preset the values of $t_{\min}$ and $t_{\max}$.
Definition 4 (effective browsing time). The sum of the browsing times of the $e_f$ effective visits is called the user's effective browsing time for the j-th theme label, denoted $e_t$.
Definition 5 (interest degree). The degree of the user's interest in the j-th theme label is denoted $d_j$, where:
[formula not reproduced in the source text]
In this formula a parameter is used whose definition is not reproduced in the source text; F1 is the sum of the user's browsing frequencies over all labels; ps is the system parameter interest-period coefficient; and the formula uses the user's average browsing time for the j-th theme label (T/n) and the user's average effective browsing time for the j-th theme label ($e_t/e_f$). In a specific implementation, the value of ps can be preset by the user and is generally set to an empirical value.
Let $c_b$ denote the set of theme labels in C that user $u_i$ has not browsed and $c_a$ the set of labels the user has browsed. The interest degree of $u_i$ in any theme label in $c_b$ is 0. In summary, the interest degree of user $u_i$ in any theme label $s_j$ in C is:
$$V_{i,j}=\begin{cases}d_j, & s_j\in c_a\\ 0, & s_j\in c_b\end{cases}\qquad(2)$$
On this basis an α×β interest-degree matrix is built, where α is the number of users and β is the number of theme labels in C; row i represents the i-th user ($i=1,2,\dots,\alpha$) and column j represents the j-th theme label ($j=1,2,\dots,\beta$). The interest vector of user $u_i$ is $v_i=(V_{i,1},V_{i,2},\dots,V_{i,\beta})$ and the j-th theme label is denoted $s_j$, so each entry $V_{i,j}$ of the matrix represents the interest level of user $u_i$ in theme label $s_j$. Computing all the values yields the user interest matrix:
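$$
\begin{array}{c|cccccc}
 & s_1 & s_2 & \cdots & s_j & \cdots & s_\beta\\ \hline
u_1 & V_{1,1} & V_{1,2} & \cdots & V_{1,j} & \cdots & V_{1,\beta}\\
u_2 & V_{2,1} & V_{2,2} & \cdots & V_{2,j} & \cdots & V_{2,\beta}\\
\vdots & \vdots & \vdots & & \vdots & & \vdots\\
u_i & V_{i,1} & V_{i,2} & \cdots & V_{i,j} & \cdots & V_{i,\beta}\\
\vdots & \vdots & \vdots & & \vdots & & \vdots\\
u_\alpha & V_{\alpha,1} & V_{\alpha,2} & \cdots & V_{\alpha,j} & \cdots & V_{\alpha,\beta}
\end{array}
$$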
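Formula (2) and the interest matrix can be sketched in Python as follows, assuming the interest degrees $d_j$ of each user's browsed labels ($c_a$) are supplied in a dictionary keyed by (user, label); every absent pair is treated as a label in $c_b$ and gets 0. The function name `interest_matrix` and the sample values are illustrative only.

```python
def interest_matrix(users, labels, interest):
    """Build the alpha x beta interest matrix using formula (2).

    users:    user ids u_1..u_alpha (matrix rows)
    labels:   theme labels s_1..s_beta (matrix columns)
    interest: dict (user, label) -> d_j for the labels the user browsed (c_a)
    """
    # V[i][j] = d_j if s_j is in c_a; pairs missing from the dict are in c_b -> 0
    return [[interest.get((u, s), 0.0) for s in labels] for u in users]


users = ["u1", "u2", "u3"]
labels = ["s1", "s2", "s3", "s4"]
d = {("u1", "s1"): 0.8, ("u1", "s3"): 0.2, ("u2", "s2"): 0.5,
     ("u3", "s1"): 0.6, ("u3", "s2"): 0.1, ("u3", "s4"): 0.3}
V = interest_matrix(users, labels, d)
for row in V:
    print(row)
```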
2. Screen core users to form the core-user set, then cluster all users with the K-means algorithm, using the core users as the centre points.
(1) To cluster all users, core users must first be screened. For each user $u_i$, its interest density value $density(u_i)$ is defined as the proportion of nonzero elements in its interest vector $v_i$; a user $u_i$ whose $density(u_i)$ exceeds the density threshold λ (which those skilled in the art can set to an empirical value, generally 10%) is defined as a core user. The embodiment thus obtains the core-user set by screening:
$$CoreUser=\{u_i \mid u_i\in U,\ density(u_i)>\lambda\}\qquad(3)$$
where U is the user set. The interest matrix formed by the core users' interest vectors is the dense submatrix m′.
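The core-user screening of formula (3) is a simple filter once the interest matrix exists. A minimal sketch, assuming the interest matrix is given as a list of rows (one interest vector per user) and using λ = 0.10 as the default, following the suggested 10%; the function names are hypothetical.

```python
def density(v):
    """Interest density: the proportion of nonzero entries in interest vector v."""
    return sum(1 for x in v if x != 0) / len(v)


def core_users(V, lam=0.10):
    """Indices i of users with density(v_i) > lam, i.e. the set CoreUser of formula (3)."""
    return [i for i, v in enumerate(V) if density(v) > lam]


V = [[0.8, 0.0, 0.2, 0.0],   # density 0.50
     [0.0, 0.5, 0.0, 0.0],   # density 0.25
     [0.6, 0.1, 0.0, 0.3]]   # density 0.75
print(core_users(V, lam=0.3))
```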
(2) Next, the K-means algorithm clusters all users with the core users as centre points. Starting from the core-user set obtained in (1), the first pass takes each core user in the set as a feature vector (centre), computes the Euclidean distance between each non-core user and each core user, and assigns each non-core user to its nearest core user; traversing all users in this way yields a preliminary clustering. The second iteration computes the centre point of each cluster, uses these centre points as the feature vectors, and traverses all users again to obtain new clusters. This second-iteration process is repeated until the algorithm converges, and the resulting clusters are the final clusters.
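The clustering loop is ordinary K-means with deterministic seeding: the centres start at the core users' interest vectors, and iteration stops once the criterion function J changes by less than a threshold (1e-5 in the embodiment). The sketch below illustrates this under those assumptions and is not the patent's own code; `kmeans_from_cores`, the `max_iter` safety cap, and keeping a centre unchanged when its cluster becomes empty are additions of this sketch. For simplicity every user, core users included, is reassigned to its nearest centre in each pass, and the comparison uses squared Euclidean distance, which selects the same nearest centre as Euclidean distance.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two interest vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_from_cores(V, cores, eps=1e-5, max_iter=100):
    """K-means over the interest vectors V, seeded with the core users.

    V:     list of interest vectors (one per user)
    cores: indices of the core users; their vectors are the initial centres
    Stops when |newJ - oldJ| < eps, where J is the criterion function
    (sum of squared distances from each user to its cluster centre).
    Returns (assign, centres): assign[i] is the cluster index of user i.
    """
    if not cores:
        raise ValueError("at least one core user is needed")
    centres = [list(V[i]) for i in cores]
    assign = [0] * len(V)
    new_j, old_j = 0.0, -1.0          # initial values, as in Step14
    iterations = 0
    while abs(new_j - old_j) >= eps and iterations < max_iter:
        iterations += 1
        # Assignment: each user joins its nearest centre.
        assign = [min(range(len(centres)), key=lambda h: sq_dist(v, centres[h]))
                  for v in V]
        # Update: each centre becomes the mean interest vector of its members.
        for h in range(len(centres)):
            members = [V[i] for i in range(len(V)) if assign[i] == h]
            if members:               # an empty cluster keeps its old centre
                centres[h] = [sum(col) / len(members) for col in zip(*members)]
        # Recompute the criterion function J.
        old_j = new_j
        new_j = sum(sq_dist(V[i], centres[assign[i]]) for i in range(len(V)))
    return assign, centres


V = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
assign, centres = kmeans_from_cores(V, cores=[0, 2])
print(assign, centres)
```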
3. After all users have been clustered, the class interest vector of user cluster $R_h$ over the theme label set C can be computed:
$$Rv_h=(RV_{h1},RV_{h2},\dots,RV_{h\beta})$$
where
$$RV_{hj}=\frac{1}{|R_h|}\sum_{x=1}^{w}V^{h}_{x,j}\qquad(4)$$
$|R_h|$ is the number of users in user cluster $R_h$; $u^{h}_{x}$ denotes any user in $R_h$, w is the number of users in $R_h$ and $x=1,2,\dots,w$; $V^{h}_{x,j}$ is the interest degree of user $u^{h}_{x}$ in the j-th theme label; and $RV_{hj}$ is the interest degree of cluster $R_h$ in the j-th theme label.
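Formula (4) is the per-label mean of the members' interest vectors. A minimal Python sketch, assuming cluster assignments are given as a list `assign` of cluster indices (as produced by any K-means routine); the function name is hypothetical, and returning a zero vector for an empty cluster is an assumption, since the text does not cover that case.

```python
def class_interest_vectors(V, assign, k):
    """Formula (4): RV_hj = mean interest of cluster h's members in label j.

    V:      list of interest vectors (one per user)
    assign: assign[i] is the cluster index (0..k-1) of user i
    k:      number of clusters (gamma)
    """
    beta = len(V[0])
    result = []
    for h in range(k):
        members = [V[i] for i in range(len(V)) if assign[i] == h]
        if not members:
            result.append([0.0] * beta)   # empty cluster: assumption of this sketch
        else:
            result.append([sum(col) / len(members) for col in zip(*members)])
    return result


V = [[1.0, 0.0, 0.5], [0.5, 0.5, 0.0], [0.0, 1.0, 1.0]]
print(class_interest_vectors(V, assign=[0, 0, 1], k=2))
```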
The class interest vector obtained in this way represents the interest level of the whole cluster. The interest vector of each user in each cluster $R_h$ is compared with the class interest vector $Rv_h$: if an interest value $V_{i,j}$ in the user's interest vector is greater than or equal to the corresponding interest value $RV_{hj}$ in the class interest vector of that cluster, the theme label $s_j$ corresponding to that interest value is recommended to the user.
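The recommendation rule compares each user's interest values with the class interest vector of the user's own cluster, label by label. The sketch below applies that comparison literally ($V_{i,j}\ge RV_{hj}$ recommends $s_j$); the function name `recommend` and its argument layout are hypothetical.

```python
def recommend(V, assign, class_vectors, label_names):
    """Recommend s_j to user i whenever V[i][j] >= RV_hj of i's cluster h.

    Returns a dict: user index -> list of recommended label names.
    """
    recs = {}
    for i, v in enumerate(V):
        rv = class_vectors[assign[i]]          # class interest vector of i's cluster
        recs[i] = [label_names[j] for j in range(len(v)) if v[j] >= rv[j]]
    return recs


V = [[1.0, 0.0, 0.5], [0.5, 0.5, 0.0]]
class_vectors = [[0.75, 0.25, 0.25]]          # both users are in cluster 0
print(recommend(V, [0, 0], class_vectors, ["s1", "s2", "s3"]))
```

Because the rule is applied exactly as stated, it can also return labels the user already browses; the text does not say whether such labels are filtered out.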
Based on the above design, in a specific implementation those skilled in the art can use computer software technology to run the process automatically.
As shown in Fig. 1, the flow of the embodiment comprises the following steps:
Step1: Input the user set $U=\{u_1,u_2,\dots,u_\alpha\}$ and the theme label set $C=\{s_1,s_2,\dots,s_\beta\}$, and initialize the index i of the user currently being processed to 1; go to Step2;
Step2: Initialize the index j of the label currently being processed to 1, and go to Step3;
Step3: If user $u_i$ has followed label $s_j$, go to Step4; otherwise set $d_j=0$ and go to Step9;
Step4: According to Definition 1, determine the total browsing frequency f of user $u_i$ for label $s_j$, and go to Step5;
Step5: According to Definition 2, determine the browsing time $t_{j,k}$ of the k-th visit by user $u_i$ to label $s_j$ ($k=1,2,\dots,n$) and the total browsing time T, and go to Step6;
Step6: According to Definition 3, determine the effective browsing frequency $e_f$ of user $u_i$ for label $s_j$, and go to Step7;
That is, with $t_{\min}$ and $t_{\max}$ the thresholds for the user's minimum and maximum browsing time on a label, the k-th visit of $u_i$ to the j-th theme label is effective if $t_{\min}\le t_{j,k}\le t_{\max}$, and the number of effective visits among the n visits of $u_i$ to the j-th theme label is the effective browsing frequency of $u_i$ for that label;
Step7: Sum the browsing times of the $e_f$ effective visits to obtain the effective browsing time $e_t$ of user $u_i$ for label $s_j$, and go to Step8;
Step8: According to Definition 5, determine the interest degree $d_j$ of user $u_i$ in label $s_j$, and go to Step9;
Step9: Compute $V_{i,j}$ according to formula (2), set j=j+1, and go to Step3 if j ≤ β, otherwise go to Step10;
Step10: Set i=i+1; if i ≤ α, go to Step2; otherwise set i=1, initialize the number of core users γ to 0, and go to Step11;
Step11: Obtain the interest density value of user $u_i$ from its interest vector $v_i=(V_{i,1},V_{i,2},\dots,V_{i,\beta})$; if $density(u_i)>\lambda$, mark $u_i$ as a core user and go to Step12; otherwise go to Step13;
Step12: Set γ=γ+1 and go to Step13;
Step13: Set i=i+1; if i ≤ α, go to Step11; otherwise go to Step14;
Step14: Using the γ core users obtained so far, cluster all users with the K-means algorithm; this step takes the γ core users as the initial cluster centres, initializes the variables newJ=0 and oldJ=-1, and goes to Step15;
Step15: Compute fabs(newJ-oldJ), where fabs is the C-language math function that returns the absolute value. If fabs(newJ-oldJ)>=1e-5, go to Step16; otherwise go to Step19;
Here fabs(newJ-oldJ)>=1e-5 means that the absolute value of newJ-oldJ is greater than or equal to 0.00001; it serves as the loop-control condition, and in a specific implementation those skilled in the art can preset this threshold themselves;
Step16: For each remaining user in $U=\{u_1,u_2,\dots,u_\alpha\}$ other than the users serving as cluster centres, compute the Euclidean distance between that user and each cluster centre, find the nearest cluster centre, and assign the user to that cluster; go to Step17;
Step17: Compute the mean of each current user cluster $R_h$, that is, the average of all user interest vectors in each class; this average is the new cluster centre $Z_h$ of the class; go to Step18;
Step18: Set oldJ=newJ, compute the new criterion function value and assign it to newJ (the criterion function is $J=\sum_{h=1}^{\gamma}\sum_{x=1}^{w}\lVert v^{h}_{x}-Z_h\rVert^{2}$, where w is the number of users in user cluster $R_h$, $\lVert\cdot\rVert^{2}$ denotes the squared deviation between two feature vectors, $v^{h}_{x}$ is the interest vector of the x-th user in $R_h$, and $Z_h$ is the cluster centre of the corresponding class); go to Step15;
Step19: Once the clustering in the preceding steps is complete, γ classes $R_1,R_2,\dots,R_\gamma$ are obtained; go to Step20;
Step20: Initialize the index h of the class currently being processed to 1, and go to Step21;
Step21: Compute the class interest vector $Rv_h=(RV_{h1},RV_{h2},\dots,RV_{h\beta})$ of this class according to formula (4), and go to Step22;
Step22: Set h=h+1; if h ≤ γ, go to Step21, otherwise go to Step23;
Step23: The preceding steps have now produced the class interest vectors $Rv_1,Rv_2,\dots,Rv_\gamma$ of the γ classes; set h=1 and go to Step24;
Step24: Recommend theme labels to each user in user cluster $R_h$. Let any user in $R_h$ be $u_i$ of $U=\{u_1,u_2,\dots,u_\alpha\}$; its interest vector $v_i=(V_{i,1},V_{i,2},\dots,V_{i,\beta})$ is compared with the class interest vector $Rv_h=(RV_{h1},RV_{h2},\dots,RV_{h\beta})$ of class $R_h$ item by item for j from 1 to β, and if $V_{i,j}\ge RV_{hj}$, theme label $s_j$ is recommended to the user; go to Step25;
Step25: Set h=h+1; if h ≤ γ, go to Step24, otherwise go to Step26;
Step26: Recommendation has been completed for every user in $U=\{u_1,u_2,\dots,u_\alpha\}$; end.
To illustrate the technical effect of the invention and show that the recommendation method works well, related experiments were carried out from two aspects.
First, the user clustering quality of the two methods CCVR and RK-Means is compared using Di.
CCVR: Core user for Clustering interesting Vector for Recommend, the recommendation method proposed by the present invention;
RK-Means: Random K-Means, which uses randomly chosen users as the centre points and then clusters with the k-means algorithm;
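To compare the clustering quality of the two algorithms, the concept of user interest diversity (Di) is introduced.
The interest distance between two clusters with class interest vectors $Rv_{h1}$ and $Rv_{h2}$ is measured by the cosine distance:
[formula not reproduced in the source text]
The diversity Di of the clustering $R=\{R_1,R_2,\dots,R_\gamma\}$ of all users over the theme label set C is then:
[formula not reproduced in the source text]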
The larger Di is, the more the interests of the classes differ; this makes the interest features of each cluster more distinct and improves the accuracy of interest prediction.
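Neither the cosine-distance formula nor the Di formula is reproduced in this text. The sketch below assumes the common definition of cosine distance as 1 minus the cosine similarity of two class interest vectors; the aggregate `mean_pairwise_distance` (average distance over all pairs of clusters) is a hypothetical stand-in for Di, not the patent's definition.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """1 - cosine similarity of two class interest vectors (assumed definition)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0   # all-zero vector: treated as orthogonal (assumption)
    return 1.0 - dot / (na * nb)


def mean_pairwise_distance(class_vectors):
    """Hypothetical stand-in for Di: mean cosine distance over all cluster pairs."""
    pairs = list(combinations(class_vectors, 2))
    if not pairs:
        return 0.0
    return sum(cosine_distance(a, b) for a, b in pairs) / len(pairs)


print(mean_pairwise_distance([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]))
```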
The Di results are shown in Fig. 2, where the abscissa is the data set (number of users × number of theme labels) and the ordinate is Di. The figure shows that the clusters produced by CCVR have more distinct interest features, which makes the class interest vectors more representative and the recommendations more accurate.
Second, the methods are compared in terms of recommendation accuracy, defined as the ratio of the number of recommendation hits to the total number of recommendations.
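The accuracy measure is the ratio of recommendation hits to total recommendations. The text does not say what counts as a hit, so this sketch assumes a held-out set of relevant labels per user and pools hits and recommendations over all users; the function name and data layout are hypothetical.

```python
def recommendation_accuracy(recommended, relevant):
    """Accuracy = recommendation hits / total recommendations, pooled over users.

    recommended: dict user -> list of recommended labels
    relevant:    dict user -> set of labels counted as hits (assumed ground truth)
    """
    total = sum(len(r) for r in recommended.values())
    hits = sum(1 for u, r in recommended.items()
               for s in r if s in relevant.get(u, set()))
    return hits / total if total else 0.0


recommended = {"u1": ["s1", "s3"], "u2": ["s2"]}
relevant = {"u1": {"s1"}, "u2": {"s2", "s4"}}
print(recommendation_accuracy(recommended, relevant))
```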
RCVR (Random user for Clustering interesting Vector for Recommend Algorithm): K-Means clustering seeded with random users, with recommendation by class interest vector.
CCRR (Core user for Clustering Random for Recommend Algorithm): K-Means clustering seeded with core users, with random recommendation.
RCRR (Random user for Clustering Random for Recommend Algorithm): K-Means clustering seeded with random users, with random recommendation.
The accuracy results are shown in Fig. 3, where the abscissa is the data set (number of users × number of theme labels) and the ordinate is the accuracy. The figure shows that the recommendation accuracy of the CCVR method proposed by the invention is higher than that of the other methods.
The specific embodiments described herein merely illustrate the spirit of the invention. Those skilled in the art to which the invention pertains may make various modifications or additions to the described embodiments, or substitute them in similar ways, without departing from the spirit of the invention or exceeding the scope defined by the appended claims.
Claims (2)
1. A recommendation method based on user clustering, characterized by comprising the following steps:
Step1: input the user set $U = \{u_1, u_2, \ldots, u_\alpha\}$ and the theme label set $C = \{s_1, s_2, \ldots, s_\beta\}$, where $\alpha$ is the number of users and $\beta$ is the number of theme labels in $C$; initialize the index $i$ of the user being processed to 1 and go to Step2;
Step2: initialize the index $j$ of the label being processed to 1 and go to Step3;
Step3: if user $u_i$ has followed theme label $s_j$, go to Step4; otherwise set the user's interest degree in the j-th theme label to $d_j = 0$ and go to Step9;
Step4: from the number of visits $n$ of user $u_i$ to theme label $s_j$, set the total browsing frequency of $u_i$ for $s_j$ to $f = n$ and go to Step5;
Step5: determine the k-th browsing time $t_{j,k}$ of user $u_i$ on theme label $s_j$, for $k = 1, 2, \ldots, n$, and the total browsing time $T$; go to Step6;
Step6: determine the effective browsing frequency $e_f$ of user $u_i$ for theme label $s_j$ and go to Step7.
It is determined as follows: $t_{min}$ and $t_{max}$ are preset thresholds for the minimum and maximum browsing time of a label by user $u_i$; if $t_{min} \le t_{j,k} \le t_{max}$, the k-th browse of the j-th theme label by $u_i$ is effective. The number of effective browses among the $n$ browses of the j-th theme label by $u_i$ is the effective browsing frequency of $u_i$ for that label;
Step7: sum the browsing times of the $e_f$ effective browses to obtain the effective browsing time $e_t$ of user $u_i$ for theme label $s_j$; go to Step8;
Step8: compute the interest degree $d_j$ of user $u_i$ for theme label $s_j$ according to the following formula, and go to Step9;
where F1, which appears in a parameter of the formula, is the sum of the user's browsing frequencies over all theme labels; Ps is a preset system parameter, the interest time coefficient; and the two remaining terms are the user's average browsing time and average effective browsing time for the j-th theme label;
Step9: denote by $c_b$ the set of theme labels in $C$ that user $u_i$ has not browsed and by $c_a$ the set of browsed labels; compute $V_{i,j}$, the interest level of user $u_i$ in theme label $s_j$, according to the following formula; the interest vector of user $u_i$ is $v_i = (V_{i,1}, V_{i,2}, \ldots, V_{i,\beta})$. Set $j = j + 1$; if $j \le \beta$, go to Step3, otherwise go to Step10;
Step10: set $i = i + 1$; if $i \le \alpha$, go to Step2; otherwise set $i = 1$, initialize the number of core users $\gamma$ to 0, and go to Step11;
Step11: obtain the interest density value $density(u_i)$ from the proportion of non-zero elements in the interest vector $v_i = (V_{i,1}, V_{i,2}, \ldots, V_{i,\beta})$ of user $u_i$; if $density(u_i) > \lambda$, mark $u_i$ as a core user and go to Step12; otherwise go to Step13; here $\lambda$ is a preset density threshold;
Step12: set $\gamma = \gamma + 1$ and go to Step13;
Step13: set $i = i + 1$; if $i \le \alpha$, go to Step11; otherwise go to Step14;
Step14: cluster all users with the K-means algorithm, using the $\gamma$ core users now obtained as the initial cluster centres; initialize the variables $newJ = 0$ and $oldJ = -1$, and go to Step15;
Step15: compute $fabs(newJ - oldJ)$, where the function fabs returns the absolute value; if $fabs(newJ - oldJ)$ is greater than or equal to the preset threshold for this absolute value, go to Step16; otherwise go to Step19;
Step16: for each remaining user in the user set $U = \{u_1, u_2, \ldots, u_\alpha\}$ other than the users serving as cluster centres, compute the Euclidean distance between that user and each cluster centre, and assign the user to the cluster of the nearest centre; go to Step17;
Step17: compute the mean of the interest vectors of all users in each user cluster $R_h$ and use it as the new cluster centre $Z_h$ of $R_h$; go to Step18;
Step18: set $oldJ = newJ$, compute the new criterion function value, assign it to $newJ$, and go to Step15;
Step19: the $\gamma$ user clusters $R_1, R_2, \ldots, R_\gamma$ are now obtained; go to Step20;
Step20: initialize the index $h$ of the cluster being processed to 1 and go to Step21;
Step21: compute the class interest vector $Rv_h = (RV_{h1}, RV_{h2}, \ldots, RV_{h\beta})$ of this cluster according to the following formula, and go to Step22;
where $|R_h|$ denotes the number of users in user cluster $R_h$; a user of $R_h$ is indexed by a subscript taking the values 1, 2, …, w, where w is the number of users in $R_h$; the remaining term is that user's interest degree in the j-th theme label; and $RV_{hj}$ denotes the interest degree of cluster $R_h$ in the j-th theme label, for $j = 1, 2, \ldots, \beta$;
Step22: set $h = h + 1$; if $h \le \gamma$, go to Step21; otherwise go to Step23;
Step23: the class interest vectors of the $\gamma$ clusters, $Rv_1, Rv_2, \ldots, Rv_\gamma$, are now available; set $h = 1$ and go to Step24;
Step24: recommend theme labels to each user in user cluster $R_h$: for a user of $R_h$ who is user $u_i$ of the user set $U = \{u_1, u_2, \ldots, u_\alpha\}$, compare each element $V_{i,j}$ of the interest vector $v_i = (V_{i,1}, V_{i,2}, \ldots, V_{i,\beta})$ with the corresponding interest value $RV_{hj}$ of the class interest vector $Rv_h = (RV_{h1}, RV_{h2}, \ldots, RV_{h\beta})$ of $R_h$; if $V_{i,j} \ge RV_{hj}$, recommend theme label $s_j$ to the user; go to Step25;
Step25: set $h = h + 1$; if $h \le \gamma$, go to Step24; otherwise go to Step26;
Step26: automatic recommendation has been completed for every user in the user set $U = \{u_1, u_2, \ldots, u_\alpha\}$; end.
2. The recommendation method based on user clustering according to claim 1, characterized in that, in Step18, the criterion function is calculated as follows:
where w denotes the number of users in user cluster $R_h$; the squared term denotes the square of the deviation between two feature vectors, namely the interest vector of a user in cluster $R_h$ and $Z_h$, the cluster centre of the corresponding cluster.
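The claimed procedure can be sketched roughly in Python as follows. This is a minimal, non-authoritative sketch under several assumptions: the interest vectors $v_i$ are taken as given inputs, because the formulas for $d_j$ (Step8) and $V_{i,j}$ (Step9) are not reproduced in the text; the Step21 class interest vector is assumed to be the arithmetic mean of the members' interest vectors (the same averaging Step17 uses for $Z_h$); the claim-2 criterion is assumed to be the usual k-means sum of squared Euclidean deviations; and Step16 is simplified to reassign every user, including the seeding core users, which does not change the first assignment because a seed is at distance zero from its own centre. The names `browsing_stats`, `density`, `cluster_from_core_users`, `recommend`, `eps`, and `max_iter` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def browsing_stats(times: Sequence[float], t_min: float,
                   t_max: float) -> Tuple[int, float, int, float]:
    """Steps 4-7 for one (user, label) pair; times holds t_{j,1}..t_{j,n}.

    Returns (f, T, e_f, e_t).
    """
    f = len(times)                                   # Step4: f = n
    total = sum(times)                               # Step5: T
    eff = [t for t in times if t_min <= t <= t_max]  # Step6: effective visits
    return f, total, len(eff), sum(eff)              # e_f and e_t (Step7)


def density(v: Sequence[float]) -> float:
    """Step11: proportion of non-zero entries in an interest vector."""
    return sum(1 for x in v if x != 0) / len(v)


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (same nearest centre as Euclidean)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vs: List[Sequence[float]]) -> List[float]:
    """Element-wise average of equally long vectors."""
    return [sum(col) / len(vs) for col in zip(*vs)]


def cluster_from_core_users(vectors: List[Sequence[float]], lam: float,
                            eps: float = 1e-9, max_iter: int = 100):
    """Steps 11-19: core users seed K-means; J is the assumed claim-2 criterion."""
    core = [i for i, v in enumerate(vectors) if density(v) > lam]  # Steps 11-13
    if not core:
        raise ValueError("no user has interest density above lambda")
    centers = [list(vectors[i]) for i in core]       # Step14: seeds
    labels = [0] * len(vectors)
    new_j, old_j = 0.0, -1.0
    for _ in range(max_iter):                        # max_iter: added safeguard
        if abs(new_j - old_j) < eps:                 # Step15: converged
            break
        # Step16 (simplified): nearest centre for every user, seeds included.
        labels = [min(range(len(centers)), key=lambda h: sq_dist(v, centers[h]))
                  for v in vectors]
        # Step17: new centre Z_h = mean interest vector of cluster R_h.
        for h in range(len(centers)):
            members = [v for v, lab in zip(vectors, labels) if lab == h]
            if members:                              # keep old centre if empty
                centers[h] = mean(members)
        # Step18: oldJ = newJ, then J = sum of squared deviations to Z_h.
        old_j = new_j
        new_j = sum(sq_dist(v, centers[lab]) for v, lab in zip(vectors, labels))
    return labels, centers


def recommend(vectors: List[Sequence[float]], labels: List[int],
              n_clusters: int) -> Dict[int, List[int]]:
    """Steps 20-25: class interest vector (assumed mean), then V_ij >= RV_hj."""
    recs: Dict[int, List[int]] = {}
    for h in range(n_clusters):
        members = [i for i, lab in enumerate(labels) if lab == h]
        if not members:
            continue
        rv = mean([vectors[i] for i in members])     # Step21 (assumed average)
        for i in members:                            # Step24, literal rule
            recs[i] = [j for j, (vij, rvj) in enumerate(zip(vectors[i], rv))
                       if vij >= rvj]
    return recs
```

For example, with the interest vectors (3, 2, 1, 0), (2, 3, 0, 0), (0, 1, 3, 2), (0, 0, 2, 3) and λ = 0.6, only the first and third users (density 0.75) are core users, and the loop settles on the clusters {first, second} and {third, fourth} after two assignment passes. Read literally, the Step24 rule also recommends a label when both $V_{i,j}$ and $RV_{hj}$ are zero; the sketch keeps that literal reading.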
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410565721.7A CN104268290B (en) | 2014-10-22 | 2014-10-22 | A kind of recommendation method based on user clustering |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410565721.7A CN104268290B (en) | 2014-10-22 | 2014-10-22 | A kind of recommendation method based on user clustering |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104268290A | 2015-01-07 |
CN104268290B | 2017-08-08 |
Family
ID=52159811
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410565721.7A Expired - Fee Related CN104268290B (en) | 2014-10-22 | 2014-10-22 | A kind of recommendation method based on user clustering |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104268290B (en) |
Families Citing this family (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106033415B (en) * | 2015-03-09 | 2020-07-03 | 深圳市腾讯计算机系统有限公司 | Text content recommendation method and device |
CN105808698B (en) * | 2016-03-03 | 2020-04-07 | 江苏大学 | TOP-k position interest point recommendation method facing internet of things user query request |
CN105824942A (en) * | 2016-03-21 | 2016-08-03 | 上海珍岛信息技术有限公司 | Item recommendation method and system based on collaborative filtering algorithm |
CN106055713B (en) * | 2016-07-01 | 2019-10-18 | 华南理工大学 | Social network user recommended method based on user interest and social subject distillation |
CN107608992A (en) * | 2016-07-12 | 2018-01-19 | 上海视畅信息科技有限公司 | A kind of personalized recommendation method based on time shaft |
CN106484795A (en) * | 2016-09-22 | 2017-03-08 | 天津大学 | A kind of interest based on non-structured web page data recommends method |
CN107122805A (en) * | 2017-05-15 | 2017-09-01 | 腾讯科技(深圳)有限公司 | A kind of user clustering method and apparatus |
CN107480217A (en) * | 2017-07-31 | 2017-12-15 | 陕西识代运筹信息科技股份有限公司 | A kind of information processing method and device based on social data |
CN107948257B (en) * | 2017-11-13 | 2019-10-01 | 苏州达家迎信息技术有限公司 | The method for pushing and computer readable storage medium of APP |
CN107943895A (en) * | 2017-11-16 | 2018-04-20 | 百度在线网络技术(北京)有限公司 | Information-pushing method and device |
CN108921398B (en) * | 2018-06-14 | 2020-12-11 | 口口相传(北京)网络技术有限公司 | Shop quality evaluation method and device |
CN109087711A (en) * | 2018-06-28 | 2018-12-25 | 郑州大学第一附属医院 | Medical big data method for digging and system |
CN108876407B (en) * | 2018-06-28 | 2022-04-19 | 联想(北京)有限公司 | Data processing method and electronic equipment |
CN109903082B (en) * | 2019-01-24 | 2022-10-28 | 平安科技(深圳)有限公司 | Clustering method based on user portrait, electronic device and storage medium |
CN110517114A (en) * | 2019-08-21 | 2019-11-29 | 广州云徙科技有限公司 | A kind of information-pushing method and system based on community discovery algorithm |
CN111695039A (en) * | 2020-06-12 | 2020-09-22 | 江苏海洋大学 | Personalized recommendation method based on multi-objective optimization |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101694659A (en) * | 2009-10-20 | 2010-04-14 | 浙江大学 | Individual network news recommending method based on multitheme tracing |
CN103235824A (en) * | 2013-05-06 | 2013-08-07 | 上海河广信息科技有限公司 | Method and system for determining web page texts users interested in according to browsed web pages |
Non-Patent Citations (1)
Title |
---|
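An improved recommendation algorithm based on collaborative filtering and partitional clustering; Wu Hongchen et al.; Journal of Computer Research and Development; 20110915; Vol. 48, No. S3; full text * |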
Also Published As
Publication number | Publication date |
---|---|
CN104268290A (en) | 2015-01-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104268290B (en) | A kind of recommendation method based on user clustering | |
Wu et al. | Learning fair representations for recommendation: A graph-based perspective | |
Yue | An extended TOPSIS for determining weights of decision makers with interval numbers | |
Dou et al. | A survey of collaborative filtering algorithms for social recommender systems | |
CN111160954B (en) | Recommendation method facing group object based on graph convolution network model | |
CN110430471A (en) | It is a kind of based on the television recommendations method and system instantaneously calculated | |
CN105723402A (en) | Systems and methods for determining influencers in a social data network | |
Pérez-Marcos et al. | Hybrid system for video game recommendation based on implicit ratings and social networks | |
CN107145541B (en) | Social network recommendation model construction method based on hypergraph structure | |
Liu et al. | Personal recommendation via unequal resource allocation on bipartite networks | |
Wang et al. | A fog-based recommender system | |
Tran et al. | Collaborative filtering via sparse Markov random fields | |
Mahmood et al. | Influence model and doubly extended TOPSIS with TOPSIS based matrix of interpersonal influences | |
Tian et al. | A survey of personalized recommendation based on machine learning algorithms | |
Qin et al. | Towards a personalized movie recommendation system: A deep learning approach | |
Gao et al. | Deep learning with consumer preferences for recommender system | |
Lian et al. | Personalized recommendation via an improved NBI algorithm and user influence model in a Microblog network | |
Lv et al. | Measuring geospatial properties: Relating online content browsing behaviors to users’ points of interest | |
Bang et al. | Collective matrix factorization using tag embedding for effective recommender system | |
Van Ma et al. | Fuzzy Decision Making-based Recommendation Channel System using the Social Network Database | |
Amini et al. | Proposing a new hybrid approach in movie recommender system | |
Tian et al. | Common features based volunteer and voluntary activity recommendation algorithm | |
Huang et al. | Collaborative filtering of web service based on mapreduce | |
Xie et al. | Correlation-based top-k recommendation for web services | |
Zhou et al. | Personalized recommendation algorithm based on user preference and user profile |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | C06 | Publication | |
 | PB01 | Publication | |
 | C10 | Entry into substantive examination | |
 | SE01 | Entry into force of request for substantive examination | |
 | GR01 | Patent grant | |
 | CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20170808; Termination date: 20181022 |