CN107092924A - A personalized recommendation algorithm based on a clustering algorithm with an adaptively growing number of clusters - Google Patents

Info

Publication number
CN107092924A
Authority
CN
China
Prior art keywords
cluster
algorithm
user
clusters
algorithms
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710177780.0A
Other languages
Chinese (zh)
Inventor
杨波
袁磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201710177780.0A priority Critical patent/CN107092924A/en
Publication of CN107092924A publication Critical patent/CN107092924A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23211 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with adaptive number of clusters

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a personalized recommendation algorithm based on a clustering algorithm in which the number of clusters grows adaptively. The algorithm comprises two stages: cluster-based incremental learning and recommendation. The cluster-based incremental learning stage has three parts: 1) clustering with the MWOSK-means algorithm provided by the invention; 2) adaptive growth of the number of clusters with the MGSoC algorithm provided by the invention; 3) incremental updating. The recommendation stage builds on the result of the previous stage and performs personalized recommendation with a collaborative filtering recommendation algorithm that incorporates user weights. Compared with existing recommendation algorithms that cluster first and then apply collaborative filtering, the algorithm provided by the invention is more accurate, determines the number of clusters adaptively, and is suitable for incremental learning.

Description

A personalized recommendation algorithm based on a clustering algorithm with an adaptively growing number of clusters
Technical field
The present invention relates to personalized recommendation in data mining, and in particular to cluster-based personalized recommendation.
Background art
Personalized recommendation recommends information and goods that a user is likely to be interested in, based on the user's interests and purchasing behavior. Collaborative filtering is a commonly used algorithm in personalized recommendation. Clustering before collaborative filtering helps address problems such as a large search space, insufficient accuracy, and sensitivity to sparse data.
Clustering is the process of grouping objects with high similarity. In personalized recommendation, a clustering technique can first be used to group highly similar objects, and the recommendation algorithm can then use the information in the resulting clusters. However, most current personalized recommendation algorithms that cluster first and then apply collaborative filtering support only offline learning and cannot handle incremental learning scenarios in which user, item, and rating information is updated frequently.
Some cluster-then-collaborative-filtering personalized recommendation algorithms that handle incremental learning have been proposed. One shortcoming of these algorithms is that the number of clusters must be specified manually in advance in the clustering stage. The recommendation results are often sensitive to this manually specified number, so a large amount of time must be spent on experiments to determine the optimal number of clusters. Another shortcoming is that their accuracy is not high enough.
Summary of the invention
To address the shortcomings of existing cluster-then-collaborative-filtering personalized recommendation algorithms, the invention provides a personalized recommendation algorithm based on a clustering algorithm whose number of clusters grows adaptively. The algorithm comprises two stages, cluster-based incremental learning and recommendation. The cluster-based incremental learning stage uses the MWOSK-means (Modified Weighted Online Spherical K-means) algorithm and the MGSoC (Modified Growing Self-Organizing Cluster) algorithm provided by the invention. The MWOSK-means algorithm makes full use of item information to assist the calculation of user weights, which improves the accuracy of personalized recommendation. The MGSoC algorithm grows the number of clusters adaptively, which to some extent solves the prior-art problem that the number of clusters must be specified manually in advance and that a large amount of time is needed to determine the optimal number of clusters.
The personalized recommendation algorithm provided by the invention is suited to incremental learning scenarios in which information (such as users, items, and ratings) is updated frequently. Compared with existing personalized recommendation algorithms, it achieves higher accuracy and reduces the time needed to determine the optimal number of clusters.
The present invention includes the following:
1. A personalized recommendation algorithm based on a clustering algorithm with an adaptively growing number of clusters
The algorithm comprises two stages, cluster-based incremental learning and recommendation; see Fig. 1.
2. A clustering algorithm with an adaptively growing number of clusters, based on the MWOSK-means and MGSoC algorithms
The incremental learning stage of the personalized recommendation algorithm uses a clustering algorithm provided by the invention, based on the MWOSK-means and MGSoC algorithms, in which the number of clusters grows adaptively (see P1 in Fig. 1). It has three parts: clustering with the MWOSK-means algorithm (see S2 in Fig. 1; details in Fig. 3), adaptive growth of the number of clusters with the MGSoC algorithm (see P1.1 in Fig. 1; details in Fig. 4 and Fig. 5), and incremental updating (see S6 in Fig. 1; details in Fig. 6, Fig. 7, Fig. 8, and Fig. 9).
3. A new method for calculating item weights and user weights
The prior art does not consider the influence of item weights when calculating user weights. The invention provides the MWOSK-means algorithm, whose initialization phase (see S1.5 in Fig. 2) uses a new item weight calculation method provided by the invention; see formulas (1) and (2). Building on this item weight calculation, the invention provides a new user weight calculation method that takes item weights into account; see formula (3).
4. A new method for judging whether the number of clusters is suitable during clustering
In the part of item 2 above that grows the number of clusters adaptively with the MGSoC algorithm (see P1.1 in Fig. 1; details in Fig. 4 and Fig. 5), the invention provides, within the MGSoC algorithm, a new method for judging whether the number of clusters is suitable during clustering (S3 and S4 in Fig. 1; S3 is detailed in Fig. 4).
5. A new method for calculating the initial position of a newly added cluster center
In the same part of item 2 above (see P1.1 in Fig. 1; details in Fig. 4 and Fig. 5), if the number of clusters is judged to be unsuitable during clustering, a new method provided by the invention within the MGSoC algorithm calculates the initial position of the new cluster center (S5 in Fig. 1; details in Fig. 5).
Brief description of the drawings
Fig. 1 is a flow chart of the personalized recommendation algorithm provided by the invention, based on a clustering algorithm with an adaptively growing number of clusters.
Fig. 2 is the flow chart of S1 in Fig. 1.
Fig. 3 is the flow chart of S2 in Fig. 1.
Fig. 4 is the flow chart of S3 in Fig. 1.
Fig. 5 is the flow chart of S5 in Fig. 1.
Fig. 6 is the flow chart of S6 in Fig. 1.
Fig. 7 is the flow chart of S6.1 in Fig. 6.
Fig. 8 is the flow chart of S6.2 in Fig. 6.
Fig. 9 is the flow chart of S6.3 in Fig. 6.
Symbols used in the present invention:
σ(u_i): standard deviation of u_i
exp(·): exponential function with base e
η: learning rate of the cluster centers during clustering
α: convergence threshold used to judge whether the cluster centers have converged
β: error threshold used to judge whether the current number of clusters is suitable
1: a vector whose elements are all 1 and whose dimension matches the vector it is multiplied with
T: matrix transpose
n: number of users
p: number of items
d: length of the interval after linear mapping
l: lower bound of the interval after linear mapping
A: set of users to receive recommendations
s(u): vector normalization. Example: for an n-dimensional vector u = (u_1, u_2, ..., u_n) with norm ||u|| = sqrt(u_1^2 + u_2^2 + ... + u_n^2), s(u) = u / ||u||
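The normalization s(u) from the symbol list can be sketched as follows. This is a minimal illustration that assumes the Euclidean norm; the function name `normalize` and the handling of an all-zero vector are assumptions, not taken from the patent.

```python
import math


def normalize(u):
    """Return s(u): the vector u scaled to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in u))
    if norm == 0.0:
        # Assumption: an all-zero vector is returned unchanged
        # to avoid dividing by zero.
        return list(u)
    return [x / norm for x in u]


unit = normalize([3.0, 4.0])  # norm is 5, so the result is [0.6, 0.8]
```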
Detailed description
The personalized recommendation algorithm disclosed by the invention, based on a clustering algorithm with an adaptively growing number of clusters, comprises two stages: cluster-based incremental learning and recommendation.
The overall flow chart of the personalized recommendation algorithm is shown in Fig. 1.
The embodiment of the invention is described in detail below with reference to the drawings.
1. Initialization
This part corresponds to S1 in Fig. 1; the detailed flow chart is shown in Fig. 2.
S1: Initialization
S1.1: Initialize the parameter set
1) Assign the number of users n and the number of items p according to the actual data;
2) The user of the algorithm specifies the initial number of clusters K, the learning rate η, the convergence threshold α, the error threshold β, the length d of the interval after linear mapping, the lower bound l of the interval after linear mapping, and the set A of users to receive recommendations.
S1.2: Initialize the matrix M that indicates whether a user has rated an item
The n-row, p-column matrix M = (m_ij) indicates whether user i has rated item j, where i ∈ {1, 2, 3, ..., n} and j ∈ {1, 2, 3, ..., p}. m_ij = 1 means that user i has rated item j; m_ij = 0 means that user i has not rated item j.
S1.3: Initialize the user-item rating matrix U
1) The n-row, p-column matrix U = (u_ij) is the user-item rating matrix, where i ∈ {1, 2, 3, ..., n} and j ∈ {1, 2, 3, ..., p}. The element in row i, column j is the rating given by user i to item j. u_i = (u_i1, u_i2, ..., u_ij, ..., u_ip) denotes the ratings given by user i to all p items;
2) Normalize U row by row, i.e. apply s(u_i) for i ∈ {1, 2, 3, ..., n};
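Steps S1.2 and S1.3 can be sketched as below. The sketch assumes that an unrated entry is stored as 0 in the raw rating table; the names `rated_mask` and `normalize_rows` are hypothetical, and rows without any rating are left unchanged by assumption.

```python
import math


def rated_mask(ratings):
    """Build M: m_ij = 1 if user i has rated item j, else 0.

    Assumption: a raw rating of 0 means "not rated".
    """
    return [[1 if r != 0 else 0 for r in row] for row in ratings]


def normalize_rows(ratings):
    """Apply s(u_i) to every row so each user's vector has unit length."""
    out = []
    for row in ratings:
        norm = math.sqrt(sum(r * r for r in row))
        # Assumption: a user with no ratings keeps an all-zero row.
        out.append([r / norm for r in row] if norm > 0 else list(row))
    return out


raw = [[5, 0, 3], [0, 4, 0]]  # 2 users, 3 items
M = rated_mask(raw)
U = normalize_rows(raw)
```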
S1.4: Initialize the user-to-cluster membership matrix Z
1) Construct the n-row, K-column matrix Z = (z_ik) as the user-to-cluster membership matrix, where i ∈ {1, 2, 3, ..., n} and k ∈ {1, 2, 3, ..., K}. z_ik ∈ {0, 1}; z_ik = 1 means that user i belongs to cluster k, and z_ik = 0 means that user i does not belong to cluster k;
2) Randomly assign each of the n users to one of the K clusters, so that every user i belongs to exactly one cluster;
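The random initialization of Z in S1.4 can be sketched as follows. The function name `init_membership` and the optional `seed` parameter are assumptions added so the example is reproducible.

```python
import random


def init_membership(n, K, seed=None):
    """Return an n x K 0/1 matrix Z with exactly one 1 per row."""
    rng = random.Random(seed)
    Z = [[0] * K for _ in range(n)]
    for i in range(n):
        # Each user is placed in one uniformly chosen cluster.
        Z[i][rng.randrange(K)] = 1
    return Z


Z = init_membership(n=6, K=3, seed=0)
```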
S1.5: MWOSK-means algorithm initialization
1) Calculate the item weight vector w_item, which holds the weights of the p items. The weight of item j (j ∈ {1, 2, 3, ..., p}) is calculated as follows:
m_,j denotes column j of matrix M.
2) Using d, l, and the following formula, linearly map w_item to a suitable interval:
The formula uses the largest and smallest element values of w_item, the weight of item j before the linear mapping, and the weight of item j after the linear mapping, j ∈ {1, 2, 3, ..., p}. d is the length of the interval after linear mapping, and l is the lower bound of the interval after linear mapping.
3) Calculate the user weight vector w_user, which holds the weights of the n users. The weight of user i (i ∈ {1, 2, 3, ..., n}) is calculated as follows:
where σ(u_i) denotes the standard deviation of u_i.
4) Construct K p-dimensional vectors μ_k (k ∈ {1, 2, 3, ..., K}) to represent the position of the center of each cluster, according to the following formula:
where j ∈ {1, 2, 3, ..., p}.
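The linear mapping in step 2) of S1.5 is described only by its inputs: the largest and smallest weight, the interval length d, and the lower bound l. A plausible reading is ordinary min-max scaling onto the interval [l, l + d]. The sketch below implements that reading as an assumption; it is not a reproduction of formula (2), and the name `linear_map` is hypothetical.

```python
def linear_map(weights, d, l):
    """Min-max scale weights onto the interval [l, l + d] (assumed form)."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        # Assumption: identical weights all map to the lower bound l.
        return [float(l)] * len(weights)
    return [l + d * (w - lo) / (hi - lo) for w in weights]


mapped = linear_map([2.0, 4.0, 6.0], d=1.0, l=0.5)
```

With d = 1.0 and l = 0.5, the smallest weight lands on 0.5 and the largest on 1.5, so no item receives a zero weight.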
2. Clustering
This part corresponds to S2 in Fig. 1; the detailed flow chart is shown in Fig. 3.
S2: Clustering with the MWOSK-means algorithm
S2.1: Traverse all users in U to train the model
For user i (i ∈ {1, 2, 3, ..., n}):
1) Recompute the cluster to which user i belongs:
where k ∈ {1, 2, 3, ..., K}, and closest_k_i denotes the newly computed cluster to which user i belongs, closest_k_i ∈ {1, 2, 3, ..., K}.
2) Update the center of the cluster to which user i now belongs:
The two cluster-center symbols in the formula denote the cluster center after the update and the cluster center before the update, respectively.
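The two steps of S2.1 can be illustrated with a textbook weighted online spherical k-means step: assign the user to the center with the highest cosine similarity, then move that center toward the user and renormalize. This is a sketch of the generic technique under stated assumptions, not the patent's own update formula; the names `closest_cluster` and `update_center` and the exact form of the update are hypothetical.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def closest_cluster(u_i, centers):
    """Index of the center most similar to u_i.

    For unit vectors the dot product equals the cosine similarity.
    """
    sims = [dot(u_i, mu) for mu in centers]
    return max(range(len(centers)), key=sims.__getitem__)


def update_center(mu, u_i, w_i, eta):
    """Assumed update: mu <- s(mu + eta * w_i * u_i)."""
    moved = [m + eta * w_i * x for m, x in zip(mu, u_i)]
    return normalize(moved)


centers = [[1.0, 0.0], [0.0, 1.0]]
u = normalize([1.0, 3.0])
k = closest_cluster(u, centers)
centers[k] = update_center(centers[k], u, w_i=1.0, eta=0.1)
```

Renormalizing after each step keeps every center on the unit sphere, which is what lets the dot product stand in for cosine similarity on the next assignment.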
S2.2: Calculate the degree of convergence h
The degree of convergence h is calculated with the following formula:
μ'_k denotes the k-th cluster center after step S2.1 (traversing all users in U to train the model), and μ_k denotes the k-th cluster center before S2.1, k ∈ {1, 2, 3, ..., K}.
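One way to turn the before and after centers of S2.2 into a single number is to sum how far each center moved during the pass. The sketch below uses the total Euclidean displacement and treats h < α as converged; both choices are assumptions made for illustration, since the source formula for h is not reproduced in this text.

```python
import math


def convergence_degree(old_centers, new_centers):
    """Assumed measure: total Euclidean displacement of all centers."""
    total = 0.0
    for mu, mu_new in zip(old_centers, new_centers):
        total += math.sqrt(sum((a - b) ** 2 for a, b in zip(mu, mu_new)))
    return total


def has_converged(h, alpha):
    # Assumption: a small displacement (h below alpha) means converged.
    return h < alpha


before = [[1.0, 0.0], [0.0, 1.0]]
after = [[1.0, 0.0], [0.6, 0.8]]
h = convergence_degree(before, after)
```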
3. Adaptive growth of the number of clusters
This part corresponds to S3, S4, and S5 in Fig. 1; the detailed flow charts of S3 and S5 are shown in Fig. 4 and Fig. 5.
S3: MGSoC algorithm, stage one: calculate the error degree e_m
S3.1: Construct the error vector e with K elements
For each cluster k (k ∈ {1, 2, 3, ..., K}), calculate e_k with the following formula:
S3.2: Normalize e
That is, compute s(e).
S3.3: Calculate the error degree e_m
Calculate the error degree e_m with the following formula:
where exp(·) is the exponential function with base e.
S4: Judge whether the number of clusters is suitable
Compare the error degree e_m obtained above with β. If e_m is less than β, the current number of clusters is suitable; if e_m is greater than β, the current number of clusters is unsuitable.
S5: MGSoC algorithm, stage two: calculate the new cluster center μ_new
S5.1: Calculate the position μ_new of the new cluster center
Calculate the position μ_new of the new cluster center with the following formula:
S5.2: Normalize the center vector μ_new of the new cluster
That is, compute s(μ_new).
S5.3: Clear the error vector e
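The decision in S4 and the bookkeeping around a new cluster in S5 can be sketched as below. The position of the new center is taken as an input here because its formula is not reproduced in this text. The function names, the treatment of e_m equal to β, and the extension of Z with an empty column are assumptions.

```python
import math


def cluster_count_suitable(e_m, beta):
    """S4: suitable when e_m is below beta.

    Assumption: the source does not cover e_m == beta; it is
    treated as "not suitable" here.
    """
    return e_m < beta


def add_cluster(centers, Z, mu_new):
    """Append the normalized new center and widen Z by one column."""
    norm = math.sqrt(sum(x * x for x in mu_new))
    if norm == 0.0:
        raise ValueError("new cluster center must be non-zero")
    centers.append([x / norm for x in mu_new])  # S5.2: s(mu_new)
    for row in Z:
        row.append(0)  # no user belongs to the new cluster yet
    return len(centers)


centers = [[1.0, 0.0]]
Z = [[1], [1]]
if not cluster_count_suitable(e_m=0.7, beta=0.5):
    K = add_cluster(centers, Z, mu_new=[0.0, 2.0])
```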
4. Incremental learning
S6: Incremental learning
The personalized recommendation algorithm proposed in the invention supports the following four incremental learning cases: 1) an existing user i rates item j for the first time; 2) an existing user i updates the rating of item j; 3) a new user appears; 4) a new item appears. Because a newly appearing item has not yet been rated by any user, it does not affect the model. When a new item is rated by a user, the case is equivalent to "1) an existing user i rates item j for the first time".
Details are as follows:
S6.1: An existing user i rates item j for the first time
Corresponds to Fig. 7.
1) Update the norm of user i:
||u'_i|| denotes the norm of user i after the update, and ||u_i|| denotes the norm of user i before the update.
2) Update the cosine similarity between user i and each cluster:
where k ∈ {1, 2, 3, ..., K}.
3) Update the weight of item j:
The two weight symbols in the formula denote the weight of item j after the update and the weight of item j before the update, respectively.
4) Update the weight of user i:
The two weight symbols in the formula denote the weight of user i after the update and the weight of user i before the update, respectively.
5) Recompute the cluster to which user i belongs:
where k ∈ {1, 2, 3, ..., K}, and closest_k_i denotes the newly computed cluster to which user i belongs, closest_k_i ∈ {1, 2, 3, ..., K}.
6) Update the cluster center according to formula (7).
S6.2: An existing user i updates the rating of item j
Corresponds to Fig. 8.
1) Update the norm of user i:
2) Update the cosine similarity between user i and each cluster:
where k ∈ {1, 2, 3, ..., K}.
3) Update the weight of user i:
4) Recompute the cluster to which user i belongs according to formulas (16) and (17);
5) Update the cluster center according to formula (7).
S6.3: A new user appears
Corresponds to Fig. 9.
1) Calculate the weight of the new user according to formula (3);
2) Calculate the cluster to which the new user belongs according to formulas (16) and (17);
3) Update the center of that cluster according to formula (7);
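The norm updates in S6.1 and S6.2 can be done without rescanning a user's whole rating vector. The sketch below assumes the norm is the Euclidean norm of the raw (unnormalized) ratings; under that assumption the updates follow directly from the definition of the norm. The function names are hypothetical, and the patent's own update formulas are not reproduced here.

```python
import math


def norm_after_first_rating(old_norm, new_rating):
    """S6.1 case: a previously unrated item contributes a new squared term."""
    return math.sqrt(old_norm ** 2 + new_rating ** 2)


def norm_after_rating_change(old_norm, old_rating, new_rating):
    """S6.2 case: swap the old squared term for the new one."""
    squared = old_norm ** 2 - old_rating ** 2 + new_rating ** 2
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(squared, 0.0))


first = norm_after_first_rating(3.0, 4.0)          # ratings (3) -> (3, 4)
changed = norm_after_rating_change(5.0, 4.0, 3.0)  # ratings (3, 4) -> (3, 3)
```

Each update costs constant time, which is the point of handling a single new or changed rating incrementally instead of recomputing over all p items.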
5. Recommendation
This part corresponds to S7 in Fig. 1.
S7: Collaborative filtering recommendation
Predict the rating of user a for item j:
where a ∈ A, k ∈ {1, 2, 3, ..., K}, j ∈ {1, 2, 3, ..., p}.
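The prediction step S7 combines cluster membership with user weights. As an illustration only, the sketch below predicts the rating of user a for item j as the user-weighted average of the ratings given to j by other users in the same cluster. This is a generic cluster-restricted collaborative filtering estimate, assumed for illustration; it is not the patent's prediction formula, and the name `predict_rating` and its arguments are hypothetical.

```python
def predict_rating(a, j, ratings, rated, cluster_of, w_user):
    """Weighted mean of item j's ratings from a's cluster neighbours.

    Returns None when no neighbour in the cluster has rated item j.
    """
    num = 0.0
    den = 0.0
    for i, row in enumerate(ratings):
        if i == a or cluster_of[i] != cluster_of[a] or not rated[i][j]:
            continue
        num += w_user[i] * row[j]
        den += w_user[i]
    return num / den if den > 0 else None


ratings = [[0, 0], [4, 2], [2, 5], [1, 1]]
rated = [[0, 0], [1, 1], [1, 1], [1, 1]]
cluster_of = [0, 0, 0, 1]        # users 0, 1, 2 share a cluster
w_user = [1.0, 3.0, 1.0, 1.0]
pred = predict_rating(0, 0, ratings, rated, cluster_of, w_user)
```

Here user 1 carries three times the weight of user 2, so the estimate (3 * 4 + 1 * 2) / 4 = 3.5 sits closer to user 1's rating.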

Claims (5)

1. A personalized recommendation algorithm based on a clustering algorithm with an adaptively growing number of clusters (see Fig. 1), characterized in that it comprises two stages: cluster-based incremental learning and recommendation.
2. A clustering algorithm with an adaptively growing number of clusters, based on the MWOSK-means and MGSoC algorithms, for the cluster-based incremental learning stage of claim 1 (see P1 in Fig. 1), characterized in that: clustering is performed with the MWOSK-means algorithm (see S2 in Fig. 1; details in Fig. 3); and adaptive growth of the number of clusters is realized with the MGSoC algorithm (see P1.1 in Fig. 1; details in Fig. 4 and Fig. 5).
3. A method for calculating item weights and user weights in the initialization phase (see S1.5 in Fig. 2) of the MWOSK-means algorithm of claim 2, characterized in that: the item weights are calculated as shown in formulas (1) and (2); and the user weights, which take item weights into account, are calculated as shown in formula (3).
4. A method, within the MGSoC algorithm of claim 2, for judging whether the number of clusters is suitable during clustering (S3 and S4 in Fig. 1; S3 is detailed in Fig. 4), characterized in that: the error degree of each cluster is calculated by formulas (9) and (10).
5. A method, within the MGSoC algorithm of claim 2, for calculating the initial position of a newly added cluster center (S5 in Fig. 1; details in Fig. 5), characterized in that: the initial position of the new cluster center is given by formula (11).
CN201710177780.0A 2017-03-23 2017-03-23 A kind of personalized recommendation algorithm of the clustering algorithm adaptively increased based on number of clusters Pending CN107092924A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710177780.0A CN107092924A (en) 2017-03-23 2017-03-23 A kind of personalized recommendation algorithm of the clustering algorithm adaptively increased based on number of clusters

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710177780.0A CN107092924A (en) 2017-03-23 2017-03-23 A kind of personalized recommendation algorithm of the clustering algorithm adaptively increased based on number of clusters

Publications (1)

Publication Number Publication Date
CN107092924A true CN107092924A (en) 2017-08-25

Family

ID=59649259

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710177780.0A Pending CN107092924A (en) 2017-03-23 2017-03-23 A kind of personalized recommendation algorithm of the clustering algorithm adaptively increased based on number of clusters

Country Status (1)

Country Link
CN (1) CN107092924A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109145207A (en) * 2018-08-01 2019-01-04 广东奥博信息产业股份有限公司 A kind of information personalized recommendation method and device based on classification indicators prediction
CN110717551A (en) * 2019-10-18 2020-01-21 中国电子信息产业集团有限公司第六研究所 Training method and device of flow identification model and electronic equipment

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109145207A (en) * 2018-08-01 2019-01-04 广东奥博信息产业股份有限公司 A kind of information personalized recommendation method and device based on classification indicators prediction
CN110717551A (en) * 2019-10-18 2020-01-21 中国电子信息产业集团有限公司第六研究所 Training method and device of flow identification model and electronic equipment
CN110717551B (en) * 2019-10-18 2023-01-20 中国电子信息产业集团有限公司第六研究所 Training method and device of flow identification model and electronic equipment

Similar Documents

Publication Publication Date Title
CN108829763B (en) Deep neural network-based attribute prediction method for film evaluation website users
CN109635291A (en) A kind of recommended method of fusion score information and item contents based on coorinated training
Chen et al. Fuzzy forecasting based on fuzzy-trend logical relationship groups
CN103559504B (en) Image target category identification method and device
CN104462383B (en) A kind of film based on a variety of behavior feedbacks of user recommends method
CN103345656B (en) A kind of data identification method based on multitask deep neural network and device
CN106202377B (en) A kind of online collaboration sort method based on stochastic gradient descent
CN110348579A (en) A kind of domain-adaptive migration feature method and system
CN106022392B (en) A kind of training method that deep neural network sample is accepted or rejected automatically
CN109582864A (en) Course recommended method and system based on big data science and changeable weight adjustment
CN106919951A (en) A kind of Weakly supervised bilinearity deep learning method merged with vision based on click
CN107239993A (en) A kind of matrix decomposition recommendation method and system based on expansion label
CN106649658A (en) Recommendation system and method for improving user role undifferentiated treatment and data sparseness
CN107038184A (en) A kind of news based on layering latent variable model recommends method
CN107391582A (en) The information recommendation method of user preference similarity is calculated based on context ontology tree
CN108764577A (en) Online time series prediction technique based on dynamic fuzzy Cognitive Map
CN109583635A (en) A kind of short-term load forecasting modeling method towards operational reliability
CN108172047A (en) A kind of network on-line study individualized resource real-time recommendation method
CN104091038A (en) Method for weighting multiple example studying features based on master space classifying criterion
CN109903138A (en) A kind of individual commodity recommendation method
CN113407864A (en) Group recommendation method based on mixed attention network
CN107092924A (en) A kind of personalized recommendation algorithm of the clustering algorithm adaptively increased based on number of clusters
CN107341479A (en) A kind of method for tracking target based on the sparse coordination model of weighting
CN104572915B (en) One kind is based on the enhanced customer incident relatedness computation method of content environment
CN107807919A (en) A kind of method for carrying out microblog emotional classification prediction using random walk network is circulated

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170825