CN107092924A - A personalized recommendation algorithm based on a clustering algorithm with an adaptively growing number of clusters - Google Patents
- Publication number
- CN107092924A (application number CN201710177780.0A)
- Authority
- CN
- China
- Prior art keywords
- cluster
- algorithm
- user
- clusters
- algorithms
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23211—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with adaptive number of clusters
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Probability & Statistics with Applications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a personalized recommendation algorithm based on a clustering algorithm in which the number of clusters grows adaptively. The algorithm comprises two stages: cluster-based incremental learning and recommendation. The cluster-based incremental learning stage has three parts: 1) clustering with the MWOSK-means algorithm provided by the invention; 2) adaptive growth of the number of clusters with the MGSoC algorithm provided by the invention; 3) incremental updating. The recommendation stage builds on the result of the previous stage and makes personalized recommendations with a collaborative filtering algorithm that incorporates user weights. Compared with existing recommendation algorithms that cluster first and then apply collaborative filtering, the algorithm provided by the invention offers higher accuracy, determines the number of clusters adaptively, and is applicable to incremental learning.
Description
Technical field
The present invention concerns personalized recommendation in data mining, and in particular cluster-based personalized recommendation.
Background
Personalized recommendation suggests information and goods that may interest a user, based on the user's interests and purchasing behavior. Collaborative filtering is a commonly used algorithm for personalized recommendation. Clustering before collaborative filtering helps with problems such as a large search space, insufficient accuracy, and sensitivity to sparse data.
Clustering is the process of grouping objects with high similarity. In personalized recommendation, clustering can first be used to group highly similar objects, and the resulting cluster information can then be used in the recommendation algorithm. However, most current personalized recommendation algorithms that cluster first and then apply collaborative filtering support only offline learning. They cannot handle incremental learning, in which user, item, and rating information is updated frequently.
Some personalized recommendation algorithms that cluster first, then apply collaborative filtering, and do handle incremental learning have been proposed. One shortcoming of these algorithms is that the number of clusters must be specified manually in advance at the clustering stage. The recommendation result is often sensitive to that manually specified number, so considerable time must be spent on experiments to determine the best number of clusters. Another shortcoming is that accuracy is not high enough.
Summary of the invention
To address the shortcomings of existing personalized recommendation algorithms that cluster first and then apply collaborative filtering, the invention provides a personalized recommendation algorithm based on a clustering algorithm in which the number of clusters grows adaptively. The algorithm comprises two stages: cluster-based incremental learning and recommendation. The cluster-based incremental learning stage uses the MWOSK-means (Modified Weighted Online Spherical K-means) algorithm and the MGSoC (Modified Growing Self-Organizing Cluster) algorithm provided by the invention. The MWOSK-means algorithm makes full use of item information when computing user weights, which improves the accuracy of personalized recommendation. The MGSoC algorithm grows the number of clusters adaptively, which to some extent addresses two prior-art problems: the number of clusters must be specified manually in advance, and determining the best number of clusters takes a great deal of time.
The personalized recommendation algorithm provided by the invention is applicable to incremental learning settings in which information (such as users, items, and ratings) is updated frequently. Compared with existing personalized recommendation algorithms, it achieves higher accuracy and reduces the time needed to determine the best number of clusters.
The invention comprises the following:
1. A personalized recommendation algorithm based on a clustering algorithm with an adaptively growing number of clusters
The algorithm comprises two stages, cluster-based incremental learning and recommendation; see Fig. 1.
2. A clustering algorithm with an adaptively growing number of clusters, based on the MWOSK-means and MGSoC algorithms
The incremental learning stage of the personalized recommendation algorithm uses a clustering algorithm, provided by the invention, in which the number of clusters grows adaptively and which is based on the MWOSK-means and MGSoC algorithms (P1 in Fig. 1). It has three parts: clustering with the MWOSK-means algorithm (S2 in Fig. 1; see Fig. 3), adaptive growth of the number of clusters with the MGSoC algorithm (P1.1 in Fig. 1; see Figs. 4 and 5), and incremental updating (S6 in Fig. 1; see Figs. 6, 7, 8, and 9).
3. A new method for computing item weights and user weights
The prior art does not take the influence of item weights into account when computing user weights. The invention provides the MWOSK-means algorithm, whose initialization stage (S1.5 in Fig. 2) uses a new item weight calculation method given by formulas (1) and (2). Building on that method, the invention provides a new user weight calculation method that takes item weights into account, given by formula (3).
4. A new method for judging whether the number of clusters is suitable during clustering
In the part of item 2 above that grows the number of clusters adaptively with the MGSoC algorithm (P1.1 in Fig. 1; see Figs. 4 and 5), the MGSoC algorithm provides a new method for judging whether the number of clusters is suitable during clustering (S3 and S4 in Fig. 1; for S3 see Fig. 4).
5. A new method for computing the initial position of a newly added cluster center
In the same part of item 2 above (P1.1 in Fig. 1; see Figs. 4 and 5), if the number of clusters is judged unsuitable during clustering, the initial position of the new cluster center is computed with a new method provided in the MGSoC algorithm (S5 in Fig. 1; see Fig. 5).
Brief description of the drawings
Fig. 1 is a flowchart of the personalized recommendation algorithm, provided by the invention, based on a clustering algorithm with an adaptively growing number of clusters.
Fig. 2 is the flow chart of S1 in Fig. 1.
Fig. 3 is the flow chart of S2 in Fig. 1.
Fig. 4 is the flow chart of S3 in Fig. 1.
Fig. 5 is the flow chart of S5 in Fig. 1.
Fig. 6 is the flow chart of S6 in Fig. 1.
Fig. 7 is the flow chart of S6.1 in Fig. 6.
Fig. 8 is the flow chart of S6.2 in Fig. 6.
Fig. 9 is the flow chart of S6.3 in Fig. 6.
Symbols used in the invention:
σ(u_i): standard deviation of u_i
exp(·): exponential function with base e
η: learning rate of the cluster centers during clustering
α: convergence threshold for judging whether the cluster centers have converged
β: error threshold for judging whether the current number of clusters is suitable
1: all-ones vector whose dimension matches the vector it is multiplied with
T: matrix transpose
n: number of users
p: number of items
d: length of the interval after linear mapping
l: lower bound of the interval after linear mapping
A: set of users to receive recommendations
s(u): vector normalization. Example: for an n-dimensional vector u = (u_1, u_2, ..., u_n) with norm $\|u\| = \sqrt{u_1^2 + u_2^2 + \cdots + u_n^2}$, $s(u) = u / \|u\|$.
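The normalization s(u) can be illustrated with a minimal sketch. It assumes the norm is the Euclidean norm; the function name `normalize` and the handling of an all-zero vector are assumptions made for illustration, not part of the specification.

```python
import math

def normalize(u):
    """Return s(u) = u / ||u|| for a sequence of numbers (Euclidean norm assumed)."""
    norm = math.sqrt(sum(x * x for x in u))
    if norm == 0.0:
        # Assumption: an all-zero vector has no direction and is returned unchanged.
        return list(u)
    return [x / norm for x in u]

unit = normalize([3.0, 4.0])  # the norm is 5, so each element is divided by 5
```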
Detailed description
The personalized recommendation algorithm disclosed by the invention, based on a clustering algorithm with an adaptively growing number of clusters, comprises two stages: cluster-based incremental learning and recommendation.
The overall flowchart of the personalized recommendation algorithm is shown in Fig. 1.
The embodiment of the invention is described in detail below with reference to the drawings.
I. Initialization
This part corresponds to S1 in Fig. 1; the detailed flowchart is shown in Fig. 2.
S1: Initialization
S1.1: Initialize the parameter set
1) Set the number of users n and the number of items p according to the actual data;
2) The person running the algorithm specifies the initial number of clusters K, the learning rate η, the convergence threshold α, the error threshold β, the length d and lower bound l of the interval after linear mapping, and the set A of users to receive recommendations.
S1.2: Initialize the matrix M that records whether each user has rated each item
The n-row, p-column matrix M = (m_ij) records whether the i-th user has rated the j-th item, where i ∈ {1, 2, 3, ..., n} and j ∈ {1, 2, 3, ..., p}. m_ij = 1 means user i has rated item j; m_ij = 0 means user i has not rated item j.
S1.3: Initialize the user-item rating matrix U
1) The n-row, p-column matrix U = (u_ij) is the user-item rating matrix, where i ∈ {1, 2, 3, ..., n} and j ∈ {1, 2, 3, ..., p}. The element in row i, column j is the i-th user's rating of the j-th item. u_i = (u_i1, u_i2, ..., u_ij, ..., u_ip) denotes the i-th user's ratings of all p items;
2) Normalize U row by row, i.e. apply s(u_i) for i ∈ {1, 2, 3, ..., n};
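Steps S1.2 and S1.3 can be sketched as follows. This is a minimal illustration under two assumptions that the text does not state: an unrated entry is stored as 0 in the raw rating matrix, and a user with no ratings is left as an all-zero row. The helper names `rated_mask` and `normalize_rows` are hypothetical.

```python
import math

def rated_mask(ratings):
    """Build M: m_ij = 1 if user i rated item j, else 0 (assumes 0 means 'not rated')."""
    return [[1 if r != 0 else 0 for r in row] for row in ratings]

def normalize_rows(ratings):
    """Build U by scaling every user's rating row to unit Euclidean length."""
    out = []
    for row in ratings:
        norm = math.sqrt(sum(r * r for r in row))
        # Assumption: a user without any rating keeps an all-zero row.
        out.append([r / norm for r in row] if norm > 0 else list(row))
    return out

raw = [[5, 0, 3],   # user 1 rated items 1 and 3
       [0, 4, 0]]   # user 2 rated item 2 only
M = rated_mask(raw)
U = normalize_rows(raw)
```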
S1.4: Initialize the user-to-cluster membership matrix Z
1) Construct the n-row, K-column matrix Z = (z_ik) representing which cluster each user belongs to, where i ∈ {1, 2, 3, ..., n} and k ∈ {1, 2, 3, ..., K}. z_ik ∈ {0, 1}; z_ik = 1 means user i belongs to cluster k, and z_ik = 0 means user i does not belong to cluster k;
2) Randomly assign each of the n users to one of the K clusters, so that every user i belongs to exactly one cluster;
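The random initialization of Z can be sketched as a one-hot assignment. The function name `random_membership` and the `seed` parameter are assumptions added for reproducibility of the illustration; the text only requires that each user is placed in some cluster at random.

```python
import random

def random_membership(n, K, seed=None):
    """Return an n x K 0/1 matrix Z with exactly one 1 per row (one cluster per user)."""
    rng = random.Random(seed)          # seed is a hypothetical convenience parameter
    Z = [[0] * K for _ in range(n)]
    for i in range(n):
        Z[i][rng.randrange(K)] = 1     # pick one of the K clusters uniformly at random
    return Z

Z = random_membership(n=6, K=3, seed=0)
```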
S1.5: MWOSK-means initialization
1) Compute the item weight vector w_item containing the weights of the p items. The weight of the j-th item (j ∈ {1, 2, 3, ..., p}) is computed by formula (1), in which m_{·,j} denotes the j-th column of matrix M.
2) Using d and l, linearly map w_item to a suitable interval by formula (2). The formula uses the largest and smallest element values of w_item and maps the weight of item j before linear mapping to the weight of item j after linear mapping, j ∈ {1, 2, 3, ..., p}. d is the length of the interval after linear mapping, and l is its lower bound.
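The linear mapping of step 2) can be sketched as min-max scaling onto the interval [l, l + d]. The mapping formula itself is not reproduced in this text, so the min-max form below is an assumption inferred from the quantities the step names (largest element, smallest element, d, and l). The function name `linear_map` and the treatment of the case where all weights are equal are also assumptions.

```python
def linear_map(weights, d, l):
    """Map weights linearly so the smallest becomes l and the largest becomes l + d."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        # Assumption: when all weights are equal, map every weight to the lower bound.
        return [float(l) for _ in weights]
    return [l + d * (w - lo) / (hi - lo) for w in weights]

mapped = linear_map([2.0, 4.0, 6.0], d=1.0, l=0.5)  # target interval is [0.5, 1.5]
```

Mapping the weights to a bounded interval keeps any single item from dominating later weighted computations, which is presumably why d and l are exposed as parameters.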
3) Compute the user weight vector w_user containing the weights of the n users. The weight of the i-th user (i ∈ {1, 2, 3, ..., n}) is computed by formula (3), in which σ(u_i) denotes the standard deviation of u_i.
4) Construct K p-dimensional vectors μ_k (k ∈ {1, 2, 3, ..., K}) representing the center position of each cluster according to the following formula:
where j ∈ {1, 2, 3, ..., p}.
II. Clustering
This part corresponds to S2 in Fig. 1; the detailed flowchart is shown in Fig. 3.
S2: Clustering with the MWOSK-means algorithm
S2.1: Train the model on every user in U
For the i-th user (i ∈ {1, 2, 3, ..., n}):
1) Recompute the cluster that user i belongs to:
where k ∈ {1, 2, 3, ..., K} and closest_k_i denotes the newly computed cluster that user i belongs to, closest_k_i ∈ {1, 2, 3, ..., K}.
2) Update the center of user i's new cluster by formula (7), which gives the cluster center after the update in terms of the cluster center before the update.
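One training step of S2.1 can be sketched with the standard online spherical k-means update, extended by a user weight. This is an assumption: the assignment and update formulas are not reproduced in this text, so the sketch assigns the user to the center with the highest cosine similarity and moves that center toward the user by η times the user weight before renormalizing. The names `train_on_user`, `dot`, and `normalize` are hypothetical, and cluster indices are 0-based in the code.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)

def train_on_user(u, weight, centers, eta):
    """Assign unit vector u to its most similar center, then pull that center toward u."""
    # For unit vectors the dot product equals the cosine similarity.
    sims = [dot(u, mu) for mu in centers]
    k = max(range(len(centers)), key=sims.__getitem__)
    # Assumed update: mu_k <- s(mu_k + eta * weight * u), modifying centers in place.
    moved = [m + eta * weight * x for m, x in zip(centers[k], u)]
    centers[k] = normalize(moved)
    return k

centers = [[1.0, 0.0], [0.0, 1.0]]
k = train_on_user([0.6, 0.8], weight=1.0, centers=centers, eta=0.5)
```

Renormalizing after each step keeps every center on the unit sphere, so similarities stay comparable across clusters.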
S2.2: Compute the degree of convergence h
Compute the degree of convergence h according to the following formula:
μ'_k denotes the k-th cluster center after step S2.1 (training the model on every user in U), and μ_k denotes the k-th cluster center before S2.1, k ∈ {1, 2, 3, ..., K}.
III. Adaptive growth of the number of clusters
This part corresponds to S3, S4, and S5 in Fig. 1; the detailed flowcharts of S3 and S5 are shown in Figs. 4 and 5.
S3: MGSoC stage one: compute the error degree e_m
S3.1: Construct the error vector e with K elements
For each cluster k (k ∈ {1, 2, 3, ..., K}), compute e_k by formula (9).
S3.2: Normalize e
That is, apply s(e).
S3.3: Compute the error degree e_m
Compute the error degree e_m by formula (10), where exp(·) is the exponential function with base e.
S4: Judge whether the number of clusters is suitable
Compare the error degree e_m obtained above with β: if e_m is less than β, the current number of clusters is suitable; if e_m is greater than β, the current number of clusters is unsuitable.
S5: MGSoC stage two: compute the new cluster center μ_new
S5.1: Compute the position μ_new of the new cluster center
Compute the position μ_new of the new cluster center by formula (11).
S5.2: Normalize the new cluster's center vector μ_new
That is, apply s(μ_new).
S5.3: Clear the error vector e
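The control flow of S3 to S5 can be sketched as a loop that adds one cluster at a time until the error degree falls below β. The sketch is an assumption about how the steps connect: the exact flow is given in Fig. 1, which is not reproduced here. The callables `recluster`, `error_degree`, and `new_center` are hypothetical stand-ins for S2, S3, and S5, `max_clusters` is a hypothetical safeguard, and treating e_m equal to β as suitable is an arbitrary choice because the text does not cover equality.

```python
def grow_clusters(centers, beta, recluster, error_degree, new_center, max_clusters):
    """Repeat clustering and add a cluster center while the error degree exceeds beta."""
    centers = list(centers)
    while True:
        centers = recluster(centers)        # stand-in for S2 (MWOSK-means clustering)
        e_m = error_degree(centers)         # stand-in for S3 (error degree e_m)
        # S4: suitable when e_m is below beta (equality treated as suitable here).
        if e_m <= beta or len(centers) >= max_clusters:
            return centers
        centers.append(new_center(centers))  # stand-in for S5 (new cluster center)

# Toy stand-ins: the error shrinks as 1 / K, so growth stops once 1 / K <= beta.
result = grow_clusters(
    centers=[[1.0]],
    beta=0.3,
    recluster=lambda c: c,
    error_degree=lambda c: 1.0 / len(c),
    new_center=lambda c: [1.0],
    max_clusters=10,
)
```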
IV. Incremental learning
S6: Incremental learning
The personalized recommendation algorithm proposed in the invention supports the following four incremental learning cases: 1) an existing user i rates item j for the first time; 2) an existing user i updates the rating of item j; 3) a new user appears; 4) a new item appears. A newly appearing item has not yet been rated by any user and therefore does not affect the model. Once a new item is rated by a user, the case is equivalent to case 1) above.
The details are as follows:
S6.1: An existing user i rates item j for the first time
See Fig. 7.
1) Update the norm of user i:
||u'_i|| denotes the norm of user i after the update, and ||u_i|| denotes the norm of user i before the update.
2) Update the cosine similarity between user i and each cluster:
where k ∈ {1, 2, 3, ..., K}.
3) Update the weight of item j:
The formula relates the weight of item j after the update to the weight of item j before the update.
4) Update the weight of user i:
The formula relates the weight of user i after the update to the weight of user i before the update.
5) Recompute the cluster that user i belongs to:
where k ∈ {1, 2, 3, ..., K} and closest_k_i denotes the newly computed cluster that user i belongs to, closest_k_i ∈ {1, 2, 3, ..., K}.
6) Update the cluster center according to formula (7)
S6.2: An existing user i updates the rating of item j
See Fig. 8.
1) Update the norm of user i:
2) Update the cosine similarity between user i and each cluster:
where k ∈ {1, 2, 3, ..., K}.
3) Update the weight of user i:
4) Recompute the cluster that user i belongs to according to formulas (16) and (17);
5) Update the cluster center according to formula (7)
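The norm updates in step 1) of S6.1 and S6.2 can be done without rescanning the whole rating vector. The sketch below relies only on the identity that the squared Euclidean norm is the sum of squared ratings, and it assumes the norm refers to the user's raw (unnormalized) rating vector with unrated entries equal to 0. The update formulas of the specification are not reproduced in this text, so the function names and this formulation are assumptions.

```python
import math

def norm_after_first_rating(old_norm, new_score):
    """Norm after a previously unrated (zero) entry receives new_score."""
    return math.sqrt(old_norm ** 2 + new_score ** 2)

def norm_after_changed_rating(old_norm, old_score, new_score):
    """Norm after one entry changes from old_score to new_score."""
    squared = old_norm ** 2 - old_score ** 2 + new_score ** 2
    # Guard against a tiny negative value caused by floating-point rounding.
    return math.sqrt(max(squared, 0.0))

first = norm_after_first_rating(5.0, 12.0)          # ratings (3, 4) gain a rating of 12
changed = norm_after_changed_rating(5.0, 3.0, 0.0)  # ratings (3, 4) become (0, 4)
```

Each update costs constant time, which matters when ratings arrive frequently.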
S6.3: A new user appears
See Fig. 9.
1) Compute the new user's weight according to formula (3);
2) Compute the cluster the new user belongs to according to formulas (16) and (17);
3) Update the center of that cluster according to formula (7);
V. Recommendation
This part corresponds to S7 in Fig. 1.
S7: Collaborative filtering recommendation
Predict user a's rating of item j:
where a ∈ A, k ∈ {1, 2, 3, ..., K}, and j ∈ {1, 2, 3, ..., p}.
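A cluster-based collaborative filtering prediction that incorporates user weights can be sketched as a weighted average over the other users in the same cluster who rated the item. The prediction formula of the specification is not reproduced in this text, so this form is only one plausible illustration and not the formula of the invention. The function name `predict_score`, the argument layout, and returning None when no peer rated the item are assumptions; indices are 0-based.

```python
def predict_score(a, j, ratings, rated, cluster_of, user_weight):
    """Weighted mean of item j's ratings among users sharing user a's cluster."""
    num = 0.0
    den = 0.0
    for i, row in enumerate(ratings):
        # Skip the target user, users in other clusters, and users who did not rate j.
        if i == a or cluster_of[i] != cluster_of[a] or not rated[i][j]:
            continue
        num += user_weight[i] * row[j]
        den += user_weight[i]
    # Assumption: no prediction is made when no peer in the cluster rated the item.
    return num / den if den > 0 else None

ratings = [[0, 0], [4, 0], [2, 5], [1, 1]]
rated = [[0, 0], [1, 0], [1, 1], [1, 1]]
cluster_of = [0, 0, 0, 1]          # users 0, 1, 2 share a cluster; user 3 is alone
user_weight = [1.0, 3.0, 1.0, 1.0]
score = predict_score(0, 0, ratings, rated, cluster_of, user_weight)
```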
Claims (5)
1. A personalized recommendation algorithm based on a clustering algorithm with an adaptively growing number of clusters (see Fig. 1), characterized in that it comprises two stages: cluster-based incremental learning and recommendation.
2. The clustering algorithm with an adaptively growing number of clusters, based on the MWOSK-means and MGSoC algorithms, used in the cluster-based incremental learning stage of claim 1 (P1 in Fig. 1), characterized in that clustering is performed with the MWOSK-means algorithm (S2 in Fig. 1; see Fig. 3), and adaptive growth of the number of clusters is realized with the MGSoC algorithm (P1.1 in Fig. 1; see Figs. 4 and 5).
3. The method of computing item weights and user weights in the initialization stage (S1.5 in Fig. 2) of the MWOSK-means algorithm of claim 2, characterized in that the item weights are computed as shown in formulas (1) and (2), and the user weights, which take item weights into account, are computed as shown in formula (3).
4. The method in the MGSoC algorithm of claim 2 for judging whether the number of clusters is suitable during clustering (S3 and S4 in Fig. 1; for S3 see Fig. 4), characterized in that the error degree of each cluster is computed by formulas (9) and (10).
5. The method in the MGSoC algorithm of claim 2 for computing the initial position of a newly added cluster center (S5 in Fig. 1; see Fig. 5), characterized in that the initial position of the new cluster center is given by formula (11).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710177780.0A CN107092924A (en) | 2017-03-23 | 2017-03-23 | A personalized recommendation algorithm based on a clustering algorithm with an adaptively growing number of clusters |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710177780.0A CN107092924A (en) | 2017-03-23 | 2017-03-23 | A personalized recommendation algorithm based on a clustering algorithm with an adaptively growing number of clusters |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107092924A (en) | 2017-08-25 |
Family
ID=59649259
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710177780.0A Pending CN107092924A (en) | A personalized recommendation algorithm based on a clustering algorithm with an adaptively growing number of clusters | 2017-03-23 | 2017-03-23 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107092924A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109145207A (en) * | 2018-08-01 | 2019-01-04 | 广东奥博信息产业股份有限公司 | A kind of information personalized recommendation method and device based on classification indicators prediction |
CN110717551A (en) * | 2019-10-18 | 2020-01-21 | 中国电子信息产业集团有限公司第六研究所 | Training method and device of flow identification model and electronic equipment |
-
2017
- 2017-03-23 CN CN201710177780.0A patent/CN107092924A/en active Pending
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109145207A (en) * | 2018-08-01 | 2019-01-04 | 广东奥博信息产业股份有限公司 | A kind of information personalized recommendation method and device based on classification indicators prediction |
CN110717551A (en) * | 2019-10-18 | 2020-01-21 | 中国电子信息产业集团有限公司第六研究所 | Training method and device of flow identification model and electronic equipment |
CN110717551B (en) * | 2019-10-18 | 2023-01-20 | 中国电子信息产业集团有限公司第六研究所 | Training method and device of flow identification model and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108829763B (en) | Deep neural network-based attribute prediction method for film evaluation website users | |
CN109635291A (en) | A kind of recommended method of fusion score information and item contents based on coorinated training | |
Chen et al. | Fuzzy forecasting based on fuzzy-trend logical relationship groups | |
CN103559504B (en) | Image target category identification method and device | |
CN104462383B (en) | A kind of film based on a variety of behavior feedbacks of user recommends method | |
CN103345656B (en) | A kind of data identification method based on multitask deep neural network and device | |
CN106202377B (en) | A kind of online collaboration sort method based on stochastic gradient descent | |
CN110348579A (en) | A kind of domain-adaptive migration feature method and system | |
CN106022392B (en) | A kind of training method that deep neural network sample is accepted or rejected automatically | |
CN109582864A (en) | Course recommended method and system based on big data science and changeable weight adjustment | |
CN106919951A (en) | A kind of Weakly supervised bilinearity deep learning method merged with vision based on click | |
CN107239993A (en) | A kind of matrix decomposition recommendation method and system based on expansion label | |
CN106649658A (en) | Recommendation system and method for improving user role undifferentiated treatment and data sparseness | |
CN107038184A (en) | A kind of news based on layering latent variable model recommends method | |
CN107391582A (en) | The information recommendation method of user preference similarity is calculated based on context ontology tree | |
CN108764577A (en) | Online time series prediction technique based on dynamic fuzzy Cognitive Map | |
CN109583635A (en) | A kind of short-term load forecasting modeling method towards operational reliability | |
CN108172047A (en) | A kind of network on-line study individualized resource real-time recommendation method | |
CN104091038A (en) | Method for weighting multiple example studying features based on master space classifying criterion | |
CN109903138A (en) | A kind of individual commodity recommendation method | |
CN113407864A (en) | Group recommendation method based on mixed attention network | |
CN107092924A (en) | A personalized recommendation algorithm based on a clustering algorithm with an adaptively growing number of clusters | |
CN107341479A (en) | A kind of method for tracking target based on the sparse coordination model of weighting | |
CN104572915B (en) | One kind is based on the enhanced customer incident relatedness computation method of content environment | |
CN107807919A (en) | A kind of method for carrying out microblog emotional classification prediction using random walk network is circulated |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20170825 |