CN104598601B - Method, apparatus, and computing device for classifying users and content - Google Patents
- Publication number
- CN104598601B; CN201510041042.4A
- Authority
- CN
- China
- Prior art keywords
- user
- content
- type
- visit capacity
- similarity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23211—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with adaptive number of clusters
Abstract
The invention discloses a method, an apparatus, and a computing device for classifying users and content. The apparatus includes: an initialization module adapted to assign a user to each user type and a content item to each content type; a visit capacity computing module adapted to calculate a first visit capacity of each user type to each content item, a second visit capacity of each user to each content type, and a third visit capacity of each user type to each content type; a similarity calculation module adapted to calculate the similarity between each user and each user type from the second and third visit capacities, and the similarity between each content item and each content type from the first and third visit capacities; and a classification module adapted to select, for each user, the user type most similar to that user as the user's type, and, for each content item, the content type most similar to that content item as its content type.
Description
Technical field
The present invention relates to the field of computers and the Internet, and in particular to a method, an apparatus, and a computing device for classifying users and content.
Background art
Analyzing which content users visit on a website provides a reference for building and operating the website's content. For content building, the site can pursue further cooperation or business opportunities with merchants based on the goods whose user visits are growing fastest. For user service, the site can make targeted recommendations based on the goods a user is interested in. For operations management, the site can convert the revenue that each user type brings to the site owner into the cost-effectiveness of each content type on the site. Here, website content may be anything the website presents to users, such as pages, user posts, site categories, product names, and product categories.
This requires classifying website users and content. The usual approach is to first manually divide the website content into several types according to the structure of the site; then, when users visit the site, the users are divided into several types through extensive computation based on each user's visit capacity to each content item. In practice, however, some website content is difficult to classify manually, such as user posts and link addresses.
Commonly used algorithms for automatically classifying website users and content include K-means, probabilistic latent semantic analysis (PLSA), and latent Dirichlet allocation (LDA). These algorithms generally first classify the website content and reduce its dimensionality, and then classify the users. However, classifying content with these algorithms first requires knowing many attributes of the content, and their iterative computation cost is very high.
Summary of the invention
In view of the above problems, the present invention is proposed to provide a method, an apparatus, and a computing device for classifying users and content that overcome, or at least partially solve, the above problems.
According to one aspect of the present invention, an apparatus for classifying users and content is provided. The apparatus resides in a computing device and is adapted to cluster the users in a user set into a first predetermined number of user types and to cluster the content items in a content set into a second predetermined number of content types. The apparatus includes: an initialization module adapted to assign one or more users in the user set to each of the first predetermined number of user types, and to assign one or more content items in the content set to each of the second predetermined number of content types; a visit capacity computing module adapted to calculate, from the visit capacity of users to content, a first visit capacity of each user type to each content item, a second visit capacity of each user to each content type, and a third visit capacity of each user type to each content type; a similarity calculation module adapted to calculate the similarity between each user and each user type from the second and third visit capacities, and the similarity between each content item and each content type from the first and third visit capacities; and a classification module adapted to select, for each user, the user type most similar to that user as the user's type, to select, for each content item, the content type most similar to that content item as its content type, to trigger the visit capacity computing module to recalculate the visit capacities and the similarity calculation module to recalculate the similarities and then repeat the selection, and to stop triggering once a predetermined condition is met.
Optionally, in the apparatus for classifying users and content according to the present invention, the initialization module is further adapted to: according to existing mappings between users and user types, assign to each user type that already has one or more users those users, and randomly assign to each user type that has no user a user that has no user type; and, according to existing mappings between content items and content types, assign to each content type that already has one or more content items those content items, and randomly assign to each content type that has no content a content item that has no content type.
Optionally, in the apparatus for classifying users and content according to the present invention, for a user with an existing mapping to a user type, the similarity calculation module does not calculate the similarity between that user and the user types, and the classification module does not change that user's type; for a content item with an existing mapping to a content type, the similarity calculation module does not calculate the similarity between that content item and the content types, and the classification module does not change that content item's type.
Optionally, in the apparatus for classifying users and content according to the present invention, the visit capacity computing module calculates the visit capacity of a user type to a content item as follows: obtain all users included in the user type; obtain the visit capacity of each of those users to the content item; and sum these visit capacities to obtain the visit capacity of the user type to the content item. The visit capacity computing module calculates the visit capacity of a user to a content type as follows: obtain all content items included in the content type; obtain the visit capacity of the user to each of those content items; and sum these visit capacities to obtain the visit capacity of the user to the content type. The visit capacity computing module calculates the visit capacity of a user type to a content type as follows: obtain all users included in the user type and all content items included in the content type; obtain the visit capacity of each of those users to each of those content items; and sum these visit capacities to obtain the visit capacity of the user type to the content type.
Optionally, in the apparatus for classifying users and content according to the present invention, the similarity is a minimum-based similarity coefficient, the Bhattacharyya coefficient, or the cosine similarity.
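The three similarity measures named above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the text does not define the minimum-based coefficient, so the sketch assumes it is the sum of element-wise minima of the two normalized vectors, and the function names and sample vectors are hypothetical.

```python
import math


def _normalize(vec):
    """Scale a non-negative vector so its entries sum to 1 (assumes a non-zero sum)."""
    total = sum(vec)
    return [x / total for x in vec]


def min_similarity(x, y):
    # Assumed definition: sum of element-wise minima of the normalized vectors.
    # The source only names a "minimum-based" coefficient without defining it.
    p, q = _normalize(x), _normalize(y)
    return sum(min(a, b) for a, b in zip(p, q))


def bhattacharyya_coefficient(x, y):
    # Sum of sqrt(p_i * q_i) over the two normalized vectors.
    p, q = _normalize(x), _normalize(y)
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def cosine_similarity(x, y):
    # Dot product divided by the product of the Euclidean norms.
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


# Hypothetical visit capacity vectors over three content types.
user_vec = [10, 0, 0]
type_vec = [11, 1, 1]
print(min_similarity(user_vec, type_vec))
print(bhattacharyya_coefficient(user_vec, type_vec))
print(cosine_similarity(user_vec, type_vec))
```

All three measures return 1 for proportional vectors, so any of them can be plugged into the same selection step.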
Optionally, in the apparatus for classifying users and content according to the present invention, before calculating the similarity of two vectors, the similarity calculation module first takes the intersection or the union of the two vectors' domains and then calculates the similarity of the two vectors.
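Taking the intersection or union of the two domains can be sketched as below, assuming each vector is stored as a sparse mapping from keys to visit capacities and that a missing key counts as 0. The helper name `align` and the sample data are hypothetical.

```python
def align(x, y, mode="union"):
    """Build two equal-length lists over the shared key set of two sparse vectors."""
    if mode == "union":
        keys = sorted(set(x) | set(y))
    elif mode == "intersection":
        keys = sorted(set(x) & set(y))
    else:
        raise ValueError("mode must be 'union' or 'intersection'")
    # Keys missing from one vector are treated as a visit capacity of 0.
    return keys, [x.get(k, 0) for k in keys], [y.get(k, 0) for k in keys]


user_vec = {"c1": 10}                        # a user's visits per content type
type_vec = {"c1": 11, "c2": 1, "c3": 1}      # a user type's visits per content type

print(align(user_vec, type_vec, mode="union"))
print(align(user_vec, type_vec, mode="intersection"))
```

The union keeps every dimension that either vector uses, while the intersection compares the vectors only on dimensions they share.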
Optionally, in the apparatus for classifying users and content according to the present invention, the predetermined condition is that the number of times the visit capacity computing module and the similarity calculation module have been triggered reaches a preset number; or that, compared with the previous classification result, the proportion of users whose user type has changed in the current classification result is below a preset first threshold and the proportion of content items whose content type has changed is below a preset second threshold.
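A stopping test of this kind might look like the sketch below. The function names, parameter names, and the choice to measure the changed proportion over the current assignment are assumptions made for illustration.

```python
def changed_ratio(previous, current):
    """Fraction of entries in `current` whose assigned type differs from `previous`."""
    if not current:
        return 0.0
    changed = sum(1 for key, t in current.items() if previous.get(key) != t)
    return changed / len(current)


def should_stop(iteration, max_iterations,
                prev_users, cur_users, prev_content, cur_content,
                user_threshold, content_threshold):
    # Condition 1: the preset number of iterations has been reached.
    if iteration >= max_iterations:
        return True
    # Condition 2: both the user and the content assignments have nearly stopped changing.
    return (changed_ratio(prev_users, cur_users) < user_threshold
            and changed_ratio(prev_content, cur_content) < content_threshold)


prev_users = {"u1": "g1", "u2": "g2"}
cur_users = {"u1": "g1", "u2": "g3"}
print(changed_ratio(prev_users, cur_users))  # half of the users changed type
```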
According to another aspect of the present invention, a method for classifying users and content is provided. The method is executed in a computing device and is adapted to cluster the users in a user set into a first predetermined number of user types and to cluster the content items in a content set into a second predetermined number of content types. The method includes: an initialization step of assigning one or more users in the user set to each of the first predetermined number of user types, and assigning one or more content items in the content set to each of the second predetermined number of content types; a visit capacity calculation step of calculating, from the visit capacity of users to content, a first visit capacity of each user type to each content item, a second visit capacity of each user to each content type, and a third visit capacity of each user type to each content type; a similarity calculation step of calculating the similarity between each user and each user type from the second and third visit capacities, and the similarity between each content item and each content type from the first and third visit capacities; and a classification step of selecting, for each user, the user type most similar to that user as the user's type, selecting, for each content item, the content type most similar to that content item as its content type, triggering the visit capacity calculation step to recalculate the visit capacities and the similarity calculation step to recalculate the similarities and then repeating the selection, and no longer triggering once a predetermined condition is met.
Optionally, in the method for classifying users and content according to the present invention, in the initialization step, according to existing mappings between users and user types, each user type that already has one or more users is assigned those users, and each user type that has no user is randomly assigned a user that has no user type; and, according to existing mappings between content items and content types, each content type that already has one or more content items is assigned those content items, and each content type that has no content is randomly assigned a content item that has no content type.
Optionally, in the method for classifying users and content according to the present invention, for a user with an existing mapping to a user type, the similarity between that user and the user types is not calculated in the similarity calculation step, and that user's type is not changed in the classification step; for a content item with an existing mapping to a content type, the similarity between that content item and the content types is not calculated in the similarity calculation step, and that content item's type is not changed in the classification step.
Optionally, in the method for classifying users and content according to the present invention, in the visit capacity calculation step, the visit capacity of a user type to a content item is calculated as follows: obtain all users included in the user type; obtain the visit capacity of each of those users to the content item; and sum these visit capacities to obtain the visit capacity of the user type to the content item. The visit capacity of a user to a content type is calculated as follows: obtain all content items included in the content type; obtain the visit capacity of the user to each of those content items; and sum these visit capacities to obtain the visit capacity of the user to the content type. The visit capacity of a user type to a content type is calculated as follows: obtain all users included in the user type and all content items included in the content type; obtain the visit capacity of each of those users to each of those content items; and sum these visit capacities to obtain the visit capacity of the user type to the content type.
Optionally, in the method for classifying users and content according to the present invention, the similarity is a minimum-based similarity coefficient, the Bhattacharyya coefficient, or the cosine similarity.
Optionally, in the method for classifying users and content according to the present invention, in the similarity calculation step, before the similarity of two vectors is calculated, the intersection or the union of the two vectors' domains is taken first, and the similarity of the two vectors is then calculated.
Optionally, in the method for classifying users and content according to the present invention, the predetermined condition is that the number of times the visit capacity calculation step and the similarity calculation step have been triggered reaches a preset number; or that, compared with the previous classification result, the proportion of users whose user type has changed in the current classification result is below a preset first threshold and the proportion of content items whose content type has changed is below a preset second threshold.
According to yet another aspect of the present invention, a computing device is provided in which the apparatus for classifying users and content according to the present invention resides.
Compared with the prior art, the scheme for classifying users and content according to the present invention performs bi-clustering analysis on website users and content. It does not need to know many attributes of the content; based only on each user's visit capacity to each content item, it classifies users and content simultaneously in a single pass, grouping users into user types and content items into content types. Moreover, each iteration of the scheme does not need to traverse (number of users) × (number of content items) combinations, so its iterative computation cost is much smaller than that of existing algorithms such as PLSA and LDA.
The above is only an overview of the technical solution of the present invention. To make the technical means of the present invention clearer so that it can be implemented according to the specification, and to make the above and other objects, features, and advantages of the present invention more apparent, specific embodiments of the present invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered limiting of the present invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 shows a schematic diagram of the user and content bi-clustering method used by an embodiment of the present invention;
Fig. 2 shows a flowchart of a method for classifying users and content according to an embodiment of the present invention;
Fig. 3 shows a structural diagram of an apparatus for classifying users and content according to an embodiment of the present invention;
Fig. 4 shows a comparison of the computation time of the bi-clustering algorithm used by an embodiment of the present invention and the PLSA algorithm; and
Fig. 5 is a block diagram of an example computing device arranged to implement the method for classifying users and content according to the present invention.
Detailed description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
The scheme used by the embodiments of the present invention performs bi-clustering analysis on website users and content. Its principle is as follows:
Users and content items are regarded as the two independent vertex sets of a bipartite graph, with users as the left vertex set L and content items as the right vertex set R, and with the visit capacity of a user to a content item as the weight of the corresponding edge. The goal is to cluster all users into Nl classes and all content items into Nr classes.
As shown in Fig. 1, the left diagram shows the state before clustering: users A, B, C, and D have visited content items X, Y, and Z, and each edge has a corresponding weight (1 in this example). The right diagram shows the state after clustering: users A and B are clustered into user class L, users C and D into user class M, content items X and Y into content class R, and content item Z alone into content class S. After clustering, a user's visits to content are attributed to visits by a user type to a content type.
For ease of understanding, the symbols used below are explained as follows:
pickone(S) denotes taking one element out of the set S; the element may be chosen at random.
D(F) denotes the domain of the mapping F, that is, the set of F's keys; R(F) denotes the range of the mapping F, that is, the set of F's values.
F(x) denotes the value in the range that the mapping F assigns to x in the domain, that is, the value corresponding to the key x.
F(x, ) or F( , x) denotes the sub-mapping that remains after one dimension of F's domain is fixed to x, that is, a partial function: a mapping from the unspecified parameters of the domain to the range.
argmax(F) denotes, for the mapping F, the value in the domain that corresponds to the maximum value in the range.
similarity(X, Y) denotes the similarity between two vectors X and Y.
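The notation above maps naturally onto dictionaries. The sketch below is one possible reading, assuming mappings are Python dicts and two-dimensional mappings are keyed by pairs; the helper name `fix` for the partial mapping is hypothetical.

```python
import random


def pickone(s, rng=random):
    """Take one element from a non-empty set, chosen at random."""
    return rng.choice(sorted(s))


def D(F):
    """Domain of mapping F: the set of its keys."""
    return set(F)


def R(F):
    """Range of mapping F: the set of its values."""
    return set(F.values())


def fix(F, first=None, second=None):
    """Partial mapping F(x, ) or F( , x) of a mapping keyed by pairs."""
    if first is not None:
        return {b: v for (a, b), v in F.items() if a == first}
    return {a: v for (a, b), v in F.items() if b == second}


def argmax(F):
    """Key of F whose value is largest."""
    return max(F, key=F.get)


F = {("u1", "a1"): 10, ("u1", "a2"): 2, ("u7", "a1"): 1}
print(fix(F, first="u1"))            # F(u1, ): visits of u1 per content item
print(argmax(fix(F, second="a1")))   # the user with the most visits to a1
```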
Fig. 2 shows a flowchart of a method for classifying users and content according to an embodiment of the present invention. The method is executed in a computing device and is adapted to cluster the users in a user set into a first predetermined number of user types and the content items in a content set into a second predetermined number of content types.
Referring to Fig. 2, the method starts at step S202 (initialization step). In step S202, one or more users in the user set are assigned to each of the first predetermined number of user types, and one or more content items in the content set are assigned to each of the second predetermined number of content types.
Specifically, according to existing mappings between users and user types, each user type that already has one or more users may be assigned those users, and each user type that has no user may be randomly assigned a user that has no user type; according to existing mappings between content items and content types, each content type that already has one or more content items may be assigned those content items, and each content type that has no content may be randomly assigned a content item that has no content type.
Let U be the user set containing all users to be clustered, and let A be the content set containing all content items to be clustered. The visit capacity mapping between the users in U and the content items in A is F_UA, where F_UA = {(u, a) -> f_ua | u ∈ U, a ∈ A, f_ua > 0}. In this mapping, (u, a) -> f_ua indicates that the visit capacity of user u to content item a is f_ua.
For example, U = {u1, u2, u3, u4, u5, u6, u7, u8, u9, u10};
A = {a1, a2, a3, a4, a5, a6, a7, a8, a9, a10};
F_UA = {(u6,a3)->4, (u5,a5)->8, (u9,a1)->8, (u7,a5)->7, (u7,a3)->2, (u8,a1)->3, (u9,a6)->8, (u4,a2)->8, (u8,a4)->10, (u1,a2)->2, (u8,a9)->2, (u10,a10)->4, (u4,a9)->10, (u1,a1)->10, (u2,a3)->5, (u10,a3)->8, (u5,a7)->9, (u3,a3)->3, (u4,a6)->6, (u7,a2)->4, (u4,a5)->10, (u7,a8)->3, (u9,a7)->3, (u1,a6)->2, (u3,a8)->9, (u4,a6)->3, (u7,a1)->1, (u7,a9)->9, (u5,a9)->6, (u3,a4)->8}.
The visit capacity mapping F_UA corresponds to the two-dimensional table below, in which rows are users, columns are content items, and each value is a visit capacity:
user | a1 | a2 | a3 | a4 | a5 | a6 | a7 | a8 | a9 | a10
u1   | 10 |  2 |    |    |    |  2 |    |    |    |
u2   |    |    |  5 |    |    |    |    |    |    |
u3   |    |    |  3 |  8 |    |    |    |  9 |    |
u4   |    |  8 |    |    | 10 |  6 |    |  3 | 10 |
u5   |    |    |    |    |  8 |    |  9 |    |  6 |
u6   |    |    |  4 |    |    |    |    |    |    |
u7   |  1 |  4 |  2 |    |  7 |    |    |  3 |  9 |
u8   |  3 |    |    | 10 |    |    |    |    |  2 |
u9   |  8 |    |    |    |    |  8 |  3 |    |    |
u10  |    |    |  8 |    |    |    |    |    |    |  4
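Because most user and content pairs have no visits, the mapping is naturally sparse. The sketch below stores a few of the example's entries in a dictionary keyed by (user, content item) pairs and expands one user into a table row; the dictionary layout and the helper name `table_row` are assumptions, and only a subset of the example data is shown.

```python
# A subset of the example's F_UA, keyed by (user, content item).
F_UA = {
    ("u1", "a1"): 10, ("u1", "a2"): 2, ("u1", "a6"): 2,
    ("u2", "a3"): 5,
    ("u3", "a3"): 3, ("u3", "a4"): 8, ("u3", "a8"): 9,
}

CONTENT_ITEMS = [f"a{i}" for i in range(1, 11)]  # a1 .. a10


def table_row(user):
    """Expand one user's sparse entries into a dense row, using 0 for no visits."""
    return [F_UA.get((user, a), 0) for a in CONTENT_ITEMS]


for user in ("u1", "u2", "u3"):
    print(user, table_row(user))
```

Only pairs with a positive visit capacity are stored, which matches the condition f_ua > 0 in the definition of F_UA.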
Let G be the user type set containing the first predetermined number of user types, and let C be the content type set containing the second predetermined number of content types. The first predetermined number is the number of classes into which all users in the user set U are to be clustered, and the second predetermined number is the number of classes into which all content items in the content set A are to be clustered. For example, if the first and second predetermined numbers are both 3, then G = {g1, g2, g3} and C = {c1, c2, c3}.
In the embodiments of the present invention, when users and content are classified, the user type a user belongs to is unique (a user can correspond to only one user type), and the content type a content item belongs to is unique (a content item can correspond to only one content type).
In the initialization step, there may be no prior conditions at all, that is, no user type has any user and no content type has any content. Alternatively, there may be some prior conditions, obtained by manually classifying some of the users and content items (the user type assigned to a user in this way is called the user's initial user type, and the content type assigned to a content item is called the content item's initial content type). For example, some user types already have one or more users, and/or some content types already have one or more content items.
Let U_0 be the set of users that have an initial user type, and let G_0 be the mapping from the users in U_0 to user types, where G_0 = {u -> g | u ∈ U_0, g ∈ G}; in this mapping, u -> g indicates that user u belongs to user type g. Let A_0 be the set of content items that have an initial content type, and let C_0 be the mapping from the content items in A_0 to content types, where C_0 = {a -> c | a ∈ A_0, c ∈ C}; in this mapping, a -> c indicates that content item a belongs to content type c. For example, U_0 = {u1, u3}, G_0 = {u1->g1, u3->g2}, A_0 = {a1, a3}, C_0 = {a1->c1, a3->c2}.
Let G_U be the mapping from users to user types, where G_U = {u -> g | u ∈ U, g ∈ G}, and initialize it as G_U = G_0. Let C_A be the mapping from content items to content types, where C_A = {a -> c | a ∈ A, c ∈ C}, and initialize it as C_A = C_0. Continuing the example above, G_U = {u1->g1, u3->g2} and C_A = {a1->c1, a3->c2}.
Then, a user is randomly selected for each user type that has no user, and a content item is randomly selected for each content type that has no content, as follows:
(1) Randomly assign a user that has no user type to each user type that has no user. The pseudocode is as follows:
G_U + {pickone(U - D(G_U)) -> pickone(G - R(G_U))} => G_U
Continuing the example above, g3 has no user; user u5 is assigned to it, so G_U = {u1->g1, u3->g2, u5->g3}.
(2) Randomly assign a content item that has no content type to each content type that has no content. The pseudocode is as follows:
C_A + {pickone(A - D(C_A)) -> pickone(C - R(C_A))} => C_A
Continuing the example above, c3 has no content; content item a5 is assigned to it, so C_A = {a1->c1, a3->c2, a5->c3}.
In this way, every user type in the user type set G has a user, and every content type in the content type set C has a content item. However, after the initialization step, not every user in the user set U necessarily has a user type, and not every content item in the content set A necessarily has a content type.
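The initialization described above can be sketched as follows, assuming the mappings are dictionaries. The function name `initialize` and the seeded random generator are illustrative choices; the sketch walks through the empty types in order instead of picking an empty type at random, which reaches the same end state of one item per empty type.

```python
import random


def initialize(items, types, prior, rng):
    """Extend `prior` (item -> type) so that every type has at least one item."""
    assignment = dict(prior)
    empty_types = sorted(set(types) - set(assignment.values()))
    for t in empty_types:
        unassigned = sorted(set(items) - set(assignment))
        if not unassigned:
            break  # nothing left to hand out
        assignment[rng.choice(unassigned)] = t
    return assignment


U = [f"u{i}" for i in range(1, 11)]
G = ["g1", "g2", "g3"]
G_0 = {"u1": "g1", "u3": "g2"}  # prior mapping from the example

G_U = initialize(U, G, G_0, random.Random(0))
print(G_U)  # g3 receives one randomly chosen user that had no type
```

The same function can be applied to content items and content types to build C_A from C_0.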
After the initialization step, the method proceeds to step S204 (visit capacity calculation step). In step S204, based on the visit capacity of each user in the user set to each content item in the content set, the following are calculated: the visit capacity of each user type in the user type set to each content item in the content set (called the first visit capacity), the visit capacity of each user in the user set to each content type in the content type set (called the second visit capacity), and the visit capacity of each user type in the user type set to each content type in the content type set (called the third visit capacity).
The visit capacity of a user type to a content item may be calculated as follows: first, obtain all users included in the user type; then, obtain the visit capacity of each user in the user type to the content item; finally, sum all the obtained visit capacities to obtain the visit capacity of the user type to the content item. The pseudocode is as follows:
F_GA = {(g, a) -> f_ga | f_ga = ∑ F_UA(u, a), G_U(u) = g}
where F_GA denotes the visit capacity mapping between the user types in the user type set G and the content items in the content set A; in this mapping, (g, a) -> f_ga indicates that the visit capacity of user type g to content item a is f_ga.
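The first visit capacity can be computed in a single pass over the sparse user-to-content mapping, as in the sketch below. The function name `type_to_content` is hypothetical, and the data is a subset of the example's F_UA (the rows of u1, u2, u3, and u5) together with the initial mapping G_U from the example.

```python
from collections import defaultdict


def type_to_content(F_UA, G_U):
    """F_GA: sum each user's visits into the user type that user belongs to."""
    F_GA = defaultdict(int)
    for (u, a), visits in F_UA.items():
        g = G_U.get(u)
        if g is not None:  # users without a type do not contribute
            F_GA[(g, a)] += visits
    return dict(F_GA)


F_UA = {
    ("u1", "a1"): 10, ("u1", "a2"): 2, ("u1", "a6"): 2,
    ("u2", "a3"): 5,
    ("u3", "a3"): 3, ("u3", "a4"): 8, ("u3", "a8"): 9,
    ("u5", "a5"): 8, ("u5", "a7"): 9, ("u5", "a9"): 6,
}
G_U = {"u1": "g1", "u3": "g2", "u5": "g3"}

F_GA = type_to_content(F_UA, G_U)
print(F_GA)
```

The pass touches only the non-zero entries of F_UA, which is in line with the stated aim of not traversing every user and content item pair.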
The visit capacity of a user to a content type may be calculated as follows: first, obtain all content items included in the content type; then, obtain the visit capacity of the user to each content item in the content type; finally, sum all the obtained visit capacities to obtain the visit capacity of the user to the content type. The pseudocode is as follows:
F_UC = {(u, c) -> f_uc | f_uc = ∑ F_UA(u, a), C_A(a) = c}
where F_UC denotes the visit capacity mapping between the users in the user set U and the content types in the content type set C; in this mapping, (u, c) -> f_uc indicates that the visit capacity of user u to content type c is f_uc.
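The second visit capacity mirrors the first, grouping by content type instead of user type. In the sketch below the function name `user_to_type` is hypothetical; the data is the subset of the example's F_UA that involves a1, a3, and a5, plus two other entries to show that content items without a type are skipped.

```python
from collections import defaultdict


def user_to_type(F_UA, C_A):
    """F_UC: sum each user's visits over the content items of each content type."""
    F_UC = defaultdict(int)
    for (u, a), visits in F_UA.items():
        c = C_A.get(a)
        if c is not None:  # content items without a type do not contribute
            F_UC[(u, c)] += visits
    return dict(F_UC)


F_UA = {
    ("u1", "a1"): 10, ("u7", "a1"): 1, ("u8", "a1"): 3, ("u9", "a1"): 8,
    ("u2", "a3"): 5, ("u3", "a3"): 3, ("u6", "a3"): 4, ("u7", "a3"): 2,
    ("u10", "a3"): 8, ("u4", "a5"): 10, ("u5", "a5"): 8, ("u7", "a5"): 7,
    ("u1", "a2"): 2, ("u5", "a7"): 9,  # a2 and a7 have no content type yet
}
C_A = {"a1": "c1", "a3": "c2", "a5": "c3"}

F_UC = user_to_type(F_UA, C_A)
print(F_UC)
```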
The visit capacity of a user type to a content type may be calculated as follows: first, obtain all users included in the user type and all content items included in the content type; then, obtain the visit capacity of each user in the user type to each content item in the content type; finally, sum all the obtained visit capacities to obtain the visit capacity of the user type to the content type. The pseudocode is as follows:
F_GC = {(g, c) -> f_gc | f_gc = ∑ F_UA(u, a) + α, G_U(u) = g, C_A(a) = c}
where F_GC denotes the visit capacity mapping between the user types in the user type set G and the content types in the content type set C; in this mapping, (g, c) -> f_gc indicates that the visit capacity of user type g to content type c is f_gc.
In addition, to make the iterative model more stable, the calculated visit capacity of a user type to a content type may be increased by α, where 0 ≤ α ≤ 1, and the increased value is used as the visit capacity of the user type to the content type. In the description below, α = 1.
Continuing the example above, user type g1 includes user u1, so the visit capacity of g1 to content item a1 is u1's visit capacity to a1, which is 10; the visit capacity of g1 to a2 is u1's visit capacity to a2, which is 2; and so on. This gives: F_GA = {(g1,a1)->10, (g1,a2)->2, (g1,a6)->2, (g2,a3)->3, (g2,a4)->8, (g2,a8)->9, (g3,a5)->8, (g3,a7)->9, (g3,a9)->6}.
Content type c1 includes content item a1, so the visit capacity of user u1 to c1 is u1's visit capacity to a1, which is 10. The visit capacity of user u2 to c1 would be u2's visit capacity to a1; since u2 has not visited a1, no entry is recorded. Continuing in this way gives: F_UC = {(u1,c1)->10, (u2,c2)->5, (u3,c2)->3, (u4,c3)->10, (u5,c3)->8, (u6,c2)->4, (u7,c1)->1, (u7,c2)->2, (u7,c3)->7, (u8,c1)->3, (u9,c1)->8, (u10,c2)->8}.
User type g1 includes user u1 and content type c1 includes content item a1, so the visit capacity of g1 to c1 is u1's visit capacity to a1, 10, plus 1, which gives 11. The visit capacity of g1 to c2 is u1's visit capacity to a3; there are no such visits, so it is 0 plus 1, which gives 1. Continuing in this way gives: F_GC = {(g1,c1)->11, (g1,c2)->1, (g1,c3)->1, (g2,c1)->1, (g2,c2)->4, (g2,c3)->1, (g3,c1)->1, (g3,c2)->1, (g3,c3)->9}.
After the visit capacity calculation step, the method proceeds to step S206 (similarity calculation step). In step S206, the similarity between each user in the user set and each user type in the user type set is calculated from the second and third visit capacities, and the similarity between each content item in the content set and each content type in the content type set is calculated from the first and third visit capacities.
In one implementation (hereinafter, mode 1), for a given user in the user set, the visit capacities of that user to each content type in the content type set are obtained, giving one visit capacity vector; for a given user type in the user type set, the visit capacities of that user type to each content type in the content type set are obtained, giving another visit capacity vector. The similarity of the two visit capacity vectors is then calculated and used as the similarity between the user and the user type.
For a given content item in the content set, the visit capacities of each user type in the user type set to that content item are obtained, giving one visit capacity vector; for a given content type in the content type set, the visit capacities of each user type in the user type set to that content type are obtained, giving another visit capacity vector. The similarity of the two visit capacity vectors is then calculated and used as the similarity between the content item and the content type.
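The user side of mode 1 can be sketched as below, using cosine similarity as one of the measures the text allows. The function names are hypothetical, and the data consists of the example's F_GC values and the F_UC entries of users u2 and u7; the content side is analogous, with vectors indexed by user type.

```python
import math

CONTENT_TYPES = ["c1", "c2", "c3"]
USER_TYPES = ["g1", "g2", "g3"]

# Second visit capacity (user -> content type) for two users of the example.
F_UC = {("u2", "c2"): 5, ("u7", "c1"): 1, ("u7", "c2"): 2, ("u7", "c3"): 7}
# Third visit capacity (user type -> content type) from the example, alpha = 1.
F_GC = {
    ("g1", "c1"): 11, ("g1", "c2"): 1, ("g1", "c3"): 1,
    ("g2", "c1"): 1, ("g2", "c2"): 4, ("g2", "c3"): 1,
    ("g3", "c1"): 1, ("g3", "c2"): 1, ("g3", "c3"): 9,
}


def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def user_type_similarities(user):
    """Similarity between one user and every user type over the content types."""
    user_vec = [F_UC.get((user, c), 0) for c in CONTENT_TYPES]
    return {
        g: cosine(user_vec, [F_GC.get((g, c), 0) for c in CONTENT_TYPES])
        for g in USER_TYPES
    }


def best_user_type(user):
    sims = user_type_similarities(user)
    return max(sims, key=sims.get)  # the argmax step of the classification


print(best_user_type("u2"), best_user_type("u7"))
```

Here u2, who only visits content of type c2, lands in g2, and u7, whose visits are dominated by c3, lands in g3.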
In mode 1, the similarity is calculated directly from the visit capacities. In another implementation (hereinafter, mode 2), to improve the accuracy of the algorithm, various visit capacity ratios are first calculated from the visit capacities, and the similarity is then calculated from these ratios, including:
A) calculating the similarity between each user and each user type from the visit capacity ratios of each user to each content type and the visit capacity ratios of each user type to each content type;
B) calculating the similarity between each content item and each content type from the visit capacity ratios, for each content item, of each user type that has visited it, and the visit capacity ratios of each user type to each content type.
Therefore, these visit capacity ratios need to be calculated before the similarity is calculated.
The pseudocode for calculating, for each content item, the visit count ratio of each user type that has visited the content item is as follows:
P_GA = {(g, a) -> p_ga | p_ga = F_GA(g, a) / Σ F_GA(·, a)}
where P_GA denotes the mapping from each user type in the user type set G and each content item in the content set A to a visit count ratio. In this mapping, (g, a) -> p_ga means that, among all user types that have visited content item a, the visit count ratio of user type g for content item a is p_ga.
The pseudocode for calculating, for each user, the user's visit count ratio for each content type is as follows:
P_UC = {(u, c) -> p_uc | p_uc = F_UC(u, c) / Σ F_UC(u, ·)}
where P_UC denotes the mapping from each user in the user set U and each content type in the content type set C to a visit count ratio. In this mapping, (u, c) -> p_uc means that, among all content types visited by user u, the visit count ratio of user u for content type c is p_uc.
The pseudocode for calculating, for each user type, the user type's visit count ratio for each content type is as follows:
P_GC = {(g, c) -> p_gc | p_gc = F_GC(g, c) / Σ F_GC(g, ·)}
where P_GC denotes one mapping from each user type in the user type set G and each content type in the content type set C to a visit count ratio. In this mapping, (g, c) -> p_gc means that, among all content types visited by user type g, the visit count ratio of user type g for content type c is p_gc.
The pseudocode for calculating, for each content type, the visit count ratio of each user type that has visited the content type is as follows:
Q_GC = {(g, c) -> q_gc | q_gc = F_GC(g, c) / Σ F_GC(·, c)}
where Q_GC denotes another mapping from each user type in the user type set G and each content type in the content type set C to a visit count ratio. In this mapping, (g, c) -> q_gc means that, among all user types that have visited content type c, the visit count ratio of user type g for content type c is q_gc.
According to the example above, the calculation gives:
P_GA = {(g1, a1) -> 1, (g1, a2) -> 1, (g2, a3) -> 1, (g2, a4) -> 1, (g3, a5) -> 1, (g1, a6) -> 1, (g3, a7) -> 1, (g2, a8) -> 1, (g3, a9) -> 1};
P_UC = {(u1, c1) -> 1, (u2, c2) -> 1, (u3, c2) -> 1, (u4, c3) -> 1, (u5, c3) -> 1, (u6, c2) -> 1, (u7, c1) -> 0.1, (u7, c2) -> 0.2, (u7, c3) -> 0.7, (u8, c1) -> 1, (u9, c1) -> 1, (u10, c2) -> 1};
P_GC = {(g1, c1) -> 0.85, (g1, c2) -> 0.077, (g1, c3) -> 0.077, (g2, c1) -> 0.17, (g2, c2) -> 0.67, (g2, c3) -> 0.17, (g3, c1) -> 0.091, (g3, c2) -> 0.091, (g3, c3) -> 0.82};
Q_GC = {(g1, c1) -> 0.85, (g2, c1) -> 0.077, (g3, c1) -> 0.077, (g1, c2) -> 0.17, (g2, c2) -> 0.67, (g3, c2) -> 0.17, (g1, c3) -> 0.091, (g2, c3) -> 0.091, (g3, c3) -> 0.82}.
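The ratio mappings above amount to dividing each count by a row total or a column total. The sketch below reproduces the P_GC and Q_GC values of the example from the F_GC counts given earlier; the helper names `row_ratios` and `column_ratios` are illustrative and do not come from the patent.

```python
from collections import defaultdict

# Third visit counts F_GC from the worked example.
F_GC = {
    ("g1", "c1"): 11, ("g1", "c2"): 1, ("g1", "c3"): 1,
    ("g2", "c1"): 1, ("g2", "c2"): 4, ("g2", "c3"): 1,
    ("g3", "c1"): 1, ("g3", "c2"): 1, ("g3", "c3"): 9,
}


def row_ratios(counts):
    """F(k1, k2) / sum of F(k1, .): share of each second key within a first key."""
    totals = defaultdict(float)
    for (k1, _), v in counts.items():
        totals[k1] += v
    return {(k1, k2): v / totals[k1] for (k1, k2), v in counts.items() if totals[k1]}


def column_ratios(counts):
    """F(k1, k2) / sum of F(., k2): share of each first key within a second key."""
    totals = defaultdict(float)
    for (_, k2), v in counts.items():
        totals[k2] += v
    return {(k1, k2): v / totals[k2] for (k1, k2), v in counts.items() if totals[k2]}


P_GC = row_ratios(F_GC)     # user type's share of visits per content type
Q_GC = column_ratios(F_GC)  # content type's share of visits per user type
```

Under the same definitions, P_UC would be `row_ratios` applied to the second visit counts and P_GA would be `column_ratios` applied to the first visit counts.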
In addition, in both mode 1 and mode 2, for each user covered by the existing mapping G_0 between users and user types (that is, each user with an initial user type), the similarity between that user and each user type is not calculated in the similarity calculation step, and the user's user type is not changed in the subsequent classification step. Likewise, for each content item covered by the existing mapping C_0 between content and content types (that is, each content item with an initial content type), the similarity between that content item and each content type is not calculated in the similarity calculation step, and the content item's content type is not changed in the subsequent classification step.
After the similarity calculation step, the method proceeds to step S208 (the classification step). In step S208, for each user in the user set (except users with an initial user type), the user type with the highest similarity to the user is selected as the user's user type; for each content item in the content set (except content items with an initial content type), the content type with the highest similarity to the content item is selected as the content item's content type.
For the above steps, the pseudocode for calculating the similarity between each user in the user set and each user type in the user type set, and for selecting a user type for each user according to that similarity, is not reproduced in this text. In that pseudocode, S_G denotes the set of similarities between the current user u and the user types g ∈ G.
According to the example above, with mode 2 the calculation gives: G_U = {u1 -> g1, u2 -> g2, u3 -> g2, u4 -> g3, u5 -> g3, u6 -> g2, u7 -> g3, u8 -> g1, u9 -> g1, u10 -> g2}.
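The selection for a single user can be sketched as follows, using the rounded ratios for u7 and for the three user types from the example. The text does not say which similarity measure produced the example result, so the minimum-based measure used here (taken as the sum of element-wise minima) is an assumption; the names `min_similarity` and `chosen` are illustrative.

```python
# Rounded visit count ratios from the worked example.
P_GC = {
    "g1": {"c1": 0.85, "c2": 0.077, "c3": 0.077},
    "g2": {"c1": 0.17, "c2": 0.67, "c3": 0.17},
    "g3": {"c1": 0.091, "c2": 0.091, "c3": 0.82},
}
P_UC_u7 = {"c1": 0.1, "c2": 0.2, "c3": 0.7}


def min_similarity(p, q):
    """Assumed minimum-based similarity: sum of element-wise minima."""
    return sum(min(p.get(k, 0.0), q.get(k, 0.0)) for k in set(p) | set(q))


# S_G: similarity between the current user and every user type.
S_G = {g: min_similarity(P_UC_u7, ratios) for g, ratios in P_GC.items()}

# The user type with the highest similarity becomes the user's type.
chosen = max(S_G, key=S_G.get)
```

With these inputs the sketch selects g3 for u7, which agrees with the entry for u7 in the example result.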
For the above steps, the pseudocode for calculating the similarity between each content item in the content set and each content type in the content type set, and for selecting a content type for each content item according to that similarity, is not reproduced in this text. In that pseudocode, S_C denotes the set of similarities between the current content item a and the content types c ∈ C.
According to the example above, with mode 2 the calculation gives: C_A = {a1 -> c1, a2 -> c1, a3 -> c2, a4 -> c2, a5 -> c3, a6 -> c1, a7 -> c3, a8 -> c2, a9 -> c3}.
There are many algorithms for calculating the similarity between two vectors, and those skilled in the art can choose among them as needed. In addition, in embodiments of the present invention each visit count vector is a sparse vector. To reduce the amount of computation and the storage overhead during computation, in the similarity calculation step the domains of the two vectors (each vector being stored as a mapping) may first be merged, by taking either their intersection or their union, before the similarity of the two vectors is calculated.
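A minimal sketch of the domain merge, assuming each sparse vector is stored as a Python dict from key to value (the function name `align` and the sample values are illustrative):

```python
def align(x, y, mode="union"):
    """Return dense value lists for sparse vectors x and y over a merged domain."""
    keys = set(x) & set(y) if mode == "intersection" else set(x) | set(y)
    ordered = sorted(keys)
    return [x.get(k, 0.0) for k in ordered], [y.get(k, 0.0) for k in ordered]


# Two sparse vectors that share only the key "c1".
x = {"c1": 0.1, "c3": 0.9}
y = {"c1": 0.85, "c2": 0.15}

xs_union, ys_union = align(x, y, "union")
xs_inter, ys_inter = align(x, y, "intersection")

# Element-wise products vanish wherever either vector is zero, so the dot
# product is the same over the intersection as over the union.
dot_union = sum(a * b for a, b in zip(xs_union, ys_union))
dot_inter = sum(a * b for a, b in zip(xs_inter, ys_inter))
```

The intersection is enough for quantities built from element-wise products or minima, while quantities such as a vector's own norm or sum still need every nonzero element of that vector.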
Three algorithms for calculating the similarity between two vectors are given below. Let x and y be n-dimensional vectors (n > 0) whose components are x_1, x_2, …, x_n and y_1, y_2, …, y_n, respectively.
The sum of the components of each vector is:
Σx = Σ_i x_i = x_1 + x_2 + … + x_n
Σy = Σ_i y_i = y_1 + y_2 + … + y_n
The vectors are normalized. The normalized vectors p and q have components p_1, p_2, …, p_n and q_1, q_2, …, q_n, respectively, where:
p_i = x_i / Σx, q_i = y_i / Σy
(1) The similarity between vectors x and y may be the minimum-based similarity coefficient (similarity factor based on min). Its formula is not reproduced in this text; minsim denotes the similarity so calculated.
(2) The similarity between vectors x and y may be the Bhattacharyya coefficient. Its formula is not reproduced in this text; BC denotes the similarity so calculated.
(3) The similarity between vectors x and y may be the cosine similarity coefficient. Its formula is not reproduced in this text; cossim denotes the similarity so calculated.
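Because the three formulas did not survive in this text, the sketch below falls back on the standard definitions, which is an assumption rather than a quotation of the patent: minsim = Σ_i min(p_i, q_i), BC = Σ_i sqrt(p_i · q_i), and cossim = Σ_i x_i y_i / (sqrt(Σ_i x_i²) · sqrt(Σ_i y_i²)), with p and q the sum-normalized vectors defined above.

```python
import math


def normalize(x):
    """Divide each component by the component sum (all zeros if the sum is 0)."""
    total = sum(x)
    return [v / total for v in x] if total else [0.0] * len(x)


def minsim(x, y):
    """Minimum-based similarity of the normalized vectors (assumed definition)."""
    p, q = normalize(x), normalize(y)
    return sum(min(a, b) for a, b in zip(p, q))


def bhattacharyya(x, y):
    """Bhattacharyya coefficient of the normalized vectors."""
    p, q = normalize(x), normalize(y)
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def cossim(x, y):
    """Cosine similarity; it is scale-invariant, so no normalization is needed."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0
```

All three return 1 for vectors that are proportional to each other and 0 for non-negative vectors with no common nonzero component.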
After the classification step, the method proceeds to step S210. In step S210, it is determined whether a predetermined condition (the iteration termination condition) is met. If the predetermined condition is not met, the method returns to step S204 (enters the next iteration); that is, the visit count calculation step is triggered to recalculate the visit counts and the similarity calculation step is triggered to recalculate the similarities, after which the selection is performed again in the classification step. If the predetermined condition is met, the triggering is no longer performed (the algorithm ends and iteration stops), and the classification result of the classification step is output as the final result. The predetermined condition may be: the number of times the visit count calculation step and the similarity calculation step have been triggered reaches a preset number (for example, 30); or, comparing the current classification result with the previous one, the proportion of users whose user type has changed is below a preset first threshold (for example, 90%) and the proportion of content items whose content type has changed is below a preset second threshold (for example, 90%).
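The termination test can be sketched as a small predicate. The function and parameter names are illustrative; the defaults mirror the example values in the text (30 iterations, 90% thresholds), and the assignments are assumed to be dicts from user or content identifier to type.

```python
def changed_ratio(previous, current):
    """Fraction of entries whose assigned type differs from the previous iteration."""
    if not current:
        return 0.0
    changed = sum(1 for key, t in current.items() if previous.get(key) != t)
    return changed / len(current)


def should_stop(iteration, prev_users, cur_users, prev_content, cur_content,
                max_iterations=30, user_threshold=0.9, content_threshold=0.9):
    """True when either termination condition described in the text holds."""
    # Condition 1: the iteration budget is used up.
    if iteration >= max_iterations:
        return True
    # Condition 2: both change ratios are below their thresholds.
    return (changed_ratio(prev_users, cur_users) < user_threshold
            and changed_ratio(prev_content, cur_content) < content_threshold)
```

A driver loop would call `should_stop` after each classification step and, while it returns False, recompute the visit counts and similarities with the new assignments.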
According to the example above, the result of the third iteration is identical to that of the second, so the calculation can terminate. The result is G_U = {u1 -> g1, u2 -> g2, u3 -> g2, u4 -> g3, u5 -> g3, u6 -> g2, u7 -> g3, u8 -> g2, u9 -> g1, u10 -> g2}; C_A = {a1 -> c1, a2 -> c3, a3 -> c2, a4 -> c2, a5 -> c3, a6 -> c1, a7 -> c3, a8 -> c2, a9 -> c3, a10 -> c2}.
Fig. 3 shows a structural diagram of an apparatus for classifying users and content according to an embodiment of the present invention. The apparatus resides in a computing device and is adapted to cluster the users in a user set into a first predetermined number of user types and to cluster the content items in a content set into a second predetermined number of content types.
Referring to Fig. 3, the apparatus includes an initialization module 310, a visit count calculation module 320, a similarity calculation module 330 and a classification module 340.
The initialization module 310 is adapted to assign one or more users from the user set to each user type among the first predetermined number of user types, and to assign one or more content items from the content set to each content type among the second predetermined number of content types. The initialization may be performed without any prior conditions, that is, with no user type having any user and no content type having any content. Alternatively, some prior conditions may exist. The prior conditions are obtained by manually classifying some of the users and content (the user type assigned to a user in this way is called that user's initial user type, and the content type assigned to a content item is called that content item's initial content type); for example, some user types already have one or more users, and/or some content types already have one or more content items.
Therefore, the initialization module 310 may, according to the existing mapping between users and user types, assign one or more users to each user type that already has one or more users, and randomly assign to each user type that has no user a user that has no user type; and, according to the existing mapping between content and content types, assign one or more content items to each content type that already has one or more content items, and randomly assign to each content type that has no content a content item that has no content type.
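The initialization described above can be sketched as one helper that works for users and content alike. The function name `initialize`, its parameters and the sample data are assumptions for illustration; `prior` plays the role of the existing mapping and may be empty.

```python
import random


def initialize(type_ids, item_ids, prior=None, rng=None):
    """Give every type at least one item; `prior` maps item -> type."""
    rng = rng or random.Random()
    assignment = dict(prior or {})       # keep manually assigned items as they are
    seeded = set(assignment.values())    # types that already have an item
    unassigned = [i for i in item_ids if i not in assignment]
    rng.shuffle(unassigned)
    for t in type_ids:
        if t not in seeded:
            if not unassigned:
                raise ValueError("not enough unassigned items to seed every type")
            # Randomly pick an item that has no type yet for this empty type.
            assignment[unassigned.pop()] = t
    return assignment


users = [f"u{i}" for i in range(1, 11)]
G_0 = {"u1": "g1"}  # hypothetical prior: u1 was manually placed in g1
initial_users = initialize(["g1", "g2", "g3"], users, prior=G_0,
                           rng=random.Random(0))
```

Items left out of the returned mapping have no type yet and receive one in the first classification step.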
The visit count calculation module 320 is adapted to calculate, according to the users' visit counts for content, the first visit count of each user type for each content item, the second visit count of each user for each content type, and the third visit count of each user type for each content type.
The visit count calculation module 320 may calculate the visit count of a given user type for a given content item as follows: first, all users included in the user type are obtained; then, the visit count of each of those users for the content item is obtained; finally, all the obtained visit counts are summed to give the visit count of the user type for the content item.
The visit count calculation module 320 may calculate the visit count of a given user for a given content type as follows: first, all content items included in the content type are obtained; then, the visit count of the user for each of those content items is obtained; finally, all the obtained visit counts are summed to give the visit count of the user for the content type.
The visit count calculation module 320 may calculate the visit count of a given user type for a given content type as follows: first, all users included in the user type and all content items included in the content type are obtained; then, the visit count of each of those users for each of those content items is obtained; finally, all the obtained visit counts are summed (or summed and then increased by α, where 0 ≤ α ≤ 1) to give the visit count of the user type for the content type.
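The three aggregations can be sketched in a single pass over the raw visit counts. Only the count of 10 for u1 on a1 comes from the worked example; the other counts, the assignments and the name `aggregate` are hypothetical. Using α = 1 corresponds to the "plus 1" step of the worked example.

```python
from collections import defaultdict


def aggregate(F_UA, user_type, content_type, alpha=1.0):
    """Return (F_GA, F_UC, F_GC) from raw user -> content visit counts."""
    F_GA = defaultdict(float)  # first visit count: user type -> content item
    F_UC = defaultdict(float)  # second visit count: user -> content type
    F_GC = defaultdict(float)  # third visit count: user type -> content type
    for (u, a), n in F_UA.items():
        g = user_type.get(u)
        c = content_type.get(a)
        if g is not None:
            F_GA[(g, a)] += n
        if c is not None:
            F_UC[(u, c)] += n
        if g is not None and c is not None:
            F_GC[(g, c)] += n
    # Add alpha to every (user type, content type) pair, including unseen ones.
    smoothed = {
        (g, c): F_GC.get((g, c), 0.0) + alpha
        for g in set(user_type.values())
        for c in set(content_type.values())
    }
    return dict(F_GA), dict(F_UC), smoothed


F_UA = {("u1", "a1"): 10, ("u2", "a3"): 3, ("u7", "a1"): 1}  # partly hypothetical
user_type = {"u1": "g1", "u2": "g2"}
content_type = {"a1": "c1", "a3": "c2"}
F_GA, F_UC, F_GC = aggregate(F_UA, user_type, content_type)
```

With these inputs the pair (g1, c1) gets 10 + 1 = 11 and the unvisited pair (g1, c2) gets 0 + 1 = 1, matching the first two entries of the worked example.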
The similarity calculation module 330 is adapted to calculate the similarity between each user and each user type according to the second visit count and the third visit count, and to calculate the similarity between each content item and each content type according to the first visit count and the third visit count. For a user covered by the existing mapping between users and user types, the similarity calculation module 330 does not calculate the similarity between that user and each user type; for a content item covered by the existing mapping between content and content types, the similarity calculation module 330 does not calculate the similarity between that content item and each content type.
In one implementation, for a given user in the user set, the user's visit count for each content type in the content type set is obtained, yielding one visit count vector; for a given user type in the user type set, the user type's visit count for each content type in the content type set is obtained, yielding another visit count vector. The similarity of these two visit count vectors is then calculated and taken as the similarity between the user and the user type.
For a given content item in the content set, the visit count of each user type in the user type set for that content item is obtained, yielding one visit count vector; for a given content type in the content type set, the visit count of each user type in the user type set for that content type is obtained, yielding another visit count vector. The similarity of these two visit count vectors is then calculated and taken as the similarity between the content item and the content type.
In another implementation, to improve the accuracy of the algorithm, several visit count ratios are additionally calculated from the visit counts, and similarity is calculated from these ratios, as follows:
a) the similarity between each user and each user type is calculated from, for each user, the user's visit count ratio for each content type, and, for each user type, the user type's visit count ratio for each content type;
b) the similarity between each content item and each content type is calculated from, for each content item, the visit count ratio of each user type that has visited the content item, and, for each user type, the user type's visit count ratio for each content type.
There are many algorithms for calculating the similarity between two vectors, and those skilled in the art can choose among them as needed. For example, the similarity may be the minimum-based similarity coefficient, the Bhattacharyya coefficient or the cosine similarity coefficient. In addition, in embodiments of the present invention each visit count vector is a sparse vector. To reduce the amount of computation and the storage overhead during computation, before calculating the similarity of two vectors the similarity calculation module 330 may first merge the domains of the two vectors (each vector being stored as a mapping), by taking either their intersection or their union, and then calculate the similarity of the two vectors.
The classification module 340 is adapted to select, for each user, the user type with the highest similarity to the user as the user's user type, and, for each content item, the content type with the highest similarity to the content item as the content item's content type. It then triggers the visit count calculation module 320 to recalculate the visit counts and the similarity calculation module 330 to recalculate the similarities, after which it performs the selection again; once a predetermined condition is met, the triggering is no longer performed. For a user covered by the existing mapping between users and user types, the classification module 340 does not change the user's user type; for a content item covered by the existing mapping between content and content types, the classification module 340 does not change the content item's content type.
The predetermined condition may be: the number of times the visit count calculation module 320 and the similarity calculation module 330 have been triggered reaches a preset number; or, comparing the current classification result with the previous one, the proportion of users whose user type has changed is below a preset first threshold and the proportion of content items whose content type has changed is below a preset second threshold.
In the scheme for classifying users and content according to the present invention, double clustering (co-clustering) analysis is performed on website users and content. It is not necessary to know many attributes of the content; the visit count of each user for each content item is enough to classify users and content simultaneously in a single process, grouping users into user types and content into content types. Moreover, the per-iteration computation of the scheme of the present invention is much smaller than that of existing algorithms such as PLSA and LDA.
The computational complexity of the scheme for classifying users and content according to embodiments of the present invention (the double clustering algorithm) is analysed as follows:
Let the number of connections between users and content be L, the number of user types be S and the number of content types be T; the computation per iteration is then O(L*(S+T)). For a typical website, the number of users M and the number of content items N are both large, while the connections between users and content are sparse. The double clustering algorithm does not need to traverse all M × N user-content pairs, so the amount of computation is modest. By contrast, the computation per iteration of PLSA and LDA is O(M*N*S); PLSA and LDA have only one intermediate layer, so S = T can be assumed. When L is far smaller than M*N, the double clustering algorithm shows a clear advantage.
The effect achievable by embodiments of the present invention is illustrated by the following example:
The double clustering method of the embodiment of the present invention was applied to 672069 users, 722618 content items and 259255531 user visits to content. Clustering into 500 classes took 22 minutes in total, and clustering into 10000 classes took 4451 minutes. Running PLSA on the same data took 485 minutes for 500-dimensional vectors and ran out of memory for 10000 dimensions; by linear extrapolation, the latter would take about 9704 minutes. When clustering into 500 classes, the computational efficiency of the invention is about 22 times that of PLSA; when clustering into 10000 classes, it is about 2.2 times that of PLSA. The efficiency gain is more pronounced when the number of clusters differs greatly in order of magnitude from the size of the original data. The comparison is shown in the bar chart of Fig. 4.
Note: the method described here was implemented on the Spark computing platform, whereas PLSA was implemented as a C program with MPI. In terms of code execution efficiency the C program should be faster, so the actual improvement in algorithmic efficiency should be greater than the figures given here.
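The figures quoted above can be checked with a few lines of arithmetic. This is only an indicative calculation: big-O expressions hide constant factors, so the ratio of operation counts is not a prediction of running time, and S = T = 500 is taken from the 500-class experiment.

```python
M = 672_069        # users
N = 722_618        # content items
L = 259_255_531    # user-content visits (connections)
S = T = 500        # user types and content types in the 500-class run

# Per-iteration operation counts implied by the stated complexities.
double_clustering_ops = L * (S + T)
plsa_ops = M * N * S

# How sparse the user-content matrix is, and how the two counts compare.
density = L / (M * N)
ops_ratio = plsa_ops / double_clustering_ops

# Speed-ups implied by the reported running times (in minutes).
speedup_500 = 485 / 22
speedup_10000 = 9704 / 4451

print(f"density={density:.2e}, ops ratio={ops_ratio:.0f}")
print(f"speed-up at 500 classes: {speedup_500:.1f}x")
print(f"speed-up at 10000 classes: {speedup_10000:.1f}x")
```

The measured speed-ups are far below the ratio of operation counts, which fits the remark in the text that the two implementations ran on different platforms.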
Fig. 5 is a block diagram of an example computing device 900 arranged to implement the method for classifying users and content according to the present invention.
In a basic configuration 902, the computing device 900 typically includes a system memory 906 and one or more processors 904. A memory bus 908 may be used for communication between the processors 904 and the system memory 906.
Depending on the desired configuration, the processor 904 may be any type of processor, including but not limited to a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP) or any combination thereof. The processor 904 may include one or more levels of cache, such as a level-one cache 910 and a level-two cache 912, a processor core 914 and registers 916. An example processor core 914 may include an arithmetic logic unit (ALU), a floating-point unit (FPU), a digital signal processing core (DSP core) or any combination thereof. An example memory controller 918 may be used together with the processor 904, or in some implementations the memory controller 918 may be an internal part of the processor 904.
Depending on the desired configuration, the system memory 906 may be any type of memory, including but not limited to volatile memory (such as RAM), non-volatile memory (such as ROM or flash memory) or any combination thereof. The system memory 906 may include an operating system 920, one or more applications 922 and program data 924. The application 922 may include an apparatus 926 for classifying users and content, arranged to implement the method for classifying users and content. The program data 924 may include the users' visit counts 928 for content, usable as described herein. In some embodiments, the application 922 may be arranged to operate on the operating system using the program data 924.
The computing device 900 may also include an interface bus 940 that facilitates communication from various interface devices (for example, output devices 942, peripheral interfaces 944 and communication devices 946) to the basic configuration 902 via a bus/interface controller 930. Example output devices 942 include a graphics processing unit 948 and an audio processing unit 950, which may be configured to facilitate communication with various external devices such as a display or speakers via one or more A/V ports 952. Example peripheral interfaces 944 may include a serial interface controller 954 and a parallel interface controller 956, which may be configured to facilitate communication, via one or more I/O ports 958, with external devices such as input devices (for example, a keyboard, mouse, pen, voice input device or touch input device) or other peripherals (for example, a printer or scanner). An example communication device 946 may include a network controller 960, which may be arranged to facilitate communication with one or more other computing devices 962 over a network communication link via one or more communication ports 964.
The network communication link may be one example of a communication medium. Communication media may typically be embodied as computer-readable instructions, data structures or program modules in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media. A "modulated data signal" may be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of non-limiting example, communication media may include wired media such as a wired network or a dedicated-line network, and various wireless media such as acoustic, radio frequency (RF), microwave, infrared (IR) or other wireless media. The term computer-readable media as used herein may include both storage media and communication media.
The computing device 900 may be implemented as part of a small-form-factor portable (or mobile) electronic device, such as a cell phone, a personal digital assistant (PDA), a personal media player device, a wireless web-browsing device, a personal headset device, an application-specific device or a hybrid device that includes any of the above functions. The computing device 900 may also be implemented as a personal computer, including both desktop and notebook computer configurations.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system or other apparatus. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct such systems is apparent. Moreover, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein may be implemented in a variety of programming languages, and that the above description of a specific language is given in order to disclose the best mode of the invention.
Numerous specific details are set forth in the description provided here. It is to be understood, however, that embodiments of the invention may be practised without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, features of the invention are sometimes grouped together in a single embodiment, figure or description thereof in the above description of exemplary embodiments of the invention. However, this method of disclosure is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. Therefore, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will appreciate that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules or units or components of an embodiment may be combined into one module or unit or component, and may furthermore be divided into multiple sub-modules or sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, equivalent or similar purpose.
Furthermore, those skilled in the art will appreciate that although some embodiments described herein include some features included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the apparatus for classifying users and content according to embodiments of the invention. The invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the invention may be stored on a computer-readable medium or may take the form of one or more signals. Such a signal may be downloaded from an Internet website, provided on a carrier signal or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second and third does not indicate any order; these words may be interpreted as names.
Claims (15)
1. An apparatus for classifying users and content, residing in a computing device and adapted to cluster the users in a user set into a first predetermined number of user types and to cluster the content items in a content set into a second predetermined number of content types, the apparatus comprising:
an initialization module, adapted to assign one or more users from the user set to each user type among the first predetermined number of user types, and to assign one or more content items from the content set to each content type among the second predetermined number of content types;
a visit count calculation module, adapted to calculate, according to the users' visit counts for content, a first visit count of each user type for each content item, a second visit count of each user for each content type, and a third visit count of each user type for each content type;
a similarity calculation module, adapted to calculate the similarity between each user and each user type according to the second visit count and the third visit count, and to calculate the similarity between each content item and each content type according to the first visit count and the third visit count; and
a classification module, adapted to select, for each user, the user type with the highest similarity to the user as the user's user type, and, for each content item, the content type with the highest similarity to the content item as the content item's content type, and to trigger the visit count calculation module to recalculate the visit counts and the similarity calculation module to recalculate the similarities, after which the selection is performed again, the triggering no longer being performed once a predetermined condition is met.
2. The apparatus of claim 1, wherein the initialization module is further adapted to: according to the existing mapping between users and user types, assign one or more users to each user type that already has one or more users, and randomly assign to each user type that has no user a user that has no user type; and, according to the existing mapping between content and content types, assign one or more content items to each content type that already has one or more content items, and randomly assign to each content type that has no content a content item that has no content type.
3. The apparatus of claim 2, wherein, for a user covered by the existing mapping between users and user types, the similarity calculation module does not calculate the similarity between that user and each user type, and the classification module does not change the user's user type;
and, for a content item covered by the existing mapping between content and content types, the similarity calculation module does not calculate the similarity between that content item and each content type, and the classification module does not change the content item's content type.
4. The device of claim 2, wherein the visit capacity calculation module calculates the visit capacity of a given user type to a given content as follows: obtain all users that the user type includes; obtain the visit capacity of each of those users to the content; sum all of these visit capacities to obtain the visit capacity of the user type to the content;
the visit capacity calculation module calculates the visit capacity of a given user to a given content type as follows: obtain all contents that the content type includes; obtain the visit capacity of the user to each of those contents; sum all of these visit capacities to obtain the visit capacity of the user to the content type;
the visit capacity calculation module calculates the visit capacity of a given user type to a given content type as follows: obtain all users that the user type includes and all contents that the content type includes; obtain the visit capacity of each of those users to each of those contents; sum all of these visit capacities to obtain the visit capacity of the user type to the content type.
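The three summations above can be sketched as follows. This is an illustrative sketch only: it assumes the raw per-user, per-content visit counts are held in a nested dictionary `visits[user][content]`, and the function names are hypothetical rather than taken from the patent.

```python
def type_to_content(visits, users_in_type, content):
    """First visit capacity: a user type's total visits to one content."""
    return sum(visits.get(u, {}).get(content, 0) for u in users_in_type)

def user_to_type(visits, user, contents_in_type):
    """Second visit capacity: one user's total visits to a content type."""
    row = visits.get(user, {})
    return sum(row.get(c, 0) for c in contents_in_type)

def type_to_type(visits, users_in_type, contents_in_type):
    """Third visit capacity: a user type's total visits to a content type."""
    return sum(user_to_type(visits, u, contents_in_type) for u in users_in_type)
```

For example, with `visits = {"u1": {"c1": 3, "c2": 1}, "u2": {"c1": 2}}`, the user type `{"u1", "u2"}` has a visit capacity of 5 to content `c1` and of 6 to the content type `{"c1", "c2"}`.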
5. The device of claim 4, wherein the similarity is a minimum-based similarity coefficient, a Bhattacharyya coefficient, or a cosine similarity coefficient.
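The claim names the three coefficients without giving formulas here, so the sketch below uses common textbook definitions as an assumption: the minimum-based coefficient is taken to be the sum of element-wise minima of the normalized vectors (histogram intersection), the Bhattacharyya coefficient is the sum of square roots of element-wise products of the normalized vectors, and the cosine coefficient is the usual normalized dot product. The function names are hypothetical.

```python
import math

def _normalize(v):
    """Scale a non-negative vector so its entries sum to 1."""
    total = sum(v)
    return [x / total for x in v] if total else [0.0] * len(v)

def min_similarity(a, b):
    """Assumed minimum-based coefficient: sum of element-wise minima."""
    p, q = _normalize(a), _normalize(b)
    return sum(min(x, y) for x, y in zip(p, q))

def bhattacharyya(a, b):
    """Bhattacharyya coefficient of the two normalized vectors."""
    p, q = _normalize(a), _normalize(b)
    return sum(math.sqrt(x * y) for x, y in zip(p, q))

def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

All three return values close to 1 for vectors with the same proportions and 0 for vectors with no overlapping non-zero entries, which is the property the classification step relies on.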
6. The device of claim 5, wherein the similarity calculation module calculates the similarity between a given user and a given user type as follows: for a given user in the user set, obtain the visit capacity of the user to each content type in the content type set, yielding one visit capacity vector; for a given user type in the user type set, obtain the visit capacity of the user type to each content type in the content type set, yielding another visit capacity vector; calculate the similarity of the two visit capacity vectors, and take the calculated similarity as the similarity between the user and the user type;
the similarity calculation module calculates the similarity between a given content and a given content type as follows: for a given content in the content set, obtain the visit capacity of each user type in the user type set to the content, yielding one visit capacity vector; for a given content type in the content type set, obtain the visit capacity of each user type in the user type set to the content type, yielding another visit capacity vector; calculate the similarity of the two visit capacity vectors, and take the calculated similarity as the similarity between the content and the content type;
wherein the user type set is the set formed by the first predetermined number of user types, and the content type set is the set formed by the second predetermined number of content types;
wherein, before calculating the similarity of two vectors, the similarity calculation module first takes the intersection or the union of the domains of the two vectors, and then calculates the similarity of the two vectors.
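One possible reading of the final clause is that each visit capacity vector is sparse, defined only on the types for which a value exists, and that the two vectors are aligned on a shared index set before comparison. The sketch below follows that reading as an assumption; the dictionary representation and the name `align` are hypothetical.

```python
def align(vec_a, vec_b, mode="union"):
    """Align two sparse vectors (dicts keyed by type) on a shared domain.

    mode="intersection" keeps only keys present in both vectors;
    mode="union" keeps every key and fills missing entries with 0.
    """
    keys_a, keys_b = set(vec_a), set(vec_b)
    keys = keys_a & keys_b if mode == "intersection" else keys_a | keys_b
    ordered = sorted(keys)  # fixed order so both lists line up
    return ([vec_a.get(k, 0) for k in ordered],
            [vec_b.get(k, 0) for k in ordered])
```

With `a = {"news": 4, "sports": 1}` and `b = {"news": 2, "music": 5}`, the intersection yields `([4], [2])` and the union yields `([0, 4, 1], [5, 2, 0])` in the key order music, news, sports. The aligned lists can then be passed to whichever similarity coefficient is in use.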
7. The device of claim 1, wherein the predetermined condition is: the number of times the visit capacity calculation module and the similarity calculation module have been triggered reaches a preset number; or, when the current classification result is compared with the previous classification result, the proportion of users whose user type has changed is less than a preset first threshold and the proportion of contents whose content type has changed is less than a preset second threshold.
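The two alternative stopping conditions can be sketched as follows. The sketch assumes each classification result is a dictionary mapping an item to its type; the function and parameter names are hypothetical.

```python
def changed_ratio(previous, current):
    """Fraction of items whose type differs from the previous round."""
    if not current:
        return 0.0
    changed = sum(1 for item, t in current.items() if previous.get(item) != t)
    return changed / len(current)

def should_stop(rounds_done, max_rounds, prev_users, cur_users,
                prev_contents, cur_contents, first_threshold, second_threshold):
    """Stop on the round limit, or when both change ratios are small."""
    if rounds_done >= max_rounds:
        return True
    return (changed_ratio(prev_users, cur_users) < first_threshold
            and changed_ratio(prev_contents, cur_contents) < second_threshold)
```

Note that the second condition requires both ratios to fall below their thresholds; a stable user classification alone does not end the iteration while content types are still moving.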
8. A method for classifying users and content, performed in a computing device and adapted to cluster the users in a user set into a first predetermined number of user types and to cluster the contents in a content set into a second predetermined number of content types, the method comprising:
an initialization step: assigning one or more users in the user set to each of the first predetermined number of user types, and assigning one or more contents in the content set to each of the second predetermined number of content types;
a visit capacity calculation step: according to the visit capacity of users to contents, calculating a first visit capacity of each user type to each content, a second visit capacity of each user to each content type, and a third visit capacity of each user type to each content type;
a similarity calculation step: according to the second visit capacity and the third visit capacity, calculating the similarity between each user and each user type, and according to the first visit capacity and the third visit capacity, calculating the similarity between each content and each content type; and
a classification step: for each user, selecting the user type with the highest similarity to it as the user type of that user; for each content, selecting the content type with the highest similarity to it as the content type of that content; and, after triggering the visit capacity calculation step to recalculate the visit capacities and the similarity calculation step to recalculate the similarities, repeating the selection; when a predetermined condition is met, the triggering is no longer performed.
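The four steps of the method can be sketched end to end as follows. This is a simplified illustration under stated assumptions, not the claimed implementation: it uses cosine similarity, stops when no assignment changes or after `max_rounds` rounds, re-evaluates seed items along with all others, and takes raw counts from a nested dictionary `visits[user][content]`. The names `co_cluster`, `user_seeds`, and `content_seeds` are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def co_cluster(visits, user_seeds, content_seeds, max_rounds=10):
    """Jointly assign users and contents to types (illustrative sketch).

    user_seeds and content_seeds are lists of sets, one set per type.
    """
    users = sorted(visits)
    contents = sorted({c for row in visits.values() for c in row})
    n_ut, n_ct = len(user_seeds), len(content_seeds)
    # Initialization step: only the seed items carry a type at first.
    user_type = {u: t for t, s in enumerate(user_seeds) for u in s}
    content_type = {c: t for t, s in enumerate(content_seeds) for c in s}

    for _ in range(max_rounds):
        # Visit capacity step: first, second and third visit capacities.
        first = [{c: 0 for c in contents} for _ in range(n_ut)]
        second = {u: [0] * n_ct for u in users}
        third = [[0] * n_ct for _ in range(n_ut)]
        for u, row in visits.items():
            for c, n in row.items():
                ut, ct = user_type.get(u), content_type.get(c)
                if ut is not None:
                    first[ut][c] += n
                if ct is not None:
                    second[u][ct] += n
                if ut is not None and ct is not None:
                    third[ut][ct] += n
        # Similarity and classification steps: pick the most similar type.
        new_user_type = {
            u: max(range(n_ut), key=lambda t: cosine(second[u], third[t]))
            for u in users
        }
        new_content_type = {
            c: max(range(n_ct), key=lambda t: cosine(
                [first[ut][c] for ut in range(n_ut)],
                [third[ut][t] for ut in range(n_ut)]))
            for c in contents
        }
        if new_user_type == user_type and new_content_type == content_type:
            break  # assignments are stable
        user_type, content_type = new_user_type, new_content_type
    return user_type, content_type
```

On a small data set where users `u1` and `u2` visit only `c1` and `c2` while `u3` and `u4` visit only `c3` and `c4`, seeding with `[{"u1"}, {"u3"}]` and `[{"c1"}, {"c3"}]` separates the two groups after one round and confirms stability on the second.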
9. The method of claim 8, wherein, in the initialization step, according to existing mapping relations between users and user types, one or more users are assigned to each user type that already has one or more users, and a user without a user type is randomly assigned to each user type that has no user; and, according to existing mapping relations between contents and content types, one or more contents are assigned to each content type that already has one or more contents, and a content without a content type is randomly assigned to each content type that has no content.
10. The method of claim 9, wherein, for a user covered by an existing mapping relation between users and user types, the similarity between that user and each user type is not calculated in the similarity calculation step, and the user type of that user is not changed in the classification step;
for a content covered by an existing mapping relation between contents and content types, the similarity between that content and each content type is not calculated in the similarity calculation step, and the content type of that content is not changed in the classification step.
11. The method of claim 9, wherein, in the visit capacity calculation step, the visit capacity of a given user type to a given content is calculated as follows: obtain all users that the user type includes; obtain the visit capacity of each of those users to the content; sum all of these visit capacities to obtain the visit capacity of the user type to the content;
the visit capacity of a given user to a given content type is calculated as follows: obtain all contents that the content type includes; obtain the visit capacity of the user to each of those contents; sum all of these visit capacities to obtain the visit capacity of the user to the content type;
the visit capacity of a given user type to a given content type is calculated as follows: obtain all users that the user type includes and all contents that the content type includes; obtain the visit capacity of each of those users to each of those contents; sum all of these visit capacities to obtain the visit capacity of the user type to the content type.
12. The method of claim 11, wherein the similarity is a minimum-based similarity coefficient, a Bhattacharyya coefficient, or a cosine similarity coefficient.
13. The method of claim 12, wherein, in the similarity calculation step,
the similarity between a given user and a given user type is calculated as follows: for a given user in the user set, obtain the visit capacity of the user to each content type in the content type set, yielding one visit capacity vector; for a given user type in the user type set, obtain the visit capacity of the user type to each content type in the content type set, yielding another visit capacity vector; calculate the similarity of the two visit capacity vectors, and take the calculated similarity as the similarity between the user and the user type; and
the similarity between a given content and a given content type is calculated as follows: for a given content in the content set, obtain the visit capacity of each user type in the user type set to the content, yielding one visit capacity vector; for a given content type in the content type set, obtain the visit capacity of each user type in the user type set to the content type, yielding another visit capacity vector; calculate the similarity of the two visit capacity vectors, and take the calculated similarity as the similarity between the content and the content type;
wherein the user type set is the set formed by the first predetermined number of user types, and the content type set is the set formed by the second predetermined number of content types;
wherein, before the similarity of two vectors is calculated, the intersection or the union of the domains of the two vectors is first taken, and the similarity of the two vectors is then calculated.
14. The method of claim 8, wherein the predetermined condition is: the number of times the visit capacity calculation step and the similarity calculation step have been triggered reaches a preset number; or, when the current classification result is compared with the previous classification result, the proportion of users whose user type has changed is less than a preset first threshold and the proportion of contents whose content type has changed is less than a preset second threshold.
15. A computing device, comprising the device for classifying users and content according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510041042.4A CN104598601B (en) | 2015-01-27 | 2015-01-27 | A kind of method, apparatus classified to user and content and computing device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510041042.4A CN104598601B (en) | 2015-01-27 | 2015-01-27 | A kind of method, apparatus classified to user and content and computing device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104598601A CN104598601A (en) | 2015-05-06 |
CN104598601B CN104598601B (en) | 2017-12-12 |
Family
ID=53124386
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510041042.4A Active CN104598601B (en) | 2015-01-27 | 2015-01-27 | A kind of method, apparatus classified to user and content and computing device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104598601B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20170017583A (en) * | 2015-08-07 | 2017-02-15 | 주식회사 더아이콘티비 | Apparatus for providing contents |
CN106021329A (en) * | 2016-05-06 | 2016-10-12 | 西安电子科技大学 | A user similarity-based sparse data collaborative filtering recommendation method |
CN106101839A (en) * | 2016-06-20 | 2016-11-09 | 徐汕 | A kind of method identifying that television user gathers |
CN107451170B (en) * | 2017-03-10 | 2020-04-10 | 中山大学 | Parallel PLSA method based on MPI computing framework |
CN109409949A (en) * | 2018-10-17 | 2019-03-01 | 北京字节跳动网络技术有限公司 | Determination method, apparatus, electronic equipment and the storage medium of user group's classification |
CN109933788B (en) * | 2019-02-14 | 2023-05-23 | 北京百度网讯科技有限公司 | Type determining method, device, equipment and medium |
CN111176800A (en) * | 2019-07-05 | 2020-05-19 | 腾讯科技(深圳)有限公司 | Training method and device of document theme generation model |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101685521A (en) * | 2008-09-23 | 2010-03-31 | 北京搜狗科技发展有限公司 | Method for showing advertisements in webpage and system |
CN103198418A (en) * | 2013-03-15 | 2013-07-10 | 北京亿赞普网络技术有限公司 | Application recommendation method and application recommendation system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2009163496A (en) * | 2008-01-07 | 2009-07-23 | Funai Electric Co Ltd | Content reproduction system |
Non-Patent Citations (1)
Title |
---|
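Collaborative filtering recommendation algorithm based on dual clustering of items and users; 施华 (Shi Hua); China Master's Theses Full-text Database; 2009-06-01; pp. 20-26 * |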
Also Published As
Publication number | Publication date |
---|---|
CN104598601A (en) | 2015-05-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104598601B (en) | A kind of method, apparatus classified to user and content and computing device | |
Bai et al. | A neural collaborative filtering model with interaction-based neighborhood | |
Kim et al. | Simultaneous discovery of common and discriminative topics via joint nonnegative matrix factorization | |
CN110866181B (en) | Resource recommendation method, device and storage medium | |
Tsiotas | Detecting different topologies immanent in scale-free networks with the same degree distribution | |
Bickel et al. | A nonparametric view of network models and Newman–Girvan and other modularities | |
US9536201B2 (en) | Identifying associations in data and performing data analysis using a normalized highest mutual information score | |
US9208257B2 (en) | Partitioning a graph by iteratively excluding edges | |
WO2021143267A1 (en) | Image detection-based fine-grained classification model processing method, and related devices | |
CN107786943A (en) | A kind of tenant group method and computing device | |
CN112085565B (en) | Deep learning-based information recommendation method, device, equipment and storage medium | |
CN108021708B (en) | Content recommendation method and device and computer readable storage medium | |
WO2017171826A1 (en) | Entropic classification of objects | |
JP7083375B2 (en) | Real-time graph-based embedding construction methods and systems for personalized content recommendations | |
Hare et al. | Derivative-free optimization methods for finite minimax problems | |
CN107341233A (en) | A kind of position recommends method and computing device | |
CN110647696A (en) | Business object sorting method and device | |
Zhang et al. | Advertisement click-through rate prediction based on the weighted-ELM and adaboost algorithm | |
CN112131261A (en) | Community query method and device based on community network and computer equipment | |
Kagan et al. | Probabilistic Search for Tracking Targets: Theory and Modern Application | |
CN110598123B (en) | Information retrieval recommendation method, device and storage medium based on image similarity | |
CN112995414B (en) | Behavior quality inspection method, device, equipment and storage medium based on voice call | |
CN113343713B (en) | Intention recognition method and device, computer equipment and storage medium | |
Liu et al. | A new robust model-free feature screening method for ultra-high dimensional right censored data | |
CN113408579A (en) | Internal threat early warning method based on user portrait |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |