CN109886300A - User clustering method, device and equipment - Google Patents

User clustering method, device and equipment

Publication number
CN109886300A
Authority
CN
China
Prior art keywords
user
clustered
cluster
feature
class cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910043942.0A
Other languages
Chinese (zh)
Inventor
马国伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201910043942.0A
Publication of CN109886300A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention provide a user clustering method, device and equipment. The method comprises: determining the historical push information that each user to be clustered has clicked, and taking it as the feature information of that user; calculating the similarity between every two users to be clustered according to the feature information of each user to be clustered; and clustering the users to be clustered according to the calculated similarity and the feature information of each user to be clustered. Clustering users with the schemes provided by the embodiments of the invention can improve the accuracy of the clustering result.

Description

User clustering method, device and equipment
Technical field
The present invention relates to the field of computer technology, and in particular to a user clustering method, device and equipment.
Background
With the popularization of mobile terminals, operators often push information by actively sending it to the mobile terminals that users use. For example, advertisements are pushed by actively sending them to users' mobile terminals.
During information pushing, to increase the probability that the information is clicked, information that a user may be interested in is usually pushed according to the user's interests and hobbies. To make pushing more efficient, the same information is usually sent at the same time to users of the same class, that is, users with similar interests and hobbies. The prior art usually collects basic information about a user, such as gender, age and occupation, and uses the collected basic information to judge whether the user belongs to a class of users with similar interests and hobbies.
However, in the course of implementing the present invention, the inventor found that the prior art has at least the following problem:
To protect their privacy, users may provide inaccurate basic information, so the collected basic information is inaccurate and the interests and hobbies determined from it are not accurate enough. As a result, within a class of users determined according to those interests and hobbies, the actual interests and hobbies of some users may differ from those of the other users.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a user clustering method, device and equipment, so as to solve the above problem of the prior art. The specific technical solutions are as follows:
One aspect of the present invention provides a user clustering method, the method comprising:
for each user to be clustered, determining the historical push information that the user has clicked, and obtaining the feature of the user from the determined historical push information;
calculating the similarity between every two users to be clustered according to the features of the users to be clustered;
clustering the users to be clustered according to the calculated similarity and the feature of each user to be clustered.
Optionally, after the step of clustering the users to be clustered according to the calculated similarity and the feature of each user to be clustered, the method further comprises:
extending each existing user class cluster in the following manner:
comparing the features of the users in the user class clusters obtained by clustering with the features of the users in the existing user class cluster, and determining the number of users having similar features in each user class cluster obtained by clustering;
selecting a first preset quantity of user class clusters from the user class clusters obtained by clustering, wherein the number of users having similar features in each selected user class cluster is greater than the number of users having similar features in any unselected user class cluster;
adding the users in the selected user class clusters to the existing user class cluster.
Optionally, after the step of obtaining the feature of the user to be clustered from the determined historical push information, the method further comprises:
randomly selecting a second preset quantity of features from the obtained features of the users to be clustered as cluster centers;
and the step of clustering the users to be clustered according to the calculated similarity and the feature of each user to be clustered comprises:
for each user to be clustered, determining, from the calculated similarity, the similarity between the user and any of the cluster centers, and judging from the determined similarity whether the user belongs to the user class cluster corresponding to that cluster center; if the user belongs to the user class cluster corresponding to that cluster center, adding the user to that user class cluster;
for each user class cluster, calculating the average feature of the user class cluster from the features of the users to be clustered in it; where the calculated average feature differs from the cluster center of the user class cluster, updating the cluster center of the user class cluster to the calculated average feature, and returning to the step of, for each user to be clustered, determining the similarity between the user and any of the cluster centers, judging whether the user belongs to the corresponding user class cluster, and adding the user to it if so; repeating until the average feature of every user class cluster is identical to the cluster center of that user class cluster, at which point the user class clusters obtained are taken as the clustering result.
Optionally, the step of calculating the similarity between every two users to be clustered according to the features of the users to be clustered comprises:
calculating the similarity between every two users to be clustered with a cosine similarity algorithm, according to the features of the users to be clustered; or
calculating the similarity coefficient between every two users to be clustered with the following formula, and determining the similarity between the two users to be clustered from the calculated similarity coefficient:
$$s(i, j) = \frac{|U_i \cap U_j|}{|U_i \cup U_j|}$$
where $s(i, j)$ denotes the similarity coefficient between user i to be clustered and user j to be clustered, $U_i$ denotes the feature vector of user i, $U_j$ denotes the feature vector of user j, $|U_i \cap U_j|$ denotes the size of the intersection of the feature vectors of user i and user j, and $|U_i \cup U_j|$ denotes the size of the union of the feature vectors of user i and user j.
Another aspect of the present invention further provides a user clustering device, the device comprising:
a determining module, configured to determine, for each user to be clustered, the historical push information that the user has clicked, and to obtain the feature of the user from the determined historical push information;
a computing module, configured to calculate the similarity between every two users to be clustered according to the features of the users to be clustered;
a clustering module, configured to cluster the users to be clustered according to the calculated similarity and the feature of each user to be clustered.
Optionally, the device further comprises:
an extension module, configured to extend each existing user class cluster;
wherein the extension module comprises:
a comparison submodule, configured to compare the features of the users in the user class clusters obtained by clustering with the features of the users in the existing user class cluster, and to determine the number of users having similar features in each user class cluster obtained by clustering;
a selection submodule, configured to select a first preset quantity of user class clusters from the user class clusters obtained by clustering, wherein the number of users having similar features in each selected user class cluster is greater than the number of users having similar features in any unselected user class cluster;
an extension submodule, configured to add the users in the selected user class clusters to the existing user class cluster.
Optionally, the device further comprises:
a selection module, configured to randomly select a second preset quantity of features from the obtained features of the users to be clustered as cluster centers;
correspondingly, the clustering module comprises:
an adding submodule, configured to determine, for each user to be clustered and from the calculated similarity, the similarity between the user and any of the cluster centers, and to judge from the determined similarity whether the user belongs to the user class cluster corresponding to that cluster center; if the user belongs to the user class cluster corresponding to that cluster center, the user is added to that user class cluster;
an updating submodule, configured to calculate the average feature of each user class cluster from the features of the users to be clustered in it; where the calculated average feature differs from the cluster center of the user class cluster, to update the cluster center of the user class cluster to the calculated average feature and trigger the adding submodule; and, once the average feature of every user class cluster is identical to the cluster center of that user class cluster, to take the user class clusters obtained at that point as the clustering result.
Optionally, the computing module is specifically configured to:
calculate the similarity between every two users to be clustered with a cosine similarity algorithm, according to the features of the users to be clustered; or
calculate the similarity coefficient between every two users to be clustered with the following formula, and determine the similarity between the two users to be clustered from the calculated similarity coefficient:
$$s(i, j) = \frac{|U_i \cap U_j|}{|U_i \cup U_j|}$$
where $s(i, j)$ denotes the similarity coefficient between user i to be clustered and user j to be clustered, $U_i$ denotes the feature vector of user i, $U_j$ denotes the feature vector of user j, $|U_i \cap U_j|$ denotes the size of the intersection of the feature vectors of user i and user j, and $|U_i \cup U_j|$ denotes the size of the union of the feature vectors of user i and user j.
Another aspect of the present invention further provides an electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement any of the user clustering methods described above when executing the program stored in the memory.
Another aspect of the present invention further provides a computer-readable storage medium in which instructions are stored; when the instructions are run on a computer, the computer executes any of the user clustering methods described above.
Another aspect of the present invention further provides a computer program product containing instructions; when the instructions are run on a computer, the computer executes any of the user clustering methods described above.
When users are clustered with the schemes provided by the embodiments of the present invention, the historical push information that each user to be clustered has clicked can be determined and taken as the feature information of that user; the similarity between every two users to be clustered is calculated according to the feature information of each user to be clustered; and the users to be clustered are clustered according to the calculated similarity and the feature information of each user to be clustered. The historical push information a user has clicked is real click data generated by the user, so it is more reliable than self-reported basic information, and users with similar interests and hobbies generally click similar historical push information. Clustering users with the schemes provided by the embodiments of the present invention can therefore improve the accuracy of the clustering result. Pushing information to the users in each resulting user class cluster then speeds up information pushing while ensuring that the pushed information matches the users' interests and hobbies.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below.
Fig. 1 is a flow diagram of a user clustering method provided by an embodiment of the present invention;
Fig. 2 is a structural diagram of a user clustering device provided by an embodiment of the present invention;
Fig. 3 is a structural diagram of an electronic device provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described below with reference to the drawings in the embodiments of the present invention.
Fig. 1 is a flow diagram of a user clustering method provided by an embodiment of the present invention. The method includes:
S100: for each user to be clustered, determine the historical push information that the user has clicked, and obtain the feature of the user from the determined historical push information.
In terms of time, the historical push information refers to information that has already been pushed. In terms of content, it may be advertising information, short-video information, news information, and so on.
The server that sends push information to users can record the historical push information each user has clicked; in addition, it can record the number of times each user clicked each piece of historical push information, and so on. For ease of description, the recorded information is referred to as the historical record. When the executing subject of the embodiment of the present invention is the server that sends push information to users, the server can determine the historical push information clicked by a user to be clustered directly from the historical record. When the executing subject is a device other than that server, the device needs to obtain the historical record from the server in order to determine the historical push information clicked by the user to be clustered.
Every user has different characteristics, and a user can be described from many different angles. The inventor found in experiments that the push information a user has clicked is often related to the user's interests and hobbies; that is, users with the same interests and hobbies click similar push information. In view of this, the embodiments of the present invention use the push information a user has clicked to characterize the feature of the user.
On this basis, in one implementation of the present invention, attributes of the push information clicked by a user to be clustered, such as its type and the keywords it contains, can be used to characterize the feature of the user. For example, if the type of the push information clicked by a user to be clustered is shopping and the keywords it contains include Nestlé milk powder, the feature of that user may include: shopping, Nestlé milk powder.
In another implementation of the present invention, the feature of a user can be characterized by how the user clicked each piece of historical push information. Specifically, the feature of a user to be clustered can be represented as a vector: the user's click condition for each piece of historical push information forms one element of the feature vector; that is, each element of the feature vector represents how the user clicked one piece of historical push information.
Specifically, the click condition may indicate whether the user to be clustered clicked the historical push information. For example, when element U_ij of the feature vector U_i of user i equals 1, user i clicked historical push information j; when U_ij = 0, user i did not click historical push information j.
Alternatively, the click condition may indicate the number of times the user to be clustered clicked the historical push information. For example, when element U_ij of the feature vector U_i of user i equals 3, user i clicked historical push information j 3 times; when U_ij = 0, user i clicked historical push information j 0 times.
To make the features easier to process in subsequent steps, the feature of every user to be clustered can be described over the same set of historical push information, so that every feature vector contains the same number of elements. The feature vectors of all users to be clustered can therefore be represented by a matrix U.
In one case, each row of matrix U corresponds to one user to be clustered and each column corresponds to one piece of historical push information; the elements of the matrix represent how each user to be clustered clicked each piece of historical push information. In this case, each row of matrix U is the feature vector of one user to be clustered and represents that user's click condition for all historical push information.
In another case, each column of matrix U corresponds to one user to be clustered and each row corresponds to one piece of historical push information; the elements of the matrix represent how each user to be clustered clicked each piece of historical push information. In this case, each column of matrix U is the feature vector of one user to be clustered and represents that user's click condition for all historical push information.
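As an illustration of the row-per-user layout, the following minimal sketch builds the matrix U from a click log. The input format (a list of `(user_id, item_id)` pairs, one per click) and the names `build_feature_matrix`, `clicks`, `U_binary` and `U_counts` are assumptions made for this example; the text does not specify how the historical record is stored.

```python
from collections import Counter

def build_feature_matrix(clicks, binary=True):
    """Build a row-per-user feature matrix from (user, item) click records.

    clicks: list of (user_id, item_id) pairs, one pair per click (assumed format).
    binary: if True, entries are 1/0 (clicked or not); otherwise click counts.
    Returns (users, items, U), where U[r][c] describes how users[r] clicked items[c].
    """
    counts = Counter(clicks)
    users = sorted({u for u, _ in counts})
    items = sorted({i for _, i in counts})
    U = []
    for u in users:
        # A Counter returns 0 for pairs that never occurred.
        row = [counts[(u, i)] for i in items]
        if binary:
            row = [1 if c > 0 else 0 for c in row]
        U.append(row)
    return users, items, U

# Hypothetical click log: user "a" clicked item "x" three times.
clicks = [("a", "x"), ("a", "x"), ("a", "x"), ("a", "y"), ("b", "y"), ("b", "z")]
users, items, U_binary = build_feature_matrix(clicks, binary=True)
_, _, U_counts = build_feature_matrix(clicks, binary=False)
```

The column-per-user layout is simply the transpose of this matrix, for example `list(zip(*U_binary))`.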
S110: calculate the similarity between every two users to be clustered according to the features of the users to be clustered.
In a first implementation, the similarity between every two users to be clustered can be calculated with a cosine similarity algorithm, according to the features of the users to be clustered.
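A minimal sketch of the cosine-similarity option, assuming the row-per-user matrix layout. The function names and the handling of an all-zero vector (a user with no clicks) are assumptions of this example, not something the text prescribes.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Assumption: a user with no clicks is treated as similar to nobody.
        return 0.0
    return dot / (norm_u * norm_v)

def pairwise_similarity(U, sim=cosine_similarity):
    """Similarity between every two users; U holds one feature vector per row."""
    n = len(U)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Similarity is symmetric, so each pair is computed once.
            S[i][j] = S[j][i] = sim(U[i], U[j])
    return S

S = pairwise_similarity([[1, 0], [0, 1], [1, 1]])
```

Computing all pairs costs time quadratic in the number of users, which is one reason the later steps cluster only newly added users.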
In a second implementation, the similarity between two users to be clustered can be calculated from the features of the users to be clustered with the following formula:
$$s(i, j) = \frac{|U_i \cap U_j|}{|U_i \cup U_j|} \quad \text{(Formula 1)}$$
where $s(i, j)$ denotes the similarity coefficient between user i to be clustered and user j to be clustered, $U_i$ denotes the feature vector of user i, $U_j$ denotes the feature vector of user j, $|U_i \cap U_j|$ denotes the size of the intersection of the feature vectors of user i and user j, and $|U_i \cup U_j|$ denotes the size of the union of the feature vectors of user i and user j. Specifically, when the feature vector $U_i$ is the click condition of user i for the historical information pushed to user i, and the feature vector $U_j$ is the click condition of user j for the historical information pushed to user j, $|U_i \cap U_j|$ is the number of identical pieces of historical push information that both user i and user j clicked, and $|U_i \cup U_j|$ is the number of pieces of historical push information that user i or user j clicked.
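The similarity coefficient can be sketched as follows, assuming it is the ratio of the intersection size to the union size of the two users' clicked items (the Jaccard coefficient), which is how the definitions above read. The function name and the convention of returning 0 when neither user clicked anything are assumptions of this example.

```python
def similarity_coefficient(u, v):
    """Intersection-over-union of the items two users clicked.

    u, v: equal-length click vectors; an entry greater than 0 means the
    user clicked that piece of historical push information.
    """
    clicked_u = {k for k, x in enumerate(u) if x > 0}
    clicked_v = {k for k, x in enumerate(v) if x > 0}
    union = clicked_u | clicked_v
    if not union:
        # Assumption: two users with no clicks get coefficient 0.
        return 0.0
    return len(clicked_u & clicked_v) / len(union)

# Users share three clicked items out of five distinct clicked items.
s = similarity_coefficient([1, 1, 1, 1, 0], [1, 1, 1, 0, 1])
```

Because only "clicked or not" matters here, the same function works on both the 0/1 vectors and the click-count vectors.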
In a third implementation, where features are represented as feature vectors, the distance between two feature vectors can be used to represent their similarity: the smaller the distance between two feature vectors, the higher the similarity between them.
Specifically, the Euclidean distance between the feature vectors of two users to be clustered can be calculated and used to represent the similarity between the two users: the smaller the calculated Euclidean distance, the higher the similarity between the two users to be clustered.
Alternatively, the similarity coefficient between two users to be clustered can first be calculated with Formula 1, and the distance between the features of the two users can then be calculated with Formula 2:
$$s(i, j) = \frac{|U_i \cap U_j|}{|U_i \cup U_j|} \quad \text{(Formula 1)}$$
$$D(i, j) = 1 - s(i, j) \quad \text{(Formula 2)}$$
where $s(i, j)$ denotes the similarity coefficient between user i to be clustered and user j to be clustered, $U_i$ denotes the feature vector of user i, $U_j$ denotes the feature vector of user j, $|U_i \cap U_j|$ denotes the size of the intersection of the feature vectors of user i and user j, $|U_i \cup U_j|$ denotes the size of the union of the feature vectors of user i and user j, and $D(i, j)$ denotes the distance between user i and user j. Specifically, when the feature vector $U_i$ is the click condition of user i for the historical information pushed to user i, and the feature vector $U_j$ is the click condition of user j for the historical information pushed to user j, $|U_i \cap U_j|$ is the number of identical pieces of historical push information that both user i and user j clicked, and $|U_i \cup U_j|$ is the number of pieces of historical push information that user i or user j clicked.
When the similarity between every two users to be clustered is represented by the calculated distance, a smaller distance between the features of two users to be clustered indicates a higher similarity between them.
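Both distance options can be sketched in a few lines. The second function assumes the similarity coefficient is intersection-over-union of the clicked items, so that the distance is one minus that ratio; the function names are hypothetical.

```python
import math

def euclidean_distance(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def coefficient_distance(u, v):
    """D(i, j) = 1 - s(i, j), with s taken as intersection-over-union."""
    clicked_u = {k for k, x in enumerate(u) if x > 0}
    clicked_v = {k for k, x in enumerate(v) if x > 0}
    union = clicked_u | clicked_v
    # Assumption: two users with no clicks get s = 0, hence distance 1.
    s = len(clicked_u & clicked_v) / len(union) if union else 0.0
    return 1.0 - s

d_euclid = euclidean_distance([0, 0], [3, 4])
d_coeff = coefficient_distance([1, 1, 0], [1, 0, 1])
```

In both cases a smaller value means the two users are more alike, so a clustering step can minimise distance instead of maximising similarity.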
S120: cluster the users to be clustered according to the calculated similarity and the feature of each user to be clustered.
Before clustering, the number of class clusters to be obtained, that is, the second preset quantity, can be set according to the number of user classes desired in the specific application. For example, if 3 user classes are desired, the number of class clusters to be obtained can be set to 3 before clustering. Since each class cluster has one cluster center during clustering, the number of cluster centers can be determined from the number of user class clusters required; that is, each cluster center represents one user class cluster to be obtained.
Correspondingly, the larger the second preset quantity, the more user class clusters are obtained after clustering, that is, the finer the classification of the users to be clustered.
In one implementation, a hierarchical clustering algorithm can be used to cluster the users to be clustered into the second preset quantity of user class clusters, and the center of each resulting user class cluster is then taken as the cluster center of that user class cluster.
In another implementation, after the feature of each user to be clustered is obtained from the determined historical push information, the method may further include:
Step A: randomly select the second preset quantity of features from the features of the users to be clustered as cluster centers.
Correspondingly, S120 may include the following steps:
Step B: for each user to be clustered, determine, from the calculated similarity, the similarity between the user and any of the cluster centers, and judge from the determined similarity whether the user belongs to the user class cluster corresponding to that cluster center; if the user belongs to the user class cluster corresponding to that cluster center, add the user to that user class cluster.
The cluster centers are randomly selected from the features of the users to be clustered, and the feature of a user to be clustered is represented by how the user clicked push information; a user clicking a piece of push information shows that the user is interested in it. Therefore, the higher the similarity between the feature of a user to be clustered and a cluster center, the more similar the interests and hobbies of that user and of the user corresponding to the cluster center. Since each cluster center represents one user class cluster, finding the cluster center that has high similarity with the feature of a user to be clustered also determines the user class cluster to which that user belongs.
Step C: for each user class cluster, calculate the average feature of the user class cluster from the features of the users to be clustered in it; where the calculated average feature differs from the cluster center of the user class cluster, update the cluster center of the user class cluster to the calculated average feature and return to Step B; once the average feature of every user class cluster is identical to the cluster center of that user class cluster, take the user class clusters obtained at that point as the clustering result.
The average feature is the mean of the features of the users to be clustered in a user class cluster; each user class cluster has one average feature.
Consider first the case where the feature of a user to be clustered is determined from the number of times the user clicked each piece of historical push information. For example, a user class cluster contains three users: the feature of user 1 is (2, 3, 4), the feature of user 2 is (5, 6, 5), and the feature of user 3 is (5, 9, 6). The first element of the average feature is (2+5+5)/3 = 4, the second element is (3+6+9)/3 = 6, and the third element is (4+5+6)/3 = 5.
Consider next the case where the feature of a user to be clustered is determined from whether the user clicked each piece of historical push information. For example, a user class cluster contains three users: the feature of user 1 is (1, 0, 1), the feature of user 2 is (1, 0, 1), and the feature of user 3 is (0, 0, 1). The first element of the average feature is (1+1+0)/3 ≈ 0.67, the second element is (0+0+0)/3 = 0, and the third element is (1+1+1)/3 = 1.
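The two worked examples above amount to an element-wise mean, which the following sketch reproduces. The function name `average_feature` is a hypothetical helper introduced for this example.

```python
def average_feature(cluster):
    """Element-wise mean of the feature vectors in one user class cluster."""
    n = len(cluster)
    # zip(*cluster) groups the k-th element of every user's feature vector.
    return [sum(column) / n for column in zip(*cluster)]

count_cluster = [[2, 3, 4], [5, 6, 5], [5, 9, 6]]    # click-count features
binary_cluster = [[1, 0, 1], [1, 0, 1], [0, 0, 1]]   # clicked-or-not features

avg_counts = average_feature(count_cluster)
avg_binary = average_feature(binary_cluster)
```

Note that the average of 0/1 features is generally not itself a 0/1 vector, so an updated cluster center need not coincide with any real user's feature.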
Since the average feature is the mean of the features of the users to be clustered in a user class cluster, a selected cluster center that differs from the calculated average feature is not the actual center of the user class cluster, and the clustering result at that point may contain errors. The cluster center of the user class cluster therefore needs to be updated to the calculated average feature. Because each class cluster has one cluster center, a change of cluster center changes the corresponding user class cluster, so clustering needs to be performed again after the cluster center is updated, in order to improve the accuracy of the clustering result.
When the selected cluster center is identical to the calculated average feature, the users in the user class cluster are evenly distributed around the cluster center and belong to the same type of user. When the selected cluster center differs from the calculated average feature, the error may be caused by users in the user class cluster that do not belong to the same class as the other users, so clustering needs to be performed again after the cluster center is updated, in order to improve the accuracy of the clustering result.
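The iteration described in Steps A to C has the same shape as the well-known k-means loop, and can be sketched as follows. This is only one possible reading: the membership rule (assign each user to the nearest center by Euclidean distance), the `max_iter` safeguard, the fixed `seed`, and the handling of empty clusters are all assumptions of this example. The sketch picks the initial centers once and then repeats the assignment and averaging steps.

```python
import math
import random

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cluster_users(features, k, max_iter=100, seed=0):
    """Iterative clustering in the spirit of Steps A to C.

    features: list of equal-length feature vectors, one per user.
    k: the second preset quantity (number of user class clusters).
    Returns (centers, assignment); assignment[n] is the cluster index of user n.
    """
    rng = random.Random(seed)
    # Step A: randomly pick k user features as the initial cluster centers.
    centers = [list(f) for f in rng.sample(features, k)]
    assignment = []
    for _ in range(max_iter):
        # Step B: put each user in the cluster whose center is nearest
        # (smallest distance = highest similarity; an assumed membership rule).
        assignment = [
            min(range(k), key=lambda c: euclidean(f, centers[c]))
            for f in features
        ]
        # Step C: recompute the average feature of every cluster.
        new_centers = []
        for c in range(k):
            members = [f for f, a in zip(features, assignment) if a == c]
            if members:
                new_centers.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:
            break  # every average equals its center: clustering is finished
        centers = new_centers
    return centers, assignment

features = [[0, 0], [0, 1], [10, 10], [10, 11]]
centers, assignment = cluster_users(features, k=2)
```

On this toy input the two users near the origin end up in one cluster and the two users near (10, 10) in the other, whichever two features are drawn as initial centers.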
In practical applications the number of users keeps growing. To speed up clustering, only the users newly added within a period of time may be clustered, and the users contained in the resulting user clusters are then added to the existing user clusters obtained from earlier clustering. Specifically, in one implementation of the embodiment of the present invention, after the users to be clustered have been clustered, each existing user cluster may also be extended using the user clusters obtained by clustering, which specifically includes the following steps D-F:
Step D: compare the features of the users in each user cluster obtained by clustering with the features of the users in the existing user cluster, and determine the number of users with similar features in each user cluster obtained by clustering.
An existing user cluster is a user cluster obtained by clustering users before the current clustering of the newly added users.
Users with similar features are users whose clicks on historical push information are similar or identical.
In one case, two users may be considered to have similar features when the number of identical items among the historical push information they have clicked is greater than a predetermined value. For example, suppose the predetermined value is 2, user 1 has clicked historical push information H, I, G, K, and user 2 has clicked historical push information H, I, G, L. The historical push information clicked by both user 1 and user 2 is H, I, G; the count is 3, which is greater than 2, so user 1 and user 2 are determined to be users with similar features.
In another case, two users may be considered to have similar features when the historical push information they have clicked is exactly the same. For example, user 1 has clicked historical push information H, I, G, and user 2 has clicked historical push information H, I, G. The historical push information clicked by user 1 and user 2 is in both cases H, I, G, so user 1 and user 2 are determined to be users with similar features.
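The two similarity tests and the count in step D can be sketched as follows. The function names and the representation of each user's clicks as a Python set are assumptions made for illustration. The text does not spell out exactly how the users in step D are counted; this sketch counts a user of a newly obtained cluster when that user is similar to at least one user of the existing cluster, which is one possible reading.

```python
from typing import Dict, Set


def similar_by_overlap(clicks_a: Set[str], clicks_b: Set[str], threshold: int) -> bool:
    """First case: more than `threshold` clicked items are shared."""
    return len(clicks_a & clicks_b) > threshold


def similar_by_identity(clicks_a: Set[str], clicks_b: Set[str]) -> bool:
    """Second case: both users clicked exactly the same items."""
    return clicks_a == clicks_b


def count_similar_users(new_cluster: Dict[str, Set[str]],
                        existing_cluster: Dict[str, Set[str]],
                        threshold: int) -> int:
    """Assumed reading of step D: count users of the new cluster that are
    similar to at least one user of the existing cluster."""
    return sum(
        1
        for clicks in new_cluster.values()
        if any(similar_by_overlap(clicks, other, threshold)
               for other in existing_cluster.values())
    )


# The example from the text: predetermined value 2, shared items H, I, G.
user1 = {"H", "I", "G", "K"}
user2 = {"H", "I", "G", "L"}
print(similar_by_overlap(user1, user2, 2))  # True, because 3 > 2
```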
Step E: select a first preset number of user clusters from the user clusters obtained by clustering.
Here, the number of users with similar features in each selected user cluster is greater than the number of users with similar features in every user cluster that is not selected.
In one implementation, the user clusters obtained by clustering are sorted in descending order of the number of users with similar features in each cluster, and the top first preset number of user clusters are selected; the number of users with similar features in each cluster is the number determined in step D.
In another implementation, a cluster is picked arbitrarily from the set of user clusters obtained by clustering, the number of users with similar features in that cluster is determined, and it is judged whether that number is the largest among all clusters in the set. If so, the cluster is determined to be a selected user cluster and is deleted from the set of user clusters obtained by clustering; the process then returns to the step of picking a cluster arbitrarily from the set, until the number of selected user clusters reaches the first preset number.
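Both selection variants of step E can be sketched as follows. The function names and the dictionary that maps a cluster name to its step D count are assumptions. The second function uses `max` to find the cluster with the largest count directly instead of picking a cluster arbitrarily and testing it, which yields the same selection.

```python
from typing import Dict, List


def select_by_sorting(similar_counts: Dict[str, int], first_preset_number: int) -> List[str]:
    """Sort clusters by their count in descending order and keep the top ones."""
    ranked = sorted(similar_counts, key=similar_counts.get, reverse=True)
    return ranked[:first_preset_number]


def select_by_repeated_max(similar_counts: Dict[str, int], first_preset_number: int) -> List[str]:
    """Repeatedly take the cluster with the largest count out of the set."""
    remaining = dict(similar_counts)  # work on a copy of the candidate set
    chosen: List[str] = []
    while remaining and len(chosen) < first_preset_number:
        best = max(remaining, key=remaining.get)
        chosen.append(best)
        del remaining[best]
    return chosen
```

Sorting costs more up front but is simpler; the repeated-maximum variant avoids ordering clusters that will never be selected.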
The first preset number may be determined, as needed, from the number of users by which the existing user cluster is to be extended. Because the number of users contained in each user cluster obtained by clustering is fixed, the more users the existing user cluster needs to be extended by, the more user clusters obtained by clustering are needed, that is, the larger the first preset number.
For example, suppose an existing user cluster contains 200 users with the same interests, and the number of users in this cluster needs to be expanded to 1000, that is, 800 users need to be added. Suppose also that the user clusters L, M, N, O, P obtained by clustering, ordered from high to low by the number of users having the same features as the existing user cluster, contain 300, 300, 200, 200 and 300 users respectively. The users contained in user clusters L, M and N are then used to extend the existing user cluster. In another case, when the number of users in the cluster needs to be expanded to 1100, that is, 900 users need to be added, the users contained in user clusters L, M and N are not enough to add 900 users to the existing user cluster, so any 100 users from user cluster O may additionally be added to the existing user cluster, bringing its number of users to 1100.
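The arithmetic of this example can be reproduced with a short sketch. The function name `plan_expansion` and its return format (cluster name mapped to the number of users taken from it) are assumptions; the cluster sizes are those given in the example.

```python
from typing import Dict, List, Tuple


def plan_expansion(current_size: int, target_size: int,
                   ranked_clusters: List[Tuple[str, int]]) -> Dict[str, int]:
    """Decide how many users to take from each ranked cluster.

    `ranked_clusters` lists (name, user count) pairs, already ordered from
    high to low by the number of users sharing features with the existing
    cluster.
    """
    needed = target_size - current_size
    plan: Dict[str, int] = {}
    for name, size in ranked_clusters:
        if needed <= 0:
            break
        take = min(size, needed)  # take a whole cluster, or only the remainder
        plan[name] = take
        needed -= take
    return plan


ranked = [("L", 300), ("M", 300), ("N", 200), ("O", 200), ("P", 300)]
print(plan_expansion(200, 1000, ranked))  # {'L': 300, 'M': 300, 'N': 200}
print(plan_expansion(200, 1100, ranked))  # {'L': 300, 'M': 300, 'N': 200, 'O': 100}
```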
Step F: add the users in the selected user clusters to the existing user cluster.
The closer the historical push information clicked by two users, the closer the interests of the two users. Accordingly, the more users with similar features a user cluster obtained by clustering shares with the existing user cluster, that is, the more users with the same interests the two user clusters contain, the more likely it is that the users in the two clusters belong to the same class. The existing user cluster can therefore be extended with the users contained in the user clusters that have the largest numbers of users with the same features as the existing user cluster.
In each scheme provided by the embodiments of the present invention, the historical push information clicked by a user is real click data generated by the user and is therefore highly reliable, and the historical push information clicked by users with similar interests is usually also similar. Clustering users with the schemes provided by the embodiments of the present invention can therefore improve the accuracy of the clustering result. Pushing information on this basis to the users contained in each user cluster obtained by clustering can speed up pushing while ensuring that the information pushed to users matches their interests.
Referring to Fig. 2, which is a schematic structural diagram of a user clustering apparatus provided by an embodiment of the present invention, the apparatus includes:
a determining module 200, configured to determine, for each user to be clustered, the historical push information that the user to be clustered has clicked, and to obtain the feature of the user to be clustered according to the determined historical push information;
a computing module 210, configured to calculate the similarity between every two users to be clustered according to the features of the users to be clustered;
a clustering module 220, configured to cluster the users to be clustered according to the calculated similarities and the feature of each user to be clustered.
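As an illustration of what the determining module might produce, the sketch below turns each user's clicked historical push information into a binary feature vector. The binary encoding, the function name `build_features` and the sample identifiers are assumptions; the text only states that the feature is obtained from the clicked historical push information.

```python
from typing import Dict, List, Set, Tuple


def build_features(click_history: Dict[str, Set[str]]) -> Tuple[List[str], Dict[str, List[int]]]:
    """Encode each user's clicks as a 0/1 vector over all known push items.

    Returns the ordered item list (one item per vector position) together
    with the feature vector of every user.
    """
    items = sorted(set().union(*click_history.values()))
    features = {
        user: [1 if item in clicked else 0 for item in items]
        for user, clicked in click_history.items()
    }
    return items, features


items, features = build_features({"u1": {"H", "I"}, "u2": {"I", "K"}})
print(items)     # ['H', 'I', 'K']
print(features)  # {'u1': [1, 1, 0], 'u2': [0, 1, 1]}
```

Vectors of this form can be passed to the computing module for pairwise similarity and to the clustering module for averaging.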
In one implementation of the embodiment of the present invention, the above apparatus further includes:
an extension module, configured to extend each existing user cluster, wherein the extension module includes:
a comparison submodule, configured to compare the features of the users in each user cluster obtained by clustering with the features of the users in the existing user cluster, and to determine the number of users with similar features in each user cluster obtained by clustering;
a selection submodule, configured to select a first preset number of user clusters from the user clusters obtained by clustering, wherein the number of users with similar features in each selected user cluster is greater than the number of users with similar features in every user cluster that is not selected;
an extension submodule, configured to add the users in the selected user clusters to the existing user cluster.
In one implementation of the embodiment of the present invention, the apparatus further includes:
a selection module, configured to randomly select a second preset number of features from the obtained features of the users to be clustered as cluster centers;
Correspondingly, the clustering module includes:
an adding submodule, configured to determine, for each user to be clustered and according to the calculated similarities, the similarity between the user to be clustered and any one of the cluster centers, and to judge, according to the determined similarity, whether the user to be clustered belongs to the user cluster corresponding to that cluster center; and, if the user to be clustered belongs to the user cluster corresponding to that cluster center, to add the user to be clustered to the user cluster corresponding to that cluster center;
an updating submodule, configured to calculate the average feature of a user cluster according to the features of the users to be clustered in the user cluster; to update the cluster center of the user cluster to the calculated average feature in a case where the calculated average feature differs from the cluster center of the user cluster, and to trigger the adding submodule; and, once the average feature of each user cluster is identical to the cluster center of that user cluster, to take the user clusters obtained at that time as the clustering result.
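The interplay of the adding submodule and the updating submodule can be sketched as a loop. Several points are assumptions: a user is taken to belong to the cluster whose center is most similar, cosine similarity is used as the measure, the initial centers are passed in by the caller (who may pick them at random, for instance with `random.sample`), and `max_iter` is a safety limit that the text does not mention.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_users(features: Dict[str, List[float]],
                  centers: List[List[float]],
                  max_iter: int = 100) -> List[List[str]]:
    """Assign users to the most similar center, then move each center to the
    average feature of its cluster, until no center changes."""
    centers = [list(c) for c in centers]
    clusters: List[List[str]] = []
    for _ in range(max_iter):
        # Adding step: put every user into the cluster of the closest center.
        clusters = [[] for _ in centers]
        for user, vec in features.items():
            best = max(range(len(centers)), key=lambda k: cosine(vec, centers[k]))
            clusters[best].append(user)
        # Updating step: recompute each center as the cluster's average feature.
        new_centers = []
        for k, members in enumerate(clusters):
            if members:
                columns = zip(*(features[u] for u in members))
                new_centers.append([sum(col) / len(members) for col in columns])
            else:
                new_centers.append(centers[k])  # empty cluster keeps its center
        if new_centers == centers:  # average features equal the centers: done
            break
        centers = new_centers
    return clusters
```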
In one implementation of the embodiment of the present invention, the computing module is specifically configured to:
calculate the similarity between every two users to be clustered using a cosine similarity algorithm, according to the features of the users to be clustered; or,
calculate the similarity coefficient between every two users to be clustered using the following formula, and determine the similarity between the two users to be clustered according to the calculated similarity coefficient:
s(i, j) = |Ui & Uj| / |Ui | Uj|
where s(i, j) denotes the similarity coefficient between user i to be clustered and user j to be clustered, Ui denotes the feature vector of user i to be clustered, Uj denotes the feature vector of user j to be clustered, |Ui & Uj| denotes the intersection of the feature vector of user i to be clustered and the feature vector of user j to be clustered, and |Ui | Uj| denotes the union of the feature vector of user i to be clustered and the feature vector of user j to be clustered.
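Both similarity measures can be sketched for binary feature vectors as follows. The sketch assumes that the similarity coefficient is the size of the intersection divided by the size of the union, as the symbol definitions suggest, and that a feature vector holds one 0/1 entry per item of historical push information; the function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[int], v: Sequence[int]) -> float:
    """Cosine of the angle between two feature vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_coefficient(u: Sequence[int], v: Sequence[int]) -> float:
    """Assumed form of s(i, j): items clicked by both users divided by items
    clicked by at least one of them."""
    intersection = sum(1 for x, y in zip(u, v) if x and y)
    union = sum(1 for x, y in zip(u, v) if x or y)
    return intersection / union if union else 0.0


# Users who clicked H, I, G, K and H, I, G, L, encoded over items H, I, G, K, L.
u1 = [1, 1, 1, 1, 0]
u2 = [1, 1, 1, 0, 1]
print(similarity_coefficient(u1, u2))  # 0.6
print(cosine_similarity(u1, u2))       # 0.75
```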
An embodiment of the present invention also provides an electronic device which, as shown in Fig. 3, includes a processor 001, a communication interface 002, a memory 003 and a communication bus 004, wherein the processor 001, the communication interface 002 and the memory 003 communicate with one another through the communication bus 004;
the memory 003 is configured to store a computer program;
the processor 001 is configured to implement the user clustering method provided by the embodiments of the present invention when executing the program stored in the memory 003.
Specifically, the above user clustering method includes:
for each user to be clustered, determining the historical push information that the user to be clustered has clicked, and obtaining the feature of the user to be clustered according to the determined historical push information;
calculating the similarity between every two users to be clustered according to the features of the users to be clustered;
clustering the users to be clustered according to the calculated similarities and the feature of each user to be clustered.
It should be noted that the other embodiments of the user clustering method implemented by the processor 001 executing the program stored in the memory 003 are the same as the embodiments provided in the foregoing method embodiment part, and are not described again here.
The communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is shown in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the above electronic device and other devices.
The memory may include random access memory (RAM), and may also include non-volatile memory (NVM), for example at least one disk storage. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In another embodiment provided by the present invention, a computer-readable storage medium is also provided. A computer program is stored in the computer-readable storage medium, and the user clustering method provided by the embodiments of the present invention is implemented when the computer program is executed by a processor.
Specifically, the above user clustering method includes:
for each user to be clustered, determining the historical push information that the user to be clustered has clicked, and obtaining the feature of the user to be clustered according to the determined historical push information;
calculating the similarity between every two users to be clustered according to the features of the users to be clustered;
clustering the users to be clustered according to the calculated similarities and the feature of each user to be clustered.
It should be noted that the other embodiments of the user clustering method implemented through the above computer-readable storage medium are the same as the embodiments provided in the foregoing method embodiment part, and are not described again here.
In another embodiment provided by the present invention, a computer program product containing instructions is also provided which, when run on a computer, causes the computer to execute the user clustering method provided by the above embodiments.
Specifically, the above user clustering method includes:
for each user to be clustered, determining the historical push information that the user to be clustered has clicked, and obtaining the feature of the user to be clustered according to the determined historical push information;
calculating the similarity between every two users to be clustered according to the features of the users to be clustered;
clustering the users to be clustered according to the calculated similarities and the feature of each user to be clustered.
It should be noted that the other embodiments of the user clustering method implemented through the above computer program product are the same as the embodiments provided in the foregoing method embodiment part, and are not described again here.
The above embodiments may be implemented wholly or partly by software, hardware, firmware or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present invention are generated wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center in a wired manner (for example coaxial cable, optical fiber or digital subscriber line (DSL)) or a wireless manner (for example infrared, radio or microwave). The computer-readable storage medium may be any usable medium that a computer can access, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example a floppy disk, hard disk or magnetic tape), an optical medium (for example a DVD), a semiconductor medium (for example a solid state disk (SSD)), or the like.
It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.
The embodiments in this specification are described in a related manner; for the same or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the embodiments of the apparatus, the electronic device, the computer-readable storage medium and the computer program product are described relatively briefly because they are substantially similar to the method embodiments; for the relevant parts, refer to the description of the method embodiments.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the protection scope of the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention falls within the protection scope of the present invention.

Claims (10)

1. A user clustering method, characterized in that the method comprises:
for each user to be clustered, determining the historical push information that the user to be clustered has clicked, and obtaining the feature of the user to be clustered according to the determined historical push information;
calculating the similarity between every two users to be clustered according to the features of the users to be clustered;
clustering the users to be clustered according to the calculated similarities and the feature of each user to be clustered.
2. The method according to claim 1, characterized in that, after the step of clustering the users to be clustered according to the calculated similarities and the feature of each user to be clustered, the method further comprises:
extending each existing user cluster in the following manner:
comparing the features of the users in each user cluster obtained by clustering with the features of the users in the existing user cluster, and determining the number of users with similar features in each user cluster obtained by clustering;
selecting a first preset number of user clusters from the user clusters obtained by clustering, wherein the number of users with similar features in each selected user cluster is greater than the number of users with similar features in every user cluster that is not selected;
adding the users in the selected user clusters to the existing user cluster.
3. The method according to claim 1 or 2, characterized in that, after the step of obtaining the feature of the user to be clustered according to the determined historical push information, the method further comprises:
randomly selecting a second preset number of features from the obtained features of the users to be clustered as cluster centers;
the step of clustering the users to be clustered according to the calculated similarities and the feature of each user to be clustered comprises:
for each user to be clustered, determining, according to the calculated similarities, the similarity between the user to be clustered and any one of the cluster centers, and judging, according to the determined similarity, whether the user to be clustered belongs to the user cluster corresponding to that cluster center; if the user to be clustered belongs to the user cluster corresponding to that cluster center, adding the user to be clustered to the user cluster corresponding to that cluster center;
for each user cluster, calculating the average feature of the user cluster according to the features of the users to be clustered in the user cluster, and, in a case where the calculated average feature differs from the cluster center of the user cluster, updating the cluster center of the user cluster to the calculated average feature; and returning to the step of, for each user to be clustered, determining, according to the calculated similarities, the similarity between the user to be clustered and any one of the cluster centers, judging, according to the determined similarity, whether the user to be clustered belongs to the user cluster corresponding to that cluster center, and, if the user to be clustered belongs to the user cluster corresponding to that cluster center, adding the user to be clustered to the user cluster corresponding to that cluster center, until the average feature of each user cluster is identical to the cluster center of that user cluster, whereupon the user clusters obtained at that time are taken as the clustering result.
4. The method according to claim 1 or 2, characterized in that the step of calculating the similarity between every two users to be clustered according to the features of the users to be clustered comprises:
calculating the similarity between every two users to be clustered using a cosine similarity algorithm, according to the features of the users to be clustered; or,
calculating the similarity coefficient between every two users to be clustered using the following formula, and determining the similarity between the two users to be clustered according to the calculated similarity coefficient:
s(i, j) = |Ui & Uj| / |Ui | Uj|
where s(i, j) denotes the similarity coefficient between user i to be clustered and user j to be clustered, Ui denotes the feature vector of user i to be clustered, Uj denotes the feature vector of user j to be clustered, |Ui & Uj| denotes the intersection of the feature vector of user i to be clustered and the feature vector of user j to be clustered, and |Ui | Uj| denotes the union of the feature vector of user i to be clustered and the feature vector of user j to be clustered.
5. A user clustering apparatus, characterized in that the apparatus comprises:
a determining module, configured to determine, for each user to be clustered, the historical push information that the user to be clustered has clicked, and to obtain the feature of the user to be clustered according to the determined historical push information;
a computing module, configured to calculate the similarity between every two users to be clustered according to the features of the users to be clustered;
a clustering module, configured to cluster the users to be clustered according to the calculated similarities and the feature of each user to be clustered.
6. The apparatus according to claim 5, characterized in that the apparatus further comprises:
an extension module, configured to extend each existing user cluster;
wherein the extension module comprises:
a comparison submodule, configured to compare the features of the users in each user cluster obtained by clustering with the features of the users in the existing user cluster, and to determine the number of users with similar features in each user cluster obtained by clustering;
a selection submodule, configured to select a first preset number of user clusters from the user clusters obtained by clustering, wherein the number of users with similar features in each selected user cluster is greater than the number of users with similar features in every user cluster that is not selected;
an extension submodule, configured to add the users in the selected user clusters to the existing user cluster.
7. The apparatus according to claim 5 or 6, characterized in that the apparatus further comprises:
a selection module, configured to randomly select a second preset number of features from the obtained features of the users to be clustered as cluster centers;
correspondingly, the clustering module comprises:
an adding submodule, configured to determine, for each user to be clustered and according to the calculated similarities, the similarity between the user to be clustered and any one of the cluster centers, and to judge, according to the determined similarity, whether the user to be clustered belongs to the user cluster corresponding to that cluster center; and, if the user to be clustered belongs to the user cluster corresponding to that cluster center, to add the user to be clustered to the user cluster corresponding to that cluster center;
an updating submodule, configured to calculate the average feature of a user cluster according to the features of the users to be clustered in the user cluster; to update the cluster center of the user cluster to the calculated average feature in a case where the calculated average feature differs from the cluster center of the user cluster, and to trigger the adding submodule; and, once the average feature of each user cluster is identical to the cluster center of that user cluster, to take the user clusters obtained at that time as the clustering result.
8. The apparatus according to claim 5 or 6, characterized in that the computing module is specifically configured to:
calculate the similarity between every two users to be clustered using a cosine similarity algorithm, according to the features of the users to be clustered; or,
calculate the similarity coefficient between every two users to be clustered using the following formula, and determine the similarity between the two users to be clustered according to the calculated similarity coefficient:
s(i, j) = |Ui & Uj| / |Ui | Uj|
where s(i, j) denotes the similarity coefficient between user i to be clustered and user j to be clustered, Ui denotes the feature vector of user i to be clustered, Uj denotes the feature vector of user j to be clustered, |Ui & Uj| denotes the intersection of the feature vector of user i to be clustered and the feature vector of user j to be clustered, and |Ui | Uj| denotes the union of the feature vector of user i to be clustered and the feature vector of user j to be clustered.
9. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement the method steps of any one of claims 1-4 when executing the program stored in the memory.
10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, and the method steps of any one of claims 1-4 are implemented when the computer program is executed by a processor.
CN201910043942.0A 2019-01-17 2019-01-17 A kind of user's clustering method, device and equipment Pending CN109886300A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910043942.0A CN109886300A (en) 2019-01-17 2019-01-17 A kind of user's clustering method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910043942.0A CN109886300A (en) 2019-01-17 2019-01-17 A kind of user's clustering method, device and equipment

Publications (1)

Publication Number Publication Date
CN109886300A true CN109886300A (en) 2019-06-14

Family

ID=66926156

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910043942.0A Pending CN109886300A (en) 2019-01-17 2019-01-17 A kind of user's clustering method, device and equipment

Country Status (1)

Country Link
CN (1) CN109886300A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111444933A (en) * 2019-11-26 2020-07-24 北京邮电大学 Object classification method and device
CN111784412A (en) * 2020-07-15 2020-10-16 Oppo广东移动通信有限公司 Information pushing method and device, electronic equipment and storage medium
CN114880580A (en) * 2022-06-15 2022-08-09 北京百度网讯科技有限公司 Information recommendation method and device, electronic equipment and medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104063801A (en) * 2014-06-23 2014-09-24 广州优蜜信息科技有限公司 Mobile advertisement recommendation method based on cluster
CN106228188A (en) * 2016-07-22 2016-12-14 北京市商汤科技开发有限公司 Clustering method, device and electronic equipment
CN108108419A (en) * 2017-12-15 2018-06-01 百度在线网络技术(北京)有限公司 A kind of information recommendation method, device, equipment and medium
CN108647293A (en) * 2018-05-07 2018-10-12 广州虎牙信息科技有限公司 Video recommendation method, device, storage medium and server
CN108898432A (en) * 2018-06-25 2018-11-27 武汉斗鱼网络科技有限公司 Advertisement putting effect evaluation method and device and electronic equipment

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111444933A (en) * 2019-11-26 2020-07-24 北京邮电大学 Object classification method and device
CN111444933B (en) * 2019-11-26 2023-10-10 北京邮电大学 Object classification method and device
CN111784412A (en) * 2020-07-15 2020-10-16 Oppo广东移动通信有限公司 Information pushing method and device, electronic equipment and storage medium
CN114880580A (en) * 2022-06-15 2022-08-09 北京百度网讯科技有限公司 Information recommendation method and device, electronic equipment and medium

Similar Documents

Publication Publication Date Title
US10277480B2 (en) Method, apparatus, and system for determining a location corresponding to an IP address
US9405746B2 (en) User behavior models based on source domain
US10547618B2 (en) Method and apparatus for setting access privilege, server and storage medium
CN107341716A (en) A kind of method, apparatus and electronic equipment of the identification of malice order
US20140280548A1 (en) Method and system for discovery of user unknown interests
EP2438539A1 (en) Co-selected image classification
CN108021708B (en) Content recommendation method and device and computer readable storage medium
CN109886300A (en) A kind of user's clustering method, device and equipment
US20120295633A1 (en) Using user's social connection and information in web searching
CN108366012B (en) Social relationship establishing method and device and electronic equipment
CN109241403A (en) Item recommendation method, device, machinery equipment and computer readable storage medium
CN108345601A (en) Search result ordering method and device
CN110928739B (en) Process monitoring method and device and computing equipment
US10762122B2 (en) Method and device for assessing quality of multimedia resource
US20150302088A1 (en) Method and System for Providing Personalized Content
CN105574030A (en) Information search method and device
US8750629B2 (en) Method for searching and ranking images clustered based upon similar content
CN108154024A (en) A kind of data retrieval method, device and electronic equipment
WO2017095413A1 (en) Incremental automatic update of ranked neighbor lists based on k-th nearest neighbors
US10733244B2 (en) Data retrieval system
CN109885729B (en) Method, device and system for displaying data
US10516684B1 (en) Recommending and prioritizing computer log anomalies
CN114997327A (en) Target object classification method and device, storage medium and electronic equipment
CN113868373A (en) Word cloud generation method and device, electronic equipment and storage medium
CN112015924A (en) Streaming media caching method and device and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20190614)