CN102646097A - Clustering method and device - Google Patents

Clustering method and device

Info

Publication number
CN102646097A
Authority
CN
China
Prior art keywords
hash
minhash
clustering model
hash function
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2011100412008A
Other languages
Chinese (zh)
Other versions
CN102646097B (en)
Inventor
陈建群
杨志峰
刘建
贺鹏程
崔岩
肖战勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Tencent Cloud Computing Beijing Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201110041200.8A
Publication of CN102646097A
Application granted
Publication of CN102646097B
Legal status: Active
Anticipated expiration

Landscapes

  • Storage Device Security (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a clustering method and device. The clustering method comprises: dividing a plurality of users into categories through a MinHash clustering model, storing the hash functions of the MinHash clustering model, and storing the correspondence between the hash values of the plurality of users and the categories; determining the hash value of a new user through the hash functions of the MinHash clustering model; and determining the category corresponding to the new user's hash value according to the correspondence between hash values and categories. Because the hash functions of the MinHash clustering model and the correspondence between hash values and categories are stored, a new user can be clustered quickly with the existing MinHash clustering model without regenerating the model, which improves the efficiency of clustering new users.

Description

A clustering method and device
Technical field
The present invention relates to the field of network technology, and in particular to a clustering method and device.
Background
MinHash (Minwise Independent Permutation Hashing, a hashing scheme that satisfies the min-wise independence condition) represents a user as a set of elements. A MinHash-based clustering method can estimate the similarity between two sets and thereby cluster quickly, which in turn supports user recommendation; for example, it can be applied to near-duplicate web page detection to recommend similar web pages.
However, a new set has not taken part in any earlier clustering, so the cluster it belongs to is unknown. The MinHash clustering method cannot directly determine the category of the new set; the clustering model has to be regenerated to obtain that category before any recommendation can be made. This makes clustering new users inefficient and slows recommendations for them.
Summary of the invention
Embodiments of the invention provide a clustering method and device that improve the efficiency of clustering new users.
A clustering method comprises:
Dividing a plurality of users into categories through a MinHash clustering model, storing the hash functions of said MinHash clustering model, and storing the correspondence between the hash values of said plurality of users and said categories;
Determining the hash value of a new user through the hash functions of said MinHash clustering model;
Determining, according to the correspondence between said hash values and said categories, the category corresponding to the hash value of said new user.
A clustering device comprises:
A division unit, configured to divide a plurality of users into categories through a MinHash clustering model, store the hash functions of said MinHash clustering model, and store the correspondence between the hash values of said plurality of users and said categories;
A determining unit, configured to determine the hash value of a new user through the hash functions of said MinHash clustering model;
A clustering unit, configured to determine, according to the correspondence between said hash values and said categories, the category corresponding to the hash value of said new user.
With the clustering method and device provided by the embodiments of the invention, the hash functions of the MinHash clustering model and the correspondence between hash values and categories are determined, so that a new user can be clustered quickly with the existing MinHash clustering model. The clustering model no longer needs to be regenerated, which improves the efficiency of clustering new users.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of the clustering method provided by an embodiment of the invention.
Fig. 2 is a first schematic structural diagram of the clustering device provided by an embodiment of the invention.
Fig. 3 is a second schematic structural diagram of the clustering device provided by an embodiment of the invention.
Fig. 4 is a schematic flowchart of the clustering method provided by an embodiment of the invention in an application scenario.
Detailed description
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the invention.
As shown in Fig. 1, an embodiment of the invention provides a clustering method, comprising:
11. Dividing a plurality of users into categories through a MinHash clustering model, storing the hash functions of the MinHash clustering model, and storing the correspondence between the hash values of the plurality of users and the categories.
12. Determining the hash value of a new user through the hash functions of the MinHash clustering model.
13. Determining the category corresponding to the new user's hash value according to the correspondence between hash values and categories.
As the above technical solution shows, by determining the hash functions of the MinHash clustering model and the correspondence between hash values and categories, a new user can be clustered quickly with the existing MinHash clustering model; the clustering model no longer needs to be regenerated, which improves the efficiency of clustering new users.
Specifically, in the clustering method provided by the embodiment of the invention, a user can be defined as a set of keywords (or elements), for example a set A of keywords describing the user's interests. In music recommendation the keywords can be the songs the user has collected; in news recommendation, the keywords of the news the user has browsed; in film recommendation, the films the user has watched; and so on.
When a user's interests change, so that keywords are added to or removed from the set, a new user profile can be built for that user and the user can be treated as a new user.
Optionally, the hash functions of the MinHash clustering model can comprise:
One group of hash functions or multiple groups of hash functions, where each group consists of several different hash functions.
For example, with q groups of different hash functions, each containing p different hash functions, one sketch is generated for each set from each group of hash functions. A sketch consists of p hash values, so each set yields q sketches.
The hash functions can be the random hash functions of the MinHash clustering model; no restriction is imposed.
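The construction of q sketches from q groups of p hash functions can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: each hash function is assumed to be a salted SHA-1 digest identified by an integer seed, and the names make_hash_groups, hash_value, minhash and sketches are hypothetical.

```python
import hashlib
import random


def make_hash_groups(q, p, seed=0):
    """Return q groups of p distinct integer seeds; each seed stands for one hash function."""
    rng = random.Random(seed)
    seeds = rng.sample(range(2**32), q * p)  # sampling without replacement keeps them distinct
    return [seeds[i * p:(i + 1) * p] for i in range(q)]


def hash_value(seed, element):
    """Assumed stand-in for a random hash function f: V -> R (salted SHA-1, first 8 bytes)."""
    digest = hashlib.sha1(f"{seed}:{element}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash(seed, items):
    """Smallest hash value over the elements of a set."""
    return min(hash_value(seed, x) for x in items)


def sketches(groups, items):
    """One sketch (a tuple of p MinHash values) per group, so q sketches in total."""
    return [tuple(minhash(seed, items) for seed in group) for group in groups]


groups = make_hash_groups(q=3, p=2)
user_a = {"song1", "song2", "song3"}
user_sketches = sketches(groups, user_a)
print(len(user_sketches), len(user_sketches[0]))  # prints: 3 2
```

Identifying a hash function by its seed is a design choice of this illustration: it makes "storing the hash functions" as cheap as storing q*p integers.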
Specifically, step 11 (dividing a plurality of users into categories through the MinHash clustering model, storing the hash functions of the MinHash clustering model, and storing the correspondence between the hash values of the plurality of users and the categories) can comprise:
Determining and storing the hash functions of the MinHash clustering model.
Determining the hash values of the plurality of users through the hash functions of the MinHash clustering model.
Dividing users whose hash values are identical into the same category.
Storing the correspondence between hash values and categories.
The hash functions of the MinHash clustering model are generated randomly. Once they have been determined they can be stored, so that when a new user is processed later, the new user's category can be found in the existing MinHash clustering model. No order is imposed between the step of storing the hash functions and the step of dividing the users into categories: the hash functions can be stored either before or after the users are divided.
For example, one sketch consisting of p hash values is generated for each set from each group of hash functions, giving q sketches per set. Given two sets, if any one of their q sketches is identical, the two sets are clustered together into the same category. The correspondence between categories and sketches, that is, between hash values and categories, can then be determined from the resulting categories.
In addition, the q groups of different hash functions are determined and can be saved.
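Step 11 can be illustrated with a small in-memory model. The sketch below assumes seeded SHA-1 hash functions and a plain dictionary as storage; build_model, the "groups" and "categories" keys, and the user data are hypothetical names chosen for this example.

```python
import hashlib
from collections import defaultdict


def hash_value(seed, element):
    """Assumed stand-in for a random hash function (salted SHA-1, first 8 bytes)."""
    digest = hashlib.sha1(f"{seed}:{element}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def sketches(groups, items):
    """One sketch of p MinHash values per group of seeds."""
    return [tuple(min(hash_value(s, x) for x in items) for s in group) for group in groups]


def build_model(groups, users):
    """Cluster users by sketch and keep everything needed to place new users later."""
    categories = defaultdict(set)  # (group index, sketch) -> ids of users sharing that sketch
    for user_id, items in users.items():
        for g, sketch in enumerate(sketches(groups, items)):
            categories[(g, sketch)].add(user_id)
    # Keeping the seeds is what "storing the hash functions" amounts to in this sketch.
    return {"groups": groups, "categories": dict(categories)}


users = {
    "u1": {"song1", "song2"},
    "u2": {"song1", "song2"},
    "u3": {"song7", "song8", "song9"},
}
model = build_model(groups=[[11, 12], [21, 22]], users=users)
shared = [members for members in model["categories"].values() if len(members) > 1]
print(shared)  # the categories that hold more than one user
```

Here the sketch itself, paired with the index of the group that produced it, serves as the category label, so the stored correspondence between hash values and categories is simply the dictionary keys.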
Subsequently, when a new user is processed, step 12 determines the new user's hash value through the hash functions of the MinHash clustering model, and step 13 determines the category corresponding to that hash value according to the correspondence between hash values and categories. The new user is thus clustered quickly without regenerating the clustering model, which improves the efficiency of clustering new users and allows cluster-based recommendations to be made for them.
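Steps 12 and 13 reduce to recomputing sketches with the stored hash functions and looking them up. The following is a hedged sketch that assumes the model is a dictionary with "groups" (stored seeds) and "categories" (sketch-to-members mapping); the assign function and the sample data are hypothetical.

```python
import hashlib


def hash_value(seed, element):
    """Assumed stand-in for a random hash function (salted SHA-1, first 8 bytes)."""
    digest = hashlib.sha1(f"{seed}:{element}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def sketches(groups, items):
    """One sketch of p MinHash values per group of seeds."""
    return [tuple(min(hash_value(s, x) for x in items) for s in group) for group in groups]


def assign(model, items):
    """Return the keys of the stored categories that the new user's sketches hit."""
    new_sketches = sketches(model["groups"], items)  # step 12: reuse the stored hash functions
    # Step 13: keep only the sketches that already label a category in the model.
    return [(g, s) for g, s in enumerate(new_sketches) if (g, s) in model["categories"]]


groups = [[11, 12], [21, 22]]
existing = {"song1", "song2", "song3"}
model = {
    "groups": groups,
    "categories": {(g, s): {"u1"} for g, s in enumerate(sketches(groups, existing))},
}
same_taste = assign(model, {"song1", "song2", "song3"})
unrelated = assign(model, {"song8", "song9"})
print(len(same_taste), len(unrelated))  # prints: 2 0
```

No existing user is rehashed and no category is rebuilt; the cost for a new user is p*q MinHash evaluations plus at most q lookups.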
Step 12 (determining the new user's hash value through the hash functions of the MinHash clustering model) can comprise:
Determining the hash values of a group of new users in parallel through the hash functions of the MinHash clustering model.
Step 13 (determining the category corresponding to the new user's hash value according to the correspondence between hash values and categories) can comprise:
Determining the categories corresponding to the hash values of the group of new users in parallel according to the correspondence between hash values and categories.
Determining the hash values of a group of new users in parallel makes it convenient to determine their categories in parallel as well, which improves clustering efficiency.
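Because each new user is hashed and looked up independently, a batch can be mapped over a worker pool. The sketch below uses Python's standard ThreadPoolExecutor purely to show the independence of the per-user work; the seeds, category labels and the assign_batch helper are assumptions for illustration, and a process pool or a distributed job would be the more likely choice when real CPU parallelism is needed.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def hash_value(seed, element):
    """Assumed stand-in for a random hash function (salted SHA-1, first 8 bytes)."""
    digest = hashlib.sha1(f"{seed}:{element}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def sketch(seeds, items):
    """Sketch of one set under one group of hash functions."""
    return tuple(min(hash_value(s, x) for x in items) for s in seeds)


def assign_batch(seeds, sketch_to_category, new_users, workers=4):
    """Compute each new user's sketch independently and look up its category."""
    def place(entry):
        user_id, items = entry
        return user_id, sketch_to_category.get(sketch(seeds, items))  # None if no match

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(place, new_users.items()))


seeds = [11, 12]  # one stored group of hash functions
sketch_to_category = {
    sketch(seeds, {"a", "b"}): "class 1",
    sketch(seeds, {"x", "y"}): "class 2",
}
new_users = {"n1": {"a", "b"}, "n2": {"x", "y"}, "n3": {"q"}}
placed = assign_batch(seeds, sketch_to_category, new_users)
print(placed)  # prints: {'n1': 'class 1', 'n2': 'class 2', 'n3': None}
```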
As the above technical solution shows, clustering with the MinHash clustering model can be parallelized: the MinHash of each user is computed independently, and once the hash values have been computed the user's category is determined. For a new user the MinHash computation is likewise independent and unaffected by existing users. As long as the same hash functions are used and the new user's sketch is generated in the same way, the new user's category can be found in the existing MinHash clustering model.
The clustering method of the embodiment of the invention can further comprise:
After the category of the new user is determined, providing recommendations to the new user.
Recommendation with the MinHash clustering model can be understood from the following example.
For example, for a new user u, the category c to which the user belongs is found and the similarity sim(u, c) between the user and that category is computed. Then, for each element ci in the category, the number of times COUNT(ci) that the element occurs in the category is computed; the score for recommending element ci to the user is sim(u, c) * COUNT(ci). Such a recommendation score is generated for every element in category c, the elements are sorted by score, and the result is recommended to the user.
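The scoring rule sim(u, c) * COUNT(ci) can be sketched as follows. The text does not say how sim(u, c) is computed, so this example assumes it is the Jaccard similarity between the user's set and the union of all elements in the category, and that COUNT(ci) is the number of category members whose sets contain ci; the recommend function is hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user_items, category_members):
    """Score every element of a category as sim(u, c) * COUNT(ci) and sort by score.

    category_members is a list of element sets, one per user in category c.
    """
    counts = Counter(x for items in category_members for x in items)  # COUNT(ci)
    sim = jaccard(user_items, set(counts))  # assumed definition of sim(u, c)
    scores = {item: sim * n for item, n in counts.items()}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


ranked = recommend({"a", "b"}, [{"a", "b", "c"}, {"b", "c", "d"}])
print(ranked)  # prints: [('b', 1.0), ('c', 1.0), ('a', 0.5), ('d', 0.5)]
```

As in the text, every element of the category is scored, including those the user already has; a practical system might filter those out before presenting the list.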
As the above technical solution shows, by determining the hash functions of the MinHash clustering model and the correspondence between hash values and categories, a new user can be clustered quickly with the existing MinHash clustering model; the clustering model no longer needs to be regenerated, which improves the efficiency of clustering new users and, in turn, the efficiency of recommending to them.
Recommending to a new user based on the MinHash clustering model only requires applying the p*q hash functions and then finding the corresponding categories in the clustering model. At most q file operations are needed (generally only two or three), so recommendation is fast. Moreover, recommendation based on the MinHash clustering model can take full advantage of the model's accuracy, so the recommendation results are more precise. This avoids a drawback of item-based recommendation, which can only select recommendation results for individual items: a user has many history items, a single item cannot represent the user's interests, and although merging the recommendation results can reflect the user's overall interests, the relationships between history items cannot be exploited, so recommendation precision is lost.
The clustering method of the embodiment of the invention is applicable to all users: whenever a user's interests change, a new user profile can be built for the user and the user treated as a new user, so that real-time recommendation results are provided.
As shown in Fig. 2, corresponding to the clustering method provided by the above embodiment, an embodiment of the invention provides a clustering device, comprising:
A division unit 21, configured to divide a plurality of users into categories through a MinHash clustering model, store the hash functions of the MinHash clustering model, and store the correspondence between the hash values of the plurality of users and the categories.
A determining unit 22, configured to determine the hash value of a new user through the hash functions of the MinHash clustering model.
A clustering unit 23, configured to determine the category corresponding to the new user's hash value according to the correspondence between hash values and categories.
As the above technical solution shows, by determining the hash functions of the MinHash clustering model and the correspondence between hash values and categories, a new user can be clustered quickly with the existing MinHash clustering model; the clustering model no longer needs to be regenerated, which improves the efficiency of clustering new users.
Specifically, in the clustering device provided by the embodiment of the invention, a user can be defined as a set of keywords, such as a set A; a new user can correspondingly be defined as a set to which keywords have been added or from which keywords have been removed.
Optionally, the hash functions of the MinHash clustering model can comprise:
One group of hash functions or multiple groups of hash functions, where each group consists of several different hash functions.
The hash functions can be the random hash functions of the MinHash clustering model; no restriction is imposed.
For example, with q groups of different hash functions, each containing p different hash functions, one sketch is generated for each set from each group of hash functions. A sketch consists of p hash values, so each set yields q sketches.
As shown in Fig. 3, the division unit 21 can comprise:
A first storage subunit 31, configured to determine and store the hash functions of the MinHash clustering model.
A first determining subunit 32, configured to determine the hash values of the plurality of users through the hash functions of the MinHash clustering model.
A dividing subunit 33, configured to divide users whose hash values are identical into the same category.
A second storage subunit 34, configured to store the correspondence between hash values and categories.
Optionally, a batch of new users can be processed at once. In that case the determining unit 22 can specifically be configured to determine the hash values of a group of new users in parallel through the hash functions of the MinHash clustering model.
The clustering unit 23 can specifically be configured to determine the categories corresponding to the hash values of the group of new users in parallel according to the correspondence between hash values and categories.
The functions of the clustering device and its components can be understood from the corresponding description of the clustering method provided by the above embodiment and are not repeated here.
As the above technical solution shows, clustering with the MinHash clustering model can be parallelized: the MinHash of each user is computed independently, and once the hash values have been computed the user's category is determined. For a new user the MinHash computation is likewise independent and unaffected by existing users. As long as the same hash functions are used and the new user's sketch is generated in the same way, the new user's category can be found in the existing MinHash clustering model.
The clustering device of the embodiment of the invention can further comprise:
A recommendation unit, configured to provide recommendations to the new user after the category of the new user is determined.
Recommendation with the MinHash clustering model can be understood from the following example.
For example, for a new user u, the category c to which the user belongs is found and the similarity sim(u, c) between the user and that category is computed. Then, for each element ci in the category, the number of times COUNT(ci) that the element occurs in the category is computed; the score for recommending element ci to the user is sim(u, c) * COUNT(ci). Such a recommendation score is generated for every element in category c, the elements are sorted by score, and the result is recommended to the user.
As the above technical solution shows, by determining the hash functions of the MinHash clustering model and the correspondence between hash values and categories, a new user can be clustered quickly with the existing MinHash clustering model; the clustering model no longer needs to be regenerated, which improves the efficiency of clustering new users and, in turn, the efficiency of recommending to them.
Recommending to a new user based on the MinHash clustering model only requires applying the p*q hash functions and then finding the corresponding categories in the clustering model. At most q file operations are needed (generally only two or three), so recommendation is fast. Moreover, recommendation based on the MinHash clustering model can take full advantage of the model's accuracy, so the recommendation results are more precise. This avoids a drawback of item-based recommendation, which can only select recommendation results for individual items: a user has many history items, a single item cannot represent the user's interests, and although merging the recommendation results can reflect the user's overall interests, the relationships between history items cannot be exploited, so recommendation precision is lost.
The clustering method of the embodiment of the invention is described in further detail below with reference to a specific application scenario.
First, the principle of the MinHash clustering model is explained:
Let V denote the universe of elements. A random hash function can then be expressed as f: V → R, where R is the set of real numbers. If Xa and Xb are any two different elements of the universe, the hash function must satisfy two conditions: f(Xa) ≠ f(Xb) and P(f(Xa) < f(Xb)) = 0.5, where P() denotes probability. These conditions mean that the hash values of any two different elements cannot be equal, and that the hash value of any element is smaller than that of another element with probability 0.5; that is, the ordering of the hash values of different elements must be random.
Given such a hash function, if A is a subset of the universe V, MinHash is defined as:
$$ h(f, A) = \min_{X \in A} f(X) \tag{1} $$
For the same hash function, the probability that sets A and B have the same hash value is:
$$ P\big(h(f, A) = h(f, B)\big) = \frac{|A \cap B|}{|A \cup B|} \tag{2} $$
The right-hand side of formula (2) is the similarity of set A and set B; that is, the probability that sets A and B have the same hash value equals their similarity.
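Formula (2) can be checked empirically by counting how often two sets share a MinHash across many hash functions. The sketch below is an illustration only: salted SHA-1 digests (seeds 0 to trials-1) are an assumed stand-in for random hash functions, and collision_rate is a hypothetical helper.

```python
import hashlib


def hash_value(seed, element):
    """Assumed stand-in for a random hash function (salted SHA-1, first 8 bytes)."""
    digest = hashlib.sha1(f"{seed}:{element}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash(seed, items):
    """h(f, A): the minimum of f over the elements of A, as in formula (1)."""
    return min(hash_value(seed, x) for x in items)


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|, the right-hand side of formula (2)."""
    return len(a & b) / len(a | b)


def collision_rate(a, b, trials=2000):
    """Fraction of hash functions under which A and B have the same MinHash."""
    hits = sum(minhash(seed, a) == minhash(seed, b) for seed in range(trials))
    return hits / trials


A = set(range(0, 8))
B = set(range(4, 12))
print(jaccard(A, B), collision_rate(A, B))  # the two numbers should be close
```

The sets overlap in 4 of 12 distinct elements, so the similarity is 1/3; the observed collision rate fluctuates around that value and tightens as the number of trials grows.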
Clustering with MinHash is straightforward: if set A and set B have the same hash value, they are put in the same category, and the category is labeled with their shared hash value h. The probability that set A and set B are clustered together is their similarity.
To improve the precision of clustering, p different hash functions are usually used, and the sets in the same category are required to agree on all p hash values; however, this tends to reduce the recall of clustering quickly.
To improve recall, q groups of different hash functions are usually used, each with p different hash functions. For each set A, one sketch is generated from each group of hash functions; the sketch consists of p hash values (which can be separated by commas within the sketch). Each set A thus yields q sketches. Given two sets, if any one sketch is identical, the two sets are clustered together. This raises the probability of being clustered together and effectively improves recall, but it also lowers the similarity within a category and therefore the precision of clustering.
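The trade-off between p and q can be made explicit with a short derivation. It assumes, as formula (2) suggests but the text does not state outright, that the p*q hash functions are independent; the symbol s for the similarity of the two sets is introduced here for illustration.

```latex
\[
s = \frac{|A \cap B|}{|A \cup B|}, \qquad
P(\text{one sketch matches}) = s^{p}, \qquad
P(\text{at least one of } q \text{ sketches matches}) = 1 - \left(1 - s^{p}\right)^{q}
\]
```

With p = 3 and q = 3, two sets with s = 0.8 are clustered together with probability 1 - (1 - 0.512)^3 ≈ 0.88, while two sets with s = 0.3 are clustered together with probability 1 - (1 - 0.027)^3 ≈ 0.08. Raising p sharpens precision; raising q recovers recall.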
Next, the recommendation principle of the MinHash clustering model is explained:
Given a user u, the category c to which the user belongs is found and the similarity sim(u, c) between the user and that category is computed. Then, for each element ci in the category, the number of times COUNT(ci) that the element occurs in the category is computed; the score for recommending element ci to the user is sim(u, c) * COUNT(ci). Such a recommendation score is generated for every element of category c, the elements are sorted by score, and the result is recommended to the user.
When a user can belong to several categories, the processing is similar: first, each category is processed as described above; then the elements of all the categories are pooled and the scores of identical elements are added up. The result is a long recommendation list, which is sorted and then recommended to the user.
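The multi-category case can be sketched by summing per-category scores. As before, the definition of sim(u, c) as a Jaccard similarity against the union of the category's elements is an assumption made for this example, and merged_recommendations is a hypothetical name.

```python
from collections import Counter, defaultdict


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merged_recommendations(user_items, categories):
    """Add up sim(u, c) * COUNT(ci) over every category c the user belongs to.

    categories is a list of categories; each category is a list of member element sets.
    """
    totals = defaultdict(float)
    for members in categories:
        counts = Counter(x for items in members for x in items)  # COUNT(ci) within c
        sim = jaccard(user_items, set(counts))  # assumed definition of sim(u, c)
        for item, n in counts.items():
            totals[item] += sim * n  # scores of identical elements are accumulated
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


category_1 = [{"a", "b", "c"}]
category_2 = [{"a", "d"}, {"a", "d"}]
ranked = merged_recommendations({"a", "b"}, [category_1, category_2])
print(ranked[0][0])  # prints: a
```

Element "a" ranks first because it collects a score from both categories, which is the effect the merging step is meant to produce.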
As shown in Fig. 4, the hash functions of the MinHash clustering model are, for example, (f1, f2, ..., fp); Fig. 4 shows only one group of hash functions. The categories of the MinHash clustering model are, for example, class 1 (h11, h21, ..., hp1), class 2 (h12, h22, ..., hp2) and class 3 (h13, h23, ..., hp3), where (h11, h21, ..., hp1) is a sketch.
The clustering method of the embodiment of the invention comprises:
41. Determining the new user's hash value through the hash functions of the MinHash clustering model.
Through the hash functions of the MinHash clustering model, for example (f1, f2, ..., fp), the sketch (h1_new, h2_new, ..., hp_new) of the new user u_new is determined.
42. Determining the category corresponding to the new user's hash value according to the correspondence between hash values and categories.
The category corresponding to the sketch (h1_new, h2_new, ..., hp_new) of the new user u_new is determined, for example class 3 (h13, h23, ..., hp3).
43. Providing recommendations to the new user.
Recommendation results are obtained from class 3 (h13, h23, ..., hp3), to which the new user u_new belongs, and recommended to the user.
As the above technical solution shows, clustering with the MinHash clustering model can be parallelized: the MinHash of each user is computed independently, and once the hash values have been computed the user's category is determined. For a new user the MinHash computation is likewise independent and unaffected by existing users. As long as the same hash functions are used and the new user's sketch is generated in the same way, the new user's category can be found in the existing MinHash clustering model.
In the several embodiments provided in this application, it should be understood that the disclosed system, device and method can be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation; for example, several units or components can be combined or integrated into another system, or some features can be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connections shown or discussed can be indirect couplings or communication connections through interfaces, devices or units, and can be electrical, mechanical or of another form.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they can be located in one place or distributed over several network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, the functional units in the embodiments of the invention can be integrated in one processing unit, each unit can exist physically on its own, or two or more units can be integrated in one unit. The integrated unit can be implemented in hardware or as a software functional unit.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. On this understanding, the technical solution of the invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied as a software product. The computer software product is stored in a storage medium and includes instructions that cause a computer device (a personal computer, a server, a network device or the like) to perform all or some of the steps of the methods of the embodiments of the invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a portable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk or an optical disc.
The above are only preferred embodiments of the invention, but the protection scope of the invention is not limited to them. Any variation or replacement that a person skilled in the art can readily conceive within the technical scope disclosed by the invention shall fall within the protection scope of the invention. The protection scope of the invention shall therefore be determined by the protection scope of the claims.

Claims (10)

1. A clustering method, characterized by comprising:
dividing a plurality of users into categories by means of a MinHash clustering model, storing the hash function of the MinHash clustering model, and storing the correspondence between the hash values of the plurality of users and the categories;
determining the hash value of a new user by means of the hash function of the MinHash clustering model; and
determining, according to the correspondence between the hash values and the categories, the category corresponding to the hash value of the new user.
2. The clustering method according to claim 1, characterized in that the hash function of the MinHash clustering model is a random hash function.
3. The clustering method according to claim 1, characterized in that the hash function of the MinHash clustering model comprises:
one group of hash functions or a plurality of groups of hash functions, wherein each group of hash functions consists of a plurality of different hash functions.
4. The clustering method according to claim 1, characterized in that dividing the plurality of users into categories by means of the MinHash clustering model, storing the hash function of the MinHash clustering model, and storing the correspondence between the hash values of the plurality of users and the categories comprises:
determining and storing the hash function of the MinHash clustering model;
determining the hash values of the plurality of users by means of the hash function of the MinHash clustering model;
dividing users whose hash values are identical into the same category; and
storing the correspondence between the hash values and the categories.
5. The clustering method according to claim 1, characterized in that determining the hash value of the new user by means of the hash function of the MinHash clustering model comprises:
determining the hash values of a group of new users in parallel by means of the hash function of the MinHash clustering model.
6. A clustering apparatus, characterized by comprising:
a dividing unit, configured to divide a plurality of users into categories by means of a MinHash clustering model, store the hash function of the MinHash clustering model, and store the correspondence between the hash values of the plurality of users and the categories;
a determining unit, configured to determine the hash value of a new user by means of the hash function of the MinHash clustering model; and
a clustering unit, configured to determine, according to the correspondence between the hash values and the categories, the category corresponding to the hash value of the new user.
7. The clustering apparatus according to claim 6, characterized in that the hash function of the MinHash clustering model is a random hash function.
8. The clustering apparatus according to claim 6, characterized in that the hash function of the MinHash clustering model comprises:
one group of hash functions or a plurality of groups of hash functions, wherein each group of hash functions consists of a plurality of different hash functions.
9. The clustering apparatus according to claim 6, characterized in that the dividing unit comprises:
a first storage subunit, configured to determine and store the hash function of the MinHash clustering model;
a first determining subunit, configured to determine the hash values of the plurality of users by means of the hash function of the MinHash clustering model;
a dividing subunit, configured to divide users whose hash values are identical into the same category; and
a second storage subunit, configured to store the correspondence between the hash values and the categories.
10. The clustering apparatus according to claim 6, characterized in that the clustering unit is specifically configured to determine the hash values of a group of new users in parallel by means of the hash function of the MinHash clustering model.
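
As an illustration only, the flow recited in the claims (cluster existing users, store the hash functions and the hash-value-to-category correspondence, then reuse both for new users) might be sketched in Python as follows. This is a minimal sketch under stated assumptions and not the patented implementation: the function names, the representation of each user as a non-empty set of integer item IDs, the linear hash family modulo a prime, the use of a single group of hash functions, and the thread pool used for the parallel step are all hypothetical choices that do not come from the patent text.

```python
import random
from concurrent.futures import ThreadPoolExecutor

PRIME = 2_147_483_647  # assumed modulus (the Mersenne prime 2**31 - 1)


def make_hash_functions(count, seed):
    """Create one group of random hash functions h(x) = (a*x + b) mod PRIME.

    The (a, b) pairs are what gets stored so the same functions can be
    reused later for new users.
    """
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(count)]


def minhash_key(items, hash_functions):
    """Return the user's hash value: the minimum of each hash function
    over the user's (non-empty) item set, concatenated into a tuple."""
    return tuple(min((a * x + b) % PRIME for x in items) for a, b in hash_functions)


def build_model(users, hash_functions):
    """Put users with identical hash values into the same category and
    return the stored correspondence {hash value: category id}."""
    key_to_category = {}
    for items in users.values():
        key = minhash_key(items, hash_functions)
        # A hash value not seen before opens a new category.
        key_to_category.setdefault(key, len(key_to_category))
    return key_to_category


def assign_new_users(new_users, hash_functions, key_to_category):
    """Hash a group of new users concurrently and look up their categories.

    Returns None for a user whose hash value matches no stored category.
    """
    with ThreadPoolExecutor() as pool:
        keys = pool.map(lambda items: minhash_key(items, hash_functions),
                        new_users.values())
        return {uid: key_to_category.get(key) for uid, key in zip(new_users, keys)}


# Hypothetical usage: cluster three existing users, then place a new one
# without rebuilding the model.
hash_functions = make_hash_functions(count=3, seed=42)
model = build_model({"u1": {1, 2, 3}, "u2": {1, 2, 3}, "u3": {7, 8}}, hash_functions)
assigned = assign_new_users({"u4": {3, 2, 1}}, hash_functions, model)
```

In this sketch the new user `u4` has the same item set as `u1` and `u2`, so it receives their category from a dictionary lookup alone. The thread pool only illustrates the structure of the parallel step; in CPython it would not speed up this CPU-bound hashing, and a process pool or a distributed job would be the more likely choice in practice.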
CN201110041200.8A 2011-02-18 2011-02-18 Clustering method and device Active CN102646097B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110041200.8A CN102646097B (en) 2011-02-18 2011-02-18 Clustering method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110041200.8A CN102646097B (en) 2011-02-18 2011-02-18 Clustering method and device

Publications (2)

Publication Number Publication Date
CN102646097A true CN102646097A (en) 2012-08-22
CN102646097B CN102646097B (en) 2019-04-26

Family

ID=46658920

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110041200.8A Active CN102646097B (en) 2011-02-18 2011-02-18 Clustering method and device

Country Status (1)

Country Link
CN (1) CN102646097B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103106283A (en) * 2013-02-28 2013-05-15 北京奇虎科技有限公司 Duplicate removal treatment method and device
CN104424254A (en) * 2013-08-28 2015-03-18 阿里巴巴集团控股有限公司 Method and device for obtaining similar object set and providing similar object set
CN104715021A (en) * 2015-02-27 2015-06-17 南京邮电大学 Multi-label learning design method based on hashing method
CN104778234A (en) * 2015-03-31 2015-07-15 南京邮电大学 Multi-label file nearest neighbor search method based on LSH (Locality Sensitive Hashing) technology
CN105100164A (en) * 2014-05-20 2015-11-25 深圳市腾讯计算机系统有限公司 Network service recommendation method and device
CN106470435A (en) * 2015-08-18 2017-03-01 腾讯科技(深圳)有限公司 The method and system of identification WiFi group
WO2017067364A1 (en) * 2015-10-21 2017-04-27 北京瀚思安信科技有限公司 Method and equipment for determining common subsequence of text strings
CN107004221A (en) * 2014-11-28 2017-08-01 Bc卡有限公司 For predict using industry card use pattern analysis method and perform its server
US9754035B2 (en) 2014-02-07 2017-09-05 Excalibur LP, LCC Recursive unique user metrics in real time
CN110210883A (en) * 2018-05-09 2019-09-06 腾讯科技(深圳)有限公司 The recognition methods of team control account, device, server and storage medium
CN110245687A (en) * 2019-05-17 2019-09-17 腾讯科技(上海)有限公司 User classification method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101051322A (en) * 2007-05-18 2007-10-10 北京中星微电子有限公司 File classifying method and file classifier
CN101359992A (en) * 2007-07-31 2009-02-04 华为技术有限公司 Content category request method, determination method, interaction method and apparatus thereof
CN101562612A (en) * 2009-05-26 2009-10-21 中兴通讯股份有限公司 Method and device for constructing matching rule list and recognizing message type
US20100169258A1 (en) * 2008-12-31 2010-07-01 Microsoft Corporation Scalable Parallel User Clustering in Discrete Time Window

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103106283B (en) * 2013-02-28 2016-04-27 北京奇虎科技有限公司 Duplicate removal treatment method and device
CN103106283A (en) * 2013-02-28 2013-05-15 北京奇虎科技有限公司 Duplicate removal treatment method and device
CN104424254A (en) * 2013-08-28 2015-03-18 阿里巴巴集团控股有限公司 Method and device for obtaining similar object set and providing similar object set
CN104424254B (en) * 2013-08-28 2018-05-22 阿里巴巴集团控股有限公司 Obtain analogical object set, the method and device that analogical object information is provided
US9754035B2 (en) 2014-02-07 2017-09-05 Excalibur LP, LCC Recursive unique user metrics in real time
US10331734B2 (en) 2014-05-20 2019-06-25 Tencent Technology (Shenzhen) Company Limited Method and apparatus for recommending network service
CN105100164B (en) * 2014-05-20 2018-06-15 深圳市腾讯计算机系统有限公司 Network service recommends method and apparatus
CN105100164A (en) * 2014-05-20 2015-11-25 深圳市腾讯计算机系统有限公司 Network service recommendation method and device
CN107004221A (en) * 2014-11-28 2017-08-01 Bc卡有限公司 For predict using industry card use pattern analysis method and perform its server
CN104715021A (en) * 2015-02-27 2015-06-17 南京邮电大学 Multi-label learning design method based on hashing method
CN104715021B (en) * 2015-02-27 2018-09-11 南京邮电大学 A kind of learning method of the Multi-label learning based on hash method
CN104778234A (en) * 2015-03-31 2015-07-15 南京邮电大学 Multi-label file nearest neighbor search method based on LSH (Locality Sensitive Hashing) technology
CN106470435A (en) * 2015-08-18 2017-03-01 腾讯科技(深圳)有限公司 The method and system of identification WiFi group
CN106470435B (en) * 2015-08-18 2019-11-29 腾讯科技(深圳)有限公司 The method and system of identification WiFi groups
CN106610965A (en) * 2015-10-21 2017-05-03 北京瀚思安信科技有限公司 Text string common sub sequence determining method and equipment
WO2017067364A1 (en) * 2015-10-21 2017-04-27 北京瀚思安信科技有限公司 Method and equipment for determining common subsequence of text strings
CN110210883A (en) * 2018-05-09 2019-09-06 腾讯科技(深圳)有限公司 The recognition methods of team control account, device, server and storage medium
CN110210883B (en) * 2018-05-09 2023-08-22 腾讯科技(深圳)有限公司 Group control account identification method, device, server and storage medium
CN110245687A (en) * 2019-05-17 2019-09-17 腾讯科技(上海)有限公司 User classification method and device
CN110245687B (en) * 2019-05-17 2021-06-04 腾讯科技(上海)有限公司 User classification method and device

Also Published As

Publication number Publication date
CN102646097B (en) 2019-04-26

Similar Documents

Publication Publication Date Title
CN102646097A (en) Clustering method and device
CN103995839A (en) Commodity recommendation optimizing method and system based on collaborative filtering
CN109561052B (en) Method and device for detecting abnormal flow of website
CN103942712A (en) Product similarity based e-commerce recommendation system and method thereof
CN104123315A (en) Multi-media file recommendation method and recommendation server
CN105005582A (en) Recommendation method and device for multimedia information
CN103136683A (en) Method and device for calculating product reference price and method and system for searching products
CN103106285A (en) Recommendation algorithm based on information security professional social network platform
WO2008130753A3 (en) Methods and apparatus to facilitate sales estimates
CN105095211A (en) Acquisition method and device for multimedia data
CN102591873B (en) A kind of information recommendation method and equipment
CN105389590A (en) Video clustering recommendation method and apparatus
CN105183731A (en) Method, device, and system for generating recommended information
CN109697454B (en) Cross-device individual identification method and device based on privacy protection
CN104077415A (en) Searching method and device
CN106846082B (en) Travel cold start user product recommendation system and method based on hardware information
CN105574045A (en) Video recommendation method and server
CN103714063A (en) Data analysis method and data analysis system
CN103500228A (en) Similarity measuring method improved through collaborative filtering recommendation algorithm
CN105488107A (en) Offline evaluation method for recommendation system
CN104346428A (en) Information processing apparatus, information processing method, and program
CN105022807A (en) Information recommendation method and apparatus
CN104166732A (en) Project collaboration filtering recommendation method based on global scoring information
CN104408640A (en) Application software recommending method and apparatus
CN104199836A (en) Annotation user model construction method based on child interest division

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20120822

Assignee: Ocean interactive (Beijing) Information Technology Co., Ltd.

Assignor: Tencent Technology (Shenzhen) Co., Ltd.

Contract record no.: 2016990000422

Denomination of invention: Clustering method and device

License type: Common License

Record date: 20161009

LICC Enforcement, change and cancellation of record of contracts on the licence for exploitation of a patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20190731

Address after: 518028 Room 403, 2 East Building, Xingxing Road Saige Science Park, Futian District, Shenzhen City, Guangdong Province

Co-patentee after: Tencent cloud computing (Beijing) limited liability company

Patentee after: Tencent Technology (Shenzhen) Co., Ltd.

Address before: 2 East 403 room, SEG science and technology garden, Futian District, Guangdong, Shenzhen 518028, China

Patentee before: Tencent Technology (Shenzhen) Co., Ltd.