CN104731867A - Object clustering method and device - Google Patents


Info

Publication number
CN104731867A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510090184.XA
Other languages
Chinese (zh)
Other versions
CN104731867B (en)
Inventor
周泽伟
程涛远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201510090184.XA
Publication of CN104731867A
Application granted
Publication of CN104731867B

Abstract

The invention relates to a method for clustering objects in a computer device. The method comprises: obtaining transfer case information of multiple objects, the transfer case information indicating how users transfer among the multiple objects, as determined from their behavior in obtaining object information; and clustering the objects according to the transfer case information to obtain a clustering result of the objects. Because the objects are clustered by analyzing how users transfer among them, the resulting object classification is more objective and accurate.

Description

A method and apparatus for clustering objects
Technical field
The present invention relates to the field of computer technology, and in particular to a method and apparatus for clustering objects.
Background art
In the prior art, objects are usually classified by performing natural language analysis on their descriptive text. In particular, when the objects are commercial in nature, for example when the objects are brands, the brands may be classified not only by natural language analysis of the brand names but also in combination with data from the object's perspective, such as the industry and region to which a brand belongs, the brand's sales, and market demand.
Summary of the invention
An object of the present invention is to provide a method and apparatus for clustering objects.
According to one aspect of the present invention, a method for clustering objects in a computer device is provided, wherein the method comprises:
Obtaining transfer case information of multiple objects, the transfer case information indicating how users transfer among the multiple objects, as determined from object information acquisition behavior;
Clustering the multiple objects according to the transfer case information to obtain a clustering result of the multiple objects.
According to another aspect of the present invention, an apparatus for clustering objects in a computer device is also provided, wherein the apparatus comprises:
A device for obtaining transfer case information of multiple objects, the transfer case information indicating how users transfer among the multiple objects, as determined from object information acquisition behavior;
A device for clustering the multiple objects according to the transfer case information to obtain a clustering result of the multiple objects.
Compared with the prior art, the present invention has the following advantages: 1) the solution of the present invention overcomes a prejudice in the art by clustering objects through analysis of the transfer case information of users among the objects; 2) compared with data from the object's perspective, clustering objects by analyzing how users transfer among multiple objects is closer to the user's perspective and reflects users' understanding of the objects more directly, so the object classification determined by the solution of the present invention is more objective and accurate; 3) even among data from the user's perspective, the transfer case information of the present invention is not commonly used data; in fact, when data from the user's perspective is explicitly mentioned, those skilled in the art would more readily think of direct user evaluations (such as ratings and written reviews).
Brief description of the drawings
Other features, objects and advantages of the present invention will become more apparent upon reading the following detailed description of non-limiting embodiments, made with reference to the accompanying drawings:
Fig. 1 is a schematic flowchart of a method for clustering objects according to a preferred embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a clustering apparatus for clustering objects according to a preferred embodiment of the present invention;
Fig. 3 is a schematic diagram of the transfer paths of users among multiple objects according to a preferred embodiment;
Fig. 4 is a schematic diagram of the transfer paths of users among multiple keywords according to a preferred embodiment;
Fig. 5 shows a specific example of converting network-structured transfer paths of keywords into network-structured transfer paths of objects;
Fig. 6 is a schematic diagram, according to a preferred embodiment, of transfers from one object to multiple objects;
Fig. 7 shows a specific example of Fig. 6.
In the drawings, the same or similar reference numerals denote the same or similar components.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings.
Fig. 1 is a schematic flowchart of a method for clustering objects according to a preferred embodiment of the present invention. The method of this embodiment is mainly implemented by a computer device, which includes network devices and user devices. The network devices include, but are not limited to, a single network server, a server group composed of multiple network servers, or a cloud based on cloud computing and composed of a large number of computers or network servers, where cloud computing is a form of distributed computing: a super virtual computer composed of a group of loosely coupled computers. The networks in which the network devices reside include, but are not limited to, the Internet, wide area networks, metropolitan area networks, local area networks, and VPNs. The user devices include, but are not limited to, PCs, tablet computers, smartphones, PDAs, and IPTV.
It should be noted that the computer devices and networks above are only examples; other existing or future computing devices or networks, if applicable to the present invention, also fall within the scope of protection of the present invention and are incorporated herein by reference.
The method according to this embodiment comprises step S1 and step S2.
In step S1, the computer device obtains the transfer case information of multiple objects.
The objects may include any objects that can be clustered. Preferably, the objects are commercial in nature. More preferably, the objects include brands.
The transfer case information indicates how users transfer among the multiple objects, as determined from object information acquisition behavior. The object information acquisition behavior includes any behavior that can be used to obtain information about an object; for example, it includes obtaining object information by searching for keywords related to the object; as another example, it includes obtaining object information by clicking on and browsing content related to the object. "Based on object information acquisition behavior" means that the transfer case reflects the transfers users make during object information acquisition behavior; preferably, the transfer case is determined from the object information acquisition behavior. For example, the transfer case information of users among the multiple objects is determined by collecting statistics on how the searched objects change in the search behavior of multiple users, or on how the search keywords associated with objects change in the search behavior of multiple users.
Preferably, the transfer case information of the multiple objects includes, but is not limited to, at least one of the following:
1) Transfer path information of users among the multiple objects.
The transfer path information indicates the transfer paths of users among the multiple objects. For example, given three objects Object1, Object2 and Object3, the transfer path information indicates that the transfer paths of multiple users among these three objects include a transfer from Object1 to Object2 and a transfer from Object1 to Object3.
2) Transfer number information of users between objects.
The transfer number information indicates the number of transfers users make between objects. For example, given three objects Object1, Object2 and Object3, the transfer number information indicates that the transfers of multiple users among these three objects include five transfers from Object1 to Object2 and eight transfers from Object1 to Object3.
3) Transition probability information of users between objects.
The transition probability information indicates the probability of users transferring between objects. For example, given three objects Object1, Object2 and Object3, the transition probability information indicates that the probability of transferring from Object1 to Object2 is 38.46% and the probability of transferring from Object1 to Object3 is 61.54%.
It should be noted that there may be no transfer path between some of the multiple objects (that is, no user transferred between those objects during object information acquisition behavior), in which case the transfer number and transition probability between those objects are zero. In addition, a transfer from an object to itself may occur; for example, a user may search for information about the same object several times in succession using different search keywords, which produces a transfer from the object to itself.
Preferably, the transfer case information can be stored in a variety of ways.
For example, the transfer case information is stored as a table that records the transfer paths of users among the multiple objects, together with the transfer number and transition probability between objects, as shown in Table 1 below.
Transfer path | Transfer number | Transition probability
Object1→Object2 | 5 | 38.46%
Object1→Object3 | 8 | 61.54%
Table 1
As another example, the transfer case information comprises transfer paths stored as a network structure, together with the transfer number and/or transition probability between nodes of the network (that is, between objects). For nine objects Object1 to Object9, for instance, the transfer case information comprises the transfer paths shown in Fig. 3, together with the transfer number and/or transition probability between each pair of nodes connected by an arrow in Fig. 3 (such as from Object1 to Object2, from Object1 to Object3, and from Object1 to Object4).
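As a minimal sketch of the two storage forms described above, the table rows can be kept as tuples and the network structure as an adjacency mapping. The names `rows`, `graph` and `transfer_number` are illustrative assumptions and do not appear in the original text; the data is taken from Table 1.

```python
from collections import defaultdict

# Table form: one row per transfer path, as in Table 1
# (source object, target object, transfer number).
rows = [
    ("Object1", "Object2", 5),
    ("Object1", "Object3", 8),
]

# Network form: each node maps to its outgoing arrows and their transfer numbers.
graph = defaultdict(dict)
for src, dst, count in rows:
    graph[src][dst] = count


def transfer_number(graph, src, dst):
    """Return the transfer number from src to dst; a missing path counts as zero."""
    return graph.get(src, {}).get(dst, 0)


print(transfer_number(graph, "Object1", "Object3"))  # 8
print(transfer_number(graph, "Object2", "Object1"))  # 0, no transfer path exists
```

Treating a missing arrow as zero matches the note that objects without a transfer path have a transfer number of zero.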
It should be noted that the above examples are only intended to better illustrate the technical solution of the present invention and do not limit it; those skilled in the art should understand that any transfer case information that indicates how users transfer among the multiple objects, as determined from object information acquisition behavior, falls within the scope of the present invention.
Specifically, the ways in which the computer device obtains the transfer case information of multiple objects include, but are not limited to:
1) The computer device directly obtains predetermined transfer case information of the multiple objects.
For example, the computer device reads the predetermined transfer case information of the multiple objects from local storage or from another device.
2) Step S1 further comprises step S11 and step S12.
In step S11, the computer device obtains the transfer case information of multiple keywords.
The transfer case information of the multiple keywords indicates how users transfer among the multiple keywords, as determined from object information acquisition behavior. Preferably, the multiple keywords are associated with objects in the object information acquisition behavior; for example, if the object information acquisition behavior is object search behavior, a keyword may be a search keyword that a user enters or selects during that search behavior.
Preferably, the transfer case information of the multiple keywords comprises at least one of the following:
A) Transfer path information of users among the multiple keywords.
The transfer path information of users among the multiple keywords indicates the transfer paths of users among the multiple keywords. For example, given three keywords Query1, Query2 and Query3, the transfer path information indicates that the transfer paths of multiple users among these three keywords include a transfer from Query1 to Query2 and a transfer from Query1 to Query3.
B) Transfer number information of users between keywords.
The transfer number information of users between keywords indicates the number of transfers users make between keywords. For example, given three keywords Query1, Query2 and Query3, the transfer number information indicates that the transfers of multiple users among these three keywords include five transfers from Query1 to Query2 and eight transfers from Query1 to Query3.
It should be noted that there may be no transfer path between some of the multiple keywords (that is, no user transferred between those keywords during object information acquisition behavior), in which case the transfer number between those keywords is zero.
Preferably, the transfer case information of the multiple keywords can be stored in a variety of ways.
For example, the transfer case information is stored as a table that records the transfer paths of users among the multiple keywords and the transfer number of users between keywords, as shown in Table 2 below.
Transfer path | Transfer number
Query1→Query2 | 5
Query1→Query3 | 8
Table 2
As another example, the transfer case information comprises transfer paths stored as a network structure, together with the transfer number between nodes of the network (that is, between keywords). For nine keywords Query1 to Query9, for instance, the transfer case information of these nine keywords comprises the transfer paths shown in Fig. 4, together with the transfer number between each pair of nodes connected by an arrow in Fig. 4.
It should be noted that the above examples are only intended to better illustrate the technical solution of the present invention and do not limit it; those skilled in the art should understand that any transfer case information that indicates how users transfer among multiple keywords, as determined from object information acquisition behavior, falls within the scope of the present invention.
Specifically, the ways in which the computer device obtains the transfer case information of multiple keywords include, but are not limited to:
A) The computer device directly obtains predetermined transfer case information of the multiple keywords.
For example, the computer device reads the predetermined transfer case information of the multiple keywords from local storage or from another device.
B) The computer device obtains the keyword attention records of at least one user and determines the transfer case information of the multiple keywords according to the keyword attention records.
A keyword attention record comprises the keywords that the at least one user paid attention to during object information acquisition behavior and the time at which each keyword received attention. Preferably, the object information acquisition behavior includes search behavior, and the keywords paid attention to include the searched keywords; preferably, the object information acquisition behavior includes browsing behavior, and the keywords paid attention to include keywords clicked in order to browse object content.
Preferably, for the keyword attention record of each user, the computer device determines that user's transfer paths among keywords and the transfer number between keywords according to the times at which the keywords in the record received attention; the computer device then merges the transfer paths and transfer numbers of all users to determine the transfer case information of the multiple keywords.
For example, the computer device obtains the keyword attention records of user A and user B, which are shown in Table 3 and Table 4 below respectively:
Keyword paid attention to | Time of attention
Query1 | 2014-12-13-10:40
Query3 | 2014-12-13-10:36
Table 3
Keyword paid attention to | Time of attention
Query1 | 2014-11-10-00:14
Query2 | 2014-11-10-00:23
Table 4
Then, for the keyword attention record of user A, the computer device determines that the transfer paths of user A among keywords include "Query1 → Query3", with a transfer number of 1; similarly, the computer device determines that the transfer paths of user B among keywords include "Query1 → Query2", with a transfer number of 1. The computer device then merges the transfer paths and transfer numbers of users A and B and determines the transfer case information of the multiple keywords as shown in Table 5 below.
Transfer path | Transfer number
Query1→Query2 | 1
Query1→Query3 | 1
Table 5
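The per-user counting and the merge across users can be sketched as follows. This is only an illustration under stated assumptions: each pair of keywords that are consecutive in time order is counted as one transfer (the text does not fix this rule), the function names are hypothetical, and `user_c` is an invented user whose records arrive out of order. The records of `user_b` follow Table 4.

```python
from collections import Counter
from datetime import datetime

FMT = "%Y-%m-%d-%H:%M"  # timestamp layout used in Tables 3 and 4


def user_transitions(records):
    """Count transfers between keywords that are consecutive in time for one user."""
    ordered = sorted(records, key=lambda rec: datetime.strptime(rec[1], FMT))
    counts = Counter()
    for (prev_kw, _), (next_kw, _) in zip(ordered, ordered[1:]):
        counts[(prev_kw, next_kw)] += 1
    return counts


def merge_transitions(per_user_counts):
    """Add up the transfer numbers of all users, path by path."""
    total = Counter()
    for counts in per_user_counts:
        total.update(counts)
    return total


user_b = [("Query1", "2014-11-10-00:14"), ("Query2", "2014-11-10-00:23")]
# Hypothetical user: records are stored out of order and sorted before counting.
user_c = [("Query3", "2014-12-01-09:05"), ("Query1", "2014-12-01-09:00")]

merged = merge_transitions([user_transitions(user_b), user_transitions(user_c)])
for (src, dst), n in sorted(merged.items()):
    print(f"{src}→{dst}: {n}")
```

Sorting by the attention time before pairing keeps the result independent of the order in which records were logged.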
It should be noted that the above examples are only intended to better illustrate the technical solution of the present invention and do not limit it; those skilled in the art should understand that any way of obtaining the transfer case information of multiple keywords falls within the scope of the present invention.
In step S12, the computer device determines the transfer case information of the multiple objects according to the transfer case information of the multiple keywords and the objects with which the multiple keywords are respectively associated.
Specifically, the computer device can determine the transfer path information of users among the multiple objects according to the transfer path information of users among the multiple keywords and the objects with which the keywords are respectively associated, and can determine the transfer number and/or transition probability information of users between objects according to the transfer number information of users between keywords and the objects with which the keywords are respectively associated.
For example, the transfer case information of the multiple keywords is as shown in Table 2 above, and Query1, Query2 and Query3 are associated with Object1, Object2 and Object3 respectively. According to the keyword transfer case information in Table 2 and this association, the computer device determines that the transfer paths of users among Object1, Object2 and Object3 include "Object1 → Object2" and "Object1 → Object3", with transfer numbers of 5 and 8 respectively. From these transfer numbers, the computer device then calculates the transition probability of "Object1 → Object2" as 5/(5+8) = 38.46% and the transition probability of "Object1 → Object3" as 8/(5+8) = 61.54%; that is, the computer device obtains the transfer case information shown in Table 1.
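The conversion from keyword transfers to object transfers can be sketched as below. The function name `object_transfer_numbers` and the choice to skip keywords with no associated object are assumptions made for illustration; the data follows Table 2, and `Query4` is a hypothetical second keyword for Object1 added to show how a self-transfer arises.

```python
from collections import Counter


def object_transfer_numbers(keyword_counts, keyword_to_object):
    """Aggregate keyword-level transfer numbers into object-level transfer numbers."""
    counts = Counter()
    for (src_kw, dst_kw), n in keyword_counts.items():
        src = keyword_to_object.get(src_kw)
        dst = keyword_to_object.get(dst_kw)
        if src is None or dst is None:
            continue  # assumption: keywords without an associated object are ignored
        counts[(src, dst)] += n
    return counts


# Table 2 and the association stated in the text.
keyword_counts = {("Query1", "Query2"): 5, ("Query1", "Query3"): 8}
keyword_to_object = {"Query1": "Object1", "Query2": "Object2", "Query3": "Object3"}
object_counts = object_transfer_numbers(keyword_counts, keyword_to_object)
print(dict(object_counts))

# Hypothetical extension: two keywords of the same object yield a self-transfer.
with_self = object_transfer_numbers(
    {**keyword_counts, ("Query1", "Query4"): 2},
    {**keyword_to_object, "Query4": "Object1"},
)
print(with_self[("Object1", "Object1")])  # 2
```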
It should be noted that, because a user may use different keywords associated with the same object across several consecutive object information acquisition behaviors (for example, different search keywords corresponding to the same object in repeated searches), there may be a transition probability from an object to itself, such as $p_{00}$ shown in Fig. 6. A specific example of Fig. 6 is given in Fig. 7. As shown in Fig. 7, the probability of transferring from "Gymboree" to "Gymboree" itself can be as high as 71.86%.
It should be noted that, preferably, the computer device can calculate the transition probability $p_{ij}$ from object i to object j based on the following formula:
$$p_{ij} = \frac{a_{ij}}{\sum_{j} a_{ij}}$$
where $a_{ij}$ denotes the transfer number from object i to object j, and $\sum_{j} a_{ij}$ denotes the transfer number from object i to all objects.
For example, as shown in Fig. 6, object Object0 transfers to itself and to the other objects Object1 to Object13. Taking the transfer from Object0 to Object8 as an example, the transition probability from Object0 to Object8 is $p_{08} = a_{08} / \sum_{j=0}^{13} a_{0j}$, where $\sum_{j=0}^{13} a_{0j}$ denotes all transfers from Object0 to Object0 itself and to Object1 through Object13.
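A short sketch of this normalisation, assuming the transfer numbers are held in a dictionary keyed by (source, target) pairs; the function name `transition_probabilities` is illustrative. Each transfer number is divided by the total number of transfers leaving the same source object, self-transfers included. The counts are those of Table 1.

```python
def transition_probabilities(counts):
    """Divide each transfer number a_ij by the row total sum_j a_ij."""
    row_totals = {}
    for (src, _dst), n in counts.items():
        row_totals[src] = row_totals.get(src, 0) + n
    return {
        (src, dst): n / row_totals[src]
        for (src, dst), n in counts.items()
        if row_totals[src] > 0  # objects with no outgoing transfers are left out
    }


counts = {("Object1", "Object2"): 5, ("Object1", "Object3"): 8}
probs = transition_probabilities(counts)
print(f"{probs[('Object1', 'Object2')]:.2%}")  # 38.46%
print(f"{probs[('Object1', 'Object3')]:.2%}")  # 61.54%
```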
It should be noted that the objects with which the multiple keywords are respectively associated can be determined in advance. Fig. 5 shows an example of converting network-structured transfer paths of keywords into network-structured transfer paths of objects. In Fig. 5, each node in the upper network is a keyword, and each node in the lower network is the object corresponding to the keyword at the matching node above.
It should be noted that the above examples are only intended to better illustrate the technical solution of the present invention and do not limit it; those skilled in the art should understand that any way of obtaining the transfer case information of multiple objects falls within the scope of the present invention.
In step S2, the computer device clusters the multiple objects according to the transfer case information obtained in step S1, and obtains the clustering result of the multiple objects.
The clustering result of the multiple objects can take various forms. For example, the clustering result comprises multiple sets, and the objects in each set belong to one class. As another example, the clustering result comprises object IDs and the category ID corresponding to each object ID, so that the class of each object is determined by the category ID corresponding to its object ID.
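The two forms of clustering result carry the same information and can be converted into each other. The following sketch assumes integer category IDs assigned in list order; the function names are illustrative only.

```python
def sets_to_labels(clusters):
    """Turn a list of sets into a mapping from object ID to category ID."""
    return {
        obj: category_id
        for category_id, members in enumerate(clusters)
        for obj in members
    }


def labels_to_sets(labels):
    """Turn a mapping from object ID to category ID back into a list of sets."""
    groups = {}
    for obj, category_id in labels.items():
        groups.setdefault(category_id, set()).add(obj)
    return [groups[category_id] for category_id in sorted(groups)]


labels = sets_to_labels([{"Object1", "Object3"}, {"Object2"}])
print(labels["Object1"] == labels["Object3"])  # True, same class
print(labels_to_sets(labels))
```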
Specifically, the ways in which the computer device clusters the multiple objects according to their transfer case information and obtains the clustering result include, but are not limited to:
1) The computer device clusters the multiple objects directly according to their transfer case information and obtains the clustering result of the multiple objects. The higher the transition probability or transfer number between two objects, the more likely the two objects are to be clustered into one class.
For example, the transfer case information obtained by the computer device in step S1 is as shown in Table 1 above. Because the transition probability of 38.46% between Object1 and Object2 does not exceed the predetermined threshold of 60%, the computer device determines that Object1 and Object2 are not clustered into one class; because the transition probability of 61.54% between Object1 and Object3 exceeds the predetermined threshold of 60%, the computer device determines that Object1 and Object3 are clustered into one class. The computer device thus obtains a clustering result in the form of two sets, [Object1, Object3] and [Object2], which indicate that Object1 and Object3 belong to the same class and that Object2 forms a class of its own.
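The threshold example can be sketched as follows. The text only states the pairwise rule; grouping the qualifying pairs transitively with a union-find structure is an assumption of this sketch, as is the function name `cluster_by_threshold`. The probabilities and the 60% threshold are those of the example.

```python
def cluster_by_threshold(objects, probs, threshold):
    """Group objects whose transition probability exceeds the threshold."""
    parent = {obj: obj for obj in objects}

    def find(x):
        # Follow parent links to the representative, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (src, dst), p in probs.items():
        if p > threshold:
            parent[find(src)] = find(dst)  # merge the two classes

    clusters = {}
    for obj in objects:
        clusters.setdefault(find(obj), []).append(obj)
    return sorted(clusters.values())


probs = {("Object1", "Object2"): 0.3846, ("Object1", "Object3"): 0.6154}
result = cluster_by_threshold(["Object1", "Object2", "Object3"], probs, 0.60)
print(result)  # [['Object1', 'Object3'], ['Object2']]
```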
It should be noted that, when the multiple objects include objects that have already been clustered into one class (for example, objects already grouped into one class manually or by an operation of the computer device), it may be necessary to judge whether multiple objects already clustered into one class can be clustered into one class with one or more other objects. In that case: the higher the transition probability or transfer number between an object and one or more of the multiple objects already clustered into one class, the more likely that object is to be clustered into one class with them; and the higher the transition probability or transfer number between one or more objects in one group of already clustered objects and one or more objects in another group of already clustered objects, the more likely the two groups are to be clustered into one class.
2) The computer device clusters the multiple objects based on transfer distances between objects, obtained from the transfer case information, and obtains the clustering result of the multiple objects.
Specifically, the computer device may first obtain the transfer distances between all objects and then cluster the multiple objects according to those distances to obtain the clustering result. Alternatively, the computer device may perform multiple clustering operations to obtain the clustering result, for example by selecting some of the objects in each clustering operation, determining the transfer distances needed between those objects, and clustering them.
Preferably, the transfer distance between objects includes, but is not limited to, at least one of the following:
A) The transfer distance between one object of the multiple objects and another object of the multiple objects. The smaller this transfer distance, the more likely the two objects are to be clustered into one class.
The transfer distance between two objects can be determined from the transfer number information and/or the transition probability information in the transfer case information.
For example, the transfer distance between two objects can be determined by the following formula:
$d_{ij} = r\,(p_{ij} + p_{ji})/2$
where $d_{ij}$ denotes the transfer distance between object i and object j, $p_{ij}$ denotes the transition probability from object i to object j, $p_{ji}$ denotes the transition probability from object j to object i, and r is a parameter that can be set manually.
It should be noted that the above formula can be adjusted as required, for example by replacing the term $(p_{ij} + p_{ji})/2$ in the formula with another expression.
B) The transfer distance between one object of the multiple objects and a group of several objects among the multiple objects. The smaller this transfer distance, the more likely the object and the group are to be clustered into one class. Preferably, the objects in the group have usually already been clustered into one class.
The transfer distance between an object and a group of objects can be determined from the transfer distances between that object and one or more objects in the group, or from the transfer numbers/transition probabilities between that object and one or more objects in the group.
For example, given nine objects Object1 to Object9, the transfer distance between the object Object1 and the three objects Object4, Object7 and Object8, which have been clustered into one class, can be determined in any one of the following ways:
First: take the smallest of the transfer distances between Object1 and Object4, between Object1 and Object7, and between Object1 and Object8 as the transfer distance between Object1 and the group Object4, Object7 and Object8.
Second: take the largest of the transfer distances between Object1 and Object4, between Object1 and Object7, and between Object1 and Object8 as the transfer distance between Object1 and the group Object4, Object7 and Object8.
Third: perform a calculation, such as averaging, on the three transfer distances between Object1 and Object4, between Object1 and Object7, and between Object1 and Object8, and take the result as the transfer distance between Object1 and the group Object4, Object7 and Object8.
Fourth: determine the largest of the transfer numbers/transition probabilities between Object1 and Object4, between Object1 and Object7, and between Object1 and Object8, and derive from it the transfer distance between Object1 and the group Object4, Object7 and Object8.
Fifth: determine the smallest of the transfer numbers/transition probabilities between Object1 and Object4, between Object1 and Object7, and between Object1 and Object8, and derive from it the transfer distance between Object1 and the group Object4, Object7 and Object8.
Sixth: perform a calculation, such as averaging, on the transfer numbers/transition probabilities between Object1 and Object4, between Object1 and Object7, and between Object1 and Object8, and derive from the result the transfer distance between Object1 and the group Object4, Object7 and Object8.
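The six options above can be sketched with two small helpers: one that combines pairwise transfer distances (options one to three) and one that first combines transition probabilities and then converts the result into a distance (options four to six). All numeric values below are hypothetical, and the conversion `1 - p` is a placeholder chosen for this sketch, not the formula of the original text.

```python
from statistics import mean

COMBINE = {"min": min, "max": max, "mean": mean}


def distance_to_group(obj, group, pair_distance, how="min"):
    """Options one to three: combine the pairwise transfer distances."""
    return COMBINE[how]([pair_distance(obj, member) for member in group])


def distance_to_group_from_probability(obj, group, pair_probability,
                                       to_distance, how="max"):
    """Options four to six: combine the probabilities, then convert to a distance."""
    combined = COMBINE[how]([pair_probability(obj, member) for member in group])
    return to_distance(combined)


# Hypothetical pairwise values between Object1 and each member of the group.
DIST = {"Object4": 0.2, "Object7": 0.5, "Object8": 0.8}
PROB = {"Object4": 0.6, "Object7": 0.3, "Object8": 0.1}
group = ["Object4", "Object7", "Object8"]

for how in ("min", "max", "mean"):
    print(how, distance_to_group("Object1", group, lambda o, m: DIST[m], how))

# Placeholder conversion (assumption): a higher probability gives a smaller distance.
print(distance_to_group_from_probability(
    "Object1", group, lambda o, m: PROB[m], lambda p: 1.0 - p, "max"))
```

Taking the minimum, maximum or mean of pairwise distances corresponds to what is commonly called single, complete and average linkage in hierarchical clustering.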
C) The transfer distance between multiple objects in the described multiple objects and other multiple objects in the described multiple objects. The smaller this transfer distance, the more likely the two groups of objects are clustered into one class.
Wherein, the transfer distance between the two groups can be determined according to the transfer distances between one or more objects in the first group and one or more objects in the other group, or according to the transfer numbers/transition probabilities between one or more objects in the first group and one or more objects in the other group.
For example, there are 9 objects, Object1 to Object9, in which Object1 and Object3 have been clustered into one class and Object4 and Object8 have been clustered into another class. The transfer distance between the group Object1, Object3 and the group Object4, Object8 can be determined in any one of the following ways:
First: use the minimum of the transfer distances between Object1 and Object4, between Object1 and Object8, between Object3 and Object4, and between Object3 and Object8 as the transfer distance between the group Object1, Object3 and the group Object4, Object8.
Second: use the maximum of the transfer distances between Object1 and Object4, between Object1 and Object8, between Object3 and Object4, and between Object3 and Object8 as the transfer distance between the group Object1, Object3 and the group Object4, Object8.
Third: perform a calculation, such as averaging, on the four transfer distances between Object1 and Object4, between Object1 and Object8, between Object3 and Object4, and between Object3 and Object8, and use the result as the transfer distance between the group Object1, Object3 and the group Object4, Object8.
Fourth: determine the maximum transfer number/transition probability among those between Object1 and Object4, between Object1 and Object8, between Object3 and Object4, and between Object3 and Object8, derive a transfer distance from this maximum, and use it as the transfer distance between the group Object1, Object3 and the group Object4, Object8.
Fifth: determine the minimum transfer number/transition probability among those between Object1 and Object4, between Object1 and Object8, between Object3 and Object4, and between Object3 and Object8, derive a transfer distance from this minimum, and use it as the transfer distance between the group Object1, Object3 and the group Object4, Object8.
Sixth: perform a calculation, such as averaging, on the transfer numbers/transition probabilities between Object1 and Object4, between Object1 and Object8, between Object3 and Object4, and between Object3 and Object8, derive a transfer distance from the result, and use it as the transfer distance between the group Object1, Object3 and the group Object4, Object8.
It should be noted that the above examples are given only to better illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any transfer distance between objects falls within the scope of the present invention.
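The first three ways of combining pairwise transfer distances between two groups (minimum, maximum, average) can be sketched as follows. This is only an illustrative sketch: the function name `group_distance`, the `mode` parameter and the numeric distances are assumptions made for the example and do not come from the original text.

```python
from itertools import product
from statistics import mean


def group_distance(group_a, group_b, pair_distance, mode="min"):
    """Combine the pairwise transfer distances between two groups of objects."""
    distances = [
        pair_distance[frozenset((a, b))] for a, b in product(group_a, group_b)
    ]
    if mode == "min":   # first way: smallest pairwise distance
        return min(distances)
    if mode == "max":   # second way: largest pairwise distance
        return max(distances)
    if mode == "mean":  # third way: average of the pairwise distances
        return mean(distances)
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical pairwise transfer distances (illustrative values only).
pair_distance = {
    frozenset(("Object1", "Object4")): 0.2,
    frozenset(("Object1", "Object8")): 0.6,
    frozenset(("Object3", "Object4")): 0.4,
    frozenset(("Object3", "Object8")): 0.8,
}
left = ["Object1", "Object3"]
right = ["Object4", "Object8"]
d_min = group_distance(left, right, pair_distance, "min")
d_max = group_distance(left, right, pair_distance, "max")
d_mean = group_distance(left, right, pair_distance, "mean")
```

The same function covers the case of one object against a group by passing a single-element list as one of the groups.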
As one preferred scheme of implementation 2) of step S2, step S2 further comprises step S21, step S22, step S23, step S24 and step S25.
In step S21, the computer equipment selects a Part I object and a Part II object from the described multiple objects.
Wherein, the Part I object can be one or more objects in the described multiple objects, and the Part II object can be one or more objects in the described multiple objects that are different from the Part I object. Preferably, when the Part I object or the Part II object comprises multiple objects, those multiple objects belong to one class.
In step S22, the computer equipment obtains the transfer distance between the Part I object and the Part II object, which is determined based on the transfer case information relevant to the Part I object and the Part II object.
It should be noted that the transfer distance between the Part I object and the Part II object may already exist before step S22; for example, it may have been determined by the computer equipment in a previous step.
Preferably, when the transfer distance between the Part I object and the Part II object already exists, the computer equipment directly reads it, for example from local storage.
When the transfer distance between the Part I object and the Part II object does not exist, the computer equipment determines it according to the transfer distances, determined based on the transfer case information between the Part I object and the Part II object, between one or more objects in the Part I object and one or more objects in the Part II object. The ways of determining the transfer distance between two objects, between one object and multiple objects, and between multiple objects and multiple objects have been described in detail in the foregoing explanation of "transfer distance between objects" and are not repeated here. In addition, if the transfer distance between one or more objects in the Part I object and one or more objects in the Part II object already exists before this step is performed, it is read directly in this step; if it has not yet been obtained, it needs to be determined based on the transfer case information between the Part I object and the Part II object.
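The "read if it exists, otherwise determine" behaviour of step S22 resembles a simple cache lookup. The following is a minimal sketch under that assumption; `get_transfer_distance`, `compute_distance` and the returned value 0.4 are hypothetical names and data introduced only for illustration.

```python
def get_transfer_distance(first_part, second_part, cache, compute):
    """Return the stored distance if present; otherwise compute and store it."""
    # The key is order-independent, so (A, B) and (B, A) share one entry.
    key = frozenset((tuple(first_part), tuple(second_part)))
    if key not in cache:
        cache[key] = compute(first_part, second_part)
    return cache[key]


calls = []


def compute_distance(first_part, second_part):
    # Hypothetical stand-in for a distance derived from transfer case information.
    calls.append((tuple(first_part), tuple(second_part)))
    return 0.4


cache = {}
d1 = get_transfer_distance(["Object1", "Object2"], ["Object3"], cache, compute_distance)
d2 = get_transfer_distance(["Object3"], ["Object1", "Object2"], cache, compute_distance)
```

In this sketch the second call is served from the cache, so the distance is computed only once.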
In step S23, the computer equipment determines, according to the transfer distance between the Part I object and the Part II object, whether the Part I object and the Part II object are clustered into one class.
Wherein, the smaller this transfer distance, the more likely the Part I object and the Part II object are clustered into one class; the larger this transfer distance, the less likely they are clustered into one class.
In step S24, the computer equipment reselects a Part I object and a Part II object, wherein the cluster operation has not yet been performed between the reselected Part I object and Part II object.
In step S25, the computer equipment repeats step S22, step S23 and step S24 until the cluster result of the described multiple objects is obtained. Preferably, the computer equipment can judge in various ways whether the cluster result has been obtained; for example, whether the number of repetitions has exceeded a predetermined repetition threshold, or whether there no longer exist a Part I object and a Part II object between which the cluster operation has not been performed.
An example is given below to better illustrate this preferred scheme:
Suppose there are 5 objects: Object1, Object2, Object3, Object4 and Object5.
In step S21, the computer equipment selects Object1 as the Part I object and Object2 as the Part II object. Then, in step S22, the computer equipment determines the transfer distance between Object1 and Object2 according to the transfer case information between them. Then, in step S23, the computer equipment determines, according to this transfer distance, that Object1 and Object2 are clustered into one class. Then, in step S24, the computer equipment selects Object1 and Object2, which have been clustered into one class, as the Part I object, and selects Object3 as the Part II object.
Then, the computer equipment repeats steps S22 and S23 and determines that the group Object1, Object2 cannot be clustered into one class with Object3; it then repeats step S24, selecting the group Object1, Object2 as the Part I object and Object4 as the Part II object.
Then, the computer equipment repeats steps S22 and S23 and determines that the group Object1, Object2 cannot be clustered into one class with Object4; it then repeats step S24, selecting the group Object1, Object2 as the Part I object and Object5 as the Part II object.
Then, the computer equipment repeats steps S22 and S23 and determines that the group Object1, Object2 cannot be clustered into one class with Object5; it then repeats step S24, selecting Object3 as the Part I object and Object4 as the Part II object.
Then, the computer equipment repeats steps S22 and S23 and determines that Object3 and Object4 are clustered into one class; it then repeats step S24, selecting Object3 and Object4, which have been clustered into one class, as the Part I object and Object5 as the Part II object.
Then, the computer equipment repeats steps S22 and S23 and determines that the group Object3, Object4 and Object5 are clustered into one class; it then repeats step S24, selecting the group Object1, Object2 as the Part I object and the group Object3, Object4, Object5 as the Part II object.
Then, the computer equipment repeats steps S22 and S23 and determines that the group Object1, Object2 cannot be clustered into one class with the group Object3, Object4, Object5. Further, the computer equipment judges that there no longer exist a Part I object and a Part II object between which the cluster operation has not been performed, and stops the cluster operation. The cluster result of Object1, Object2, Object3, Object4 and Object5 is therefore: [Object1, Object2], [Object3, Object4, Object5].
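The walkthrough above can be approximated by a short loop. The sketch below rests on several assumptions that are not in the original text: a fixed distance threshold decides whether two parts merge, the distance between groups is the minimum pairwise distance, and the next pair is simply the first one not yet compared (so the comparison order may differ slightly from the walkthrough). The function name and all distance values are hypothetical.

```python
from itertools import combinations, product


def cluster_by_transfer_distance(objects, pair_distance, threshold):
    """Repeat S22 to S24 until no uncompared pair of parts remains (S25)."""
    clusters = [(obj,) for obj in objects]
    compared = set()
    while True:
        # S21/S24: pick a pair of parts not yet submitted to the cluster operation.
        pending = [
            (a, b)
            for a, b in combinations(clusters, 2)
            if frozenset((a, b)) not in compared
        ]
        if not pending:  # S25 stop condition
            return [list(c) for c in clusters]
        first, second = pending[0]
        compared.add(frozenset((first, second)))
        # S22: group distance, assumed here to be the minimum pairwise distance.
        distance = min(
            pair_distance[frozenset((x, y))] for x, y in product(first, second)
        )
        # S23: merge the two parts when the distance is small enough.
        if distance <= threshold:
            merged = first + second
            clusters = [merged if c == first else c for c in clusters if c != second]


objects = ["Object1", "Object2", "Object3", "Object4", "Object5"]
# Hypothetical distances: 0.9 by default, smaller for a few related pairs.
pair_distance = {frozenset(p): 0.9 for p in combinations(objects, 2)}
pair_distance[frozenset(("Object1", "Object2"))] = 0.2
pair_distance[frozenset(("Object3", "Object4"))] = 0.3
pair_distance[frozenset(("Object4", "Object5"))] = 0.4
pair_distance[frozenset(("Object3", "Object5"))] = 0.6

result = cluster_by_transfer_distance(objects, pair_distance, threshold=0.5)
```

With these illustrative distances the loop ends with the same two classes as the walkthrough.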
In the prior art, objects are usually classified by performing natural language analysis on the descriptive text of the objects. In particular, when the objects relate to commercial use, for example when the objects are brands, classification is influenced by subjective human judgment: in addition to natural language analysis of the object names, objects are also classified in combination with data from the object's perspective, such as the industry and region to which the object belongs, the sales situation of the object, and the market demand. That is, when classifying objects that involve commercial use, those skilled in the art hold the following prejudice: objects should be classified according to commercial data from the object's perspective.
The solution of the present invention overcomes the above prejudice: objects can be clustered by analyzing the transfer case information of users among the objects. Moreover, compared with data from the object's perspective, the scheme of the present invention, which clusters objects by analyzing the transfer case of users among multiple objects, is closer to the user's perspective and reflects the user's understanding of the objects more intuitively; therefore, the object classification determined by the solution of the present invention is more objective and accurate. In addition, even among data from the user's perspective, the transfer case information of the present invention is not commonly used data; in fact, when data from the user's perspective is explicitly mentioned, those skilled in the art would more readily think of the user's direct evaluations (such as ratings, comment text, etc.).
Fig. 2 is a structural schematic diagram of a clustering apparatus for clustering objects according to a preferred embodiment of the present invention. This clustering apparatus can be installed in computer equipment and comprises: a device for obtaining the transfer case information of multiple objects (hereinafter referred to as "acquisition device 1"), and a device for clustering the described multiple objects according to the described transfer case information to obtain the cluster result of the described multiple objects (hereinafter referred to as "sub-clustering apparatus 2").
Acquisition device 1 obtains the transfer case information of multiple objects.
Wherein, the described objects can comprise any objects that can be clustered. Preferably, the described objects have a commercial nature. More preferably, the described objects comprise brands.
Wherein, the described transfer case information is used to indicate the transfer case of users among the described multiple objects based on object information acquisition behavior. The described object information acquisition behavior comprises any behavior that can be used to obtain information about an object; for example, it comprises the behavior of obtaining object information by searching for keywords relevant to the object; as another example, it comprises the behavior of obtaining object information by clicking and browsing content relevant to the object. The phrase "based on object information acquisition behavior" means that the described transfer case reflects the transfers that users produce in object information acquisition behavior; preferably, the described transfer case is determined based on object information acquisition behavior. For example, the transfer case information of users among the objects is determined by collecting statistics on how multiple users change the searched object in search behavior, or on how multiple users change the search keywords associated with objects in search behavior.
Preferably, the transfer case information of the described multiple objects includes but is not limited to at least one of the following:
1) The transfer path information of users among the multiple objects.
Wherein, the described transfer path information indicates the transfer paths of users among the multiple objects. For example, there are three objects Object1, Object2 and Object3, and the transfer path information indicates that the transfer paths of multiple users among these three objects comprise: transferring from Object1 to Object2, and transferring from Object1 to Object3.
2) The transfer number information of users between the objects.
Wherein, the described transfer number information indicates the number of transfers of users between the objects. For example, there are three objects Object1, Object2 and Object3, and the transfer number information indicates that the transfer numbers of multiple users between these three objects comprise: five transfers from Object1 to Object2, and eight transfers from Object1 to Object3.
3) The transition probability information of users between the objects.
Wherein, the described transition probability information indicates the transition probability of users between the objects. For example, there are three objects Object1, Object2 and Object3, and the transition probability information indicates that the transition probabilities of multiple users between these three objects comprise: the probability of transferring from Object1 to Object2 is 38.46%, and the probability of transferring from Object1 to Object3 is 61.54%.
It should be noted that there may be no transfer path between some of the multiple objects (that is, users have not transferred between these objects in object information acquisition behavior), in which case the transfer number and transition probability between these objects are zero. In addition, a transfer from an object to itself may occur; for example, a user may, in several consecutive searches, use different search keywords to search for information about the same object, thereby producing a transfer from an object to itself.
Preferably, this transfer case information can be stored in multiple ways.
For example, this transfer case information is stored as a table that records the transfer paths of users among the multiple objects and the transfer numbers and transition probabilities of users between the objects, as shown in the aforementioned Table 1.
As another example, this transfer case information comprises: transfer paths stored as a network structure, and the transfer numbers and/or transition probabilities between the nodes of this network structure (that is, between the objects). For 9 objects Object1 to Object9, for instance, the transfer case information of these 9 objects comprises the transfer paths shown in Fig. 3, and the transfer numbers and/or transition probabilities between the nodes connected by arrows in Fig. 3 (such as from Object1 to Object2, from Object1 to Object3, from Object1 to Object4, etc.).
It should be noted that the above examples are given only to better illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any transfer case information used to indicate the transfer case of users among the described multiple objects based on object information acquisition behavior falls within the scope of the present invention.
Specifically, the ways in which acquisition device 1 obtains the transfer case information of multiple objects include but are not limited to:
1) Acquisition device 1 directly obtains predetermined transfer case information of the multiple objects.
For example, acquisition device 1 reads the predetermined transfer case information of the multiple objects locally or from other equipment.
2) Acquisition device 1 further comprises a device for obtaining the transfer case information of multiple keywords (hereinafter referred to as the "first sub-acquisition device", not shown in the figures) and a device for determining the transfer case information of multiple objects according to the transfer case information of the multiple keywords and the objects with which the described multiple keywords are respectively associated (hereinafter referred to as the "first determining device", not shown in the figures).
The first sub-acquisition device obtains the transfer case information of multiple keywords.
Wherein, the transfer case information of the described multiple keywords is used to indicate the transfer case of users among the multiple keywords based on object information acquisition behavior. Preferably, the described multiple keywords are associated with the objects in object information acquisition behavior; for example, if the object information acquisition behavior is an object search behavior, a keyword can be a search keyword that the user inputs or selects in this search behavior.
Preferably, the transfer case information of the described multiple keywords comprises at least one of the following:
A) The transfer path information of users among the described multiple keywords.
Wherein, this transfer path information indicates the transfer paths of users among the multiple keywords. For example, there are three keywords Query1, Query2 and Query3, and the transfer path information indicates that the transfer paths of multiple users among these three keywords comprise: transferring from Query1 to Query2, and transferring from Query1 to Query3.
B) The transfer number information of users between the keywords.
Wherein, this transfer number information indicates the number of transfers of users between the keywords. For example, there are three keywords Query1, Query2 and Query3, and the transfer number information indicates that the transfer numbers of multiple users between these three keywords comprise: five transfers from Query1 to Query2, and eight transfers from Query1 to Query3.
It should be noted that there may be no transfer path between some of the multiple keywords (that is, users have not transferred between these keywords in object information acquisition behavior), in which case the transfer number between these keywords is zero.
Preferably, the transfer case information of the multiple keywords can be stored in multiple ways.
For example, this transfer case information is stored as a table that records the transfer paths of users among the multiple keywords and the transfer numbers of users between the keywords, as shown in Table 2 above.
As another example, this transfer case information comprises: transfer paths stored as a network structure, and the transfer numbers between the nodes of this network structure (that is, between the keywords). For 9 keywords Query1 to Query9, for instance, the transfer case information of these 9 keywords comprises the transfer paths shown in Fig. 4, and the transfer numbers between the nodes connected by arrows in Fig. 4.
It should be noted that the above examples are given only to better illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any transfer case information used to indicate the transfer case of users among multiple keywords based on object information acquisition behavior falls within the scope of the present invention.
Specifically, the ways in which the first sub-acquisition device obtains the transfer case information of multiple keywords include but are not limited to:
A) The first sub-acquisition device directly obtains predetermined transfer case information of the multiple keywords.
For example, the first sub-acquisition device reads the predetermined transfer case information of the multiple keywords locally or from other equipment.
B) The first sub-acquisition device obtains the keyword attention record of at least one user and determines the transfer case information of the described multiple keywords according to the described keyword attention record.
Wherein, this keyword attention record comprises the keywords that the at least one user paid attention to in object information acquisition behavior and the time information at which the described keywords were paid attention to. Preferably, the object information acquisition behavior comprises search behavior, and the keywords paid attention to comprise the searched keywords; also preferably, the object information acquisition behavior comprises browsing behavior, and the keywords paid attention to comprise the keywords clicked in order to browse object content.
Preferably, for the keyword attention record of each user, the first sub-acquisition device determines, according to the time information at which the keywords in this record were paid attention to, the transfer paths of this user among the keywords and the transfer numbers between the keywords; the first sub-acquisition device then merges the transfer paths of all users among the keywords and the transfer numbers between the keywords to determine the transfer case information of the described multiple keywords.
For example, the first sub-acquisition device obtains the keyword attention records of user A and user B, which are shown in the aforementioned Table 3 and Table 4 respectively.
For the keyword attention record of user A, the first sub-acquisition device determines that the transfer path of user A among the keywords comprises "Query1 → Query3" and that the transfer number of "Query1 → Query3" is 1; similarly, the first sub-acquisition device determines that the transfer path of user B among the keywords comprises "Query1 → Query2" and that the transfer number of "Query1 → Query2" is 1. Then, the first sub-acquisition device merges the transfer paths of users A and B among the keywords and the transfer numbers between the keywords, and determines that the transfer case information of the described multiple keywords is as shown in the aforementioned Table 5.
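The per-user counting and merging described above can be sketched as follows. The record layout (a list of `(time, keyword)` pairs), the timestamps and the function name `count_transfers` are assumptions made for illustration; the actual contents of Tables 3 to 5 are not reproduced here.

```python
from collections import Counter


def count_transfers(records):
    """Order one user's keywords by time and count consecutive transfers."""
    ordered = [keyword for _, keyword in sorted(records)]
    return Counter(zip(ordered, ordered[1:]))


# Hypothetical keyword attention records: (time, keyword) pairs.
user_a = [(2, "Query3"), (1, "Query1")]
user_b = [(1, "Query1"), (2, "Query2")]

# Merging the per-user results adds up the transfer numbers of identical paths.
merged = count_transfers(user_a) + count_transfers(user_b)
```

Each key of `merged` is a transfer path such as `("Query1", "Query3")`, and each value is the merged transfer number for that path.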
It should be noted that the above examples are given only to better illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any way of obtaining the transfer case information of multiple keywords falls within the scope of the present invention.
The first determining device determines the transfer case information of multiple objects according to the transfer case information of the multiple keywords and the objects with which the described multiple keywords are respectively associated.
Specifically, the first determining device can determine the transfer path information of users among the multiple objects according to the transfer path information of users among the multiple keywords and the objects with which the keywords are respectively associated; and the first determining device determines the transfer number and/or transition probability information of users between the objects according to the transfer number information of users between the keywords and the objects with which the keywords are respectively associated.
For example, the transfer case information of the multiple keywords is as shown in Table 2 above, and Query1, Query2 and Query3 are associated with Object1, Object2 and Object3 respectively. According to the transfer case information of the keywords shown in Table 2 and the aforementioned association, the first determining device determines that the transfer paths of users among Object1, Object2 and Object3 comprise "Object1 → Object2" and "Object1 → Object3", and that the transfer numbers of these 2 transfer paths are 5 and 8 respectively. Then, according to the transfer numbers of these 2 transfer paths, the first determining device calculates the transition probability of "Object1 → Object2" as 5/(5+8) = 38.46% and the transition probability of "Object1 → Object3" as 8/(5+8) = 61.54%; that is, the first determining device obtains the transfer case information shown in the aforementioned Table 1.
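The mapping from keyword transfers to object transfers and the normalization into transition probabilities can be sketched as follows. The function name `object_transfer_probabilities` and the dictionary layout are assumptions for illustration; the counts 5 and 8 and the keyword-to-object association are those of the example above.

```python
from collections import defaultdict


def object_transfer_probabilities(keyword_counts, keyword_to_object):
    """Map keyword transfers onto objects, then normalize each source row."""
    counts = defaultdict(lambda: defaultdict(int))
    for (src_kw, dst_kw), n in keyword_counts.items():
        # Several keywords may map to one object, so the counts are accumulated.
        counts[keyword_to_object[src_kw]][keyword_to_object[dst_kw]] += n
    probabilities = {}
    for src, row in counts.items():
        total = sum(row.values())  # transfers from src to all objects
        probabilities[src] = {dst: n / total for dst, n in row.items()}
    return probabilities


keyword_counts = {("Query1", "Query2"): 5, ("Query1", "Query3"): 8}
keyword_to_object = {"Query1": "Object1", "Query2": "Object2", "Query3": "Object3"}
probs = object_transfer_probabilities(keyword_counts, keyword_to_object)
```

Here `probs["Object1"]["Object2"]` is 5/13 and `probs["Object1"]["Object3"]` is 8/13, matching the 38.46% and 61.54% of the example after rounding. When two keywords map to the same object, the accumulated count naturally yields a transfer from the object to itself.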
It should be noted that, because a user may use different keywords associated with the same object in several consecutive object information acquisition behaviors (for example, using different search keywords corresponding to the same object in multiple searches), a transition probability from an object to itself may exist, such as $p_{00}$ shown in Fig. 6. A concrete instance of Fig. 6 can be seen in Fig. 7. As shown in Fig. 7, the probability of transferring from "Gymboree" to "Gymboree" itself can be as high as 71.86%.
It should be noted that, preferably, the first determining device can calculate the transition probability $p_{ij}$ from object i to object j based on the following formula:
$$p_{ij} = \frac{a_{ij}}{\sum_{j} a_{ij}}$$
Wherein, $a_{ij}$ represents the transfer number from object i to object j, and $\sum_{j} a_{ij}$ represents the transfer number from object i to all objects.
For example, as shown in Fig. 6, object Object0 transfers to itself and to multiple other objects Object1 to Object13. Taking the transfer from Object0 to Object8 as an example, the transition probability from object Object0 to object Object8 is $p_{08} = a_{08} / \sum_{j} a_{0j}$, wherein $\sum_{j} a_{0j}$ represents all transfer numbers from object Object0 to Object0 itself and to objects Object1 to Object13.
It should be noted that the objects with which the multiple keywords are respectively associated can be determined in advance. Fig. 5 shows an example of the conversion from the network-structure transfer paths of keywords to the network-structure transfer paths of objects. In Fig. 5, each node in the upper network structure is a keyword, and each node in the lower network structure is the object corresponding to the keyword at the corresponding upper node.
It should be noted that the above examples are given only to better illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any way of obtaining the transfer case information of multiple objects falls within the scope of the present invention.
Sub-clustering apparatus 2 clusters the multiple objects according to the transfer case information of the multiple objects obtained by acquisition device 1, and obtains the cluster result of the described multiple objects.
Wherein, the cluster result of the described multiple objects can take various forms. For example, the cluster result comprises multiple sets, and the objects in each set belong to one class. As another example, the cluster result comprises object IDs and the category ID corresponding to each object ID, so that the class to which an object belongs is determined by the category ID corresponding to its object ID.
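The two result forms carry the same information, and one can be derived from the other. The sketch below converts the set form into the object ID to category ID form; the function name `to_category_ids` and the use of integer category IDs are assumptions made for illustration.

```python
def to_category_ids(cluster_sets):
    """Assign each object the index of the set that contains it."""
    return {
        obj: category_id
        for category_id, members in enumerate(cluster_sets)
        for obj in members
    }


# Set form of a cluster result (illustrative data).
cluster_sets = [["Object1", "Object3"], ["Object2"]]
category_ids = to_category_ids(cluster_sets)
```

In this example Object1 and Object3 share category ID 0, and Object2 has category ID 1.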
Specifically, the ways in which sub-clustering apparatus 2 clusters the multiple objects according to their transfer case information and obtains the cluster result include but are not limited to:
1) Sub-clustering apparatus 2 directly clusters the multiple objects according to their transfer case information and obtains the cluster result of the multiple objects. Wherein, the higher the transition probability or transfer number between two objects, the more likely these two objects are clustered into one class.
For example, the transfer case information obtained by acquisition device 1 is as shown in the aforementioned Table 1. Because the transition probability between Object1 and Object2, 38.46%, does not exceed the predetermined threshold of 60%, sub-clustering apparatus 2 determines that Object1 and Object2 cannot be clustered into one class; and because the transition probability between Object1 and Object3, 61.54%, exceeds the predetermined threshold of 60%, sub-clustering apparatus 2 determines that Object1 and Object3 are clustered into one class. Sub-clustering apparatus 2 thus obtains a cluster result in the form of two sets, [Object1, Object3] and [Object2]; these two sets indicate that Object1 and Object3 belong to the same class and that Object2 belongs to a class of its own.
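The threshold rule of this example can be sketched as follows. Grouping objects transitively with a union-find structure is an assumption of this sketch (the text only states the pairwise rule), and the function name `cluster_by_probability` is hypothetical.

```python
def cluster_by_probability(objects, probabilities, threshold):
    """Put two objects in one class when their transition probability exceeds the threshold."""
    parent = {obj: obj for obj in objects}

    def find(obj):
        # Follow parent links to the class representative, compressing the path.
        while parent[obj] != obj:
            parent[obj] = parent[parent[obj]]
            obj = parent[obj]
        return obj

    for (src, dst), p in probabilities.items():
        if p > threshold:
            parent[find(dst)] = find(src)

    groups = {}
    for obj in objects:
        groups.setdefault(find(obj), []).append(obj)
    return list(groups.values())


objects = ["Object1", "Object2", "Object3"]
probabilities = {("Object1", "Object2"): 0.3846, ("Object1", "Object3"): 0.6154}
result = cluster_by_probability(objects, probabilities, threshold=0.60)
```

With the probabilities of Table 1 and the 60% threshold, `result` contains the two sets of the example.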
It should be noted that, when the described multiple objects include objects that have already been clustered into one class (for example, objects that were clustered into one class manually or by an operation of the clustering apparatus), it may be necessary to judge whether multiple objects already clustered into one class can be clustered into one class with one or more other objects. In that case: the higher the transition probability or transfer number between an object and one or more of the multiple objects already clustered into one class, the more likely this object is clustered into one class with those multiple objects; and the higher the transition probability or transfer number between one or more objects in one group of objects already clustered into one class and one or more objects in another group of objects already clustered into one class, the more likely these two groups are clustered into one class.
2) Sub-clustering apparatus 2 clusters the described multiple objects based on the transfer distances between objects obtained from the described transfer case information, and obtains the cluster result of the described multiple objects.
Specifically, sub-clustering apparatus 2 can first obtain the transfer distances between all objects and then cluster the multiple objects according to these transfer distances to obtain the cluster result. Alternatively, sub-clustering apparatus 2 can perform multiple cluster operations to obtain the cluster result of the multiple objects, for example by selecting some of the objects in each cluster operation, determining the transfer distances required between the selected objects, and then performing the cluster operation on the selected objects.
Preferably, the transfer distance between the described objects includes but is not limited to at least one of the following:
A) The transfer distance between one object in the described multiple objects and another object in the described multiple objects. The smaller this transfer distance, the more likely the two objects are clustered into one class.
Wherein, the transfer distance between two objects is determined from the transfer number information and/or transition probability information in the transfer case information.
For example, the transfer distance between two objects is determined by the following formula:
$$d_{ij} = r^{(p_{ij} + p_{ji})/2}$$
Wherein, $d_{ij}$ represents the transfer distance between object i and object j, $p_{ij}$ represents the transition probability from object i to object j, $p_{ji}$ represents the transition probability from object j to object i, and r represents a parameter that can be set manually.
It should be noted that the above formula can be adjusted as required, for example by replacing the term $(p_{ij} + p_{ji})/2$ in the formula with another expression.
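A pairwise transfer distance can be sketched in code under one possible reading of the formula above. The sketch assumes that r is raised to the power of the mean of the two transition probabilities and that 0 < r < 1, so that a higher transition probability gives a smaller distance, consistent with the statement that a smaller distance makes clustering into one class more likely. Both the exponent reading and the range of r are assumptions, not facts confirmed by the text, and the function name is hypothetical.

```python
def transfer_distance(p_ij, p_ji, r=0.5):
    """Distance that shrinks as the mean transition probability grows (assumed form)."""
    if not 0 < r < 1:
        # Assumption of this sketch: r in (0, 1) keeps the distance decreasing.
        raise ValueError("r is assumed to lie strictly between 0 and 1")
    return r ** ((p_ij + p_ji) / 2)


d_unrelated = transfer_distance(0.0, 0.0)      # no transfers in either direction
d_related = transfer_distance(0.6154, 0.5)     # frequent transfers both ways
```

Under this reading, two objects with no transfers between them have distance 1, and the distance decreases toward r as both transition probabilities approach 1.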
B) The transfer distance between one object in the described multiple objects and multiple objects in the described multiple objects. The smaller this transfer distance, the more likely the one object and the multiple objects are clustered into one class. Preferably, the multiple objects have usually already been clustered into one class.
Wherein, the transfer distance between the one object and the multiple objects can be determined according to the transfer distances between this object and one or more of the multiple objects, or according to the transfer numbers/transition probabilities between this object and one or more of the multiple objects.
For example, there are 9 objects, Object1 to Object9, in which Object4, Object7 and Object8 have been clustered into one class. The transfer distance between Object1 and the group Object4, Object7 and Object8 can be determined in any one of the following ways:
The first: is using transfer distance minimum in the transfer distance between Object1 and Object4, between Object1 and Object7, between Object1 and Object8 as the transfer distance between Object1 and Object4, Object7 and Object8.
The second: using transfer distance maximum in the transfer distance between Object1 and Object4, between Object1 and Object7, between Object1 and Object8 as the transfer distance between Object1 and Object4, Object7 and Object8.
The third: the transfer distance of three between Object1 and Object4, between Object1 and Object7, between Object1 and Object8 is calculated, as averaged etc., and using result of calculation as the transfer distance between Object1 and Object4, Object7 and Object8.
4th kind: determine transfer number/transition probability maximum in the transfer number/transition probability between Object1 and Object4, between Object1 and Object7, between Object1 and Object8, and ask for transfer distance, as the transfer distance between Object1 and Object4, Object7 and Object8 according to this maximum transfer number/transition probability.
5th kind: determine transfer number/transition probability minimum in the transfer number/transition probability between Object1 and Object4, between Object1 and Object7, between Object1 and Object8, and ask for transfer distance, as the transfer distance between Object1 and Object4, Object7 and Object8 according to this minimum transfer number/transition probability.
6th kind: the transfer number/transition probability between Object1 and Object4, between Object1 and Object7, between Object1 and Object8 is calculated, as averaged etc., and ask for transfer distance according to result of calculation, as the transfer distance between Object1 and Object4, Object7 and Object8.
C) The transfer distance between several objects among the multiple objects and several other objects among the multiple objects. The smaller this transfer distance, the more likely the several objects and the several other objects are to be clustered into one class.
The transfer distance between the several objects and the several other objects may be determined from the transfer distances between one or more of the several objects and one or more of the several other objects, or from the transfer numbers/transition probabilities between one or more of the several objects and one or more of the several other objects.
For example, suppose there are 9 objects, Object1 to Object9. The transfer distance between two objects that have been clustered into one class, Object1 and Object3, and two other objects that have been clustered into one class, Object4 and Object8, may be determined in any of the following ways:
First: take the smallest of the transfer distances between Object1 and Object4, between Object1 and Object8, between Object3 and Object4, and between Object3 and Object8 as the transfer distance between the class formed by Object1 and Object3 and the class formed by Object4 and Object8.
Second: take the largest of the transfer distances between Object1 and Object4, between Object1 and Object8, between Object3 and Object4, and between Object3 and Object8 as the transfer distance between the class formed by Object1 and Object3 and the class formed by Object4 and Object8.
Third: perform a calculation on the four transfer distances between Object1 and Object4, between Object1 and Object8, between Object3 and Object4, and between Object3 and Object8, such as taking their average, and use the result as the transfer distance between the class formed by Object1 and Object3 and the class formed by Object4 and Object8.
Fourth: determine the largest of the transfer numbers/transition probabilities between Object1 and Object4, between Object1 and Object8, between Object3 and Object4, and between Object3 and Object8, derive a transfer distance from that largest transfer number/transition probability, and use it as the transfer distance between the class formed by Object1 and Object3 and the class formed by Object4 and Object8.
Fifth: determine the smallest of the transfer numbers/transition probabilities between Object1 and Object4, between Object1 and Object8, between Object3 and Object4, and between Object3 and Object8, derive a transfer distance from that smallest transfer number/transition probability, and use it as the transfer distance between the class formed by Object1 and Object3 and the class formed by Object4 and Object8.
Sixth: perform a calculation on the transfer numbers/transition probabilities between Object1 and Object4, between Object1 and Object8, between Object3 and Object4, and between Object3 and Object8, such as taking their average, derive a transfer distance from the result, and use it as the transfer distance between the class formed by Object1 and Object3 and the class formed by Object4 and Object8.
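The six ways of determining a transfer distance between groups of objects can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the function names, the dictionary layout keyed by unordered object pairs, the numeric values, and the conversion `1.0 - p` from a transition probability to a distance are all assumptions made for the example. The first three ways correspond to what is commonly called single, complete and average linkage.

```python
from itertools import product
from statistics import mean

# Aggregations shared by all six ways: smallest, largest, or average value.
AGGREGATES = {"min": min, "max": max, "mean": mean}


def group_distance(group_a, group_b, pair_distance, how="min"):
    """Ways 1-3: aggregate the pairwise transfer distances directly."""
    values = [pair_distance[frozenset((a, b))] for a, b in product(group_a, group_b)]
    return AGGREGATES[how](values)


def group_distance_from_probability(group_a, group_b, pair_probability, to_distance, how="max"):
    """Ways 4-6: aggregate transition probabilities, then derive a distance."""
    values = [pair_probability[frozenset((a, b))] for a, b in product(group_a, group_b)]
    return to_distance(AGGREGATES[how](values))


# Hypothetical distances between Object1 and the class {Object4, Object7, Object8}.
pair_distance = {
    frozenset(("Object1", "Object4")): 0.5,
    frozenset(("Object1", "Object7")): 0.75,
    frozenset(("Object1", "Object8")): 0.25,
}
one_object = ["Object1"]
one_class = ["Object4", "Object7", "Object8"]
nearest = group_distance(one_object, one_class, pair_distance, "min")
farthest = group_distance(one_object, one_class, pair_distance, "max")
average = group_distance(one_object, one_class, pair_distance, "mean")

# Hypothetical symmetric transition probabilities between the classes
# {Object1, Object3} and {Object4, Object8}.
pair_probability = {
    frozenset(("Object1", "Object4")): 0.5,
    frozenset(("Object1", "Object8")): 0.25,
    frozenset(("Object3", "Object4")): 0.125,
    frozenset(("Object3", "Object8")): 0.125,
}
# Assumed conversion: a higher transition probability gives a smaller distance.
from_largest_probability = group_distance_from_probability(
    ["Object1", "Object3"], ["Object4", "Object8"], pair_probability, lambda p: 1.0 - p, "max"
)
```

Because a single object is simply a group of size one, the same two functions cover both the one-to-several case and the several-to-several case.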
It should be noted that the above examples are given only to better illustrate the technical solution of the present invention and do not limit it; those skilled in the art should understand that any transfer distance between objects falls within the scope of the present invention.
As one preferred scheme of implementation 2) of sub-clustering apparatus 2, sub-clustering apparatus 2 further comprises: a device for selecting a Part I object and a Part II object from the multiple objects (hereinafter "the first selecting arrangement", not shown); a device for obtaining the transfer distance between the Part I object and the Part II object, determined based on the transfer case information related to the Part I object and the Part II object (hereinafter "the second sub-acquisition device", not shown); a device for determining, according to the transfer distance between the Part I object and the Part II object, whether the Part I object and the Part II object are clustered into one class (hereinafter "the second determining device", not shown); a device for reselecting a Part I object and a Part II object (hereinafter "the second selecting arrangement", not shown); and a device for triggering the second sub-acquisition device, the second determining device and the second selecting arrangement to repeat their operations until the cluster result of the multiple objects is obtained (hereinafter "the trigger device", not shown).
The first selecting arrangement selects a Part I object and a Part II object from the multiple objects.
The Part I object may be one or more of the multiple objects, and the Part II object may be one or more of the multiple objects that are different from the Part I object. Preferably, when the Part I object or the Part II object consists of several objects, those several objects belong to one class.
The second sub-acquisition device obtains the transfer distance between the Part I object and the Part II object, determined based on the transfer case information related to the Part I object and the Part II object.
It should be noted that, before the second sub-acquisition device performs its operation, the transfer distance between the Part I object and the Part II object may already exist; for example, it may have been determined by the clustering apparatus in a previous operation.
Preferably, when the transfer distance between the Part I object and the Part II object already exists, the second sub-acquisition device reads it directly, for example from local storage.
When the transfer distance between the Part I object and the Part II object does not yet exist, the second sub-acquisition device determines it from the transfer distances between one or more objects in the Part I object and one or more objects in the Part II object, which are themselves determined based on the transfer case information between the Part I object and the Part II object. The ways of determining the transfer distance between two objects, between one object and several objects, and between several objects and several other objects have been described in detail in the foregoing explanation of "the transfer distance between objects" and are not repeated here. In addition, if a transfer distance between one or more objects in the Part I object and one or more objects in the Part II object already exists before the second sub-acquisition device performs its operation, the second sub-acquisition device reads it directly; if it has not yet been obtained, it must be determined based on the transfer case information between the Part I object and the Part II object.
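The read-if-present, otherwise-compute behaviour described above can be sketched as a small cache. The class name, the `compute` callback and the constant distance returned in the usage lines are hypothetical and serve only to illustrate the lookup logic.

```python
class TransferDistanceCache:
    """Return a stored transfer distance if one exists; otherwise compute and store it."""

    def __init__(self, compute):
        self._compute = compute  # hypothetical callback: (part_one, part_two) -> distance
        self._known = {}
        self.computed = 0        # number of times the callback was actually invoked

    def get(self, part_one, part_two):
        # The key ignores the order of the two parts and of the objects inside them.
        key = frozenset((frozenset(part_one), frozenset(part_two)))
        if key in self._known:
            return self._known[key]
        value = self._compute(part_one, part_two)
        self._known[key] = value
        self.computed += 1
        return value


cache = TransferDistanceCache(lambda part_one, part_two: 0.25)
first = cache.get(["Object1", "Object2"], ["Object3"])
second = cache.get(["Object3"], ["Object2", "Object1"])  # same pair, read from the cache
```

After both calls the callback has run only once, because the second request finds the distance already stored.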
The second determining device determines, according to the transfer distance between the Part I object and the Part II object, whether the Part I object and the Part II object are clustered into one class.
The smaller this transfer distance, the more likely the Part I object and the Part II object are to be clustered into one class; the larger this transfer distance, the less likely they are to be clustered into one class.
The second selecting arrangement reselects a Part I object and a Part II object, wherein no cluster operation has yet been performed between the reselected Part I object and Part II object.
The trigger device triggers the second sub-acquisition device, the second determining device and the second selecting arrangement to repeat their operations until the cluster result of the multiple objects is obtained. Preferably, the trigger device may judge in various ways whether the cluster result of the multiple objects has been obtained, for example by checking whether the number of repetitions has exceeded a predetermined repetition threshold, or whether there no longer exist a Part I object and a Part II object between which no cluster operation has been performed.
An example is given below to better illustrate this preferred scheme:
Suppose there are 5 objects: Object1, Object2, Object3, Object4 and Object5.
The first selecting arrangement selects Object1 as the Part I object and Object2 as the Part II object. The second sub-acquisition device then determines the transfer distance between Object1 and Object2 according to the transfer case information between Object1 and Object2. The second determining device then determines, according to the transfer distance between Object1 and Object2, that Object1 and Object2 are clustered into one class. The second selecting arrangement then selects Object1 and Object2, which have been clustered into one class, as the Part I object and selects Object3 as the Part II object.
Next, the trigger device triggers the second sub-acquisition device and the second determining device to repeat their operations, determining that Object1 and Object2 cannot be clustered into one class with Object3; the trigger device then triggers the second selecting arrangement to repeat its operation, selecting Object1 and Object2, which have been clustered into one class, as the Part I object and Object4 as the Part II object.
Next, the trigger device triggers the second sub-acquisition device and the second determining device to repeat their operations, determining that Object1 and Object2 cannot be clustered into one class with Object4; the trigger device then triggers the second selecting arrangement to repeat its operation, selecting Object1 and Object2, which have been clustered into one class, as the Part I object and Object5 as the Part II object.
Next, the trigger device triggers the second sub-acquisition device and the second determining device to repeat their operations, determining that Object1 and Object2 cannot be clustered into one class with Object5; the trigger device then triggers the second selecting arrangement to repeat its operation, selecting Object3 as the Part I object and Object4 as the Part II object.
Next, the trigger device triggers the second sub-acquisition device and the second determining device to repeat their operations, determining that Object3 and Object4 are clustered into one class; the trigger device then triggers the second selecting arrangement to repeat its operation, selecting Object3 and Object4, which have been clustered into one class, as the Part I object and Object5 as the Part II object.
Next, the trigger device triggers the second sub-acquisition device and the second determining device to repeat their operations, determining that Object3 and Object4 are clustered into one class with Object5; the trigger device then triggers the second selecting arrangement to repeat its operation, selecting Object1 and Object2, which have been clustered into one class, as the Part I object and Object3, Object4 and Object5, which have been clustered into one class, as the Part II object.
Next, the trigger device triggers the second sub-acquisition device and the second determining device to repeat their operations, determining that Object1 and Object2 cannot be clustered into one class with Object3, Object4 and Object5. The trigger device further judges that there no longer exist a Part I object and a Part II object between which no cluster operation has been performed, and stops the cluster operation. The cluster result of Object1, Object2, Object3, Object4 and Object5 is therefore: [Object1, Object2], [Object3, Object4, Object5].
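The repeated select, measure, decide and reselect loop of this example can be sketched as follows. The sketch makes several assumptions that the description leaves open: two parts are merged when their average pairwise transfer distance is below a fixed `threshold`, candidate pairs are visited in list order (so the order of comparisons may differ from the walk-through above), and the distance values are invented. The optional `max_rounds` argument plays the role of the predetermined repetition threshold.

```python
from itertools import combinations


def average_distance(part_one, part_two, dist):
    """Average pairwise transfer distance between two parts (assumed decision input)."""
    total = sum(dist[frozenset((x, y))] for x in part_one for y in part_two)
    return total / (len(part_one) * len(part_two))


def cluster_by_pairs(objects, dist, threshold, max_rounds=None):
    clusters = [frozenset([o]) for o in objects]
    tried = set()  # pairs of parts between which a cluster operation was performed
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        # Select a Part I / Part II pair that has not been compared yet.
        pair = next(
            ((a, b) for a, b in combinations(clusters, 2) if frozenset((a, b)) not in tried),
            None,
        )
        if pair is None:
            break  # no uncompared pair remains: the cluster result is final
        rounds += 1
        a, b = pair
        tried.add(frozenset((a, b)))
        if average_distance(a, b, dist) < threshold:
            clusters[clusters.index(a)] = a | b  # merge the two parts into one class
            clusters.remove(b)
    return [sorted(c) for c in clusters]


names = [f"Object{i}" for i in range(1, 6)]
group = {"Object1": 0, "Object2": 0, "Object3": 1, "Object4": 1, "Object5": 1}
# Hypothetical distances: small inside a group, large across groups.
dist = {
    frozenset((x, y)): (0.2 if group[x] == group[y] else 0.9)
    for x, y in combinations(names, 2)
}
result = cluster_by_pairs(names, dist, threshold=0.5)
```

With these invented distances the loop ends with the two classes [Object1, Object2] and [Object3, Object4, Object5], matching the cluster result of the example.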
In the prior art, objects are usually classified by performing natural language analysis on the text describing them. In particular, when the objects involve commercial use, for example when the objects are brands, classification is influenced by subjective human judgment: in addition to natural language analysis of the object names, data from the object's perspective, such as the industry and region to which the object belongs and the object's sales situation and market demand, are also taken into account. That is, when classifying objects that involve commercial use, those skilled in the art hold the following prejudice: objects are to be classified according to commercial data from the object's perspective.
The solution of the present invention breaks the above prejudice: objects can be clustered by analysing the transfer case information of users among the objects. Moreover, compared with data from the object's perspective, clustering objects by analysing the transfer case of users among multiple objects is closer to the user's perspective and reflects more intuitively how users understand the objects; the object classification determined by the solution of the present invention is therefore more objective and accurate. In addition, even among data from the user's perspective, the transfer case information of the present invention is not commonly used data; in fact, when data from the user's perspective is explicitly mentioned, those skilled in the art are more likely to think of direct evaluations from users (such as ratings or written comments).
It should be noted that the present invention can be implemented in software and/or in a combination of software and hardware; for example, each device of the present invention can be realized using an application-specific integrated circuit (ASIC) or any other similar hardware device. In one embodiment, the software program of the present invention can be executed by a processor to realize the steps or functions described above. Similarly, the software program of the present invention (including related data structures) can be stored in a computer-readable recording medium, for example RAM, a magnetic or optical drive, a floppy disk, or a similar device. In addition, some steps or functions of the present invention can be realized in hardware, for example as a circuit that cooperates with a processor to perform each step or function.
To those skilled in the art, it is obvious that the invention is not limited to the details of the above exemplary embodiments and that the present invention can be realized in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in every respect as exemplary and non-restrictive; the scope of the present invention is defined by the appended claims rather than by the above description, and all changes falling within the meaning and scope of equivalents of the claims are therefore intended to be included in the present invention. Any reference numeral in the claims should not be construed as limiting the claim involved. In addition, the word "comprising" obviously does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices stated in the system claims may also be implemented by one unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.

Claims (20)

1. A method for clustering objects in a computer device, wherein the method comprises:
obtaining transfer case information of multiple objects, the transfer case information being used to indicate the transfer case of users among the multiple objects based on object information acquisition behavior;
clustering the multiple objects according to the transfer case information to obtain a cluster result of the multiple objects.
2. The method according to claim 1, wherein the step of clustering comprises:
clustering the multiple objects based on transfer distances between objects obtained from the transfer case information, to obtain the cluster result of the multiple objects.
3. The method according to claim 2, wherein the step of clustering comprises:
selecting a Part I object and a Part II object from the multiple objects;
obtaining the transfer distance between the Part I object and the Part II object, determined based on the transfer case information related to the Part I object and the Part II object;
determining, according to the transfer distance between the Part I object and the Part II object, whether the Part I object and the Part II object are clustered into one class;
reselecting a Part I object and a Part II object, wherein no cluster operation has yet been performed between the reselected Part I object and Part II object;
repeating the steps of obtaining the transfer distance between the Part I object and the Part II object, determining whether the Part I object and the Part II object are clustered into one class, and reselecting a Part I object and a Part II object, until the cluster result of the multiple objects is obtained.
4. The method according to claim 3, wherein the step of obtaining the transfer distance between the Part I object and the Part II object comprises:
when the transfer distance between the Part I object and the Part II object exists, directly reading the transfer distance between the Part I object and the Part II object;
when the transfer distance between the Part I object and the Part II object does not exist, determining the transfer distance between the Part I object and the Part II object according to the transfer distances between one or more objects in the Part I object and one or more objects in the Part II object, determined based on the transfer case information between the Part I object and the Part II object.
5. The method according to any one of claims 2 to 4, wherein the transfer distance between objects comprises at least one of the following:
- the transfer distance between one object among the multiple objects and another object among the multiple objects;
- the transfer distance between one object among the multiple objects and several objects among the multiple objects;
- the transfer distance between several objects among the multiple objects and several other objects among the multiple objects.
6. The method according to any one of claims 1 to 5, wherein the step of obtaining the transfer case information comprises:
obtaining transfer case information of multiple keywords, wherein the transfer case information of the multiple keywords is used to indicate the transfer case of users among the multiple keywords based on object information acquisition behavior;
determining the transfer case information of the multiple objects according to the objects with which the multiple keywords are respectively associated and the transfer case information of the multiple keywords.
7. The method according to claim 6, wherein the step of obtaining the transfer case information of the multiple keywords comprises:
obtaining keyword attention records of at least one user, wherein a keyword attention record comprises the keywords that the user paid attention to in object information acquisition behavior and the time information at which those keywords were paid attention to;
determining the transfer case information of the multiple keywords according to the keyword attention records.
8. The method according to claim 6 or 7, wherein the transfer case information of the multiple keywords comprises at least one of the following:
- transfer path information of users among the multiple keywords;
- transfer number information of users between the keywords.
9. The method according to any one of claims 1 to 8, wherein the transfer case information of the multiple objects comprises at least one of the following:
- transfer path information of users among the multiple objects;
- transfer number information of users between the objects;
- transition probability information of users between the objects.
10. The method according to any one of claims 1 to 9, wherein the objects comprise brands.
11. A device for clustering objects in a computer device, wherein the device comprises:
a device for obtaining transfer case information of multiple objects, the transfer case information being used to indicate the transfer case of users among the multiple objects based on object information acquisition behavior;
a device for clustering the multiple objects according to the transfer case information to obtain a cluster result of the multiple objects.
12. The device according to claim 11, wherein the device for clustering comprises:
a device for clustering the multiple objects based on transfer distances between objects obtained from the transfer case information, to obtain the cluster result of the multiple objects.
13. The device according to claim 12, wherein the device for clustering comprises:
a device for selecting a Part I object and a Part II object from the multiple objects;
a device for obtaining the transfer distance between the Part I object and the Part II object, determined based on the transfer case information related to the Part I object and the Part II object;
a device for determining, according to the transfer distance between the Part I object and the Part II object, whether the Part I object and the Part II object are clustered into one class;
a device for reselecting a Part I object and a Part II object, wherein no cluster operation has yet been performed between the reselected Part I object and Part II object;
a device for triggering the device for obtaining the transfer distance between the Part I object and the Part II object, the device for determining whether the Part I object and the Part II object are clustered into one class, and the device for reselecting a Part I object and a Part II object to repeat their operations until the cluster result of the multiple objects is obtained.
14. The device according to claim 13, wherein the device for obtaining the transfer distance between the Part I object and the Part II object comprises:
a device for directly reading the transfer distance between the Part I object and the Part II object when that transfer distance exists;
a device for determining, when the transfer distance between the Part I object and the Part II object does not exist, the transfer distance between the Part I object and the Part II object according to the transfer distances between one or more objects in the Part I object and one or more objects in the Part II object, determined based on the transfer case information between the Part I object and the Part II object.
15. The device according to any one of claims 12 to 14, wherein the transfer distance between objects comprises at least one of the following:
- the transfer distance between one object among the multiple objects and another object among the multiple objects;
- the transfer distance between one object among the multiple objects and several objects among the multiple objects;
- the transfer distance between several objects among the multiple objects and several other objects among the multiple objects.
16. The device according to any one of claims 11 to 15, wherein the device for obtaining the transfer case information comprises:
a device for obtaining transfer case information of multiple keywords, wherein the transfer case information of the multiple keywords is used to indicate the transfer case of users among the multiple keywords based on object information acquisition behavior;
a device for determining the transfer case information of the multiple objects according to the objects with which the multiple keywords are respectively associated and the transfer case information of the multiple keywords.
17. The device according to claim 16, wherein the device for obtaining the transfer case information of the multiple keywords comprises:
a device for obtaining keyword attention records of at least one user, wherein a keyword attention record comprises the keywords that the user paid attention to in object information acquisition behavior and the time information at which those keywords were paid attention to;
a device for determining the transfer case information of the multiple keywords according to the keyword attention records.
18. The device according to claim 16 or 17, wherein the transfer case information of the multiple keywords comprises at least one of the following:
- transfer path information of users among the multiple keywords;
- transfer number information of users between the keywords.
19. The device according to any one of claims 11 to 18, wherein the transfer case information of the multiple objects comprises at least one of the following:
- transfer path information of users among the multiple objects;
- transfer number information of users between the objects;
- transition probability information of users between the objects.
20. The device according to any one of claims 11 to 19, wherein the objects comprise brands.
CN201510090184.XA 2015-02-27 2015-02-27 A kind of method and apparatus that object is clustered Active CN104731867B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510090184.XA CN104731867B (en) 2015-02-27 2015-02-27 A kind of method and apparatus that object is clustered


Publications (2)

Publication Number Publication Date
CN104731867A true CN104731867A (en) 2015-06-24
CN104731867B CN104731867B (en) 2018-09-07

Family

ID=53455754

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510090184.XA Active CN104731867B (en) 2015-02-27 2015-02-27 A kind of method and apparatus that object is clustered

Country Status (1)

Country Link
CN (1) CN104731867B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110069542A (en) * 2017-09-26 2019-07-30 北京国双科技有限公司 Keyword appraisal procedure and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5504887A (en) * 1993-09-10 1996-04-02 International Business Machines Corporation Storage clustering and packing of objects on the basis of query workload ranking
CN101527000A (en) * 2009-04-03 2009-09-09 南京航空航天大学 Fast movable object orbit clustering method based on sampling
CN104142950A (en) * 2013-05-10 2014-11-12 中国人民大学 Microblog user classifying method based on keyword extraction and gini coefficient
CN104199969A (en) * 2014-09-22 2014-12-10 北京国双科技有限公司 Webpage data analysis method and device


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110069542A (en) * 2017-09-26 2019-07-30 北京国双科技有限公司 Keyword appraisal procedure and device
CN110069542B (en) * 2017-09-26 2021-06-29 北京国双科技有限公司 Keyword evaluation method and device

Also Published As

Publication number Publication date
CN104731867B (en) 2018-09-07


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant