CN103902654B - Clustering method and device and terminal device - Google Patents


Info

Publication number: CN103902654B
Authority: CN (China)
Prior art keywords: visited, neighborhood, distance, neighbor objects, objects
Legal status: Active
Application number: CN201410073353.4A
Other languages: Chinese (zh)
Other versions: CN103902654A
Inventors: 陈志军 (Chen Zhijun), 王琳 (Wang Lin), 王百超 (Wang Baichao)
Current assignee: Xiaomi Inc
Original assignee: Xiaomi Inc
Events:
Application CN201410073353.4A filed by Xiaomi Inc
Publication of CN103902654A
Application granted
Publication of CN103902654B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the invention disclose a clustering method, a clustering device and a terminal device. In the clustering method, the neighbor objects of an object to be visited are first obtained, and the number of neighbor objects in the neighborhood of the object to be visited is then computed using weight coefficients that correspond to the distances between the neighbor objects and the object to be visited. The closer a neighbor object is to the object to be visited, the larger the fraction of one object it is converted into, and the more it contributes to the neighbor count of the neighborhood; conversely, the farther a neighbor object is from the object to be visited, the smaller its converted fraction and the smaller its contribution. The method thereby reduces, to some extent, the sensitivity of the neighbor count to the scanning radius of the neighborhood and to the minimum number of objects the neighborhood must contain. This in turn reduces the sensitivity of the clustering result to these two parameters and improves the accuracy of the clustering result.

Description

Clustering method, device and terminal device
Technical field
The present disclosure relates to the technical field of data processing, and in particular to a clustering method, a clustering device and a terminal device.
Background
Clustering is the process of dividing a set of physical or abstract objects into multiple classes made up of similar objects, that is, of assigning objects to different classes or clusters so that objects in the same class are highly similar and objects in different classes are highly dissimilar.
There are many types of clustering methods. Density-based clustering methods differ from the others in that they are based not on distance but on density: as long as the density of points in a region exceeds a certain threshold, those points are added to the nearby cluster. This overcomes the shortcoming of distance-based clustering algorithms, which can only find roughly round ("spherical") clusters. For example, the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm is a typical density-based clustering method. DBSCAN defines a cluster as a maximal set of density-connected points; it can partition sufficiently dense regions into clusters and can find clusters of arbitrary shape in spatial databases that contain noise. DBSCAN introduces the concept of a core object and two initial parameters, Eps (the scanning radius) and MinPts (the minimum number of points a neighborhood must contain). If there are no fewer than MinPts objects within the Eps range of an object, that object is a core object. A core object and the neighbor objects within its Eps range form a cluster. If a cluster contains several core objects, the clusters centered on these core objects are merged. However, the clustering result of this kind of algorithm is very sensitive to the values of Eps and MinPts: different values of Eps and MinPts produce different clustering results, which makes the clustering result uncertain.
Summary of the invention
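For reference, the classic DBSCAN core-object test described above can be sketched as follows. This is a minimal illustration of the background algorithm, not part of the disclosed method. It assumes points are coordinate tuples compared with Euclidean distance and that the object itself is not counted toward MinPts (some implementations, such as scikit-learn's DBSCAN, do count it). The function names are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def eps_neighbors(points: List[Point], i: int, eps: float) -> List[int]:
    """Indices of all other points within distance eps of points[i]."""
    return [j for j, q in enumerate(points)
            if j != i and math.dist(points[i], q) <= eps]


def is_core_object(points: List[Point], i: int, eps: float, min_pts: int) -> bool:
    """Classic DBSCAN rule: every neighbor counts as exactly 1."""
    return len(eps_neighbors(points, i, eps)) >= min_pts


pts = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (1.2, 0.0)]
print(is_core_object(pts, 0, eps=0.6, min_pts=3))  # False: only 2 neighbors
print(is_core_object(pts, 0, eps=1.3, min_pts=3))  # True: 3 neighbors
```

Changing Eps from 0.6 to 1.3 flips the same point from non-core to core, which is the parameter sensitivity the background refers to.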
To overcome the problems in the related art, the present disclosure provides a clustering method, a clustering device and a terminal device.
To solve the above technical problem, the embodiments of the present disclosure disclose the following technical solutions:
According to a first aspect of the embodiments of the present disclosure, a clustering method is provided, comprising:
For any object to be visited, obtain all neighbor objects of the object to be visited;
Obtain the number of neighbor objects in the neighborhood of the object to be visited according to the distances between the neighbor objects and the object to be visited and the corresponding weight coefficients, the weight coefficient being related to the distance between the objects;
Determine, according to the number of neighbor objects in the neighborhood of the object to be visited, whether the object to be visited is a core object;
When the object to be visited is a core object, classify the object to be visited into a class;
Perform extended clustering on the objects that are directly density-reachable from the object to be visited within its neighborhood, until no new object joins the class in which the object to be visited is located.
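The five steps above can be sketched as a weighted variant of DBSCAN. This is an illustrative sketch under stated assumptions, not the patented implementation: objects are given as a precomputed symmetric distance matrix, only a single neighborhood of radius `eps` is used, and the weight function that turns a distance into a fractional count is supplied by the caller. The name `weighted_dbscan` and its parameters are hypothetical.

```python
from typing import Callable, List, Sequence


def weighted_dbscan(dist: Sequence[Sequence[float]], eps: float,
                    min_pts: float, weight: Callable[[float], float]) -> List[int]:
    """Return one class label per object: 0 = noise, 1, 2, ... = classes."""
    n = len(dist)
    visited = [False] * n        # access flags: False ~ "0", True ~ "1"
    labels = [0] * n             # class label C, 0 in the initial state

    def neighborhood(p: int) -> List[int]:
        # neighbor objects of p inside its E neighborhood
        return [q for q in range(n) if q != p and dist[p][q] <= eps]

    def is_core(p: int, nbrs: List[int]) -> bool:
        # weighted count: each neighbor adds W(d) instead of 1
        return sum(weight(dist[p][q]) for q in nbrs) >= min_pts

    label = 0
    for p in range(n):                   # next object to be visited
        if visited[p]:
            continue
        visited[p] = True
        nbrs = neighborhood(p)
        if not is_core(p, nbrs):
            continue                     # noise unless a later class absorbs it
        label += 1                       # open a new class for core object p
        labels[p] = label
        frontier = list(nbrs)            # expand through reachable objects
        while frontier:
            q = frontier.pop()
            if labels[q] == 0:
                labels[q] = label
            if not visited[q]:
                visited[q] = True
                q_nbrs = neighborhood(q)
                if is_core(q, q_nbrs):   # only core objects extend the class
                    frontier.extend(q_nbrs)
    return labels


xs = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0]
d = [[abs(a - b) for b in xs] for a in xs]
print(weighted_dbscan(d, eps=0.25, min_pts=2, weight=lambda _: 1.0))
# [1, 1, 1, 2, 2, 2, 0]
```

With a constant weight of 1.0 the sketch reduces to plain DBSCAN; any distance-dependent weight makes far neighbors count for less. The frontier is processed as a stack, but the traversal order does not change which objects a single expansion reaches.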
With reference to the first aspect, in a first possible implementation of the first aspect, obtaining the number of neighbor objects in the neighborhood of the object to be visited according to the distances between the neighbor objects and the object to be visited and the corresponding weight coefficients is performed as follows:
Obtain the distances between the neighbor objects in the neighborhood and the object to be visited;
Determine, according to the distances, the neighbor objects in the neighborhood from among all neighbor objects of the object to be visited;
Obtain the weight coefficient corresponding to each distance, the weight coefficient being related to the distance between the objects;
Calculate the number of neighbor objects in the neighborhood of the object to be visited according to the weight coefficients.
With reference to the first possible implementation of the first aspect, in a second possible implementation of the first aspect, the weight coefficient corresponding to the distance is obtained as follows:
Obtain, from statistics over sample objects, the correspondence between the distance between two objects and the probability that the two objects are the same object;
Query the correspondence to obtain the probability that the two objects corresponding to the distance are the same object;
Obtain the weight coefficient corresponding to the distance from the product of the probability and the distance, the weight coefficient being positively correlated with the probability.
With reference to the first aspect, in a third possible implementation of the first aspect, the object to be visited is provided with multiple neighborhoods whose scanning radii increase successively;
Determining, according to the number of neighbor objects in the neighborhood of the object to be visited, whether the object to be visited is a core object is performed as follows:
In ascending order of scanning radius, determine whether the number of neighbor objects in a neighborhood is not less than the corresponding preset threshold;
When the number of neighbor objects in the neighborhood is not less than the corresponding preset threshold, determine that the object to be visited is a core object;
When the number of neighbor objects in the current neighborhood is less than the corresponding preset threshold, determine whether all of the multiple neighborhoods of the object to be visited have been checked;
When not all of the multiple neighborhoods have been checked, determine, in ascending order of scanning radius, whether the number of neighbor objects in the next neighborhood is not less than its corresponding preset threshold;
When all of the multiple neighborhoods have been checked, determine that the object to be visited is not a core object.
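The multi-neighborhood core test can be sketched as below. The sketch assumes that the distances from the object to be visited to all of its neighbor objects are already known and that each neighborhood layer is given as a (scanning radius, preset threshold) pair. The function name, the inclusive `d <= radius` boundary and the example thresholds are illustrative assumptions; the disclosure does not fix specific values.

```python
from typing import Callable, Sequence, Tuple


def is_core_multi(dists: Sequence[float],
                  layers: Sequence[Tuple[float, float]],
                  weight: Callable[[float], float]) -> bool:
    """layers: (scanning radius, preset threshold) pairs, in any order."""
    for radius, threshold in sorted(layers):      # ascending scanning radius
        count = sum(weight(d) for d in dists if d <= radius)
        if count >= threshold:
            return True                           # first passing layer decides
    return False                                  # every layer checked, none passed


def unit(_d: float) -> float:
    """Unweighted count, for illustration only."""
    return 1.0


dists = [0.1, 0.2, 0.5, 0.6]
print(is_core_multi(dists, [(0.25, 3), (0.7, 4)], unit))  # True (outer layer)
print(is_core_multi(dists, [(0.25, 3), (0.7, 5)], unit))  # False
```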
With reference to the first aspect or any one of the first to third possible implementations of the first aspect, in a fourth possible implementation of the first aspect, performing extended clustering on the objects directly density-reachable from the object to be visited within its neighborhood, until no new object joins the class in which the object to be visited is located, is performed as follows:
Obtain all objects that are directly density-reachable from the object to be visited within a designated neighborhood, the scanning radius of the designated neighborhood being smaller than the maximum scanning radius among the multiple neighborhoods;
Determine, one by one, whether each directly density-reachable object is a core object;
When a directly density-reachable object is a core object, add its neighbor objects in the designated neighborhood to the class in which the object to be visited is located, until no new object joins that class.
With reference to the first possible implementation of the first aspect, in a fifth possible implementation of the first aspect, determining the neighbor objects in the neighborhood from all neighbor objects of the object to be visited according to the distances is performed as follows:
Sort the distances between the neighbor objects and the object to be visited by magnitude;
According to the sorted distances, count the neighbor objects whose distance to the object to be visited is less than the scanning radius of the neighborhood.
According to a second aspect of the embodiments of the present disclosure, a clustering device is provided, comprising:
A first acquisition unit, configured to obtain, for any object to be visited, all neighbor objects of the object to be visited;
A second acquisition unit, configured to obtain the number of neighbor objects in the neighborhood of the object to be visited according to the distances between the neighbor objects and the object to be visited and the corresponding weight coefficients, the weight coefficient being related to the distance between the objects;
A judging unit, configured to determine, according to the number of neighbor objects in the neighborhood of the object to be visited, whether the object to be visited is a core object;
A clustering unit, configured to classify the object to be visited into a class when the object to be visited is a core object;
An extended clustering unit, configured to perform extended clustering on the objects directly density-reachable from the object to be visited within its neighborhood, until no new object joins the class in which the object to be visited is located.
With reference to the second aspect, in a first possible implementation of the second aspect, the second acquisition unit comprises:
A first acquisition subunit, configured to obtain the distances between the neighbor objects in the neighborhood and the object to be visited;
A first determination subunit, configured to determine, according to the distances, the neighbor objects in the neighborhood from among all neighbor objects of the object to be visited;
A second acquisition subunit, configured to obtain the weight coefficient corresponding to each distance, the weight coefficient being related to the distance between the objects;
A computation subunit, configured to calculate the number of neighbor objects in the neighborhood of the object to be visited according to the weight coefficients.
With reference to the first possible implementation of the second aspect, in a second possible implementation of the second aspect, the second acquisition subunit comprises:
A statistics subunit, configured to obtain, from statistics over sample objects, the correspondence between the distance between two objects and the probability that the two objects are the same object;
A query subunit, configured to query the correspondence to obtain the probability that the two objects corresponding to the distance are the same object;
A third acquisition subunit, configured to obtain the weight coefficient corresponding to the distance from the product of the probability and the distance, the weight coefficient being positively correlated with the probability.
With reference to the second aspect, in a third possible implementation of the second aspect, the object to be visited is provided with multiple neighborhoods whose scanning radii are set to increase successively; the judging unit comprises:
A first judging subunit, configured to determine, in ascending order of the scanning radii of the multiple neighborhoods, whether the number of neighbor objects in a neighborhood is not less than the corresponding preset threshold;
A first determination subunit, configured to determine that the object to be visited is a core object when the number of neighbor objects in that neighborhood is not less than the corresponding preset threshold;
A second judging subunit, configured to determine, when the number of neighbor objects in that neighborhood is less than the corresponding preset threshold, whether all of the multiple neighborhoods of the object to be visited have been checked, and, when they have not all been checked, to return to determining, in ascending order of scanning radius, whether the number of neighbor objects in the next neighborhood is not less than its corresponding preset threshold;
A second determination subunit, configured to determine that the object to be visited is not a core object when all of the multiple neighborhoods have been checked.
With reference to the third possible implementation of the second aspect, in a fourth possible implementation of the second aspect, the extended clustering unit comprises:
A fourth acquisition subunit, configured to obtain all objects directly density-reachable from the object to be visited within a designated neighborhood, the scanning radius of the designated neighborhood being smaller than the maximum scanning radius among the multiple neighborhoods;
A third judging subunit, configured to determine, one by one, whether each directly density-reachable object is a core object;
A clustering subunit, configured to add, when a directly density-reachable object is a core object, its neighbor objects in the designated neighborhood to the class in which the object to be visited is located, until no new object joins that class.
With reference to the first possible implementation of the second aspect, in a fifth possible implementation of the second aspect, the first determination subunit comprises:
A sorting subunit, configured to sort the distances between the neighbor objects and the object to be visited by magnitude;
A counting subunit, configured to count, according to the sorted distances, the neighbor objects whose distance to the object to be visited is less than the scanning radius of the neighborhood.
According to a third aspect of the embodiments of the present disclosure, a terminal device is provided, comprising: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to:
For any object to be visited, obtain all neighbor objects of the object to be visited;
Obtain the number of neighbor objects in the neighborhood of the object to be visited according to the distances between the neighbor objects and the object to be visited and the corresponding weight coefficients, the weight coefficient being related to the distance between the objects;
Determine, according to the number of neighbor objects in the neighborhood of the object to be visited, whether the object to be visited is a core object;
When the object to be visited is a core object, classify the object to be visited into a class;
Perform extended clustering on the objects that are directly density-reachable from the object to be visited within its neighborhood, until no new object joins the class in which the object to be visited is located.
The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects: the number of neighbor objects in the neighborhood of the object to be visited is calculated using weight coefficients corresponding to the distances between the neighbor objects and the object to be visited. The closer an object is to the object to be visited, the larger the fraction of one object it is converted into and the larger its contribution to the neighbor count; conversely, the farther an object is from the object to be visited, the smaller its converted fraction and the smaller its contribution. This reduces, to some extent, the sensitivity of the neighbor count to the scanning radius of the neighborhood and to the minimum number of objects it must contain, which in turn reduces the sensitivity of the clustering result obtained with this method to these two parameters and improves the accuracy of the clustering result.
It should be understood that the foregoing general description and the following detailed description are merely exemplary and do not limit the present disclosure.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and, together with the description, serve to explain the principles of the invention.
Fig. 1 is a flow chart of a clustering method according to an exemplary embodiment;
Fig. 2 is a schematic diagram of an object distribution according to an exemplary embodiment;
Fig. 3 is a flow chart of step S200 according to an exemplary embodiment;
Fig. 4 is a flow chart of step S220 according to an exemplary embodiment;
Fig. 5 is a flow chart of step S230 according to an exemplary embodiment;
Fig. 6 is a flow chart of step S300 according to an exemplary embodiment;
Fig. 7 is a schematic diagram of a neighborhood distribution according to an exemplary embodiment;
Fig. 8 is a flow chart of step S500 according to an exemplary embodiment;
Fig. 9 is a block diagram of a clustering device according to an exemplary embodiment;
Fig. 10 is a block diagram of a terminal device according to an exemplary embodiment.
Specific embodiments of the present disclosure have been shown in the above drawings and are described in more detail below. These drawings and the written description are not intended to limit the scope of the disclosed concept in any way, but to illustrate the concept of the present disclosure to those skilled in the art by reference to specific embodiments.
Detailed description
Exemplary embodiments are described in detail here, and examples of them are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention. Rather, they are merely examples of devices and methods consistent with some aspects of the invention as detailed in the appended claims.
Before the embodiments of the present disclosure are described in detail, the following terms used in the disclosure are introduced:
E neighborhood: the region centered on an object with scanning radius E is called the E neighborhood of that object;
Core object: if the number of neighbor objects in the E neighborhood of an object P is not less than the minimum number of contained objects MinPts, then P is called a core object;
Neighbor object: for an object P, an object directly connected to P is called a neighbor object of P;
Directly density-reachable: for a sample set, if a neighbor object Q lies in the E neighborhood of object P and P is a core object, then Q is said to be directly density-reachable from P; that is, Q is a neighbor object of P within the E neighborhood of P.
Fig. 1 is a flow chart of a clustering method according to an exemplary embodiment. As shown in Fig. 1, the clustering method is used in a terminal and comprises the following steps:
In step S100, for any object to be visited, all neighbor objects of the object to be visited are obtained.
The objects handled by the clustering method of the disclosure may be face images, with images of the same person grouped together to form one cluster. The features of each face image are converted into a vector, so the distance between objects is the distance between vectors. Of course, the clustering method provided by the disclosure can also be applied to data other than images.
For an object set to be processed, each object in the set is taken in turn as the object to be visited, and all neighbor objects of that object are obtained. As shown in Fig. 2, object 5 is directly connected to objects 4, 6 and 7, so objects 4, 6 and 7 are the neighbor objects of object 5.
Each object is given an access flag, and when an object is visited its flag is marked as visited. For example, while an object has not been visited its access flag is "0"; once it has been visited, the flag is changed to "1". Whether an object is still an object to be visited can therefore be determined from its access flag.
In step S200, the number of neighbor objects in the neighborhood of the object to be visited is obtained according to the distances between the neighbor objects and the object to be visited and the corresponding weight coefficients, the weight coefficient being related to the distance between the objects.
The weight coefficient can reflect the degree of similarity between objects. For example, the larger the distance between two objects, the less similar they are and the smaller the corresponding weight coefficient; conversely, the smaller the distance, the more similar they are and the larger the weight coefficient. In other words, an object closer to the object to be visited contributes more when the number of objects is counted, and an object farther away contributes less.
In an exemplary embodiment of the disclosure, as shown in Fig. 3, step S200 may comprise the following steps:
In step S210, the distances between the neighbor objects in the neighborhood and the object to be visited are obtained. The distance may be based on cosine similarity, Euclidean distance, etc.
Suppose the two objects are A and B; the cosine similarity between them is calculated according to formula (1):
$$\cos\theta = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} \qquad (1)$$
In formula (1), the numerator is the inner product of vectors A and B, ‖A‖ is the length of vector A, and ‖B‖ is the length of vector B.
Note that when the disclosure uses the cosine similarity between objects to reflect the distance relationship between them, the distance is represented by (1 - cos θ), i.e., d = 1 - cos θ. Thus, the smaller the distance between two objects, the greater their similarity.
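A minimal sketch of formula (1) and of the conversion d = 1 - cos θ follows. It assumes feature vectors are plain sequences of floats and are non-zero (a zero vector would cause a division by zero). The function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Formula (1): inner product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """d = 1 - cos(theta): a smaller distance means more similar objects."""
    return 1.0 - cosine_similarity(a, b)


print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0 (orthogonal vectors)
```

Parallel vectors such as [1, 2] and [2, 4] have distance 0 up to floating-point rounding.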
In step S220, the neighbor objects in the neighborhood are determined from all neighbor objects of the object to be visited according to the distances.
Among the neighbor objects of object P, if the distance between a neighbor object and P is not greater than the scanning radius E1 of the E1 neighborhood of P, that neighbor object is determined to be a neighbor object of P in the E1 neighborhood.
Optionally, in an exemplary embodiment of the disclosure, as shown in Fig. 4, step S220 may comprise the following steps:
In step S221, the distances between the neighbor objects and the object to be visited are sorted by magnitude;
In step S222, according to the sorted distances, the neighbor objects whose distance to the object to be visited is less than the scanning radius of the neighborhood are counted.
To determine the number of objects in the E1 neighborhood, the objects are sorted by their distance to object P to obtain a distance sequence. A search method (for example, binary search) can first locate the distance in the sequence closest to the scanning radius E1, after which the number of distances smaller than it is counted; this makes the determination more efficient.
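Steps S221 and S222 can be sketched with Python's standard `bisect` module. Step S222 counts distances strictly less than the radius, while the E1 definition above uses "not greater than", so the sketch exposes a hypothetical `inclusive` flag for both readings; the function name is also hypothetical.

```python
import bisect
from typing import List, Sequence


def count_in_neighborhood(distances: Sequence[float], radius: float,
                          inclusive: bool = False) -> int:
    """Count neighbor objects whose distance is below the scanning radius.

    Sort once (step S221), then binary-search the radius (step S222)
    instead of scanning every distance. inclusive=True counts d <= radius.
    """
    ordered: List[float] = sorted(distances)
    if inclusive:
        return bisect.bisect_right(ordered, radius)
    return bisect.bisect_left(ordered, radius)


print(count_in_neighborhood([0.5, 0.1, 0.3, 0.3], 0.3))                  # 1
print(count_in_neighborhood([0.5, 0.1, 0.3, 0.3], 0.3, inclusive=True))  # 3
```

Sorting costs O(n log n), so the saving shows up mainly when one sorted sequence is reused for several radii, as with the multiple neighborhoods of increasing scanning radius.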
In step S230, the weight coefficient corresponding to each distance is obtained, the weight coefficient being related to the distance between the objects.
The weight coefficient may be determined according to an attribute distance between objects, or according to the distance between objects together with the probability that the two objects are the same object; the disclosure does not limit this.
In an exemplary embodiment of the disclosure, as shown in Fig. 5, step S230 may comprise the following steps:
In step S231, the correspondence between the distance between two objects and the probability that the two objects are the same object is obtained from statistics over sample objects.
For example, in face recognition, the cosine similarity cos θ of two face images computed from high-dimensional features lies in the range [0, 1]. Statistics over a large number of face images show that when the cosine similarity is in [0.45, 1], the probability that the two objects are the same person is essentially greater than 98%; in [0.35, 0.45), about 70%; in [0.25, 0.35), about 40%; in [0.15, 0.25), about 10%; and in [0, 0.15), about 0.1%.
From these statistics, the correspondence between the distance between two objects and the probability that they are the same object can be obtained; the correspondence may be stored as a table or in another form.
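The correspondence in step S231 could be estimated from labelled sample pairs roughly as follows. This sketch assumes each sample is a (cosine similarity, same-object flag) pair and that the bins are the half-open intervals quoted above, with the top bin closed at 1. The function name is hypothetical, and the toy data in the usage lines is made up for illustration; it is not the face-image statistics reported above.

```python
import bisect
from typing import List, Optional, Sequence, Tuple


def estimate_probabilities(samples: Sequence[Tuple[float, bool]],
                           edges: Sequence[float]) -> List[Optional[float]]:
    """Fraction of same-object pairs per bin [edges[k], edges[k+1]).

    The last bin also includes its upper edge; empty bins yield None.
    """
    n_bins = len(edges) - 1
    hits = [0] * n_bins
    totals = [0] * n_bins
    for sim, same in samples:
        if sim < edges[0] or sim > edges[-1]:
            continue                                   # outside the table
        k = min(bisect.bisect_right(edges, sim) - 1, n_bins - 1)
        totals[k] += 1
        hits[k] += int(same)
    return [h / t if t else None for h, t in zip(hits, totals)]


edges = [0.0, 0.15, 0.25, 0.35, 0.45, 1.0]   # bin boundaries from the text
toy = [(0.9, True), (0.5, True), (0.4, True), (0.38, False), (0.05, False)]
print(estimate_probabilities(toy, edges))  # [0.0, None, None, 0.5, 1.0]
```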
In step S232, the correspondence is queried to obtain the probability that the two objects corresponding to the distance are the same object;
In step S233, the weight coefficient corresponding to the distance is obtained from the product of the probability and the distance, the weight coefficient being positively correlated with the probability.
When the distance between objects is computed from cosine similarity, the distance must be converted back into the corresponding cosine similarity, i.e., the value range of the distance is mapped onto the value range of the cosine similarity, giving the correspondence between the probability and the cosine similarity. Based on this correspondence, the relation between the weight coefficient and the cosine similarity can be described by formula (2):
$$W(d) = \begin{cases} 1 \cdot \cos\theta, & \text{if } \cos\theta \ge 0.45 \\ 0.7 \cdot \cos\theta, & \text{if } 0.35 \le \cos\theta < 0.45 \\ 0.4 \cdot \cos\theta, & \text{if } 0.25 \le \cos\theta < 0.35 \\ 0.1 \cdot \cos\theta, & \text{if } 0.15 \le \cos\theta < 0.25 \\ 0.001 \cdot \cos\theta, & \text{if } \cos\theta < 0.15 \end{cases} \qquad (2)$$
Formula (2) follows from the correspondence between the cosine similarity and the probability that two objects are the same person. For other types of distance, a similar relation can be summarized and derived from the corresponding probabilities; this is not repeated here.
Of course, the weight coefficient may also be obtained in other ways, as long as it characterizes the relationship between the distance between objects and the similarity of the objects.
In step S240, the number of neighbor objects in the neighborhood of the object to be visited is calculated according to the weight coefficients.
The number of neighbor objects can be expressed in terms of the weight coefficients, as in formula (3):
$$
\mathrm{Pts} = \sum_{i \in E} W(d_i) \tag{3}
$$
where d_i is the distance between object P_i and object P, and P_i is a neighbor object of P within its E-neighborhood.
Formula (3) sums over all objects in the E-neighborhood. Instead of counting each object as 1, it counts each object as the weight coefficient W(d_i) that corresponds to its distance from the object to be visited. In other words, the formula does not simply count the objects in the E-neighborhood. An object closer to the center point is converted into a larger fraction of a count, and an object farther from the center point is converted into a smaller fraction.
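Formula (3) can be sketched in Python as follows. The sketch makes three assumptions. The similarities between P and its candidate neighbors are already computed. The E-neighborhood is defined by the distance d_i = 1 − cos θ_i, as described later for the cosine-similarity case. The weight function is passed in as a parameter. The name `weighted_neighbor_count` is hypothetical.

```python
def weighted_neighbor_count(similarities, eps, weight_fn):
    """Formula (3): Pts = sum of W(d_i) over neighbors in the E-neighborhood.

    similarities: cosine similarities between P and each candidate neighbor P_i.
    eps: sweep radius E, applied to the distance d_i = 1 - cos(theta_i).
    weight_fn: maps a cosine similarity to its weight coefficient W.
    """
    return sum(weight_fn(s) for s in similarities if 1.0 - s <= eps)
```

If `weight_fn` always returns 1.0, the function falls back to the plain neighbor count used by classic density-based clustering. The weighted version counts the same two neighbors as less than 2 whenever they are not identical to P.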
In step S300, whether the object to be visited is a core object is determined according to the quantity of neighbor objects in its neighborhood. If it is, the method proceeds to step S400. Otherwise, the object to be visited is marked as a noise point and the method returns to step S100. There it obtains all neighbor objects of the next object to be visited and carries out the next round, until no object remains to be visited.
Suppose the minimum number of contained objects assigned to the E1-neighborhood of the object P to be visited is MinPts1. The quantity Pts1 of neighbor objects of P in the E1-neighborhood, computed in step S200, is compared with MinPts1. If Pts1 ≥ MinPts1, P is a core object; otherwise, P is not a core object.
It can be understood that the minimum number of contained objects MinPts1 can be set according to the weight coefficients, or according to experimental results or empirical values.
When the object to be visited is a core object, it is assigned to a new class in step S400.
If P is a core object, P forms a class, i.e., the class contains object P.
A class label C is maintained for the objects. Initially the class label is 0, and it is incremented by 1 each time a class is added. For example, if object P is a core object, P is assigned to a class and C is updated to 1. When a core object Q that does not belong to P's class is found, Q is assigned to another class and C is updated to 2, and so on. Each newly added class receives the label of the latest existing class plus 1.
In step S500, extended clustering is performed on the objects that are directly density-reachable from the object to be visited within its neighborhood, until no new object joins the class of the object to be visited.
Extended clustering works in two stages. First, the objects P_i that are directly density-reachable from core object P are added to the class of P. Then each directly density-reachable object P_i in that class is checked in turn to see whether it is itself a core object. If it is, the objects directly density-reachable from the new core object P_i are also added to P's class.
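Steps S100 to S500 for a single neighborhood can be combined into a DBSCAN-style loop. The following Python sketch is a hedged illustration, not the patented implementation. It assumes that objects are non-zero feature vectors, that the distance is d = 1 − cos θ, that noise carries the label -1, and that the weight function is supplied by the caller. All names are hypothetical. As in standard DBSCAN, a noise point that is later reached from a core object is relabelled as a border member of that class.

```python
import math
from collections import deque

NOISE = -1  # label for noise points; 0 means "not yet classified"


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def weighted_core_check(points, i, eps, min_pts, weight_fn):
    """Return (is_core, neighbor_indices) for object i using formula (3)."""
    neighbors, pts = [], 0.0
    for j, q in enumerate(points):
        if j == i:
            continue
        sim = cosine_similarity(points[i], q)
        if 1.0 - sim <= eps:          # inside the E-neighborhood
            neighbors.append(j)
            pts += weight_fn(sim)     # weighted contribution instead of +1
    return pts >= min_pts, neighbors


def weighted_dbscan(points, eps, min_pts, weight_fn):
    labels = [0] * len(points)
    visited = [False] * len(points)
    c = 0                                       # class label C starts at 0
    for i in range(len(points)):                # S100: next object to be visited
        if visited[i]:
            continue
        visited[i] = True
        core, neighbors = weighted_core_check(points, i, eps, min_pts, weight_fn)
        if not core:                            # S300 fails: mark as noise
            labels[i] = NOISE
            continue
        c += 1                                  # S400: open a new class
        labels[i] = c
        queue = deque(neighbors)                # S500: extended clustering
        while queue:
            j = queue.popleft()
            if labels[j] in (0, NOISE):         # unclassified or former noise joins class c
                labels[j] = c
            if visited[j]:
                continue
            visited[j] = True
            j_core, j_neighbors = weighted_core_check(points, j, eps, min_pts, weight_fn)
            if j_core:                          # only core objects keep extending
                queue.extend(j_neighbors)
    return labels
```

The sketch recomputes neighborhoods on each call for clarity. A practical version would cache the similarities or use a spatial index.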
The larger the sweep radius, the greater the probability of introducing noise points. Therefore, when the sweep radius is not very reliable, extended clustering may be applied only to objects within a specified range. This avoids adding more noise points to the class and improves the accuracy of the clustering result.
The clustering method provided by this embodiment calculates the quantity of neighbor objects of the object to be visited in its neighborhood. The calculation uses the distances between the neighbor objects and the object to be visited, together with the corresponding weight coefficients. An object closer to the object to be visited is converted into a larger fractional count and contributes more to the neighbor count of the neighborhood. Conversely, an object farther from the object to be visited is converted into a smaller fractional count and contributes less. To some extent, this reduces the sensitivity of the neighbor count to the sweep radius and to the minimum number of contained objects of the neighborhood. It therefore also reduces the sensitivity of the clustering result to these two parameters and improves its accuracy.
The following embodiment further reduces the sensitivity of the clustering result to the sweep radius and the minimum number of contained objects, and further improves the accuracy of the clustering result.
Fig. 6 is a flowchart, according to an exemplary embodiment of the present disclosure, of determining whether the object to be visited is a core object. Here the object to be visited is provided with multiple neighborhoods whose sweep radii increase successively. As shown in Fig. 6, step S300 may include:
In step S310, the neighborhoods are examined in ascending order of sweep radius, and for each one it is determined whether the quantity of neighbor objects in the neighborhood is not less than the corresponding preset threshold;
Fig. 7 is a schematic diagram of multiple neighborhoods set for an object. Object P is provided with three neighborhoods with sweep radii E1, E2 and E3 (E1 < E2 < E3), namely the E1-neighborhood, the E2-neighborhood and the E3-neighborhood.
To determine whether object P is a core object, the neighborhood with the smallest sweep radius is examined first. That is, it is determined whether the quantity Pts1 of neighbor objects in the E1-neighborhood shown in Fig. 7 is not less than MinPts1. If Pts1 < MinPts1, it is then determined whether the quantity Pts2 of neighbor objects in the E2-neighborhood is not less than MinPts2. If Pts2 < MinPts2, the quantity of neighbor objects in the E3-neighborhood is examined next.
However many neighborhoods an object has, they are examined in the same order as the three neighborhoods above: the quantity of neighbor objects is checked neighborhood by neighborhood, in ascending order of sweep radius.
When the quantity of neighbor objects in the neighborhood is not less than the corresponding preset threshold, the object to be visited is determined to be a core object in step S311;
Object P is determined to be a core object when the quantity of neighbor objects in any one of its multiple neighborhoods is not less than the corresponding preset threshold. In the example of Fig. 7, this means Pts1 ≥ MinPts1, or Pts2 ≥ MinPts2, or Pts3 ≥ MinPts3.
When the quantity of neighbor objects in the current neighborhood is less than the corresponding preset threshold, step S312 determines whether all of the multiple neighborhoods of the object to be visited have been examined. If they have, the method proceeds to step S313. Otherwise, it returns to step S310 and, following the ascending order of sweep radius, determines whether the quantity of neighbor objects in the next neighborhood is not less than the corresponding preset threshold;
For example, a variable i with initial value 0 can be set and incremented by 1 (i = i + 1) each time a neighborhood is examined. Comparing i with the number of neighborhoods then shows whether all of the multiple neighborhoods have been examined.
When all of the multiple neighborhoods have been examined, the object to be visited is determined not to be a core object in step S313.
Object P is not a core object if the quantity of neighbor objects in every one of its neighborhoods is less than the corresponding preset value. In the example of Fig. 7, this means Pts1 < MinPts1, Pts2 < MinPts2 and Pts3 < MinPts3. If the object P to be visited is determined not to be a core object, P is marked as Noise (a noise point). The method then returns to the step of obtaining all neighbor objects of the next object to be visited, until no object remains to be visited.
Those skilled in the art will appreciate that, when an object is provided with multiple neighborhoods of increasing sweep radius, the quantities of neighbor objects in these neighborhoods can be obtained successively in ascending order of sweep radius. In the example of Fig. 7, the quantity Pts1 of neighbor objects in the E1-neighborhood is obtained first. If Pts1 < MinPts1, the quantity Pts21 of neighbor objects whose distance lies between E1 and E2 is obtained. The quantity of neighbor objects in the E2-neighborhood is then Pts2 = Pts1 + Pts21. If Pts2 < MinPts2, the quantity Pts32 of neighbor objects whose distance lies between E2 and E3 is obtained. The quantity of neighbor objects in the E3-neighborhood is then Pts3 = Pts32 + Pts21 + Pts1. For how to obtain the quantity of neighbor objects in a neighborhood or a given range, refer to the related content above; the details are not repeated here.
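The multiple-neighborhood test (steps S310 to S313) and the incremental counting described above can be sketched together in Python. The sketch sorts the distances once and adds only the ring between consecutive radii, so that Pts2 = Pts1 + Pts21 and so on. It assumes distances of the form 1 − cos θ and a caller-supplied weight function of the distance. The function name is hypothetical.

```python
def first_qualifying_neighborhood(distances, radii, thresholds, weight_fn):
    """Multiple-neighborhood core test (steps S310 to S313).

    distances: distances d = 1 - cos(theta) from the object to each neighbor.
    radii: sweep radii E1 < E2 < ...; thresholds: MinPts1, MinPts2, ...
    weight_fn: maps a distance to its weight coefficient.
    Returns the index of the first neighborhood whose weighted count reaches
    its threshold, or None if the object is not a core object.
    """
    ordered = sorted(distances)
    total, k = 0.0, 0
    for level, (radius, min_pts) in enumerate(zip(radii, thresholds)):
        # Add only the ring between the previous radius and this one.
        while k < len(ordered) and ordered[k] <= radius:
            total += weight_fn(ordered[k])
            k += 1
        if total >= min_pts:
            return level          # S311: core object
    return None                   # S313: all neighborhoods checked, not core
```

Index 0 (the E1-neighborhood) is falsy in Python, so callers should test the result with `is not None` rather than plain truthiness.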
The core-object test provided by this embodiment determines the quantity of neighbor objects in each neighborhood directly from the distances between the neighbor objects and the object to be visited. The object to be visited is a core object if the quantity of neighbor objects in any one of its multiple neighborhoods is not less than the corresponding preset threshold. This is equivalent to relaxing the constraints on Eps (sweep radius) and MinPts (minimum number of contained objects). It therefore reduces the sensitivity of the clustering result to these two parameters and improves its accuracy.
The larger the sweep radius, the greater the probability of introducing noise points. Therefore, only the directly density-reachable objects within a specified neighborhood are added to the class of the object to be visited.
In an exemplary embodiment of the present disclosure, when an object is provided with multiple neighborhoods of increasing sweep radius, extended clustering may include the following steps, as shown in Fig. 8:
In step S510, all objects that are directly density-reachable from the object to be visited within a specified neighborhood are obtained. The sweep radius of the specified neighborhood is smaller than the largest sweep radius among the multiple neighborhoods;
All neighbor objects of the object P to be visited within the specified neighborhood are collected into the queue NeighborPts. To do this, the distance between each neighbor object P_i and P can be calculated directly and compared with the sweep radius of the specified neighborhood. If the distance is no greater than the sweep radius, the neighbor object belongs to queue NeighborPts. The distance may be based on cosine similarity, Euclidean distance, etc. Note that when the present disclosure uses cosine similarity to reflect the distance relationship between objects, the distance is characterized as 1 − cosine similarity (1 − cos θ). Thus, the smaller the distance between two objects, the greater their similarity.
Optionally, after the distance between each neighbor object and the object to be visited is obtained, the distances can be sorted by magnitude. Then, when counting the objects in a neighborhood, the sorted order is followed to collect the neighbor objects whose distance from the object to be visited is less than the sweep radius of the neighborhood, finally yielding queue NeighborPts.
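The sort-once lookup can be sketched with Python's standard `bisect` module. The sketch assumes distances of the form 1 − cos θ and neighbors identified by hashable, mutually comparable IDs. The function names are hypothetical. After a single sort, each neighborhood query is a binary search and a slice, with no need to rescan every object.

```python
from bisect import bisect_right


def sort_by_distance(similarities):
    """Sort neighbors once by distance d = 1 - cos(theta), ascending.

    similarities: mapping neighbor_id -> cosine similarity to the object.
    Returns (distances, ids) as two parallel lists.
    """
    pairs = sorted((1.0 - sim, nid) for nid, sim in similarities.items())
    return [d for d, _ in pairs], [nid for _, nid in pairs]


def neighbors_within(distances, ids, radius):
    """NeighborPts for one sweep radius: binary search instead of a full scan."""
    return ids[:bisect_right(distances, radius)]
```

Because `bisect_right` is used, a neighbor whose distance equals the radius is included, which matches the "no greater than" boundary convention.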
In step S520, each directly density-reachable object is checked in turn to see whether it is a core object. If it is, step S530 is executed; otherwise, step S540 is executed.
The objects P_i in queue NeighborPts are traversed, and each P_i is checked to see whether it is a core object. The check is similar to the one described above. First, the quantity of neighbor objects in the neighborhood of P_i is obtained from the weight coefficients and the distances between P_i and its neighbor objects. Then it is determined whether this quantity is not less than the corresponding preset threshold.
Suppose P_i is provided with a single neighborhood. When the quantity of its neighbor objects is not less than the preset threshold, the directly density-reachable object is determined to be a core object. When the quantity of neighbor objects in the neighborhood is less than the preset threshold, the directly density-reachable object is determined not to be a core object.
Suppose instead that object P_i is provided with multiple neighborhoods. In that case, it is determined whether at least one of them contains a quantity of neighbor objects not less than the corresponding preset threshold. If so, the directly density-reachable object is determined to be a core object. If the quantities in all of the multiple neighborhoods are less than their preset thresholds, it is determined not to be a core object.
When the directly density-reachable object is a core object, its neighbor objects within the specified neighborhood are added to the class of the object to be visited in step S530.
In step S540, it is determined whether all the directly density-reachable objects have been checked. If not, the method returns to step S520 to check whether the next directly density-reachable object is a core object. If all have been checked, this round of extended clustering ends.
During extended clustering, a queue can be created to store the objects to be visited. For example, when extended clustering is performed for object P, a queue is created, and the objects directly density-reachable from P within the specified neighborhood are added to it first, e.g., {P1, P2, P3, P4}. P1 is checked first. If it is a core object, the objects directly density-reachable from P1 within the specified neighborhood are added to P's class and also appended to the queue. The next object in the queue (e.g., P2) is then accessed and marked as visited, and it is checked to see whether it is a core object. If it is not a core object, it is checked to see whether it is already a member of another class. If it is not, it is added to the class of object P. The process continues with the next object in the queue until the queue is empty.
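The queue-based expansion restricted to the specified neighborhood can be sketched in Python with `collections.deque`. The sketch assumes a precomputed symmetric distance matrix and a caller-supplied core-object predicate, for example the multiple-neighborhood test. The function and parameter names are hypothetical. One simplification: it assigns each object to the class when the object is dequeued, whereas the text adds the neighbors of a core object to the class immediately. The final labels are the same whenever the objects involved are not already members of other classes.

```python
from collections import deque


def expand_in_specified_neighborhood(seed, dist, r_spec, is_core, labels, cluster_id):
    """Queue-based extended clustering restricted to a specified neighborhood.

    seed: index of core object P; dist: symmetric distance matrix;
    r_spec: sweep radius of the specified neighborhood (smaller than the
    largest radius); is_core: core-object predicate;
    labels: 0 = unclassified, -1 = noise, >0 = class label (modified in place).
    """
    def spec_neighbors(i):
        return [j for j, d in enumerate(dist[i]) if j != i and d <= r_spec]

    labels[seed] = cluster_id
    visited = {seed}
    queue = deque(spec_neighbors(seed))
    while queue:
        j = queue.popleft()
        if j in visited:
            continue
        visited.add(j)                       # mark as accessed
        if labels[j] > 0 and labels[j] != cluster_id:
            continue                         # already a member of another class
        labels[j] = cluster_id
        if is_core(j):
            queue.extend(spec_neighbors(j))  # only core objects keep extending
    return labels
```

A smaller `r_spec` keeps the expansion inside the more reliable inner neighborhood, at the cost of possibly splitting a cluster.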
The extended clustering method provided by this embodiment extends only the objects within the specified neighborhood, that is, only the objects within the neighborhood whose sweep radius is more reliable. This reduces the probability of introducing noise points and therefore improves the accuracy of the clustering result.
Fig. 9 is a schematic diagram of a clustering apparatus according to an exemplary embodiment. Referring to Fig. 9, the apparatus includes a first acquisition unit 100, a second acquisition unit 200, a judging unit 300, a clustering unit 400 and an extended clustering unit 500.
The first acquisition unit 100 is configured to obtain, for any object to be visited, all neighbor objects of that object;
The second acquisition unit 200 is configured to obtain the quantity of neighbor objects in the neighborhood of the object to be visited from the distances between the neighbor objects and the object to be visited and the corresponding weight coefficients. The weight coefficients are related to the distance between objects;
In an exemplary embodiment of the present disclosure, the second acquisition unit 200 may include a first acquisition subunit, a first determination subunit, a second acquisition subunit and a computation subunit;
The first acquisition subunit is configured to obtain the distances between the neighbor objects in the neighborhood and the object to be visited;
The first determination subunit is configured to determine, according to the distances, which of all the neighbor objects of the object to be visited lie in the neighborhood;
The second acquisition subunit is configured to obtain the weight coefficient corresponding to each distance, the weight coefficient being related to the distance between objects;
The computation subunit is configured to calculate, according to the weight coefficients, the quantity of neighbor objects in the neighborhood of the object to be visited.
The judging unit 300 is configured to determine, according to the quantity of neighbor objects in the neighborhood of the object to be visited, whether the object to be visited is a core object;
The clustering unit 400 is configured to assign the object to be visited to a class when it is a core object;
The extended clustering unit 500 is configured to perform extended clustering on the objects that are directly density-reachable from the object to be visited within its neighborhood, until no new object joins the class of the object to be visited.
The clustering apparatus provided by this embodiment calculates the quantity of neighbor objects of the object to be visited in its neighborhood. The calculation uses the distances between the neighbor objects and the object to be visited, together with the corresponding weight coefficients. An object closer to the object to be visited is converted into a larger fractional count and contributes more to the neighbor count of the neighborhood. Conversely, an object farther from the object to be visited is converted into a smaller fractional count and contributes less. To some extent, this reduces the sensitivity of the neighbor count to the sweep radius and to the minimum number of contained objects of the neighborhood. It therefore also reduces the sensitivity of the clustering result obtained by this clustering method to these two parameters and improves its accuracy.
In an exemplary embodiment of the present disclosure, the second acquisition subunit may include a statistics subunit, a query subunit and a third acquisition subunit;
The statistics subunit is configured to derive, by statistics over sample objects, the correspondence between the distance between two objects and the probability that the two objects are the same object;
The query subunit is configured to query the correspondence to obtain the probability that the two objects corresponding to the distance are the same object;
The third acquisition subunit is configured to obtain the weight coefficient corresponding to the distance from the product of the probability and the distance. The weight coefficient is positively correlated with the probability.
In the second acquisition subunit provided by this embodiment, the statistics subunit uses sample objects to derive a correspondence. The correspondence relates the distance between a neighbor object and the object to be visited to the probability that these two objects are the same object. The third acquisition subunit then obtains the weight coefficient for the distance from the product of the probability and the distance. The weight coefficient is thus tied to the probability that the two objects are the same object. The larger the probability, the larger the weight coefficient and the larger the object's contribution to the count, which improves the accuracy of the clustering result obtained by the clustering apparatus.
In an exemplary embodiment of the present disclosure, the first acquisition subunit may include a sorting subunit and a statistics subunit;
The sorting subunit is configured to sort the distances between the neighbor objects and the object to be visited by magnitude;
The statistics subunit is configured to follow the sorted distances and collect the neighbor objects whose distance from the object to be visited is less than the sweep radius of the neighborhood.
The first acquisition subunit provided by this embodiment obtains the distance between each neighbor object and the object to be visited and then sorts the distances by magnitude. As a result, the objects in a given neighborhood can be found with a suitable search method, and each search does not have to scan all objects, which improves search efficiency.
In an exemplary embodiment of the present disclosure, the object to be visited may be provided with multiple neighborhoods whose sweep radii increase successively. In that case, the judging unit 300 may include a first judgment subunit, a first determination subunit, a second judgment subunit and a second determination subunit;
The first judgment subunit is configured to take the multiple neighborhoods in ascending order of sweep radius and to determine, for one neighborhood at a time, whether the quantity of neighbor objects in it is not less than the corresponding preset threshold;
The first determination subunit is configured to determine that the object to be visited is a core object when the quantity of neighbor objects in that neighborhood is not less than the corresponding preset threshold;
The second judgment subunit is configured to act when the quantity of neighbor objects in that neighborhood is less than the corresponding preset threshold. It determines whether all of the multiple neighborhoods of the object to be visited have been examined. If they have not, it returns to determining, in ascending order of sweep radius, whether the quantity of neighbor objects in the next neighborhood is not less than the corresponding preset threshold;
The second determination subunit is configured to determine that the object to be visited is not a core object when all of the multiple neighborhoods have been examined.
The judging unit provided by this embodiment examines the multiple neighborhoods in ascending order of sweep radius and determines whether any one of them contains a quantity of neighbor objects not less than the corresponding preset threshold. This is equivalent to relaxing the constraints on Eps (sweep radius) and MinPts (minimum number of contained objects). It therefore reduces the sensitivity of the clustering result to these two parameters and improves its accuracy.
In an exemplary embodiment of the present disclosure, the object to be visited may be provided with multiple neighborhoods whose sweep radii increase successively. In that case, the extended clustering unit 500 may include a fourth acquisition subunit, a third judgment subunit and a clustering subunit;
The fourth acquisition subunit is configured to obtain all objects that are directly density-reachable from the object to be visited within a specified neighborhood. The sweep radius of the specified neighborhood is smaller than the largest sweep radius among the multiple neighborhoods;
The third judgment subunit is configured to check each directly density-reachable object in turn to see whether it is a core object;
The clustering subunit is configured to act when a directly density-reachable object is a core object. It adds that object's neighbor objects within the specified neighborhood to the class of the object to be visited, until no new object joins that class.
The extended clustering unit provided by this embodiment extends only the objects within the specified neighborhood, that is, only the objects within the neighborhood whose sweep radius is more reliable. This reduces the probability of introducing noise points and therefore improves the accuracy of the clustering result.
For the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the method embodiments and is not elaborated here.
Fig. 10 is a block diagram of a terminal device 800 for clustering according to an exemplary embodiment. For example, terminal device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, fitness equipment, a personal digital assistant, etc.
Referring to Fig. 10, terminal device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls the overall operation of terminal device 800, such as operations associated with display, telephone calls, data communication, camera operation and recording. The processing component 802 may include one or more processors 820 to execute instructions so as to complete all or some of the steps of the above method. In addition, the processing component 802 may include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation on the device 800. Examples of such data include instructions for any application or method operated on terminal device 800, contact data, phonebook data, messages, pictures, videos, etc. The memory 804 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof. Examples include static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk or an optical disc.
The power component 806 supplies power to the various components of terminal device 800. The power component 806 may include a power management system, one or more power sources, and other components associated with generating, managing and distributing power for terminal device 800.
The multimedia component 808 includes a screen that provides an output interface between terminal device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, it may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with it. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the device 800 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front or rear camera may be a fixed optical lens system or may have focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC). The microphone is configured to receive external audio signals when terminal device 800 is in an operating mode, such as a call mode, a recording mode or a speech recognition mode. The received audio signals may be further stored in the memory 804 or sent via the communication component 816. In some embodiments, the audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, etc. These buttons may include, but are not limited to, a home button, volume buttons, a power button and a lock button.
The sensor component 814 includes one or more sensors for providing status assessments of various aspects of terminal device 800. For example, the sensor component 814 may detect the open/closed state of the device 800 and the relative positioning of components, such as the display and keypad of terminal device 800. It may also detect a change in position of terminal device 800 or of one of its components, the presence or absence of user contact with terminal device 800, the orientation or acceleration/deceleration of terminal device 800, and changes in its temperature. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between terminal device 800 and other devices. Terminal device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In one exemplary embodiment, the communication component 816 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 further includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
In an exemplary embodiment, terminal device 800 may be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field-programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors or other electronic components, for performing the above method.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as the memory 804 including instructions. These instructions can be executed by the processor 820 of terminal device 800 to complete the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, etc.
A non-transitory computer-readable storage medium is also provided: when the instructions in the storage medium are executed by a processor of a mobile terminal, the mobile terminal is enabled to perform a clustering method, the method comprising:
For any object to be visited, obtaining all neighbor objects of the object to be visited;
Obtaining the number of neighbor objects in the neighborhood of the object to be visited according to the distances between the neighbor objects and the object to be visited and the corresponding weight coefficients, the weight coefficient being related to the distance between objects;
Determining, according to the number of neighbor objects in the neighborhood of the object to be visited, whether the object to be visited is a core object;
When the object to be visited is a core object, classifying the object to be visited into a class;
Performing extended clustering on the objects that are directly density-reachable from the object to be visited within its neighborhood, until no new object is added to the class in which the object to be visited is located.
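The method steps above can be illustrated with a DBSCAN-style loop in which the plain neighbor count is replaced by a sum of distance-dependent weight coefficients. This is a minimal sketch under stated assumptions, not the patented implementation: objects are assumed to be coordinate tuples compared by Euclidean distance (`math.dist`, Python 3.8+), the neighborhood is taken as distances strictly below `eps`, and the names `weighted_density_cluster`, `eps`, `min_weight` and `weight_fn` are hypothetical.

```python
import math
from collections import deque


def weighted_density_cluster(points, eps, min_weight, weight_fn):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise.

    Hypothetical sketch: weight_fn maps a distance to a weight coefficient,
    and a point is a core object when the weighted neighbor count
    reaches min_weight.
    """
    n = len(points)
    labels = [None] * n  # None = not yet visited

    def neighbors(i):
        # All other points strictly closer than eps to point i.
        return [j for j in range(n)
                if j != i and math.dist(points[i], points[j]) < eps]

    def is_core(i, nbrs):
        # Weighted neighbor count instead of a plain count.
        total = sum(weight_fn(math.dist(points[i], points[j])) for j in nbrs)
        return total >= min_weight

    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if not is_core(i, nbrs):
            labels[i] = -1  # provisional noise; may later become a border object
            continue
        labels[i] = cluster_id  # a core object starts a new class
        queue = deque(nbrs)
        while queue:  # expand until no new object joins the class
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster_id  # border object, not expanded further
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_nbrs = neighbors(j)
            if is_core(j, j_nbrs):
                queue.extend(j_nbrs)  # directly density-reachable objects
        cluster_id += 1
    return labels
```

With `weight_fn=lambda d: 1.0` the weighted count reduces to an ordinary neighbor count, so the sketch behaves like classic DBSCAN (excluding the object itself from the count); a weight below 1.0 makes it harder for sparse points to qualify as core objects.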
Other embodiments of the present invention will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present invention that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be considered exemplary only, with the true scope and spirit of the invention being indicated by the following claims.
It should be understood that the present invention is not limited to the precise structures described above and illustrated in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present invention is limited only by the appended claims.
It should be noted that herein, the relational terms of such as " first " and " second " or the like be used merely to by One entity or operation are made a distinction with another entity or operation, and not necessarily require or imply these entities or behaviour There is any this actual relation or order between work.And, term " inclusion ", "comprising" or its any other Variant is intended to comprising of nonexcludability, so that including a series of process of key elements, method, article or equipment Not only include those key elements, but also include other key elements being not expressly set out, or also include for this process, Method, article or the intrinsic key element of equipment.In the absence of more restrictions, by sentence "including a ..." It is not excluded that also there is other phase in process, method, article or the equipment including described key element in the key element limiting Same key element.
The above are only specific embodiments of the present disclosure, provided to enable those skilled in the art to understand or implement the disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the disclosure. Therefore, the disclosure is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. A clustering method, characterized by comprising:
For any object to be visited, obtaining all neighbor objects of the object to be visited;
Obtaining the number of neighbor objects in the neighborhood of the object to be visited according to the distances between the neighbor objects and the object to be visited and the corresponding weight coefficients, the weight coefficient being related to the distance between objects;
Determining, according to the number of neighbor objects in the neighborhood of the object to be visited, whether the object to be visited is a core object;
When the object to be visited is a core object, classifying the object to be visited into a class;
Performing extended clustering on the objects directly density-reachable from the object to be visited within its neighborhood, until no new object is added to the class in which the object to be visited is located;
Wherein obtaining the number of neighbor objects in the neighborhood of the object to be visited according to the distances between the neighbor objects and the object to be visited and the corresponding weight coefficients is performed as follows:
Obtaining the distances between the neighbor objects in the neighborhood and the object to be visited;
Determining, according to the distances, the neighbor objects in the neighborhood from among all neighbor objects of the object to be visited;
Obtaining the weight coefficients corresponding to the distances, the weight coefficient being related to the distance between objects;
Calculating, according to the weight coefficients, the number of neighbor objects in the neighborhood of the object to be visited;
Wherein obtaining the weight coefficients corresponding to the distances is performed as follows:
Obtaining, through statistics on sample objects, a correspondence between the distance between two objects and the probability that the two objects are the same target;
Querying the correspondence to obtain the probability that the two objects corresponding to the distance are the same object;
Obtaining the weight coefficient corresponding to the distance according to the product of the probability and the distance, the weight coefficient being positively correlated with the probability.
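The weight-coefficient steps of claim 1 might be sketched as follows. The sketch assumes the "correspondence" is a histogram over distance bins built from labeled sample pairs `(distance, is_same_target)`; the bin edges, the function names, and the choice of probability 0 for distances outside the table are hypothetical. The weight is taken literally as the product of the looked-up probability and the distance, as claim 1 words it, so at a fixed distance it is positively correlated with the probability; other readings of "according to the product" are possible.

```python
import bisect


def build_probability_table(sample_pairs, bin_edges):
    """Estimate P(same target | distance bin) from labeled sample pairs.

    sample_pairs: iterable of (distance, is_same_target) tuples.
    bin_edges: ascending edges; bin k covers [bin_edges[k], bin_edges[k+1]).
    """
    n_bins = len(bin_edges) - 1
    same = [0] * n_bins
    total = [0] * n_bins
    for d, is_same in sample_pairs:
        k = bisect.bisect_right(bin_edges, d) - 1
        if 0 <= k < n_bins:  # samples outside the edges are ignored
            total[k] += 1
            same[k] += int(is_same)
    return [s / t if t else 0.0 for s, t in zip(same, total)]


def lookup_probability(d, bin_edges, probs):
    """Query the correspondence; distances outside the table get probability 0."""
    k = bisect.bisect_right(bin_edges, d) - 1
    return probs[k] if 0 <= k < len(probs) else 0.0


def weight_coefficient(d, bin_edges, probs):
    # Literal reading of claim 1: weight from the product of probability and distance.
    return lookup_probability(d, bin_edges, probs) * d


def weighted_neighbor_count(distances, radius, bin_edges, probs):
    """Sum of weight coefficients over neighbors strictly inside the radius."""
    return sum(weight_coefficient(d, bin_edges, probs)
               for d in distances if d < radius)
```

Building the table once from sample data and then answering each distance with a binary search keeps the per-neighbor cost of the weighted count small.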
2. The method according to claim 1, characterized in that the object to be visited is provided with multiple neighborhoods whose scanning radii increase successively from smallest to largest;
Determining, according to the number of neighbor objects in the neighborhood of the object to be visited, whether the object to be visited is a core object is performed as follows:
Judging, in ascending order of the scanning radius, whether the number of neighbor objects in the neighborhood is not less than the corresponding preset threshold;
When the number of neighbor objects in the neighborhood is not less than the corresponding preset threshold, determining that the object to be visited is a core object;
When the number of neighbor objects in the current neighborhood is less than the corresponding preset threshold, judging whether all of the multiple neighborhoods of the object to be visited have been judged;
When not all of the multiple neighborhoods have been judged, performing the step of judging, in ascending order of the scanning radius, whether the number of neighbor objects in the next neighborhood is not less than the corresponding preset threshold;
When all of the multiple neighborhoods have been judged, determining that the object to be visited is not a core object.
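Claim 2's core-object test over multiple neighborhoods could look like the sketch below. It assumes the distances from the object to be visited to all of its neighbor objects are already known, that `radii` is sorted in ascending order with one preset threshold per radius, and that the (optionally weighted) count uses distances strictly below each radius; `is_core_object` and `weight_fn` are hypothetical names.

```python
def is_core_object(distances, radii, thresholds, weight_fn=lambda d: 1.0):
    """Multi-neighborhood core test (hypothetical sketch).

    distances: distances from the object to be visited to all its neighbor objects.
    radii: scanning radii in ascending order; thresholds: preset threshold per radius.
    """
    if list(radii) != sorted(radii) or len(radii) != len(thresholds):
        raise ValueError("radii must be ascending, with one threshold per radius")
    for radius, threshold in zip(radii, thresholds):
        # (Weighted) number of neighbor objects in this neighborhood.
        count = sum(weight_fn(d) for d in distances if d < radius)
        if count >= threshold:
            return True  # core at this neighborhood; larger ones need not be checked
    return False  # every neighborhood was judged and none reached its threshold
```

Checking the smallest radius first means a point in a dense region is confirmed as a core object early, while a point in a sparser region still gets a chance at a larger radius with its own threshold.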
3. The method according to claim 2, characterized in that performing extended clustering on the objects directly density-reachable from the object to be visited within its neighborhood, until no new object is added to the class in which the object to be visited is located, is performed as follows:
Obtaining all objects directly density-reachable from the object to be visited within a specified neighborhood, the scanning radius of the specified neighborhood being smaller than the maximum scanning radius among the multiple neighborhoods;
Judging one by one whether each directly density-reachable object is a core object;
When a directly density-reachable object is a core object, adding the neighbor objects of that directly density-reachable object within the specified neighborhood to the class in which the object to be visited is located, until no new object is added to that class.
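The expansion of claim 3 can be sketched as a breadth-first search restricted to the specified neighborhood. Here `neighbors_in(obj, radius)` and `is_core(obj)` are hypothetical callbacks standing in for the neighborhood query and the multi-neighborhood core test, and the seed is assumed to have already passed the core test. The guard enforces the claim's condition that the specified radius be smaller than the maximum scanning radius.

```python
from collections import deque


def expand_cluster(seed, neighbors_in, is_core, specified_radius, max_radius):
    """Grow the class of a core seed within the specified neighborhood.

    neighbors_in(obj, radius) -> iterable of neighbor objects (hypothetical callback).
    is_core(obj) -> bool (hypothetical callback).
    """
    if specified_radius >= max_radius:
        raise ValueError("specified radius must be smaller than the maximum scan radius")
    cluster = {seed}
    frontier = deque([seed])
    while frontier:  # stops once no new object joins the class
        obj = frontier.popleft()
        # The seed is already known to be core; other objects are judged one by one.
        if obj != seed and not is_core(obj):
            continue  # non-core objects stay in the class but are not expanded
        for nb in neighbors_in(obj, specified_radius):
            if nb not in cluster:
                cluster.add(nb)
                frontier.append(nb)
    return cluster
```

Using a set for membership keeps each "is it already in the class?" check constant-time, and the queue guarantees every newly added object is judged exactly once.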
4. The method according to claim 1, characterized in that determining, according to the distances, the neighbor objects in the neighborhood from among all neighbor objects of the object to be visited is performed as follows:
Sorting the distances between each neighbor object and the object to be visited by magnitude;
According to the sorted distances, counting the neighbor objects whose distance to the object to be visited is less than the scanning radius of the neighborhood.
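Claim 4's sort-then-count step can be sketched with Python's standard `bisect` module. After one sort, the neighbors inside any scanning radius form a prefix of the sorted list, so each radius of a multi-neighborhood scheme needs only one binary search. The function names are hypothetical, and the strict "less than" comparison follows the claim wording.

```python
import bisect


def neighbors_within_radius(distances, radius):
    """Return indices of neighbors whose distance to the object to be visited is < radius."""
    # Sort neighbor indices by their distance (ascending).
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    sorted_d = [distances[i] for i in order]
    # bisect_left gives the number of sorted distances strictly below radius.
    cut = bisect.bisect_left(sorted_d, radius)
    return order[:cut]


def counts_for_radii(distances, radii):
    """Count neighbors inside each radius: a single sort plus one binary search per radius."""
    sorted_d = sorted(distances)
    return [bisect.bisect_left(sorted_d, r) for r in radii]
```

For example, with distances `[2.5, 0.4, 1.1, 0.9]` and radius 1.0, the neighbors at indices 1 and 3 fall inside the neighborhood.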
5. A clustering apparatus, characterized by comprising:
A first acquisition unit, configured to obtain, for any object to be visited, all neighbor objects of the object to be visited;
A second acquisition unit, configured to obtain the number of neighbor objects in the neighborhood of the object to be visited according to the distances between the neighbor objects and the object to be visited and the corresponding weight coefficients, the weight coefficient being related to the distance between objects;
A judging unit, configured to determine, according to the number of neighbor objects in the neighborhood of the object to be visited, whether the object to be visited is a core object;
A clustering unit, configured to classify the object to be visited into a class when the object to be visited is a core object;
An extended clustering unit, configured to perform extended clustering on the objects directly density-reachable from the object to be visited within its neighborhood, until no new object is added to the class in which the object to be visited is located;
Wherein the second acquisition unit comprises:
A first acquisition subunit, configured to obtain the distances between the neighbor objects in the neighborhood and the object to be visited;
A first determination subunit, configured to determine, according to the distances, the neighbor objects in the neighborhood from among all neighbor objects of the object to be visited;
A second acquisition subunit, configured to obtain the weight coefficients corresponding to the distances, the weight coefficient being related to the distance between objects;
A calculation subunit, configured to calculate, according to the weight coefficients, the number of neighbor objects in the neighborhood of the object to be visited;
The second acquisition subunit comprises:
A statistics subunit, configured to obtain, through statistics on sample objects, a correspondence between the distance between two objects and the probability that the two objects are the same target;
A query subunit, configured to query the correspondence to obtain the probability that the two objects corresponding to the distance are the same object;
A third acquisition subunit, configured to obtain the weight coefficient corresponding to the distance according to the product of the probability and the distance, the weight coefficient being positively correlated with the probability.
6. The apparatus according to claim 5, characterized in that the object to be visited is provided with multiple neighborhoods, and the scanning radii of the multiple neighborhoods are set to increase successively from smallest to largest; the judging unit comprises:
A first judging subunit, configured to judge, in ascending order of the scanning radii of the multiple neighborhoods, whether the number of neighbor objects in a given neighborhood is not less than the corresponding preset threshold;
A first determination subunit, configured to determine that the object to be visited is a core object when the number of neighbor objects in the given neighborhood is not less than the corresponding preset threshold;
A second judging subunit, configured to judge, when the number of neighbor objects in the given neighborhood is less than the corresponding preset threshold, whether all of the multiple neighborhoods of the object to be visited have been judged, and, when not all of them have been judged, to return to judging, in ascending order of the scanning radius, whether the number of neighbor objects in the next neighborhood is not less than the corresponding preset threshold;
A second determination subunit, configured to determine that the object to be visited is not a core object when all of the multiple neighborhoods have been judged.
7. The apparatus according to claim 6, characterized in that the extended clustering unit comprises:
A fourth acquisition subunit, configured to obtain all objects directly density-reachable from the object to be visited within a specified neighborhood, the scanning radius of the specified neighborhood being smaller than the maximum scanning radius among the multiple neighborhoods;
A third judging subunit, configured to judge one by one whether each directly density-reachable object is a core object;
A clustering subunit, configured to, when a directly density-reachable object is a core object, add the neighbor objects of that object within the specified neighborhood to the class in which the object to be visited is located, until no new object is added to that class.
8. The apparatus according to claim 5, characterized in that the first determination subunit comprises:
A sorting subunit, configured to sort the distances between each neighbor object and the object to be visited by magnitude;
A statistics subunit, configured to count, according to the sorted distances, the neighbor objects whose distance to the object to be visited is less than the scanning radius of the neighborhood.
9. A terminal device, characterized by comprising:
A processor;
A memory for storing processor-executable instructions;
Wherein the processor is configured to:
For any object to be visited, obtain all neighbor objects of the object to be visited;
Obtain the number of neighbor objects in the neighborhood of the object to be visited according to the distances between the neighbor objects and the object to be visited and the corresponding weight coefficients, the weight coefficient being related to the distance between objects;
Determine, according to the number of neighbor objects in the neighborhood of the object to be visited, whether the object to be visited is a core object;
When the object to be visited is a core object, classify the object to be visited into a class;
Perform extended clustering on the objects directly density-reachable from the object to be visited within its neighborhood, until no new object is added to the class in which the object to be visited is located;
Wherein the number of neighbor objects in the neighborhood of the object to be visited is obtained according to the distances between the neighbor objects and the object to be visited and the corresponding weight coefficients as follows:
Obtain the distances between the neighbor objects in the neighborhood and the object to be visited;
Determine, according to the distances, the neighbor objects in the neighborhood from among all neighbor objects of the object to be visited;
Obtain the weight coefficients corresponding to the distances, the weight coefficient being related to the distance between objects;
Calculate, according to the weight coefficients, the number of neighbor objects in the neighborhood of the object to be visited;
Wherein the weight coefficients corresponding to the distances are obtained as follows:
Obtain, through statistics on sample objects, a correspondence between the distance between two objects and the probability that the two objects are the same target;
Query the correspondence to obtain the probability that the two objects corresponding to the distance are the same object;
Obtain the weight coefficient corresponding to the distance according to the product of the probability and the distance, the weight coefficient being positively correlated with the probability.
CN201410073353.4A 2014-02-28 2014-02-28 Clustering method and device and terminal device Active CN103902654B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410073353.4A CN103902654B (en) 2014-02-28 2014-02-28 Clustering method and device and terminal device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410073353.4A CN103902654B (en) 2014-02-28 2014-02-28 Clustering method and device and terminal device

Publications (2)

Publication Number Publication Date
CN103902654A CN103902654A (en) 2014-07-02
CN103902654B true CN103902654B (en) 2017-02-08

Family

ID=50993977

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410073353.4A Active CN103902654B (en) 2014-02-28 2014-02-28 Clustering method and device and terminal device

Country Status (1)

Country Link
CN (1) CN103902654B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103902655B (en) * 2014-02-28 2017-01-04 Xiaomi Technology Co., Ltd. Clustering method, device and terminal device
CN108038500B (en) * 2017-12-07 2020-07-03 Neusoft Corporation Clustering method, apparatus, computer device, storage medium, and program product
CN108108760A (en) * 2017-12-19 2018-06-01 Shandong University Fast face recognition method
CN108960298A (en) * 2018-06-15 2018-12-07 Chongqing University Physical examination report clustering method based on density core and dynamic scanning radius
CN109541654A (en) * 2018-11-19 2019-03-29 北京金州世纪信息技术有限公司 (Beijing Jinzhou Century Information Technology Co., Ltd.) Method and device for calculating vehicle parking points
CN110097126B (en) * 2019-05-07 2023-04-21 江苏优聚思信息技术有限公司 (Jiangsu Youjusi Information Technology Co., Ltd.) Method for checking missed registrations of key personnel and housing based on the DBSCAN clustering algorithm
CN111582306A (en) * 2020-03-30 2020-08-25 Nanchang University Near-duplicate image matching method based on keypoint graph representation
CN111709473B (en) * 2020-06-16 2023-09-19 Tencent Technology (Shenzhen) Co., Ltd. Clustering method and device for object features

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102670251A (en) * 2012-05-29 2012-09-19 飞依诺科技(苏州)有限公司 (VINNO Technology (Suzhou) Co., Ltd.) Ultrasonic image spatial filtering method based on numerically sorted weighted averaging
US8429153B2 (en) * 2010-06-25 2013-04-23 The United States Of America As Represented By The Secretary Of The Army Method and apparatus for classifying known specimens and media using spectral properties and identifying unknown specimens and media
CN103679148A (en) * 2013-12-11 2014-03-26 Harbin Institute of Technology Shenzhen Graduate School Crowd gathering and dispersal detection method and device based on corner-clustering weighted area

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
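"Research on Improved Density Clustering Algorithms" (改进的密度聚类算法研究); Yu Zhihang (于智航); CNKI China Excellent Master's Theses Full-text Database; 2008-06-15; p. 12, last paragraph; pp. 17-19; p. 20, last paragraph, to p. 21, first paragraph; p. 26, last paragraph; Fig. 3.4 *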

Also Published As

Publication number Publication date
CN103902654A (en) 2014-07-02

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant