WO2015135276A1 - Clustering method and related apparatus - Google Patents

Clustering method and related apparatus

Info

Publication number
WO2015135276A1
Authority
WO
WIPO (PCT)
Prior art keywords
class
classes
distance
objects
rank
Prior art date
Application number
PCT/CN2014/082876
Other languages
English (en)
French (fr)
Inventor
陈志军
张涛
张波
王琳
Original Assignee
小米科技有限责任公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 小米科技有限责任公司
Priority to MX2014010879A (patent MX358804B, es)
Priority to RU2015129676A (patent RU2628167C2, ru)
Priority to JP2016506778A (patent JP6101399B2, ja)
Priority to KR1020147026527A (patent KR20150117202A, ko)
Priority to US14/532,271 (patent US10037345B2, en)
Publication of WO2015135276A1 (zh)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification

Definitions

  • The present disclosure relates to the field of computer technologies, and in particular to a clustering method and a related apparatus.
  • Background
  • Clustering is the process of dividing a collection of physical or abstract objects into multiple classes of similar objects, that is, the process of assigning objects to different classes (clusters).
  • Objects in the same class are highly similar to one another, while objects in different classes are highly dissimilar.
  • The concept of "class" is used below. Note that "class" and "cluster" have the same meaning in this document.
  • When a clustering method is used to classify face images, the pictures belonging to the same person are grouped into one class. The related clustering method uses the Rank-Order distance to measure the similarity between two faces, so that pictures of the same person can be gathered together.
  • The accuracy of the clustering result of such a clustering method is very low.
  • Summary of the invention
  • A clustering method, including: performing iterative merging of classes according to the Rank-Order distance between classes; obtaining, from the distances between the objects within a class, the intra-class aggregation degree of each iteratively merged class; for each class obtained by the iterative merging, dividing the objects whose distance to one another is less than the intra-class aggregation degree into a new class, and updating the number of classes; when the updated number of classes is less than the number of classes before the update, returning to the step of performing iterative merging of classes according to the Rank-Order distance between classes, until the number of classes is unchanged before and after the update; and obtaining a clustering result, the clustering result including classes containing a plurality of objects and classes containing a single object.
  • Obtaining the intra-class aggregation degree of the iteratively merged class from the distances between the objects in the class is performed as follows: obtaining the distance between each pair of objects in the class; and calculating the average of the distances between the objects in the class to obtain the intra-class aggregation degree of the class.
  • Alternatively, obtaining the intra-class aggregation degree of the iteratively merged class from the distances between the objects in the class is performed as follows: obtaining the distance between each pair of objects in the class; calculating the average of the distances between the objects in the class; and normalizing the average to obtain the intra-class aggregation degree of the class.
  • For each class obtained by the iterative merging, dividing the objects whose distance to one another is less than the intra-class aggregation degree into a new class and updating the number of classes is performed as follows: marking as connected the objects in the class whose distance to one another is less than the intra-class aggregation degree; determining the connected components within the class according to the connectivity marks; and splitting the class into new classes according to the connected components, and updating the number of classes.
  • Performing the iterative merging of classes according to the Rank-Order distance between classes is performed as follows: obtaining the inter-class Rank-Order distance and the inter-class Rank-Order normalized distance; merging classes when the inter-class Rank-Order distance is less than a distance threshold and the inter-class Rank-Order normalized distance is less than 1; and, when the number of classes after merging is less than the number of classes before merging, returning to the step of obtaining the inter-class Rank-Order distance and the inter-class Rank-Order normalized distance for the merged classes.
  • A clustering apparatus, including: an iterative merging unit, configured to perform iterative merging of classes according to the Rank-Order distance between classes; an acquiring unit, configured to obtain, from the distances between the objects in a class, the intra-class aggregation degree of each iteratively merged class; and a dividing unit, configured to divide, for each class obtained by the iterative merging, the objects whose distance to one another is less than the intra-class aggregation degree into a new class.
  • The acquiring unit includes: a first acquiring subunit, configured to acquire the distance between each pair of objects in the class; and a first calculating subunit, configured to calculate the average of the distances between the objects in the class to obtain the intra-class aggregation degree.
  • Alternatively, the acquiring unit includes: a second acquiring subunit, configured to acquire the distance between each pair of objects in the class; a second calculating subunit, configured to calculate the average of the distances between the objects in the class; and a normalizing subunit, configured to normalize the average to obtain the intra-class aggregation degree of the class.
  • The dividing unit includes: a first judging subunit, configured to judge whether the distance between objects in the class is less than the intra-class aggregation degree; a marking subunit, configured to mark as connected the objects whose distance to one another is less than the intra-class aggregation degree; a determining subunit, configured to determine the connected components in the class according to the connectivity marks; and a splitting subunit, configured to split the class into new classes according to the connected components and update the number of classes.
  • The iterative merging unit includes: a third acquiring subunit, configured to obtain the inter-class Rank-Order distance and the inter-class Rank-Order normalized distance; a merging subunit, configured to merge classes when the inter-class Rank-Order distance is less than a distance threshold and the inter-class Rank-Order normalized distance is less than 1; and a second judging subunit, configured to, when the number of classes after merging is less than the number of classes before merging, control the third acquiring subunit to obtain the updated inter-class Rank-Order distance and inter-class Rank-Order normalized distance.
  • A terminal device, including: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to: perform iterative merging of classes according to the Rank-Order distance between classes; obtain, from the distances between the objects in a class, the intra-class aggregation degree of each iteratively merged class; for each class obtained by the iterative merging, divide the objects whose distance to one another is less than the intra-class aggregation degree into a new class, and update the number of classes; when the updated number of classes is less than the number of classes before the update, return to the step of performing iterative merging of classes according to the Rank-Order distance between classes, until the number of classes is unchanged before and after the update; and obtain a clustering result, the clustering result including classes containing a plurality of objects and classes containing a single object.
  • The technical solutions provided by the embodiments of the present disclosure may include the following beneficial effects: the clustering method first uses the inter-class Rank-Order distance to merge the eligible classes, reducing the number of classes. It then calculates the intra-class aggregation degree from the distances between the objects in each class, and splits the objects whose distance to one another is less than the intra-class aggregation degree into new classes, until all classes have been split.
  • The split classes are then iteratively merged and split again, until no class can be split further, and the classes containing multiple objects and the classes containing a single object are determined. Objects with relatively large dissimilarity are thereby eliminated during clustering, improving the accuracy of the clustering result. In particular, when there are many objects in the data set but few objects belonging to the same class, the accuracy of the clustering result is relatively high.
  • FIG. 1 is a sequence diagram of a plurality of objects
  • FIG. 2 is a flowchart of a clustering method according to an exemplary embodiment
  • FIG. 3 is a flowchart of an exemplary embodiment of step S110 of FIG. 2
  • FIG. 4 is a flowchart of another exemplary embodiment of step S110 of FIG. 2
  • FIG. 5 is a flowchart of an exemplary embodiment of step S120 of FIG. 2
  • FIG. 6 is a diagram of step S130 of FIG. 2
  • FIG. 7 is a block diagram of a clustering apparatus according to an exemplary embodiment
  • FIG. 8 is a block diagram of a terminal device according to an exemplary embodiment
  • Assume there are n objects, namely $i_1, i_2, i_3, i_4, i_5, i_6, \ldots, i_n$
  • Taking object $i_1$ as the reference object, calculate the distance between each other object and $i_1$, and sort by magnitude of distance to obtain the sequence shown in FIG. 1;
  • Taking object $i_2$ as the reference object, calculate the distance between each of the other objects and the reference object to obtain the corresponding sequence shown in FIG. 1.
  • $D^R$ represents the normalized Rank-Order distance between objects
  • The Rank-Order distance between classes is calculated with the same algorithm as the Rank-Order distance between objects
  • One class is taken as the reference class, and the other classes are then sorted by their distance to it.
  • $C_i$ and $C_j$ each represent a class.
  • The formula for calculating the Rank-Order distance between classes is shown in equation (4):
  • $D(C_i, C_j)$ represents the asymmetric Rank-Order distance between the class $C_i$ and the class $C_j$
  • $D(C_j, C_i)$ represents the asymmetric Rank-Order distance between the class $C_j$ and the class $C_i$
  • $O_{C_i}(C_j)$ indicates the sequence number of the class $C_j$ in the sequence with $C_i$ as the reference class
  • $O_{C_j}(C_i)$ indicates the sequence number of the class $C_i$ in the sequence with the class $C_j$ as the reference class.
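The body of equation (4) did not survive in this text. The following is a minimal sketch that assumes the commonly published form of the Rank-Order distance, $D(a, b) = \sum_{i=0}^{O_a(b)} O_b(f_a(i))$ and $D^R(a, b) = \frac{D(a, b) + D(b, a)}{\min(O_a(b), O_b(a))}$, which is consistent with the symbols defined above but is not confirmed by the source. The function names are hypothetical. Since the description states that classes use the same algorithm as objects, the same functions can be fed an inter-class distance matrix.

```python
from typing import Dict, List, Sequence

Matrix = Sequence[Sequence[float]]


def neighbor_order(dist: Matrix, ref: int) -> List[int]:
    """Indices of all items sorted by distance from `ref`; `ref` itself comes first."""
    return sorted(range(len(dist)), key=lambda j: (dist[ref][j], j != ref, j))


def asymmetric_distance(order_a: List[int], rank_b: Dict[int, int], b: int) -> int:
    """D(a, b): sum of the ranks, in b's list, of a's neighbors up to and including b."""
    upto = order_a.index(b)
    return sum(rank_b[order_a[i]] for i in range(upto + 1))


def rank_order_distance(dist: Matrix, a: int, b: int) -> float:
    """Symmetric, normalized Rank-Order distance between two distinct items a and b."""
    order_a = neighbor_order(dist, a)
    order_b = neighbor_order(dist, b)
    rank_a = {item: r for r, item in enumerate(order_a)}
    rank_b = {item: r for r, item in enumerate(order_b)}
    d_ab = asymmetric_distance(order_a, rank_b, b)
    d_ba = asymmetric_distance(order_b, rank_a, a)
    # For a != b both ranks are at least 1, so the denominator is never zero.
    return (d_ab + d_ba) / min(rank_a[b], rank_b[a])
```

With four points on a line at positions 0, 1, 2 and 10, neighbors that share most of their nearest items get a small value, while the isolated point gets a larger one.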
  • The normalized Rank-Order distance $D^N(C_i, C_j)$ is calculated according to the inter-class distance $D^R(C_i, C_j)$, where the formula for calculating the normalized distance between classes is shown in equation (5):
  • $$D^N(C_i, C_j) = \frac{1}{\phi(C_i, C_j)}\, d(C_i, C_j), \qquad \phi(C_i, C_j) = \frac{1}{|C_i| + |C_j|} \sum_{a \in C_i \cup C_j} \frac{1}{K} \sum_{k=1}^{K} d\big(a, f_a(k)\big) \qquad (5)$$
  • $d(C_i, C_j)$ represents the distance between the class $C_i$ and the class $C_j$
  • $|C_i|$ and $|C_j|$ represent the number of objects in each class
  • $K$ is a constant
  • $f_a(k)$ represents the k-th neighbor object of object $a$
  • $\phi(C_i, C_j)$ represents the average distance between the nearest K objects in the two classes.
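The normalized distance of equation (5) can be sketched as follows, reading the symbol definitions above as: divide the distance between the two classes by the average distance from each member to its K nearest neighbors. Two points are assumptions not stated in the source: the class-to-class distance is taken as the smallest pairwise distance between members, and neighbors are searched over the whole data set. The function names are hypothetical.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def knn_mean_distance(dist: Matrix, a: int, k: int) -> float:
    """Average distance from object a to its k nearest neighbors (a itself excluded)."""
    others = sorted(dist[a][j] for j in range(len(dist)) if j != a)
    nearest = others[:k]
    return sum(nearest) / len(nearest)


def normalized_distance(dist: Matrix, ci: Sequence[int], cj: Sequence[int], k: int) -> float:
    """D^N(Ci, Cj) = d(Ci, Cj) / phi(Ci, Cj) for two disjoint classes of object indices."""
    members = list(ci) + list(cj)
    # phi: mean, over all members of both classes, of the K-nearest-neighbor distance.
    phi = sum(knn_mean_distance(dist, a, k) for a in members) / len(members)
    # Assumption: the class-to-class distance is the closest pair of members.
    d_min = min(dist[a][b] for a in ci for b in cj)
    return d_min / phi  # raises ZeroDivisionError if every neighbor distance is 0
```

A value below 1 means the two classes are closer to each other than their members typically are to their own nearest neighbors, which is why the merge condition compares this quantity with 1.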
  • The object is a face image
  • The clustering method provided by the present disclosure is capable of grouping images belonging to the same person into one cluster. The features in the face image are converted into a set of vectors, so the distance between objects is the distance between vectors.
  • The clustering method provided by the present disclosure can also be applied to other data.
  • In step S110, iterative merging of classes is performed according to the Rank-Order distance between classes. The Rank-Order distance between each two classes is calculated, and the classes whose Rank-Order distance is less than the first distance threshold are merged.
  • The first distance threshold may be determined according to the data type, or according to test results.
  • Step S110 may include the following steps:
  • In step S111, the inter-class Rank-Order distance and the inter-class Rank-Order normalized distance are obtained.
  • The number of initial face images is N
  • The number of initial classes is N
  • The distance threshold t and the constant K are set.
  • The inter-class Rank-Order distance $D^R(C_i, C_j)$ and the inter-class normalized Rank-Order distance $D^N(C_i, C_j)$ are calculated.
  • Since the number of initial classes is N, an N×N $D^R(C_i, C_j)$ matrix and an N×N $D^N(C_i, C_j)$ matrix are finally obtained. Each element of the $D^R(C_i, C_j)$ matrix represents the Rank-Order distance between the corresponding classes; for example, the element in row i, column j represents the Rank-Order distance between the classes $C_i$ and $C_j$, and the element in row i, column j of the $D^N(C_i, C_j)$ matrix represents the Rank-Order normalized distance between $C_i$ and $C_j$.
  • In step S112, when the inter-class Rank-Order distance is less than the distance threshold and the inter-class Rank-Order normalized distance is less than 1, the classes are merged.
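Step S112 can be sketched as below, given the two N×N matrices from step S111. The source only states the pairwise condition; merging transitively (if A merges with B and B with C, all three end up together) via a small union-find is an assumption of this sketch, and `merge_classes` and its parameter names are hypothetical.

```python
from typing import Dict, List, Sequence

Matrix = Sequence[Sequence[float]]


def merge_classes(classes: List[List[int]], d_r: Matrix, d_n: Matrix, t: float) -> List[List[int]]:
    """Merge every pair of classes with Rank-Order distance < t and normalized distance < 1."""
    parent = list(range(len(classes)))

    def find(x: int) -> int:
        # Follow parent links to the representative, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(len(classes)):
        for j in range(i + 1, len(classes)):
            if d_r[i][j] < t and d_n[i][j] < 1:
                parent[find(j)] = find(i)

    merged: Dict[int, List[int]] = {}
    for idx, members in enumerate(classes):
        merged.setdefault(find(idx), []).extend(members)
    return list(merged.values())
```

After a merge pass the two matrices would be recomputed for the merged classes before the next pass, as the method summary describes.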
  • In step S120, the intra-class aggregation degree of each iteratively merged class is calculated from the distances between the objects in the class.
  • Step S120 may include the following steps:
  • In step S121, the distance between each pair of objects in the class is obtained.
  • The distance between objects may be based on cosine similarity, the Euclidean distance, or the Jaccard distance. Note that when cosine similarity is used in the present disclosure to calculate the distance between objects, the distance is defined as $1 - \cos\theta$, so that the smaller the distance between objects, the greater their similarity.
  • In step S122, the average of the distances between the objects in the class is calculated to obtain the intra-class aggregation degree of the class. Assuming there are n objects in the class, the distances between every two objects in the class give an n×n distance matrix d. Each element of the matrix is the distance between the corresponding two objects; for example, element $d_{ij}$ is the distance between the i-th object and the j-th object in the class. This step calculates the average of the elements of matrix d.
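Steps S121 and S122 can be illustrated with a short sketch that uses the $1 - \cos\theta$ distance and averages it over a class. Averaging over unordered pairs of distinct objects (rather than over all n×n entries, which would include the zero diagonal) is an assumption of this sketch, as are the function names.

```python
import math
from itertools import combinations
from typing import Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Distance defined as 1 - cos(theta) between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm


def aggregation_degree(vectors: Sequence[Sequence[float]]) -> float:
    """Mean pairwise distance inside one class; 0.0 for a class with a single object."""
    pairs = list(combinations(range(len(vectors)), 2))
    if not pairs:
        return 0.0
    total = sum(cosine_distance(vectors[i], vectors[j]) for i, j in pairs)
    return total / len(pairs)
```

A small value means the objects in the class point in nearly the same direction in feature space, that is, the class is tightly aggregated.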
  • Alternatively, step S120 may include the following steps: In step S123, the distance between each pair of objects in the class is obtained. In step S124, the average of the distances between the objects in the class is calculated. In step S125, the average is normalized to obtain the intra-class aggregation degree of the class. The distance average d_aver is normalized into a range [dleft, dright], where dleft and dright are thresholds; for example, dleft can be 0.6 and dright can be 0.75. For example, the normalization formula is shown in equation (6):
  • $$D\_aver = \begin{cases} dleft, & d\_aver < dleft \\ d\_aver, & dleft \le d\_aver \le dright \\ dright, & d\_aver > dright \end{cases} \qquad (6)$$
  • For example, when d_aver is less than 0.6, the intra-class aggregation degree obtained after normalization is 0.6;
  • when d_aver is greater than 0.75, the intra-class aggregation degree obtained after normalization is 0.75.
  • (1 - cosine similarity) is used to measure the intra-class aggregation degree, so the smaller the intra-class aggregation degree, the more tightly aggregated the objects in the class and the greater their similarity. Therefore, the intra-class aggregation degree is normalized into an interval.
  • When the intra-class aggregation degree is within the normalized interval, the objects within the class are divided according to the intra-class aggregation degree; when it is not within the interval, the objects in the class are divided according to the threshold of the interval. In this way a class with a large intra-class aggregation degree (that is, a class with large intra-class dispersion) can be appropriately divided into several classes, while a class with a small intra-class aggregation degree is not split into too many classes.
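The normalization in step S125 amounts to clamping the average into [dleft, dright]. A minimal sketch, using the example thresholds 0.6 and 0.75 from the text as defaults (the function name is hypothetical):

```python
def normalize_aggregation(d_aver: float, dleft: float = 0.6, dright: float = 0.75) -> float:
    """Clamp the average intra-class distance into the interval [dleft, dright]."""
    return min(max(d_aver, dleft), dright)
```

Values already inside the interval pass through unchanged; only averages outside it are replaced by the nearer threshold.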
  • In step S130, for each class obtained by iterative merging, the objects whose distance to one another is less than the intra-class aggregation degree are divided into a new class, and the number of classes is updated. Each class obtained by iterative merging according to the Rank-Order distance is divided according to the distances between its objects and its intra-class aggregation degree, new classes are obtained, and one iteration is complete; step S140 is then performed.
  • Step S130 may include the following steps: In step S131, objects in the class whose distance to one another is less than the intra-class aggregation degree are marked as connected.
  • A distance between objects that is less than the intra-class aggregation degree indicates that the similarity between the objects is large and that they can be placed in the same class.
  • The two objects corresponding to that distance may be marked as connected; for example, when the distance between the i-th and the j-th face images is less than the intra-class aggregation degree, the i-th object and the j-th object are connected.
  • If the distance between objects in the class is greater than the intra-class aggregation degree, the similarity between the objects is small and they are not suitable for the same class, so no mark is made.
  • The connected components within the class are determined according to the connectivity marks.
  • Objects that are connected to one another are regarded as one connected component, so all objects in the class can be divided into several connected components.
  • The class is split into new classes according to the connected components, and the number of classes is updated.
  • The objects of each connected component form a new class; that is, a class containing several connected components is divided into several new classes, and the number of classes increases accordingly.
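The splitting described for step S130 can be sketched as a connected-components search over one class, where two objects are linked when their distance is below the intra-class aggregation degree. The function name and the use of a depth-first traversal are choices of this sketch, not taken from the source.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def split_class(members: Sequence[int], dist: Matrix, degree: float) -> List[List[int]]:
    """Split one class into connected components; objects are linked if dist < degree."""
    unvisited = set(members)
    components: List[List[int]] = []
    for start in members:
        if start not in unvisited:
            continue  # already absorbed into an earlier component
        unvisited.discard(start)
        stack, component = [start], []
        while stack:
            a = stack.pop()
            component.append(a)
            linked = [b for b in unvisited if dist[a][b] < degree]
            for b in linked:
                unvisited.discard(b)
                stack.append(b)
        components.append(sorted(component))
    return components
```

An object with no link to any other member ends up alone in its own component, which is how outliers become single-object classes.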
  • In step S140, it is judged whether the updated number of classes is less than the number of classes before the update.
  • If yes, return to step S110; otherwise, go to step S150.
  • On returning to step S110, the iterative merging of classes is performed again according to the Rank-Order distance between classes, until the number of classes is unchanged before and after the update.
  • Merging classes based on the Rank-Order distance and then dividing them into new classes counts as one iteration. Suppose the number of classes before merging is 6; based on the Rank-Order distance they are merged into 4 classes, and splitting the 4 merged classes yields 5 classes, so the updated number of classes is 5.
  • The number of classes before the update is 6.
  • The updated number is less than the number before the update, so the process returns and continues to iterate. If the updated number of classes is less than the number before the update, the intra-class dispersion is large, that is, the objects in the classes are not gathered tightly enough and there may be outliers, so the split classes must continue to be iteratively merged and split until the updated number of classes is no longer less than the number before the update. When the number of classes before and after the update is equal, in step S150 a clustering result is obtained, the clustering result including classes containing a plurality of objects and classes containing a single object.
  • The resulting clustering result consists of classes that contain multiple objects and classes that contain a single object.
  • The objects within a class containing multiple objects are face images of the same person.
  • A class that contains only a single object is an outlier object that has been removed from the classes iteratively merged using the Rank-Order distance.
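The control flow of steps S110 to S150 can be sketched as a loop that alternates a merge pass and a split pass until the class count stops decreasing. The merge and split passes are passed in as callables so the sketch stays self-contained; `cluster`, `merge_step` and `split_step` are hypothetical names, and starting from one class per object follows the description of the initial classes.

```python
from typing import Callable, List, Sequence, Tuple

Classes = List[List[int]]


def cluster(
    objects: Sequence[int],
    merge_step: Callable[[Classes], Classes],
    split_step: Callable[[Classes], Classes],
) -> Tuple[Classes, Classes]:
    """Alternate merging and splitting until the number of classes no longer decreases."""
    classes: Classes = [[obj] for obj in objects]  # every object starts as its own class
    while True:
        before = len(classes)
        classes = split_step(merge_step(classes))  # one iteration: merge, then split
        if len(classes) >= before:  # count did not decrease, so stop (step S150)
            break
    multi = [c for c in classes if len(c) > 1]    # classes with several objects
    single = [c for c in classes if len(c) == 1]  # single-object classes (outliers)
    return multi, single
```

Because the loop continues only when the count strictly decreases, it always terminates.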
  • The clustering method provided in this embodiment uses the distance between objects within the class (for example, 1 - cosine similarity, the Euclidean distance, etc.) to measure the similarity of two objects, and compares the similarities.
  • P represents the accuracy of the clustering result
  • R represents the recall rate of the clustering result
  • CR represents the average number of face images per class in the clustering result. The results in Table 1 show that all the images in Scenario 1 contain 2291 faces in total and 562 different people, so that each person corresponds to 4.07 face images on average. The average accuracy of the clustering results is 86.1%.
  • The clustering accuracy obtained by the clustering method of the present disclosure is 99.1%, which is much higher than the accuracy of clustering by Rank-Order distance alone.
  • FIG. 7 is a schematic diagram of a clustering apparatus according to an exemplary embodiment.
  • The apparatus includes an iterative merging unit 100, an acquiring unit 200, a dividing unit 300, and a judging unit 400.
  • The iterative merging unit 100 is configured to perform iterative merging of classes according to the Rank-Order distance between classes.
  • The iterative merging unit 100 may include a third acquiring subunit and a merging subunit. The third acquiring subunit is configured to acquire the inter-class Rank-Order distance and the inter-class Rank-Order normalized distance.
  • The merging subunit is configured to merge the eligible classes when the inter-class Rank-Order distance is less than the distance threshold and the inter-class Rank-Order normalized distance is less than 1.
  • The acquiring unit 200 is configured to obtain the intra-class aggregation degree of each iteratively merged class from the distances between the objects in the class.
  • The acquiring unit 200 may include a first acquiring subunit and a first calculating subunit. The first acquiring subunit is configured to acquire the distance between each pair of objects in the class.
  • The first calculating subunit is configured to calculate the average of the distances between the objects of the class to obtain the intra-class aggregation degree.
  • Alternatively, the acquiring unit 200 may include a second acquiring subunit, a second calculating subunit, and a normalizing subunit. The second acquiring subunit is configured to acquire the distance between each pair of objects in the class.
  • The second acquiring subunit has the same function and implementation as the first acquiring subunit.
  • The second calculating subunit is configured to calculate the average of the distances between the objects in the class.
  • The normalizing subunit is configured to normalize the average to obtain the intra-class aggregation degree of the class.
  • The dividing unit 300 is configured to divide, for each class obtained by iterative merging, the objects whose distance to one another is less than the intra-class aggregation degree into a new class, and to update the number of classes.
  • The dividing unit 300 may include a first judging subunit, a marking subunit, a determining subunit, and a splitting subunit.
  • The first judging subunit is configured to judge whether the distance between objects in the class is less than the intra-class aggregation degree.
  • The marking subunit is configured to mark as connected the objects in the class whose distance to one another is less than the intra-class aggregation degree.
  • The determining subunit is configured to determine the connected components within the class according to the connectivity marks.
  • The splitting subunit is configured to split the class into new classes according to the connected components and update the number of classes.
  • The judging unit 400 is configured to judge whether the updated number of classes is less than the number of classes before the update. When it is, the iterative merging unit 100 again performs iterative merging of classes according to the Rank-Order distance between classes, until the number of classes is unchanged before and after the update, at which point the clustering result is obtained; the clustering result includes classes containing multiple objects and classes containing a single object.
  • In this clustering apparatus, the iterative merging unit merges the eligible classes according to the Rank-Order distance between classes, reducing the number of classes, and the acquiring unit calculates the intra-class aggregation degree according to the distances between the objects in the class.
  • The dividing unit splits the objects whose distance to one another is less than the intra-class aggregation degree into new classes, until all classes have been split. The judging unit then causes the split classes to be merged and split again, until no class can be split further, yielding classes containing multiple objects and classes containing a single object. Objects with relatively large dissimilarity are thereby eliminated during clustering, improving the accuracy of the clustering result. In particular, when there are many objects in the data set but few objects belonging to the same class, the accuracy of the clustering result is relatively high.
  • FIG. 8 is a block diagram of a terminal device 800 for clustering, according to an exemplary embodiment.
  • The terminal device 800 can be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, and the like.
  • The terminal device 800 can include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, and sensor component 814.
  • Processing component 802 typically controls the overall operation of terminal device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations.
  • Processing component 802 can include one or more processors 820 to execute instructions to perform all or part of the steps of the above described methods.
  • Processing component 802 can include one or more modules to facilitate interaction between processing component 802 and other components.
  • Processing component 802 can include a multimedia module to facilitate interaction between multimedia component 808 and processing component 802.
  • Memory 804 is configured to store various types of data to support operation of the terminal device 800. Examples of such data include instructions for any application or method operating on terminal device 800, contact data, phone book data, messages, pictures, videos, etc.
  • Memory 804 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
  • Power component 806 provides power to various components of terminal device 800.
  • Power component 806 can include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for terminal device 800.
  • The multimedia component 808 includes a screen that provides an output interface between the terminal device 800 and a user.
  • The screen can include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen can be implemented as a touch screen to receive input signals from the user.
  • The touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensors can sense not only the boundary of a touch or slide action, but also the duration and pressure associated with the touch or slide operation.
  • The multimedia component 808 includes a front camera and/or a rear camera. When the terminal device 800 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each of the front camera and the rear camera can be a fixed optical lens system or have focal length and optical zoom capabilities.
  • The audio component 810 is configured to output and/or input an audio signal.
  • The audio component 810 includes a microphone (MIC) that is configured to receive an external audio signal when the terminal device 800 is in an operational mode, such as a call mode, a recording mode, or a voice recognition mode.
  • The received audio signal may be further stored in memory 804 or transmitted via communication component 816.
  • The audio component 810 also includes a speaker for outputting an audio signal.
  • The I/O interface 812 provides an interface between the processing component 802 and a peripheral interface module, which may be a keyboard, a click wheel, a button, or the like. The buttons can include, but are not limited to, a Home button, a Volume button, a Start button, and a Lock button.
  • Sensor component 814 includes one or more sensors that provide status assessments of various aspects of the terminal device 800. For example, the sensor component 814 can detect the open/closed state of the device 800 and the relative positioning of components, such as the display and keypad of the terminal device 800. The sensor component 814 can also detect a change in position of the terminal device 800 or of one of its components, the presence or absence of user contact with the terminal device 800, the orientation or acceleration/deceleration of the terminal device 800, and temperature changes of the terminal device 800.
  • Sensor component 814 can include a proximity sensor configured to detect the presence of nearby objects without any physical contact.
  • Sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications.
  • The sensor component 814 can also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
  • Communication component 816 is configured to facilitate wired or wireless communication between terminal device 800 and other devices.
  • the terminal device 800 can access a wireless network based on a communication standard such as WiFi, 2G, 3G or 4G, or a combination thereof.
  • the communication component 816 receives broadcast signals or broadcast associated information from an external broadcast management system via a broadcast channel.
  • The communication component 816 also includes a near field communication (NFC) module to facilitate short-range communication.
  • For example, the NFC module can be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
  • In an exemplary embodiment, the terminal device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above methods.
  • In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions is also provided, such as the memory 804 comprising instructions, which are executable by the processor 820 of the terminal device 800 to perform the above method.
  • For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
  • A non-transitory computer-readable storage medium: when instructions in the storage medium are executed by a processor of a mobile terminal, the mobile terminal is enabled to perform a clustering method, the method comprising: iteratively merging classes according to the Rank-Order distance between classes; obtaining, from the distances between the objects within each class, the intra-class aggregation degree of each iteratively merged class; for each class obtained by iterative merging, dividing the objects whose inter-object distance within the class is smaller than the intra-class aggregation degree into a new class, and updating the number of classes; and, when the updated number of classes is smaller than the number before the update, returning to the step of iteratively merging classes according to the Rank-Order distance between classes, until the number of classes is the same before and after the update, whereupon a clustering result is obtained, the clustering result including classes containing a plurality of objects and classes containing a single object.
  • Optionally, obtaining the intra-class aggregation degree of each iteratively merged class from the distances between the objects within the class is performed as follows: acquiring the distances between the objects in the class; and calculating, from those distances, the average of the distances between the objects in the class, to obtain the intra-class aggregation degree of the class.
  • Optionally, obtaining the intra-class aggregation degree of each iteratively merged class from the distances between the objects within the class is performed as follows: acquiring the distances between the objects in the class; calculating, from those distances, the average of the distances between the objects in the class; and normalizing the distance average to obtain the intra-class aggregation degree of the class.
  • Optionally, for each class obtained by iterative merging, dividing the objects whose inter-object distance within the class is smaller than the intra-class aggregation degree into a new class and updating the number of classes is performed as follows: marking as connected the objects whose inter-object distance within the class is smaller than the intra-class aggregation degree; determining the connected components in the class according to the connectivity marks; and splitting the class into new classes according to the connected components, and updating the number of classes.
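The connectivity marking and connected-component split described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `clamp_aggregation` and `split_class` are hypothetical, the union-find structure is one assumed way to find connected components, and the default bounds 0.6 and 0.75 are the example values the description gives for the normalization range.

```python
def clamp_aggregation(d_aver, d_left=0.6, d_right=0.75):
    """Normalize the average distance by clamping it into [d_left, d_right]."""
    return min(max(d_aver, d_left), d_right)


def split_class(dist, threshold):
    """Split one class into connected components.

    dist is a symmetric n x n matrix of inter-object distances; two objects
    are marked as connected when their distance is below the threshold
    (the intra-class aggregation degree). Returns lists of object indices.
    """
    n = len(dist)
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] < threshold:      # connectivity mark
                parent[find(i)] = find(j)   # join the two components

    components = {}
    for i in range(n):
        components.setdefault(find(i), []).append(i)
    return list(components.values())
```

Each returned component becomes a new class; a component with a single index corresponds to an object that was split off on its own.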
  • FIG. 9 is a schematic structural diagram of a server in an embodiment of the present invention.
  • For example, the server 1900 can vary considerably depending on configuration or performance, and can include one or more central processing units (CPUs) 1922 (e.g., one or more processors), memory 1932, and one or more storage media 1930 (e.g., one or more mass storage devices) storing application programs 1942 or data 1944.
  • The memory 1932 and the storage medium 1930 may provide transient or persistent storage.
  • The program stored on the storage medium 1930 may include one or more modules (not shown), each of which may include a series of instruction operations for the terminal device.
  • Further, the central processing unit 1922 can be configured to communicate with the storage medium 1930 and to execute, on the server 1900, the series of instruction operations in the storage medium 1930.
  • Server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input/output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
  • In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions is also provided, such as the memory 1932 or the storage medium 1930; the instructions may be executed by the processor 1922 of the terminal device to perform the above method.
  • For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
  • A non-transitory computer-readable storage medium: when instructions in the storage medium are executed by a processor of a terminal device, the terminal device is enabled to perform a clustering method, the method comprising: iteratively merging classes according to the Rank-Order distance between classes; obtaining, from the distances between the objects within each class, the intra-class aggregation degree of each iteratively merged class; for each class obtained by iterative merging, dividing the objects whose inter-object distance within the class is smaller than the intra-class aggregation degree into a new class, and updating the number of classes; and, when the updated number of classes is smaller than the number before the update, returning to the step of iteratively merging classes according to the Rank-Order distance between classes, until the number of classes is the same before and after the update, whereupon a clustering result is obtained, the clustering result including classes containing a plurality of objects and classes containing a single object.
  • Optionally, obtaining the intra-class aggregation degree of each iteratively merged class from the distances between the objects within the class is performed as follows: acquiring the distances between the objects in the class; and calculating, from those distances, the average of the distances between the objects in the class, to obtain the intra-class aggregation degree of the class.
  • Optionally, obtaining the intra-class aggregation degree of each iteratively merged class from the distances between the objects within the class is performed as follows: acquiring the distances between the objects in the class; calculating, from those distances, the average of the distances between the objects in the class; and normalizing the distance average to obtain the intra-class aggregation degree of the class.
  • Optionally, for each class obtained by iterative merging, dividing the objects whose inter-object distance within the class is smaller than the intra-class aggregation degree into a new class and updating the number of classes is performed as follows: marking as connected the objects whose inter-object distance within the class is smaller than the intra-class aggregation degree; determining the connected components in the class according to the connectivity marks; and splitting the class into new classes according to the connected components, and updating the number of classes.
  • Optionally, iteratively merging classes according to the Rank-Order distance between classes is performed as follows: obtaining the Rank-Order distance between classes and the normalized Rank-Order distance between classes; and merging the classes when the Rank-Order distance between them is less than the distance threshold and the normalized Rank-Order distance between them is less than 1.
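The merge criterion in the last item can be sketched as follows, assuming the two distance matrices have already been computed. The function name `merge_candidates` and its parameters are hypothetical; merging candidate pairs transitively (if A pairs with B and B with C, all three end up in one class) is an assumption about what "merging all candidate classes" means, not something the text spells out.

```python
def merge_candidates(classes, d_r, d_n, t):
    """Merge every pair of classes that satisfies both merge conditions.

    classes: list of classes, each a list of objects.
    d_r[i][j]: Rank-Order distance between classes i and j.
    d_n[i][j]: normalized Rank-Order distance between classes i and j.
    t: distance threshold for the Rank-Order distance.
    """
    n = len(classes)
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            # Candidate pair: close in Rank-Order distance AND normalized distance < 1.
            if d_r[i][j] < t and d_n[i][j] < 1:
                parent[find(i)] = find(j)

    merged = {}
    for i, members in enumerate(classes):
        merged.setdefault(find(i), []).extend(members)
    return list(merged.values())
```

A pair that passes only one of the two tests is left unmerged, which matches the requirement that both conditions hold.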

Abstract

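A clustering method and a related device. The clustering method merges qualifying classes using the Rank-Order distance between classes, thereby reducing the number of classes. It then computes an intra-class aggregation degree from the distances between the objects within each class, and splits off into new classes the objects whose inter-object distance is smaller than the intra-class aggregation degree, until every class has been split. The split classes are then iteratively merged and split again until no class can be split further, yielding classes that contain multiple objects and classes that contain a single object. Objects that differ strongly from the rest are thereby removed during clustering, which improves the accuracy of the clustering result. The accuracy is comparatively high in particular when the data set contains many objects but few of them belong to the same class.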
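The alternating merge and split loop summarized in the abstract can be sketched as below. This is only a skeleton under stated assumptions: `cluster`, `merge_step`, and `split_step` are hypothetical names, and the two step functions are passed in by the caller rather than implemented here. The stopping rule follows the method as described: iteration continues only while a full merge-and-split pass reduces the number of classes.

```python
def cluster(objects, merge_step, split_step):
    """Alternate merging and splitting until the class count stops decreasing.

    merge_step and split_step each take and return a list of classes
    (each class being a list of objects).
    """
    classes = [[obj] for obj in objects]   # every object starts as its own class
    while True:
        count_before = len(classes)
        classes = merge_step(classes)      # Rank-Order based merging
        classes = split_step(classes)      # split by intra-class aggregation degree
        if len(classes) >= count_before:   # the count did not decrease: stop
            return classes
```

Starting from singleton classes mirrors the description, where each of the N face images is initially its own class.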

Description

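Clustering method and related device

This application is based on and claims priority to Chinese patent application No. 201410097422.5, filed on March 14, 2014, the entire contents of which are incorporated herein by reference.

Technical Field
The present disclosure relates to the field of computer technology, and in particular to a clustering method and a related device.

Background
Clustering is the process of dividing a set of physical or abstract objects into multiple classes composed of similar objects, that is, the process of assigning objects to different classes (clusters). Objects in the same class are highly similar, and objects in different classes are highly dissimilar. The term "class" is used below; note that "class" and "cluster" have the same meaning in this document.

For example, when a clustering method is used to sort face pictures, pictures belonging to the same person are placed in one class. A related clustering method uses the Rank-Order distance to measure the similarity between two faces and can gather the pictures of the same person together. However, when a batch of pictures contains a large number of faces and each person has only a few pictures, the accuracy of the clustering result of that method is very low.

Summary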
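To overcome the problems in the related art, the present disclosure provides a clustering method and a related device that improve the accuracy of clustering results. To solve the above technical problem, embodiments of the present disclosure disclose the following technical solutions.

According to a first aspect of the embodiments of the present disclosure, a clustering method is provided, comprising: iteratively merging classes according to the Rank-Order distance between classes; obtaining, from the distances between the objects within each class, the intra-class aggregation degree of each iteratively merged class; for each class obtained by iterative merging, dividing the objects whose inter-object distance within the class is smaller than the intra-class aggregation degree into a new class, and updating the number of classes; and, when the updated number of classes is smaller than the number before the update, returning to the step of iteratively merging classes according to the Rank-Order distance between classes, until the number of classes is the same before and after the update, whereupon a clustering result is obtained, the clustering result including classes containing a plurality of objects and classes containing a single object.

With reference to the first aspect, in a first possible implementation of the first aspect, obtaining the intra-class aggregation degree of each iteratively merged class is performed as follows: acquiring the distances between the objects in the class; and calculating, from those distances, the average of the distances between the objects in the class, to obtain the intra-class aggregation degree of the class.

With reference to the first aspect, in a second possible implementation of the first aspect, obtaining the intra-class aggregation degree of each iteratively merged class is performed as follows: acquiring the distances between the objects in the class; calculating, from those distances, the average of the distances between the objects in the class; and normalizing the distance average to obtain the intra-class aggregation degree of the class.

With reference to the first or second implementation of the first aspect, in a third implementation of the first aspect, dividing the objects into a new class and updating the number of classes is performed as follows: marking as connected the objects whose inter-object distance within the class is smaller than the intra-class aggregation degree; determining the connected components in the class according to the connectivity marks; and splitting the class into new classes according to the connected components, and updating the number of classes.

With reference to the first aspect, in a fourth possible implementation of the first aspect, iteratively merging classes according to the Rank-Order distance between classes is performed as follows: obtaining the inter-class Rank-Order distance and the normalized inter-class Rank-Order distance; merging the classes when the Rank-Order distance between them is less than a distance threshold and the normalized Rank-Order distance between them is less than 1; and, when the number of classes after merging is smaller than the number before merging, performing again the step of obtaining the inter-class Rank-Order distance and the normalized inter-class Rank-Order distance for the merged classes.

According to a second aspect of the embodiments of the present disclosure, a clustering device is provided, comprising: an iterative merging unit, configured to iteratively merge classes according to the Rank-Order distance between classes; an obtaining unit, configured to obtain, from the distances between the objects within each class, the intra-class aggregation degree of each iteratively merged class; a dividing unit, configured to divide, for each class obtained by iterative merging, the objects whose inter-object distance within the class is smaller than the intra-class aggregation degree into a new class, and to update the number of classes; and a judging unit, configured to control the iterative merging unit, when the updated number of classes is smaller than the number before the update, to iteratively merge classes according to the Rank-Order distance between classes, until the number of classes is the same before and after the update, whereupon a clustering result is obtained, the clustering result including classes containing a plurality of objects and classes containing a single object.

With reference to the second aspect, in a first possible implementation of the second aspect, the obtaining unit comprises: a first obtaining subunit, configured to acquire the distances between the objects in the class; and a first calculating subunit, configured to calculate the average of the distances between the objects of the class to obtain the intra-class aggregation degree.

With reference to the second aspect, in a second possible implementation of the second aspect, the obtaining unit comprises: a second obtaining subunit, configured to acquire the distances between the objects in the class; a second calculating subunit, configured to calculate, from those distances, the average of the distances between the objects in the class; and a normalizing subunit, configured to normalize the distance average to obtain the intra-class aggregation degree of the class.

With reference to the first or second possible implementation of the second aspect, in a third possible implementation of the second aspect, the dividing unit comprises: a first judging subunit, configured to judge whether the distance between objects in the class is smaller than the intra-class aggregation degree; a marking subunit, configured to mark as connected the objects corresponding to an inter-object distance that is smaller than the intra-class aggregation degree; a determining subunit, configured to determine the connected components in the class according to the connectivity marks; and a splitting subunit, configured to split the class into new classes according to the connected components and to update the number of classes.

With reference to the second aspect, in a fourth possible implementation of the second aspect, the iterative merging unit comprises: a third obtaining subunit, configured to obtain the inter-class Rank-Order distance and the normalized inter-class Rank-Order distance; a merging subunit, configured to merge the classes when the Rank-Order distance between them is less than a distance threshold and the normalized Rank-Order distance between them is less than 1; and a second judging subunit, configured to control the third obtaining subunit, when the number of classes after merging is smaller than the number before merging, to perform the step of obtaining the updated inter-class Rank-Order distance and normalized inter-class Rank-Order distance.

According to a third aspect of the embodiments of the present disclosure, a terminal device is provided, comprising: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to: iteratively merge classes according to the Rank-Order distance between classes; obtain, from the distances between the objects within each class, the intra-class aggregation degree of each iteratively merged class; for each class obtained by iterative merging, divide the objects whose inter-object distance within the class is smaller than the intra-class aggregation degree into a new class, and update the number of classes; and, when the updated number of classes is smaller than the number before the update, return to the step of iteratively merging classes according to the Rank-Order distance between classes, until the number of classes is the same before and after the update, whereupon a clustering result is obtained, the clustering result including classes containing a plurality of objects and classes containing a single object.

The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects. The clustering method merges qualifying classes using the Rank-Order distance between classes, thereby reducing the number of classes. It then computes an intra-class aggregation degree from the distances between the objects within each class, and splits off into new classes the objects whose inter-object distance is smaller than the intra-class aggregation degree, until every class has been split. The split classes are then iteratively merged and split again until no class can be split further, yielding classes that contain multiple objects and classes that contain a single object. Objects that differ strongly from the rest are thereby removed during clustering, which improves the accuracy of the clustering result. The accuracy is comparatively high in particular when the data set contains many objects but few of them belong to the same class.

It should be understood that the above general description and the following detailed description are merely exemplary and do not limit the present disclosure.

Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and, together with the description, serve to explain the principles of the invention.
FIG. 1 is a schematic diagram of the ordered sequences of a plurality of objects;
FIG. 2 is a flow chart of a clustering method according to an exemplary embodiment;
FIG. 3 is a flow chart of an exemplary embodiment of step S110 in FIG. 2;
FIG. 4 is a flow chart of another exemplary embodiment of step S110 in FIG. 2;
FIG. 5 is a flow chart of an exemplary embodiment of step S120 in FIG. 2;
FIG. 6 is a flow chart of an exemplary embodiment of step S130 in FIG. 2;
FIG. 7 is a block diagram of a clustering device according to an exemplary embodiment;
FIG. 8 is a block diagram of a terminal device according to an exemplary embodiment;
FIG. 9 is a block diagram of a server according to an exemplary embodiment.
Specific embodiments of the present disclosure are shown in the above drawings and are described in more detail below. The drawings are not intended to limit the scope of the concepts of the present disclosure in any way, but to explain those concepts to those skilled in the art by reference to particular embodiments.

Detailed Description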
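Exemplary embodiments are described in detail here, with examples shown in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the invention. Rather, they are merely examples of devices and methods consistent with some aspects of the invention as detailed in the appended claims.

Before the exemplary embodiments are described, the Rank-Order distance is introduced. The distances between objects are computed (for example, cosine similarity, Euclidean distance, etc.), and the objects are re-sorted by distance to obtain a sequence. Suppose there are n objects $i_1, i_2, i_3, i_4, i_5, i_6, \ldots, i_n$. Taking object $i_1$ as the base object, the distances between every other object and $i_1$ are computed and sorted, giving the sequence $O_1$ shown in FIG. 1. Taking object $i_2$ as the base object, the distances between every other object and $i_2$ are computed, giving the sequence $O_2$ shown in FIG. 1.

The asymmetric Rank-Order distance $D(i_1, i_2)$ between objects $i_1$ and $i_2$ is computed from the ranks in sequence $O_2$ of the neighbour objects that lie between $i_1$ and $i_2$ in sequence $O_1$. In the example of FIG. 1, objects $i_1$, $i_3$, $i_4$, $i_2$ have ranks 5, 2, 4, 0 in $O_2$, so formula (1) gives:

$$D(i_1, i_2) = \sum_{x=0}^{O_1(i_2)} O_2(f_1(x)) = O_2(i_1) + O_2(i_3) + O_2(i_4) + O_2(i_2) = 5 + 2 + 4 + 0 = 11 \tag{1}$$

In formula (1), $O_2(i_1)$ denotes the rank of object $i_1$ in sequence $O_2$, $O_2(i_3)$ the rank of object $i_3$ in $O_2$, $O_2(i_4)$ the rank of object $i_4$ in $O_2$, and $O_2(i_2)$ the rank of object $i_2$ in $O_2$.

The asymmetric Rank-Order distance $D(i_2, i_1)$ is computed in the same way. The normalized Rank-Order distance $D^R(i_1, i_2)$ between objects $i_1$ and $i_2$ is then computed by formula (2):

$$D^R(i_1, i_2) = \frac{D(i_1, i_2) + D(i_2, i_1)}{\min\left(O_1(i_2),\, O_2(i_1)\right)} \tag{2}$$

$D^R(i_1, i_2)$ denotes the normalized Rank-Order distance between objects. The Rank-Order distance between classes is computed in the same way as the Rank-Order distance between objects: one class is taken as the base class and the other classes are re-sorted by their inter-class distance to it. The inter-class distance is given by formula (3):

$$d(C_i, C_j) = \min_{a \in C_i,\; b \in C_j} d(a, b) \tag{3}$$

In formula (3), $C_i$ and $C_j$ denote classes. The inter-class Rank-Order distance is given by formula (4):

$$D^R(C_i, C_j) = \frac{D(C_i, C_j) + D(C_j, C_i)}{\min\left(O_{C_i}(C_j),\, O_{C_j}(C_i)\right)} \tag{4}$$

In formula (4), $D(C_i, C_j)$ denotes the asymmetric Rank-Order distance between class $C_i$ and class $C_j$, and $D(C_j, C_i)$ the asymmetric Rank-Order distance between class $C_j$ and class $C_i$; $O_{C_i}(C_j)$ denotes the rank of class $C_j$ in the sequence whose base class is $C_i$, and $O_{C_j}(C_i)$ the rank of class $C_i$ in the sequence whose base class is $C_j$.

The normalized inter-class Rank-Order distance $D^N(C_i, C_j)$ is computed from the inter-class distance as shown in formula (5):

$$D^N(C_i, C_j) = \frac{1}{\phi(C_i, C_j)}\, d(C_i, C_j), \qquad \phi(C_i, C_j) = \frac{1}{|C_i| + |C_j|} \sum_{a \in C_i \cup C_j} \frac{1}{K} \sum_{k=1}^{K} d(a, f_a(k)) \tag{5}$$

In formula (5), $d(C_i, C_j)$ denotes the distance between class $C_i$ and class $C_j$, $|C_i|$ and $|C_j|$ denote the numbers of objects in the classes, $K$ is a constant, $f_a(k)$ denotes the $k$-th neighbour object of object $a$, and $\phi(C_i, C_j)$ denotes the average distance between the objects of the two classes and their $K$ nearest objects.

Assume the objects are face images. The clustering method provided by the present disclosure can gather the images belonging to the same person into one cluster. The features of a face image are converted into a vector, so the distance between objects is the distance between vectors. The clustering method provided by the present disclosure can of course also be applied to other data.

FIG. 2 is a flow chart of a clustering method according to an exemplary embodiment. As shown in FIG. 2, the clustering method is applied in a terminal and may include the following steps.

In step S110, classes are iteratively merged according to the Rank-Order distance between classes. The Rank-Order distance between every pair of classes is calculated, and classes whose Rank-Order distance is smaller than a first distance threshold are merged. The first distance threshold may be determined according to the data type or from experimental results.

As shown in FIG. 3, step S110 may include the following steps.

In step S111, the inter-class Rank-Order distance and the normalized inter-class Rank-Order distance are obtained. Suppose the initial number of face images is N. Each face image is treated as a separate class, so the initial number of classes is N; a distance threshold $t$ and a constant $K$ are set. For any classes $C_i$ and $C_j$, the inter-class Rank-Order distance $D^R(C_i, C_j)$ and the normalized inter-class Rank-Order distance $D^N(C_i, C_j)$ are computed according to formulas (1) to (5). With N initial classes, this yields an N×N $D^R(C_i, C_j)$ matrix and an N×N $D^N(C_i, C_j)$ matrix. Each element of the $D^R(C_i, C_j)$ matrix is the Rank-Order distance between the corresponding classes; for example, element $C_{ij}$ of that matrix is the Rank-Order distance between classes $C_i$ and $C_j$, and element $C_{ij}$ of the $D^N(C_i, C_j)$ matrix is the normalized Rank-Order distance between classes $C_i$ and $C_j$.

In step S112, the classes are merged when the Rank-Order distance between them is less than the distance threshold and the normalized Rank-Order distance between them is less than 1. The entries $D^R(C_i, C_j)$ smaller than the distance threshold $t$ are selected from the $D^R(C_i, C_j)$ matrix, and the entries $D^N(C_i, C_j)$ smaller than 1 are selected from the $D^N(C_i, C_j)$ matrix. When $D^R(C_i, C_j) < t$ and $D^N(C_i, C_j) < 1$, classes $C_i$ and $C_j$ are determined to be highly similar, that is, they are candidate classes for merging, and all candidate classes are then merged. $D^R(C_i, C_j) \ge t$ indicates that classes $C_i$ and $C_j$ are not very similar; $D^N(C_i, C_j) \ge 1$ indicates that the dispersion between the classes is large.

In step S120, the intra-class aggregation degree of each iteratively merged class is calculated from the distances between the objects within the class.

In one embodiment of the present disclosure, as shown in FIG. 4, step S120 may include the following steps.

In step S121, the distances between the objects in the class are acquired. The distance between objects may be cosine similarity, Euclidean distance, Jaccard distance, or the like. Note that when the cosine similarity $\cos\theta$ is used in the present disclosure to compute the distance between objects, the distance is defined as $1 - \cos\theta$, so that the smaller the distance between objects, the more similar they are.

In step S122, the average of the distances between the objects in the class is calculated to obtain the intra-class aggregation degree of the class. Suppose the class contains n objects. From the computed distance between every two objects in the class, an n×n distance matrix $d$ is obtained, in which each entry is the distance between the corresponding two objects; for example, element $d_{ij}$ of matrix $d$ is the distance between the $i$-th object and the $j$-th object in the class. This step computes the average of the elements of matrix $d$.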
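The averaging in step S122 can be sketched as follows, using the 1 − cos θ distance from step S121. This is a minimal sketch with hypothetical function names. It averages over distinct object pairs only, which is an assumption: the description speaks of averaging the entries of the n×n matrix and does not say whether the zero diagonal is included.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Distance defined as 1 - cos(theta); smaller means more similar."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def intra_class_aggregation(vectors):
    """Mean distance over all distinct pairs of objects in one class."""
    pairs = list(combinations(vectors, 2))
    if not pairs:              # a single-object class has no pairs
        return 0.0
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)
```

For a class of two identical feature vectors and one orthogonal vector, the three pairwise distances are 0, 1, and 1, so the aggregation degree is 2/3.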
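In another embodiment of the present disclosure, as shown in FIG. 5, step S120 may include the following steps.

In step S123, the distances between the objects in the class are acquired.

In step S124, the average of the distances between the objects in the class is calculated from those distances.

In step S125, the distance average is normalized to obtain the intra-class aggregation degree of the class. Normalizing the distance average $d_{\text{aver}}$ means restricting it to a range $[d_{\text{left}}, d_{\text{right}}]$, where $d_{\text{left}}$ and $d_{\text{right}}$ are thresholds; for example, $d_{\text{left}}$ may be 0.6 and $d_{\text{right}}$ may be 0.75. The normalization is given, for example, by formula (6):

$$d_{\text{aver}} = \begin{cases} d_{\text{left}}, & d_{\text{aver}} < d_{\text{left}} \\ d_{\text{right}}, & d_{\text{aver}} > d_{\text{right}} \\ d_{\text{aver}}, & d_{\text{left}} \le d_{\text{aver}} \le d_{\text{right}} \end{cases} \tag{6}$$

For example, when the computed distance average is 0.5, the normalized intra-class aggregation degree is 0.6; when the distance average is 0.65, the normalized intra-class aggregation degree is 0.65; when the distance average is 0.78, the normalized intra-class aggregation degree is 0.75.

In the embodiments of the present disclosure, (1 − cosine similarity) is used to measure the intra-class aggregation degree, so a smaller intra-class aggregation degree indicates that the objects in the class are more tightly gathered and more similar. The intra-class aggregation degree is therefore normalized to an interval, for example [0.6, 0.75]. When the intra-class aggregation degree lies within the normalization interval, the objects in the class are divided according to the intra-class aggregation degree; when it lies outside the interval, the objects in the class are divided according to the corresponding bound of the interval. In this way, a class with a large intra-class aggregation degree (that is, a class with large intra-class dispersion) can be suitably divided into several classes, while a class with a small intra-class aggregation degree is not divided into too many classes.

In step S130, for each class obtained by iterative merging, the objects whose inter-object distance within the class is smaller than the intra-class aggregation degree are divided into a new class, and the number of classes is updated. Each class obtained by iterative merging on the Rank-Order distance is divided according to the distances between its objects and its intra-class aggregation degree, giving new classes. This completes one iteration, after which step S140 is performed.

In one embodiment of the present disclosure, as shown in FIG. 6, step S130 may include the following steps.

In step S131, the objects whose inter-object distance within the class is smaller than the intra-class aggregation degree are marked as connected. For any object in the class, the intra-class distance matrix is queried to check whether the distance between that object and each other object in the class is smaller than the intra-class aggregation degree. If the distance between two objects is smaller than the intra-class aggregation degree, the objects are highly similar and can be placed in the same class, so the two objects corresponding to that distance are marked as connected. For example, when the distance $d_{ij}$ between two face images is smaller than the intra-class aggregation degree, the $i$-th object and the $j$-th object are connected. When the distance between objects in the class is greater than the intra-class aggregation degree, the objects are not very similar and are not suited to the same class, so no mark is made.

In step S132, the connected components in the class are determined according to the connectivity marks. Objects that can be connected to one another form one connected component, which determines how many connected components the objects of the class can be divided into.

In step S133, the class is split into new classes according to the connected components, and the number of classes is updated. The objects of each connected component are placed in a new class; that is, a class containing several connected components is divided into that many new classes, and the number of classes is increased accordingly. Dividing by connected components separates out the objects that do not belong to a cluster, that is, it removes outlier objects from the cluster.

In step S140, it is judged whether the updated number of classes is smaller than the number of classes before the update. If so, the method returns to step S110; otherwise, it proceeds to step S150. When the updated number of classes is smaller than the number before the update, the method returns to step S110, the step of iteratively merging classes according to the Rank-Order distance between classes, until the number of classes is the same before and after the update.

Merging classes on the Rank-Order distance and then dividing them into new classes counts as one iteration. Suppose there are 6 classes before merging, merging on the Rank-Order distance leaves 4 classes, and splitting those 4 merged classes finally gives 5 classes. The updated number of classes is then 5 and the number before the update is 6; since the updated number is smaller than the number before the update, the iteration continues.

If the updated number of classes is smaller than the number before the update, the intra-class dispersion is large, that is, the objects within classes are not gathered tightly enough and outlier objects may exist. The split classes must then be iteratively merged and divided again, until the number of classes no longer decreases.

When the number of classes is the same before and after the update, in step S150, the clustering result is obtained; the clustering result includes classes containing a plurality of objects and classes containing a single object. When the updated number of classes equals the number before the update, no class contains outliers that can be removed. The final clustering result consists of classes containing a plurality of objects and classes containing a single object. The objects in a class containing a plurality of objects are face images of the same person. A class containing only a single object is an outlier object removed from a class that was obtained by iterative merging on the Rank-Order distance.

In the clustering method provided by this embodiment, after classes are merged using the Rank-Order distance, the distance between objects within a class (for example 1 − cosine similarity, Euclidean distance, etc.) is used to measure the similarity of two objects, and objects with low similarity (high dissimilarity) are removed from the class (as new classes). This amounts to removing noise points from the class and thereby improves clustering accuracy. The accuracy of the clustering result is comparatively high in particular when the data set contains many objects but few of them belong to the same class.

The effect of the clustering method of the present disclosure is illustrated below with experimental data, as shown in Table 1:

Table 1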
Figure imgf000012_0001
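In Table 1, P denotes the precision of the clustering result, R denotes the recall of the clustering result, and CR denotes the average number of face images per class in the clustering result.

The results in Table 1 show that in scenario 1 the images contain 2291 faces in total and 562 different people, so each person corresponds to 4.07 face images on average; that is, on average 4.07 face images across all images belong to the same person. With the related method that clusters on the Rank-Order distance alone, the precision is 86.1%. With the clustering method of the present disclosure, the precision is 99.1%, far higher than that of clustering on the Rank-Order distance alone. In scenarios 2 and 3, the precision of the clustering method of the present disclosure is likewise higher than that of clustering on the Rank-Order distance alone.

Corresponding to the above clustering method embodiments, the present disclosure provides a clustering device.

FIG. 7 is a schematic diagram of a clustering device according to an exemplary embodiment. Referring to FIG. 7, the device includes an iterative merging unit 100, an obtaining unit 200, a dividing unit 300, and a judging unit 400.

The iterative merging unit 100 is configured to iteratively merge classes according to the Rank-Order distance between classes. In one embodiment of the present disclosure, the iterative merging unit 100 may include a third obtaining subunit and a merging subunit. The third obtaining subunit is configured to obtain the inter-class Rank-Order distance and the normalized inter-class Rank-Order distance. The merging subunit is configured to merge the qualifying classes when the Rank-Order distance between classes is less than the distance threshold and the normalized inter-class Rank-Order distance is less than 1.

The obtaining unit 200 is configured to obtain, from the distances between the objects within each class, the intra-class aggregation degree of each iteratively merged class. In one embodiment of the present disclosure, the obtaining unit 200 may include a first obtaining subunit and a first calculating subunit. The first obtaining subunit is configured to acquire the distances between the objects in the class. The first calculating subunit is configured to calculate the average of the distances between the objects of the class to obtain the intra-class aggregation degree.

In another embodiment of the present disclosure, the obtaining unit 200 may include a second obtaining subunit, a second calculating subunit, and a normalizing subunit. The second obtaining subunit is configured to acquire the distances between the objects in the class; its function and implementation are the same as those of the first obtaining subunit. The second calculating subunit is configured to calculate, from the distances between the objects in the class, the average of those distances. The normalizing subunit is configured to normalize the distance average to obtain the intra-class aggregation degree of the class.

The dividing unit 300 is configured to divide, for each class obtained by iterative merging, the objects whose inter-object distance within the class is smaller than the intra-class aggregation degree into a new class, and to update the number of classes. In one embodiment of the present disclosure, the dividing unit may include a first judging subunit, a marking subunit, a determining subunit, and a splitting subunit. The first judging subunit is configured to judge whether the distance between objects in the class is smaller than the intra-class aggregation degree. The marking subunit is configured to mark as connected the objects whose inter-object distance within the class is smaller than the intra-class aggregation degree. The determining subunit is configured to determine the connected components in the class according to the connectivity marks. The splitting subunit is configured to split the class into new classes according to the connected components and to update the number of classes.

The judging unit 400 is configured to judge whether the updated number of classes is smaller than the number before the update. When the updated number of classes is smaller than the number before the update, the iterative merging unit iteratively merges classes according to the Rank-Order distance between classes, until the number of classes is the same before and after the update, whereupon a clustering result is obtained, the clustering result including classes containing a plurality of objects and classes containing a single object.

In the clustering device provided by this embodiment, the iterative merging unit merges qualifying classes according to the Rank-Order distance between classes, thereby reducing the number of classes. The obtaining unit then calculates the intra-class aggregation degree from the distances between the objects within each class. The dividing unit then splits off into new classes the objects whose inter-object distance within the class is smaller than the intra-class aggregation degree, until every class has been split. The judging unit then has the split classes iteratively merged and split again until no class can be split further, yielding classes containing a plurality of objects and classes containing a single object. Objects that differ strongly from the rest are thereby removed during clustering, which improves the accuracy of the clustering result. The accuracy is comparatively high in particular when the data set contains many objects but few of them belong to the same class.

As for the device in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the method embodiments and is not elaborated here.

FIG. 8 is a block diagram of a terminal device 800 for clustering according to an exemplary embodiment. For example, the terminal device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.

Referring to FIG. 8, the terminal device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component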
814, 以及通信组件 816。 处理组件 802通常控制终端设备 800的整体操作, 诸如与显示, 电话呼叫, 数据通 信, 相机操作和记录操作相关联的操作。 处理组件 802可以包括一个或多个处理器 820 来执行指令, 以完成上述的方法的全部或部分步骤。 此外, 处理组件 802可以包括一个 或多个模块, 便于处理组件 802和其他组件之间的交互。 例如, 处理组件 802可以包括 多媒体模块, 以方便多媒体组件 808和处理组件 802之间的交互。 存储器 804被配置为存储各种类型的数据以支持在设备 800的操作。 这些数据的示 例包括用于在终端设备 800上操作的任何应用程序或方法的指令, 联系人数据, 电话簿 数据, 消息, 图片, 视频等。 存储器 804可以由任何类型的易失性或非易失性存储设备 或者它们的组合实现, 如静态随机存取存储器 (SRAM ) , 电可擦除可编程只读存储器 ( EEPR0M) , 可擦除可编程只读存储器 (EPR0M) , 可编程只读存储器 (PR0M), 只读存储 器 (ROM) , 磁存储器, 快闪存储器, 磁盘或光盘。 电源组件 806为终端设备 800的各种组件提供电力。 电源组件 806可以包括电源管 理系统, 一个或多个电源, 及其他与为终端设备 800生成、 管理和分配电力相关联的组 件。 多媒体组件 808包括在所述终端设备 800和用户之间的提供一个输出接口的屏幕。 在一些实施例中, 屏幕可以包括液晶显示器 (LCD ) 和触摸面板 (TP )。 如果屏幕包括触 摸面板, 屏幕可以被实现为触摸屏, 以接收来自用户的输入信号。 触摸面板包括一个或 多个触摸传感器以感测触摸、 滑动和触摸面板上的手势。 所述触摸传感器可以不仅感测 触摸或滑动动作的边界, 而且还检测与所述触摸或滑动操作相关的持续时间和压力。 在 一些实施例中, 多媒体组件 808包括一个前置摄像头和 /或后置摄像头。 当设备 800处于 操作模式, 如拍摄模式或视频模式时, 前置摄像头和 /或后置摄像头可以接收外部的多媒 体数据。 每个前置摄像头和后置摄像头可以是一个固定的光学透镜系统或具有焦距和光 学变焦能力。 音频组件 810被配置为输出和 /或输入音频信号。 例如, 音频组件 810包括一个麦 克风(MIC), 当终端设备 800处于操作模式, 如呼叫模式、记录模式和语音识别模式时, 麦克风被配置为接收外部音频信号。 所接收的音频信号可以被进一步存储在存储器 804 或经由通信组件 816发送。 在一些实施例中, 音频组件 810还包括一个扬声器, 用于输 出音频信号。
I/ O接口 812为处理组件 802和外围接口模块之间提供接口, 上述外围接口模块可 以是键盘, 点击轮, 按钮等。 这些按钮可包括但不限于: 主页按钮、 音量按钮、 启动按 钮和锁定按钮。 传感器组件 814包括一个或多个传感器, 用于为终端设备 800提供各个方面的状态 评估。 例如, 传感器组件 814可以检测到设备 800的打开 /关闭状态, 组件的相对定位, 例如所述组件为终端设备 800的显示器和小键盘, 传感器组件 814还可以检测终端设备 800或终端设备 800—个组件的位置改变, 用户与终端设备 800接触的存在或不存在, 终端设备 800方位或加速 /减速和终端设备 800的温度变化。传感器组件 814可以包括接 近传感器, 被配置用来在没有任何的物理接触时检测附近物体的存在。 传感器组件 814 还可以包括光传感器, 如 CMOS或 CCD图像传感器, 用于在成像应用中使用。 在一些实施 例中, 该传感器组件 814还可以包括加速度传感器, 陀螺仪传感器, 磁传感器, 压力传 感器或温度传感器。 通信组件 816被配置为便于终端设备 800和其他设备之间有线或无线方式的通信。 终端设备 800可以接入基于通信标准的无线网络, 如 WiFi, 2G、 3G或 4G, 或它们的组 合。 在一个示例性实施例中, 通信部件 816经由广播信道接收来自外部广播管理系统的 广播信号或广播相关信息。 在一个示例性实施例中, 所述通信部件 816还包括近场通信 ( NFC) 模块, 以促进短程通信。 例如, 在 NFC模块可基于射频识别 (RFID) 技术, 红外 数据协会 (IrDA) 技术, 超宽带 (UWB) 技术, 蓝牙 (BT) 技术和其他技术来实现。 在示例性实施例中, 终端设备 800 可以被一个或多个应用专用集成电路 (ASIC)、 数字信号处理器 (DSP)、 数字信号处理设备 (DSPD)、 可编程逻辑器件 (PLD)、 现场可编 程门阵列 (FPGA)、 控制器、 微控制器、 微处理器或其他电子元件实现, 用于执行上述方 法。 在示例性实施例中, 还提供了一种包括指令的非临时性计算机可读存储介质, 例如 包括指令的存储器 804,上述指令可由终端设备 800的处理器 820执行以完成上述方法。 例如, 所述非临时性计算机可读存储介质可以是 R0M、 随机存取存储器(RAM)、 CD-ROM, 磁带、 软盘和光数据存储设备等。 一种非临时性计算机可读存储介质, 当所述存储介质中的指令由移动终端的处理器 执行时, 使得移动终端能够执行一种聚类方法, 所述方法包括: 根据类间的 Rank-Order 距离, 进行类的迭代合并; 利用类内各个对象间的距离获 得迭代合并后的类对应的类内聚合度; 针对迭代合并得到的每个类, 将类内对象间的距 离小于所述类内聚合度的对象划分成一个新的类, 并更新类的数量; 当更新后的类的数 量比更新前的类的数量少时,返回执行根据类间的 Rank-Order距离进行类的迭代合并的 步骤, 直到更新前后的类的数量不变时, 得到聚类结果, 所述聚类结果包括包含多个对 象的类和包含单个对象的类。 可选地, 所述利用类内各个对象间的距离获得迭代合并后的类对应的类内聚合度, 采用如下方式: 获取类内各个对象间的距离; 根据所述类内对象间的距离计算所述类内的各个对象 间距离的距离平均值, 得到所述类的类内聚合度。 可选地, 所述利用类内各个对象间的距离获得迭代合并后的类对应的类内聚合度, 采用如下方式: 获取类内各个对象间的距离; 根据所述类内对象间的距离计算所述类内的各个对象 间距离的距离平均值; 将所述距离平均值进行归一化, 得到所述类的类内聚合度。 可选地, 所述针对迭代合并得到的每个类, 将类内对象间的距离小于所述类内聚合 度的对象划分成一个新类, 更新类的数量, 采用如下方式: 将类内对象间的距离小于所述类内聚合度的对象进行连通标记; 根据所述连通标记 确定所述类内的连通分量; 根据所述连通分量将所述类拆分成新类, 并更新类的数量。 可选地, 所述根据类间的 Rank-Order距离, 进行类的迭代合并, 采用如下方式: 获取类间 Rank-Order 距离, 以及获取类间 Rank-Order 归一化距离; 当类间的 Rank-Order距离小于距离阈值, 且所述类间的 Rank-Order归一化距离小于 1时, 合并 所述类。 图 9是本发明实施例中服务器的结构示意图。 例如, 该服务器 1900可因配置或性 能不同而产生比较大的差异,可以包括一个或一个以上中央处理器(central processing units , CPU) 1922 (例如, 一个或一个以上处理器) 
和存储器 1932, 一个或一个以上存 储应用程序 1942或数据 1944的存储介质 1930 (例如一个或一个以上海量存储设备)。 其中, 存储器 1932和存储介质 1930可以是短暂存储或持久存储。 存储在存储介质 1930 的程序可以包括一个或一个以上模块(图示没标出), 每个模块可以包括对终端设备中的 一系列指令操作。 更进一步地, 中央处理器 1922可以设置为与存储介质 1930通信, 在 服务器 1900上执行存储介质 1930中的一系列指令操作。 服务器 1900还可以包括一个或一个以上电源 1926, 一个或一个以上有线或无线网 络接口 1950, 一个或一个以上输入输出接口 1958, 一个或一个以上键盘 1956, 和 /或, 一个或一个以上操作系统 1941,例如 Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM等等。 在示例性实施例中, 还提供了一种包括指令的非临时性计算机可读存储介质, 例如 存储器 1932或存储介质 1930, 上述指令可由终端设备的处理器 1922执行以完成上述方 法。例如,所述非临时性计算机可读存储介质可以是 ROM、随机存取存储器(RAM)、CD-R0M、 磁带、 软盘和光数据存储设备等。 一种非临时性计算机可读存储介质, 当所述存储介质中的指令由终端设备的处理器 执行时, 使得终端设备能够执行一种聚类方法, 所述方法包括: 根据类间的 Rank-Order 距离, 进行类的迭代合并; 利用类内各个对象间的距离获 得迭代合并后的类对应的类内聚合度; 针对迭代合并得到的每个类, 将类内对象间的距 离小于所述类内聚合度的对象划分成一个新的类, 并更新类的数量; 当更新后的类的数 量比更新前的类的数量少时,返回执行根据类间的 Rank-Order距离进行类的迭代合并的 步骤, 直到更新前后的类的数量不变时, 得到聚类结果, 所述聚类结果包括包含多个对 象的类和包含单个对象的类。 可选地, 所述利用类内各个对象间的距离获得迭代合并后的类对应的类内聚合度, 采用如下方式: 获取类内各个对象间的距离; 根据所述类内对象间的距离计算所述类内的各个对象 间距离的距离平均值, 得到所述类的类内聚合度。 可选地, 所述利用类内各个对象间的距离获得迭代合并后的类对应的类内聚合度, 采用如下方式: 获取类内各个对象间的距离; 根据所述类内对象间的距离计算所述类内的各个对象 间距离的距离平均值; 将所述距离平均值进行归一化, 得到所述类的类内聚合度。 可选地, 所述针对迭代合并得到的每个类, 将类内对象间的距离小于所述类内聚合 度的对象划分成一个新类, 更新类的数量, 采用如下方式: 将类内对象间的距离小于所述类内聚合度的对象进行连通标记; 根据所述连通标记 确定所述类内的连通分量; 根据所述连通分量将所述类拆分成新类, 并更新类的数量。 可选地, 所述根据类间的 Rank-Order距离, 进行类的迭代合并, 采用如下方式: 获取类间 Rank-Order 距离, 以及获取类间 Rank-Order 归一化距离; 当类间的 Rank-Order距离小于距离阈值, 且所述类间的 Rank-Order归一化距离小于 1时, 合并 所述类。 应当理解的是, 本发明并不局限于上面已经描述并在附图中示出的精确结构, 并且 可以在不脱离其范围进行各种修改和改变。 本发明的范围仅由所附的权利要求来限制。 需要说明的是, 在本文中, 诸如 "第一"和 "第二" 等之类的关系术语仅仅用来将 一个实体或者操作与另一个实体或操作区分开来, 而不一定要求或者暗示这些实体或操 作之间存在任何这种实际的关系或者顺序。 而且, 术语 "包括" 、 "包含" 或者其任何 其他变体意在涵盖非排他性的包含, 从而使得包括一系列要素的过程、 方法、 物品或者 设备不仅包括那些要素, 而且还包括没有明确列出的其他要素, 或者是还包括为这种过 程、方法、物品或者设备所固有的要素。在没有更多限制的情况下,由语句 "包括一个…… 限定的要素, 并不排除在包括所述要素的过程、 方法、 物品或者设备中还存在另外的相 同要素。

Claims

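1. A clustering method, characterized by comprising:
iteratively merging classes according to the Rank-Order distance between classes;
obtaining, from the distances between the objects within each class, the intra-class aggregation degree of each iteratively merged class;
for each class obtained by iterative merging, dividing the objects whose inter-object distance within the class is smaller than the intra-class aggregation degree into a new class, and updating the number of classes;
when the updated number of classes is smaller than the number before the update, returning to the step of iteratively merging classes according to the Rank-Order distance between classes, until the number of classes is the same before and after the update, whereupon a clustering result is obtained, the clustering result including classes containing a plurality of objects and classes containing a single object.
2. The method according to claim 1, characterized in that obtaining, from the distances between the objects within each class, the intra-class aggregation degree of each iteratively merged class is performed as follows:
acquiring the distances between the objects in the class;
calculating, from the distances between the objects in the class, the average of the distances between the objects in the class, to obtain the intra-class aggregation degree of the class.
3. The method according to claim 1, characterized in that obtaining, from the distances between the objects within each class, the intra-class aggregation degree of each iteratively merged class is performed as follows:
acquiring the distances between the objects in the class;
calculating, from the distances between the objects in the class, the average of the distances between the objects in the class;
normalizing the distance average to obtain the intra-class aggregation degree of the class.
4. The method according to claim 2 or 3, characterized in that, for each class obtained by iterative merging, dividing the objects whose inter-object distance within the class is smaller than the intra-class aggregation degree into a new class and updating the number of classes is performed as follows:
marking as connected the objects whose inter-object distance within the class is smaller than the intra-class aggregation degree;
determining the connected components in the class according to the connectivity marks;
splitting the class into new classes according to the connected components, and updating the number of classes.
5. The method according to claim 1, characterized in that iteratively merging classes according to the Rank-Order distance between classes is performed as follows:
obtaining the inter-class Rank-Order distance and the normalized inter-class Rank-Order distance;
merging the classes when the Rank-Order distance between them is less than a distance threshold and the normalized Rank-Order distance between them is less than 1.
6. A clustering device, characterized by comprising:
an iterative merging unit, configured to iteratively merge classes according to the Rank-Order distance between classes;
an obtaining unit, configured to obtain, from the distances between the objects within each class, the intra-class aggregation degree of each iteratively merged class;
a dividing unit, configured to divide, for each class obtained by iterative merging, the objects whose inter-object distance within the class is smaller than the intra-class aggregation degree into a new class, and to update the number of classes;
a judging unit, configured to control the iterative merging unit, when the updated number of classes is smaller than the number before the update, to iteratively merge classes according to the Rank-Order distance between classes, until the number of classes is the same before and after the update, whereupon a clustering result is obtained, the clustering result including classes containing a plurality of objects and classes containing a single object.
7. The device according to claim 6, characterized in that the obtaining unit comprises:
a first obtaining subunit, configured to acquire the distances between the objects in the class;
a first calculating subunit, configured to calculate the average of the distances between the objects of the class to obtain the intra-class aggregation degree.
8. The device according to claim 6, characterized in that the obtaining unit comprises:
a second obtaining subunit, configured to acquire the distances between the objects in the class;
a second calculating subunit, configured to calculate, from the distances between the objects in the class, the average of the distances between the objects in the class;
a normalizing subunit, configured to normalize the distance average to obtain the intra-class aggregation degree of the class.
9. The device according to claim 7 or 8, characterized in that the dividing unit comprises:
a first judging subunit, configured to judge whether the distance between objects in the class is smaller than the intra-class aggregation degree;
a marking subunit, configured to mark as connected, when the distance between objects in the class is smaller than the intra-class aggregation degree, the objects corresponding to that distance;
a determining subunit, configured to determine the connected components in the class according to the connectivity marks;
a splitting subunit, configured to split the class into new classes according to the connected components and to update the number of classes.
10. The device according to claim 6, characterized in that the iterative merging unit comprises:
a third obtaining subunit, configured to obtain the inter-class Rank-Order distance and the normalized inter-class Rank-Order distance;
a merging subunit, configured to merge the classes when the Rank-Order distance between them is less than a distance threshold and the normalized inter-class Rank-Order distance is less than 1.
11. A terminal device, characterized by comprising:
a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to:
iteratively merge classes according to the Rank-Order distance between classes;
obtain, from the distances between the objects within each class, the intra-class aggregation degree of each iteratively merged class;
for each class obtained by iterative merging, divide the objects whose inter-object distance within the class is smaller than the intra-class aggregation degree into a new class, and update the number of classes;
when the updated number of classes is smaller than the number before the update, return to the step of iteratively merging classes according to the Rank-Order distance between classes, until the number of classes is the same before and after the update;
when the number of classes is the same before and after the update, obtain a clustering result, the clustering result including classes containing a plurality of objects and classes containing a single object.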
PCT/CN2014/082876 2014-03-14 2014-07-24 聚类方法及相关装置 WO2015135276A1 (zh)

Priority Applications (5)

Application Number Priority Date Filing Date Title
MX2014010879A MX358804B (es) 2014-03-14 2014-07-24 Metodo y dispositivo de agrupamiento.
RU2015129676A RU2628167C2 (ru) 2014-03-14 2014-07-24 Способ и устройство для кластеризации
JP2016506778A JP6101399B2 (ja) 2014-03-14 2014-07-24 クラスタリング方法、クラスタリング装置、端末装置、プログラム及び記録媒体
KR1020147026527A KR20150117202A (ko) 2014-03-14 2014-07-24 클러스터링 방법, 관련 장치, 프로그램 및 기록매체
US14/532,271 US10037345B2 (en) 2014-03-14 2014-11-04 Clustering method and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410097422.5A CN103914518B (zh) 2014-03-14 2014-03-14 聚类方法及相关装置
CN201410097422.5 2014-03-14

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US14/532,271 Continuation US10037345B2 (en) 2014-03-14 2014-11-04 Clustering method and device

Publications (1)

Publication Number Publication Date
WO2015135276A1 true WO2015135276A1 (zh) 2015-09-17

Family

ID=51040198

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/082876 WO2015135276A1 (zh) 2014-03-14 2014-07-24 聚类方法及相关装置

Country Status (7)

Country Link
EP (1) EP2919165B1 (zh)
JP (1) JP6101399B2 (zh)
KR (1) KR20150117202A (zh)
CN (1) CN103914518B (zh)
MX (1) MX358804B (zh)
RU (1) RU2628167C2 (zh)
WO (1) WO2015135276A1 (zh)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10037345B2 (en) 2014-03-14 2018-07-31 Xiaomi Inc. Clustering method and device
CN103914518B (zh) * 2014-03-14 2017-05-17 小米科技有限责任公司 聚类方法及相关装置
CN104268149A (zh) * 2014-08-28 2015-01-07 小米科技有限责任公司 聚类方法及装置
CN104408130B (zh) * 2014-11-26 2018-04-27 小米科技有限责任公司 图片整理的方法及装置
CN104598544A (zh) * 2014-12-31 2015-05-06 小米科技有限责任公司 聚类分析方法、装置及设备
KR101811962B1 (ko) * 2016-12-07 2017-12-22 울산대학교 산학협력단 비선형 데이터의 클래스 변별성 평가 방법 및 장치
CN109063737A (zh) * 2018-07-03 2018-12-21 Oppo广东移动通信有限公司 图像处理方法、装置、存储介质及移动终端
CN110363382A (zh) * 2019-06-03 2019-10-22 华东电力试验研究院有限公司 全能型乡镇供电所一体化业务融合技术
CN110730270B (zh) * 2019-09-09 2021-09-14 上海斑马来拉物流科技有限公司 一种短信分组方法、装置及计算机存储介质、电子设备
CN110826338B (zh) * 2019-10-28 2022-06-17 桂林电子科技大学 一种单选择门与类间度量的细粒度语义相似识别的方法
CN110826616B (zh) * 2019-10-31 2023-06-30 Oppo广东移动通信有限公司 信息处理方法及装置、电子设备、存储介质
TWI756597B (zh) * 2019-12-10 2022-03-01 晶睿通訊股份有限公司 隊列分析方法與影像監控設備
CN111860700B (zh) * 2020-09-22 2020-12-15 深圳须弥云图空间科技有限公司 一种能耗分类方法、装置、存储介质及设备
CN113255841B (zh) * 2021-07-02 2021-11-16 浙江大华技术股份有限公司 一种聚类方法、聚类装置和计算机可读存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120294540A1 (en) * 2011-05-17 2012-11-22 Microsoft Corporation Rank order-based image clustering
CN103473255A (zh) * 2013-06-06 2013-12-25 中国科学院深圳先进技术研究院 一种数据聚类方法、系统及数据处理设备
TW201407390A (zh) * 2012-08-15 2014-02-16 Acer Inc 資料分群裝置和方法
CN103914518A (zh) * 2014-03-14 2014-07-09 小米科技有限责任公司 聚类方法及相关装置

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10171823A (ja) * 1996-12-09 1998-06-26 Mitsubishi Electric Corp 文書の自動分類方法およびその装置
RU2345414C1 (ru) * 2007-08-10 2009-01-27 Общество с ограниченной ответственностью "Рекогмишн" Способ построения системы индексирования для поиска объектов на цифровых изображениях
US9171071B2 (en) * 2010-03-26 2015-10-27 Nec Corporation Meaning extraction system, meaning extraction method, and recording medium

Also Published As

Publication number Publication date
EP2919165A3 (en) 2015-12-23
RU2628167C2 (ru) 2017-08-15
JP6101399B2 (ja) 2017-03-29
MX358804B (es) 2018-08-29
EP2919165B1 (en) 2018-02-07
JP2016516251A (ja) 2016-06-02
CN103914518A (zh) 2014-07-09
KR20150117202A (ko) 2015-10-19
CN103914518B (zh) 2017-05-17
EP2919165A2 (en) 2015-09-16
RU2015129676A (ru) 2017-04-24
MX2014010879A (es) 2016-08-30

Similar Documents

Publication Publication Date Title
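WO2015135276A1 (zh) Clustering method and related device
US11663468B2 (en) Method and apparatus for training neural network, and storage medium
WO2020029966A1 (zh) Video processing method and device, electronic equipment and storage medium
US9953212B2 (en) Method and apparatus for album display, and storage medium
WO2020048308A1 (zh) Multimedia resource classification method and device, computer equipment and storage medium
US11244228B2 (en) Method and device for recommending video, and computer readable storage medium
WO2021036382A1 (zh) Image processing method and device, electronic equipment and storage medium
WO2017107419A1 (zh) Screen unlocking method, device and terminal
WO2017020476A1 (zh) Method and device for determining associated users
KR101639502B1 (ko) Clustering method, related device, program and recording medium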
RU2639682C2 (ru) Method and device for folding images
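WO2017219484A1 (zh) Method and device for setting identity image
WO2022160849A1 (zh) Video processing method and device, electronic equipment and storage medium
WO2016206295A1 (zh) Character determination method and device
CN111259967A (zh) Image classification and neural network training method, device, equipment and storage medium
TW202036476A (zh) Image processing method and device, electronic equipment and storage medium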
JP2016517110A5 (zh)
US10037345B2 (en) Clustering method and device
US20150262033A1 (en) Method and terminal device for clustering
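CN109325141B (zh) Image retrieval method and device, electronic equipment and storage medium
CN107992893B (zh) Method and device for compressing image feature space
WO2019214234A1 (zh) Input prediction method and device
CN113312475B (zh) Text similarity determination method and device
WO2022116524A1 (zh) Picture recognition method, device, electronic equipment and medium
CN109063001B (zh) Page display method and device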

Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase. Ref document number: MX/A/2014/010879; Country of ref document: MX
ENP Entry into the national phase. Ref document number: 2016506778; Country of ref document: JP; Kind code of ref document: A
WWE WIPO information: entry into national phase. Ref document number: 1020147026527; Country of ref document: KR
ENP Entry into the national phase. Ref document number: 2015129676; Country of ref document: RU; Kind code of ref document: A
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 14885147; Country of ref document: EP; Kind code of ref document: A1
REG Reference to national code. Ref country code: BR; Ref legal event code: B01A; Ref document number: 112014024397; Country of ref document: BR
NENP Non-entry into the national phase. Ref country code: DE
122 EP: PCT application non-entry in European phase. Ref document number: 14885147; Country of ref document: EP; Kind code of ref document: A1
ENP Entry into the national phase. Ref document number: 112014024397; Country of ref document: BR; Kind code of ref document: A2; Effective date: 20140930