CN107423757B

CN107423757B - Clustering processing method and device

Info

Publication number: CN107423757B
Application number: CN201710573089.4A
Authority: CN
Inventors: 陈志军
Original assignee: Beijing Xiaomi Mobile Software Co Ltd
Current assignee: Beijing Xiaomi Mobile Software Co Ltd
Priority date: 2017-07-14
Filing date: 2017-07-14
Publication date: 2020-10-09
Anticipated expiration: 2037-07-14
Also published as: CN107423757A

Abstract

The present disclosure provides a clustering method and apparatus. The method comprises: obtaining the similarity between each element in a first class and each element in a second class to obtain M × N similarities, wherein the first class comprises M elements and the second class comprises N elements; determining a preset number K of similarities in descending order of similarity; calculating, by using the clustering probability between the two elements corresponding to each of the K similarities, a confidence value between the first class and the second class and an analogy probability that the first class and the second class are the same class; and merging the first class and the second class when the confidence value and the analogy probability both meet a preset condition. Because the method takes the clustering probability between elements into account when calculating the confidence value and the analogy probability between classes, clustering accuracy is improved.

Description

Clustering processing method and device

Technical Field

The present disclosure relates to information processing technologies, and in particular, to a clustering method and apparatus.

Background

Clustering refers to the process of dividing a collection of physical or abstract objects into classes composed of similar objects. A cluster produced by clustering is a set of data objects that are similar to one another and dissimilar to the objects in other clusters.

Clustering is widely applied in various fields, such as face recognition and product recommendation. However, existing clustering methods mainly adopt one-to-many and one-to-one modes, and their clustering effect is poor.

Disclosure of Invention

The disclosure provides a clustering method and a clustering device, which are used for improving clustering effect.

According to a first aspect of the embodiments of the present disclosure, there is provided a clustering method, including:

obtaining the similarity between each element in the first class and each element in the second class to obtain M × N similarities, wherein the first class comprises M elements, the second class comprises N elements, and M and N are integers greater than 0;

determining a preset number K of similarities in descending order of similarity, wherein K is an integer greater than 0;

calculating a confidence value between the first class and the second class and an analogy probability that the first class and the second class are the same class by using the clustering probability between the two elements corresponding to each of the K similarities, wherein the clustering probability indicates the probability that the two elements are the same element;

and when the confidence value and the analogy probability both meet a preset condition, merging the first class and the second class.

Optionally, before calculating the confidence value between the first class and the second class and the analogy probability that the first class and the second class are the same class by using the clustering probability between the two elements corresponding to each of the K similarities, the method further includes:

acquiring a set of elements to be processed, and calculating the clustering probability between any two elements in the set of elements to be processed, wherein the set of elements to be processed comprises: all elements in the first class and all elements in the second class.

Optionally, the calculating a clustering probability between any two elements in the set of elements to be processed includes:

calculating the similarity between any two elements in the element set to be processed;

and determining the clustering probability between any two elements according to the similarity between any two elements.

Optionally, the calculating a confidence value between the first class and the second class by using the clustering probability between the two elements corresponding to each of the K similarities includes:

using a formula

to calculate the confidence value D_AB between the first class and the second class, wherein p_i represents the clustering probability between the two elements corresponding to the i-th similarity among the K similarities, and i is greater than 0 and less than or equal to K.

Optionally, the calculating, by using the clustering probability between the two elements corresponding to each of the K similarities, an analogy probability that the first class and the second class are the same class includes:

using a formula

to calculate the analogy probability P_AB that the first class and the second class are the same class, wherein p_i represents the clustering probability between the two elements corresponding to the i-th similarity among the K similarities, and i is greater than 0 and less than or equal to K.

According to a second aspect of the embodiments of the present disclosure, there is provided a cluster processing apparatus including:

an obtaining module configured to obtain the similarity between each element in the first class and each element in the second class to obtain M × N similarities, wherein the first class comprises M elements, the second class comprises N elements, and M and N are integers greater than 0;

a determining module configured to determine a preset number K of similarities in descending order of similarity, wherein K is an integer greater than 0;

a processing module configured to calculate a confidence value between the first class and the second class and an analogy probability that the first class and the second class are the same class by using a clustering probability between two elements corresponding to each of the K similarities, wherein the clustering probability is used to indicate a probability that two elements are the same element;

a clustering module configured to merge the first class and the second class when both the confidence value and the analogy probability satisfy a preset condition.

Optionally, the apparatus further comprises:

a probability obtaining module configured to obtain a set of elements to be processed, and calculate a clustering probability between any two elements in the set of elements to be processed, where the set of elements to be processed includes: all elements in the first class and all elements in the second class.

Optionally, the probability obtaining module includes:

a statistics submodule configured to compile statistics on the similarity between any two elements in the set of elements to be processed;

a determining submodule configured to determine a clustering probability between any two elements according to a similarity between the any two elements.

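Optionally, the processing module is configured to adopt a formula to calculate the confidence value D_AB between the first class and the second class.

Optionally, the processing module is configured to adopt a formula to calculate the analogy probability P_AB that the first class and the second class are the same class.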

According to a third aspect of the embodiments of the present disclosure, there is provided a cluster processing apparatus including:

a processor;

a memory for storing executable instructions;

wherein the processor is configured to:

The technical solution provided by the embodiments of the present disclosure may have the following beneficial effects: the similarity between each element in the first class and each element in the second class is obtained to give M × N similarities; a preset number K of similarities is determined in descending order of similarity; the clustering probability between the two elements corresponding to each of the K similarities is used to calculate a confidence value between the first class and the second class and an analogy probability that the first class and the second class are the same class; and the first class and the second class are merged when the confidence value and the analogy probability both meet a preset condition. Because the method takes the clustering probability between elements into account when calculating the confidence value and the analogy probability between classes, clustering accuracy is improved.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

In order to illustrate the embodiments of the present disclosure or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description show some embodiments of the present disclosure, and those skilled in the art can obtain other drawings from them without inventive effort.

FIG. 1 is a flow diagram illustrating a method of clustering according to an exemplary embodiment;

FIG. 2 is a flow diagram illustrating a method of clustering according to another exemplary embodiment;

FIG. 3 is a schematic structural diagram illustrating a cluster processing apparatus according to an exemplary embodiment;

FIG. 4 is a schematic structural diagram illustrating a cluster processing apparatus according to yet another exemplary embodiment;

FIG. 5 is a schematic structural diagram illustrating a cluster processing apparatus according to still another exemplary embodiment;

FIG. 6 is a schematic structural diagram illustrating a cluster processing apparatus according to another exemplary embodiment;

FIG. 7 is a schematic structural diagram illustrating a cluster processing apparatus according to another exemplary embodiment.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.

The terms "first," "second," "third," and the like in the description and in the claims of the present disclosure are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. In the specification and claims of the present disclosure, "×" denotes multiplication.

In the embodiments of the disclosure, the probability that elements are the same element, the confidence, and the like are introduced into the clustering process, so as to improve the accuracy of clustering.

FIG. 1 is a flow diagram illustrating a method of clustering according to an example embodiment. As shown in fig. 1, the method includes:

In step S101, the similarity between each element in the first class and each element in the second class is obtained, so as to obtain M × N similarities.

The embodiment of the present disclosure is described using two classes as an example. In a specific implementation with more than two classes, the classes may be processed in pairs, each pair being treated as the first class and the second class. It should be noted that the first class and the second class are any two of a plurality of classes to be clustered. In an initial state, each element to be clustered may form a class of its own. The similarity between every two elements is then calculated. Specifically, the similarity can be represented by a distance: the smaller the distance, the greater the similarity. Elements whose distance is smaller than a preset threshold are aggregated into one class, so that a plurality of classes is formed.
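The initial grouping described above can be sketched as follows. This is only an illustrative sketch: the function name `initial_classes`, the use of Euclidean distance, and the union-find bookkeeping are assumptions made for the example and are not specified by the disclosure. Linking is transitive here, so two elements end up in the same class if a chain of below-threshold pairs connects them.

```python
from itertools import combinations
from math import dist


def initial_classes(points, threshold):
    """Group points whose pairwise distance is below `threshold`.

    Every point starts as its own class; classes are joined with a small
    union-find structure. Returns sorted lists of point indices.
    """
    parent = list(range(len(points)))  # each element starts as its own class

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) < threshold:  # smaller distance = more similar
            parent[find(i)] = find(j)

    classes = {}
    for i in range(len(points)):
        classes.setdefault(find(i), []).append(i)
    return sorted(classes.values())
```

For example, with four two-dimensional points forming two tight pairs and a threshold of 1, the function returns two classes of two indices each.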

The first class comprises M elements, the second class comprises N elements, and M and N are integers greater than 0.

Elements in this disclosure may refer to anything that needs to be clustered, such as faces or fruit, without limitation herein.
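As a minimal sketch of step S101, the M × N cross-class similarities can be enumerated as below. The disclosure does not prescribe a similarity measure; cosine similarity over feature vectors and the names `cosine_similarity` and `cross_similarities` are assumptions chosen for illustration.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine similarity of two non-zero feature vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm


def cross_similarities(first_class, second_class):
    """Return (similarity, i, j) for every pair of one element from each class.

    i indexes the first class (M elements) and j the second class (N elements),
    so the result always has M * N entries.
    """
    return [
        (cosine_similarity(a, b), i, j)
        for i, a in enumerate(first_class)
        for j, b in enumerate(second_class)
    ]
```

Keeping the indices next to each similarity makes it possible to look up the corresponding pair of elements in the later steps.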

In step S102, a preset number K of similarities is determined in descending order of similarity.

K is an integer greater than 0 and is a preset value; that is, the preset number of similarities is selected in descending order of similarity. Specifically, the K similarities may be added to a similarity set E, and the elements corresponding to each of the K similarities may be added to a set V.

The set V and the set E may also be updated continuously during the clustering process. After the similarities are sorted, suppose the pair with the highest similarity is element A and element B. It is then determined whether at least one of element A and element B is already in the set V. If so, the next pair of elements is examined; if not, element A and element B are added to the set V.
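One possible reading of the selection in step S102 is sketched below. The function name `select_top_k` and the tuple layout `(similarity, i, j)` are assumptions for this example; the list `selected` plays the role of the set E and the set `used` plays the role of the set V.

```python
def select_top_k(similarities, k):
    """Pick up to k similarities, largest first, without reusing an element.

    `similarities` is a list of (similarity, i, j) tuples, where i indexes the
    first class and j indexes the second class.
    """
    selected = []  # role of set E: the chosen similarities
    used = set()   # role of set V: elements already paired
    for sim, i, j in sorted(similarities, reverse=True):
        if len(selected) == k:
            break
        a, b = ("first", i), ("second", j)
        if a in used or b in used:
            continue  # at least one element is already in V: try the next pair
        used.update((a, b))
        selected.append((sim, i, j))
    return selected, used
```

Under this reading, fewer than K similarities may be returned when the classes run out of unused elements.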

In step S103, a confidence value between the first class and the second class and an analogy probability that the first class and the second class are the same class are calculated by using the clustering probability between the two elements corresponding to each of the K similarities.

The clustering probability is used to indicate the probability that two elements are the same element.

Each of the K selected similarities is a similarity between two elements. For example, if a certain similarity is the similarity between element A and element B, the clustering probability of element A and element B is obtained. In the same way, the clustering probability between the two elements corresponding to each of the K similarities is obtained, and the confidence value between the first class and the second class and the analogy probability that the first class and the second class are the same class are then calculated.

Optionally, K may be equal to M, which may indicate that the "probability value of the first class with respect to the second class" and the "probability value of the second class with respect to the first class" are the same.

It should be noted that the confidence value is an information entropy and is used to indicate the confidence that two elements are the same element. For an information source, what is considered is not the uncertainty of a single symbol occurring but the average uncertainty over all symbols the source may produce. This average uncertainty is the statistical average of the individual symbol uncertainties and may be referred to as the information entropy value.

In step S104, when both the confidence value and the analogy probability satisfy the preset condition, the first class and the second class are merged.

In this embodiment, the similarity between each element in the first class and each element in the second class is obtained to give M × N similarities; a preset number K of similarities is determined in descending order of similarity; the clustering probability between the two elements corresponding to each of the K similarities is used to calculate a confidence value between the first class and the second class and an analogy probability that the first class and the second class are the same class; and the first class and the second class are merged when the confidence value and the analogy probability both satisfy a preset condition. Because the method takes the clustering probability between elements into account when calculating the confidence value and the analogy probability between classes, clustering accuracy is improved.

The clustering probability between any two elements in the element set to be processed can be obtained from a preset database, and can also be obtained through temporary calculation.

Optionally, before calculating the confidence value between the first class and the second class and the analogy probability that the first class and the second class are the same class by using the clustering probability between the two elements corresponding to each of the K similarities, a large number of element samples may be collected, the clustering probability between any two elements may be calculated, and the calculated clustering probabilities may be stored in a preset database.

Specifically, the method may include: acquiring a set of elements to be processed, and calculating the clustering probability between any two elements in the set of elements to be processed. In the specific calculation, the elements can be combined in pairs and the clustering probability of each pair calculated, until the clustering probability has been calculated for every pair of elements in the set of elements to be processed.

The set of elements to be processed may include a plurality of element samples, and the element samples include the same elements and different elements. The set of elements to be processed includes, but is not limited to, all elements in the first class and all elements in the second class.

Fig. 2 is a flow chart illustrating a method of clustering according to another exemplary embodiment.

Optionally, as shown in fig. 2, calculating a clustering probability between any two elements in the set of elements to be processed may include:

In step S201, statistics are compiled on the similarity between any two elements in the set of elements to be processed.

In step S202, a clustering probability between any two elements is determined according to a similarity between any two elements.

Specifically, when the similarity between two elements is calculated in a pairwise combination, any similarity calculation method may be adopted, which is not limited in this disclosure. For example, if the elements are all faces, a face recognition algorithm can be adopted to extract the features of each face, and then the similarity between the two faces is calculated.

The probability that the two elements are the same element is then calculated from the statistics of the similarity. For example, waveform statistics may be used; that is, a similarity curve is generated, on which a higher similarity corresponds to a higher probability that two elements are the same element, that is, a higher clustering probability value.

In the statistical process, the similarity value can be expressed as a distance value: the closer the distance between two elements, the higher their similarity. On the resulting curve, the closer the distance, the higher the clustering probability.
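One way to turn similarity statistics into a clustering probability is a simple histogram estimate, sketched below. This is an assumption made for illustration, not the statistical procedure of the disclosure: it presumes labelled sample pairs, similarities scaled to the range 0 to 1, and hypothetical names `build_probability_table` and `clustering_probability`.

```python
def build_probability_table(labelled_pairs, n_bins=10):
    """Estimate P(same element | similarity bin) from labelled sample pairs.

    `labelled_pairs` is an iterable of (similarity, is_same), with similarity
    in [0, 1] and is_same a bool.
    """
    same = [0] * n_bins
    total = [0] * n_bins
    for sim, is_same in labelled_pairs:
        b = min(int(sim * n_bins), n_bins - 1)  # similarity 1.0 goes in the last bin
        total[b] += 1
        same[b] += bool(is_same)
    # Bins with no samples get None rather than a guessed value.
    return [s / t if t else None for s, t in zip(same, total)]


def clustering_probability(table, sim):
    """Look up the estimated clustering probability for one similarity."""
    return table[min(int(sim * len(table)), len(table) - 1)]
```

A table built once from collected samples could be stored and reused, in the spirit of the preset database mentioned earlier.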

Specifically, calculating the confidence value between the first class and the second class by using the clustering probability between the two elements corresponding to each of the K similarities may be: using a formula

to calculate the confidence value D_AB between the first class and the second class.

Here p_i represents the clustering probability between the two elements corresponding to the i-th similarity among the K similarities, where i is greater than 0 and less than or equal to K.

-log₂ p_i indicates the uncertainty that the two elements corresponding to the i-th similarity are the same element.

A smaller D_AB means less uncertainty that the elements in the first class and the elements in the second class are the same elements.
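The per-pair uncertainty term -log₂ p_i stated above can be computed directly, as in the sketch below. How the K terms are combined into D_AB is not reproduced in this text, so the arithmetic mean used in `mean_uncertainty` is purely an assumption for illustration (motivated by the description of entropy as a statistical average); both function names are hypothetical.

```python
from math import log2


def pair_uncertainty(p):
    """Uncertainty -log2(p) for a pair with clustering probability p (0 < p <= 1)."""
    return -log2(p)


def mean_uncertainty(probabilities):
    """ASSUMPTION: combine the K per-pair uncertainties by their arithmetic mean.

    This is a stand-in aggregate, not the formula of the disclosure.
    """
    values = [pair_uncertainty(p) for p in probabilities]
    return sum(values) / len(values)
```

A clustering probability closer to 1 gives an uncertainty closer to 0, which is consistent with the statement that a smaller value means less uncertainty.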

Similarly, calculating the analogy probability that the first class and the second class are the same class by using the clustering probability between the two elements corresponding to each of the K similarities may be: using a formula

On the basis of the above embodiment, a confidence threshold and an analogy probability threshold can be set separately. Merging the first class and the second class when both the confidence value and the analogy probability satisfy the preset condition may be: merging the first class and the second class when the confidence value is greater than a first preset threshold and the analogy probability is greater than a second preset threshold.
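The two-threshold merge condition can be sketched as follows. The function name `merge_if_confident` and the representation of classes as lists are assumptions for this example; the confidence value and the analogy probability are taken as already computed inputs.

```python
def merge_if_confident(first_class, second_class, confidence, analogy_probability,
                       confidence_threshold, probability_threshold):
    """Return the merged class, or None when either threshold is not exceeded."""
    if (confidence > confidence_threshold
            and analogy_probability > probability_threshold):
        return list(first_class) + list(second_class)
    return None
```

Requiring both comparisons to pass means that a high analogy probability alone is not enough to merge two classes.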

Fig. 3 is a schematic structural diagram illustrating a cluster processing apparatus according to an exemplary embodiment. The embodiment of the disclosure provides a cluster processing apparatus, which may be integrated in a terminal or may itself be a terminal. The terminal here may be a computer, a server, or the like, and is not limited herein. As shown in fig. 3, the apparatus includes: an obtaining module 301, a determining module 302, a processing module 303 and a clustering module 304, wherein:

The obtaining module 301 is configured to obtain the similarity between each element in the first class and each element in the second class to obtain M × N similarities, where the first class includes M elements, the second class includes N elements, and M and N are integers greater than 0.

The determining module 302 is configured to determine a preset number K of similarities in descending order of similarity, where K is an integer greater than 0.

The processing module 303 is configured to calculate a confidence value between the first class and the second class and an analogy probability that the first class and the second class are the same class by using the clustering probability between the two elements corresponding to each of the K similarities, where the clustering probability is used to indicate the probability that two elements are the same element.

A clustering module 304 configured to merge the first class and the second class when both the confidence value and the analogy probability satisfy a preset condition.

In the cluster processing apparatus provided in this embodiment, the similarity between each element in the first class and each element in the second class is obtained to give M × N similarities; a preset number K of similarities is determined in descending order of similarity; the clustering probability between the two elements corresponding to each of the K similarities is used to calculate a confidence value between the first class and the second class and an analogy probability that the first class and the second class are the same class; and the first class and the second class are merged when both the confidence value and the analogy probability satisfy a preset condition. Because the clustering probability between elements is taken into account when calculating the confidence value and the analogy probability between classes, clustering accuracy is improved.

Fig. 4 is a schematic structural diagram illustrating a cluster processing apparatus according to still another exemplary embodiment. As shown in fig. 4, on the basis of fig. 3, the apparatus may further include: a probability acquisition module 401.

A probability obtaining module 401, configured to obtain a to-be-processed element set, and calculate a clustering probability between any two elements in the to-be-processed element set, where the to-be-processed element set includes: all elements in the first class and all elements in the second class.

Fig. 5 is a schematic structural diagram illustrating a cluster processing apparatus according to still another exemplary embodiment. Optionally, as shown in fig. 5, on the basis of fig. 4, the probability obtaining module 401 may include: a statistics submodule 501 and a determination submodule 502, wherein:

The statistics submodule 501 is configured to compile statistics on the similarity between any two elements in the set of elements to be processed.

A determining submodule 502 configured to determine a clustering probability between any two elements according to a similarity between the any two elements.

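Further, the processing module 303 may be specifically configured to employ a formula to calculate the confidence value D_AB between the first class and the second class.

Optionally, the processing module 303 may be specifically configured to employ a formula to calculate the analogy probability P_AB that the first class and the second class are the same class.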

Fig. 6 is a schematic structural diagram illustrating a cluster processing apparatus according to another exemplary embodiment. The clustering device may be integrated in a terminal or may be a terminal. The terminal here may refer to a computer, a server, and the like, and is not limited herein.

As shown in fig. 6, the apparatus includes: a processor 601 and a memory 602 for storing executable instructions. Wherein the processor 601 is coupled to the memory 602.

The processor 601 is configured to:

To sum up, in the cluster processing apparatus provided in this embodiment, the similarity between each element in the first class and each element in the second class is obtained to give M × N similarities; a preset number K of similarities is determined in descending order of similarity; the clustering probability between the two elements corresponding to each of the K similarities is used to calculate a confidence value between the first class and the second class and an analogy probability that the first class and the second class are the same class; and the first class and the second class are merged when both the confidence value and the analogy probability satisfy a preset condition. Because the clustering probability between elements is taken into account when calculating the confidence value and the analogy probability between classes, clustering accuracy is improved.

Referring to fig. 7, the cluster processing apparatus 700 may include one or more of the following components: a processing component 702, a memory 704, a power component 706, a multimedia component 708, an audio component 710, an input/output (I/O) interface 712, a sensor component 714, and a communication component 716.

The processing component 702 generally controls the overall operation of the cluster processing apparatus 700, such as operations associated with display, data communication, camera operations, and recording operations. The processing components 702 may include one or more processors 720 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 702 may include one or more modules that facilitate interaction between the processing component 702 and other components. For example, the processing component 702 may include a multimedia module to facilitate interaction between the multimedia component 708 and the processing component 702.

The memory 704 is configured to store various types of data to support operations at the cluster processing apparatus 700. Examples of such data include instructions for any application or method operating on the cluster processing apparatus 700, contact data, phonebook data, messages, pictures, videos, and so forth. The Memory 704 may be implemented by any type of volatile or non-volatile Memory device or combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic Memory, flash Memory, magnetic disk or optical disk.

The power component 706 provides power to the various components of the cluster processing apparatus 700. The power components 706 can include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the cluster processing device 700.

The multimedia component 708 comprises a screen providing an output interface between the cluster processing apparatus 700 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 708 includes a front facing camera and/or a rear facing camera. When the cluster processing device 700 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.

The audio component 710 is configured to output and/or input audio signals. For example, the audio component 710 may include a Microphone (MIC) configured to receive external audio signals when the cluster processing apparatus 700 is in an operating mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 704 or transmitted via the communication component 716. In some embodiments, audio component 710 also includes a speaker for outputting audio signals.

The I/O interface 712 provides an interface between the processing component 702 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.

The sensor component 714 includes one or more sensors for providing status assessments of various aspects of the cluster processing apparatus 700. For example, the sensor component 714 may detect an open/closed state of the cluster processing apparatus 700 and the relative positioning of components, such as a display and a keypad of the cluster processing apparatus 700. The sensor component 714 may also detect a change in position of the cluster processing apparatus 700 or of one of its components, the presence or absence of user contact with the cluster processing apparatus 700, the orientation or acceleration/deceleration of the cluster processing apparatus 700, and a change in temperature of the cluster processing apparatus 700. The sensor component 714 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor component 714 may also include a photosensor, such as a Complementary Metal Oxide Semiconductor (CMOS) or Charge-coupled Device (CCD) photosensitive imaging element, for use in imaging applications. In some embodiments, the sensor component 714 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 716 is configured to facilitate communication between the cluster processing apparatus 700 and other devices in a wired or wireless manner. The cluster processing device 700 may access a wireless network based on a communication standard, such as Wi-Fi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 716 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the Communication component 716 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the cluster processing device 700 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the above-described methods.

In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions is also provided, such as the memory 704 comprising instructions that are executable by the processor 720 of the cluster processing device 700 to perform the above-described method. For example, the non-transitory computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.

A non-transitory computer-readable storage medium stores instructions that, when executed by a processor of the cluster processing device 700, enable the cluster processing device 700 to perform the above-described method.
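As an illustration only, the following Python sketch shows one way such instructions could organize the merge decision: compute the M × N cross-class similarities, keep the K largest, map each to an inter-element clustering probability, and merge when both aggregate values meet a preset condition. The function names, the threshold parameters, and in particular the two aggregation rules (minimum for the confidence value, mean for the analogy probability) are hypothetical placeholders chosen for this sketch; they are not the formulas of the disclosure, which are not reproduced in this passage.

```python
from itertools import product


def top_k_pairs(first, second, similarity, k):
    """Return the K most similar cross-class pairs as (similarity, a, b)."""
    # M x N similarities: one per pair (a in first class, b in second class).
    scored = [(similarity(a, b), a, b) for a, b in product(first, second)]
    # Order from largest to smallest similarity and keep the first K.
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def should_merge(first, second, similarity, pair_probability, k,
                 conf_threshold, prob_threshold):
    """Decide whether two classes should be merged.

    pair_probability maps a similarity to an inter-element clustering
    probability. Both aggregation rules below are ASSUMED placeholders,
    not the formulas of the disclosure.
    """
    top = top_k_pairs(first, second, similarity, k)
    probs = [pair_probability(sim) for sim, _, _ in top]
    if not probs:
        return False  # nothing to compare, so do not merge

    confidence = min(probs)                  # ASSUMPTION: placeholder rule
    analogy_probability = sum(probs) / len(probs)  # ASSUMPTION: placeholder rule

    # ASSUMPTION: the "preset condition" is modeled as two lower thresholds.
    return confidence >= conf_threshold and analogy_probability >= prob_threshold


if __name__ == "__main__":
    # Toy data: each "face" is a single number; similarity is 1 - distance.
    sim = lambda a, b: 1.0 - abs(a - b)
    prob = lambda s: max(0.0, min(1.0, s))  # hypothetical similarity-to-probability map
    print(should_merge([0.0, 0.1], [0.05, 0.9], sim, prob,
                       k=2, conf_threshold=0.9, prob_threshold=0.9))
```

Restricting the decision to the K largest similarities means that a few strongly matching pairs drive the result, while weakly matching pairs in large classes do not dilute it. The appropriate value of K and the thresholds are left open here.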

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A clustering method, applied to face recognition, comprising:

when the confidence value and the analogy probability both meet a preset condition, combining the first class and the second class;

before calculating the confidence value between the first class and the second class and the analogy probability that the first class and the second class are the same class by using the two inter-element clustering probabilities corresponding to each of the K similarities, the method further includes:

acquiring a set of elements to be processed, and calculating the clustering probability between any two elements in the set of elements to be processed, wherein the set of elements to be processed comprises: all elements in the first class and all elements in the second class;

the calculating the clustering probability between any two elements in the element set to be processed comprises:

determining the clustering probability between any two elements according to the similarity between any two elements; the element is a human face;

the calculating the confidence value between the first class and the second class by using the clustering probability between two elements corresponding to each similarity in the K similarities comprises:

using a formula

2. The method according to claim 1, wherein the calculating the analogy probability that the first class and the second class are the same class by using the two inter-element clustering probabilities corresponding to each of the K similarities comprises:

using a formula

3. A cluster processing device, applied to face recognition, comprising:

a clustering module configured to merge the first class and the second class when both the confidence value and the analogy probability satisfy a preset condition;

further comprising:

a probability obtaining module configured to obtain a set of elements to be processed, and calculate a clustering probability between any two elements in the set of elements to be processed, where the set of elements to be processed includes: all elements in the first class and all elements in the second class;

the probability obtaining module comprises:

a determining submodule configured to determine a clustering probability between any two elements according to a similarity between the any two elements; the element is a human face;

the processing module configured to employ a formula

4. The device of claim 3, wherein the processing module is configured to apply a formula

5. A clustering apparatus, comprising:

a processor;

a memory for storing executable instructions;

wherein the processor is configured to:

the processor is further configured to:

before calculating the confidence value between the first class and the second class and the analogy probability that the first class and the second class are the same class by adopting the two inter-element clustering probabilities corresponding to each of the K similarities,

the processor is further configured to:

using a formula