CN107423757B - Clustering processing method and device - Google Patents

Info

Publication number
CN107423757B
Authority
CN
China
Prior art keywords
class
elements
probability
clustering
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710573089.4A
Other languages
Chinese (zh)
Other versions
CN107423757A (en)
Inventor
陈志军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaomi Mobile Software Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd
Priority to CN201710573089.4A
Publication of CN107423757A
Application granted
Publication of CN107423757B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • G06F18/232: Non-hierarchical techniques
    • G06F18/2321: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a clustering method and apparatus. The method comprises: obtaining the similarity between each element in a first class and each element in a second class to obtain M × N similarities, wherein the first class comprises M elements and the second class comprises N elements; selecting a preset number K of similarities in descending order of similarity; calculating a confidence value between the first class and the second class, and an analogy probability that the first class and the second class are the same class, using the clustering probability between the two elements corresponding to each of the K similarities; and merging the first class and the second class when the confidence value and the analogy probability both satisfy a preset condition. Because the method takes the clustering probability between different elements into account when calculating the confidence value and the analogy probability between classes, clustering accuracy is greatly improved.

Description

Clustering processing method and device
Technical Field
The present disclosure relates to information processing technologies, and in particular, to a clustering method and apparatus.
Background
Clustering refers to the process of dividing a collection of physical or abstract objects into classes composed of similar objects. A cluster produced by clustering is a set of data objects that are similar to the other objects in the same cluster and different from the objects in other clusters.
Clustering is widely applied in various fields, such as face recognition and product recommendation. However, existing clustering methods mainly compare elements in a one-to-many or one-to-one manner, and their clustering results are poor.
Disclosure of Invention
The present disclosure provides a clustering method and apparatus for improving the clustering result.
According to a first aspect of the embodiments of the present disclosure, there is provided a clustering method, including:
obtaining the similarity between each element in the first class and each element in the second class to obtain M × N similarities, wherein the first class comprises M elements, the second class comprises N elements, and M and N are integers greater than 0;
selecting a preset number K of similarities in descending order of similarity, wherein K is an integer greater than 0;
calculating a confidence value between the first class and the second class, and an analogy probability that the first class and the second class are the same class, using the clustering probability between the two elements corresponding to each of the K similarities, wherein the clustering probability indicates the probability that the two elements are the same element;
and merging the first class and the second class when the confidence value and the analogy probability both satisfy a preset condition.
Optionally, before calculating the confidence value between the first class and the second class and the analogy probability that the first class and the second class are the same class using the clustering probability between the two elements corresponding to each of the K similarities, the method further includes:
acquiring a set of elements to be processed, and calculating the clustering probability between any two elements in the set of elements to be processed, wherein the set of elements to be processed comprises all elements in the first class and all elements in the second class.
Optionally, calculating the clustering probability between any two elements in the set of elements to be processed includes:
calculating the similarity between any two elements in the set of elements to be processed;
and determining the clustering probability between the two elements according to the similarity between them.
Optionally, calculating the confidence value between the first class and the second class using the clustering probability between the two elements corresponding to each of the K similarities includes:
using the formula
Figure GDA0002623015700000021
to calculate the confidence value $D_{AB}$ between the first class and the second class, where $p_i$ denotes the clustering probability between the two elements corresponding to the i-th of the K similarities, and 0 < i ≤ K.
Optionally, calculating the analogy probability that the first class and the second class are the same class using the clustering probability between the two elements corresponding to each of the K similarities includes:
using the formula
Figure GDA0002623015700000022
to calculate the analogy probability $P_{AB}$ that the first class and the second class are the same class, where $p_i$ denotes the clustering probability between the two elements corresponding to the i-th of the K similarities, and 0 < i ≤ K.
According to a second aspect of the embodiments of the present disclosure, there is provided a cluster processing apparatus including:
an obtaining module configured to obtain the similarity between each element in the first class and each element in the second class to obtain M × N similarities, wherein the first class comprises M elements, the second class comprises N elements, and M and N are integers greater than 0;
a determining module configured to select a preset number K of similarities in descending order of similarity, wherein K is an integer greater than 0;
a processing module configured to calculate a confidence value between the first class and the second class, and an analogy probability that the first class and the second class are the same class, using the clustering probability between the two elements corresponding to each of the K similarities, wherein the clustering probability indicates the probability that the two elements are the same element;
a clustering module configured to merge the first class and the second class when the confidence value and the analogy probability both satisfy a preset condition.
Optionally, the apparatus further comprises:
a probability obtaining module configured to obtain a set of elements to be processed, and calculate a clustering probability between any two elements in the set of elements to be processed, where the set of elements to be processed includes: all elements in the first class and all elements in the second class.
Optionally, the probability obtaining module includes:
a statistics submodule configured to calculate the similarity between any two elements in the set of elements to be processed;
a determining submodule configured to determine a clustering probability between any two elements according to a similarity between the any two elements.
Optionally, the processing module is configured to use the formula
Figure GDA0002623015700000031
to calculate the confidence value $D_{AB}$ between the first class and the second class, where $p_i$ denotes the clustering probability between the two elements corresponding to the i-th of the K similarities, and 0 < i ≤ K.
Optionally, the processing module is configured to use the formula
Figure GDA0002623015700000032
to calculate the analogy probability $P_{AB}$ that the first class and the second class are the same class, where $p_i$ denotes the clustering probability between the two elements corresponding to the i-th of the K similarities, and 0 < i ≤ K.
According to a third aspect of the embodiments of the present disclosure, there is provided a cluster processing apparatus including:
a processor;
a memory for storing executable instructions;
wherein the processor is configured to:
obtain the similarity between each element in the first class and each element in the second class to obtain M × N similarities, wherein the first class comprises M elements, the second class comprises N elements, and M and N are integers greater than 0;
select a preset number K of similarities in descending order of similarity, wherein K is an integer greater than 0;
calculate a confidence value between the first class and the second class, and an analogy probability that the first class and the second class are the same class, using the clustering probability between the two elements corresponding to each of the K similarities, wherein the clustering probability indicates the probability that the two elements are the same element;
and merge the first class and the second class when the confidence value and the analogy probability both satisfy a preset condition.
The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects: the similarity between each element in the first class and each element in the second class is obtained to give M × N similarities; a preset number K of similarities are selected in descending order of similarity; the clustering probability between the two elements corresponding to each of the K similarities is used to calculate a confidence value between the first class and the second class and an analogy probability that the two classes are the same class; and the first class and the second class are merged when the confidence value and the analogy probability both satisfy a preset condition. Because the method takes the clustering probability between different elements into account when calculating the confidence value and the analogy probability between classes, clustering accuracy is greatly improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
To illustrate the embodiments of the present disclosure and the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present disclosure; those skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a flow diagram illustrating a method of clustering according to an exemplary embodiment;
FIG. 2 is a flow diagram illustrating a method of clustering according to another exemplary embodiment;
FIG. 3 is a schematic structural diagram illustrating a cluster processing apparatus according to an exemplary embodiment;
FIG. 4 is a schematic structural diagram illustrating a cluster processing apparatus according to yet another exemplary embodiment;
FIG. 5 is a schematic structural diagram illustrating a cluster processing apparatus according to still another exemplary embodiment;
FIG. 6 is a schematic structural diagram illustrating a cluster processing apparatus according to another exemplary embodiment;
FIG. 7 is a schematic structural diagram illustrating a cluster processing apparatus according to another exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The terms "first," "second," "third," and the like in the description and in the claims of the present disclosure are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. In the specification and claims of the present disclosure, "×" denotes multiplication.
In the embodiments of the present disclosure, the probability that two elements are the same element, the confidence, and related quantities are introduced into the clustering process to ensure clustering accuracy.
FIG. 1 is a flow diagram illustrating a method of clustering according to an example embodiment. As shown in fig. 1, the method includes:
In step S101, the similarity between each element in the first class and each element in the second class is obtained, giving M × N similarities.
The embodiment of the present disclosure is described using two classes as an example; in a specific implementation with more than two classes, the classes may be processed in pairs in the same way as the first class and the second class. The first class and the second class are any two of the classes to be clustered. In the initial state, each element to be clustered may be treated as its own class. The similarity between every two elements is then calculated; the similarity may be represented by a distance, where a smaller distance means a greater similarity. Elements whose distance is smaller than a preset threshold are grouped into one class, forming a plurality of classes.
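The initial grouping described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `initial_classes`, the use of Euclidean distance, the union-find bookkeeping, and the threshold value are all hypothetical choices that the text does not specify.

```python
import math
from itertools import combinations


def initial_classes(points, threshold):
    """Start with one class per element, then group elements closer than threshold."""
    parent = list(range(len(points)))  # each element starts as its own class

    def find(i):
        # Follow parent links to the class representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        # A smaller distance stands for a greater similarity.
        if math.dist(points[i], points[j]) < threshold:
            parent[find(i)] = find(j)

    classes = {}
    for i in range(len(points)):
        classes.setdefault(find(i), []).append(i)
    return sorted(classes.values())


# Hypothetical 2-D feature vectors; indices identify the elements.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
result = initial_classes(points, threshold=0.5)
```

Any two of the resulting classes can then play the roles of the first class and the second class in the steps that follow.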
The first class comprises M elements and the second class comprises N elements, where M and N are integers greater than 0.
Elements in this disclosure may refer to anything that needs to be clustered, such as: human face, fruit, etc., without limitation herein.
In step S102, a preset number K of similarities are selected in descending order of similarity.
K is an integer greater than 0 and is a preset value; that is, the preset number of similarities is selected in order of similarity from largest to smallest. Specifically, the K similarities may be added to a similarity set E, and the elements corresponding to each of the K similarities may be added to a set V.
The set V and the set E may also be updated continuously throughout the clustering process. After the similarities are sorted, suppose the pair with the highest similarity is element A and element B. It is determined whether at least one of element A and element B is already in the set V: if so, the next pair of elements is examined; if not, element A and element B are added to the set V.
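One possible reading of the selection in step S102, including the update of the sets E and V, is sketched below. The function name `select_top_pairs`, the dictionary layout of the input, and the choice to stop once K pairs have been collected are assumptions made for illustration; the text does not fix these details.

```python
def select_top_pairs(similarities, k):
    """similarities maps (a, b) -> similarity; returns (E, V) as (list, set)."""
    # Sort the cross-class pairs from the largest similarity to the smallest.
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    selected = []  # set E: the chosen (pair, similarity) entries
    used = set()   # set V: elements that already belong to a chosen pair
    for (a, b), sim in ranked:
        if len(selected) == k:
            break
        if a in used or b in used:
            continue  # at least one element is already in V: examine the next pair
        selected.append(((a, b), sim))
        used.update((a, b))
    return selected, used


# Hypothetical similarities between a first class {a1, a2} and a second class {b1, b2, b3}.
similarities = {
    ("a1", "b1"): 0.9, ("a1", "b2"): 0.8, ("a1", "b3"): 0.3,
    ("a2", "b1"): 0.7, ("a2", "b2"): 0.6, ("a2", "b3"): 0.2,
}
E, V = select_top_pairs(similarities, k=2)
```

Under this reading each element contributes to at most one selected pair, so a single very similar element cannot dominate all K similarities.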
In step S103, a confidence value between the first class and the second class, and an analogy probability that the first class and the second class are the same class, are calculated using the clustering probability between the two elements corresponding to each of the K similarities.
The clustering probability indicates the probability that two elements are the same element.
Each of the K selected similarities is a similarity between two elements. For example, if a given similarity is the similarity between element A and element B, the clustering probability of element A and element B is obtained. Proceeding in the same way yields the clustering probability between the two elements corresponding to each of the K similarities, from which the confidence value between the first class and the second class and the analogy probability that the two classes are the same class are calculated.
Optionally, K may be equal to M, which may indicate that the "probability value of the first class to the second class" and the "probability value of the second class to the first class" are the same.
It should be noted that the confidence value, i.e., the information entropy, indicates the confidence that two elements are the same element. For an information source, what is considered is not the uncertainty of a single symbol but the average uncertainty over all possible outcomes of the source. This average uncertainty is the statistical average of the uncertainties of the individual symbols and may be referred to as the information entropy.
In step S104, when the confidence value and the analogy probability both satisfy the preset condition, the first class and the second class are merged.
In this embodiment, the similarity between each element in the first class and each element in the second class is obtained to give M × N similarities; a preset number K of similarities are selected in descending order of similarity; the clustering probability between the two elements corresponding to each of the K similarities is used to calculate a confidence value between the first class and the second class and an analogy probability that the two classes are the same class; and the first class and the second class are merged when the confidence value and the analogy probability both satisfy a preset condition. Because the method takes the clustering probability between different elements into account when calculating the confidence value and the analogy probability between classes, clustering accuracy is greatly improved.
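Steps S101 to S103 can be put together in a short sketch. The formula images for the confidence value and the analogy probability are not reproduced in this text, so the two aggregation functions below (`summed_uncertainty` and `mean_probability`) are placeholders chosen for illustration, not the patent's formulas; the function name `class_pair_scores` and the dictionary inputs are likewise hypothetical.

```python
import heapq
import math


def class_pair_scores(first, second, similarity, probability, k,
                      confidence_fn, analogy_fn):
    """Return (confidence, analogy) for two classes, following steps S101 to S103."""
    # S101: one similarity per cross-class pair, M x N in total.
    scored = [(similarity[(a, b)], a, b) for a in first for b in second]
    # S102: keep the K largest similarities.
    top_k = heapq.nlargest(k, scored, key=lambda item: item[0])
    # S103: look up the clustering probability of each selected pair and aggregate.
    probs = [probability[(a, b)] for _, a, b in top_k]
    return confidence_fn(probs), analogy_fn(probs)


# Placeholder aggregations (assumptions, not the patent's formulas).
def summed_uncertainty(probs):
    return sum(-math.log2(p) for p in probs)


def mean_probability(probs):
    return sum(probs) / len(probs)


# Hypothetical similarities and clustering probabilities for a 2 x 2 example.
similarity = {("a1", "b1"): 0.9, ("a1", "b2"): 0.4,
              ("a2", "b1"): 0.7, ("a2", "b2"): 0.2}
probability = {("a1", "b1"): 0.5, ("a1", "b2"): 0.1,
               ("a2", "b1"): 0.25, ("a2", "b2"): 0.05}

confidence, analogy = class_pair_scores(
    ["a1", "a2"], ["b1", "b2"], similarity, probability, k=2,
    confidence_fn=summed_uncertainty, analogy_fn=mean_probability)
```

Step S104 is then a test of the two returned values against the preset condition. Passing the aggregations in as parameters keeps the sketch usable with whatever formulas an implementation actually adopts.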
The clustering probability between any two elements in the set of elements to be processed may be obtained from a preset database or calculated on the fly.
Optionally, before the confidence value between the first class and the second class and the analogy probability that the first class and the second class are the same class are calculated using the clustering probability between the two elements corresponding to each of the K similarities, a large number of element samples may be collected, the clustering probability between any two elements may be calculated, and the calculated clustering probabilities may be stored in a preset database.
Specifically, the method may include: acquiring a set of elements to be processed, and calculating the clustering probability between any two elements in the set. In the specific calculation, the elements may be combined in pairs and the clustering probability calculated for each pair, until a clustering probability has been calculated for every pair of elements in the set.
The set of elements to be processed may include a plurality of element samples, which include both identical elements and different elements. The set includes, but is not limited to, all elements in the first class and all elements in the second class.
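The pairwise precomputation can be sketched as below, with an in-memory dictionary standing in for the preset database. The function name `build_probability_table`, the numeric elements, and both callables (`similarity` and `to_probability`) are hypothetical; in particular, the identity mapping from similarity to probability is only a stand-in.

```python
from itertools import combinations


def build_probability_table(elements, similarity, to_probability):
    """Compute the clustering probability for every unordered pair of elements."""
    table = {}
    for a, b in combinations(elements, 2):
        # frozenset keys make the lookup independent of the order of the pair.
        table[frozenset((a, b))] = to_probability(similarity(a, b))
    return table


# Hypothetical elements and stand-in functions.
elements = [1, 2, 4]
table = build_probability_table(
    elements,
    similarity=lambda a, b: 1.0 / (1.0 + abs(a - b)),
    to_probability=lambda s: s,
)
```

The elements must be hashable for this layout to work; a real implementation would more likely key the table by element identifiers.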
Fig. 2 is a flow chart illustrating a method of clustering according to another exemplary embodiment.
Optionally, as shown in fig. 2, calculating a clustering probability between any two elements in the set of elements to be processed may include:
In step S201, the similarity between any two elements in the set of elements to be processed is calculated.
In step S202, the clustering probability between any two elements is determined according to the similarity between the two elements.
Specifically, when the elements are combined in pairs to calculate similarity, any similarity calculation method may be used; the present disclosure does not limit this. For example, if the elements are all faces, a face recognition algorithm may be used to extract the features of each face, and the similarity between two faces is then calculated from those features.
The probability that two elements are the same element is then calculated from statistics of the similarity. For example, waveform statistics may be used, that is, a similarity curve is generated: the higher the similarity, the higher the probability that the two elements are the same element, i.e., the higher the clustering probability.
In the statistical process the similarity may be expressed as a distance value: the closer two elements are, the higher their similarity. On the resulting curve, a smaller distance corresponds to a higher clustering probability.
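One simple way to turn similarity statistics into a clustering probability is a binned frequency estimate over labelled sample pairs, sketched below. This is a stand-in technique, not the waveform statistics mentioned in the text, and it assumes that similarities lie in [0, 1] and that labelled samples are available; the function name `fit_binned_probability` and the sample data are hypothetical.

```python
def fit_binned_probability(samples, bins=5):
    """samples: iterable of (similarity in [0, 1], is_same) pairs."""
    same = [0] * bins
    total = [0] * bins
    for sim, is_same in samples:
        idx = min(int(sim * bins), bins - 1)  # bin index of this similarity
        total[idx] += 1
        same[idx] += int(is_same)

    def probability(sim):
        idx = min(int(sim * bins), bins - 1)
        if total[idx] == 0:
            return None  # no samples fell into this bin
        # Fraction of sample pairs in this bin that were the same element.
        return same[idx] / total[idx]

    return probability


# Hypothetical labelled pairs: (similarity, whether the two elements are the same).
samples = [(0.95, True), (0.9, True), (0.85, False),
           (0.55, True), (0.5, False),
           (0.15, False), (0.1, False)]
probability = fit_binned_probability(samples, bins=5)
high = probability(0.92)
low = probability(0.12)
```

With enough samples the per-bin fractions trace out a curve that rises with similarity, which matches the qualitative behaviour described above.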
Specifically, calculating the confidence value between the first class and the second class using the clustering probability between the two elements corresponding to each of the K similarities may be: using the formula
Figure GDA0002623015700000091
to calculate the confidence value $D_{AB}$ between the first class and the second class,
where $p_i$ denotes the clustering probability between the two elements corresponding to the i-th of the K similarities, and 0 < i ≤ K.
$-\log_2 p_i$ indicates the uncertainty that the two elements corresponding to the i-th similarity are the same element.
A smaller $D_{AB}$ means less uncertainty that the elements in the first class and the elements in the second class are the same element.
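The per-pair uncertainty term can be evaluated directly, as in the short sketch below. Only the single term for one pair is shown; how the terms are combined into the confidence value is given by the formula image, which is not reproduced here. The function name `pair_uncertainty` and the range check are illustrative assumptions.

```python
import math


def pair_uncertainty(p):
    """Uncertainty -log2(p), in bits, for a pair with clustering probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("clustering probability must be in (0, 1]")
    return -math.log2(p)


# A certain match carries no uncertainty; halving the probability adds one bit.
uncertainties = [pair_uncertainty(p) for p in (1.0, 0.5, 0.25)]
```

The uncertainty grows as the clustering probability falls, which is why pairs that are unlikely to be the same element contribute large terms.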
Similarly, calculating the analogy probability that the first class and the second class are the same class using the clustering probability between the two elements corresponding to each of the K similarities may be: using the formula
Figure GDA0002623015700000092
to calculate the analogy probability $P_{AB}$ that the first class and the second class are the same class, where $p_i$ denotes the clustering probability between the two elements corresponding to the i-th of the K similarities, and 0 < i ≤ K.
On the basis of the above embodiment, a confidence threshold and an analogy probability threshold may be set separately. Merging the first class and the second class when both the confidence value and the analogy probability satisfy the preset condition may be: merging the first class and the second class when the confidence value is greater than a first preset threshold and the analogy probability is greater than a second preset threshold.
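The threshold test can be sketched as follows. The comparisons follow the wording of this embodiment literally (both values greater than their thresholds); the function name `merge_if_condition_met`, the list representation of a class, and all numeric values are hypothetical.

```python
def merge_if_condition_met(first, second, confidence, analogy_probability,
                           confidence_threshold, probability_threshold):
    """Return the merged class, or None when the preset condition is not met."""
    # Both comparisons follow the wording of the embodiment: "greater than".
    if (confidence > confidence_threshold
            and analogy_probability > probability_threshold):
        return list(first) + list(second)
    return None


# Hypothetical values: the first call satisfies both thresholds, the second does not.
merged = merge_if_condition_met(["a1", "a2"], ["b1"], confidence=3.0,
                                analogy_probability=0.8,
                                confidence_threshold=2.0,
                                probability_threshold=0.6)
kept_apart = merge_if_condition_met(["a1", "a2"], ["b1"], confidence=3.0,
                                    analogy_probability=0.4,
                                    confidence_threshold=2.0,
                                    probability_threshold=0.6)
```

Keeping the two thresholds as separate parameters reflects the statement that they can be set independently.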
FIG. 3 is a schematic structural diagram illustrating a cluster processing apparatus according to an exemplary embodiment. The embodiment of the disclosure provides a cluster processing apparatus, which may be integrated in a terminal or may itself be a terminal. The terminal here may be a computer, a server, or the like, and is not limited herein. As shown in FIG. 3, the apparatus includes: an obtaining module 301, a determining module 302, a processing module 303 and a clustering module 304, wherein:
The obtaining module 301 is configured to obtain the similarity between each element in the first class and each element in the second class to obtain M × N similarities, where the first class includes M elements, the second class includes N elements, and M and N are integers greater than 0.
The determining module 302 is configured to select a preset number K of similarities in descending order of similarity, where K is an integer greater than 0.
The processing module 303 is configured to calculate a confidence value between the first class and the second class, and an analogy probability that the first class and the second class are the same class, using the clustering probability between the two elements corresponding to each of the K similarities, where the clustering probability indicates the probability that the two elements are the same element.
The clustering module 304 is configured to merge the first class and the second class when the confidence value and the analogy probability both satisfy a preset condition.
In the cluster processing apparatus provided in this embodiment, the similarity between each element in the first class and each element in the second class is obtained to give M × N similarities; a preset number K of similarities are selected in descending order of similarity; the clustering probability between the two elements corresponding to each of the K similarities is used to calculate a confidence value between the first class and the second class and an analogy probability that the two classes are the same class; and the first class and the second class are merged when the confidence value and the analogy probability both satisfy a preset condition. Because the clustering probability between different elements is taken into account when calculating the confidence value and the analogy probability between classes, clustering accuracy is greatly improved.
Fig. 4 is a schematic structural diagram illustrating a cluster processing apparatus according to still another exemplary embodiment. As shown in fig. 4, on the basis of fig. 3, the apparatus may further include: a probability acquisition module 401.
A probability obtaining module 401, configured to obtain a to-be-processed element set, and calculate a clustering probability between any two elements in the to-be-processed element set, where the to-be-processed element set includes: all elements in the first class and all elements in the second class.
Fig. 5 is a schematic structural diagram illustrating a cluster processing apparatus according to still another exemplary embodiment. Optionally, as shown in fig. 5, on the basis of fig. 4, the probability obtaining module 401 may include: a statistics submodule 501 and a determination submodule 502, wherein:
The statistics submodule 501 is configured to calculate the similarity between any two elements in the set of elements to be processed.
A determining submodule 502 configured to determine a clustering probability between any two elements according to a similarity between the any two elements.
Further, the processing module 303 may be specifically configured to use the formula
Figure GDA0002623015700000111
to calculate the confidence value $D_{AB}$ between the first class and the second class, where $p_i$ denotes the clustering probability between the two elements corresponding to the i-th of the K similarities, and 0 < i ≤ K.
Optionally, the processing module 303 may be specifically configured to use the formula
Figure GDA0002623015700000112
to calculate the analogy probability $P_{AB}$ that the first class and the second class are the same class, where $p_i$ denotes the clustering probability between the two elements corresponding to the i-th of the K similarities, and 0 < i ≤ K.
Fig. 6 is a schematic structural diagram illustrating a cluster processing apparatus according to another exemplary embodiment. The clustering device may be integrated in a terminal or may be a terminal. The terminal here may refer to a computer, a server, and the like, and is not limited herein.
As shown in fig. 6, the apparatus includes: a processor 601 and a memory 602 for storing executable instructions. Wherein the processor 601 is coupled to the memory 602.
The processor 601 is configured to:
obtain the similarity between each element in the first class and each element in the second class to obtain M × N similarities, wherein the first class comprises M elements, the second class comprises N elements, and M and N are integers greater than 0;
select a preset number K of similarities in descending order of similarity, wherein K is an integer greater than 0;
calculate a confidence value between the first class and the second class, and an analogy probability that the first class and the second class are the same class, using the clustering probability between the two elements corresponding to each of the K similarities, wherein the clustering probability indicates the probability that the two elements are the same element;
and merge the first class and the second class when the confidence value and the analogy probability both satisfy a preset condition.
In summary, in the cluster processing apparatus provided in this embodiment, the similarity between each element in the first class and each element in the second class is obtained to give M × N similarities; a preset number K of similarities are selected in descending order of similarity; the clustering probability between the two elements corresponding to each of the K similarities is used to calculate a confidence value between the first class and the second class and an analogy probability that the two classes are the same class; and the first class and the second class are merged when the confidence value and the analogy probability both satisfy a preset condition. Because the clustering probability between different elements is taken into account when calculating the confidence value and the analogy probability between classes, clustering accuracy is greatly improved.
Fig. 7 is a schematic structural diagram illustrating a cluster processing apparatus according to another exemplary embodiment.
Referring to fig. 7, the cluster processing apparatus 700 may include one or more of the following components: a processing component 702, a memory 704, a power component 706, a multimedia component 708, an audio component 710, an input/output (I/O) interface 712, a sensor component 714, and a communication component 716.
The processing component 702 generally controls the overall operation of the cluster processing apparatus 700, such as operations associated with display, data communication, camera operations, and recording operations. The processing component 702 may include one or more processors 720 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 702 may include one or more modules that facilitate interaction between the processing component 702 and other components. For example, the processing component 702 may include a multimedia module to facilitate interaction between the multimedia component 708 and the processing component 702.
The memory 704 is configured to store various types of data to support operations at the cluster processing apparatus 700. Examples of such data include instructions for any application or method operating on the cluster processing apparatus 700, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 704 may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
The power component 706 provides power to the various components of the cluster processing apparatus 700. The power component 706 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the cluster processing apparatus 700.
The multimedia component 708 comprises a screen providing an output interface between the cluster processing apparatus 700 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 708 includes a front facing camera and/or a rear facing camera. When the cluster processing device 700 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 710 is configured to output and/or input audio signals. For example, the audio component 710 may include a Microphone (MIC) configured to receive external audio signals when the cluster processing apparatus 700 is in an operating mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 704 or transmitted via the communication component 716. In some embodiments, audio component 710 also includes a speaker for outputting audio signals.
The I/O interface 712 provides an interface between the processing component 702 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 714 includes one or more sensors for providing status assessments of various aspects of the cluster processing apparatus 700. For example, the sensor component 714 may detect an open/closed state of the cluster processing apparatus 700 and the relative positioning of components, such as a display and a keypad of the cluster processing apparatus 700. The sensor component 714 may also detect a change in position of the cluster processing apparatus 700 or of one of its components, the presence or absence of user contact with the cluster processing apparatus 700, the orientation or acceleration/deceleration of the cluster processing apparatus 700, and a change in temperature of the cluster processing apparatus 700. The sensor component 714 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor component 714 may also include a photosensor, such as a Complementary Metal Oxide Semiconductor (CMOS) or Charge-Coupled Device (CCD) image sensor, for use in imaging applications. In some embodiments, the sensor component 714 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 716 is configured to facilitate wired or wireless communication between the cluster processing apparatus 700 and other devices. The cluster processing apparatus 700 may access a wireless network based on a communication standard, such as Wi-Fi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 716 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 716 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the cluster processing apparatus 700 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions is also provided, such as the memory 704 comprising instructions, the instructions being executable by the processor 720 of the cluster processing apparatus 700 to perform the above-described method. For example, the non-transitory computer-readable storage medium may be a ROM, a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.
Also provided is a non-transitory computer-readable storage medium whose instructions, when executed by a processor of the cluster processing apparatus 700, enable the cluster processing apparatus 700 to perform the above-described method.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (5)

1. A clustering method, applied to face recognition, comprising:
obtaining the similarity between each element in a first class and each element in a second class to obtain M × N similarities, wherein the first class comprises M elements, the second class comprises N elements, and M and N are integers greater than 0;
determining a preset number K of similarities in descending order of similarity, wherein K is an integer greater than 0;
calculating a confidence value between the first class and the second class, and an analogy probability that the first class and the second class are the same class, by using the clustering probability between the two elements corresponding to each of the K similarities, wherein the clustering probability indicates the probability that the two elements are the same element;
when the confidence value and the analogy probability both meet a preset condition, merging the first class and the second class;
wherein before calculating the confidence value between the first class and the second class and the analogy probability that the first class and the second class are the same class by using the clustering probability between the two elements corresponding to each of the K similarities, the method further comprises:
acquiring a set of elements to be processed, and calculating the clustering probability between any two elements in the set of elements to be processed, wherein the set of elements to be processed comprises: all elements in the first class and all elements in the second class;
the calculating the clustering probability between any two elements in the element set to be processed comprises:
calculating the similarity between any two elements in the element set to be processed;
determining the clustering probability between any two elements according to the similarity between any two elements; the element is a human face;
the calculating the confidence value between the first class and the second class by using the clustering probability between two elements corresponding to each similarity in the K similarities comprises:
using a formula
Figure FDA0002623015690000021
calculating a confidence value $D_{AB}$ between the first class and the second class, wherein $p_i$ represents the clustering probability between the two elements corresponding to the i-th similarity among the K similarities, and i is greater than 0 and less than or equal to K.
2. The method according to claim 1, wherein calculating the analogy probability that the first class and the second class are the same class by using the clustering probability between the two elements corresponding to each of the K similarities comprises:
using a formula
Figure FDA0002623015690000022
calculating an analogy probability $P_{AB}$ that the first class and the second class are the same class, wherein $p_i$ represents the clustering probability between the two elements corresponding to the i-th similarity among the K similarities, and i is greater than 0 and less than or equal to K.
3. A cluster processing device, applied to face recognition, comprising:
an acquisition module configured to obtain the similarity between each element in a first class and each element in a second class to obtain M × N similarities, wherein the first class comprises M elements, the second class comprises N elements, and M and N are integers greater than 0;
the determining module is configured to determine preset K similarity degrees according to the similarity degrees from large to small, wherein K is an integer larger than 0;
a processing module configured to calculate a confidence value between the first class and the second class and an analogy probability that the first class and the second class are the same class by using a clustering probability between two elements corresponding to each of the K similarities, wherein the clustering probability is used to indicate a probability that two elements are the same element;
a clustering module configured to merge the first class and the second class when both the confidence value and the analogy probability satisfy a preset condition;
further comprising:
a probability obtaining module configured to obtain a set of elements to be processed, and calculate a clustering probability between any two elements in the set of elements to be processed, where the set of elements to be processed includes: all elements in the first class and all elements in the second class;
the probability obtaining module comprises:
the statistic submodule is configured to count the similarity between any two elements in the element set to be processed;
a determining submodule configured to determine a clustering probability between any two elements according to a similarity between the any two elements; the element is a human face;
the processing module configured to employ a formula
Figure FDA0002623015690000031
calculate a confidence value $D_{AB}$ between the first class and the second class, wherein $p_i$ represents the clustering probability between the two elements corresponding to the i-th similarity among the K similarities, and i is greater than 0 and less than or equal to K.
4. The device of claim 3, wherein the processing module is configured to apply a formula
Figure FDA0002623015690000032
to calculate an analogy probability $P_{AB}$ that the first class and the second class are the same class, wherein $p_i$ represents the clustering probability between the two elements corresponding to the i-th similarity among the K similarities, and i is greater than 0 and less than or equal to K.
5. A clustering apparatus, comprising:
a processor;
a memory for storing executable instructions;
wherein the processor is configured to:
obtaining the similarity between each element in a first class and each element in a second class to obtain M × N similarities, wherein the first class comprises M elements, the second class comprises N elements, and M and N are integers greater than 0;
determining a preset number K of similarities in descending order of similarity, wherein K is an integer greater than 0;
calculating a confidence value between the first class and the second class, and an analogy probability that the first class and the second class are the same class, by using the clustering probability between the two elements corresponding to each of the K similarities, wherein the clustering probability indicates the probability that the two elements are the same element;
when the confidence value and the analogy probability both meet a preset condition, merging the first class and the second class;
the processor is further configured to:
before calculating the confidence value between the first class and the second class and the analogy probability that the first class and the second class are the same class by using the clustering probability between the two elements corresponding to each of the K similarities,
acquiring a set of elements to be processed, and calculating the clustering probability between any two elements in the set of elements to be processed, wherein the set of elements to be processed comprises: all elements in the first class and all elements in the second class;
the calculating the clustering probability between any two elements in the element set to be processed comprises:
calculating the similarity between any two elements in the element set to be processed;
determining the clustering probability between any two elements according to the similarity between any two elements; the element is a human face;
the processor is further configured to:
using a formula
Figure FDA0002623015690000041
calculating a confidence value $D_{AB}$ between the first class and the second class, wherein $p_i$ represents the clustering probability between the two elements corresponding to the i-th similarity among the K similarities, and i is greater than 0 and less than or equal to K.
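The preliminary step recited in the claims, computing a clustering probability for any two elements of the set of elements to be processed, can be sketched as a precomputed lookup table. This is only an illustration under assumptions: the function name `pairwise_clustering_probabilities`, the use of element identifiers as keys, and the toy `similarity` and `to_probability` callables are all hypothetical, and the claims do not state how a similarity is converted into a clustering probability.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Hashable, Mapping, TypeVar

Element = TypeVar("Element")


def pairwise_clustering_probabilities(
    elements: Mapping[Hashable, Element],
    similarity: Callable[[Element, Element], float],
    to_probability: Callable[[float], float],
) -> Dict[FrozenSet[Hashable], float]:
    """Map every unordered pair of element ids to a clustering probability."""
    table: Dict[FrozenSet[Hashable], float] = {}
    for (id_a, a), (id_b, b) in combinations(elements.items(), 2):
        # First the similarity between the two elements, then the
        # (assumed) conversion of that similarity into a probability.
        table[frozenset((id_a, id_b))] = to_probability(similarity(a, b))
    return table


# Toy usage: scalar "features" stand in for face descriptors, and the
# probability is taken to be the similarity itself (an assumption).
pending = {"a1": 0.10, "a2": 0.15, "b1": 0.90}
table = pairwise_clustering_probabilities(
    pending,
    similarity=lambda x, y: 1.0 - abs(x - y),
    to_probability=lambda s: s,
)
```

Keying the table by `frozenset` makes lookups independent of the order of the two elements, so the cross-class pairs needed later can be read back without recomputing similarities.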
CN201710573089.4A 2017-07-14 2017-07-14 Clustering processing method and device Active CN107423757B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710573089.4A CN107423757B (en) 2017-07-14 2017-07-14 Clustering processing method and device

Publications (2)

Publication Number Publication Date
CN107423757A CN107423757A (en) 2017-12-01
CN107423757B true CN107423757B (en) 2020-10-09

Family

ID=60427642

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710573089.4A Active CN107423757B (en) 2017-07-14 2017-07-14 Clustering processing method and device

Country Status (1)

Country Link
CN (1) CN107423757B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant