CN113743533A - Picture clustering method and device and storage medium - Google Patents

Picture clustering method and device and storage medium

Info

Publication number
CN113743533A
Authority
CN
China
Prior art keywords
feature vector
clustering
cluster
feature
vectors
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111093417.3A
Other languages
Chinese (zh)
Other versions
CN113743533B (en)
Inventor
刘畅
段钧宝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Unisinsight Technology Co Ltd
Original Assignee
Chongqing Unisinsight Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Unisinsight Technology Co Ltd
Priority to CN202111093417.3A
Publication of CN113743533A
Application granted
Publication of CN113743533B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a picture clustering method comprising the following steps: extracting features from each of a plurality of pictures to be clustered to obtain a plurality of first feature vectors; clustering the plurality of first feature vectors to obtain at least one first cluster; determining a central feature vector of each of the at least one first cluster; fusing the central feature vector of each first cluster with each of the other first feature vectors in that first cluster to obtain a plurality of second feature vectors; clustering the plurality of second feature vectors to obtain at least one second cluster; and clustering the pictures based on the at least one second cluster. The method improves how the class-center feature is computed and how picture features are represented, and thereby improves clustering accuracy.

Description

Picture clustering method and device and storage medium
Technical Field
The invention belongs to the technical field of picture clustering, and particularly relates to a picture clustering method, a picture clustering device and a storage medium.
Background
Face clustering is an important technique for applying algorithms such as face recognition in the security field. In practice, faces are grouped so that pictures of the same face are placed in the same file, which supports later tracing and investigation based on the associated face information. Face clustering can also be used to file collected face information in scenarios that have no registration information, for later use in user profiling, personnel monitoring and the like.
At present, face clustering techniques fall into two main types. In the first, during real-time capture, the pictures to be clustered are selected by face quality score and are then filed and clustered using face recognition. In the second, features are first extracted from the faces and the features are then clustered with a clustering algorithm. Because business requirements place more weight on accuracy, the second type of picture clustering method is now used more often.
In the prior art, a face feature extraction model is first trained, the similarity of two images is computed from their face features, a clustering algorithm is applied, and finally classes are merged and special cases are handled according to certain rules and thresholds. This approach has low accuracy and depends heavily on the quality of the face feature extraction model. Other methods also train a face feature model to extract face features, but cluster using additional information beyond the face features, such as a confidence that two images belong to the same class, in which the joint probability that the images belong to the same class is determined from their spatio-temporal matching degree and their feature similarity. Besides the similarity between two images, the similarity between an image and the class-center feature is also computed. Although these methods improve accuracy to some extent, they still cannot meet the requirements of certain service scenarios, particularly scenarios with high algorithm efficiency requirements.
Disclosure of Invention
In view of the above-mentioned shortcomings of the prior art, the present invention provides a picture clustering method, a picture clustering device and a storage medium that optimize the calculation of the class-center feature.
To achieve the above and other related objects, the present invention provides a method for clustering pictures, comprising:
respectively extracting the features of a plurality of pictures to be clustered to obtain a plurality of first feature vectors;
clustering the plurality of first feature vectors to obtain at least one first cluster;
determining a central feature vector of each of the at least one first cluster;
respectively fusing the central feature vector of each first cluster with other first feature vectors in the first cluster to which the central feature vector belongs to obtain a plurality of second feature vectors;
clustering the plurality of second feature vectors to obtain at least one second cluster;
and clustering the pictures based on the at least one second cluster.
Optionally, determining the central feature vector of each of the at least one first cluster includes:
determining a third feature vector in each first cluster, wherein the third feature vector is the feature vector in that first cluster that has the largest number of similar feature vectors;
calculating a first similarity between the third feature vector and each of the other feature vectors in the first cluster, and taking one feature vector whose first similarity is greater than a set similarity threshold as a fourth feature vector;
carrying out a weighted average of the third feature vector and the fourth feature vector to obtain a first new feature vector;
calculating a second similarity between the first new feature vector and each of the other feature vectors in the first cluster, and taking one feature vector whose second similarity is greater than the set similarity threshold as a fifth feature vector;
carrying out a weighted average of the fifth feature vector and the first new feature vector to obtain a second new feature vector;
repeating the step of obtaining a new feature vector until all feature vectors in the cluster have been traversed or no feature vector satisfies the similarity threshold condition;
and taking the finally obtained new feature vector as the central feature vector.
Optionally, determining the central feature vector of each of the at least one first cluster includes:
determining a third feature vector in each first cluster, wherein the third feature vector is the feature vector in that first cluster that has the largest number of similar feature vectors;
calculating the distance between the third feature vector and each of the other feature vectors in the first cluster, and taking one feature vector whose distance is smaller than a set distance threshold as a fourth feature vector;
carrying out a weighted average of the third feature vector and the fourth feature vector to obtain a first new feature vector;
calculating the distance between the first new feature vector and each of the other feature vectors in the first cluster, and taking one feature vector whose distance is smaller than the set distance threshold as a fifth feature vector;
carrying out a weighted average of the fifth feature vector and the first new feature vector to obtain a second new feature vector;
repeating the step of obtaining a new feature vector until all feature vectors in the cluster have been traversed or no feature vector satisfies the distance threshold condition;
and taking the finally obtained new feature vector as the central feature vector.
Optionally, the plurality of first feature vectors are clustered using a KNN clustering algorithm.
Optionally, before the second feature vectors are clustered, a KNN graph is constructed based on the second feature vectors, and the KNN graph is clustered using the Infomap clustering algorithm.
Optionally, clustering the plurality of first feature vectors includes:
acquiring the feature similarity between any two first feature vectors in the plurality of first feature vectors;
constructing a KNN matrix based on the feature similarity between any two first feature vectors;
setting every feature similarity that is smaller than a set feature similarity threshold to 0 to obtain a sparse matrix;
completing the clustering of the plurality of first feature vectors based on the sparse matrix.
Optionally, the method further comprises:
determining a second cluster with the number of pictures smaller than a set value;
and re-clustering the pictures in the second cluster with the number of pictures smaller than the set value to ensure that the pictures in the second cluster with the number of pictures smaller than the set value belong to the first cluster.
Optionally, fusing the central feature vector with each of the other first feature vectors in the first cluster to which the central feature vector belongs includes: concatenating the central feature vector and the first feature vector end to end, in that order.
To achieve the above and other related objects, the present invention provides a picture clustering apparatus, including:
a feature extraction module, used for extracting features from each of a plurality of pictures to be clustered to obtain a plurality of first feature vectors;
a first clustering module, used for clustering the plurality of first feature vectors to obtain at least one first cluster;
a central feature vector determination module, configured to determine a central feature vector of each of the at least one first cluster;
a feature fusion module, used for fusing the central feature vector of each first cluster with each of the other first feature vectors in the first cluster to which the central feature vector belongs to obtain a plurality of second feature vectors;
a second clustering module, used for clustering the plurality of second feature vectors to obtain at least one second cluster;
and a third clustering module, used for clustering the pictures based on the at least one second cluster.
To achieve the above and other related objects, the present invention also provides a storage medium storing a computer program which, when executed by a processor, performs the method.
As described above, the picture clustering method, the picture clustering device and the storage medium of the present invention have the following advantages:
the method extracts first feature vectors from the pictures to be clustered, clusters them into at least one first cluster, determines a central feature vector for each first cluster, fuses that central feature vector with each of the other first feature vectors in the same cluster to obtain second feature vectors, clusters the second feature vectors into at least one second cluster, and clusters the pictures based on the at least one second cluster. The method thereby improves how the class-center feature is computed and how picture features are represented, and improves clustering accuracy.
Drawings
FIG. 1 is a flowchart illustrating a picture clustering method according to an embodiment of the present invention;
FIG. 2 is a schematic block diagram of a picture clustering device according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention, and the components related to the present invention are only shown in the drawings rather than drawn according to the number, shape and size of the components in actual implementation, and the type, quantity and proportion of the components in actual implementation may be changed freely, and the layout of the components may be more complicated.
As shown in fig. 1, an embodiment of the present application provides a picture clustering method, including:
S10: extracting features from each of a plurality of pictures to be clustered to obtain a plurality of first feature vectors;
S11: clustering the plurality of first feature vectors to obtain at least one first cluster;
S12: determining a central feature vector of each of the at least one first cluster;
S13: fusing the central feature vector of each first cluster with each of the other first feature vectors in the first cluster to which the central feature vector belongs to obtain a plurality of second feature vectors;
S14: clustering the plurality of second feature vectors to obtain at least one second cluster;
S15: clustering the pictures based on the at least one second cluster.
This picture clustering method improves how the class-center feature is computed and how picture features are represented, and thereby improves clustering accuracy.
In step S10, feature extraction is performed on the multiple pictures to be clustered, respectively, to obtain multiple first feature vectors. The pictures to be clustered can be face pictures, landscape pictures, object pictures, other types of pictures and the like. The present embodiment is exemplified by a face picture.
When features are extracted from a face picture, a pre-trained, neural-network-based face feature extraction model is used. The face feature extraction model is a residual network with more than a hundred layers. The face features extracted from the picture are then expressed as a face feature vector.
In an embodiment, the clustering the plurality of first feature vectors includes:
S110: acquiring the feature similarity between any two first feature vectors in the plurality of first feature vectors;
S111: constructing a KNN matrix based on the feature similarity between any two first feature vectors;
When the KNN matrix is constructed, the value of K may be selected as desired; for example, K may be 100.
S112: setting every feature similarity that is smaller than a set feature similarity threshold to 0 to obtain a sparse matrix;
The feature similarity threshold may be selected as required; for example, it may be 0.5.
S113: completing the clustering of the plurality of first feature vectors based on the sparse matrix.
In this embodiment, the plurality of first feature vectors are clustered by a KNN clustering method to obtain a neighborhood for each vector; each neighborhood can serve as an initial class, that is, a first cluster.
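Steps S110 to S113 can be illustrated with a minimal sketch. It assumes cosine similarity as the feature similarity and NumPy as the array library; the function names `knn_sparse_similarity` and `neighborhoods` are hypothetical and do not come from the patent.

```python
import numpy as np

def knn_sparse_similarity(features, k=100, sim_threshold=0.5):
    """Keep, per row, the similarity to the k most similar other vectors,
    then zero every entry below sim_threshold (assumed: cosine similarity)."""
    x = np.asarray(features, dtype=np.float64)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)   # L2-normalize each vector
    sim = x @ x.T                                      # S110: pairwise similarity
    np.fill_diagonal(sim, -np.inf)                     # a vector is not its own neighbor
    k = min(k, len(x) - 1)
    knn = np.zeros_like(sim)
    if k > 0:
        # S111: column indices of the k largest similarities in each row
        idx = np.argpartition(-sim, k - 1, axis=1)[:, :k]
        rows = np.arange(len(x))[:, None]
        knn[rows, idx] = sim[rows, idx]
    knn[knn < sim_threshold] = 0.0                     # S112: sparsify
    return knn

def neighborhoods(knn):
    """S113 (simplified): the non-zero entries of each row form that vector's neighborhood."""
    return [np.flatnonzero(row) for row in knn]

feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
knn = knn_sparse_similarity(feats, k=1, sim_threshold=0.5)
print(neighborhoods(knn))
```

With these three toy vectors, the first two are each other's neighbor and the third has an empty neighborhood, because its best similarity falls below the threshold.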
In step S12, a central feature vector of each of the at least one first cluster is determined.
Because the plurality of first feature vectors are clustered by the KNN clustering method, the result tends to be overly fragmented: pictures that belong to the same class are split across too many classes. The present application therefore introduces a class-center feature vector, which enriches the feature information of each picture and makes the clustering more accurate without changing the clustering algorithm. The central feature vector is defined as a representative vector for each class. In an embodiment, determining the central feature vector of each of the at least one first cluster includes:
determining a third feature vector in each first cluster, wherein the third feature vector is the feature vector in that first cluster that has the largest number of similar feature vectors;
calculating a first similarity between the third feature vector and each of the other feature vectors in the first cluster, and taking one feature vector whose first similarity is greater than a set similarity threshold as a fourth feature vector;
carrying out a weighted average of the third feature vector and the fourth feature vector to obtain a first new feature vector;
calculating a second similarity between the first new feature vector and each of the other feature vectors in the first cluster, and taking one feature vector whose second similarity is greater than the set similarity threshold as a fifth feature vector;
carrying out a weighted average of the fifth feature vector and the first new feature vector to obtain a second new feature vector;
repeating the step of obtaining a new feature vector until all feature vectors in the cluster have been traversed or no feature vector satisfies the similarity threshold condition;
and taking the finally obtained new feature vector as the central feature vector.
The central feature vector of the at least one first cluster may be calculated using the following iterative formula:
[Formula: Figure BDA0003268168710000061]
where $x_0$ is the picture (vector) that has the most similar pictures in the neighborhood; $n$ is the number of vectors in the neighborhood; $y_i$ ($i = 0, 1, 2, \dots, n-1$) is the $i$-th vector similar to $x_0$; and $x_i$ ($i = 0, 1, 2, \dots, n-1$) is the vector obtained, starting from $x_0$, by iterating the formula on $x_{i-1}$ and $y_{i-1}$. $\alpha_i$ is the cosine similarity of $x_i$ and $y_i$. When the formula has been iterated to $i = n-1$, the final central vector $x_n$ is obtained.
This way of computing the central feature vector ensures that the pictures with the highest similarity carry the largest weight, and it reduces the influence of individual erroneous pictures in the class on the central vector, so that a more reasonable class-center feature vector can be found.
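The iteration can be sketched as follows. The formula image is not reproduced in this text, so the update rule used here, x_{i+1} = (x_i + α_i·y_i) / (1 + α_i) with α_i the cosine similarity of x_i and y_i, is an assumption chosen to match the description of a similarity-weighted average; it is not confirmed to be the patent's formula. Picking the most similar remaining vector at each step is likewise an assumption, and the names `cosine` and `central_feature_vector` are hypothetical.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def central_feature_vector(cluster, sim_threshold=0.5):
    """Iteratively average a seed vector with similar cluster members.
    Assumed update: x_{i+1} = (x_i + alpha_i * y_i) / (1 + alpha_i)."""
    x = np.asarray(cluster, dtype=np.float64)
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)
    # Seed (the "third feature vector"): the member with the most similar members.
    seed = int(np.argmax((sim > sim_threshold).sum(axis=1)))
    center = x[seed].copy()
    remaining = [i for i in range(len(x)) if i != seed]
    while remaining:
        sims = [cosine(center, x[i]) for i in remaining]
        best = int(np.argmax(sims))          # assumption: take the most similar candidate
        alpha = sims[best]
        if alpha <= sim_threshold:
            break                            # no remaining vector meets the threshold
        y = x[remaining.pop(best)]
        center = (center + alpha * y) / (1.0 + alpha)
    return center

# The outlier [0, 1] never passes the threshold, so it does not move the center.
print(central_feature_vector([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]))
```

Under this assumed rule, a member that is only weakly similar to the running center contributes with a small weight, and a member below the threshold contributes nothing, which is the behavior the description attributes to the method.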
In an embodiment, determining the central feature vector of each of the at least one first cluster includes:
determining a third feature vector in each first cluster, wherein the third feature vector is the feature vector in that first cluster that has the largest number of similar feature vectors;
calculating the distance between the third feature vector and each of the other feature vectors in the first cluster, and taking one feature vector whose distance is smaller than a set distance threshold as a fourth feature vector;
carrying out a weighted average of the third feature vector and the fourth feature vector to obtain a first new feature vector;
calculating the distance between the first new feature vector and each of the other feature vectors in the first cluster, and taking one feature vector whose distance is smaller than the set distance threshold as a fifth feature vector;
carrying out a weighted average of the fifth feature vector and the first new feature vector to obtain a second new feature vector;
repeating the step of obtaining a new feature vector until all feature vectors in the cluster have been traversed or no feature vector satisfies the distance threshold condition;
and taking the finally obtained new feature vector as the central feature vector.
In order to enrich the information of the face features and improve the clustering accuracy, the central feature vector of each first cluster and other first feature vectors in the first cluster to which the central feature vector belongs can be respectively fused to obtain a plurality of second feature vectors.
Specifically, feature fusion concatenates two feature vectors end to end, in order. For example, if the central feature vector is A = [a1, a2, a3, a4] and the first feature vector is B = [b1, b2, b3, b4], concatenating the central feature vector with the first feature vector gives the second feature vector [a1, a2, a3, a4, b1, b2, b3, b4].
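As a small illustration of this concatenation, the sketch below prepends one cluster's central feature vector to every first feature vector in that cluster. NumPy and the function name `fuse_with_center` are assumptions made for the example.

```python
import numpy as np

def fuse_with_center(center, cluster):
    """Concatenate the central vector (first) with each member vector (second)."""
    center = np.asarray(center)
    cluster = np.asarray(cluster)
    # Repeat the center once per cluster member without copying it.
    tiled = np.broadcast_to(center, (len(cluster), center.shape[0]))
    return np.concatenate([tiled, cluster], axis=1)

second = fuse_with_center([1, 2], [[3, 4], [5, 6]])
print(second.tolist())  # [[1, 2, 3, 4], [1, 2, 5, 6]]
```

Each second feature vector is therefore twice as long as a first feature vector; a 256-dimensional first feature vector would yield a 512-dimensional second feature vector.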
In step S14, the plurality of second feature vectors are clustered to obtain at least one second cluster. The plurality of second feature vectors are clustered using the Infomap clustering algorithm.
The Infomap algorithm treats the data to be clustered as a single graph structure and clusters it by partitioning the graph so that the entropy of the whole graph structure is minimized. Therefore, in this embodiment, the second feature vectors are first turned into graph-structured data using KNN, that is, a KNN graph is constructed, and the KNN graph is then input into the Infomap clustering algorithm to obtain the second clusters.
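The graph construction step can be sketched as below. The sketch only builds a weighted edge list, assuming cosine similarity as the edge weight; handing the edges to an Infomap implementation is left out, since the patent does not name a specific library. The function name `knn_graph_edges` is hypothetical.

```python
import numpy as np

def knn_graph_edges(vectors, k=10):
    """Return (source, target, weight) edges linking each vector to its
    k most similar other vectors (assumed weight: cosine similarity)."""
    x = np.asarray(vectors, dtype=np.float64)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    sim = x @ x.T
    np.fill_diagonal(sim, -np.inf)            # exclude self-loops
    k = min(k, len(x) - 1)
    edges = []
    for i in range(len(x)):
        for j in np.argsort(-sim[i])[:k]:     # most similar first
            edges.append((i, int(j), float(sim[i, j])))
    return edges

fused = [[1, 0, 1, 0], [1, 0, 0.9, 0.1], [0, 1, 0, 1]]
for edge in knn_graph_edges(fused, k=1):
    print(edge)
```

An edge list of this form is the usual input for graph clustering tools, which is why the sketch stops at this point.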
After the second clustering of the face pictures is complete, the clustering result can be evaluated using the BCubed F score:
$$F = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$
$$\text{precision} = \frac{1}{N} \sum_{c} \frac{m_c^2}{|c|}$$
where $m_c$ is the number of pictures of the same person in class $c$, $|c|$ is the total number of pictures in class $c$, and $N$ is the total number of target vectors participating in the clustering.
$$\text{recall} = \frac{1}{N} \sum_{c} \frac{m_c^2}{N}$$
where $m_c$ is the number of target vectors in class $c$ and $N$ is again the total number of target vectors participating in the clustering.
For example, suppose one class should contain 63 pictures and they are clustered into a group of 1 and a group of 62: the single picture forms a class of its own, and the 62 pictures are clustered together with 3 interfering pictures into one class of 65 pictures. Then precision = (1 × 1/1 + 62 × 62/65)/63 and recall = (1 × 1/63 + 62 × 62/63)/63.
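The worked example can be checked numerically with a short sketch. It follows the precision (accuracy) and recall definitions above for a single target person; combining the two as the usual harmonic mean is an assumption about the F score, and the function name `bcubed_for_target` is hypothetical.

```python
def bcubed_for_target(classes, target_total):
    """classes: list of (hits, class_size) pairs, one per predicted class that
    contains pictures of the target person; target_total: number of pictures
    of that person taking part in the clustering."""
    precision = sum(h * h / size for h, size in classes) / target_total
    recall = sum(h * h / target_total for h, _ in classes) / target_total
    # Assumed combination: harmonic mean of precision and recall.
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

# 63 pictures of one person: 1 alone, 62 in a class of 65 (3 interfering pictures).
p, r, f = bcubed_for_target([(1, 1), (62, 65)], target_total=63)
print(round(p, 3), round(r, 3), round(f, 3))
```

For this example the precision is about 0.955 and the recall about 0.969, so the three interfering pictures cost slightly more than the one split-off picture.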
In one embodiment, a face feature extraction model is trained with a 152-layer residual network on a training set of more than 2 million face IDs, so that the model produces a 256-dimensional feature vector for each original picture.
The feature similarity between any two feature vectors is calculated;
based on the feature similarity, the 256-dimensional feature vectors are used to construct a KNN matrix with K = 100;
every feature similarity smaller than 0.5 is set to 0, which yields a sparse matrix.
From the sparse matrix, each picture can find its own neighborhood, which gives a preliminary clustering result, namely the first clusters.
After the first clusters are obtained, the central feature vector of each first cluster is calculated in the manner described above, and the calculated central feature vector is then fused, by concatenation, with each of the other feature vectors in the cluster to which it belongs.
A KNN graph is then constructed once from the fused, concatenated feature vectors and used as the input of Infomap for clustering. Setting the minimum threshold of Infomap to 0.69 gives better results (this threshold achieves a good recall at high precision and meets practical engineering requirements).
Finally, a number of isolated points may remain after Infomap clustering is finished. All feature vectors in classes that contain fewer than 3 feature vectors are therefore clustered again: each is assigned to the class of its nearest feature vector in the KNN neighborhood, with the maximum distance threshold set to 0.32. When the distance is greater than the threshold, the clustering of that feature vector is left unchanged. This continues until clustering is finished.
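The final re-clustering pass can be sketched as follows. The distance metric is not stated in the text, so cosine distance (1 minus cosine similarity) is assumed; for brevity the sketch searches all vectors of sufficiently large classes instead of only the KNN neighborhood. The function name `reassign_small_clusters` is hypothetical.

```python
import numpy as np

def reassign_small_clusters(vectors, labels, min_size=3, max_distance=0.32):
    """Move vectors from classes with fewer than min_size members to the class
    of their nearest vector in a larger class, if it is close enough."""
    x = np.asarray(vectors, dtype=np.float64)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    labels = np.asarray(labels).copy()
    ids, counts = np.unique(labels, return_counts=True)
    small = np.isin(labels, ids[counts < min_size])
    anchors = np.flatnonzero(~small)          # members of sufficiently large classes
    if anchors.size == 0:
        return labels
    for i in np.flatnonzero(small):
        dist = 1.0 - x[anchors] @ x[i]        # assumed metric: cosine distance
        j = int(np.argmin(dist))
        if dist[j] <= max_distance:           # otherwise keep the original label
            labels[i] = labels[anchors[j]]
    return labels

vecs = [[1, 0], [1, 0.05], [0.99, 0.1], [0.98, 0.15], [0, 1]]
print(reassign_small_clusters(vecs, [0, 0, 0, 1, 2]).tolist())  # [0, 0, 0, 0, 2]
```

In the toy data, the fourth vector lies close to the large class and is merged into it, while the fifth vector is farther than the threshold from every anchor and keeps its own class.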
As shown in FIG. 2, an embodiment of the present application provides a picture clustering device, including:
the feature extraction module 20 is configured to perform feature extraction on the multiple pictures to be clustered respectively to obtain multiple first feature vectors;
the first clustering module 21 is configured to cluster the plurality of first feature vectors to obtain at least one first cluster;
a central feature vector determining module 22, configured to determine a central feature vector of each of the at least one first cluster;
the feature fusion module 23 is configured to fuse the center feature vector of each first cluster with other first feature vectors in the first cluster to which the center feature vector belongs, respectively, to obtain a plurality of second feature vectors;
a second clustering module 24, configured to cluster the plurality of second feature vectors to obtain at least one second cluster;
and a third clustering module 25, configured to perform picture clustering based on the at least one second cluster.
It should be noted that, because the embodiment of the apparatus portion and the embodiment of the method portion correspond to each other, please refer to the description of the embodiment of the method portion for the content of the embodiment of the apparatus portion, which is not repeated here.
The invention also provides a storage medium storing a computer program, which when executed by a processor performs the aforementioned picture clustering method.
The present invention also provides an apparatus comprising:
a memory for storing a computer program;
a processor for executing the computer program stored in the memory to make the device execute the aforementioned picture clustering method.
The Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory may be an internal storage unit or an external storage device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital Card (SD), a Flash memory Card (Flash Card), and the like. Further, the memory may also include both an internal storage unit and an external storage device. The memory is used for storing the computer program and other programs and data. The memory may also be used to temporarily store data that has been or will be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. The apparatus/terminal device embodiments described above are merely illustrative. For example, the division into modules or units is only a logical division, and other divisions are possible in an actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments are implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may comprise any entity or device capable of carrying the computer program code, such as a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, or a software distribution medium.
The foregoing embodiments merely illustrate the principles and effects of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or change the above embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by those skilled in the art without departing from the spirit and technical ideas of the present invention shall be covered by the claims of the present invention.

Claims (10)

1. A picture clustering method is characterized by comprising the following steps:
respectively extracting the features of a plurality of pictures to be clustered to obtain a plurality of first feature vectors;
clustering the plurality of first feature vectors to obtain at least one first cluster;
determining a central feature vector of each of the at least one first cluster;
respectively fusing the central feature vector of each first cluster with other first feature vectors in the first cluster to which the central feature vector belongs to obtain a plurality of second feature vectors;
clustering the plurality of second feature vectors to obtain at least one second cluster;
and clustering the pictures based on the at least one second cluster.
2. The picture clustering method according to claim 1, wherein determining the central feature vector of each of the at least one first cluster comprises:
determining a third feature vector in each first cluster, wherein the third feature vector is the feature vector that has the largest number of similar feature vectors in the first cluster;
calculating a first similarity between the third feature vector and each of the other feature vectors in the first cluster, and taking one feature vector whose first similarity is greater than a set similarity threshold as a fourth feature vector;
computing a weighted average of the third feature vector and the fourth feature vector to obtain a first new feature vector;
calculating a second similarity between the first new feature vector and each of the other feature vectors in the first cluster, and taking one feature vector whose second similarity is greater than a set similarity threshold as a fifth feature vector;
computing a weighted average of the fifth feature vector and the first new feature vector to obtain a second new feature vector;
repeating the step of obtaining a new feature vector until all feature vectors in the cluster have been traversed or no feature vector satisfying the similarity threshold condition remains;
and taking the finally obtained new feature vector as the central feature vector.
3. The picture clustering method according to claim 1, wherein determining the central feature vector of each of the at least one first cluster comprises:
determining a third feature vector in each first cluster, wherein the third feature vector is the feature vector that has the largest number of similar feature vectors in the first cluster;
calculating the distance between the third feature vector and each of the other feature vectors in the first cluster, and taking one feature vector whose distance is smaller than a set distance threshold as a fourth feature vector;
computing a weighted average of the third feature vector and the fourth feature vector to obtain a first new feature vector;
calculating the distance between the first new feature vector and each of the other feature vectors in the first cluster, and taking one feature vector whose distance is smaller than a set distance threshold as a fifth feature vector;
computing a weighted average of the fifth feature vector and the first new feature vector to obtain a second new feature vector;
repeating the step of obtaining a new feature vector until all feature vectors in the cluster have been traversed or no feature vector satisfying the distance threshold condition remains;
and taking the finally obtained new feature vector as the central feature vector.
4. The picture clustering method according to claim 1, wherein the plurality of first feature vectors are clustered using a KNN clustering algorithm.
5. The picture clustering method according to claim 4, wherein before clustering the second feature vectors, a KNN graph is constructed based on the second feature vectors, and the KNN graph is clustered by using an Infomap clustering algorithm.
6. The picture clustering method according to claim 1, wherein clustering the plurality of first feature vectors comprises:
acquiring feature similarity between any two first feature vectors in the plurality of first feature vectors;
constructing a KNN matrix based on the feature similarity between any two first feature vectors;
setting each feature similarity that is smaller than a set feature similarity threshold to 0 to obtain a sparse matrix;
completing the clustering of the plurality of first feature vectors based on the sparse matrix.
7. The picture clustering method according to claim 1, further comprising:
determining a second cluster in which the number of pictures is smaller than a set value;
and re-clustering the pictures in the second cluster in which the number of pictures is smaller than the set value, so that those pictures belong to the first cluster.
8. The picture clustering method according to claim 1, wherein fusing the central feature vector with the other first feature vectors in the first cluster to which the central feature vector belongs comprises: concatenating the central feature vector and the first feature vector end to end in sequence.
9. An apparatus for clustering pictures, the apparatus comprising:
a feature extraction module, configured to respectively extract the features of a plurality of pictures to be clustered to obtain a plurality of first feature vectors;
a first clustering module, configured to cluster the plurality of first feature vectors to obtain at least one first cluster;
a central feature vector determination module, configured to determine a central feature vector of each of the at least one first cluster;
a feature fusion module, configured to respectively fuse the central feature vector of each first cluster with the other first feature vectors in the first cluster to which the central feature vector belongs to obtain a plurality of second feature vectors;
a second clustering module, configured to cluster the plurality of second feature vectors to obtain at least one second cluster;
and a third clustering module, configured to cluster the pictures based on the at least one second cluster.
10. A storage medium storing a computer program, wherein the computer program, when executed by a processor, performs the picture clustering method according to any one of claims 1 to 8.
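The iterative center determination recited in claim 2 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `cosine` and `central_feature_vector`, the use of cosine similarity, the default threshold of 0.8, the choice of the most similar candidate at each step, and the weighting of the running center by the number of vectors already merged are all assumptions made for illustration, since the claim only requires "a weighted average" and "one feature vector" above the threshold.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def central_feature_vector(cluster, threshold=0.8):
    """Sketch of claim 2: grow a center by repeated weighted averaging."""
    n = len(cluster)

    def neighbour_count(i):
        # Number of other members whose similarity to member i exceeds the threshold.
        return sum(
            1 for j in range(n)
            if j != i and cosine(cluster[i], cluster[j]) > threshold
        )

    # "Third feature vector": the member with the most similar members.
    seed = max(range(n), key=neighbour_count)
    center = list(cluster[seed])
    merged = 1  # how many vectors have been averaged into the center so far
    remaining = set(range(n)) - {seed}

    while remaining:
        # Unused members that satisfy the similarity threshold condition.
        candidates = [j for j in remaining if cosine(center, cluster[j]) > threshold]
        if not candidates:
            break  # no feature vector meets the threshold any more
        # Assumption: take the most similar candidate; the claim only says "one".
        j = max(candidates, key=lambda k: cosine(center, cluster[k]))
        # Assumption: weight the running center by the number of merged vectors.
        center = [(merged * c + x) / (merged + 1) for c, x in zip(center, cluster[j])]
        merged += 1
        remaining.remove(j)

    return center
```

With this weighting the result equals the plain mean of all merged vectors, so an outlier that never passes the threshold does not influence the center. The distance-based variant of claim 3 would follow the same loop with the comparison reversed (distance smaller than a distance threshold).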
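The first-stage clustering of claim 6 (pairwise similarity, KNN matrix, thresholding to a sparse matrix, clustering) can be sketched as follows. The claim does not state how clusters are read off the sparse matrix, so this sketch simply takes connected components with a union-find structure; that step, the function names `knn_sparse_matrix` and `connected_components`, the cosine metric, and the default values of `k` and `sim_threshold` are assumptions rather than details from the patent.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_sparse_matrix(vectors, k=2, sim_threshold=0.5):
    """Keep each row's k nearest neighbours, then zero entries below the threshold."""
    n = len(vectors)
    sparse = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sims = [(cosine(vectors[i], vectors[j]), j) for j in range(n) if j != i]
        sims.sort(reverse=True)  # most similar first
        for s, j in sims[:k]:
            if s >= sim_threshold:  # similarities below the threshold stay 0
                sparse[i][j] = s
    return sparse


def connected_components(sparse):
    """Assumed clustering step: label the connected components of the sparse graph."""
    n = len(sparse)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(n):
            if sparse[i][j] > 0:
                parent[find(i)] = find(j)

    roots = {}
    labels = []
    for i in range(n):
        # Assign consecutive labels in order of first appearance.
        labels.append(roots.setdefault(find(i), len(roots)))
    return labels
```

For example, `connected_components(knn_sparse_matrix(vectors))` returns one integer label per input vector. A graph-based algorithm such as Infomap, which claim 5 names for the second stage, could replace the connected-components step.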
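The fusion step of claim 8 amounts to vector concatenation. A minimal sketch, assuming feature vectors are plain Python lists and that the central feature vector is placed first (the claim lists it first but does not otherwise fix the order); the name `fuse_with_center` is hypothetical:

```python
def fuse_with_center(center, cluster):
    """Concatenate the cluster's central vector with each member vector.

    Each fused ("second") feature vector has length len(center) + len(member).
    """
    return [list(center) + list(member) for member in cluster]
```

Because every member of a cluster shares the same leading half after fusion, members of the same first cluster move closer to one another in the fused space, which is presumably what lets the second clustering pass refine the first.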
CN202111093417.3A 2021-09-17 2021-09-17 Picture clustering method and device and storage medium Active CN113743533B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111093417.3A CN113743533B (en) 2021-09-17 2021-09-17 Picture clustering method and device and storage medium

Publications (2)

Publication Number Publication Date
CN113743533A true CN113743533A (en) 2021-12-03
CN113743533B CN113743533B (en) 2023-08-01

Family

ID=78739611

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111093417.3A Active CN113743533B (en) 2021-09-17 2021-09-17 Picture clustering method and device and storage medium

Country Status (1)

Country Link
CN (1) CN113743533B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114612967A (en) * 2022-03-03 2022-06-10 北京百度网讯科技有限公司 Face clustering method, device, equipment and storage medium

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012089902A1 (en) * 2010-12-30 2012-07-05 Nokia Corporation Method, apparatus, and computer program product for image clustering
CN103049514A (en) * 2012-12-14 2013-04-17 杭州淘淘搜科技有限公司 Balanced image clustering method based on hierarchical clustering
US20150302436A1 (en) * 2003-08-25 2015-10-22 Thomas J. Reynolds Decision strategy analytics
CN106203445A (en) * 2016-06-30 2016-12-07 北京小白世纪网络科技有限公司 A kind of dictionary construction method migrated based on scale dictionary
CN109165947A (en) * 2018-06-27 2019-01-08 阿里巴巴集团控股有限公司 Account information determines method, apparatus and server
CN111444363A (en) * 2020-03-02 2020-07-24 高新兴科技集团股份有限公司 Picture retrieval method and device, terminal equipment and storage medium
CN111783808A (en) * 2019-07-23 2020-10-16 北京沃东天骏信息技术有限公司 Method and apparatus for generating information
CN111783517A (en) * 2020-05-13 2020-10-16 北京达佳互联信息技术有限公司 Image recognition method and device, electronic equipment and storage medium
CN112101238A (en) * 2020-09-17 2020-12-18 浙江商汤科技开发有限公司 Clustering method and device, electronic equipment and storage medium
CN112257801A (en) * 2020-10-30 2021-01-22 浙江商汤科技开发有限公司 Incremental clustering method and device for images, electronic equipment and storage medium
CN113052245A (en) * 2021-03-30 2021-06-29 重庆紫光华山智安科技有限公司 Image clustering method and device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
NIEVES BRISABOA et al.: "Clustering-Based Similarity Search in Metric Spaces with Sparse Spatial Centers", SOFSEM 2008: THEORY AND PRACTICE OF COMPUTER SCIENCE, pages 186 - 197 *
JIN YUANHUA et al.: "An ISAR image preprocessing method based on CFAR detection and density clustering", Journal of Terahertz Science and Electronic Information Technology, vol. 18, no. 2, pages 278 - 283 *

Also Published As

Publication number Publication date
CN113743533B (en) 2023-08-01

Similar Documents

Publication Publication Date Title
WO2020125216A1 (en) Pedestrian re-identification method, device, electronic device and computer-readable storage medium
CN110209859B (en) Method and device for recognizing places and training models of places and electronic equipment
CN108197532A (en) The method, apparatus and computer installation of recognition of face
WO2020024744A1 (en) Image feature point detecting method, terminal device, and storage medium
CN110147710B (en) Method and device for processing human face features and storage medium
CN110781911B (en) Image matching method, device, equipment and storage medium
CN111368943A (en) Method and device for identifying object in image, storage medium and electronic device
US9665773B2 (en) Searching for events by attendants
CN113537254B (en) Image feature extraction method and device, electronic equipment and readable storage medium
WO2021175040A1 (en) Video processing method and related device
CN111666976B (en) Feature fusion method, device and storage medium based on attribute information
CN111091106A (en) Image clustering method and device, storage medium and electronic device
CN110245573A (en) A kind of register method, apparatus and terminal device based on recognition of face
CN110941978A (en) Face clustering method and device for unidentified personnel and storage medium
CN113743533B (en) Picture clustering method and device and storage medium
CN113157962B (en) Image retrieval method, electronic device, and storage medium
CN114187463A (en) Electronic archive generation method and device, terminal equipment and storage medium
CN112052251B (en) Target data updating method and related device, equipment and storage medium
CN111709473A (en) Object feature clustering method and device
CN111177450B (en) Image retrieval cloud identification method and system and computer readable storage medium
CN114723652A (en) Cell density determination method, cell density determination device, electronic apparatus, and storage medium
CN112257689A (en) Training and recognition method of face recognition model, storage medium and related equipment
CN112069412A (en) Information recommendation method and device, computer equipment and storage medium
CN111626212A (en) Method and device for identifying object in picture, storage medium and electronic device
CN115147633A (en) Image clustering method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant