CN111401300B - Face clustering archiving method and device and storage medium - Google Patents

Face clustering archiving method and device and storage medium

Info

Publication number
CN111401300B
CN111401300B (application CN202010266218.7A)
Authority
CN
China
Prior art keywords
class
class center
similarity
feature
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010266218.7A
Other languages
Chinese (zh)
Other versions
CN111401300A (en
Inventor
邸德宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202010266218.7A priority Critical patent/CN111401300B/en
Publication of CN111401300A publication Critical patent/CN111401300A/en
Application granted granted Critical
Publication of CN111401300B publication Critical patent/CN111401300B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/23 - Clustering techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a face clustering archiving method and device and a storage medium. The method comprises the following steps: acquiring a new feature and an archive set, wherein the archive set includes archives that have already been formed, the total similarity threshold of each archive is obtained by adjusting a first preset threshold according to the total similarity of all features in the archive, and the total similarity of each feature is the total similarity between the feature and the class center feature of the archive at the time the feature was included in the archive; calculating the total similarity between the new feature and the class center feature of each archive in the archive set to obtain a target archive, wherein the class center feature of the target archive has the greatest total similarity with the new feature among the archives in the archive set; and if the total similarity between the class center feature of the target archive and the new feature is greater than the total similarity threshold of the target archive, including the new feature in the target archive, otherwise including the new feature in a new archive. By this method, the accuracy of the face clustering archiving process can be improved.

Description

Face clustering archiving method and device and storage medium
Technical Field
The present application relates to the field of image processing, and in particular, to a face cluster archiving method, apparatus, and storage medium.
Background
At present, face recognition technology can be used to judge whether two face images belong to the same person, and clustering technology can be used to determine which faces among many belong to the same person. Therefore, a large number of faces can be clustered by face clustering technology, and the faces belonging to the same person are grouped into the same archive, that is, "one person, one archive". However, the accuracy of face clustering archiving in the prior art is not high.
Disclosure of Invention
The present application mainly solves the technical problem that the accuracy of face clustering archiving in the prior art is not high.
In order to solve the above technical problem, one technical solution adopted by the present application is to provide a face clustering archiving method, the method comprising: acquiring a new feature and an archive set, wherein the archive set includes archives that have already been formed, the total similarity threshold of each archive is obtained by adjusting a first preset threshold according to the total similarity of all features in the archive, and the total similarity of each feature is the total similarity between the feature and the class center feature of the archive at the time the feature was included in the archive; calculating the total similarity between the new feature and the class center feature of each archive in the archive set to obtain a target archive, wherein the class center feature of the target archive has the greatest total similarity with the new feature among the archives in the archive set; and if the total similarity between the class center feature of the target archive and the new feature is greater than the total similarity threshold of the target archive, including the new feature in the target archive, otherwise including the new feature in a new archive.
In order to solve the above technical problem, another technical solution adopted by the present application is to provide a face clustering archiving apparatus, which comprises a processor and a memory coupled to the processor, wherein the memory stores program instructions and the processor executes the program instructions stored in the memory to implement the above method.
In order to solve the above technical problem, the present application adopts another technical solution that: there is provided a storage medium storing program instructions that when executed enable the above method to be implemented.
The beneficial effects of the present application are as follows. Through the implementation of the above scheme, the total similarity between the new feature and the class center feature of each archive in the archive set is calculated to select the archive most similar to the new feature, namely the target archive, and the new feature is included in the target archive only when the total similarity between the new feature and the target archive is greater than the total similarity threshold of the target archive. The total similarity threshold of each archive is obtained by adjusting a first preset threshold according to the total similarity of all features in the archive, where the total similarity of each feature is the total similarity between the feature and the class center feature of the archive at the time the feature was included. The total similarity threshold can thus be adaptively optimized for each archive, so the accuracy of clustering and archiving new features (faces) can be improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on them without creative effort. In the drawings:
FIG. 1 is a schematic flow chart of a first embodiment of a face cluster archiving method according to the present application;
FIG. 2 is a schematic view of a detailed flow chart of S120 in FIG. 1;
FIG. 3 is a schematic diagram of a detailed flow chart of S123 in FIG. 2;
FIG. 4 is a schematic flow chart of a second embodiment of the face cluster archiving method according to the present application;
FIG. 5 is a schematic flow chart of a third embodiment of the face cluster archiving method according to the present application;
FIG. 6 is a detailed flowchart of S310 in FIG. 5;
FIG. 7 is a detailed flowchart of S320 in FIG. 5;
FIG. 8 is a schematic structural diagram of an embodiment of a face cluster filing apparatus according to the present application;
FIG. 9 is a schematic structural diagram of an embodiment of a storage medium according to the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Fig. 1 is a schematic flow chart of a first embodiment of a face cluster archiving method according to the present application. It should be noted that, if the result is substantially the same, the flow sequence shown in fig. 1 is not limited in this embodiment. As shown in fig. 1, the present embodiment includes:
S110: Acquire a new feature and an archive set.
In the present application, a feature refers to a face feature extracted from a face image, and each feature corresponds to one face image. The new feature is the face feature to be clustered and archived. The archive set includes the archives that have already been formed, and each archive represents one class obtained by clustering. Ideally, all features in a single archive belong to the same person, that is, the faces in the face images corresponding to all features in an archive belong to the same person, and each person has only one archive.
The total similarity threshold of each archive is obtained by adjusting a first preset threshold according to the total similarity of all features in the archive, so as to better fit the features of each archive; the specific adjustment is described in the following embodiments. The total similarity of each feature in an archive is the total similarity between the feature and the class center feature of the archive at the time the feature was included in the archive.
S120: Calculate the total similarity between the new feature and the class center feature of each archive in the archive set to obtain a target archive.
The target archive is the archive in the archive set whose class center feature has the greatest total similarity with the new feature.
In the archive set, each archive has a corresponding class center, which may also be referred to as a class center feature. Among all archives in the archive set, the archive with the greatest total similarity to the new feature is referred to as the target archive.
Referring to fig. 2, S120 may specifically include the following sub-steps:
S121: Calculate a first similarity between the new feature and the class center feature of each archive in the archive set to obtain a first class center set.
The first class center set comprises a first number of class centers with the highest first similarity with the new feature.
The first similarity may be a cosine similarity, a Euclidean-distance-based similarity, or the like. There are various methods for obtaining the first number of class centers with the highest first similarity to the new feature, including but not limited to an exhaustive (brute-force) search and approximate nearest-neighbor searches such as a kd-tree. The first number is chosen such that the first class center set includes the class centers of the same person as the new feature and/or of persons whose faces are similar to that person.
For example, if the first number is p (p is an integer), the first class center set C1 includes the p class centers with the highest first similarity to the new feature, denoted as C1 = {top-1, top-2, …, top-p}. Of course, C1 is only an example, and the arrangement order of the p class centers it contains is not particularly limited.
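As an illustration of S121, the following Python sketch retrieves the p class centers most similar to a new feature by a brute-force cosine-similarity search (an approximate nearest-neighbor index such as a kd-tree could be substituted); the array shapes, names, and the value of p are illustrative assumptions.

```python
import numpy as np

def top_p_class_centers(new_feature, class_centers, p):
    """Return indices and cosine similarities of the p class centers
    most similar to new_feature (brute-force search).

    new_feature:   (d,) feature vector
    class_centers: (N, d) matrix, one class-center feature per archive
    """
    nf = new_feature / np.linalg.norm(new_feature)
    cc = class_centers / np.linalg.norm(class_centers, axis=1, keepdims=True)
    sims = cc @ nf                     # cosine similarity to every class center
    order = np.argsort(-sims)[:p]      # indices of the p highest similarities
    return order, sims[order]

# Hypothetical usage: 1000 archives with 128-dimensional class centers
rng = np.random.default_rng(0)
centers = rng.normal(size=(1000, 128))
new_feat = rng.normal(size=128)
idx_C1, sims_C1 = top_p_class_centers(new_feat, centers, p=20)   # the set C1
```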
S122: and screening class centers with the first similarity between the class centers and the new features larger than a second preset threshold value from the first class center set to form a second class center set.
The second preset threshold provides a stronger selection than the retrieval of the p class centers in C1; in other words, its value is chosen such that the second class center set selected from C1 contains as many class centers belonging to the same person as possible while not including too many wrong class centers.
For example, the second class center set C2 contains the n (n is an integer and n ≤ p) class centers with the highest first similarity to the new feature, denoted as C2 = {top-1, top-2, …, top-n}. Of course, C2 is only an example, and the arrangement order of the n class centers it contains is not limited.
S123: a second similarity between the new feature and each class center in the second set of class centers is calculated.
The second similarity is a neighbor similarity, and referring to fig. 3, the calculation method includes:
S1231: Calculate the first similarity between each class center in the second class center set and all class centers in the first class center set to obtain a third class center set.
For each class center in the second class center set, the corresponding third class center set comprises the second number of class centers in the first class center set with the highest first similarity to that class center, arranged in descending order of that first similarity; each class center in the second class center set has its own third class center set.
Taking the class center top-1 in the second class center set as an example: the first similarity between top-1 in C2 and each class center in C1 is calculated, and the second number m (m is an integer) of class centers most similar to top-1 in C2 are selected (a traversal search may be used here) to form the third class center set C3 corresponding to top-1, where the m class centers in C3 are arranged in descending order of their first similarity with top-1 in C2. C3 may thus be called the third class center set of top-1 in C2.
S1232: Calculate the second similarity between the new feature and each class center in the second class center set according to the internal judgment condition of whether each class center in a fourth class center set belongs to the third class center set.
The fourth class center set is composed of the second number of class centers in the first class center set with the largest first similarity to the new feature, arranged in descending order of their first similarity to the new feature.
The number of class centers in the fourth class center set is equal to the number of class centers in each third class center set. The fourth class center set is denoted as C4 = {New-1, New-2, …, New-m}, where New-1 to New-m are arranged in descending order of first similarity with the new feature.
The second similarity between the new feature and each class center in the second class center set is specifically calculated according to the following formula:

NeighSim = Σ_{i=1..m} nw_{i,j} · δ(New_i ∈ C3)

where NeighSim is the second similarity; C3 is the third class center set, i.e., the m class centers in the first class center set most similar to the considered class center in the second class center set; New_i is the ith (i is an integer and i ≤ m) class center in the first class center set most similar to the new feature; nw_{i,j} is a weight matrix of NeighSim, and j is the rank value of New_i in C3; New_i ∈ C3 is the internal judgment condition, and δ is an indicator function whose value is 1 if the internal condition is satisfied and 0 otherwise.
The values of nw_{i,j} may all be 1, which amounts to counting the number of class centers shared by the third class center set and the fourth class center set. Of course, nw_{i,j} may also be determined by the values of i and j; for example, the smaller i and j are, the larger nw_{i,j} is, so that closer neighbors receive larger weights.
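As a concrete illustration of this neighbor similarity, the following Python sketch (the names and the all-ones default weight are illustrative assumptions) counts, with optional weights, how many of the new feature's nearest class centers also appear among the nearest class centers of the candidate class center:

```python
def neighbor_similarity(C4, C3, nw=None):
    """Second similarity (NeighSim) between the new feature and one class
    center of C2.

    C4: list of m class-center indices most similar to the new feature,
        ordered by descending first similarity.
    C3: list of m class-center indices most similar to the considered C2
        class center, ordered by descending first similarity.
    nw: optional m x m weight matrix; nw[i][j] weights a match where the
        i-th element of C4 appears at rank j in C3.  Defaults to all ones,
        i.e. simply counting the shared class centers.
    """
    rank_in_C3 = {c: j for j, c in enumerate(C3)}
    score = 0.0
    for i, c in enumerate(C4):
        if c in rank_in_C3:                      # internal judgment condition
            j = rank_in_C3[c]
            score += 1.0 if nw is None else nw[i][j]
    return score

# Example: 3 of the 5 neighbors are shared
print(neighbor_similarity(C4=[7, 2, 9, 4, 1], C3=[2, 7, 5, 4, 8]))  # -> 3.0
```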
S124: and calculating the total similarity between the new feature and each class center in the second class center set according to the second similarity and the first similarity between the new feature and each class center in the second class center set.
The larger the first similarity and the second similarity are, the larger the total similarity calculated from them. Three examples of total-similarity calculation methods are given below:
Example 1: Sim = FeatSim · NeighSim
Example 2: Sim = sw_f · FeatSim + (1 - sw_f) · NeighSim
Example 3: (equation image not reproduced; a combination of FeatSim and NeighSim controlled by the decay rate factor k_1)
where Sim is the total similarity, FeatSim is the first similarity, NeighSim is the second similarity, sw_f is the weight corresponding to the first similarity with sw_f ∈ (0,1), and k_1 is a decay rate factor, for example 0.1.
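A minimal sketch of the two fully recoverable combination rules, reading Example 1 as the product of the two similarities and assuming NeighSim has already been scaled to a range comparable to FeatSim; Example 3 is omitted because its equation image is not reproduced.

```python
def total_similarity(feat_sim, neigh_sim, sw_f=None):
    """Total similarity from the first similarity (FeatSim) and the
    second similarity (NeighSim).

    sw_f is None  -> Example 1: Sim = FeatSim * NeighSim
    sw_f in (0,1) -> Example 2: Sim = sw_f*FeatSim + (1-sw_f)*NeighSim
    """
    if sw_f is None:
        return feat_sim * neigh_sim
    return sw_f * feat_sim + (1.0 - sw_f) * neigh_sim

print(total_similarity(0.82, 0.6))            # Example 1 -> 0.492
print(total_similarity(0.82, 0.6, sw_f=0.7))  # Example 2 -> 0.754
```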
S130: Determine whether the total similarity between the class center feature of the target archive and the new feature is greater than the total similarity threshold of the target archive.
If yes, go to S140, otherwise go to S150.
S140: the new features are included in the target profile.
If the total similarity between the class center feature of the target archive and the new feature is greater than the total similarity threshold of the target archive, the new feature meets the requirement for being included in the target archive and can be included in it.
S150: the new features are included in the new profile.
If the total similarity between the class center feature of the target archive and the new feature is less than or equal to the total similarity threshold of the target archive, the new feature does not meet the requirement for being included in the target archive, so a new archive needs to be established to hold it.
In addition, after the new feature is included in an archive (S140, S150), the method may further include:
S160: Update the class center feature of the archive.
After the new feature is included in the archive, the class center feature of the archive may be updated based on the new feature. Specifically, the class center feature of the archive after the new feature is included can be calculated from the class center feature of the archive before the new feature was included and the new feature.
The specific calculation formulas are as follows:

F'_i = (Sumfw_CNum · F_i + fw_(CNum+1) · f_(i,CNum+1)) / Sumfw_(CNum+1)

Sumfw_(CNum+1) = Sumfw_CNum + fw_(CNum+1)

where F_i is the ith dimension value of the class center feature of the archive before the update, F'_i is the ith dimension value of the class center feature of the archive after the update, Sumfw_CNum is the accumulated weight of the CNum features that were in the archive before the new feature was included, fw_(CNum+1) is the weight assigned to the (CNum+1)th feature (the new feature) in the archive, Sumfw_(CNum+1) is the accumulated weight of the CNum+1 features in the archive after the new feature is included, and f_(i,CNum+1) is the ith dimension value of the (CNum+1)th feature (the new feature).
Examples of how the weight fw_(CNum+1) can be chosen are given below:
Example 1: All features in the archive are given a weight of 1 regardless of differences between the faces, i.e., fw_(CNum+1) = 1 and Sumfw_CNum = CNum.
Example 2: The face image quality is taken into account (blur, angle, expression, and other information that affects face recognition accuracy). A weight is assigned to each feature according to its face image quality score: face image features of better quality are closer to the class center, so features with higher quality scores receive larger weights and features with lower quality scores receive smaller weights.
Example 3: The capture time of the face image is taken into account: the more recent the capture time, the closer the face image feature should be to the face of the new feature, and the larger its weight. Weights can therefore be assigned to the features according to the order in which they were added to the archive. When a new feature is added to the archive, the accumulated feature weight of the archive can be adjusted by an exponentially weighted moving average to balance the time effect. For example, Sumfw_CNum' = Sumfw_CNum · wm, where wm is a predetermined momentum factor, Sumfw_CNum is the accumulated feature weight before adjustment, and Sumfw_CNum' is the accumulated feature weight after adjustment, with wm ∈ (0,1); the larger wm is, the weaker the time effect.
Since the initial class center feature values of a new archive are all 0, when the new feature is included in a new archive, the class center feature of the new archive can be updated by the above method, or the new feature can be used directly as the class center feature of the new archive.
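A minimal Python sketch of this weighted class-center update, assuming the caller supplies the new feature's weight (constant 1, a quality score, or a recency weight) and an optional momentum factor wm for the exponentially weighted moving average; all names are illustrative.

```python
import numpy as np

def update_class_center(center, sum_weight, new_feature, new_weight=1.0,
                        momentum=None):
    """Incrementally update an archive's class-center feature.

    center:      current class-center feature F (length-d vector)
    sum_weight:  Sumfw_CNum, the accumulated weight of the features
                 already in the archive (0.0 for a brand-new archive)
    new_feature: f_{CNum+1}, the feature being added
    new_weight:  fw_{CNum+1}, e.g. 1, a quality score, or a recency weight
    momentum:    optional wm in (0,1); if given, old weights are decayed
                 by an exponentially weighted moving average first
    """
    if momentum is not None:
        sum_weight *= momentum                      # weaken older features
    new_sum = sum_weight + new_weight               # Sumfw_{CNum+1}
    new_center = (sum_weight * np.asarray(center, dtype=float)
                  + new_weight * np.asarray(new_feature, dtype=float)) / new_sum
    return new_center, new_sum

# A new archive starts from an all-zero center and zero accumulated weight
center, sw = update_class_center(np.zeros(4), 0.0, np.array([1., 2., 3., 4.]))
print(center, sw)   # -> [1. 2. 3. 4.] 1.0
```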
Through the implementation of this embodiment, the archive most similar to the new feature, namely the target archive, is selected by calculating the total similarity between the new feature and the class center feature of each archive in the archive set, and the new feature is included in the target archive only when the total similarity between the new feature and the target archive is greater than the adjusted total similarity threshold of the target archive. The total similarity threshold of each archive is obtained by adjusting the first preset threshold according to the total similarity of all features in the archive, where the total similarity of each feature is the total similarity between the feature and the class center feature of the archive at the time the feature was included. The total similarity threshold can thus be adaptively optimized for each archive, so the accuracy of clustering and archiving new features (faces) can be improved.
Fig. 4 is a flowchart illustrating a second embodiment of the face cluster archiving method according to the present application. It should be noted that, if the result is substantially the same, the flow sequence shown in fig. 4 is not limited in this embodiment. As shown in fig. 4, after the new features are included in the target file based on the foregoing embodiment, the embodiment may further include:
S210: Calculate the variance/standard deviation of the total similarity of all features in the target archive.
When a new feature is included in the target archive, the variance/standard deviation and mean of the total similarity of the features in the target archive may be updated. The specific calculation is as follows:
First, the cumulative sum of squares and the cumulative sum of the total similarity of all features in the target archive after the new feature is included are calculated:

SumSquare_(CNum+1) = SumSquare_CNum + Sim_(CNum+1)^2

SumSim_(CNum+1) = SumSim_CNum + Sim_(CNum+1)

where SumSquare_(CNum+1) is the cumulative sum of squares of the total similarity of the CNum+1 features in the target archive after the new feature is included, SumSquare_CNum is the cumulative sum of squares of the total similarity of the CNum features that were in the target archive before the new feature was included, and Sim_(CNum+1) is the total similarity of the new feature; SumSim_(CNum+1) is the cumulative sum of the total similarity of the CNum+1 features in the target archive after the new feature is included, and SumSim_CNum is the cumulative sum of the total similarity of the CNum existing features that were in the target archive before the new feature was included.
Then the variance/standard deviation of the total similarity of the features currently in the target archive is calculated from the cumulative sum of squares and the cumulative sum of the total similarity of all features in the target archive:

Mean_(CNum+1) = SumSim_(CNum+1) / (CNum+1)

Var_(CNum+1) = SumSquare_(CNum+1) / (CNum+1) - Mean_(CNum+1)^2

where Mean_(CNum+1) is the mean of the total similarity of the CNum+1 features currently in the target archive, and Var_(CNum+1) is the variance of the total similarity of the CNum+1 features currently in the target archive (the standard deviation is its square root).
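The running statistics above only require each archive to keep a feature count, a cumulative sum, and a cumulative sum of squares. A small illustrative sketch (field names are assumptions):

```python
import math

class SimilarityStats:
    """Running mean / variance of the total similarities of an archive."""

    def __init__(self):
        self.count = 0          # CNum
        self.sum_sim = 0.0      # SumSim
        self.sum_square = 0.0   # SumSquare

    def add(self, sim):
        """Record the total similarity of a newly included feature."""
        self.count += 1
        self.sum_sim += sim
        self.sum_square += sim * sim

    def mean(self):
        return self.sum_sim / self.count

    def variance(self):
        return self.sum_square / self.count - self.mean() ** 2

    def std(self):
        return math.sqrt(max(self.variance(), 0.0))

stats = SimilarityStats()
for s in (0.91, 0.88, 0.93):
    stats.add(s)
print(round(stats.mean(), 4), round(stats.std(), 4))
```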
S220: and adjusting the first preset threshold value according to the variance/standard deviation to obtain the total similarity threshold value of the target file.
The specific adjustment method in the step is as follows:
when the number of features in the target file is smaller than or equal to a first number threshold, the total similarity threshold of the target file is equal to a first preset threshold. When the number of the features in the target file is small, the mean and variance of the total similarity of all the features in the target file have insufficient meaning, so the first preset threshold is not adjusted temporarily, but is directly used as the total similarity threshold of the file. Specifically, it can be formulated as:
T1A=T1,
wherein T1A is the total similarity threshold of the target file, and T1 is the first predetermined threshold.
When the number of features in the target archive is greater than the first number threshold and less than a second number threshold, the total similarity threshold of the target archive is equal to the first preset threshold plus the product of the first adjustment amplitude factor and the difference between the standard deviation reference value and the standard deviation. When the number of features in the target archive is slightly larger, the mean and variance of the total similarity of all its features have a certain representative meaning, and the threshold can be adjusted slightly according to them. This can be formulated as:

T1A = T1 + k_1 · (Std_all - √Var_CNum),

where k_1 is the first adjustment amplitude factor, Std_all is the standard deviation reference value of the total similarity of all features, Var_CNum is the variance of the total similarity of all features in the target archive, and √Var_CNum is the corresponding standard deviation. Std_all characterizes the overall distribution of the features of the archive set; it can be set to the mean of the standard deviations of the class center feature distributions of all archives in the archive set (updated along with the class center feature updates), or it can be specified externally.
When the similarity distribution of the features in the target archive is spread out (the variance is large), the above formula adjusts the first preset threshold downward to obtain the total similarity threshold of the target archive; when the similarity distribution of the features in the target archive is concentrated (the variance is small), the above formula adjusts the first preset threshold upward.
When the number of features in the target archive is greater than or equal to the second number threshold, the total similarity threshold of the target archive is equal to the first preset threshold minus the product of the standard deviation and the second adjustment amplitude factor. When the number of features in the target archive is large, the total similarity distribution of all its features is relatively stable, and the threshold can be adjusted by fully relying on this distribution. This can be formulated as:

T1A = T1 - k_2 · √Var_CNum,

where k_2 is the second adjustment amplitude factor and Mean_CNum denotes the mean of the total similarity of all features in the target archive. The acceptable risk of wrong class members entering the archive can be controlled by adjusting the second adjustment amplitude factor, and the influence of low-similarity wrong class members already in the archive on the threshold adjustment is weakened by means of a statistical outlier detection approach.
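A sketch of the three-branch threshold adjustment described above; because the equation images are not reproduced in the source, the arithmetic simply mirrors the verbal description, and the number thresholds and amplitude factors are illustrative values.

```python
def adjust_total_similarity_threshold(t1, count, std, std_all,
                                      n1=10, n2=100, k1=0.5, k2=2.0):
    """Per-archive total similarity threshold T1A.

    t1:      first preset threshold
    count:   number of features in the target archive
    std:     standard deviation of the total similarities in the archive
    std_all: standard deviation reference value of the archive set
    n1, n2:  first / second number thresholds (illustrative)
    k1, k2:  first / second adjustment amplitude factors (illustrative)
    """
    if count <= n1:                      # too few features: keep T1 as-is
        return t1
    if count < n2:                       # nudge T1 using the reference value
        return t1 + k1 * (std_all - std)
    return t1 - k2 * std                 # rely on the archive's own statistics

print(adjust_total_similarity_threshold(0.8, count=5,   std=0.05, std_all=0.04))
print(adjust_total_similarity_threshold(0.8, count=50,  std=0.05, std_all=0.04))
print(adjust_total_similarity_threshold(0.8, count=500, std=0.05, std_all=0.04))
```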
After each new feature is archived, the above process of adjusting the total similarity threshold can be performed on the target archive into which the feature is included, so as to adjust the total similarity threshold of all the archives in the archive set.
Fig. 5 is a schematic flow chart of a third embodiment of the face cluster archiving method according to the present application. It should be noted that, if the result is substantially the same, the flow sequence shown in fig. 5 is not limited in this embodiment. As shown in fig. 5, the present embodiment includes:
S310: Merge two archives in the archive set whose total similarity is greater than a third preset threshold.
Merging archives in the archive set whose total similarity is greater than the third preset threshold improves the recall rate of features and reduces the probability that one person ends up with multiple archives. To reduce computation and time overhead, this archive-merging step may be executed once every preset time period.
Referring to fig. 6, this step may specifically include the following sub-steps:
S311: Calculate the total similarity between the class center features of the existing archives in the archive set.
The class center feature of the archive with fewer features is treated like the new feature described above, and its total similarity with the archive with more features is calculated as the total similarity between the two archives.
S312: and judging whether two files with the total similarity between the class center features larger than a third preset threshold exist.
If yes, go to step S313.
S313: and merging the two files with the total similarity larger than a third preset threshold value.
When two archives are merged, the archive with fewer features is added to the archive with more features. The number of features in the merged archive is the sum of the numbers of class members of the two archives before merging, i.e.

CNum_whole = CNum_small + CNum_big,

where CNum_whole is the number of features of the merged archive, CNum_small is the number of features of the archive with fewer features, and CNum_big is the number of features of the archive with more features.
The cumulative sum of squares of the total similarity of the features in the merged archive is calculated as follows:

SumSquare_whole = SumSquare_big + CNum_small · Sim_big_small^2,

where SumSquare_whole is the cumulative sum of squares of the total similarity of the features of the merged archive, SumSquare_big is the cumulative sum of squares of the total similarity of the features of the archive with more features, and Sim_big_small is the total similarity between the class center features of the two archives before merging.
The cumulative sum of the total similarity of the features in the merged archive is calculated as follows:

SumSim_whole = SumSim_big + CNum_small · Sim_big_small,

where SumSim_whole is the cumulative sum of the total similarity of the features of the merged archive, and SumSim_big is the cumulative sum of the total similarity of the features of the archive with more features.
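A sketch of the bookkeeping performed when two archives are merged, assuming each archive record carries the count and the two cumulative sums maintained in S210; the record layout is an assumption.

```python
def merge_archives(big, small, sim_big_small):
    """Merge the archive with fewer features ('small') into the one with
    more features ('big').

    Each archive is a dict with keys:
      'count'      - number of features (CNum)
      'sum_sim'    - cumulative sum of total similarities (SumSim)
      'sum_square' - cumulative sum of squared total similarities (SumSquare)
    sim_big_small is the total similarity between the two class-center
    features before merging.
    """
    merged = dict(big)
    merged['count'] = big['count'] + small['count']                       # CNum_whole
    merged['sum_sim'] = big['sum_sim'] + small['count'] * sim_big_small   # SumSim_whole
    merged['sum_square'] = (big['sum_square']
                            + small['count'] * sim_big_small ** 2)        # SumSquare_whole
    return merged

big = {'count': 40, 'sum_sim': 36.0, 'sum_square': 32.5}
small = {'count': 5, 'sum_sim': 4.6, 'sum_square': 4.24}
print(merge_archives(big, small, sim_big_small=0.85))
```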
S320: and carrying out noise check on the target file.
To reduce computation and time overhead, this step may be executed once every preset time period.
Referring to fig. 7, S320 may specifically include the following sub-steps:
S321: Calculate the total similarity between every two features in the target archive.
As in S123, the third class center set and the fourth class center set used when calculating the total similarity between two features of the target archive in this step are, for each of the two features, the third number of features that are most similar to that feature among all features of the entire target archive.
The third number may be, for example, 2/3 of the number of features in the target archive.
S322: and performing offline clustering on all the features in the target file based on the total similarity between every two features to obtain at least one subclass.
The off-line clustering method can be density clustering, hierarchical clustering, spectral clustering and the like.
S323: and replacing the target file with the target subclass, calculating the class center characteristics of the target subclass and other subclasses, replacing the class center characteristics of the target file with the class center characteristics of the target subclass, and clustering and archiving the class center characteristics of other subclasses as new characteristics.
If more than one subclass is obtained, the target archive is replaced by the target subclass, and the class center feature of the target subclass is calculated to replace the class center feature of the target archive, where the target subclass is the subclass with the most features. The subclasses other than the target subclass are then clustered and archived again.
Specifically, each feature in the target subclass may be added as a new feature, one by one, into the same new archive according to the method of the foregoing embodiments, so as to obtain the class center feature of the target subclass.
Similarly, the class center feature of each of the other subclasses can be calculated according to the method of the above embodiments, and it is then determined whether that class center feature satisfies the condition for being added to another archive in the archive set.
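For illustration, the following sketch performs a simplified noise check: pairwise cosine similarity stands in for the pairwise total similarity, and a threshold-based connected-components grouping stands in for the density/hierarchical/spectral clustering mentioned above; the threshold and data are illustrative.

```python
import numpy as np

def noise_check(features, sim_threshold=0.6):
    """Split an archive's features into subclasses and return
    (target_subclass, other_subclasses) as lists of feature indices.

    features: (n, d) array of the archive's face features.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    adj = (f @ f.T) > sim_threshold          # thresholded similarity graph

    labels = [-1] * len(features)
    current = 0
    for start in range(len(features)):
        if labels[start] != -1:
            continue
        stack = [start]                      # flood-fill one connected component
        while stack:
            node = stack.pop()
            if labels[node] != -1:
                continue
            labels[node] = current
            stack.extend(j for j in range(len(features)) if adj[node, j])
        current += 1

    subclasses = [[i for i, l in enumerate(labels) if l == c] for c in range(current)]
    subclasses.sort(key=len, reverse=True)
    return subclasses[0], subclasses[1:]     # largest subclass keeps the archive

rng = np.random.default_rng(1)
feats = np.vstack([rng.normal(0, 0.05, (8, 16)) + 1.0,    # a coherent cluster
                   rng.normal(0, 0.05, (2, 16)) - 1.0])   # two noisy members
target, others = noise_check(feats)
print(len(target), [len(o) for o in others])               # -> 8 [2]
```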
Referring to fig. 8, fig. 8 is a schematic structural diagram of an embodiment of a face cluster filing apparatus according to the present application. As shown in fig. 8, the face cluster archive 800 includes a processor 810, a memory 820 coupled to the processor.
Wherein the memory 820 stores program instructions for implementing the method of any of the embodiments described above; processor 810 is configured to execute program instructions stored by memory 820 to implement the steps of the above-described method embodiments. The processor 810 may also be referred to as a Central Processing Unit (CPU). Processor 810 may be an integrated circuit chip having signal processing capabilities. The processor 810 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
Referring to fig. 9, fig. 9 is a schematic structural diagram of a storage medium according to an embodiment of the present application. The storage medium 900 of the embodiment of the present application stores program instructions, and the program instructions, when executed, implement the face cluster archiving method of the present application. The instructions may form a program file stored in the storage medium in the form of a software product, so as to enable a computer device (which may be a personal computer, a server, or a network device) or a processor (processor) to execute all or part of the steps of the methods according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a mobile hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, or terminal devices, such as a computer, a server, a mobile phone, and a tablet.
The above description is only for the purpose of illustrating embodiments of the present application and is not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application or are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.

Claims (12)

1. A face cluster archiving method is characterized by comprising the following steps:
acquiring new features and a file set, wherein the file set comprises classified files, the total similarity threshold of each file is obtained by adjusting a first preset threshold according to the total similarity of all the features in the files, and the total similarity of each feature is the total similarity between the feature when being classified into the files and the class center feature of the files;
calculating the total similarity between the new feature and the class center feature of each of the files in the file set to obtain a target file, wherein the total similarity between the class center feature of the target file in the file set and the new feature is the maximum;
if the total similarity between the class-center feature of the target profile and the new feature is greater than a total similarity threshold of the target profile, then the new feature is classified into the target profile; otherwise, the new features are classified into a new file;
after said attributing the new feature to the target profile, the method further comprises:
calculating the standard deviation of the total similarity of all the characteristics in the target file;
determining an adjustment mode of the first preset threshold value based on the number range of the feature number in the target file;
adjusting the first preset threshold value based on the determined adjustment mode to obtain a total similarity threshold value of the target file;
if the feature quantity belongs to a first quantity range, determining that an adjustment mode of the first preset threshold is a first adjustment mode, wherein in the first adjustment mode, a total similarity threshold of the target file is equal to the first preset threshold, and the first quantity range is a range smaller than or equal to the first quantity threshold; if the feature quantity belongs to a second quantity range, determining that the adjustment mode of the first preset threshold is a second adjustment mode, wherein in the second adjustment mode, the total similarity threshold of the target file is equal to the first preset threshold plus the product of the difference value between the standard deviation and the standard deviation reference value and a first adjustment amplitude factor, and the second quantity range is a range which is larger than the first quantity threshold and smaller than a second quantity threshold; and if the feature quantity belongs to a third quantity range, determining that the adjustment mode of the first preset threshold is a third adjustment mode, wherein in the third adjustment mode, the total similarity threshold of the target file is equal to the product of the first preset threshold minus the standard deviation and a second adjustment amplitude factor, and the third quantity range is a range larger than or equal to the second quantity threshold.
2. The method of claim 1, wherein calculating the overall similarity between the new feature and the class-centric feature of each profile in the set of profiles comprises:
calculating a first similarity between the new feature and the class center feature of each file in the file set to obtain a first class center set, wherein the first class center set comprises a first number of class centers with the highest first similarity with the new feature;
screening class centers with the first similarity between the class centers and the new features larger than a second preset threshold value from the first class center set to form a second class center set;
calculating a second similarity between the new feature and each class center in the second class center set, wherein the second similarity is a neighbor similarity;
and calculating the total similarity between the new feature and each class center in the second class center set according to the second similarity and the first similarity between the new feature and each class center in the second class center set.
3. The method of claim 2, wherein said calculating a second similarity between the new feature and each class center in the second set of class centers comprises:
respectively calculating first similarity between each class center in the second class center set and all class centers in the first class center set to obtain a third class center set, wherein the third class center set comprises a second number of class centers with the highest first similarity between each class center in the first class center set and each class center in the second class center set, the class centers in the third class center set are sequentially arranged from large to small according to the first similarity between the class centers in the second class center set, and each class center in the second class center set is provided with a corresponding third class center set;
and calculating a second similarity between the new feature and each class center in the second class center set according to an internal judgment condition that whether each class center in a fourth class center set belongs to the third class center set, wherein the fourth class center set is composed of a second number of class centers with the largest first similarity between the first class center set and the new feature, and the second number of class centers in the fourth class center set are sequentially arranged from large to small according to the first similarity between the second class center set and the new feature.
4. The method according to claim 3, wherein the second similarity between the new feature and each class center in the second class center set is calculated as follows:
NeighSim = Σ_{i=1..m} nw_{i,j} · δ(New_i ∈ C3)

wherein NeighSim is the second similarity; C3 is the third class center set of the considered class center in the second class center set, comprising the m class centers in the first class center set that are most similar to that class center; New_i is the ith class center in the first class center set most similar to the new feature, i being an integer and i ≤ m; nw_{i,j} is the weight matrix of NeighSim, and j represents the rank value of New_i in the C3 set; New_i ∈ C3 is the internal condition, and δ is an indicator function whose value is 1 if the internal condition is satisfied and 0 otherwise.
5. The method of claim 1, further comprising:
and after the new features are classified into the files, updating the class center features of the files.
6. The method of claim 5, wherein updating the class-centric feature of the archive after the new feature is included in the archive comprises:
and calculating the class center characteristic of the file after the new characteristic is included according to the class center characteristic of the file before the new characteristic is included and the new characteristic.
7. The method of claim 1, further comprising:
and combining the two files with the total similarity larger than a third preset threshold in the file set.
8. The method of claim 7, wherein said merging two said profiles of said set of profiles having a total similarity greater than a third predetermined threshold comprises:
respectively calculating the total similarity between class center characteristics of the existing files in the file set;
judging whether two files with the total similarity between the class center features larger than a third preset threshold exist or not;
and if so, merging the two files of which the total similarity is greater than the third preset threshold.
9. The method of claim 1, wherein the attributing the new feature to the target profile further comprises:
and carrying out noise check on the target file.
10. The method of claim 9, wherein said noise checking said target profile comprises:
calculating the total similarity between every two features in the target file;
performing offline clustering on all the features in the target archive based on the total similarity between every two features to obtain at least one subclass;
replacing the target archive by a target sub-class, calculating class center characteristics of the target sub-class and other sub-classes, replacing the class center characteristics of the target archive by the class center characteristics of the target sub-class, and clustering and archiving the class center characteristics of the other sub-classes as the new characteristics according to the method of any one of claims 1 to 7, wherein the target sub-class is the sub-class with the most characteristics.
11. A face cluster archiving apparatus, the face cluster archiving apparatus comprising a processor, a memory coupled to the processor, wherein,
the memory stores program instructions for implementing the method of any one of claims 1-10;
the processor is configured to execute the program instructions stored by the memory to implement the method of any of claims 1-10.
12. A storage medium storing program instructions which, when executed, implement the steps of the method of any one of claims 1 to 10.
CN202010266218.7A 2020-04-07 2020-04-07 Face clustering archiving method and device and storage medium Active CN111401300B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010266218.7A CN111401300B (en) 2020-04-07 2020-04-07 Face clustering archiving method and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010266218.7A CN111401300B (en) 2020-04-07 2020-04-07 Face clustering archiving method and device and storage medium

Publications (2)

Publication Number Publication Date
CN111401300A CN111401300A (en) 2020-07-10
CN111401300B true CN111401300B (en) 2022-08-09

Family

ID=71429464

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010266218.7A Active CN111401300B (en) 2020-04-07 2020-04-07 Face clustering archiving method and device and storage medium

Country Status (1)

Country Link
CN (1) CN111401300B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112633051A (en) * 2020-09-11 2021-04-09 博云视觉(北京)科技有限公司 Online face clustering method based on image search
CN112668635B (en) * 2020-12-25 2022-05-27 浙江大华技术股份有限公司 Image archiving method, device, equipment and computer storage medium
CN112686141A (en) * 2020-12-29 2021-04-20 杭州海康威视数字技术股份有限公司 Personnel filing method and device and electronic equipment
CN113673550A (en) * 2021-06-30 2021-11-19 浙江大华技术股份有限公司 Clustering method, clustering device, electronic equipment and computer-readable storage medium
CN113255621B (en) * 2021-07-13 2021-11-16 浙江大华技术股份有限公司 Face image filtering method, electronic device and computer-readable storage medium
CN114580392B (en) * 2022-04-29 2022-07-29 中科雨辰科技有限公司 Data processing system for identifying entity

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109740003A (en) * 2018-12-28 2019-05-10 上海依图网络科技有限公司 A kind of archiving method and device
CN109815370A (en) * 2018-12-28 2019-05-28 上海依图网络科技有限公司 A kind of archiving method and device
CN110414429A (en) * 2019-07-29 2019-11-05 佳都新太科技股份有限公司 Face cluster method, apparatus, equipment and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109740003A (en) * 2018-12-28 2019-05-10 上海依图网络科技有限公司 A kind of archiving method and device
CN109815370A (en) * 2018-12-28 2019-05-28 上海依图网络科技有限公司 A kind of archiving method and device
CN110414429A (en) * 2019-07-29 2019-11-05 佳都新太科技股份有限公司 Face cluster method, apparatus, equipment and storage medium

Also Published As

Publication number Publication date
CN111401300A (en) 2020-07-10

Similar Documents

Publication Publication Date Title
CN111401300B (en) Face clustering archiving method and device and storage medium
US6704725B1 (en) Method of searching multimedia data
CN110825894A (en) Data index establishing method, data index retrieving method, data index establishing device, data index retrieving device, data index establishing equipment and storage medium
US20230237771A1 (en) Self-supervised learning method and apparatus for image features, device, and storage medium
CN112163637B (en) Image classification model training method and device based on unbalanced data
WO2021179631A1 (en) Convolutional neural network model compression method, apparatus and device, and storage medium
CN113743470B (en) AI algorithm-based garbage recognition precision improvement method for automatic bag breaking classification box
CN114282054A (en) Video recommendation method and device, computer equipment and storage medium
CN108491887A (en) A kind of commodity tax incorporates the acquisition methods of code into own forces
US20160292537A1 (en) Feature Interpolation
CN114283332A (en) Fuzzy clustering remote sensing image segmentation method, system, terminal and storage medium
CN110188625B (en) Video fine structuring method based on multi-feature fusion
CN107527058A (en) A kind of image search method based on weighting local feature Aggregation Descriptor
AU2020403709B2 (en) Target object identification method and apparatus
CN108694411A (en) A method of identification similar image
CN116935057A (en) Target evaluation method, electronic device, and computer-readable storage medium
CN115391539A (en) Corpus data processing method and device and electronic equipment
WO2022127333A1 (en) Training method and apparatus for image segmentation model, image segmentation method and apparatus, and device
CN114757353A (en) Compression method and compression device of machine learning model and readable storage medium
Wei et al. Salient object detection based on weighted hypergraph and random walk
CN108804499B (en) Trademark image retrieval method
CN113553326A (en) Spreadsheet data processing method, device, computer equipment and storage medium
CN113673550A (en) Clustering method, clustering device, electronic equipment and computer-readable storage medium
WO2019232645A1 (en) Unsupervised classification of documents using a labeled data set of other documents
CN117476165B (en) Intelligent management method and system for Chinese patent medicine medicinal materials

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant