CN111401300B - Face clustering archiving method and device and storage medium - Google Patents

Face clustering archiving method and device and storage medium

Info

Publication number
CN111401300B
CN111401300B (application CN202010266218.7A)
Authority
CN
China
Prior art keywords
class
class center
similarity
feature
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010266218.7A
Other languages
Chinese (zh)
Other versions
CN111401300A (en
Inventor
邸德宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202010266218.7A priority Critical patent/CN111401300B/en
Publication of CN111401300A publication Critical patent/CN111401300A/en
Application granted granted Critical
Publication of CN111401300B publication Critical patent/CN111401300B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/23 - Clustering techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a face clustering archiving method and device and a storage medium. The method comprises the following steps: acquiring a new feature and an archive set, wherein the archive set includes archives that have already been formed, the total similarity threshold of each archive is obtained by adjusting a first preset threshold according to the total similarity of all features in the archive, and the total similarity of each feature is the total similarity between the feature and the class center feature of the archive at the time the feature was included in the archive; calculating the total similarity between the new feature and the class center feature of each archive in the archive set to obtain a target archive, wherein the class center feature of the target archive has the greatest total similarity with the new feature among the archives in the archive set; and if the total similarity between the class center feature of the target archive and the new feature is greater than the total similarity threshold of the target archive, including the new feature in the target archive, otherwise including the new feature in a new archive. By this method, the accuracy of the face clustering archiving process can be improved.

Description

Face clustering archiving method and device and storage medium
Technical Field
The present application relates to the field of image processing, and in particular, to a face cluster archiving method, apparatus, and storage medium.
Background
At present, face recognition technology can be used to judge whether two face images belong to the same person, and clustering technology can be used to determine which faces among many belong to the same person. Therefore, a large number of faces can be clustered by face clustering technology, and the faces belonging to the same person are grouped into the same archive, that is, "one person, one archive". However, the accuracy of face clustering archiving in the prior art is not high.
Disclosure of Invention
The present application mainly solves the technical problem that the accuracy of face clustering archiving in the prior art is not high.
In order to solve the above technical problem, one technical solution adopted by the present application is to provide a face clustering archiving method, the method comprising: acquiring a new feature and an archive set, wherein the archive set includes archives that have already been formed, the total similarity threshold of each archive is obtained by adjusting a first preset threshold according to the total similarity of all features in the archive, and the total similarity of each feature is the total similarity between the feature and the class center feature of the archive at the time the feature was included in the archive; calculating the total similarity between the new feature and the class center feature of each archive in the archive set to obtain a target archive, wherein the class center feature of the target archive has the greatest total similarity with the new feature among the archives in the archive set; and if the total similarity between the class center feature of the target archive and the new feature is greater than the total similarity threshold of the target archive, including the new feature in the target archive, otherwise including the new feature in a new archive.
In order to solve the above technical problem, another technical solution adopted by the present application is to provide a face clustering archiving apparatus, which comprises a processor and a memory coupled to the processor, wherein the memory stores program instructions and the processor executes the program instructions stored in the memory to implement the above method.
In order to solve the above technical problem, the present application adopts another technical solution that: there is provided a storage medium storing program instructions that when executed enable the above method to be implemented.
The beneficial effects of the present application are as follows. Through the implementation of the above scheme, the total similarity between the new feature and the class center feature of each archive in the archive set is calculated to select the archive most similar to the new feature, namely the target archive, and the new feature is included in the target archive only when the total similarity between the new feature and the target archive is greater than the total similarity threshold of the target archive. The total similarity threshold of each archive is obtained by adjusting a first preset threshold according to the total similarity of all features in the archive, where the total similarity of each feature is the total similarity between the feature and the class center feature of the archive at the time the feature was included. The total similarity threshold can thus be adaptively optimized for each archive, so the accuracy of clustering and archiving new features (faces) can be improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on them without creative effort. In the drawings:
FIG. 1 is a schematic flow chart of a first embodiment of a face cluster archiving method according to the present application;
FIG. 2 is a schematic view of a detailed flow chart of S120 in FIG. 1;
FIG. 3 is a schematic diagram of a detailed flow chart of S123 in FIG. 2;
FIG. 4 is a schematic flow chart of a second embodiment of the face cluster archiving method according to the present application;
FIG. 5 is a schematic flow chart of a third embodiment of the face cluster archiving method according to the present application;
FIG. 6 is a detailed flowchart of S310 in FIG. 5;
FIG. 7 is a detailed flowchart of S320 in FIG. 5;
FIG. 8 is a schematic structural diagram of an embodiment of a face cluster filing apparatus according to the present application;
FIG. 9 is a schematic structural diagram of an embodiment of a storage medium according to the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Fig. 1 is a schematic flow chart of a first embodiment of a face cluster archiving method according to the present application. It should be noted that, if the result is substantially the same, the flow sequence shown in fig. 1 is not limited in this embodiment. As shown in fig. 1, the present embodiment includes:
S110: Acquire a new feature and an archive set.
In the present application, a feature refers to a face feature extracted from a face image, and each feature corresponds to one face image. The new feature is the face feature to be clustered and archived. The archive set includes the archives that have already been formed, and each archive represents one class obtained by clustering. Ideally, all features in a single archive belong to the same person, that is, the faces in the face images corresponding to all features in an archive belong to the same person, and each person has only one archive.
The total similarity threshold of each archive is obtained by adjusting a first preset threshold according to the total similarity of all features in the archive, so as to better fit the features of each archive; the specific adjustment is described in the following embodiments. The total similarity of each feature in an archive is the total similarity between the feature and the class center feature of the archive at the time the feature was included in the archive.
S120: Calculate the total similarity between the new feature and the class center feature of each archive in the archive set to obtain a target archive.
The target archive is the archive in the archive set whose class center feature has the greatest total similarity with the new feature.
In the archive set, each archive has a corresponding class center, which may also be referred to as a class center feature. Among all archives in the archive set, the archive with the greatest total similarity to the new feature is referred to as the target archive.
Referring to fig. 2, S120 may specifically include the following sub-steps:
S121: Calculate a first similarity between the new feature and the class center feature of each archive in the archive set to obtain a first class center set.
The first class center set comprises a first number of class centers with the highest first similarity with the new feature.
The first similarity may be a cosine similarity, a Euclidean-distance-based similarity, or the like. There are various methods for obtaining the first number of class centers with the highest first similarity to the new feature, including but not limited to an exhaustive (brute-force) search and approximate nearest-neighbor searches such as a kd-tree. The first number is chosen such that the first class center set includes the class centers of the same person as the new feature and/or of persons whose faces are similar to that person.
For example, if the first number is p (p is an integer), the first class center set C1 includes the p class centers with the highest first similarity to the new feature, denoted as C1 = {top-1, top-2, …, top-p}. Of course, C1 is only an example, and the arrangement order of the p class centers it contains is not particularly limited.
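As an illustration of S121, the following Python sketch retrieves the p class centers most similar to a new feature by a brute-force cosine-similarity search (an approximate nearest-neighbor index such as a kd-tree could be substituted); the array shapes, names, and the value of p are illustrative assumptions.

```python
import numpy as np

def top_p_class_centers(new_feature, class_centers, p):
    """Return indices and cosine similarities of the p class centers
    most similar to new_feature (brute-force search).

    new_feature:   (d,) feature vector
    class_centers: (N, d) matrix, one class-center feature per archive
    """
    nf = new_feature / np.linalg.norm(new_feature)
    cc = class_centers / np.linalg.norm(class_centers, axis=1, keepdims=True)
    sims = cc @ nf                     # cosine similarity to every class center
    order = np.argsort(-sims)[:p]      # indices of the p highest similarities
    return order, sims[order]

# Hypothetical usage: 1000 archives with 128-dimensional class centers
rng = np.random.default_rng(0)
centers = rng.normal(size=(1000, 128))
new_feat = rng.normal(size=128)
idx_C1, sims_C1 = top_p_class_centers(new_feat, centers, p=20)   # the set C1
```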
S122: and screening class centers with the first similarity between the class centers and the new features larger than a second preset threshold value from the first class center set to form a second class center set.
The second preset threshold provides a stronger selection than the retrieval of the p class centers in C1; in other words, its value is chosen such that the second class center set selected from C1 contains as many class centers belonging to the same person as possible while not including too many wrong class centers.
For example, the second class center set C2 contains the n (n is an integer and n ≤ p) class centers with the highest first similarity to the new feature, denoted as C2 = {top-1, top-2, …, top-n}. Of course, C2 is only an example, and the arrangement order of the n class centers it contains is not limited.
S123: a second similarity between the new feature and each class center in the second set of class centers is calculated.
The second similarity is a neighbor similarity, and referring to fig. 3, the calculation method includes:
S1231: Calculate the first similarity between each class center in the second class center set and all class centers in the first class center set to obtain a third class center set.
For each class center in the second class center set, the corresponding third class center set comprises the second number of class centers in the first class center set with the highest first similarity to that class center, arranged in descending order of that first similarity; each class center in the second class center set has its own third class center set.
Taking the class center top-1 in the second class center set as an example: the first similarity between top-1 in C2 and each class center in C1 is calculated, and the second number m (m is an integer) of class centers most similar to top-1 in C2 are selected (a traversal search may be used here) to form the third class center set C3 corresponding to top-1, where the m class centers in C3 are arranged in descending order of their first similarity with top-1 in C2. C3 may thus be called the third class center set of top-1 in C2.
S1232: Calculate the second similarity between the new feature and each class center in the second class center set according to the internal judgment condition of whether each class center in a fourth class center set belongs to the third class center set.
The fourth class center set is composed of the second number of class centers in the first class center set with the largest first similarity to the new feature, arranged in descending order of their first similarity to the new feature.
The number of class centers in the fourth class center set is equal to the number of class centers in each third class center set. The fourth class center set is denoted as C4 = {New-1, New-2, …, New-m}, where New-1 to New-m are arranged in descending order of first similarity with the new feature.
The second similarity between the new feature and each class center in the second class center set is specifically calculated according to the following formula:

NeighSim = Σ_{i=1..m} nw_{i,j} · δ(New_i ∈ C3)

where NeighSim is the second similarity; C3 is the third class center set, i.e., the m class centers in the first class center set most similar to the considered class center in the second class center set; New_i is the ith (i is an integer and i ≤ m) class center in the first class center set most similar to the new feature; nw_{i,j} is a weight matrix of NeighSim, and j is the rank value of New_i in C3; New_i ∈ C3 is the internal judgment condition, and δ is an indicator function whose value is 1 if the internal condition is satisfied and 0 otherwise.
The values of nw_{i,j} may all be 1, which amounts to counting the number of class centers shared by the third class center set and the fourth class center set. Of course, nw_{i,j} may also be determined by the values of i and j; for example, the smaller i and j are, the larger nw_{i,j} is, so that closer neighbors receive larger weights.
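As a concrete illustration of this neighbor similarity, the following Python sketch (the names and the all-ones default weight are illustrative assumptions) counts, with optional weights, how many of the new feature's nearest class centers also appear among the nearest class centers of the candidate class center:

```python
def neighbor_similarity(C4, C3, nw=None):
    """Second similarity (NeighSim) between the new feature and one class
    center of C2.

    C4: list of m class-center indices most similar to the new feature,
        ordered by descending first similarity.
    C3: list of m class-center indices most similar to the considered C2
        class center, ordered by descending first similarity.
    nw: optional m x m weight matrix; nw[i][j] weights a match where the
        i-th element of C4 appears at rank j in C3.  Defaults to all ones,
        i.e. simply counting the shared class centers.
    """
    rank_in_C3 = {c: j for j, c in enumerate(C3)}
    score = 0.0
    for i, c in enumerate(C4):
        if c in rank_in_C3:                      # internal judgment condition
            j = rank_in_C3[c]
            score += 1.0 if nw is None else nw[i][j]
    return score

# Example: 3 of the 5 neighbors are shared
print(neighbor_similarity(C4=[7, 2, 9, 4, 1], C3=[2, 7, 5, 4, 8]))  # -> 3.0
```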
S124: and calculating the total similarity between the new feature and each class center in the second class center set according to the second similarity and the first similarity between the new feature and each class center in the second class center set.
The larger the first similarity and the second similarity are, the larger the total similarity calculated from them. Three examples of total-similarity calculation methods are given below:
Example 1: Sim = FeatSim · NeighSim
Example 2: Sim = sw_f · FeatSim + (1 - sw_f) · NeighSim
Example 3: (equation image not reproduced; a combination of FeatSim and NeighSim controlled by the decay rate factor k_1)
where Sim is the total similarity, FeatSim is the first similarity, NeighSim is the second similarity, sw_f is the weight corresponding to the first similarity with sw_f ∈ (0,1), and k_1 is a decay rate factor, for example 0.1.
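A minimal sketch of the two fully recoverable combination rules, reading Example 1 as the product of the two similarities and assuming NeighSim has already been scaled to a range comparable to FeatSim; Example 3 is omitted because its equation image is not reproduced.

```python
def total_similarity(feat_sim, neigh_sim, sw_f=None):
    """Total similarity from the first similarity (FeatSim) and the
    second similarity (NeighSim).

    sw_f is None  -> Example 1: Sim = FeatSim * NeighSim
    sw_f in (0,1) -> Example 2: Sim = sw_f*FeatSim + (1-sw_f)*NeighSim
    """
    if sw_f is None:
        return feat_sim * neigh_sim
    return sw_f * feat_sim + (1.0 - sw_f) * neigh_sim

print(total_similarity(0.82, 0.6))            # Example 1 -> 0.492
print(total_similarity(0.82, 0.6, sw_f=0.7))  # Example 2 -> 0.754
```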
S130: Determine whether the total similarity between the class center feature of the target archive and the new feature is greater than the total similarity threshold of the target archive.
If yes, go to S140, otherwise go to S150.
S140: the new features are included in the target profile.
If the total similarity between the class center feature of the target archive and the new feature is greater than the total similarity threshold of the target archive, the new feature meets the requirement for being included in the target archive and can be included in it.
S150: the new features are included in the new profile.
If the total similarity between the class center feature of the target archive and the new feature is less than or equal to the total similarity threshold of the target archive, the new feature does not meet the requirement for being included in the target archive, so a new archive needs to be established to hold it.
In addition, after the new feature is included in an archive (S140, S150), the method may further include:
S160: Update the class center feature of the archive.
After the new feature is included in the archive, the class center feature of the archive may be updated based on the new feature. Specifically, the class center feature of the archive after the new feature is included can be calculated from the class center feature of the archive before the new feature was included and the new feature.
The specific calculation formulas are as follows:

F'_i = (Sumfw_CNum · F_i + fw_(CNum+1) · f_(i,CNum+1)) / Sumfw_(CNum+1)

Sumfw_(CNum+1) = Sumfw_CNum + fw_(CNum+1)

where F_i is the ith dimension value of the class center feature of the archive before the update, F'_i is the ith dimension value of the class center feature of the archive after the update, Sumfw_CNum is the accumulated weight of the CNum features that were in the archive before the new feature was included, fw_(CNum+1) is the weight assigned to the (CNum+1)th feature (the new feature) in the archive, Sumfw_(CNum+1) is the accumulated weight of the CNum+1 features in the archive after the new feature is included, and f_(i,CNum+1) is the ith dimension value of the (CNum+1)th feature (the new feature).
Examples of how the weight fw_(CNum+1) can be chosen are given below:
Example 1: All features in the archive are given a weight of 1 regardless of differences between the faces, i.e., fw_(CNum+1) = 1 and Sumfw_CNum = CNum.
Example 2: The face image quality is taken into account (blur, angle, expression, and other information that affects face recognition accuracy). A weight is assigned to each feature according to its face image quality score: face image features of better quality are closer to the class center, so features with higher quality scores receive larger weights and features with lower quality scores receive smaller weights.
Example 3: The capture time of the face image is taken into account: the more recent the capture time, the closer the face image feature should be to the face of the new feature, and the larger its weight. Weights can therefore be assigned to the features according to the order in which they were added to the archive. When a new feature is added to the archive, the accumulated feature weight of the archive can be adjusted by an exponentially weighted moving average to balance the time effect. For example, Sumfw_CNum' = Sumfw_CNum · wm, where wm is a predetermined momentum factor, Sumfw_CNum is the accumulated feature weight before adjustment, and Sumfw_CNum' is the accumulated feature weight after adjustment, with wm ∈ (0,1); the larger wm is, the weaker the time effect.
Since the initial class center feature values of a new archive are all 0, when the new feature is included in a new archive, the class center feature of the new archive can be updated by the above method, or the new feature can be used directly as the class center feature of the new archive.
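A minimal Python sketch of this weighted class-center update, assuming the caller supplies the new feature's weight (constant 1, a quality score, or a recency weight) and an optional momentum factor wm for the exponentially weighted moving average; all names are illustrative.

```python
import numpy as np

def update_class_center(center, sum_weight, new_feature, new_weight=1.0,
                        momentum=None):
    """Incrementally update an archive's class-center feature.

    center:      current class-center feature F (length-d vector)
    sum_weight:  Sumfw_CNum, the accumulated weight of the features
                 already in the archive (0.0 for a brand-new archive)
    new_feature: f_{CNum+1}, the feature being added
    new_weight:  fw_{CNum+1}, e.g. 1, a quality score, or a recency weight
    momentum:    optional wm in (0,1); if given, old weights are decayed
                 by an exponentially weighted moving average first
    """
    if momentum is not None:
        sum_weight *= momentum                      # weaken older features
    new_sum = sum_weight + new_weight               # Sumfw_{CNum+1}
    new_center = (sum_weight * np.asarray(center, dtype=float)
                  + new_weight * np.asarray(new_feature, dtype=float)) / new_sum
    return new_center, new_sum

# A new archive starts from an all-zero center and zero accumulated weight
center, sw = update_class_center(np.zeros(4), 0.0, np.array([1., 2., 3., 4.]))
print(center, sw)   # -> [1. 2. 3. 4.] 1.0
```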
Through the implementation of this embodiment, the archive most similar to the new feature, namely the target archive, is selected by calculating the total similarity between the new feature and the class center feature of each archive in the archive set, and the new feature is included in the target archive only when the total similarity between the new feature and the target archive is greater than the adjusted total similarity threshold of the target archive. The total similarity threshold of each archive is obtained by adjusting the first preset threshold according to the total similarity of all features in the archive, where the total similarity of each feature is the total similarity between the feature and the class center feature of the archive at the time the feature was included. The total similarity threshold can thus be adaptively optimized for each archive, so the accuracy of clustering and archiving new features (faces) can be improved.
Fig. 4 is a flowchart illustrating a second embodiment of the face cluster archiving method according to the present application. It should be noted that, if the result is substantially the same, the flow sequence shown in fig. 4 is not limited in this embodiment. As shown in fig. 4, after the new features are included in the target file based on the foregoing embodiment, the embodiment may further include:
S210: Calculate the variance/standard deviation of the total similarity of all features in the target archive.
When a new feature is included in the target archive, the variance/standard deviation and mean of the total similarity of the features in the target archive may be updated. The specific calculation is as follows:
First, the cumulative sum of squares and the cumulative sum of the total similarity of all features in the target archive after the new feature is included are calculated:

SumSquare_(CNum+1) = SumSquare_CNum + Sim_(CNum+1)^2

SumSim_(CNum+1) = SumSim_CNum + Sim_(CNum+1)

where SumSquare_(CNum+1) is the cumulative sum of squares of the total similarity of the CNum+1 features in the target archive after the new feature is included, SumSquare_CNum is the cumulative sum of squares of the total similarity of the CNum features that were in the target archive before the new feature was included, and Sim_(CNum+1) is the total similarity of the new feature; SumSim_(CNum+1) is the cumulative sum of the total similarity of the CNum+1 features in the target archive after the new feature is included, and SumSim_CNum is the cumulative sum of the total similarity of the CNum existing features that were in the target archive before the new feature was included.
Then the variance/standard deviation of the total similarity of the features currently in the target archive is calculated from the cumulative sum of squares and the cumulative sum of the total similarity of all features in the target archive:

Mean_(CNum+1) = SumSim_(CNum+1) / (CNum+1)

Var_(CNum+1) = SumSquare_(CNum+1) / (CNum+1) - Mean_(CNum+1)^2

where Mean_(CNum+1) is the mean of the total similarity of the CNum+1 features currently in the target archive, and Var_(CNum+1) is the variance of the total similarity of the CNum+1 features currently in the target archive (the standard deviation is its square root).
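The running statistics above only require each archive to keep a feature count, a cumulative sum, and a cumulative sum of squares. A small illustrative sketch (field names are assumptions):

```python
import math

class SimilarityStats:
    """Running mean / variance of the total similarities of an archive."""

    def __init__(self):
        self.count = 0          # CNum
        self.sum_sim = 0.0      # SumSim
        self.sum_square = 0.0   # SumSquare

    def add(self, sim):
        """Record the total similarity of a newly included feature."""
        self.count += 1
        self.sum_sim += sim
        self.sum_square += sim * sim

    def mean(self):
        return self.sum_sim / self.count

    def variance(self):
        return self.sum_square / self.count - self.mean() ** 2

    def std(self):
        return math.sqrt(max(self.variance(), 0.0))

stats = SimilarityStats()
for s in (0.91, 0.88, 0.93):
    stats.add(s)
print(round(stats.mean(), 4), round(stats.std(), 4))
```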
S220: and adjusting the first preset threshold value according to the variance/standard deviation to obtain the total similarity threshold value of the target file.
The specific adjustment method in the step is as follows:
when the number of features in the target file is smaller than or equal to a first number threshold, the total similarity threshold of the target file is equal to a first preset threshold. When the number of the features in the target file is small, the mean and variance of the total similarity of all the features in the target file have insufficient meaning, so the first preset threshold is not adjusted temporarily, but is directly used as the total similarity threshold of the file. Specifically, it can be formulated as:
T1A=T1,
wherein T1A is the total similarity threshold of the target file, and T1 is the first predetermined threshold.
When the number of features in the target archive is greater than the first number threshold and less than a second number threshold, the total similarity threshold of the target archive is equal to the first preset threshold plus the product of the first adjustment amplitude factor and the difference between the standard deviation reference value and the standard deviation. When the number of features in the target archive is slightly larger, the mean and variance of the total similarity of all its features have a certain representative meaning, and the threshold can be adjusted slightly according to them. This can be formulated as:

T1A = T1 + k_1 · (Std_all - √Var_CNum),

where k_1 is the first adjustment amplitude factor, Std_all is the standard deviation reference value of the total similarity of all features, Var_CNum is the variance of the total similarity of all features in the target archive, and √Var_CNum is the corresponding standard deviation. Std_all characterizes the overall distribution of the features of the archive set; it can be set to the mean of the standard deviations of the class center feature distributions of all archives in the archive set (updated along with the class center feature updates), or it can be specified externally.
When the similarity distribution of the features in the target archive is spread out (the variance is large), the above formula adjusts the first preset threshold downward to obtain the total similarity threshold of the target archive; when the similarity distribution of the features in the target archive is concentrated (the variance is small), the above formula adjusts the first preset threshold upward.
When the number of features in the target archive is greater than or equal to the second number threshold, the total similarity threshold of the target archive is equal to the first preset threshold minus the product of the standard deviation and the second adjustment amplitude factor. When the number of features in the target archive is large, the total similarity distribution of all its features is relatively stable, and the threshold can be adjusted by fully relying on this distribution. This can be formulated as:

T1A = T1 - k_2 · √Var_CNum,

where k_2 is the second adjustment amplitude factor and Mean_CNum denotes the mean of the total similarity of all features in the target archive. The acceptable risk of wrong class members entering the archive can be controlled by adjusting the second adjustment amplitude factor, and the influence of low-similarity wrong class members already in the archive on the threshold adjustment is weakened by means of a statistical outlier detection approach.
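A sketch of the three-branch threshold adjustment described above; because the equation images are not reproduced in the source, the arithmetic simply mirrors the verbal description, and the number thresholds and amplitude factors are illustrative values.

```python
def adjust_total_similarity_threshold(t1, count, std, std_all,
                                      n1=10, n2=100, k1=0.5, k2=2.0):
    """Per-archive total similarity threshold T1A.

    t1:      first preset threshold
    count:   number of features in the target archive
    std:     standard deviation of the total similarities in the archive
    std_all: standard deviation reference value of the archive set
    n1, n2:  first / second number thresholds (illustrative)
    k1, k2:  first / second adjustment amplitude factors (illustrative)
    """
    if count <= n1:                      # too few features: keep T1 as-is
        return t1
    if count < n2:                       # nudge T1 using the reference value
        return t1 + k1 * (std_all - std)
    return t1 - k2 * std                 # rely on the archive's own statistics

print(adjust_total_similarity_threshold(0.8, count=5,   std=0.05, std_all=0.04))
print(adjust_total_similarity_threshold(0.8, count=50,  std=0.05, std_all=0.04))
print(adjust_total_similarity_threshold(0.8, count=500, std=0.05, std_all=0.04))
```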
After each new feature is archived, the above process of adjusting the total similarity threshold can be performed on the target archive into which the feature is included, so as to adjust the total similarity threshold of all the archives in the archive set.
Fig. 5 is a schematic flow chart of a third embodiment of the face cluster archiving method according to the present application. It should be noted that, if the result is substantially the same, the flow sequence shown in fig. 5 is not limited in this embodiment. As shown in fig. 5, the present embodiment includes:
S310: Merge two archives in the archive set whose total similarity is greater than a third preset threshold.
Merging archives in the archive set whose total similarity is greater than the third preset threshold improves the recall rate of features and reduces the probability that one person ends up with multiple archives. To reduce computation and time overhead, this archive-merging step may be executed once every preset time period.
Referring to fig. 6, this step may specifically include the following sub-steps:
S311: Calculate the total similarity between the class center features of the existing archives in the archive set.
The class center feature of the archive with fewer features is treated like the new feature described above, and its total similarity with the archive with more features is calculated as the total similarity between the two archives.
S312: and judging whether two files with the total similarity between the class center features larger than a third preset threshold exist.
If yes, go to step S313.
S313: and merging the two files with the total similarity larger than a third preset threshold value.
When two archives are merged, the archive with fewer features is added to the archive with more features. The number of features in the merged archive is the sum of the numbers of class members of the two archives before merging, i.e.

CNum_whole = CNum_small + CNum_big,

where CNum_whole is the number of features of the merged archive, CNum_small is the number of features of the archive with fewer features, and CNum_big is the number of features of the archive with more features.
The cumulative sum of squares of the total similarity of the features in the merged archive is calculated as follows:

SumSquare_whole = SumSquare_big + CNum_small · Sim_big_small^2,

where SumSquare_whole is the cumulative sum of squares of the total similarity of the features of the merged archive, SumSquare_big is the cumulative sum of squares of the total similarity of the features of the archive with more features, and Sim_big_small is the total similarity between the class center features of the two archives before merging.
The cumulative sum of the total similarity of the features in the merged archive is calculated as follows:

SumSim_whole = SumSim_big + CNum_small · Sim_big_small,

where SumSim_whole is the cumulative sum of the total similarity of the features of the merged archive, and SumSim_big is the cumulative sum of the total similarity of the features of the archive with more features.
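A sketch of the bookkeeping performed when two archives are merged, assuming each archive record carries the count and the two cumulative sums maintained in S210; the record layout is an assumption.

```python
def merge_archives(big, small, sim_big_small):
    """Merge the archive with fewer features ('small') into the one with
    more features ('big').

    Each archive is a dict with keys:
      'count'      - number of features (CNum)
      'sum_sim'    - cumulative sum of total similarities (SumSim)
      'sum_square' - cumulative sum of squared total similarities (SumSquare)
    sim_big_small is the total similarity between the two class-center
    features before merging.
    """
    merged = dict(big)
    merged['count'] = big['count'] + small['count']                       # CNum_whole
    merged['sum_sim'] = big['sum_sim'] + small['count'] * sim_big_small   # SumSim_whole
    merged['sum_square'] = (big['sum_square']
                            + small['count'] * sim_big_small ** 2)        # SumSquare_whole
    return merged

big = {'count': 40, 'sum_sim': 36.0, 'sum_square': 32.5}
small = {'count': 5, 'sum_sim': 4.6, 'sum_square': 4.24}
print(merge_archives(big, small, sim_big_small=0.85))
```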
S320: and carrying out noise check on the target file.
To reduce computation and time overhead, this step may be executed once every preset time period.
Referring to fig. 7, S320 may specifically include the following sub-steps:
S321: Calculate the total similarity between every two features in the target archive.
As in S123, the third class center set and the fourth class center set used when calculating the total similarity between two features of the target archive in this step are, for each of the two features, the third number of features that are most similar to that feature among all features of the entire target archive.
The third number may be, for example, 2/3 of the number of features in the target archive.
S322: and performing offline clustering on all the features in the target file based on the total similarity between every two features to obtain at least one subclass.
The off-line clustering method can be density clustering, hierarchical clustering, spectral clustering and the like.
S323: and replacing the target file with the target subclass, calculating the class center characteristics of the target subclass and other subclasses, replacing the class center characteristics of the target file with the class center characteristics of the target subclass, and clustering and archiving the class center characteristics of other subclasses as new characteristics.
If more than one subclass is obtained, the target archive is replaced by the target subclass, and the class center feature of the target subclass is calculated to replace the class center feature of the target archive, where the target subclass is the subclass with the most features. The subclasses other than the target subclass are then clustered and archived again.
Specifically, each feature in the target subclass may be added as a new feature, one by one, into the same new archive according to the method of the foregoing embodiments, so as to obtain the class center feature of the target subclass.
Similarly, the class center feature of each of the other subclasses can be calculated according to the method of the above embodiments, and it is then determined whether that class center feature satisfies the condition for being added to another archive in the archive set.
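For illustration, the following sketch performs a simplified noise check: pairwise cosine similarity stands in for the pairwise total similarity, and a threshold-based connected-components grouping stands in for the density/hierarchical/spectral clustering mentioned above; the threshold and data are illustrative.

```python
import numpy as np

def noise_check(features, sim_threshold=0.6):
    """Split an archive's features into subclasses and return
    (target_subclass, other_subclasses) as lists of feature indices.

    features: (n, d) array of the archive's face features.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    adj = (f @ f.T) > sim_threshold          # thresholded similarity graph

    labels = [-1] * len(features)
    current = 0
    for start in range(len(features)):
        if labels[start] != -1:
            continue
        stack = [start]                      # flood-fill one connected component
        while stack:
            node = stack.pop()
            if labels[node] != -1:
                continue
            labels[node] = current
            stack.extend(j for j in range(len(features)) if adj[node, j])
        current += 1

    subclasses = [[i for i, l in enumerate(labels) if l == c] for c in range(current)]
    subclasses.sort(key=len, reverse=True)
    return subclasses[0], subclasses[1:]     # largest subclass keeps the archive

rng = np.random.default_rng(1)
feats = np.vstack([rng.normal(0, 0.05, (8, 16)) + 1.0,    # a coherent cluster
                   rng.normal(0, 0.05, (2, 16)) - 1.0])   # two noisy members
target, others = noise_check(feats)
print(len(target), [len(o) for o in others])               # -> 8 [2]
```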
Referring to fig. 8, fig. 8 is a schematic structural diagram of an embodiment of a face cluster filing apparatus according to the present application. As shown in fig. 8, the face cluster archive 800 includes a processor 810, a memory 820 coupled to the processor.
Wherein the memory 820 stores program instructions for implementing the method of any of the embodiments described above; processor 810 is configured to execute program instructions stored by memory 820 to implement the steps of the above-described method embodiments. The processor 810 may also be referred to as a Central Processing Unit (CPU). Processor 810 may be an integrated circuit chip having signal processing capabilities. The processor 810 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
Referring to fig. 9, fig. 9 is a schematic structural diagram of a storage medium according to an embodiment of the present application. The storage medium 900 of the embodiment of the present application stores program instructions, and the program instructions, when executed, implement the face cluster archiving method of the present application. The instructions may form a program file stored in the storage medium in the form of a software product, so as to enable a computer device (which may be a personal computer, a server, or a network device) or a processor (processor) to execute all or part of the steps of the methods according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a mobile hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, or terminal devices, such as a computer, a server, a mobile phone, and a tablet.
The above description is only for the purpose of illustrating embodiments of the present application and is not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application or are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.

Claims (12)

1. A face cluster archiving method is characterized by comprising the following steps:
acquiring new features and a file set, wherein the file set comprises classified files, the total similarity threshold of each file is obtained by adjusting a first preset threshold according to the total similarity of all the features in the files, and the total similarity of each feature is the total similarity between the feature when being classified into the files and the class center feature of the files;
calculating the total similarity between the new feature and the class center feature of each of the files in the file set to obtain a target file, wherein the total similarity between the class center feature of the target file in the file set and the new feature is the maximum;
if the total similarity between the class-center feature of the target profile and the new feature is greater than a total similarity threshold of the target profile, then the new feature is classified into the target profile; otherwise, the new features are classified into a new file;
after said attributing the new feature to the target profile, the method further comprises:
calculating the standard deviation of the total similarity of all the characteristics in the target file;
determining an adjustment mode of the first preset threshold value based on the number range of the feature number in the target file;
adjusting the first preset threshold value based on the determined adjustment mode to obtain a total similarity threshold value of the target file;
if the feature quantity belongs to a first quantity range, determining that an adjustment mode of the first preset threshold is a first adjustment mode, wherein in the first adjustment mode, a total similarity threshold of the target file is equal to the first preset threshold, and the first quantity range is a range smaller than or equal to the first quantity threshold; if the feature quantity belongs to a second quantity range, determining that the adjustment mode of the first preset threshold is a second adjustment mode, wherein in the second adjustment mode, the total similarity threshold of the target file is equal to the first preset threshold plus the product of the difference value between the standard deviation and the standard deviation reference value and a first adjustment amplitude factor, and the second quantity range is a range which is larger than the first quantity threshold and smaller than a second quantity threshold; and if the feature quantity belongs to a third quantity range, determining that the adjustment mode of the first preset threshold is a third adjustment mode, wherein in the third adjustment mode, the total similarity threshold of the target file is equal to the product of the first preset threshold minus the standard deviation and a second adjustment amplitude factor, and the third quantity range is a range larger than or equal to the second quantity threshold.
2. The method of claim 1, wherein calculating the overall similarity between the new feature and the class-centric feature of each profile in the set of profiles comprises:
calculating a first similarity between the new feature and the class center feature of each file in the file set to obtain a first class center set, wherein the first class center set comprises a first number of class centers with the highest first similarity with the new feature;
screening class centers with the first similarity between the class centers and the new features larger than a second preset threshold value from the first class center set to form a second class center set;
calculating a second similarity between the new feature and each class center in the second class center set, wherein the second similarity is a neighbor similarity;
and calculating the total similarity between the new feature and each class center in the second class center set according to the second similarity and the first similarity between the new feature and each class center in the second class center set.
3. The method of claim 2, wherein said calculating a second similarity between the new feature and each class center in the second set of class centers comprises:
respectively calculating first similarity between each class center in the second class center set and all class centers in the first class center set to obtain a third class center set, wherein the third class center set comprises a second number of class centers with the highest first similarity between each class center in the first class center set and each class center in the second class center set, the class centers in the third class center set are sequentially arranged from large to small according to the first similarity between the class centers in the second class center set, and each class center in the second class center set is provided with a corresponding third class center set;
and calculating a second similarity between the new feature and each class center in the second class center set according to an internal judgment condition that whether each class center in a fourth class center set belongs to the third class center set, wherein the fourth class center set is composed of a second number of class centers with the largest first similarity between the first class center set and the new feature, and the second number of class centers in the fourth class center set are sequentially arranged from large to small according to the first similarity between the second class center set and the new feature.
4. The method according to claim 3, wherein the second similarity between the new feature and each class center in the second class center set is calculated as follows:
NeighSim = Σ_{i=1..m} nw_{i,j} · δ(New_i ∈ C3)

wherein NeighSim is the second similarity; C3 is the third class center set of the considered class center in the second class center set, comprising the m class centers in the first class center set that are most similar to that class center; New_i is the ith class center in the first class center set most similar to the new feature, i being an integer and i ≤ m; nw_{i,j} is the weight matrix of NeighSim, and j represents the rank value of New_i in the C3 set; New_i ∈ C3 is the internal condition, and δ is an indicator function whose value is 1 if the internal condition is satisfied and 0 otherwise.
5. The method of claim 1, further comprising:
and after the new features are classified into the files, updating the class center features of the files.
6. The method of claim 5, wherein updating the class-centric feature of the archive after the new feature is included in the archive comprises:
and calculating the class center characteristic of the file after the new characteristic is included according to the class center characteristic of the file before the new characteristic is included and the new characteristic.
7. The method of claim 1, further comprising:
and combining the two files with the total similarity larger than a third preset threshold in the file set.
8. The method of claim 7, wherein said merging two said profiles of said set of profiles having a total similarity greater than a third predetermined threshold comprises:
respectively calculating the total similarity between class center characteristics of the existing files in the file set;
judging whether two files with the total similarity between the class center features larger than a third preset threshold exist or not;
and if so, merging the two files of which the total similarity is greater than the third preset threshold.
9. The method of claim 1, wherein the attributing the new feature to the target profile further comprises:
and carrying out noise check on the target file.
10. The method of claim 9, wherein said noise checking said target profile comprises:
calculating the total similarity between every two features in the target file;
performing offline clustering on all the features in the target archive based on the total similarity between every two features to obtain at least one subclass;
replacing the target archive by a target sub-class, calculating class center characteristics of the target sub-class and other sub-classes, replacing the class center characteristics of the target archive by the class center characteristics of the target sub-class, and clustering and archiving the class center characteristics of the other sub-classes as the new characteristics according to the method of any one of claims 1 to 7, wherein the target sub-class is the sub-class with the most characteristics.
11. A face cluster archiving apparatus, the face cluster archiving apparatus comprising a processor, a memory coupled to the processor, wherein,
the memory stores program instructions for implementing the method of any one of claims 1-10;
the processor is configured to execute the program instructions stored by the memory to implement the method of any of claims 1-10.
12. A storage medium storing program instructions which, when executed, implement the steps of the method of any one of claims 1 to 10.
CN202010266218.7A 2020-04-07 2020-04-07 Face clustering archiving method and device and storage medium Active CN111401300B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010266218.7A CN111401300B (en) 2020-04-07 2020-04-07 Face clustering archiving method and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010266218.7A CN111401300B (en) 2020-04-07 2020-04-07 Face clustering archiving method and device and storage medium

Publications (2)

Publication Number Publication Date
CN111401300A CN111401300A (en) 2020-07-10
CN111401300B true CN111401300B (en) 2022-08-09

Family

ID=71429464

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010266218.7A Active CN111401300B (en) 2020-04-07 2020-04-07 Face clustering archiving method and device and storage medium

Country Status (1)

Country Link
CN (1) CN111401300B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112633051A (en) * 2020-09-11 2021-04-09 博云视觉(北京)科技有限公司 Online face clustering method based on image search
CN112668635B (en) * 2020-12-25 2022-05-27 浙江大华技术股份有限公司 Image archiving method, device, equipment and computer storage medium
CN112686141A (en) * 2020-12-29 2021-04-20 杭州海康威视数字技术股份有限公司 Personnel filing method and device and electronic equipment
CN113673550A (en) * 2021-06-30 2021-11-19 浙江大华技术股份有限公司 Clustering method, clustering device, electronic equipment and computer-readable storage medium
CN113255621B (en) * 2021-07-13 2021-11-16 浙江大华技术股份有限公司 Face image filtering method, electronic device and computer-readable storage medium
CN114580392B (en) * 2022-04-29 2022-07-29 中科雨辰科技有限公司 Data processing system for identifying entity

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109740003A (en) * 2018-12-28 2019-05-10 上海依图网络科技有限公司 A kind of archiving method and device
CN109815370A (en) * 2018-12-28 2019-05-28 上海依图网络科技有限公司 A kind of archiving method and device
CN110414429A (en) * 2019-07-29 2019-11-05 佳都新太科技股份有限公司 Face cluster method, apparatus, equipment and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109740003A (en) * 2018-12-28 2019-05-10 上海依图网络科技有限公司 A kind of archiving method and device
CN109815370A (en) * 2018-12-28 2019-05-28 上海依图网络科技有限公司 A kind of archiving method and device
CN110414429A (en) * 2019-07-29 2019-11-05 佳都新太科技股份有限公司 Face cluster method, apparatus, equipment and storage medium

Also Published As

Publication number Publication date
CN111401300A (en) 2020-07-10

Similar Documents

Publication Publication Date Title
CN111401300B (en) Face clustering archiving method and device and storage medium
US6704725B1 (en) Method of searching multimedia data
CN110825894A (en) Data index establishing method, data index retrieving method, data index establishing device, data index retrieving device, data index establishing equipment and storage medium
US20230237771A1 (en) Self-supervised learning method and apparatus for image features, device, and storage medium
CN112163637B (en) Image classification model training method and device based on unbalanced data
WO2021179631A1 (en) Convolutional neural network model compression method, apparatus and device, and storage medium
CN113743470B (en) AI algorithm-based garbage recognition precision improvement method for automatic bag breaking classification box
CN114282054A (en) Video recommendation method and device, computer equipment and storage medium
CN108491887A (en) A kind of commodity tax incorporates the acquisition methods of code into own forces
US20160292537A1 (en) Feature Interpolation
CN114283332A (en) Fuzzy clustering remote sensing image segmentation method, system, terminal and storage medium
CN110188625B (en) Video fine structuring method based on multi-feature fusion
CN107527058A (en) A kind of image search method based on weighting local feature Aggregation Descriptor
AU2020403709B2 (en) Target object identification method and apparatus
CN108694411A (en) A method of identification similar image
CN116935057A (en) Target evaluation method, electronic device, and computer-readable storage medium
CN115391539A (en) Corpus data processing method and device and electronic equipment
WO2022127333A1 (en) Training method and apparatus for image segmentation model, image segmentation method and apparatus, and device
CN114757353A (en) Compression method and compression device of machine learning model and readable storage medium
Wei et al. Salient object detection based on weighted hypergraph and random walk
CN108804499B (en) Trademark image retrieval method
CN113553326A (en) Spreadsheet data processing method, device, computer equipment and storage medium
CN113673550A (en) Clustering method, clustering device, electronic equipment and computer-readable storage medium
WO2019232645A1 (en) Unsupervised classification of documents using a labeled data set of other documents
CN117476165B (en) Intelligent management method and system for Chinese patent medicine medicinal materials

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant