CN116563588A - Image clustering method and device, electronic equipment and storage medium - Google Patents


Info

Publication number: CN116563588A
Application number: CN202310507757.9A
Authority: CN (China)
Prior art keywords: face, human, person, image, images
Legal status: Pending (the legal status listed is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Inventor: 程林
Original and current assignee: Vivo Mobile Communication Co Ltd

Classifications

    • G06V10/762: Image or video recognition or understanding using pattern recognition or machine learning, using clustering, e.g. of similar faces in social networks
    • G06N3/0464: Neural network architectures; convolutional networks [CNN, ConvNet]
    • G06V10/422: Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation, for representing the structure of the pattern or shape of an object
    • G06V10/54: Extraction of image or video features relating to texture
    • G06V10/56: Extraction of image or video features relating to colour
    • G06V10/761: Proximity, similarity or dissimilarity measures in feature spaces
    • G06V10/765: Classification, e.g. of video objects, using rules for classification or partitioning the feature space
    • G06V10/82: Image or video recognition or understanding using neural networks
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V40/168: Human faces; feature extraction; face representation
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application discloses an image clustering method and apparatus, an electronic device, and a storage medium, and belongs to the field of image processing. The method comprises the following steps: performing face feature analysis and human body feature analysis on at least two person images respectively, and determining face feature information of each person image, face quality information corresponding to the face feature information, and human body feature information; determining a person image similarity between every two of the person images based on the face feature information, the face quality information and the human body feature information, or based on the face feature information and the face quality information; and performing image clustering on the person images based on the person image similarity to obtain at least one person image set.

Description

Image clustering method and device, electronic equipment and storage medium
Technical Field
The application belongs to the field of image processing, and particularly relates to an image clustering method, an image clustering device, electronic equipment and a storage medium.
Background
Many of the images captured by users with mobile phones are person images. A user may wish to view all images containing the same person object in order to find a particular person image conveniently.
In the prior art, whether the same person object is contained in different person images is determined according to the face characteristics of the person objects in the person images, and the person images containing the same person object are aggregated together. Fig. 1 is a schematic flow chart of an image clustering method provided in the related art, as shown in fig. 1, the steps of clustering the person images in the related art include: face detection is carried out on the person image, such as a black square frame in fig. 1, and the detected face is aligned; extracting face features from the detected face; and clustering according to the face characteristics, determining whether faces in different person images belong to the same person, and aggregating person images of which the faces belong to the same person. Thus, extraction of face features has an important impact on clustering.
Because the content and the accuracy of face feature extraction from person images are limited, person image clustering based only on face features is inaccurate.
Disclosure of Invention
An embodiment of the application aims to provide an image clustering method, an image clustering apparatus, an electronic device and a storage medium, so as to solve the problem that person image clustering based only on face features is inaccurate.
In a first aspect, an embodiment of the present application provides an image clustering method, including:
Respectively carrying out face feature analysis and human body feature analysis on at least two person images, and determining face feature information of each person image, face quality information corresponding to the face feature information and human body feature information;
based on the face feature information, the face quality information and the human feature information, or the face feature information and the face quality information, determining a human image similarity between every two human images in the human images;
and carrying out image clustering on the person images based on the person image similarity to obtain at least one person image set.
In a second aspect, an embodiment of the present application provides an image clustering apparatus, including:
the analysis module is used for respectively carrying out face feature analysis and human body feature analysis on at least two person images and determining face feature information of each person image, face quality information corresponding to the face feature information and human body feature information;
the determining module is used for determining the human image similarity between every two human images in the human images based on the human face characteristic information, the human face quality information and the human body characteristic information or the human face characteristic information and the human face quality information;
And the clustering module is used for carrying out image clustering on the person images based on the person image similarity to obtain at least one person image set.
In a third aspect, embodiments of the present application provide an electronic device comprising a processor and a memory storing a program or instructions executable on the processor, which when executed by the processor, implement the steps of the method as described in the first aspect.
In a fourth aspect, embodiments of the present application provide a readable storage medium having stored thereon a program or instructions which when executed by a processor implement the steps of the method according to the first aspect.
In a fifth aspect, embodiments of the present application provide a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and where the processor is configured to execute a program or instructions to implement a method according to the first aspect.
In a sixth aspect, embodiments of the present application provide a computer program product stored in a storage medium, the program product being executable by at least one processor to implement the method according to the first aspect.
In the embodiment of the application, clustering is performed by analyzing the person image similarity between every two person images based on the face feature information, the face quality information and the human body feature information, or based on the face feature information and the face quality information. Because the content and the accuracy of face feature information extraction are limited, the person image similarity obtained from face features alone has low accuracy; combining face quality information that reflects face quality and human body feature information that reflects human body features yields a more accurate person image similarity, and this more accurate person image similarity effectively improves clustering accuracy.
Drawings
FIG. 1 is a schematic flow chart of an image clustering method provided by the related art;
FIG. 2 is a schematic flow chart of an image clustering method according to an embodiment of the present application;
FIG. 3 is a second schematic flowchart of an image clustering method according to an embodiment of the present application;
fig. 4 is a schematic diagram of face quality definition in an image clustering method according to an embodiment of the present application;
fig. 5 is a schematic diagram of an FSRNet network structure in the image clustering method provided in the embodiment of the present application;
fig. 6 is a schematic structural diagram of an image clustering device provided in an embodiment of the present application;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
Fig. 8 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
Technical solutions in the embodiments of the present application will be clearly described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application are within the scope of the protection of the present application.
The terms first, second and the like in the description and in the claims, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances such that the embodiments of the application are capable of operation in sequences other than those illustrated or otherwise described herein, and that the objects identified by "first," "second," etc. are generally of a type and do not limit the number of objects, for example, the first object may be one or more. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/", generally means that the associated object is an "or" relationship.
The image clustering method provided by the embodiment of the application is described in detail below through specific embodiments and application scenes thereof with reference to the accompanying drawings.
Fig. 2 is a schematic flow chart of an image clustering method provided in an embodiment of the present application, and as shown in fig. 2, the embodiment of the present application provides an image clustering method, where the method includes:
step 201, performing face feature analysis and human body feature analysis on at least two person images respectively, and determining face feature information of each person image, face quality information corresponding to the face feature information and human body feature information;
the image clustering method provided by the embodiment of the application can be executed by a server or a terminal. Terminals include, but are not limited to, smartphones, tablet computers, and the like.
A person image is an image captured by a camera, a video camera, a smartphone, a tablet computer, a computer, or the like, or image frame data extracted from a video captured by such a device.
The at least two person images can be stored in the local terminal or in a cloud server. When the person images are stored in the local terminal, the image clustering method is executed locally; when the person images are stored in the cloud server, the image clustering method is executed in the cloud.
Fig. 3 is a second schematic flow chart of an image clustering method provided in an embodiment of the present application, as shown in fig. 3, a face feature analysis includes face detection, face feature extraction, and face quality assessment, and a human feature analysis includes human body detection and human feature extraction.
The face characteristic analysis process comprises the following steps: detecting a face region in each person image by using a face detection model; aligning face key points in a face region of the character image by using a face alignment model; and carrying out feature extraction on each face region aligned by using a face feature extraction model to obtain face feature information, and carrying out face quality evaluation on each face region aligned by using a face quality evaluation model to obtain face quality information.
The face detection model can be a support vector machine model, a Bayesian classifier, or the like; the face alignment model can be an active shape model, an active appearance model, a constrained local model, or the like; and the face feature extraction model and the face quality evaluation model can each be an artificial neural network, a convolutional neural network model, a support vector machine model, or another non-neural-network model.
And carrying out face feature analysis on each person image through the face feature analysis process to obtain the face features of each person image. The face features are features of the face of the person, and include face feature information and face quality information.
The face feature information is information reflecting color features, texture features, shape features and spatial relationship features of each part of the face of the person object.
The face feature information may be an image composed of face contour key points detected from preset face key points, key points of eye parts, key points of nose parts, and key points of mouth parts.
The face quality information is information reflecting the face quality, and comprises face angles, face definition and face shielding degree.
The face angle influences the shape and the size of the face, so that the face characteristic information of the same person object in the person image is different, and the clustering effect is influenced.
The definition of the human face influences the accurate extraction of the human face features, so that the human face feature information of the same human object in the human image is different, and the clustering effect is influenced.
The face shielding degree influences the shape and the size of the face, so that the face characteristic information of the same person object in the person image is different, and the clustering effect is influenced.
The human body characteristic analysis process comprises the following steps: detecting a human body region in each character image; and extracting the characteristics of each human body region to obtain human body characteristic information.
The human body region can be detected by using a human body feature detection model, and the human body feature detection model can be a support vector machine model, a Bayesian classifier and the like.
Human feature information may be extracted using a human feature extraction model. The human body characteristic extraction model can be an artificial neural network model, a convolutional neural network model and the like.
The human body characteristic information is information reflecting color characteristics, texture characteristics, shape characteristics, and spatial relationship characteristics of each part of the human body (including clothing).
In an alternative embodiment, the human feature information may be an image formed by a plurality of key points, such as a head key point, a neck key point, an extremity key point, and a body key point, of the human object detected according to the preset key points. For each key point, the coordinate information of the key point, the color corresponding to the key point and other information can be obtained through detection. The overall shape characteristics of the human body (including clothing) may be composed of a plurality of key points. The shape features of the human body may include contour features of the head contour and body contour of the human body (such as head contour).
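As a minimal sketch of the analysis in step 201, the pipeline described above can be organized as follows. The model classes and method names (FaceDetector-style objects, head/extremity key points, and the PersonAnalysis container) are hypothetical placeholders for illustration, not APIs defined by this application:

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class PersonAnalysis:
    face_feature: Optional[np.ndarray]   # e.g. a 128-dim face feature vector
    face_quality: Optional[np.ndarray]   # [angle, sharpness, occlusion] scores
    body_feature: Optional[np.ndarray]   # body feature vector, None if no body detected
    capture_time: float                  # shooting time, used by the preset condition later

def analyze_person_image(image, ts, face_det, face_align, face_ext, face_qual,
                         body_det, body_ext) -> PersonAnalysis:
    face_box = face_det.detect(image)                 # face region detection
    aligned = face_align.align(image, face_box)       # face key-point alignment
    face_feature = face_ext.extract(aligned)          # face feature information
    face_quality = face_qual.evaluate(aligned)        # face quality information
    body_box = body_det.detect(image)                 # human body region detection
    body_feature = body_ext.extract(image, body_box) if body_box is not None else None
    return PersonAnalysis(face_feature, face_quality, body_feature, ts)
```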
Step 202, determining the human image similarity between every two human images in the human images based on the human face characteristic information, the human face quality information and the human body characteristic information or the human face characteristic information and the human face quality information;
When the face feature information in two person images is similar, the person image similarity obtained from the face feature information alone is high, and the two person images may be judged to contain the same person object even when they do not, which causes false clustering.
The embodiment of the application considers the influence of face quality on the confidence of the person image similarity of two person images. If the face quality of one or both of the two person images is low, for example the side-face degree is high, the face definition is low, or the face occlusion area is large, then even if the face feature information in the two person images is similar, the person image similarity obtained by combining the face quality information with the face feature information is low; that is, the confidence of the person image similarity is reduced. When the person image similarity is low, the probability that the two person images contain the same person object is considered low, and the person images are less likely to be aggregated into the same person image set.
For example, if the person objects in both person images are shown in profile, the profile views do not contain the complete face information of the person objects, and the facial features (eyes, nose, mouth) are not fully visible. Even if the profile features of the person objects in the two images are the same, their frontal features may differ, so person objects with similar face feature information in two such images may not be the same person object. The person image similarity obtained by combining face quality information with face feature information is determined by both together; under poor face quality the similarity is reduced compared with a similarity computed from face features alone, so person images with poor face quality are less likely to be aggregated into the same person image set, and false clustering is reduced.
The human body feature information also affects the person image similarity of two person images. For example, the side-face degrees of the faces in two person images may differ, and so may the completeness of the faces. When the face angle is 0 degrees, the person image contains the complete facial features; when the face angle is 90 degrees, the person image contains only part of the facial features. The face features extracted from the two person images then differ greatly, so the face feature similarity is low even if the two person images contain the same person object. If the face feature similarity is low but the human body feature information of the two person images is highly similar, the person image similarity obtained by combining the human body feature information with the face feature information is higher than the similarity obtained from the face features alone. The same person object is therefore more likely to be aggregated into the same person image set, which improves the accuracy of person image clustering.
When the similarity between two person images is determined based on the face feature information and the face quality information, the face quality information may be used as a weight for the face feature information. Specifically: normalize the face quality information to the interval [0,1], and multiply the face feature information of each person image by its face quality information to obtain the adjusted face feature information of each person image; then take the similarity between the adjusted face feature information of the two person images as the person image similarity of the two person images.
When the similarity between two person images is determined based on the face feature information, the face quality information and the human body feature information, the face feature information of each person image adjusted by its face quality information may be weighted and summed with the human body feature information, and the similarity between the weighted sums of the two person images is used as the person image similarity. Alternatively, the face quality information, the face feature information and the human body feature information of each person image may be weighted and summed, and the similarity between the weighted sums of the two person images is used as the person image similarity.
The person image similarity of two person images can be calculated using the Euclidean distance, Jaccard similarity, cosine similarity, Pearson similarity, or the like.
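As one sketch of this step, using cosine similarity (one of the options listed above) and applying the face quality score to the similarity, matching the adjustment d1 = S × d_f described later in this application; the scalar quality score is an assumption here:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def person_similarity_face_only(face_a: np.ndarray, face_b: np.ndarray, quality: float) -> float:
    # Similarity of the face feature vectors, scaled by the face quality score
    # so that low-quality faces yield a lower-confidence person image similarity.
    return quality * cosine_similarity(face_a, face_b)
```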
And 203, performing image clustering on the person images based on the person image similarity to obtain at least one person image set.
After the human image similarity of every two human images is determined, clustering the multiple human images by using a clustering algorithm.
The steps of the clustering algorithm may include: selecting one person image from the currently non-clustered person images as an initial clustering center; determining the similarity between that person image and each currently non-clustered person image, and adding each currently non-clustered person image whose similarity is greater than a first preset threshold to the person image set; then determining the similarity between each most recently added person image and each currently non-clustered person image, and adding each currently non-clustered person image whose similarity is greater than the first preset threshold to the person image set, until no currently non-clustered person image with similarity greater than the first preset threshold remains.
The steps of the clustering algorithm may alternatively include: for each of the plurality of person images, determining the similarity between that person image and each other person image; and clustering, into the same person image set, that person image and every other person image whose similarity with it is greater than the first preset threshold.
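A minimal sketch of the first clustering variant described above, growing a cluster from a seed image by repeatedly absorbing unclustered images whose person image similarity exceeds the first preset threshold; `similarity` is assumed to be the pairwise person image similarity defined in step 202:

```python
def grow_clusters(images, similarity, threshold):
    unclustered = set(range(len(images)))
    clusters = []
    while unclustered:
        seed = unclustered.pop()           # initial clustering center
        cluster, frontier = {seed}, [seed]
        while frontier:
            current = frontier.pop()
            matched = [j for j in unclustered if similarity(current, j) > threshold]
            for j in matched:
                unclustered.remove(j)
                cluster.add(j)
                frontier.append(j)         # newly added images keep expanding the set
        clusters.append(cluster)
    return clusters
```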
The clustering result of the plurality of person images is returned to the client for display. The person image sets may be displayed in descending order of the number of person images they contain, so that sets with more person images are displayed first.
In addition, after the face region and the human body region in each person image are detected, the face region and the human body region can be associated and matched; if a plurality of person objects exist in a person image, it is determined whether a given face region and a given human body region belong to the same person object in the same person image.
First, it is determined whether the face region and the human body region belong to the same person image, based on whether the identifiers of the person images to which they belong are the same; then, it is determined whether the face region and the human body region belong to the same person object in that person image, based on whether the key points of the face region lie within the head area of the human body region.
The face feature information and face quality information corresponding to the matched face region are then bound to the human body feature information corresponding to the matched human body region, so that the bound face feature information, face quality information and human body feature information belong to the same person object in the same person image.
And determining the human image similarity between every two human images based on the bound human face quality information, the human body characteristic information and the human face characteristic information or the bound human face characteristic information and the human face quality information.
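The association check described above can be sketched as follows; the boxes are assumed to be (x1, y1, x2, y2) tuples, and the head_ratio value is an illustrative assumption rather than a value given by this application:

```python
def head_area(body_box, head_ratio=0.25):
    # Assume the head occupies roughly the top quarter of the detected body box.
    x1, y1, x2, y2 = body_box
    return x1, y1, x2, y1 + head_ratio * (y2 - y1)

def same_person_object(image_id_face, image_id_body, face_keypoints, body_box) -> bool:
    # Matched only when both regions come from the same person image and the
    # face key points fall inside the head area of the human body region.
    if image_id_face != image_id_body:
        return False
    hx1, hy1, hx2, hy2 = head_area(body_box)
    return all(hx1 <= x <= hx2 and hy1 <= y <= hy2 for x, y in face_keypoints)
```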
The content captured by the camera can include both images and videos. To support mixed clustering of person images and person videos, or clustering of person videos alone, the person images include at least image frame data extracted from a person video, for example N image frames extracted from each person video at intervals, which saves power consumption.
For the image frames extracted from a person video, the number of image frames belonging to each person image set is counted.
When the proportion of the extracted image frames belonging to the same person image set is greater than a fourth preset threshold, for example 30%, the person video is determined to belong to that person image set, which improves the accuracy of person video clustering.
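A short sketch of this video assignment step; 0.30 is the example threshold value from the text, and a video whose frames do not concentrate in any set is simply left unassigned here:

```python
from collections import Counter

def assign_video(frame_set_ids, threshold=0.30):
    # frame_set_ids: person image set id assigned to each sampled frame
    counts = Counter(frame_set_ids)
    set_id, n = counts.most_common(1)[0]
    return set_id if n / len(frame_set_ids) > threshold else None
```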
According to the embodiment of the application, clustering is performed by analyzing the person image similarity between every two person images based on the face feature information, the face quality information and the human body feature information, or based on the face feature information and the face quality information. Because the content and the accuracy of face feature information extraction are limited, the person image similarity obtained from face features alone has low accuracy; combining face quality information that reflects face quality and human body feature information that reflects human body features yields a more accurate person image similarity, which effectively improves clustering accuracy.
Optionally, the determining the person image similarity between every two person images in the person images based on the face feature information, the face quality information and the human feature information, or the face feature information and the face quality information includes:
determining the human image similarity based on the human face characteristic information of the two human images, the human face quality information of the two human images and the human body characteristic information of the two human images under the condition that the two human images meet the preset conditions;
determining the human image similarity based on the human face characteristic information of the two human images and the human face quality information of the two human images under the condition that the two human images do not meet the preset conditions;
Wherein, the preset conditions include: the human body characteristic information of the two person images is not empty, and the shooting time of the two person images is within the same preset time period.
The human body feature information of each person image is obtained through human body feature analysis; if the human body region in the person image, i.e. the human body coordinates, is not detected in the human body detection stage of the human body feature analysis, the human body feature information of that person image is empty.
If human body feature information exists in both person images and their shooting times are within the same preset time period, the human body feature information can additionally be used to judge whether the two person images contain the same person object: person objects with similar human body feature information within a short time are more likely to be the same person object.
For example, if the two person images are taken on the same day, there is little possibility that the person object has changed clothes; if the clothing color and shape in the two person images are similar, the two person images are more likely to contain the same person object.
When the interval between the shooting times of the two person images is short, the person objects that could have been captured are limited, and two person images with similar texture, shape, and spatial-relationship features of the human body parts are more likely to contain the same person object.
When the two person images meet the preset condition, namely, the human body characteristic information of the two person images is not empty, and the shooting time of the two person images is in the same preset time period, the human body characteristic information can be effectively used for acquiring the human image similarity.
In this case, the human image similarity between the two human images can be determined by combining the human face characteristic information, the human body characteristic information and the human face quality information, so that the accuracy of human image clustering is improved.
When the two person images do not meet the preset condition, i.e. the human body feature information of at least one of the two person images is empty, or the shooting times of the two person images are not within the same preset time period, the human body feature information cannot be effectively used to obtain the person image similarity.
In this case, the human image similarity between the two human images can be determined by combining the human face characteristic information and the human face quality information, so that false clustering of the human images is reduced, and the accuracy of human image clustering is improved.
According to the embodiment of the application, under the condition that two person images meet the preset condition, the two person images with similar human body characteristic information are more likely to contain the same person object, and the person image clustering accuracy is improved by combining the human body characteristic information for clustering; under the condition that the two person images do not meet the preset conditions, clustering is carried out without considering the human characteristic information, false clustering of the person images is reduced, and the accuracy of person image clustering is improved.
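A sketch of this preset condition, reusing the PersonAnalysis container from the earlier sketch; treating "the same preset time period" as a fixed window around the two shooting times (one day here, following the same-day example above) is an assumption for illustration:

```python
def meets_preset_condition(a, b, period_seconds=24 * 3600) -> bool:
    # Body features are combined into the similarity only when both images have
    # non-empty human body feature information and their shooting times fall
    # within the same preset time period.
    return (a.body_feature is not None
            and b.body_feature is not None
            and abs(a.capture_time - b.capture_time) < period_seconds)
```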
Optionally, the determining the person image similarity based on the face feature information of the two person images and the face quality information of the two person images when the two person images do not meet the preset condition includes:
under the condition that the two person images do not meet the preset conditions, determining the similarity of the face feature information based on the face feature information of the two person images to serve as the similarity to be adjusted;
and adjusting the similarity to be adjusted based on the face quality information of the two character images to obtain the image similarity.
The similarity of the face feature information may be the Euclidean distance, Jaccard similarity, cosine similarity, Pearson similarity, or the like.
For example, when the face feature information of the two person images is 128-dimensional, the cosine similarity d_f of the face feature information is:

d_f = Σ_{i=1}^{128} (a_i × b_i) / ( sqrt(Σ_{i=1}^{128} a_i²) × sqrt(Σ_{i=1}^{128} b_i²) )

where a_i is the i-th dimension of the face feature information of one of the two person images, and b_i is the i-th dimension of the face feature information of the other person image.
The confidence of the similarity of the face feature information may be controlled using the face quality information of one or both of the two person images.
If one of the two person images is already clustered and the other is not, as in the first clustering algorithm described above, the similarity of the face feature information may be adjusted using the face quality information S1 of the non-clustered person image.
If neither of the two person images is clustered, as in the second clustering algorithm, the average value S2 or the smaller value S3 of the face quality information of the two person images may be used to adjust the similarity of the face feature information.
The similarity d_f of the face feature information is adjusted using the face quality information S1, S2 or S3. For example, the face quality information S1, S2 or S3 is multiplied by the similarity d_f of the face feature information to obtain the adjusted similarity d1 of the face feature information; the adjustment manner is not limited in the embodiment of the application.
When the adjusted similarity d1 of the face feature information is greater than the first preset threshold, the faces in the two person images are considered to belong to the same person object and the two person images are clustered into the same person image set; otherwise, the faces in the two person images belong to different person objects, and the next pair of person images is traversed for person image clustering.
According to the embodiment of the application, the similarity of the face feature information of two person images is adjusted using the face quality information of the two person images, so the requirement for aggregating low-quality faces becomes stricter, false clustering of low-quality faces is significantly reduced, and the accuracy of person image clustering is improved.
Optionally, the determining the portrait similarity based on the face feature information of the two person images, the face quality information of the two person images, and the human feature information of the two person images when the two person images meet a preset condition includes:
under the condition that the two person images meet the preset conditions, determining the comprehensive similarity of the face feature information and the human feature information as the similarity to be adjusted based on the face feature information of the two person images and the human feature information of the two person images;
and adjusting the similarity to be adjusted based on the face quality information of the two character images to obtain the image similarity.
When the human body characteristic information of the two person images is not empty and the shooting time of the two person images is in the same preset time period, the human body characteristic information can be effectively used for acquiring the human image similarity.
In this case, the face feature information and the human body feature information are combined to obtain a comprehensive similarity, so that whether the two person images contain the same person object can be judged more accurately; then the confidence of the comprehensive similarity is adjusted using the face quality information, so that for person images with poor face quality the comprehensive similarity is reduced, the probability of aggregation is reduced, and the clustering accuracy is improved.
The step of obtaining the comprehensive similarity of the face feature information and the human body feature information may be: weight and sum the face feature information and the human body feature information of each of the two person images; the similarity between the corresponding weighted sums of the two person images is taken as the comprehensive similarity d_fb.
The step of obtaining the comprehensive similarity of the face feature information and the human body feature information may alternatively be: respectively calculate the similarity of the face feature information of the two person images and the similarity of the human body feature information of the two person images; the similarity of the face feature information and the similarity of the human body feature information are weighted and summed to obtain the comprehensive similarity d_fb.
The comprehensive similarity d_fb is adjusted using the face quality information S1, S2 or S3. For example, the face quality information S1, S2 or S3 is multiplied by the comprehensive similarity d_fb to obtain the adjusted comprehensive similarity d2; the adjustment manner is not limited in the embodiment of the application.
When the adjusted comprehensive similarity d2 is greater than the first preset threshold, the faces in the two person images are considered to belong to the same person object and the two person images are clustered into the same person image set; otherwise, the faces in the two person images belong to different person objects, and the next pair of person images is traversed for person image clustering.
According to the embodiment of the application, the comprehensive similarity of the face feature information and the human body feature information of two person images is adjusted using the face quality information of the two person images, so the requirement for aggregating low-quality faces becomes stricter, false clustering of low-quality faces is significantly reduced, and the accuracy of person image clustering is improved.
Optionally, the determining the comprehensive similarity of the face feature information and the body feature information based on the face feature information of the two person images and the body feature information of the two person images includes:
determining the similarity of the face feature information based on the face feature information of the two person images;
determining the similarity of the human body characteristic information based on the human body characteristic information of the two human body images;
averaging the similarity of the face feature information and the similarity of the human body feature information to obtain an average value;
and determining the maximum value obtained by comparing the similarity of the face characteristic information with the average value as the comprehensive similarity.
The similarity of the face feature information may be the Euclidean distance, Jaccard similarity, cosine similarity, Pearson similarity, or the like.
The average of the similarity d_f of the face feature information and the similarity d_b of the human body feature information is (d_f + d_b)/2.

The comprehensive similarity is d_fb = max[d_f, (d_f + d_b)/2].

In the embodiment of the application, when the comprehensive similarity d_fb of the face feature information and the human body feature information is higher than the similarity d_f of the face feature information alone, the combined similarity d_fb is trusted; otherwise, the similarity d_f of the face feature information is trusted. This improves the recall rate of person images, so that more person images are clustered successfully.
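A sketch of the comprehensive similarity d_fb = max[d_f, (d_f + d_b)/2], reusing cosine_similarity from the earlier sketch as the similarity measure (one of the listed options) for both the face and body feature vectors:

```python
def combined_similarity(a, b) -> float:
    # d_f: face feature similarity; d_b: human body feature similarity.
    d_f = cosine_similarity(a.face_feature, b.face_feature)
    d_b = cosine_similarity(a.body_feature, b.body_feature)
    return max(d_f, (d_f + d_b) / 2)
```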
Optionally, the face quality information includes a face angle, a face definition, and a face shielding degree, and the adjusting the similarity to be adjusted based on the face quality information of the two person images, to obtain the image similarity includes:
determining the comprehensive quality information of the face features of each person image based on the face angle, the face definition and the face shielding degree of each person image in the two person images;
normalizing the face feature comprehensive quality information of the two character images, mapping the face feature comprehensive quality information after normalization to a section between a preset minimum attenuation base number and 1, and obtaining mapped face feature comprehensive quality information;
Adjusting the similarity to be adjusted based on the mapped comprehensive quality information of the face features to obtain the image similarity;
Fig. 4 is a schematic diagram of the face quality definition in the image clustering method provided in the embodiment of the application. As shown in fig. 4, the face angle Sp1 is the angle by which the face deviates from the shooting plane of the camera, and the first row in fig. 4 shows different face angles.
When a user performs face recognition, the face generally directly faces the camera; but when person images are taken, the person objects appear at various angles, which affects the accuracy and completeness of face feature extraction. The profile feature information of different person objects may be similar, while the frontal and profile feature information of the same person object may be dissimilar, so the face angle affects the clustering accuracy.
The face sharpness Sp2 is the sharpness of a face region in a person image, and the second row in fig. 4 shows different face sharpness. The definition of the face affects the accurate extraction of the face feature information, thereby affecting the clustering effect.
The face shielding degree Sp3 is the degree to which the face is shielded, and the third row in fig. 4 shows different face shielding degrees. The face shielding degree influences the accuracy and the completeness of face characteristic information extraction, so that the clustering effect is influenced.
The face angle Sp1, the face definition Sp2 and the face shielding degree Sp3 of each person image may be weighted and summed to obtain the face feature comprehensive quality information S.
The face angle Sp1, the face definition Sp2, and the face shielding degree Sp3 of each person image may instead be multiplied to obtain the face feature comprehensive quality information S = Sp1 × Sp2 × Sp3.
The face angle Sp1, the face definition Sp2, and the face shielding degree Sp3 may be normalized to the [0,1] interval. The closer the comprehensive quality information S of the face features is to 1, the higher the face quality is; the closer the face feature integrated quality information S is to 0, the lower the face quality is.
A preset minimum attenuation base ε is defined, and the face feature comprehensive quality information is mapped in a normalized manner to the interval [ε, 1], i.e. S_ε = S × (1 − ε) + ε.
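A short sketch of this quality computation; Sp1, Sp2 and Sp3 are assumed to already be normalized to [0, 1], and the default eps value is illustrative rather than a value given by this application:

```python
def comprehensive_quality(sp1: float, sp2: float, sp3: float, eps: float = 0.5) -> float:
    # S = Sp1 * Sp2 * Sp3, then mapped into [eps, 1] so the similarity is never
    # attenuated below the preset minimum attenuation base eps.
    s = sp1 * sp2 * sp3
    return s * (1 - eps) + eps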
When the two person images do not meet the preset condition, the normalized and mapped face feature comprehensive quality information of one or both of the two person images is used to control the confidence of the similarity of the face feature information.
If one of the two person images is already clustered and the other is not, as in the first clustering algorithm, the normalized and mapped face feature comprehensive quality information S_ε1 of the non-clustered person image can be used to adjust the similarity of the face feature information: the adjusted similarity of the face feature information is d_ε = d_f × S_ε1.

If neither of the two person images is clustered, as in the second clustering algorithm, the average value S_ε2 or the smaller value S_ε3 of the normalized and mapped face feature comprehensive quality information of the two person images can be used to adjust the similarity of the face feature information: the adjusted similarity of the face feature information is d_ε = d_f × S_ε2 or d_ε = d_f × S_ε3.
When the adjusted similarity d_ε of the face feature information is greater than the first preset threshold, the faces in the two person images are considered to belong to the same person object and the two person images are clustered into the same person image set; otherwise, the faces in the two person images belong to different person objects, and the next pair of person images is traversed for person image clustering.
When the two person images meet the preset condition, the normalized and mapped face feature comprehensive quality information of one or both of the two person images is used to control the confidence of the comprehensive similarity of the face feature information and the human body feature information.
If one of the two person images is already clustered and the other is not, as in the first clustering algorithm, the normalized and mapped face feature comprehensive quality information S_ε1 of the non-clustered person image can be used to adjust the comprehensive similarity: the adjusted comprehensive similarity is d_ε = d_fb × S_ε1.

If neither of the two person images is clustered, as in the second clustering algorithm, the average value S_ε2 or the smaller value S_ε3 of the normalized and mapped face feature comprehensive quality information of the two person images can be used to adjust the comprehensive similarity: the adjusted comprehensive similarity is d_ε = d_fb × S_ε2 or d_ε = d_fb × S_ε3.
When the adjusted comprehensive similarity is greater than the first preset threshold, the faces in the two person images are considered to belong to the same person object and the two person images are clustered into the same person image set; otherwise, the faces in the two person images belong to different person objects, and the next pair of person images is traversed for person image clustering.
According to the embodiment of the application, multi-dimensional face quality information is combined to obtain the face feature comprehensive quality information, which is used to control the confidence of the similarity; this reduces false clustering of low-quality faces and improves clustering accuracy. In addition, the strength with which the face feature comprehensive quality information attenuates the similarity is controlled by the preset minimum attenuation base, which improves the flexibility of the adjustment.
Optionally, each two person images include a first person image and a second person image, and the image clustering is performed on the person images based on the person image similarity to obtain at least one person image set, including:
Acquiring M approximate person images of the first person image from the person images under the condition that the person image similarity between the first person image and the second person image is larger than a first preset threshold value, wherein M is a positive integer, and the approximate person images are person images of which the person image similarity with the first person image is larger than a second preset threshold value;
and under the condition that the average value of the human image similarity between the approximate human images of the second human image and the first human image is larger than a third preset threshold value, controlling the two human images to be clustered into the same human image set, and traversing every two human images in the human images to obtain at least one human image set.
If the similarity of the facial features of the two person images is greater than the first threshold, the two person images are determined to contain the same person object, and the clustering radius is 1.
Because the face feature similarity between the two person images may have deviation, in the embodiment of the present application, under the condition that the face feature similarity between the two person images is greater than a first preset threshold, whether the average value of the face feature similarity between one image and the similar person image of the other image is greater than a third preset threshold is continuously determined, so as to increase the clustering radius.
The first person image of the two may be an already clustered person image, and the second person image may be a non-clustered person image.
When the average value of the person image similarity between the second person image and the approximate person images of the first person image is greater than the third preset threshold, the faces in the two person images are considered to belong to the same person object and the two person images are clustered into the same person image set; otherwise, the faces in the two person images belong to different person objects, and the next pair of person images is traversed for person image clustering.
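A sketch of this expanded clustering-radius check; the case with no approximate person images, and the choice of the M most similar ones, are handled here as assumptions, since the text does not specify them:

```python
def should_merge(a, b, all_images, similarity, t1, t2, t3, m):
    # Merge only if the direct person image similarity exceeds the first
    # threshold AND the average similarity between b and the M approximate
    # images of a (similarity with a above the second threshold) exceeds the
    # third threshold.
    if similarity(a, b) <= t1:
        return False
    approx = sorted((x for x in all_images if x is not a and similarity(a, x) > t2),
                    key=lambda x: similarity(a, x), reverse=True)[:m]
    if not approx:
        return True    # assumption: fall back to the direct comparison
    avg = sum(similarity(b, x) for x in approx) / len(approx)
    return avg > t3
```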
According to the embodiment of the application, increasing the clustering radius requires not only that the two person images are similar to each other, but also that one person image is similar to the approximate person images of the other; in the case where the person image similarity of two person images may deviate, this improves the accuracy of person image clustering.
Optionally, the face feature information and the face quality information of the person images are obtained by inputting the person images into a face super-resolution network model, which outputs them. Before the face feature analysis and the human body feature analysis are respectively performed on the at least two person images, the face super-resolution network model is obtained in the following manner:
Each person image sample is input into a preset face super-resolution network (Face Super Resolution Network, FSRNet) model, which outputs the face feature information and the face quality information of each person image sample;
the face quality information may include face angle, face clarity, and face occlusion level. Before training a preset face super-resolution network, defining a face quality standard, scoring the face area after detection alignment in each human image sample according to the face angle, the face definition and the face shielding degree, wherein each face quality information dimension corresponds to four scores with different degrees, as shown in fig. 4.
And constructing training samples, wherein each training sample comprises a character image sample, a face quality information label corresponding to the character image sample and a character identity label.
Each person image sample is scored according to the face quality definition, and the scores of the three dimensions obtained for each person image sample are used as the scores of the face quality information label; the lower the score, the worse the face quality in that dimension. Each face region is transformed to the network input size; 112 × 112 is used as an example in the embodiment of the application.
In the process of constructing a face super-resolution network model based on face quality, each face area is transformed into 112×112×3, normalized and then input into a preset face super-resolution network model. Fig. 5 is a schematic diagram of an FSRNet network structure in an image clustering method according to an embodiment of the present application. The feature map of 28×28×64 is obtained through different convolution modules shown in fig. 5, and the feature map is input into two branches of a preset face super-resolution network model simultaneously, one is a face feature recognition branch, and the other is a face feature quality evaluation branch.
In the face feature recognition branch, the feature map is subjected to operations such as convolution, pooling, global pooling, and a full connection layer to obtain a 128-dimensional face feature vector, and different person identities are output through softmax according to the person identity tag. For example, 1000 different person identity tags are represented by 0 through 999.
In the face feature quality evaluation branch, the feature map is subjected to operations such as convolution, pooling, global pooling, and a full connection layer to obtain a 3-dimensional quality score; the quality scores of the three dimensions of the face region predicted by the face super-resolution network model are denoted by Sp1, Sp2, and Sp3, respectively. Because the values output by the face super-resolution network model are between 0 and 1, the score range of the face quality information is also between 0 and 1.
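A minimal two-branch sketch of this structure is given below for illustration; it is not the FSRNet of fig. 5, and its backbone layers are assumptions chosen only so that a 112×112×3 input yields a 28×28×64 feature map, a 128-dimensional face feature vector and a 3-dimensional quality score.

import torch
import torch.nn as nn

class FaceQualityNet(nn.Module):
    def __init__(self, num_identities: int = 1000):
        super().__init__()
        # backbone: 112x112x3 input -> 28x28x64 feature map (assumed layers)
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
        )
        # face feature recognition branch: conv, pooling, global pooling, fc -> 128-d
        self.feature_branch = nn.Sequential(
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, 128),
        )
        self.identity_head = nn.Linear(128, num_identities)  # softmax logits over person identity tags
        # face feature quality evaluation branch: -> 3-d quality score in (0, 1)
        self.quality_branch = nn.Sequential(
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 3), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor):
        fmap = self.backbone(x)               # (N, 64, 28, 28)
        feature = self.feature_branch(fmap)   # (N, 128) face feature vector
        logits = self.identity_head(feature)  # (N, num_identities)
        quality = self.quality_branch(fmap)   # (N, 3): Sp1, Sp2, Sp3
        return feature, logits, quality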
Determining a loss value of face characteristic information and a loss value of face quality information of each person image sample;
the loss value Ls of the face quality information can be calculated by mean square error, and the formula is as follows:
Ls = α·(Sp1 − Sg1/10)² + β·(Sp2 − Sg2/10)² + γ·(Sp3 − Sg3/10)²
wherein α, β, and γ represent the weights of the three dimensions of the face quality information, Sp1, Sp2, and Sp3 represent the quality scores of the three dimensions of the face region predicted by the face super-resolution network model, Sg1, Sg2, and Sg3 represent the quality scores of the three dimensions of the manual label, and the division by 10 normalizes the label quality scores to the interval [0,1].
The loss value of the face feature information may be a cross entropy loss using softmax. Assuming that there are N person image samples, the formula of the loss value Lr of the face feature information is as follows:
Lr = −(1/N) · Σ_{i=1}^{N} log(p_{i,yi})
where N represents the number of person image samples, p_{i,yi} represents the probability of the ith person image sample on its person identity tag yi, B represents the number of person identities, and p_{i,j} represents the probability of the ith person image sample on the jth person identity tag, with Σ_{j=1}^{B} p_{i,j} = 1. The above loss function is equivalent to processing the N person image samples each with a probability of 1/N.
Adjusting the loss value of the face characteristic information based on the face quality information of each person image sample to obtain the loss value after the face characteristic information is adjusted;
In the embodiments of the application, the loss value of the face feature information is optimized in combination with the face quality information; for example, the face quality information of each person image sample is multiplied by the face feature information loss value of that person image sample, namely:
Lrn = (1/N) · Σ_{i=1}^{N} S_i · Lr_i
wherein S_i is the face quality information of the ith person image sample and Lr_i is the face feature information loss value of the ith person image sample.
And training the preset face super-resolution network model in a combined optimization mode based on the loss value of the face quality information and the loss value after the face characteristic information is adjusted, so as to obtain the face super-resolution network model.
The FSRNet network structure is trained by joint optimization based on the loss value Ls of the face quality information and the adjusted loss value Lrn of the face feature information, so as to obtain the weight parameters W of the whole face super-resolution network model.
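The following sketch illustrates one possible form of this joint optimization, assuming that the manual quality labels are divided by 10, that the per-sample quality weight S_i is the mean of the three predicted quality scores, and that the total loss is the plain sum Ls + Lrn; these combinations are assumptions rather than requirements of the embodiments.

import torch
import torch.nn.functional as F

def joint_loss(logits, identity_labels, quality_pred, quality_labels,
               alpha: float = 1.0, beta: float = 1.0, gamma: float = 1.0):
    # Ls: weighted mean square error over the three quality dimensions
    w = torch.tensor([alpha, beta, gamma], device=quality_pred.device)
    Ls = (w * (quality_pred - quality_labels / 10.0) ** 2).mean()

    # Lr_i: per-sample softmax cross entropy on the person identity tags
    Lr_i = F.cross_entropy(logits, identity_labels, reduction="none")

    # Lrn: quality-weighted feature loss, so low-quality faces contribute less
    # (detach keeps the weighting from driving the quality scores to zero)
    S_i = quality_pred.detach().mean(dim=1)
    Lrn = (S_i * Lr_i).mean()

    return Ls + Lrn  # jointly optimized total loss (one possible combination)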
Each person image is input into the trained face super-resolution network model, and clustering is performed using the face feature information output by the layer before softmax, the face quality information output by the face feature quality evaluation branch, and the human body feature information bound to the face feature information.
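For illustration, inference with the trained model might look as follows; the preprocess() helper (face detection, alignment, resizing to 112×112 and normalization) is hypothetical, and the model is the two-branch sketch given earlier.

import torch

@torch.no_grad()
def extract_face_info(model, person_images, preprocess):
    model.eval()
    features, qualities = [], []
    for img in person_images:
        x = preprocess(img).unsqueeze(0)      # (1, 3, 112, 112)
        feature, _logits, quality = model(x)  # feature from the layer before softmax
        features.append(feature.squeeze(0))
        qualities.append(quality.squeeze(0))  # (Sp1, Sp2, Sp3)
    return features, qualities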
According to the embodiments of the application, the loss value of the face feature information is optimized in combination with the face quality information, so that low-quality faces contribute less to face feature recognition and high-quality faces contribute more. The face super-resolution network model is then jointly optimized based on the loss value of the face quality information and the optimized loss value of the face feature information, which improves the accuracy of the face super-resolution network model's predictions.
For the image clustering method provided in the embodiments of the application, the execution subject may be an image clustering device. In the embodiments of the present application, an image clustering device performing the person image clustering method is taken as an example to describe the image clustering device provided in the embodiments of the present application.
Fig. 6 is a schematic structural diagram of an image clustering device provided in an embodiment of the present application. As shown in fig. 6, the image clustering device includes an analysis module 601, a determination module 602, and a clustering module 603, wherein:
the analysis module 601 is configured to perform face feature analysis and body feature analysis on at least two person images, respectively, to determine face feature information of each person image, face quality information corresponding to the face feature information, and body feature information;
the determining module 602 is configured to determine a human image similarity between every two human images in the human images based on the human face feature information, the human face quality information, and the human body feature information, or the human face feature information and the human face quality information;
the clustering module 603 is configured to perform image clustering on the person images based on the person image similarity, to obtain at least one person image set.
According to the embodiments of the application, clustering is performed by analyzing the person image similarity between every two person images based on the face feature information, the face quality information and the human body feature information, or based on the face feature information and the face quality information. Because the content and accuracy of face feature information extraction alone are limited, the accuracy of a person image similarity based only on face feature information is low; by combining the face quality information, which can reflect the face quality, and the human body feature information, which can reflect the human body features, a more accurate person image similarity can be obtained, and the more accurate person image similarity can effectively improve the accuracy of clustering.
Optionally, the determining module is specifically configured to:
determining the human image similarity based on the human face characteristic information of the two human images, the human face quality information of the two human images and the human body characteristic information of the two human images under the condition that the two human images meet the preset conditions;
determining the human image similarity based on the human face characteristic information of the two human images and the human face quality information of the two human images under the condition that the two human images do not meet the preset conditions;
wherein, the preset conditions include: the human body characteristic information of the two person images is not empty, and the shooting time of the two person images is within the same preset time period.
According to the embodiments of the application, when the two person images satisfy the preset condition, two person images with similar human body feature information are more likely to contain the same person object, and clustering in combination with the human body feature information improves the accuracy of person image clustering; when the two person images do not satisfy the preset condition, clustering is performed without considering the human body feature information, which reduces false clustering of person images and improves the accuracy of person image clustering.
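A rough sketch of this similarity selection is given below; the field names (body_feat, shot_time) and the reduction of "within the same preset time period" to a simple time-window check are assumptions made only for illustration.

from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class PersonImage:
    face_feat: Sequence[float]
    face_quality: Sequence[float]
    body_feat: Optional[Sequence[float]]   # human body feature information, may be empty
    shot_time: float                       # shooting time, e.g. seconds since epoch

def person_similarity(a: PersonImage, b: PersonImage,
                      face_sim, combined_sim, time_window: float) -> float:
    meets_condition = (
        a.body_feat is not None and b.body_feat is not None
        and abs(a.shot_time - b.shot_time) <= time_window
    )
    if meets_condition:
        # face feature information + human body feature information + face quality
        return combined_sim(a, b)
    # face feature information + face quality information only
    return face_sim(a, b)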
Optionally, the determining module is specifically configured to:
under the condition that the two person images do not meet the preset conditions, determining the similarity of the face feature information based on the face feature information of the two person images to serve as the similarity to be adjusted;
and adjusting the similarity to be adjusted based on the face quality information of the two character images to obtain the image similarity.
According to the embodiments of the application, the similarity of the face feature information of the two person images is adjusted by using the face quality information of the two person images, so that a higher requirement is imposed on clustering low-quality faces, false clustering of low-quality faces is significantly reduced, and the accuracy of person image clustering is improved.
Optionally, the determining module is specifically configured to:
under the condition that the two person images meet the preset conditions, determining the comprehensive similarity of the face feature information and the human feature information as the similarity to be adjusted based on the face feature information of the two person images and the human feature information of the two person images;
and adjusting the similarity to be adjusted based on the face quality information of the two character images to obtain the image similarity.
According to the embodiments of the application, the comprehensive similarity of the face feature information and the human body feature information of the two person images is adjusted by using the face quality information of the two person images, so that a higher requirement is imposed on clustering low-quality faces, false clustering of low-quality faces is significantly reduced, and the accuracy of person image clustering is improved.
Optionally, the determining module is specifically configured to:
determining the similarity of the face feature information based on the face feature information of the two person images;
determining the similarity of the human body characteristic information based on the human body characteristic information of the two human body images;
averaging the similarity of the face feature information and the similarity of the human feature information to obtain an average value;
and taking the larger of the similarity of the face feature information and the average value as the comprehensive similarity.
According to the embodiments of the application, when the comprehensive similarity of the face feature information and the human body feature information is higher than the face feature information similarity, the face feature information and the human body feature information are trusted; otherwise, the face feature information similarity is trusted. This improves the recall rate of the person images, so that more person images are clustered successfully.
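For illustration, the comprehensive similarity can be sketched as follows, assuming cosine similarity as the feature similarity measure (the embodiments do not fix a particular measure).

import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def comprehensive_similarity(face_a, face_b, body_a, body_b) -> float:
    face_sim = cosine(face_a, face_b)      # face feature information similarity
    body_sim = cosine(body_a, body_b)      # human body feature information similarity
    average = (face_sim + body_sim) / 2.0  # average of the two similarities
    return max(face_sim, average)          # the larger value is the comprehensive similarity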
Optionally, the face quality information includes a face angle, a face definition, and a face shielding degree, and the determining module is specifically configured to:
determining the comprehensive quality information of the face features of each person image based on the face angle, the face definition and the face shielding degree of each person image in the two person images;
normalizing the face feature comprehensive quality information of the two person images, and mapping the normalized face feature comprehensive quality information to the interval between a preset minimum attenuation base number and 1, to obtain the mapped face feature comprehensive quality information;
and adjusting the similarity to be adjusted based on the mapped comprehensive quality information of the face features to obtain the image similarity.
According to the embodiment of the application, the multi-dimensional face quality information is synthesized to obtain the face feature comprehensive quality information, the confidence level of the similarity is controlled by using the face feature comprehensive quality information, the false clustering of low-quality faces is reduced, and the clustering accuracy is improved; and the adjustment strength of the similarity of the comprehensive quality information of the face features is controlled by presetting the minimum attenuation base, so that the adjustment flexibility is improved.
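The following sketch illustrates one possible form of this adjustment; how the three quality dimensions are combined, the use of the worse of the two faces, and the multiplication of the similarity by the mapped quality are all assumptions, since the embodiments only state that the combined quality is normalized, mapped to the interval between the preset minimum attenuation base number and 1, and used to adjust the similarity to be adjusted.

def combined_quality(angle: float, sharpness: float, occlusion: float) -> float:
    # assumed combination of the three face quality dimensions, each in [0, 1]
    return (angle + sharpness + occlusion) / 3.0

def adjusted_similarity(sim_to_adjust: float,
                        quality_a: float, quality_b: float,
                        min_decay_base: float = 0.8) -> float:
    q = min(quality_a, quality_b)                         # assumed: use the worse face
    q = max(0.0, min(1.0, q))                             # normalize to [0, 1]
    factor = min_decay_base + (1.0 - min_decay_base) * q  # map to [min_decay_base, 1]
    return sim_to_adjust * factor                         # lower quality -> stronger attenuation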
Optionally, each two person images include a first person image and a second person image, and the clustering module is specifically configured to:
acquiring M approximate person images of the first person image from the person images under the condition that the person image similarity between the first person image and the second person image is larger than a first preset threshold value, wherein M is a positive integer, and the approximate person images are person images of which the person image similarity with the first person image is larger than a second preset threshold value;
and under the condition that the average value of the human image similarity between the second human image and the approximate human image of the first human image is larger than a third preset threshold value, controlling the two human images to be clustered to the same human image set, and traversing every two human images in the human images to obtain at least one human image set.
According to the embodiments of the application, the clustering radius is enlarged: clustering requires not only that the two person images are similar to each other, but also that one person image is similar to the approximate person images of the other. This improves the accuracy of person image clustering in cases where the person image similarity between the two person images may deviate.
Optionally, the face feature information and the face quality information of the person images are obtained by inputting the person images into a face super-resolution network model and are output by the face super-resolution network model, and the face super-resolution network model is obtained, before the face feature analysis and the human body feature analysis are respectively performed on the at least two person images, in the following manner:
inputting each character image sample into a preset face super-resolution network model, and outputting face characteristic information and face quality information of each character image sample;
determining a loss value of face characteristic information and a loss value of face quality information of each person image sample;
adjusting the loss value of the face characteristic information based on the face quality information of each person image sample to obtain the loss value after the face characteristic information is adjusted;
and training the preset face super-resolution network model in a combined optimization mode based on the loss value of the face quality information and the loss value after the face characteristic information is adjusted, so as to obtain the face super-resolution network model.
According to the embodiments of the application, clustering is performed by analyzing the person image similarity between every two person images based on the face feature information, the face quality information and the human body feature information, or based on the face feature information and the face quality information. Because the content and accuracy of face feature information extraction alone are limited, the accuracy of a person image similarity based only on face feature information is low; by combining the face quality information, which can reflect the face quality, and the human body feature information, which can reflect the human body features, a more accurate person image similarity can be obtained, and the more accurate person image similarity can effectively improve the accuracy of clustering.
The image clustering device in the embodiments of the application may be an electronic device, or may be a component in the electronic device, for example, an integrated circuit or a chip. The electronic device may be a terminal, or may be a device other than a terminal. By way of example, the electronic device may be a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted electronic device, a mobile internet device (Mobile Internet Device, MID), an augmented reality (AR)/virtual reality (VR) device, a robot, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a personal digital assistant (PDA), etc., and may also be a server, a network attached storage (Network Attached Storage, NAS), a personal computer (PC), a television (TV), a teller machine or a self-service machine, etc., which is not specifically limited in the embodiments of the present application.
The image clustering device in the embodiment of the application may be a device with an operating system. The operating system may be an Android operating system, an ios operating system, or other possible operating systems, which are not specifically limited in the embodiments of the present application.
The image clustering device provided in the embodiment of the present application can implement each process implemented by the embodiments of the methods of fig. 2 to 5, and in order to avoid repetition, a description is omitted here.
Optionally, fig. 7 is a schematic structural diagram of an electronic device provided in an embodiment of the present application. As shown in fig. 7, an embodiment of the present application further provides an electronic device 700, including a processor 701 and a memory 702, where the memory 702 stores a program or instructions capable of running on the processor 701. When the program or instructions are executed by the processor 701, the steps of the above image clustering method embodiments are implemented, and the same technical effects can be achieved; to avoid repetition, details are not described herein again.
The electronic device in the embodiment of the application includes the mobile electronic device and the non-mobile electronic device described above.
Fig. 8 is a schematic hardware structure of an electronic device according to an embodiment of the present application.
The electronic device 800 includes, but is not limited to: radio frequency unit 801, network module 802, audio output unit 803, input unit 804, sensor 805, display unit 806, user input unit 807, interface unit 808, memory 809, and processor 810.
Those skilled in the art will appreciate that the electronic device 800 may also include a power source (e.g., a battery) for powering the various components, which may be logically connected to the processor 810 through a power management system, so as to perform functions such as charge management, discharge management, and power consumption management through the power management system. The electronic device structure shown in fig. 8 does not constitute a limitation of the electronic device; the electronic device may include more or fewer components than shown, combine certain components, or have a different arrangement of components, which is not described in detail herein.
The processor 810 is configured to perform face feature analysis and body feature analysis on at least two person images, respectively, to determine face feature information of each person image, face quality information corresponding to the face feature information, and body feature information; based on the face feature information, the face quality information and the human feature information, or the face feature information and the face quality information, determining a human image similarity between every two human images in the human images; and carrying out image clustering on the person images based on the person image similarity to obtain at least one person image set.
The processor 810 is further configured to determine the person image similarity based on the face features of the two person images and the person feature information of the two person images, if the two person images satisfy a preset condition;
determining the portrait similarity based on the face features of the two portrait images under the condition that the two portrait images do not meet the preset condition;
wherein, the preset conditions include: the human body characteristic information of the two person images is not empty, and the shooting time of the two person images is within the same preset time period.
The processor 810 is further configured to determine, based on face feature information of the two person images, a similarity of the face feature information as a similarity to be adjusted, if the two person images do not satisfy the preset condition;
and adjusting the similarity to be adjusted based on the face quality information of the two character images to obtain the image similarity.
The processor 810 is further configured to determine, when the two person images satisfy the preset condition, a comprehensive similarity between the face feature information and the human feature information as a similarity to be adjusted, based on the face feature information of the two person images and the human feature information of the two person images;
And adjusting the similarity to be adjusted based on the face quality information of the two character images to obtain the image similarity.
The processor 810 is further configured to determine a similarity of the human feature information based on the human feature information of the two person images;
averaging the similarity of the face feature information and the similarity of the human feature information to obtain an average value;
and taking the larger of the similarity of the face feature information and the average value as the comprehensive similarity.
The processor 810 is further configured to determine comprehensive quality information of face features of each of the two person images based on a face angle, a face sharpness, and a face shielding degree of each of the two person images;
normalizing the face feature comprehensive quality information of the two character images, mapping the face feature comprehensive quality information after normalization to a section between a preset minimum attenuation base number and 1, and obtaining mapped face feature comprehensive quality information;
and adjusting the similarity to be adjusted based on the mapped comprehensive quality information of the face features to obtain the image similarity.
The processor 810 is further configured to obtain M approximate person images of the first person image from the person images, where M is a positive integer, in a case where the person similarity between the first person image and the second person image is greater than a first preset threshold, and the approximate person images are person images of the person images having a person similarity with the first person image greater than a second preset threshold;
and under the condition that the average value of the human image similarity between the second human image and the approximate human image of the first human image is larger than a third preset threshold value, controlling the two human images to be clustered to the same human image set, and traversing every two human images in the human images to obtain at least one human image set.
The processor 810 is further configured to input each character image sample into a preset face super-resolution network model, and output face feature information and face quality information of each character image sample;
determining a loss value of face characteristic information and a loss value of face quality information of each person image sample;
adjusting the loss value of the face characteristic information based on the face quality information of each person image sample to obtain the loss value after the face characteristic information is adjusted;
And training the preset face super-resolution network model in a combined optimization mode based on the loss value of the face quality information and the loss value after the face characteristic information is adjusted, so as to obtain the face super-resolution network model.
According to the embodiments of the application, clustering is performed by analyzing the person image similarity between every two person images based on the face feature information, the face quality information and the human body feature information, or based on the face feature information and the face quality information. Because the content and accuracy of face feature information extraction alone are limited, the accuracy of a person image similarity based only on face feature information is low; by combining the face quality information, which can reflect the face quality, and the human body feature information, which can reflect the human body features, a more accurate person image similarity can be obtained, and the more accurate person image similarity can effectively improve the accuracy of clustering.
It should be appreciated that in embodiments of the present application, the input unit 804 may include a graphics processor (Graphics Processing Unit, GPU) 8041 and a microphone 8042, with the graphics processor 8041 processing image data of still pictures or video obtained by an image capturing device (e.g., a camera) in a video capturing mode or an image capturing mode. The display unit 806 may include a display panel 8061, and the display panel 8061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 807 includes at least one of a touch panel 8071 and other input devices 8072. Touch panel 8071, also referred to as a touch screen. The touch panel 8071 may include two parts, a touch detection device and a touch controller. Other input devices 8072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and so forth, which are not described in detail herein.
The memory 809 can be used to store software programs as well as various data. The memory 809 may mainly include a first storage area storing programs or instructions and a second storage area storing data, where the first storage area may store an operating system, and application programs or instructions (such as a sound playing function and an image playing function) required for at least one function, and the like. Further, the memory 809 may include a volatile memory or a nonvolatile memory, or the memory 809 may include both volatile and nonvolatile memories. The nonvolatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (Random Access Memory, RAM), a static RAM (SRAM), a dynamic RAM (DRAM), a synchronous DRAM (SDRAM), a double data rate SDRAM (DDR SDRAM), an enhanced SDRAM (ESDRAM), a synchlink DRAM (SLDRAM), or a direct rambus RAM (DRRAM). The memory 809 in the embodiments of the present application includes, but is not limited to, these and any other suitable types of memory.
The processor 810 may include one or more processing units; optionally, the processor 810 integrates an application processor that primarily processes operations involving an operating system, user interface, application programs, etc., and a modem processor that primarily processes wireless communication signals, such as a baseband processor. It will be appreciated that the modem processor described above may not be integrated into the processor 810.
The embodiment of the present application further provides a readable storage medium, where a program or an instruction is stored, and when the program or the instruction is executed by a processor, the processes of the embodiment of the image clustering method are implemented, and the same technical effects can be achieved, so that repetition is avoided, and no further description is given here.
Wherein the processor is the processor in the electronic device described in the above embodiments. The readable storage medium includes a computer-readable storage medium, such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The embodiment of the application further provides a chip, the chip includes a processor and a communication interface, the communication interface is coupled with the processor, the processor is used for running a program or an instruction, implementing each process of the image clustering method embodiment, and achieving the same technical effect, so as to avoid repetition, and no redundant description is provided herein.
It should be understood that the chip mentioned in the embodiments of the present application may also be referred to as a system-level chip, a system chip, a chip system, or a system-on-chip chip.
The embodiments of the present application provide a computer program product, which is stored in a storage medium, and the program product is executed by at least one processor to implement the respective processes of the embodiments of the image clustering method, and achieve the same technical effects, so that repetition is avoided, and a detailed description is omitted here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Furthermore, it should be noted that the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order shown or discussed, but may also include performing the functions in a substantially simultaneous manner or in an opposite order depending on the functions involved, e.g., the described methods may be performed in an order different from that described, and various steps may also be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solutions of the present application may be embodied essentially or in a part contributing to the prior art in the form of a computer software product stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk), comprising several instructions for causing a terminal (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the methods described in the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those of ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, which are also within the protection of the present application.

Claims (18)

1. An image clustering method, comprising:
respectively carrying out face feature analysis and human body feature analysis on at least two person images, and determining face feature information of each person image, face quality information corresponding to the face feature information and human body feature information;
based on the face feature information, the face quality information and the human feature information, or the face feature information and the face quality information, determining a human image similarity between every two human images in the human images;
and carrying out image clustering on the person images based on the person image similarity to obtain at least one person image set.
2. The image clustering method according to claim 1, wherein the determining of the person image similarity between each two person images in the person images based on the face feature information, the face quality information, and the human feature information, or the face feature information and the face quality information, comprises:
determining the human image similarity based on the human face characteristic information of the two human images, the human face quality information of the two human images and the human body characteristic information of the two human images under the condition that the two human images meet the preset conditions;
Determining the human image similarity based on the human face characteristic information of the two human images and the human face quality information of the two human images under the condition that the two human images do not meet the preset conditions;
wherein, the preset conditions include: the human body characteristic information of the two person images is not empty, and the shooting time of the two person images is within the same preset time period.
3. The image clustering method according to claim 2, wherein the determining the person image similarity based on the face feature information of the two person images and the face quality information of the two person images in the case where the two person images do not satisfy the preset condition includes:
under the condition that the two person images do not meet the preset conditions, determining the similarity of the face feature information based on the face feature information of the two person images to serve as the similarity to be adjusted;
and adjusting the similarity to be adjusted based on the face quality information of the two character images to obtain the image similarity.
4. The image clustering method according to claim 2, wherein the determining the person image similarity based on the face feature information of the two person images, the face quality information of the two person images, and the person feature information of the two person images in the case where the two person images satisfy a preset condition includes:
Under the condition that the two person images meet the preset conditions, determining the comprehensive similarity of the face feature information and the human feature information as the similarity to be adjusted based on the face feature information of the two person images and the human feature information of the two person images;
and adjusting the similarity to be adjusted based on the face quality information of the two character images to obtain the image similarity.
5. The image clustering method according to claim 4, wherein the determining the integrated similarity of the face feature information and the body feature information based on the face feature information of the two person images and the body feature information of the two person images includes:
determining the similarity of the face feature information based on the face feature information of the two person images;
determining the similarity of the human body characteristic information based on the human body characteristic information of the two human body images;
averaging the similarity of the face feature information and the similarity of the human feature information to obtain an average value;
and determining the maximum value obtained by comparing the similarity of the face characteristic information with the average value as the comprehensive similarity.
6. The image clustering method according to claim 3 or 4, wherein the face quality information includes a face angle, a face definition, and a face shielding degree, the adjusting the similarity to be adjusted based on the face quality information of the two person images, to obtain the image similarity, includes:
determining the comprehensive quality information of the face features of each person image based on the face angle, the face definition and the face shielding degree of each person image in the two person images;
normalizing the face feature comprehensive quality information of the two character images, mapping the face feature comprehensive quality information after normalization to a section between a preset minimum attenuation base number and 1, and obtaining mapped face feature comprehensive quality information;
and adjusting the similarity to be adjusted based on the mapped comprehensive quality information of the face features to obtain the image similarity.
7. The image clustering method according to any one of claims 1 to 5, wherein each two person images includes a first person image and a second person image, wherein image clustering is performed on the person images based on the person image similarity to obtain at least one person image set, including:
Acquiring M approximate person images of the first person image from the person images under the condition that the person image similarity between the first person image and the second person image is larger than a first preset threshold value, wherein M is a positive integer, and the approximate person images are person images of which the person image similarity with the first person image is larger than a second preset threshold value;
and under the condition that the average value of the human image similarity between the second human image and the approximate human image of the first human image is larger than a third preset threshold value, controlling the two human images to be clustered to the same human image set, and traversing every two human images in the human images to obtain at least one human image set.
8. The image clustering method according to any one of claims 1 to 5, wherein the face feature information and the face quality information of the person images are obtained by inputting the person images into a face super-resolution network model and are output by the face super-resolution network model, and the face super-resolution network model is obtained, before the face feature analysis and the human body feature analysis are respectively performed on the at least two person images, in the following manner:
Inputting each character image sample into a preset face super-resolution network model, and outputting face characteristic information and face quality information of each character image sample;
determining a loss value of face characteristic information and a loss value of face quality information of each person image sample;
adjusting the loss value of the face characteristic information based on the face quality information of each person image sample to obtain the loss value after the face characteristic information is adjusted;
and training the preset face super-resolution network model in a combined optimization mode based on the loss value of the face quality information and the loss value after the face characteristic information is adjusted, so as to obtain the face super-resolution network model.
9. An image clustering apparatus, comprising:
the analysis module is used for respectively carrying out face feature analysis and human body feature analysis on at least two person images and determining face feature information of each person image, face quality information corresponding to the face feature information and human body feature information;
the determining module is used for determining the human image similarity between every two human images in the human images based on the human face characteristic information, the human face quality information and the human body characteristic information or the human face characteristic information and the human face quality information;
And the clustering module is used for carrying out image clustering on the person images based on the person image similarity to obtain at least one person image set.
10. The image clustering device of claim 9, wherein the determining module is specifically configured to:
determining the human image similarity based on the human face characteristic information of the two human images, the human face quality information of the two human images and the human body characteristic information of the two human images under the condition that the two human images meet the preset conditions;
determining the human image similarity based on the human face characteristic information of the two human images and the human face quality information of the two human images under the condition that the two human images do not meet the preset conditions;
wherein, the preset conditions include: the human body characteristic information of the two person images is not empty, and the shooting time of the two person images is within the same preset time period.
11. The image clustering device of claim 10, wherein the determining module is specifically configured to:
under the condition that the two person images do not meet the preset conditions, determining the similarity of the face feature information based on the face feature information of the two person images to serve as the similarity to be adjusted;
And adjusting the similarity to be adjusted based on the face quality information of the two character images to obtain the image similarity.
12. The image clustering device of claim 10, wherein the determining module is specifically configured to:
under the condition that the two person images meet the preset conditions, determining the comprehensive similarity of the face feature information and the human feature information as the similarity to be adjusted based on the face feature information of the two person images and the human feature information of the two person images;
and adjusting the similarity to be adjusted based on the face quality information of the two character images to obtain the image similarity.
13. The image clustering apparatus of claim 12, wherein the determining module is specifically configured to:
determining the similarity of the face feature information based on the face feature information of the two person images;
determining the similarity of the human body characteristic information based on the human body characteristic information of the two human body images;
averaging the similarity of the face feature information and the similarity of the human feature information to obtain an average value;
And determining the maximum value obtained by comparing the similarity of the face characteristic information with the average value as the comprehensive similarity.
14. The image clustering apparatus according to claim 11 or 12, wherein the face quality information includes a face angle, a face sharpness, and a face shielding degree, and the determining module is specifically configured to:
determining the comprehensive quality information of the face features of each person image based on the face angle, the face definition and the face shielding degree of each person image in the two person images;
normalizing the face feature comprehensive quality information of the two character images, mapping the face feature comprehensive quality information after normalization to a section between a preset minimum attenuation base number and 1, and obtaining mapped face feature comprehensive quality information;
and adjusting the similarity to be adjusted based on the mapped comprehensive quality information of the face features to obtain the image similarity.
15. The image clustering apparatus of any one of claims 9-13, wherein each two person images comprises a first person image and a second person image, the clustering module being specifically configured to:
Acquiring M approximate person images of the first person image from the person images under the condition that the person image similarity between the first person image and the second person image is larger than a first preset threshold value, wherein M is a positive integer, and the approximate person images are person images of which the person image similarity with the first person image is larger than a second preset threshold value;
and under the condition that the average value of the human image similarity between the second human image and the approximate human image of the first human image is larger than a third preset threshold value, controlling the two human images to be clustered to the same human image set, and traversing every two human images in the human images to obtain at least one human image set.
16. The image clustering apparatus according to any one of claims 9 to 13, wherein the face feature information and the face quality information of the person images are obtained by inputting the person images into a face super-resolution network model and are output by the face super-resolution network model, and the face super-resolution network model is obtained, before the face feature analysis and the human body feature analysis are respectively performed on the at least two person images, in the following manner:
Inputting each character image sample into a preset face super-resolution network model, and outputting face characteristic information and face quality information of each character image sample;
determining a loss value of face characteristic information and a loss value of face quality information of each person image sample;
adjusting the loss value of the face characteristic information based on the face quality information of each person image sample to obtain the loss value after the face characteristic information is adjusted;
and training the preset face super-resolution network model in a combined optimization mode based on the loss value of the face quality information and the loss value after the face characteristic information is adjusted, so as to obtain the face super-resolution network model.
17. An electronic device comprising a processor and a memory storing a program or instructions executable on the processor, which when executed by the processor, implement the steps of the image clustering method of any one of claims 1-8.
18. A readable storage medium, characterized in that the readable storage medium has stored thereon a program or instructions which, when executed by a processor, implement the steps of the image clustering method according to any one of claims 1-8.
CN202310507757.9A 2023-05-06 2023-05-06 Image clustering method and device, electronic equipment and storage medium Pending CN116563588A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310507757.9A CN116563588A (en) 2023-05-06 2023-05-06 Image clustering method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310507757.9A CN116563588A (en) 2023-05-06 2023-05-06 Image clustering method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116563588A true CN116563588A (en) 2023-08-08

Family

ID=87499498

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310507757.9A Pending CN116563588A (en) 2023-05-06 2023-05-06 Image clustering method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116563588A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination