CN115719428A - Face image clustering method, device, equipment and medium based on classification model


Info

Publication number
CN115719428A
Authority
CN
China
Prior art keywords: human body, image, similarity, images, face
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211418963.4A
Other languages
Chinese (zh)
Inventor
邢玲
王爱波
杨一帆
余晓填
王孝宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Lifei Software Technology Co ltd
Shenzhen Intellifusion Technologies Co Ltd
Original Assignee
Hangzhou Lifei Software Technology Co ltd
Shenzhen Intellifusion Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Lifei Software Technology Co ltd, Shenzhen Intellifusion Technologies Co Ltd filed Critical Hangzhou Lifei Software Technology Co ltd
Priority to CN202211418963.4A
Publication of CN115719428A
Pending legal-status Critical Current

Abstract

The invention relates to the technical field of artificial intelligence, and in particular to a face image clustering method, device, equipment and medium based on a classification model. The method takes face images as nodes, constructs an edge between any two nodes according to their similarity to form node graph data; for the human body image to which any face image belongs, it calculates the similarity between that human body image and the other human body images and screens out target images according to the calculation results. The similarity between the human body image and a target image, together with the acquisition information of both images, is input into a classification model, the node graph data is updated according to the classification result, and the updated data is clustered. Using the acquisition information as classification-model input makes the classification result more accurate, while the similarity between human body images provides additional information that assists in updating the node graph data, improving the accuracy of the cluster analysis based on the updated node graph data and thus the accuracy of face image clustering.

Description

Face image clustering method, device, equipment and medium based on classification model
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a face image clustering method, a face image clustering device, face image clustering equipment and a face image clustering medium based on a classification model.
Background
At present, with the development of artificial intelligence technology, face clustering has been widely applied in scenarios such as intelligent communities and intelligent security, where it can provide accurate cluster information for tasks such as face recognition and pedestrian re-identification.
However, in actual face collection, factors such as image blur and differences in capture angle may make the features of two face images of the same person dissimilar, so that cluster analysis produces a situation of "one person, multiple records"; meanwhile, factors such as facial occlusion may make the features of two face images of different persons similar, so that cluster analysis produces a situation of "multiple persons, one record". Both greatly reduce the accuracy of face image clustering, and how to improve that accuracy has therefore become an urgent problem.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method, an apparatus, a device, and a medium for clustering face images based on a classification model, so as to solve the problem of low accuracy of face image clustering.
In a first aspect, an embodiment of the present invention provides a face image clustering method based on a classification model, where the face image clustering method includes:
acquiring N face images and the face features of each face image, taking each face image as a node, calculating a first similarity between the face features of any two nodes, and constructing an edge between the two corresponding nodes according to the first similarity to form node graph data, where N is an integer greater than one;
acquiring the human body image to which each face image belongs, calculating, for any human body image, second similarities between that human body image and the other human body images, and screening out, from all the other human body images, target images whose second similarity is greater than a preset similarity threshold;
acquiring the acquisition information of the human body image and of the target image, splicing the second similarity between the human body image and the target image with the acquisition information of both images, inputting the splicing result into a trained classification model to obtain a classification result, and updating the node graph data according to the classification result;
and performing cluster analysis on the updated node graph data with a preset clustering algorithm to obtain a clustering result for the N face images.
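The splicing step of the first aspect can be sketched as follows. This is a minimal sketch under stated assumptions: the patent does not fix a concrete encoding of the acquisition information, so the `camera_id` and `timestamp` fields are hypothetical placeholders.

```python
def build_classifier_input(second_similarity, body_info, target_info):
    # Concatenate ("splice") the second similarity with the acquisition
    # information of the human body image and of the target image into one
    # flat feature vector for the trained classification model.
    # camera_id / timestamp are illustrative assumptions, not from the patent.
    return ([second_similarity]
            + [body_info["camera_id"], body_info["timestamp"]]
            + [target_info["camera_id"], target_info["timestamp"]])

x = build_classifier_input(
    0.83,
    {"camera_id": 3, "timestamp": 1668412800.0},
    {"camera_id": 7, "timestamp": 1668412835.0},
)
# x is a 5-dimensional vector fed to the classifier
```

The classifier then outputs whether the two human body images belong to the same person, and the node graph data is updated accordingly.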
In a second aspect, an embodiment of the present invention provides a face image clustering device based on a classification model, where the face image clustering device includes:
a graph construction module, configured to acquire N face images and the face features of each face image, take each face image as a node, calculate a first similarity between the face features of any two nodes, and construct an edge between the two corresponding nodes according to the first similarity to form node graph data, where N is an integer greater than one;
a similarity calculation module, configured to acquire the human body image to which each face image belongs, calculate, for any human body image, second similarities between that human body image and the other human body images, and screen out, from all the other human body images, target images whose second similarity is greater than a preset similarity threshold;
a graph updating module, configured to acquire the acquisition information of the human body image and of the target image, splice the second similarity between the human body image and the target image with the acquisition information of both images, input the splicing result into a trained classification model to obtain a classification result, and update the node graph data according to the classification result;
and a cluster analysis module, configured to perform cluster analysis on the updated node graph data with a preset clustering algorithm to obtain a clustering result for the N face images.
In a third aspect, an embodiment of the present invention provides a computer device, where the computer device includes a processor, a memory, and a computer program stored in the memory and executable on the processor, and the processor, when executing the computer program, implements the face image clustering method according to the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored, and when being executed by a processor, the computer program implements the face image clustering method according to the first aspect.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
the method comprises the steps of obtaining N face images and face characteristics of each face image, calculating first similarity between the face characteristics of any two nodes by taking each face image as a node, constructing an edge between the two corresponding nodes according to the first similarity, forming node graph data, obtaining a human body image to which each face image belongs, calculating second similarity between the human body image and other human body images aiming at any human body image, screening out a target image meeting the second similarity and larger than a preset similarity threshold value from all other human body images, obtaining acquisition information of the human body image and acquisition information of the target image, splicing the second similarity between the human body image and the target image and the acquisition information of the human body image and the acquisition information of the target image, inputting the spliced result into a trained classification model to obtain a classification result, updating node graph data according to the classification result, analyzing the updated node graph data by adopting a preset clustering algorithm to obtain the clustering results of the N face images, using the human body image, the target image and the acquisition information as input of the classification model to enable the classification result of the node graph data of the point graph to be more accurate, and improving the accuracy of the clustering of the node graph data of the updated by taking the image as the input of the classification model, thereby improving the accuracy of the node graph data of the new face image.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic view of an application environment of a face image clustering method based on a classification model according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of a face image clustering method based on a classification model according to an embodiment of the present invention;
fig. 3 is a schematic view illustrating a flow of updating node map data in a face image clustering method based on a classification model according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a face image clustering device based on a classification model according to a second embodiment of the present invention;
fig. 5 is a schematic structural diagram of a computer device according to a third embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
Furthermore, in the description of the present invention and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present invention. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
The embodiments of the present invention may acquire and process related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
It should be understood that, the sequence numbers of the steps in the following embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
The face image clustering method based on a classification model provided by the embodiment of the invention can be applied in the application environment shown in fig. 1, where a client communicates with a server. The client includes, but is not limited to, palmtop computers, desktop computers, notebook computers, ultra-mobile personal computers (UMPC), netbooks, cloud terminal devices, personal digital assistants (PDA) and other computer devices. The server may be an independent server, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a Content Delivery Network (CDN), and big data and artificial intelligence platforms. The client can be deployed in application scenarios such as face recognition, image classification, intelligent security and product recommendation, and provides face image cluster information for tasks in those scenarios. In general, the cluster information can improve the efficiency of task execution, or serve as verification information to improve its accuracy. For example, in a face recognition scenario, if a face image is found in the known face image cluster information during recognition, the person corresponding to that image can be determined directly from the cluster information without running inference with a recognition model, which improves the efficiency of the face recognition task. Alternatively, after a face image has been identified by the recognition model, whether the cluster information of the identified person contains that face image can be checked; the cluster information thus serves as verification information, avoiding misjudgment and improving the accuracy of the face recognition task.
Referring to fig. 2, a schematic flow chart of a face image clustering method based on a classification model according to an embodiment of the present invention, the method may be applied to the client in fig. 1. The computer device corresponding to the client is connected to a server to obtain from it N face images, the face features of each face image, the human body image to which each face image belongs, and the acquisition information of each human body image; the N face images are the face images to be clustered. A trained classification model is deployed on the computer device corresponding to the client and may be used to classify whether two human body images belong to the same person. As shown in fig. 2, the face image clustering method may include the following steps:
step S201, obtaining N face images and the face characteristics of each face image, calculating a first similarity between the face characteristics of any two nodes by taking each face image as a node, and constructing an edge between the two corresponding nodes according to the first similarity to form node map data.
The face image can be obtained by face extraction from a snapshot image; the snapshot image can be collected by image acquisition equipment such as a surveillance camera or a surveillance video recorder; the face features can be extracted from the face image by a face feature extraction model; and N is an integer greater than one.
A node refers to a graph node, and the data contained in one graph node is a face image. The first similarity represents the feature similarity between the face features of two nodes, and an edge refers to a connection relationship between two nodes. The node graph data comprises the nodes and edges, and is a graph-structure representation of the N face images and the connections between them.
Specifically, image acquisition devices such as surveillance cameras and surveillance video recorders can be deployed in specific application scenes, for example, intelligent security scenes, and the image acquisition devices can acquire snapshot images according to a fixed sampling frequency.
For a collected snapshot image, face extraction can be performed with a trained first target detection model, which can be implemented with a YOLO model, a Mask-RCNN model, or the like. The output of the model can be the bounding box of the face region, represented by the top-left and bottom-right coordinate points of the box. A first cropping region can be determined in the snapshot image according to these two coordinate points, and the snapshot image is cropped accordingly to obtain the face image. It should be noted that size normalization is performed on the cropped face image, so that face images of different sizes can subsequently be processed in the same way.
The face feature extraction model can adopt the encoder part of a trained face recognition model; the encoder generally comprises convolution layers, which can be used for feature aggregation, and pooling layers, which can be used for feature dimension reduction.
The first similarity may be calculated with cosine similarity, Euclidean distance, Manhattan distance, or the like. In this embodiment, cosine similarity is adopted, and the value range of the first similarity is [0,1]: the closer the first similarity is to 1, the more similar the face features of the two nodes are; the closer it is to 0, the more dissimilar they are.
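As a sketch of the first-similarity computation: raw cosine similarity lies in [-1, 1], so mapping it by (s + 1) / 2 is one assumed normalization that yields the [0, 1] range the embodiment describes; the patent only states the resulting range, not the mapping.

```python
import math

def first_similarity(a, b):
    # Cosine similarity of two face feature vectors, mapped from
    # [-1, 1] to [0, 1].  The (s + 1) / 2 mapping is an assumption.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return (dot / norm + 1.0) / 2.0
```

With this mapping, identical feature vectors score 1.0, orthogonal vectors 0.5, and opposite vectors 0.0.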
Optionally, constructing an edge between the two corresponding nodes according to the first similarity includes:
comparing the first similarity with a preset connection threshold value to obtain a comparison result;
if the comparison result is that the first similarity is greater than the connection threshold, constructing an edge between the two corresponding nodes;
and if the comparison result is that the first similarity is smaller than or equal to the connection threshold, constructing edges between the two corresponding nodes is not performed.
For example, when the first similarity is calculated with cosine similarity, its value range is [0,1], and the closer the first similarity is to 1, the more similar the face features of the two nodes are. The connection threshold may therefore be set to 0.7, so that only two nodes whose face features are sufficiently similar are connected by an edge.
Specifically, if the comparison result shows that the first similarity is greater than the connection threshold, the two face features used to calculate it are sufficiently similar, and the face images corresponding to them can be considered to belong to the same person. An edge is therefore constructed between the nodes corresponding to the two face images, which reduces the occurrence of "one person, multiple records" in subsequent clustering and improves the accuracy of face image clustering.
If the comparison result shows that the first similarity is less than or equal to the connection threshold, the two face features used to calculate it are not similar, and the corresponding face images can be considered not to belong to the same person. The corresponding nodes are therefore not connected, which reduces the occurrence of "multiple persons, one record" in subsequent clustering and improves the accuracy of face image clustering.
In this embodiment, the connection relationship between the two nodes corresponding to the two face features is determined according to the comparison of the first similarity with the connection threshold. This improves the representation capability of the constructed node graph data, which then contains the similarity information between the face features of the nodes, facilitates an accurate subsequent clustering process, avoids the situations of "one person, multiple records" and "multiple persons, one record", and can effectively improve the accuracy of face image clustering.
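The edge-construction rule above can be sketched as follows, given a precomputed matrix of first similarities (assumed already normalized to [0, 1]); the dictionary layout of the node graph data is an illustrative choice, not fixed by the patent.

```python
def build_node_graph(similarity, connection_threshold=0.7):
    # One graph node per face image; an undirected edge joins nodes
    # i and j only when similarity[i][j] exceeds the connection
    # threshold (0.7 in this embodiment).
    n = len(similarity)
    edges = {(i, j)
             for i in range(n) for j in range(i + 1, n)
             if similarity[i][j] > connection_threshold}
    return {"nodes": list(range(n)), "edges": edges}

sim = [[1.0, 0.95, 0.31],
       [0.95, 1.0, 0.42],
       [0.31, 0.42, 1.0]]
graph = build_node_graph(sim)  # only nodes 0 and 1 are connected
```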
N face images and the face features of each face image are obtained; taking each face image as a node, the first similarity between the face features of any two nodes is calculated, and an edge is constructed between the two corresponding nodes according to the first similarity to form node graph data. Constructing the node graph data from the face images and their features represents the N face images in a structured form, which makes the connections between nodes easy to discover quickly and convenient to optimize, update and modify. It avoids the need for large storage space and heavy computing resources when the number of face images is large and their relationships are complex, and can effectively improve the convenience of face image clustering.
Step S202, obtaining the human body image to which each face image belongs, calculating, for any human body image, second similarities between that human body image and the other human body images, and screening out, from all the other human body images, target images whose second similarity is greater than a preset similarity threshold.
The human body image can be the image, within the snapshot image to which a face image belongs, of the person to whom that face belongs. The second similarity represents the degree of similarity between human body images; the similarity threshold is used to judge whether two human body images are similar enough, that is, whether they may belong to the same person; and a target image is one of the other human body images that may belong to the same person as the targeted human body image.
Specifically, since a face image is extracted from a snapshot image, each face image corresponds to one snapshot image, the snapshot image to which it belongs. Human body extraction is performed on that snapshot image with a trained second target detection model, which can adopt the same architecture as the trained first target detection model, such as a YOLO model or a Mask-RCNN model. The output of the model can be the bounding box of a human body region, represented by the top-left and bottom-right coordinate points of the box. A second cropping region can be determined in the snapshot image according to these two coordinate points, and the snapshot image is cropped accordingly to obtain the human body image. The obtained human body image also needs size normalization so that different human body images can subsequently be processed in the same way.
Because a snapshot image may contain several people, human body extraction may yield more than one human body image, and the human body image to which the face image belongs must then be determined. Specifically, the intersection-over-union (IoU) between the bounding box of the face region and the bounding box of each human body region is calculated: the area of the intersection of the two boxes is divided by the area of their union, and the resulting ratio is the IoU.
The bounding box of the human body region with the maximum IoU is determined among all the IoU values, and the human body image is cropped from the snapshot image according to that bounding box; this is the human body image to which the face image belongs.
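The IoU matching described above can be sketched as follows; boxes are assumed to be (x1, y1, x2, y2) tuples of top-left and bottom-right coordinates, matching the two corner points the embodiment uses.

```python
def iou(box_a, box_b):
    # Intersection-over-union of two axis-aligned bounding boxes
    # given as (x1, y1, x2, y2) corner coordinates.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def match_body_box(face_box, body_boxes):
    # The body the face belongs to is the body bounding box whose IoU
    # with the face bounding box is maximal.
    return max(body_boxes, key=lambda b: iou(face_box, b))

face = (40, 20, 60, 40)
bodies = [(30, 10, 70, 90), (100, 10, 140, 90)]
```

Here the face box lies entirely inside the first body box, so `match_body_box(face, bodies)` selects it; the second body box does not overlap the face at all.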
The human body image to which each face image belongs can be obtained in this way. Since N face images are obtained, there are N human body images, so for any human body image there are N-1 other human body images; similarity calculation between the targeted human body image and each of the N-1 others yields N-1 second similarities.
The N-1 second similarities are each compared with a preset similarity threshold, and those greater than the threshold are screened out; all the human body images corresponding to second similarities greater than the threshold are target images. In this embodiment, the preset similarity threshold can be set to 0.6; correspondingly, the second similarities need to be normalized after calculation to ensure their value range is [0,1].
If the comparison result is that the second similarity is smaller than or equal to the similarity threshold, it indicates that the two human body images used for calculating the second similarity are not similar, and at this time, the two human body images used for calculating the second similarity are considered not to belong to the same person.
If the second similarity is greater than the similarity threshold, the two human body images used to calculate it are sufficiently similar and may belong to the same person; the other human body images that may belong to the same person as the targeted human body image are determined as target images.
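The fixed-threshold screening of this embodiment can be sketched as follows (similarities assumed already normalized to [0, 1]; the list-of-pairs layout is an illustrative choice):

```python
def screen_target_images(second_similarities, similarity_threshold=0.6):
    # second_similarities: list of (other_image_index, second similarity).
    # Only images whose second similarity strictly exceeds the preset
    # threshold (0.6 in this embodiment) are kept as target images.
    return [idx for idx, s in second_similarities if s > similarity_threshold]

targets = screen_target_images([(0, 0.72), (1, 0.41), (2, 0.66), (3, 0.60)])
# index 3 is excluded: 0.60 is not greater than the threshold
```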
Optionally, for any human body image, calculating the second similarity between the human body image and the other human body images includes:
inputting each human body image into a trained feature extraction model for feature extraction to obtain the human body features of each corresponding human body image;
and respectively carrying out similarity calculation on the human body characteristics of the human body image and the human body characteristics of other human body images aiming at any human body image to obtain a second similarity between the human body image and the other human body images.
The trained feature extraction model can be used to extract features from a human body image; it can adopt the encoder structure of the trained target detection model, takes a human body image as input, and outputs the human body features of that image, which represent its information.
Specifically, when similarity is computed directly on human body images, low-level information in the images differs due to factors such as image noise, which biases the similarity calculation result.
In this embodiment, the similarity between human body features may be computed with cosine similarity, Euclidean distance, Manhattan distance, and the like.
Performing the similarity calculation on the human body features corresponding to the images, rather than on the images themselves, ignores the images' noise information, which improves both the accuracy of the similarity calculation and the representational power of its result.
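Of the distance options listed above, cosine similarity is the one most naturally mapped into the required [0, 1] range. The sketch below is illustrative only; the helper names and the simple (x + 1) / 2 normalization are assumptions, since the patent does not fix a normalization formula.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two body-feature vectors, in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def second_similarity(u, v):
    # Map cosine similarity from [-1, 1] into [0, 1], matching the
    # normalization requirement stated in this embodiment.
    return (cosine_similarity(u, v) + 1.0) / 2.0
```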
Optionally, the step of screening out a target image satisfying that the second similarity is greater than a preset similarity threshold from all other human body images includes:
calculating a similarity threshold according to all the second similarities, and comparing each second similarity with the similarity threshold respectively;
and determining other human body images corresponding to the second similarity larger than the similarity threshold value as target images.
The similarity threshold is determined from all of the actually obtained second similarities; it may be computed as, for example, their mean or median.
Specifically, in this embodiment all second similarities are sorted in descending order to obtain a second similarity sequence, and the value of the (K+1)-th element of that sequence is taken as the similarity threshold; here the value of K is set to 5.
The number of second similarities greater than the threshold is then K, that is, the number of target images is K.
Determining the similarity threshold by this dynamic adjustment keeps the number of target images stable at K. This both reduces the amount of computation and avoids the problem of a fixed threshold yielding too few target images, in which case target images of the same person would be incompletely extracted and omitted; the accuracy of extracting a person's target images is thereby ensured.
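The dynamic top-K threshold described above can be sketched as follows. This is an illustrative reading of the embodiment, not its literal code; the handling of the case with fewer than K+1 candidates is an assumption, and ties at the boundary would admit more than K targets.

```python
def dynamic_threshold(second_sims, k=5):
    """Sort the second similarities in descending order and return the
    (K+1)-th value as the threshold, so exactly the top K similarities
    exceed it (assuming no ties at the boundary)."""
    ordered = sorted(second_sims, reverse=True)
    if len(ordered) <= k:
        # Fewer than K+1 candidates: choose a threshold below them all.
        return min(ordered) - 1.0 if ordered else 0.0
    return ordered[k]  # 0-based index k is the (K+1)-th element
```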
By acquiring the human body image to which each face image belongs, calculating, for any human body image, the second similarity between it and the other human body images, and screening out from all other human body images the target images whose second similarity exceeds the preset similarity threshold, the target images corresponding to a human body image — that is, the images that may belong to the same person as it — are determined by comparing the second similarities against the threshold. This makes the subsequent judgment based on the target images and the human body image convenient, so that the node map data can be updated and the accuracy of its information representation improved.
Step S203, acquiring the acquisition information of the human body image and the acquisition information of the target image, splicing the second similarity between the human body image and the target image with both sets of acquisition information, inputting the splicing result into the trained classification model to obtain a classification result, and updating the node map data according to the classification result.
The acquisition information may include acquisition time information and acquisition location information, that is, the time point at which the image acquisition device captured the human body image or target image, and the position of that device in the world coordinate system.
The splicing mode may be concatenation. The trained classification model includes an encoder layer and a fully connected layer: the encoder layer is used to extract features from the input, and the fully connected layer maps those features into the classification space. In this embodiment, the classification result includes two categories: the first is the same-person category, which indicates that the input human body image and target image belong to the same person, and the second is the non-same-person category, which indicates that they do not.
The node map data may be updated according to the classification result by adding and deleting edges in the node map data.
Specifically, the time difference T between the acquisition time of the human body image and that of the target image is calculated, as is the distance D between their acquisition locations. The second similarity α between the human body image and the target image, the time difference T, and the distance D are then spliced; the resulting splicing result — the input data of the trained classification model — can be represented as [α, T, D]. Inputting it into the trained classification model yields the classification result.
In an embodiment, the second similarity α between the human body image and the target image, the acquisition time t1 and acquisition location d1 of the human body image, and the acquisition time t2 and acquisition location d2 of the target image may also be spliced directly; the splicing result can be expressed as [α, t1, d1, t2, d2].
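Building the [α, T, D] splicing result can be sketched as below. This is a hedged example: the function name, timestamps in seconds, and Euclidean distance between world-coordinate points are assumptions — the patent does not fix the time unit or the distance metric.

```python
import math

def classifier_input(alpha, t1, loc1, t2, loc2):
    """Build the [alpha, T, D] splicing result: second similarity,
    absolute acquisition-time difference, and straight-line distance
    between the two acquisition locations (world-coordinate points)."""
    T = abs(t2 - t1)
    D = math.dist(loc1, loc2)
    return [alpha, T, D]
```

The alternative splicing [α, t1, d1, t2, d2] would simply concatenate the raw fields instead of differencing them, leaving the model to learn the time/space relationship itself.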
Optionally, the updating the node map data according to the classification result includes:
when the classification result is detected to be a preset first category, whether an edge exists between a node corresponding to the human body image and a node corresponding to the target image is detected in the node map data;
and if no edge exists between the node corresponding to the human body image and the node corresponding to the target image, constructing an edge between the node corresponding to the human body image and the node corresponding to the target image to obtain updated node map data.
The preset first category may refer to the same person category, that is, the input human body image and the target image belong to the same person, and the updated node map data may refer to node map data subjected to edge adding processing.
Specifically, when the classification result is detected to be the preset first category, the human body image and the target image are considered at this point to belong to the same person. Whether an edge exists between the node corresponding to the human body image and the node corresponding to the target image is then detected; if no edge exists, the face images were considered not to belong to the same person when the node map data was formed based on the similarity between face images.
A face image may be unclear due to factors such as pose and illumination conditions when the image acquisition device captures the snapshot, and the features of a face image concentrate in detail features such as the eyebrows, eyes, nose, and lips. In a face recognition application scenario, the photographed person generally cannot be assumed to hold a fixed pose for the device to capture; instead, the device takes snapshots during the person's continuous motion, so the detail features may be occluded or blurred, and the computed face image similarity carries error.
Meanwhile, the photographed person's facial region may be occluded — for example by a mask or glasses — which further prevents the facial detail features from being recognized and likewise introduces error into the face image similarity.
Therefore, the similarity calculated from face images has low reliability, due both to objective environmental factors of the application scenario and to subjective factors of the photographed person, and this affects the accuracy of the node map data's representation. Human body features, by contrast, concentrate in salient characteristics — the type and color of the person's clothing, the distribution of the person's body key points, and the articles the person carries — which are generally considered not to change within a certain period of time. The node map data can therefore be updated in an auxiliary manner based on the classification result of the human body image and the target image.
When the classification result is detected to be the preset first category and no edge exists between the node corresponding to the human body image and the node corresponding to the target image, the face image similarity calculation result may be untrustworthy; an edge is therefore constructed between the two nodes — that is, the node map data is updated by adding an edge — to obtain the updated node map data.
In this embodiment, when the classification result is detected to be the preset first category and no edge exists between the node corresponding to the human body image and the node corresponding to the target image, an edge is constructed between them. This avoids missed edges in the node map data that arise when the similarity computed from face images is low due to objective scene factors and subjective factors of the photographed person, such as a large deviation in shooting angle — missed edges that would otherwise split one person across multiple records ("one person, multiple files"). The representational accuracy of the node map data is thereby improved.
Optionally, the updating the node map data according to the classification result includes:
when the classification result is detected to be a preset second category, whether an edge exists between a node corresponding to the human body image and a node corresponding to the target image is detected in the node map data;
and if the situation that edges exist between the nodes corresponding to the human body images and the nodes corresponding to the target images is detected, deleting the edges between the nodes corresponding to the human body images and the nodes corresponding to the target images to obtain updated node map data.
The preset second category may refer to a non-identical person category, that is, the input human body image and the target image do not belong to the same person, and the updated node map data may refer to node map data subjected to edge deletion processing.
Specifically, when the classification result is detected to be the preset second category, the human body image and the target image are considered at this point not to belong to the same person. Whether an edge exists between the node corresponding to the human body image and the node corresponding to the target image is then detected; if an edge exists, the face images were considered to belong to the same person when the node map data was formed based on the similarity between face images.
And deleting the edges between the nodes corresponding to the human body images and the nodes corresponding to the target images, namely updating the node map data in an edge deletion updating mode to obtain the updated node map data.
In this embodiment, when the classification result is detected to be the preset second category and an edge exists between the node corresponding to the human body image and the node corresponding to the target image, that edge is deleted. This avoids wrong edges in the node map data that arise when the similarity computed from face images is spuriously high due to objective scene factors and subjective factors of the photographed person, such as image blur or a worn mask — wrong edges that would otherwise merge multiple people into one record ("multiple people, one file"). The representational accuracy of the node map data is thereby improved.
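The two edge-update branches — add a missing edge for the same-person category, delete an existing edge for the non-same-person category — can be sketched together. This is an illustration under an assumed representation of the node map as a set of undirected edges; the function name is invented for the example.

```python
def update_node_map(edges, body_node, target_node, same_person):
    """Update the node map's edge set from one classification result:
    first category (same person)  -> add the edge if it is missing;
    second category (different)   -> delete the edge if it exists."""
    edge = frozenset((body_node, target_node))
    if same_person:
        edges.add(edge)       # no-op if the edge is already present
    else:
        edges.discard(edge)   # no-op if the edge does not exist
    return edges
```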
Optionally, the splicing result of the sample similarity and the sample acquisition information is used as a training sample of the classification model, the sample category is used as a training label of the classification model, the binary cross entropy loss is used as a loss function during the training of the classification model, the sample similarity is used for representing the similarity between two human body images of the sample, and the sample acquisition information comprises sample acquisition time and sample acquisition place;
the training process of the classification model comprises the following steps:
calculating the difference value of the sample acquisition time corresponding to the two sample human body images to obtain a sample interval;
comparing sample acquisition places corresponding to the two sample human body images to obtain a place comparison result;
splicing the sample similarity, the sample interval and the place comparison result, and inputting the spliced sample similarity, sample interval and place comparison result into a classification model to obtain a sample classification result;
and calculating the classification loss according to the sample classification result, the sample category and the binary cross entropy loss, and training the classification model by taking the classification loss as a basis to obtain the trained classification model.
The sample human body image can be a human body image used for classification model training, the sample acquisition information is time information and location information when the sample human body image is acquired, the sample category is a pre-labeled category, and the sample category comprises the same person category and a non-same person category.
Specifically, in the present embodiment, 2500 pairs of positive samples (two sample human body images of the same-person category) and 2500 pairs of negative samples (a pair of sample human body images of the non-same-person category) are annotated. Meanwhile, a test set containing no fewer than 1000 positive and negative sample pairs in total is constructed, so that the generalization performance of the model can be checked quantitatively.
If the classification model is denoted f, the mathematical relationship is y = f(X), where X represents the splicing result and y represents the sample classification result. In this embodiment, the classification model f may also adopt a machine learning model such as a random forest or a support vector machine.
In the embodiment, the classification model is trained by adopting a plurality of groups of positive and negative samples, so that the accuracy of model fitting is improved, and the generalization capability and accuracy of the trained classification model are further ensured.
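A minimal stand-in for the training loop above can be sketched with logistic regression trained under binary cross-entropy, the loss the embodiment names. This replaces the encoder + fully connected layer with a single linear layer purely for illustration — an assumption, not the patented architecture — and the sample values are invented.

```python
import math

def train_classifier(samples, labels, lr=0.5, epochs=300):
    """Tiny logistic-regression stand-in for the classifier: input is a
    spliced [similarity, interval, location-comparison] vector, loss is
    binary cross-entropy, and label 1 denotes the same-person category."""
    dim = len(samples[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # sigmoid
            g = p - y                        # gradient of BCE w.r.t. z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g

    def predict(x):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        return 1 if z > 0.0 else 0
    return predict
```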
In this step, the acquisition information of the human body image and of the target image is obtained; the second similarity between the two images is spliced with both sets of acquisition information; the splicing result is input into the trained classification model to obtain a classification result; and the node map data is updated according to that result.
And step S204, carrying out clustering analysis on the updated node map data by adopting a preset clustering algorithm to obtain a clustering result of the N face images.
The preset clustering algorithm may adopt a community discovery algorithm, such as an infomap algorithm. The clustering result after the clustering analysis comprises a plurality of clustering sets, wherein each clustering set is a face image set of the same person.
Specifically, the face image clustering result can be applied in scenarios such as face recognition, image classification, intelligent security, and product recommendation. Taking intelligent security as an example: snapshots are captured at a residential-community entrance over a target time period to obtain multiple face images, and after these are processed, several cluster sets are obtained, each containing a different number of face images. A person whose cluster set contains only a few face images is likely to be a solitary resident or an outsider, which provides initial investigation leads for supervisory personnel.
Through the step of performing cluster analysis on the updated node map data with the preset clustering algorithm to obtain the clustering results of the N face images, face images of the same person are effectively grouped into one set while face images of different persons fall into separate sets, improving the accuracy of face image clustering.
In this embodiment, the similarity between the human body images to which the face images belong provides additional information for the node map data and assists in updating it, improving the accuracy of the update; this in turn improves the accuracy of the clustering result obtained by cluster analysis of the updated node map data — that is, the accuracy of face image clustering.
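The final clustering step can be illustrated with the simplest graph grouping, connected components. The text names community-discovery algorithms such as Infomap; components are only a minimal stand-in with the same output shape (one cluster set per person), and the union-find representation here is an assumption for the example.

```python
def cluster_node_map(nodes, edges):
    """Group the updated node map into clusters via connected components.
    Each returned set is one cluster set of face-image nodes."""
    parent = {n: n for n in nodes}

    def find(n):  # union-find with path halving
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)

    clusters = {}
    for n in nodes:
        clusters.setdefault(find(n), set()).add(n)
    return list(clusters.values())
```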
Fig. 4 shows a structural block diagram of a face image clustering apparatus based on a classification model according to the second embodiment of the present invention. The apparatus is applied to a client; the computer device corresponding to the client is connected to a server and obtains from it the N face images, the face features of each face image, the human body image to which each face image belongs, and the acquisition information of each human body image, the N face images being the face images to be clustered. A trained classification model, usable to classify whether human body images belong to the same person, is deployed on the computer device corresponding to the client. For convenience of explanation, only portions related to the embodiments of the present invention are shown.
Referring to fig. 4, the face image clustering apparatus includes:
the graph constructing module 41 is configured to obtain N face images and face features of each face image, calculate a first similarity between the face features of any two nodes with each face image as a node, construct an edge between two corresponding nodes according to the first similarity, and form node graph data, where N is an integer greater than one;
the similarity calculation module 42 is configured to obtain a human body image to which each human face image belongs, calculate a second similarity between the human body image and other human body images for any human body image, and screen out a target image satisfying that the second similarity is greater than a preset similarity threshold from all other human body images;
the map updating module 43 is configured to acquire acquisition information of the human body image and acquisition information of the target image, splice the second similarity between the human body image and the target image and the acquisition information of the human body image and the acquisition information of the target image, input the spliced result into the trained classification model to obtain a classification result, and update the node map data according to the classification result;
and the cluster analysis module 44 is configured to perform cluster analysis on the updated node map data by using a preset cluster algorithm to obtain a cluster result of the N face images.
Optionally, the graph building module 41 includes:
the first comparison unit is used for comparing the first similarity with a preset connection threshold value to obtain a comparison result;
a first constructing unit, configured to construct an edge between two corresponding nodes if the comparison result is that the first similarity is greater than the connection threshold;
and a second constructing unit, configured not to construct an edge between the two corresponding nodes if the comparison result is that the first similarity is less than or equal to the connection threshold.
Optionally, the similarity calculation module 42 includes:
the characteristic extraction unit is used for inputting each human body image into the trained characteristic extraction model for characteristic extraction to obtain the human body characteristic of each corresponding human body image;
and the characteristic calculation unit is used for calculating the similarity of the human body characteristics of the human body image and the human body characteristics of other human body images respectively aiming at any human body image to obtain a second similarity between the human body image and the other human body images.
Optionally, the similarity calculation module 42 includes:
the second comparison unit is used for comparing each second similarity with the similarity threshold value respectively;
and the target image determining unit is used for determining other human body images corresponding to the second similarity larger than the similarity threshold as target images.
Optionally, the graph update module 43 includes:
the first detection unit is used for detecting whether edges exist between nodes corresponding to the human body images and nodes corresponding to the target images in the node map data when the classification result is detected to be a preset first category;
and the first updating unit is used for constructing edges between the nodes corresponding to the human body images and the nodes corresponding to the target images to obtain updated node map data if no edges exist between the nodes corresponding to the human body images and the nodes corresponding to the target images.
Optionally, the graph update module 43 includes:
a second detecting unit, configured to detect whether an edge exists between a node corresponding to the human body image and a node corresponding to the target image in the node map data when the classification result is detected to be a preset second category;
and the second updating unit is used for deleting the edges between the nodes corresponding to the human body image and the nodes corresponding to the target image if the edges between the nodes corresponding to the human body image and the nodes corresponding to the target image are detected, so as to obtain updated node map data.
Optionally, the splicing result of the sample similarity and the sample acquisition information is used as a training sample of the classification model, the sample category is used as a training label of the classification model, the binary cross entropy loss is used as a loss function during the training of the classification model, the sample similarity is used for representing the similarity between two human body images of the sample, and the sample acquisition information comprises sample acquisition time and sample acquisition place;
the face image clustering device further includes:
the interval calculation module is used for calculating the difference value of the sample acquisition time corresponding to the two sample human body images to obtain a sample interval;
the location comparison module is used for comparing the sample acquisition locations corresponding to the two sample human body images to obtain a location comparison result;
the sample classification module is used for splicing the sample similarity, the sample interval and the place comparison result and inputting the spliced sample similarity, the sample interval and the place comparison result into a classification model to obtain a sample classification result;
and the model training module is used for calculating the classification loss according to the sample classification result, the sample category and the binary cross entropy loss, and training the classification model by taking the classification loss as a basis to obtain the trained classification model.
It should be noted that, because the information interaction between the above modules and units, their execution processes, and other details are based on the same concept as the method embodiment, their specific functions and technical effects may be found in the method embodiment section and are not repeated here.
Fig. 5 is a schematic structural diagram of a computer device according to a third embodiment of the present invention. As shown in fig. 5, the computer apparatus of this embodiment includes: at least one processor (only one shown in fig. 5), a memory, and a computer program stored in the memory and executable on the at least one processor, the processor when executing the computer program implementing the steps of any of the various facial image clustering method embodiments described above.
The computer device may include, but is not limited to, a processor, a memory. It will be appreciated by those skilled in the art that fig. 5 is merely an example of a computer device and is not intended to be limiting, and that a computer device may include more or fewer components than those shown, or some components may be combined, or different components may be included, such as a network interface, a display screen, and input devices, etc.
The processor may be a CPU, another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory includes readable storage medium, internal memory, etc., where the internal memory may be a memory of the computer device, and the internal memory provides an environment for the operating system and the execution of computer-readable instructions in the readable storage medium. The readable storage medium may be a hard disk of the computer device, and in other embodiments may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, provided on the computer device. Further, the memory may also include both internal and external storage units of the computer device. The memory is used for storing an operating system, application programs, a BootLoader (BootLoader), data, and other programs, such as program codes of a computer program, and the like. The memory may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules, so as to perform all or part of the functions described above. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention. The specific working processes of the units and modules in the above-mentioned apparatus may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the method of the above embodiments may be implemented by a computer program, which may be stored in a computer readable storage medium and used by a processor to implement the steps of the above method embodiments. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. 
The computer readable medium may include at least: any entity or device capable of carrying computer program code, a recording medium, computer memory, read-only memory (ROM), random access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium — for example, a USB flash drive, a removable hard disk, or a magnetic or optical disk. In certain jurisdictions, in accordance with legislative and patent practice, computer-readable media may not include electrical carrier signals or telecommunications signals.
The present invention can also be implemented by a computer program product, which when executed on a computer device causes the computer device to implement all or part of the processes in the method of the above embodiments.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus/computer device and method may be implemented in other ways. For example, the above-described apparatus/computer device embodiments are merely illustrative, and for example, a module or a unit may be divided into only one logical function, and may be implemented in other ways, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and the parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, and some of their technical features may be replaced by equivalents; such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the embodiments of the present invention, and they are intended to be included within the scope of the present invention.

Claims (10)

1. A face image clustering method based on a classification model, characterized by comprising the following steps:
acquiring N face images and the face features of each face image; taking each face image as a node, calculating a first similarity between the face features of any two nodes, and constructing an edge between the two corresponding nodes according to the first similarity to form node map data, wherein N is an integer greater than 1;
acquiring the human body image to which each face image belongs; for any human body image, calculating second similarities between the human body image and the other human body images, and screening out, from all the other human body images, a target image whose second similarity is greater than a preset similarity threshold;
acquiring acquisition information of the human body image and acquisition information of the target image; concatenating the second similarity between the human body image and the target image with the acquisition information of the human body image and the acquisition information of the target image, inputting the concatenation result into a trained classification model to obtain a classification result, and updating the node map data according to the classification result;
and performing cluster analysis on the updated node map data by using a preset clustering algorithm to obtain a clustering result for the N face images.
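As an illustrative sketch only, not the claimed implementation: assuming the face features are fixed-length vectors compared by cosine similarity, and taking connected components as the preset clustering algorithm (the claim fixes neither choice), the graph-construction and clustering steps of claim 1 could look like this; all function names are assumptions.

```python
import numpy as np

def build_node_graph(face_feats: np.ndarray, connect_threshold: float) -> dict:
    """Build node map data: one node per face image, an edge whenever the
    first similarity (here cosine) between two face features exceeds the threshold."""
    feats = face_feats / np.linalg.norm(face_feats, axis=1, keepdims=True)
    sim = feats @ feats.T  # pairwise first similarities
    n = len(feats)
    graph = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] > connect_threshold:
                graph[i].add(j)
                graph[j].add(i)
    return graph

def cluster_connected_components(graph: dict) -> list:
    """Cluster analysis on the node map: each connected component is one cluster."""
    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, comp = [start], []
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            comp.append(node)
            stack.extend(graph[node] - seen)
        clusters.append(sorted(comp))
    return clusters
```

With three toy features where the first two are near-duplicates, `cluster_connected_components(build_node_graph(feats, 0.8))` groups images 0 and 1 together and leaves image 2 alone.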
2. The face image clustering method according to claim 1, wherein the constructing an edge between the two corresponding nodes according to the first similarity comprises:
comparing the first similarity with a preset connection threshold to obtain a comparison result;
if the comparison result is that the first similarity is greater than the connection threshold, constructing an edge between the two corresponding nodes;
and if the comparison result is that the first similarity is less than or equal to the connection threshold, constructing no edge between the two corresponding nodes.
3. The face image clustering method according to claim 1, wherein the calculating, for any human body image, second similarities between the human body image and the other human body images comprises:
inputting each human body image into a trained feature extraction model for feature extraction to obtain the human body features of each corresponding human body image;
and for any human body image, performing similarity calculation between the human body features of the human body image and the human body features of each of the other human body images, so as to obtain the second similarities between the human body image and the other human body images.
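A minimal sketch of claim 3, assuming the trained feature extraction model outputs one embedding vector per human body image and that cosine similarity is the similarity measure (the claim names neither):

```python
import numpy as np

def second_similarities(body_feats: np.ndarray, query_idx: int) -> np.ndarray:
    """Compute the second similarity between the body image at query_idx
    and every other body image; the query's own slot is masked out."""
    feats = body_feats / np.linalg.norm(body_feats, axis=1, keepdims=True)
    sims = feats @ feats[query_idx]   # cosine similarities to the query
    sims[query_idx] = -np.inf         # a body image is not compared with itself
    return sims
```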
4. The face image clustering method according to claim 1, wherein the screening out, from all the other human body images, a target image whose second similarity is greater than a preset similarity threshold comprises:
comparing each second similarity with the similarity threshold respectively;
and determining the other human body images whose second similarity is greater than the similarity threshold as target images.
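Claim 4's screening step reduces to a threshold filter over the second similarities; a sketch (strictly-greater comparison, per the claim, with an assumed function name):

```python
def filter_target_indices(second_sims, similarity_threshold):
    """Return the indices of the other body images whose second similarity
    exceeds the preset similarity threshold; these become the target images."""
    return [i for i, s in enumerate(second_sims) if s > similarity_threshold]
```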
5. The face image clustering method according to claim 1, wherein the updating the node map data according to the classification result comprises:
when it is detected that the classification result is a preset first category, detecting, in the node map data, whether an edge exists between the node corresponding to the human body image and the node corresponding to the target image;
and if no edge exists between the node corresponding to the human body image and the node corresponding to the target image, constructing an edge between the node corresponding to the human body image and the node corresponding to the target image, so as to obtain the updated node map data.
6. The face image clustering method according to claim 1, wherein the updating the node map data according to the classification result comprises:
when it is detected that the classification result is a preset second category, detecting, in the node map data, whether an edge exists between the node corresponding to the human body image and the node corresponding to the target image;
and if it is detected that an edge exists between the node corresponding to the human body image and the node corresponding to the target image, deleting the edge between the node corresponding to the human body image and the node corresponding to the target image, so as to obtain the updated node map data.
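Claims 5 and 6 together form the graph-update rule: a first-category ("same person") result adds a missing edge, a second-category result deletes an existing one. A sketch over an adjacency-set representation; the concrete category encodings below are assumptions:

```python
FIRST_CATEGORY, SECOND_CATEGORY = 1, 0  # assumed encodings of the two classes

def update_node_graph(graph, u, v, classification):
    """Apply one classification result for the node pair (u, v)."""
    if classification == FIRST_CATEGORY and v not in graph[u]:
        # same person predicted but no edge yet: construct the edge (claim 5)
        graph[u].add(v)
        graph[v].add(u)
    elif classification == SECOND_CATEGORY and v in graph[u]:
        # different people predicted but an edge exists: delete it (claim 6)
        graph[u].discard(v)
        graph[v].discard(u)
    return graph
```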
7. The face image clustering method according to any one of claims 1 to 6, wherein a concatenation of a sample similarity and sample acquisition information is used as a training sample for the classification model, a sample category is used as a training label for the classification model, and a binary cross-entropy loss is used as the loss function when training the classification model, wherein the sample similarity represents the similarity between two sample human body images, and the sample acquisition information comprises a sample acquisition time and a sample acquisition place;
the training process of the classification model comprises the following steps:
calculating the difference between the sample acquisition times corresponding to the two sample human body images to obtain a sample interval;
comparing the sample acquisition places corresponding to the two sample human body images to obtain a place comparison result;
concatenating the sample similarity, the sample interval, and the place comparison result, and inputting the concatenation result into the classification model to obtain a sample classification result;
and calculating a classification loss according to the sample classification result, the sample category, and the binary cross-entropy loss, and training the classification model based on the classification loss to obtain the trained classification model.
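A toy stand-in for the training procedure of claim 7, using plain logistic regression so the binary cross-entropy gradient is explicit; the real classification model, the learning-rate settings, and the synthetic values used below are all assumptions, not part of the patent:

```python
import numpy as np

def make_training_input(sample_similarity, time_a, time_b, place_a, place_b):
    """Splice together the three signals of claim 7: sample similarity,
    sample interval (time difference), and the place comparison result."""
    return np.array([sample_similarity, abs(time_a - time_b), float(place_a == place_b)])

def train_classifier(X, y, lr=0.1, epochs=3000):
    """Gradient descent on the mean binary cross-entropy loss."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid probabilities
        grad = (p - y) / len(y)                 # gradient of mean BCE w.r.t. logits
        w -= lr * (X.T @ grad)
        b -= lr * grad.sum()
    return w, b
```

In the patent the classification model is a trained network fed with the concatenated vector; the logistic regression above only illustrates how the binary cross-entropy loss drives the parameter update.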
8. A face image clustering apparatus based on a classification model, the face image clustering apparatus comprising:
a graph construction module, configured to acquire N face images and the face features of each face image, take each face image as a node, calculate a first similarity between the face features of any two nodes, and construct an edge between the two corresponding nodes according to the first similarity to form node map data, wherein N is an integer greater than 1;
a similarity calculation module, configured to acquire the human body image to which each face image belongs, calculate, for any human body image, second similarities between the human body image and the other human body images, and screen out, from all the other human body images, a target image whose second similarity is greater than a preset similarity threshold;
a graph updating module, configured to acquire acquisition information of the human body image and acquisition information of the target image, concatenate the second similarity between the human body image and the target image with the acquisition information of the human body image and the acquisition information of the target image, input the concatenation result into a trained classification model to obtain a classification result, and update the node map data according to the classification result;
and a cluster analysis module, configured to perform cluster analysis on the updated node map data by using a preset clustering algorithm to obtain a clustering result for the N face images.
9. A computer device, characterized in that the computer device comprises a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the face image clustering method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the face image clustering method according to any one of claims 1 to 7.
CN202211418963.4A 2022-11-14 2022-11-14 Face image clustering method, device, equipment and medium based on classification model Pending CN115719428A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211418963.4A CN115719428A (en) 2022-11-14 2022-11-14 Face image clustering method, device, equipment and medium based on classification model


Publications (1)

Publication Number Publication Date
CN115719428A true CN115719428A (en) 2023-02-28

Family

ID=85255080

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211418963.4A Pending CN115719428A (en) 2022-11-14 2022-11-14 Face image clustering method, device, equipment and medium based on classification model

Country Status (1)

Country Link
CN (1) CN115719428A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115953650A (en) * 2023-03-01 2023-04-11 杭州海康威视数字技术股份有限公司 Training method and device of feature fusion model


Similar Documents

Publication Publication Date Title
WO2021139324A1 (en) Image recognition method and apparatus, computer-readable storage medium and electronic device
CN110197146B (en) Face image analysis method based on deep learning, electronic device and storage medium
CN111178183B (en) Face detection method and related device
CN111222500B (en) Label extraction method and device
CN111445459B (en) Image defect detection method and system based on depth twin network
CN110941978B (en) Face clustering method and device for unidentified personnel and storage medium
CN106663196A (en) Computerized prominent person recognition in videos
CN110660102B (en) Speaker recognition method, device and system based on artificial intelligence
CN108986137B (en) Human body tracking method, device and equipment
CN111738174B (en) Human body example analysis method and system based on depth decoupling
CN110991278A (en) Human body action recognition method and device in video of computer vision system
CN115719428A (en) Face image clustering method, device, equipment and medium based on classification model
CN113780145A (en) Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium
CN112686122A (en) Human body and shadow detection method, device, electronic device and storage medium
CN112052730A (en) 3D dynamic portrait recognition monitoring device and method
CN111753618A (en) Image recognition method and device, computer equipment and computer readable storage medium
CN115393755A (en) Visual target tracking method, device, equipment and storage medium
CN115311680A (en) Human body image quality detection method and device, electronic equipment and storage medium
CN115424293A (en) Living body detection method, and training method and device of living body detection model
CN114494355A (en) Trajectory analysis method and device based on artificial intelligence, terminal equipment and medium
CN114299295A (en) Data processing method and related device
CN114387496A (en) Target detection method and electronic equipment
CN113408356A (en) Pedestrian re-identification method, device and equipment based on deep learning and storage medium
CN113869367A (en) Model capability detection method and device, electronic equipment and computer readable medium
CN115147434A (en) Image processing method, device, terminal equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination