CN112766421A - Face clustering method and device based on structure perception - Google Patents

Face clustering method and device based on structure perception

Info

Publication number
CN112766421A
CN112766421A (application CN202110272409.9A; granted publication CN112766421B)
Authority
CN
China
Prior art keywords
graph
neighbor
face
sampling
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110272409.9A
Other languages
Chinese (zh)
Other versions
CN112766421B (en)
Inventor
周杰
鲁继文
沈帅
李万华
朱政
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202110272409.9A priority Critical patent/CN112766421B/en
Publication of CN112766421A publication Critical patent/CN112766421A/en
Application granted granted Critical
Publication of CN112766421B publication Critical patent/CN112766421B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/23 - Clustering techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a face clustering method and device based on structure perception, wherein the method comprises the following steps: acquiring a plurality of face images to be processed, extracting the face features of each face image to be processed based on a pre-trained convolutional neural network model, and constructing a K neighbor graph according to the face features of each face image to be processed; inputting the K neighbor graph into a pre-trained edge score prediction model to obtain the score of each edge in the K neighbor graph, wherein the edge score prediction model is obtained by sampling a K neighbor graph with a structure-preserving subgraph sampling strategy and training a graph convolutional neural network on the sampled subgraphs; and performing a first pruning operation on the K neighbor graph according to the scores of the edges in the K neighbor graph to obtain face clusters for the plurality of face images to be processed. This solves the technical problem of insufficient face clustering accuracy in the related art.

Description

Face clustering method and device based on structure perception
Technical Field
The application relates to the technical field of artificial intelligence and deep learning in the technical field of image processing, in particular to a face clustering method and device based on structure perception.
Background
The development of face recognition technology relies on the availability of large-scale face datasets. In recent years, face datasets have grown ever larger, but as their size increases, so does the cost of labeling them.
The face clustering algorithm is an effective way to reduce labeling cost; however, in the related art, the clustering accuracy of existing clustering models still needs improvement when faced with large-scale real-world face data.
Disclosure of Invention
The present application is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, a first objective of the present application is to provide a face clustering method based on structure perception to improve face clustering accuracy.
A second object of the present application is to propose a face clustering device based on structure perception.
A third object of the present application is to provide an electronic device.
A fourth object of the present application is to propose a non-transitory computer readable storage medium.
A fifth object of the present application is to propose a computer program product.
In order to achieve the above object, an embodiment of a first aspect of the present application provides a face clustering method, including:
acquiring a plurality of face images to be processed, extracting the face features of each face image to be processed based on a pre-trained convolutional neural network model, and constructing a K neighbor graph according to the face features of each face image to be processed;
inputting the K neighbor graph into a pre-trained edge score prediction model to obtain the score of each edge in the K neighbor graph; the edge score prediction model is obtained by sampling a K neighbor graph by using a structure-preserving sub-graph sampling strategy and training a graph convolution neural network by using a sub-graph obtained by sampling;
and performing first pruning operation on the K neighbor graph according to the scores of all edges in the K neighbor graph to obtain face clusters aiming at the plurality of face images to be processed.
To achieve the above object, an embodiment of a second aspect of the present application provides an apparatus, including:
the first acquisition module is used for acquiring a plurality of face images to be processed;
the characteristic extraction module is used for extracting the face characteristic of each face image to be processed based on a pre-trained convolutional neural network model;
the construction module is used for constructing a K neighbor graph according to the face features of each face image to be processed;
the prediction module is used for inputting the K neighbor graph into a pre-trained edge score prediction model to obtain the score of each edge in the K neighbor graph; the edge score prediction model is obtained by sampling a K neighbor graph by using a structure-preserving sub-graph sampling strategy and training a graph convolution neural network by using a sub-graph obtained by sampling;
and the pruning operation module is used for carrying out first pruning operation on the K neighbor graph according to the scores of all edges in the K neighbor graph to obtain the face clusters aiming at the plurality of face images to be processed.
According to a third aspect of the present application, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method for face clustering based on structure perception according to the first aspect of the present application.
According to a fourth aspect of the present application, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method for face clustering based on structure perception of the first aspect of the present application.
According to a fifth aspect of the present application, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method for structure perception based face clustering according to the first aspect.
According to the technical scheme of the embodiment of the application, the features of the face images are extracted through the convolutional neural network to obtain accurate face features, and the K neighbor graph is generated. Subgraphs are sampled from the K neighbor graph through a structure-preserving subgraph sampling strategy; the subgraphs can represent the intra-cluster and inter-cluster relationships in the K neighbor graph, and the edge score prediction model trained on these subgraphs is used to perform the first pruning operation on the K neighbor graph, so that the face clustering is more accurate.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow diagram of a method for face clustering based on structure perception according to an embodiment of the present application;
FIG. 2 is a flow diagram of a method for face clustering based on structure perception according to another embodiment of the present application;
FIG. 3 is a schematic illustration of a second pruning operation according to one embodiment of the present application;
FIG. 4 is a flow diagram of a method for face clustering based on structure perception according to yet another embodiment of the present application;
FIG. 5 is a schematic diagram of the output of the edge score prediction model according to an embodiment of the present application;
FIG. 6 is a block diagram of a face clustering device based on structure perception according to an embodiment of the present application;
fig. 7 is a block diagram of a face clustering device based on structure perception according to another embodiment of the present application;
fig. 8 is a block diagram of an electronic device for implementing a method for face clustering based on structure perception according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
The following describes a face clustering method and apparatus based on structure perception according to an embodiment of the present application with reference to the drawings.
Fig. 1 is a flowchart of a face clustering method based on structure perception according to an embodiment of the present application. It should be noted that the face clustering method based on structure perception in the embodiments of the present application can be applied to the face clustering device based on structure perception in the embodiments of the present application, and the device can be configured on the electronic device in the embodiments of the present application.
As shown in fig. 1, the method for clustering faces based on structure perception may include:
step 101, obtaining a plurality of face images to be processed, extracting face features of each face image to be processed based on a pre-trained convolutional neural network model, and constructing a K neighbor image according to the face features of each face image to be processed.
In some embodiments of the present application, face images belonging to the same class can be connected together through a K neighbor graph, where each node in the K neighbor graph represents a face image, and an edge between two nodes indicates that the corresponding face images are likely to belong to the same class. Constructing the K neighbor graph requires two inputs: the face feature of each face image to be processed and the value of K. The face features can be extracted from the face images to be processed through a pre-trained convolutional neural network model; the value of K may be set based on experience and/or the number of face images to be processed, and it affects the quality of the constructed K neighbor graph.
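As an illustration only (not part of the claimed method), the following is a minimal sketch of how such a K neighbor graph could be built from extracted face features; the use of scikit-learn, cosine similarity, and the value of K are assumptions made for the example.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def build_knn_graph(features: np.ndarray, k: int = 80) -> np.ndarray:
    """Build a K neighbor graph from face features.

    features: (N, D) array, one row of face features per face image.
    Returns an (N, N) boolean adjacency matrix where adj[i, j] is True
    if j is among the k nearest neighbors of i (excluding i itself).
    """
    # Normalize so that cosine similarity is a meaningful neighbor metric.
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    nn = NearestNeighbors(n_neighbors=k + 1, metric="cosine").fit(feats)
    _, idx = nn.kneighbors(feats)              # idx[:, 0] is each node itself
    n = feats.shape[0]
    adj = np.zeros((n, n), dtype=bool)
    rows = np.repeat(np.arange(n), k)
    adj[rows, idx[:, 1:].reshape(-1)] = True   # connect each node to its k neighbors
    return adj
```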
It can be understood that, due to the limitations of the pre-trained convolutional neural network model and the choice of K, the K neighbor graph not only connects face images of the same class but also, with a certain probability, connects face images of different classes, so the face clustering accuracy of the K neighbor graph at this point cannot meet the requirement. In order to obtain accurate face clusters, further operations need to be performed on the K neighbor graph to remove the edges that wrongly connect face images of different classes.
Step 102, inputting the K neighbor graph into a pre-trained edge score prediction model to obtain scores of all edges in the K neighbor graph; the edge score prediction model is obtained by sampling a K neighbor graph by using a structure-preserving sub-graph sampling strategy and training a graph convolution neural network by using a sub-graph obtained by sampling.
It can be understood that, by scoring every edge in the K neighbor graph, the scores of all edges can be obtained, so that the pruning operation on the K neighbor graph can be realized by filtering on these scores, thereby obtaining more accurate face clusters.
In some embodiments of the present application, an edge score prediction model may be trained in advance in order to score each edge in the K neighbor graph. In some cases, due to the limitation of hardware computing power, a structure-preserving subgraph sampling strategy can be used to sample the K neighbor graph, and the edge score prediction model is then trained on the sampled subgraphs. In this way, an edge score prediction model suitable for the K neighbor graph can be obtained while saving computing power. Generally, in the K neighbor graph, nodes with close connections can be regarded as one cluster; there may be connections between different clusters, but these inter-cluster connections are usually not close.
For example, the structure-preserving subgraph sampling strategy may randomly sample the same number of nodes from each cluster in the K neighbor graph and preserve the connections between the sampled nodes to obtain a subgraph. Such a subgraph keeps the connections of the nodes within each cluster as well as the connections between clusters. Understandably, each edge in the subgraph connects two nodes, and the two nodes connected by one edge can be regarded as a node pair. An edge score prediction model may be obtained from the subgraph, and its initial model includes but is not limited to: a graph convolutional neural network, or a graph convolutional neural network followed by a multilayer perceptron. When the edge score prediction model is trained, the input of the model is the whole subgraph, and the output of the model is the edge score corresponding to each edge in the subgraph. It is understood that when a node pair belongs to the same class, the edge score corresponding to the node pair is 1; when the node pair does not belong to the same class, the edge score corresponding to the node pair is 0. The training may also be supervised using loss functions including, but not limited to: an exponential loss function and a cross entropy loss function.
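For illustration only, the following is a simplified sketch of an edge score predictor of the kind described above: one graph convolution step followed by a small classifier that scores each node pair. The layer sizes, the mean aggregation, and the two-layer classifier are assumptions made for the sketch, not the specific architecture of the patent.

```python
import torch
import torch.nn as nn

class EdgeScorer(nn.Module):
    """Graph convolution plus a pairwise classifier that scores edges in [0, 1]."""

    def __init__(self, in_dim: int, hid_dim: int = 256):
        super().__init__()
        self.gconv = nn.Linear(in_dim * 2, hid_dim)      # combines node and neighborhood features
        self.classifier = nn.Sequential(
            nn.Linear(hid_dim * 2, hid_dim), nn.ReLU(),
            nn.Linear(hid_dim, 1),
        )

    def forward(self, x, adj, edges):
        """
        x:     (N, D) node features of the subgraph
        adj:   (N, N) row-normalized adjacency matrix of the subgraph
        edges: (E, 2) long tensor of node-index pairs (the edges to score)
        """
        agg = adj @ x                                     # aggregate neighbor features
        h = torch.relu(self.gconv(torch.cat([x, agg], dim=1)))
        pair = torch.cat([h[edges[:, 0]], h[edges[:, 1]]], dim=1)
        return torch.sigmoid(self.classifier(pair)).squeeze(-1)  # one score per edge
```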
And 103, performing first pruning operation on the K neighbor graph according to the scores of all edges in the K neighbor graph to obtain face clusters aiming at a plurality of face images to be processed.
In some embodiments of the present application, in order to make the face clustering in the K neighbor graph more accurate, a first pruning operation may be performed on the K neighbor graph. The first pruning operation can be completed with the pre-trained edge score prediction model: the model scores each edge in the K neighbor graph, a threshold can be preset, and the score is compared with the threshold. When the score is greater than or equal to the threshold, the two nodes connected by the edge are considered to belong to the same class, and the edge can be kept; when the score is smaller than the threshold, the two nodes connected by the edge are considered not to belong to the same class, and the edge is removed. It is understood that edges connecting nodes of the same class may be called true edges, and edges connecting nodes of different classes may be called false edges. Through the first pruning operation, face clusters for the plurality of face images to be processed can be obtained.
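A minimal sketch of this first pruning step is shown below, assuming that the final clusters are read off as the connected components of the pruned graph; the threshold value of 0.5 and the use of SciPy are assumptions made for the example.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def first_pruning(edges: np.ndarray, scores: np.ndarray, num_nodes: int,
                  threshold: float = 0.5):
    """Keep edges whose score reaches the threshold and return a cluster label per face.

    edges:  (E, 2) node-index pairs of the K neighbor graph
    scores: (E,)  edge scores predicted by the edge score prediction model
    """
    kept = edges[scores >= threshold]                    # drop low-score ("false") edges
    graph = csr_matrix(
        (np.ones(len(kept)), (kept[:, 0], kept[:, 1])),
        shape=(num_nodes, num_nodes),
    )
    # Each connected component of the pruned graph is treated as one face cluster.
    _, labels = connected_components(graph, directed=False)
    return kept, labels
```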
According to the face clustering method based on the structure perception, the face images to be processed are clustered according to the face features extracted by the convolutional neural network model, and a K neighbor graph is constructed. Through a deep learning technology, more accurate face features can be obtained, so that the face clustering of the constructed K neighbor graph is more accurate. The subgraph of the K-neighbor graph can represent the relationship of each face image in one cluster in the K-neighbor graph and can also represent the relationship between different clusters in the K-neighbor graph. After the graph convolution neural network is trained, a score prediction model can be obtained, pruning operation on the K neighbor graph is achieved through the score prediction model, false edges are removed, and more accurate face clustering is obtained.
In a second embodiment of the present application, based on the first embodiment, a second pruning operation may be performed after the first pruning in order to make the face clustering more accurate. Optionally, step 201 and step 202 below are further performed after the first pruning operation is carried out on the K neighbor graph according to the scores of the edges in the K neighbor graph.
Fig. 2 is a flowchart of a face clustering method based on structure perception according to another embodiment of the present application and illustrates how the second pruning operation is performed. The method specifically includes:
step 201, calculating the intimacy between every two nodes in the K neighbor graph after the first pruning operation.
In some embodiments of the present application, an intimacy can be defined between every two nodes of the K neighbor graph. The intimacy reflects whether the two nodes belong to the same class: the higher the intimacy, the higher the probability that the two nodes belong to the same class; the lower the intimacy, the lower that probability. Generally, two nodes connected by an inter-cluster edge are unlikely to belong to the same class, whereas two nodes connected by an intra-cluster edge are likely to belong to the same class.
For example, one way to compute the intimacy is to determine a range diameter according to the number of nodes in a cluster and, taking the cluster center as the center of a circle, regard nodes within the range diameter as having sufficient intimacy and nodes outside the range diameter as having insufficient intimacy.
In some embodiments of the present application, the intimacy between every two nodes may also be calculated by the following formula:
intimacy(N1, N2) = (k / n1 + k / n2) / 2
wherein n1 indicates the number of edges connected to node N1, n2 indicates the number of edges connected to node N2, and k indicates the number of neighbor nodes shared by node N1 and node N2.
Fig. 3 is a diagram illustrating a second pruning operation according to an embodiment of the present application, in which node A has nine edges and node B has eight edges, with one neighbor node common to both, while node C has seven edges and node D has ten edges, with six common neighbor nodes. The intimacy of A and B is therefore (1/9 + 1/8) / 2 ≈ 0.12, and the intimacy of C and D is (6/7 + 6/10) / 2 ≈ 0.73.
In some embodiments of the present application, the intimacy computation can also be implemented with matrix operations. Let A ∈ R^(N×N) be the adjacency matrix corresponding to the K neighbor graph. The numbers of common neighbor nodes of all node pairs are given by C = A·Aᵀ, where each element C_ij represents the number of common neighbors of nodes N_i and N_j. The intimacy values are then
E = (diag(sum0)·C + C·diag(sum1)) / 2
wherein sum0 = vec((Σ_j a_·j)^(-1)) and sum1 = vec((Σ_i a_i·)^(-1)); that is, sum0 is obtained by summing the matrix A over its rows and taking the reciprocal of each element before vectorizing, and sum1 is obtained by summing the matrix A over its columns and taking the reciprocal of each element before vectorizing, with vec(·) denoting vectorization.
And 202, performing second pruning operation on the K neighbor graph subjected to the first pruning operation according to the intimacy between every two nodes.
In some embodiments of the present application, the second pruning operation may be performed on the K neighbor graph obtained from the first pruning operation and can be implemented by presetting an intimacy threshold. When the intimacy between two nodes is greater than or equal to the threshold, the edge connecting the two nodes is kept; when the intimacy between two nodes is smaller than the threshold, the edge connecting the two nodes is removed.
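Under the matrix formulation described above, a minimal NumPy sketch of the intimacy computation and the second pruning could look as follows; the threshold value of 0.7 is an assumption made for the example.

```python
import numpy as np

def second_pruning(adj: np.ndarray, threshold: float = 0.7) -> np.ndarray:
    """Second pruning based on node intimacy.

    adj: (N, N) symmetric 0/1 adjacency matrix of the graph after the first pruning.
    Returns a pruned adjacency matrix keeping only edges whose endpoints
    have intimacy >= threshold.
    """
    a = adj.astype(float)
    common = a @ a.T                        # common[i, j]: number of shared neighbors k
    deg = a.sum(axis=1)                     # n_i: number of edges connected to node i
    inv = np.divide(1.0, deg, out=np.zeros_like(deg), where=deg > 0)
    # intimacy[i, j] = (k / n_i + k / n_j) / 2
    intimacy = 0.5 * (inv[:, None] * common + common * inv[None, :])
    keep = (intimacy >= threshold) & (a > 0)
    return a * keep
```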
According to the face clustering method based on structure perception, some false edges may still exist after the first pruning operation, and because these false edges wrongly connect different classes, they affect the face clustering accuracy. Therefore, the second pruning operation can be performed on the K neighbor graph after the first pruning operation according to the intimacy between every two nodes. Through the second pruning operation, the resulting face clustering is more accurate and the structure of the K neighbor graph is clearer. After the second pruning, most of the wrong edges in the K neighbor graph are deleted, and the face clustering information read from the K neighbor graph is more accurate.
In a third embodiment of the present application, based on the above embodiments, the edge score prediction model may be obtained by training in advance through the following steps.
These steps are illustrated in fig. 4, which is a flowchart of a face clustering method based on structure perception according to yet another embodiment of the present application and specifically includes:
step 401, a training sample set is obtained, where the training sample set includes a plurality of face sample images.
It can be understood that, a training sample set is usually required for performing deep learning model training, and the purpose of the present application is to obtain face clusters, and therefore, in some embodiments of the present application, the training sample set may include a plurality of face sample images.
And step 402, extracting the face features of each face sample image based on the convolutional neural network model.
It is to be understood that the subgraphs used for training the graph convolutional neural network may be sampled from the K neighbor graph. Constructing the K neighbor graph requires the face features of the face sample images. In some embodiments of the present application, a convolutional neural network model may be used to extract the face feature of each face sample image, where the face feature may be a vector.
And step 403, constructing K neighbor graph samples according to the face features of each face sample image.
It can be understood that each face sample image has its corresponding face features, and the K neighbor graph samples can be constructed according to these face features.
And step 404, sampling the K neighbor graph sample based on a structure-preserving subgraph sampling strategy to obtain a subgraph obtained after sampling, training a graph convolution neural network by using the subgraph obtained after sampling to obtain a network parameter, and generating an edge score prediction model according to the network parameter.
In some embodiments of the present application, a structure-preserving subgraph sampling strategy is used to sample the K neighbor graph samples to obtain the sampled subgraphs. A subgraph may be obtained by randomly selecting a certain number of face images from each cluster of the K neighbor graph, or by the following steps (a schematic sketch follows step four below):
step one, randomly selecting M clusters from K neighbor image samples as sampling seeds.
It is to be understood that M clusters can be randomly selected from the K-neighbor image samples as sampling seeds in units of clusters in order to model edges between face images within the clusters. The number of sampling seeds may be one or more.
Step two, for each seed cluster, selecting N nearest neighbor clusters of each seed cluster, and taking a graph formed by the M clusters and the N nearest neighbor clusters as a first sub-graph S1.
In some embodiments of the present application, in order to model the relationships between clusters, N nearest neighbor clusters may be selected for each seed cluster. The nearest neighbor clusters may be determined by a similarity measure, for example the cosine similarity between cluster centers. In order to model intra-cluster and inter-cluster relationships simultaneously, the M clusters and the N nearest neighbor clusters may be combined into a first sub-graph S1.
And step three, randomly selecting K1 clusters from the first sub-graph S1 to construct a second sub-graph S2.
In some embodiments of the present application, K1 clusters may be randomly selected from the first sub-graph S1 to construct a second sub-graph S2 in order to generalize the selected clusters.
And step four, randomly selecting K2 nodes from the second subgraph S2 as the subgraph obtained after sampling.
In some embodiments of the present application, to perform generalization on the nodes in the cluster, K2 nodes may be randomly selected from the second sub-graph S2 as the sampled sub-graph.
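Purely as an illustration of the four steps above, a schematic NumPy sketch of the sampling procedure is given below; the values of M, N, K1 and K2, the use of mean feature vectors as cluster centers, and the cosine similarity between centers are assumptions made for the example.

```python
import numpy as np

def sample_subgraph(features, cluster_ids, m=8, n=3, k1=6, k2=5000, rng=None):
    """Structure-preserving subgraph sampling.

    features:    (N, D) face features of the training set
    cluster_ids: (N,)   ground-truth cluster id of each node
    Returns the indices of the nodes forming the sampled subgraph.
    """
    if rng is None:
        rng = np.random.default_rng()
    clusters = np.unique(cluster_ids)
    centers = np.stack([features[cluster_ids == c].mean(axis=0) for c in clusters])
    centers /= np.linalg.norm(centers, axis=1, keepdims=True)

    # Step one: randomly pick M seed clusters.
    seeds = rng.choice(len(clusters), size=m, replace=False)
    # Step two: for each seed, add its N nearest neighbor clusters (cosine similarity
    # between cluster centers); together they form the first sub-graph S1.
    sim = centers[seeds] @ centers.T
    neighbors = np.argsort(-sim, axis=1)[:, 1:n + 1]     # skip the seed itself
    s1 = np.unique(np.concatenate([seeds, neighbors.reshape(-1)]))
    # Step three: randomly keep K1 of those clusters to build the second sub-graph S2.
    s2 = rng.choice(s1, size=min(k1, len(s1)), replace=False)
    # Step four: randomly keep K2 nodes inside the chosen clusters.
    node_idx = np.where(np.isin(cluster_ids, clusters[s2]))[0]
    return rng.choice(node_idx, size=min(k2, len(node_idx)), replace=False)
```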
The subgraphs obtained by the structure-preserving subgraph sampling strategy can then be used to train the graph convolutional neural network, and the training method may comprise the following steps (a minimal sketch follows step three below):
Step one, inputting the subgraph obtained after sampling into the graph convolutional neural network to obtain a score prediction value for each edge in the subgraph.
In some embodiments of the present application, the score of an edge should be 1 when the nodes connected by the edge belong to the same class, and 0 when the face images connected by the edge belong to different classes.
And step two, calculating a loss value between the predicted score of each edge and the ground-truth score of the corresponding edge based on a cross entropy loss function.
In some embodiments of the present application, there may be a loss value between the predicted score of each edge and its ground-truth score, and a cross entropy loss function may be used to calculate this loss value.
And step three, training the graph convolution neural network according to the loss value.
It will be appreciated that the graph convolution neural network may be trained on the loss values.
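As an illustrative sketch of these three steps only, one training step on a sampled subgraph could look as follows, reusing the EdgeScorer sketch above and binary cross entropy as the loss; the optimizer and any hyperparameters are assumptions.

```python
import torch

def train_step(model, optimizer, x, adj, edges, same_class):
    """One training step of the edge score prediction model on a sampled subgraph.

    x:          (N, D) node features, adj: (N, N) normalized adjacency of the subgraph
    edges:      (E, 2) node pairs connected in the subgraph
    same_class: (E,)   ground-truth edge scores: 1.0 if both faces share a label, else 0.0
    """
    model.train()
    optimizer.zero_grad()
    pred = model(x, adj, edges)                      # step one: predicted edge scores
    loss = torch.nn.functional.binary_cross_entropy(pred, same_class)  # step two: cross entropy
    loss.backward()                                  # step three: train with the loss value
    optimizer.step()
    return loss.item()
```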
Through training, the network parameters can be obtained, and the edge score prediction model can be generated according to the obtained network parameters. The output of the edge score prediction model is shown in fig. 5, where the horizontal axis represents the output of the edge score prediction model: the closer the output is to 0, the smaller the probability that the two face images connected by the edge belong to the same class; the closer it is to 1, the greater the probability that they belong to the same class.
According to the face clustering method based on structure perception, the K neighbor graph is constructed, and the structure-preserving subgraph sampling strategy is used to process the K neighbor graph and obtain subgraphs, from which the edge score prediction model is obtained. In this method, the subgraphs preserve the important structural information of the whole K neighbor graph, keeping the highly correlated connections within clusters and the weakly correlated connections between clusters. Randomness is also introduced for generalization, which can enhance the performance of the model.
According to the embodiment of the application, the application also provides a face clustering device based on structure perception.
Fig. 6 is a block diagram of a face clustering device based on structure perception according to an embodiment of the present application. As shown in fig. 6, the face clustering apparatus 600 based on structure perception may include: a first obtaining module 601, a feature extraction module, a constructing module 602, a predicting module 603 and a pruning operation module 604.
Specifically, the first obtaining module 601 is configured to obtain a plurality of face images to be processed;
the feature extraction module is used for extracting the face features of each face image to be processed based on a pre-trained convolutional neural network model;
a constructing module 602, configured to construct a K neighbor graph according to the face features of each face image to be processed;
the prediction module 603 is configured to input the K neighbor graph into a pre-trained edge score prediction model, and obtain scores of each edge in the K neighbor graph; the edge score prediction model is obtained by sampling a K neighbor graph by using a structure-preserving sub-graph sampling strategy and training a graph convolution neural network by using a sub-graph obtained by sampling;
and the pruning operation module 604 is configured to perform a first pruning operation on the K neighbor graph according to the score of each edge in the K neighbor graph, so as to obtain face clusters for the plurality of face images to be processed.
Fig. 7 is a block diagram of a face clustering device based on structure perception according to another embodiment of the present application. In some embodiments of the present application, as shown in fig. 7, the face clustering apparatus based on structure perception further includes: a pre-training module 705.
Specifically, the pre-training module 705 is configured to pre-train the edge score prediction model; wherein the pre-training module is specifically configured to: acquire a training sample set, wherein the training sample set comprises a plurality of face sample images; extract the face features of each face sample image based on the convolutional neural network model; construct K neighbor graph samples according to the face features of each face sample image; and sample the K neighbor graph samples based on the structure-preserving subgraph sampling strategy to obtain the sampled subgraphs, train the graph convolutional neural network by using the sampled subgraphs to obtain network parameters, and generate the edge score prediction model according to the network parameters.
Wherein modules 701 to 704 in fig. 7 have the same functions and structures as modules 601 to 604 in fig. 6.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
There is also provided, in accordance with an embodiment of the present application, an electronic device, a readable storage medium, and a computer program product.
FIG. 8 shows a schematic block diagram of an example electronic device 800 that may be used to implement embodiments of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 8, the device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 802 or a computer program loaded from a storage unit 808 into a random access memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
A number of components in the device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be a variety of general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 801 executes the respective methods and processes described above, such as the face clustering method based on structure perception. For example, in some embodiments, the face clustering method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program can be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809. When loaded into the RAM 803 and executed by the computing unit 801, the computer program may perform one or more steps of the face clustering method described above. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the face clustering method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present application may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this application, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), the internet, and blockchain networks.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The Server can be a cloud Server, also called a cloud computing Server or a cloud host, and is a host product in a cloud computing service system, so as to solve the defects of high management difficulty and weak service expansibility in the traditional physical host and VPS service ("Virtual Private Server", or simply "VPS"). The server may also be a server of a distributed system, or a server incorporating a blockchain.
According to the face clustering method based on the structure perception, the face images to be processed are clustered according to the face features extracted by the convolutional neural network model, and a K neighbor graph is constructed. Through a deep learning technology, more accurate face features can be obtained, so that the face clustering of the constructed K neighbor graph is more accurate. The subgraph of the K-neighbor graph can represent the relationship of each face image in one cluster in the K-neighbor graph and can also represent the relationship between different clusters in the K-neighbor graph. After the graph convolution neural network is trained, a score prediction model can be obtained, the first pruning operation on the K neighbor graph is realized through the score prediction model, the edges which are connected in error are removed, and more accurate face clustering is obtained.
After the first pruning operation, the K neighbor graph can be subjected to second pruning operation, false edges are removed through intimacy degree calculation, the obtained face clustering is more accurate, and the structure of the K neighbor graph is clearer. After the second pruning, most false edges in the K neighbor graph are deleted, and the face clustering information read from the K neighbor graph is more accurate.
In the process of constructing the K neighbor graph, the structure-preserving subgraph sampling strategy can be used to process the K neighbor graph and obtain subgraphs, from which the edge score prediction model is obtained. In this method, the subgraphs preserve the important structural information of the whole K neighbor graph, keeping the highly correlated connections within clusters and the weakly correlated connections between clusters. Randomness is also introduced for generalization, which can enhance the performance of the model.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A face clustering method based on structure perception is characterized by comprising the following steps:
acquiring a plurality of face images to be processed, extracting the face features of each face image to be processed based on a pre-trained convolutional neural network model, and constructing a K neighbor graph according to the face features of each face image to be processed;
inputting the K neighbor graph into a pre-trained edge score prediction model to obtain the score of each edge in the K neighbor graph; the edge score prediction model is obtained by sampling a K neighbor graph by using a structure-preserving sub-graph sampling strategy and training a graph convolution neural network by using a sub-graph obtained by sampling;
and performing first pruning operation on the K neighbor graph according to the scores of all edges in the K neighbor graph to obtain face clusters aiming at the plurality of face images to be processed.
2. The method of claim 1, wherein after the first pruning operation on the K-neighbor graph according to the scores of the edges in the K-neighbor graph, the method further comprises:
calculating the intimacy between every two nodes in the K neighbor graph subjected to the first pruning operation;
and carrying out secondary pruning operation on the K neighbor graph subjected to the primary pruning operation according to the intimacy between every two nodes.
3. The method of claim 2, wherein the intimacy between every two nodes is calculated by the following formula:
intimacy(N1, N2) = (k / n1 + k / n2) / 2
wherein n1 indicates the number of edges connected to node N1, n2 indicates the number of edges connected to node N2, and k indicates the number of neighbor nodes shared by node N1 and node N2.
4. The method according to any one of claims 1 to 3, wherein the edge score prediction model is obtained by training in advance by adopting the following steps:
acquiring a training sample set, wherein the training sample set comprises a plurality of face sample images;
extracting the face features of each face sample image based on the convolutional neural network model;
constructing K neighbor graph samples according to the face features of each face sample image;
sampling K neighbor graph samples based on the structure-preserving sub-graph sampling strategy to obtain sub-graphs obtained after sampling, training a graph convolution neural network by using the sub-graphs obtained after sampling to obtain network parameters, and generating the edge score prediction model according to the network parameters.
5. The method of claim 4, wherein the sampling K-neighbor graph samples based on the structure-preserving sub-graph sampling strategy to obtain sampled sub-graphs comprises:
randomly selecting M clusters from the K neighbor graph samples as sampling seeds;
for each seed cluster, selecting N nearest neighbor clusters of each seed cluster, and taking a graph formed by the M clusters and the N nearest neighbor clusters as a first sub-graph S1;
randomly selecting K1 clusters from the first sub-graph S1 to construct a second sub-graph S2;
and randomly selecting K2 nodes from the second subgraph S2 as the subgraph obtained after sampling.
6. The method of claim 4, wherein the training of the graph convolution neural network using the sampled subgraph comprises:
inputting the sub-graph obtained after sampling into the graph convolution neural network to obtain a fraction prediction value of each edge in the sub-graph;
calculating loss values between the fraction predicted values of the edges and the real fraction values of the corresponding edges based on a cross entropy loss function;
and training the graph convolution neural network according to the loss value.
7. A face clustering device based on structure perception is characterized by comprising:
the first acquisition module is used for acquiring a plurality of face images to be processed;
the characteristic extraction module is used for extracting the face characteristic of each face image to be processed based on a pre-trained convolutional neural network model;
the construction module is used for constructing a K neighbor graph according to the face features of each face image to be processed;
the prediction module is used for inputting the K neighbor graph into a pre-trained edge score prediction model to obtain the score of each edge in the K neighbor graph; the edge score prediction model is obtained by sampling a K neighbor graph by using a structure-preserving sub-graph sampling strategy and training a graph convolution neural network by using a sub-graph obtained by sampling;
and the pruning operation module is used for carrying out first pruning operation on the K neighbor graph according to the scores of all edges in the K neighbor graph to obtain the face clusters aiming at the plurality of face images to be processed.
8. The apparatus of claim 7, wherein the pruning operation module is further configured to:
after carrying out first pruning operation on the K neighbor graph according to the scores of all edges in the K neighbor graph, calculating the intimacy between every two nodes in the K neighbor graph subjected to the first pruning operation;
and carrying out secondary pruning operation on the K neighbor graph subjected to the primary pruning operation according to the intimacy between every two nodes.
9. The apparatus of claim 7 or 8, further comprising:
the pre-training module is used for pre-training the edge score prediction model; wherein the pre-training module is specifically configured to:
acquiring a training sample set, wherein the training sample set comprises a plurality of face sample images;
extracting the face features of each face sample image based on the convolutional neural network model;
constructing K neighbor graph samples according to the face features of each face sample image;
sampling K neighbor graph samples based on the structure-preserving sub-graph sampling strategy to obtain sub-graphs obtained after sampling, training a graph convolution neural network by using the sub-graphs obtained after sampling to obtain network parameters, and generating the edge score prediction model according to the network parameters.
10. The apparatus of claim 9, wherein the pre-training module is specifically configured to:
randomly selecting M clusters from the K neighbor graph samples as sampling seeds;
for each seed cluster, selecting N nearest neighbor clusters of each seed cluster, and taking a graph formed by the M clusters and the N nearest neighbor clusters as a first sub-graph S1;
randomly selecting K1 clusters from the first sub-graph S1 to construct a second sub-graph S2;
and randomly selecting K2 nodes from the second subgraph S2 as the subgraph obtained after sampling.
CN202110272409.9A 2021-03-12 2021-03-12 Face clustering method and device based on structure perception Active CN112766421B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110272409.9A CN112766421B (en) 2021-03-12 2021-03-12 Face clustering method and device based on structure perception

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110272409.9A CN112766421B (en) 2021-03-12 2021-03-12 Face clustering method and device based on structure perception

Publications (2)

Publication Number Publication Date
CN112766421A true CN112766421A (en) 2021-05-07
CN112766421B CN112766421B (en) 2024-09-24

Family

ID=75691348

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110272409.9A Active CN112766421B (en) 2021-03-12 2021-03-12 Face clustering method and device based on structure perception

Country Status (1)

Country Link
CN (1) CN112766421B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080304755A1 (en) * 2007-06-08 2008-12-11 Microsoft Corporation Face Annotation Framework With Partial Clustering And Interactive Labeling
WO2020119053A1 (en) * 2018-12-11 2020-06-18 平安科技(深圳)有限公司 Picture clustering method and apparatus, storage medium and terminal device
CN110458078A (en) * 2019-08-05 2019-11-15 高新兴科技集团股份有限公司 A kind of face image data clustering method, system and equipment
WO2021027193A1 (en) * 2019-08-12 2021-02-18 佳都新太科技股份有限公司 Face clustering method and apparatus, device and storage medium
CN112101086A (en) * 2020-07-24 2020-12-18 南京航空航天大学 Face clustering method based on link prediction

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113361402A (en) * 2021-06-04 2021-09-07 北京百度网讯科技有限公司 Training method of recognition model, method, device and equipment for determining accuracy
CN113361402B (en) * 2021-06-04 2023-08-18 北京百度网讯科技有限公司 Training method of recognition model, method, device and equipment for determining accuracy
CN113901904A (en) * 2021-09-29 2022-01-07 北京百度网讯科技有限公司 Image processing method, face recognition model training method, device and equipment
CN114511905A (en) * 2022-01-20 2022-05-17 哈尔滨工程大学 Face clustering method based on graph convolution neural network
CN115083003A (en) * 2022-08-23 2022-09-20 浙江大华技术股份有限公司 Clustering network training and target clustering method, device, terminal and storage medium
CN115083003B (en) * 2022-08-23 2022-11-22 浙江大华技术股份有限公司 Clustering network training and target clustering method, device, terminal and storage medium

Also Published As

Publication number Publication date
CN112766421B (en) 2024-09-24

Similar Documents

Publication Publication Date Title
CN113657465B (en) Pre-training model generation method and device, electronic equipment and storage medium
CN112766421B (en) Face clustering method and device based on structure perception
CN113642431B (en) Training method and device of target detection model, electronic equipment and storage medium
CN112560985B (en) Neural network searching method and device and electronic equipment
CN113222942A (en) Training method of multi-label classification model and method for predicting labels
CN112862005B (en) Video classification method, device, electronic equipment and storage medium
CN114677565B (en) Training method and image processing method and device for feature extraction network
CN114648676A (en) Point cloud processing model training and point cloud instance segmentation method and device
CN113313053A (en) Image processing method, apparatus, device, medium, and program product
CN113657289A (en) Training method and device of threshold estimation model and electronic equipment
CN110796135A (en) Target positioning method and device, computer equipment and computer storage medium
CN112580666A (en) Image feature extraction method, training method, device, electronic equipment and medium
CN113033458A (en) Action recognition method and device
CN114399780B (en) Form detection method, form detection model training method and device
CN113887630A (en) Image classification method and device, electronic equipment and storage medium
CN113963011A (en) Image recognition method and device, electronic equipment and storage medium
CN112561061A (en) Neural network thinning method, apparatus, device, storage medium, and program product
CN114898881A (en) Survival prediction method, device, equipment and storage medium
CN113536751B (en) Processing method and device of form data, electronic equipment and storage medium
CN115439916A (en) Face recognition method, apparatus, device and medium
CN115116080A (en) Table analysis method and device, electronic equipment and storage medium
CN114943608A (en) Fraud risk assessment method, device, equipment and storage medium
CN115019057A (en) Image feature extraction model determining method and device and image identification method and device
CN114330576A (en) Model processing method and device, and image recognition method and device
CN113590774A (en) Event query method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant